#### Feature Engineering ####
The process of `feature engineering` includes the following steps:

- Brainstorming or Testing features;
- Deciding what features to create;
- Creating features;
- Checking how the features work with your model;
- Improving your features if needed;
- Going back to brainstorming or creating more features until the work is done.


In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import requests
import json
import holidays as hd
import calendar
from datetime import datetime, date
from pprint import pprint

In [2]:
cycle_usage = pd.read_csv("cycleusage_cleansed2.csv")
cycle_usage.count()

StartStation Id               376625
Start Date                    376625
EndStation Id                 376625
End Date                      376625
Duration                      376625
StartStation Id Used          376625
EndStation Id Used            376625
Frequency                     376625
StartStation Address          376625
StartStation latitude         376625
StartStation longitude        376625
StartStation capacity         376625
EndStation Address            376625
EndStation latitude           376625
EndStation longitude          376625
EndStation capacity           376625
Daily Weather                 376625
Hourly Weather                376625
Humidity                      376625
Windspeed                     376625
Apparent Temperature (Avg)    376625
dtype: int64

In [3]:
cycle_usage.head()

Unnamed: 0,StartStation Id,Start Date,EndStation Id,End Date,Duration,StartStation Id Used,EndStation Id Used,Frequency,StartStation Address,StartStation latitude,...,StartStation capacity,EndStation Address,EndStation latitude,EndStation longitude,EndStation capacity,Daily Weather,Hourly Weather,Humidity,Windspeed,Apparent Temperature (Avg)
0,14,02/04/2016 15:52,89,02/04/2016 15:54,120,394178,1458,1410,"Belgrove Street , King's Cross",51.529944,...,48,"Tavistock Place, Bloomsbury",51.52625,-0.12351,19,fog,"[{'time': 1459551600, 'summary': 'Clear', 'ico...",0.67,2.96,52.035
1,14,04/04/2016 11:21,89,04/04/2016 11:23,120,394178,1458,1410,"Belgrove Street , King's Cross",51.529944,...,48,"Tavistock Place, Bloomsbury",51.52625,-0.12351,19,partly-cloudy-day,"[{'time': 1459724400, 'summary': 'Mostly Cloud...",0.83,3.26,50.025
2,14,04/04/2016 11:43,89,04/04/2016 11:46,180,394178,1458,1410,"Belgrove Street , King's Cross",51.529944,...,48,"Tavistock Place, Bloomsbury",51.52625,-0.12351,19,partly-cloudy-day,"[{'time': 1459724400, 'summary': 'Mostly Cloud...",0.83,3.26,50.025
3,14,06/04/2016 01:07,89,06/04/2016 01:10,180,394178,1458,1410,"Belgrove Street , King's Cross",51.529944,...,48,"Tavistock Place, Bloomsbury",51.52625,-0.12351,19,partly-cloudy-day,"[{'time': 1459897200, 'summary': 'Clear', 'ico...",0.72,5.21,46.15
4,14,06/04/2016 18:46,89,06/04/2016 18:49,180,394178,1458,1410,"Belgrove Street , King's Cross",51.529944,...,48,"Tavistock Place, Bloomsbury",51.52625,-0.12351,19,partly-cloudy-day,"[{'time': 1459897200, 'summary': 'Clear', 'ico...",0.72,5.21,46.15


In [4]:
rm_columns = {
    #"StartStation Id",
    #"Start Date",
    "StartStation Address",
   # "StartStation capacity",
    #"EndStation Id",
    "End Date",
    "EndStation Address",
   # "EndStation capacity",
   # "Duration",
   # "Frequency",
  #  "Humidity",
   # "Windspeed",
  #  "Apparent Temperature (Avg)",
    "StartStation Id Used",
    "EndStation Id Used",
    "StartStation latitude",
    "StartStation longitude",
    "EndStation latitude",
    "EndStation longitude",
    "Hourly Weather",
   # "distance (geodesic)"
   # "Daily Weather"
}

cycle_usage.drop(columns=rm_columns, inplace=True)
cycle_usage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 376625 entries, 0 to 376624
Data columns (total 11 columns):
StartStation Id               376625 non-null int64
Start Date                    376625 non-null object
EndStation Id                 376625 non-null int64
Duration                      376625 non-null int64
Frequency                     376625 non-null int64
StartStation capacity         376625 non-null int64
EndStation capacity           376625 non-null int64
Daily Weather                 376625 non-null object
Humidity                      376625 non-null float64
Windspeed                     376625 non-null float64
Apparent Temperature (Avg)    376625 non-null float64
dtypes: float64(3), int64(6), object(2)
memory usage: 31.6+ MB


In [5]:
# Check for empty values and empty strings
np.where(pd.isnull(cycle_usage))
np.where(cycle_usage.applymap(lambda x: x == ''))

(array([], dtype=int64), array([], dtype=int64))
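
A notebook cell displays only its last expression, so the `pd.isnull` result above is not shown. The following is a minimal sketch of a per-column summary that covers both checks. The frame `demo` is a made-up stand-in, not the real data.

```python
import pandas as pd

# Hypothetical toy frame standing in for cycle_usage
demo = pd.DataFrame({
    "Daily Weather": ["fog", "", None],
    "Humidity": [0.67, None, 0.83],
})

# Missing values (NaN/None) per column
missing_per_column = demo.isna().sum()

# Empty strings per column; the comparison is element-wise
empty_per_column = (demo == "").sum()

print(missing_per_column)
print(empty_per_column)
```

A column-wise count makes it easier to see where a problem sits than the raw index arrays returned by `np.where`.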

In [6]:
cycle_usage.dropna(how='any', inplace=True)
cycle_usage.count()

StartStation Id               376625
Start Date                    376625
EndStation Id                 376625
Duration                      376625
Frequency                     376625
StartStation capacity         376625
EndStation capacity           376625
Daily Weather                 376625
Humidity                      376625
Windspeed                     376625
Apparent Temperature (Avg)    376625
dtype: int64

#### Dark Sky note ####
> Our system is presently very simple: it finds the “worst” weather condition that will happen during the day (4AM to 4AM), and uses the icon for it. The only case where a daily icon will show a *-night value is partly-cloudy-night, and this is done to match the daily summary text. We already have intentions to change this behavior, because it is confusing. 
In the meantime, you can assume that if partly-cloudy-night is the worst weather condition that was found, that it was clear during the day. So you can just treat partly-cloudy-night as an alias for clear-day.

In [7]:
cycle_usage.groupby(by="Daily Weather").count()
# Assign through a single .loc call to avoid chained assignment on a copy
cycle_usage.loc[cycle_usage["Daily Weather"] == "partly-cloudy-night", "Daily Weather"] = "clear-day"


In [8]:
cycle_usage.index.max()

376624

In [9]:
cycle_usage.reset_index(drop=True, inplace=True)
cycle_usage.size

4142875

In [10]:
# Anomaly detection of date format: expected "dd/mm/YYYY HH:MM" (16 characters)
for index, p in cycle_usage.iterrows():
    if len(p["Start Date"]) == 19:
        # Trailing ":SS" seconds: keep the first 16 characters ("dd/mm/YYYY HH:MM")
        print(index, p["Start Date"])
        cycle_usage.loc[index, "Start Date"] = p["Start Date"][:16]

    elif len(p["Start Date"]) > 19:
        # Longer strings: keep the date part and the last five characters
        print("anomaly", index, p["Start Date"])
        cycle_usage.loc[index, "Start Date"] = p["Start Date"][:10] + " " + p["Start Date"][-5:]

cycle_usage.dropna(inplace=True)
cycle_usage.count()


391 25/12/2015 10:50:00
392 25/12/2015 12:49:00
393 25/12/2015 12:50:00
394 25/12/2015 12:52:00
395 25/12/2015 20:05:00
...
372904 26/12/2015 09:01:00
373187 19/12/2015 18:37:00
373306 23/12/2015 00:40:00
373503 30/12/2015 08:31:00
...

StartStation Id               376625
Start Date                    376625
EndStation Id                 376625
Duration                      376625
Frequency                     376625
StartStation capacity         376625
EndStation capacity           376625
Daily Weather                 376625
Humidity                      376625
Windspeed                     376625
Apparent Temperature (Avg)    376625
dtype: int64

In [11]:
cycle_usage["Start Date"] = cycle_usage["Start Date"].str.slice(0, 16)
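
The row-by-row length check in In [10] can also be approximated with vectorized string methods. This is a minimal sketch on a made-up series (`start` is hypothetical), assuming the expected format is the 16-character `dd/mm/YYYY HH:MM`.

```python
import pandas as pd

# Hypothetical sample of raw "Start Date" strings
start = pd.Series(["02/04/2016 15:52", "25/12/2015 10:50:00"], name="Start Date")

# How many strings have each length; 16 is the expected "dd/mm/YYYY HH:MM"
length_counts = start.str.len().value_counts()

# Rows whose length differs from 16 are candidates for inspection
anomalies = start[start.str.len() != 16]

# Trimming to 16 characters drops a trailing ":SS"
normalized = start.str.slice(0, 16)
print(normalized.tolist())
```

Counting lengths first shows how many distinct formats exist before any value is modified.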

In [12]:
cycle_usage.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 376625 entries, 0 to 376624
Data columns (total 11 columns):
StartStation Id               376625 non-null int64
Start Date                    376625 non-null object
EndStation Id                 376625 non-null int64
Duration                      376625 non-null int64
Frequency                     376625 non-null int64
StartStation capacity         376625 non-null int64
EndStation capacity           376625 non-null int64
Daily Weather                 376625 non-null object
Humidity                      376625 non-null float64
Windspeed                     376625 non-null float64
Apparent Temperature (Avg)    376625 non-null float64
dtypes: float64(3), int64(6), object(2)
memory usage: 34.5+ MB


Adding weekdays (Monday, Tuesday...)

In [13]:
#Add weekdays
cycle_usage["Start Date"] =  pd.to_datetime(cycle_usage["Start Date"], format='%d/%m/%Y %H:%M')
cycle_usage['Weekday'] = cycle_usage.apply(lambda row: calendar.day_name[row["Start Date"].weekday()],axis=1)
cycle_usage.head()

Unnamed: 0,StartStation Id,Start Date,EndStation Id,Duration,Frequency,StartStation capacity,EndStation capacity,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday
0,14,2016-04-02 15:52:00,89,120,1410,48,19,fog,0.67,2.96,52.035,Saturday
1,14,2016-04-04 11:21:00,89,120,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday
2,14,2016-04-04 11:43:00,89,180,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday
3,14,2016-04-06 01:07:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday
4,14,2016-04-06 18:46:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday



`Meteorological seasons` <br>
Northern Hemisphere <br>
Spring: 1 March to 31 May <br>
Summer: 1 June to 31 August <br>
Autumn: 1 September to 30 November <br>
Winter: 1 December to 28 February (29 February in leap years) <br>

In [14]:
#Add seasons
def seasons(p):
    """Get meteorological season"""
    year = int(str(p["Start Date"])[:4])
    date_m = p["Start Date"]
    if date_m >= datetime(year, 3, 1, 0,0,0) and date_m <= datetime(year, 5, 31, 23,59,59):
        return "Spring"
    elif date_m >= datetime(year, 6, 1, 0,0,0) and date_m <= datetime(year, 8, 31, 23,59,59):
        return "Summer"
    elif date_m >= datetime(year, 9, 1, 0,0,0) and date_m <= datetime(year, 11, 30, 23,59,59):
        return "Autumn"
    elif date_m >= datetime(year, 12, 1, 0,0,0) or date_m < datetime(year, 3, 1, 0,0,0):
        return "Winter"
        
cycle_usage['Season'] = cycle_usage.apply(lambda row: seasons(row),axis=1)
cycle_usage.head()

Unnamed: 0,StartStation Id,Start Date,EndStation Id,Duration,Frequency,StartStation capacity,EndStation capacity,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season
0,14,2016-04-02 15:52:00,89,120,1410,48,19,fog,0.67,2.96,52.035,Saturday,Spring
1,14,2016-04-04 11:21:00,89,120,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday,Spring
2,14,2016-04-04 11:43:00,89,180,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday,Spring
3,14,2016-04-06 01:07:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday,Spring
4,14,2016-04-06 18:46:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday,Spring
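
Because meteorological seasons follow whole calendar months, the season can also be looked up from the month number alone. The following is a minimal sketch under that assumption. `MONTH_TO_SEASON`, `season_of` and `sample` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Month number -> meteorological season (northern hemisphere)
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}

def season_of(dates: pd.Series) -> pd.Series:
    """Map a datetime Series to season names via the month number."""
    return dates.dt.month.map(MONTH_TO_SEASON)

# Made-up sample dates, including a leap day
sample = pd.Series(pd.to_datetime(["2016-04-02", "2015-01-04", "2016-02-29", "2017-12-26"]))
print(season_of(sample).tolist())  # ['Spring', 'Winter', 'Winter', 'Winter']
```

A month lookup avoids building `datetime` boundaries per row and handles leap days without special cases.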


Add month names

In [15]:
# Months
def months_names(p):
    """Returns month name"""
    months = {
        1: "January",
        2: "February",
        3: "March",
        4: "April",
        5: "May",
        6: "June",
        7: "July",
        8: "August",
        9: "September",
        10: "October",
        11: "November",
        12: "December"
    }
    return months.get(p["Start Date"].month, "not defined")

cycle_usage["Month"] = cycle_usage.apply(lambda row: months_names(row), axis=1)

In [16]:
cycle_usage.count()

StartStation Id               376625
Start Date                    376625
EndStation Id                 376625
Duration                      376625
Frequency                     376625
StartStation capacity         376625
EndStation capacity           376625
Daily Weather                 376625
Humidity                      376625
Windspeed                     376625
Apparent Temperature (Avg)    376625
Weekday                       376625
Season                        376625
Month                         376625
dtype: int64
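
The `Weekday` and `Month` columns built above with row-wise `apply` can also be derived with the pandas `.dt` accessor. This is a minimal sketch, assuming a datetime column. The frame `dates` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame with a parsed datetime column
dates = pd.DataFrame({"Start Date": pd.to_datetime(
    ["02/04/2016 15:52", "04/04/2016 11:21", "26/12/2017 09:00"],
    format="%d/%m/%Y %H:%M")})

# English day and month names, computed for the whole column at once
dates["Weekday"] = dates["Start Date"].dt.day_name()
dates["Month"] = dates["Start Date"].dt.month_name()
print(dates)
```

On several hundred thousand rows, the accessor is usually much faster than `apply(..., axis=1)`.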

##### Split Start Date #####
> Raw dates are difficult for ML models to use directly. Idea: split the date into several columns.

In [17]:
#Extract only the date part (YYYY-mm-dd) by resetting the time to midnight
cycle_usage['Date'] = cycle_usage["Start Date"].dt.normalize()
#Extracting Year
cycle_usage['Year'] = cycle_usage['Date'].dt.year
#Extracting Month
##cycle_usage['Month'] = cycle_usage['Date'].dt.month
#Extracting passed years since the date
cycle_usage['Passed_Years'] = date.today().year - cycle_usage['Date'].dt.year
#Extracting passed months since the date
cycle_usage['Passed_Months'] = (date.today().year - cycle_usage['Date'].dt.year) * 12 + date.today().month - cycle_usage['Date'].dt.month
cycle_usage.head()

Unnamed: 0,StartStation Id,Start Date,EndStation Id,Duration,Frequency,StartStation capacity,EndStation capacity,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months
0,14,2016-04-02 15:52:00,89,120,1410,48,19,fog,0.67,2.96,52.035,Saturday,Spring,April,2016-04-02,2016,3,37
1,14,2016-04-04 11:21:00,89,120,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday,Spring,April,2016-04-04,2016,3,37
2,14,2016-04-04 11:43:00,89,180,1410,48,19,partly-cloudy-day,0.83,3.26,50.025,Monday,Spring,April,2016-04-04,2016,3,37
3,14,2016-04-06 01:07:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday,Spring,April,2016-04-06,2016,3,37
4,14,2016-04-06 18:46:00,89,180,1410,48,19,partly-cloudy-day,0.72,5.21,46.15,Wednesday,Spring,April,2016-04-06,2016,3,37
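
`Passed_Years` and `Passed_Months` depend on `date.today()`, so their values change with the day the notebook is run. The value 37 for April 2016 shown above is consistent with a run in May 2019, which is an inference and not stated in the notebook. The following sketch fixes the reference date to keep the feature reproducible. `months_between`, `reference` and `sample` are illustrative names.

```python
import pandas as pd

def months_between(dates: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Calendar months from each date's month to the reference month."""
    return (reference.year - dates.dt.year) * 12 + (reference.month - dates.dt.month)

# Assumed reference date; any day in May 2019 gives the same result
reference = pd.Timestamp("2019-05-01")
sample = pd.Series(pd.to_datetime(["2016-04-02", "2015-01-04", "2019-03-19"]))
print(months_between(sample, reference).tolist())  # [37, 52, 2]
```

For example, April 2016 gives (2019 - 2016) * 12 + (5 - 4) = 37.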


Add a new frequency column, `Rented Bikes`, that holds the number of bikes rented per <b>day</b>.

In [18]:
# Calculate new frequency of rented bikes
cycle_usage = pd.merge(cycle_usage, cycle_usage.groupby(["Date"])["Humidity"].count().reset_index(name="Rented Bikes"), how='left', on="Date", 
         left_index=False, right_index=False, sort=True)
cycle_usage.head()

Unnamed: 0,StartStation Id,Start Date,EndStation Id,Duration,Frequency,StartStation capacity,EndStation capacity,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes
0,14,2015-01-04 10:01:00,77,240,1632,48,26,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
1,14,2015-01-04 15:17:00,11,240,3362,48,24,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
2,14,2015-01-04 19:45:00,11,240,3362,48,24,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
3,14,2015-01-04 17:59:00,78,720,1079,48,17,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
4,14,2015-01-04 15:06:00,374,1080,1517,48,36,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
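
The daily count above is computed with a `groupby` followed by a `merge`. A possible alternative is `transform("size")`, sketched here on a made-up frame (`rentals` is hypothetical). It leaves the row order unchanged, whereas the merge with `sort=True` reorders rows by date, so a separate `sort_values("Date")` would be needed if later steps rely on chronological order.

```python
import pandas as pd

# Hypothetical toy data: one row per rental
rentals = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-04", "2015-01-05"]),
    "Humidity": [0.94, 0.94, 0.88],
})

# transform("size") returns one value per original row, aligned to the index,
# so the group size can be assigned directly as a new column
rentals["Rented Bikes"] = rentals.groupby("Date")["Humidity"].transform("size")
print(rentals)
```

Note that `size` counts all rows in a group, while `count` skips missing values. The two agree here because the column has no missing entries.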


In [19]:
rm_columns = {
    "StartStation Id",
    "Start Date",
    "StartStation Address",
    "StartStation capacity",
    "EndStation Id",
    "End Date",
    "EndStation Address",
    "EndStation capacity",
    "Duration",
  #  "Frequency",
   # "Humidity",
   # "Windspeed",
   # "Apparent Temperature (Avg)",
    "StartStation Id Used",
    "EndStation Id Used",
    "StartStation latitude",
    "StartStation longitude",
    "EndStation latitude",
    "EndStation longitude",
    "Hourly Weather",
   # "distance (geodesic)",
   # "Daily Weather",
   # 'Rented Bikes' 
}

cycle_usage.drop(columns=rm_columns, inplace=True, errors="ignore")
#cycle_usage.drop_duplicates(inplace=True)
cycle_usage.tail()

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes
376620,3903,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323
376621,3903,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323
376622,3903,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323
376623,112,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323
376624,1028,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323


In [20]:
cycle_usage = cycle_usage.drop_duplicates(subset=['Date'])
cycle_usage.sort_values("Rented Bikes", ascending=True).head(1)

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes
272345,3362,rain,0.87,4.06,36.545,Tuesday,Winter,December,2017-12-26,2017,2,17,12


In [21]:
cycle_usage.reset_index(drop=True, inplace=True)
cycle_usage.size

19695

In [22]:
from datetime import datetime, timedelta
(cycle_usage["Date"][0] - timedelta(1)).strftime('%Y-%m-%d')

'2015-01-03'

In [23]:
cycle_usage.head()

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes
0,1632,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33
1,1005,partly-cloudy-day,0.88,1.59,46.74,Monday,Winter,January,2015-01-05,2015,4,52,281
2,1005,partly-cloudy-day,0.86,2.07,42.15,Tuesday,Winter,January,2015-01-06,2015,4,52,279
3,1005,clear-day,0.86,4.13,45.45,Wednesday,Winter,January,2015-01-07,2015,4,52,274
4,1005,rain,0.87,3.6,46.2,Thursday,Winter,January,2015-01-08,2015,4,52,161


In [24]:
cycle_usage.index.max()

1514

Add a `Rented Bikes (Future)` feature: for each row, the number of bikes rented on the next day in the data.

In [25]:
# Next day's rental count for every row: shift the column up by one position.
# Note: np.roll wraps around, so the last row receives the first day's count
# rather than a real future value.
cycle_usage["Rented Bikes (Future)"] = np.roll(cycle_usage["Rented Bikes"].to_numpy(), -1)

In [26]:
cycle_usage.tail()

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes,Rented Bikes (Future)
1510,1410,wind,0.8,19.56,54.26,Friday,Spring,March,2019-03-15,2019,0,2,258,28
1511,384,wind,0.8,20.5,42.86,Saturday,Spring,March,2019-03-16,2019,0,2,28,41
1512,1005,partly-cloudy-day,0.74,13.56,38.345,Sunday,Spring,March,2019-03-17,2019,0,2,41,327
1513,1410,clear-day,0.74,7.6,46.57,Monday,Spring,March,2019-03-18,2019,0,2,327,323
1514,1410,partly-cloudy-day,0.83,6.26,50.315,Tuesday,Spring,March,2019-03-19,2019,0,2,323,33
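
The last row above carries 33, which is the rental count of the very first day (2015-01-04) rather than a real next-day value. A minimal sketch of one way to avoid that, assuming the rows are sorted by `Date` with one row per day; the toy frame `daily` is hypothetical and only mirrors the last three rows shown above:

```python
import pandas as pd

# Toy daily frame (hypothetical); the values mirror the last three rows above.
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-17", "2019-03-18", "2019-03-19"]),
    "Rented Bikes": [41, 327, 323],
})

# shift(-1) moves every value up one row, so row i receives the count of row i + 1.
daily["Rented Bikes (Future)"] = daily["Rented Bikes"].shift(-1)

# The final day has no successor, so its value is NaN; drop that row and
# restore the integer dtype that the NaN temporarily turned into float.
daily = (
    daily.dropna(subset=["Rented Bikes (Future)"])
         .astype({"Rented Bikes (Future)": int})
)

print(daily)
```

Dropping the last row costs one observation, but every remaining row then pairs a day with its true successor.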


In [27]:
cycle_usage.count()

Frequency                     1515
Daily Weather                 1515
Humidity                      1515
Windspeed                     1515
Apparent Temperature (Avg)    1515
Weekday                       1515
Season                        1515
Month                         1515
Date                          1515
Year                          1515
Passed_Years                  1515
Passed_Months                 1515
Rented Bikes                  1515
Rented Bikes (Future)         1515
dtype: int64

###### Holidays ######
Flag whether each day is a UK public holiday.

In [28]:
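# Consider holidays (e.g. Good Friday in the UK).
# Build the calendar once; years are populated lazily on lookup.
# Note: in the version used here, hd.UK() without a subdivision also flags
# holidays specific to Scotland and Northern Ireland (see the bracketed
# entries printed below).
uk_holidays = hd.UK()

def holiday(p):
    """Return True if the row's date is a UK holiday."""
    return p["Date"].date() in uk_holidays

for date2, name in sorted(hd.UK(state='London', years=[2015,2016,2017,2018], observed=False).items()):
    print(date2, name)

cycle_usage["Holiday"] = cycle_usage.apply(holiday, axis=1)
cycle_usage.head()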

2015-01-01 New Year's Day
2015-01-02 New Year Holiday [Scotland]
2015-03-17 St. Patrick's Day [Northern Ireland]
2015-04-03 Good Friday
2015-04-06 Easter Monday [England, Wales, Northern Ireland]
2015-05-04 May Day
2015-05-25 Spring Bank Holiday
2015-07-12 Battle of the Boyne [Northern Ireland]
2015-08-03 Summer Bank Holiday [Scotland]
2015-08-31 Late Summer Bank Holiday [England, Wales, Northern Ireland]
2015-11-30 St. Andrew's Day [Scotland]
2015-12-25 Christmas Day
2015-12-26 Boxing Day
2016-01-01 New Year's Day
2016-01-02 New Year Holiday [Scotland]
2016-03-17 St. Patrick's Day [Northern Ireland]
2016-03-25 Good Friday
2016-03-28 Easter Monday [England, Wales, Northern Ireland]
2016-05-02 May Day
2016-05-30 Spring Bank Holiday
2016-07-12 Battle of the Boyne [Northern Ireland]
2016-08-01 Summer Bank Holiday [Scotland]
2016-08-29 Late Summer Bank Holiday [England, Wales, Northern Ireland]
2016-11-30 St. Andrew's Day [Scotland]
2016-12-25 Christmas Day
2016-12-26 Boxing Day
2017-01-01

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes,Rented Bikes (Future),Holiday
0,1632,fog,0.94,0.55,36.295,Sunday,Winter,January,2015-01-04,2015,4,52,33,281,False
1,1005,partly-cloudy-day,0.88,1.59,46.74,Monday,Winter,January,2015-01-05,2015,4,52,281,279,False
2,1005,partly-cloudy-day,0.86,2.07,42.15,Tuesday,Winter,January,2015-01-06,2015,4,52,279,274,False
3,1005,clear-day,0.86,4.13,45.45,Wednesday,Winter,January,2015-01-07,2015,4,52,274,161,False
4,1005,rain,0.87,3.6,46.2,Thursday,Winter,January,2015-01-08,2015,4,52,161,270,False
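
The listing printed above includes entries marked `[Scotland]` and `[Northern Ireland]`, which may not be relevant for London rentals. A minimal sketch of a vectorized flag restricted to a chosen set of dates; `holiday_dates` and `days` are hypothetical names, and the three dates are copied from the listing above purely for illustration:

```python
import pandas as pd

# Hypothetical hand-picked holidays, taken from the printed listing:
# Good Friday, Easter Monday and Christmas Day 2015.
holiday_dates = pd.to_datetime(["2015-04-03", "2015-04-06", "2015-12-25"])

# Toy frame with one row per day (hypothetical).
days = pd.DataFrame({
    "Date": pd.to_datetime(["2015-04-02", "2015-04-03", "2015-12-25"]),
})

# normalize() strips any time-of-day part so the comparison is by calendar day;
# isin() checks the whole column at once instead of calling a function per row.
days["Holiday"] = days["Date"].dt.normalize().isin(holiday_dates)

print(days)
```

The same pattern works with any source of dates, as long as both sides are converted to `datetime64` before the comparison.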


In [29]:
cycle_usage.iloc[605]

Frequency                                    1410
Daily Weather                   partly-cloudy-day
Humidity                                     0.74
Windspeed                                    1.24
Apparent Temperature (Avg)                 70.485
Weekday                                 Wednesday
Season                                     Autumn
Month                                   September
Date                          2016-09-07 00:00:00
Year                                         2016
Passed_Years                                    3
Passed_Months                                  32
Rented Bikes                                  381
Rented Bikes (Future)                         375
Holiday                                     False
Name: 605, dtype: object

##### Adding past data #####
Add the previous day's weather (`yesterday`) to each row.

In [30]:
def add_yesterday(cycle_usage):
    """Adds on each day the past day's weather information"""
    from datetime import datetime, timedelta
    rm_columns = {
        "StartStation Id",
        "Start Date",
        "StartStation Address",
        "StartStation capacity",
        "EndStation Id",
        "End Date",
        "EndStation Address",
        "EndStation capacity",
        "Duration",
        "Frequency",
        "Holiday",
        "Humidity",
        "Windspeed",
        "Apparent Temperature (Avg)",
        "StartStation Id Used",
        "EndStation Id Used",
        "StartStation latitude",
        "StartStation longitude",
        "EndStation latitude",
        "EndStation longitude",
        "Hourly Weather",
        "distance (geodesic)",
        "Daily Weather",
        'Rented Bikes',
        'Rented Bikes (Future)',
        'Weekday',
        'Season',
        'Month',
        'Year',
        'Passed_Years',
        'Passed_Months',
        'Daily Weather (Past)',
        #'Yesterday',
        'Date'
    }
    
    # Previous calendar day as a 'YYYY-MM-DD' string, used as the merge key below.
    cycle_usage["Yesterday"] = (cycle_usage["Date"] - timedelta(days=1)).dt.strftime('%Y-%m-%d')

    df_r = cycle_usage.copy(True)
    df_r.drop(columns=rm_columns, inplace=True, errors="ignore")
    df_r["Yesterday"] = df_r["Yesterday"].astype(str)
    df_w = pd.read_csv("dates and weather_new.csv", sep=",")
   # df_w['Start Date'].replace('/','-',inplace=True, regex=True)
    df_w["Start Date"] = pd.to_datetime(df_w["Start Date"], format="%d.%m.%Y")
    df_w["Start Date"] = df_w["Start Date"].astype(str)
    df_t = pd.merge(df_r, df_w, left_on="Yesterday", right_on="Start Date", how='left')
    df_t.rename(columns={'Daily Weather' : 'Daily Weather (Past)', 
                         'Humidity' : 'Humidity (Past)',
                         'Windspeed' : 'Windspeed (Past)',
                         'Apparent Temperature (Avg)' : 'Apparent Temperature (Avg) (Past)'}, inplace=True)
 
    cycle_usage = pd.concat([cycle_usage, df_t[["Daily Weather (Past)", "Humidity (Past)", "Windspeed (Past)", "Apparent Temperature (Avg) (Past)"]]], axis=1)
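    # Drop the first row (the earliest day in the data).
    cycle_usage.drop(df_t.index[:1], inplace=True)
    # Recode "partly-cloudy-night" as "clear-day"; a single .loc call avoids chained assignment.
    cycle_usage.loc[cycle_usage["Daily Weather (Past)"] == "partly-cloudy-night", "Daily Weather (Past)"] = "clear-day"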
    
    return cycle_usage
    
cycle_usage = add_yesterday(cycle_usage)

In [32]:
cycle_usage.head()

Unnamed: 0,Frequency,Daily Weather,Humidity,Windspeed,Apparent Temperature (Avg),Weekday,Season,Month,Date,Year,Passed_Years,Passed_Months,Rented Bikes,Rented Bikes (Future),Holiday,Yesterday,Daily Weather (Past),Humidity (Past),Windspeed (Past),Apparent Temperature (Avg) (Past)
1,1005,partly-cloudy-day,0.88,1.59,46.74,Monday,Winter,January,2015-01-05,2015,4,52,281,279,False,2015-01-04,fog,0.94,0.55,36.295
2,1005,partly-cloudy-day,0.86,2.07,42.15,Tuesday,Winter,January,2015-01-06,2015,4,52,279,274,False,2015-01-05,partly-cloudy-day,0.88,1.59,46.74
3,1005,clear-day,0.86,4.13,45.45,Wednesday,Winter,January,2015-01-07,2015,4,52,274,161,False,2015-01-06,partly-cloudy-day,0.86,2.07,42.15
4,1005,rain,0.87,3.6,46.2,Thursday,Winter,January,2015-01-08,2015,4,52,161,270,False,2015-01-07,clear-day,0.86,4.13,45.45
5,1410,partly-cloudy-day,0.81,7.43,56.085,Friday,Winter,January,2015-01-09,2015,4,52,270,62,False,2015-01-08,rain,0.87,3.6,46.2
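
The same "previous day" lookup can also be sketched as a self-merge on a shifted date, assuming the previous day's values already sit in the same frame (the function above reads them from a separate CSV instead). The helper `add_previous_day` and the toy frame are hypothetical:

```python
import pandas as pd

def add_previous_day(df, cols, date_col="Date"):
    """Attach each row's previous-calendar-day values as '<col> (Past)' columns."""
    prev = df[[date_col] + cols].copy()
    # Moving the dates forward by one day makes day D-1 line up with day D.
    prev[date_col] = prev[date_col] + pd.Timedelta(days=1)
    prev = prev.rename(columns={c: f"{c} (Past)" for c in cols})
    # Left join: days whose previous day is missing get NaN instead of a wrong value.
    return df.merge(prev, on=date_col, how="left")

# Toy frame (hypothetical) with a gap: 2015-01-06 is missing.
frame = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-05", "2015-01-07"]),
    "Humidity": [0.94, 0.88, 0.86],
})

result = add_previous_day(frame, ["Humidity"])
print(result)
```

Unlike a plain positional shift, merging on the date keeps gaps visible: the row for 2015-01-07 gets no past value because 2015-01-06 is absent.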


In [33]:
ol = ["Month", "Season", "Weekday", "Holiday", "Daily Weather","Daily Weather (Past)", "Humidity", "Humidity (Past)", "Windspeed", "Windspeed (Past)", "Apparent Temperature (Avg)", "Apparent Temperature (Avg) (Past)", "Rented Bikes", "Rented Bikes (Future)"]
cycle_usage = cycle_usage[ol]

In [34]:
cycle_usage.iloc[600:610]

Unnamed: 0,Month,Season,Weekday,Holiday,Daily Weather,Daily Weather (Past),Humidity,Humidity (Past),Windspeed,Windspeed (Past),Apparent Temperature (Avg),Apparent Temperature (Avg) (Past),Rented Bikes,Rented Bikes (Future)
601,August,Summer,Saturday,False,partly-cloudy-day,partly-cloudy-day,0.74,0.66,1.97,1.28,69.925,70.015,93,77
602,August,Summer,Sunday,False,partly-cloudy-day,partly-cloudy-day,0.83,0.74,3.23,1.97,64.03,69.925,77,90
603,August,Summer,Monday,True,partly-cloudy-day,partly-cloudy-day,0.73,0.83,1.4,3.23,63.785,64.03,90,384
604,August,Summer,Tuesday,False,clear-day,partly-cloudy-day,0.67,0.73,1.22,1.4,66.405,63.785,384,381
605,September,Autumn,Wednesday,False,partly-cloudy-day,partly-cloudy-day,0.74,0.86,1.24,1.23,70.485,70.345,381,375
606,September,Autumn,Thursday,False,partly-cloudy-day,partly-cloudy-day,0.71,0.74,2.68,1.24,66.28,70.485,375,367
607,September,Autumn,Friday,False,partly-cloudy-day,partly-cloudy-day,0.79,0.71,4.78,2.68,67.97,66.28,367,61
608,September,Autumn,Saturday,False,partly-cloudy-day,partly-cloudy-day,0.88,0.79,3.54,4.78,57.92,67.97,61,106
609,September,Autumn,Sunday,False,clear-day,partly-cloudy-day,0.73,0.88,1.24,3.54,61.62,57.92,106,414
610,September,Autumn,Monday,False,partly-cloudy-day,clear-day,0.76,0.73,1.98,1.24,69.055,61.62,414,393


In [35]:
cycle_usage.tail()

Unnamed: 0,Month,Season,Weekday,Holiday,Daily Weather,Daily Weather (Past),Humidity,Humidity (Past),Windspeed,Windspeed (Past),Apparent Temperature (Avg),Apparent Temperature (Avg) (Past),Rented Bikes,Rented Bikes (Future)
1510,March,Spring,Friday,False,wind,wind,0.8,0.71,19.56,16.31,54.26,50.685,258,28
1511,March,Spring,Saturday,False,wind,wind,0.8,0.8,20.5,19.56,42.86,54.26,28,41
1512,March,Spring,Sunday,True,partly-cloudy-day,wind,0.74,0.8,13.56,20.5,38.345,42.86,41,327
1513,March,Spring,Monday,True,clear-day,partly-cloudy-day,0.74,0.74,7.6,13.56,46.57,38.345,327,323
1514,March,Spring,Tuesday,False,partly-cloudy-day,clear-day,0.83,0.74,6.26,7.6,50.315,46.57,323,33


In [36]:
cycle_usage.to_csv("features.csv", header=True, index=False)