### This next example is extended from https://machinelearningmastery.com/basic-feature-engineering-time-series-data-python/

### Read in the file __`data/daily-minimum-temperatures-in-me.csv`__
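* Because this file contains time series data, we pass some extra parameters to __`read_csv`__:
  * __`header=0`__ (the first row contains the column names)
  * __`index_col=0`__ (the first column contains the index for the time series)
  * __`parse_dates=True`__ (parse the index as dates)
* We then call __`.squeeze("columns")`__ on the result to get a Series rather than a one-column DataFrame. Older tutorials pass __`squeeze=True`__ to __`read_csv`__ instead; that parameter was deprecated in pandas 1.4 and removed in pandas 2.0.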

In [34]:
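from pandas import read_csv
from pandas import DataFrame
# Load the CSV with the first column as a DatetimeIndex, then squeeze the
# single data column into a Series
series = read_csv('data/daily-minimum-temperatures-in-me.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")
# Preview of the feature columns that are built step by step below
dataframe = DataFrame()
dataframe['month'] = [series.index[i].month for i in range(len(series))]
dataframe['day'] = [series.index[i].day for i in range(len(series))]
# Use .iloc for positional access; plain series[i] on a date-indexed Series is deprecated
dataframe['temperature'] = [series.iloc[i] for i in range(len(series))]
print(series)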

Date
1981-01-01    20.7
1981-01-02    17.9
1981-01-03    18.8
1981-01-04    14.6
1981-01-05    15.8
              ... 
1990-12-27    14.0
1990-12-28    13.6
1990-12-29    13.5
1990-12-30    15.7
1990-12-31    13.0
Name: Temp, Length: 3650, dtype: float64
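
To see what `.squeeze("columns")` does without needing the data file, here is a minimal sketch that reads a three-row in-memory CSV. The rows are copied from the output above, and the names `csv_text`, `frame`, and `sample` are illustrative only.

```python
import io
import pandas as pd

# Three rows copied from the printed series; "Date" and "Temp" are the
# index and series names shown in that output.
csv_text = "Date,Temp\n1981-01-01,20.7\n1981-01-02,17.9\n1981-01-03,18.8\n"

# Same read_csv arguments as the cell above, applied to the in-memory text
frame = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, parse_dates=True)
sample = frame.squeeze("columns")  # one-column DataFrame -> Series

print(type(frame).__name__)   # DataFrame
print(type(sample).__name__)  # Series
print(sample.name)            # Temp
```

If the file had more than one data column, `.squeeze("columns")` would leave the DataFrame unchanged.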


### What is the index of the series?

In [6]:
series.index

DatetimeIndex(['1981-01-01', '1981-01-02', '1981-01-03', '1981-01-04',
               '1981-01-05', '1981-01-06', '1981-01-07', '1981-01-08',
               '1981-01-09', '1981-01-10',
               ...
               '1990-12-22', '1990-12-23', '1990-12-24', '1990-12-25',
               '1990-12-26', '1990-12-27', '1990-12-28', '1990-12-29',
               '1990-12-30', '1990-12-31'],
              dtype='datetime64[ns]', name='Date', length=3650, freq=None)

### What is the month of the first element in the series?

In [8]:
series.index[0].month

1

### What is the day of the first element in the series?

In [9]:
series.index[0].day

1

### Is the day of the first element of the series a weekend day? (Hint: __`datetime`__ objects have a __`weekday()`__ method)

In [35]:
series.index[0].weekday() > 4

False
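
The comparison works because `weekday()` numbers the days from 0 (Monday) to 6 (Sunday), so any value above 4 is a Saturday or Sunday. A small sketch using standalone `Timestamp` objects (the variable names are illustrative, not part of the notebook):

```python
import pandas as pd

first_day = pd.Timestamp("1981-01-01")
print(first_day.weekday())    # 3
print(first_day.day_name())   # Thursday

# Values 5 and 6 are Saturday and Sunday
is_weekend = first_day.weekday() > 4
print(is_weekend)             # False

saturday = pd.Timestamp("1981-01-03")
print(saturday.weekday() > 4)  # True
```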

### Create an empty DataFrame

In [15]:
data = DataFrame()

### We can create a new column in the dataframe called 'temperature' and fill the values from the series like this

In [18]:
data['temperature'] = [series.iloc[i] for i in range(len(series))]
data

      temperature
0            20.7
1            17.9
2            18.8
3            14.6
4            15.8
...           ...
3645         14.0
3646         13.6
3647         13.5
3648         15.7


### Using a similar approach, add a column named __`month`__ to the dataframe with the month of each index element from the series 


In [20]:
data['month'] = [series.index[i].month for i in range(len(series))]
data

      temperature  month
0            20.7      1
1            17.9      1
2            18.8      1
3            14.6      1
4            15.8      1
...           ...    ...
3645         14.0     12
3646         13.6     12
3647         13.5     12
3648         15.7     12


### Using a similar approach, add a column called __`day`__ to the dataframe with the day of each index element from the series 

In [22]:
data['day'] = [series.index[i].day for i in range(len(series))]
data

      temperature  month  day
0            20.7      1    1
1            17.9      1    2
2            18.8      1    3
3            14.6      1    4
4            15.8      1    5
...           ...    ...  ...
3645         14.0     12   27
3646         13.6     12   28
3647         13.5     12   29
3648         15.7     12   30


### Using a similar approach, add a column called __`weekend`__ to the dataframe with a boolean value indicating whether the day of each index element from the series is a weekend day or not. Make sure to convert the boolean value into an integer

In [26]:
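# This stores booleans, as the output below shows; wrap the comparison in int(...) to store 0/1 instead
data['weekend'] = [series.index[i].weekday() > 4 for i in range(len(series))]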
data

      temperature  month  day  weekend
0            20.7      1    1    False
1            17.9      1    2    False
2            18.8      1    3     True
3            14.6      1    4     True
4            15.8      1    5    False
...           ...    ...  ...      ...
3645         14.0     12   27    False
3646         13.6     12   28    False
3647         13.5     12   29     True
3648         15.7     12   30     True
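
The prompt asks for an integer column, while the list comprehension above leaves `weekend` as booleans. One way to get 0/1 values, and to skip the per-element loops, is to use the vectorized attributes of the `DatetimeIndex`. This is a sketch on a hypothetical five-day stand-in for `series` (the names `idx`, `sample`, and `features` are not part of the notebook):

```python
import pandas as pd

# Hypothetical stand-in for `series`: five days starting 1981-01-01, with the
# first five temperatures from the output above.
idx = pd.date_range("1981-01-01", periods=5, freq="D", name="Date")
sample = pd.Series([20.7, 17.9, 18.8, 14.6, 15.8], index=idx, name="Temp")

features = pd.DataFrame({
    "temperature": sample.to_numpy(),
    "month": sample.index.month,
    "day": sample.index.day,
    # dayofweek runs from 0 (Monday) to 6 (Sunday);
    # astype(int) turns True/False into 1/0
    "weekend": (sample.index.dayofweek > 4).astype(int),
})
print(features)
```

The integer dtypes produced this way may differ from the `int64` columns built by the list comprehensions, depending on the pandas version.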


### Investigate the structure of the DataFrame

In [30]:
data.shape

(3650, 4)

In [28]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3650 entries, 0 to 3649
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   temperature  3650 non-null   float64
 1   month        3650 non-null   int64  
 2   day          3650 non-null   int64  
 3   weekend      3650 non-null   bool   
dtypes: bool(1), float64(1), int64(2)
memory usage: 89.2 KB

### Build expanding-window statistics (min, mean, max) and the next value (`t+1`) for each observation


In [33]:
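from pandas import concat
temps = DataFrame(series.values)
# Expanding window: the statistics at row i cover every observation from row 0 through row i
window = temps.expanding()
# shift(-1) moves the next observation up one row, so 't+1' holds the following day's value
dataframe = concat([window.min(), window.mean(), window.max(), temps.shift(-1)], axis=1)
dataframe.columns = ['min', 'mean', 'max', 't+1']
print(dataframe)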

       min       mean   max   t+1
0     20.7  20.700000  20.7  17.9
1     17.9  19.300000  20.7  18.8
2     17.9  19.133333  20.7  14.6
3     14.6  18.000000  20.7  15.8
4     14.6  17.560000  20.7  15.8
...    ...        ...   ...   ...
3645   0.0  11.174712  26.3  13.6
3646   0.0  11.175377  26.3  13.5
3647   0.0  11.176014  26.3  15.7
3648   0.0  11.177254  26.3  13.0
3649   0.0  11.177753  26.3   NaN

[3650 rows x 4 columns]
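
To make the expanding window concrete, the sketch below repeats the calculation on the first five temperatures only and compares the expanding mean with a running mean computed by hand. The names `values`, `summary`, and `manual_mean` are illustrative and not part of the notebook.

```python
import pandas as pd

# First five temperatures from the series output above
values = [20.7, 17.9, 18.8, 14.6, 15.8]
temps = pd.DataFrame(values)

window = temps.expanding()
summary = pd.concat(
    [window.min(), window.mean(), window.max(), temps.shift(-1)], axis=1
)
summary.columns = ["min", "mean", "max", "t+1"]
print(summary)

# Row i of the expanding mean is the mean of values[0] through values[i]
manual_mean = [sum(values[: i + 1]) / (i + 1) for i in range(len(values))]
print(manual_mean)
```

The last row of `t+1` is `NaN` because there is no following observation to shift into place, which matches the final row of the full output above.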
