Preprocessing a given dataset for time-series forecasting. 

The pandas library provides excellent built-in support for time series data.  
Pandas represents a time series as a Series; a DataFrame is a collection of Series.  
A Series is a one-dimensional array with a date/time label for each row.  
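
As a small illustration of the description above, the sketch below builds a Series by hand with one date label per row. The values and dates are made up for the example and are not taken from the births dataset.

```python
import pandas as pd

# Made-up values; one date label per row
dates = pd.date_range(start="2020-01-01", periods=5, freq="D")
toy_series = pd.Series([10, 12, 9, 14, 11], index=dates, name="value")

print(type(toy_series.index).__name__)   # the index is a DatetimeIndex
print(toy_series.loc["2020-01-03"])      # value recorded on 3 January 2020

# A DataFrame is a collection of Series that share the same index
toy_frame = pd.DataFrame({"value": toy_series, "double": toy_series * 2})
print(toy_frame.shape)
```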

In [None]:
import numpy as np 
import pandas as pd  
import matplotlib.pyplot as plt 
import seaborn as sb 
%matplotlib inline 

In [None]:
fem_birth = pd.read_csv('../Datasets/daily-total-female-births-CA.csv', header=0) 
fem_birth 

In [None]:
fem_birth['date'].dtype 

Loading data with parsed dates - 

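Sometimes pandas cannot parse the date column automatically, and the column is loaded as plain objects (strings).  
In that case, state the format explicitly, either while reading the file:  
df = pd.read_csv('dataset', parse_dates = [<date_column>], date_format = '%Y-%m-%d')  
                                OR       
after loading, by converting the column:  
df[<date_column>] = pd.to_datetime(df[<date_column>], format='%Y-%m-%d %H:%M:%S')  
    
Note: date_format expects a format string and is available from pandas 2.0. A callable such as a lambda belonged to the older date_parser argument, and pd.datetime no longer exists in pandas 2.0.  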
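
The effect of date parsing can be checked on a tiny in-memory CSV. This is a sketch only: the CSV text and its values are made up, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = "date,births\n2020-01-01,10\n2020-01-02,12\n2020-01-03,9\n"

# Without parse_dates the date column is loaded as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

# Option 1: let read_csv parse the column while loading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Option 2: convert after loading, stating the format explicitly
converted = raw.copy()
converted["date"] = pd.to_datetime(converted["date"], format="%Y-%m-%d")

for name, frame in [("raw", raw), ("parsed", parsed), ("converted", converted)]:
    is_dt = pd.api.types.is_datetime64_any_dtype(frame["date"])
    print(f"{name}: dtype={frame['date'].dtype}, datetime={is_dt}")
```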

In [None]:
# date_format takes a format string (pandas >= 2.0), not a callable
fem_birth_data = pd.read_csv('../Datasets/daily-total-female-births-CA.csv', header=0, parse_dates=[0], date_format='%Y-%m-%d')  
fem_birth_data  

In [None]:
fem_birth_data['date'].dtype 

In [None]:
fem_birth_data = pd.read_csv('../Datasets/daily-total-female-births-CA.csv', header=0, parse_dates=[0])     # parse_dates = [date_column]  
fem_birth_data    

In [None]:
fem_birth_data['date'].dtype 

In [None]:
fem_birth_data.shape 

In [None]:
# Strict upper bound so that 1 February is not counted as January
print(f"Births in January : \n{ fem_birth_data[(fem_birth_data['date'] >= '1959-01-01') & (fem_birth_data['date'] < '1959-02-01')] }")  

Loading data as Series - 

In [None]:
series = pd.read_csv('../Datasets/daily-total-female-births-CA.csv', header=0, parse_dates=[0], index_col=0).squeeze("columns")  
# index_col = <column_to_be_the_index>  
series   

In [None]:
series.shape  

In [None]:
print(f"Births in January : \n{series['1959-01']} ") 

Descriptive Statistics - 

In [None]:
fem_birth_data.describe()  

In [None]:
series.describe()  

------------------------------------------------------------------------------------------------------------------------------------------------
# Feature Engineering - 

Feature engineering refers to modifying, deleting, or combining the raw features in our data to create new features. These new features help improve the overall performance of the forecasting model.  
Input variables are also called features in machine learning, and the task before us is to create new input features from our time series dataset.  
The feature types covered here:  
---> Datetime features    ---> Lag features    ---> Window features    ---> Expanding features  

Date-Time Features : Components of the time step itself for each observation (e.g., year, month, day).  

In [None]:
features_fbth = fem_birth_data.copy() 
features_fbth['year'] = fem_birth_data['date'].dt.year  
features_fbth['month'] = fem_birth_data['date'].dt.month  
features_fbth['day'] = fem_birth_data['date'].dt.day  
features_fbth  

Lag Features : Values at prior time steps. shift(n) moves the values down by n rows, so each row holds the value observed n steps earlier; the first n rows become NaN.    

In [None]:
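features_fbth['lag1'] = features_fbth['births'].shift(1)     # value 1 step earlier
features_fbth['lag2'] = features_fbth['births'].shift(365)   # value 365 steps earlier; all NaN if the data has 365 rows or fewer
features_fbth['lag3'] = features_fbth['births'].shift(150)   # value 150 steps earlier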
features_fbth 
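
To make the effect of shift() easier to see, here is a minimal sketch on a five-row frame. The values are made up, and dropping the NaN rows is only one possible way to handle the missing history.

```python
import pandas as pd

toy = pd.DataFrame({"births": [10, 12, 9, 14, 11]})  # made-up values

toy["lag1"] = toy["births"].shift(1)   # value from 1 step earlier
toy["lag2"] = toy["births"].shift(2)   # value from 2 steps earlier
print(toy)

# Rows without enough history contain NaN; one option is to drop them
model_ready = toy.dropna()
print(len(model_ready))
```

The larger the lag, the more leading rows are lost, which is worth keeping in mind when choosing lags for a short series.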

Window Features : Summary statistics of values over a window of prior time steps.  

Rolling Window : Summary of values over a fixed-size window that slides along the series. Note that rolling() includes the current row in each window.  

In [None]:
features_fbth['roll_mean'] = fem_birth_data['births'].rolling(window = 2).mean()  
features_fbth 

In [None]:
features_fbth['roll_max'] = fem_birth_data['births'].rolling(window = 3).max()  
features_fbth 
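
Because rolling() includes the current row, a rolling feature computed this way contains the value being predicted. One common approach, shown here as a sketch on made-up values, is to shift the series by one step before rolling so the window covers prior rows only. The column names are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({"births": [10, 12, 9, 14, 11]})  # made-up values

# Window includes the current row: mean of rows t-1 and t
toy["roll_mean_incl"] = toy["births"].rolling(window=2).mean()

# Window uses prior rows only: shift first, then roll (mean of rows t-2 and t-1)
toy["roll_mean_prior"] = toy["births"].shift(1).rolling(window=2).mean()
print(toy)
```

Which variant is appropriate depends on whether the current observation is known at prediction time.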

Expanding Window : Summary of all data in the series up to and including the current time step.  

In [None]:
features_fbth['expand_max'] = fem_birth_data['births'].expanding().max()  
features_fbth 
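
The growth of an expanding window is easiest to follow on a short series. The sketch below uses made-up values and computes a running maximum and a running mean; the column names are illustrative.

```python
import pandas as pd

births = pd.Series([10, 12, 9, 14, 11])  # made-up values

summary = pd.DataFrame({
    "births": births,
    "expand_max": births.expanding().max(),    # maximum of all values so far
    "expand_mean": births.expanding().mean(),  # mean of all values so far
})
print(summary)
```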

----------------------------------------------------------------------------------------------------------------------------------------------
# Visualization -  
By visualizing the series, we can detect initial patterns, identify its components, and spot potential problems such as outliers, missing values, and unequal spacing.   
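
Missing values and unequal spacing can also be checked numerically alongside the plots. The following is a sketch on made-up daily data in which one value is missing and one day (4 January) is absent; the column names are hypothetical.

```python
import pandas as pd

# Made-up daily data with one missing value and one missing day
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-05"]),
    "value": [10.0, None, 9.0, 14.0],
})

# Number of missing values per column
print(toy.isna().sum())

# Gap between consecutive timestamps; more than one distinct gap means unequal spacing
gaps = toy["date"].diff().dropna()
print(gaps.value_counts())
```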

# Time Plot -  
The most basic and informative plot for visualizing a time series is the Time Plot.  
It is a line chart of the series values (y1, y2, ...) over time (t = 1, 2, ...), with temporal labels (e.g., calendar dates) on the horizontal axis.  
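
For the horizontal axis to carry temporal labels, the dates need to be the index of the plotted Series; otherwise pandas plots against row numbers. A minimal sketch on made-up data, with hypothetical column names and axis labels:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up daily values
toy = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=6, freq="D"),
    "value": [10, 12, 9, 14, 11, 13],
})

# With the dates as the index, the horizontal axis shows calendar dates
ax = toy.set_index("date")["value"].plot()
ax.set_xlabel("Date")
ax.set_ylabel("Value")
```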


In [None]:
# dateparse = lambda x: pd.to_datetime(x, format='%Y-%m-%d')   
# airl_miles_data = pd.read_csv('../Datasets/us-airlines-monthly-aircraft-miles-flown.csv', header=0, parse_dates=[0], date_format=dateparse) 

airl_miles_data = pd.read_csv('../Datasets/us-airlines-monthly-aircraft-miles-flown.csv', header=0, parse_dates=[0]) 
# parse_dates = [dates_column] 
airl_miles_data 

In [None]:
airl_miles_data['MilesMM'].plot() 

In [None]:
dataviz_fbth = fem_birth_data.copy() 
dataviz_fbth 

In [None]:
dataviz_fbth['births'].plot() 

In [None]:
# Ensure the index is a datetime index 
dataviz_fbth.index = pd.to_datetime(dataviz_fbth['date']) 
airl_miles_data.index = pd.to_datetime(airl_miles_data['Month'])  

# Convert datetime index to ordinal numbers
dataviz_fbth['date_ordinal'] = dataviz_fbth.index.map(pd.Timestamp.toordinal) 
airl_miles_data['month_ordinal'] = airl_miles_data.index.map(pd.Timestamp.toordinal)  

# print(dataviz_fbth) 
# print(airl_miles_data)  

In [None]:
dataviz_fbth 

In [None]:
airl_miles_data 

In [None]:
dataviz_fbth['births'].plot() 

In [None]:
airl_miles_data['MilesMM'].plot() 

->> Zooming In -  
    Zooming in on a shorter period within the series can reveal patterns that are hidden when viewing the entire series. This is especially important when the time series is long.  


In [None]:
dataviz_fbth2 = dataviz_fbth[(dataviz_fbth['date'] >= '1959-01-01') & (dataviz_fbth['date'] <= '1959-01-20')].copy() 
dataviz_fbth2 

In [None]:
dataviz_fbth2['births'].plot() 
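
When the dates are the index, the same zoom can be written as a label-based slice instead of a boolean filter. This is a sketch on a made-up 60-day series, not the births data.

```python
import pandas as pd

# Made-up series covering 60 consecutive days
toy = pd.Series(
    range(60),
    index=pd.date_range("2020-01-01", periods=60, freq="D"),
    name="value",
)

# Label-based slicing on a DatetimeIndex includes both end points
zoomed = toy.loc["2020-01-01":"2020-01-20"]
print(zoomed.shape)

# zoomed.plot() would then draw only the selected days
```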