# Linear Regression for Time Series Data

This notebook applies linear regression to time series data.

## Key Differences for Time Series

Compared with standard linear regression, time series data needs two additional preparation steps:

- **Reshaping the data**: Time series data often has to be restructured into the two-dimensional input format (samples by features) that regression models expect.
- **Using time as the index**: Setting the temporal values as the index keeps the observations in chronological order.
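
The two steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual preparation code: the `toy` frame and its values are made up, the `Date` column is a hypothetical helper, and only the column names `Year`, `Month_Num`, and `Sectors_Flown` are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names mirror the dataset, values are invented
toy = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020],
    "Month_Num": [3, 1, 4, 2],
    "Sectors_Flown": [120, 100, 130, 110],
})

# Build a datetime column from year and month (day fixed to 1)
parts = toy[["Year", "Month_Num"]].rename(
    columns={"Year": "year", "Month_Num": "month"}
).assign(day=1)
toy["Date"] = pd.to_datetime(parts)

# Use time as the index and sort so rows are in chronological order
toy = toy.set_index("Date").sort_index()

# Encode time as an integer step (0, 1, 2, ...) and reshape it to 2D:
# regression models expect X with shape (n_samples, n_features)
X = np.arange(len(toy)).reshape(-1, 1)
y = toy["Sectors_Flown"].to_numpy()

print(toy)
print(X.shape, y.shape)
```

The real dataset has many rows per month (one per route and airline), so it would likely need to be filtered or aggregated to one value per month before this step; that choice is left open here.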

## Process

Beyond these steps, the modeling process is the same as for standard linear regression:
1. Feature preparation
2. Model training
3. Prediction
4. Evaluation
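
The four steps can be sketched end to end on a synthetic series. This is an illustrative sketch under stated assumptions: the data is generated rather than loaded, the model is fitted with NumPy's least-squares solver instead of the `statsmodels` or `sklearn` tools imported in this notebook, and the train/test split is chronological (first 80% for training) rather than random.

```python
import numpy as np

# Synthetic series (made up for illustration): linear trend plus noise
rng = np.random.default_rng(0)
n = 48
t = np.arange(n)
y = 50.0 + 2.0 * t + rng.normal(scale=3.0, size=n)

# 1. Feature preparation: an intercept column plus the time step
X = np.column_stack([np.ones(n), t])

# Chronological split: earliest 80% for training, latest 20% held out
split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# 2. Model training: ordinary least squares
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 3. Prediction on the held-out (most recent) observations
y_pred = X_test @ coef

# 4. Evaluation: root mean squared error on the held-out period
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"intercept={coef[0]:.2f}, slope={coef[1]:.2f}, RMSE={rmse:.2f}")
```

A chronological split is used because shuffling would let the model train on observations that come after the ones it is tested on. With `sklearn`, passing `shuffle=False` to `train_test_split` gives the same kind of ordered split.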

In [2]:
# Data handling and numerics
import pandas as pd
import numpy as np

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling
import statsmodels.api as sm
from sklearn.model_selection import train_test_split


In [6]:
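# Load the data and inspect its structure
df = pd.read_csv("otp_time_series_web.csv")

# df.info() prints its summary directly and returns None,
# so wrapping it in print() also emits the "None" line seen in the output
print(df.info())
print("=====" * 5)
print(df.describe())
print("=====" * 5)
print(df.head())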

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108742 entries, 0 to 108741
Data columns (total 14 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Route               108742 non-null  object 
 1   Departing_Port      108742 non-null  object 
 2   Arriving_Port       108742 non-null  object 
 3   Airline             108742 non-null  object 
 4   Month               108742 non-null  object 
 5   Sectors_Scheduled   108742 non-null  float64
 6   Sectors_Flown       108742 non-null  int64  
 7   Cancellations       108426 non-null  float64
 8   Departures_On_Time  108742 non-null  float64
 9   Arrivals_On_Time    108742 non-null  float64
 10  Departures_Delayed  108737 non-null  float64
 11  Arrivals_Delayed    108742 non-null  float64
 12  Year                108742 non-null  int64  
 13  Month_Num           108742 non-null  int64  
dtypes: float64(6), int64(3), object(5)
memory usage: 11.6+ MB
None
       Sectors_Schedu