# Modeling Notebook

Below is the notebook that the data scientist used to build the model. It fits a simple Lasso regression and reports both cross-validation and out-of-sample scores to check that the model predicts well. The metric throughout is R2, the coefficient of determination.

The final deployed model should use `flight_prices_training.csv` as its training data.

### Train Test Split

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("flight_prices_training.csv")
train, test = train_test_split(df, test_size=0.2)
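
The split above is unseeded, so each run produces a different train/test partition and slightly different scores. A minimal sketch of a reproducible split follows. It assumes a small made-up frame in place of `flight_prices_training.csv`, and the seed value `42` is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for flight_prices_training.csv.
toy = pd.DataFrame({"days_left": range(10), "price": range(100, 110)})

# random_state fixes the shuffle; 42 is an arbitrary seed chosen for illustration.
train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))             # 80/20 split of 10 rows
print(train_a.index.equals(train_b.index))   # same seed, same partition
```

Fixing the seed makes the reported scores repeatable, at the cost of tying them to one particular partition.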

### Preprocessing

In [5]:
# Drop the 'flight' column; it is not used as a feature
train = train.drop(columns=['flight'])
test = test.drop(columns=['flight'])

# Numeric columns are kept as-is; categorical columns are one-hot encoded
num_cols = ['days_left', 'duration']
cat_cols = ['airline', 'source_city', 'departure_time', 'stops', 'arrival_time', 'destination_city', 'class']

train = pd.get_dummies(train, prefix=cat_cols, columns=cat_cols)
test = pd.get_dummies(test, prefix=cat_cols, columns=cat_cols)

# Separate the target ('price') from the features
y_train = train['price']
X_train = train.drop(['price'], axis=1)
y_test = test['price']
X_test = test.drop(['price'], axis=1)
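
Because `pd.get_dummies` is applied to the train and test frames separately, the two can end up with different columns if a category is missing from one split. That is unlikely with a dataset this size, but it would break `predict`. A minimal sketch of one possible safeguard follows, using hypothetical toy frames rather than the real data. It reindexes the test frame to the training columns.

```python
import pandas as pd

# Hypothetical toy frames: the test split happens to contain no "SpiceJet" rows.
train_toy = pd.DataFrame({"airline": ["Vistara", "Indigo", "SpiceJet"],
                          "duration": [2.0, 1.5, 3.0]})
test_toy = pd.DataFrame({"airline": ["Vistara", "Indigo"],
                         "duration": [2.5, 1.0]})

train_enc = pd.get_dummies(train_toy, prefix=["airline"], columns=["airline"])
test_enc = pd.get_dummies(test_toy, prefix=["airline"], columns=["airline"])

# Align the test frame to the training columns: dummy columns missing from the
# test split are added as all zeros, and any extra test-only columns are dropped.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_aligned.columns))
```

An alternative is to fit a single encoder on the training data and reuse it for the test data. The reindex approach is shown here only because it stays closest to the notebook's existing `get_dummies` calls.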

### Model Fitting and Cross Validation

In [10]:
from sklearn.model_selection import cross_validate
from sklearn import linear_model

# 5-fold cross-validation on the training split; the default score for a regressor is R2
lasso = linear_model.Lasso(alpha=.1, max_iter=5000)
cv_results = cross_validate(lasso, X_train, y_train, cv=5, return_estimator=True)
print("Cross Val R2 Score: ", cv_results['test_score'].mean())

Cross Val R2 Score:  0.9113137661512379
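
The cell above prints only the mean of the five fold scores. `cross_validate` also returns the individual scores and, with `return_estimator=True`, the model fitted on each fold. The sketch below shows how those could be inspected. It assumes synthetic data from `make_regression` in place of `X_train` and `y_train`, so its numbers say nothing about the flight-price model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

# Synthetic regression data standing in for X_train / y_train (hypothetical).
X_toy, y_toy = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

lasso_toy = Lasso(alpha=0.1, max_iter=5000)
cv_toy = cross_validate(lasso_toy, X_toy, y_toy, cv=5, return_estimator=True)

# One R2 score per fold; the spread shows how stable the metric is across folds.
fold_scores = cv_toy["test_score"]
print("Per-fold R2:", fold_scores)
print("Mean / std:", fold_scores.mean(), fold_scores.std())

# return_estimator=True keeps the fitted model from each fold, so the number of
# features Lasso retained (non-zero coefficients) can be compared across folds.
nonzero_per_fold = [int((est.coef_ != 0).sum()) for est in cv_toy["estimator"]]
print("Non-zero coefficients per fold:", nonzero_per_fold)
```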


### Final Out of Sample Testing

In [11]:
from sklearn.metrics import r2_score

# Refit on the full training split, then score on the held-out test split
lasso = lasso.fit(X_train, y_train)
predicted = lasso.predict(X_test)
print("Out of Sample R2 Score: ", r2_score(y_test, predicted))

Out of Sample R2 Score:  0.9102752391908628
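
For reference, R2 compares the model's squared error with that of a baseline that always predicts the mean of the targets: R2 = 1 - SS_res / SS_tot. The sketch below computes it by hand on made-up numbers. The helper name `r2` and the sample values are illustrative only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # mean-baseline error
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

score = r2(y_true, y_pred)
print(score)  # SS_res = 0.5, SS_tot = 20, so R2 = 0.975
```

A score of 1.0 means perfect predictions and 0.0 means no better than predicting the mean. The notebook's out-of-sample score of about 0.91 falls between the two.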


### Data Inspection

In [3]:
train['airline'].value_counts()

Vistara      81836
Air_India    51823
Indigo       27526
GO_FIRST     14855
AirAsia      10359
SpiceJet      5698
Name: airline, dtype: int64

In [6]:
train

Unnamed: 0,duration,days_left,price,airline_AirAsia,airline_Air_India,airline_GO_FIRST,airline_Indigo,airline_SpiceJet,airline_Vistara,source_city_Bangalore,...,arrival_time_Morning,arrival_time_Night,destination_city_Bangalore,destination_city_Chennai,destination_city_Delhi,destination_city_Hyderabad,destination_city_Kolkata,destination_city_Mumbai,class_Business,class_Economy
100725,17.00,38,49725,0,1,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,0
67336,10.83,21,5584,0,0,1,0,0,0,0,...,0,0,0,0,1,0,0,0,0,1
70034,13.00,20,65832,0,0,0,0,0,1,0,...,1,0,0,1,0,0,0,0,1,0
187219,10.75,27,3654,0,1,0,0,0,0,0,...,0,1,0,0,0,1,0,0,0,1
238615,14.58,28,37985,0,0,0,0,0,1,1,...,0,0,0,0,0,1,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87500,16.75,15,4337,0,0,1,0,0,0,1,...,0,0,0,0,0,1,0,0,0,1
160699,16.67,39,5574,0,0,1,0,0,0,0,...,0,1,0,0,0,1,0,0,0,1
196539,23.25,46,45672,0,0,0,0,0,1,0,...,0,0,0,1,0,0,0,0,1,0
153941,23.92,48,3466,0,0,0,0,1,0,1,...,1,0,0,0,1,0,0,0,0,1
