### AccelerateAI - Model Deployment

We are interested in deploying a model to predict the mileage of cars. <br>
Measurements for 400 cars are available in the file Car_mileage_data.csv. <br>

 1) Train a decision tree and identify the features that impact the mileage of cars. <br>
 2) Deploy the model on GCP and share the link to predict mileage based on important features of the car.

In [None]:
# Import required libraries 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

In [None]:
# Read the data
cars_df = pd.read_csv("Car_mileage_data.csv")
cars_df.sample(5)

In [None]:
# Check for missing values and column data types
cars_df.info()

In [None]:
# hp is incorrectly coded as object - convert to numeric
cars_df['hp'] = pd.to_numeric(cars_df['hp'], errors='coerce')
cars_df.dropna(inplace=True)


# cylinders and origin need to be converted to dummy variables
onehot_car_df = pd.get_dummies(cars_df, columns=["cylinders", "origin"])
onehot_car_df.sample(3)
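The cleaning steps above can be tried on a toy frame first. This is a minimal sketch, not the real data: `toy_df` and its values are made up, and the `"?"` placeholder is only an assumption about what a non-numeric `hp` entry might look like.

```python
import pandas as pd

# Hypothetical toy frame; "?" stands in for a non-numeric hp entry (assumption)
toy_df = pd.DataFrame({
    "mpg": [18.0, 25.0, 30.5],
    "hp": ["130", "?", "88"],
    "cylinders": [8, 4, 4],
    "origin": [1, 1, 3],
})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising
toy_df["hp"] = pd.to_numeric(toy_df["hp"], errors="coerce")
n_bad = int(toy_df["hp"].isna().sum())

# Drop the rows with NaN, then one-hot encode the categorical columns
toy_df = toy_df.dropna()
toy_onehot = pd.get_dummies(toy_df, columns=["cylinders", "origin"])

print("rows with unparseable hp:", n_bad)
print(list(toy_onehot.columns))
```

Counting the NaN values before calling `dropna` shows how many rows the conversion will cost, which is worth checking on the real file as well.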

### Identify the features that impact the mileage of cars

In [None]:
X_vars = onehot_car_df.drop(columns='mpg')
Y_var = onehot_car_df['mpg']

# Train a decision tree regressor
dtree_reg = DecisionTreeRegressor()
dtree_reg.fit(X_vars, Y_var)

# Feature importances, labelled with the column names
feature_imp = pd.Series(dtree_reg.feature_importances_, index=X_vars.columns)
feature_imp.sort_values(ascending=False)

#### The top 4 variables that impact mileage are "displacement", "hp", "acceleration" and "weight". 

In [None]:
X_train = X_vars[["displacement", "hp", "acceleration", "weight"]]
Y_train  = onehot_car_df['mpg']

In [None]:
# Search for the best parameters

params = {'min_samples_split':[2,4,5,10,15,20],
          'min_samples_leaf':[5,10,15,20,30],
          'max_depth':[1,2,3,4,5,6,7,10,15]
         }

dtree_reg_cv = GridSearchCV(DecisionTreeRegressor(), param_grid=params, cv=5)
dtree_reg_cv.fit(X_train, Y_train) 

In [None]:
# Best parameters found by the grid search
dtree_reg_cv.best_params_ 

In [None]:
# Overall model fit: R^2 of the best estimator on the training data
dtree_reg_cv.score(X_train, Y_train)
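Because the score above is computed on the same rows used for fitting, it tends to be optimistic. One possible check, sketched here on synthetic data rather than the real CSV, is to hold out part of the data and compare the training R^2 with the cross-validated and held-out R^2. The names `demo_X`, `demo_y`, `X_fit`, `X_holdout` and the reduced parameter grid are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the car data (assumption: not the real CSV)
rng = np.random.default_rng(0)
n = 300
demo_X = pd.DataFrame({
    "displacement": rng.uniform(70, 450, n),
    "hp": rng.uniform(45, 230, n),
    "acceleration": rng.uniform(8, 25, n),
    "weight": rng.uniform(1600, 5100, n),
})
demo_y = 45 - 0.006 * demo_X["weight"] - 0.03 * demo_X["hp"] + rng.normal(0, 1.5, n)

# Keep 25% of the rows aside; the grid search never sees them
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    demo_X, demo_y, test_size=0.25, random_state=0
)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6], "min_samples_leaf": [5, 10, 20]},
    cv=5,
)
search.fit(X_fit, y_fit)

cv_r2 = search.best_score_                       # mean cross-validated R^2 of the best setting
train_r2 = search.score(X_fit, y_fit)            # R^2 on the rows used for fitting
holdout_r2 = search.score(X_holdout, y_holdout)  # R^2 on unseen rows

print(f"train R^2:   {train_r2:.3f}")
print(f"CV R^2:      {cv_r2:.3f}")
print(f"holdout R^2: {holdout_r2:.3f}")
```

In the notebook itself, `dtree_reg_cv.best_score_` already gives the cross-validated R^2 without any extra split.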

### Save the model as a pickle file
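A minimal sketch of this step using the standard-library `pickle` module. It fits a small stand-in tree on synthetic data so that it runs on its own; the file name `mileage_model.pkl`, the temporary directory, and `demo_model` are hypothetical choices, not part of the original notebook.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Stand-in model fitted on synthetic data (assumption: replaces the tuned tree)
rng = np.random.default_rng(0)
demo_X = rng.uniform(0, 1, size=(50, 4))
demo_y = demo_X @ np.array([1.0, -2.0, 0.5, 3.0])
demo_model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(demo_X, demo_y)

# Hypothetical output location; any writable path works
model_path = Path(tempfile.mkdtemp()) / "mileage_model.pkl"

# Serialize the fitted model to disk
with open(model_path, "wb") as f:
    pickle.dump(demo_model, f)

# Load it back and confirm that the predictions are unchanged
with open(model_path, "rb") as f:
    restored_model = pickle.load(f)

same_predictions = np.allclose(
    demo_model.predict(demo_X), restored_model.predict(demo_X)
)
print("predictions match after reload:", same_predictions)
```

In the notebook, `dtree_reg_cv.best_estimator_` would take the place of `demo_model`. The pickle should be loaded with the same scikit-learn version that wrote it, and only from a trusted source, since unpickling can execute arbitrary code.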