<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---


<p align="center"><h1 align="center">Used Car Sales Price Prediction Competition</h1></p>

##### <p align="center">*Note: Dataset adapted from N. Birla's public dataset ['Vehicle Dataset'](https://www.kaggle.com/nehalbirla/vehicle-dataset-from-cardekho)*</p>
---
Let's share our models to a centralized leaderboard, so that we can collaborate and learn from the model experimentation process...

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data with Sklearn Column Transformer / write and save preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [None]:
#install aimodelshare library
! pip install aimodelshare --upgrade

In [2]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/used_car_competition_data-repository:latest') 


Data downloaded successfully.


In [3]:
# Separate data into X_train, y_train, and X_test
import pandas as pd

X_train = pd.read_csv("/content/used_car_competition_data/training_data.csv")
y_train = X_train['selling_price']
X_train.drop(['selling_price'], axis=1, inplace=True)

X_test=pd.read_csv("/content/used_car_competition_data/test_data.csv")

X_train.head()

Unnamed: 0,year,km_driven,fuel,seller_type,transmission,owner,mileage,engine,max_power,seats
0,2017,27000,Diesel,Individual,Manual,Second Owner,19.67,1582.0,126.2,5.0
1,2015,40000,Petrol,Individual,Manual,First Owner,13.24,1598.0,102.5,5.0
2,2019,8500,Diesel,Dealer,Automatic,First Owner,16.78,1995.0,190.0,5.0
3,2016,50000,Petrol,Individual,Manual,First Owner,22.74,796.0,47.3,5.0
4,2013,190000,Diesel,Individual,Manual,Second Owner,23.4,1248.0,74.0,5.0


## 2. Preprocess data using Sklearn Column Transformer / Write and Save Preprocessor function


In [4]:
# In this case we use Sklearn's Column transformer in our preprocessor function

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

#Preprocess data using sklearn's Column Transformer approach

# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['km_driven', 'mileage', 'engine', 'max_power', 'year']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')), #'imputer' names the step
    ('scaler', StandardScaler())])

categorical_features = ['fuel', 'seller_type', 'transmission', 'owner', 'seats']

# Replace missing values with the most frequent (modal) value, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Final preprocessor object set up with ColumnTransformer...

preprocess = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# fit preprocessor to your data
preprocess = preprocess.fit(X_train)

In [5]:
# Write a function that transforms data with the fitted ColumnTransformer (preprocess)

def preprocessor(data):
    preprocessed_data=preprocess.transform(data)
    return preprocessed_data

In [6]:
# check shape of X data 
preprocessor(X_train).shape

(6502, 27)
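
The 27 columns reported above are wider than the 10 raw feature columns because each one-hot encoded feature expands into one column per category level, while each numeric feature stays a single scaled column. The sketch below illustrates that arithmetic on a hypothetical four-row frame (toy values, not the competition data). For brevity it omits the categorical imputer, and `toy_preprocess` is a name introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows that reuse four of the competition's column names.
toy = pd.DataFrame({
    "year": [2017, 2015, 2019, 2016],
    "km_driven": [27000, 40000, np.nan, 50000],  # one missing value to impute
    "fuel": ["Diesel", "Petrol", "Diesel", "Petrol"],
    "transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

numeric = ["year", "km_driven"]
categorical = ["fuel", "transmission"]

toy_preprocess = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

transformed = toy_preprocess.fit_transform(toy)

# Expected width: one column per numeric feature
# plus one column per distinct level of each categorical feature.
expected_width = len(numeric) + sum(toy[c].nunique() for c in categorical)
print(transformed.shape, expected_width)
```

Applied to the competition data, the same reasoning suggests that the 5 numeric features account for 5 of the 27 columns and the category levels of the 5 categorical features account for the rest.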

## 3. Fit model on preprocessed data and save preprocessor function and model


In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
import keras

feature_count=preprocessor(X_train).shape[1] #count features in input data

model = Sequential()
model.add(Dense(64, input_dim=feature_count, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))

model.add(Dense(1, kernel_initializer='normal')) 
                                            
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

# Fitting the NN to the Training set
model.fit(preprocessor(X_train), y_train, 
               batch_size = 60, 
               epochs = 50, validation_split=0.35)  
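
With `validation_split=0.35`, Keras holds out the last 35% of the rows passed to `fit` and trains on the rest; the held-out rows are taken from the end of the arrays before any shuffling. The sketch below works through the row counts for the 6502 training rows in plain Python. The floor-based rounding is an assumption about how the split index is computed, so treat the exact counts as approximate.

```python
import math

n_rows = 6502            # rows in the preprocessed training data (from the shape check)
validation_split = 0.35  # value passed to model.fit

# Assumed rounding: the training part gets floor(n * (1 - split)) rows,
# and the remaining rows at the end of the array form the validation set.
split_at = int(math.floor(n_rows * (1.0 - validation_split)))
n_train = split_at
n_val = n_rows - split_at

print(n_train, n_val)
```

Because the held-out rows come from the end of the data, a training file that is sorted in some way could give an unrepresentative validation set; shuffling the rows first or passing an explicit `validation_data` tuple avoids that.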

#### Save preprocessor function to local "preprocessor.zip" file

In [8]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [9]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Convert the fitted Keras model to ONNX format

onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [10]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials
    
# Unique REST API URL that powers this Used Car Sales Prediction Playground
apiurl="https://9p8cpepe62.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [11]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.Competition(apiurl)

In [12]:
#Submit Model 1: 

#-- Generate predicted values (a list of predicted car prices) (Model 1)
predicted_values = model.predict(preprocessor(X_test))

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=predicted_values)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 7

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1307


In [13]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

Unnamed: 0,mse,rmse,mae,r2,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,dense_layers,relu_act,loss,optimizer,model_config,memory_size,username,version
1,8133545.26,2851.94,1756.01,0.94,sklearn,False,False,RandomForestRegressor,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,5
2,9480598.18,3079.06,1760.07,0.93,keras,False,True,Sequential,4.0,10177.0,4.0,3.0,str,Adam,"{'name': 'sequential', 'layers...",1300784.0,AIModelShare,7
5,16362526.65,4045.06,2490.88,0.89,sklearn,False,False,RandomForestRegressor,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,1
6,16362526.65,4045.06,2490.88,0.89,sklearn,False,False,RandomForestRegressor,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,3
7,16465909.63,4057.82,2492.87,0.88,sklearn,False,False,GradientBoostingRegressor,,,,,,,"{'alpha': 0.9, 'ccp_alpha': 0....",,AIModelShare,6
8,27925581.59,5284.47,3376.42,0.8,sklearn,False,False,RandomForestRegressor,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,2
9,27925581.59,5284.47,3376.42,0.8,sklearn,False,False,RandomForestRegressor,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,4
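
The leaderboard ranks submissions by `mse`, `rmse`, `mae` and `r2`. How the platform computes these columns is not shown in this notebook, so the sketch below simply uses the standard textbook definitions on hypothetical toy values; `regression_metrics` is a helper name introduced here for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 using the standard definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "r2": 1.0 - ss_res / ss_tot}

# Hypothetical toy values, not competition data.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```

Under these definitions `rmse` is the square root of `mse`, and an `r2` below zero means the model does worse than always predicting the mean of the target.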


## 5. Repeat submission process to improve place on leaderboard


In [None]:
# Create model 2 -- Model with batch normalization
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout,BatchNormalization

feature_count=preprocessor(X_train).shape[1] #count features in input data

model_2 = Sequential()
model_2.add(Dense(64, input_dim=feature_count))
model_2.add(BatchNormalization())
model_2.add(Activation('relu'))
model_2.add(Dense(64))
model_2.add(BatchNormalization())
model_2.add(Activation('relu'))
model_2.add(Dense(64))
model_2.add(BatchNormalization())
model_2.add(Activation('relu'))
model_2.add(Dense(1, kernel_initializer='normal')) 
                                            
# Compile model
model_2.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

# Fitting the NN to the Training set
model_2.fit(preprocessor(X_train), y_train, 
               batch_size = 60, 
               epochs = 60, validation_split=0.35)
                                        

In [15]:
# Save Model 2 to .onnx file

onnx_model = model_to_onnx(model_2, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

# Save model to local .onnx file
with open("model_2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString()) 

In [16]:
# Submit Model 2

#-- Generate predicted y values (Model 2)
prediction_labels = model_2.predict(preprocessor(X_test))

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model_2.onnx",
                                 prediction_submission=prediction_labels,
                                 preprocessor_filepath="preprocessor.zip")

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 8

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1307


In [17]:
# Compare two or more models
data=mycompetition.compare_models([7, 8], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,Model_7_Layer,Model_7_Shape,Model_7_Params,Model_8_Layer,Model_8_Shape,Model_8_Params
0,Dense,"[None, 64]",1792.0,Dense,"[None, 64]",1792
1,Dense,"[None, 64]",4160.0,BatchNormalization,"[None, 64]",256
2,Dense,"[None, 64]",4160.0,Activation,"[None, 64]",0
3,Dense,"[None, 1]",65.0,Dense,"[None, 64]",4160
4,,,,BatchNormalization,"[None, 64]",256
5,,,,Activation,"[None, 64]",0
6,,,,Dense,"[None, 64]",4160
7,,,,BatchNormalization,"[None, 64]",256
8,,,,Activation,"[None, 64]",0
9,,,,Dense,"[None, 1]",65
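
The parameter counts in the comparison table can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit, and a BatchNormalization layer keeps four values per feature (scale, offset, moving mean, moving variance). The sketch below checks this arithmetic; the helper names are introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def batchnorm_params(n_features):
    """Scale, offset, moving mean and moving variance for each feature."""
    return 4 * n_features

n_features = 27  # width of the preprocessed input

# Model version 7: three hidden Dense layers and one output unit.
model_7 = [dense_params(n_features, 64), dense_params(64, 64),
           dense_params(64, 64), dense_params(64, 1)]

# Model version 8: same Dense layers with BatchNormalization in between.
# Activation layers add no parameters, so they are left out.
model_8 = [dense_params(n_features, 64), batchnorm_params(64),
           dense_params(64, 64), batchnorm_params(64),
           dense_params(64, 64), batchnorm_params(64),
           dense_params(64, 1)]

print(model_7, sum(model_7))
print(model_8, sum(model_8))
```

The per-layer values match the `Params` columns above, and the totals are the sums of those columns.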


## Optional: Tune model within range of hyperparameters with Keras Tuner

*Simple example shown below. Consult [documentation](https://keras.io/guides/keras_tuner/getting_started/) to see full functionality.*

In [None]:
! pip install keras_tuner

In [19]:
#Separate validation data 
from sklearn.model_selection import train_test_split
x_train_split, x_val, y_train_split, y_val = train_test_split(
     X_train, y_train, test_size=0.2, random_state=42)
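
As a quick check of what this split produces, the sketch below applies the same `test_size` and `random_state` to a stand-in array of row ids (not the real features) and confirms that the two parts are disjoint and that a fixed `random_state` gives the same split on every call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 6502 training rows: row ids only, no real features.
row_ids = np.arange(6502)

train_ids, val_ids = train_test_split(row_ids, test_size=0.2, random_state=42)
train_ids_again, val_ids_again = train_test_split(row_ids, test_size=0.2, random_state=42)

# No row appears in both parts, and every row appears in exactly one.
overlap = set(train_ids) & set(val_ids)
print(len(train_ids), len(val_ids), len(overlap))
```

Note that `preprocess` was fitted on all of `X_train` earlier, so the validation rows have already influenced the imputation and scaling statistics. Refitting the preprocessor on the training split only would give a stricter validation estimate.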

In [20]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, BatchNormalization
from keras.regularizers import l1, l2, l1_l2
import keras_tuner as kt

feature_count=preprocessor(X_train).shape[1] #count features in input data

#Define model structure & parameter search space with function
def build_model(hp):
    model = keras.Sequential()
    model.add(Dense(64, input_dim=feature_count, activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(64, activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(units=hp.Int("units", min_value=32, max_value=512, step=32), #range 32-512 inclusive, minimum step between tested values is 32
                    activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(1, kernel_initializer='normal')) 
    model.compile(
        loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    return model                                          

#initialize the tuner (which will search through parameters)
tuner = kt.RandomSearch(
    hypermodel=build_model, 
    objective="mean_squared_error", # objective to optimize (training MSE; "val_mean_squared_error" would target the validation data instead)
    max_trials=3, #max number of trials to run during search
    executions_per_trial=3, #higher number reduces variance of results; gauges model performance more accurately
    overwrite=True,
    directory="tuning_model",
    project_name="tuning_units",
)

tuner.search(preprocessor(x_train_split), y_train_split, epochs=2, validation_data=(preprocessor(x_val), y_val))

Trial 3 Complete [00h 00m 09s]
mean_squared_error: 48742800.0

Best mean_squared_error So Far: 48742800.0
Total elapsed time: 00h 00m 29s
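
To put the search budget in perspective, the sketch below enumerates the candidate values that `hp.Int("units", min_value=32, max_value=512, step=32)` can take and counts how many models the settings above train. It is plain Python arithmetic and does not call Keras Tuner.

```python
# Candidate values for hp.Int("units", min_value=32, max_value=512, step=32).
min_value, max_value, step = 32, 512, 32
candidates = list(range(min_value, max_value + 1, step))  # max_value is inclusive

# Settings passed to kt.RandomSearch in the cell above.
max_trials, executions_per_trial = 3, 3
models_trained = max_trials * executions_per_trial

print(len(candidates), candidates[:3], candidates[-1])
print(models_trained)
```

With `max_trials=3`, the random search samples at most 3 of these candidate widths, so raising `max_trials` (and `epochs`) is the natural next step when more compute is available.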


In [21]:
# Build model with best hyperparameters

# Get up to the 5 best hyperparameter sets found by the search (best first).
best_hps = tuner.get_best_hyperparameters(5)
# Build the model with the best hp.
tuned_model = build_model(best_hps[0])
# Fit with the entire dataset.
tuned_model.fit(x=preprocessor(X_train), y=y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f6fc667bb50>

In [22]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(tuned_model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("tuned_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [23]:
#Submit Model 3: 

#-- Generate predicted values
prediction_labels = tuned_model.predict(preprocessor(X_test))

# Submit Model 3 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "tuned_model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 9

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1307


In [24]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

Unnamed: 0,mse,rmse,mae,r2,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,dense_layers,batchnormalization_layers,relu_act,loss,optimizer,model_config,memory_size,username,version
1,8133545.26,2851.94,1756.01,0.94,sklearn,False,False,RandomForestRegressor,,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,5
2,9480598.18,3079.06,1760.07,0.93,keras,False,True,Sequential,4.0,10177.0,4.0,,3.0,str,Adam,"{'name': 'sequential', 'layers...",1300784.0,AIModelShare,7
5,16362526.65,4045.06,2490.88,0.89,sklearn,False,False,RandomForestRegressor,,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,1
6,16362526.65,4045.06,2490.88,0.89,sklearn,False,False,RandomForestRegressor,,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,3
7,17096262.61,4134.76,2241.4,0.88,keras,False,True,Sequential,4.0,24961.0,4.0,,3.0,str,Adam,"{'name': 'sequential_1', 'laye...",1586192.0,AIModelShare,9
8,16465909.63,4057.82,2492.87,0.88,sklearn,False,False,GradientBoostingRegressor,,,,,,,,"{'alpha': 0.9, 'ccp_alpha': 0....",,AIModelShare,6
9,27925581.59,5284.47,3376.42,0.8,sklearn,False,False,RandomForestRegressor,,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,2
10,27925581.59,5284.47,3376.42,0.8,sklearn,False,False,RandomForestRegressor,,,,,,,,"{'bootstrap': True, 'ccp_alpha...",,AIModelShare,4
12,146785488.22,12115.51,7353.46,-0.03,keras,False,True,Sequential,7.0,10945.0,4.0,3.0,3.0,str,Adam,"{'name': 'sequential_1', 'laye...",2919448.0,AIModelShare,8


In [25]:
# Compare two or more models
data=mycompetition.compare_models([7, 8, 9], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,Model_7_Layer,Model_7_Shape,Model_7_Params,Model_8_Layer,Model_8_Shape,Model_8_Params,Model_9_Layer,Model_9_Shape,Model_9_Params
0,Dense,"[None, 64]",1792.0,Dense,"[None, 64]",1792,Dense,"[None, 64]",1792.0
1,Dense,"[None, 64]",4160.0,BatchNormalization,"[None, 64]",256,Dense,"[None, 64]",4160.0
2,Dense,"[None, 64]",4160.0,Activation,"[None, 64]",0,Dense,"[None, 288]",18720.0
3,Dense,"[None, 1]",65.0,Dense,"[None, 64]",4160,Dense,"[None, 1]",289.0
4,,,,BatchNormalization,"[None, 64]",256,,,
5,,,,Activation,"[None, 64]",0,,,
6,,,,Dense,"[None, 64]",4160,,,
7,,,,BatchNormalization,"[None, 64]",256,,,
8,,,,Activation,"[None, 64]",0,,,
9,,,,Dense,"[None, 1]",65,,,
