<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Model Submission Guide: Alloy Balling Competition
Let's share our models to a centralized leaderboard so that we can collaborate and learn from the model experimentation process.

**Instructions:**
1. Load the data and set up X_train, X_test, and y_train
2. Preprocess the data using a preprocessor function
3. Fit a model on the preprocessed data, then save the preprocessor function and the model
4. Generate predictions from the X_test data and submit the model to the competition
5. Repeat the submission process to improve your place on the leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [None]:
#install aimodelshare library
! pip install aimodelshare --upgrade

In [2]:
# Get competition data (loaded to your working directory)
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/balling_competition_data-repository:latest') 

Data downloaded successfully.


In [3]:
# Separate data into X_train, y_train, and X_test
import pandas as pd
X_train=pd.read_csv("balling_competition_data/X_train.csv")
y_train=pd.read_csv("balling_competition_data/y_train.csv").values
X_test=pd.read_csv("balling_competition_data/X_test.csv")
X_train.head()


Unnamed: 0,Solid/Spread,LED [J/m],Dwell Time [s]
0,2.492745,333.333333,0.000167
1,3.69829,268.421053,0.000105
2,3.795608,61.038961,5.2e-05
3,57.568953,242.424242,0.000242
4,3.69829,80.8,4e-05
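
Before writing a preprocessor, it can help to check how many values are missing in each column and how balanced the labels are. The sketch below uses small toy frames in place of `X_train` and `y_train`. The values and the `label` column name are illustrative assumptions, not taken from the competition files.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X_train and y_train (illustrative values, not competition data)
toy_X = pd.DataFrame({
    "Solid/Spread": [2.5, np.nan, 3.8, 57.6],
    "LED [J/m]": [333.3, 268.4, np.nan, 242.4],
    "Dwell Time [s]": [0.000167, 0.000105, 0.000052, 0.000242],
})
# "label" is a hypothetical column name; .values gives a column vector of shape (4, 1)
toy_y = pd.DataFrame({"label": [0, 0, 1, 0]}).values

# Count missing values per feature column
missing_per_column = toy_X.isna().sum().to_dict()

# Flatten the column vector and count how often each class occurs
labels, counts = np.unique(toy_y.ravel(), return_counts=True)
class_counts = {int(label): int(count) for label, count in zip(labels, counts)}

print(missing_per_column)
print(class_counts)
```

The same two checks can be run on the real `X_train` and `y_train` objects loaded above.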


## 2. Preprocess data using function / Write and save preprocessor function


In [4]:
# Write a function that transforms the data with the preprocessor
# In this case we use pandas' fillna in our preprocessor function to replace missing values with (hard-coded) column means

def preprocessor(data):
    preprocessed_data=data.fillna({'Solid/Spread': 26.428271, 'LED [J/m]': 233.049568, 'Dwell Time [s]': 0.000125})

    return preprocessed_data.values #return preprocessed numpy array

In [5]:
# check shape of X data after preprocessing it using our new function
preprocessor(X_train).shape

(1871, 3)
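
The fill values in the preprocessor above are typed in by hand. As a minimal sketch, the same values could instead be computed once from the training frame. The toy data and the name `preprocessor_from_means` are illustrative assumptions. Whether `export_preprocessor` also packages objects defined outside the function is not shown in this guide, so check that before relying on this pattern for a submission.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_train (illustrative values, not competition data)
toy_train = pd.DataFrame({
    "Solid/Spread": [2.0, 4.0, np.nan],
    "LED [J/m]": [300.0, np.nan, 100.0],
    "Dwell Time [s]": [0.0001, 0.0002, 0.0003],
})

# Compute the fill values once, from the training data only (NaNs are skipped by mean())
train_means = toy_train.mean().to_dict()

def preprocessor_from_means(data, fill_values=train_means):
    """Replace missing values with training-set column means and return a NumPy array."""
    return data.fillna(fill_values).values

filled = preprocessor_from_means(toy_train)
print(filled.shape)
```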

## 3. Fit model on preprocessed data and save preprocessor function and model


In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostClassifier
from sklearn import preprocessing
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

# Reload the data so this cell can be re-run (the resampling step below overwrites X_train and y_train)
X_train=pd.read_csv("balling_competition_data/X_train.csv")
y_train=pd.read_csv("balling_competition_data/y_train.csv").values
X_test=pd.read_csv("balling_competition_data/X_test.csv")

# Oversample the minority class with SMOTE, then randomly undersample the majority class (training set only)
over = SMOTE(k_neighbors=5,sampling_strategy=.5,random_state=30)
under = RandomUnderSampler(sampling_strategy=.5,random_state=30)
steps = [('o', over), ('u', under)]
pipeline = Pipeline(steps=steps)
X_train, y_train = pipeline.fit_resample(X_train, y_train)

#Train AdaBoost classifier
model = AdaBoostClassifier(n_estimators=200, random_state=30)
model = model.fit(preprocessor(X_train), y_train)

model.score(preprocessor(X_train), y_train) # Accuracy on the resampled training data, 0-1 scale.

0.8732681336593318
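
The score above is measured on the same data the model was fit on, so it tends to overstate leaderboard performance. A minimal sketch of a held-out check follows. It uses synthetic data from `make_classification` as a stand-in for the preprocessed competition features, and all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed features (not competition data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=30)

# Hold out 25% of the rows, keeping the class ratio the same in both parts
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=30)

demo_model = AdaBoostClassifier(n_estimators=50, random_state=30)
demo_model.fit(X_fit, y_fit)

# Compare the optimistic training score with scores on unseen rows
train_accuracy = demo_model.score(X_fit, y_fit)
val_predictions = demo_model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)
val_macro_f1 = f1_score(y_val, val_predictions, average="macro")

print(train_accuracy, val_accuracy, val_macro_f1)
```

If resampling such as SMOTE is used, applying it only to the fitting portion keeps the held-out rows representative of the real class balance.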

#### Save preprocessor function to local "preprocessor.zip" file

In [7]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [8]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Determine how many preprocessed input features there are
from skl2onnx.common.data_types import FloatTensorType

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  #Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [14]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials

#This is the unique playground id that powers this Playground -- make sure to update the id for different competitions
playground_id="https://ov8kxwsss6.execute-api.us-east-2.amazonaws.com/prod/m"

set_credentials(apiurl=playground_id)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [15]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.Competition(playground_id)

In [48]:
#Submit Model 1: 

#-- Generate predicted values (an array of predicted class labels, 0 or 1) (Model 1)
prediction_labels = model.predict(preprocessor(X_test))

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): adaboost
Provide any useful notes about your model (optional): untuned, adaboost

Your model has been submitted as model version 4

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2712
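
Before submitting, a quick sanity check on the predictions can catch length or label mistakes early. The helper below is hypothetical (it is not part of aimodelshare) and is shown on toy values. With the real objects it would be called with `len(X_test)` and the label set `{0, 1}`.

```python
import numpy as np

def check_predictions(predictions, n_expected, allowed_labels):
    """Raise ValueError if predictions have the wrong length or contain unexpected labels."""
    predictions = np.asarray(predictions).ravel()
    if len(predictions) != n_expected:
        raise ValueError(f"expected {n_expected} predictions, got {len(predictions)}")
    unexpected = set(predictions.tolist()) - set(allowed_labels)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return predictions.tolist()

# Toy predictions standing in for model.predict(preprocessor(X_test))
toy_predictions = np.array([0, 1, 1, 0])
clean = check_predictions(toy_predictions, n_expected=4, allowed_labels={0, 1})
print(clean)
```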


In [49]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,num_params,optimizer,username,version
0,86.86%,75.39%,73.45%,78.10%,sklearn,False,False,AdaBoostClassifier,,,mikedparrott,4
1,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,3.0,liblinear,mikedparrott,1
2,49.04%,44.95%,55.41%,60.84%,sklearn,False,False,LogisticRegression,3.0,lbfgs,mikedparrott,2
3,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,3.0,liblinear,mikedparrott,3
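
The leaderboard rows with 85.90% accuracy but 50.00% recall are worth a closer look. Those numbers are what a model produces when it predicts the majority class for every row and the metrics are macro-averaged over both classes. The sketch below reproduces them in plain Python. The counts (312 test rows, 268 of them class 0) are an assumption chosen because they match the leaderboard percentages. The source does not state the test set size.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Assumed class counts (not stated in the source): 268 negatives, 44 positives
y_true = [0] * 268 + [1] * 44
y_pred = [0] * 312  # a model that always predicts the majority class

metrics = macro_metrics(y_true, y_pred)
print({name: round(value * 100, 2) for name, value in metrics.items()})
```

High accuracy alone can therefore hide a model that never predicts the minority class, which is why the F1, precision, and recall columns are worth reading together.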


## 5. Repeat submission process to improve place on leaderboard


In [68]:
# Train and submit model 2 using same preprocessor (note that you could save a new preprocessor, but we will use the same one for this example).
from sklearn.linear_model import LogisticRegression

model_2 = LogisticRegression(C=.01, penalty='l2')
model_2.fit(preprocessor(X_train), y_train) # Fitting to the training set.
model_2.score(preprocessor(X_train), y_train) # Fit score, 0-1 scale.

0.6666666666666666
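
The three features sit on very different scales (dwell times near 1e-4, LED values in the hundreds), and L2-regularized logistic regression is sensitive to feature scale. One option to try is a new preprocessor that also standardizes each column. The sketch below is an assumption-laden illustration on toy data, not a verified fix for the score above, and `scaling_preprocessor` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X_train and X_test (illustrative values, not competition data)
toy_train = pd.DataFrame({
    "Solid/Spread": [2.0, 4.0, 6.0],
    "LED [J/m]": [100.0, 200.0, 300.0],
    "Dwell Time [s]": [0.0001, 0.0002, 0.0003],
})
toy_test = pd.DataFrame({
    "Solid/Spread": [4.0, np.nan],
    "LED [J/m]": [np.nan, 300.0],
    "Dwell Time [s]": [0.0002, 0.0001],
})

# Statistics come from the training data only, so test rows never leak into them
train_means = toy_train.mean()
train_stds = toy_train.std(ddof=0)

def scaling_preprocessor(data):
    """Fill missing values with training means, then standardize each column."""
    filled = data.fillna(train_means)
    return ((filled - train_means) / train_stds).values

scaled_train = scaling_preprocessor(toy_train)
scaled_test = scaling_preprocessor(toy_test)
print(scaled_test)
```

A missing value filled with the training mean becomes exactly 0 after standardization, so imputed entries land at the center of each feature.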

In [None]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model_2, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit Model 2: 

#-- Generate predicted y values (Model 2)
prediction_labels = model_2.predict(preprocessor(X_test))

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 prediction_submission=prediction_labels,
                                 preprocessor_filepath="preprocessor.zip")

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 4

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1656


In [50]:
# Compare two or more models
data=mycompetition.compare_models([1,2], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,param_name,default_value,model_version_1,model_version_2
0,C,1.000000,10,0.010000
1,class_weight,,,balanced
2,dual,False,False,False
3,fit_intercept,True,True,True
4,intercept_scaling,1,1,1
5,l1_ratio,,,
6,max_iter,100,100,100
7,multi_class,auto,auto,auto
8,n_jobs,,,
9,penalty,l2,l1,l2







In [69]:
# Submit a third model using GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

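# Note: np.arange(100, 300, 500) yields only [100] because the step (500) exceeds the range;
# pass an explicit list such as [100, 300, 500] to search several n_estimators values.
param_grid = {'n_estimators': np.arange(100, 300, 500), 'max_depth': [1, 3, 5]}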

gridmodel = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=10)

# Fit the grid search; it cross-validates every parameter combination and refits the best one
gridmodel.fit(preprocessor(X_train), y_train)

# Extract the best score and parameters from the "best_score_" and "best_params_" attributes
print("best mean cross-validation score: {:.3f}".format(gridmodel.best_score_))
print("best parameters: {}".format(gridmodel.best_params_))


best mean cross-validation score: 0.825
best parameters: {'max_depth': 5, 'n_estimators': 100}
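
The grid above passes `np.arange(100, 300, 500)` for `n_estimators`. Because the third argument of `np.arange` is the step, that call produces a single value, so only `max_depth` is actually searched. The short sketch below shows the difference. The variable names are illustrative.

```python
from itertools import product

import numpy as np

# start=100, stop=300, step=500: the first step already overshoots the stop value
single_value = np.arange(100, 300, 500)

# start=100, stop=600, step=200: three evenly spaced candidates
three_values = np.arange(100, 600, 200)

max_depths = [1, 3, 5]

# Number of parameter combinations a grid search would evaluate in each case
n_candidates_original = len(list(product(single_value, max_depths)))
n_candidates_wider = len(list(product(three_values, max_depths)))

print(single_value.tolist(), three_values.tolist())
print(n_candidates_original, n_candidates_wider)
```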


In [55]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(gridmodel, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("gridmodel.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [56]:
#Submit Model 3: 

#-- Generate predicted values
prediction_labels = gridmodel.predict(preprocessor(X_test))

# Submit to Competition Leaderboard
mycompetition.submit_model(model_filepath = "gridmodel.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): random forest, tuned
Provide any useful notes about your model (optional): random forest, tuned with gridsearchcv

Your model has been submitted as model version 5

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2712


In [64]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,num_params,optimizer,username,version
0,86.86%,75.39%,73.45%,78.10%,sklearn,False,False,AdaBoostClassifier,,,mikedparrott,4
1,86.54%,74.15%,72.69%,76.02%,sklearn,False,False,RandomForestClassifier,,,mikedparrott,5
2,84.62%,70.95%,69.47%,73.00%,sklearn,False,False,GradientBoostingClassifier,,,mikedparrott,6
3,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,3.0,liblinear,mikedparrott,1
4,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,3.0,liblinear,mikedparrott,3
5,49.04%,44.95%,55.41%,60.84%,sklearn,False,False,LogisticRegression,3.0,lbfgs,mikedparrott,2


In [61]:
# Compare two or more models
data=mycompetition.compare_models([1,2,4,5], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,param_name,default_value,model_version_1,model_version_2
0,C,1.000000,10,0.010000
1,class_weight,,,balanced
2,dual,False,False,False
3,fit_intercept,True,True,True
4,intercept_scaling,1,1,1
5,l1_ratio,,,
6,max_iter,100,100,100
7,multi_class,auto,auto,auto
8,n_jobs,,,
9,penalty,l2,l1,l2







Unnamed: 0,param_name,default_value,model_version_4
0,algorithm,SAMME.R,SAMME.R
1,base_estimator,,
2,learning_rate,1.000000,1.000000
3,n_estimators,50,200
4,random_state,,30







Unnamed: 0,param_name,default_value,model_version_5
0,bootstrap,True,True
1,ccp_alpha,0.000000,0.000000
2,class_weight,,
3,criterion,gini,gini
4,max_depth,,5
5,max_features,auto,auto
6,max_leaf_nodes,,
7,max_samples,,
8,min_impurity_decrease,0.000000,0.000000
9,min_impurity_split,,







In [63]:
# Here are several classic scikit-learn classifiers you could experiment with next:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier

#Example code to fit model:
model = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
    max_depth=1, random_state=0).fit(preprocessor(X_train), y_train)
model.score(preprocessor(X_train), y_train)

# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

#-- Generate predicted values (an array of predicted class labels, 0 or 1)
prediction_labels = model.predict(preprocessor(X_test))

# Submit model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)




Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 6

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2712


## Submit a tf.keras deep learning model

In [21]:
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, input_dim=3, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))

model.add(Dense(2, activation='softmax')) 
                                            
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])


y_train_onehot=pd.get_dummies(y_train)
# Fitting the NN to the Training set
model.fit(preprocessor(X_train), y_train_onehot, 
               batch_size = 20, 
               epochs = 3, validation_split=0.25)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f304fdf47d0>

In [27]:
list(y_train_onehot.columns)

[0, 1]
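
The one-hot step turns each class label into its own column, and the column order is what later maps network outputs back to labels. A small sketch on toy labels follows. The labels are illustrative, and `dtype=float` is an optional choice made here so the result is numeric in every pandas version.

```python
import numpy as np
import pandas as pd

# Toy labels shaped like a column vector, as read_csv(...).values would produce
toy_labels = np.array([[1], [0], [1], [0]])

# get_dummies expects 1-D input, so flatten first; columns come out in sorted label order
onehot = pd.get_dummies(toy_labels.ravel(), dtype=float)
class_labels = list(onehot.columns)

print(class_labels)
print(onehot.values)
```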

In [11]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [30]:
#Submit Model: 

#-- Generate predicted y values
#Note: Keras predict returns class probabilities; argmax(axis=1) picks the column index of the most probable class
prediction_column_index=model.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [list(y_train_onehot.columns)[i] for i in prediction_column_index]

# Submit the Keras model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 8

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2712
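
The conversion from softmax outputs to labels used in the submission cell can be seen on a tiny example. The probabilities below are made up for illustration. `class_labels` plays the role of `list(y_train_onehot.columns)`.

```python
import numpy as np

# Made-up softmax outputs for three rows and two classes
toy_probabilities = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],  # a tie: argmax returns the first (lowest) index
])
class_labels = [0, 1]

# Column index of the largest probability in each row
predicted_index = toy_probabilities.argmax(axis=1)

# Map each column index back to its class label
toy_prediction_labels = [class_labels[i] for i in predicted_index]
print(toy_prediction_labels)
```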


In [31]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,dense_layers,relu_act,softmax_act,loss,optimizer,memory_size,username,version
0,86.86%,75.39%,73.45%,78.10%,sklearn,False,False,AdaBoostClassifier,,,,,,,,,mikedparrott,4
1,86.54%,74.15%,72.69%,76.02%,sklearn,False,False,RandomForestClassifier,,,,,,,,,mikedparrott,5
2,84.62%,70.95%,69.47%,73.00%,sklearn,False,False,GradientBoostingClassifier,,,,,,,,,mikedparrott,6
3,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,,3.0,,,,,liblinear,,mikedparrott,1
4,85.90%,46.21%,42.95%,50.00%,sklearn,False,False,LogisticRegression,,3.0,,,,,liblinear,,mikedparrott,3
5,49.04%,44.95%,55.41%,60.84%,sklearn,False,False,LogisticRegression,,3.0,,,,,lbfgs,,mikedparrott,2
6,85.26%,46.02%,42.90%,49.63%,keras,False,True,Sequential,4.0,8706.0,4.0,3.0,1.0,str,SGD,35872.0,mikedparrott,8
7,nan%,nan%,nan%,nan%,keras,False,True,Sequential,4.0,8706.0,4.0,3.0,1.0,str,SGD,35872.0,mikedparrott,7


## What if I just want to submit predictions?  
If you only want to share predictions, or if a model architecture fails to submit, pass `None` for the model and preprocessor file paths and submit the predictions alone.

In [33]:
# Submit predictions only (no model or preprocessor) to Competition Leaderboard
mycompetition.submit_model(model_filepath = None,
                                 preprocessor_filepath=None,
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 9

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2712
