<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Model Submission Guide: CapIQ-Rating Classification Competition
Let's share our models on a centralized leaderboard so that we can collaborate and learn from each other's model experiments.

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data using Sklearn ColumnTransformer / write and save preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [None]:
#install aimodelshare library
! pip install aimodelshare --upgrade

In [2]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/capiq_rating_competition_data-repository:latest') 


Data downloaded successfully.


In [3]:
# Separate data into X_train, y_train, and X_test
import pandas as pd
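# Read the single label column as a Series (read_csv's squeeze=True argument was removed in pandas 2.0)
y_train_labels = pd.read_csv("capiq_rating_competition_data/y_train.csv").squeeze("columns")
y_train = pd.get_dummies(y_train_labels)  # one-hot version of the labels (the sklearn models below use y_train_labels)

X_train = pd.read_csv("capiq_rating_competition_data/X_train.csv")
X_test = pd.read_csv("capiq_rating_competition_data/X_test.csv")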

X_train.head()

| | Name | Symbol | Exchange | Industry | MarketCap | EnterpriseValue | Revenue | GrossProfit | EBITDA | EBIT | ... | CurrentAssets | ShortTermDebt | LTD_Cap_Leases | Leases_LongTerm | LongTermDebt | Liabilities | Liabilities_N_Equity | Debt_Current | Debt_NonCurrent | Debt_Net |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3M Company (NYSE:MMM) | NYSE:MMM | New York Stock Exchange (NYSE) | Industrials | 99704.48 | 109611.58 | 31841.9 | 15447.6 | 8734.6 | 7966.3 | ... | 13272.0 | 353.7 | 1117.3 | 253.1 | 11248.2 | 31269.35 | 37798.6 | 1471.0 | 11501.3 | 9166.3 |
| 1 | AAR Corp. (NYSE:AIR) | NYSE:AIR | New York Stock Exchange (NYSE) | Industrials | 1131.61 | 1412.65 | 1816.2 | 272.93 | 129.26 | 103.655 | ... | 1014.52 | 0.06 | 37.37 | 13.08 | 328.11 | 1303.615 | 1760.14 | 37.43 | 341.19 | 276.71 |
| 2 | Abbott Laboratories (NYSE:ABT) | NYSE:ABT | New York Stock Exchange (NYSE) | Health Care | 112198.8 | 121025.92 | 27682.23 | 15806.85 | 6757.35 | 5505.365 | ... | 20018.3 | 1243.9 | 608.1 | 261.3 | 15047.1 | 46540.7 | 60436.2 | 1852.0 | 15308.4 | 8373.9 |
| 3 | AbbVie Inc. (NYSE:ABBV) | NYSE:ABBV | New York Stock Exchange (NYSE) | Health Care | 135936.95 | 166341.05 | 31122.8 | 23163.6 | 13948.2 | 12604.85 | ... | 22157.2 | 678.8 | 3891.5 | 186.2 | 37593.7 | 70162.0 | 71921.5 | 4570.3 | 37779.9 | 30633.7 |
| 4 | Adecoagro S.A. (NYSE:AGRO) | NYSE:AGRO | New York Stock Exchange (NYSE) | Consumer Staples | 1021.64 | 1760.08 | 817.16 | 260.81 | 255.28 | 193.0 | ... | 623.58 | 7.6627 | 178.79 | 53.7321 | 595.08 | 1513.135 | 1949.59 | 186.449 | 648.81 | 606.12 |
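The loading cell keeps the labels in two forms: a plain Series of rating strings and a one-hot DataFrame. The sketch below illustrates the difference on a tiny in-memory CSV. The column name `rating` and the label values are hypothetical stand-ins, since the actual contents of `y_train.csv` are not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for y_train.csv: one header line, one rating per row.
csv_text = "rating\nAA\nBBB\nA\nBBB\n"

# A one-column DataFrame squeezed along its columns becomes a Series.
labels = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")

# get_dummies creates one indicator column per distinct label.
one_hot = pd.get_dummies(labels)

print(type(labels).__name__)
print(labels.value_counts().to_dict())
print(list(one_hot.columns))
```

Scikit-learn classifiers accept the Series of strings directly, which is why the models in this guide are fit on `y_train_labels`; the one-hot form is mainly useful for frameworks that expect indicator targets.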


## 2. Preprocess data using Sklearn / Write and Save Preprocessor function


In [4]:
# Simple Preprocessor with sklearn 
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

numeric_features = ['MarketCap', 'EnterpriseValue', 'Revenue', 'GrossProfit', 'EBITDA', 
                    'EBIT', 'NetIncome', 'Cash', 'PPnE', 'Assets', 'Debt', 'Equity', 
                    'Receivables', 'Inventory', 'CurrentAssets', 'ShortTermDebt', 
                    'LTD_Cap_Leases', 'Leases_LongTerm', 'LongTermDebt', 'Liabilities', 
                    'Liabilities_N_Equity', 'Debt_Current', 'Debt_NonCurrent', 'Debt_Net']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')), 
    ('scaler', StandardScaler())])

categorical_features = ['Industry']
# Replace missing values with the most frequent value, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Final preprocessor object set up with ColumnTransformer...

preprocess = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# fit preprocessor to your data
preprocess = preprocess.fit(X_train)

In [5]:
# Here is where we actually write the preprocessor function:
def preprocessor(data):
    preprocessed_data=preprocess.transform(data)
    return preprocessed_data

In [6]:
# check shape of X data after preprocessing it using our new function
preprocessor(X_train).shape

(406, 34)
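The 34 columns reported above are consistent with the 24 numeric features plus one indicator column per `Industry` category seen during fitting (which would imply 10 categories, although they are not listed here). The sketch below reproduces that arithmetic on a hypothetical three-row-type toy frame, and also shows what `handle_unknown='ignore'` does with a category that was not present at fit time. All data values are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns (with gaps) and one categorical column.
toy = pd.DataFrame({
    "Revenue": [10.0, 20.0, np.nan, 40.0],
    "EBIT": [1.0, np.nan, 3.0, 4.0],
    "Industry": ["Industrials", "Health Care", "Industrials", "Energy"],
})

toy_preprocess = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                            ("scaler", StandardScaler())]), ["Revenue", "EBIT"]),
    ("cat", Pipeline(steps=[("imputer", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))]), ["Industry"]),
])


def to_dense(matrix):
    """ColumnTransformer may return a sparse matrix; normalize to a dense array."""
    return matrix.toarray() if sparse.issparse(matrix) else np.asarray(matrix)


toy_out = to_dense(toy_preprocess.fit_transform(toy))
n_numeric = 2
n_categories = toy["Industry"].nunique()
print(toy_out.shape, n_numeric + n_categories)

# A category never seen during fit is encoded as all zeros instead of raising an error.
unseen = pd.DataFrame({"Revenue": [15.0], "EBIT": [2.0], "Industry": ["Utilities"]})
unseen_onehot = to_dense(toy_preprocess.transform(unseen))[:, n_numeric:]
print(unseen_onehot)
```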

## 3. Fit model on preprocessed data and save preprocessor function and model


In [7]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=200, max_depth=3)
model.fit(preprocessor(X_train), y_train_labels) # Fit on the training set.
model.score(preprocessor(X_train), y_train_labels) # Accuracy on the training set (0-1 scale); this tends to overstate accuracy on unseen data.

0.4729064039408867
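The score above is measured on the same rows the model was fit on, so it is usually an optimistic estimate. One common way to get a more realistic number before submitting is k-fold cross-validation. The sketch below uses synthetic data from `make_classification` as a stand-in for the competition data; the sample sizes and parameters are assumptions chosen only to make the example run quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed training data (not the competition data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=4, random_state=0
)

demo_model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)

# Accuracy on the rows used for fitting.
train_acc = demo_model.fit(X_demo, y_demo).score(X_demo, y_demo)

# Accuracy on held-out folds: each of the 5 folds is scored by a model
# that was fit on the other 4 folds.
cv_scores = cross_val_score(demo_model, X_demo, y_demo, cv=5)

print(f"training accuracy:        {train_acc:.3f}")
print(f"mean 5-fold CV accuracy:  {cv_scores.mean():.3f}")
```

With the notebook's objects, the equivalent call would be `cross_val_score(model, preprocessor(X_train), y_train_labels, cv=5)`.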

#### Save preprocessor function to local "preprocessor.zip" file

In [8]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [9]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Check how many preprocessed input features there are
from skl2onnx.common.data_types import FloatTensorType
feature_count=preprocessor(X_test).shape[1] #Gets count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Inserts correct number of features in preprocessed data

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [10]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials

# Unique REST API URL that powers this specific Classification Playground; update apiurl for new competition deployments
apiurl='https://3hf4nd1e6b.execute-api.us-east-1.amazonaws.com/prod/m'

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [11]:
#Instantiate Competition
from aimodelshare import ModelPlayground

myplayground = ModelPlayground(playground_url=apiurl)  # same URL as set in the credentials cell
mycompetition = ai.Competition(myplayground.playground_url)

In [12]:
#Submit Model 1: 

#-- Generate predicted values (a list of predicted rating labels) (Model 1)
prediction_labels = model.predict(preprocessor(X_test))

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 1

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1756


In [15]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

| | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | username | version |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24.00% | 18.33% | 19.64% | 23.97% | sklearn | False | False | RandomForestClassifier | ML_Risk_Mgmnt | 1 |
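The leaderboard reports accuracy alongside F1, precision, and recall. How the multi-class precision, recall, and F1 are averaged is not stated here; the sketch below assumes macro averaging (an unweighted mean over classes), which is one common convention. The labels are hypothetical and the function is a plain-Python illustration, not the leaderboard's implementation.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed convention)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    precisions, recalls, f1s = [], [], []
    for label in classes:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1_score": sum(f1s) / n,
    }


# Hypothetical rating labels for six companies.
y_true = ["AA", "AA", "A", "A", "BBB", "BBB"]
y_pred = ["AA", "A", "A", "A", "BBB", "AA"]
scores = macro_scores(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

Under macro averaging every class counts equally, so a model that ignores rare rating classes can have reasonable accuracy and still a low F1.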


## 5. Repeat submission process to improve place on leaderboard


In [16]:
# Train and submit model 2 using the same preprocessor. (You could save a new preprocessor, but this example reuses the first one.)
from sklearn.svm import SVC # "Support vector classifier"
model2 = SVC(kernel='linear', C=100) 
model2.fit(preprocessor(X_train), y_train_labels)
model2.score(preprocessor(X_train), y_train_labels)

0.6650246305418719

In [17]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Check how many preprocessed input features there are
from skl2onnx.common.data_types import FloatTensorType
feature_count=preprocessor(X_test).shape[1] #Gets count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Inserts correct number of features in preprocessed data

onnx_model = model_to_onnx(model2, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,deep_learning=False)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [18]:
#Submit Model 2: 

#-- Generate predicted values (Model 2)
prediction_labels = model2.predict(preprocessor(X_test))

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 2

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1756


In [None]:
# Build a third model using GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

param_grid = {'n_estimators': [150, 300, 500],'max_depth':[1, 3, 5]}

gridmodel = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5)

# GridSearchCV exposes the usual fit, score, and predict methods:
gridmodel.fit(preprocessor(X_train), y_train_labels)

# Extract the best score and parameters from the "best_score_" and "best_params_" attributes
print("best mean cross-validation score: {:.3f}".format(gridmodel.best_score_))
print("best parameters: {}".format(gridmodel.best_params_))
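To get a feel for how much work the grid search above does, the sketch below enumerates the same `param_grid` with `itertools.product`. It only counts candidates and fits; it does not train anything. The extra fit at the end reflects `GridSearchCV`'s default `refit=True`, which retrains the best candidate on the full training set.

```python
from itertools import product

# Same grid and fold count as in the GridSearchCV cell.
param_grid = {"n_estimators": [150, 300, 500], "max_depth": [1, 3, 5]}
cv_folds = 5

# Every combination of one value per parameter is one candidate model.
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

cv_fits = len(combinations) * cv_folds  # each candidate is fit once per fold
total_fits = cv_fits + 1                # plus one final refit on all training rows

print(f"{len(combinations)} candidates, {cv_fits} cross-validation fits, {total_fits} fits in total")
for candidate in combinations[:3]:
    print(candidate)
```

Because the cost grows multiplicatively with each added parameter, it is usually worth keeping early grids small and widening them around the best region.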


In [21]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Gets count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Inserts correct number of preprocessed features

onnx_model = model_to_onnx(gridmodel, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("gridmodel.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [22]:
#Submit Model 3: 

#-- Generate predicted values
prediction_labels = gridmodel.predict(preprocessor(X_test))

# Submit to Competition Leaderboard
mycompetition.submit_model(model_filepath = "gridmodel.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 3

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1756


In [23]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

| | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | num_params | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32.00% | 23.20% | 28.63% | 25.46% | sklearn | False | False | SVC | 4080.0 | ML_Risk_Mgmnt | 2 |
| 1 | 24.00% | 18.33% | 19.64% | 23.97% | sklearn | False | False | RandomForestClassifier | | ML_Risk_Mgmnt | 1 |
| 2 | 18.00% | 13.63% | 13.48% | 15.83% | sklearn | False | False | RandomForestClassifier | | ML_Risk_Mgmnt | 3 |


In [24]:
# Compare two or more models
data=mycompetition.compare_models([1,3], verbose=1)
mycompetition.stylize_compare(data)

| | param_name | default_value | model_version_1 | model_version_3 |
|---|---|---|---|---|
| 0 | bootstrap | True | True | True |
| 1 | ccp_alpha | 0.000000 | 0.000000 | 0.000000 |
| 2 | class_weight | | | |
| 3 | criterion | gini | gini | gini |
| 4 | max_depth | | 3 | 5 |
| 5 | max_features | auto | auto | auto |
| 6 | max_leaf_nodes | | | |
| 7 | max_samples | | | |
| 8 | min_impurity_decrease | 0.000000 | 0.000000 | 0.000000 |
| 9 | min_impurity_split | | | |
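In a comparison like the one above, most rows are identical to the defaults, and the interesting rows are the ones where a version departs from them. The sketch below shows that filtering idea in plain Python. The dictionaries copy a few rows from the table, with empty cells written as `None`; the helper `differing_params` is hypothetical and is not part of aimodelshare.

```python
def differing_params(defaults, versions):
    """Return only the parameters where at least one version differs from the default."""
    diffs = {}
    for name, default in defaults.items():
        values = {label: params.get(name, default) for label, params in versions.items()}
        if any(value != default for value in values.values()):
            diffs[name] = {"default": default, **values}
    return diffs


# A few rows copied from the comparison table (empty cells written as None).
defaults = {"bootstrap": True, "ccp_alpha": 0.0, "criterion": "gini",
            "max_depth": None, "max_features": "auto"}
versions = {
    "model_version_1": {"bootstrap": True, "ccp_alpha": 0.0, "criterion": "gini",
                        "max_depth": 3, "max_features": "auto"},
    "model_version_3": {"bootstrap": True, "ccp_alpha": 0.0, "criterion": "gini",
                        "max_depth": 5, "max_features": "auto"},
}

diffs = differing_params(defaults, versions)
print(diffs)
```

For these rows the only difference is `max_depth`, which matches what the table shows for versions 1 and 3.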







In [26]:
# Several classic ML model types you could experiment with next:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier

#Example code to fit model:
model = SVC(kernel='linear', C=1000).fit(preprocessor(X_train), y_train_labels)
model.score(preprocessor(X_train), y_train_labels)

# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

#-- Generate predicted values (a list of predicted labels)
prediction_labels = model.predict(preprocessor(X_test))

# Submit model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)


Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 4

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1756
