<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/challenges/clock-decomposition/notebook-banner.jpg?inline=false" alt="Drawing" style="height: 400px;"/></p>

# What is the notebook about?

The challenge is to use the features extracted from the Clock Drawing Test to build an automated algorithm that predicts which of three phases each participant is in:

1)    Pre-Alzheimer’s (Early Warning)
2)    Post-Alzheimer’s (Detection)
3)    Normal (Not an Alzheimer’s patient)

In machine learning terms: this is a 3-class classification task.
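To make the 3-class framing concrete, here is a minimal sketch that scores two hypothetical sets of class probabilities with multi-class log loss. The class order is assumed to match the `target_mapping_dict` used later in this notebook; the helper name `multiclass_log_loss` and the example probabilities are illustrative only, and log loss is used here because the notebook imports `log_loss`, not as a statement of the official challenge metric.

```python
import math

# Class order assumed to match target_mapping_dict used later in this notebook.
CLASSES = ["normal", "post_alzheimer", "pre_alzheimer"]

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model is overconfident.
        p = min(max(probs[CLASSES.index(label)], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)

y_true = ["normal", "pre_alzheimer"]
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]
uniform = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

loss_confident = multiclass_log_loss(y_true, confident)
loss_uniform = multiclass_log_loss(y_true, uniform)
print(loss_confident, loss_uniform)
```

A uniform guess over three classes scores ln(3), roughly 1.0986, which is a useful baseline when reading log-loss values.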

# How to use this notebook? 📝

<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/notebook/aicrowd_notebook_submission_flow.png?inline=false" alt="notebook overview" style="width: 650px;"/></p>

- **Update the config parameters**. Define the common variables listed below.

Variable | Description
--- | ---
`AICROWD_DATASET_PATH` | Path to the file containing test data (The data will be available at `/ds_shared_drive/` on aridhia workspace). This should be an absolute path.
`AICROWD_PREDICTIONS_PATH` | Path to write the output to.
`AICROWD_ASSETS_DIR` | If your notebook needs additional files (model weights, etc.), add them to a directory and specify the path to that directory here (please use a relative path). The contents of this directory will be sent to AIcrowd for evaluation.
`AICROWD_API_KEY` | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me

- **Installing packages**. Please use the [Install packages 🗃](#install-packages-) section to install the packages
- **Training your models**. All the code within the [Training phase ⚙️](#training-phase-) section will be skipped during evaluation. **Please make sure to save your model weights in the assets directory and load them in the predictions phase section** 
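The save-then-load flow described in the last bullet could look roughly like the sketch below. It uses a temporary directory and a plain dict standing in for a trained model; the helper names `save_artifact` and `load_artifact` are hypothetical and not part of the AIcrowd tooling.

```python
import os
import pickle
import tempfile

def save_artifact(obj, assets_dir, name):
    """Pickle obj into assets_dir, creating the directory if needed."""
    os.makedirs(assets_dir, exist_ok=True)
    path = os.path.join(assets_dir, name)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

def load_artifact(assets_dir, name):
    """Load a pickled object back from assets_dir."""
    with open(os.path.join(assets_dir, name), "rb") as f:
        return pickle.load(f)

# Demo: a temporary directory plays the role of the assets directory,
# and a dict plays the role of the trained model.
with tempfile.TemporaryDirectory() as tmp_dir:
    assets_dir = os.path.join(tmp_dir, "assets")
    save_artifact({"weights": [0.1, 0.2]}, assets_dir, "model.sav")
    restored = load_artifact(assets_dir, "model.sav")

print(restored)
```

Since the training section is skipped during evaluation, anything the prediction section needs has to travel through files like this.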

# Setup AIcrowd Utilities 🛠

We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.

In [None]:
!pip install -q -U aicrowd-cli

In [None]:
%load_ext aicrowd.magic

# AIcrowd Runtime Configuration 🧷

Define configuration parameters. Please include any files needed for the notebook to run under `ASSETS_DIR`. We will copy the contents of this directory to your final submission file 🙂

The dataset is available under `/ds_shared_drive` on the workspace.
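The configuration cell below reads `DATASET_PATH` and `PREDICTIONS_PATH` from the environment and falls back to workspace defaults, presumably so that the evaluation environment can point the notebook at different files. A minimal sketch of that fallback behaviour, where `resolve_path` is a hypothetical helper and `/tmp/hidden_test.csv` is a made-up value:

```python
import os

def resolve_path(env_name, default):
    """Return the environment override if it is set, else the notebook default."""
    return os.getenv(env_name, default)

# Without an override, the workspace default is used.
os.environ.pop("DATASET_PATH", None)
local_path = resolve_path("DATASET_PATH", "/ds_shared_drive/validation.csv")

# With an override (hypothetical value), the environment wins.
os.environ["DATASET_PATH"] = "/tmp/hidden_test.csv"
eval_path = resolve_path("DATASET_PATH", "/ds_shared_drive/validation.csv")
os.environ.pop("DATASET_PATH", None)

print(local_path, eval_path)
```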

In [None]:
import os

# Please use an absolute path for the location of the dataset.
# Or you can build one from a relative path with `os.path.join(os.getcwd(), "test_data/validation.csv")`
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", "/ds_shared_drive/validation.csv")
AICROWD_PREDICTIONS_PATH = os.getenv("PREDICTIONS_PATH", "predictions.csv")
AICROWD_ASSETS_DIR = "assets"


# Install packages 🗃

Please add all package installations in this section.

In [None]:
!pip install numpy pandas scikit-learn
#!pip install xgboost
!pip install catboost
!pip install ipywidgets



# Define preprocessing code 💻

The code that is common between the training and the prediction sections should be defined here. During evaluation, we completely skip the training section. Please make sure to add any common logic between the training and prediction sections here.
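One way to keep the shared logic in a single place is to wrap the derived features in a function that both phases call. The sketch below is an illustration only: `add_engineered_features` is a hypothetical helper, the toy values are invented, and it covers just the four derived columns that appear later in this notebook.

```python
import numpy as np
import pandas as pd

def add_engineered_features(df):
    """Return a copy of df with the derived columns used in this notebook."""
    out = df.copy()
    out["constr_hour_error"] = np.abs(out["hour_pointing_digit"] - out["actual_hour_digit"])
    out["constr_minute_error"] = np.abs(out["minute_pointing_digit"] - out["actual_minute_digit"])
    out["double_rel"] = out["double_major"] / out["double_minor"]
    # 1 when the rotation angle exceeds 180 degrees, else 0 (NaN compares False, so it maps to 0).
    out["final_rotation_angle_constr"] = (out["final_rotation_angle"] > 180).astype(int)
    return out

# Invented toy rows using the column names from the dataset.
toy = pd.DataFrame({
    "hour_pointing_digit": [11.0, 8.0],
    "actual_hour_digit": [11, 11],
    "minute_pointing_digit": [2.0, 8.0],
    "actual_minute_digit": [2, 2],
    "double_major": [120.0, 110.0],
    "double_minor": [100.0, 88.0],
    "final_rotation_angle": [270.0, 30.0],
})
features = add_engineered_features(toy)
print(features[["constr_hour_error", "constr_minute_error",
                "double_rel", "final_rotation_angle_constr"]])
```

Calling the same function on the training, validation, and test frames avoids the three copies of each transformation drifting apart.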

### Import common packages

Please import packages that are common for training and prediction phases here.

In [None]:
import numpy as np
import pandas as pd
import pickle
import os
import sklearn
from sklearn.metrics import f1_score, log_loss
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score

# Training phase ⚙️

You can define your training code here. This section will be skipped during evaluation.

In [None]:
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 200)

## Load training data

In [None]:
train = pd.read_csv(os.getenv("DATASET_PATH", "/ds_shared_drive/train.csv"))

In [None]:
target_mapping_dict = {'normal' : 0, 'post_alzheimer' : 1, 'pre_alzheimer' : 2}

In [None]:
train['diagnosis'] = train['diagnosis'].map(target_mapping_dict)

In [None]:
train.drop(columns = ['row_id'], inplace = True)

In [None]:
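# Assign the result back instead of calling fillna(inplace=True) on a column selection
train['intersection_pos_rel_centre'] = train['intersection_pos_rel_centre'].fillna('Nan')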

In [None]:
train = pd.concat([train.drop(columns = ['intersection_pos_rel_centre']), pd.get_dummies(train['intersection_pos_rel_centre'], prefix = 'intersection_pos_rel_centre')], axis = 1)
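One thing to watch with `pd.get_dummies` is that it only creates columns for the categories present in the frame it receives, so a split that lacks one value ends up with fewer columns. A minimal sketch of one way to keep the columns aligned, assuming the category set (`BL`, `BR`, `TL`, `TR`, plus the `Nan` placeholder) seen in the dummy columns of the validation output in this notebook; the helper name `encode_position` is hypothetical:

```python
import pandas as pd

# Categories assumed from the dummy columns shown in the validation output.
POSITIONS = ["BL", "BR", "Nan", "TL", "TR"]

def encode_position(df, column="intersection_pos_rel_centre"):
    """One-hot encode `column` with a fixed category list so every split gets the same columns."""
    filled = df[column].fillna("Nan")
    # A categorical with explicit categories makes get_dummies emit unobserved levels too.
    cat = pd.Series(pd.Categorical(filled, categories=POSITIONS), index=df.index)
    dummies = pd.get_dummies(cat, prefix=column)
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)

small = pd.DataFrame({"row_id": ["a", "b"],
                      "intersection_pos_rel_centre": ["TL", None]})
encoded = encode_position(small)
print(encoded.columns.tolist())
```

Even though the toy frame only contains `TL` and a missing value, all five dummy columns are produced.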

In [None]:
X_columns = [x for x in train.columns if x not in ['diagnosis']]

In [None]:
Y_column = 'diagnosis'

In [None]:
# data = pd.concat([pd.concat([train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/20).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3*2)).tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/4).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/4).index]], axis = 0),
#                    train[train['diagnosis'] != 0]], axis = 0).fillna(0)         

In [None]:
# data = pd.concat([pd.concat([train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/15).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3*2)).tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/4).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/5).index]], axis = 0),
#                    train[train['diagnosis'] != 0]], axis = 0).fillna(0)         

In [None]:
# data = pd.concat([pd.concat([train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/8).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(train[train['diagnosis'] == 0].shape[0]/3*2)).tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/7).index],
#                              train.loc[train[train['diagnosis'] == 0].isna().sum(axis = 1).sort_values().tail(int(train[train['diagnosis'] == 0].shape[0]/3)).sample(frac=1/6).index]], axis = 0),
#                    train[train['diagnosis'] != 0]], axis = 0).fillna(0)         

In [None]:
# data = pd.concat([train[(train['diagnosis'] == 0) & (train['number_of_digits'].isin([7, 8, 9, 10, 11, 12, 13])) & (train['double_major']/train['double_minor'] < 1.7)].sample(frac=1/8),
#                   train[train['diagnosis'] != 0]], axis = 0).fillna(0) 

In [None]:
data = train.copy()

In [None]:
#pd.Series(X_columns).to_csv(AICROWD_ASSETS_DIR + '/X_columns.csv', index = False)

In [None]:
# The validation data (we merge in the labels for convenience)
val = pd.read_csv(os.getenv("DATASET_PATH", "/ds_shared_drive/validation.csv"))
val = pd.merge(val, pd.read_csv(os.getenv("DATASET_PATH", "/ds_shared_drive/validation_ground_truth.csv")), 
               how='left', on='row_id')

val['intersection_pos_rel_centre'] = val['intersection_pos_rel_centre'].fillna('Nan')
val = pd.concat([val.drop(columns = ['intersection_pos_rel_centre']), pd.get_dummies(val['intersection_pos_rel_centre'], prefix = 'intersection_pos_rel_centre')], axis = 1)

val['diagnosis'] = val['diagnosis'].map(target_mapping_dict)
val.fillna(0, inplace = True)

print(val.shape)
val.head()

(362, 126)


Unnamed: 0,row_id,number_of_digits,missing_digit_1,missing_digit_2,missing_digit_3,missing_digit_4,missing_digit_5,missing_digit_6,missing_digit_7,missing_digit_8,missing_digit_9,missing_digit_10,missing_digit_11,missing_digit_12,1 dist from cen,10 dist from cen,11 dist from cen,12 dist from cen,2 dist from cen,3 dist from cen,4 dist from cen,5 dist from cen,6 dist from cen,7 dist from cen,8 dist from cen,9 dist from cen,euc_dist_digit_1,euc_dist_digit_2,euc_dist_digit_3,euc_dist_digit_4,euc_dist_digit_5,euc_dist_digit_6,euc_dist_digit_7,euc_dist_digit_8,euc_dist_digit_9,euc_dist_digit_10,euc_dist_digit_11,euc_dist_digit_12,area_digit_1,area_digit_2,area_digit_3,area_digit_4,area_digit_5,area_digit_6,area_digit_7,area_digit_8,area_digit_9,area_digit_10,area_digit_11,area_digit_12,height_digit_1,height_digit_2,height_digit_3,height_digit_4,height_digit_5,height_digit_6,height_digit_7,height_digit_8,height_digit_9,height_digit_10,height_digit_11,height_digit_12,width_digit_1,width_digit_2,width_digit_3,width_digit_4,width_digit_5,width_digit_6,width_digit_7,width_digit_8,width_digit_9,width_digit_10,width_digit_11,width_digit_12,variance_width,variance_height,variance_area,deviation_dist_from_mid_axis,between_axis_digits_angle_sum,between_axis_digits_angle_var,between_digits_angle_cw_sum,between_digits_angle_cw_var,between_digits_angle_ccw_sum,between_digits_angle_ccw_var,sequence_flag_cw,sequence_flag_ccw,number_of_hands,hand_count_dummy,hour_hand_length,minute_hand_length,single_hand_length,clockhand_ratio,clockhand_diff,angle_between_hands,deviation_from_centre,hour_proximity_from_11,minute_proximity_from_2,hour_pointing_digit,actual_hour_digit,minute_pointing_digit,actual_minute_digit,final_rotation_angle,ellipse_circle_ratio,count_defects,percentage_inside_ellipse,pred_tremor,double_major,double_minor,vertical_dist,horizontal_dist,top_area_perc,bottom_area_perc,left_area_perc,right_area_perc,hor_count,vert_count,eleven_ten_error,other_error,time_diff,centre_dot_detect,diagnosis,intersection_pos_rel_centre_BL,intersection_pos_rel_centre_BR,intersection_pos_rel_centre_Nan,intersection_pos_rel_centre_TL,intersection_pos_rel_centre_TR
0,LA9JQ1JZMJ9D2MBZV,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,314.649805,0.0,408.240125,323.34811,321.706776,264.496219,203.330396,205.081082,282.01507,343.657169,416.71603,435.900218,6.119758,25.267069,17.29,6.006505,10.246421,14.43,4.778738,43.124586,46.8,0.0,67.293643,3.9,2001.0,4180.0,6318.0,6528.0,6370.0,8127.0,5610.0,3312.0,9372.0,0.0,3500.0,6336.0,69.0,95.0,117.0,128.0,98.0,129.0,102.0,69.0,142.0,0.0,70.0,72.0,29.0,44.0,54.0,51.0,65.0,63.0,55.0,48.0,66.0,0.0,50.0,88.0,225.618182,730.963636,4773900.0,20.605,360.0,854.199907,0.0,8623.343673,0.0,8623.343673,0.0,0.0,3.0,3.0,0.0,0.0,183.844962,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11,0.0,2,0.0,84.75355,106,1.0,0,118.97178,106.379109,111.720745,112.581495,0.500272,0.499368,0.553194,0.446447,0,0,0,1,0.0,0.0,1,0,0,1,0,0
1,PSSRCWAPTAG72A1NT,6.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,235.663425,0.0,0.0,325.616722,0.0,0.0,288.257264,292.027396,334.951116,370.648756,0.0,0.0,22.88,0.0,0.0,72.8,72.787316,20.133319,96.33,0.0,60.95582,0.0,0.0,0.0,12390.0,0.0,0.0,8848.0,5632.0,10434.0,7739.0,0.0,11834.0,0.0,0.0,0.0,118.0,0.0,0.0,79.0,64.0,94.0,71.0,0.0,97.0,0.0,0.0,0.0,105.0,0.0,0.0,112.0,88.0,111.0,109.0,0.0,122.0,0.0,126.166667,391.766667,6631428.0,64.003333,0.0,5998.258485,0.0,16273.28554,0.0,16273.28554,0.0,0.0,1.0,1.0,0.0,0.0,99.180032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11,0.0,2,180.0,73.359021,99,1.0,0,123.968624,99.208099,104.829045,114.955335,0.572472,0.427196,0.496352,0.503273,0,1,0,1,0.0,0.0,0,0,0,1,0,0
2,GCTODIZJB42VCBZRZ,11.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,438.627689,429.789774,447.455305,447.033835,409.185166,361.946474,359.824957,0.0,345.937133,366.201106,375.225266,427.154831,112.333641,100.3719,86.45,86.234478,0.0,89.57,94.556399,97.331146,111.02,111.411562,116.061975,116.22,3182.0,4473.0,4554.0,5032.0,0.0,5355.0,4148.0,4320.0,4420.0,7290.0,2726.0,5184.0,43.0,71.0,69.0,68.0,0.0,51.0,68.0,48.0,52.0,81.0,47.0,81.0,74.0,63.0,66.0,74.0,0.0,105.0,61.0,90.0,85.0,90.0,58.0,64.0,228.072727,192.618182,1418911.0,100.815,360.0,315.683251,0.0,257.619483,0.0,257.619483,1.0,0.0,2.0,2.0,42.707325,78.437307,0.0,1.836624,35.729983,106.779868,55.597531,6.15111,0.57766,11.0,11,2.0,2,270.0,86.346225,120,1.0,0,124.13467,120.3921,122.90987,121.542463,0.494076,0.505583,0.503047,0.496615,1,0,0,0,0.0,0.0,0,1,0,0,0,0
3,7YMVQGV1CDB1WZFNE,3.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,408.827592,272.472476,0.0,195.714716,0.0,0.0,0.0,0.0,0.0,0.0,2.506574,0.0,4.35366,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.48,0.0,1794.0,0.0,3416.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3360.0,0.0,39.0,0.0,56.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,56.0,0.0,46.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,60.0,70.333333,96.333333,847729.3,12.48,360.0,0.0,360.0,11194.4051,0.0,11194.4051,1.0,0.0,3.0,3.0,0.0,0.0,204.987534,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11,0.0,2,30.0,51.132436,16,0.8,1,69.766987,53.627186,53.983727,69.002438,0.555033,0.444633,0.580023,0.419575,0,1,0,1,0.0,0.0,1,0,0,1,0,0
4,PHEQC6DV3LTFJYIJU,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,436.069089,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,113.252059,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,25542.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,129.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,198.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,77.405367,92.911356,0.0,1.200322,15.505989,100.478258,8.853306,0.0,0.0,8.0,11,8.0,2,30.0,54.115853,18,0.666667,1,112.043734,87.607876,94.088846,101.540792,0.603666,0.395976,0.49499,0.504604,0,0,0,1,150.0,0.0,0,0,0,0,0,1


In [None]:
data['constr_hour_error'] = np.abs(data['hour_pointing_digit'] -  data['actual_hour_digit'])
val['constr_hour_error'] = np.abs(val['hour_pointing_digit'] -  val['actual_hour_digit'])

data['constr_minute_error'] = np.abs(data['minute_pointing_digit'] -  data['actual_minute_digit'])
val['constr_minute_error'] = np.abs(val['minute_pointing_digit'] -  val['actual_minute_digit'])

In [None]:
data['num'] = 12 - data[[x for x in X_columns if 'missing_digit' in x]].sum(axis = 1)
data['num'] = data['number_of_digits'] - data['num']

In [None]:
val['num'] = 12 - val[[x for x in X_columns if 'missing_digit' in x]].sum(axis = 1)
val['num'] = val['number_of_digits'] - val['num']

In [None]:
data['double_rel'] = data['double_major']/data['double_minor']

In [None]:
val['double_rel'] = val['double_major']/val['double_minor']

## Train your model

In [None]:
tmp = [x for x in data.columns]

In [None]:
# tmp.append('constr_minute_error')
# tmp.append('constr_hour_error')

In [None]:
tmp.remove('hour_pointing_digit')
tmp.remove('actual_hour_digit')

In [None]:
tmp.remove('diagnosis')

In [None]:
tmp.remove('minute_pointing_digit')
tmp.remove('actual_minute_digit')

In [None]:
tmp.remove('other_error')

In [None]:
#tmp.append('double_rel')

In [None]:
#tmp.remove('time_diff')

In [None]:
tmp.remove('centre_dot_detect')

In [None]:
tmp.remove('between_digits_angle_ccw_sum')

In [None]:
# 09.05.2021 15:37 - best
tmp.remove('intersection_pos_rel_centre_BR')
tmp.remove('intersection_pos_rel_centre_BL')
tmp.remove('intersection_pos_rel_centre_TR')
tmp.remove('intersection_pos_rel_centre_TL')
tmp.remove('intersection_pos_rel_centre_Nan')

In [None]:
data = pd.concat([data[tmp + [Y_column]], val[tmp + [Y_column]]], axis = 0).sample(frac = 1)

In [None]:
data = pd.read_csv('good_data_2.csv')

In [None]:
for x in [x for x in tmp if 'missing_digit' in x]:
    tmp.remove(x)

In [None]:
tmp.append('final_rotation_angle_constr')

In [None]:
tmp.remove('final_rotation_angle')

In [None]:
data['final_rotation_angle_constr'] = data['final_rotation_angle'].apply(lambda x: 1 if x > 180 else 0 )

In [None]:
# data_dict = dict()

# for i in range(10):
#     data_dict[i] = pd.concat([pd.concat([data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/8).index],
#                              data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(data[data['diagnosis'] == 0].shape[0]/3*2)).tail(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/7).index],
#                              data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().tail(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/7).index]], axis = 0),
#                    data[data['diagnosis'] != 0]], axis = 0).fillna(0)

In [None]:
# # 09.05.2021 15:37 - best
# data = pd.concat([pd.concat([data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/8).index],
#                              data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().head(int(data[data['diagnosis'] == 0].shape[0]/3*2)).tail(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/7).index],
#                              data.loc[data[data['diagnosis'] == 0].isna().sum(axis = 1).sort_values().tail(int(data[data['diagnosis'] == 0].shape[0]/3)).sample(frac=1/7).index]], axis = 0),
#                    data[data['diagnosis'] != 0]], axis = 0).fillna(0)

In [None]:
# 09.05.2021 15:37 - best
# The model has to be defined before the `model.fit` cell further down runs.
#model = CatBoostClassifier(verbose=False, cat_features=['intersection_pos_rel_centre'])
model = CatBoostClassifier(n_estimators = 800, objective = 'MultiClass',  max_depth = 8, 
                           learning_rate = 0.0125, eval_metric = 'MultiClass', verbose = 1, class_weights = {0 : 1, 1 : 1, 2: 1}, random_strength = 1)

In [None]:
# model_dict = dict()

# for x in range(10):
#     temp_df = data_dict[x]
#     print(x)
#     model = CatBoostClassifier(n_estimators = 800, objective = 'MultiClass',  max_depth = 8, 
#                                learning_rate = 0.0125, eval_metric = 'MultiClass', verbose = 1, class_weights = {0 : 1, 1 : 1, 2: 1}, random_strength = 1)
    
#     model.fit(temp_df[tmp], temp_df[Y_column], early_stopping_rounds = 50)
#     model_dict[x] = model

In [None]:
# for x in range(10):
#     print(x)
#     pickle.dump(model_dict[x], open(AICROWD_ASSETS_DIR + f'/model_{x}.sav', 'wb'))

In [None]:
model.fit(data[tmp], data[Y_column], early_stopping_rounds = 50)

0:	learn: 1.0879168	total: 87.9ms	remaining: 1m 10s
1:	learn: 1.0769253	total: 166ms	remaining: 1m 6s
2:	learn: 1.0661091	total: 226ms	remaining: 60s
3:	learn: 1.0550238	total: 288ms	remaining: 57.4s
4:	learn: 1.0451604	total: 372ms	remaining: 59.1s
5:	learn: 1.0351107	total: 436ms	remaining: 57.7s
6:	learn: 1.0257322	total: 501ms	remaining: 56.8s
7:	learn: 1.0170182	total: 572ms	remaining: 56.6s
8:	learn: 1.0087748	total: 637ms	remaining: 56s
9:	learn: 1.0000044	total: 701ms	remaining: 55.4s
10:	learn: 0.9911132	total: 778ms	remaining: 55.8s
11:	learn: 0.9827362	total: 844ms	remaining: 55.5s
12:	learn: 0.9747472	total: 910ms	remaining: 55.1s
13:	learn: 0.9670126	total: 979ms	remaining: 54.9s
14:	learn: 0.9594320	total: 1.06s	remaining: 55.6s
15:	learn: 0.9519765	total: 1.13s	remaining: 55.2s
16:	learn: 0.9448526	total: 1.2s	remaining: 55.1s
17:	learn: 0.9376249	total: 1.29s	remaining: 56.1s
18:	learn: 0.9303155	total: 1.35s	remaining: 55.6s
19:	learn: 0.9235361	total: 1.43s	remaining:

<catboost.core.CatBoostClassifier at 0x7f9fbcdb1d00>

In [None]:
model = CatBoostClassifier(n_estimators = 800, objective = 'MultiClass',  max_depth = 8, 
                           learning_rate = 0.0125, eval_metric = 'MultiClass', verbose = 1, class_weights = {0 : 1, 1 : 1, 2: 1}, random_strength = 1)

In [None]:
# Fit on the full data: cross_val_score only fits clones of the estimator, while the
# feature importances and the pickled model below need a fitted `model`.
model.fit(data[tmp], data[Y_column], early_stopping_rounds = 50)
#model.fit(tdf[tmp], tdf[Y_column], early_stopping_rounds = 100)

In [None]:
cv = cross_val_score(model,  data[tmp],  data[Y_column], cv=4, scoring='neg_log_loss')
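Two details of `cross_val_score` are easy to miss: with `scoring='neg_log_loss'` the returned values are negated (higher is better), and the estimator passed in is cloned for each fold, so it is not fitted afterwards. The sketch below illustrates both on synthetic data, with `LogisticRegression` as a lightweight stand-in for the CatBoost model; the data and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic 3-class data standing in for the clock-drawing features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# LogisticRegression is only a stand-in for the CatBoost model used in this notebook.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=4, scoring="neg_log_loss")

# Flip the sign to recover an ordinary (positive) log loss.
mean_log_loss = -scores.mean()
print("log loss per fold:", -scores)
print("mean log loss:", round(mean_log_loss, 4))

# The original estimator is untouched: only per-fold clones were fitted.
print("clf fitted after cross_val_score:", hasattr(clf, "coef_"))
```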

In [None]:
t = pd.DataFrame()

t['column'] = tmp

t['val'] = model.get_feature_importance()

In [None]:
t.sort_values(by = 'val')

Unnamed: 0,column,val
71,sequence_flag_ccw,0.120593
85,pred_tremor,0.129476
103,final_rotation_angle_constr,0.171378
70,sequence_flag_cw,0.234601
96,eleven_ten_error,0.254047
102,percentage_inside_ellipse,0.274478
67,between_digits_angle_cw_sum,0.28381
97,time_diff,0.339536
101,num,0.379349
95,vert_count,0.41371


In [None]:
# r

# {'Approach': 'No modifications',
#  'Log Loss': 0.553471330926346,
#  'F1': 0.43568303407641046}



In [None]:
# r
# {'Approach': 'No modifications',
#  'Log Loss': 0.5458732238255839,
#  'F1': 0.46820733777255513}



In [None]:
# r
# {'Approach': 'No modifications',
#  'Log Loss': 0.5585044732888239,
#  'F1': 0.46160382176544235}


In [None]:
# r
# {'Approach': 'No modifications',
#  'Log Loss': 0.5640339437338885,
#  'F1': 0.4526098373399501}

In [None]:
# r
# {'Approach': 'No modifications',
#  'Log Loss': 0.5791115579262534,
#  'F1': 0.4519167955991061}

In [None]:
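os.makedirs(AICROWD_ASSETS_DIR, exist_ok = True)  # the assets directory must exist before writing to it
pd.Series(tmp).to_csv(AICROWD_ASSETS_DIR + '/X_columns.csv', index = False)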

## Save your trained model

In [None]:
import pickle

In [None]:
with open(AICROWD_ASSETS_DIR + '/model.sav', 'wb') as f:
    pickle.dump(model, f)

# Prediction phase 🔎

Please make sure to save the weights from the training section in your assets directory and load them in this section.

In [None]:
# model = load_model_from_assets_dir(AIcrowdConfig.ASSETS_DIR)
with open(AICROWD_ASSETS_DIR + '/model.sav', 'rb') as f:
    model = pickle.load(f)

In [None]:
X_columns = pd.read_csv(AICROWD_ASSETS_DIR + '/X_columns.csv')['0'].to_list()
Y_column = 'diagnosis'

## Load test data

In [None]:
test_data = pd.read_csv(AICROWD_DATASET_PATH)

In [None]:
test_data['intersection_pos_rel_centre'] = test_data['intersection_pos_rel_centre'].fillna('Nan')

In [None]:
test_data.fillna(0, inplace = True)

In [None]:
test_data['constr_hour_error'] = np.abs(test_data['hour_pointing_digit'] -  test_data['actual_hour_digit'])
test_data['constr_minute_error'] = np.abs(test_data['minute_pointing_digit'] -  test_data['actual_minute_digit'])

In [None]:
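# The saved X_columns no longer contains the missing_digit_* columns, so select them from the frame itself
test_data['num'] = 12 - test_data[[x for x in test_data.columns if 'missing_digit' in x]].sum(axis = 1)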
test_data['num'] = test_data['number_of_digits'] - test_data['num']

In [None]:
test_data['double_rel'] = test_data['double_major']/test_data['double_minor']

In [None]:
test_data = pd.concat([test_data.drop(columns = ['intersection_pos_rel_centre']), pd.get_dummies(test_data['intersection_pos_rel_centre'], prefix = 'intersection_pos_rel_centre')], axis = 1)

In [None]:
test_data['final_rotation_angle_constr'] = test_data['final_rotation_angle'].apply(lambda x: 1 if x > 180 else 0 )

In [None]:
res = model.predict_proba(test_data[X_columns])

## Generate predictions

In [None]:
predictions = {
    "row_id": test_data["row_id"].values,
    "normal_diagnosis_probability": [x[0] for x in res],
    "post_alzheimer_diagnosis_probability": [x[1] for x in res],
    "pre_alzheimer_diagnosis_probability": [x[2] for x in res],
}

predictions_df = pd.DataFrame.from_dict(predictions)
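Before writing the file, it can help to sanity-check the prediction frame. The sketch below checks a toy frame for the expected columns, row count, value range, and rows that sum to 1. These checks are assumptions about what a well-formed submission looks like, not AIcrowd's official validation, and `check_predictions` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

PROB_COLUMNS = [
    "normal_diagnosis_probability",
    "post_alzheimer_diagnosis_probability",
    "pre_alzheimer_diagnosis_probability",
]

def check_predictions(df, expected_rows):
    """Raise ValueError if df is not a well-formed probability table."""
    missing = [c for c in ["row_id"] + PROB_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if len(df) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(df)}")
    probs = df[PROB_COLUMNS].to_numpy(dtype=float)
    if np.isnan(probs).any() or (probs < 0).any() or (probs > 1).any():
        raise ValueError("probabilities must be finite and within [0, 1]")
    if not np.allclose(probs.sum(axis=1), 1.0, atol=1e-6):
        raise ValueError("each row of probabilities must sum to 1")
    return True

# Invented toy predictions; the row ids are placeholders.
toy_predictions = pd.DataFrame({
    "row_id": ["A", "B"],
    "normal_diagnosis_probability": [0.7, 0.2],
    "post_alzheimer_diagnosis_probability": [0.2, 0.5],
    "pre_alzheimer_diagnosis_probability": [0.1, 0.3],
})
ok = check_predictions(toy_predictions, expected_rows=2)
print(ok)
```

In the notebook itself the call would be along the lines of `check_predictions(predictions_df, len(test_data))`.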

## Save predictions 📨

In [None]:
predictions_df.to_csv(AICROWD_PREDICTIONS_PATH, index=False)

# Submit to AIcrowd 🚀

**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**

In [None]:
%env DATASET_PATH=$AICROWD_DATASET_PATH
%aicrowd notebook submit \
    --assets-dir $AICROWD_ASSETS_DIR \
    --challenge addi-alzheimers-detection-challenge

API Key valid
Saved API Key successfully!
env: DATASET_PATH=/ds_shared_drive/validation.csv
Using notebook: /home/desktop0/python_best.ipynb for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
Collecting notebook...
Validating the submission...
Executing install.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/install.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 14189 bytes to /home/desktop0/submission/install.nbconvert.ipynb
Executing predict.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/predict.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 13787 bytes to /home/desktop0/submission/predict.nbconvert.ipynb
[2K[1;34msubmission.zip[0m [38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━[0m [35m100.0%[0m • [32m35.6/35.6 MB[0m • [31m2.5 MB/s[0m • [36m0:00:00[0m[0m • [36m0: