# Hackerearth Genome And Genetics Challenge
<hr>

<p align="center">
    <img src="https://d2908q01vomqb2.cloudfront.net/cb4e5208b4cd87268b208e49452ed6e89a68e0b8/2021/07/16/HackerEarthFeatureImage.png" width="500" height="600">
</p>

A genetic disorder is a health condition that is usually caused by mutations in DNA or by changes in the number or overall structure of chromosomes. Several commonly known diseases are related to hereditary gene mutations. Genetic testing helps patients make important decisions about the prevention, treatment, or early detection of hereditary disorders.

Studies have shown that, as the population grows, the number of genetic disorders has increased exponentially. Low awareness of the importance of genetic testing contributes to the rising incidence of hereditary disorders. Many children succumb to these disorders, so genetic testing during pregnancy is extremely important.

## Evaluation Metrics

- **Genetic Disorder** <br>
<code>score1 = max(0, 100*metrics.f1_score(actual["Genetic Disorder"], predicted["Genetic Disorder"], average="macro"))</code>

- **Disorder Subclass** <br>
<code>score2 = max(0, 100*metrics.f1_score(actual["Disorder Subclass"], predicted["Disorder Subclass"], average="macro"))</code>

- **Final score** <br>
<code>score = (score1/2)+(score2/2)</code>
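
The sketch below works through the scoring formula on a tiny hand-made example. It assumes a pure-Python `macro_f1` helper (a hypothetical stand-in for `metrics.f1_score(..., average="macro")`, not the challenge's own scoring code) and made-up labels `A`, `B`, `x`, `y`.

```python
def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(actual) | set(predicted))
    f1_values = []
    for label in labels:
        pairs = list(zip(actual, predicted))
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes an F1 of 0
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


def final_score(actual_disorder, pred_disorder, actual_subclass, pred_subclass):
    """Average of the two macro-F1 scores, each scaled to 0-100."""
    score1 = max(0, 100 * macro_f1(actual_disorder, pred_disorder))
    score2 = max(0, 100 * macro_f1(actual_subclass, pred_subclass))
    return score1 / 2 + score2 / 2


# Toy labels (illustrative only): one "Genetic Disorder" mistake, perfect subclasses
score = final_score(
    ["A", "A", "B", "B"], ["A", "B", "B", "B"],
    ["x", "y", "x", "y"], ["x", "y", "x", "y"],
)
print(round(score, 2))
```

Because the average is macro, every class carries the same weight regardless of its size, so rare classes such as `Cancer` influence the score as much as frequent ones.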

## Challenge

**Link** : https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-genetic-testing/machine-learning/predict-the-genetic-disorders-9-76826a5e/

## 1. Environment Setup

In [1]:
# Data manipulation packages
import pandas as pd
import numpy as np

# Visualization packages
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Scikit-learn packages
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, MinMaxScaler
from sklearn.model_selection import cross_val_score

# Machine Learning packages
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Importing SMOTE for oversampling
from imblearn.over_sampling import SMOTE

## 2. Dataset

In [2]:
# Importing training dataset
df = pd.read_csv('data/train.csv')

# Importing test dataset
test_df = pd.read_csv('data/test.csv')

# Replacing -99 with null
df = df.replace('-99', np.nan)
df = df.replace(-99, np.nan)

test_df = test_df.replace('-99', np.nan)
test_df = test_df.replace(-99, np.nan)

# Excluding targets with null values
df = df[df['Genetic Disorder'].notnull()]
df = df[df['Disorder Subclass'].notnull()]

test_id = test_df['Patient Id']
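
As a possible alternative to the `replace` calls above, the `-99` sentinel can be mapped to null while the file is read. The sketch below assumes an in-memory CSV with made-up rows and the hypothetical names `raw` and `clean`; only the column names come from the dataset.

```python
import io

import pandas as pd

# Made-up rows that mimic the -99 sentinel in both numeric and text columns
csv_text = (
    "Patient Id,Patient Age,Genetic Disorder,Disorder Subclass\n"
    "P1,4,Single-gene inheritance diseases,Cystic fibrosis\n"
    "P2,-99,-99,Diabetes\n"
    "P3,7,Mitochondrial genetic inheritance disorders,-99\n"
)

# na_values turns the sentinel into NaN at parse time
raw = pd.read_csv(io.StringIO(csv_text), na_values=["-99"])

# Keep only rows where both targets are present
clean = raw.dropna(subset=["Genetic Disorder", "Disorder Subclass"])

print(raw.isna().sum())
print(len(clean))
```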

In [3]:
df.head()

Unnamed: 0,Patient Id,Patient Age,Genes in mother's side,Inherited from father,Maternal gene,Paternal gene,Blood cell count (mcL),Patient First Name,Family Name,Father's name,...,Birth defects,White Blood cell count (thousand per microliter),Blood test result,Symptom 1,Symptom 2,Symptom 3,Symptom 4,Symptom 5,Genetic Disorder,Disorder Subclass
0,PID0x6418,2.0,Yes,No,Yes,No,4.760603,Richard,,Larre,...,,9.857562,,1.0,1.0,1.0,1.0,1.0,Mitochondrial genetic inheritance disorders,Leber's hereditary optic neuropathy
2,PID0x4a82,6.0,Yes,No,No,No,4.893297,Kimberly,,Nashon,...,Singular,,normal,0.0,1.0,1.0,1.0,1.0,Multifactorial genetic inheritance disorders,Diabetes
3,PID0x4ac8,12.0,Yes,No,Yes,No,4.70528,Jeffery,Hoelscher,Aayaan,...,Singular,7.919321,inconclusive,0.0,0.0,1.0,0.0,0.0,Mitochondrial genetic inheritance disorders,Leigh syndrome
4,PID0x1bf7,11.0,Yes,No,,Yes,4.720703,Johanna,Stutzman,Suave,...,Multiple,4.09821,,0.0,0.0,0.0,0.0,,Multifactorial genetic inheritance disorders,Cancer
5,PID0x44fe,14.0,Yes,No,Yes,No,5.103188,Richard,,Coleston,...,Multiple,10.27223,normal,1.0,0.0,0.0,1.0,0.0,Single-gene inheritance diseases,Cystic fibrosis


In [4]:
test_df.head()

Unnamed: 0,Patient Id,Patient Age,Genes in mother's side,Inherited from father,Maternal gene,Paternal gene,Blood cell count (mcL),Patient First Name,Family Name,Father's name,...,History of anomalies in previous pregnancies,No. of previous abortion,Birth defects,White Blood cell count (thousand per microliter),Blood test result,Symptom 1,Symptom 2,Symptom 3,Symptom 4,Symptom 5
0,PID0x4175,6,No,Yes,No,No,4.981655,Charles,,Kore,...,,2.0,Multiple,,slightly abnormal,True,True,True,True,True
1,PID0x21f5,10,Yes,No,,Yes,5.11889,Catherine,,Homero,...,Yes,,Multiple,8.179584,normal,False,False,False,True,False
2,PID0x49b8,5,No,,No,No,4.876204,James,,Danield,...,No,0.0,Singular,,slightly abnormal,False,False,True,True,False
3,PID0x2d97,13,No,Yes,Yes,No,4.687767,Brian,,Orville,...,Yes,,Singular,6.884071,normal,True,False,True,False,True
4,PID0x58da,5,No,,,Yes,5.152362,Gary,,Issiah,...,No,,Multiple,6.195178,normal,True,True,True,True,False


## 3. Exploratory Data Analysis

In [5]:
# Checking shape of training and test dataset
print("Training dataset shape: ", df.shape)
print("Test dataset shape: ", test_df.shape)

Training dataset shape:  (18047, 45)
Test dataset shape:  (9465, 43)


In [6]:
# Checking for null values in training dataset
df.isnull().sum()

Patient Id                                             0
Patient Age                                         1060
Genes in mother's side                                 0
Inherited from father                                220
Maternal gene                                       2071
Paternal gene                                          0
Blood cell count (mcL)                                 0
Patient First Name                                     0
Family Name                                         7176
Father's name                                          0
Mother's age                                        4457
Father's age                                        4418
Institute Name                                      3783
Location of Institute                                  0
Status                                                 0
Respiratory Rate (breaths/min)                      1570
Heart Rate (rates/min                               1528
Test 1                         

In [7]:
# Checking for null values in test dataset
test_df.isnull().sum()

Patient Id                                             0
Patient Age                                            0
Genes in mother's side                                 0
Inherited from father                                551
Maternal gene                                       3723
Paternal gene                                          0
Blood cell count (mcL)                                 0
Patient First Name                                     0
Family Name                                         9317
Father's name                                          0
Mother's age                                           0
Father's age                                           0
Institute Name                                      5004
Location of Institute                                  0
Status                                                 0
Respiratory Rate (breaths/min)                      4991
Heart Rate (rates/min                               4974
Test 1                         

In [8]:
# Checking training dataset dtypes
df.dtypes

Patient Id                                           object
Patient Age                                         float64
Genes in mother's side                               object
Inherited from father                                object
Maternal gene                                        object
Paternal gene                                        object
Blood cell count (mcL)                              float64
Patient First Name                                   object
Family Name                                          object
Father's name                                        object
Mother's age                                        float64
Father's age                                        float64
Institute Name                                       object
Location of Institute                                object
Status                                               object
Respiratory Rate (breaths/min)                       object
Heart Rate (rates/min                   

In [9]:
# Checking test dataset dtypes
test_df.dtypes

Patient Id                                           object
Patient Age                                           int64
Genes in mother's side                               object
Inherited from father                                object
Maternal gene                                        object
Paternal gene                                        object
Blood cell count (mcL)                              float64
Patient First Name                                   object
Family Name                                          object
Father's name                                        object
Mother's age                                          int64
Father's age                                          int64
Institute Name                                       object
Location of Institute                                object
Status                                               object
Respiratory Rate (breaths/min)                       object
Heart Rate (rates/min                   

In [10]:
df['Genes in mother\'s side'].value_counts()

Yes    10743
No      7304
Name: Genes in mother's side, dtype: int64

In [11]:
df['Inherited from father'].value_counts()

No     10773
Yes     7054
Name: Inherited from father, dtype: int64

In [12]:
df['Maternal gene'].value_counts()

Yes    8803
No     7173
Name: Maternal gene, dtype: int64

In [13]:
df['Paternal gene'].value_counts()

No     10239
Yes     7808
Name: Paternal gene, dtype: int64

In [14]:
df['Status'].value_counts()

Alive       9061
Deceased    8986
Name: Status, dtype: int64

In [15]:
df['Respiratory Rate (breaths/min)'].value_counts()

Normal (30-60)    8281
Tachypnea         8196
Name: Respiratory Rate (breaths/min), dtype: int64

In [16]:
df['Heart Rate (rates/min'].value_counts()

Normal         8396
Tachycardia    8123
Name: Heart Rate (rates/min, dtype: int64

In [17]:
df['Test 1'].value_counts()

0.0    16474
Name: Test 1, dtype: int64

In [18]:
df['Test 2'].value_counts()

0.0    16459
Name: Test 2, dtype: int64

In [19]:
df['Test 3'].value_counts()

0.0    16478
Name: Test 3, dtype: int64

In [20]:
df['Test 4'].value_counts()

1.0    16473
Name: Test 4, dtype: int64

In [21]:
df['Test 5'].value_counts()

0.0    16451
Name: Test 5, dtype: int64

In [22]:
df['Parental consent'].value_counts()

Yes    16468
Name: Parental consent, dtype: int64

In [23]:
df['Follow-up'].value_counts()

Low     8322
High    8150
Name: Follow-up, dtype: int64

In [24]:
df['Gender'].value_counts()

Male         5519
Ambiguous    5509
Female       5446
Name: Gender, dtype: int64

In [25]:
df['Birth asphyxia'].value_counts()

Yes              4248
Not available    4120
No record        4112
No               4015
Name: Birth asphyxia, dtype: int64

In [26]:
df['Autopsy shows birth defect (if applicable)'].value_counts()

Not applicable    9061
None              2805
Yes               2781
No                2643
Name: Autopsy shows birth defect (if applicable), dtype: int64

In [27]:
df['Place of birth'].value_counts()

Institute    8323
Home         8133
Name: Place of birth, dtype: int64

In [28]:
df['Folic acid details (peri-conceptional)'].value_counts()

Yes    8336
No     8147
Name: Folic acid details (peri-conceptional), dtype: int64

In [29]:
df['H/O serious maternal illness'].value_counts()

No     8292
Yes    8203
Name: H/O serious maternal illness, dtype: int64

In [30]:
df['H/O radiation exposure (x-ray)'].value_counts()

Not applicable    4156
No                4143
Yes               4130
-                 4034
Name: H/O radiation exposure (x-ray), dtype: int64

In [31]:
df['H/O substance abuse'].value_counts()

No                4170
-                 4130
Yes               4125
Not applicable    3990
Name: H/O substance abuse, dtype: int64

In [32]:
df['Assisted conception IVF/ART'].value_counts()

Yes    8274
No     8183
Name: Assisted conception IVF/ART, dtype: int64

In [33]:
df['History of anomalies in previous pregnancies'].value_counts()

Yes    8285
No     8148
Name: History of anomalies in previous pregnancies, dtype: int64

In [34]:
df['No. of previous abortion'].value_counts()

2.0    3396
1.0    3282
4.0    3281
0.0    3277
3.0    3265
Name: No. of previous abortion, dtype: int64

In [35]:
df['Birth defects'].value_counts()

Multiple    8242
Singular    8240
Name: Birth defects, dtype: int64

In [36]:
df['Blood test result'].value_counts()

slightly abnormal    4257
inconclusive         4109
normal               4091
abnormal             4026
Name: Blood test result, dtype: int64

In [37]:
df['Symptom 1'].value_counts()

1.0    9748
0.0    6721
Name: Symptom 1, dtype: int64

In [38]:
df['Symptom 2'].value_counts()

1.0    9055
0.0    7346
Name: Symptom 2, dtype: int64

In [39]:
df['Symptom 3'].value_counts()

1.0    8882
0.0    7635
Name: Symptom 3, dtype: int64

In [40]:
df['Symptom 4'].value_counts()

0.0    8257
1.0    8224
Name: Symptom 4, dtype: int64

In [41]:
df['Symptom 5'].value_counts()

0.0    8803
1.0    7631
Name: Symptom 5, dtype: int64

In [42]:
df['Genetic Disorder'].value_counts()

Mitochondrial genetic inheritance disorders     9241
Single-gene inheritance diseases                6929
Multifactorial genetic inheritance disorders    1877
Name: Genetic Disorder, dtype: int64

In [43]:
df['Disorder Subclass'].value_counts()

Leigh syndrome                         4683
Mitochondrial myopathy                 3971
Cystic fibrosis                        3145
Tay-Sachs                              2556
Diabetes                               1653
Hemochromatosis                        1228
Leber's hereditary optic neuropathy     587
Alzheimer's                             133
Cancer                                   91
Name: Disorder Subclass, dtype: int64

In [44]:
df.describe()

Unnamed: 0,Patient Age,Blood cell count (mcL),Mother's age,Father's age,Test 1,Test 2,Test 3,Test 4,Test 5,No. of previous abortion,White Blood cell count (thousand per microliter),Symptom 1,Symptom 2,Symptom 3,Symptom 4,Symptom 5
count,16987.0,18047.0,13590.0,13629.0,16474.0,16459.0,16478.0,16473.0,16451.0,16501.0,16440.0,16469.0,16401.0,16517.0,16481.0,16434.0
mean,6.948784,4.899198,34.576453,41.972559,0.0,0.0,0.0,1.0,0.0,1.999455,7.47574,0.5919,0.5521,0.537749,0.498999,0.464342
std,4.314395,0.199061,9.823005,13.064441,0.0,0.0,0.0,0.0,0.0,1.40947,2.65112,0.491497,0.497293,0.498588,0.500014,0.498742
min,0.0,4.14623,18.0,20.0,0.0,0.0,0.0,1.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,4.764199,26.0,30.0,0.0,0.0,0.0,1.0,0.0,1.0,5.422143,0.0,0.0,0.0,0.0,0.0
50%,7.0,4.900306,35.0,42.0,0.0,0.0,0.0,1.0,0.0,2.0,7.470549,1.0,1.0,1.0,0.0,0.0
75%,11.0,5.033654,43.0,53.0,0.0,0.0,0.0,1.0,0.0,3.0,9.51747,1.0,1.0,1.0,1.0,1.0
max,14.0,5.609829,51.0,64.0,0.0,0.0,0.0,1.0,0.0,4.0,12.0,1.0,1.0,1.0,1.0,1.0


In [45]:
test_df.describe()

Unnamed: 0,Patient Age,Blood cell count (mcL),Mother's age,Father's age,Test 1,Test 2,Test 3,Test 4,Test 5,No. of previous abortion,White Blood cell count (thousand per microliter)
count,9465.0,9465.0,9465.0,9465.0,7345.0,7384.0,7366.0,7383.0,7374.0,7369.0,7363.0
mean,7.041838,4.900207,34.575489,41.830745,0.0,0.0,0.0,1.0,0.0,2.017099,7.494913
std,4.337995,0.199159,9.83487,13.040945,0.0,0.0,0.0,0.0,0.0,1.408629,2.657389
min,0.0,4.120469,18.0,20.0,0.0,0.0,0.0,1.0,0.0,0.0,3.0
25%,3.0,4.765843,26.0,30.0,0.0,0.0,0.0,1.0,0.0,1.0,5.446564
50%,7.0,4.89895,35.0,42.0,0.0,0.0,0.0,1.0,0.0,2.0,7.435213
75%,11.0,5.033164,43.0,53.0,0.0,0.0,0.0,1.0,0.0,3.0,9.550831
max,14.0,5.676184,51.0,64.0,0.0,0.0,0.0,1.0,0.0,4.0,12.0


## 4. Data Preprocessing

In [46]:
X = df.drop(['Genetic Disorder', 'Disorder Subclass'], axis=1)
y1 = df['Genetic Disorder']
y2 = df['Disorder Subclass']

In [47]:
def _drop_cols(data):
    '''
    Dropping unnecessary columns
    '''
    drop_cols = ['Patient Id', 'Patient First Name', 'Family Name', "Father's name", 'Institute Name',
             'Location of Institute', 'Test 1', 'Test 2', 'Test 3', 'Test 4', 'Test 5', 'Parental consent']
    data = data.drop(drop_cols, axis=1)
    return data

In [48]:
X = _drop_cols(X)
test_df = _drop_cols(test_df)

In [49]:
def _data_cleaning(data):
    '''
    Cleaning the dataset
    '''
    data['Birth asphyxia'] = data['Birth asphyxia'].replace('No record', 'Not available')
    data['Autopsy shows birth defect (if applicable)'] = data['Autopsy shows birth defect (if applicable)'].replace('None', 'Not applicable')
    data['H/O radiation exposure (x-ray)'] = data['H/O radiation exposure (x-ray)'].replace('-', 'Not applicable')
    data['H/O substance abuse'] = data['H/O substance abuse'].replace('-', 'Not applicable')
    return data

In [50]:
X = _data_cleaning(X)
test_df = _data_cleaning(test_df)

In [51]:
def _data_imputing(data):
    '''
    Imputing missing values
    '''
    # categorical columns
    mode_cols = []
    
    # numerical columns
    median_cols = []
    
    for col in data.columns:
        if len(data[col].unique()) < 6:
            mode_cols.append(col)
        else:
            median_cols.append(col)
            
    # Filling null values
    for lab in mode_cols:
        data[lab] = data[lab].fillna(data[lab].mode()[0])
    
    for lab in median_cols:
        data[lab] = data[lab].fillna(data[lab].median())
    return data
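
Note that `_data_imputing` derives the fill values from whichever frame it receives, so the test set is imputed with its own modes and medians. A variant that learns the fill values on the training frame and reuses them on the test frame might look like the sketch below. The helper `fit_fill_values` and the toy frames are hypothetical, and the column-type rule (non-numeric or fewer than 6 distinct non-null values gets the mode) is an assumption that only approximates the rule above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fit_fill_values(train, max_categories=6):
    """Learn one fill value per column from the training frame only."""
    fill = {}
    for col in train.columns:
        few_values = train[col].nunique(dropna=True) < max_categories
        if few_values or not is_numeric_dtype(train[col]):
            fill[col] = train[col].mode(dropna=True)[0]  # most frequent value
        else:
            fill[col] = train[col].median()
    return fill


# Toy frames (illustrative only)
train = pd.DataFrame({
    "age": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, None],
    "status": ["Alive", "Alive", "Deceased", None, "Alive", "Deceased", "Alive"],
})
test = pd.DataFrame({"age": [None, 10.0], "status": [None, "Deceased"]})

fill_values = fit_fill_values(train)
train_filled = train.fillna(value=fill_values)
test_filled = test.fillna(value=fill_values)  # reuse training statistics
print(test_filled)
```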

In [52]:
X = _data_imputing(X)
test_df = _data_imputing(test_df)

In [53]:
# Categorical columns
mode_cols = []

# Numerical columns
median_cols = []

for col in X.columns:
    if len(X[col].unique()) < 6:
        mode_cols.append(col)
    else:
        median_cols.append(col)

In [54]:
ord_encoder = OrdinalEncoder()
minmax_scaler = MinMaxScaler()

# Encoding the categorical columns
for cat in mode_cols:
    label_fit = ord_encoder.fit(np.array(X[cat]).reshape(-1, 1))
    X[cat] = label_fit.transform(np.array(X[cat]).reshape(-1, 1))
    test_df[cat] = label_fit.transform(np.array(test_df[cat]).reshape(-1, 1))

# Normalizing numerical columns
scaler_fit = minmax_scaler.fit(X[median_cols])
X[median_cols] = scaler_fit.transform(X[median_cols])
test_df[median_cols] = scaler_fit.transform(test_df[median_cols])
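
The per-column loop above refits the same `OrdinalEncoder` for every column, and `transform` raises an error if the test set contains a category that was absent from training. A sketch of one way to encode all categorical columns in a single call and tolerate unseen categories follows. The toy frames and the `-1` placeholder are assumptions, and `handle_unknown="use_encoded_value"` requires scikit-learn 0.24 or later.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frames (illustrative only); "Unknown" never appears in training
train = pd.DataFrame({
    "Status": ["Alive", "Deceased", "Alive"],
    "Birth defects": ["Multiple", "Singular", "Singular"],
})
test = pd.DataFrame({
    "Status": ["Deceased", "Unknown"],
    "Birth defects": ["Singular", "Multiple"],
})

# One encoder learns a separate category list for each column
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train)

encoded_test = encoder.transform(test)  # unseen categories become -1
print(encoded_test)
```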

In [55]:
X.head()

Unnamed: 0,Patient Age,Genes in mother's side,Inherited from father,Maternal gene,Paternal gene,Blood cell count (mcL),Mother's age,Father's age,Status,Respiratory Rate (breaths/min),...,History of anomalies in previous pregnancies,No. of previous abortion,Birth defects,White Blood cell count (thousand per microliter),Blood test result,Symptom 1,Symptom 2,Symptom 3,Symptom 4,Symptom 5
0,0.142857,1.0,0.0,1.0,0.0,0.419769,0.515152,0.5,0.0,0.0,...,1.0,2.0,0.0,0.761951,3.0,1.0,1.0,1.0,1.0,1.0
2,0.428571,1.0,0.0,0.0,0.0,0.510432,0.69697,0.045455,0.0,0.0,...,1.0,4.0,1.0,0.496728,2.0,0.0,1.0,1.0,1.0,1.0
3,0.857143,1.0,0.0,1.0,0.0,0.38197,0.090909,0.5,1.0,1.0,...,1.0,1.0,1.0,0.546591,1.0,0.0,0.0,1.0,0.0,0.0
4,0.785714,1.0,0.0,1.0,1.0,0.392507,0.424242,0.5,0.0,1.0,...,0.0,4.0,0.0,0.122023,3.0,0.0,0.0,0.0,0.0,0.0
5,1.0,1.0,0.0,1.0,0.0,0.653839,0.515152,0.5,1.0,0.0,...,0.0,0.0,0.0,0.808026,2.0,1.0,0.0,0.0,1.0,0.0


In [56]:
test_df.head()

Unnamed: 0,Patient Age,Genes in mother's side,Inherited from father,Maternal gene,Paternal gene,Blood cell count (mcL),Mother's age,Father's age,Status,Respiratory Rate (breaths/min),...,History of anomalies in previous pregnancies,No. of previous abortion,Birth defects,White Blood cell count (thousand per microliter),Blood test result,Symptom 1,Symptom 2,Symptom 3,Symptom 4,Symptom 5
0,0.428571,0.0,1.0,0.0,0.0,0.570802,0.606061,0.931818,0.0,1.0,...,0.0,2.0,0.0,0.492801,3.0,1.0,1.0,1.0,1.0,1.0
1,0.714286,1.0,0.0,1.0,1.0,0.664567,0.454545,0.75,0.0,0.0,...,1.0,2.0,0.0,0.575509,2.0,0.0,0.0,0.0,1.0,0.0
2,0.357143,0.0,0.0,0.0,0.0,0.498753,0.909091,0.909091,1.0,0.0,...,0.0,0.0,1.0,0.492801,3.0,0.0,0.0,1.0,1.0,0.0
3,0.928571,0.0,1.0,1.0,0.0,0.370004,0.212121,0.795455,0.0,0.0,...,1.0,2.0,1.0,0.431563,2.0,1.0,0.0,1.0,0.0,1.0
4,0.357143,0.0,0.0,1.0,1.0,0.687437,0.69697,0.409091,1.0,1.0,...,0.0,2.0,0.0,0.35502,2.0,1.0,1.0,1.0,1.0,0.0


## 5. Model Experimentation

In [57]:
# Label encoding the target

# Target 1
lab_y1 = LabelEncoder()
y1 = lab_y1.fit_transform(y1)

# Target 2
lab_y2 = LabelEncoder()
y2 = lab_y2.fit_transform(y2)

In [58]:
# Oversampling using SMOTE

# Target 1
X1_, y1_ = SMOTE().fit_resample(X, y1)

# Target 2
X2_, y2_ = SMOTE().fit_resample(X, y2)

In [59]:
def _model_experimentation(models, X, y):
    model_scores = {}
    for name, model in models.items():
        print(name)
        model_scores[name] = np.mean(cross_val_score(model, X, y, cv=10, n_jobs=-1, scoring='f1_macro')) * 100
    return model_scores

In [60]:
models = {"RFC" : RandomForestClassifier(n_jobs=-1),
          "XGB" : XGBClassifier(n_jobs=-1),
          "GB"  : GradientBoostingClassifier(),
          "ADA" : AdaBoostClassifier(),
          "LGB" : LGBMClassifier(),
          "KNN" : KNeighborsClassifier(),
          "LR"  : LogisticRegression(),
          "DT"  : DecisionTreeClassifier()}

score1 = _model_experimentation(models, X1_, y1_)
score2 = _model_experimentation(models, X2_, y2_)

RFC
XGB
GB
ADA
LGB
KNN
LR
DT
RFC
XGB
GB
ADA
LGB
KNN
LR
DT


In [61]:
score1

{'RFC': 71.2901243357021,
 'XGB': 66.71795208798264,
 'GB': 62.922693543188366,
 'ADA': 55.380298648626244,
 'LGB': 67.00546593973212,
 'KNN': 59.3973584597563,
 'LR': 55.54577125232657,
 'DT': 62.50095646360585}

In [62]:
score2

{'RFC': 75.47576678107501,
 'XGB': 68.18936010624567,
 'GB': 55.36685057179434,
 'ADA': 40.45289892872512,
 'LGB': 66.39346141708215,
 'KNN': 64.99780874598905,
 'LR': 45.29809781791744,
 'DT': 60.29714802080189}

## 6. Submission

In [72]:
from sklearn.base import clone

def _submission(models, X1, X2, test, y1, y2):
    '''
    Training a separate copy of each model per target and saving one submission file per model
    '''
    for name, model in models.items():
        # Training a fresh copy of the model for Target 1
        ml_model1 = clone(model)
        ml_model1.fit(X1, y1)
        
        # Making predictions for Target 1
        pred1 = ml_model1.predict(test)
        
        # Inverse encoding the predictions
        pred1 = lab_y1.inverse_transform(pred1)
        
        # Training a fresh copy of the model for Target 2
        ml_model2 = clone(model)
        ml_model2.fit(X2, y2)
        
        # Making predictions for Target 2 with the Target 2 model
        pred2 = ml_model2.predict(test)
        
        # Inverse encoding the predictions
        pred2 = lab_y2.inverse_transform(pred2)

        subm = pd.DataFrame()
        subm['Patient Id'] = test_id
        subm['Genetic Disorder'] = pred1
        subm['Disorder Subclass'] = pred2
        
        subm.to_csv('Submission_'+name+'.csv', index=False)

In [73]:
_submission(models, X1_, X2_, test_df, y1_, y2_)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
