# DILEEPKUMAR - AIQoD - ML Assignment
-----------------------------------------------------------------------

The data contains features extracted from text similar to the one shown below:

![image.png](attachment:image.png)


You have to create a ML model that predict the probability that a piece of text belongs to a particular class.
Use techniques like Bag of Words, tfidf vectorization and word embedding. Please use Hash field value and explain how you are going to use the Hash field.

## Data extraction
For the documents nGrams have been extracted, Each row in the Train.csvcorresponds to one such nGram.

## Features
For a given nGram several features have been extracted (145). These features have been saved in the train.csvand test.csv. They have parsing, spatial, content and relative information.

    • Content: The cryptographic hash of the raw text.
    • Parsing: nGram is a number, text, alphanumeric, etc.
    • Spatial: Position and size of the nGram
    • Relational: details of text nearby the nGram

### The feature values can be:
    • Numbers. Continuous/discrete numerical values.
    • Boolean. The values include YES (true) or NO (false).
    • Categorical. Values within a finite set of possible values
    
## Labels
This are the labels corresponding to the probability that the current sample belongs to the given class. This is multilabel problem and hence a given sample can belong to more than one class.


## File descriptions
All the files are CSV.
train.csv - the features x x   of the training set. Each row
corresponds to a different sample, while each column is a different
feature.
    
    • trainLabels.csv - the expected labels y y   for the training set.
            Each row corresponds to a different sample, while each column is a different label. The order of the rows is aligned with train.csv.
            
    • test.csv - the features x x   of the test set.
            Each row corresponds to a different sample, while each column is a different feature.
            
    • sampleSubmission.csv - example of the expected probabilities y ̂  y  ^       for the test set.
            Each row contains two columns, namely one string and the probability of each sample belonging to each label.
            
    For example, if the test.csv has 3 samples and 4 labels, the submission file must have 13 rows with these strings in the first column: id_label, 1_y1, 1_y2, 1_y3, 1_y4, 2_y1, 2_y2, 2_y3, 2_y4, 3_y1, 3_y2, 3_y3, 3_y4, 4_y1, 4_y2, 4_y3, 4_y4
    
    

### Importing the neccessary packages

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pickle

import time

import hashlib     # Function to hash a string using hashlib

from sklearn.preprocessing import LabelEncoder, StandardScaler

from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
import scipy.sparse as sp

from sklearn.model_selection import train_test_split, KFold, cross_val_score, GridSearchCV

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import f1_score, accuracy_score, confusion_matrix, classification_report, ConfusionMatrixDisplay, roc_auc_score, roc_curve, RocCurveDisplay

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier


In [2]:
pd.set_option('display.max_columns', None)

### Reading the datas

In [3]:
# Load training data
train_features = pd.read_csv('./DATA Scientist Assignment/train.csv')
train_labels = pd.read_csv('./DATA Scientist Assignment/trainLabels.csv')

In [4]:
# Inspect the data
train_features.head()

Unnamed: 0,id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100,x101,x102,x103,x104,x105,x106,x107,x108,x109,x110,x111,x112,x113,x114,x115,x116,x117,x118,x119,x120,x121,x122,x123,x124,x125,x126,x127,x128,x129,x130,x131,x132,x133,x134,x135,x136,x137,x138,x139,x140,x141,x142,x143,x144,x145
0,1,NO,NO,dqOiM6yBYgnVSezBRiQXs9bvOFnRqrtIoXRIElxD7g8=,GNjrXXA3SxbgD0dTRblAPO9jFJ7AIaZnu/f48g5XSUk=,0.576561,0.073139,0.481394,0.115697,0.472474,YES,NO,NO,NO,NO,42,0.396065,3,6,0.991018,0.0,0.82,3306,4676,YES,NO,YES,0,0.405047,0.46461,NO,NO,NO,NO,mimucPmJSF6NI6KM6cPIaaVxWaQyIQzSgtwTTb9bKlc=,s7mTY62CCkWUFc36AW2TlYAy5CIcniD2Vz+lHzyYCLg=,0.576561,0.073139,0.481394,0.115697,0.45856,YES,NO,YES,NO,NO,9,0.368263,2,10,0.992729,0.0,0.94,3306,4676,YES,NO,YES,1,0.375535,0.451301,+2TNtXRI6r9owdGCS80Ia9VVv8ZpuOpVaHEvxRGGu78=,NO,NO,Op+X3asn5H7EQJErI7PR0NkUs3YB+Ld/8OfWuiOC8tU=,GeerC2BbPUcQfQO86NmvOsKrfTvmW7HF+Iru9y+7DPA=,0.576561,0.073139,0.481394,0.115697,0.487598,YES,NO,NO,NO,NO,42,0.363131,6,10,0.987596,0.0,0.71,3306,4676,YES,NO,YES,0,0.375535,0.479734,bxU52teuxC05EZyzFihSiKHczE2ZAIVCXekVLG7j3C0=,NO,NO,+dia7tCOijlRGbABX0YKG5L85x/hXLyJwwplN5Qab04=,f4Uu1R9nnf/h03aqiRQT0Fw3WItzNToLCyRlW1Pn8Z8=,0.576561,0.073139,0.481394,0.115697,0.473079,YES,NO,NO,NO,NO,37,0.333618,4,6,0.987169,0.0,0.89,3306,4676,YES,NO,YES,1,0.34645,0.46461,0.576561,0.073139,0.481394,0.115697,0.473079,YES,NO,NO,NO,NO,42,0.363131,5,6,0.987596,0.0,0.81,3306,4676,YES,NO,YES,2,0.375535,0.46461
1,2,,,,,0.0,0.0,0.0,0.0,0.0,,,,,,0,0.0,0,0,0.0,0.0,0.0,0,0,,,,0,0.0,0.0,NO,NO,NO,NO,l0G2rvmLGE6mpPtAibFsoW/0SiNnAuyAc4k35TrHvoQ=,lblNNeOLanWhqgISofUngPYP0Ne1yQv3QeNHqCAoh48=,1.058379,0.125832,0.932547,0.663037,0.569047,YES,NO,NO,NO,NO,9,0.709921,5,6,0.96824,0.0,0.81,4678,3306,YES,NO,YES,3,0.741682,0.560282,MZZbXga8gvaCBqWpzrh2iKdOkcsz/bG/z4BVjUnqWT0=,NO,NO,TqL9cs8ZFzALzVpZv6wYBDi+6zwhrdarQE/3FH+XAlA=,aZTF/lredyP4cukeN8bh6kpBjYmS1QFNpPOg2LVm3Lg=,1.058379,0.125832,0.932547,0.663037,0.628474,YES,NO,NO,NO,NO,2,0.679371,8,7,0.937387,0.0,0.84,4678,3306,YES,NO,YES,1,0.741984,0.619282,YvZUuCDjLu9VvkCdBWgARWQrvm+FSXgxp0zIrMjcLBc=,NO,NO,dsyhxXKNNJy4WVGD/v4+UGyW3jHWkx2xTdg3STsf34A=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,1.058379,0.125832,0.932547,0.663037,0.602394,NO,NO,NO,NO,NO,11,0.581367,3,6,0.966122,0.0,0.87,4678,3306,NO,NO,NO,3,0.615245,0.59363,1.058379,0.125832,0.932547,0.663037,0.602394,YES,NO,NO,NO,NO,9,0.709921,4,6,0.96824,0.0,0.51,4678,3306,YES,NO,YES,4,0.741682,0.59363
2,3,NO,NO,ib4VpsEsqJHzDiyL0dZLQ+xQzDPrkxE+9T3mx5fv2wI=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,1.341803,0.051422,0.935572,0.04144,0.50171,NO,NO,YES,NO,NO,2,0.838475,3,5,0.966122,0.0,0.74,4678,3306,NO,NO,NO,2,0.872353,0.493159,NO,NO,YES,YES,9TRXThP/ifDpJRGFX1LQseibUA1NJ3XM53gy+1eZ46k=,XSJ6E8aAoZC7/KAu3eETpfMg3mCq7HVBFIVIsoMKh9E=,1.341803,0.051422,0.935572,0.04144,0.447627,YES,NO,NO,NO,YES,2,0.752269,5,7,0.95493,0.0,0.82,4678,3306,YES,NO,YES,2,0.797338,0.438435,cr+kkNnNFV9YL0vz029hk3ohIDmGuABRVNhFe0ePZyo=,NO,NO,oFsUwSLCWcj8UA1cqILh5afKVcvwlFA+ohJ147Wkz5I=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.341803,0.051422,0.935572,0.04144,0.522873,YES,NO,NO,NO,NO,1,0.732305,6,6,0.95493,0.0,0.8,4678,3306,YES,NO,YES,0,0.777374,0.513681,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,NO,NO,mRPnGiKVOWTk/vzZaqlLXZRtdrkcQ/sX0hqBCqOuKq0=,oo9tGpHvTredpg9JkHgYbZAuxcwtSpQxU5mA/zUbxY8=,1.341803,0.051422,0.935572,0.04144,0.50171,NO,NO,NO,NO,NO,2,0.65729,6,5,0.936479,0.0,0.79,4678,3306,NO,NO,NO,0,0.720811,0.493159,1.341803,0.051422,0.935572,0.04144,0.50171,NO,NO,YES,NO,NO,5,0.742589,3,5,0.966122,0.0,0.85,4678,3306,NO,NO,NO,1,0.776467,0.493159
3,4,YES,NO,BfrqME7vdLw3suQp6YAT16W2piNUmpKhMzuDrVrFQ4w=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.556564,YES,NO,NO,NO,NO,37,0.127405,8,15,0.959171,0.0,0.96,3306,4678,YES,NO,YES,1,0.168234,0.546582,NO,NO,YES,NO,BfrqME7vdLw3suQp6YAT16W2piNUmpKhMzuDrVrFQ4w=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.556564,YES,NO,NO,NO,NO,37,0.127405,8,15,0.959171,0.0,0.96,3306,4678,YES,NO,YES,1,0.168234,0.546582,XQG0f+jmjLI0UHAXXH2RYL4MEHa+yd9okO+730PCZuc=,YES,NO,/1yAAEg6Qib4GMD+wvGOlGmpCIPIAzioWtcCwbns9/I=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,22,0.067764,8,15,0.959598,0.0,0.93,3306,4678,YES,NO,YES,2,0.108166,0.547792,Vl+TDNSupucNoI+Fqeo7bMCkxg1hRjgTSS6NYb9BW00=,YES,NO,/1yAAEg6Qib4GMD+wvGOlGmpCIPIAzioWtcCwbns9/I=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,22,0.067764,8,15,0.959598,0.0,0.93,3306,4678,YES,NO,YES,2,0.108166,0.547792,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,0,0.067764,17,15,0.92755,0.0,0.945,3306,4678,NO,NO,YES,3,0.168234,0.546582
4,5,NO,NO,RTjsrrR8DTlJyaIP9Q3Z8s0zseqlVQTrlSe97GCWfbk=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,1.415919,0.0,1.0,0.0,0.375297,NO,NO,YES,NO,NO,1,0.523543,4,11,0.963004,0.0,1.0,1263,892,NO,NO,NO,2,0.560538,0.361045,NO,NO,NO,NO,XEDyQD4da6aJkZiBf+r7LD2VdhLGnCMsSpuRFUyCZgg=,Co/nVSLofrWsM5qpcKLXfekegArokgN29XjEXttuXK4=,1.415919,0.0,1.0,0.0,0.300079,YES,NO,NO,NO,YES,6,0.16704,3,3,0.971973,0.0,1.0,1263,892,YES,NO,YES,1,0.195067,0.285827,wIHg6aGH2GMPX6l1pCTzeS1bXE4jxRqmd9ubES4HgW8=,NO,NO,ST8+q2Jgb91pWEwLwmSoJzXEGsQKeQGbzlLbgHPtj4w=,rB07AAHPffU4zFFF8IrqfKSltyWcPyy4+q+IM5SLZiQ=,1.415919,0.0,1.0,0.0,0.400633,NO,NO,NO,NO,NO,9,0.144619,10,14,0.944507,-0.5,1.0,1263,892,NO,NO,NO,1,0.221973,0.386382,WYQEP5EEzM+P+nfkHKLkGko/S3RdBgfEQ3IcyYwrChE=,NO,NO,fylJzYvYlM0+kRBeLB3eFKKgCibqxFvBa8hL+WStwCE=,IoM2E9pNxABFR+H3yfapUL+ThKm7GtTzY7js9H/H99o=,1.415919,0.0,1.0,0.0,0.375297,NO,NO,NO,NO,NO,1,0.065022,8,11,0.92713,0.0,1.0,1263,892,NO,NO,NO,0,0.137892,0.361045,1.415919,0.0,1.0,0.0,0.375297,NO,NO,NO,NO,NO,9,0.146861,11,11,0.900224,0.0,1.0,1263,892,NO,NO,NO,1,0.246637,0.361045


In [5]:
train_features.shape

(9999, 146)

In [6]:
# train_features.dtypes.values

In [7]:
train_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Columns: 146 entries, id to x145
dtypes: float64(55), int64(31), object(60)
memory usage: 11.1+ MB


### Checking null values

In [8]:
train_features.isna().sum().values

array([   0, 1426, 1426, 1426, 1426,    0,    0,    0,    0,    0, 1426,
       1426, 1426, 1426, 1426,    0,    0,    0,    0,    0,    0,    0,
          0,    0, 1426, 1426, 1426,    0,    0,    0,    0,    0,  284,
        284,  284,  284,    0,    0,    0,    0,    0,  284,  284,  284,
        284,  284,    0,    0,    0,    0,    0,    0,    0,    0,    0,
        284,  284,  284,    0,    0,    0,    0,  396,  396,  396,  396,
          0,    0,    0,    0,    0,  396,  396,  396,  396,  396,    0,
          0,    0,    0,    0,    0,    0,    0,    0,  396,  396,  396,
          0,    0,    0,    0,  851,  851,  851,  851,    0,    0,    0,
          0,    0,  851,  851,  851,  851,  851,    0,    0,    0,    0,
          0,    0,    0,    0,    0,  851,  851,  851,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0], dtype=int64)

In [9]:
train_features['x1'].value_counts().count()

2

In [10]:
train_features[train_features.duplicated()]

Unnamed: 0,id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100,x101,x102,x103,x104,x105,x106,x107,x108,x109,x110,x111,x112,x113,x114,x115,x116,x117,x118,x119,x120,x121,x122,x123,x124,x125,x126,x127,x128,x129,x130,x131,x132,x133,x134,x135,x136,x137,x138,x139,x140,x141,x142,x143,x144,x145


### We can see there's no duplicates.

In [11]:
# Separate numeric, boolean, and categorical columns
numeric_cols = train_features.select_dtypes(include=['number']).columns
boolean_cols = train_features.select_dtypes(include=['bool']).columns
categorical_cols = train_features.select_dtypes(include=['object']).columns

### Getting the binary categorical values

In [16]:

def binary_features(df):
    
    train_features = df
    
    # Identify columns with exactly two unique values
    binary_columns = [col for col in train_features.columns if train_features[col].nunique() == 2]

    # Filter these columns in the DataFrame
    binary_features = train_features[binary_columns]

    # Fill NaN values with 50% 'YES' and 50% 'NO'
    for col in binary_columns:
        num_nans = binary_features[col].isna().sum()
        num_yes = num_nans // 2
        num_no = num_nans - num_yes

        nan_indices = binary_features[binary_features[col].isna()].index
        yes_indices = np.random.choice(nan_indices, num_yes, replace=False)
        no_indices = nan_indices.difference(yes_indices)

        binary_features.loc[yes_indices, col] = 'YES'
        binary_features.loc[no_indices, col] = 'NO'

        #le = LabelEncoder()

        #for col in binary_features.columns:
        #    binary_features[col] = le.fit_transform(binary_features[col])

    # Replace the binary columns in the original DataFrame
    train_features[binary_columns] = binary_features

    return train_features

In [17]:
train_features = binary_features(train_features)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[yes_indices, col] = 'YES'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[no_indices, col] = 'NO'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[yes_indices, col] = 'YES'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[no_indices, 

### Fill 50% of the null values to 'YES' rest 'NO'

### After fill the null values, We can see there's no null values.

In [19]:
train_features.isna().sum().values

array([   0,    0,    0, 1426, 1426,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,  284,  284,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,  396,  396,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,  851,  851,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0], dtype=int64)

### Same process for the non binary object datas. In here we fill with null value. 

In [21]:

def multi_category(df):
    
    train_features = df
    
    # Identify columns with exactly two unique values
    multi_columns = [col for col in train_features.select_dtypes(include=['object']).columns if train_features[col].nunique() > 2]

    # Filter these columns in the DataFrame
    multi_features = train_features[multi_columns]

    for col in multi_columns:
        num_nans = multi_features[col].isna().sum()

        nan_indices = multi_features[multi_features[col].isna()].index

        multi_features.loc[nan_indices, col] = 'null'

    # Apply the hash function to the hash columns in both train and test data
    #for col in multi_features.columns:
    #    multi_features[col] = multi_features[col].apply(lambda x: hash_string(str(x)))

    # Replace the binary columns in the original DataFrame
    train_features[multi_columns] = multi_features

    return train_features

In [22]:
train_features = multi_category(train_features)

In [23]:
train_features

Unnamed: 0,id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100,x101,x102,x103,x104,x105,x106,x107,x108,x109,x110,x111,x112,x113,x114,x115,x116,x117,x118,x119,x120,x121,x122,x123,x124,x125,x126,x127,x128,x129,x130,x131,x132,x133,x134,x135,x136,x137,x138,x139,x140,x141,x142,x143,x144,x145
0,1,NO,NO,dqOiM6yBYgnVSezBRiQXs9bvOFnRqrtIoXRIElxD7g8=,GNjrXXA3SxbgD0dTRblAPO9jFJ7AIaZnu/f48g5XSUk=,0.576561,0.073139,0.481394,0.115697,0.472474,YES,NO,NO,NO,NO,42,0.396065,3,6,0.991018,0.0,0.82,3306,4676,YES,NO,YES,0,0.405047,0.464610,NO,NO,NO,NO,mimucPmJSF6NI6KM6cPIaaVxWaQyIQzSgtwTTb9bKlc=,s7mTY62CCkWUFc36AW2TlYAy5CIcniD2Vz+lHzyYCLg=,0.576561,0.073139,0.481394,0.115697,0.458560,YES,NO,YES,NO,NO,9,0.368263,2,10,0.992729,0.0,0.94,3306,4676,YES,NO,YES,1,0.375535,0.451301,+2TNtXRI6r9owdGCS80Ia9VVv8ZpuOpVaHEvxRGGu78=,NO,NO,Op+X3asn5H7EQJErI7PR0NkUs3YB+Ld/8OfWuiOC8tU=,GeerC2BbPUcQfQO86NmvOsKrfTvmW7HF+Iru9y+7DPA=,0.576561,0.073139,0.481394,0.115697,0.487598,YES,NO,NO,NO,NO,42,0.363131,6,10,0.987596,0.0,0.71,3306,4676,YES,NO,YES,0,0.375535,0.479734,bxU52teuxC05EZyzFihSiKHczE2ZAIVCXekVLG7j3C0=,NO,NO,+dia7tCOijlRGbABX0YKG5L85x/hXLyJwwplN5Qab04=,f4Uu1R9nnf/h03aqiRQT0Fw3WItzNToLCyRlW1Pn8Z8=,0.576561,0.073139,0.481394,0.115697,0.473079,YES,NO,NO,NO,NO,37,0.333618,4,6,0.987169,0.0,0.89,3306,4676,YES,NO,YES,1,0.346450,0.464610,0.576561,0.073139,0.481394,0.115697,0.473079,YES,NO,NO,NO,NO,42,0.363131,5,6,0.987596,0.0,0.810,3306,4676,YES,NO,YES,2,0.375535,0.464610
1,2,YES,NO,,,0.000000,0.000000,0.000000,0.000000,0.000000,YES,NO,NO,NO,NO,0,0.000000,0,0,0.000000,0.0,0.00,0,0,NO,NO,YES,0,0.000000,0.000000,NO,NO,NO,NO,l0G2rvmLGE6mpPtAibFsoW/0SiNnAuyAc4k35TrHvoQ=,lblNNeOLanWhqgISofUngPYP0Ne1yQv3QeNHqCAoh48=,1.058379,0.125832,0.932547,0.663037,0.569047,YES,NO,NO,NO,NO,9,0.709921,5,6,0.968240,0.0,0.81,4678,3306,YES,NO,YES,3,0.741682,0.560282,MZZbXga8gvaCBqWpzrh2iKdOkcsz/bG/z4BVjUnqWT0=,NO,NO,TqL9cs8ZFzALzVpZv6wYBDi+6zwhrdarQE/3FH+XAlA=,aZTF/lredyP4cukeN8bh6kpBjYmS1QFNpPOg2LVm3Lg=,1.058379,0.125832,0.932547,0.663037,0.628474,YES,NO,NO,NO,NO,2,0.679371,8,7,0.937387,0.0,0.84,4678,3306,YES,NO,YES,1,0.741984,0.619282,YvZUuCDjLu9VvkCdBWgARWQrvm+FSXgxp0zIrMjcLBc=,NO,NO,dsyhxXKNNJy4WVGD/v4+UGyW3jHWkx2xTdg3STsf34A=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,1.058379,0.125832,0.932547,0.663037,0.602394,NO,NO,NO,NO,NO,11,0.581367,3,6,0.966122,0.0,0.87,4678,3306,NO,NO,NO,3,0.615245,0.593630,1.058379,0.125832,0.932547,0.663037,0.602394,YES,NO,NO,NO,NO,9,0.709921,4,6,0.968240,0.0,0.510,4678,3306,YES,NO,YES,4,0.741682,0.593630
2,3,NO,NO,ib4VpsEsqJHzDiyL0dZLQ+xQzDPrkxE+9T3mx5fv2wI=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,1.341803,0.051422,0.935572,0.041440,0.501710,NO,NO,YES,NO,NO,2,0.838475,3,5,0.966122,0.0,0.74,4678,3306,NO,NO,NO,2,0.872353,0.493159,NO,NO,YES,YES,9TRXThP/ifDpJRGFX1LQseibUA1NJ3XM53gy+1eZ46k=,XSJ6E8aAoZC7/KAu3eETpfMg3mCq7HVBFIVIsoMKh9E=,1.341803,0.051422,0.935572,0.041440,0.447627,YES,NO,NO,NO,YES,2,0.752269,5,7,0.954930,0.0,0.82,4678,3306,YES,NO,YES,2,0.797338,0.438435,cr+kkNnNFV9YL0vz029hk3ohIDmGuABRVNhFe0ePZyo=,NO,NO,oFsUwSLCWcj8UA1cqILh5afKVcvwlFA+ohJ147Wkz5I=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.341803,0.051422,0.935572,0.041440,0.522873,YES,NO,NO,NO,NO,1,0.732305,6,6,0.954930,0.0,0.80,4678,3306,YES,NO,YES,0,0.777374,0.513681,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,NO,NO,mRPnGiKVOWTk/vzZaqlLXZRtdrkcQ/sX0hqBCqOuKq0=,oo9tGpHvTredpg9JkHgYbZAuxcwtSpQxU5mA/zUbxY8=,1.341803,0.051422,0.935572,0.041440,0.501710,NO,NO,NO,NO,NO,2,0.657290,6,5,0.936479,0.0,0.79,4678,3306,NO,NO,NO,0,0.720811,0.493159,1.341803,0.051422,0.935572,0.041440,0.501710,NO,NO,YES,NO,NO,5,0.742589,3,5,0.966122,0.0,0.850,4678,3306,NO,NO,NO,1,0.776467,0.493159
3,4,YES,NO,BfrqME7vdLw3suQp6YAT16W2piNUmpKhMzuDrVrFQ4w=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.556564,YES,NO,NO,NO,NO,37,0.127405,8,15,0.959171,0.0,0.96,3306,4678,YES,NO,YES,1,0.168234,0.546582,NO,NO,YES,NO,BfrqME7vdLw3suQp6YAT16W2piNUmpKhMzuDrVrFQ4w=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.556564,YES,NO,NO,NO,NO,37,0.127405,8,15,0.959171,0.0,0.96,3306,4678,YES,NO,YES,1,0.168234,0.546582,XQG0f+jmjLI0UHAXXH2RYL4MEHa+yd9okO+730PCZuc=,YES,NO,/1yAAEg6Qib4GMD+wvGOlGmpCIPIAzioWtcCwbns9/I=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,22,0.067764,8,15,0.959598,0.0,0.93,3306,4678,YES,NO,YES,2,0.108166,0.547792,Vl+TDNSupucNoI+Fqeo7bMCkxg1hRjgTSS6NYb9BW00=,YES,NO,/1yAAEg6Qib4GMD+wvGOlGmpCIPIAzioWtcCwbns9/I=,YGCdISifn4fLao/ASKdZFhGIq23oqzfSbUVb6px1pig=,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,22,0.067764,8,15,0.959598,0.0,0.93,3306,4678,YES,NO,YES,2,0.108166,0.547792,0.653912,0.041471,0.940787,0.090851,0.557774,YES,NO,NO,NO,NO,0,0.067764,17,15,0.927550,0.0,0.945,3306,4678,NO,NO,YES,3,0.168234,0.546582
4,5,NO,NO,RTjsrrR8DTlJyaIP9Q3Z8s0zseqlVQTrlSe97GCWfbk=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,1.415919,0.000000,1.000000,0.000000,0.375297,NO,NO,YES,NO,NO,1,0.523543,4,11,0.963004,0.0,1.00,1263,892,NO,NO,NO,2,0.560538,0.361045,NO,NO,NO,NO,XEDyQD4da6aJkZiBf+r7LD2VdhLGnCMsSpuRFUyCZgg=,Co/nVSLofrWsM5qpcKLXfekegArokgN29XjEXttuXK4=,1.415919,0.000000,1.000000,0.000000,0.300079,YES,NO,NO,NO,YES,6,0.167040,3,3,0.971973,0.0,1.00,1263,892,YES,NO,YES,1,0.195067,0.285827,wIHg6aGH2GMPX6l1pCTzeS1bXE4jxRqmd9ubES4HgW8=,NO,NO,ST8+q2Jgb91pWEwLwmSoJzXEGsQKeQGbzlLbgHPtj4w=,rB07AAHPffU4zFFF8IrqfKSltyWcPyy4+q+IM5SLZiQ=,1.415919,0.000000,1.000000,0.000000,0.400633,NO,NO,NO,NO,NO,9,0.144619,10,14,0.944507,-0.5,1.00,1263,892,NO,NO,NO,1,0.221973,0.386382,WYQEP5EEzM+P+nfkHKLkGko/S3RdBgfEQ3IcyYwrChE=,NO,NO,fylJzYvYlM0+kRBeLB3eFKKgCibqxFvBa8hL+WStwCE=,IoM2E9pNxABFR+H3yfapUL+ThKm7GtTzY7js9H/H99o=,1.415919,0.000000,1.000000,0.000000,0.375297,NO,NO,NO,NO,NO,1,0.065022,8,11,0.927130,0.0,1.00,1263,892,NO,NO,NO,0,0.137892,0.361045,1.415919,0.000000,1.000000,0.000000,0.375297,NO,NO,NO,NO,NO,9,0.146861,11,11,0.900224,0.0,1.000,1263,892,NO,NO,NO,1,0.246637,0.361045
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,9995,NO,NO,jComfqYTXYSeH3GvqcOhwPmldb+BCdVJDKKDNkdtw2w=,Kr2CC15nSwDjdpyAeVh4vOuIaHuC/Q7cL9BAK28JoG8=,1.207136,0.082855,0.918960,0.313880,0.495189,NO,NO,NO,NO,NO,10,0.512852,3,7,0.967342,0.0,0.84,4677,3307,NO,NO,NO,0,0.545510,0.486209,NO,NO,NO,NO,SRFkJlXGZgnI2svGQLoCAcrghqMRr+u5s36xzSMAOqg=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,1.207136,0.082855,0.918960,0.313880,0.379303,NO,NO,YES,NO,NO,1,0.485939,3,7,0.983671,0.0,0.87,4677,3307,NO,NO,NO,0,0.502268,0.373316,7uf0BQkpKFCgsoTY6hGENDudghJBAtKvDQ3VTc1nO7E=,NO,NO,lPUeS6siL3Hb9UUwnRC9piF2fYeBf+u85lUSgk4qgg4=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,1.207136,0.082855,0.918960,0.313880,0.765020,NO,NO,NO,NO,NO,4,0.498034,4,10,0.966132,0.0,0.88,4677,3307,NO,NO,NO,2,0.531902,0.757965,NMZ5RsVp7WQ9rJP2seQnLYgSaJ7ga0FV3Ieg0DW59C0=,NO,NO,2QaQ5ANfKi6SrWtkIv5y7DUAmEz3dJXLn5fhhjBV8N8=,uO9xKdGEeEsG0BVm6VI/0XYd8E0DRuXXEa2gwcoKHcg=,1.207136,0.082855,0.918960,0.313880,0.497969,NO,NO,NO,NO,NO,5,0.226489,14,7,0.886907,0.0,0.87,4677,3307,NO,NO,NO,1,0.339583,0.486637,1.207136,0.082855,0.918960,0.313880,0.496900,YES,NO,NO,NO,NO,1,0.473843,4,7,0.971575,0.0,0.810,4677,3307,YES,NO,YES,1,0.502268,0.486637
9995,9996,NO,NO,Pr5enXjzVzVjZziZxrDcgnyu6CLUftmEbnp6TctyJbU=,YvZUuCDjLu9VvkCdBWgARWQrvm+FSXgxp0zIrMjcLBc=,1.414798,0.000000,1.000000,0.000000,0.357369,YES,NO,NO,NO,NO,6,0.864350,4,17,0.974215,0.0,1.00,1262,892,YES,NO,YES,15,0.890135,0.346276,NO,NO,YES,YES,v4GptRPKxsWXk7yKmmdhHBTqN7QoYJTK1GPjpjppMa4=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.414798,0.000000,1.000000,0.000000,0.357369,YES,NO,NO,NO,NO,2,0.774664,6,17,0.959641,0.0,1.00,1262,892,YES,NO,YES,14,0.815022,0.346276,gtvCdEuc1Tnjv6MSRDG8mAMO+KHeyqX/rg0IQwpdbi8=,YES,YES,v4GptRPKxsWXk7yKmmdhHBTqN7QoYJTK1GPjpjppMa4=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.414798,0.000000,1.000000,0.000000,0.357369,YES,NO,NO,NO,NO,2,0.774664,6,17,0.959641,0.0,1.00,1262,892,YES,NO,YES,14,0.815022,0.346276,zX9+hre+RQeHdvHyFAguXw2WNsshYzygopGqPn/BDLc=,YES,YES,v4GptRPKxsWXk7yKmmdhHBTqN7QoYJTK1GPjpjppMa4=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.414798,0.000000,1.000000,0.000000,0.357369,YES,NO,NO,NO,NO,2,0.774664,6,17,0.959641,0.0,1.00,1262,892,YES,NO,YES,14,0.815022,0.346276,1.414798,0.000000,1.000000,0.000000,0.357369,YES,NO,NO,NO,NO,0,0.774664,11,17,0.933857,0.0,1.000,1262,892,YES,NO,YES,15,0.890135,0.346276
9996,9997,YES,YES,9Avs0tL1zvH7Xx41z2UqrXs11/4IWLmqRAodLt/SKjQ=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,1.413677,0.000000,1.000000,0.000000,0.668517,YES,NO,NO,NO,NO,3,0.865471,6,7,0.939462,0.0,1.00,1261,892,YES,NO,YES,6,0.926009,0.659001,YES,YES,NO,NO,9cgVS5E58bStBXAoRa9+MN4C7HpJh+UfM6/QcCwH0k4=,FExKgjj6CsbToTubdZ+kGsOmUx3gCvZVJCdZPcdPNF4=,1.413677,0.000000,1.000000,0.000000,0.554322,NO,NO,NO,NO,NO,1,0.678251,5,5,0.048687,-6.5,1.00,1261,892,NO,NO,NO,3,0.728700,0.544806,9Avs0tL1zvH7Xx41z2UqrXs11/4IWLmqRAodLt/SKjQ=,YES,NO,c5oB4c9pSRTzkd4PmhzY4BazFbmVHbhy0lbyPp7aRbA=,YvZUuCDjLu9VvkCdBWgARWQrvm+FSXgxp0zIrMjcLBc=,1.413677,0.000000,1.000000,0.000000,0.681998,YES,NO,NO,NO,NO,6,0.690583,4,9,0.964126,0.0,1.00,1261,892,YES,NO,YES,7,0.726457,0.673275,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,NO,NO,zOxT5daF0yUlUsnKTZeXgFCmbLqtE7oBITbptqroU/Q=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.413677,0.000000,1.000000,0.000000,0.667724,NO,NO,YES,NO,YES,6,0.626682,2,7,0.982063,0.0,1.00,1261,892,NO,NO,NO,4,0.644619,0.659001,1.413677,0.000000,1.000000,0.000000,0.667724,YES,NO,NO,NO,NO,2,0.671525,6,7,0.945067,0.0,1.000,1261,892,YES,NO,YES,5,0.726457,0.659001
9997,9998,NO,NO,9zkXU3f6YnRPjsWi3lKSCLseIGrleg00tpRI4OplABw=,gOZBAoajyr6i7GgON0N7q5+KE4JTwH3OUM0lZOWMuG8=,1.294118,0.000000,1.000000,0.000000,0.570707,NO,NO,NO,NO,NO,7,0.400871,8,11,0.949891,0.0,1.00,1188,918,NO,NO,NO,3,0.450980,0.561448,NO,NO,NO,NO,lnHueTbAHnZiCYqByDCzhzcB9kjjnk6GOmLFNK8gG00=,Kr2CC15nSwDjdpyAeVh4vOuIaHuC/Q7cL9BAK28JoG8=,1.294118,0.000000,1.000000,0.000000,0.570707,NO,NO,YES,NO,NO,11,0.375817,3,11,0.981481,0.0,1.00,1188,918,NO,NO,NO,2,0.394336,0.561448,zsF/C4x766PfoC59pZccSIWFOtQtiX/RPXB76PwIvIg=,NO,NO,lnHueTbAHnZiCYqByDCzhzcB9kjjnk6GOmLFNK8gG00=,Kr2CC15nSwDjdpyAeVh4vOuIaHuC/Q7cL9BAK28JoG8=,1.294118,0.000000,1.000000,0.000000,0.570707,NO,NO,YES,NO,NO,11,0.375817,3,11,0.981481,0.0,1.00,1188,918,NO,NO,NO,2,0.394336,0.561448,kZD8nTcJKVhhRKawBwobfk93XBOLQrH0jlf74jOnMuI=,NO,NO,Ymx/TSp548h70p30ArcHjHkJWOyIXyZ20Vs7vwl4GCA=,rB07AAHPffU4zFFF8IrqfKSltyWcPyy4+q+IM5SLZiQ=,1.294118,0.000000,1.000000,0.000000,0.570707,NO,NO,YES,NO,NO,8,0.142702,10,11,0.929194,0.0,1.00,1188,918,NO,NO,NO,1,0.213508,0.561448,1.294118,0.000000,1.000000,0.000000,0.570707,NO,NO,YES,NO,NO,0,0.142702,23,11,0.860566,1.0,1.000,1188,918,NO,NO,NO,3,0.450980,0.561448


In [24]:
train_features.isna().sum()

id      0
x1      0
x2      0
x3      0
x4      0
       ..
x141    0
x142    0
x143    0
x144    0
x145    0
Length: 146, dtype: int64

In [25]:
train_df = train_features.copy()

In [26]:
train_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Columns: 146 entries, id to x145
dtypes: float64(55), int64(31), object(60)
memory usage: 11.1+ MB


In [27]:
# getting categorical features from train data...

cat_df = train_features.select_dtypes(include=['object'])

cat_features = train_features.select_dtypes(include=['object']).columns

cat_features

Index(['x1', 'x2', 'x3', 'x4', 'x10', 'x11', 'x12', 'x13', 'x14', 'x24', 'x25',
       'x26', 'x30', 'x31', 'x32', 'x33', 'x34', 'x35', 'x41', 'x42', 'x43',
       'x44', 'x45', 'x55', 'x56', 'x57', 'x61', 'x62', 'x63', 'x64', 'x65',
       'x71', 'x72', 'x73', 'x74', 'x75', 'x85', 'x86', 'x87', 'x91', 'x92',
       'x93', 'x94', 'x95', 'x101', 'x102', 'x103', 'x104', 'x105', 'x115',
       'x116', 'x117', 'x126', 'x127', 'x128', 'x129', 'x130', 'x140', 'x141',
       'x142'],
      dtype='object')

### Applyin the TDIDF- Vectorizer

In [29]:
train2 = train_features.copy()

for feature in cat_features:
    # Apply TF-IDF vectorization

    tfidf_vectorizer = TfidfVectorizer(max_features = 2)
    
    tfidf_feature = tfidf_vectorizer.fit_transform(train2[feature])
    
    #print(tfidf_feature)
    
    train2 = train2.drop(columns=[feature])
    
    train2 = pd.concat([pd.DataFrame(tfidf_feature.toarray()), train2], axis=1)

### Scaling the data

In [31]:
train2.columns = train2.columns.astype(str)

std = StandardScaler()

train3 = std.fit_transform(train2)

final_train = pd.DataFrame(train3)

In [32]:
final_train

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205
0,-0.934813,0.934813,0.14859,-0.14859,-0.949565,0.949565,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,-1.464019,1.464019,0.230633,-0.230633,-1.465708,1.465708,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,-1.455289,1.455289,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,0.364558,-0.364558,0.169109,-0.169109,-1.527416,1.527416,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,0.236348,-0.236348,0.288847,-0.288847,-1.250538,1.250538,0.2961,-0.2961,-1.253171,1.253171,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,-1.239309,1.239309,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,-1.731878,-0.740232,0.151886,-0.878525,-0.173273,0.081133,4.104955,-0.076479,-0.338037,-0.294927,0.499739,0.128467,0.090058,0.579277,1.864294,-0.691244,-0.153452,0.083726,-1.230147,0.104771,-1.846781,-0.202356,-0.097083,0.078264,-0.226732,-0.536621,0.269334,0.435993,0.149883,0.232379,0.438832,1.833199,-0.392065,-0.399425,-0.087479,-1.222908,0.096953,-1.667531,-0.238825,-0.100812,3.195673,-0.216962,-0.127757,0.252283,0.415496,0.135713,-0.792186,0.445374,1.829111,-0.654121,-0.378134,-0.094738,-0.952283,0.144691,-1.209777,-0.199557,-0.013653,2.926908,-0.117194,-0.186084,-0.346482,0.407124,0.133224,0.161707,0.503857,1.837564,-0.349905,-0.200807,-0.011686,-1.475663,0.087135,-2.690826,-0.233831,-0.168363,4.064889,-0.262241,-0.566444,-0.425390,0.769844,-0.037165,-0.968699,0.400915,1.826982,-0.151420,-0.531420,-0.157912
1,-0.934813,0.934813,0.14859,-0.14859,-0.949565,0.949565,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,0.364558,-0.364558,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,0.236348,-0.236348,0.288847,-0.288847,-1.250538,1.250538,0.2961,-0.2961,0.797976,-0.797976,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,-1.239309,1.239309,-0.226759,2.451922,-0.173134,2.451922,0.375910,-0.375910,-2.328511,2.328511,-1.731531,-1.839152,-0.417309,-2.241569,-0.524932,-1.490985,-0.691728,-1.421060,-1.021654,-1.148179,-2.428892,0.128467,-2.300997,-1.322629,-1.238706,-0.691244,-1.495225,-1.470456,-0.077188,0.500459,0.168908,1.489693,0.325725,0.078264,0.987515,-0.220987,-0.346789,0.304135,0.149883,-0.423951,1.282618,0.864568,0.097753,0.899746,0.329182,-0.078947,0.487525,0.201802,1.327360,0.417698,-0.592917,0.896113,0.053737,-0.196096,0.181240,0.135713,-0.196672,1.281178,0.867992,-0.409147,0.903018,0.419454,0.058339,0.563743,0.298707,1.445158,0.441091,0.319937,0.813751,-0.424629,-0.346482,0.329659,0.133224,0.090933,1.315483,0.903464,0.194513,0.803791,0.443427,-0.188228,0.475968,0.068322,1.380663,0.336889,0.453057,0.985113,-0.698435,-0.425390,0.492993,-0.037165,-3.500558,1.264818,0.838876,0.030164,0.820313,0.345462
2,1.069733,-1.069733,0.14859,-0.14859,1.053114,-1.053114,0.528797,-0.528797,0.160465,-0.160465,-1.117355,1.117355,0.077697,-0.077697,1.107901,-1.107901,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,-2.743045,2.743045,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,-3.622552,3.622552,-2.941351,2.941351,0.236348,-0.236348,0.288847,-0.288847,0.799656,-0.799656,0.2961,-0.2961,0.797976,-0.797976,0.486397,-0.486397,0.322692,-0.322692,-1.271825,1.271825,0.31661,-0.31661,0.806901,-0.806901,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,-1.731185,0.718311,-0.017129,0.407455,-0.398976,0.178413,-0.463314,1.425437,-0.338037,-0.437136,0.426168,0.128467,-0.143215,1.368574,0.955160,-0.199094,1.394563,0.179228,0.601028,-0.058316,0.182423,-0.431915,-0.138922,-0.595527,1.138017,-0.220987,-0.192758,0.232475,0.149883,-0.373464,1.282618,0.864568,-0.147156,1.097228,-0.136667,0.593976,-0.064025,0.214335,-0.451308,0.029021,-0.687632,1.082426,-0.127757,-0.345556,0.263092,0.135713,-0.379907,1.281178,0.867992,-0.654121,1.026747,0.030347,0.652827,-0.028026,0.308821,-0.422695,0.087029,-0.582476,1.099039,0.291007,-0.493269,0.220553,0.133224,-0.192164,1.315483,0.903464,-0.622114,1.198334,0.089022,0.569092,-0.073127,0.086821,-0.452869,-0.056498,0.015259,1.102615,-0.830426,-0.577267,0.462709,-0.037165,-0.631118,1.264818,0.838876,-0.242212,0.948733,-0.046525
3,-0.934813,0.934813,0.14859,-0.14859,1.053114,-1.053114,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,-1.464019,1.464019,0.230633,-0.230633,-1.465708,1.465708,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,-1.455289,1.455289,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,-2.618143,2.618143,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,-2.847202,2.847202,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,0.364558,-0.364558,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,-2.941351,2.941351,0.236348,-0.236348,0.288847,-0.288847,-1.250538,1.250538,0.2961,-0.2961,-1.253171,1.253171,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,-1.239309,1.239309,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,-2.328511,2.328511,-1.730838,-0.592802,-0.094570,0.422221,-0.248793,0.360934,3.533921,-0.988540,0.801325,0.984950,0.405625,0.128467,0.498287,0.579277,1.865621,-0.445169,-0.937926,0.357933,-1.045052,-0.133042,0.205722,-0.279166,0.277956,2.773429,-1.082740,0.094647,1.039488,0.255305,0.149883,0.333353,0.438832,1.834613,-0.392065,-1.134973,0.276803,-1.039257,-0.137784,0.235943,-0.309921,0.157478,1.301378,-1.256570,0.053737,0.999582,0.284869,0.135713,0.215607,0.445374,1.830514,-0.164173,-1.312888,0.156035,-0.790039,-0.107163,0.326258,-0.274218,0.284180,1.422887,-1.116173,0.768097,0.974602,0.305646,0.133224,0.303255,0.503857,1.838928,-0.077696,-1.091373,0.281736,-1.268979,-0.146558,0.118715,-0.307120,0.162550,-0.531989,-1.324632,1.017449,0.941505,-0.088967,-0.037165,0.170637,0.400915,1.828425,-0.060628,-1.296726,0.161904
4,1.069733,-1.069733,0.14859,-0.14859,1.053114,-1.053114,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,1.107901,-1.107901,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,0.788271,-0.788271,0.189608,-0.189608,0.783609,-0.783609,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,0.799824,-0.799824,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,-2.743045,2.743045,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,0.236348,-0.236348,0.288847,-0.288847,0.799656,-0.799656,0.2961,-0.2961,0.797976,-0.797976,0.486397,-0.486397,0.322692,-0.322692,-1.271825,1.271825,0.31661,-0.31661,0.806901,-0.806901,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,-1.730492,0.859576,-0.417309,0.589880,-0.524932,-0.242217,-0.577521,0.356288,-0.110165,0.416115,0.416955,0.128467,0.614924,-0.596039,-0.646773,-0.199094,0.361634,-0.262712,0.778384,-0.444462,0.470280,-0.560022,-0.703558,-0.210504,-0.941876,-0.431410,-0.808881,0.324238,0.149883,0.535301,-0.817621,-0.842201,-0.392065,-1.039764,-0.720121,0.769948,-0.445177,0.481291,-0.569886,-0.420897,0.070086,-0.986063,0.235230,0.850122,0.214460,-0.656614,0.536268,-0.799193,-0.825541,-0.409147,-0.915004,-0.438713,0.808288,-0.436971,0.524245,-0.547218,-0.357511,-0.682744,-1.126475,0.768097,0.387454,0.186143,0.133224,0.550965,-0.704710,-0.742462,-0.622114,-0.980272,-0.377008,0.767134,-0.452583,0.480851,-0.575104,-0.550412,0.453057,-1.040132,0.225503,0.333996,-0.479803,-0.037165,0.634811,-0.885495,-0.902211,-0.242212,-1.007282,-0.561971
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,-0.934813,0.934813,0.14859,-0.14859,-0.949565,0.949565,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,0.788271,-0.788271,0.189608,-0.189608,0.783609,-0.783609,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,0.799824,-0.799824,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,0.756092,-0.756092,0.168168,-0.168168,0.751036,-0.751036,0.364558,-0.364558,0.169109,-0.169109,-1.527416,1.527416,0.162096,-0.162096,0.766252,-0.766252,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,0.236348,-0.236348,0.288847,-0.288847,0.799656,-0.799656,0.2961,-0.2961,0.797976,-0.797976,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,0.806901,-0.806901,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,1.730492,0.461638,0.227493,0.360419,0.429099,0.156715,0.450339,0.319993,-0.338037,-0.152719,0.429773,0.128467,0.148377,1.367998,0.955824,-0.691244,0.311849,0.155978,0.278780,0.177726,0.108203,0.410307,-0.400385,-0.691783,0.191485,-0.431410,-0.192758,0.387223,0.149883,-0.121029,1.282003,0.865275,-0.636975,0.050253,-0.385631,0.274243,0.168965,0.145504,0.328265,0.920275,-0.403487,0.257860,-0.309251,0.252283,0.315356,0.135713,-0.013437,1.280569,0.868694,-0.164173,0.168545,0.930459,0.370362,0.221954,0.253277,0.395966,0.073872,-0.281671,-0.519744,2.199369,-0.199694,0.038094,0.133224,0.090933,1.314892,0.904146,-0.349905,-0.226472,0.066013,0.209258,0.158825,-0.014773,0.350750,-0.075293,-0.422539,0.135975,-0.698435,-0.273513,0.540705,-0.037165,-0.968699,1.264188,0.839597,-0.242212,-0.063549,-0.071973
9995,-0.934813,0.934813,0.14859,-0.14859,-0.949565,0.949565,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,-1.464019,1.464019,0.230633,-0.230633,-1.465708,1.465708,0.447777,-0.447777,0.263461,-0.263461,0.770038,-0.770038,0.254891,-0.254891,-1.455289,1.455289,-0.265841,-0.305001,-0.169422,-0.305001,-3.237269,3.237269,-2.618143,2.618143,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,-0.233266,-0.233266,-0.203069,-0.193535,-3.437911,3.437911,-2.847202,2.847202,-0.191861,-0.191861,-1.322590,1.322590,0.168168,-0.168168,-1.331495,1.331495,0.364558,-0.364558,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,-1.305053,1.305053,-0.229428,-0.229428,-0.184742,-0.184742,-3.622552,3.622552,-2.941351,2.941351,0.236348,-0.236348,0.288847,-0.288847,-1.250538,1.250538,0.2961,-0.2961,-1.253171,1.253171,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,-1.239309,1.239309,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,1.730838,0.857439,-0.417309,0.589880,-0.524932,-0.301869,-0.006488,1.513277,-0.110165,1.269367,0.450084,0.128467,0.614924,-0.596614,-0.646773,2.999879,1.453466,-0.312117,0.775702,-0.444462,0.470280,-0.560022,-0.484320,-0.595527,1.217608,-0.115776,1.347549,0.257839,0.149883,0.535301,-0.818236,-0.842201,2.791755,1.159975,-0.489013,0.767286,-0.445177,0.481291,-0.569886,-0.580136,-0.592917,1.231517,-0.127757,1.298502,0.285071,0.135713,0.536268,-0.799802,-0.825541,2.775512,1.158369,-0.586492,0.805937,-0.436971,0.524245,-0.547218,-0.420555,-0.582476,1.540086,0.291007,1.268177,0.305805,0.133224,0.550965,-0.705302,-0.742462,3.188812,1.550442,-0.429106,0.764139,-0.452583,0.480851,-0.575104,-0.620458,-0.531989,1.217982,0.225503,1.245259,0.001226,-0.037165,0.634811,-0.886124,-0.902211,1.028878,1.368367,-0.619594
9996,-0.934813,0.934813,0.14859,-0.14859,-0.949565,0.949565,0.528797,-0.528797,0.160465,-0.160465,0.894971,-0.894971,0.077697,-0.077697,-0.902608,0.902608,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,-2.233254,2.233254,0.263461,-0.263461,-1.298637,1.298637,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,-1.268599,1.268599,0.189608,-0.189608,-1.276146,1.276146,0.368771,-0.368771,0.181534,-0.181534,0.643495,-0.643495,0.174355,-0.174355,-1.250276,1.250276,4.286946,4.286946,-0.203069,-0.193535,0.290874,-0.290874,-2.847202,2.847202,-0.191861,-0.191861,0.756092,-0.756092,0.168168,-0.168168,0.751036,-0.751036,0.364558,-0.364558,0.169109,-0.169109,0.654700,-0.654700,0.162096,-0.162096,0.766252,-0.766252,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,-4.231040,4.231040,-3.462039,3.462039,-1.250538,1.250538,0.2961,-0.2961,-1.253171,1.253171,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,-1.239309,1.239309,-0.226759,-0.407843,-0.173134,-0.407843,-2.660214,2.660214,-2.328511,2.328511,1.731185,0.855303,-0.417309,0.589880,-0.524932,0.733448,-0.349108,1.517082,0.345580,-0.152719,0.347382,0.128467,0.614924,-0.597189,-0.646773,0.785205,1.572305,0.733989,0.773019,-0.444462,0.470280,-0.560022,0.269377,-0.691783,0.874960,-0.220987,-0.500819,-4.647029,-10.563581,0.535301,-0.818851,-0.842201,0.097753,0.853683,0.270012,0.764625,-0.445177,0.481291,-0.569886,0.614702,-0.214058,0.935576,-0.309251,0.102824,0.305992,0.135713,0.536268,-0.800411,-0.825541,1.060696,0.848735,0.618404,0.803585,-0.436971,0.524245,-0.547218,0.670828,-0.181403,0.984026,-0.663175,-0.199694,0.388331,0.133224,0.550965,-0.705893,-0.742462,0.466722,0.913574,0.674022,0.761143,-0.452583,0.480851,-0.575104,0.592142,-0.313090,0.847006,-0.434452,-0.273513,0.161569,-0.037165,0.634811,-0.886754,-0.902211,0.120956,0.764108,0.600508
9997,1.069733,-1.069733,0.14859,-0.14859,1.053114,-1.053114,0.528797,-0.528797,0.160465,-0.160465,-1.117355,1.117355,0.077697,-0.077697,1.107901,-1.107901,0.683051,-0.683051,0.230633,-0.230633,0.682264,-0.682264,0.447777,-0.447777,0.263461,-0.263461,-1.298637,1.298637,0.254891,-0.254891,0.687149,-0.687149,-0.265841,-0.305001,-0.169422,-0.305001,0.308902,-0.308902,0.381950,-0.381950,-0.235167,-0.235167,0.788271,-0.788271,0.189608,-0.189608,0.783609,-0.783609,0.368771,-0.368771,0.181534,-0.181534,-1.554014,1.554014,0.174355,-0.174355,0.799824,-0.799824,-0.233266,-0.233266,-0.203069,-0.193535,0.290874,-0.290874,0.351222,-0.351222,-0.191861,-0.191861,0.756092,-0.756092,0.168168,-0.168168,0.751036,-0.751036,0.364558,-0.364558,0.169109,-0.169109,-1.527416,1.527416,0.162096,-0.162096,0.766252,-0.766252,-0.229428,-0.229428,-0.184742,-0.184742,0.276049,-0.276049,0.339980,-0.339980,0.236348,-0.236348,0.288847,-0.288847,0.799656,-0.799656,0.2961,-0.2961,0.797976,-0.797976,0.486397,-0.486397,0.322692,-0.322692,0.786272,-0.786272,0.31661,-0.31661,0.806901,-0.806901,-0.226759,-0.407843,-0.173134,-0.407843,0.375910,-0.375910,0.429459,-0.429459,1.731531,0.627424,-0.417309,0.589880,-0.524932,0.407994,0.107719,-0.060162,0.801325,0.416115,0.378202,0.128467,0.614924,-0.639185,-0.629520,0.046980,-0.001291,0.407661,0.486921,-0.444462,0.470280,-0.560022,0.332080,0.270776,-0.199887,-0.431410,0.423365,0.375434,0.149883,0.535301,-0.863747,-0.823818,-0.147156,-0.332716,0.333638,0.480759,-0.445177,0.481291,-0.569886,0.205081,0.259516,-0.172311,-0.399998,0.401743,0.386967,0.135713,0.536268,-0.844882,-0.807301,-0.164173,-0.312404,0.206353,0.552807,-0.436971,0.524245,-0.547218,0.329661,0.019133,-0.834587,1.245188,0.387454,0.193739,0.133224,0.550965,-0.749077,-0.724735,-0.349905,-0.697667,0.329907,0.441676,-0.452583,0.480851,-0.575104,0.213083,-0.531989,-1.055093,1.809396,0.333996,-1.047013,0.620360,0.634811,-0.932720,-0.883459,-0.060628,-0.252891,0.219903


### Slicing train labels to obtain the required train labels for the given 9,999 train data

In [33]:
# slicing train labels to obtain the required train labels for the given 9,999 train data

temp = train_labels[:9999]
y = temp.drop(['id'], axis =1)
y

Unnamed: 0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,y17,y18,y19,y20,y21,y22,y23,y24,y25,y26,y27,y28,y29,y30,y31,y32,y33
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9995,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
9996,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9997,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


### Splitting the data for training and validation

In [34]:
X_train, X_test, y_train, y_test = train_test_split(final_train, y, test_size=0.3, random_state=3)

### Applying GridSearchCV to find the best parameter of the RandomForestClassifier model

In [35]:
params = {'n_estimators' : [100, 500], 'max_depth' : [15, 20]}

model = RandomForestClassifier()

clf = GridSearchCV(model,params, cv= 3, scoring='f1_micro', return_train_score=True, n_jobs = -1)

clf.fit(X_train, y_train)

### Here's the best parameters

In [36]:
print(clf.best_params_)

{'max_depth': 20, 'n_estimators': 500}


### Defining the RFC with the best parameters

In [37]:
v_rfc = RandomForestClassifier(max_depth=20, n_estimators=500)
#v_rfc = RandomForestClassifier(n_estimators=10, criterion='entropy')

v_rfc.fit(X_train, y_train)

### After train the model, applying to prediction

In [38]:
v_rfc_predict = v_rfc.predict(X_test)

In [39]:
v_rfc_predict

array([[0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1]], dtype=int64)

### After prediction, Check the accuracy score

In [40]:
print('Accuracy of the training set: ', accuracy_score(y_test, v_rfc_predict))

Accuracy of the training set:  0.722


### Classification report

In [41]:
print('Classification report of the training set: \n', classification_report(y_test, v_rfc_predict))

Classification report of the training set: 
               precision    recall  f1-score   support

           0       1.00      0.09      0.17        22
           1       0.00      0.00      0.00         2
           2       0.98      0.60      0.75        68
           3       1.00      0.84      0.91        43
           4       0.00      0.00      0.00         1
           5       0.89      0.60      0.72       239
           6       1.00      0.31      0.47       116
           7       0.00      0.00      0.00         2
           8       0.95      0.62      0.75       223
           9       0.90      0.41      0.57        46
          10       0.00      0.00      0.00         1
          11       0.87      0.55      0.67       233
          12       1.00      0.42      0.60        40
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         5
          15       0.92      0.38      0.53        32
          16       0.00      0.00   

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


### Same process for the test set

In [43]:
# Load test data
test_features = pd.read_csv('./DATA Scientist Assignment/test.csv')

In [44]:
test = pd.DataFrame(test_features.values, columns=train_features.columns)

In [45]:
test

Unnamed: 0,id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100,x101,x102,x103,x104,x105,x106,x107,x108,x109,x110,x111,x112,x113,x114,x115,x116,x117,x118,x119,x120,x121,x122,x123,x124,x125,x126,x127,x128,x129,x130,x131,x132,x133,x134,x135,x136,x137,x138,x139,x140,x141,x142,x143,x144,x145
0,1698002,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,3,0.8955,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,1,0.945032,0.471747,NO,NO,NO,NO,cr+kkNnNFV9YL0vz029hk3ohIDmGuABRVNhFe0ePZyo=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,0.832679,0.049834,0.945938,0.317427,0.480308,NO,NO,YES,NO,NO,4,0.804893,3,7,0.966777,0.0,0.84,4672,3311,NO,NO,NO,3,0.838115,0.471318,6Wv/YGbS0KDFv2UrATvlJcVtCjJOgnVQzWuuF5Ltv2k=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,1,0.725763,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,0,0.775294,0.471747,dBSc/QZM58O6miC4ULLhY0C4S6WIZLwy2oERlRo7Iaw=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,1,0.725763,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,0,0.775294,0.471747,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,0,0.725763,17,7,0.876992,1.0,0.866667,4672,3311,NO,NO,NO,5,0.945032,0.471318
1,1698003,NO,NO,MeBJ/ZzEIXfNKat4w1oeDxiMNKrAeY0PH41i00hpYDo=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,YES,YES,YES,4,0.536996,2,10,0.979821,0.0,1.0,1263,892,NO,NO,NO,8,0.557175,0.693587,NO,NO,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,0+HOWI7C643OXIZ7LdoeNFzA7A0MkSH//diziszbII8=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,eI6XNOFa+S9J5wTICUEas3M6YE4TJ6rSTCo2NvqM16s=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,1.415919,0.0,1.0,0.0,0.703088,NO,NO,YES,YES,YES,0,0.466368,9,10,0.919283,-1.0,1.0,1263,892,NO,NO,NO,8,0.557175,0.693587
2,1698004,,,,,0.0,0.0,0.0,0.0,0.0,,,,,,0,0.0,0,0,0.0,0.0,0.0,0,0,,,,0,0.0,0.0,NO,NO,NO,NO,+X/rq+4qeKy2UVQN5xLQHSbri0Vdd/mYdG7WH/Kn+Pg=,oo9tGpHvTredpg9JkHgYbZAuxcwtSpQxU5mA/zUbxY8=,0.646703,0.059891,0.92196,0.53176,0.386558,NO,NO,NO,NO,NO,2,0.771325,6,9,0.939504,0.0,0.88,4672,3306,NO,NO,NO,3,0.831821,0.377568,nxOTdokWdl63qtAt4sgkSvyREdGu+ht1Iw4pcCaMi64=,YES,NO,QuSoZ63+jzXv3Ux1Y2LB+aSj7QCdPbrup3JSBj6Dmx8=,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,0.646703,0.059891,0.92196,0.53176,0.427226,YES,NO,YES,NO,YES,3,0.833031,1,7,0.992136,0.0,0.93,4672,3306,YES,NO,YES,1,0.840895,0.419949,B+EJpnEbkYtLnwDQYN1dP1rcfnoCnxAjKLYwQZE07Ew=,NO,NO,JA7Y7rY1wqr46lhqu5scuWov+Xw4neBTlPOPObhG0V8=,PjdztVAv+7nhxs8/+uAT0IRL7OcEHhLwHZ0IDTVbqcs=,0.646703,0.059891,0.92196,0.53176,0.416096,NO,NO,NO,NO,NO,13,0.161525,25,3,0.749546,0.0,0.86,4672,3306,NO,NO,NO,1,0.411978,0.405394,0.646703,0.059891,0.92196,0.53176,0.413099,YES,NO,NO,NO,NO,2,0.823351,5,3,0.952813,0.0,0.87,4672,3306,YES,NO,YES,0,0.870538,0.405822
3,1698005,NO,NO,uduY7XWJ8eFgTltv5P0rPh5GW6KwBu+tPFH13uQRN+0=,0L7+hNDV8S57etySgdljbm2AK1zQuLP77lGk2hyEmCo=,1.129212,0.08702,0.81424,1.112804,0.874318,NO,NO,NO,NO,NO,5,0.238793,3,10,0.990331,0.0,0.73,4400,3413,NO,NO,NO,5,0.248462,0.87,NO,NO,NO,NO,j3r3dqveRaSq13FDqUkrD+rGAW5DmYjnfEdKfVic3Do=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.129212,0.08702,0.81424,1.112804,0.869091,NO,NO,YES,NO,YES,4,0.220334,2,38,0.992089,0.0,0.9,4400,3413,NO,NO,NO,7,0.228245,0.864773,8Whd23AFTt1KV61HEnaVzYZCSZsw5sqqmf4WUmWd3bQ=,NO,NO,IL4RpsAr/2F3DyhivH6T3zNTVSv14noGCEuy6VR0R0A=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,1.155289,0.087313,0.853794,1.132142,0.884545,NO,NO,YES,NO,NO,1,0.213009,3,38,0.990038,0.0,0.9,4400,3413,NO,NO,NO,2,0.222971,0.880909,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,NO,NO,ZWvUbzI3yDZ8UGVOrIfsQCDpIBme7hoxGV079DPM78g=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,1.129212,0.08702,0.81424,1.112804,0.874318,NO,NO,YES,NO,NO,2,0.208907,3,10,0.990624,0.0,0.88,4400,3413,NO,NO,NO,1,0.218283,0.870909,1.129212,0.08702,0.81424,1.112804,0.874318,YES,NO,YES,NO,YES,5,0.221213,1,10,0.996484,0.0,0.87,4400,3413,YES,NO,YES,2,0.224729,0.870909
4,1698006,NO,NO,kM4KU87XvnvKRvf4dN3Tu4zQYq8fpcqhDTFADWdfCg8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,1.415919,0.0,1.0,0.0,0.232779,NO,NO,YES,NO,NO,2,0.516816,2,8,0.979821,0.0,1.0,1263,892,NO,NO,NO,6,0.536996,0.223278,NO,NO,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,sFxo2PpANwP+wYs1Jr0X/Mj94f3w/wzTeq8mKe0DrE8=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,A0qW1vORvEPazvD0/mLe06Hm//wfSuxG8qBnFBj8RlI=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,1.415919,0.0,1.0,0.0,0.232779,YES,NO,YES,NO,YES,0,0.456278,7,8,0.939462,0.0,1.0,1263,892,YES,NO,YES,6,0.536996,0.223278
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1994,1699996,NO,NO,Z6vucL/W0MPoFsgu2ewNXrvNCAQFiKzUJTYuqh6lP28=,yhI9Bw5Q8l1vEll4sw/Tem/jojpE9KwjKvQQIyrAqgU=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,NO,4,0.174292,3,3,0.976035,0.0,1.0,1188,918,YES,NO,YES,2,0.198257,0.155724,NO,NO,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,XwGF4l3BgP/6u3gHERvOObKrF89JjyzD07RwwVRnm8E=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,8W2jqgU9VhlI3ledX0DtFS20qRfWG4lMiGtXVF629RE=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,0,0.135076,8,3,0.944444,0.0,1.0,1188,918,YES,NO,YES,2,0.198257,0.155724
1995,1699997,NO,NO,LKQ9Uh6tQ3ZrIxAKaPaDEuiYFunnK/2d+oKAfpN9tuY=,h0cPLYjd7nmw9FJsQA+KUsnChH0SajbHjNdfMk47k9o=,1.020217,0.583944,0.625842,1.003516,0.791136,YES,NO,NO,NO,NO,2,0.583944,4,1,0.958101,0.0,0.59,4400,3413,YES,NO,YES,0,0.625842,0.778409,NO,NO,NO,NO,Pgia2elmEzkgkUe2GEDPm8O69qm5+2ZzhkpklIsrHvY=,1CiKJR7D66tRwH6l6wwv0p+D/tAuoW+NdSNqPTbvDoQ=,0.455318,0.556402,0.601231,0.430999,0.352955,NO,NO,NO,NO,NO,7,0.557574,7,2,0.973938,0.0,0.88,4400,3413,NO,NO,NO,0,0.587167,0.344545,I5T+alDpnWyWctJmKqYNmZr8hGc/SQiqLjHrOANRIC8=,NO,NO,5Vnx3es11UozOKalAqzD1x1/2jTqcXhGPWiBVQuHvmg=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.064752,0.249341,0.582479,1.004102,0.807045,NO,NO,YES,NO,NO,3,0.558746,2,6,0.980076,0.0,0.81,4400,3413,NO,NO,NO,0,0.57867,0.794773,Eu/ebH1jFr/SVuF7llNoE1CXbg3gE56Ub+r7IBTtjd4=,NO,NO,955UFNdEoKo3fP1s6fReA0xsn6MsFK4DkaQQm93osL4=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,1.064752,0.249341,0.582479,1.004102,0.792045,NO,NO,YES,NO,NO,2,0.474656,4,5,0.954585,0.0,0.81,4400,3413,NO,NO,NO,0,0.52007,0.779773,1.064752,0.249341,0.582479,1.004102,0.791818,YES,NO,NO,NO,NO,3,0.532083,5,5,0.949604,0.0,0.72,4400,3413,YES,NO,YES,0,0.582479,0.778864
1996,1699998,NO,NO,/tuZYGMsFx4A/Ou+jSol6t/TpLRkSl8Ku+1tnQPvwww=,aLEeZ8ZFKt2jQfkG5e9Nmad+QJlfpPmSfQS3CHlL6Ik=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,NO,NO,NO,6,0.910882,4,13,0.980294,0.0,0.88,4400,3400,NO,NO,NO,1,0.930588,0.202045,NO,NO,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,L28qavC1qdxbQKqPAazKNeInnw7SbaN12h48g/VWSEg=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,zu3rCMl9Lih8mHQaU2J5ysGZFDHk5hK0vNqdMtMLty0=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,NO,NO,NO,0,0.897647,7,13,0.972104,0.0,0.845,4400,3400,NO,NO,NO,5,0.930588,0.201591
1997,1699999,NO,NO,uMIU2KDOxlgzhYToCFCa3nMxIOPV0WqCnKWfooGaw+8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,1.220588,0.102059,0.326176,1.213824,0.942955,NO,NO,YES,NO,NO,1,0.249412,2,4,0.992941,0.0,0.74,4400,3400,NO,NO,NO,1,0.256471,0.938864,NO,NO,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.146765,28,4,0.900588,0.0,0.6,4400,3400,YES,NO,YES,1,0.246176,0.938182,M0zOutuBgj8hmJP9FcA45DVyx1UWt9h0BeRt2RnOQWY=,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.146765,28,4,0.900588,0.0,0.6,4400,3400,YES,NO,YES,1,0.246176,0.938182,Mv+vwn4wgOtw5YJEqp0pSfj55e8MJqCky24z5NQHTbU=,NO,NO,/c/cfJxPOyk5eh9imrgWGV2z+2Os2fEq5KmBNBTNPqQ=,QIHEzfEYHubEp9c6aGZBHgEzfU0l0BWn+C3bAM0M51A=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.102059,16,4,0.959412,0.0,0.69,4400,3400,YES,NO,YES,0,0.142647,0.938864,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,YES,NO,YES,0,0.102059,48,4,0.859748,1.5,0.676667,4400,3400,NO,NO,NO,2,0.256471,0.938182


In [46]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1999 entries, 0 to 1998
Columns: 146 entries, id to x145
dtypes: object(146)
memory usage: 2.2+ MB


In [47]:
### There's no duplicate values.

In [48]:
test_df = test.copy()

In [49]:
test_df.isna().sum().values

array([  0, 294, 294, 294, 294,   0,   0,   0,   0,   0, 294, 294, 294,
       294, 294,   0,   0,   0,   0,   0,   0,   0,   0,   0, 294, 294,
       294,   0,   0,   0,   0,   0,  73,  73,  73,  73,   0,   0,   0,
         0,   0,  73,  73,  73,  73,  73,   0,   0,   0,   0,   0,   0,
         0,   0,   0,  73,  73,  73,   0,   0,   0,   0,  81,  81,  81,
        81,   0,   0,   0,   0,   0,  81,  81,  81,  81,  81,   0,   0,
         0,   0,   0,   0,   0,   0,   0,  81,  81,  81,   0,   0,   0,
         0, 171, 171, 171, 171,   0,   0,   0,   0,   0, 171, 171, 171,
       171, 171,   0,   0,   0,   0,   0,   0,   0,   0,   0, 171, 171,
       171,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0], dtype=int64)

In [50]:
test_features = binary_features(test_df)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[yes_indices, col] = 'YES'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[no_indices, col] = 'NO'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[yes_indices, col] = 'YES'
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  binary_features.loc[no_indices, 

###  Treating Hash fields
    It's not possible to "dehash" or reverse a cryptographic hash function like SHA-256 back to its original text form using Python or any other programming language.
    This is because cryptographic hash functions are designed to be one-way, meaning they are easy to compute in one direction but extremely difficult to reverse.

In [52]:
test_features = multi_category(test_df)

In [53]:
test_features

Unnamed: 0,id,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100,x101,x102,x103,x104,x105,x106,x107,x108,x109,x110,x111,x112,x113,x114,x115,x116,x117,x118,x119,x120,x121,x122,x123,x124,x125,x126,x127,x128,x129,x130,x131,x132,x133,x134,x135,x136,x137,x138,x139,x140,x141,x142,x143,x144,x145
0,1698002,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,3,0.8955,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,1,0.945032,0.471747,NO,NO,NO,NO,cr+kkNnNFV9YL0vz029hk3ohIDmGuABRVNhFe0ePZyo=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,0.832679,0.049834,0.945938,0.317427,0.480308,NO,NO,YES,NO,NO,4,0.804893,3,7,0.966777,0.0,0.84,4672,3311,NO,NO,NO,3,0.838115,0.471318,6Wv/YGbS0KDFv2UrATvlJcVtCjJOgnVQzWuuF5Ltv2k=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,1,0.725763,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,0,0.775294,0.471747,dBSc/QZM58O6miC4ULLhY0C4S6WIZLwy2oERlRo7Iaw=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,1,0.725763,6,7,0.950468,0.0,0.88,4672,3311,YES,NO,YES,0,0.775294,0.471747,0.832679,0.049834,0.945938,0.317427,0.482021,YES,NO,NO,NO,NO,0,0.725763,17,7,0.876992,1.0,0.866667,4672,3311,NO,NO,NO,5,0.945032,0.471318
1,1698003,NO,NO,MeBJ/ZzEIXfNKat4w1oeDxiMNKrAeY0PH41i00hpYDo=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,YES,YES,YES,4,0.536996,2,10,0.979821,0.0,1.0,1263,892,NO,NO,NO,8,0.557175,0.693587,NO,NO,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,0+HOWI7C643OXIZ7LdoeNFzA7A0MkSH//diziszbII8=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,eI6XNOFa+S9J5wTICUEas3M6YE4TJ6rSTCo2NvqM16s=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,1.415919,0.0,1.0,0.0,0.703088,NO,NO,NO,NO,NO,3,0.466368,6,10,0.939462,0.0,1.0,1263,892,NO,NO,NO,7,0.526906,0.693587,1.415919,0.0,1.0,0.0,0.703088,NO,NO,YES,YES,YES,0,0.466368,9,10,0.919283,-1.0,1.0,1263,892,NO,NO,NO,8,0.557175,0.693587
2,1698004,NO,NO,,,0.0,0.0,0.0,0.0,0.0,YES,YES,YES,NO,YES,0,0.0,0,0,0.0,0.0,0.0,0,0,NO,NO,NO,0,0.0,0.0,NO,NO,NO,NO,+X/rq+4qeKy2UVQN5xLQHSbri0Vdd/mYdG7WH/Kn+Pg=,oo9tGpHvTredpg9JkHgYbZAuxcwtSpQxU5mA/zUbxY8=,0.646703,0.059891,0.92196,0.53176,0.386558,NO,NO,NO,NO,NO,2,0.771325,6,9,0.939504,0.0,0.88,4672,3306,NO,NO,NO,3,0.831821,0.377568,nxOTdokWdl63qtAt4sgkSvyREdGu+ht1Iw4pcCaMi64=,YES,NO,QuSoZ63+jzXv3Ux1Y2LB+aSj7QCdPbrup3JSBj6Dmx8=,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,0.646703,0.059891,0.92196,0.53176,0.427226,YES,NO,YES,NO,YES,3,0.833031,1,7,0.992136,0.0,0.93,4672,3306,YES,NO,YES,1,0.840895,0.419949,B+EJpnEbkYtLnwDQYN1dP1rcfnoCnxAjKLYwQZE07Ew=,NO,NO,JA7Y7rY1wqr46lhqu5scuWov+Xw4neBTlPOPObhG0V8=,PjdztVAv+7nhxs8/+uAT0IRL7OcEHhLwHZ0IDTVbqcs=,0.646703,0.059891,0.92196,0.53176,0.416096,NO,NO,NO,NO,NO,13,0.161525,25,3,0.749546,0.0,0.86,4672,3306,NO,NO,NO,1,0.411978,0.405394,0.646703,0.059891,0.92196,0.53176,0.413099,YES,NO,NO,NO,NO,2,0.823351,5,3,0.952813,0.0,0.87,4672,3306,YES,NO,YES,0,0.870538,0.405822
3,1698005,NO,NO,uduY7XWJ8eFgTltv5P0rPh5GW6KwBu+tPFH13uQRN+0=,0L7+hNDV8S57etySgdljbm2AK1zQuLP77lGk2hyEmCo=,1.129212,0.08702,0.81424,1.112804,0.874318,NO,NO,NO,NO,NO,5,0.238793,3,10,0.990331,0.0,0.73,4400,3413,NO,NO,NO,5,0.248462,0.87,NO,NO,NO,NO,j3r3dqveRaSq13FDqUkrD+rGAW5DmYjnfEdKfVic3Do=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.129212,0.08702,0.81424,1.112804,0.869091,NO,NO,YES,NO,YES,4,0.220334,2,38,0.992089,0.0,0.9,4400,3413,NO,NO,NO,7,0.228245,0.864773,8Whd23AFTt1KV61HEnaVzYZCSZsw5sqqmf4WUmWd3bQ=,NO,NO,IL4RpsAr/2F3DyhivH6T3zNTVSv14noGCEuy6VR0R0A=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,1.155289,0.087313,0.853794,1.132142,0.884545,NO,NO,YES,NO,NO,1,0.213009,3,38,0.990038,0.0,0.9,4400,3413,NO,NO,NO,2,0.222971,0.880909,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,NO,NO,ZWvUbzI3yDZ8UGVOrIfsQCDpIBme7hoxGV079DPM78g=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,1.129212,0.08702,0.81424,1.112804,0.874318,NO,NO,YES,NO,NO,2,0.208907,3,10,0.990624,0.0,0.88,4400,3413,NO,NO,NO,1,0.218283,0.870909,1.129212,0.08702,0.81424,1.112804,0.874318,YES,NO,YES,NO,YES,5,0.221213,1,10,0.996484,0.0,0.87,4400,3413,YES,NO,YES,2,0.224729,0.870909
4,1698006,NO,NO,kM4KU87XvnvKRvf4dN3Tu4zQYq8fpcqhDTFADWdfCg8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,1.415919,0.0,1.0,0.0,0.232779,NO,NO,YES,NO,NO,2,0.516816,2,8,0.979821,0.0,1.0,1263,892,NO,NO,NO,6,0.536996,0.223278,NO,NO,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,sFxo2PpANwP+wYs1Jr0X/Mj94f3w/wzTeq8mKe0DrE8=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,A0qW1vORvEPazvD0/mLe06Hm//wfSuxG8qBnFBj8RlI=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,1.415919,0.0,1.0,0.0,0.232779,YES,NO,NO,NO,YES,7,0.456278,4,8,0.959641,0.0,1.0,1263,892,YES,NO,YES,5,0.496637,0.223278,1.415919,0.0,1.0,0.0,0.232779,YES,NO,YES,NO,YES,0,0.456278,7,8,0.939462,0.0,1.0,1263,892,YES,NO,YES,6,0.536996,0.223278
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1994,1699996,NO,NO,Z6vucL/W0MPoFsgu2ewNXrvNCAQFiKzUJTYuqh6lP28=,yhI9Bw5Q8l1vEll4sw/Tem/jojpE9KwjKvQQIyrAqgU=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,NO,4,0.174292,3,3,0.976035,0.0,1.0,1188,918,YES,NO,YES,2,0.198257,0.155724,NO,NO,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,XwGF4l3BgP/6u3gHERvOObKrF89JjyzD07RwwVRnm8E=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,8W2jqgU9VhlI3ledX0DtFS20qRfWG4lMiGtXVF629RE=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,7,0.135076,4,3,0.96841,0.0,1.0,1188,918,YES,NO,YES,1,0.166667,0.155724,1.294118,0.0,1.0,0.0,0.164141,YES,NO,NO,NO,YES,0,0.135076,8,3,0.944444,0.0,1.0,1188,918,YES,NO,YES,2,0.198257,0.155724
1995,1699997,NO,NO,LKQ9Uh6tQ3ZrIxAKaPaDEuiYFunnK/2d+oKAfpN9tuY=,h0cPLYjd7nmw9FJsQA+KUsnChH0SajbHjNdfMk47k9o=,1.020217,0.583944,0.625842,1.003516,0.791136,YES,NO,NO,NO,NO,2,0.583944,4,1,0.958101,0.0,0.59,4400,3413,YES,NO,YES,0,0.625842,0.778409,NO,NO,NO,NO,Pgia2elmEzkgkUe2GEDPm8O69qm5+2ZzhkpklIsrHvY=,1CiKJR7D66tRwH6l6wwv0p+D/tAuoW+NdSNqPTbvDoQ=,0.455318,0.556402,0.601231,0.430999,0.352955,NO,NO,NO,NO,NO,7,0.557574,7,2,0.973938,0.0,0.88,4400,3413,NO,NO,NO,0,0.587167,0.344545,I5T+alDpnWyWctJmKqYNmZr8hGc/SQiqLjHrOANRIC8=,NO,NO,5Vnx3es11UozOKalAqzD1x1/2jTqcXhGPWiBVQuHvmg=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,1.064752,0.249341,0.582479,1.004102,0.807045,NO,NO,YES,NO,NO,3,0.558746,2,6,0.980076,0.0,0.81,4400,3413,NO,NO,NO,0,0.57867,0.794773,Eu/ebH1jFr/SVuF7llNoE1CXbg3gE56Ub+r7IBTtjd4=,NO,NO,955UFNdEoKo3fP1s6fReA0xsn6MsFK4DkaQQm93osL4=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,1.064752,0.249341,0.582479,1.004102,0.792045,NO,NO,YES,NO,NO,2,0.474656,4,5,0.954585,0.0,0.81,4400,3413,NO,NO,NO,0,0.52007,0.779773,1.064752,0.249341,0.582479,1.004102,0.791818,YES,NO,NO,NO,NO,3,0.532083,5,5,0.949604,0.0,0.72,4400,3413,YES,NO,YES,0,0.582479,0.778864
1996,1699998,NO,NO,/tuZYGMsFx4A/Ou+jSol6t/TpLRkSl8Ku+1tnQPvwww=,aLEeZ8ZFKt2jQfkG5e9Nmad+QJlfpPmSfQS3CHlL6Ik=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,NO,NO,NO,6,0.910882,4,13,0.980294,0.0,0.88,4400,3400,NO,NO,NO,1,0.930588,0.202045,NO,NO,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,L28qavC1qdxbQKqPAazKNeInnw7SbaN12h48g/VWSEg=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,zu3rCMl9Lih8mHQaU2J5ysGZFDHk5hK0vNqdMtMLty0=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,YES,NO,NO,6,0.897647,2,13,0.990294,0.0,0.81,4400,3400,NO,NO,NO,4,0.907353,0.201591,0.354706,0.550882,0.930882,0.207941,0.2075,NO,NO,NO,NO,NO,0,0.897647,7,13,0.972104,0.0,0.845,4400,3400,NO,NO,NO,5,0.930588,0.201591
1997,1699999,NO,NO,uMIU2KDOxlgzhYToCFCa3nMxIOPV0WqCnKWfooGaw+8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,1.220588,0.102059,0.326176,1.213824,0.942955,NO,NO,YES,NO,NO,1,0.249412,2,4,0.992941,0.0,0.74,4400,3400,NO,NO,NO,1,0.256471,0.938864,NO,NO,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.146765,28,4,0.900588,0.0,0.6,4400,3400,YES,NO,YES,1,0.246176,0.938182,M0zOutuBgj8hmJP9FcA45DVyx1UWt9h0BeRt2RnOQWY=,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.146765,28,4,0.900588,0.0,0.6,4400,3400,YES,NO,YES,1,0.246176,0.938182,Mv+vwn4wgOtw5YJEqp0pSfj55e8MJqCky24z5NQHTbU=,NO,NO,/c/cfJxPOyk5eh9imrgWGV2z+2Os2fEq5KmBNBTNPqQ=,QIHEzfEYHubEp9c6aGZBHgEzfU0l0BWn+C3bAM0M51A=,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,NO,NO,NO,1,0.102059,16,4,0.959412,0.0,0.69,4400,3400,YES,NO,YES,0,0.142647,0.938864,1.220588,0.102059,0.326176,1.213824,0.942955,YES,NO,YES,NO,YES,0,0.102059,48,4,0.859748,1.5,0.676667,4400,3400,NO,NO,NO,2,0.256471,0.938182


### Getting the Non numerical columns

In [57]:
test_df_cat1 = test_features[['x1', 'x2', 'x3', 'x4', 'x10', 'x11', 'x12', 'x13', 'x14', 'x24', 'x25',
       'x26', 'x30', 'x31', 'x32', 'x33', 'x34', 'x35', 'x41', 'x42', 'x43',
       'x44', 'x45', 'x55', 'x56', 'x57', 'x61', 'x62', 'x63', 'x64', 'x65',
       'x71', 'x72', 'x73', 'x74', 'x75', 'x85', 'x86', 'x87', 'x91', 'x92',
       'x93', 'x94', 'x95', 'x101', 'x102', 'x103', 'x104', 'x105', 'x115',
       'x116', 'x117', 'x126', 'x127', 'x128', 'x129', 'x130', 'x140', 'x141',
       'x142']].copy()

In [58]:
test_df_cat1

Unnamed: 0,x1,x2,x3,x4,x10,x11,x12,x13,x14,x24,x25,x26,x30,x31,x32,x33,x34,x35,x41,x42,x43,x44,x45,x55,x56,x57,x61,x62,x63,x64,x65,x71,x72,x73,x74,x75,x85,x86,x87,x91,x92,x93,x94,x95,x101,x102,x103,x104,x105,x115,x116,x117,x126,x127,x128,x129,x130,x140,x141,x142
0,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,YES,NO,NO,NO,NO,YES,NO,YES,NO,NO,NO,NO,cr+kkNnNFV9YL0vz029hk3ohIDmGuABRVNhFe0ePZyo=,X6dDAI/DZOWvu0Dg6gCgRoNr2vTUz/mc4SdHTNUPS38=,NO,NO,YES,NO,NO,NO,NO,NO,6Wv/YGbS0KDFv2UrATvlJcVtCjJOgnVQzWuuF5Ltv2k=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,YES,NO,NO,NO,NO,YES,NO,YES,dBSc/QZM58O6miC4ULLhY0C4S6WIZLwy2oERlRo7Iaw=,NO,NO,9ACcuXc7MMm9V7jZSr3P3VxAKyMvLAtsdwPKwgncc+k=,WV5vAHFyqkeuyFB5KVNGFOBuwjkUGKYc8wh9QfpVzAA=,YES,NO,NO,NO,NO,YES,NO,YES,YES,NO,NO,NO,NO,NO,NO,NO
1,NO,NO,MeBJ/ZzEIXfNKat4w1oeDxiMNKrAeY0PH41i00hpYDo=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,YES,YES,NO,NO,NO,NO,NO,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,NO,NO,NO,NO,NO,NO,NO,NO,0+HOWI7C643OXIZ7LdoeNFzA7A0MkSH//diziszbII8=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,NO,NO,NO,NO,NO,NO,NO,NO,eI6XNOFa+S9J5wTICUEas3M6YE4TJ6rSTCo2NvqM16s=,NO,NO,W4WKBYyqy7COFohWfUyg2xuUPldk8b4hPsDCHjbePaU=,ABhwE8nsQMtFix2MDemgmGfoV68Tn3hLpZRXWVTQ0TM=,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,YES,YES,YES,NO,NO,NO
2,NO,NO,,,YES,YES,YES,NO,YES,NO,NO,NO,NO,NO,NO,NO,+X/rq+4qeKy2UVQN5xLQHSbri0Vdd/mYdG7WH/Kn+Pg=,oo9tGpHvTredpg9JkHgYbZAuxcwtSpQxU5mA/zUbxY8=,NO,NO,NO,NO,NO,NO,NO,NO,nxOTdokWdl63qtAt4sgkSvyREdGu+ht1Iw4pcCaMi64=,YES,NO,QuSoZ63+jzXv3Ux1Y2LB+aSj7QCdPbrup3JSBj6Dmx8=,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,YES,NO,YES,NO,YES,YES,NO,YES,B+EJpnEbkYtLnwDQYN1dP1rcfnoCnxAjKLYwQZE07Ew=,NO,NO,JA7Y7rY1wqr46lhqu5scuWov+Xw4neBTlPOPObhG0V8=,PjdztVAv+7nhxs8/+uAT0IRL7OcEHhLwHZ0IDTVbqcs=,NO,NO,NO,NO,NO,NO,NO,NO,YES,NO,NO,NO,NO,YES,NO,YES
3,NO,NO,uduY7XWJ8eFgTltv5P0rPh5GW6KwBu+tPFH13uQRN+0=,0L7+hNDV8S57etySgdljbm2AK1zQuLP77lGk2hyEmCo=,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,j3r3dqveRaSq13FDqUkrD+rGAW5DmYjnfEdKfVic3Do=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,NO,YES,NO,NO,NO,8Whd23AFTt1KV61HEnaVzYZCSZsw5sqqmf4WUmWd3bQ=,NO,NO,IL4RpsAr/2F3DyhivH6T3zNTVSv14noGCEuy6VR0R0A=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,NO,NO,YES,NO,NO,NO,NO,NO,+yhSY//Hpg7u0bSA7NYmcmRFgv3bF4Tw3BMHrBqaTtA=,NO,NO,ZWvUbzI3yDZ8UGVOrIfsQCDpIBme7hoxGV079DPM78g=,ubHy/++3FVAS97znZOt7L+cjkZFJREIiJPRZEfRIztc=,NO,NO,YES,NO,NO,NO,NO,NO,YES,NO,YES,NO,YES,YES,NO,YES
4,NO,NO,kM4KU87XvnvKRvf4dN3Tu4zQYq8fpcqhDTFADWdfCg8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,NO,NO,YES,NO,NO,NO,NO,NO,NO,NO,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,YES,NO,NO,NO,YES,YES,NO,YES,sFxo2PpANwP+wYs1Jr0X/Mj94f3w/wzTeq8mKe0DrE8=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,YES,NO,NO,NO,YES,YES,NO,YES,A0qW1vORvEPazvD0/mLe06Hm//wfSuxG8qBnFBj8RlI=,NO,NO,M4ra4ZsGWR+veMAUMfpKnI4R8f5lQmz332MQ1RAcPGY=,BWlzsfzvLpUVVqvMBbjZ4zlrnQb/agQ7zCXv27i3RUw=,YES,NO,NO,NO,YES,YES,NO,YES,YES,NO,YES,NO,YES,YES,NO,YES
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1994,NO,NO,Z6vucL/W0MPoFsgu2ewNXrvNCAQFiKzUJTYuqh6lP28=,yhI9Bw5Q8l1vEll4sw/Tem/jojpE9KwjKvQQIyrAqgU=,YES,NO,NO,NO,NO,YES,NO,YES,NO,NO,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,YES,NO,NO,NO,YES,YES,NO,YES,XwGF4l3BgP/6u3gHERvOObKrF89JjyzD07RwwVRnm8E=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,YES,NO,NO,NO,YES,YES,NO,YES,8W2jqgU9VhlI3ledX0DtFS20qRfWG4lMiGtXVF629RE=,NO,NO,J0BXf75QysCCr1LQvwLwqKywUU/ARUnFbGjg+QdLpX8=,ilf2dGT7f31KRWK57leidnw87VyQr7sdJUl+Ffqrnjs=,YES,NO,NO,NO,YES,YES,NO,YES,YES,NO,NO,NO,YES,YES,NO,YES
1995,NO,NO,LKQ9Uh6tQ3ZrIxAKaPaDEuiYFunnK/2d+oKAfpN9tuY=,h0cPLYjd7nmw9FJsQA+KUsnChH0SajbHjNdfMk47k9o=,YES,NO,NO,NO,NO,YES,NO,YES,NO,NO,NO,NO,Pgia2elmEzkgkUe2GEDPm8O69qm5+2ZzhkpklIsrHvY=,1CiKJR7D66tRwH6l6wwv0p+D/tAuoW+NdSNqPTbvDoQ=,NO,NO,NO,NO,NO,NO,NO,NO,I5T+alDpnWyWctJmKqYNmZr8hGc/SQiqLjHrOANRIC8=,NO,NO,5Vnx3es11UozOKalAqzD1x1/2jTqcXhGPWiBVQuHvmg=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,NO,NO,NO,NO,NO,Eu/ebH1jFr/SVuF7llNoE1CXbg3gE56Ub+r7IBTtjd4=,NO,NO,955UFNdEoKo3fP1s6fReA0xsn6MsFK4DkaQQm93osL4=,3yK2OPj1uYDsoMgsxsjY1FxXkOllD8Xfh20VYGqT+nU=,NO,NO,YES,NO,NO,NO,NO,NO,YES,NO,NO,NO,NO,YES,NO,YES
1996,NO,NO,/tuZYGMsFx4A/Ou+jSol6t/TpLRkSl8Ku+1tnQPvwww=,aLEeZ8ZFKt2jQfkG5e9Nmad+QJlfpPmSfQS3CHlL6Ik=,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,NO,NO,NO,NO,NO,L28qavC1qdxbQKqPAazKNeInnw7SbaN12h48g/VWSEg=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,NO,NO,NO,NO,NO,zu3rCMl9Lih8mHQaU2J5ysGZFDHk5hK0vNqdMtMLty0=,NO,NO,qyemYRzOAcokSqS/92nD6ek2GOy82zS8Zr+OtdK3K1o=,tnLDGLnpYhzsik5+X+WPo4KQJoQA0TfWRlmEtQ3XNJQ=,NO,NO,YES,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO,NO
1997,NO,NO,uMIU2KDOxlgzhYToCFCa3nMxIOPV0WqCnKWfooGaw+8=,4LhhvTzxwvh2SnFtcpaRasyvph66a3YDIQCshAfyS2o=,NO,NO,YES,NO,NO,NO,NO,NO,NO,NO,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,YES,NO,NO,NO,NO,YES,NO,YES,M0zOutuBgj8hmJP9FcA45DVyx1UWt9h0BeRt2RnOQWY=,NO,NO,UquKPML0snQ8uxGGONzBaGjSjKUOCdOYQMeqXedWl6c=,XsmDHgToSgSR4zK64rJBIxe9rkz5eyzFh3TjxTvSpnk=,YES,NO,NO,NO,NO,YES,NO,YES,Mv+vwn4wgOtw5YJEqp0pSfj55e8MJqCky24z5NQHTbU=,NO,NO,/c/cfJxPOyk5eh9imrgWGV2z+2Os2fEq5KmBNBTNPqQ=,QIHEzfEYHubEp9c6aGZBHgEzfU0l0BWn+C3bAM0M51A=,YES,NO,NO,NO,NO,YES,NO,YES,YES,NO,YES,NO,YES,NO,NO,NO


In [61]:
test_df_cat1.isna().sum().values

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)

### Applying TF-IDF Vectorizer to the test set

In [63]:
test2 = test_features.copy()

for feature in test_df_cat1.columns:
    
    # Apply TF-IDF vectorization
    tfidf_vectorizer = TfidfVectorizer(max_features = 2)
    
    tfidf_feature = tfidf_vectorizer.fit_transform(test2[feature])
    
    test2 = test2.drop(columns=[feature])
    
    test2 = pd.concat([pd.DataFrame(tfidf_feature.toarray()), test2], axis=1)

### Converting the datatype into float. After this step all the columns are numerical.

In [81]:
# test3 = test2.select_dtypes(include=['object']).astype('float64')

test3 = test2.astype('float64')

### Scaling the data

In [83]:
test3.columns = test3.columns.astype(str)

test3_std = std.fit_transform(test3)

In [84]:
final_test = pd.DataFrame(test3_std)

### Make predictions to the test set.

In [86]:
test_predict = v_rfc.predict(final_test)

In [87]:
test_predict

array([[0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 1]], dtype=int64)

### Converting the predictions to dataframe.

In [88]:
test_predict_df = pd.DataFrame(test_predict, columns = y.columns)

In [92]:
test_predict_df = pd.concat([test_features['id'], test_predict_df], axis=1)

In [93]:
test_predict_df

Unnamed: 0,id,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,y17,y18,y19,y20,y21,y22,y23,y24,y25,y26,y27,y28,y29,y30,y31,y32,y33
0,1698002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,1698003,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,1698004,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1698005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1698006,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1994,1699996,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1995,1699997,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1996,1699998,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1997,1699999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


### Getting the id column values and the prediction columns, and make new column that's 'id_label" and 'Pred'.

In [115]:
id_vars = test_predict_df.columns[0]  # 'A' column
value_vars = test_predict_df.columns[1:]  # 'B', 'C', 'D' columns

melted_df = pd.melt(test_predict_df, id_vars=id_vars, value_vars=value_vars, var_name='label', value_name='Pred')

melted_df['id_label'] = melted_df['id'].astype(str)+'_'+melted_df['label']

melted_df = melted_df.drop(['id', 'label'], axis=1)

melted_df = melted_df[['id_label', 'Pred']]

### Exporting the final results in Excel.

In [117]:
melted_df.to_excel('Dileep_submission.xlsx', index=False)

### After reading the excel display the sample results.

In [130]:
pd.read_excel('Dileep_submission.xlsx').sample(10)

Unnamed: 0,id_label,Pred
22534,1698547_y12,1
51046,1699073_y26,0
37501,1699521_y19,0
32659,1698677_y17,0
8089,1698095_y5,0
9024,1699030_y5,0
13006,1699014_y7,0
22405,1698418_y12,0
37592,1699612_y19,0
4010,1698014_y3,0
