### Attribute description for German Credit dataset

Attribute 1:  (qualitative)
	       Status of existing checking account
               A11 :      ... <    0 DM
	       A12 : 0 <= ... <  200 DM
	       A13 :      ... >= 200 DM /
		     salary assignments for at least 1 year
               A14 : no checking account

Attribute 2:  (numerical)
	      Duration in months

Attribute 3:  (qualitative)
	      Credit history
	      A30 : no credits taken/
		    all credits paid back duly
              A31 : all credits at this bank paid back duly
	      A32 : existing credits paid back duly till now
              A33 : delay in paying off in the past
	      A34 : critical account/
		    other credits existing (not at this bank)

Attribute 4:  (qualitative)
	      Purpose
	      A40 : car (new)
	      A41 : car (used)
	      A42 : furniture/equipment
	      A43 : radio/television
	      A44 : domestic appliances
	      A45 : repairs
	      A46 : education
	      A47 : (vacation - does not exist?)
	      A48 : retraining
	      A49 : business
	      A410 : others

Attribute 5:  (numerical)
	      Credit amount

Attribute 6:  (qualitative)
	      Savings account/bonds
	      A61 :          ... <  100 DM
	      A62 :   100 <= ... <  500 DM
	      A63 :   500 <= ... < 1000 DM
	      A64 :          ... >= 1000 DM
              A65 :   unknown/ no savings account

Attribute 7:  (qualitative)
	      Present employment since
	      A71 : unemployed
	      A72 :       ... < 1 year
	      A73 : 1  <= ... < 4 years  
	      A74 : 4  <= ... < 7 years
	      A75 :       ... >= 7 years

Attribute 8:  (numerical)
	      Installment rate in percentage of disposable income

Attribute 9:  (qualitative)
	      Personal status and sex
	      A91 : male   : divorced/separated
	      A92 : female : divorced/separated/married
              A93 : male   : single
	      A94 : male   : married/widowed
	      A95 : female : single

Attribute 10: (qualitative)
	      Other debtors / guarantors
	      A101 : none
	      A102 : co-applicant
	      A103 : guarantor

Attribute 11: (numerical)
	      Present residence since

Attribute 12: (qualitative)
	      Property
	      A121 : real estate
	      A122 : if not A121 : building society savings agreement/
				   life insurance
              A123 : if not A121/A122 : car or other, not in attribute 6
	      A124 : unknown / no property

Attribute 13: (numerical)
	      Age in years

Attribute 14: (qualitative)
	      Other installment plans 
	      A141 : bank
	      A142 : stores
	      A143 : none

Attribute 15: (qualitative)
	      Housing
	      A151 : rent
	      A152 : own
	      A153 : for free

Attribute 16: (numerical)
              Number of existing credits at this bank

Attribute 17: (qualitative)
	      Job
	      A171 : unemployed/ unskilled  - non-resident
	      A172 : unskilled - resident
	      A173 : skilled employee / official
	      A174 : management/ self-employed/
		     highly qualified employee/ officer

Attribute 18: (numerical)
	      Number of people being liable to provide maintenance for

Attribute 19: (qualitative)
	      Telephone
	      A191 : none
	      A192 : yes, registered under the customer's name

Attribute 20: (qualitative)
	      foreign worker
	      A201 : yes
	      A202 : no
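
The attribute codes above can be turned back into readable labels with a lookup table. The following is a minimal sketch for Attribute 1 only; the dictionary name `CHECKING_ACCOUNT_LABELS`, the `checking_account_label` column, and the three-row `sample` frame are illustrative assumptions, not part of the dataset files.

```python
import pandas as pd

# Code-to-label lookup for Attribute 1, copied from the description above.
CHECKING_ACCOUNT_LABELS = {
    "A11": "< 0 DM",
    "A12": "0 <= ... < 200 DM",
    "A13": ">= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
}

# Hypothetical three-row sample; the real data is loaded from CSV further down.
sample = pd.DataFrame({"checking_account_status": ["A12", "A14", "A11"]})

# Series.map replaces each code with its label (unknown codes would become NaN).
sample["checking_account_label"] = sample["checking_account_status"].map(
    CHECKING_ACCOUNT_LABELS
)
print(sample)
```

The same pattern extends to the other qualitative attributes by adding one dictionary per column.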


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
pd.set_option("display.max_rows",1200)
pd.set_option("display.max_columns",50)

sns.set(font_scale=3)

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

In [2]:
df = pd.read_csv("germancreditclean.csv")

In [3]:
df.head()

Unnamed: 0,customer_id,checking_account_status,loan_duration_mo,credit_history,purpose,loan_amount,savings_account_balance,time_employed_yrs,payment_pcnt_income,gender_status,other_signators,time_in_residence,property,age_yrs,other_credit_outstanding,home_ownership,number_loans,job_category,dependents,telephone,foreign_worker,bad_credit
0,6156361,A12,48,A32,A43,5951,A61,A73,2,A92,A101,2,A121,22,A143,A152,1,A173,1,A191,A201,1
1,2051359,A14,12,A34,A46,2096,A61,A74,2,A93,A101,3,A121,49,A143,A152,1,A172,2,A191,A201,0
2,8740590,A11,42,A32,A42,7882,A61,A74,2,A93,A103,4,A122,45,A143,A153,1,A173,2,A191,A201,0
3,3924540,A11,24,A33,A40,4870,A61,A73,3,A93,A101,4,A124,53,A143,A153,2,A173,2,A191,A201,1
4,3115687,A14,36,A32,A46,9055,A65,A73,2,A93,A101,4,A124,35,A143,A153,1,A172,2,A192,A201,0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1011 entries, 0 to 1010
Data columns (total 22 columns):
customer_id                 1011 non-null int64
checking_account_status     1011 non-null object
loan_duration_mo            1011 non-null int64
credit_history              1011 non-null object
purpose                     1011 non-null object
loan_amount                 1011 non-null int64
savings_account_balance     1011 non-null object
time_employed_yrs           1011 non-null object
payment_pcnt_income         1011 non-null int64
gender_status               1011 non-null object
other_signators             1011 non-null object
time_in_residence           1011 non-null int64
property                    1011 non-null object
age_yrs                     1011 non-null int64
other_credit_outstanding    1011 non-null object
home_ownership              1011 non-null object
number_loans                1011 non-null int64
job_category                1011 non-null object
dependents        

In [5]:
df.describe()

Unnamed: 0,customer_id,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,bad_credit
count,1011.0,1011.0,1011.0,1011.0,1011.0,1011.0,1011.0,1011.0,1011.0
mean,5418868.0,20.868447,3267.196835,2.969337,2.841741,35.552918,1.406528,1.155292,0.298714
std,2567433.0,12.028247,2818.261437,1.11872,1.105646,11.357116,0.580131,0.362362,0.457921
min,1018706.0,4.0,250.0,1.0,1.0,19.0,1.0,1.0,0.0
25%,3213826.0,12.0,1365.0,2.0,2.0,27.0,1.0,1.0,0.0
50%,5490556.0,18.0,2315.0,3.0,3.0,33.0,1.0,1.0,0.0
75%,7534566.0,24.0,3972.5,4.0,4.0,42.0,2.0,1.0,1.0
max,9994482.0,72.0,18424.0,4.0,4.0,75.0,4.0,2.0,1.0


In [6]:
df.shape

(1011, 22)

### Drop customer_id and bad_credit before transformation

In [7]:
df.drop(['customer_id','bad_credit'],axis=1,inplace=True)
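
The in-place drop above discards `bad_credit`, which is why the notebook later reloads the original CSV to recover the target. One possible alternative, sketched here on a two-row sample taken from the `df.head()` output, keeps the original frame intact and splits features and target in one pass. The names `sample`, `features`, and `target` are assumptions for illustration.

```python
import pandas as pd

# Two rows copied from the df.head() output above (only three columns kept).
sample = pd.DataFrame({
    "customer_id": [6156361, 2051359],
    "loan_amount": [5951, 2096],
    "bad_credit": [1, 0],
})

# Keep the target before dropping it from the feature frame.
target = sample["bad_credit"]

# Without inplace=True, drop returns a new frame and leaves `sample` unchanged,
# so re-running the cell does not raise a KeyError.
features = sample.drop(columns=["customer_id", "bad_credit"])

print(list(features.columns))
print(target.tolist())
```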

In [8]:
#df.to_csv("germancredittrans.csv",index=False)

In [9]:
df = pd.read_csv("germancredittrans.csv")

In [10]:
df.describe(include='all')

Unnamed: 0,checking_account_status,loan_duration_mo,credit_history,purpose,loan_amount,savings_account_balance,time_employed_yrs,payment_pcnt_income,gender_status,other_signators,time_in_residence,property,age_yrs,other_credit_outstanding,home_ownership,number_loans,job_category,dependents,telephone,foreign_worker
count,1011,1011.0,1011,1011,1011.0,1011,1011,1011.0,1011,1011,1011.0,1011,1011.0,1011,1011,1011.0,1011,1011.0,1011,1011
unique,4,,5,10,,5,5,,4,3,,4,,3,3,,4,,2,2
top,A14,,A32,A43,,A61,A73,,A93,A101,,A123,,A143,A152,,A173,,A191,A201
freq,399,,540,282,,612,346,,553,915,,335,,824,722,,636,,602,973
mean,,20.868447,,,3267.196835,,,2.969337,,,2.841741,,35.552918,,,1.406528,,1.155292,,
std,,12.028247,,,2818.261437,,,1.11872,,,1.105646,,11.357116,,,0.580131,,0.362362,,
min,,4.0,,,250.0,,,1.0,,,1.0,,19.0,,,1.0,,1.0,,
25%,,12.0,,,1365.0,,,2.0,,,2.0,,27.0,,,1.0,,1.0,,
50%,,18.0,,,2315.0,,,3.0,,,3.0,,33.0,,,1.0,,1.0,,
75%,,24.0,,,3972.5,,,4.0,,,4.0,,42.0,,,2.0,,1.0,,


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1011 entries, 0 to 1010
Data columns (total 20 columns):
checking_account_status     1011 non-null object
loan_duration_mo            1011 non-null int64
credit_history              1011 non-null object
purpose                     1011 non-null object
loan_amount                 1011 non-null int64
savings_account_balance     1011 non-null object
time_employed_yrs           1011 non-null object
payment_pcnt_income         1011 non-null int64
gender_status               1011 non-null object
other_signators             1011 non-null object
time_in_residence           1011 non-null int64
property                    1011 non-null object
age_yrs                     1011 non-null int64
other_credit_outstanding    1011 non-null object
home_ownership              1011 non-null object
number_loans                1011 non-null int64
job_category                1011 non-null object
dependents                  1011 non-null int64
telephone         

### Drop categorical features not used in this analysis

In [12]:
df.drop(['purpose','savings_account_balance','gender_status','other_signators','property','home_ownership','telephone','foreign_worker'], 
        axis=1, inplace=True)

In [13]:
df.head()

Unnamed: 0,checking_account_status,loan_duration_mo,credit_history,loan_amount,time_employed_yrs,payment_pcnt_income,time_in_residence,age_yrs,other_credit_outstanding,number_loans,job_category,dependents
0,A12,48,A32,5951,A73,2,2,22,A143,1,A173,1
1,A14,12,A34,2096,A74,2,3,49,A143,1,A172,2
2,A11,42,A32,7882,A74,2,4,45,A143,1,A173,2
3,A11,24,A33,4870,A73,3,4,53,A143,2,A173,2
4,A14,36,A32,9055,A73,2,4,35,A143,1,A172,2


### Perform one-hot encoding

In [None]:
df.values

In [None]:
cat_features = ["checking_account_status","credit_history"]
onehot = OneHotEncoder()
transformer = ColumnTransformer([("one_hot",onehot,cat_features)], remainder="passthrough")

In [None]:
df1 = transformer.fit_transform(df)
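
Note that `cat_features` lists only two of the five categorical columns left in `df`, so the other three would pass through as strings. The sketch below shows the same `ColumnTransformer` approach on a small sample built from the `df.head()` rows, with every non-numeric column in the sample encoded. Setting `sparse_threshold=0` to force a dense array is a choice made here for readability, not something the notebook does.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Small sample built from the first rows shown by df.head() above.
sample = pd.DataFrame({
    "checking_account_status": ["A12", "A14", "A11", "A11", "A14"],
    "credit_history": ["A32", "A34", "A32", "A33", "A32"],
    "loan_amount": [5951, 2096, 7882, 4870, 9055],
})

cat_features = ["checking_account_status", "credit_history"]
transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), cat_features)],
    remainder="passthrough",  # numeric columns are appended unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)
encoded = transformer.fit_transform(sample)

# Categories learned for each encoded column, in output order.
categories = transformer.named_transformers_["one_hot"].categories_
print(encoded.shape)
print([list(c) for c in categories])
```

Unlike `pd.get_dummies(..., drop_first=True)`, this keeps one column per category, so the encoded columns of each feature sum to 1 in every row.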

### Use pandas get_dummies

In [14]:
df1 = pd.get_dummies(data=df, drop_first=True)

In [15]:
df1.shape

(1011, 23)
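
The 23 columns can be checked by hand: `drop_first=True` drops the first category of each categorical column, so a column with k categories contributes k - 1 dummies. A short sketch of that arithmetic follows, using the `unique` counts from the `describe(include='all')` output above. The variable names and the four-row `sample` are illustrative assumptions.

```python
import pandas as pd

# drop_first=True on a toy frame: the first (sorted) category, A11, gets no column.
sample = pd.DataFrame({
    "loan_amount": [5951, 2096, 7882, 4870],
    "checking_account_status": ["A12", "A14", "A11", "A13"],
})
dummies = pd.get_dummies(sample, drop_first=True)
print(list(dummies.columns))

# Unique counts of the five remaining categorical columns, read from describe().
n_unique = {
    "checking_account_status": 4,
    "credit_history": 5,
    "time_employed_yrs": 5,
    "other_credit_outstanding": 3,
    "job_category": 4,
}
n_numeric = 7  # loan_duration_mo ... dependents

# Each categorical column contributes (k - 1) dummy columns.
expected_columns = n_numeric + sum(k - 1 for k in n_unique.values())
print(expected_columns)  # 7 + 3 + 4 + 4 + 2 + 3 = 23
```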

In [17]:
df1.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,checking_account_status_A12,checking_account_status_A13,checking_account_status_A14,credit_history_A31,credit_history_A32,credit_history_A33,credit_history_A34,time_employed_yrs_A72,time_employed_yrs_A73,time_employed_yrs_A74,time_employed_yrs_A75,other_credit_outstanding_A142,other_credit_outstanding_A143,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [18]:
#df1.to_csv("germancredittrans2.csv",index=False)

In [19]:
df2 = pd.read_csv("germancredittrans2.csv")

In [20]:
df2.shape

(1011, 23)

In [21]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,checking_account_status_A12,checking_account_status_A13,checking_account_status_A14,credit_history_A31,credit_history_A32,credit_history_A33,credit_history_A34,time_employed_yrs_A72,time_employed_yrs_A73,time_employed_yrs_A74,time_employed_yrs_A75,other_credit_outstanding_A142,other_credit_outstanding_A143,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [22]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1011 entries, 0 to 1010
Data columns (total 23 columns):
loan_duration_mo                 1011 non-null int64
loan_amount                      1011 non-null int64
payment_pcnt_income              1011 non-null int64
time_in_residence                1011 non-null int64
age_yrs                          1011 non-null int64
number_loans                     1011 non-null int64
dependents                       1011 non-null int64
checking_account_status_A12      1011 non-null int64
checking_account_status_A13      1011 non-null int64
checking_account_status_A14      1011 non-null int64
credit_history_A31               1011 non-null int64
credit_history_A32               1011 non-null int64
credit_history_A33               1011 non-null int64
credit_history_A34               1011 non-null int64
time_employed_yrs_A72            1011 non-null int64
time_employed_yrs_A73            1011 non-null int64
time_employed_yrs_A74            1011 non-nul

In [23]:
df2.rename(columns={'checking_account_status_A12':'CAS1','checking_account_status_A13':'CAS2','checking_account_status_A14':'CAS3'},inplace=True)

In [24]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,credit_history_A31,credit_history_A32,credit_history_A33,credit_history_A34,time_employed_yrs_A72,time_employed_yrs_A73,time_employed_yrs_A74,time_employed_yrs_A75,other_credit_outstanding_A142,other_credit_outstanding_A143,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [25]:
df2.rename(columns={'credit_history_A31':'CH1','credit_history_A32':'CH2',
                    'credit_history_A33':'CH3','credit_history_A34':'CH4'},inplace=True)

In [26]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,time_employed_yrs_A72,time_employed_yrs_A73,time_employed_yrs_A74,time_employed_yrs_A75,other_credit_outstanding_A142,other_credit_outstanding_A143,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [27]:
df2.rename(columns={'time_employed_yrs_A72':'TE1','time_employed_yrs_A73':'TE2',
                    'time_employed_yrs_A74':'TE3','time_employed_yrs_A75':'TE4'},inplace=True)

In [28]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,TE1,TE2,TE3,TE4,other_credit_outstanding_A142,other_credit_outstanding_A143,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [29]:
df2.rename(columns={'other_credit_outstanding_A142':'OC1','other_credit_outstanding_A143':'OC2'},inplace=True)

In [30]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,TE1,TE2,TE3,TE4,OC1,OC2,job_category_A172,job_category_A173,job_category_A174
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [31]:
df2.rename(columns={'job_category_A172':'JC1','job_category_A173':'JC2',
                    'job_category_A174':'JC3'},inplace=True)

In [32]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,TE1,TE2,TE3,TE4,OC1,OC2,JC1,JC2,JC3
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [33]:
#Save to csv
#df2.to_csv("germancredittrans3.csv",index=False)
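
The five `rename` calls above could also be collapsed into a single call with one mapping. This is a sketch only: `RENAME_MAP` and the one-row `sample` frame are assumed names, while the old and new column names are the ones used in the cells above.

```python
import pandas as pd

# All renames from cells 23 to 31 gathered in one dictionary.
RENAME_MAP = {
    "checking_account_status_A12": "CAS1",
    "checking_account_status_A13": "CAS2",
    "checking_account_status_A14": "CAS3",
    "credit_history_A31": "CH1",
    "credit_history_A32": "CH2",
    "credit_history_A33": "CH3",
    "credit_history_A34": "CH4",
    "time_employed_yrs_A72": "TE1",
    "time_employed_yrs_A73": "TE2",
    "time_employed_yrs_A74": "TE3",
    "time_employed_yrs_A75": "TE4",
    "other_credit_outstanding_A142": "OC1",
    "other_credit_outstanding_A143": "OC2",
    "job_category_A172": "JC1",
    "job_category_A173": "JC2",
    "job_category_A174": "JC3",
}

# Hypothetical one-row frame holding a few of the dummy columns.
sample = pd.DataFrame({
    "loan_amount": [5951],
    "checking_account_status_A12": [1],
    "credit_history_A32": [1],
    "job_category_A173": [1],
})

# Keys missing from the frame are ignored by default; other columns stay as-is.
renamed = sample.rename(columns=RENAME_MAP)
print(list(renamed.columns))
```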

### Combine target variable and features

In [34]:
df = pd.read_csv("germancreditclean.csv")

In [35]:
df.head()

Unnamed: 0,customer_id,checking_account_status,loan_duration_mo,credit_history,purpose,loan_amount,savings_account_balance,time_employed_yrs,payment_pcnt_income,gender_status,other_signators,time_in_residence,property,age_yrs,other_credit_outstanding,home_ownership,number_loans,job_category,dependents,telephone,foreign_worker,bad_credit
0,6156361,A12,48,A32,A43,5951,A61,A73,2,A92,A101,2,A121,22,A143,A152,1,A173,1,A191,A201,1
1,2051359,A14,12,A34,A46,2096,A61,A74,2,A93,A101,3,A121,49,A143,A152,1,A172,2,A191,A201,0
2,8740590,A11,42,A32,A42,7882,A61,A74,2,A93,A103,4,A122,45,A143,A153,1,A173,2,A191,A201,0
3,3924540,A11,24,A33,A40,4870,A61,A73,3,A93,A101,4,A124,53,A143,A153,2,A173,2,A191,A201,1
4,3115687,A14,36,A32,A46,9055,A65,A73,2,A93,A101,4,A124,35,A143,A153,1,A172,2,A192,A201,0


In [36]:
target = df['bad_credit']

In [39]:
target.shape

(1011,)

In [40]:
df2 = pd.read_csv("germancredittrans3.csv")

In [41]:
df2.head()

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,TE1,TE2,TE3,TE4,OC1,OC2,JC1,JC2,JC3
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0


In [42]:
df3 = pd.concat([df2,target],axis=1)
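
`pd.concat(..., axis=1)` aligns rows on index labels, not on position. That works here because both frames are read fresh from CSV and share the default `RangeIndex`. The sketch below, on a toy three-row sample with assumed names, shows a cheap guard and what happens when the indexes differ.

```python
import pandas as pd

# Toy stand-ins for df2 (features) and target; values from the head() output.
features = pd.DataFrame({"loan_amount": [5951, 2096, 7882]})
target = pd.Series([1, 0, 0], name="bad_credit")

# Guard: fail early if the two objects do not share the same index.
assert features.index.equals(target.index)
combined = pd.concat([features, target], axis=1)
print(combined.shape)

# With a shifted index, concat takes the union of labels and fills gaps with NaN.
shifted = target.copy()
shifted.index = [1, 2, 3]
misaligned = pd.concat([features, shifted], axis=1)
print(misaligned.shape)
print(int(misaligned.isna().sum().sum()))
```

If rows had been filtered or shuffled between the two loads, calling `reset_index(drop=True)` on both objects before concatenating would restore positional alignment.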

In [43]:
df3

Unnamed: 0,loan_duration_mo,loan_amount,payment_pcnt_income,time_in_residence,age_yrs,number_loans,dependents,CAS1,CAS2,CAS3,CH1,CH2,CH3,CH4,TE1,TE2,TE3,TE4,OC1,OC2,JC1,JC2,JC3,bad_credit
0,48,5951,2,2,22,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,1
1,12,2096,2,3,49,1,2,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0
2,42,7882,2,4,45,1,2,0,0,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0
3,24,4870,3,4,53,2,2,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0,1
4,36,9055,2,4,35,1,2,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0,0
5,24,2835,3,4,53,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,0,0
6,36,6948,2,2,35,1,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0
7,12,3059,2,4,61,1,1,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0
8,30,5234,4,2,28,2,1,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,1
9,12,1295,3,1,25,1,1,1,0,0,0,1,0,0,1,0,0,0,0,1,0,1,0,1


In [44]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1011 entries, 0 to 1010
Data columns (total 24 columns):
loan_duration_mo       1011 non-null int64
loan_amount            1011 non-null int64
payment_pcnt_income    1011 non-null int64
time_in_residence      1011 non-null int64
age_yrs                1011 non-null int64
number_loans           1011 non-null int64
dependents             1011 non-null int64
CAS1                   1011 non-null int64
CAS2                   1011 non-null int64
CAS3                   1011 non-null int64
CH1                    1011 non-null int64
CH2                    1011 non-null int64
CH3                    1011 non-null int64
CH4                    1011 non-null int64
TE1                    1011 non-null int64
TE2                    1011 non-null int64
TE3                    1011 non-null int64
TE4                    1011 non-null int64
OC1                    1011 non-null int64
OC2                    1011 non-null int64
JC1                    1011 non-n

In [45]:
df3.shape

(1011, 24)

In [46]:
#Save to csv
#df3.to_csv("train.csv",index=False)