In the previous task we removed the outlier; now we will do the rest of the cleaning.
- Split the dependent variable and the independent variables into train and test parts. (Hint: X_train, y_train, remember!)
- Impute the missing values of the numerical and categorical variables in the train and test parts of the data.

Parameters:

Parameter | dtype | argument type | default value | description |
---|---|---|---|---|
data | pandas DataFrame | compulsory | | Data at hand for cleaning |

Returns:

Return value | dtype | description |
---|---|---|
X | DataFrame | DataFrame containing the feature variables |
y | Series/DataFrame | Target variable |
X_train | NumPy array (or any format acceptable by sklearn) | scaled X_train |
X_test | NumPy array (or any format acceptable by sklearn) | scaled X_test |
y_train | NumPy array (or any format acceptable by sklearn) | y_train |
y_test | NumPy array (or any format acceptable by sklearn) | y_test |

Hint:
- Set the random seed to 9 before and while splitting the dataset. Use `test_size=0.25` while splitting.
- Numerical variable (`LoanAmount`) imputation can be performed with mean imputation.
- Null values in the categorical variables (`Gender`, `Married`, `Dependents`, `Self_Employed`, `Loan_Amount_Term`, `Credit_History`) should be imputed with mode imputation.

Let's get started!
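
As a starting point, here is a minimal sketch of how the split and imputation steps could fit together. It is one possible approach, not the reference solution. The function name `data_cleaning`, the `target` and `seed` arguments, and the target column name `Loan_Status` are assumptions made for illustration; the task text does not name them. The sketch also assumes the mean and mode are computed on the training part only and then reused for the test part, and it does not include any scaling step.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Column groups taken from the hints above.
NUM_COLS = ["LoanAmount"]
CAT_COLS = ["Gender", "Married", "Dependents", "Self_Employed",
            "Loan_Amount_Term", "Credit_History"]


def data_cleaning(data, target="Loan_Status", seed=9):
    """Split data into train/test parts and impute missing values.

    `data_cleaning`, `target` and `seed` are hypothetical names;
    "Loan_Status" is an assumed target column.
    """
    # Seed set before splitting and passed to the split itself.
    np.random.seed(seed)

    X = data.drop(columns=[target])
    y = data[target]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    X_train, X_test = X_train.copy(), X_test.copy()

    # Assumption: statistics come from the training part only
    # and are reused to fill the test part.
    for col in NUM_COLS:
        mean_value = X_train[col].mean()
        X_train[col] = X_train[col].fillna(mean_value)
        X_test[col] = X_test[col].fillna(mean_value)

    for col in CAT_COLS:
        mode_value = X_train[col].mode().iloc[0]
        X_train[col] = X_train[col].fillna(mode_value)
        X_test[col] = X_test[col].fillna(mode_value)

    return X, y, X_train, X_test, y_train, y_test
```

Calling `data_cleaning(data)` on the DataFrame from the previous task would return the six objects listed in the table above. Computing the fill values on the training part alone keeps information from the test rows out of the training data; if the exercise expects each part to be imputed with its own statistics, the two loops can be adapted accordingly.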