# <center> **Home Credit Default Risk Assessment**
# <center> **Feature Engineering**

# **Introduction**

In this part of the project, I combine existing features into new ones that I expect to have higher predictive power than the originals. I then save the dataframe with the new features as a CSV file for use in later stages of the project.

# **Libraries**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import PolynomialFeatures
from sklearn.impute import SimpleImputer

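import functions  # project-local helper module (provides reduce_memory_usage)
import importlib
importlib.reload(functions)  # pick up edits to functions.py without restarting the kernel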

import warnings

# **Display**

In [2]:
%matplotlib inline

# Display settings: show up to 200 rows, all columns, and long cell values
# (an earlier max_rows = 300000 setting was immediately overridden, so it is dropped)
pd.options.display.max_rows = 200
pd.options.display.max_columns = 999
pd.options.display.max_colwidth = 500

# Silence all warnings; this also covers FutureWarning
warnings.filterwarnings("ignore")

size = 20

# **Data**

## **Load Data**

In [3]:
train = pd.read_csv(
    r"C:\Users\Dell\Documents\AI\Risk\Data\Data\train.csv",
    index_col=False
)

## **Reduce Memory Usage**

Downcast columns to smaller datatypes to reduce memory usage.

In [5]:
train = functions.reduce_memory_usage(train)

Memory usage of dataframe is 260.42 MB
Memory usage after optimization is: 88.57 MB
Decreased by 66.0%
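
The `functions` module is not shown in this notebook, so the following is only a minimal sketch of what a downcasting helper of this kind typically does, assuming it works column by column with `pd.to_numeric(..., downcast=...)`. The name `reduce_memory_usage_sketch` and the toy dataframe are hypothetical; they are not the project's actual helper or data.

```python
import numpy as np
import pandas as pd


def reduce_memory_usage_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for functions.reduce_memory_usage.

    Downcasts integer columns to the smallest signed integer type and float
    columns to float32 when the values fit; other columns are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy data (illustrative only, not taken from train.csv)
toy = pd.DataFrame({
    "CNT_CHILDREN": np.array([0, 1, 2, 0], dtype="int64"),
    "AMT_CREDIT": np.array([406597.5, 1293502.5, 135000.0, 312682.5], dtype="float64"),
})

before = toy.memory_usage(deep=True).sum()
small = reduce_memory_usage_sketch(toy)
after = small.memory_usage(deep=True).sum()

print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Downcasting floats trades precision for memory, which is worth keeping in mind because the ratio features below are computed from the downcast columns.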


# **Feature Engineering**

Here I create 14 new features from the existing features. 
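
If the same features are needed for another table, such as a test set, one option is to collect all 14 formulas from this section into a single function. The sketch below uses exactly the formulas from the cells that follow; the function name `add_engineered_features` and the two-row toy dataframe are hypothetical, and the input is assumed to already contain the `YEARS_EMPLOYED` and `AGE` columns that this notebook's `train.csv` provides.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the 14 engineered features added (hypothetical helper)."""
    out = df.copy()
    income, credit, annuity = out["AMT_INCOME_TOTAL"], out["AMT_CREDIT"], out["AMT_ANNUITY"]

    # Loan terms relative to income and to each other
    out["DEBT_TO_INCOME_RATIO"] = credit / income
    out["ANNUITY_TO_INCOME_RATIO"] = annuity / income
    out["ANNUITY_TO_CREDIT_RATIO"] = annuity / credit
    out["ANNUAL_PAYMENT_TO_CREDIT_RATIO"] = (annuity * 12) / credit
    # Age- and employment-based features
    out["YEARS_EMPLOYED_RATIO"] = out["YEARS_EMPLOYED"] / out["AGE"]
    out["INCOME_TO_AGE_RATIO"] = income / out["AGE"]
    out["CREDIT_TO_AGE_RATIO"] = credit / out["AGE"]
    out["YEARS_EMPLOYED_AGE_PRODUCT"] = out["YEARS_EMPLOYED"] * out["AGE"]
    # Household-based features
    out["DEPENDENTS_TO_FAMILY_SIZE"] = out["CNT_CHILDREN"] / out["CNT_FAM_MEMBERS"]
    out["INCOME_PER_FAMILY_MEMBER"] = income / out["CNT_FAM_MEMBERS"]
    out["INCOME_PER_DEPENDENT"] = income / (1 + out["CNT_CHILDREN"])
    out["CREDIT_PER_DEPENDENT"] = credit / (1 + out["CNT_CHILDREN"])
    # Aggregates of the external scores
    ext = out[["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]]
    out["EXT_SOURCE_MEAN"] = ext.mean(axis=1)
    out["EXT_SOURCE_PRODUCT"] = out["EXT_SOURCE_1"] * out["EXT_SOURCE_2"] * out["EXT_SOURCE_3"]
    return out


# Two made-up applicants (illustrative values only)
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [200000.0, 90000.0],
    "AMT_CREDIT": [400000.0, 270000.0],
    "AMT_ANNUITY": [20000.0, 13500.0],
    "YEARS_EMPLOYED": [10.0, 2.0],
    "AGE": [40.0, 25.0],
    "CNT_CHILDREN": [1, 0],
    "CNT_FAM_MEMBERS": [3.0, 1.0],
    "EXT_SOURCE_1": [0.5, np.nan],
    "EXT_SOURCE_2": [0.6, 0.4],
    "EXT_SOURCE_3": [0.7, 0.2],
})

features = add_engineered_features(toy)
new_cols = [c for c in features.columns if c not in toy.columns]
print(len(new_cols))  # 14
print(features[["DEBT_TO_INCOME_RATIO", "EXT_SOURCE_MEAN", "EXT_SOURCE_PRODUCT"]])
```

Keeping the formulas in one place makes it harder for the training and test features to drift apart.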

## **Debt to Income Ratio**

In [6]:
train['DEBT_TO_INCOME_RATIO'] = train['AMT_CREDIT'] / train['AMT_INCOME_TOTAL']
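
Most features in this section are plain column divisions. In pandas, dividing a nonzero value by zero gives `inf`, 0/0 gives `NaN`, and a `NaN` in either column gives `NaN`; none of these raise an error. This notebook does not show whether any denominator in `train` is ever zero, so the hypothetical helper `safe_ratio` below is a precaution rather than a fix for a known problem: it maps infinities to `NaN` so they can later be imputed like any other missing value. The values are made up.

```python
import numpy as np
import pandas as pd


def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Hypothetical helper: num / den with +/-inf replaced by NaN."""
    return (num / den).replace([np.inf, -np.inf], np.nan)


num = pd.Series([400000.0, 300000.0, 0.0, 150000.0])
den = pd.Series([200000.0, 0.0, 0.0, np.nan])

print((num / den).tolist())           # [2.0, inf, nan, nan]
print(safe_ratio(num, den).tolist())  # [2.0, nan, nan, nan]
```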

## **Annuity to Income Ratio**

In [7]:
train['ANNUITY_TO_INCOME_RATIO'] = train['AMT_ANNUITY'] / train['AMT_INCOME_TOTAL']

## **Annuity to Credit Ratio**

In [8]:
train['ANNUITY_TO_CREDIT_RATIO'] = train['AMT_ANNUITY'] / train['AMT_CREDIT'] 

## **Annual Payment to Credit Ratio**

In [9]:
train['ANNUAL_PAYMENT_TO_CREDIT_RATIO'] = (train['AMT_ANNUITY'] * 12) / train['AMT_CREDIT']
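
`ANNUAL_PAYMENT_TO_CREDIT_RATIO` is exactly 12 times `ANNUITY_TO_CREDIT_RATIO`, so the two columns are perfectly correlated. A constant multiple leaves tree splits unchanged and adds no information to a linear model, so one of the two could likely be dropped. The check below uses made-up values; on the real data the same comparison can be run on the two `train` columns.

```python
import numpy as np
import pandas as pd

# Made-up loan amounts and annuities (illustrative only)
df = pd.DataFrame({
    "AMT_CREDIT": [400000.0, 270000.0, 1000000.0, 135000.0],
    "AMT_ANNUITY": [20000.0, 13500.0, 29700.0, 7000.0],
})

annuity_to_credit = df["AMT_ANNUITY"] / df["AMT_CREDIT"]
annual_payment_to_credit = (df["AMT_ANNUITY"] * 12) / df["AMT_CREDIT"]

# One column is a constant multiple of the other ...
print(np.allclose(annual_payment_to_credit, 12 * annuity_to_credit))  # True
# ... so their Pearson correlation is 1, up to floating-point rounding
print(annuity_to_credit.corr(annual_payment_to_credit))
```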

## **Employment to Age Ratio**

In [10]:
train['YEARS_EMPLOYED_RATIO'] = train['YEARS_EMPLOYED'] / train['AGE']

## **Dependents to Family Size Ratio**

In [11]:
train['DEPENDENTS_TO_FAMILY_SIZE'] = train['CNT_CHILDREN'] / train['CNT_FAM_MEMBERS']

## **Income to Age Ratio**

In [12]:
train['INCOME_TO_AGE_RATIO'] = train['AMT_INCOME_TOTAL'] / train['AGE']

## **Credit to Age Ratio**

In [13]:
train['CREDIT_TO_AGE_RATIO'] = train['AMT_CREDIT'] / train['AGE']

## **Employment Age Product**

In [14]:
train['YEARS_EMPLOYED_AGE_PRODUCT'] = train['YEARS_EMPLOYED'] * train['AGE']
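
`YEARS_EMPLOYED_RATIO` and `YEARS_EMPLOYED_AGE_PRODUCT` combine the same two columns, but they are not redundant: applicants with the same ratio can have very different products. A small worked example with two hypothetical applicants who have each worked for a fifth of their lives:

```python
import pandas as pd

# Two hypothetical applicants with the same employment-to-age ratio
df = pd.DataFrame({"YEARS_EMPLOYED": [5.0, 10.0], "AGE": [25.0, 50.0]})

ratio = df["YEARS_EMPLOYED"] / df["AGE"]
product = df["YEARS_EMPLOYED"] * df["AGE"]

print(ratio.tolist())    # [0.2, 0.2]
print(product.tolist())  # [125.0, 500.0]
```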

## **Income per Family Member**

In [15]:
train['INCOME_PER_FAMILY_MEMBER'] = train['AMT_INCOME_TOTAL'] / train['CNT_FAM_MEMBERS']

## **Income per Dependent**

In [16]:
train['INCOME_PER_DEPENDENT'] = train['AMT_INCOME_TOTAL'] / (1 + train['CNT_CHILDREN'])

## **Credit per Dependent**

In [17]:
train['CREDIT_PER_DEPENDENT'] = train['AMT_CREDIT'] / (1 + train['CNT_CHILDREN'])
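
`INCOME_PER_DEPENDENT` and `CREDIT_PER_DEPENDENT` divide by `1 + CNT_CHILDREN` rather than by `CNT_CHILDREN`. For an applicant without children the denominator is then 1 (the applicant alone) instead of 0, which avoids `inf` values. With made-up values:

```python
import pandas as pd

# Hypothetical applicants with 0, 1 and 3 children
df = pd.DataFrame({"AMT_INCOME_TOTAL": [120000.0, 120000.0, 120000.0],
                   "CNT_CHILDREN": [0, 1, 3]})

naive = df["AMT_INCOME_TOTAL"] / df["CNT_CHILDREN"]          # inf for the childless applicant
shifted = df["AMT_INCOME_TOTAL"] / (1 + df["CNT_CHILDREN"])  # denominator used in this notebook

print(naive.tolist())    # [inf, 120000.0, 40000.0]
print(shifted.tolist())  # [120000.0, 60000.0, 30000.0]
```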

## **Mean of External Source Features**

In [18]:
train['EXT_SOURCE_MEAN'] = train[['EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3']].mean(axis=1)
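
`DataFrame.mean(axis=1)` skips missing values by default (`skipna=True`), so `EXT_SOURCE_MEAN` is the average of whichever external scores are present and is `NaN` only when all three are missing. The made-up rows below contrast this with `skipna=False`:

```python
import numpy as np
import pandas as pd

# Made-up external scores with different patterns of missingness
ext = pd.DataFrame({
    "EXT_SOURCE_1": [0.5, np.nan, np.nan],
    "EXT_SOURCE_2": [0.3, 0.4, np.nan],
    "EXT_SOURCE_3": [0.7, 0.6, np.nan],
})

# Default skipna=True: average of the available scores in each row
print(ext.mean(axis=1).tolist())
# skipna=False: any missing score makes the row mean NaN
print(ext.mean(axis=1, skipna=False).tolist())
```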

## **Product of External Source Features**

In [19]:
train['EXT_SOURCE_PRODUCT'] = train['EXT_SOURCE_1'] * train['EXT_SOURCE_2'] * train['EXT_SOURCE_3'] 
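
Unlike the mean, the `*` chain in `EXT_SOURCE_PRODUCT` propagates missing values: the product is `NaN` whenever any of the three scores is missing. `DataFrame.prod(axis=1)` behaves differently, because it skips `NaN` by default and returns 1.0 for a row with no values at all unless `min_count` is set, so the two are not interchangeable. On made-up values:

```python
import numpy as np
import pandas as pd

ext = pd.DataFrame({
    "EXT_SOURCE_1": [0.5, np.nan, np.nan],
    "EXT_SOURCE_2": [0.4, 0.5, np.nan],
    "EXT_SOURCE_3": [0.5, 0.4, np.nan],
})

# Formula used in this notebook: NaN if any score is missing
chained = ext["EXT_SOURCE_1"] * ext["EXT_SOURCE_2"] * ext["EXT_SOURCE_3"]
# DataFrame.prod skips NaN; an all-missing row becomes 1.0 with the default min_count=0
row_prod = ext.prod(axis=1)
# min_count=1 returns NaN for rows with no observed values
row_prod_min1 = ext.prod(axis=1, min_count=1)

print(chained.tolist())
print(row_prod.tolist())
print(row_prod_min1.tolist())
```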

# **Save Dataframe as CSV File**

The dataframe, now including the 14 new features, is saved as a new CSV file for use in later parts of this project.

In [21]:
train.to_csv(r"C:\Users\Dell\Documents\AI\Risk\Data\Data\train 22.csv", index=False)
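
CSV stores values as text, so the smaller dtypes produced by `reduce_memory_usage` are not preserved: `pd.read_csv` infers `int64`/`float64` again, and the memory reduction has to be repeated after loading. The round trip below demonstrates this on a made-up frame, using an in-memory buffer instead of the project's file path. A binary format such as Parquet (`DataFrame.to_parquet`, which requires `pyarrow` or `fastparquet`) keeps dtypes, if that is preferable for later stages.

```python
import io

import numpy as np
import pandas as pd

# Made-up frame with already-downcast columns
df = pd.DataFrame({
    "CNT_CHILDREN": np.array([0, 1, 2], dtype="int8"),
    "DEBT_TO_INCOME_RATIO": np.array([2.0, 3.0, 1.5], dtype="float32"),
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(df.dtypes.tolist())        # [dtype('int8'), dtype('float32')]
print(reloaded.dtypes.tolist())  # [dtype('int64'), dtype('float64')]
```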

# **Summary**

> * **14 New Features**: 14 additional features were created from the features in the application_train data. In later parts of this project, additional features will be created from the features in the other 4 tables used in this project.
> * **Predictive Abilities**: Later parts of this project show that these new features have higher predictive value than the features they were created from.
> * **Save New Dataframe**: The dataframe with the new features is saved as a new CSV file for use in later parts of this project.