# Students Do: Predicting Fraudulent Loan Applications

Every year, banks and credit card companies lose billions of dollars compensating users for fraudulent loan or credit card applications. That's one reason why predicting fraud with machine learning techniques has become a [broad area of research](https://scholar.google.com.mx/scholar?q=fraud+detection+machine+learning&btnG=&oq=fraud+detection+) and a great [business opportunity for FinTech startups](https://www.eu-startups.com/2019/06/paris-based-fintech-bleckwen-raises-e8-8-million-for-its-fraud-detection-software-to-prevent-financial-crime/).

In this activity, you will explore how tree-based algorithms can be used to identify fraudulent loan applications. You will start with a decision tree model trained on the `sba_loans_encoded.csv` file that you created before.

In [1]:
# Initial imports
import pandas as pd
from pathlib import Path
from sklearn import tree
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Needed for decision tree visualization
import pydotplus
from IPython.display import Image


## Loading and Preprocessing Loans Encoded Data

Load the `sba_loans_encoded.csv` in a pandas DataFrame called `df_loans`.

In [2]:
# Loading data
path = Path("../Resources/sba_loans_encoded.csv")
df_loans = pd.read_csv(path)


Define the features set by copying the `df_loans` DataFrame and dropping the `Default` column.

In [4]:
# Define features set

X = df_loans.copy()
X.drop("Default", axis=1, inplace=True)




Create the target vector by assigning the values of the `Default` column from the `df_loans` DataFrame.

In [7]:
# Define target vector
y = df_loans['Default'].values.reshape(-1,1)
y[:5]

array([[0],
       [0],
       [0],
       [0],
       [0]])

Split the data into training and testing sets.

In [None]:
# Splitting into Train and Test sets
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=1)


Use the `StandardScaler` to scale the features data. Remember that only the `X_train` and `X_test` DataFrames should be scaled.

In [None]:
# Create the StandardScaler instance
scaler = StandardScaler()


In [None]:
# Fit the Standard Scaler with the training data
X_scaler = scaler.fit(X_train)


In [None]:
# Scale the training and testing data

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
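
To make the fit/transform split above concrete, here is a minimal pure-Python sketch of standardization on a single column. The helper names (`fit_standardizer`, `standardize`) and the loan amounts are made up for illustration and are not part of the activity's data. It assumes the same formula `StandardScaler` applies by default, `z = (x - mean) / std`, with the statistics learned from the training values only.

```python
from statistics import mean, pstdev


def fit_standardizer(train_column):
    """Learn the mean and population standard deviation from training values only."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    # Guard against constant columns to avoid dividing by zero
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standardize(column, mu, sigma):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(x - mu) / sigma for x in column]


# Hypothetical loan amounts, made up for illustration
train_amounts = [10.0, 20.0, 30.0]
test_amounts = [20.0, 40.0]

# The statistics come from the training column and are reused for the test column
mu, sigma = fit_standardizer(train_amounts)
train_scaled = standardize(train_amounts, mu, sigma)
test_scaled = standardize(test_amounts, mu, sigma)
```

Reusing the training statistics on the test column keeps information about the test set out of the preprocessing step, which is why the scaler is fitted with `X_train` only.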

## Fitting the Decision Tree Model

Once the data is scaled, create a decision tree instance and train it with the training data (`X_train_scaled` and `y_train`).

In [None]:
# Create the decision tree classifier instance
model = tree.DecisionTreeClassifier()


In [None]:
# Fit the model

model.fit(X_train_scaled, y_train)

## Making Predictions Using the Tree Model

Validate the trained model by predicting fraudulent loan applications using the testing data (`X_test_scaled`).

In [None]:
# Making predictions using the testing data
predictions = model.predict(X_test_scaled)


## Model Evaluation

Evaluate the model's results by using `sklearn` to calculate the confusion matrix and the accuracy score, and to generate the classification report.

In [None]:
# Calculating the confusion matrix
cm = confusion_matrix(y_test, predictions)
cm_df = pd.DataFrame(
    cm, index=["Actual 0", "Actual 1"],
    columns=["Predicted 0", "Predicted 1"]
)

# Calculating the accuracy score
acc_score = accuracy_score(y_test, predictions)

In [None]:
# Displaying results
print("Confusion Matrix")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("Classification Report")
print(classification_report(y_test, predictions))
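
As a rough illustration of how these numbers relate to each other, the sketch below computes the confusion matrix counts, accuracy, precision and recall by hand. The labels are made up for illustration (they are not results from the SBA loans data), and the helper names `confusion_counts` and `summarize` are hypothetical. The counts follow the same layout as the DataFrame above: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(y_true, y_pred):
    """Compute accuracy, plus precision and recall for the positive class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Of the loans flagged as defaults, how many really defaulted
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the loans that really defaulted, how many were flagged
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration: 8 non-defaults and 2 defaults
toy_actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_predicted = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

toy_metrics = summarize(toy_actual, toy_predicted)
print(toy_metrics)
```

In this toy case the accuracy is 0.8 while recall for the positive class is only 0.5, which is why the per-class rows of the classification report are worth reading alongside the accuracy score when one class is rare.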


## Visualizing the Decision Tree

In this section, you should create a visual representation of the decision tree using `pydotplus`. Show the graph in the notebook, and also save it in `PDF` and `PNG` formats.
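
One possible approach, shown here only as a sketch, is to export the fitted tree to Graphviz DOT text with `tree.export_graphviz` and then hand that text to `pydotplus`. The helper names `build_dot_data` and `render_tree`, the toy model, and the file paths are assumptions for illustration. Rendering also assumes the Graphviz binaries are installed on the machine.

```python
from sklearn import tree


def build_dot_data(fitted_model, feature_names, class_names):
    """Export a fitted decision tree to Graphviz DOT text."""
    return tree.export_graphviz(
        fitted_model,
        out_file=None,  # Return the DOT text instead of writing a file
        feature_names=list(feature_names),
        class_names=list(class_names),
        filled=True,
    )


def render_tree(dot_data, pdf_path, png_path):
    """Save the tree as PDF and PNG and return the PNG bytes for display."""
    # Imported here because rendering needs pydotplus and the Graphviz binaries
    import pydotplus

    graph = pydotplus.graph_from_dot_data(dot_data)
    graph.write_pdf(pdf_path)
    graph.write_png(png_path)
    return graph.create_png()


# Toy tree on made-up data, only to show the shape of the DOT output
toy_model = tree.DecisionTreeClassifier(random_state=1)
toy_model.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
toy_dot = build_dot_data(toy_model, ["toy_feature"], ["0", "1"])
print(toy_dot[:40])
```

In the notebook, the same helpers could be called as `Image(render_tree(build_dot_data(model, X.columns, ["0", "1"]), "sba_loans_tree.pdf", "sba_loans_tree.png"))`, where both file names are hypothetical. Because the model was trained on `X_train_scaled`, the split thresholds shown in the graph are in scaled units rather than the original feature units.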

In [None]:
# Create DOT data

# Draw graph

# Show graph



In [None]:
# Saving the tree as PDF


# Saving the tree as PNG



## Analysis Question

Finally, analyze the model's evaluation results and answer the following question.

* Would you trust this model to deploy a loan application approval solution in a bank?

 * **Your answer here**