This project is an example of a data science workflow. It uses the Python libraries listed below and three supervised classification models: Logistic Regression, Decision Trees, and Random Forests. The task is the following:
"Use the Machine Learning Workflow to process and transform the Mortgage Loan dataset to create a prediction model. This model must predict which people are likely to be approved for a mortgage loan with 75% or greater accuracy."
The project was a good learning exercise for me and, hopefully, is a good reference for you.
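The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses synthetic data from `make_classification` as a stand-in for the Mortgage Loan dataset, and the train/test split, hyperparameters, and names such as `TARGET_ACCURACY` and `results` are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption): 1000 rows, 12 numeric features,
# and a binary target playing the role of "approved / not approved".
X, y = make_classification(n_samples=1000, n_features=12, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The three supervised classifiers named in the project description.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

TARGET_ACCURACY = 0.75  # threshold stated in the task

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    verdict = "meets" if results[name] >= TARGET_ACCURACY else "misses"
    print("%s: accuracy %.3f (%s the 75%% target)" % (name, results[name], verdict))
```

Accuracy on a single held-out split is only one way to check the 75% requirement; cross-validation would give a more stable estimate.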
Loan Prediction: This data set consists of anonymized financial transactions associated with loan provision, together with data on the individuals involved. There are nearly 1000 observations and 12 features. Each observation is independent of the others.
- Aggregating Data - Pivot Tables
- Histograms - Distribution of Numerical Values
- Boxplots - Distribution of Numerical Values for Each Category of a Categorical Variable
- Cross Tabulation - Frequency Distribution of Categorical Variables
- Figure with Two Bar Subplots - Comparison of Variable Distributions to Form a Hypothesis
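As a rough illustration of the aggregation techniques in the list above, the sketch below builds a pivot table and a cross tabulation with pandas. The tiny DataFrame and its column names (`Gender`, `Credit_History`, `LoanAmount`, `Loan_Status`) are hypothetical and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical miniature of the loan data; values and column names are assumptions.
loans = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Credit_History": [1, 1, 0, 1, 0, 1],
    "LoanAmount": [120, 100, 150, 90, 200, 130],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y"],
})

# Pivot table: mean loan amount for each Gender / Credit_History combination.
mean_amount = loans.pivot_table(values="LoanAmount", index="Gender",
                                columns="Credit_History", aggfunc="mean")
print(mean_amount)

# Cross tabulation: how often each loan status occurs per credit history value.
status_by_history = pd.crosstab(loans["Credit_History"], loans["Loan_Status"])
print(status_by_history)

# Numeric counterpart of a boxplot: summary statistics of LoanAmount per status.
amount_summary = loans.groupby("Loan_Status")["LoanAmount"].describe()
print(amount_summary)
```

With Matplotlib installed, `loans["LoanAmount"].hist()` and `loans.boxplot(column="LoanAmount", by="Loan_Status")` would draw the corresponding histogram and boxplots.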
To run the project, the following must be installed on your system:
- Anaconda
- Python version: "^2.7"
- NumPy version: "^1.11.3"
- Matplotlib version: "^2.0.2"
- Pandas version: "^0.20.1"
- scikit-learn version: "^0.18.1"
If Anaconda is installed, you do not need to install Python and the packages separately, since the Anaconda distribution already includes them.
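To see which versions are actually present in your environment, a small check along these lines may help. It is only a sketch: the helper name `installed_versions` is made up for this example, and it reports versions without enforcing the ranges listed above.

```python
import importlib
import sys


def installed_versions(names):
    """Map each module name to its __version__ string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


print("Python %s" % sys.version.split()[0])
# scikit-learn is imported under the module name "sklearn".
for name, version in installed_versions(
        ["numpy", "matplotlib", "pandas", "sklearn"]).items():
    print("%s: %s" % (name, version or "not installed"))
```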