Zethtren/6hr_Timed_project

Project 4

Resources:

The Notebook: (Use caution when running these cells. I am using a machine with 16GB of RAM and GPU optimization for XGBoost, and I experienced many kernel crashes while putting this notebook together.) https://git.generalassemb.ly/hbova/project_4/blob/master/ExploringLoanData.ipynb

The Data: https://www.kaggle.com/wendykan/lending-club-loan-data#loan.csv

The Presentation: https://docs.google.com/presentation/d/1wtbOwRK--AOGInoV7aIMq7XFDlodqQc25Gk-gFhcjjM/edit?usp=sharing

The Question:

Can we predict the count of delinquencies a person has based on other metrics collected when applying for a loan?

Hypothesis:

I believe that there are features that can significantly predict how many payment delinquencies someone has had. This information will be useful in determining whether someone is likely to miss payments, which is something any lender would love to have available when providing a loan service.

The Data:

The data I will be using is from the Kaggle website: https://www.kaggle.com/wendykan/lending-club-loan-data#loan.csv

There are many variables that seemed irrelevant to the prediction task and many that were directly correlated with one another. During EDA all of these features were removed, and missing data was handled with various methods of dropping and imputing.
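
As a rough illustration of that cleanup, here is a minimal pandas sketch. It is not the notebook's actual code: the helper names `drop_correlated` and `handle_missing`, the 0.95 correlation threshold, the 50% missing-value cutoff, median imputation, and the toy values are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.95):
    """Drop one column from each pair of numeric columns with |r| above threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def handle_missing(df, max_missing=0.5):
    """Drop mostly-empty columns, impute numeric gaps, drop remaining incomplete rows."""
    keep = df.columns[df.isna().mean() <= max_missing]
    out = df[keep].copy()
    num_cols = out.select_dtypes(include="number").columns
    # Median imputation is an assumption; the project used "various methods".
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out.dropna()


# Toy stand-in for loan.csv (hypothetical values).
toy = pd.DataFrame({
    "loan_amnt": [1000.0, 2000.0, 3000.0, 4000.0],
    "funded_amnt": [1000.0, 2000.0, 3000.0, 4000.0],  # duplicates loan_amnt
    "annual_inc": [50000.0, np.nan, 42000.0, 61000.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = handle_missing(drop_correlated(toy))
print(cleaned)
```

On the full dataset the same two steps would run on the DataFrame loaded from loan.csv, with thresholds tuned by inspection rather than taken from this sketch.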

Modeling:

The first model was used to determine how well the data can predict my outcome. I first fit a Generalized Linear Model from the statsmodels library, passing a family class of Poisson. This allowed me to get a good fit to the data, since the target consists of whole integers, with many values at 0 and frequencies decreasing as the value rises. After analyzing the coefficients and p-values from the statsmodels summary, I went on to try to build predictive models.

Analyzing Results:

With the methods I followed, I was unable to build a model that generated accurate predictions. I was, however, able to identify some strong trends in the data. To me this indicates that it is possible to model this data; I would just need to adjust the parameters of my models.

Conclusions:

I can conclude that the relationships of home ownership category and loan purpose with the number of delinquencies are significant. Beyond that, I am not able to conclude at this time whether that information can be used to build an effective model. Given how high some of the values were, I do believe that an accurate model exists. I was greatly limited by computational resources and time during this project. Given more time, I suspect an SVC model would fit very well.

Project Time Limit was 6 hours

About

This project was completed in full during one class period, with an overall work time of 6 hours. It was the first project in which we chose our own problem and dataset.
