The purpose of this analysis is to use the features in the provided dataset to build a deep-learning neural network that acts as a binary classifier, predicting whether applicants will be successful if funded by Alphabet Soup.
Software:
- Jupyter Notebook 6.4.6
- Python
- scikit-learn library
- TensorFlow library
Data source:
This dataset contains more than 34,000 organizations that have received funding from Alphabet Soup over the years. Within this dataset are a number of columns that capture metadata about each organization, such as the following:
- EIN and NAME — Identification columns
- APPLICATION_TYPE — Alphabet Soup application type
- AFFILIATION — Affiliated sector of industry
- CLASSIFICATION — Government organization classification
- USE_CASE — Use case for funding
- ORGANIZATION — Organization type
- STATUS — Active status
- INCOME_AMT — Income classification
- SPECIAL_CONSIDERATIONS — Special considerations for application
- ASK_AMT — Funding amount requested
- IS_SUCCESSFUL — Was the money used effectively
The model's target is the predicted outcome, or dependent variable, defined by the IS_SUCCESSFUL column: a binary flag indicating whether the money was used effectively, i.e., whether the applicant was successful after being funded by Alphabet Soup.
The model's features are the variables used to make a prediction, or independent variables, defined by:
- APPLICATION_TYPE
- AFFILIATION
- CLASSIFICATION
- USE_CASE
- ORGANIZATION
- STATUS
- INCOME_AMT
- SPECIAL_CONSIDERATIONS
- ASK_AMT
There are 2 columns that are neither targets nor features and should be removed from the input data, namely EIN and NAME, the identification columns.
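A minimal sketch of this preprocessing step is shown below, assuming the data file is named charity_data.csv and using illustrative variable names; it is not the exact notebook code.

```python
# Sketch: drop the identification columns, encode the categorical features,
# then split and scale the data for the neural network.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

application_df = pd.read_csv("charity_data.csv")        # assumed file name
application_df = application_df.drop(columns=["EIN", "NAME"])

# One-hot encode the categorical columns and separate the target.
dummies_df = pd.get_dummies(application_df)
y = dummies_df["IS_SUCCESSFUL"].values
X = dummies_df.drop(columns=["IS_SUCCESSFUL"]).values

# Split and scale the data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```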
- For the first model, I chose 2 hidden layers with 80 and 30 neurons, respectively.
- I used 2 hidden layers because an additional layer would have been redundant and could have increased the chance of overfitting the training data.
- The number of neurons follows a common rule of thumb of having 2 to 3 times as many neurons in the hidden layers as there are input features.
The activation functions:
- I selected the ReLU activation function for the 2 hidden layers, which is well suited to positive, nonlinear input data.
- I selected the sigmoid activation function for the output layer because it is ideal for binary classification, giving the probability that an applicant will be successful.
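Below is a minimal sketch of this architecture using the Keras API from TensorFlow, assuming the scaled data from the preprocessing sketch above; the layer sizes and activations are the ones described here, while the number of epochs is an illustrative assumption.

```python
# Sketch: two ReLU hidden layers (80 and 30 neurons) and a sigmoid output
# layer for the binary classifier.
import tensorflow as tf

number_input_features = X_train_scaled.shape[1]

nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=80, activation="relu", input_dim=number_input_features),
    tf.keras.layers.Dense(units=30, activation="relu"),
    tf.keras.layers.Dense(units=1, activation="sigmoid"),
])

nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
nn.fit(X_train_scaled, y_train, epochs=100)   # epoch count is illustrative

# Report accuracy on the held-out test set.
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test, verbose=2)
```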
**Optimizing the model:** in order to achieve the target predictive accuracy of higher than 75%, up from the original result of approximately 73%, I made the following changes.
Besides bucketing (binning) the APPLICATION_TYPE and CLASSIFICATION columns, I also binned the ASK_AMT column because the number of unique values in that column is very high compared with the other columns.
I then made multiple attempts at optimizing the model, and each attempt performed slightly worse than the original model.
Original result: approximately 73%
The 1st attempt:
The 2nd attempt:
The 3rd attempt:
Overall, the deep learning model did not reach the target performance of 75% accuracy, despite multiple optimization attempts. Adjusting the input data by dropping more columns, creating more bins for rare occurrences in columns, adding more neurons to a hidden layer, adding more hidden layers, and using different activation functions for the hidden layers did not improve model performance.
My recommendation for solving this classification problem with a different model would be to try other types of supervised learning, such as logistic regression, a support vector machine (SVM), or a random forest. Of these, I would expect a random forest to be the most likely to reach the target predictive accuracy. Random forest models are robust against overfitting, because the many weak learners are trained on different subsets of the data, and they are also robust to outliers and nonlinear data.
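As a sketch of this recommendation, a random forest could be trained on the same preprocessed data with scikit-learn; the variable names and the n_estimators value below are illustrative assumptions, not tuned results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Train a random forest on the same train/test split used for the neural network.
rf_model = RandomForestClassifier(n_estimators=128, random_state=42)
rf_model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = rf_model.predict(X_test)
print(f"Random forest accuracy: {accuracy_score(y_test, y_pred):.4f}")
```

Unlike the neural network, the tree-based model does not require feature scaling, which also simplifies the preprocessing pipeline.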