In [1]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [2]:
# Read the CSV file into a DataFrame
application_df = pd.read_csv('Resources/charity_data.csv')
application_df.nunique()

EIN                       34299
NAME                      19568
APPLICATION_TYPE             17
AFFILIATION                   6
CLASSIFICATION               71
USE_CASE                      5
ORGANIZATION                  4
STATUS                        2
INCOME_AMT                    9
SPECIAL_CONSIDERATIONS        2
ASK_AMT                    8747
IS_SUCCESSFUL                 2
dtype: int64

In [3]:
# Split the data into features (X) and target (y)
X = application_df.drop(['IS_SUCCESSFUL'], axis=1)
y = application_df['IS_SUCCESSFUL']

# Apply one-hot encoding to categorical variables
X = pd.get_dummies(X)

# Split the preprocessed data into a training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

In [4]:
# Step 2: Model Architecture
# Define the model architecture
nn = tf.keras.models.Sequential()

# Add hidden layers
nn.add(tf.keras.layers.Dense(units=128, activation='relu', input_dim=X_train.shape[1]))
nn.add(tf.keras.layers.Dense(units=64, activation='relu'))

# Add output layer
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [5]:
# Step 3: Training the Model
# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the StandardScaler to the training data
X_scaler = scaler.fit(X_train)

# Scale the features data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [9]:
# Train the model; fit() returns a History object, not the model itself
history = nn.fit(X_train_scaled, y_train, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


In [10]:
# Step 4: Evaluate the Model
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

Loss: 1.1440248489379883, Accuracy: 0.6587755084037781


In [11]:
# Step 5: Save the Model
nn.save('AlphabetSoupCharity_Optimisation2.h5')

After evaluating the model using the test data, the results are as follows:

The model achieved a loss of 1.1440 and an accuracy of 0.6588. This means that the model correctly predicted the outcome for approximately 65.88% of the test samples.

While the accuracy is above 50%, the level expected from random guessing on a balanced binary target, there is still considerable room for improvement.
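One way to make the comparison with guessing concrete is a majority-class baseline, i.e. the accuracy obtained by always predicting the most frequent label. The sketch below uses a small made-up label list rather than the charity data, and `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels for illustration only (not taken from charity_data.csv)
example_labels = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
baseline = majority_baseline_accuracy(example_labels)
print(f"Majority-class baseline: {baseline:.2f}")
```

On the real data, the same figure could be obtained with `y_test.value_counts(normalize=True).max()`. A model is only informative if its accuracy clearly exceeds that value.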

To optimize the model further and aim for a target predictive accuracy higher than 75%, the following steps can be taken:

1. Adjust the model architecture: Experiment with adding more hidden layers or increasing the number of neurons in each layer. This allows the model to capture more complex patterns in the data.

2. Try different activation functions: The choice of activation functions in the hidden layers can have an impact on the model's performance. Experiment with different activation functions such as 'relu', 'sigmoid', or 'tanh' to see if they improve the accuracy.

3. Adjust the number of epochs: Training for more epochs gives the model more passes over the data, but it can also lead to overfitting. Changing the number of epochs gradually while monitoring performance on held-out data can help identify a suitable training duration.

4. Feature engineering: Explore the dataset and consider creating new features or transforming existing ones to better represent the underlying patterns. This could involve binning numerical features, encoding categorical variables differently, or performing other transformations.
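The feature engineering step above could look roughly like the following sketch, which groups rare categories into a single bucket and log-transforms a skewed numeric column before one-hot encoding. The toy values, the `min_count` cutoff, and the helper name `bin_rare_categories` are assumptions for illustration, not choices made in the notebook.

```python
import numpy as np
import pandas as pd

def bin_rare_categories(series, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other_label)

# Toy data for illustration only; values are not taken from charity_data.csv
toy_df = pd.DataFrame({
    "APPLICATION_TYPE": ["T3", "T3", "T3", "T4", "T4", "T9", "T12"],
    "ASK_AMT": [5000, 5000, 10000, 250000, 5000, 1000000, 5000],
})

toy_df["APPLICATION_TYPE"] = bin_rare_categories(toy_df["APPLICATION_TYPE"], min_count=2)
# log1p compresses a long right tail, if the column has one
toy_df["ASK_AMT_LOG"] = np.log1p(toy_df["ASK_AMT"])

encoded = pd.get_dummies(toy_df.drop(columns="ASK_AMT"))
print(encoded.columns.tolist())
```

Binning reduces the number of dummy columns that `pd.get_dummies` produces, which matters for columns such as CLASSIFICATION with 71 distinct values.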

By implementing these optimization strategies and fine-tuning the model, it may be possible to reach the target predictive accuracy of more than 75%.
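A minimal sketch of how the architecture, activation, and training-duration adjustments listed above might be combined, assuming TensorFlow 2.x. `build_model` and its parameters are hypothetical names, the random arrays stand in for `X_train_scaled` and `y_train`, and early stopping on a validation split is an added assumption rather than something the notebook does.

```python
import numpy as np
import tensorflow as tf

def build_model(n_features, hidden_units=(128, 64), activation="relu"):
    """Build a binary classifier with a configurable stack of hidden layers."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation=activation))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

# Random stand-in data (assumption): replace with X_train_scaled and y_train
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 10)).astype("float32")
y_demo = (X_demo[:, 0] > 0).astype("float32")

# Stop when validation loss has not improved for 5 epochs, keeping the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

candidate = build_model(n_features=10, hidden_units=(32, 16, 8), activation="tanh")
history = candidate.fit(
    X_demo, y_demo,
    validation_split=0.2,
    epochs=50,
    callbacks=[early_stop],
    verbose=0,
)
print(f"Stopped after {len(history.history['loss'])} epochs")
```

Keeping the layer sizes and activation as function arguments makes it straightforward to compare several configurations under the same training setup.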

In summary, the initial model achieved an accuracy of 65.88%, short of the 75% target. Adjusting the model architecture, activation functions, and training duration, together with feature engineering to better represent the data, may improve its performance on this classification problem.

Analysis: Predicting Successful Donations with Deep Learning

1. Introduction

The purpose of this analysis is to develop a deep learning model that can predict whether a donation request made to Alphabet Soup, a fictional charity organization, will be successful or not. By training the model on historical data, the goal is to achieve a predictive accuracy higher than 75% to assist Alphabet Soup in identifying the most promising donation opportunities.

2. Data Preprocessing

- Target variable: The target variable for our model is "IS_SUCCESSFUL," which indicates whether a donation request was successful (1) or not (0).
- Features: The features used to predict the target variable include variables such as "APPLICATION_TYPE," "AFFILIATION," "USE_CASE," and others.
- Excluded variables: Variables that are neither targets nor features, such as unique identifiers or irrelevant metadata, are removed from the input data during preprocessing.
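As a sketch of the exclusion step described above, assuming EIN and NAME are the identifier columns to remove. Note that cell [3] as shown drops only IS_SUCCESSFUL, so NAME, with 19,568 distinct values, would be expanded into one dummy column per value. The toy frame below reuses a few column names from the dataset with made-up values.

```python
import pandas as pd

# Toy frame mirroring a few column names from the dataset; values are made up
toy_df = pd.DataFrame({
    "EIN": [1, 2, 3],
    "NAME": ["ORG A", "ORG B", "ORG C"],
    "APPLICATION_TYPE": ["T10", "T3", "T5"],
    "ASK_AMT": [5000, 20000, 5000],
    "IS_SUCCESSFUL": [1, 1, 0],
})

# Assumption: EIN and NAME are the identifier columns to exclude
id_columns = ["EIN", "NAME"]

y = toy_df["IS_SUCCESSFUL"]
X = pd.get_dummies(toy_df.drop(columns=id_columns + ["IS_SUCCESSFUL"]))
print(X.columns.tolist())
```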

3. Compiling, Training, and Evaluating the Model

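- Model architecture: The neural network is a sequential model with two hidden layers of 128 and 64 neurons, both using the ReLU activation function, followed by a single-neuron output layer with a sigmoid activation. The activation functions introduce non-linearity, allowing the model to learn complex patterns in the data. The model is compiled with binary cross-entropy loss and the Adam optimizer and trained for 100 epochs.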
- Optimization attempts: Initially, the model achieved an accuracy of 65.88%, which is below the target of 75%. To improve performance, the model architecture, activation functions, and training duration can be adjusted.
- Results:
  - Loss: 1.1440 on the test set. The loss (binary cross-entropy) measures the error in the model's predicted probabilities. Lower values indicate better performance.
  - Accuracy: 0.6588 on the test set. The accuracy metric measures the fraction of correctly predicted outcomes. Higher values indicate better performance.
- Summary: The initial model achieved an accuracy of 65.88%, indicating its ability to perform better than random guessing. However, further optimization is required to meet the target accuracy of over 75%.
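To illustrate how the two metrics above are computed, here is a small pure-Python sketch of mean binary cross-entropy and thresholded accuracy. The labels and probabilities are hypothetical, and the helper names are not part of the notebook; Keras computes the same quantities internally with its own numerical details.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == t for t, p in zip(y_true, y_prob))
    return correct / len(y_true)

# Hypothetical labels and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.99]

print(f"Loss: {binary_cross_entropy(y_true, y_prob):.4f}")
print(f"Accuracy: {accuracy(y_true, y_prob):.2f}")
```

In this toy case a single confident wrong prediction (0.99 for a true label of 0) contributes most of the loss, which is one way a loss above 1 can coexist with moderate accuracy.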

4. Using a Different Model

To solve the classification problem of predicting successful donations, an alternative model that could be considered is the Random Forest algorithm. Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It offers several advantages:

- Minimal preprocessing: Random Forest splits on feature thresholds, so features do not need to be scaled. Note that some implementations, including scikit-learn's, still require categorical variables to be encoded numerically (for example, with one-hot encoding).
- Feature importance: The algorithm provides a measure of feature importance, allowing us to identify the variables that contribute the most to the prediction.
- Non-linear relationships: Random Forest can capture non-linear relationships between features and the target variable.
- Robustness to outliers: The algorithm is less sensitive to outliers compared to some other models, making it more resilient to noise in the data.

By using a Random Forest model, we can leverage its ability to capture non-linear relationships and its insensitivity to feature scaling to potentially improve the prediction accuracy for successful donations.
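A minimal sketch of the Random Forest alternative, assuming scikit-learn's `RandomForestClassifier`. The data here is synthetic (column names echo the dataset, values and the target rule are made up), so the printed accuracy says nothing about performance on the charity data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); column names echo the dataset, values do not
rng = np.random.default_rng(1)
n = 400
demo_df = pd.DataFrame({
    "APPLICATION_TYPE": rng.choice(["T3", "T4", "T5"], size=n),
    "ORGANIZATION": rng.choice(["Trust", "Association"], size=n),
    "ASK_AMT": rng.integers(5000, 100000, size=n),
})
# Synthetic target loosely tied to APPLICATION_TYPE so there is signal to learn
demo_y = ((demo_df["APPLICATION_TYPE"] == "T3") ^ (rng.random(n) < 0.1)).astype(int)

# scikit-learn forests need numeric input, so categoricals are one-hot encoded
demo_X = pd.get_dummies(demo_df)
X_tr, X_te, y_tr, y_te = train_test_split(demo_X, demo_y, random_state=1)

# No StandardScaler: tree splits are unaffected by monotonic feature rescaling
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_tr, y_tr)

print(f"Test accuracy: {rf.score(X_te, y_te):.3f}")
importances = pd.Series(rf.feature_importances_, index=demo_X.columns)
print(importances.sort_values(ascending=False))
```

The `feature_importances_` attribute gives the per-feature importance mentioned above. With one-hot encoding, the importance of a categorical variable is spread across its dummy columns.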

In conclusion, the initial deep learning model achieved moderate accuracy but fell short of the target performance. By adjusting the model architecture, activation functions, and training duration, along with exploring alternative models such as Random Forest, we can aim for higher accuracy in predicting successful donations for Alphabet Soup. The insights gained from this analysis can guide the organization in making informed decisions about donation opportunities and maximize their impact in the real world.