# Venture Funding with Deep Learning

This notebook helps the business team build a model that predicts whether applicants will be successful if funded by Alphabet Soup.

The business team has provided a CSV file containing more than 34,000 organizations that have received funding from Alphabet Soup over the years. Using this dataset, the notebook builds a binary classifier that predicts whether an applicant will become a successful business. The CSV file (applicants_data.csv) contains a variety of information about these businesses, including whether or not they ultimately became successful.

It does so in the following steps:

### Preparing the Data for Use on a Neural Network Model 

### Compiling and Evaluating a Binary Classification Model Using a Neural Network

### Possibly Optimizing the Neural Network Model

### Reviewing the Models

In [1]:
# Imports for the notebook.
import pandas as pd
from pathlib import Path
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,OneHotEncoder

---

## Preparing the Data for Use on a Neural Network Model

In [2]:
# Reads the applicants_data.csv file from the Resources folder into a Pandas DataFrame.
applicant_data_df = pd.read_csv(Path("Resources/applicants_data.csv"))

# Reviews the DataFrame.
applicant_data_df.head()

Unnamed: 0,EIN,NAME,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,10520599,BLUE KNIGHTS MOTORCYCLE CLUB,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,10531628,AMERICAN CHESAPEAKE CLUB CHARITABLE TR,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,10547893,ST CLOUD PROFESSIONAL FIREFIGHTERS,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,10553066,SOUTHSIDE ATHLETIC ASSOCIATION,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,10556103,GENETIC RESEARCH INSTITUTE OF THE DESERT,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1


In [3]:
# Reviews the data types associated with the columns.
applicant_data_df.dtypes

EIN                        int64
NAME                      object
APPLICATION_TYPE          object
AFFILIATION               object
CLASSIFICATION            object
USE_CASE                  object
ORGANIZATION              object
STATUS                     int64
INCOME_AMT                object
SPECIAL_CONSIDERATIONS    object
ASK_AMT                    int64
IS_SUCCESSFUL              int64
dtype: object

In [4]:
# Drops the 'EIN' and 'NAME' columns from the DataFrame as these are not essential.
applicant_data_df = applicant_data_df.drop(columns = ["EIN", "NAME"])

# Reviews the DataFrame.
applicant_data_df.head()

Unnamed: 0,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1


In [5]:
# Creates a list of categorical variables.
categorical_variables = list(applicant_data_df.dtypes[applicant_data_df.dtypes == "object"].index)

# Displays the categorical variables list.
categorical_variables

['APPLICATION_TYPE',
 'AFFILIATION',
 'CLASSIFICATION',
 'USE_CASE',
 'ORGANIZATION',
 'INCOME_AMT',
 'SPECIAL_CONSIDERATIONS']

In [6]:
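# Creates a OneHotEncoder instance that returns a dense array.
# Note: scikit-learn 1.2+ renames this parameter to sparse_output (sparse was removed in 1.4).
enc = OneHotEncoder(sparse = False)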

In [7]:
# Encodes the categorical variables using OneHotEncoder.
encoded_data = enc.fit_transform(applicant_data_df[categorical_variables])

In [8]:
# Creates a DataFrame with the encoded variables.
encoded_df = pd.DataFrame(
    encoded_data,
    columns = enc.get_feature_names_out(categorical_variables))

# Reviews the DataFrame.
encoded_df.head()

Unnamed: 0,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,APPLICATION_TYPE_T25,APPLICATION_TYPE_T29,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
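To make the encoded columns above easier to read, here is a minimal hand-rolled sketch of one-hot encoding for a single column. It is not how scikit-learn implements `OneHotEncoder`; the helper `one_hot` and the toy rows are assumptions made for illustration. It only mirrors the `FEATURE_value` naming and the alphabetical category order visible in the output.

```python
# Hand-rolled one-hot encoding for illustration only; the notebook itself
# relies on scikit-learn's OneHotEncoder.
def one_hot(rows, column):
    """Return (column_names, encoded_rows) for one categorical column."""
    # Categories are sorted as strings, which is why T10 sorts before T3.
    categories = sorted({row[column] for row in rows})
    names = [f"{column}_{category}" for category in categories]
    encoded = [
        [1.0 if row[column] == category else 0.0 for category in categories]
        for row in rows
    ]
    return names, encoded


# Toy rows using the APPLICATION_TYPE values of the first three records.
sample_rows = [
    {"APPLICATION_TYPE": "T10"},
    {"APPLICATION_TYPE": "T3"},
    {"APPLICATION_TYPE": "T5"},
]

names, encoded = one_hot(sample_rows, "APPLICATION_TYPE")
print(names)
for row in encoded:
    print(row)
```

Each row carries exactly one 1.0, in the column that matches its category.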


In [9]:
# Creates a DataFrame with the columns containing numerical variables from the original dataset.
numerical_variables_df = applicant_data_df.drop(columns = categorical_variables)

# Reviews the DataFrame.
numerical_variables_df.head()

Unnamed: 0,STATUS,ASK_AMT,IS_SUCCESSFUL
0,1,5000,1
1,1,108590,1
2,1,5000,0
3,1,6692,1
4,1,142590,1


In [10]:
# Adds the numerical variables from the original DataFrame to the one-hot encoding DataFrame.
encoded_df = pd.concat([numerical_variables_df, encoded_df], axis = 1)

# Reviews the DataFrame.
encoded_df.head()

Unnamed: 0,STATUS,ASK_AMT,IS_SUCCESSFUL,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1,5000,1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,1,108590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,1,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,1,6692,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,1,142590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [11]:
# Defines the target, setting y as the IS_SUCCESSFUL column.
y = encoded_df["IS_SUCCESSFUL"]

# Displays a sample of y.
y[:5]

0    1
1    1
2    0
3    1
4    1
Name: IS_SUCCESSFUL, dtype: int64

In [12]:
# Defines the features, setting X by selecting all columns but IS_SUCCESSFUL.
X = encoded_df.drop(columns = ["IS_SUCCESSFUL"]).copy()

# Reviews the features DataFrame.
X.head()

Unnamed: 0,STATUS,ASK_AMT,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1,5000,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,1,108590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,1,6692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,1,142590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [13]:
# Splits the preprocessed data into a training and testing dataset.
# Assigns the function a random_state equal to 1, to keep information consistent. 
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)

In [14]:
# Creates a StandardScaler instance.
scaler = StandardScaler()

# Fits the scaler to the features training dataset.
X_scaler = scaler.fit(X_train)

# Scales the training and testing features using the fitted scaler.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
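The scaling cell fits on the training split only and then applies the same statistics to both splits. A minimal sketch of that idea for a single feature follows, assuming the usual standardization `z = (x - mean) / std` with the population standard deviation and a non-zero spread. The helper names and toy values are hypothetical and are not taken from the dataset.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from training data."""
    return fmean(train_values), pstdev(train_values)


def standardize(values, mean, std):
    """Apply previously learned statistics; assumes std is not zero."""
    return [(value - mean) / std for value in values]


# Toy feature values (assumed for illustration).
train_values = [2.0, 4.0, 6.0]
test_values = [8.0]

mean, std = fit_standardizer(train_values)
scaled_train = standardize(train_values, mean, std)
scaled_test = standardize(test_values, mean, std)

print(scaled_train)
print(scaled_test)
```

Because the test value is scaled with the training statistics, it can fall outside the range seen in training, and no information from the test split leaks into the fit.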

---

## Compiling and Evaluating the Binary Classification Model Using a Neural Network

In [15]:
# Defines the number of inputs (features) to the model.
number_input_features = X_train.shape[1]

# Reviews the number of inputs.
number_input_features

116

In [16]:
# Defines the number of neurons in the output layer.
number_output_neurons = 1

In [17]:
# Defines the number of hidden nodes for the first hidden layer.
hidden_nodes_layer1 = (number_input_features + number_output_neurons) // 2

# Reviews the number of hidden nodes in the first layer.
hidden_nodes_layer1

58

In [18]:
# Defines the number of hidden nodes for the second hidden layer.
hidden_nodes_layer2 = (hidden_nodes_layer1 + number_output_neurons) // 2

# Reviews the number of hidden nodes in the second layer.
hidden_nodes_layer2

29
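The two cells above apply the same rule of thumb twice: add the output count to the previous layer size and floor-divide. The sketch below wraps that rule in a helper; the function name `hidden_layer_sizes` and its `divisors` parameter are assumptions for illustration, and the rule is only a starting heuristic, not a guarantee of a good architecture.

```python
def hidden_layer_sizes(n_inputs, n_outputs, divisors):
    """Return integer hidden-layer sizes, shrinking layer by layer."""
    sizes = []
    previous = n_inputs
    for divisor in divisors:
        # int() keeps the size a whole number even when the divisor is a float.
        size = int((previous + n_outputs) // divisor)
        sizes.append(size)
        previous = size
    return sizes


# 116 input features and 1 output neuron, halving twice.
sizes = hidden_layer_sizes(116, 1, [2, 2])
print(sizes)  # [58, 29]
```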

In [19]:
# Creates the Sequential model instance.
nn = Sequential()

In [20]:
# Adds the first hidden layer, input layer, and the activation function to the model.
nn.add(Dense(units = hidden_nodes_layer1, input_dim = number_input_features, activation = "relu"))

# Adds the second hidden layer and the activation function to the model.
nn.add(Dense(units = hidden_nodes_layer2, activation = "relu"))

# Adds the output layer to the model and the activation function.
nn.add(Dense(units = number_output_neurons, activation = "sigmoid"))

In [21]:
# Displays the Sequential model summary.
nn.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 58)                6786      
                                                                 
 dense_1 (Dense)             (None, 29)                1711      
                                                                 
 dense_2 (Dense)             (None, 1)                 30        
                                                                 
Total params: 8,527
Trainable params: 8,527
Non-trainable params: 0
_________________________________________________________________
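The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The following sketch reproduces the numbers; the helper `dense_param_counts` is a hypothetical name introduced here.

```python
def dense_param_counts(n_inputs, layer_units):
    """Return the parameter count of each fully connected layer."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        # Weights (fan_in * units) plus one bias per unit.
        counts.append(fan_in * units + units)
        fan_in = units
    return counts


counts = dense_param_counts(116, [58, 29, 1])
print(counts, sum(counts))  # [6786, 1711, 30] 8527
```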


In [22]:
# Compiles the Sequential model.
nn.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])
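For readers unfamiliar with the two quantities named in the compile step, here is a minimal sketch of binary cross-entropy and accuracy computed on a few toy predictions. It is a plain-Python illustration, not the Keras implementation; the clipping value `eps`, the 0.5 threshold, and the toy labels and probabilities are assumptions.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of predictions on the correct side of the threshold."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return hits / len(y_true)


# Toy labels and predicted probabilities (assumed for illustration).
y_true = [1, 1, 0, 1]
y_prob = [0.9, 0.4, 0.2, 0.7]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"Loss: {loss:.4f}, Accuracy: {acc}")
```

The loss penalizes confident wrong answers more heavily than accuracy does, which is why two models with similar accuracy can still differ in loss.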

In [23]:
# Fits the model using 50 epochs and the training data.
fit_model = nn.fit(X_train_scaled, y_train, epochs = 50)

Epoch 1/50
...
Epoch 50/50


In [24]:
# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

268/268 - 0s - loss: 0.5537 - accuracy: 0.7326 - 328ms/epoch - 1ms/step
Loss: 0.5537441968917847, Accuracy: 0.7325947284698486
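The `268/268` in the evaluation log is the number of batches, and it can be reconstructed from the split. The sketch below assumes the 34,299 rows implied by the `STATUS` value counts later in the notebook, the default 25% test share of `train_test_split`, and the default Keras batch size of 32.

```python
import math

n_rows = 34_299        # 34,294 + 5, from the STATUS value counts
test_fraction = 0.25   # default when neither test_size nor train_size is given
batch_size = 32        # default batch size for evaluate()

# The test split is rounded up; the training split takes the remainder.
n_test = math.ceil(test_fraction * n_rows)
n_train = n_rows - n_test
eval_batches = math.ceil(n_test / batch_size)

print(n_train, n_test, eval_batches)  # 25724 8575 268
```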


---

In [25]:
# Saving the model.

# Sets the model's file path.
file_path = Path("Resources/AlphabetSoup.h5")

# Exports the model to an HDF5 file in Resources.
nn.save(file_path)

---

## Possibly Optimizing the Neural Network Model


In [26]:
# Reviews the previous data for re-working.
applicant_data_df.head()

Unnamed: 0,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1


In [27]:
# Checks value counts for STATUS category.
applicant_data_df["STATUS"].value_counts()

1    34294
0        5
Name: STATUS, dtype: int64

In [28]:
# Checks value counts for SPECIAL_CONSIDERATIONS category.
applicant_data_df["SPECIAL_CONSIDERATIONS"].value_counts()

N    34272
Y       27
Name: SPECIAL_CONSIDERATIONS, dtype: int64
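Both columns are almost constant, which is the motivation for testing a model without them. One way to quantify that, sketched below, is the share of rows taken by the most common value. The helper `dominant_share` is a hypothetical name, and the lists are rebuilt from the value counts shown above rather than read from the CSV.

```python
from collections import Counter


def dominant_share(values):
    """Return the fraction of rows that hold the most common value."""
    counts = Counter(values)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(values)


# Rebuilt from the value counts above.
status = [1] * 34_294 + [0] * 5
special_considerations = ["N"] * 34_272 + ["Y"] * 27

status_share = dominant_share(status)
special_share = dominant_share(special_considerations)
print(f"STATUS: {status_share:.5f}")
print(f"SPECIAL_CONSIDERATIONS: {special_share:.5f}")
```

A share this close to 1 means the column carries very little signal for the classifier, although dropping it is still an experiment rather than a guaranteed improvement.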

In [29]:
# Removes above categories to test if this might improve the model.
reduced_data_df = applicant_data_df.drop(columns = ["STATUS", "SPECIAL_CONSIDERATIONS"]).copy()

# Reviews reduced DataFrame.
reduced_data_df.head()

Unnamed: 0,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,INCOME_AMT,ASK_AMT,IS_SUCCESSFUL
0,T10,Independent,C1000,ProductDev,Association,0,5000,1
1,T3,Independent,C2000,Preservation,Co-operative,1-9999,108590,1
2,T5,CompanySponsored,C3000,ProductDev,Association,0,5000,0
3,T3,CompanySponsored,C2000,Preservation,Trust,10000-24999,6692,1
4,T3,Independent,C1000,Heathcare,Trust,100000-499999,142590,1


In [30]:
# Creates a list of categorical variables.
categorical_variables2 = list(reduced_data_df.dtypes[reduced_data_df.dtypes == "object"].index)

# Displays the categorical variables list.
categorical_variables2

['APPLICATION_TYPE',
 'AFFILIATION',
 'CLASSIFICATION',
 'USE_CASE',
 'ORGANIZATION',
 'INCOME_AMT']

In [31]:
# Encodes the categorical variables using OneHotEncoder.
encoded_data2 = enc.fit_transform(reduced_data_df[categorical_variables2])

In [32]:
# Creates a DataFrame with the encoded variables.
encoded_df2 = pd.DataFrame(
    encoded_data2,
    columns = enc.get_feature_names_out(categorical_variables2))

# Reviews the DataFrame.
encoded_df2.head()

Unnamed: 0,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,APPLICATION_TYPE_T25,APPLICATION_TYPE_T29,...,ORGANIZATION_Trust,INCOME_AMT_0,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [33]:
# Creates a DataFrame with the columns containing numerical variables from the reduced dataset.
numerical_variables_df2 = reduced_data_df.drop(columns = categorical_variables2)

# Reviews the DataFrame.
numerical_variables_df2.head()

Unnamed: 0,ASK_AMT,IS_SUCCESSFUL
0,5000,1
1,108590,1
2,5000,0
3,6692,1
4,142590,1


In [34]:
# Adds the numerical variables from the original DataFrame to the one-hot encoding DataFrame.
encoded_df2 = pd.concat([numerical_variables_df2, encoded_df2], axis = 1)

# Reviews the DataFrame.
encoded_df2.head()

Unnamed: 0,ASK_AMT,IS_SUCCESSFUL,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,...,ORGANIZATION_Trust,INCOME_AMT_0,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M
0,5000,1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,108590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,6692,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,142590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [35]:
# Defines the features, setting X2 by selecting all columns but IS_SUCCESSFUL.
X2 = encoded_df2.drop(columns = ["IS_SUCCESSFUL"]).copy()

# Reviews the features DataFrame.
X2.head()

Unnamed: 0,ASK_AMT,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,APPLICATION_TYPE_T25,...,ORGANIZATION_Trust,INCOME_AMT_0,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M
0,5000,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,108590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,6692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,142590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [36]:
# Splits the preprocessed data into a training and testing dataset.
# Assigns the function a random_state equal to 1, to keep information consistent.
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y, random_state = 1)

In [37]:
# Fits a new scaler to the reduced features training dataset,
# leaving the scaler fitted for the original model untouched.
X_scaler2 = StandardScaler().fit(X_train2)

# Scales the reduced training and testing features using the fitted scaler.
X_train_scaled2 = X_scaler2.transform(X_train2)
X_test_scaled2 = X_scaler2.transform(X_test2)

### Alternative Model 1

In [38]:
# Defines the number of inputs (features) to the model.
number_input_features2 = X_train2.shape[1]

# Reviews the number of inputs.
number_input_features2

113

In [39]:
# Defines the number of neurons in the output layer.
number_output_neurons_A1 = 1

In [40]:
# Defines the number of hidden nodes for the first hidden layer.
# Floor division by a float returns a float, so the result is cast to int for Dense(units = ...).
hidden_nodes_layer1_A1 = int((number_input_features2 + number_output_neurons_A1) // 1.5)

# Reviews the number of hidden nodes in the first layer.
hidden_nodes_layer1_A1

76

In [41]:
# Defines the number of hidden nodes for the second hidden layer.
# Casts to int so the value is a valid unit count for Dense(units = ...).
hidden_nodes_layer2_A1 = int((hidden_nodes_layer1_A1 + number_output_neurons_A1) // 2)

# Reviews the number of hidden nodes in the second layer.
hidden_nodes_layer2_A1

38

In [42]:
# Defines the number of hidden nodes for the third hidden layer.
# Floor division by a float returns a float, so the result is cast to int for Dense(units = ...).
hidden_nodes_layer3_A1 = int((hidden_nodes_layer2_A1 + number_output_neurons_A1) // 2.5)

# Reviews the number of hidden nodes in the third layer.
hidden_nodes_layer3_A1

15

In [43]:
# Creates the Sequential model instance.
nn_A1 = Sequential()

In [44]:
# Adds the first hidden layer, the inputs, and the activation function to the model.
nn_A1.add(Dense(units = hidden_nodes_layer1_A1, input_dim = number_input_features2, activation = "relu"))

# Adds the second hidden layer and the activation function to the model.
nn_A1.add(Dense(units = hidden_nodes_layer2_A1, activation = "relu"))

# Adds the third hidden layer and the activation function to the model.
nn_A1.add(Dense(units = hidden_nodes_layer3_A1, activation = "relu"))

# Adds the output layer and the activation function to the model.
nn_A1.add(Dense(units = number_output_neurons_A1, activation = "sigmoid"))

# Checks the structure of the model.
nn_A1.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 76)                8664      
                                                                 
 dense_4 (Dense)             (None, 38)                2926      
                                                                 
 dense_5 (Dense)             (None, 15)                585       
                                                                 
 dense_6 (Dense)             (None, 1)                 16        
                                                                 
Total params: 12,191
Trainable params: 12,191
Non-trainable params: 0
_________________________________________________________________


In [45]:
# Compiles the Sequential model.
nn_A1.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])

In [46]:
# Fits the model using 50 epochs and the training data.
fit_model_A1 = nn_A1.fit(X_train_scaled2, y_train2, epochs = 50)

Epoch 1/50
...
Epoch 50/50


In [47]:
# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss2, model_accuracy2 = nn_A1.evaluate(X_test_scaled2, y_test2, verbose = 2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss2}, Accuracy: {model_accuracy2}")

268/268 - 0s - loss: 0.5575 - accuracy: 0.7305 - 320ms/epoch - 1ms/step
Loss: 0.557512640953064, Accuracy: 0.7304956316947937


### Alternative Model 2

In [48]:
# Reduces the variables to possible "relevant" variables.
categorical_variables3 = ["APPLICATION_TYPE", "CLASSIFICATION", "ORGANIZATION", "INCOME_AMT"]

# Encodes the categorical variables using OneHotEncoder.
encoded_data3 = enc.fit_transform(reduced_data_df[categorical_variables3])

In [49]:
# Creates a DataFrame with the encoded variables.
encoded_df3 = pd.DataFrame(
    encoded_data3,
    columns = enc.get_feature_names_out(categorical_variables3))

# Reviews the DataFrame.
encoded_df3.head()

Unnamed: 0,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,APPLICATION_TYPE_T25,APPLICATION_TYPE_T29,...,ORGANIZATION_Trust,INCOME_AMT_0,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [50]:
# Defines the features, setting X3 to the one-hot encoded DataFrame.
X3 = encoded_df3

In [51]:
# Splits the preprocessed data into a training and testing dataset.
# Assigns the function a random_state equal to 1, to keep information consistent.
X_train3, X_test3, y_train3, y_test3 = train_test_split(X3, y, random_state = 1)

In [52]:
# Defines the number of inputs (features) to the model.
number_input_features3 = X_train3.shape[1]

# Reviews the number of features.
number_input_features3

101

In [53]:
# Defines the number of neurons in the output layer.
number_output_neurons_A2 = 1

In [54]:
# Defines the number of hidden nodes for the first hidden layer.
hidden_nodes_layer1_A2 = (number_input_features3 + number_output_neurons_A2) * 2

# Reviews the number of hidden nodes in the first layer.
hidden_nodes_layer1_A2

204

In [55]:
# Defines the number of hidden nodes for the second hidden layer.
hidden_nodes_layer2_A2 = (number_input_features3 + number_output_neurons_A2) * 3

# Reviews the number of hidden nodes in the second layer.
hidden_nodes_layer2_A2

306

In [56]:
# Defines the number of hidden nodes for the third hidden layer.
hidden_nodes_layer3_A2 = hidden_nodes_layer1_A2

# Reviews the number of hidden nodes in the third layer.
hidden_nodes_layer3_A2

204

In [57]:
# Defines the number of hidden nodes for the fourth hidden layer.
hidden_nodes_layer4_A2 = number_input_features3

# Reviews the number of hidden nodes in the fourth layer.
hidden_nodes_layer4_A2

101

In [58]:
# Creates the Sequential model instance.
nn_A2 = Sequential()

In [59]:
# Adds the first hidden layer, the input layer, and the activation function to the model.
nn_A2.add(Dense(units = hidden_nodes_layer1_A2, input_dim = number_input_features3, activation = "relu"))

# Adds the second hidden layer and the activation function to the model.
nn_A2.add(Dense(units = hidden_nodes_layer2_A2, activation = "relu"))

# Adds the third hidden layer and the activation function to the model.
nn_A2.add(Dense(units = hidden_nodes_layer3_A2, activation = "relu"))

# Adds the fourth hidden layer and the activation function to the model.
nn_A2.add(Dense(units = hidden_nodes_layer4_A2, activation = "relu"))

# Adds the output layer and the activation function to the model.
nn_A2.add(Dense(units = number_output_neurons_A2, activation = "sigmoid"))

# Checks the structure of the model.
nn_A2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_7 (Dense)             (None, 204)               20808     
                                                                 
 dense_8 (Dense)             (None, 306)               62730     
                                                                 
 dense_9 (Dense)             (None, 204)               62628     
                                                                 
 dense_10 (Dense)            (None, 101)               20705     
                                                                 
 dense_11 (Dense)            (None, 1)                 102       
                                                                 
Total params: 166,973
Trainable params: 166,973
Non-trainable params: 0
_________________________________________________________________


In [60]:
# Compiles the model.
nn_A2.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])

In [61]:
# Fits the model for 60 epochs on the unscaled training data (every feature here is a one-hot 0/1 column).
fit_model_A2 = nn_A2.fit(X_train3, y_train3, epochs = 60)

Epoch 1/60
...
Epoch 60/60


In [62]:
# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss3, model_accuracy3 = nn_A2.evaluate(X_test3, y_test3, verbose = 2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss3}, Accuracy: {model_accuracy3}")

268/268 - 0s - loss: 0.6540 - accuracy: 0.6570 - 415ms/epoch - 2ms/step
Loss: 0.6539509892463684, Accuracy: 0.6570262312889099


---

## Reviewing the Models

In [63]:
# Reviews all the models.

# Starting with Original.
print("Original Model Results")

# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test, verbose = 2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

Original Model Results
268/268 - 0s - loss: 0.5537 - accuracy: 0.7326 - 251ms/epoch - 937us/step
Loss: 0.5537441968917847, Accuracy: 0.7325947284698486


In [64]:
# Reviews the first Alternative Model.
print("Alternative Model 1 Results")

# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss2, model_accuracy2 = nn_A1.evaluate(X_test_scaled2, y_test2, verbose = 2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss2}, Accuracy: {model_accuracy2}")

Alternative Model 1 Results
268/268 - 0s - loss: 0.5575 - accuracy: 0.7305 - 232ms/epoch - 866us/step
Loss: 0.557512640953064, Accuracy: 0.7304956316947937


In [65]:
# Reviews the second Alternative Model.
print("Alternative Model 2 Results")

# Evaluates the model loss and accuracy metrics using the evaluate method and the test data.
model_loss3, model_accuracy3 = nn_A2.evaluate(X_test3, y_test3, verbose = 2)

# Displays the model loss and accuracy results.
print(f"Loss: {model_loss3}, Accuracy: {model_accuracy3}")

Alternative Model 2 Results
268/268 - 0s - loss: 0.6540 - accuracy: 0.6570 - 315ms/epoch - 1ms/step
Loss: 0.6539509892463684, Accuracy: 0.6570262312889099
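To compare the three evaluations side by side, the reported metrics can be gathered into one structure and ranked. The sketch below copies the rounded loss and accuracy values from the outputs above; the dictionary layout and variable names are assumptions for illustration.

```python
# Rounded metrics copied from the three evaluation outputs above.
results = {
    "Original": {"loss": 0.5537, "accuracy": 0.7326},
    "Alternative 1": {"loss": 0.5575, "accuracy": 0.7305},
    "Alternative 2": {"loss": 0.6540, "accuracy": 0.6570},
}

# Ranks the models from highest to lowest test accuracy.
ranked = sorted(results.items(), key=lambda item: item[1]["accuracy"], reverse=True)

for name, metrics in ranked:
    print(f"{name:<14} loss={metrics['loss']:.4f} accuracy={metrics['accuracy']:.4f}")

best_name = ranked[0][0]
print(f"Highest test accuracy: {best_name}")
```

On these numbers neither alternative beats the original model, and the much larger Alternative Model 2 performs worst on the test data.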


---

In [66]:
# Saving the Alternative Models.

In [67]:
# Sets the file path for the first alternative model.
file_path = Path("Resources/AlphabetSoup_Alt1.h5")

# Exports the model to an HDF5 file.
nn_A1.save(file_path)

In [68]:
# Sets the file path for the second alternative model.
file_path = Path("Resources/AlphabetSoup_Alt2.h5")

# Exports the model to an HDF5 file.
nn_A2.save(file_path)