# Venture Funding with Deep Learning

You work as a risk management associate at Alphabet Soup, a venture capital firm. Alphabet Soup’s business team receives many funding applications from startups every day. This team has asked you to help them create a model that predicts whether applicants will be successful if funded by Alphabet Soup.

The business team has given you a CSV containing more than 34,000 organizations that have received funding from Alphabet Soup over the years. With your knowledge of machine learning and neural networks, you decide to use the features in the provided dataset to create a binary classifier model that will predict whether an applicant will become a successful business. The CSV file contains a variety of information about these businesses, including whether or not they ultimately became successful.

## Instructions:

The steps for this challenge are broken out into the following sections:

* Prepare the data for use on a neural network model.

* Compile and evaluate a binary classification model using a neural network.

* Optimize the neural network model.

### Prepare the Data for Use on a Neural Network Model 

Using your knowledge of Pandas and scikit-learn’s `StandardScaler()`, preprocess the dataset so that you can use it to compile and evaluate the neural network model later.

Open the starter code file, and complete the following data preparation steps:

1. Read the `applicants_data.csv` file into a Pandas DataFrame. Review the DataFrame, looking for categorical variables that will need to be encoded, as well as columns that could eventually define your features and target variables.   

2. Drop the “EIN” (Employer Identification Number) and “NAME” columns from the DataFrame, because they are not relevant to the binary classification model.
 
3. Encode the dataset’s categorical variables using `OneHotEncoder`, and then place the encoded variables into a new DataFrame.

4. Add the original DataFrame’s numerical variables to the DataFrame containing the encoded variables.

> **Note** To complete this step, you will employ the Pandas `concat()` function that was introduced earlier in this course. 

5. Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column “IS_SUCCESSFUL”. The remaining columns should define the features dataset. 

6. Split the features and target sets into training and testing datasets.

7. Use scikit-learn's `StandardScaler` to scale the features data.

### Compile and Evaluate a Binary Classification Model Using a Neural Network

Use your knowledge of TensorFlow to design a binary classification deep neural network model. This model should use the dataset’s features to predict whether an Alphabet Soup&ndash;funded startup will be successful based on the features in the dataset. Consider the number of inputs before determining the number of layers that your model will contain or the number of neurons on each layer. Then, compile and fit your model. Finally, evaluate your binary classification model to calculate the model’s loss and accuracy. 
 
To do so, complete the following steps:

1. Create a deep neural network by assigning the number of input features, the number of layers, and the number of neurons on each layer using Tensorflow’s Keras.

> **Hint** You can start with a two-layer deep neural network model that uses the `relu` activation function for both layers.

2. Compile and fit the model using the `binary_crossentropy` loss function, the `adam` optimizer, and the `accuracy` evaluation metric.

> **Hint** When fitting the model, start with a small number of epochs, such as 20, 50, or 100.

3. Evaluate the model using the test data to determine the model’s loss and accuracy.

4. Save and export your model to an HDF5 file, and name the file `AlphabetSoup.h5`. 

### Optimize the Neural Network Model

Using your knowledge of TensorFlow and Keras, optimize your model to improve the model's accuracy. Even if you do not successfully achieve a better accuracy, you'll need to demonstrate at least two attempts to optimize the model. You can include these attempts in your existing notebook. Or, you can make copies of the starter notebook in the same folder, rename them, and code each model optimization in a new notebook. 

> **Note** You will not lose points if your model does not achieve a high accuracy, as long as you make at least two attempts to optimize the model.

To do so, complete the following steps:

1. Define at least three new deep neural network models (the original plus 2 optimization attempts). With each, try to improve on your first model’s predictive accuracy.

> **Rewind** Recall that perfect accuracy has a value of 1, so accuracy improves as its value moves closer to 1. To optimize your model for a predictive accuracy as close to 1 as possible, you can use any or all of the following techniques:
>
> * Adjust the input data by dropping different features columns to ensure that no variables or outliers confuse the model.
>
> * Add more neurons (nodes) to a hidden layer.
>
> * Add more hidden layers.
>
> * Use different activation functions for the hidden layers.
>
> * Add to or reduce the number of epochs in the training regimen.

2. After finishing your models, display the accuracy scores achieved by each model, and compare the results.

3. Save each of your models as an HDF5 file.


In [59]:
# For Apple Silicon (M1) machine, created virtual environment for machine learning and tensorflow libraries, including tensorflow-metal, according to Apple's recommendations:
# https://developer.apple.com/metal/tensorflow-plugin/

In [60]:
# Imports
import pandas as pd
from pathlib import Path
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,OneHotEncoder
import numpy

---

## Prepare the data to be used on a neural network model

### Step 1: Read the `applicants_data.csv` file into a Pandas DataFrame. Review the DataFrame, looking for categorical variables that will need to be encoded, as well as columns that could eventually define your features and target variables.  


In [61]:
# Read the applicants_data.csv file from the Resources folder into a Pandas DataFrame
applicant_data_df = pd.read_csv(Path('Resources/applicants_data.csv'))

# Review the DataFrame
display(applicant_data_df.info(), applicant_data_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34299 entries, 0 to 34298
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   EIN                     34299 non-null  int64 
 1   NAME                    34299 non-null  object
 2   APPLICATION_TYPE        34299 non-null  object
 3   AFFILIATION             34299 non-null  object
 4   CLASSIFICATION          34299 non-null  object
 5   USE_CASE                34299 non-null  object
 6   ORGANIZATION            34299 non-null  object
 7   STATUS                  34299 non-null  int64 
 8   INCOME_AMT              34299 non-null  object
 9   SPECIAL_CONSIDERATIONS  34299 non-null  object
 10  ASK_AMT                 34299 non-null  int64 
 11  IS_SUCCESSFUL           34299 non-null  int64 
dtypes: int64(4), object(8)
memory usage: 3.1+ MB


None

Unnamed: 0,EIN,NAME,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,10520599,BLUE KNIGHTS MOTORCYCLE CLUB,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,10531628,AMERICAN CHESAPEAKE CLUB CHARITABLE TR,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,10547893,ST CLOUD PROFESSIONAL FIREFIGHTERS,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,10553066,SOUTHSIDE ATHLETIC ASSOCIATION,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,10556103,GENETIC RESEARCH INSTITUTE OF THE DESERT,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1
...,...,...,...,...,...,...,...,...,...,...,...,...
34294,996009318,THE LIONS CLUB OF HONOLULU KAMEHAMEHA,T4,Independent,C1000,ProductDev,Association,1,0,N,5000,0
34295,996010315,INTERNATIONAL ASSOCIATION OF LIONS CLUBS,T4,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
34296,996012607,PTA HAWAII CONGRESS,T3,CompanySponsored,C2000,Preservation,Association,1,0,N,5000,0
34297,996015768,AMERICAN FEDERATION OF GOVERNMENT EMPLOYEES LO...,T5,Independent,C3000,ProductDev,Association,1,0,N,5000,1


In [62]:
# Review the data types associated with the columns
print(applicant_data_df.dtypes)

EIN                        int64
NAME                      object
APPLICATION_TYPE          object
AFFILIATION               object
CLASSIFICATION            object
USE_CASE                  object
ORGANIZATION              object
STATUS                     int64
INCOME_AMT                object
SPECIAL_CONSIDERATIONS    object
ASK_AMT                    int64
IS_SUCCESSFUL              int64
dtype: object


### Step 2: Drop the “EIN” (Employer Identification Number) and “NAME” columns from the DataFrame, because they are not relevant to the binary classification model.

In [63]:
# Drop the 'EIN' and 'NAME' columns from the DataFrame
applicant_data_df = applicant_data_df.drop(columns=['EIN', 'NAME'])

# Review the DataFrame
display(applicant_data_df)
# We have 9 features, or explanatory variables, and 1 target variable, in our final source dataset, prior to any encoding

Unnamed: 0,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,STATUS,INCOME_AMT,SPECIAL_CONSIDERATIONS,ASK_AMT,IS_SUCCESSFUL
0,T10,Independent,C1000,ProductDev,Association,1,0,N,5000,1
1,T3,Independent,C2000,Preservation,Co-operative,1,1-9999,N,108590,1
2,T5,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
3,T3,CompanySponsored,C2000,Preservation,Trust,1,10000-24999,N,6692,1
4,T3,Independent,C1000,Heathcare,Trust,1,100000-499999,N,142590,1
...,...,...,...,...,...,...,...,...,...,...
34294,T4,Independent,C1000,ProductDev,Association,1,0,N,5000,0
34295,T4,CompanySponsored,C3000,ProductDev,Association,1,0,N,5000,0
34296,T3,CompanySponsored,C2000,Preservation,Association,1,0,N,5000,0
34297,T5,Independent,C3000,ProductDev,Association,1,0,N,5000,1


### Step 3: Encode the dataset’s categorical variables using `OneHotEncoder`, and then place the encoded variables into a new DataFrame.

In [64]:
# Create a list of categorical variables, filtering out numerical fields and leaving categorical fields
cat_variables = list(applicant_data_df.dtypes[applicant_data_df.dtypes=='object'].index)
#categorical_variables = applicant_data_df.dtypes.loc[applicant_data_df.dtypes=='object'].index #Alternative formulation using .loc, does not appear advantageous

# Display the categorical variables list
display(cat_variables)

['APPLICATION_TYPE',
 'AFFILIATION',
 'CLASSIFICATION',
 'USE_CASE',
 'ORGANIZATION',
 'INCOME_AMT',
 'SPECIAL_CONSIDERATIONS']

In [65]:
# Create a OneHotEncoder instance
encoder = OneHotEncoder()

In [66]:
# Review the applicant categorical DataFrame
applicant_cat_data_df = applicant_data_df[cat_variables]
display(applicant_cat_data_df)

# Encode the categorical DataFrame using OneHotEncoder
applicant_cat_data_encoded = encoder.fit_transform(applicant_cat_data_df)
print(f"Categorical DataFrame shape: {applicant_cat_data_encoded.shape}")

Unnamed: 0,APPLICATION_TYPE,AFFILIATION,CLASSIFICATION,USE_CASE,ORGANIZATION,INCOME_AMT,SPECIAL_CONSIDERATIONS
0,T10,Independent,C1000,ProductDev,Association,0,N
1,T3,Independent,C2000,Preservation,Co-operative,1-9999,N
2,T5,CompanySponsored,C3000,ProductDev,Association,0,N
3,T3,CompanySponsored,C2000,Preservation,Trust,10000-24999,N
4,T3,Independent,C1000,Heathcare,Trust,100000-499999,N
...,...,...,...,...,...,...,...
34294,T4,Independent,C1000,ProductDev,Association,0,N
34295,T4,CompanySponsored,C3000,ProductDev,Association,0,N
34296,T3,CompanySponsored,C2000,Preservation,Association,0,N
34297,T5,Independent,C3000,ProductDev,Association,0,N


Categorical DataFrame shape: (34299, 114)


In [67]:
# Create a DataFrame with the encoded variables
applicant_cat_data_encoded_df = pd.DataFrame(applicant_cat_data_encoded.todense(), columns=encoder.get_feature_names_out(cat_variables)) # 'todense() required as it appears pandas requires dense, not sparse input

# Review the DataFrame
display(applicant_cat_data_encoded_df)

Unnamed: 0,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,APPLICATION_TYPE_T25,APPLICATION_TYPE_T29,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34295,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34296,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34297,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


### Step 4: Add the original DataFrame’s numerical variables to the DataFrame containing the encoded variables.

> **Note** To complete this step, you will employ the Pandas `concat()` function that was introduced earlier in this course. 

In [68]:
# Create DataFrame for numerical fields only
applicant_num_data_df = applicant_data_df.drop(columns=cat_variables)
#display(applicant_num_data_df)

# Add the numerical variables from the original DataFrame to the one-hot encoding DataFrame
applicant_encoded_df = pd.concat([applicant_num_data_df, applicant_cat_data_encoded_df], axis='columns')

# Review the DataFrame
display(applicant_encoded_df)

Unnamed: 0,STATUS,ASK_AMT,IS_SUCCESSFUL,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1,5000,1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,1,108590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,1,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,1,6692,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,1,142590,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34294,1,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34295,1,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34296,1,5000,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34297,1,5000,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


### Step 5: Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column “IS_SUCCESSFUL”. The remaining columns should define the features dataset. 



In [69]:
#Create a list for the target variable field name(s)
y_target = ['IS_SUCCESSFUL']

# Define the target set y using the IS_SUCCESSFUL column
y = applicant_encoded_df[y_target]

# Display a sample of y
display(y)
display(y.value_counts())
# Observed target variable, y, is already numerically encoded, therefore can forgo encoding via something like LabelEncoder() object.

Unnamed: 0,IS_SUCCESSFUL
0,1
1,1
2,0
3,1
4,1
...,...
34294,0
34295,0
34296,0
34297,1


IS_SUCCESSFUL
1                18261
0                16038
Name: count, dtype: int64

In [70]:
# Define features set X by selecting all columns but IS_SUCCESSFUL
X = applicant_encoded_df.drop(columns=y_target)

# Review the features DataFrame
display(X)

Unnamed: 0,STATUS,ASK_AMT,APPLICATION_TYPE_T10,APPLICATION_TYPE_T12,APPLICATION_TYPE_T13,APPLICATION_TYPE_T14,APPLICATION_TYPE_T15,APPLICATION_TYPE_T17,APPLICATION_TYPE_T19,APPLICATION_TYPE_T2,...,INCOME_AMT_1-9999,INCOME_AMT_10000-24999,INCOME_AMT_100000-499999,INCOME_AMT_10M-50M,INCOME_AMT_1M-5M,INCOME_AMT_25000-99999,INCOME_AMT_50M+,INCOME_AMT_5M-10M,SPECIAL_CONSIDERATIONS_N,SPECIAL_CONSIDERATIONS_Y
0,1,5000,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,1,108590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,1,6692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,1,142590,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34294,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34295,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34296,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
34297,1,5000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


### Step 6: Split the features and target sets into training and testing datasets.


In [71]:
# Split the preprocessed data into a training and testing dataset
# Assign the function a random_state equal to 1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

### Step 7: Use scikit-learn's `StandardScaler` to scale the features data.

In [72]:
# Create a StandardScaler instance
standard_scaler = StandardScaler()

# Fit the scaler to the features training dataset
X_scaler_fit = standard_scaler.fit(X_train)

# Use the fit scaler to transform the scale of the features training and testing datasets
X_train_scaled = X_scaler_fit.transform(X_train)
X_test_scaled = X_scaler_fit.transform(X_test)

---

## Compile and Evaluate a Binary Classification Model Using a Neural Network

### Step 1: Create a deep neural network by assigning the number of input features, the number of layers, and the number of neurons on each layer using Tensorflow’s Keras.

> **Hint** You can start with a two-layer deep neural network model that uses the `relu` activation function for both layers. 'relu' is the Rectified Linear Unit activation function.

In [73]:
# Define the the number of inputs (features) to the model
number_input_features = X_train.shape[1] # Alternative formulation is: len(X_train.iloc[0])
#print(X_train_scaled.shape)
print(X_train.shape)

# Review the number of features
print(number_input_features)

(25724, 116)
116


In [74]:
# Define the number of neurons in the output layer
number_output_neurons = 1 # Single target variable output neuron

In [75]:
# Define the number of hidden nodes for the first hidden layer
# We will use the general guidance provided in class, although this is more art than recipe:
# Use the mean of the number of input features plus the number of output neurons
hidden_nodes_layer_1 = (number_input_features + number_output_neurons) // 2 # Python floor division operator (//) returns the quotient rounded down to the nearest integer

# Review the number hidden nodes in the first layer
print(hidden_nodes_layer_1)

58


In [76]:
# Define the number of hidden nodes for the second hidden layer
# We will use the general guidance provided in class, although more art than recipe:
# Use the mean of the number of hidden nodes in the first hidden layer plus the number of output neurons
hidden_nodes_layer_2 =  (hidden_nodes_layer_1 + number_output_neurons) // 2

# Review the number hidden nodes in the second layer
print(hidden_nodes_layer_2)

29


In [77]:
# Create the Sequential model instance
neural_net = Sequential(name='neural_net_venture_funding_model') # An alternative formulation is the Functional Model, or Model() object, also from the Keras library (c.f. https://becominghuman.ai/sequential-vs-functional-model-in-keras-20684f766057)

In [78]:
# Add the first hidden layer
neural_net.add(Dense(units=hidden_nodes_layer_1, input_dim=number_input_features, 
                                                      activation='relu', name='layer_1'))

In [79]:
# Add the second hidden layer
neural_net.add(Dense(units=hidden_nodes_layer_2, activation='relu', name='layer_2'))

In [80]:
# Add the output layer to the model specifying the number of output neurons and activation function
neural_net.add(Dense(units=number_output_neurons, activation='relu', name='layer_out'))

In [81]:
# Review the model layers created, along with initial iterative input weights prior to seeing any data
display(neural_net.layers)
display(neural_net.weights) # For example, confirms 29 weights, corresponding to 29 nodes, applied to input at the output layer

[<keras.src.layers.core.dense.Dense at 0x2dcdc15a0>,
 <keras.src.layers.core.dense.Dense at 0x2dcdc1c90>,
 <keras.src.layers.core.dense.Dense at 0x2dcdc1300>]

[<tf.Variable 'layer_1/kernel:0' shape=(116, 58) dtype=float32, numpy=
 array([[-0.08873172,  0.15002848, -0.17099756, ...,  0.01626182,
          0.01158209, -0.07924043],
        [ 0.10286887, -0.13420528, -0.17491573, ..., -0.00879359,
         -0.09414656,  0.18028878],
        [ 0.04036567,  0.10040961, -0.08102893, ..., -0.13573124,
          0.01411664,  0.17424963],
        ...,
        [-0.02178933, -0.07411443,  0.17890383, ..., -0.03831896,
          0.15701695, -0.06200225],
        [ 0.04138559, -0.14657822,  0.03124471, ...,  0.06295647,
          0.05822571,  0.16914017],
        [ 0.0124885 , -0.1624426 ,  0.15294652, ...,  0.02395739,
         -0.01650356,  0.0005907 ]], dtype=float32)>,
 <tf.Variable 'layer_1/bias:0' shape=(58,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0

In [82]:
# Display the summary for the Sequential model build.
# "Every layer has both an input and output attribute," c.f. https://keras.io/guides/sequential_model/
display(neural_net.summary())
# For example, layer_1 has 6786 parameters for weights and bias = 116 features * 58 nodes + 58 bias parameters

Model: "neural_net_venture_funding_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1 (Dense)             (None, 58)                6786      
                                                                 
 layer_2 (Dense)             (None, 29)                1711      
                                                                 
 layer_out (Dense)           (None, 1)                 30        
                                                                 
Total params: 8527 (33.31 KB)
Trainable params: 8527 (33.31 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


None

### Step 2: Compile and fit the model using the `binary_crossentropy` loss function, the `adam` optimizer, and the `accuracy` evaluation metric.


In [83]:
# Compile the Sequential model, specifying the training configuration, including the loss function, optimizer, and evaluation metric(s)
# Available Keras optimizers: SGD, RMSprop, Adam, AdamW, Adadelta, Adagrad, Adamax, Adafactor, Nadam, Ftrl, c.f. https://keras.io/api/optimizers/
# Available probabilistic loss functions for classification problems: binary_crossentropy, categorical_crossentropy, poisson, kl_divergence, c.f. https://keras.io/api/losses/
# binary_crossentropy "computes the cross-entropy loss between true labels and predicted labels." \n
# "Use this cross-entropy loss for binary (0 or 1) classification applications." c.f. https://keras.io/api/losses/probabilistic_losses/#binary_crossentropy-function
# Available metrics at https://keras.io/api/metrics/
neural_net.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [84]:
# Fit the model using 50 epochs and the training data
neural_net_trained_model = neural_net.fit(X_train_scaled, y_train, epochs=50)
# 32 training samples is the default batch size.  The total training sample size is 25724, or 75% of 34299. \n
# Therefore, there are 804 batches run per epoch, at 32 samples per batch, equals 25724 training samples per epoch.
# The model struggles in training to reduce losses, while little progress is made to improve accuracy

Epoch 1/50


  1/804 [..............................] - ETA: 4:09 - loss: 7.2068 - accuracy: 0.3125

2023-10-28 01:40:00.536882: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


### Step 3: Evaluate the model using the test data to determine the model’s loss and accuracy.


In [87]:
# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = neural_net.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model loss and accuracy results
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

268/268 - 1s - loss: 4.4750 - accuracy: 0.7082 - 776ms/epoch - 3ms/step
Loss: 4.475027561187744, Accuracy: 0.7082215547561646


### Step 4: Save and export your model to an HDF5 file, and name the file `AlphabetSoup.h5`. 


In [88]:
# Set the model's file path
file_nm_path = Path('Resources/AlphabetSoup.h5')
# Alternatively, set the file name path to the native .keras file format
file_nm_path_alt = Path('Resources/AlphabetSoup.keras')

# Export your model to a HDF5 file
neural_net.save(file_nm_path)
# Alternatively, export to the native .keras file format
neural_net.save(file_nm_path_alt)

  saving_api.save_model(


---

## Optimize the neural network model


### Step 1: Define at least three new deep neural network models (resulting in the original plus 2 optimization attempts). With each, try to improve on your first model’s predictive accuracy.

> **Rewind** Recall that perfect accuracy has a value of 1, so accuracy improves as its value moves closer to 1. To optimize your model for a predictive accuracy as close to 1 as possible, you can use any or all of the following techniques:
>
> * Adjust the input data by dropping different features columns to ensure that no variables or outliers confuse the model.
>
> * Add more neurons (nodes) to a hidden layer.
>
> * Add more hidden layers.
>
> * Use different activation functions for the hidden layers.
>
> * Add to or reduce the number of epochs in the training regimen.


**`A priori neural net venture funding model optimizing strategy:`**

`Will take incremental optimizing approach to assess impact of marginal changes to model, while holding everything else constant.`
>1. `Alternative model A1: Modify the activation function uniformly across all layers to "LeakyReLU" from the base model's "ReLU" and assess change in the model's performance and accuracy.`  
>2. `Alternative model A2: Maintain the activation function as "LeakyReLU" across all layers, except the output layer where the activation function is modified to the binary-focused 'Sigmoid'.`
>3. `Alternative model A3: Condense the model from 2 hidden layers to 1 hidden layer, or a shallower model.  Maintain 'LeakyReLU' on the hidden layer and the 'Sigmoid' activation function on the output layer, based on the results of our previous testing. Increase the number of nodes on the hidden layer, from the original formulation using the mean of the features and output, to a rule-of-thumb multiple of 2 to 3 times the features (course instructor's guidance), selecting a conservative 3 times multiple.`

`Following training and evaluation of our A3 model, we reviewed the original dataset's (applicant_data_df) nine features to assess whether a feature or features might be confusing or unnecessarily complicating the venture fund model, and should therefore be excluded as explanatory variable(s) from our model.  We did not see any suspect column that should be excluded, except perhaps the 'Special_Considerations' feature, which wasn't well understood by us and perhaps not by our venture capital firm where the field could be liberally or ambiguously interpreted by associates maintaining the dataset, leading to an unnecessary and inconsistent bifurcation within the dataset.  Given our limited resources and the three alternative models already investigated and developed, we will defer this question to a later time, junior associate or consultant.`

### Alternative Model 1:
* `Alternative model A1: Modify the activation function uniformly across all layers to 'LeakyReLU' from the base 'ReLU', and assess change in model's performance and accuracy.`

In [89]:
# Define the the number of inputs (features) to the model
number_input_features_A1 = len(X_train.iloc[0])

# Review the number of features
number_input_features_A1

116

In [90]:
# Define the number of neurons in the output layer
number_output_neurons_A1 = 1 # Maintain single target variable output neuron
number_output_neurons_A1

1

In [91]:
# Define the number of hidden nodes for the first hidden layer
hidden_nodes_layer1_A1 = (number_input_features_A1 + number_output_neurons_A1) // 2 # Maintain first layer number of nodes

# Review the number of hidden nodes in the first layer
hidden_nodes_layer1_A1

58

In [92]:
# Define the number of hidden nodes for the second hidden layer
hidden_nodes_layer2_A1 = (hidden_nodes_layer1_A1 + number_output_neurons_A1) // 2 # Maintain second layer number of nodes

# Review the number of hidden nodes in the first layer
hidden_nodes_layer2_A1

29

In [93]:
# Import LeakyReLU, which is available as a layer, not as an activation function per se.
# LeakyReLU is considered an "advanced activation" layer.
# "Activations that are more complex than a simple TensorFlow function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, \n
# and can be found in the module tf.keras.layers.advanced_activations. These include PReLU and LeakyReLU. Note that you should not pass activation layers instances as \n
# the activation argument of a layer. They're meant to be used just like regular layers," c.f. https://keras.io/api/layers/activations/
from keras.layers import LeakyReLU

# Create the Sequential model instance
neural_net_A1 = Sequential(name='neural_net_venture_funding_model_A1')

In [94]:
# Add layers to the neural_net instance

# First hidden layer ('1a' and '1b')
# 2-Step process since there is no alias for LeakyReLU as there is for ReLU 'relu':
# First step: Applies weights to inputs and adds bias
neural_net_A1.add(Dense(
    units=hidden_nodes_layer1_A1, input_dim=number_input_features_A1, name='layer_1a_A1'))
# Second step: 
neural_net_A1.add(LeakyReLU(name='layer_1b_A1')) # alpha parameter default for LeakyReLU is 0.3, assume for all LeakyReLU layers for now

# Second hidden layer ('2a' and '2b')
neural_net_A1.add(Dense(units=hidden_nodes_layer2_A1, name='layer_2a_A1'))
neural_net_A1.add(LeakyReLU(name='layer_2b_A1'))

# Output layer ('out_a' and 'out_b')
neural_net_A1.add(Dense(units=number_output_neurons_A1, name='layer_out_a_A1'))
neural_net_A1.add(LeakyReLU(name='layer_out_b_A1'))

# Check the structure of the model
display(neural_net_A1.summary())

Model: "neural_net_venture_funding_model_A1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1a_A1 (Dense)         (None, 58)                6786      
                                                                 
 layer_1b_A1 (LeakyReLU)     (None, 58)                0         
                                                                 
 layer_2a_A1 (Dense)         (None, 29)                1711      
                                                                 
 layer_2b_A1 (LeakyReLU)     (None, 29)                0         
                                                                 
 layer_out_a_A1 (Dense)      (None, 1)                 30        
                                                                 
 layer_out_b_A1 (LeakyReLU)  (None, 1)                 0         
                                                                 
Total params: 8527 (33.31 KB)
T

None

In [95]:
# Review the model's dense layers created, along with initial iterative input weights and bias prior to seeing any data
display(neural_net_A1.weights)
#weights, biases = neural_net_A1.layers[0].get_weights()
#print(weights, biases)

[<tf.Variable 'layer_1a_A1/kernel:0' shape=(116, 58) dtype=float32, numpy=
 array([[ 0.09147923,  0.11638416, -0.00267804, ...,  0.05232108,
         -0.08752505,  0.12196316],
        [-0.00876327, -0.00890462,  0.14985596, ...,  0.03014059,
         -0.0530757 ,  0.16514863],
        [ 0.06897564,  0.10077347,  0.12573908, ..., -0.08376962,
          0.16042583, -0.14160709],
        ...,
        [-0.10672785,  0.08688669, -0.06062394, ...,  0.01372661,
         -0.10027432, -0.17325912],
        [ 0.12673666,  0.08554931, -0.1717792 , ..., -0.02993524,
          0.04285811, -0.05640572],
        [ 0.09260859,  0.15380265, -0.14094901, ...,  0.03624701,
         -0.02030972,  0.12925027]], dtype=float32)>,
 <tf.Variable 'layer_1a_A1/bias:0' shape=(58,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0

In [96]:
# Compile the Sequential model
neural_net_A1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [97]:
# Fit the model using 50 epochs and the training data
neural_net_trained_model_A1 = neural_net_A1.fit(X_train_scaled, y_train, epochs=50)
# We observed that the LeakyReLU activation function produced an immediate improvement in reducing the loss and increasing the accuracy at the very outset of training, in the first few epochs.
# However, as with ReLU, we observed that the model with LeakyReLU struggled to marginally reduce loss and increase accuracy as training progressed.  While the LeakyReLU loss function plateaued \n
# around 0.7, significantly less than ReLU which plateaued around 4.2, after 50 epochs, LeakyReLU accuracy terminated at 0.73, not significantly superior to the ReLU model where accuracy terminated at 0.72.
# While we demonstrated that LeakyReLU is the superior activation function in terms of both training efficiency, the speed at which the model converged on a solution, and fit, reflected in the loss \n
# function metric, the lack of improvement in accuracy indicates that we have more work to do to improve the model's predictive power as measured by accuracy.

Epoch 1/50
  1/804 [..............................] - ETA: 4:10 - loss: 6.2601 - accuracy: 0.4375

2023-10-28 01:45:10.100257: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


#### Alternative Model 2:
* `Alternative model A2: Maintain the activation function from the previous model, A1, across all layers as 'LeakyReLU', except the output layer where the activation function is changed to 'Sigmoid'.`

In [98]:
# Define the the number of inputs (features) to the model
number_input_features_A2 = len(X_train.iloc[0])

# Review the number of features
number_input_features_A2

116

In [99]:
# Define the number of neurons in the output layer
number_output_neurons_A2 = 1 # Maintain single target variable output neuron
number_output_neurons_A2

1

In [100]:
# Define the number of hidden nodes for the first hidden layer
hidden_nodes_layer1_A2 = (number_input_features_A2 + number_output_neurons_A2) // 2 # Maintain first layer number of nodes

# Review the number of hidden nodes in the first layer
hidden_nodes_layer1_A2

58

In [101]:
# Define the number of hidden nodes for the second hidden layer
hidden_nodes_layer2_A2 = (hidden_nodes_layer1_A2 + number_output_neurons_A2) // 2 # Maintain second layer number of nodes

# Review the number of hidden nodes in the second layer
hidden_nodes_layer2_A2

29

In [102]:
# Import LeakyReLU, which is available as a layer, not as an activation function per se.
# LeakyReLU is considered an "advanced activation" layer.
# "Activations that are more complex than a simple TensorFlow function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, \n
# and can be found in the module tf.keras.layers.advanced_activations. These include PReLU and LeakyReLU. Note that you should not pass activation layers instances as \n
# the activation argument of a layer. They're meant to be used just like regular layers," c.f. https://keras.io/api/layers/activations/
from keras.layers import LeakyReLU

# Create the Sequential model instance
neural_net_A2 = Sequential(name='neural_net_venture_funding_model_A2')

In [103]:
# Add layers to the neural_net instance

# First hidden layer ('1a' and '1b')
# 2-Step process since there is no alias for LeakyReLU as there is for ReLU 'relu':
# First step: Applies weights to inputs and adds bias
neural_net_A2.add(Dense(
    units=hidden_nodes_layer1_A2, input_dim=number_input_features_A2, name='layer_1a_A2'))
# Second step: 
neural_net_A2.add(LeakyReLU(name='layer_1b_A2')) # alpha parameter default for LeakyReLU is 0.3, assume for all LeakyReLU layers for now

# Second hidden layer ('2a' and '2b')
neural_net_A2.add(Dense(units=hidden_nodes_layer2_A2, name='layer_2a_A2'))
neural_net_A2.add(LeakyReLU(name='layer_2b_A2'))

# Output layer ('out')
neural_net_A2.add(Dense(units=number_output_neurons_A2, activation='sigmoid', name='layer_out_A2'))

# Check the structure of the model
display(neural_net_A2.summary())

Model: "neural_net_venture_funding_model_A2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1a_A2 (Dense)         (None, 58)                6786      
                                                                 
 layer_1b_A2 (LeakyReLU)     (None, 58)                0         
                                                                 
 layer_2a_A2 (Dense)         (None, 29)                1711      
                                                                 
 layer_2b_A2 (LeakyReLU)     (None, 29)                0         
                                                                 
 layer_out_A2 (Dense)        (None, 1)                 30        
                                                                 
Total params: 8527 (33.31 KB)
Trainable params: 8527 (33.31 KB)
Non-trainable params: 0 (0.00 Byte)
_______________________________________________________________

None

In [104]:
# Review the model's dense layers created, along with initial iterative input weights and bias prior to seeing any data
display(neural_net_A2.weights)
#weights, biases = neural_net_A2.layers[0].get_weights()
#print(weights, biases)

[<tf.Variable 'layer_1a_A2/kernel:0' shape=(116, 58) dtype=float32, numpy=
 array([[-1.47447079e-01,  6.40165508e-02, -1.46042958e-01, ...,
          1.29904792e-01,  1.02464780e-01,  1.46716833e-04],
        [ 2.87679881e-02, -1.56906053e-01, -3.29940021e-02, ...,
         -1.56082615e-01,  6.76776320e-02, -1.07625932e-01],
        [ 7.20581263e-02,  3.14129591e-02, -5.97046912e-02, ...,
         -1.06512859e-01, -3.92729491e-02, -1.71243623e-01],
        ...,
        [ 1.64926156e-01,  1.51992694e-01,  1.01527885e-01, ...,
          5.80614954e-02,  5.17394990e-02,  4.18140143e-02],
        [ 1.42589435e-01,  7.26232976e-02,  8.61750692e-02, ...,
          1.54253259e-01, -1.77153841e-01,  2.15867758e-02],
        [-7.45952129e-04, -6.74048662e-02,  1.71317950e-01, ...,
          1.79496512e-01, -8.56673941e-02, -1.27675548e-01]], dtype=float32)>,
 <tf.Variable 'layer_1a_A2/bias:0' shape=(58,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0

In [105]:
# Compile the Sequential model
neural_net_A2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [106]:
# Fit the model using 50 epochs and the training data
neural_net_trained_model_A2 = neural_net_A2.fit(X_train_scaled, y_train, epochs=50)
# Prior to running the training, we did not know what to expect by only changing the activation function of a single layer, the output layer, from LeakyReLU to Sigmoid.  We wanted to test the change to Sigmoid because we have a binary classification problem, defined as having two labels, and we knew the Sigmoid activation function is designed to separate data between probabilities of 0 and 1 with most of the data pushed toward either extreme, 0 or 1, characteristic of the sigmoid transform shape.

# We also noted the following attractive characteristics discussed by Sharma, which surprisingly confirmed our suspicion to use the Sigmoid activation in the output layer, if anywhere:
#
#   1) The sigmoid activation function is both non-linear and differentiable which are good characteristics for activation function.
#   2) As its output ranges between 0 to 1, it can be used in the output layer to produce the result in probability for binary classification.
#   3) When the input values are too small or too high, it can cause the neural network to stop learning, this issue is known as the vanishing gradient problem. This is why the Sigmoid activation function should #       not be used in hidden layers.
# c.f. Sharma, https://machinelearningknowledge.ai/pytorch-activation-functions-relu-leaky-relu-sigmoid-tanh-and-softmax/

# We were quite surprised to see an immediate benefit to the model in training where the loss function started at a materially lower loss and an accuracy that already nearly matched the final epochs in the A1 training.  We also noted the more well behaved training behavior over the entire span of epochs, where loss improvement, while slow, was steady with few reversals, and none large, and accuracy slightly improved over the span along a similar steady line.  While the loss and accuracy started out much improved, and the A2 model maintained its superiority over the original and A1 models to the end of training, we did not that the improvement over time was slow and modest, consistent with the remarks above noting the Sigmoid's tendency to experience a reduced gradient force driving potential further improvement.

# As a result of this training, we considered testing an A3 model by replacing not only the output layer activation function with Sigmoid, but also the two hidden layers with the Sigmoid activation function.
# However, we suspect this might not be useful, that the real import of the Sigmoid activation function is at the output layer, and will take Sharma's advice not to introduce the Sigmoid activation function in the hidden layers, although we don't really know the effect without attempting, which we may do at a later time. 

Epoch 1/50
  1/804 [..............................] - ETA: 4:14 - loss: 0.8823 - accuracy: 0.3125

2023-10-28 01:52:34.123024: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


#### Alternative Model 3
* `Alternative model A3: Condense the model from 2 hidden layers to 1 hidden layer, or a shallower model.  Maintain 'LeakyReLU' on the hidden layer and the 'Sigmoid' activation function on the output layer, based on the results of our previous testing. Increase the number of nodes on the hidden layer, from the original formulation using the mean of the features and output, to a rule-of-thumb multiple of 2 to 3 times the features, selecting a conservative 3 times multiple.`

In [107]:
# Define the the number of inputs (features) to the model
number_input_features_A3 = len(X_train.iloc[0])

# Review the number of features
number_input_features_A3

116

In [108]:
# Define the number of neurons in the output layer
number_output_neurons_A3 = 1 # Maintain single target variable output neuron
number_output_neurons_A3

1

In [109]:
# Define the number of hidden nodes for the first hidden layer
node_multiplier_layer1_A3 = 3
hidden_nodes_layer1_A3 = (number_input_features_A3 * node_multiplier_layer1_A3)

# Review the number of hidden nodes in the first layer
hidden_nodes_layer1_A3

348

In [110]:
# Import LeakyReLU, which is available as a layer, not as an activation function per se.
# LeakyReLU is considered an "advanced activation" layer.
# "Activations that are more complex than a simple TensorFlow function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, \n
# and can be found in the module tf.keras.layers.advanced_activations. These include PReLU and LeakyReLU. Note that you should not pass activation layers instances as \n
# the activation argument of a layer. They're meant to be used just like regular layers," c.f. https://keras.io/api/layers/activations/
from keras.layers import LeakyReLU

# Create the Sequential model instance
neural_net_A3 = Sequential(name='neural_net_venture_funding_model_A3')

In [111]:
# Add layers to the neural_net instance

# First hidden layer ('1a' and '1b')
# 2-Step process since there is no alias for LeakyReLU as there is for ReLU 'relu':
# First step: Applies weights to inputs and adds bias
neural_net_A3.add(Dense(
    units=hidden_nodes_layer1_A3, input_dim=number_input_features_A3, name='layer_1a_A3'))
# Second step: 
neural_net_A3.add(LeakyReLU(name='layer_1b_A3')) # alpha parameter default for LeakyReLU is 0.3, assume for all LeakyReLU layers for now

# Output layer ('out')
neural_net_A3.add(Dense(units=number_output_neurons_A3, activation='sigmoid', name='layer_out_A3'))

# Check the structure of the model
display(neural_net_A3.summary())

Model: "neural_net_venture_funding_model_A3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1a_A3 (Dense)         (None, 348)               40716     
                                                                 
 layer_1b_A3 (LeakyReLU)     (None, 348)               0         
                                                                 
 layer_out_A3 (Dense)        (None, 1)                 349       
                                                                 
Total params: 41065 (160.41 KB)
Trainable params: 41065 (160.41 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


None

In [112]:
# Review the model's dense layers created, along with initial iterative input weights and bias prior to seeing any data
display(neural_net_A3.weights)
#weights, biases = neural_net_A3.layers[0].get_weights()
#print(weights, biases)

[<tf.Variable 'layer_1a_A3/kernel:0' shape=(116, 348) dtype=float32, numpy=
 array([[ 0.08234495, -0.07943785,  0.00072068, ..., -0.08382428,
         -0.04840802,  0.08248797],
        [ 0.01409575,  0.04724495,  0.08510229, ...,  0.10710184,
         -0.0772088 ,  0.06473574],
        [-0.03503314,  0.06825931, -0.070598  , ..., -0.05019045,
         -0.08354166, -0.0109862 ],
        ...,
        [-0.10023233, -0.06052174, -0.10864536, ..., -0.07769787,
          0.03044544, -0.0289284 ],
        [ 0.05292003,  0.02189657, -0.08174283, ..., -0.08037041,
          0.07185557, -0.06627493],
        [ 0.07941914,  0.07802019,  0.03444569, ...,  0.09239082,
          0.07767963, -0.1051845 ]], dtype=float32)>,
 <tf.Variable 'layer_1a_A3/bias:0' shape=(348,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

In [113]:
# Compile the Sequential model
neural_net_A3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [114]:
# Fit the model using 50 epochs and the training data
neural_net_trained_model_A3 = neural_net_A3.fit(X_train_scaled, y_train, epochs=50)
# The simpler, shallower, single hidden layer A3 model appears to have performed nearly as well in training as the more complicated two hidden layer model.  Although the single hidden layer A3 model used more neurons in the first hidden layer (348) than the A2 model (58), the A2 model was more complicated and in aggregate made use of more neurons across its hidden layers (two layers independently combining to yield 58 times 29 neuron branches, or 1682 neuron combinations in the A2 model).
# It will be interesting to next evaluate the four trained models in testing, using unseen data, but for now we prefer the A3 model for its power and ability to get the job done in training using a simpler and more efficient design and fewer resources.

Epoch 1/50
  1/804 [..............................] - ETA: 3:52 - loss: 0.7239 - accuracy: 0.4375

2023-10-28 01:56:31.352811: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


### Step 2: After finishing your models, display the accuracy scores achieved by each model, and compare the results.

In [141]:
print("Original Model Testing and Training Results Summary:")

# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = neural_net.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model test loss and accuracy results
print(f"\nTest Results:\nLoss: {model_loss}, Accuracy: {model_accuracy}\n")

# Review the available keys for the train history dictionary
print(f"Available keys from train history dictionary: {neural_net_trained_model_A1.history.keys()}\n")

# Display the model train loss and accuracy results
print(f"Train Results:\nLoss: {neural_net_trained_model.history['loss'][-1]}, Accuracy: {neural_net_trained_model.history['accuracy'][-1]}")

# Interesting to observe in this particular training iteration that the test results turned out superior to the actual training.  Other training iterations of this model did not generally produce this directional result.

Original Model Testing and Training Results Summary:
268/268 - 1s - loss: 4.4750 - accuracy: 0.7082 - 829ms/epoch - 3ms/step

Test Results:
Loss: 4.475027561187744, Accuracy: 0.7082215547561646

Available keys from train history dictionary: dict_keys(['loss', 'accuracy'])

Train Results:
Loss: 4.507335186004639, Accuracy: 0.7061887979507446


In [144]:
print("Alternative Model 1 Testing and Training Results Summary (A1):")

# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = neural_net_A1.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model test loss and accuracy results
print(f"\nTest Results:\nLoss: {model_loss}, Accuracy: {model_accuracy}\n")

# Display the model train loss and accuracy results
print(f"Train Results:\nLoss: {neural_net_trained_model_A1.history['loss'][-1]}, Accuracy: {neural_net_trained_model_A1.history['accuracy'][-1]}")

Alternative Model 1 Testing and Training Results Summary (A1):
268/268 - 1s - loss: 0.5935 - accuracy: 0.7292 - 912ms/epoch - 3ms/step

Test Results:
Loss: 0.5935436487197876, Accuracy: 0.7292128205299377

Train Results:
Loss: 0.6160327792167664, Accuracy: 0.7318068742752075


In [145]:
print("Alternative Model 2 Testing and Training Results Summary (A2):")

# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = neural_net_A2.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model test loss and accuracy results
print(f"\nTest Results:\nLoss: {model_loss}, Accuracy: {model_accuracy}\n")

# Display the model train loss and accuracy results
print(f"Train Results:\nLoss: {neural_net_trained_model_A2.history['loss'][-1]}, Accuracy: {neural_net_trained_model_A2.history['accuracy'][-1]}")

Alternative Model 2 Testing and Training Results Summary (A2):
268/268 - 1s - loss: 0.5565 - accuracy: 0.7286 - 927ms/epoch - 3ms/step

Test Results:
Loss: 0.5565388202667236, Accuracy: 0.7286297082901001

Train Results:
Loss: 0.5375846028327942, Accuracy: 0.7364329099655151


In [146]:
print("Alternative Model 3 Testing and Training Results Summary (A3):")

# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
model_loss, model_accuracy = neural_net_A3.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model test loss and accuracy results
print(f"\nTest Results:\nLoss: {model_loss}, Accuracy: {model_accuracy}\n")

# Display the model train loss and accuracy results
print(f"Train Results:\nLoss: {neural_net_trained_model_A1.history['loss'][-1]}, Accuracy: {neural_net_trained_model_A1.history['accuracy'][-1]}")

Alternative Model 3 Testing and Training Results Summary (A3):
268/268 - 1s - loss: 0.5618 - accuracy: 0.7291 - 789ms/epoch - 3ms/step

Test Results:
Loss: 0.5617796182632446, Accuracy: 0.7290962338447571

Train Results:
Loss: 0.6160327792167664, Accuracy: 0.7318068742752075


### Step 3: Save each of your alternative models as an HDF5 file.


In [147]:
# Set the file path for the first alternative model
file_nm_path = Path('Resources/AlphabetSoup_A1.h5')
# Alternatively, set the file name path to the native .keras file format
file_nm_path_alt = Path('Resources/AlphabetSoup_A1.keras')

# Export your model to a HDF5 file
neural_net_A1.save(file_nm_path)
# Alternatively, export to the native .keras file format
neural_net_A1.save(file_nm_path_alt)

  saving_api.save_model(


In [148]:
# Set the file path for the second alternative model
file_nm_path = Path('Resources/AlphabetSoup_A2.h5')
# Alternatively, set the file name path to the native .keras file format
file_nm_path_alt = Path('Resources/AlphabetSoup_A2.keras')

# Export your model to a HDF5 file
neural_net_A2.save(file_nm_path)
# Alternatively, export to the native .keras file format
neural_net_A2.save(file_nm_path_alt)

In [149]:
# Set the file path for the third alternative model
file_nm_path = Path('Resources/AlphabetSoup_A3.h5')
# Alternatively, set the file name path to the native .keras file format
file_nm_path_alt = Path('Resources/AlphabetSoup_A3.keras')

# Export your model to a HDF5 file
neural_net_A3.save(file_nm_path)
# Alternatively, export to the native .keras file format
neural_net_A3.save(file_nm_path_alt)

In [188]:
# Reviewing dataset of test predictions vs. target
test_actuals = y_test
test_predictions_all_df = test_actuals.rename(columns={'IS_SUCCESSFUL': 'Test_Actual'})
test_predictions_all_df['Test_Pred_Orig'] = neural_net.predict(X_test_scaled)
test_predictions_all_df['Test_Pred_A1'] = neural_net_A1.predict(X_test_scaled)
test_predictions_all_df['Test_Pred_A2'] = neural_net_A2.predict(X_test_scaled)
test_predictions_all_df['Test_Pred_A3'] = neural_net_A3.predict(X_test_scaled)
test_predictions_all_df = test_predictions_all_df.reset_index(drop=True)
display(test_predictions_all_df)



Unnamed: 0,Test_Actual,Test_Pred_Orig,Test_Pred_A1,Test_Pred_A2,Test_Pred_A3
0,1,7.029622,0.762729,0.741522,0.780832
1,0,0.000000,0.176792,0.205197,0.211591
2,1,34.923698,1.263842,0.723186,0.849688
3,1,7.029622,0.762729,0.741522,0.780832
4,0,7.146249,0.673407,0.657174,0.671644
...,...,...,...,...,...
8570,1,0.000000,0.330262,0.350684,0.332828
8571,0,20.434734,0.886507,0.775288,0.704376
8572,1,4.322459,0.874238,0.936948,0.910023
8573,0,0.000000,0.277309,0.230098,0.300466


>* `Within the context of the alternative models, the original model did not produce a terrible result, where training accuracy was inferior to the alternatives by just around 4.1% in the worst case (0.7062/0.7364-1): original vs A3 model).  Test loss and accuracy metrics were self-consistent and comparable to training with respect to each model, indicative of robust models out-of-sample.`

>* `We believe the loss function for the original model appeared much worse, while simultaneously showing competitive accuracy against the alternative models, because the loss function was derived from the raw predictions versus actual, whereas the accuracy metric defaulted to binary accuracy, reflecting, or consistent with, the binary_crossentropy loss specification, which transformed and classified the raw predictions as a 0 or 1 class label, based on a default 0.5 threshold. (In other words, predictions below 0.5 were assigned the 0 label, whereas those above 0.5 were assigned the 1 label, and then accuracy calculated, yielding relatively competitive and consistent accuracy across all models, and which remained robust to testing.)`

>* `References: https://keras.io/guides/training_with_built_in_methods/, https://visualstudiomagazine.com/articles/2018/08/30/neural-binary-classification-keras.aspx`

In [189]:
# Sojourn.  Reviewing quickly performance of a traditional non-iterative supervised learning model to explain our venture funding data.
# Classification with Support Vector Model:
# Import SVM library
from sklearn.svm import SVC

# Instantiate an SVM classification model
svc = SVC(kernel='rbf')

# Fit the Data
svc_model = svc.fit(X_train_scaled, y_train)

  y = column_or_1d(y, warn=True)


In [195]:
# Review training data predictions
y_train_predictions = svc_model.predict(X_train_scaled)
train_results = pd.DataFrame({"Prediction": y_train_predictions, "Actual": y_train['IS_SUCCESSFUL']}).reset_index(drop=True)
display(train_results)

Unnamed: 0,Prediction,Actual
0,0,0
1,1,1
2,1,1
3,1,1
4,1,1
...,...,...
25719,0,0
25720,1,1
25721,0,0
25722,0,0


In [196]:
# Score the model as trained
score_train = svc_model.score(X_train_scaled, y_train)
display(f"SVM SVC Model Train Score: {score_train}\n")

# Score the model exposed to unseen test data
score_test = svc_model.score(X_test_scaled, y_test)
display(f"SVM SVC Model Test Score: {score_test}")

'SVM SVC Model Train Score: 0.7339838283315192\n'

'SVM SVC Model Test Score: 0.7286297376093295'

>* `Traditional closed-form supervised learning model, using a support vector machine classifier, appeared to perform just as well as a more complex and resource-intensive neural net model.`