
Deep learning model for predicting the success of venture capital recipients


Billie-LS/DeepL_Adventure_Angels


Columbia University Engineering, New York FinTech BootCamp

August 2022 Cohort


Module 13, Challenge - Data Science & Machine Learning - Neural Networks, Deep Learning

Objective - Compile and evaluate a binary classification model, built with a neural network, that predicts whether applicants will be successful if funded by a venture capital firm.

Scenario - Given a historical CSV dataset of more than 34,000 organizations that have received funding, apply neural network techniques to evaluate the dataset's features and create a binary classifier that predicts whether an applicant will become a successful or a failed business.

Product - a Jupyter notebook with:

  • Data preprocessing for a neural network model.

  • A binary classification model using a deep neural network.

  • Use of the model-fit-predict pattern to compile and evaluate the model.

  • Model optimization.
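To illustrate the model-fit-predict pattern, here is a minimal sketch of a binary classifier in tf.keras. It is not the notebook's actual model: the synthetic data, the hypothetical `build_classifier` helper, the layer sizes, epoch count, and checkpoint file name are all assumptions chosen for the example. It also wires in the EarlyStopping and ModelCheckpoint callbacks named under Methods.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_classifier(n_features: int) -> tf.keras.Model:
    """Hypothetical helper: two hidden ReLU layers and one sigmoid output unit."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Binary cross-entropy pairs with the single sigmoid output for a 0/1 target.
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


# Synthetic stand-in data (assumption): label is 1 when the first two features sum above zero.
tf.keras.utils.set_random_seed(42)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
callbacks = [
    # Stop when validation loss has not improved for 5 epochs; restore the best weights.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Write weights to disk only when validation loss improves.
    tf.keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_loss", save_best_only=True, save_weights_only=True
    ),
]

# Model: build and compile. Fit: train with a validation split and the callbacks.
model = build_classifier(X.shape[1])
model.fit(X_train, y_train, epochs=50, validation_split=0.2, callbacks=callbacks, verbose=0)

# Evaluate on held-out data, then predict probabilities and threshold them at 0.5.
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
probabilities = model.predict(X_test, verbose=0).ravel()
predicted_labels = (probabilities >= 0.5).astype(int)
print(f"Loss: {test_loss:.4f}, Accuracy: {test_accuracy:.4f}")
```

The 0.5 decision threshold is also an assumption; for an imbalanced success/failure target it may be worth tuning.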


Methods

The analysis applies:

  • Data encoding with OneHotEncoder

  • Train/test splitting with train_test_split()

  • Feature scaling with StandardScaler

  • Early stopping with keras.callbacks.EarlyStopping()

  • Checkpointing with keras.callbacks.ModelCheckpoint()

Challenge - Modeling & Analysis

Original




Optimization



Supplemental Modeling and Analysis

Supplemental processing and analysis:

Beyond the scope of the assignment, the author conducted additional analysis of the data; a supplemental script with further model building follows the primary challenge. Supplemental experimental notebooks are also included.


Technologies


Dependencies

This project uses JupyterLab v3.4.4 and Python 3.9.13 (packaged by conda-forge, main, May 27 2022, 17:01:00) with the following packages:

  • sys - provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.

  • NumPy - an open-source Python library for working with arrays; provides multidimensional array and matrix data structures along with functions for linear algebra, Fourier transforms, and matrix operations.

  • pandas - a software library written for the Python programming language for data manipulation and analysis.

  • Path - from pathlib - Object-oriented filesystem paths, Path instantiates a concrete path for the platform the code is running on.

  • Scikit-learn - an open source machine learning library that supports supervised and unsupervised learning; provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.

  • tensorflow - an end-to-end machine learning platform.

  • tf.keras - a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow framework; designed with a focus on understanding deep learning techniques, such as creating layers for neural networks while keeping the concepts of shapes and the mathematical details in view.

  • keras - a deep learning API written in Python, running on top of the machine learning platform TensorFlow.

  • train_test_split - from sklearn.model_selection, a quick utility that wraps input validation, next(ShuffleSplit().split(X, y)), and application to the input data into a single call for splitting (and optionally subsampling) data in a one-liner.

  • OneHotEncoder - from sklearn.preprocessing, encode categorical features as a one-hot numeric array. Features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme; creates a binary column for each category and returns a sparse matrix or dense array.

  • StandardScaler - from sklearn.preprocessing, standardize features by removing the mean and scaling to unit variance.

  • matplotlib.pyplot - a state-based interface to matplotlib. It provides an implicit, MATLAB-like way of plotting. It also opens figures on your screen and acts as the figure GUI manager.


Hardware used for development

MacBook Pro (16-inch, 2021)

Chip: Apple M1 Max
macOS Monterey version 12.6

Development Software

Homebrew 3.6.11

Homebrew/homebrew-core (git revision 01c7234a8be; last commit 2022-11-15)
Homebrew/homebrew-cask (git revision b177dd4992; last commit 2022-11-15)

Python Platform: macOS-13.0.1-arm64-arm-64bit

Python version 3.9.13 packaged by conda-forge
Scikit-Learn 1.1.3
TensorFlow 2.10.0
Keras 2.10.0
pandas 1.5.1

pip 22.3 from /opt/anaconda3/lib/python3.9/site-packages/pip (python 3.9)

git version 2.37.2
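Version details like those above can be gathered from inside the Python environment. The sketch below uses only the standard library (`platform`, `sys`, `importlib.metadata`); the hypothetical `environment_report` helper and its package list are assumptions, and on Apple silicon TensorFlow may be installed under a different distribution name (for example `tensorflow-macos`), in which case it would show as not installed here.

```python
import platform
import sys
from importlib import metadata

# Distribution names as published on PyPI (assumption; adjust for your installation).
PACKAGES = ["scikit-learn", "tensorflow", "keras", "pandas", "pip"]


def environment_report(packages=PACKAGES):
    """Return platform, Python, and package versions; missing packages map to None."""
    report = {
        "Python Platform": platform.platform(),
        "Python version": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
    # The full interpreter build string, including the conda-forge packaging details.
    print(sys.version)
```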


Installation of application (i.e., GitHub clone)

In the terminal, navigate to the directory where you want to install this application and enter the following command:

git clone git@github.com:Billie-LS/DeepL_Adventure_Angels.git

Usage

From the terminal, run the installed application in the JupyterLab web-based interactive development environment (IDE) by typing at the prompt:

> jupyter lab

The file you will run is:

credit_risk_resampling.ipynb

Project requirements

See the starter code.


Version control

Version control can be reviewed at:

https://github.com/Billie-LS/DeepL_Adventure_Angels



Contributors

Author

Loki 'billie' Skylizard LinkedIn @GitHub

BootCamp lead instructor

Vinicio De Sola LinkedIn @GitHub

Outside instructors

Jeff Heaton LinkedIn @GitHub YouTube

BootCamp teaching assistant

Santiago Pedemonte LinkedIn @GitHub

BootCamp classmates

None

askBCS assistants

None




License

MIT License

Copyright (c) 2022 Loki 'billie' Skylizard

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
