<a href="https://colab.research.google.com/github/RoyalWeden/AICamp21/blob/main/AI_Camp_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is the project or what does the project do?

The goal of my project is to predict the amount of jobs created and retained from a loan to a business, as well as to predict the dollar amount of a loan to approve.

# How is it helpful for the community or environment?

The main benefits of this project is that it allows banks to choose what businesses to provide loans to that best help the job market and how much of a loan to provide the business with. Right now especially, a lot of people have lost their jobs due to the pandemic, and many business are low on money to provide for infrastructure, employees, and the products and servies they sell.

# How did I implement it using the tools discussed in the course? What machine learning concepts were used?

I implemented this project through both linear regression and an artificial neural network. As inputs, this neural network takes in the borrower's city, state, and zipcode, bank name and state, North American industry classification system code (NAICS), length of term (months), number of business employees, is a new or existing business, is urban or rural, if real estate was used, and if a recession happened. As outputs, the linear regression and artificial neural network result with the number of jobs created, the number of jobs retained, and the gross amount of loan approved by the bank and the SBA. To preprocess the data, I had to convert the categorical columns through one hot encoding. I also had to convert the numerical columns through standard scaling. Lastly, I dropped NA rows.

The data I am using to create the neural network is from kaggle with information on Small Business Administration Loans. As discussed by the data provider, Sean, "small businesses have been the primary source of employment in the United States. Helping small businesses with job creation, reduces unemployment, and as such, promotes economic growth."

# Authenticating Kaggle API using kaggle.json

I found this code to authenticate the Kaggle API from a public Colab notebook that you can view [here](https://colab.research.google.com/github/corrieann/kaggle/blob/master/kaggle_api_in_colab.ipynb#scrollTo=0HtGf0HEXEa5).

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
# Then move kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle.json
User uploaded file "kaggle.json" with length 66 bytes


# Downloading SBA Loans Case dataset

The SBA Loans Kaggle data set can be viewed on Kaggle [here](https://www.kaggle.com/larsen0966/sba-loans-case-data-set). Furthermore, to get more information on the dataset, go [here](https://amstat.tandfonline.com/doi/full/10.1080/10691898.2018.1434342).

In [None]:
!kaggle datasets download -d larsen0966/sba-loans-case-data-set

Downloading sba-loans-case-data-set.zip to /content
  0% 0.00/112k [00:00<?, ?B/s]
100% 112k/112k [00:00<00:00, 40.8MB/s]


In [None]:
!unzip sba-loans-case-data-set.zip

Archive:  sba-loans-case-data-set.zip
  inflating: SBAcase.11.13.17.csv    


# Load dataset

In [None]:
import pandas as pd

sba_loans_df = pd.read_csv('SBAcase.11.13.17.csv')
sba_loans_df.head()

Unnamed: 0,Selected,LoanNr_ChkDgt,Name,City,State,Zip,Bank,BankState,NAICS,ApprovalDate,ApprovalFY,Term,NoEmp,NewExist,CreateJob,RetainedJob,FranchiseCode,UrbanRural,RevLineCr,LowDoc,ChgOffDate,DisbursementDate,DisbursementGross,BalanceGross,MIS_Status,ChgOffPrinGr,GrAppv,SBA_Appv,New,RealEstate,Portion,Recession,daysterm,xx,Default
0,0,1004285007,SIMPLEX OFFICE SOLUTIONS,ANAHEIM,CA,92801,CALIFORNIA BANK & TRUST,CA,532420,15074,2001,36,1,1.0,0,0,1,0,Y,N,,15095.0,32812,0,P I F,0,30000,15000,0,0,0.5,0,1080,16175.0,0
1,1,1004535010,DREAM HOME REALTY,TORRANCE,CA,90505,CALIFORNIA BANK & TRUST,CA,531210,15130,2001,56,1,1.0,0,0,1,0,Y,N,,15978.0,30000,0,P I F,0,30000,15000,0,0,0.5,1,1680,17658.0,0
2,0,1005005006,"Winset, Inc. dba Bankers Hill",SAN DIEGO,CA,92103,CALIFORNIA BANK & TRUST,CA,531210,15188,2001,36,10,1.0,0,0,1,0,Y,N,,15218.0,30000,0,P I F,0,30000,15000,0,0,0.5,0,1080,16298.0,0
3,1,1005535001,Shiva Management,SAN DIEGO,CA,92108,CALIFORNIA BANK & TRUST,CA,531312,15719,2003,36,6,1.0,0,0,1,0,Y,N,,15736.0,50000,0,P I F,0,50000,25000,0,0,0.5,0,1080,16816.0,0
4,1,1005996006,"GOLD CROWN HOME LOANS, INC",LOS ANGELES,CA,91345,SBA - EDF ENFORCEMENT ACTION,CO,531390,16840,2006,240,65,1.0,3,65,1,1,0,N,,16903.0,343000,0,P I F,0,343000,343000,0,1,1.0,0,7200,24103.0,0


In [None]:
sba_loans_df.shape

(2102, 35)

# Data Preprocessing

In [None]:
from sklearn.preprocessing import StandardScaler
from pandas import get_dummies
import numpy as np

Remove unnecessary columns

In [None]:
sba_loans_df = sba_loans_df.drop(columns=['Selected', 'LoanNr_ChkDgt', 'Name', 'ApprovalDate', 'ApprovalFY', 'FranchiseCode', 'RevLineCr', 'LowDoc', 'ChgOffDate', 'DisbursementDate', 'DisbursementGross', 'BalanceGross', 'MIS_Status', 'ChgOffPrinGr', 'Portion', 'xx', 'Default'])

In [None]:
sba_loans_df

Unnamed: 0,City,State,Zip,Bank,BankState,NAICS,Term,NoEmp,NewExist,CreateJob,RetainedJob,UrbanRural,GrAppv,SBA_Appv,New,RealEstate,Recession,daysterm
0,ANAHEIM,CA,92801,CALIFORNIA BANK & TRUST,CA,532420,36,1,1.0,0,0,0,30000,15000,0,0,0,1080
1,TORRANCE,CA,90505,CALIFORNIA BANK & TRUST,CA,531210,56,1,1.0,0,0,0,30000,15000,0,0,1,1680
2,SAN DIEGO,CA,92103,CALIFORNIA BANK & TRUST,CA,531210,36,10,1.0,0,0,0,30000,15000,0,0,0,1080
3,SAN DIEGO,CA,92108,CALIFORNIA BANK & TRUST,CA,531312,36,6,1.0,0,0,0,50000,25000,0,0,0,1080
4,LOS ANGELES,CA,91345,SBA - EDF ENFORCEMENT ACTION,CO,531390,240,65,1.0,3,65,1,343000,343000,0,1,0,7200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2097,HIGHLAND,CA,92346,UNITI BANK,CA,532230,60,5,2.0,0,5,1,150000,75000,1,0,0,1800
2098,EL CAJON,CA,92021,ZIONS FIRST NATIONAL BANK,UT,532120,300,4,1.0,0,0,0,99000,79200,0,1,0,9000
2099,CAMARILLO,CA,93012,CITY NATIONAL BANK,CA,532120,84,2,1.0,0,0,0,50000,40000,0,0,0,2520
2100,SUN VALLEY,CA,91352,CITY NATIONAL BANK,CA,532120,120,3,1.0,0,0,0,500000,375000,0,0,0,3600


In [None]:
sba_loans_df.dropna(inplace=True)

In [None]:
sba_loans_df_c = get_dummies(sba_loans_df, columns=['City', 'State', 'Zip', 'Bank', 'BankState', 'NAICS'])

In [None]:
sba_loans_df_c

Unnamed: 0,Term,NoEmp,NewExist,CreateJob,RetainedJob,UrbanRural,GrAppv,SBA_Appv,New,RealEstate,Recession,daysterm,City_ACAMPO,City_ACTON,City_ADELANTO,City_AGOURA,City_AGOURA HILLS,City_AGUANGA,City_ALAMEDA,City_ALBANY,City_ALHAMBRA,City_ALISO VIEJO,City_ALPINE,City_ALTADENA,City_ALVISO,City_ANAHEIM,City_ANDERSON,City_ANTELOPE,City_ANTIOCH,City_APPLE VALLEY,City_ARCADE,City_ARCADIA,City_ARCATA,City_ARTESIA,City_ATWATER,City_AUBURN,City_AZUSA,City_Arcata,City_BAKERSFIELD,City_BALDWIN PARK,...,BankState_FL,BankState_IL,BankState_IN,BankState_MN,BankState_MO,BankState_NC,BankState_NV,BankState_NY,BankState_OH,BankState_OR,BankState_RI,BankState_SC,BankState_SD,BankState_TX,BankState_UT,BankState_VA,NAICS_531110,NAICS_531120,NAICS_531130,NAICS_531190,NAICS_531210,NAICS_531311,NAICS_531312,NAICS_531320,NAICS_531390,NAICS_532111,NAICS_532112,NAICS_532120,NAICS_532210,NAICS_532220,NAICS_532230,NAICS_532291,NAICS_532292,NAICS_532299,NAICS_532310,NAICS_532411,NAICS_532412,NAICS_532420,NAICS_532490,NAICS_533110
0,36,1,1.0,0,0,0,30000,15000,0,0,0,1080,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,56,1,1.0,0,0,0,30000,15000,0,0,1,1680,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,36,10,1.0,0,0,0,30000,15000,0,0,0,1080,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,36,6,1.0,0,0,0,50000,25000,0,0,0,1080,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,240,65,1.0,3,65,1,343000,343000,0,1,0,7200,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2097,60,5,2.0,0,5,1,150000,75000,1,0,0,1800,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2098,300,4,1.0,0,0,0,99000,79200,0,1,0,9000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2099,84,2,1.0,0,0,0,50000,40000,0,0,0,2520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2100,120,3,1.0,0,0,0,500000,375000,0,0,0,3600,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
scaler = StandardScaler()
scaled_sba = scaler.fit_transform(sba_loans_df_c[['Term', 'NoEmp', 'daysterm']])

In [None]:
scaled_sba = pd.DataFrame(scaled_sba, columns=['Term', 'NoEmp', 'daysterm'])

In [None]:
scaled_sba

Unnamed: 0,Term,NoEmp,daysterm
0,-0.970058,-0.266269,-0.970058
1,-0.756951,-0.266269,-0.756951
2,-0.970058,-0.004832,-0.970058
3,-0.970058,-0.121026,-0.970058
4,1.203632,1.592838,1.203632
...,...,...,...
2093,-0.714329,-0.150075,-0.714329
2094,1.842953,-0.179124,1.842953
2095,-0.458601,-0.237221,-0.458601
2096,-0.075009,-0.208172,-0.075009


In [None]:
newexist_sba_loans = sba_loans_df_c['NewExist'].apply(lambda x: -1 if x==2 else x)
newexist_sba_loans = pd.DataFrame(newexist_sba_loans, columns=['NewExist'])

In [None]:
sba_loans_df_c.drop(columns=['Term', 'NoEmp', 'daysterm', 'NewExist'], inplace=True)

In [None]:
scaled_sba_loans = pd.concat([scaled_sba, newexist_sba_loans, sba_loans_df_c], axis=1)

In [None]:
scaled_sba_loans

Unnamed: 0,Term,NoEmp,daysterm,NewExist,CreateJob,RetainedJob,UrbanRural,GrAppv,SBA_Appv,New,RealEstate,Recession,City_ACAMPO,City_ACTON,City_ADELANTO,City_AGOURA,City_AGOURA HILLS,City_AGUANGA,City_ALAMEDA,City_ALBANY,City_ALHAMBRA,City_ALISO VIEJO,City_ALPINE,City_ALTADENA,City_ALVISO,City_ANAHEIM,City_ANDERSON,City_ANTELOPE,City_ANTIOCH,City_APPLE VALLEY,City_ARCADE,City_ARCADIA,City_ARCATA,City_ARTESIA,City_ATWATER,City_AUBURN,City_AZUSA,City_Arcata,City_BAKERSFIELD,City_BALDWIN PARK,...,BankState_FL,BankState_IL,BankState_IN,BankState_MN,BankState_MO,BankState_NC,BankState_NV,BankState_NY,BankState_OH,BankState_OR,BankState_RI,BankState_SC,BankState_SD,BankState_TX,BankState_UT,BankState_VA,NAICS_531110,NAICS_531120,NAICS_531130,NAICS_531190,NAICS_531210,NAICS_531311,NAICS_531312,NAICS_531320,NAICS_531390,NAICS_532111,NAICS_532112,NAICS_532120,NAICS_532210,NAICS_532220,NAICS_532230,NAICS_532291,NAICS_532292,NAICS_532299,NAICS_532310,NAICS_532411,NAICS_532412,NAICS_532420,NAICS_532490,NAICS_533110
0,-0.970058,-0.266269,-0.970058,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,-0.756951,-0.266269,-0.756951,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,-0.970058,-0.004832,-0.970058,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,-0.970058,-0.121026,-0.970058,1.0,0.0,0.0,0.0,50000.0,25000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.203632,1.592838,1.203632,1.0,3.0,65.0,1.0,343000.0,343000.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2097,-0.714329,-0.179124,-0.714329,-1.0,0.0,5.0,1.0,150000.0,75000.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2098,,,,1.0,0.0,0.0,0.0,99000.0,79200.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2099,,,,1.0,0.0,0.0,0.0,50000.0,40000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2100,,,,1.0,0.0,0.0,0.0,500000.0,375000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
scaled_sba_loans.dropna(inplace=True)
scaled_sba_loans

Unnamed: 0,Term,NoEmp,daysterm,NewExist,CreateJob,RetainedJob,UrbanRural,GrAppv,SBA_Appv,New,RealEstate,Recession,City_ACAMPO,City_ACTON,City_ADELANTO,City_AGOURA,City_AGOURA HILLS,City_AGUANGA,City_ALAMEDA,City_ALBANY,City_ALHAMBRA,City_ALISO VIEJO,City_ALPINE,City_ALTADENA,City_ALVISO,City_ANAHEIM,City_ANDERSON,City_ANTELOPE,City_ANTIOCH,City_APPLE VALLEY,City_ARCADE,City_ARCADIA,City_ARCATA,City_ARTESIA,City_ATWATER,City_AUBURN,City_AZUSA,City_Arcata,City_BAKERSFIELD,City_BALDWIN PARK,...,BankState_FL,BankState_IL,BankState_IN,BankState_MN,BankState_MO,BankState_NC,BankState_NV,BankState_NY,BankState_OH,BankState_OR,BankState_RI,BankState_SC,BankState_SD,BankState_TX,BankState_UT,BankState_VA,NAICS_531110,NAICS_531120,NAICS_531130,NAICS_531190,NAICS_531210,NAICS_531311,NAICS_531312,NAICS_531320,NAICS_531390,NAICS_532111,NAICS_532112,NAICS_532120,NAICS_532210,NAICS_532220,NAICS_532230,NAICS_532291,NAICS_532292,NAICS_532299,NAICS_532310,NAICS_532411,NAICS_532412,NAICS_532420,NAICS_532490,NAICS_533110
0,-0.970058,-0.266269,-0.970058,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,-0.756951,-0.266269,-0.756951,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,-0.970058,-0.004832,-0.970058,1.0,0.0,0.0,0.0,30000.0,15000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,-0.970058,-0.121026,-0.970058,1.0,0.0,0.0,0.0,50000.0,25000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.203632,1.592838,1.203632,1.0,3.0,65.0,1.0,343000.0,343000.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2093,-0.714329,-0.150075,-0.714329,1.0,0.0,3.0,1.0,100000.0,50000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2094,1.842953,-0.179124,1.842953,1.0,0.0,3.0,2.0,30000.0,15000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2095,-0.458601,-0.237221,-0.458601,1.0,4.0,6.0,1.0,721000.0,721000.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2096,-0.075009,-0.208172,-0.075009,1.0,8.0,28.0,1.0,1029000.0,1029000.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


# Creating Linear Regression Model

In [None]:
X = scaled_sba_loans
X.drop(columns=['CreateJob', 'RetainedJob', 'GrAppv', 'SBA_Appv'])
y = scaled_sba_loans[['CreateJob', 'RetainedJob', 'GrAppv', 'SBA_Appv']]

In [None]:
X.shape

(2094, 1543)

In [None]:
y.shape

(2094, 4)

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lr = LinearRegression()

In [None]:
lr.fit(X, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
lr.score(X, y)



1.0

Predict for first value in data

In [None]:
X_predict_arr = X.iloc[0]

In [None]:
X_predict_arr

Term           -0.970058
NoEmp          -0.266269
daysterm       -0.970058
NewExist        1.000000
CreateJob       0.000000
                  ...   
NAICS_532411    0.000000
NAICS_532412    0.000000
NAICS_532420    1.000000
NAICS_532490    0.000000
NAICS_533110    0.000000
Name: 0, Length: 1543, dtype: float64

In [None]:
lr.predict([X_predict_arr])

array([[ 2.17817800e-10, -9.57758007e-10,  3.00000000e+04,
         1.50000000e+04]])

In [None]:
y.iloc[0]

CreateJob          0.0
RetainedJob        0.0
GrAppv         30000.0
SBA_Appv       15000.0
Name: 0, dtype: float64

# Create Neural Network

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
model = Sequential()

In [None]:
X.shape

(2094, 1543)

In [None]:
model.add(Dense(1543, activation='relu', input_shape=(1543,)))
model.add(Dense(7, activation='relu'))
model.add(Dense(4, activation='relu'))

In [None]:
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_15 (Dense)             (None, 1543)              2382392   
_________________________________________________________________
dense_16 (Dense)             (None, 7)                 10808     
_________________________________________________________________
dense_17 (Dense)             (None, 4)                 32        
Total params: 2,393,232
Trainable params: 2,393,232
Non-trainable params: 0
_________________________________________________________________


In [None]:
model.compile(optimizer='adam', loss='mean_squared_logarithmic_error', metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f2e99e39250>

In [None]:
model.evaluate(X_test, y_test)



[67761.359375, 1.0]