Created by: [SmirkyGraphs](http://smirkygraphs.github.io/). Code: [Github](https://github.com/SmirkyGraphs/Python-Notebooks). Source: [treasury.gov](https://home.treasury.gov/policy-issues/cares-act/assistance-for-small-businesses/sba-paycheck-protection-program-loan-level-data).
<hr>

# Rhode Island SBA PPP Loans

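This notebook contains the code used to combine and clean the SBA Paycheck Protection Program (PPP) loan data for Rhode Island. The data is published as two datasets. Loans up to 150k are released per state. Loans of 150k or more are released as a single national file, which is filtered to Rhode Island below. The code cleans both datasets and uses each loan's NAICS code to add the borrower's industry titles (sector, subsector, industry group and national industry).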

Tableau project based on the final cleaned dataset: [here](https://ivizri.com/posts/2020/07/ri-sba-ppp-loans/).
<hr>

```python
import pandas as pd
```

```python
df = pd.read_csv('./data/raw/PPP Data up to 150K - RI.csv')

df['id'] = df.index + 1  # 1-based row id
df['NonProfit'] = df['NonProfit'].fillna('N')  # treat blank NonProfit as 'N' (not a non-profit)
# average loan amount per job retained; loans reporting 0 jobs get NaN instead of inf
df['loan_per_job'] = df['LoanAmount'] / df['JobsRetained'].where(df['JobsRetained'] > 0)
df['City'] = df['City'].str.title()  # title-case city names (missing values stay NaN)
```
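Missing city names and zero job counts are easy to trip over in this step. The following minimal sketch uses a made-up three-row frame with hypothetical values, not PPP data. It shows that `.str.title()` passes a missing value through, while calling `.title()` on it directly raises. It also shows that dividing by a zero job count yields `inf` unless the zeros are masked first.

```python
import pandas as pd

# hypothetical rows: one missing city, one loan reporting 0 jobs retained
toy = pd.DataFrame({
    'City': ['PROVIDENCE', float('nan'), 'WARWICK'],
    'LoanAmount': [50000.0, 20000.0, 10000.0],
    'JobsRetained': [5, 2, 0],
})

# the .str accessor skips missing values instead of raising
cities = toy['City'].str.title()

# calling .title() directly fails on the NaN (a float has no .title method)
try:
    toy['City'].apply(lambda x: x.title())
    apply_failed = False
except AttributeError:
    apply_failed = True

# plain division turns 0 jobs into inf; masking zeros first gives NaN instead
raw_ratio = toy['LoanAmount'] / toy['JobsRetained']
safe_ratio = toy['LoanAmount'] / toy['JobsRetained'].where(toy['JobsRetained'] > 0)

print(cities.tolist(), apply_failed)
print(raw_ratio.tolist(), safe_ratio.tolist())
```

NaN is usually the safer choice for a per-job average, because aggregations such as `mean()` skip it. An `inf` value would dominate any average computed over it.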

```python
# add NAICS titles for each level of the code: national industry (full 6-digit
# code), sector (2 digits), subsector (3 digits) and industry group (4 digits)
naics = pd.read_csv('./data/external/2-6_digit_naics_codes.csv')
naics = naics.set_index('2017 NAICS US Code')
naics.index = naics.index.astype(str)
df['NAICSCode'] = df['NAICSCode'].fillna(0).astype(int).astype(str)  # missing codes become '0'

# national industry: match on the full code
naics = naics.rename(columns={'2017 NAICS US Title': 'national industry'})
df = df.merge(naics, how='left', left_on='NAICSCode', right_on='2017 NAICS US Code')

# sector: match on the first 2 digits
df['temp'] = df['NAICSCode'].str[:2]
naics = naics.rename(columns={'national industry': 'sector'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# subsector: match on the first 3 digits
df['temp'] = df['NAICSCode'].str[:3]
naics = naics.rename(columns={'sector': 'subsector'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# industry group: match on the first 4 digits
df['temp'] = df['NAICSCode'].str[:4]
naics = naics.rename(columns={'subsector': 'industry group'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# drop the helper key, the mostly empty demographic fields and CD (congressional district)
drop_cols = ['temp', 'Gender', 'Veteran', 'RaceEthnicity', 'CD']
df = df.drop(columns=drop_cols)

# save the cleaned file
df.to_csv('./data/clean/ri_ppp_under_150k.csv', index=False)
```
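The four merges above follow one pattern. Each NAICS level is a prefix of the 6-digit code: 2 digits for the sector, 3 for the subsector and 4 for the industry group. The following minimal sketch performs the same lookup in a reusable function. It builds a code-to-title Series and uses `Series.map` in place of repeated merges, so no extra key columns are carried along.

The sketch makes several assumptions:
- The lookup table has the `2017 NAICS US Code` and `2017 NAICS US Title` columns used above.
- The combined sectors are listed as ranges such as `31-33`, as in the Census 2017 NAICS code list. A plain 2-digit prefix like `31` would not match such an entry, so the sketch maps those prefixes to their range first.

`SECTOR_RANGES`, `LEVELS` and `add_naics_titles` are hypothetical names. The lookup rows are a small illustrative subset, and the loans are made up.

```python
import pandas as pd

# 2-digit prefixes of the combined NAICS sectors, mapped to their range labels
# (assumption: the lookup file lists these sectors as '31-33', '44-45', '48-49')
SECTOR_RANGES = {'31': '31-33', '32': '31-33', '33': '31-33',
                 '44': '44-45', '45': '44-45',
                 '48': '48-49', '49': '48-49'}

# output column name -> number of leading digits to match (None = full code)
LEVELS = {'national industry': None, 'sector': 2, 'subsector': 3, 'industry group': 4}


def add_naics_titles(df, naics, code_col='NAICSCode'):
    """Return a copy of df with one title column per NAICS level (hypothetical helper)."""
    titles = (naics.assign(code=naics['2017 NAICS US Code'].astype(str).str.strip())
                   .drop_duplicates('code')
                   .set_index('code')['2017 NAICS US Title'])
    out = df.copy()
    codes = out[code_col].fillna(0).astype(int).astype(str)  # missing codes become '0'
    for level, digits in LEVELS.items():
        keys = codes if digits is None else codes.str[:digits]
        if level == 'sector':
            keys = keys.replace(SECTOR_RANGES)  # '31' -> '31-33', etc.
        out[level] = keys.map(titles)  # unmatched codes become NaN, like a left join
    return out


# small illustrative lookup table and made-up loans
naics = pd.DataFrame({
    '2017 NAICS US Code': ['31-33', '311', '3118', '311811', '72', '722', '7225', '722511'],
    '2017 NAICS US Title': ['Manufacturing', 'Food Manufacturing',
                            'Bakeries and Tortilla Manufacturing', 'Retail Bakeries',
                            'Accommodation and Food Services',
                            'Food Services and Drinking Places',
                            'Restaurants and Other Eating Places', 'Full-Service Restaurants'],
})
loans = pd.DataFrame({'NAICSCode': [311811.0, 722511.0, None]})
result = add_naics_titles(loans, naics)
print(result)
```

In the notebook, the function would take the lookup CSV as read, before `set_index`. `map` keeps the row count fixed. By contrast, a left merge against a table with duplicate codes would silently duplicate loans.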

```python
df = pd.read_csv('./data/raw/PPP Data 150k plus.csv')

df = df[df['State'] == 'RI'].reset_index(drop=True)  # keep Rhode Island loans only
df['id'] = df.index + 1  # 1-based row id
df['LoanRange'] = df['LoanRange'].str[2:]  # drop the 2-character letter prefix of the loan range
df['City'] = df['City'].str.title()  # title-case city names (missing values stay NaN)
```
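The slice `str[2:]` assumes every `LoanRange` value starts with exactly one letter followed by a space. The following slightly more defensive sketch strips that prefix only when the pattern is actually present, using a regular expression. The sample labels are hypothetical and modeled on that letter-plus-range format.

```python
import pandas as pd

# hypothetical LoanRange labels: letter code, space, then the range text
ranges = pd.Series(['a $5-10 million', 'e $150,000-350,000', '$1-2 million', None])

# remove a single leading letter and the whitespace after it, only if present
clean = ranges.str.replace(r'^[A-Za-z]\s+', '', regex=True)

# the positional slice also cuts into values that have no prefix
sliced = ranges.str[2:]

print(clean.tolist())
print(sliced.tolist())
```

On well-formed input both approaches give the same result. The regex version simply leaves unexpected values intact instead of truncating them.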

```python
# add NAICS titles for each level of the code: national industry (full 6-digit
# code), sector (2 digits), subsector (3 digits) and industry group (4 digits)
naics = pd.read_csv('./data/external/2-6_digit_naics_codes.csv')
naics = naics.set_index('2017 NAICS US Code')
naics.index = naics.index.astype(str)
df['NAICSCode'] = df['NAICSCode'].fillna(0).astype(int).astype(str)  # missing codes become '0'

# national industry: match on the full code
naics = naics.rename(columns={'2017 NAICS US Title': 'national industry'})
df = df.merge(naics, how='left', left_on='NAICSCode', right_on='2017 NAICS US Code')

# sector: match on the first 2 digits
df['temp'] = df['NAICSCode'].str[:2]
naics = naics.rename(columns={'national industry': 'sector'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# subsector: match on the first 3 digits
df['temp'] = df['NAICSCode'].str[:3]
naics = naics.rename(columns={'sector': 'subsector'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# industry group: match on the first 4 digits
df['temp'] = df['NAICSCode'].str[:4]
naics = naics.rename(columns={'subsector': 'industry group'})
df = df.merge(naics, how='left', left_on='temp', right_on='2017 NAICS US Code')

# drop the helper key, the mostly empty demographic fields and CD (congressional district)
drop_cols = ['temp', 'Gender', 'Veteran', 'RaceEthnicity', 'CD']
df = df.drop(columns=drop_cols)

# save the cleaned file
df.to_csv('./data/clean/ri_ppp_over_150k.csv', index=False)
```
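Dropping the demographic columns rests on how much of them is empty. A quick check before the drop is to compute the share of missing values per column. This minimal sketch assumes that, besides true blanks, a placeholder label marks unanswered questions. The label `'Unanswered'`, the helper `missing_share` and the sample rows are all hypothetical, so confirm the raw file's actual placeholder before relying on the numbers.

```python
import pandas as pd

DEMOGRAPHIC_COLS = ['Gender', 'Veteran', 'RaceEthnicity']
PLACEHOLDER = 'Unanswered'  # hypothetical label for a skipped question


def missing_share(df, cols, placeholder=PLACEHOLDER):
    """Fraction of rows per column that are blank or hold the placeholder."""
    missing = df[cols].isna() | df[cols].eq(placeholder)
    return missing.mean()


# made-up rows for illustration
sample = pd.DataFrame({
    'Gender': ['Unanswered', 'Male Owned', None, 'Unanswered'],
    'Veteran': ['Unanswered', 'Unanswered', 'Non-Veteran', 'Veteran'],
    'RaceEthnicity': ['Unanswered', 'Unanswered', 'Unanswered', 'Unanswered'],
})

shares = missing_share(sample, DEMOGRAPHIC_COLS)
print(shares)
```

Running `missing_share(df, DEMOGRAPHIC_COLS)` on the raw frame, before the `drop`, would give the per-column fraction that justifies (or argues against) removing each field.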