
# Setup

This demonstration notebook provides a suggested set of libraries that you might find useful in crafting your data stories.  You should comment out or delete libraries that you don't use in your analysis.

In [0]:
#number crunching
import numpy as np
import pandas as pd

#data import (Google Drive helpers; commented out because the data below are read directly from GitHub)
#from pydrive.auth import GoogleAuth
#from pydrive.drive import GoogleDrive
#from google.colab import auth
#from oauth2client.client import GoogleCredentials

#data visualization
import plotly
import plotly.express as px

# Project team

Arvin, MK, Elliott, Philip

# Background and overview

Due to the COVID-19 pandemic, the United States government strongly encouraged (and at times required) people to stay at home. While the stay-at-home orders protected individuals from contracting COVID-19, they did not protect small businesses from losing their customers. Without customers, businesses could not generate revenue, which in turn meant that they couldn’t keep their workers on the payroll. The decline in small business activity became a clear problem.

In response, the US Small Business Administration (SBA) launched the Paycheck Protection Program (PPP), a loan program designed to keep small businesses alive by giving them an incentive to keep their workers on the payroll. The program's terms state that “SBA will forgive loans if all employees are kept on the payroll for eight weeks and the money is used for payroll, rent, mortgage interest, or utilities.”

Upon finding out about the PPP, small businesses filed applications. It seemed like a perfect plan for helping many small businesses survive the pandemic. However, 94 percent of small businesses were disappointed to find that they did not receive funding, while the remaining six percent claimed over 340 billion dollars. Surprisingly, a large share of that 340 billion dollars ended up with small businesses in the Midwest, notably in North Dakota (ND). This drew our attention to why the Midwestern states received more funding, both in dollar amounts and in number of loans.

# Approach

To investigate why the Midwestern states received more PPP loans per small business, we will look at state-level choropleth maps of the United States. The maps show the number of approved PPP loans per small business and the approved loan amount per small business in each state. We will also explore possible reasons why some states are receiving more PPP loans, including the possibility that the Midwestern states have larger industries that need the funding.

# Quick summary

Though we cannot confidently reach a conclusion on why the Midwestern states are receiving more funding than the coastal states, we hypothesize that this stems from the Trump administration's mission to revive the manufacturing industry in the Midwest.



# Data

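All of the data for this story come from a single Excel workbook, `PPP-data.xlsx`, stored in the project's GitHub repository (the URL appears in the code below). The PPP figures reflect loan approvals as of 4/16/2020. We load six sheets from the workbook: PPP approvals by state (`states`) and by industry (`industry`), small-business counts by state (`tot_states`) and by industry (`tot_industry`), state unemployment rates for January and March (`unemp`), and one more sheet (`size`) that is not used in the analysis below.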

In [0]:
# All six sheets come from the same workbook, so the URL is defined once.
ppp_data_url = 'https://github.com/philiplindsay/storytelling-with-data/raw/master/data-stories/COVID-19/PPP-data.xlsx'
states = pd.read_excel(ppp_data_url, sheet_name = 0)
size = pd.read_excel(ppp_data_url, sheet_name = 1)
industry = pd.read_excel(ppp_data_url, sheet_name = 2)
tot_states = pd.read_excel(ppp_data_url, sheet_name = 3)
tot_industry = pd.read_excel(ppp_data_url, sheet_name = 4)
unemp = pd.read_excel(ppp_data_url, sheet_name = 5)

In [0]:
sbstates = pd.merge(states, tot_states, on = 'State', how = 'inner')
sbindustry = pd.merge(industry, tot_industry, on = 'Industry', how = 'inner')
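
An inner merge silently drops any row whose key is missing from the other table, for example if a state is labeled differently on the two sheets. One way to check for this is sketched below. The rows are made up and stand in for the real sheets, and the `find_unmatched` helper is hypothetical rather than part of the original analysis.

```python
import pandas as pd

def find_unmatched(left, right, key):
    """Return key values that appear in only one of the two tables."""
    # An outer merge with indicator=True adds a '_merge' column recording
    # whether each key was found in the left table, the right table, or both.
    merged = pd.merge(left[[key]], right[[key]], on=key, how='outer', indicator=True)
    return merged.loc[merged['_merge'] != 'both', [key, '_merge']].reset_index(drop=True)

# Made-up example rows; in the notebook the tables would be `states` and `tot_states`.
toy_states = pd.DataFrame({'State': ['ND', 'NE', 'CA'], 'Approved Loans': [10, 20, 30]})
toy_tot_states = pd.DataFrame({'State': ['ND', 'NE', 'NY'], 'Small Businesses': [100, 200, 300]})

unmatched = find_unmatched(toy_states, toy_tot_states, 'State')
print(unmatched)
```

An empty result would suggest that the inner merge kept every state (or industry) from both sheets.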

In [0]:
sbstates['Approved Loans (per small business)'] = (sbstates['Approved Loans'] / sbstates['Small Businesses'])
sbstates['Approved Amounts (USD per small business)'] = (sbstates['Approved Amounts'] / sbstates['Small Businesses'])
sbstates['Average Loan Size'] = sbstates['Approved Amounts'] / sbstates['Approved Loans']
sbindustry['Approved Loans (per small business)'] = (sbindustry['Approved Loans'] / sbindustry['Small Businesses'])
sbindustry['Approved Amounts (USD per small business)'] = (sbindustry['Approved Dollars'] / sbindustry['Small Businesses'])
tot_states['% Share of Manufacturing'] = (tot_states['Manufacturing'] / tot_states['Small Businesses']) * 100
unemp['Change in Unemployment'] = unemp['March'] - unemp['January']
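
The three state-level ratios above are linked: dollars per small business equals loans per small business times the average loan size. A state can therefore rank high on dollars per business either because many of its businesses received loans or because its loans were large. The sketch below illustrates this with made-up numbers for two hypothetical states (`AA` and `BB`); these are not SBA figures.

```python
import pandas as pd

# Made-up figures for two hypothetical states, using the notebook's column names.
toy = pd.DataFrame({
    'State': ['AA', 'BB'],
    'Approved Loans': [50, 20],
    'Approved Amounts': [5_000_000, 5_000_000],
    'Small Businesses': [1_000, 1_000],
})

toy['Approved Loans (per small business)'] = toy['Approved Loans'] / toy['Small Businesses']
toy['Approved Amounts (USD per small business)'] = toy['Approved Amounts'] / toy['Small Businesses']
toy['Average Loan Size'] = toy['Approved Amounts'] / toy['Approved Loans']

# Dollars per business = loans per business x average loan size.
product = toy['Approved Loans (per small business)'] * toy['Average Loan Size']
print(toy[['State', 'Approved Loans (per small business)', 'Average Loan Size']])
print(product.tolist())
```

Here both toy states receive the same dollars per business, but `AA` gets there through many small loans and `BB` through fewer, larger ones.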

# Analysis

The figures below map approved PPP loans and approved loan amounts per small business by state, compare loan counts and average loan sizes across industries, and then map each state's share of manufacturing small businesses and its change in unemployment rate.

In [0]:
px.choropleth(data_frame = sbstates, locations = 'State', locationmode = 'USA-states', color = sbstates['Approved Loans (per small business)'], scope = 'usa', title = 'Number of Approved PPP Loans per small business (4/16/2020)')
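
Plotly's `USA-states` location mode matches two-letter postal abbreviations such as `ND`, not full state names. If the `State` column held full names, the states would most likely not be drawn on the map. The sketch below shows one possible conversion. The lookup table is deliberately partial and illustrative, and `to_abbreviation` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Partial, illustrative lookup; a real one would list every state.
STATE_ABBREVIATIONS = {
    'North Dakota': 'ND',
    'Nebraska': 'NE',
    'California': 'CA',
}

def to_abbreviation(name):
    """Map a full state name to its postal code; pass two-letter codes through."""
    name = name.strip()
    if len(name) == 2:
        return name.upper()
    return STATE_ABBREVIATIONS.get(name)  # None if the name is not in the lookup

# Made-up rows mixing full names, a lowercase code, stray spaces, and an unknown name.
toy = pd.DataFrame({'State': ['North Dakota', 'ne', ' California ', 'Atlantis']})
toy['State Code'] = toy['State'].map(to_abbreviation)
print(toy)
```

Rows left without a code point to names that need attention before mapping.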

Notes from class: Political leaning may be playing a role in helping these Midwestern states get their share of PPP funding.

In [0]:
px.choropleth(data_frame = sbstates, locations = 'State', locationmode = 'USA-states', color = sbstates['Approved Amounts (USD per small business)'], scope = 'usa', range_color = (5000, 22000), title = 'Approved PPP Loan Amounts per small business (4/16/2020)')
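
To put a number on what the maps suggest, the Midwestern states can be compared with all other states. The sketch below assumes the `State` column holds two-letter abbreviations and uses the 12 states of the US Census Bureau's Midwest region. The values are made up, and `compare_midwest` is a hypothetical helper.

```python
import pandas as pd

# The 12 states in the US Census Bureau's Midwest region.
MIDWEST = {'IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'MO', 'NE', 'ND', 'OH', 'SD', 'WI'}

def compare_midwest(df, value_col, state_col='State'):
    """Unweighted mean of value_col for Midwestern states versus all others."""
    region = df[state_col].isin(MIDWEST).map({True: 'Midwest', False: 'Other'})
    return df.groupby(region)[value_col].mean()

# Made-up values; in the notebook the table would be `sbstates`.
toy = pd.DataFrame({
    'State': ['ND', 'NE', 'CA', 'NY'],
    'Approved Loans (per small business)': [0.20, 0.16, 0.06, 0.08],
})

means = compare_midwest(toy, 'Approved Loans (per small business)')
print(means)
```

This treats every state equally regardless of size. Weighting by the number of small businesses would be a reasonable alternative.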

In [0]:
numloans = px.bar(sbindustry, x = 'Industry', y = 'Approved Loans (per small business)', color = 'Industry', title = 'Number of Loans by Industry per small business') 
numloans.update_layout(showlegend = False)
numloans

In [0]:
industry['Average Loan Size'] = industry['Approved Dollars'] / industry['Approved Loans']
average_size_by_sector = px.bar(industry, x = 'Industry', y = 'Average Loan Size', color = 'Industry', title = 'Average Loan Size Per Industry') 
average_size_by_sector.update_layout(showlegend = False)
average_size_by_sector

In [0]:
px.choropleth(data_frame = tot_states, locations = 'State', locationmode = 'USA-states', color = tot_states['% Share of Manufacturing'], scope = 'usa', title = '% Share of Manufacturing Small Businesses')

In [0]:
px.choropleth(data_frame = unemp, locations = 'State', locationmode = 'USA-states', color = unemp['Change in Unemployment'], scope = 'usa', title = 'Change in Unemployment Rate (1/20 - 3/20)')

AK: Notes for the group. Manufacturing has the largest average loan size of any industry. This is curious, as the manufacturing industry has really taken a hit since China joined the WTO in 2001. The China shock (the increase in Chinese manufactured imports as a share of US GDP) displaced 500,000 to 1 million jobs between 2000 and 2007. Manufacturing employment and wages are at all-time lows and have struggled to return to their heyday during the age of NAFTA. The Trump administration has sought to mitigate this by waging a trade war. I wonder if manufacturers are getting bailed out more because of political factors.
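
One rough way to probe the manufacturing question is a state-level correlation between each state's share of manufacturing small businesses and its approved loans per small business. In this notebook `% Share of Manufacturing` lives in `tot_states` while the per-business loan columns live in `sbstates`, so the sketch below merges on `State` first. The numbers are made up, and `state_correlation` is a hypothetical helper.

```python
import pandas as pd

def state_correlation(loans_df, totals_df, x_col, y_col, key='State'):
    """Pearson correlation between x_col (totals_df) and y_col (loans_df), matched by state."""
    merged = pd.merge(loans_df[[key, y_col]], totals_df[[key, x_col]], on=key, how='inner')
    return merged[x_col].corr(merged[y_col])

# Made-up values for four hypothetical states.
toy_tot_states = pd.DataFrame({
    'State': ['AA', 'BB', 'CC', 'DD'],
    '% Share of Manufacturing': [1.0, 2.0, 3.0, 4.0],
})
toy_sbstates = pd.DataFrame({
    'State': ['AA', 'BB', 'CC', 'DD'],
    'Approved Loans (per small business)': [0.02, 0.01, 0.04, 0.03],
})

r = state_correlation(toy_sbstates, toy_tot_states,
                      '% Share of Manufacturing',
                      'Approved Loans (per small business)')
print(round(r, 3))
```

Even a strong correlation would only show that the two measures move together across states. It would not establish a political motive.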

# Interpretations and conclusions

In conclusion, one of the principal reasons we think there is a high number of approved PPP loans per small business in the Midwest is the relationship between the manufacturing industry and Trump's political aspirations, together with the ongoing trade war with China. Low unemployment in the Midwest, combined with a high concentration of PPP loans there, would serve well in securing crucial votes for Trump in key states while also cushioning the economic impact of the trade war with China. There are more questions left to explore, one of which is looking at more granular data within the manufacturing industry to see these effects in finer detail. However, these data are not currently available.

# Future directions


Another group might assess bank-lender relations in the Midwest to see whether they contribute to the high concentration of approved PPP loans per small business. It would also be interesting to look at the industry breakdown of loans made by different banks and at how those banks have handled distributing PPP loans. This would show whether the differences simply reflect how banks in the Midwest operate, or whether there is further evidence for our claim that PPP loans are being prioritized in the Midwest for political or economic purposes.