# Travel in 2020

The year 2020 has been filled with events that have affected many commercial sectors. In particular, the travel industry in the United States has seen a significant downtick in the number of people traveling. The purpose of this project is to collect data from several sources and see whether we can accurately predict the number of people who travel on a given day, specifically by air. We will use daily traveler counts from the TSA website, daily COVID-19 case counts from ourworldindata.org, stock market data from Yahoo Finance, and various other sources for the dates of travel bans.

## Import Libraries

In [1]:
#Import libraries
import math
import time
from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from bs4 import BeautifulSoup
from dfply import *  #star import brings dfply's pipe verbs into the namespace

%matplotlib inline

## Read in data

In [3]:
#Read in covid case and death count data
covid = pd.read_csv('daily-covid-cases-deaths.csv')

In [4]:
#Check to see if read in properly
covid.head()

Unnamed: 0,Entity,Code,Date,Daily new confirmed cases of COVID-19,Daily new confirmed deaths due to COVID-19
0,Afghanistan,AFG,2020-01-23,0,0
1,Afghanistan,AFG,2020-01-24,0,0
2,Afghanistan,AFG,2020-01-25,0,0
3,Afghanistan,AFG,2020-01-26,0,0
4,Afghanistan,AFG,2020-01-27,0,0


In [5]:
#Read in stock market data
stock = pd.read_csv('^DJI.csv')

In [6]:
#Check to see if read in properly
stock.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-12-10,27900.650391,27949.019531,27804.0,27881.720703,27881.720703,213250000
1,2019-12-11,27867.310547,27925.5,27801.800781,27911.300781,27911.300781,213510000
2,2019-12-12,27898.339844,28224.949219,27859.869141,28132.050781,28132.050781,277740000
3,2019-12-13,28123.640625,28290.730469,28028.320313,28135.380859,28135.380859,250660000
4,2019-12-16,28191.669922,28337.490234,28191.669922,28235.890625,28235.890625,286770000


In [9]:
#Read in data for number of travelers
tsa_data = pd.read_csv('tsa_data.csv')

In [13]:
#Check to see if read in properly
tsa_data.head()

Unnamed: 0,Date,Total Traveler Throughput,Total Traveler Throughput (1 Year Ago - Same Weekday)
0,12/9/2020,564372,2020488
1,12/8/2020,501513,1897051
2,12/7/2020,703546,2226290
3,12/6/2020,837137,2292079
4,12/5/2020,629430,1755801
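The TSA dates are stored as `month/day/year` strings in descending order, while the covid and stock files use ISO `YYYY-MM-DD` dates. Before the datasets can be combined on date, the TSA column probably needs to be parsed and sorted. Below is a minimal sketch of that step, using the first three rows shown above as a stand-in for the full CSV; the name `sample` is only for illustration.

```python
import pandas as pd

# Stand-in for tsa_data: the first three rows displayed above
sample = pd.DataFrame({
    "Date": ["12/9/2020", "12/8/2020", "12/7/2020"],
    "Total Traveler Throughput": [564372, 501513, 703546],
})

# Parse the month/day/year strings into real datetimes
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# Sort oldest first so the rows line up with the other datasets
sample = sample.sort_values("Date").reset_index(drop=True)

print(sample)
```

The same two lines should work on the full `tsa_data` frame, assuming every value in its `Date` column follows the same format.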


In [14]:
#Sources for NBA and other sport suspensions
# https://www.nba.com/news/nba-suspend-season-following-wednesdays-games
# https://bleacherreport.com/articles/2880569-timeline-of-coronavirus-impact-on-sports

In [20]:
#Create dataset to show when first major sports league suspended season (NBA) and when it restarted
nba = pd.DataFrame()

sports_maybe = []

#i is the day of the year: day 71 is 2020-03-11, day 189 is 2020-07-07
for i in range(1, 366):
    if i < 71:
        sports_maybe.append(1)   #games still being played
    elif i < 189:
        sports_maybe.append(0)   #season suspended
    else:
        sports_maybe.append(1)   #games resumed

nba['Games'] = sports_maybe
        

In [22]:
nba.head()

Unnamed: 0,Games
0,1
1,1
2,1
3,1
4,1
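The cutoffs in the loop are day-of-year numbers, which are easy to miscount by hand, especially in a leap year like 2020. As a quick sanity check, a day number can be converted back into a calendar date. This is only a sketch; the helper name `day_of_year_to_date` is made up for this example.

```python
from datetime import date, timedelta

def day_of_year_to_date(day, year=2020):
    """Convert a 1-based day-of-year number into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day - 1)

# Cutoffs used in the notebook's loops
for day in (71, 93, 189):
    print(day, day_of_year_to_date(day))
```

Day 71 comes out as 2020-03-11, the date of the NBA suspension in the first source above, and day 189 as 2020-07-07.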


In [23]:
#Source for first mask recommendation made by CDC
# https://www.npr.org/sections/coronavirus-live-updates/2020/04/03/826219824/president-trump-says-cdc-now-recommends-americans-wear-cloth-masks-in-public

In [25]:
#Create dataset to show when masks were recommended

masks = pd.DataFrame()

masks_lst = []

for i in range(1, 366):
    if i < 93:
        masks_lst.append(0)
    else:
        masks_lst.append(1)

masks['recommendation'] = masks_lst

In [27]:
#Check dataset
masks

Unnamed: 0,recommendation
0,0
1,0
2,0
3,0
4,0
...,...
360,1
361,1
362,1
363,1
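An alternative to counting days by hand is to build these indicator columns directly from calendar dates, which also gives each row a `Date` to join on later. The sketch below is one possible way to do it and is not the approach used in the cells above. The cutoff dates are assumptions: 2020-04-03 is the date in the NPR URL, and 2020-03-11 and 2020-07-07 are the dates that correspond to days 71 and 189 in the NBA loop.

```python
import pandas as pd

# Every calendar day of 2020 (a leap year, so 366 rows)
flags = pd.DataFrame({"Date": pd.date_range("2020-01-01", "2020-12-31", freq="D")})

# Assumed cutoff dates (see the note above)
mask_date = pd.Timestamp("2020-04-03")
suspended = pd.Timestamp("2020-03-11")
resumed = pd.Timestamp("2020-07-07")

# 1 on and after the day masks were recommended, 0 before
flags["recommendation"] = (flags["Date"] >= mask_date).astype(int)

# 0 while the season was suspended, 1 otherwise
in_suspension = (flags["Date"] >= suspended) & (flags["Date"] < resumed)
flags["Games"] = (~in_suspension).astype(int)

print(flags.head())
```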


## Clean up data and create main dataframe

In [32]:
#Clean up covid dataset to only include United States (copy so later edits do not touch the original frame)
usa_covid = covid.loc[covid['Entity'] == 'United States'].copy()

In [34]:
#Check earliest date for tsa dates
tsa_data.tail()

Unnamed: 0,Date,Total Traveler Throughput,Total Traveler Throughput (1 Year Ago - Same Weekday)
279,3/5/2020,2130015,2402692
280,3/4/2020,1877401,2143619
281,3/3/2020,1736393,1979558
282,3/2/2020,2089641,2257920
283,3/1/2020,2280522,2301439
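Since the TSA data only runs from 3/1/2020 to 12/9/2020, the other datasets would likely need to be trimmed to the same window before everything is combined. Here is a rough sketch of such a filter. The helper `restrict_to_window` is hypothetical, and the demo frame contains only dates, no real values.

```python
import pandas as pd

def restrict_to_window(df, start, end, date_col="Date"):
    """Keep only rows whose date falls within [start, end], inclusive."""
    dates = pd.to_datetime(df[date_col])
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return df.loc[mask].copy()

# Demo frame holding only dates around the start of the TSA window
demo = pd.DataFrame({"Date": pd.date_range("2020-02-27", "2020-03-03", freq="D")})

# Trim to the range covered by the TSA data
trimmed = restrict_to_window(demo, "2020-03-01", "2020-12-09")
print(trimmed)
```

Because the helper parses the date column itself, it should accept frames whose `Date` column is still stored as ISO strings, such as the covid and stock data.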
