## What is Canonical Correlation Analysis? 

Canonical correlation analysis (CCA) is a multivariate statistical technique for studying the
linear relationships between two sets of variables. One set is treated as the independent
variables and the other as the dependent variables, and a canonical variate is formed for
each set. A canonical variate is a linear composite, much like the one formed from the set of
independent variables in a multiple regression analysis. The difference is that canonical
correlation also forms a variate from several dependent variables, whereas multiple
regression can accommodate only one dependent variable. The analysis derives a canonical
function: a pair of variates whose weights are chosen to maximize the canonical correlation
coefficient, which measures the strength of the relationship between the two variates. Each
canonical variate is interpreted through its canonical loadings, the correlations between the
individual variables and their respective variate.

## Import Packages

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.cross_decomposition import CCA

## Import & Wrangle Data

## Financial Datasets - Markets [NASDAQ | DowJones | S&P 500 | US GDP] 

## NASDAQ Dataset

In [2]:
NASDAQ = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/5yrNASDAQ Data.csv")

In [3]:
NASDAQ.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,04/14/2022,13351.08,--,13647.43,13662.93,13345.22
1,04/13/2022,13643.59,--,13373.12,13679.43,13353.66
2,04/12/2022,13371.57,--,13584.69,13685.95,13317.74
3,04/11/2022,13411.96,--,13547.29,13585.08,13401.39
4,04/08/2022,13711.0,--,13830.47,13866.06,13693.69


In [4]:
NASDAQ.tail()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
1254,04/24/2017,5983.82,--,5979.96,5989.92,5970.25
1255,04/21/2017,5910.52,--,5919.02,5919.23,5899.43
1256,04/20/2017,5916.78,--,5887.87,5926.23,5880.2
1257,04/19/2017,5863.03,--,5874.43,5894.67,5856.34
1258,04/18/2017,5849.47,--,5838.59,5860.04,5828.57


### Dropping 'Volume' Column From NASDAQ

In [5]:
NASDAQ.drop(['Volume'], axis=1, inplace=True)

In [6]:
NASDAQ.head()

Unnamed: 0,Date,Close/Last,Open,High,Low
0,04/14/2022,13351.08,13647.43,13662.93,13345.22
1,04/13/2022,13643.59,13373.12,13679.43,13353.66
2,04/12/2022,13371.57,13584.69,13685.95,13317.74
3,04/11/2022,13411.96,13547.29,13585.08,13401.39
4,04/08/2022,13711.0,13830.47,13866.06,13693.69


## DowJones Dataset

In [7]:
DowJones = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/TheDowData.csv")

In [8]:
DowJones.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,1896-05-27,29.39,29.39,29.39,29.39,
1,1896-05-28,29.11,29.11,29.11,29.11,
2,1896-05-29,29.43,29.43,29.43,29.43,
3,1896-06-01,29.4,29.4,29.4,29.4,
4,1896-06-02,29.0,29.0,29.0,29.0,


In [9]:
DowJones.tail()

Unnamed: 0,Date,Open,High,Low,Close,Volume
32611,2022-04-12,34412.51,34669.97,34102.81,34220.36,373804662.0
32612,2022-04-13,34166.64,34598.36,34140.64,34564.59,341268894.0
32613,2022-04-14,34628.46,34889.17,34437.5,34451.23,388476298.0
32614,2022-04-18,34411.49,34618.29,34279.08,34411.69,298634820.0
32615,2022-04-19,34394.62,34983.11,34394.62,34911.2,350100624.0


### Dropping 'Volume' Column From DowJones

In [10]:
DowJones.drop(['Volume'], axis=1, inplace=True)

In [11]:
DowJones.head()

Unnamed: 0,Date,Open,High,Low,Close
0,1896-05-27,29.39,29.39,29.39,29.39
1,1896-05-28,29.11,29.11,29.11,29.11
2,1896-05-29,29.43,29.43,29.43,29.43
3,1896-06-01,29.4,29.4,29.4,29.4
4,1896-06-02,29.0,29.0,29.0,29.0


## S&P 500 Dataset

In [12]:
SP500 = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/S&P50010yrData.csv")

In [13]:
SP500.head()

Unnamed: 0,date,value
0,4/20/09,832.39
1,4/21/09,850.08
2,4/22/09,843.55
3,4/23/09,851.92
4,4/24/09,866.23


In [14]:
SP500.tail()

Unnamed: 0,date,value
3271,4/12/22,4397.45
3272,4/13/22,4446.59
3273,4/14/22,4392.59
3274,4/18/22,4391.69
3275,4/19/22,4462.21


### Dropping NaN's from S&P 500

In [15]:
SP500.dropna(inplace=True)

In [16]:
SP500.tail()

Unnamed: 0,date,value
3271,4/12/22,4397.45
3272,4/13/22,4446.59
3273,4/14/22,4392.59
3274,4/18/22,4391.69
3275,4/19/22,4462.21
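
The three index files store their dates in different formats (`04/14/2022` for NASDAQ, `2022-04-14` for DowJones, `4/14/22` for the S&P 500), so they cannot be joined on the raw strings. The sketch below shows one possible way to parse each format and merge the closing values on a common date. It rebuilds three tiny frames from rows shown in the outputs above rather than reading the files, and the lowercase frame names and the `markets` result are hypothetical.

```python
import pandas as pd

# Small stand-ins built from rows printed above (not the full files).
nasdaq = pd.DataFrame({
    "Date": ["04/14/2022", "04/13/2022", "04/12/2022"],
    "Close/Last": [13351.08, 13643.59, 13371.57],
})
dow = pd.DataFrame({
    "Date": ["2022-04-12", "2022-04-13", "2022-04-14"],
    "Close": [34220.36, 34564.59, 34451.23],
})
sp500 = pd.DataFrame({
    "date": ["4/12/22", "4/13/22", "4/14/22"],
    "value": [4397.45, 4446.59, 4392.59],
})

# Parse each source with its own explicit format.
nasdaq["Date"] = pd.to_datetime(nasdaq["Date"], format="%m/%d/%Y")
dow["Date"] = pd.to_datetime(dow["Date"], format="%Y-%m-%d")
sp500["Date"] = pd.to_datetime(sp500["date"], format="%m/%d/%y")

# Keep one closing column per index and join on the parsed date.
markets = (
    nasdaq.rename(columns={"Close/Last": "NASDAQ"})[["Date", "NASDAQ"]]
    .merge(dow.rename(columns={"Close": "DowJones"})[["Date", "DowJones"]], on="Date")
    .merge(sp500.rename(columns={"value": "SP500"})[["Date", "SP500"]], on="Date")
    .sort_values("Date")
    .reset_index(drop=True)
)
print(markets)
```

An inner join is used here so that only dates present in all three sources survive, which also trims the much longer DowJones history to the overlapping window.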


## US GDP Dataset 

In [17]:
GDP = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/HistoricalUSIndicatorGDP.csv")

In [18]:
GDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
0,United States,GDP,1960-12-31T00:00:00,543.3,Yearly,WGDPUS,2017-01-04T14:27:00
1,United States,GDP,1961-12-31T00:00:00,563.3,Yearly,WGDPUS,2017-01-04T14:27:00
2,United States,GDP,1962-12-31T00:00:00,605.1,Yearly,WGDPUS,2017-01-04T14:27:00
3,United States,GDP,1963-12-31T00:00:00,638.6,Yearly,WGDPUS,2017-01-04T14:27:00
4,United States,GDP,1964-12-31T00:00:00,685.8,Yearly,WGDPUS,2017-01-04T14:27:00


In [19]:
GDP.tail()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
56,United States,GDP,2016-12-31T00:00:00,18745.08,Yearly,WGDPUS,2021-07-05T14:17:00
57,United States,GDP,2017-12-31T00:00:00,19542.98,Yearly,WGDPUS,2021-07-05T14:17:00
58,United States,GDP,2018-12-31T00:00:00,20611.86,Yearly,WGDPUS,2021-07-05T14:17:00
59,United States,GDP,2019-12-31T00:00:00,21433.22,Yearly,WGDPUS,2021-07-05T14:17:00
60,United States,GDP,2020-12-31T00:00:00,20936.6,Yearly,WGDPUS,2021-07-02T08:43:00


### Dropping 'HistoricalDataSymbol', 'LastUpdate' Columns from US GDP

In [20]:
GDP.drop(['HistoricalDataSymbol', 'LastUpdate'], axis=1, inplace=True)

In [21]:
GDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency
0,United States,GDP,1960-12-31T00:00:00,543.3,Yearly
1,United States,GDP,1961-12-31T00:00:00,563.3,Yearly
2,United States,GDP,1962-12-31T00:00:00,605.1,Yearly
3,United States,GDP,1963-12-31T00:00:00,638.6,Yearly
4,United States,GDP,1964-12-31T00:00:00,685.8,Yearly


## US GDP Full Year Growth Dataset 

In [22]:
FullYearGDP = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/HistoricalUSFullYearGDPGrowth.csv")

In [23]:
FullYearGDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
0,United States,Full Year GDP Growth,1950-12-31T00:00:00,8.7,Yearly,USAFYGG,2022-01-27T11:36:00
1,United States,Full Year GDP Growth,1951-12-31T00:00:00,8.0,Yearly,USAFYGG,2022-01-27T11:36:00
2,United States,Full Year GDP Growth,1952-12-31T00:00:00,4.1,Yearly,USAFYGG,2022-01-27T11:36:00
3,United States,Full Year GDP Growth,1953-12-31T00:00:00,4.7,Yearly,USAFYGG,2022-01-27T11:36:00
4,United States,Full Year GDP Growth,1954-12-31T00:00:00,-0.6,Yearly,USAFYGG,2022-01-27T11:36:00


In [24]:
FullYearGDP.tail()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
67,United States,Full Year GDP Growth,2017-12-31T00:00:00,2.3,Yearly,USAFYGG,2022-01-27T11:36:00
68,United States,Full Year GDP Growth,2018-12-31T00:00:00,2.9,Yearly,USAFYGG,2022-01-27T11:36:00
69,United States,Full Year GDP Growth,2019-12-31T00:00:00,2.3,Yearly,USAFYGG,2022-01-27T11:36:00
70,United States,Full Year GDP Growth,2020-12-31T00:00:00,-3.4,Yearly,USAFYGG,2022-01-27T11:36:00
71,United States,Full Year GDP Growth,2021-12-31T00:00:00,5.7,Yearly,USAFYGG,2022-01-27T13:39:00


### Dropping 'HistoricalDataSymbol', 'LastUpdate' Columns from FullYearGDP

In [25]:
FullYearGDP.drop(['HistoricalDataSymbol', 'LastUpdate'], axis=1, inplace=True)

In [26]:
FullYearGDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency
0,United States,Full Year GDP Growth,1950-12-31T00:00:00,8.7,Yearly
1,United States,Full Year GDP Growth,1951-12-31T00:00:00,8.0,Yearly
2,United States,Full Year GDP Growth,1952-12-31T00:00:00,4.1,Yearly
3,United States,Full Year GDP Growth,1953-12-31T00:00:00,4.7,Yearly
4,United States,Full Year GDP Growth,1954-12-31T00:00:00,-0.6,Yearly


## Annual GDP Growth Dataset

In [27]:
AnnualGDP = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/HistoricalUSGDPAnnualGrowth.csv")

In [28]:
AnnualGDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
0,United States,GDP Annual Growth Rate,1948-03-31T00:00:00,2.6,Quarterly,GDP CYOY,2018-07-27T13:02:00
1,United States,GDP Annual Growth Rate,1948-06-30T00:00:00,4.6,Quarterly,GDP CYOY,2014-07-30T15:38:00
2,United States,GDP Annual Growth Rate,1948-09-30T00:00:00,5.4,Quarterly,GDP CYOY,2018-07-27T13:02:00
3,United States,GDP Annual Growth Rate,1948-12-31T00:00:00,3.9,Quarterly,GDP CYOY,2018-07-27T13:02:00
4,United States,GDP Annual Growth Rate,1949-03-31T00:00:00,0.9,Quarterly,GDP CYOY,2014-07-30T15:38:00


In [29]:
AnnualGDP.tail()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
291,United States,GDP Annual Growth Rate,2020-12-31T00:00:00,-2.3,Quarterly,GDP CYOY,2021-07-29T13:19:00
292,United States,GDP Annual Growth Rate,2021-03-31T00:00:00,0.5,Quarterly,GDP CYOY,2021-07-29T13:19:00
293,United States,GDP Annual Growth Rate,2021-06-30T00:00:00,12.2,Quarterly,GDP CYOY,2021-07-29T13:19:00
294,United States,GDP Annual Growth Rate,2021-09-30T00:00:00,4.9,Quarterly,GDP CYOY,2021-10-28T13:11:00
295,United States,GDP Annual Growth Rate,2021-12-31T00:00:00,5.5,Quarterly,GDP CYOY,2022-03-30T12:52:00


### Dropping 'HistoricalDataSymbol', 'LastUpdate' Columns from Annual GDP Growth

In [30]:
AnnualGDP.drop(['HistoricalDataSymbol', 'LastUpdate'], axis=1, inplace=True)

In [31]:
AnnualGDP.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency
0,United States,GDP Annual Growth Rate,1948-03-31T00:00:00,2.6,Quarterly
1,United States,GDP Annual Growth Rate,1948-06-30T00:00:00,4.6,Quarterly
2,United States,GDP Annual Growth Rate,1948-09-30T00:00:00,5.4,Quarterly
3,United States,GDP Annual Growth Rate,1948-12-31T00:00:00,3.9,Quarterly
4,United States,GDP Annual Growth Rate,1949-03-31T00:00:00,0.9,Quarterly


## Business Datasets - Closed/Opened  

## Bankrupt Small Businesses Dataset (1) [Trading Economics] 

In [32]:
Bankruptcies1 = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/HistoricalUSBankruptcies.csv")

In [33]:
Bankruptcies1.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
0,United States,Bankruptcies,1980-12-31T00:00:00,43694,Quarterly,UNITEDSTABAN,2015-04-25T14:40:00
1,United States,Bankruptcies,1981-12-31T00:00:00,48125,Quarterly,UNITEDSTABAN,2015-04-25T14:40:00
2,United States,Bankruptcies,1982-12-31T00:00:00,69300,Quarterly,UNITEDSTABAN,2015-04-25T14:40:00
3,United States,Bankruptcies,1983-12-31T00:00:00,62436,Quarterly,UNITEDSTABAN,2015-04-25T14:40:00
4,United States,Bankruptcies,1984-12-31T00:00:00,64004,Quarterly,UNITEDSTABAN,2015-04-25T14:40:00


In [34]:
Bankruptcies1.tail()

Unnamed: 0,Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate
133,United States,Bankruptcies,2020-12-31T00:00:00,21655,Quarterly,UNITEDSTABAN,2021-01-30T10:01:00
134,United States,Bankruptcies,2021-03-31T00:00:00,19911,Quarterly,UNITEDSTABAN,2021-05-08T10:01:00
135,United States,Bankruptcies,2021-06-30T00:00:00,18511,Quarterly,UNITEDSTABAN,2021-08-07T10:01:00
136,United States,Bankruptcies,2021-09-30T00:00:00,16140,Quarterly,UNITEDSTABAN,2021-11-13T10:00:00
137,United States,Bankruptcies,2021-12-31T00:00:00,14347,Quarterly,UNITEDSTABAN,2022-02-05T10:00:00


### Dropping 'HistoricalDataSymbol', 'LastUpdate' Columns from Trading Economics Bankruptcies

In [36]:
Bankruptcies1.drop(['HistoricalDataSymbol', 'LastUpdate'], axis=1, inplace=True)

In [37]:
Bankruptcies1.head()

Unnamed: 0,Country,Category,DateTime,Value,Frequency
0,United States,Bankruptcies,1980-12-31T00:00:00,43694,Quarterly
1,United States,Bankruptcies,1981-12-31T00:00:00,48125,Quarterly
2,United States,Bankruptcies,1982-12-31T00:00:00,69300,Quarterly
3,United States,Bankruptcies,1983-12-31T00:00:00,62436,Quarterly
4,United States,Bankruptcies,1984-12-31T00:00:00,64004,Quarterly
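
The GDP, FullYearGDP, AnnualGDP, and Bankruptcies1 files share the same column layout and receive the same clean-up. As a sketch, that repeated step could be wrapped in a small helper. The function name `load_indicator` is hypothetical, and the example reads an in-memory sample built from the GDP rows shown earlier instead of a real file path.

```python
import io
import pandas as pd

def load_indicator(source, drop_cols=("HistoricalDataSymbol", "LastUpdate")):
    """Read an indicator CSV, parse its DateTime column, and drop metadata columns."""
    frame = pd.read_csv(source, parse_dates=["DateTime"])
    # errors="ignore" keeps the helper usable if a file lacks one of the columns.
    return frame.drop(columns=list(drop_cols), errors="ignore")

# In-memory sample with the same columns as the indicator files above.
sample_csv = io.StringIO(
    "Country,Category,DateTime,Value,Frequency,HistoricalDataSymbol,LastUpdate\n"
    "United States,GDP,1960-12-31T00:00:00,543.3,Yearly,WGDPUS,2017-01-04T14:27:00\n"
    "United States,GDP,1961-12-31T00:00:00,563.3,Yearly,WGDPUS,2017-01-04T14:27:00\n"
)
gdp_sample = load_indicator(sample_csv)
print(gdp_sample)
```

Parsing `DateTime` on load means the year or quarter can later be extracted with `.dt.year` or `.dt.quarter` when these series need to be lined up with the yearly business datasets.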


## Bankrupt Small Businesses Dataset (2) [USA Facts] 

In [38]:
Bankruptcies2 = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/BankruptciesFiledUSA.csv")

In [39]:
Bankruptcies2.head()

Unnamed: 0,Years,Bankruptcies filed
0,2010,1596355
1,2011,1467221
2,2012,1261140
3,2013,1107699
4,2014,963739


In [40]:
Bankruptcies2.tail()

Unnamed: 0,Years,Bankruptcies filed
6,2016,805580
7,2017,790830
8,2018,773375
9,2019,776674
10,2020,612561


## Opened Small Businesses (Less Than a Year Old) Dataset [USA Facts]

In [41]:
NewBusinesses1 = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/BusinessesAYearOldUSA.csv")

In [42]:
NewBusinesses1.head()

Unnamed: 0,Years,Businesses less than a year old
0,2010,560588
1,2011,582569
2,2012,631817
3,2013,629078
4,2014,652780


In [43]:
NewBusinesses1.tail()

Unnamed: 0,Years,Businesses less than a year old
7,2017,733490
8,2018,733825
9,2019,770609
10,2020,804641
11,2021,843320


## Opened Small Businesses (Applications) Dataset [USA Facts] 

In [44]:
NewBusinesses2 = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/BusinessApplicationsUSA.csv")

In [45]:
NewBusinesses2.head()

Unnamed: 0,Years,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Business applications,2502011.0,2647870.0,2659813.0,2556873.0,2401134.0,2463835.0,2537114.0,2542238.0,2582593.0,2689186.0,2786770.0,2945885.0,3176131.0,3476246.0,3478881.0,4406824.0
1,By State,,,,,,,,,,,,,,,,
2,Washington,49676.0,52420.0,53658.0,49884.0,44840.0,46271.0,47911.0,47183.0,48050.0,50128.0,53091.0,57979.0,61931.0,68758.0,70311.0,76056.0


In [46]:
NewBusinesses2.tail()

Unnamed: 0,Years,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Business applications,2502011.0,2647870.0,2659813.0,2556873.0,2401134.0,2463835.0,2537114.0,2542238.0,2582593.0,2689186.0,2786770.0,2945885.0,3176131.0,3476246.0,3478881.0,4406824.0
1,By State,,,,,,,,,,,,,,,,
2,Washington,49676.0,52420.0,53658.0,49884.0,44840.0,46271.0,47911.0,47183.0,48050.0,50128.0,53091.0,57979.0,61931.0,68758.0,70311.0,76056.0


### Transposing New Businesses2 Dataset

In [47]:
NewBusinesses2.T

Unnamed: 0,0,1,2
Years,Business applications,By State,Washington
2005,2502011.0,,49676.0
2006,2647870.0,,52420.0
2007,2659813.0,,53658.0
2008,2556873.0,,49884.0
2009,2401134.0,,44840.0
2010,2463835.0,,46271.0
2011,2537114.0,,47911.0
2012,2542238.0,,47183.0
2013,2582593.0,,48050.0


### Number of NaN's per column 

In [48]:
NewBusinesses2.isnull().sum()

Years    0
2005     1
2006     1
2007     1
2008     1
2009     1
2010     1
2011     1
2012     1
2013     1
2014     1
2015     1
2016     1
2017     1
2018     1
2019     1
2020     1
dtype: int64

### Dropping NaN's from Dataset

In [49]:
NewBusinesses2.dropna(inplace=True)

In [50]:
NewBusinesses2.head()

Unnamed: 0,Years,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Business applications,2502011.0,2647870.0,2659813.0,2556873.0,2401134.0,2463835.0,2537114.0,2542238.0,2582593.0,2689186.0,2786770.0,2945885.0,3176131.0,3476246.0,3478881.0,4406824.0
2,Washington,49676.0,52420.0,53658.0,49884.0,44840.0,46271.0,47911.0,47183.0,48050.0,50128.0,53091.0,57979.0,61931.0,68758.0,70311.0,76056.0
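
Note that `DataFrame.T` returns a new, transposed frame and leaves the original unchanged, so the transposed view shown earlier is not kept unless it is assigned. The sketch below shows one way to store a year-indexed version. It uses a three-year excerpt of the values above, and the names `wide` and `long` are hypothetical.

```python
import pandas as pd

# Excerpt of the wide table after the NaN row has been dropped.
wide = pd.DataFrame({
    "Years": ["Business applications", "Washington"],
    "2005": [2502011.0, 49676.0],
    "2006": [2647870.0, 52420.0],
    "2007": [2659813.0, 53658.0],
})

# Use the label column as the index first, so that after transposing
# the labels become column names and the years become the row index.
long = wide.set_index("Years").T
long.index = long.index.astype(int)
long.index.name = "Year"
long.columns.name = None
print(long)
```

With one row per year, this table has the same shape as the other yearly datasets in the notebook, which makes a later join on the year straightforward.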


## Opened/Closed Small Businesses Netchange [USA Facts]

In [51]:
Netchange = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/NetChangeInBusinessUSA.csv")

In [52]:
Netchange.head()

Unnamed: 0,Years,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Businesses opened (Items),797000,850000,908000,864000,875000,899000,946000,950000,957000,995000,105800,1154000
1,Businesses closed (Items),912000,819000,789000,769000,783000,798000,810000,844000,867000,878000,1005000,964000


In [53]:
Netchange.tail()

Unnamed: 0,Years,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Businesses opened (Items),797000,850000,908000,864000,875000,899000,946000,950000,957000,995000,105800,1154000
1,Businesses closed (Items),912000,819000,789000,769000,783000,798000,810000,844000,867000,878000,1005000,964000


### Transposing Netchange Dataset

In [54]:
Netchange.T

Unnamed: 0,0,1
Years,Businesses opened (Items),Businesses closed (Items)
2010,797000,912000
2011,850000,819000
2012,908000,789000
2013,864000,769000
2014,875000,783000
2015,899000,798000
2016,946000,810000
2017,950000,844000
2018,957000,867000


## Unemployment/Employment

## Unemployment Rate Dataset [Kff.org] 

In [55]:
YTDUnemployment = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/YTD Unemployment Rate(Feb22).csv")

In [56]:
YTDUnemployment.head()

Unnamed: 0,Title: Unemployment Rate (Seasonally Adjusted) | KFF,Unnamed: 1
0,Timeframe: February 2022,
1,Location,Unemployed
2,Washington,170771


### Dropping NaN's from YTD Unemployment Data

In [57]:
YTDUnemployment.dropna(inplace=True)

In [58]:
YTDUnemployment.head()

Unnamed: 0,Title: Unemployment Rate (Seasonally Adjusted) | KFF,Unnamed: 1
1,Location,Unemployed
2,Washington,170771
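
After `dropna`, the real header row (`Location`, `Unemployed`) is still stored as data, and the column names are the file's title line. One alternative, sketched below, is to skip the two preamble lines when reading. The raw text here is an assumption reconstructed from the output above (including the trailing commas that would produce the `Unnamed: 1` column); the actual file may differ.

```python
import io
import pandas as pd

# Assumed raw layout of the KFF export, inferred from the parsed output above.
raw_text = (
    '"Title: Unemployment Rate (Seasonally Adjusted) | KFF",\n'
    "Timeframe: February 2022,\n"
    "Location,Unemployed\n"
    "Washington,170771\n"
)

# Skip the title and timeframe lines so the third line becomes the header.
unemployment = pd.read_csv(io.StringIO(raw_text), skiprows=2)
print(unemployment)
```

Read this way, `Unemployed` is parsed as a numeric column rather than as text mixed with header labels.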


## Layoffs from Businesses Downsizing Dataset

In [59]:
Layoffs = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/BusinessesLosingJobsUSA.csv")

In [60]:
Layoffs.head()

Unnamed: 0,Years,Businesses losing jobs
0,2010,2824000
1,2011,2464000
2,2012,2397000
3,2013,2427000
4,2014,2451000


In [61]:
Layoffs.tail()

Unnamed: 0,Years,Businesses losing jobs
6,2016,2534000
7,2017,2661000
8,2018,2701000
9,2019,2754000
10,2020,2950000


## Businesses Gaining Jobs Dataset [Employers Hiring/Employment Rate]

In [62]:
Employment = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/BusinessesGainingJobsUSA.csv")

In [63]:
Employment.head()

Unnamed: 0,Years,Businesses gaining jobs
0,2010,2309000
1,2011,2588000
2,2012,2727000
3,2013,2691000
4,2014,2769000


In [64]:
Employment.tail()

Unnamed: 0,Years,Businesses gaining jobs
6,2016,2915000
7,2017,2896000
8,2018,2910000
9,2019,2970000
10,2020,2943000


## COVID Cases/Deaths 

## COVID Cases & Deaths CDC Dataset

In [68]:
covid = pd.read_csv("/Users/darringtonhenderson/Documents/GitHub Final/WozUFinalProject/FinalProject/Data/United States COVID19 Cases and Deaths by State over Time .csv")

In [69]:
covid.head()

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths
0,10/18/2021,NC,1449935,1228710.0,221225.0,1365,171,18058,15877.0,2181.0,46,4,10/18/2021 12:00:00 AM,Agree,Agree
1,12/19/2021,WA,804129,,,1719,136,9674,,,0,0,12/19/2021 12:00:00 AM,,
2,05/12/2022,CT,777064,696528.0,80536.0,1963,173,10883,8906.0,1977.0,0,0,05/13/2022 01:28:57 PM,Agree,Agree
3,10/04/2020,MD,127290,,,471,0,4092,3933.0,159.0,3,0,10/06/2020 12:00:00 AM,,Agree
4,03/11/2021,MD,390490,,,924,0,8549,8345.0,204.0,19,0,03/13/2021 12:00:00 AM,,Agree


In [70]:
covid.tail()

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths
51835,12/11/2020,AZ,394804,380243.0,14561.0,6986,459,7245,6689,556,91,12,12/12/2020 02:59:37 PM,Agree,Agree
51836,05/28/2022,MN,1504982,,,0,0,12938,12628,310,0,0,05/30/2022 12:41:08 PM,,Agree
51837,02/10/2021,FSM,1,1.0,0.0,0,0,0,0,0,0,0,02/11/2021 02:50:55 PM,Agree,Agree
51838,01/12/2022,WY,123743,97745.0,25998.0,989,246,1588,1588,0,0,0,01/13/2022 02:34:51 PM,Agree,Agree
51839,11/22/2021,AZ,1245127,1127692.0,117435.0,3249,403,21942,19414,2528,2,0,11/23/2021 02:18:53 PM,Agree,Agree


### Dropping Unneeded Columns from the COVID CDC Dataset

In [71]:
covid.drop(['conf_cases', 'prob_cases', 'new_case', 'pnew_case', 'conf_death', 'prob_death', 'new_death', 'pnew_death', 'created_at', 'consent_cases', 'consent_deaths'], axis=1, inplace=True)

In [72]:
covid.head()

Unnamed: 0,submission_date,state,tot_cases,tot_death
0,10/18/2021,NC,1449935,18058
1,12/19/2021,WA,804129,9674
2,05/12/2022,CT,777064,10883
3,10/04/2020,MD,127290,4092
4,03/11/2021,MD,390490,8549


In [73]:
covid.tail()

Unnamed: 0,submission_date,state,tot_cases,tot_death
51835,12/11/2020,AZ,394804,7245
51836,05/28/2022,MN,1504982,12938
51837,02/10/2021,FSM,1,0
51838,01/12/2022,WY,123743,1588
51839,11/22/2021,AZ,1245127,21942


### Dropping Rows of Unneeded States from the State Column

In [74]:
# State and territory codes to remove, listed in the order they were originally dropped
states_to_drop = [
    'KS', 'UT', 'AR', 'MP', 'PW', 'HI', 'AK', 'OK', 'AS', 'NE',
    'AL', 'NC', 'VI', 'NV', 'VT', 'CT', 'DE', 'IN', 'ME', 'CA',
    'NH', 'MS', 'MD', 'MI', 'IL', 'ID', 'WI', 'GU', 'MT', 'ND',
    'FSM', 'DC', 'VA', 'OR', 'KY', 'LA', 'NJ', 'RI', 'AZ', 'MN',
    'SC', 'WY', 'MA', 'PA', 'WV', 'FL', 'NM', 'MO', 'GA',
]
# Drop every row whose state code appears in the list
covid.drop(covid.index[covid['state'].isin(states_to_drop)], inplace=True)
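
The CDC rows are not in date order, and `tot_cases` and `tot_death` are running totals, while most of the business datasets are yearly. As a sketch under those assumptions, the last reported total per state and year could be extracted as follows. The sample reuses rows printed in the outputs above purely for illustration (some of those states are among the ones dropped), and the names `covid_sample` and `yearly` are hypothetical.

```python
import pandas as pd

# Rows copied from the head()/tail() outputs above, for illustration only.
covid_sample = pd.DataFrame({
    "submission_date": ["12/19/2021", "10/04/2020", "03/11/2021", "12/11/2020", "11/22/2021"],
    "state": ["WA", "MD", "MD", "AZ", "AZ"],
    "tot_cases": [804129, 127290, 390490, 394804, 1245127],
    "tot_death": [9674, 4092, 8549, 7245, 21942],
})

# Parse the date strings and derive a year column to match the yearly datasets.
covid_sample["submission_date"] = pd.to_datetime(
    covid_sample["submission_date"], format="%m/%d/%Y"
)
covid_sample["Years"] = covid_sample["submission_date"].dt.year

# Sort chronologically, then keep the latest cumulative totals per state and year.
yearly = (
    covid_sample.sort_values("submission_date")
    .groupby(["state", "Years"], as_index=False)[["tot_cases", "tot_death"]]
    .last()
)
print(yearly)
```

Taking the last value rather than a sum avoids double counting, since each row already holds the cumulative total up to that date.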

# Canonical Correlation Analysis 

In [None]:
X = df[['covid','']]
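
The cell above refers to a combined frame `df` that the notebook has not built yet. One possible way to assemble such a frame from the yearly datasets is an inner merge on `Years`, sketched below with the 2010 to 2014 values shown earlier. The lowercase frame names and the split of columns into `X` and `Y` are assumptions for illustration, not the project's final variable sets.

```python
from functools import reduce

import pandas as pd

years = [2010, 2011, 2012, 2013, 2014]

# Stand-ins for the yearly frames loaded earlier, using the head() values.
bankruptcies = pd.DataFrame({
    "Years": years,
    "Bankruptcies filed": [1596355, 1467221, 1261140, 1107699, 963739],
})
new_businesses = pd.DataFrame({
    "Years": years,
    "Businesses less than a year old": [560588, 582569, 631817, 629078, 652780],
})
layoffs = pd.DataFrame({
    "Years": years,
    "Businesses losing jobs": [2824000, 2464000, 2397000, 2427000, 2451000],
})
employment = pd.DataFrame({
    "Years": years,
    "Businesses gaining jobs": [2309000, 2588000, 2727000, 2691000, 2769000],
})

# Merge all frames on the shared year column, keeping only years present in every frame.
frames = [bankruptcies, new_businesses, layoffs, employment]
df = reduce(lambda left, right: left.merge(right, on="Years", how="inner"), frames)

# Hypothetical split into the two variable sets that CCA expects.
X = df[["Bankruptcies filed", "Businesses losing jobs"]]
Y = df[["Businesses less than a year old", "Businesses gaining jobs"]]
print(df)
```

Both sets can then be passed to `CCA(...).fit(X, Y)`. With only about a dozen yearly observations in these datasets, any canonical correlation obtained this way should be read cautiously, because small samples tend to produce inflated correlations.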