# Census Bureau Data

After reviewing the employment data from the State of Colorado, we determined that they did not provide enough local specificity to allow us to make meaningful correlations with the snow data. As a result, we obtained the Quarterly Workforce Indicators (QWI) employment data used below. These data come from the Census Bureau and are reported at the county level, which should allow more accurate correlations with the snow data. The snow data cover a limited set of counties west of the Eisenhower Tunnel and in common ski resort locations (Aspen, Vail, etc.).

There is at least one major drawback to these data: they are reported only quarterly, which limits our ability to correlate them with the snow data on the same time scale. Nonetheless, because the state-level data showed little to no correlation with the snow data, we feel it is worth checking whether more localized employment data tell a different story.
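Because QWI values are quarterly, one way to put monthly snow observations on the same time scale is to map each month to its calendar quarter and sum within each quarter. The sketch below is a minimal illustration, assuming monthly records arrive as `(year, month, snowfall)` tuples; `month_to_quarter` and `to_quarterly_totals` are hypothetical helpers, not part of the original analysis, and the sample values are made up.

```python
from collections import defaultdict


def month_to_quarter(month: int) -> int:
    """Map a calendar month (1-12) to its calendar quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return (month - 1) // 3 + 1


def to_quarterly_totals(records):
    """Sum monthly snowfall into (year, quarter) buckets.

    `records` is an iterable of (year, month, snowfall) tuples; this
    record layout is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for year, month, snowfall in records:
        totals[(year, month_to_quarter(month))] += snowfall
    return dict(totals)


# Hypothetical monthly values, for illustration only
monthly = [(1994, 1, 20.0), (1994, 2, 15.5), (1994, 3, 30.0), (1994, 11, 12.0)]
print(to_quarterly_totals(monthly))  # {(1994, 1): 65.5, (1994, 4): 12.0}
```

Months 10 to 12 land in quarter 4 and months 1 to 3 in quarter 1, matching the winter quarters used later for the QWI filter.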

# First We Will Create Our Dataframe

In [31]:
! pip install pandas



In [2]:
import pandas as pd

pd.set_option('display.max_columns', None)

QWI_data = pd.read_csv('/dsa/groups/capstonesp2022/online/group_2/QWI_Employment_Data.csv')

#Note: This dataset is very large and contains some information which will not be useful for our purposes

# Drop identifier, demographic, and unused measure columns
QWI_data = QWI_data.drop(['periodicity', 'periodicity_label.value', 'seasonadj_label.value',
                          'geo_level', 'geography', 'race_label.value', 'ethnicity_label.value',
                          'education_label.value', 'firmage', 'firmage_label.value', 'firmsize',
                          'ind_level', 'ownercode', 'ownercode_label.value', 'sex', 'sex_label.value',
                          'agegrp_label.value', 'agegrp', 'race', 'ethnicity', 'education',
                          'HirN', 'EarnS', 'Payroll'], axis=1)
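Since the file is very large, an alternative to reading every column and then dropping most of them is to pass `usecols` to `pd.read_csv`, so pandas parses only the columns that are needed. This is a sketch under the assumption that the columns in `KEEP` are the ones used later in this notebook (the `s`-prefixed status-flag columns would need to be added if they are wanted); the in-memory CSV stands in for the real file, with values taken from the first row of the preview output.

```python
import io

import pandas as pd

# Columns assumed to be needed later in this notebook; extend as required
KEEP = ['geography_label.value', 'industry', 'year', 'quarter',
        'Emp', 'EmpEnd', 'EmpS', 'HirA', 'Sep', 'TurnOvrS']

# Small in-memory stand-in for the real CSV; 'seasonadj' is skipped by usecols
sample_csv = io.StringIO(
    "seasonadj,geography_label.value,industry,year,quarter,"
    "Emp,EmpEnd,EmpS,HirA,Sep,TurnOvrS\n"
    "U,Colorado,1111,1994,1,390.0,407.0,316.0,128.0,110.0,0.07\n"
)

df = pd.read_csv(sample_csv, usecols=KEEP)
print(list(df.columns))
```

With the real file, the `io.StringIO` object would be replaced by the CSV path used above.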




In [3]:
QWI_data.head(10)

,seasonadj,geo_level_label.value,geography_label.value,industry,industry_label.value,firmsize_label.value,year,quarter,Emp,EmpEnd,EmpS,HirA,Sep,TurnOvrS,sEmp,sEmpEnd,sEmpS,sHirA,sHirN,sSep,sTurnOvrS,sEarnS,sPayroll
0,U,States,Colorado,1111,Oilseed and Grain Farming,All Firm Sizes,1994,1,390.0,407.0,316.0,128.0,110.0,0.07,1,1,1,1,-1,1,6,5,5
1,U,States,Colorado,1112,Vegetable and Melon Farming,All Firm Sizes,1994,1,1309.0,1162.0,856.0,689.0,836.0,0.131,1,1,1,1,-1,1,6,5,5
2,U,States,Colorado,1113,Fruit and Tree Nut Farming,All Firm Sizes,1994,1,141.0,156.0,105.0,105.0,90.0,0.133,1,1,1,1,-1,1,6,5,5
3,U,States,Colorado,1114,"Greenhouse, Nursery, and Floriculture Production",All Firm Sizes,1994,1,2085.0,2576.0,1805.0,1080.0,588.0,0.154,1,1,1,1,-1,1,6,5,5
4,U,States,Colorado,1119,Other Crop Farming,All Firm Sizes,1994,1,540.0,629.0,437.0,289.0,200.0,0.125,1,1,1,1,-1,1,6,5,5
5,U,States,Colorado,1121,Cattle Ranching and Farming,All Firm Sizes,1994,1,2798.0,2924.0,2392.0,717.0,591.0,0.118,1,1,1,1,-1,1,6,5,5
6,U,States,Colorado,1122,Hog and Pig Farming,All Firm Sizes,1994,1,461.0,468.0,396.0,126.0,120.0,0.14,1,1,1,1,-1,1,6,5,5
7,U,States,Colorado,1123,Poultry and Egg Production,All Firm Sizes,1994,1,543.0,529.0,438.0,178.0,192.0,0.158,1,1,1,1,-1,1,6,5,5
8,U,States,Colorado,1124,Sheep and Goat Farming,All Firm Sizes,1994,1,173.0,180.0,142.0,55.0,48.0,0.141,9,1,9,1,-1,1,6,5,5
9,U,States,Colorado,1125,Aquaculture,All Firm Sizes,1994,1,22.0,29.0,17.0,14.0,7.0,,9,9,9,9,-1,9,5,5,5


Our core interest in these employment data is how industries directly affected by snow, or the lack of it, respond. We are also concerned only with the winter months and with observations from particular areas. In practice, this means we limit the industry to traveler accommodation (NAICS code 7211, part of the hospitality sector), look only at the winter quarters (1 and 4), and use a single county that contains one of our snow observation stations (Pitkin County, which contains Aspen). Finally, we group these observations by year, since that is the timescale that best matches the snow data.

In [4]:
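# Traveler accommodation (NAICS 7211), winter quarters 1 and 4, Pitkin County only
Quarterly_data = QWI_data.loc[(QWI_data['industry'] == 7211) & (QWI_data['quarter'].isin([1, 4])) & (QWI_data['geography_label.value'] == 'Pitkin, CO')]
# numeric_only=True skips the text label columns (pandas >= 2.0 raises on them otherwise)
Quarterly_data = Quarterly_data.groupby(['year']).mean(numeric_only=True)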
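Grouping by calendar year averages Q1 and Q4 of the same year, but those quarters belong to different winters: Q1 of 1994 falls in the 1993-94 season, while Q4 of 1994 opens the 1994-95 season. If pairing by ski season is preferred, one option is to assign Q4 to the following year before grouping. A minimal sketch, assuming a season is labeled by the year in which it ends; the `season` column and `add_season` helper are hypothetical, and the toy values are not QWI data.

```python
import pandas as pd


def add_season(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row with the winter season it belongs to.

    Assumption: a season is named after the calendar year in which it ends,
    so Q4 of year N is counted together with Q1 of year N + 1.
    """
    out = df.copy()
    out['season'] = out['year'] + (out['quarter'] == 4).astype(int)
    return out


# Toy data for illustration only (not QWI values)
toy = pd.DataFrame({'year': [1994, 1994, 1995, 1995],
                    'quarter': [1, 4, 1, 4],
                    'Emp': [100.0, 200.0, 300.0, 400.0]})
seasonal = add_season(toy).groupby('season')['Emp'].mean()
print(seasonal)
```

Edge seasons (here 1994 and 1996) contain only one quarter and might be dropped. The snow query groups calendar months the same way, so the same shift (October to December counted with the following year) would need to be applied there for the two datasets to stay aligned.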

In [5]:
Quarterly_data.head(30)

year,industry,quarter,Emp,EmpEnd,EmpS,HirA,Sep,TurnOvrS,sEmp,sEmpEnd,sEmpS,sHirA,sHirN,sSep,sTurnOvrS,sEarnS,sPayroll
1994,7211.0,2.5,2003.0,2368.5,1618.5,986.0,620.5,0.2595,1.0,1.0,1.0,1.0,0.0,1.0,6.0,5.0,5.0
1995,7211.0,2.5,2132.0,2520.5,1762.5,956.5,568.5,0.254,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
1996,7211.0,2.5,2238.0,2970.0,1864.5,1355.0,622.5,0.2445,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
1997,7211.0,2.5,2897.5,3251.5,2463.0,1063.5,709.5,0.2465,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
1998,7211.0,2.5,2818.0,3137.5,2329.0,1110.0,790.5,0.2125,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
1999,7211.0,2.5,2535.5,2985.5,2032.5,1209.5,760.0,0.2805,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
2000,7211.0,2.5,2173.5,2543.0,1643.5,1218.5,849.5,0.3085,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
2001,7211.0,2.5,1913.5,2112.5,1464.5,873.5,674.0,0.2705,1.0,1.0,1.0,5.0,1.0,1.0,6.0,5.0,5.0
2002,7211.0,2.5,1937.5,2425.0,1579.5,1106.0,618.5,0.2325,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0
2003,7211.0,2.5,2018.5,2422.5,1729.5,839.0,435.0,0.1975,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0


# Now We Import the Snow Data

In [6]:
import getpass
from sqlalchemy.engine.url import URL
from sqlalchemy import create_engine
%reload_ext sql

mypasswd = getpass.getpass()
username = 'nnfd2' # Replace with your pawprint
host = 'pgsql.dsa.lan'
database = 'caponl_22g2'

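# SQLAlchemy 1.4+ accepts only the 'postgresql' dialect name, not 'postgres'
postgres_db = {'drivername': 'postgresql',
               'username': username,
               'password': mypasswd,
               'host': host,
               'database': database}
# URL.create() is the supported constructor in SQLAlchemy 1.4+
engine = create_engine(URL.create(**postgres_db), echo=False)


connection_string = f'postgresql://{username}:{mypasswd}@{host}/{database}'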
%sql $connection_string
del mypasswd



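We apply the same restrictions and grouping to the snow data (the years 1994 through 2019, the winter months January to March and October to December, and the Aspen station) so that both datasets describe the same place and period.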

In [7]:
%%sql

select year, station_name,
       SUM(total_snowfall) as total_snowfall,
       AVG(max_daily_snowfall) as avg_max_daily_snowfall,
       AVG(max_daily_snow_depth) as avg_max_daily_snow_depth
from gsom_data
where year between 1994 and 2019
  and month IN ('1','2','3','10','11','12')
  and station_name LIKE 'ASPEN%'
group by year, station_name
order by year

 * postgres://nnfd2:***@pgsql.dsa.lan/caponl_22g2
26 rows affected.


year,station_name,total_snowfall,avg_max_daily_snowfall,avg_max_daily_snow_depth
1994,"ASPEN 1 SW, CO US",130.4,6.36,28.0
1995,"ASPEN 1 SW, CO US",131.1,8.54,36.5
1996,"ASPEN 1 SW, CO US",186.6,10.0,30.75
1997,"ASPEN 1 SW, CO US",144.9,7.06666666666667,36.25
1998,"ASPEN 1 SW, CO US",138.8,8.48,28.8
1999,"ASPEN 1 SW, CO US",99.7,4.65,21.2
2000,"ASPEN 1 SW, CO US",132.6,6.43333333333333,16.3333333333333
2001,"ASPEN 1 SW, CO US",96.9,5.38333333333333,17.1666666666667
2002,"ASPEN 1 SW, CO US",127.1,6.33333333333333,21.8333333333333
2003,"ASPEN 1 SW, CO US",122.4,5.41666666666667,18.3333333333333


In [8]:
snow_data_df = %sql select year, station_name, SUM(total_snowfall) as total_snowfall, AVG(max_daily_snowfall) as avg_max_daily_snowfall, AVG(max_daily_snow_depth) as avg_max_daily_snow_depth from gsom_data where year >= 1994 and year <= 2019 and month IN ('1','2','3','10','11','12') and station_name LIKE 'ASPEN%' group by year, station_name order by year

snow_data_df = snow_data_df.DataFrame()

 * postgres://nnfd2:***@pgsql.dsa.lan/caponl_22g2
26 rows affected.
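The winter snow query is written out twice, once in the `%%sql` cell and again in the `%sql` line magic, and the SQLAlchemy `engine` created earlier is never used. One alternative is to keep the SQL in a single Python string and load it with `pd.read_sql`, which accepts a SQLAlchemy engine or a `sqlite3` connection. So the sketch can run standalone, an in-memory SQLite table with a few made-up rows stands in for the PostgreSQL `gsom_data` table; against the class database, `engine` would be passed instead of `con`. The `LIKE 'ASPEN%'` filter is an assumption that Aspen is the intended station.

```python
import sqlite3

import pandas as pd

# Single definition of the winter snow query used in this notebook
SNOW_QUERY = """
SELECT year, station_name,
       SUM(total_snowfall)       AS total_snowfall,
       AVG(max_daily_snowfall)   AS avg_max_daily_snowfall,
       AVG(max_daily_snow_depth) AS avg_max_daily_snow_depth
FROM gsom_data
WHERE year BETWEEN 1994 AND 2019
  AND month IN ('1', '2', '3', '10', '11', '12')
  AND station_name LIKE 'ASPEN%'
GROUP BY year, station_name
ORDER BY year
"""

# In-memory SQLite stand-in for the PostgreSQL gsom_data table (made-up rows)
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE gsom_data (year INTEGER, month TEXT, station_name TEXT, "
            "total_snowfall REAL, max_daily_snowfall REAL, max_daily_snow_depth REAL)")
con.executemany("INSERT INTO gsom_data VALUES (?, ?, ?, ?, ?, ?)", [
    (1994, '1', 'ASPEN 1 SW, CO US', 10.0, 4.0, 20.0),
    (1994, '12', 'ASPEN 1 SW, CO US', 5.0, 2.0, 10.0),
    (1994, '7', 'ASPEN 1 SW, CO US', 1.0, 1.0, 0.0),  # July: excluded by the month filter
])

snow_df = pd.read_sql(SNOW_QUERY, con)
con.close()
print(snow_df)
```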


Next, we merge the two dataframes on year.

In [18]:
# Quarterly_data is indexed by 'year' after the groupby; pandas lets `on` name an index level
emp_snow_df=pd.merge(Quarterly_data, snow_data_df, on='year')
#emp_snow_df['year'] = pd.Categorical(emp_snow_df.year)

emp_snow_df.head()

,year,industry,quarter,Emp,EmpEnd,EmpS,HirA,Sep,TurnOvrS,sEmp,sEmpEnd,sEmpS,sHirA,sHirN,sSep,sTurnOvrS,sEarnS,sPayroll,station_name,total_snowfall,avg_max_daily_snowfall,avg_max_daily_snow_depth
0,1994,7211.0,2.5,2003.0,2368.5,1618.5,986.0,620.5,0.2595,1.0,1.0,1.0,1.0,0.0,1.0,6.0,5.0,5.0,"ASPEN 1 SW, CO US",130.4,6.36,28.0
1,1995,7211.0,2.5,2132.0,2520.5,1762.5,956.5,568.5,0.254,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0,"ASPEN 1 SW, CO US",131.1,8.54,36.5
2,1996,7211.0,2.5,2238.0,2970.0,1864.5,1355.0,622.5,0.2445,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0,"ASPEN 1 SW, CO US",186.6,10.0,30.75
3,1997,7211.0,2.5,2897.5,3251.5,2463.0,1063.5,709.5,0.2465,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0,"ASPEN 1 SW, CO US",144.9,7.066667,36.25
4,1998,7211.0,2.5,2818.0,3137.5,2329.0,1110.0,790.5,0.2125,1.0,1.0,1.0,1.0,1.0,1.0,6.0,5.0,5.0,"ASPEN 1 SW, CO US",138.8,8.48,28.8
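Because the merge is an inner join on `year` alone, years missing from either dataset are dropped silently. One way to check this, sketched below with small toy frames standing in for `Quarterly_data` and `snow_data_df` (values taken from the outputs above, with some years deliberately left out), is an outer merge using pandas' `validate` and `indicator` options.

```python
import pandas as pd

# Toy stand-ins: employment indexed by year (as after the groupby), snow with a year column
emp = pd.DataFrame({'Emp': [2003.0, 2132.0, 2238.0]},
                   index=pd.Index([1994, 1995, 1996], name='year'))
snow = pd.DataFrame({'year': [1994, 1995, 1997],
                     'total_snowfall': [130.4, 131.1, 144.9]})

# validate='one_to_one' raises MergeError if 'year' repeats on either side;
# indicator=True adds a '_merge' column recording where each row came from
check = pd.merge(emp.reset_index(), snow, on='year', how='outer',
                 validate='one_to_one', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['year', '_merge']]
print(unmatched)
```

On the real data, an empty `unmatched` frame would confirm that every year has both employment and snow values.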


Finally, we review the potential correlations between total snowfall and several employment measures to determine whether these data might be useful for our analysis, starting with beginning-of-quarter employment (Emp).

In [28]:
import plotly.express as px
import numpy as np
import matplotlib.pyplot as plt


fig2 = px.scatter(emp_snow_df, x="Emp", y="total_snowfall", color='year', title="Total Snowfall by Employment")
fig2.show()

Although there may be a mild trend, the plot does not show a clear relationship between employment and total snowfall.

In [24]:
fig2 = px.scatter(emp_snow_df, x="EmpS", y="total_snowfall", color="year", title="Total Snowfall by Full-Quarter (Stable) Employment")
fig2.show()

In [26]:
fig2 = px.scatter(emp_snow_df, x="Sep", y="total_snowfall", color="year", title="Total Snowfall by Separations")
fig2.show()

Again, although these data appear to show a slightly more defined relationship, the correlation does not appear to be strong.

In [27]:
fig2 = px.scatter(emp_snow_df, x="TurnOvrS", y="avg_max_daily_snowfall", color="year", title="Average Maximum Daily Snowfall by Turnover")
fig2.show()

Once again, no clear correlation is present.
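The plots can be supplemented with correlation coefficients. A minimal sketch, assuming `emp_snow_df` has the column names shown in the merged preview: `DataFrame.corr` computes pairwise Pearson coefficients by default, and `method='spearman'` gives rank correlations, which are less sensitive to outliers. The usage below runs on only the five preview rows (1994 to 1998) to show the mechanics, so its coefficients should not be read as findings. With 26 winters in the full data, judging whether a coefficient is significant would also need a test such as `scipy.stats.pearsonr`, which returns a p-value.

```python
import pandas as pd


def snow_employment_corr(df: pd.DataFrame, method: str = 'pearson') -> pd.DataFrame:
    """Correlate each employment measure with each snow measure.

    Column names follow the merged emp_snow_df preview; adjust if they differ.
    """
    emp_cols = ['Emp', 'EmpS', 'Sep', 'TurnOvrS']
    snow_cols = ['total_snowfall', 'avg_max_daily_snowfall', 'avg_max_daily_snow_depth']
    # corr() returns the full square matrix; keep only the employment-vs-snow block
    return df[emp_cols + snow_cols].corr(method=method).loc[emp_cols, snow_cols]


# The five merged rows shown in the preview (1994-1998), for mechanics only
preview = pd.DataFrame({
    'Emp': [2003.0, 2132.0, 2238.0, 2897.5, 2818.0],
    'EmpS': [1618.5, 1762.5, 1864.5, 2463.0, 2329.0],
    'Sep': [620.5, 568.5, 622.5, 709.5, 790.5],
    'TurnOvrS': [0.2595, 0.254, 0.2445, 0.2465, 0.2125],
    'total_snowfall': [130.4, 131.1, 186.6, 144.9, 138.8],
    'avg_max_daily_snowfall': [6.36, 8.54, 10.0, 7.066667, 8.48],
    'avg_max_daily_snow_depth': [28.0, 36.5, 30.75, 36.25, 28.8],
})
print(snow_employment_corr(preview).round(2))
```

On the full data this would be called as `snow_employment_corr(emp_snow_df)`.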

After reviewing these data, it appears that, much like the State of Colorado data, there is little correlation between employment and snowfall. Given this, we will not use employment as an avenue of analysis and will instead focus on our traffic data. We will continue to look for other avenues of analysis related to human movement that might provide more insight.