# SANDAG Sector Estimates Analysis
This notebook is a calculator that pulls EDD employment data and aggregates it to specified SANDAG sectors. It also takes in estimates data and aggregates it to the same SANDAG sectors, then compares the two.

See Purva's Excel for more information: https://sandag.sharepoint.com/:x:/r/qaqc/_layouts/15/Doc.aspx?sourcedoc=%7BCFD80313-6A49-43E6-B776-50C85CD17DD4%7D&file=EDD_Forecast%20Output%20Industry%20Level%20Jobs%20Comparison_QA.xlsx&action=default&mobileredirect=true&cid=9c4c556c-a447-4a46-8e56-ab5f1c95311a
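
The aggregate-then-compare workflow described above can be sketched on toy data. This is only an illustration: the crosswalk, the sector name "Leisure and Hospitality", and all job counts below are hypothetical and are not SANDAG's actual sector definitions or figures.

```python
import pandas as pd

# Hypothetical EDD employment by industry (toy numbers, not real data)
edd = pd.Series(
    {
        "Accommodation": 100,
        "Food Services and Drinking Places": 300,
        "Construction": 200,
    },
    name="edd_jobs",
)

# Hypothetical crosswalk from EDD industry titles to sector names
crosswalk = {
    "Accommodation": "Leisure and Hospitality",
    "Food Services and Drinking Places": "Leisure and Hospitality",
    "Construction": "Construction",
}

# Relabel each industry with its sector, then sum industries that share a sector
edd_by_sector = edd.rename(index=crosswalk).groupby(level=0).sum()

# Hypothetical estimates, assumed to be aggregated to the same sectors already
estimates = pd.Series(
    {"Construction": 210, "Leisure and Hospitality": 380},
    name="estimate_jobs",
)

# Line the two sources up side by side and compute the differences
comparison = pd.concat([edd_by_sector, estimates], axis=1)
comparison["diff"] = comparison["estimate_jobs"] - comparison["edd_jobs"]
comparison["pct_diff"] = comparison["diff"] / comparison["edd_jobs"] * 100
print(comparison)
```

Aligning on the sector index means a sector present in only one source shows up as `NaN` in the other column instead of being silently dropped.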

In [1]:
import pandas as pd
import urllib.request  # For downloading the xlsx file
from sodapy import Socrata
import ssl
import sqlalchemy

# Grabbing EDD Data

In [2]:
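# Sector-level data: pull every row for the San Diego-Carlsbad MSA from the EDD open data portal
client = Socrata("data.edd.ca.gov", None)  # None = no app token (unauthenticated requests)
results = client.get_all("r4zm-kdcg", area_name='San Diego-Carlsbad MSA')  # pages through all matching rows
results_df = pd.DataFrame.from_records(results)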



In [3]:
# Clean the sector data: keep rows that are not seasonally adjusted and only the columns needed
edd_data = results_df[results_df['seasonally_adjusted']=='N'][['year', 'month', 'industry_title', 'current_employment']].copy()  # copy to avoid SettingWithCopyWarning
# Build a date for the first day of each month (year-month-day)
edd_data['date'] = pd.to_datetime(edd_data['year'].astype(str) + '-' + edd_data['month'].astype(str) + '-1')
# Reshape to one row per industry and one column per month
edd_data = edd_data.pivot(index='industry_title', columns='date', values='current_employment')
edd_data = edd_data.apply(pd.to_numeric)  # convert employment counts to numeric
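
The long-to-wide reshape in the cell above can be checked on a small frame before running it on the full download. The sketch below uses made-up rows with the same column names; the string-typed values and zero-padded months are assumptions about the input, not a description of the actual API response.

```python
import pandas as pd

# Toy long-format rows (hypothetical values), one row per industry per month
long_df = pd.DataFrame(
    {
        "year": ["2022", "2022", "2022", "2022"],
        "month": ["05", "06", "05", "06"],
        "industry_title": ["Utilities", "Utilities", "Construction", "Construction"],
        "current_employment": ["5000", "5100", "85000", "85500"],
    }
)

# First day of each month as a datetime, built from the year and month strings
long_df["date"] = pd.to_datetime(long_df["year"] + "-" + long_df["month"] + "-01")

# One row per industry, one column per month, numeric employment values
wide = long_df.pivot(
    index="industry_title", columns="date", values="current_employment"
).apply(pd.to_numeric)
print(wide)
```

Note that `pivot` raises a `ValueError` if any (industry, date) pair appears more than once, which is a useful guard against duplicated rows in the source data.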

In [4]:
edd_data

industry_title,1990-01-01,1990-02-01,1990-03-01,1990-04-01,1990-05-01,1990-06-01,1990-07-01,1990-08-01,1990-09-01,1990-10-01,...,2021-09-01,2021-10-01,2021-11-01,2021-12-01,2022-01-01,2022-02-01,2022-03-01,2022-04-01,2022-05-01,2022-06-01
Accommodation,23800.0,24100.0,24300.0,23200.0,23300.0,23500.0,24600.0,24800.0,24500.0,24200.0,...,23400,23600,23500,23700,23000,23600,24200,25000,25600,26300
Accommodation and Food Service,86200.0,87500.0,87900.0,87000.0,87500.0,88400.0,92400.0,92700.0,91800.0,89700.0,...,151300,155500,156400,156100,153500,157000,162000,163600,164100,167500
Administrative and Support Services,42800.0,42900.0,44000.0,44600.0,45000.0,45100.0,45000.0,46200.0,46000.0,45100.0,...,86900,90100,91500,92100,90300,95400,95800,96100,94300,90400
Administrative and Support and Waste Ser,45000.0,45200.0,46300.0,46600.0,47500.0,47600.0,47300.0,48500.0,48300.0,47500.0,...,91100,94400,95800,96400,94500,99700,100100,100400,98600,94700
Aerospace Product and Parts Manufacturin,23700.0,23400.0,23300.0,22800.0,22600.0,22600.0,23000.0,22700.0,22700.0,22700.0,...,11500,11500,11500,11400,11400,11400,11400,11200,11000,11100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Transportation, Warehousing and Utilitie",24100.0,23900.0,24100.0,23800.0,24000.0,24400.0,24600.0,24700.0,24600.0,24500.0,...,36900,37600,39800,40500,39000,38800,37300,37300,37000,37700
Utilities,6600.0,6500.0,6500.0,6600.0,6600.0,6600.0,6800.0,6700.0,6700.0,6700.0,...,5000,5000,5100,5000,5000,5100,5000,5000,5000,5100
Warehousing and Storage,3200.0,3200.0,3200.0,3200.0,3200.0,3200.0,3200.0,3200.0,3200.0,3300.0,...,5400,5400,5500,5500,5600,5600,5600,5500,5600,5700
Waste Management and Remediation Service,2200.0,2300.0,2300.0,2000.0,2500.0,2500.0,2300.0,2300.0,2300.0,2400.0,...,4200,4300,4300,4300,4200,4300,4300,4300,4300,4300


In [5]:
# Grab the EDD sectors that we are interested in 
edd_breakdown = ['Mining and Logging','Total Farm','Construction','Manufacturing','Wholesale Trade','Retail Trade','Utilities','Transportation and Warehousing','Information','Finance and Insurance','Real Estate and Rental and Leasing','Professional, Scientific and Technical S','Management of Companies and Enterprises','Administrative and Support and Waste Ser','Educational Services','Health Care and Social Assistance','Arts, Entertainment, and Recreation','Accommodation','Food Services and Drinking Places','Other Services','Federal Government excluding Department of Defense','Department of Defense','State Government Education','State Government Excluding Education','Local Government Education','Local Government Excluding Education']
edd_data = edd_data[edd_data.index.isin(edd_breakdown)]
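
Because `isin` silently skips any requested title that is not in the index, it can be worth listing the titles that failed to match. Several titles in the EDD data appear truncated (for example "Administrative and Support and Waste Ser"), so a fully spelled-out name would be dropped without warning. A minimal sketch on hypothetical labels:

```python
import pandas as pd

# Hypothetical index of titles as they might appear in the data (one is truncated)
available = pd.Index(
    ["Construction", "Utilities", "Professional, Scientific and Technical S"],
    name="industry_title",
)

# Hypothetical request list in which one title is spelled out in full
requested = [
    "Construction",
    "Utilities",
    "Professional, Scientific and Technical Services",
]

# Titles that were requested but would be silently dropped by isin()
missing = sorted(set(requested) - set(available))
print(missing)
```

Running the same set difference with `edd_breakdown` and `edd_data.index` before filtering would show whether every requested sector is actually present.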

In [6]:
edd_data = pd.DataFrame(edd_data.iloc[:,-1]) # Keep only the last column, i.e. the most recent month
edd_data.columns = ['Employment'] # Rename the date column to a fixed label

In [7]:
edd_data

industry_title,Employment
Accommodation,26300
Administrative and Support and Waste Ser,94700
"Arts, Entertainment, and Recreation",31300
Construction,85500
Department of Defense,23400
Educational Services,26600
Federal Government excluding Department of Defense,23400
Finance and Insurance,45600
Food Services and Drinking Places,141200
Health Care and Social Assistance,192600


# Grabbing Estimates Data