# Indicator 1_1_1
#### Purpose:
Indicator 1_1_1 provides information on unemployment in San Diego County.

#### Data Source:
Indicator 1_1_1 uses two data sources. Both come from EDD and both are accessed through the Socrata API. \
$\;\;\;\;\;\;$ 1. Current Employment Statistics: https://data.edd.ca.gov/Industry-Information-/Current-Employment-Statistics-CES-/r4zm-kdcg \
$\;\;\;\;\;\;$ 2. Local Area Unemployment Statistics: https://data.edd.ca.gov/Labor-Force-and-Unemployment-Rates/Local-Area-Unemployment-Statistics-LAUS-/e6gw-gvii


#### Transformations being performed:
At a high level, the transformation process for this indicator combines the two datasets, adds a time series component, and breaks the data down by ZIP code in San Diego County. The result is then uploaded to Azure.

#### Location of Outputs:
https://portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F0dc5d1f6-911d-46fa-a9ef-d300a130703b%2FresourceGroups%2FADLS_POC%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fsandagadls/path/cleansed/etag/%220x8D981C226F2BC38%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/None

#### SME:
Calvin Raab (calvin.raab@sandag.org)

#### Author: 
Calvin Raab (calvin.raab@sandag.org)

#### Date Created
5/18/2022

# Import Libraries

In [1]:
# Necessary libraries
import pandas as pd
import urllib.request  # For downloading the xlsx file
from sodapy import Socrata  # For the API
import ssl
import sqlalchemy  # For the SQL push

# sector_level_data_load()
The sector_level_data_load() function loads the sector-level data for this indicator. Raw data is pulled via the Socrata API, the necessary cleaning steps are applied, and the final cleaned data frame is returned.

In [2]:
def sector_level_data_load():
    # Sector Level Data
    client = Socrata("data.edd.ca.gov", None)
    results = client.get_all("r4zm-kdcg", area_name='San Diego-Carlsbad MSA')
    results_df = pd.DataFrame.from_records(results)

    # Cleaning the sector Data
    edd_data = results_df[results_df['seasonally_adjusted']=='N'][['year', 'month', 'industry_title', 'current_employment']] # These instructions come from SME

    # Turn into a time series object
    edd_data['date'] = edd_data.assign(day=1)[['year','month','day']].apply(lambda x: '-'.join(x.values.astype(str)), axis="columns")
    edd_data['date'] = pd.to_datetime(edd_data['date']) #year-month-day
    edd_data = edd_data.pivot(index='industry_title', columns='date', values='current_employment')


    # We want the industry titles (the row index) to match the sector names used in our Census Bureau data, so they are renamed here.
    edd_data.rename(index={'Total Farm': 'Agriculture, Forestry, Fishing and Hunting',
                    'Mining and Logging': 'Mining, Quarrying, and Oil and Gas Extraction',
                    'Professional, Scientific and Technical S': 'Professional, Scientific, and Technical Services',
                    'Administrative and Support and Waste Ser': 'Administration & Support, Waste Management and Remediation',
                    'Accommodation and Food Service': 'Accommodation and Food Services',
                    'Other Services': 'Other Services (excluding Public Administration)'}, inplace=True)
    edd_data = edd_data.apply(pd.to_numeric)

    return edd_data
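
The reshaping inside sector_level_data_load() can be hard to picture without live API access. The sketch below runs the same filter, date, and pivot steps on a few hypothetical records. The sample values, the `"01"`-style month strings, and the names `records`, `long_df`, and `wide_df` are assumptions made for illustration; the real API response may format months differently. The date is built with vectorized string concatenation rather than the row-wise `apply` used above.

```python
import pandas as pd

# Hypothetical rows shaped like the API response; every value arrives as a string.
records = [
    {"year": "2022", "month": "01", "seasonally_adjusted": "N",
     "industry_title": "Utilities", "current_employment": "5000"},
    {"year": "2022", "month": "02", "seasonally_adjusted": "N",
     "industry_title": "Utilities", "current_employment": "5100"},
    {"year": "2022", "month": "01", "seasonally_adjusted": "N",
     "industry_title": "Accommodation", "current_employment": "23000"},
    {"year": "2022", "month": "02", "seasonally_adjusted": "N",
     "industry_title": "Accommodation", "current_employment": "23500"},
    {"year": "2022", "month": "01", "seasonally_adjusted": "Y",
     "industry_title": "Utilities", "current_employment": "5050"},
]
long_df = pd.DataFrame.from_records(records)

# Keep only the non-seasonally-adjusted rows and the columns of interest.
keep = ["year", "month", "industry_title", "current_employment"]
long_df = long_df.loc[long_df["seasonally_adjusted"] == "N", keep].copy()

# Build a first-of-month date from the year and month strings.
long_df["date"] = pd.to_datetime(long_df["year"] + "-" + long_df["month"] + "-01")

# One row per industry, one column per month.
wide_df = long_df.pivot(index="industry_title", columns="date",
                        values="current_employment")

# The pivoted values are still strings; convert each column to numbers.
wide_df = wide_df.apply(pd.to_numeric)
print(wide_df)
```

Note that `pivot` raises an error if the same industry and date pair appears more than once, which is one reason the seasonally adjusted rows are filtered out before reshaping.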

# double_df()
This function takes the output of sector_level_data_load() and multiplies it by two. We do this, as instructed by the SME, to see what the numbers look like when they are doubled.

In [3]:
def double_df(edd_data):
    doubled_df = edd_data*2 # Instructed by SME to multiply dataframe by two
    return doubled_df

# add_5()
This function takes the data frame returned by double_df() and adds 5 to every value, as instructed by the SME.

In [4]:
def add_5(doubled_df):
    five_added = doubled_df+5 # Instructed by SME to add five to the dataframe
    return five_added
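
Chained together, the two steps map each raw value x to 2x + 5, so a raw value can be recovered from an output value as (y - 5) / 2. The sketch below checks this on a small hypothetical frame. The input values and the name `toy` are assumptions, chosen so that the results match two figures shown in the final output (10205 for Utilities and 48205 for Accommodation in 2022-03-01).

```python
import pandas as pd

def double_df(edd_data):
    return edd_data * 2

def add_5(doubled_df):
    return doubled_df + 5

# Hypothetical raw values, chosen to reproduce two numbers in the final output.
toy = pd.DataFrame({"2022-03-01": [5100, 24100]},
                   index=["Utilities", "Accommodation"])

# Forward: 2 * x + 5
result = add_5(double_df(toy))
print(result)

# Backward: recover the raw values from the transformed ones.
recovered = (result - 5) / 2
print(recovered)
```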

# Output creation 
Describe here any specific inputs the model needs, including any inputs passed to the functions when creating the Azure output. A common problem at this step is a "gaierror", which is a network address lookup failure. Re-running the function a few times usually resolves it; in my experience it will eventually run. The issue is being looked into.
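
Until the cause of the "gaierror" is found, re-running by hand could be replaced by an automatic retry. The following is a minimal sketch, not part of the original pipeline: `call_with_retry` and its parameters are hypothetical names. It retries on `OSError` because `socket.gaierror` is a subclass of it; the assumption here is that the error surfaced through the API client also derives from `OSError`, which should be checked against the actual traceback.

```python
import socket
import time

def call_with_retry(func, attempts=5, base_delay=1.0,
                    retry_on=(OSError,), sleep=time.sleep):
    """Call func() and retry with exponential backoff on network errors.

    socket.gaierror is a subclass of OSError, so it is covered by the
    default retry_on tuple. The last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Wait 1s, 2s, 4s, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.gaierror("temporary failure in name resolution")
    return "ok"

result = call_with_retry(flaky, sleep=lambda seconds: None)  # skip real waiting
print(result, calls["n"])
```

In the notebook, the load step could then be wrapped as `call_with_retry(sector_level_data_load)` before passing its result to double_df() and add_5().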

In [9]:
final_output = add_5(double_df(sector_level_data_load()))



In [10]:
final_output

industry_title,1994-09-01,1994-10-01,1994-11-01,1994-12-01,1995-01-01,1995-02-01,1995-03-01,1995-04-01,1995-05-01,1995-06-01,...,2021-06-01,2021-07-01,2021-08-01,2021-09-01,2021-10-01,2021-11-01,2021-12-01,2022-01-01,2022-02-01,2022-03-01
Accommodation,48805.0,48405.0,48205.0,47605.0,47205.0,47605.0,48005.0,48405.0,49405.0,50005.0,...,42405,45205,47205,46805,47205,47005,47405,46005,47205,48205
Accommodation and Food Services,197005.0,191805.0,191205.0,192605.0,187805.0,190805.0,193405.0,195005.0,198405.0,200805.0,...,284805,299605,304605,302605,311005,312805,312205,307005,314005,322605
Administrative and Support Services,105005.0,104005.0,104005.0,103605.0,102405.0,105405.0,106605.0,107205.0,107405.0,108605.0,...,167805,169605,170205,173805,180205,183005,184205,180605,190805,192605
"Administration & Support, Waste Management and Remediation",109005.0,107805.0,107805.0,107205.0,106005.0,109005.0,110205.0,110605.0,110805.0,112205.0,...,176005,178005,178605,182205,188805,191605,192805,189005,199405,201205
Aerospace Product and Parts Manufacturin,16605.0,15205.0,15205.0,15005.0,15205.0,15205.0,14805.0,12205.0,12205.0,12005.0,...,23405,23405,23005,23005,23005,23005,22805,22805,22805,22605
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Transportation, Warehousing and Utilitie",56005.0,54405.0,54005.0,54605.0,51205.0,51205.0,52005.0,52405.0,53005.0,55005.0,...,70205,73805,74205,73805,75205,79605,81005,78005,77605,75605
Utilities,12805.0,13005.0,12805.0,12805.0,12605.0,12405.0,12605.0,13005.0,13005.0,13205.0,...,10005,10005,10005,10005,10005,10205,10005,10005,10205,10205
Warehousing and Storage,10805.0,8205.0,8205.0,9005.0,7605.0,7805.0,8005.0,8005.0,8605.0,10205.0,...,7405,10405,10605,10805,10805,11005,11005,11205,11205,11205
Waste Management and Remediation Service,4005.0,3805.0,3805.0,3605.0,3605.0,3605.0,3605.0,3405.0,3405.0,3605.0,...,8205,8405,8405,8405,8605,8605,8605,8405,8605,8605
