GitHub - AllanMerrick/NLT_Client_Project: GA_DSI Client Project completed.

Problem Statement

This tool will rely on websites that provide employment information and sector-specific wage estimations (such as Glassdoor and Indeed) to project the economic loss (wage loss) due to a disaster. Based on the type of businesses and services in a given affected area and/or using supplementary demographic data (for example, from the Census Bureau of Statistics), the tool will provide an estimation about the projected economic loss in a given locality based on the reported or estimated wage loss in the locality.

Executive Summary

DUE TO THE SIZE OF OUR DATASET, IT IS HOSTED FOR PUBLIC ACCESS:

DSI11 Disaster Dataset SEA

Modeling and EDA

Discretized Model
- Categorized wage loss into ranges based on percentile
- Fit decision tree to predict which wage loss tier a given occupation would be in given the state, disaster type, and time of year
- Mainly modeled the food service industry
Principal Component Analysis Regressor and Random Forest Pt.1
- 25 th percentile of wage is expected to gain roughly by 42852$ after a disaster
- 50 th percentile of wage is expected to gain roughly by 99243$ after a disaster
- 75 th percentile of wage is expected to increase by 140 521$
Principal Component Analysis Regressor and Random Forest Pt.2
- 25th percentile of wage is expected loss of of 9565$ after a disaster
- 50th percentile of wage is expected to gain roughly by 0$ after a disaster
- 75th percentile of wage is expected to increase by 52582$
Webscraper EDA
- Anti-scraping methods identified and outlined
- Final function annotated for adaptability and multi-use

Data Dictionary

Feature	Type	Dataset	Description
disasterNumber	integer	main_df	ID number associated to each disaster event.
state	object	main_df	State where disaster occurred.
incidentType	object	main_df	Type of disaster (tornado, fire, hurricane, etc.).
year	float	main_df	Year disaster declared.
month	float	main_df	Month dsaster declared.
occ_code	object	main_df	Occupation code (first two digits indicate industry type).
occ_title	object	main_df	Occupation title.
tot_emp	integer	main_df	Total number of people employed in occupation in state at time of disaster.
h_mean	integer	main_df	Hourly average wage for occupation in state.
a_mean	integer	main_df	Annual average wage for occupation in state.
employment_rate_during	integer	main_df	State employment rate at time of disaster.
employment_rate_before	integer	main_df	State employment rate at one month before time of disaster.
employment_rate_after	integer	main_df	State employment rate two months after time of disaster.
wage_change	integer	main_df	Wage change across time one month before disaster and two months after. Total wages gained or lost from all people working in a particular occupation in each state at time of disaster.
job_status()	function	scraper	basic scraper function for checking current job postings, average salary, and location currently on GlassDoor.com.
City	object	scraper output	Result from `jobs_status()` function.
State	object	scraper output	Result from `jobs_status()` function.
Total Listings	object	scraper output	Result from `jobs_status()` function.
Average Salary(usd)	object	scraper output	Returns GlassDoor's average salary calculation for the selected area.
Posts (3, 7, 30 days)	integer	scraper output	quantity of posts in the selected job title and area within each bin of days.

Conclusions and Recommendations

Decision trees provide a fast, efficient, and versatile model for predicting wage loss tiers of a given job for a particular state and disaster type. They are straightforward to interpret, and make few assumptions on the distribution of the data, making them good models for any industry type. However, they are very prone to overfitting as was found here. Random forests found improvements in variance, but little improvement in overall predictive ability. Our decision tree performed with 70% accuracy, 2.4 times better than the baseline model which assigns every data point belongs to the largest wage tier.

The principal component analysis regression has not been an ideal choice to predict the an economic loss disaster in the industry of construction and reparation. Random forest regressor performed quite well with 97%.

We can see huge marge error or difference in loss prediction between the 2 models on the 50th percentile of income which is expected to see a significant loss of 99243$ using component analyze regression while the random forest regression predicts 0$ loss on the 50 percentile of income in the construction industry.

Scraper functionally manages to access Glassdoor website, use search bar, return information typically nested in anti-scraping HTML (shadow agents). The function is annotated heavily for repurposing or iteration (depending on the information needed.) Currently the function is configured for current status job summaries based on the inputs of job name transformed location (location in put is optional.)

Sources

Glass Door Developer Page : Dev page for API scrape
Glass Door first URL : Initial URL
Glass Door final URL : Final URL that worked with our scraper
Glassdoor scrape example - Selenium (git) : github link to author of Medium article
Glassdoor scrape example - Selenium (Medium) : Link to Medium article covering Glassdoor and Selenium
Selenium Review SEA-FLEX-11 : Selenium flex review lab completed with our instructor.
Xpath guide : reference guide for Xpath commands
Selenium Keys_Input Documentation : reference guide for special keys
FEMA Disasters : Data about every official disaster in the US since 1953
Occupational Statistics from the Bureau of Labor Statistics : Data regarding employment and wages by job type and state
Local Area Employment Statistics from the Bureau of Labor Statistics : Data regarding employment rates for each state

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Images		Images
chromedriver		chromedriver
code		code
data		data
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Contents:

Problem Statement

Executive Summary

Data Dictionary

Conclusions and Recommendations

Sources

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Contents:

Problem Statement

Executive Summary

Data Dictionary

Conclusions and Recommendations

Sources

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages