Extract 'Inspection Resulted in Closure' Feature from Restaurant Inspection Data #14

Open
eclee25 opened this issue Jan 9, 2018 · 0 comments

eclee25 commented Jan 9, 2018

The purpose of this issue is to continue the data cleaning that was started in the dc_doh_hackathon repository. The resulting routine should then be modified so that it can be rerun on the restaurant inspection dataset whenever the data is updated.

From the issue_19 folder:

Take a look at 'Issue 19 Walkthrough.ipynb' for the data-related conclusions reached at the hackathon. Continue with this script, or start a new one, to develop a routine that identifies inspections that resulted in closures (see the original full GitHub issue text below). Be sure that the final script can be run from the command line with three arguments:

  1. The input restaurant inspection data file (...inspection_summary_data.csv)
  2. The file that maps inspection_id to census block (the output of issue #6, "Geocode Restaurant Inspections and Map to Census Blocks")
  3. The output filename
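
As a starting point, the three-argument command-line interface could be sketched with Python's standard argparse module. This is only an illustration: the argument names (inspection_csv, census_block_csv, output_csv) and the script name in the comment are hypothetical and not prescribed by this issue.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments described in this issue.

    The argument names are hypothetical; only their order and meaning
    come from the issue text.
    """
    parser = argparse.ArgumentParser(
        description="Extract the 'inspection resulted in closure' feature."
    )
    parser.add_argument(
        "inspection_csv",
        help="input restaurant inspection summary data file",
    )
    parser.add_argument(
        "census_block_csv",
        help="file mapping inspection_id to census block (output of issue #6)",
    )
    parser.add_argument(
        "output_csv",
        help="filename for the output feature CSV",
    )
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return parser.parse_args(argv)


# Hypothetical invocation, assuming the script is saved as extract_closures.py:
#   python extract_closures.py inspection_summary_data.csv blocks.csv out.csv
```

Accepting an optional argv list keeps the parser easy to exercise from a notebook or a test without touching sys.argv.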

Please also provide a README.md that describes the script and how to run it.

You can model the solution after the files here or here

Place all of your files in the scripts/feature_engineering/extract_restaurant_inspection_features/ folder

For reference, here is the original issue description from the dc_doh_hackathon:

issue_19

Start with the DC DOH Food Service Establishment Inspection report data in the /Data Sets/Restaurant Inspections/ folder in Dropbox.

Develop a script to extract the number of food establishment inspections that resulted in (temporary) closure of the establishment. More details on violations can be found here

Input:
CSV files with inspection summary and violation details

Output:
A CSV file with

  • One row for each combination of establishment type, risk category, year, week, and census block
  • The following columns:

      • feature_id: The ID for the feature, in this case "restaurant_inspection_closures"
      • feature_type: The establishment_type from the restaurant data set
      • feature_subtype: The risk_category, from 1 to 5
      • year: The ISO-8601 year of the feature value
      • week: The ISO-8601 week number of the feature value
      • census_block_2010: The 2010 Census Block of the feature value
      • value: The value of the feature, i.e. the number of inspections that resulted in closure in establishments with the given type and risk category in the specified week, year, and census block
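
The output layout above could be produced along the following lines. This is a rough sketch using only the Python standard library, not a definitive implementation. The input field names (inspection_id, inspection_date, establishment_type, risk_category) and the boolean resulted_in_closure flag are assumptions; how a closure is actually identified in the data is left to the conclusions in the hackathon notebook. Note that this sketch only emits rows for combinations with at least one closure.

```python
import csv
from collections import Counter
from datetime import date

FEATURE_ID = "restaurant_inspection_closures"
OUTPUT_COLUMNS = [
    "feature_id", "feature_type", "feature_subtype",
    "year", "week", "census_block_2010", "value",
]


def count_closures(inspections, block_by_inspection_id):
    """Count closures per (type, risk category, ISO year, ISO week, block).

    `inspections` is an iterable of dicts with the assumed keys
    inspection_id, inspection_date (YYYY-MM-DD), establishment_type,
    risk_category and resulted_in_closure (a hypothetical boolean flag).
    `block_by_inspection_id` maps inspection_id to a 2010 census block.
    """
    counts = Counter()
    for row in inspections:
        if not row["resulted_in_closure"]:
            continue
        block = block_by_inspection_id.get(row["inspection_id"])
        if block is None:
            continue  # inspection was not mapped to a census block
        # isocalendar() yields the ISO-8601 year and week, which can differ
        # from the calendar year around January 1.
        iso_year, iso_week, _ = date.fromisoformat(
            row["inspection_date"]
        ).isocalendar()
        key = (row["establishment_type"], row["risk_category"],
               iso_year, iso_week, block)
        counts[key] += 1
    return [
        dict(zip(OUTPUT_COLUMNS, (FEATURE_ID, *key, n)))
        for key, n in sorted(counts.items())
    ]


def write_feature_csv(rows, output_filename):
    """Write the aggregated rows with the column order listed above."""
    with open(output_filename, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=OUTPUT_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Using the ISO-8601 year rather than the calendar year matters at year boundaries: an inspection dated 2017-01-01 falls in ISO week 52 of 2016.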
