Extract 'Rodent or Trash Violations' Feature from Restaurant Inspection Data #13

Open · eclee25 opened this issue on Jan 9, 2018

The purpose of this issue is to adapt the existing scripts from the dc_doh_hackathon repository so that they can be run repeatedly on the restaurant inspection dataset as it is updated.

From the issue_18/Code & Input Data folder:

Take the rodents_or_trash_extraction.R script and verify that it works as expected (see the original GitHub issue text below). Then modify it so that it can be run from the command line with three arguments:

  1. The input restaurant inspection data file (...inspection_summary_data.csv)
  2. The file that maps inspection_id to census block (the output of issue "Geocode Restaurant Inspections and Map to Census Blocks" #6)
  3. The output filename
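The requested interface is three positional arguments. The issue asks for the existing R script to be modified; the following is only a Python sketch of the same interface, and the argument names are hypothetical (they do not come from rodents_or_trash_extraction.R).

```python
import argparse


def parse_args(argv=None):
    """Parse the three command-line arguments described in the issue.

    The argument names are illustrative assumptions, not taken from the
    original R script.
    """
    parser = argparse.ArgumentParser(
        description="Extract the 'rodent or trash violations' feature."
    )
    # 1. Restaurant inspection summary data (...inspection_summary_data.csv)
    parser.add_argument("inspection_summary_csv")
    # 2. Mapping from inspection_id to census block (output of issue #6)
    parser.add_argument("census_block_map_csv")
    # 3. Where to write the extracted feature
    parser.add_argument("output_csv")
    return parser.parse_args(argv)
```

A script built around this would be invoked as, for example, `python extract_rodent_or_trash.py summary.csv blocks.csv out.csv` (the script name is hypothetical). Positional arguments keep the call order identical to the numbered list above.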

Please also provide a README.md that describes the script and how to run it.

You can model the solution after the files here or here.

Place all of your files in the scripts/feature_engineering/extract_restaurant_inspection_features/ folder.

For reference, here is the original issue description from the dc_doh_hackathon:

issue_18

Start with the DC DOH Food Service Establishment Inspection report data in the /Data Sets/Restaurant Inspections/ folder in Dropbox.

Develop a script to extract the number of food establishment inspections that found rodent- or trash-related violations (violations 38 or 54). More details on violations can be found here.
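The core filter is "did this inspection record violation 38 or 54?". A minimal Python sketch, assuming the violation details can be read as (inspection_id, violation_number) pairs; that layout is an assumption about the violation-details CSV, not something stated in the issue:

```python
# Violation numbers the issue names as rodent- or trash-related.
RODENT_OR_TRASH_VIOLATIONS = {38, 54}


def inspections_with_rodent_or_trash(violations):
    """Return the set of inspection IDs that recorded violation 38 or 54.

    `violations` is an iterable of (inspection_id, violation_number) pairs;
    this input layout is an assumption about the violation-details data.
    """
    return {
        inspection_id
        for inspection_id, number in violations
        # int() accepts both "38" (as read from a CSV) and 38.
        if int(number) in RODENT_OR_TRASH_VIOLATIONS
    }
```

Returning a set means an inspection that found both violations 38 and 54 is counted once, which matches "number of inspections" rather than "number of violations".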

Input:
CSV files with inspection summary and violation details

Output:
A CSV file with

  • One row for each combination of establishment type, risk category, year, week, and census block
  • The following columns:
    • feature_id: The ID for the feature, in this case, "restaurant_violations_rodent_or_trash"
    • feature_type: The establishment_type from the restaurant data set
    • feature_subtype: The risk_category (1 to 5)
    • year: The ISO-8601 year of the feature value
    • week: The ISO-8601 week number of the feature value
    • census_block_2010: The 2010 Census Block of the feature value
    • value: The value of the feature, i.e., the number of inspections that found rodent- or trash-related violations in establishments of the given type and risk category in the specified week, year, and census block.
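The output spec amounts to a group-and-count over five keys. A Python sketch of that aggregation follows, assuming hypothetical input keys `inspection_id`, `inspection_date` (YYYY-MM-DD), `establishment_type`, and `risk_category`; the real column names in the inspection summary CSV may differ. The ISO-8601 year and week come from `date.isocalendar()`.

```python
import csv
import io
from collections import Counter
from datetime import date

FEATURE_ID = "restaurant_violations_rodent_or_trash"
FIELDNAMES = ["feature_id", "feature_type", "feature_subtype",
              "year", "week", "census_block_2010", "value"]


def build_feature_rows(inspections, flagged_ids, block_by_id):
    """Count flagged inspections per type, risk category, ISO week, and block.

    inspections: iterable of dicts with the hypothetical keys
        "inspection_id", "inspection_date" (YYYY-MM-DD),
        "establishment_type", and "risk_category".
    flagged_ids: inspection IDs that recorded violation 38 or 54.
    block_by_id: mapping from inspection_id to 2010 census block.
    """
    counts = Counter()
    for row in inspections:
        inspection_id = row["inspection_id"]
        # Skip inspections without a rodent/trash violation or without a block.
        if inspection_id not in flagged_ids or inspection_id not in block_by_id:
            continue
        # isocalendar() yields the ISO-8601 year and week, which can differ
        # from the calendar year for dates near January 1.
        iso_year, iso_week, _ = date.fromisoformat(row["inspection_date"]).isocalendar()
        key = (row["establishment_type"], row["risk_category"],
               iso_year, iso_week, block_by_id[inspection_id])
        counts[key] += 1
    # Only combinations with at least one flagged inspection are emitted.
    return [
        {"feature_id": FEATURE_ID, "feature_type": etype,
         "feature_subtype": risk, "year": year, "week": week,
         "census_block_2010": block, "value": n}
        for (etype, risk, year, week, block), n in sorted(counts.items())
    ]


def feature_rows_to_csv(rows):
    """Serialize feature rows to CSV text in the column order of the spec."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

One design choice to confirm against the original R script: this sketch writes rows only for combinations with a non-zero count. If every type, risk category, week, and block combination must appear with an explicit 0, the zero rows would need to be generated separately.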
