Downloading Ottawa Health Inspection Data into a MongoDB Database with Ruby
- Install MongoDB and start default mongodb setup (http://docs.mongodb.org/manual/installation/)
- Run Ruby script
All data is downloaded into the MongoDB database. Each company/restaurant is stored as a single document in Mongo. Its inspections are stored as sub-documents of that document.
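A minimal sketch of that document shape, written as a plain Ruby hash. The field names (`restaurant_id`, `name`, `inspections`, `date`, `result`) and the sample values are assumptions for illustration only; the script's actual schema may differ.

```ruby
# Build one restaurant document with its inspections embedded as sub-documents.
# All field names here are hypothetical, not taken from the actual script.
def build_restaurant_document(id:, name:, inspections: [])
  embedded = inspections.map do |inspection|
    { 'date' => inspection[:date], 'result' => inspection[:result] }
  end

  {
    'restaurant_id' => id,
    'name'          => name,
    'inspections'   => embedded # sub-documents live inside the parent document
  }
end

# Invented sample values, for illustration only.
example = build_restaurant_document(
  id: 'EXAMPLE-001',
  name: 'Example Diner',
  inspections: [
    { date: Time.utc(2013, 8, 1), result: 'Pass' },
    { date: Time.utc(2013, 9, 1), result: 'Fail' }
  ]
)
puts example['inspections'].length
```

With the official `mongo` gem, a hash of this shape could be passed to a collection's `insert_one`. Whether the script stores records exactly this way is not confirmed here.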
- Convert date/times to a proper format for Mongo. Refer to the Github-Analytics Convert DateTime methods for the conversion code. (Completed)
- Add support for multilingual (En/Fr) Analysis class responses
- Convert into Sinatra App
- Rebuild Analysis/Aggregation code for new usage of Nokogiri
- Split .rb file into separate files (download, analyze, Sinatra stuff, etc)
- Provide JSON output option to use as API
- Provide Query/Search feature (input a Restaurant ID, output the restaurant details and inspections). Used for API-like functionality that does not require a DB for storage. A list API would provide a listing of restaurants/facilities, each with an ID value that is used to query the Health Inspection site. Returned values would be in a JSON structure.
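The date conversion and JSON query items above could look roughly like the sketch below. It uses only the Ruby standard library. The helper names (`sample_listing`, `to_utc_time`, `restaurant_json`), the date format, the `-0400` offset, and the sample record are all assumptions for illustration, not the project's actual code.

```ruby
require 'json'
require 'time'

# Hypothetical in-memory listing keyed by restaurant ID (values are invented).
def sample_listing
  {
    'EXAMPLE-001' => {
      'name' => 'Example Diner',
      'inspections' => [{ 'date' => '2013-08-17 10:30', 'result' => 'Pass' }]
    }
  }
end

# Parse a scraped date string and normalize it to UTC.
# The format and the UTC offset are assumptions about the source data.
def to_utc_time(text, offset: '-0400')
  Time.strptime("#{text} #{offset}", '%Y-%m-%d %H:%M %z').utc
end

# Return restaurant details and inspections as a JSON string.
def restaurant_json(id, listing = sample_listing)
  record = listing[id]
  return JSON.generate({ 'error' => "No restaurant with ID #{id}" }) if record.nil?

  inspections = record['inspections'].map do |inspection|
    inspection.merge('date' => to_utc_time(inspection['date']).iso8601)
  end
  JSON.generate(record.merge('id' => id, 'inspections' => inspections))
end

puts restaurant_json('EXAMPLE-001')
```

A Sinatra route could return the same string as its response body, which would cover the API item without requiring a database.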
NOTE (Sept 2): A major refactor is coming in the next few days that better formats the data for Mongo. Update (Sept 5): The refactor has occurred, using xpath and customized output for Mongo. See image/screenshot below for sample.
- Downloading all data from the Health Inspection website takes about 2 hours, because each webpage must be downloaded individually.
- All data is downloaded into the Mongodb database
- Aug 17 - 5445 Records were downloaded in 7303.2 seconds - According to the count on the Health Inspections report website, there are a total of 5453 records. Need to investigate the missing records.
- Aug 17 - 5453 Records were downloaded in 7213.8 seconds - Count Matches Website Count value.
- Sept 5 - 5453 Records were downloaded in 7180.8 seconds - Count matches website count value. (this used the xpath parsing method with Nokogiri gem)
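The run figures above work out to roughly 1.3 seconds per record. A small sketch of that arithmetic follows. The record counts and timings come from the list above; the method names and run labels are made up for illustration.

```ruby
# Timings and counts from the three recorded runs listed above.
def download_runs
  [
    { label: 'Aug 17 (first run)',  records: 5445, seconds: 7303.2 },
    { label: 'Aug 17 (second run)', records: 5453, seconds: 7213.8 },
    { label: 'Sept 5 (xpath)',      records: 5453, seconds: 7180.8 }
  ]
end

# Per-run summary: average seconds per record, total hours, and how many
# records are missing compared to the website's reported count.
def summarize_run(run, expected_count: 5453)
  {
    label: run[:label],
    seconds_per_record: (run[:seconds] / run[:records]).round(3),
    hours: (run[:seconds] / 3600.0).round(2),
    missing: expected_count - run[:records]
  }
end

download_runs.each do |run|
  summary = summarize_run(run)
  puts format('%-20s %.3f s/record, %.2f h, %d missing',
              summary[:label], summary[:seconds_per_record],
              summary[:hours], summary[:missing])
end
```

The first run is 8 records short of the website count; the later two runs match it.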
Types of Analysis to Produce
- Breakdown of Restaurant Names and the Count for that Name
- ID formatting and spelling mistakes in Restaurant Names
- Recently Failed Restaurant Inspections
- GIS locations of Restaurants
- Breakdown of Restaurant Categories
- Breakdown of Restaurant Categories and Failed Inspections
- Breakdown of Restaurants in City Sectors
- Failed inspections per neighbourhood
- Failed inspections per Ward
- Number of inspections Per Month
- Number of Inspections Per Month Per Restaurant Type
- Number of Inspections Per Quarter
- Number of Inspections Per Ward
- Breakdown of Inspection Times
- Breakdown of Inspection Times and Restaurant Categories
- Most Inspected Restaurants
- Least Inspected Restaurants
- Most Inspected Restaurants Per Category
- Least Inspected Restaurants Per Category
- Restaurant Inspections Count Per Category Broken down by Month, Quarter and Week.
- Analysis of Record Creation / Restaurant Creation
- General Count of Restaurant Inspection Failures Per Restaurant
- Restaurant Inspection Failures per restaurant per week, month and quarter
- Analysis of Inspections and Pass/Fail of Food Cart/Truck Vendors
- Phone Number Analysis (Area Codes)
- Shared Phone Numbers Analysis
- Street Analysis - Breakdown of restaurants per street with failures
- Analysis count of restaurants that have no inspections
- Analysis count of restaurants that are marked as "Closed" (example from old Drupal repo: https://github.com/StephenOTT/OttawaHealthInspectionsScrape/issues/5)
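A few of the analyses listed above (name counts, inspections per month, area codes, restaurants with no inspections) could be sketched in plain Ruby as below. The record layout, field names, method names, and sample data are all invented for illustration. The real analysis would likely run against the Mongo collection instead.

```ruby
require 'date'

# Invented sample records; the field names are assumptions.
def sample_records
  [
    { 'name' => 'Example Diner', 'phone' => '613-555-0101',
      'inspections' => [{ 'date' => Date.new(2013, 7, 3),  'result' => 'Pass' },
                        { 'date' => Date.new(2013, 8, 14), 'result' => 'Fail' }] },
    { 'name' => 'Example Diner', 'phone' => '(613) 555-0102',
      'inspections' => [{ 'date' => Date.new(2013, 8, 20), 'result' => 'Pass' }] },
    { 'name' => 'Sample Cafe', 'phone' => '343-555-0103', 'inspections' => [] }
  ]
end

# Count how many times each value occurs.
def count_by(values)
  values.each_with_object(Hash.new(0)) { |value, counts| counts[value] += 1 }
end

# Breakdown of restaurant names and the count for each name.
def name_counts(restaurants)
  count_by(restaurants.map { |r| r['name'] })
end

# Number of inspections per month, keyed as "YYYY-MM".
def inspections_per_month(restaurants)
  dates = restaurants.flat_map { |r| r['inspections'].map { |i| i['date'] } }
  count_by(dates.map { |date| date.strftime('%Y-%m') })
end

# Area codes: first three digits of the phone number after stripping
# punctuation. Assumes 10-digit numbers with no leading country code.
def area_code_counts(restaurants)
  count_by(restaurants.map { |r| r['phone'].gsub(/\D/, '')[0, 3] })
end

# Count of restaurants that have no inspections.
def uninspected_count(restaurants)
  restaurants.count { |r| r['inspections'].empty? }
end

records = sample_records
p name_counts(records)
p inspections_per_month(records)
p area_code_counts(records)
p uninspected_count(records)
```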