Finding the best $1 pizza slice in NYC.
This guy is running around NYC trying every $1 pizza slice and documenting his journey on his Instagram account. He gives scores from 1 to 5 for cheese, sauce, and crust. I wanted to know which slice is the best based on these scores, but it turns out to be a tricky question.
There is no way to find the best slice from the Instagram page alone (unless you're willing to read every photo description manually), so we need a way to automate this task!
We need to get every image and its description from his Instagram profile, parse the cheese/sauce/crust scores from each description, and then store all the data in a MongoDB collection for later use.
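As a rough idea of the parsing step, a small helper like the one below could pull the three scores out of a caption. The caption format here (e.g. "Cheese: 4, Sauce: 3, Crust: 5") is an assumption, so the regex would need to match whatever the account actually writes:

```js
// Hypothetical sketch: extract cheese/sauce/crust scores from a caption.
// Assumes captions look roughly like "Cheese: 4, Sauce: 3, Crust: 5".
function parseScores(caption) {
  const scores = {};
  for (const field of ['cheese', 'sauce', 'crust']) {
    // case-insensitive match like "Cheese: 4" or "cheese 4"
    const match = caption.match(new RegExp(`${field}\\s*:?\\s*(\\d)`, 'i'));
    scores[field] = match ? Number(match[1]) : null;
  }
  return scores;
}

// parseScores('Cheese: 4, Sauce: 3, Crust: 5')
//   -> { cheese: 4, sauce: 3, crust: 5 }
```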
Not all images posted on that Instagram account are of pizza slices, so we need a way to filter out the other images. One way to do that is with the Google Vision API (see the vision folder): you send an image to their servers and get back a set of labels (like 'food' or 'dish' for a pizza slice).
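A sketch of what such a filter might look like with the @google-cloud/vision client; the client version, the labels checked, and the exact logic are assumptions and may differ from what the vision folder actually does:

```js
// Sketch of a pizza-image filter using Google Vision label detection.
// Assumes the @google-cloud/vision client library; the real code in the
// vision folder may use a different version or filtering rule.
const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient();

async function looksLikePizza(imagePath) {
  const [result] = await client.labelDetection(imagePath);
  const labels = (result.labelAnnotations || []).map(l => l.description.toLowerCase());
  // Treat images labeled as pizza/food/dish as pizza slices.
  return labels.some(label => ['pizza', 'food', 'dish'].includes(label));
}
```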
Once I have all the data, what's left is to get some additional information (like latitude and longitude) for the different pizza places from the Google Places API, and then put everything on a map. The result is here, and the code for it is on the gh-pages branch.
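Something along these lines can fetch the coordinates for a pizza place by name. This sketch calls the Places Text Search HTTP endpoint directly and assumes Node 18+ for the built-in fetch, which may not match how the original scraper does it:

```js
// Sketch: look up a pizza place's coordinates via the Places Text Search endpoint.
// Assumes GOOGLE_PLACES_API_KEY is set and Node 18+ (built-in fetch).
async function getCoordinates(placeName) {
  const url = 'https://maps.googleapis.com/maps/api/place/textsearch/json'
    + `?query=${encodeURIComponent(placeName + ' New York')}`
    + `&key=${process.env.GOOGLE_PLACES_API_KEY}`;
  const response = await fetch(url);
  const data = await response.json();
  const first = data.results && data.results[0];
  // geometry.location holds { lat, lng } for the best match
  return first ? first.geometry.location : null;
}
```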
Create a .env file in the root directory containing the following environment variables:
GOOGLE_PLACES_API_KEY=API key for Google Places
GOOGLE_APPLICATION_CREDENTIALS=path to a JSON key file given by Google Vision API
GCLOUD_PROJECT=name of the gcloud project used to get the API keys
Have a local MongoDB instance running, with a database named pizza (all scraped data is stored there).
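For reference, a minimal sketch of the setup the scraper needs, assuming the dotenv package and the official mongodb Node driver (the actual code may load these differently):

```js
// Sketch: load the .env variables and open the local pizza database.
// Assumes the dotenv package and the official mongodb Node driver.
require('dotenv').config();
const { MongoClient } = require('mongodb');

async function getPizzaDb() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  return client.db('pizza'); // all scraped data is stored here
}
```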
$ node scrape.js
Running this command will fill the pizza database with information about every $1 pizza place featured on the Instagram account!
I then wrote a little script to export all the data from the database to a JSON object, which is used by the website on the gh-pages branch.
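The export boils down to reading the collection and dumping it to disk. A hedged sketch, where the collection name ('slices') and the output path are assumptions rather than the names used in the repo:

```js
// Sketch: dump every document from the pizza database to a JSON file.
// The collection name ('slices') and output path are assumptions.
require('dotenv').config();
const fs = require('fs');
const { MongoClient } = require('mongodb');

async function exportToJson() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const slices = await client.db('pizza').collection('slices').find().toArray();
  fs.writeFileSync('data.json', JSON.stringify(slices, null, 2));
  await client.close();
}

exportToJson();
```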