★★★ The worst part? You don't have an option whether you like it or not - https://maps.google.com/?cid=15088333349668026791
People review the darndest things. In this case, embassies.
What / Why?
For all user-generated content on the Internet, there's a signal-to-noise ratio. My theory is that in situations where the abstraction doesn't work, there's very little signal -- it's hard to create meaningful content in a context that's mostly nonsensical. Turns out the noise is often pretty funny.
Reviews of embassies fit the bill for this because reviews were designed to help individuals make comparisons between competitors. It works well for any situation where there's differentiation - Cafes, restaurants, and software are classic examples, which is why they're rated so frequently. It works less well for pure commodities, where you'd probably only write a review if something really bad happened - for example, at a petrol / gas station, or a chain supermarket. It works terribly for things that are completely non-competitive - for example, bridges ("The supporting cables used to vibrate, then they put supporting supporting cables. Good looking bridge, actually quite pleasant to walk across.").
To me, embassies are the epitome of non-competitiveness - you often have to go there to get a visa, but people don't choose countries to go to based on reviews of embassies. You're stuck with it, no matter how you feel about it.
The code / how it works
- Grab a key for the Google Places API, and put it in
- Probably add a credit card to the project if you're doing development - the daily limits get exhausted quickly if you're fetching lots of reviews.
What's in the repo?
I've designed this in a series of discrete stages that all have output files, because fetching complete data sets is slow, and so I wanted to be able to iterate on each stage. Each stage is contained within a folder (more or less):
./data/- contains information on cities and countries. See
data/README.mdfor details on sources.
./process_cities- processes the data in
/data, outputs a list of potentially interesting (lat,lng) pairs to use as inputs to the places API. Initially, this is just based on the capital cities of each country, as well as cities with a population larger than 500 000.
./fetch_embassies- Reads (lat,lng) pairs from
stdinand fetches as many embassies within 50km of each of these as the Places API will allow. Outputs the Places API placeid for each of these to
stdout. Note that this might return duplicates, you might want to pipe the output through
| sort | uniq.
./fetch_reviews- Takes a list of Place IDs and fetches reviews for all of them. Outputs reviews to JSON, along with the place name and a URL that links to Maps.
./process_reviews- Filters and formats the reviews for Twitter. The format is
"★★★ [Review Text] [Maps URL]". Input is JSON from
./fetch_reviews, Output is a pretty-printed JSON array of strings. Does some clever things to shorten what would be > 140 character tweets down to the limit.
./tweeter/- A python thingy that tweets random entries from the included JSON file. The code here is super reusable - just dump new tweets in
tweetsrc.json, and oauth credentials in
ouath.json.tmplfor a template), build and deploy to lambda.
./tweeter/into a zip file that can be uploaded to AWS lambda for periodic tweets.