JavaScript Python Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
crawling
css
dist
example
images
img
js
learning
.DS_Store
Control.FullScreen.css
Control.FullScreen.js
LICENSE
README
icon-fullscreen.png
index.html
real_world.html
style.css

README

Freedom Hack



Our hack is called The All Street Journal

It is a street news browsing application, with an interactive GIS interface. News is organized by locality and topic. The user can choose one or more topics of interest (eg crime, politics etc), and also narrow down to the locality of interest.



For this demo we are showing you how this application works for the whole of New Delhi - but clearly this can be replicated for any city in the world.



We started by collecting thousands of news articles by crawling online newspapers. Important to note, we are not using RSS feeds - so this technique can be applied to ANY news article. Now we needed an algorithm to tag each article to a topic and a locality. We hand tagged some news articles to their topics, and used an active machine learning approach to build a topic tagging algorithm. The active approach meant an exponential reduction in the amount of hand tagging needed. Once we had topics for all the news articles in this way, we used THAT data to derive street / locality tagging. This was done using a technique that we call CONTEXT DEPENDENT GEOPARSING, where we attempt to find out which location is relevant to the NEWS, by giving more emphasis to the location-revealing words in the article that are closer to the topic-revealing keywords.



Once we had the topic and locality tags, we projected this data on to a GUI based on the OPENSTREET GIS platform.



We've used only open source technologies in this hack. Our backend - the crawling and machine learning part - is coded in PYTHON, and the GUI uses OPENSTREET GIS, along with a number of openly available GITHUB libraries like leaflet, etc.



Regarding accuracy of the tagging, we reserved a part of the hand tagged data for evaluation and testing, and these are the results we saw. Across all the topics, we saw at least 87% accuracy of topic prediction, and at least 80% accuracy of location prediction, which we think are really encouraging numbers and could go commercial after some refinement.