Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
Freedom Hack Our hack is called The All Street Journal It is a street news browsing application, with an interactive GIS interface. News is organized by locality and topic. The user can choose one or more topics of interest (eg crime, politics etc), and also narrow down to the locality of interest. For this demo we are showing you how this application works for the whole of New Delhi - but clearly this can be replicated for any city in the world. We started by collecting thousands of news articles by crawling online newspapers. Important to note, we are not using RSS feeds - so this technique can be applied to ANY news article. Now we needed an algorithm to tag each article to a topic and a locality. We hand tagged some news articles to their topics, and used an active machine learning approach to build a topic tagging algorithm. The active approach meant an exponential reduction in the amount of hand tagging needed. Once we had topics for all the news articles in this way, we used THAT data to derive street / locality tagging. This was done using a technique that we call CONTEXT DEPENDENT GEOPARSING, where we attempt to find out which location is relevant to the NEWS, by giving more emphasis to the location-revealing words in the article that are closer to the topic-revealing keywords. Once we had the topic and locality tags, we projected this data on to a GUI based on the OPENSTREET GIS platform. We've used only open source technologies in this hack. Our backend - the crawling and machine learning part - is coded in PYTHON, and the GUI uses OPENSTREET GIS, along with a number of openly available GITHUB libraries like leaflet, etc. Regarding accuracy of the tagging, we reserved a part of the hand tagged data for evaluation and testing, and these are the results we saw. Across all the topics, we saw at least 87% accuracy of topic prediction, and at least 80% accuracy of location prediction, which we think are really encouraging numbers and could go commercial after some refinement.