Tasks for Interns

datameet-pune edited this page Jul 24, 2017 · 20 revisions

Check out these tasks, and email Nikhil if you want to take one up.

Non-skill-specific / Excel operations

  • PMPML data: Assign UID to BRTS stops and nonBRTS stops see here
  • Study and Analysis of GTFS specification (global standard for public transport data) see here
  • Pune Development Plan maps warping see here
  • MSRTC data: find lat-long of bus stops/stands see here
  • Data gathering for various topics where official datasets aren't publicly available and we can work on organically building them up. Examples: urban farms, organic/ZBNF farms, tree plantation sites, public schools. (For some, there may be NGOs / volunteer groups already working on them and we could liaise with them.)
  • Pune Budget 2017-18 data: curating the published Excel data and bringing it into a flat, standard tabular structure. see here
  • Pune elections and wards data: Transcribing ward-wise admin and zone data from the image shared by PMC and adding it to the tables already made. see here
  • Pune elections and wards data : gathering photos and other info of elected corporators. see here

R, database operations, data analysis, data visualization related

  • Check out Datazar and similar online platforms for loading data and performing R tasks collaboratively online, find limitations of free accounts, report back to group.
  • Separate routes data into BRTS-side and nonBRTS-side see here
  • Pune Budget data: Load into a database and create queries and views. The 2017-18 data still needs cleaning, but the 2016-17 data is ready to be loaded. see here
  • Pune Budget data: Check out standard formats shared by folks at Open Budgets India portal and create queries to adapt our data to that. see here
  • Pune Budget data: year on year comparisons, including comparing budgeted vs actual expenditure for years by combining different years' budget data see here
  • MH Villages mapping : finding district-wise, taluka-wise village counts etc from shapefile metadata and census data and comparing them, flagging differences see here
  • MSRTC data: find and publish statistics of bus stops by taluka, district etc see here
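
For the village-count comparison task above, here is a minimal sketch of the idea in Python (the same logic carries over to R or SQL). The field names `district`, `taluka` and `village`, and the sample rows, are made up for illustration. The real shapefile metadata and census tables will have their own column names.

```python
from collections import Counter

def count_by_taluka(rows):
    """Count villages per (district, taluka) pair."""
    return Counter((r["district"], r["taluka"]) for r in rows)

def flag_differences(shapefile_rows, census_rows):
    """Return {(district, taluka): (shapefile_count, census_count)}
    for every pair where the two sources disagree."""
    shp = count_by_taluka(shapefile_rows)
    cen = count_by_taluka(census_rows)
    return {
        key: (shp.get(key, 0), cen.get(key, 0))
        for key in sorted(set(shp) | set(cen))
        if shp.get(key, 0) != cen.get(key, 0)
    }

# Hypothetical sample rows, only to show the shape of the data.
shapefile_rows = [
    {"district": "Pune", "taluka": "Haveli", "village": "A"},
    {"district": "Pune", "taluka": "Haveli", "village": "B"},
    {"district": "Pune", "taluka": "Mulshi", "village": "C"},
]
census_rows = [
    {"district": "Pune", "taluka": "Haveli", "village": "A"},
    {"district": "Pune", "taluka": "Haveli", "village": "B"},
    {"district": "Pune", "taluka": "Haveli", "village": "D"},
    {"district": "Pune", "taluka": "Mulshi", "village": "C"},
]

differences = flag_differences(shapefile_rows, census_rows)
for (district, taluka), (n_shp, n_cen) in differences.items():
    print(f"{district} / {taluka}: shapefile={n_shp}, census={n_cen}")
```

A taluka that is missing entirely from one source shows up with a count of 0 on that side, so it gets flagged too.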

GIS, mapping related

  • PMPML data : Mapping routes, calculating distance between consecutive stops from lat-long and flagging any routes where this distance is too great or it looks buggy on the map. see here
  • MH Villages mapping : geo-referencing / map-warping taluka pdfs from MRSAC to web-map and comparing with shapefile to detect anomalies, missing villages etc. see here
  • MH Villages mapping : tracking shapes with repeating village codes, comparing with taluka PDFs and deciding if they are to be merged, assigned different codes etc see here
  • MH Villages mapping : tracking shapes with blank village codes, comparing with taluka PDFs to figure out if they belong anywhere see here
  • MH Villages mapping : Map MLA constituencies to villages, so that each village's metadata records which constituency it belongs to, and from that we can find who the MLA is etc. Publish this census-code-to-constituency lookup table separately. see here
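
For the PMPML route-checking task above, the distance between consecutive stops can be estimated from lat-long with the haversine formula. This is only a sketch: the stop names and coordinates are invented, and the 3 km threshold is an assumption that would need tuning against real routes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_long_hops(stops, max_km=3.0):
    """stops: list of (name, lat, lon) in route order.
    Returns (from_stop, to_stop, distance_km) for hops longer than max_km."""
    flagged = []
    for (name_a, lat_a, lon_a), (name_b, lat_b, lon_b) in zip(stops, stops[1:]):
        d = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if d > max_km:
            flagged.append((name_a, name_b, round(d, 2)))
    return flagged

# Hypothetical route; the third stop has a deliberately wrong latitude.
route = [
    ("Stop A", 18.5000, 73.8500),
    ("Stop B", 18.5050, 73.8550),
    ("Stop C", 18.9000, 73.8500),
]

flagged = flag_long_hops(route)
for a, b, km in flagged:
    print(f"{a} -> {b}: {km} km looks too far, check on the map")
```

Haversine gives straight-line distance, not road distance, so it is best used to catch obviously wrong coordinates rather than to measure route length.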

Programming projects


Places where things are not clear yet and there's scope to figure things out

  • MSRTC data: find lat-long of bus stops/stands see here : automating this, using address lookup, census lookup, matching with MH Villages data etc.
  • MSRTC routes, timetables : Explore ways to gather this data.
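
One possible starting point for automating the bus stop lat-long lookup is fuzzy-matching stop names against village names that already have coordinates. The sketch below uses Python's standard `difflib`. The stop names, village names and the 0.8 cutoff are all assumptions for illustration, and this is just one approach to explore, not a settled method.

```python
import difflib

def match_stops_to_villages(stop_names, village_names, cutoff=0.8):
    """Map each stop name to the closest village name, or None if no
    village is similar enough. Comparison is case-insensitive."""
    lookup = {name.lower(): name for name in village_names}
    results = {}
    for stop in stop_names:
        hits = difflib.get_close_matches(
            stop.lower(), list(lookup), n=1, cutoff=cutoff
        )
        results[stop] = lookup[hits[0]] if hits else None
    return results

# Hypothetical inputs: one exact name, one misspelling, one with no match.
villages = ["Shirur", "Baramati", "Junnar"]
stops = ["Shirur", "Baramatti", "Xyz Depot"]

matches = match_stops_to_villages(stops, villages)
for stop, village in matches.items():
    print(f"{stop} -> {village if village else 'no match, look up manually'}")
```

Village names repeat across talukas, so in practice the match would need to be restricted by district or taluka, and unmatched or ambiguous stops would still need a manual check.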

Plenty of tasks to go around! More will be added soon! Pick a task and contact Nikhil to get started! You might have to be added as an editor to a spreadsheet etc.

Note: We expect the interns to maintain detailed logs of the steps they carry out in the tasks they take up, and to take screenshots of the most important steps. These would be added to the wiki, either directly by the intern, or they can make a repo of their own, add to its wiki, and we'll merge it in once they're done. GitHub Markdown syntax is damn easy and super-cool and we expect interns to learn it and get acquainted with it.