
Tasks for Interns

answerquest edited this page Jul 27, 2017 · 20 revisions

Update: Repo for interns (with full write access) created, click here to check out your virtual office!

Check out these tasks, and alert the group on the #pune Slack channel when you're ready to take one up. For some tasks you may need to be added to a spreadsheet or similar, so contact Nikhil for that.

Not skill-specific / tabular operations

  • PMPML data: Assign UID to BRTS stops and nonBRTS stops see here
  • Study and Analysis of GTFS specification (global standard for public transport data) see here
  • Pune Development Plan maps warping see here
  • MSRTC data: find lat-long of bus stops/stands see here
  • Data gathering for various topics where official datasets aren't publicly available and we can work on organically building them up. Example: urban farms, organic/zbnf farms, tree plantation sites, public schools. (For some, there may be NGOs / volunteer groups already working on them and we could liaise with them)
  • Pune Budget 2017-18 data: curating the published Excel data and converting it into a flat, standard tabular structure. see here
  • Pune elections and wards data: transcribing ward-wise admin, zone data from the image shared by PMC and adding it to the tables already made. see here
  • Pune elections and wards data : gathering photos and other info of elected corporators. see here
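As a rough illustration of what "flat standard tabular structure" could mean for the budget task above, here is a minimal sketch. It assumes (we have not checked the published file) that the sheet uses heading rows where only the first cell is filled, followed by line-item rows. The function name `flatten_grouped_rows` and the sample values are hypothetical placeholders, not real budget data.

```python
def flatten_grouped_rows(rows):
    """Turn heading-plus-items rows into flat rows.

    Assumption: a row with only its first cell filled is a group heading
    (for example a department name); every other non-blank row is a line
    item that belongs to the most recent heading.
    """
    flat = []
    current_group = None
    for row in rows:
        # Normalise cells to stripped strings; empty cells become "".
        cells = ["" if c is None else str(c).strip() for c in row]
        if not any(cells):
            continue  # skip fully blank rows
        if cells[0] and not any(cells[1:]):
            current_group = cells[0]  # remember the heading, emit nothing
            continue
        # Prefix each line item with the heading it sits under.
        flat.append([current_group] + cells)
    return flat


# Hypothetical sample rows, as they might come out of a spreadsheet export.
sample_rows = [
    ["Department A", None, None],
    ["Item 1", "100", "120"],
    [None, None, None],
    ["Department B", "", ""],
    ["Item 2", "50", "40"],
]

for flat_row in flatten_grouped_rows(sample_rows):
    print(flat_row)
```

Each output row carries its own heading, so the table can be sorted, filtered or loaded into a database without depending on row order. The real sheets may need extra rules (merged cells, multi-level headings), which this sketch does not handle.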

R, database operations, data analysis, data visualization related

  • Check out Datazar and similar online platforms for loading data and performing R tasks collaboratively online, find limitations of free accounts, report back to group.
  • Separate routes data into BRTS-side and nonBRTS-side see here
  • Pune Budget data: Load into a database and create queries, views. The 2017-18 data still needs cleaning, but the 2016-17 data is ready to be loaded. see here
  • Pune Budget data: Check out standard formats shared by folks at Open Budgets India portal and create queries to adapt our data to that. see here
  • Pune Budget data: year on year comparisons, including comparing budgeted vs actual expenditure for years by combining different years' budget data see here
  • MH Villages mapping : finding district-wise, taluka-wise village counts etc from shapefile metadata and census data and comparing them, flagging differences see here
  • MSRTC data: find and publish statistics of bus stops by taluka, district etc see here
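For the MH Villages counting task above, the comparison step could look something like the sketch below. It assumes each source (shapefile metadata, census data) has already been reduced to a list of `(district, taluka, village_code)` tuples. The function names and the sample records are hypothetical placeholders.

```python
from collections import Counter


def count_by_taluka(records):
    """Count villages per (district, taluka) pair."""
    return Counter((district, taluka) for district, taluka, _code in records)


def flag_count_differences(shapefile_records, census_records):
    """Return (district, taluka, shapefile_count, census_count) wherever the counts differ."""
    shp = count_by_taluka(shapefile_records)
    cen = count_by_taluka(census_records)
    diffs = []
    # Walk the union of keys so talukas missing from one source are caught too.
    for key in sorted(set(shp) | set(cen)):
        if shp[key] != cen[key]:  # a Counter gives 0 for a missing key
            diffs.append((key[0], key[1], shp[key], cen[key]))
    return diffs


# Hypothetical sample records, not real village codes.
shapefile_sample = [
    ("District A", "Taluka X", "001"),
    ("District A", "Taluka X", "002"),
    ("District A", "Taluka Y", "003"),
]
census_sample = [
    ("District A", "Taluka X", "001"),
    ("District A", "Taluka X", "002"),
    ("District A", "Taluka Y", "003"),
    ("District A", "Taluka Y", "004"),
]

for diff in flag_count_differences(shapefile_sample, census_sample):
    print(diff)
```

Summing the same counters by district gives the district-wise totals. The same thing can be done in R or with a database `GROUP BY`, whichever the person taking up the task prefers.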

GIS, mapping related

  • PMPML data : Mapping routes, calculating distance between consecutive stops from lat-long and flagging any routes where this distance is too great or it looks buggy on the map. see here
  • MH Villages mapping : geo-referencing / map-warping taluka pdfs from MRSAC to web-map and comparing with shapefile to detect anomalies, missing villages etc. see here
  • MH Villages mapping : tracking shapes with repeating village codes, comparing with taluka PDFs and deciding if they are to be merged, assigned different codes etc see here
  • MH Villages mapping : tracking shapes with blank village codes, comparing with taluka PDFs to figure out if they belong anywhere see here
  • MH Villages mapping : Map MLA constituencies to villages, so that the metadata of every village tells us which constituency it is in, and from there who the MLA is, etc. Publish this census code to constituency lookup table separately. see here
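For the PMPML route-checking task in this list, one common way to get the distance between consecutive stops from lat-long is the haversine (great-circle) formula. The sketch below is only an illustration: the 3 km threshold, the function names and the sample coordinates are assumptions, and the right threshold is something to settle with the group.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_long_hops(stops, max_km=3.0):
    """stops: ordered list of (name, lat, lon) along one route.

    Returns (from_stop, to_stop, distance_km) for every consecutive pair
    that is further apart than max_km (an assumed threshold).
    """
    flagged = []
    for (n1, lat1, lon1), (n2, lat2, lon2) in zip(stops, stops[1:]):
        d = haversine_km(lat1, lon1, lat2, lon2)
        if d > max_km:
            flagged.append((n1, n2, round(d, 2)))
    return flagged


# Hypothetical route: the third stop has a suspicious latitude.
sample_route = [
    ("Stop A", 18.50, 73.85),
    ("Stop B", 18.51, 73.86),
    ("Stop C", 19.00, 73.86),
]

for hop in flag_long_hops(sample_route):
    print(hop)
```

A flagged hop is not always an error (some routes do have long gaps between stops), so flagged pairs still need a look on the map.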

Programming projects


Places where things are not clear yet and there's scope to figure things out

  • MSRTC data: find lat-long of bus stops/stands see here : automating this, using address lookup, census lookup, matching with MH Villages data etc.
  • MSRTC routes, timetables : Explore ways to gather this data.

Plenty of tasks to go around! There's more that will be added soon!


Some things to take care of first

  • Create your free account on GitHub if you don't already have one.
  • We expect the interns to maintain detailed logs of the steps they do in the tasks they take up, and take screenshots of the most important steps.
  • This logging will be done on this repo we've set up for the interns where they will have edit access. We'll leave it to you guys to sort out what goes where. You can start by going to the wiki and making a page for yourself. In case of a team working together, they need only have one document of the task.
  • GitHub Markdown syntax is damn easy and super-cool, and we expect interns to learn it and get acquainted with it as they log their progress. There are also tools for converting from Word to Markdown. For offline editors, check out Remarkable.
  • Create a free account on http://imgur.com if you don't already have one. This will be where you upload all your screenshots. On uploading, you'll get a Direct Link URL which you'll embed in your work log.
  • Join the datameet slack network. It's linked on the home page. Over there (#pune channel) you can post which task you're taking up, whenever you're ready to get started.

Support structure

  • Stuck somewhere? Post it here: https://github.com/datameet-pune/interns/issues. It doesn't have to be about code, it can be as simple as "can you help me sort this table properly". We're all here to learn and this internship has people from different backgrounds and a variety of skillsets. By using the open forum space, we can make it possible for peers and even strangers on the web to help us.
  • Use the datameet slack network, #pune channel to post queries and discuss things. Heck, start an issue and post its link on the slack channel.
  • You can also join the whatsapp group (see home page for joining link) and chat there. But please keep it short over there and strictly no forwards.
  • There are many volunteers in our network who can guide you in specific matters. Reach out.

PS: Are you new here? Please see our home page and the call for internships page to know what this is all about.