Tasks for Interns

answerquest edited this page Jul 31, 2017 · 20 revisions

Update: A repo for interns (with full write access) has been created. Click here to check out your virtual office!

Check out these tasks, and alert the group on the #pune slack channel when you're ready to take one up. For some tasks you might need to be added to a spreadsheet or similar, so contact Nikhil for that.

Data Wrangling, data gathering related

project | task | who's on it
PMPML | Assign UID to BRTS stops and nonBRTS stops. link |
PMPML | Study and Analysis of GTFS specification (global standard for public transport data) link |
Development Plan | Maps warping link |
MSRTC | find lat-long of bus stops/stands link |
Pune Budget | curating the published excel data, bringing it to a flat standard tabular structure form link |
Pune electoral | Transcribing ward-wise admin, zone data from image shared by PMC and adding into the tables made. link |
Pune electoral | gathering photos and other info of elected corporators. link |

R, database operations, data analysis, data visualization related

project | task | who's on it
Misc | Check out Datazar and similar online platforms for loading data and performing R tasks collaboratively online, find limitations of free accounts |
PMPML | Separate routes data into BRTS-side and nonBRTS-side link |
Pune Budget | Load into a database and create queries, views. 2017-18 is to be cleaned but 2016-17 data is ready for databasing. link |
Pune Budget | Check out standard formats shared by folks at Open Budgets India portal and create queries to adapt our data to that. link |
Pune Budget | year on year comparisons, including comparing budgeted vs actual expenditure for years by combining different years' budget data link |
MH Villages | finding district-wise, taluka-wise village counts etc from shapefile metadata and census data and comparing them, flagging differences link |
MSRTC | find and publish statistics of bus stops by taluka, district etc link |
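
For the MH Villages count-comparison task above, here is a minimal sketch in Python (any tool works, see Note 2 below; R or SQL would do equally well). It assumes both sources have already been loaded as lists of rows with a `taluka` field. The field names, function names and sample rows are made up for illustration and are not taken from the actual shapefile or census data.

```python
from collections import Counter


def count_by_key(rows, key):
    """Count rows per value of `key` (e.g. villages per taluka)."""
    return Counter(row[key] for row in rows)


def flag_count_differences(shapefile_rows, census_rows, key="taluka"):
    """Return {key_value: (shapefile_count, census_count)} wherever the counts differ."""
    shp = count_by_key(shapefile_rows, key)
    cen = count_by_key(census_rows, key)
    return {
        k: (shp.get(k, 0), cen.get(k, 0))
        for k in sorted(set(shp) | set(cen))
        if shp.get(k, 0) != cen.get(k, 0)
    }


# Made-up rows for illustration only.
shapefile_rows = [
    {"taluka": "Taluka X", "village": "V1"},
    {"taluka": "Taluka X", "village": "V2"},
    {"taluka": "Taluka Y", "village": "V3"},
]
census_rows = [
    {"taluka": "Taluka X", "village": "V1"},
    {"taluka": "Taluka X", "village": "V2"},
    {"taluka": "Taluka Y", "village": "V3"},
    {"taluka": "Taluka Y", "village": "V4"},
]

print(flag_count_differences(shapefile_rows, census_rows))
```

Passing `key="district"` would give the district-wise comparison, provided the rows carry that field.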

GIS, mapping related

project | task | who's on it
PMPML | Mapping routes, calculating distance between consecutive stops from lat-long and flagging any routes where this distance is too great or it looks buggy on the map. link |
MH Villages | geo-referencing / map-warping taluka pdfs from MRSAC to web-map and comparing with shapefile to detect anomalies, missing villages etc. link |
MH Villages | tracking shapes with repeating village codes, comparing with taluka PDFs and deciding if they are to be merged, assigned different codes etc link |
MH Villages | tracking shapes with blank village codes, comparing with taluka PDFs to figure out if they belong anywhere link |
MH Villages | Map MLA constituencies to villages, so that in the metadata of every village we can find which constituency it is and then find who is the MLA etc. Publish this census code to constituency lookup table separately. link |
MH Villages | Track villages migrated to new district formed, Palghar link |
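
For the PMPML task above (distance between consecutive stops from lat-long), here is a minimal sketch in Python using the haversine great-circle formula. The function names, the 3 km threshold and the sample coordinates are assumptions for illustration only; the right threshold for BRTS vs nonBRTS routes is something to work out from the real data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_long_hops(stops, max_km=3.0):
    """Return (from_stop, to_stop, distance_km) for consecutive stops farther apart than max_km.

    stops: list of (name, lat, lon) tuples in route order.
    """
    flagged = []
    for (name_a, lat_a, lon_a), (name_b, lat_b, lon_b) in zip(stops, stops[1:]):
        d = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if d > max_km:
            flagged.append((name_a, name_b, round(d, 2)))
    return flagged


# Made-up coordinates for illustration only, not real PMPML stops.
sample_route = [
    ("Stop A", 18.5000, 73.8500),
    ("Stop B", 18.5050, 73.8550),
    ("Stop C", 18.6000, 73.9500),  # suspiciously far from Stop B
]

print(flag_long_hops(sample_route))
```

A flagged hop is only a hint: it may be a genuine long stretch, a wrong lat-long, or stops listed out of order, so it still needs a look on the map.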

Programming projects

project | task | who's on it
PMPML | GTFS feed creation link | Gaurav
MH Villages | DIY Map Choropleth Plotter link |
MH Villages + others | Match the following link |
MH Villages + others | Find shape from lat-long link |

Projects/tasks that are still in early stages and things aren't clear yet

project | task | link
Mapping | Crowdsourcing Map-based data | link
Misc | Data gathering for various topics where official datasets aren't publicly available and we can work on organically building them up. Example: urban farms, organic/zbnf farms, tree plantation sites, public schools. (For some, there may be NGOs / volunteer groups already working on them and we could collaborate with them) |
MSRTC | find lat-long of bus stops/stands. Automating this, using address lookup, census lookup, matching with MH Villages data etc. | link
MSRTC | Explore ways to gather routes, timetables data |

Note 1: Minimum commitment

While you can take up any of the tasks here, or come up with tasks yourself, this internship requires you to spend a minimum of 1 hr/week on the basic data cleaning and curating tasks listed in the first table on this page.

Note 2: Method-agnostic

Our starting point is real-world data rather than some academic discipline, and we're agnostic about exactly which way the work gets done. For example, we care more about getting the city's bus stops and routes data properly managed than about doing it exclusively through R or MySQL or JSON or Excel etc. In our work with this kind of data, we have observed time and again that projects require working in interdisciplinary ways (and you can see that above). So one can say we have an object-oriented instead of procedural approach.

We'll leave it to you to figure out for yourself which project/task matches the methodology you want to work on, or you can take any topic and take it forward in your chosen methodology yourself. But it might also be beneficial for you to work outside of your predefined subject area in the course of this internship and focus instead on a social subject like public transit, water, public finances etc. There are many paths to the mountain-top!

Some things to take care of first

  • Create your free account on github if you don't already have one.
  • We expect the interns to maintain detailed logs of the steps they do in the tasks they take up, and take screenshots of the most important steps.
  • This logging will be done on this repo we've set up for the interns where they will have edit access. We'll leave it to you guys to sort out what goes where. You can start by going to the wiki and making a page for yourself. In case of a team working together, they need only have one document of the task.
  • Github markdown syntax is damn easy and super-cool and we expect interns to learn and get acquainted with it as they log their progress. There are also tools for converting from word to markdown. For offline editors, check out Remarkable.
  • Create a free account on http://imgur.com if you don't already have one. This will be where you upload all your screenshots. On uploading, you'll get a Direct Link URL which you'll embed in your work log.
  • Join the datameet slack network. It's linked on the home page. Over there (#pune channel) you can post which task you're taking up, whenever you're ready to get started.

Support structure

  • Stuck somewhere? Post it here: https://github.com/datameet-pune/interns/issues. It doesn't have to be about code, it can be as simple as "can you help me sort this table properly". We're all here to learn and this internship has people from different backgrounds and a variety of skillsets. By using the open forum space, we can make it possible for peers and even strangers on the web to help us.
  • Use the datameet slack network, #pune channel to post queries and discuss things. Heck, start an issue and post its link on the slack channel.
  • You can also join the whatsapp group (see home page for joining link) and chat there. But please keep it short over there and strictly no forwards.
  • There are many volunteers in our network who can guide you in specific matters. Reach out.

PS: Are you new here? Please see our home page and the call for internships page to know what this is all about.
