Overview

SHRUG v2.0 Pakora contains many improvements, detailed below. Users should be careful not to combine data across SHRUG versions. For a quickstart tutorial, please see here.

New Data

New administrative data.

SHRUG v2.0 includes most variables from the Socioeconomic and Caste Census (household asset data for 1 billion households), a new range of town and village socioeconomic fields, bank branch information, and much more. For a complete list of tables and fields, see the documentation website.

New spatial data.

SHRUG v2.0 scales up some of the core fields of most interest to researchers, with up-to-date agricultural and weather data, Facebook population and relative wealth index data, pollution data, and more.

Data at all levels of aggregation

Download any data field at any level of geography. Indian data is famously inconsistent: district boundaries change over time, and there are multiple competing boundaries at the same geographic scale (e.g., tehsil, block, constituency). A common constraint facing researchers is having data at the wrong geographic aggregation and spending months linking data across different aggregations.

We have designed a tool called IdConverter that automates the transformation of datasets from one aggregation unit to another. For instance, it can transform data from 2011 districts to India’s current “local government directory” districts, which will be the basis for the 2021 Population Census. This replaces a process that researchers would normally carry out by hand. The algorithm decides how to handle districts that have split or combined using all the information that is available, and makes reasonable (but modifiable) default choices when assumptions are needed to combine data (e.g., when two districts are combined, but data are only available for one of them).
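The kind of transformation described above can be illustrated with a minimal sketch. This is not IdConverter's actual code or interface; the function name `convert_units`, the crosswalk format, the weights, and the `allow_partial` switch are all assumptions made for illustration. It re-aggregates count data through a weighted crosswalk, apportioning a split unit across its successors and summing merged units, with a modifiable default for the case where only some of the merged units have data.

```python
from collections import defaultdict


def convert_units(values, crosswalk, allow_partial=True):
    """Re-aggregate count data from source units to target units.

    values:        {source_id: count, or None when data are missing}
    crosswalk:     list of (source_id, target_id, weight) rows; the weights
                   for one source_id are assumed to sum to 1
    allow_partial: hypothetical default choice. If True, a target built from
                   several sources uses whatever data exist; if False, it is
                   set to None when any contributing source is missing.
    """
    totals = defaultdict(float)
    has_data = set()      # targets that received at least one value
    incomplete = set()    # targets with at least one missing source

    for source_id, target_id, weight in crosswalk:
        value = values.get(source_id)
        if value is None:
            incomplete.add(target_id)
            continue
        totals[target_id] += weight * value
        has_data.add(target_id)

    converted = {}
    for target_id in sorted(has_data | incomplete):
        if target_id not in has_data:
            converted[target_id] = None
        elif target_id in incomplete and not allow_partial:
            converted[target_id] = None
        else:
            converted[target_id] = totals[target_id]
    return converted, incomplete


# Made-up example: district A splits into A1 and A2; districts B and C merge
# into BC, but only B has data.
values = {"A": 100.0, "B": 40.0, "C": None}
crosswalk = [
    ("A", "A1", 0.75),
    ("A", "A2", 0.25),
    ("B", "BC", 1.0),
    ("C", "BC", 1.0),
]
converted, incomplete = convert_units(values, crosswalk)
print(converted)
print("targets built from partial data:", sorted(incomplete))
```

With the default setting, BC takes the value from B alone and is reported as incomplete so the user can revisit the choice; passing `allow_partial=False` leaves BC missing instead. The sketch only suits additive quantities such as counts; rates or averages would need a different combination rule.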

Transforming data across aggregation levels is a major headache for socioeconomic researchers, and our toolkit will almost completely automate it. This is possible because of a set of keys across India’s geometries (which we have built up over the last decade) and our IdConverter algorithm. As a result, any dataset at any level of aggregation will be instantly linkable to the SHRUG, and the cost of linking new data will be close to zero. This also makes it nearly costless for users to contribute data to the SHRUG; the major barrier for most researchers with data to contribute is the problem of inconsistent geographic aggregations across datasets.

When users download data from the SHRUG, they can see the original aggregation used for data collection, but in most cases they will be able to download the fields they want at any level of aggregation.

Improved shrids.

In previous versions of the SHRUG, shrids were extremely large around India's most important cities. For v2.0 we have made a great effort to increase spatial granularity in these large urban areas by reducing the spatial extent of the shrid.

Open source village/town maps for India.

Village-, town-, and constituency-level shapefiles are prohibitively expensive and in high demand. While some open source data are available, they are incomplete, poorly labeled, and internally inconsistent. We are combining multiple open data sources to create the most comprehensive geometries available for India. These geographic elements alone will be transformative for the research and policy community. The open shapes also allow any spatial data to be seamlessly integrated with other data in SHRUG.

Breaking Changes

The original shrid identifier has been rebuilt entirely. The new ID variable is named shrid2. For more information, see here.
All data files have been repackaged and renamed.
The organizational framework for arranging variables within data files and download ZIP archives has changed. The SHRUG documentation site contains this information.
Assembly Constituency identifiers con08_id and con07_id have been updated.
Some variables have been renamed. Please see the metadata pages on the documentation website for details.
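Because the identifier has been rebuilt, files keyed on the old ID should not be merged with files keyed on shrid2. A minimal sketch of a pre-merge check follows. The column name `shrid` for the original identifier, the helper names, and the file names in the example are assumptions for illustration; only `shrid2` is named in these notes.

```python
def id_column(columns):
    """Return the SHRUG ID column found in a list of column names, if any."""
    if "shrid2" in columns:
        return "shrid2"   # v2.0 identifier named in the release notes
    if "shrid" in columns:
        return "shrid"    # assumed name of the pre-v2.0 identifier
    return None


def check_single_id_version(tables):
    """Raise ValueError if the tables are keyed on different SHRUG IDs.

    tables: {table name: list of column names}
    """
    found = {name: id_column(columns) for name, columns in tables.items()}
    versions = {column for column in found.values() if column is not None}
    if len(versions) > 1:
        raise ValueError(f"Tables mix SHRUG identifier versions: {found}")
    return found


# Hypothetical file and variable names, for illustration only.
tables = {
    "population.csv": ["shrid2", "population_total"],
    "roads.csv": ["shrid2", "road_length"],
}
print(check_single_id_version(tables))
```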

Other Versions

For the most recent version of the SHRUG, visit the Development Data Lab website. For older versions, please visit the Harvard Dataverse. Please ignore the Harvard version numbering system.

New Data

This release includes more PMGSY road variables than the previous release, with a larger number of observations and data for both upgraded and newly constructed roads.

The second major addition is a set of 1000 bootstrapped consumption imputations, which allow users to account for the error associated with estimating per-capita consumption for the SHRUG.
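One common way to use a set of bootstrapped imputations is to compute the statistic of interest once per draw and then summarize across draws. The sketch below shows that pattern under stated assumptions: it is not taken from the SHRUG codebook, the function name and the toy numbers are made up, and the codebook should be consulted for the recommended procedure.

```python
import statistics


def summarize_draws(draws, statistic):
    """Compute a statistic on every imputation draw and summarize the results.

    draws:     list of draws, each a list of imputed per-capita consumption values
    statistic: function mapping one draw to a single number
    Returns (mean of the per-draw estimates, their standard deviation).
    """
    estimates = [statistic(draw) for draw in draws]
    point_estimate = statistics.fmean(estimates)
    # The spread across draws reflects the imputation error in the estimate.
    spread = statistics.stdev(estimates)
    return point_estimate, spread


# Toy data: three draws for two locations (the real release has 1000 draws).
draws = [
    [10.0, 20.0],
    [12.0, 22.0],
    [14.0, 24.0],
]
point_estimate, spread = summarize_draws(draws, statistics.fmean)
print(point_estimate, spread)
```

The spread across draws captures only the uncertainty from imputation; sampling error in the underlying analysis would need to be handled separately.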

The v1.4 Codebook, included here, contains more details on variable construction and usage notes.

Breaking Changes

PMGSY variables have been renamed; all variables now use the road_ prefix.

Other Versions

For the most recent version of the SHRUG, visit the Development Data Lab website. For older versions, please visit the Harvard Dataverse. Please ignore the Harvard version numbering system.

Releases: devdatalab/shrug-public

SHRUG v2.0 Pakora

SHRUG v1.5

SHRUG v1.4
