Releases: devdatalab/shrug-public
SHRUG v2.1 Pakora
Overview
SHRUG v2.1 Pakora is an incremental update that mainly focuses on fixing bugs in the SHRID and Assembly Constituency definitions. Users should exercise caution: some SHRIDs have had their populations or geometric extents corrected, and a handful of SHRID IDs have changed.
Changes to SHRIDs
We now correctly incorporate Urban Outgrowths (OGs) into our SHRIDs. Previously, some SHRIDs had their PC11 population overestimated because populations within OGs were double counted, and their spatial extents underestimated because OGs were not correctly included within the SHRIDs. As a result of this fix, several older SHRIDs have been reassigned to other SHRIDs, and some new SHRIDs have been created. See here for a list of changed SHRIDs.
There are also roughly 8,000 to 20,000 new SHRIDs in each census year, most of which are observed in only one census year. These are mostly villages that could not be matched to other years; we have included them for completeness. This results in a closer match to the total population from the 2011 Census.
Here is a summary table of the SHRID changes described above across all three Population Census waves:
Population Census Year | Number of changed SHRIDs | Number of new SHRIDs | Number of old SHRIDs that have been reassigned |
---|---|---|---|
2011 | 525 | 19855 | 60 |
2001 | 285 | 15555 | 60 |
1991 | 58 | 8427 | 57 |
For maximum analytical accuracy, we strongly recommend upgrading to the v2.1 SHRIDs and not mixing them with older versions of the SHRUG.
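One way to guard against accidentally mixing versions is to compare the ID sets of two extracts before merging them. The sketch below is illustrative only: the IDs are made-up placeholders rather than real SHRIDs, and `compare_id_sets` is a hypothetical helper, not part of any SHRUG tooling.

```python
def compare_id_sets(old_ids, new_ids):
    """Report IDs that appear in only one of two extracts."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    return {
        "only_in_old": sorted(old_ids - new_ids),  # possibly reassigned or dropped
        "only_in_new": sorted(new_ids - old_ids),  # possibly newly created
        "shared": len(old_ids & new_ids),
    }

# Made-up placeholder IDs, not real SHRIDs.
older_ids = ["id-001", "id-002", "id-003"]
newer_ids = ["id-001", "id-003", "id-004", "id-005"]

report = compare_id_sets(older_ids, newer_ids)
print(report)
```

A matching ID does not by itself mean the unit is unchanged, since some SHRIDs kept their ID but had their population or extent corrected.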
Changes to Assembly Constituencies
We have corrected how datasets that have only rural or only urban components, such as the Census Town Directory (TD) and Village Directory (VD), are collapsed to the Assembly Constituency level. Census Abstracts are largely unchanged, but the TD and VD have been significantly improved.
We have also added two new keys that contain constituency, state, and district names (where available) for both pre-2007 and post-2008 delimitation Assembly Constituencies. These name keys are available in the Core Keys SHRUG module on the download page.
Changes to Raster datasets
We have changed the internal method used to extract raster data to the required aggregate level. Previously, we included every raster pixel that touched a given aggregate polygon. Now, we use each pixel's fraction of overlap with the polygon to adjust the raster value. We have verified that this changes the raster datasets only minimally.
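The difference between the two rules can be illustrated with a small sketch. It is not the SHRUG extraction code: it assumes unit-square pixels, an axis-aligned rectangular polygon (so overlap fractions have a closed form), and a mean as the aggregate. All function names are hypothetical.

```python
def overlap_fractions(n_rows, n_cols, xmin, ymin, xmax, ymax):
    """Fraction of each unit pixel covered by an axis-aligned rectangle.

    Pixel (row, col) occupies [col, col + 1] x [row, row + 1].
    """
    def axis_overlap(i, lo, hi):
        return max(0.0, min(i + 1, hi) - max(i, lo))

    return [[axis_overlap(r, ymin, ymax) * axis_overlap(c, xmin, xmax)
             for c in range(n_cols)]
            for r in range(n_rows)]

def all_touched_mean(raster, frac):
    """Every pixel with any overlap counts fully (edge-only contact is ignored here)."""
    picked = [v for row_v, row_f in zip(raster, frac)
              for v, f in zip(row_v, row_f) if f > 0]
    return sum(picked) / len(picked)

def overlap_weighted_mean(raster, frac):
    """Each pixel is weighted by the fraction of it inside the polygon."""
    pairs = [(v, f) for row_v, row_f in zip(raster, frac)
             for v, f in zip(row_v, row_f)]
    return sum(v * f for v, f in pairs) / sum(f for _, f in pairs)

raster = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
frac = overlap_fractions(3, 3, xmin=0.5, ymin=0.5, xmax=2.0, ymax=2.0)

touched = all_touched_mean(raster, frac)
weighted = overlap_weighted_mean(raster, frac)
print(touched, weighted)
```

In this toy case the partially covered pixels pull the all-touched mean away from the weighted one; for polygons that span many pixels the gap shrinks, since edge pixels make up a smaller share of the total.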
Other versions
For the most recent version of the SHRUG, visit the Development Data Lab website. For older versions, please visit the Harvard Dataverse. Please ignore the Harvard version numbering system.
SHRUG v2.0 Pakora
Overview
SHRUG v2.0 Pakora contains many improvements, detailed below. Users should be careful not to combine data across SHRUG versions. For a quickstart tutorial, please see here.
New Data
New administrative data.
SHRUG 2.0 includes most variables from the Socioeconomic and Caste Census (household asset data for 1 billion households), a new range of town and village socioeconomic fields, bank branch information, and many more. For a complete list of tables and fields, see the documentation website.
New spatial data.
SHRUG v2.0 scales up some of the core fields most of interest to researchers, with up-to-date agricultural and weather data, Facebook population and relative wealth index, pollution, and more.
Data at all levels of aggregation
Download any data field at any level of geography. Indian data is famously inconsistent: district boundaries change over time, and there are multiple competing boundaries at the same geographic scale (e.g. tehsil, block, constituency). A common constraint facing researchers is having data at the wrong geographic aggregation and spending months linking data across different aggregations.
We have designed a tool called IdConverter that automates the transformation of datasets from one aggregation unit to another. For instance, it can transform data from 2011 districts to India’s current “local government directory” districts, which will be the basis for the 2021 Population Census. This effectively automates the process that researchers would normally go through manually. The algorithm automates decision-making around how to handle districts that have split or combined using all the information that is available, and makes reasonable (but modifiable) default choices when assumptions are needed to combine data (e.g. when two districts are combined, but data is only available from one of the districts).
Transforming data across aggregation levels is a major headache for socioeconomic researchers, and our toolkit automates it almost completely. This is made possible by a set of keys across India’s geometries (which we have built up over the last decade) and our IdConverter algorithm. As a result, any dataset at any level of aggregation will be instantly linkable to the SHRUG; the cost of linking new data will be close to zero. This also makes it nearly costless for users to contribute data to the SHRUG, since the major barrier for most researchers with data to contribute is solving the problem of inconsistent geographic aggregations across datasets.
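The kind of conversion described above can be sketched as a weighted crosswalk between source and target units. This is not the IdConverter implementation, whose internals are not described here; `convert_units`, its `on_missing` options, and the unit names and weights are all hypothetical.

```python
from collections import defaultdict

def convert_units(values, crosswalk, on_missing="partial"):
    """Re-aggregate totals from source units to target units.

    values:     {source_id: total, or None when no data exists}
    crosswalk:  (source_id, target_id, weight) rows, where weight is the
                share of the source unit assigned to the target unit
    on_missing: "partial" sums whatever sources have data;
                "null" returns None for any target with a missing source
    """
    if on_missing not in ("partial", "null"):
        raise ValueError("on_missing must be 'partial' or 'null'")

    totals = defaultdict(float)
    has_data, has_gap = set(), set()
    for source_id, target_id, weight in crosswalk:
        value = values.get(source_id)
        if value is None:
            has_gap.add(target_id)
            totals[target_id] += 0.0  # make sure the target is listed
        else:
            has_data.add(target_id)
            totals[target_id] += weight * value

    result = {}
    for target_id, total in totals.items():
        no_data = target_id not in has_data
        gap_blocks = on_missing == "null" and target_id in has_gap
        result[target_id] = None if (no_data or gap_blocks) else total
    return result

# Hypothetical units: A and B merge into X; C splits into Y and Z.
crosswalk = [
    ("A", "X", 1.0), ("B", "X", 1.0),
    ("C", "Y", 0.75), ("C", "Z", 0.25),
]
values = {"A": 100.0, "B": None, "C": 40.0}  # no data for B

partial = convert_units(values, crosswalk)
strict = convert_units(values, crosswalk, on_missing="null")
print(partial, strict)
```

The `on_missing` switch stands in for the "reasonable but modifiable" defaults mentioned above: a merged unit with data from only one of its parts can either report a partial total or be left empty.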
When users download data from the SHRUG, they can see the original aggregation at which the data were collected, but in most cases they will be able to download the fields they want at any level of aggregation.
Improved shrids.
In previous versions of the SHRUG, shrids were extremely large around India's most important cities. For v2.0 we have made a great effort to increase spatial granularity in these large urban areas by reducing the spatial extent of the shrid.
Open source village/town maps for India.
Village-, town-, and constituency-level shapefiles are prohibitively expensive and in high demand. While some open source data are available, they are incomplete, poorly labeled, and internally inconsistent. We are combining multiple open data sources to create the most comprehensive geometries available for India. These geographic elements alone will be transformative for the research and policy community. The open shapes also allow any spatial data to be seamlessly integrated with other data in SHRUG.
Breaking Changes
- The original `shrid` identifier has been rebuilt entirely. The new ID variable is named `shrid2`. For more information, see here.
- All data files have been repackaged and renamed.
- The organizational framework for arranging variables within data files and download ZIP archives has changed. The SHRUG documentation site contains this information.
- Assembly Constituency identifiers `con08_id` and `con07_id` have been updated.
- Some variables have been renamed. Please see the metadata pages on the documentation website for details.
SHRUG v1.5
New Data
- This release includes urban small area estimates of consumption, as well as bootstraps of these estimates.
- Tendulkar poverty rates have been added.
- Manufacturing and services employment have been added.
- The v1.5 Codebook, available here, contains more details on variable construction and usage notes. (The most recent codebook is available here.)
- More VCF forest cover data, now available through 2019.
Breaking Changes
- Rural consumption variables have been renamed.
- Night Lights are not in the `ancillary` file; they are in their own file. This was the case in v1.4 as well, but the v1.4 codebook misstated the location of these data.
SHRUG v1.4
New Data
This release includes more PMGSY road variables than the previous release, with a larger number of observations and data for both upgraded roads and newly constructed roads.
The second major addition is the inclusion of 1000 bootstrapped consumption imputations to allow users to account for error associated with the estimation of per-capita consumption for the SHRUG.
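One common way to use bootstrapped imputations is to compute the statistic of interest once per draw and then summarize the spread across draws. The sketch below uses synthetic numbers; the data layout (one list of values per draw) and the helper `bootstrap_summary` are assumptions for illustration, not the SHRUG file format or a recommended procedure from the codebook.

```python
import random
import statistics

def bootstrap_summary(draws, statistic=statistics.fmean, alpha=0.05):
    """Summarize a statistic computed separately on each bootstrap draw."""
    estimates = sorted(statistic(draw) for draw in draws)
    n = len(estimates)
    lower = estimates[int((alpha / 2) * (n - 1))]      # percentile interval
    upper = estimates[int((1 - alpha / 2) * (n - 1))]
    return {
        "point": statistics.fmean(estimates),
        "se": statistics.stdev(estimates),  # spread across draws
        "ci": (lower, upper),
    }

# Synthetic stand-in: 1000 draws of per-capita consumption for 50 locations.
rng = random.Random(0)
true_values = [rng.uniform(10_000, 30_000) for _ in range(50)]
draws = [[v + rng.gauss(0, 2_000) for v in true_values] for _ in range(1000)]

summary = bootstrap_summary(draws)
print(summary)
```

Passing a different `statistic` (for example a median, or a regression coefficient computed per draw) propagates the imputation error into that quantity in the same way.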
The v1.4 Codebook, included here, contains more details on variable construction and usage notes.
Breaking Changes
PMGSY variables have been renamed; all variables now use the `road_` prefix.