Add citation and add dataverse link
lukesonnet committed Jun 29, 2018
1 parent 81bbf7c commit 9429233
Showing 2 changed files with 145 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -4,6 +4,12 @@ This repository hosts open source R code used to scan, consolidate, tidy, and cl

The final output after cleaning is also included as [pk_candidate_scrutiny_data_2018.csv](https://github.com/colincookman/pakistan_2018_candidates/blob/master/pk_candidate_scrutiny_data_2018.csv), or in wide format (collapsing three reported tax-year rows per candidate into a single row) as [pk_candidate_scrutiny_data_2018_wide.csv](https://github.com/colincookman/pakistan_2018_candidates/blob/master/pk_candidate_scrutiny_data_2018_wide.csv).

If you use this data, please consider citing it as follows where appropriate:
```
Cookman, Colin; Sonnet, Luke, 2018, "2018 Pakistan General Election Candidate Scrutiny Forms", https://doi.org/10.7910/DVN/PX8JKY, Harvard Dataverse, V1, UNF:6:T12VRIN5/4mgmHYXmTB+4g==
```
You can find citation tools, such as a .bib file, on this database's [Dataverse page](https://doi.org/10.7910/DVN/PX8JKY).

Please note that this code is still a work in progress and data outputs hosted here may be incomplete. For questions, suggestions, or to contribute, please leave an issue here or contact the contributors, Luke Sonnet and Colin Cookman.

## Data
139 changes: 139 additions & 0 deletions README_dataverse.md
@@ -0,0 +1,139 @@
# Pakistan 2018 General Elections Candidate Scrutiny Forms

The latest version of the data, more comprehensive documentation, and various auxiliary files are available at the GitHub repository: https://github.com/colincookman/pakistan_candidate_scrutiny_18. The versions released on Dataverse correspond to releases on the GitHub repository: https://github.com/colincookman/pakistan_candidate_scrutiny_18/releases

This repository hosts cleaned candidate scrutiny forms released by the Election Commission of Pakistan (ECP) for prospective candidates for Pakistan's 2018 national and provincial assembly elections. The scrutiny forms consist of data released on tax payments from the Federal Board of Revenue (FBR), corruption cases from the National Accountability Bureau (NAB), and outstanding loans from the State Bank of Pakistan (SBP).

The final output after cleaning is also included as pk_candidate_scrutiny_data_2018.csv, or in wide format (collapsing three reported tax-year rows per candidate into a single row) as pk_candidate_scrutiny_data_2018_wide.csv.

Please note that this database may change with additional input, corrections, or data sources. Please cite it when using it if you feel that is appropriate.

## Data
The raw files used as a data source are approximately 10.3 GB in size and too large to host in this GitHub repository. They were posted by the ECP in late June 2018.

## Scope and possible gaps in data
Pakistani election law does not impose residency requirements for candidacy filings and allows individual candidates to contest multiple seats simultaneously, within and across assemblies and provinces. As of the initial data release, the ECP provided information for 19397 candidacy filings and 15907 unique candidates (as identified by accompanying Computerized National ID Card records). Note that in some cases the records reported by the FBR, NAB, and SBP for the same individual may not be consistent across that individual's multiple constituency filings.
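The difference between candidacy filings and unique candidates can be illustrated with a short sketch. It assumes the long-format columns `candidate_code` and `candidate_CNICP_ECP` described in the variable key, and it assumes that `candidate_code` identifies a single filing. The sample rows, codes, and CNIC values are invented for illustration, and Python is used only for convenience (the repository's own code is R).

```python
import csv
import io

# Hypothetical sample in the long format: three tax-year rows per filing.
# Codes and CNIC values are invented; they are not taken from the dataset.
SAMPLE = """candidate_code,candidate_CNICP_ECP,tax_year
NA-1-1,11111-1111111-1,2015
NA-1-1,11111-1111111-1,2016
NA-1-1,11111-1111111-1,2017
PP-5-3,11111-1111111-1,2015
PP-5-3,11111-1111111-1,2016
PP-5-3,11111-1111111-1,2017
NA-2-4,22222-2222222-2,2015
NA-2-4,22222-2222222-2,2016
NA-2-4,22222-2222222-2,2017
"""


def count_filings_and_candidates(rows):
    """Return (number of filings, number of unique CNICs, CNICs with >1 filing)."""
    filings = set()
    filings_by_cnic = {}
    for row in rows:
        filings.add(row["candidate_code"])
        filings_by_cnic.setdefault(row["candidate_CNICP_ECP"], set()).add(
            row["candidate_code"]
        )
    # A CNIC attached to more than one filing is a multi-constituency candidate.
    multi = {cnic for cnic, codes in filings_by_cnic.items() if len(codes) > 1}
    return len(filings), len(filings_by_cnic), multi


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
n_filings, n_candidates, multi = count_filings_and_candidates(rows)
print(n_filings, n_candidates, sorted(multi))
```

In this toy sample there are three filings but only two individuals, because one CNIC appears under two candidate codes.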

### Candidacy filing counts based on available data
| province | assembly | direct_seats | womens_seats | minority_seats |
|-------------|---------------------|--------------|--------------|----------------|
| Balochistan | National Assembly | 374 | 40 | NA |
| Balochistan | Provincial Assembly | 1302 | 13 | 24 |
| KPK | National Assembly | 1076 | 90 | NA |
| KPK | Provincial Assembly | 1959 | 177 | 59 |
| National | National Assembly | NA | NA | 215 |
| Punjab | National Assembly | 2428 | 172 | NA |
| Punjab | Provincial Assembly | 6234 | 668 | 141 |
| Sindh | National Assembly | 1181 | 178 | NA |
| Sindh | Provincial Assembly | 2916 | 94 | 56 |


The ECP had previously reported (https://www.ecp.gov.pk/PrintDocument.aspx?PressId=55295&type=Image, https://web.archive.org/web/20180627164802/https://www.ecp.gov.pk/PrintDocument.aspx?PressId=55295&type=Image) on June 18, 2018 that 21482 nomination papers had been filed. We cannot account for the discrepancy or identify missing candidates at this stage. The table below lists the ECP's earlier aggregate figures, with the difference from the available data in parentheses (available count minus ECP-reported count). Negative values indicate candidacy filings that may be missing from the current dataset; positive values indicate cases where more records were available than the earlier ECP statement accounted for.

### ECP reported filings (Difference with available data)
| province | assembly | direct_seats | womens_seats | minority_seats |
|-------------|---------------------|--------------|--------------|----------------|
| Balochistan | National Assembly | 435 (-61) | 36 (+4) | NA |
| Balochistan | Provincial Assembly | 1400 (-98) | 116 (-103) | 56 (-32) |
| KPK | National Assembly | 992 (+84) | 88 (+2) | NA |
| KPK | Provincial Assembly | 1920 (+39) | 262 (-85) | 73 (-14) |
| National | National Assembly | NA | NA | 154 (+61) |
| Punjab | National Assembly | 2700 (-272) | 236 (-64) | NA |
| Punjab | Provincial Assembly | 6747 (-513) | 664 (+4) | 232 (-91) |
| Sindh | National Assembly | 1346 (-165) | 76 (+102) | NA |
| Sindh | Provincial Assembly | 3626 (-710) | 213 (-119) | 110 (-54) |
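The parenthesized differences can be reproduced from the two tables. The sketch below copies the counts from both tables by hand (with `None` standing in for NA) and subtracts the ECP-reported figure from the available count; the variable and function names are illustrative only.

```python
# Counts per (province, assembly): (direct_seats, womens_seats, minority_seats).
# None marks cells reported as NA in the tables.
AVAILABLE = {
    ("Balochistan", "National Assembly"): (374, 40, None),
    ("Balochistan", "Provincial Assembly"): (1302, 13, 24),
    ("KPK", "National Assembly"): (1076, 90, None),
    ("KPK", "Provincial Assembly"): (1959, 177, 59),
    ("National", "National Assembly"): (None, None, 215),
    ("Punjab", "National Assembly"): (2428, 172, None),
    ("Punjab", "Provincial Assembly"): (6234, 668, 141),
    ("Sindh", "National Assembly"): (1181, 178, None),
    ("Sindh", "Provincial Assembly"): (2916, 94, 56),
}

ECP_REPORTED = {
    ("Balochistan", "National Assembly"): (435, 36, None),
    ("Balochistan", "Provincial Assembly"): (1400, 116, 56),
    ("KPK", "National Assembly"): (992, 88, None),
    ("KPK", "Provincial Assembly"): (1920, 262, 73),
    ("National", "National Assembly"): (None, None, 154),
    ("Punjab", "National Assembly"): (2700, 236, None),
    ("Punjab", "Provincial Assembly"): (6747, 664, 232),
    ("Sindh", "National Assembly"): (1346, 76, None),
    ("Sindh", "Provincial Assembly"): (3626, 213, 110),
}


def differences(available, reported):
    """Available minus ECP-reported, cell by cell; NA cells stay None."""
    return {
        key: tuple(
            None if a is None else a - r
            for a, r in zip(counts, reported[key])
        )
        for key, counts in available.items()
    }


def total(table):
    """Sum of all non-NA cells in a table."""
    return sum(v for counts in table.values() for v in counts if v is not None)


DIFF = differences(AVAILABLE, ECP_REPORTED)
print(total(AVAILABLE), total(ECP_REPORTED), total(AVAILABLE) - total(ECP_REPORTED))
# prints: 19397 21482 -2085
```

The two totals match the 19397 filings in the available data and the 21482 nomination papers reported by the ECP, so the net shortfall across all cells is 2085 filings.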

At the time v1 of this data was released, a final list of candidates following the process of scrutiny, disqualification, appeal, and withdrawal was scheduled for release shortly afterward.

## Caveats
This dataset is being presented to encourage broader open data sharing among the community of analysts on Pakistan. We make no guarantees as to the accuracy of the underlying data, cannot verify it, and cannot account for any discrepancies in it.

## pk_candidate_scrutiny_data_2018.csv variable key

There are currently three rows in this dataset for every candidate-constituency, one per reported tax year. Each row is therefore a candidate-constituency-tax_year observation.

**candidate_code:** Candidacy filing code generated from constituency and candidate number (not unique to single individuals)

**province:** Province location (note that former FATA constituencies are included in KPK, and Islamabad constituencies in Punjab)

**assembly:** National or provincial assembly

**constituency_number:** Constituency number for directly elected seats, or the women's / minorities' reserved list

**candidate_number:** ECP-assigned candidacy filing number

**candidate_CNICP_ECP:** ECP-reported Computerized National ID Card, unique to single individuals

**multi_candidate:** Flags individuals that are contesting multiple constituencies

**candidate_NTN:** Candidate National Tax Number as reported by Federal Board of Revenue

**candidate_NTN_issue:** Date of NTN issue as reported by Federal Board of Revenue

**candidate_RTO:** Location of NTN-issuing Regional Tax Office as reported by Federal Board of Revenue

**candidate_MNIC_NAB:** Candidate Manual National ID Card as reported by the NAB

**candidate_MNIC_SBP:** Candidate Manual National ID Card as reported by the SBP

**tax_year:** Tax-year observation (2015, 2016, or 2017)

**candidate_tax_type:** Candidate filed, did not file, or was unregistered

**candidate_tax_paid:** Tax paid by candidate as reported by FBR

**candidate_tax_paid_num:** Tax paid converted to numeric values for calculation

**candidate_tax_receipts:** "Receipts under final tax regime" as reported by FBR

**candidate_tax_receipts_num:** Tax receipts converted to numeric values for calculation

**candidate_tax_income:** Taxable income as reported by FBR

**candidate_tax_income_num:** Taxable income converted to numeric values for calculation

**candidate_tax_remarks:** Additional remarks as reported by FBR

**candidate_NAB_guilty:** Binary variable if NAB reported any conviction, plea bargain, or other pending case against candidate

**candidate_NAB_conviction:** Binary variable if NAB reported any conviction against candidate

**candidate_NAB_plea:** Binary variable if NAB reported any plea bargain on the part of candidate

**candidate_NAB_accused:** Binary variable if NAB reported candidate accused or otherwise facing pending cases

**candidate_NAB_remarks:** NAB remarks on any

**candidate_personal_loan:** SBP remarks on candidate personal loans, if any reported

**candidate_business_loan:** SBP remarks on candidate business loans, if any reported

**parl_inc_tax_2016:** An indicator for whether this CNIC is linked to a parliamentarian in 2016, when the FBR released incumbents' tax payment amounts in the 2016 Incumbent Parliamentarian report

**parl_inc_name:** Candidate name when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report

**parl_inc_chamber:** Candidate chamber when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report

**parl_inc_province:** Candidate province when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report, incomplete for MNAs

**parl_inc_type:** Candidate seat type when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report, incomplete for MNAs

**parl_tax_paid_2016:** Amount paid by candidate in 2016 when they were a parliamentarian, as reported by the FBR in the 2016 Incumbent Parliamentarian report

**parl_aop_tax_paid_2016:** Amount paid by AOPs the candidate was a part of in 2016 when they were a parliamentarian, as reported by the FBR in the 2016 Incumbent Parliamentarian report

**candidate_name_FBR:** Candidate name as reported by FBR in Urdu unicode

**candidate_name_NAB:** Candidate name as reported by NAB in Urdu unicode

**candidate_name_SBP:** Candidate name as reported by SBP in Roman Urdu (note: SBP forms did not report name information consistently)

**urdu_name_match:** Check on whether FBR and NAB Urdu names match (several cases indicate mismatch, apparently due to typos or nonstandardized name spellings)

**MNIC_match:** Check on whether NAB and SBP MNIC identifications match

**target:** Directory path for folder with candidate data

### wide version of the data

We also release a wide version of the data as `pk_candidate_scrutiny_data_2018_wide.csv`. The only change is the unit of observation: the data are no longer at the candidate-constituency-tax_year level but at the candidate-constituency level, with one row per filing. All of the `candidate_tax_*` variables scraped from the FBR PDFs become `candidate_tax_*_YYYY` variables, one per tax year.
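The long-to-wide reshaping can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the released file: it assumes `candidate_code` identifies a filing, that `tax_year` is dropped in the wide output, and that non-tax columns are constant within a filing. The sample rows and the `candidate_tax_type` values are invented.

```python
import csv
import io

# Hypothetical long-format sample; values are invented for illustration.
SAMPLE_LONG = """candidate_code,province,tax_year,candidate_tax_type,candidate_tax_paid_num
NA-1-1,Punjab,2015,filed,1000
NA-1-1,Punjab,2016,filed,1500
NA-1-1,Punjab,2017,did not file,
PP-5-3,Punjab,2015,unregistered,
PP-5-3,Punjab,2016,filed,200
PP-5-3,Punjab,2017,filed,300
"""


def to_wide(rows, id_col="candidate_code", year_col="tax_year",
            prefix="candidate_tax_"):
    """Collapse the tax-year rows of each filing into a single row.

    Columns starting with `prefix` get a `_YYYY` suffix; other columns are
    copied once per filing (assumed constant across its tax-year rows).
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {})
        year = row[year_col]
        for name, value in row.items():
            if name == year_col:
                continue  # the year moves into the column names
            if name.startswith(prefix):
                record[f"{name}_{year}"] = value
            else:
                record[name] = value
    return list(wide.values())


long_rows = list(csv.DictReader(io.StringIO(SAMPLE_LONG)))
wide_rows = to_wide(long_rows)
print(len(long_rows), len(wide_rows))
print(sorted(wide_rows[0]))
```

Six long rows collapse into two wide rows, each carrying one `candidate_tax_type_YYYY` and one `candidate_tax_paid_num_YYYY` column per tax year.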
