dcadata/va-resource


VA 2017 and 2019 Major Party Resource Allocation Study

Data Collection (Scraping) Scripts

These scripts scrape VPAP (the Virginia Public Access Project). They currently cover only the functionality that has been requested so far, but they can be extended with more features.

CSVs with data for the specific candidates requested are published in the /data directory.

The scripts use the Python libraries Requests, BeautifulSoup, Selenium, and pandas.


Notes

  • bio_ fields come from the Overview section of the Legislators page (example here), so they are generally available only for candidates who are or were legislators. Candidates who ran and lost have no bio_ fields, and the fields are often missing even for candidates who did serve as legislators. Some of these values can still be parsed (manually, for now) out of the summary field.

TO-DO

Parse out candidate's gender from summary

  • As noted above, bio_gender is only available for legislators. For other candidates, attempt to infer gender from the pronouns the summary uses for the candidate. This won't be perfect: some candidates have no summary, and some summaries use no pronouns, e.g. "Jane Doe has served in the House of Delegates since 2017." Guessing gender from the candidate's first name may also be possible, but it is not always reliable either.
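
A minimal sketch of the pronoun approach, using only the standard library. The function name, the pronoun sets, and the "F"/"M" return codes are assumptions made for illustration; they are not part of the existing scripts and may not match the values stored in bio_gender.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical pronoun sets; extend or adjust as needed.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def guess_gender_from_summary(summary: Optional[str]) -> Optional[str]:
    """Return "F", "M", or None when the summary gives no clear signal."""
    if not summary:
        return None  # candidate has no summary
    # Split into whole lowercase words so "the" is never counted as "he".
    counts = Counter(re.findall(r"[a-z]+", summary.lower()))
    female = sum(counts[word] for word in FEMALE_PRONOUNS)
    male = sum(counts[word] for word in MALE_PRONOUNS)
    if female > male:
        return "F"
    if male > female:
        return "M"
    return None  # no pronouns, or an even split
```

Returning None rather than guessing keeps the ambiguous cases (no summary, no pronouns, or mixed pronouns) visible for manual review.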

Parse out district from election title

  • e.g. 2019 House of Delegates - District 10 - Regular General => HD-10
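
One possible way to do this conversion is a regular expression, sketched below. Only the House of Delegates => HD mapping comes from the example above; the function name, the pattern, and the lookup table are assumptions, and other chambers would need their own entries once their title formats are confirmed.

```python
import re
from typing import Optional

# Only the HD mapping is taken from the README example; add other chambers here.
CHAMBER_CODES = {"House of Delegates": "HD"}

# Assumed title layout: "<year> <chamber> - District <number> - <election type>"
TITLE_PATTERN = re.compile(
    r"^\s*(?P<year>\d{4})\s+(?P<chamber>.+?)\s+-\s+District\s+(?P<number>\d+)\b"
)


def district_code(title: str) -> Optional[str]:
    """Return a code such as "HD-10", or None if the title does not fit the pattern."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    prefix = CHAMBER_CODES.get(match.group("chamber"))
    if prefix is None:
        return None  # chamber not in the lookup table
    return f"{prefix}-{int(match.group('number'))}"


print(district_code("2019 House of Delegates - District 10 - Regular General"))
```

Titles without a district number, or for a chamber missing from the table, return None instead of raising, so they can be logged and handled separately.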

Additional Fields

  • elections page:

    • district name
    • district index (competitiveness rating)
    • date of election
    • candidates' names; parties; incumbency; # of votes; voteshare; winner
  • candidate page:

    • campaign website
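
If vote counts are scraped from the elections page, voteshare and winner could be derived from them rather than scraped separately. The sketch below assumes a single-seat election and a hypothetical list of dicts with a "votes" key; neither the function name nor the record layout comes from the existing scripts.

```python
from typing import Any, Dict, List


def add_voteshare_and_winner(candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the candidate rows with "voteshare" and "winner" added."""
    total = sum(row["votes"] for row in candidates)
    top = max((row["votes"] for row in candidates), default=0)
    result = []
    for row in candidates:
        enriched = dict(row)  # copy so the input rows are left unchanged
        # Share of all votes cast; None when no votes were recorded.
        enriched["voteshare"] = row["votes"] / total if total else None
        # Highest vote count wins; tied candidates are all flagged.
        enriched["winner"] = total > 0 and row["votes"] == top
        result.append(enriched)
    return result
```

Ties flag every tied candidate as a winner, so those rows would need a manual check against the official result.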

About

Data Collection for VA Resource Allocation Study via VPAP.org
