These scripts enable you to scrape VPAP. They currently include only functionality that has been requested, but can be extended to add more features.
CSVs are published (in the /data directory) with data for specific requested candidates.
The scripts use the Python libraries Requests, BeautifulSoup, Selenium, and Pandas.
`bio_` fields come from the Legislators page Overview section (example here), so generally they're only available for candidates who were/are legislators. Candidates who ran and lost do not have `bio_` fields. `bio_` fields are also often missing for candidates who did serve as legislators. However, some fields can be parsed (manually for now) out of the `summary` field.
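As an illustration of parsing a field out of `summary`, here is a minimal sketch that guesses gender from the pronouns in a candidate's summary. It is not part of the current scripts: the function name, the pronoun table, and the `"F"`/`"M"` labels are all assumptions made for the example, and the real `bio_gender` values may be encoded differently.

```python
import re
from collections import Counter

# Hypothetical pronoun-to-label table; the labels are an assumption,
# not the values VPAP uses for bio_gender.
PRONOUNS = {
    "she": "F", "her": "F", "hers": "F", "herself": "F",
    "he": "M", "him": "M", "his": "M", "himself": "M",
}


def guess_gender_from_summary(summary):
    """Return "F", "M", or None when the summary gives no clear signal."""
    if not summary:
        return None  # candidate has no summary at all
    # Split into whole lowercase words so "the" is never mistaken for "he".
    words = re.findall(r"[a-z]+", summary.lower())
    counts = Counter(PRONOUNS[w] for w in words if w in PRONOUNS)
    if not counts:
        return None  # summary exists but uses no pronouns
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # equal counts for both labels: too ambiguous to call
    return ranked[0][0]
```

Returning `None` rather than a default keeps the missing cases visible, so they can still be filled in manually.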
- Per above, `bio_gender` is only available for legislators. For candidates, attempt to parse out gender from `summary` based on the pronouns used for the candidate. This won't be perfect because some candidates do not have summaries and others have summaries that use no pronouns, e.g. "Jane Doe has served in the House of Delegates since 2017." It may also be possible to guess gender based on first name, but this is not always reliable either.
- e.g. `2019 House of Delegates - District 10 - Regular General` => `HD-10`
- elections page:
  - district name
  - district index (competitiveness rating)
  - date of election
  - candidates' names; parties; incumbency; # of votes; vote share; winner
- candidate page:
  - campaign website
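
The election-name shortening shown earlier (`2019 House of Delegates - District 10 - Regular General` => `HD-10`) could be sketched roughly as follows. This is an illustration only: `short_district` and `CHAMBER_CODES` are hypothetical names, and only the `HD` abbreviation is taken from the example; the Senate entry is a guess at how other chambers might be labeled.

```python
import re

# Chamber abbreviations. Only "HD" is attested by the example above;
# the "State Senate" entry is an assumption.
CHAMBER_CODES = {
    "House of Delegates": "HD",
    "State Senate": "SD",
}

# Expected shape: "<year> <chamber> - District <number> - <election type>"
_ELECTION_NAME = re.compile(
    r"^\d{4}\s+(?P<chamber>.+?)\s+-\s+District\s+(?P<num>\d+)\b"
)


def short_district(election_name):
    """Return a short code such as "HD-10", or None if the name doesn't fit."""
    match = _ELECTION_NAME.match(election_name)
    if match is None:
        return None  # not a district-level election name
    code = CHAMBER_CODES.get(match.group("chamber"))
    if code is None:
        return None  # chamber we have no abbreviation for
    return f"{code}-{match.group('num')}"
```

Names that do not match the expected shape return `None` instead of raising, so unusual election names can be reviewed by hand.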