NC Legislature Data Scraping
Use this Scrapy application to obtain bill data from the NC Legislature website.
The scraper extracts each bill's data into an object. Use the scrapy command to output a JSON list of bill objects.
The objective of this project is to export this data into a more usable format for presentation to the citizens of North Carolina. The data can also be migrated into other apps or made available for further analysis.
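As a rough illustration of the "further analysis" use case, the exported JSON list can be loaded with Python's standard library alone. This is a minimal sketch: the helper names `load_items` and `count_by_field` are made up for this example, and any field name you pass (such as `chamber`) is an assumption, since the exact fields each spider emits are not listed here.

```python
import json
from collections import Counter


def load_items(path):
    """Load the JSON list written by `scrapy crawl ... -o <filename>.json`."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of scraped objects")
    return items


def count_by_field(items, field):
    """Count items per value of `field`.

    Items that lack the field are grouped under None. The field name is
    supplied by the caller; check your own export for the real keys.
    """
    return Counter(item.get(field) for item in items)
```

For instance, `count_by_field(load_items("bills.json"), "chamber")` would tally bills per chamber, provided the export actually contains a `chamber` key.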
How to parse that sweet, raw data
Requires python3 and scrapy.
- Install python3.
- Install Scrapy, for example using pip:
pip install scrapy
- Clone this repository and navigate into the ncleg scraper directory.
- Copy file settings.py. Adjust the Scrapy configuration according to your needs.
- Tell Scrapy to crawl "bills" from the command line. Pass the "session" and "chamber" options (chamber is optional; passing no chamber scrapes both chambers). For example, to scrape Senate bills from the 2017-2018 session to a JSON file:
scrapy crawl <spider> -a chamber=S -a session=2017 -o <filename>.json
The following spiders are available:
- bills: retrieves individual bill information
- members: retrieves NCGA representative data
- membersvotes: retrieves each member's votes from the entire specified session, along with some basic member info
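If you want to script the crawl rather than type each command, the invocation can be assembled in Python. The sketch below only builds the same `scrapy crawl` command line shown above; the helper names `build_crawl_command` and `crawl_all` are hypothetical, and it assumes every spider accepts the same `session` and `chamber` arguments, which you should verify for `members` and `membersvotes`.

```python
import subprocess

# Spider names as listed in this README.
SPIDERS = ("bills", "members", "membersvotes")


def build_crawl_command(spider, session, chamber=None, output=None):
    """Return the argument list for one `scrapy crawl` invocation."""
    cmd = ["scrapy", "crawl", spider, "-a", f"session={session}"]
    if chamber is not None:
        # Leaving chamber out scrapes both chambers.
        cmd += ["-a", f"chamber={chamber}"]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def crawl_all(session, chamber=None):
    """Run each spider in turn, writing <spider>.json.

    Must be called from the scraper directory so Scrapy can find the project.
    """
    for spider in SPIDERS:
        cmd = build_crawl_command(spider, session, chamber, f"{spider}.json")
        subprocess.run(cmd, check=True)
```

Calling `crawl_all(2017, chamber="S")` from the scraper directory would then run the three spiders one after another and stop at the first failure.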
To politely preserve this public data resource, please manage your AutoThrottle settings appropriately in the settings.py file. For more information, read Scrapy's documentation.
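As a starting point, a conservative throttle configuration might look like the excerpt below. The setting names are standard Scrapy settings, but the values are illustrative assumptions rather than this project's defaults; tune them to what the site tolerates.

```python
# settings.py (excerpt) -- illustrative values, not this project's defaults

# Let Scrapy adapt the crawl rate to the server's response times.
AUTOTHROTTLE_ENABLED = True
# Initial delay, in seconds, before AutoThrottle has latency data.
AUTOTHROTTLE_START_DELAY = 5
# Upper bound, in seconds, on the delay used when the server is slow.
AUTOTHROTTLE_MAX_DELAY = 60
# Average number of requests sent in parallel to the site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Minimum delay, in seconds, between requests; AutoThrottle never goes below it.
DOWNLOAD_DELAY = 1
# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True
```

Lower `AUTOTHROTTLE_TARGET_CONCURRENCY` and higher delays are gentler on the server at the cost of a longer crawl.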
Exporting scraped data to MongoDB
If you want to seed a database with the data parsed by these spiders, you can use the MongoPipeline. Enable the pipeline in settings.py, and set MONGO_URI and MONGO_DATABASE there as well. By default, each collection is named after the spider that produced it.
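The relevant settings might look roughly like this. `ITEM_PIPELINES`, `MONGO_URI`, and `MONGO_DATABASE` are the names mentioned above, but the module path `ncleg.pipelines.MongoPipeline`, the connection string, and the database name are assumptions made for illustration; check the repository's pipelines module for the real import path.

```python
# settings.py (excerpt) -- the pipeline path and values below are assumptions

ITEM_PIPELINES = {
    # Hypothetical import path; the number is the pipeline's order (0-1000),
    # with lower values running first.
    "ncleg.pipelines.MongoPipeline": 300,
}

# Connection string for a local MongoDB instance (assumed).
MONGO_URI = "mongodb://localhost:27017"
# Database to write to (assumed name); collections default to the spider name.
MONGO_DATABASE = "ncleg"
```

With these in place, running `scrapy crawl bills ...` would be expected to write items into a `bills` collection of the chosen database.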
TO-DO and how to help
Visit the VoterSmarterNC JIRA Tracker for the list of desired features. Feel free to fork and use for your own project and needs!