A tool for scraping public Maryland Judiciary case records through Case Search. Close Crawl extracts filing dates, titles, individual addresses, and partial costs from case records, given either a range of case numbers or a precompiled list of case numbers.
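Expanding a range into individual case numbers might look like the sketch below. The `24O16` prefix and six-digit padding are illustrative assumptions, not necessarily the real Case Search numbering scheme:

```python
# Illustrative sketch: expand a numeric range into case numbers.
# NOTE: the "24O16" prefix and zero-padding width are assumptions
# for illustration, not the documented Case Search format.
def case_numbers(prefix, start, end, width=6):
    """Yield zero-padded case numbers for the inclusive range."""
    for n in range(start, end + 1):
        yield "%s%0*d" % (prefix, width, n)

print(list(case_numbers("24O16", 1, 3)))
# e.g. ['24O16000001', '24O16000002', '24O16000003']
```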
Usage of the interface is detailed here.
Close Crawl uses Mechanize to crawl through the records and temporarily download the responses as HTML files. Saving the HTML locally prevents any loss of progress if a network or system disruption interrupts the crawl, which can then be resumed later. The saved files are parsed with BeautifulSoup and regular expressions, the data is cleaned with pandas methods, and the final output is saved as a CSV file.
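The parsing step can be sketched roughly as follows, using only the standard-library `re` module. The span markup and field labels below are illustrative stand-ins for the saved Case Search HTML, not the actual page structure:

```python
# Sketch of the parse step: pull labeled fields out of a saved HTML
# response with a regular expression. The markup and class names here
# are assumptions for illustration, not the real Case Search markup.
import re

SAMPLE_HTML = """
<span class="FirstColumnPrompt">Filing Date:</span>
<span class="Value">07/15/2016</span>
<span class="FirstColumnPrompt">Title:</span>
<span class="Value">Smith vs Jones</span>
"""

FIELD_RE = re.compile(
    r'<span class="FirstColumnPrompt">(?P<label>[^<]+):</span>\s*'
    r'<span class="Value">(?P<value>[^<]+)</span>'
)

def parse_fields(html):
    """Return a dict mapping each field label to its value."""
    return {m.group("label"): m.group("value")
            for m in FIELD_RE.finditer(html)}

print(parse_fields(SAMPLE_HTML))
```

In the real pipeline the cleaned dicts would be collected into a pandas DataFrame and written out with `to_csv`.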
A standalone executable can be found in the `dist` directory. The executable was built on a Windows 10 system and has been tested on Windows 10, as well as on Windows XP, Windows 7, and Windows 8 virtual environments.
If you are not familiar with Python projects on Windows machines, you might want to check out this quick guide.
- Python 2.7. Mechanize, the package used for crawling the records, does not support Python 3, so neither does Close Crawl. Python 3 support is under development.
- Third-party packages listed in `requirements.txt`
Clone the repository, create a virtual environment, and install the packages via pip: `pip install -r requirements.txt`
Or run the Makefile: `make init`
The tests run on nose. To install it, run: `pip install nose`
- For UNIX machines, a Makefile is provided for convenience. Just run: `make test`
- For non-UNIX machines, `nosetests -v -w tests` will work.
PyInstaller was used to build the Windows executable. More details on the building process can be found here.