Public Python scraper for Texas Bar member records by ContactID.
This repository is intentionally prepared for public use:
- No database integrations
- No embedded credentials or secrets
- No private proxy infrastructure references
main.py iterates over a ContactID range, requests each Texas Bar profile page, parses key fields, and appends results to a CSV so the run is resumable.
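The fetch, parse, and append loop described above can be sketched with the standard csv module. This is a minimal illustration, not main.py itself: fetch_and_parse is a hypothetical callable standing in for the HTTP request and HTML parsing, and FIELDNAMES is an assumed column set (only contact_id is named in this README). Flushing after every row keeps the file current, so an interrupted run loses little work.

```python
import csv
import os
from typing import Callable, Iterable

# Hypothetical column set for illustration; only contact_id is named in the README.
FIELDNAMES = ["contact_id", "status", "name"]


def append_rows(
    csv_path: str,
    contact_ids: Iterable[int],
    fetch_and_parse: Callable[[int], dict],
) -> int:
    """Append one row per ContactID; return how many rows were written."""
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    written = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDNAMES, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for cid in contact_ids:
            row = fetch_and_parse(cid)  # hypothetical: request + parse one profile
            row["contact_id"] = cid
            writer.writerow(row)
            fh.flush()  # persist progress before moving to the next ID
            written += 1
    return written
```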
Parsed fields include:
- Name and eligibility
- Bar number and TX license date
- Firm and practice areas
- Primary location and address parts
- Firm size, occupation, graduation date
- Status (ok, not_found, unavailable, empty, error)
Requirements:
- Python 3.10+
- Package: requests
Install the dependency:
python3 -m pip install requests
Run a small range first:
python3 main.py --start 148186 --end 148250 --csv texas_bar_members.csv
Longer run:
python3 main.py --start 148186 --end 394366 --csv texas_bar_members.csv
This scraper supports rotating proxies from a local text file.
- Copy the example file and add your proxy endpoints:
cp proxies.example.txt proxies.txt
- Edit proxies.txt with one proxy URL per line:
http://user:pass@host:port
http://host:port
socks5://user:pass@host:port
- Run with the proxy file:
python3 main.py --start 148186 --end 148250 --proxy-file proxies.txt
Notes:
- Blank lines and lines starting with # are ignored.
- A random proxy is selected per request.
Default behavior skips ContactIDs already present in the output CSV.
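Skipping already-scraped IDs amounts to reading the contact_id column back and comparing it with the requested range. A minimal sketch, assuming the CSV has a contact_id header and that --end is inclusive (the README does not say); existing_ids and missing_ids are hypothetical names:

```python
import csv
import os


def existing_ids(csv_path: str) -> set[int]:
    """Collect ContactIDs already written to the output CSV (empty set if none)."""
    ids: set[int] = set()
    if not os.path.exists(csv_path):
        return ids
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            value = (row.get("contact_id") or "").strip()
            if value.isdigit():  # ignore malformed or blank IDs
                ids.add(int(value))
    return ids


def missing_ids(start: int, end: int, csv_path: str) -> list[int]:
    """IDs in the range not yet present in the CSV (end assumed inclusive)."""
    done = existing_ids(csv_path)
    return [cid for cid in range(start, end + 1) if cid not in done]
```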
Retry missing IDs within a range:
python3 main.py --start 148186 --end 149000 --csv texas_bar_members.csv --retry-missing
Sort output by ContactID at the end:
python3 main.py --start 148186 --end 148500 --sort-csv
Options:
- --start/--end: ContactID range
- --csv: output CSV path (default: texas_bar_members.csv)
- --timeout: HTTP timeout in seconds (default: 30)
- --retries: retries per ContactID (default: 2)
- --delay-min/--delay-max: bounds of the random delay between requests
- --proxy-file: path to the proxy list file
- --retry-missing: scrape only IDs missing from the CSV
- --sort-csv: sort the CSV by contact_id when the run completes
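The --sort-csv step can be sketched as a numeric sort on contact_id followed by an atomic rewrite. The helper name is hypothetical and this is not main.py's code. Sorting numerically rather than as strings keeps 99 ahead of 100, and writing to a temporary file before os.replace avoids leaving a half-written CSV if the process is interrupted.

```python
import csv
import os
import tempfile


def _sort_key(row: dict) -> tuple[int, int]:
    """Numeric contact_id first; rows with unparseable IDs sort last."""
    value = (row.get("contact_id") or "").strip()
    return (0, int(value)) if value.isdigit() else (1, 0)


def sort_csv_by_contact_id(csv_path: str) -> None:
    """Rewrite the CSV with rows ordered numerically by contact_id."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if not fieldnames:
        return  # empty file: nothing to sort
    rows.sort(key=_sort_key)  # stable sort keeps duplicates in original order
    # Write beside the original, then swap it in with a single rename.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(csv_path)), suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, csv_path)
```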
Usage notes:
- Keep proxies.txt out of git if it contains paid or private proxy credentials.
- Prefer committing only proxies.example.txt.
- Pace requests respectfully with --delay-min/--delay-max, and follow the site's terms.
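Pacing and retries can be combined in one small wrapper. This is a sketch under assumptions: polite_call is a hypothetical helper, --retries is read as extra attempts after the first, and the 1.0 to 3.0 second defaults are placeholders rather than main.py's actual defaults. The sleep parameter is injectable so the delay can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_call(
    fn: Callable[[], T],
    retries: int = 2,          # assumed meaning: extra attempts after the first
    delay_min: float = 1.0,    # placeholder, not main.py's documented default
    delay_max: float = 3.0,    # placeholder, not main.py's documented default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn with a random delay before each attempt, retrying on exceptions."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_exc: Exception | None = None
    for _ in range(retries + 1):
        sleep(random.uniform(delay_min, delay_max))  # space out consecutive requests
        try:
            return fn()
        except Exception as exc:  # a real scraper might catch requests.RequestException
            last_exc = exc
    assert last_exc is not None  # the loop ran at least once and every attempt failed
    raise last_exc
```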
Use responsibly and in compliance with all applicable laws, website terms, and data-use policies.