Dustin013/Texas-Bar-Scraper

Texas Bar Scraper

A public Python scraper that retrieves Texas Bar member records by ContactID.

This repository is intentionally prepared for public use:

  • No database integrations
  • No embedded credentials or secrets
  • No private proxy infrastructure references

What It Does

main.py iterates over a ContactID range, requests each Texas Bar profile page, parses key fields, and appends results to a CSV so the run is resumable.

Parsed fields include:

  • Name and eligibility
  • Bar number and TX license date
  • Firm and practice areas
  • Primary location and address parts
  • Firm size, occupation, graduation date
  • Status (ok, not_found, unavailable, empty, error)
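
The resumable append loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the `FIELDS` column set, the `fetch` callable, the helper names, and the inclusive treatment of `--end` are all assumptions made for the example.

```python
import csv
import os
from typing import Callable

# Hypothetical, shortened column set; the real script writes more fields.
FIELDS = ["contact_id", "name", "status"]


def load_done_ids(csv_path: str) -> set[int]:
    """Return the ContactIDs already recorded in the CSV (empty set if no file)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {
            int(row["contact_id"])
            for row in csv.DictReader(fh)
            if row.get("contact_id")
        }


def scrape_range(start: int, end: int, csv_path: str,
                 fetch: Callable[[int], dict]) -> int:
    """Append one row per new ContactID in [start, end]; return rows written."""
    done = load_done_ids(csv_path)
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    written = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for contact_id in range(start, end + 1):  # inclusive end is an assumption
            if contact_id in done:
                continue  # already scraped in an earlier run
            row = fetch(contact_id)  # stand-in for request + parse
            writer.writerow({"contact_id": contact_id, **row})
            fh.flush()  # keep the file usable if the run is interrupted
            written += 1
    return written
```

Because each row is appended and flushed as it is produced, an interrupted run leaves a CSV that the next run can read to decide which ContactIDs to skip.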

Requirements

  • Python 3.10+
  • Package: requests

Install dependency:


python3 -m pip install requests

Quick Start

Run a small range first:

python3 main.py --start 148186 --end 148250 --csv texas_bar_members.csv

Longer run:

python3 main.py --start 148186 --end 394366 --csv texas_bar_members.csv

Proxy Setup (Optional)

This scraper supports rotating proxies from a local text file.

  1. Copy the example file and add your proxy endpoints:
cp proxies.example.txt proxies.txt
  2. Edit proxies.txt with one proxy URL per line:
http://user:pass@host:port
http://host:port
socks5://user:pass@host:port
  3. Run with the proxy file:
python3 main.py --start 148186 --end 148250 --proxy-file proxies.txt

Notes:

  • Blank lines and lines starting with # are ignored.
  • A random proxy is selected per request.
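
As a rough sketch of the two notes above, the proxy file could be parsed and sampled like this. The function names are hypothetical and the script's actual parsing may differ, for example in how it treats leading whitespace before a #. The returned mapping uses the `{"http": ..., "https": ...}` shape that requests accepts through its `proxies` argument; socks5:// URLs generally also need the optional SOCKS extra (`requests[socks]`).

```python
import random


def parse_proxy_lines(lines):
    """Keep non-blank lines that do not start with '#'."""
    proxies = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            proxies.append(entry)
    return proxies


def pick_proxy(proxies, rng=random):
    """Return a requests-style proxies mapping, or None when the list is empty."""
    if not proxies:
        return None  # no proxy configured: connect directly
    url = rng.choice(proxies)  # a fresh random choice on every call
    return {"http": url, "https": url}
```

Calling `pick_proxy` once per request, rather than once per run, is what gives the per-request rotation.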

Resume / Retry Workflows

Default behavior skips ContactIDs already present in the output CSV.

Retry missing IDs within a range:

python3 main.py --start 148186 --end 149000 --csv texas_bar_members.csv --retry-missing

Sort output by ContactID at the end:

python3 main.py --start 148186 --end 148500 --sort-csv
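
A minimal sketch of the two operations behind these workflows, assuming the CSV has a contact_id column as the --sort-csv description suggests. The helper names are hypothetical, and the real script may implement either step differently.

```python
import csv


def missing_ids(start, end, seen_ids):
    """ContactIDs in [start, end] that are not yet in the CSV."""
    seen = set(seen_ids)
    return [cid for cid in range(start, end + 1) if cid not in seen]


def sort_csv_by_contact_id(csv_path):
    """Rewrite the CSV with rows ordered numerically by contact_id."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if not fieldnames:
        return 0  # empty file: nothing to sort
    # Sort as integers so that 9 comes before 10, unlike a plain string sort.
    rows.sort(key=lambda row: int(row["contact_id"]))
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Sorting only at the end keeps the main loop append-only, which is what makes interrupted runs cheap to resume.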

Useful Flags

  • --start / --end: ContactID range
  • --csv: output CSV path (default: texas_bar_members.csv)
  • --timeout: HTTP timeout in seconds (default: 30)
  • --retries: retries per ContactID (default: 2)
  • --delay-min / --delay-max: random delay between requests
  • --proxy-file: path to proxy list file
  • --retry-missing: scrape only IDs missing from CSV
  • --sort-csv: sort CSV by contact_id when complete
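
For orientation, the flags above could be declared with argparse along these lines. This is a sketch, not the parser in main.py: the defaults for --csv, --timeout, and --retries come from the list above, while the delay defaults and the choice to make --start and --end required are placeholder assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Texas Bar scraper (sketch)")
    # Whether these two are required in the real script is an assumption.
    parser.add_argument("--start", type=int, required=True, help="first ContactID")
    parser.add_argument("--end", type=int, required=True, help="last ContactID")
    parser.add_argument("--csv", default="texas_bar_members.csv",
                        help="output CSV path")
    parser.add_argument("--timeout", type=float, default=30,
                        help="HTTP timeout in seconds")
    parser.add_argument("--retries", type=int, default=2,
                        help="retries per ContactID")
    # The delay defaults are not documented; 1.0 and 3.0 are placeholders.
    parser.add_argument("--delay-min", type=float, default=1.0)
    parser.add_argument("--delay-max", type=float, default=3.0)
    parser.add_argument("--proxy-file", default=None,
                        help="path to proxy list file")
    parser.add_argument("--retry-missing", action="store_true",
                        help="scrape only IDs missing from the CSV")
    parser.add_argument("--sort-csv", action="store_true",
                        help="sort the CSV by contact_id when complete")
    return parser
```

For example, `build_parser().parse_args(["--start", "148186", "--end", "148250", "--sort-csv"])` yields a namespace where `csv` keeps its default and `sort_csv` is true.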

Public Repo Notes

  • Keep proxies.txt out of git if it contains paid/private proxy credentials.
  • Prefer committing only proxies.example.txt.
  • Be respectful with request pacing (--delay-min / --delay-max) and follow site terms.
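
One way to picture the pacing advice is a wrapper that pauses for a random interval before every attempt. It is a hedged sketch: the function name, the delay defaults, the reading of "retries" as extra attempts after the first, and the shape of the error row are all assumptions rather than the script's documented behavior.

```python
import random
import time


def fetch_with_retries(fetch, contact_id, retries=2,
                       delay_min=1.0, delay_max=3.0, sleep=time.sleep):
    """Call fetch(contact_id), pausing a random interval before each attempt."""
    last_error = None
    for _attempt in range(retries + 1):  # first try plus `retries` retries
        # Random spacing between requests keeps the load on the site low.
        sleep(random.uniform(delay_min, delay_max))
        try:
            return fetch(contact_id)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
    # Every attempt failed: report an error row instead of raising.
    return {"status": "error", "error": str(last_error)}
```

Passing `sleep` as a parameter lets the pacing logic be exercised without actually waiting.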

Disclaimer

Use responsibly and in compliance with all applicable laws, website terms, and data-use policies.

About

A Python script that scrapes texasbar.com by ContactID range.
