Python web automation project to define a city boundary, split it into a grid, and scrape Google Maps (e.g. for "clinic") per cell. Output is a merged, deduplicated JSON of places.
- Boundary: Define the city as a polygon (e.g. `lahore.json`). You can create or replace it with the city boundary extractor.
- Grid: Build a grid of cells inside the boundary (`grid_builder.py` → `grid.json`). Cell size is configurable (e.g. 500 m or 2000 m).
- Scrape: For each cell, open Google Maps, search for a query (e.g. "clinic"), scroll results, and save places to `output/raw/<cell_id>.json`. Resumable; supports multiple workers.
- Merge: Merge all raw JSONs, deduplicate by place ID or name+address, and write e.g. `output/lahore_clinics.json`.
- Python 3.10+
- Playwright (Chromium) for browser automation

Install the dependencies and the Chromium browser:

    pip install -r requirements.txt
    playwright install chromium

Edit `config.json` for:

| Key | Description |
|---|---|
| `boundary_name` | City name (used by extractor and grid) |
| `boundary_file` | Boundary polygon JSON (e.g. `lahore.json`) |
| `grid_file` | Output path for the grid (e.g. `grid.json`) |
| `cell_size_m` | Grid cell size in metres (e.g. 500 or 2000) |
| `search_query` | Google Maps search term (e.g. "clinic") |
| `output_dir`, `raw_dir` | Where merged output and raw cell JSONs go |
| `num_workers` | Parallel scraper workers (each has its own browser) |

The scraper and boundary extractor have more options (delays, viewport, etc.) in the same file.
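
As a rough illustration of how these keys fit together, the sketch below loads a config file and fails early if a documented key is missing. The helper name `load_config` and the values in `EXAMPLE_CONFIG` are assumptions made for this example (the values reuse the examples from the table); the project's own loading code may differ.

```python
import json
from pathlib import Path

# Keys taken from the table above. load_config and EXAMPLE_CONFIG are
# hypothetical names used only for this sketch.
REQUIRED_KEYS = (
    "boundary_name", "boundary_file", "grid_file", "cell_size_m",
    "search_query", "output_dir", "raw_dir", "num_workers",
)

# Illustrative values only; adjust them to your city and query.
EXAMPLE_CONFIG = {
    "boundary_name": "Lahore",
    "boundary_file": "lahore.json",
    "grid_file": "grid.json",
    "cell_size_m": 500,
    "search_query": "clinic",
    "output_dir": "output",
    "raw_dir": "output/raw",
    "num_workers": 2,
}


def load_config(path):
    """Read a JSON config file and check that every documented key is present."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return config
```

Calling `load_config("config.json")` would return the parsed dictionary, or raise `KeyError` naming the absent keys.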
Use this when you want to create or replace the city boundary polygon (e.g. for a new city).
- Opens Google Maps and searches for the city (from `boundary_name` or `--city`).
- You click on the map to add boundary points; each click shows a point and the lat/long box at the bottom (same as in Google Maps).
- The script records each new coordinate from that box.
- When you close the browser window, it builds a polygon from all points in click order and saves it (e.g. `lahore.json`). That file is in the same format `grid_builder.py` expects.

Run:

    python city_boundary_extractor.py

Optional: `--city "City Name"`, `--output path/to/boundary.json`, `--width`, `--height`.

You need at least 3 points for a valid polygon. Then run `grid_builder.py` (or the full pipeline) to build the grid from the new boundary.
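
The final step of the extractor (turning recorded clicks into a saved polygon) can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: the function names `build_polygon` and `save_boundary` are hypothetical, and skipping consecutive duplicate readings is an assumption about what "records each new coordinate" means.

```python
import json
from pathlib import Path


def build_polygon(points):
    """Return [[lat, lon], ...] in click order, skipping repeated readings."""
    polygon = []
    for lat, lon in points:
        point = [float(lat), float(lon)]
        # Assumption: an unchanged reading of the lat/long box is not a new click.
        if not polygon or polygon[-1] != point:
            polygon.append(point)
    if len(polygon) < 3:
        raise ValueError("a valid polygon needs at least 3 distinct points")
    return polygon


def save_boundary(points, output_path):
    """Write the polygon as a JSON list of [lat, lon] pairs."""
    polygon = build_polygon(points)
    Path(output_path).write_text(json.dumps(polygon, indent=2), encoding="utf-8")
    return polygon
```

The saved file then has the `[[lat, lon], ...]` shape that the grid step reads.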

Builds `grid.json` from the boundary polygon: divides the bounding box into cells of size `cell_size_m` and keeps only cells whose center lies inside the polygon (point-in-polygon).

    python grid_builder.py

Uses `config.json`: `boundary_file`, `cell_size_m`, `boundary_name`, `grid_file`.
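
To make the bounding-box and point-in-polygon idea concrete, here is a minimal sketch. It is not the code in `grid_builder.py`: the ray-casting test, the flat metres-to-degrees approximation (about 111,320 m per degree of latitude), and the `r<row>_c<col>` cell ID format are all assumptions chosen for illustration.

```python
import math

METRES_PER_DEGREE_LAT = 111_320  # rough approximation, assumed for this sketch


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is [[lat, lon], ...] in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def build_grid(polygon, cell_size_m):
    """Tile the bounding box and keep cells whose center is inside the polygon."""
    lats = [p[0] for p in polygon]
    lons = [p[1] for p in polygon]
    min_lat, max_lat = min(lats), max(lats)
    min_lon, max_lon = min(lons), max(lons)
    mid_lat = (min_lat + max_lat) / 2
    dlat = cell_size_m / METRES_PER_DEGREE_LAT
    dlon = cell_size_m / (METRES_PER_DEGREE_LAT * math.cos(math.radians(mid_lat)))
    rows = math.ceil((max_lat - min_lat) / dlat)
    cols = math.ceil((max_lon - min_lon) / dlon)
    cells = []
    for r in range(rows):
        for c in range(cols):
            center_lat = min_lat + (r + 0.5) * dlat
            center_lon = min_lon + (c + 0.5) * dlon
            if point_in_polygon(center_lat, center_lon, polygon):
                # Hypothetical ID format; the real builder may name cells differently.
                cells.append({"id": f"r{r}_c{c}",
                              "center_lat": center_lat,
                              "center_lon": center_lon})
    return cells
```

A smaller `cell_size_m` produces more cells and therefore more searches per city, so the cell size trades coverage against scrape time.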

For each cell in `grid.json`, opens Maps, runs the search (e.g. "clinic"), pans to trigger "Search this area", scrolls the results list, and saves places to `output/raw/<cell_id>.json`. Skips cells that already have a file (resumable).

    python scraper.py

Or run only this step via the pipeline:

    python run.py --scrape-only

Uses `config.json` for search query, paths, delays, number of workers, etc.
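
The resumable behaviour comes down to skipping any cell whose raw file already exists. The sketch below shows that check plus one possible way to share the remaining cells among workers; `pending_cells` and `split_for_workers` are hypothetical names, and the round-robin split is an assumption, since the source does not say how `scraper.py` distributes work.

```python
from pathlib import Path


def pending_cells(cells, raw_dir):
    """Return cells that do not yet have output/raw/<cell_id>.json."""
    raw_dir = Path(raw_dir)
    return [cell for cell in cells
            if not (raw_dir / f"{cell['id']}.json").exists()]


def split_for_workers(cells, num_workers):
    """Round-robin split so each worker gets a similar number of cells."""
    return [cells[i::num_workers] for i in range(num_workers)]
```

Because the check relies only on the presence of the file, deleting a cell's JSON from the raw directory is enough to make that cell eligible for scraping again.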
Merges all output/raw/*.json (excluding _* files), deduplicates places by place ID or name+address, and writes e.g. output/lahore_clinics.json.
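
A minimal sketch of this merge and deduplication logic follows. It assumes each raw file holds a JSON list of place objects with `place_id`, `name`, and `address` fields; those field names, and the helpers `dedup_key` and `merge_raw`, are illustrative guesses rather than the actual schema or code of `merge_dedup.py`.

```python
import json
from pathlib import Path


def dedup_key(place):
    """Prefer the place ID; fall back to normalized name + address."""
    place_id = place.get("place_id")
    if place_id:
        return ("id", place_id)
    name = (place.get("name") or "").strip().lower()
    address = (place.get("address") or "").strip().lower()
    return ("name_addr", name, address)


def merge_raw(raw_dir):
    """Merge all raw JSON lists, keeping the first occurrence of each place."""
    seen = {}
    for path in sorted(Path(raw_dir).glob("*.json")):
        if path.name.startswith("_"):
            continue  # files starting with "_" are excluded from the merge
        for place in json.loads(path.read_text(encoding="utf-8")):
            seen.setdefault(dedup_key(place), place)
    return list(seen.values())
```

Neighbouring cells often return the same places near their shared edge, which is why deduplication happens across all files rather than per file.
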

    python merge_dedup.py

Or:

    python run.py --merge-only

Run all steps in order: grid → scrape → merge.

    python run.py

Run only specific steps:

- `python run.py --grid-only`: build `grid.json` from `lahore.json` (or the current `boundary_file`)
- `python run.py --scrape-only`: scrape only (requires an existing `grid.json`)
- `python run.py --merge-only`: merge only (requires existing `output/raw/*.json`)

| File / folder | Purpose |
|---|---|
| `config.json` | Boundary name/file, grid size, search query, paths, scraper and extractor options |
| `lahore.json` | Boundary polygon: `[[lat, lon], ...]` (create with `city_boundary_extractor.py` or by hand) |
| `grid.json` | Grid cells (`id`, `center_lat`, `center_lon`), produced by `grid_builder.py` |
| `city_boundary_extractor.py` | Interactive boundary capture from Google Maps (click on map, close browser to save polygon) |
| `grid_builder.py` | Build grid from boundary polygon |
| `scraper.py` | Scrape Google Maps per grid cell; saves to `output/raw/` |
| `merge_dedup.py` | Merge raw JSONs and write final places JSON |
| `run.py` | Run full pipeline or a single step |
| `output/raw/` | One JSON per grid cell (resumable scrape) |
| `output/lahore_clinics.json` | Final merged, deduplicated list of places |

- Google Maps may change its DOM; if the scraper or boundary extractor stops finding elements, selectors (e.g. `div[role="feed"]`, `button[jsaction="reveal.card.latLng"]`) may need updating.
- Automated use of Google Maps may violate its Terms of Service; use for personal or educational purposes only.