Bangalore Doctors Scraper

A Scrapy-based web scraper for extracting doctor information from Practo.com, specifically focusing on Bangalore-based medical practitioners.

Features

Scrapes doctor profiles from Practo.com
Extracts comprehensive doctor information including:
- Name and specialization
- Experience and education
- Clinic details and locations
- Consultation fees
- Ratings and reviews
- Available time slots
Google Maps integration for location verification
Data cleaning and validation utilities
Support for multiple output formats (JSON, CSV, MongoDB)
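
The data cleaning step could look roughly like the sketch below. It is not the code in utils/data_cleaner.py; the function names and the raw input formats (for example "₹500" or "12 Years Experience Overall") are assumptions made for illustration.

```python
import re
from typing import Optional


def normalize_whitespace(raw: Optional[str]) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join((raw or "").split())


def parse_fee(raw: Optional[str]) -> Optional[int]:
    """Return the first number in a fee string as an int, or None if absent.

    Assumed inputs look like '₹500' or 'Rs. 1,200'.
    """
    match = re.search(r"\d[\d,]*", raw or "")
    return int(match.group().replace(",", "")) if match else None


def parse_experience_years(raw: Optional[str]) -> Optional[int]:
    """Return the number of years from text such as '12 Years Experience Overall'."""
    match = re.search(r"(\d+)\s*(?:\+\s*)?years?", raw or "", re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Returning None for values that cannot be parsed keeps missing data distinguishable from a genuine zero in later validation.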

Installation

Clone the repository:

git clone <repository-url>
cd bangalore_doctors_scraper

Install dependencies:

pip install -r requirements.txt

Set up environment variables: Create a .env file in the root directory and add:

GOOGLE_MAPS_API_KEY=your_api_key_here
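
As a rough illustration of how the key might be read at runtime, the sketch below loads a .env file using only the standard library. The helper names load_env_file and get_maps_api_key are hypothetical, and the project itself may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Variables already present in the environment are left unchanged.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_maps_api_key() -> str:
    """Return GOOGLE_MAPS_API_KEY or raise a clear error if it is missing."""
    key = os.environ.get("GOOGLE_MAPS_API_KEY", "")
    if not key:
        raise RuntimeError(
            "GOOGLE_MAPS_API_KEY is not set; add it to .env or the environment"
        )
    return key
```

Failing early with an explicit message avoids sending unauthenticated requests later in the crawl.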

Usage

Basic Usage

Run the scraper with default settings:

python run_scraper.py

Advanced Usage

Run with custom parameters:

scrapy crawl practo_spider -a specialty="cardiology" -a location="bangalore"

Output Options

JSON: scrapy crawl practo_spider -o doctors.json
CSV: scrapy crawl practo_spider -o doctors.csv
MongoDB: Configure the connection in doctors_scraper/pipelines.py
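
The commands above can also be assembled programmatically, for example from a wrapper script. The sketch below only builds the argument list. build_crawl_command is a hypothetical helper and is not taken from run_scraper.py, while the spider name and the -a / -o options are the ones shown above.

```python
from typing import List, Optional


def build_crawl_command(
    specialty: Optional[str] = None,
    location: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build the `scrapy crawl` argument list for the practo_spider spider."""
    command = ["scrapy", "crawl", "practo_spider"]
    if specialty:
        # Each -a NAME=VALUE pair is passed to the spider as an argument.
        command += ["-a", f"specialty={specialty}"]
    if location:
        command += ["-a", f"location={location}"]
    if output:
        # The feed format is inferred from the file extension (.json, .csv).
        command += ["-o", output]
    return command


if __name__ == "__main__":
    print(" ".join(build_crawl_command("cardiology", "bangalore", "doctors.json")))
```

The resulting list can be handed to subprocess.run(command, check=True) from the project root to start the crawl.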

Project Structure

bangalore_doctors_scraper/
├── scrapy.cfg             # Scrapy configuration
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── setup.py               # Package setup
├── run_scraper.py         # Main execution script
├── doctors_scraper/       # Scrapy project directory
│   ├── settings.py        # Scrapy settings
│   ├── middlewares.py     # Custom middlewares
│   ├── pipelines.py       # Data processing pipelines
│   ├── items.py           # Data models
│   └── spiders/           # Spider implementations
│       └── practo_spider.py
├── utils/                 # Utility modules
│   ├── google_maps.py     # Google Maps integration
│   └── data_cleaner.py    # Data cleaning utilities
└── tests/                 # Test files
    └── test_spider.py
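
For orientation, a module like utils/google_maps.py might resolve a clinic address to coordinates through the Google Maps Geocoding API. The sketch below is an assumption about that approach, not the project's actual code: it builds the request URL and reads latitude and longitude from a decoded JSON response, without performing any network call. The function names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Public endpoint of the Google Maps Geocoding API (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Return the Geocoding API URL for an address, with query values escaped."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def extract_lat_lng(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a decoded response, or None on failure."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return float(location["lat"]), float(location["lng"])
```

Keeping URL construction and response parsing separate from the HTTP call makes both pieces easy to test without an API key.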

Configuration

Scrapy Settings

Key settings can be modified in doctors_scraper/settings.py:

DOWNLOAD_DELAY: Delay between requests (default: 1 second)
CONCURRENT_REQUESTS: Number of concurrent requests (default: 16)
USER_AGENT: User agent string for requests
ROBOTSTXT_OBEY: Whether to obey robots.txt (default: True)
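
As a sketch, the corresponding lines in doctors_scraper/settings.py could look like this. The numeric and boolean values are the defaults listed above; the USER_AGENT string is a placeholder assumption, since the actual value is not stated.

```python
# doctors_scraper/settings.py (fragment)

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 1

# Maximum number of requests Scrapy performs concurrently.
CONCURRENT_REQUESTS = 16

# Honor the rules published in the target site's robots.txt.
ROBOTSTXT_OBEY = True

# Placeholder value: replace with a string that identifies your crawler.
USER_AGENT = "bangalore_doctors_scraper (+https://example.com/contact)"
```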

Rate Limiting

To be respectful to the target website:

Default delay of 1 second between requests
Each delay is randomized to between 0.5 and 1.5 seconds
Auto-throttling enabled based on response times
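
In Scrapy, this behavior is normally controlled by the DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED settings. The fragment below assumes the project uses them in that way. The randomized_delay function is not part of Scrapy or of this project; it only illustrates the range Scrapy draws from when randomization is enabled (0.5 to 1.5 times DOWNLOAD_DELAY).

```python
import random

# Settings fragment (assumed values matching the description above).
DOWNLOAD_DELAY = 1               # base delay in seconds
RANDOMIZE_DOWNLOAD_DELAY = True  # wait 0.5x to 1.5x of DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True      # adapt the delay to observed response times


def randomized_delay(base_delay: float = DOWNLOAD_DELAY) -> float:
    """Illustrative only: sample a delay between 0.5x and 1.5x of base_delay."""
    return random.uniform(0.5 * base_delay, 1.5 * base_delay)
```

With a base delay of 1 second, every sampled value falls between 0.5 and 1.5 seconds.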

Data Fields

The scraper extracts the following information for each doctor:

name: Doctor's full name
specialty: Medical specialization
experience: Years of experience
education: Educational qualifications
clinic_name: Name of the clinic/hospital
clinic_address: Full address
location: City/area
consultation_fee: Fee for consultation
rating: Overall rating
review_count: Number of reviews
languages: Languages spoken
services: Medical services offered
availability: Available time slots
phone: Contact number
latitude: Geographic latitude (from Google Maps)
longitude: Geographic longitude (from Google Maps)
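
The fields above could be modeled as in the sketch below. The project's items.py presumably defines a Scrapy item; a standard-library dataclass is used here instead so the example runs without Scrapy. The class name DoctorRecord, the field types, and the defaults are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DoctorRecord:
    """One scraped doctor profile; only name and specialty are required here."""

    name: str
    specialty: str
    experience: Optional[int] = None        # years of experience
    education: Optional[str] = None
    clinic_name: Optional[str] = None
    clinic_address: Optional[str] = None
    location: Optional[str] = None          # city/area
    consultation_fee: Optional[int] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    languages: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)
    availability: List[str] = field(default_factory=list)  # time slots
    phone: Optional[str] = None
    latitude: Optional[float] = None        # from Google Maps
    longitude: Optional[float] = None       # from Google Maps


if __name__ == "__main__":
    record = DoctorRecord(name="Example Name", specialty="cardiology")
    print(sorted(asdict(record)))
```

Calling asdict(record) yields a plain dictionary that can be written to JSON or CSV, or inserted into MongoDB.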

Legal and Ethical Considerations

This scraper is for educational and research purposes only
Always respect the website's robots.txt and terms of service
Implement appropriate delays to avoid overwhelming the server
Do not use scraped data for commercial purposes without permission
Ensure compliance with data protection regulations (GDPR, etc.)

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is for educational purposes only. Users are responsible for ensuring their use complies with applicable laws and website terms of service.
