A hands-on project for learning web scraping with FastAPI, BeautifulSoup, and SQLModel. This skeleton provides the structure and tests, while you implement the core functionality.
Working through it, you will:
- Implement web scraping logic using BeautifulSoup (sketched after this list)
- Create and manage background tasks with FastAPI
- Work with SQLite databases using SQLModel
- Build RESTful APIs with proper error handling
- Write tests for web scraping functionality
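As an orientation for the scraping objectives (and Tasks 01-02 below), here is a minimal sketch of fetching and extraction with BeautifulSoup. The `httpx` client, the page structure, and the CSS selectors are assumptions made for illustration; the TODO comments in the skeleton define the real targets.

```python
# Minimal sketch only -- the httpx client and all selectors are assumptions.
import httpx
from bs4 import BeautifulSoup


def fetch_page(url: str) -> BeautifulSoup:
    """Download a page and parse it into a BeautifulSoup tree."""
    response = httpx.get(url, timeout=10.0)
    response.raise_for_status()  # surface HTTP errors early
    return BeautifulSoup(response.text, "html.parser")


def extract_products(soup: BeautifulSoup) -> list[dict]:
    """Pull name/price pairs out of hypothetical `.product` cards."""
    products = []
    for card in soup.select(".product"):  # selector is an assumption
        name = card.select_one(".product-name")
        price = card.select_one(".product-price")
        if name and price:
            products.append(
                {
                    "name": name.get_text(strip=True),
                    "price": price.get_text(strip=True),
                }
            )
    return products
```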
The project is organized into several tasks:
- Task-01: Implement the main scraping function
- Task-02: Create product extraction logic
- Task-03: Build the scrape job initialization endpoint
- Task-04: Implement the background scraping process (Tasks 03-04 are sketched after this list)
- Task-05: Create the job listing endpoint
- Task-06: Build the product listing endpoint
- Task-07: Implement the product detail endpoint
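Tasks 03 and 04 combine FastAPI's `BackgroundTasks` with SQLModel persistence. The sketch below shows one plausible shape; the names (`ScrapeJob`, `run_scrape_job`) and the SQLite path are hypothetical, and the skeleton's own models and signatures take precedence.

```python
# Illustrative sketch of a job model plus a background-task endpoint.
# ScrapeJob, run_scrape_job, and the database path are assumptions.
from typing import Optional

from fastapi import BackgroundTasks, FastAPI
from sqlmodel import Field, Session, SQLModel, create_engine


class ScrapeJob(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    url: str
    status: str = "pending"


engine = create_engine("sqlite:///scraper.db")
SQLModel.metadata.create_all(engine)
app = FastAPI()


def run_scrape_job(job_id: int) -> None:
    """Background worker: fetch pages, store products, mark the job done."""
    with Session(engine) as session:
        job = session.get(ScrapeJob, job_id)
        if job is None:
            return
        # ... call the Task-01/02 scraping functions here ...
        job.status = "completed"
        session.add(job)
        session.commit()


@app.post("/scrape/start")
def start_scrape(url: str, background_tasks: BackgroundTasks):
    # Persist the job first so it has an id, then schedule the worker.
    with Session(engine) as session:
        job = ScrapeJob(url=url)
        session.add(job)
        session.commit()
        session.refresh(job)
    background_tasks.add_task(run_scrape_job, job.id)
    return {"job_id": job.id, "status": job.status}
```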
You will need:
- Python 3.9+
- Poetry for dependency management
To set up the project:
- Clone the repository:
git clone <repository-url>
cd web-scraper
- Install dependencies:
poetry install
- Create a `.env` file (copy from `.env.example`):
cp .env.example .env
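The variable names below are hypothetical; `.env.example` in the repo is the authoritative list:

```
# Hypothetical example -- defer to .env.example for the real keys
DATABASE_URL=sqlite:///./scraper.db
```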
- Start the FastAPI server:
poetry run uvicorn app.main:app --reload
The API will be available at http://localhost:8000
To get started:
- Read through the code and understand the project structure
- Check the TODO comments in each file for implementation hints
- Run the tests to see what needs to be implemented:
poetry run pytest
- Start implementing each task in order
- Use the test suite to verify your implementation
Once the server is running, you can access:
- Interactive API docs: http://localhost:8000/docs
- Alternative API docs: http://localhost:8000/redoc
Available endpoints:
- `POST /scrape/start`: Start a new scraping job
- `GET /scrape/jobs`: List all scraping jobs
- `GET /products`: List all scraped products (with pagination)
- `GET /products/{product_id}`: Get a specific product
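With the dev server running, a quick smoke test might look like this. Whether `POST /scrape/start` takes a body or query parameters, and whether pagination uses `skip`/`limit`, are assumptions; check the interactive docs for the real schemas.

```python
# Smoke test against a running dev server; request shapes are assumptions.
import httpx

BASE = "http://localhost:8000"

job = httpx.post(f"{BASE}/scrape/start").json()
print(job)
print(httpx.get(f"{BASE}/scrape/jobs").json())
print(httpx.get(f"{BASE}/products", params={"skip": 0, "limit": 10}).json())
print(httpx.get(f"{BASE}/products/1").json())
```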
Run the test suite:
poetry run pytest
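If you add tests of your own, a minimal one might look like the following. The `app.main:app` import path is taken from the uvicorn command above; the assertion about the response shape is an assumption about your implementation.

```python
from fastapi.testclient import TestClient

from app.main import app  # import path taken from the uvicorn command above

client = TestClient(app)


def test_list_products_returns_ok():
    # Only checks that the route answers; the exact schema is up to you.
    response = client.get("/products")
    assert response.status_code == 200
    assert isinstance(response.json(), list)
```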
The project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
Run all checks:
poetry run black .
poetry run isort .
poetry run flake8
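If you want the three tools to agree with each other, a common configuration (hypothetical for this repo, which may already ship one) is to point isort at Black's style in pyproject.toml; note that vanilla flake8 reads setup.cfg or .flake8 rather than pyproject.toml.

```toml
# Hypothetical pyproject.toml excerpt; the skeleton may already include this.
[tool.black]
line-length = 88

[tool.isort]
profile = "black"  # keeps isort's import style compatible with Black
```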
License: MIT