CodeMini100/web-scraper

Web Scraper - Educational Skeleton

A hands-on project for learning web scraping with FastAPI, BeautifulSoup, and SQLModel. This skeleton provides the structure and tests, while you implement the core functionality.

Learning Objectives

  1. Implement web scraping logic using BeautifulSoup
  2. Create and manage background tasks with FastAPI
  3. Work with SQLite databases using SQLModel
  4. Build RESTful APIs with proper error handling
  5. Write tests for web scraping functionality

Project Structure

The project is organized into several tasks:

  1. Task-01: Implement the main scraping function
  2. Task-02: Create product extraction logic
  3. Task-03: Build the scrape job initialization endpoint
  4. Task-04: Implement background scraping process
  5. Task-05: Create the job listing endpoint
  6. Task-06: Build the product listing endpoint
  7. Task-07: Implement the product detail endpoint
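
The core of Task-01 and Task-02 is pulling structured data out of HTML with BeautifulSoup. As a minimal sketch of the idea — the function name, the CSS selectors (`div.product`, `.title`, `.price`), and the returned fields are illustrative assumptions, not the project's actual markup:

```python
from bs4 import BeautifulSoup


def extract_products(html: str) -> list:
    """Parse product cards out of a page of HTML.

    The selectors below are placeholders; match them to the real
    markup of the site you scrape.
    """
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):
        title = card.select_one(".title")
        price = card.select_one(".price")
        products.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


sample = '<div class="product"><span class="title">Mug</span><span class="price">$5</span></div>'
print(extract_products(sample))  # → [{'title': 'Mug', 'price': '$5'}]
```

Guarding each `select_one` result keeps one malformed card from crashing the whole scrape, which the tests in this skeleton tend to exercise.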

Prerequisites

  • Python 3.9+
  • Poetry for dependency management

Setup

  1. Clone the repository:

     git clone <repository-url>
     cd web-scraper

  2. Install dependencies:

     poetry install

  3. Create a .env file (copy from .env.example):

     cp .env.example .env

  4. Start the FastAPI server:

     poetry run uvicorn app.main:app --reload

The API will be available at http://localhost:8000

Getting Started

  1. Read through the code and understand the project structure
  2. Check the TODO comments in each file for implementation hints
  3. Run the tests to see what needs to be implemented:

     poetry run pytest

  4. Start implementing each task in order
  5. Use the test suite to verify your implementation

API Documentation

Once the server is running, you can access:

  • Interactive API docs: http://localhost:8000/docs
  • Alternative API docs: http://localhost:8000/redoc

API Endpoints to Implement

Scraping

  • POST /scrape/start: Start a new scraping job
  • GET /scrape/jobs: List all scraping jobs

Products

  • GET /products: List all scraped products (with pagination)
  • GET /products/{product_id}: Get a specific product

Development

Running Tests

poetry run pytest

Code Style

The project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting

Run all checks:

poetry run black .
poetry run isort .
poetry run flake8

License

MIT
