CodeMini100/web-scraper

Web Scraper - Educational Skeleton

A hands-on project for learning web scraping with FastAPI, BeautifulSoup, and SQLModel. This skeleton provides the structure and tests, while you implement the core functionality.

Learning Objectives

  1. Implement web scraping logic using BeautifulSoup
  2. Create and manage background tasks with FastAPI
  3. Work with SQLite databases using SQLModel
  4. Build RESTful APIs with proper error handling
  5. Write tests for web scraping functionality

Project Structure

The project is organized into several tasks:

  1. Task-01: Implement the main scraping function
  2. Task-02: Create product extraction logic
  3. Task-03: Build the scrape job initialization endpoint
  4. Task-04: Implement background scraping process
  5. Task-05: Create the job listing endpoint
  6. Task-06: Build the product listing endpoint
  7. Task-07: Implement the product detail endpoint
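
The core of Task-01 and Task-02 is pulling structured data out of HTML with BeautifulSoup. As a minimal sketch of the idea — the function name, the CSS selectors (`div.product`, `.title`, `.price`), and the returned fields are illustrative assumptions, not the project's actual markup:

```python
from bs4 import BeautifulSoup


def extract_products(html: str) -> list:
    """Parse product cards out of a page of HTML.

    The selectors below are placeholders; match them to the real
    markup of the site you scrape.
    """
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):
        title = card.select_one(".title")
        price = card.select_one(".price")
        products.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


sample = '<div class="product"><span class="title">Mug</span><span class="price">$5</span></div>'
print(extract_products(sample))  # → [{'title': 'Mug', 'price': '$5'}]
```

Guarding each `select_one` result keeps one malformed card from crashing the whole scrape, which the tests in this skeleton tend to exercise.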

Prerequisites

  • Python 3.9+
  • Poetry for dependency management

Setup

  1. Clone the repository:

     git clone <repository-url>
     cd web-scraper

  2. Install dependencies:

     poetry install

  3. Create a .env file (copy from .env.example):

     cp .env.example .env

  4. Start the FastAPI server:

     poetry run uvicorn app.main:app --reload

The API will be available at http://localhost:8000

Getting Started

  1. Read through the code and understand the project structure
  2. Check the TODO comments in each file for implementation hints
  3. Run the tests to see what needs to be implemented:

     poetry run pytest

  4. Start implementing each task in order
  5. Use the test suite to verify your implementation

API Documentation

Once the server is running, you can access:

  • Interactive API docs: http://localhost:8000/docs
  • Alternative API docs: http://localhost:8000/redoc

API Endpoints to Implement

Scraping

  • POST /scrape/start: Start a new scraping job
  • GET /scrape/jobs: List all scraping jobs

Products

  • GET /products: List all scraped products (with pagination)
  • GET /products/{product_id}: Get a specific product

Development

Running Tests

poetry run pytest

Code Style

The project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting

Run all checks:

poetry run black .
poetry run isort .
poetry run flake8

License

MIT
