RayanRal/crawl-compass

Crawl Compass

Crawls a website and generates a spec-compliant llms.txt file.
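For orientation, the llms.txt proposal describes a Markdown file with an H1 title, a blockquote summary, and H2 sections that each hold a list of links. The sketch below renders that general shape. The names `Link`, `Section`, and `render_llms_txt` are hypothetical and chosen for illustration; this is not Crawl Compass's actual generator.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(title: str, summary: str, sections: list[Section]) -> str:
    """Render an H1 title, a blockquote summary, and H2 sections of link lists."""
    lines = [f"# {title}", "", f"> {summary}"]
    for section in sections:
        lines += ["", f"## {section.heading}", ""]
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not output from a real crawl.
example = render_llms_txt(
    "Example Site",
    "A short summary of what the site covers.",
    [Section("Docs", [Link("Getting started", "https://example.com/start", "Setup guide")])],
)
print(example)
```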

Prerequisites

Tool          Install                    Purpose
Python 3.12+  python.org                 Runtime
uv            brew install uv            Python package manager
AWS CLI       brew install awscli        AWS access
AWS SAM CLI   brew install aws-sam-cli   Build and deploy
Docker        Docker Desktop             Lambda-compatible builds

Local development

Setup

make setup                                 # configures git hooks
cd backend && uv sync --all-groups         # installs dependencies

Run backend

cd backend && uv run fastapi dev app/main.py    # starts dev server at http://localhost:8000

Run frontend

Open frontend/index.html directly in a browser. The API_BASE constant at the top of the page's script tag points to http://localhost:8000, the local dev server.

Deployment

AWS credentials

aws configure    # enter Access Key ID, Secret Key, region (e.g. eu-west-1), output format (json)

First deploy

make deploy-guided               # interactive: creates samconfig.toml, provisions all AWS resources
make deploy-frontend             # uploads frontend to S3

Subsequent deploys

make deploy                      # builds and deploys backend
make deploy-frontend             # uploads frontend to S3 with the live API URL substituted in

Tear down

make delete                      # deletes the entire CloudFormation stack

CI/CD

GitHub Actions deploys automatically on every push to main.

Architecture

Browser → API Gateway → ApiFunction (Lambda)
                              │ SQS
                        WorkerFunction (Lambda)
                              │
                        DynamoDB (job status)
                        S3 (results)

Browser → S3 static website (frontend)
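The request flow in the diagram can be sketched in-process to make the hand-off between the two functions concrete. This is a simplified model under stated assumptions, not the project's code: a `queue.Queue` stands in for SQS, plain dicts stand in for the DynamoDB table and the S3 bucket, and `api_generate`, `worker_handle_one`, `crawl`, and `generate` are hypothetical names.

```python
import queue
import uuid

job_queue = queue.Queue()        # stand-in for the SQS queue
job_status: dict[str, str] = {}  # stand-in for the DynamoDB job-status table
results: dict[str, str] = {}     # stand-in for the S3 results bucket


def api_generate(url: str) -> str:
    """API side: mark the job as queued, enqueue it, and return the job id."""
    job_id = uuid.uuid4().hex
    job_status[job_id] = "queued"
    job_queue.put({"job_id": job_id, "url": url})
    return job_id


def worker_handle_one(crawl, generate) -> None:
    """Worker side: take one message and walk it through the status sequence."""
    message = job_queue.get()
    job_id = message["job_id"]
    try:
        job_status[job_id] = "crawling"
        pages = crawl(message["url"])
        job_status[job_id] = "generating"
        results[job_id] = generate(pages)
        job_status[job_id] = "complete"
    except Exception:
        # Any error in crawling or generation marks the job as failed.
        job_status[job_id] = "failed"
```

Keeping the API side limited to enqueueing is what lets it return a job id immediately while the slower crawl runs in the worker.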

API endpoints:

  • POST /generate: returns a job_id and enqueues the crawl job
  • GET /job_status/{job_id}: poll for status (queued → crawling → generating → complete / failed)
  • GET /results/{job_id}: fetch the plain-text llms.txt (local dev only; on AWS the status response includes a pre-signed S3 URL)
