Atbash-Labs/ragdocs

RAG Docs

A Python application for crawling and scraping documentation with the Firecrawl API.

Features

  • Crawls websites and follows child links
  • Converts scraped content to markdown format
  • Saves documentation files with sanitized filenames
  • Handles duplicate filenames automatically
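
The last two features can be sketched as follows. This is a minimal illustration using only the standard library, not the code in firecrawlbasics.py; the helper names `sanitize_filename` and `unique_path`, the allowed character set, and the `_1`, `_2` suffix scheme are all assumptions.

```python
import re
from pathlib import Path


def sanitize_filename(title: str, max_length: int = 100) -> str:
    """Turn a page title or URL into a filesystem-safe file stem (hypothetical helper)."""
    # Collapse every run of characters outside [A-Za-z0-9._-] into one underscore,
    # then trim leading and trailing dots and underscores.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("._")
    # Fall back to a fixed name if nothing usable is left.
    return stem[:max_length] or "untitled"


def unique_path(folder: Path, stem: str, suffix: str = ".md") -> Path:
    """Return a path in `folder` that does not exist yet (hypothetical helper)."""
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    # Append _1, _2, ... until the name is free.
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

A caller would typically compute `unique_path(Path(output_folder), sanitize_filename(page_title))` for each scraped page and write the markdown to the returned path.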

Requirements

  • Python 3.x
  • Firecrawl API key

Installation

pip install firecrawl-py

Usage

  1. Set your Firecrawl API key (recommended: use environment variables)
  2. Update the url and max_pages variables in firecrawlbasics.py
  3. Run the script:
python firecrawlbasics.py

Configuration

The script can be configured by modifying variables in firecrawlbasics.py:

  • url: The starting URL to crawl
  • max_pages: Maximum number of pages to crawl
  • output_folder: Folder where the markdown files are saved
  • include_paths: Paths to include in the crawl
  • exclude_paths: Paths to exclude from crawling
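
As an illustration, the variables above might be set like this. Every value is a placeholder, and the use of lists of path patterns for `include_paths` and `exclude_paths` is an assumption; check firecrawlbasics.py for the types the script actually expects.

```python
# Hypothetical values for illustration only; replace them with your own.
url = "https://example.com/docs"        # starting URL to crawl
max_pages = 50                           # upper bound on the number of pages crawled
output_folder = "docs_output"            # folder where markdown files are saved
include_paths = ["docs/.*"]              # assumed: patterns for paths to include
exclude_paths = ["docs/archive/.*"]      # assumed: patterns for paths to skip
```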

Security Note

⚠️ Important: Move your Firecrawl API key to an environment variable instead of hardcoding it in the script.
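
One way to do this is to read the key from the environment at startup and fail early if it is missing. The sketch below uses only the standard library; the variable name `FIRECRAWL_API_KEY` and the helper `load_api_key` are assumptions, not names taken from the script.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = "FIRECRAWL_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is unset (hypothetical helper)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the script.")
    return key
```

The script could then call `load_api_key()` once and pass the result to the Firecrawl client instead of a hardcoded string, which keeps the key out of version control.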

License

[Add your license here]
