State-of-the-Art (SOTA) AI-Powered Codebase Extractor
Stop manually copy-pasting code into ChatGPT. Let AI compress your entire repository into a single, token-optimized text file.
Welcome to Crawlable!
If you've ever tried to feed an entire project to an AI model (like ChatGPT, Claude, or Gemini) to ask for a refactor or bug fix, you know the struggle. You hit token limits, you accidentally upload node_modules or .git folders, and the AI loses context.
Crawlable solves this. It is an intelligent, high-speed CLI tool that scans your project directory, uses Google's powerful Gemini AI to figure out what files are useless noise, and asynchronously extracts only the core, proprietary source code. It outputs a beautiful, human-readable (and LLM-readable) text artifact that you can instantly drop into any AI assistant.
Whether you are a seasoned software architect or a non-technical manager trying to understand a codebase, Crawlable does the heavy lifting for you.
- 🧠 AI-Powered Filtering: Uses Google Gemini 2.5 Flash to dynamically identify and ignore build artifacts, lock files, and useless dependencies.
- ⚡ Asynchronous Extraction: Built on Python's `asyncio`, it reads hundreds of files concurrently for blazing-fast performance.
- 🛡️ Smart Truncation Algorithm: Automatically collapses massive directories (like `node_modules` or `venv`) in the project roadmap so you don't waste valuable AI tokens.
- 🎨 Beautiful Terminal UI: Powered by the `rich` library, featuring live spinning progress bars, dynamic tables, and real-time status updates.
- 📦 Versioned Output: Automatically groups your extractions into timestamped folders (`crawlable_output/Project_YYYY-MM-DD_HH-MM/`), keeping your workspace clean.
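To make the Smart Truncation idea concrete, here is a minimal, illustrative sketch, not Crawlable's actual implementation: the `build_roadmap` name and the `MAX_ENTRIES` threshold are our assumptions, chosen only to show the collapsing behavior.

```python
from pathlib import Path

# Hypothetical threshold: directories with more entries than this get collapsed.
MAX_ENTRIES = 25

def build_roadmap(root: Path, depth: int = 0) -> list[str]:
    """Recursively render a directory tree, collapsing oversized folders."""
    lines: list[str] = []
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        indent = "    " * depth
        if entry.is_dir():
            children = list(entry.iterdir())
            if len(children) > MAX_ENTRIES:
                # Collapse noisy directories (node_modules, venv, ...) to one line
                # instead of spending tokens on every child.
                lines.append(f"{indent}{entry.name}/ [... {len(children)} entries collapsed]")
            else:
                lines.append(f"{indent}{entry.name}/")
                lines.extend(build_roadmap(entry, depth + 1))
        else:
            lines.append(f"{indent}{entry.name}")
    return lines
```

The point of the design: the roadmap stays complete enough for an AI to navigate the project, while bulk directories cost a single line rather than thousands.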
Don't have a Computer Science degree? Never used the terminal before? No problem. Follow these 4 easy steps to get Crawlable running on your machine.
**Step 1: Install Python**

You need Python to run this tool.
- Go to Python.org and download the latest version for your operating system.
- CRITICAL (Windows Users): When the installer opens, make sure to check the box that says "Add Python to PATH" before clicking Install.
**Step 2: Get a Gemini API Key**

Crawlable uses Google's AI brain (Gemini), which requires an API key to access.
- Go to Google AI Studio.
- Sign in with your Google account.
- Click "Get API key" on the left menu.
- Click "Create API key" and copy the long string of letters and numbers it gives you. Keep this secret!
**Step 3: Download and Install Crawlable**

Open your computer's terminal (Command Prompt on Windows, Terminal on Mac) and run these commands one by one:
```shell
# 1. Download the code to your computer
git clone https://github.com/Med-Gh-TN/Crawlable.git

# 2. Go into the project folder
cd Crawlable

# 3. Install the required packages
pip install -r requirements.txt
```
**Step 4: Add Your API Key**

- Open the Crawlable folder on your computer.
- Navigate to `src/config.py` and open it with any text editor (Notepad, VS Code, TextEdit).
- Find the line that says `API_KEY = "YOUR_API_KEY_HERE"`.
- Replace `YOUR_API_KEY_HERE` with the key you got from Google in Step 2. (Keep the quotation marks!)
- Save the file. You're ready to go!
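After editing, the relevant part of `src/config.py` should look roughly like this. This is an illustrative excerpt only; the real file may contain other settings, and the environment-variable fallback shown is a safety habit we're suggesting, not a documented Crawlable feature.

```python
# src/config.py (illustrative excerpt)
import os

# Option A (what Step 4 above describes): paste your key directly
# between the quotes, e.g. API_KEY = "your-real-key".
#
# Option B (an assumption on our part, not built into Crawlable): read the
# key from a GEMINI_API_KEY environment variable so it never lands in git,
# falling back to the placeholder if the variable is unset.
API_KEY = os.environ.get("GEMINI_API_KEY", "YOUR_API_KEY_HERE")
```

Keeping the key out of the source file means you can safely share or commit your copy of the project without leaking credentials.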
Using Crawlable is incredibly simple. Open your terminal and run the main script, followed by the path of the folder you want to analyze.
Command:
```shell
python main.py /path/to/your/target/project
```
(Tip: You can literally drag and drop a folder from your desktop into the terminal window to automatically paste its path!)
Expected Output:
The terminal will display a gorgeous dashboard as it works through the 4 phases. Once complete, look inside the crawlable_output/ folder. You will find:
- `project_roadmap.txt`: A clean, tree-like map of the project.
- `source_code.txt`: The consolidated, filtered source code, ready to be fed to an AI.
- `prompt.txt`: A base prompt template you can use to start your AI conversation.
For the technical folks, Crawlable operates on a highly decoupled, service-oriented architecture:
- Phase 1: Structural Crawl (`AsyncFileSystemService`): Generates a pre-filtered map of the directory, instantly applying `HARDCODED_EXCLUSIONS` (guaranteed noise) and our Smart Truncation algorithm.
- Phase 2: Intelligent AI Filtering (`GeminiFilterService`): Passes the roadmap to `gemini-2.5-flash` with a strict JSON-schema prompt to dynamically identify project-specific noise. Includes exponential backoff and retry logic for API resilience.
- Phase 3: Targeted Extraction (`AsyncCodeExtractorService`): Fires off non-blocking `asyncio.gather` tasks to read all approved files concurrently, updating the `rich` UI progress bar in real time.
- Phase 4: Output Generation (`CrawlablePipeline`): Orchestrates the final step, saving versioned artifacts to the file system.
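The heart of Phase 3, launching all file reads concurrently with `asyncio.gather`, can be sketched in a few lines. This is a simplified stand-in for `AsyncCodeExtractorService`, not its actual code; the function names here are our own.

```python
import asyncio
from pathlib import Path

async def read_file(path: Path) -> tuple[str, str]:
    """Read one file in a worker thread so the event loop never blocks."""
    text = await asyncio.to_thread(path.read_text, encoding="utf-8", errors="replace")
    return str(path), text

async def extract_all(paths: list[Path]) -> dict[str, str]:
    """Fire off every read at once; gather() waits for all of them together."""
    results = await asyncio.gather(*(read_file(p) for p in paths))
    return dict(results)
```

Because each read is awaited concurrently rather than sequentially, the total wall-clock time is dominated by the slowest file instead of the sum of all files.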
We want Crawlable to be the absolute standard for AI code extraction. Contributions from developers of all skill levels are highly welcomed!
How to contribute:
- Fork the Project.
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`).
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the Branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Note: If you find a bug or have a feature request, please open an Issue first so we can discuss it!
Distributed under the Apache 2.0 License. See LICENSE for more information. This grants you the freedom to use, modify, and distribute the software, even commercially, under the terms of the license.
Mouhamed Gharsallah - GitHub: @Med-Gh-TN
Built with ❤️ for the open-source community. Let's make AI collaboration seamless.