Crawlytics

Overview

Crawlytics is an open-source analytics platform for detecting, analyzing, and visualizing Large Language Model (LLM) crawler traffic in server logs. This tool processes Apache and Nginx log files to identify and extract requests from AI crawlers such as ChatGPT, Gemini, Claude, and other LLM-based systems.

This project is 100% free and open-source software.

Core Functionality

LLM Crawler Detection: Identifies requests from AI systems in standard server logs
Data Visualization: Provides heatmaps, charts, and geographic maps for traffic analysis
High-Performance Processing: Efficiently processes large log files using parallel processing
SQL Query Interface: Supports read-only SQL queries for custom data exploration
Multi-dimensional Analysis: Examines traffic patterns by time, geography, and crawler type
Detailed Log Inspection: Offers filterable, paginated views of individual log entries
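
As an illustration of what processing a standard server log involves, the sketch below parses one line in the Apache/Nginx Combined Log Format into named fields. The regular expression and the `parse_line` helper are assumptions made for this example; the parser that ships in the backend may work differently.

```python
import re
from typing import Dict, Optional

# Combined Log Format:
# ip identity user [time] "method path protocol" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, not taken from a real log.
sample = (
    '203.0.113.7 - - [10/May/2025:13:55:36 +0000] '
    '"GET /docs/intro HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"'
)
fields = parse_line(sample)
```

Lines that do not follow the combined format return `None`, so a caller can count or skip them instead of failing on the first malformed entry.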

Architecture

Backend: Python/FastAPI application with SQLite database for storage and query processing
Frontend: React-based dashboard with responsive visualizations
Packaging: Build scripts for creating standalone executables
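
One way a read-only SQL interface over SQLite can be enforced is to open the database file in read-only mode, so that any write statement is rejected by SQLite itself. The sketch below shows that idea with Python's standard `sqlite3` module. The `run_readonly_query` helper is hypothetical, and this README does not say how the backend actually enforces read-only access.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple


def run_readonly_query(db_path: str, sql: str) -> List[Tuple]:
    """Run a query against a SQLite file opened in read-only mode."""
    # Build a file: URI and append mode=ro so SQLite refuses any write.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

With this approach, statements such as `INSERT`, `DELETE`, or `DROP` raise `sqlite3.OperationalError` rather than modifying the data.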

Setup Instructions

Clone repository:

git clone https://github.com/danilotrix86/crawlytics.git
cd crawlytics\backend  # Windows
# OR
cd crawlytics/backend  # Linux/Mac

Configure Python environment:

python -m venv venv  # Windows
# OR
python3 -m venv venv  # Linux/Mac

.\venv\Scripts\activate  # Windows
# OR
source venv/bin/activate  # Linux/Mac

Install backend dependencies:

pip install -r requirements.txt  # Windows
# OR
pip3 install -r requirements.txt  # Linux/Mac

Start the application:

python run_app.py  # Windows
# OR
python3 run_app.py  # Linux/Mac

Launching After First-Time Setup

Once you have completed the initial setup, you don't need to repeat all steps when launching the application in the future:

Navigate to the backend directory:
```
cd crawlytics\backend  # Windows
# OR
cd crawlytics/backend  # Linux/Mac
```

Activate the virtual environment:

.\venv\Scripts\activate  # Windows
# OR
source venv/bin/activate  # Linux/Mac

Start the server:

python run_app.py  # Windows
# OR
python3 run_app.py  # Linux/Mac

Frontend Development

Note: The following section is only relevant if you intend to make changes to the frontend code. These steps are not required for normal usage of the application.

After modifying frontend code, rebuild and copy to the backend directory:

Windows:

cd fe
npm install
npm run build
cd ..
xcopy /E /I /Y fe\dist backend\react

Mac/Linux:

cd fe
npm install
npm run build
cd ..
mkdir -p backend/react
cp -R fe/dist/* backend/react/
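
If you prefer a single copy step that behaves the same on every platform, the copy can also be done from Python. This is a minimal sketch, not part of the repository: the `copy_frontend_build` helper is hypothetical and only mirrors the `fe/dist` to `backend/react` copy shown in the commands above.

```python
import shutil
from pathlib import Path


def copy_frontend_build(repo_root: Path) -> Path:
    """Copy the built frontend from fe/dist into backend/react."""
    src = repo_root / "fe" / "dist"
    dst = repo_root / "backend" / "react"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} not found; run 'npm run build' in fe/ first")
    # dirs_exist_ok=True (Python 3.8+) creates dst if needed and overwrites
    # existing files, similar to xcopy /E /I /Y or mkdir -p followed by cp -R.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Run it from the repository root after `npm run build`, for example with `copy_frontend_build(Path("."))`.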

Customizing LLM Crawler Detection

You can customize which LLM crawlers are detected by editing the crawler patterns list in backend/parser/llm_list.py. This allows you to:

Add new LLM crawler user agent patterns as they emerge
Remove patterns you don't want to track
Modify existing patterns to improve detection accuracy

Example of the crawler patterns list:

LLM_CRAWLER_PATTERNS = [
    # OpenAI
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",

    # Anthropic
    "ClaudeBot",
    "Claude-Web",
    "Anthropic-AI",
    
    # Google
    "googlebot",
    "Google-Extended",
    
    # Add your custom patterns here
    "Your-Custom-LLM-Crawler",
]

Make these changes before processing log files so that the updated patterns are applied when the logs are analyzed.
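
To show how a pattern list like this can be applied, the sketch below checks a user agent string against each pattern with a case-insensitive substring match. Both the `match_crawler` helper and the case-insensitive comparison are assumptions for illustration; this README does not state how the backend compares patterns, so check the parser code before relying on a particular behavior.

```python
from typing import List, Optional

# Shortened copy of the example list above, for illustration only.
LLM_CRAWLER_PATTERNS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "googlebot",
    "Google-Extended",
]


def match_crawler(user_agent: str,
                  patterns: List[str] = LLM_CRAWLER_PATTERNS) -> Optional[str]:
    """Return the first pattern contained in the user agent, ignoring case."""
    lowered = user_agent.lower()
    for pattern in patterns:
        if pattern.lower() in lowered:
            return pattern
    return None
```

Because the first matching pattern wins, list order matters when one pattern is a substring of another; placing the more specific pattern first avoids ambiguous results.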

Screenshots

Main Dashboard

The main dashboard provides a comprehensive overview of your website's crawler activity:

Total log entries with detailed database statistics
Unique LLM crawlers detected with crawler identification
Distinct URLs requested by crawlers showing content accessed
Top crawler statistics with percentage breakdowns
Error rate analysis with client/server error distribution
Crawler density metrics showing average crawlers per page
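
The metrics listed above can be derived from a flat list of log entries. The following sketch computes them under assumed definitions: error rates as a share of all requests, and crawler density as the average number of distinct crawlers per URL. The `dashboard_summary` function and its input shape are hypothetical, and the dashboard's exact formulas may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def dashboard_summary(entries: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Summarize (crawler, path, status) tuples into dashboard-style metrics."""
    rows = list(entries)
    total = len(rows)
    crawlers_per_path = defaultdict(set)
    client_errors = server_errors = 0
    for crawler, path, status in rows:
        crawlers_per_path[path].add(crawler)
        if 400 <= status < 500:
            client_errors += 1
        elif 500 <= status < 600:
            server_errors += 1
    pages = len(crawlers_per_path)
    return {
        "total_entries": total,
        "unique_crawlers": len({crawler for crawler, _, _ in rows}),
        "distinct_urls": pages,
        "client_error_rate": client_errors / total if total else 0.0,
        "server_error_rate": server_errors / total if total else 0.0,
        # Average number of distinct crawlers seen per URL.
        "avg_crawlers_per_page": (
            sum(len(c) for c in crawlers_per_path.values()) / pages if pages else 0.0
        ),
    }
```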

Top Crawler Activity Timeline

The stacked area chart shows crawler activity over time:

Visualizes traffic patterns for multiple crawler types simultaneously
Color-coded areas for each crawler (ChatGPT-User, Googlebot, Bytespider, etc.)
Daily activity trends showing peak usage periods
Comparative view of crawler distribution and market share

Traffic Insight Heatmap

The detailed heatmap visualization shows:

Traffic intensity by day of week and hour of day
Color gradient indicating request volume
Clear patterns of high and low activity periods
Easy identification of when LLM crawlers are most active on your site
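
A heatmap of this kind is built from request counts grouped by day of week and hour of day. The sketch below shows one way to compute those counts from access-log timestamps. The `heatmap_counts` helper is hypothetical and assumes the timestamp format used by Apache and Nginx combined logs.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Timestamp layout of Apache/Nginx combined logs, e.g. 10/May/2025:13:55:36 +0000.
# %b parses English month abbreviations under the default C locale.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def heatmap_counts(timestamps: Iterable[str]) -> Counter:
    """Count requests per (weekday, hour) cell; weekday 0 is Monday."""
    counts: Counter = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, LOG_TIME_FORMAT)
        cell: Tuple[int, int] = (moment.weekday(), moment.hour)
        counts[cell] += 1
    return counts
```

Each key is a `(weekday, hour)` pair, so the result maps directly onto a 7 by 24 grid. Hours are taken in the time zone recorded in the log; convert the timestamps first if the heatmap should use a different zone.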

Geographic Insights

The geographic insight panel displays:

Global request distribution with percentage breakdowns
Color-coded regions by request volume
Regional traffic patterns and hotspots
Comprehensive breakdown of traffic sources by region

Crawler Behavior Analysis

The crawler behavior panel provides advanced metrics:

Crawler speed metrics showing requests per minute
Path depth analysis showing navigation patterns
Error rate statistics with troubleshooting guidance
Path analysis showing most accessed content by crawler
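
To make two of these metrics concrete, the sketch below computes requests per minute and path depth under assumed definitions: speed is the request count divided by the time span between the first and last request, and depth is the number of non-empty path segments. Both helpers are hypothetical, and the panel may define these metrics differently.

```python
from datetime import datetime
from typing import Sequence


def requests_per_minute(times: Sequence[datetime]) -> float:
    """Average request rate over the span between first and last request."""
    if len(times) < 2:
        return float(len(times))
    span_minutes = (max(times) - min(times)).total_seconds() / 60
    # If all requests share one timestamp, fall back to the raw count.
    return len(times) / span_minutes if span_minutes > 0 else float(len(times))


def path_depth(path: str) -> int:
    """Number of non-empty segments in a URL path, ignoring the query string."""
    without_query = path.split("?", 1)[0]
    return len([segment for segment in without_query.split("/") if segment])
```

For example, `path_depth("/docs/api/v1?x=1")` counts the segments `docs`, `api`, and `v1`, while the site root `/` has depth zero.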

Access Log Table

The logs table provides granular access to your data:

Advanced filtering by crawler, path, method, and status code
Date range selection with precise timestamps
Real-time IP address and request details
Sortable columns for customized analysis

Log Upload Interface

The simple upload interface allows you to:

Select and upload server log files (.log or .txt)
Process Apache, Nginx, or other standard log formats
Automatically analyze and extract LLM crawler data
Seamlessly integrate new data into your analytics

LLM Crawler Settings

The settings panel allows you to customize crawler detection:

Maintain a comprehensive list of LLM crawler patterns
Add new crawler user agents as they emerge
Configure detection parameters for optimal accuracy
Reset to default patterns or save custom configurations

Navigation Sidebar

The sidebar navigation provides quick access to:

Dashboard overview and key metrics
Traffic insight and pattern analysis
Geographic distribution visualization
Crawler behavior statistics
Detailed logs table with filtering
Log file upload interface
Settings and configuration options
Log file management and organization

Contact

For questions, feature requests, or technical support, please:

Open an issue on GitHub (preferred)
Send an email to the maintainer with "CRAWLYTICS" in the subject line: danilo.vaccalluzzo@gmail.com

License

Crawlytics is released under the MIT License.

This means you can:

Use the software commercially
Modify the source code
Distribute modified versions
Use it privately
Sublicense it

The only requirement is to include the original copyright notice and license text in any copy of the software/source.

Contributions welcome. Report issues via GitHub.
