Professional attachment storage analysis tool for Jira Data Center
Analyze your Jira instance's attachment storage, identify duplicates, and discover optimization opportunities through beautiful visual reports.
- Deep Storage Analysis - Scan all attachments across your Jira instance
- Visual Reports - Interactive HTML reports with 6 analysis dimensions
- Heat Index - Identify "Frozen Dinosaurs" (large, inactive files)
- High Performance - Multi-threaded downloads, streaming hash calculation
- Resume Capability - Interrupt and resume scans anytime
- Duplicate Detection - Find identical files wasting storage
- Comprehensive Insights - By project, file type, user, age, and status
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Copy the example environment file and edit it:
# Windows:
copy .env.example .env
# Linux/Mac:
cp .env.example .env
Edit .env with your Jira credentials:
# Jira Connection
JIRA_BASE_URL=https://your-jira-instance.com
JIRA_TOKEN=your_personal_access_token_here
# OR use username/password (less secure)
# JIRA_USERNAME=your_username
# JIRA_PASSWORD=your_password
Tip: Personal Access Tokens are recommended for Jira DC 8.14+
python test_connection.py
Expected output:
Connected to Jira successfully!
User: John Doe (john.doe)
Authentication working
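The internals of test_connection.py are not shown in this README. As a rough sketch of how the two credential options in .env could be turned into request headers, assuming the token is sent as a Bearer token and the check targets Jira's `/rest/api/2/myself` resource (the helper names `build_auth_headers` and `myself_url` are hypothetical):

```python
import base64
import os


def build_auth_headers(env=os.environ):
    """Build HTTP headers from .env-style settings, preferring the token."""
    token = env.get("JIRA_TOKEN", "").strip()
    if token:
        # Jira Data Center Personal Access Tokens are sent as a Bearer token.
        return {"Authorization": f"Bearer {token}", "Accept": "application/json"}

    username = env.get("JIRA_USERNAME", "").strip()
    password = env.get("JIRA_PASSWORD", "")
    if username and password:
        # Fallback: HTTP Basic authentication (less secure than a token).
        raw = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {encoded}", "Accept": "application/json"}

    raise ValueError("Set JIRA_TOKEN, or JIRA_USERNAME and JIRA_PASSWORD, in .env")


def myself_url(base_url):
    """URL of the 'current user' resource, a cheap way to verify credentials."""
    return base_url.rstrip("/") + "/rest/api/2/myself"
```

A GET request to `myself_url(JIRA_BASE_URL)` with these headers returns the authenticated user's profile when the credentials are valid, and an HTTP 401 otherwise.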
python jira_dc_scanner.py
The scanner will:
- Connect to Jira
- Count total issues
- Scan attachments (with progress bar)
- Save results to ./reports/scan_XXXXXXXX.json
- Automatically generate HTML visual report
- Display summary and report path
Open the generated HTML file in your browser:
./reports/scan_XXXXXXXX_visual_analysis.html
Edit config.yaml to customize scan behavior:
jira:
  base_url: "https://your-jira.com"
  verify_ssl: true
scan:
  batch_size: 100 # Issues per batch
  thread_pool_size: 12 # Parallel downloads
  rate_limit_per_second: 50 # API requests/sec
  use_content_hash: true # Hash file content (vs URL)
  max_file_size_gb: 5 # Skip files larger than this
  download_timeout_seconds: 300
filters:
  # Scan specific projects only
  projects: [] # Empty = all projects
  # Date range
  date_from: "" # Format: YYYY-MM-DD
  date_to: "" # Format: YYYY-MM-DD
  # OR use custom JQL
  custom_jql: "" # Overrides all other filters
output:
  output_dir: "./reports"
storage:
  database_path: "./scans/scan.db"
  checkpoint_interval: 100 # Save progress every N issues
logging:
  level: "INFO"
  file: "./logs/scanner.log"
  console: true
If a scan is interrupted (Ctrl+C, network issue, etc.):
python jira_dc_scanner.py --resume SCAN_ID
Example:
python jira_dc_scanner.py --resume 9f7629d1
To regenerate the report from an existing scan file:
python generate_visual_analysis_report.py scan_9f7629d1.json
Customize "Frozen Dinosaurs" criteria:
# Files > 5 MB inactive for > 180 days
python generate_visual_analysis_report.py scan_9f7629d1.json --min-size 5 --min-days 180
The HTML report includes 6 interactive tabs:
- Storage consumption per project
- File counts and average sizes
- Color-coded by file size
- Top file types by storage
- File counts per extension
- Identify storage-heavy formats
- Top 10 storage consumers
- Format: Name Surname (username)
- Files uploaded per user
- Attachment age distribution
- Buckets: 0-90 days, 90-365 days, 1-2 years, 2-4 years, >4 years
- Identify archival candidates
- Storage by issue status
- Format: Status Name (ID: X) (when available)
- Find storage in closed/resolved issues
- "Frozen Dinosaurs": Large files on inactive issues
- Sortable table: Click any column header to sort
- Search & filter: Find specific files
- Pagination: Browse all files efficiently
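The age buckets and the "Frozen Dinosaurs" criteria above can be expressed as two small functions. This is only a sketch: the README does not say how boundary values are assigned or what the report generator's defaults are, so the sketch assumes lower-inclusive buckets, 1 MB = 1024 * 1024 bytes, and the thresholds from the `--min-size 5 --min-days 180` example. The names `age_bucket` and `is_frozen_dinosaur` are hypothetical.

```python
# Upper bounds (exclusive, in days) for each bucket, in ascending order.
AGE_BUCKETS = [
    (90, "0-90 days"),
    (365, "90-365 days"),
    (2 * 365, "1-2 years"),
    (4 * 365, "2-4 years"),
]


def age_bucket(age_days):
    """Return the age bucket label for an attachment that is age_days old."""
    for upper_bound, label in AGE_BUCKETS:
        if age_days < upper_bound:
            return label
    return ">4 years"


def is_frozen_dinosaur(size_bytes, days_inactive, min_size_mb=5, min_days=180):
    """True for a large file on an issue that has been inactive for a long time."""
    is_large = size_bytes > min_size_mb * 1024 * 1024
    is_inactive = days_inactive > min_days
    return is_large and is_inactive
```

For example, a 6 MB file on an issue untouched for 200 days matches the criteria, while the same file on an issue updated 100 days ago does not.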
Scan specific issues using custom JQL:
# config.yaml
filters:
  custom_jql: "project = MYPROJECT AND created >= -365d"
For large instances (>100K issues):
scan:
  batch_size: 50 # Smaller batches
  thread_pool_size: 8 # Fewer threads
  rate_limit_per_second: 30 # Lower rate limit
For small instances (<10K issues):
scan:
  batch_size: 200 # Larger batches
  thread_pool_size: 20 # More threads
  rate_limit_per_second: 100 # Higher rate limit
Skip content downloads, hash URLs instead:
scan:
  use_content_hash: false # Much faster, less accurate
Trade-off: Files with identical content but different URLs won't be detected as duplicates.
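To illustrate the trade-off, here is a minimal sketch of both hashing modes. It assumes SHA-256 and a 64 KB chunk size; the README does not state which hash algorithm or chunk size the scanner uses, and the function names and example URLs are made up for illustration.

```python
import hashlib
import io
from collections import defaultdict


def hash_stream(stream, chunk_size=64 * 1024):
    """Hash a file-like object chunk by chunk, without loading it fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_url(url):
    """Hash only the attachment URL: no download needed, but content is ignored."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def group_duplicates(keyed_items):
    """Group (key, name) pairs by key and keep only keys shared by 2+ names."""
    groups = defaultdict(list)
    for key, name in keyed_items:
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Two attachments with identical bytes stored under different URLs.
attachments = [
    ("https://jira.example.com/secure/attachment/1/report.pdf", b"same bytes"),
    ("https://jira.example.com/secure/attachment/2/report-copy.pdf", b"same bytes"),
]
by_content = group_duplicates(
    (hash_stream(io.BytesIO(data)), url) for url, data in attachments
)
by_url = group_duplicates((hash_url(url), url) for url, _ in attachments)
```

Here `by_content` holds one group containing both URLs, while `by_url` is empty: URL hashing misses the duplicate because the two URLs differ.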
Attachment-Architect_python/
├── jira_dc_scanner.py                   # Main scanner
├── generate_visual_analysis_report.py   # Report generator
├── generate_html_content.py             # HTML template
├── test_connection.py                   # Connection tester
├── config.yaml                          # Configuration
├── .env                                 # Credentials (create from .env.example)
├── requirements.txt                     # Python dependencies
├── README.md                            # This file
├── reports/                             # Generated reports
│   ├── scan_XXXXXXXX.json
│   └── scan_XXXXXXXX_visual_analysis.html
├── scans/                               # Scan database
│   └── scan.db
└── logs/                                # Log files
    └── scanner.log
Problem: Authentication failed
Solution:
- Verify credentials in .env
- For Personal Access Token: Check token hasn't expired
- For username/password: Verify credentials are correct
- Run python test_connection.py to diagnose
Problem: SSL Certificate verification failed
Solution:
# config.yaml
jira:
  verify_ssl: false # Only for self-signed certificates
Problem: Scan is very slow
Solutions:
- Reduce thread pool size if network is slow:
  scan:
    thread_pool_size: 6
- Use URL-based hashing (faster but less accurate):
  scan:
    use_content_hash: false
- Increase rate limit if Jira server can handle it:
  scan:
    rate_limit_per_second: 100
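To see why `rate_limit_per_second` and `thread_pool_size` interact, it can help to picture a limiter shared by all worker threads. The sketch below is an assumption about how such a limit could work, not the scanner's actual code: it spaces requests at least 1/rate seconds apart no matter how many threads call it. The class name `RateLimiter` is hypothetical.

```python
import threading
import time


class RateLimiter:
    """Space out calls so that at most rate_per_second start each second."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self):
        """Block until this caller's time slot arrives; return seconds waited."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot, never one in the past.
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        wait = slot - now
        if wait > 0:
            # Sleep outside the lock so other threads can reserve their slots.
            self._sleep(wait)
        return wait
```

Each worker would call `limiter.acquire()` before sending a request. With a limit of 50 requests per second, adding threads beyond what the limit allows only makes them wait longer, which is why raising the rate limit (if the server can handle it) can matter more than raising the thread count.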
Problem: Scan keeps timing out
Solution:
scan:
  download_timeout_seconds: 600 # Increase timeout
  max_file_size_gb: 2 # Skip very large files
Problem: Out of memory errors
Solution:
scan:
  batch_size: 50 # Smaller batches
  thread_pool_size: 6 # Fewer threads
storage:
  checkpoint_interval: 50 # Save more frequently
Problem: Can't resume scan
Solution:
- Check scan ID is correct (8 characters)
- Verify ./scans/scan.db exists
- Check logs in ./logs/scanner.log
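Resuming depends on the scan ID and the checkpoint data in the scan database both being intact. The following sketch shows the general idea of checkpointing every N issues with SQLite. The table layout and function names are hypothetical and are not the actual schema of scan.db.

```python
import sqlite3


def open_checkpoint_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "scan_id TEXT PRIMARY KEY, issues_done INTEGER NOT NULL)"
    )
    return conn


def save_checkpoint(conn, scan_id, issues_done):
    """Record how many issues have been fully processed for this scan."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints (scan_id, issues_done) VALUES (?, ?)",
            (scan_id, issues_done),
        )


def load_checkpoint(conn, scan_id):
    """Return the number of issues already processed, or 0 for an unknown scan."""
    row = conn.execute(
        "SELECT issues_done FROM checkpoints WHERE scan_id = ?", (scan_id,)
    ).fetchone()
    return row[0] if row else 0


def scan(conn, scan_id, issue_keys, process, checkpoint_interval=100):
    """Process issues in order, starting from the last saved checkpoint."""
    start = load_checkpoint(conn, scan_id)
    for index in range(start, len(issue_keys)):
        process(issue_keys[index])
        done = index + 1
        if done % checkpoint_interval == 0 or done == len(issue_keys):
            save_checkpoint(conn, scan_id, done)
```

In this model, an interrupted scan restarts from the last saved checkpoint, so up to `checkpoint_interval - 1` issues may be processed again. A smaller interval reduces repeated work at the cost of more frequent database writes.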
- Python: 3.8 or higher
- Jira: Data Center 8.0 or higher
- Permissions: Read access to issues and attachments
- Network: Access to Jira instance
- Disk Space: ~100 MB per 100K attachments (for database)
- Never commit the .env file - Contains sensitive credentials
- Use Personal Access Tokens - More secure than username/password
- Limit token permissions - Read-only access is sufficient
- Rotate tokens regularly - Best practice for security
- Use SSL verification - Only disable for trusted self-signed certs
Commercial License - Attachment Architect Team
For issues, questions, or feature requests:
- Check the troubleshooting section above
- Review logs in ./logs/scanner.log
- Contact support with scan ID and error details
- Test connection first: python test_connection.py
- Start with a small project to test
- Review config.yaml settings
- Ensure sufficient disk space
- Monitor progress bar and metrics
- Check logs if issues occur
- Don't interrupt unless necessary (use Ctrl+C to save checkpoint)
- Review visual report in browser
- Identify "Frozen Dinosaurs" for archival
- Share report with stakeholders
- Plan cleanup based on insights
# Setup
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txt
# Configure
copy .env.example .env
# Edit .env with your credentials
# Test
python test_connection.py
# Scan
python jira_dc_scanner.py
# Resume
python jira_dc_scanner.py --resume SCAN_ID
# Generate report
python generate_visual_analysis_report.py scan_XXXXXXXX.json
# Custom report
python generate_visual_analysis_report.py scan_XXXXXXXX.json --min-size 5 --min-days 180
Made with ❤️ by Attachment Architect Team