📊 Attachment Architect - Jira Data Center Storage Analyzer

Professional attachment storage analysis tool for Jira Data Center

Analyze your Jira instance's attachment storage, identify duplicates, and discover optimization opportunities through beautiful visual reports.

✨ Features

🔍 Deep Storage Analysis - Scan all attachments across your Jira instance
📊 Visual Reports - Interactive HTML reports with 6 analysis dimensions
🔥 Heat Index - Identify "Frozen Dinosaurs" (large, inactive files)
🚀 High Performance - Multi-threaded downloads, streaming hash calculation
💾 Resume Capability - Interrupt and resume scans anytime
🎯 Duplicate Detection - Find identical files wasting storage
📈 Comprehensive Insights - By project, file type, user, age, and status

🚀 Quick Start

1. Setup Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Configure Credentials

Copy the example environment file and edit it:

# Windows:
copy .env.example .env
# Linux/Mac:
cp .env.example .env

Edit .env with your Jira credentials:

# Jira Connection
JIRA_BASE_URL=https://your-jira-instance.com
JIRA_TOKEN=your_personal_access_token_here

# OR use username/password (less secure)
# JIRA_USERNAME=your_username
# JIRA_PASSWORD=your_password

💡 Tip: Personal Access Tokens are available in Jira Data Center 8.14 and later, and are recommended over username/password.
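
To illustrate how these variables might be consumed, here is a minimal sketch that parses `.env`-style text and picks an authentication header. The helper names `parse_env` and `build_auth_header` are hypothetical and not taken from the scanner. The sketch assumes the token is sent as a `Bearer` header (the scheme Jira Data Center uses for Personal Access Tokens) and that username/password falls back to HTTP Basic.

```python
import base64


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_auth_header(env):
    """Prefer a Personal Access Token; fall back to Basic auth."""
    token = env.get("JIRA_TOKEN")
    if token:
        return {"Authorization": f"Bearer {token}"}
    user, password = env.get("JIRA_USERNAME"), env.get("JIRA_PASSWORD")
    if user and password:
        raw = f"{user}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError("No Jira credentials found in environment")
```

Commented-out lines (such as the username/password pair in the example above) are ignored, so the token takes effect whenever it is present.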

3. Test Connection

python test_connection.py

Expected output:

✓ Connected to Jira successfully!
✓ User: John Doe (john.doe)
✓ Authentication working
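
For readers curious what such a check could look like, the sketch below builds a request against Jira's `/rest/api/2/myself` endpoint and formats the user line shown above. This is an assumption about one reasonable approach, not necessarily what `test_connection.py` does; `build_myself_request` and `describe_user` are hypothetical names, and the `displayName` and `name` fields are those returned by Jira Data Center for the current user.

```python
import urllib.request


def build_myself_request(base_url, token):
    """Build (but do not send) a request for the current user's profile."""
    url = base_url.rstrip("/") + "/rest/api/2/myself"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def describe_user(payload):
    """Format a user payload as 'Display Name (username)'."""
    return f"{payload['displayName']} ({payload['name']})"
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body would give the payload for `describe_user`.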

4. Run Scan

python jira_dc_scanner.py

The scanner will:

✅ Connect to Jira
📊 Count total issues
🔄 Scan attachments (with progress bar)
💾 Save results to ./reports/scan_XXXXXXXX.json
📈 Automatically generate HTML visual report
🎉 Display summary and report path
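
The batch-by-batch issue scan can be pictured as a paginated search. The sketch below is a simplified illustration and not the scanner's actual code: `fetch_page` is a hypothetical callable that would wrap Jira's `/rest/api/2/search` endpoint (with its `startAt` and `maxResults` parameters) and return the decoded JSON, which carries `issues` and `total`.

```python
def iter_issues(fetch_page, batch_size=100):
    """Yield issues page by page until the reported total is reached."""
    start_at = 0
    while True:
        page = fetch_page(start_at, batch_size)
        issues = page.get("issues", [])
        if not issues:
            break
        yield from issues
        start_at += len(issues)
        if start_at >= page.get("total", 0):
            break
```

Advancing by the number of issues actually returned, rather than by `batch_size`, keeps the loop correct if the server caps the page size below what was requested.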

5. View Report

Open the generated HTML file in your browser:

./reports/scan_XXXXXXXX_visual_analysis.html

📖 Detailed Usage

Configuration Options

Edit config.yaml to customize scan behavior:

jira:
  base_url: "https://your-jira.com"
  verify_ssl: true

scan:
  batch_size: 100              # Issues per batch
  thread_pool_size: 12         # Parallel downloads
  rate_limit_per_second: 50    # API requests/sec
  use_content_hash: true       # Hash file content (vs URL)
  max_file_size_gb: 5          # Skip files larger than this
  download_timeout_seconds: 300

filters:
  # Scan specific projects only
  projects: []                 # Empty = all projects
  
  # Date range
  date_from: ""                # Format: YYYY-MM-DD
  date_to: ""                  # Format: YYYY-MM-DD
  
  # OR use custom JQL
  custom_jql: ""               # Overrides all other filters

output:
  output_dir: "./reports"

storage:
  database_path: "./scans/scan.db"
  checkpoint_interval: 100     # Save progress every N issues

logging:
  level: "INFO"
  file: "./logs/scanner.log"
  console: true
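
As a rough picture of how the `filters` section could translate into a query, the sketch below builds a JQL string from those keys. `build_jql` is a hypothetical helper; applying the date range to the issue `created` field and appending an `ORDER BY` clause are assumptions made for illustration, not documented scanner behavior.

```python
def build_jql(filters):
    """Turn the filters mapping from config.yaml into a JQL string."""
    custom = (filters.get("custom_jql") or "").strip()
    if custom:
        # custom_jql overrides all other filters
        return custom
    clauses = []
    projects = filters.get("projects") or []
    if projects:
        quoted = ", ".join(f'"{p}"' for p in projects)
        clauses.append(f"project in ({quoted})")
    if filters.get("date_from"):
        clauses.append(f'created >= "{filters["date_from"]}"')
    if filters.get("date_to"):
        clauses.append(f'created <= "{filters["date_to"]}"')
    jql = " AND ".join(clauses)
    return (jql + " ORDER BY created ASC").strip()
```

With every filter left empty the result is just the ordering clause, which matches all issues.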

Resume Interrupted Scan

If a scan is interrupted (Ctrl+C, network issue, etc.), resume it by passing its scan ID:

python jira_dc_scanner.py --resume SCAN_ID

Example:

python jira_dc_scanner.py --resume 9f7629d1
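
Resuming relies on progress being persisted while the scan runs. The following is a minimal sketch of that idea using SQLite. The table layout and the function names are hypothetical, and the real `scan.db` schema may differ.

```python
import sqlite3


def open_checkpoint_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "scan_id TEXT PRIMARY KEY, issues_done INTEGER NOT NULL)"
    )
    return conn


def save_checkpoint(conn, scan_id, issues_done):
    """Record how many issues have been processed for this scan."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (scan_id, issues_done) VALUES (?, ?)",
        (scan_id, issues_done),
    )
    conn.commit()


def load_checkpoint(conn, scan_id):
    """Return the saved offset, or 0 when the scan ID is unknown."""
    row = conn.execute(
        "SELECT issues_done FROM checkpoints WHERE scan_id = ?", (scan_id,)
    ).fetchone()
    return row[0] if row else 0
```

A resumed scan would call `load_checkpoint` first and continue from the returned offset.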

Generate Report from Existing Scan

python generate_visual_analysis_report.py scan_9f7629d1.json

Customize "Frozen Dinosaurs" criteria:

# Files > 5 MB inactive for > 180 days
python generate_visual_analysis_report.py scan_9f7629d1.json --min-size 5 --min-days 180
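
The two thresholds combine as a simple filter. The sketch below illustrates the idea under stated assumptions: the field names `size_bytes` and `last_activity` are hypothetical, a megabyte is taken as 1024 * 1024 bytes, and "inactive" is measured from the issue's last activity timestamp.

```python
from datetime import datetime, timedelta, timezone


def find_frozen_dinosaurs(attachments, now, min_size_mb, min_days):
    """Return large attachments on inactive issues, largest first."""
    min_bytes = min_size_mb * 1024 * 1024
    cutoff = now - timedelta(days=min_days)
    hits = [
        a for a in attachments
        if a["size_bytes"] > min_bytes and a["last_activity"] < cutoff
    ]
    return sorted(hits, key=lambda a: a["size_bytes"], reverse=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"name": "old.iso", "size_bytes": 900 * 1024 * 1024,
         "last_activity": now - timedelta(days=400)},
        {"name": "fresh.zip", "size_bytes": 900 * 1024 * 1024,
         "last_activity": now - timedelta(days=3)},
    ]
    for item in find_frozen_dinosaurs(sample, now, min_size_mb=5, min_days=180):
        print(item["name"])
```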

📊 Visual Report Features

The HTML report includes 6 interactive tabs:

1. 📊 By Project

Storage consumption per project
File counts and average sizes
Color-coded by file size
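
The per-project figures amount to a group-and-sum over the scan results. A minimal sketch, assuming each attachment record carries hypothetical `project` and `size_bytes` fields:

```python
from collections import defaultdict


def summarize_by_project(attachments):
    """Aggregate file count, total bytes and average size per project."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for a in attachments:
        entry = totals[a["project"]]
        entry["files"] += 1
        entry["bytes"] += a["size_bytes"]
    rows = [
        {"project": key, "files": t["files"], "bytes": t["bytes"],
         "avg_bytes": t["bytes"] / t["files"]}
        for key, t in totals.items()
    ]
    # Largest storage consumers first
    return sorted(rows, key=lambda r: r["bytes"], reverse=True)
```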

2. 🔍 By File Type

Top file types by storage
File counts per extension
Identify storage-heavy formats

3. 👥 By User

Top 10 storage consumers
Format: Name Surname (username)
Files uploaded per user

4. 📅 By Age

Attachment age distribution
Buckets: 0-90 days, 90-365 days, 1-2 years, 2-4 years, >4 years
Identify archival candidates
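
The buckets above can be expressed as a small classification function. In this sketch the boundary handling (an attachment exactly 90 days old falls into "90-365 days") and the use of 365-day years are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def age_bucket(age_days):
    """Map an attachment age in days to one of the report's buckets."""
    if age_days < 90:
        return "0-90 days"
    if age_days < 365:
        return "90-365 days"
    if age_days < 2 * 365:
        return "1-2 years"
    if age_days < 4 * 365:
        return "2-4 years"
    return ">4 years"


def storage_by_age(attachments):
    """Sum bytes per bucket from (age_days, size_bytes) pairs."""
    totals = Counter()
    for age_days, size_bytes in attachments:
        totals[age_bucket(age_days)] += size_bytes
    return dict(totals)
```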

5. 📋 By Status

Storage by issue status
Format: Status Name (ID: X) (when available)
Find storage in closed/resolved issues

6. 🔥 Heat Index

"Frozen Dinosaurs": Large files on inactive issues
Sortable table: Click any column header to sort
Search & filter: Find specific files
Pagination: Browse all files efficiently

🔧 Advanced Features

Custom JQL Queries

Scan specific issues using custom JQL:

# config.yaml
filters:
  custom_jql: "project = MYPROJECT AND created >= -365d"

Performance Tuning

For large instances (>100K issues):

scan:
  batch_size: 50               # Smaller batches
  thread_pool_size: 8          # Fewer threads
  rate_limit_per_second: 30    # Lower rate limit

For small instances (<10K issues):

scan:
  batch_size: 200              # Larger batches
  thread_pool_size: 20         # More threads
  rate_limit_per_second: 100   # Higher rate limit
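
To show how `rate_limit_per_second` and `thread_pool_size` interact, here is a sketch of a thread-safe rate limiter that worker threads could share. It illustrates the general technique (evenly spaced request slots) and is not the scanner's implementation; the `RateLimiter` class and its injectable `clock` and `sleep` parameters are hypothetical.

```python
import threading
import time


class RateLimiter:
    """Space out calls so at most `per_second` start each second."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self):
        """Block until this caller's reserved time slot arrives."""
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
```

Each worker in the pool would call `limiter.acquire()` before issuing a request. Adding threads beyond what the rate limit allows only makes them wait longer, which is why the two settings are tuned together.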

URL-Based Hashing (Faster)

Skip content downloads, hash URLs instead:

scan:
  use_content_hash: false      # Much faster, less accurate

Trade-off: Files with identical content but different URLs won't be detected as duplicates.
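
The difference between the two modes can be sketched as follows. Content hashing reads each file in chunks so memory use stays flat, while URL hashing never touches the file body. SHA-256 and the function names are assumptions for illustration; the source does not state which hash algorithm the scanner uses.

```python
import hashlib
import io
from collections import defaultdict


def hash_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream chunk by chunk without loading it whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_url(url):
    """Hash only the attachment URL (no download needed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def group_duplicates(hashes):
    """Given {attachment_id: hash}, return groups of IDs sharing a hash."""
    groups = defaultdict(list)
    for attachment_id, value in hashes.items():
        groups[value].append(attachment_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    same = hash_stream(io.BytesIO(b"report")) == hash_stream(io.BytesIO(b"report"))
    print("identical content matches:", same)
```

Two uploads of the same file produce equal content hashes but different URL hashes, which is exactly the trade-off described above.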

📁 Project Structure

Attachment-Architect_python/
├── jira_dc_scanner.py              # Main scanner
├── generate_visual_analysis_report.py  # Report generator
├── generate_html_content.py        # HTML template
├── test_connection.py              # Connection tester
├── config.yaml                     # Configuration
├── .env                            # Credentials (create from .env.example)
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── reports/                        # Generated reports
│   ├── scan_XXXXXXXX.json
│   └── scan_XXXXXXXX_visual_analysis.html
├── scans/                          # Scan database
│   └── scan.db
└── logs/                           # Log files
    └── scanner.log

🛠️ Troubleshooting

Connection Issues

Problem: Authentication failed

Solution:

Verify credentials in .env
For Personal Access Token: Check token hasn't expired
For username/password: Verify credentials are correct
Run python test_connection.py to diagnose

Problem: SSL Certificate verification failed

Solution:

# config.yaml
jira:
  verify_ssl: false  # Only for self-signed certificates

Performance Issues

Problem: Scan is very slow

Solutions:

Reduce thread pool size if network is slow:
```
scan:
  thread_pool_size: 6
```
Use URL-based hashing (faster but less accurate):
```
scan:
  use_content_hash: false
```
Increase rate limit if Jira server can handle it:
```
scan:
  rate_limit_per_second: 100
```

Problem: Scan keeps timing out

Solution:

scan:
  download_timeout_seconds: 600  # Increase timeout
  max_file_size_gb: 2            # Skip very large files

Memory Issues

Problem: Out of memory errors

Solution:

scan:
  batch_size: 50                 # Smaller batches
  thread_pool_size: 6            # Fewer threads

storage:
  checkpoint_interval: 50        # Save more frequently

Resume Not Working

Problem: Can't resume scan

Solution:

Check that the scan ID is correct (8 characters, e.g. 9f7629d1)
Verify ./scans/scan.db exists
Check logs in ./logs/scanner.log

📋 Requirements

Python: 3.8 or higher
Jira: Data Center 8.0 or higher
Permissions: Read access to issues and attachments
Network: Access to Jira instance
Disk Space: ~100 MB per 100K attachments (for database)
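
The disk space guideline scales linearly, so a quick estimate is straightforward. This sketch simply applies the ~100 MB per 100K attachments figure above; `estimate_db_size_mb` is a hypothetical helper, and actual usage will vary.

```python
def estimate_db_size_mb(attachment_count, mb_per_100k=100):
    """Rough database size estimate from the README's rule of thumb."""
    return attachment_count / 100_000 * mb_per_100k
```

For example, an instance with 250,000 attachments would need roughly 250 MB for the scan database.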

🔒 Security Notes

Never commit .env file - Contains sensitive credentials
Use Personal Access Tokens - More secure than username/password
Limit token permissions - Read-only access is sufficient
Rotate tokens regularly - Best practice for security
Use SSL verification - Only disable for trusted self-signed certs

📝 License

Commercial License - Attachment Architect Team

🤝 Support

For issues, questions, or feature requests:

Check the troubleshooting section above
Review logs in ./logs/scanner.log
Contact support with scan ID and error details

🎯 Best Practices

Before Scanning

✅ Test connection first: python test_connection.py
✅ Start with a small project to test
✅ Review config.yaml settings
✅ Ensure sufficient disk space

During Scanning

💡 Monitor progress bar and metrics
💡 Check logs if issues occur
💡 Avoid interrupting the scan unless necessary; if you must stop it, press Ctrl+C so a checkpoint is saved

After Scanning

📊 Review visual report in browser
📊 Identify "Frozen Dinosaurs" for archival
📊 Share report with stakeholders
📊 Plan cleanup based on insights

🚀 Quick Reference

# Setup
python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # Linux/Mac
pip install -r requirements.txt

# Configure
copy .env.example .env     # Windows
cp .env.example .env       # Linux/Mac
# Edit .env with your credentials

# Test
python test_connection.py

# Scan
python jira_dc_scanner.py

# Resume
python jira_dc_scanner.py --resume SCAN_ID

# Generate report
python generate_visual_analysis_report.py scan_XXXXXXXX.json

# Custom report
python generate_visual_analysis_report.py scan_XXXXXXXX.json --min-size 5 --min-days 180

Made with ❤️ by Attachment Architect Team
