drinkits/attachmentarchitect-python

📊 Attachment Architect - Jira Data Center Storage Analyzer

Professional attachment storage analysis tool for Jira Data Center

Analyze your Jira instance's attachment storage, identify duplicates, and discover optimization opportunities through beautiful visual reports.


✨ Features

  • 🔍 Deep Storage Analysis - Scan all attachments across your Jira instance
  • 📊 Visual Reports - Interactive HTML reports with 6 analysis dimensions
  • 🔥 Heat Index - Identify "Frozen Dinosaurs" (large, inactive files)
  • 🚀 High Performance - Multi-threaded downloads, streaming hash calculation
  • 💾 Resume Capability - Interrupt and resume scans anytime
  • 🎯 Duplicate Detection - Find identical files wasting storage
  • 📈 Comprehensive Insights - By project, file type, user, age, and status
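
To illustrate how streaming hash calculation and duplicate detection can fit together, here is a minimal sketch. It is not the scanner's actual code: the SHA-256 choice, the chunk size, and the names `stream_sha256`, `find_duplicates`, and `wasted_bytes` are assumptions made for this example.

```python
import hashlib
import io
from collections import defaultdict

CHUNK_SIZE = 64 * 1024  # assumed chunk size; the scanner's real value is not documented


def stream_sha256(stream, chunk_size=CHUNK_SIZE):
    """Hash a binary stream chunk by chunk so the whole file never sits in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(attachments):
    """Group (name, size, stream) records by content hash; keep groups with 2+ files."""
    groups = defaultdict(list)
    for name, size, stream in attachments:
        groups[stream_sha256(stream)].append((name, size))
    return {h: files for h, files in groups.items() if len(files) > 1}


def wasted_bytes(duplicate_groups):
    """Bytes that could be reclaimed if only one copy per group were kept."""
    return sum(sum(size for _, size in files[1:]) for files in duplicate_groups.values())


# Hypothetical sample data: two identical files and one unique file.
sample = [
    ("logo.png", 4, io.BytesIO(b"abcd")),
    ("logo-copy.png", 4, io.BytesIO(b"abcd")),
    ("notes.txt", 5, io.BytesIO(b"hello")),
]
dups = find_duplicates(sample)
print(wasted_bytes(dups))  # prints 4
```

Reading in fixed-size chunks keeps memory use flat regardless of attachment size, which matters when several downloads run in parallel.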

🚀 Quick Start

1. Setup Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Configure Credentials

Copy the example environment file and edit it:

# Windows:
copy .env.example .env
# Linux/Mac:
cp .env.example .env

Edit .env with your Jira credentials:

# Jira Connection
JIRA_BASE_URL=https://your-jira-instance.com
JIRA_TOKEN=your_personal_access_token_here

# OR use username/password (less secure)
# JIRA_USERNAME=your_username
# JIRA_PASSWORD=your_password

💡 Tip: Personal Access Tokens are recommended. They are available in Jira Data Center 8.14 and later; on older versions, use username/password.

3. Test Connection

python test_connection.py

Expected output:

✓ Connected to Jira successfully!
✓ User: John Doe (john.doe)
✓ Authentication working
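
For readers curious what such a check involves, the sketch below builds an authenticated request to Jira's `/rest/api/2/myself` endpoint from the `.env` variables shown earlier. It is an illustration, not the contents of `test_connection.py`; the function names are hypothetical, and it assumes the variables are already loaded into a dictionary such as `os.environ`.

```python
import base64
import json
import urllib.request


def build_auth_header(env):
    """Prefer a Personal Access Token; fall back to basic auth."""
    token = env.get("JIRA_TOKEN")
    if token:
        return "Bearer " + token
    user, password = env.get("JIRA_USERNAME"), env.get("JIRA_PASSWORD")
    if user and password:
        raw = f"{user}:{password}".encode("utf-8")
        return "Basic " + base64.b64encode(raw).decode("ascii")
    raise ValueError("Set JIRA_TOKEN or JIRA_USERNAME/JIRA_PASSWORD")


def build_myself_request(env):
    """Build (but do not send) a request for the current user's profile."""
    base_url = env["JIRA_BASE_URL"].rstrip("/")
    request = urllib.request.Request(base_url + "/rest/api/2/myself")
    request.add_header("Authorization", build_auth_header(env))
    request.add_header("Accept", "application/json")
    return request


def check_connection(env, timeout=30):
    """Send the request and return (display name, username). Needs network access."""
    with urllib.request.urlopen(build_myself_request(env), timeout=timeout) as response:
        me = json.load(response)
    return me["displayName"], me["name"]
```

Calling `check_connection(os.environ)` raises `urllib.error.HTTPError` with status 401 when the credentials are rejected, which is a quick way to tell an authentication problem from a network problem.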

4. Run Scan

python jira_dc_scanner.py

The scanner will:

  1. ✅ Connect to Jira
  2. 📊 Count total issues
  3. 🔄 Scan attachments (with progress bar)
  4. 💾 Save results to ./reports/scan_XXXXXXXX.json
  5. 📈 Automatically generate HTML visual report
  6. 🎉 Display summary and report path

5. View Report

Open the generated HTML file in your browser:

./reports/scan_XXXXXXXX_visual_analysis.html

📖 Detailed Usage

Configuration Options

Edit config.yaml to customize scan behavior:

jira:
  base_url: "https://your-jira.com"
  verify_ssl: true

scan:
  batch_size: 100              # Issues per batch
  thread_pool_size: 12         # Parallel downloads
  rate_limit_per_second: 50    # API requests/sec
  use_content_hash: true       # Hash file content (vs URL)
  max_file_size_gb: 5          # Skip files larger than this
  download_timeout_seconds: 300

filters:
  # Scan specific projects only
  projects: []                 # Empty = all projects
  
  # Date range
  date_from: ""                # Format: YYYY-MM-DD
  date_to: ""                  # Format: YYYY-MM-DD
  
  # OR use custom JQL
  custom_jql: ""               # Overrides all other filters

output:
  output_dir: "./reports"

storage:
  database_path: "./scans/scan.db"
  checkpoint_interval: 100     # Save progress every N issues

logging:
  level: "INFO"
  file: "./logs/scanner.log"
  console: true
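
The `filters` section can be read as a recipe for a JQL query. The sketch below shows one plausible translation; it is an assumption, not the scanner's implementation. In particular, applying the date range to the issue `created` field and the helper name `build_jql` are guesses made for illustration.

```python
def build_jql(filters):
    """Turn the `filters` section of config.yaml into a JQL string.

    An empty result means "no restriction" (all issues).
    """
    # custom_jql overrides every other filter, as the config comment states.
    custom = (filters.get("custom_jql") or "").strip()
    if custom:
        return custom

    clauses = []
    projects = filters.get("projects") or []
    if projects:
        quoted = ", ".join(f'"{key}"' for key in projects)
        clauses.append(f"project in ({quoted})")
    # Assumption: the date range filters on issue creation date.
    if filters.get("date_from"):
        clauses.append(f'created >= "{filters["date_from"]}"')
    if filters.get("date_to"):
        clauses.append(f'created <= "{filters["date_to"]}"')
    return " AND ".join(clauses)


print(build_jql({"projects": ["ABC", "XYZ"], "date_from": "2024-01-01", "date_to": ""}))
# prints: project in ("ABC", "XYZ") AND created >= "2024-01-01"
```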

Resume Interrupted Scan

If a scan is interrupted (Ctrl+C, network issue, etc.):

python jira_dc_scanner.py --resume SCAN_ID

Example:

python jira_dc_scanner.py --resume 9f7629d1
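
The idea behind resuming can be sketched with a small SQLite checkpoint table. This is a simplified model under stated assumptions, not the scanner's real schema: the table layout, the function names, and the `stop_after` parameter (which simulates an abrupt interruption) are all hypothetical.

```python
import sqlite3


def open_checkpoint_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "scan_id TEXT PRIMARY KEY, next_offset INTEGER NOT NULL)"
    )
    return conn


def save_checkpoint(conn, scan_id, next_offset):
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (scan_id, next_offset) VALUES (?, ?)",
        (scan_id, next_offset),
    )
    conn.commit()


def load_checkpoint(conn, scan_id):
    row = conn.execute(
        "SELECT next_offset FROM checkpoints WHERE scan_id = ?", (scan_id,)
    ).fetchone()
    return row[0] if row else 0


def scan(conn, scan_id, issues, checkpoint_interval=100, stop_after=None):
    """Process issues from the saved offset; return how many were handled in this run."""
    start = load_checkpoint(conn, scan_id)
    processed = 0
    for offset in range(start, len(issues)):
        if stop_after is not None and processed >= stop_after:
            break  # simulated interruption: nothing is saved here
        # ... per-issue attachment work would happen here ...
        processed += 1
        if (offset + 1) % checkpoint_interval == 0:
            save_checkpoint(conn, scan_id, offset + 1)
    else:
        # Loop finished without interruption: record completion.
        save_checkpoint(conn, scan_id, len(issues))
    return processed


conn = open_checkpoint_db()
issues = list(range(250))
first_run = scan(conn, "9f7629d1", issues, stop_after=230)   # interrupted after 230 issues
second_run = scan(conn, "9f7629d1", issues)                  # resumes from offset 200
print(first_run, second_run)  # prints: 230 50
```

In this model, an abrupt stop loses at most `checkpoint_interval` issues of work, which is why a smaller interval trades a little write overhead for less repeated scanning.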

Generate Report from Existing Scan

python generate_visual_analysis_report.py scan_9f7629d1.json

Customize "Frozen Dinosaurs" criteria:

# Files > 5 MB inactive for > 180 days
python generate_visual_analysis_report.py scan_9f7629d1.json --min-size 5 --min-days 180
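
The two thresholds amount to a simple filter over the scan results. The sketch below is illustrative only: the record fields `size_bytes` and `issue_updated`, the definition of 1 MB as 2**20 bytes, and the use of the issue's last update as the measure of inactivity are assumptions, not documented behavior of the report generator.

```python
from datetime import datetime, timedelta, timezone

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def frozen_dinosaurs(attachments, min_size_mb=5, min_days=180, now=None):
    """Return attachments larger than min_size_mb on issues idle for more than min_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_days)
    hits = [
        a for a in attachments
        if a["size_bytes"] > min_size_mb * MB and a["issue_updated"] < cutoff
    ]
    # Largest files first, since they offer the biggest savings.
    return sorted(hits, key=lambda a: a["size_bytes"], reverse=True)


# Hypothetical records for demonstration.
utc = timezone.utc
records = [
    {"name": "big old", "size_bytes": 10 * MB, "issue_updated": datetime(2024, 1, 1, tzinfo=utc)},
    {"name": "big recent", "size_bytes": 10 * MB, "issue_updated": datetime(2024, 12, 1, tzinfo=utc)},
    {"name": "small old", "size_bytes": 1 * MB, "issue_updated": datetime(2023, 1, 1, tzinfo=utc)},
    {"name": "huge old", "size_bytes": 50 * MB, "issue_updated": datetime(2024, 6, 1, tzinfo=utc)},
]
result = frozen_dinosaurs(records, now=datetime(2025, 1, 1, tzinfo=utc))
print([r["name"] for r in result])  # prints: ['huge old', 'big old']
```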

📊 Visual Report Features

The HTML report includes 6 interactive tabs:

1. 📊 By Project

  • Storage consumption per project
  • File counts and average sizes
  • Color-coded by file size

2. 🔍 By File Type

  • Top file types by storage
  • File counts per extension
  • Identify storage-heavy formats
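
A per-extension breakdown like this one can be computed with a single pass over the attachment list. The following is a rough sketch with hypothetical names; how the real report treats case and files without an extension is not documented, so the lowercasing and the `(none)` label are assumptions.

```python
import os
from collections import defaultdict


def storage_by_extension(attachments):
    """Aggregate (filename, size_bytes) pairs per extension, largest total first."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for filename, size in attachments:
        # Assumption: extensions are compared case-insensitively.
        ext = os.path.splitext(filename)[1].lower() or "(none)"
        totals[ext]["files"] += 1
        totals[ext]["bytes"] += size
    return sorted(totals.items(), key=lambda item: item[1]["bytes"], reverse=True)


rows = storage_by_extension([("a.PNG", 10), ("b.png", 5), ("README", 1), ("x.zip", 100)])
for ext, stats in rows:
    print(ext, stats["files"], stats["bytes"])
```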

3. 👥 By User

  • Top 10 storage consumers
  • Format: Name Surname (username)
  • Files uploaded per user

4. 📅 By Age

  • Attachment age distribution
  • Buckets: 0-90 days, 90-365 days, 1-2 years, 2-4 years, >4 years
  • Identify archival candidates
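
The bucket boundaries listed above overlap at 90 and 365 days, so any implementation has to pick a side. The sketch below assumes that a boundary value falls into the older bucket and that a year is 365 days; both are assumptions for illustration, as are the function names.

```python
import bisect

# Upper bounds in days; assumes 1 year = 365 days.
BOUNDS = [90, 365, 730, 1460]
LABELS = ["0-90 days", "90-365 days", "1-2 years", "2-4 years", ">4 years"]


def age_bucket(age_days):
    """Map an attachment age in days to its bucket label (boundary goes to the older bucket)."""
    return LABELS[bisect.bisect_right(BOUNDS, age_days)]


def storage_by_age(attachments):
    """Sum sizes of (age_days, size_bytes) pairs per bucket, keeping empty buckets."""
    totals = dict.fromkeys(LABELS, 0)
    for age_days, size in attachments:
        totals[age_bucket(age_days)] += size
    return totals


print(age_bucket(89), "|", age_bucket(90), "|", age_bucket(400))
# prints: 0-90 days | 90-365 days | 1-2 years
```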

5. 📋 By Status

  • Storage by issue status
  • Format: Status Name (ID: X) (when available)
  • Find storage in closed/resolved issues

6. 🔥 Heat Index

  • "Frozen Dinosaurs": Large files on inactive issues
  • Sortable table: Click any column header to sort
  • Search & filter: Find specific files
  • Pagination: Browse all files efficiently

🔧 Advanced Features

Custom JQL Queries

Scan specific issues using custom JQL:

# config.yaml
filters:
  custom_jql: "project = MYPROJECT AND created >= -365d"

Performance Tuning

For large instances (>100K issues):

scan:
  batch_size: 50               # Smaller batches
  thread_pool_size: 8          # Fewer threads
  rate_limit_per_second: 30    # Lower rate limit

For small instances (<10K issues):

scan:
  batch_size: 200              # Larger batches
  thread_pool_size: 20         # More threads
  rate_limit_per_second: 100   # Higher rate limit
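
To see how `thread_pool_size` and `rate_limit_per_second` interact, consider the sketch below: worker threads run in parallel, but each request first waits for a slot from a shared limiter. This is one common way to combine the two settings, not necessarily how the scanner does it; `RateLimiter` and `run_limited` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hand out evenly spaced time slots, at most `rate_per_second` per second, across threads."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self):
        """Block until this caller's slot arrives; return the delay that was applied."""
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can reserve slots
        return delay


def run_limited(tasks, thread_pool_size=12, rate_limit_per_second=50):
    """Run zero-argument callables in a thread pool, throttled by a shared limiter."""
    limiter = RateLimiter(rate_limit_per_second)

    def call(task):
        limiter.acquire()
        return task()

    with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
        return list(pool.map(call, tasks))
```

With this design, adding threads beyond what the rate limit allows yields no extra throughput, so the two values are best raised or lowered together.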

URL-Based Hashing (Faster)

Skip content downloads, hash URLs instead:

scan:
  use_content_hash: false      # Much faster, less accurate

Trade-off: Files with identical content but different URLs won't be detected as duplicates.
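
The trade-off is easy to demonstrate. In the sketch below, two attachments carry the same bytes under different URLs; the URLs and the use of SHA-256 are illustrative assumptions, not details taken from the scanner.

```python
import hashlib


def url_hash(url):
    """Hash only the attachment URL (no download needed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def content_hash(data):
    """Hash the attachment bytes (requires downloading the file)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical attachments: same content, different attachment IDs in the URL.
a = {"url": "https://jira.example.com/secure/attachment/10001/report.pdf", "data": b"same bytes"}
b = {"url": "https://jira.example.com/secure/attachment/10002/report.pdf", "data": b"same bytes"}

print(url_hash(a["url"]) == url_hash(b["url"]))            # prints False: duplicate missed
print(content_hash(a["data"]) == content_hash(b["data"]))  # prints True: duplicate found
```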


📁 Project Structure

Attachment-Architect_python/
├── jira_dc_scanner.py              # Main scanner
├── generate_visual_analysis_report.py  # Report generator
├── generate_html_content.py        # HTML template
├── test_connection.py              # Connection tester
├── config.yaml                     # Configuration
├── .env                            # Credentials (create from .env.example)
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── reports/                        # Generated reports
│   ├── scan_XXXXXXXX.json
│   └── scan_XXXXXXXX_visual_analysis.html
├── scans/                          # Scan database
│   └── scan.db
└── logs/                           # Log files
    └── scanner.log

🛠️ Troubleshooting

Connection Issues

Problem: Authentication failed

Solution:

  1. Verify credentials in .env
  2. For Personal Access Token: Check token hasn't expired
  3. For username/password: Verify credentials are correct
  4. Run python test_connection.py to diagnose

Problem: SSL Certificate verification failed

Solution:

# config.yaml
jira:
  verify_ssl: false  # Only for self-signed certificates

Performance Issues

Problem: Scan is very slow

Solutions:

  1. Reduce thread pool size if network is slow:

    scan:
      thread_pool_size: 6
  2. Use URL-based hashing (faster but less accurate):

    scan:
      use_content_hash: false
  3. Increase rate limit if Jira server can handle it:

    scan:
      rate_limit_per_second: 100

Problem: Scan keeps timing out

Solution:

scan:
  download_timeout_seconds: 600  # Increase timeout
  max_file_size_gb: 2            # Skip very large files

Memory Issues

Problem: Out of memory errors

Solution:

scan:
  batch_size: 50                 # Smaller batches
  thread_pool_size: 6            # Fewer threads
  checkpoint_interval: 50        # Save more frequently

Resume Not Working

Problem: Can't resume scan

Solution:

  1. Check scan ID is correct (8 characters)
  2. Verify ./scans/scan.db exists
  3. Check logs in ./logs/scanner.log

📋 Requirements

  • Python: 3.8 or higher
  • Jira: Data Center 8.0 or higher
  • Permissions: Read access to issues and attachments
  • Network: Access to Jira instance
  • Disk Space: ~100 MB per 100K attachments (for database)

🔒 Security Notes

  1. Never commit .env file - Contains sensitive credentials
  2. Use Personal Access Tokens - More secure than username/password
  3. Limit token permissions - Read-only access is sufficient
  4. Rotate tokens regularly - Best practice for security
  5. Use SSL verification - Only disable for trusted self-signed certs

📝 License

Commercial License - Attachment Architect Team


🀝 Support

For issues, questions, or feature requests:

  1. Check the troubleshooting section above
  2. Review logs in ./logs/scanner.log
  3. Contact support with scan ID and error details

🎯 Best Practices

Before Scanning

  1. βœ… Test connection first: python test_connection.py
  2. βœ… Start with a small project to test
  3. βœ… Review config.yaml settings
  4. βœ… Ensure sufficient disk space

During Scanning

  1. 💡 Monitor progress bar and metrics
  2. 💡 Check logs if issues occur
  3. 💡 Don't interrupt unless necessary (use Ctrl+C to save checkpoint)

After Scanning

  1. 📊 Review visual report in browser
  2. 📊 Identify "Frozen Dinosaurs" for archival
  3. 📊 Share report with stakeholders
  4. 📊 Plan cleanup based on insights

🚀 Quick Reference

# Setup
python -m venv venv
venv\Scripts\activate  # Windows (Linux/Mac: source venv/bin/activate)
pip install -r requirements.txt

# Configure
copy .env.example .env  # Windows (Linux/Mac: cp .env.example .env)
# Edit .env with your credentials

# Test
python test_connection.py

# Scan
python jira_dc_scanner.py

# Resume
python jira_dc_scanner.py --resume SCAN_ID

# Generate report
python generate_visual_analysis_report.py scan_XXXXXXXX.json

# Custom report
python generate_visual_analysis_report.py scan_XXXXXXXX.json --min-size 5 --min-days 180

Made with ❤️ by Attachment Architect Team
