QA Bugs Analytics (Starter)

Quick start

# 1) Create venv & install
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

# 2) Run on sample data (produces output/run_YYYYMMDD_HHMM/report.html)
qa-bugs run \
  --config configs/example.config.yml \
  --input data/sample_bugs.csv \
  --since 2025-09-01 --until 2025-09-30 \
  --metrics defect_age,age_by_priority \
  --llm off

Open the generated report.html in a browser.

Environment Handling

Data-Driven Approach:

  • Environments are discovered from your uploaded data, not predefined in config
  • Only environments present in the data are analyzed and displayed in reports
  • Environments are ordered by defect count (most defects first) for better visibility
  • If an environment doesn't exist in your data, it won't appear in any metric

Environment Mapping:

  • Use env_value_mapping (manual) or auto_env_mapping (LLM-based) to normalize values
  • Example: "production" → "PROD", "testing" → "QA"
  • Mapping happens before analysis, so configure intended_env and leak_envs using the mapped values

Example:

# If your data has: "prod-server", "qa-env", "staging"
# And you map them to: PROD, QA, STAGE
# Configure leakage_rate to use mapped names:
leakage_rate:
  intended_env: ["QA", "STAGE"]  # Use mapped names
  leak_envs: ["PROD"]            # Use mapped names

Warnings:

  • If you configure intended_env or leak_envs with environments that are not present in your data, warnings are logged
  • Missing environments are skipped automatically; analysis continues with the available data (see the sketch below)
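
As a rough illustration of the mapping and warning behavior above, the sketch below applies an env_value_mapping to a pandas column before analysis; the column name Environment, the mapping values, and the helper function are illustrative assumptions, not the pipeline's actual code.

import logging
import pandas as pd

logger = logging.getLogger(__name__)

# Illustrative values; the real column name and mapping come from your data and config.
env_value_mapping = {"prod-server": "PROD", "qa-env": "QA", "staging": "STAGE"}
configured_envs = {"intended_env": ["QA", "STAGE"], "leak_envs": ["PROD"]}

def normalize_environments(df: pd.DataFrame, column: str = "Environment") -> pd.DataFrame:
    """Apply env_value_mapping before analysis and warn about configured
    environments that never appear in the data."""
    df = df.copy()
    df[column] = df[column].map(lambda v: env_value_mapping.get(v, v))

    present = set(df[column].dropna().unique())
    for key, envs in configured_envs.items():
        missing = [e for e in envs if e not in present]
        if missing:
            logger.warning("%s references environments not in data: %s", key, missing)
    return df

df = normalize_environments(pd.DataFrame({"Environment": ["prod-server", "qa-env", "staging"]}))
print(df["Environment"].tolist())  # ['PROD', 'QA', 'STAGE']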

AI Data Understanding (New!)

Automatic Classification:

  • The system can automatically classify your data semantics using AI
  • No more hardcoded status lists or priority mappings
  • Works with any bug tracking system (Jira, Azure DevOps, GitHub Issues, etc.)

What Gets Classified:

  1. Statuses → Open / Closed / Rejected categories
  2. Priorities → Severity order (Critical → High → Medium → Low)
  3. Environments → Production vs Non-Production

How to Enable:

# In config file:
auto_classification:
  enabled: true
  classify_statuses: true
  classify_priorities: true
  classify_environments: true
  confidence_threshold: 0.6  # Auto-apply if confidence ≥ 60%

Or via CLI:

qa-bugs run --config config.yml --input data.csv --auto-classify --llm on

How It Works:

  1. Upload your data → AI analyzes unique status/priority values
  2. LLM classification (if enabled) or fuzzy keyword matching as a fallback (sketched below)
  3. Classifications shown in report with confidence scores
  4. High-confidence classifications auto-applied to metrics
  5. You can review and override AI decisions
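
The fuzzy fallback in step 2 can be pictured roughly like the sketch below; the keyword lists, category names, and scoring via difflib are illustrative assumptions rather than the project's actual implementation.

from difflib import SequenceMatcher

# Illustrative keyword seeds per category; the real tool may derive these differently.
STATUS_KEYWORDS = {
    "Open": ["open", "new", "in progress", "reopened", "to do"],
    "Closed": ["closed", "done", "resolved", "fixed", "verified"],
    "Rejected": ["rejected", "won't fix", "duplicate", "invalid"],
}

def classify_status(value: str, threshold: float = 0.6):
    """Return (category, confidence); below the threshold the value stays unclassified."""
    best_category, best_score = None, 0.0
    for category, keywords in STATUS_KEYWORDS.items():
        for kw in keywords:
            score = SequenceMatcher(None, value.lower(), kw).ratio()
            if score > best_score:
                best_category, best_score = category, score
    if best_score >= threshold:
        return best_category, best_score
    return None, best_score

print(classify_status("Resolved"))    # ('Closed', 1.0)
print(classify_status("Waiting QA"))  # likely below threshold -> (None, ...)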

Benefits:

  • ✅ Works with any project (no config changes needed)
  • ✅ Transparent (see what AI decided + confidence scores)
  • ✅ Fallback to fuzzy matching if LLM unavailable
  • ✅ Reduces config complexity
  • ✅ Adapts to your project's terminology

Report Display: Reports now include an "AI Data Profile" section showing:

  • Status Tab: Open/Closed/Rejected classifications with confidence scores
  • Priority Tab: Severity ordering from highest to lowest
  • Environment Tab: Production vs Non-Production, pipeline order (DEV → QA → STAGE → PROD)
  • Summary Tab: Field completeness, date range, applicable metrics
  • Method used (LLM or fuzzy matching) with confidence percentage
  • Any warnings or unclassified values

Available In:

  • CLI HTML Reports - Detailed profile section with styling
  • Streamlit UI - Interactive expandable tabs (Status/Priority/Environment/Summary)
  • UI Toggle - Enable/disable auto-classification from sidebar

Exporting data from Jira

You can pull fresh issues directly from Jira into a CSV compatible with the analytics pipeline.

1. Configure environment

Copy .env.example to .env and fill in the values (provide the full JQL in JIRA_JQL_EXTRA, including the project clause):

JIRA_URL=https://your-domain.atlassian.net
JIRA_USER=your-email@example.com
JIRA_TOKEN=your_api_token
JIRA_JQL_EXTRA=project=PROJECTKEY AND status != Done AND priority in (High, Critical)

Generate an API token from Atlassian account security settings.

2. Install dependencies (if not already installed)

pip install -e . installs requests, which the exporter uses.

3. Run exporter

python -m qa_bugs.automation.jira_export export --output data/jira_issues.csv --limit 100

Filtering uses the JIRA_JQL_EXTRA environment variable (required; it must be a full JQL query). The example in .env.example already includes the project= clause. Adjust the batch size or limit:

  • --batch-size 500 (Jira caps at 1000)
  • --limit 1000
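
Conceptually, the exporter's requests-based calls boil down to paging Jira's REST search endpoint, roughly as in the sketch below; the field list, batch size, and error handling are simplified assumptions, not the exporter's actual code.

import os
import requests

# A simplified sketch of paging Jira's search API; the real exporter adds
# CSV writing, retries, and field mapping on top of calls like this.
base_url = os.environ["JIRA_URL"]
auth = (os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"])
jql = os.environ["JIRA_JQL_EXTRA"]

def fetch_issues(limit=100, batch_size=50):
    issues, start_at = [], 0
    while len(issues) < limit:
        resp = requests.get(
            f"{base_url}/rest/api/2/search",
            params={"jql": jql, "startAt": start_at, "maxResults": batch_size,
                    "fields": "summary,status,priority,created,resolutiondate"},
            auth=auth,
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json().get("issues", [])
        if not page:
            break
        issues.extend(page)
        start_at += len(page)
    return issues[:limit]

print(len(fetch_issues(limit=100)))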

4. Generate report on exported data

qa-bugs run --config configs/example.config.yml --input data/jira_issues.csv --llm off

The CSV headers will match the configured fields_mapping (e.g., Created, Resolved, FixVersion).
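
Before generating the report, a quick header check like the one below can catch export problems early; the expected column names are illustrative, so use the ones from your fields_mapping.

import csv

# Illustrative expectations; take the real names from fields_mapping in your config.
expected = {"Created", "Resolved", "FixVersion"}

with open("data/jira_issues.csv", newline="", encoding="utf-8") as f:
    headers = set(next(csv.reader(f)))

missing = expected - headers
print("Missing columns:", missing or "none")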

Testing

Unit tests use pytest.

Run the full suite (excluding optional live tests by default):

python -m pytest

Run a single test file:

python -m pytest tests/test_defect_age.py -q

Live LLM test

tests/test_llm_live.py is marked with @pytest.mark.live and performs a real Azure OpenAI request. It is skipped unless the following environment variables are set:

  • AZURE_OPENAI_KEY
  • AZURE_OPENAI_ENDPOINT
  • (optional) AZURE_OPENAI_DEPLOYMENT (defaults to gpt-4o)
  • (optional) AZURE_OPENAI_API_VERSION (defaults to 2024-05-01-preview)

Run only live tests:

python -m pytest -m live

Exclude live tests:

python -m pytest -m "not live"

Adding tests

Place new test files in tests/ named test_*.py. Keep each test focused, with minimal assertions covering the following (see the skeleton after this list):

  1. Happy path
  2. One edge case (e.g., empty dataframe)
  3. One configuration variance
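
A skeleton along these lines keeps the three cases separate; the metric function compute_defect_age, its signature, and the expected output columns are hypothetical placeholders, not the package's real API.

import pandas as pd

# Hypothetical import; substitute the real metric function you are testing.
from qa_bugs.metrics import compute_defect_age


def test_defect_age_happy_path():
    df = pd.DataFrame({"Created": ["2025-09-01"], "Resolved": ["2025-09-05"]})
    result = compute_defect_age(df)
    assert result["age_days"].iloc[0] == 4


def test_defect_age_empty_dataframe():
    result = compute_defect_age(pd.DataFrame(columns=["Created", "Resolved"]))
    assert result.empty


def test_defect_age_configuration_variance():
    df = pd.DataFrame({"Created": ["2025-09-01"], "Resolved": [None]})
    # Hypothetical as_of parameter illustrating a configuration variance.
    result = compute_defect_age(df, as_of="2025-09-30")
    assert result["age_days"].iloc[0] == 29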

Debugging & Logging

Log Files

Automatic file logging is now enabled by default:

CLI runs:

  • Logs saved to: output/run_YYYYMMDD_HHMM/qa_bugs.log
  • Includes all field mapping, analysis, and LLM activity
  • DEBUG level details in file, INFO level in console

UI runs:

  • Field mapping: output/ui_session_YYYYMMDD_HHMMSS/qa_bugs_ui.log
  • Analysis: output/ui_run_YYYYMMDD_HHMMSS/qa_bugs_ui.log
  • Includes all activity at DEBUG level

LLM prompt/response files (when log_prompts: true):

  • Saved in same output directory as logs
  • Format: prompt_{metric_id}_{timestamp}.txt and response_{metric_id}_{timestamp}.txt

Enabling Detailed Logs in Code

For scripts or notebooks, configure logging manually:

import logging

# Choose ONE of the two configurations below: logging.basicConfig is a
# no-op after the first call unless you pass force=True.

# For detailed debugging (includes fuzzy match scores, LLM responses)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# For high-level progress tracking instead:
# logging.basicConfig(level=logging.INFO)

Key Log Messages

Field Mapping Detection:

  • Auto-detecting field mapping for N columns - Detection starting
  • LLM service is enabled or using fuzzy matching only - Detection method selected
  • LLM prompt: / LLM response: - DEBUG level shows full LLM interaction
  • Fuzzy match: 'key' -> 'Issue ID' (score: 0.75) - DEBUG level match scores
  • LLM detection successful or Falling back to fuzzy matching - Result status
  • validation: valid=True, errors=0, warnings=2 - Validation summary
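
To pull these messages out of a saved run log, a simple filter such as the sketch below works; the run directory name is a placeholder you need to adjust to an actual run.

from pathlib import Path

# Adjust to your actual run directory under output/.
log_file = Path("output") / "run_YYYYMMDD_HHMM" / "qa_bugs.log"
keywords = ("Fuzzy match", "LLM detection", "Falling back", "validation:")

for line in log_file.read_text(encoding="utf-8").splitlines():
    if any(kw in line for kw in keywords):
        print(line)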

When to Use:

  • Debugging why certain CSV columns aren't detected
  • Understanding LLM vs fuzzy matching decisions
  • Reviewing actual LLM prompts and responses for troubleshooting
  • Troubleshooting missing required fields
  • Analyzing low similarity scores in fuzzy matching

See demo_field_mapper_logging.py for a working example.
