# Generate report

This notebook generates a comprehensive Word document containing all analysis graphs organized by location (Ljubljana, Maribor, Primorje, and Cross-regional analysis). The document includes graphs from various analysis steps including global overviews, data completeness assessments, AUC and CI analyses, and type-specific analyses.

In [1]:
import os
import glob
from docx import Document
from docx.shared import Inches, Cm
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.section import WD_SECTION
import re
import requests
import json

In [2]:
def generate_caption_with_ollama(image_filename, analysis_type, location=None):
    """
    Generate descriptive captions using Ollama based on image filename and context
    """
    try:
        # Prepare context for the caption generation
        context_info = f"Analysis type: {analysis_type}"
        if location:
            context_info += f", Location: {location}"
        
        # Create a prompt for Ollama to generate a caption
        prompt = f"""Based on the filename '{image_filename}' and context '{context_info}', 
        generate a concise, professional caption for a scientific figure in a pollen analysis report.
        The caption should explain what the figure shows. If multiple panels are present, describe each briefly.
        Keep it factual and scientific.

Example format: "Figure showing [analysis type] for [location/species] demonstrating [key finding or visualization purpose]."

Filename: {image_filename}
Context: {context_info}

Caption:"""

        # Make request to Ollama API
        response = requests.post(
            'http://localhost:11434/api/generate',
            json={
                'model': 'llama3.2',  # You can change this to your preferred model
                'prompt': prompt,
                'stream': False,
                'options': {
                    'temperature': 0.3,  # Lower temperature for more consistent results
                    'num_predict': 50   # Limit response length
                }
            },
            timeout=30
        )
        
        if response.status_code == 200:
            result = response.json()
            caption = result.get('response', '').strip()
            # Clean up the caption and ensure it's not too long
            caption = caption.replace('\n', ' ').strip()
            if len(caption) > 200:  # Limit caption length
                caption = caption[:200] + "..."
            return caption if caption else f"Figure: {image_filename}"
        else:
            print(f"Ollama API error: {response.status_code}")
            return f"Figure: {image_filename}"
            
    except requests.exceptions.RequestException as e:
        print(f"Error connecting to Ollama: {e}")
        return f"Figure: {image_filename}"
    except Exception as e:
        print(f"Error generating caption: {e}")
        return f"Figure: {image_filename}"

In [3]:
def create_supporting_document():
    """
    Create a comprehensive Word document with all graphs organized by location
    """
    # Create a new document
    doc = Document()
    
    # Add title
    title = doc.add_heading('Supporting Material - Pollen Analysis Results', 0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    # Add introduction
    intro_para = doc.add_paragraph()
    intro_para.add_run('This document contains comprehensive visual analysis results for pollen data across three regions: Ljubljana, Maribor, and Primorje, along with cross-regional correlation analyses.')
    
    # Define base path
    base_path = "results/Graphs"
    
    # Define locations and their analysis steps.
    # All three locations share the same steps (folder name -> section title).
    analysis_steps = {
        'Step_2_Global_Overview': 'Global Overview Analysis',
        'Step_3_Completeness': 'Data Completeness Assessment',
        'Step_4_AUC_and_CI': 'AUC and Confidence Interval Analysis',
        'Step_4_Type_Specific': 'Type-Specific Analysis'
    }
    locations = {
        name: dict(analysis_steps)
        for name in ('Ljubljana', 'Maribor', 'Primorje')
    }
    
    return doc, base_path, locations

In [4]:
def add_images_to_document(doc, folder_path, section_title, location=None, use_ollama=True):
    """
    Add all images from a folder to the document with proper formatting and AI-generated captions
    """
    if not os.path.exists(folder_path):
        print(f"Warning: Path {folder_path} does not exist")
        return
    
    # Get all image files (png, jpg, jpeg)
    image_extensions = ['*.png', '*.jpg', '*.jpeg']
    image_files = []
    
    for extension in image_extensions:
        image_files.extend(glob.glob(os.path.join(folder_path, extension)))
    
    if not image_files:
        print(f"No images found in {folder_path}")
        return
    
    # Sort images for consistent ordering
    image_files.sort()
    
    # Add section heading
    if section_title:
        doc.add_heading(section_title, level=2)
    
    # Add each image
    for img_path in image_files:
        try:
            # Get image filename for caption
            img_name = os.path.basename(img_path)
            
            # Add the image (6 inches wide to fit the page).
            # add_picture creates its own paragraph, so no empty one is needed first.
            doc.add_picture(img_path, width=Inches(6))
            
            # Generate caption using Ollama or fallback to simple caption
            if use_ollama:
                caption_text = generate_caption_with_ollama(img_name, section_title, location)
            else:
                caption_text = f"Figure: {img_name}"
            print(f"Generated caption: {caption_text}")
            # Add caption
            caption = doc.add_paragraph(caption_text)
            caption.alignment = WD_ALIGN_PARAGRAPH.CENTER
            
            # Add some space after each image
            doc.add_paragraph()
            
            print(f"Added: {img_name}")
            
        except Exception as e:
            print(f"Error adding image {img_path}: {str(e)}")
            continue

In [5]:
def generate_document(use_ollama_captions=True):
    """
    Main function to generate the complete supporting document
    """
    print("Starting document generation...")
    print(f"Using Ollama for captions: {use_ollama_captions}")
    
    # Initialize document
    doc, base_path, locations = create_supporting_document()
    
    # Process each location
    for location, steps in locations.items():
        print(f"\nProcessing {location}...")
        
        # Add major section for each location
        doc.add_page_break()
        doc.add_heading(f'{location} Analysis Results', level=1)
        
        # Add location-specific introduction
        location_intro = doc.add_paragraph()
        location_intro.add_run(f'This section contains all analysis results for {location}, including global overviews, data completeness assessments, statistical analyses, and type-specific results.')
        
        # Process each analysis step for the location
        for step_folder, step_title in steps.items():
            folder_path = os.path.join(base_path, location, step_folder)
            print(f"  Processing {step_title}...")
            add_images_to_document(doc, folder_path, step_title, location, use_ollama_captions)
    
    # Add Cross-regional analysis section
    print("\nProcessing Cross-regional analysis...")
    doc.add_page_break() 
    doc.add_heading('Cross-Regional Correlation Analysis', level=1)
    
    cross_regional_intro = doc.add_paragraph()
    cross_regional_intro.add_run('This section contains correlation analyses between different regions, showing relationships in pollen patterns across Ljubljana, Maribor, and Primorje.')
    
    # Add cross-regional images
    cross_regional_path = os.path.join(base_path, "Cross_regional", "Step_6_Cross_Regional_Correlation")
    add_images_to_document(doc, cross_regional_path, "Regional Correlation Results", "Cross-regional", use_ollama_captions)
    
    # Save the document
    output_filename = "Supporting_Material_Pollen_Analysis.docx"
    doc.save(output_filename)
    
    print(f"\nSupporting document generated successfully: {output_filename}")
    
    # Print summary of what was included
    total_images = 0
    print("\nSummary of included content:")
    for location, steps in locations.items():
        for step_folder, step_title in steps.items():
            folder_path = os.path.join(base_path, location, step_folder)
            if os.path.exists(folder_path):
                image_files = []
                for ext in ['*.png', '*.jpg', '*.jpeg']:
                    image_files.extend(glob.glob(os.path.join(folder_path, ext)))
                total_images += len(image_files)
                print(f"  {location} - {step_title}: {len(image_files)} images")
    
    # Count cross-regional images
    cross_regional_path = os.path.join(base_path, "Cross_regional", "Step_6_Cross_Regional_Correlation")
    if os.path.exists(cross_regional_path):
        cross_images = []
        for ext in ['*.png', '*.jpg', '*.jpeg']:
            cross_images.extend(glob.glob(os.path.join(cross_regional_path, ext)))
        total_images += len(cross_images)
        print(f"  Cross-regional analysis: {len(cross_images)} images")
    
    print(f"\nTotal images included: {total_images}")
    return output_filename

In [6]:
# Test Ollama connection first
def test_ollama_connection():
    """Test if Ollama is available and working"""
    try:
        response = requests.get('http://localhost:11434/api/tags', timeout=5)
        if response.status_code == 200:
            models = response.json().get('models', [])
            print(f"Ollama is available. Models found: {[m['name'] for m in models]}")
            return True
        else:
            print("Ollama API returned error status")
            return False
    except Exception as e:
        print(f"Ollama not available: {e}")
        print("Will use simple captions instead of AI-generated ones")
        return False

# Test connection
ollama_available = test_ollama_connection()

# Generate the supporting document
output_file = generate_document(use_ollama_captions=ollama_available)

Ollama is available. Models found: ['nomic-embed-text:latest', 'gpt-oss:20b', 'llama3.3:latest', 'llama3.1:8b', 'gemma3:4b', 'deepseek-r1:14b', 'llama3.2-vision:latest', 'llama3.2:latest', 'qwen2.5-coder:32b-instruct-q4_0']
Starting document generation...
Using Ollama for captions: True

Processing Ljubljana...
  Processing Global Overview Analysis...
Generated caption: Figure 1a: Global overview pollen diagram for Ljubljana, Slovenia, illustrating the seasonal abundance and distribution of major pollen taxa in the region.
Added: Ljubljana_Fig_01a.png
Generated caption: Figure 1a: Global overview pollen diagram for Ljubljana, Slovenia, illustrating the seasonal abundance and distribution of major pollen taxa in the region.
Added: Ljubljana_Fig_01a.png
Generated caption: Based on the provided filename and context, a possible caption for the figure could be:  "Figure 1b: Global overview pollen analysis for Ljubljana, Slovenia, highlighting regional patterns of tree pol...
Added: Ljubljan

## Ollama Integration

The notebook uses Ollama to generate context-aware captions for the analysis graphs. Captions are derived from the image filename and the surrounding context only; the image itself is never sent to the model. The system will:

1. **Test Ollama Connection**: Check if Ollama is running and available
2. **Generate Smart Captions**: Use AI to create descriptive captions based on:
   - Image filename patterns
   - Analysis type (Global Overview, Data Completeness, etc.)
   - Location context (Ljubljana, Maribor, Primorje)
   - Cross-regional analysis context

3. **Fallback to Simple Captions**: If Ollama is not available, it will use basic filename-based captions
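
The logged output above shows that the model sometimes wraps the caption in a conversational preamble ("Based on the provided filename and context, a possible caption ... could be:"). The following is a minimal sketch of one way to post-process a raw response before it goes into the document. The function name `clean_caption` and its parameters are hypothetical and not part of the notebook; the 200-character limit mirrors the one used in `generate_caption_with_ollama`.

```python
import re


def clean_caption(raw, fallback, max_len=200):
    """Normalize a raw model response into a single-line caption.

    Hypothetical helper: not part of the original notebook.
    """
    # Collapse newlines and repeated whitespace into single spaces.
    text = " ".join(raw.split())

    # If the model put the caption in double quotes after a preamble,
    # keep only the first quoted part.
    quoted = re.search(r'"([^"]+)"', text)
    if quoted:
        text = quoted.group(1).strip()

    # Empty response: use the simple filename-based caption instead.
    if not text:
        return fallback

    if len(text) > max_len:
        # Cut at the last word boundary that fits instead of mid-word.
        text = text[:max_len].rsplit(" ", 1)[0].rstrip(",;:") + "..."
    return text


raw_response = (
    "Based on the provided filename, a possible caption could be:  "
    '"Figure 1b: Global overview for Ljubljana."'
)
print(clean_caption(raw_response, "Figure: Ljubljana_Fig_01b.png"))
```

One trade-off of this approach: a caption that legitimately contains a quoted phrase would be reduced to that phrase, so the quote extraction may need to be limited to responses that start with a known preamble.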

### Configuration Options

You can modify the Ollama model in the `generate_caption_with_ollama` function:
- `llama3.2` (default) - Good balance of speed and quality
- `llama3.2:1b` - Faster, smaller model
- `llama3.1` - Larger, more capable model
- `mistral` - Alternative model option

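The system uses a low temperature (0.3) for consistent, factual captions suitable for scientific documentation. Responses are limited to 50 tokens (`num_predict`), and captions longer than 200 characters are truncated with a trailing ellipsis.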
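
As a sketch of how these settings could be made configurable rather than hard-coded, the two helpers below build the request body and choose a model from the list reported by `/api/tags`. Both function names are hypothetical and not part of the notebook. The payload fields are the ones the notebook already sends, and the sketch assumes that a model name without a tag refers to the `:latest` tag, as in the model list printed in the output above.

```python
def build_ollama_payload(prompt, model="llama3.2", temperature=0.3, num_predict=50):
    """Build the JSON body for a non-streaming /api/generate request.

    Hypothetical helper; defaults mirror the values used in the notebook.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": num_predict},
    }


def resolve_model(preferred, installed):
    """Return the first preferred model that is installed, or None.

    Assumption: a name without a tag refers to the ':latest' tag.
    """
    installed_set = set(installed)
    for name in preferred:
        tagged = name if ":" in name else f"{name}:latest"
        if tagged in installed_set:
            return tagged
    return None


installed_models = ["llama3.2:latest", "gemma3:4b", "llama3.1:8b"]
model = resolve_model(["mistral", "llama3.2"], installed_models)
payload = build_ollama_payload("Caption:", model=model)
print(payload["model"])

# The payload would then be sent the same way the notebook does:
# requests.post('http://localhost:11434/api/generate', json=payload, timeout=30)
```

Resolving the model against the installed list before generating captions avoids one failed request per image when the configured model has not been pulled.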

## Document Structure

The generated Word document will include:

### 1. Ljubljana Analysis Results
- **Global Overview Analysis**: General visualization of Ljubljana data
- **Data Completeness Assessment**: Analysis of data quality and coverage
- **AUC and Confidence Interval Analysis**: Statistical performance metrics
- **Type-Specific Analysis**: Individual pollen type analyses

### 2. Maribor Analysis Results  
- **Global Overview Analysis**: General visualization of Maribor data
- **Data Completeness Assessment**: Analysis of data quality and coverage
- **AUC and Confidence Interval Analysis**: Statistical performance metrics
- **Type-Specific Analysis**: Individual pollen type analyses

### 3. Primorje Analysis Results
- **Global Overview Analysis**: General visualization of Primorje data  
- **Data Completeness Assessment**: Analysis of data quality and coverage
- **AUC and Confidence Interval Analysis**: Statistical performance metrics
- **Type-Specific Analysis**: Individual pollen type analyses

### 4. Cross-Regional Correlation Analysis
- Correlation analyses showing relationships between different regions
- Time series correlations for various pollen types across all three locations

Each section includes properly formatted images with captions and appropriate spacing for professional presentation.
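
The structure above follows the folder layout `results/Graphs/<location>/<step folder>`. The sketch below shows one way to enumerate the sections and their images in a single pass, so the same listing could drive both the document build and the final summary count. The names `collect_images` and `plan_sections` are hypothetical. Unlike the notebook's `glob` patterns, this version matches file extensions case-insensitively, which is a deliberate difference and not the notebook's current behavior.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_images(folder):
    """Return the sorted image files in a folder (empty list if it is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def plan_sections(base_path, locations):
    """Yield (location, step_title, image_paths) for every analysis step."""
    for location, steps in locations.items():
        for step_folder, step_title in steps.items():
            folder = Path(base_path) / location / step_folder
            yield location, step_title, collect_images(folder)


if __name__ == "__main__":
    example_locations = {
        "Ljubljana": {"Step_2_Global_Overview": "Global Overview Analysis"},
    }
    for location, title, images in plan_sections("results/Graphs", example_locations):
        print(f"{location} - {title}: {len(images)} images")
```

Because missing folders yield an empty list, the caller can decide whether to skip the section heading or print a warning, as `add_images_to_document` does.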