# Automated Research Report Generation

In this notebook, we'll generate a structured, multi-section research report from an academic paper using AI. The same approach works for literature reviews, paper summaries, and study materials.

## What You'll Learn
- How to use structured outputs to generate detailed, multi-section reports
- How to design complex Pydantic models for rich data structures
- How to format AI-generated content as professional markdown documents

## Use Case
Researchers and students often need to:
- Create summaries of research papers for literature reviews
- Extract key information in a standardized format
- Generate reports that highlight methodology, findings, and implications

We'll automate this entire process using the OpenAI Responses API!

## Setup

Let's import the necessary libraries.

In [None]:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List
from pypdf import PdfReader
from IPython.display import Markdown, display

# Initialize the OpenAI client
client = OpenAI()

## Define the Report Structure

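We'll create a Pydantic model that defines every section we want in our research report. With structured outputs, the response is parsed into this schema, so each field is present with the expected type. The length hints in the field descriptions (such as "3-5") guide the model but are not enforced, and the schema does not check that the content itself is accurate.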

In [None]:
class KeyFinding(BaseModel):
    """Represents a single key finding from the research."""
    finding: str = Field(description="A specific key finding or result (1-2 sentences)")
    significance: str = Field(description="Why this finding matters (1 sentence)")

class ResearchReport(BaseModel):
    """Complete structured research report."""
    
    # Basic information
    title: str = Field(description="The full title of the paper")
    authors: str = Field(description="Author names as they appear in the paper")
    year: str = Field(description="Publication year or arXiv submission year")
    
    # Executive summary
    executive_summary: str = Field(description="3-4 sentence high-level summary of the entire paper")
    
    # Research context
    research_problem: str = Field(description="What problem does this research address? (2-3 sentences)")
    background: str = Field(description="What prior work or context is important? (2-3 sentences)")
    
    # Methodology
    methodology_summary: str = Field(description="How did the researchers approach the problem? (3-4 sentences)")
    key_techniques: List[str] = Field(description="List of 3-5 key techniques or methods used")
    
    # Results and findings
    key_findings: List[KeyFinding] = Field(description="3-5 most important findings with their significance")
    
    # Analysis
    strengths: List[str] = Field(description="3-4 key strengths of this research")
    limitations: List[str] = Field(description="2-3 limitations or areas for improvement")
    
    # Impact and applications
    practical_applications: List[str] = Field(description="3-4 real-world applications of this research")
    future_directions: str = Field(description="What future research does this enable? (2-3 sentences)")
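To see what the `Field` descriptions actually do, here is a minimal sketch, assuming Pydantic v2. `MiniFinding` and `MiniReport` are hypothetical, trimmed-down stand-ins for the models above, used only to keep the example short. It shows that the descriptions end up in the generated JSON schema, and that a count hint like "3-5" does not stop a shorter list from validating.

```python
from typing import List

from pydantic import BaseModel, Field


class MiniFinding(BaseModel):
    """Hypothetical stand-in for KeyFinding."""
    finding: str = Field(description="A specific key finding (1-2 sentences)")


class MiniReport(BaseModel):
    """Hypothetical, trimmed-down stand-in for ResearchReport."""
    title: str = Field(description="The full title of the paper")
    key_findings: List[MiniFinding] = Field(description="3-5 most important findings")


# The descriptions are embedded in the JSON schema generated from the model
schema = MiniReport.model_json_schema()
print(schema["properties"]["key_findings"]["description"])

# A single finding still validates: "3-5" is a hint, not a constraint
short_report = MiniReport(title="Example", key_findings=[MiniFinding(finding="Only one")])
print(len(short_report.key_findings))
```

If the counts matter for your use case, one option is to check `len(report.key_findings)` yourself after the report has been parsed.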

## Load the Research Paper

We'll extract the text from a PDF paper.

In [None]:
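def load_pdf_text(file_path):
    """Loads text from a PDF file, one block of text per page."""
    reader = PdfReader(file_path)
    # Guard against pages with no extractable text (e.g. scanned images)
    text = "\n\n".join((page.extract_text() or "") for page in reader.pages)
    return text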

# Load the Word2Vec paper
paper_text = load_pdf_text("../assets/paper3.pdf")

print(f"Loaded paper with {len(paper_text)} characters")
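Text extraction can come back empty for some pages, for example when a PDF contains scanned images rather than embedded text. The sketch below is one way to spot that before sending anything to the API. `summarize_extraction` is a hypothetical helper, not part of `pypdf`, and it works on a plain list of per-page strings so it can be tried without a PDF.

```python
def summarize_extraction(page_texts):
    """Return (joined_text, empty_page_numbers) for a list of per-page strings.

    Page numbers are 1-based. None and whitespace-only pages count as empty.
    """
    cleaned = [(text or "").strip() for text in page_texts]
    empty_pages = [number for number, text in enumerate(cleaned, 1) if not text]
    joined = "\n\n".join(text for text in cleaned if text)
    return joined, empty_pages


# Placeholder page contents, standing in for real extraction results
text, empty_pages = summarize_extraction(["Intro text", None, "   ", "Results"])
print(f"{len(text)} characters, empty pages: {empty_pages}")
```

With a real file, the input list could be built as `[page.extract_text() for page in reader.pages]`.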

## Generate the Research Report

Now we'll use the Responses API with structured outputs to generate a complete, detailed report.

In [None]:
# Create instructions for comprehensive analysis
instructions = """
You are an expert research analyst who creates detailed, structured reports about academic papers.
Your reports should be:
- Thorough and accurate
- Clear and well-organized
- Helpful for someone doing a literature review
- Written in professional academic language
"""

# Generate the report
print("Generating comprehensive research report...")

response = client.responses.parse(
    model="gpt-5-mini",
    instructions=instructions,
    input=f"Create a detailed research report analyzing this paper:\n\n{paper_text}",
    text_format=ResearchReport
)

# Extract the structured report
report = response.output_parsed

print(f"✓ Generated report for: {report.title}")
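The cell above assumes that `response.output_parsed` always holds a report. It may be `None`, for example if the model declines the request or the output is cut off, in which case `report.title` would raise an `AttributeError`. A minimal sketch of a guard follows. `require_parsed` is a hypothetical helper, and `SimpleNamespace` objects stand in for real API responses so the example runs without an API call.

```python
from types import SimpleNamespace


def require_parsed(response):
    """Return response.output_parsed, or raise a clear error if it is missing."""
    parsed = getattr(response, "output_parsed", None)
    if parsed is None:
        raise ValueError("No structured report was returned; inspect the raw response.")
    return parsed


# Stand-in for a successful response (not a real API object)
ok_response = SimpleNamespace(output_parsed=SimpleNamespace(title="Example Paper"))
print(require_parsed(ok_response).title)
```

In the notebook, this would replace the direct attribute access with `report = require_parsed(response)`.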

## View the Raw Structured Data

Let's first look at the structured data object that was generated.

In [None]:
# Display the report object
print("Report Structure:")
print(f"Title: {report.title}")
print(f"Authors: {report.authors}")
print(f"Year: {report.year}")
print(f"\nNumber of key findings: {len(report.key_findings)}")
print(f"Number of key techniques: {len(report.key_techniques)}")
print(f"Number of practical applications: {len(report.practical_applications)}")
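To inspect every field at once rather than printing them one by one, the whole object can be serialized to JSON. This is a minimal sketch, assuming Pydantic v2. `DemoReport` is a hypothetical two-field stand-in for `ResearchReport`, and its values are placeholders.

```python
from typing import List

from pydantic import BaseModel


class DemoReport(BaseModel):
    """Hypothetical two-field stand-in for ResearchReport."""
    title: str
    key_techniques: List[str]


demo = DemoReport(title="Example Paper", key_techniques=["technique A", "technique B"])

# Serialize the full object as indented JSON
as_json = demo.model_dump_json(indent=2)
print(as_json)

# The JSON can be loaded back into the same model later
restored = DemoReport.model_validate_json(as_json)
```

The same calls apply to the real object, e.g. `print(report.model_dump_json(indent=2))`.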

## Format as Professional Markdown Report

Now let's create a beautiful, formatted research report!

In [None]:
def format_report_as_markdown(report: ResearchReport) -> str:
    """
    Convert a ResearchReport object into a beautifully formatted markdown document.
    """
    
    md = f"""
# Research Report: {report.title}

**Authors:** {report.authors}  
**Year:** {report.year}

---

## Executive Summary

{report.executive_summary}

---

## Research Context

### Problem Statement
{report.research_problem}

### Background
{report.background}

---

## Methodology

### Overview
{report.methodology_summary}

### Key Techniques
"""
    
    # Add key techniques as bullet points
    for technique in report.key_techniques:
        md += f"- {technique}\n"
    
    md += "\n---\n\n## Key Findings\n\n"
    
    # Add each finding with its significance
    for i, finding in enumerate(report.key_findings, 1):
        md += f"### Finding {i}\n"
        md += f"**Result:** {finding.finding}\n\n"
        md += f"**Significance:** {finding.significance}\n\n"
    
    md += "---\n\n## Critical Analysis\n\n### Strengths\n"
    
    # Add strengths
    for strength in report.strengths:
        md += f"- {strength}\n"
    
    md += "\n### Limitations\n"
    
    # Add limitations
    for limitation in report.limitations:
        md += f"- {limitation}\n"
    
    md += """
---

## Impact and Applications

### Practical Applications
"""
    
    # Add applications
    for application in report.practical_applications:
        md += f"- {application}\n"
    
    md += f"""
### Future Research Directions
{report.future_directions}

---

*Report generated using OpenAI API with structured outputs*
"""
    
    return md

# Generate the formatted report
formatted_report = format_report_as_markdown(report)
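The repeated `md += f"- {...}\n"` loops in `format_report_as_markdown` all do the same thing, so they could be factored into one helper. The sketch below is one possible refactoring, not a required change. `bullet_list` is a hypothetical name.

```python
def bullet_list(items):
    """Render an iterable of strings as a markdown bullet list, one item per line."""
    return "".join(f"- {item}\n" for item in items)


# Placeholder items, standing in for report.strengths or report.limitations
section = "### Strengths\n" + bullet_list(["Clear evaluation", "Simple method"])
print(section)
```

Each loop in the function could then become a single line such as `md += bullet_list(report.strengths)`.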

## Display the Complete Report

Here's your professional research report!

In [None]:
display(Markdown(formatted_report))

## Save the Report to a File

Let's save this report so you can use it in your literature review or share it with others.

In [None]:
# Create a clean filename from the title
import re

def create_filename(title: str) -> str:
    """Convert a paper title to a clean filename."""
    # Drop everything except ASCII letters, digits, and whitespace
    clean_name = re.sub(r'[^a-zA-Z0-9\s]', '', title)
    # Collapse runs of whitespace into single underscores and lowercase
    clean_name = re.sub(r'\s+', '_', clean_name.strip()).lower()
    # Limit length, and fall back to a generic name if nothing is left
    clean_name = clean_name[:50].rstrip('_') or "untitled"
    return f"{clean_name}_report.md"

# Save the report
filename = create_filename(report.title)
filepath = f"../assets/{filename}"

with open(filepath, 'w', encoding='utf-8') as f:
    f.write(formatted_report)

print(f"✓ Report saved to: {filepath}")
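The `open()` call above fails if the target directory does not exist yet. A minimal sketch using `pathlib` creates the directory first. `save_report` is a hypothetical helper, and the example writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_report(markdown_text, directory, filename):
    """Write markdown_text to directory/filename, creating the directory if needed."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(markdown_text, encoding="utf-8")
    return out_path


# Demonstrate with a throwaway directory and placeholder content
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report("# Research Report: Example\n", Path(tmp) / "reports", "example_report.md")
    print(saved.name, saved.exists())
```

In the notebook, the equivalent call would be `save_report(formatted_report, "../assets", filename)`.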

## Quick Access to Specific Sections

Since we have structured data, we can easily access specific parts of the report.

In [None]:
# Print just the key findings
print("KEY FINDINGS:\n")
for i, finding in enumerate(report.key_findings, 1):
    print(f"{i}. {finding.finding}")
    print(f"   → {finding.significance}")
    print()

In [None]:
# Print just the practical applications
print("PRACTICAL APPLICATIONS:\n")
for i, app in enumerate(report.practical_applications, 1):
    print(f"{i}. {app}")

## Key Takeaways

In this notebook, you learned:

1. **Complex Pydantic Models**: You can create rich, nested data structures with `List[CustomModel]` to capture detailed information
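2. **Structured Report Generation**: Using `responses.parse()` returns output that matches your schema, so every report has the same fields and types. The quality of the content still depends on the model and the input text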
3. **Data Transformation**: Structured data (Pydantic objects) can easily be transformed into formatted documents (Markdown, HTML, PDF, etc.)
4. **Reusability**: Once you have structured data, you can generate different views or formats without re-calling the API
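As an illustration of points 3 and 4, the same structured findings can be rendered as HTML instead of Markdown without another API call. This is a minimal sketch using only the standard library. `findings_to_html` is a hypothetical helper, and `SimpleNamespace` objects with placeholder text stand in for `KeyFinding` instances.

```python
from html import escape
from types import SimpleNamespace


def findings_to_html(title, findings):
    """Render a title and a list of findings as a small HTML fragment."""
    items = "\n".join(
        f"  <li><strong>{escape(item.finding)}</strong><br>{escape(item.significance)}</li>"
        for item in findings
    )
    return f"<h1>{escape(title)}</h1>\n<ol>\n{items}\n</ol>"


# Placeholder findings, standing in for report.key_findings
demo_findings = [
    SimpleNamespace(finding="Method A < Method B on cost", significance="Cheaper to run"),
]
print(findings_to_html("Example Paper", demo_findings))
```

Escaping matters here because model-generated text can contain characters such as `<` or `&` that would otherwise break the markup.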

## Why This Matters

**Traditional approach:**
- Ask AI to "write a report"
- Get unstructured text
- Hard to extract specific information
- Inconsistent format across reports

**Structured approach (what we did):**
- Define exact schema with Pydantic
- Get output that conforms to the schema
- Easy to access specific fields
- Consistent report format across papers
- Can store in databases, generate multiple formats, etc.
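To make the database point concrete, here is a minimal sketch that stores a report as JSON in SQLite using only the standard library. The table layout and the `store_report` and `load_report` helpers are assumptions for illustration, and the dictionary holds placeholder values.

```python
import json
import sqlite3


def store_report(conn, report_dict):
    """Insert or replace one report, keyed by title, with the full payload as JSON."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(title TEXT PRIMARY KEY, year TEXT, payload TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO reports (title, year, payload) VALUES (?, ?, ?)",
        (report_dict["title"], report_dict["year"], json.dumps(report_dict)),
    )
    conn.commit()


def load_report(conn, title):
    """Return the stored report as a dict, or None if the title is unknown."""
    row = conn.execute(
        "SELECT payload FROM reports WHERE title = ?", (title,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# In-memory database and placeholder data, so nothing is written to disk
conn = sqlite3.connect(":memory:")
store_report(conn, {"title": "Example Paper", "year": "2020", "key_techniques": ["technique A"]})
print(load_report(conn, "Example Paper")["key_techniques"])
```

With Pydantic v2, the dictionary for a real report could come from `report.model_dump()`.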

## Next Steps

Try:
- Adding more papers and comparing reports
- Creating a database of structured reports
- Building a web interface to browse reports
- Generating reports in different formats (HTML, PDF)
- Adding citation information and references
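As a starting point for the first suggestion, the sketch below builds a Markdown comparison table from several reports. `comparison_table` is a hypothetical helper, the chosen columns are just one option, and the two dictionaries are placeholders rather than real results.

```python
def comparison_table(reports):
    """Build a markdown table with one row per report dict."""
    lines = [
        "| Title | Year | Findings | Limitations |",
        "| --- | --- | --- | --- |",
    ]
    for report in reports:
        # Escape pipes so a title cannot break the table layout
        title = report["title"].replace("|", "\\|")
        lines.append(
            f"| {title} | {report['year']} "
            f"| {len(report['key_findings'])} | {len(report['limitations'])} |"
        )
    return "\n".join(lines)


# Placeholder reports, standing in for report.model_dump() results
demo_reports = [
    {"title": "Paper A", "year": "2020", "key_findings": ["f1", "f2"], "limitations": ["l1"]},
    {"title": "Paper B", "year": "2021", "key_findings": ["f1"], "limitations": []},
]
table = comparison_table(demo_reports)
print(table)
```

In a notebook, the result can be shown with `display(Markdown(table))`.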