# AI Clinical Trials Landscape Intelligence Engine

An AI-powered healthcare research intelligence system that transforms raw clinical trial data into structured strategic insights.

---

## What This Project Does

1. **Pulls current clinical trial records** from the official ClinicalTrials.gov API (v2) for a given healthcare topic (e.g., diabetes, oncology, cardiovascular disease).
2. **Extracts and structures key study attributes** such as title, phase, status, sponsor, conditions, and summary.
3. **Computes quantitative landscape metrics** including phase distribution, sponsor concentration, and status distribution.
4. **Uses GPT-4o-mini** to generate a strategic intelligence brief highlighting research trends and system-level implications.
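
As a rough illustration of step 3, the sketch below computes a phase distribution and one simple concentration measure: the share of trials held by the most frequent sponsor. The sample records and the helper names (`phase_distribution`, `top_sponsor_share`) are hypothetical and not part of the project code. It assumes each trial's `phase` is a list of labels or `None`.

```python
from collections import Counter

# Hypothetical sample records shaped like the project's extracted trials.
sample_trials = [
    {"phase": ["PHASE2"], "sponsor": "Sponsor A"},
    {"phase": ["PHASE3"], "sponsor": "Sponsor A"},
    {"phase": ["PHASE2"], "sponsor": "Sponsor B"},
    {"phase": None, "sponsor": "Sponsor C"},
]

def phase_distribution(trials):
    """Count each phase label; trials without a phase count as 'Unknown'."""
    counts = Counter()
    for trial in trials:
        for label in trial.get("phase") or ["Unknown"]:
            counts[label] += 1
    return counts

def top_sponsor_share(trials):
    """Return (sponsor, share of trials) for the most frequent sponsor."""
    counts = Counter(trial.get("sponsor") or "Unknown" for trial in trials)
    if not counts:
        return None, 0.0
    sponsor, n = counts.most_common(1)[0]
    return sponsor, n / len(trials)

print(dict(phase_distribution(sample_trials)))
print(top_sponsor_share(sample_trials))
```

A single top-sponsor share is only one possible reading of "concentration"; other measures could be substituted without changing the rest of the pipeline.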

---

## Output Includes

- Emerging macro research themes  
- Innovation maturity signals (early vs. late-stage pipeline)  
- Sponsor ecosystem dynamics (academic vs. industry dominance)  
- Strategic risk gaps  
- 3–5 year forward-looking implications for U.S. healthcare systems  
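
The "early vs. late-stage" signal can also be approximated numerically before the model sees the data. The sketch below is one possible way to do that. The grouping of phase labels into `EARLY` and `LATE` is an illustrative assumption, and `maturity_bucket` and `maturity_profile` are hypothetical helpers rather than project functions.

```python
from collections import Counter

# Assumed grouping of ClinicalTrials.gov phase labels. Where to draw the line
# between "early" and "late" is an illustrative choice, not a project rule.
EARLY = {"EARLY_PHASE1", "PHASE1", "PHASE2"}
LATE = {"PHASE3", "PHASE4"}

def maturity_bucket(phases):
    """Classify one trial by the most advanced phase label it lists."""
    if not phases:
        return "unknown"
    labels = set(phases)
    if labels & LATE:
        return "late"
    if labels & EARLY:
        return "early"
    return "unknown"  # e.g. "NA" for studies without a phase

def maturity_profile(trials):
    """Count trials per maturity bucket."""
    return Counter(maturity_bucket(t.get("phase")) for t in trials)

# Hypothetical sample records.
sample = [
    {"phase": ["PHASE1"]},
    {"phase": ["PHASE2", "PHASE3"]},
    {"phase": ["NA"]},
    {"phase": None},
]
print(dict(maturity_profile(sample)))
```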

---

## Purpose

To convert fragmented clinical research data into **decision-ready intelligence** that supports healthcare strategy, innovation monitoring, and policy analysis.


In [None]:
import os
import requests
from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

In [None]:
# =====================================
# STEP 1: ENVIRONMENT SETUP & API KEYS
# =====================================

# Load environment variables from .env file
# Make sure you have a .env file with: OPENAI_API_KEY=your_key_here
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Validate that the OpenAI API key exists and looks well-formed.
# Check for stray whitespace before checking the prefix, so a key with a
# leading space is reported as a whitespace problem rather than a wrong key.
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

# Initialize OpenAI client
openai = OpenAI()

In [None]:
# =====================================
# STEP 2: CLINICAL TRIALS API SETUP
# =====================================
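class ClinicalTrialsAPI:
    """Fetch studies for a condition from the ClinicalTrials.gov v2 API and extract key fields."""

    def __init__(self, condition, page_size=10):
        self.condition = condition
        self.page_size = page_size
        self.url = "https://clinicaltrials.gov/api/v2/studies"
        self.data = None
        # Fetch immediately so the instance is ready to use after construction
        self.fetch_data()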

    def fetch_data(self):
        params = {
            "query.term": self.condition,
            "pageSize": self.page_size
        }

        response = requests.get(self.url, params=params, timeout=15)
        print("Status Code:", response.status_code)

        response.raise_for_status()
        self.data = response.json()
    
    def extract_trials(self):
        """Return one dict per study with title, status, phase, conditions, summary, and sponsor."""
        extracted = []

        for study in self.data.get("studies", []):
            protocol = study.get("protocolSection", {})

            identification = protocol.get("identificationModule", {})
            status = protocol.get("statusModule", {})
            design = protocol.get("designModule", {})
            conditions = protocol.get("conditionsModule", {})
            description = protocol.get("descriptionModule", {})
            sponsor = protocol.get("sponsorCollaboratorsModule", {})

            trial_info = {
                "title": identification.get("briefTitle"),
                "status": status.get("overallStatus"),
                "phase": design.get("phases"),
                "conditions": conditions.get("conditions"),
                "summary": description.get("briefSummary"),
                "sponsor": sponsor.get("leadSponsor", {}).get("name")
            }

            extracted.append(trial_info)
        return extracted
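
The class above requests a single page of results. If more trials are needed than one page holds, the v2 API's token-based paging could be followed along these lines. This is a minimal sketch, assuming the response carries a `nextPageToken` field and the request accepts a `pageToken` parameter. `iter_studies` and `make_fetcher` are hypothetical helpers, and the demo uses fake pages so it runs offline.

```python
def iter_studies(fetch_page, max_pages=5):
    """Yield studies across pages, stopping when no next-page token is returned.

    fetch_page(page_token) must return a dict shaped like the API response:
    a "studies" list and, when more results exist, a "nextPageToken".
    """
    token = None
    for _ in range(max_pages):
        payload = fetch_page(token)
        yield from payload.get("studies", [])
        token = payload.get("nextPageToken")
        if not token:
            break

def make_fetcher(condition, page_size=10):
    """Build a fetch_page function that calls the live API (needs network access)."""
    import requests  # imported lazily so the offline demo below needs no dependency

    def fetch_page(page_token):
        params = {"query.term": condition, "pageSize": page_size}
        if page_token:
            params["pageToken"] = page_token
        response = requests.get(
            "https://clinicaltrials.gov/api/v2/studies", params=params, timeout=15
        )
        response.raise_for_status()
        return response.json()

    return fetch_page

# Offline demo with two fake pages keyed by the token that requests them.
fake_pages = {
    None: {"studies": [{"id": 1}, {"id": 2}], "nextPageToken": "abc"},
    "abc": {"studies": [{"id": 3}]},
}
studies = list(iter_studies(fake_pages.get))
print(len(studies))
```

Against the live API, `iter_studies(make_fetcher("diabetes", page_size=20))` would replace the fake pages. The `max_pages` cap keeps a broad query from pulling an unbounded number of records.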
    


In [None]:
# =====================================
# STEP 3: FORMAT TRIALS FOR AI
# =====================================

def format_trials_for_ai(trials):
    """
    Convert a list of structured trial dictionaries into a clean text block
    that is easy for an AI model to read and analyze.
    """
    blocks = []
    for index, trial in enumerate(trials, 1):
        blocks.append(
            f"--- Trial {index} ---\n"
            f"Title: {trial['title']}\n"
            f"Status: {trial['status']}\n"
            f"Phase: {trial['phase']}\n"
            f"Conditions: {trial['conditions']}\n"
            f"Sponsor: {trial['sponsor']}\n"
            f"Summary: {trial['summary']}\n\n"
        )
    return "".join(blocks)
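
Trial summaries can be long, and list fields such as phase print as Python lists (for example `['PHASE2']`). The helpers below are one possible way to tidy values before they reach the prompt. `shorten` and `join_values` are hypothetical names, and the 300-character limit is an arbitrary assumption rather than a project setting.

```python
def shorten(text, max_chars=300):
    """Trim text to max_chars, cutting at a word boundary and marking the cut."""
    if not text:
        return "N/A"
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rsplit(" ", 1)[0] + " [truncated]"

def join_values(values):
    """Render a list field as comma-separated text; pass strings through."""
    if not values:
        return "N/A"
    if isinstance(values, str):
        return values
    return ", ".join(values)

print(join_values(["PHASE2", "PHASE3"]))
print(shorten("A randomized study\nof   glucose control.", max_chars=100))
```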


In [None]:
# =====================================
# STEP 4: STRUCTURED METRICS CALCULATION
# =====================================

from collections import Counter

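def compute_structured_metrics(trials):
    """
    Count trials by phase, sponsor, and status.

    A trial that lists several phases is counted once per phase, so phase
    totals can exceed the number of trials. Missing values count as "Unknown".
    """
    phase_counter = Counter()
    sponsor_counter = Counter()
    status_counter = Counter()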

    for trial in trials:
        # Phase
        phase = trial.get("phase")
        if phase:
            if isinstance(phase, list):
                for p in phase:
                    phase_counter[p] += 1
            else:
                phase_counter[phase] += 1
        else:
            phase_counter["Unknown"] += 1

        # Sponsor
        sponsor = trial.get("sponsor")
        if sponsor:
            sponsor_counter[sponsor] += 1
        else:
            sponsor_counter["Unknown"] += 1

        # Status
        status = trial.get("status")
        if status:
            status_counter[status] += 1
        else:
            status_counter["Unknown"] += 1

    return phase_counter, sponsor_counter, status_counter
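
The counters above track sponsor names only, so the academic vs. industry split is left for the model to infer. A structured version might count sponsors by class instead. The sketch below assumes the API's `leadSponsor` object also carries a `class` field with codes such as `INDUSTRY` or `NIH`. The `SPONSOR_GROUPS` mapping and the `sponsor_group_counts` helper are hypothetical and the sample studies are made up.

```python
from collections import Counter

# Assumed mapping from sponsor class codes to coarse groups. Codes not listed
# here (including academic sponsors, which may be reported under a generic
# class) fall into "other/unknown".
SPONSOR_GROUPS = {
    "INDUSTRY": "industry",
    "NIH": "government",
    "FED": "government",
    "OTHER_GOV": "government",
}

def sponsor_group_counts(studies):
    """Count raw study records by the coarse group of their lead sponsor."""
    counts = Counter()
    for study in studies:
        lead = (
            study.get("protocolSection", {})
            .get("sponsorCollaboratorsModule", {})
            .get("leadSponsor", {})
        )
        counts[SPONSOR_GROUPS.get(lead.get("class"), "other/unknown")] += 1
    return counts

def _study(sponsor_class):
    """Build a minimal made-up study record with the given sponsor class."""
    return {
        "protocolSection": {
            "sponsorCollaboratorsModule": {"leadSponsor": {"class": sponsor_class}}
        }
    }

sample_studies = [_study("INDUSTRY"), _study("NIH"), _study("OTHER"), {}]
print(dict(sponsor_group_counts(sample_studies)))
```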


In [None]:

# =====================================
# STEP 5: HEALTHCARE STRATEGY SYSTEM PROMPT
# =====================================

system_prompt = """
You are a senior healthcare strategy consultant advising U.S. health systems and policy leaders.

Analyze the provided structured clinical trial metrics and trial details.

Your task is NOT to summarize the trials.
Your task is to extract macro-level signals and strategic implications.

Focus on:
1. Research direction concentration (what problems are receiving attention?)
2. Phase maturity signals (is innovation early or market-ready?)
3. Sponsor ecosystem dynamics (academic vs industry dominance)
4. Risk gaps (what is missing from the landscape?)
5. Forward-looking implications for U.S. healthcare delivery, reimbursement, and system design.

Respond in structured markdown with:

## Macro Research Signals
## Innovation Maturity Assessment
## Sponsor Power Dynamics
## Strategic Risks & Gaps
## 3–5 Year Outlook for U.S. Health Systems

Be analytical, decisive, and systems-oriented.
Avoid simple description.
"""

In [None]:
# =====================================
# STEP 6: AI ANALYSIS FUNCTION
# =====================================

def analyze_trials(system_prompt, trial_text):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": trial_text}
    ]

    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.3
    )

    return response.choices[0].message.content
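
Calls to a hosted model can fail transiently (rate limits, network hiccups). One common way to soften that is a retry with exponential backoff, sketched below. `call_with_retries` is a hypothetical helper, not part of the OpenAI library. It catches every exception for brevity, whereas real code would likely retry only errors known to be transient.

```python
import time

def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and try again.

    The exception from the final attempt is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Demo with a made-up function that fails twice, then succeeds.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

# sleep is replaced with a no-op so the demo finishes instantly.
result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["count"])
```

In this notebook the wrapper could be applied as `call_with_retries(lambda: analyze_trials(system_prompt, trial_text))`.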


In [None]:
# =====================================
# STEP 7: MAIN EXECUTION
# =====================================
    

if __name__ == "__main__":
    print("Starting Clinical Trial Signal Analyzer...")
    print("=" * 50)

    condition = "diabetes"
    trials = ClinicalTrialsAPI(condition=condition, page_size=20)
 
    # Extract structured trials
    extracted_trials = trials.extract_trials()

    # Compute structured metrics
    phase_counts, sponsor_counts, status_counts = compute_structured_metrics(extracted_trials)

    metrics_summary = f"""
    Structured Metrics:

    Phase Distribution:
    {dict(phase_counts)}

    Sponsor Distribution:
    {dict(sponsor_counts)}

    Status Distribution:
    {dict(status_counts)}
    """

    # Format trial narratives
    formatted_text = format_trials_for_ai(extracted_trials)

    # Combine structured metrics + trial details
    combined_input = metrics_summary + "\n\nTrial Details:\n\n" + formatted_text

    print("\nSending structured metrics and trial details to AI for analysis...\n")

    # Pass the combined input so the model sees the metrics as well as the trial details
    analysis = analyze_trials(system_prompt, combined_input)

    display(Markdown(analysis))
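
The system prompt asks for five specific markdown sections, but nothing checks that the model returned them. A lightweight check could look like the sketch below. `missing_headings` is a hypothetical helper, and the heading list is copied from the system prompt. Dashes and letter case are normalized because a model may write "3-5" with a plain hyphen.

```python
# Section headings requested in the system prompt.
REQUIRED_HEADINGS = [
    "## Macro Research Signals",
    "## Innovation Maturity Assessment",
    "## Sponsor Power Dynamics",
    "## Strategic Risks & Gaps",
    "## 3–5 Year Outlook for U.S. Health Systems",
]

def _normalize(line):
    """Strip whitespace, unify en-dashes with hyphens, and lowercase."""
    return line.strip().replace("–", "-").lower()

def missing_headings(markdown_text):
    """Return the required headings that do not appear as lines in the text."""
    present = {_normalize(line) for line in markdown_text.splitlines()}
    return [h for h in REQUIRED_HEADINGS if _normalize(h) not in present]

# Made-up partial response for demonstration.
sample = "## Macro Research Signals\nSome text\n## Sponsor Power Dynamics\n"
print(missing_headings(sample))
```

An empty list means every requested section is present. A non-empty list could trigger a warning or a second request.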

