# Notebook 4: Your First AI Agent

## AI4Science series: Programming with Large Language Models

**Duration**: ~40 minutes

**Learning Goals**:
- Understand what agents are and when to use them
- Build a simple research assistant agent
- Chain multiple LLM calls for complex tasks
- Create practical agents for scientific workflows

---

## 1. Setup

In [None]:
# Install required packages
!pip install openai pandas numpy matplotlib seaborn scipy -q

In [None]:
import os
import json
import openai
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from typing import Callable, Dict, List, Any

# API setup
try:
    from google.colab import userdata
    api_key = userdata.get('OPENAI_API_KEY')
except Exception:  # not running in Colab, or the secret is unavailable there
    api_key = os.environ.get('OPENAI_API_KEY')

if not api_key:
    raise ValueError("Please set your OPENAI_API_KEY")

client = openai.OpenAI(api_key=api_key)
print("Setup complete!")

In [None]:
# Helper function
def ask_llm(system_prompt, user_message, model="gpt-4o-mini", temperature=0.7):
    """Simple wrapper for OpenAI API calls."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content

---

## 2. What is an Agent?

### The Key Formula:

$$\text{Agent} = \text{LLM} + \text{Tools} + \text{Loop}$$

| Component | What It Does | Example |
|-----------|--------------|--------|
| **LLM** | Thinks and decides | "I need to analyze this data" |
| **Tools** | Takes actions | `load_csv()`, `calculate_stats()`, `create_plot()` |
| **Loop** | Iterates until done | Keep working until the task is complete |

### When to Use Agents vs. Single Prompts:

| Use Single Prompt | Use Agent |
|-------------------|----------|
| One-step tasks | Multi-step tasks |
| Fixed input → output | Dynamic decisions needed |
| "Summarize this text" | "Analyze my data and tell me what's interesting" |
| "Convert this format" | "Research this topic and write a summary" |

### The ReAct Pattern: Reasoning + Acting

Agents follow a think-act-observe cycle:

1. **Thought**: "I need to understand the data first"
2. **Action**: Call `load_data()` tool
3. **Observation**: "Data has 100 rows, columns: x, y, z"
4. **Thought**: "Now I should look for correlations"
5. **Action**: Call `calculate_correlation()` tool
6. ... continue until task is complete
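
To make the cycle concrete before any API calls are involved, here is a minimal sketch in which a scripted function stands in for the LLM. The names `scripted_policy`, `run_loop`, and the two stub tools are hypothetical, and the observation strings are made-up placeholders, not real results.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in tools; each returns a string observation (placeholder values).
def load_data() -> str:
    return "Data has 100 rows, columns: x, y, z"

def calculate_correlation() -> str:
    return "corr(x, y) = 0.82"

TOOLS: Dict[str, Callable[[], str]] = {
    "load_data": load_data,
    "calculate_correlation": calculate_correlation,
}

def scripted_policy(observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: return (thought, action) given the observations so far."""
    if len(observations) == 0:
        return "I need to understand the data first", "load_data"
    if len(observations) == 1:
        return "Now I should look for correlations", "calculate_correlation"
    return "I have what I need", "finish"

def run_loop(max_iterations: int = 5) -> List[str]:
    observations: List[str] = []
    for _ in range(max_iterations):          # Loop: bounded, so it always terminates
        thought, action = scripted_policy(observations)  # "LLM": decide the next step
        print(f"Thought: {thought}")
        if action == "finish":
            break
        observation = TOOLS[action]()        # Tool: act
        print(f"Action: {action} -> Observation: {observation}")
        observations.append(observation)     # feed the result into the next decision
    return observations

trace = run_loop()
```

In a real agent the scripted function is replaced by a model call, but the control flow stays the same: decide, act, record the observation, repeat.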

---

## 3. Building Blocks: Tools

Tools are Python functions that the agent can call. Let's build some research-relevant tools.
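
The tools defined below each wrap their body in `try`/`except` and return an error message as a string, so the agent can read the failure and react instead of crashing. One possible way to factor out that pattern is a small decorator. This is a sketch only: `safe_tool` and `mean_of` are hypothetical names and are not used elsewhere in this notebook.

```python
import functools
from typing import Callable

def safe_tool(func: Callable[..., object]) -> Callable[..., str]:
    """Hypothetical helper: always return a string, even if the tool raises."""
    @functools.wraps(func)  # preserve __name__ and __doc__ for tool descriptions
    def wrapper(*args, **kwargs) -> str:
        try:
            return str(func(*args, **kwargs))
        except Exception as e:
            return f"Error in {func.__name__}: {e}"
    return wrapper

@safe_tool
def mean_of(values: list) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

ok = mean_of([1, 2, 3])   # normal call: result converted to a string
failed = mean_of([])      # division by zero is caught and reported as text
print(ok)
print(failed)
```

Returning text in both cases keeps the agent loop simple, because every tool result can be passed back to the model as an observation.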

In [None]:
# Create sample data for our agent to work with
np.random.seed(42)

# Experiment data
n_subjects = 60
experiment_data = pd.DataFrame({
    'subject_id': [f'S{i:03d}' for i in range(1, n_subjects + 1)],
    'treatment': np.repeat(['Drug_A', 'Drug_B', 'Placebo'], n_subjects // 3),
    'age': np.random.randint(25, 65, n_subjects),
    'baseline_score': np.random.normal(50, 10, n_subjects),
    'week_4_score': np.concatenate([
        np.random.normal(58, 8, n_subjects // 3),   # Drug_A
        np.random.normal(65, 9, n_subjects // 3),   # Drug_B
        np.random.normal(52, 10, n_subjects // 3),  # Placebo
    ]),
    'adverse_events': np.random.poisson(1, n_subjects)
})

experiment_data['improvement'] = experiment_data['week_4_score'] - experiment_data['baseline_score']

# Save for agent to use
experiment_data.to_csv('clinical_trial_data.csv', index=False)
print("Sample clinical trial data created:")
print(experiment_data.head())

In [None]:
# Define tools as Python functions

def load_csv(filepath: str) -> str:
    """
    Load a CSV file and return a summary.
    
    Args:
        filepath: Path to the CSV file
        
    Returns:
        Summary string with shape, columns, and sample data
    """
    try:
        df = pd.read_csv(filepath)
        summary = f"""Loaded {filepath}:
- Shape: {df.shape[0]} rows, {df.shape[1]} columns
- Columns: {', '.join(df.columns.tolist())}
- Data types:\n{df.dtypes.to_string()}
- First 3 rows:\n{df.head(3).to_string()}"""
        return summary
    except Exception as e:
        return f"Error loading {filepath}: {str(e)}"


def compute_statistics(filepath: str, column: str, group_by: str = None) -> str:
    """
    Compute descriptive statistics for a column.
    
    Args:
        filepath: Path to CSV file
        column: Column to analyze
        group_by: Optional column to group by
        
    Returns:
        Statistics as a formatted string
    """
    try:
        df = pd.read_csv(filepath)
        
        if group_by:
            stats_df = df.groupby(group_by)[column].agg(['count', 'mean', 'std', 'min', 'max'])
            return f"Statistics for {column} by {group_by}:\n{stats_df.to_string()}"
        else:
            series = df[column]
            return f"""Statistics for {column}:
- Count: {series.count()}
- Mean: {series.mean():.2f}
- Std: {series.std():.2f}
- Min: {series.min():.2f}
- Max: {series.max():.2f}
- Median: {series.median():.2f}"""
    except Exception as e:
        return f"Error computing statistics: {str(e)}"


def run_statistical_test(filepath: str, column: str, group_column: str, test_type: str = "anova") -> str:
    """
    Run a statistical test comparing groups.
    
    Args:
        filepath: Path to CSV file
        column: Dependent variable column
        group_column: Grouping variable column
        test_type: Type of test ("anova", "ttest", "kruskal")
        
    Returns:
        Test results as formatted string
    """
    try:
        df = pd.read_csv(filepath)
        # Build names and values from the same groupby so their order matches
        grouped = df.groupby(group_column)[column]
        group_names = [name for name, _ in grouped]
        groups = [values.to_numpy() for _, values in grouped]
        
        if test_type == "anova":
            stat, pvalue = stats.f_oneway(*groups)
            test_name = "One-way ANOVA"
        elif test_type == "kruskal":
            stat, pvalue = stats.kruskal(*groups)
            test_name = "Kruskal-Wallis H-test"
        elif test_type == "ttest" and len(groups) == 2:
            stat, pvalue = stats.ttest_ind(groups[0], groups[1])
            test_name = "Independent t-test"
        else:
            return "Invalid test type or group count for selected test"
        
        significance = "significant" if pvalue < 0.05 else "not significant"
        
        return f"""{test_name} results for {column} by {group_column}:
- Groups compared: {group_names}
- Test statistic: {stat:.4f}
- P-value: {pvalue:.4f}
- Conclusion: The difference is {significance} at α=0.05"""
    except Exception as e:
        return f"Error running test: {str(e)}"


def create_visualization(filepath: str, plot_type: str, x: str, y: str = None, hue: str = None) -> str:
    """
    Create a visualization and save it.
    
    Args:
        filepath: Path to CSV file
        plot_type: Type of plot ("boxplot", "barplot", "scatter", "histogram")
        x: X-axis column
        y: Y-axis column (for some plot types)
        hue: Grouping column for color
        
    Returns:
        Status message
    """
    try:
        df = pd.read_csv(filepath)
        plt.figure(figsize=(10, 6))
        
        if plot_type == "boxplot":
            sns.boxplot(data=df, x=x, y=y, hue=hue)
            plt.title(f"Box Plot: {y} by {x}")
        elif plot_type == "barplot":
            sns.barplot(data=df, x=x, y=y, hue=hue, errorbar='se')
            plt.title(f"Bar Plot: {y} by {x}")
        elif plot_type == "scatter":
            sns.scatterplot(data=df, x=x, y=y, hue=hue)
            plt.title(f"Scatter Plot: {y} vs {x}")
        elif plot_type == "histogram":
            sns.histplot(data=df, x=x, hue=hue, kde=True)
            plt.title(f"Histogram: {x}")
        else:
            return f"Unknown plot type: {plot_type}"
        
        plt.tight_layout()
        output_file = f"{plot_type}_{x}_{y or 'dist'}.png"
        plt.savefig(output_file, dpi=150)
        plt.show()
        
        return f"Created {plot_type} visualization and saved to {output_file}"
    except Exception as e:
        return f"Error creating visualization: {str(e)}"


def explain_results(context: str, question: str) -> str:
    """
    Use LLM to explain statistical results in plain language.
    
    Args:
        context: Statistical results or data summary
        question: What to explain
        
    Returns:
        Plain language explanation
    """
    prompt = f"""Based on these results:

{context}

Question: {question}

Provide a clear, plain-language explanation that a non-statistician could understand.
Focus on practical implications and what this means for the research."""
    
    return ask_llm(
        "You are a helpful statistics explainer for scientists.",
        prompt,
        temperature=0.5
    )

In [None]:
# Test our tools
print("=== Testing load_csv ===")
print(load_csv('clinical_trial_data.csv'))
print("\n" + "="*50 + "\n")

print("=== Testing compute_statistics ===")
print(compute_statistics('clinical_trial_data.csv', 'improvement', 'treatment'))

---

## 4. Your First Agent: Research Assistant

Now let's build an agent that can use these tools to analyze data.
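
The agent in the next cell asks the model to reply with `THOUGHT:`, `ACTION:`, and `PARAMETERS:` lines (or `FINAL_ANSWER:` when done) and parses that reply line by line. As an alternative illustration of the same text protocol, the sketch below uses a regular expression. `parse_reply` is a hypothetical helper, and it assumes each label starts at the beginning of a line.

```python
import json
import re
from typing import Any, Dict, Optional

def parse_reply(text: str) -> Dict[str, Optional[Any]]:
    """Hypothetical regex-based parser for the THOUGHT/ACTION/PARAMETERS format."""
    labels = r"THOUGHT|ACTION|PARAMETERS|FINAL_ANSWER"
    # Each section runs from its label to the next label at a line start, or to the end.
    pattern = re.compile(
        rf"^({labels}):[ \t]*(.*?)(?=^(?:{labels}):|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    sections = {label.lower(): body.strip() for label, body in pattern.findall(text)}
    params = sections.get("parameters")
    return {
        "thought": sections.get("thought"),
        "action": sections.get("action"),
        # json.loads raises json.JSONDecodeError if the model produced invalid JSON
        "parameters": json.loads(params) if params else None,
        "final_answer": sections.get("final_answer"),
    }

reply = """THOUGHT: I should load the file first.
ACTION: load_csv
PARAMETERS: {"filepath": "clinical_trial_data.csv"}"""

parsed = parse_reply(reply)
print(parsed["action"], parsed["parameters"])
```

Either approach depends on the model following the format exactly, which is why the system prompt spells the format out.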

In [None]:
class SimpleAgent:
    """
    A simple agent that can use tools to accomplish tasks.
    Uses the ReAct pattern: Reasoning + Acting.
    """
    
    def __init__(self, tools: Dict[str, Callable], max_iterations: int = 10):
        """
        Initialize the agent with available tools.
        
        Args:
            tools: Dictionary mapping tool names to functions
            max_iterations: Maximum number of think-act cycles
        """
        self.tools = tools
        self.max_iterations = max_iterations
        self.history = []  # Track conversation history
        
        # Build tool descriptions for the prompt
        self.tool_descriptions = self._build_tool_descriptions()
        
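    def _build_tool_descriptions(self) -> str:
        """Create descriptions of available tools for the system prompt."""
        import inspect  # local import keeps this cell self-contained

        descriptions = []
        for name, func in self.tools.items():
            # The raw __doc__ starts with a newline, so its first line is empty;
            # inspect.getdoc() returns the cleaned docstring, including the Args section
            doc = inspect.getdoc(func) or "No description available"
            descriptions.append(f"- {name}{inspect.signature(func)}:\n{doc}")
        return "\n\n".join(descriptions)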
    
    def _get_system_prompt(self) -> str:
        """Build the system prompt for the agent."""
        return f"""You are a research data analysis agent. You help scientists analyze their data.

You have access to these tools:
{self.tool_descriptions}

To use a tool, respond with:
THOUGHT: [your reasoning about what to do next]
ACTION: [tool_name]
PARAMETERS: [JSON object with parameters]

After receiving a tool result, continue with more THOUGHT/ACTION pairs until the task is complete.

When you have completed the task, respond with:
THOUGHT: [summary of what you found]
FINAL_ANSWER: [your complete response to the user]

Important rules:
1. Always think before acting
2. Use tools to gather information - don't make up data
3. If a tool fails, try a different approach
4. Provide clear, actionable insights
"""
    
    def _parse_response(self, response: str) -> Dict:
        """Parse the LLM response to extract thought, action, and parameters."""
        result = {
            'thought': None,
            'action': None,
            'parameters': None,
            'final_answer': None
        }
        
        lines = response.strip().split('\n')
        current_section = None
        current_content = []
        
        for line in lines:
            if line.startswith('THOUGHT:'):
                if current_section == 'parameters':
                    result['parameters'] = '\n'.join(current_content).strip()
                current_section = 'thought'
                current_content = [line[8:].strip()]
            elif line.startswith('ACTION:'):
                if current_section == 'thought':
                    result['thought'] = '\n'.join(current_content).strip()
                current_section = 'action'
                result['action'] = line[7:].strip()
            elif line.startswith('PARAMETERS:'):
                current_section = 'parameters'
                current_content = [line[11:].strip()]
            elif line.startswith('FINAL_ANSWER:'):
                if current_section == 'thought':
                    result['thought'] = '\n'.join(current_content).strip()
                current_section = 'final'
                current_content = [line[13:].strip()]
            else:
                current_content.append(line)
        
        # Capture final section
        if current_section == 'parameters':
            result['parameters'] = '\n'.join(current_content).strip()
        elif current_section == 'final':
            result['final_answer'] = '\n'.join(current_content).strip()
        elif current_section == 'thought' and not result['action']:
            result['thought'] = '\n'.join(current_content).strip()
            
        return result
    
    def _execute_tool(self, action: str, parameters: str) -> str:
        """Execute a tool with given parameters."""
        if action not in self.tools:
            return f"Error: Unknown tool '{action}'. Available tools: {list(self.tools.keys())}"
        
        try:
            # Models often wrap JSON in a Markdown code fence; drop it before parsing
            raw = (parameters or "").strip().strip("`").strip()
            if raw.lower().startswith("json"):
                raw = raw[4:].strip()
            params = json.loads(raw) if raw else {}
            if not isinstance(params, dict):
                return f"Error: PARAMETERS must be a JSON object, got: {parameters}"
            result = self.tools[action](**params)
            return str(result)
        except json.JSONDecodeError:
            return f"Error: Could not parse parameters as JSON: {parameters}"
        except Exception as e:
            return f"Error executing {action}: {str(e)}"
    
    def run(self, task: str, verbose: bool = True) -> str:
        """
        Run the agent on a task.
        
        Args:
            task: The task description from the user
            verbose: Whether to print intermediate steps
            
        Returns:
            The final answer from the agent
        """
        messages = [
            {"role": "system", "content": self._get_system_prompt()},
            {"role": "user", "content": f"Task: {task}"}
        ]
        
        for iteration in range(self.max_iterations):
            if verbose:
                print(f"\n{'='*50}")
                print(f"Iteration {iteration + 1}")
                print('='*50)
            
            # Get LLM response
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.3
            )
            
            assistant_message = response.choices[0].message.content
            messages.append({"role": "assistant", "content": assistant_message})
            
            # Parse the response
            parsed = self._parse_response(assistant_message)
            
            if verbose and parsed['thought']:
                print(f"\nTHOUGHT: {parsed['thought']}")
            
            # Check if we have a final answer
            if parsed['final_answer']:
                if verbose:
                    print(f"\nFINAL ANSWER: {parsed['final_answer']}")
                return parsed['final_answer']
            
            # Execute the action if we have one
            if parsed['action']:
                if verbose:
                    print(f"ACTION: {parsed['action']}")
                    print(f"PARAMETERS: {parsed['parameters']}")
                
                result = self._execute_tool(parsed['action'], parsed['parameters'])
                
                if verbose:
                    print(f"\nOBSERVATION:\n{result[:500]}{'...' if len(result) > 500 else ''}")
                
                # Add the observation to messages
                messages.append({"role": "user", "content": f"OBSERVATION:\n{result}"})
            else:
                # No action and no final answer - ask for clarification
                messages.append({"role": "user", "content": "Please continue with your analysis. Use a tool or provide your final answer."})
        
        return "Agent reached maximum iterations without completing the task."
    
    def reset(self):
        """Reset the agent's history."""
        self.history = []

In [None]:
# Create our agent with the research tools
research_tools = {
    'load_csv': load_csv,
    'compute_statistics': compute_statistics,
    'run_statistical_test': run_statistical_test,
    'create_visualization': create_visualization,
    'explain_results': explain_results
}

agent = SimpleAgent(research_tools)

In [None]:
# Run the agent on a task
task = """
I have a clinical trial dataset in 'clinical_trial_data.csv'.
Please:
1. Load the data and tell me what's in it
2. Compare the improvement scores across treatment groups
3. Run a statistical test to see if the differences are significant
4. Create a visualization showing the results
5. Explain what this means for my research
"""

result = agent.run(task)

---

## 5. A Practical Agent: Data Analysis Assistant

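Let's create a more focused agent for exploratory data analysis. Unlike `SimpleAgent`, it runs a fixed sequence of steps in plain Python and calls the LLM only once, to summarize the findings.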
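
One step in the class below scans the upper triangle of the correlation matrix for strongly correlated pairs. The standalone sketch here shows the same idea on synthetic data, using `np.triu_indices` in place of nested loops. The helper name `strong_pairs` and the synthetic columns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),  # strongly tied to x by construction
    "z": rng.normal(size=200),                      # independent noise
})

def strong_pairs(frame: pd.DataFrame, threshold: float = 0.5) -> list:
    """Return (col_a, col_b, r) for each column pair with |r| >= threshold."""
    corr = frame.select_dtypes(include=[np.number]).corr()
    # k=1 skips the diagonal, so each unordered pair appears exactly once
    rows, cols = np.triu_indices(len(corr.columns), k=1)
    return [
        (corr.columns[i], corr.columns[j], float(corr.iloc[i, j]))
        for i, j in zip(rows, cols)
        if abs(corr.iloc[i, j]) >= threshold
    ]

pairs = strong_pairs(df)
print(pairs)
```

Only the `x`/`y` pair should be reported here, because `y` was generated from `x` while `z` is independent of both.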

In [None]:
class DataAnalysisAgent:
    """
    A specialized agent for exploratory data analysis.
    Automatically explores data and finds interesting patterns.
    """
    
    def __init__(self, filepath: str):
        self.filepath = filepath
        self.df = pd.read_csv(filepath)
        self.findings = []
        
    def explore(self) -> str:
        """Automatically explore the dataset and generate insights."""
        insights = []
        
        # Basic info
        insights.append(f"Dataset Overview: {self.df.shape[0]} rows, {self.df.shape[1]} columns")
        
        # Identify column types
        numeric_cols = self.df.select_dtypes(include=[np.number]).columns.tolist()
        categorical_cols = self.df.select_dtypes(include=['object']).columns.tolist()
        
        insights.append(f"Numeric columns: {numeric_cols}")
        insights.append(f"Categorical columns: {categorical_cols}")
        
        # Check for missing values
        missing = self.df.isnull().sum()
        if missing.any():
            insights.append(f"Missing values: {missing[missing > 0].to_dict()}")
        else:
            insights.append("No missing values found")
        
        return "\n".join(insights)
    
    def find_correlations(self, threshold: float = 0.5) -> str:
        """Find strong correlations between numeric variables."""
        numeric_df = self.df.select_dtypes(include=[np.number])
        
        if len(numeric_df.columns) < 2:
            return "Not enough numeric columns for correlation analysis"
        
        corr_matrix = numeric_df.corr()
        
        # Find strong correlations
        strong_corrs = []
        for i in range(len(corr_matrix.columns)):
            for j in range(i+1, len(corr_matrix.columns)):
                corr = corr_matrix.iloc[i, j]
                if abs(corr) >= threshold:
                    strong_corrs.append({
                        'var1': corr_matrix.columns[i],
                        'var2': corr_matrix.columns[j],
                        'correlation': corr
                    })
        
        if strong_corrs:
            result = f"Strong correlations found (threshold={threshold}):\n"
            for c in strong_corrs:
                result += f"  - {c['var1']} vs {c['var2']}: r={c['correlation']:.3f}\n"
            return result
        else:
            return f"No correlations found above threshold {threshold}"
    
    def compare_groups(self, value_col: str, group_col: str) -> str:
        """Compare a numeric variable across groups."""
        groups = self.df.groupby(group_col)[value_col].agg(['count', 'mean', 'std'])
        
        # Run ANOVA
        group_data = [group[value_col].values for name, group in self.df.groupby(group_col)]
        f_stat, p_value = stats.f_oneway(*group_data)
        
        result = f"Comparison of {value_col} by {group_col}:\n"
        result += groups.to_string() + "\n\n"
        result += f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}\n"
        result += f"Significant difference: {'Yes' if p_value < 0.05 else 'No'}"
        
        return result
    
    def visualize_comparison(self, value_col: str, group_col: str):
        """Create a comparison visualization."""
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Box plot
        sns.boxplot(data=self.df, x=group_col, y=value_col, ax=axes[0])
        axes[0].set_title(f'{value_col} by {group_col}')
        
        # Bar plot with error bars
        sns.barplot(data=self.df, x=group_col, y=value_col, errorbar='se', ax=axes[1])
        axes[1].set_title(f'Mean {value_col} (±SE)')
        
        plt.tight_layout()
        plt.show()
    
    def auto_analyze(self) -> str:
        """Run automatic analysis and return summary."""
        report = ["=" * 50]
        report.append("AUTOMATED DATA ANALYSIS REPORT")
        report.append("=" * 50)
        
        # Exploration
        report.append("\n1. DATA OVERVIEW")
        report.append(self.explore())
        
        # Correlations
        report.append("\n2. CORRELATION ANALYSIS")
        report.append(self.find_correlations())
        
        # Group comparisons (if categorical columns exist)
        categorical_cols = self.df.select_dtypes(include=['object']).columns.tolist()
        numeric_cols = self.df.select_dtypes(include=[np.number]).columns.tolist()
        
        if categorical_cols and numeric_cols:
            report.append("\n3. GROUP COMPARISONS")
            for cat_col in categorical_cols[:2]:  # Limit to first 2
                for num_col in numeric_cols[:3]:  # Limit to first 3
                    if self.df[cat_col].nunique() <= 5:  # Only if reasonable number of groups
                        report.append(f"\n--- {num_col} by {cat_col} ---")
                        report.append(self.compare_groups(num_col, cat_col))
        
        # Generate interpretation
        report.append("\n4. KEY FINDINGS")
        interpretation = ask_llm(
            "You are a data analyst. Summarize the key findings from this analysis in 3-4 bullet points.",
            "\n".join(report),
            temperature=0.5
        )
        report.append(interpretation)
        
        return "\n".join(report)

In [None]:
# Use the Data Analysis Agent
analyst = DataAnalysisAgent('clinical_trial_data.csv')
report = analyst.auto_analyze()
print(report)

In [None]:
# Create a visualization
analyst.visualize_comparison('improvement', 'treatment')

---

## 6. Practical Exercises

### Exercise A: Build a Methods Section Writer Agent

In [None]:
def methods_writer_tool(analysis_description: str, field: str = "biomedical") -> str:
    """
    Generate a methods section based on analysis description.
    
    Args:
        analysis_description: Description of the analysis performed
        field: Research field (for appropriate style)
    """
    prompt = f"""Write a methods section for a {field} research paper based on this analysis:

{analysis_description}

Include:
1. Statistical methods used
2. Software/tools used (assume Python with pandas, scipy, matplotlib)
3. Significance thresholds
4. Any assumptions made

Write in formal academic style, past tense, third person."""
    
    return ask_llm(
        "You are a scientific writing expert who writes clear, precise methods sections.",
        prompt,
        temperature=0.3
    )

# YOUR CODE HERE: Create a simple agent that:
# 1. Analyzes data
# 2. Generates a methods section based on the analysis

# Example usage:
# methods = methods_writer_tool(report, "biomedical")
# print(methods)

### Exercise B: Build a Statistical Test Selector Agent

In [None]:
def suggest_statistical_test(data_description: str, research_question: str) -> str:
    """
    Suggest appropriate statistical tests based on data and question.
    
    Args:
        data_description: Description of the data
        research_question: The question to answer
    """
    prompt = f"""Based on this data and research question, recommend the appropriate statistical test(s):

DATA:
{data_description}

RESEARCH QUESTION:
{research_question}

Please provide:
1. Recommended test(s)
2. Why this test is appropriate
3. Assumptions to check
4. Alternative tests if assumptions are violated
5. Python code to run the test"""
    
    return ask_llm(
        "You are a biostatistics expert who helps researchers choose appropriate statistical methods.",
        prompt,
        temperature=0.3
    )

# YOUR CODE HERE: Test this tool with your own data and question
# data_desc = analyst.explore()
# question = "Is there a significant difference in improvement between treatment groups?"
# recommendation = suggest_statistical_test(data_desc, question)
# print(recommendation)

### Exercise C: Build a Figure Caption Generator Agent

In [None]:
def generate_figure_caption(figure_description: str, statistical_results: str = None) -> str:
    """
    Generate a publication-ready figure caption.
    
    Args:
        figure_description: What the figure shows
        statistical_results: Optional statistical test results to include
    """
    prompt = f"""Write a publication-ready figure caption for this figure:

FIGURE DESCRIPTION:
{figure_description}
"""
    if statistical_results:
        prompt += f"\nSTATISTICAL RESULTS TO INCLUDE:\n{statistical_results}\n"
    
    prompt += """
The caption should:
1. Start with a brief title (e.g., "Figure 1. Treatment effects on improvement scores.")
2. Describe what is shown (plot type, variables)
3. Explain visual elements (error bars, significance markers, etc.)
4. Include sample sizes and statistical results if provided
5. Be concise but complete
"""
    
    return ask_llm(
        "You are a scientific writing expert who writes precise figure captions.",
        prompt,
        temperature=0.3
    )

# YOUR CODE HERE: Generate a caption for the visualization we created
# figure_desc = "Box plot and bar chart comparing improvement scores across three treatment groups (Drug_A, Drug_B, Placebo)"
# stats_results = "ANOVA: F=XX, p=XX. Drug_B showed significantly higher improvement than Placebo."
# caption = generate_figure_caption(figure_desc, stats_results)
# print(caption)

---

## 7. Key Takeaways

### What You've Learned:
1. **Agent Formula**: LLM + Tools + Loop
2. **ReAct Pattern**: Think → Act → Observe → Repeat
3. **Tool Design**: Creating useful, well-documented functions
4. **Practical Applications**: Data analysis, statistics, writing assistance

### When to Use Agents:

| Good for Agents | Better with Single Prompts |
|----------------|---------------------------|
| Exploratory analysis | Fixed transformations |
| Multi-step workflows | Simple Q&A |
| Tasks needing decisions | Template filling |
| Dynamic problem-solving | Deterministic tasks |

### Best Practices:
- **Start simple**: Build basic agents before complex ones
- **Good tools**: Well-designed tools make agents more reliable
- **Clear prompts**: Tell the agent exactly what format to use
- **Iteration limits**: Always set a maximum to prevent infinite loops
- **Error handling**: Make tools robust to unexpected inputs
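
As one illustration of the error-handling point, a dispatcher can check the model's arguments against the tool's signature before running anything, and return a message the model can act on. This is a sketch under assumptions: `call_tool` and `greet` are hypothetical names, not part of the agent built earlier.

```python
import inspect
import json
from typing import Callable, Dict

def call_tool(tools: Dict[str, Callable[..., str]], name: str, raw_params: str) -> str:
    """Hypothetical dispatcher that validates arguments before calling a tool."""
    if name not in tools:
        return f"Error: unknown tool '{name}'. Available: {sorted(tools)}"
    try:
        params = json.loads(raw_params) if raw_params.strip() else {}
    except json.JSONDecodeError as e:
        return f"Error: parameters are not valid JSON ({e.msg})"
    if not isinstance(params, dict):
        return "Error: parameters must be a JSON object"
    func = tools[name]
    try:
        # bind() raises TypeError for missing or unexpected arguments
        # without running the tool itself
        inspect.signature(func).bind(**params)
    except TypeError as e:
        return f"Error: bad arguments for {name}{inspect.signature(func)}: {e}"
    return str(func(**params))

def greet(name: str, punctuation: str = "!") -> str:
    return f"Hello, {name}{punctuation}"

tools = {"greet": greet}
good = call_tool(tools, "greet", '{"name": "Ada"}')
bad_args = call_tool(tools, "greet", '{"nme": "Ada"}')
unknown = call_tool(tools, "missing", "{}")
print(good)
print(bad_args)
print(unknown)
```

Including the expected signature in the error text gives the model what it needs to correct the call on its next iteration.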

### Next Steps:
- Experiment with combining different tools
- Build agents for your specific research workflows
- Explore more advanced agent frameworks (LangChain, AutoGen)

---

## 8. Workshop Summary

Congratulations! You've completed the "LLM Allies for Scientists" workshop.

### What You've Learned Across All Notebooks:

| Notebook | Key Skills |
|----------|------------|
| 1. First AI Assistant | API basics, prompt engineering, literature analysis |
| 2. Data Visualization | Natural language → code, iterative refinement |
| 3. Automation | Script generation, batch processing, data cleaning |
| 4. AI Agents | Multi-step reasoning, tool use, autonomous analysis |

### Keep Exploring:
- Try these techniques on your own research data
- Build custom tools for your specific domain
- Share useful prompts with your lab colleagues
- Stay updated on new AI capabilities

Happy researching! 🔬

In [None]:
# Cleanup
import os
# Uncomment to remove generated files
# if os.path.exists('clinical_trial_data.csv'):
#     os.remove('clinical_trial_data.csv')
# for f in os.listdir('.'):
#     if f.endswith('.png'):
#         os.remove(f)