# Tutorial: Learn OpenCode While Doing the Workshop

## Practical guide with commands to run in your terminal

**OpenCode** is an open-source, terminal-based coding agent. From the command line, it lets you work with LLMs (Claude, GPT, etc.) that can read your project files, write code, and run it.

---

### How to use this notebook

1. **Keep this notebook open** as a reference
2. **Open a terminal** on your computer
3. **Copy and run** the commands shown
4. **Switch back and forth** between this tutorial and the workshop notebooks

Sections 1 to 4 follow Notebooks 1 to 4 of the main workshop. Sections 5 to 9 are reference material you can consult at any time.

---

## 0. Installation and Setup

### Install OpenCode

```bash
# Option 1: With npm (Node.js)
npm install -g opencode-ai

# Option 2: With Homebrew (macOS)
brew install opencode

# Option 3: Install script
curl -fsSL https://opencode.ai/install | bash
```

### Verify installation

```bash
opencode --version
```

---

## 1. Getting Started

While learning API fundamentals in Notebook 1, try these basic OpenCode prompts.

### 1.1 Start OpenCode

```bash
# Navigate to the workshop directory
cd ~/path/to/code-llm-allies-2026

# Start OpenCode
opencode
```

### 1.2 Your first prompt

Once inside OpenCode, type:

```
Explain what an LLM API is in 3 simple sentences
```

### 1.3 Explore the project

```
What files are in this project? Give me a summary of the structure
```

### 1.4 Read a file

```
Read the file data/paper_abstracts.txt and tell me how many abstracts it contains
```

### 1.5 Analyze text (like in Notebook 1)

```
Read the first abstract from data/paper_abstracts.txt and:
1. Summarize it in 2 sentences for someone without technical knowledge
2. Extract the 3 most important keywords
3. Suggest a more accessible alternative title
```

---

## 2. Data Visualization 

While working with visualizations in Notebook 2, try generating plots with OpenCode.

### 2.1 Explore data

```
Read data/climate_data.csv and tell me:
- How many rows and columns it has
- What data types are in each column
- A basic statistical summary
```

### 2.2 Create a simple visualization

```
Create a Python script that:
1. Loads data/climate_data.csv
2. Makes a line plot of temperature vs date
3. Saves the plot as temperature_plot.png

Then run the script.
```
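It helps to know roughly what a correct script looks like before you run what the agent produces. The sketch below assumes that `data/climate_data.csv` has columns named `date` and `temperature`. Those names are hypothetical, so use the ones reported in step 2.1. It also assumes that pandas and matplotlib are installed.

```python
import matplotlib

matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_temperature(df: pd.DataFrame, out_path: str = "temperature_plot.png") -> None:
    """Save a line plot of temperature vs date.

    Assumes hypothetical columns 'date' and 'temperature'.
    """
    # Parse dates and sort so the line is drawn in time order
    df = df.assign(date=pd.to_datetime(df["date"])).sort_values("date")
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df["date"], df["temperature"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Temperature")
    fig.autofmt_xdate()  # rotate date labels so they do not overlap
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

Usage: `plot_temperature(pd.read_csv("data/climate_data.csv"))`.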

### 2.3 More complex visualization

```
Using data/climate_data.csv, create a visualization that shows:
- Temperature on the Y axis
- Date on the X axis
- Different colors for each location
- A trend line (6-month rolling average)
- Clean publication-ready style

Save it as climate_trends.png
```
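The rolling average is the part of this prompt most worth double-checking. Here is a minimal pandas sketch, assuming the hypothetical columns `date`, `location`, and `temperature`. It computes the average separately for each location with a time-based window, so irregularly spaced measurements are handled correctly. The window of 182 days is an approximation of 6 months.

```python
import pandas as pd


def add_rolling_trend(df: pd.DataFrame, window: str = "182D") -> pd.DataFrame:
    """Return a copy of df with a 'trend' column: the mean temperature over
    the preceding `window`, computed separately for each location.

    Assumes hypothetical columns 'date', 'location', 'temperature'.
    """
    out = df.assign(date=pd.to_datetime(df["date"]))
    out = out.sort_values(["location", "date"]).reset_index(drop=True)
    # Time-based windows need a DatetimeIndex sorted within each group
    trend = (
        out.set_index("date")
        .groupby("location")["temperature"]
        .transform(lambda s: s.rolling(window).mean().to_numpy())
    )
    # transform keeps the input row order, so positional assignment is safe
    out["trend"] = trend.to_numpy()
    return out
```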

### 2.4 Bar chart with error bars

```
With data/experiment_results.csv:
1. Calculate the mean and standard deviation of 'measurement' by 'treatment'
2. Create a bar chart with error bars
3. Use colorblind-friendly colors
4. Save as treatment_comparison.png
```
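You can check the agent's numbers and plot against a minimal sketch. It assumes the CSV has the `treatment` and `measurement` columns named in the prompt. For colors, it uses the Okabe-Ito palette, one widely used colorblind-friendly choice. The error bars use pandas' default sample standard deviation (ddof=1).

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Okabe-Ito colorblind-friendly palette
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def treatment_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, sample SD (ddof=1) and n of 'measurement' for each 'treatment'."""
    return df.groupby("treatment")["measurement"].agg(["mean", "std", "count"])


def plot_treatment_bars(summary: pd.DataFrame, out_path: str = "treatment_comparison.png") -> None:
    """Bar chart of group means with +/- 1 SD error bars."""
    colors = [OKABE_ITO[i % len(OKABE_ITO)] for i in range(len(summary))]
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(summary.index.astype(str), summary["mean"], yerr=summary["std"],
           capsize=4, color=colors, edgecolor="black")
    ax.set_xlabel("Treatment")
    ax.set_ylabel("Measurement (mean ± SD)")
    fig.tight_layout()
    fig.savefig(out_path, dpi=300)
    plt.close(fig)
```

Usage: `plot_treatment_bars(treatment_summary(pd.read_csv("data/experiment_results.csv")))`.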

### 2.5 Refine iteratively

After creating a plot, you can ask for modifications:

```
In the last plot you created:
- Increase the font size to 14pt
- Add a descriptive title
- Change the background to white
- Save the updated version
```

---

## 3. Automation 

While learning to automate tasks in Notebook 3, use OpenCode for real tasks.

### 3.1 Analyze problematic data

```
Read data/messy_data.csv and analyze:
1. What data quality issues do you find?
2. Which columns have inconsistent formats?
3. How many missing values are there per column?
```
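A small pandas helper gives you a baseline to compare with what the agent reports. It counts missing values and flags columns where some entries fail numeric conversion. That failure is a common sign of mixed formats, such as "1.5 mM" next to "1.5". This is only a sketch: it does not know which columns are *supposed* to be numeric.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, non-numeric count, distinct values.

    'non_numeric' counts non-missing values that pd.to_numeric cannot parse;
    a mostly numeric column with a few of these likely has mixed formats.
    """
    rows = {}
    for col in df.columns:
        s = df[col]
        # Treat empty or whitespace-only strings as missing, as well as NaN/None
        missing = s.isna() | s.astype(str).str.strip().eq("")
        parsed = pd.to_numeric(s, errors="coerce")
        rows[col] = {
            "dtype": str(s.dtype),
            "missing": int(missing.sum()),
            "non_numeric": int((parsed.isna() & ~missing).sum()),
            "distinct": int(s.nunique(dropna=True)),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Usage: `print(quality_report(pd.read_csv("data/messy_data.csv")))`.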

### 3.2 Create a cleaning function

```
Based on the problems you found in messy_data.csv,
create a Python script called clean_data.py that:

1. Defines a function clean_messy_data(filepath) that:
   - Cleans the 'concentration' column (extract numbers only)
   - Standardizes dates to YYYY-MM-DD format
   - Cleans the 'temperature' column (extract numeric values)
   - Converts 'cell_count' to standard numeric format
   - Standardizes 'status' to Title Case

2. Saves the result as messy_data_cleaned.csv
3. Prints a summary of changes made

Run the script and show me the result.
```
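For reference, here is one possible shape for `clean_data.py`. It is a sketch under stated assumptions:

- The date column is called `date`. This name is hypothetical.
- Commas are thousands separators, not decimal commas.
- Units are dropped rather than converted. This matters for temperatures recorded in different scales.
- Dates are parsed with `format="mixed"`, which needs pandas 2.0 or later.

```python
import pandas as pd

# First number in a string; accepts decimals and scientific notation ("1.5e3")
NUMBER = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"


def extract_number(s: pd.Series) -> pd.Series:
    """'1.5 mM' -> 1.5, '37 °C' -> 37.0, '1,200' -> 1200.0, 'n/a' -> NaN."""
    text = s.astype(str).str.replace(",", "", regex=False)  # assumes no decimal commas
    return pd.to_numeric(text.str.extract(NUMBER, expand=False), errors="coerce")


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in ["concentration", "temperature", "cell_count"]:
        out[col] = extract_number(out[col])
    # format="mixed" parses each value individually (pandas >= 2.0);
    # ambiguous dates like 03/04/2024 are read month-first by default
    out["date"] = (
        pd.to_datetime(out["date"], format="mixed", errors="coerce").dt.strftime("%Y-%m-%d")
    )
    out["status"] = out["status"].str.strip().str.title()
    return out


def clean_messy_data(filepath: str) -> pd.DataFrame:
    raw = pd.read_csv(filepath)
    cleaned = clean_frame(raw)
    cleaned.to_csv("messy_data_cleaned.csv", index=False)
    # Summary: values that were present but could not be parsed
    for col in ["concentration", "temperature", "cell_count", "date"]:
        failed = int((cleaned[col].isna() & raw[col].notna()).sum())
        print(f"{col}: {failed} value(s) could not be parsed")
    return cleaned
```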

### 3.3 Process multiple files

```
Create a script that:
1. Lists all CSV files in the data/ folder
2. For each file, generates basic statistics
3. Saves a summary to data_summary.txt
```
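The core of such a script is a loop over `Path.glob`. The minimal sketch below uses `DataFrame.describe()` as one reasonable choice for "basic statistics" and skips summary statistics for files without numeric columns.

```python
from pathlib import Path

import pandas as pd


def summarize_csvs(folder: str = "data", out_file: str = "data_summary.txt") -> str:
    """Write shape and describe() output for every CSV in `folder` to `out_file`."""
    parts = []
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        parts.append(f"== {path.name}: {df.shape[0]} rows x {df.shape[1]} columns ==")
        numeric = df.select_dtypes("number")
        if numeric.empty:
            parts.append("(no numeric columns)")
        else:
            parts.append(numeric.describe().to_string())
        parts.append("")  # blank line between files
    text = "\n".join(parts)
    Path(out_file).write_text(text, encoding="utf-8")
    return text
```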

### 3.4 Generate automatic documentation

```
For the clean_data.py file we just created:
1. Add comprehensive docstrings to all functions
2. Add type hints
3. Add usage examples in the docstrings
```

### 3.5 Create analysis pipeline

```
Create a script analysis_pipeline.py that:

1. Loads experiment_results.csv
2. Calculates descriptive statistics by treatment group
3. Runs an ANOVA test to compare groups
4. Generates a visualization of the results
5. Saves a report in Markdown format with:
   - Statistics table
   - Statistical test result
   - Interpretation of results

Run the complete pipeline.
```
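Steps 2, 3 and 5 of this pipeline can be sketched in a few lines with `scipy.stats.f_oneway`. The column names `treatment` and `measurement` are assumed from section 2.4. The sketch does not check ANOVA assumptions (normality, equal variances) or write the interpretation, so those still need your judgment.

```python
import pandas as pd
from scipy import stats


def anova_report(df: pd.DataFrame, value: str = "measurement", group: str = "treatment") -> str:
    """Descriptive statistics plus a one-way ANOVA, returned as Markdown."""
    desc = df.groupby(group)[value].agg(["count", "mean", "std"])
    samples = [g[value].dropna().to_numpy() for _, g in df.groupby(group)]
    f_stat, p_value = stats.f_oneway(*samples)
    k = len(samples)
    n = sum(len(s) for s in samples)
    lines = ["## Descriptive statistics", "",
             f"| {group} | n | mean | SD |", "|---|---|---|---|"]
    for name, row in desc.iterrows():
        lines.append(f"| {name} | {int(row['count'])} | {row['mean']:.3f} | {row['std']:.3f} |")
    # Between-groups df = k - 1, within-groups df = n - k
    lines += ["", "## One-way ANOVA", "", f"F({k - 1}, {n - k}) = {f_stat:.2f}, p = {p_value:.3g}"]
    return "\n".join(lines)
```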

---

## 4. Complex Agent-like Tasks 

While learning about agents in Notebook 4, observe how OpenCode acts as an agent.

### 4.1 Complete exploratory analysis

```
Perform a complete exploratory analysis of experiment_results.csv:

1. Load and explore the data
2. Identify the most important variables
3. Look for significant correlations
4. Compare treatment groups
5. Create relevant visualizations
6. Generate a report with your main findings

Save everything in a folder called 'analysis_output'
```

### 4.2 Data investigation

```
Investigate whether there are significant differences in experiment 
results by species:

1. Explore the data
2. Decide which statistical tests are appropriate
3. Run the analyses
4. Create visualizations that illustrate your findings
5. Write a conclusion in simple language
```

### 4.3 Scientific report generation

```
Based on the analysis of experiment_results.csv, generate:

1. A Methods section for a scientific paper
2. A Results section with:
   - Description of findings
   - References to figures
   - Properly formatted statistical values
3. Figure legends for the generated figures

Save everything in a file called report.md
```

### 4.4 Multi-step task with decisions

```
I have survey data in survey_responses.csv.
I want to understand what factors influence satisfaction (q1_satisfaction).

Please:
1. Explore the data and understand its structure
2. Identify possible predictors of satisfaction
3. Decide which analyses are appropriate (correlations, group comparisons, etc.)
4. Run the analyses you consider relevant
5. Create visualizations that tell the story
6. Give me recommendations based on the data

Document your decision process at each step.
```

---

## 5. Useful OpenCode Commands

### Commands inside OpenCode

Command names can change between OpenCode releases; type `/help` to list the ones your version supports.

| Command | Description |
|---------|-------------|
| `/help` | Show help and available commands |
| `/new` (alias `/clear`) | Start a new session with an empty context |
| `/models` | List available models and switch between them (Claude, GPT, etc.) |
| `/compact` | Compact (summarize) history to save tokens |
| `/exit` or `Ctrl+C` | Exit OpenCode |

### Keyboard shortcuts

| Shortcut | Action |
|----------|--------|
| `Tab` | Switch between primary agents (e.g., Build and Plan) |
| `Up/Down` | Navigate prompt history |
| `Esc` | Interrupt the current response |

Default keybindings also vary between versions and can be customized in the OpenCode configuration.

### Usage tips

```
# Switch to a more economical model (pick one from the list)
/models

# Start a clean session if the conversation gets confusing
/new

# Compact history for long sessions
/compact
```

---

## 6. Principles for Effective Use of Code Agents

### 6.1 The Golden Rules

| Principle | Why It Matters |
|-----------|----------------|
| **Be specific** | Vague requests lead to generic solutions |
| **Provide context** | The agent doesn't know your project |
| **Iterate, don't restart** | Build on previous responses |
| **Verify outputs** | AI can make mistakes, especially with statistics |
| **Break down complex tasks** | Smaller steps = better results |

### 6.2 How to Write Effective Prompts

#### Bad vs Good Prompts

```
# BAD: Too vague
"Analyze my data"

# GOOD: Specific and contextual
"Analyze experiment_results.csv: compare the 'measurement' column 
across the three treatment groups using ANOVA, create a box plot 
with significance annotations, and save results to analysis_output/"
```

```
# BAD: No context about your needs
"Make a plot"

# GOOD: Clear requirements
"Create a publication-ready figure for the journal Nature:
- Bar plot of mean ± SEM for each treatment
- Individual data points overlaid
- 300 DPI, Arial font, no gridlines
- Significance bars with asterisks (* p<0.05, ** p<0.01)"
```

```
# BAD: Assuming the agent knows your data
"Calculate the IC50"

# GOOD: Providing necessary context
"In dose_response.csv, the 'concentration' column has drug doses in µM
and 'viability' has cell viability as percentage. Fit a 4-parameter 
logistic curve and calculate IC50 with 95% confidence interval."
```
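To sanity-check an IC50 the agent reports, you can fit the same 4-parameter logistic yourself with `scipy.optimize.curve_fit`. In this sketch, the function names and starting values are illustrative assumptions. The confidence interval is a Wald-type interval built from the covariance matrix on the linear IC50 scale. Such intervals can be too narrow or symmetric when the true uncertainty is skewed, so dedicated dose-response software (or a fit on log10 IC50) may be preferable for publication.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def four_pl(x, bottom, top, ic50, hill):
    """4-parameter logistic: falls from `top` to `bottom`, half-way at `ic50`."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)


def fit_ic50(conc, viability, level=0.95):
    """Fit a 4PL curve; return IC50, an approximate CI and all parameters."""
    x = np.asarray(conc, dtype=float)
    y = np.asarray(viability, dtype=float)
    p0 = [y.min(), y.max(), np.median(x[x > 0]), 1.0]  # rough starting guesses
    lower = [-np.inf, -np.inf, 1e-12, 0.01]  # keep IC50 and Hill slope positive
    upper = [np.inf, np.inf, np.inf, 20.0]
    params, cov = curve_fit(four_pl, x, y, p0=p0, bounds=(lower, upper))
    se = np.sqrt(np.diag(cov))
    # t quantile with n - 4 residual degrees of freedom
    t = stats.t.ppf(0.5 + level / 2, df=max(x.size - 4, 1))
    ic50 = params[2]
    return ic50, (ic50 - t * se[2], ic50 + t * se[2]), params
```

Usage, with the columns named in the prompt: `fit_ic50(df["concentration"], df["viability"])`.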

### 6.3 The CLEAR Framework for Scientific Prompts

- **C** - Context: What is your data? What field are you in?
- **L** - Language: Use precise scientific terminology
- **E** - Expected output: What format do you need?
- **A** - Assumptions: State any constraints or requirements
- **R** - Review criteria: How will you verify the result?

```
# Example using CLEAR framework:

CONTEXT: I have qPCR data in 'qpcr_results.csv' with Ct values 
for a gene of interest and housekeeping gene across 3 conditions.

LANGUAGE: I need to calculate delta-delta Ct (2^-ΔΔCt method) 
for relative gene expression.

EXPECTED OUTPUT: 
- Bar plot with fold change vs control
- Error bars showing SEM from 3 biological replicates
- Statistical comparison using one-way ANOVA

ASSUMPTIONS:
- Control condition is "untreated"
- Housekeeping gene is "GAPDH"
- Alpha level is 0.05

REVIEW: I will verify by manually calculating one sample.
```
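The "verify by manually calculating one sample" step is easier with the formula in hand. In the 2^-ΔΔCt method:

- ΔCt = Ct(target) - Ct(reference).
- ΔΔCt = ΔCt - mean ΔCt of the control.
- Fold change = 2^-ΔΔCt.

For example, suppose the control has ΔCt = 25 - 18 = 7 and a treated sample has ΔCt = 23 - 18 = 5. Then ΔΔCt = -2 and the fold change is 2² = 4. The method assumes close to 100% amplification efficiency for both genes. The column names in this sketch are hypothetical.

```python
import pandas as pd


def ddct_fold_change(df: pd.DataFrame, target: str = "target_ct", reference: str = "gapdh_ct",
                     condition: str = "condition", control: str = "untreated") -> pd.DataFrame:
    """Add delta Ct, delta-delta Ct and fold change (2^-ddCt) columns.

    Column names are hypothetical defaults; pass your own.
    """
    out = df.copy()
    out["delta_ct"] = out[target] - out[reference]
    # Mean delta Ct of the control condition is the calibrator
    control_mean = out.loc[out[condition] == control, "delta_ct"].mean()
    out["delta_delta_ct"] = out["delta_ct"] - control_mean
    out["fold_change"] = 2.0 ** (-out["delta_delta_ct"])
    return out
```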

### 6.4 When to Trust (and Not Trust) the Agent

#### Generally Safe to Trust:
- File operations (reading, writing, organizing)
- Basic visualizations (plots, charts)
- Code syntax and structure
- Data transformations (filtering, merging, reshaping)
- Formatting and documentation

#### Always Verify:
- Statistical test selection and interpretation
- Mathematical calculations (especially p-values)
- Biological/scientific interpretations
- Sample size recommendations
- Conclusions and claims about significance

#### How to Verify:
```
# After statistical analysis, ask:
"Show me the intermediate calculations for the ANOVA:
- Group means and variances
- Degrees of freedom
- F-statistic calculation
- How was the p-value derived?"

# For complex analyses:
"Run the same analysis using a different method/package 
and compare results"
```
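These are the intermediate quantities the verification prompt asks for. Computing them yourself from sums of squares takes only a few lines. You can then compare the result with `scipy.stats.f_oneway`, an independent implementation. The helper name is illustrative.

```python
import numpy as np
from scipy import stats


def anova_by_hand(*groups):
    """One-way ANOVA from sums of squares, for checking library output."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    grand_mean = values.mean()
    k, n = len(groups), values.size
    # Between-group and within-group sums of squares
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper tail of the F distribution
    return f_stat, p_value, df_between, df_within
```

If `anova_by_hand(a, b, c)` and `stats.f_oneway(a, b, c)` disagree, at least one of the analyses is using different data or groups.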

### 6.5 Iterative Refinement Strategy

#### The 3-Step Approach:

```
# STEP 1: Start broad, get something working
"Create a basic analysis of experiment_results.csv"

# STEP 2: Refine specific aspects
"Now improve the visualization:
- Change to colorblind-friendly palette
- Add proper axis labels with units
- Increase font size to 12pt"

# STEP 3: Polish for final output
"Make this publication-ready:
- Export as 300 DPI TIFF
- Add figure panel labels (A, B, C)
- Ensure it fits in a single column (8.5 cm width)"
```

#### When to Start Fresh vs. Continue:

| Continue the conversation | Start fresh (/clear) |
|--------------------------|---------------------|
| Refining the same analysis | Completely different task |
| Fixing errors in recent code | Agent seems confused |
| Adding features to existing script | Too much irrelevant context |
| Iterating on visualizations | Starting a new dataset |

### 6.6 Handling Errors Effectively

```
# When code fails, provide:
1. The exact error message
2. What you were trying to do
3. Any relevant context

# Example:
"The previous code failed with:
ValueError: could not convert string to float: '1.5 mM'

I'm trying to analyze the concentration column which has mixed 
formats (some numeric, some with units). Please fix the code 
to handle unit extraction."
```
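A fix for this kind of error usually parses the number and the unit separately, then converts everything to one unit. The sketch below makes several assumptions:

- The target unit is µM.
- The unit spellings in the lookup table cover your data.
- Values with no unit are already in µM.

Unknown units become NaN so they surface for review instead of being silently converted.

```python
import pandas as pd

# Factors converting each (lower-cased) unit to micromolar; extend as needed.
# "\u00b5" is the micro sign, "\u03bc" the Greek letter mu: both appear in lab data.
TO_MICROMOLAR = {"m": 1e6, "mm": 1e3, "um": 1.0, "\u00b5m": 1.0, "\u03bcm": 1.0, "nm": 1e-3}


def concentration_to_um(values: pd.Series) -> pd.Series:
    """'1.5 mM' -> 1500.0, '250uM' -> 250.0, '3' -> 3.0 (unitless assumed to be µM)."""
    parts = values.astype(str).str.strip().str.extract(
        r"^([-+]?\d*\.?\d+)\s*([A-Za-z\u00b5\u03bc]*)$"
    )
    number = pd.to_numeric(parts[0], errors="coerce")
    unit = parts[1].fillna("").str.lower().replace("", "um")  # no unit -> assume µM
    return number * unit.map(TO_MICROMOLAR)  # unknown unit -> NaN
```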

### 6.7 Scientific Integrity Guidelines

#### DO:
- Verify statistical outputs manually for key results
- Keep a record of prompts used (for reproducibility)
- Disclose in your Methods section when AI tools were used for analysis
- Double-check biological interpretations

#### DON'T:
- Blindly trust p-values or statistical conclusions
- Let the agent choose statistical tests without understanding why
- Use AI-generated interpretations directly in papers
- Assume the agent understands your specific experimental design

```
# Good practice: Ask for explanations
"You chose a Mann-Whitney U test. Explain why this is more 
appropriate than a t-test for my data, and what assumptions 
are we checking/violating."
```

### 6.8 Prompt Templates for Common Scientific Tasks

#### Statistical Analysis
```
Analyze [file] to test if [variable] differs between [groups].
- Check assumptions for parametric tests
- Choose appropriate test and justify
- Report: test statistic, degrees of freedom, p-value, effect size
- Create visualization with significance annotations
- Write a results sentence in scientific format
```
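The template can be mirrored for two groups in a short function. It runs Shapiro-Wilk normality checks, then Welch's t-test with Cohen's d or a Mann-Whitney U test with the rank-biserial correlation, and it computes the Welch degrees of freedom explicitly. Treat it as a starting point, since choosing a test from a normality pre-test is itself debated. It assumes SciPy 1.7 or later, where `mannwhitneyu` reports U for the first sample.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha: float = 0.05) -> dict:
    """Shapiro-Wilk on each group, then Welch's t-test or Mann-Whitney U."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        res = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
        va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
        # Welch-Satterthwaite degrees of freedom
        dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
        pooled_sd = np.sqrt(((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1))
                            / (a.size + b.size - 2))
        return {"test": "Welch t-test", "statistic": res.statistic, "df": dof,
                "p": res.pvalue, "effect_size": (a.mean() - b.mean()) / pooled_sd}
    res = stats.mannwhitneyu(a, b, alternative="two-sided")
    # Rank-biserial correlation: +1 if every a > every b, -1 if every a < every b
    r = 2 * res.statistic / (a.size * b.size) - 1
    return {"test": "Mann-Whitney U", "statistic": res.statistic, "df": None,
            "p": res.pvalue, "effect_size": r}
```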

#### Figure Generation
```
Create a publication figure from [file]:
- Plot type: [bar/scatter/line/box/violin]
- X-axis: [column] (label: "[Label with units]")
- Y-axis: [column] (label: "[Label with units]")
- Grouping: [column for colors/panels]
- Style: [journal name] guidelines
- Export: [format], [DPI], [dimensions]
```

#### Data Cleaning
```
Clean [file] for analysis:
- Expected columns: [list]
- [column1] should be: [type, range, format]
- [column2] should be: [type, range, format]
- Handle missing values by: [strategy]
- Flag but don't remove outliers beyond [X] SD
- Export cleaned data and QC report
```
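Use this sketch to check the "flag but don't remove" step. It adds a boolean column and leaves every row in place. Be aware that in small samples a single extreme value inflates the SD itself, so SD-based rules can miss outliers.

```python
import pandas as pd


def flag_sd_outliers(df: pd.DataFrame, column: str, n_sd: float = 3.0) -> pd.DataFrame:
    """Add a boolean '<column>_outlier' column; no rows are removed.

    A value is flagged when it lies more than n_sd sample SDs from the mean.
    """
    out = df.copy()
    values = out[column]
    z = (values - values.mean()) / values.std(ddof=1)
    out[f"{column}_outlier"] = z.abs() > n_sd  # NaN values are never flagged
    return out
```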

#### Literature Analysis
```
Read [abstracts file] and for each paper extract:
- Main finding (1 sentence)
- Methodology used
- Sample size/model organism
- Key statistics reported
- Limitations mentioned
Create a summary table in markdown format.
```

---

## 7. Progressive Exercises

### Beginner Level

```
# Exercise 1: Basic data exploration
Read data/experiment_results.csv and tell me:
- How many subjects per treatment group?
- What is the mean measurement for each treatment?
- Are there any missing values?

# Exercise 2: Quick statistics
Calculate the mean, median, and standard deviation of 'growth_rate' 
in experiment_results.csv, grouped by 'species'

# Exercise 3: Simple visualization
Create a box plot comparing 'measurement' across treatment groups
from experiment_results.csv. Save it as treatment_boxplot.png

# Exercise 4: Data summary
Read all CSV files in the data/ folder and create a summary table
showing: filename, number of rows, number of columns, column names

# Exercise 5: Literature helper
Read data/paper_abstracts.txt and for each abstract:
- Identify the main methodology used
- List the key findings
- Suggest 3 related search terms for PubMed
```

### Intermediate Level

```
# Exercise 6: Statistical comparison
Using experiment_results.csv:
- Run an ANOVA to compare measurements across treatments
- If significant, run post-hoc Tukey tests
- Create a visualization showing significant differences with asterisks
- Save results to statistical_analysis.txt

# Exercise 7: Publication-ready figure
Create a multi-panel figure (2x2) from experiment_results.csv showing:
- Panel A: Bar plot of mean measurement by treatment (with SEM error bars)
- Panel B: Scatter plot of measurement vs growth_rate colored by species
- Panel C: Distribution of measurements (histogram + KDE)
- Panel D: Box plot by treatment and species
Use publication style (Nature/Science guidelines): 300 DPI, proper fonts

# Exercise 8: Data cleaning pipeline
Create a script clean_lab_data.py that:
- Reads messy_data.csv
- Standardizes all date formats to ISO format
- Converts concentration values to numeric (handling 'mM', 'uM' units)
- Flags outliers using IQR method
- Exports clean data and a QC report

# Exercise 9: Batch figure generation
Create a script that reads experiment_results.csv and automatically generates:
- One figure per unique value in 'species' column
- Each figure shows measurement vs treatment for that species
- Saves all figures in a 'figures/' folder with descriptive names

# Exercise 10: Methods section generator
Based on the analysis of experiment_results.csv, generate a Methods section
that includes:
- Sample sizes per group
- Statistical tests used (with software/version)
- Significance threshold
- How data was visualized
Format it for a scientific journal submission.

# Exercise 11: Supplementary table creator
Create a script that generates formatted supplementary tables:
- Table S1: Descriptive statistics by group
- Table S2: Full statistical test results
- Table S3: Individual subject data
Export as both CSV and formatted Excel with proper headers
```
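For Exercise 8, you can compare the agent's outlier flags with a minimal IQR rule. It uses Tukey's fences at 1.5 × IQR, the conventional default, and pandas' default linear quantile interpolation.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

Usage: `df["measurement_outlier"] = flag_iqr_outliers(df["measurement"])`.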

### Advanced Level

```
# Exercise 12: Reproducible analysis notebook
Convert a complete analysis of experiment_results.csv into a 
Jupyter notebook suitable for journal submission:
- Introduction cell explaining the analysis
- Data loading with validation checks
- Exploratory data analysis with figures
- Statistical tests with interpretation
- Publication-ready figures with captions
- Conclusions and limitations
- Session info (package versions)

# Exercise 13: Power analysis tool
Create a script power_analysis.py that:
- Takes pilot data (experiment_results.csv)
- Calculates effect sizes between groups
- Estimates required sample size for 80% power
- Creates a power curve visualization
- Outputs recommendations for future experiments

# Exercise 14: Meta-analysis helper
Create a tool that:
- Reads multiple experiment CSV files
- Extracts effect sizes and confidence intervals
- Creates a forest plot
- Calculates pooled effect size
- Tests for heterogeneity
```
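A quick way to check the sample sizes from Exercise 13 is the normal-approximation formula for a two-sided, two-sample comparison: n per group = 2(z₁₋α/₂ + z_power)² / d², where d is Cohen's d. It slightly underestimates the exact t-test answer, so round up and treat it as a lower bound.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)
```

For a medium effect (d = 0.5), `n_per_group(0.5)` gives 63 per group at α = 0.05 and 80% power.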

### Lab Productivity Tools

```
# Exercise 15: Experiment tracker dashboard
Create a Streamlit app for tracking lab experiments:
- Upload CSV files with experiment results
- Automatic QC checks (missing data, outliers, expected ranges)
- Quick visualization of results
- Compare with previous experiments
- Export summary for lab notebook
- Track experiments by date, project, researcher

Save as lab_dashboard.py and run with: streamlit run lab_dashboard.py

# Exercise 16: Protocol optimizer
Build a tool that analyzes experiment_results.csv to:
- Identify which conditions give best results
- Suggest optimal parameter combinations
- Show dose-response curves if applicable
- Calculate EC50/IC50 values
- Generate protocol recommendations

# Exercise 17: Lab meeting figure generator
Create an app that quickly generates presentation-ready figures:
- Input: CSV data file
- Select variables to plot
- Choose plot type (bar, scatter, line, box, violin)
- Auto-apply consistent lab style (colors, fonts)
- Add statistical annotations automatically
- Export as PNG and PowerPoint-compatible format

# Exercise 18: Grant figure assistant
Build a Streamlit app that helps create figures for grant applications:
- Loads preliminary data from CSV
- Generates publication-quality figures
- Adds projected data points for proposed experiments, clearly labeled as projections
- Creates a comparison with published literature (simulated, and marked as such)
- Exports figures with proper legends for grant documents
```

### Research Workflow Automation

```
# Exercise 19: Plate reader analysis pipeline
Create a complete analysis pipeline for 96-well plate data:
- Read raw plate reader CSV output
- Apply blank subtraction
- Calculate means and standard deviations for replicates
- Normalize to control wells
- Fit dose-response curves
- Generate heatmap of plate layout
- Export results and QC report

# Exercise 20: Microscopy image quantification report
Build a tool that processes microscopy quantification data:
- Read CSV with cell counts, intensities, areas
- Calculate statistics per condition/treatment
- Generate violin plots for distributions
- Perform statistical comparisons
- Create a PDF report with all figures and stats
- Include methods description for paper

# Exercise 21: Time-course analysis
Create an analysis pipeline for time-series experiment data:
- Read climate_data.csv (as example time-series)
- Calculate rolling averages and trends
- Detect change points
- Fit growth/decay curves
- Compare conditions over time
- Generate figure with confidence intervals

# Exercise 22: Multi-experiment meta-analyzer
Build an application that:
- Loads multiple experiment result files
- Standardizes column names across files
- Combines data with experiment identifier
- Runs mixed-effects analysis
- Creates publication-ready comparison figures
- Generates combined statistics table
```

### Writing & Documentation Helpers

```
# Exercise 23: Results paragraph generator
Create a tool that:
- Reads statistical analysis output
- Generates a results paragraph in scientific writing style
- Includes proper statistical notation (e.g., F(2, 45) = 3.46, p = 0.04)
- Suggests figure references
- Outputs in multiple formats (Word-ready, LaTeX)

# Exercise 24: Figure legend writer
Build a script that:
- Takes a figure file and its source data
- Analyzes what the figure shows
- Generates a complete figure legend including:
  - Brief description of what is shown
  - Explanation of error bars/statistics
  - Sample sizes
  - Statistical test results
  - Abbreviations

# Exercise 25: Scientific presentation builder
Create an app that generates presentation slides:
- Input: experiment data CSV + key message
- Output: 
  - Title slide with key finding
  - Methods summary slide
  - Results slides with auto-generated figures
  - Conclusions slide
  - Export as HTML or PDF slides
```
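For Exercise 23, a small formatter is safer when it computes p from F and the degrees of freedom rather than accepting both separately. That way the sentence cannot contain an inconsistent pair. For example, F(2, 45) = 3.46 corresponds to p ≈ 0.040. The "p < 0.001" threshold is a common convention, not a rule.

```python
from scipy import stats


def f_test_sentence(f_stat: float, df1: int, df2: int) -> str:
    """Format an F-test as 'F(df1, df2) = F, p = ...', deriving p from F."""
    p = stats.f.sf(f_stat, df1, df2)  # upper-tail probability of the F distribution
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    return f"F({df1}, {df2}) = {f_stat:.2f}, {p_text}"
```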

### Data Management & QC

```
# Exercise 26: Lab data validator
Create a validation system for incoming data:
- Define expected schema (columns, types, ranges)
- Validate new CSV files against schema
- Flag anomalies and potential errors
- Generate QC report with pass/fail status
- Suggest corrections for common issues

# Exercise 27: Experiment reproducibility checker
Build a tool that:
- Compares results from replicate experiments
- Calculates coefficient of variation
- Identifies outlier experiments
- Generates reproducibility report
- Flags experiments that need to be repeated

# Exercise 28: Data archival assistant
Create a system for organizing completed experiments:
- Read experiment data and metadata
- Generate standardized folder structure
- Create README with experiment summary
- Archive raw data with checksums
- Generate data availability statement for paper
```
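The "archive raw data with checksums" step from Exercise 28 can be sketched with `hashlib`. This version writes one `<sha256>  <relative path>` line per file, the layout that GNU `sha256sum -c` reads, so the archive can be verified later. The manifest file name is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(folder, manifest_name: str = "SHA256SUMS.txt") -> list:
    """Checksum every file under `folder` (except the manifest) and save the list."""
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.is_file() and p.name != manifest_name)
    lines = [f"{sha256_of(p)}  {p.relative_to(folder).as_posix()}" for p in files]
    (folder / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```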

---

## 8. Troubleshooting Common Issues

### "File not found"

```
# Check your current directory
What directory are we in? List the available files.

# Or specify the full path
Read /full/path/to/file.csv
```

### "Generated code has errors"

```
The previous code failed with this error: [paste error]
Please fix it and run again.
```

### "Need more context"

```
Before continuing, read these files to understand the context:
- [file1]
- [file2]
Then [your request]
```

### "Response is too long/short"

```
# For more concise responses
Answer in maximum 3 sentences: [question]

# For more detailed responses
Explain in detail, step by step: [question]
```

### "Want to undo changes"

```
Undo the last changes you made to file [name]

# Or if using git
Run git restore [file] to discard uncommitted changes (on older Git versions: git checkout -- [file])
```

### "Statistical results seem wrong"

```
# Ask for verification
"Please verify the ANOVA results by:
1. Showing me the group means and sample sizes
2. Calculating the F-statistic step by step
3. Comparing with an alternative method (e.g., scipy vs statsmodels)"
```

### "Agent is stuck in a loop"

```
# Clear context and restart with a simpler request
/clear

# Then break down the task
"Let's start fresh. First, just read the file and show me the columns."
```

---

## 9. Cheat Sheet - Quick Reference

### Common Scientific Tasks

| Task | Prompt Template |
|------|-----------------|
| Explore data | `Read [file] and summarize: columns, types, missing values, basic stats` |
| Compare groups | `Compare [variable] between [groups] in [file] using appropriate statistical test` |
| Publication figure | `Create a [journal]-style figure: [plot type] of [variables], 300 DPI, proper labels` |
| Clean messy data | `Clean [file]: standardize [column] format, handle missing values, flag outliers` |
| Generate methods | `Write a Methods section based on the analysis of [file]` |
| Power analysis | `Calculate required sample size for 80% power based on effect size in [file]` |
| Batch process | `For each file in [folder], [action], save results to [output]` |

### Statistical Analysis Prompts

| Analysis | Prompt |
|----------|--------|
| t-test | `Compare [var] between [group1] and [group2], check normality first` |
| ANOVA | `Run one-way ANOVA on [var] by [group], include post-hoc if significant` |
| Correlation | `Calculate Pearson/Spearman correlation between [var1] and [var2]` |
| Regression | `Fit linear regression: [y] ~ [x1] + [x2], report coefficients and R²` |
| Non-parametric | `Data is not normal. Use Mann-Whitney/Kruskal-Wallis for [comparison]` |

### Figure Style Modifiers

| Style | Add to prompt |
|-------|---------------|
| Nature/Science | `...following Nature guidelines: 89 mm single-column width, Arial, 300 DPI` |
| Colorblind-safe | `...use colorblind-friendly palette (viridis or Color Universal Design)` |
| Presentation | `...large fonts (18pt+), high contrast, simple design` |
| Multi-panel | `...as 2x2 subplot with panels labeled A, B, C, D` |
| With statistics | `...add significance annotations (* p<0.05, ** p<0.01, *** p<0.001)` |

### Useful Modifiers

| Add to prompt | Effect |
|---------------|--------|
| `...step by step` | Shows reasoning and intermediate results |
| `...and run it` | Executes the generated code |
| `...save as [name]` | Saves output to specified file |
| `...explain the choice` | Justifies why a method/test was selected |
| `...verify the results` | Double-checks calculations |
| `...for journal submission` | Publication-ready formatting |
| `...include effect size` | Reports Cohen's d, η², etc. |

### The CLEAR Framework Reminder

When writing prompts, include:
- **C**ontext: What data? What field?
- **L**anguage: Scientific terminology
- **E**xpected output: Format needed
- **A**ssumptions: Constraints, parameters
- **R**eview: How to verify results

---

## Now practice!

Open your terminal, start OpenCode, and begin with the exercise that corresponds to the notebook you're working on.

Remember: 
- Start simple, then iterate
- Always verify statistical results
- Keep a log of your prompts for reproducibility
- The best way to learn is by experimenting!