# Using Gemini with Python in Colab

**Session 2: The AI-Empowered Coder**  
*Generative AI for Scholarship — Harvard HDSI & FAS*

---

This notebook demonstrates three ways to use Gemini in your Python workflow, followed by a fourth part on loading and analyzing data from Google Drive. The three ways are:

1. **Debugging code** — Finding and fixing errors
2. **Documenting code** — Adding clear explanations and docstrings
3. **Generating code** — Creating new functions from descriptions

## How to Use Gemini in Colab

Google Colab has **built-in Gemini integration**. You can access it in two ways:

1. **Inline suggestions**: As you type code, Gemini may suggest completions
2. **Gemini panel**: Click the sparkle icon (✨) or press **Ctrl+Alt+Space** (Windows/Linux) or **Cmd+Option+Space** (Mac) to open the Gemini assistant panel on the right side

### Tips for Using Gemini:
- **Select code** before asking Gemini about it — this gives context
- **Be specific** in your prompts (e.g., "Add a NumPy-style docstring to this function" rather than "document this")
- **Verify outputs** — Always check that AI-generated code works as expected
- **Iterate** — If the first response isn't perfect, refine your prompt

---

## Part 1: Debugging Code with Gemini

Below is a function that contains a bug. It's supposed to calculate the mean and standard deviation of a list of numbers, but something is wrong.

### Exercise:
1. Run the cell below — you'll get an error
2. **Select the entire code cell** (click in the cell and press Ctrl+A / Cmd+A)
3. Open Gemini (sparkle icon ✨ or Ctrl+Alt+Space)
4. Ask: **"This code has an error. What's wrong and how do I fix it?"**
5. Apply the fix and verify it works

In [None]:
import numpy as np

def calculate_statistics(data):
    # Calculate mean
    mean = sum(data) / len(data)
    
    # Calculate standard deviation
    squared_diffs = [(x - mean)**2 for x in data]
    variance = sum(squared_diffs) / len(data)
    std_dev = variance**0.5  # Population standard deviation: square root of the variance
    
    return mean, std_dev

# Test with some data
test_data = [12, 15, 18, 22, 25, 28, 30]
mean, std = calculate_statistics(test_data)
print(f"Mean: {mean:.2f}")
print(f"Standard deviation: {std:.2f}")

# This call fails with string data
bad_data = ['12', '15', '18']
mean2, std2 = calculate_statistics(bad_data)  # TypeError: sum(data) cannot add int and str

### What Gemini Should Tell You:

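The function fails on string data because it never validates or converts its inputs. The first failure is in `sum(data)`, which raises a `TypeError` when it tries to add the integer 0 to a string. The `x - mean` subtraction in the list comprehension would fail for the same reason if execution got that far.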

**Fix options:**
1. Add type checking at the start
2. Convert inputs to floats
3. Add a try-except block
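
The fix options above can be combined in one function. The following is a minimal sketch, not the only reasonable fix: the name `calculate_statistics_checked` is hypothetical, and it assumes that numeric strings such as `'12'` should be accepted and converted rather than rejected.

```python
import math


def calculate_statistics_checked(data):
    """Return (mean, population standard deviation) of numeric-like values."""
    # Guard against empty input, which would otherwise divide by zero
    if len(data) == 0:
        raise ValueError("data must contain at least one value")

    try:
        # Assumption: numeric strings like '12' are acceptable and converted
        values = [float(x) for x in data]
    except (TypeError, ValueError) as exc:
        # Replace the low-level failure with a message about the input
        raise ValueError(f"all values must be numbers or numeric strings: {exc}") from exc

    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, math.sqrt(variance)


mean2, std2 = calculate_statistics_checked(['12', '15', '18'])
print(f"Mean: {mean2:.2f}, standard deviation: {std2:.2f}")
```

Whether to convert strings silently or reject them is a design choice. For research data, rejecting unexpected types is often safer because it surfaces problems in the data file early.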

---

## Part 2: Documenting Code with Gemini

Good documentation is essential for research code. Below is a working function that lacks proper documentation.

### Exercise:
1. **Select the function below**
2. Open Gemini
3. Ask: **"Add a comprehensive docstring to this function following NumPy style"**
4. Review the docstring Gemini generates — does it accurately describe what the function does?
5. Ask a follow-up: **"Now add inline comments explaining the algorithm"**

In [None]:
def moving_average(data, window_size):
    if window_size > len(data):
        raise ValueError("Window size cannot be larger than data length")
    
    result = []
    for i in range(len(data) - window_size + 1):
        window = data[i:i + window_size]
        avg = sum(window) / window_size
        result.append(avg)
    
    return result
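
For comparison with what Gemini produces, here is a hand-written sketch of the same function with a NumPy-style docstring and inline comments. The name `moving_average_documented` is hypothetical, chosen so it does not overwrite the cell above, and the wording of the docstring is only one possible phrasing.

```python
def moving_average_documented(data, window_size):
    """
    Compute the simple moving average of a sequence.

    Parameters
    ----------
    data : list of float
        Input values, in order.
    window_size : int
        Number of consecutive values averaged for each output point.

    Returns
    -------
    list of float
        One average per full window, so the result has
        ``len(data) - window_size + 1`` elements.

    Raises
    ------
    ValueError
        If ``window_size`` is larger than ``len(data)``.

    Examples
    --------
    >>> moving_average_documented([1, 2, 3, 4, 5], 3)
    [2.0, 3.0, 4.0]
    """
    if window_size > len(data):
        raise ValueError("Window size cannot be larger than data length")

    result = []
    # Slide a fixed-width window across the data, one position at a time
    for i in range(len(data) - window_size + 1):
        window = data[i:i + window_size]
        # Average of the values currently inside the window
        result.append(sum(window) / window_size)

    return result
```

When reviewing a generated docstring, check the stated output length and the listed exceptions against the code itself, since those are the details most easily described incorrectly.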

### Test the documented function:

In [None]:
# Test data: noisy signal
import numpy as np
import matplotlib.pyplot as plt

# Generate noisy data
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.2, 100)

# Apply moving average
y_smoothed = moving_average(y.tolist(), window_size=5)

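# Plot; each average is drawn at the center of its 5-point window,
# so the smoothed curve starts 2 points in (window_size // 2)
plt.figure(figsize=(10, 4))
plt.plot(x, y, 'o', alpha=0.3, label='Noisy data')
plt.plot(x[2:len(y_smoothed)+2], y_smoothed, 'r-', linewidth=2, label='Smoothed (window=5)')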
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Moving Average Smoothing')
plt.grid(True, alpha=0.3)
plt.show()
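
A plot that looks plausible is not a correctness check. One way to verify the function numerically, sketched here under the assumption that NumPy's `np.convolve` with a uniform kernel is an acceptable independent reference, is to compare the two results element by element. The function is repeated so the cell runs on its own, and the test signal is arbitrary.

```python
import numpy as np


def moving_average(data, window_size):
    # Same logic as the notebook's function, repeated so this cell is self-contained
    if window_size > len(data):
        raise ValueError("Window size cannot be larger than data length")
    result = []
    for i in range(len(data) - window_size + 1):
        window = data[i:i + window_size]
        result.append(sum(window) / window_size)
    return result


# Arbitrary reproducible test signal
rng = np.random.default_rng(0)
signal = rng.normal(size=50)
window_size = 5

# Independent reference: convolution with a uniform kernel,
# keeping only positions where the window fits entirely ("valid")
kernel = np.ones(window_size) / window_size
expected = np.convolve(signal, kernel, mode="valid")

actual = moving_average(signal.tolist(), window_size)
matches = np.allclose(actual, expected)
print(f"Matches NumPy reference: {matches}")
```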

---

## Part 3: Generating Code with Gemini

Sometimes you need to write a function from scratch. Gemini can help generate starter code.

### Exercise:
1. Open Gemini
2. Ask: **"Write a Python function that takes a list of temperatures in Fahrenheit and returns a dictionary with statistics: min, max, mean, and median. Include error checking and a docstring."**
3. Copy the generated code into the cell below
4. Test it with the provided test cases
5. If there are issues, ask Gemini to fix them
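
To judge Gemini's answer, it helps to have a reference in mind. The following is one hand-written sketch of a function matching the prompt, using the standard library's `statistics` module. The specific error checks (rejecting empty input and non-numeric values) are assumptions about what "error checking" should mean here; Gemini's version may differ and still be correct.

```python
from statistics import mean, median


def temperature_statistics(temps_f):
    """Return the min, max, mean, and median of Fahrenheit temperatures.

    Raises ValueError for empty input and TypeError for non-numeric values.
    """
    if len(temps_f) == 0:
        raise ValueError("temps_f must contain at least one temperature")
    # bool is a subclass of int in Python, so exclude it explicitly
    if any(isinstance(t, bool) or not isinstance(t, (int, float)) for t in temps_f):
        raise TypeError("all temperatures must be int or float")

    return {
        "min": min(temps_f),
        "max": max(temps_f),
        "mean": mean(temps_f),
        "median": median(temps_f),
    }


stats = temperature_statistics([72, 75, 68, 80, 77, 73, 71, 69, 76, 74])
for key, value in stats.items():
    print(f"  {key}: {value:.1f}°F")
```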

In [None]:
# Paste Gemini-generated code here

# YOUR CODE HERE

pass  # Remove this line when you paste code

### Test cases:

In [None]:
# Test with sample temperature data
temps_fahrenheit = [72, 75, 68, 80, 77, 73, 71, 69, 76, 74]

# Call your function (uncomment when ready)
# stats = temperature_statistics(temps_fahrenheit)
# print("Temperature Statistics:")
# for key, value in stats.items():
#     print(f"  {key}: {value:.1f}°F")

---

## Part 4: Working with Your Own Data from Google Drive

In real research, you'll often need to read data files from your Google Drive. This section shows how to:

1. Mount your Google Drive in Colab
2. Read a data file
3. Perform basic analysis

### Prerequisites:
- To analyze your own data, you need a CSV file in your Google Drive
- For this demo, Step 2 creates a sample file, so you can follow along without one

### Step 1: Mount Google Drive

Run the cell below. You'll be prompted to authorize Colab to access your Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

print("\nGoogle Drive mounted successfully!")
print("Your files are accessible at: /content/drive/MyDrive/")

### Step 2: Create a Sample Data File (for demonstration)

This creates a sample CSV file in your Google Drive for testing. In real use, you'd skip this and use your own data file.

In [None]:
import pandas as pd
import numpy as np

# Create sample experimental data
np.random.seed(42)
n_samples = 100

data = pd.DataFrame({
    'time': np.linspace(0, 10, n_samples),
    'temperature': 20 + 5 * np.sin(np.linspace(0, 4*np.pi, n_samples)) + np.random.normal(0, 0.5, n_samples),
    'pressure': 1013 + 10 * np.cos(np.linspace(0, 4*np.pi, n_samples)) + np.random.normal(0, 2, n_samples),
    'humidity': 50 + 20 * np.sin(np.linspace(0, 2*np.pi, n_samples)) + np.random.normal(0, 3, n_samples)
})

# Save to Google Drive
file_path = '/content/drive/MyDrive/sample_experiment_data.csv'
data.to_csv(file_path, index=False)

print(f"Sample data file created at: {file_path}")
print("\nFirst few rows:")
print(data.head())

### Step 3: Read Data from Google Drive

Now let's read the data file. **Modify the path below** if you're using your own data file.

In [None]:
import pandas as pd

# Path to your data file in Google Drive
# MODIFY THIS PATH to point to your own data file
data_file_path = '/content/drive/MyDrive/sample_experiment_data.csv'

# Read the CSV file
df = pd.read_csv(data_file_path)

# Display basic information
print("Data loaded successfully!\n")
print(f"Number of rows: {len(df)}")
print(f"Number of columns: {len(df.columns)}")
print(f"\nColumn names: {list(df.columns)}")
print("\nFirst 5 rows:")
display(df.head())

# Basic statistics
print("\nSummary statistics:")
display(df.describe())
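
When you point the path at your own file, a misspelled path or an unmounted Drive is the most common failure. The sketch below checks for the file before reading it and raises a more helpful message. The helper name `load_csv_checked` is hypothetical, and the demonstration uses a throwaway temporary file rather than a Drive path so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Read a CSV with pandas, failing early with a readable message."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"No file at {path}. Is Drive mounted, and is the path spelled correctly?"
        )
    return pd.read_csv(path)


# Demonstration with a temporary file (stand-in for a file on Drive)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("time,temperature\n0.0,20.1\n0.1,20.4\n")
    demo_df = load_csv_checked(demo_path)

print(demo_df)
```

In the notebook you would call it with the Drive path instead, for example `df = load_csv_checked(data_file_path)`.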

### Step 4: Use Gemini to Help Analyze the Data

Now that we have data loaded, let's use Gemini to help with analysis.

### Exercise:
1. Look at the data structure above
2. Open Gemini
3. Try prompts like:
   - **"Write code to plot all three variables (temperature, pressure, humidity) vs time on separate subplots"**
   - **"Calculate the correlation matrix for this dataframe and display it as a heatmap"**
   - **"Find the time points where temperature exceeds 23 degrees"**
4. Paste the generated code in the cell below and run it
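
As a reference point for two of the prompts above, here is a short sketch of threshold filtering and a correlation matrix in pandas. It builds a small noise-free stand-in called `demo` (a hypothetical name, used so the notebook's `df` is not overwritten) and leaves out plotting, so Gemini's answer will likely look different.

```python
import numpy as np
import pandas as pd

# Noise-free stand-in with the same shape of signal as the sample data
phase = np.linspace(0, 4 * np.pi, 100)
demo = pd.DataFrame({
    "time": np.linspace(0, 10, 100),
    "temperature": 20 + 5 * np.sin(phase),
    "pressure": 1013 + 10 * np.cos(phase),
})

# Time points where temperature exceeds 23: boolean mask plus column selection
hot = demo.loc[demo["temperature"] > 23, ["time", "temperature"]]
print(f"{len(hot)} of {len(demo)} rows exceed 23")

# Pairwise Pearson correlations between all numeric columns
corr = demo.corr()
print(corr.round(2))
```

A heatmap is then a matter of passing `corr` to a plotting function. Before trusting generated plotting code, confirm that the numbers it plots match a table like this one.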

In [None]:
# Paste Gemini-generated analysis code here

# YOUR CODE HERE

pass  # Remove this line when you paste code

---

## Discussion Questions

After completing these exercises, consider:

1. **Accuracy**: Did Gemini's suggestions always work on the first try? What needed adjustment?

2. **Understanding**: Could you understand the code Gemini generated? Did it help you learn new techniques?

3. **Efficiency**: How much time did using Gemini save compared to looking up documentation or Stack Overflow?

4. **Trust**: When should you be skeptical of AI-generated code? What checks should you always perform?

5. **Ethics**: If Gemini helps you write analysis code for a paper, how should you acknowledge this?

---

## Tips for Effective AI-Assisted Coding

✅ **DO:**
- Provide context by selecting relevant code
- Be specific about what you want
- Test all generated code thoroughly
- Use AI to learn new libraries and techniques
- Iterate on prompts if results aren't good

❌ **DON'T:**
- Blindly trust generated code
- Use AI for critical calculations without verification
- Copy code you don't understand
- Forget to add your own comments and documentation
- Skip testing edge cases
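
As an illustration of testing edge cases, the sketch below exercises the `calculate_statistics` logic from Part 1 (repeated here in condensed form so the cell runs on its own) with a hand-checkable input, a single value, and an empty list. The choice of cases is an assumption about what matters for this function; your own code will have its own boundaries.

```python
import math


def calculate_statistics(data):
    # Condensed copy of the Part 1 function: mean and population standard deviation
    mean = sum(data) / len(data)
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    return mean, variance ** 0.5


# Hand-checkable case: these eight values have mean 5 and population std 2
mean, std = calculate_statistics([2, 4, 4, 4, 5, 5, 7, 9])
assert math.isclose(mean, 5.0)
assert math.isclose(std, 2.0)

# Single value: the spread must be exactly zero
assert calculate_statistics([7]) == (7.0, 0.0)

# Empty input: the function has no guard, so Python raises on the division
try:
    calculate_statistics([])
    empty_input_error = None
except ZeroDivisionError as exc:
    empty_input_error = type(exc).__name__

print(f"Empty input raises: {empty_input_error}")
```

The last case shows why edge cases are worth running: the function works on ordinary data but gives an unhelpful error on an empty list, which is the kind of gap a quick visual check will not reveal.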

---

## Next Steps

Try using Gemini with your own research code:
- Debug that function that's been giving you trouble
- Add documentation to an old script
- Generate boilerplate code for a new analysis
- Ask for help understanding unfamiliar library syntax

Remember: **AI is a tool to augment your skills, not replace your understanding and judgment.**