# GitHub Copilot Agent Workshop
## Introduction to AI-Powered Research Tools

**Date**: October 2nd, 2024  
**Time**: 3:30 PM  
**Location**: OEW 703  
**Target Audience**: Chinese Medicine Students  

---

### Prerequisites
- ✅ GitHub account already set up
- ✅ VS Code or GitHub Codespace environment ready
- ✅ AI Agent (GitHub Copilot Agent) available in ask mode

### Workshop Objectives
- [ ] Understand the difference between browser AI and Agent AI
- [ ] Learn to use GitHub Copilot Agent for research tasks
- [ ] Practice file processing and document manipulation
- [ ] Process lecture transcripts and generate summaries
- [ ] Build automated research workflows


## Part 1: Introduction and Setup (30 minutes)

### 1.1 Welcome and Workshop Objectives

Welcome to the GitHub Copilot Agent workshop! Today we'll explore how AI can transform your research workflow.

**Key Question**: What's the difference between talking to AI in a browser vs. having an AI agent that can directly access your files?

**Your Task**: Think about a recent research task where you had to copy-paste between different applications. How could an AI agent help streamline this process?


### 1.2 GitHub Account Verification

**✅ Assumption**: You already have a GitHub account set up

**Reminder**: Apply for GitHub Education benefits if you haven't already
- Visit: https://docs.github.com/en/education/about-github-education/github-education-for-students/apply-to-github-education-as-a-student
- This gives you free access to GitHub Copilot Agent

**Your Task**: Verify your GitHub account is working and note any questions about education benefits.


### 1.3 Environment Setup

**Note**: We'll start by learning to interact with the AI Agent first. Python environment setup can be done later if needed.

**For now**: Let's focus on learning how to communicate with the AI Agent effectively.

**Your Task**: Make sure you can access the AI Agent in your VS Code or Codespace environment. Take a moment to familiarize yourself with the interface.


## Part 2: Warm-up with AI Agent (45 minutes)

### 2.1 Processing Lecture Transcripts - Three Approaches

**File**: We'll work with the Week 1 lecture transcript in VTT format
- Location: `data/2SeptWeek1lecture.vtt`
- This is a real lecture transcript from our course

**Three Processing Options**:

**Option 1: Direct AI Processing**
- Copy and paste the VTT content to Poe or another AI tool
- Ask for a summary directly

**Option 2: Agent File Processing (Structured Instructions)**
- Create an instructions file for the Agent
- Use the AI Agent to process the file directly
- Let the Agent handle the entire workflow
- *Note: This approach uses structured instructions*

**Option 3: Python Code Approach**
- Ask AI to write Python code to clean up the VTT file
- Run the code to process the transcript
- Generate a summary from the cleaned data
- *Note: This approach requires Python environment setup (covered in Part 3)*

**Your Task**: Choose one approach and try it out. We'll compare results!


### 2.2 Option 1: Direct AI Processing

**Your Task**: Try the direct AI processing approach

**Step 1**: Open the VTT file
- Navigate to `data/2SeptWeek1lecture.vtt`
- Open the file and copy its contents

**Step 2**: Use an AI tool
- Go to Poe, ChatGPT, or another AI tool
- Paste the VTT content
- Ask: "Please analyze this lecture transcript and provide a comprehensive summary with key points"

**Step 3**: Review the results
- Note the quality of the summary
- Check if important content was captured
- Take notes on the experience

**🎯 Learning Goals**:
- Experience direct AI processing of content
- Understand the copy-paste workflow
- See how AI handles raw transcript data


### 2.3 Option 2: Structured Instructions Approach

**Your Task**: Create a structured instructions file for the AI Agent to follow

**Step 1**: Create the processing folder structure
- **COPY THIS PROMPT**: `Create a new folder called 'AgentProcessVTT' and inside it create three files:
  1. Copy 'data/2SeptWeek1lecture.vtt' to 'AgentProcessVTT/lecture.vtt'
  2. Create 'AgentProcessVTT/instructions.md' with a template for VTT processing instructions
  3. Create an empty file 'AgentProcessVTT/processed_lecture.md'`

**Step 2**: Customize the instructions template
- **COPY THIS PROMPT**: `Open the file 'AgentProcessVTT/instructions.md' and customize it with these details:
  - Input file path: AgentProcessVTT/lecture.vtt
  - Step 1: Remove all timestamp markers and VTT formatting
  - Step 2: Clean up speaker identification and technical markers
  - Step 3: Generate a summary with main topics and key concepts
  - Output file: AgentProcessVTT/processed_lecture.md
  - Quality requirements: Remove timestamps, maintain paragraph structure, include clear headings`

**Step 3**: Give the instructions to the AI Agent
- **COPY THIS PROMPT**: `Read the file 'AgentProcessVTT/instructions.md' and follow all the instructions exactly. Process the VTT file according to the specifications and update the processed_lecture.md file.`

**Step 4**: Verify the results
- Ask the Agent to show you what was updated in the processed_lecture.md file
- Check the quality of the output
- Compare with Option 1 results

**🎯 Learning Goals**:
- Learn to create structured instructions for the Agent
- Understand how to specify file paths and processing steps
- See how the Agent can follow detailed instructions
- Practice creating reusable processing workflows
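
For comparison, the folder setup from Step 1 can also be done by hand in a few lines of Python. The sketch below is only an illustration of what the Agent is being asked to do, not a record of what it actually runs; the function name `set_up_processing_folder` and the template wording are assumptions made for this example.

```python
import shutil
from pathlib import Path


def set_up_processing_folder(source_vtt, folder="AgentProcessVTT"):
    """Create the folder and the three files described in Step 1."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # 1. Copy the transcript into the folder under a fixed name
    shutil.copy(source_vtt, folder / "lecture.vtt")

    # 2. Write an instructions template (this wording is illustrative only)
    template = (
        "# VTT Processing Instructions\n\n"
        "- Input file: lecture.vtt\n"
        "- Step 1: Remove all timestamp markers and VTT formatting\n"
        "- Step 2: Clean up speaker identification and technical markers\n"
        "- Step 3: Generate a summary with main topics and key concepts\n"
        "- Output file: processed_lecture.md\n"
    )
    (folder / "instructions.md").write_text(template, encoding="utf-8")

    # 3. Create the empty output file that will be filled in later
    (folder / "processed_lecture.md").touch()

    return sorted(path.name for path in folder.iterdir())


# Example call (needs the workshop data file to exist):
# set_up_processing_folder("data/2SeptWeek1lecture.vtt")
```

Writing the steps out this way shows why a clear instructions file matters: every path and file name the Agent needs is stated explicitly.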


### 2.4 Option 3: Python Pseudocode Analysis

**Your Task**: Read and understand Python pseudocode for VTT processing

**Step 1**: Read the pseudocode
- **COPY THIS PROMPT**: `Create a file called 'vtt_processing_pseudocode.py' with the following pseudocode for processing VTT files:`

```python
# VTT Processing Pseudocode
def process_vtt_file(input_file, output_file):
    """
    Process a VTT file to extract clean text content
    """
    # Step 1: Read the VTT file
    with open(input_file, 'r', encoding='utf-8') as file:
        content = file.read()
    
    # Step 2: Remove VTT header and metadata
    lines = content.split('\n')
    cleaned_lines = []
    
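    for line in lines:
        stripped = line.strip()  # compare and store the line without surrounding whitespace
        # Skip timestamp lines (format: 00:00:00.000 --> 00:00:05.000)
        if '-->' in stripped and ':' in stripped:
            continue
        # Skip empty lines and the VTT header (it may start with a BOM or carry a title after 'WEBVTT')
        elif stripped == '' or stripped.lstrip('\ufeff').startswith('WEBVTT'):
            continue
        # Keep text content
        else:
            cleaned_lines.append(stripped)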
    
    # Step 3: Join lines and clean up
    cleaned_text = ' '.join(cleaned_lines)
    cleaned_text = cleaned_text.replace('  ', ' ')  # Remove double spaces
    
    # Step 4: Save to output file
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(cleaned_text)
    
    return cleaned_text

# Usage example
if __name__ == "__main__":
    input_file = "data/2SeptWeek1lecture.vtt"
    output_file = "cleaned_lecture.txt"
    result = process_vtt_file(input_file, output_file)
    print(f"Processed {len(result)} characters")
```

**Step 2**: Answer these questions about the pseudocode
1. What does the function `process_vtt_file` do?
2. How does it identify timestamp lines?
3. What happens to empty lines and the VTT header?
4. What is the purpose of `cleaned_text.replace('  ', ' ')`?
5. How would you modify this code to also extract speaker names?

**Step 3**: Discuss with your neighbor
- What would happen if we didn't skip timestamp lines?
- How could we improve the text cleaning process?
- What other VTT processing features might be useful?
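
As one possible direction for the improvement question, here is a hedged sketch of a stricter cleaner. It matches timestamp lines with a regular expression instead of looking for `-->`, skips cue identifiers (the line directly above a timestamp), and collapses any run of whitespace. The names `TIMESTAMP` and `clean_vtt_text` are made up for this example, and NOTE or STYLE blocks are not handled.

```python
import re

# Cue timing line such as "00:00:00.000 --> 00:00:05.000" (the hours part is optional)
TIMESTAMP = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def clean_vtt_text(content):
    """Return the spoken text of a VTT transcript as one line of prose."""
    # Strip whitespace and a possible byte-order mark from every line
    lines = [line.strip().lstrip("\ufeff") for line in content.splitlines()]
    kept = []
    for index, line in enumerate(lines):
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if not line or line.startswith("WEBVTT"):
            continue  # blank line or file header
        if TIMESTAMP.match(line):
            continue  # cue timing line
        if TIMESTAMP.match(next_line):
            continue  # cue identifier sitting directly above a timing line
        kept.append(line)
    # Collapse every run of whitespace, not just double spaces
    return re.sub(r"\s+", " ", " ".join(kept))


sample = (
    "WEBVTT\n\n"
    "1\n00:00:00.000 --> 00:00:05.000\nWelcome to  the course.\n\n"
    "2\n00:00:05.000 --> 00:00:09.000\nToday we cover the basics.\n"
)
print(clean_vtt_text(sample))
```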

**🎯 Learning Goals**:
- Understand Python pseudocode structure
- Learn VTT file processing concepts
- Practice code analysis and interpretation
- Prepare for actual Python implementation
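
If you want to check your answer to the speaker-name question, the sketch below shows one way it could be approached. It assumes speakers appear either as a WebVTT voice tag (`<v Name>text</v>`) or as a `Name: text` prefix; the actual lecture file may use neither, so treat both patterns and the function name `split_speaker` as assumptions.

```python
import re

# WebVTT voice tag: <v Speaker Name>spoken text</v> (the closing tag is optional)
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>(.*?)(?:</v>)?$")
# Plain prefix: "Speaker Name: spoken text" (heuristic, can misfire on lines like "Note: ...")
NAME_PREFIX = re.compile(r"^([A-Z][\w .'-]{0,40}):\s+(.*)$")


def split_speaker(line):
    """Return (speaker, text); speaker is None when no label is recognised."""
    line = line.strip()
    match = VOICE_TAG.match(line) or NAME_PREFIX.match(line)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, line


def list_speakers(lines):
    """Collect the distinct speaker labels found in a list of caption lines."""
    speakers = {split_speaker(line)[0] for line in lines}
    speakers.discard(None)
    return sorted(speakers)
```

The prefix pattern is deliberately loose, so it is worth reviewing the detected names by eye before relying on them.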


### 2.5 Comparison Activity

**Your Task**: Compare the results from all three approaches

**Questions to Consider**:
1. Which approach was fastest?
2. Which approach gave the best results?
3. Which approach was easiest to use?
4. What are the advantages of each approach?
5. When would you use each approach?

**Discussion Points**:
- **Direct AI (Option 1)**: Quick but requires manual copy-paste
- **Structured Instructions (Option 2)**: Automated and reusable but requires planning
- **Python Code (Option 3)**: More control but requires coding knowledge
- **Direct Agent Communication**: Most flexible but requires clear communication

**🎯 Learning Goal**: Understand different AI workflow approaches
**📝 Take notes on your preferences and use cases**


## Part 3: Python Environment Setup and Advanced Techniques (45 minutes)

### 3.0 Python Environment Setup

**Your Task**: Set up Python environment for advanced file processing

**Step 1**: Ask the Agent to help with Python setup
- **COPY THIS PROMPT**: `Help me set up a Python environment for data analysis and file processing. Install the necessary packages and create a simple test script.`

**Step 2**: Test the environment
- **COPY THIS PROMPT**: `Create a simple Python script that tests if we can read files, process text, and save outputs. Include error handling and show me the results.`
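
The script the Agent writes for this step will vary, but it should look roughly like the sketch below: write a file, read it back, transform the text, save the result, and report errors instead of crashing. The function name `run_file_check` and the file names are assumptions for this example.

```python
import tempfile
from pathlib import Path


def run_file_check(folder):
    """Write a file, read it back, process the text, and save the output."""
    folder = Path(folder)
    try:
        source = folder / "check_input.txt"
        source.write_text("hello   workshop\n", encoding="utf-8")

        text = source.read_text(encoding="utf-8")
        # Simple text processing: normalise whitespace and upper-case the result
        processed = " ".join(text.split()).upper()

        (folder / "check_output.txt").write_text(processed, encoding="utf-8")
        return processed
    except OSError as error:
        # Covers a missing folder, permission problems, a full disk, and similar
        print(f"File check failed: {error}")
        return None


if __name__ == "__main__":
    # A temporary folder keeps the test from leaving files behind
    with tempfile.TemporaryDirectory() as tmp:
        print(run_file_check(tmp))
```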

**Step 3**: Verify packages
- **COPY THIS PROMPT**: `Check what Python packages are available and suggest which ones we might need for PDF processing, text analysis, and file manipulation.`
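
You can also check package availability yourself. This minimal sketch uses the standard library's `importlib.util.find_spec` to see whether a package can be imported; the package names in the example call are only suggestions, not a required list.

```python
import importlib.util


def check_packages(names):
    """Map each top-level import name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Example import names only; replace them with the packages you plan to use
    for name, found in check_packages(["pypdf", "PyPDF2", "pandas"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```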

**🎯 Learning Goals**:
- Learn to set up Python environment with AI assistance
- Understand package management
- Practice basic Python file operations
- Prepare for advanced processing tasks

### 3.1 Working with PDF Files

**Your Task**: Learn to process PDF files using structured instructions

**Step 1**: Create the PDF processing folder structure
- **COPY THIS PROMPT**: `Create a new folder called 'AgentProcessPDF' and inside it create three files:
  1. Copy 'data/reviewArticle.pdf' to 'AgentProcessPDF/article.pdf'
  2. Create 'AgentProcessPDF/instructions.md' with a template for PDF processing instructions
  3. Create an empty file 'AgentProcessPDF/converted_article.md'`

**Step 2**: Customize the PDF processing instructions
- **COPY THIS PROMPT**: `Open the file 'AgentProcessPDF/instructions.md' and customize it with these details:
  - Input file path: AgentProcessPDF/article.pdf
  - Step 1: Extract text from all pages preserving structure
  - Step 2: Clean up formatting artifacts and fix line breaks
  - Step 3: Convert to markdown format with proper headings
  - Output file: AgentProcessPDF/converted_article.md
  - Quality requirements: Proper markdown formatting, clean structure, readable content`

**Step 3**: Give the instructions to the AI Agent
- **COPY THIS PROMPT**: `Read the file 'AgentProcessPDF/instructions.md' and follow all the instructions exactly. Process the PDF file according to the specifications and update the converted_article.md file.`

**Step 4**: Complete the Python code for PDF processing (Optional)
- **COPY THIS PROMPT**: `Create a Python script called 'pdf_processor.py' with the following structure. Complete the code based on the comments:`

```python
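# Note: PyPDF2 is no longer actively developed; its maintained successor is the
# 'pypdf' package, which provides the same PdfReader class.
import PyPDF2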
import re

def process_pdf(input_file, output_file):
    """
    Process a PDF file and convert to markdown format
    """
    # TODO: Open the PDF file using PyPDF2
    # Hint: Use PyPDF2.PdfReader()
    
    # TODO: Extract text from all pages
    # Hint: Loop through pages and extract text
    
    # TODO: Clean up the text
    # Hint: Remove extra whitespace, fix line breaks
    
    # TODO: Convert to markdown format
    # Hint: Add proper headings and formatting
    
    # TODO: Save to output file
    # Hint: Write to file with proper encoding
    
    return cleaned_text

# TODO: Add error handling
# TODO: Add main execution block
# TODO: Test with the PDF file
```

**Step 5**: Test and improve the code (Optional)
- **COPY THIS PROMPT**: `Run the PDF processor script and fix any errors. Show me the results and suggest improvements.`

**🎯 Learning Goals**:
- Learn to work with PDF files using Python
- Practice completing code based on comments
- Understand file processing workflows
- See how AI can help with code completion
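
The "clean up the text" TODO in the skeleton is often the trickiest part, because text extracted from a PDF usually has a line break at the end of every printed line. Below is a hedged sketch of that one step using only the standard library. The function name `clean_extracted_text` is an assumption, and the hyphen rule will also merge genuine compounds that happen to break across lines.

```python
import re


def clean_extracted_text(text):
    """Unwrap hard line breaks while keeping paragraph breaks."""
    # Rejoin words hyphenated across a line break: "treat-\nment" -> "treatment"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines mark paragraph boundaries, so split on them first
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse newlines and repeated spaces into one space
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)


raw = "Herbal treat-\nment was\nstudied.\n\n\nResults  follow."
print(clean_extracted_text(raw))
```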


### 3.2 Advanced Multi-Source Processing

**Your Task**: Practice advanced processing with multiple sources

**Step 1**: Create the advanced processing folder structure
- **COPY THIS PROMPT**: `Create a new folder called 'AgentProcessAdvanced' and inside it create five files:
  1. Copy 'data/2SeptWeek1lecture.vtt' to 'AgentProcessAdvanced/lecture.vtt'
  2. Copy 'data/reviewArticle.pdf' to 'AgentProcessAdvanced/article.pdf'
  3. Create 'AgentProcessAdvanced/instructions.md' with advanced processing instructions
  4. Create an empty file 'AgentProcessAdvanced/comprehensive_analysis.md'
  5. Create an empty file 'AgentProcessAdvanced/research_insights.md'`

**Step 2**: Customize the advanced processing instructions
- **COPY THIS PROMPT**: `Open the file 'AgentProcessAdvanced/instructions.md' and customize it with these details:
  - Input files: AgentProcessAdvanced/lecture.vtt and AgentProcessAdvanced/article.pdf
  - Step 1: Process both VTT and PDF files
  - Step 2: Perform cross-source analysis and identify connections
  - Step 3: Generate comprehensive analysis and research insights
  - Output files: AgentProcessAdvanced/comprehensive_analysis.md and AgentProcessAdvanced/research_insights.md
  - Quality requirements: Integrated analysis, actionable insights, professional formatting`

**Step 3**: Give the instructions to the AI Agent
- **COPY THIS PROMPT**: `Read the file 'AgentProcessAdvanced/instructions.md' and follow all the instructions exactly. Process both files according to the specifications and update both output files.`
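
To make "cross-source analysis" less abstract, the sketch below shows one very simple form of it: finding the keywords that two texts have in common. The Agent is likely to do something richer; the function names, the stopword list, and the minimum word length are all assumptions for this example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer
STOPWORDS = {"that", "this", "with", "from", "have", "were", "which", "their"}


def keyword_counts(text, min_length=4):
    """Count lower-cased words, ignoring short words and stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length and w not in STOPWORDS)


def shared_keywords(text_a, text_b, top_n=10):
    """Return keywords present in both texts, most frequent first."""
    # Counter intersection keeps each shared word with its smaller count
    shared = keyword_counts(text_a) & keyword_counts(text_b)
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]
```

In the workshop setting, `text_a` could be the cleaned lecture transcript and `text_b` the converted article.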

**Step 4**: Complete the text analysis code (Optional)
- **COPY THIS PROMPT**: `Create a Python script called 'text_analyzer.py' with the following structure. Complete the code based on the comments:`

```python
import re
from collections import Counter

def extract_citations(text):
    """
    Extract citations and references from text
    """
    # TODO: Find citation patterns
    # Hint: Look for patterns like (Author, Year) or [1], [2], etc.
    
    # TODO: Extract author names, years, titles
    # Hint: Use regex patterns to match different citation formats
    
    # TODO: Format as bibliography
    # Hint: Create structured output
    
    return citations

def generate_summary(text):
    """
    Generate a comprehensive summary of the text
    """
    # TODO: Extract key sentences
    # Hint: Look for sentences with important keywords
    
    # TODO: Identify main topics
    # Hint: Use word frequency analysis
    
    # TODO: Structure the summary
    # Hint: Organize by sections (introduction, methodology, conclusions)
    
    return summary

def create_research_questions(text):
    """
    Generate research questions based on the text
    """
    # TODO: Identify gaps or limitations mentioned
    # Hint: Look for phrases like "future research", "limitations", etc.
    
    # TODO: Generate questions
    # Hint: Create questions that extend the research
    
    # TODO: Format questions clearly
    # Hint: Number and structure the questions
    
    return questions

# TODO: Add main execution block
# TODO: Test with the converted article
```

**Step 5**: Test and refine the code (Optional)
- **COPY THIS PROMPT**: `Run the text analyzer script and show me the results. Fix any errors and improve the output quality.`

**Step 6**: Compare approaches
- **COPY THIS PROMPT**: `Compare the structured instructions approach with the Python code approach. What are the advantages of each method?`
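
For the citation hint in `text_analyzer.py`, here is a minimal sketch of the pattern-matching step. It covers two common styles, `(Author, Year)` and numeric `[1]`, and nothing else. The pattern names are assumptions, and the sample sentence uses invented placeholder citations rather than real references.

```python
import re

# (Author, 2020), (Author et al., 2020), (Author & Other, 2020a)
AUTHOR_YEAR = re.compile(r"\(([A-Z][^()\d]*?),?\s+(\d{4})[a-z]?\)")
# [1], [2, 3], [4-6]
NUMERIC = re.compile(r"\[(\d+(?:\s*[,-]\s*\d+)*)\]")


def extract_citations(text):
    """Return author-year pairs and numeric citation markers found in text."""
    author_year = [(author.strip(), int(year)) for author, year in AUTHOR_YEAR.findall(text)]
    numeric = NUMERIC.findall(text)
    return {"author_year": author_year, "numeric": numeric}


# Placeholder sentence, invented for this example
sample = "Earlier work (Alpha, 2019) and (Beta et al., 2021) agree [1], [2-4]."
print(extract_citations(sample))
```

Formatting the matches as a bibliography still needs the reference list itself, since inline citations do not contain titles.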

**🎯 Learning Goals**:
- Learn to process multiple file types with structured instructions
- Practice advanced multi-source analysis
- Understand the difference between structured and code-based approaches
- See how AI can handle complex processing workflows
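
The first hint in `create_research_questions` (look for phrases like "future research" or "limitations") can be sketched as a simple sentence filter. This only surfaces candidate sentences; turning them into good research questions is still up to you or the Agent. The phrase list and function names are assumptions.

```python
import re

# Phrases that often signal a gap; extend this list for your own material
GAP_PHRASES = ("future research", "further research", "limitation", "remains unclear")


def find_gap_sentences(text):
    """Return sentences that mention a limitation or a need for more research."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if any(p in s.lower() for p in GAP_PHRASES)]


def number_items(items):
    """Format a list of strings as a numbered list."""
    return [f"{index}. {item}" for index, item in enumerate(items, start=1)]
```

Abbreviations such as "et al." will confuse the naive sentence split, so check the output against the source text.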


## Part 4: Wrap-up and Next Steps (15 minutes)

### 4.1 Reflection and Discussion

**Your Task**: Reflect on today's workshop and share your thoughts:

1. **Which approach worked best for you?** Direct AI, structured instructions for the Agent, or Python code?
2. **What was most useful?** What features of the Agent impressed you most?
3. **What challenges did you face?** What was difficult or confusing?
4. **How will you use this?** How do you plan to use Agent AI in your studies?
5. **What questions do you have?** What would you like to learn more about?

**Your Notes**:

---

---

---

---


### 4.2 Resources and Follow-up

**Resources for Continued Learning**:
- [GitHub Education Benefits](https://docs.github.com/en/education/about-github-education/github-education-for-students/apply-to-github-education-as-a-student)
- [GitHub Copilot Agent Documentation](https://docs.github.com/en/copilot/github-copilot-agent)
- [Python for Research](https://www.python.org/about/gettingstarted/)
- [Jupyter Notebook Tutorial](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)

**Next Steps**:
- [ ] Complete your GitHub Education application
- [ ] Practice using Agent AI with your own research projects
- [ ] Try all three approaches with your own files
- [ ] Join our weekly office hours for Agent-related questions
- [ ] Attend the advanced workshop for interested students

**Contact Information**:
- **Instructor**: [Your Name]
- **Email**: [Your Email]
- **Office Hours**: [Schedule]

**Thank you for participating in today's workshop!** 🎉
