<a href="https://colab.research.google.com/github/Yuweien/Python-Workshop/blob/main/Python_Workshop_for_Beginners_2_25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Programming: Reading, Understanding, and Modifying Code (Part Two)

## The problem (live version)

**What I have**
- D2L quiz / survey results (CSV)
- Student info
- Questions
- Answers (MC + short answer)
- Scores (for MC)

**What is missing**
- ❌ Group information

**Why this is a problem**
- Students worked in different groups
- Groups were assigned different topics
- I want to analyze results *by group*

→ The downloaded data is **almost** usable, but not yet.

---

### This problem shows up elsewhere too

The same issue happens when metadata is stored in **separate tables**, for example:
- Responses in one file, group membership in another
- Survey answers separate from demographic information
- Text data stored separately from labels or categories

**Core challenge**
- How do we **combine related data** so we can analyze it meaningfully?


## Hook: A problem I ran into recently (narrative version)

This is a problem I ran into recently, and I’m curious whether any of you have experienced something similar.

I used D2L to run a quiz/survey and downloaded the results as a CSV file.  
The file included student information, questions, responses, and even scores.

However, in this course, students were working in different groups, and each group was assigned a different topic.

When I opened the CSV file, I realized that the **group information was missing**.

This immediately raised some practical questions:
- How can I tell which responses came from which group?
- How can I summarize or compare results *by group*?
- How can I analyze the data in a way that actually reflects how the course was organized?

This kind of situation is surprisingly common:  
the data you download is **almost** what you need, but one key piece is missing.

### A broader version of the same problem

This issue is not specific to D2L.

The same thing can happen whenever your metadata is stored in **separate tables**.  
For example:
- Student responses are in one file, and group membership is in another.
- Survey answers are stored separately from demographic information.
- Text data is in one table, and labels or categories are stored elsewhere.

In all of these cases, the core challenge is the same:
how do we **combine related pieces of information** so that we can actually analyze the data?

In today’s workshop, we’ll walk through a Python workflow that addresses exactly this kind of problem.


## Current workflow roadmap

**✅ Raw D2L CSV**  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
**🔵 Inspect & clean**  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
**⬜ Add group info (merge)**  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
**⬜ Split by question type**  
&nbsp;&nbsp;&nbsp;&nbsp;├── **⬜ MC** → group stats → bar chart  
&nbsp;&nbsp;&nbsp;&nbsp;└── **⬜ Short answer** → word freq → word cloud



# Step 0. Files and setup

### You are here
- 🔵 **Raw D2L CSV**
- ⬜ Inspect & clean
- ⬜ Add group info (merge)
- ⬜ Split by question type

In this step, we:
- Load the raw D2L CSV file
- Take a first look at its rows, columns, and data types

👉 Goal: *Understand what we are working with before touching the data.*
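
As a rough sketch of this step, the cell below loads a small CSV and prints a first overview. It assumes pandas as the data library (the outline does not name one), and the column names (`Username`, `Q Type`, `Q Text`, `Answer`, `Score`) and sample rows are hypothetical stand-ins, not the real D2L export.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw D2L export. With a real file you would
# pass its path to pd.read_csv instead of this in-memory text.
RAW_CSV = """Username,Q Type,Q Text,Answer,Score
alice,MC,Which loop repeats a fixed number of times?,for,1
alice,SA,What was hardest this week?,Reading error messages,
bob,MC,Which loop repeats a fixed number of times?,while,0
bob,SA,What was hardest this week?,Merging two tables,
"""

raw = pd.read_csv(io.StringIO(RAW_CSV))

# First look: size, column names, data types, and the first few rows.
print(raw.shape)
print(raw.columns.tolist())
print(raw.dtypes)
print(raw.head())
```

Nothing is modified here. The short-answer rows have an empty `Score`, which pandas reads as a missing value, so it is worth noting before any statistics are computed.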



In [1]:
# code goes here


# Step 1. Inspect & clean data

### Workflow status
- ✅ Raw D2L CSV
- 🔵 **Inspect & clean**
- ⬜ Add group info (merge)
- ⬜ Split by question type

In this step, we:
- Inspect columns and basic structure
- Clean obvious issues (extra spaces, column names, unnecessary columns)

👉 Goal: *Make the data reliable for later steps.*
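
One possible way to do the cleaning listed above, again assuming pandas. The messy sample frame, the `clean` helper, and the choice of `Difficulty` as the unnecessary column are all hypothetical and only illustrate the three kinds of fixes.

```python
import pandas as pd

# Hypothetical messy export: padded headers, padded values, one unneeded column.
raw = pd.DataFrame({
    " Username ": [" alice", "bob ", "carol"],
    "Q Type": ["MC", "MC ", " SA"],
    "Answer": ["for", " while", "Merging tables "],
    "Score": [1, 0, None],
    "Difficulty": [1, 1, 1],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a tidied copy; the input frame is left untouched."""
    out = df.copy()
    # Column names: strip padding, lower-case, replace spaces with underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    # Strip stray spaces from the text columns (named explicitly here).
    for col in ["username", "q_type", "answer"]:
        out[col] = out[col].str.strip()
    # Drop a column we do not need; errors="ignore" skips it if absent.
    return out.drop(columns=["difficulty"], errors="ignore")


cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw data available for comparison, which helps when a later step gives surprising results.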



In [None]:
# code goes here


# Step 2. Add group information (merge)

### Workflow status
- ✅ Raw D2L CSV
- ✅ Inspect & clean
- 🔵 **Add group info (merge)**
- ⬜ Split by question type

In this step, we:
- Load a separate group roster
- Merge group information into the main dataset

👉 Goal: *Add meaningful context (groups) to the data.*
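
The merge could look something like the sketch below, assuming pandas and a roster keyed by a shared `username` column. Both tables and their column names are hypothetical. A left join keeps every response even if a student is missing from the roster, and the `indicator` flag makes those gaps visible.

```python
import pandas as pd

# Hypothetical cleaned responses and a separate group roster.
responses = pd.DataFrame({
    "username": ["alice", "bob", "carol", "dave"],
    "score": [1, 0, 1, 1],
})
roster = pd.DataFrame({
    "username": ["alice", "bob", "carol"],
    "group": ["A", "A", "B"],
})

# Left join: keep all responses and attach the group where a match exists.
# validate="many_to_one" raises an error if a student appears twice in the roster.
merged = responses.merge(
    roster, on="username", how="left", validate="many_to_one", indicator=True
)

# Students with responses but no roster entry end up with a missing group.
unmatched = merged.loc[merged["_merge"] == "left_only", "username"].tolist()
print("No group found for:", unmatched)

merged = merged.drop(columns="_merge")
print(merged)
```

Checking the unmatched list before moving on is a cheap way to catch typos in usernames or an out-of-date roster.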



In [None]:
# code goes here


# Step 3. Split by question type

### Workflow status
- ✅ Raw D2L CSV
- ✅ Inspect & clean
- ✅ Add group info (merge)
- 🔵 **Split by question type**

In this step, we:
- Separate multiple-choice questions from short-answer questions
- Prepare different analysis paths for different data types

👉 Goal: *Different data types need different analysis strategies.*
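
A minimal sketch of the split, assuming pandas and a `q_type` column whose values are `"MC"` and `"SA"`. These codes and the sample rows are hypothetical; a real export may label question types differently.

```python
import pandas as pd

# Hypothetical merged data with one row per student per question.
data = pd.DataFrame({
    "username": ["alice", "alice", "bob", "bob"],
    "group": ["A", "A", "B", "B"],
    "q_type": ["MC", "SA", "MC", "SA"],
    "answer": ["for", "Reading error messages", "while", "Merging two tables"],
    "score": [1, None, 0, None],
})

# Check for question types this sketch does not handle.
unexpected = set(data["q_type"]) - {"MC", "SA"}
print("Unexpected question types:", unexpected)

# Boolean masks select the rows for each analysis path.
mc = data[data["q_type"] == "MC"].copy()
short_answer = data[data["q_type"] == "SA"].copy()

# Short-answer rows carry no score here, so drop the empty column.
short_answer = short_answer.drop(columns="score")

print(mc)
print(short_answer)
```

Taking `.copy()` of each subset avoids pandas warnings later when new columns are added to one branch only.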



In [None]:
# code goes here


# Step 4A. Multiple-choice questions: group stats & visualization

### Workflow status
- ✅ Raw D2L CSV
- ✅ Inspect & clean
- ✅ Add group info (merge)
- ✅ Split by question type
  - 🔵 **MC → group stats → bar chart**
  - ⬜ Short answer → word freq → word cloud

In this step, we:
- Calculate simple statistics by group
- Create a basic bar chart to compare groups

👉 Goal: *Use simple statistics to answer a teaching or research question.*
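
The two bullets above might be sketched as follows, assuming pandas for the statistics and matplotlib for the chart (neither is named in the outline). The sample scores, the 0/1 scoring, and the `plot_group_means` helper are hypothetical.

```python
import pandas as pd

# Hypothetical MC rows: one row per answered question, scored 0 or 1.
mc = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "score": [1, 0, 1, 1, 0],
})

# Per-group count and mean score, using named aggregation.
group_stats = mc.groupby("group")["score"].agg(n="count", mean="mean")
print(group_stats)


def plot_group_means(stats: pd.DataFrame):
    """Draw a bar chart of mean score per group and return the axes."""
    # Imported here so the statistics above do not depend on matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(stats.index), stats["mean"])
    ax.set_xlabel("Group")
    ax.set_ylabel("Mean MC score")
    ax.set_ylim(0, 1)  # scores are 0/1, so means fall in this range
    return ax
```

Calling `plot_group_means(group_stats)` in a notebook cell draws the chart. Reporting `n` next to each mean helps avoid over-reading differences between very small groups.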



In [None]:
# code goes here


# Step 4B. Short-answer questions: text exploration (optional)

### Workflow status
- ✅ Raw D2L CSV
- ✅ Inspect & clean
- ✅ Add group info (merge)
- ✅ Split by question type
  - ⬜ MC → group stats → bar chart
  - 🔵 **Short answer → word freq → word cloud**

In this step, we:
- Explore common words or phrases in open-ended responses
- Visualize themes using word frequency or a word cloud

👉 Goal: *Get a quick, exploratory sense of what students are saying.*

*(Optional: skip if time is limited.)*
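
For the word-frequency part, a small sketch using only the Python standard library is shown below. The sample responses, the tiny stopword set, and the `word_counts` helper are hypothetical; a real analysis would likely use a fuller stopword list. The resulting counts could then be passed to a word-cloud tool, which is not shown here.

```python
import re
from collections import Counter

# Hypothetical short-answer responses.
responses = [
    "Merging two tables was confusing",
    "The merging step and the error messages",
    "Error messages were hard to read",
]

# A deliberately tiny stopword set, for illustration only.
STOPWORDS = {"the", "and", "was", "were", "to", "two"}


def word_counts(texts, stopwords=STOPWORDS):
    """Count lower-cased words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes; punctuation is discarded.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


counts = word_counts(responses)
print(counts.most_common(5))
```

Lower-casing before counting means "Merging" and "merging" are treated as the same word, which is usually what you want for a quick look at themes.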



# Wrap-up: Adapting this workflow to your own data

We’ve walked through a complete workflow:
- From raw LMS data
- To cleaned, combined, and analyzed results

Think about:
- Which steps are essential for your own project?
- Where might you stop, simplify, or extend the workflow?
- How could AI tools help you modify this code safely?

👉 The goal is not to memorize code,  
but to **read, understand, and adapt workflows**.
