# Week 6 Lab: Mid-term Project Development Template

<a href="https://colab.research.google.com/github/bradleyboehmke/uc-bana-4080/blob/main/labs/06_midterm_project_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Welcome to your **Mid-term Project Development Lab**! Today's session is entirely dedicated to helping you and your team organize, plan, and begin your mid-term project analysis for **Regork Grocery Chain**.

This notebook serves as both a **planning template** and a **starting framework** for your final analysis. Use it to brainstorm ideas, explore the dataset, and organize your approach.

## 🎯 Learning Objectives
By the end of this lab, you will be able to:
- Form or finalize your project team (2-4 students)
- Explore the Complete Journey dataset and understand its structure
- Develop and refine a focused business question for your analysis
- Plan your analytical approach and identify key datasets to join

## 📚 This Lab Reinforces
- **Mid-term Project Requirements** (Canvas assignment description)
- **Data joining and preparation** (Modules 3-4)
- **Exploratory data analysis and visualization** (Module 5)
- **Programming fundamentals** (Modules 1-6)

## 🕐 Estimated Time & Structure
**Total Time:** 75 minutes  
**Mode:** Group collaboration

- **[0–10 min]** Team formation and introductions
- **[10–30 min]** Dataset exploration and familiarization
- **[30–50 min]** Business question development and refinement
- **[50–70 min]** Planning analytical approach and next steps
- **[70–75 min]** Team coordination and logistics

You are **required** to work in teams of **2–4 students** for this project.

## 💡 Why This Matters
This project simulates a **real-world analytics engagement** where you're asked to identify growth opportunities for a retail business. In the professional world, data scientists regularly work with stakeholders to:

- Translate broad business goals into specific analytical questions
- Explore unfamiliar datasets to understand available information
- Design analytical approaches that provide actionable insights
- Communicate findings to business leaders in accessible language

Your analysis will help **Regork's CEO** make informed decisions about where to invest company resources for growth.

## Setup and Data Loading

First, let's import the necessary libraries and load the Complete Journey dataset to understand what data is available for your analysis.

```python
# Required imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from completejourney_py import get_data

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

print("✅ Libraries loaded successfully!")
```

```python
# Load the Complete Journey dataset
# This may take a moment to download on first use
print("Loading Complete Journey dataset...")
data = get_data()

# See what datasets are available
print("\n📊 Available datasets:")
for name, df in data.items():
    print(f"  - {name}: {df.shape[0]:,} rows × {df.shape[1]} columns")

print("\n✅ Dataset loaded successfully!")
```

## Part 1 — Team Formation & Dataset Exploration (20 minutes)

### Team Formation Checklist

**✅ Complete these steps with your team:**

1. **Introduce yourselves** - Share names, backgrounds, and any relevant experience
2. **Exchange contact information** - Decide on your primary communication method
3. **Assign roles** (optional but helpful):
   - **Project coordinator** - Manages timeline and submissions
   - **Data lead** - Focuses on data preparation and joining
   - **Analysis lead** - Focuses on EDA and insights
   - **Presentation lead** - Focuses on communication and visuals
4. **Join your Canvas group** - Make sure everyone is in the same pre-defined group

### 🧠 Your Turn — Dataset Exploration

**Tasks:**
- Examine the structure and content of each available dataset
- Identify which datasets might be useful for different types of business questions
- Look for key variables that could be used to join datasets together

💡 **Hint:** The `transactions` dataset is central to most analyses, but you'll need to join it with other datasets to get richer insights.
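To make the hint concrete, here is a minimal sketch on made-up data rather than the real dataset. The column names (`product_id`, `sales_value`, `department`) are assumptions modeled on what you are likely to find; check `transactions.columns` and the columns of the other table before adapting it.

```python
import pandas as pd

# Toy stand-ins for data['transactions'] and a product lookup table.
# Column names are assumptions; confirm them against the real tables.
toy_transactions = pd.DataFrame({
    "household_id": [1, 1, 2, 3],
    "product_id": [10, 11, 10, 12],
    "sales_value": [2.50, 4.00, 2.50, 7.25],
})
toy_products = pd.DataFrame({
    "product_id": [10, 11, 12],
    "department": ["GROCERY", "PRODUCE", "MEAT"],
})

# A left join keeps every transaction and attaches product attributes to it.
enriched = toy_transactions.merge(toy_products, on="product_id", how="left")

# With product attributes attached, sales can be summarized by department.
sales_by_department = (
    enriched.groupby("department")["sales_value"]
    .sum()
    .sort_values(ascending=False)
)
print(sales_by_department)
```

On its own, the transactions table can only say how much was sold; the join is what lets you ask what was sold and to whom.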

```python
# Explore the transactions dataset (usually the core dataset)
transactions = data['transactions']
print("🛒 TRANSACTIONS Dataset")
print(f"Shape: {transactions.shape}")
print(f"\nColumns: {list(transactions.columns)}")
print("\nFirst few rows:")
print(transactions.head())
```

```python
# Explore other key datasets
# Add code here to examine products, demographics, campaigns, etc.
# Example:

# products = data['products']
# print("🥫 PRODUCTS Dataset")
# print(f"Shape: {products.shape}")
# print(f"Columns: {list(products.columns)}")
# print(products.head())

# Your exploration code here:
```


### ✅ Team Discussion: Dataset Understanding

**Discuss with your team:**
- What information is available in each dataset?
- Which datasets would you need to join to answer different types of business questions?
- What are the key variables for joining datasets (customer IDs, product codes, etc.)?
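One possible way to start the join-key discussion is to list the column names that pairs of tables have in common. The helper below, `shared_columns`, is a hypothetical name introduced for illustration, and the toy tables use assumed column names, not the confirmed schema.

```python
from itertools import combinations

import pandas as pd


def shared_columns(datasets):
    """Return one row per pair of tables that share at least one column name."""
    rows = []
    for (left, left_df), (right, right_df) in combinations(datasets.items(), 2):
        common = sorted(set(left_df.columns) & set(right_df.columns))
        if common:
            rows.append({"left": left, "right": right, "shared": ", ".join(common)})
    return pd.DataFrame(rows, columns=["left", "right", "shared"])


# Toy tables with assumed column names, standing in for the real datasets.
toy = {
    "transactions": pd.DataFrame(columns=["household_id", "product_id", "sales_value"]),
    "products": pd.DataFrame(columns=["product_id", "department"]),
    "demographics": pd.DataFrame(columns=["household_id", "income"]),
}

key_candidates = shared_columns(toy)
print(key_candidates)
```

With the real data, `shared_columns(data)` should work as long as `data` behaves like a dictionary of DataFrames, which the loading cell suggests. A shared column name is only a candidate key: confirm that the values mean the same thing in both tables before joining on it.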

**Document your observations:**

**Team Notes - Dataset Observations:**

*Use this space to document your team's observations about the available data*

- Key datasets for our analysis: 
- Important variables for joining: 
- Data quality observations: 
- Potential limitations: 

## Part 2 — Business Question Development (20 minutes)

Now that you understand the available data, it's time to develop your **focused business question**. Remember: you're helping Regork's CEO identify a **growth opportunity**.

### Business Question Categories

Here are the main categories with example questions to inspire your thinking:

**🎯 Customer Segments & Demographics:**
- Are certain demographic groups underrepresented in specific product categories?
- Which customer segments have the highest growth potential?
- Are there untapped demographics for premium products?

**📈 Trends Over Time:**
- Do purchasing patterns shift seasonally or around holidays?
- Which products show the strongest growth trends?
- Are there declining categories that need intervention?

**🛒 Product Relationships:**
- Which products are frequently purchased together?
- Are there cross-selling opportunities being missed?
- Do certain product combinations drive higher basket values?

**💰 Promotions & Marketing:**
- Which promotions drive the most revenue uplift?
- Are certain campaigns more successful for specific customer groups?
- How can promotional strategies be optimized?
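As one illustration of how a question from these categories could turn into code, the sketch below counts which products appear together in the same basket. It runs on toy data, and `basket_id` and `product_id` are assumed column names; this is a simple pair count, not a full market-basket analysis.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy transactions: one row per product in a basket (assumed column names).
toy_transactions = pd.DataFrame({
    "basket_id": [1, 1, 1, 2, 2, 3, 3],
    "product_id": ["bread", "butter", "jam", "bread", "butter", "bread", "jam"],
})

# Count every unordered pair of distinct products within each basket.
pair_counts = Counter()
for _, items in toy_transactions.groupby("basket_id")["product_id"]:
    unique_items = sorted(set(items))
    pair_counts.update(combinations(unique_items, 2))

top_pairs = pair_counts.most_common(3)
print(top_pairs)
```

The number of pairs grows quickly with basket size, so on real data it is usually worth narrowing to a product category or a subset of baskets first.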

### 🧪 Team Exercise — Question Brainstorming

**Brainstorming Process:**
1. Each team member suggests 2-3 potential business questions
2. Discuss the feasibility of each question with available data
3. Consider which questions would provide the most actionable insights for a CEO
4. Narrow down to your top 2-3 questions for further exploration

**Team Brainstorming Notes:**

*Document your team's brainstorming session here*

**Potential Business Questions:**

1. 
2. 
3. 
4. 
5. 

**Top 2-3 Questions for Further Exploration:**

1. 
2. 
3. 

**Rationale for Selection:**
- Why these questions matter to the CEO:
- How available data supports these questions:
- Expected actionability of insights:

## Part 3 — Quick Data Exploration for Question Validation (20 minutes)

Before finalizing your business question, let's do some quick exploration to validate that your chosen questions are feasible with the available data.

### 🔍 Validation Exercise

For each of your top 2-3 questions, perform basic data exploration to confirm:
- The necessary data exists
- You can successfully join relevant datasets
- There's sufficient data volume for meaningful analysis
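The three checks above can be sketched as follows. This is a minimal example on toy data with assumed column names (`household_id`, `income`), not the real schema; swap in the tables and keys your question needs.

```python
import pandas as pd

# Toy stand-ins for a transactions table and a household lookup table.
toy_transactions = pd.DataFrame({
    "household_id": [1, 1, 2, 3, 4],
    "sales_value": [2.5, 4.0, 2.5, 7.25, 1.0],
})
toy_demographics = pd.DataFrame({
    "household_id": [1, 2],
    "income": ["50-74K", "Under 15K"],
})

# 1. Does the necessary data exist?
required = {"household_id", "income"}
missing_cols = required - set(toy_demographics.columns)

# 2. Does the join work, and how many transactions find a match?
joined = toy_transactions.merge(
    toy_demographics,
    on="household_id",
    how="left",
    validate="many_to_one",  # raises MergeError if the lookup has duplicate keys
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
match_rate = (joined["_merge"] == "both").mean()

# 3. Is there enough data in each group to say anything meaningful?
rows_per_group = joined.loc[joined["_merge"] == "both", "income"].value_counts()

print(f"Missing columns: {missing_cols or 'none'}")
print(f"Match rate: {match_rate:.0%}")
print(rows_per_group)
```

A low match rate is not necessarily a blocker, but it changes what you can claim: findings then describe only the matched households, which belongs in your limitations notes.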

```python
# Validation exploration for Question 1
# Example: If exploring customer segments and product categories

# Your exploration code here:
# - Check data availability
# - Test key joins
# - Assess data volume and quality
```


```python
# Validation exploration for Question 2

# Your exploration code here:
```


```python
# Validation exploration for Question 3

# Your exploration code here:
```


### ✅ Question Selection & Refinement

Based on your validation exploration, refine and finalize your business question.

**FINAL BUSINESS QUESTION:**

*Write your team's final, refined business question here*


**WHY THIS QUESTION MATTERS TO REGORK'S CEO:**


**DATA VALIDATION RESULTS:**
- Required datasets: 
- Key joins needed: 
- Data availability confirmed: ✅/❌
- Sufficient data volume: ✅/❌

## Part 4 — Analytical Approach Planning (15 minutes)

Now plan your analytical approach. A strong analysis should:
- Tell a clear narrative that ties findings back to the business question
- Use appropriate visuals that enhance the story
- End with specific, actionable recommendations for the CEO

### Planning Template

**ANALYTICAL APPROACH PLAN:**

**1. Data Preparation Steps:**
- Datasets to load:
- Key joins to perform:
- Data cleaning needed:
- New variables to create:

**2. Exploratory Data Analysis Plan:**
- Key metrics to calculate:
- Visualizations needed:
- Patterns to explore:
- Comparisons to make:

**3. Deeper Analysis Questions:**
- Sub-questions to explore:
- Segments to analyze separately:
- Time periods to compare:
- Additional angles to investigate:

**4. Expected Insights & Recommendations:**
- What insights do you expect to find?
- What actionable recommendations might you make?
- How will this help the CEO make decisions?

**5. Success Criteria:**
- How will you know if your analysis is successful?
- What would make a compelling recommendation?
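To show how the "new variables" and "key metrics" items in the plan might look once translated into code, here is a small sketch on toy data. The column names (`basket_id`, `sales_value`, `transaction_timestamp`) and the chosen metric (average basket value per month) are assumptions for illustration only; your own plan may call for different ones.

```python
import pandas as pd

# Toy transactions with assumed column names.
toy = pd.DataFrame({
    "household_id": [1, 1, 2, 2, 3],
    "basket_id": [100, 101, 102, 102, 103],
    "sales_value": [12.0, 8.0, 5.0, 15.0, 30.0],
    "transaction_timestamp": pd.to_datetime([
        "2017-01-05", "2017-02-10", "2017-01-20", "2017-01-20", "2017-02-14",
    ]),
})

# New variable: calendar month of each transaction.
toy["month"] = toy["transaction_timestamp"].dt.to_period("M")

# Key metric: total each basket first, then average the basket totals by month.
basket_totals = toy.groupby(["month", "basket_id"], as_index=False)["sales_value"].sum()
avg_basket_by_month = basket_totals.groupby("month")["sales_value"].mean()
print(avg_basket_by_month)
```

Totaling each basket before averaging matters here: averaging `sales_value` directly would give the average line item, not the average basket.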

## Part 5 — Project Organization & Next Steps (10 minutes)

### Team Coordination

**📅 Timeline & Task Assignment:**
- Project due: End of Week 7 (11 days from today)
- Intermediate milestones?
- Who will work on what components?
- When will you meet again?

**🔧 Technical Setup:**
- How will you share code? (GitHub, Google Colab, shared drive?)
- Who will create the initial notebook structure?
- How will you handle version control?

**📊 Deliverable Planning:**
- Who will focus on the written report?
- Who will create the presentation?
- How will you ensure consistency between deliverables?

**TEAM COORDINATION PLAN:**

**Team Members & Contact Info:**
- 
- 
- 
- 

**Task Assignments:**
- Data preparation & joining: 
- Exploratory data analysis: 
- Visualization creation: 
- Report writing: 
- Presentation creation: 
- Final review & submission: 

**Timeline & Milestones:**
- Next team meeting: 
- Data prep completion target: 
- Analysis completion target: 
- Draft report completion: 
- Final submission preparation: 

**Technical Setup:**
- Code sharing method: 
- Primary notebook location: 
- Communication platform: 

## 🎯 Immediate Next Steps

**Before you leave today:**
1. ✅ Finalize your team membership in Canvas
2. ✅ Exchange contact information 
3. ✅ Agree on your business question
4. ✅ Set up your shared workspace
5. ✅ Schedule your next team meeting

**This week (before next Tuesday):**
- Begin data preparation and joining
- Start exploratory data analysis
- Refine your analytical approach

**Next week:**
- Complete analysis and insights
- Write final report
- Create and record presentation
- Submit both deliverables

---

## 📋 Rubric Reminders

**For Full Points, Ensure Your Analysis Includes:**

**Report (40 points):**
- Clear introduction explaining the business problem and approach
- All libraries imported upfront with explanations
- Thorough EDA that uncovers non-obvious insights
- Professional summary with actionable recommendations
- Clean, well-commented code that executes without errors

**Presentation (35 points):**
- 3 minutes maximum, business-focused (no code)
- Clear problem statement and methodology
- Well-designed visuals that support key findings
- Specific, actionable recommendations for the CEO
- Professional delivery and compelling narrative

---

## 🚨 Common Pitfalls to Avoid

- **Vague business questions** - Be specific about what you're investigating
- **Insufficient data joining** - Use at least 2 datasets meaningfully
- **Shallow analysis** - Go beyond basic summaries to uncover insights
- **Missing business context** - Always tie findings back to business value
- **Technical presentation** - Keep the presentation business-focused
- **Poor time management** - Start early, don't wait until the last minute

**💾 Save this notebook and share it with your team** - Use it as a starting point for your analysis!

---

## Template for Your Final Analysis

*The structure below can serve as a starting template for your final analysis notebook:*

```markdown
# Regork Grocery Chain: [Your Business Question]
## Team: [Team Member Names]

### Executive Summary
[Brief overview of question, approach, and key findings]

### 1. Introduction
[Business problem, why it matters to CEO, proposed solution]

### 2. Data & Methodology
[Datasets used, joining approach, analytical methods]

### 3. Data Preparation
[Import, clean, join, create new variables]

### 4. Exploratory Data Analysis
[Key insights, patterns, visualizations]

### 5. Deep Dive Analysis
[Detailed investigation of your business question]

### 6. Summary & Recommendations
[Key findings, business implications, actionable next steps]

### 7. Limitations & Future Work
[Analysis limitations, potential improvements]
```