<a href="https://colab.research.google.com/github/bulut19/mgmt467-analytics-portfolio/blob/main/Week2_1_Prompt_Practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Week 2.1 — Prompt Practice: Git, GitHub, and Google Colab

**Course:** MGMT 467 — AI‑Assisted Big Data Analytics in the Cloud  
**Session:** Tuesday (2.1) — Developer Environment Setup

### How to use this notebook
- This is a **practice and planning** notebook: most cells are **Markdown** with copy‑pasteable prompt templates you will run in your AI tool (e.g., Gemini).  
- After you run a prompt in your AI tool, **summarize what you learned** in the provided **Reflection** cells here.  
- When a task asks for a short code snippet (e.g., Git or Colab), paste the **final, validated** snippet in the designated cell and add a one‑sentence explanation.

> **Validate everything.** Cross‑check AI outputs with official docs or a second prompt. If two sources disagree, note it and explain which you chose and why.



---
## Prompt Patterns Quick Reference

Use these as starting points and **adapt** them to your context.

### 1) Zero‑Shot (definition/explanation)
```
Act as a clear, concise tutor for first‑year CS students.
Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid.
```

### 2) Few‑Shot (guided answers consistent with examples)
```
You will answer in the same style as the examples.

Q: What is a "commit" in Git?  
A: A snapshot of tracked file changes with a message explaining why.

Q: What is "pushing" in Git?  
A: Sending local commits to a remote repository so others can see them.

Q: {YOUR QUESTION}
A:
```

### 3) Step‑by‑Step Reasoning (show key steps)
```
I need a **numbered, step‑by‑step plan** for {TASK}.
For each step: the goal, one command (if applicable), and a 1‑line verification check.
Avoid hidden steps; keep it to 6–8 steps total.
```
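The `{TOPIC}`, `{TASK}`, and `{YOUR QUESTION}` placeholders in the templates above can be filled in code before you paste a prompt into your AI tool. The following is a minimal sketch, assuming templates are kept as plain Python strings; the helper name `fill_template` is hypothetical and not part of any course tooling.

```python
import re

# The Zero-Shot template from the quick reference, stored as a plain string.
ZERO_SHOT = (
    "Act as a clear, concise tutor for first-year CS students.\n"
    "Explain {TOPIC} in 5 bullet points max. "
    "Include one analogy and one pitfall to avoid."
)

def fill_template(template: str, **values: str) -> str:
    """Replace each {NAME} placeholder with the matching keyword value."""
    filled = template
    for name, value in values.items():
        filled = filled.replace("{" + name + "}", value)
    # Fail loudly if an upper-case placeholder such as {TASK} was left unfilled.
    leftover = re.findall(r"\{[A-Z ]+\}", filled)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return filled

prompt = fill_template(ZERO_SHOT, TOPIC="Git branching")
print(prompt)
```

Raising on leftover placeholders is a deliberate choice here: a prompt that still contains `{TOPIC}` tends to produce a generic answer that is easy to miss.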



---
## Group A — Git Fundamentals (3 questions)

### A1. What problem does Git solve? How is it different from file syncing?
**Use:** Zero‑Shot, then Few‑Shot for refinement.  
**Run this prompt:**
```
Act as a version control coach.
Explain what Git is and the specific problem it solves compared to simple file syncing (e.g., Drive).
List 3 concrete benefits for a small analytics team.
End with a 2‑sentence analogy.
```
**Reflection (2–4 sentences):** What did you learn that you didn’t already know?

I learned that, unlike file-syncing services such as Google Drive or Dropbox, Git records a detailed timeline of who changed what, when, and why, and it merges different people's work together rather than just syncing the latest version. I also learned some new terminology: a branch is a parallel timeline where you can make changes without affecting the main codebase, and staging is selecting which modified files to include in your next commit.


### A2. Commit → Branch → Merge: the minimal workflow
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Create a minimal, step‑by‑step workflow to:
1) initialize a repo, 2) create and switch to a feature branch, 3) commit changes,
4) merge back to main locally, 5) push to a remote named "origin".
For each step include: goal, command, and a quick verification.
```
**Paste final validated commands below and add one sentence on when to branch.**


In [1]:
# Step 1: Initialize Repository
# Goal: Create a new Git repository in the current folder with "main" as the first branch
# Command:
# git init -b main    (Git 2.28+; older versions create "master" unless configured otherwise)
# git commit --allow-empty -m "Initial commit"    (gives main a first commit so it can be checked out in Step 4)
# Verification: git status (shows "On branch main"); git log --oneline (shows "Initial commit")

# Step 2: Create and Switch to Feature Branch
# Goal: Make a new branch for your feature work
# Command: git checkout -b feature/readme-polish
# Verification: git branch (shows * next to feature/readme-polish)

# Step 3: Commit Changes
# Goal: Save your work with a descriptive message
# Command:
# git add README.md
# git commit -m "Clarify setup steps"
# Verification: git log --oneline (shows commit message)

# Step 4: Merge Back to Main Locally
# Goal: Combine feature work into main branch
# Command:
# git checkout main
# git merge feature/readme-polish --no-ff
# Verification: git log --oneline (shows the merge commit and the feature commit on main)

# Step 5: Push to Remote Origin
# Goal: Send local changes to shared repository
# Command:
# git remote add origin <REMOTE_URL>
# git push -u origin main
# Verification: git status (shows "Your branch is up to date with 'origin/main'")

# When to branch: start a new branch for each feature or fix so main stays stable until the work is reviewed.
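The five steps can also be scripted. The sketch below is one possible way to do it, assuming Git 2.28 or newer (for `git init -b`) and a `README.md` already present in the folder. It adds an empty initial commit so that `main` exists before the feature branch is created, and `--no-edit` so the merge does not open an editor when scripted. The function names `workflow_commands` and `run_workflow` are hypothetical, and `<REMOTE_URL>` is left as a placeholder.

```python
import subprocess
from pathlib import Path

def workflow_commands(branch: str, message: str, remote_url: str) -> list[list[str]]:
    """Return the Git commands for init -> branch -> commit -> merge -> push."""
    return [
        ["git", "init", "-b", "main"],
        # An unborn branch cannot be checked out later, so give main a first commit.
        ["git", "commit", "--allow-empty", "-m", "Initial commit"],
        ["git", "checkout", "-b", branch],
        ["git", "add", "README.md"],
        ["git", "commit", "-m", message],
        ["git", "checkout", "main"],
        # --no-ff keeps a merge commit; --no-edit accepts the default merge message.
        ["git", "merge", "--no-ff", "--no-edit", branch],
        ["git", "remote", "add", "origin", remote_url],
        ["git", "push", "-u", "origin", "main"],
    ]

def run_workflow(commands: list[list[str]], repo_dir: Path) -> None:
    """Run each command inside repo_dir and stop at the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, cwd=repo_dir, check=True)

# Print the plan only; call run_workflow(commands, Path("...")) to execute it.
commands = workflow_commands("feature/readme-polish", "Clarify setup steps", "<REMOTE_URL>")
for cmd in commands:
    print(" ".join(cmd))
```

Printing the plan before running it mirrors the "validate everything" rule: you can compare the command list against the official Git documentation before anything touches a real repository.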


### A3. Resolving a simple merge conflict
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
I have a merge conflict in README.md after merging a feature branch into main.
Give a 6-step recipe to resolve it safely:
- how to open the file, identify conflict markers, choose/merge lines,
- add/commit the resolution, verify the merge, and push.
Include one common pitfall and how to avoid it.
```
**Reflection:** What’s your personal checklist to avoid conflicts getting messy?

Before Working
- Pull latest - `git pull origin main`
- Create feature branch - `git checkout -b feature/name`

During Development
- Commit frequently - `git add .` then `git commit -m "descriptive message"`
- Stay synced - `git checkout main`, `git pull origin main`, `git checkout feature/name`, `git merge main`

Before Merging
- Sync and test - `git checkout main`, `git pull origin main`, then test your changes
- Clean merge - `git merge feature/name --no-ff`

Conflict Resolution
- Find markers - Look for `<<<<<<<`, `=======`, `>>>>>>>` in conflicted files
- Choose/combine - Edit file to keep what you want, delete all markers
- Test and commit - `git add filename`, `git commit -m "Resolve merge conflict"`

Escape Hatch
- Abort if stuck - `git merge --abort` to cancel and start over
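The "find markers" step in the checklist above can be automated before committing a resolution. This is a minimal sketch, assuming the conflicted file has been read into a string; `find_conflict_markers` is a hypothetical helper name.

```python
def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, marker) for every line that starts with a Git conflict marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            if line.startswith(marker):
                hits.append((number, marker))
                break
    return hits

# A small README with one unresolved conflict.
sample = "\n".join([
    "# Project",
    "<<<<<<< HEAD",
    "Run setup.sh first.",
    "=======",
    "Run install.sh first.",
    ">>>>>>> feature/readme-polish",
])
print(find_conflict_markers(sample))
# → [(2, '<<<<<<<'), (4, '======='), (6, '>>>>>>>')]
```

An empty result suggests the file is safe to `git add`. One caveat: a Markdown heading underline made of seven or more `=` characters would also be reported, so treat hits as prompts to look, not as proof of a conflict.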


---
## Group B — GitHub Collaboration (3 questions)

### B1. Branch vs. Fork vs. Clone
**Use:** Few‑Shot to drive crisp distinctions with examples.  
**Run this prompt:**
```
Answer using this format:
Term — One-sentence definition — When to use — One example.

Branch —
Fork —
Clone —
```
**Reflection:** Which one will your team use for this course and why?

Our team will use clone so that everyone gets a local copy of the shared team repository to work on their machine and stay synchronized with team changes. We will also use branch for each team member to create separate development paths for different pipeline components without causing conflicts. We won't need fork because that creates independent copies under different accounts, adding unnecessary complexity for a single team project where you'd need pull requests between different repositories instead of simple merges within one shared repo.



### B2. Pull Request (PR) checklist for this course
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Write a "PR Checklist" for a university analytics course team repo.
Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
and a revert plan. Limit to 8 concise checklist items.
```
**Paste your final checklist below.**


In [2]:
pr_checklist = [
    "PR title: `<unit>-<component>-<desc>` (e.g., `u2-preprocessing-outlier-detection`)",
    "Description includes: problem, approach, key files, and how to test",
    "Attach 1-2 screenshots of plots/dashboards if visual outputs changed",
    "Link assignment requirement or issue number being addressed",
    "Request review from 1 teammate; no self-merging allowed",
    "Passes full notebook execution: `Runtime > Run all` completes without errors",
    "No API keys, passwords, or personal data visible in code/outputs",
    "Revert plan: document which commits to revert if pipeline breaks"
]
pr_checklist

['PR title: `<unit>-<component>-<desc>` (e.g., `u2-preprocessing-outlier-detection`)',
 'Description includes: problem, approach, key files, and how to test',
 'Attach 1-2 screenshots of plots/dashboards if visual outputs changed',
 'Link assignment requirement or issue number being addressed',
 'Request review from 1 teammate; no self-merging allowed',
 'Passes full notebook execution: `Runtime > Run all` completes without errors',
 'No API keys, passwords, or personal data visible in code/outputs',
 'Revert plan: document which commits to revert if pipeline breaks']
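The title convention in the first checklist item can be checked mechanically. The sketch below assumes that a unit is written as `u` followed by digits and that every segment is lowercase letters or digits; both rules are inferred from the single example `u2-preprocessing-outlier-detection` and may need adjusting for your team. `is_valid_pr_title` is a hypothetical helper name.

```python
import re

# <unit>-<component>-<desc>, where <desc> may itself contain hyphens.
TITLE_PATTERN = re.compile(r"u\d+-[a-z0-9]+-[a-z0-9]+(?:-[a-z0-9]+)*")

def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches the assumed <unit>-<component>-<desc> convention."""
    return TITLE_PATTERN.fullmatch(title) is not None

for title in ["u2-preprocessing-outlier-detection", "Fix stuff", "u2-preprocessing"]:
    print(title, "->", is_valid_pr_title(title))
```

The last title fails because it has a unit and a component but no description segment.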


### B3. Protected `main` workflow
**Use:** Zero‑Shot + Step‑by‑Step.  
**Run this prompt:**
```
Explain how to protect the main branch in a GitHub repo for a class team:
- Require PRs, at least one review, and passing checks
- Disallow force-pushes
Provide a numbered setup guide and a 3-line "why this matters" explanation.
```
**Reflection:** Which protection rules will you actually enable first, and why?

- Require a pull request before merging: This is the foundation that forces all changes through the review process instead of direct commits to main.
- Require approvals (minimum 1): Ensures someone else looks at code before it goes live, catching bugs and ensuring knowledge sharing across the team.
- Do not allow force pushes or deletion of `main`: Prevents accidental loss of the team's work history and protects against destructive operations.
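These rules can also be expressed as data rather than clicked through in the settings page. The sketch below builds a request body for what I understand to be GitHub's REST endpoint for updating classic branch protection (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). The field names reflect that understanding and should be checked against the official GitHub REST documentation before use; `<OWNER>` and `<REPO>` are placeholders, and nothing is sent over the network here.

```python
import json

def protection_payload(required_approvals: int = 1) -> dict:
    """Build a request body that requires PRs, reviews, and passing checks on a branch."""
    return {
        # strict=True: the branch must be up to date before merging; no named checks yet.
        "required_status_checks": {"strict": True, "contexts": []},
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": required_approvals,
        },
        "restrictions": None,  # no per-user push restrictions
        "allow_force_pushes": False,
        "allow_deletions": False,
    }

def protection_url(owner: str, repo: str, branch: str = "main") -> str:
    """Return the assumed API URL for a branch's protection settings."""
    return f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"

payload = protection_payload()
print(protection_url("<OWNER>", "<REPO>"))
print(json.dumps(payload, indent=2))
```

Keeping the rules in a small function makes them reviewable in a PR like any other change, which fits the "require a pull request before merging" rule itself.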



---
## Group C — Google Colab for Analytics (3 questions)

### C1. Why Colab? Benefits & limits for this course
**Use:** Zero‑Shot.  
**Run this prompt:**
```
Act as a data science tech advisor.
List 5 advantages and 3 limitations of Google Colab for analytics coursework.
Tailor to a class that uses BigQuery and dashboards. Keep it to bullet points.
```
**Reflection:** Which two advantages will help *you* most this semester?

Real-time collaboration and seamless BigQuery integration would help me the most this semester. This is because real-time collaboration lets multiple team members work simultaneously on the same notebook during data exploration and model building, eliminating coordination headaches during team projects. BigQuery integration removes database setup barriers, allowing the team to immediately query real datasets and focus on solving business problems rather than wrestling with connection configurations.


### C2. Authenticate to GCP in Colab and query BigQuery
**Use:** Step‑by‑Step Reasoning for a minimal working snippet.  
**Run this prompt:**
```
Provide a minimal Colab snippet to:
1) authenticate to Google Cloud,
2) run a simple BigQuery SQL (e.g., SELECT 1),
3) get results into a pandas DataFrame,
4) print row count.
Include a one-line note on costs and safe use of LIMIT.
```
**Paste your final validated code below.**


In [3]:
# Authenticate to Google Cloud
# from google.colab import auth
# auth.authenticate_user()

# Setup BigQuery client and run simple SQL
# from google.cloud import bigquery
# import pandas as pd
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT 1 AS test_col"

# Get results into pandas DataFrame
# df = client.query(sql).result().to_dataframe()

# Print row count and preview
# print("Rows:", len(df))
# df.head()

# Cost note: BigQuery on-demand pricing charges by bytes scanned, and LIMIT generally does not reduce the scan - select only the columns you need and filter on partitions; use LIMIT to keep result sets small
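Because cost depends on bytes scanned, it can help to estimate a query's scan size before running it. The sketch below uses a BigQuery dry run, assuming the `google-cloud-bigquery` client library is installed and you have already authenticated as in the cell above. `estimate_scan` and `format_bytes` are hypothetical helper names, and no price is hard-coded because rates vary by region and plan.

```python
def format_bytes(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} TiB"

def estimate_scan(client, sql: str) -> int:
    """Return the bytes a query would process, without actually running it."""
    # Imported here so format_bytes can be used even where the library is absent.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed

print(format_bytes(1536))  # → 1.50 KiB

# In Colab, after auth.authenticate_user():
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# print(format_bytes(estimate_scan(client, "SELECT 1 AS test_col")))
```

A dry run is not billed, so it is a cheap habit to adopt before any query against a table whose size you do not know.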



### C3. Save notebooks to GitHub from Colab
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Give two safe workflows to keep Colab notebooks versioned in GitHub:
(A) using "File > Save a copy in GitHub",
(B) local git with Drive sync (brief).
Provide steps and cautions (e.g., large outputs, secrets) for each.
```
**Reflection:** Which workflow will your team adopt and why?

My team will adopt Workflow A (File > Save a copy in GitHub) because it eliminates the manual sync complexity that would cause coordination problems in our project. With Workflow B, we'd likely forget to download the latest version from Drive before making changes, leading to conflicting notebook versions and lost work. Workflow A integrates directly with our GitHub repository and reduces friction, making it much more likely we'll actually use version control consistently throughout the semester rather than abandoning it when deadlines get tight.

Workflow A: File > Save a copy in GitHub

Steps:
- Complete your analysis in Colab
- Clear all outputs: Edit > Clear all outputs
- Go to File > Save a copy in GitHub
- Select repository, branch, and add commit message
- GitHub saves the .ipynb file directly to your repo
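If a notebook reaches your repository with outputs still attached, they can be stripped from the `.ipynb` file afterwards. This is a minimal sketch using only the standard library, assuming the nbformat 4 layout in which code cells carry `outputs` and `execution_count` keys; `clear_outputs` is a hypothetical helper name.

```python
import json

def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs and execution counts removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

# A tiny in-memory notebook with one Markdown cell and one executed code cell.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print('hi')"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(example)
print(cleaned["cells"][1]["outputs"])  # → []
```

To apply it to a file, load the notebook with `json.load`, pass the result through `clear_outputs`, and write it back with `json.dump`. Clearing outputs keeps diffs small and reduces the chance that query results or credentials end up in the commit history.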


---
## Capstone Synthesis (end of class)

**Scenario:** Your team needs a reproducible workflow for this course: team repo on GitHub, branching, Colab auth to BigQuery, and a PR checklist.

**Run this prompt:**
```
Act as a DevEx lead for a university analytics team.
Produce a one-page "Runbook" with:
- Repo structure (folders for notebooks, data, dashboards, docs)
- Branching model (who creates branches, when to merge)
- Colab ↔ BigQuery quickstart (auth, sample query, cost-safe LIMIT)
- PR checklist (max 8 bullets) and protection rules for main
- Two risks + mitigations (e.g., secrets leakage, merge conflicts)
Use concise bullets and keep it classroom-ready.
```

**Paste your final runbook below (or attach as a Markdown file in your repo) and add a 3‑bullet reflection on what you changed after validation.**

**Analytics Team Runbook**

**Repository Structure**  
analytics-pipeline/  
├── notebooks/ (Jupyter/Colab analysis files)  
├── data/ (Sample datasets < 10MB only)  
├── dashboards/ (Streamlit/Plotly dashboard code)  
├── docs/ (Project documentation, reports)  
├── sql/  (BigQuery scripts and queries)  
└── README.md  (Setup instructions and overview)
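The layout above can be created with a short script so every teammate starts from the same folders. This is a sketch under a few assumptions: `scaffold` is a hypothetical helper, the README content is a stub, and a `.gitkeep` file is added to each folder because Git does not track empty directories.

```python
import tempfile
from pathlib import Path

FOLDERS = ["notebooks", "data", "dashboards", "docs", "sql"]

def scaffold(root: Path) -> list[Path]:
    """Create the runbook's folder layout under root and return the created folders."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(exist_ok=True)
        (folder / ".gitkeep").touch()  # placeholder so Git tracks the empty folder
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# analytics-pipeline\n", encoding="utf-8")
    return created

# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "analytics-pipeline")
    names = sorted(p.name for p in created)
    readme_exists = (Path(tmp) / "analytics-pipeline" / "README.md").exists()
    print(names)
```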

**Branching Model**

- Main branch: Stable, working code only; protected with PR requirements
- Feature branches: Each team member creates feature/component-name for their work
- When to branch: Start new branch for each major pipeline component or analysis
- When to merge: After peer review and successful notebook execution
- Branch cleanup: Delete feature branches after successful merge to main

**Colab ↔ BigQuery Quickstart**
```
# Authentication
from google.colab import auth
auth.authenticate_user()

# Query setup
from google.cloud import bigquery
client = bigquery.Client(project="YOUR_PROJECT_ID")
sql = "SELECT column1, column2 FROM dataset.table LIMIT 1000"

# Get results
df = client.query(sql).result().to_dataframe()
print(f"Rows: {len(df)}")
```

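Cost Safety: BigQuery charges by data scanned, not rows returned, so LIMIT 1000 keeps results small but does not by itself lower cost. During exploration, select only the columns you need and filter on partitioned columns.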

**PR Checklist**
- Title format: unit-component-description (e.g., u2-preprocessing-outliers)
- Description includes problem, approach, key files, testing steps
- Screenshots attached if visualizations changed
- All outputs cleared: Edit > Clear all outputs
- No API keys, tokens, or personal data in code
- Notebook runs end-to-end: Runtime > Run all passes
- One teammate review required before merge
- Link to assignment requirement or issue number

**Main Branch Protection Rules**
- Require pull requests before merging
- Require at least 1 approval from teammate
- Disallow force pushes and direct commits
- Disallow deletion of the main branch

**Risk Management**

Risk 1: Secrets Leakage
- Problem: API keys or credentials accidentally committed to public repo
- Mitigation: Use environment variables, clear outputs before commits, add .env to .gitignore

Risk 2: Merge Conflicts in Notebooks
- Problem: JSON format makes notebook conflicts nearly impossible to resolve manually
- Mitigation: Work on separate notebooks per team member, merge frequently, use `git merge --abort` if conflicts occur

Keep this runbook handy during team development - print or bookmark for quick reference!
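The secrets-leakage risk in the runbook can be partly caught with a quick scan before committing. The sketch below flags lines that assign a quoted literal of eight or more characters to a name like `api_key`, `secret`, `token`, or `password`. The pattern is an illustrative assumption, not a complete detector, and `find_hardcoded_secrets` is a hypothetical helper name.

```python
import re

# Matches e.g. api_key = "....", token: '....' with at least 8 characters in the quotes.
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
)

def find_hardcoded_secrets(source: str) -> list[str]:
    """Return the source lines that look like hard-coded credentials."""
    return [line.strip() for line in source.splitlines() if SECRET_PATTERN.search(line)]

sample = "\n".join([
    'api_key = "abcd1234efgh5678"',            # hard-coded literal: flagged
    'api_key = os.environ.get("MY_API_KEY")',  # read from the environment: not flagged
    'password = "short"',                      # too short for this pattern: not flagged
])
print(find_hardcoded_secrets(sample))
# → ['api_key = "abcd1234efgh5678"']
```

A scan like this complements, and does not replace, the runbook's mitigations of environment variables, cleared outputs, and a `.gitignore` entry for `.env`.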

**Reflection**

- Added SQL folder: Included sql/ directory in repo structure since the team will be working extensively with BigQuery queries
- Enhanced risk mitigations: Made the secrets mitigation more actionable by specifying environment variables and .gitignore usage
- Added project context: Included `project="YOUR_PROJECT_ID"` in the BigQuery setup


---
## Submission Checklist (to your team repo + Brightspace link)

- [ ] All **Reflection** sections completed (A1–A3, B1–B3, C1–C3, Capstone).
- [ ] Any code snippets pasted are **validated** and include a 1‑line explanation.
- [ ] Notebook runs top‑to‑bottom without errors (where code cells exist).
- [ ] Commit message: `week2.1-prompt-practice` and open a PR for review.
- [ ] Add this notebook path to your repo **README.md** under Week 2.1.
