<a href="https://colab.research.google.com/github/QianyueWang0212/mgmt467-analytics-portfolio/blob/main/Labs/Lab1_AI_Assisted_SQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Week 2.1 — Prompt Practice: Git, GitHub, and Google Colab

**Course:** MGMT 467 — AI‑Assisted Big Data Analytics in the Cloud  
**Session:** Tuesday (2.1) — Developer Environment Setup

### How to use this notebook
- This is a **practice and planning** notebook: most cells are **Markdown** with copy‑pasteable prompt templates you will run in your AI tool (e.g., Gemini).  
- After you run a prompt in your AI tool, **summarize what you learned** in the provided **Reflection** cells here.  
- When a task asks for a short code snippet (e.g., Git or Colab), paste the **final, validated** snippet in the designated cell and add a one‑sentence explanation.

> **Validate everything.** Cross‑check AI outputs with official docs or a second prompt. If two sources disagree, note it and explain which you chose and why.



---
## Prompt Patterns Quick Reference

Use these as starting points and **adapt** them to your context.

### 1) Zero‑Shot (definition/explanation)
```
Act as a clear, concise tutor for first‑year CS students.
Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid.
```

### 2) Few‑Shot (guided answers consistent with examples)
```
You will answer in the same style as the examples.

Q: What is a "commit" in Git?  
A: A snapshot of tracked file changes with a message explaining why.

Q: What is "pushing" in Git?  
A: Sending local commits to a remote repository so others can see them.

Q: {YOUR QUESTION}
A:
```

### 3) Step‑by‑Step Reasoning (show key steps)
```
I need a **numbered, step‑by‑step plan** for {TASK}.
For each step: the goal, one command (if applicable), and a 1‑line verification check.
Avoid hidden steps; keep it to 6–8 steps total.
```



---
## Group A — Git Fundamentals (3 questions)

### A1. What problem does Git solve? How is it different from file syncing?
**Use:** Zero‑Shot, then Few‑Shot for refinement.  
**Run this prompt:**
```
Act as a version control coach.
Explain what Git is and the specific problem it solves compared to simple file syncing (e.g., Drive).
List 3 concrete benefits for a small analytics team.
End with a 2‑sentence analogy.
```
**Reflection (2–4 sentences):** What did you learn that you didn’t already know?


I learned the underlying methodology of Git, especially branching: a branch lets me customize my own version of the project without affecting the main line of work.


### A2. Commit → Branch → Merge: the minimal workflow
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Create a minimal, step‑by‑step workflow to:
1) initialize a repo, 2) create and switch to a feature branch, 3) commit changes,
4) merge back to main locally, 5) push to a remote named "origin".
For each step include: goal, command, and a quick verification.
```
**Paste final validated commands below and add one sentence on when to branch.**


When to branch: You should branch whenever you are starting work on a new feature, bug fix, or experiment that is separate from the main stable version of your project. This keeps the main branch clean and allows you to work in isolation.

In [2]:
# git add .
# git commit -m "Descriptive commit message"
# git checkout main
# git merge <your-feature-branch-name>


In [3]:

# Paste your validated minimal Git workflow commands here as comments, e.g.:
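# git init -b main                              # -b requires Git 2.28+; names the first branch "main"
# git commit --allow-empty -m "Initial commit"  # main needs one commit, or "git checkout main" fails later
# git checkout -b feature/readme-polish         # create and switch to the feature branch
# git add README.md
# git commit -m "Clarify setup steps"
# git checkout main
# git merge feature/readme-polish --no-ff       # --no-ff keeps an explicit merge commit
# git remote add origin <REMOTE_URL>
# git push -u origin main                       # -u sets origin/main as the upstream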



### A3. Resolving a simple merge conflict
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
I have a merge conflict in README.md after merging a feature branch into main.
Give a 6-step recipe to resolve it safely:
- how to open the file, identify conflict markers, choose/merge lines,
- add/commit the resolution, verify the merge, and push.
Include one common pitfall and how to avoid it.
```
**Reflection:** What’s your personal checklist to avoid conflicts getting messy?


1. Confirm that I am editing the right file at the right path.

2. Identify which content comes from the current branch and which comes from the branch being merged.

3. Keep changes small. Commit and push frequently to reduce overlap.

4. Communicate with the team. Let teammates know when I am editing shared files.

5. Test after merging.
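
A small sketch of item 2 in the checklist, assuming Git's default conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`). The function name `find_conflicts` and the sample README text are illustrative only. It separates each conflicted region into the current-branch side and the incoming side, which may help confirm that no markers are left before committing.

```python
def find_conflicts(text):
    """Return a list of (ours, theirs) line lists, one pair per conflict block.

    Assumes Git's default two-way markers; diff3-style '|||||||' sections
    are not handled in this sketch.
    """
    conflicts = []
    ours, theirs = [], []
    state = None  # None = outside a conflict; "ours" / "theirs" = inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            conflicts.append((ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


# Hypothetical README content with one unresolved conflict.
sample = """# Project
<<<<<<< HEAD
Run setup.sh first.
=======
Run install.sh first.
>>>>>>> feature/readme-polish
Done."""

for ours, theirs in find_conflicts(sample):
    print("current branch:", ours)
    print("incoming branch:", theirs)
```

An empty result from `find_conflicts` suggests the file has no remaining default-style markers, though it does not prove the merged content is correct; running the notebook or script is still needed.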


---
## Group B — GitHub Collaboration (3 questions)

### B1. Branch vs. Fork vs. Clone
**Use:** Few‑Shot to drive crisp distinctions with examples.  
**Run this prompt:**
```
Answer using this format:
Term — One-sentence definition — When to use — One example.

Branch —
Fork —
Clone —
```
**Reflection:** Which one will your team use for this course and why?


For this course, our team will mainly use branches. Each teammate can clone the shared repository once and then create a branch for their own feature or task. This prevents conflicts and keeps the main branch stable while still allowing collaboration.


### B2. Pull Request (PR) checklist for this course
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Write a "PR Checklist" for a university analytics course team repo.
Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
and a revert plan. Limit to 8 concise checklist items.
```
**Paste your final checklist below.**


In [4]:
pr_checklist = [
"PR Title: Use the convention <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)",
"Description: Include the problem being solved, the approach taken, key files changed, and how to test the changes",
"Screenshots: Attach 1–2 relevant screenshots (e.g., plots, dashboard views) if visuals were modified or added",
"Related Items: Link to the related assignment requirement, issue, or task being addressed",
"Reviewers: Request a review from at least one teammate; do not self-merge",
"Testing: Ensure the notebook or script runs end-to-end without errors (e.g., using Runtime > Run all in Colab)",
"Security/Privacy: Verify no secrets, API keys, tokens, or personally identifiable information (PII) are present in the code or notebook outputs",
"Revert Plan: Briefly note how to quickly revert the changes if the PR introduces unexpected issues after merging"]
pr_checklist

['PR Title: Use the convention <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)',
 'Description: Include the problem being solved, the approach taken, key files changed, and how to test the changes',
 'Screenshots: Attach 1–2 relevant screenshots (e.g., plots, dashboard views) if visuals were modified or added',
 'Related Items: Link to the related assignment requirement, issue, or task being addressed',
 'Reviewers: Request a review from at least one teammate; do not self-merge',
 'Testing: Ensure the notebook or script runs end-to-end without errors (e.g., using Runtime > Run all in Colab)',
 'Security/Privacy: Verify no secrets, API keys, tokens, or personally identifiable information (PII) are present in the code or notebook outputs',
 'Revert Plan: Briefly note how to quickly revert the changes if the PR introduces unexpected issues after merging']

In [5]:

# Example (edit to your team's needs)
pr_checklist = [
    "PR title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)",
    "Description includes: problem, approach, key files, and how to test",
    "Attach 1–2 screenshots (plots/dashboards) if visuals changed",
    "Link related issue or assignment requirement",
    "Request review from 1 teammate; no self-merge",
    "Passes notebook re-run without errors (Runtime > Run all)",
    "No secrets, tokens, or PII in code or outputs",
    "Revert plan: how to roll back quickly if needed"
]
pr_checklist


['PR title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)',
 'Description includes: problem, approach, key files, and how to test',
 'Attach 1–2 screenshots (plots/dashboards) if visuals changed',
 'Link related issue or assignment requirement',
 'Request review from 1 teammate; no self-merge',
 'Passes notebook re-run without errors (Runtime > Run all)',
 'No secrets, tokens, or PII in code or outputs',
 'Revert plan: how to roll back quickly if needed']
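
The title convention in the first checklist item can be checked mechanically. The sketch below is one possible reading of `<unit>-<lab>-<short-desc>`, inferred only from the single example `u1-lab2-eda-trends`; the regular expression and the name `is_valid_pr_title` are assumptions, so adjust them to whatever your team actually agrees on.

```python
import re

# Assumed pattern: "u<number>-lab<number>-<lowercase words joined by hyphens>".
# This is inferred from one example title and is not an official rule.
PR_TITLE_RE = re.compile(r"u\d+-lab\d+-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_pr_title(title):
    """Return True if the whole title matches the assumed convention."""
    return PR_TITLE_RE.fullmatch(title) is not None


for title in ["u1-lab2-eda-trends", "Fix stuff", "u1-lab2-"]:
    print(title, "->", is_valid_pr_title(title))
```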


### B3. Protected `main` workflow
**Use:** Zero‑Shot + Step‑by‑Step.  
**Run this prompt:**
```
Explain how to protect the main branch in a GitHub repo for a class team:
- Require PRs, at least one review, and passing checks
- Disallow force-pushes
Provide a numbered setup guide and a 3-line "why this matters" explanation.
```
**Reflection:** Which protection rules will you actually enable first, and why?


Require pull requests before merging. This will be my first rule because it stops direct pushes and ensures every change is reviewed, which is the foundation of safe collaboration.


---
## Group C — Google Colab for Analytics (3 questions)

### C1. Why Colab? Benefits & limits for this course
**Use:** Zero‑Shot.  
**Run this prompt:**
```
Act as a data science tech advisor.
List 5 advantages and 3 limitations of Google Colab for analytics coursework.
Tailor to a class that uses BigQuery and dashboards. Keep it to bullet points.
```
**Reflection:** Which two advantages will help *you* most this semester?


The two advantages that will help me most are Google Cloud integration and easy sharing & collaboration.

Seamless BigQuery integration allows me to query large datasets directly from Colab without extra setup, which is critical for our analytics coursework. Easy sharing makes it simple to get feedback from teammates and instructors, ensuring smoother group projects and faster troubleshooting.


### C2. Authenticate to GCP in Colab and query BigQuery
**Use:** Step‑by‑Step Reasoning for a minimal working snippet.  
**Run this prompt:**
```
Provide a minimal Colab snippet to:
1) authenticate to Google Cloud,
2) run a simple BigQuery SQL (e.g., SELECT 1),
3) get results into a pandas DataFrame,
4) print row count.
Include a one-line note on costs and safe use of LIMIT.
```
**Paste your final validated code below.**


In [6]:

# Minimal BigQuery test in Colab (paste your validated version)
# from google.colab import auth
# auth.authenticate_user()
#
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT 1 AS test_col"
# df = client.query(sql).result().to_dataframe()
# print("Rows:", len(df))
# df.head()
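
Since the prompt asks for a note on costs, one hedged way to check a query before running it is a BigQuery dry run, which reports the bytes a query would process without executing it. The sketch below assumes the `google-cloud-bigquery` client from the cell above; `human_bytes` and `estimate_scan_bytes` are hypothetical helper names, and the actual BigQuery call is left commented because it needs an authenticated project.

```python
def human_bytes(n):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(n)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def estimate_scan_bytes(client, sql, job_config):
    """Submit `sql` with a dry-run job config and return the bytes it would process."""
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


# Usage in Colab after authenticating (requires a real project ID):
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
# n = estimate_scan_bytes(client, "SELECT 1 AS test_col", config)
# print("Would process:", human_bytes(n))
```

Passing the job config in as a parameter keeps the helper importable without the BigQuery library installed, which makes it easy to try with a stand-in client before using a real one.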



### C3. Save notebooks to GitHub from Colab
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Give two safe workflows to keep Colab notebooks versioned in GitHub:
(A) using "File > Save a copy in GitHub",
(B) local git with Drive sync (brief).
Provide steps and cautions (e.g., large outputs, secrets) for each.
```
**Reflection:** Which workflow will your team adopt and why?


Our team will mainly use Workflow A because it is built into Colab, fast, and easy to use, and it reduces setup overhead. For larger projects, or when tighter Git control is needed, we may use Workflow B.
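
Both workflows share the caution about large outputs. As a minimal sketch, assuming the standard nbformat 4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), the outputs can be cleared before a notebook is committed. The function name `strip_outputs` and the sample notebook dict are illustrative; dedicated tools such as `nbstripout` cover more cases.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny hand-written notebook structure used only for demonstration.
nb = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "source": ["print('hi')"],
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
}

print(json.dumps(strip_outputs(nb)["cells"][1], indent=2))
```

This only removes outputs; a secret typed directly into a code or Markdown cell's source would still be committed, so the source itself still needs a review.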


---
## Capstone Synthesis (end of class)

**Scenario:** Your team needs a reproducible workflow for this course: team repo on GitHub, branching, Colab auth to BigQuery, and a PR checklist.

**Run this prompt:**
```
Act as a DevEx lead for a university analytics team.
Produce a one-page "Runbook" with:
- Repo structure (folders for notebooks, data, dashboards, docs)
- Branching model (who creates branches, when to merge)
- Colab ↔ BigQuery quickstart (auth, sample query, cost-safe LIMIT)
- PR checklist (max 8 bullets) and protection rules for main
- Two risks + mitigations (e.g., secrets leakage, merge conflicts)
Use concise bullets and keep it classroom-ready.
```

**Paste your final runbook below (or attach as a Markdown file in your repo) and add a 3‑bullet reflection on what you changed after validation.**


In [7]:
# Analytics Team Runbook

# ## Repository Structure

# Organize your project files logically within the GitHub repository:

# *   `/notebooks`: Colab notebooks for analysis, modeling, and exploration.
# *   `/data`: Raw or processed data files (keep small; large datasets should remain in BigQuery or Cloud Storage).
# *   `/dashboards`: Files related to dashboard development (e.g., Tableau workbooks, Looker Studio models, code for dashboarding tools).
# *   `/docs`: Project documentation, meeting notes, design documents, and this runbook.
# *   `/scripts`: Standalone Python or SQL scripts for specific tasks (e.g., data cleaning, ETL).

# ## Branching Model

# Adopt a simple feature-branch workflow:

# *   **`main` Branch:** Always represents the stable, production-ready state of the project. Direct commits to `main` are prohibited.
# *   **Feature Branches:** Created off `main` for each new task, feature, or bug fix (e.g., `feature/explore-user-behavior`, `bugfix/fix-data-load`).
# *   **Who Creates Branches:** Any team member working on a task creates their own feature branch.
# *   **When to Merge:** Merge feature branches into `main` only after a Pull Request has been approved and passes all checks.

# ## Colab ↔ BigQuery Quickstart

# Accessing and querying BigQuery from Colab:

# 1.  **Authenticate:** Run `from google.colab import auth; auth.authenticate_user()` in a Colab cell to authenticate using your Google account.
# 2.  **Initialize Client:** Run `from google.cloud import bigquery; client = bigquery.Client(project="<YOUR_PROJECT_ID>")` (Replace `<YOUR_PROJECT_ID>`).
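# 3.  **Run Query:** Define your SQL query string (e.g., `sql = "SELECT col_a, col_b FROM your_dataset.your_table LIMIT 100"`). **Cost Note:** On-demand BigQuery costs are based on data processed. `LIMIT` caps the rows returned but, on non-clustered tables, generally does not reduce the bytes scanned. During exploration, select only the columns you need and filter on partition columns to manage costs.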
# 4.  **Get Results:** Run `df = client.query(sql).result().to_dataframe()` to load results into a pandas DataFrame.
# 5.  **Verify:** Use `print("Rows:", len(df))` or `display(df.head())` to inspect the results.

# ## Pull Request (PR) Checklist & `main` Protection

# Ensure code quality and collaboration with a PR process:

# *   **PR Title:** `<unit>-<lab>-<short-desc>`.
# *   **Description:** Problem, approach, key files, how to test.
# *   **Screenshots:** 1-2 relevant visuals if UI/plots changed.
# *   **Related Items:** Link issue or assignment requirement.
# *   **Reviewers:** At least 1 teammate review required; no self-merge.
# *   **Testing:** Notebook/script runs end-to-end without errors.
# *   **Security/Privacy:** No secrets, tokens, or PII in code/outputs.
# *   **Revert Plan:** Note how to quickly roll back if needed.

# **`main` Branch Protection Rules (Configure in GitHub Settings > Branches):**

# *   Require a pull request before merging.
# *   Require at least 1 approval.
# *   Require status checks to pass (if configured).
# *   Do not allow force pushes.

# ## Risks and Mitigations

# *   **Risk 1: Secrets Leakage:** Accidentally committing API keys, passwords, or sensitive credentials to the repository.
#     *   **Mitigation:** **Never** hardcode secrets. Use Colab Secrets, environment variables, or secure credential management tools. Add a `.gitignore` file to prevent common secret file types from being committed. Regularly audit the repo history for leaked secrets (GitHub has scanning tools).
# *   **Risk 2: Merge Conflicts:** Multiple people changing the same part of a file simultaneously leading to conflicts during merging.
#     *   **Mitigation:** **Communicate** with your team about which files you are working on. **Keep changes small** and commit/push frequently. Pull the latest changes from `main` *before* starting new work on your branch and *before* creating a PR. Learn basic Git conflict resolution steps.

# This runbook provides a foundation for your team's workflow. Adapt it as needed based on your team's size and project complexity.
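
To make the secrets-leakage mitigation concrete, here is a rough sketch of a pre-commit text scan. The three patterns are illustrative assumptions about what some common credentials look like, not an exhaustive or official list, and `scan_for_secrets` is a hypothetical helper name. A clean result does not guarantee that a file is free of secrets.

```python
import re

# Illustrative patterns only (assumptions, not a complete or official list).
SECRET_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "github_token": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_for_secrets(text):
    """Return the sorted names of the patterns that match anywhere in `text`."""
    return sorted(name for name, rx in SECRET_PATTERNS.items() if rx.search(text))


# Synthetic strings for demonstration; neither is a real credential.
fake_key = "AIza" + "x" * 35
print(scan_for_secrets(f'API_KEY = "{fake_key}"'))
print(scan_for_secrets("df = client.query(sql).result().to_dataframe()"))
```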

## Reflection
**Made BigQuery steps easier**: Kept only the basics (login, start client, safe LIMIT) so everyone can follow without extra setup.  
**Shortened PR checklist**: Cut it down to 8 simple points so it’s clear and not overwhelming.  
**Picked student-friendly risks**: Focused on secrets leaking and merge conflicts, since these are the real issues our team might face.  




---
## Submission Checklist (to your team repo + Brightspace link)

- [ ] All **Reflection** sections completed (A1–A3, B1–B3, C1–C3, Capstone).
- [ ] Any code snippets pasted are **validated** and include a 1‑line explanation.
- [ ] Notebook runs top‑to‑bottom without errors (where code cells exist).
- [ ] Commit message: `week2.1-prompt-practice` and open a PR for review.
- [ ] Add this notebook path to your repo **README.md** under Week 2.1.
