# **Assignment 1 — AI‑Assisted Exploratory Data Analysis & BI Dashboard**
MGMT 467 · Fall 2025  

**Team Name:** ☐  
**Members (GitHub handles):** ☐ ☐ ☐ ☐  
**GitHub Repo URL:** ☐  
**Looker Studio Dashboard (public link):** ☐

> **Scenario:** NYC DOT has asked your team to analyze the public Citi Bike program and recommend strategies to improve bike availability and engagement. You will use BigQuery + Gemini to conduct AI‑assisted EDA and publish an executive dashboard.

## ✅ Submission Checklist (Team → Brightspace)
- [ ] GitHub repository link (source of record)
- [ ] Looker Studio dashboard link
- [ ] This notebook committed to GitHub with prompts and results

## ✅ Submission Checklist (Individual → Brightspace)
- [ ] `Contribution_Reflection.pdf` (with commit/PR evidence + peer eval)

## 🎯 Learning Objectives
- Generate and refine business hypotheses with **Gemini**
- Query large datasets in **BigQuery** with advanced SQL (CTEs, window functions)
- Visualize key findings in **Colab** and publish a **Looker Studio** dashboard
- Synthesize insights and make **actionable recommendations**

## 🧰 Setup
> Run the cells below to connect Colab to Google Cloud & BigQuery.

In [None]:
# Install and import basics (Colab usually has these preinstalled;
# db-dtypes is needed by BigQuery's to_dataframe())
# !pip install --quiet google-cloud-bigquery db-dtypes pandas matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Authenticate to Google from Colab
from google.colab import auth  # type: ignore
auth.authenticate_user()

# Set your GCP project ID
PROJECT_ID = "your-gcp-project-id"  # <-- edit this
print("Using project:", PROJECT_ID)

In [None]:
# BigQuery Python client (Colab also supports the %%bigquery cell magic)
from google.cloud import bigquery
client = bigquery.Client(project=PROJECT_ID)

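# Optional: list the datasets in your own project (this may be empty for a new
# project and does not by itself confirm access to the public Citi Bike table)
[ds.dataset_id for ds in client.list_datasets()]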
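
A more direct access check is to fetch the public table's metadata with `client.get_table`, which returns the row count and schema without scanning table data. The sketch below wraps that in a small helper; `describe_table` and `CITIBIKE_TABLE` are hypothetical names, and the only assumption is a `client` that behaves like the `bigquery.Client` created above.

```python
from typing import List, Tuple

# Hypothetical constant: the table used throughout this assignment
CITIBIKE_TABLE = "bigquery-public-data.new_york_citibike.citibike_trips"


def describe_table(client, table_id: str = CITIBIKE_TABLE) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (row count, [(column name, BigQuery type), ...]) for a table.

    Hypothetical helper: `client` is expected to behave like
    google.cloud.bigquery.Client, whose get_table() returns a Table exposing
    `num_rows` and `schema` (a list of SchemaField objects with `name` and
    `field_type`). Only metadata is fetched; no table data is scanned.
    """
    table = client.get_table(table_id)
    columns = [(field.name, field.field_type) for field in table.schema]
    return table.num_rows, columns
```

Usage in the notebook: `n_rows, cols = describe_table(client)`, then print `n_rows` and inspect `cols` to confirm column names such as `starttime`, `start_station_name`, and `usertype` before writing SQL.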

## 🧪 Dataset
We will use **Citi Bike Trips**: `bigquery-public-data.new_york_citibike.citibike_trips`  
Feel free to explore additional public datasets if needed.
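
Before choosing a date window or listing data quality caveats, it helps to profile the table once. The sketch below builds a single aggregate query that reports the date range covered and a few missing-value counts; the helper name `build_profile_query` and the specific checks are illustrative assumptions, not required steps.

```python
def build_profile_query(
    table_id: str = "bigquery-public-data.new_york_citibike.citibike_trips",
) -> str:
    """Return SQL that profiles date coverage and missing values.

    Hypothetical helper; the checks below are examples. Treating empty
    strings as missing is a precaution, not a documented property of the data.
    """
    return f"""
SELECT
  MIN(starttime) AS first_trip,
  MAX(starttime) AS last_trip,
  COUNT(*) AS total_rows,
  COUNTIF(starttime IS NULL) AS null_starttime,
  COUNTIF(start_station_name IS NULL OR start_station_name = '') AS missing_start_station,
  COUNTIF(usertype IS NULL OR usertype = '') AS missing_usertype
FROM `{table_id}`
"""
```

Run it with `client.query(build_profile_query()).to_dataframe()`. It scans three columns across the whole table, so it is worth running once and noting the results in the Appendix.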

## 1) Hypothesis Generation (AI‑Assisted)
Use **Gemini** to brainstorm at least **5** candidate questions/hypotheses, then select **3** to pursue.

> **Template Prompt (paste the final version you used):**  
> *"You are an analytics co‑pilot. Propose 5 high‑value, testable business questions about the Citi Bike dataset (tripduration, stations, user types, time-of-day/week). Return as bullets with suggested SQL hints."*

**Selected Hypotheses**
1. ☐  
2. ☐  
3. ☐

## 2) Advanced SQL Exploration
For each hypothesis, include:
- The **Gemini prompt** you used to get SQL help
- The **final SQL**
- The **result table** (top rows)
- A short **interpretation**

> Tip: Use **CTEs** and at least **one window function** across your work.
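
One way to combine a CTE with a window function is to aggregate trips per station and hour of day, then rank stations within each hour using `RANK() OVER (PARTITION BY ...)`. The sketch below builds that query for a date window; the helper name `build_top_stations_by_hour` and its parameters are hypothetical, and the half-open `[start, end)` filter avoids the `BETWEEN` pitfall of dropping trips after midnight on the last day.

```python
from datetime import date


def build_top_stations_by_hour(start: date, end: date, top_n: int = 5) -> str:
    """Return SQL ranking start stations by trip count within each hour of day.

    Hypothetical helper. Uses a CTE (`hourly`) for the aggregate, a second CTE
    (`ranked`) that applies RANK() as the window function, and a half-open
    date window [start, end).
    """
    if start >= end:
        raise ValueError("start must be before end")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return f"""
WITH hourly AS (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM starttime) AS start_hour,
    COUNT(*) AS trips
  FROM `bigquery-public-data.new_york_citibike.citibike_trips`
  WHERE starttime >= '{start.isoformat()}'
    AND starttime < '{end.isoformat()}'
  GROUP BY start_station_name, start_hour
),
ranked AS (
  SELECT
    start_station_name,
    start_hour,
    trips,
    RANK() OVER (PARTITION BY start_hour ORDER BY trips DESC) AS hour_rank
  FROM hourly
)
SELECT start_station_name, start_hour, trips, hour_rank
FROM ranked
WHERE hour_rank <= {top_n}
ORDER BY start_hour, hour_rank
"""
```

For example, `client.query(build_top_stations_by_hour(date(2016, 7, 1), date(2016, 8, 1))).to_dataframe()`, adjusting the dates to a window the table actually covers. Ranking in a separate CTE, rather than with `QUALIFY`, keeps the filter easy to read and modify.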

### Hypothesis A — Prompt Log

> Paste Gemini prompt(s) and key suggestion(s) here.

In [None]:
# Hypothesis A — SQL (store results in a Pandas DataFrame)
query_hyp_a = r"""
-- Your final SQL for Hypothesis A goes here.
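-- Example skeleton (list only the columns you need to limit bytes scanned;
-- the half-open date range keeps trips from all of July 31).
-- Confirm the table covers your window (MIN/MAX of starttime) before filtering.
-- WITH trips AS (
--   SELECT start_station_name
--   FROM `bigquery-public-data.new_york_citibike.citibike_trips`
--   WHERE starttime >= '2019-07-01' AND starttime < '2019-08-01'
-- )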
-- SELECT start_station_name, COUNT(*) AS trips
-- FROM trips
-- GROUP BY start_station_name
-- ORDER BY trips DESC
-- LIMIT 25
"""

df_hyp_a = client.query(query_hyp_a).to_dataframe()
df_hyp_a.head()

**Interpretation (2–4 sentences):** ☐

---

### Hypothesis B — Prompt Log

> Paste Gemini prompt(s) and key suggestion(s) here.

In [None]:
# Hypothesis B — SQL
query_hyp_b = r"""
-- Your final SQL for Hypothesis B goes here.
"""
df_hyp_b = client.query(query_hyp_b).to_dataframe()
df_hyp_b.head()

**Interpretation (2–4 sentences):** ☐

---

### Hypothesis C — Prompt Log

> Paste Gemini prompt(s) and key suggestion(s) here.

In [None]:
# Hypothesis C — SQL
query_hyp_c = r"""
-- Your final SQL for Hypothesis C goes here.
"""
df_hyp_c = client.query(query_hyp_c).to_dataframe()
df_hyp_c.head()

**Interpretation (2–4 sentences):** ☐

## 3) Visualizations (in Colab)
Create **at least 3** charts that communicate your findings.  
> Keep charts readable and labeled. Use `matplotlib` (no specific styles required).

In [None]:
# Example: bar chart of the first two columns of df_hyp_a
# (replace with charts that fit your story)
x_col, y_col = df_hyp_a.columns[0], df_hyp_a.columns[1]
ax = df_hyp_a.head(10).plot(kind="bar", x=x_col, y=y_col, legend=False)
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.set_title("Top Categories (Example)")
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")  # keep long station names readable
plt.tight_layout()
plt.show()

## 4) KPIs & Looker Studio Dashboard
- **KPI 1:** ☐  
- **KPI 2:** ☐  
- **KPI 3:** ☐  
- **KPI 4 (optional):** ☐  

**Dashboard Link:** ☐ (make public for viewing)  
> Ensure labels, filters, and date controls are clear for non‑technical stakeholders.
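
KPIs stay consistent between Colab and Looker Studio when they come from one query. The sketch below builds a daily KPI table that you could save to your own project and connect as a Looker Studio data source. The helper name and the KPI choices are examples only; it assumes `tripduration` is in seconds and that `usertype` includes the value `'Subscriber'`, so confirm both against the schema.

```python
def build_daily_kpi_query(
    table_id: str = "bigquery-public-data.new_york_citibike.citibike_trips",
) -> str:
    """Return SQL for one row of example KPIs per calendar day.

    Hypothetical helper. Assumes tripduration is in seconds and usertype
    contains 'Subscriber'; verify both before reporting the numbers.
    APPROX_QUANTILES gives an approximate, not exact, median.
    """
    return f"""
SELECT
  DATE(starttime) AS trip_date,
  COUNT(*) AS trips,
  APPROX_QUANTILES(tripduration, 100)[OFFSET(50)] / 60 AS approx_median_trip_minutes,
  SAFE_DIVIDE(COUNTIF(usertype = 'Subscriber'), COUNT(*)) AS subscriber_share,
  COUNT(DISTINCT start_station_name) AS active_start_stations
FROM `{table_id}`
WHERE starttime IS NOT NULL
GROUP BY trip_date
ORDER BY trip_date
"""
```

Aggregating to one row per day keeps the Looker Studio source small, and the date column works directly with a date-range control.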

## 5) Synthesis & Recommendations
Summarize your **top 3 insights** and provide **2–3 actionable recommendations** for NYC DOT.

## 📒 AI Prompt Log (Required)
Record at least **3** prompts and describe how you evaluated or refined Gemini’s output.

| # | Prompt (summary) | Where used | What changed after refinement? |
|---|------------------|------------|--------------------------------|
| 1 | ☐ | Hyp A/B/C | ☐ |
| 2 | ☐ | Hyp A/B/C | ☐ |
| 3 | ☐ | Hyp A/B/C | ☐ |

## 📦 Appendix — Reproducibility
- BigQuery location: ☐  
- Query costs observed (if any): ☐  
- Known data quality caveats: ☐
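
To fill in query costs, a BigQuery dry run reports how many bytes a query would process without running or billing it. The sketch below pairs a dry-run helper with a byte formatter; both function names are hypothetical, and `estimate_query_bytes` assumes a `bigquery.Client` like the one created in Setup.

```python
def human_bytes(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB, PiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value is below 1024, or at the largest unit
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024
    raise AssertionError("unreachable")


def estimate_query_bytes(client, sql: str) -> int:
    """Dry-run `sql` and return the bytes BigQuery reports it would process.

    Hypothetical helper. The import is inside the function so human_bytes()
    works even where google-cloud-bigquery is not installed.
    """
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

For example, `print(human_bytes(estimate_query_bytes(client, query_hyp_a)))` before running each hypothesis query, then record the figures above.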