# Challenge Task 10: Exploratory Data Analysis with the Data Science Agent in BigQuery

## Scenario

Cymbal Retail’s "Living Spaces" initiative is reimagining the digital storefront by shifting from individual gadgets to curated home atmospheres, like "focus sanctuaries" or "vibrant social hubs." To measure the success of this high-end home automation expansion, the leadership team needs a real-world impact analysis of the Illinois market.

To provide rapid insights, you use the Data Science Agent in Colab Enterprise. By automating complex Python workflows, from joining massive tables to identifying deep statistical correlations, the agent helps you turn "cold" data into a strategic narrative for the "Living Spaces" expansion.

## Your Mission

Your mission is to perform a comprehensive Exploratory Data Analysis (EDA) to prepare for an upcoming executive briefing. You will guide the Data Science Agent to consolidate raw transaction data and perform a deep dive into the Illinois market.

## Objectives:

**1. Consolidate the Data:** Use the agent to perform a multi-table join across `product_master`, `customer_master`, `pos_transactions`, and `pos_transaction_items` to create a unified history table named `customer_purchases_history`.

**2. Audit Data Quality:** Leverage the agent to proactively detect statistical outliers and missing values. You must not only identify these gaps but also apply the agent’s recommended imputation strategies.

**3. Analyze Local Market Dynamics:** Focus your analysis on the city level (not the state level) for the Illinois market. You will look for hidden correlations between numerical features, such as product price and customer loyalty.

**4. Visualize for Stakeholders:** Generate high-impact visualizations, including Pie Charts for geographic distribution and a suite of statistical plots (Histograms, Box Plots, and Scatter Plots) that tell the story of the "Living Spaces" collection.

**5. Optimize for Performance:** Ensure the generated code is efficient and keeps the final reporting block under 95 lines.

**Note:** In this notebook, you can use the code generation (**Generate**) capability in Colab Enterprise to produce the code.

## 1. Create a curated table that combines multiple tables

Your first goal is to create a unified view of your business data. Using the Data Science Agent, generate and execute code that joins the following tables from the `rscw_oltp_stg_ds` dataset:

 * `product_master` (include `product_id`, `product_nm`)

 * `customer_master` (include `customer_id`)

 * `pos_transactions`

 * `pos_transaction_items`

The resulting table should be named `customer_purchases_history` and must be persisted in the `cymbal_retail_rscw_ml_ds` BigQuery dataset.
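As a rough illustration of what the agent's output might look like, the sketch below builds a `CREATE OR REPLACE TABLE` statement for the four-way join. The dataset and table names come from this task, but the join keys are assumptions: `transaction_id` is a hypothetical column linking transactions to their items, and the `EXCEPT` clause assumes it is the only column the two transaction tables share. The helper names `build_join_sql` and `run_in_bigquery` are also made up for this example. Check the real schema before running anything like it.

```python
SOURCE_DS = "rscw_oltp_stg_ds"
TARGET_TABLE = "cymbal_retail_rscw_ml_ds.customer_purchases_history"


def build_join_sql(source_ds: str = SOURCE_DS, target: str = TARGET_TABLE) -> str:
    """Build a CREATE OR REPLACE TABLE statement joining the four source tables.

    ASSUMPTION: transaction_id links transactions to their items, and it is the
    only column name the two tables share. Verify against the actual schema.
    """
    return f"""
CREATE OR REPLACE TABLE `{target}` AS
SELECT
  t.*,
  i.* EXCEPT (transaction_id),
  p.product_nm
FROM `{source_ds}.pos_transactions` AS t
JOIN `{source_ds}.pos_transaction_items` AS i
  ON i.transaction_id = t.transaction_id
JOIN `{source_ds}.product_master` AS p
  ON p.product_id = i.product_id
JOIN `{source_ds}.customer_master` AS c
  ON c.customer_id = t.customer_id
""".strip()


def run_in_bigquery(sql: str, project_id: str) -> None:
    """Execute the statement; needs the google-cloud-bigquery package."""
    # Imported lazily so the sketch can be inspected without the library.
    from google.cloud import bigquery

    bigquery.Client(project=project_id).query(sql).result()


print(build_join_sql())
```

Inner joins are used here, so transactions without a matching customer or product would be dropped; a `LEFT JOIN` may be the better choice if such rows need to be kept for the data-quality steps later on.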

## 2. Create a Pandas dataframe that we will reuse for the remainder of the lab module

**Note:** Run this cell as it is; no code generation or modification is needed.

In [None]:
import pandas as pd
import pandas_gbq

df_customer_purchases = pandas_gbq.read_gbq("select * from cymbal_retail_rscw_ml_ds.customer_purchases_history", project_id=project_id)

## 3. Run out-of-the-box (OOTB) Exploratory Data Analysis (EDA) against the table with Data Science Agent's help

As an expert analyst, you need to understand the local market dynamics. Use the Agent to perform an EDA on `df_customer_purchases` and create meaningful visualizations.

**Constraint:** The data is filtered for the state of Illinois (IL). Instead of state-level analysis, focus your insights at the City level.
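A city-level summary of the kind this step asks for could look roughly like the following. This is a minimal sketch on a toy stand-in for `df_customer_purchases`; the column names `city` and `total_amount` are assumptions, not confirmed fields of the lab table.

```python
import pandas as pd

# Toy stand-in for df_customer_purchases; column names are hypothetical.
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4],
    "city": ["Chicago", "Chicago", "Chicago", "Springfield", "Peoria"],
    "total_amount": [120.0, 80.0, 300.0, 50.0, 45.0],
})

# One row per city: distinct customers, number of purchases, revenue, average ticket.
city_summary = (
    toy.groupby("city")
    .agg(
        customers=("customer_id", "nunique"),
        purchases=("customer_id", "count"),
        revenue=("total_amount", "sum"),
        avg_ticket=("total_amount", "mean"),
    )
    .sort_values("revenue", ascending=False)
)
print(city_summary)
```

Grouping by city rather than state matters here because every row is already from Illinois, so a state-level aggregate would collapse to a single line.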

## 4. Detect outliers with Data Science Agent's help

Data quality is critical for accurate modeling. Use the Data Science Agent to scan the `df_customer_purchases` DataFrame and identify any statistical outliers that could skew your results.
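One common way to flag statistical outliers is the interquartile range (IQR) rule, which the agent may or may not choose. The sketch below applies it to a toy price column; `unit_price` and the helper `iqr_outlier_mask` are hypothetical names used only for illustration.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy data; in the lab this would be a numeric column of df_customer_purchases.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 500.0], name="unit_price")
mask = iqr_outlier_mask(prices)
print(prices[mask])
```

The multiplier `k=1.5` is the conventional default; a larger value such as 3 flags only extreme points, which may be preferable for heavily skewed spend data.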

## 5. Detect missing values with Data Science Agent's help

Missing data can break your downstream pipelines. Prompt the Agent to detect missing values within `df_customer_purchases`. Additionally, the Agent should suggest appropriate replacement (imputation) values based on the context of the data.
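As a simple baseline for this step, missing values can be counted per column and then filled with the median for numeric columns and the most frequent value for everything else. This is only one possible strategy, and the agent may recommend something different; the toy columns below are assumptions.

```python
import pandas as pd

# Toy stand-in for df_customer_purchases with gaps; column names are hypothetical.
toy = pd.DataFrame({
    "city": ["Chicago", None, "Chicago", "Peoria"],
    "unit_price": [10.0, None, 30.0, 20.0],
})

# Step 1: detect. Count missing entries per column.
missing_counts = toy.isna().sum()
print(missing_counts)

# Step 2: impute. Median for numeric columns, mode for the rest.
# Note: a column that is entirely missing has no mode and would need separate handling.
imputed = toy.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])
print(imputed)
```

The median is used instead of the mean so that the outliers found in the previous step do not pull the replacement value.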

## 6. Detect correlations between numerical features with Data Science Agent's help

Understand how your numerical variables interact. Use the Agent to calculate and display the correlations between all numerical features in your customer purchase data.
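In pandas, pairwise correlations between numerical features are typically computed with `DataFrame.corr()` after selecting the numeric columns. The sketch below shows the idea on toy data; `unit_price`, `quantity`, and `loyalty_points` are hypothetical column names.

```python
import pandas as pd

# Toy stand-in for df_customer_purchases; column names are hypothetical.
toy = pd.DataFrame({
    "city": ["Chicago", "Peoria", "Chicago", "Springfield"],
    "unit_price": [10.0, 20.0, 30.0, 40.0],
    "quantity": [4, 3, 2, 1],
    "loyalty_points": [100, 200, 300, 400],
})

# Keep only numeric columns, then compute the Pearson correlation matrix.
corr = toy.select_dtypes(include="number").corr()
print(corr.round(2))
```

Pearson correlation only captures linear relationships; passing `method="spearman"` to `corr()` is a reasonable alternative when the relationship is monotonic but not linear.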

## 7. Explore the distribution of customers with Data Science Agent's help

You need to visualize the geographic and financial spread of your customers. Use the Agent and the data in the `df_customer_purchases` DataFrame to generate Pie Charts showing:

 * The distribution of customers by City.

 * The distribution of customers categorized by their Total Spend.
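The two pie charts above could be produced along these lines. This is a sketch on toy data: the columns `city` and `total_amount` are assumptions, and the spend bands (up to 100, up to 500, above 500) are arbitrary thresholds chosen for illustration, not values from the lab.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for df_customer_purchases; column names are hypothetical.
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4, 5],
    "city": ["Chicago", "Chicago", "Chicago", "Peoria", "Peoria", "Springfield"],
    "total_amount": [60.0, 70.0, 40.0, 300.0, 900.0, 20.0],
})

# Distinct customers per city.
customers_by_city = toy.groupby("city")["customer_id"].nunique()

# Total spend per customer, bucketed into illustrative bands.
spend_per_customer = toy.groupby("customer_id")["total_amount"].sum()
spend_band = pd.cut(
    spend_per_customer,
    bins=[0, 100, 500, float("inf")],
    labels=["Low", "Medium", "High"],
)
customers_by_band = spend_band.value_counts().sort_index()

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].pie(customers_by_city.values, labels=list(customers_by_city.index), autopct="%1.0f%%")
axes[0].set_title("Customers by city")
axes[1].pie(customers_by_band.values, labels=list(customers_by_band.index), autopct="%1.0f%%")
axes[1].set_title("Customers by total spend band")
fig.tight_layout()
```

Counting distinct `customer_id` values rather than rows avoids over-weighting customers who made many purchases.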

## 8. Visualizations

Create a final visual report for stakeholders. Use the Agent to generate a suite of visualizations including histograms, box plots, scatter plots, and bar charts. These should highlight the distribution of individual variables and the relationships between them.

**Context:** Remember the scope is limited to cities within Illinois, USA. Keep the code efficient (under 95 lines).
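A compact reporting block that stays well under the 95-line limit might combine the four plot types in a single 2x2 figure. The sketch below uses toy data with hypothetical columns (`city`, `unit_price`, `quantity`); it is one possible layout, not the output the agent is guaranteed to produce.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for df_customer_purchases; column names are hypothetical.
toy = pd.DataFrame({
    "city": ["Chicago", "Chicago", "Peoria", "Peoria", "Springfield", "Springfield"],
    "unit_price": [100.0, 250.0, 80.0, 120.0, 60.0, 400.0],
    "quantity": [2, 1, 3, 2, 4, 1],
})
toy["line_total"] = toy["unit_price"] * toy["quantity"]

cities = sorted(toy["city"].unique())
revenue = toy.groupby("city")["line_total"].sum().sort_values(ascending=False)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Histogram: distribution of a single numeric variable.
axes[0, 0].hist(toy["unit_price"], bins=5)
axes[0, 0].set_title("Unit price distribution")

# Box plot: price spread per city.
axes[0, 1].boxplot([toy.loc[toy["city"] == c, "unit_price"].to_numpy() for c in cities])
axes[0, 1].set_xticks(range(1, len(cities) + 1))
axes[0, 1].set_xticklabels(cities)
axes[0, 1].set_title("Unit price by city")

# Scatter plot: relationship between two numeric variables.
axes[1, 0].scatter(toy["unit_price"], toy["quantity"])
axes[1, 0].set_xlabel("Unit price")
axes[1, 0].set_ylabel("Quantity")
axes[1, 0].set_title("Price vs. quantity")

# Bar chart: revenue per city, highest first.
axes[1, 1].bar(list(revenue.index), revenue.values)
axes[1, 1].set_title("Revenue by city")

fig.tight_layout()
```

Putting all four plots on one figure keeps the block short and gives stakeholders a single image to scan.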

#### Task Validation

Task Complete! 
> **Note:** After the pipeline has run and you have reviewed the results, go back to the lab guide page and click **Check my progress** for **AT ID: 7103 Perform exploratory data analysis with a Data Science Agent**.