# Exercises: Advanced Data Visualization


## **Instructions:**
This notebook is divided into 3 parts corresponding to the lecture sections.
Please execute the setup cell below first.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy import stats

# Dataset for exercises
# We will use the 'diamonds' dataset. It is large (about 54,000 rows), which makes it
# well suited to testing overplotting and distribution visualization.
diamonds = sns.load_dataset("diamonds")
diamonds_sample = diamonds.sample(1000, random_state=42) # Smaller sample for some plots

print("Setup Complete. Dataset 'diamonds' loaded.")
diamonds.head()


## Part 1: Fundamentals (Exercises 1-3)
## *Topics: Aesthetics, Titles, and Saving Figures*

### **Exercise 1:**
Set the global seaborn theme to "whitegrid". Create a simple histogram (`histplot`) of the `price` column using the `diamonds` dataset.
*Hint: Use `sns.set_theme()`.*

In [None]:
# YOUR CODE HERE

### **Exercise 2:**
The following plot is messy. Improve it by:
1. Removing the top and right spines (`sns.despine()`).
2. Adding a clear title.
3. Adding proper units to the X and Y axis labels (price is in USD; carat is a unit of weight, 1 carat = 200 mg).

In [None]:
# Code to improve:
sns.scatterplot(data=diamonds_sample, x="carat", y="price")


### **Exercise 3:**
Re-create the plot from Exercise 2 and save it as a High-Resolution PNG file (300 DPI).
*Hint: Use `plt.savefig()` with the `dpi` argument, and call it before `plt.show()`.*

In [None]:
# YOUR CODE HERE
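
The clean-up and saving steps can be sketched on synthetic data as follows. The `toy` DataFrame, the label text, and the output path `toy_scatter.png` are hypothetical choices for illustration. `fig.savefig` is used here; `plt.savefig` does the same for the current figure.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (not the diamonds dataset)
rng = np.random.default_rng(1)
toy = pd.DataFrame({"x": rng.uniform(0, 3, 200)})
toy["y"] = 2.0 * toy["x"] + rng.normal(0, 0.5, 200)

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=toy, x="x", y="y", ax=ax)
sns.despine(ax=ax)  # removes the top and right spines
ax.set_title("Toy scatterplot")
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")

# Hypothetical output location; dpi=300 sets the raster resolution of the PNG
out_path = os.path.join(tempfile.gettempdir(), "toy_scatter.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```

Passing `bbox_inches="tight"` is optional; it trims surrounding whitespace so labels are not cut off.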

## Part 2: Multivariate Exploration (Exercises 4-6)
## *Topics: Mapping Semantics, Overplotting, and Faceting*

### **Exercise 4:**
Create a multivariate scatterplot using `diamonds_sample` (the smaller dataset).
Map the following variables:
* x-axis: `carat`
* y-axis: `price`
* color (`hue`): `cut`
* size (`size`): `depth`
 
Ensure the points are semi-transparent (`alpha`) so we can see overlaps.

In [None]:
# YOUR CODE HERE
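
For reference, the mapping of several variables onto one scatterplot might look like the sketch below. It uses synthetic data; the column names `x`, `y`, `group`, and `weight` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (not the diamonds dataset)
rng = np.random.default_rng(2)
n = 300
toy = pd.DataFrame({
    "x": rng.uniform(0.2, 3.0, n),
    "group": rng.choice(["a", "b", "c"], n),
    "weight": rng.uniform(55, 70, n),
})
toy["y"] = 1000 * toy["x"] ** 2 + rng.normal(0, 500, n)

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(
    data=toy, x="x", y="y",
    hue="group",                     # categorical variable mapped to color
    size="weight", sizes=(10, 120),  # numeric variable mapped to marker area
    alpha=0.5,                       # semi-transparent so overlaps stay visible
    ax=ax,
)
# Place the combined hue/size legend outside the plotting area
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
```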

### **Exercise 5 (The Spaghetti Solution):**
The plot in Exercise 4 might be cluttered.
Instead of using hue for `cut`, create a **Faceted** plot where each `cut` gets its own subplot.
*Hint: Use `sns.relplot()` with `col="cut"`.*

In [None]:
# YOUR CODE HERE

### **Exercise 6:**
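Re-do the plot from Exercise 5, but use a **colorblind-friendly** palette (e.g., "viridis" or "colorblind").
A palette only takes effect when a variable is mapped to `hue`, so map one as well (for example `hue="cut"`).
Also, wrap the columns (`col_wrap`) so the charts don't form one long horizontal row.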

In [None]:
# YOUR CODE HERE
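
A faceted grid with wrapped columns and a colorblind-friendly palette can be sketched like this on synthetic data. The `toy` DataFrame and its five `group` levels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data with five categories (not the diamonds dataset)
rng = np.random.default_rng(3)
n = 400
toy = pd.DataFrame({
    "x": rng.uniform(0, 3, n),
    "group": rng.choice(["a", "b", "c", "d", "e"], n),
})
toy["y"] = 2 * toy["x"] + rng.normal(0, 1, n)

levels = sorted(toy["group"].unique())
g = sns.relplot(
    data=toy, x="x", y="y",
    col="group", col_order=levels, col_wrap=3,  # at most 3 facets per row
    hue="group", hue_order=levels,              # hue is needed for the palette to apply
    palette="colorblind",
    height=2.5, alpha=0.6, legend=False,        # facet titles already name each group
)
g.set_titles("group = {col_name}")
```

With five categories and `col_wrap=3`, the grid lays out as a row of three facets followed by a row of two.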

## Part 3: Distributions & Uncertainty (Exercises 7-10)
## *Topics: Box/Violin Plots, Rainclouds, Q-Q Plots, Regression*

### **Exercise 7:**
Compare the distribution of `price` across different `color` grades.
Since the dataset is large, a standard boxplot hides too much detail.
Create a **Boxen Plot** (`sns.boxenplot`) to show the tails of the distribution better.

In [None]:
# YOUR CODE HERE

### **Exercise 8 (Advanced):**
Create a **Raincloud Plot** for `price` vs `cut`.
1. Plot a Violin plot (make it light gray or transparent).
2. Overlay a Strip plot (jittered points) on top of it.
3. (Optional) Overlay a Boxplot (empty face color) on top of that.

In [None]:
# YOUR CODE HERE
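
The three layers listed above can be stacked on a single `Axes`, as in this sketch on synthetic data. The `toy` DataFrame is an assumption for illustration, and the styling values (`size`, `jitter`, `width`) are arbitrary starting points rather than recommended settings.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (not the diamonds dataset)
rng = np.random.default_rng(5)
toy = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 150),
    "value": np.concatenate([
        rng.normal(0, 1.0, 150),
        rng.normal(1, 1.5, 150),
        rng.normal(2, 0.7, 150),
    ]),
})

fig, ax = plt.subplots(figsize=(7, 4))

# Layer 1: density shape, drawn light and without inner marks
sns.violinplot(data=toy, x="group", y="value",
               color="lightgray", inner=None, linewidth=0, ax=ax)

# Layer 2: the raw observations, jittered horizontally
sns.stripplot(data=toy, x="group", y="value",
              color="black", size=2, alpha=0.4, jitter=0.2, ax=ax)

# Layer 3: a narrow boxplot with an empty face for the summary statistics
sns.boxplot(data=toy, x="group", y="value", width=0.15, showfliers=False,
            boxprops={"facecolor": "none"}, ax=ax)
```

All three calls share the same `ax`, which is what makes them overlay instead of creating separate figures.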

### **Exercise 9:**
We want to check if the `carat` variable follows a Normal Distribution.
Create a **Q-Q Plot** using `scipy.stats.probplot` to visually check this assumption.
Does the data fall perfectly on the red line?

In [None]:
# YOUR CODE HERE
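
As a point of comparison, this sketch shows what a Q-Q plot looks like for data that is known not to be normal. The `skewed` array is synthetic lognormal data invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic right-skewed sample (not the diamonds dataset)
rng = np.random.default_rng(6)
skewed = rng.lognormal(mean=0.0, sigma=0.6, size=500)

fig, ax = plt.subplots(figsize=(5, 5))
# probplot plots ordered sample values against theoretical normal quantiles
# and adds a least-squares reference line
(osm, osr), (slope, intercept, r) = stats.probplot(skewed, dist="norm", plot=ax)
ax.set_title("Q-Q plot of synthetic right-skewed data")

print(f"Correlation between the points and the fitted line: r = {r:.3f}")
```

Points that curve away from the reference line at the ends indicate tails that are heavier or lighter than a normal distribution would produce.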


### **Exercise 10:**
Visualize the relationship between `carat` and `price` using a Linear Regression model (`sns.regplot` or `sns.lmplot`).
* Use `diamonds_sample`.
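* Make the scatter points transparent (`alpha=0.2`). Both functions take this through `scatter_kws`, not as a direct `alpha` argument.
* Make the regression line a contrasting color (e.g., red) by passing it through `line_kws`.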

In [None]:
# YOUR CODE HERE
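
The styling of the points and the line is controlled separately, as this sketch on synthetic data shows. The `toy` DataFrame is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic linear data with noise (not the diamonds dataset)
rng = np.random.default_rng(7)
toy = pd.DataFrame({"x": rng.uniform(0, 3, 300)})
toy["y"] = 3 * toy["x"] + rng.normal(0, 1, 300)

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(
    data=toy, x="x", y="y",
    scatter_kws={"alpha": 0.2},  # forwarded to the scatter layer
    line_kws={"color": "red"},   # forwarded to the regression line
    ax=ax,
)
ax.set_title("Linear fit on synthetic data")
```

The shaded band around the line is a bootstrapped confidence interval for the fit, which `regplot` draws by default.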