# Answers

## A1 – Bill Scatter Overview

_Load seaborn's `penguins` dataset, set a colorblind theme, and plot bill length vs bill depth colored by species._

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style='whitegrid', palette='colorblind')

penguins = sns.load_dataset('penguins')
fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm', hue='species', ax=ax, alpha=0.8)
ax.set_title('Bill Measurements by Species')
plt.tight_layout()
plt.show()
# Observation: Gentoo penguins cluster with long bills but shallow depths compared with Adelie and Chinstrap birds.


## A2 – Facet Scatter

_Create a facet grid of flipper length vs body mass, splitting columns by sex and coloring points by species._

In [None]:
g = sns.relplot(
    data=penguins,
    x='flipper_length_mm',
    y='body_mass_g',
    hue='species',
    col='sex',
    kind='scatter',
    height=4,
    aspect=1
)
g.set_titles('{col_name}')
plt.show()
# Observation: Male penguins tend to be heavier for a given flipper length across every species.


## A3 – Markers by Island

_Draw a scatter of bill length vs body mass using different markers per island and position the legend outside the plot._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    data=penguins,
    x='bill_length_mm',
    y='body_mass_g',
    style='island',
    hue='species',
    ax=ax
)
ax.legend(title='Species / Island', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.set_title('Body Mass by Bill Length and Island')
plt.tight_layout()
plt.show()
# Observation: Biscoe island birds dominate the heavier mass range, largely because every Gentoo in the dataset was recorded on Biscoe.


## A4 – Overlapping Histograms

_Plot overlapping histograms of `flipper_length_mm` for each species using transparency and contrasting edge colors._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
for species, df_subset in penguins.groupby('species'):
    ax.hist(
        df_subset['flipper_length_mm'].dropna(),
        bins=20,
        alpha=0.5,
        edgecolor='black',
        label=species
    )
ax.set_xlabel('Flipper Length (mm)')
ax.set_ylabel('Count')
ax.legend(title='Species')
ax.set_title('Flipper Length Distribution by Species')
plt.tight_layout()
plt.show()
# Observation: Gentoo flippers skew longest, shifting their histogram noticeably to the right.


## A5 – Bill Depth Density

_Create shaded KDE curves for bill depth by species on shared axes, including a vertical line at the overall median._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
for species, df_subset in penguins.groupby('species'):
    sns.kdeplot(df_subset['bill_depth_mm'], ax=ax, fill=True, alpha=0.4, label=species)
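# Mark the median across all species as a dashed reference line.
ax.axvline(penguins['bill_depth_mm'].median(), color='black', linestyle='--', linewidth=1, label='Overall median')
ax.set_title('Bill Depth Density by Species')
ax.set_xlabel('Bill Depth (mm)')
# The labels passed above only appear once a legend is requested.
ax.legend()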
plt.tight_layout()
plt.show()
# Observation: Adelie and Chinstrap bills concentrate on the deep side of the overall median line, while Gentoo bills sit well below it.
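
For readers curious about what a KDE curve represents, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy only. The helper name `gaussian_kde_sketch`, the sample values, and the fixed bandwidth of 0.5 are all assumptions for illustration; seaborn chooses its bandwidth automatically, so its curves will not match this one exactly.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on every grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the scaled distance from grid point i to sample j.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (samples.size * bandwidth)

# Hypothetical bill depths in mm, invented for illustration.
samples = [17.0, 18.0, 18.5, 19.0]
grid = np.linspace(10, 26, 801)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.5)

# A density should integrate to roughly 1 over a wide enough grid.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual samples more closely.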


## A6 – Violin with Means

_Plot body mass by species using a violin plot with inner quartile boxes and annotate the mean mass atop each violin._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
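mean_masses = penguins.groupby('species')['body_mass_g'].mean()
# Pass the same order to the plot so each label lines up with its violin.
sns.violinplot(data=penguins, x='species', y='body_mass_g', inner='box', order=list(mean_masses.index), ax=ax)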
for idx, (species, mean_val) in enumerate(mean_masses.items()):
    ax.text(idx, mean_val + 50, f'{mean_val:.0f}', ha='center', va='bottom', fontsize=9)
ax.set_title('Body Mass by Species')
plt.tight_layout()
plt.show()
# Observation: Gentoo penguins average near 5,000 grams, towering over the other species.
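
Annotating by position relies on the grouped means being in the same order as the plotted categories. The sketch below, on a hypothetical two-species table, shows how the two orders can differ and how `reindex` lines them up. It assumes the plot draws string categories in order of appearance when no `order` is given.

```python
import pandas as pd

# Hypothetical rows, invented for illustration; Gentoo appears first.
toy = pd.DataFrame({
    'species': ['Gentoo', 'Adelie', 'Gentoo', 'Adelie'],
    'body_mass_g': [5000.0, 3600.0, 5200.0, 3800.0],
})

plot_order = list(toy['species'].unique())             # order of appearance
means = toy.groupby('species')['body_mass_g'].mean()   # sorted alphabetically

# Reorder the means so position i matches category i on the x-axis.
aligned = means.reindex(plot_order)

print(list(means.index))    # ['Adelie', 'Gentoo']
print(list(aligned.index))  # ['Gentoo', 'Adelie']
```

Looping over `aligned.items()` with `enumerate` then yields x positions that agree with the plot.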


## A7 – Pairplot Tweaks

_Construct a pairplot over numeric penguin features with species hue, smaller markers, and histograms on the diagonal._

In [None]:
sns.pairplot(
    data=penguins,
    vars=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'],
    hue='species',
    plot_kws={'s': 30, 'alpha': 0.7},
    diag_kind='hist'
)
plt.show()
# Observation: Species clusters are clearly separated on bill depth versus bill length axes.


## A8 – Correlation Heatmap

_Compute the correlation matrix of numeric penguin features and display it with a diverging heatmap and annotations._

In [None]:
corr = penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']].corr()
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, fmt='.2f', ax=ax)
ax.set_title('Penguin Feature Correlations')
plt.tight_layout()
plt.show()
# Observation: Flipper length and body mass exhibit the strongest positive relationship.
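
Reading the strongest pair off a heatmap by eye works for four features, but it can also be done programmatically. The following is a sketch on synthetic data; the column names `a`, `b`, `c` are placeholders rather than penguin measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the numeric penguin columns.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    'a': base,
    'b': base * 2 + rng.normal(scale=0.1, size=200),  # closely tied to 'a'
    'c': rng.normal(size=200),                         # unrelated noise
})

corr = toy.corr()

# Keep only the upper triangle (k=1 excludes the diagonal of 1.0 values),
# then flatten into a Series indexed by (row, column) pairs.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna()

strongest_pair = pairs.idxmax()
print(strongest_pair, round(pairs.max(), 2))
```

To rank by strength regardless of sign, apply `.abs()` to `pairs` before calling `idxmax`.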


## A9 – Dual-Axis Trend

_Using the `flights` dataset, plot monthly passengers on the primary axis and a 12-month rolling sum on the secondary axis._

In [None]:
flights = sns.load_dataset('flights')
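# Month values are abbreviations such as 'Jan', so state the format explicitly.
flights['date'] = pd.to_datetime(flights['year'].astype(str) + '-' + flights['month'].astype(str), format='%Y-%b')
flights = flights.set_index('date').sort_index()
# Require a full 12 months so the first 11 values are NaN instead of partial sums.
flights['rolling_year'] = flights['passengers'].rolling(12).sum()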

fig, ax1 = plt.subplots(figsize=(8, 4))
ax2 = ax1.twinx()
ax1.plot(flights.index, flights['passengers'], color='tab:blue', label='Monthly passengers')
ax2.plot(flights.index, flights['rolling_year'], color='tab:orange', label='12-month sum')
ax1.set_ylabel('Passengers')
ax2.set_ylabel('Rolling annual passengers')
ax1.set_title('Monthly vs Rolling Annual Passenger Counts')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.tight_layout()
plt.show()
# Observation: The rolling series smooths spikes yet mirrors the broader upward trajectory.
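
How `min_periods` shapes the start of a rolling sum is easiest to see on a toy series. The values below are invented, and a window of 3 stands in for the 12-month window.

```python
import pandas as pd

# Hypothetical toy series: five months with 10 passengers each.
s = pd.Series([10, 10, 10, 10, 10])

partial = s.rolling(3, min_periods=1).sum()  # sums whatever is available so far
full = s.rolling(3).sum()                    # NaN until three values are present

print(partial.tolist())  # [10.0, 20.0, 30.0, 30.0, 30.0]
print(full.tolist())     # [nan, nan, 30.0, 30.0, 30.0]
```

With `min_periods=1` the early values ramp up even though the underlying series is flat, which can read as growth on a chart.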


## A10 – Area Chart

_Plot a filled area chart of normalized monthly passengers (divided by max) from the flights data._

In [None]:
normalized = flights['passengers'] / flights['passengers'].max()
fig, ax = plt.subplots(figsize=(8, 3.5))
ax.fill_between(flights.index, normalized, color='steelblue', alpha=0.6)
ax.set_ylabel('Share of Max Passengers')
ax.set_title('Normalized Monthly Passenger Volume')
plt.tight_layout()
plt.show()
# Observation: Seasonal peaks recur annually while steadily creeping closer to 100% of the max.


## A11 – Peak Annotation

_Add an annotation for the month with the highest passenger count showing the value on the flights line series._

In [None]:
max_idx = flights['passengers'].idxmax()
max_val = flights['passengers'].max()
fig, ax = plt.subplots(figsize=(8, 3.5))
ax.plot(flights.index, flights['passengers'], color='tab:blue')
ax.scatter([max_idx], [max_val], color='red')
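# Offset the label down and to the left so it stays inside the axes near the right edge.
ax.annotate(
    f'{max_val} passengers',
    xy=(max_idx, max_val),
    xytext=(-80, -10),
    textcoords='offset points',
    ha='right',
    arrowprops=dict(arrowstyle='->')
)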
ax.set_ylabel('Passengers')
ax.set_title('Monthly Passenger Counts with Peak Highlighted')
plt.tight_layout()
plt.show()
# Observation: The peak lands near mid-1960 as the series culminates.


## A12 – Load Tips Dataset

_Load seaborn's `tips` dataset for the remaining plotting tasks and preview the first rows._

In [None]:
tips = sns.load_dataset('tips')
tips.head()
# Observation: The table captures meal context, bill size, and gratuities across several days.


## A13 – Stacked Bar Chart

_Create a stacked bar chart of total tips by day and smoker status with custom colors and an external legend._

In [None]:
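# `day` and `smoker` are categorical; observed=False keeps every category combination.
tips_summary = tips.groupby(['day', 'smoker'], observed=False)['tip'].sum().unstack(fill_value=0)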
colors = ['#1b9e77', '#d95f02']
fig, ax = plt.subplots(figsize=(6, 4))
tips_summary.plot(kind='bar', stacked=True, color=colors, ax=ax)
ax.set_ylabel('Total Tips ($)')
ax.set_title('Tips by Day and Smoker Status')
ax.legend(title='Smoker', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Observation: Saturday and Sunday account for the bulk of tipping volume, especially among non-smokers.
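
The stacked bars depend on reshaping grouped totals from long to wide form. A small sketch of that step, using hypothetical rows rather than the real tips data:

```python
import pandas as pd

# Hypothetical rows, invented for illustration.
toy = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sun', 'Sun', 'Sun'],
    'smoker': ['Yes', 'No', 'No', 'No', 'Yes'],
    'tip': [2.0, 3.0, 1.5, 2.5, 4.0],
})

# Long form: one total per (day, smoker) pair.
long_totals = toy.groupby(['day', 'smoker'])['tip'].sum()

# Wide form: one row per day, one column per smoker status.
wide_totals = long_totals.unstack(fill_value=0)
print(wide_totals)
```

Each row of the wide table becomes one bar, and each column becomes one stacked segment.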


## A14 – Boxen and Swarm

_Combine a boxen plot of total bill by day with an overlaid swarm plot colored by meal time._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
sns.boxenplot(data=tips, x='day', y='total_bill', ax=ax, color='lightgray')
sns.swarmplot(data=tips, x='day', y='total_bill', hue='time', dodge=True, size=4, ax=ax)
ax.set_title('Total Bill Distribution by Day')
ax.legend(title='Meal Time', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Observation: Dinner bills stretch higher, particularly on Saturday nights.


## A15 – Manual Facet Histograms

_Use matplotlib subplots to plot histograms of total bill for each day (2×2 grid) with shared axes._

In [None]:
# Use the categorical order (Thur, Fri, Sat, Sun) rather than order of appearance.
days = tips['day'].cat.categories
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
axes = axes.flatten()
for ax, day in zip(axes, days):
    subset = tips.loc[tips['day'] == day, 'total_bill']
    ax.hist(subset, bins=15, color='skyblue', edgecolor='black')
    ax.set_title(day)
fig.suptitle('Total Bill by Day', fontsize=14)
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()
# Observation: Saturday distributions appear widest, underscoring weekend spending.


## A16 – Tip Percentage Catplot

_Calculate tip percentage and draw a seaborn catplot (pointplot) of average tip percentage by day for each meal time._

In [None]:
# Multiply by 100 so the values are percentages, matching the axis labels used later.
tips = tips.assign(tip_pct=tips['tip'] / tips['total_bill'] * 100)
sns.catplot(
    data=tips,
    x='day',
    y='tip_pct',
    hue='time',
    kind='point',
    dodge=True,
    markers=['o', 's'],
    capsize=0.1,
    height=4,
    aspect=1.2
)
plt.title('Average Tip Percentage by Day and Meal')
plt.show()
# Observation: Lunch and dinner records overlap only on Thursday and Friday (every weekend record is a dinner), so the meal-time comparison rests on a handful of points.
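
A point plot shows means and intervals but not how many rows sit behind each point. Pairing each mean with its group size, as in this sketch on a hypothetical miniature table, helps judge how much weight a comparison can bear.

```python
import pandas as pd

# Hypothetical miniature of the tips table (values invented for illustration).
toy = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'time': ['Lunch', 'Lunch', 'Lunch', 'Dinner', 'Dinner', 'Dinner'],
    'tip_pct': [15.0, 17.0, 18.0, 14.0, 16.0, 20.0],
})

# One row per (day, time) pair that actually occurs, with mean and row count.
summary = toy.groupby(['day', 'time'])['tip_pct'].agg(['mean', 'count'])
print(summary)
```

Pairs that never occur, such as a Saturday lunch in this toy table, are absent from the result rather than shown as zero.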


## A17 – Joint Regression Plot

_Create a seaborn jointplot of total bill vs tip with a regression line and marginal histograms._

In [None]:
sns.jointplot(data=tips, x='total_bill', y='tip', kind='reg', height=5)
plt.show()
# Observation: Tips rise roughly linearly with the total bill, though the scatter around the fit line widens for larger bills.


## A18 – ECDF Plot

_Plot an empirical cumulative distribution function (ECDF) of total bill amounts grouped by meal time._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
for meal, subset in tips.groupby('time'):
    sns.ecdfplot(subset['total_bill'], ax=ax, label=meal)
ax.set_xlabel('Total Bill ($)')
ax.set_ylabel('ECDF')
ax.set_title('Total Bill ECDF by Meal Time')
ax.legend(title='Meal Time')
plt.tight_layout()
plt.show()
# Observation: Lunch bills accumulate faster at lower amounts, confirming dinners run pricier.
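
An ECDF can also be computed by hand, which makes clear what the curve measures: the share of observations at or below each value. The helper `ecdf` and the sample amounts below are illustrative assumptions.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the proportion of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Hypothetical bill amounts, not taken from the tips data.
x, y = ecdf([12.0, 8.0, 20.0, 15.0])
print(x.tolist())  # [8.0, 12.0, 15.0, 20.0]
print(y.tolist())  # [0.25, 0.5, 0.75, 1.0]
```

A curve that climbs sooner, further to the left, belongs to the group with smaller values.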


## A19 – Hexbin Density

_Use matplotlib's `hexbin` to visualize the density of total bill vs tip values with a colorbar._

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
hb = ax.hexbin(tips['total_bill'], tips['tip'], gridsize=25, cmap='viridis')
cb = fig.colorbar(hb, ax=ax)
cb.set_label('Count')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
ax.set_title('Tip vs Total Bill Density')
plt.tight_layout()
plt.show()
# Observation: The densest concentration sits around $10–$20 bills with $2–$4 tips.
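
Locating the densest region can be done numerically instead of by eye. The sketch below swaps hexagons for rectangular bins via `np.histogram2d`, so its bins will not match a hexbin plot exactly, and it uses synthetic data rather than the tips table.

```python
import numpy as np

# Synthetic stand-ins for bill and tip amounts (assumed distributions).
rng = np.random.default_rng(1)
x = rng.normal(15, 3, size=500)
y = rng.normal(3, 0.5, size=500)

# Count points in a 10x10 grid of rectangular bins.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=10)

# Convert the flat position of the largest count into (row, column) indices.
i, j = np.unravel_index(np.argmax(counts), counts.shape)
print(f'x in [{x_edges[i]:.1f}, {x_edges[i + 1]:.1f}], '
      f'y in [{y_edges[j]:.1f}, {y_edges[j + 1]:.1f}], '
      f'count = {int(counts[i, j])}')
```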


## A20 – Style Context

_Within a seaborn style context (`darkgrid`), plot the average tip percentage by day to demonstrate temporary styling._

In [None]:
with sns.axes_style('darkgrid'):
    fig, ax = plt.subplots(figsize=(6, 3.5))
    tip_pct_by_day = tips.groupby('day')['tip_pct'].mean().reindex(['Thur', 'Fri', 'Sat', 'Sun'])
    tip_pct_by_day.plot(kind='bar', color='tab:purple', ax=ax)
    ax.set_ylabel('Average Tip %')
    ax.set_title('Tip Percentage by Day (Darkgrid Style)')
plt.tight_layout()
plt.show()
# Observation: Styling contexts offer quick aesthetic swaps without altering global settings.
