# Answers

## A1 – Load Flights Dataset

_Import the `flights` dataset from seaborn, report its shape, and preview the first rows._

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

sns.set_theme(style='whitegrid')

flights = sns.load_dataset('flights')
print(f'Shape: {flights.shape}')
flights.head()
# Observation: The flights table holds 144 monthly passenger counts (12 months × 12 years, January 1949 to December 1960) in three columns: year, month, passengers.


## A2 – Ordered Months

_Ensure the `month` column is an ordered categorical variable following calendar order, then display its categories._

In [None]:
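# List the abbreviations explicitly so the order never depends on row order in the file
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
flights['month'] = pd.Categorical(flights['month'], categories=month_order, ordered=True)
flights['month'].cat.categories
# Observation: Enforcing month order keeps months in calendar (not alphabetical) order in sorts, pivots, and plots.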
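
To see why the ordering matters, here is a minimal sketch on hypothetical toy rows (not the seaborn data): plain strings sort alphabetically, while an ordered categorical sorts, and even compares, by calendar position.

```python
import pandas as pd

# Hypothetical toy rows, deliberately out of calendar order
df = pd.DataFrame({'month': ['Mar', 'Jan', 'Feb'], 'passengers': [132, 112, 118]})

# Plain strings sort alphabetically: Feb, Jan, Mar
alpha = df.sort_values('month')['month'].tolist()

# An ordered categorical sorts by category position instead
calendar = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
            'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df['month'] = pd.Categorical(df['month'], categories=calendar, ordered=True)
ordered = df.sort_values('month')['month'].tolist()

# Ordered categoricals also support comparisons against a category
before_march = df.loc[df['month'] < 'Mar', 'month'].tolist()
print(alpha, ordered, before_march)
```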


## A3 – Monthly Pivot

_Pivot the data into a matrix of `month` by `year` showing passenger counts, naming the result `flights_pivot`._

In [None]:
flights_pivot = flights.pivot(index='month', columns='year', values='passengers')
flights_pivot.head()
# Observation: Each row follows one month across 1949–1960, and the counts broadly rise from year to year in every month.
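
`pivot` only reshapes; it cannot combine repeated `(month, year)` pairs. The sketch below, on hypothetical toy rows, contrasts it with `pivot_table`, which aggregates duplicates (the mean is an arbitrary choice here).

```python
import pandas as pd

# Hypothetical toy rows in the same long layout as `flights`
long = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'year': [1949, 1949, 1950, 1950],
    'passengers': [112, 118, 115, 126],
})

# pivot works because every (month, year) pair occurs exactly once
wide = long.pivot(index='month', columns='year', values='passengers')

# A duplicated (month, year) pair makes pivot raise a ValueError ...
dup = pd.concat(
    [long, pd.DataFrame({'month': ['Jan'], 'year': [1949], 'passengers': [100]})],
    ignore_index=True,
)
try:
    dup.pivot(index='month', columns='year', values='passengers')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# ... whereas pivot_table aggregates the duplicates (here with the mean)
agg = dup.pivot_table(index='month', columns='year', values='passengers', aggfunc='mean')
print(wide, agg, sep='\n\n')
```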


## A4 – Monthly Totals

_Add a `total` column to `flights_pivot` representing the sum across all years for each month._

In [None]:
flights_pivot['total'] = flights_pivot.sum(axis=1)
flights_pivot.head()
# Observation: Summer months, July and August above all, carry the largest totals over the full horizon.
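
An alternative sketch, not the approach used above: `pivot_table` with `margins=True` produces per-month totals and a per-year total row in one call. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows in the long `flights` layout
long = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'year': [1949, 1949, 1950, 1950],
    'passengers': [112, 118, 115, 126],
})

# margins=True appends a row-total column and a column-total row in one step
wide = long.pivot_table(
    index='month', columns='year', values='passengers',
    aggfunc='sum', margins=True, margins_name='total',
)
print(wide)
```

The trade-off is an extra `total` row, which would also have to be dropped before any year-over-year calculation.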


## A5 – Year-over-Year Growth

_Compute the year-over-year percentage change across columns in `flights_pivot` (excluding `total`) and store it as `flights_growth`._

In [None]:
growth_only = flights_pivot.drop(columns=['total'])
flights_growth = growth_only.pct_change(axis=1) * 100
flights_growth.head()
# Observation: The 1949 column is all NaN because it has no prior year to compare against; later growth is positive in most cells but uneven, with 1958 noticeably flat.
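
`pct_change(axis=1)` divides each year's value by the previous year's and subtracts 1. A minimal check on a hypothetical one-row table, writing the same formula out with `shift`:

```python
import numpy as np
import pandas as pd

# Hypothetical one-row wide table: one month, three years
wide = pd.DataFrame({1949: [100], 1950: [110], 1951: [99]}, index=['Jan'])

# pct_change along columns compares each year with the previous one
builtin = wide.pct_change(axis=1) * 100

# The same calculation spelled out with shift: (current / previous - 1) * 100
manual = (wide / wide.shift(axis=1) - 1) * 100
print(manual)
```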


## A6 – Stacked View

_Stack `flights_pivot` (without the `total` column) back to a long index with `month` and `year`, naming the series `passengers`._

In [None]:
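# Drop the helper `total` column so only year columns are stacked into the index
stacked = flights_pivot.drop(columns='total').stack().rename('passengers')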
stacked.head()
# Observation: Stack reverses the pivot to a MultiIndex Series useful for chained reshapes.
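
Stacking and unstacking are inverses, and `reset_index` turns the stacked Series back into the long layout. A short round-trip sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows in the long `flights` layout
long = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'year': [1949, 1949, 1950, 1950],
    'passengers': [112, 118, 115, 126],
})
wide = long.pivot(index='month', columns='year', values='passengers')

# stack moves the `year` column level into the row index (a month/year MultiIndex)
stacked = wide.stack().rename('passengers')

# unstack reverses it, and reset_index returns to the original long columns
back_to_wide = stacked.unstack()
back_to_long = stacked.reset_index()
print(back_to_long)
```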


## A7 – Datetime Period

_Create a `date` column in the original `flights` frame combining `year` and `month` as the first day of the month, then set it as the index._

In [None]:
# '%Y-%b' parses strings like '1949-Jan'; the missing day defaults to the 1st
flights['date'] = pd.to_datetime(
    flights['year'].astype(str) + '-' + flights['month'].astype(str), format='%Y-%b'
)
flights = flights.set_index('date').sort_index()
flights.head()
# Observation: A datetime index unlocks convenient resampling for time-based aggregates.
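
Parsing `'1949-Jan'`-style strings is most predictable with an explicit format. The sketch below (hypothetical toy rows) compares that with assembling dates from numeric year/month/day columns, which sidesteps locale-dependent month names; the `month_number` mapping is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical year/month columns shaped like those in `flights`
parts = pd.DataFrame({'year': [1949, 1949, 1950], 'month': ['Jan', 'Feb', 'Dec']})

# Option 1: parse 'YYYY-Mon' strings with an explicit format (assumes English month abbreviations)
via_strings = pd.to_datetime(parts['year'].astype(str) + '-' + parts['month'], format='%Y-%b')

# Option 2: assemble from numeric components; month_number is a hypothetical helper mapping
abbrevs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
month_number = {m: i for i, m in enumerate(abbrevs, start=1)}
via_parts = pd.to_datetime(pd.DataFrame({
    'year': parts['year'],
    'month': parts['month'].map(month_number),
    'day': 1,
}))
print(via_strings.tolist())
```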


## A8 – Quarterly Aggregation

_Resample the flights data to quarterly frequency, summing passengers into `flights_q`._

In [None]:
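# pandas >= 2.2 prefers the quarter-end alias 'QE'; 'Q' still works there but emits a FutureWarning
flights_q = flights['passengers'].resample('Q').sum()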
flights_q.head()
# Observation: Quarterly aggregates smooth monthly volatility while retaining seasonal waves.
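
A minimal resampling sketch on a hypothetical six-month series. It uses `'QS'` (quarter start), so each bin is labeled with the quarter's first day; A8's `'Q'` yields the same sums labeled by quarter end.

```python
import pandas as pd

# Hypothetical monthly series: six month-start dates with values 1..6
monthly = pd.Series(range(1, 7), index=pd.date_range('1949-01-01', periods=6, freq='MS'))

# 'QS' groups by calendar quarter and labels each bin with the quarter's first day
quarterly = monthly.resample('QS').sum()
print(quarterly)
```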


## A9 – Quarter-on-Quarter Change

_Compute the quarter-on-quarter percentage change of `flights_q` and name the Series `qoq_change`._

In [None]:
qoq_change = flights_q.pct_change() * 100
qoq_change.head()
# Observation: Growth bounces between positive and negative as seasonal peaks and troughs alternate.
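
Because quarter-on-quarter change mixes growth with seasonality, one common complement is to compare each quarter with the same quarter a year earlier via `pct_change(periods=4)`. A sketch on a hypothetical series that repeats one seasonal shape with 10% annual growth:

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly series: a repeating seasonal shape, 10% higher in year two
base = [100, 120, 150, 110]
values = base + [v * 1.1 for v in base]
quarters = pd.Series(values, index=pd.date_range('1949-01-01', periods=8, freq='QS'))

# Quarter-on-quarter change swings with the seasonal pattern ...
qoq = quarters.pct_change() * 100
# ... while comparing with the same quarter one year earlier removes the swing
yoy = quarters.pct_change(periods=4) * 100
print(pd.DataFrame({'qoq': qoq, 'yoy': yoy}).round(1))
```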


## A10 – Rolling Annual Total

_Calculate a 12-month rolling sum of passengers in the monthly data and assign it to `flights['rolling_year']`._

In [None]:
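# min_periods=1 lets the first 11 rows report partial sums; remove it to leave them NaN
flights['rolling_year'] = flights['passengers'].rolling(12, min_periods=1).sum()
flights[['passengers', 'rolling_year']].head(15)
# Observation: From December 1949 onward (the first full 12-month window), rolling totals climb almost without interruption, reflecting long-run demand growth.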
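
The effect of `min_periods` is easiest to see on a hypothetical series of ones, where any full 12-month window sums to 12:

```python
import pandas as pd

# Hypothetical monthly series of 14 ones
monthly = pd.Series(1, index=pd.date_range('1949-01-01', periods=14, freq='MS'))

# Default min_periods equals the window size: NaN until 12 observations exist
strict = monthly.rolling(12).sum()
# min_periods=1 emits partial sums 1, 2, ..., 11 before the window fills
partial = monthly.rolling(12, min_periods=1).sum()
print(pd.DataFrame({'strict': strict, 'partial': partial}))
```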


## A11 – Load Tips Dataset

_Import the `tips` dataset for use in reshaping exercises and display its head._

In [None]:
tips = sns.load_dataset('tips')
tips.head()
# Observation: The tips dataset (244 meals) offers categorical columns such as `sex`, `smoker`, `day`, and `time` that suit pivoting practice.


## A12 – Day-Time Pivot

_Create `tips_pivot` summarizing average tip percentage (`tip / total_bill`) by `day` (rows) and `time` (columns)._

In [None]:
tips = tips.assign(tip_pct=tips['tip'] / tips['total_bill'])
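tips_pivot = tips.pivot_table(
    values='tip_pct', index='day', columns='time', aggfunc='mean',
    observed=True  # group only on day/time combinations that occur in the data
).round(3)
tips_pivot
# Observation: Saturday and Sunday have no lunch service (NaN cells), and on Friday lunch tips a larger share than dinner, so dinner does not consistently out-tip lunch.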
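
`pivot_table` with `aggfunc='mean'` is shorthand for a grouped mean whose inner key is unstacked into columns. A sketch on hypothetical toy bills confirming the two agree:

```python
import numpy as np
import pandas as pd

# Hypothetical toy bills with the same columns as `tips`
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Fri'],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner'],
    'tip': [2.0, 3.0, 1.5, 4.0],
    'total_bill': [10.0, 20.0, 10.0, 20.0],
})
toy = toy.assign(tip_pct=toy['tip'] / toy['total_bill'])

# pivot_table with aggfunc='mean' ...
via_pivot = toy.pivot_table(values='tip_pct', index='day', columns='time', aggfunc='mean')
# ... matches a groupby mean whose inner key (time) is unstacked into columns
via_groupby = toy.groupby(['day', 'time'])['tip_pct'].mean().unstack()
print(via_pivot)
```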


## A13 – Flatten Columns

_Reset the index on `tips_pivot` and flatten the resulting columns into simple snake_case names._

In [None]:
tips_pivot_reset = tips_pivot.reset_index()
tips_pivot_reset.columns = ['day'] + [f'tip_pct_{col.lower()}' for col in tips_pivot_reset.columns[1:]]
tips_pivot_reset
# Observation: Flat snake_case names replace the categorical `time` column index, which makes the frame simpler to merge with other tables or export.
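
Flattening matters most when the pivot has several value columns, which yields a two-level column MultiIndex. A sketch on hypothetical toy bills that joins each `(value, time)` pair into one snake_case name:

```python
import pandas as pd

# Hypothetical toy bills with the same columns as `tips`
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Fri'],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner'],
    'tip': [2.0, 3.0, 1.5, 4.0],
    'total_bill': [10.0, 20.0, 10.0, 20.0],
})

# Two value columns produce a two-level column MultiIndex: (value, time)
wide = toy.pivot_table(values=['tip', 'total_bill'], index='day', columns='time', aggfunc='mean')

# Join each (value, time) tuple into one name, then move `day` back to a column
flat = wide.copy()
flat.columns = [f'{value}_{time.lower()}' for value, time in flat.columns]
flat = flat.reset_index()
print(flat.columns.tolist())
```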


## A14 – Melt Back to Long

_Melt the flattened pivot back into long format with columns `day`, `time`, and `tip_pct`._

In [None]:
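tips_long = tips_pivot_reset.melt(id_vars='day', var_name='time', value_name='tip_pct')
# Undo the A13 renaming so `time` holds the original 'Lunch' / 'Dinner' labels again
tips_long['time'] = tips_long['time'].str.replace('tip_pct_', '', regex=False).str.capitalize()
tips_long
# Observation: Melting restores one row per (day, time) pair; weekend lunch rows hold NaN because no lunches were recorded on Saturday or Sunday.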
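
`melt` and `pivot` are inverses once the column prefix is stripped. A round-trip sketch on a hypothetical flattened pivot; dropping the NaN cells is an optional choice made here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical flattened pivot in the A13 layout
wide = pd.DataFrame({
    'day': ['Fri', 'Sat'],
    'tip_pct_lunch': [0.20, np.nan],
    'tip_pct_dinner': [0.175, 0.15],
})

# melt, strip the column prefix, and drop cells that had no meals
long = wide.melt(id_vars='day', var_name='time', value_name='tip_pct')
long['time'] = long['time'].str.replace('tip_pct_', '', regex=False)
long = long.dropna(subset=['tip_pct'])

# pivot rebuilds the wide table; the dropped cell reappears as NaN
back = long.pivot(index='day', columns='time', values='tip_pct')
print(long)
```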


## A15 – Customer Size Buckets

_Bin party `size` into labels (`solo`, `pair`, `small_group`, `large_group`) and append this to the `tips` frame._

In [None]:
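# np.inf keeps the largest parties inside the last bin; with right=False an upper edge
# equal to the maximum size would exclude that size and leave NaN
size_bins = [1, 2, 3, 5, np.inf]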
size_labels = ['solo', 'pair', 'small_group', 'large_group']
tips['party_bucket'] = pd.cut(tips['size'], bins=size_bins, labels=size_labels, right=False)
tips[['size', 'party_bucket']].head()
# Observation: Most dining parties fall into the pair or small group categories.
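
With `right=False`, every bin is closed on the left and open on the right, so an upper edge equal to the largest value excludes that value. A minimal sketch on sizes 1 through 6:

```python
import numpy as np
import pandas as pd

sizes = pd.Series([1, 2, 3, 4, 5, 6])
labels = ['solo', 'pair', 'small_group', 'large_group']

# With right=False the bins are [1, 2), [2, 3), [3, 5), [5, 6): size 6 falls outside
capped = pd.cut(sizes, bins=[1, 2, 3, 5, 6], labels=labels, right=False)

# An open upper edge keeps every size >= 5 in the last bin
open_ended = pd.cut(sizes, bins=[1, 2, 3, 5, np.inf], labels=labels, right=False)
print(pd.DataFrame({'size': sizes, 'capped': capped, 'open_ended': open_ended}))
```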


## A16 – Nested Pivot

_Create a pivot table showing average `tip_pct` by `day` and `party_bucket`, with `time` as columns (a three-level summary)._

In [None]:
nested_pivot = tips.pivot_table(
    values='tip_pct',
    index=['day', 'party_bucket'],
    columns='time',
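    aggfunc='mean',
    observed=True  # skip day/bucket combinations that never occur
).round(3)
nested_pivot
# Observation: Larger parties tend to tip a slightly smaller share, but many cells rest on only a few meals (Thursday has a single dinner), so weekday dinner comparisons should be read cautiously.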
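
Because some day/bucket/time cells hold very few meals, it can help to request counts alongside means. A sketch on hypothetical toy rows using a list of aggregation functions:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like `tips` after A12 and A15
toy = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Fri', 'Sat'],
    'party_bucket': ['pair', 'pair', 'large_group', 'pair'],
    'time': ['Dinner', 'Dinner', 'Lunch', 'Dinner'],
    'tip_pct': [0.20, 0.10, 0.15, 0.18],
})

# A list of aggfuncs adds a top column level: ('mean', time) and ('count', time)
summary = toy.pivot_table(
    values='tip_pct', index=['day', 'party_bucket'], columns='time',
    aggfunc=['mean', 'count'],
)
print(summary)
```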


## A17 – Stack MultiIndex

_Stack the columns of `nested_pivot` to produce a Series with a three-level MultiIndex (day, party_bucket, time)._

In [None]:
stacked_nested = nested_pivot.stack().rename('tip_pct')
stacked_nested.head()
# Observation: The stacked view simplifies filtering specific combinations with `.xs` selections.
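
A short sketch of the `.xs` selections mentioned above, on a hypothetical three-level Series shaped like `stacked_nested` (the values are made up):

```python
import pandas as pd

# Hypothetical Series with the same three index levels as `stacked_nested`
idx = pd.MultiIndex.from_tuples([
    ('Fri', 'pair', 'Dinner'), ('Fri', 'pair', 'Lunch'),
    ('Sat', 'large_group', 'Dinner'), ('Sat', 'pair', 'Dinner'),
], names=['day', 'party_bucket', 'time'])
s = pd.Series([0.16, 0.19, 0.14, 0.15], index=idx, name='tip_pct')

# Fix one inner level: every dinner row, keyed by the remaining (day, party_bucket)
dinner = s.xs('Dinner', level='time')

# Fix two levels at once: the Friday pair rows, keyed by time
fri_pair = s.xs(('Fri', 'pair'), level=['day', 'party_bucket'])
print(dinner, fri_pair, sep='\n\n')
```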


## A18 – Cross Tabulation

_Build a contingency table counting meals by `smoker` status and `time`, storing the result in `smoker_time_ct`._

In [None]:
smoker_time_ct = pd.crosstab(tips['smoker'], tips['time'])
smoker_time_ct
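# Observation: Both smokers and non-smokers appear far more often at dinner than at lunch, largely because dinner accounts for most recorded meals; raw counts alone do not compare the two groups fairly.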
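
Two related sketches on hypothetical toy rows: `margins=True` adds `All` totals, and a plain crosstab matches a grouped count unstacked into columns.

```python
import pandas as pd

# Hypothetical toy meals
toy = pd.DataFrame({
    'smoker': ['No', 'No', 'Yes', 'Yes', 'No'],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
})

# margins=True appends an 'All' row and column with the totals
ct = pd.crosstab(toy['smoker'], toy['time'], margins=True)

# Without margins, crosstab matches a grouped count reshaped to wide form
plain = pd.crosstab(toy['smoker'], toy['time'])
counts = toy.groupby(['smoker', 'time']).size().unstack(fill_value=0)
print(ct)
```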


## A19 – Normalize Crosstab

_Normalize `smoker_time_ct` by row totals to obtain proportional shares per smoker group._

In [None]:
# Divide each row of the count table by its own row total
smoker_time_share = smoker_time_ct.div(smoker_time_ct.sum(axis=1), axis=0).round(3)
smoker_time_share
# Observation: Row normalization highlights that non-smokers split more evenly between lunch and dinner.
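
`normalize='index'` is the same as dividing the count table by its row sums. A minimal check on hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy meals
toy = pd.DataFrame({
    'smoker': ['No', 'No', 'Yes', 'Yes', 'No'],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
})
counts = pd.crosstab(toy['smoker'], toy['time'])

# normalize='index' divides each row by its own total ...
shares = pd.crosstab(toy['smoker'], toy['time'], normalize='index')
# ... which matches dividing the count table by its row sums
manual = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```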


## A20 – Wide-to-Long Slice

_Using `flights_pivot`, reshape passenger counts for 1956–1959 into a long DataFrame with columns `month`, `year`, `passengers` only for that slice._

In [None]:
wide_slice = flights_pivot.loc[:, 1956:1959]
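flights_slice_long = wide_slice.reset_index().melt(id_vars='month', var_name='year', value_name='passengers')
# Year labels come from an object-dtype column index (it also holds 'total'), so cast back to int
flights_slice_long['year'] = flights_slice_long['year'].astype(int)
flights_slice_long.head()
# Observation: The slice yields 48 rows (12 months × 4 years) covering the late-1950s passenger counts.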
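
Label slicing with `.loc` includes both endpoints, unlike positional slicing. The sketch below (hypothetical wide table) contrasts the two and shows that filtering the long form with `between` selects the same years:

```python
import pandas as pd

# Hypothetical wide table with one column per year
wide = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    index=pd.Index(['Jan'], name='month'),
    columns=pd.Index(range(1955, 1961), name='year'),
)

# Label-based slicing with .loc includes BOTH endpoints ...
by_label = wide.loc[:, 1956:1959]
# ... while positional slicing with .iloc excludes the stop position
by_position = wide.iloc[:, 1:4]

# Filtering the long form (between is inclusive by default) gives the same years
long = wide.stack().rename('passengers').reset_index()
filtered = long[long['year'].between(1956, 1959)]
print(filtered)
```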
