# 05 - Group Preparation

The model is trained **per segment**, where a segment is one `(Make, Body_Type)` pair.

This notebook:
- Loads `clean_sales_with_features.csv`
- Groups by `(Make, Body_Type)`
- Keeps only groups with at least **16 unique months** of data
- Saves a summary of groups to `group_metadata.csv`

The group summary is useful both for modelling and for your portfolio:
it shows exactly which slices of the market the model actually learns from.
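
As a rough sketch, the grouping and filtering steps listed above can also be written as a single vectorised aggregation. The helper name `summarise_segments`, the constant `MIN_MONTHS`, and the toy data are made up for illustration; only the column names and the 16-month threshold come from this notebook.

```python
import pandas as pd

MIN_MONTHS = 16  # threshold used in this notebook


def summarise_segments(df: pd.DataFrame, min_months: int = MIN_MONTHS) -> pd.DataFrame:
    """Return one row per (Make, Body_Type) segment with enough distinct months."""
    summary = (
        df.groupby(["Make", "Body_Type"])
        .agg(
            Months=("Year_Month", "nunique"),  # distinct months in the segment
            Rows=("Year_Month", "size"),       # total rows in the segment
        )
        .reset_index()
    )
    kept = summary[summary["Months"] >= min_months]
    return kept.sort_values(["Months", "Rows"], ascending=False).reset_index(drop=True)


# Toy data (hypothetical): segment A/SUV has 18 months, segment B/Sedan has 3.
months_long = pd.date_range("2020-01-01", periods=18, freq="MS")
months_short = pd.date_range("2020-01-01", periods=3, freq="MS")
toy = pd.DataFrame({
    "Make": ["A"] * 18 + ["B"] * 3,
    "Body_Type": ["SUV"] * 18 + ["Sedan"] * 3,
    "Year_Month": list(months_long) + list(months_short),
})

toy_summary = summarise_segments(toy)
print(toy_summary)
```

On the toy data only the A/SUV segment should survive the filter. Because the aggregation always produces the `Months` and `Rows` columns, this variant also returns an empty frame rather than failing when no segment meets the threshold.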


In [None]:
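import pandas as pd

# Load the feature-engineered sales data.
df = pd.read_csv("data/clean_sales_with_features.csv")
# Parse Year_Month as datetimes so unique months are counted reliably.
df["Year_Month"] = pd.to_datetime(df["Year_Month"])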

In [None]:
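MIN_MONTHS = 16  # minimum number of distinct months a segment needs

groups = []

# Collect one summary record per (Make, Body_Type) segment that has enough history.
for (make, body), grp in df.groupby(["Make", "Body_Type"]):
    n_months = grp["Year_Month"].nunique()
    if n_months >= MIN_MONTHS:
        groups.append({
            "Make": make,
            "Body_Type": body,
            "Months": n_months,
            "Rows": len(grp)
        })

# Passing columns explicitly keeps sort_values from failing when no segment qualifies.
groups_df = (
    pd.DataFrame(groups, columns=["Make", "Body_Type", "Months", "Rows"])
    .sort_values(["Months", "Rows"], ascending=False)
    .reset_index(drop=True)
)

display(groups_df.head(20))
print(f"Total groups with >= {MIN_MONTHS} months:", len(groups_df))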

In [None]:
groups_df.to_csv("data/group_metadata.csv", index=False)
print("Saved group metadata to data/group_metadata.csv")