## Long-Sleeve Report Generator (Excel Version)

This notebook ingests the **“Rush … Details Category 2025.csv”** file you receive every week from Capelli, filters it to **one** Rush club, and outputs **one Excel workbook** with four sheets:

| Worksheet               | What it contains                                                                      |
|-------------------------|---------------------------------------------------------------------------------------|
| **Purchased Only Home** | Players who bought *only* the **Home** long-sleeve jersey                             |
| **Purchased Only Away** | Players who bought *only* the **Away** long-sleeve jersey                             |
| **Did Not Purchase**    | Players who bought **no** long sleeves                                                |
| **All Purchases**       | Full player-level table with status & counts (home, away, both, none)                 |

The workbook is saved to  

``LongSleeveOrders/<Club> Long Sleeve <MM.DD> Report.xlsx``  
*(example: `Hawaii Long Sleeve 5.11 Report.xlsx`)*

---

### 🔄 Weekly items you **must** update

| Section | Variable / line | What to enter |
|---------|-----------------|---------------|
| **1️⃣ CONFIG block** | `file_path` | Path to the new *Details Category* CSV you just downloaded (inside **`shippingdates/`**). |
| | `club_filter` | Exact **Club Name** to analyse (e.g. `"Rush Hawaii"`). Only **one** club at a time. |
| **Long-sleeve SKUs** | `home_ls`, `away_ls` | Copy-paste the **exact** *Product Name* strings for that club’s Home & Away long-sleeve jerseys. |

> **Important:** When switching clubs, update **all three** of the above items (`file_path`, `club_filter`, and the two SKU strings) **before** running the notebook.
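
A mistyped SKU string does not raise an error; it simply classifies every player as "Did Not Purchase", so a quick pre-flight check may be worthwhile. The sketch below is not part of the notebook: `check_sku_strings` is a hypothetical helper, and the sample frame is made-up data that only borrows the `Club Name` and `Product Name` column names.

```python
import pandas as pd

def check_sku_strings(df, club, skus):
    """Return the SKU strings that never appear as a Product Name for `club`."""
    club_products = set(df.loc[df["Club Name"] == club, "Product Name"].dropna())
    return [sku for sku in skus if sku not in club_products]

# Tiny made-up sample standing in for the Capelli CSV (placeholder product names).
sample = pd.DataFrame({
    "Club Name": ["Rush Hawaii", "Rush Hawaii", "Rush Montana"],
    "Product Name": ["HOME LS JERSEY", "TRAINING TOP", "AWAY LS JERSEY"],
})

missing = check_sku_strings(sample, "Rush Hawaii", ["HOME LS JERSEY", "AWAY LS JERSEY"])
print("SKU strings not found for this club:", missing)
```

An empty list suggests both strings match at least one row for the chosen club; anything else points to a string that needs re-copying from the report.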

---

### 📋 How the notebook works

1. **Load CSV**  
   Reads `file_path` with `pandas.read_csv()` (comma-delimited).

2. **Filter rows**  
   * Keeps only rows where `Club Name == club_filter`.  
   * Keeps only the categories  
     `["Field Players Mandatory Kit", "Goalkeepers Mandatory Kit", "Competitive Items"]`.

3. **Date handling**  
   Converts `Order Date` to datetime and retains only orders from **2025**; rows whose date cannot be parsed are dropped.

4. **Long-sleeve detection**  
   * Aggregates each player’s *Product Name* list.  
   * Counts matches to `home_ls` and `away_ls`.  
   * Assigns **Long Sleeve Status**:  
     - `Purchased Only Home Long Sleeve (#)`  
     - `Purchased Only Away Long Sleeve (#)`  
     - `Purchased Both Long Sleeves (#)`  
     - `Did Not Purchase Long Sleeve`

5. **Console summary**  
   Prints player counts per status and total Home / Away quantities.

6. **Excel export**  
   * Builds filename: **`<Club> Long Sleeve <MM.DD> Report.xlsx`** (`MM.DD` parsed from the CSV name).  
   * Writes four DataFrames to the workbook (see table above) using **`xlsxwriter`**.  
   * Creates the `LongSleeveOrders/` folder if it doesn’t already exist.
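
The classification in step 4 can be illustrated on a toy order list. This is a minimal sketch rather than the notebook's own cell: `classify`, `HOME_LS`, and `AWAY_LS` are placeholder names, and the rows are made-up data.

```python
import pandas as pd

HOME_LS = "HOME LONG SLEEVE"   # placeholder, not a real Capelli product name
AWAY_LS = "AWAY LONG SLEEVE"   # placeholder, not a real Capelli product name

def classify(product_names):
    """Return (status, home_count, away_count) for one player's purchases."""
    home = sum(name == HOME_LS for name in product_names)
    away = sum(name == AWAY_LS for name in product_names)
    if home and away:
        status = f"Purchased Both Long Sleeves ({home + away})"
    elif home:
        status = f"Purchased Only Home Long Sleeve ({home})"
    elif away:
        status = f"Purchased Only Away Long Sleeve ({away})"
    else:
        status = "Did Not Purchase Long Sleeve"
    return status, home, away

# Made-up order rows: one row per purchased item.
orders = pd.DataFrame({
    "Player ID": [1, 1, 2, 3, 3],
    "Product Name": [HOME_LS, AWAY_LS, "SHORTS", AWAY_LS, AWAY_LS],
})

# Collect each player's products into a list, then classify that list.
per_player = orders.groupby("Player ID")["Product Name"].agg(list)
result = {player_id: classify(names) for player_id, names in per_player.items()}

for player_id, (status, home, away) in result.items():
    print(player_id, status, home, away)
```

Note that the number in parentheses is a quantity, so a player who bought two Away jerseys is still "Only Away" with a count of 2.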

---

### ⚠️ Assumptions & caveats

* Designed for **one club per run**. Re-run the notebook with a new `club_filter` to generate reports for other clubs.  
* `home_ls` and `away_ls` strings **must exactly match** the *Product Name* values in the Capelli report.  
* Requires the **`xlsxwriter`** package. Install once via:  
  ```bash
  pip install xlsxwriter
  ```

In [13]:
import pandas as pd
import re
from pathlib import Path

# ------------------------------------------------------------------
# 1️⃣  CONFIG - update these two lines (and home_ls / away_ls in Step 4) each time you run the report
# ------------------------------------------------------------------
file_path = "shippingdates/Rush Soccer 5.11 - Rush Details Category 2025.csv"
club_filter = "Rush California"                     # e.g. "Rush Montana", "Rush Nevada", etc.
# ------------------------------------------------------------------

# -------------------------- Step 1: Load the Data -------------------------- #
df = pd.read_csv(file_path)      # let pandas infer the comma delimiter
print("Data loaded successfully.\n")

# -------------------------- Step 2: Basic Filtering ------------------------ #
df = df[df["Club Name"] == club_filter]
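df = df[df["Category"].isin(
        ["Field Players Mandatory Kit", "Goalkeepers Mandatory Kit", "Competitive Items"])].copy()
# .copy() makes df an independent frame so the column assignment in Step 3
# does not trigger pandas' SettingWithCopyWarning.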

# -------------------------- Step 3: Prep Dates ---------------------------- #
df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
df = df[df["Order Date"].dt.year == 2025]

# -------------------------- Step 4: Long-Sleeve Matching ------------------- #
home_ls = (
    "CALIFORNIA RUSH BROOKLYN II RUSH SOCCER PYRAMIDS LONG SLEEVE MATCH JERSEY PROMO BLUE BLACK WHITE"
)
away_ls = (
    "CALIFORNIA RUSH BROOKLYN II RUSH SOCCER SCATTERED SHARDE LONG SLEEVE MATCH JERSEY WHITE PROMO GREY BLACK"
)

def compute_ls_status_player(product_names):
    s = pd.Series(product_names)
    count_home = s.isin([home_ls]).sum()
    count_away = s.isin([away_ls]).sum()

    if count_home and count_away:
        status = f"Purchased Both Long Sleeves ({count_home + count_away})"
    elif count_home:
        status = f"Purchased Only Home Long Sleeve ({count_home})"
    elif count_away:
        status = f"Purchased Only Away Long Sleeve ({count_away})"
    else:
        status = "Did Not Purchase Long Sleeve"

    return status, count_home, count_away

# -------------------------- Step 5: Aggregate by Player -------------------- #
grouped = (
    df.groupby("Player ID")
      .agg({
          "Order ID":      lambda x: " ; ".join(x.astype(str).unique()),
          "Order Date":    "min",
          "Customer Email":"first",
          "Club Name":     "first",
          "Player Name":   "first",
          "Product Name":  lambda x: list(x)
      })
      .reset_index()
)

grouped[["Long Sleeve Status", "Home Count", "Away Count"]] = (
    grouped["Product Name"].apply(lambda lst: pd.Series(compute_ls_status_player(lst)))
)

# -------------------------- Step 6: Quick Console Summary ------------------ #
summary = grouped.groupby("Long Sleeve Status").size()
total_home = grouped["Home Count"].sum()
total_away = grouped["Away Count"].sum()

print("Summary Statistics (unique players):\n", summary)
print(f"\nTotal Home Long Sleeve Jerseys Purchased:  {total_home}")
print(f"Total Away Long Sleeve Jerseys Purchased:   {total_away}")

# -----------------------------------------------------------------
# 7️⃣  EXPORT  →  Excel file with four worksheets
# -----------------------------------------------------------------
club_clean  = club_filter.replace("Rush ", "")          # "Rush Hawaii" → "Hawaii"
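date_match  = re.search(r"(\d+\.\d+)", Path(file_path).stem)
if date_match is None:                                  # fail with a clear message instead of AttributeError
    raise ValueError(f"No date token (e.g. '5.11') found in file name: {file_path}")
date_token  = date_match.group(1)                       # "5.11"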
out_dir     = Path("LongSleeveOrders")
out_dir.mkdir(exist_ok=True)
out_path    = out_dir / f"{club_clean} Long Sleeve {date_token} Report.xlsx"

# Masks for each worksheet
mask_home_only = grouped["Long Sleeve Status"].str.startswith("Purchased Only Home")
mask_away_only = grouped["Long Sleeve Status"].str.startswith("Purchased Only Away")
mask_none      = grouped["Long Sleeve Status"] == "Did Not Purchase Long Sleeve"

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    grouped[mask_home_only].to_excel(writer, sheet_name="Purchased Only Home",  index=False)
    grouped[mask_away_only].to_excel(writer, sheet_name="Purchased Only Away",  index=False)
    grouped[mask_none].to_excel(writer,      sheet_name="Did Not Purchase",     index=False)
    grouped.to_excel(writer,                 sheet_name="All Purchases",        index=False)

print(f"\nExcel file created: {out_path.resolve()}")


Data loaded successfully.

Summary Statistics (unique players):
 Long Sleeve Status
Did Not Purchase Long Sleeve           111
Purchased Both Long Sleeves (2)          8
Purchased Only Away Long Sleeve (1)      2
Purchased Only Home Long Sleeve (1)      3
dtype: int64

Total Home Long Sleeve Jerseys Purchased:  11
Total Away Long Sleeve Jerseys Purchased:   10

Excel file created: /Users/aidanwall/Documents/RUSH/MiscellanousProjects/Capelli-Orders/LongSleeveOrders/California Long Sleeve 5.11 Report.xlsx


## Long Sleeve Filtering: Overview & Weekly Updates

This notebook ingests the *“Details Category”* tab from the weekly Capelli report and produces two outputs:
1. **Player-Level Long Sleeve Summary** – aggregates purchase history per player, flags “Home” vs. “Away” long-sleeve purchases, and saves to `LongSleeveOrders/<DATE>Order_Summary_with_ls_status.csv`.  
2. **Merged Detail File** – merges each player’s long-sleeve status back into the raw detail exports, writing to `LongSleeveOrders/Rush Soccer <DATE> - Details_with_ls_status.csv`.

---

### 🔄 What You Need to Update Weekly
- **Input File Path**  
  - Change the CSV filename in the `pd.read_csv(...)` call to match the **new weekly** export inside the directory `shippingdates/`.  
- **Club Filter (Optional)**  
  - If analyzing a **single club**, uncomment and set `df = df[df["Club Name"] == "<YOUR CLUB>"]`.  
- **Long-Sleeve Product Names**  
  - Update the `home_ls` and `away_ls` strings to exactly match the **Product Name** entries for “Home Long Sleeve” and “Away Long Sleeve” in your report.  
- **Output Paths**  
  - The two `to_csv(...)` calls write into `LongSleeveOrders/`.  
  - Rename files to include the **current date** (e.g. `Rush Soccer 5.4…`, `5.4Order_Summary…`).  
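
Rather than retyping the date each week, the output names could be derived from the input filename. This is a sketch only, assuming the weekly export keeps a date token such as `5.11` in its name; `build_output_paths` is a hypothetical helper, and the name patterns follow the two output files described above.

```python
import re
from pathlib import Path

def build_output_paths(file_path, out_dir="LongSleeveOrders"):
    """Derive both output CSV paths from the date token in the input filename."""
    match = re.search(r"(\d+\.\d+)", Path(file_path).stem)
    if match is None:
        raise ValueError(f"No date token like '5.11' found in {file_path!r}")
    date_token = match.group(1)
    out = Path(out_dir)
    summary = out / f"{date_token}Order_Summary_with_ls_status.csv"
    merged = out / f"Rush Soccer {date_token} - Details_with_ls_status.csv"
    return summary, merged

summary_path, merged_path = build_output_paths(
    "shippingdates/Rush Soccer 5.11 - Rush Details Category 2025.csv"
)
print(summary_path)
print(merged_path)
```

The regular expression takes the first `digits.digits` run in the file stem, so the trailing `2025` in the name is ignored.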

---

### 📋 How It Works
1. **Read & (Optionally) Filter**  
   - Load the category-detail CSV, then optionally restrict to one **Club Name** and specific **Category** values.  
2. **Date & Year Filtering**  
   - Converts the “Order Date” column to datetime and filters to the current year (e.g. 2025).  
3. **Define Target Products**  
   - Assign exact **Product Name** strings to `home_ls` and `away_ls`.  
4. **Compute Player-Level Status**  
   - Group by **Player ID**, aggregate product names into lists, then count occurrences of each long sleeve SKU.  
   - Classify each player as:  
     - “Did Not Purchase Long Sleeve”  
     - “Purchased Only Home Long Sleeve (#)”  
     - “Purchased Only Away Long Sleeve (#)”  
     - “Purchased Both Long Sleeves (#)”  
5. **Save Outputs**  
   - **Summary CSV**: one row per player with status and counts.  
   - **Merged CSV**: original detail rows enriched with each player’s long-sleeve status.
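
The merge behind the second output can be sketched on made-up data. The frames and values below are illustrative assumptions; only the column names and the left merge on `Player ID` come from the notebook.

```python
import pandas as pd

# Made-up detail rows (one row per purchased item).
detail = pd.DataFrame({
    "Player ID": [1, 1, 2],
    "Product Name": ["HOME LS", "SHORTS", "SOCKS"],
})

# Made-up player-level summary (one row per player).
summary = pd.DataFrame({
    "Player ID": [1, 2],
    "Long Sleeve Status": [
        "Purchased Only Home Long Sleeve (1)",
        "Did Not Purchase Long Sleeve",
    ],
    "Home Count": [1, 0],
    "Away Count": [0, 0],
})

# A left merge keeps every detail row and repeats the player's status on each one.
merged = detail.merge(summary, on="Player ID", how="left")
print(merged)
```

Because the summary has one row per player, the merged frame keeps the same number of rows as the detail frame.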

---

> **⚠️ Important:** This notebook is designed to process **one club at a time**. You must supply the correct Club Name filter and the exact Product Name values for that club’s long-sleeve SKUs before running.



In [12]:
# import pandas as pd

# # Read the CSV file into a DataFrame.
# df = pd.read_csv("shippingdates/Rush Soccer 5.11 - Rush Details Category 2025.csv")

# # OPTIONAL: To restrict analysis to a specific club, e.g., "Rush Hawaii", uncomment:
# df = df[df["Club Name"] == "Rush Hawaii"]
# df = df[df["Category"].isin(["Field Players Mandatory Kit", "Goalkeepers Mandatory Kit","Competitive Items"])]

# # Convert "Order Date" to datetime and filter for orders in 2025.
# df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
# df = df[df["Order Date"].dt.year == 2025]

# # Define the long sleeve product descriptions.
# home_ls = "HAWAII RUSH BROOKLYN II RUSH SOCCER PYRAMIDS LONG SLEEVE MATCH JERSEY PROMO BLUE BLACK WHITE"
# away_ls = "HAWAII RUSH BROOKLYN II RUSH SOCCER SCATTERED SHARDE LONG SLEEVE MATCH JERSEY WHITE PROMO GREY BLACK"

# # Function to compute long sleeve status and counts from a list of product names.
# def compute_ls_status_player(product_names):
#     # Convert the list into a pandas Series for easier aggregation.
#     s = pd.Series(product_names)
#     count_home = s.isin([home_ls]).sum()
#     count_away = s.isin([away_ls]).sum()
#     total = count_home + count_away
#     if total == 0:
#         status = "Did Not Purchase Long Sleeve"
#     elif count_home > 0 and count_away > 0:
#         status = f"Purchased Both Long Sleeves ({total})"
#     elif count_home > 0:
#         status = f"Purchased Only Home Long Sleeve ({count_home})"
#     elif count_away > 0:
#         status = f"Purchased Only Away Long Sleeve ({count_away})"
#     else:
#         status = "Did Not Purchase Long Sleeve"
#     return status, count_home, count_away

# # Group by "Player ID" so that if a player purchases in multiple orders, they are aggregated together.
# grouped = df.groupby("Player ID").agg({
#     "Order ID": lambda x: " ; ".join(x.astype(str).unique()),
#     "Order Date": "min",             # earliest order date for that player
#     "Customer Email": "first",       # assuming one email per player
#     "Club Name": "first",
#     "Player Name": "first",
#     "Product Name": lambda x: list(x)  # aggregate all product names into a list
# }).reset_index()

# # Compute long sleeve status and counts for each player.
# grouped[["Long Sleeve Status", "Home Count", "Away Count"]] = grouped["Product Name"].apply(
#     lambda prod_list: pd.Series(compute_ls_status_player(prod_list))
# )

# # Print out unique players (Player ID, Customer Email, Player Name, and status) for players who purchased any long sleeve.
# purchased_ls = grouped[grouped["Long Sleeve Status"] != "Did Not Purchase Long Sleeve"]
# print("Players who purchased long sleeve jerseys:")
# # print(purchased_ls[["Player ID", "Customer Email", "Player Name", "Long Sleeve Status", "Home Count", "Away Count"]])

# # Print summary statistics: count of unique players by Long Sleeve Status.
# summary = grouped.groupby("Long Sleeve Status").size()
# print("\nSummary Statistics (Unique Players by Long Sleeve Status):")
# print(summary)

# # Print total counts of Home and Away long sleeve jerseys purchased across all players.
# total_home = grouped["Home Count"].sum()
# total_away = grouped["Away Count"].sum()
# print(f"\nTotal Home Long Sleeve Jerseys Purchased: {total_home}")
# print(f"Total Away Long Sleeve Jerseys Purchased: {total_away}")

# # Write the aggregated order summary to CSV in the specified directory.
# grouped.to_csv("LongSleeveOrders/Hawaii 5.4Order_Summary_with_ls_status.csv", index=False)

# # Optionally, merge the summary information back into the original DataFrame by Player ID.
# df = df.merge(grouped[["Player ID", "Long Sleeve Status", "Home Count", "Away Count"]],
#               on="Player ID", how="left")
# # df.to_csv("LongSleeveOrders/Washington Rush Soccer 5.4 - Details_with_ls_status.csv", index=False)#update for club specific name

# print("\nFiles saved: 'Hawaii 5.11Order_Summary_with_ls_status.csv' "
#       "and 'Hawaii Rush Soccer 5.11 - Details_with_ls_status.csv'")


