## Long Sleeve Filtering: Overview & Weekly Updates

This notebook ingests the *“Details Category”* tab from the weekly Capelli report and produces two outputs:
1. **Player-Level Long Sleeve Summary** – aggregates purchase history per player, flags “Home” vs. “Away” long-sleeve purchases, and saves to `LongSleeveOrders/<DATE>Order_Summary_with_ls_status.csv`.  
2. **Merged Detail File** – merges each player’s long-sleeve status back into the raw detail exports, writing to `LongSleeveOrders/Rush Soccer <DATE> - Details_with_ls_status.csv`.

---

### 🔄 What You Need to Update Weekly
- **Input File Path**  
  - Change the CSV filename in the `pd.read_csv(...)` call to match the **new weekly** export inside the directory `shippingdates/`.  
- **Club Filter**  
  - Set `df = df[df["Club Name"] == "<YOUR CLUB>"]` to the club being analyzed. The long-sleeve product names are club-specific, so the notebook handles one club per run.  
- **Long-Sleeve Product Names**  
  - Update the `home_ls` and `away_ls` strings to exactly match the **Product Name** entries for “Home Long Sleeve” and “Away Long Sleeve” in your report.  
- **Output Paths**  
  - The two `to_csv(...)` calls write into `LongSleeveOrders/`.  
  - Rename files to include the **current date** (e.g. `Rush Soccer 5.4…`, `5.4Order_Summary…`).  
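
Since the input and output filenames all embed the same report date, one way to avoid editing three strings by hand is to derive them from a single value. The following is a minimal sketch, not part of the notebook: `weekly_paths` and `date_tag` are hypothetical names, and the filename patterns are copied from the paths used in the code cell.

```python
from pathlib import Path


def weekly_paths(date_tag, year=2025):
    """Build the input path and both output paths for one weekly run.

    date_tag is the report date as it appears in the filenames, e.g. "5.4".
    This helper is illustrative only; the notebook hard-codes these strings.
    """
    input_csv = Path("shippingdates") / f"Rush Soccer {date_tag} - Rush Details Category {year}.csv"
    out_dir = Path("LongSleeveOrders")
    summary_csv = out_dir / f"{date_tag}Order_Summary_with_ls_status.csv"
    merged_csv = out_dir / f"Rush Soccer {date_tag} - Details_with_ls_status.csv"
    return input_csv, summary_csv, merged_csv


input_csv, summary_csv, merged_csv = weekly_paths("5.4")
print(input_csv.as_posix())
print(summary_csv.as_posix())
print(merged_csv.as_posix())
```

The three returned paths could then be passed to `pd.read_csv(...)` and the two `to_csv(...)` calls in place of the literal strings.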

---

### 📋 How It Works
1. **Read & (Optionally) Filter**  
   - Load the category-detail CSV, then restrict it to one **Club Name** and the relevant **Category** values.  
2. **Date & Year Filtering**  
   - Convert the “Order Date” column to datetime and keep only orders from the target year (hard-coded as 2025). Rows whose date cannot be parsed are dropped by this filter.  
3. **Define Target Products**  
   - Assign exact **Product Name** strings to `home_ls` and `away_ls`.  
4. **Compute Player-Level Status**  
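   - Group by **Player ID**, aggregate product names into lists, then count the rows matching each long-sleeve product name.  
   - Classify each player as one of the following, where `#` is the number of matching rows (for “Both”, the home and away counts combined):  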
     - “Did Not Purchase Long Sleeve”  
     - “Purchased Only Home Long Sleeve (#)”  
     - “Purchased Only Away Long Sleeve (#)”  
     - “Purchased Both Long Sleeves (#)”  
5. **Save Outputs**  
   - **Summary CSV**: one row per player with status and counts.  
   - **Merged CSV**: original detail rows enriched with each player’s long-sleeve status.
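
The grouping and classification steps above can be illustrated on toy data. This is a simplified sketch rather than the notebook's exact code: the product names are placeholders, `classify` is a hypothetical stand-in for `compute_ls_status_player`, and only the two columns needed for the example are included.

```python
import pandas as pd

# Placeholder product names; the real values come from the weekly report.
home_ls = "HOME LONG SLEEVE"
away_ls = "AWAY LONG SLEEVE"


def classify(products):
    """Return (status, home_count, away_count) for one player's product list."""
    home = products.count(home_ls)
    away = products.count(away_ls)
    if home and away:
        return f"Purchased Both Long Sleeves ({home + away})", home, away
    if home:
        return f"Purchased Only Home Long Sleeve ({home})", home, away
    if away:
        return f"Purchased Only Away Long Sleeve ({away})", home, away
    return "Did Not Purchase Long Sleeve", home, away


toy = pd.DataFrame({
    "Player ID": [1, 1, 2, 3, 3],
    "Product Name": [home_ls, away_ls, "SHORTS", away_ls, away_ls],
})

# One list of product names per player, then one status per player.
per_player = toy.groupby("Player ID")["Product Name"].agg(list)
statuses = {pid: classify(items)[0] for pid, items in per_player.items()}

for pid, status in statuses.items():
    print(pid, status)
```

Player 1 bought one of each, player 2 bought neither, and player 3 bought the away jersey twice, so the sketch exercises three of the four status labels.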

---

> **⚠️ Important:** This notebook is designed to process **one club at a time**. You must supply the correct Club Name filter and the exact Product Name values for that club’s long-sleeve SKUs before running.
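
Because matching is exact, a mistyped or outdated product name does not raise an error; every player is simply reported as “Did Not Purchase Long Sleeve”. A quick check before running the full cell can catch this. The sketch below is an assumption about how such a check might look, not something the notebook does: `missing_products` is a hypothetical helper and the product names are placeholders.

```python
import pandas as pd


def missing_products(df, expected_names):
    """Return the expected product names that never appear in df["Product Name"]."""
    present = set(df["Product Name"].dropna().unique())
    return [name for name in expected_names if name not in present]


# Toy data standing in for the club-filtered detail rows.
club_df = pd.DataFrame({
    "Club Name": ["Rush Nevada"] * 3,
    "Product Name": ["HOME LONG SLEEVE", "SHORTS", "SOCKS"],
})

missing = missing_products(club_df, ["HOME LONG SLEEVE", "AWAY LONG SLEEVE"])
if missing:
    print("Not found in Product Name:", missing)
```

In the notebook, the same call could be made with the filtered `df` and `[home_ls, away_ls]` right after the product names are defined. A name can be legitimately absent in a week with no long-sleeve orders, so a non-empty result is a prompt to double-check the strings rather than proof of an error.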



In [3]:
import pandas as pd

# Read the CSV file into a DataFrame.
df = pd.read_csv("shippingdates/Rush Soccer 5.4 - Rush Details Category 2025.csv")

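# Restrict the analysis to a single club; the product names below are club-specific.
df = df[df["Club Name"] == "Rush Nevada"]
# Keep only the kit categories of interest. The .copy() avoids a SettingWithCopyWarning
# when "Order Date" is reassigned below.
df = df[df["Category"].isin(["Field Players Mandatory Kit", "Goalkeepers Mandatory Kit", "Competitive Items"])].copy()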

# Convert "Order Date" to datetime and filter for orders in 2025.
df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
df = df[df["Order Date"].dt.year == 2025]

# Exact "Product Name" strings for this club's home and away long-sleeve jerseys.
home_ls = "NEVADA RUSH BROOKLYN II RUSH SOCCER PYRAMIDS LONG SLEEVE MATCH JERSEY PROMO BLUE BLACK WHITE"
away_ls = "NEVADA RUSH BROOKLYN II RUSH SOCCER SCATTERED SHARDE LONG SLEEVE MATCH JERSEY WHITE PROMO GREY BLACK"

# Function to compute long sleeve status and counts from a list of product names.
def compute_ls_status_player(product_names):
    # Convert the list into a pandas Series for easier aggregation.
    s = pd.Series(product_names)
    count_home = s.isin([home_ls]).sum()
    count_away = s.isin([away_ls]).sum()
    total = count_home + count_away
    if count_home > 0 and count_away > 0:
        status = f"Purchased Both Long Sleeves ({total})"
    elif count_home > 0:
        status = f"Purchased Only Home Long Sleeve ({count_home})"
    elif count_away > 0:
        status = f"Purchased Only Away Long Sleeve ({count_away})"
    else:
        # Neither long-sleeve product appears in this player's purchases.
        status = "Did Not Purchase Long Sleeve"
    return status, count_home, count_away

# Group by "Player ID" so that if a player purchases in multiple orders, they are aggregated together.
grouped = df.groupby("Player ID").agg({
    "Order ID": lambda x: " ; ".join(x.astype(str).unique()),
    "Order Date": "min",             # earliest order date for that player
    "Customer Email": "first",       # assuming one email per player
    "Club Name": "first",
    "Player Name": "first",
    "Product Name": lambda x: list(x)  # aggregate all product names into a list
}).reset_index()

# Compute long sleeve status and counts for each player.
grouped[["Long Sleeve Status", "Home Count", "Away Count"]] = grouped["Product Name"].apply(
    lambda prod_list: pd.Series(compute_ls_status_player(prod_list))
)

# Select players who purchased any long sleeve. The detailed listing below is commented out,
# so only the header line is printed; uncomment the print to see the per-player rows.
purchased_ls = grouped[grouped["Long Sleeve Status"] != "Did Not Purchase Long Sleeve"]
print("Players who purchased long sleeve jerseys:")
# print(purchased_ls[["Player ID", "Customer Email", "Player Name", "Long Sleeve Status", "Home Count", "Away Count"]])

# Print summary statistics: count of unique players by Long Sleeve Status.
summary = grouped.groupby("Long Sleeve Status").size()
print("\nSummary Statistics (Unique Players by Long Sleeve Status):")
print(summary)

# Print total counts of Home and Away long sleeve jerseys purchased across all players.
total_home = grouped["Home Count"].sum()
total_away = grouped["Away Count"].sum()
print(f"\nTotal Home Long Sleeve Jerseys Purchased: {total_home}")
print(f"Total Away Long Sleeve Jerseys Purchased: {total_away}")

# Write the aggregated order summary to CSV in the specified directory.
grouped.to_csv("LongSleeveOrders/5.4Order_Summary_with_ls_status.csv", index=False)

# Merge each player's long-sleeve status and counts back into the detail rows by Player ID.
df = df.merge(grouped[["Player ID", "Long Sleeve Status", "Home Count", "Away Count"]],
              on="Player ID", how="left")
df.to_csv("LongSleeveOrders/Rush Soccer 5.4 - Details_with_ls_status.csv", index=False)

print("\nFiles saved: '5.4Order_Summary_with_ls_status.csv' and 'Rush Soccer 5.4 - Details_with_ls_status.csv'")


Players who purchased long sleeve jerseys:

Summary Statistics (Unique Players by Long Sleeve Status):
Long Sleeve Status
Did Not Purchase Long Sleeve           55
Purchased Both Long Sleeves (2)         8
Purchased Only Away Long Sleeve (1)     5
Purchased Only Home Long Sleeve (1)     1
dtype: int64

Total Home Long Sleeve Jerseys Purchased: 9
Total Away Long Sleeve Jerseys Purchased: 13

Files saved: '5.4Order_Summary_with_ls_status.csv' and 'Rush Soccer 5.4 - Details_with_ls_status.csv'
