# Project 2: Funds Series Strategy Corrections
This Jupyter notebook contains the Python code used to clean fund names, group related funds into series, and make the strategy label consistent within each series.

The notebook performs tasks such as:
- Extracting series names from fund titles using regex
- Grouping related funds (e.g., Fund I, II, III)
- Assigning the most common strategy to each group
- Exporting the grouped series and final strategy-corrected fund list

Each section is labeled by step for clarity.


## Step 1: Load Fund Data

In [None]:
import pandas as pd
import re
from collections import Counter

# Load dataset
df = pd.read_csv("funds_with_strategies.csv")
# Normalize column names: trim surrounding whitespace and lowercase
df.columns = df.columns.str.strip().str.lower()
df.head()
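
The later steps rely on the `fund_name` and `original_strategy` columns, so a quick check right after loading can surface a misnamed header early. The following is a minimal sketch, assuming those two columns are the only required ones; `load_funds`, `REQUIRED_COLUMNS`, and the inline sample CSV are hypothetical names used for illustration and are not part of the notebook.

```python
import io

import pandas as pd

# Columns the later steps read (assumption: nothing else is required)
REQUIRED_COLUMNS = {"fund_name", "original_strategy"}


def load_funds(source):
    """Read a CSV, normalize its column names, and fail fast if a column is missing."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip().str.lower()
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return df


# Illustrative in-memory CSV with messy headers (made-up rows, not real fund data)
sample_csv = io.StringIO(
    " Fund_Name ,Original_Strategy\n"
    "Acme Growth Fund I,Buyout\n"
    "Acme Growth Fund II,Growth\n"
)
sample = load_funds(sample_csv)
print(list(sample.columns))
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.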

## Step 2: Extract Series Names
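Use regex to strip Roman numerals, standalone numbers, legal suffixes such as LP, and common fund terms (Fund, Series, Capital), then collapse whitespace and lowercase the result to isolate the base series name.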

In [None]:
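def extract_series_name(name):
    # Drop Roman numerals (I-XI), legal suffixes (LP, LLC), and generic fund terms.
    # "LLC" replaces the alternative "Fund I LLC", which could never match because "Fund" matched first.
    name = re.sub(r'\b(I{1,3}|IV|V|VI|VII|VIII|IX|X|XI|LP|LLC|Fund|Series|Capital)\b', '', name, flags=re.IGNORECASE)
    name = re.sub(r'\b\d+\b', '', name)  # drop standalone numbers
    name = re.sub(r'\s+', ' ', name)     # collapse repeated whitespace
    return name.strip().lower()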

df["series_name"] = df["fund_name"].apply(extract_series_name)
df[["fund_name", "series_name"]].head()
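
Before running the extraction on the full dataset, it can help to try the same idea on a few names and see which ones end up sharing a key. The sketch below uses only the standard library; the fund names are made up, and `series_key` and `NOISE` are hypothetical stand-ins that mirror the approach of this step rather than the notebook's exact function.

```python
import re
from collections import defaultdict

# Tokens treated as noise: Roman numerals I-XI plus a few fund terms (illustrative list)
NOISE = re.compile(
    r'\b(I{1,3}|IV|V|VI|VII|VIII|IX|X|XI|LP|LLC|Fund|Series|Capital)\b',
    re.IGNORECASE,
)


def series_key(name):
    """Reduce a fund name to a lowercase key shared by funds of the same series."""
    name = NOISE.sub('', name)
    name = re.sub(r'\b\d+\b', '', name)  # standalone numbers such as "3"
    name = re.sub(r'\s+', ' ', name)     # collapse the gaps left behind
    return name.strip().lower()


# Made-up names for illustration only
names = [
    "Acme Growth Fund I",
    "Acme Growth Fund II LP",
    "ACME Growth Fund 3",
    "Birch Capital Fund IV",
]

groups = defaultdict(list)
for fund in names:
    groups[series_key(fund)].append(fund)

for key, members in groups.items():
    print(f"{key!r}: {members}")
```

Here the three Acme names collapse to `acme growth`, while `Birch Capital Fund IV` becomes `birch` because "Capital" is on the removal list. Since matching is case-insensitive, a standalone letter such as "X" or "V" in a name is also treated as a numeral, so it is worth scanning the resulting keys for over-merged or truncated names.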

## Step 3: Apply Corrected Strategy
For each series group, assign the most common `original_strategy` as the `corrected_strategy`.

In [None]:
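def most_common_strategy(strategies):
    # Count non-missing values only; ties go to the strategy seen first in the group
    counts = Counter(strategies.dropna())
    return counts.most_common(1)[0][0] if counts else None

# transform broadcasts each group's winner to its rows and leaves the other columns untouched
df["corrected_strategy"] = df.groupby("series_name")["original_strategy"].transform(most_common_strategy)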
df[["fund_name", "original_strategy", "corrected_strategy"]].head()
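
Once a corrected strategy has been assigned, listing only the rows whose label changed makes the effect of the correction easy to review. The sketch below runs on a small made-up table; `toy` and `changed` are hypothetical names. It uses pandas' `Series.mode()` as a compact stand-in for the `Counter` logic, which breaks ties by sorted order rather than by first occurrence.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "fund_name": [
        "Acme Growth Fund I",
        "Acme Growth Fund II",
        "Acme Growth Fund III",
        "Birch Fund I",
    ],
    "series_name": ["acme growth", "acme growth", "acme growth", "birch"],
    "original_strategy": ["Buyout", "Buyout", "Growth", "Venture"],
})

# Most frequent strategy per series, broadcast back to every row of that series
toy["corrected_strategy"] = (
    toy.groupby("series_name")["original_strategy"]
       .transform(lambda s: s.mode().iloc[0])
)

# Keep only the funds whose strategy was actually changed
changed = toy[toy["original_strategy"] != toy["corrected_strategy"]]
print(changed[["fund_name", "original_strategy", "corrected_strategy"]])
```

In this toy table only `Acme Growth Fund III` changes, from Growth to Buyout. A series with a single fund keeps its original strategy, since that fund is its own majority.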

## Step 4: Export Grouped Fund Series

In [None]:
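# One row per series: member fund names sorted and joined with "; " (missing names skipped)
group_df = df.groupby("series_name")["fund_name"].apply(lambda x: "; ".join(sorted(x.dropna().astype(str)))).reset_index()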
group_df.to_csv("step1_fund_series_groups.csv", index=False)
group_df.head()
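
A per-series fund count next to the joined names can make the grouped file easier to scan, in particular for spotting series that contain only one fund. This is a sketch on made-up data; `summary`, `fund_count`, `funds`, and `singletons` are hypothetical names, and the notebook's exported file does not include a count column.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "fund_name": [
        "Acme Growth Fund II",
        "Acme Growth Fund I",
        "Acme Growth Fund III",
        "Birch Fund I",
    ],
    "series_name": ["acme growth", "acme growth", "acme growth", "birch"],
})

# Named aggregation: one count column and one joined-names column per series
summary = (
    toy.groupby("series_name")
       .agg(
           fund_count=("fund_name", "count"),
           funds=("fund_name", lambda x: "; ".join(sorted(x))),
       )
       .reset_index()
)

# Series with a single member cannot be corrected by majority vote
singletons = summary[summary["fund_count"] == 1]
print(summary)
print(singletons["series_name"].tolist())
```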

## Step 5: Export Final Output with Corrected Strategies

In [None]:
df.to_csv("funds_with_corrected_strategy.csv", index=False)
df.head()

## ✅ Summary
- Grouped similar funds by series name
- Assigned each series its most common (modal) strategy
- Exported clean files for analysis or reporting