<a href="https://colab.research.google.com/github/Jacob-Rose-BU/Alternative-Investments---Assette-Capstone-Project/blob/main/PRODUCTMASTER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PRODUCTMASTER Table Synthetic Data
The PRODUCTMASTER table contains the core metadata for all investment products offered by the firm. Each row represents a distinct fund or product, including key attributes like strategy, legal structure, asset class, and share class. It also includes proxy keys for downstream tracking, such as performance and representative accounts. This table serves as a foundational dimension for joining with related tables like holdings, performance, attribution, and factsheet commentary.


| **Column Name**         | **Description**                                                                                              |
| ----------------------- | ------------------------------------------------------------------------------------------------------------ |
| `PRODUCTCODE`           | Unique code for each product. Used as a **primary key** and may appear as a **foreign key** in other tables. |
| `PRODUCTNAME`           | Name of the product fund. Example: "XYZ Growth Opportunities Fund".                                          |
| `STRATEGY`              | Investment strategy associated with the fund (e.g., "Long/Short Equity", "Absolute Return").                 |
| `VEHICLECATEGORY`       | Broad category of the legal vehicle (e.g., "LP", "Feeder", "Master", "Offshore").                            |
| `VEHICLETYPE`           | Specific legal structure or subtype (e.g., "Delaware LP", "Cayman SPC", "UCITS").                            |
| `ASSETCLASS`            | Primary asset class targeted by the fund (e.g., "Private Equity", "Hedge Fund", "Real Estate").              |
| `SHARECLASS`            | Share class designation (e.g., "Class A", "Institutional", "Retail").                                        |
| `PERFORMANCEACCOUNT`    | Simulated ID for performance tracking. Can be used as a **proxy key** in a performance table.                |
| `REPRESENTATIVEACCOUNT` | Simulated ID for reporting or marketing ownership. Also a **proxy key** in related commentary tables.        |
| `ISMARKETED`            | Indicates whether the fund is actively marketed to investors (Yes/No).                                       |
| `PARENTPRODUCTCODE`     | Code for the parent or umbrella product (used for hierarchical grouping).                                    |


Keys such as `PRODUCTCODE` may be reused in related tables (e.g., fund performance, holdings, commentary, attribution) as foreign-key proxies.
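
To illustrate how such keys might be used downstream, the sketch below joins a two-row PRODUCTMASTER extract to a hypothetical PERFORMANCE table on `PERFORMANCEACCOUNT`. It uses Python's built-in `sqlite3` module as a stand-in for the warehouse. The PERFORMANCE table, its `PERIOD` and `RETURN_PCT` columns, and the return values are illustrative assumptions, not part of the project schema.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTMASTER (
        PRODUCTCODE        TEXT PRIMARY KEY,
        PRODUCTNAME        TEXT,
        PERFORMANCEACCOUNT TEXT
    );
    -- Hypothetical downstream table; its columns are assumptions.
    CREATE TABLE PERFORMANCE (
        PERFORMANCEACCOUNT TEXT,
        PERIOD             TEXT,
        RETURN_PCT         REAL
    );
""")

conn.executemany("INSERT INTO PRODUCTMASTER VALUES (?, ?, ?)", [
    ("PRD0001", "Nicholson and Sons Buyout Fund", "PA-2679"),
    ("PRD0002", "Hill Group Direct Lending Fund", "PA-9279"),
])
# Made-up returns, for illustration only.
conn.executemany("INSERT INTO PERFORMANCE VALUES (?, ?, ?)", [
    ("PA-2679", "2024-Q1", 2.1),
    ("PA-2679", "2024-Q2", -0.4),
    ("PA-9279", "2024-Q1", 1.3),
])

# PERFORMANCEACCOUNT acts as the proxy key linking the two tables.
joined = conn.execute("""
    SELECT pm.PRODUCTCODE, pm.PRODUCTNAME, pf.PERIOD, pf.RETURN_PCT
    FROM PRODUCTMASTER AS pm
    JOIN PERFORMANCE AS pf
      ON pf.PERFORMANCEACCOUNT = pm.PERFORMANCEACCOUNT
    ORDER BY pm.PRODUCTCODE, pf.PERIOD
""").fetchall()

for row in joined:
    print(row)

conn.close()
```

A join like this only behaves as expected if each `PERFORMANCEACCOUNT` value maps to a single product.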

In [None]:
!pip install faker

Collecting faker
  Downloading faker-37.5.3-py3-none-any.whl.metadata (15 kB)
Downloading faker-37.5.3-py3-none-any.whl (1.9 MB)
Installing collected packages: faker
Successfully installed faker-37.5.3


In [None]:
import pandas as pd
import random
from faker import Faker

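# Set up Faker and seed both random sources for reproducibility
fake = Faker()
Faker.seed(42)   # seeds the generator Faker uses for names
random.seed(42)  # seeds Python's random module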

# Define how many rows of synthetic data to create
NUM_ROWS = 50

# Define realistic categorical values for each relevant field
strategies = ['Buyout', 'Growth Equity', 'Long/Short Equity', 'Direct Lending', 'Renewable Energy']
vehicle_categories = ['LP', 'Feeder', 'Master', 'Offshore']
vehicle_types = ['Delaware LP', 'Cayman SPC', 'UCITS', 'SICAV', 'Trust']
asset_classes = ['Private Equity', 'Hedge Fund', 'Private Credit', 'Infrastructure', 'Real Estate']
share_classes = ['Class A', 'Class B', 'Institutional', 'Retail', 'Founder']
is_marketed_options = ['Yes', 'No']

# Pre-generate unique product codes to reference in parent-child relationships
base_product_codes = [f"PRD{i:04d}" for i in range(1, NUM_ROWS + 1)]

In [None]:
# Initialize empty list to hold each synthetic row
synthetic_data = []

# Main data generation loop
for i in range(NUM_ROWS):
    product_code = base_product_codes[i]  # Unique key for this row
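    # Fund-style name. Note: the strategy word in the name is drawn
    # independently of the STRATEGY column below, so the two can differ.
    product_name = f"{fake.company()} {random.choice(strategies)} Fund"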
    strategy = random.choice(strategies)
    vehicle_category = random.choice(vehicle_categories)
    vehicle_type = random.choice(vehicle_types)
    asset_class = random.choice(asset_classes)
    share_class = random.choice(share_classes)

    # Proxy keys for use in performance/representative tables
    performance_account = f"PA-{random.randint(1000, 9999)}"
    representative_account = f"RA-{random.randint(1000, 9999)}"
    is_marketed = random.choice(is_marketed_options)

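    # Assign a random parent product code, excluding the product itself.
    # Every row receives a parent, so this simulated hierarchy has no root products.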
    parent_product_code = random.choice([code for code in base_product_codes if code != product_code])

    # Construct row as dictionary
    synthetic_data.append({
        'PRODUCTCODE': product_code,
        'PRODUCTNAME': product_name,
        'STRATEGY': strategy,
        'VEHICLECATEGORY': vehicle_category,
        'VEHICLETYPE': vehicle_type,
        'ASSETCLASS': asset_class,
        'SHARECLASS': share_class,
        'PERFORMANCEACCOUNT': performance_account,  # can be used as FK in performance table
        'REPRESENTATIVEACCOUNT': representative_account,  # can be FK in rep table
        'ISMARKETED': is_marketed,
        'PARENTPRODUCTCODE': parent_product_code  # simulated hierarchy
    })
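
Two properties of the loop above matter if the table will be joined downstream. The strategy word in `PRODUCTNAME` is drawn separately from `STRATEGY`, so the two can differ, and `random.randint` can return the same account number for two products. The following is a minimal sketch of one possible variant, not the notebook's method: it draws the strategy once and uses `random.sample` to obtain distinct account numbers. The `placeholder_companies` list is a made-up stand-in for `fake.company()` so that the sketch runs without Faker.

```python
import random

rng = random.Random(42)  # local generator; leaves the global seed untouched
n_rows = 5

strategies = ['Buyout', 'Growth Equity', 'Long/Short Equity',
              'Direct Lending', 'Renewable Energy']
# Placeholder names standing in for fake.company() (assumption for this sketch)
placeholder_companies = ['Alpha Partners', 'Beta Capital', 'Gamma Group',
                         'Delta Holdings', 'Epsilon Advisors']

# random.sample draws without replacement, so the numbers in each list are distinct
performance_ids = rng.sample(range(1000, 10000), n_rows)
representative_ids = rng.sample(range(1000, 10000), n_rows)

rows = []
for i in range(n_rows):
    strategy = rng.choice(strategies)  # drawn once, reused in the name
    rows.append({
        'PRODUCTCODE': f"PRD{i + 1:04d}",
        'PRODUCTNAME': f"{placeholder_companies[i]} {strategy} Fund",
        'STRATEGY': strategy,
        'PERFORMANCEACCOUNT': f"PA-{performance_ids[i]}",
        'REPRESENTATIVEACCOUNT': f"RA-{representative_ids[i]}",
    })

for row in rows:
    print(row['PRODUCTCODE'], row['PRODUCTNAME'], row['PERFORMANCEACCOUNT'])
```

Sampling without replacement caps the number of products at 9,000 per account prefix, which is ample for a 50-row table.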

In [None]:
# Convert to DataFrame
df_funds = pd.DataFrame(synthetic_data)

# Preview first few rows in Colab
df_funds.head()

|   | PRODUCTCODE | PRODUCTNAME | STRATEGY | VEHICLECATEGORY | VEHICLETYPE | ASSETCLASS | SHARECLASS | PERFORMANCEACCOUNT | REPRESENTATIVEACCOUNT | ISMARKETED | PARENTPRODUCTCODE |
| - | ----------- | ----------- | -------- | --------------- | ----------- | ---------- | ---------- | ------------------ | --------------------- | ---------- | ----------------- |
| 0 | PRD0001 | Nicholson and Sons Buyout Fund | Buyout | Master | Cayman SPC | Hedge Fund | Class B | PA-2679 | RA-9935 | Yes | PRD0039 |
| 1 | PRD0002 | Hill Group Direct Lending Fund | Buyout | LP | Delaware LP | Hedge Fund | Class B | PA-9279 | RA-1434 | Yes | PRD0047 |
| 2 | PRD0003 | Meyer and Sons Renewable Energy Fund | Direct Lending | Feeder | SICAV | Real Estate | Institutional | PA-1106 | RA-3615 | No | PRD0023 |
| 3 | PRD0004 | Tyler, Rocha and Meyer Long/Short Equity Fund | Growth Equity | Feeder | UCITS | Private Equity | Class A | PA-7224 | RA-2584 | No | PRD0024 |
| 4 | PRD0005 | Clark Inc Renewable Energy Fund | Long/Short Equity | LP | SICAV | Real Estate | Class A | PA-7201 | RA-2291 | No | PRD0042 |


In [None]:
# Save to CSV (for export and Snowflake/other database usage)
df_funds.to_csv("productmaster_table_synthetic_data.csv", index=False)

print("✅ Synthetic data generated and saved as 'productmaster_table_synthetic_data.csv'")
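
Because `PRODUCTCODE` serves as a primary key and `PARENTPRODUCTCODE` points back into the same table, a few integrity checks can catch problems before the file is loaded anywhere. The helper below is a sketch under that assumption. `check_product_rows` is a hypothetical name, and it expects a list of dictionaries such as `synthetic_data` or `df_funds.to_dict('records')`.

```python
def check_product_rows(rows):
    """Return a list of human-readable integrity problems (empty if none)."""
    problems = []
    codes = [row['PRODUCTCODE'] for row in rows]
    code_set = set(codes)

    # Primary-key check: every PRODUCTCODE must appear exactly once
    if len(code_set) != len(codes):
        problems.append("duplicate PRODUCTCODE values")

    # Hierarchy checks: the parent must exist and must not be the row itself
    for row in rows:
        parent = row['PARENTPRODUCTCODE']
        if parent == row['PRODUCTCODE']:
            problems.append(f"{row['PRODUCTCODE']} is its own parent")
        elif parent not in code_set:
            problems.append(f"{row['PRODUCTCODE']} has unknown parent {parent}")
    return problems


# Small hand-written sample with two deliberate errors
sample_rows = [
    {'PRODUCTCODE': 'PRD0001', 'PARENTPRODUCTCODE': 'PRD0002'},
    {'PRODUCTCODE': 'PRD0002', 'PARENTPRODUCTCODE': 'PRD0001'},
    {'PRODUCTCODE': 'PRD0003', 'PARENTPRODUCTCODE': 'PRD0003'},  # self-reference
    {'PRODUCTCODE': 'PRD0004', 'PARENTPRODUCTCODE': 'PRD0099'},  # parent not in table
]

for problem in check_product_rows(sample_rows):
    print(problem)
```

These checks do not detect cycles (PRD0001 and PRD0002 above point at each other and still pass). Since the generator gives every product a parent, cycles are expected in this synthetic data; detecting them would require walking the parent chain.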

## Potential Snowflake Implementation
The generated data can serve academic or testing purposes, and it can also feed end-to-end data pipelines in which Snowflake is the target data warehouse. With minor adjustments (e.g., schema renaming, batch control), this data can be used in:
* Report generation (e.g., Assette)
* Performance attribution systems
* ESG dashboards
* Fund analytics platforms

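The code below shows a programmatic load using `snowflake.connector` in Python. It reads the Snowflake connection settings (user, password, account, database, schema, warehouse) from environment variables; see example.env for the variables to set. Note that `load_dotenv()` called without arguments looks for a file named `.env`, and that `COPY INTO` loads into an existing table, so `PRODUCTMASTER` must be created in Snowflake beforehand.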

In [None]:
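# Requires: pip install snowflake-connector-python python-dotenv
import os

import snowflake.connector
from dotenv import load_dotenv

# Load credentials from a .env file into environment variables
load_dotenv()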

sf_user = os.getenv("SNOWFLAKE_USER")
sf_password = os.getenv("SNOWFLAKE_PASSWORD")
sf_account = os.getenv("SNOWFLAKE_ACCOUNT")
sf_database = os.getenv("SNOWFLAKE_DATABASE")
sf_schema = os.getenv("SNOWFLAKE_SCHEMA")
sf_warehouse = os.getenv("SNOWFLAKE_WAREHOUSE")

# Connect to Snowflake
conn = snowflake.connector.connect(
    user=sf_user,
    password=sf_password,
    account=sf_account,
    warehouse=sf_warehouse,
    database=sf_database,
    schema=sf_schema
)

cursor = conn.cursor()

# Function to upload DataFrame
def append_to_snowflake(df, table_name):
    try:
        # Create temp CSV
        temp_csv = "/tmp/temp_fund_upload.csv"
        df.to_csv(temp_csv, index=False)

        # Upload the CSV to the table's internal stage (@%table_name)
        cursor.execute(f"PUT file://{temp_csv} @%{table_name} OVERWRITE = TRUE")

        # Copy from the staged CSV into the table, naming the target columns
        # in the same order as the DataFrame's columns
        columns = ",".join(df.columns)
        cursor.execute(f"""
            COPY INTO {table_name} ({columns})
            FROM @%{table_name}
            FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1)
        """)

        print(f"✅ Data appended to {table_name} in Snowflake")

    except Exception as e:
        print("❌ Failed to upload data:", e)
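    finally:
        # Closes the shared cursor and connection, so this function
        # can be called only once per connection
        cursor.close()
        conn.close()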

# Example usage:
append_to_snowflake(df_funds, "PRODUCTMASTER")
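
If any of the `SNOWFLAKE_*` variables is unset, `os.getenv` returns `None` and the resulting connection error can be hard to interpret. A small pre-flight check, run before `snowflake.connector.connect`, may make this easier to diagnose. This is only a sketch: `REQUIRED_VARS` and `missing_settings` are hypothetical names, and the variable list simply mirrors the `os.getenv` calls above.

```python
import os

# Names taken from the os.getenv calls in the load cell
REQUIRED_VARS = [
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_WAREHOUSE",
]


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing Snowflake settings:", ", ".join(missing))
else:
    print("All Snowflake settings are present")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.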