<a href="https://colab.research.google.com/github/Jacob-Rose-BU/Alternative-Investments---Assette-Capstone-Project/blob/main/Fund_Metadata_OpenRouter_GPT_API_Source.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# .gitignore
/.env
/.venv
/__pycache__/
/*.pyc
/*.pyo
/*.pyd
/.pytest_cache/
/*.log
/.mypy_cache/
/*.db
/.DS_Store
# Ignore Python bytecode files
*.py[cod]
# Ignore Jupyter Notebook checkpoints
.ipynb_checkpoints/
# Ignore virtual environment directories
venv/
.env/
# Ignore IDE/editor specific files
.idea/
.vscode/
# Ignore coverage reports
coverage.xml
# Ignore build directories
build/
dist/
# Ignore package distribution files
*.egg-info/
# Ignore configuration files
*.cfg
*.ini

#GPT-Generated Synthetic Data
This notebook uses the OpenRouter API to generate synthetic factsheet commentary for investment funds. Given only a fund name and a reporting period (month/year), it currently produces three templated narrative sections: fund description, strategy summary, and manager commentary.

The outputs are structured to populate the fund_profile table in Snowflake, providing a content layer that complements quantitative holdings, ESG scores, and performance metrics. Those data sources will be fed into generation once the API key issue is resolved and the existing code runs end to end without errors.

##Execution Instructions
To run this notebook:

1. Ensure your OpenRouter API key is stored securely as an environment variable (OPENROUTER_API_KEY, which the code reads into api_key)

2. Run the notebook top to bottom, confirming that each cell completes and that the final section produces output

3. Edit prompt templates as needed for specific tone or format preferences
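
Step 1 can be checked before anything else runs. A minimal sketch, assuming the key is exposed under the name `OPENROUTER_API_KEY` (the name the code cells read); `require_env` is a hypothetical helper, not part of the notebook:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value

# Usage (raises RuntimeError when the key is absent):
# api_key = require_env("OPENROUTER_API_KEY")
```

Failing early this way avoids sending requests with an empty bearer token.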

##File Roadmap
1. Define inputs (fund name and reporting month)

2. Build AI prompt templates for each narrative section:

  * Fund Description

  * Strategy Summary

  * Monthly Manager Commentary

3. Generate text for each section using a structured API call

4. Collect results into a standardized output list for downstream storage or processing

5. Potential Snowflake implementation to pull stock, performance, and ESG data to refine qualitative outputs. This is on hold until the API key is verified to work.

##Next Steps
###GPT API Integration
* Improve prompt wording for more consistent tone and structure across fund types

* Add support for template frameworks (e.g. f-string-driven prompts)

* Validate and screen outputs for safe, compliant phrasing

* Wrap generation logic in a reusable function or class for easier integration
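
The screening item above could start as a simple phrase check. A minimal sketch, assuming a hypothetical list of flagged phrases (`FLAGGED_PHRASES` is illustrative only and is not a compliance standard) and a hypothetical helper `screen_text`:

```python
import re

# Hypothetical examples of phrasing a reviewer might want flagged; not a compliance rule set.
FLAGGED_PHRASES = ["guaranteed return", "risk-free", "cannot lose"]

def screen_text(text: str, phrases=FLAGGED_PHRASES) -> list[str]:
    """Return the flagged phrases found in text (case-insensitive, whole-phrase match)."""
    found = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```

A non-empty result could be used to hold a generated section back for manual review rather than writing it to storage.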

###Snowflake Integration
* Create load_gpt_factsheets.py to push narrative sections into Snowflake

* Structure table with keys: fund_id, as_of_month, field_name, content

* Align GPT narrative with actual structured fund metadata where available
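
The key structure above implies one row per fund, month, and narrative field. A minimal sketch of that reshaping, assuming a generated record is a dict holding `fund_id`, `as_of_month`, and one entry per narrative section (`to_long_rows`, `NARRATIVE_FIELDS`, and the sample values are hypothetical):

```python
NARRATIVE_FIELDS = ("fund_description", "fund_strategy", "manager_commentary")

def to_long_rows(record: dict) -> list[dict]:
    """Split one wide record into one row per narrative field."""
    rows = []
    for field_name in NARRATIVE_FIELDS:
        content = record.get(field_name)
        if content is None:  # skip sections whose generation failed
            continue
        rows.append({
            "fund_id": record["fund_id"],
            "as_of_month": record["as_of_month"],
            "field_name": field_name,
            "content": content,
        })
    return rows

example = {
    "fund_id": "AB12CD34",        # made-up identifier
    "as_of_month": "2024-05",
    "fund_description": "Sample description.",
    "fund_strategy": "Sample strategy.",
    "manager_commentary": None,   # e.g. the API call returned None
}
rows = to_long_rows(example)
```

Keeping the table in this long form means a new narrative section adds rows rather than columns.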

###Enhancements
* Enable batch generation for multiple funds and periods

* Add logic to feed ESG or performance data into GPT context dynamically

* Implement prompt-response logging and versioning for traceability

* Extend support to multilingual output (e.g., Spanish, French)
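
For the logging and versioning item, one option is to append each prompt and response to a JSON Lines file together with a hash of the prompt. A minimal sketch, assuming a hypothetical file name and record layout (`log_generation`, `prompt_version`, and `generation_log.jsonl` are illustrative):

```python
import hashlib
import json
from datetime import datetime, timezone

def log_generation(prompt, response, prompt_version, path="generation_log.jsonl"):
    """Append one prompt/response pair to a JSON Lines log and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        # The hash lets identical prompts be matched without comparing full text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Because each line is an independent JSON object, the log can be appended to across runs and loaded later line by line.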

In [40]:
!pip install python-dotenv



In [41]:
!pip install Faker



In [42]:
import os
import yaml
import random
import string
import pandas as pd
import requests
from datetime import datetime, timedelta
from faker import Faker
from dotenv import load_dotenv
from sqlalchemy import create_engine


In [None]:
# Load environment variables
load_dotenv()

# ENV VARS
api_key = os.getenv("OPENROUTER_API_KEY")
sf_user = os.getenv("SNOWFLAKE_USER")
sf_password = os.getenv("SNOWFLAKE_PASSWORD")
sf_account = os.getenv("SNOWFLAKE_ACCOUNT")
sf_warehouse = os.getenv("SNOWFLAKE_WAREHOUSE")
sf_database = os.getenv("SNOWFLAKE_DATABASE")
sf_schema = os.getenv("SNOWFLAKE_SCHEMA")

# REQUIRED COLUMNS from Snowflake/CSV
REQUIRED_COLS = [
    "PORTFOLIOCODE", "NAME", "INVESTMENTSTYLE", "PORTFOLIOCATEGORY",
    "OPENDATE", "PERFORMANCEINCEPTIONDATE", "TERMINATIONDATE",
    "BASECURRENCYCODE", "BASECURRENCYNAME", "ISBEGINOFDAYPERFORMANCE", "PRODUCTCODE"
]

# ----------- LOAD TABLE FROM SNOWFLAKE -----------
def get_snowflake_engine():
    return create_engine(
        f"snowflake://{sf_user}:{sf_password}@{sf_account}/{sf_database}/{sf_schema}?warehouse={sf_warehouse}"
    )

def load_portfolio_data(source="snowflake", csv_path=None):
    if source == "csv" and csv_path:
        df = pd.read_csv(csv_path)
    else:
        engine = get_snowflake_engine()
        query = "SELECT * FROM AST_ALTERNATIVES_DB.DBO.PORTFOLIOGENERALINFORMATION"
        df = pd.read_sql(query, engine)

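    # Normalize column names first: the Snowflake SQLAlchemy dialect typically
    # returns unquoted column names in lowercase, which would not match REQUIRED_COLS.
    df.columns = df.columns.str.upper()
    df = df[REQUIRED_COLS]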
    df = df.dropna(subset=["PORTFOLIOCODE", "NAME"])  # Basic check
    return df
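
One caveat with building the connection string by interpolation: a password containing characters such as `@`, `/`, or `:` will break URL parsing unless it is percent-encoded. A minimal sketch using only the standard library, with a hypothetical helper `build_snowflake_url` and made-up credentials:

```python
from urllib.parse import quote

def build_snowflake_url(user, password, account, database, schema, warehouse):
    """Build a snowflake:// URL with the user and password percent-encoded."""
    return (
        f"snowflake://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{account}/{database}/{schema}?warehouse={quote(warehouse, safe='')}"
    )

# Made-up values, for illustration only.
url = build_snowflake_url("analyst", "p@ss/word", "my_account", "MY_DB", "DBO", "MY_WH")
```

The resulting string could be passed to `create_engine` in place of the f-string used in `get_snowflake_engine`.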

In [43]:
from google.colab import files
uploaded = files.upload()

Saving openrouterkey.env.txt to openrouterkey.env (2).txt


In [44]:
# Load environment variables
load_dotenv()

# Get the API key from environment variables
api_key = os.getenv("OPENROUTER_API_KEY")

if not api_key:
    raise ValueError("API key not found. Make sure it's defined in the .env file.")

# url = "https://openrouter.ai/api/v1/chat/completions"

# Corrected POST-based API request function
def fetch_data_from_api(prompt):
    url = "https://openrouter.ai/api/v1/chat/completions"

    headers = {
        "Authorization": f"Bearer {api_key}",
        "HTTP-Referer": "https://colab.research.google.com",  # Must match whitelisted origin
        "Content-Type": "application/json"
    }

    payload = {
        "model": "mistralai/mistral-7b-instruct",  # You can change this to any available OpenRouter model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 300
    }

    try:
        response = requests.post(url, headers=headers, json=payload, timeout=15)
        response.raise_for_status()  # Raises HTTPError for 4xx or 5xx status

        data = response.json()

        if "choices" not in data or not data["choices"]:
            raise ValueError("No choices returned in API response.")

        return data["choices"][0]["message"]["content"].strip()

    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err} - Status code: {response.status_code}")
        print("Response content:", response.text)
    except requests.exceptions.RequestException as req_err:
        print(f"Request error: {req_err}")
    except ValueError as ve:
        print(f"Data error: {ve}")
    except Exception as e:
        print(f"Unexpected error: {e}")

    return None
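
The parsing step in `fetch_data_from_api` can be isolated and exercised without a network call. A minimal sketch, assuming the response body has the `choices[0].message.content` shape that the function above reads; `extract_content` and the sample payload are hypothetical:

```python
def extract_content(data: dict) -> str:
    """Return the stripped text of the first choice, or raise ValueError if absent."""
    choices = data.get("choices")
    if not choices:
        raise ValueError("No choices returned in API response.")
    return choices[0]["message"]["content"].strip()

# Hand-written payload mimicking the shape read above; not a captured API response.
sample = {"choices": [{"message": {"role": "assistant", "content": "  Hello.  "}}]}
text = extract_content(sample)
```

Separating parsing from the HTTP call makes the error paths easy to check with hand-written payloads.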

In [45]:
# def load_config(config_path="config.yaml"):
#     with open(config_path, "r") as file:
#         return yaml.safe_load(file)

# # ---------------------------
# # Generate synthetic fund records
# # ---------------------------
# def random_date(start_year):
#     start_date = datetime(start_year, 1, 1)
#     end_date = datetime.today()
#     return start_date + timedelta(days=random.randint(0, (end_date - start_date).days))

# def generate_fund_id():
#     return ''.join(random.choices(string.ascii_uppercase + string.digits, k=8))

# # portfolio
# def generate_funds(config):
#     faker = Faker()
#     funds = []
#     for _ in range(config["fund_generation"]["num_fund"]):
#         fund = {
#             "fund_id": generate_fund_id(),
#             "fund_name": faker.company(),
#             "fund_type": random.choice(config["fund_generation"]["fund_types"]),
#             "strategy": random.choice(config["fund_generation"]["strategies"]),
#             "inception_date": random_date(config["fund_generation"]["start_year"]).strftime("%Y-%m-%d"),
#             "benchmark_code": random.choice(config["fund_generation"]["benchmark_codes"]),
#             "status": random.choice(config["fund_generation"]["statuses"]),
#             "as_of_date": datetime.today().strftime("%Y-%m-%d")  # Add today's date as as_of_date
#         }
#         funds.append(fund)
#     return pd.DataFrame(funds)

In [46]:
# # EXAMPLE
# import os
# import requests
# from dotenv import load_dotenv

# load_dotenv()
# api_key = os.getenv("API_KEY")

# url = "https://openrouter.ai/api/v1/chat/completions"

# headers = {
#     "Authorization": f"Bearer {api_key}",
#     "HTTP-Referer": "https://colab.research.google.com",  # <- THIS MUST MATCH WHITELIST
#     "Content-Type": "application/json"
# }

# payload = {
#     "model": "mistralai/mistral-7b-instruct",
#     "messages": [{"role": "user", "content": "Explain clean energy in one sentence."}],
#     "temperature": 0.7,
#     "max_tokens": 150
# }

# try:
#     response = requests.post(url, headers=headers, json=payload)
#     response.raise_for_status()
#     print("✅ Response:", response.json()["choices"][0]["message"]["content"])
# except requests.exceptions.HTTPError as e:
#     print(f"❌ HTTP error: {e} - Status Code: {response.status_code}")
#     print("🔍 Response content:", response.text)

✅ Response:  Clean energy refers to power sources that are renewable, sustainable, and produce little to no pollution, such as solar, wind, hydro, and geothermal power, as opposed to fossil fuels that deplete naturally and contribute significantly to greenhouse gas emissions.


In [47]:
def create_fund_description_prompt(fund):
    return f"""
Write a concise description for a hypothetical alternatives investment fund named {fund['fund_name']}.
The fund was launched in {fund['inception_date']} and is currently {fund['status']}.
It follows a {fund['strategy']} strategy and is benchmarked to {fund['benchmark_code']}.
The fund focuses on {fund['fund_type'].lower()} investors seeking exposure to sustainable and impact-driven projects.
Describe its core investment areas, impact goals, and investor appeal.
"""

def create_strategy_prompt(fund):
    return f"""
Summarize the investment strategy of the U.S.-based fund named {fund['fund_name']}, launched in {fund['inception_date']}.
It employs a {fund['strategy']} approach and focuses on scaling sustainable infrastructure or clean energy in underserved regions.
Benchmark reference: {fund['benchmark_code']}.
Write in an institutional tone. Limit to 100 words.
"""

def create_manager_commentary_prompt(fund):
    # Convert as_of_date (YYYY-MM-DD) → "Month Year" (e.g., "May 2024")
    as_of_str = fund.get("as_of_date", "")
    try:
        month_year = datetime.strptime(as_of_str, "%Y-%m-%d").strftime("%B %Y")
    except Exception:
        month_year = "a recent period"

    return f"""
Provide a professional manager commentary for the fund {fund['fund_name']} for the month of {month_year}.
The fund delivered stable returns and impact outcomes aligned with its {fund['strategy']} strategy.
Include:
- Commentary on clean tech or sustainability themes
- A notable challenge or risk
- A brief forward-looking perspective on the market
Target a concise and polished institutional tone.
"""

In [None]:
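# ----------- PROMPT GENERATION -----------
# NOTE: these functions reuse the names defined in the previous cell. Running this
# cell replaces the earlier versions (keyed on fund_name, strategy, ...) with
# versions keyed on the Snowflake columns (NAME, OPENDATE, ...).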
def create_fund_description_prompt(row):
    return f"""
Write a concise description for a hypothetical fund named {row['NAME']}.
Launched on {row['OPENDATE']}, this fund is categorized as {row['PORTFOLIOCATEGORY']} with an investment style of {row['INVESTMENTSTYLE']}.
It is denominated in {row['BASECURRENCYNAME']} and serves as part of product line {row['PRODUCTCODE']}.
Describe its key themes, goals, and investor appeal.
"""

def create_strategy_prompt(row):
    return f"""
Summarize the investment strategy for the fund {row['NAME']}, opened on {row['OPENDATE']}, classified as {row['PORTFOLIOCATEGORY']} and following a {row['INVESTMENTSTYLE']} style.
The fund operates in {row['BASECURRENCYNAME']} and is intended for investors seeking structured long-term outcomes.
Use an institutional tone. Keep under 100 words.
"""

def create_manager_commentary_prompt(row):
    # Use the current month as the reporting period
    month_year = datetime.today().strftime("%B %Y")

    return f"""
Provide a professional manager commentary for the fund {row['NAME']} for {month_year}.
Include:
- Commentary on sustainability themes or macro conditions
- A notable challenge or performance risk
- A forward-looking view
Write in a polished institutional tone.
"""

In [48]:
def generate_qualitative_paragraphs(df):
    qualitative_rows = []

    for _, fund in df.iterrows():
        as_of_date = fund.get("as_of_date", "")
        try:
            month_year = datetime.strptime(as_of_date, "%Y-%m-%d").strftime("%B %Y")
        except (TypeError, ValueError):  # missing or malformed as_of_date
            month_year = "Unknown"

        desc_prompt = create_fund_description_prompt(fund)
        strat_prompt = create_strategy_prompt(fund)
        comm_prompt = create_manager_commentary_prompt(fund)

        print(f"📝 Generating for {fund['fund_name']}")

        fund_description = fetch_data_from_api(desc_prompt)
        fund_strategy = fetch_data_from_api(strat_prompt)
        fund_commentary = fetch_data_from_api(comm_prompt)

        qualitative_rows.append({
            **fund,
            "fund_description": fund_description,
            "fund_strategy": fund_strategy,
            f"fund_commentary_{month_year}": fund_commentary
        })

    return pd.DataFrame(qualitative_rows)
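
Both `create_manager_commentary_prompt` and `generate_qualitative_paragraphs` convert `as_of_date` to a month and year, so that step can be factored into one helper and checked on its own. A minimal sketch (`format_month_year` is a hypothetical name; English month names assume the default C locale):

```python
from datetime import datetime

def format_month_year(as_of_date, fallback="a recent period"):
    """Convert 'YYYY-MM-DD' to 'Month Year'; return fallback on bad input."""
    try:
        return datetime.strptime(as_of_date, "%Y-%m-%d").strftime("%B %Y")
    except (TypeError, ValueError):
        return fallback
```

The `fallback` argument lets each caller keep its own placeholder, for example `"Unknown"` when the value is used in a column name.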

In [None]:
# ----------- GENERATE PARAGRAPHS -----------
def generate_qualitative_for_all(df):
    rows = []
    for _, row in df.iterrows():
        print(f"🧠 Generating for {row['NAME']}")

        desc = fetch_data_from_api(create_fund_description_prompt(row))
        strat = fetch_data_from_api(create_strategy_prompt(row))
        comm = fetch_data_from_api(create_manager_commentary_prompt(row))

        rows.append({
            **row,
            "FUND_DESCRIPTION": desc,
            "FUND_STRATEGY": strat,
            "MANAGER_COMMENTARY": comm
        })

    return pd.DataFrame(rows)

In [None]:
# ----------- WRITE BACK TO SNOWFLAKE -----------
def append_to_snowflake_table(df, table="PORTFOLIOGENERALINFORMATION"):
    engine = get_snowflake_engine()
    df.to_sql(table, con=engine, if_exists="append", index=False)
    print("✅ Appended enriched rows to Snowflake.")

# ----------- MAIN -----------
if __name__ == "__main__":
    portfolio_df = load_portfolio_data(source="snowflake")
    enriched_df = generate_qualitative_for_all(portfolio_df)
    append_to_snowflake_table(enriched_df)

In [50]:
if __name__ == "__main__":
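    # NOTE: fund_df must already exist in the session. To build it from config.yaml,
    # uncomment the two lines below and the load_config / generate_funds cell above.
    # config = load_config("config.yaml")
    # fund_df = generate_funds(config)
    enriched_df = generate_qualitative_paragraphs(fund_df)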

    enriched_df.to_csv("fund_qualitative_outputs.csv", index=False)
    print("✅ Generated qualitative fund data saved to fund_qualitative_outputs.csv")

📝 Generating for Collins-Gutierrez
📝 Generating for Ray and Sons
📝 Generating for Ramos, Williams and Kennedy
📝 Generating for Wiggins, Wilkerson and Harvey
📝 Generating for Berry, Perry and Carlson
📝 Generating for Blake Inc
📝 Generating for Sanchez LLC
📝 Generating for Bernard, Gillespie and Walker
📝 Generating for Middleton, Acosta and English
📝 Generating for Joseph-Brennan
✅ Generated qualitative fund data saved to fund_qualitative_outputs.csv


#Potential Snowflake Implementation with Stock and ESG Data

In [None]:
!pip install snowflake-connector-python
!pip install pandas

In [None]:
import snowflake.connector

# Load credentials
load_dotenv()

sf_user = os.getenv("SNOWFLAKE_USER")
sf_password = os.getenv("SNOWFLAKE_PASSWORD")
sf_account = os.getenv("SNOWFLAKE_ACCOUNT")
sf_database = os.getenv("SNOWFLAKE_DATABASE")
sf_schema = os.getenv("SNOWFLAKE_SCHEMA")
sf_warehouse = os.getenv("SNOWFLAKE_WAREHOUSE")

# Connect to Snowflake
conn = snowflake.connector.connect(
    user=sf_user,
    password=sf_password,
    account=sf_account,
    warehouse=sf_warehouse,
    database=sf_database,
    schema=sf_schema
)

cursor = conn.cursor()

# Function to upload DataFrame
def append_to_snowflake(df, table_name):
    try:
        # Create temp CSV
        temp_csv = "/tmp/temp_fund_upload.csv"
        df.to_csv(temp_csv, index=False)

        # Upload the CSV to the table's internal stage (@%table)
        cursor.execute(f"PUT file://{temp_csv} @%{table_name} OVERWRITE = TRUE")

        # Copy from staged CSV to table, mapping CSV fields to the DataFrame's columns
        # (assumes the column names are valid unquoted Snowflake identifiers)
        columns = ",".join(df.columns)
        cursor.execute(f"""
            COPY INTO {table_name} ({columns})
            FROM @%{table_name}
            FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1)
        """)

        print(f"✅ Data appended to {table_name} in Snowflake")

    except Exception as e:
        print("❌ Failed to upload data:", e)
    finally:
        cursor.close()
        conn.close()

# Example usage:
append_to_snowflake(enriched_df, "PORTFOLIOGENERALINFORMATION")