<a href="https://colab.research.google.com/github/Jacob-Rose-BU/Alternative-Investments---Assette-Capstone-Project/blob/main/yFinance_Related_Tables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ESG Equity Fact Sheet - YFinance Data Pipeline**

This notebook pulls real financial data from the Yahoo Finance API via the yfinance library. It generates tables for Security Master, ESG data, and historical price performance for U.S. equities. The extracted data is prepared for loading into Snowflake for downstream use in ESG fund fact sheets.


### **Execution Instructions**

**To run this notebook:**
1. Update your Snowflake credentials in the environment or connection file.
2. Run the notebook sequentially from top to bottom.

### **File Roadmap**
Pull S&P 500, NASDAQ 100, Dow Jones tickers <br>
Pull yfinance data for valid tickers <br>
Extract and clean ESG and performance history <br>
Push to Snowflake

**Output:** 3 tables in Snowflake (security_master, esg_stock_data, stock_performance_history)


### **Next Steps**
#### **yfinance**
- Improve ESG completeness check (what to do when ESG data is not given in yfinance - maybe pull in ESG API)
- Add performance benchmark (SPESG & SUSL)

#### **Snowflake SQL Documentation**
- Write code for Holdings creation in Snowflake (Friday Conversation w/ Corey)
- Create documentation for reusable steps for top 10 holdings by weight
- Create documentation for reusable steps for fund level ESG score aggregation
- Create documentation for reusable steps for fund performance versus benchmark

### **Future Improvement:**
- Automate periodic data refresh
- Add additional tickers
- Backfill daily performance

# **Connect to Snowflake**


To load data into Snowflake, we established a secure connection using credentials stored in a .env file. This connection allows us to push data directly from Python. The pipeline is designed to check if tables already exist, create them if needed, and merge new data while avoiding duplicates. This setup enables seamless integration between our local data processing and Snowflake's cloud warehouse, supporting scalable, centralized storage for downstream analytics like ESG reporting and fact sheet generation.
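The create-if-missing and insert-only merge behavior described above can be tried locally without a Snowflake account. The sketch below is an analog built on Python's built-in sqlite3 module, not the Snowflake code used in this notebook; the helper name `merge_new_rows` and the sample rows are hypothetical, and every column is stored as TEXT for simplicity.

```python
import sqlite3

def merge_new_rows(conn, table, rows, key_cols, all_cols):
    """Create `table` if needed, then insert only rows whose key is not present yet."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in all_cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    key_match = " AND ".join(f't."{c}" = ?' for c in key_cols)
    col_list = ", ".join(f'"{c}"' for c in all_cols)
    placeholders = ", ".join("?" for _ in all_cols)

    inserted = 0
    for row in rows:
        # Look up the row by its unique key before inserting.
        exists = conn.execute(
            f'SELECT 1 FROM "{table}" t WHERE {key_match} LIMIT 1',
            [row[c] for c in key_cols],
        ).fetchone()
        if exists is None:
            conn.execute(
                f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
                [row[c] for c in all_cols],
            )
            inserted += 1
    conn.commit()
    return inserted

conn = sqlite3.connect(":memory:")
rows = [
    {"ticker": "AAPL", "name": "Apple Inc."},
    {"ticker": "MSFT", "name": "Microsoft Corporation"},
]
first = merge_new_rows(conn, "SECURITY_MASTER", rows, ["ticker"], ["ticker", "name"])
second = merge_new_rows(conn, "SECURITY_MASTER", rows, ["ticker"], ["ticker", "name"])
print(first, second)  # the second run inserts nothing because the keys already exist
```

Running the same load twice leaves the table unchanged, which is the property the pipeline relies on when the notebook is re-run.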

In [1]:
#load the .env file
from google.colab import files
files.upload()

Saving .env.txt to .env.txt


{'.env.txt': b'SNOWFLAKE_ACCOUNT=assette-ssappoc\nSNOWFLAKE_USER=CRYSTALL\nSNOWFLAKE_PASSWORD=<redacted>\nSNOWFLAKE_ROLE=AST_ALTERNATIVES_DB_RW\nSNOWFLAKE_WAREHOUSE=AST_BU_WH\nSNOWFLAKE_DATABASE=AST_ALTERNATIVES_DB\nSNOWFLAKE_SCHEMA=DBO'}

In [2]:
#rename the file if needed
import os

if os.path.exists(".env.txt"):
    os.rename(".env.txt", ".env")
    print("Renamed .env.txt to .env")
else:
    print("File not found. Make sure you uploaded .env.txt.")

Renamed .env.txt to .env


In [3]:
!pip install snowflake-connector-python python-dotenv

Collecting snowflake-connector-python
  Downloading snowflake_connector_python-3.16.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (71 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Collecting asn1crypto<2.0.0,>0.24.0 (from snowflake-connector-python)
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl.metadata (13 kB)
Collecting boto3>=1.24 (from snowflake-connector-python)
  Downloading boto3-1.40.3-py3-none-any.whl.metadata (6.7 kB)
Collecting botocore>=1.24 (from snowflake-connector-python)
  Downloading botocore-1.40.3-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.24->snowflake-connector-python)
  Downloading jmespath-1.0.1-py3-none-any

In [4]:
import os
from dotenv import load_dotenv
import snowflake.connector

# Load .env file data
load_dotenv(".env")


True
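`load_dotenv` returning `True` only tells us that the file was found and parsed, not that every setting is present. A small pre-flight check along these lines could catch a missing value before the connection attempt fails. This is a sketch: the helper name `missing_snowflake_vars` is hypothetical, and the variable names are the ones read by `get_snowflake_connection` in this notebook.

```python
import os

# Settings that get_snowflake_connection() reads from the environment.
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]

def missing_snowflake_vars(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Illustrative values only; in the notebook, call missing_snowflake_vars() with no argument.
example_env = {"SNOWFLAKE_ACCOUNT": "my_account", "SNOWFLAKE_USER": "my_user"}
print(missing_snowflake_vars(example_env))
```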

In [5]:
#use .env parameters to connect to Snowflake
def get_snowflake_connection():
    return snowflake.connector.connect(
        user=os.getenv("SNOWFLAKE_USER"),
        password=os.getenv("SNOWFLAKE_PASSWORD"),
        account=os.getenv("SNOWFLAKE_ACCOUNT"),
        role=os.getenv("SNOWFLAKE_ROLE"),
        warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
        database=os.getenv("SNOWFLAKE_DATABASE"),
        schema=os.getenv("SNOWFLAKE_SCHEMA")
    )
#open an authenticated connection to Snowflake
connection = get_snowflake_connection()
#create a cursor for running SQL commands
cursor = connection.cursor()


In [6]:
import pandas as pd  # needed by map_dtype_to_snowflake below
from snowflake.connector.pandas_tools import write_pandas

def safe_quote(col: str) -> str:
    """
    Ensures column names are safely quoted for Snowflake SQL syntax.
    Replaces internal quotes and wraps the name in double quotes.
    """
    col = str(col).replace('"', '""').strip()
    return f'"{col}"'

def map_dtype_to_snowflake(dtype):
    """
    Maps pandas dtypes to Snowflake SQL data types.
    """
    if pd.api.types.is_float_dtype(dtype):
        return "FLOAT"
    elif pd.api.types.is_integer_dtype(dtype):
        return "NUMBER"
    else:
        return "VARCHAR"

def load_to_snowflake_merge(df, table_name, conn, unique_keys):
    """
    Uploads DataFrame to Snowflake with type inference and merge logic.
    """
    cur = conn.cursor()
    df_cols = df.columns.tolist()
    temp_table = f"{table_name}_STAGING"

    # Step 1: Infer column types and create table if needed
    col_defs = ", ".join([
        f"{safe_quote(col)} {map_dtype_to_snowflake(df[col].dtype)}"
        for col in df_cols
    ])
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({col_defs})")

    # Step 2: Add any missing columns to the main table
    cur.execute(f"DESC TABLE {table_name}")
    existing_cols = {row[0].upper() for row in cur.fetchall()}
    for col in df_cols:
        if col.upper() not in existing_cols:
            col_type = map_dtype_to_snowflake(df[col].dtype)
            cur.execute(f"ALTER TABLE {table_name} ADD COLUMN {safe_quote(col)} {col_type}")

    # Step 3: Create staging table
    cur.execute(f"CREATE OR REPLACE TABLE {temp_table} ({col_defs})")
    write_pandas(conn, df, temp_table)

    # Step 4: Merge without duplication
    on_clause = " AND ".join([f"t.{safe_quote(col)} = s.{safe_quote(col)}" for col in unique_keys])
    insert_cols = ", ".join([safe_quote(col) for col in df_cols])
    insert_vals = ", ".join([f"s.{safe_quote(col)}" for col in df_cols])

    merge_stmt = f"""
        MERGE INTO {table_name} t
        USING {temp_table} s
        ON {on_clause}
        WHEN NOT MATCHED THEN
            INSERT ({insert_cols}) VALUES ({insert_vals})
    """
    cur.execute(merge_stmt)

    # Step 5: Clean up
    cur.execute(f"DROP TABLE IF EXISTS {temp_table}")
    cur.close()
    conn.close()

    print(f"{table_name} updated. Duplicates prevented using keys: {unique_keys}")


In [7]:
#test the connection
cursor.execute("SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_DATABASE(), CURRENT_DATE;")

for row in cursor:
    print(row)

('CRYSTALL', 'AST_ALTERNATIVES_DB_RW', 'AST_ALTERNATIVES_DB', datetime.date(2025, 8, 5))


In [8]:
#close SQL cursor
cursor.close()
#close connection to snowflake
connection.close()

# **Securities List**

This code builds a clean, verified list of stocks by scraping three major US equity indices from Wikipedia: the S&P 500, the Dow Jones Industrial Average, and the NASDAQ-100. Each constituent list is retrieved with requests and parsed with BeautifulSoup. The extracted tickers are combined into a single list, deduplicated across the three indices, and cleaned to match the expected format. To ensure only valid tickers are included, a function is defined to check that Yahoo Finance returns metadata for a symbol.
<br> <br>
The indices chosen (S&P 500, Dow Jones Industrial Average, NASDAQ-100) represent large, liquid, and well-known US companies. They are likely to be included in popular retail and institutional funds, making them a reasonable starting point for building fund simulations and a security master. The decision to limit the scope to these indices was intentional: by focusing on high-confidence symbols, the code minimizes errors and avoids excessive querying that could trigger rate limits or bans from the Yahoo Finance API.
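The combine, clean, and deduplicate step can be illustrated on its own. The sketch below is a minimal version under a few assumptions: the helper name `normalize_tickers` is hypothetical, and the whitespace stripping and upper-casing are extra safeguards that the notebook code does not apply.

```python
def normalize_tickers(*ticker_lists):
    """Merge several ticker lists into one sorted, deduplicated list."""
    cleaned = set()
    for tickers in ticker_lists:
        for raw in tickers:
            # Yahoo Finance uses hyphens where index lists use dots (BRK.B -> BRK-B).
            symbol = raw.strip().upper().replace(".", "-")
            if symbol:
                cleaned.add(symbol)
    return sorted(cleaned)

combined = normalize_tickers(["BRK.B", "AAPL "], ["AAPL", "MSFT"], ["msft", ""])
print(combined)  # ['AAPL', 'BRK-B', 'MSFT']
```

Cleaning before deduplicating means that two spellings of the same symbol collapse into one entry.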

In [9]:
import yfinance as yf
import pandas as pd
import time
from bs4 import BeautifulSoup
import requests

#get tickers from sp500
def get_sp500_tickers():
    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
    soup = BeautifulSoup(requests.get(url).text, "lxml")
    table = soup.find("table", {"id": "constituents"})
    return [row.find_all("td")[0].text.strip() for row in table.find_all("tr")[1:]]

#get tickers from Dow Jones Industrial Average
def get_dow_tickers():
    url = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"
    soup = BeautifulSoup(requests.get(url).text, "lxml")
    table = soup.find("table", {"id": "constituents"})
    return [
        row.find_all("td")[1].find("a").text.strip()
        for row in table.find_all("tr")[1:]
        if len(row.find_all("td")) >= 2
    ]

#get tickers from NASDAQ-100
def get_nasdaq100_tickers():
    url = "https://en.wikipedia.org/wiki/NASDAQ-100"
    soup = BeautifulSoup(requests.get(url).text, "lxml")
    table = soup.find("table", {"id": "constituents"})
    return [
        row.find_all("td")[0].text.strip()
        for row in table.find_all("tr")[1:]
        if len(row.find_all("td")) >= 1
    ]


#validate that these are real tickers
def is_valid_ticker(ticker):
    try:
        info = yf.Ticker(ticker).info
        return "shortName" in info
    except Exception:
        return False

#pull all tickers from all of the above sources
sp500 = get_sp500_tickers()
dow = get_dow_tickers()
nasdaq = get_nasdaq100_tickers()

#combine and make tickers unique
all_tickers = sorted(set(sp500 + dow + nasdaq))

#data cleaning
all_tickers = [t.replace('.', '-') for t in all_tickers]


# **Insert yFinance Data into Snowflake Tables**


This code builds three key datasets for a list of securities: a security master table, ESG scores, and 10 years of historical price performance. Using the Yahoo Finance API, the script loops through each ticker and retrieves metadata such as company name, sector, industry, market cap, and trading exchange. It also attempts to fetch ESG-related metrics (if available) and daily price history over the past 10 years. Security master records are stored in one DataFrame; price history and ESG rows are stored together in a second DataFrame, distinguished by a data_type column. If a ticker fails to return valid metadata or historical price data, it is added to a failed-ticker list. Once the data is collected, the script standardizes data formatting and pushes each DataFrame directly to Snowflake using a merge strategy that avoids duplicates based on defined unique keys. <br> <br>
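The loop-and-log pattern can be sketched independently of yfinance. The notebook itself uses a fixed one-second pause between tickers; the version below swaps in retries with exponential backoff as one possible hardening, which is an assumption rather than what the pipeline does. `fetch_with_retry`, `collect`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real Yahoo Finance call.

```python
import time

def fetch_with_retry(fetch, symbol, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(symbol), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fetch(symbol)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller log the failure
            sleep(base_delay * (2 ** attempt))

def collect(symbols, fetch, **retry_kwargs):
    """Return (results, failed): data per symbol plus symbols that errored or came back empty."""
    results, failed = {}, []
    for symbol in symbols:
        try:
            data = fetch_with_retry(fetch, symbol, **retry_kwargs)
        except Exception:
            failed.append(symbol)
            continue
        if not data:
            failed.append(symbol)
            continue
        results[symbol] = data
    return results, failed

# Stand-in for the real API call: one flaky symbol, one bad symbol, one empty response.
calls = {"FLAKY": 0}

def fake_fetch(symbol):
    if symbol == "BAD":
        raise ValueError("not found")
    if symbol == "FLAKY":
        calls["FLAKY"] += 1
        if calls["FLAKY"] < 2:
            raise ConnectionError("temporary failure")
    if symbol == "EMPTY":
        return {}
    return {"shortName": symbol}

delays = []  # record the requested pauses instead of actually sleeping
results, failed = collect(["AAPL", "FLAKY", "BAD", "EMPTY"], fake_fetch, sleep=delays.append)
print(sorted(results), failed, delays)
```

Passing the sleep function as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.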
Initially, the code included additional tables such as price snapshots, fundamentals, and analyst estimates. However, we removed these from the pipeline because they aren't directly used in our target deliverable, the fund fact sheet. While they may be useful in broader portfolio analytics or internal risk assessments, they were out of scope for this specific task. <br>
The security master table is foundational to the pipeline, as it centralizes all core attributes of the securities in our dataset. It ensures consistency and enables future joins with holdings, ESG metrics, and price data. We also collect daily price history over a 10-year horizon to support fund-level performance analysis, including quarter-over-quarter or year-over-year changes. Ideally, this time range should be extended further to reflect real-world investment horizons more accurately. However, we limited the query to 10 years to avoid triggering API rate limits or blocks from Yahoo Finance. A long-term enhancement would be to periodically refresh historical data and build a more robust price history system over time. <br> <br>
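As an illustration of how daily closes support a quarter-over-quarter comparison, the sketch below takes the last close in each calendar quarter and compares consecutive quarters. It uses only the standard library; the function names and sample prices are hypothetical, and it assumes there are no missing quarters in the series.

```python
from datetime import date

def quarter_end_closes(daily_closes):
    """Map (year, quarter) to the last available close in that quarter."""
    last = {}
    for day, close in sorted(daily_closes):
        key = (day.year, (day.month - 1) // 3 + 1)
        last[key] = close  # later dates in the same quarter overwrite earlier ones
    return last

def quarter_over_quarter(daily_closes):
    """Return the fractional change between consecutive quarter-end closes."""
    closes = quarter_end_closes(daily_closes)
    keys = sorted(closes)
    return {cur: closes[cur] / closes[prev] - 1 for prev, cur in zip(keys, keys[1:])}

# Illustrative prices, not real market data.
sample = [
    (date(2025, 1, 2), 100.0),
    (date(2025, 3, 31), 110.0),
    (date(2025, 4, 1), 111.0),
    (date(2025, 6, 30), 121.0),
]
changes = quarter_over_quarter(sample)
print(changes)  # Q2 2025 versus Q1 2025: 121 / 110 - 1, roughly a 10% gain
```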
While this current version is designed to run once and populate Snowflake, future iterations could introduce automated checks. For example, if a security appears in holdings but is missing from the security master, the system should automatically fetch and populate its metadata from Yahoo Finance. This would ensure the pipeline remains dynamic and scalable as fund compositions evolve.
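The automated check described here starts with finding which held tickers the security master does not know about. A minimal sketch, assuming both sides are available as plain lists of ticker strings (the helper name `missing_from_security_master` and the sample tickers are hypothetical):

```python
def missing_from_security_master(holdings_tickers, master_tickers):
    """Return held tickers absent from the security master, in first-seen order."""
    known = {t.upper() for t in master_tickers}
    seen, missing = set(), []
    for t in holdings_tickers:
        symbol = t.upper()
        if symbol not in known and symbol not in seen:
            seen.add(symbol)
            missing.append(symbol)
    return missing

to_fetch = missing_from_security_master(
    ["AAPL", "NVDA", "TSM", "TSM"],   # tickers appearing in holdings
    ["AAPL", "NVDA", "MSFT"],         # tickers already in the security master
)
print(to_fetch)  # ['TSM']
```

The returned list could then be passed to the same metadata loop used for the initial load.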

In [81]:
#14 mins to run
import yfinance as yf
import datetime
import pandas as pd
import time

# Initialize lists
security_master = []
combined_data = []
failed_tickers = []

# --- Date range for historical price data (last 10 years) ---
start_date = (datetime.datetime.today() - datetime.timedelta(days=365 * 10)).strftime('%Y-%m-%d')
end_date = datetime.datetime.today().strftime('%Y-%m-%d')

for symbol in all_tickers:
    try:
        t = yf.Ticker(symbol)
        info = t.info

        # Skip invalid tickers
        if not info or "shortName" not in info:
            print(f"No valid info for {symbol}")
            failed_tickers.append(symbol)
            continue

        # --- Security Master ---
        security_master.append({
            "ticker": symbol,
            "shortName": info.get("shortName"),
            "name": info.get("longName"),
            "sector": info.get("sector"),
            "industry": info.get("industry"),
            "exchange": info.get("exchange"),
            "currency": info.get("currency"),
            "country": info.get("country"),
            "market_cap": info.get("marketCap")
        })

        # --- Historical Performance ---
        hist = t.history(start=start_date, end=end_date)
        if hist.empty:
            print(f"No price history for {symbol}")
            failed_tickers.append(symbol)
            continue

        hist = hist.reset_index()
        hist["ticker"] = symbol
        hist["data_type"] = "price"
        combined_data.append(hist)

        # --- ESG as single-row record ---
        sustainability = t.sustainability
        if sustainability is not None and not sustainability.empty:
            row = sustainability.transpose()
            esg_row = {
                "Date": pd.to_datetime("today"),
                "Open": None,
                "High": None,
                "Low": None,
                "Close": None,
                "Volume": None,
                "Dividends": None,
                "Stock Splits": None,
                "ticker": symbol,
                "data_type": "esg",
                "esgPerformance": row.get("esgPerformance", {}).values[0] if "esgPerformance" in row else None,
                "totalEsg": row.get("totalEsg", {}).values[0] if "totalEsg" in row else None,
                "environmentScore": row.get("environmentScore", {}).values[0] if "environmentScore" in row else None,
                "socialScore": row.get("socialScore", {}).values[0] if "socialScore" in row else None,
                "governanceScore": row.get("governanceScore", {}).values[0] if "governanceScore" in row else None,
                "highestControversy": row.get("highestControversy", {}).values[0] if "highestControversy" in row else None
            }
            combined_data.append(pd.DataFrame([esg_row]))

        # Sleep to avoid hitting API rate limits
        time.sleep(1)

    except Exception as e:
        print(f"Error with {symbol}: {e}")
        failed_tickers.append(symbol)

# Convert to DataFrames
df_security_master = pd.DataFrame(security_master)
df_performance = pd.concat(combined_data, ignore_index=True) if combined_data else pd.DataFrame()

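# Normalize dates to plain date objects (skipped when nothing was collected)
if not df_performance.empty:
    df_performance["Date"] = pd.to_datetime(df_performance["Date"], utc=True).dt.tz_localize(None).dt.date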


# --- Output summary ---
print(f"\n Finished processing {len(all_tickers)} tickers.")
print(f" Failed tickers: {len(failed_tickers)}")
print(failed_tickers)



ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 
  df_performance = pd.concat(combined_data, ignore_index=True) if combined_data else pd.DataFrame()



 Finished processing 517 tickers.
 Failed tickers: 0
[]


In [92]:
#connect to Snowflake and load the data, skipping rows that were already loaded
#(load_to_snowflake_merge closes the connection, so reconnect before each call)
conn = get_snowflake_connection()
load_to_snowflake_merge(df_security_master, "SECURITY_MASTER", conn, unique_keys=["ticker"])

conn = get_snowflake_connection()
if not df_performance.empty:
    load_to_snowflake_merge(df_performance, "SECURITY_PERFORMANCE_HISTORY", conn, unique_keys=["Date", "ticker"])


SECURITY_MASTER updated. Duplicates prevented using keys: ['ticker']
SECURITY_PERFORMANCE_HISTORY updated. Duplicates prevented using keys: ['Date', 'ticker']


# **yFinance Benchmark Indexes**

There is no ESG score for indexes, so we chose one benchmark for each fund focus.
Initially some column fields were hardcoded, but to make the step repeatable and applicable to other areas we moved to a config. We may also reference currency info from the currency table instead of currency_full_name.

Add this information to the 2-3 tables that are available for benchmarks.
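One way the config-driven approach could look is sketched below. It is an illustration, not the config used in this project: `GENERAL_INFO_CONFIG` and `build_row` are hypothetical names, and the info keys are the ones read from `yf.Ticker(...).info` in the benchmark cell.

```python
# Each output column maps to (candidate info keys in priority order, default value).
GENERAL_INFO_CONFIG = {
    "NAME": (["longName", "shortName"], None),
    "CURRENCYCODE": (["currency"], "USD"),
    "CURRENCY": (["financialCurrency", "currency"], "USD"),
}

def build_row(ticker, info, config):
    """Build one output row from an info dict using the column config."""
    row = {"BENCHMARKCODE": ticker.lower(), "TICKER": ticker}
    for column, (keys, default) in config.items():
        # Take the first candidate key that has a value; otherwise use the default.
        row[column] = next((info[k] for k in keys if info.get(k) is not None), default)
    return row

row = build_row(
    "SHE",
    {"shortName": "SPDR MSCI USA Gender Diversity ETF", "currency": "USD"},
    GENERAL_INFO_CONFIG,
)
print(row)
```

Adding a column then means adding one config entry rather than editing the loop.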

make its own table in snowflake. do a join between benchmark tables and

In [83]:
import yfinance as yf
import pandas as pd
import datetime

# Define benchmark tickers
benchmark_tickers = ["ENRG", "SHE", "VOTE", "ESGD", "EFIV"]

# Define date range (last 10 years)
start_date = datetime.datetime.today() - datetime.timedelta(days=365 * 10)
end_date = datetime.datetime.today()

# Initialize containers
benchmark_performance = []
benchmark_general_information = []
benchmark_characteristics = []

# Pull data for each benchmark
for ticker in benchmark_tickers:
    try:
        t = yf.Ticker(ticker)
        info = t.info
        hist = t.history(start=start_date, end=end_date)

        if hist.empty or not info:
            continue

        # === Benchmark Performance ===
        hist = hist.reset_index()
        df_perf = pd.DataFrame({
            "BENCHMARKCODE": ticker.lower(),
            "PERFORMANCETYPE": "Prices",
            "CURRENCYCODE": info.get("currency", "USD"),
            "CURRENCY": info.get("financialCurrency", info.get("currency", "USD")),
            "PERFORMANCEFREQUENCY": "Daily",
            "VALUE": hist["Close"],
            "HISTORYDATE1": hist["Date"].dt.date,
            "HISTORYDATE": hist["Date"]
        })
        benchmark_performance.append(df_perf)

        # === Benchmark General Information ===
        benchmark_general_information.append({
            "BENCHMARKCODE": ticker.lower(),
            "TICKER": ticker,
            "NAME": info.get("longName", info.get("shortName", ticker)),
            "ISBEGINOFDAYPERFORMANCE": False
        })

        # === Benchmark Characteristics (auto-detect numeric fields) ===
        for key, value in info.items():
            if isinstance(value, (int, float)):
                benchmark_characteristics.append({
                    "BENCHMARKCODE": ticker.lower(),
                    "CURRENCYCODE": info.get("currency", "USD"),
                    "CURRENCY": info.get("financialCurrency", info.get("currency", "USD")),
                    "LANGUAGECODE": "en-US",
                    "CATEGORY": "Total",
                    "CATEGORYNAME": None,
                    "CHARACTERISTICNAME": key,
                    "CHARACTERISTICDISPLAYNAME": key.replace('_', ' ').title(),
                    "STATISTICTYPE": "NA",
                    "CHARACTERISTICVALUE": value,
                    "ABBREVIATEDTEXT": None,
                    "HISTORYDATE": datetime.date.today()
                })

    except Exception as e:
        print(f" Error with {ticker}: {e}")



In [84]:
# Create DataFrames
df_benchmark_performance = pd.concat(benchmark_performance, ignore_index=True) if benchmark_performance else pd.DataFrame()
df_benchmark_general_info = pd.DataFrame(benchmark_general_information)
df_benchmark_characteristics = pd.DataFrame(benchmark_characteristics)

# Preview
print(" Performance sample:")
print(df_benchmark_performance.head(3))
print("\n General Info sample:")
print(df_benchmark_general_info.head(3))
print("\n Characteristics sample:")
print(df_benchmark_characteristics.head(3))


 Performance sample:
  BENCHMARKCODE PERFORMANCETYPE CURRENCYCODE CURRENCY PERFORMANCEFREQUENCY  \
0          enrg          Prices          USD      USD                Daily   
1           she          Prices          USD      USD                Daily   
2           she          Prices          USD      USD                Daily   

       VALUE HISTORYDATE1               HISTORYDATE  
0  25.000000   2025-01-06 2025-01-06 00:00:00-05:00  
1  45.014851   2016-03-08 2016-03-08 00:00:00-05:00  
2  44.992115   2016-03-09 2016-03-09 00:00:00-05:00  

 General Info sample:
  BENCHMARKCODE TICKER                                NAME  \
0          enrg   ENRG                Ninepoint Energy ETF   
1           she    SHE  SPDR MSCI USA Gender Diversity ETF   
2          vote   VOTE               TCW Transform 500 ETF   

   ISBEGINOFDAYPERFORMANCE  
0                    False  
1                    False  
2                    False  

 Characteristics sample:
  BENCHMARKCODE CURRENCYCODE CURRENC

# **HoldingsDetails & PortfolioPerformance**


HoldingsDetails does not include every field, only the most important ones. We also added fields to PortfolioPerformance to reflect ESG scores and the portfolio focus.
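The arithmetic behind the synthetic holdings (random weights normalized to 1, market value from the fund size, quantity from the price) can be sketched in plain Python. The function names are hypothetical, the scores are made-up illustrative values, and the fixed seed is an addition for reproducibility. The weighted-average score at the end is one possible fund-level aggregation and differs from the min/max summary used in the cell below.

```python
import random

def build_holdings(prices, scores, fund_value, seed=42):
    """Assign random normalized weights to tickers that have both a price and a score."""
    rng = random.Random(seed)  # fixed seed so the synthetic fund is reproducible
    tickers = [t for t in scores if t in prices]
    raw = {t: rng.random() for t in tickers}
    total = sum(raw.values())
    holdings = []
    for t in tickers:
        weight = raw[t] / total
        invested = fund_value * weight
        holdings.append({
            "TICKER": t,
            "PORTFOLIOWEIGHT": weight,
            "MARKETVALUE": invested,
            "QUANTITY": invested / prices[t],
            "score": scores[t],
        })
    return holdings

def weighted_score(holdings):
    """Weight-averaged score across holdings (weights are assumed to sum to 1)."""
    return sum(h["PORTFOLIOWEIGHT"] * h["score"] for h in holdings)

prices = {"AON": 361.60, "IPG": 24.74, "PANW": 169.09}
scores = {"AON": 10.0, "IPG": 20.0, "PANW": 15.0}  # illustrative scores, not real ESG data
holdings = build_holdings(prices, scores, fund_value=100_000_000)
print(round(sum(h["PORTFOLIOWEIGHT"] for h in holdings), 6), round(weighted_score(holdings), 2))
```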

In [94]:
import pandas as pd
import numpy as np
from datetime import datetime

# Normalize 'Date' column
df_performance.rename(columns=lambda x: x.strip().capitalize() if x.lower() == 'date' else x, inplace=True)

# Step 1: Filter valid symbols
valid_symbols = df_security_master['ticker'].unique()

# Step 2: Extract ESG and price data
df_esg = df_performance[df_performance['data_type'] == 'esg'].copy()
df_esg = df_esg[df_esg['ticker'].isin(valid_symbols)]
df_esg_latest = df_esg.sort_values('Date').drop_duplicates('ticker', keep='last')

df_price = df_performance[df_performance['data_type'] == 'price'].copy()
df_price = df_price[df_price['ticker'].isin(valid_symbols)]
df_price_latest = df_price.sort_values('Date').drop_duplicates('ticker', keep='last')
df_price_latest = df_price_latest[['ticker', 'Date', 'Close']].rename(columns={'Close': 'price'})

# Step 3: Define synthetic funds
funds = pd.DataFrame({
    "PORTFOLIOCODE": [
        "Climate_Leaders_Fund", "Social_Impact_Fund", "Governance_Focused_Fund",
        "Low_Controversy_Fund", "Overall_ESG_Leaders", "Stakeholder_Advocacy_Fund"
    ],
    "PORTFOLIOFOCUS": [
        "environmentScore", "socialScore", "governanceScore",
        "highestControversy", "totalEsg", "socialScore"
    ]
})

# Step 4: Generate holdingsdetails table
fund_value_usd = 100_000_000
holdings_rows = []

for _, fund in funds.iterrows():
    focus = fund["PORTFOLIOFOCUS"]
    df = df_esg_latest.copy()

    # Correct sorting logic: lower score = better, except highestControversy
    df = df.sort_values(focus, ascending=(focus != 'highestControversy')).dropna(subset=[focus]).head(20).copy()


    df["raw_weight"] = np.abs(np.random.rand(len(df)))
    df = df.merge(df_price_latest, on='ticker', how='left').dropna(subset=["price"])
    df["weight"] = df["raw_weight"] / df["raw_weight"].sum()


    df_meta = df.merge(df_security_master, on='ticker', how='left')

    for _, row in df_meta.iterrows():
        invested = fund_value_usd * row["weight"]
        shares = invested / row["price"]
        holdings_rows.append({
            "PORTFOLIOCODE": fund["PORTFOLIOCODE"],
            "CURRENCYCODE": row["currency"],
            "CURRENCY": "US Dollar",
            "ISSUENAME": row["name"],
            "TICKER": row["ticker"],
            "QUANTITY": round(shares, 2),
            "MARKETVALUE": round(invested, 2),
            "PORTFOLIOWEIGHT": round(row["weight"] , 6),
            "PRICE": round(row["price"], 2),
            "ASSETCLASSNAME": row["sector"],
            "ISSUETYPE": row["industry"],
            "ISSUECOUNTRYCODE": row["exchange"],
            "ISSUECOUNTRY": row["country"],
            "HISTORYDATE": row["Date_x"]
        })

df_holdingsdetails = pd.DataFrame(holdings_rows)

# Step 5: Create portfolioperformance table
performance_date = pd.to_datetime("2025-06-30")
inception_date = pd.to_datetime("2023-01-01")

# Summarize each fund's focus score (note: this takes the min or max holding score, not a mean)
avg_scores = []
for _, fund in funds.iterrows():
    focus = fund["PORTFOLIOFOCUS"]
    symbols = df_holdingsdetails[df_holdingsdetails["PORTFOLIOCODE"] == fund["PORTFOLIOCODE"]]["TICKER"]
    values = df_esg_latest[df_esg_latest["ticker"].isin(symbols)][focus]

    if focus == "highestControversy":
        avg_score = round(values.max(), 2)  # higher is better
    else:
        avg_score = round(values.min(), 2)  # lower is better

    avg_scores.append(avg_score)

funds["AVERAGE_ESG_SCORE"] = avg_scores

df_portfolioperformance = funds.assign(
    HISTORYDATE=performance_date,
    CURRENCYCODE="USD",
    CURRENCY="US Dollar",
    PERFORMANCECATEGORY="Asset Class",
    PERFORMANCECATEGORYNAME="Total Portfolio",
    PERFORMANCETYPE="Portfolio Gross",
    PERFORMANCEINCEPTIONDATE=inception_date,
    PORTFOLIOINCEPTIONDATE=inception_date,
    PERFORMANCEFREQUENCY="D",
    PERFORMANCEFACTOR=np.round(np.random.normal(loc=0.001, scale=0.01, size=len(funds)), 6)
)

# --- OUTPUT ---
print(" HoldingsDetails:")
print(df_holdingsdetails.head())

print("\n PortfolioPerformance:")
print(df_portfolioperformance.head())



 HoldingsDetails:
          PORTFOLIOCODE CURRENCYCODE   CURRENCY  \
0  Climate_Leaders_Fund          USD  US Dollar   
1  Climate_Leaders_Fund          USD  US Dollar   
2  Climate_Leaders_Fund          USD  US Dollar   
3  Climate_Leaders_Fund          USD  US Dollar   
4  Climate_Leaders_Fund          USD  US Dollar   

                                  ISSUENAME TICKER  QUANTITY  MARKETVALUE  \
0                                   Aon plc    AON  25488.99   9216817.76   
1  The Interpublic Group of Companies, Inc.    IPG  92346.06   2284641.60   
2                  Palo Alto Networks, Inc.   PANW  39655.22   6705301.53   
3                               DaVita Inc.    DVA   1389.02    195184.52   
4                              Nasdaq, Inc.   NDAQ  63989.93   6164789.67   

   PORTFOLIOWEIGHT   PRICE          ASSETCLASSNAME  \
0         0.092168  361.60      Financial Services   
1         0.022846   24.74  Communication Services   
2         0.067053  169.09              Technology

# **DataFrame Summary**


In [96]:

print(" df_security_master:")
print(df_security_master.info())
print(" df_performance:")
print(df_performance.info())
print(" df_benchmark_performance:")
print(df_benchmark_performance.info())
print(" df_benchmark_general_info:")
print(df_benchmark_general_info.info())
print(" df_benchmark_characteristics:")
print(df_benchmark_characteristics.info())
print(" df_holdingsdetails:")
print(df_holdingsdetails.info())
print(" df_portfolioperformance:")
print(df_portfolioperformance.info())


 df_security_master:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 517 entries, 0 to 516
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   ticker      517 non-null    object
 1   shortName   517 non-null    object
 2   name        517 non-null    object
 3   sector      517 non-null    object
 4   industry    517 non-null    object
 5   exchange    517 non-null    object
 6   currency    517 non-null    object
 7   country     517 non-null    object
 8   market_cap  517 non-null    int64 
dtypes: int64(1), object(8)
memory usage: 36.5+ KB
None
 df_performance:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1260395 entries, 0 to 1260394
Data columns (total 16 columns):
 #   Column              Non-Null Count    Dtype  
---  ------              --------------    -----  
 0   Date                1260395 non-null  object 
 1   Open                1259884 non-null  float64
 2   High                1259884 non-null  fl