# SBTi-Finance Tool - Calculate Portfolio Coverage
This notebook calculates portfolio coverage - the percentage of your investment portfolio that holds companies with validated Science Based Targets (SBTs).

**Coverage is measured two ways:**
1. **By Investment Value ($)** - What % of your portfolio's value is in SBT companies
2. **By Company Count (#)** - What % of companies in your portfolio have SBTs

This notebook does not calculate temperature scores or use weighted aggregation methods (WATS). For temperature scoring, use the temperature rating notebooks.
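
As a rough illustration of the two measures above, the sketch below computes both on a made-up four-company portfolio. The company names, values, and the `has_sbt` flag are invented for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy portfolio; `has_sbt` marks holdings with a validated target
toy_portfolio = pd.DataFrame({
    "company_name": ["A", "B", "C", "D"],
    "investment_value": [100.0, 300.0, 400.0, 200.0],
    "has_sbt": [True, False, True, True],
})

# Coverage by value: share of invested money held in SBT companies
toy_sbt_value = toy_portfolio.loc[toy_portfolio["has_sbt"], "investment_value"].sum()
toy_coverage_by_value = toy_sbt_value / toy_portfolio["investment_value"].sum() * 100

# Coverage by count: share of holdings that are SBT companies
toy_coverage_by_count = toy_portfolio["has_sbt"].sum() / len(toy_portfolio) * 100

print(f"By value: {toy_coverage_by_value:.1f}%")  # 700 / 1000 -> 70.0%
print(f"By count: {toy_coverage_by_count:.1f}%")  # 3 / 4 -> 75.0%
```

The two figures differ whenever holdings are unequal in size, which is why the notebook reports both.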

# Quick Start Guide

## What This Notebook Does
Coverage is reported by investment value ($) and by company count (#), as defined at the top of this notebook.

Both metrics are shown for:
1. All SBT targets (any ambition level)
2. 1.5°C targets only (configurable)

## Before You Begin

### Required Data
You need a portfolio file (CSV or Excel) with these columns:
| Column | Required? | Description |
|--------|-----------|-------------|
| `company_name` | Yes | Company name (must be unique per row) |
| `company_id` | Yes | Your internal identifier |
| `isin` or `company_isin` | Recommended | ISIN code for matching |
| `lei` or `company_lei` | Recommended | LEI code for matching |
| `investment_value` | Yes | Value of your investment in each company |

### How to Run (Google Colab)
1. The SBTi package is pre-loaded - no installation required
2. Upload your portfolio file to the `data/` folder using the file browser (folder icon in left sidebar)
3. Update the file path in the "Load your portfolio" section to match your filename
4. Click **Runtime > Run all** or run cells one-by-one

### How to Run (Local/On-Premises)
1. Place your portfolio file in the `data/` folder
2. Update the file path in the "Load your portfolio" section
3. Run all cells in order

## Key User Input Sections
Look for cells marked with **USER INPUT** - these are where you need to make changes:
1. **Portfolio file path** - Point to your data file
2. **Date selection** - Choose the date for coverage calculation
3. **1.5°C filter settings** - Configure which targets to include
4. **Anonymize output** - Choose whether to remove company names and identifiers before saving, so you can safely share results with third parties

---

In [None]:
# Import the packages used in this notebook
import pandas as pd
import openpyxl
import requests
from datetime import datetime
import re
from difflib import SequenceMatcher

## Create the data directory and download the example portfolio
We have prepared dummy data so that you can run the tool as is and familiarise yourself with how it works. To use your own data, see the [Data Requirements section](https://sciencebasedtargets.github.io/SBTi-finance-tool/DataRequirements.html) of the technical documentation for details on data requirements and formatting.

Note: the dummy data may include some company names, but the data associated with those names is completely random, and any similarity to real-world data is purely coincidental.

In [None]:
import urllib.request
import os

if not os.path.isdir("data"):
    os.mkdir("data")
if not os.path.isfile("data/example_portfolio.csv"):
    urllib.request.urlretrieve("https://github.com/ScienceBasedTargets/SBTi-finance-tool/raw/main/examples/data/example_portfolio.csv", "data/example_portfolio.csv")

## USER INPUT: Load Your Portfolio

### For Google Colab Users:
1. The SBTi package is pre-loaded - no installation required
2. Click the **folder icon** in the left sidebar to open the file browser
3. Navigate to the `data/` folder and upload your portfolio file there
4. Update the file path below to: `data/your_filename.csv` (or `.xlsx`)

### For Local/On-Premises Users:
1. Place your portfolio file in the `data/` folder
2. Update the file path below

### Portfolio File Requirements
- **Format**: CSV (.csv) or Excel (.xlsx)
- **Required columns**: `company_name`, `company_id`, `investment_value`
- **Recommended columns**: `isin` (or `company_isin`), `lei` (or `company_lei`)
- **No duplicate company_id values**

### Column Definitions
| Column | Description |
|--------|-------------|
| `company_name` | Name of the company (must be unique per row) |
| `company_id` | Your internal identifier for the company |
| `isin` | ISIN code, used to match against SBTi data |
| `lei` | Legal Entity Identifier, used to match against SBTi data |
| `investment_value` | Monetary value of your investment in the company |

See the [Data Legends](https://sciencebasedtargets.github.io/SBTi-finance-tool/Legends.html#) documentation for more details.
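
The requirements listed above can be checked before running the rest of the notebook. The following is a minimal sketch, not part of the SBTi package: `check_portfolio` and `REQUIRED_COLUMNS` are hypothetical names, and the example frame is invented.

```python
import pandas as pd

REQUIRED_COLUMNS = ["company_name", "company_id", "investment_value"]

def check_portfolio(df: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems; an empty list means no issue was found."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"Missing required columns: {', '.join(missing)}")
    if "company_id" in df.columns and df["company_id"].duplicated().any():
        problems.append("Duplicate company_id values found")
    if "investment_value" in df.columns:
        # Values that cannot be read as numbers become NaN
        values = pd.to_numeric(df["investment_value"], errors="coerce")
        if values.isna().any():
            problems.append("Non-numeric or empty investment_value entries found")
    return problems

example = pd.DataFrame({
    "company_name": ["Alpha", "Beta"],
    "company_id": ["C1", "C1"],          # deliberate duplicate
    "investment_value": [1000, "n/a"],   # deliberate non-numeric entry
})
for problem in check_portfolio(example):
    print(problem)
```

Returning a list rather than raising on the first problem lets all issues in a file be fixed in one pass.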

### Load the portfolio from a CSV or Excel file

In [None]:
df_portfolio = pd.read_csv("data/example_portfolio.csv", encoding="iso-8859-1")
#df_portfolio = pd.read_excel("data/example_portfolio.xlsx", engine="openpyxl") # .xlsx format

#Use your local file instead
#my_file_path = "path/to/your/portfolio_file.csv"
#df_portfolio = pd.read_csv(my_file_path, encoding="utf-8")

In [None]:
# Convert all column names to snake_case format
def convert_to_snake_case(name):
    """Convert any string to snake_case format"""
    import re
    # Handle CamelCase and PascalCase
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    s2 = re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1)
    # Convert to lowercase
    s3 = s2.lower()
    # Replace spaces and other separators with underscores
    s4 = re.sub(r'[^a-z0-9_]', '_', s3)
    # Remove duplicate underscores
    s5 = re.sub(r'_+', '_', s4)
    # Remove leading and trailing underscores
    return s5.strip('_')

# Apply conversion to all column names
original_columns = df_portfolio.columns.tolist()
df_portfolio.columns = [convert_to_snake_case(col) for col in df_portfolio.columns]
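
To see what the conversion above does to typical spreadsheet headers, the function can be tried on a few sample names. The headers below are made up, and the function body repeats the notebook's steps in compact form so that the snippet runs on its own.

```python
import re

def convert_to_snake_case(name):
    """Same steps as the notebook cell, repeated here so the snippet is standalone."""
    s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)   # split before "Word"
    s = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s)     # split lower/digit followed by upper
    s = re.sub(r'[^a-z0-9_]', '_', s.lower())         # separators -> underscores
    return re.sub(r'_+', '_', s).strip('_')           # collapse and trim underscores

sample_headers = ["Company Name", "ISIN", "investmentValue", "Company-ID", "CompanyLEI"]
converted_headers = {header: convert_to_snake_case(header) for header in sample_headers}
for header, converted in converted_headers.items():
    print(f"{header!r:20} -> {converted!r}")
```

If a header still does not map to one of the expected names, it can be renamed manually with `df_portfolio.rename(columns=...)`.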

In [None]:
# Detect column format and standardize for flexibility
# Support both new format (isin, lei) and old format (company_isin, company_lei)

if 'isin' in df_portfolio.columns and 'company_isin' in df_portfolio.columns:
    print("Warning: Both 'isin' and 'company_isin' columns found. Using 'company_isin'.")
    portfolio_isin_col = 'company_isin'
elif 'company_isin' in df_portfolio.columns:
    print("Using old format: 'company_isin' column detected.")
    portfolio_isin_col = 'company_isin'
elif 'isin' in df_portfolio.columns:
    print("Using new format: 'isin' column detected.")
    portfolio_isin_col = 'isin'
else:
    print("Warning: No ISIN column found.")
    portfolio_isin_col = None

if 'lei' in df_portfolio.columns and 'company_lei' in df_portfolio.columns:
    print("Warning: Both 'lei' and 'company_lei' columns found. Using 'company_lei'.")
    portfolio_lei_col = 'company_lei'
elif 'company_lei' in df_portfolio.columns:
    print("Using old format: 'company_lei' column detected.")
    portfolio_lei_col = 'company_lei'
elif 'lei' in df_portfolio.columns:
    print("Using new format: 'lei' column detected.")
    portfolio_lei_col = 'lei'
else:
    print("Warning: No LEI column found.")
    portfolio_lei_col = None

print(f"Portfolio format detected - ISIN column: {portfolio_isin_col}, LEI column: {portfolio_lei_col}")

In [None]:
# Rename columns manually if the snake_case conversion did not produce the expected names
#df_portfolio.rename(columns={'Company Name': 'company_name', 'ISIN': 'isin'}, inplace=True)

# Check for duplicate values in the 'company_id' column
duplicate_ids = df_portfolio[df_portfolio.duplicated('company_id', keep=False)]

if not duplicate_ids.empty:
    print("Error: Duplicate values found in the 'company_id' column:")
    print(duplicate_ids)
else:
    print("No duplicate values found in the 'company_id' column.")

## USER INPUT: Select Coverage Date

Enter the date for which you want to calculate portfolio coverage. This is typically:
- Your **reporting date** (e.g., end of fiscal year)
- Your **base year date** for portfolio targets

**Format**: Year, Month, Day (as numbers)

**Example**: For December 31, 2025, enter:
- year = 2025
- month = 12
- day = 31
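
The three numbers are later passed to `datetime(year, month, day)`, which raises `ValueError` for an impossible date. A small guard can turn that into a clearer message. This is a sketch only, and `parse_coverage_date` is a hypothetical helper name:

```python
from datetime import datetime

def parse_coverage_date(year, month, day):
    """Hypothetical helper: build the coverage date or explain why it is invalid."""
    try:
        return datetime(year, month, day)
    except ValueError as exc:
        raise ValueError(f"Invalid coverage date {year}-{month}-{day}: {exc}") from exc

print(parse_coverage_date(2025, 12, 31))   # a valid reporting date

try:
    parse_coverage_date(2025, 2, 30)       # February has no 30th day
except ValueError as err:
    print(err)
```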

In [None]:
year = 2025 #enter the year for which you want to calculate the portfolio coverage
month = 12 #enter the month for which you want to calculate the portfolio coverage
day = 31 #enter the day for which you want to calculate the portfolio coverage

In [None]:
user_date = datetime(year, month, day)

Now load the CTA file (Companies Taking Action) from the SBTi website.

In [None]:
# Use the SBTi class for consistent format handling
from SBTi.data.sbti import SBTi

print("Loading SBTi Companies Taking Action data...")
sbti_provider = SBTi()
print(f"SBTi format: {getattr(sbti_provider, 'format_type', 'detected automatically')}")
print(f"Companies loaded: {len(sbti_provider.targets)}")

# Access the processed targets data (already filtered and formatted)
cta_file = sbti_provider.targets.copy()

In [None]:
cta_file.head()

## Filter the CTA file
Filter the CTA file to create a dataframe that has one row per company with the columns "action" and "target".
If Action = Target then only keep the rows where Target = Near-term.
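
The rule described above can be sketched on toy data. The column names `Company Name`, `Action`, and `Target` and all rows are assumptions made for illustration. In the notebook the real column names come from `sbti_provider.c`.

```python
import pandas as pd

# Hypothetical extract of a Companies Taking Action table
toy_cta = pd.DataFrame({
    "Company Name": ["Alpha", "Alpha", "Beta", "Gamma"],
    "Action": ["Target", "Target", "Commitment", "Target"],
    "Target": ["Near-term", "Long-term", None, "Near-term"],
})

# Keep rows that are targets, and among those only the near-term ones
is_target = toy_cta["Action"] == "Target"
is_near_term = toy_cta["Target"] == "Near-term"
toy_targets = toy_cta[is_target & is_near_term]

# One row per company
toy_one_per_company = toy_targets.drop_duplicates(subset=["Company Name"])
print(toy_one_per_company["Company Name"].tolist())  # ['Alpha', 'Gamma']
```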

In [None]:
targets = cta_file.copy()

# Filter for companies that have "Target" in the action field
# (.copy() so that later in-place edits do not act on a slice of `targets`)
companies_with_targets = targets[targets[sbti_provider.c.COL_ACTION] == sbti_provider.c.VALUE_ACTION_TARGET].copy()

print(f"Total companies with targets: {len(companies_with_targets)}")

# Get unique company names with targets
unique_companies_with_targets = companies_with_targets[sbti_provider.c.COL_COMPANY_NAME].unique()
total_companies_with_targets = len(unique_companies_with_targets)

# Total unique companies in SBTi database
total_companies_in_sbti = len(targets[sbti_provider.c.COL_COMPANY_NAME].unique())

# Create a new dataframe with one row per company (to avoid duplicates in counting)
unique_companies_df = companies_with_targets.drop_duplicates(subset=[sbti_provider.c.COL_COMPANY_NAME])

# Create sets for companies with different identifiers
companies_with_isin = set(unique_companies_df[unique_companies_df[sbti_provider.c.COL_COMPANY_ISIN].notna()][sbti_provider.c.COL_COMPANY_NAME])
companies_with_lei = set(unique_companies_df[unique_companies_df[sbti_provider.c.COL_COMPANY_LEI].notna()][sbti_provider.c.COL_COMPANY_NAME])

# Get unique ISINs and LEIs
all_isin_set = set(companies_with_targets[sbti_provider.c.COL_COMPANY_ISIN].dropna())
all_lei_set = set(companies_with_targets[sbti_provider.c.COL_COMPANY_LEI].dropna())

# Calculate the different categories
companies_with_both = companies_with_isin.intersection(companies_with_lei)
companies_with_only_isin = companies_with_isin - companies_with_both
companies_with_only_lei = companies_with_lei - companies_with_both
companies_with_neither = set(unique_companies_with_targets) - companies_with_isin - companies_with_lei

# Count companies in each category
total_companies_with_both = len(companies_with_both)
total_companies_with_only_isin = len(companies_with_only_isin)
total_companies_with_only_lei = len(companies_with_only_lei)
total_companies_without_identifiers = len(companies_with_neither)

# Print the analysis
print(f"Total unique companies in the SBTi database: {total_companies_in_sbti}")
print(f"Total companies with targets in SBTi database: {total_companies_with_targets}")
print(f"Total unique ISINs with targets: {len(all_isin_set)}")
print(f"Total unique LEIs with targets: {len(all_lei_set)}")
print(f"Companies with targets with both ISIN and LEI: {total_companies_with_both}")
print(f"Companies with targets with only ISIN (no LEI): {total_companies_with_only_isin}")
print(f"Companies with targets with only LEI (no ISIN): {total_companies_with_only_lei}")
print(f"Companies with targets but no LEI or ISIN: {total_companies_without_identifiers}")

# Verification
calculated_total = (total_companies_with_both +
                   total_companies_with_only_isin +
                   total_companies_with_only_lei +
                   total_companies_without_identifiers)
print(f"Sum of all categories: {calculated_total}")
print(f"Matches total companies with targets: {calculated_total == total_companies_with_targets}")

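## Filter by date
Standardize the publication date column, then keep only targets published on or before the coverage date selected above.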
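
The date filter in this section can be sketched on toy data as follows. The column names and rows are invented, and `errors="coerce"` is an addition for this example (the notebook cell calls `pd.to_datetime` without it).

```python
import pandas as pd
from datetime import datetime

# Hypothetical publication dates, including one unparseable entry
toy_dates = pd.DataFrame({
    "Company Name": ["Alpha", "Beta", "Gamma"],
    "Date Published": ["2024-06-30", "2026-01-15", "not a date"],
})

cutoff = datetime(2025, 12, 31)

# errors="coerce" turns unparseable strings into NaT, which never passes the <= test
toy_dates["Date Published"] = pd.to_datetime(toy_dates["Date Published"], errors="coerce")
kept = toy_dates[toy_dates["Date Published"] <= cutoff]
print(kept["Company Name"].tolist())  # ['Alpha']
```

Rows with a missing or malformed date are dropped by the comparison, so they never count toward coverage.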

In [None]:
# List of potential date columns to look for
# The SBTi provider may already map date_updated -> Date Published,
# so check if the target column exists first to avoid duplicates.
target_date_col = sbti_provider.c.COL_DATE_PUBLISHED

if target_date_col in companies_with_targets.columns:
    print(f"Date column '{target_date_col}' already present — no rename needed.")
else:
    # Search for an alternative date column to rename
    potential_date_cols = ['date_updated', 'Date Updated', 'date_published']
    found_date_col = None
    for col in potential_date_cols:
        if col in companies_with_targets.columns:
            found_date_col = col
            break

    if found_date_col:
        print(f"Renaming column '{found_date_col}' to '{target_date_col}' for compatibility.")
        companies_with_targets.rename(columns={found_date_col: target_date_col}, inplace=True)
    else:
        print("WARNING: No date column found. Date filtering may fail.")

In [None]:
# Convert the "Date Published" column to datetime type
df_targets = companies_with_targets.copy()
df_targets[sbti_provider.c.COL_DATE_PUBLISHED] = pd.to_datetime(df_targets[sbti_provider.c.COL_DATE_PUBLISHED])

# Filter rows based on user-entered date
filtered_df = df_targets.loc[df_targets[sbti_provider.c.COL_DATE_PUBLISHED] <= user_date]
# Keep companies with at least one identifier; .copy() avoids a
# SettingWithCopyWarning when the helper column is added below
filtered_df = filtered_df[filtered_df[sbti_provider.c.COL_COMPANY_ISIN].notnull() | filtered_df[sbti_provider.c.COL_COMPANY_LEI].notnull()].copy()

# Create a set of company names from the filtered SBTi data
filtered_df['company_name_lower'] = filtered_df[sbti_provider.c.COL_COMPANY_NAME].str.lower()
company_name_set = set(filtered_df['company_name_lower'].dropna())

## Check CTA file for companies with validated targets

In [None]:
# Create lookup sets from the date-filtered SBTi data (missing values dropped)
isin_set = set(filtered_df[sbti_provider.c.COL_COMPANY_ISIN].dropna())
lei_set = set(filtered_df[sbti_provider.c.COL_COMPANY_LEI].dropna())

# Flexible validation function that works with both old and new column formats.
# It matches against the date-filtered sets so that the coverage date applies
# to identifier matches as well as name matches.
def is_validated(row):
    # Check LEI (use detected column name)
    if portfolio_lei_col and pd.notna(row.get(portfolio_lei_col)) and row.get(portfolio_lei_col) in lei_set:
        return True

    # Check ISIN (use detected column name)
    if portfolio_isin_col and pd.notna(row.get(portfolio_isin_col)) and row.get(portfolio_isin_col) in isin_set:
        return True

    # Check company name
    if pd.notna(row.get('company_name')):
        company_name_lower = row.get('company_name').lower()
        if company_name_lower in company_name_set:
            return True

    # If none of the conditions are met
    return False

# Apply the function to create the 'validated' column
df_portfolio['validated'] = df_portfolio.apply(is_validated, axis=1)

print(f"Validation completed using ISIN column: {portfolio_isin_col}, LEI column: {portfolio_lei_col}")

## USER INPUT: 1.5°C Filter Configuration

This section calculates what percentage of your portfolio is aligned with **1.5°C climate targets**.

### What Are the Options?

| Option | Default | What It Means |
|--------|---------|---------------|
| `INCLUDE_PURE_15C` | `True` | Include companies with pure "1.5°C" targets |
| `INCLUDE_MIXED_15C_2C` | `False` | Include companies with mixed targets like "1.5°C/2°C" |

### Recommended Settings
- **Most rigorous**: `INCLUDE_PURE_15C = True`, `INCLUDE_MIXED_15C_2C = False`
- **More inclusive**: `INCLUDE_PURE_15C = True`, `INCLUDE_MIXED_15C_2C = True`
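
The effect of the two switches can be sketched on a handful of classification labels. The labels below are made up in the style of the table above, and `select_labels` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Made-up classification labels
labels = pd.Series(["1.5°C", "Well-below 2°C", "1.5°C/2°C", "2°C", "1.5°C/Well-below 2°C"])

is_pure = labels == "1.5°C"
# regex=False makes "." a literal dot rather than "any character"
is_mixed = labels.str.contains("1.5", regex=False) & labels.str.contains("/", regex=False)

def select_labels(include_pure, include_mixed):
    """Hypothetical helper: labels kept for a given pair of switch settings."""
    return labels[(is_pure & include_pure) | (is_mixed & include_mixed)].tolist()

print(select_labels(True, False))  # most rigorous: ['1.5°C']
print(select_labels(True, True))   # more inclusive: adds the two mixed labels
```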

### How Matching Works
Your portfolio companies are matched to SBTi data using (in priority order):
1. **LEI** (Legal Entity Identifier)
2. **ISIN** (International Securities ID)
3. **Company name** (exact match, case-insensitive)
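
The priority order above can be sketched as a single function that reports which identifier matched first. The lookup sets, identifier strings, and company names are placeholders invented for this example.

```python
import pandas as pd

# Hypothetical lookup sets standing in for the SBTi identifiers and names
sbti_leis = {"LEI-PLACEHOLDER-0001"}
sbti_isins = {"US0000000001"}
sbti_names = {"alpha corp"}

def match_method(row):
    """Return the first identifier that matches, following LEI > ISIN > name."""
    if pd.notna(row.get("lei")) and row["lei"] in sbti_leis:
        return "LEI"
    if pd.notna(row.get("isin")) and row["isin"] in sbti_isins:
        return "ISIN"
    name = row.get("company_name")
    if pd.notna(name) and name.lower() in sbti_names:
        return "Name"
    return "Not matched"

toy_holdings = pd.DataFrame({
    "company_name": ["Alpha Corp", "Beta Ltd", "ALPHA CORP", "Delta"],
    "lei": ["LEI-PLACEHOLDER-0001", None, None, None],
    "isin": ["US0000000001", "US0000000001", None, None],
})
methods = toy_holdings.apply(match_method, axis=1).tolist()
print(methods)  # ['LEI', 'ISIN', 'Name', 'Not matched']
```

Because name matching is exact apart from letter case, spelling variants such as "Alpha Corporation" would not match. Supplying LEI or ISIN codes is therefore the more dependable route.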

In [None]:
# =============================================================================
# USER CONFIGURATION FOR 1.5°C FILTERING
# =============================================================================

print("=" * 70)
print("1.5°C PORTFOLIO COVERAGE CONFIGURATION")
print("=" * 70)

# -----------------------------------------------------------------------------
# OPTION 1: Include pure 1.5°C targets?
# -----------------------------------------------------------------------------
# Companies with "1.5°C" near-term target classification
#
# Options:
#   True  - Include companies with pure 1.5°C targets (RECOMMENDED)
#   False - Exclude pure 1.5°C targets

INCLUDE_PURE_15C = True

# -----------------------------------------------------------------------------
# OPTION 2: Include mixed 1.5°C/2°C classifications?
# -----------------------------------------------------------------------------
# Companies with mixed classifications like:
#   - "1.5°C/1.5°C" (multiple 1.5°C targets)
#   - "1.5°C/Well-below 2°C" (1.5°C + Well-below 2°C targets)
#   - "1.5°C/2°C" (1.5°C + 2°C targets)
#
# Options:
#   True  - Include companies with mixed 1.5°C/2°C classifications
#   False - Exclude mixed classifications (RECOMMENDED)

INCLUDE_MIXED_15C_2C = False

print("\nConfiguration:")
print(f"  Include Pure 1.5°C: {INCLUDE_PURE_15C}")
print(f"  Include Mixed 1.5°C/2°C: {INCLUDE_MIXED_15C_2C}")

In [None]:
# =============================================================================
# IDENTIFY 1.5°C ALIGNED COMPANIES
# =============================================================================

print("\n" + "=" * 70)
print("IDENTIFYING 1.5°C ALIGNED COMPANIES")
print("=" * 70)

df_analysis = filtered_df.copy()

# ============= PURE 1.5°C TARGETS =============

df_pure_15c = df_analysis[df_analysis['Target Classification'] == '1.5°C']
count_pure_15c = len(df_pure_15c)

# ============= MIXED 1.5°C/2°C CLASSIFICATIONS =============

# Mixed classifications containing both "1.5" and "/" (slash indicates mixed targets).
# regex=False treats "1.5" as literal text, so "." is not read as a regex wildcard.
classification_text = df_analysis['Target Classification'].astype(str)
df_mixed_15c = df_analysis[
    classification_text.str.contains('1.5', regex=False) &
    classification_text.str.contains('/', regex=False)
]
count_mixed_15c = len(df_mixed_15c)

print("\nClassification Breakdown:")
print(f"  Pure 1.5°C: {count_pure_15c:,}")
print(f"  Mixed (contains '1.5' and '/'): {count_mixed_15c}")

if count_mixed_15c > 0:
    print("\n  Mixed classification details:")
    for classification, count in df_mixed_15c['Target Classification'].value_counts().items():
        print(f"    {classification}: {count}")

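# ============= APPLY FILTERING LOGIC =============

# Collect the subsets selected in the configuration cell
selected_frames = []
mode_labels = []

# Add pure 1.5°C if configured
if INCLUDE_PURE_15C:
    selected_frames.append(df_pure_15c)
    mode_labels.append("Pure 1.5°C")

# Add mixed 1.5°C/2°C if configured
if INCLUDE_MIXED_15C_2C:
    selected_frames.append(df_mixed_15c)
    mode_labels.append("Mixed 1.5°C/2°C" if INCLUDE_PURE_15C else "Mixed 1.5°C/2°C only")

# Fall back to an empty frame that keeps the column layout, so the
# column lookups below still work when both options are False
if selected_frames:
    df_1_5c = pd.concat(selected_frames)
else:
    df_1_5c = df_analysis.iloc[0:0].copy()
mode_desc = " + ".join(mode_labels) if mode_labels else "None"

# Remove duplicates
df_1_5c = df_1_5c.drop_duplicates(subset=[sbti_provider.c.COL_COMPANY_NAME])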

print(f"\n Mode: {mode_desc}")
print(f"\nTotal 1.5°C aligned companies: {len(df_1_5c):,}")

# Create lookup sets for portfolio matching
isin_set_1_5c = set(df_1_5c[sbti_provider.c.COL_COMPANY_ISIN].dropna())
lei_set_1_5c = set(df_1_5c[sbti_provider.c.COL_COMPANY_LEI].dropna())
company_name_set_1_5c = set(df_1_5c[sbti_provider.c.COL_COMPANY_NAME].str.lower().dropna())

# Identifier coverage analysis
has_isin = df_1_5c[sbti_provider.c.COL_COMPANY_ISIN].notna().sum()
has_lei = df_1_5c[sbti_provider.c.COL_COMPANY_LEI].notna().sum()
has_either = df_1_5c[
    (df_1_5c[sbti_provider.c.COL_COMPANY_ISIN].notna()) |
    (df_1_5c[sbti_provider.c.COL_COMPANY_LEI].notna())
].shape[0]

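total_15c = len(df_1_5c)

def share_of_15c(n):
    """Percentage of the selected 1.5°C companies, or 0 when none were selected."""
    return (n / total_15c * 100) if total_15c > 0 else 0.0

print("\nIdentifier Coverage:")
print(f"  Companies with ISIN: {has_isin:,} ({share_of_15c(has_isin):.1f}%)")
print(f"  Companies with LEI: {has_lei:,} ({share_of_15c(has_lei):.1f}%)")
print(f"  Companies with either: {has_either:,} ({share_of_15c(has_either):.1f}%)")
print(f"  Name-match only: {total_15c - has_either:,}")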

In [None]:
# =============================================================================
# MATCH PORTFOLIO COMPANIES TO 1.5°C ALIGNED COMPANIES
# =============================================================================

def is_1_5c_aligned(row):
    """
    Check if a portfolio company is 1.5°C aligned.

    Priority order:
    1. LEI match (most reliable)
    2. ISIN match (very reliable)
    3. Company name match (exact, case-insensitive)
    """
    # Check LEI
    if portfolio_lei_col and pd.notna(row.get(portfolio_lei_col)):
        if row.get(portfolio_lei_col) in lei_set_1_5c:
            return True

    # Check ISIN
    if portfolio_isin_col and pd.notna(row.get(portfolio_isin_col)):
        if row.get(portfolio_isin_col) in isin_set_1_5c:
            return True

    # Check company name (exact match, case-insensitive)
    if pd.notna(row.get('company_name')):
        company_name_lower = row.get('company_name').lower()
        if company_name_lower in company_name_set_1_5c:
            return True

    return False

# Apply the matching function
df_portfolio['is_1_5c'] = df_portfolio.apply(is_1_5c_aligned, axis=1)

# Summary statistics
total_portfolio_companies = len(df_portfolio)
matched_15c = df_portfolio['is_1_5c'].sum()

print(f"\nPortfolio Matching Results:")
print(f"  Total companies in portfolio: {total_portfolio_companies}")
print(f"  Matched to 1.5°C companies: {matched_15c}")
print(f"  Match rate: {matched_15c/total_portfolio_companies*100:.1f}%")

## Portfolio Coverage Results

Portfolio coverage shows what proportion of your portfolio holds companies with validated Science Based Targets.

**Two coverage metrics:**
- **By Value ($)**: Percentage of total investment value in SBT companies
- **By Count (#)**: Percentage of companies in portfolio with SBTs
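
Written out, with $v_i$ the investment value of holding $i$, $N$ the number of holdings, and $S$ the set of holdings matched to a validated target (symbols introduced here for illustration only), the two metrics are:

$$
\text{Coverage}_{\text{value}} = \frac{\sum_{i \in S} v_i}{\sum_{i=1}^{N} v_i} \times 100\%,
\qquad
\text{Coverage}_{\text{count}} = \frac{|S|}{N} \times 100\%
$$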

In [None]:
# =============================================================================
# CALCULATE PORTFOLIO COVERAGE (ALL SBTs)
# =============================================================================

total_investment_value = df_portfolio['investment_value'].sum()
total_companies = len(df_portfolio)

# Coverage for all validated SBT companies
sbt_investment_value = df_portfolio.loc[df_portfolio['validated'] == True, 'investment_value'].sum()
sbt_company_count = df_portfolio['validated'].sum()

# Calculate percentages
coverage_by_value = (sbt_investment_value / total_investment_value * 100) if total_investment_value > 0 else 0
coverage_by_count = (sbt_company_count / total_companies * 100) if total_companies > 0 else 0

print("=" * 70)
print("PORTFOLIO COVERAGE - ALL SBT TARGETS")
print("=" * 70)
print(f"\nCoverage by Investment Value:")
print(f"  SBT Investment Value:    ${sbt_investment_value:,.2f}")
print(f"  Total Portfolio Value:   ${total_investment_value:,.2f}")
print(f"  Coverage:                {coverage_by_value:.2f}%")
print(f"\nCoverage by Company Count:")
print(f"  SBT Companies:           {sbt_company_count}")
print(f"  Total Companies:         {total_companies}")
print(f"  Coverage:                {coverage_by_count:.2f}%")

In [None]:
#Print the Total and Validated Investment Values
total_investment_weight = df_portfolio["investment_value"].sum()
validated_investment_sum = df_portfolio.loc[df_portfolio["validated"] == True, "investment_value"].sum()

print(f"Total Investment Value: {total_investment_weight:,.2f}")
print(f"Validated Investment Value: {validated_investment_sum:,.2f}")
print(f"Percentage of Portfolio Value with Validated Targets: {(validated_investment_sum/total_investment_weight)*100:.2f}%")
print(f"Total Companies in Portfolio: {len(df_portfolio)}")
print(f"Validated Companies: {df_portfolio['validated'].sum()}")
print(f"Percentage of Companies with Validated Targets: {(df_portfolio['validated'].sum()/len(df_portfolio))*100:.2f}%")

#Show the first few validated companies
print("\nSample of companies with validated targets:")
print(df_portfolio[df_portfolio["validated"] == True][["company_name", "investment_value", "validated"]].head(10))

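The next cell repeats the count using distinct company names rather than portfolio rows, so a company listed on several rows is counted once.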

In [None]:
distinct_company_count = df_portfolio['company_name'].nunique()
validated_companies = df_portfolio[df_portfolio['validated']]['company_name'].nunique()

print(f"Total Distinct Companies: {distinct_company_count}")
print(f"Distinct Validated Companies: {validated_companies}")
print(f"Percentage of Distinct Companies with Validated Targets: {(validated_companies/distinct_company_count)*100:.2f}%")

# Original counting method - counts rows, not necessarily distinct companies
print(f"Total Portfolio Rows: {len(df_portfolio)}")
print(f"Validated Rows: {df_portfolio['validated'].sum()}")
print(f"Percentage of Rows with Validated Targets: {(df_portfolio['validated'].sum()/len(df_portfolio))*100:.2f}%")

#Show the first few validated companies
print("\nSample of companies with validated targets:")
print(df_portfolio[df_portfolio["validated"] == True][["company_name", "investment_value", "validated"]].head(10))

## 1.5°C Portfolio Coverage Results

The following calculates the percentage of the portfolio invested in companies with 1.5°C aligned targets (near-term and/or long-term).

In [None]:
# =============================================================================
# CALCULATE 1.5°C PORTFOLIO COVERAGE
# =============================================================================

total_investment_value = df_portfolio['investment_value'].sum()
total_companies = len(df_portfolio)

# Coverage for 1.5°C aligned companies
val_1_5c = df_portfolio.loc[df_portfolio['is_1_5c'] == True, 'investment_value'].sum()
count_1_5c = df_portfolio['is_1_5c'].sum()

# Calculate percentages
coverage_15c_by_value = (val_1_5c / total_investment_value * 100) if total_investment_value > 0 else 0
coverage_15c_by_count = (count_1_5c / total_companies * 100) if total_companies > 0 else 0

print("\n" + "=" * 70)
print("1.5°C PORTFOLIO COVERAGE RESULTS")
print("=" * 70)
print(f"\nConfiguration:")
print(f"  Pure 1.5°C: {'Included' if INCLUDE_PURE_15C else 'Excluded'}")
print(f"  Mixed 1.5°C/2°C: {'Included' if INCLUDE_MIXED_15C_2C else 'Excluded'}")
print(f"\nCoverage by Investment Value:")
print(f"  1.5°C Investment Value:  ${val_1_5c:,.2f}")
print(f"  Total Portfolio Value:   ${total_investment_value:,.2f}")
print(f"  Coverage:                {coverage_15c_by_value:.2f}%")
print(f"\nCoverage by Company Count:")
print(f"  1.5°C Companies:         {count_1_5c}")
print(f"  Total Companies:         {total_companies}")
print(f"  Coverage:                {coverage_15c_by_count:.2f}%")

# Show comparison: 1.5C as share of all SBTs
if sbt_company_count > 0:
    print(f"\n1.5°C as Share of All SBTs:")
    print(f"  By Value:  {(val_1_5c / sbt_investment_value * 100):.1f}%" if sbt_investment_value > 0 else "  By Value:  N/A")
    print(f"  By Count:  {(count_1_5c / sbt_company_count * 100):.1f}%")

### Detailed 1.5°C Coverage Breakdown
See how your portfolio companies matched to 1.5°C aligned companies.

In [None]:
# =============================================================================
# DETAILED BREAKDOWN BY MATCHING METHOD
# =============================================================================

print("=" * 70)
print("MATCHING METHOD BREAKDOWN")
print("=" * 70)

def get_match_method(row):
    """Identify how each company was matched"""
    if not row['is_1_5c']:
        return 'Not matched'

    if portfolio_lei_col and pd.notna(row.get(portfolio_lei_col)):
        if row.get(portfolio_lei_col) in lei_set_1_5c:
            return 'LEI'

    if portfolio_isin_col and pd.notna(row.get(portfolio_isin_col)):
        if row.get(portfolio_isin_col) in isin_set_1_5c:
            return 'ISIN'

    if pd.notna(row.get('company_name')):
        if row.get('company_name').lower() in company_name_set_1_5c:
            return 'Name'

    return 'Unknown'

df_portfolio['match_method'] = df_portfolio.apply(get_match_method, axis=1)

print(f"\nMatching Method Distribution:")
for method in ['LEI', 'ISIN', 'Name', 'Not matched']:
    count = (df_portfolio['match_method'] == method).sum()
    value = df_portfolio[df_portfolio['match_method'] == method]['investment_value'].sum()
    pct = (value / total_investment_weight * 100) if total_investment_weight > 0 else 0
    print(f"  {method:15s}: {count:3d} companies (${value:,.2f}, {pct:.1f}%)")

# Show sample of matched companies
if count_1_5c > 0:
    print("\nSample of 1.5°C matched companies:")
    sample = df_portfolio[df_portfolio['is_1_5c'] == True][
        ['company_name', 'investment_value', 'match_method']
    ].head(10)
    print(sample.to_string(index=False))

In [None]:
# =============================================================================
# CONSOLIDATED TARGET SUMMARY
# =============================================================================

print("\n" + "=" * 70)
print("CONSOLIDATED TARGET SUMMARY")
print("=" * 70)

# Get target classification breakdown from SBTi data
target_breakdown = filtered_df[filtered_df[sbti_provider.c.COL_ACTION] == sbti_provider.c.VALUE_ACTION_TARGET]['Target Classification'].value_counts()

total_validated = target_breakdown.sum()

print(f"\nSBTi Validated Targets by Classification:")
print("-" * 50)
for classification, count in target_breakdown.items():
    pct = count / total_validated * 100
    print(f"  {classification:30s}: {count:,} ({pct:.1f}%)")
print("-" * 50)
print(f"  {'Total Validated':30s}: {total_validated:,}")

# Portfolio summary
print(f"\nYour Portfolio Coverage:")
print("-" * 50)
print(f"  All SBT Targets:")
print(f"    By Value:  {coverage_by_value:.2f}%")
print(f"    By Count:  {sbt_company_count}/{total_companies} ({coverage_by_count:.2f}%)")
print(f"\n  1.5°C Targets Only:")
print(f"    By Value:  {coverage_15c_by_value:.2f}%")
print(f"    By Count:  {count_1_5c}/{total_companies} ({coverage_15c_by_count:.2f}%)")

if sbt_company_count > 0:
    print(f"\n  1.5°C as Share of All SBTs:")
    print(f"    By Value:  {(val_1_5c / sbt_investment_value * 100):.1f}%" if sbt_investment_value > 0 else "    By Value:  N/A")
    print(f"    By Count:  {(count_1_5c / sbt_company_count * 100):.1f}%")

print("\n" + "=" * 70)


## USER INPUT: Anonymize Output for Sharing

If you plan to share your portfolio coverage results with a third party (e.g., SBTi), you can **anonymize the output** to remove company-identifying information before saving.

### How to Use
In the code cell below, change the setting to control what gets saved:

| Setting | What Happens |
|---------|-------------|
| `ANONYMIZE_OUTPUT = True` | **Recommended for sharing.** Company names, ISINs, and LEIs are removed from the saved file. Only your internal company ID is kept as an identifier. |
| `ANONYMIZE_OUTPUT = False` | **Default.** The saved file contains all columns, including company names and identifiers. Use this if the file is for your own records. |

### What Gets Removed When Anonymized
- **Company name** -- removed from the output file
- **ISIN** (International Securities Identification Number) -- removed from the output file
- **LEI** (Legal Entity Identifier) -- removed from the output file

### What Is Kept
- **Company ID** -- your internal identifier (only meaningful to you, not traceable by the recipient)
- **Investment value** and **weights** -- your portfolio data
- **Validated**, **1.5°C aligned**, and **match method** -- the coverage results

This allows the recipient to see your portfolio's coverage results without being able to identify which specific companies you hold.
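
The column removal described above amounts to a single `drop` call. The following is a minimal sketch on an invented results table. The notebook's own cell builds the list of identifying columns from the detected ISIN and LEI column names instead.

```python
import pandas as pd

# Hypothetical results table
toy_results = pd.DataFrame({
    "company_id": ["C1", "C2"],
    "company_name": ["Alpha Corp", "Beta Ltd"],
    "isin": ["US0000000001", None],
    "investment_value": [1000.0, 2500.0],
    "validated": [True, False],
})

identifying = ["company_name", "isin", "lei"]   # "lei" is absent here on purpose

# errors="ignore" skips listed columns that are not present in the frame
shareable = toy_results.drop(columns=identifying, errors="ignore")
print(shareable.columns.tolist())  # ['company_id', 'investment_value', 'validated']
```

Before sharing, it is worth checking that the internal `company_id` values do not themselves encode a ticker or company name.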

In [None]:
# =============================================================================
# USER CONFIGURATION: ANONYMIZE OUTPUT
# =============================================================================
# Set to True to remove identifying information (company_name, ISIN, LEI)
# from the saved output file. Only your internal company_id will be kept.
#
# Options:
#   True  - Anonymized output for sharing with third parties
#   False - Full output with all company identifiers (default)

ANONYMIZE_OUTPUT = False

# =============================================================================
# PREPARE OUTPUT DATAFRAME
# =============================================================================

# Columns that contain company-identifying information
pii_columns = ['company_name']

# Add the ISIN column if it exists
if portfolio_isin_col and portfolio_isin_col in df_portfolio.columns:
    pii_columns.append(portfolio_isin_col)

# Add the LEI column if it exists
if portfolio_lei_col and portfolio_lei_col in df_portfolio.columns:
    pii_columns.append(portfolio_lei_col)

if ANONYMIZE_OUTPUT:
    # Remove PII columns from the output
    columns_to_drop = [col for col in pii_columns if col in df_portfolio.columns]
    df_output = df_portfolio.drop(columns=columns_to_drop)

    print("=" * 70)
    print("ANONYMIZED OUTPUT MODE")
    print("=" * 70)
    print(f"\nRemoved columns: {', '.join(columns_to_drop)}")
    print(f"Remaining columns: {', '.join(df_output.columns.tolist())}")
    print(f"\nRows: {len(df_output)}")
    print("\nPreview (first 5 rows):")
    print(df_output.head().to_string(index=False))
else:
    df_output = df_portfolio.copy()

    print("=" * 70)
    print("FULL OUTPUT MODE (not anonymized)")
    print("=" * 70)
    print(f"\nAll columns retained: {', '.join(df_output.columns.tolist())}")
    print(f"Rows: {len(df_output)}")
    print("\nTo anonymize, set ANONYMIZE_OUTPUT = True above and re-run this cell.")

## Save the portfolio
To save the results as a CSV file in the `data/` folder, run the following cell.

In [None]:
# Save the output (uses anonymized or full data based on ANONYMIZE_OUTPUT setting)
if ANONYMIZE_OUTPUT:
    output_path = 'data/validated_portfolio_anonymized.csv'
else:
    output_path = 'data/validated_portfolio.csv'

df_output.to_csv(output_path, index=False)
print(f"Portfolio saved to: {output_path}")
print(f"Anonymized: {ANONYMIZE_OUTPUT}")
print(f"Columns in output: {', '.join(df_output.columns.tolist())}")