# SBTi-Finance Tool - Portfolio Aggregation
In this notebook we'll give some examples of how the portfolio aggregation methods can be used.

Please see the [methodology](https://sciencebasedtargets.org/wp-content/uploads/2020/09/Temperature-Rating-Methodology-V1.pdf), [guidance](https://sciencebasedtargets.org/wp-content/uploads/2020/10/Financial-Sector-Science-Based-Targets-Guidance-Pilot-Version.pdf) and the [technical documentation](https://sciencebasedtargets.github.io/SBTi-finance-tool/) for more details on the different aggregation methods.

See 1_analysis_example (on [Colab](https://colab.research.google.com/github/ScienceBasedTargets/SBTi-finance-tool/blob/main/examples/1_analysis_example.ipynb) or [GitHub](https://github.com/ScienceBasedTargets/SBTi-finance-tool/blob/main/examples/1_analysis_example.ipynb)) for a more in-depth example of how to work with Jupyter Notebooks in general and SBTi notebooks in particular.


## Setting up
First we will set up the imports, data providers, and load the portfolio. 

For more examples of this process, please refer to notebooks 1 and 2 (the analysis and quick calculation examples).


In [None]:
%pip install sbti-finance-tool

In [None]:
%load_ext autoreload
%autoreload 2
import SBTi
from SBTi.data.excel import ExcelProvider
from SBTi.portfolio_aggregation import PortfolioAggregationMethod
from SBTi.portfolio_coverage_tvp import PortfolioCoverageTVP
from SBTi.temperature_score import TemperatureScore, Scenario, ScenarioType, EngagementType
from SBTi.target_validation import TargetProtocol
from SBTi.interfaces import ETimeFrames, EScope
%aimport -pandas
import pandas as pd
import requests
from datetime import datetime

In [None]:
# Download the dummy data
import urllib.request
import os

if not os.path.isdir("data"):
    os.mkdir("data")
if not os.path.isfile("data/data_provider_example.xlsx"):
    urllib.request.urlretrieve("https://github.com/ScienceBasedTargets/SBTi-finance-tool/raw/main/examples/data/data_provider_example.xlsx", "data/data_provider_example.xlsx")
if not os.path.isfile("data/example_portfolio.csv"):
    urllib.request.urlretrieve("https://github.com/ScienceBasedTargets/SBTi-finance-tool/raw/main/examples/data/example_portfolio.csv", "data/example_portfolio.csv")


In [None]:
provider = ExcelProvider(path="data/data_provider_example.xlsx")
df_portfolio = pd.read_csv("data/example_portfolio.csv", encoding="iso-8859-1")

# Print original columns to diagnose
print("Original columns:", df_portfolio.columns.tolist())

# Create a more cautious renaming dictionary based on what columns exist
rename_dict = {}
if "Company Name" in df_portfolio.columns:
    rename_dict["Company Name"] = "company_name"
if "ISIN" in df_portfolio.columns:
    rename_dict["ISIN"] = "isin" 
if "LEI" in df_portfolio.columns:
    rename_dict["LEI"] = "lei"
if "Sector" in df_portfolio.columns:
    rename_dict["Sector"] = "sector"
if "Target" in df_portfolio.columns:
    rename_dict["Target"] = "full_target_language"
if "Net-Zero Committed" in df_portfolio.columns:
    rename_dict["Net-Zero Committed"] = "net_zero_status"
if "Near term - Target Status" in df_portfolio.columns:
    rename_dict["Near term - Target Status"] = "near_term_status"
if "Target Classification" in df_portfolio.columns:
    rename_dict["Target Classification"] = "target_classification_long"
if "Extension" in df_portfolio.columns:
    rename_dict["Extension"] = "reason_for_extension_or_removal"
if "Date" in df_portfolio.columns:
    rename_dict["Date"] = "date_updated"

# Apply column renaming
df_portfolio = df_portfolio.rename(columns=rename_dict)

# Handle legacy column names if they exist
if 'company_isin' in df_portfolio.columns:
    df_portfolio.rename(columns={'company_isin': 'isin'}, inplace=True)
if 'company_lei' in df_portfolio.columns:
    df_portfolio.rename(columns={'company_lei': 'lei'}, inplace=True)

# Ensure required columns exist
required_columns = ['company_id', 'company_name', 'isin', 'lei', 'investment_value']
for col in required_columns:
    if col not in df_portfolio.columns:
        df_portfolio[col] = None

# Convert identifiers to string
if 'isin' in df_portfolio.columns:
    df_portfolio['isin'] = df_portfolio['isin'].astype(str)
if 'lei' in df_portfolio.columns:
    df_portfolio['lei'] = df_portfolio['lei'].astype(str)

# Check for duplicate values in the 'company_id' column
duplicate_ids = df_portfolio[df_portfolio.duplicated('company_id', keep=False)]
if not duplicate_ids.empty:
    print("Error: Duplicate values found in the 'company_id' column:")
    print(duplicate_ids)
else:
    print("No duplicate values found in the 'company_id' column.")

# Display final columns to verify
print("Final columns:", df_portfolio.columns.tolist())

# Convert to portfolio objects and get data
companies = SBTi.utils.dataframe_to_portfolio(df_portfolio)
try:
    portfolio_data = SBTi.utils.get_data([provider], companies)
    scenarios = {}
    print("Successfully loaded portfolio data")
except Exception as e:
    print(f"Error loading portfolio data: {str(e)}")
    # Add more error handling if needed
    import traceback
    traceback.print_exc()

Download the SBTi Companies Taking Action (CTA) data and set up an SBTi target frame of reference.

In [None]:
# Provides an absolute frame of reference for SBTi targets, so that they are treated as cardinal relative to other targets when temperature scores are calculated.
def inject_sbti_validation_for_timeframe_scope_data(amended_portfolio, original_portfolio, debug=True):
    """
    Specially designed for the SBTi tool where amended_portfolio contains multiple rows 
    per company (one for each time frame and scope combination).
    """
    if 'sbti_validated' not in original_portfolio.columns:
        print("⚠ No 'sbti_validated' column found in original portfolio")
        return amended_portfolio
    
    # Store original values before modification
    original_validated_count = original_portfolio['sbti_validated'].sum()
    original_companies_count = len(original_portfolio)
    
    # Get count of unique companies in amended portfolio
    unique_companies_amended = amended_portfolio['company_id'].nunique()
    
    if debug:
        print(f"Original portfolio: {original_companies_count} companies, {original_validated_count} validated")
        print(f"Amended portfolio: {len(amended_portfolio)} rows, {unique_companies_amended} unique companies")
        
        # Check for duplicated rows by company_id
        if len(amended_portfolio) > unique_companies_amended:
            print("Multiple rows per company detected in amended portfolio")
            print(f"Rows per company: {len(amended_portfolio) / unique_companies_amended:.2f}")
            
            # Show distribution of time_frame and scope if they exist
            if 'time_frame' in amended_portfolio.columns:
                print("\nTime frame distribution:")
                print(amended_portfolio['time_frame'].value_counts())
            if 'scope' in amended_portfolio.columns:
                print("\nScope distribution:")
                print(amended_portfolio['scope'].value_counts())
    
    # Create a validation mapping
    validation_map = dict(zip(original_portfolio['company_id'], original_portfolio['sbti_validated']))
    
    # Apply validation to all rows in amended portfolio
    # Cast to bool so the column can be used directly as a boolean mask below
    amended_portfolio['sbti_validated'] = amended_portfolio['company_id'].map(validation_map).fillna(False).astype(bool)
    
    # Count unique validated companies after modification
    validated_companies = amended_portfolio[amended_portfolio['sbti_validated']]['company_id'].nunique()
    
    # Print validation summary
    print(f"\nOriginal validated companies: {original_validated_count}")
    print(f"Unique companies validated in amended portfolio: {validated_companies}")
    
    if original_validated_count == validated_companies:
        print("✓ CTA validation successfully preserved at company level")
    else:
        print("⚠ CTA validation mismatch at company level")
        
        # Additional debugging
        if debug:
            print("\nChecking for specific discrepancies...")
            # Get list of company IDs that should be validated
            original_validated_ids = set(original_portfolio[original_portfolio['sbti_validated']]['company_id'])
            amended_validated_ids = set(amended_portfolio[amended_portfolio['sbti_validated']]['company_id'])
            
            missing_validations = original_validated_ids - amended_validated_ids
            extra_validations = amended_validated_ids - original_validated_ids
            
            if missing_validations:
                print(f"Companies that should be validated but aren't: {len(missing_validations)}")
                print(missing_validations)
            
            if extra_validations:
                print(f"Companies that shouldn't be validated but are: {len(extra_validations)}")
                print(extra_validations)
    
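    return amended_portfolio


# STANDALONE SBTi VALIDATION - Download and process CTA data
import io  # used to wrap the downloaded bytes for pd.read_excel

print("Downloading SBTi Companies Taking Action (CTA) data...")
CTA_FILE_URL = "https://cdn.sciencebasedtargets.org/download/target-dashboard"

try:
    # A timeout keeps the notebook from hanging if the download stalls
    resp = requests.get(CTA_FILE_URL, timeout=60)
    if resp.status_code == 200:
        # Wrap the bytes in BytesIO; passing raw bytes to read_excel is deprecated in recent pandas versions
        cta_file = pd.read_excel(io.BytesIO(resp.content))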
        print(f"Downloaded CTA data with {len(cta_file)} rows")
        
        # Extract relevant columns
        targets = cta_file[['company_name', 'isin', 'lei', 'action', 'target', 'date_published']]
        
        # Filter for companies with targets; copy so the column added below goes on an independent frame
        companies_with_targets = targets[targets['action'] == 'Target'].copy()
        
        # Get unique identifiers, lowercased once for case-insensitive matching
        all_isin_set = {str(x).lower() for x in companies_with_targets['isin'].dropna()}
        all_lei_set = {str(x).lower() for x in companies_with_targets['lei'].dropna()}
        
        # Create a set of lowercase company names
        companies_with_targets['company_name_lower'] = companies_with_targets['company_name'].str.lower()
        company_name_set = set(companies_with_targets['company_name_lower'].dropna())
        
        # Add df_portfolio columns if they don't exist
        if 'isin' not in df_portfolio.columns and 'company_isin' in df_portfolio.columns:
            df_portfolio['isin'] = df_portfolio['company_isin']
        if 'lei' not in df_portfolio.columns and 'company_lei' in df_portfolio.columns:
            df_portfolio['lei'] = df_portfolio['company_lei']
        
        # Function to check if ISIN, LEI, or company name is validated
        def is_validated(row):
            # First check LEI
            if pd.notna(row.get('lei')) and str(row.get('lei')).lower() in all_lei_set:
                return True
            
            # Then check ISIN
            if pd.notna(row.get('isin')) and str(row.get('isin')).lower() in all_isin_set:
                return True
            
            # Finally check company name
            if pd.notna(row.get('company_name')):
                company_name_lower = str(row.get('company_name')).lower()
                if company_name_lower in company_name_set:
                    return True
            
            return False
        
        # Add the validated column to the df_portfolio
        df_portfolio['sbti_validated'] = df_portfolio.apply(is_validated, axis=1)
        
        # Convert df_portfolio to company objects again (after adding sbti_validated)
        companies = SBTi.utils.dataframe_to_portfolio(df_portfolio)
        
        # Print validation summary
        validated_count = df_portfolio['sbti_validated'].sum()
        print(f"Companies with SBTi-validated targets: {validated_count} out of {len(df_portfolio)} ({validated_count/len(df_portfolio)*100:.2f}%)")
        
        if 'investment_value' in df_portfolio.columns:
            total_investment = df_portfolio['investment_value'].sum()
            validated_investment = df_portfolio[df_portfolio['sbti_validated']]['investment_value'].sum()
            print(f"Portfolio coverage by investment value: {validated_investment/total_investment*100:.2f}%")
    else:
        print(f"Failed to download CTA file: HTTP {resp.status_code}")
except Exception as e:
    print(f"Error processing CTA file: {str(e)}")

# Update provider data with our validation results
print("Updating provider data with validated companies...")
try:
    # Get the fundamental_data from provider
    if hasattr(provider, 'data') and 'fundamental_data' in provider.data:
        # Create a mapping of company_id to validation status
        validation_map = df_portfolio[['company_id', 'sbti_validated']].set_index('company_id')['sbti_validated'].to_dict()
        
        # Update sbti_validated in provider data
        updated_count = 0
        for idx, row in provider.data['fundamental_data'].iterrows():
            company_id = row['company_id']
            if company_id in validation_map:
                # Ensure we're setting a proper boolean value
                is_validated = bool(validation_map[company_id])
                provider.data['fundamental_data'].at[idx, 'sbti_validated'] = is_validated
                updated_count += 1
        
        # Force the sbti_validated column to be boolean type; fill missing values first so NaN is not cast to True
        provider.data['fundamental_data']['sbti_validated'] = provider.data['fundamental_data']['sbti_validated'].fillna(False).astype(bool)
        
        print(f"Updated sbti_validated for {updated_count} companies in provider data")
    else:
        print("Provider does not have expected data structure - sbti_validated flags may be overwritten")
except Exception as e:
    print(f"Error updating provider data: {str(e)}")

In [None]:
temperature_score = TemperatureScore(time_frames=list(ETimeFrames), scopes=[EScope.S1S2, EScope.S3, EScope.S1S2S3])
amended_portfolio = temperature_score.calculate(data_providers=[provider], portfolio=companies)
# Preserve the CTA validation from our direct download
amended_portfolio = inject_sbti_validation_for_timeframe_scope_data(amended_portfolio, df_portfolio, debug=True)

scores_collection = {}

## Calculate the aggregated temperature score
Calculate an aggregated temperature score. This can be done using different aggregation methods. The temperature scores are calculated per time frame and scope combination.

### WATS
Weighted Average Temperature Score (WATS): Temperature scores are allocated based on portfolio weights.
This method requires the "investment_value" field to be defined in your portfolio data.
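
As a rough illustration of the weighting idea, the sketch below computes a weighted average by hand on made-up holdings. The data and the helper name `wats` are assumptions for this example only; the tool's own `aggregate_scores` call in the next cell is what should be used in practice.

```python
# Toy holdings; the values are invented for illustration only.
holdings = [
    {"company_id": "A", "investment_value": 50.0, "temperature_score": 1.5},
    {"company_id": "B", "investment_value": 30.0, "temperature_score": 2.0},
    {"company_id": "C", "investment_value": 20.0, "temperature_score": 3.2},
]


def wats(holdings):
    """Weight each company's score by its share of the total investment value."""
    total = sum(h["investment_value"] for h in holdings)
    return sum(h["investment_value"] / total * h["temperature_score"] for h in holdings)


wats_score = wats(holdings)
print(round(wats_score, 2))
```

In the notebook this calculation is repeated for every time frame and scope combination, which is why the result cell shows a table rather than a single number.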

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.WATS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_wats = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'WATS': df_wats})
df_wats

### TETS
Total emissions weighted temperature score (TETS): Temperature scores are allocated based on historical emission weights using total company emissions. 
In addition to the portfolio's "investment value", the TETS method requires company emissions; please refer to [Data Legends - Fundamental Data](https://ofbdabv.github.io/SBTi/Legends.html#fundamental-data) for more details.
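
The difference from WATS is only in the weights: each company is weighted by its share of the total emissions of all companies in the portfolio. A minimal sketch on made-up numbers, where the `emissions` key and the helper name `tets` are assumptions for this example rather than the tool's field names:

```python
# Toy holdings; emissions and scores are invented for illustration only.
holdings = [
    {"company_id": "A", "emissions": 100.0, "temperature_score": 1.5},
    {"company_id": "B", "emissions": 300.0, "temperature_score": 2.0},
    {"company_id": "C", "emissions": 600.0, "temperature_score": 3.2},
]


def tets(holdings):
    """Weight each company's score by its share of total company emissions."""
    total_emissions = sum(h["emissions"] for h in holdings)
    return sum(h["emissions"] / total_emissions * h["temperature_score"] for h in holdings)


tets_score = tets(holdings)
print(round(tets_score, 2))
```

Because the largest emitter dominates the weights, a single high-emitting company with a poor score can move this aggregate far more than it would under WATS.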

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.TETS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_tets = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'TETS': df_tets})
df_tets

### MOTS
Market Owned emissions weighted temperature score (MOTS): Temperature scores are allocated based on an equity ownership approach.
In addition to the portfolio's "investment value", the MOTS method requires company emissions and market cap; please refer to [Data Legends - Fundamental Data](https://ofbdabv.github.io/SBTi/Legends.html#fundamental-data) for more details.
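
Under an ownership approach, the investor is attributed a share of each company's emissions in proportion to the stake held, and those owned emissions become the weights. The sketch below illustrates this on made-up data. The key names and the helper `ownership_weighted_score` are assumptions for this example; the `denominator_key` argument is there because the other ownership-based methods in this notebook follow the same pattern with a different denominator (for example enterprise value instead of market cap).

```python
# Toy holdings; all values are invented for illustration only.
holdings = [
    {"company_id": "A", "investment_value": 50.0, "market_cap": 1000.0,
     "emissions": 100.0, "temperature_score": 1.5},
    {"company_id": "B", "investment_value": 30.0, "market_cap": 300.0,
     "emissions": 300.0, "temperature_score": 2.0},
    {"company_id": "C", "investment_value": 20.0, "market_cap": 400.0,
     "emissions": 600.0, "temperature_score": 3.2},
]


def ownership_weighted_score(holdings, denominator_key):
    """Weight scores by owned emissions: (investment / denominator) * emissions."""
    owned = [
        h["investment_value"] / h[denominator_key] * h["emissions"]
        for h in holdings
    ]
    total_owned = sum(owned)
    return sum(
        o / total_owned * h["temperature_score"]
        for o, h in zip(owned, holdings)
    )


mots_score = ownership_weighted_score(holdings, "market_cap")
print(round(mots_score, 2))
```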

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.MOTS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_mots = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'MOTS': df_mots})
df_mots

### EOTS
Enterprise Owned emissions weighted temperature score (EOTS): Temperature scores are allocated based on an enterprise ownership approach.
In addition to the portfolio's "investment value", the EOTS method requires company emissions and enterprise value; please refer to [Data Legends - Fundamental Data](https://ofbdabv.github.io/SBTi/Legends.html#fundamental-data) for more details.

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.EOTS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_eots = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'EOTS': df_eots})
df_eots

### ECOTS
Enterprise Value + Cash emissions weighted temperature score (ECOTS): Temperature scores are allocated based on an enterprise value (EV) plus cash & equivalents ownership approach. 
In addition to the portfolio's "investment value", the ECOTS method requires company emissions, company cash equivalents and enterprise value; please refer to [Data Legends - Fundamental Data](https://sciencebasedtargets.github.io/SBTi-finance-tool/Legends.html#fundamental-data) for more details.

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.ECOTS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_ecots = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'ECOTS': df_ecots})
df_ecots

### AOTS
Total Assets emissions weighted temperature score (AOTS): Temperature scores are allocated based on a total assets ownership approach. 
In addition to the portfolio's "investment value", the AOTS method requires company emissions and company total assets; please refer to [Data Legends - Fundamental Data](https://sciencebasedtargets.github.io/SBTi-finance-tool/Legends.html#fundamental-data) for more details.

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.AOTS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_aots = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'AOTS': df_aots})
df_aots

### ROTS
Revenue owned emissions weighted temperature score (ROTS): Temperature scores are allocated based on the share of revenue.
In addition to the portfolio's "investment value", the ROTS method requires company emissions and company revenue; please refer to [Data Legends - Fundamental Data](https://sciencebasedtargets.github.io/SBTi-finance-tool/Legends.html#fundamental-data) for more details.

In [None]:
temperature_score.aggregation_method = PortfolioAggregationMethod.ROTS
aggregated_scores = temperature_score.aggregate_scores(amended_portfolio)
df_rots = pd.DataFrame(aggregated_scores.dict()).applymap(lambda x: round(x['all']['score'], 2))
scores_collection.update({'ROTS': df_rots})
df_rots

See below how each aggregation method impacts the scores for each time frame and scope combination.
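
When reading the combined table, one simple check is how far apart the methods are for a given time frame and scope. The sketch below does this for a single combination, using hypothetical scores that are not output from this notebook:

```python
# Hypothetical aggregated scores for one time frame / scope combination.
scores_by_method = {"WATS": 2.41, "TETS": 2.87, "MOTS": 2.65, "EOTS": 2.60}

# Find the methods giving the lowest and highest aggregate, and the gap between them.
lowest = min(scores_by_method, key=scores_by_method.get)
highest = max(scores_by_method, key=scores_by_method.get)
spread = scores_by_method[highest] - scores_by_method[lowest]

print(f"{lowest} is lowest, {highest} is highest, spread {spread:.2f} °C")
```

A large spread suggests that the portfolio's score depends heavily on a few companies that are weighted very differently across methods.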

In [None]:
pd.concat(scores_collection, axis=0)