# Sector-Calibrated Factor Scoring Model (SCFSM)

My primary goal in this trading exercise is to generate positive risk-adjusted returns with long-term conviction
(strategic holdings). I aim to outperform the ACWI benchmark while maintaining a diversified and balanced exposure across sectors.

## Investment Process

My stock selection methodology is grounded in the development of a proprietary quantitative framework I call the Sector-Calibrated Factor Scoring Model (SCFSM). The model integrates sector-specific factor sensitivity with fundamental analysis to systematically identify top-performing stocks across the MSCI universe. To begin, I isolate the top 50 weighted companies in the MSCI Index, which represent over 95.39% of its price movement (based on regression models).

In [1]:
import yfinance as yf
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score
from scipy.optimize import minimize
from datetime import date
import numpy as np
import warnings
warnings.filterwarnings("ignore")

top_50_tickers = [
    "AAPL", "NVDA", "MSFT", "AMZN", "META", "GOOGL", "GOOG", "TSLA", "AVGO", "2330.TW",
    "BRK-B", "JPM", "LLY", "V", "XOM", "UNH", "MA", "0700.HK", "COST", "NFLX",
    "PG", "JNJ", "WMT", "ABBV", "HD", "BAC", "KO", "CVX", "SAP.DE", "9988.HK",
    "ASML.AS", "NESN.SW", "CRM", "CSCO", "PM", "WFC", "ROG.SW", "ORCL", "ABT", "MRK",
    "IBM", "SHEL.L", "AZN.L", "NOVO-B.CO", "NOVN.SW", "MCD", "LIN", "GE", "HSBA.L", "PEP"
]

index = "ACWI"
all_tickers = top_50_tickers + [index]
data = yf.download(all_tickers, start="2015-02-25", end="2025-02-25")['Close']
data = data.dropna()
returns = data.pct_change().dropna()
X = returns[top_50_tickers]
y = returns[index]


model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
r2 = r2_score(y, y_pred)
print(f"R²: {r2:.4f}")
print(f"→ {r2 * 100:.2f}% of ACWI's daily return variance is explained by these 50 stocks.")




YF.download() has changed argument auto_adjust default to True


[*********************100%***********************]  51 of 51 completed


R²: 0.9539
→ 95.39% of ACWI's daily return variance is explained by these 50 stocks.


## Identifying a benchmarking algorithm

I then source 18 financial metrics, including valuation, profitability, leverage, and growth indicators, for every company in the S&P 500 over the last 10 years on a quarterly basis. In parallel, I compile quarterly sectoral return data for the same period. Using linear regression, I analyze each sector individually to determine which financial metrics had statistically significant relationships with sector performance. This ensures that my model incorporated context-specific drivers of return, rather than assuming uniform factor importance across sectors. I calculate the value of each sector based on the following equation:


$$
\begin{aligned}
Y =\ & \beta_0 + \beta_1 \cdot \text{MarketValue} + \beta_2 \cdot \text{ROA\_SurpActValue} + \beta_3 \cdot \text{ROE\_SurpActValue} + \beta_4 \cdot \text{EarningsPerShare} \\
& + \beta_5 \cdot \text{PE} + \beta_6 \cdot \text{EBIT} + \beta_7 \cdot \text{IntChargeCover} + \beta_8 \cdot \text{EV\_EB} + \beta_9 \cdot \text{OPM} \\
& + \beta_{10} \cdot \text{WorkingCapital} + \beta_{11} \cdot \text{GrossProfitMargin} + \beta_{12} \cdot \text{BookPerShare} + \beta_{13} \cdot \text{DividendYield} \\
& + \beta_{14} \cdot \text{PriceToBookVal} + \beta_{15} \cdot \text{CurrentRatio} + \beta_{16} \cdot \text{QuickRatio} \\
& + \beta_{17} \cdot \text{INVDH} + \beta_{18} \cdot \text{AccountsReceivablesDays} + \epsilon
\end{aligned}
$$


In [2]:
tickers = [
    "AAPL", "NVDA", "MSFT", "AMZN", "META", "GOOGL", "GOOG", "TSLA", "AVGO", "2330.TW",
    "BRK-B", "JPM", "LLY", "V", "XOM", "UNH", "MA", "0700.HK", "COST", "NFLX",
    "PG", "JNJ", "WMT", "ABBV", "HD", "BAC", "KO", "CVX", "SAP.DE", "9988.HK",
    "ASML.AS", "NESN.SW", "CRM", "CSCO", "PM", "WFC", "ROG.SW", "ORCL", "ABT", "MRK",
    "IBM", "SHEL.L", "AZN.L", "NOVO-B.CO", "NOVN.SW", "MCD", "LIN", "GE", "HSBA.L", "PEP"
]

def get_financial_data(ticker):
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        financials = stock.financials
        balance = stock.balance_sheet
        EBIT = financials.loc['EBIT'][0] if 'EBIT' in financials.index else None
        interest_expense = financials.loc['Interest Expense'][0] if 'Interest Expense' in financials.index else None
        revenue = financials.loc['Total Revenue'][0] if 'Total Revenue' in financials.index else None
        gross_profit = financials.loc['Gross Profit'][0] if 'Gross Profit' in financials.index else None
        net_income = financials.loc['Net Income'][0] if 'Net Income' in financials.index else None
        working_capital = balance.loc['Working Capital'][0] if 'Working Capital' in balance.index else None
        total_assets = balance.loc['Total Assets'][0] if 'Total Assets' in balance.index else None
        total_equity = balance.loc["Stockholders Equity"][0] if "Stockholders Equity" in balance.index else None
        inventory = balance.loc['Inventory'][0] if 'Inventory' in balance.index else None
        receivables = balance.loc['Accounts Receivable'][0] if 'Accounts Receivable' in balance.index else None
        cogs = financials.loc['Cost Of Revenue'][0] if 'Cost Of Revenue' in financials.index else None

        return {
            "Sector": info.get("sector"),
            "Market Value": info.get("marketCap"),
            "Earnings/Share": info.get("trailingEps"),
            "P/E": info.get("trailingPE"),
            "EBIT": EBIT,
            "Interest Coverage": EBIT / interest_expense if EBIT and interest_expense else None,
            "EV/EBIT": info.get("enterpriseValue") / EBIT if info.get("enterpriseValue") and EBIT else None,
            "OPM": EBIT / revenue if EBIT and revenue else None,
            "Working Capital": working_capital,
            "Gross Profit Margin": gross_profit / revenue if gross_profit and revenue else None,
            "Book/Share": info['bookValue'],
            "Dividend Yield": info.get("dividendYield"),
            "Price to Book": info.get("priceToBook"),
            "Current Ratio": info['currentRatio'],
            "Quick Ratio": info['quickRatio'],
            "INVDH": inventory / (cogs / 365) if inventory and cogs else None,
            "Accounts Receivables Days": receivables / (revenue / 365) if receivables and revenue else None,
            "ROA Surplus": net_income / total_assets if net_income and total_assets else None,
            "ROE Surplus": net_income / total_equity if net_income and total_equity else None
        }
    except Exception as e:
        print(f"Error with {ticker}: {e}")
        return {}

data = {ticker: get_financial_data(ticker) for ticker in tickers}
stock_data = pd.DataFrame.from_dict(data, orient='index')
stock_data


Error with JPM: 'currentRatio'
Error with ABBV: 'currentRatio'
Error with BAC: 'currentRatio'
Error with CVX: 'currentRatio'
Error with WFC: 'currentRatio'
Error with MRK: 'currentRatio'
Error with HSBA.L: 'currentRatio'


Unnamed: 0,Sector,Market Value,Earnings/Share,P/E,EBIT,Interest Coverage,EV/EBIT,OPM,Working Capital,Gross Profit Margin,Book/Share,Dividend Yield,Price to Book,Current Ratio,Quick Ratio,INVDH,Accounts Receivables Days,ROA Surplus,ROE Surplus
AAPL,Technology,3990206545920,7.48,36.101604,123216000000.0,,32.851954,0.315102,-23405000000.0,0.462063,4.991,0.39,54.10539,0.893,0.771,12.642571,31.185572,0.256825,1.645935
NVDA,Technology,4837505761280,3.38,58.784023,84273000000.0,341.186235,56.743847,0.645785,62079000000.0,0.749887,4.113,0.02,48.307808,4.214,3.488,112.724042,64.512786,0.653041,0.918729
MSFT,Technology,3842762342400,13.99,36.76412,126012000000.0,52.83522,30.481683,0.447289,49913000000.0,0.688237,48.84,0.7,10.530917,1.401,1.254,3.898054,90.568517,0.16451,0.296472
AMZN,Consumer Cyclical,2715315077120,6.96,35.82184,71020000000.0,29.517872,38.461356,0.111324,11436000000.0,0.488544,34.587,,7.208488,1.009,0.76,38.273274,31.725573,0.094813,0.207183
META,Communication Services,1581177372672,22.57,27.794418,71378000000.0,99.829371,22.244801,0.433906,66449000000.0,0.816652,76.98,0.33,8.149129,1.978,1.671,,37.706822,0.225898,0.341442
GOOGL,Communication Services,3351190962176,10.13,27.39783,120083000000.0,448.070896,27.437474,0.343077,74589000000.0,0.582004,32.033,0.3,8.66419,1.747,1.563,,54.580336,0.222358,0.307976
GOOG,Communication Services,3352742854656,10.14,27.422089,120083000000.0,448.070896,27.489729,0.343077,74589000000.0,0.582004,32.033,0.3,8.680423,1.747,1.563,,54.580336,0.222358,0.307976
TSLA,Consumer Cyclical,1477528387584,1.45,306.3862,9340000000.0,26.685714,155.194595,0.095609,29539000000.0,0.178626,24.058,,18.466208,2.066,1.486,54.663572,16.507012,0.058409,0.097788
AVGO,Technology,1712093528064,3.8,92.61579,13869000000.0,3.508475,12.547865,0.268915,2898000000.0,0.630337,59.221,0.65,5.942824,1.497,1.279,33.695253,31.252957,0.035588,0.087104
2330.TW,Technology,37991280017408,55.68,26.311064,1416335000000.0,134.948206,26.31264,0.489352,1779696000000.0,0.561224,192.762,1.33,7.600046,2.693,2.397,82.736937,34.135756,0.173105,0.272928


In [3]:
file_path = "./data/stats.xlsx" 
xls = pd.ExcelFile(file_path)
feature_map = {
    "X1": "Market Value",
    "X2": "ROA Surplus",
    "X3": "ROE Surplus",
    "X4": "Earnings/Share",
    "X5": "P/E",
    "X6": "EBIT",
    "X7": "Interest Coverage",
    "X8": "EV/EBIT",
    "X9": "OPM",
    "X10": "Working Capital",
    "X11": "Gross Profit Margin",
    "X12": "Book/Share",
    "X13": "Dividend Yield",
    "X14": "Price to Book",
    "X15": "Current Ratio",
    "X16": "Quick Ratio",
    "X17": "INVDH",
    "X18": "Accounts Receivables Days"
}
significant_named_features = {}

for sheet in xls.sheet_names:
    df = pd.read_excel(file_path, sheet_name=sheet)
    df = df[df["Unnamed: 0"].isin(feature_map.keys())]
    significant = df[df["P>|t|"] < 0.1]["Unnamed: 0"]
    named_features = significant.map(feature_map).tolist()
    significant_named_features[sheet] = named_features


significant_features = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in significant_named_features.items()]))
significant_features


Unnamed: 0,Basic Materials,Communication Services,Consumer Cyclical,Consumer Defensive,Energy,Financial Services,Healthcare,Industrials,Real Estate,Technology,Utilities
0,Market Value,Market Value,ROA Surplus,Market Value,Market Value,Market Value,Market Value,Market Value,Market Value,Market Value,Earnings/Share
1,EBIT,ROA Surplus,ROE Surplus,EBIT,Working Capital,ROA Surplus,ROE Surplus,OPM,Working Capital,Earnings/Share,P/E
2,Book/Share,Interest Coverage,Earnings/Share,Working Capital,Price to Book,ROE Surplus,Earnings/Share,Gross Profit Margin,Dividend Yield,OPM,EBIT
3,INVDH,OPM,EBIT,Dividend Yield,,EBIT,P/E,Book/Share,Price to Book,Book/Share,EV/EBIT
4,,Working Capital,Book/Share,Quick Ratio,,EV/EBIT,EBIT,Dividend Yield,Accounts Receivables Days,INVDH,OPM
5,,Gross Profit Margin,Dividend Yield,INVDH,,Working Capital,Interest Coverage,Price to Book,,Accounts Receivables Days,Gross Profit Margin
6,,Dividend Yield,Current Ratio,Accounts Receivables Days,,Book/Share,OPM,,,,Dividend Yield
7,,,Quick Ratio,,,Dividend Yield,Working Capital,,,,
8,,,,,,INVDH,Dividend Yield,,,,
9,,,,,,Accounts Receivables Days,,,,,


In [4]:
sector_data = {}
for sector, group in stock_data.groupby('Sector'):
    sector_data[sector] = group

Next, I establish benchmark thresholds for each metric, derived from a combination of empirical distribution analysis and literature-backed standards. For every company in the top-50 MSCI subset, I apply a scoring function that rewarded metrics exceeding these benchmarks. To normalize and compare across companies, I use min-max scaling within each sector.

In [5]:
feature_benchmarks = {
    "Market Value": lambda x: x > 10_000_000_000,  # > $10B
    "ROA Surplus": lambda x: x > 0.07,             # > 7%
    "ROE Surplus": lambda x: x > 0.15,             # > 15%
    "Earnings/Share": lambda x: x > 2.0,           # EPS > $2
    "P/E": lambda x: 5 < x < 20,                   # Between 5 and 20
    "EBIT": lambda x: x > 500_000_000,             # > $500M
    "Interest Coverage": lambda x: x > 5,          # > 5x
    "EV/EBIT": lambda x: x < 12,                   # < 12
    "OPM": lambda x: x > 0.20,                     # > 20%
    "Working Capital": lambda x: x > 1_000_000_000,# > $1B
    "Gross Profit Margin": lambda x: x > 0.40,     # > 40%
    "Book/Share": lambda x: x > 20,                # > $20
    "Dividend Yield": lambda x: 0.02 <= x <= 0.06, # 2%–6%
    "Price to Book": lambda x: x < 2.5,            # < 2.5
    "Current Ratio": lambda x: x > 1.8,            # > 1.8
    "Quick Ratio": lambda x: x > 1.2,              # > 1.2
    "INVDH": lambda x: x < 100,                    # < 100 days
    "Accounts Receivables Days": lambda x: x < 60  # < 60 days
}


Moving forward, I rank all companies within their respective sectors based on the cumulative score and select the top two per sector for potential inclusion. This process produces a diversified, factor-driven portfolio built on both long-term data analysis and sector-specific insight.

In [6]:
sector_scores = {}


for sector, df in sector_data.items():
    sig_features = significant_features[sector].dropna().tolist()
    scores = pd.Series(0.0, index=df.index)

    for feature in sig_features:
        if feature not in df.columns or feature not in feature_benchmarks:
            continue

        passed = df[feature].apply(lambda x: feature_benchmarks[feature](x) if pd.notnull(x) else False)
        values_to_scale = df.loc[passed, feature].values.reshape(-1, 1)
        if len(values_to_scale) > 1:
            scaled = MinMaxScaler().fit_transform(values_to_scale).flatten()
            scores.loc[passed] += scaled

    result = pd.DataFrame({
        "Stock": df.index,
        "Score": scores
    }).reset_index(drop=True)
    sector_scores[sector] = result


In [7]:
final_selection = []
for sector in sector_scores:
    final_selection.append(sector_scores[sector].sort_values("Score", ascending=False).head(2)['Stock'])
final_selection

[0    LIN
 Name: Stock, dtype: object,
 0    META
 2    GOOG
 Name: Stock, dtype: object,
 2         HD
 3    9988.HK
 Name: Stock, dtype: object,
 4    NESN.SW
 2        WMT
 Name: Stock, dtype: object,
 0       XOM
 1    SHEL.L
 Name: Stock, dtype: object,
 0    BRK-B
 2       MA
 Name: Stock, dtype: object,
 6    NOVO-B.CO
 0          LLY
 Name: Stock, dtype: object,
 0    GE
 Name: Stock, dtype: object,
 4    2330.TW
 6    ASML.AS
 Name: Stock, dtype: object]

Next, I optimize my portfolio by maximizing the Sortino ratio.

In [8]:
tickers = [ticker for x in final_selection for ticker in x.tolist()]


def get_data(tickers, start='2025-01-01', end='2025-02-28'):
    data = yf.download(tickers, start=start, end=end)['Close']
    return data.dropna()


def get_returns(price_df):
    return price_df.pct_change().dropna()


def sortino_ratio(returns, weights, risk_free_rate=0.01):
    portfolio_returns = returns @ weights
    downside_returns = portfolio_returns[portfolio_returns < risk_free_rate]
    expected_return = np.mean(portfolio_returns)
    downside_std = np.std(downside_returns)
    return (expected_return - risk_free_rate) / downside_std if downside_std != 0 else 0


def neg_sortino_ratio(weights, returns, risk_free_rate=0.01):
    return -sortino_ratio(returns, weights, risk_free_rate)

def optimize_weights(returns, risk_free_rate=0.01):
    n_assets = returns.shape[1]
    init_guess = np.ones(n_assets) / n_assets
    bounds = [(0.01, 0.85) for _ in range(n_assets)]
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}

    result = minimize(neg_sortino_ratio, init_guess,
                      args=(returns, risk_free_rate),
                      method='SLSQP',
                      bounds=bounds,
                      constraints=constraints)
    return result.x if result.success else None


price_data = get_data(tickers)
returns = get_returns(price_data)
optimal_weights = optimize_weights(returns)


def calculate_return(tickers, weights, start_date):
    end_date = date.today().isoformat()
    data = get_data(tickers, start=start_date, end=end_date)
    start_prices = data.iloc[0]
    end_prices = data.iloc[-1]
    individual_returns = (end_prices - start_prices) / start_prices
    portfolio_return = np.dot(individual_returns, weights)
    return portfolio_return


if optimal_weights is not None:
    print("\nOptimized Weights based on Sortino Ratio:")
    for ticker, weight in zip(tickers, optimal_weights):
        print(f"{ticker}: {weight:.4f}")

    port_return = calculate_return(tickers, optimal_weights, '2025-02-28')
    print(f"\nPortfolio Return from 2025-02-28 to today: {port_return:.2%}")
else:
    print("Optimization failed.")


[*********************100%***********************]  16 of 16 completed
[**********************94%********************   ]  15 of 16 completed


Optimized Weights based on Sortino Ratio:
LIN: 0.0100
META: 0.8500
GOOG: 0.0100
HD: 0.0100
9988.HK: 0.0100
NESN.SW: 0.0100
WMT: 0.0100
XOM: 0.0100
SHEL.L: 0.0100
BRK-B: 0.0100
MA: 0.0100
NOVO-B.CO: 0.0100
LLY: 0.0100
GE: 0.0100
2330.TW: 0.0100
ASML.AS: 0.0100


[*********************100%***********************]  16 of 16 completed


Portfolio Return from 2025-02-28 to today: 23.01%





In [9]:
def get_acwi_return(start_date='2025-02-28'):
    end_date = date.today().isoformat()
    data = yf.download('ACWI', start=start_date, end=end_date)

    if 'Close' not in data.columns or data.empty:
        print("No adjusted close data found.")
        return None

    adj_close = data['Close'].dropna()

    if adj_close.empty:
        print("No data available in adjusted close.")
        return None

    start_price = adj_close.iloc[0]
    end_price = adj_close.iloc[-1]
    acwi_return = (end_price - start_price) / start_price
    return float(acwi_return)

# Run and print result
acwi_ret = get_acwi_return()
if acwi_ret is not None:
    print(f"MSCI ACWI Return from 2025-02-28 to today: {acwi_ret:.2%}")


[*********************100%***********************]  1 of 1 completed

MSCI ACWI Return from 2025-02-28 to today: 18.19%





As of Noovember 4 2025, the SCFSM strategy's portfolio returns 23.01% (from 2/28/2025) in contrast to the index with 18.19%, an alpha of 4.82%.