M&A Prediction with Risk (S&P 500)

Project Overview - This project predicts which companies are likely M&A targets using real financial and legal risk data.
I trained a classification model (XGBoost) and explained its predictions with SHAP (Part II).

S1 — Load & Clean Ticker List

In [None]:
import pandas as pd

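# Load the S&P 500 constituent list
df = pd.read_csv("constituents.csv")
# Yahoo Finance writes share-class tickers with a hyphen (e.g. BRK.B -> BRK-B)
df["Symbol"] = df["Symbol"].str.replace(".", "-", regex=False)
symbols = df["Symbol"].tolist()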

print("Total companies:", len(symbols))
print("Sample tickers:", symbols[:10])

Total companies: 503
Sample tickers: ['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL', 'A']
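
The `.` to `-` replacement matters for share-class tickers, which Yahoo Finance writes with a hyphen. A minimal sketch of the same step on a toy frame (the three tickers are illustrative and are not read from `constituents.csv`):

```python
import pandas as pd

# Toy stand-in for constituents.csv; the tickers are illustrative only.
toy = pd.DataFrame({"Symbol": ["MMM", "BRK.B", "BF.B"]})

# Literal (non-regex) replacement, as in the cell above.
# With regex=True the "." would match every character.
toy["Symbol"] = toy["Symbol"].str.replace(".", "-", regex=False)

normalized = toy["Symbol"].tolist()
print(normalized)
```

Tickers without a dot pass through unchanged, so the step is safe to apply to the whole column.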


S2 - Filter by sector

In [13]:
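# Sectors of interest for the screen.
# NOTE: "Technology" and "Healthcare" appear to match no rows. The GICS labels are
# "Information Technology" and "Health Care", and ABT, ABBV, ACN, ADBE and AMD from
# the ticker sample are absent from the filtered output.
pe_sectors = [
    "Technology",
    "Healthcare",
    "Industrials",
    "Real Estate",
    "Consumer Discretionary",
    "Financials",
    "Energy"
]

# Keep only rows whose GICS sector is in the list
df_sector_filtered = df[df["GICS Sector"].isin(pe_sectors)].reset_index(drop=True)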

print("Remaining after sector filter:", len(df_sector_filtered))
df_sector_filtered.head()

Remaining after sector filter: 256


index,Symbol,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded
0,MMM,3M,Industrials,Industrial Conglomerates,"Saint Paul, Minnesota",1957-03-04,66740,1902
1,AOS,A. O. Smith,Industrials,Building Products,"Milwaukee, Wisconsin",2017-07-26,91142,1916
2,AFL,Aflac,Financials,Life & Health Insurance,"Columbus, Georgia",1999-05-28,4977,1955
3,ABNB,Airbnb,Consumer Discretionary,"Hotels, Resorts & Cruise Lines","San Francisco, California",2023-09-18,1559720,2008
4,ARE,Alexandria Real Estate Equities,Real Estate,Office REITs,"Pasadena, California",2017-03-20,1035443,1994
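
`isin` silently ignores labels that never occur in the column, so a misspelled sector name removes companies without raising an error. One way to catch this is to compare the requested labels against the labels actually present before filtering. A small sketch on a toy frame, assuming the file uses the official GICS names (`Information Technology`, `Health Care`); `unmatched_labels` is a hypothetical helper, not part of pandas:

```python
import pandas as pd

def unmatched_labels(frame, column, wanted):
    """Return the entries of `wanted` that never occur in frame[column]."""
    present = set(frame[column].dropna().unique())
    return [label for label in wanted if label not in present]

# Toy rows; the sector labels follow the official GICS spelling (assumption
# about how constituents.csv is written).
toy = pd.DataFrame({
    "Symbol": ["MMM", "ABT", "ACN", "AFL"],
    "GICS Sector": ["Industrials", "Health Care",
                    "Information Technology", "Financials"],
})

wanted = ["Technology", "Healthcare", "Industrials", "Financials"]
missing = unmatched_labels(toy, "GICS Sector", wanted)
print("Labels that match nothing:", missing)
```

Running the same helper on `df` with `pe_sectors` would show whether every entry in the list selects at least one company.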


S3 — Collect financials from yfinance

In [14]:
import yfinance as yf
import time

symbols = df_sector_filtered["Symbol"].tolist()
financials = []

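# Pull a snapshot of fundamentals per ticker; fields missing from .info come back as None
for symbol in symbols:
    try:
        stock = yf.Ticker(symbol)
        info = stock.info

        financials.append({
            "Symbol": symbol,
            "Market_Cap": info.get("marketCap"),
            "EBITDA_Margin": info.get("ebitdaMargins"),
            "ROA": info.get("returnOnAssets"),
            "Debt_Equity": info.get("debtToEquity"),
            "Sector": info.get("sector"),
            "Sub_Industry": info.get("industry")
        })

        time.sleep(0.5)  # pause between requests
    except Exception as e:
        # Failed tickers are reported and skipped, so df_fin may have fewer rows than symbols
        print(symbol, "error:", e)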

df_fin = pd.DataFrame(financials)
df_fin.to_csv("filtered_sector_financials.csv", index=False)
df_fin.head()

index,Symbol,Market_Cap,EBITDA_Margin,ROA,Debt_Equity,Sector,Sub_Industry
0,MMM,70747734016,0.22258,0.05929,350.77,Industrials,Conglomerates
1,AOS,8861479936,0.20364,0.13533,12.132,Industrials,Specialty Industrial Machinery
2,AFL,56609198080,0.35156,0.03384,37.796,Financial Services,Insurance - Life
3,ABNB,69262327808,0.23275,0.07671,27.271,Consumer Cyclical,Travel Services
4,ARE,12888544256,0.62756,0.01419,56.93,Real Estate,REIT - Office
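
The collection loop prints failures but does not keep them, and `.get` returns `None` for any field a ticker lacks. Separating the fetch from the bookkeeping makes both cases visible. A sketch under stated assumptions: `collect_financials` and `fake_fetch` are hypothetical names, and the fetcher is passed in so the logic can be exercised without network access (in the notebook it could be `lambda s: yf.Ticker(s).info`):

```python
import pandas as pd

# Output column -> key in the info dictionary (same keys as the cell above).
FIELDS = {
    "Market_Cap": "marketCap",
    "EBITDA_Margin": "ebitdaMargins",
    "ROA": "returnOnAssets",
    "Debt_Equity": "debtToEquity",
}

def collect_financials(symbols, fetch_info):
    """Return (DataFrame of fetched rows, list of symbols whose fetch raised)."""
    rows, failed = [], []
    for symbol in symbols:
        try:
            info = fetch_info(symbol)
        except Exception:
            failed.append(symbol)  # remember the failure instead of only printing it
            continue
        row = {"Symbol": symbol}
        row.update({col: info.get(key) for col, key in FIELDS.items()})
        rows.append(row)
    return pd.DataFrame(rows), failed

def fake_fetch(symbol):
    """Stand-in fetcher: one ticker fails, the other lacks two fields."""
    if symbol == "BAD":
        raise KeyError(symbol)
    return {"marketCap": 8861479936, "returnOnAssets": 0.13533}

df_demo, failed = collect_financials(["AOS", "BAD"], fake_fetch)
print("Failed tickers:", failed)
print(df_demo.isna().sum())  # missing fields per column
```

Keeping the `failed` list makes it possible to retry those tickers later, and the per-column missing counts show how many rows a later filter would drop for lack of data.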


S4 - Narrow target list using size, profitability, and leverage filters

In [None]:
# Market cap filter (1B-30B)
df1 = df_fin[
    (df_fin["Market_Cap"] >= 1e9) &
    (df_fin["Market_Cap"] <= 3e10)
]
print("After Market Cap filter:", df1.shape[0])

# Profitability filter (EBITDA margin ≥ 10%, ROA > 0)
df2 = df1[
    (df1["EBITDA_Margin"] >= 0.10) &
    (df1["ROA"] > 0)
]
print("After Profitability filter:", df2.shape[0])

# Leverage filter (D/E < 4.0)
# NOTE: yfinance's debtToEquity appears to be expressed in percent (e.g. 350.77 for MMM),
# so this threshold likely keeps only names with D/E below 0.04x.
df3 = df2[df2["Debt_Equity"] < 4.0]
print("After Leverage filter:", df3.shape[0])

# Final target candidates
df3 = df3.reset_index(drop=True)
df3.to_csv("pe_target_candidates_filtered.csv", index=False)
df3.head()


After Market Cap filter: 114
After Profitability filter: 96
After Leverage filter: 3
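
Part of each drop in these counts can come from missing data rather than from the thresholds: a comparison against `NaN` evaluates to `False`, so a company with no reported EBITDA margin is removed exactly like one with a low margin. A short sketch on hypothetical tickers that separates the two cases:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: AAA passes, BBB has no margin reported, CCC fails the threshold.
toy = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "EBITDA_Margin": [0.25, np.nan, 0.05],
    "ROA": [0.08, 0.04, 0.02],
})

# Same profitability condition as in the cell above.
mask = (toy["EBITDA_Margin"] >= 0.10) & (toy["ROA"] > 0)
kept = toy[mask]["Symbol"].tolist()

# Rows removed only because the value is missing, not because it is low.
dropped_missing = toy[toy["EBITDA_Margin"].isna()]["Symbol"].tolist()

print("Kept:", kept)
print("Dropped for missing margin:", dropped_missing)
```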


index,Symbol,Market_Cap,EBITDA_Margin,ROA,Debt_Equity,Sector,Sub_Industry
0,ERIE,21093484544,0.20193,0.16552,0.378,Financial Services,Insurance Brokers
1,TROW,18870568960,0.39812,0.11412,2.95,Financial Services,Asset Management
2,TPL,26758215680,0.80685,0.28298,0.11,Energy,Oil & Gas E&P
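
The fall from 96 to 3 names at the leverage step is consistent with a unit mismatch. Values such as 350.77 for MMM suggest that `Debt_Equity` is a percentage, in which case `< 4.0` selects a ratio under 0.04 rather than under 4. A sketch of the leverage step alone after converting to a ratio, under that assumption; `Debt_Equity_Ratio` is a hypothetical column name, the rows are copied from the outputs in this notebook, and the other filters are ignored:

```python
import pandas as pd

# Debt_Equity values as printed in the S3 and S4 outputs.
sample = pd.DataFrame({
    "Symbol": ["MMM", "AOS", "ERIE", "TROW"],
    "Debt_Equity": [350.77, 12.132, 0.378, 2.95],
})

# Assumption: the raw figure is a percentage, so divide by 100 to get a ratio.
sample["Debt_Equity_Ratio"] = sample["Debt_Equity"] / 100

# Threshold applied to the raw figure, as in the cell above.
as_written = sample[sample["Debt_Equity"] < 4.0]["Symbol"].tolist()

# Same threshold applied to the converted ratio.
as_ratio = sample[sample["Debt_Equity_Ratio"] < 4.0]["Symbol"].tolist()

print("Raw threshold keeps:  ", as_written)
print("Ratio threshold keeps:", as_ratio)
```

If the percentage reading is correct, an equivalent change in the notebook would be to compare `Debt_Equity` against 400 instead of 4.0. The candidate list would then need to be regenerated.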
