# Caprae Capital – AI-Driven Lead Recommendation Tool

This notebook presents a lightweight AI-based prototype designed to support Caprae Capital’s mission of transforming acquired businesses using strategic technology. The objective is to help founder-led teams identify relevant companies for partnerships, sales outreach, or acquisition opportunities, based on financial and market similarity.

The tool takes basic company information (growth rate, revenue, domain, and country) as input. It uses KMeans clustering to segment a dataset of 1,000 synthetic companies, then applies rule-based filters to surface comparable companies. It is built to simulate the real-world decision-making process of a founder or operator when evaluating potential targets.

The model was developed under the 5-hour challenge constraint. It emphasizes clarity, relevance, and minimal input for high-leverage output.


In [12]:
# Step 1: Importing dependencies
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

In [13]:
# Step 2: Load dataset
df = pd.read_csv('/content/Caprae_Capital_Overview.csv')
df.head()

| | Name | Growth_Rate | Country | Revenue_Cr | Domain | Listed |
|---|---|---|---|---|---|---|
| 0 | Company_1 | 31.2 | Canada | 99.88 | Logistics | Yes |
| 1 | Company_2 | 55.65 | India | 86.44 | HealthTech | Yes |
| 2 | Company_3 | 85.28 | Germany | 80.0 | HealthTech | No |
| 3 | Company_4 | -3.79 | UAE | 417.9 | EdTech | No |
| 4 | Company_5 | 10.0 | Canada | 312.57 | FinTech | No |


In [15]:

# Step 3: Define features for clustering
features = ["Growth_Rate", "Revenue_Cr", "Country", "Domain"]
X = df[features]

In [16]:
# Step 4: Preprocessing and clustering
numeric_features = ["Growth_Rate", "Revenue_Cr"]
categorical_features = ["Country", "Domain"]

numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

preprocessor = ColumnTransformer([
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('kmeans', KMeans(n_clusters=6, random_state=42))
])

pipeline.fit(X)
df["Cluster"] = pipeline.named_steps["kmeans"].labels_
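
The notebook fixes `n_clusters=6` without showing how that value was chosen. One common sanity check is to compare silhouette scores across several candidate values of k. The sketch below is only an illustration under stated assumptions: `demo_df` is a hypothetical random stand-in that reuses the notebook's column names, not the actual Caprae dataset, and the range of k values is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in data: same column names as the notebook, random values.
rng = np.random.default_rng(0)
n = 300
demo_df = pd.DataFrame({
    "Growth_Rate": rng.uniform(-10, 90, n),
    "Revenue_Cr": rng.uniform(10, 500, n),
    "Country": rng.choice(["India", "Canada", "Germany", "UAE"], n),
    "Domain": rng.choice(["SaaS", "FinTech", "EdTech", "HealthTech", "Logistics"], n),
})

# Same preprocessing idea as Step 4: scale numeric columns, one-hot encode categoricals.
demo_preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Growth_Rate", "Revenue_Cr"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Country", "Domain"]),
])
X_demo = demo_preprocessor.fit_transform(demo_df)

# Fit KMeans for each candidate k and record the silhouette score
# (ranges from -1 to 1; higher means better-separated clusters).
silhouette_by_k = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    silhouette_by_k[k] = silhouette_score(X_demo, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(f"best k by silhouette: {best_k} (score {silhouette_by_k[best_k]:.3f})")
```

On random data like this the scores will be low and the "best" k carries little meaning; the point is the procedure, which could be rerun on the real feature matrix to see whether 6 clusters is a reasonable choice.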

In [17]:
# Step 5: Simulated founder input
user_input = {
    "Growth_Rate": 30,
    "Revenue_Cr": 120,
    "Country": "India",
    "Domain": "SaaS"
}
user_df = pd.DataFrame([user_input])

In [18]:
# Step 6: Predict user's cluster
# The fitted pipeline runs the preprocessor before KMeans, so a single call is enough
predicted_cluster = pipeline.predict(user_df)[0]
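
A cluster label on its own does not rank anything. One possible way to turn it into recommendations is to take the companies in the predicted cluster and sort them by distance to the input in the preprocessed feature space. The following is a minimal sketch of that idea, not the notebook's method: `demo_df`, `demo_pipeline`, `query`, and the `Distance` column are hypothetical names, and the data is random.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in data with the notebook's column names.
rng = np.random.default_rng(1)
n = 200
demo_df = pd.DataFrame({
    "Growth_Rate": rng.uniform(-10, 90, n),
    "Revenue_Cr": rng.uniform(10, 500, n),
    "Country": rng.choice(["India", "Canada", "Germany", "UAE"], n),
    "Domain": rng.choice(["SaaS", "FinTech", "EdTech", "HealthTech", "Logistics"], n),
})
feature_cols = ["Growth_Rate", "Revenue_Cr", "Country", "Domain"]

demo_pipeline = Pipeline([
    ("preprocessor", ColumnTransformer(
        [
            ("num", StandardScaler(), ["Growth_Rate", "Revenue_Cr"]),
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["Country", "Domain"]),
        ],
        sparse_threshold=0.0,  # always return a dense array so NumPy math works directly
    )),
    ("kmeans", KMeans(n_clusters=6, n_init=10, random_state=42)),
])
demo_pipeline.fit(demo_df[feature_cols])
demo_df["Cluster"] = demo_pipeline.named_steps["kmeans"].labels_

# Assign the query company to a cluster.
query = pd.DataFrame([{
    "Growth_Rate": 30, "Revenue_Cr": 120, "Country": "India", "Domain": "SaaS",
}])
query_cluster = demo_pipeline.predict(query)[0]

# Rank companies in that cluster by Euclidean distance to the query
# in the same transformed space KMeans was fitted on.
prep = demo_pipeline.named_steps["preprocessor"]
peers = demo_df[demo_df["Cluster"] == query_cluster].copy()
peer_vectors = prep.transform(peers[feature_cols])
query_vector = prep.transform(query)
peers["Distance"] = np.linalg.norm(peer_vectors - query_vector, axis=1)

nearest = peers.nsmallest(5, "Distance")
print(nearest[feature_cols + ["Distance"]])
```

Because one-hot columns are not scaled the same way as the standardized numeric columns, a mismatch in country or domain contributes a fixed amount to the distance; whether that weighting is appropriate is a design decision left open here.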

In [19]:
# Step 7: Custom business filtering logic
user_country = user_input["Country"]
user_domain = user_input["Domain"]
user_revenue = user_input["Revenue_Cr"]

min_revenue = user_revenue - 50
max_revenue = user_revenue + 50

# Filter companies based on user's preferences
filtered_df = df[
    (df["Country"] == user_country) &
    (df["Domain"] == user_domain) &
    (df["Revenue_Cr"] >= min_revenue) &
    (df["Revenue_Cr"] <= max_revenue) &
    (df["Growth_Rate"] > 0)
]

# Display the first 10 matching companies (in dataset order; no ranking is applied)
filtered_df[["Name", "Country", "Domain", "Revenue_Cr", "Growth_Rate"]].head(10)


| | Name | Country | Domain | Revenue_Cr | Growth_Rate |
|---|---|---|---|---|---|
| 525 | Company_526 | India | SaaS | 120.06 | 2.19 |
| 759 | Company_760 | India | SaaS | 113.88 | 35.74 |
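
The Step 7 filter can also be wrapped in a reusable function so the revenue band and result count become parameters. The sketch below is one way to do that, with an added sort by growth rate so that the result is actually ranked. The function name `filter_leads` and its parameters are hypothetical, and `sample` contains only the seven rows visible in the outputs above rather than the full dataset.

```python
import pandas as pd

def filter_leads(companies, profile, revenue_band=50.0, top_n=10):
    """Return up to top_n companies matching the profile, highest growth first.

    Hypothetical helper mirroring the Step 7 rules: same country, same domain,
    revenue within +/- revenue_band of the profile, and positive growth.
    """
    low = profile["Revenue_Cr"] - revenue_band
    high = profile["Revenue_Cr"] + revenue_band
    mask = (
        (companies["Country"] == profile["Country"])
        & (companies["Domain"] == profile["Domain"])
        & companies["Revenue_Cr"].between(low, high)  # inclusive on both ends
        & (companies["Growth_Rate"] > 0)
    )
    columns = ["Name", "Country", "Domain", "Revenue_Cr", "Growth_Rate"]
    return (
        companies.loc[mask, columns]
        .sort_values("Growth_Rate", ascending=False)
        .head(top_n)
    )

# Rows copied from the notebook outputs shown above.
sample = pd.DataFrame(
    [
        ("Company_1", 31.2, "Canada", 99.88, "Logistics"),
        ("Company_2", 55.65, "India", 86.44, "HealthTech"),
        ("Company_3", 85.28, "Germany", 80.0, "HealthTech"),
        ("Company_4", -3.79, "UAE", 417.9, "EdTech"),
        ("Company_5", 10.0, "Canada", 312.57, "FinTech"),
        ("Company_526", 2.19, "India", 120.06, "SaaS"),
        ("Company_760", 35.74, "India", 113.88, "SaaS"),
    ],
    columns=["Name", "Growth_Rate", "Country", "Revenue_Cr", "Domain"],
)

profile = {"Growth_Rate": 30, "Revenue_Cr": 120, "Country": "India", "Domain": "SaaS"}
leads = filter_leads(sample, profile)
print(leads)  # Company_760 (growth 35.74) is listed before Company_526 (growth 2.19)
```

Sorting by growth rate is only one possible ranking; revenue proximity or a combined score would work equally well with the same structure.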


## Summary

This notebook presents a practical and targeted lead recommendation system, designed to align with the operational mindset of founder-led businesses. It was developed as part of Caprae Capital’s AI-readiness challenge, under the 5-hour constraint.

The system combines unsupervised machine learning (KMeans clustering) with specific business logic filters to identify highly relevant companies for potential partnerships, acquisitions, or outreach.

Key highlights:
- Clustering was performed on revenue, growth rate, domain, and country.
- User input was mapped to a cluster using KMeans to simulate strategic segmentation. The predicted cluster is computed in Step 6, but the Step 7 filter does not yet use it.
- Final recommendations were generated using defined filters:
  - Same country as the user
  - Same domain/industry
  - Revenue range within ±₹50 Cr of the user’s company
  - Positive growth rate

This hybrid approach aims to combine algorithmic segmentation with business-context relevance. The model can be extended to include additional signals such as funding stage, valuation, or team size. It reflects Caprae Capital’s vision of applying technology meaningfully to enhance business decision-making across its portfolio.

