### 📌 **Technical Summary of Components Used in the Code:**

1. **Libraries & Tools:**

   * `pandas`: For data loading, filtering, and manipulation.
   * `sentence-transformers`: For Natural Language Processing (NLP) tasks, specifically generating sentence embeddings and computing semantic similarity.
   * `plotly`: For the interactive heatmap of the match matrix.

2. **Machine Learning / NLP Techniques:**

   * **Transformer-based Sentence Embeddings** using the pre-trained model `'all-MiniLM-L6-v2'` to represent textual data in vector form.
   * **Cosine Similarity** to compute the semantic closeness between founder needs and provider skills.

3. **Data Processing:**

   * CSV file handling using `pandas`.
   * Filtering user types based on `user_id` prefix ('F' for founders, 'S' for service providers/mentors).

4. **Scoring System:**

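   * **Semantic Match Score**: Based on cosine similarity between text embeddings (50% of the final score).
   * **Industry Match Score**: Rule-based scoring (100, 70, or 0) based on domain compatibility (30% of the final score).
   * **Timeline Match Score**: Rule-based scoring (100 or 0) based on whether the provider is available by the project deadline (20% of the final score).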

5. **Evaluation Logic:**

   * Manually defined test cases to validate matching quality (strong match, conceptual match, mismatch).

---

### ✅ In Summary:

The code combines **data preprocessing**, **transformer-based NLP**, and **custom rule-based scoring** to build a lightweight semantic matching system for founders and service providers.


In [2]:
!pip install sentence-transformers


In [3]:
import pandas as pd

# Load the dataset
df = pd.read_csv('Cleaned_User_Matching_Dataset.csv')

# Keep founders only (user_id starts with 'F')
founders_df = df[df['user_id'].str.startswith('F')].copy()

# Keep service providers and mentors only (user_id starts with 'S')
providers_df = df[df['user_id'].str.startswith('S')].copy()


print(f"Number of Founders: {len(founders_df)}")
print(f"Number of Service Providers/Mentors: {len(providers_df)}")


print("\nFounders DataFrame:")
display(founders_df.head())

Number of Founders: 50
Number of Service Providers/Mentors: 50

Founders DataFrame:


| | user_id | user_type | startup_stage | startup_industry | project_need | tech_requirement | project_deadline | expertise_area | industry_preference | preferred_project_type | core_skill | availability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F001 | Founder | Ideation | SaaS | Digital Marketing | AWS | 1 Month | | | | | |
| 1 | F002 | Founder | Ideation | FinTech | Pitch Deck Design | React | 2 Months | | | | | |
| 2 | F003 | Founder | Scaling | SaaS | Pitch Deck Design | GST Filing | 1 Month | | | | | |
| 3 | F004 | Founder | Early Growth | SaaS | Fundraising Support | React | 2 Weeks | | | | | |
| 4 | F005 | Founder | Early Growth | HealthTech | Fundraising Support | Python | 2 Months | | | | | |


The next cell defines a function that computes the semantic similarity between two pieces of text and returns an integer score on a 100-point scale.

Instead of checking whether keywords are identical, it uses a pre-trained sentence-embedding model to compare the meaning of the two texts.

In [4]:
from sentence_transformers import SentenceTransformer, util

# Load the pre-trained sentence-embedding model once and reuse it for every comparison.

model = SentenceTransformer('all-MiniLM-L6-v2')

def calculate_semantic_score(text1, text2):
  """Return the cosine similarity of two texts as an integer between 0 and 100."""

  # Encode each text into a dense embedding vector
  embedding1 = model.encode(text1, convert_to_tensor=True)
  embedding2 = model.encode(text2, convert_to_tensor=True)

  # Cosine similarity between the two vectors, returned as a 1x1 tensor
  cosine_score = util.cos_sim(embedding1, embedding2)

  # Scale to a whole number out of 100. Cosine similarity can be slightly
  # negative for unrelated texts, so clamp the result to the 0-100 range.
  score = max(0, min(100, int(cosine_score.item() * 100)))

  return score

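
To make the scoring step concrete, the sketch below reproduces the cosine-similarity calculation on small hand-written vectors in plain Python. The vectors and the helper names `cosine_similarity` and `cosine_to_score` are illustrative assumptions, not part of the notebook; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_to_score(a, b):
    """Scale the similarity to a whole number out of 100 (truncating)."""
    return int(cosine_similarity(a, b) * 100)

# Toy vectors (assumed for illustration only)
same_direction = cosine_to_score([3.0, 4.0], [6.0, 8.0])   # parallel vectors
orthogonal = cosine_to_score([1.0, 0.0], [0.0, 1.0])       # unrelated directions
partial = cosine_to_score([1.0, 0.0], [1.0, 1.0])          # 45 degrees apart

print(same_direction, orthogonal, partial)
```

Vectors that point the same way score 100 regardless of their length, which is why cosine similarity suits embeddings: it compares direction, not magnitude.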

The next cell tests the function on three text pairs to see how well it captures meaning.

Comparing embeddings handles cases where a need and a skill are worded differently, which is common in real-world data.

In [5]:
# Test 1: A strong match
founder_need_1 = "I need to build a new mobile app from scratch using Python."
provider_skill_1 = "Expert in Python and Full-Stack Development for mobile apps."
score1 = calculate_semantic_score(founder_need_1, provider_skill_1)
print(f"Match Score 1 (Strong): {score1}/100")


# Test 2: A conceptual match
founder_need_2 = "We need a complete UI/UX Revamp for our website."
provider_skill_2 = "Design Expert"
score2 = calculate_semantic_score(founder_need_2, provider_skill_2)
print(f"Match Score 2 (Conceptual): {score2}/100")


# Test 3: A strong mismatch
founder_need_3 = "Help with GST Filing and accounting."
provider_skill_3 = "I am a growth marketing expert specializing in SEO."
score3 = calculate_semantic_score(founder_need_3, provider_skill_3)
print(f"Match Score 3 (Mismatch): {score3}/100")


Match Score 1 (Strong): 73/100
Match Score 2 (Conceptual): 27/100
Match Score 3 (Mismatch): 18/100
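
For contrast, here is a rough sketch of the keyword-only approach that the embedding model replaces, run on the same three test pairs. The function `keyword_overlap_score` and its tokenization rule are assumptions made for this comparison; the notebook itself does not include such a baseline.

```python
import re

def keyword_overlap_score(text1, text2):
    """Jaccard overlap of the lowercase word sets, scaled to 0-100."""
    tokens1 = set(re.findall(r"[a-z0-9]+", text1.lower()))
    tokens2 = set(re.findall(r"[a-z0-9]+", text2.lower()))
    if not tokens1 or not tokens2:
        return 0
    return int(100 * len(tokens1 & tokens2) / len(tokens1 | tokens2))

strong = keyword_overlap_score(
    "I need to build a new mobile app from scratch using Python.",
    "Expert in Python and Full-Stack Development for mobile apps.",
)
conceptual = keyword_overlap_score(
    "We need a complete UI/UX Revamp for our website.",
    "Design Expert",
)
mismatch = keyword_overlap_score(
    "Help with GST Filing and accounting.",
    "I am a growth marketing expert specializing in SEO.",
)

print(strong, conceptual, mismatch)
```

With this baseline the conceptual pair and the mismatched pair both share no words, so it cannot tell them apart, whereas the embedding scores above separate them (27 versus 18).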


This function will compare the founder's industry with the provider's preferred industry.
- A direct match (e.g., "HealthTech" vs. "HealthTech") is a perfect score.
- If the provider is open to "Any" industry, it's a good, but not perfect, match.
- A mismatch results in a score of 0.

In [6]:
def calculate_industry_score(founder_industry, provider_preference):
  if founder_industry == provider_preference:
    return 100  # Perfect match
  elif provider_preference == 'Any':
    return 70   # Provider is flexible, which is good but not specialized
  else:
    return 0    # Mismatch
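
A short usage sketch of the three branches follows. The function is repeated so the block runs on its own. The second function, `calculate_industry_score_normalized`, is a hypothetical variant that is not in the notebook; it assumes the CSV might contain inconsistent casing or stray whitespace, which an exact string comparison would treat as a mismatch.

```python
def calculate_industry_score(founder_industry, provider_preference):
    if founder_industry == provider_preference:
        return 100  # Perfect match
    elif provider_preference == 'Any':
        return 70   # Flexible provider
    else:
        return 0    # Mismatch

def calculate_industry_score_normalized(founder_industry, provider_preference):
    # Hypothetical variant: compare after trimming whitespace and lowercasing
    founder = str(founder_industry).strip().lower()
    preference = str(provider_preference).strip().lower()
    if founder == preference:
        return 100
    if preference == 'any':
        return 70
    return 0

exact = calculate_industry_score('HealthTech', 'HealthTech')
flexible = calculate_industry_score('SaaS', 'Any')
different = calculate_industry_score('SaaS', 'FinTech')

# The exact comparison is sensitive to formatting differences
strict_result = calculate_industry_score('SaaS', 'saas ')
lenient_result = calculate_industry_score_normalized('SaaS', 'saas ')

print(exact, flexible, different, strict_result, lenient_result)
```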

A provider must be available when the founder needs them. The next cell maps the text descriptions of deadlines and availability to numbers so that the two can be compared.

In [7]:
# Numerical values for deadlines and availability: a higher number means a more urgent deadline or an earlier start.
deadline_mapping = {'Immediate': 3, 'Within 1 Month': 2, 'Flexible': 1}
availability_mapping = {'Immediate': 3, 'Within 2 Weeks': 2, 'In 1-2 Months': 1, 'Unavailable': 0}

def calculate_timeline_score(project_deadline, provider_availability):
  # Labels missing from a mapping fall back to 0. The sample rows displayed earlier
  # use deadlines such as '1 Month' and '2 Weeks', which are not keys of
  # deadline_mapping, so those founders get founder_need = 0 and every provider scores 100.
  founder_need = deadline_mapping.get(project_deadline, 0)
  provider_can_start = availability_mapping.get(provider_availability, 0)

  # If the provider is available when or before the founder needs them, it's a match.
  if provider_can_start >= founder_need:
    return 100
  else:
    return 0  # Not available in time
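
One possible way to make the timeline comparison robust to label mismatches is to convert both sides to days, sketched below. Everything here is an assumption for illustration: the day values, the name `timeline_score_in_days`, and the choice to score unknown labels as 0. The deadline labels are the ones visible in the displayed sample rows; the availability labels are taken from `availability_mapping`, and the dataset's actual `availability` values may differ.

```python
# Assumed conversions from text labels to a number of days
DEADLINE_DAYS = {'2 Weeks': 14, '1 Month': 30, '2 Months': 60}
AVAILABILITY_DAYS = {'Immediate': 0, 'Within 2 Weeks': 14, 'In 1-2 Months': 60}

def timeline_score_in_days(project_deadline, provider_availability):
    """Return 100 if the provider can start on or before the deadline, else 0."""
    deadline = DEADLINE_DAYS.get(project_deadline)
    start = AVAILABILITY_DAYS.get(provider_availability)
    # Unknown labels (including 'Unavailable') never count as a match
    if deadline is None or start is None:
        return 0
    return 100 if start <= deadline else 0

in_time = timeline_score_in_days('2 Weeks', 'Immediate')
too_late = timeline_score_in_days('2 Weeks', 'In 1-2 Months')
unavailable = timeline_score_in_days('1 Month', 'Unavailable')
unknown_label = timeline_score_in_days('Sometime', 'Immediate')

print(in_time, too_late, unavailable, unknown_label)
```

Returning 0 for unrecognized labels makes a vocabulary mismatch visible in the results instead of silently producing a perfect timeline score.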

Combining all our scoring logic into one master function.

This function will take a founder's record and a provider's record, call other functions, and calculate the final weighted score out of 100.

In [8]:
def calculate_match_score(founder, provider):
  # Defining weights for each category
  weights = {
      'skill_and_project': 0.50, # 50 points
      'industry': 0.30,          # 30 points
      'timeline': 0.20           # 20 points
  }

  # Calculating Skill & Project Match
  # Here we do two semantic checks and average them for a robust score
  skill_score = calculate_semantic_score(founder['tech_requirement'], provider['core_skill'])
  project_type_score = calculate_semantic_score(founder['project_need'], provider['expertise_area'])

  avg_semantic_score = (skill_score + project_type_score) / 2

  # Calculating Industry Match
  industry_score = calculate_industry_score(founder['startup_industry'], provider['industry_preference'])

  # Calculating Timeline Match
  timeline_score = calculate_timeline_score(founder['project_deadline'], provider['availability'])

  # Calculating Final Weighted Score
  final_score = (avg_semantic_score * weights['skill_and_project']) + \
                (industry_score * weights['industry']) + \
                (timeline_score * weights['timeline'])

  score_details = {
      'Skill Score': round(skill_score),
      'Project Type Score': round(project_type_score),
      'Industry Score': round(industry_score),
      'Timeline Score': round(timeline_score),
      'Final Score': round(final_score)
  }

  return score_details
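
The weighted sum can be checked by hand. The sketch below repeats only the arithmetic of `calculate_match_score`, using component scores that appear in the notebook's result tables; the helper name `final_score` is an assumption introduced for this example.

```python
WEIGHTS = {'skill_and_project': 0.50, 'industry': 0.30, 'timeline': 0.20}

def final_score(skill, project_type, industry, timeline):
    """Average the two semantic scores, then apply the 50/30/20 weights."""
    avg_semantic = (skill + project_type) / 2
    weighted = (avg_semantic * WEIGHTS['skill_and_project']
                + industry * WEIGHTS['industry']
                + timeline * WEIGHTS['timeline'])
    return round(weighted)

# (100 + 58) / 2 = 79, so 79 * 0.5 + 100 * 0.3 + 100 * 0.2 = 89.5
f020_s043 = final_score(100, 58, 100, 100)

# (100 + 46) / 2 = 73, so 73 * 0.5 + 0 * 0.3 + 100 * 0.2 = 56.5
f003_s001 = final_score(100, 46, 0, 100)

# (100 + 24) / 2 = 62, so 62 * 0.5 + 100 * 0.3 + 100 * 0.2 = 81.0
f001_s009 = final_score(100, 24, 100, 100)

print(f020_s043, f003_s001, f001_s009)
```

Python's `round` sends exact halves to the nearest even integer, so 89.5 becomes 90 while 56.5 becomes 56. Both values agree with the `Final Score` column in the notebook's output.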

The next cell loops over every founder-provider pair, runs `calculate_match_score` on each, and stores the results.

In [9]:
# This list stores the detailed results for every possible pair
all_scores = []

# Iterating over each founder
for _, founder in founders_df.iterrows():
  # For each founder, iterating over every provider
  for _, provider in providers_df.iterrows():
    # Calculating the match score for the pair
    score_details = calculate_match_score(founder, provider)

    # Adding the user IDs to the results
    result = {
        'founder_id': founder['user_id'],
        'provider_id': provider['user_id'],
        **score_details
    }

    # Adding the complete result to the list
    all_scores.append(result)

match_results_df = pd.DataFrame(all_scores)

# Displaying the top results
print("Top scoring matches overall:")
display(match_results_df.sort_values(by='Final Score', ascending=False).head())

Top scoring matches overall:


| | founder_id | provider_id | Skill Score | Project Type Score | Industry Score | Timeline Score | Final Score |
|---|---|---|---|---|---|---|---|
| 992 | F020 | S043 | 100 | 58 | 100 | 100 | 90 |
| 2292 | F046 | S043 | 100 | 58 | 100 | 100 | 90 |
| 2249 | F045 | S050 | 100 | 58 | 100 | 100 | 90 |
| 1718 | F035 | S019 | 100 | 52 | 100 | 100 | 88 |
| 171 | F004 | S022 | 100 | 52 | 100 | 100 | 88 |
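
The nested loop above calls `model.encode` four times per pair, which is 10,000 encode calls for 50 founders and 50 providers, even though the same strings recur many times. One way to cut that cost is to encode each distinct string once and look it up afterwards. The sketch below shows the idea with a stand-in encoder so it runs without downloading a model; `build_embedding_cache` and `fake_encode` are hypothetical names, and in the notebook a callable such as `lambda t: model.encode(t, convert_to_tensor=True)` could be passed instead.

```python
def build_embedding_cache(texts, encode_fn):
    """Encode each distinct text once and return a {text: embedding} dict."""
    cache = {}
    for text in texts:
        if text not in cache:
            cache[text] = encode_fn(text)
    return cache

# Stand-in encoder that records how often it is called (illustration only)
calls = []

def fake_encode(text):
    calls.append(text)
    return [float(len(text))]

# Repeated values, as in a column like tech_requirement
skills = ['React', 'AWS', 'React', 'Python', 'AWS']
cache = build_embedding_cache(skills, fake_encode)

print(len(skills), "texts,", len(calls), "encode calls")
```

The pairwise loop would then read embeddings from the cache and compute only the cosine similarity for each pair.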


Creating the match matrix

In [10]:
# Pivoting the results (Rows = founders, Columns = providers, Values = final match score)
match_matrix = match_results_df.pivot_table(
    index='founder_id',
    columns='provider_id',
    values='Final Score'
)

print("Match Matrix (Founders vs. Providers):")
display(match_matrix.head())

Match Matrix (Founders vs. Providers):


| founder_id | S001 | S002 | S003 | S004 | S005 | S006 | S007 | S008 | S009 | S010 | ... | S041 | S042 | S043 | S044 | S045 | S046 | S047 | S048 | S049 | S050 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F001 | 26.0 | 28.0 | 30.0 | 29.0 | 37.0 | 30.0 | 30.0 | 78.0 | 81.0 | 26.0 | ... | 41.0 | 30.0 | 41.0 | 59.0 | 27.0 | 34.0 | 60.0 | 60.0 | 39.0 | 37.0 |
| F002 | 31.0 | 55.0 | 30.0 | 26.0 | 21.0 | 46.0 | 25.0 | 26.0 | 25.0 | 31.0 | ... | 26.0 | 36.0 | 26.0 | 28.0 | 27.0 | 28.0 | 28.0 | 28.0 | 46.0 | 21.0 |
| F003 | 56.0 | 22.0 | 22.0 | 23.0 | 46.0 | 20.0 | 22.0 | 54.0 | 53.0 | 56.0 | ... | 22.0 | 32.0 | 22.0 | 62.0 | 24.0 | 20.0 | 54.0 | 54.0 | 21.0 | 46.0 |
| F004 | 22.0 | 26.0 | 31.0 | 27.0 | 26.0 | 58.0 | 38.0 | 57.0 | 67.0 | 22.0 | ... | 31.0 | 27.0 | 31.0 | 52.0 | 28.0 | 41.0 | 60.0 | 60.0 | 52.0 | 26.0 |
| F005 | 23.0 | 29.0 | 59.0 | 48.0 | 28.0 | 67.0 | 38.0 | 29.0 | 39.0 | 23.0 | ... | 52.0 | 48.0 | 52.0 | 29.0 | 28.0 | 39.0 | 50.0 | 50.0 | 61.0 | 58.0 |


Two-sided recommendation functions: each one filters the results for a given founder (or provider) and returns the top N highest-scoring providers (or founders) for that user.

In [11]:
def get_top_providers_for_founder(founder_id, top_n=3):

  founder_matches = match_results_df[match_results_df['founder_id'] == founder_id]

  top_matches = founder_matches.sort_values(by='Final Score', ascending=False).head(top_n)
  return top_matches

def get_top_founders_for_provider(provider_id, top_n=3):

  provider_matches = match_results_df[match_results_df['provider_id'] == provider_id]

  top_matches = provider_matches.sort_values(by='Final Score', ascending=False).head(top_n)
  return top_matches
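
Both functions follow the same filter, sort, and truncate pattern, differing only in the column they filter on. A minimal sketch of that shared pattern in plain Python is shown below; `top_matches` is a hypothetical helper working on a list of dicts, and the sample rows reuse a few scores from the match matrix displayed earlier.

```python
def top_matches(rows, id_column, id_value, top_n=3):
    """Keep rows for one user, sort by Final Score (highest first), take top_n."""
    matches = [row for row in rows if row[id_column] == id_value]
    ranked = sorted(matches, key=lambda row: row['Final Score'], reverse=True)
    return ranked[:top_n]

rows = [
    {'founder_id': 'F001', 'provider_id': 'S001', 'Final Score': 26},
    {'founder_id': 'F001', 'provider_id': 'S008', 'Final Score': 78},
    {'founder_id': 'F001', 'provider_id': 'S009', 'Final Score': 81},
    {'founder_id': 'F002', 'provider_id': 'S002', 'Final Score': 55},
    {'founder_id': 'F003', 'provider_id': 'S001', 'Final Score': 56},
]

# Founder side: best providers for F001
best_for_f001 = [r['provider_id'] for r in top_matches(rows, 'founder_id', 'F001', top_n=2)]

# Provider side: best founders for S001
best_for_s001 = [r['founder_id'] for r in top_matches(rows, 'provider_id', 'S001', top_n=2)]

print(best_for_f001, best_for_s001)
```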

Testing the two-sided recommendation functions

In [15]:
# Test 1: A founder looking for help
founder_id_to_test = 'F001'
recommendations_for_founder = get_top_providers_for_founder(founder_id_to_test)
print(f" Top 3 Recommendations for Founder {founder_id_to_test}:")
display(recommendations_for_founder)

# Test 2: A provider looking for projects
provider_id_to_test = 'S001'
recommendations_for_provider = get_top_founders_for_provider(provider_id_to_test)
print(f"\n Top 3 Matching Projects for Provider {provider_id_to_test}:")
display(recommendations_for_provider)

 Top 3 Recommendations for Founder F001:


| | founder_id | provider_id | Skill Score | Project Type Score | Industry Score | Timeline Score | Final Score |
|---|---|---|---|---|---|---|---|
| 8 | F001 | S009 | 100 | 24 | 100 | 100 | 81 |
| 20 | F001 | S021 | 100 | 24 | 100 | 100 | 81 |
| 14 | F001 | S015 | 100 | 16 | 100 | 100 | 79 |



 Top 3 Matching Projects for Provider S001:


| | founder_id | provider_id | Skill Score | Project Type Score | Industry Score | Timeline Score | Final Score |
|---|---|---|---|---|---|---|---|
| 450 | F010 | S001 | 7 | 46 | 100 | 100 | 63 |
| 100 | F003 | S001 | 100 | 46 | 0 | 100 | 56 |
| 700 | F015 | S001 | 16 | 10 | 100 | 100 | 56 |


***Final output***

Here, we generate the output for 5 sample founders and 5 sample providers:
- The first table, 'Recommendations for Founders', includes Founder ID, Recommended Provider, Match Score, and Reason for Match.
- The second table, 'Matching Projects for Service Providers', includes Provider ID, Recommended Founder, Match Score, and Reason for Match.



In [16]:
# 5 sample founders and 5 providers
sample_founder_ids = founders_df['user_id'].head(5).tolist()
sample_provider_ids = providers_df['user_id'].head(5).tolist()

# two separate lists to hold the results
founder_summary_list = []
provider_summary_list = []

# Generating recommendations for founders
for founder_id in sample_founder_ids:
  recs = get_top_providers_for_founder(founder_id)
  for _, row in recs.iterrows():
    # Build a templated reason string; the wording is fixed and only the scores are filled in
    reason = f"Excellent skill ({row['Skill Score']}) and project type ({row['Project Type Score']}) alignment."
    if row['Industry Score'] > 80:
      reason += " Strong industry match."

    founder_summary_list.append({
        'Founder ID': founder_id,
        'Recommended Provider': row['provider_id'],
        'Match Score': row['Final Score'],
        'Reason for Match': reason
    })

# Generating recommendations for providers
for provider_id in sample_provider_ids:
  recs = get_top_founders_for_provider(provider_id)
  for _, row in recs.iterrows():
    # Same templated approach: fixed wording with the scores filled in
    reason = f"Strong fit for your core skill ({row['Skill Score']}) and expertise ({row['Project Type Score']})."
    if row['Timeline Score'] == 100:
      reason += " Timeline is a perfect fit."

    provider_summary_list.append({
        'Provider ID': provider_id,
        'Recommended Founder': row['founder_id'],
        'Match Score': row['Final Score'],
        'Reason for Match': reason
    })


founder_summary_df = pd.DataFrame(founder_summary_list)
provider_summary_df = pd.DataFrame(provider_summary_list)

print("✅ Recommendations for Founders")
display(founder_summary_df)

print("\n✅ Matching Projects for Service Providers")
display(provider_summary_df)

✅ Recommendations for Founders


| | Founder ID | Recommended Provider | Match Score | Reason for Match |
|---|---|---|---|---|
| 0 | F001 | S009 | 81 | Excellent skill (100) and project type (24) al... |
| 1 | F001 | S021 | 81 | Excellent skill (100) and project type (24) al... |
| 2 | F001 | S015 | 79 | Excellent skill (100) and project type (16) al... |
| 3 | F002 | S016 | 78 | Excellent skill (100) and project type (13) al... |
| 4 | F002 | S036 | 62 | Excellent skill (17) and project type (30) ali... |
| 5 | F002 | S039 | 60 | Excellent skill (32) and project type (8) alig... |
| 6 | F003 | S011 | 78 | Excellent skill (100) and project type (13) al... |
| 7 | F003 | S044 | 62 | Excellent skill (16) and project type (30) ali... |
| 8 | F003 | S035 | 62 | Excellent skill (2) and project type (46) alig... |
| 9 | F004 | S022 | 88 | Excellent skill (100) and project type (52) al... |



✅ Matching Projects for Service Providers


| | Provider ID | Recommended Founder | Match Score | Reason for Match |
|---|---|---|---|---|
| 0 | S001 | F010 | 63 | Strong fit for your core skill (7) and experti... |
| 1 | S001 | F003 | 56 | Strong fit for your core skill (100) and exper... |
| 2 | S001 | F015 | 56 | Strong fit for your core skill (16) and expert... |
| 3 | S002 | F028 | 66 | Strong fit for your core skill (20) and expert... |
| 4 | S002 | F006 | 59 | Strong fit for your core skill (27) and expert... |
| 5 | S002 | F017 | 57 | Strong fit for your core skill (21) and expert... |
| 6 | S003 | F033 | 78 | Strong fit for your core skill (100) and exper... |
| 7 | S003 | F025 | 62 | Strong fit for your core skill (32) and expert... |
| 8 | S003 | F005 | 59 | Strong fit for your core skill (23) and expert... |
| 9 | S004 | F021 | 76 | Strong fit for your core skill (100) and exper... |
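
The reason strings above use fixed wording, so a row with a skill score of 2 is still described as "Excellent skill (2)". A possible refinement is to pick the adjective from the score itself, as sketched below. The thresholds (80 and 50), the labels, and the function names `describe` and `build_reason` are all assumptions for illustration and are not part of the notebook.

```python
def describe(score):
    """Map a 0-100 score to a label (thresholds are hypothetical)."""
    if score >= 80:
        return 'Excellent'
    if score >= 50:
        return 'Moderate'
    return 'Weak'

def build_reason(skill_score, project_type_score):
    """Build a reason string whose wording reflects each component score."""
    skill_label = describe(skill_score)
    project_label = describe(project_type_score).lower()
    return (f"{skill_label} skill ({skill_score}) and "
            f"{project_label} project type ({project_type_score}) alignment.")

# Component scores taken from two rows of the founders table above
reason_f001_s009 = build_reason(100, 24)
reason_f003_s035 = build_reason(2, 46)

print(reason_f001_s009)
print(reason_f003_s035)
```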


***Heatmap of the match matrix***

(Brighter squares indicate stronger matches.)

In [17]:
import plotly.express as px

fig = px.imshow(match_matrix,
                labels=dict(x="Service Provider ID", y="Founder ID", color="Match Score"),
                title="Heatmap of Founder-Provider Match Scores")

fig.update_layout(
    width=1000,
    height=800
)


fig.show()