# Task 1: Get Matching Person Names

## Objective
Build a name-matching system that finds the most similar person names from a dataset when a user inputs a name.

The system returns:
- The **best matching name** with a similarity score
- A **ranked list of other similar names** with their similarity scores


## Approach Used

1. A dataset of 44 person names with deliberate spelling variants is prepared.
2. User input is normalized (trimmed and lowercased). Dataset names are compared as stored.
3. String similarity is computed using **RapidFuzz**.
4. The top `top_n` candidates (default 5) are ranked by similarity score.
5. Candidates below a configurable threshold (default 60) and exact matches of the input are dropped.
6. Output is returned in a structured, JSON-compatible format.

This approach is lightweight, fast, and works completely offline.


In [1]:
!pip install rapidfuzz

Collecting rapidfuzz
  Downloading rapidfuzz-3.14.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (12 kB)
Downloading rapidfuzz-3.14.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (3.2 MB)
Installing collected packages: rapidfuzz
Successfully installed rapidfuzz-3.14.3


## Dataset Preparation

The dataset contains **44 names**, including:
- Spelling variations
- Phonetic variations
- Full names
- Shortened names

This exceeds the minimum requirement of 30 names.


In [2]:
NAMES_DATABASE = [
    "Geetha", "Gita", "Gitu", "Geeta", "Getha", "Geethu", "Githa",
    "Suresh", "Suresh Kumar", "Suresha", "Suraj", "Suresh K", "Sures",
    "Ramesh", "Ramesha", "Ram", "Ramu", "Rameshwar", "Rames",
    "Anita", "Anitha", "Anitah", "Anita Sharma", "Anit",
    "Sunita", "Suneeta", "Sunitha", "Sunit",
    "Kiran", "Kiran Kumar", "Kiranmai", "Kiran K",
    "Rahul", "Rahil", "Raul", "Rahool", "Rahul Singh", "Rah",
    "Amit", "Amith", "Ameet", "Amit Kumar",
    "Priya", "Priyanka"
]
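
As a quick sanity check on the dataset, the sketch below counts the entries and looks for duplicates after applying the same trim-and-lowercase normalization used for user input. The list is copied from the cell above; `duplicates` is a name introduced here for illustration only.

```python
from collections import Counter

NAMES_DATABASE = [
    "Geetha", "Gita", "Gitu", "Geeta", "Getha", "Geethu", "Githa",
    "Suresh", "Suresh Kumar", "Suresha", "Suraj", "Suresh K", "Sures",
    "Ramesh", "Ramesha", "Ram", "Ramu", "Rameshwar", "Rames",
    "Anita", "Anitha", "Anitah", "Anita Sharma", "Anit",
    "Sunita", "Suneeta", "Sunitha", "Sunit",
    "Kiran", "Kiran Kumar", "Kiranmai", "Kiran K",
    "Rahul", "Rahil", "Raul", "Rahool", "Rahul Singh", "Rah",
    "Amit", "Amith", "Ameet", "Amit Kumar",
    "Priya", "Priyanka"
]

# Count each name after trimming and lowercasing.
counts = Counter(name.strip().lower() for name in NAMES_DATABASE)

# Names that occur more than once after normalization.
duplicates = sorted(name for name, count in counts.items() if count > 1)

print(len(NAMES_DATABASE))  # 44
print(duplicates)           # []
```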


## Why RapidFuzz?

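- Faster than fuzzywuzzy, since its core is implemented in C++
- Actively maintained
- Widely used for fuzzy string matching
- Handles typos, partial matches, and reordered words

We use the **WRatio** scorer, which combines plain, partial, and token-based ratios, weighted by how much the two strings differ in length. Since RapidFuzz 3.0 the scorers apply no preprocessing by default, so the comparison is case-sensitive unless a `processor` is supplied.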


In [3]:
from rapidfuzz import process, fuzz

def normalize_name(name: str) -> str:
    return name.strip().lower()


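def find_similar_names(
    user_input: str,
    names_list: list[str],
    top_n: int = 5,
    threshold: int = 60
) -> dict:
    """
    Find names in names_list that are similar to user_input.

    Only user_input is normalized; names_list entries are compared as
    stored, so scoring is case-sensitive on the dataset side. Entries equal
    to the input (ignoring case and surrounding whitespace) are excluded.
    Empty input and the no-match case return a dict with an explanatory key.
    """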

    # Handle empty input
    if not user_input or not user_input.strip():
        return {
            "error": "Input name cannot be empty",
            "best_match": None,
            "matches": []
        }

    normalized_input = normalize_name(user_input)

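    # Rank the top_n candidates by WRatio score (0 to 100).
    # No processor is passed, so names_list entries keep their original case.
    results = process.extract(
        normalized_input,
        names_list,
        scorer=fuzz.WRatio,
        limit=top_n
    )

    # Drop weak matches and the input itself; this can leave fewer than top_n.
    matches = [
        {"name": name, "similarity_score": round(score, 2)}
        for name, score, _ in results
        if score >= threshold and normalize_name(name) != normalized_input
    ]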


    # Handle no match case
    if not matches:
        return {
            "message": "No similar names found",
            "best_match": None,
            "matches": []
        }

    return {
        "best_match": matches[0],
        "matches": matches
    }


## Using the Name Matching Function

The function `find_similar_names`:
- Accepts a user-entered name
- Returns the closest matching name with a similarity score
- Returns a ranked list of similar names
- Handles empty input and no-match scenarios internally


In [4]:
user_input = "Geeta"
result = find_similar_names(user_input, NAMES_DATABASE)
result


{'best_match': {'name': 'Geetha', 'similarity_score': 72.73},
 'matches': [{'name': 'Geetha', 'similarity_score': 72.73},
  {'name': 'Suneeta', 'similarity_score': 66.67},
  {'name': 'Getha', 'similarity_score': 60.0},
  {'name': 'Ameet', 'similarity_score': 60.0}]}

### Output Structure

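- **best_match**: The highest-scoring entry of `matches`, or `None` when nothing qualifies
- **matches**: List of similar names sorted by descending score (0 to 100); a name identical to the input is excluded
- **error/message**: `error` is returned for empty input, `message` when no candidate passes the threshold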
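
Because the result can take three shapes, calling code needs to check for them in a fixed order. The helper below is a minimal sketch of that check; `describe_result` is a hypothetical name, and the sample dictionaries are copied from the outputs shown in this notebook.

```python
def describe_result(result: dict) -> str:
    """Turn a find_similar_names result into a one-line summary."""
    # Shape 1: invalid input.
    if "error" in result:
        return f"Error: {result['error']}"
    # Shape 2: valid input, but nothing passed the threshold.
    if result["best_match"] is None:
        return result.get("message", "No similar names found")
    # Shape 3: at least one match; best_match is the first entry of matches.
    best = result["best_match"]
    others = ", ".join(m["name"] for m in result["matches"][1:])
    return (f"Best match: {best['name']} ({best['similarity_score']}); "
            f"other matches: {others}")


geeta_result = {
    "best_match": {"name": "Geetha", "similarity_score": 72.73},
    "matches": [
        {"name": "Geetha", "similarity_score": 72.73},
        {"name": "Suneeta", "similarity_score": 66.67},
        {"name": "Getha", "similarity_score": 60.0},
        {"name": "Ameet", "similarity_score": 60.0},
    ],
}
empty_result = {"error": "Input name cannot be empty",
                "best_match": None, "matches": []}
no_match_result = {"message": "No similar names found",
                   "best_match": None, "matches": []}

for sample in (geeta_result, empty_result, no_match_result):
    print(describe_result(sample))
```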


In [5]:
find_similar_names("Sures", NAMES_DATABASE)


{'best_match': {'name': 'Suresh', 'similarity_score': 72.73},
 'matches': [{'name': 'Suresh', 'similarity_score': 72.73},
  {'name': 'Suresh Kumar', 'similarity_score': 72.0},
  {'name': 'Suresh K', 'similarity_score': 72.0},
  {'name': 'Suresha', 'similarity_score': 66.67}]}
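
Several of the scores above can be reproduced by hand. For short names of similar length, the reported values are consistent with the normalized Indel similarity, `100 * 2 * LCS / (len(a) + len(b))`, where LCS is the length of the longest common subsequence. The pure-Python sketch below is not RapidFuzz's implementation; `lcs_length` and `indel_similarity` are illustrative names. Note that the input is lowercased while dataset names keep their capital letter, so `g` and `G` do not match.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (case-sensitive)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_similarity(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0 to 100 scale."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # convention chosen for this sketch
    return 100.0 * 2 * lcs_length(a, b) / total


# "geeta" vs "Geetha": LCS is "eeta" (4), so 100 * 8 / 11
print(round(indel_similarity("geeta", "Geetha"), 2))   # 72.73
print(round(indel_similarity("geeta", "Suneeta"), 2))  # 66.67
print(round(indel_similarity("geeta", "Getha"), 2))    # 60.0
print(round(indel_similarity("sures", "Suresh"), 2))   # 72.73
```

The scores for longer entries such as `Suresh Kumar` involve WRatio's partial matching, which this sketch does not cover.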

In [6]:
find_similar_names("", NAMES_DATABASE)


{'error': 'Input name cannot be empty', 'best_match': None, 'matches': []}

In [7]:
find_similar_names("XyzUnknown", NAMES_DATABASE)


{'message': 'No similar names found', 'best_match': None, 'matches': []}

## Tabular Representation of Similarity Scores


In [8]:
import pandas as pd

output = find_similar_names("Geeta", NAMES_DATABASE)

pd.DataFrame(output["matches"])


|   | name | similarity_score |
|---|------|------------------|
| 0 | Geetha | 72.73 |
| 1 | Suneeta | 66.67 |
| 2 | Getha | 60.0 |
| 3 | Ameet | 60.0 |


## Design Decisions

- **RapidFuzz (WRatio)** was chosen because a single scorer covers typos, partial names, and reordered words
- Threshold filtering (default 60) avoids weak or irrelevant matches
- Structured dictionary output enables easy API integration
- Edge cases (empty input, no match) are handled inside the core logic for robustness


## Scalability & Future Improvements

- Can be scaled to large datasets by embedding names as vectors and searching them with a vector similarity index such as FAISS
- Can be exposed as a REST API using FastAPI
- Can be extended with phonetic algorithms (Soundex, Metaphone)
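
To illustrate the phonetic extension, here is a minimal sketch of a simplified American Soundex encoder in pure Python. It is not part of RapidFuzz, and `soundex` is a name introduced here; a production system would more likely rely on a maintained phonetics library. Names that sound alike receive the same four-character code, which could serve as a pre-filter before fuzzy scoring.

```python
def soundex(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    encoded = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            encoded += digit
        prev = digit  # vowels reset prev, so a repeated code is kept
    return (encoded + "000")[:4]


for name in ("Geetha", "Gita", "Githa", "Suresh", "Sures"):
    print(name, soundex(name))
```

With this encoding, `Geetha`, `Gita`, and `Githa` all map to `G300`, while `Suresh` and `Sures` both map to `S620`.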
