
In [13]:
# Import necessary libraries
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

# Step 1: Create a dataset of job roles and their associated skills
data = {
    'Job Role': [
        'Data Scientist',
        'Data Analyst',
        'Machine Learning Engineer',
        'Data Engineer',
        'AI Researcher',
        'Business Analyst',
        'NLP Engineer'
    ],
    'Skills': [
        'Python, R, SQL, Machine Learning',
        'SQL, Excel, Data Visualization',
        'Python, Machine Learning, Deep Learning',
        'SQL, Data Warehousing, ETL',
        'Python, R, Machine Learning, Statistics',
        'SQL, Excel, Business Intelligence',
        'Python, NLP, Machine Learning'
    ]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Step 2: Calculate cosine similarity
count_vectorizer = CountVectorizer()
count_matrix = count_vectorizer.fit_transform(df['Skills'])
cosine_sim = cosine_similarity(count_matrix)

# Step 3: Create a function to get recommendations
def get_recommendations(job_role):
    # Check if the job role exists in the DataFrame
    if job_role not in df['Job Role'].values:
        return f"Job role '{job_role}' not found in the dataset."

    # Get the index of the job role
    idx = df[df['Job Role'] == job_role].index[0]

    # Pair every other job role's index with its similarity to the input role
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]

    # Sort by similarity, highest first (sorted() is stable, so ties keep dataset order)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the top 3 similar job roles; the input role was already excluded above
    top_roles = [df['Job Role'][i] for i, _ in sim_scores[:3]]

    return top_roles

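# Step 4: Test the function
input_role = input("Enter the job role: ")
recommended_roles = get_recommendations(input_role)

# Output the recommendations, or the "not found" message for an unknown role
if isinstance(recommended_roles, str):
    print(recommended_roles)
else:
    print(f"Top 3 recommended roles for '{input_role}': {recommended_roles}")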

Enter the job role: Machine Learning Engineer
Top 3 recommended roles for 'Machine Learning Engineer': ['Data Scientist', 'AI Researcher', 'NLP Engineer']


**Approach to solve the problem**

Define the Objective:

The goal is to build a recommendation engine that, given a job role, suggests the roles whose required skills are most similar.

Data Collection:

Gather a dataset of job roles and their associated skills. The notebook uses a small hand-written set of seven roles; this dataset is the foundation of the recommendation engine.

Data Preparation:

Store the data in a pandas DataFrame with one row per job role and its skills string. The 'Job Role' and 'Skills' lists must have the same length, otherwise pd.DataFrame raises a ValueError.
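The length check can also be made explicit before the DataFrame is built. A minimal sketch, assuming a hypothetical helper `build_records` that is not part of the notebook:

```python
def build_records(job_roles, skills):
    """Pair each job role with its skills string, refusing mismatched inputs."""
    if len(job_roles) != len(skills):
        raise ValueError(
            f"Got {len(job_roles)} job roles but {len(skills)} skills entries."
        )
    return [
        {'Job Role': role, 'Skills': skill_text}
        for role, skill_text in zip(job_roles, skills)
    ]


# Three rows taken from the notebook's dataset
records = build_records(
    ['Data Scientist', 'Data Analyst', 'Data Engineer'],
    ['Python, R, SQL, Machine Learning',
     'SQL, Excel, Data Visualization',
     'SQL, Data Warehousing, ETL'],
)
print(records[0])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(records)`.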
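Feature Extraction:

Use CountVectorizer from scikit-learn to convert each skills string into a vector of token counts, giving the text a numerical form suitable for similarity calculations. With its default settings, CountVectorizer lowercases the text and keeps only word tokens of two or more characters, so "Machine Learning" becomes the two tokens "machine" and "learning", and the single-letter skill "R" is dropped.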
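To see how the skills strings are split into tokens, the sketch below imitates the step with the standard library only. The regular expression is the default `token_pattern` documented for CountVectorizer; `default_tokens` and `comma_tokens` are hypothetical helper names, and `comma_tokens` is an alternative that the notebook does not use.

```python
import re
from collections import Counter

# Default CountVectorizer token pattern: word tokens of two or more characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def default_tokens(text):
    """Imitate CountVectorizer's defaults: lowercase, then extract word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def comma_tokens(text):
    """Alternative (assumption): treat each comma-separated entry as one skill."""
    return [part.strip().lower() for part in text.split(',') if part.strip()]


skills = 'Python, R, SQL, Machine Learning'
word_counts = Counter(default_tokens(skills))
skill_counts = Counter(comma_tokens(skills))

print(sorted(word_counts))   # 'r' is missing; 'machine' and 'learning' are separate
print(sorted(skill_counts))  # 'r' and 'machine learning' are kept as whole skills
```

In scikit-learn, a function like `comma_tokens` could be supplied through CountVectorizer's `tokenizer` parameter. Doing so would change the similarity scores and possibly the recommendations.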
Calculate Cosine Similarity:

Compute the cosine similarity between the skill vectors with the cosine_similarity function. The result is a square, symmetric matrix in which entry (i, j) measures how similar job role i is to job role j; each diagonal entry is 1 because every role is identical to itself.
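As a check on what the similarity matrix contains, one entry can be computed by hand. The sketch below uses only the standard library and assumes CountVectorizer's default word tokenization (lowercased tokens of two or more characters); `tokens` and `cosine` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Same rule as CountVectorizer's default token pattern
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def cosine(a, b):
    """Cosine similarity of two token-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


data_scientist = Counter(tokens('Python, R, SQL, Machine Learning'))
ml_engineer = Counter(tokens('Python, Machine Learning, Deep Learning'))

similarity = cosine(data_scientist, ml_engineer)
print(round(similarity, 3))  # 4 / (2 * sqrt(7)), about 0.756
```

The token "learning" occurs twice in the Machine Learning Engineer row, so the dot product is 1 + 1 + 2 = 4, while the two vector norms are 2 and √7.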
Recommendation Function:

Implement a function (get_recommendations) that:
- Takes a job role as input.
- Checks that the job role exists in the dataset and returns a "not found" message if it does not.
- Looks up the row index of the job role and its row of similarity scores.
- Sorts the scores in descending order and returns the top 3 most similar job roles, excluding the input job role itself.
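The steps above can be sketched independently of pandas. This is a hedged variant rather than the notebook's function: the name `recommend`, the `top_n` parameter, the KeyError for unknown roles, and the toy similarity values are all assumptions made for illustration.

```python
def recommend(job_role, roles, similarity, top_n=3):
    """Return the top_n roles most similar to job_role, excluding job_role itself."""
    if job_role not in roles:
        raise KeyError(f"Job role '{job_role}' not found in the dataset.")
    idx = roles.index(job_role)

    # Drop the input role by index instead of assuming it sorts first
    candidates = [
        (score, role)
        for i, (role, score) in enumerate(zip(roles, similarity[idx]))
        if i != idx
    ]
    # list.sort is stable, so tied roles keep their dataset order
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [role for _, role in candidates[:top_n]]


# Toy similarity matrix (made-up values, not computed from the notebook's data)
roles = ['Role A', 'Role B', 'Role C', 'Role D']
similarity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]
top_two = recommend('Role A', roles, similarity, top_n=2)
print(top_two)  # ['Role C', 'Role D']
```

Filtering by index keeps the input role out of the results even when another role has an identical skill set and therefore also scores 1.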
User Interaction:

Prompt the user for a job role with input(), so recommendations can be requested interactively. The role name must match an entry in the dataset exactly, including capitalization.

Output Recommendations:

Print the top 3 recommended job roles for the given input.

**Reason for choosing cosine similarity**
Cosine similarity is chosen for measuring the similarity between job roles based on skills for several reasons:

Magnitude Independence: It depends only on the direction of the skill vectors, not on their length, which makes it effective for comparing job roles that list different numbers of skills.
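A small numeric check of this property, using made-up count vectors rather than the notebook's data: doubling every count leaves the cosine similarity at 1 (up to floating-point rounding), while the Euclidean distance between the two vectors is clearly non-zero. The helper name `cosine` is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


short_profile = [1, 1, 0, 1]   # each skill token counted once
long_profile = [2, 2, 0, 2]    # same skills, every count doubled

cos_sim = cosine(short_profile, long_profile)
euclidean = math.dist(short_profile, long_profile)

print(cos_sim, euclidean)
```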

High-Dimensional Data: It works well in sparse, high-dimensional spaces. Skill vectors are of this kind: the vocabulary contributes one dimension per token, and each job role uses only a few of them.

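Interpretability: In general the similarity score ranges from -1 to 1. Token counts are never negative, so here it ranges from 0 (no shared tokens) to 1 (same direction), which makes the relationship between job roles easy to read.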
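For reference, the score for two skill vectors x and y can be written as follows. The symbols x, y, and θ are notation introduced here, not taken from the notebook. Because token counts are non-negative, the numerator cannot be negative, and the Cauchy-Schwarz inequality bounds the ratio by 1.

```latex
\[
\cos(\theta) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
             = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}},
\qquad
x_i \ge 0,\; y_i \ge 0 \;\Longrightarrow\; 0 \le \cos(\theta) \le 1 .
\]
```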

Common Usage: It is widely used in text analysis and recommendation systems, making it a well-established choice.

Efficiency: It reduces to a dot product of length-normalized vectors, which is cheap to compute on sparse data, although the full pairwise matrix grows quadratically with the number of job roles.

Overall, cosine similarity provides a clear, robust, and efficient way to assess the similarity of job roles based on their skill sets.