# Importing Necessary Libraries

### 1. `pandas`:
- **Code**: `import pandas as pd`
- **Purpose**: `pandas` is a powerful library for data manipulation and analysis. It provides data structures such as DataFrames and Series that are essential for working with structured data.

### 2. `TfidfVectorizer` from `sklearn.feature_extraction.text`:
- **Code**: `from sklearn.feature_extraction.text import TfidfVectorizer`
- **Purpose**:
  - Converts textual data into numerical feature vectors using the **TF-IDF** (Term Frequency-Inverse Document Frequency) method.
  - It quantifies the importance of words in documents by considering their frequency and uniqueness in a dataset.
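
To make the weighting concrete, here is a minimal sketch that fits `TfidfVectorizer` on a three-document toy corpus. The corpus and variable names are invented for illustration and are not taken from the job dataset. With scikit-learn's default settings, a term that appears in every document should receive a lower weight than a term that appears in only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical, for illustration only)
docs = [
    "python developer python",
    "java developer",
    "marketing developer",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)   # sparse matrix, one row per document

vocab = vectorizer.vocabulary_            # maps each term to its column index
row = matrix[1].toarray().ravel()         # dense TF-IDF vector for "java developer"

# "developer" occurs in all three documents and "java" in only one,
# so "java" gets the larger weight within the same document.
weight_java = row[vocab["java"]]
weight_developer = row[vocab["developer"]]
print(f"java={weight_java:.3f}, developer={weight_developer:.3f}")
```

The matrix has one column per distinct term, so this toy corpus yields four columns.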

### 3. `cosine_similarity` from `sklearn.metrics.pairwise`:
- **Code**: `from sklearn.metrics.pairwise import cosine_similarity`
- **Purpose**:
  - Measures the similarity between two vectors based on the cosine of the angle between them.
  - Often used to compute similarity between text data converted to vectors (e.g., using `TfidfVectorizer`).
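
As a small worked example, the sketch below compares `cosine_similarity` with a hand-written version of the formula cos(θ) = (u · v) / (‖u‖ ‖v‖). The vectors and the helper `manual_cosine` are made up for illustration. Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1.0, 2.0, 0.0]])
b = np.array([[2.0, 4.0, 0.0]])   # same direction as a, twice the length
c = np.array([[0.0, 0.0, 3.0]])   # orthogonal to a

def manual_cosine(u, v):
    """Hypothetical helper: dot product divided by the product of the norms."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine_similarity(a, b)[0, 0]   # close to 1.0
sim_ac = cosine_similarity(a, c)[0, 0]   # close to 0.0
print(sim_ab, sim_ac, manual_cosine(a, b))
```

Because vector length does not matter, a short user profile can still match a long job description that uses the same vocabulary.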

### 4. `LabelEncoder` from `sklearn.preprocessing`:
- **Code**: `from sklearn.preprocessing import LabelEncoder`
- **Purpose**:
  - Encodes categorical labels into numerical form.
  - Useful for preprocessing non-numeric labels (e.g., text categories) into machine-learning-friendly numeric values.
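
The cells shown in this notebook import `LabelEncoder` without calling it, so the following is only an illustrative sketch of how it behaves. The input list reuses a few `Work Type` values that appear in the dataset preview. Everything else is an assumption for demonstration.

```python
from sklearn.preprocessing import LabelEncoder

# A few Work Type values as they appear in the dataset preview
work_types = ["Intern", "Full-Time", "Temporary", "Intern"]

encoder = LabelEncoder()
codes = encoder.fit_transform(work_types)

# Classes are stored in sorted order, so codes follow that order
print(list(encoder.classes_))   # ['Full-Time', 'Intern', 'Temporary']
print(list(codes))              # [1, 0, 2, 1]

# The mapping is reversible
decoded = encoder.inverse_transform(codes)
```

The integer codes imply no meaningful ordering, which is why `LabelEncoder` is generally intended for target labels rather than input features.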

In [20]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import LabelEncoder


# Loading and Displaying a Job Dataset

In [21]:
# Load the job dataset
df = pd.read_csv('job_descriptions2.csv')

# Display the first few rows of the dataset
df.head()




Unnamed: 0,Experience,Qualifications,Salary Range,location,Country,Work Type,Preference,Job Title,Role,Job Description,skills,Responsibilities,Company
0,5 to 15 Years,M.Tech,$59K-$99K,Douglas,Isle of Man,Intern,Female,Digital Marketing Specialist,Social Media Manager,Social Media Managers oversee an organizations...,"Social media platforms (e.g., Facebook, Twitte...","Manage and grow social media accounts, create ...",Icahn Enterprises
1,2 to 12 Years,BCA,$56K-$116K,Ashgabat,Turkmenistan,Intern,Female,Web Developer,Frontend Web Developer,Frontend Web Developers design and implement u...,"HTML, CSS, JavaScript Frontend frameworks (e.g...","Design and code user interfaces for websites, ...",PNC Financial Services Group
2,0 to 12 Years,PhD,$61K-$104K,Macao,"Macao SAR, China",Temporary,Male,Operations Manager,Quality Control Manager,Quality Control Managers establish and enforce...,Quality control processes and methodologies St...,Establish and enforce quality control standard...,United Services Automobile Assn.
3,4 to 11 Years,PhD,$65K-$91K,Porto-Novo,Benin,Full-Time,Female,Network Engineer,Wireless Network Engineer,"Wireless Network Engineers design, implement, ...",Wireless network design and architecture Wi-Fi...,"Design, configure, and optimize wireless netwo...",Hess
4,1 to 12 Years,MBA,$64K-$87K,Santiago,Chile,Intern,Female,Event Manager,Conference Manager,A Conference Manager coordinates and manages c...,Event planning Conference logistics Budget man...,Specialize in conference and convention planni...,Cairn Energy


# Handling Missing Values and Combining Text Columns

In [22]:
# Fill missing values, copying the title and description into snake_case columns
df['job_title'] = df['Job Title'].fillna('')
df['job_description'] = df['Job Description'].fillna('')
df['skills'] = df['skills'].fillna('')

# Combine relevant text columns
df['combined_text'] = df['job_title'] + ' ' + df['job_description'] + ' ' + df['skills']

# Display the combined column
df['combined_text'].head()


Unnamed: 0,combined_text
0,Digital Marketing Specialist Social Media Mana...
1,Web Developer Frontend Web Developers design a...
2,Operations Manager Quality Control Managers es...
3,Network Engineer Wireless Network Engineers de...
4,Event Manager A Conference Manager coordinates...
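
The `fillna('')` step matters because string concatenation in pandas propagates missing values. The sketch below shows this on a two-row toy DataFrame whose column names and values are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Web Developer", "Event Manager"],
    "skills": ["HTML, CSS", None],   # second row has no skills recorded
})

# Without fillna, the missing value propagates into the combined string
raw = toy["title"] + " " + toy["skills"]

# With fillna(''), every row yields a usable string
clean = toy["title"].fillna("") + " " + toy["skills"].fillna("")

print(raw.isna().tolist())   # the second entry is missing
print(clean.tolist())
```

A missing combined string would make `TfidfVectorizer.fit_transform` fail or silently lose a row's text, so filling before concatenating is the safer order.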


# Initializing and Applying TF-IDF Vectorization

In [23]:
# Initialize TF-IDF Vectorizer
tfidf = TfidfVectorizer(stop_words='english', max_features=5000)

# Fit and transform the combined text column
tfidf_matrix = tfidf.fit_transform(df['combined_text'])

# Check the shape of the TF-IDF matrix
tfidf_matrix.shape


(233905, 1928)
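
The matrix has 1928 columns even though `max_features=5000` was requested. This is consistent with `max_features` being an upper bound on the vocabulary size rather than a target. A minimal sketch on an invented toy corpus illustrates the behavior:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical) with 8 distinct terms
docs = ["web developer html css", "event manager budget", "web designer css"]

uncapped = TfidfVectorizer().fit(docs)
capped = TfidfVectorizer(max_features=3).fit(docs)      # keeps the 3 most frequent terms
roomy = TfidfVectorizer(max_features=5000).fit(docs)    # cap larger than the vocabulary

print(len(uncapped.vocabulary_), len(capped.vocabulary_), len(roomy.vocabulary_))
```

When the cap exceeds the number of distinct terms, as in the `roomy` case, the vocabulary is simply left at its natural size.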

In [24]:
# Example user profile
# user_profile = {
#     'job_title': 'Data Scientist',
#     'skills': 'machine learning, data analysis, python',
#     'location': 'Remote'
# }

user_profile = {
    'job_title': 'Frontend Web Developer',
    'skills': 'HTML, CSS, JavaScript Frontend frameworks',
    'location': 'Remote'
}

# Combine user profile information into a single text
user_text = user_profile['job_title'] + ' ' + user_profile['skills']

# Vectorize the user profile
user_tfidf = tfidf.transform([user_text])


# Recommending Jobs Using Cosine Similarity

In [25]:
# Calculate cosine similarity between user profile and job listings
cosine_sim = cosine_similarity(user_tfidf, tfidf_matrix).flatten()

# Get the top 10 most similar jobs
top_job_indices = cosine_sim.argsort()[-10:][::-1]

# Display the top recommended jobs
recommended_jobs = df.iloc[top_job_indices]
recommended_jobs[['job_title', 'location', 'skills', 'job_description']]


Unnamed: 0,job_title,location,skills,job_description
47059,Web Developer,Apia,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
8589,Web Developer,Brazzaville,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
42224,Web Developer,Madrid,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
49853,Web Developer,Managua,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
36894,Web Developer,The City of Hamilton,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
22410,Web Developer,Paramaribo,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
6088,Web Developer,Georgetown,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
24308,Web Developer,Oranjestad,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
1204,Web Developer,Manama,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
21191,Web Developer,Sri Jayawardenepura Kotte,"HTML, CSS, JavaScript Frontend frameworks (e.g...",Frontend Web Developers design and implement u...
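
The top-10 selection relies on `argsort` returning indices in ascending order of score. The sketch below walks through the same slicing on a small invented score array and shows `np.argpartition` as one possible alternative that avoids sorting the whole array, which may help on large datasets.

```python
import numpy as np

# Hypothetical similarity scores for six listings
scores = np.array([0.10, 0.85, 0.40, 0.95, 0.05, 0.60])
top_n = 3

# argsort is ascending, so take the last top_n positions and reverse them
top_idx = scores.argsort()[-top_n:][::-1]

# Alternative: partition first, then sort only the top_n candidates
part = np.argpartition(scores, -top_n)[-top_n:]
top_idx_fast = part[np.argsort(scores[part])[::-1]]

print(top_idx.tolist())        # [3, 1, 5]
print(top_idx_fast.tolist())   # [3, 1, 5]
```

When several listings share the same score, which is plausible for listings with near-identical text, their relative order in the result is not meaningful.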


# Job Recommendation System Using Cosine Similarity

## Overview
This script recommends jobs based on a user's profile. It builds a query from the profile's job title and skills and uses **cosine similarity** to rank the job listings in the dataset against that query. The location and preference fields are collected but do not affect the ranking.

In [26]:
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from tabulate import tabulate

def recommend_jobs(user_profile, df, tfidf, top_n=10):
    """
    Recommend jobs based on user profile using cosine similarity.

    Args:
        user_profile (dict): A dictionary containing the user's job title and skills.
        df (DataFrame): Job postings DataFrame with a 'combined_text' column.
        tfidf (TfidfVectorizer): TF-IDF vectorizer fitted on df['combined_text'].
        top_n (int): Number of top recommendations to return.

    Returns:
        DataFrame: Top recommended jobs, most similar first.
    """
    # Combine user profile information
    user_text = user_profile['job_title'] + ' ' + user_profile['skills']

    # Vectorize the user profile text
    user_tfidf = tfidf.transform([user_text])

    # Vectorize the same combined text the vectorizer was fitted on, so that
    # the title and skills of each listing contribute to the match
    tfidf_matrix = tfidf.transform(df['combined_text'])
    cosine_sim = cosine_similarity(user_tfidf, tfidf_matrix).flatten()

    # Get top N most similar jobs
    top_job_indices = cosine_sim.argsort()[-top_n:][::-1]

    # Select relevant columns for display
    display_columns = ['job_title', 'location', 'Company', 'skills',
                       'job_description', 'Qualifications', 'Experience', 'Salary Range']
    recommended_jobs = df.iloc[top_job_indices][display_columns]

    # Reset index for better readability (returns a new DataFrame)
    return recommended_jobs.reset_index(drop=True)

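# Collect the user profile interactively
# (location and preference are stored in the profile but not used by recommend_jobs)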
print("Enter your details to get job recommendations:")

user_job_title = input("Enter your desired job title (e.g., Data Scientist): ")
user_skills = input("Enter your skills, separated by commas (e.g., python, machine learning, data analysis): ")
user_location = input("Enter your preferred location (e.g., Remote, On-site, Hybrid): ")
user_Preference = input("Enter your preference (e.g., Male, Female): ")

# Create user profile dictionary
user_profile = {
    'job_title': user_job_title.strip(),
    'skills': user_skills.strip(),
    'location': user_location.strip(),
    'Preference': user_Preference.strip()
}
# Get top 10 job recommendations for the user
recommended_jobs = recommend_jobs(user_profile, df, tfidf, top_n=10)

# Display recommendations in a tabulated format
print(tabulate(recommended_jobs, headers='keys', tablefmt='grid'))

# For Jupyter Notebook (optional)
# from IPython.display import display
# display(recommended_jobs.style.highlight_max(axis=0).set_caption("Top Recommended Jobs"))



Enter your details to get job recommendations:
Enter your desired job title (e.g., Data Scientist): data scientist
Enter your skills, separated by commas (e.g., python, machine learning, data analysis): python
Enter your preferred location (e.g., Remote, On-site, Hybrid): hybrid
Enter your preference (e.g., Male, Female): female
+----+--------------+----------------+----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------+----------------+
|    | job_title    | location       | Company                
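
The profile collects a location and a preference, but `recommend_jobs` ranks on title and skills only. One possible way to use those fields, sketched here under assumptions, is to filter the listings before ranking. The helper `filter_jobs` and the toy DataFrame are hypothetical and only reuse the column names `location` and `Preference` from the dataset.

```python
import pandas as pd

# Toy listings (hypothetical values, real column names)
jobs = pd.DataFrame({
    "job_title": ["Web Developer", "Web Developer", "Event Manager"],
    "location": ["Madrid", "Apia", "Santiago"],
    "Preference": ["Female", "Male", "Female"],
})

def filter_jobs(jobs, location=None, preference=None):
    """Keep rows matching the given location and/or preference (case-insensitive)."""
    mask = pd.Series(True, index=jobs.index)
    if location:
        mask = mask & (jobs["location"].str.lower() == location.lower())
    if preference:
        mask = mask & (jobs["Preference"].str.lower() == preference.lower())
    return jobs[mask]

filtered = filter_jobs(jobs, preference="female")
print(filtered["job_title"].tolist())   # ['Web Developer', 'Event Manager']
```

Note that the dataset's `location` column holds city names, so a value such as "hybrid" would match nothing under this exact-match approach.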

# Finding Similar Jobs Using K-Nearest Neighbors (KNN)

## Overview
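This snippet uses scikit-learn's `NearestNeighbors`, the neighbor search behind **K-Nearest Neighbors (KNN)**, to find the jobs closest to a user's profile. With `metric='cosine'` it ranks by cosine distance, which equals 1 minus cosine similarity, so the nearest neighbors are the most similar listings.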
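
The relationship between the distances returned by `NearestNeighbors` and cosine similarity can be checked on a tiny example. The 2-D vectors below are invented for illustration; the real input in this notebook is the sparse TF-IDF matrix.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Three hypothetical item vectors and one query vector
items = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
])
query = np.array([[2.0, 0.1]])

nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(items)
distances, indices = nn.kneighbors(query)

# Cosine distance is 1 - cosine similarity, so smaller means more similar
similarities = cosine_similarity(query, items).ravel()
print(indices[0].tolist())                        # [0, 1]
print(distances[0], 1 - similarities[indices[0]])
```

The neighbors come back ordered from closest to farthest, so `indices[0]` can be passed directly to `df.iloc` to retrieve the matching rows.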


In [27]:
from sklearn.neighbors import NearestNeighbors

# Initialize the Nearest Neighbors model
knn = NearestNeighbors(n_neighbors=10, metric='cosine')

# Fit the model on the TF-IDF matrix of the combined job text
knn.fit(tfidf_matrix)

# Find top 10 nearest neighbors
distances, indices = knn.kneighbors(user_tfidf)

In [35]:
from tabulate import tabulate

# Get the top 10 recommended jobs based on indices
recommended_jobs = df.iloc[indices[0]]

# Prepare data for tabular format
jobs_data = recommended_jobs[['job_title', 'location', 'skills', 'job_description']].values.tolist()

# Print the result in tabular format
print("Top recommended jobs for the user:")
print(tabulate(jobs_data, headers=['Job Title', 'Location', 'Skills', 'Job Description'], tablefmt='grid'))



Top recommended jobs for the user:
+---------------+--------------------+---------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Job Title     | Location           | Skills                                                                                | Job Description                                                                                                                                                                                                                         |
| Web Developer | Bratislava         | HTML, CSS, JavaScript Frontend frameworks (e.g., React, Angular) User experience (UX) | Frontend Web Developers design and implement user interfaces for websites, ensuring they are visually appealing and 

# Preparing for Job Classification Model

## Overview
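This snippet lays the foundation for building and evaluating a machine learning model that classifies job listings as relevant or not relevant. It covers:
- Importing the classifier (Logistic Regression), the train/test splitting utility, and the evaluation metrics.
- Assigning random placeholder `relevant` labels to the listings.
- Building TF-IDF features and splitting the data into training and test sets.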

In [29]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report


In [30]:
import numpy as np

# Randomly assign 1 (relevant) or 0 (not relevant) as placeholder labels;
# these labels are independent of the job text
np.random.seed(42)  # for reproducibility
df['relevant'] = np.random.choice([0, 1], size=len(df))


In [31]:
df.head()

Unnamed: 0,Experience,Qualifications,Salary Range,location,Country,Work Type,Preference,Job Title,Role,Job Description,skills,Responsibilities,Company,job_title,job_description,combined_text,relevant
0,5 to 15 Years,M.Tech,$59K-$99K,Douglas,Isle of Man,Intern,Female,Digital Marketing Specialist,Social Media Manager,Social Media Managers oversee an organizations...,"Social media platforms (e.g., Facebook, Twitte...","Manage and grow social media accounts, create ...",Icahn Enterprises,Digital Marketing Specialist,Social Media Managers oversee an organizations...,Digital Marketing Specialist Social Media Mana...,0
1,2 to 12 Years,BCA,$56K-$116K,Ashgabat,Turkmenistan,Intern,Female,Web Developer,Frontend Web Developer,Frontend Web Developers design and implement u...,"HTML, CSS, JavaScript Frontend frameworks (e.g...","Design and code user interfaces for websites, ...",PNC Financial Services Group,Web Developer,Frontend Web Developers design and implement u...,Web Developer Frontend Web Developers design a...,1
2,0 to 12 Years,PhD,$61K-$104K,Macao,"Macao SAR, China",Temporary,Male,Operations Manager,Quality Control Manager,Quality Control Managers establish and enforce...,Quality control processes and methodologies St...,Establish and enforce quality control standard...,United Services Automobile Assn.,Operations Manager,Quality Control Managers establish and enforce...,Operations Manager Quality Control Managers es...,0
3,4 to 11 Years,PhD,$65K-$91K,Porto-Novo,Benin,Full-Time,Female,Network Engineer,Wireless Network Engineer,"Wireless Network Engineers design, implement, ...",Wireless network design and architecture Wi-Fi...,"Design, configure, and optimize wireless netwo...",Hess,Network Engineer,"Wireless Network Engineers design, implement, ...",Network Engineer Wireless Network Engineers de...,0
4,1 to 12 Years,MBA,$64K-$87K,Santiago,Chile,Intern,Female,Event Manager,Conference Manager,A Conference Manager coordinates and manages c...,Event planning Conference logistics Budget man...,Specialize in conference and convention planni...,Cairn Energy,Event Manager,A Conference Manager coordinates and manages c...,Event Manager A Conference Manager coordinates...,0


In [32]:
# Initialize TF-IDF Vectorizer
tfidf = TfidfVectorizer(stop_words='english', max_features=5000)

# Fit and transform the combined text column to get features
X = tfidf.fit_transform(df['combined_text'])

# Define target variable
y = df['relevant']


In [33]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Logistic Regression Model Implementation

## Overview
This snippet demonstrates how to train, predict, and evaluate a **Logistic Regression** model using the training and testing data.


In [34]:
# Initialize and train Logistic Regression model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Predict on test data
y_pred_log_reg = log_reg.predict(X_test)

# Evaluate Logistic Regression model
print("Logistic Regression Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_log_reg))
print(classification_report(y_test, y_pred_log_reg))


Logistic Regression Performance:
Accuracy: 0.4964194865436823
              precision    recall  f1-score   support

           0       0.50      0.89      0.64     23379
           1       0.48      0.11      0.17     23402

    accuracy                           0.50     46781
   macro avg       0.49      0.50      0.41     46781
weighted avg       0.49      0.50      0.41     46781
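
An accuracy of about 0.50 is what one would expect here, because the `relevant` labels were drawn at random and carry no information about the job text. A quick way to confirm that a score is at chance level is to compare it with a trivial baseline. The sketch below does this on synthetic data, using `DummyClassifier` as the baseline. The feature matrix, its size, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 20))      # synthetic features (hypothetical)
y = rng.integers(0, 2, size=2000)    # labels drawn independently of X

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, model.predict(X_test))
print(f"baseline={baseline_acc:.3f}, logistic regression={model_acc:.3f}")
```

If a trained model does not clearly beat such a baseline, the labels or features most likely contain no usable signal, and tuning the model will not help.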



# Conclusion

This project combined TF-IDF vectorization, cosine similarity, nearest-neighbor search, and logistic regression to match candidate profiles with job listings. The TF-IDF vectorizer converts job text and candidate profiles into numerical vectors, so that the similarity between them can be computed. For a frontend web developer profile, both the cosine-similarity ranking and the nearest-neighbor search returned Web Developer listings.

The logistic regression experiment, however, was trained on randomly assigned `relevant` labels and reached an accuracy of about 0.50, which is chance level for two balanced classes. It demonstrates the training and evaluation workflow rather than a working suitability classifier.

The system can be further refined by collecting real relevance labels, making use of the location and preference fields, and incorporating additional features and data sources to improve recommendation accuracy and scalability.