##Downloading Required Libraries

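spaCy library. Install the pinned spaCy version first, then download the models, so that the downloaded models match the installed version.

In [None]:
!pip install spacy==2.3.1

In [None]:
!python -m spacy download en_core_web_md

In [None]:
!pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz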

The pyresparser library parses a resume and extracts fields such as name, skills, and designation.

In [None]:
!pip install pyresparser

##Importing Libraries:

In [None]:
import pandas as pd
import spacy
import numpy as np
import random
import nltk
nltk.download('stopwords')

# Load the spaCy model
nlp = spacy.load("en_core_web_md")

Importing Internship Dataset

In [None]:
df = pd.read_csv('/content/recomm_df.csv')

In [None]:
df.head()

Unnamed: 0,id,href,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href.1
0,1,http://letsintern.com/internship/human-resourc...,hr executive - recruitment,engenia technologies,gurgaon,we are seeking a hr recruiter w...,human resources recruiter,paid,2019-03-02,2019-08-28,hr practices,http://letsintern.com/internship/Human-Resourc...
1,2,http://letsintern.com/internship/tele-sales-ex...,telecalling & lead generation,abalone technologies pvt ltd,noida,selected intern's day-to-day re...,tele sales executive,paid,2019-02-17,2019-08-30,office administration,http://letsintern.com/internship/Tele-Sales-Ex...
2,3,http://letsintern.com/internship/marketing-pro...,digital marketing internship,brandstory digital marketing company,bangalore,are you looking for digital mar...,marketing professional,paid,2018-12-25,2020-04-29,digital marketing,http://letsintern.com/internship/Marketing-Pro...
3,4,http://letsintern.com/internship/accountant-in...,recruitment of corporate bank back office post,bandhan pvt.ltd,"kathua,barasat,bardhaman,bongoan,habra",huge opportunity in corporate b...,accountant,paid,2019-03-12,,analytical skills,http://letsintern.com/internship/Accountant-in...
4,5,http://letsintern.com/internship/software-deve...,software developer,trippyigloo,bangalore,we are looking for interns who ...,software developer : python,paid,2019-01-30,2019-06-20,"go(golang),java,mongodb,ngin...",http://letsintern.com/internship/Software-Deve...


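Inspecting column types and non-null counts. Note that `details` has one missing value and `end` is missing for 215 of the 626 rows.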

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 626 entries, 0 to 625
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            626 non-null    int64  
 1   href          626 non-null    object 
 2   job_title     626 non-null    object 
 3   company_name  626 non-null    object 
 4   job_loc       626 non-null    object 
 5   details       625 non-null    object 
 6   category      626 non-null    object 
 7   compensation  626 non-null    object 
 8   start         626 non-null    object 
 9   end           411 non-null    object 
 10  skills        626 non-null    object 
 11  href.1        626 non-null    object 
 12  similarity    626 non-null    float64
dtypes: float64(1), int64(1), object(11)
memory usage: 63.7+ KB


Importing Pyresparser library.

In [None]:
from pyresparser import ResumeParser
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

In [None]:
data = ResumeParser("/content/Black and White Corporate Resume.pdf").get_extracted_data()

###Extracting information from resume.



In [None]:
data

{'name': 'O +123',
 'email': 'hello@reallygreatsite.com',
 'mobile_number': '123-456-7890',
 'skills': ['International',
  'P',
  'Reporting',
  'Operations',
  'Administration',
  'Sales',
  'Alliances',
  'Communication',
  'R',
  'Video',
  'Negotiation'],
 'college_name': None,
 'degree': None,
 'designation': ['Business Development', 'Operations Manager'],
 'experience': ['October 2019 - Present'],
 'company_names': ['Ginyard International Co.', 'PROFESSIONAL EXPERIENCE'],
 'no_of_pages': 2,
 'total_experience': 4.75}

###Extracting Skills and Designation

In [None]:
user_skills = data['skills']
user_deets = data['designation']

In [None]:
user_details = ' '.join([str(elem) for elem in user_deets])
user_details += ' for ' + str(data['total_experience']) + ' years at ' + ' '.join([str(elem) for elem in data['company_names']])

Storing the user's designation, total experience, and company names in a single string.

In [None]:
user_details

'Business Development Operations Manager for 4.75 years at Ginyard International Co. PROFESSIONAL EXPERIENCE'
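The parsed resume above already returns `None` for `college_name` and `degree`, so other resumes may well return `None` for `designation` or `company_names`, in which case `' '.join(...)` would raise a `TypeError`. A minimal sketch of a more defensive version follows; the helper name `build_profile_text` is hypothetical and not part of pyresparser.

```python
def build_profile_text(data):
    """Join designation, experience and company names, skipping missing fields."""
    # `or []` turns a missing key or a None value into an empty list.
    designations = data.get("designation") or []
    companies = data.get("company_names") or []
    years = data.get("total_experience")

    parts = [" ".join(str(d) for d in designations)]
    if years is not None:
        parts.append(f"for {years} years")
    if companies:
        parts.append("at " + " ".join(str(c) for c in companies))
    # Drop empty fragments so the result has no stray spaces.
    return " ".join(p for p in parts if p)


# Sample dictionary mirroring the fields shown in the parser output above.
sample = {
    "designation": ["Business Development", "Operations Manager"],
    "company_names": ["Ginyard International Co.", "PROFESSIONAL EXPERIENCE"],
    "total_experience": 4.75,
}
print(build_profile_text(sample))
```

For the sample dictionary this reproduces the string shown above, and for a resume with no designation or company names it returns an empty string instead of failing.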

###Calculating Similarity Score using "User Details String"

Using spaCy's similarity() method, which computes the similarity between each job description and the user details string.

In [None]:
user_doc = nlp(user_details)

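# Compute the similarity between one job description and the user profile.
# Missing descriptions (NaN) are not strings and get a score of 0.0.
def compute_similarity(details, user_doc):
    if not isinstance(details, str):
        return 0.0
    detail_doc = nlp(details)
    return detail_doc.similarity(user_doc)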

# Apply the similarity function to the dataset
df['similarity'] = df['details'].apply(lambda x: compute_similarity(x, user_doc))

# Display the dataframe with similarity scores
print(df[['id', 'details', 'similarity']])


      id                                            details  similarity
0      1                 we are seeking a hr recruiter w...    0.811893
1      2                 selected intern's day-to-day re...    0.796353
2      3                 are you looking for digital mar...    0.803698
3      4                 huge opportunity in corporate b...    0.850605
4      5                 we are looking for interns who ...    0.817470
..   ...                                                ...         ...
621  649                 we are seeking full time sales ...    0.846103
622  650                 we are looking for hardworking ...    0.777054
623  651                 we are looking for enthusiastic...    0.855585
624  652                 agro2o® is new delhi based agri...    0.809940
625  653                 we are looking for a content wr...    0.746010

[626 rows x 3 columns]
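By default, spaCy's `Doc.similarity()` is the cosine similarity between the averaged word vectors of the two documents. The sketch below illustrates that computation in plain Python; the three-dimensional vectors are made up for illustration, and the helper names are hypothetical rather than spaCy functions.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up token vectors standing in for real word embeddings.
profile_tokens = [[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]]
job_tokens = [[2.0, 2.0, 2.0]]

score = cosine_similarity(mean_vector(profile_tokens), mean_vector(job_tokens))
print(round(score, 3))
```

Averaging over many tokens tends to pull document vectors toward each other, which may be why the scores above sit in a narrow band rather than spreading across the full range.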


Defining RL Hyperparameters.

In [None]:
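num_episodes = 1000    # number of training episodes
learning_rate = 0.1    # alpha: step size of each Q-value update
discount_factor = 0.9  # gamma: weight of the estimated future value
epsilon = 0.1          # probability of picking a random job (exploration)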

Defining Q-Table

In [None]:
num_jobs = len(df)
num_actions = num_jobs  # Each job is an action

# Q-table: a single state (one row) and one column per job
Q = np.zeros((1, num_actions))

Defining Reward Function

The reward for recommending a job combines skill overlap and the similarity score: it is the number of skills shared between the user's skills and the job's required skills, plus the job's similarity score.

In [None]:
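def reward_function(job_idx, user_skills, df):
    job = df.iloc[job_idx]
    # Job skills appear as one comma-separated, lower-case string: split on ','
    # and strip whitespace; lower-case the resume skills before comparing.
    job_skills = {s.strip().lower() for s in str(job['skills']).split(',')}
    wanted = {s.strip().lower() for s in user_skills}
    skill_match = len(job_skills & wanted)
    return job['similarity'] + skill_match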
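The overlap count depends on how the skill strings are tokenized. In the dataset output above, job skills appear comma-separated without a following space and in lower case, while the parsed resume skills are capitalized. The following sketch, using hypothetical helpers `normalize_skills` and `toy_reward` and made-up values, shows how the count changes once both sides are normalized.

```python
def normalize_skills(raw):
    """Return a set of trimmed, lower-cased skills from a string or a list."""
    if isinstance(raw, str):
        raw = raw.split(",")
    return {s.strip().lower() for s in raw if s and s.strip()}


def toy_reward(similarity, job_skills, user_skills):
    """Similarity score plus the number of skills the two sides share."""
    overlap = normalize_skills(job_skills) & normalize_skills(user_skills)
    return similarity + len(overlap)


# Made-up example in the same shape as the data shown earlier.
user_skills = ["Sales", "Reporting", "Communication"]
job_skills = "                  sales,communication,writing skills"

# Splitting on ', ' leaves the string in one piece, so nothing matches.
naive_overlap = len(set(job_skills.split(", ")) & set(user_skills))
print(naive_overlap)

# After normalization, 'sales' and 'communication' both match.
print(toy_reward(0.87, job_skills, user_skills))
```

Exact set intersection still treats "sales situation handling - basic" and "sales" as different skills; whether partial matches should count is a design choice this sketch leaves open.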

###Q-Learning Algorithm:

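Epsilon-greedy approach: with probability epsilon a random job is chosen (exploration); otherwise the job with the highest current Q-value is chosen (exploitation).

After each choice, the Q-value of the chosen job is updated from the observed reward.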
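For reference, the standard tabular Q-learning update, written with $\alpha$ for `learning_rate` and $\gamma$ for `discount_factor`, is:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

This environment has a single state, so the next state $s'$ is always the same as $s$, and the maximum is taken over the same row of the Q-table.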

In [None]:
for episode in range(num_episodes):
    state = 0  # Only one state in this simplified environment
    for _ in range(num_jobs):  # num_jobs updates per episode
        if random.random() < epsilon:
            action = random.randrange(num_actions)  # Explore: random job
        else:
            action = int(np.argmax(Q[state]))  # Exploit: best job so far

        reward = reward_function(action, user_skills, df)
        # Q-learning update; the next state is the same single state
        td_target = reward + discount_factor * np.max(Q[state])
        Q[state, action] += learning_rate * (td_target - Q[state, action])
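To see how this loop behaves, it can help to run the same update on a handful of actions with known rewards. The sketch below is a toy version in plain Python; the function name, the four reward values, and the reduced episode and step counts are assumptions made for illustration and are not taken from the dataset.

```python
import random


def run_single_state_q_learning(rewards, episodes=200, steps=20,
                                alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning with one state and fixed per-action rewards."""
    rng = random.Random(seed)  # seeded so repeated runs behave the same
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(rewards))  # explore
            else:
                action = max(range(len(q)), key=q.__getitem__)  # exploit
            # Single state: the "next state" value is the max over the same row.
            target = rewards[action] + gamma * max(q)
            q[action] += alpha * (target - q[action])
    return q


# Made-up rewards: similarity-like values plus a skill-match count for some.
toy_rewards = [0.80, 2.85, 0.75, 1.81]
q_values = run_single_state_q_learning(toy_rewards)
best = max(range(len(q_values)), key=q_values.__getitem__)
print(best, [round(v, 2) for v in q_values])
```

Because the reward of each action is fixed, the greedy action's value approaches r / (1 - gamma), and the ranking of Q-values ends up following the ranking of rewards. In this setting, sorting jobs by reward directly would give the same order; the Q-table becomes more useful once rewards are noisy or depend on user feedback.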

###Top 3 Recommended Internships with highest Q-Value:

In [None]:
recommended_jobs = np.argsort(Q[0])[-3:][::-1]  # Top 3 recommendations

###Displaying Top 3 Recommendations:

In [None]:
# Display recommended jobs
print("Recommended internships based on your profile:")
for job_idx in recommended_jobs:
    print(f"Job ID: {df.iloc[job_idx]['id']}, Job Title: {df.iloc[job_idx]['job_title']}, Similarity: {df.iloc[job_idx]['similarity']:.2f}, Skills: {df.iloc[job_idx]['skills']}")

Recommended internships based on your profile:
Job ID: 553, Job Title: sales associates, Similarity: 0.87, Skills:                   analytical skills,sales situation handling - basic
Job ID: 194, Job Title: operations executive, Similarity: 0.87, Skills:                   accounting,agreeableness
Job ID: 153, Job Title: marketing interns, Similarity: 0.87, Skills:                   marketing,sales situation handling - basic,writing skills


###Saving the Q-Table to a NumPy file.

In [None]:
np.save('q_table.npy', Q)