# Recommender System: Research notebook

## Setting up environment

In [2]:
!pip install virtualenv
!virtualenv rs-env
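# Each "!" command runs in its own subshell, so activating here does not change the
# kernel's environment; on Windows the activation script is rs-env\Scripts\activate
!source rs-env/bin/activate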

created virtual environment CPython3.9.7.final.0-64 in 12763ms
  creator CPython3Windows(dest=C:\Users\Erick\Projects\RecommenderSystem-LLM\notebooks\rs-env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=C:\Users\Erick\AppData\Local\pypa\virtualenv)
    added seed packages: pip==23.1.2, setuptools==67.7.2, wheel==0.40.0
  activators BashActivator,BatchActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator


"source" is not recognized as an internal or external command,
operable program or batch file.


## Modules

In [5]:
%%capture

!pip install sentence-transformers torch pandas numpy

In [6]:
import pandas as pd
import torch
import numpy as np
import pickle
from sentence_transformers import SentenceTransformer, util



## Reading data

In [None]:
users = pd.read_csv('./users.csv')
jobs = pd.read_csv('./jobs.csv')

jobs_copy = jobs.copy()

## Processing

In [None]:
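def fillna_with_whitespace(df: pd.DataFrame, features: list) -> None:
    # Replace NaN with empty strings so prompt concatenation does not yield NaN.
    # Assigning the columns back avoids chained inplace fillna on a column view.
    df[features] = df[features].fillna('')

fillna_with_whitespace(users, ['hardskills', 'subareas'])
fillna_with_whitespace(jobs_copy, ['area', 'country', 'work_modality'])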
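Filling these columns matters because pandas propagates NaN through string concatenation: a single missing field turns the whole prompt into NaN. The toy frame below (hypothetical values, not taken from `users.csv`) illustrates this and the column-assignment form of `fillna`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few rows of users.csv
toy = pd.DataFrame({
    'area': ['MERCADEO', 'TI'],
    'subareas': [np.nan, 'BACKEND'],
})

# Without filling, the row with a missing subarea becomes NaN
raw = toy['area'] + ' - ' + toy['subareas']

# Fill on a copy and assign the columns back, then concatenate again
filled = toy.copy()
filled[['subareas']] = filled[['subareas']].fillna('')
clean = filled['area'] + ' - ' + filled['subareas']

print(clean.tolist())  # ['MERCADEO - ', 'TI - BACKEND']
```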

Defining custom prompts for both the users and jobs datasets

In [None]:
def create_user_prompt(df: pd.DataFrame)-> pd.Series:
    return 'Area de ' + df.area + ' - ' + df.subareas + ', en ' + df.country + \
        ' o remoto. Con modalidad ' + df.work_modality + '. ' + df.hardskills

users['prompt'] = create_user_prompt(users)
print('User prompt:', users.iloc[50].prompt)

def create_job_prompt(df: pd.DataFrame)-> pd.Series:
    place = np.where(df.remote, 'Remoto', df.country)

    return 'Area de ' + df.area + ', ' + place + ', con modalidad ' \
        + df.work_modality + '. ' + df.description

jobs_copy['prompt'] = create_job_prompt(jobs_copy)
print('Job prompt:', jobs_copy.iloc[300].prompt[:100])

User prompt: Area de MERCADEO - , en Colombia o remoto. Con modalidad Medio tiempo. ACTIVACIÓN DE MARCA ,PLANEACIÓN,ESTRATEGIA OMNICANAL,TRADE MARKETING,MARKETING ESTRATEGICO,EXPERIENCIA DE SERVICIOS,SERVICIO AL CLIENTE,ESTRATEGÍAS COMERCIALES,ESTRATEGIA DE MERCADEO,PENSAMIENTO ESTRATÉGICO
Job prompt: Area de Recursos Humanos, Remoto, con modalidad Tiempo completo. at the intersect group, our mission
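The user prompt printed above reads "MERCADEO - , en Colombia" because an empty `subareas` still contributes the `' - '` separator. A minimal sketch of one way to avoid that, assuming the same columns as `create_user_prompt`; `join_non_empty` and `create_user_prompt_v2` are hypothetical names, not part of the original notebook:

```python
import pandas as pd

def join_non_empty(parts, sep: str) -> str:
    """Join only the parts that are non-empty strings (NaN and '' are skipped)."""
    return sep.join(p for p in parts if isinstance(p, str) and p.strip())

def create_user_prompt_v2(df: pd.DataFrame) -> pd.Series:
    """Hypothetical variant of create_user_prompt that omits an empty subarea."""
    def row_prompt(row: pd.Series) -> str:
        area = join_non_empty([row['area'], row['subareas']], ' - ')
        return ('Area de ' + area + ', en ' + row['country']
                + ' o remoto. Con modalidad ' + row['work_modality'] + '. '
                + row['hardskills'])
    return df.apply(row_prompt, axis=1)

# Toy row mirroring the user printed above (hard skills abbreviated)
toy_users = pd.DataFrame([{
    'area': 'MERCADEO', 'subareas': '', 'country': 'Colombia',
    'work_modality': 'Medio tiempo', 'hardskills': 'BRANDING',
}])
print(create_user_prompt_v2(toy_users).iloc[0])
# Area de MERCADEO, en Colombia o remoto. Con modalidad Medio tiempo. BRANDING
```

The row-wise `apply` is slower than the vectorized concatenation, but the cost is negligible next to encoding the prompts.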


## Embeddings

In [3]:
!nvidia-smi

Sat May 27 08:52:34 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 531.14                 Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce GTX 950M       WDDM | 00000000:01:00.0 Off |                  N/A |
| N/A    0C    P0               N/A /  N/A|      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [8]:
torch.cuda.is_available()

False

In [7]:
def try_gpu(i=0):
    if torch.cuda.device_count()>=i+1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

print(try_gpu())

cpu


In [None]:
def load_model(model_name: str) -> SentenceTransformer:
    # Load the model directly onto the best available device
    return SentenceTransformer(model_name, device=try_gpu())

In [None]:
MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'
model = load_model(MODEL_NAME)


In [None]:
%%time
embedding_jobs = model.encode(jobs_copy['prompt'].tolist(), convert_to_tensor=True)

CPU times: user 39.7 s, sys: 129 ms, total: 39.8 s
Wall time: 30.7 s


In [None]:
# Persisting objects
with open('model-embeddings.pkl', 'wb') as file:
    pickle.dump((model, embedding_jobs), file)
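Pickling the whole `SentenceTransformer` object works, but the resulting file is tied to the installed library versions. A hedged alternative: save the model with `model.save(...)` and keep the embeddings as a plain NumPy array (for example `embedding_jobs.cpu().numpy()`, restored later with `torch.from_numpy`). The helper names and file name below are hypothetical, and the model calls are shown only as comments:

```python
import os
import tempfile

import numpy as np

def save_embeddings(path: str, embeddings: np.ndarray) -> None:
    """Store job embeddings as a .npy file (path should end in .npy)."""
    np.save(path, embeddings)

def load_embeddings(path: str) -> np.ndarray:
    """Load job embeddings saved with save_embeddings."""
    return np.load(path)

# In the notebook this would be, roughly (not run here):
#   model.save('rs-model')                        # directory with weights and config
#   save_embeddings('job-embeddings.npy', embedding_jobs.cpu().numpy())
#   model = SentenceTransformer('rs-model')
#   embedding_jobs = torch.from_numpy(load_embeddings('job-embeddings.npy'))

# Round trip on random toy embeddings
emb = np.random.default_rng(0).random((3, 4)).astype(np.float32)
with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'job-embeddings.npy')
    save_embeddings(npy_path, emb)
    loaded = load_embeddings(npy_path)
```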

## Creating recommendations

In [30]:
from datetime import datetime
import pytz

def get_now_date()->str:
    tz = pytz.timezone('America/Bogota') 
    return datetime.now(tz).isoformat()

print("now =", get_now_date())

now = 2023-05-25T20:32:10.207723-05:00
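`pytz` works, but since Python 3.9 (the version in this environment) the standard-library `zoneinfo` module provides the same IANA time zones without an extra dependency. A sketch, where `get_now_date_zoneinfo` is a hypothetical name; on Windows, `zoneinfo` needs the `tzdata` package because the OS has no IANA database:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def get_now_date_zoneinfo(tz_name: str = 'America/Bogota') -> str:
    """ISO-8601 timestamp in the given IANA time zone, stdlib only."""
    return datetime.now(ZoneInfo(tz_name)).isoformat()

stamp = get_now_date_zoneinfo()
print('now =', stamp)
```

Colombia uses UTC-05:00 all year, so the offset in the output matches the `pytz` version above.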


In [None]:
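from typing import Optional

def get_recommendations(model: SentenceTransformer,
                        embedding_jobs: torch.Tensor,
                        id_user: Optional[int] = None,
                        num: int = 5,
                        sentence: Optional[str] = None) -> list:
    '''
    Return the top `num` job recommendations for a single user or a free-text prompt.

    Parameters
    model: SentenceTransformer model used to embed the query.
    embedding_jobs: Precomputed job embeddings, one row per job in `jobs`.
    id_user: Id of an existing user; their stored prompt is used as the query.
    num: Number of recommendations to return.
    sentence: Free-text prompt used instead of a user's profile.

    Returns
    A list of dicts (one per job), sorted by descending match_score.
    '''
    if sentence is None:
        sentence = users[users.id_user == id_user]['prompt'].values[0]

    embedding_user: torch.Tensor = model.encode(sentence, convert_to_tensor=True)
    similarity_scores: torch.Tensor = util.pytorch_cos_sim(embedding_user, embedding_jobs)
    values, indices = similarity_scores.squeeze().sort(descending=True)

    # Select rows by position in ranked order so each score stays aligned with its job
    # (filtering with jobs.index.isin would return rows in their original order)
    recommendations: pd.DataFrame = jobs.iloc[indices[:num].tolist()].copy()
    recommendations.insert(0, 'match_score', values[:num].tolist(), True)
    recommendations.insert(1, 'recommendation_date', get_now_date(), True)
    return recommendations.to_dict(orient='records')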
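The ranking inside `get_recommendations` is cosine similarity followed by a descending sort. The sketch below reproduces it with NumPy on toy vectors, assuming plain arrays instead of torch tensors; `cosine_top_k` and `rank_jobs` are hypothetical helpers. It also shows the detail that matters when attaching scores: rows are taken by position in ranked order with `iloc`, whereas `index.isin` would return them in their original order and misalign the scores.

```python
import numpy as np
import pandas as pd

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return (scores, positions) of the k corpus rows most similar to query."""
    # L2-normalise both sides so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the first k positions
    order = np.argsort(-scores)[:k]
    return scores[order], order

def rank_jobs(jobs_df: pd.DataFrame, query_emb: np.ndarray,
              job_embs: np.ndarray, k: int = 5) -> pd.DataFrame:
    """Attach match scores to the top-k jobs, keeping the ranked order."""
    scores, order = cosine_top_k(query_emb, job_embs, k)
    top = jobs_df.iloc[order].copy()
    top.insert(0, 'match_score', scores)
    return top

# Toy example: three jobs in 2-D, query points along the second axis
jobs_toy = pd.DataFrame({'vacancy_name': ['a', 'b', 'c']})
embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top = rank_jobs(jobs_toy, np.array([0.0, 1.0]), embs, k=2)
print(top)
```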

## Loading saved objects

In [None]:
with open('model-embeddings.pkl', 'rb') as file:
    model, embedding_jobs = pickle.load(file)

## Testing recommender

Using an existing user ID

In [None]:
ID_USER = 541

for k, v in users[users.id_user == ID_USER].to_dict(orient='records')[0].items():
    print(k,':', v)

print('\n==== Results ====\n')
pd.DataFrame(get_recommendations(model, embedding_jobs, id_user=ID_USER))

id_user : 541
country : Colombia
area : MERCADEO
subareas : PR Y COMUNICACIONES
degrees : nan
wage_aspiration : 1600000.0
currency : COP
current_wage : nan
change_cities : nan
language : INGLÉS B2 - INTERMEDIO ALTO
years_experience : 1.0
months_experience : nan
wish_role_name : ANALISTA DE COMUNICACIONES,ANALISTA DE COMUNICACIONES INTERNAS Y EXTERNAS,ASISTENTE DE COMUNICACIONES
work_modality : Indiferente
hardskills : COMUNICACIÓN Y MEDIOS,ESTRATEGIA DE MARCA,MARKETING DIGITAL,COMUNICACION CORPORATIVA,BRANDING,LIDERAZGO,ADOBE CREATIVE SUITE,PRODUCCIÓN AUDIOVISUAL,REDES SOCIALES,PLANEACIÓN ESTRATÉGICA
prompt : Area de MERCADEO - PR Y COMUNICACIONES, en Colombia o remoto. Con modalidad Indiferente. COMUNICACIÓN Y MEDIOS,ESTRATEGIA DE MARCA,MARKETING DIGITAL,COMUNICACION CORPORATIVA,BRANDING,LIDERAZGO,ADOBE CREATIVE SUITE,PRODUCCIÓN AUDIOVISUAL,REDES SOCIALES,PLANEACIÓN ESTRATÉGICA

==== Results ====



| | match_score | account executive | area | work_modality | country | city | remote | vacancy_name | description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.882009 | 56189.0 | Mercadeo | Tiempo completo | COLOMBIA | Bogota | False | analista de mercadeo | Importante empresa de comercialización de Inst... |
| 1 | 0.851654 | 59281.0 | Mercadeo | Tiempo completo | COLOMBIA | Bogota | False | creative content specialist leader | estamos en búsqueda de un head of content, con... |
| 2 | 0.846939 | 51937.0 | Mercadeo | Tiempo completo | COLOMBIA | Bogota | False | profesional - marketing digital | Reconocida empresa del sector aeronautico esta... |
| 3 | 0.842396 | 55246.0 | Mercadeo | Tiempo completo | COLOMBIA | Cucuta | False | gerente de mercadeo | 100% Presencial en la ciudad de cúcuta\n\nActu... |
| 4 | 0.841607 | 59269.0 | Mercadeo | Tiempo completo | COLOMBIA | Bogota | False | community manager | reconocida empresa del sector salud busca para... |


Using a manual prompt

In [None]:
pd.DataFrame(get_recommendations(model, embedding_jobs, sentence='Java developer, en Colombia'))

| | match_score | account executive | area | work_modality | country | city | remote | vacancy_name | description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.606291 | 2859.0 | | Tiempo completo | MEXICO | Guadalajara | False | desarrollador java | esta vacante viene de la bolsa de empleo talen... |
| 1 | 0.600613 | 18789.0 | | Tiempo completo | MEXICO | Ciudad de Mexico | False | desarrollador java | #WeAreHiring: DESARROLLADOR JAVA - INGLES AVAN... |
| 2 | 0.574529 | 635.0 | | Tiempo completo | | | False | desarrollador java sr | somos una empresa 100% mexicana, dedicada a br... |
| 3 | 0.561872 | 1460.0 | | Tiempo completo | COLOMBIA | Bogota | False | desarrollador java | desarrollador java\n\nen q - vision buscamos i... |
| 4 | 0.541469 | 1692.0 | | Tiempo completo | MEXICO | Ciudad de Mexico | True | desarrollador java | acerca de la empresa\n\nwe are a company which... |


In [None]:
pd.DataFrame(get_recommendations(model, embedding_jobs, sentence='desarrollador Node, en Mexico, presencial'))

| | match_score | account executive | area | work_modality | country | city | remote | vacancy_name | description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.557539 | 24981.0 | Ventas Comercial | Indiferente | MEXICO | Ciudad de Mexico | False | Captador Inmobiliario Freelance MX | Houm es una Startup PropTech - YC W21 y plataf... |
| 1 | 0.555337 | 6049.0 | | Tiempo completo | MEXICO | Ecatepec | False | analista de servicio al cliente | Analista de servicio al cliente\n\nPerfil\n\nE... |
| 2 | 0.549566 | 31201.0 | Ventas Comercial | Tiempo completo | MEXICO | Tecamac | False | reclutamiento botargueros y vendedores metro n... | ¡ÚNETE A NUESTRA GRAN FAMILIA!\n\nVENDEDORES ... |
| 3 | 0.541399 | 64287.0 | Recursos Humanos | Tiempo completo | MEXICO | Monterrey | False | reclutador de personal | Experiencia mínima en reclutamiento y disponib... |
| 4 | 0.541181 | 49931.0 | Mercadeo | Tiempo completo | MEXICO | Puebla | False | redacción de discurso y creación de contenido | si te gusta escribir, crear discursos que cone... |
