# New Enrichment code

The new plan: drop AnythingLLM and just use as much space as we need in a GPT-4 prompt to answer our questions.

All this code does is the following:
 - Grab the jobs database and look for all the jobs that are missing any of the following:
   - A "re-enrich" flag (if we don't have this, set a new one and make it "true").
   - The extracted details of "skills", a list of strings (if we don't have this, pull the text and ask our GPT-4 structures for it).
   - The extracted details of "job_title" (likewise, extract this).
   - Required certifications (same sort of thing, with whatever is absolutely required, like a degree or a professional cert).
   - Once the last piece is in place (i.e. we have just added something and all the pieces are now there), set re-enrich to "false", indicating that we're done.
 - Do the same for people/the applicant database:
   - Again, start with the re-enrich flag
   - This one requires us to extract the text from resumes, but otherwise works the same, with skills, education, certifications, and work history.

This code is intended to get us through this process first - when we're done, we should have everything we need to do the rest of the matching, in a nice structured format, in the database.
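The flag logic described above can be sketched roughly as follows. This is only an illustration, not the code in `enrichment_lib`: the helper names (`missing_fields`, `needs_enrichment`, `mark_done_if_complete`) are hypothetical, and the field names are assumed to match the ones used in the job cell further down.

```python
from typing import Any, Dict, List

# Assumed field names; adjust to whatever the database really stores.
REQUIRED_JOB_FIELDS = ["job_skills_list", "job_creds_list", "job_brief_desc"]


def missing_fields(record: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required fields that are absent (or None) on a record.

    An empty list counts as present: a job may legitimately need no certs.
    """
    return [name for name in required if record.get(name) is None]


def needs_enrichment(record: Dict[str, Any], required: List[str]) -> bool:
    """True if the flag is unset or true, or if any required field is missing."""
    flag = record.get("reenrich_flag")
    return flag is None or bool(flag) or bool(missing_fields(record, required))


def mark_done_if_complete(record: Dict[str, Any], required: List[str]) -> Dict[str, Any]:
    """Turn the flag off once every piece is in place, otherwise keep it on."""
    record["reenrich_flag"] = bool(missing_fields(record, required))
    return record
```

Treating an empty list as "present" is a deliberate choice in this sketch, so that a job with no hard certification requirements is not re-enriched forever.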


After this is all done, we're going to do the following:
  - Make another flag, this one called "re-embed", that we will first set to "True" if it isn't there already.
  - If that flag is true, then construct a standardized narrative about the job/resume, like the following: "The job title is <desired job/job title>. This job is similar to other positions, like <last job the person had/repeat the job title>. The required skills are <person's skills/job required skills>."
  - If we just made that brief description, embed the person as person_<UID>, or the job as job_<UID>, in our vector DB.
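A minimal sketch of the narrative template and the vector DB key, assuming the three slots are already available as plain values. `build_narrative` and `embedding_key` are hypothetical names for illustration; the real logic lives in `create_embedding_description_job` / `create_embedding_description_applicant`, which may differ.

```python
from typing import Iterable


def build_narrative(title: str, similar_to: str, skills: Iterable[str]) -> str:
    """Fill the standardized template used for both jobs and applicants."""
    skills_text = ", ".join(skills) or "not listed"
    return (
        f"The job title is {title}. "
        f"This job is similar to other positions, like {similar_to}. "
        f"The required skills are {skills_text}."
    )


def embedding_key(kind: str, uid: str) -> str:
    """Build the vector DB key: person_<UID> or job_<UID>."""
    if kind not in ("person", "job"):
        raise ValueError("kind must be 'person' or 'job'")
    return f"{kind}_{uid}"
```

For a job, `similar_to` just repeats the job title; for a person, it would be the last job they held.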

Finally, every day we will look at the records where the "last_suggestions" timestamp is older than a chosen threshold, or where it doesn't exist. For each of those, we'll go to our vector DB and ask for the nearest 50 or so records to that person, keep the jobs that meet the immutable requirements (we need to do some location and certification cuts here), then write out a fresh "recommended jobs" file/DB that we can use to do the actual applications.
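The daily pass could look something like the sketch below. It assumes `last_suggestions` is stored as a timezone-aware `datetime`, that the nearest-neighbor lookup has already returned a list of job dicts, and that jobs carry a `location` field; none of that is confirmed by the code in this notebook, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Set


def suggestions_are_stale(
    record: Dict[str, Any], max_age: timedelta, now: Optional[datetime] = None
) -> bool:
    """True if the record has no last_suggestions value or it is older than max_age."""
    last = record.get("last_suggestions")
    if last is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last > max_age


def filter_candidates(
    neighbors: List[Dict[str, Any]], allowed_locations: Set[str], held_certs: Set[str]
) -> List[Dict[str, Any]]:
    """Keep only jobs in an allowed location whose required creds the person holds."""
    keep = []
    for job in neighbors:
        if job.get("location") not in allowed_locations:
            continue
        if not set(job.get("job_creds_list") or []) <= held_certs:
            continue
        keep.append(job)
    return keep
```

The vector DB query itself is left out on purpose, since it depends on which store we end up using.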

In [6]:
# TODO:
# - make a nice way to turn all the flags back on so we can reconstruct everything as needed. (UI question?)
# - split up the job recommender to make it also work with these new code structures, and with three settings: embed-all (run to update all the embeddings), embed-one, and recommend-one.
# - maybe migrate from a local vector DB to pinecone
# - cut down recommendations by desired locations, pay rate, etc.

In [None]:

from __future__ import print_function
from enrichment_lib import DataEnricher
from jd_tools import CheekiFileHandler

# sys.path.insert(1,'/home/frog/repos/latest-anything-llm')


verbose_printing = True
#re-enrich the records even if they're already enriched?
jobs_override = False
applicant_override = True

# set up the various handlers we'll need:
credentials_filename = "./credentials.yml"
enricher = DataEnricher(credentials_filename)
filehandler = CheekiFileHandler() # credentials_filename)

# Do the Job Enrichment

In [None]:

our_url = 'https://otis.wd5.myworkdayjobs.com/REC_Ext_Gateway/job/36-IMP-EDOUARD-MOREAU-VILLEFRANCHE-69400-France/TECHNICIEN-SUPERIEUR-DE-MAINTENANCE_20076433'

table_response = enricher.get_records('uid', our_url,'url',True) # can also ask for a 'frame', or 'all' records.

if table_response is None or len(table_response)==0:
    print('Error - no DB result.  Do you have the right record key/ID?')
else:
    for item in table_response: 
        uuid = item['uuid'] # can also use 'url'
        print('Checking out job with uuid: ' , uuid)
        # does this job not have the re-enrich flag, or is the re-enrich flag true?
        reenrich_flag_value = item.get('reenrich_flag')
        is_record_active = item.get('isActive')

        if jobs_override or (is_record_active and ((reenrich_flag_value is None) or reenrich_flag_value==True)):
            #after we're done here, don't need to do this again:
            item['reenrich_flag'] = False
            # Let's put all the pieces in there!
            job_title = item.get('title')
            job_desc = item.get('fullJobDescription')
            print(job_desc)
            #make the skills, and required credentials.
            try:
                item['job_skills_list'] = enricher.extract_job_skills_list(job_desc)
                item['job_creds_list'] = enricher.extract_job_credentials_list(job_desc)
                item['job_brief_desc'] = enricher.extract_job_brief_desc(job_desc)
                item['embedded_description'] = enricher.create_embedding_description_job(item)
                item['error_on_enrichment'] = False
                item['reembed_flag'] = True # if this all works, we want this to join our job embeddings

                if verbose_printing:
                    print('Required Job Skills: ', item['job_skills_list'] )
                    print('Required Job Credentials: ', item['job_creds_list'])
                    print('Brief Job Description: ', item['job_brief_desc'])

                #print(item['embedded_description'])
                #stick it all back in if we want it in the DB
                enricher.upload_item(item,True) # table.put_item(Item=item)
                #Or we could make a local file to use for now.
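            except Exception as e:
                # Enrichment failed: these flags are only set on the local copy;
                # nothing is uploaded from this branch.
                print('Exception: ', e)
                item['error_on_enrichment'] = True
                item['reembed_flag'] = False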
            

            if verbose_printing:
                print('  ')
                print('  ')
                print('CHANGED ITEM::')
                print(item)
                print('  ')


# Do the Applicant Enrichment

In [None]:
our_id = 'c5cf56dc-566a-466c-8b32-d7551738589e'
table_response = enricher.get_records('uid', our_id,'id',False) # can also ask for a 'frame', or 'all' records.

if table_response is None or len(table_response)==0:
    print('Error - no DB result.  Do you have the right record key/ID?')
else:
    for item in table_response:
        applicant_id = item['id']  # renamed so we don't shadow the built-in id()
        print('Checking out user with id: ', applicant_id)
        # does this applicant not have the re-enrich flag, or is the re-enrich flag true?
        reenrich_flag_value = item.get('reenrich_flag')
        is_record_active = item.get('isInitialProfileFormCompleted')
        print(is_record_active)
        if applicant_override or (is_record_active and ((reenrich_flag_value is None) or reenrich_flag_value==True)):
            # after we're done here, don't need to do this again:
            item['reenrich_flag'] = False
            # Let's put all the pieces in there!
            job_title = item.get('jobsWanted')
            file_name = item.get('resumeFileName') 
            resume_text = None
            print(file_name)
            if file_name:
                # now we need to see if there's a usable resume text:
                resume_text = enricher.get_resume_text(item,filehandler)
            if resume_text is None:
                continue
            try:
                item['education'] = enricher.extract_education(resume_text)
                item['workHistory'] = enricher.extract_workhistory(resume_text)
                item['skills'] = enricher.extract_skills(resume_text)
                address, city, state, zipcode = enricher.extract_address(resume_text)
                item['address'] = address
                item['city'] = city
                item['state'] = state
                item['zipcode'] = zipcode
                item['embedded_description'] = enricher.create_embedding_description_applicant(item)
                item['error_on_enrichment'] = False
                item['reembed_flag'] = True # if this all works, we want this to join our applicant embeddings

                if verbose_printing:
                    print('Applicant Education: ', item['education'] )
                    print('Applicant Work History: ', item['workHistory'] )
                    print('Applicant Skills: ', item['skills'] )
                    # format() tolerates None parts; '+' would raise and skip the upload below
                    print('Applicant Full Address: ', '{} {}, {} {}'.format(address, city, state, zipcode))

                #stick it all back in if we want it in the DB
                enricher.upload_item(item,False) # table.put_item(Item=item)
                #Or we could make a local file to use for now.

                print(item['embedded_description'])
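            except Exception as e:
                # Enrichment failed: these flags are only set on the local copy;
                # nothing is uploaded from this branch.
                print('Exception: ', e)
                item['error_on_enrichment'] = True
                item['reembed_flag'] = False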
            
            if verbose_printing:
                print('  ')
                print('  ')
                print('CHANGED ITEM::')
                print(item)
                print('  ')

# END OF CODE
