# Technicals_Lookup
---
This project analyzes which (data science) technical skills a job posting requires.  
First, save the job description locally as a `.txt` or `.csv` file.  
When you run the execution cell, you are prompted to enter the path to the job description file.  
Running the entire notebook prints each technical skill found and returns the skills as a list.


---
### identify_technicals
Once the job description has been loaded, all text is converted to lower case and split on spaces, newlines, and the most common punctuation marks. Hyphens are also treated as separators because they often appear in skill names: a hyphenated name is matched as a two-word phrase.  
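
As a minimal sketch of this normalization step, the snippet below lowercases a made-up sentence and splits it with the same separator pattern the notebook uses. The `tokenize` helper and the sample sentence are illustrative assumptions, not part of the project.

```python
import re

# Same separators as the notebook: comma, space, newline, period,
# parentheses, and hyphen.
SEPARATORS = r",| |\n|\.|\(|\)|-"


def tokenize(text):
    """Lowercase `text` and split it on the separators (hypothetical helper)."""
    return re.split(SEPARATORS, text.lower())


tokens = tokenize("Experience with scikit-learn (Python).")
print(tokens)
```

Adjacent separators, such as the space followed by `(`, leave empty strings in the token list. They are harmless for matching as long as the keyword list contains no empty entry.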
  
Because most skill names are either a single word or a two-word phrase, the program uses the `ngrams` function from the `nltk` package to extract one-grams and two-grams from the job description text.
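
To illustrate what one-grams and two-grams look like, here is a small sketch that uses a plain-Python stand-in rather than `nltk` itself. The `simple_ngrams` helper is hypothetical; it is assumed to yield the same tuples of consecutive tokens that `nltk`'s `ngrams` produces for this use.

```python
def simple_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = ["machine", "learning", "and", "sql"]

# One-grams are unpacked to plain strings; two-grams stay as tuples.
onegrams = [gram[0] for gram in simple_ngrams(tokens, 1)]
twograms = simple_ngrams(tokens, 2)

print(onegrams)
print(twograms)
```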
  
It then compares a predefined list of common data science skills, read from `Keywords.txt`, against the one-grams and two-grams. Matching skills are collected in a list and returned.
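
The matching step can be sketched as follows. The keyword list here is a made-up example (the notebook reads the real one from `Keywords.txt`), and `match_keywords` is a hypothetical helper that uses sets for lookup where the notebook uses lists.

```python
import re


def match_keywords(onegrams, twograms, keywords):
    """Return matched skills in order of first appearance, without duplicates."""
    single = set()   # one-word skills, stored as strings
    phrases = set()  # multi-word skills, stored as tuples of words
    for keyword in keywords:
        words = re.split("-| ", keyword.lower())
        if len(words) == 1:
            single.add(words[0])
        else:
            phrases.add(tuple(words))

    matched = []
    for gram in onegrams:
        if gram in single and gram not in matched:
            matched.append(gram)
    for gram in twograms:
        if gram in phrases and gram not in matched:
            matched.append(gram)
    return matched


# Hypothetical inputs for illustration only.
keywords = ["SQL", "Python", "Machine Learning", "scikit-learn"]
onegrams = ["python", "sql", "scikit", "learn", "python"]
twograms = [("python", "sql"), ("sql", "scikit"),
            ("scikit", "learn"), ("learn", "python")]

result = match_keywords(onegrams, twograms, keywords)
print(result)
```

One-word matches come back as strings and two-word matches as tuples, which is why the printing step later has to distinguish the two types.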

In [1]:
def identify_technicals_txt(url):
    """Return the skills from Keywords.txt that appear in the text file at `url`."""

    import re
    from nltk import ngrams

    # Read the job description and convert it to lower case.
    with open(url) as f:
        df_lower = f.read().lower()

    # Split on commas, spaces, newlines, periods, parentheses and hyphens.
    df_split = re.split(r",| |\n|\.|\(|\)|-", df_lower)

    # One-grams are kept as plain strings, two-grams as tuples of two words.
    onegrams = [gram[0] for gram in ngrams(df_split, 1)]
    twograms = list(ngrams(df_split, 2))

    # Keywords.txt holds the skills, separated by ",\n".
    with open("Keywords.txt") as f:
        keywords_split = f.read().lower().split(",\n")

    # Multi-word keywords become tuples so they can be compared with two-grams.
    keywords_tuple = []

    for keyword in keywords_split:
        words = re.split("-| ", keyword)
        if len(words) > 1:
            keywords_tuple.append(tuple(words))

    technicals = []

    for gram in onegrams:
        if gram not in technicals and gram in keywords_split:
            technicals.append(gram)

    for gram in twograms:
        if gram not in technicals and gram in keywords_tuple:
            technicals.append(gram)

    return technicals

In [71]:
def identify_technicals_csv(url):
    """Return one "; "-joined string of matched skills per row of the CSV at `url`."""

    import re
    from nltk import ngrams
    import pandas as pd

    # The job descriptions are read from the fifth column (index 4).
    df = pd.read_csv(url)
    df_desc = df.iloc[:, 4]

    # Load the keyword list once instead of once per row.
    with open("Keywords.txt") as f:
        keywords_split = f.read().lower().split(",\n")

    # Multi-word keywords become tuples so they can be compared with two-grams.
    keywords_tuple = []

    for keyword in keywords_split:
        words = re.split("-| ", keyword)
        if len(words) > 1:
            keywords_tuple.append(tuple(words))

    output = []

    for df_row in df_desc:
        df_lower = df_row.lower()
        df_split = re.split(r",| |\n|\.|\(|\)|-", df_lower)

        onegrams = [gram[0] for gram in ngrams(df_split, 1)]
        twograms = list(ngrams(df_split, 2))

        technicals = []

        for gram in onegrams:
            if gram not in technicals and gram in keywords_split:
                technicals.append(gram)

        for gram in twograms:
            # Compare the joined string so repeated two-grams are not added twice.
            joined = "-".join(gram)
            if joined not in technicals and gram in keywords_tuple:
                technicals.append(joined)

        output.append("; ".join(technicals))

    return output

---
### print_technicals

The code cell below takes the output of `identify_technicals_txt` and prints the skills in the order they appear in the input list.  
  
Because one-grams are stored as strings and two-grams as tuples in the technicals list, an `if` statement checks the type of each item and prints it accordingly.
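
The type check can be illustrated with a variant that builds a string instead of printing. `format_technicals` is a hypothetical helper and the input list is made up; unlike `print_technicals`, it leaves no trailing `; ` after the last skill.

```python
def format_technicals(technicals):
    """Return the skills as one "; "-separated string (hypothetical helper)."""
    parts = []
    for item in technicals:
        if isinstance(item, str):
            # One-gram: already a plain string.
            parts.append(item)
        else:
            # Two-gram: a tuple of words, joined with a space.
            parts.append(" ".join(item))
    return "; ".join(parts)


formatted = format_technicals(["python", ("machine", "learning"), "sql"])
print(formatted)
```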

In [19]:
def print_technicals(technicals):

    # One-grams are strings; two-grams are tuples that are joined with a space.
    for item in technicals:
        if isinstance(item, str):
            print(item, end="; ")

        else:
            print(" ".join(item), end="; ")

In [65]:
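def save_csv_technicals(technicals):
    # pandas is only imported inside identify_technicals_csv, so import it here too.
    import pandas as pd

    output_pd = pd.DataFrame(technicals)
    output_pd.to_csv("technicals_output.csv")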

---
### Execution - text
Ask the user for the job description file and run it through `identify_technicals_txt`.

In [24]:
job_description = input("Enter the url of the job description file: ")

technicals = identify_technicals_txt(job_description)


Enter the url of the job description file:  temp.txt


Feed the output of `identify_technicals_txt` into `print_technicals`.

In [25]:
print("List of technical skills: \n")

print_technicals(technicals)

List of technical skills: 

classification; 

---
### Execution - csv

In [72]:
job_description = str(input("Enter the url of the job description file: "))

technicals = identify_technicals_csv(job_description)
save_csv_technicals(technicals)


Enter the url of the job description file:  Google_Job_Scrap_Output.csv
