Zero-shot classifier - https://huggingface.co/facebook/bart-large-mnli

In [1]:
import pandas as pd
from datasets import Dataset
from transformers import pipeline

In [2]:
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # BART fine-tuned on MultiNLI; each label is scored as an entailment hypothesis
candidate_labels = ['Hybrid', 'OnSite', 'Remote']

Device set to use cuda:0


Classifier function

In [3]:
def classify(text):
    """
    Wrapper around the zero-shot classification pipeline.
    Relies on the globals `classifier` and `candidate_labels` defined above.
    Returns the label with the highest confidence score.
    """
    out = classifier(text, candidate_labels)
    label2score = list(zip(out['labels'], out['scores']))
    predicted_label = max(label2score, key=lambda x: x[1])[0]  # label with the largest confidence score

    return predicted_label
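
For reference, the zero-shot pipeline returns a dict with `sequence`, `labels`, and `scores`, and the labels are ordered from most to least likely, so the top prediction is also the first entry of `labels`. A small sketch using a hand-written output dict (the scores are made up for illustration, not model output, and `top_label` is a hypothetical helper):

```python
def top_label(out):
    """Return the highest-scoring label from a zero-shot pipeline output dict."""
    # Pair each label with its score and keep the pair with the largest score.
    label, _score = max(zip(out["labels"], out["scores"]), key=lambda pair: pair[1])
    return label


# Hand-written example in the shape the pipeline returns; scores are illustrative only.
example_out = {
    "sequence": "This role is fully remote.",
    "labels": ["Remote", "Hybrid", "OnSite"],
    "scores": [0.90, 0.07, 0.03],
}

print(top_label(example_out))
# Because labels arrive sorted by score, this matches the first label.
print(top_label(example_out) == example_out["labels"][0])
```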

    

Eval

In [4]:
# Load the test set
fp_test = "../../MISC/work_arrangements_test_set.csv"
df_test = pd.read_csv(fp_test)
df_test = df_test.drop(columns="id")  # the id column is not needed
df_test = df_test.rename(columns={"job_ad": "text"})  # the job ad text is the classifier input

testdata = Dataset.from_pandas(df_test)


In [5]:
correct = 0
predictions = []

# Classify each job ad and count exact matches against the true label
for sample in testdata:
    predicted_label = classify(sample['text'])
    predictions.append(predicted_label)

    if predicted_label == sample['y_true']:
        correct += 1

print(f"Accuracy = {correct/len(testdata)}")


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Accuracy = 0.494949494949495
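
The warning above appears because `classify` is called once per row. One way to address it, sketched here under assumptions, is to pass all texts to the pipeline in a single call with a `batch_size`. `classify_batch` and `accuracy` are hypothetical helpers, the batch size of 8 is an arbitrary choice, and `stub_classifier` only stands in for the real pipeline so the snippet runs without downloading a model.

```python
def classify_batch(classifier, texts, candidate_labels, batch_size=8):
    """Classify many texts in one pipeline call and return the top label for each."""
    outputs = classifier(list(texts), candidate_labels=candidate_labels, batch_size=batch_size)
    # Some versions return a bare dict for a single input; normalise to a list.
    if isinstance(outputs, dict):
        outputs = [outputs]
    # Labels are sorted by score, so the first one is the prediction.
    return [out["labels"][0] for out in outputs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def stub_classifier(texts, candidate_labels, batch_size=1):
    """Stand-in for the real pipeline: ranks first the first candidate label found in the text."""
    results = []
    for text in texts:
        hit = next((lab for lab in candidate_labels if lab.lower() in text.lower()),
                   candidate_labels[0])
        ranked = [hit] + [lab for lab in candidate_labels if lab != hit]
        scores = [1.0] + [0.0] * (len(ranked) - 1)
        results.append({"sequence": text, "labels": ranked, "scores": scores})
    return results


texts = ["Fully remote position", "Hybrid: three days in the office"]
preds = classify_batch(stub_classifier, texts, ["Hybrid", "OnSite", "Remote"])
print(preds, accuracy(["Remote", "Hybrid"], preds))
```

With the objects defined in this notebook, the equivalent call would be `predictions = classify_batch(classifier, testdata['text'], candidate_labels)`.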


In [6]:
from eval import precision, recall  # import only the names used below

labels = ["Remote", "Hybrid", "OnSite"]
p = precision(labels, testdata['y_true'], predictions)
print(p)
r = recall(labels, testdata['y_true'], predictions)
print(r)

{'Remote': 0.8333333333333334, 'Hybrid': 0.34782608695652173, 'OnSite': 0.8333333333333334, 'AVERAGE': 0.6714975845410628}
{'Remote': 0.7692307692307693, 'Hybrid': 0.8888888888888888, 'OnSite': 0.10869565217391304, 'AVERAGE': 0.5889384367645237}
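
The `eval` module is not shown in this notebook. As a rough sketch of what such helpers might compute, the functions below derive per-class precision and recall plus an unweighted `AVERAGE` (the averages printed above equal the plain mean of the three class values). The names `per_class_precision` and `per_class_recall` and the toy labels are hypothetical; this is not the notebook's implementation or data.

```python
def _per_class(labels, y_true, y_pred, denominator):
    """Per-class true positives divided by either predicted or actual counts."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        if denominator == "predicted":
            total = sum(1 for p in y_pred if p == label)  # precision denominator
        else:
            total = sum(1 for t in y_true if t == label)  # recall denominator
        scores[label] = tp / total if total else 0.0
    # Unweighted (macro) mean over the classes.
    scores["AVERAGE"] = sum(scores[label] for label in labels) / len(labels)
    return scores


def per_class_precision(labels, y_true, y_pred):
    return _per_class(labels, y_true, y_pred, "predicted")


def per_class_recall(labels, y_true, y_pred):
    return _per_class(labels, y_true, y_pred, "actual")


# Toy data only: several OnSite ads are predicted as Hybrid.
toy_true = ["Remote", "Remote", "Hybrid", "OnSite", "OnSite", "OnSite"]
toy_pred = ["Remote", "Hybrid", "Hybrid", "Hybrid", "Hybrid", "OnSite"]
toy_labels = ["Remote", "Hybrid", "OnSite"]

print(per_class_precision(toy_labels, toy_true, toy_pred))
print(per_class_recall(toy_labels, toy_true, toy_pred))
```

In the toy data, over-predicting Hybrid lowers Hybrid precision and OnSite recall at the same time, which is consistent with the pattern in the real results above (low Hybrid precision, low OnSite recall).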


Gradio Demo

In [7]:
import gradio as gr

demo = gr.Interface(
    fn=classify,
    inputs=gr.Textbox(lines=2, placeholder="Input Job ad"),
    outputs=gr.Textbox(),
    title="Work arrangement zero-shot classifier",
    description="Uses facebook/bart-large-mnli as a zero-shot classifier to predict whether a job ad describes a Hybrid, OnSite, or Remote work arrangement."
)

demo.launch(share=True)

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://1a581aea5705320031.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


