<a href="https://colab.research.google.com/github/cs145442/nlp-projects-with-tf2/blob/master/topic_classification_with_gpt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/48/35/ad2c5b1b8f99feaaf9d7cdadaeef261f098c6e1a6a2935d4d07662a6b780/transformers-2.11.0-py3-none-any.whl (674kB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Collecting tokenizers==0.7.0
  Downloading https://files.pythonhosted.org/packages/14/e5/a26eb4716523808bb0a799fcfdceb6ebf77a18169d9591b2f46a9adb87d9/tokenizers-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (3.8MB)

In [0]:
# load the sentence-bert model from the HuggingFace model hub
from transformers import AutoTokenizer, AutoModel
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained('deepset/sentence_bert')
model = AutoModel.from_pretrained('deepset/sentence_bert')

In [0]:
import torch

def predict_labels(sentence: str, labels: list):
  # Encode the sentence and all candidate labels as one padded batch:
  # row 0 is the sentence, rows 1..n are the labels.
  inputs = tokenizer.batch_encode_plus([sentence] + labels,
                                       return_tensors='pt',
                                       pad_to_max_length=True)
  input_ids = inputs['input_ids']
  attention_mask = inputs['attention_mask']

  # Run the model for inference only (no gradient tracking), then
  # mean-pool over the sequence dimension to get one vector per row.
  # Note: this plain mean also averages over the padding positions.
  with torch.no_grad():
    output = model(input_ids, attention_mask=attention_mask)[0]
  sentence_rep = output[:1].mean(dim=1)
  label_reps = output[1:].mean(dim=1)

  # Rank the labels by cosine similarity to the sentence, highest first.
  similarities = F.cosine_similarity(sentence_rep, label_reps)
  closest = similarities.argsort(descending=True)
  for ind in closest.tolist():
    print(f'label: {labels[ind]}\t similarity: {similarities[ind].item():.2e}')
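`predict_labels` averages `output` over every position, including the padding that `pad_to_max_length=True` adds to the short label strings, so each label vector partly reflects the hidden states computed at padding positions. A common alternative is to average only the positions where `attention_mask` is 1. The sketch below illustrates the difference on plain Python lists with made-up toy numbers; `naive_mean_pool` and `masked_mean_pool` are hypothetical helper names, not part of transformers or torch.

```python
from typing import List


# Hypothetical helper (not a library API): averages every position,
# padding included, like output[1:].mean(dim=1) does for each row.
def naive_mean_pool(token_vectors: List[List[float]]) -> List[float]:
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


# Hypothetical helper (not a library API): averages only the positions
# whose attention_mask entry is 1, i.e. the real (non-padding) tokens.
def masked_mean_pool(token_vectors: List[List[float]],
                     attention_mask: List[int]) -> List[float]:
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy example: a 2-token label padded to length 4. The padding positions
# still carry their own nonzero hidden states, so the naive mean drifts
# toward them while the masked mean does not.
label_tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
mask = [1, 1, 0, 0]

print(naive_mean_pool(label_tokens))         # [2.75, 2.75]
print(masked_mean_pool(label_tokens, mask))  # [0.5, 0.5]
```

On the tensors inside `predict_labels`, the same idea would read roughly `mask = attention_mask.unsqueeze(-1).float()` followed by `pooled = (output * mask).sum(dim=1) / mask.sum(dim=1)`, with `pooled[:1]` and `pooled[1:]` as the sentence and label representations. The similarity scores printed in this notebook come from the unmasked mean, so switching would generally change them.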

In [0]:
labels = ['sports', 'late', 'sick', 'caring', 'anger']

In [0]:
sentences = ["Tom bought a ticket to witness his first ever football match in the indoor stadium. He went to the match and enjoyed a lot.",
             "Last night while returning from office, tom got drenched in the rain. The next morning, he had fever and had to take a day off from his work.",
             "Tom visits the old age home near Marathahalli every Saturday. He loves to spend time at the old age home by serving food, playing with them and making them laugh",
             "Tom was going to office in his brand-new car. At one of the red lights, a car banged his car from the back. Tom got out of his car and scolded the driver of the other car. ",
             "I am a social worker and have been taking multiple awareness initiatives during the current COVID-19 situation. I had an important meeting with the Chief Minister of Bengaluru at 10:00 am on 05 Jun 2020 to discuss on various possible options to increase the awareness level within the Government workers. I started from my house in my car at 8:30 am considering that in the current lock down situation the roads would be empty. However, I noticed that there was huge traffic at multiple points. By the time I reached the chief minister office, the meeting had already commenced with other stakeholders as per the scheduled timeline.",
             ]
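`predict_labels` prints its ranking instead of returning it, which makes it awkward to collect one predicted label per sentence. The scoring step (cosine similarity followed by a descending sort, as `F.cosine_similarity` and `argsort(descending=True)` do in the notebook) can be sketched on plain Python lists so that it returns `(label, score)` pairs. `cosine_similarity` and `rank_labels` here are hypothetical helpers, and the toy vectors are invented for illustration, not sentence-BERT outputs.

```python
import math
from typing import List, Tuple


# Hypothetical helper (not the torch function): cos(a, b) = a.b / (|a| |b|),
# with a small epsilon so a zero vector does not cause division by zero.
def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, 1e-8)


# Hypothetical helper: returns (label, similarity) pairs sorted from the
# most to the least similar label, instead of printing them.
def rank_labels(sentence_rep: List[float], label_reps: List[List[float]],
                labels: List[str]) -> List[Tuple[str, float]]:
    scored = [(label, cosine_similarity(sentence_rep, rep))
              for label, rep in zip(labels, label_reps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-d vectors standing in for pooled sentence/label representations.
ranking = rank_labels([1.0, 0.0],
                      [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
                      ["sick", "sports", "late"])
for label, score in ranking:
    print(f"label: {label}\t similarity: {score:.2e}")
print("top label:", ranking[0][0])
```

With real embeddings, `ranking[0][0]` would be the zero-shot prediction for a sentence. The epsilon guard only loosely mirrors the one `F.cosine_similarity` uses.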

In [22]:
for sentence in sentences:
  print("-"*50)
  print(sentence)
  predict_labels(sentence, labels)

--------------------------------------------------
Tom bought a ticket to witness his first ever football match in the indoor stadium. He went to the match and enjoyed a lot.
label: sports	 similarity: 1.06e-01
label: caring	 similarity: -1.88e-02
label: late	 similarity: -7.67e-02
label: anger	 similarity: -1.01e-01
label: sick	 similarity: -1.40e-01
--------------------------------------------------
Last night while returning from office, tom got drenched in the rain. The next morning, he had fever and had to take a day off from his work.
label: sick	 similarity: 1.96e-01
label: late	 similarity: 1.13e-01
label: anger	 similarity: 8.80e-02
label: sports	 similarity: -1.35e-01
label: caring	 similarity: -1.53e-01
--------------------------------------------------
Tom visits the old age home near Marathahalli every Saturday. He loves to spend time at the old age home by serving food, playing with them and making them laugh
label: caring	 similarity: 1.55e-01
label: anger	 similarity: 1