Installing Libraries & Packages - 

In [2]:
!pip install -q transformers

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax


## Twitter-roBERTa-base for Sentiment Analysis

Below I use a RoBERTa-base model trained on ~58M tweets and fine-tuned for sentiment analysis on the TweetEval benchmark. The model is intended for English text.

In [4]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

### Testing the model -

In [6]:
sample = "This was incredibly bad"

def roberta_senanalyis(inp):
    # Tokenize the input and return PyTorch tensors
    encoded_text = tokenizer(inp, return_tensors='pt')
    output = model(**encoded_text)
    # output[0] holds the logits; [0] selects the single sequence in the batch
    scores = softmax(output[0][0].detach().numpy())
    # Label order for this model: 0 = negative, 1 = neutral, 2 = positive
    scores_dict = {
        'negative' : scores[0],
        'neutral' : scores[1],
        'positive' : scores[2]
    }
    return scores_dict

# Run the model once and reuse the result instead of calling it three times
scores = roberta_senanalyis(sample)
neg, neu, pos = scores['negative'], scores['neutral'], scores['positive']
print(neg, neu, pos)

0.9769726 0.020274766 0.0027526724


The model assigns about 98% probability to the negative class for the sample input "This was incredibly bad".
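
To make the scoring step concrete, here is a minimal sketch of how raw logits turn into the three class probabilities. The logits are made-up illustrative values, not model output, and `stable_softmax` and `scores_from_logits` are hypothetical helper names; the notebook itself relies on `scipy.special.softmax`.

```python
import numpy as np

# Label order used in the notebook: index 0 = negative, 1 = neutral, 2 = positive
LABELS = ("negative", "neutral", "positive")


def stable_softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the maximum avoids overflow in exp() without changing the result
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def scores_from_logits(logits):
    """Pair each probability with its sentiment label."""
    probs = stable_softmax(logits)
    return {label: float(p) for label, p in zip(LABELS, probs)}


# Illustrative logits only (an assumption, not taken from the model)
example_logits = [3.0, -0.9, -2.9]
example_scores = scores_from_logits(example_logits)
top_label = max(example_scores, key=example_scores.get)
print(example_scores)
print("Predicted label:", top_label)
```

The largest logit always maps to the largest probability, so taking the label with the highest score is equivalent to taking the argmax of the logits.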

### Testing Hugging Face's Sentiment Analysis Pipeline - 

Pipelines are an easy way to use models for inference. They are objects that abstract away most of the library's complex code and offer a simple API for several tasks, including named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering.

In [7]:
sentiment_analysis = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [8]:
sentiment_analysis("I kind of liked that song.") # Positive sentiment

[{'label': 'POSITIVE', 'score': 0.999756395816803}]

In [9]:
sentiment_analysis("beautiful mess") # Questionable: an ambiguous phrase is labeled negative

[{'label': 'NEGATIVE', 'score': 0.826908528804779}]
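
The 0.83 score for "beautiful mess" is noticeably lower than the 0.9998 score above, which suggests one way to handle ambiguous inputs: treat low-confidence predictions as uncertain. Below is a minimal sketch, assuming a cutoff of 0.9 (an arbitrary value chosen for illustration) and a hypothetical helper named `label_with_threshold`. The dictionaries copy the two pipeline outputs shown above.

```python
def label_with_threshold(prediction, threshold=0.9):
    """Return the predicted label, or 'UNCERTAIN' when the score is below the cutoff.

    `threshold` is an illustrative assumption; tune it on your own data.
    """
    if prediction["score"] < threshold:
        return "UNCERTAIN"
    return prediction["label"]


# Outputs copied from the two pipeline calls above
predictions = [
    {"label": "POSITIVE", "score": 0.999756395816803},
    {"label": "NEGATIVE", "score": 0.826908528804779},
]

labels = [label_with_threshold(p) for p in predictions]
print(labels)  # ['POSITIVE', 'UNCERTAIN']
```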

## Grammatical Error Correction using Happy Transformer (built on Hugging Face Transformers) -

In [10]:
!pip install happytransformer

In [11]:
from happytransformer import HappyTextToText, TTSettings

In [12]:
happy_tt = HappyTextToText("T5", "vennify/t5-base-grammar-correction")

args = TTSettings(num_beams=5, min_length=1)

In [13]:
result = happy_tt.generate_text("grammar: This sentences has has bads grammar", args=args)

print(result.text)

This sentence has bad grammar.


In [15]:
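def compiler(ip):
  # Score the sentiment once and reuse the result instead of calling the model three times
  scores = roberta_senanalyis(ip)
  neg, neu, pos = scores['negative'], scores['neutral'], scores['positive']
  maxm = max(neg, neu, pos)
  # The "grammar: " prefix is the task prompt used with the grammar-correction model
  fixed = happy_tt.generate_text("grammar: {}".format(ip), args=args).text
  print("Corrected sentence: ", fixed)
  print("Mood: ", end='')
  # Map the highest-scoring class to an emoji
  if maxm == neg:
    print('😒')
  elif maxm == neu:
    print('😐')
  else:
    print('😀')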
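
The mood selection in `compiler` can also be written as a lookup table, which avoids comparing floats for equality. This is only a sketch of an alternative: `MOOD_EMOJI` and `mood_from_scores` are hypothetical names, and the scores are the ones printed earlier for "This was incredibly bad".

```python
# Hypothetical lookup table: sentiment label -> emoji used in the notebook
MOOD_EMOJI = {"negative": "😒", "neutral": "😐", "positive": "😀"}


def mood_from_scores(scores):
    """Return the emoji for the highest-scoring sentiment label."""
    # max() with key=scores.get returns the label whose score is largest;
    # on a tie it keeps the first label in the dict's insertion order
    top_label = max(scores, key=scores.get)
    return MOOD_EMOJI[top_label]


# Scores printed earlier for the sample sentence
sample_scores = {
    "negative": 0.9769726,
    "neutral": 0.020274766,
    "positive": 0.0027526724,
}
print("Mood:", mood_from_scores(sample_scores))  # Mood: 😒
```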


## Testing for mood & grammatical errors in input -

In [16]:
compiler('I mean we all sorts messed') # Negative

Corrected sentence:  I mean, we all sort of messed up.
Mood: 😒


In [18]:
compiler('What was Goin the clstroom was bad') # Negative

Corrected sentence:  What was going in the classroom was bad.
Mood: 😒


In [19]:
compiler('That was such a goood times') # Positive

Corrected sentence:  That was such a great time.
Mood: 😀


In [26]:
compiler('I will go check') # Neutral

Corrected sentence:  I will go check it out.
Mood: 😐
