In [2]:
import transformers
import torch
import multiprocessing as mp

# Sentiment Analysis

In [4]:
classifier = transformers.pipeline(task='sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')



In [5]:
classifier('hello how are you doing')

[{'label': 'POSITIVE', 'score': 0.9943325519561768}]

In [6]:
classifier('this is not fun')

[{'label': 'NEGATIVE', 'score': 0.9997977614402771}]
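Each call returns a list with one dict per input, holding a `label` and a `score`. When comparing many texts it can be handy to fold both into a single signed number. The helper below is a minimal sketch; `signed_score` is a hypothetical name, not part of `transformers`, and it assumes the two-label `POSITIVE`/`NEGATIVE` output shown above.

```python
def signed_score(prediction):
    """Map one prediction dict to a value in [-1, 1] (negative means NEGATIVE)."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]


# Predictions copied from the two cells above
outputs = [
    {"label": "POSITIVE", "score": 0.9943325519561768},
    {"label": "NEGATIVE", "score": 0.9997977614402771},
]
print([round(signed_score(p), 3) for p in outputs])
```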

In [37]:
del classifier

# Text Generation

In [7]:
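# distilgpt2 is a public model, so no access token is needed.
# Never hard-code a Hugging Face token in a notebook; revoke any token that has been shared.
generator = transformers.pipeline(task='text-generation', model='distilgpt2')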
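If a gated or private model does need a token, one option is to read it from the environment instead of pasting it into a cell. This is a minimal sketch: `get_hf_token` is a hypothetical helper, and it assumes the token has been exported as `HF_TOKEN` before the notebook was started.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset or empty."""
    return os.environ.get(env_var) or None
```

The result could then be passed as `token=get_hf_token()` when building a pipeline; `None` means no token is sent.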

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
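The warning above suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of doing that from Python is shown here; it is assumed to run in the first cell, before any tokenizer has been used, which is the safest place for it.

```python
import os

# Disable tokenizer-level parallelism so that forked worker processes cannot deadlock.
# Use "true" instead to keep parallelism and accept the risk the warning describes.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```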



In [13]:
generator("harry potter is such a", max_length=25, return_full_text=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'harry potter is such a sweet place that that he may even pick up a few. He is quite pleasant, and'}]

In [9]:
print('hello world')

hello world


In [38]:
del generator

# Zero-shot classification

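The pipeline assigns a piece of text to candidate labels that the model was never explicitly trained on. The default model, facebook/bart-large-mnli, is a natural language inference (NLI) model: each label is turned into a hypothesis (by default "This example is {label}.") and the labels are ranked by how strongly the text entails each hypothesis.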
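The ranking step can be illustrated without loading a model. The sketch below builds one hypothesis per label and applies a softmax over per-label entailment logits, which is how single-label scores are normalised to sum to 1. The logits are made-up numbers standing in for the NLI model's output, and `build_hypotheses` and `softmax` are hypothetical helpers, not `transformers` functions.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Normalise a list of logits into probabilities that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


example_labels = ["law", "medicine", "math"]
hypotheses = build_hypotheses(example_labels)

# Hypothetical entailment logits, one per (passage, hypothesis) pair
entailment_logits = [2.1, -0.4, -1.3]
scores = softmax(entailment_logits)

# Sort labels from most to least entailed, as in the pipeline output
ranked = sorted(zip(example_labels, scores), key=lambda pair: pair[1], reverse=True)
print(hypotheses)
print(ranked)
```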

In [17]:
zero_shot_classifier = transformers.pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [35]:
labels = ['english', 'math', 'law', 'engineering', 'medicine']

passages = [
    'I love to read books',
    'The patient has an aortic dissection',
    'The plaintiff is suing the defendant',
    'The model is not optimizing correctly due to being stuck in a saddle point',
    'The torque applied by the engine is 1000 newton-meters',
    'this one is unrelated to all the labels'
]

def get_result(passage):
    result = zero_shot_classifier(
        passage,
        candidate_labels=labels
    )
    return result

for passage in passages:
    result = get_result(passage)
    print(result)

{'sequence': 'I love to read books', 'labels': ['english', 'law', 'engineering', 'medicine', 'math'], 'scores': [0.600541353225708, 0.1184295192360878, 0.11539845168590546, 0.09796752035617828, 0.06766314804553986]}
{'sequence': 'The patient has an aortic dissection', 'labels': ['medicine', 'english', 'law', 'engineering', 'math'], 'scores': [0.7287752032279968, 0.14902187883853912, 0.08169705420732498, 0.025099463760852814, 0.015406393446028233]}
{'sequence': 'The plaintiff is suing the defendant', 'labels': ['law', 'english', 'engineering', 'medicine', 'math'], 'scores': [0.7353108525276184, 0.19169163703918457, 0.02753818966448307, 0.025707408785820007, 0.01975192315876484]}
{'sequence': 'The model is not optimizing correctly due to being stuck in a saddle point', 'labels': ['engineering', 'english', 'math', 'law', 'medicine'], 'scores': [0.34398236870765686, 0.2728816866874695, 0.2652459442615509, 0.0858307033777237, 0.03205932676792145]}
{'sequence': 'The torque applied by the eng
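Because the scores always sum to 1 across the candidate labels, even a passage unrelated to every label still gets a top label. One way to handle that is to reject weak winners with a cutoff. This is a minimal sketch: `top_label` is a hypothetical helper and the `0.5` threshold is an arbitrary assumption, not a recommended value. It relies on the labels being sorted by descending score, as in the output above.

```python
def top_label(result, min_score=0.5):
    """Return the best label, or None when its score falls below min_score."""
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= min_score else None


# Leading entries copied from two of the results printed above
books = {
    "sequence": "I love to read books",
    "labels": ["english", "law"],
    "scores": [0.600541353225708, 0.1184295192360878],
}
saddle_point = {
    "sequence": "The model is not optimizing correctly due to being stuck in a saddle point",
    "labels": ["engineering", "english"],
    "scores": [0.34398236870765686, 0.2728816866874695],
}
print(top_label(books), top_label(saddle_point))
```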

In [36]:
del zero_shot_classifier

# Summarization

In [3]:
# This model is fine-tuned on the CNN/DailyMail news dataset; "CNN" here is the news network, not a convolutional neural network.
summarizer = transformers.pipeline('summarization', model='facebook/bart-large-cnn')



In [4]:
article = """
Georgia’s president has called on protesters to use their vote in upcoming parliamentary elections to "reverse" the controversial foreign agents law passed by the country’s parliament on Tuesday.

President Salome Zourabichvili admitted to CNN's Christiane Amanpour that there are "many concerns" after the parliament voted in favor of the controversial foreign agents law.

"The way and the place where we can reverse all of this is the elections in October...And we have to use this mobilization of the society and this consolidation of the political parties to go and win those elections," Zourabichvili said.
The president, who has previously accused Russia of trying to bolster its influence over the former Soviet country, told CNN that she will symbolically veto the law.

Due to the setup of Georgia's parliamentary system, Zourabichvili holds mainly a figurehead role and her veto can be overruled by a simple parliamentary majority.

She called the law a complete "duplicate" of one passed by the Kremlin in 2012 which she said has been used to "completely oppress and repress the civil society" in Russia.

Russia is growing more and "more worried" by Georgia's rapprochement with the European Union, Zourabichvili remarked, referencing the recent decision by the bloc to grant Georgia candidate status.

Although roughly 20% of Georgian territory is currently controlled by Russia following the 2008 invasion, Georgia has not been "diverted" from "following its European path," the president added.

"It has not stopped us an inch and it will not stop us continuing," she added.
"""

In [5]:
summarizer(article, max_length=100, min_length=0, do_sample=False)

[{'summary_text': 'Georgia’s president has called on protesters to use their vote in upcoming parliamentary elections to "reverse" the controversial foreign agents law. The president, who has previously accused Russia of trying to bolster its influence over the former Soviet country, said she will symbolically veto the law.'}]