
# Hugging Face Transformers tutorial notebooks

In this notebook I'm collecting what I've learned from the Hugging Face tutorials on NLP and Transformers.
1. **Pipeline function**

The `pipeline()` function connects a model with its necessary preprocessing and postprocessing steps, so you can pass raw text in directly and get a readable result back.
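To make the idea concrete, here is a minimal plain-Python sketch of that pattern. `ToyPipeline` and its three stage functions are hypothetical stand-ins, not the Transformers API: a real pipeline uses a trained tokenizer for preprocessing and a neural network as the model.

```python
class ToyPipeline:
    """Hypothetical sketch of the pipeline pattern: preprocess -> model -> postprocess."""

    def __init__(self, preprocess, model, postprocess):
        self.preprocess = preprocess    # raw text -> model inputs (e.g. token ids)
        self.model = model              # model inputs -> raw outputs (e.g. scores)
        self.postprocess = postprocess  # raw outputs -> human-readable result

    def __call__(self, inputs):
        # Accept one string or a list of strings; always return a list of results.
        texts = [inputs] if isinstance(inputs, str) else list(inputs)
        return [self.postprocess(self.model(self.preprocess(t))) for t in texts]


# Toy stages (assumptions for illustration, not a real tokenizer or model).
VOCAB = {"happy": 0, "hate": 1}


def toy_preprocess(text):
    # Keep only words found in the toy vocabulary and map them to ids.
    return [VOCAB[w] for w in text.lower().split() if w in VOCAB]


def toy_model(token_ids):
    # Each positive word adds 1, each negative word subtracts 1.
    return token_ids.count(0) - token_ids.count(1)


def toy_postprocess(score):
    # A score of 0 or more counts as positive in this toy.
    return {"label": "POSITIVE" if score >= 0 else "NEGATIVE"}


toy = ToyPipeline(toy_preprocess, toy_model, toy_postprocess)
print(toy(["I'm happy about it", "I hate this feeling"]))
# [{'label': 'POSITIVE'}, {'label': 'NEGATIVE'}]
```

The real `pipeline()` also chooses a default model for the task and handles things like devices and batching, which this sketch leaves out.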

In [None]:
# Installation 

!pip install datasets transformers[sentencepiece]

In [2]:
# pipeline function import 

from transformers import pipeline

* **Sentiment Analysis**

The sentiment-analysis pipeline classifies the sentiment of a text and returns the predicted label (`POSITIVE` or `NEGATIVE`) together with a confidence score between 0 and 1.
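The score is a probability: the model outputs one raw score (logit) per label, and the postprocessing step applies a softmax and reports the most likely label with its probability. A minimal sketch of that step, using made-up logits rather than real model output:

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_scores(batch_logits, id2label):
    """Turn one row of logits per sentence into {'label', 'score'} dicts."""
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=lambda i: probs[i])
        results.append({"label": id2label[best], "score": probs[best]})
    return results


# Hypothetical logits for two sentences (not produced by a real model).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(to_label_scores([[-2.0, 3.0], [4.0, -1.0]], id2label))
```

For a two-label model the probability of the other label is simply `1 - score`.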

In [4]:
classifier = pipeline("sentiment-analysis")
classifier("I'm very much happy about that i gotta learn this course.")

[{'label': 'POSITIVE', 'score': 0.9998486638069153}]

In [5]:
classifier("I'm not so excited about it")

[{'label': 'NEGATIVE', 'score': 0.8566796779632568}]

In [6]:
classifier("I'm okay with the course contents it's upto the mark")

[{'label': 'POSITIVE', 'score': 0.9997645616531372}]

In [9]:
# Passing two sentences at once (as a list)
classifier(["I'm happy about it ",
           "I hate this feeling"])

[{'label': 'POSITIVE', 'score': 0.9998759627342224},
 {'label': 'NEGATIVE', 'score': 0.999440610408783}]

In [13]:
classifier(["It's an average movie,one time watch ", "But it's not bad though"])

[{'label': 'NEGATIVE', 'score': 0.5866876840591431},
 {'label': 'POSITIVE', 'score': 0.9993414282798767}]

In [21]:
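# The model has to be chosen when the pipeline is created: passing
# model= to the classifier call does not switch to a different model.
classifier = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")
classifier(["It's an average movie,one time watch ", "But it's not bad though"])

This model rates text on a scale of 1 to 5 stars, so its labels are star ratings (`'1 star'` up to `'5 stars'`) rather than `POSITIVE`/`NEGATIVE`.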

* **Zero-shot classification**

Zero-shot classification assigns a text to classes the model was not specifically trained on: you supply the candidate labels at call time, and the pipeline scores how well each label fits the text.
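Under the hood, the default zero-shot pipeline relies on a natural language inference (NLI) model: each candidate label is inserted into a hypothesis (by default "This example is {}."), the model scores whether the text entails that hypothesis, and in the default single-label mode the entailment scores are softmaxed across the labels, so the scores sum to 1. The sketch below mimics that flow; `toy_entailment_logit` is a hypothetical word-overlap stand-in for a real NLI model.

```python
import math


def zero_shot_scores(sequence, candidate_labels, entailment_logit,
                     hypothesis_template="This example is {}."):
    """Sketch of NLI-style zero-shot classification (single-label mode)."""
    # One hypothesis per candidate label.
    hypotheses = [hypothesis_template.format(label) for label in candidate_labels]
    logits = [entailment_logit(sequence, h) for h in hypotheses]
    # Softmax the entailment logits across labels.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Return labels sorted from most to least likely, like the pipeline output.
    ranked = sorted(zip(candidate_labels, scores), key=lambda p: p[1], reverse=True)
    return {
        "sequence": sequence,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


def toy_entailment_logit(premise, hypothesis):
    # Hypothetical "NLI model": number of words shared by premise and hypothesis.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().rstrip(".").split())
    return float(len(p & h))


print(zero_shot_scores("an education tutorial", ["comedy", "education"],
                       toy_entailment_logit))
```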

In [None]:
classifier = pipeline("zero-shot-classification")

In [35]:
classifier("This tutorial about the nlp models are really helpful for self learning and practice",
           candidate_labels = ['education','business','comedy'])

{'labels': ['education', 'business', 'comedy'],
 'scores': [0.8053352236747742, 0.13119320571422577, 0.06347152590751648],
 'sequence': 'This tutorial about the nlp models are really helpful for self learning and practice'}

In [39]:
classifier('He was such a witty person to talk, we had a great deal that closed today from him',
           candidate_labels = ['business',"comedy","politics"])

{'labels': ['business', 'comedy', 'politics'],
 'scores': [0.5627211332321167, 0.39812958240509033, 0.039149314165115356],
 'sequence': 'He was such a witty person to talk we had a great deal that closed today from him'}

* **Text generation**

You provide a prompt and the model auto-completes it by generating the remaining text.
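Generation is autoregressive: the model predicts the next token from everything generated so far, appends it, and repeats until it reaches `max_length` (counted in tokens, prompt included) or an end-of-sequence token. A toy sketch of that loop, assuming a hypothetical hand-written word table instead of a real language model and whole words instead of subword tokens; the next word is sampled at random:

```python
import random

# Hypothetical toy "language model": the words allowed to follow each word.
NEXT_WORDS = {
    "we": ["will", "can"],
    "will": ["learn"],
    "can": ["learn"],
    "learn": ["how", "to"],
    "how": ["to"],
    "to": ["build", "learn"],
    "build": ["models"],
}


def generate(prompt, max_length=10, num_return_sequences=1, seed=0):
    """Extend the prompt word by word until max_length words (prompt included)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        while len(words) < max_length:
            candidates = NEXT_WORDS.get(words[-1])
            if not candidates:
                break  # no known continuation: stop early, like an end token
            words.append(rng.choice(candidates))  # sample the next word
        outputs.append({"generated_text": " ".join(words)})
    return outputs


for out in generate("we can", max_length=6, num_return_sequences=3):
    print(out)
```

Because each next word is sampled, the returned sequences can differ from one another and from run to run.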

In [None]:
generator = pipeline("text-generation")

In [42]:
generator('In this course we will learn how to')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will learn how to create content for you, to use as your new website or blog.\n\n\nWhat do I need to do to get started?\n\nIf you are a designer or UX engineer, a web developer or someone'}]

In [45]:
# Tuning the output length (max_length, counted in tokens, prompt included) and the number of sequences returned
generator('Today was a better day for me because ',max_length=40,num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Today was a better day for me because \xa0they have been doing so very well, and I had the chance to pick up some of the great deals on my first day. I started reading about'},
 {'generated_text': 'Today was a better day for me because \xa0I was in love with it. \xa0I thought when I was an adult I would know how to read and write it better and to make it'},
 {'generated_text': "Today was a better day for me because \xa0I got all my groceries, a great diet, and a home.\xa0 My life wasn't perfect, but I learned a lot about myself . It"}]

In [50]:
# Using a particular model from the Hub for text generation
generator = pipeline("text-generation", model="distilgpt2")
generator("we can sit here tonight because", num_return_sequences=2, max_length=20)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'we can sit here tonight because the day is ready."\n\nLiz Pohl, President and'},
 {'generated_text': "we can sit here tonight because this isn't about us. We want to continue to make sure that"}]

* **Named entity recognition**

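Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Setting `grouped_entities=True` (called `aggregation_strategy="simple"` in newer versions of Transformers) merges the tokens that belong to the same entity, such as "Sundar" and "Pichai", into a single result.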
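The model actually labels individual tokens; grouping merges consecutive tokens that carry the same entity type. A simplified sketch of that merging step, assuming BIO-style tags such as `B-PER`/`I-PER` and averaging the token scores; this approximates the "simple" strategy and is not its exact implementation. The token predictions below are hypothetical.

```python
def group_entities(text, tokens):
    """Merge token-level BIO predictions into entity groups.

    Each token is a dict with 'entity' (e.g. 'B-PER', 'I-PER' or 'O'),
    'score', and character offsets 'start'/'end' into `text`.
    """
    groups = []
    current = None
    for tok in tokens:
        if tok["entity"] == "O":
            current = None  # an outside token closes any open entity
            continue
        prefix, etype = tok["entity"].split("-", 1)
        if current is not None and prefix == "I" and etype == current["entity_group"]:
            # Continuation of the current entity: extend its span and scores.
            current["end"] = tok["end"]
            current["scores"].append(tok["score"])
        else:
            # A 'B-' tag, an 'I-' tag after 'O', or a type change starts a new group.
            current = {"entity_group": etype, "start": tok["start"],
                       "end": tok["end"], "scores": [tok["score"]]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean token score
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


# Hypothetical token predictions for the start of the sentence used below.
toy_text = "My Name is Sundar Pichai"
toy_tokens = [
    {"entity": "O", "score": 0.99, "start": 0, "end": 2},
    {"entity": "B-PER", "score": 0.9, "start": 11, "end": 17},
    {"entity": "I-PER", "score": 0.8, "start": 18, "end": 24},
]
# One PER group covering "Sundar Pichai", scored with the mean of its two tokens.
print(group_entities(toy_text, toy_tokens))
```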

In [None]:
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")

In [54]:
ner("My Name is Sundar Pichai and I'm the CEO of Alphabet at California")

[{'end': 24,
  'entity_group': 'PER',
  'score': 0.95563614,
  'start': 11,
  'word': 'Sundar pichai'},
 {'end': 52,
  'entity_group': 'ORG',
  'score': 0.9937219,
  'start': 44,
  'word': 'Alphabet'},
 {'end': 66,
  'entity_group': 'LOC',
  'score': 0.51094395,
  'start': 56,
  'word': 'California'}]

* **Question answering**

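Question answering extracts the answer to a question from a given context. The answer is always a span of the context, located by the `start` and `end` character offsets in the output.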
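To find that span, the model gives every token a score for being the start and for being the end of the answer, and the pipeline picks the start/end pair with the highest combined probability, where the end may not come before the start and the span may not be too long. A minimal sketch, assuming whitespace tokens and made-up logits; the 15-token limit mirrors the pipeline's default `max_answer_len`, but everything else is simplified:

```python
import math
import re


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(context, offsets, start_logits, end_logits, max_answer_len=15):
    """Pick the answer span with the highest start_prob * end_prob.

    offsets[i] is the (char_start, char_end) of token i inside `context`.
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best_score, best_i, best_j = -1.0, 0, 0
    for i in range(len(offsets)):
        # The end may not come before the start, and spans are capped in length.
        for j in range(i, min(i + max_answer_len, len(offsets))):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    start, end = offsets[best_i][0], offsets[best_j][1]
    return {"answer": context[start:end], "score": best_score, "start": start, "end": end}


# Whitespace "tokens" and made-up logits (a real model uses subword tokens).
toy_context = "Simba appeared in 1994"
offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", toy_context)]
print(best_span(toy_context, offsets, [0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 0.0, 5.0]))
# answer '1994', start 18, end 22
```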

In [None]:
Question_answer = pipeline("question-answering")

In [58]:
Question_answer(question = "where do Sundar work",
context = "My Name is Sundar Pichai and I'm the CEO of Alphabet at California")

{'answer': 'California', 'end': 66, 'score': 0.910706639289856, 'start': 56}

In [64]:
Question_answer(question = ["Which company I'am working","What is my Job"],
context = "My Name is Sundar Pichai and I'm the CEO of Alphabet at California")

[{'answer': 'Alphabet', 'end': 52, 'score': 0.9826475977897644, 'start': 44},
 {'answer': 'CEO of Alphabet',
  'end': 52,
  'score': 0.11989560723304749,
  'start': 37}]

In [68]:
context= '''Simba is the protagonist of Disney's The Lion King franchise. Introduced in the 1994 film The Lion King, 
Walt Disney Animation's 32nd animated feature, the character subsequently appears in The Lion King'''

Question = ['Who is Simba','When Simba came',"Which movie have Simba character"]

In [69]:
Question_answer(question =Question,context = context)

[{'answer': "the protagonist of Disney's The Lion King",
  'end': 50,
  'score': 0.18819037079811096,
  'start': 9},
 {'answer': '1994', 'end': 84, 'score': 0.9721288084983826, 'start': 80},
 {'answer': 'The Lion King',
  'end': 50,
  'score': 0.42562785744667053,
  'start': 37}]