<a href="https://colab.research.google.com/github/RDGopal/IB9LQ0-GenAI/blob/main/Transformers_with_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HuggingFace Transformers

# Install Transformers

[Resource](https://huggingface.co/learn/llm-course/chapter1/1?fw=pt)


In [None]:
!pip install transformers

In [None]:
from IPython.display import display, Markdown

# Pipelines

The `pipeline()` function connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The model's predictions are postprocessed so you can make sense of them.

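The three steps above can be sketched without any model at all. This is a toy illustration only: the vocabulary, the weights, and the function names are invented for this example and have nothing to do with how a real Transformer tokenizes text or computes its scores.

```python
import math

# Hypothetical toy vocabulary and weights, invented purely for illustration.
VOCAB = {"<unk>": 0, "good": 1, "great": 2, "bad": 3, "difficult": 4}
# One (negative, positive) weight pair per token id.
WEIGHTS = [(0.0, 0.0), (-1.0, 1.0), (-2.0, 2.0), (1.5, -1.5), (1.0, -1.0)]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the 'model' understands."""
    return [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]


def model(token_ids):
    """Step 2: map token ids to one raw score (logit) per label."""
    neg = sum(WEIGHTS[i][0] for i in token_ids)
    pos = sum(WEIGHTS[i][1] for i in token_ids)
    return [neg, pos]


def postprocess(logits):
    """Step 3: softmax the logits and report the most likely label."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as pipeline() does for a real model."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("This is quite difficult"))
```

The returned dictionary has the same `label` and `score` keys that the sentiment-analysis pipeline produces, which is why the rest of this notebook can treat pipeline outputs as plain Python lists and dictionaries.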


Some of the currently available pipelines are:

* feature-extraction (get the vector representation of a text)

* fill-mask

* ner (named entity recognition)

* question-answering

* sentiment-analysis

* summarization

* text-generation

* translation

* zero-shot-classification

Models available on HuggingFace: [Models](https://huggingface.co/models)

# Sentiment Analysis

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")


In [None]:
classifier("This is quite difficult")

## Sentiment Analysis - Batch Processing

We read the first 100 rows of `sms_spam.csv` and compute a sentiment label and confidence score for each message.

In [None]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/RDGopal/IB9LQ0-GenAI/main/Data/sms_spam.csv', nrows=100)

In [None]:
df

In [None]:
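try:
    # Passing a list classifies all messages in one call
    results = classifier(df['text'].tolist())
except Exception as e:
    print(f"An error occurred: {e}")
else:
    # Only add the columns when classification succeeded;
    # otherwise `results` would be undefined here
    df['sentiment_label'] = [item['label'] for item in results]
    df['sentiment_score'] = [item['score'] for item in results]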

In [None]:
df

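The `score` returned by this classifier is the probability of the predicted label, so a high score can mean strongly positive or strongly negative. A minimal sketch of folding label and score into one signed value, assuming the model's two labels are `POSITIVE` and `NEGATIVE`; the helper name `signed_score` and the example values are made up for illustration and are not real model output.

```python
def signed_score(result):
    """Map a {'label', 'score'} dict to a value in [-1, 1]."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]


# Made-up results in the shape the classifier returns, not real model output.
example_results = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.75},
]
signed = [signed_score(r) for r in example_results]
print(signed)
```

Applied to the batch above, this could populate a new column (the name is hypothetical), for example `df['signed_sentiment'] = [signed_score(r) for r in results]`.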
# Zero-shot classification

In [None]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

In [None]:
classifier(
    "This is a discussion about world history",
    candidate_labels=["education", "politics", "business"],
)

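Zero-shot classification with an NLI model such as `facebook/bart-large-mnli` works by turning each candidate label into a hypothesis and asking whether the input text entails it. A minimal sketch of that pairing step, assuming the pipeline's default hypothesis template `"This example is {}."`; the helper `build_nli_pairs` is hypothetical and only mimics the idea, not the library's internals.

```python
def build_nli_pairs(sequence, candidate_labels, hypothesis_template="This example is {}."):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs(
    "This is a discussion about world history",
    ["education", "politics", "business"],
)
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

The model scores each pair for entailment, and the pipeline returns a dictionary with `sequence`, `labels`, and `scores`, with labels sorted from most to least likely.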
# Text generation

In [None]:
generator = pipeline('text-generation', model='gpt2')

In [None]:
output = generator(
    "When you are on a diet",
    max_length=50,
    num_return_sequences=1,
)
# Extract the generated text from the list of dictionaries
generated_texts = [item['generated_text'] for item in output]
# Join the generated texts into a single string
display(Markdown("\n".join(generated_texts)))

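By default the `generated_text` field contains the prompt followed by the model's continuation. A small sketch for keeping only the continuation; the helper `continuation_only` and the sample string are invented for illustration and are not real GPT-2 output.

```python
def continuation_only(prompt, generated_text):
    """Drop the prompt from the front of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


prompt = "When you are on a diet"
# Made-up text standing in for item['generated_text'].
sample = "When you are on a diet, planning meals ahead can help."
print(continuation_only(prompt, sample))
```

The text-generation pipeline also accepts `return_full_text=False`, which should have a similar effect without post-processing.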
# Fill mask

In [None]:
unmasker = pipeline("fill-mask", model="distilroberta-base")

In [None]:
unmasker("This course will teach you all about <mask> models.", top_k=5)

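# Named Entity Recognition

In [None]:
# aggregation_strategy="simple" merges word pieces into whole entities;
# it replaces the deprecated grouped_entities=True argument
ner = pipeline("ner", aggregation_strategy="simple", model="dbmdz/bert-large-cased-finetuned-conll03-english")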

In [None]:
ner("My name is Ram and I work at WBS in Coventry.")

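With grouped entities, each result is a dictionary that includes an `entity_group` (such as `PER`, `ORG`, or `LOC`) and the matched `word`. A minimal sketch of collecting entities by type; the helper `group_by_type` is hypothetical, and the example list is hand-written to show the shape, with only the keys used here.

```python
from collections import defaultdict


def group_by_type(entities):
    """Collect entity words under their entity_group label."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written entities in the grouped output shape; not real model output.
example_entities = [
    {"entity_group": "PER", "word": "Ram"},
    {"entity_group": "ORG", "word": "WBS"},
    {"entity_group": "LOC", "word": "Coventry"},
]
print(group_by_type(example_entities))
```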
# Question Answering

In [None]:
question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

In [None]:
question_answerer(
    question="Where do I work?",
    context="My name is Ram and I work at WBS in Coventry, UK",
)

In [None]:
question_answerer(
    question="Which city do I work in?",
    context="My name is Ram and I work at WBS in Coventry, UK",
)

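This kind of model is extractive: the answer is a span copied out of the context, and the result carries `start` and `end` character offsets alongside `answer` and `score`. A small sketch that checks the offsets against the context; the helper `span_matches` is hypothetical and the example result is built by hand, with the score left out.

```python
context = "My name is Ram and I work at WBS in Coventry, UK"


def span_matches(context, result):
    """Check that the start/end offsets reproduce the answer text."""
    return context[result["start"]:result["end"]] == result["answer"]


# Hand-built result in the pipeline's output shape; offsets are computed here,
# not taken from a model run.
start = context.find("Coventry")
example_result = {"answer": "Coventry", "start": start, "end": start + len("Coventry")}
print(span_matches(context, example_result))
```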
# Summarization

In [None]:
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

In [None]:
output = summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)
display(Markdown(output[0]['summary_text']))

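The triple-quoted input above carries its source indentation and line breaks into the model. A minimal sketch of collapsing that whitespace before summarizing, on the assumption that the extra spaces and newlines add nothing useful; the helper name `normalize_whitespace` is invented here.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(text.split())


raw = """
    America has changed dramatically during recent years.
    Other industrial countries at minimum maintain their output.
"""
print(normalize_whitespace(raw))
```

Note that this also merges paragraphs into one block of text, which is usually acceptable for a short passage like the one used here.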
# Translation

In [None]:
trans = pipeline("translation_en_to_fr", model="t5-base")

In [None]:
y = trans("good evening")
# The pipeline returns a list with one dict per input; show the translated text
y[0]['translation_text']




