<table>
    <tr>
        <td><img src="https://s3.amazonaws.com/media-p.slid.es/uploads/1485763/images/9060062/Header.png" width="300"/></td>
        <td>&nbsp;</td>
        <td>
            <h1 style="font-size:200%;color:blue;text-align:center"><FONT COLOR="blue">Hugging Face Pipeline</FONT></h1></td>
        <td>
            <tp><p style="font-size:99%;text-align:center">PLN 2024-1 </p></tp>
            <tp><p style="font-size:115%;text-align:center">Undergraduate MACC</p></tp>
            <tp><p style="font-size:115%;text-align:center">Prof. Fabián Sánchez</p></tp>
        </td>
    </tr>
</table>

# NLP with Hugging Face Transformers

This notebook demonstrates how to use the `pipeline` API from Hugging Face's `transformers` library to solve common NLP tasks using pre-trained models.


# <FONT SIZE=5 COLOR="purple"> 1. Load the Required Pipeline Function </FONT>

We import the `pipeline` function from Hugging Face Transformers. It is a high-level entry point for inference: given a task name (and, optionally, a specific model), it loads a pre-trained model together with its matching pre- and post-processing. It supports many tasks, such as text generation, image segmentation, automatic speech recognition, and document question answering.


In [None]:
from transformers import pipeline

# <FONT SIZE=5 COLOR="purple"> 2. Sentiment Analysis </FONT>
We use the `"sentiment-analysis"` pipeline, which by default loads a DistilBERT model fine-tuned on the SST-2 sentiment dataset. It classifies each input as `POSITIVE` or `NEGATIVE` and returns the label together with a confidence `score`.

In [None]:
classifier = pipeline("sentiment-analysis")
print(classifier("I am a bad student because I don't like to study"))
print(classifier("I like to go to the cinema"))
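Under the hood, a two-class sentiment model produces one raw score (logit) per class; the pipeline turns these into probabilities with a softmax and reports the most likely label with its probability as `score`. The sketch below reproduces only that post-processing step in plain Python. The logits and the label order are hypothetical stand-ins, not real SST-2 model outputs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, labels):
    """Return a pipeline-style result: the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical logits for one sentence; the label order is an assumption for this sketch.
labels = ["NEGATIVE", "POSITIVE"]
result = top_label([-1.2, 2.3], labels)
print(result)
```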

# <FONT SIZE=5 COLOR="purple"> 3. Text Generation</FONT>
First, we use the default text-generation pipeline (GPT-2), and then `mrm8488/spanish-gpt2`, a GPT-2 model for Spanish, to continue a given prompt with newly generated text.


In [None]:
generator = pipeline("text-generation")
generator('I am a math professor and',
          max_length=40,
          num_return_sequences=3,
          )
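Text generation is autoregressive: the model predicts the next token, appends it, and repeats. The `max_length` limit counts the prompt tokens too, so a longer prompt leaves room for fewer new tokens. The toy loop below illustrates this with a hypothetical next-word table standing in for GPT-2 and whitespace-separated words standing in for real sub-word tokens. It always picks the single most likely word (greedy decoding), whereas the pipeline may instead sample from the model's predicted distribution.

```python
# Hypothetical next-word table with made-up probabilities; it stands in
# for a real language model such as GPT-2.
NEXT_WORD = {
    "I": {"am": 0.9, "like": 0.1},
    "am": {"a": 0.8, "the": 0.2},
    "a": {"math": 0.6, "professor": 0.4},
    "math": {"professor": 0.7, "student": 0.3},
    "professor": {"and": 0.6, "who": 0.4},
    "and": {"I": 0.6, "a": 0.4},
}

def generate_greedy(prompt, max_length):
    """Extend the prompt one word at a time until it has max_length words.

    As with the pipeline's `max_length`, the limit includes the prompt
    itself, not only the newly generated words.
    """
    tokens = prompt.split()
    while len(tokens) < max_length:
        candidates = NEXT_WORD.get(tokens[-1])
        if not candidates:  # no known continuation: stop early
            break
        # Greedy decoding: always append the most probable next word.
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens)

print(generate_greedy("I am a math professor and", max_length=9))
```

In this sketch, a `max_length` no larger than the prompt adds nothing; the generation argument `max_new_tokens` avoids that pitfall by counting only newly generated tokens.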

In [None]:
generator = pipeline('text-generation', model='mrm8488/spanish-gpt2')
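generator("Su casa es muy bonita",  # "Your house is very pretty"
          max_length=30,
          num_return_sequences=2,
          do_sample=True,  # sample instead of always picking the most likely token
          top_k=0)  # 0 disables top-k filtering, so sampling uses the full distribution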
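`top_k` controls sampling: at each step only the `k` most probable next tokens are kept, their probabilities are renormalized, and one of them is drawn at random. In Hugging Face's generation code, `top_k=0` switches this filter off, so the model samples from its full distribution, which tends to give more varied but less predictable text. The sketch below applies the filter to a hypothetical four-word distribution using only the standard library; it illustrates the technique and is not the library's implementation.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; k <= 0 keeps all tokens."""
    if k <= 0:
        items = list(probs.items())
    else:
        items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in items)
    return {tok: p / total for tok, p in items}

def sample_next(probs, k, rng):
    """Draw one next token from the (optionally top-k filtered) distribution."""
    filtered = top_k_filter(probs, k)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-word probabilities after "Su casa es muy" ("Your house is very").
next_word = {"bonita": 0.5, "grande": 0.3, "vieja": 0.15, "azul": 0.05}

print(top_k_filter(next_word, 2))  # only "bonita" and "grande" survive
print(top_k_filter(next_word, 0))  # top_k=0: all four words remain
rng = random.Random(0)             # fixed seed for a reproducible draw
print(sample_next(next_word, 2, rng))
```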

# <FONT SIZE=5 COLOR="purple"> 4. Masked Language Modeling (Fill-Mask) </FONT>
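This task predicts the most likely tokens for a masked position in a sentence. By default, the `fill-mask` pipeline loads a RoBERTa-family model (`distilroberta-base`), whose mask token is `<mask>`; BERT-style models use `[MASK]` instead.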


In [None]:
unmasker = pipeline('fill-mask')
unmasker('This course teaches you about <mask> models.', top_k=2)
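For each candidate token at the `<mask>` position, the pipeline returns a dict with the token's probability (`score`), its id (`token`), its text (`token_str`), and the completed `sequence`; `top_k` sets how many candidates come back. The sketch below rebuilds that output format from a hypothetical set of candidates and made-up logits; the ids, words, and values are invented for illustration and do not come from RoBERTa.

```python
import math

def fill_mask_topk(template, candidate_logits, top_k=2, mask="<mask>"):
    """Rank candidate tokens for the masked position and fill in the template.

    candidate_logits maps (token_id, token_str) pairs to raw scores for the
    mask position. The softmax here runs over these few candidates only; a
    real model normalizes over its whole vocabulary.
    """
    m = max(candidate_logits.values())
    exps = {key: math.exp(v - m) for key, v in candidate_logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [
        {
            "score": e / total,
            "token": token_id,
            "token_str": token_str,
            "sequence": template.replace(mask, token_str, 1),
        }
        for (token_id, token_str), e in ranked
    ]

# Hypothetical token ids and logits; not real RoBERTa values.
logits = {(101, "language"): 3.1, (102, "machine"): 2.4, (103, "car"): -1.0}
for candidate in fill_mask_topk("This course teaches you about <mask> models.", logits):
    print(candidate)
```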

# <FONT SIZE=5 COLOR="purple"> 5. Named Entity Recognition (NER)  </FONT>
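This task identifies and classifies named entities in text, such as people (`PER`), organizations (`ORG`), and locations (`LOC`). Setting `aggregation_strategy="simple"` merges the sub-word tokens that belong to the same entity into a single result, instead of returning one prediction per token; it replaces the older `grouped_entities=True` flag.


In [None]:
ner = pipeline('ner', aggregation_strategy='simple')
ner("My name is Fabian and I work at a university in Bogotá")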
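Token classification models label each token, commonly with tags such as `B-PER` (beginning of a person name), `I-PER` (inside one), and `O` (outside any entity). Entity grouping, as enabled in the cell above, merges consecutive tokens of the same entity into one span. The sketch below shows a simplified version of that idea with hypothetical word-level tags for a new example sentence; the library's aggregation also handles sub-word pieces, scores, and character offsets.

```python
def group_entities(tokens, tags):
    """Merge consecutive tokens tagged with the same entity type.

    "B-X" starts an entity of type X, "I-X" continues one (or starts it if
    the previous token had another type), and "O" marks non-entity tokens.
    """
    entities = []
    current = None  # entity being built: {"entity_group": ..., "words": [...]}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or current is None or current["entity_group"] != ent_type:
            current = {"entity_group": ent_type, "words": [token]}
            entities.append(current)
        else:
            current["words"].append(token)
    return [{"entity_group": e["entity_group"], "word": " ".join(e["words"])} for e in entities]

# Hypothetical word-level tags for an example sentence.
tokens = ["Fabián", "Sánchez", "teaches", "at", "a", "university", "in", "Bogotá"]
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "B-LOC"]
print(group_entities(tokens, tags))
```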

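# <FONT SIZE=5 COLOR="purple"> 6. Question Answering  </FONT>
Extractive question answering finds the answer to a question inside a given context passage. By default, the pipeline uses a DistilBERT model fine-tuned on SQuAD and returns the answer text, a confidence `score`, and the character positions (`start`, `end`) of the answer in the context.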

In [None]:
question_answer = pipeline("question-answering")
question_answer(question="What is your name?",
                context="My name is John and I work at HuggingFace in Brooklyn")
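An extractive QA model outputs two scores for every context token: how likely it is to start the answer and how likely it is to end it. The answer is the span with the highest combined start and end score, subject to the end not preceding the start and a maximum answer length (the pipeline exposes this as `max_answer_len`). The sketch below searches for that span over hypothetical word-level logits; a real model works on sub-word tokens and maps the winning span back to the character offsets reported as `start` and `end`.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the best answer span, end inclusive.

    The span score start_logits[start] + end_logits[end] ranks spans the
    same way as multiplying softmaxed start and end probabilities.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # The end must not precede the start, and the span must stay short.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best

words = "My name is John and I work at HuggingFace in Brooklyn".split()
# Hypothetical logits, one per word; they favour "John" as start and end.
start_logits = [0.1, 0.2, 0.0, 4.0, 0.1, 0.0, 0.0, 0.1, 1.5, 0.0, 1.0]
end_logits = [0.0, 0.1, 0.0, 3.5, 0.2, 0.0, 0.0, 0.0, 1.2, 0.0, 1.1]
s, e, _ = best_span(start_logits, end_logits)
print(" ".join(words[s:e + 1]))
```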

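# <FONT SIZE=5 COLOR="purple"> 7. Summarization  </FONT>
Summarization produces a shorter version of a document or article that keeps its most important information. By default, the pipeline loads a DistilBART model fine-tuned on CNN/DailyMail news articles, which writes an abstractive summary: it can rephrase the input rather than only copy sentences from it.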

In [None]:
summarizer = pipeline("summarization")
summarizer("""World War II or the Second World War (1 September 1939 – 2 September 1945) was a global conflict between two major alliances:
the Allies and the Axis powers. The vast majority of the world's countries, including all the great powers, fought as part of
these military alliances. Many participating countries invested all available economic, industrial, and scientific capabilities
into this total war, blurring the distinction between civilian and military resources. Aircraft played a major role, enabling the
strategic bombing of population centres and delivery of the only two nuclear weapons ever used in war. It was by far the deadliest
conflict in history, resulting in 70–85 million fatalities. Millions died due to genocides, including the Holocaust,
as well as starvation, massacres, and disease. In the wake of Axis defeat, Germany, Austria, and Japan were occupied,
and war crime tribunals were conducted against German and Japanese leaders.""")

In [None]:
summarizer = pipeline("summarization")
summarizer("""America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.""")
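Summarization models accept a limited number of input tokens (1024 for the BART-based default checkpoint), so longer documents are often split into chunks that are summarized separately. The helper below is a hypothetical sketch that groups whole sentences into chunks of at most `max_words` words. Word count is only a rough proxy for the model's sub-word token count, and the default budget of 400 words is an assumption, so the budget should leave a safety margin.

```python
import re

def chunk_by_sentences(text, max_words=400):
    """Split text into chunks of whole sentences with at most max_words words each.

    A single sentence longer than max_words becomes its own (oversized) chunk.
    """
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "One two three. Four five. Six seven eight nine."
print(chunk_by_sentences(text, max_words=5))
```

Because the pipeline accepts a list of strings, the chunks of a long document can be passed in one call, for example `summarizer(chunk_by_sentences(long_text))`, which returns one summary per chunk (`long_text` is a placeholder).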

# <FONT SIZE=5 COLOR="purple"> 8. Translation </FONT>
Translation converts text from one language to another. Here we load `Helsinki-NLP/opus-mt-en-es`, a MarianMT model that translates English into Spanish.

In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("""World War II or the Second World War (1 September 1939 – 2 September 1945) was a global conflict between two major alliances:
the Allies and the Axis powers. The vast majority of the world's countries, including all the great powers, fought as part of
these military alliances. Many participating countries invested all available economic, industrial, and scientific capabilities
into this total war, blurring the distinction between civilian and military resources. Aircraft played a major role, enabling the
strategic bombing of population centres and delivery of the only two nuclear weapons ever used in war. It was by far the deadliest
conflict in history, resulting in 70–85 million fatalities. Millions died due to genocides, including the Holocaust,
as well as starvation, massacres, and disease. In the wake of Axis defeat, Germany, Austria, and Japan were occupied,
and war crime tribunals were conducted against German and Japanese leaders.
""")
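The Helsinki-NLP checkpoints follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}`, usually with two-letter language codes, so the translation direction is part of the model name (`en-es` translates English into Spanish). The helper below is a hypothetical convenience function that only builds such a name; not every language pair is published on the Hugging Face Hub, so the result should be checked before use.

```python
def opus_mt_model(source, target):
    """Build a Helsinki-NLP OPUS-MT checkpoint name for a language pair.

    Hypothetical helper: it only formats the string and does not check
    that a model with this name exists on the Hugging Face Hub.
    """
    for code in (source, target):
        if not code.isalpha():
            raise ValueError(f"unexpected language code: {code!r}")
    return f"Helsinki-NLP/opus-mt-{source}-{target}"

print(opus_mt_model("en", "es"))  # English to Spanish
print(opus_mt_model("es", "en"))  # reverse direction: Spanish to English
```

For example, `pipeline("translation", model=opus_mt_model("es", "en"))` would load a Spanish-to-English model, which could translate the Spanish output back for a quick round-trip check.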

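# <FONT SIZE=5 COLOR="purple"> 9. Zero-Shot Text Classification </FONT>
Text classification assigns a label or class to a text. The zero-shot variant used here chooses among `candidate_labels` supplied at inference time, so the labels do not have to match any seen during training. By default, the pipeline uses `facebook/bart-large-mnli`, a BART model fine-tuned for natural language inference (NLI).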

In [None]:
zero_shot = pipeline('zero-shot-classification')
zero_shot("This is a course about natural language processing",
          candidate_labels=['education', 'politics', 'business'])
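Zero-shot classification reframes the task as natural language inference: each candidate label is inserted into a hypothesis template (by default `"This example is {}."`), and the NLI model scores whether the input text entails that hypothesis. With `multi_label=False` (the default), the entailment scores are normalized with a softmax across the labels, so the returned `scores` sum to 1. The sketch below reproduces these two steps with hypothetical entailment logits in place of a real model; its `labels`/`scores` keys mirror the pipeline's output fields.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def zero_shot_scores(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    ranked = sorted(zip(labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in ranked],
            "scores": [score for _, score in ranked]}

labels = ["education", "politics", "business"]
print(build_hypotheses(labels))
# Hypothetical entailment logits, one per (text, hypothesis) pair.
print(zero_shot_scores(labels, [3.0, -1.0, 0.5]))
```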