![image.png](https://github.com/Christina1281995/demo-repo/blob/main/header_noteook_2.png?raw=true)

### An introduction to Hugging Face

-	Hugging Face is a leading open-source library and platform for NLP tasks

-	The library includes:

    -	pre-trained <b>models </b>
    -	<b>tools</b> for building custom NLP models
    -	easy-to-use <b>pipeline</b> approach
    -	support for a wide range of programming <b>languages</b> (Python, Java, JavaScript …)


Let's see what it actually looks like:

https://huggingface.co/ 

##### <img src="https://github.com/Christina1281995/demo-repo/blob/main/models.PNG?raw=true">

The "Models" page on Hugging Face is a <b>searchable database</b> of over 170,000 pre-trained models for natural language processing (NLP), computer vision, and speech recognition. <br><br>
The page allows users to search for models by <b>task, framework, language, and model architecture</b>. 
<br><br>
Additionally, the page provides a <b>leaderboard</b> of top-performing models for various tasks, as well as a section for community-contributed models. The models available on Hugging Face are designed to be easily integrated into existing projects and workflows, and the site provides tools and resources to help developers use them effectively.
<br>





<img src="https://github.com/Christina1281995/demo-repo/blob/main/modelcard1.PNG?raw=true" width="80%" align="right">

<br>
<br>

<b>Each model</b> has a profile page that includes:

- a description

- performance metrics

- a list of available implementations in various frameworks

- links to download the model 

- links to its source code on GitHub

If you look at the "Files and versions" tab of a model, you will see that all models share a few essential building blocks.

For the model itself:
- a `config.json` file, which stores the configuration of the model;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (when one is available);
- a `tf_model.h5` file, which is the TensorFlow checkpoint (when one is available).

For the tokenizer:
- a `special_tokens_map.json` file, saved with the tokenizer;
- a `tokenizer_config.json` file, saved with the tokenizer;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the tokenizer's vocabulary;
- possibly an `added_tokens.json` file, also saved with the tokenizer.

Thanks to those standardised files, we can use the `pipeline()` function to easily run any of those models!
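To make the split between model files and tokenizer files concrete, here is a minimal sketch that sorts a list of file names into the two groups described above. The helper `group_repo_files` and the example listing are made up for illustration, and because the vocabulary files can be named "or similar", the sets below are not exhaustive.

```python
# Standard file names taken from the two lists above (not exhaustive).
MODEL_FILES = {"config.json", "pytorch_model.bin", "tf_model.h5"}
TOKENIZER_FILES = {
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
    "added_tokens.json",
}


def group_repo_files(filenames):
    """Split a repository's file names into model, tokenizer, and other files."""
    groups = {"model": [], "tokenizer": [], "other": []}
    for name in filenames:
        if name in MODEL_FILES:
            groups["model"].append(name)
        elif name in TOKENIZER_FILES:
            groups["tokenizer"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Hypothetical listing, similar to what a "Files and versions" tab might show.
example_files = [
    "README.md",
    "config.json",
    "merges.txt",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "vocab.json",
]
groups = group_repo_files(example_files)
print(groups)
```

For a real repository, the list of file names could come from `huggingface_hub.list_repo_files(repo_id)`, which needs network access.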

##### Pipeline

The [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline) is the easiest and fastest way to use a pretrained model for inference.

There are <b>three main steps</b> involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.


<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg" width="60%">
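The three steps can be illustrated with a toy example in plain Python. This is only a sketch: the vocabulary, the hand-picked weights, and the function names `preprocess`, `model`, and `postprocess` are hypothetical stand-ins for a real tokenizer and neural network, not what `transformers` does internally.

```python
import math

# Hypothetical vocabulary and weights: stand-ins for a real tokenizer and model.
VOCAB = {"[UNK]": 0, "today": 1, "was": 2, "just": 3, "terrible": 4, "great": 5}
# One (negative, positive) weight pair per token id, hand-picked for illustration.
WEIGHTS = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (3.0, -3.0), (-3.0, 3.0)]
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Step 1: turn raw text into token ids the 'model' understands."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in cleaned.split()]


def model(token_ids):
    """Step 2: map token ids to one raw score (logit) per class."""
    negative = sum(WEIGHTS[i][0] for i in token_ids)
    positive = sum(WEIGHTS[i][1] for i in token_ids)
    return [negative, positive]


def postprocess(logits):
    """Step 3: apply a softmax and report the most likely label with its score."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": ID2LABEL[best], "score": probs[best]}]


token_ids = preprocess("Today was just terrible!")
logits = model(token_ids)
result = postprocess(logits)
print(token_ids, logits, result)
```

In the real library, these roles are played by a tokenizer (for example `AutoTokenizer`), a model (for example `AutoModelForSequenceClassification`), and a softmax over the model's logits.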


By <b>default</b>, the `sentiment-analysis` pipeline selects a pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model is used and does not need to be downloaded again.

```python
from transformers import pipeline

# With no model specified, the default sentiment-analysis model is downloaded and cached
classifier = pipeline("sentiment-analysis")
classifier("Today was just terrible!!")
```

| **Task**                     | **Description**                                                                                              | **Modality**    | **Pipeline identifier**                         |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|-------------------------------------------------|
| Text classification          | assign a label to a given sequence of text                                                                   | NLP             | `pipeline(task="sentiment-analysis")`           |
| Text generation              | generate text given a prompt                                                                                 | NLP             | `pipeline(task="text-generation")`              |
| Summarization                | generate a summary of a sequence of text or document                                                         | NLP             | `pipeline(task="summarization")`                |
| Image classification         | assign a label to an image                                                                                   | Computer vision | `pipeline(task="image-classification")`         |
| Image segmentation           | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | `pipeline(task="image-segmentation")`           |
| Object detection             | predict the bounding boxes and classes of objects in an image                                                | Computer vision | `pipeline(task="object-detection")`             |
| Audio classification         | assign a label to some audio data                                                                            | Audio           | `pipeline(task="audio-classification")`         |
| Automatic speech recognition | transcribe speech into text                                                                                  | Audio           | `pipeline(task="automatic-speech-recognition")` |
| Visual question answering    | answer a question about the image, given an image and a question                                             | Multimodal      | `pipeline(task="vqa")`                          |
| Document question answering  | answer a question about a document, given an image and a question                                            | Multimodal      | `pipeline(task="document-question-answering")`  |
| Image captioning             | generate a caption for a given image                                                                         | Multimodal      | `pipeline(task="image-to-text")`                |



```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("Today was just terrible!")
```

Output:

```
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'label': 'NEGATIVE', 'score': 0.9996246099472046}]
```
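The warning in the output suggests naming the model and revision explicitly. A minimal sketch of how that might look is shown below; the model name and revision are copied from the warning message, while `PINNED` and `build_pinned_classifier` are names made up for this example.

```python
# Model name and revision copied from the warning message above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_pinned_classifier():
    """Create the classifier with an explicit model and revision."""
    # Imported here so that defining the function does not require transformers.
    from transformers import pipeline

    return pipeline(**PINNED)


# Usage (downloads the model on first use, so it is left commented out here):
# classifier = build_pinned_classifier()
# classifier("Today was just terrible!")
```

Pinning both values means that a later change to the library's default model should not silently change your results.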

<img src="https://github.com/Christina1281995/demo-repo/blob/main/task.PNG?raw=true" align="right" width="30%">
If we don't just want to use the default model, we can also choose a particular model from the Hub. 

All we need to do is go to the [Model Hub](https://huggingface.co/models) and click on the corresponding tag on the left to display only the supported models for that task. You should get to a page like this one.

```python

from transformers import pipeline

classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment")
classifier("Today was just terrible!")

```


```python
classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment")
classifier("Today was just terrible!")
```

Output:

```
Downloading (…)lve/main/config.json: 100%|██████████| 747/747 [00:00<00:00, 373kB/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Downloading pytorch_model.bin: 100%|██████████| 499M/499M [00:10<00:00, 49.8MB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 7.42MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 4.90MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 150/150 [00:00<00:00, 49.9kB/s]

[{'label': 'LABEL_0', 'score': 0.9787670373916626}]
```
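Unlike the default model, this one returns a generic label (`LABEL_0`) rather than a readable name. Going by the model card for `cardiffnlp/twitter-roberta-base-sentiment`, the labels 0, 1, and 2 stand for negative, neutral, and positive. The sketch below applies that mapping to the output; `LABEL_NAMES` and `with_readable_labels` are hypothetical helper names, and the mapping should be checked against the model card of whichever model you use.

```python
# Mapping as documented on the cardiffnlp/twitter-roberta-base-sentiment model card
# (an assumption here; other models use different label schemes).
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def with_readable_labels(results):
    """Return a copy of the pipeline output with generic labels replaced by names."""
    return [
        {**item, "label": LABEL_NAMES.get(item["label"], item["label"])}
        for item in results
    ]


# Output copied from the cell above.
raw = [{"label": "LABEL_0", "score": 0.9787670373916626}]
readable = with_readable_labels(raw)
print(readable)
```

Labels that are not in the mapping are passed through unchanged, so the helper does not hide unexpected output.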