# Hugging Face Introduction

In this part, we will learn the basics of Hugging Face, a popular ecosystem of open-source libraries and pre-trained models for NLP tasks.

We will focus on two of its libraries: `transformers` and `datasets`.

## Installation

To get started with Hugging Face, you need to install the following libraries:

* Transformers - used to load pre-trained models
* Datasets - used to load datasets

```bash
pip install transformers datasets
```

Or with conda:

```bash
conda install -c conda-forge transformers
```
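
One quick way to confirm that the installation worked is to query the installed package versions. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example, and `torch` is checked as well because the notebook imports it later on.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the libraries this notebook relies on.
for name in ("transformers", "datasets", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If any entry prints `not installed`, rerun the matching install command before continuing.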

In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.44.2-py3-none-any.whl.metadata (43 kB)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers)
  Downloading huggingface_hub-0.24.6-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2024.7.24-cp310-cp310-win_amd64.whl.metadata (41 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.4.5-cp310-none-win_amd64.whl.metadata (3.9 kB)
Collecting tokenizers<0.20,>=0.19 (from transformers)
  Downloading tokenizers-0.19.1-cp310-none-win_amd64.whl.metadata (6.9 kB)
Downloading transformers-4.44.2-py3-none-any.whl (9.5 MB)

## Start with Transformers

In [2]:
from transformers import pipeline

import torch
import torch.nn.functional as F

Create a classifier with the `pipeline` function.

Note that when creating a pipeline we need to specify the task we want to perform, e.g. `text-classification` (for which `sentiment-analysis` is an alias).

API: https://huggingface.co/docs/transformers/v4.44.2/en/main_classes/pipelines#transformers.pipeline
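
Besides the task name, `pipeline` also accepts `model` and `device` arguments, so the checkpoint and hardware can be chosen explicitly instead of relying on defaults. The following is a minimal sketch, not the only way to do it: the helper name `build_sentiment_classifier` is hypothetical, and the checkpoint is the one the library selects by default for this task.

```python
# Checkpoint that `pipeline` picks by default for sentiment analysis.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"


def build_sentiment_classifier(device: int = -1):
    """Create a sentiment pipeline with an explicit checkpoint.

    device=-1 keeps the model on CPU; device=0 selects the first GPU.
    """
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, device=device)


# Usage (downloads the checkpoint on first call):
# classifier = build_sentiment_classifier()
# print(classifier("We are very happy to show you the 🤗 Transformers library."))
```

Pinning the model name makes runs reproducible, and passing `device` lets the pipeline use a GPU when one is available.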

In [3]:
classifier = pipeline('sentiment-analysis')
# classify the sentiment of the given sentence
res = classifier("We are very happy to show you the 🤗 Transformers library.")

print(res)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'POSITIVE', 'score': 0.9997795224189758}]
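
The `score` in this output is a probability: for a single-label model like this one, the pipeline applies a softmax to the model's raw logits and reports the most likely label. The dependency-free sketch below illustrates that step. The logit values are invented for illustration and are not the model's actual outputs; on tensors, `torch.nn.functional.softmax` performs the same computation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]  # label order used by the SST-2 checkpoint
logits = [-4.0, 4.4]               # hypothetical raw scores, one per label

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the top label
print({"label": labels[best], "score": probs[best]})
```

A large gap between the two logits is what produces a score very close to 1.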


In [None]:
# Let's try multiple sentences
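
A pipeline can be called with a list of strings, in which case it returns one result dictionary per input. The sketch below assumes that behaviour and wraps it in a small hypothetical helper, `classify_many`, which pairs each sentence with its prediction; the second sentence is an illustrative addition.

```python
def classify_many(classifier, sentences):
    """Run `classifier` on a list of sentences and pair each with its result."""
    results = classifier(sentences)  # one dict per input sentence
    return list(zip(sentences, results))


sentences = [
    "We are very happy to show you the 🤗 Transformers library.",
    "We hope you don't hate it.",
]

# Usage with the classifier created earlier in the notebook:
# for sentence, result in classify_many(classifier, sentences):
#     print(f"{result['label']:<8} {result['score']:.4f}  {sentence}")
```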