# Hugging Face - Naas drivers integration
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Naas/Naas_NLP_Examples.ipynb" target="_parent"><img src="https://img.shields.io/badge/-Open%20in%20Naas-success?labelColor=000000&logo="/></a>

In this notebook, you will explore the Hugging Face transformers package with minimal technical knowledge, thanks to the Naas low-code drivers.<br>
Hugging Face's transformers is a widely used Python library that provides pretrained models for a variety of natural language processing (NLP) tasks.

**Tags**: #nlp #huggingface #api #models #transformers

**Authors**: [Gagan Bhatia](https://www.linkedin.com/in/gbhatia30/), [Jeremy Ravenel](https://www.linkedin.com/in/j%C3%A9r%C3%A9my-ravenel-8a396910/), [Thomas Parenteau](https://www.linkedin.com/in/thomas-parenteau-0570b120a/)

## How it works
Calls to the Naas Hugging Face driver follow this format:
```
huggingface.get(task, model, tokenizer)(inputs)
```
The supported tasks are listed below, with the model used for each task demonstrated in this notebook:

- text-generation (model: gpt2)
- summarization (model: t5-small)
- fill-mask (model: distilroberta-base)
- text-classification (model: distilbert-base-uncased-finetuned-sst-2-english)
- feature-extraction (model: distilbert-base-cased)
- token-classification (model: dslim/bert-base-NER)
- question-answering
- translation

Under the hood, the driver relies on models from the [Hugging Face model hub](https://huggingface.co/models).
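The two-step call shape, where `get(task, model, tokenizer)` returns a function that is then applied to the inputs, can be illustrated with a stdlib-only sketch. `DEFAULT_MODELS`, `get` and the stub it returns are hypothetical: they mimic the calling convention only and are not the naas_drivers implementation, which actually loads and runs the model.

```python
from typing import Callable, Dict, Optional

# Hypothetical defaults, copied from the task list above (not read from naas_drivers).
DEFAULT_MODELS: Dict[str, str] = {
    "text-generation": "gpt2",
    "summarization": "t5-small",
    "fill-mask": "distilroberta-base",
    "text-classification": "distilbert-base-uncased-finetuned-sst-2-english",
    "feature-extraction": "distilbert-base-cased",
    "token-classification": "dslim/bert-base-NER",
}


def get(task: str, model: Optional[str] = None,
        tokenizer: Optional[str] = None) -> Callable[[str], dict]:
    """Hypothetical stand-in for huggingface.get: resolve the configuration
    first, then return a function that is applied to the inputs."""
    model = model or DEFAULT_MODELS.get(task)
    if model is None:
        raise ValueError(f"no default model for task {task!r}; pass model= explicitly")
    # Assumption: the tokenizer defaults to the same identifier as the model.
    tokenizer = tokenizer or model

    def run(inputs: str) -> dict:
        # A real driver would run inference here; this stub only echoes
        # the resolved configuration so the call shape can be inspected.
        return {"task": task, "model": model, "tokenizer": tokenizer, "inputs": inputs}

    return run


summarize = get("summarization")
print(summarize("Some long text to summarize"))
```

In this sketch, separating configuration from inputs means the returned function can be reused on several inputs without repeating the task and model arguments.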

In [2]:
from naas_drivers import huggingface

## Text Generation

In [3]:
huggingface.get("text-generation", model="gpt2", tokenizer="gpt2")("What is the most important thing in your life right now?")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "What is the most important thing in your life right now?\n\nI don't really see something in life right now but what's the most important thing in your life right now? I think there's some important things to think about right now."}]
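As the output above shows, the result is a list with one dictionary per generated sequence, and its generated_text starts with the prompt itself. A small helper can strip that echo. `continuation` is a hypothetical name, not part of naas_drivers. The transformers text-generation pipeline also accepts `return_full_text=False`, but whether the Naas driver forwards that argument is not shown in this notebook.

```python
from typing import Dict, List

prompt = "What is the most important thing in your life right now?"

# Output copied from the text-generation cell above.
result: List[Dict[str, str]] = [{'generated_text': "What is the most important thing in your life right now?\n\nI don't really see something in life right now but what's the most important thing in your life right now? I think there's some important things to think about right now."}]


def continuation(result: List[Dict[str, str]], prompt: str) -> str:
    """Return only the newly generated text, without the echoed prompt."""
    text = result[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Drop the blank lines the model emitted right after the prompt.
    return text.strip()


print(continuation(result, prompt))
```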

## Text Summarization
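Summarize the given text. The maximum summary length (max_length) is set to 200 tokens; because the input here is shorter than that, transformers prints a warning suggesting a smaller value.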

In [4]:
huggingface.get("summarization", model="t5-small", tokenizer="t5-small")('''

There will be fewer and fewer jobs that a robot cannot do better. 
What to do about mass unemployment this is gonna be a massive social challenge and 
I think ultimately we will have to have some kind of universal basic income.

I think some kind of a universal basic income is going to be necessary 
now the output of goods and services will be extremely high 
so with automation they will they will come abundance there will be or almost everything will get very cheap.

The harder challenge much harder challenge is how do people then have meaning like a lot of people 
they find meaning from their employment so if you don't have if you're not needed if 
there's not a need for your labor how do you what's the meaning if you have meaning 
if you feel useless these are much that's a much harder problem to deal with. 

''')

Your max_length is set to 200, but you input_length is only 183. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


[{'summary_text': 'there will be fewer and fewer jobs that a robot cannot do better . what to do about mass unemployment this is gonna be a massive social challenge . we will have to have some kind of universal basic income .'}]
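The t5-small summary above comes back lowercase, with a space before each period ("do better . what"). A minimal cleanup sketch, using only regular expressions, removes those spaces and capitalizes the start of each sentence. `tidy_summary` is a hypothetical helper; it does not restore other capitalization such as proper nouns.

```python
import re


def tidy_summary(text: str) -> str:
    """Clean up t5-style output: drop spaces before punctuation and
    capitalize the first letter of each sentence."""
    # "do better ." -> "do better."
    text = re.sub(r"\s+([.,;:!?])", r"\1", text.strip())
    # Uppercase a lowercase letter at the very start or after ". ", "! ", "? ".
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)


# Summary text copied from the output above.
raw = ("there will be fewer and fewer jobs that a robot cannot do better . "
       "what to do about mass unemployment this is gonna be a massive social challenge . "
       "we will have to have some kind of universal basic income .")
print(tidy_summary(raw))
```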

## Text Classification
Basic sentiment analysis on a text.<br>
Returns a "label" (POSITIVE or NEGATIVE for this SST-2 model) and a "score" between 0 and 1 giving the model's confidence in that label.

In [5]:
huggingface.get("text-classification", 
        model="distilbert-base-uncased-finetuned-sst-2-english",
        tokenizer="distilbert-base-uncased-finetuned-sst-2-english")('''

It was a weird concept. Why would I really need to generate a random paragraph? 
Could I actually learn something from doing so? 
All these questions were running through her head as she pressed the generate button. 
To her surprise, she found what she least expected to see.

''')

[{'label': 'POSITIVE', 'score': 0.7975088357925415}]
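For a two-class model like this one, the score is the softmax probability of the winning label, which is why it lies between 0 and 1 rather than between -1 and 1. The sketch below reproduces that step with the standard library. The label order and the logits are assumptions made up for illustration, not values read from the model.

```python
import math
from typing import Dict, List, Union

# Assumed label order for a two-class SST-2 head (index 0 = NEGATIVE, 1 = POSITIVE).
LABELS = ["NEGATIVE", "POSITIVE"]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: List[float]) -> Dict[str, Union[str, float]]:
    """Return the most probable label and its probability, in the same
    {"label": ..., "score": ...} shape as the output above."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": LABELS[best], "score": probs[best]}


# Made-up logits, for illustration only.
print(top_label([-0.7, 0.67]))
```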

## Fill Mask

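Fill in the blank (`<mask>`) in a sentence; the model returns several ranked proposals. <br>
Each proposal has a score (the model's confidence), a token (the ID of the proposed word in the model's vocabulary), a token_str (the proposed word itself) and a sequence (the full sentence with the blank filled in).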

In [6]:
huggingface.get("fill-mask",
        model="distilroberta-base",
        tokenizer="distilroberta-base")('''

It was a beautiful <mask>.

''')

[{'sequence': '\n\nIt was a beautiful sunset.\n\n',
  'score': 0.09137973934412003,
  'token': 18820,
  'token_str': ' sunset'},
 {'sequence': '\n\nIt was a beautiful day.\n\n',
  'score': 0.07021960616111755,
  'token': 183,
  'token_str': ' day'},
 {'sequence': '\n\nIt was a beautiful sight.\n\n',
  'score': 0.06246931850910187,
  'token': 6112,
  'token_str': ' sight'},
 {'sequence': '\n\nIt was a beautiful night.\n\n',
  'score': 0.05541388317942619,
  'token': 363,
  'token_str': ' night'},
 {'sequence': '\n\nIt was a beautiful evening.\n\n',
  'score': 0.051386501640081406,
  'token': 1559,
  'token_str': ' evening'}]
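The scores above are the five highest probabilities from a softmax over the model's whole vocabulary at the masked position, so they sum to only about a third. The sketch below extracts the proposed words from the output shown above and checks that sum. `ranked_words` is a hypothetical helper, not part of naas_drivers.

```python
from typing import Dict, List, Tuple, Union

# Output copied from the fill-mask cell above.
proposals: List[Dict[str, Union[str, float, int]]] = [
    {'sequence': '\n\nIt was a beautiful sunset.\n\n', 'score': 0.09137973934412003, 'token': 18820, 'token_str': ' sunset'},
    {'sequence': '\n\nIt was a beautiful day.\n\n', 'score': 0.07021960616111755, 'token': 183, 'token_str': ' day'},
    {'sequence': '\n\nIt was a beautiful sight.\n\n', 'score': 0.06246931850910187, 'token': 6112, 'token_str': ' sight'},
    {'sequence': '\n\nIt was a beautiful night.\n\n', 'score': 0.05541388317942619, 'token': 363, 'token_str': ' night'},
    {'sequence': '\n\nIt was a beautiful evening.\n\n', 'score': 0.051386501640081406, 'token': 1559, 'token_str': ' evening'},
]


def ranked_words(proposals) -> List[Tuple[str, float]]:
    # token_str keeps RoBERTa's leading space (' sunset'); strip it for display.
    return [(str(p["token_str"]).strip(), round(float(p["score"]), 3)) for p in proposals]


total = sum(float(p["score"]) for p in proposals)
print(ranked_words(proposals))
print(f"probability mass covered by the top {len(proposals)}: {total:.3f}")
```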

## Feature extraction
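This generates word embeddings: each token of the input text is turned into a vector of numbers.<br>
The output is a nested list indexed as [input][token][feature]; with distilbert-base-cased, each token vector has 768 values.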

In [7]:
huggingface.get("feature-extraction", model="distilbert-base-cased", tokenizer="distilbert-base-cased")("Life is a super cool thing")

Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertModel: ['vocab_projector.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[[[0.3962975740432739,
   0.10953986644744873,
   -0.1044691950082779,
   -0.3824445307254791,
   -0.1948671042919159,
   -0.14558309316635132,
   0.3585171103477478,
   -0.12952859699726105,
   0.14789840579032898,
   -1.1870355606079102,
   -0.2329632043838501,
   0.1364167481660843,
   -0.16243140399456024,
   -0.03428542613983154,
   -0.49253687262535095,
   0.09349807351827621,
   0.12387130409479141,
   0.18162520229816437,
   -0.0539383664727211,
   -0.14095546305179596,
   0.1303362250328064,
   -0.24430859088897705,
   0.5100507140159607,
   -0.28827446699142456,
   0.11686251312494278,
   0.026834849268198013,
   0.18902577459812164,
   0.15868628025054932,
   -0.13757330179214478,
   0.4147922098636627,
   0.02030094340443611,
   0.22222192585468292,
   -0.06169242039322853,
   0.041600603610277176,
   -0.29421865940093994,
   0.15584172308444977,
   0.004737033974379301,
   -0.2527780532836914,
   -0.001537596806883812,
   -0.3084218204021454,
   -0.5496448278427124,
   ...
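Because the output gives one vector per token, comparing whole sentences needs a single vector per input. Mean pooling, sketched below, is one common choice (an assumption here, not something the driver does for you): average the token vectors, then compare sentences with cosine similarity. This simple version averages every position, including any special tokens the tokenizer adds. `mean_pool` and `cosine_similarity` are hypothetical helpers.

```python
import math
from typing import List


def mean_pool(features: List[List[List[float]]]) -> List[float]:
    """Average the token vectors of the first input into one sentence vector.
    `features` is indexed [input][token][feature], like the output above."""
    tokens = features[0]
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # 1.0 for vectors pointing the same way, 0.0 for orthogonal vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-token, 3-feature "embedding" for illustration (real vectors have 768 features).
toy = [[[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]]
print(mean_pool(toy))
```

With the driver, the nested list returned by the feature-extraction call can be passed to mean_pool directly, and two pooled sentence vectors compared with cosine_similarity.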

## Token classification
Token classification here means named entity recognition (NER): the model detects names of people, locations, organizations and other (miscellaneous) entities in the text.<br>

| Label  | Description                                                                  |
|--------|------------------------------------------------------------------------------|
| O      | Outside of a named entity                                                    |
| B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MISC | Miscellaneous entity                                                         |
| B-PER  | Beginning of a person’s name right after another person’s name               |
| I-PER  | Person’s name                                                                |
| B-ORG  | Beginning of an organization right after another organization                |
| I-ORG  | Organization                                                                 |
| B-LOC  | Beginning of a location right after another location                         |
| I-LOC  | Location                                                                     |


Full documentation: [dslim/bert-base-NER](https://huggingface.co/dslim/bert-base-NER).<br>
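Per-token tags like those in the table are easier to use once merged into entity spans. The sketch below does this for a hypothetical list of (word, tag) pairs: a B- tag always opens a new entity, and an I- tag continues the current entity only if it has the same type. That rule accepts both the scheme in the table (B- only between adjacent same-type entities) and the common variant where every entity starts with B-. Real pipeline output is a list of dictionaries and may split words into WordPiece sub-tokens, so it would need mapping to this input format first; `group_entities` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (word, tag) pairs into (entity_type, text) spans."""
    entities: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    for word, tag in tagged:
        if tag == "O":
            # Outside any entity: close the current span.
            current_type = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or etype != current_type:
            # Start a new entity.
            entities.append((etype, [word]))
        else:
            # I- tag of the same type: extend the current entity.
            entities[-1][1].append(word)
        current_type = etype
    return [(etype, " ".join(words)) for etype, words in entities]


# Hypothetical tags for the example sentence below (not actual model output).
tagged = [("My", "O"), ("name", "O"), ("is", "O"), ("Wolfgang", "B-PER"),
          ("and", "O"), ("I", "O"), ("live", "O"), ("in", "O"), ("Berlin", "B-LOC")]
print(group_entities(tagged))
```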

In [None]:
huggingface.get("token-classification", model="dslim/bert-base-NER", tokenizer="dslim/bert-base-NER")('''

My name is Wolfgang and I live in Berlin

''')