# Using transformers with Hugging Face `AutoClasses`
The `AutoClasses` let us load the configuration, tokenizer, and model of a specific transformer architecture for different text tasks.  
>AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
>Instantiating one of AutoConfig, AutoModel, and AutoTokenizer will directly create a class of the relevant architecture

In [None]:
#!pip install transformers

In [None]:
from transformers import AutoConfig, AutoTokenizer, AutoModel

We define the model (`checkpoint`) of a specific architecture to load. The supported models are listed at https://huggingface.co/docs/transformers/v4.29.1/en/model_doc/auto#transformers.AutoConfig.from_pretrained  


In [None]:
checkpoint = 'bert-base-cased'

We load the tokenizer that matches the checkpoint

In [None]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

tokenizer

In [None]:
encoded = tokenizer("I like the Transformers library")  # avoid shadowing the built-in input()

In [None]:
print(encoded)

We load the model's default configuration

In [None]:
config = AutoConfig.from_pretrained(checkpoint)

config

In [None]:
[attr for attr in dir(config) if not attr.startswith('__')]

We can change some of the configuration parameters

In [None]:
config.output_hidden_states = True
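
As a rough illustration of inspecting and overriding configuration fields, the sketch below builds a default BERT configuration locally with `AutoConfig.for_model`. The name `demo_config` is hypothetical, and the locally built configuration is an assumption that stands in for the checkpoint's configuration so that nothing has to be downloaded.

```python
from transformers import AutoConfig

# Default BERT configuration built locally (assumption: it stands in for the
# checkpoint's configuration so that no download is needed).
demo_config = AutoConfig.for_model("bert")

# Override a field after creation, as done with `config` above.
demo_config.output_hidden_states = True

# to_dict() returns a plain dictionary, often easier to scan than dir(config).
as_dict = demo_config.to_dict()
print(as_dict["hidden_size"], as_dict["num_hidden_layers"])
print(demo_config.output_hidden_states)
```

Overrides can also be passed as keyword arguments when loading, for example `AutoConfig.from_pretrained(checkpoint, output_hidden_states=True)`.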

We load a base (headless) model

In [None]:
modelo = AutoModel.from_pretrained(checkpoint, config=config)

In [None]:
modelo

Base models return the hidden states produced by the ENCODER layers (`last_hidden_state`)

In [None]:
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

In [None]:
output = modelo(**inputs)

In [None]:
output.keys()

In [None]:
output.last_hidden_state.shape

In [None]:
output.pooler_output.shape

In [None]:
len(output.hidden_states)

In [None]:
output.hidden_states[0].shape
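
With `output_hidden_states=True`, `hidden_states` is a tuple with one entry for the embedding output plus one entry per encoder layer, each of shape `(batch_size, sequence_length, hidden_size)`. The sketch below checks this on a tiny, randomly initialized BERT. The sizes and the names `tiny_config` and `tiny_model` are hypothetical, chosen only so the example runs quickly without downloading weights.

```python
import torch
from transformers import AutoModel, BertConfig

# Hypothetical tiny configuration, used only for illustration.
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=3,
    num_attention_heads=4,
    intermediate_size=64,
    output_hidden_states=True,
)
tiny_model = AutoModel.from_config(tiny_config)  # random weights, no download
tiny_model.eval()

# One sequence of five arbitrary token ids.
input_ids = torch.tensor([[1, 5, 6, 7, 2]])
with torch.no_grad():
    out = tiny_model(input_ids=input_ids)

n_states = len(out.hidden_states)                # embeddings + 3 layers
state_shape = tuple(out.hidden_states[0].shape)  # (batch, seq_len, hidden)
# For BERT, the last entry is the same tensor as last_hidden_state.
last_matches = torch.equal(out.hidden_states[-1], out.last_hidden_state)
print(n_states, state_shape, last_matches)
```

By the same reasoning, a 12-layer model such as `bert-base-cased` should yield 13 hidden states.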


We can also load the architecture with a task-specific HEAD for a given language task. The available tasks are listed at: https://huggingface.co/docs/transformers/v4.29.1/en/model_doc/auto#natural-language-processing

In [None]:
# Text classification task

from transformers import AutoModelForSequenceClassification

modelo = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

In [None]:
modelo

In [None]:
output = modelo(**inputs)

In [None]:
output.keys()

In [None]:
output.logits.shape
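
The logits of a sequence classification model are unnormalized scores of shape `(batch_size, num_labels)`. A minimal sketch of turning them into probabilities and a predicted label follows. The logit values are made up for illustration. Because the classification head is newly initialized when it is loaded from a base checkpoint, real predictions are not meaningful until the model is fine-tuned.

```python
import torch

# Made-up logits for one sentence and three labels (illustration only).
logits = torch.tensor([[0.2, -1.0, 1.5]])

# Softmax over the label dimension turns scores into probabilities.
probs = torch.softmax(logits, dim=-1)

# The predicted label is the index with the highest probability.
pred = int(probs.argmax(dim=-1)[0])
print(probs, pred)
```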

In [None]:
# Token classification task

from transformers import AutoModelForTokenClassification

id2label = {
    0: "O",
    1: "B-corporation",
    2: "I-corporation",
    3: "B-creative-work",
    4: "I-creative-work",
    5: "B-group",
    6: "I-group",
    7: "B-location",
    8: "I-location",
    9: "B-person",
    10: "I-person",
    11: "B-product",
    12: "I-product",
}
label2id = {
    "O": 0,
    "B-corporation": 1,
    "I-corporation": 2,
    "B-creative-work": 3,
    "I-creative-work": 4,
    "B-group": 5,
    "I-group": 6,
    "B-location": 7,
    "I-location": 8,
    "B-person": 9,
    "I-person": 10,
    "B-product": 11,
    "I-product": 12,
}

modelo = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=13, id2label=id2label, label2id=label2id)

In [None]:
modelo

In [None]:
output = modelo(**inputs)

In [None]:
output.keys()

In [None]:
output.logits.shape
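
For token classification the logits have shape `(batch_size, sequence_length, num_labels)`, so there is one score vector per token. The sketch below maps each token to a label name through an `id2label` dictionary. The three-label subset and the logit values are hypothetical and serve only as an illustration.

```python
import torch

# Hypothetical three-label subset of the mapping defined above.
id2label = {0: "O", 1: "B-person", 2: "I-person"}

# Made-up logits for one sentence of four tokens: shape (1, 4, 3).
logits = torch.tensor([[[2.0, 0.1, 0.1],
                        [0.1, 3.0, 0.2],
                        [0.0, 0.5, 2.5],
                        [1.5, 0.2, 0.1]]])

# Pick the highest-scoring label id for every token.
pred_ids = logits.argmax(dim=-1)  # shape (1, 4)

# Translate the ids of the first sentence into label names.
pred_labels = [id2label[i] for i in pred_ids[0].tolist()]
print(pred_labels)
```

With a loaded model, the same mapping is available as `modelo.config.id2label`. The predictions also cover special tokens such as `[CLS]` and `[SEP]`, which are usually ignored.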

In [None]:
inputs = tokenizer(["I like ice cream", "I do not like broccoli"], padding=True, return_tensors="pt")

In [None]:
inputs
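
With `padding=True`, shorter sequences in the batch are filled with the padding token up to the length of the longest one, and `attention_mask` marks real tokens with 1 and padding with 0. The pure-Python sketch below mimics that behaviour. The helper `pad_batch` and the token ids are hypothetical, and this is not the library's implementation.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token ids to a common length (toy sketch)."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens first, then padding ids.
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 for real tokens, 0 for padding positions.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for a short and a longer sentence.
batch = pad_batch([[101, 7, 8, 102], [101, 7, 9, 10, 8, 11, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The model uses the mask to ignore padding positions in attention, so both rows can share one tensor of shape `(batch_size, max_length)`.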

In [None]:
output = modelo(**inputs)

In [None]:
output.logits.shape