# The complete process of using Transformers without `pipeline()`

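## Step 1 - Preprocessing with a tokenizer
The tokenizer turns raw text into the numbers the model consumes. It does this by:
- Splitting the input into words, subwords, or symbols (like punctuation), called tokens
- Mapping each token to an integer ID
- Adding extra inputs that may be useful to the model (for example, special tokens and an attention mask)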
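To make the three steps concrete, here is a toy, pure-Python sketch. It is not how the real tokenizer works internally: the real DistilBERT tokenizer uses a learned WordPiece subword vocabulary, while this sketch only splits on words and punctuation. The vocabulary `VOCAB` and the helpers `toy_tokenize` / `toy_encode` are hypothetical; only the `[CLS]`, `[SEP]`, `[PAD]` and `[UNK]` marker names mirror the special tokens BERT-style models use.

```python
import re

# Hypothetical toy vocabulary; a real checkpoint ships its own, much larger one
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "i": 4, "hate": 5, "this": 6, "so": 7, "much": 8, "!": 9}


def toy_tokenize(text):
    # Step 1: lowercase (like an "uncased" model) and split into words and punctuation
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(text):
    tokens = toy_tokenize(text)
    # Step 2: map each token to an integer ID, falling back to [UNK] for unknown words
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    # Step 3: add the special tokens the model expects around the sequence
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]


print(toy_tokenize("I hate this so much!"))
print(toy_encode("I hate this so much!"))
```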

```python
from transformers import AutoTokenizer

# DistilBERT fine-tuned on SST-2 for positive/negative sentiment classification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```

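```python
# Tokenize both sentences: pad them to the same length, truncate anything longer
# than the model accepts, and return TensorFlow tensors ("tf")
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)
```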
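The printed result contains `input_ids` and an `attention_mask`. With `padding=True`, the shorter sentence is padded to the length of the longest one, and the attention mask records which positions are real tokens (1) and which are padding (0). The pure-Python sketch below mimics only that padding step. The helper `pad_batch` and the token IDs are hypothetical, and `pad_id=0` is an assumption that matches BERT-style vocabularies. Truncation is left out here; in the real tokenizer, `truncation=True` shortens sequences that exceed the model's maximum length.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad lists of token IDs to the length of the longest one (hypothetical helper)."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[2, 4, 5, 3], [2, 3]])
print(batch)
```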

## Step 2 - Going through the model
- We can download the pretrained model the same way we got the tokenizer: the `TFAutoModel` class also has a `from_pretrained()` method.

```python
from transformers import TFAutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# Loads the base Transformer only, without a task-specific head
model = TFAutoModel.from_pretrained(checkpoint)
```

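## Checking the output dimensions
The base model outputs hidden states: one vector per token for every sentence, so the tensor has the shape (batch size, sequence length, hidden size). To see its dimensions: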

```python
outputs = model(inputs)
# (batch size, sequence length, hidden size)
print(outputs.last_hidden_state.shape)
```

### Using a model with a classification head
To classify the sentences as positive or negative, we need a model with a sequence classification head. So instead of the `TFAutoModel` class, we use `TFAutoModelForSequenceClassification`:

```python
from transformers import TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# Same base model, plus a head that outputs one score (logit) per label
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(inputs)
```

```python
# One row per sentence, one column per label
print(outputs.logits.shape)
```
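The head shrinks the per-token hidden states down to a single score per label. The NumPy sketch below shows only the shape arithmetic, under simplifying assumptions. Random arrays stand in for the real hidden states, and the weights `W` and `b` are hypothetical rather than the trained ones. The head is reduced to a single linear layer on the first (`[CLS]`) position, whereas the actual DistilBERT classification head also applies an extra dense layer and dropout before the final projection. The sizes assume DistilBERT's hidden size of 768 and the two SST-2 labels; `seq_len` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, hidden_size, num_labels = 2, 16, 768, 2

# Stand-in for outputs.last_hidden_state: one hidden_size vector per token
last_hidden_state = rng.normal(size=(batch_size, seq_len, hidden_size))

# Hypothetical head weights; a fine-tuned checkpoint learns these during training
W = rng.normal(scale=0.02, size=(hidden_size, num_labels))
b = np.zeros(num_labels)

# Summarize each sentence by the vector at the first ([CLS]) position
pooled = last_hidden_state[:, 0, :]   # shape (batch_size, hidden_size)
logits = pooled @ W + b               # shape (batch_size, num_labels)
print(logits.shape)
```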
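
## Step 3 - Postprocessing the output
The logits are raw, unnormalized scores, not probabilities. To turn them into probabilities, we apply a softmax over the label dimension.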
```python
import tensorflow as tf

# Convert the logits into probabilities; each row (one sentence) sums to 1
predictions = tf.math.softmax(outputs.logits, axis=-1)
print(predictions)
```
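Softmax maps a vector of scores $z$ to $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$, so every value ends up between 0 and 1 and each row sums to 1. The NumPy sketch below illustrates what `tf.math.softmax(..., axis=-1)` computes. The logits are made-up values, not real model output. Subtracting the row maximum first is the usual trick to keep `np.exp` from overflowing, and it does not change the result.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Shift by the max so np.exp cannot overflow; the ratios are unchanged
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


# Hypothetical logits for two sentences and two labels
logits = np.array([[-1.5, 1.6], [4.1, -3.3]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=-1))  # each row sums to 1
```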

## Getting the labels

```python
# id2label maps each column index of the predictions to a label name
print(model.config.id2label)
```
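To get a final prediction for each sentence, pick the index with the highest probability (the argmax) and look it up in `id2label`. The pure-Python sketch below assumes the mapping is `{0: "NEGATIVE", 1: "POSITIVE"}`, which is what the SST-2 checkpoint's config reports. The probabilities and the helper `predicted_labels` are hypothetical. With the TensorFlow tensor from the previous step, `tf.math.argmax(predictions, axis=-1)` returns the same indices.

```python
# Hypothetical probabilities, shaped like the softmax output for two sentences
probabilities = [[0.04, 0.96], [0.9995, 0.0005]]

# Label mapping assumed to match model.config.id2label for this checkpoint
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def predicted_labels(probs, id2label):
    labels = []
    for row in probs:
        # argmax: index of the highest-probability class
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append((id2label[best], row[best]))
    return labels


print(predicted_labels(probabilities, id2label))
```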