### Pre-trained LLMs

* pipeline(): selects a model and tokenizer automatically.
* Auto classes (such as the AutoModel class): load the right architecture for a given checkpoint name.
* from_pretrained(): loads the pretrained model weights and tokenizer specified by model_name.
* AutoModel: a generic class that, given inputs for inference, returns the hidden states produced by the model body. It lacks a task-specific head (such as a classification head), which we can add ourselves.

Steps for classifying text with AutoModel and a custom head:

1. Tokenize the inputs, enabling padding and truncating sequences that exceed the maximum length.
2. Pass the tokens through the model to get its hidden states, and take the aggregated representation from pooler_output.
3. Forward-pass that representation through the custom classification head (binary classification here) to obtain raw logits, then map the logits to class probabilities with softmax.
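
The last step can be illustrated without loading a model. The sketch below is plain Python, not the actual torch code: the 4-dimensional `pooled` vector stands in for pooler_output, and the weights and bias are made-up values standing in for a trained head. In practice the head would typically be a linear layer placed on top of the model body.

```python
import math

def linear_head(pooled, weights, bias):
    """Compute one logit per class: logits[c] = dot(weights[c], pooled) + bias[c]."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Map raw logits to probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for illustration only (not from a real model)
pooled = [0.5, -1.0, 0.25, 2.0]            # stand-in for pooler_output
weights = [[0.1, 0.2, -0.3, 0.4],          # one row of weights per class
           [-0.2, 0.1, 0.5, -0.1]]
bias = [0.0, 0.1]

logits = linear_head(pooled, weights, bias)
probs = softmax(logits)
predicted_class = probs.index(max(probs))  # index of the most probable class

print("Logits:", logits)
print("Probabilities:", probs)
print("Predicted class:", predicted_class)
```

Softmax preserves the ordering of the logits, so taking the argmax of the logits directly gives the same predicted class.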

Task-specific Auto classes, which include a head:

* AutoModelForSequenceClassification
* AutoModelForCausalLM

### How text generation LLM training works
1. Training requires examples made up of input-target sequence pairs.
2. The input sequence represents a segment of text.
3. The target sequence is the input sequence shifted one token to the left, so that at every position the prediction target is the next token in the original sequence.
4. Trained on many such sequences, the LLM eventually learns to predict the next token.
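
The shift described in step 3 can be sketched in a few lines. This is a minimal illustration, assuming whole words as tokens for readability (a real tokenizer would produce integer IDs); `make_training_pair` is a hypothetical helper name, not a library function.

```python
def make_training_pair(tokens):
    """Split a token sequence into (inputs, targets) for next-token prediction."""
    inputs = tokens[:-1]   # every token except the last
    targets = tokens[1:]   # the same sequence shifted left by one position
    return inputs, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
inputs, targets = make_training_pair(tokens)

# At position i the model has seen inputs[:i + 1] and must predict targets[i]
for i, target in enumerate(targets):
    print(f"seen: {inputs[:i + 1]} -> predict: {target!r}")
```

Both sequences have the same length, one token shorter than the original, so every input position has exactly one target.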

In [None]:
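import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and pre-trained model
# (model_name is assumed to be set beforehand to a sequence classification checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)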

text = ["The best movie I've ever watched!", "What an awful movie. I regret watching it."]

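# Tokenize inputs (padding and truncating as needed) and pass them to the model for inference
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)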
logits = outputs.logits

predicted_classes = torch.argmax(logits, dim=1).tolist()
for idx, predicted_class in enumerate(predicted_classes):
    print(f"Predicted class for \"{text[idx]}\": {predicted_class}")

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-es"

# Load the tokenizer and the model checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

english_inputs = ["Hello", "Thank you", "How are you?", "Sorry", "Goodbye"]

# Encode the inputs, generate translations, decode, and print them
for english_input in english_inputs:
    input_ids = tokenizer.encode(english_input, return_tensors="pt")
    translated_ids = model.generate(input_ids)
    translated_text = tokenizer.decode(translated_ids[0], skip_special_tokens=True)
    print(f"English: {english_input} | Spanish: {translated_text}")

### LLMs for question answering

Three types:
* Extractive QA: the LLM extracts the answer to a question from a provided context. Typically uses an encoder-only architecture.
* Open generative QA: the model constructs the answer through language generation based on the context, rather than extracting it. Typically uses an encoder-decoder architecture.
* Closed generative QA: the LLM generates the answer entirely on its own, with no context provided. Typically uses a decoder-only architecture.

* To accommodate contexts longer than the maximum sequence length, pass return_overflowing_tokens=True to the tokenizer; this activates a sliding window that splits the context into multiple overlapping windows.
* max_length sets the window size and stride sets how many tokens consecutive windows overlap.
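
The effect of max_length and stride can be sketched on a plain list of tokens. This is a simplified illustration, not the tokenizer's implementation: it ignores the question tokens and special tokens that a real tokenizer also counts toward max_length, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(tokens, max_length, stride):
    """Split tokens into windows of at most max_length that overlap by stride tokens."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # this window reached the end of the context
        start += step
    return windows

context_tokens = list(range(10))  # stand-in for 10 context token IDs
windows = sliding_windows(context_tokens, max_length=4, stride=2)
for window in windows:
    print(window)
```

A larger stride means more overlap, which lowers the chance that an answer is cut in half at a window boundary but increases the number of windows to process.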

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer

# Load a specific subset of the dataset
mlqa = load_dataset("xtreme", name="MLQA.en.en")

question = mlqa["test"]["question"][0]
context = mlqa["test"]["context"][0]
print("Question: ", question)
print("Context: ", context)

# Initialize the tokenizer using the model checkpoint
tokenizer = AutoTokenizer.from_pretrained("deepset/minilm-uncased-squad2")

# Tokenize the inputs returning the result as tensors
inputs = tokenizer(question, context, return_tensors="pt")
print("First five encoded tokens: ", inputs["input_ids"][0][:5])