# Quick Notes:
If something's not working, restart the notebook using the restart button at the top.
Make sure you are using the .venv kernel. To check that it is set up properly, run:

```python
import sys
print(sys.executable)
!pip list
```
If issues persist, delete the venv, create a new one, install requirements.txt and restart the kernel.
## Locating models
Models can be found in /Users/aaronfowler/.cache/huggingface
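
The path above is specific to one machine. As a rough sketch, the default cache location can be worked out like this, assuming the `HF_HOME` environment variable and the `~/.cache/huggingface` default behave as documented by Hugging Face. The helper name `hf_cache_dir` is made up for these notes and is not part of any library.

```python
import os
from pathlib import Path


def hf_cache_dir(env=None) -> Path:
    """Best-guess location of the Hugging Face cache directory (hypothetical helper)."""
    env = os.environ if env is None else env
    # HF_HOME, when set, overrides the default location.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    # Otherwise fall back to $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface.
    cache_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_root).expanduser() / "huggingface"


print(hf_cache_dir())
```

If the printed folder does not exist yet, no model has probably been downloaded with this setup.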




# NLP vs LLM
* NLP: Natural Language Processing
* LLM: Large Language Model
* NLP is a broader field focused on understanding, interpreting, and generating human language, for example:
    * Sentiment analysis, entity recognition, machine translation
* LLMs: A powerful subset of NLP models, characterized by large size, extensive training, and the ability to handle a wide range of language tasks with minimal task-specific training
## NLP and LLM Models
* NLP: Field at the intersection of ML and linguistics. The goal is understanding context, not just individual words. Covers sentiment analysis, text generation, speech recognition
## Rise of LLMs
* LLMs transformed NLP, shifting it from task-specific models to general-purpose ones
* NLP is changing
* Computers don't process language like humans do. Language is complex: ambiguity, context, and nuance (e.g. sarcasm) require careful text representation
## NLP Tasks
* Classifying sentences: Sentiment analysis, Spam detection
* Classifying words: Part of speech tagging, named entity recognition
* Generating text: Auto-completion, translation, summarization
* Extracting answers: Q&A based on context
## Characteristics of LLMs
* Scale: Millions to billions of parameters
* General capabilities without task-specific training
* Learn from examples given in context (in the prompt)
* Demonstrate emergent abilities not explicitly programmed
## Limitations of LLMs
* Hallucination
* Lack of true understanding, operating on statistical patterns
* Bias is reproduced
* Limited context windows
* Significant computational resources
## Paradigm shift
* Moved from specialized models to a single large model. This made sophisticated NLP more accessible, but brings challenges around ethics, efficiency, and deployment
## Common NLP Tasks
* Classifying whole sentences: Getting the sentiment of a review, detecting spam, determining grammatical correctness
* Classifying each word in a sentence: Noun, verb, adjective, or named entities
* Generating text content: Completing a prompt with auto-generated text, filling in blanks with masked words
* Extracting answers based on context
* Generating a new sentence from an input text
* Not limited to written text, can also tackle speech recognition and computer vision

# TRANSFORMERS:
* Transformers library provides functionality to create and use shared models
* Most basic is the pipeline function
    * Returns an end to end object that performs an NLP task on one or several texts
    * Pre-processing -> Model -> Post-processing
        * Pre-processing: Converts the input into a model-readable format
        * Model: Runs the computation on the input
        * Post-processing: Converts the result into human-readable output
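
The three stages above can be sketched with a toy example. This is not how the Transformers library is implemented. The word lists, the function names, and the fake "model" are all invented to show the shape of pre-processing -> model -> post-processing.

```python
import math


def preprocess(text: str) -> list[str]:
    # Stand-in for tokenization: lowercase, drop "!" and split on whitespace.
    return text.lower().replace("!", "").split()


def toy_model(tokens: list[str]) -> list[float]:
    # Stand-in for a real model: returns raw scores (logits) as [negative, positive].
    positive_words = {"love", "great", "good"}
    negative_words = {"hate", "bad", "awful"}
    pos = sum(token in positive_words for token in tokens)
    neg = sum(token in negative_words for token in tokens)
    return [float(neg), float(pos)]


def postprocess(logits: list[float]) -> dict:
    # Softmax turns logits into probabilities, then the best label is picked.
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    labels = ["NEGATIVE", "POSITIVE"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


def toy_pipeline(text: str) -> dict:
    # Chain the three stages, like pipeline() does end to end.
    return postprocess(toy_model(preprocess(text)))


print(toy_pipeline("I love this course!"))
```

The output has the same label/score shape as a real sentiment pipeline, but the score itself means nothing here.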

```python
from transformers import pipeline # Load pipeline

classifier = pipeline("sentiment-analysis") # Create a sentiment analysis classifier, which we've named classifier
classifier("I love this course!") # Test the classifier with a sample input
```
**OR** test with two inputs
```python
from transformers import pipeline # Load pipeline

classifier = pipeline("sentiment-analysis") # Create a sentiment analysis classifier
# Test the classifier with several inputs by passing them as a single list
classifier([
    "I love this course!",
    "I can't get my finger out of my ass!",
])
# Responses
```

```bash
[{'label': 'POSITIVE', 'score': 0.9998835325241089},
{'label': 'NEGATIVE', 'score': 0.9990085959434509}]
```
It will label positive or negative, alongside a confidence score
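
Since every result is a plain dict, the confidence score can be used directly in Python. A small sketch using the response above. The `THRESHOLD` value is a hypothetical cut-off picked only for illustration.

```python
# Output copied from the response above.
results = [
    {"label": "POSITIVE", "score": 0.9998835325241089},
    {"label": "NEGATIVE", "score": 0.9990085959434509},
]

THRESHOLD = 0.9995  # hypothetical cut-off, chosen only for illustration

for result in results:
    confident = result["score"] >= THRESHOLD
    print(f"{result['label']:<8} {result['score']:.4f} confident={confident}")

# Keep only the labels whose score clears the threshold.
confident_labels = [r["label"] for r in results if r["score"] >= THRESHOLD]
```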


In [None]:
from transformers import pipeline # Load the sentiment analysis pipeline

classifier = pipeline("sentiment-analysis") # Create a sentiment analysis classifier
classifier(["I love this course!", "My finger is stuck in my asshole!"]) # Test with multiple inputs

## Zero-shot classification
* A more general text classification pipeline
* Allows you to specify the labels you want to classify against

In [3]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification") # Create a zero-shot classifier
classifier(
    "This School class is going to teach you loads of business shit", # Input text to classify
    candidate_labels=["education", "politics", "business"]  #Specify the labels you want to classify against
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


{'sequence': 'This School class is going to teach you loads of business shit',
 'labels': ['business', 'education', 'politics'],
 'scores': [0.7433503866195679, 0.2517228424549103, 0.00492677791044116]}
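
To pull the winning label out of a result like the one above, pair the labels with their scores. A minimal sketch using the labels and scores copied from that output. As far as I know, the scores add up to roughly 1 because the pipeline defaults to single-label mode, so treat that as an assumption.

```python
# Labels and scores copied from the output above.
labels = ["business", "education", "politics"]
scores = [0.7433503866195679, 0.2517228424549103, 0.00492677791044116]

# Pair each label with its score and pick the highest one.
label_scores = dict(zip(labels, scores))
top_label = max(label_scores, key=label_scores.get)

print(top_label, round(label_scores[top_label], 3))
print("total:", round(sum(scores), 3))
```

In the output above the labels already come sorted by score, so `labels[0]` gives the same answer here.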

# Text Generation
The text-generation pipeline takes an input prompt and generates text that continues it.

In [14]:
from transformers import pipeline

generator = pipeline("text-generation") # Create a text generation pipeline
generator("Haven't seen dad in months, he said he just needed some time to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Haven\'t seen dad in months, he said he just needed some time to relax.\n\n"I haven\'t seen him in a while, so I\'m just trying to enjoy life and stay away from things," said her mother, Karen.\n\n"I don\'t really know how I feel about it, but I know that mom will be there. She\'ll be there for me, and she\'ll be there for me if I\'m not here for a long time."\n\nKaren said she has no idea why the two men ended up in a car that crashed into a tree in the driveway of her home.\n\nAfter the accident, Karen said she was able to call 911 and get help.\n\n"I am just so grateful that our family and friends stopped by to help them, and to be so fortunate to be able to be here for them when they were a little bit older," said Karen.\n\n"I\'m so proud of them. I\'m so proud of them because we really are thankful for this community and for this community."\n\nThe couple has been at the hospital since Wednesday after the incident.\n\nThe family of the car crash suspect is not
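
The generated text includes the original prompt at the start. A small sketch of how to keep only the new part. The `outputs` list here is a shortened stand-in for the real output above, keeping just the first sentence.

```python
prompt = "Haven't seen dad in months, he said he just needed some time to"

# Shortened stand-in for the pipeline output above (only the first sentence is kept).
outputs = [{"generated_text": prompt + " relax."}]

generated = outputs[0]["generated_text"]
# Drop the prompt from the front; str.removeprefix needs Python 3.9+.
continuation = generated.removeprefix(prompt).strip()
print(continuation)
```

The text-generation pipeline also appears to accept `return_full_text=False`, which should give a similar result without the manual slicing.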

# Tasks
Zero-shot classification, text generation, and sentiment analysis are all examples of 'tasks', which you can use as a filter on Hugging Face when searching for [Models](https://huggingface.co/models).
## Task examples:
```python
pipeline("text-classification")
pipeline("text-generation")
pipeline("summarization")
pipeline("translation")
pipeline("question-answering")
pipeline("zero-shot-classification")
pipeline("conversational")  # Note: appears to have been removed in newer transformers releases
pipeline("fill-mask")
pipeline("feature-extraction")
pipeline("token-classification")
```
More tasks can be retrieved using:
```python
from transformers.pipelines import SUPPORTED_TASKS
print(SUPPORTED_TASKS.keys())
```
The task name goes inside the `pipeline(...)` call as its first argument, in the format:
```python
generator = pipeline("text-generation")
``` 
### `generator` is just the Python variable name for the pipeline, so you know what it is.
You can name it whatever the fuck you want ⬇️

In [None]:
from transformers import pipeline
wankbot = pipeline("text-generation") # Create a text generation pipeline (default model, since none is specified)
wankbot("The pornography category i'm feeling like watching today is", max_length=50, num_return_sequences=1) # Generate text with a maximum length and number of sequences

# Choosing Models
After starting off with our standard
```python
from transformers import pipeline
```
we can select a model by passing the `model` argument after the task name:

```python
generator = pipeline("text-generation", model="gpt2")
```
## Additional Arguments
We can specify additional arguments such as the maximum length of the output (`max_length`) and how many different sequences to generate for the same prompt (`num_return_sequences`). For example:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "When I get sad, I like to",
    max_length=30,
    num_return_sequences=3,
)
```


In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "When I get sad, I like to",
    truncation=True,
    max_length=30,
    num_return_sequences=3,
)

# Unmasking
Predicts missing words in a sentence. Note that not every model uses `<mask>` as its mask token, so check which one your model expects.

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask") # Create a fill-mask pipeline
unmasker("My dad left us, he said he was going to get <mask> from the store.", top_k=3) # top_k=3 returns the top 3 predictions for the masked token
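
Because the mask token differs between model families, one option is to keep the prompt neutral and fill in the token per model. A rough sketch: the two model names are examples not taken from these notes, and `build_masked_prompt` is a made-up helper.

```python
# Mask tokens for two common model families (example models, not from the notes above).
MASK_TOKENS = {
    "distilroberta-base": "<mask>",  # RoBERTa-style tokenizers
    "bert-base-uncased": "[MASK]",   # BERT-style tokenizers
}


def build_masked_prompt(template: str, mask_token: str) -> str:
    """Replace the neutral {mask} placeholder with a model-specific mask token."""
    return template.format(mask=mask_token)


template = "My dad left us, he said he was going to get {mask} from the store."
for model_name, mask_token in MASK_TOKENS.items():
    print(model_name, "->", build_masked_prompt(template, mask_token))

# With a live pipeline the token can be read from the tokenizer instead:
# unmasker.tokenizer.mask_token
```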

# Classifying Text With Named Entity Recognition (NER)
In this example, the model has to find the parts of the input text that correspond to entities such as people, organizations, or locations.

In [None]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True) # Create a named entity recognition pipeline with grouped entities (newer transformers versions prefer aggregation_strategy="simple")
ner("My name is Aaron and I study at Lund University in Sweden.")

## Formula
When we pass the argument `grouped_entities=True` to the pipeline function, we tell the pipeline to regroup the parts of the sentence that correspond to the same entity. You can see this in the response for Lund University, where the two words are grouped into a single entity. Without grouping, the response returns each token separately, which can be a problem because models often use sub-word tokenization.
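
To get a feel for what regrouping means, here is a simplified sketch. It is not the library's actual algorithm: the token list is hypothetical (the sub-word split of "Lund" is invented), the entity tags are simplified, and it assumes every listed token sits directly next to the previous one in the text.

```python
def group_entities(tokens):
    """Merge consecutive tokens that share an entity type (simplified sketch)."""
    groups = []
    for token in tokens:
        word, entity = token["word"], token["entity"]
        is_subword = word.startswith("##")
        if groups and (is_subword or groups[-1]["entity_group"] == entity):
            # Sub-word pieces are glued on directly, whole words get a space.
            groups[-1]["word"] += word[2:] if is_subword else " " + word
        else:
            groups.append({"entity_group": entity, "word": word})
    return groups


# Hypothetical ungrouped output; the sub-word split of "Lund" is invented for illustration.
tokens = [
    {"word": "Aaron", "entity": "PER"},
    {"word": "Lu", "entity": "ORG"},
    {"word": "##nd", "entity": "ORG"},
    {"word": "University", "entity": "ORG"},
    {"word": "Sweden", "entity": "LOC"},
]
print(group_entities(tokens))
```

The four ORG-related pieces collapse into one "Lund University" entry, which is the behaviour the grouped response shows.
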
## Response
```bash
[{'entity_group': 'PER',
  'score': np.float32(0.9987024),
  'word': 'Aaron',
  'start': 11,
  'end': 16},
 {'entity_group': 'ORG',
  'score': np.float32(0.98618424),
  'word': 'Lund University',
  'start': 32,
  'end': 47},
 {'entity_group': 'LOC',
  'score': np.float32(0.99971896),
  'word': 'Sweden',
  'start': 51,
  'end': 57}]
```

# Question answering
The question-answering pipeline answers questions using information from a given context:

In [None]:
from transformers import pipeline
qna = pipeline("question-answering") # Create a question-answering pipeline
qna(
    question="What is my name?",
    context="My name is Aaron and I study at Lund University in Sweden."
)

## Response
The response **doesn't generate the answer**, it simply extracts the information.
```bash
{'score': 0.9961819648742676, 'start': 11, 'end': 16, 'answer': 'Aaron'}
```
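
The `start` and `end` values are character positions in the context, which shows the extraction quite directly. A quick check using the context and the response above:

```python
context = "My name is Aaron and I study at Lund University in Sweden."

# Result copied from the response above.
result = {"score": 0.9961819648742676, "start": 11, "end": 16, "answer": "Aaron"}

# The answer is a slice of the context, not newly generated text.
extracted = context[result["start"]:result["end"]]
print(extracted)
```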


# Summarization
Summarization reduces a text to a shorter version while keeping all (or most) of the important points from the original. You can still specify a minimum and maximum length.

In [None]:
from transformers import pipeline
summarizer = pipeline("summarization") # Create a summarization pipeline
summarizer(
    "My name is Aaron and I study at Lund University in Sweden. I am currently taking a course on natural language processing, which is really interesting. The course covers various topics such as sentiment analysis, text generation, and named entity recognition.",
    max_length=10,
    min_length=5,
    do_sample=False
) # Summarize the input text with specified length constraints

# Translation
You can use a default model if you provide a language pair in the task name, or you can pick a specific model, such as one from Helsinki-NLP's opus-mt collection. E.g.
```python
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")
```
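
The model name above follows a source-target naming pattern, and the task name can carry a language pair in a similar way. A small sketch with two made-up helper functions. It assumes the patterns `opus-mt-{source}-{target}` and `translation_{source}_to_{target}`, and not every language pair will have a published model or a default.

```python
def opus_mt_model_id(source_lang: str, target_lang: str) -> str:
    """Build a Helsinki-NLP opus-mt model id from two language codes (hypothetical helper)."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def translation_task_name(source_lang: str, target_lang: str) -> str:
    """Build a task name that carries the language pair (hypothetical helper)."""
    return f"translation_{source_lang}_to_{target_lang}"


print(opus_mt_model_id("fr", "en"))
print(translation_task_name("en", "fr"))
```

Either string would then be passed to `pipeline(...)`, the first as `model=` and the second as the task name.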

In [None]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.") # Translate French to English