## Setup / Installation

<!-- Most tools require Anaconda, as they do _more_ than just install Python packages. They often need base C libs / misc tools installed as well.

This is often the most painful part of ML / AI, in my experience ... -->

### Install Jupyter / Get started with JupyterLab

_Alt: Google Colab_

```sh
pip install jupyterlab
jupyter lab  # start the server and open JupyterLab in the browser
```

## [HuggingFace](https://huggingface.co) _Transformers_ 🤖

HuggingFace's `transformers` package is a very popular Python library of pretrained models for text, vision, and audio tasks.

https://huggingface.co/docs/transformers/index

It exposes the same models across the major ML frameworks (PyTorch, TensorFlow, Flax).

```sh
# From a notebook cell, use `%pip install ...` instead
pip install transformers datasets
pip install torch
```
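
Before running any cells, it can help to confirm the packages actually landed in the environment the kernel uses. A minimal sketch with only the standard library; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["transformers", "datasets", "torch"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If a package shows up as not installed here but `pip` says otherwise, the notebook kernel is probably running in a different environment than the shell.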

In [14]:
# The pipeline() is the easiest and fastest way to use a pretrained model for inference. 
# You can use the pipeline() out-of-the-box for many tasks across different modalities

from transformers import pipeline

### pipeline example: Text classification `sentiment-analysis`

> classifying sequences according to positive or negative sentiments
>
> https://huggingface.co/docs/transformers/v4.33.2/en/main_classes/pipelines#transformers.TextClassificationPipeline

In [None]:
#  "text-classification" (alias "sentiment-analysis" available): will return a TextClassificationPipeline.
my_analyzer = pipeline(task="sentiment-analysis")

> No model was supplied, defaulted to `distilbert-base-uncased-finetuned-sst-2-english` and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
> Using a pipeline without specifying a model name and revision in production is not recommended.
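
Following the warning's advice, the model and revision can be pinned explicitly. A minimal sketch: the model name and revision are copied from the warning above, while `PINNED_SENTIMENT_MODEL` and `build_pinned_analyzer` are names made up for illustration.

```python
# Model name and revision copied from the warning printed by pipeline().
PINNED_SENTIMENT_MODEL = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_pinned_analyzer():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline(**PINNED_SENTIMENT_MODEL)
```

Calling `build_pinned_analyzer()` should give the same pipeline as the cell above, without relying on the library's default.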

In [None]:
my_analyzer.model

# DistilBertForSequenceClassification

In [None]:
SENTENCES = [
    "Yellow is my favorite color.",
    "I hate the color yellow.",
    "I love the color yellow.",
    "I don't know if I like the color yellow.",
]


for sentence in SENTENCES:
    result = my_analyzer(sentence)[0]
    print(f"{result['label']} with score {round(result['score'], 4)}: \t\t {sentence}")
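
This pipeline classifies into positive or negative only, so even the ambivalent sentence gets one of the two labels. One possible way to surface that is to treat low scores as "uncertain". A minimal sketch: the 0.9 threshold is arbitrary, and the sample dicts are hypothetical stand-ins shaped like the pipeline's results, not real model output.

```python
def label_with_uncertainty(result, threshold=0.9):
    """Return the predicted label, or "UNCERTAIN" when the score is below threshold."""
    return result["label"] if result["score"] >= threshold else "UNCERTAIN"


# Hypothetical results, shaped like the dicts the pipeline returns.
examples = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.6412},
]

for example in examples:
    print(label_with_uncertainty(example))
```

In the loop above this would be used as `label_with_uncertainty(my_analyzer(sentence)[0])`.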

### pipeline example: Text generation `text-generation`

> This pipeline predicts the words that will follow a specified text prompt.
>
> https://huggingface.co/docs/transformers/v4.33.2/en/main_classes/pipelines#transformers.TextGenerationPipeline

In [None]:
my_text_generator = pipeline(task="text-generation")

# No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).

my_text_generator.model

In [None]:
PROMPTS = [
    "I love the color yellow because",
    "I hate the color yellow because",
    "I don't know if I like the color yellow because",
]

# Greedy decoding (do_sample=False) keeps the output deterministic.
# max_length counts the prompt's tokens too; print only the text after the prompt.
for prompt in PROMPTS:
    result = my_text_generator(prompt, max_length=25, do_sample=False)[0]["generated_text"]
    print(f"{prompt} ...\t {result[len(prompt):]}")
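
The slicing with `len(prompt)` is needed because `generated_text` includes the prompt by default. As an alternative, the text-generation pipeline accepts `return_full_text=False`. A minimal sketch; `GENERATION_KWARGS` and `continuation` are names made up for illustration, and `max_new_tokens=20` is an arbitrary choice that is not equivalent to `max_length=25`, so the outputs may differ from the loop above.

```python
# Keyword arguments passed through to the text-generation pipeline call.
GENERATION_KWARGS = {
    "max_new_tokens": 20,       # counts only newly generated tokens
    "do_sample": False,         # greedy decoding, deterministic output
    "return_full_text": False,  # return the continuation without the prompt
}


def continuation(generator, prompt):
    """Return only the text the generator adds after the prompt."""
    return generator(prompt, **GENERATION_KWARGS)[0]["generated_text"]
```

Usage would look like `continuation(my_text_generator, PROMPTS[0])`.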

### pipeline example: Visual question answering `vqa`

> answer a question about the image, given an image and a question
>
> https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.VisualQuestionAnsweringPipeline

In [15]:
my_visual_question_answerer = pipeline(model="dandelin/vilt-b32-finetuned-vqa")

my_visual_question_answerer.task

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


'visual-question-answering'

In [17]:
# Ask a question about a local image

my_visual_question_answerer(question="What animal is the person riding?", image="./output/astronaut_rides_horse.png") 

[{'score': 0.9961353540420532, 'answer': 'horse'},
 {'score': 0.011806587688624859, 'answer': 'pony'},
 {'score': 0.011430094949901104, 'answer': 'yes'},
 {'score': 0.0026749465614557266, 'answer': 'donkey'},
 {'score': 0.00147891859523952, 'answer': 'human'}]
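
The result is a list of answer candidates with scores. The scores above add up to slightly more than 1, so they read better as independent per-answer confidences than as a probability distribution. A minimal sketch of picking answers out of that list; the `confident_answers` helper and its 0.5 cutoff are made up for illustration.

```python
# Output copied from the cell above.
answers = [
    {'score': 0.9961353540420532, 'answer': 'horse'},
    {'score': 0.011806587688624859, 'answer': 'pony'},
    {'score': 0.011430094949901104, 'answer': 'yes'},
    {'score': 0.0026749465614557266, 'answer': 'donkey'},
    {'score': 0.00147891859523952, 'answer': 'human'},
]


def confident_answers(results, min_score=0.5):
    """Keep only the answers whose score reaches min_score."""
    return [r["answer"] for r in results if r["score"] >= min_score]


best = max(answers, key=lambda r: r["score"])
print(best["answer"])              # horse
print(confident_answers(answers))  # ['horse']
```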

In [20]:
# traffic cam captcha example

for i in range(1, 17):
    result = my_visual_question_answerer(question="Does this image contain a traffic light?", image=f"./traffic-captcha/square-{i}.png")
    answer = result[0]["answer"]
    score = round(result[0]["score"] * 100, 2)

    if answer == "yes":
        print(f"square-{i}.png - ({score}% confidence)")


square-5.png - (99.91% confidence)
square-6.png - (99.91% confidence)
square-7.png - (76.56% confidence)
square-14.png - (86.83% confidence)
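
To see which tiles would get clicked, the matches can be drawn as a grid. A minimal sketch, assuming the 16 squares form a 4x4 captcha numbered left to right, top to bottom (the layout is an assumption, it is not stated above); `render_grid` is a made-up helper.

```python
def render_grid(selected, size=4):
    """Draw a size x size grid, marking selected squares with X."""
    rows = []
    for row in range(size):
        cells = []
        for col in range(size):
            index = row * size + col + 1  # squares are numbered from 1
            cells.append("X" if index in selected else ".")
        rows.append(" ".join(cells))
    return "\n".join(rows)


# Square numbers taken from the output above.
selected = {5, 6, 7, 14}
print(render_grid(selected))
```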


### pipeline example: `TextToAudioPipeline` and `AutomaticSpeechRecognitionPipeline`

> `AutomaticSpeechRecognitionPipeline`: extracts the spoken text contained in some audio.
> https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline
>
> `TextToAudioPipeline`: generates an audio file from an input text and optional other conditional input.
> https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextToAudioPipeline

In [27]:
# https://huggingface.co/suno/bark-small
# https://huggingface.co/openai/whisper-base

my_text_to_audio_pipe = pipeline(model="suno/bark-small")
my_audio_to_text_pipe = pipeline(model="openai/whisper-base")

In [34]:
result = my_text_to_audio_pipe("hello. this is YOUR BANK. calling for PERSON. Give us all your personal info.")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.


In [35]:
from IPython.display import Audio

sampling_rate = my_text_to_audio_pipe.model.generation_config.sample_rate
Audio(result["audio"], rate=sampling_rate)
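
`Audio(...)` only plays the clip inside the notebook. To keep it as a file, one option is writing a WAV with the standard library. A minimal sketch: `save_wav` is a made-up helper that expects a flat sequence of mono float samples in [-1.0, 1.0], and the sine tone below is stand-in data, not real Bark output.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in data: half a second of a 440 Hz tone at 24 kHz.
rate = 24_000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
wav_path = os.path.join(tempfile.gettempdir(), "tone.wav")
save_wav(wav_path, tone, rate)
```

With the pipeline result, this would presumably be called as `save_wav("output/speech.wav", result["audio"].flatten().tolist(), sampling_rate)`, assuming `result["audio"]` is a NumPy float array.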

In [36]:
result = my_audio_to_text_pipe("./audio/christmascarol_00_dickens_64kb.mp3")

In [40]:
print(result["text"])

 A Christmas Carol by Charles Dickens. This is a Librevox recording. All Librevox recordings are in the public domain. A Christmas Carol. Preface. I have endeavoured in this ghostly little book to raise the ghost of an idea, which will not put my readers out of humour with themselves, with each other, with a season, or with


## [HuggingFace](https://huggingface.co) _Diffusers_ 🧨

HuggingFace's `diffusers` package is the companion library for pretrained diffusion models.

https://huggingface.co/docs/diffusers/index

> diffusion models for generating images, audio, and even 3D structures of molecules

```sh
pip install --upgrade "diffusers[torch]"
```

In [None]:
from diffusers import DiffusionPipeline

my_image_generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# running on Mac M1
my_image_generator = my_image_generator.to("mps")

# Recommended if your computer has < 64 GB of RAM
my_image_generator.enable_attention_slicing()
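
The `"mps"` device only exists on Apple-silicon Macs. On other machines, a small helper can choose the device instead of hard-coding it. A minimal sketch; `pick_device` is a made-up name, and it falls back to `"cpu"` when PyTorch is missing or no accelerator is found.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent in older PyTorch releases, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The cell above would then use `my_image_generator.to(pick_device())`. Expect generation on `"cpu"` to be very slow.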

In [None]:
result = my_image_generator("psychedelic cat listening to music")

In [None]:
result.images[0].save("output/my-image.png")

In [None]:
ACTIVITIES = [
    "eating a slice of lasagna",
    "playing the piano",
    "hitting a baseball with a bat",
]

for activity in ACTIVITIES:
    result = my_image_generator("comic strip of garfield the cat " + activity)
    result.images[0].save(f"output/{activity}.png")
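
The loop above uses the raw activity text as the file name, spaces included. A minimal sketch of turning each prompt into a tidier file name; `output_path` is a made-up helper, and the slug rule (lowercase, hyphens for everything that is not a letter or digit) is just one reasonable choice.

```python
import re
from pathlib import Path


def output_path(activity, directory="output"):
    """Build a file path with a lowercase, hyphen-separated name."""
    slug = re.sub(r"[^a-z0-9]+", "-", activity.lower()).strip("-")
    return Path(directory) / f"{slug}.png"


print(output_path("eating a slice of lasagna").name)  # eating-a-slice-of-lasagna.png
```

Since saving an image does not create missing folders, calling `Path("output").mkdir(parents=True, exist_ok=True)` once before the loop avoids an error on a fresh checkout.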