# Lecture 5: Coding with Hugging Face

This lecture takes place on Google Colab for the extra processing power:

https://colab.research.google.com/drive/1XNC5i1056nq_r8pvJjf2EAsmKw9bGOMD

Note that you need to install the following (`python-dotenv` provides the `dotenv` module imported in the first code cell):

* pip install -q transformers diffusers torch python-dotenv

Transformers and Diffusers are Hugging Face libraries.

1. Transformers is a Python library created by Hugging Face that lets you download, manipulate, and run thousands of pretrained, open-source AI models. These models cover tasks across modalities such as natural language processing, computer vision, audio, and multimodal learning. You specify a task (such as `text-generation`), of which there are more than 20, and if you don't specify a model the pipeline uses a default model for that task. Several of these tasks are used below. See the [Documentation](https://huggingface.co/docs/transformers/v4.57.1/en/main_classes/pipelines#transformers.Pipeline) for a list of all tasks.
2. Diffusers is the Hugging Face library for diffusion-based models, a class of generative models that are trained by progressively adding noise to data and learning to reverse that process; at generation time they start from noise and denoise it step by step into new, high-quality data. They have become popular for generating images, videos, and other data types. See the [Documentation](https://huggingface.co/docs/diffusers/using-diffusers/unconditional_image_generation) for the types of image generation you can do.

Both libraries download a model's weights and biases, but they need a backend such as PyTorch or TensorFlow to run the matrix operations on them; this lecture uses PyTorch (`torch`). The `-q` option tells pip to run quietly, i.e. to show less output.
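To make the "progressively adding noise" idea from the diffusion description above concrete, here is a minimal pure-Python sketch of the forward (noising) half of the process. The linear schedule, the values `1e-4` and `0.02`, the 1000 steps, and the function names are all assumptions chosen for illustration; they are not taken from Diffusers or from any particular model.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances, one per diffusion step (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] is the running product of (1 - beta): how much signal is left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Jump directly to the noisy version of x0 at the step with this cumulative alpha."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for a few pixel values
alpha_bars = cumulative_alphas(linear_betas(1000))
for t in (0, 499, 999):
    noisy = add_noise(x0, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

Early steps leave the data almost untouched, while the last step is close to pure noise. A trained diffusion model learns to undo this one step at a time, which is why generation involves many passes through the network.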

In [2]:
# Import libraries
import os
import torch
from dotenv import load_dotenv 
from huggingface_hub import login

# load the .env file
load_dotenv()

# and get the API key
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")

if HUGGINGFACE_API_KEY is None:
    raise RuntimeError("HUGGINGFACE_API_KEY is missing; add it to your .env file")

# And log in to Hugging Face
login(token=HUGGINGFACE_API_KEY)

# The pipeline function is the quickest way to get started with any of the models from Hugging Face
from transformers import pipeline
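The key-loading step above can be wrapped in a small helper so that a missing or empty key fails early with a clear message. This is only a sketch: `require_env` is a hypothetical name, and the token in the usage example is a placeholder, not a real key.

```python
import os

def require_env(name, environ=None):
    """Return the value of an environment variable, or raise with a helpful message."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Usage with a fake environment so the example runs without a real key
fake_env = {"HUGGINGFACE_API_KEY": "hf_example_token"}
print(require_env("HUGGINGFACE_API_KEY", fake_env))
```

In the cell above, the returned value would be what gets passed to `login(token=...)`.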

## Text Generation

In [3]:
# The quickest way is to name the task you want to perform and let the library pick the model for you.
# Text generation is not like asking a question and getting an answer. It is more like autocomplete:
# you give it a prompt and it continues the text from there.
generator = pipeline("text-generation")
# Request at most 200 tokens in total (prompt plus generated text). As the log below shows, the
# default max_new_tokens=256 takes precedence here; pass max_new_tokens=200 to cap the output explicitly.
gen_output = generator("The secret to baking a good cake is ", max_length=200)[0]['generated_text']
print('\nOutput: \n',gen_output)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.





Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Output: 
 The secret to baking a good cake is vernacular.

Just make sure you don't mess with the dough.

If you are going to use a plastic bag, make sure it is sturdy and doesn't get twisted.

If you are going to use baking soda, make sure it is well-ventilated.

If you are going to use a mix of corn starch, butter and baking soda, make sure you mix well.

Make sure you leave enough space for the butter to stick to the edges of the dough.

This cake is a bit trickier to do, so don't be afraid to use a larger bag.

When it comes to baking cakes, I've used a butter-filled bag that's about 5.5 inches thick. It comes in a variety of sizes, but the one that comes with the Cake Bowl is 4.5 inches.

If you're making these cakes in a large bowl, you can use a smaller bag.

If you're making these cakes in a small bag, you can use a larger bag.

If you're making these cakes in a larger bowl, you can use a smaller bag.

If you are making these cakes in a larger bowl, you can use a smaller bag.
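The "more like autocomplete" behaviour, including the repetition at the end of the output above, can be illustrated with a toy generator that repeatedly appends the most likely next word. This is a sketch only: the lookup table and the `generate` function are made up for the example, whereas a real model such as GPT-2 works on subword tokens and picks from a probability distribution over its whole vocabulary.

```python
# Hypothetical toy "model": for each word, the single most likely next word.
NEXT_WORD = {
    "the": "secret",
    "secret": "to",
    "to": "baking",
    "baking": "a",
    "a": "good",
    "good": "cake",
    "cake": "is",
    "is": "patience",
}

def generate(prompt, max_new_tokens=5):
    """Greedy autocomplete: keep appending the most likely next token."""
    tokens = prompt.lower().split()
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        next_token = NEXT_WORD.get(tokens[-1])
        if next_token is None:  # no known continuation: stop early
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("The secret", max_new_tokens=3))   # the secret to baking a
print(generate("The secret", max_new_tokens=20))  # stops once no continuation is known
```

The `max_new_tokens` argument here counts only the appended tokens, which mirrors the distinction the log above draws between `max_new_tokens` and `max_length` (prompt plus generated tokens).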


## Text Summarization

In [None]:
# On a GPU runtime, use the commented-out line with device="cuda"; on a CPU, leave device out.
# summarizer = pipeline("summarization", device="cuda")
summarizer = pipeline("summarization")
sum_output = summarizer(gen_output, max_length=50)
print(sum_output)
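Printing `sum_output` shows the raw return value, which for this pipeline is a list with one dictionary per input, with the text stored under `summary_text` (just as the text-generation cell indexed `[0]['generated_text']`). A small sketch of unpacking it follows; the sample output is a made-up stand-in, not real model output, and `first_text` is a hypothetical helper.

```python
# Illustrative stand-in for what summarizer(...) returns: a list with one dict per input.
# The text here is invented for the example, not real model output.
sample_output = [{"summary_text": "Mix the ingredients well and leave space for the butter."}]

def first_text(results, key):
    """Pull the text stored under `key` from the first result of a pipeline call."""
    if not results:
        raise ValueError("pipeline returned no results")
    return results[0][key]

summary = first_text(sample_output, "summary_text")
print(summary)
```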

## Text Classification

In [None]:
# On a GPU runtime, add device="cuda" as in the commented-out line; on a CPU, leave it out.
# You don't strictly need to pass "text-classification": the pipeline can infer the task from the model, which is already fine-tuned for it.
# classifier = pipeline(model="ProsusAI/finbert", device="cuda")
classifier = pipeline("text-classification", model="ProsusAI/finbert")
class_output = classifier("I really want this to be done", max_length=100)
print(class_output)
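The classifier's output is a label with a score between 0 and 1. As a rough sketch of where such a score comes from, a classification model produces one raw number (a logit) per class, and a softmax turns those into probabilities. The logits below are invented, and the label order is an assumption made for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["positive", "negative", "neutral"]  # FinBERT's sentiment classes (order assumed)
logits = [0.2, -1.3, 2.1]                     # hypothetical raw model outputs
scores = softmax(logits)
best = max(range(len(labels)), key=lambda i: scores[i])
result = [{"label": labels[best], "score": scores[best]}]
print(result)
```

The `result` list has the same shape as the `class_output` printed above: one dictionary holding the winning label and its probability.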

## Translation

In [None]:
# On a GPU runtime, use the commented-out line with device="cuda"; on a CPU, leave device out.
# translator = pipeline("translation_en_to_fr", model="google-t5/t5-base", device="cuda")

# Note: this one failed to work; the download of model.safetensors never started.
translator = pipeline("translation_en_to_fr", model="google-t5/t5-base")
translator_output = translator("Fly me to the moon", max_length=100)
print(translator_output)
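T5 is a text-to-text model, so the task name `translation_en_to_fr` ends up as a plain-text instruction placed in front of the input. The sketch below shows that idea under the assumption that the prefixes look like the ones listed; `TASK_PREFIXES` and `build_t5_input` are hypothetical names, and in the cell above the pipeline handles this step for you.

```python
# Prefixes of the kind T5 checkpoints are configured with; treated here as assumptions.
TASK_PREFIXES = {
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_de": "translate English to German: ",
}

def build_t5_input(task, text):
    """Prepend the task instruction so the model knows what to do with the text."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"no prefix known for task {task!r}")
    return TASK_PREFIXES[task] + text

print(build_t5_input("translation_en_to_fr", "Fly me to the moon"))
```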

## Image Generation

In [None]:
# Can't get this to work, although it did work in Colab, where it took 16 minutes.
from diffusers import DiffusionPipeline

# Load an image generator
image_gen = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Generate the image (stored under a new name so the pipeline object is not overwritten)
image = image_gen("A fire cat riding on a spaceship").images[0]
# and display it
image
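The cells above switch between CPU and GPU by commenting lines in and out, and a run time like the 16 minutes noted for image generation is the kind of cost a GPU usually reduces. A minimal sketch for choosing the device automatically is shown below; `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch not installed: fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Running on: {device}")
```

The result can then be passed as `device=device` to `pipeline(...)`, or applied to a diffusion pipeline with `.to(device)` after loading it.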