# Lab Week 2: Introduction to Generative AI with HuggingFace

Welcome to your first lab assignment on Generative AI! In this notebook, you'll explore various aspects of GenAI using HuggingFace tools and models. You'll work with text generation, translation, image generation, and audio processing.

## Attribution
This notebook is adapted from material created by NVIDIA and Dartmouth College, licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), for the **Generative AI: Theory and Applications** MSc module at UWS.
Source materials are available at: https://developer.nvidia.com/gen-ai-teaching-kit-syllabus (NVIDIA Deep Learning Institute Generative AI Teaching Kit)

## Objectives:
1. Set up the necessary libraries and environment
2. Experiment with text generation using GPT-2
3. Perform text translation
4. Generate images using Stable Diffusion
5. Work with audio generation and transcription

Let's get started!

## 1. Setup and Imports

First, let's install the necessary libraries and import them. Run the following cells to set up your environment.

In [1]:
# !pip install -q transformers diffusers torch pydub coqui-tts openai-whisper accelerate numba zv ffmpeg-python

In [2]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import MarianMTModel, MarianTokenizer
from diffusers import StableDiffusionPipeline
from IPython.display import Audio, display
import numpy as np
from TTS.api import TTS
import whisper

USE_CUDA = True
device = "cuda" if USE_CUDA and torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")


Using device: cpu


## 2. Text Generation with GPT-2

Let's start by generating text using the GPT-2 model. You'll create a function to generate text based on a given prompt.

In [3]:
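def generate_text(prompt, max_length=50):
    # Note: the model is reloaded on every call, which is simple but slow
    model_name = "gpt2-medium"
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Pass the attention mask with the input IDs, and set pad_token_id
    # explicitly because GPT-2 has no padding token of its own
    outputs = model.generate(**inputs, max_length=max_length, num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)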

# Test the function
prompt = "Once upon a time in a land far, far away"
generated_text = generate_text(prompt)
print("Generated Text:")
print(generated_text)


### Exercise 1: Creative Writing
Generate a short story using GPT-2. Use a prompt of your choice and set the `max_length` to 100. Analyze the output and discuss any interesting patterns or unexpected results you observe.

In [None]:
# Your code here
creative_prompt = "The robot woke up and realized it had emotions"
story = generate_text(creative_prompt, max_length=100)
print(story)

# Your analysis here
# ...
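
One pattern worth looking for in the analysis is repetition, since text generated with default decoding settings often loops over the same phrase. The sketch below is one possible way to put a number on that. The helper `repeated_ngram_fraction` and both sample strings are illustrative assumptions, not part of the original lab and not real GPT-2 output.

```python
from collections import Counter

def repeated_ngram_fraction(text, n=3):
    """Return the fraction of word n-grams that repeat an earlier n-gram."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Each distinct n-gram contributes (count - 1) repeated occurrences
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

# Hand-written sample strings (assumptions for illustration only)
looping = "the robot was sad the robot was sad the robot was sad"
varied = "the robot woke up and slowly realized it had emotions"

print(repeated_ngram_fraction(looping))  # 0.6
print(repeated_ngram_fraction(varied))   # 0.0
```

To apply it to the exercise, pass `story` to the function. A value close to 0 suggests varied text, while higher values suggest the model is repeating itself.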


## 3. Text Translation

Now, let's work on translating text from one language to another using the MarianMT model.

In [None]:
def translate_text(text, src_lang="en", tgt_lang="fr"):
    model_name = f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"
    model = MarianMTModel.from_pretrained(model_name).to(device)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    
    inputs = tokenizer(text, return_tensors="pt", padding=True).to(device)
    translated = model.generate(**inputs)
    return tokenizer.decode(translated[0], skip_special_tokens=True)

# Test the function
text_to_translate = "Hello, how are you?"
translated_text = translate_text(text_to_translate)
print(f"Original: {text_to_translate}")
print(f"Translated: {translated_text}")

### Exercise 2: Multi-language Translation
Translate a sentence of your choice into three different languages. Then, translate each result back to English. Discuss any changes in meaning or nuances that occurred during the translation process.

In [None]:
# Your code here
original_sentence = "The quick brown fox jumps over the lazy dog"
languages = ["fr", "de", "es"]

for lang in languages:
    translated = translate_text(original_sentence, "en", lang)
    back_translated = translate_text(translated, lang, "en")
    print(f"{lang.upper()}: {translated}")
    print(f"Back to English: {back_translated}\n")

# Your analysis here
# ...
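
To support the discussion of meaning changes, it can help to list exactly which words differ between the original sentence and its back-translation. The following is a minimal sketch using Python's standard `difflib` module. The helper name `word_changes` and the back-translated sentence are hypothetical, chosen for illustration rather than taken from a real model run.

```python
import difflib

def word_changes(original, back_translated):
    """Return the word-level differences and a similarity ratio in [0, 1]."""
    a = original.lower().split()
    b = back_translated.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # Record the operation, the original words, and their replacement
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes, matcher.ratio()

original = "The quick brown fox jumps over the lazy dog"
# Hypothetical back-translation, invented for this example
round_trip = "The fast brown fox jumps over the lazy dog"

changes, ratio = word_changes(original, round_trip)
print(changes)          # [('replace', 'quick', 'fast')]
print(round(ratio, 2))  # 0.89
```

In the exercise loop, calling `word_changes(original_sentence, back_translated)` for each language gives a compact summary to base the written analysis on.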


## 4. Image Generation with Stable Diffusion

Let's explore image generation using the Stable Diffusion model.

In [None]:
def generate_image(prompt, output_path="generated_image.png"):
    #model_id = "runwayml/stable-diffusion-v1-5"
    # https://huggingface.co/OFA-Sys/small-stable-diffusion-v0
    model_id = "OFA-Sys/small-stable-diffusion-v0"

    # Use half precision only when actually running on the GPU; checking
    # `device` (rather than CUDA availability) respects the USE_CUDA flag
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype).to(device)

    image = pipe(prompt, num_inference_steps=10, guidance_scale=7.5).images[0]
    image.save(output_path)
    display(image)

# Test the function
image_prompt = "A futuristic city with flying cars"
generate_image(image_prompt)

### Exercise 3: Creative Image Generation
Generate three different images using creative prompts of your choice. For each image, describe the prompt you used and analyze how well the generated image matches your intention. Discuss any unexpected or interesting elements in the images.

In [None]:
# Your code here
prompts = [
    "A steampunk-inspired coffee machine",
    "An underwater library with merfolk readers",
    "A treehouse skyscraper in a futuristic forest"
]

for i, prompt in enumerate(prompts):
    print(f"Prompt {i+1}: {prompt}")
    generate_image(prompt, f"image_{i+1}.png")
    print("\n")

# Your analysis here
# ...


## 5. Audio Generation and Transcription

Finally, let's work with audio generation and transcription using TTS and Whisper models.

In [None]:
import os
import librosa

def generate_audio(text, output_path="output.wav"):
    # tts_to_file writes WAV data, so the default file name uses a .wav extension
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC", progress_bar=True).to(device)
    tts.tts_to_file(text=text, file_path=output_path)
    abs_path = os.path.abspath(output_path)
    display(Audio(abs_path))
    return abs_path

def transcribe_audio(file_path):
    # Load the Whisper model
    model = whisper.load_model("base")

    # 1. Load the audio file
    audio_array, _ = librosa.load(file_path, sr=whisper.audio.SAMPLE_RATE)
    # 2. Transcribe the NumPy array
    result = model.transcribe(audio_array)

    return result["text"]

# Test the functions
text_to_speak = "Hello, this is a test of text-to-speech conversion."
audio_path = generate_audio(text_to_speak)

transcribed_text = transcribe_audio(audio_path)
print("Transcribed Text:")
print(transcribed_text)

### Exercise 4: Audio Chain
Create a chain of operations: 
1. Generate text using GPT-2
2. Convert that text to speech
3. Transcribe the generated audio back to text

Compare the original generated text with the final transcription. Discuss any differences and potential reasons for these differences.

In [None]:
# Your code here
original_prompt = "The future of artificial intelligence is"
print(f"Original Prompt: {original_prompt}")
generated_text = generate_text(original_prompt, max_length=50)
print(f"Original Generated Text: {generated_text}")

file_path = generate_audio(generated_text, "chain_output.wav")

transcribed_text = transcribe_audio(file_path)
print(f"\nTranscribed Text: {transcribed_text}")

# Your analysis here
# ...
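
A common way to compare a reference text with a transcription is the word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is one simple pure-Python version, not the method prescribed by the lab. The helpers `normalize` and `word_error_rate` and the sample transcription are assumptions made for illustration. Punctuation and case are normalized first, because Whisper may format text differently from the original.

```python
import re

def normalize(text):
    """Lowercase and replace punctuation with spaces, then split into words."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance, keeping only the previous row
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

reference = "Hello, this is a test of text-to-speech conversion."
# Hypothetical transcription, invented for this example
hypothesis = "Hello. This is a test of text to speech conversation."

print(word_error_rate(reference, hypothesis))  # 0.1
```

In the exercise, `word_error_rate(generated_text, transcribed_text)` gives a single number to anchor the discussion: 0 means the words match exactly after normalization, and larger values mean more words were changed, dropped, or added.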


## Conclusion

Congratulations on completing this introduction to Generative AI using HuggingFace tools! In this lab, you've explored text generation, translation, image generation, and audio processing. These are fundamental tasks in the field of GenAI, and understanding how they work is crucial for developing more complex applications.


