<a href="https://colab.research.google.com/github/barbaroja2000/llm/blob/main/Transformer_Agents_OpenAi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers Agents with OpenAI

This Colab notebook explores the main features of Transformers Agents:

* Image Generation & Modification
* Audio production & transcription
* Video generation
* Chat Mode
* Custom Tools

To run the notebook, you need an OpenAI API key and a Hugging Face API key, set as environment variables:

```
OPENAI_API_KEY=""
HUGGINGFACE_API_KEY=""
```
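A missing key usually only shows up as an error once the agent first calls OpenAI or the Hub. A quick check right after loading the keys fails earlier and more clearly. This is a minimal stdlib sketch; `missing_keys` is a hypothetical helper, not part of any library, and the key names are taken from the block above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names from the block above; adjust if your keys.env uses different ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")


def missing_keys(required: Iterable[str] = REQUIRED_KEYS,
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Passing an explicit mapping makes the behaviour easy to see:
print(missing_keys(env={"OPENAI_API_KEY": "dummy", "HUGGINGFACE_API_KEY": ""}))
# ['HUGGINGFACE_API_KEY']
```

In the notebook itself, calling `missing_keys()` with no arguments after the Load Keys cell checks the real environment.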



![Image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/diagram.png)
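As the diagram shows, the agent sends the instruction plus descriptions of the available tools to the LLM, receives a short Python snippet back, and executes that snippet with the tools in scope. The toy sketch below mimics that loop with stub tools (`caption`, `translate`) and a hard-coded `fake_llm` response; all of these names are hypothetical. It also shows how keyword arguments such as `image=` in `agent.run(..., image=picture)` become variables the generated code can use.

```python
from typing import Any, Callable, Dict


# Hypothetical stub tools; the real agent uses model-backed tools.
def caption(image: str) -> str:
    return f"a caption for {image}"


def translate(text: str, tgt_lang: str) -> str:
    return f"[{tgt_lang}] {text}"


TOOLS: Dict[str, Callable[..., Any]] = {"caption": caption, "translate": translate}


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM call: returns code the model might generate."""
    return 'result = translate(caption(image), tgt_lang="French")'


def toy_run(task: str, **kwargs: Any) -> Any:
    # 1. Describe the tools to the model alongside the task.
    tool_list = "\n".join(f"- {name}" for name in TOOLS)
    prompt = f"Tools:\n{tool_list}\n\nTask: {task}\n\nAnswer with Python code."
    code = fake_llm(prompt)
    # 2. Execute the generated code with only the tools and the run() kwargs
    #    (e.g. image=...) in scope. An empty __builtins__ is NOT a real sandbox.
    namespace: Dict[str, Any] = {"__builtins__": {}, **TOOLS, **kwargs}
    exec(code, namespace)
    return namespace["result"]


print(toy_run("Caption this image in French", image="cats.png"))
# [French] a caption for cats.png
```

Rather than plain `exec`, the library runs the generated code through its own restricted Python interpreter; the sketch only illustrates the overall flow.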

**Built-in tools:**

* Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)
* Text question answering: given a long text and a question, answer the question in the text (Flan-T5)
* Unconditional image captioning: Caption the image! (BLIP)
* Image question answering: given an image, answer a question on this image (ViLT)
* Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
* Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
* Text to speech: convert text to speech (SpeechT5)
* Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)
* Text summarization: summarize a long text in one or a few sentences (BART)
* Translation: translate the text into a given language (NLLB)

**Custom Tools:**

* Text downloader: download text from a web URL
* Text to image: generate an image from a prompt, leveraging Stable Diffusion
* Image transformation: modify an initial image according to a prompt, leveraging Instruct Pix2Pix (Stable Diffusion)
* Text to video: generate a short video from a prompt, leveraging damo-vilab

**References:**

https://huggingface.co/docs/transformers/en/transformers_agents

https://colab.research.google.com/drive/1c7MHD-T1forUPGcC_jlwsIptOzpG3hSj


In [None]:
#@title Load Keys
# Load OPENAI_API_KEY and HUGGINGFACE_API_KEY from a keys.env file stored on Google Drive
!python -m pip install -q python-dotenv
import os

import dotenv
from google.colab import drive

drive.mount('/content/drive/', force_remount=True)
dotenv.load_dotenv('/content/drive/MyDrive/keys/keys.env')


In [None]:
#@title Installation
transformers_version = "v4.29.2" #@param ["main", "v4.29.2"] {allow-input: true}
!pip install huggingface_hub>=0.14.1 git+https://github.com/huggingface/transformers@$transformers_version -q diffusers accelerate datasets torch soundfile sentencepiece opencv-python openai transformers
!pip install youtube_transcript_api beautifulsoup4

In [None]:
#@title Sound
import soundfile as sf
from IPython.display import Audio

def play_audio(audio):
    # The text-to-speech tool (SpeechT5) returns 16 kHz audio as a torch tensor
    sf.write("speech_converted.wav", audio.numpy(), samplerate=16000)
    return Audio("speech_converted.wav")
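`play_audio` writes the tensor to a WAV file at 16 kHz, the sample rate SpeechT5 produces. For intuition about what that write involves, here is a stdlib-only sketch that encodes float samples in [-1, 1] as 16-bit PCM WAV bytes. `float_to_wav_bytes` is a hypothetical helper and assumes mono audio; in the notebook, `soundfile` does this work.

```python
import io
import math
import struct
import wave
from typing import Sequence

SAMPLE_RATE = 16_000  # same rate play_audio passes to sf.write


def float_to_wav_bytes(samples: Sequence[float], rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


# One second of a 440 Hz tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
wav_bytes = float_to_wav_bytes(tone)
```

Reading the bytes back with `wave.open(io.BytesIO(wav_bytes), "rb")` reports one channel, 2-byte samples and 16000 frames per second.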

# OpenAI Agent

In [None]:
from transformers.tools import OpenAiAgent

# The API key is read from the OPENAI_API_KEY environment variable
agent = OpenAiAgent(model="gpt-4")

## Using the agent

In [None]:
# Example: "Generate an image of two cheshire cats, one black, one tabby staring at the camera"
picture_seed = 'Generate a picture of two godzillas fighting. photorealistic. both godzillas should be side on and fully visible in the frame' #@param {type:"string"}

In [None]:
picture = agent.run(picture_seed)
picture

In [None]:
# Instruction for modifying the image generated above
picture_seed2 = 'the godzillas should be replaced with lego versions' #@param {type:"string"}

In [None]:
picture_replaced = agent.run(picture_seed2, image=picture)
picture_replaced

# Audio Production & Transcription

In [None]:
audio = agent.run("Read out loud the summary of https://en.wikipedia.org/wiki/Chuck_Norris")
play_audio(audio)

In [None]:
agent.run("Provide transcript of this audio", audio=audio)

# Video
Text-to-video generation requires a lot of memory and crashes on the standard Colab runtime.

In [None]:
# Prompt for the text-to-video tool
video_seed = 'Generate a video of Darth Vader dancing' #@param {type:"string"}

In [None]:
video = agent.run(video_seed)

In [None]:
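import imageio
import numpy as np
from base64 import b64encode
from IPython.display import HTML

def produce_video(frames, path="/content/video.mp4", fps=5):
    # Convert each frame to an 8-bit array so imageio can encode it
    result = [np.asarray(frame).astype("uint8") for frame in frames]
    # Writing MP4 needs imageio's ffmpeg backend (imageio-ffmpeg)
    imageio.mimsave(path, result, fps=fps)
    # Read the same file back and embed it as a base64 data URL in an HTML5 player
    with open(path, "rb") as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    return f"""
    <video width=400 controls>
      <source src="{data_url}" type="video/mp4">
    </video>
    """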
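`produce_video` does not serve the file; it embeds the whole MP4 in the notebook output as a base64 `data:` URL. A minimal stdlib sketch of that encoding step and its inverse, with hypothetical helper names `to_data_url` and `from_data_url`:

```python
from base64 import b64decode, b64encode


def to_data_url(payload: bytes, mime_type: str = "video/mp4") -> str:
    """Embed raw bytes in a base64 data: URL."""
    return f"data:{mime_type};base64,{b64encode(payload).decode('ascii')}"


def from_data_url(url: str) -> bytes:
    """Recover the bytes from a base64 data: URL (inverse of to_data_url)."""
    header, _, encoded = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data: URL")
    return b64decode(encoded)


print(to_data_url(b"abc"))  # data:video/mp4;base64,YWJj
```

Base64 turns every 3 bytes into 4 characters, so the embedded video is roughly a third larger than the file on disk, which matters for longer clips.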

In [None]:
video_html = produce_video(video)
HTML(video_html)

# Chat Mode
- `.run` does not keep memory across runs, but performs better at multi-step instructions (such as running two or three tools in a row from a single instruction).
- `.chat` keeps memory across runs, but performs better at single instructions.
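The difference between the two modes comes down to what the LLM sees on each call. A toy illustration, with a hypothetical `ToyAgent` standing in for the real agent and a stub in place of the LLM: `chat` accumulates the conversation, while `run` starts from an empty context every time.

```python
from typing import List


class ToyAgent:
    """Hypothetical illustration of run() vs chat() memory; not the library class."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def _answer(self, prompt: str) -> str:
        # Stub LLM: reports how many instructions it was shown.
        return f"saw {prompt.count('Human:')} instruction(s)"

    def run(self, task: str) -> str:
        # Stateless: each call sees only the current instruction.
        return self._answer(f"Human: {task}")

    def chat(self, task: str) -> str:
        # Stateful: earlier turns are prepended to the prompt.
        self.history.append(f"Human: {task}")
        return self._answer("\n".join(self.history))

    def prepare_for_new_chat(self) -> None:
        # Clear the accumulated conversation.
        self.history.clear()


toy = ToyAgent()  # named "toy" so it does not shadow the notebook's real agent
print(toy.chat("Create an image of an Oompa Loompa"))  # saw 1 instruction(s)
print(toy.chat("Make it look like a leprechaun"))      # saw 2 instruction(s)
print(toy.run("Make it look like a leprechaun"))       # saw 1 instruction(s)
```

The real agent also provides `agent.prepare_for_new_chat()` to clear the chat history before starting an unrelated conversation.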

In [None]:
# Example: "Generate an image of two cheshire cats, one black, one tabby staring at the camera"
picture_seed3 = 'Create a photorealistic image of an Oompa Loompa' #@param {type:"string"}

In [None]:
agent.chat(picture_seed3)

In [None]:
# Follow-up instruction: chat mode remembers the image generated above
picture_seed4 =  'Change this image so the Oompa Loompa looks like a leprechaun' #@param {type:"string"}

In [None]:
agent.chat(picture_seed4)

# Custom Tools

The example below defines a tool that takes a YouTube video ID and returns the video's transcript.
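The tool expects a bare video ID, while users often have a full link at hand. A small stdlib sketch for pulling the ID out of a URL; `extract_video_id` is a hypothetical helper, and it only handles the two common shapes (`youtube.com/watch?v=` and `youtu.be/`), not `/shorts/` or `/embed/` links.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or the input if it is already an ID."""
    if "://" not in url_or_id:
        # No scheme: assume a bare video ID was passed.
        return url_or_id.strip() or None
    parsed = urlparse(url_or_id)
    host = parsed.hostname or ""  # urlparse lower-cases the hostname
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=uRIWgbvouEw&t=5s"))  # uRIWgbvouEw
print(extract_video_id("https://youtu.be/uRIWgbvouEw"))                      # uRIWgbvouEw
```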

In [None]:
import requests
from transformers import Tool
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

class YouTubeTranscriptFetcher(Tool):
    # The agent picks tools by name and description, so the description
    # states what the tool takes and what it returns
    name = "youtube_transcript_fetcher"
    description = ("This is a tool that fetches a transcript of a youtube video. "
                   "It takes input of video id, and returns the transcript of the video.")

    inputs = ["text"]
    outputs = ["text"]

    @staticmethod
    def _check_video_url(video_id: str) -> bool:
        # Basic sanity check that the watch page responds; YouTube may still
        # return 200 for IDs with no available video
        checker_url = f"https://www.youtube.com/watch?v={video_id}"
        response = requests.get(checker_url, timeout=10)
        return response.status_code == 200

    def __call__(self, video_id: str) -> str:
        # Log the ID the agent passed in (useful when debugging agent runs)
        print(video_id)

        if not video_id or not self._check_video_url(video_id):
            raise ValueError("Must pass a valid YouTube video ID")

        # Fetch the transcript snippets and join them into plain text
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return TextFormatter().format_transcript(transcript)



In [None]:
# A conversation with OpenAI CEO Sam Altman
tool = YouTubeTranscriptFetcher()
tool("uRIWgbvouEw")
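In the youtube_transcript_api versions this notebook was written against, `get_transcript` returns a list of dicts with `text`, `start` and `duration` keys, and `TextFormatter` roughly drops the timings and joins the text pieces with newlines. A stdlib sketch of that formatting step, using made-up sample data; `format_transcript_text` is a hypothetical name, not the library's function.

```python
from typing import Dict, List, Union

Snippet = Dict[str, Union[str, float]]


def format_transcript_text(transcript: List[Snippet]) -> str:
    """Join the snippet texts with newlines, discarding timing information."""
    return "\n".join(str(snippet["text"]) for snippet in transcript)


sample = [
    {"text": "hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 1.5, "duration": 1.0},
]
print(format_transcript_text(sample))  # prints "hello" and "world" on separate lines
```

Because the output is one long plain-text string, a full interview transcript can be very large, which is worth keeping in mind when asking the agent to summarize it.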

In [None]:
from transformers.tools import OpenAiAgent

# Register the custom tool alongside the built-in tools
agent = OpenAiAgent(model="gpt-4", additional_tools=[tool])
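The agent decides when to call `youtube_transcript_fetcher` based only on the tool's `name` and `description`, which are listed in the prompt next to the built-in tools. A rough sketch of that rendering step, assuming one `- name: description` line per tool; the exact prompt template belongs to the library, and `ToyTool` and `render_toolbox` are hypothetical.

```python
from typing import List


class ToyTool:
    """Minimal stand-in exposing the two attributes the prompt uses."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description


def render_toolbox(tools: List[ToyTool]) -> str:
    """Render one '- name: description' line per tool for the LLM prompt."""
    return "\n".join(f"- {tool.name}: {tool.description}" for tool in tools)


print(render_toolbox([
    ToyTool("summarizer", "Summarizes a long text."),
    ToyTool("youtube_transcript_fetcher",
            "Fetches a transcript of a youtube video from its video id."),
]))
```

This is why the tool's description spells out the input (a video ID) and the output (the transcript): from the model's point of view, that text is the whole tool.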

In [None]:
agent.run("Fetch the youtube transcript of uRIWgbvouEw and summarize")