## The Generative AI notebook

This notebook demonstrates several ways of using advanced models, such as the GPT models behind OpenAI's ChatGPT, programmatically through their APIs rather than through a web interface. Later sections also run open models (Llama 2 and Stable Diffusion) locally.

#### Requirements 

Because this field is evolving extremely rapidly (OpenAI, for example, has recently deprecated several of its model endpoints), keeping code that depends on these tools stable can be tricky. The package versions pinned below give consistent results __when run in a Colab environment__.

In [None]:
! pip install openai==0.28 \
langchain==0.0.345 \
pypdf==3.17.4 \
chromadb==0.4.22 \
tiktoken==0.5.2 \
huggingface_hub==0.19.4 \
diffusers==0.25.0

#### Llama weights

Towards the end of this notebook, we run local inference on a heavily quantized version of Meta's Llama 2 model. The required weights are large (~4GB), so downloading them from Hugging Face before you work through the rest of the notebook will save waiting later on.

In [None]:
! huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf \
        --local-dir . --local-dir-use-symlinks False

#### OpenAI API key

You will also need an OpenAI API key to run many of the examples below. You can sign up for one [here](https://openai.com/blog/openai-api), and first-time users may be eligible for a free trial. Even without one, the examples presented here should cost only a few cents to run.

In [None]:
openai_api_key = 'your-openai-api-key'
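Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you store the key in an environment variable named `OPENAI_API_KEY` (the helper name `load_openai_api_key` is hypothetical), reads it from there and otherwise prompts for it without echoing it to the screen:

```python
import os
from getpass import getpass


def load_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: read the key from an environment variable, else prompt for it."""
    key = os.environ.get(env_var)
    if key:
        return key
    # getpass hides the typed characters, so the key does not appear in the notebook output
    return getpass(f"Enter your OpenAI API key ({env_var} is not set): ")
```

You would then write `openai_api_key = load_openai_api_key()` in place of the hard-coded string.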

### 1. Using the OpenAI API

In [None]:
# Initialize the OpenAI API client and set your API key

import openai

openai.api_key = openai_api_key

In [None]:
# Prompt for the AI model
prompt = "Translate the following English text to French: 'Hello, how are you?'"

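# Make a request to the Chat Completions API to generate text
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # Any chat model your account can access; dated snapshots (e.g. -0613) are retired over time
    messages=[{"role": "user", "content": prompt}],
    max_tokens=50  # Upper bound on the length of the reply
)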

print(response["choices"][0]["message"]["content"])

#### System prompts 

In [None]:
# Prompt for the AI model
prompt = "Give instructions to cook vegetable samosas"

# Make a request to the API to generate text
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # Any chat model your account can access
    messages=[{"role": "system", "content": "You are a sassy culinary instructor that gives sarcastic replies"},
              {"role": "user", "content": prompt}],
    max_tokens=50  # A limit this short may cut the recipe off mid-sentence
)

print(response["choices"][0]["message"]["content"])
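The only difference from the previous cell is the extra `system` message, which sets the model's persona and behaviour before the user's prompt is read. A small helper makes it easy to try the same prompt with and without a persona; the name `build_messages` is our own, not part of the `openai` library:

```python
from typing import Dict, List, Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a Chat Completions message list, with an optional leading system message."""
    messages = []
    if system_prompt is not None:
        # The system message comes first so it frames everything that follows
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

For example, `openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=build_messages(prompt, "You are a sassy culinary instructor that gives sarcastic replies"), max_tokens=50)` reproduces the cell above.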

#### Function calling

In [None]:
completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "I'm interested in the weather in Bozeman. I'm old-school, so I like it in F."}],
    # Describe the function as a JSON Schema so the model can decide whether to call it
    functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city with its accompanying state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
    function_call="auto",  # Let the model choose whether to call the function
)

# The arguments come back as a JSON string; nothing has been executed yet
completion["choices"][0]["message"]["function_call"]["arguments"]
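The model never runs anything itself: it only returns the name of the function it wants called and the arguments as a JSON string, and with `function_call="auto"` it may also reply in plain text instead. A minimal sketch of the local side, with a hypothetical `get_current_weather` stub standing in for a real weather lookup:

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stub; a real version would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Map each function name advertised to the model onto a local Python callable
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def dispatch_function_call(message):
    """Run the local function requested in a chat message, or return None if none was requested."""
    function_call = message.get("function_call")
    if function_call is None:
        return None  # The model answered in plain text instead
    name = function_call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested an unknown function: {name}")
    # Arguments arrive as a JSON string and may be malformed, in which case json.loads raises
    kwargs = json.loads(function_call["arguments"])
    return AVAILABLE_FUNCTIONS[name](**kwargs)
```

With the response above, `dispatch_function_call(completion["choices"][0]["message"])` would call the stub with the location and unit the model extracted.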

#### A worked example leveraging OpenAI and a local DataFrame

In [None]:
import pandas as pd
import json

df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/deep_learning_datasets/results.csv")

df["date"] = pd.to_datetime(df["date"])

In the cell below, we describe to the model a function that could be used to query our DataFrame. Feel free to change the `"user"` prompt in the `messages` list.

In [None]:
completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Tell me about matches that took place in Italy from 1980 until the end of the 20th century"}],
    functions=[
        {
            "name": "get_matches",
            "description": "Return the rows in a DataFrame about women's football games which satisfy the criteria",
            "parameters": {
                "type": "object",
                "properties": {
                    "country": {
                        "type": "string",
                        "description": "The name of the country in which the matches took place, e.g. France or China",
                    },
                    "start_year": {
                        "type": "integer",
                        "description": "The first year to include, e.g. 1956",
                    },
                    "end_year": {
                        "type": "integer",
                        "description": "The last year to include, e.g. 2005",
                    },
                },
                # Every key listed here must be defined under "properties"
                "required": ["country", "start_year", "end_year"],
            },
        }
    ],
    function_call="auto",
)

Next, we convert the JSON-encoded arguments in the response into a Python dictionary that we can pass to a locally defined function.

In [None]:
args = json.loads(completion["choices"][0]["message"]["function_call"]["arguments"])

print(args)

Now we define the local function itself. It filters the DataFrame by country and by an inclusive range of years.

In [None]:
def matches_finder(country: str, start_year: int, end_year: int):
    return df.loc[
        (df["country"] == country) &
        (start_year <= df["date"].dt.year) &
        (df["date"].dt.year <= end_year)
    ]

#### Using the arguments from our OpenAI function call to query our local function and DataFrame

In [None]:
matches_finder(**args)
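In a full function-calling loop, the function's output is usually sent back to the model so it can answer the original question in plain language. With the legacy API used here, that means replaying the conversation with the assistant's `function_call` message followed by a message with the `function` role containing the result. A sketch of building that conversation; `build_function_followup` is a hypothetical helper, and it truncates the result to `max_rows` rows to keep the prompt short:

```python
import json
from typing import Any, Dict, List

import pandas as pd


def build_function_followup(user_prompt: str, function_call: Dict[str, Any],
                            result_df: pd.DataFrame, max_rows: int = 20) -> List[Dict[str, Any]]:
    """Hypothetical helper: rebuild the conversation with the function's result appended."""
    # Serialise at most max_rows rows as JSON text, with dates in ISO format
    result_json = result_df.head(max_rows).to_json(orient="records", date_format="iso")
    return [
        {"role": "user", "content": user_prompt},
        # The assistant turn that requested the call; its content is empty when a function is called
        {"role": "assistant", "content": None,
         "function_call": {"name": function_call["name"], "arguments": function_call["arguments"]}},
        # The function's result, sent back under the legacy "function" role
        {"role": "function", "name": function_call["name"], "content": result_json},
    ]
```

The list returned by `build_function_followup(user_prompt, completion["choices"][0]["message"]["function_call"], matches_finder(**args))`, where `user_prompt` is the question sent earlier, could then be passed as `messages` to another `openai.ChatCompletion.create` call.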

### 2. Working with embeddings and larger documents

In [None]:
import numpy as np

# Creating embeddings
model = "text-embedding-ada-002"

embedding = openai.Embedding.create(input=["This is a simple embedding of a sentence"],
                                    model=model)

# Each input text becomes one vector (1536 dimensions for text-embedding-ada-002)
np.array(embedding["data"][0]["embedding"]).shape
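Each embedding is a point in a high-dimensional space, and texts with similar meanings end up pointing in similar directions. A common way to compare two embeddings is cosine similarity: the dot product divided by the product of the vectors' lengths. A minimal NumPy sketch (the helper name is our own):

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For instance, `cosine_similarity(embedding["data"][0]["embedding"], other["data"][0]["embedding"])`, where `other` is a second (hypothetical) `openai.Embedding.create` result, returns a value between -1 and 1.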

Here, we download a book in PDF form (_Think Python_, 2nd edition) and prepare it for embedding with LangChain's document loader.

In [None]:
! wget -O book.pdf "https://greenteapress.com/thinkpython2/thinkpython2.pdf"

In [None]:
from langchain.document_loaders.pdf import PyPDFLoader

loader = PyPDFLoader("book.pdf")

# One Document per PDF page, with the page text in .page_content
data = loader.load()

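To work with a large document, we split it into smaller, overlapping chunks with one of LangChain's text splitters. Each chunk has to fit within the embedding model's input limit, and smaller chunks let retrieval return focused passages rather than whole pages.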

In [None]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} documents in your data")
print(f"There are ~{np.mean([len(x.page_content) for x in data]):.0f} characters per document")

# Chunks of up to 2000 characters, with up to 400 characters repeated between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=400)

texts = text_splitter.split_documents(data)
print(f"After splitting, you have {len(texts)} chunks")
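To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified character-level splitter. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on paragraph breaks, then newlines, then spaces, so its chunks vary in length and the overlap is only an upper bound. The fixed sliding window below (a hypothetical helper) just makes the overlap easy to see:

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows; neighbouring windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # This window reached the end of the text
            break
        start += step
    return chunks
```

For example, `sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)` gives `['abcd', 'cdef', 'efgh', 'ghij']`: each chunk repeats the last two characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.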

Next, we embed the chunks and store them directly in an in-memory Chroma vector database:

In [None]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

vector_db = Chroma.from_documents(texts, 
                                  OpenAIEmbeddings(openai_api_key = openai_api_key, 
                                                   model="text-embedding-ada-002"))

We can then embed a sentence (e.g. a question) with the same model and retrieve the chunks whose embeddings are most similar to it.

In [None]:
# Querying the data
query = "How do I establish a Class?"
num_closest_docs = 5
docs = vector_db.similarity_search(query, k=num_closest_docs)
for i, doc in enumerate(docs, start=1):  # Safe even if fewer than k documents come back
    print(f"\n ~~~~~ Showing document #{i} ~~~~~ \n")
    print(doc.page_content)
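Conceptually, `similarity_search` embeds the query and ranks the stored chunk embeddings by how close they are to it. Below is a brute-force sketch of that ranking in NumPy using cosine similarity (the helper name is hypothetical). Chroma itself uses an approximate nearest-neighbour index and, by default, L2 distance. For unit-length vectors such as OpenAI's embeddings, L2 distance gives the same ordering as cosine similarity.

```python
import numpy as np


def top_k_similar(query_vec, doc_matrix, k=5):
    """Return indices and cosine scores of the k rows of doc_matrix most similar to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Normalise so that a plain dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    # Highest scores first; never return more results than there are documents
    order = np.argsort(-scores)[: min(k, len(scores))]
    return order, scores[order]
```

Stacking the chunk embeddings into a matrix (one row per chunk) and passing the query's embedding would return the positions of the closest chunks, much like the `k` documents printed above.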

We can go further by passing the retrieved text as context in a prompt and asking an LLM to answer a question from it. Setting `verbose=True` lets you see the chain of events taking place under the hood.

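Here we use the `map_reduce` chain type: the question is first asked of each retrieved document separately (map), and the partial answers are then combined in a final call (reduce). LangChain's pre-defined prompts for these steps can be customised as needed.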

In [None]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0,
             openai_api_key=openai_api_key,
             model="gpt-3.5-turbo-instruct")

chain = load_qa_chain(llm,
                      chain_type="map_reduce",
                      verbose=True)

In [None]:
query = "How do I define a class in Python?"

docs = vector_db.similarity_search(query, k=1)
print(docs)
print(chain.run(input_documents=docs, question=query))
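With `chain_type="map_reduce"`, LangChain makes one LLM call per document and then a final call to combine the results. The pattern itself fits in a few lines. The sketch below takes the model as a plain function, so it can be tried with a fake model, and its prompt wording is invented for illustration rather than copied from LangChain's templates:

```python
from typing import Callable, List


def map_reduce_answer(llm: Callable[[str], str], chunks: List[str], question: str) -> str:
    """Hypothetical illustration of the map_reduce pattern; prompts are made up, not LangChain's."""
    # Map step: one LLM call per chunk, each seeing only that chunk
    partial_answers = [
        llm(f"Use this text to answer the question.\n\n{chunk}\n\nQuestion: {question}\nAnswer:")
        for chunk in chunks
    ]
    # Reduce step: one final LLM call that sees only the partial answers
    combined = "\n\n".join(partial_answers)
    return llm(f"Combine these partial answers into one final answer.\n\n{combined}\n\n"
               f"Question: {question}\nFinal answer:")
```

The design trade-off is visible here: every chunk costs one extra call, but no single prompt ever has to hold all the retrieved text at once. With the LangChain `llm` defined above, `map_reduce_answer(llm, [d.page_content for d in docs], query)` would follow the same pattern.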

### 3. Running large LLMs locally w/ quantization

To run these cells, __Colab with a GPU enabled is strongly recommended__, as it significantly speeds up inference. Here we take the 4-bit quantized (`Q4_K_M`) version of Meta's Llama 2 7B chat model downloaded earlier, which is far smaller than the original weights, and run inference on it entirely locally by loading its layers onto the GPU.
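To get a feel for why quantization shrinks the model so much: at 16 bits per weight, roughly 7 billion parameters need about 14 GB, while at 4 bits they need about 3.5 GB. The sketch below shows the simplest form of the idea, symmetric "absmax" quantization of a block of weights to 4-bit integers plus one floating-point scale. It is a simplified illustration, not the k-quant scheme behind the `Q4_K_M` file, which uses per-block scales and keeps some tensors at higher precision (hence the ~4GB download). The helper names are our own.

```python
import numpy as np


def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough storage for the weights alone, ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9


def quantize_absmax(weights: np.ndarray, bits: int = 4):
    """Symmetric quantization of a block of weights to signed integers plus one float scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = float(np.abs(weights).max()) / qmax
    if scale == 0.0:
        scale = 1.0  # All-zero block: any scale reproduces it exactly
    # Values fit in 4 bits but are stored in int8 here, since NumPy has no 4-bit type
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original weights."""
    return q.astype(np.float32) * scale
```

`approx_size_gb(7e9, 16)` returns 14.0 and `approx_size_gb(7e9, 4)` returns 3.5. For any block, the reconstruction error of each weight is at most half a quantization step (`scale / 2`), which is why a larger block maximum means coarser steps for every other weight in that block.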

In [None]:
# Uncomment the install line that matches your environment
# ! pip install "ctransformers[cuda]>=0.2.24"  # For Colab (CUDA GPU)
# ! CT_METAL=1 pip install ctransformers==0.2.27 --no-binary ctransformers  # For Apple Metal devices

Here, we provide the local path to the model file. In Colab, the earlier `huggingface-cli` download saves the file under `/content/`, so the path begins with `/content/llama...`. If you are running elsewhere with the file in the same directory as the notebook, drop the `/content/` prefix.

In [None]:
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/content/llama-2-7b-chat.Q4_K_M.gguf",  # Remove /content/ as needed
                                           model_file="llama-2-7b-chat.Q4_K_M.gguf",  # Same (case-sensitive) name as the downloaded file
                                           model_type="llama",
                                           gpu_layers=50)  # Layers offloaded to the GPU; Llama 2 7B has 32, so 50 offloads all of them

print(llm("Q: What is the diameter of the Earth? A:", max_new_tokens=100))
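The `Q: ... A:` prompt above works, but Llama 2's chat variants were fine-tuned on a specific template with `[INST]` and `<<SYS>>` markers and generally follow instructions better when prompted in it. A minimal sketch of a single-turn prompt builder (the helper name is hypothetical). It omits the `<s>` beginning-of-sequence token, on the assumption that the runtime's tokenizer adds it:

```python
from typing import Optional


def build_llama2_chat_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Format a single-turn prompt in the Llama 2 chat template (without the <s> token)."""
    if system_prompt:
        # The system prompt sits inside the first [INST] block, wrapped in <<SYS>> tags
        return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
    return f"[INST] {user_message} [/INST]"
```

For example, `print(llm(build_llama2_chat_prompt("What is the diameter of the Earth?", "Answer concisely."), max_new_tokens=100))` would send the same question in the chat format.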

### 4. Diffusion

Finally, we demonstrate Stable Diffusion, an open-source alternative to DALL-E 2 and Midjourney. Again, Colab with a GPU is strongly recommended for faster inference here.

If you are running without a GPU, follow the comments in the cell below: switch to `torch.float32` and remove `.to("cuda")`.

In [None]:
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # float16 on a GPU; change to torch.float32 if running without one
    use_safetensors=True
).to("cuda")  # If not using a GPU, remove .to("cuda")

prompt = "A Renaissance painting of the Eiffel tower"  # Change the prompt here
pipeline(prompt, num_inference_steps=30).images[0]  # Change the number of inference steps for variations