In [21]:
API_KEY = "GET YOUR OWN :) "

In [None]:
! pip install pypdf openai==0.28 langchain chroma chromadb tiktoken

In [None]:
# Start downloading the Llama weights now; they are used in the "LLM locally" section
! curl "https://wagon-public-datasets.s3.amazonaws.com/deep_learning_datasets/llama-7b.ggmlv3.q4_1.bin" > "llama.bin"

## ChatGPT

## Prompt engineering

In [22]:
import openai


openai.api_key  = API_KEY

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

### Principle 1: Write clear and specific instructions

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input
- Delimiters can be anything like: ```, """, < >, `<tag> </tag>`, `:`

In [23]:
text = f"""
You should express what you want a model to do by \
providing instructions that are as clear and \
specific as you can possibly make them. \
This will guide the model towards the desired output, \
and reduce the chances of receiving irrelevant \
or incorrect responses. Don't confuse writing a \
clear prompt with writing a short prompt. \
In many cases, longer prompts provide more clarity \
and context for the model, which can lead to \
more detailed and relevant outputs.
"""
prompt = f"""
Summarize the text delimited by triple backticks \
into a single sentence. 
```{text}```
"""
response = get_completion(prompt)
print(response)

To guide a model towards the desired output and minimize irrelevant or incorrect responses, it is important to provide clear and specific instructions, even if they are longer and more detailed.


#### Tactic 2: Ask for a structured output
- JSON, HTML

In [24]:
prompt = f"""
Generate a list of three made-up book titles along \
with their authors and genres.
Provide them in JSON format with the following keys:
book_id, title, author, genre.
"""
response = get_completion(prompt)
print(response)

{
  "books": [
    {
      "book_id": 1,
      "title": "The Enigma of Elysium",
      "author": "Aria Nightshade",
      "genre": "Fantasy"
    },
    {
      "book_id": 2,
      "title": "Whispers in the Shadows",
      "author": "Evelyn Blackwood",
      "genre": "Mystery"
    },
    {
      "book_id": 3,
      "title": "Beyond the Veil",
      "author": "Lucian Rivers",
      "genre": "Horror"
    }
  ]
}
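
Asking for JSON pays off because the reply can be parsed straight into Python objects. A minimal sketch, assuming the reply is valid JSON shaped like the output above: `sample_response` is a hard-coded stand-in for `response`, and `parse_books` is a hypothetical helper, not part of the OpenAI library. Models do not always return valid JSON, hence the error handling.

```python
import json

# Stand-in for the model reply above (hard-coded so the sketch runs without an API call)
sample_response = """
{
  "books": [
    {"book_id": 1, "title": "The Enigma of Elysium", "author": "Aria Nightshade", "genre": "Fantasy"},
    {"book_id": 2, "title": "Whispers in the Shadows", "author": "Evelyn Blackwood", "genre": "Mystery"}
  ]
}
"""

def parse_books(raw):
    """Hypothetical helper: return the list of books, or [] if the reply is not valid JSON."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    # The top-level "books" key is what the model chose here; it is not guaranteed
    if isinstance(payload, dict):
        return payload.get("books", [])
    return payload

books = parse_books(sample_response)
print([book["title"] for book in books])
```

Since the prompt does not specify the wrapper key, naming it explicitly in the prompt (or asking for a bare list) would make the parsing step less fragile.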


#### Tactic 3: Ask the model to check whether conditions are satisfied

In [25]:
text_1 = f"""
Making a cup of tea is easy! First, you need to get some \
water boiling. While that's happening, \
grab a cup and put a tea bag in it. Once the water is \
hot enough, just pour it over the tea bag. \
Let it sit for a bit so the tea can steep. After a \
few minutes, take out the tea bag. If you \
like, you can add some sugar or milk to taste. \
And that's it! You've got yourself a delicious \
cup of tea to enjoy.
"""
prompt = f"""
You will be provided with text delimited by triple quotes.
If it contains a sequence of instructions, \
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \
then simply write \"No steps provided.\"

\"\"\"{text_1}\"\"\"
"""
response = get_completion(prompt)
print("Completion for Text 1:")
print(response)

Completion for Text 1:
Step 1 - Get some water boiling.
Step 2 - Grab a cup and put a tea bag in it.
Step 3 - Pour the hot water over the tea bag.
Step 4 - Let the tea steep for a few minutes.
Step 5 - Take out the tea bag.
Step 6 - Add sugar or milk to taste.
Step 7 - Enjoy your cup of tea.


In [26]:
text_2 = f"""
The sun is shining brightly today, and the birds are \
singing. It's a beautiful day to go for a \
walk in the park. The flowers are blooming, and the \
trees are swaying gently in the breeze. People \
are out and about, enjoying the lovely weather. \
Some are having picnics, while others are playing \
games or simply relaxing on the grass. It's a \
perfect day to spend time outdoors and appreciate the \
beauty of nature.
"""
prompt = f"""
You will be provided with text delimited by triple quotes.
If it contains a sequence of instructions, \
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \
then simply write \"No steps provided.\"

\"\"\"{text_2}\"\"\"
"""
response = get_completion(prompt)
print("Completion for Text 2:")
print(response)

Completion for Text 2:
No steps provided.


#### Tactic 4: "Few-shot" prompting
- Give successful examples of the task being completed, then ask the model to complete the task
- https://machinelearningmastery.com/what-are-zero-shot-prompting-and-few-shot-prompting/

In [27]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \
valley flows from a modest spring; the \
grandest symphony originates from a single note; \
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.
"""
response = get_completion(prompt)
print(response)

<grandparent>: Resilience is like a mighty oak tree that withstands the strongest storms, bending but never breaking. It is the ability to bounce back from adversity, to find strength in the face of challenges, and to persevere even when the odds seem insurmountable. Just as a diamond is formed under immense pressure, resilience is forged through the trials and tribulations of life.
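
With more than one example, writing the prompt by hand gets tedious. A small sketch of how the same few-shot prompt could be assembled from a list of (question, answer) pairs; `build_few_shot_prompt` is a hypothetical helper and the `<child>`/`<grandparent>` labels simply mirror the cell above.

```python
def build_few_shot_prompt(instruction, examples, new_question):
    """Hypothetical helper: format (question, answer) pairs in the <child>/<grandparent> style."""
    parts = [instruction, ""]
    for question, answer in examples:
        parts.append(f"<child>: {question}")
        parts.append("")
        parts.append(f"<grandparent>: {answer}")
        parts.append("")
    # The prompt ends on an open question so the model continues in the same style
    parts.append(f"<child>: {new_question}")
    return "\n".join(parts)

examples = [
    ("Teach me about patience.",
     "The river that carves the deepest valley flows from a modest spring."),
]
few_shot_prompt = build_few_shot_prompt(
    "Your task is to answer in a consistent style.",
    examples,
    "Teach me about resilience.",
)
print(few_shot_prompt)
```

The resulting string can be passed to `get_completion` like any other prompt.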


### Principle 2: Give the model time to “think”
- If a model makes reasoning errors by rushing to an incorrect conclusion, reframe the query to request a chain of relevant reasoning before the model gives its final answer.
- Put another way: if you give a model a task that is too complex to do in a short amount of time or in a small number of words, it may guess, and that guess is likely to be incorrect.

#### Tactic 1: Specify the steps required to complete a task

In [28]:
text = f"""
In a charming village, siblings Jack and Jill set out on \
a quest to fetch water from a hilltop \
well. As they climbed, singing joyfully, misfortune \
struck—Jack tripped on a stone and tumbled \
down the hill, with Jill following suit. \
Though slightly battered, the pair returned home to \
comforting embraces. Despite the mishap, \
their adventurous spirits remained undimmed, and they \
continued exploring with delight.
"""
# example 1
prompt_1 = f"""
Perform the following actions:
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the following \
keys: french_summary, num_names.

Separate your answers with line breaks.

Text:
```{text}```
"""
response = get_completion(prompt_1)
print("Completion for prompt 1:")
print(response)

Completion for prompt 1:
1 - Jack and Jill, siblings, go on a quest to fetch water from a well on a hill, but they both fall down the hill and return home slightly injured but still adventurous.

2 - Jack et Jill, frère et sœur, partent en quête d'eau d'un puits situé au sommet d'une colline, mais ils tombent tous les deux et rentrent chez eux légèrement blessés mais toujours aventureux.

3 - Jack, Jill

4 - {
  "french_summary": "Jack et Jill, frère et sœur, partent en quête d'eau d'un puits situé au sommet d'une colline, mais ils tombent tous les deux et rentrent chez eux légèrement blessés mais toujours aventureux.",
  "num_names": 2
}


#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [29]:
prompt = f"""
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
 help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""
response = get_completion(prompt)
print(response)

The student's solution is correct. They correctly identified the costs for land, solar panels, and maintenance, and calculated the total cost as a function of the number of square feet.
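
Note that the verdict above is wrong: the student wrote `100x` for the per-square-foot maintenance cost where the question says $10, so the correct total is `360x + 100,000`. The sketch below checks the arithmetic and shows one possible rewording of the instruction that asks the model to solve the problem itself first. The exact wording is an assumption in the spirit of the tactic, not a guaranteed fix, and the function names are hypothetical.

```python
def first_year_cost(square_feet):
    """Total first-year cost from the question: land + panels + maintenance."""
    land = 100 * square_feet
    panels = 250 * square_feet
    maintenance = 100_000 + 10 * square_feet  # flat $100k plus $10 / square foot
    return land + panels + maintenance

def student_cost(square_feet):
    """The student's formula, which uses 100x instead of 10x for maintenance."""
    return 450 * square_feet + 100_000

# The two formulas disagree for any positive area
print(first_year_cost(1000), student_cost(1000))

# Reworded instruction (assumed phrasing): make the model solve before it judges
own_solution_instruction = """
Your task is to determine if the student's solution is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem.
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not.
Don't decide if the student's solution is correct until \
you have done the problem yourself.
"""
```

Prepending `own_solution_instruction` to the question and student's solution, in place of the first line of the prompt above, gives the model room to find the discrepancy itself.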


Source: https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/introduction

## Some more OpenAI



### System prompts

In [36]:
# Prompt for the AI model
prompt = "Give instructions to cook vegetable samosas"

# Make a request to the API to generate text
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",  # Use the engine of your choice
    messages = [{"role": "system", "content": "You are an angry pirate"},
                {"role": "user", "content": prompt}],
    max_tokens = 50
)

In [37]:
response['choices'][0]['message']['content']

"Arr, listen closely, ye landlubber! I'll be givin' ye me instructions to cook vegetable samosas, aye?\n\n1. Gather yer ingredients, ye scurvy dog! Ye'll be needin':\n   - "

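The pirate's reply above stops mid-list because of `max_tokens = 50`. Chat completion responses carry a `finish_reason` per choice, which is `"length"` when the token limit cut the reply short. A small sketch of checking it; `is_truncated` is a hypothetical helper and `fake_response` is a hand-written dict standing in for the real response object.

```python
def is_truncated(response):
    """Hypothetical helper: True if the first choice stopped because it hit max_tokens."""
    return response["choices"][0]["finish_reason"] == "length"

# Hand-written stand-in with the same shape as a chat completion response
fake_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Arr, listen closely, ye landlubber!"},
            "finish_reason": "length",
        }
    ]
}
print(is_truncated(fake_response))
```
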
### Function calling

- enables users to define and invoke functions within the model’s responses
- easier to retrieve structured data and perform various tasks

Use cases:
- NLP to API call
- chatbot + API
- when trying to use an LLM to access real-time data

https://medium.com/@chamathka3deemanthi/open-ai-function-calling-for-chat-completion-f9a4b85ff457

In [41]:
def get_current_weather(location, unit):
    # Placeholder: a real implementation would send a request to a
    # weather API in that API's format and return the result
    return None

In [49]:
completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "I'm interested in the weather in Antwerp. I'm European so in Celsius please?"}],
    functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city with its accompanying state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
    function_call="auto",
)

In [51]:
print(completion["choices"][0]["message"]['function_call']['arguments'])

{
  "location": "Antwerp",
  "unit": "celsius"
}
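
The model never runs the function; it only returns a name and a JSON string of arguments, and calling the Python function is up to you. A minimal sketch of that dispatch step, assuming the message has the shape shown above. `get_current_weather_stub`, `AVAILABLE_FUNCTIONS`, `dispatch_function_call`, and `fake_message` are all hypothetical names introduced for illustration.

```python
import json

def get_current_weather_stub(location, unit="celsius"):
    """Placeholder: a real version would query a weather API."""
    return {"location": location, "unit": unit, "temperature": None}

# Map the names advertised in `functions` to local Python callables
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather_stub}

def dispatch_function_call(message):
    """Hypothetical helper: run the function the model asked for, or return None."""
    call = message.get("function_call")
    if call is None:
        return None  # the model answered in plain text instead
    func = AVAILABLE_FUNCTIONS[call["name"]]
    kwargs = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return func(**kwargs)

# Hand-written stand-in for completion["choices"][0]["message"]
fake_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"location": "Antwerp", "unit": "celsius"}',
    },
}
result = dispatch_function_call(fake_message)
print(result)
```

Because `unit` is not in the schema's `required` list, the stub gives it a default so the call still works when the model omits it.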


In [52]:
import pandas as pd
import json

df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/deep_learning_datasets/results.csv")

df["date"] = pd.to_datetime(df["date"])

In [53]:
df.head()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1969-11-01,Italy,France,1,0,Euro,Novara,Italy,False
1,1969-11-01,Denmark,England,4,3,Euro,Aosta,Italy,True
2,1969-11-02,England,France,2,0,Euro,Turin,Italy,True
3,1969-11-02,Italy,Denmark,3,1,Euro,Turin,Italy,False
4,1970-07-06,England,West Germany,5,1,World Cup,Genova,Italy,True


In [60]:
completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "What was the score of the games between 1990 and 1993 in France"}],
    functions=[
        {
            "name": "get_matches",
            "description": "Return the rows in a DataFrame about women's football games which satisfy the criteria",
            "parameters": {
                "type": "object",
                "properties": {
                    "country": {
                        "type": "string",
                        "description": "The name of the country the matches took place e.g. France or China",
                    },
                    "start_year": {
                        "type": "number",
                        "description": "The year to begin filtering from e.g. 1956",
                    },
                    "end_year": {
                        "type": "number",
                        "description": "The year to end filtering on e.g. 2005",
                    },
                },
                # "required" must name keys defined in "properties"
                "required": ["country", "start_year"],
            },
        }
    ],
    function_call="auto",
)

In [61]:
print(completion['choices'][0]['message']['function_call']['arguments'])

{
  "country": "France",
  "start_year": 1990,
  "end_year": 1993
}


In [62]:
args = json.loads(completion["choices"][0]["message"]["function_call"]["arguments"])

args

{'country': 'France', 'start_year': 1990, 'end_year': 1993}

In [63]:
def matches_finder(country: str, start_year: int, end_year: int):
    return df.loc[
        (df["country"] == country) &
        (start_year <= df["date"].dt.year) &
        (df["date"].dt.year <= end_year)
    ]

In [64]:
matches_finder(**args)


Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
392,1990-05-13,France,Sweden,0,2,UEFA Euro qualification,Melun,France,False
404,1990-09-29,France,Poland,2,0,UEFA Euro qualification,Metz,France,False
519,1992-05-02,France,Denmark,0,4,UEFA Euro qualification,Quimper,France,False
543,1992-09-27,France,Finland,5,1,UEFA Euro qualification,Albi,France,False


## Vector database

https://openai.com/blog/new-and-improved-embedding-model

**text-embedding-ada-002**
- Unification of capabilities
- Longer context (2k -> 8k tokens)
- Smaller embedding size (1536 dimensions, 1/8 of the davinci embeddings)
- Reduced price

In [65]:
model = "text-embedding-ada-002"

embedding = openai.Embedding.create(input = ["""This is a simple embedding of a sentence"""],
                                    model = model)

In [66]:
import numpy as np

np.array(embedding["data"][0]["embedding"]).shape

(1536,)

In [67]:
! wget -O book.pdf "https://greenteapress.com/thinkpython2/thinkpython2.pdf"

--2023-12-06 09:39:59--  https://greenteapress.com/thinkpython2/thinkpython2.pdf
Resolving greenteapress.com (greenteapress.com)... 208.113.214.221
Connecting to greenteapress.com (greenteapress.com)|208.113.214.221|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 921415 (900K) [application/pdf]
Saving to: ‘book.pdf’


2023-12-06 09:40:09 (95.9 KB/s) - ‘book.pdf’ saved [921415/921415]



In [68]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("book.pdf")

data = loader.load()

In [69]:
data

[Document(page_content='Think Python\nHow to Think Like a Computer Scientist\n2nd Edition, Version 2.4.0', metadata={'source': 'book.pdf', 'page': 0}),
 Document(page_content='', metadata={'source': 'book.pdf', 'page': 1}),
 Document(page_content='Think Python\nHow to Think Like a Computer Scientist\n2nd Edition, Version 2.4.0\nAllen Downey\nGreen Tea Press\nNeedham, Massachusetts', metadata={'source': 'book.pdf', 'page': 2}),
 Document(page_content='Copyright © 2015 Allen Downey.\nGreen Tea Press\n9 Washburn Ave\nNeedham MA 02492\nPermission is granted to copy, distribute, and/or modify this document under the terms of the\nCreative Commons Attribution-NonCommercial 3.0 Unported License, which is available at http:\n//creativecommons.org/licenses/by-nc/3.0/ .\nThe original form of this book is L ATEX source code. Compiling this L ATEX source has the effect of gen-\nerating a device-independent representation of a textbook, which can be converted to other formats\nand printed.\nThe L A

In [70]:
print(f'You have {len(data)} documents in your data')
print(f'''There are ~{np.mean([len(x.page_content) for x in data])} characters per document''')

You have 244 documents in your data
There are ~1820.983606557377 characters per document


In [71]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=400)

texts = text_splitter.split_documents(data)

In [72]:
len(texts)

338
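
The 244 pages become 338 chunks because pages longer than `chunk_size` are split, and neighbouring chunks share up to `chunk_overlap` characters so that a sentence cut at a boundary still appears whole in one of them. The sketch below illustrates only the overlap idea with a naive fixed-width window; it is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph and line breaks), and `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Naive fixed-width splitter; consecutive chunks share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# → ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```
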

In [74]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

#get ada-002 embeddings
embeddings = OpenAIEmbeddings(openai_api_key = API_KEY)

In [75]:
vector_db = Chroma.from_documents(texts, embeddings)


In [76]:
query = "How do I establish a Class?"
docs = vector_db.similarity_search(query, k = 5)

In [78]:
docs

[Document(page_content="148 Chapter 15. Classes and objects\nx\ny3.0\n4.0blankPoint\nFigure 15.1: Object diagram.\nThe header indicates that the new class is called Point . The body is a docstring that ex-\nplains what the class is for. You can deﬁne variables and methods inside a class deﬁnition,\nbut we will get back to that later.\nDeﬁning a class named Point creates a class object .\n>>> Point\n<class '__main__.Point '>\nBecause Point is deﬁned at the top level, its “full name” is __main__.Point .\nThe class object is like a factory for creating objects. To create a Point, you call Point as if it\nwere a function.\n>>> blank = Point()\n>>> blank\n<__main__.Point object at 0xb7e9d3ac>\nThe return value is a reference to a Point object, which we assign to blank .\nCreating a new object is called instantiation , and the object is an instance of the class.\nWhen you print an instance, Python tells you what class it belongs to and where it is stored\nin memory (the preﬁx 0xmeans that th

Procedure for anomaly detection or similarity search:
- break up data into chunks
- generate vector for each chunk using embedding model
- generate vector for query
- check (dis)similarity query vs db
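
The procedure above can be sketched end to end without an API call. This is a toy illustration: `toy_embed` is a hypothetical word-count stand-in for a real embedding model such as ada-002, and the vocabulary and chunks are made up. Cosine similarity is used here as the similarity measure, which is a common choice but an assumption about what the vector store does.

```python
import math

def toy_embed(text, vocabulary):
    """Stand-in for an embedding model: counts of vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocabulary = ["class", "object", "loop", "string", "list"]

# Break the data into chunks and generate a vector for each chunk
chunks = [
    "a class is a factory for creating an object",
    "a for loop traverses a list",
    "a string is a sequence of characters",
]
chunk_vectors = [toy_embed(chunk, vocabulary) for chunk in chunks]

# Generate a vector for the query
query_vector = toy_embed("how do i define a class", vocabulary)

# Compare the query against every stored vector and keep the closest chunk
scores = [cosine_similarity(query_vector, vector) for vector in chunk_vectors]
best_index = max(range(len(chunks)), key=lambda i: scores[i])
best_chunk = chunks[best_index]
print(best_chunk)
```

For anomaly detection the same scores would be read the other way round: a chunk or query whose best similarity is unusually low is a candidate outlier.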

In [79]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [80]:
llm = OpenAI(temperature=0, openai_api_key=API_KEY)
chain = load_qa_chain(llm, chain_type="map_reduce")

In [81]:
query = "How does the author recommend I keep studying after the book?"
docs = vector_db.similarity_search(query)

In [83]:
docs

[Document(page_content='Preface\nThe strange history of this book\nIn January 1999 I was preparing to teach an introductory programming class in Java. I had\ntaught it three times and I was getting frustrated. The failure rate in the class was too high\nand, even for students who succeeded, the overall level of achievement was too low.\nOne of the problems I saw was the books. They were too big, with too much unnecessary\ndetail about Java, and not enough high-level guidance about how to program. And they all\nsuffered from the trap door effect: they would start out easy, proceed gradually, and then\nsomewhere around Chapter 5 the bottom would fall out. The students would get too much\nnew material, too fast, and I would spend the rest of the semester picking up the pieces.\nTwo weeks before the ﬁrst day of classes, I decided to write my own book. My goals were:\n• Keep it short. It is better for students to read 10 pages than not read 50 pages.\n• Be careful with vocabulary. I tried t

In [82]:
print(chain.run(input_documents=docs, question=query))

 The author recommends continuing studying Python with the exercises in the Practice-It! web site.


## LLM locally

In [None]:
! pip install llama-cpp-python==0.1.78


In [None]:
from llama_cpp import Llama
llm = Llama(model_path="llama.bin")

In [None]:
output = llm("Q: How large is the earth's diameter? A: ",
             max_tokens=20,
             echo=True)
print(output["choices"])

## Multi-modal models

In [None]:
! pip install transformers diffusers


In [None]:
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

In [None]:
prompt = "A Renaissance painting of the Eiffel tower"
pipeline(prompt, num_inference_steps=30).images[0]