Commit 0ae5850

docs: Added vision section.

antononcube committed Mar 17, 2024 (1 parent: e364bf2)

Showing 2 changed files with 171 additions and 84 deletions.
34 changes: 34 additions & 0 deletions README-work.md

--------

## AI-vision functions

Consider [this image](https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png):

![](https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png)

Here we import the image (as a Base64 string):

```perl6
use Image::Markup::Utilities;
my $url = 'https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png';
my $img = image-import($url);
$img.substr(^100)
```

Here we apply OpenAI's AI-vision model `gpt-4-vision-preview` (the default) over the ***URL of the image***:

```perl6
llm-vision-synthesize('Describe the image.', $url);
```

Here we apply Gemini's AI-vision model `gemini-pro-vision` over the (Base64) image:

```perl6
llm-vision-synthesize('Describe the image.', $img, e => 'Gemini');
```

**Remark:** Currently, Gemini works with (Base64) images only, not with URLs; OpenAI's vision works with both URLs and images.

The function `llm-vision-function` uses the same evaluators (configurations, models) as `llm-vision-synthesize`.

--------

## Potential problems

With PaLM, a certain wrong configuration produces an error.

221 changes: 137 additions & 84 deletions README.md
Here is the default, OpenAI-based configuration:

```perl6
use LLM::Functions;

.raku.say for llm-configuration('OpenAI').Hash;
```
```
# :name("openai")
# :api-key(Whatever)
# :api-user-id("user:162182730280")
# :module("WWW::OpenAI")
# :base-url("https://api.openai.com/v1")
# :model("gpt-3.5-turbo-instruct")
# :function(proto sub OpenAITextCompletion ($prompt is copy, :$model is copy = Whatever, :$suffix is copy = Whatever, :$max-tokens is copy = Whatever, :$temperature is copy = Whatever, Numeric :$top-p = 1, Int :$n where { ... } = 1, Bool :$stream = Bool::False, Bool :$echo = Bool::False, :$stop = Whatever, Numeric :$presence-penalty = 0, Numeric :$frequency-penalty = 0, :$best-of is copy = Whatever, :api-key(:$auth-key) is copy = Whatever, Int :$timeout where { ... } = 10, :$format is copy = Whatever, Str :$method = "tiny", Str :$base-url = "https://api.openai.com/v1") {*})
# :temperature(0.8)
# :total-probability-cutoff(0.03)
# :max-tokens(300)
# :format("values")
# :prompts($[])
# :prompt-delimiter(" ")
# :examples($[])
# :stop-tokens($[])
# :tools($[])
# :tool-prompt("")
# :tool-request-parser(WhateverCode)
# :tool-response-insertion-function(WhateverCode)
# :images($[])
# :argument-renames(${:api-key("auth-key"), :stop-tokens("stop")})
# :evaluator(Whatever)
```
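
The configuration values can be overridden with named arguments of `llm-configuration`; here is a minimal sketch (the parameter names follow the keys in the dump above; the values are illustrative):

```perl6
# Derive an OpenAI-based configuration with a lower sampling temperature
# and a smaller completion budget.
my $conf = llm-configuration('OpenAI', temperature => 0.2, max-tokens => 120);

say $conf.Hash<temperature>;
```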

Here is the ChatGPT-based configuration:

```perl6
.say for llm-configuration('ChatGPT').Hash;
```
```
# name => chatgpt
# api-key => (Whatever)
# api-user-id => user:475496933842
# module => WWW::OpenAI
# base-url => https://api.openai.com/v1
# model => gpt-3.5-turbo
# function => &OpenAIChatCompletion
# temperature => 0.8
# total-probability-cutoff => 0.03
# max-tokens => 300
# format => values
# prompts => []
# prompt-delimiter =>
# examples => []
# stop-tokens => []
# tools => []
# tool-prompt =>
# tool-request-parser => (WhateverCode)
# tool-response-insertion-function => (WhateverCode)
# images => []
# argument-renames => {api-key => auth-key, stop-tokens => stop}
# evaluator => (my \LLM::Functions::EvaluatorChat_6288007030840 = LLM::Functions::EvaluatorChat.new(context => "", examples => Whatever, user-role => "user", assitant-role => "assistant", system-role => "system", conf => LLM::Functions::Configuration.new(name => "chatgpt", api-key => Whatever, api-user-id => "user:475496933842", module => "WWW::OpenAI", base-url => "https://api.openai.com/v1", model => "gpt-3.5-turbo", function => proto sub OpenAIChatCompletion ($prompt is copy, :$role is copy = Whatever, :$model is copy = Whatever, :$temperature is copy = Whatever, :$max-tokens is copy = Whatever, Numeric :$top-p = 1, Int :$n where { ... } = 1, Bool :$stream = Bool::False, :$stop = Whatever, Numeric :$presence-penalty = 0, Numeric :$frequency-penalty = 0, :@images is copy = Empty, :api-key(:$auth-key) is copy = Whatever, Int :$timeout where { ... } = 10, :$format is copy = Whatever, Str :$method = "tiny", Str :$base-url = "https://api.openai.com/v1") {*}, temperature => 0.8, total-probability-cutoff => 0.03, max-tokens => 300, format => "values", prompts => [], prompt-delimiter => " ", examples => [], stop-tokens => [], tools => [], tool-prompt => "", tool-request-parser => WhateverCode, tool-response-insertion-function => WhateverCode, images => [], argument-renames => {:api-key("auth-key"), :stop-tokens("stop")}, evaluator => LLM::Functions::EvaluatorChat_6288007030840), formatron => "Str"))
```
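
Note that the ChatGPT-based configuration carries a chat evaluator; here is a quick check sketch (assuming an `evaluator` accessor for the field shown in the dump above):

```perl6
# The evaluator of the ChatGPT configuration is a chat evaluator.
say llm-configuration('ChatGPT').evaluator ~~ LLM::Functions::EvaluatorChat;
```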

**Remark:** `llm-configuration(Whatever)` is equivalent to `llm-configuration('OpenAI')`.
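
A one-line check of that equivalence (a sketch, assuming a `name` accessor for the field shown in the dumps above):

```perl6
# Both spec values give the OpenAI-based configuration.
say llm-configuration(Whatever).name eq llm-configuration('OpenAI').name;
```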
Here is the default PaLM configuration:

```perl6
.say for llm-configuration('PaLM').Hash;
```
```
# name => palm
# api-key => (Whatever)
# api-user-id => user:473884737101
# module => WWW::PaLM
# base-url =>
# model => text-bison-001
# function => &PaLMGenerateText
# temperature => 0.4
# total-probability-cutoff => 0
# max-tokens => 300
# format => values
# prompts => []
# prompt-delimiter =>
# examples => []
# stop-tokens => []
# tools => []
# tool-prompt =>
# tool-request-parser => (WhateverCode)
# tool-response-insertion-function => (WhateverCode)
# images => []
# argument-renames => {api-key => auth-key, max-tokens => max-output-tokens, stop-tokens => stop-sequences}
# evaluator => (Whatever)
```
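
The same override mechanism works across providers; here is a sketch (note the `argument-renames` entry above, which maps `max-tokens` to PaLM's `max-output-tokens` behind the scenes; the values are illustrative):

```perl6
# A PaLM-based configuration with a higher temperature and a larger token budget.
my $palmConf = llm-configuration('PaLM', temperature => 0.9, max-tokens => 500);
```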

-----

Here we make an LLM function with a simple (short, textual) prompt:

```perl6
my &func = llm-function('Show a recipe for:');
```
```
# -> $text, *%args { #`(Block|6288098091184) ... }
```

Here we evaluate over a message:

```perl6
say &func('greek salad');
```
```
# Ingredients:
# - 1 large cucumber, diced
# - 1 bell pepper, diced
# - 1 red onion, thinly sliced
# - 2-3 tomatoes, diced
# - 1 cup Kalamata olives, pitted
# - 1 cup feta cheese, crumbled
# - 1/4 cup extra virgin olive oil
# - 2 tablespoons red wine vinegar
# - 1 teaspoon dried oregano
# - Salt and pepper to taste
#
# Instructions:
# 1. In a large salad bowl, combine the cucumber, bell pepper, red onion, tomatoes, and olives.
# 2. In a small bowl, whisk together the olive oil, red wine vinegar, oregano, salt, and pepper.
# 3. Pour the dressing over the vegetables and toss to combine.
# 4. Add the feta cheese on top of the salad.
# 5. Serve immediately or refrigerate for 1-2 hours to allow the flavors to meld together before serving. Enjoy your delicious Greek salad!
```
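
LLM parameters can also be passed when the function is applied; here is a sketch (mirroring the `max-tokens` usage in the named-arguments example further below; the value is illustrative):

```perl6
# Constrain the response length at call time.
say &func('greek salad', max-tokens => 100);
```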

### Positional arguments

Here the prompt is a template with two positional arguments:

```perl6
my &func2 = llm-function(
        {"How many $^a can fit inside one $^b?"},
        llm-evaluator => 'palm');
```
```
# -> **@args, *%args { #`(Block|6288154113224) ... }
```

Here we apply the function:

```perl6
my $res2 = &func2("tennis balls", "toyota corolla 2010");
```
```
# 48
```

Here we can show that we got a number (e.g., by checking `$res2 ~~ Numeric`).

Here the first argument is a template with two named arguments:

```perl6
my &func3 = llm-function(-> :$dish, :$cuisine {"Give a recipe for $dish in the $cuisine cuisine."}, llm-evaluator => 'palm');
```
```
# -> **@args, *%args { #`(Block|6288120035248) ... }
```

Here is an invocation:

```perl6
&func3(dish => 'salad', cuisine => 'Russian', max-tokens => 300);
```
```
# **Russian Salad**
#
# **Ingredients:**
#
# * 1 head of cabbage (chopped)
# * 2 carrots (grated)
# * 1 cucumber (chopped)
# * 1/2 red onion (chopped)
# * 1/2 cup of mayonnaise
# * 1/4 cup of sour cream
# * Salt and pepper to taste
#
# **Instructions:**
#
# 1. In a large bowl, combine the cabbage, carrots, cucumber, and onion.
# 2. In a small bowl, whisk together the mayonnaise, sour cream, salt, and pepper.
# 3. Pour the dressing over the salad and toss to coat.
# 4. Serve immediately or chill for later.
#
# **Tips:**
#
# * For a more flavorful salad, add some chopped fresh herbs, such as dill or parsley.
# * You can also add some protein to the salad, such as shredded chicken or crumbled bacon.
# * If you don't have any sour cream on hand, you can use yogurt or even just milk to thin out the mayonnaise.
# * This salad is best served cold, so make sure to chill it for at least a few hours before serving.
```

--------

Here we apply an example function `&fec` (made with `llm-example-function`) over a new input:

```perl6
say &fec('raccoon');
```
```
# panda
```
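
For reference, here is a minimal sketch of how such an example function is defined from input-output pairs (the pairs below are hypothetical, not the ones used above):

```perl6
# Define a function from examples; the LLM infers the input-to-output rule.
# (Hypothetical pairs, for illustration only.)
my &fex = llm-example-function(['dog' => 'bark', 'cat' => 'meow']);

say &fex('cow');
```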

--------

Having asked `$chat.eval('What is the most transparent gem?')`, here we evaluate a follow-up question over the chat object:

```perl6
$chat.eval('Ok. What are the second and third most transparent gems?');
```
```
# The second most transparent gem is sapphire, and the third most transparent gem is emerald.
```

Here are the prompt(s) and all messages of the chat object:

```perl6
$chat.say
```
```
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role user
# content What is the most transparent gem?
# timestamp 2024-03-17T15:35:54.133613-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role assistant
# content Diamond is the most transparent gem.
# timestamp 2024-03-17T15:35:54.831745-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role user
# content Ok. What are the second and third most transparent gems?
# timestamp 2024-03-17T15:35:54.846413-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role assistant
# content The second most transparent gem is sapphire, and the third most transparent gem is emerald.
# timestamp 2024-03-17T15:35:56.018877-04:00
```
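
For completeness, here is a minimal sketch of how such a chat object can be made with `llm-chat` (the system prompt and chat-id below are hypothetical; the actual creation code is not among the changed lines):

```perl6
# Create a chat object with a (hypothetical) system prompt and chat-id.
my $chat = llm-chat('You are a gem expert and you give concise answers.', chat-id => 'gem-talk');
```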

--------

## AI-vision functions

Consider [this image](https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png):

![](https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png)

Here we import the image (as a Base64 string):

```perl6
use Image::Markup::Utilities;
my $url = 'https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MarkdownDocuments/Diagrams/AI-vision-via-WL/0iyello2xfyfo.png';
my $img = image-import($url);
$img.substr(^100)
```
```
# ![](data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAArwAAAK8CAIAAACC2PsUAAAA1XpUWHRSYXcgcHJvZmlsZSB0e
```

Here we apply OpenAI's AI-vision model `gpt-4-vision-preview` (the default) over the ***URL of the image***:

```perl6
llm-vision-synthesize('Describe the image.', $url);
```
```
# The image is an infographic titled "Cyber Week Spending Set to Hit New Highs in 2023". It shows estimated online spending on Thanksgiving weekend in the United States for the years 2019 through 2023, with 2023 being a forecast. The data is presented in a bar chart format, with different colored bars representing each year.
#
# There are three categories on the horizontal axis: Thanksgiving Day, Black Friday, and Cyber Monday. The vertical axis represents spending in billions of dollars, ranging from $0B to $12B.
#
# The bars show an increasing trend in spending over the years for each of the three days. For Thanksgiving Day, the spending appears to have increased from just over $4B in 2019 to a forecast of around $6B in 2023. Black Friday shows a rise from approximately $7B in 2019 to a forecast of nearly $10B in 2023. Cyber Monday exhibits the highest spending, with an increase from around $9B in 2019 to a forecast of over $11B in 2023.
#
# There is an icon of a computer monitor with a shopping tag, indicating the focus on online spending. At the bottom of the image, the source of the data is credited to Adobe Analytics, and the logo of Statista is present, indicating that they have produced or distributed the infographic. There are also two icons, one resembling a Creative Commons license and the other a share or export button.
```

Here we apply Gemini's AI-vision model `gemini-pro-vision` over the (Base64) image:

```perl6
llm-vision-synthesize('Describe the image.', $img, e => 'Gemini');
```
```
# The image shows the estimated online spending on Thanksgiving weekend in the United States from 2019 to 2023. The y-axis shows the spending amount in billions of dollars, while the x-axis shows the year. The data is presented in four bars, each representing a different year. The colors of the bars are blue, orange, green, and yellow, respectively. The values for each year are shown below:
#
# * 2019: $7.4 billion
# * 2020: $9.0 billion
# * 2021: $10.7 billion
# * 2022: $11.3 billion
# * 2023: $12.2 billion (estimated)
#
# The image shows that online spending on Thanksgiving weekend has increased steadily over the years. In 2023, online spending is expected to reach $12.2 billion, up from $7.4 billion in 2019.
```

**Remark:** Currently, Gemini works with (Base64) images only, not with URLs; OpenAI's vision works with both URLs and images.

The function `llm-vision-function` uses the same evaluators (configurations, models) as `llm-vision-synthesize`.
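
Here is a minimal sketch of making a reusable vision function with `llm-vision-function` (assuming, by analogy with `llm-function`, that a block can serve as the prompt template; the template text and question are illustrative):

```perl6
# A reusable question-answering function over the same image URL.
# (A sketch; the block-as-template usage is an assumption by analogy with llm-function.)
my &fvs = llm-vision-function(
        -> $question { "Looking at the image, answer the question: $question" },
        $url);

&fvs('How many years are shown in the chart?');
```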

--------

## Potential problems

With PaLM, a certain wrong configuration produces an error.