# Coni
### Who is Coni?   
 * Coni is **a librarian-like LLM agent**, who works with digital collections (images, text, etc.)
 * This notebook shows how Coni can do **visual Q&A**.

### How was Coni created? 
 * Agent creation framework: `Langchain`
 * LLM: Cohere's `command-r` model
 * Visual Q&A tool: Salesforce's visual Q&A model (`blip-vqa-base`)


## 1. Setup

### To use Cohere's `command-r` LLM model
 * Get an API key from https://cohere.com and save it in a `.env` file (see the `.env_example` file).

### Packages

In [1]:
# install the packages if they are not installed yet 
#%pip install langchain-cohere

In [2]:
import os

# color text when printing
from termcolor import colored

# to send http requests
import requests

# python imaging library
from PIL import Image

# for using model through langchain_cohere
from langchain_cohere.chat_models import ChatCohere

# for agents
from langchain.tools import BaseTool
from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent

# for prompting
from langchain_core.prompts import ChatPromptTemplate

# for using BLIP model
from transformers import BlipProcessor, BlipForQuestionAnswering
import torch

# load the .env file (used for saving API keys, see details in the `.env_example` file)
import dotenv
dotenv.load_dotenv()

True
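
Since `dotenv.load_dotenv()` returns `True` as soon as a `.env` file is found, it does not confirm that the key itself is present. A minimal sketch of an explicit check follows. The helper `require_env` is hypothetical, and the variable name `COHERE_API_KEY` is an assumption about what `langchain-cohere` reads by default; adjust it to match your `.env` file.

```python
import os


def require_env(name, env=None):
    """Return the stripped value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper, not part of any library.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file before running the notebook."
        )
    return value


# Example (run in the notebook after dotenv.load_dotenv()).
# The variable name is an assumption:
# require_env("COHERE_API_KEY")
```

Failing early here gives a clearer message than an authentication error raised later, when the agent first calls the LLM.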

## 2. Create 'Coni'

### 1) Choose an LLM model 

In [3]:
# model id
llm_model_id = "command-r"

# llm model
llm = ChatCohere(model=llm_model_id, temperature=0)

### 2) Prepare tools

In [4]:
# Salesforce's visual q&a model
vqa_model = "Salesforce/blip-vqa-base"

# a function for visual question answering on an image
def VisualQA(image_url, question, vqa_model=vqa_model):

    # load the BLIP processor and model (reloaded on every call)
    processor = BlipProcessor.from_pretrained(vqa_model)
    model = BlipForQuestionAnswering.from_pretrained(vqa_model)

    # download and open the image; fail early on HTTP errors
    response = requests.get(image_url, stream=True, timeout=30)
    response.raise_for_status()
    raw_image = Image.open(response.raw).convert('RGB')
    inputs = processor(raw_image, question, return_tensors="pt")

    # generate the answer and decode it to text
    outputs = model.generate(**inputs, max_length=1000)
    return processor.decode(outputs[0], skip_special_tokens=True)

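# convert to a langchain tool
class VisualQATool(BaseTool):
    # fields are annotated because newer, pydantic v2-based langchain versions
    # reject non-annotated overrides of `name` and `description`
    name: str = "Visual QA Tool"
    description: str = 'use this tool to understand an image and generate content.'

    def _run(self, url: str, question: str):
        # answer the question about the image at `url`
        answer = VisualQA(url, question)
        return answer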

# a list of tools to give to the agent
tools = [VisualQATool()] 
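
Because `VisualQA` calls `from_pretrained` on every invocation, each question reloads the BLIP processor and model. One possible way to avoid this is to cache the loaded objects. The sketch below uses `functools.lru_cache`; `load_blip` is a hypothetical helper that is not part of the notebook.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def load_blip(model_id):
    """Load the BLIP processor and model once per model id (hypothetical helper)."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import BlipProcessor, BlipForQuestionAnswering

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForQuestionAnswering.from_pretrained(model_id)
    model.eval()  # switch to inference mode (disables dropout)
    return processor, model
```

Inside `VisualQA`, `processor, model = load_blip(vqa_model)` could then replace the two `from_pretrained` lines, so repeated questions reuse the same objects.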

### 3) Create the agent

In [5]:
# create a cohere react agent
agent = create_cohere_react_agent(
        llm=llm,
        tools=tools,
        prompt=ChatPromptTemplate.from_template("{question}")
    )

# executor that runs the agent and its tools
agent_executor = AgentExecutor(agent=agent, 
                               tools=tools, 
                               return_intermediate_steps=False,
                               verbose=False,
                               handle_parsing_errors=True)



## 3. Ask Coni questions

### Input image

In [6]:
# image url 
image_url = 'https://images.newscientist.com/wp-content/uploads/2018/08/23130240/by0ndk.jpg?width=900'

### Have a look at the image

<div>
<img src='https://images.newscientist.com/wp-content/uploads/2018/08/23130240/by0ndk.jpg?width=900'
    width="500"/>
</div>

### Question 1: What do you find?

In [7]:
# question
question= f"What do you find in :\n{image_url}"

# agent at work 
content=agent_executor.invoke(
                {"question":question})

# generated description 
description = content['output']


# print in color
print(colored(description, 'light_cyan'))

The image contains clownfish.

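`colored(description, 'light_cyan')` wraps the text in ANSI escape sequences, which terminals and notebooks render as color but which show up as markers such as `[96m` and `[0m` when the output is saved as plain text. The sketch below shows what those sequences look like and one way to strip them. `strip_ansi` is a hypothetical helper, and it assumes `termcolor` emits standard SGR codes (96 for bright cyan, 0 for reset).

```python
import re

# ANSI "Select Graphic Rendition" sequences have the form: ESC [ <numbers> m
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI color/style escape sequences from text (hypothetical helper)."""
    return ANSI_SGR.sub("", text)


# 96 selects a bright cyan foreground; 0 resets all attributes
colored_text = "\x1b[96mThe image contains clownfish.\x1b[0m"
plain_text = strip_ansi(colored_text)
```

Stripping the codes is mainly useful when the answers are written to a log file or exported, where the color cannot be displayed.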

### Question 2: How many?

In [8]:
# question
question= f"How many fish do you see in :\n{image_url}"

# agent at work 
content=agent_executor.invoke(
                {"question":question})

# generated description 
description = content['output']


# print in color 
print(colored(description, 'light_cyan'))

I can see two fish in the image.

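The question cells all follow the same three steps: build a question that ends with the image URL, invoke the executor with a `question` key, and read the `output` field. A small wrapper could capture that pattern. In the sketch below, `ask_coni` and `EchoExecutor` are hypothetical names; the stand-in executor only illustrates the call shape and does not contact any model.

```python
def ask_coni(executor, prompt, image_url):
    """Build the question, invoke the agent, and return its text answer."""
    question = f"{prompt}\n{image_url}"
    result = executor.invoke({"question": question})
    return result["output"]


class EchoExecutor:
    """Stand-in for AgentExecutor, used here only to show the call shape."""

    def invoke(self, inputs):
        return {"output": f"received: {inputs['question']}"}


answer = ask_coni(
    EchoExecutor(),
    "How many fish do you see in :",
    "https://example.com/fish.jpg",  # placeholder URL, not a real image
)
```

In the notebook, passing `agent_executor` and `image_url` instead of the stand-in values would give the same calls as the cells above.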

### Question 3: Tell a story?

In [9]:
# question
question= f"Can you please tell an interesting and creative story in 100 words:\n{image_url}"

# agent at work 
content=agent_executor.invoke(
                {"question":question})

# generated description 
description = content['output']

# print
print(colored(description, 'light_cyan'))

Once upon a time, in a kingdom beneath the sea, there lived a young clownfish named Barry. Barry was an adventurous soul and dreamed of exploring the world above the waves. One day, while swimming near the coral reef, Barry spotted something strange—a bright, mysterious light beaming from the surface. Curiosity piqued, he decided to investigate. Swimming closer, he was amazed to discover a magical portal opening into a vibrant, unknown realm. Excited, Barry swam through the portal and found himself in a colourful new world, full of strange and wonderful creatures.
