# Coni
### Who is Coni?   
 * Coni is **a librarian-like LLM agent** that can organize digital collections (images, text, etc.).
 * This notebook shows how Coni can **generate image captions** quickly and easily.

### How was Coni created? 
 * Agent framework: `LangChain`
 * LLM: Cohere's `command-r` model
 * Image-captioning tool: Salesforce's BLIP model (`Salesforce/blip-image-captioning-large`)
 * Reference: [LangChain AI image recognition tutorial](https://github.com/sugarforever/LangChain-Tutorials/blob/main/LangChain_AI_Image_Recognition.ipynb)


## 1. Setup

### To use Cohere's `command-r` model
 * Get an API key from https://cohere.com and store it as `COHERE_API_KEY` in your `.env` file.
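It can help to fail fast when the key is missing instead of getting an authentication error later from the agent. A minimal sketch, assuming the key lives in the `COHERE_API_KEY` environment variable (the variable `ChatCohere` reads by default); `require_api_key` is a hypothetical helper, not part of any library.

```python
import os


def require_api_key(env_var: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error if it is missing.

    Hypothetical helper: it only checks the environment and does not validate
    the key against Cohere's API.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it to your .env file or environment."
        )
    return key
```

Call `require_api_key()` after `dotenv.load_dotenv()` and before constructing `ChatCohere`.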

### Packages

```python
# install the packages if they are not installed yet
# %pip install langchain langchain-cohere transformers torch pillow python-dotenv requests
```

```python
import os

# to send HTTP requests
import requests

# Python Imaging Library (Pillow)
from PIL import Image

# for using Cohere models through LangChain
from langchain_cohere.chat_models import ChatCohere

# for agents
from langchain_core.tools import BaseTool
from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent

# for prompting
from langchain_core.prompts import ChatPromptTemplate

# for using the BLIP model
from transformers import BlipProcessor, BlipForConditionalGeneration

# load API keys from the .env file (see `.env_example` for details)
import dotenv
dotenv.load_dotenv()
```


## 2. Create 'Coni'

### 1) Choose an LLM

```python
# model id
llm_model_id = "command-r"

# LLM (temperature=0 for the most repeatable output)
llm = ChatCohere(model=llm_model_id, temperature=0)
```
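Why `temperature=0`? The common intuition is that the model's token scores (logits) are divided by the temperature before the softmax, so lower temperatures concentrate probability on the top-scoring token, and the limit T → 0 is greedy decoding. A minimal sketch of temperature-scaled softmax with made-up logits; it illustrates the general technique, not Cohere's internal sampler.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities, sharpening (T < 1) or flattening (T > 1) them."""
    if temperature <= 0:
        # the T -> 0 limit: all probability mass on the highest-scoring token (greedy)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature drops, the first token's probability grows until it takes all of the mass, which is why a temperature of 0 gives the most repeatable captions.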

### 2) Prepare tools

```python
# Salesforce's image captioning model
image_to_text_model = "Salesforce/blip-image-captioning-large"

# load the processor and model once, instead of on every tool call
processor = BlipProcessor.from_pretrained(image_to_text_model)
model = BlipForConditionalGeneration.from_pretrained(image_to_text_model)

# a function for generating image captions
def describeImage(image_url: str) -> str:
    # download the image and convert it to RGB
    response = requests.get(image_url, stream=True, timeout=30)
    response.raise_for_status()
    image_object = Image.open(response.raw).convert("RGB")

    # preprocess the image and generate a caption (output limited to 60 tokens)
    inputs = processor(image_object, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=60)
    return processor.decode(outputs[0], skip_special_tokens=True)

# wrap the function as a LangChain tool
class DescribeImageTool(BaseTool):
    # BaseTool attributes are pydantic fields, so overrides need type annotations
    name: str = "describe_image"
    description: str = "Use this tool to describe an image. The input should be the image URL."

    def _run(self, url: str) -> str:
        return describeImage(url)

# a list of tools to give to the agent
tools = [DescribeImageTool()]
```
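The agent never calls `describeImage` itself: the LLM emits the name of a tool plus its input, and the executor looks the tool up by its `name` and runs it. A minimal, library-free sketch of that dispatch step; `ToolSpec`, `dispatch`, and `fake_describe_image` are hypothetical stand-ins, and the fake tool returns a fixed string instead of running BLIP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a LangChain tool: a name, a description, and a function."""
    name: str
    description: str
    func: Callable[[str], str]


def fake_describe_image(url: str) -> str:
    # placeholder for describeImage(); returns a fixed string instead of running BLIP
    return f"a photo loaded from {url}"


def dispatch(tools: List[ToolSpec], tool_name: str, tool_input: str) -> str:
    """Look up the tool the LLM asked for by name and run it on the given input."""
    registry: Dict[str, ToolSpec] = {t.name: t for t in tools}
    if tool_name not in registry:
        # report unknown tools back as an observation instead of crashing the loop
        return f"Error: unknown tool {tool_name!r}. Available: {sorted(registry)}"
    return registry[tool_name].func(tool_input)


tools_sketch = [ToolSpec("describe_image", "Use this tool to describe an image.", fake_describe_image)]
print(dispatch(tools_sketch, "describe_image", "https://example.com/cat.jpg"))
```

This is why the tool's `name` and `description` matter: they are all the LLM sees when deciding which tool to call and what input to pass.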

### 3) Create the agent

```python
# create a Cohere ReAct (multi-hop) agent
agent = create_cohere_react_agent(
    llm=llm,
    tools=tools,
    prompt=ChatPromptTemplate.from_template("{input}"),
)

# wrap the agent in an executor that runs the reason/act/observe loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=False,
    verbose=False,
    handle_parsing_errors=True,
)
```
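`AgentExecutor` repeats a reason, act, observe cycle: ask the LLM for the next step, run the chosen tool, feed the observation back, and stop when the LLM returns a final answer. A toy, library-free sketch of that loop with a scripted stand-in for the LLM; `Action`, `Finish`, `scripted_llm`, and `run_agent` are hypothetical names, and the real Cohere agent uses its own prompt format and parser.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Action:
    tool: str
    tool_input: str


@dataclass
class Finish:
    output: str


Step = Union[Action, Finish]
History = List[Tuple[Action, str]]  # (action taken, observation returned)


def scripted_llm(question: str, history: History) -> Step:
    # stand-in for the LLM: first request a description, then answer using it
    if not history:
        return Action("describe_image", question.split()[-1])
    observation = history[-1][1]
    return Finish(f"Caption: {observation}")


def run_agent(question: str,
              llm: Callable[[str, History], Step],
              tools: Dict[str, Callable[[str], str]],
              max_iterations: int = 5) -> str:
    history: History = []
    for _ in range(max_iterations):
        step = llm(question, history)                     # reason: pick the next step
        if isinstance(step, Finish):
            return step.output
        observation = tools[step.tool](step.tool_input)   # act: run the tool
        history.append((step, observation))               # observe: feed the result back
    return "Stopped: iteration limit reached."


toy_tools = {"describe_image": lambda url: f"description of {url}"}
print(run_agent("Caption the image at https://example.com/a.jpg", scripted_llm, toy_tools))
```

The iteration cap keeps a confused model from looping forever; `AgentExecutor` exposes a similar `max_iterations` setting.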



## 3. Ask Coni to generate image captions

### Input image

```python
# image URL
image_url = 'https://cf.bstatic.com/xdata/images/hotel/max1024x768/98905830.jpg?k=2b38bc577ed6bc85917a7d1e794c2f4b5904cb506e860913ba19db62444bf014&o=&hp=1'
```

### Have a look at the image

<div>
<img src="https://cf.bstatic.com/xdata/images/hotel/max1024x768/98905830.jpg?k=2b38bc577ed6bc85917a7d1e794c2f4b5904cb506e860913ba19db62444bf014&o=&hp=1" width="400"/>
</div>

### Generate a caption

```python
# prompt for the agent
input_text = f"Generate a captivating caption for the image:\n{image_url}"

# run the agent
content = agent_executor.invoke({"input": input_text})

# the generated caption
description = content["output"]
```


### Result

```python
# print the caption in red
print(f"\x1b[31m\"{description}\"\x1b[0m")
```

Output:

```
"A monkey in uniform holds a bell, ready to welcome guests to a grand hotel."
```
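The `\x1b[31m` and `\x1b[0m` sequences in the print call are ANSI escape codes: `ESC[31m` switches the terminal foreground color to red and `ESC[0m` resets all text attributes. Terminals and Jupyter's output area render them as color, while tools that do not interpret them show stray `[31m` text. A small sketch that builds these sequences; `colorize` and the color table are hypothetical names covering only a few standard foreground codes.

```python
ANSI_RESET = "\x1b[0m"
# standard ANSI foreground color codes (a small subset)
ANSI_COLORS = {"red": 31, "green": 32, "yellow": 33, "blue": 34}


def colorize(text: str, color: str = "red") -> str:
    """Wrap text in an ANSI foreground-color code followed by a reset code."""
    return f"\x1b[{ANSI_COLORS[color]}m{text}{ANSI_RESET}"


print(colorize('"A quoted caption"'))
```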
