In [None]:
%pip install -qU langchain_community

In [None]:
%pip install -qU llama-cpp-python

We use Google's Gemma 2 model in its 2B-parameter version (the smallest available). The 9B version didn't work; there is also a 27B version, which I haven't tried because of its size. According to Google, Gemma 2 2B outperforms all GPT-3.5 models on the Chatbot Arena while being small enough to run on edge devices.
The model can be downloaded to the device through Ollama, which must already be running (start it from a terminal with ./bin/ollama serve).

In [1]:
!./bin/ollama pull gemma2:2b

Error: could not connect to ollama app, is it running?
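The error above is what the pull command prints when the Ollama server is not running. As a small sketch (an addition, not part of the original workflow), the server can be checked from Python with the standard library before pulling or loading a model. It assumes Ollama listens on its default address, http://localhost:11434, and only checks for an HTTP 200 answer; the helper name ollama_is_running is hypothetical.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a server answers base_url with HTTP status 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or unreachable host
        return False


if not ollama_is_running():
    print("Ollama is not reachable; start it with: ./bin/ollama serve")
```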


The model can then be loaded: the ChatOllama class (from the langchain_ollama package) lets us instantiate the model object and generate chat completions. The parameters used here are the sampling temperature (0 makes the output essentially deterministic, higher values make it more varied) and num_predict, the maximum number of tokens to generate.

In [2]:
from langchain_ollama import ChatOllama

model_id = "gemma2:2b"
llm = ChatOllama(model=model_id, temperature=0, num_predict=256)
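To illustrate what the temperature parameter controls, here is a generic sketch of temperature sampling over made-up token scores (logits). It is not Ollama's decoding code, which also applies other options such as top-k and top-p; treating temperature 0 as greedy decoding is an assumption of this sketch, and both function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index; temperature 0 is treated as greedy decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures sharpen the distribution toward the highest-scoring token; higher temperatures flatten it, so less likely tokens are sampled more often.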

Alternatively, the model can be run through LlamaCpp (the llama-cpp-python package installed above), loading a local model file:

In [3]:
# Disabled alternative: uncomment to run a local model file with llama.cpp.
# model_path must be set beforehand to the path of a local GGUF model file.
# from langchain_community.llms import LlamaCpp
# from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
#
# callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
#
# llm = LlamaCpp(
#     model_path=model_path,
#     temperature=0.75,
#     max_tokens=2000,
#     n_ctx=2048,
#     top_p=1,
#     callback_manager=callback_manager,
#     verbose=True,  # Verbose is required to pass to the callback manager
# )

The memory object created here (ConversationBufferWindowMemory) keeps a list of the conversation's interactions over time, but only uses the last k of them (k=2 here). This keeps a sliding window of the most recent interactions, so the buffer does not grow too large.

In [4]:
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
# initialize conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=2,
    return_messages=True
)

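The sliding-window behavior described above can be sketched in plain Python with collections.deque, whose maxlen argument discards the oldest entries automatically. The WindowMemory class and its methods are hypothetical, a minimal stand-in rather than LangChain's ConversationBufferWindowMemory.

```python
from collections import deque


class WindowMemory:
    """Hypothetical minimal windowed chat memory keeping only the last k interactions."""

    def __init__(self, k: int):
        # Each entry is one interaction: (human message, AI message).
        # deque(maxlen=k) drops the oldest entry once more than k are stored.
        self.turns = deque(maxlen=k)

    def add_turn(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> list[tuple[str, str]]:
        return list(self.turns)


memory = WindowMemory(k=2)
for i in range(1, 4):
    memory.add_turn(f"question {i}", f"answer {i}")
print(memory.history())  # [('question 2', 'answer 2'), ('question 3', 'answer 3')]
```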


In [5]:
%%time
question = """
What are the solar system planets?
"""
llm.invoke(question)


CPU times: user 57.5 ms, sys: 6.29 ms, total: 63.8 ms
Wall time: 8.89 s


AIMessage(content='Here are the planets in our solar system, in order from closest to furthest from the Sun:\n\n**Inner, Rocky Planets:**\n\n* **Mercury:** The smallest planet, very close to the sun. \n* **Venus:** Earth\'s "twin" in size but with a thick atmosphere that traps heat and makes it extremely hot.\n* **Earth:** Our home!  The only known planet to support life.\n* **Mars:** Known as the "Red Planet" due to its rusty-colored surface, Mars has evidence of past water and potential for future exploration.\n\n**Outer, Gas Giants:**\n\n* **Jupiter:** The largest planet in our solar system, a gas giant with a Great Red Spot (a massive storm).\n* **Saturn:** Famous for its prominent rings made of ice and rock particles. \n* **Uranus:** An icy giant tilted on its side, giving it unique rotation patterns.\n* **Neptune:**  The farthest planet from the sun, known for its strong winds and blue color.\n\n\nLet me know if you\'d like to learn more about any specific planet! 🪐 \n', addition

In [10]:
%%time
question = """
Question: What is PETRA III at DESY
"""
llm.invoke(question)

CPU times: user 58.4 ms, sys: 6.35 ms, total: 64.7 ms
Wall time: 2.33 s


AIMessage(content="PETRA III is a **high-energy particle accelerator** located at the Deutsches Elektronen-Synchrotron (DESY) in Hamburg, Germany. \n\nHere's a breakdown of what makes it significant:\n\n* **Purpose:**  It's primarily used for **research in nuclear physics**, specifically studying the properties and interactions of fundamental particles like protons, neutrons, and electrons.\n* **Technology:** PETRA III is a **third-generation synchrotron radiation facility**. This means it uses electromagnetic fields to accelerate charged particles to extremely high energies (up to 10 GeV). These particles then interact with matter, producing beams of X-rays that can be used for various scientific experiments.\n* **Applications:**  PETRA III's research focuses on:\n    * **Nuclear structure and dynamics:** Understanding the internal structure of atomic nuclei and how they behave under extreme conditions.\n    * **Fundamental interactions:** Investigating the forces governing the univer

In [6]:
import urllib.request

import cv2 as cv
import funkybob
import numpy as np
from duckduckgo_search import DDGS
from langchain.agents import (
    AgentExecutor,
    create_structured_chat_agent,
    create_tool_calling_agent,
)
from langchain.tools.render import render_text_description
from langchain_core.output_parsers import StrOutputParser
from langchain_core.tools import tool

We define a set of specific tools to handle the tasks our LLM agent will face as a lab assistant. Tools are created from plain functions with the @tool decorator.
The function's docstring is used by the LLM to decide when the tool is needed to fulfill the user's request.
The first function captures an image from the specified camera index, if available (0 is the integrated camera, 1 the external one), and saves it in the current folder under a random name. If no camera can be opened, it falls back to a stored demo image.


In [32]:
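@tool
def click_new_image(camera_extension: int = 1) -> str:
    """Captures an image from the connected camera and saves it."""
    # Open the camera (0: integrated camera, 1: external camera)
    cap = cv.VideoCapture(camera_extension)
    ret, frame = cap.read() if cap.isOpened() else (False, None)
    # Release the camera as soon as the frame has been grabbed
    cap.release()

    if not ret:
        # No camera available: fall back to a stored demo image
        frame = cv.imread("IMG20240614134009.jpg")
        if frame is None:
            return "Error: Could not open camera and no fallback image was found."

    # Save the captured image under a random name in the current folder
    generator = funkybob.RandomNameGenerator()
    random_name = next(iter(generator))
    image_path = random_name + ".jpg"
    cv.imwrite(image_path, frame)

    return image_path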
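The @tool decorator turns the function into a tool whose name and description (taken from the docstring) are shown to the LLM. The idea can be sketched with a hypothetical decorator, simple_tool, that records a function's name, docstring and parameter names in a registry and renders them as text for a prompt. This only illustrates the principle; it is not how LangChain implements @tool, and add_numbers is a toy example.

```python
import inspect
from typing import Callable

# Hypothetical registry: tool name -> description, parameter names and function
TOOL_REGISTRY: dict[str, dict] = {}


def simple_tool(func: Callable) -> Callable:
    """Hypothetical decorator that registers a function as a tool."""
    TOOL_REGISTRY[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "parameters": list(inspect.signature(func).parameters),
        "func": func,
    }
    return func


@simple_tool
def add_numbers(a: int, b: int) -> int:
    """Adds two integers and returns the sum."""
    return a + b


def render_tools() -> str:
    """Render one 'name: description' line per tool, as text for the LLM prompt."""
    return "\n".join(f"{name}: {spec['description']}" for name, spec in TOOL_REGISTRY.items())


print(render_tools())
```

A vague or missing docstring therefore leaves the model with little to go on when choosing between tools.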

In [23]:
import os
from typing import Any

from ultralytics import YOLO

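This yolo_tool applies the computer vision model (custom-trained YOLO weights in model/demo_best.pt) to the specified image and returns the detected objects, counted by class, together with the filename of the annotated image (with bounding boxes). Only detections with a confidence score above 0.4 are kept.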

In [24]:
@tool
def yolo_tool(content: str) -> tuple[dict, str]:
    """Takes an image (or video, but needs adjustments) at the path in input, and runs the computer vision model to perform object detection.
    It then returns a dictionary counting the detected objects and the filename of the annotated image."""
    # Load the custom-trained YOLO model
    yolo_model = YOLO('model/demo_best.pt')
    img = content.strip()
    results = yolo_model(img)
    detections = results[0]
    frame_out = cv.imread(img)
    obj = []

    # Each row is [x1, y1, x2, y2, score, class_id]
    for x1, y1, x2, y2, score, class_id in detections.boxes.data.tolist():
        # Keep only detections above the confidence threshold
        if score > 0.4:
            name = detections.names[int(class_id)]
            obj.append(name)
            # Draw the class name centered above the box, then the box itself
            (text_width, text_height), _ = cv.getTextSize(name, cv.FONT_HERSHEY_SIMPLEX, 2, 6)
            cv.putText(frame_out, name, (int((x2 + x1 - text_width) / 2), int(y1 - text_height)),
                       cv.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 6)
            cv.rectangle(frame_out, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 4)

    # Save the annotated image in the current folder as <name>_annotated.jpg
    file_out = os.path.splitext(os.path.basename(img))[0] + '_annotated.jpg'
    cv.imwrite(file_out, frame_out)
    # Count how many times each object class was detected
    out = {i: obj.count(i) for i in obj}

    return out, file_out
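The filtering and counting steps inside yolo_tool can be checked without a model or camera. The sketch below assumes rows in the same [x1, y1, x2, y2, score, class_id] layout that detections.boxes.data.tolist() yields, plus a class-name dictionary; the helper names and the sample boxes are hypothetical. collections.Counter gives the same counts as the dictionary comprehension used in the tool.

```python
from collections import Counter


def count_detections(rows: list[list[float]], names: dict[int, str], threshold: float = 0.4) -> dict[str, int]:
    """Count class names among YOLO-style rows [x1, y1, x2, y2, score, class_id]."""
    # Keep only boxes above the confidence threshold; class ids arrive as floats
    labels = [names[int(class_id)] for *_box, score, class_id in rows if score > threshold]
    return dict(Counter(labels))


def label_origin(x1: float, y1: float, x2: float, text_width: int, text_height: int) -> tuple[int, int]:
    """Bottom-left corner for a label centered horizontally above the box, as in yolo_tool."""
    return int((x2 + x1 - text_width) / 2), int(y1 - text_height)


# Hypothetical class names and boxes, for illustration only
names = {0: "camera", 1: "stage", 2: "laser", 3: "rasp. pi"}
rows = [
    [10, 20, 110, 220, 0.91, 0.0],
    [50, 60, 150, 160, 0.35, 2.0],
    [200, 30, 300, 130, 0.77, 2.0],
]
print(count_detections(rows, names))  # {'camera': 1, 'laser': 1}
```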

In [33]:
yolo_tool.invoke("/home/veglia/beegfs/yolov5/datasets/demo1/images/val/IMG20240614133953.jpg")




FileNotFoundError: IMG20240614133953.jpg does not exist

In [25]:
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.tools import Tool

ddg_search = DuckDuckGoSearchRun()

# Wrap the DuckDuckGo search in a named tool the agent can call
search_tool = Tool(
    name="search_tool",
    description="A search tool used to query DuckDuckGo for search results when trying to find information from the internet.",
    func=ddg_search.run,
)

This tool takes the content passed to it and appends it to the logbook (the file logbook.txt in the current folder).

In [26]:
@tool
def save_to_logbook(content: Any) -> str:
    """Takes the content input and writes it into the logbook file"""
    logfile = 'logbook.txt'
    # Convert to text first: the content may be a tuple or a dict, not only a string.
    # If the content is an image, only its file path is written.
    text = str(content)
    # Append to the logbook, creating the file if it does not exist
    with open(logfile, 'a') as document:
        document.write(text + "\n")

    return text
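A possible variant of the logbook tool, sketched here as an assumption rather than the original design, prefixes each entry with a timestamp so that the logbook also records when each image or result was saved. The function append_to_logbook is hypothetical, and the example writes to a temporary file instead of logbook.txt.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any


def append_to_logbook(content: Any, logfile: Path) -> str:
    """Hypothetical variant: append a timestamped, tab-separated entry to a text logbook."""
    text = str(content)  # tuples such as detection results are converted to text first
    stamp = datetime.now().isoformat(timespec="seconds")
    # Mode "a" creates the file if needed and always writes at the end
    with logfile.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{text}\n")
    return text


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "logbook.txt"
    append_to_logbook("reverent_khorana_annotated.jpg", log)
    append_to_logbook(({"laser": 1}, "x_annotated.jpg"), log)
    lines = log.read_text(encoding="utf-8").splitlines()
    print("\n".join(lines))
```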

In [27]:
# List of tools
tools = [click_new_image, search_tool, yolo_tool, save_to_logbook]

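The prompt connects the user's intent to the model's responses, giving the model context and constraining the format of its outputs. The template below lists the available tools ({tools} and {tool_names} are filled in by create_structured_chat_agent), asks the model to answer with a single JSON blob naming an action and its input, and reserves slots for the chat history, the user input and the agent scratchpad (the previous actions and observations).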

In [28]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

system = """You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

{tools}

Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).

Valid "action" values: "Final Answer" or {tool_names}

Provide only ONE action per $JSON_BLOB, as shown:

```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```

Follow this format:

Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
```

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation"""

human = """{input}

{agent_scratchpad}

(reminder to respond in a JSON blob no matter what)"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", human),
    ]
)
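The prompt asks the model for a single JSON blob inside a fenced block. The simplified parser below, a hypothetical illustration rather than LangChain's own output parser, shows how such a reply can be turned into an (action, action_input) pair; it assumes one blob per reply, as the prompt requires. In LangChain, setting handle_parsing_errors=True on the AgentExecutor reports malformed replies back to the model instead of raising.

```python
import json
import re

# Matches the first ```json ... ``` (or plain ``` ... ```) block in the model's reply
ACTION_RE = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)


def parse_action(reply: str) -> tuple[str, object]:
    """Hypothetical simplified parser for the single-action JSON blob format."""
    match = ACTION_RE.search(reply)
    if match is None:
        raise ValueError("No JSON blob found in the model reply")
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]


# A reply in the format produced by the agent
example = 'Action: ```json\n{\n  "action": "yolo_tool",\n  "action_input": "reverent_khorana.jpg" \n}\n``` '
print(parse_action(example))
```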

In [29]:
# initialize agent with tools
agent = create_structured_chat_agent(llm=llm, tools=tools, prompt=prompt)

In [30]:
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True, return_intermediate_steps=True
)
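Conceptually, the AgentExecutor runs a loop: it calls the LLM, parses the chosen action, runs the matching tool, appends the observation to the scratchpad and repeats until the model returns "Final Answer" (LangChain also caps the number of iterations, 15 by default via max_iterations). The sketch below is a heavily simplified, hypothetical version of that loop with a scripted stand-in for the LLM; it assumes bare JSON replies and omits parsing-error handling, callbacks and prompt formatting.

```python
import json
from typing import Callable


def run_agent(llm: Callable[[str], str], tools: dict[str, Callable], question: str, max_steps: int = 5) -> str:
    """Hypothetical minimal agent loop: ask the LLM, run the chosen tool, feed back the observation."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(question + "\n" + scratchpad)
        # Assumes the reply is a bare JSON object {"action": ..., "action_input": ...}
        step = json.loads(reply)
        action, action_input = step["action"], step["action_input"]
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)
        # Append the step so the next LLM call can see what has happened so far
        scratchpad += f"Action: {reply}\nObservation: {observation}\n"
    return "Agent stopped: reached max_steps"


# Scripted stand-in for the LLM, replaying a fixed sequence of replies
scripted = iter([
    '{"action": "click_new_image", "action_input": {}}',
    '{"action": "save_to_logbook", "action_input": "img.jpg"}',
    '{"action": "Final Answer", "action_input": "Saved img.jpg to the logbook."}',
])
fake_tools = {
    "click_new_image": lambda _: "img.jpg",
    "save_to_logbook": lambda path: path,
}
answer = run_agent(lambda prompt: next(scripted), fake_tools, "Take a picture and log it.")
print(answer)
```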

In [31]:
%%time
agent_executor.invoke({"input": "Please, take a picture and detect the objects in it. Then, add the image into the logbook."})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mAction: ```json
{
  "action": "click_new_image",
  "action_input": {} 
}
``` [0m

[ WARN:0@518.593] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video1): can't open camera by index
[ERROR:0@518.593] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup Camera index out of range


[36;1m[1;3mreverent_khorana.jpg[0m[32;1m[1;3mAction: ```json
{
  "action": "yolo_tool",
  "action_input": "reverent_khorana.jpg" 
}
``` [0m
image 1/1 /gpfs/dust/maxwell/user/veglia/Langchain/reverent_khorana.jpg: 640x288 1 camera, 1 stage, 1 laser, 1 rasp. pi, 213.0ms
Speed: 6.3ms preprocess, 213.0ms inference, 1212.2ms postprocess per image at shape (1, 3, 640, 288)
[38;5;200m[1;3m({'laser': 1, 'camera': 1, 'rasp. pi': 1, 'stage': 1}, 'reverent_khorana_annotated.jpg')[0m[32;1m[1;3mAction: ```json
{
  "action": "save_to_logbook",
  "action_input": "reverent_khorana_annotated.jpg" 
}
``` [0m[36;1m[1;3mreverent_khorana_annotated.jpg[0m[32;1m[1;3mAction: ```json
{
  "action": "Final Answer",
  "action_input": "I have taken a picture of the object and annotated it."
}
``` [0m

[1m> Finished chain.[0m
CPU times: user 2.71 s, sys: 772 ms, total: 3.48 s
Wall time: 24.2 s


{'input': 'Please, take a picture and detect the objects in it. Then, add the image into the logbook.',
 'output': 'I have taken a picture of the object and annotated it.',
 'intermediate_steps': [(AgentAction(tool='click_new_image', tool_input={}, log='Action: ```json\n{\n  "action": "click_new_image",\n  "action_input": {} \n}\n``` '),
   'reverent_khorana.jpg'),
  (AgentAction(tool='yolo_tool', tool_input='reverent_khorana.jpg', log='Action: ```json\n{\n  "action": "yolo_tool",\n  "action_input": "reverent_khorana.jpg" \n}\n``` '),
   ({'laser': 1, 'camera': 1, 'rasp. pi': 1, 'stage': 1},
    'reverent_khorana_annotated.jpg')),
  (AgentAction(tool='save_to_logbook', tool_input='reverent_khorana_annotated.jpg', log='Action: ```json\n{\n  "action": "save_to_logbook",\n  "action_input": "reverent_khorana_annotated.jpg" \n}\n``` '),
   'reverent_khorana_annotated.jpg')]}