<a href="https://colab.research.google.com/github/barbaroja2000/agents/blob/main/crew_ai_exa_anthropic_claude3_haiku_confluence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autonomous Research Agent - Haiku

* Research agent with CrewAI as orchestrator
* Search via:
  * [EXA.ai](https://exa.ai) - anecdotally, it seems to produce better-quality search output than Tavily, but YMMV
  * arXiv (abstracts only, papers from the last year)
* Super-fast [Anthropic Claude 3 Haiku model](https://www.anthropic.com/news/claude-3-haiku)
* Prompts (agent backstory and task) finessed using the Anthropic [meta-prompt](https://colab.research.google.com/drive/1SoAajN8CBYTl79VyTwxtxncfCWlHlyy9)
* Publishes to Confluence
* Monitoring in LangSmith

In [1]:
!pip install -Uq 'crewai[tools]' pymupdf arxiv langchain-exa langchain-anthropic atlassian-python-api markdown2 bs4 requests langchain langchain_community

In [2]:
research_topic= "AI - Agents"
model="claude-3-haiku-20240307"

In [3]:
#@title Imports

import json
import os

import markdown2
from atlassian import Confluence
from bs4 import BeautifulSoup
from crewai import Agent
from crewai_tools import tool
from exa_py import Exa
from langchain_community.utilities import ArxivAPIWrapper

In [4]:
#@title Passwords etc

from google.colab import userdata
exa_api_key=userdata.get('exa_api_key')
anthropic_api_key=userdata.get('anthropic_api_key')

os.environ["CONFLUENCE_API_KEY"]=userdata.get('confluence_api_key')
os.environ["CONFLUENCE_USERNAME"]=userdata.get('confluence_username')
os.environ["CONFLUENCE_URI"]=userdata.get('confluence_uri')

In [5]:
#@title Langsmith
import random
import string

# Generate a random string of 6 characters, including letters and digits
random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=6))

os.environ["LANGCHAIN_API_KEY"] =os.environ["LANGCHAIN_HUB_API_KEY"] = userdata.get("langchain_api_key")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Autonomous Research Agent - Haiku"

In [6]:
from datetime import datetime, timedelta

class PaperFilter:
    def __init__(self):
        self.current_date = datetime.now()
        self.list_strings = []
        self.parsed_papers = []

    def load_papers(self, papers_string):
        self.list_strings = papers_string.split('\n\n')  # Splitting the input string into list of papers
        self.parsed_papers = [self._parse_to_dict(item) for item in self.list_strings]

    def _parse_to_dict(self, list_string):
        summary_index = list_string.find('\nSummary:')
        summary_text = ''
        if summary_index != -1:
            summary_text = list_string[summary_index + 1:]
            list_string = list_string[:summary_index]

        key_value_pairs = list_string.split('\n')
        # Split on the first ': ' only, so values that contain ': ' (e.g. titles) are kept whole
        info_dict = {}
        for pair in key_value_pairs:
            key, _, value = pair.partition(': ')
            info_dict[key] = value

        if summary_text:
            info_dict['Summary'] = summary_text.split(': ', 1)[1]

        return info_dict

    def _is_recent(self, published_date):
        published_datetime = datetime.strptime(published_date, "%Y-%m-%d")
        return (self.current_date - published_datetime) <= timedelta(days=365)

    def _dict_to_string(self, dictionary):
        lines = [f"{key}: {value}" for key, value in dictionary.items()]
        return '\n'.join(lines)

    def filter(self):
        # Skip entries without a 'Published' field, such as a "no results" message
        recent_papers = [
            paper for paper in self.parsed_papers
            if 'Published' in paper and self._is_recent(paper['Published'])
        ]
        papers_string = "\n\n".join([self._dict_to_string(paper) for paper in recent_papers])
        return papers_string
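
The parse-then-filter flow of `PaperFilter` can be sketched as two standalone functions. This is a minimal illustration, not the notebook's implementation: the sample text is made up, `parse_paper` and `recent_papers` are hypothetical helper names, and the "now" date is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta

# Hypothetical sample in the "Key: value" layout that PaperFilter expects;
# the titles, authors and dates are made up for illustration.
SAMPLE = (
    "Published: 2020-01-15\n"
    "Title: An Older Paper: Part One\n"
    "Authors: A. Author\n"
    "Summary: First line of the abstract\n"
    "second line of the abstract\n"
    "\n"
    "Published: 2024-03-01\n"
    "Title: A Newer Paper\n"
    "Authors: B. Author\n"
    "Summary: Another abstract"
)


def parse_paper(block: str) -> dict:
    """Parse one paper block into a dict, keeping a multi-line summary whole."""
    head, sep, summary = block.partition("\nSummary: ")
    fields = {}
    for line in head.split("\n"):
        # Split on the first ': ' only, so a title containing ': ' survives.
        key, _, value = line.partition(": ")
        fields[key] = value
    if sep:
        fields["Summary"] = summary
    return fields


def recent_papers(text: str, now: datetime, max_age_days: int = 365) -> list:
    """Return parsed papers published within max_age_days of `now`."""
    cutoff = now - timedelta(days=max_age_days)
    papers = [parse_paper(block) for block in text.split("\n\n")]
    return [
        p for p in papers
        if "Published" in p
        and datetime.strptime(p["Published"], "%Y-%m-%d") >= cutoff
    ]


papers = recent_papers(SAMPLE, now=datetime(2024, 4, 1))
print([p["Title"] for p in papers])
```

Passing `now` as an argument, rather than reading the clock inside the function, is a design choice made here only to keep the example deterministic.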

In [7]:
#@title Tools

from datetime import datetime, timedelta

exa = Exa(api_key=exa_api_key)
paper_filter = PaperFilter()

"""
ARXIV_MAX_QUERY_LENGTH = 300,
load_max_docs = 3,
load_all_available_meta = False,
doc_content_chars_max = 40000
"""

arxiv = ArxivAPIWrapper(top_k_results = 20,  load_max_docs = 20, load_all_available_meta = True, ARXIV_MAX_QUERY_LENGTH = 40000, doc_content_chars_max = 40000)

@tool("Arxiv search tool")
def search_arxiv(query: str):
  """Search for research papers based on the query."""
  docs_string = arxiv.run(query)
  paper_filter.load_papers(docs_string)
  return paper_filter.filter()

@tool("EXA search tool")
def search(query: str):
    """Search for a webpage based on the query."""
    return exa.search(f"{query}", use_autoprompt=True, num_results=5)

@tool("EXA similar pages tool")
def find_similar(url: str):
    """Search for webpages similar to a given URL.
    The url passed in should be a URL returned from `search`.
    """
    return exa.find_similar(url, num_results=5)

@tool("EXA get page contents tool")
def get_contents(ids: list[str]):
    """Get the contents of a webpage.
    The ids passed in should be a list of ids returned from `search`.
    """
    return exa.get_contents(ids)

@tool("Confluence Publisher")
def confluence_publisher_tool(content: str) -> str:
    """Use this tool to publish to Confluence"""
    confluence_uri = os.environ["CONFLUENCE_URI"]
    username = os.environ["CONFLUENCE_USERNAME"]
    password = os.environ["CONFLUENCE_API_KEY"]
    space_key = 'AI'

    # Convert Markdown content of the Article to HTML
    html_content = markdown2.markdown(content)

    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the first <h1> tag content for the title
    h1_tag = soup.find('h1')

    page_title = f"{h1_tag.text}" if h1_tag else f"No title found. - {random_string}"

    # Remove the <h1> tag to get the rest of the HTML content without the title
    if h1_tag:
        h1_tag.decompose()

    html_content = str(soup)

    # Initialize Confluence client
    confluence = Confluence(
        url=confluence_uri,
        username=username,
        password=password
    )

    # Look up the parent "Research" page by title
    parent_info = confluence.get_page_by_title(space_key, "Research")

    # Extract the parent page ID
    parent_id = parent_info['id']

    # Create the page under the parent (this call only creates; it does not update an existing page)
    response = confluence.create_page(
        space_key,
        page_title,
        html_content,
        parent_id=parent_id,
        type='page',
        representation='storage',
        editor='v2',
        full_width=False
    )

    return json.dumps(response)
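
The publisher tool takes the page title from the first `<h1>` of the rendered HTML and publishes the rest as the body. The same split can be sketched directly on the Markdown text using only the standard library. Note that this is a different technique from the tool above, which renders to HTML first. `split_title` is a hypothetical helper, and it only recognises ATX-style `# ` headings, not Setext headings, and does not skip headings that sit inside code fences.

```python
import re


def split_title(markdown_text: str, fallback: str = "Untitled"):
    """Return (title, body), using the first ATX-style '# ' heading as the title."""
    match = re.search(r"^#[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)
    if match is None:
        # No top-level heading: keep the text as-is and use the fallback title.
        return fallback, markdown_text
    # Remove the heading line from the body, mirroring h1_tag.decompose().
    body = markdown_text[:match.start()] + markdown_text[match.end():]
    return match.group(1), body.strip()


title, body = split_title("# Agents in 2024\n\nFirst paragraph.\n\n## Details\n")
print(title)
print(body)
```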

In [8]:
#@title Model

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

model = ChatAnthropic(model=model, anthropic_api_key=anthropic_api_key, temperature=0)

In [9]:
#@title Prompts from gists

prompt_dict = {
    "crewai_writer_agent_backstory": "https://gist.githubusercontent.com/barbaroja2000/46d1a4412dd9f8ca6afae2ff87dda346/raw/a061ecaadd54ea096a7679cd33bc906c3fe64b78/crewai_writer_agent_backstory",
    "crewai_researcher_agent_backstory": "https://gist.githubusercontent.com/barbaroja2000/40cd328296296084ef6ede51add117ee/raw/528a8383023dd48a1af0e86e35bb244751b130bc/crewai_researcher_agent_backstory",
    "crewai_research_task" : "https://gist.githubusercontent.com/barbaroja2000/aceab69b70b3a48127695e92d490c881/raw/78f76757535b5714e927a33a90367fd2c5b4fdbb/crewai_research_task",
    "crewai_writer_task" : "https://gist.githubusercontent.com/barbaroja2000/304a343fb1780916e480847a53461207/raw/d9654cd2930bdff469808b236e0e38dff352f1d3/crewai_writer_task",
}

In [10]:
#@title Get Prompts

import requests

# Function to download and return Gist content
def download_gist(gist_url):
    try:
        response = requests.get(gist_url)
        response.raise_for_status()  # Raise an error for bad responses
        return response.text
    except requests.RequestException as e:
        print(f"Error downloading Gist: {e}")
        return None

for key, value in prompt_dict.items():
  globals()[key] = download_gist(value)
  print(f"{key}\n", globals()[key])

crewai_writer_agent_backstory
 Expert in tech content strategy with a deep understanding of the tech industry.

Skills:

* Converts complex technical concepts into engaging, easy-to-understand content.
* Expert in structuring articles for readability and engagement.
* Uses subheadings, bullet points, and lists for clarity.
* Applies analogies and real-world examples to demystify technical topics.
* Incorporates visuals like diagrams and infographics to support text.
* Varies sentence structure for dynamic reading.

Signature Style:

* Strong openings and closings that resonate with readers.
* Ability to refine and polish drafts into polished pieces.
* A meticulous attention to detail, aiming to inform, engage, and inspire.

Impact: 

* Transforms technical, dry subjects into compelling reads, making technology accessible to a broader audience.
crewai_researcher_agent_backstory
 I am an advanced AI research assistant, I have been developed by leading experts in the field of artificial i
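
Because `download_gist` returns `None` when a request fails, a failed download would only surface later, when an agent or task receives `None` as its text. One possible alternative, sketched here under assumptions, collects the prompts in a dictionary and stops early if any are missing. `load_prompts` is a hypothetical helper, and the fetcher and URLs below are stand-ins so the example runs offline; in the notebook the fetcher would be `download_gist`.

```python
from typing import Callable, Dict, Optional


def load_prompts(
    urls: Dict[str, str],
    fetch: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Fetch every prompt and fail fast if any download returned None."""
    prompts = {name: fetch(url) for name, url in urls.items()}
    missing = sorted(name for name, text in prompts.items() if text is None)
    if missing:
        raise RuntimeError(f"Could not download prompts: {', '.join(missing)}")
    return prompts


# Stand-in fetcher and made-up URLs, used only so the sketch runs offline.
fake_store = {
    "https://example.com/a": "Backstory text",
    "https://example.com/b": "Task text",
}
prompts = load_prompts(
    {"backstory": "https://example.com/a", "task": "https://example.com/b"},
    fetch=fake_store.get,
)
print(prompts["backstory"])
```

A dictionary also avoids writing into `globals()`, at the cost of referring to `prompts["..."]` instead of bare variable names in later cells.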

In [11]:
#@title Agents

max_iter=10

# Creating a senior researcher agent with memory and verbose mode
researcher = Agent(
  role='Senior Researcher',
  goal='Efficiently gather, synthesize, and present the most relevant and accurate information on {topic}',
  verbose=True,
  memory=True,
  max_iter=max_iter,
  backstory=(crewai_researcher_agent_backstory),
  allow_delegation=False,
  llm=model

)

# Creating a writer agent (no tools, delegation disabled)
writer = Agent(
  role='Writer',
  goal='Narrate compelling tech stories about {topic}',
  verbose=True,
  memory=True,
  max_iter=max_iter,
  backstory=(crewai_writer_agent_backstory),
  allow_delegation=False,
  llm=model
)


# Creating a publisher agent
publisher = Agent(
  role='Publisher',
  goal='Publishes research to Confluence',
  verbose=True,
  memory=True,
  max_iter=max_iter,
  backstory=("I am an AI Publisher. My sole function is to publish finished markdown articles to Confluence."),
  tools=[confluence_publisher_tool],
  allow_delegation=False,
  llm=model
)

In [12]:
#@title Tasks

from crewai import Task
from pydantic import BaseModel

# Research task
research_task = Task(
  description=(crewai_research_task),
  expected_output="A comprehensive 12 paragraph report on {topic}",
  tools=[search, search_arxiv],
  agent=researcher
)

# Writing task; the article is also saved to a local Markdown file
write_task = Task(
  description=(crewai_writer_task),
  expected_output="A comprehensive 8 paragraph article, with 4-5 key points bulleted at the end. All formatted in markdown",
  agent=writer,
  async_execution=False,
  output_file=f"{random_string} - article.md"  # Random prefix keeps separate runs from overwriting each other
)


# Publish task: sends the finished article to Confluence
publish_task = Task(
  description="Publish the finished article to Confluence",
  expected_output="An article published to confluence.",
  tools=[confluence_publisher_tool],
  agent=publisher,
  async_execution=False
)


In [13]:
#@title Crew

from crewai import Crew, Process

# Forming the tech-focused crew with enhanced configurations
crew = Crew(
  agents=[researcher, writer, publisher],
  tasks=[research_task, write_task, publish_task],
  process=Process.sequential  # Optional: Sequential task execution is default
)
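
The `{topic}` placeholders in the agent goals and in the research task's expected output are filled from the inputs supplied when the crew is started. As a rough illustration only, the substitution behaves much like Python's `str.format`. The `interpolate` helper below is hypothetical and is not CrewAI's own code.

```python
def interpolate(template: str, inputs: dict) -> str:
    """Fill {name} placeholders in a template from the inputs dict."""
    return template.format(**inputs)


goal = "Narrate compelling tech stories about {topic}"
print(interpolate(goal, {"topic": "AI - Agents"}))
```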

In [14]:
#@title Kick off

# Starting the task execution process with enhanced feedback
result = crew.kickoff({"topic": research_topic})
print(result)



> Entering new CrewAgentExecutor chain...
Thought: To provide a comprehensive report on AI - Agents, I will need to gather information from authoritative sources on the key concepts, terminology, areas of study, theories, frameworks, and current state of research in this field.

Action: Arxiv search tool
Action Input: {"query": "AI agents"}

Published: 2024-03-22
Title: CACA Agent
Authors: Peng Xu, Haoran Wang, Chuang Wang, Xu Liu
Summary: As AI Agents based on Large Language Models (LLMs) have shown potential in
practical applications across various fields, how to quickly deploy an AI agent
and how to conveniently expand the application scenario of AI agents has become
a challenge. Previous studies mainly focused on implementing all the reasoning
capabilities of AI agents within a single LLM, which often makes the model more
complex and also reduces the extensibility of AI agent functionality. In this
paper, we propose CACA Agent (Capability Collaborat