feat: add web search #274

Merged on Oct 5, 2023 (35 commits)
Changes from 26 commits
fd98180
add web search
zhiyu-01 Sep 3, 2023
a19aba2
update
zhiyu-01 Sep 3, 2023
ce8a4a6
update
zhiyu-01 Sep 5, 2023
1ce288b
Merge branch 'master' into function
zhiyu-01 Sep 5, 2023
c593ad3
Merge branch 'master' into function
zhiyu-01 Sep 5, 2023
7f8450d
Update web_search.py
zhiyu-01 Sep 7, 2023
5193c01
Merge branch 'master' into function
zhiyu-01 Sep 9, 2023
1f75055
Merge branch 'master' into function
zhiyu-01 Sep 11, 2023
591fa47
update
zhiyu-01 Sep 11, 2023
a882e26
Merge branch 'function' of https://github.com/camel-ai/camel into fun…
zhiyu-01 Sep 11, 2023
5f16411
Update web_search.py
zhiyu-01 Sep 11, 2023
b224c1d
Merge branch 'master' into function
zhiyu-01 Sep 13, 2023
9ecb3e4
Update camel/functions/web_search.py
zhiyu-01 Sep 14, 2023
51d9ca2
Update camel/functions/web_search.py
zhiyu-01 Sep 14, 2023
1cf40ff
Update camel/functions/web_search.py
zhiyu-01 Sep 14, 2023
f3f65dc
Update camel/functions/web_search.py
zhiyu-01 Sep 14, 2023
e8ba969
update
zhiyu-01 Sep 14, 2023
48515fc
update
zhiyu-01 Sep 14, 2023
eb9521c
update
zhiyu-01 Sep 14, 2023
f48f46c
Merge branch 'master' into function
zhiyu-01 Sep 18, 2023
25590ba
update
zhiyu-01 Sep 18, 2023
6444d85
Merge branch 'function' of https://github.com/camel-ai/camel into fun…
zhiyu-01 Sep 18, 2023
13a3bb8
update
zhiyu-01 Sep 18, 2023
70c6ef6
update
zhiyu-01 Sep 19, 2023
685e7d4
update
zhiyu-01 Sep 20, 2023
a7d8dbe
Update search_functions.py
zhiyu-01 Sep 20, 2023
a9435b2
update
zhiyu-01 Sep 20, 2023
5c826ba
Update chat_agent.py
zhiyu-01 Sep 20, 2023
db92799
update
zhiyu-01 Sep 20, 2023
4fc33f7
Merge branch 'master' into function
zhiyu-01 Sep 24, 2023
84f032c
update
zhiyu-01 Sep 24, 2023
a14213f
Merge branch 'function' of https://github.com/camel-ai/camel into fun…
zhiyu-01 Sep 24, 2023
3e2514c
Update test_role_playing.py
zhiyu-01 Sep 24, 2023
2f0dc19
update
zhiyu-01 Sep 27, 2023
6d9b1eb
Merge branch 'master' into function
lightaime Oct 5, 2023
4 changes: 4 additions & 0 deletions .github/workflows/pytest_apps.yml
@@ -26,6 +26,8 @@ jobs:
       - name: Run pytest
         env:
           OPENAI_API_KEY: "${{ secrets.OPENAI_API_KEY }}"
+          GOOGLE_API_KEY: "${{ secrets.GOOGLE_API_KEY }}"
+          SEARCH_ENGINE_ID: "${{ secrets.SEARCH_ENGINE_ID }}"
         run: poetry run pytest -v apps/

   pytest_examples:
@@ -41,4 +43,6 @@ jobs:
       - name: Run pytest
         env:
           OPENAI_API_KEY: "${{ secrets.OPENAI_API_KEY }}"
+          GOOGLE_API_KEY: "${{ secrets.GOOGLE_API_KEY }}"
+          SEARCH_ENGINE_ID: "${{ secrets.SEARCH_ENGINE_ID }}"
         run: poetry run pytest -v examples/
6 changes: 6 additions & 0 deletions .github/workflows/pytest_package.yml
@@ -25,6 +25,8 @@ jobs:
       - name: Run pytest
         env:
           OPENAI_API_KEY: "${{ secrets.OPENAI_API_KEY }}"
+          GOOGLE_API_KEY: "${{ secrets.GOOGLE_API_KEY }}"
+          SEARCH_ENGINE_ID: "${{ secrets.SEARCH_ENGINE_ID }}"
         run: poetry run pytest --fast-test-mode test/

   pytest_package_llm_test:
@@ -38,6 +40,8 @@ jobs:
       - name: Run pytest
         env:
           OPENAI_API_KEY: "${{ secrets.OPENAI_API_KEY }}"
+          GOOGLE_API_KEY: "${{ secrets.GOOGLE_API_KEY }}"
+          SEARCH_ENGINE_ID: "${{ secrets.SEARCH_ENGINE_ID }}"
         run: poetry run pytest --llm-test-only test/

   pytest_package_very_slow_test:
@@ -51,4 +55,6 @@ jobs:
       - name: Run pytest
         env:
           OPENAI_API_KEY: "${{ secrets.OPENAI_API_KEY }}"
+          GOOGLE_API_KEY: "${{ secrets.GOOGLE_API_KEY }}"
+          SEARCH_ENGINE_ID: "${{ secrets.SEARCH_ENGINE_ID }}"
         run: poetry run pytest --very-slow-test-only test/
3 changes: 2 additions & 1 deletion camel/agents/chat_agent.py
@@ -294,7 +294,8 @@ def step(
                 a boolean indicating whether the chat session has terminated,
                 and information about the chat session.
         """
-        messages = self.update_messages('user', input_message)
+        messages = self.update_messages(input_message.role_type.value,
Collaborator

Since we had this bug in the code, please add to the test of ai_society\role_playing.py a check of the sequence of roles:

system
user
assistant
user
assistant
...

or whatever it must be. I expect added checks in test/agents/test_role_playing.py.
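
The role-sequence check requested above could look roughly like the sketch below. The helper name `roles_alternate` is hypothetical and not part of CAMEL; it assumes the roles have already been collected into a plain list of strings and that the expected order is `system` first, then `user` and `assistant` alternating.

```python
from typing import Sequence


def roles_alternate(roles: Sequence[str]) -> bool:
    """Return True if roles are 'system' followed by strictly
    alternating 'user' / 'assistant' entries, starting with 'user'.

    Hypothetical helper for illustration only.
    """
    if not roles or roles[0] != "system":
        return False
    expected = "user"
    for role in roles[1:]:
        if role != expected:
            return False
        # Flip the expectation for the next message
        expected = "assistant" if expected == "user" else "user"
    return True
```

In a test, the list of roles would be extracted from whatever message records the agent keeps, and the assertion would be `assert roles_alternate(roles)`.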

Collaborator

Wait a second. I recall figuring out this complex behavior of changing the roles. I even put an explicit comment:

Its `role` field that specifies the role at backend may be either
            `user` or `assistant` but it will be set to `user` anyway since
            for the self agent any incoming message is external.

Also see the docstring above:

def submit_message(self, message: BaseMessage) -> None:
        r"""Submits the externally provided message as if it were an answer of
        the chat LLM from the backend. Currently, the choice of the critic is
        submitted with this method.

Please let me know whether you think this is in the scope of this change. If not, open a separate bug ticket and a separate PR with the proper fix and the test, and revert the irrelevant changes in this PR.

Member

Wait. I don't think this is a bug.

Member

I am not sure why this change is needed. @Obs01ete @zhiyu-01 can you explain?

zhiyu-01 marked this conversation as resolved.
+                                        input_message)

         output_messages: List[BaseMessage]
         info: Dict[str, Any]
259 changes: 256 additions & 3 deletions camel/functions/search_functions.py
@@ -11,9 +11,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
-from typing import List
+import os
+from typing import Any, Dict, List

-from .openai_function import OpenAIFunction
+import camel.agents
+from camel.functions import OpenAIFunction
+from camel.messages import BaseMessage
+from camel.prompts import TextPrompt


def search_wiki(entity: str) -> str:
@@ -45,6 +49,255 @@ def search_wiki(entity: str) -> str:
     return result


+def search_google(query: str) -> List[Dict[str, Any]]:
+    r"""Use the Google search engine to search for information on the given
+    query.
+
+    Args:
+        query (string): The query to be searched.
+
+    Returns:
+        List[Dict[str, Any]]: A list of dictionaries where each dictionary
+            represents a website.
+            Each dictionary contains the following keys:
+            - 'result_id': A number in order.
+            - 'title': The title of the website.
+            - 'description': A brief description of the website.
+            - 'long_description': More detail of the website.
+            - 'url': The URL of the website.
+
+            Example:
+            {
+                'result_id': 1,
+                'title': 'OpenAI',
+                'description': 'An organization focused on ensuring that
+                artificial general intelligence benefits all of humanity.',
+                'long_description': 'OpenAI is a non-profit artificial
+                intelligence research company. Our goal is to advance digital
+                intelligence in the way that is most likely to benefit humanity
+                as a whole',
+                'url': 'https://www.openai.com'
+            }
+        title, description, url of a website.
+    """
+    import requests
+
+    # https://developers.google.com/custom-search/v1/overview
+    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
+    # https://cse.google.com/cse/all
+    SEARCH_ENGINE_ID = os.getenv("SEARCH_ENGINE_ID")
+
+    # Using the first page
+    start_page_idx = 1
+    # Different languages may return different results
+    search_language = "en"
+    # How many results to return
+    num_result_pages = 10
+    # Constructing the URL
+    # Doc: https://developers.google.com/custom-search/v1/using_rest
+    url = f"https://www.googleapis.com/customsearch/v1?" \
+        f"key={GOOGLE_API_KEY}&cx={SEARCH_ENGINE_ID}&q={query}&start=" \
+        f"{start_page_idx}&lr={search_language}&num={num_result_pages}"
+
+    responses = []
+    # Fetch the results given the URL
+    try:
+        # Send the GET request
+        result = requests.get(url)
+        data = result.json()
+
+        # Get the result items
+        if "items" in data:
+            search_items = data.get("items")
+
+            # Iterate over the results found (at most 10)
+            for i, search_item in enumerate(search_items, start=1):
+                if "og:description" in search_item["pagemap"]["metatags"][0]:
+                    long_description = \
+                        search_item["pagemap"]["metatags"][0]["og:description"]
+                else:
+                    long_description = "N/A"
+                # Get the page title
+                title = search_item.get("title")
+                # Page snippet
+                snippet = search_item.get("snippet")
+
+                # Extract the page url
+                link = search_item.get("link")
+                response = {
+                    "result_id": i,
+                    "title": title,
+                    "description": snippet,
+                    "long_description": long_description,
+                    "url": link
+                }
+                responses.append(response)
+        else:
+            responses.append({"error": "google search failed."})
+
+    except requests.RequestException:
+        responses.append({"error": "google search failed."})
+
+    return responses
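
The URL in `search_google` interpolates the raw query into an f-string, so a query containing spaces, `&`, or `?` is sent unescaped. One possible alternative, shown only as a sketch, is to let `urllib.parse.urlencode` build the query string. The function name `build_search_url` and its parameters are assumptions for illustration; the endpoint and parameter names are the ones used in the diff.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the diff above
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, api_key: str, engine_id: str,
                     start: int = 1, language: str = "en",
                     num: int = 10) -> str:
    """Build the request URL with every parameter percent-encoded.

    Hypothetical helper; not part of the PR.
    """
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": query,
        "start": start,
        "lr": language,
        "num": num,
    }
    # urlencode escapes reserved characters such as '&', '=' and '?'
    return CUSTOM_SEARCH_ENDPOINT + "?" + urlencode(params)
```

With this approach a query such as `a&b=c` stays a single `q` value instead of being split into extra parameters.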


+def text_extract_from_web(url: str) -> str:
+    r"""Get the text information from the given url.
+
+    Args:
+        url (string): The website you want to search.
+
+    Returns:
+        string: All text extracted from the web page.
+    """
+    import requests
+    from bs4 import BeautifulSoup
+
+    try:
+        # Request the target page
+        response_text = requests.get(url).text
+
+        # Parse the obtained page
+        soup = BeautifulSoup(response_text, features="html.parser")
+
+        # Drop script and style elements, which carry no readable text
+        for script in soup(["script", "style"]):
+            script.extract()
+
+        text = soup.get_text()
+        # Strip text
+        lines = (line.strip() for line in text.splitlines())
+        chunks = (phrase.strip() for line in lines
+                  for phrase in line.split(" "))
+        text = ".".join(chunk for chunk in chunks if chunk)
+
+    except requests.RequestException:
+        text = f"can't access {url}"
+
+    return text
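
To illustrate the idea behind `text_extract_from_web` (keep visible text, drop `script` and `style` content) without any third-party dependency, here is a minimal sketch that swaps BeautifulSoup for the standard-library `html.parser`. It is not the PR's implementation; `VisibleTextParser` and `visible_text` are hypothetical names, and the output joins fragments with spaces rather than with `.` as the PR does.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements
        if self._skip_depth == 0:
            stripped = data.strip()
            if stripped:
                self.parts.append(stripped)


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```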


+# Split a text into smaller chunks of size n
+def create_chunks(text: str, n: int) -> List[str]:
+    r"""Returns successive n-sized chunks from the provided text.
+
+    Args:
+        text (string): The text to be split.
+        n (int): The target length of a single chunk.
+
+    Returns:
+        List[str]: A list of split texts.
+    """
+
+    chunks = []
+    i = 0
+    while i < len(text):
+        # Find the nearest end of sentence within a range of 0.8 * n
+        # and 1.2 * n characters
+        j = min(i + int(1.2 * n), len(text))
+        while j > i + int(0.8 * n):
+            # Check whether the chunk ends with a full stop or newline
+            chunk = text[i:j]
+            if chunk.endswith(".") or chunk.endswith("\n"):
+                break
+            j -= 1
+        # If no end of sentence found, use n characters as the chunk size
+        if j == i + int(0.8 * n):
+            j = min(i + n, len(text))
+        chunks.append(text[i:j])
+        i = j
+    return chunks
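
A standalone sketch of the same chunking strategy may make the window logic easier to follow: each chunk ends at the last sentence boundary found between 0.8 * n and 1.2 * n characters, and falls back to exactly n characters when no boundary is found. The name `split_near_sentence_end` and the guard against non-positive `n` are additions for illustration, not part of the PR.

```python
from typing import List


def split_near_sentence_end(text: str, n: int) -> List[str]:
    """Split text into chunks of roughly n characters, preferring to cut
    right after a '.' or newline inside the [0.8 * n, 1.2 * n] window."""
    if n <= 0:
        raise ValueError("n must be positive")
    chunks = []
    i = 0
    while i < len(text):
        lower = i + int(0.8 * n)
        j = min(i + int(1.2 * n), len(text))
        # Walk backwards from the upper bound looking for a boundary
        while j > lower and text[j - 1] not in ".\n":
            j -= 1
        # No boundary found in the window: cut at n characters instead
        if j == lower:
            j = min(i + n, len(text))
        chunks.append(text[i:j])
        i = j
    return chunks
```

Because the cuts are contiguous, joining the chunks reproduces the input, and no chunk is longer than `int(1.2 * n)` characters.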


+def prompt_single_step_agent(prompt: str) -> str:
+    """Prompt a single-step agent to summarize texts or answer a question."""
+
+    assistant_sys_msg = BaseMessage.make_assistant_message(
+        role_name="Assistant",
+        content="You are a helpful assistant.",
+    )
+    agent = camel.agents.ChatAgent(assistant_sys_msg)
+    agent.reset()
+
+    user_msg = BaseMessage.make_user_message(
+        role_name="User",
+        content=prompt,
+    )
+    assistant_response = agent.step(user_msg)
+    if assistant_response.msgs is not None:
+        return assistant_response.msg.content
+    return ""


+def summarize_text(text: str, query: str) -> str:
+    r"""Summarize the information from the text, based on the query if a
+    query is given.
+
+    Args:
+        text (string): Text to summarise.
+        query (string): What information you want.
+
+    Returns:
+        string: Strings with information.
+    """
+    summary_prompt = TextPrompt(
+        '''Gather information from this text that is relevant to the question, but
+do not directly answer the question.\nquestion: {query}\ntext ''')
+    summary_prompt = summary_prompt.format(query=query)
+    # Max length of each chunk
+    max_len = 3000
+    results = ""
+    chunks = create_chunks(text, max_len)
+    # Summarize each chunk separately
+    for i, chunk in enumerate(chunks, start=1):
+        prompt = summary_prompt + str(i) + ": " + chunk
+        result = prompt_single_step_agent(prompt)
+        results += result + "\n"
+
+    # Final summary over the per-chunk results
+    final_prompt = TextPrompt(
+        '''Here are some summarized texts which were split from one text. Use the
+information to answer the question: {query}.\n\nText: ''')
+    final_prompt = final_prompt.format(query=query)
+    prompt = final_prompt + results
+
+    response = prompt_single_step_agent(prompt)
+
+    return response
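
`summarize_text` follows a two-stage pattern: one model call per chunk, then one final call over the collected partial results. The sketch below isolates that control flow by passing the model call in as a plain callable, so it can be exercised without an API key. `summarize_in_chunks` and the `ask` parameter are hypothetical names, and the prompt wording is simplified.

```python
from typing import Callable, List


def summarize_in_chunks(chunks: List[str], query: str,
                        ask: Callable[[str], str]) -> str:
    """Ask once per chunk, then ask once more over the partial answers."""
    partial_results = []
    for i, chunk in enumerate(chunks, start=1):
        prompt = ("Gather information relevant to the question.\n"
                  f"question: {query}\ntext {i}: {chunk}")
        partial_results.append(ask(prompt))

    # Combine the per-chunk notes into a single final request
    final_prompt = (f"Use these notes to answer the question: {query}.\n\n"
                    "Text: " + "\n".join(partial_results))
    return ask(final_prompt)
```

In the PR the role of `ask` is played by `prompt_single_step_agent`; injecting it here is a design choice that keeps the sketch testable with a stub.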


+def search_google_and_summarize(query: str) -> str:
+    r"""Search the web for information. Given a query, this function will use
+    the Google search engine to search for related information from the
+    internet, and then return a summarized answer.
+
+    Args:
+        query (string): Question you want to be answered.
+
+    Returns:
+        string: Summarized information from the web.
+    """
+    # Google search will return a list of urls
+    responses = search_google(query)
+    for item in responses:
+        if "url" in item:
+            url = item.get("url")
+            # Extract text
+            text = text_extract_from_web(str(url))
+            # Use ChatGPT to summarize the text
+            answer = summarize_text(text, query)
+
+            # Let ChatGPT decide whether to continue searching or not
+            prompt = TextPrompt(
+                '''Do you think the answer: {answer} can answer the query:
+{query}. Use only 'yes' or 'no' to answer.''')
+            prompt = prompt.format(answer=answer, query=query)
+            reply = prompt_single_step_agent(prompt)
+            if "yes" in str(reply).lower():
+                return answer
+
+    return "Failed to find the answer from google search."


 SEARCH_FUNCS: List[OpenAIFunction] = [
-    OpenAIFunction(func) for func in [search_wiki]
+    OpenAIFunction(func)
+    for func in [search_wiki, search_google_and_summarize]
 ]
30 changes: 29 additions & 1 deletion test/functions/test_search_functions.py
@@ -11,9 +11,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
+import os
+
+import requests
 import wikipedia

-from camel.functions.search_functions import search_wiki
+from camel.functions.search_functions import (
+    search_google_and_summarize,
+    search_wiki,
+)


def test_search_wiki_normal():
@@ -38,3 +44,25 @@ def test_search_wiki_with_ambiguity():
     expected_output = wikipedia.summary("New York (state)", sentences=5,
                                         auto_suggest=False)
     assert search_wiki("New York") == expected_output
+
+
+def test_google_api():
+    # Check the google search api
+
+    # https://developers.google.com/custom-search/v1/overview
+    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
+    # https://cse.google.com/cse/all
+    SEARCH_ENGINE_ID = os.getenv("SEARCH_ENGINE_ID")
+
+    url = f"https://www.googleapis.com/customsearch/v1?" \
+        f"key={GOOGLE_API_KEY}&cx={SEARCH_ENGINE_ID}&q=any"
+    result = requests.get(url)
+
+    assert result.status_code == 200
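
`test_google_api` needs live credentials and network access. The response-parsing step could additionally be checked offline if it were separated from the HTTP call, as in the sketch below. `parse_search_items` is a hypothetical helper, not part of the PR; the field names (`items`, `pagemap`, `metatags`, `og:description`, `title`, `snippet`, `link`) are the ones read in `search_google`. Unlike the PR code, this version tolerates items without a `pagemap` entry.

```python
from typing import Any, Dict, List


def parse_search_items(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a decoded search response into the result dictionaries
    described in the search_google docstring."""
    results = []
    for i, item in enumerate(data.get("items", []), start=1):
        # Fall back to an empty metatag dict when pagemap data is missing
        metatags = item.get("pagemap", {}).get("metatags") or [{}]
        results.append({
            "result_id": i,
            "title": item.get("title"),
            "description": item.get("snippet"),
            "long_description": metatags[0].get("og:description", "N/A"),
            "url": item.get("link"),
        })
    return results
```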


+def test_web_search():
+    query = "What big things are happening in 2023?"
+    answer = search_google_and_summarize(query)
+
+    assert answer is not None