<a href="https://colab.research.google.com/github/MYM110/Programming/blob/main/crewai_sequential_ScrapeWebsiteTool_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# crewai-sequential-ScrapeWebsiteTool-quickstart
By [Alex Fazio](https://www.x.com/alxfazio)

📂 Github repo: https://github.com/alexfazio/crewai-quickstart

Simplified and tested template of a **sequential** CrewAI crew **extracting and reading the content of a specified website**.

Check the folder named `./output-files` for detailed results from each task or agent, as well as the initial output from the ScrapeWebsiteTool.
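
A minimal sketch for inspecting that folder after a run. It assumes the default `output-files` directory created later in this notebook; `list_output_files` is a hypothetical helper, not part of CrewAI.

```python
from pathlib import Path


def list_output_files(directory="output-files"):
    """Return the files in `directory`, oldest first by modification time."""
    folder = Path(directory)
    if not folder.is_dir():
        return []  # nothing has been written yet
    files = [p for p in folder.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


for path in list_output_files():
    print(path.name)
```

Sorting by modification time keeps the scraper output and the per-agent files in the order they were produced.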

Extra Requirements:
- Google Gemini API key (the default in this notebook), or an [OpenAI](https://platform.openai.com/playground)/[Groq](https://console.groq.com/settings/organization)/[Anthropic](https://console.anthropic.com/dashboard) API key

📚 CrewAI docs: https://docs.crewai.com/

📚 ScrapeWebsiteTool docs: https://docs.crewai.com/tools/ScrapeWebsiteTool/

In [None]:
# @title 👨‍🦯 Run this cell to hide all warnings (optional)
# Warning control
import warnings
warnings.filterwarnings('ignore')

# To avoid the restart-session warning in Colab, you can remove the PIL and
# pydevd_plugins packages from the loaded modules. This is safe here because
# no code depending on them has run in the kernel session yet.

# import sys
# sys.modules.pop('PIL', None)

In [None]:
# @title ⬇️ Install project dependencies by running this cell
# @markdown  🔄 Restart the session and rerun the cell **if Colab requires it**.

%pip install git+https://github.com/joaomdmoura/crewAI.git --quiet
%pip install crewai_tools langchain_openai langchain_groq langchain_anthropic langchain_community cohere --quiet
print("---")
%pip show crewai_tools langchain_openai langchain_groq langchain_anthropic langchain_community cohere

In [None]:
%pip install langchain langchain-google-genai

In [None]:
# @title 🔑 Input API Key by running this cell

import os
from getpass import getpass
from crewai import Agent, Task, Crew, Process
from textwrap import dedent
import google.generativeai as genai
from google.colab import userdata

# Retrieve your Gemini API key from Colab secrets
# (requires a secret named GEMINI_API_KEY in this notebook)
GOOGLE_API_KEY = userdata.get('GEMINI_API_KEY')

# from langchain_groq import ChatGroq
# ↑ uncomment to use Groq's API
# from langchain_anthropic import ChatAnthropic
# ↑ uncomment to use Anthropic's API
# from langchain_community.chat_models import ChatCohere
# ↑ uncomment to use ChatCohere API

os.environ["GEMINI_API_KEY"] = getpass("Enter GEMINI_API_KEY: ")
# ↑ prompts for the Gemini API key used by this notebook
# os.environ["GROQ_API_KEY"] = getpass("Enter GROQ_API_KEY: ")
# ↑ uncomment to use Groq's API
# os.environ["ANTHROPIC_API_KEY"] = getpass("Enter ANTHROPIC_API_KEY: ")
# ↑ uncomment to use Anthropic's API
# os.environ["COHERE_API_KEY"] = getpass("Enter COHERE_API_KEY: ")
# ↑ uncomment to use Cohere's API

# Check if the 'output-files' directory exists, and create it if it doesn't
if not os.path.exists('output-files'):
    os.makedirs('output-files')

In [None]:
# @title 🕸️ Instantiate `ScrapeWebsiteTool` with a webpage `URL`

from crewai import Agent, Task, Crew, Process
import google.generativeai as genai
from crewai_tools import ScrapeWebsiteTool
import datetime

# To let the agent scrape any website it finds during its execution:

# tool = ScrapeWebsiteTool()

# Initialize the tool with the website URL so the agent can only scrape the content of the specified website

scraper_website_tool = ScrapeWebsiteTool(website_url=input("Enter website URL: "))

# Extract the text from the site

text = scraper_website_tool.run()

# print(text)

# Save the scraped content to a file in Colab
with open(f'output-files/ScrapeWebsiteTool-output_{datetime.datetime.now().strftime("%Y%m%d_%H%M%S")}.txt', 'w') as file:
    file.write(text)
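
The file name above embeds a timestamp so repeated runs do not overwrite each other. The same naming scheme can be sketched as a small reusable helper. `timestamped_path` is a hypothetical name introduced for illustration, and the optional `now` argument exists only to make the result predictable.

```python
import datetime
from pathlib import Path


def timestamped_path(prefix, extension, directory="output-files", now=None):
    """Build `<directory>/<prefix>_<YYYYmmdd_HHMMSS>.<extension>`."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{prefix}_{stamp}.{extension}"


print(timestamped_path("ScrapeWebsiteTool-output", "txt"))
```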

## Define Agents
In CrewAI, agents are autonomous entities designed to perform specific roles and achieve particular goals. Each agent uses a language model (LLM) and may have specialized tools to help execute tasks.

In [None]:
# @title 🕵🏻 Define your agents

from langchain_google_genai import ChatGoogleGenerativeAI
# ↑ LangChain chat wrapper for Gemini (from the langchain-google-genai package installed above)
# from langchain_groq import ChatGroq
# ↑ Uncomment to use Groq's API
# from langchain_anthropic import ChatAnthropic
# ↑ Uncomment to use Anthropic's API

agent_1 = Agent(
    role=dedent((
        """
        Defines the agent's function within the crew. It determines the kind of tasks the agent is best suited for.
        """)), # Think of this as the job title
    backstory=dedent((
        """
        Provides context to the agent's role and goal, enriching the interaction and collaboration dynamics.
        """)), # This is the backstory of the agent; it helps the agent understand the context of the task
    goal=dedent((
        """
        The individual objective that the agent aims to achieve. It guides the agent's decision-making process.
        """)), # This is the goal that the agent is trying to achieve
    tools=[scraper_website_tool],
    allow_delegation=False,
    verbose=True,
    # ↑ Whether the agent execution should be in verbose mode
    max_iter=3,
    # ↑ maximum number of iterations the agent can perform before being forced to give its best answer (generate the output)
    max_rpm=100, # This is the maximum number of requests per minute that the agent can make to the language model
    llm=ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        temperature=0.8,
        google_api_key=os.environ["GEMINI_API_KEY"],
    ),
    # ↑ uses Google's Gemini API + "gemini-1.5-flash"
    # llm=ChatGroq(temperature=0.8, model_name="mixtral-8x7b-32768"),
    # ↑ uncomment to use Groq's API + "mixtral-8x7b-32768"
    # llm=ChatGroq(temperature=0.6, model_name="llama3-70b-8192"),
    # ↑ uncomment to use Groq's API + "llama3-70b-8192"
    # llm=ChatAnthropic(model='claude-3-opus-20240229', temperature=0.8),
    # ↑ uncomment to use Anthropic's API + "claude-3-opus-20240229"
)

agent_2 = Agent(
    role=dedent((
        """
        Defines the agent's function within the crew. It determines the kind of tasks the agent is best suited for.
        """)), # Think of this as the job title
    backstory=dedent((
        """
        Provides context to the agent's role and goal, enriching the interaction and collaboration dynamics.
        """)), # This is the backstory of the agent; it helps the agent understand the context of the task
    goal=dedent((
        """
        The individual objective that the agent aims to achieve. It guides the agent's decision-making process.
        """)), # This is the goal that the agent is trying to achieve
    tools=[scraper_website_tool],
    allow_delegation=False,
    verbose=True,
    # ↑ Whether the agent execution should be in verbose mode
    max_iter=3,
    # ↑ maximum number of iterations the agent can perform before being forced to give its best answer (generate the output)
    max_rpm=100, # This is the maximum number of requests per minute that the agent can make to the language model
    llm=ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        temperature=0.8,
        google_api_key=os.environ["GEMINI_API_KEY"],
    ),
    # ↑ uses Google's Gemini API + "gemini-1.5-flash"
    # llm=ChatGroq(temperature=0.8, model_name="mixtral-8x7b-32768"),
    # ↑ uncomment to use Groq's API + "mixtral-8x7b-32768"
    # llm=ChatGroq(temperature=0.6, model_name="llama3-70b-8192"),
    # ↑ uncomment to use Groq's API + "llama3-70b-8192"
    # llm=ChatAnthropic(model='claude-3-opus-20240229', temperature=0.8),
    # ↑ uncomment to use Anthropic's API + "claude-3-opus-20240229"
)

agent_3 = Agent(
    role=dedent((
        """
        Defines the agent's function within the crew. It determines the kind of tasks the agent is best suited for.
        """)), # Think of this as the job title
    backstory=dedent((
        """
        Provides context to the agent's role and goal, enriching the interaction and collaboration dynamics.
        """)), # This is the backstory of the agent; it helps the agent understand the context of the task
    goal=dedent((
        """
        The individual objective that the agent aims to achieve. It guides the agent's decision-making process.
        """)), # This is the goal that the agent is trying to achieve
    tools=[scraper_website_tool],
    allow_delegation=False,
    verbose=True,
    # ↑ Whether the agent execution should be in verbose mode
    max_iter=3,
    # ↑ maximum number of iterations the agent can perform before being forced to give its best answer (generate the output)
    max_rpm=100, # This is the maximum number of requests per minute that the agent can make to the language model
    llm=ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        temperature=0.8,
        google_api_key=os.environ["GEMINI_API_KEY"],
    ),
    # ↑ uses Google's Gemini API + "gemini-1.5-flash"
    # llm=ChatGroq(temperature=0.8, model_name="mixtral-8x7b-32768"),
    # ↑ uncomment to use Groq's API + "mixtral-8x7b-32768"
    # llm=ChatGroq(temperature=0.6, model_name="llama3-70b-8192"),
    # ↑ uncomment to use Groq's API + "llama3-70b-8192"
    # llm=ChatAnthropic(model='claude-3-opus-20240229', temperature=0.8),
    # ↑ uncomment to use Anthropic's API + "claude-3-opus-20240229"
)
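
The three agents share most of their settings, so one possible way to cut the repetition is to collect the common values in a dictionary and merge in the per-agent text. This is only a sketch in plain Python: `SHARED_SETTINGS`, `agent_settings`, and the example role text are hypothetical, and the resulting dictionary would still need `tools` and `llm` before being passed to `Agent`.

```python
from textwrap import dedent

# Settings shared by all three agents in this notebook.
SHARED_SETTINGS = {
    "allow_delegation": False,
    "verbose": True,
    "max_iter": 3,
    "max_rpm": 100,
}


def agent_settings(role, backstory, goal, **overrides):
    """Merge per-agent text with the shared settings; overrides win."""
    settings = {
        "role": dedent(role).strip(),
        "backstory": dedent(backstory).strip(),
        "goal": dedent(goal).strip(),
        **SHARED_SETTINGS,
    }
    settings.update(overrides)
    return settings


# Hypothetical example values, for illustration only.
researcher = agent_settings(
    role="Website researcher",
    backstory="Reads scraped pages carefully.",
    goal="Summarize the scraped page.",
    max_iter=5,
)
print(researcher["max_iter"])
# With CrewAI installed, the dictionary could be unpacked as:
# Agent(**researcher, tools=[scraper_website_tool], llm=...)
```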


## Define Tasks
Tasks in CrewAI are specific assignments given to agents, detailing the actions they need to perform to achieve a particular goal. Tasks can have dependencies and context, and can be executed asynchronously to ensure an efficient workflow.
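
The task descriptions in this template contain placeholders such as `{var_1}`, which are filled from the `inputs` dictionary passed at kickoff. The sketch below only mimics that idea with Python's built-in `str.format`. It is not CrewAI's own interpolation code, and the description text and input values are made up for illustration.

```python
from textwrap import dedent

description = dedent(
    """
    Summarize the page for the given audience.
    ---
    VARIABLE 1: "{var_1}"
    VARIABLE 2: "{var_2}"
    """
).strip()

# Hypothetical input values, for illustration only.
inputs = {"var_1": "pricing page", "var_2": "sales team"}

# str.format stands in for the interpolation step here.
print(description.format(**inputs))
```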

In [None]:
# @title 📝 Define your tasks

import datetime

task_1 = Task(
    description=dedent((
        """
        A clear, concise statement of what the task entails.
        ---
        VARIABLE 1: "{var_1}"
        VARIABLE 2: "{var_2}"
        VARIABLE 3: "{var_3}"
        Add more variables if needed...
        """)),
    expected_output=dedent((
        """
        A detailed description of what the task's completion looks like.
        """)),
    agent=agent_1,
    output_file=f'output-files/agent_1-output_{datetime.datetime.now().strftime("%Y%m%d_%H%M%S")}.md'
    # ↑ The output of each task iteration will be saved here
)

task_2 = Task(
    description=dedent((
        """
        A clear, concise statement of what the task entails.
        ---
        VARIABLE 1: "{var_1}"
        VARIABLE 2: "{var_2}"
        VARIABLE 3: "{var_3}"
        Add more variables if needed...
        """)),
    expected_output=dedent((
        """
        A detailed description of what the task's completion looks like.
        """)),
    agent=agent_2,
    context=[task_1],
    # ↑ tasks whose output is passed to this task as context
    output_file=f'output-files/agent_2-output_{datetime.datetime.now().strftime("%Y%m%d_%H%M%S")}.md'
    # ↑ The output of each task iteration will be saved here
)

task_3 = Task(
    description=dedent((
        """
        A clear, concise statement of what the task entails.
        ---
        VARIABLE 1: "{var_1}"
        VARIABLE 2: "{var_2}"
        VARIABLE 3: "{var_3}"
        Add more variables if needed...
        """)),
    expected_output=dedent((
        """
        A detailed description of what the task's completion looks like.
        """)),
    agent=agent_3,
    context=[task_2],
    # ↑ tasks whose output is passed to this task as context
    output_file=f'output-files/agent_3-output_{datetime.datetime.now().strftime("%Y%m%d_%H%M%S")}.md'
    # ↑ The output of each task iteration will be saved here
)

In [None]:
# @title ⌨️ Define any variables you have and input them
print("## Welcome to the YOUR_CREW_NAME")
print('-------------------------------------------')
var_1 = input("What is VARIABLE 1 to pass to your crew?\n")
var_2 = input("What is VARIABLE 2 to pass to your crew?\n")
var_3 = input("What is VARIABLE 3 to pass to your crew?\n")
print("-------------------------------")
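
Empty answers would be passed to the crew as empty strings. One possible guard, sketched under the assumption that every variable is required, is to re-prompt until something is entered. `ask` is a hypothetical helper, and the `reader` parameter exists only so the example can run without a keyboard.

```python
def ask(prompt, reader=input):
    """Ask until a non-empty answer is given; return it stripped."""
    while True:
        answer = reader(prompt).strip()
        if answer:
            return answer
        print("Please enter a value.")


# Simulated answers: the first is empty, so the question is asked again.
answers = iter(["", "  docs page  "])
value = ask("What is VARIABLE 1 to pass to your crew?\n", reader=lambda _: next(answers))
print(value)
```

In the notebook itself, `var_1 = ask("...")` would use the real `input` function by default.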

In [None]:
# @title 🚀 Get your crew to work!
def main():
    # Instantiate your crew with a sequential process
    crew = Crew(
        agents=[agent_1, agent_2, agent_3],
        tasks=[task_1, task_2, task_3],
        verbose=True,  # You can set it to True or False
        # ↑ indicates the verbosity level for logging during execution.
        process=Process.sequential
        # ↑ the process flow that the crew will follow (e.g., sequential, hierarchical).
    )

    inputs = {
        "var_1": var_1,
        "var_2": var_2,
        "var_3": var_3,
    }

    result = crew.kickoff(inputs=inputs)
    print("\n\n########################")
    print("## Here is your custom crew run result:")
    print("########################\n")
    print(result)

    return result

if __name__ == "__main__":
  result = main()
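
Conceptually, a sequential process runs the tasks one after another and hands each task the output of the previous one. The plain-Python sketch below illustrates only that flow. It does not use CrewAI, and `run_sequential` and the three step functions are hypothetical stand-ins for `task_1`, `task_2` and `task_3`.

```python
def run_sequential(steps, inputs):
    """Run `steps` in order, feeding each one the previous step's output."""
    context = None
    outputs = []
    for step in steps:
        context = step(inputs, context)
        outputs.append(context)
    return outputs


# Hypothetical stand-ins for the three tasks.
def research(inputs, context):
    return f"notes on {inputs['var_1']}"


def draft(inputs, context):
    return f"draft from [{context}]"


def review(inputs, context):
    return f"reviewed [{context}]"


results = run_sequential([research, draft, review], {"var_1": "example.com"})
print(results[-1])
```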

In [None]:
# @title 🖥️ Display the results of your crew as markdown
from IPython.display import display, Markdown

markdown_text = result.raw  # Adjust this based on the actual attribute

# Display the markdown content
display(Markdown(markdown_text))