# Assignment: Define Agent Roles and Tools Using a YAML Config


---

### Objective:
This assignment will guide you through defining AI **agent roles, goals, backstories, and tools using an external YAML configuration file**. This approach promotes modularity, reusability, and easier management of complex multi-agent systems. You will build a simple data analysis workflow, where agent definitions are loaded dynamically from a YAML file, demonstrating a more production-ready way to manage your CrewAI agents.

---

### Instructions:
1.  **LLM Access**: You'll need access to an LLM API (e.g., Google's Gemini, OpenAI's GPT-4). For this assignment, we'll primarily use **Google's Gemini Pro model**.
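2.  **Environment Setup**: Install the necessary Python libraries: `pip install crewai crewai_tools langchain-google-genai google-generativeai pyyaml`. The `langchain-google-genai` package provides the `ChatGoogleGenerativeAI` class imported in Task 1.1.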
3.  **API Key**: Securely handle your API key. It's best practice to load it from an environment variable.
4.  **Jupyter Notebook**: All your code, YAML content, outputs, observations, and analysis must be documented in this Jupyter Notebook.
5.  **YAML File**: You will create a separate `.yaml` file for agent and tool definitions.
6.  **Task Scenario**: You will build a workflow where agents analyze a small dataset (simulated or real) and provide insights.
7.  **Analysis**: Evaluate the benefits of using YAML for configuration and the effectiveness of your agents.

---

## Part 1: Setup and YAML File Creation
Begin by configuring your LLM and creating the YAML file that will hold your agent and tool definitions.

### Task 1.1: Install Libraries and Configure LLM
Install `crewai`, `crewai_tools`, `langchain-google-genai`, `google-generativeai`, and `pyyaml`, then set up your Google API key and initialize the LLM.

In [None]:
# Install necessary libraries (if not already installed)
# !pip install crewai crewai_tools langchain-google-genai google-generativeai pyyaml --quiet

import os
import yaml
from crewai import Agent, Task, Crew, Process
from crewai_tools import Tool # We'll define a custom tool
from langchain_google_genai import ChatGoogleGenerativeAI

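# --- YOUR API KEY HERE ---
# Prefer setting GOOGLE_API_KEY outside the notebook; setdefault keeps an existing value
# and only falls back to the placeholder below. Never commit a real key.
os.environ.setdefault("GOOGLE_API_KEY", "YOUR_GOOGLE_API_KEY_HERE")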

# Initialize the LLM (Gemini Pro)
llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.5)

print("CrewAI environment and LLM setup complete!")
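Instruction 3 recommends loading the API key from an environment variable rather than writing it into the notebook. A minimal sketch of that pattern follows. The helper name `get_api_key` and its `prompt` parameter are hypothetical and not part of CrewAI or LangChain. It reads the variable if present and otherwise asks for the key interactively, so the key never appears in a saved cell.

```python
import os
from getpass import getpass
from typing import Callable


def get_api_key(env_var: str = "GOOGLE_API_KEY",
                prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden interactive prompt (hypothetical helper behavior).
        key = prompt(f"Enter {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is not set and no key was entered.")
    # Store it so libraries that read the variable themselves can find it.
    os.environ[env_var] = key
    return key
```

Calling `get_api_key()` once before the LLM is constructed would be enough. The `prompt` parameter exists mainly so the function can be exercised without a terminal.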

### Task 1.2: Create `agents_config.yaml`
Create a file named `agents_config.yaml` in the same directory as this notebook. This file will define your agents and tools. Below is the content you should put into it. You will create two agents: a **Data Analyst** and an **Insight Generator**.

**`agents_config.yaml` Content:**
```yaml
agents:
  - id: data_analyst
    role: Data Analyst
    goal: Analyze provided data to identify key trends, anomalies, and statistical insights.
    backstory: An expert in statistical analysis and data interpretation. Transforms raw data into actionable insights, focusing on accuracy and detail.
    verbose: true
    allow_delegation: false
    tools:
      - name: analyze_data_tool
        description: A tool to perform basic data analysis on a given dataset (simulated).
        function_name: analyze_data
  
  - id: insight_generator
    role: Insight Generator
    goal: Synthesize data analysis findings into clear, concise, and actionable business insights and recommendations.
    backstory: A seasoned business strategist with a knack for distilling complex analytical results into strategic recommendations. Focuses on clarity and practical application.
    verbose: true
    allow_delegation: false
    # No tools needed for this agent, as it works with the output of the Data Analyst

tools:
  - name: analyze_data_tool
    description: A tool to perform basic data analysis on a given dataset (simulated).
    function_name: analyze_data
```

In [None]:
# Create the agents_config.yaml file programmatically (for convenience in notebook environment)
yaml_content = """
agents:
  - id: data_analyst
    role: Data Analyst
    goal: Analyze provided data to identify key trends, anomalies, and statistical insights.
    backstory: An expert in statistical analysis and data interpretation. Transforms raw data into actionable insights, focusing on accuracy and detail.
    verbose: true
    allow_delegation: false
    tools:
      - name: analyze_data_tool
        description: A tool to perform basic data analysis on a given dataset (simulated).
        function_name: analyze_data

  - id: insight_generator
    role: Insight Generator
    goal: Synthesize data analysis findings into clear, concise, and actionable business insights and recommendations.
    backstory: A seasoned business strategist with a knack for distilling complex analytical results into strategic recommendations. Focuses on clarity and practical application.
    verbose: true
    allow_delegation: false

tools:
  - name: analyze_data_tool
    description: A tool to perform basic data analysis on a given dataset (simulated).
    function_name: analyze_data
"""

with open("agents_config.yaml", "w") as f:
    f.write(yaml_content)

print("agents_config.yaml created successfully!")
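Because the configuration is edited by hand, a quick structural check can catch a missing field or a misspelled tool name before any agent is built. The sketch below is one possible check, not a CrewAI feature. The function name `validate_agents_config` and the list of required fields are assumptions based on the keys used in `agents_config.yaml` above. It works on the dictionary that `yaml.safe_load` returns.

```python
from typing import Any

# Fields every agent entry in this assignment's YAML is expected to carry (assumption).
REQUIRED_AGENT_FIELDS = ("id", "role", "goal", "backstory")


def validate_agents_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems: list[str] = []
    agents = config.get("agents") or []
    declared_tools = {tool.get("name") for tool in config.get("tools") or []}
    seen_ids: set[str] = set()

    if not agents:
        problems.append("config has no 'agents' entries")

    for index, agent in enumerate(agents):
        agent_id = agent.get("id")
        label = agent_id or f"agents[{index}]"
        for field in REQUIRED_AGENT_FIELDS:
            if not agent.get(field):
                problems.append(f"{label}: missing required field '{field}'")
        if agent_id is not None:
            if agent_id in seen_ids:
                problems.append(f"{label}: duplicate agent id")
            seen_ids.add(agent_id)
        # Every tool an agent references should also appear in the top-level list.
        for tool in agent.get("tools") or []:
            name = tool.get("name")
            if name not in declared_tools:
                problems.append(
                    f"{label}: tool '{name}' is not declared under top-level 'tools'"
                )
    return problems


sample = {
    "agents": [
        {"id": "data_analyst", "role": "Data Analyst", "goal": "g", "backstory": "b",
         "tools": [{"name": "analyze_data_tool"}]},
        {"id": "insight_generator", "role": "Insight Generator", "goal": "g"},
    ],
    "tools": [{"name": "analyze_data_tool"}],
}
for problem in validate_agents_config(sample):
    print(problem)  # reports the missing 'backstory' on insight_generator
```

In the notebook, the same function could be applied to the result of `yaml.safe_load` on `agents_config.yaml` before agents are instantiated.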

---

## Part 2: Define and Load Custom Tools
Before loading agents, you need to define the actual Python functions that your tools will wrap. Then, you'll load the YAML and link the tools.

### Task 2.1: Implement a Custom Tool Function
Create a Python function that simulates data analysis. This function will be called by your `Data Analyst` agent.

* **Function Name**: `analyze_data`
* **Parameters**: Takes `data_description` (string) as input.
* **Behavior**: Return a simulated analysis result. For instance, if `data_description` contains "sales", return sales trends; if it contains "customer feedback", return a sentiment analysis. Keep it simple for this assignment: the function only needs to return descriptive text.
* **Tool**: Create a `crewai_tools.Tool` instance wrapping this function.

In [None]:
def analyze_data(data_description: str) -> str:
    """
    A simulated tool to analyze data based on its description.
    Returns a summary of insights derived from the 'analysis'.
    """
    print(f"\n--- Tool: Analyzing data for '{data_description}' ---")
    if "sales" in data_description.lower():
        return (
            "Simulated Sales Data Analysis: Q1 sales increased by 15% YoY, driven by "
            "new product launches. Top performing region: North America (25% growth). "
            "Bottom performing: Europe (2% decline). Average order value increased by 8%."
        )
    elif "customer feedback" in data_description.lower() or "sentiment" in data_description.lower():
        return (
            "Simulated Customer Feedback Analysis: 70% positive sentiment, 20% neutral, 10% negative. "
            "Key positive themes: 'ease of use', 'great support'. Key negative themes: 'pricing', 'buggy software'."
        )
    elif "website traffic" in data_description.lower():
        return (
            "Simulated Website Traffic Analysis: 30% increase in unique visitors last month. "
            "Bounce rate is 45%. Top referral source: social media (40%). Most popular page: product features (2x others)."
        )
    else:
        return (
            f"Simulated General Data Analysis for '{data_description}': "
            "Overall data shows a positive trend. Further specific analysis might be required."
        )

analyze_data_tool = Tool(
    name="analyze_data_tool", # This name MUST match the 'name' in your YAML for the tool
    description="A tool to perform basic data analysis on a given dataset.",
    func=analyze_data # Link the Python function here
)

available_tools = {"analyze_data_tool": analyze_data_tool}

print("Custom tool 'analyze_data_tool' implemented and ready!")
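The `if`/`elif` chain in `analyze_data` grows by one branch per data type. As an alternative design, the keyword matching can be driven by a lookup table. The sketch below is illustrative only. `SIMULATED_RESULTS` and `analyze_data_table_driven` are hypothetical names, and the result strings are shortened placeholders rather than the text used above.

```python
# Hypothetical lookup table: the first entry with a matching keyword wins.
SIMULATED_RESULTS = [
    (("sales",), "Simulated sales analysis placeholder."),
    (("customer feedback", "sentiment"), "Simulated customer feedback analysis placeholder."),
    (("website traffic",), "Simulated website traffic analysis placeholder."),
]


def analyze_data_table_driven(data_description: str) -> str:
    """Return the first simulated result whose keywords appear in the description."""
    text = data_description.lower()
    for keywords, result in SIMULATED_RESULTS:
        if any(keyword in text for keyword in keywords):
            return result
    # No keyword matched: fall back to a generic message, as the original function does.
    return f"Simulated general analysis for '{data_description}'."


print(analyze_data_table_driven("Customer Feedback from the April survey"))
```

Adding a new data type then means adding one tuple to the table rather than another branch.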

### Task 2.2: Load Agents from YAML
Load your `agents_config.yaml` file and instantiate `Agent` objects using the loaded configuration. Map the tools defined in the YAML to your Python `Tool` instances.

In [None]:
def load_agents_from_yaml(config_file: str, llm, tools_map: dict):
    """Build a dict of Agent objects, keyed by agent id, from a YAML config file."""
    with open(config_file, 'r') as f:
        config = yaml.safe_load(f)

    agents = {}
    for agent_config in config['agents']:
        # 'id' and 'tools' are handled here; every remaining key is passed to Agent(...)
        agent_id = agent_config.pop('id')
        agent_tools = []
        # `or []` also covers a 'tools' key that is missing or left empty in the YAML
        for tool_def in agent_config.pop('tools', None) or []:
            tool_name = tool_def['name']
            if tool_name in tools_map:
                agent_tools.append(tools_map[tool_name])
            else:
                print(f"Warning: Tool '{tool_name}' not found in tools_map for agent '{agent_id}'.")

        agents[agent_id] = Agent(
            llm=llm,
            tools=agent_tools,
            **agent_config
        )
    return agents

# Load agents using the function
loaded_agents = load_agents_from_yaml('agents_config.yaml', llm, available_tools)

data_analyst_agent = loaded_agents['data_analyst']
insight_generator_agent = loaded_agents['insight_generator']

print("Agents loaded from YAML:")
print(f"  - Data Analyst Role: {data_analyst_agent.role}")
print(f"  - Insight Generator Role: {insight_generator_agent.role}")
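The loader above prints a warning when a tool name is missing from `tools_map`, and it removes keys from the parsed configuration in place. A stricter variant of the tool-resolution step is sketched below. It assumes a hypothetical helper named `split_agent_config`, which works on a copy of the entry and raises an error for unknown tools so that a typo in the YAML cannot silently produce an agent without its tool.

```python
from copy import deepcopy
from typing import Any


def split_agent_config(agent_config: dict[str, Any], tools_map: dict[str, Any]):
    """Return (agent_id, agent_kwargs, tools) without modifying agent_config."""
    kwargs = deepcopy(agent_config)
    agent_id = kwargs.pop("id")
    # A missing or empty 'tools' key is treated as "no tools".
    tool_defs = kwargs.pop("tools", None) or []
    missing = [t["name"] for t in tool_defs if t["name"] not in tools_map]
    if missing:
        raise KeyError(f"Agent '{agent_id}' references unknown tools: {missing}")
    tools = [tools_map[t["name"]] for t in tool_defs]
    return agent_id, kwargs, tools


entry = {"id": "data_analyst", "role": "Data Analyst",
         "tools": [{"name": "analyze_data_tool"}]}
agent_id, kwargs, tools = split_agent_config(entry, {"analyze_data_tool": "tool-object"})
print(agent_id, kwargs, tools)
```

The returned `kwargs` and `tools` could then be passed to `Agent(llm=llm, tools=tools, **kwargs)` in the same way the loader does.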

---

## Part 3: Define Tasks and Create the Crew
Define the tasks for your loaded agents and then assemble them into a Crew to run the workflow.

### Task 3.1: Define Tasks
Create two tasks for your workflow: one for data analysis and another for generating insights. Ensure the second task uses the output of the first as context.

* **Analyze Data Task**:
    * **Description**: "Analyze the provided sales data for Q1 2024 and identify key trends, growth areas, and any significant anomalies. Focus on identifying regional performance differences and product category contributions."
    * **Agent**: `data_analyst_agent`
* **Generate Insights Task**:
    * **Description**: "Based on the data analysis report, generate 3-5 concise, actionable business insights. Each insight should be a clear statement, followed by a recommendation for a business strategy. Format as bullet points."
    * **Agent**: `insight_generator_agent`
    * **Context**: This task should receive the output of the `analyze_data_task`.

In [None]:
analyze_data_task = Task(
    description=(
        "Analyze the provided sales data for Q1 2024 and identify key trends, "
        "growth areas, and any significant anomalies. Focus on identifying regional "
        "performance differences and product category contributions. "
        "Provide a summary of the analysis findings."
    ),
    agent=data_analyst_agent,
    expected_output="A detailed analysis report on sales data, including trends, anomalies, and regional performance."
)

generate_insights_task = Task(
    description=(
        "Based on the data analysis report provided, generate 3-5 concise, actionable "
        "business insights. Each insight should be a clear statement, followed by a "
        "recommendation for a business strategy. Format the insights as bullet points "
        "starting with 'Insight:' and 'Recommendation:'."
    ),
    agent=insight_generator_agent,
    context=[analyze_data_task], # The output of analyze_data_task becomes context here
    expected_output="3-5 actionable business insights with corresponding recommendations, formatted as bullet points."
)

print("Tasks defined!")

### Task 3.2: Create and Run the Crew
Assemble the agents and tasks into a `Crew` and execute the workflow.

* **Agents**: Use the agents loaded from YAML.
* **Tasks**: Use the defined tasks in sequential order.
* **Process**: Use `Process.sequential`.
* **Verbose**: Set to `True` for detailed logging.

In [None]:
business_intelligence_crew = Crew(
    agents=[data_analyst_agent, insight_generator_agent],
    tasks=[analyze_data_task, generate_insights_task],
    process=Process.sequential,
    verbose=True
)

print("\n--- Kicking off the Business Intelligence Crew! ---")

initial_data_context = {
    'data_description': 'sales data for Q1 2024, focusing on regional performance and product contributions'
}

result = business_intelligence_crew.kickoff(inputs=initial_data_context)

print("\n--- Workflow Finished! ---")
print("Final Insights Report:\n")
print(result)
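A note on `inputs`: CrewAI uses the dictionary passed to `kickoff` to fill `{placeholder}` fields in task and agent text. The task descriptions in Task 3.1 contain no `{data_description}` placeholder, so the value supplied here has nothing to substitute into. The snippet below illustrates the idea with plain Python string formatting. It is a conceptual sketch, not CrewAI's internal implementation, and `description_template` is a hypothetical rewording of the first task's description.

```python
# Hypothetical templated description; {data_description} is the placeholder name
# that matches the key used in initial_data_context.
description_template = (
    "Analyze the {data_description} and identify key trends, "
    "growth areas, and any significant anomalies."
)

inputs = {"data_description": "sales data for Q1 2024"}

# Plain str.format stands in for the substitution step.
filled = description_template.format(**inputs)
print(filled)
```

Writing the task description with such a placeholder would let the same crew be rerun on a different dataset description by changing only the `inputs` dictionary.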

---

## Part 4: Analysis and Reflection
Examine the outputs and discuss the benefits of using external configuration.

### Task 4.1: Review the Outputs
Inspect the final output from your CrewAI workflow and answer the following questions.

* **Analysis Quality**: Did the `Data Analyst` agent use the `analyze_data_tool` effectively? Was the simulated analysis output reasonable and did it provide sufficient information for the next step?
* **Insight Quality**: Were the insights generated by the `Insight Generator` agent concise, actionable, and directly derived from the analysis report?
* **Workflow Flow**: Describe how the information flowed between the `Data Analyst` and `Insight Generator` agents. Was the context passing successful?

### Task 4.2: Reflection on YAML Configuration
Discuss the advantages and potential disadvantages of defining agents and tools in a YAML file.

* **Advantages**: What are the main benefits of externalizing agent and tool definitions to a YAML file (e.g., modularity, reusability, version control, easier iteration, separation of concerns)?
* **Disadvantages/Challenges**: What are some potential challenges or limitations you might encounter when using YAML for configuration, especially for very complex setups?
* **Scalability**: How does using a YAML configuration improve the scalability and maintainability of your multi-agent applications as they grow larger or more complex?
* **When to Use**: In what real-world scenarios would defining agents and tools in a YAML file be particularly beneficial?

---

### Submission:
* Ensure all code cells have been executed and their outputs are visible.
* Ensure all analysis and reflections are clearly written in markdown cells.
* **Ensure the `agents_config.yaml` file is present** in the same directory as your notebook (the code cell in Task 1.2 creates it programmatically).
* Save your Jupyter Notebook as `[YourName]_CrewAI_YAML_Config_Assignment.ipynb`.