# Using CrewAI to Generate Plots from Data

This notebook demonstrates how to use CrewAI to create a multi-agent system that can generate visualizations from a CSV-like dataset based on natural language prompts. 

**Workflow:**

1.  **User Prompt:** The user provides a natural language prompt describing the desired plot (e.g., 'Plot Value1 by Category', 'Show distribution of Value2').
2.  **Prompt Analysis Agent:** This agent analyzes the prompt to determine the type of plot needed (e.g., bar chart, histogram, scatter plot) and the data columns to be used.
3.  **Plotting Agent:** This agent takes the analysis from the Prompt Analysis Agent and generates Matplotlib or Seaborn code for the plot. The notebook then executes that code to display the plot.

This approach allows users to generate visualizations without writing any code, simply by describing what they want to see.

## 1. Install and Import Libraries

In [None]:
!pip install -q crewai pandas matplotlib seaborn

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from crewai import Agent, Task, Crew, Process
import os

# Set a placeholder API key only if one is not already set in the environment.
# IMPORTANT: Replace "YOUR_API_KEY" with your actual OpenAI API key,
# or configure the environment for your chosen LLM provider.
os.environ.setdefault("OPENAI_API_KEY", "YOUR_API_KEY")

# Ensure plots are displayed inline in the notebook
%matplotlib inline

print("Libraries imported.")

**Note:** Replace `"YOUR_API_KEY"` with your actual OpenAI API key if you are using a model that requires it. For local models or other providers, ensure your environment is configured correctly (e.g., by setting `OPENAI_API_BASE` to the endpoint of an OpenAI-compatible local server).
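
As a minimal sketch of the configuration step described in the note, the helper below sets provider variables without overwriting values that already exist in the environment. The function name `configure_llm_env` is hypothetical, and the default variable name `OPENAI_API_BASE` is an assumption that should be checked against your provider's and your CrewAI version's documentation.

```python
import os


def configure_llm_env(api_key=None, base_url=None, base_url_var="OPENAI_API_BASE"):
    """Set LLM provider environment variables, keeping any existing values."""
    if api_key:
        # setdefault leaves a key that is already exported untouched
        os.environ.setdefault("OPENAI_API_KEY", api_key)
    if base_url:
        # The variable name is an assumption; it differs between providers and versions
        os.environ.setdefault(base_url_var, base_url)
```

For example, `configure_llm_env(base_url="http://localhost:8000/v1")` would point an OpenAI-compatible client at a local server; the URL here is purely illustrative.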

## 2. Define Sample Dataset

We'll create a simple dataset using Pandas to simulate data you might load from a CSV file.

In [None]:
data = {
    'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C', 'B', 'A'],
    'Value1': [10, 15, 12, 18, 20, 11, 16, 17, 19, 13],
    'Value2': [25, 30, 22, 35, 40, 26, 32, 33, 38, 28],
    'Date': pd.to_datetime([
        '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
        '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10'
    ])
}
df = pd.DataFrame(data)

print("Sample DataFrame:")
print(df.head())

## 3. Define CrewAI Agents

We need two agents for this task:

1.  **PromptAnalysisAgent:** Analyzes the user's natural language prompt to determine the plot type and data columns.
2.  **PlottingAgent:** Generates the plot based on the analysis from the first agent.

In [None]:
# Define the Prompt Analysis Agent
prompt_analyzer = Agent(
    role='Prompt Analysis Expert',
    goal=f"""Analyze the user's natural language prompt to determine the required plot type 
    (e.g., bar, scatter, line, histogram, box) and the specific data columns to be used from the dataframe. 
    The available columns are: {list(df.columns)}.
    Provide the output as a concise summary string, for example: 
    'Plot type: bar, X-axis: Category, Y-axis: Value1' or 'Plot type: histogram, Data: Value2'.""",
    backstory=f"""You are an expert in understanding data visualization requests. 
    You can interpret natural language and identify the key components needed to generate a plot. 
    You are familiar with various plot types like bar charts, scatter plots, line charts, histograms, and box plots. 
    The available data columns are {list(df.columns)}.""",
    verbose=True,
    allow_delegation=False
    # llm=your_llm_instance, # Optionally specify an LLM instance, e.g., for local models
)

# Define the Plotting Agent
plot_generator = Agent(
    role='Data Visualization Specialist',
    goal=f"""Generate a plot using Matplotlib or Seaborn based on the provided analysis. 
    The analysis will specify the plot type, X-axis, Y-axis, or data for distribution plots. 
    The available columns in the dataframe are: {list(df.columns)}. 
    You MUST ensure the plot is generated and displayed. 
    Use `plt.show()` after generating a plot to ensure it displays. 
    Return a success message upon generating the plot.""",
    backstory=f"""You are a specialist in creating data visualizations using Python libraries 
    like Matplotlib and Seaborn. You take instructions specifying plot type and data columns 
    and generate clear, informative plots. The data is available in a pandas DataFrame named `df` 
    which has the columns: {list(df.columns)}. You must use this `df` for plotting.""",
    verbose=True,
    allow_delegation=False
    # llm=your_llm_instance, # Optionally specify an LLM instance
)

print("Agents defined.")
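
The summary string requested from the analysis agent (for example `'Plot type: bar, X-axis: Category, Y-axis: Value1'`) is easy to check in plain Python before any plotting code is generated. The following is a sketch only: `parse_plot_spec` is a hypothetical helper that is not part of CrewAI, and it assumes the LLM follows the comma-separated `key: value` format given in the agent's goal.

```python
def parse_plot_spec(spec, available_columns):
    """Parse a spec such as 'Plot type: bar, X-axis: Category, Y-axis: Value1'."""
    key_map = {"plot type": "plot_type", "x-axis": "x", "y-axis": "y", "data": "data"}
    parsed = {}
    for part in spec.split(","):
        if ":" not in part:
            continue  # skip fragments that are not 'key: value' pairs
        key, value = part.split(":", 1)
        name = key_map.get(key.strip().lower())
        if name:
            parsed[name] = value.strip()
    if "plot_type" not in parsed:
        raise ValueError(f"No plot type found in spec: {spec!r}")
    parsed["plot_type"] = parsed["plot_type"].lower()
    # Reject columns the DataFrame does not have
    unknown = [
        parsed[k] for k in ("x", "y", "data")
        if k in parsed and parsed[k] not in available_columns
    ]
    if unknown:
        raise ValueError(f"Unknown column(s): {unknown}")
    return parsed


columns = ["Category", "Value1", "Value2", "Date"]
spec = parse_plot_spec("Plot type: bar, X-axis: Category, Y-axis: Value1", columns)
print(spec)  # {'plot_type': 'bar', 'x': 'Category', 'y': 'Value1'}
```

A check like this catches a misspelled or invented column name early, before it surfaces as a `KeyError` inside the plotting code.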

## 4. Define Tasks

Now, let's define the tasks for our agents.

In [None]:
# Task for the Prompt Analysis Agent
analysis_task = Task(
    description="Analyze the user prompt: '{user_prompt}' and provide a plot specification. "
                f"The DataFrame `df` has columns: {list(df.columns)}.",
    expected_output="A concise summary string specifying the plot type, X-axis, and Y-axis or data for the plot. "
                    "For example: 'Plot type: bar, X-axis: Category, Y-axis: Value1' or 'Plot type: histogram, Data: Value2'.",
    agent=prompt_analyzer
)

# Task for the Plotting Agent
plotting_task = Task(
    description="Generate a plot based on the analysis from the prompt_analyzer. "
                "Use the pandas DataFrame `df` for plotting. Ensure the plot is displayed. "
                f"The DataFrame `df` (with columns {list(df.columns)}) is already loaded and available for you to use directly.",
    expected_output="A confirmation message that the plot has been generated and displayed, e.g., 'Plot generated and displayed successfully.'.",
    agent=plot_generator,
    context=[analysis_task] # Depends on the output of the analysis_task
)

print("Tasks defined.")
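
Note that the `analysis_task` description mixes a plain string containing `{user_prompt}` with an f-string containing `{list(df.columns)}`. The sketch below illustrates why that works: the f-string part is filled in immediately, while the plain-string placeholder survives until a value is supplied later. Here `str.format` merely stands in for the interpolation CrewAI performs with the `inputs` passed to `kickoff()`; it is not a claim about how CrewAI implements it.

```python
columns = ["Category", "Value1", "Value2", "Date"]

# Adjacent literals are concatenated; only the second one is an f-string
template = (
    "Analyze the user prompt: '{user_prompt}' and provide a plot specification. "
    f"The DataFrame `df` has columns: {columns}."
)

# The column list is already filled in, the prompt placeholder is not
print(template)

# Supplying the value later, as the crew inputs do at run time
filled = template.format(user_prompt="Plot Value1 by Category")
print(filled)
```

Had the first literal also been an f-string, Python would have raised a `NameError` for `user_prompt` when the task was defined.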

### Tool for the Plotting Agent (Code Execution)

The `PlottingAgent` needs to execute Python code to generate plots. CrewAI doesn't directly execute plotting code from its agents in a way that renders in a Jupyter notebook by default. We need to provide the agent with a tool that can execute Python code, specifically for plotting. 

However, for simplicity in this example, we will rely on the agent's LLM to generate the Python code for plotting, and the notebook will then execute that code itself with `exec` inside a helper function (defined in Section 5). In a more advanced setup, you could create a custom tool that the agent can use to execute plotting code directly and handle the display.

For this notebook, the `PlottingAgent`'s output will be the Python code to generate the plot. We will then take that code and run it.
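
To make the idea of a code-execution tool more concrete, here is a sketch of the plain function such a tool might wrap. It is not a CrewAI tool and does not use any CrewAI API; `run_generated_code` is a hypothetical name. It executes a code string in its own namespace and returns a status message that could be fed back to the agent. Keep in mind that `exec` runs arbitrary code, so this is only appropriate for trusted or sandboxed environments.

```python
import traceback


def run_generated_code(code, namespace=None):
    """Execute a code string and return (ok, message).

    `namespace` holds the names visible to the code, for example
    {'df': df, 'plt': plt, 'sns': sns} in this notebook.
    """
    env = dict(namespace or {})  # copy so the caller's dict is not modified
    try:
        exec(compile(code, "<generated>", "exec"), env)
    except Exception:
        # Report only the final traceback line, e.g. "NameError: name 'x' is not defined"
        return False, traceback.format_exc().strip().splitlines()[-1]
    return True, "Code executed successfully."


ok, message = run_generated_code("total = sum(values)", {"values": [1, 2, 3]})
print(ok, message)
```

Returning the error text instead of raising lets the caller decide whether to show it to the user or pass it back to the LLM for another attempt.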

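Let's adjust the `PlottingAgent` and its task to reflect that its output should be Python code for plotting. The new `plot_code_generator` agent and `plotting_code_task` below replace `plot_generator` and `plotting_task` from Sections 3 and 4, which the crew does not use.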

In [None]:
# Re-define the Plotting Agent to output Python code
plot_code_generator = Agent(
    role='Python Plotting Code Generator',
    goal=f"""Generate Python code using Matplotlib or Seaborn to create a plot based on the provided analysis. 
    The analysis will specify the plot type, X-axis, Y-axis, or data for distribution plots. 
    The available columns in the dataframe are: {list(df.columns)}. 
    You MUST provide *only* the Python code required to generate the plot. 
    The DataFrame is named `df`. Ensure the code includes `plt.show()`.
    For example, if asked for a bar chart of Value1 by Category, the output should be:
    ```python
    import matplotlib.pyplot as plt
    import seaborn as sns
    # df is assumed to be pre-loaded
    sns.barplot(x='Category', y='Value1', data=df)
    plt.title('Value1 by Category')
    plt.xlabel('Category')
    plt.ylabel('Value1')
    plt.show()
    ```""",
    backstory=f"""You are an expert in writing Python code for data visualizations using 
    Matplotlib and Seaborn. You take instructions specifying plot type and data columns 
    and generate clean, executable Python code. The data is available in a pandas DataFrame named `df` 
    which has the columns: {list(df.columns)}. You must use this `df` in your generated code.""",
    verbose=True,
    allow_delegation=False
)

# Re-define the Plotting Task to expect Python code as output
plotting_code_task = Task(
    description="Generate Python plotting code based on the analysis from the prompt_analyzer. "
                "Use the pandas DataFrame `df` (available globally) for plotting. "
                f"The DataFrame `df` has columns {list(df.columns)}.",
    expected_output="Complete, executable Python code as a single string to generate the plot using Matplotlib/Seaborn. "
                    "The code should include necessary imports like `matplotlib.pyplot as plt` and `seaborn as sns`, and use `plt.show()`.",
    agent=plot_code_generator,
    context=[analysis_task] # Depends on the output of the analysis_task
)

print("Plotting Agent and Task redefined to output Python code.")

## 5. Assemble and Run the Crew

Now we create the Crew, add our agents and tasks, and kick off the process.

In [None]:
# Assemble the crew with the updated plotting task
plot_crew = Crew(
    agents=[prompt_analyzer, plot_code_generator],
    tasks=[analysis_task, plotting_code_task],
    process=Process.sequential,
    verbose=True # Logs agent actions and tool usage (recent CrewAI versions expect a boolean here)
)

# Function to run the crew and execute the generated code
def generate_and_display_plot(user_prompt):
    print(f"\n--- Running Crew for prompt: '{user_prompt}' ---")
    inputs = {'user_prompt': user_prompt}
    result = plot_crew.kickoff(inputs=inputs)
    
    print("\n--- Crew Execution Finished ---")
    print("\nAnalysis Result (from analysis_task):")
    # kickoff() returns the output of the *last* task, so the intermediate
    # analysis has to be read from the first task's own output object.
    # The attribute name depends on the CrewAI version: newer releases expose
    # `.raw`, older ones `.raw_output`.
    analysis_output = plot_crew.tasks[0].output
    if analysis_output:
        print(getattr(analysis_output, 'raw', None)
              or getattr(analysis_output, 'raw_output', analysis_output))
    else:
        print("No direct output from analysis task available this way.")

    # Newer CrewAI versions return a CrewOutput object rather than a plain string,
    # so convert it to text before doing any string operations.
    result_text = str(result) if result else ""

    print("\nGenerated Plotting Code (from plotting_code_task):")
    print(result_text) # Output of the last task (plotting_code_task)
    
    print("\n--- Executing Generated Code --- ")
    if '```python' in result_text:
        # Extract code if it's in a markdown block
        code_to_execute = result_text.split('```python')[1].split('```')[0].strip()
    elif result_text.strip():
        code_to_execute = result_text.strip()
    else:
        print("No code generated.")
        return
    
    try:
        # Make df available to the executed code
        # plt and sns are imported within the notebook cell already, 
        # but the generated code might re-import them, which is fine.
        exec(code_to_execute, {'df': df, 'plt': plt, 'sns': sns})
    except Exception as e:
        print(f"Error executing generated code: {e}")

print("Crew defined. Ready to run.")
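
The extraction step in `generate_and_display_plot` only recognizes fences tagged exactly `python`. LLMs sometimes return an untagged fence or a different tag such as `py`, in which case the fence markers themselves would be passed to `exec`. A slightly more tolerant extractor could look like the sketch below; `extract_code` is a hypothetical helper, and it assumes the first fenced block in the response is the one to run.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built this way to avoid a literal fence here
FENCE_RE = re.compile(
    re.escape(FENCE) + r"[ \t]*[\w+-]*[ \t]*\r?\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_code(text):
    """Return the body of the first fenced block, or the stripped text if none."""
    match = FENCE_RE.search(text or "")
    if match:
        return match.group(1).strip()
    return (text or "").strip()


response = "Here is the code:\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nDone."
print(extract_code(response))  # print('hi')
```

The helper could replace the `split`-based branch in the function above without changing the rest of the flow.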

## 6. Run with Example Prompts

In [None]:
# Example 1: Bar chart
generate_and_display_plot("Show me a bar chart of Value1 for each Category")

In [None]:
# Example 2: Histogram
generate_and_display_plot("What is the distribution of Value2?")

In [None]:
# Example 3: Scatter plot
generate_and_display_plot("Plot Value1 against Value2 as a scatter plot")

In [None]:
# Example 4: Line chart (if meaningful with the data)
# This might not be the best plot for the current data, but let's see what the agent does.
generate_and_display_plot("Show the trend of Value1 over Date as a line chart")

## 7. Conclusion

This notebook demonstrated a basic CrewAI setup to interpret natural language prompts and generate Python code for data visualizations. 

**Key Learnings:**
*   **Agent Roles:** Defining clear roles and goals for agents is crucial.
*   **Task Definition:** Tasks should be specific and provide necessary context (like available data columns).
*   **Output Handling:** The output of one agent (analysis) can be used as input for another (plotting code generation).
*   **Code Execution:** For tasks like plotting, the LLM can generate code, which then needs to be executed. In this notebook, the `generate_and_display_plot` function did this by passing the generated code to `exec`. For more robust solutions, custom tools for code execution within CrewAI can be developed.
*   **LLM Dependence:** The quality of the generated code and analysis heavily depends on the capabilities of the underlying LLM.

**Further Enhancements:**
*   **Error Handling:** Implement more sophisticated error handling if the LLM fails to understand the prompt or generates incorrect code.
*   **Custom Tools:** Develop a custom CrewAI tool that allows the `PlottingAgent` to directly execute plotting code and save/display the figure, rather than just outputting the code string.
*   **More Complex Prompts:** Train or prompt the agents to handle more complex requests (e.g., multiple plots, subplots, specific styling).
*   **Interactive Data Source:** Allow users to specify a CSV file path instead of using a fixed DataFrame.
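
As one possible starting point for the error-handling enhancement listed above, the generated code can be inspected with Python's `ast` module before it is executed. The sketch below reports syntax errors and imports outside an allowlist. Both `check_generated_code` and the contents of `ALLOWED_IMPORTS` are assumptions made for illustration, and this check is not a sandbox: it does not catch calls such as `__import__` or other dynamic tricks.

```python
import ast

# Assumed allowlist of top-level packages that plotting code may import
ALLOWED_IMPORTS = {"matplotlib", "seaborn", "pandas", "numpy"}


def check_generated_code(code):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"Syntax error: {exc.msg} (line {exc.lineno})"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare only the top-level package, e.g. 'matplotlib' for 'matplotlib.pyplot'
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"Disallowed import: {module}")
    return problems


print(check_generated_code("import os\nos.remove('data.csv')"))
```

If the returned list is non-empty, the notebook could skip execution and show the problems to the user, or send them back to the agent as feedback for a retry.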