# 🤖 Agentic Data Analysis with LangChain and Gradio

This notebook documents the setup, code, and execution of a web application that uses a LangChain agent to perform data analysis on user-uploaded CSV files. The interface is built with Gradio, and the project environment is managed by `uv`.

## Step 1: Project Setup with `uv`

We use `uv`, a fast Python package manager from Astral, to handle our project's virtual environment and dependencies. This ensures a clean, reproducible setup.

### 1.1 - Install `uv`

If you don't have `uv` installed, run the following command in your terminal.

In [14]:
# On macOS / Linux
!curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows (in PowerShell)
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

'sh' is not recognized as an internal or external command,
operable program or batch file.


### 1.2 - Initialize Project and Install Dependencies

Create a project directory, initialize it, and add the required packages.

In [15]:
!mkdir agentic-data-app
!cd agentic-data-app
!uv init
!uv venv
!uv add langchain langchain-openai pandas matplotlib gradio ipykernel pip langchain-experimental

[1m[31merror[39m[0m: Project is already initialized in `[36mc:\Users\miqba\projects\Analyst_Agent_with_Langchain[39m` (`pyproject.toml` file exists)
Using CPython 3.12.3 interpreter at: [36mC:\Users\miqba\anaconda3\python.exe[39m
Creating virtual environment at: [36m.venv[39m
[31muv::venv::creation[0m

  [31m×[0m Failed to create virtualenv
[31m  ╰─▶ [0mfailed to remove directory
[31m      [0m`c:\Users\miqba\projects\Analyst_Agent_with_Langchain\.venv`: Access is
[31m      [0mdenied. (os error 5)


^C


[2mResolved [1m130 packages[0m [2min 4ms[0m[0m
[2mInstalled [1m122 packages[0m [2min 8.56s[0m[0m
 [32m+[39m [1maiofiles[0m[2m==24.1.0[0m
 [32m+[39m [1maiohappyeyeballs[0m[2m==2.6.1[0m
 [32m+[39m [1maiohttp[0m[2m==3.12.15[0m
 [32m+[39m [1maiosignal[0m[2m==1.4.0[0m
 [32m+[39m [1mannotated-types[0m[2m==0.7.0[0m
 [32m+[39m [1manyio[0m[2m==4.10.0[0m
 [32m+[39m [1masttokens[0m[2m==3.0.0[0m
 [32m+[39m [1mattrs[0m[2m==25.3.0[0m
 [32m+[39m [1mbrotli[0m[2m==1.1.0[0m
 [32m+[39m [1mcertifi[0m[2m==2025.8.3[0m
 [32m+[39m [1mcharset-normalizer[0m[2m==3.4.2[0m
 [32m+[39m [1mclick[0m[2m==8.2.1[0m
 [32m+[39m [1mcolorama[0m[2m==0.4.6[0m
 [32m+[39m [1mcomm[0m[2m==0.2.3[0m
 [32m+[39m [1mcontourpy[0m[2m==1.3.3[0m
 [32m+[39m [1mcycler[0m[2m==0.12.1[0m
 [32m+[39m [1mdataclasses-json[0m[2m==0.6.7[0m
 [32m+[39m [1mdebugpy[0m[2m==1.8.15[0m
 [32m+[39m [1mdecorator[0m[2m==5.2.1[0m
 [32m+[

## Step 2: The Application Code (`app.py`)

### 2.1 - Imports and Configuration

First, we import all the necessary libraries. We need `gradio` for the UI, `pandas` for data handling, `os` for file operations, and `langchain` components to build and run our agent. We also define a constant for the filename of our plots to keep the code clean.

In [None]:
import gradio as gr
import pandas as pd
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI  # Changed from OpenAI to ChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

# --- Load Environment Variables ---
# This line reads the .env file and loads the variables into the environment
load_dotenv()
# --- Configuration ---
# Define a constant for the plot filename to ensure consistency.
PLOT_FILENAME = "temp_analysis_plot.png"

### 2.2 - The Agent Logic Function

Next, we define the main function, `data_analyst_agent`. This function will be called by Gradio every time the user clicks the "Run Analysis" button. It takes the uploaded file and the user's text prompt as input.

The first part of the function handles input validation and cleans up any old plot files.

In [None]:
def data_analyst_agent(file_obj, user_prompt):
    """
    This is the core function that orchestrates the agent's work.
    """
    # 1. Input Validation
    if file_obj is None:
        return "Error: Please upload a CSV file first.", None
    if not user_prompt:
        return "Error: Please enter a question or instruction.", None

    # 2. Cleanup: Remove any old plot file to prevent showing stale results.
    if os.path.exists(PLOT_FILENAME):
        os.remove(PLOT_FILENAME)

### 2.3 - Create and Execute the Agent

This is the core of the function. We create an instance of the `create_pandas_dataframe_agent`, which is specifically designed to work with data in a pandas DataFrame.

We then create a detailed prompt that includes the user's original question plus specific instructions for the agent, telling it to save any plots it makes. This makes the agent much more reliable. Finally, we `invoke` the agent to run the analysis and handle any errors that might occur.

In [None]:
try:
    # Load Data: The file_obj from Gradio has a .name attribute
    # which holds the temporary path to the uploaded file.
    df = pd.read_csv(file_obj.name)

    # Initialize LLM: Use a chat model that supports tool calling.
    llm = ChatOpenAI(
        api_key=api_key, 
        temperature=0, 
        model="gpt-3.5-turbo" # Specify a tool-calling model
    )

    # Create Agent: This is the heart of the operation.
    # `create_pandas_dataframe_agent` equips the LLM with tools
    # to execute pandas operations on the DataFrame.
    agent_executor = create_pandas_dataframe_agent(
        llm,
        df,
        agent_type="openai-tools",
        verbose=True,  # Set to True to see the agent's thought process in the terminal
        allow_dangerous_code=True # Opt-in to allow the agent to execute Python code
    )

    # Craft a Detailed Prompt: We augment the user's prompt to give the
    # agent explicit instructions on how to handle visualizations.
    # This makes the agent much more reliable.
    full_prompt = f"""
    User Question: {user_prompt}

    Instructions for the agent:
    - First, analyze the provided data to answer the user's question.
    - If you generate a plot or visualization, you MUST save it as a file named '{PLOT_FILENAME}'.
    - In your final answer, you must explicitly describe the visualization you created (e.g., "I have created a bar chart that shows the total sales for each product category.").
    - Also, mention that the plot has been saved.
    - Use Markdown formatting for all text in your output. This includes headings, bullet points, code blocks (if any), and emphasis for clarity.
    """

    # Run the Agent: Invoke the agent with the detailed prompt.
    response = agent_executor.invoke({"input": full_prompt})
    
    # Extract the text output from the agent's response.
    text_output = response.get('output', "I couldn't generate a text response. Please check the logs.")

    # Check for and Return the Plot: After the agent runs, check if the
    # plot file was created. If so, return it alongside the text answer.
    if os.path.exists(PLOT_FILENAME):
        return text_output, PLOT_FILENAME
    else:
        # If no plot was created, return None for the image output.
        return text_output, None


except Exception as e:
    
    error_message = f"An unexpected error occurred: {str(e)}"
    print(error_message) 
    return error_message, None

### 2.4 - Building the Gradio UI

Finally, we create the web interface. We use `gr.Blocks` for a custom layout. We define the input components (`gr.File`, `gr.Textbox`) and the output components (`gr.Markdown`, `gr.Image`).

The most important line is `submit_button.click(...)`. This tells Gradio that when the button is clicked, it should call our `data_analyst_agent` function, passing the content from the input components to it and sending the function's results to the output components.

In [None]:
with gr.Blocks(theme=gr.themes.Soft(primary_hue="blue")) as demo:
    gr.Markdown(
        """
        # 🤖 Agentic Data Analysis with LangChain
        Upload your CSV file, ask a question in natural language, and the AI agent will work to find the answer.
        It can perform calculations, data manipulation, and even generate visualizations.
        """
    )
    
    with gr.Row():
        with gr.Column(scale=1):
            # Input components
            file_input = gr.File(label="Upload your CSV", file_types=[".csv"])
            text_input = gr.Textbox(
                label="What would you like to know?",
                placeholder="e.g., 'What is the correlation between column A and B?' or 'Create a bar chart of sales by category.'"
            )
            submit_button = gr.Button("🚀 Run Analysis", variant="primary")
        
        with gr.Column(scale=2):
            # Output components
            text_output = gr.Markdown(label="📝 Agent's Answer")
            plot_output = gr.Image(label="📊 Generated Visualization", type="filepath")

    # Connect the button to the agent function
    submit_button.click(
        fn=data_analyst_agent,
        inputs=[file_input, text_input],
        outputs=[text_output, plot_output]
    )
    
# To run this app, save it as app.py and run 'uv run app.py' in your terminal.
# if __name__ == "__main__":
#     demo.launch()

## Step 3: Running the Application

To run the app, use `uv` to execute the script from your terminal.

In [None]:
!uv run app.py

[1m[31merror[39m[0m: Failed to spawn: `app.py`
  [1m[31mCaused by[39m[0m: program not found
