# Intelligent EDA Agent Tutorial
This notebook demonstrates how to use the Intelligent EDA Agent for automated exploratory data analysis with interactive visualizations and Streamlit dashboard integration.

## Features Covered
- Basic EDA Agent usage
- Interactive visualizations with Plotly
- Streamlit dashboard integration
- Custom analysis queries
- Report generation and saving

## Setup and Installation
First, let's install the required packages. Make sure you have Python 3.8+ installed.

In [None]:
# Install required packages (kaleido is needed by fig.write_image() for static PNG export)
%pip install pandas numpy plotly kaleido streamlit langchain-core langchain-openai langchain-ollama
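After installing, it can help to confirm that everything is importable before continuing. The sketch below is a small standard-library check; `REQUIRED_MODULES` and `missing_modules` are illustrative names, not part of the EDA Agent. Note that pip package names with dashes (for example `langchain-core`) are imported with underscores (`langchain_core`).

```python
import importlib.util

# Import names for the packages installed above (dashes become underscores).
REQUIRED_MODULES = [
    "pandas", "numpy", "plotly", "streamlit",
    "langchain_core", "langchain_openai", "langchain_ollama",
]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is listed as missing, restart the kernel after re-running the install cell so the new packages are picked up.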

## Load Example Dataset
Let's load the Titanic dataset and initialize our EDA Agent.

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from core.EDA.eda_agent import EDAAgent

In [None]:
# Load the Titanic dataset
df = pd.read_csv('src/titanic.csv')
print(f"Dataset loaded: {df.shape[0]} rows × {df.shape[1]} columns")

# Initialize the EDA Agent
agent = EDAAgent(df, use_openai=False)  # Set to True if you want to use OpenAI instead of Ollama
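The plots and queries later in this notebook assume the standard Titanic columns (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`). A quick schema check before handing the DataFrame to the agent can catch a wrong or truncated CSV early. This is a minimal sketch: `EXPECTED_COLUMNS` and `check_columns` are hypothetical helpers, and the toy frame is made up purely to show the output.

```python
import pandas as pd

# Columns that the plots and example queries in this notebook rely on.
EXPECTED_COLUMNS = {"Survived", "Pclass", "Sex", "Age", "Fare"}

def check_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names missing from `frame`, sorted alphabetically."""
    return sorted(set(expected) - set(frame.columns))

# Made-up frame that lacks two of the expected columns.
toy = pd.DataFrame({"Survived": [0, 1], "Pclass": [3, 1], "Age": [22.0, 38.0]})
print(check_columns(toy))  # → ['Fare', 'Sex']
```

In the notebook you would call `check_columns(df)` right after `pd.read_csv` and stop (for example with a `ValueError`) if the list is not empty.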

## Basic EDA Agent Usage
Let's start with some basic exploratory data analysis using the agent.

In [None]:
# Get dataset overview
overview = agent.chat("Give me a dataset overview")
print(overview)

# Check data quality
quality = agent.chat("Check data quality and tell me about any issues")
print(quality)

# Get automated insights
insights = agent.chat("Give me automated insights about the dataset")
print(insights)
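Because the agent's answers come from an LLM, it is worth cross-checking them against plain pandas. The sketch below computes the kind of per-column facts a data-quality check usually reports (dtype, missing values, distinct values, duplicate rows). It is not the EDA Agent's implementation; `quality_summary` is a hypothetical helper and the toy data is invented for illustration.

```python
import numpy as np
import pandas as pd

def quality_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, missing percentage and distinct values."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(1),
        "unique": frame.nunique(dropna=True),
    })
    # Columns with the most missing values first.
    return summary.sort_values("missing", ascending=False)

# Small invented frame with gaps in Age and Cabin.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 53.1],
})
print(quality_summary(toy))
print("Duplicate rows:", toy.duplicated().sum())
```

Running `quality_summary(df)` on the Titanic data gives concrete numbers to compare with what the agent reports.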

## Interactive Visualization with Plotly
Next, let's look at some of the patterns the agent points out with interactive Plotly charts built directly from the DataFrame.

In [None]:
# Create survival rate by class visualization
fig = px.bar(df.groupby('Pclass')['Survived'].mean().reset_index(), 
             x='Pclass', y='Survived', 
             title='Survival Rate by Passenger Class',
             labels={'Survived': 'Survival Rate', 'Pclass': 'Passenger Class'})
fig.show()

# Age distribution by survival status
fig = px.histogram(df, x='Age', color='Survived',
                   nbins=30, opacity=0.7, barmode='overlay',
                   title='Age Distribution by Survival Status',
                   labels={'Survived': 'Survived (1 = yes)'})
fig.show()

# Correlation heatmap for numeric columns
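numeric_cols = df.select_dtypes(include=[np.number]).columns
corr = df[numeric_cols].corr()
fig = go.Figure(data=go.Heatmap(
    z=corr.values,
    x=list(numeric_cols),
    y=list(numeric_cols),
    colorscale='RdBu',
    zmin=-1, zmax=1))  # fix the color range to [-1, 1] so 0 sits mid-scale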
fig.update_layout(title='Correlation Heatmap')
fig.show()
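The survival-rate bar chart relies on a small trick: `Survived` is coded 0/1, so the mean of that column within each group is exactly the fraction of survivors. The sketch below shows this on a made-up toy frame and checks it against an explicit survivors-over-passengers calculation.

```python
import pandas as pd

# Invented toy data: Survived is coded 0 = died, 1 = survived.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Mean of a 0/1 column = share of 1s = survival rate per class.
rates = toy.groupby("Pclass")["Survived"].mean()
print(rates)  # class 1: 2/3, class 2: 1/2, class 3: 1/4

# The same numbers spelled out as survivors / passengers.
counts = toy.groupby("Pclass")["Survived"].agg(["sum", "count"])
print(counts["sum"] / counts["count"])
```

This only works because of the 0/1 coding; a column coded as strings (for example "yes"/"no") would need to be mapped to 0/1 first.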

## Streamlit Dashboard Integration
Here's how to build a Streamlit dashboard around our EDA Agent. Running the cell below writes the code to `src/dashboard.py` via the `%%writefile` magic:

In [None]:
%%writefile src/dashboard.py
import streamlit as st
import pandas as pd
import plotly.express as px
from core.EDA.eda_agent import EDAAgent

def main():
    st.set_page_config(page_title="EDA Agent Dashboard", layout="wide")
    st.title("🔍 Intelligent EDA Dashboard")

    # File upload
    uploaded_file = st.file_uploader("Choose a CSV file", type="csv")
    if uploaded_file is not None:
        df = pd.read_csv(uploaded_file)
        agent = EDAAgent(df)

        # Sidebar controls
        st.sidebar.title("Analysis Controls")
        analysis_type = st.sidebar.selectbox(
            "Choose Analysis",
            ["Overview", "Data Quality", "Column Analysis", "Visualizations", "Custom Query"]
        )

        if analysis_type == "Overview":
            st.header("Dataset Overview")
            overview = agent.chat("Give me a dataset overview")
            st.write(overview)

        elif analysis_type == "Data Quality":
            st.header("Data Quality Report")
            quality = agent.chat("Check data quality")
            st.write(quality)

        elif analysis_type == "Column Analysis":
            st.header("Column Analysis")
            column = st.selectbox("Select Column", df.columns)
            analysis = agent.chat(f"Analyze the {column} column")
            st.write(analysis)

        elif analysis_type == "Visualizations":
            st.header("Interactive Visualizations")
            viz_type = st.selectbox(
                "Select Visualization",
                ["Distribution", "Correlation"]
            )

            if viz_type == "Distribution":
                col = st.selectbox("Select Column", df.select_dtypes(include=['number']).columns)
                fig = px.histogram(df, x=col, title=f'Distribution of {col}')
                st.plotly_chart(fig)

            elif viz_type == "Correlation":
                numeric_cols = df.select_dtypes(include=['number']).columns
                fig = px.imshow(df[numeric_cols].corr(),
                              title='Correlation Heatmap')
                st.plotly_chart(fig)

        else:
            st.header("Custom Query")
            query = st.text_area("Enter your analysis question:")
            if st.button("Analyze"):
                if query.strip():
                    response = agent.chat(query)
                    st.write(response)
                else:
                    st.warning("Please enter a question first.")

        # Save report button
        if st.sidebar.button("Generate Report"):
            report = agent.generate_automatic_eda()
            st.sidebar.download_button(
                "Download Report",
                report,
                file_name="eda_report.md",
                mime="text/markdown"
            )

if __name__ == "__main__":
    main()

To run the dashboard:
```bash
streamlit run src/dashboard.py
```
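The dashboard's branching boils down to "sidebar choice → prompt sent to `agent.chat`". One way to make that logic testable without launching Streamlit is to pull it out into plain Python. The sketch below is a hypothetical refactor: `PROMPTS` and `build_prompt` do not exist in the original dashboard, although the prompt strings are the ones `dashboard.py` already sends.

```python
from typing import Optional

# Hypothetical mapping from sidebar choices to agent prompts.
PROMPTS = {
    "Overview": "Give me a dataset overview",
    "Data Quality": "Check data quality",
    "Column Analysis": "Analyze the {column} column",
}

def build_prompt(analysis_type: str, column: Optional[str] = None,
                 query: str = "") -> Optional[str]:
    """Return the prompt for a sidebar choice, or None if nothing should be sent."""
    if analysis_type == "Custom Query":
        return query.strip() or None  # ignore empty or whitespace-only input
    template = PROMPTS.get(analysis_type)
    if template is None:
        return None  # e.g. "Visualizations" is drawn with Plotly, not the agent
    if "{column}" in template:
        if column is None:
            raise ValueError(f"{analysis_type!r} needs a column name")
        return template.format(column=column)
    return template

print(build_prompt("Column Analysis", column="Age"))  # → Analyze the Age column
```

Inside `main()`, the branches could then share one code path: `prompt = build_prompt(analysis_type, column=..., query=...)`, followed by `if prompt: st.write(agent.chat(prompt))`.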

## Custom Queries and Analysis
Let's look at some example custom queries we can ask our EDA Agent.

In [None]:
# Example custom queries
queries = [
    "What's the survival rate by gender?",
    "Show me the correlation between age and fare",
    "Which features have the most missing values?",
    "What's the age distribution by passenger class?"
]

for query in queries:
    print(f"\nQuery: {query}")
    print("-" * 50)
    response = agent.chat(query)
    print(response)
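Printing responses is fine for exploration, but it can be handy to keep them alongside the generated report. The sketch below collects query/response pairs and formats them as Markdown. `run_queries`, `to_markdown` and `fake_chat` are hypothetical helpers; `fake_chat` only stands in for `agent.chat` so the example runs without an LLM.

```python
from typing import Callable, Dict, List

def run_queries(chat: Callable[[str], str], queries: List[str]) -> Dict[str, str]:
    """Send each query to `chat` and keep the responses in query order."""
    return {query: str(chat(query)) for query in queries}

def to_markdown(results: Dict[str, str], title: str = "Custom EDA queries") -> str:
    """Format query/response pairs as a small Markdown document."""
    lines = [f"# {title}", ""]
    for query, response in results.items():
        lines += [f"## {query}", "", response, ""]
    return "\n".join(lines)

# Stand-in for agent.chat so the sketch runs without a model.
def fake_chat(query: str) -> str:
    return f"(answer to: {query})"

results = run_queries(fake_chat, ["What's the survival rate by gender?"])
print(to_markdown(results))
```

With the real agent this would be `run_queries(agent.chat, queries)`, and the Markdown string could be written next to the EDA report (the file name is up to you).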

## Saving Reports and Visualizations
Let's generate and save a comprehensive EDA report, along with some visualizations.

In [None]:
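import os

# Create the output folders, then generate and save the EDA report
os.makedirs("reports/visualizations", exist_ok=True)
report = agent.generate_automatic_eda(save_path="reports/titanic_eda_report.md")

# Save a figure as interactive HTML and as a static PNG
# (fig.write_image requires the kaleido package)
def save_visualization(fig, filename):
    fig.write_html(f"reports/visualizations/{filename}.html")
    fig.write_image(f"reports/visualizations/{filename}.png")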

# Create and save survival by class visualization
fig = px.bar(df.groupby('Pclass')['Survived'].mean().reset_index(), 
             x='Pclass', y='Survived',
             title='Survival Rate by Passenger Class')
save_visualization(fig, "survival_by_class")

# Create and save correlation heatmap
numeric_cols = df.select_dtypes(include=[np.number]).columns
fig = px.imshow(df[numeric_cols].corr(),
                title='Correlation Heatmap')
save_visualization(fig, "correlation_heatmap")
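Static PNG export goes through kaleido, so `fig.write_image` fails in environments where it isn't installed. A more forgiving variant of `save_visualization` always writes the HTML file and only writes the PNG when kaleido can be found. This is a sketch, not part of the EDA Agent. It works with any object that has Plotly's `write_html`/`write_image` methods, and the `out_dir` parameter is an addition for illustration.

```python
import importlib.util
from pathlib import Path

def save_visualization(fig, filename, out_dir="reports/visualizations"):
    """Write `fig` as HTML, plus PNG when kaleido is installed; return the paths written."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if needed

    html_path = directory / f"{filename}.html"
    fig.write_html(html_path)
    written = [html_path]

    if importlib.util.find_spec("kaleido") is not None:
        png_path = directory / f"{filename}.png"
        fig.write_image(png_path)
        written.append(png_path)
    else:
        print(f"kaleido not installed; skipped {filename}.png")
    return written
```

It accepts the same `save_visualization(fig, "survival_by_class")` call as before, and the returned list tells you which files were actually created.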