# Agent to EDA assistant

In this second part, we extend the agent with abilities (tools) that let it inspect a dataset.

This notebook will guide you through the steps, but some parts are left as exercises for you to complete.

> REMEMBER: The notebook is only for experimenting; the end result must be in `chat_app.py` so that you can chat with the agent interactively.


In [None]:
# We need to add some necessary dependencies

!uv add datasets pandas sqlalchemy

## Loading the dataset

The Titanic dataset is a well-known standard dataset describing the passengers of the Titanic. It is primarily used to explore which passenger attributes are related to the survival rate.

To load the Titanic dataset, we use the `load_dataset` function from the `datasets` library. This function allows us to easily access and convert the dataset into a pandas DataFrame for further analysis. The Titanic dataset is available at [Hugging Face Datasets](https://huggingface.co/datasets/mstz/titanic).


In [None]:
import pandas as pd
from datasets import load_dataset

dataset = load_dataset("mstz/titanic")["train"]
titanic_df: pd.DataFrame = dataset.to_pandas()

## Using LangChain's SQL Database Integration

In this section, we use LangChain's SQL Database integration to query and analyze structured data. The integration connects to a SQL database, executes queries, and returns the results for further processing.

The integration allows us to:

- Connect to various SQL databases using supported drivers.
- Perform complex queries to extract insights from the data.
- Combine SQL capabilities with LangChain's tools for advanced data manipulation and analysis.

For more details, refer to the [LangChain SQL Database Integration Documentation](https://python.langchain.com/docs/integrations/tools/sql_database/).


In [None]:
# This requires some glue code to load the pandas DataFrame into the in-memory SQLite database.
titanic_df.to_sql(...)
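
One possible shape for this glue code is sketched below. It is not the reference solution for the exercise: the table name `titanic`, the `StaticPool` setup, and the three stand-in rows (with made-up column names) are assumptions chosen so the sketch runs on its own. In the notebook you would pass `titanic_df` instead of `sample_df`.

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool

# Stand-in rows so the sketch runs without downloading anything.
# The column names are hypothetical; check titanic_df.columns for the real ones.
sample_df = pd.DataFrame(
    {
        "age": [22.0, 38.0, None],
        "sex": ["male", "female", "female"],
        "survived": [0, 1, 1],
    }
)

# An in-memory SQLite database lives only as long as its connection.
# StaticPool keeps one shared connection, so the table stays visible
# to every later query made through this engine.
engine = create_engine(
    "sqlite:///:memory:",
    poolclass=StaticPool,
    connect_args={"check_same_thread": False},
)

# Write the DataFrame to a table; "titanic" is an assumed table name.
sample_df.to_sql("titanic", engine, if_exists="replace", index=False)

# Sanity check: read the row count back through the same engine.
with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM titanic")).scalar_one()

print(row_count)
```

The resulting engine should then be usable with LangChain, for example by wrapping it as `SQLDatabase(engine)` with `SQLDatabase` imported from `langchain_community.utilities`; that step needs the `langchain-community` package and is left out of the sketch.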

## Make the tools available to the ReAct agent

- Ensure the agent can query the dataset using SQL commands.
- Test the agent's ability to summarize the dataset.
- Verify the agent can calculate statistics like mean, median, and mode.
- Check if the agent can handle missing data gracefully.
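
To make the first and last points concrete, here is a rough sketch of what a SQL query tool could look like as a plain Python function. Everything in it is an assumption for illustration: the function name `run_sql_query`, the table and column names, and the made-up rows. It uses the standard-library `sqlite3` module so that it runs on its own; in the notebook the query would go against the database created from `titanic_df`, and the function would still need to be registered with the agent from part one.

```python
import sqlite3

# Stand-in database with made-up rows and hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (age REAL, sex TEXT, survived INTEGER)")
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [(22.0, "male", 0), (38.0, "female", 1), (None, "female", 1)],
)
conn.commit()


def run_sql_query(query: str) -> str:
    """Run a read-only SQL query on the titanic table and return the rows as text."""
    # Crude guard: only accept statements that start with SELECT,
    # so the agent cannot modify or drop the data.
    if not query.strip().lower().startswith("select"):
        return "Error: only SELECT queries are allowed."
    try:
        cursor = conn.execute(query)
        rows = cursor.fetchall()
    except sqlite3.Error as exc:
        # Return the error as text so the agent can read it and try again.
        return f"Error: {exc}"
    columns = [col[0] for col in cursor.description]
    lines = [", ".join(columns)]
    lines += [", ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


survivors = run_sql_query("SELECT COUNT(*) AS survivors FROM titanic WHERE survived = 1")
average_age = run_sql_query("SELECT AVG(age) AS average_age FROM titanic")
bad_query = run_sql_query("SELECT * FROM missing_table")

print(survivors)
print(average_age)
print(bad_query)
```

Returning errors as strings instead of raising gives a ReAct-style agent something to observe and correct in its next step. Note that SQL's `AVG` skips `NULL` values, which is one way missing ages end up being handled without any extra code.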

### Example Questions to Ask:

- "How many passengers survived the Titanic disaster?"
- "What is the average age of the passengers?"
- "List all passengers who embarked from Southampton."
- "What is the survival rate for male and female passengers?"
- "Show the top 5 oldest passengers and their survival status."
- Experiment with filtering data based on specific conditions.
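
When testing the agent with questions like these, it helps to have reference answers to compare against. The sketch below computes a few of them directly with pandas. The rows are made up and the column names (`age`, `sex`, `survived`) are hypothetical, so adapt them to the actual columns of `titanic_df` before using this as a check.

```python
import pandas as pd

# Made-up rows with hypothetical column names; use titanic_df in the notebook.
sample_df = pd.DataFrame(
    {
        "age": [22.0, 38.0, None, 54.0],
        "sex": ["male", "female", "female", "male"],
        "survived": [0, 1, 1, 0],
    }
)

# "How many passengers survived the Titanic disaster?"
survivor_count = int(sample_df["survived"].sum())

# "What is the average age of the passengers?"
# pandas skips missing ages by default when computing the mean.
average_age = sample_df["age"].mean()

# "What is the survival rate for male and female passengers?"
survival_rate_by_sex = sample_df.groupby("sex")["survived"].mean()

# "Show the top 2 oldest passengers and their survival status."
oldest = sample_df.nlargest(2, "age")[["age", "survived"]]

print(survivor_count)
print(average_age)
print(survival_rate_by_sex)
print(oldest)
```

If the agent's answer differs from the pandas result, inspecting the SQL it generated is usually the quickest way to find out why.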
