# Semantic Layer on CSV

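This notebook shows how to create a semantic layer on top of a CSV file.
The semantic layer acts as a bridge between the raw data and the natural language layer: it records what each column means so that questions asked in plain English can be mapped to the right fields.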

### Why use a Semantic Layer?
- Adds context and meaning to data columns
- Makes it easier for the large language model to interpret the data correctly
- Is defined once and reused across multiple sessions

## Import PandasAI

In [None]:
import pandasai as pai

## Read raw data

For this example, we will use a small dataset of heart disease patients from [Kaggle](https://www.kaggle.com/datasets/arezaei81/heartcsv).

In [None]:
# Load the heart disease dataset
df = pai.read_csv("./dataheart.csv")

# Display the first few rows
df.head()
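
Before describing the columns, it can help to confirm that the names you plan to document match the CSV header exactly, since a typo such as `sex` instead of `Sex` is easy to miss. The following is a minimal sketch using only the standard library; `read_header` and `compare_columns` are hypothetical helper names introduced here for illustration and are not part of PandasAI.

```python
import csv


def read_header(csv_path):
    """Return the column names found in the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # next() with a default avoids StopIteration on an empty file
        return next(csv.reader(handle), [])


def compare_columns(header, described):
    """Report names that appear in the header or in the descriptions, but not both."""
    header_set, described_set = set(header), set(described)
    return {
        # Present in the CSV but not documented in the semantic layer
        "undescribed": sorted(header_set - described_set),
        # Documented in the semantic layer but absent from the CSV
        "unknown": sorted(described_set - header_set),
    }
```

For example, `compare_columns(read_header("./dataheart.csv"), ["Age", "Sex"])` would list every header column other than `Age` and `Sex` under `"undescribed"`, assuming the file exists at that path.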

## Create the Semantic Layer

Requirements for the semantic layer:
- `path`: Must be in format 'organization/dataset'
- `name`: A descriptive name for the dataset
- `description`: Brief overview of the dataset
- `columns`: List of dictionaries with format:
  ```python
  {
      "name": "column_name",
      "type": "column_type",  # string, number, date, datetime
      "description": "column_description"
  }
  ```
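
The requirements above can be checked before saving. The sketch below is an illustrative pre-flight check, not PandasAI's own validation: `validate_schema` is a hypothetical helper, the allowed types are taken from the comment in the format above, and the path pattern only enforces the two-part 'organization/dataset' shape (the library may apply stricter rules).

```python
import re

# Assumption: the four types named in the column format above
ALLOWED_TYPES = {"string", "number", "date", "datetime"}
REQUIRED_KEYS = {"name", "type", "description"}
# Assumption: exactly two non-empty segments separated by a single slash
PATH_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def validate_schema(path, columns):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not PATH_PATTERN.match(path):
        problems.append(f"path {path!r} is not in 'organization/dataset' format")

    seen_names = set()
    for index, column in enumerate(columns):
        missing = REQUIRED_KEYS - column.keys()
        if missing:
            problems.append(f"column {index} is missing keys: {sorted(missing)}")
            continue  # Cannot check type or name without the required keys
        if column["type"] not in ALLOWED_TYPES:
            problems.append(f"column {column['name']!r} has unknown type {column['type']!r}")
        if column["name"] in seen_names:
            problems.append(f"column {column['name']!r} is defined more than once")
        seen_names.add(column["name"])
    return problems
```

Calling `validate_schema("organization/heart", columns)` with the list you intend to pass to `df.save()` returns an empty list when every entry has the three required keys, a listed type, and a unique name.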

In [None]:
# Save the dataframe together with its semantic layer
df.save(
    path="organization/heart",
    name="Heart",
    description="Heart Disease Dataset",
    columns=[
        {
            "name": "Age",
            "type": "number",
            "description": "Age of the patient in years"
        },
        {
            "name": "Sex",
            "type": "string",
            "description": "Gender of the patient (M: Male, F: Female)"
        },
        {
            "name": "ChestPainType",
            "type": "string",
            "description": "Type of chest pain (ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic, TA: Typical Angina)"
        },
        {
            "name": "RestingBP",
            "type": "number",
            "description": "Resting blood pressure in mm Hg"
        },
        {
            "name": "Cholesterol",
            "type": "number",
            "description": "Serum cholesterol in mg/dl"
        },
        {
            "name": "FastingBS",
            "type": "number",
            "description": "Fasting blood sugar (1: if FastingBS > 120 mg/dl, 0: otherwise)"
        },
        {
            "name": "RestingECG",
            "type": "string",
            "description": "Resting electrocardiogram results (Normal, ST: having ST-T wave abnormality, LVH: showing probable or definite left ventricular hypertrophy)"
        },
        {
            "name": "MaxHR",
            "type": "number",
            "description": "Maximum heart rate achieved"
        },
        {
            "name": "ExerciseAngina",
            "type": "string",
            "description": "Exercise-induced angina (Y: Yes, N: No)"
        },
        {
            "name": "Oldpeak",
            "type": "number",
            "description": "ST depression induced by exercise relative to rest"
        },
        {
            "name": "ST_Slope",
            "type": "string",
            "description": "Slope of the peak exercise ST segment (Up, Flat, Down)"
        },
        {
            "name": "HeartDisease",
            "type": "number",
            "description": "Target variable - Heart disease presence (1: heart disease, 0: normal)"
        }
    ])

## Load Semantic Dataframe

Once you have saved the dataframe with its semantic layer, you can load it in any session with `pai.load()`, passing the same `path` you used when saving. This allows you to:
- Maintain data context across sessions
- Ask questions about your data in natural language
- Generate more accurate analysis and visualizations

In [None]:
# Load the semantically enhanced dataset
df = pai.load("organization/heart")

## Chat with your dataframe

You can now ask questions about your data in natural language using the dataframe's `chat()` method. PandaAI can be used with several LLMs; this example uses BambooLLM. Get a free API key by signing up at [pandabi.ai](https://pandabi.ai), which gives you access to the data platform as well as BambooLLM credits.

In [None]:
# Replace the placeholder with your own API key
pai.api_key.set("your api key")

response = df.chat("What is the correlation between age and cholesterol?")

print(response)
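
Because the answer comes from a language model, it is worth cross-checking a numeric result such as a correlation against a direct computation. The sketch below implements the Pearson correlation coefficient in plain Python; `pearson` is a hypothetical helper written for this illustration, and it assumes the question refers to Pearson (linear) correlation rather than a rank-based measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Compute the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(var_x * var_y)
```

If the CSV is also loaded into a plain pandas DataFrame, `Series.corr` gives the same quantity, for example `raw["Age"].corr(raw["Cholesterol"])`, where `raw` is an assumed variable name. A large gap between that value and the figure in `response` suggests the generated analysis should be inspected.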