This notebook analyses the [Yelp Dataset](https://www.yelp.com/dataset) using an LLM agent as a copilot. First we'll reorganize the data and load it into a pandas dataframe, a format that makes it easier to instruct the LLM how to analyze it. Alternatively, we could try to get the LLM agent to restructure the data for us. Feel free to give this a shot!

In [None]:
import json
import pandas as pd
from typing import Dict, List


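def read_json_data(filepath: str) -> List[Dict]:
    # The file holds one JSON object per line, so parse it line by line.
    data = []
    print('loading json')
    with open(filepath, 'r', encoding='utf-8') as jsonfile:
        # Iterate over the file directly instead of reading every line into memory first.
        for line in jsonfile:
            data.append(json.loads(line))
    return data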

def extract_non_nested_fields(data: List[Dict]) -> List[Dict]:
    # Use the first record to decide which fields are flat (not a dict);
    # 'hours' is always dropped.
    record = data[0]
    columns = {c for c in record if not isinstance(record[c], dict) and c != 'hours'}
    print('reformatting json')
    return [
        {key: record[key] for key in record if key in columns}
        for record in data
    ]

def load_json_to_df(filepath: str):
    data = read_json_data(filepath)
    data = extract_non_nested_fields(data)
    print('loading data into dataframe')
    return pd.DataFrame(data)
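
To make the flattening step concrete, here is a small self-contained sketch that applies the same rule as `extract_non_nested_fields` to two made-up records. The field values are invented for illustration, and the variable names `sample`, `flat_columns`, and `sample_df` are not part of the notebook.

```python
import pandas as pd

# Two made-up records shaped like business entries (values are illustrative only).
sample = [
    {"business_id": "a1", "name": "Cafe One", "state": "TX", "review_count": 12,
     "attributes": {"WiFi": "free"}, "hours": {"Monday": "8:0-17:0"}},
    {"business_id": "b2", "name": "Diner Two", "state": "PA", "review_count": 40,
     "attributes": {"WiFi": "no"}, "hours": None},
]

# Same rule as extract_non_nested_fields: keep keys whose value in the FIRST
# record is not a dict, and always drop 'hours'.
first = sample[0]
flat_columns = [k for k in first if not isinstance(first[k], dict) and k != "hours"]
flat_records = [{k: rec[k] for k in flat_columns} for rec in sample]

sample_df = pd.DataFrame(flat_records)
print(sample_df.columns.tolist())
# -> ['business_id', 'name', 'state', 'review_count']
```

Because only the first record is inspected, a field that happens to be non-dict there (for example `None`) would be kept even if later records store a dict in it. pandas can also read line-delimited JSON directly with `pd.read_json(filepath, lines=True)`, but that route keeps nested fields as dict-valued columns rather than dropping them.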
        

In [None]:
filepath = 'data/yelp_dataset/yelp_academic_dataset_business.json'

In [None]:
business_df = load_json_to_df(filepath)

Let's take a look at the dataframe's content.

In [None]:
business_df.head()

We will use a custom `pythonCodeExecution` tool to let our LangChain agent operate on the pandas dataframes in our Python environment and store the results. Every dataframe used in the analysis must first be registered with `chat_agent` by calling `chat_agent.add_agent_input`, which takes the variable, a variable name, and a variable description as arguments.
For example:
```
import chat_agent

chat_agent.add_agent_input(
    business_df,
    'business_df',
    'this is a dataframe containing information about businesses'
)
```

In [None]:
import chat_agent

In [None]:
chat_agent.add_agent_input(
    business_df,
    'business_df',
    'this is a dataframe containing information about businesses'
)

In [None]:
%%chat_agent -n
Give me all of the businesses that are in Texas.

In [None]:
chat_agent.get_result('result')
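
For reference, the request above corresponds to a simple boolean filter in pandas. The sketch below uses a small made-up dataframe and assumes the `state` column holds two-letter codes such as `TX`; the code the agent actually generates may differ.

```python
import pandas as pd

# Made-up stand-in for business_df (values are illustrative only).
toy_df = pd.DataFrame({
    "name": ["Cafe One", "Diner Two", "Taco Three"],
    "state": ["TX", "PA", "TX"],
    "review_count": [12, 40, 95],
})

# Keep only the rows whose state code is Texas.
texas_df = toy_df[toy_df["state"] == "TX"]
print(texas_df["name"].tolist())
# -> ['Cafe One', 'Taco Three']
```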

In [None]:
%%chat_agent
Which business in the dataframe result has the greatest number of reviews?

In [None]:
chat_agent.get_result('result')
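
A hand-written pandas equivalent of this follow-up question might look like the sketch below. It assumes the stored result is a dataframe with a numeric `review_count` column and uses made-up rows; `toy_result` and `top_business` are illustrative names, not variables the agent creates.

```python
import pandas as pd

# Made-up stand-in for the stored Texas result (values are illustrative only).
toy_result = pd.DataFrame({
    "name": ["Cafe One", "Taco Three"],
    "state": ["TX", "TX"],
    "review_count": [12, 95],
})

# idxmax returns the index label of the largest review_count; loc fetches that row.
top_business = toy_result.loc[toy_result["review_count"].idxmax()]
print(top_business["name"])
# -> Taco Three
```

If several businesses tie for the maximum, `idxmax` returns only the first one, so a filter on `review_count == review_count.max()` would be needed to list them all.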