# Mix Self-Consistency Notebook 

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/tables/mix_self_consistency/mix_self_consistency.ipynb" target="_parent">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we highlight the mix self-consistency method proposed by Liu et al. in the paper ["Rethinking Tabular Data Understanding with Large Language Models"](https://arxiv.org/pdf/2312.16702v1.pdf).

LLMs can reason over tabular data in 2 main ways:
1. textual reasoning via direct prompting
2. symbolic reasoning via program synthesis (e.g. Python, SQL)

The key insight of the paper is that different reasoning pathways work well on different tasks. By aggregating the results of both with a self-consistency mechanism (i.e. majority voting), the paper reports SoTA performance.
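A minimal sketch of the voting step, assuming each reasoning path has already been reduced to a short answer string. The function name `majority_vote` and the tie-breaking rule (the answer seen first wins) are illustrative choices, not the pack's implementation.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer across reasoning paths.

    Ties go to the answer that appeared first, because Counter.most_common
    orders elements with equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one answer to vote on")
    return Counter(answers).most_common(1)[0][0]


# Hypothetical answers from 3 textual and 2 symbolic paths.
text_answers = ["2003", "2004", "2003"]
symbolic_answers = ["2004", "2004"]
print(majority_vote(text_answers))                     # text paths alone → 2003
print(majority_vote(text_answers + symbolic_answers))  # both pools → 2004
```

Pooling both path types lets the symbolic answers outvote a textual majority, which is the intuition behind mixing them.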

We implemented the method using the prompts described in the paper, adapting them where needed to get it working. That said, this pack is marked as beta, so there may still be kinks to work out. Do you have suggestions or contributions on how to improve its robustness? Let us know!

# Download Data

We use the [WikiTableQuestions dataset](https://ppasupat.github.io/WikiTableQuestions/) (Pasupat and Liang 2015) as our test dataset.

WikiTableQuestions is a question-answering dataset over various semi-structured tables taken from Wikipedia. These tables range in size from a few rows/columns to many rows. Some columns may also contain multi-part information (e.g. a temperature column may contain both Fahrenheit and Celsius).
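For instance, a column that packs two units into one cell can be split with pandas before any reasoning happens. The cell format below (`"75 °F (24 °C)"`) is a hypothetical example, not taken from the dataset.

```python
import pandas as pd

# Hypothetical column mixing Fahrenheit and Celsius in one cell.
weather = pd.DataFrame({"Temperature": ["75 °F (24 °C)", "32 °F (0 °C)", "-4 °F (-20 °C)"]})

# Capture both numbers (optionally negative or decimal) into named columns.
pattern = r"(?P<fahrenheit>-?\d+(?:\.\d+)?)\s*°F\s*\((?P<celsius>-?\d+(?:\.\d+)?)\s*°C\)"
parts = weather["Temperature"].str.extract(pattern)
weather = weather.join(parts.astype(float))
print(weather)
```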

In [1]:
# !wget "https://github.com/ppasupat/WikiTableQuestions/releases/download/v1.0.2/WikiTableQuestions-1.0.2-compact.zip" -O data.zip
# !unzip data.zip

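`wget` and `unzip` are not available in every shell (a default Windows command prompt, for example). A cross-platform alternative is to download and extract the archive with Python's standard library. This is a sketch: the helper name `download_and_extract` is ours, and the URL is the one from the commented-out cell. The cells below read from `data/raw/WikiTableQuestions/...`, so you may need to pick `dest_dir` (or move the extracted folder) to match.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = (
    "https://github.com/ppasupat/WikiTableQuestions/releases/download/"
    "v1.0.2/WikiTableQuestions-1.0.2-compact.zip"
)


def download_and_extract(url, archive_path="data.zip", dest_dir="."):
    """Download a zip archive (unless already present) and extract it into dest_dir."""
    archive = Path(archive_path)
    if not archive.exists():
        # urlretrieve also accepts file:// URLs, which is handy for offline testing.
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)  # creates dest_dir and subfolders as needed
    return Path(dest_dir)
```

Calling `download_and_extract(DATA_URL)` fetches the dataset into the current directory.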


Let's visualize some examples.

In [1]:
import pandas as pd

examples = pd.read_table("data/raw/WikiTableQuestions/data/training-before300.tsv")

examples.head()

|    | id | utterance | context | targetValue |
|---:|:---|:---|:---|:---|
| 0 | nt-0 | what was the last year where this team was a p... | csv/204-csv/590.csv | 2004 |
| 1 | nt-1 | in what city did piotr's last 1st place finish... | csv/204-csv/622.csv | Bangkok, Thailand |
| 2 | nt-2 | which team won previous to crettyard? | csv/204-csv/772.csv | Wolfe Tones |
| 3 | nt-3 | how many more passengers flew to los angeles t... | csv/203-csv/515.csv | 12467 |
| 4 | nt-4 | who was the opponent in the first game of the ... | csv/204-csv/495.csv | Derby County |


In [2]:
example = examples.iloc[0]
question = example["utterance"]
context = example["context"]
print("The question is: ", question)

The question is:  what was the last year where this team was a part of the usl a-league?


Let's load the table that can be used as context to answer the question in the first example.

In [3]:
table = pd.read_csv("data/raw/WikiTableQuestions/" + context)
table.head()

|    | Year | Division | League | Regular Season | Playoffs | Open Cup | Avg. Attendance |
|---:|---:|---:|:---|:---|:---|:---|:---|
| 0 | 2001 | 2 | USL A-League | 4th, Western | Quarterfinals | Did not qualify | 7169 |
| 1 | 2002 | 2 | USL A-League | 2nd, Pacific | 1st Round | Did not qualify | 6260 |
| 2 | 2003 | 2 | USL A-League | 3rd, Pacific | Did not qualify | Did not qualify | 5871 |
| 3 | 2004 | 2 | USL A-League | 1st, Western | Quarterfinals | 4th Round | 5628 |
| 4 | 2005 | 2 | USL First Division | 5th | Quarterfinals | 4th Round | 6028 |


The correct answer is 2004.
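As a sanity check, the expected answer can be reproduced directly in pandas. The snippet rebuilds only the five rows shown above (and only two columns) under the name `sample`, so it does not overwrite the notebook's `table`; it is a stand-in, not the full CSV.

```python
import pandas as pd

# The five rows displayed by table.head() above (subset of columns).
sample = pd.DataFrame(
    {
        "Year": [2001, 2002, 2003, 2004, 2005],
        "League": [
            "USL A-League",
            "USL A-League",
            "USL A-League",
            "USL A-League",
            "USL First Division",
        ],
    }
)

# Keep rows whose League is exactly "USL A-League", then take the latest year.
last_year = sample.loc[sample["League"] == "USL A-League", "Year"].max()
print(last_year)  # → 2004
```

Exact equality is stricter than `str.contains`, which would also match longer league names that contain the substring.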

## Load Pack / Setup

Now we use `download_llama_pack` to download the Mix Self-Consistency LlamaPack (you can also import the module directly if you are using the llama-hub package).

We will also optionally set up observability/tracing so we can inspect the intermediate steps.

In [4]:
# Option: if developing with the llama_hub package
# from llama_hub.llama_packs.tables.mix_self_consistency.base import (
#     MixSelfConsistencyQueryEngine,
# )

# Option: download llama_pack
from llama_index.core.llama_pack import download_llama_pack

download_llama_pack(
    "MixSelfConsistencyPack",
    "./mix_self_consistency_pack",
    # skip_load=True,
    # leave the below line commented out if using the notebook on main
    # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/suo/table_qa/llama_hub"
)


llama_index.packs.tables.mix_self_consistency.base.MixSelfConsistencyPack

In [5]:
from mix_self_consistency_pack.llama_index.packs.tables.mix_self_consistency.base import MixSelfConsistencyQueryEngine

In [6]:
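import os
from llama_index.llms.openai import OpenAI
from configparser import ConfigParser

# Read the OpenAI API key from conf/conf.ini ([openai] section, apikey entry)
config = ConfigParser()
config.read('conf/conf.ini')
os.environ["OPENAI_API_KEY"] = config['openai']['apikey']

# LLM shared by every reasoning path below
llm = OpenAI()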
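The cell above expects an INI file at `conf/conf.ini` with an `[openai]` section containing an `apikey` entry. Here is a sketch of that layout, plus a fallback to an already-set `OPENAI_API_KEY` environment variable. The fallback and the helper name `load_openai_key` are our additions, not part of the original cell.

```python
import os
from configparser import ConfigParser

# Example contents of conf/conf.ini (placeholder value, not a real key).
EXAMPLE_INI = """
[openai]
apikey = sk-your-key-here
"""


def load_openai_key(config):
    """Return [openai] apikey if present, otherwise the OPENAI_API_KEY env var."""
    # fallback= also covers a missing [openai] section.
    key = config.get("openai", "apikey", fallback=None)
    return key or os.environ.get("OPENAI_API_KEY")


example_config = ConfigParser()
example_config.read_string(EXAMPLE_INI)  # in the notebook: config.read("conf/conf.ini")
print(load_openai_key(example_config))  # → sk-your-key-here
```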

### Optional: Setup Observability

Here we will use our Arize Phoenix integration to view traces through the query engine.

In [7]:
import phoenix as px
import llama_index
from llama_index.core import set_global_handler

px.launch_app()
set_global_handler("arize_phoenix")

🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


## Run experiments

Let's try out different modes.

### Textual Reasoning Only (i.e. direct prompting)

Let's start by using only the textual reasoning path.
We convert the pandas DataFrame directly into a Markdown representation and inject it into the prompt for in-context reasoning.
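The serialization step can be pictured as follows. pandas offers `DataFrame.to_markdown()` for this, but it needs the optional `tabulate` package, so this sketch writes a minimal Markdown serializer by hand. The helper names are hypothetical, and apart from the opening sentence visible in the verbose log, the prompt wording is our assumption rather than the pack's prompt.

```python
import pandas as pd


def df_to_markdown(df):
    """Render a DataFrame (index included) as a pipe-delimited Markdown table."""
    header = ["", *map(str, df.columns)]
    rows = [
        [str(idx), *map(str, row)]
        for idx, row in zip(df.index, df.itertuples(index=False))
    ]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(r) + " |" for r in rows]
    return "\n".join(lines)


def build_text_prompt(df, question):
    # Hypothetical layout: instruction, serialized table, then the question.
    return (
        "You are an advanced AI capable of analyzing and understanding information "
        "within tables. Read the table below.\n\n"
        f"{df_to_markdown(df)}\n\n"
        f"Question: {question}\n"
    )


sample = pd.DataFrame({"Year": [2003, 2004], "League": ["USL A-League", "USL A-League"]})
print(build_text_prompt(sample, "what was the last year where this team was a part of the usl a-league?"))
```

This serializer does not escape `|` characters inside cells, which a real implementation would need to handle.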

In [8]:
query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=1,
    symbolic_paths=0,
    aggregation_mode="none",
    verbose=True,
)

In [8]:
response = await query_engine.aquery(example["utterance"])
print(response)

> Running module c6ffdfda-15f3-4a4c-9d7d-3a7ab025437c with input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...

[0m[1;3;38;2;155;135;227m> Running module fa48cf7a-e2c2-47d7-994c-2e631e9c84b0 with input: 
messages: You are an advanced AI capable of analyzing and understanding information within tables. Read the table below.

|    |   Year |   Division | League              | Regular Season   | Playoffs        | ...

[0m[1;3;38;2;155;135;227m> Running module df320e1c-8268-4f9f-813d-4ce0d9322de6 with input: 
input: assistant: Step 1: Identify the rows where the team was a part of the USL A-League.
- Row 0: Year 2001, Division 2, League USL A-League
- Row 1: Year 2002, Division 2, League USL A-League
- Row 2: Yea...

[

We get an incorrect answer. 
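`await query_engine.aquery(...)` works at the top level of this notebook because Jupyter already runs an event loop. In a plain Python script there is no running loop, so you would wrap the call with `asyncio.run` instead. The sketch below uses a hypothetical stand-in engine (`EchoEngine`) so it runs without an API key; swap in the real query engine. Run it as a script: calling `asyncio.run` inside Jupyter raises a `RuntimeError` because a loop is already running.

```python
import asyncio


class EchoEngine:
    """Hypothetical stand-in exposing the same async `aquery(question)` call shape."""

    async def aquery(self, question):
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return f"echo: {question}"


async def main(engine, question):
    return await engine.aquery(question)


if __name__ == "__main__":
    print(asyncio.run(main(EchoEngine(), "what was the last year ...?")))
```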

In [9]:
query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=5,
    symbolic_paths=0,
    aggregation_mode="self-consistency",
    verbose=True,
)

In [10]:
response = await query_engine.aquery(example["utterance"])
print(response)

> Running modules and inputs in parallel: 
Module key: 3bcb2ac9-878b-49f9-bf00-803be89604f8. Input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...


[0m[1;3;38;2;155;135;227m> Running modules and inputs in parallel: 
Module key: 68cc0066-3ea4-4d4a-b411-d9f94d6f9c2d. Input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...


[0m[1;3;38;2;155;135;227m> Running modules and inputs in parallel: 
Module key: 1b049514-db6c-4d4a-b143-73869d7b1198. Input: 
question: what was the last year

We still get an incorrect result after sampling 5 textual reasoning paths and aggregating the results via self-consistency.

### Symbolic Reasoning Only (i.e. Python shell)

Now, let's use symbolic reasoning. Here, the LLM generates Python expressions that directly manipulate the pandas DataFrame.
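Conceptually, the symbolic path asks the LLM for a single pandas expression over `df` and then evaluates it. Below is a minimal sketch of that evaluation step, assuming the expression is run with Python's `eval` against a namespace holding only `df` and `pd`. This is not necessarily how the pack evaluates code, and emptying `__builtins__` is not a real sandbox.

```python
import pandas as pd


def run_pandas_expression(expr, df):
    """Evaluate one LLM-generated pandas expression against df; return it as a string."""
    namespace = {"df": df, "pd": pd}
    # An empty __builtins__ blocks casual calls like open(); it is NOT a security boundary.
    result = eval(expr, {"__builtins__": {}}, namespace)
    return str(result)


sample = pd.DataFrame(
    {
        "Year": [2003, 2004, 2005],
        "League": ["USL A-League", "USL A-League", "USL First Division"],
    }
)
# The expression the LLM produced in the run shown in this section.
expr = "df[df['League'].str.contains('USL A-League')]['Year'].max()"
print(run_pandas_expression(expr, sample))  # → 2004
```

Hiding all builtins also hides harmless helpers such as `len`, so in practice you may want to whitelist a few.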

In [11]:
query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=0,
    symbolic_paths=1,
    aggregation_mode="none",
    verbose=True,
)

In [12]:
response = await query_engine.aquery(example["utterance"])
print(response)

> Pandas Instructions:
```
df[df['League'].str.contains('USL A-League')]['Year'].max()
```
> Pandas Output: 2004
Aggregation mode: none
Text results: []
Symbolic results: ['2004']
2004


We get the correct answer here. 

### Aggregation via Self-Evaluation

Now we consider self-evaluation for aggregating results across the textual and symbolic reasoning paths.
We tell the LLM what each reasoning path is good at, and it uses that information to decide on a final result.
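A rough sketch of what such a self-evaluation prompt could look like. The helper name `build_self_eval_prompt`, the wording, and the descriptions of each path's strengths are hypothetical paraphrases; the pack's actual prompt is not shown in this notebook.

```python
def build_self_eval_prompt(question, text_answer, symbolic_answer):
    """Hypothetical prompt asking the LLM to pick between two candidate answers."""
    return (
        f"Question: {question}\n\n"
        "Two reasoning methods produced candidate answers.\n"
        "A) Textual reasoning (reads the table directly; often better at "
        f"interpreting messy or multi-part cells): {text_answer}\n"
        "B) Symbolic reasoning (runs pandas code; often better at exact "
        f"lookups, counting and arithmetic): {symbolic_answer}\n\n"
        "Considering the strengths of each method, reply with the single "
        "best final answer."
    )


prompt = build_self_eval_prompt(
    "what was the last year where this team was a part of the usl a-league?",
    text_answer="2003",  # hypothetical, for illustration only
    symbolic_answer="2004",
)
print(prompt)
```

The LLM's reply to this prompt would then be returned as the final answer.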

In [13]:
query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=1,
    symbolic_paths=1,
    aggregation_mode="self-evaluation",
    verbose=True,
)

In [14]:
response = await query_engine.aquery(example["utterance"])
print(response)

> Running modules and inputs in parallel: 
Module key: 9f0af23e-02bb-487c-a01c-97080bfbcfea. Input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...


> Pandas Instructions:
```
df[df['League'].str.contains('USL A-League')]['Year'].max()
```
> Pandas Output: 2004
> Running modules and inputs in parallel: 
Module key: 54988dfa-32bc-4f18-ba51-a9e16102d51e. Input: 
messages: You are an advanced AI capable of analyzing and understanding information within tables. Read the table below.

|    |   Year |   Division | League              | Regular Season   | Playoffs        | ...


[0m[1;3;38;2;155;135;227m> Running modules and inputs in parallel: 
Module key: fcf25b5e-6fac-4308-8f8a-beb1cce44edb. Input

We obtain the correct result now.

### Aggregation via Mix Self-Consistency

Now we consider the SoTA method, which aggregates results across both textual and symbolic reasoning paths via self-consistency (i.e. majority voting).
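One practical detail when voting across path types is that the same answer can come back in different surface forms (for example `2004` from a text path and `2004.0` from a pandas expression). The sketch below normalizes answers before a majority vote over both pools. The normalization rules and the hypothetical answer lists are our own assumptions, not the pack's.

```python
from collections import Counter


def normalize(answer):
    """Trim, lowercase, and collapse integer-valued floats ("2004.0" -> "2004")."""
    text = str(answer).strip().lower()
    try:
        number = float(text)
    except ValueError:
        return text
    return str(int(number)) if number.is_integer() else str(number)


def mix_self_consistency(text_answers, symbolic_answers):
    """Majority vote over the union of normalized textual and symbolic answers."""
    votes = Counter(normalize(a) for a in [*text_answers, *symbolic_answers])
    if not votes:
        raise ValueError("need at least one answer to vote on")
    return votes.most_common(1)[0][0]


# Hypothetical outputs from 5 textual and 5 symbolic paths.
text_answers = ["2003", "2003", "2004", "2003", "2004"]
symbolic_answers = ["2004", "2004.0", "2004", "2004", " 2004 "]
print(mix_self_consistency(text_answers, symbolic_answers))  # → 2004
```

Without normalization, `"2004.0"` and `" 2004 "` would count as separate answers and split the symbolic vote.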

In [15]:
query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=5,
    symbolic_paths=5,
    aggregation_mode="self-consistency",
    verbose=True,
)

In [16]:
response = await query_engine.aquery(example["utterance"])
print(response)

> Running modules and inputs in parallel: 
Module key: a392d450-ac3f-469b-9a0f-514227186fe3. Input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...


[0m[1;3;38;2;155;135;227m> Running modules and inputs in parallel: 
Module key: 2462e21a-2d2a-47f1-9888-55ff5cb4a2a8. Input: 
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...


[0m[1;3;38;2;155;135;227m> Running modules and inputs in parallel: 
Module key: 4599bc01-469e-45df-a2d4-d3694e758045. Input: 
question: what was the last year