# Analysing user queries from the prototype

The queries are logged to Weights & Biases (W&B). The aims of this analysis are to:

- find themes in the queries
- see how they relate to user research
- see how they relate to our current ideas on MVK

The full write-up [is in Notion](https://www.notion.so/climatepolicyradar/logs-analysis-from-RAG-prototype-3c598eb04740496d9a2f1c222e968219).

In [24]:
import wandb
import pandas as pd

import sys
sys.path.append("../..")

from src.config import WANDB_PROJECT_NAME

In [19]:
wandb_api = wandb.Api()

# Selected by filtering on wandb user 'cpr-mark' and selecting runs with runtime > 1h.
# I tried to do this using the wandb API, but filtering by user is not documented.
RUN_NAMES = [
    "lyric-pyramid-165",
    "woven-sea-147",
    "sleek-morning-141",
    "chocolate-feather-140",
]

entity, project = "climatepolicyradar", WANDB_PROJECT_NAME

# runs = [wandb_api.run(f"{entity}/{project}/{run_name}") for run_name in RUN_NAMES]

runs = wandb_api.runs(
    path=f"{entity}/{project}",
    filters={"display_name": {"$regex": "|".join(RUN_NAMES)}}
)

assert len(runs) == len(RUN_NAMES)
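
One caveat with the filter above: the `$regex` pattern is unanchored, so, assuming W&B applies it with the usual substring-search semantics, a name such as `woven-sea-147` would also match a run called `woven-sea-1470`. The sketch below uses Python's `re` module purely to illustrate the difference locally. Whether the W&B backend treats `^` and `$` in the same way is an assumption that should be checked before relying on it, and `woven-sea-1470` is a hypothetical run name.

```python
import re

RUN_NAMES = [
    "lyric-pyramid-165",
    "woven-sea-147",
    "sleek-morning-141",
    "chocolate-feather-140",
]

# Unanchored pattern, as used in the filter above
loose_pattern = "|".join(RUN_NAMES)

# Anchored pattern: escape each name and require the whole string to match
strict_pattern = "^(" + "|".join(re.escape(name) for name in RUN_NAMES) + ")$"

# "woven-sea-1470" is a hypothetical run name used only for illustration
candidates = RUN_NAMES + ["woven-sea-1470"]

loose_matches = [name for name in candidates if re.search(loose_pattern, name)]
strict_matches = [name for name in candidates if re.search(strict_pattern, name)]

print(len(loose_matches), len(strict_matches))
```

The `assert` on the number of runs would already catch an unexpected extra match, so anchoring is a tightening rather than a fix for an observed problem.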

In [37]:
# Note: run.history() returns a sampled history (up to 500 rows per run by default)
logs_df_list = [run.history() for run in runs]

logs_df = pd.concat(logs_df_list)

internal_users = {
    "matyas",
    "alan",
    "Alan",
    "kalyan",
    "harrison",
    "thor",
    "THor",
    "thor ",
    # "Sazzy G",
    "Betsy",
}

# Drop queries from internal users; .copy() avoids a SettingWithCopyWarning on the assignment below
logs_df = logs_df[~logs_df['user'].isin(internal_users)].copy()

print(f"We have {len(logs_df)} logs from {logs_df['user'].nunique()} users, across {logs_df['document_id'].nunique()} documents.")

# Obscure the user names by keeping the first two letters and the last letter
logs_df['user'] = logs_df['user'].apply(lambda x: x[:2] + "*" + x[-1])

logs_df.sort_values(by="start_time")

We have 83 logs from 10 users, across 9 documents.


Unnamed: 0,end_time,_step,id,document_id,_runtime,query,generation,start_time,retrieved_passages,user,_timestamp,top_k
0,2024-02-27T13:24:44.453329,0,6fbbc141-9ce3-437c-95e1-9feb9bd9450f,CCLW.executive.9369.3236,97.237382,Objectives,Objectives:\n- Achieving a socially just trans...,2024-02-27T13:24:33.419506,"{""passages"": [{""node"": {""id_"": ""3323de6e-08a3-...",Ky*s,1.709040e+09,20
1,2024-02-27T13:26:01.446546,1,7912e04d-6776-4af7-99d4-0758be45c7ec,CCLW.executive.9369.3236,174.230572,implementation,Examples of implementation strategies for the ...,2024-02-27T13:25:54.619171,"{""passages"": [{""node"": {""id_"": ""3323de6e-08a3-...",Ky*s,1.709040e+09,20
2,2024-02-27T13:26:19.650844,2,58ff521c-c65e-4db2-9dd9-bc9a93159439,CCLW.executive.9369.3236,192.434863,What is the implementation process like?,The implementation process involves rigorous e...,2024-02-27T13:26:15.159870,"{""passages"": [{""node"": {""id_"": ""753f995d-b2ab-...",Ky*s,1.709040e+09,20
3,2024-02-27T13:26:25.678772,3,2817798f-3b59-4d73-94b0-4285d64d2d8b,CCLW.executive.9369.3236,198.462796,What is the implementation process?,The implementation process includes enforcing ...,2024-02-27T13:26:20.036340,"{""passages"": [{""node"": {""id_"": ""753f995d-b2ab-...",Ky*s,1.709040e+09,20
4,2024-02-27T13:26:30.736699,4,d3525263-c440-458c-8a00-5188c3c3c291,CCLW.executive.9369.3236,203.520726,What is the implementation process?,The implementation process includes enforcing ...,2024-02-27T13:26:24.487370,"{""passages"": [{""node"": {""id_"": ""753f995d-b2ab-...",Ky*s,1.709040e+09,20
...,...,...,...,...,...,...,...,...,...,...,...,...
51,2024-03-01T10:29:57.575050,51,868433ad-f3d6-4009-9ad2-fe35717cc6e9,CCLW.executive.9647.4059,239711.910509,write a poem based on the gender aspects of th...,The sources do not provide information on gend...,2024-03-01T10:29:55.830382,"{""passages"": [{""node"": {""id_"": ""1ccb5645-d31e-...",mi*l,1.709289e+09,20
52,2024-03-01T10:30:41.130294,52,ae010bfb-45a7-466a-8a54-5237743f73ba,CCLW.executive.9647.4059,239755.465755,"summarise the gender aspects of this plan, in ...",The gender aspects of this plan focus on promo...,2024-03-01T10:30:35.371383,"{""passages"": [{""node"": {""id_"": ""9d86ced3-6711-...",mi*l,1.709289e+09,20
53,2024-03-01T10:33:56.703920,53,744666db-9b51-4da8-8435-63562539d64d,CCLW.executive.9647.4059,239951.039385,"summarise the gender aspects of this plan, usi...",The gender aspects of the plan focus on promot...,2024-03-01T10:33:52.646487,"{""passages"": [{""node"": {""id_"": ""9d86ced3-6711-...",mi*l,1.709289e+09,20
54,2024-03-04T11:29:19.632254,54,162ff3c7-0d0a-42b2-b400-b0260dea60d1,CCLW.legislative.10699.5931,502473.967707,home energy,Examples of home energy policies mentioned in ...,2024-03-04T11:29:09.650043,"{""passages"": [{""node"": {""id_"": ""33a04622-f9f4-...",Si*s,1.709552e+09,20
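
The `internal_users` set above lists several spellings of the same name (`thor`, `THor`, `thor `). A minimal sketch of an alternative, assuming it is acceptable to compare names case-insensitively and to ignore surrounding whitespace, is to normalise each name before the lookup. `normalise_user` and `obscure_user` are hypothetical helper names, not part of the notebook, and the raw values are made up for illustration. Note that this approach would also exclude spellings that are not listed above, such as `Thor`.

```python
def normalise_user(name: str) -> str:
    """Lower-case a user name and strip surrounding whitespace."""
    return name.strip().lower()


def obscure_user(name: str) -> str:
    """Keep the first two characters and the last one, e.g. 'example' -> 'ex*e'."""
    name = name.strip()
    if not name:
        return "*"
    return name[:2] + "*" + name[-1]


# One normalised entry per internal user (an assumed equivalent of the set above)
internal_users_normalised = {"matyas", "alan", "kalyan", "harrison", "thor", "betsy"}

# Hypothetical raw values, as they might appear in the 'user' column
raw_users = ["THor", "thor ", "Alan", "example"]

external = [u for u in raw_users if normalise_user(u) not in internal_users_normalised]
obscured = [obscure_user(u) for u in external]
print(obscured)
```

With pandas, the same check could be written as `logs_df['user'].map(normalise_user).isin(internal_users_normalised)`.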


In [44]:
print(" - " + "\n - ".join(sorted(logs_df['query'].unique())))

 - Does it include indigenous communities?
 - Does the end target cover all greenhouse gases?
 - Does the plan refer to equity or fairness?
 - Does this plan include indigenous communities?
 - Does this plan involve the role of indigenous communities?
 - How does this Act define climate change?
 - How does this law address climate change?
 - How will implementation be monitored?
 - How will this be implemented?
 - Is there a net zero target in the Act?
 - Is there an overall emissions target in this Act?
 - Is there anything about offsets or carbon credits in the plan?
 - Is this Act economy-wide?
 - Is this Act sector-wide?
 - MSMEs
 - Objectives
 - On which renewable energy source does this Act focus the most?
 - On which sectors does this Act focus?
 - What are the main components of this document?
 - What is the implementation process like?
 - What is the implementation process?
 - What is the role of indigenous communities?
 - What is the vision of the climate act?
 - What is the 
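
As a first pass at finding themes, the queries can be tagged by keyword. The sketch below is illustrative only: the theme names and keyword lists in `THEME_KEYWORDS` are assumptions made up for this example, not the themes from the write-up, and `tag_themes` is a hypothetical helper. The sample queries are copied from the list above.

```python
from collections import defaultdict

# Hypothetical themes and keywords, chosen only for illustration
THEME_KEYWORDS = {
    "implementation": ["implement", "monitor"],
    "targets": ["target", "net zero"],
    "indigenous communities": ["indigenous"],
}

queries = [
    "Does this plan include indigenous communities?",
    "How will implementation be monitored?",
    "Is there a net zero target in the Act?",
    "What is the implementation process?",
    "MSMEs",
]


def tag_themes(query: str) -> list[str]:
    """Return every theme with at least one keyword in the query (case-insensitive)."""
    text = query.lower()
    return [
        theme
        for theme, words in THEME_KEYWORDS.items()
        if any(word in text for word in words)
    ]


queries_by_theme = defaultdict(list)
for query in queries:
    # Queries that match no keyword are collected under "other"
    for theme in tag_themes(query) or ["other"]:
        queries_by_theme[theme].append(query)

for theme, matched in queries_by_theme.items():
    print(f"{theme}: {len(matched)}")
```

Keyword matching is crude: it misses paraphrases and cannot separate near-duplicate queries from genuinely different ones, so the "other" bucket is worth reading by hand.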

In [36]:
# na=False treats missing queries as non-matches instead of raising on the boolean mask
logs_df[logs_df['query'].str.contains("swear", na=False)]

Unnamed: 0,end_time,_step,id,document_id,_runtime,query,generation,start_time,retrieved_passages,user,_timestamp,top_k
53,2024-03-01T10:33:56.703920,53,744666db-9b51-4da8-8435-63562539d64d,CCLW.executive.9647.4059,239951.039385,"summarise the gender aspects of this plan, usi...",The gender aspects of the plan focus on promot...,2024-03-01T10:33:52.646487,"{""passages"": [{""node"": {""id_"": ""9d86ced3-6711-...",mi*l,1709289000.0,20
