# Introduction

This notebook gives several examples of asking air-quality questions about a given CSV file. The CSV file records CO2, PM2.5, and VOC concentration levels in multiple zones of an office building.

# Proceeding directly

In [1]:
import openai as ai 
import pandas as pd 

In [2]:
ai.api_key= ""

In [4]:
air_quality=pd.read_csv("air_quality.csv")
air_quality.head()

Unnamed: 0.1,Unnamed: 0,building_id,zone,local_time,co2,pm2.5,voc,operating_date
0,0,montmartre_1,level_0_canteen,2021-01-01 04:00:00,428.040043,4.671595,120.008563,2021-01-01
1,1,montmartre_1,level_0_canteen,2021-01-01 04:15:00,411.791914,5.748613,102.994119,2021-01-01
2,2,montmartre_1,level_0_canteen,2021-01-01 04:30:00,460.474686,5.228284,80.632254,2021-01-01
3,3,montmartre_1,level_0_canteen,2021-01-01 04:45:00,426.727381,5.576495,88.328663,2021-01-01
4,4,montmartre_1,level_0_canteen,2021-01-01 05:00:00,386.360277,5.021224,84.95957,2021-01-01


In [4]:
def alert_bot(prompts, temperature=0.0, model="gpt-3.5-turbo"):
    # Start with the system role, then add every prompt as a user message.
    messages = [{"role": "system", "content": "You are an air quality alert bot."}]
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
    # Send the whole conversation once instead of once per prompt.
    response = ai.ChatCompletion.create(
        model=model, temperature=temperature, messages=messages
    )
    return response["choices"][0]["message"]["content"]

In [5]:
air_quality_criteria="1. The average value of co2 should be less than 600 by zone, \
2. The average value of pm2.5 should be less than 10 by zone, \
3. The average value of voc should be less than 150 by zone."


In [6]:
air_quality=air_quality[["zone","local_time","co2","voc","pm2.5"]]
air_quality_str=air_quality.to_csv(index=False)

In [7]:
prompts = []
prompts.append(
    f"Here are the criteria for air quality in an office building: \n\n {air_quality_criteria} \n\n"
)
prompts.append(
    f"Does zone level_2_space_z {air_quality_str} satisfy the criteria for air quality in the office building? \n\n\
If the data does not satisfy the criteria above, please send an alert stating which zone does not satisfy which item. \
Otherwise, please state that we don't need to raise any alert."
)
response = alert_bot(prompts, temperature=0.0, model="gpt-3.5-turbo")
print(response)

InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 3823656 tokens. Please reduce the length of the messages.
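
The request fails because the whole CSV is pasted into the prompt. One possible workaround, sketched here under assumptions, is to aggregate first so that only per-zone means reach the model. The sample rows are copied from the `head()` output above; `zone_means` and `SAMPLE_CSV` are hypothetical names, and the standard library is used instead of pandas so the sketch runs without the CSV file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Five sample rows copied from the head() output above (values unchanged).
SAMPLE_CSV = """zone,local_time,co2,pm2.5,voc
level_0_canteen,2021-01-01 04:00:00,428.040043,4.671595,120.008563
level_0_canteen,2021-01-01 04:15:00,411.791914,5.748613,102.994119
level_0_canteen,2021-01-01 04:30:00,460.474686,5.228284,80.632254
level_0_canteen,2021-01-01 04:45:00,426.727381,5.576495,88.328663
level_0_canteen,2021-01-01 05:00:00,386.360277,5.021224,84.95957
"""


def zone_means(csv_text, columns=("co2", "pm2.5", "voc")):
    """Return {zone: {column: mean}} for the given CSV text (hypothetical helper)."""
    values = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in columns:
            values[row["zone"]][col].append(float(row[col]))
    return {
        zone: {col: mean(vals) for col, vals in cols.items()}
        for zone, cols in values.items()
    }


summary = zone_means(SAMPLE_CSV)
# One short line per zone instead of one line per measurement.
summary_str = "\n".join(
    f"{zone}: " + ", ".join(f"{col}={val:.2f}" for col, val in cols.items())
    for zone, cols in summary.items()
)
print(summary_str)
```

A string like `summary_str` could then replace `air_quality_str` in the prompt. With one line per zone, it should stay far below the context limit reported in the error.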

In [8]:
air_quality=air_quality[air_quality["zone"]=="level_2_space_z"].reset_index(drop=True)[:50]
air_quality_str=air_quality.to_csv(index=False)

In [9]:
prompts = []
prompts.append(
    f"Here are the criteria for air quality in an office building: \n\n {air_quality_criteria} \n\n"
)
prompts.append(
    f"Does zone level_2_space_z {air_quality_str} satisfy the criteria for air quality in the office building? \n\n\
If the data does not satisfy the criteria above, please send an alert stating which zone does not satisfy which item. \
Otherwise, please state that we don't need to raise any alert."
)
response = alert_bot(prompts, temperature=0.0, model="gpt-3.5-turbo")
print(response)

Based on the provided data, the average value of CO2 in zone level_2_space_z is less than 600, the average value of PM2.5 is less than 10, and the average value of VOC is less than 150. Therefore, we don't need to raise any alert.


In [10]:
# Check the model's answer against the 50 rows that were actually sent.
air_quality[["co2","pm2.5","voc"]].mean()

co2      511.961525
pm2.5      8.282097
voc      178.194309
dtype: float64
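
The means printed above put voc at about 178, which is above the limit of 150, so the model's "no alert" answer does not match the data. Because the criteria are plain thresholds, the comparison could also be done in Python, leaving the model to phrase the message only. A minimal sketch, assuming the limits stated in `air_quality_criteria`; `LIMITS` and `find_violations` are hypothetical names that do not appear in the notebook.

```python
# Upper limits from air_quality_criteria (each mean must be strictly below).
LIMITS = {"co2": 600, "pm2.5": 10, "voc": 150}

# Means copied from the output above.
means = {"co2": 511.961525, "pm2.5": 8.282097, "voc": 178.194309}


def find_violations(means, limits=LIMITS):
    """Return the metrics whose mean is not strictly below its limit."""
    return [name for name, limit in limits.items() if means[name] >= limit]


violations = find_violations(means)
if violations:
    print("Alert for level_2_space_z:", ", ".join(violations))
else:
    print("No alert needed.")
```

The comparison is deterministic, so the outcome does not depend on how the model reads a long table of numbers.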

# Using pandas agent  

In [8]:
from langchain.agents import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
import pandas as pd
from langchain.chains import SimpleSequentialChain
from langchain import PromptTemplate, LLMChain
import os

In [9]:
os.environ['OPENAI_API_KEY'] = ""
llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [10]:
air_quality=pd.read_csv("air_quality.csv")

In [11]:
air_quality.groupby("zone")[["co2","voc","pm2.5"]].mean()

zone,co2,voc,pm2.5
level_0_canteen,531.322219,148.982919,7.482683
level_1_macrosoft,515.894641,148.976711,7.498013
level_2_space_z,515.347956,149.304085,7.49613
level_3_space_z,512.595859,148.832491,7.494349
level_3_the_bossing_company,514.028917,149.097064,7.521987
level_4_dumont,514.245985,148.96531,7.486373
level_4_intratrust,516.842123,149.012107,7.506191
level_5_appleson_spector_dimm,516.491006,148.964516,7.50961
level_5_exponentlabs,515.360147,148.709483,7.521974
level_5_perinne_notary,515.274098,148.919806,7.480685


In [12]:
pandas_agent=create_pandas_dataframe_agent(
    llm=llm, 
    df=air_quality, 
    verbose=False, 
)

In [13]:
prompt_template = """Check if the {output} satisfies the following criteria.\

1. The value of co2 should be less than 600, \
2. The value of pm2.5 should be less than 10, \
3. The value of voc should be less than 150.

If not, send an alert stating which zone does not satisfy which criterion.
Otherwise, say that no alert should be sent.
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["output"]
)
alert_sender=LLMChain(
    llm=llm,
    prompt=PROMPT
)

In [14]:
alert_bot= SimpleSequentialChain(chains=[pandas_agent, alert_sender], verbose=True)
alert_bot.run("Get the mean value of co2, pm2.5 and voc of zone level_2_space_z. ")



> Entering new SimpleSequentialChain chain...
co2 mean is 515.347956, pm2.5 mean is 7.496130, and voc mean is 149.304085.
An alert should be sent as the criteria for VOC mean is not satisfied. The VOC mean is 149.304085 which is less than 150, but it is not less than or equal to 150 as required by the criteria.

> Finished chain.


'An alert should be sent as the criteria for VOC mean is not satisfied. The VOC mean is 149.304085 which is less than 150, but it is not less than or equal to 150 as required by the criteria.'

# Retriever 

In [15]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate

In [16]:
loader = CSVLoader(file_path='air_quality.csv')
data = loader.load()

In [17]:
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(data, embeddings)
retriever = vectorstore.as_retriever()

In [18]:
alert_bot = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
air_quality_criteria="1. The average value of co2 should be less than 600 by zone, \
2. The average value of pm2.5 should be less than 10 by zone, \
3. The average value of voc should be less than 150 by zone."

alert_bot.run(
   f"Does zone level_2_space_z satisfy the criteria for air quality in an office building: \n\n {air_quality_criteria} \n\n?"
)

'Based on the given data, we can calculate the average values for each parameter in zone level_2_space_z:\n\n- Average co2: (486.64941341793605 + 640.7694795493051 + 624.4318531986514) / 3 = 583.2832483889642\n- Average pm2.5: (10.735617412553209 + 10.602147333627244 + 6.812933791128792) / 3 = 9.383566845436748\n- Average voc: (178.2196606078645 + 282.6079725734885 + 120.98336366306097) / 3 = 193.6033322814713\n\nBased on these calculations, we can see that the average value of co2 in zone level_2_space_z is less than 600, which satisfies the first criteria. However, the average value of pm2.5 and voc in this zone are both greater than 10 and 150 respectively, which means that this zone does not satisfy the second and third criteria for air quality in an office building.'
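
The answer above averages only three rows, presumably the ones the retriever returned, so it is not the zone-wide mean. Its arithmetic and conclusions can also be checked directly. The sketch below recomputes the three averages from the values quoted in the answer; the variable names are illustrative and the limits are taken from `air_quality_criteria`.

```python
from statistics import mean

# Values quoted verbatim in the model's answer above.
quoted = {
    "co2": [486.64941341793605, 640.7694795493051, 624.4318531986514],
    "pm2.5": [10.735617412553209, 10.602147333627244, 6.812933791128792],
    "voc": [178.2196606078645, 282.6079725734885, 120.98336366306097],
}
limits = {"co2": 600, "pm2.5": 10, "voc": 150}  # from air_quality_criteria

# Recompute each average and compare it with its limit.
recomputed = {name: mean(vals) for name, vals in quoted.items()}
for name, value in recomputed.items():
    status = "ok" if value < limits[name] else "above limit"
    print(f"{name}: {value:.4f} ({status})")
```

Recomputed this way, the pm2.5 average is roughly 9.38, which is below 10, so the claim that pm2.5 exceeds its limit does not follow from the answer's own numbers. The co2 and voc averages also differ slightly from the figures the model reported.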

In [19]:
retriever.get_relevant_documents("What is the co2 value of zone level_2_space_z?")

[Document(page_content=': 20432\nbuilding_id: montmartre_1\nzone: level_2_space_z\nlocal_time: 2021-02-03 00:00:00\nco2: 459.02841578030467\npm2.5: 6.701546912613074\nvoc: 113.76630630008938\noperating_date: 2021-02-03', metadata={'source': 'air_quality.csv', 'row': 20432}),
 Document(page_content=': 25519\nbuilding_id: montmartre_1\nzone: level_2_space_z\nlocal_time: 2021-03-27 23:45:00\nco2: 433.1016248073304\npm2.5: 7.0777416825691155\nvoc: 107.17875350510484\noperating_date: 2021-03-27', metadata={'source': 'air_quality.csv', 'row': 25519}),
 Document(page_content=': 24576\nbuilding_id: montmartre_1\nzone: level_2_space_z\nlocal_time: 2021-03-18 04:00:00\nco2: 324.90747541067236\npm2.5: 3.834183099002247\nvoc: 70.26112321835673\noperating_date: 2021-03-18', metadata={'source': 'air_quality.csv', 'row': 24576}),
 Document(page_content=': 25664\nbuilding_id: montmartre_1\nzone: level_2_space_z\nlocal_time: 2021-03-29 12:00:00\nco2: 570.6097586886901\npm2.5: 13.596116365356881\nvoc: 3