## <span style='color:#ff5f27'> 📝 Imports

In [1]:
from xgboost import XGBRegressor
import hopsworks
from functions.llm_chain import load_model, get_llm_chain, generate_response
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [2]:
project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Get_or_create the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1
)

# Initialize batch scoring
feature_view.init_batch_scoring(1)


## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts  to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [5]:
# Loading the XGBoost regressor model and label encoder from the saved model directory
# model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality

In [6]:
from functions.air_quality_data_retrieval import *
date_start = "2024-02-02"
date_end = "2024-02-04"
res = get_historical_data_in_date_range(date_start, date_end, feature_view, model_air_quality)
print(res)

Finished: Reading data from Hopsworks, using ArrowFlight (0.82s) 
         date  pm25
0  2024-02-02  22.0
1  2024-02-03  12.0
2  2024-02-04  17.0


## <span style='color:#ff5f27'>⬇️ LLM Loading

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## <span style='color:#ff5f27'>⛓️ LangChain

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

## <span style='color:#ff5f27'>🧬 Model Inference


In [9]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Monday, 2024-03-18
📖 

Hello! How can I help you with air quality information?


In [10]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-03-18
📖 

I am an AI Air Quality Assistant, here to provide you with information about air quality in your city. Please let me know how I can assist you with that.


In [11]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (0.87s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The average air quality from 2024-01-10 to 2024-01-14 was 10.6.


In [12]:
QUESTION11 = "When and what was the air quality like last week?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (0.78s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-03-11; Air Quality: 26.0
Date: 2024-03-12; Air Quality: 46.0
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0

The air quality last week was as follows:

- On 2024-03-11, the air quality was 26.0, which is considered to be good for air quality.
- On 2024-03-12, the air quality was 46.0, which is also considered to be good for air quality.
- On 2024-03-13, the air quality was 51.0, which is considered to be good for air quality.
- On 2024-03-14, the air quality was 41.0, which is considered to be good for air quality.
- On 2024-03-15, the air quality was 54.0, which is considered to be good for air quality.
- On 2024-03-16, the air quality was 45.0, which is considered to be good for air quality.

Overall, the air quality last week was good, and it was sa

In [13]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (0.80s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The minimum air quality from 2024-01-10 to 2024-01-14 was on 2024-01-13, with an air quality level of 9.0.


In [14]:
QUESTION2a = "What was the air quality like last week?"

response2 = generate_response(
    QUESTION2a,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (0.85s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-03-11; Air Quality: 26.0
Date: 2024-03-12; Air Quality: 46.0
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0

The air quality last week was as follows:

- On 2024-03-11, the air quality was 26.0, which is considered to be good for air quality.
- On 2024-03-12, the air quality was 46.0, which is also considered to be good for air quality.
- On 2024-03-13, the air quality was 51.0, which is considered to be good for air quality.
- On 2024-03-14, the air quality was 41.0, which is considered to be good for air quality.
- On 2024-03-15, the air quality was 54.0, which is considered to be good for air quality.
- On 2024-03-16, the air quality was 45.0, which is considered to be good for air quality.

Overall, the air quality last week was good, and it was sa

In [15]:
QUESTION2 = "What was the air quality like yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (0.87s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:


I'm sorry, but I don't have information about the air quality for yesterday. Could you please provide more context or a specific date for which you would like to know the air quality?


In [16]:
QUESTION3 = "What will the air quality be like on 2024-03-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response3)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.44s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:


I'm sorry, but I don't have a prediction for the air quality on 2024-03-20. Could you please provide more context or a specific date for which you would like to know the air quality?


In [17]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response4)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.39s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:


I'm sorry, but I don't have a prediction for the air quality the day after tomorrow. Could you please provide more context or a specific date for which you would like to know the air quality?


In [18]:
QUESTION5 = "What will the air quality be like on Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response5)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.40s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:


I'm sorry, but I don't have information about the air quality for Sunday. Could you please provide more context or a specific date for which you would like to know the air quality?


In [19]:
QUESTION7 = "What will the air quality be like for the rest of the week?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response7)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.40s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-03-18 00:00:00; Air Quality: 35.51
Date: 2024-03-19 00:00:00; Air Quality: 47.81
Date: 2024-03-20 00:00:00; Air Quality: 38.51
Date: 2024-03-21 00:00:00; Air Quality: 36.06
Date: 2024-03-22 00:00:00; Air Quality: 40.23
Date: 2024-03-23 00:00:00; Air Quality: 24.64
Date: 2024-03-24 00:00:00; Air Quality: 28.36

I can provide information about the air quality for the rest of the week based on the measurements provided. 

For Tuesday, 2024-03-19, the air quality is expected to be 47.81. This is in the moderate range (40-50), which means that the air may be unhealthy for sensitive groups, such as children, the elderly, and people with re

In [20]:
QUESTION = "Will the air quality be safe or not for the next week?"

response = generate_response(
    QUESTION7, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.50s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:
Date: 2024-03-18 00:00:00; Air Quality: 35.51
Date: 2024-03-19 00:00:00; Air Quality: 47.81
Date: 2024-03-20 00:00:00; Air Quality: 38.51
Date: 2024-03-21 00:00:00; Air Quality: 36.06
Date: 2024-03-22 00:00:00; Air Quality: 40.23
Date: 2024-03-23 00:00:00; Air Quality: 24.64
Date: 2024-03-24 00:00:00; Air Quality: 28.36

Based on the measurements provided, the air quality for the rest of the week is as follows:

For Tuesday, 2024-03-19, the air quality is expected to be 47.81. This is in the moderate range (40-50), which means that the air may be unhealthy for sensitive groups, such as children, the elderly, and people with respiratory issues.


In [21]:
QUESTION = "Is tomorrow's air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.40s) 
🗓️ Today's date: Monday, 2024-03-18
📖 Air Quality Measurements:


Based on the measurements provided, the air quality for tomorrow, 2024-03-19, is expected to be 47.81. This is in the moderate range (40-50), which means that the air may be unhealthy for sensitive groups, such as children, the elderly, and people with respiratory issues. While it is not considered dangerous, it is recommended to avoid outdoor activities for sensitive groups.


In [22]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-03-18
📖 

Certainly! Air quality levels are typically measured on a scale, and different organizations use different scales to rate air quality. One common scale is the Air Quality Index (AQI), which ranges from 0 to 500, with higher numbers indicating more pollution. Here are the general categories for AQI levels:

0-50: Good
51-100: Moderate
101-150: Unhealthy for Sensitive Groups
151-200: Unhealthy
201-300: Very Unhealthy
301-500: Hazardous

Another scale used is the Air Quality Health Index (AQHI), which ranges from 1 to 10+, with higher numbers indicating more health risks. Here are the general categories for AQHI levels:

1-3: Low Risk
4-6: Moderate Risk
7-10: High Risk
11+: Very High Risk

These scales help to communicate the potential health risks associated with different levels of air pollution.


In [23]:
import gradio as gr
from transformers import pipeline
import numpy as np
import hopsworks
from xgboost import XGBRegressor
from functions.llm_chain import load_model, get_llm_chain, generate_response


2024-03-18 07:30:26,930 INFO: HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"


In [24]:
# Initialize the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim > 1 and y.shape[1] > 1:
        y = np.mean(y, axis=1)
    y /= np.max(np.abs(y))
    return transcriber({"sampling_rate": sr, "raw": y})["text"]

def generate_query_response(user_query):
    response = generate_response(
        user_query,
        feature_view,
        model_llm,
        tokenizer,
        model_air_quality,
        llm_chain,
        verbose=False,
    )
    return response

def handle_input(text_input=None, audio_input=None):
    if audio_input is not None:
        user_query = transcribe(audio_input)
    else:
        user_query = text_input
    
    if user_query:
        return generate_query_response(user_query)
    else:
        return "Please provide input either via text or voice."

iface = gr.Interface(
    fn=handle_input,
    inputs=[gr.Textbox(placeholder="Type here or use voice input..."), gr.Audio()],
    outputs="text",
    title="🌤️ AirQuality AI Assistant 💬",
    description="Ask your questions about air quality or use your voice to interact."
)

iface.launch(share=True)


Running on local URL:  http://127.0.0.1:7860
2024-03-18 07:30:28,744 INFO: HTTP Request: GET http://127.0.0.1:7860/startup-events "HTTP/1.1 200 OK"
2024-03-18 07:30:28,812 INFO: HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
2024-03-18 07:30:29,229 INFO: HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-03-18 07:30:29,625 INFO: HTTP Request: POST https://api.gradio.app/gradio-initiated-analytics/ "HTTP/1.1 200 OK"
2024-03-18 07:30:30,801 INFO: HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"
2024-03-18 07:30:35,260 INFO: HTTP Request: GET https://api.gradio.app/v2/tunnel-request "HTTP/1.1 200 OK"
Running on public URL: https://ae18888cf065580cbe.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
2024-03-18 07:30:36,865 INFO: HTTP Request: HEAD https://ae18888cf065580cbe.gradio.live "HTTP/1.1 200 OK"




2024-03-18 07:30:37,595 INFO: HTTP Request: POST https://api.gradio.app/gradio-launched-telemetry/ "HTTP/1.1 200 OK"


---