## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
from xgboost import XGBRegressor
import hopsworks
from functions.llm_chain import load_model, get_llm_chain, generate_response
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [2]:
project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Retrieve the existing 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1
)

# Initialize batch scoring against training dataset version 1
feature_view.init_batch_scoring(1)


## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [5]:
# Load the XGBoost regressor from the saved model directory
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality
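The cell above builds the model path by string concatenation. As a minimal sketch, assuming the artifact is named `model.json` as in that cell, the path can be built and checked before loading. The helper name `resolve_model_path` is hypothetical and not part of the notebook's `functions` package.

```python
from pathlib import Path


def resolve_model_path(model_dir: str, filename: str = "model.json") -> Path:
    """Return the path to the model file, failing early if it is missing."""
    path = Path(model_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected model file at {path}")
    return path
```

It could then be used as `model_air_quality.load_model(str(resolve_model_path(saved_model_dir)))`, which surfaces a missing artifact with a clear message before XGBoost tries to parse it.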

In [6]:
from functions.air_quality_data_retrieval import get_historical_data_in_date_range
date_start = "2024-02-02"
date_end = "2024-02-04"
res = get_historical_data_in_date_range(date_start, date_end, feature_view, model_air_quality)
print(res)

Finished: Reading data from Hopsworks, using ArrowFlight (0.91s) 
         date  pm25
0  2024-02-02  22.0
1  2024-02-03  12.0
2  2024-02-04  17.0
3  2024-02-05  20.0
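The frame printed above contains a row for 2024-02-05 even though `date_end` is 2024-02-04. The internals of `get_historical_data_in_date_range` are not shown here, so the following is only a sketch of clipping a result to an inclusive date range. It uses plain tuples instead of a DataFrame to stay dependency-free, and `clip_to_range` is a hypothetical helper.

```python
from datetime import date

# Rows copied from the output printed above.
rows = [
    ("2024-02-02", 22.0),
    ("2024-02-03", 12.0),
    ("2024-02-04", 17.0),
    ("2024-02-05", 20.0),
]


def clip_to_range(rows, date_start, date_end):
    """Keep only rows whose date lies in [date_start, date_end], inclusive."""
    start = date.fromisoformat(date_start)
    end = date.fromisoformat(date_end)
    return [(d, v) for d, v in rows if start <= date.fromisoformat(d) <= end]


clipped = clip_to_range(rows, "2024-02-02", "2024-02-04")
print(clipped)
```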


## <span style='color:#ff5f27'>⬇️ LLM Loading </span>

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

## <span style='color:#ff5f27'>⛓️ LangChain </span>

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

## <span style='color:#ff5f27'>🧬 Model Inference </span>


In [9]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Tuesday, 2024-03-19
📖 

Hello! How can I assist you with air quality information?


In [10]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Tuesday, 2024-03-19
📖 

I am an AI Air Quality Assistant, here to help you with air quality information.


In [11]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (0.89s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0
Date: 2024-01-15; Air Quality: 8.0

The average air quality from 2024-01-10 till 2024-01-14 was 10.6. The air quality during that period ranged from safe to moderately polluted, so it would be advisable to limit outdoor activities on days with higher pollution levels.
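The reported average can be checked directly against the measurements in the retrieved context. The sketch below copies those values by hand. It also shows what the average would be if the extra 2024-01-15 row, which falls outside the requested range, were included. `average_in_range` is a hypothetical helper.

```python
from statistics import mean

# Values copied from the retrieved context printed above.
measurements = {
    "2024-01-10": 9.0,
    "2024-01-11": 8.0,
    "2024-01-12": 9.0,
    "2024-01-13": 14.0,
    "2024-01-14": 13.0,
    "2024-01-15": 8.0,
}


def average_in_range(measurements, start, end):
    """Average the values whose ISO date lies in [start, end], inclusive."""
    # ISO-8601 date strings sort in chronological order, so string
    # comparison is enough here.
    values = [v for d, v in measurements.items() if start <= d <= end]
    return mean(values)


requested = average_in_range(measurements, "2024-01-10", "2024-01-14")
with_extra_day = average_in_range(measurements, "2024-01-10", "2024-01-15")
print(round(requested, 1), round(with_extra_day, 2))
```

The five in-range values sum to 53, so the 10.6 in the answer matches the requested range rather than all six retrieved rows.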


In [12]:
QUESTION11 = "When and what was the air quality like last week?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (1.04s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:
Date: 2024-03-12; Air Quality: 46.0
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0

Last week, the air quality was as follows:

- On 2024-03-12, the air quality was 46.0, which indicates very polluted air. It is not recommended to engage in outdoor activities on this day.
- On 2024-03-13, the air quality was 51.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.
- On 2024-03-14, the air quality was 41.0, which indicates moderately polluted air. It would be advisable to limit outdoor activities on this day.
- On 2024-03-15, the air quality was 54.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.
- On 2024-03-16, the air quality was 45.0, which indicates moderately po

In [13]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (0.82s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0
Date: 2024-01-15; Air Quality: 8.0

The minimum air quality from 2024-01-10 till 2024-01-14 was on 2024-01-15, with an air quality of 8.0. This indicates clean air, and it is safe to engage in outdoor activities.
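The answer above names 2024-01-15, which lies outside the requested range. The retrieved context includes that extra row, and within 2024-01-10 to 2024-01-14 the same value of 8.0 occurs on 2024-01-11. A minimal sketch of computing the minimum deterministically, with values copied from the context and a hypothetical helper `minimum_in_range`:

```python
# (date, value) pairs copied from the retrieved context printed above.
measurements = [
    ("2024-01-10", 9.0),
    ("2024-01-11", 8.0),
    ("2024-01-12", 9.0),
    ("2024-01-13", 14.0),
    ("2024-01-14", 13.0),
    ("2024-01-15", 8.0),
]


def minimum_in_range(measurements, start, end):
    """Return the (date, value) pair with the lowest value in [start, end]."""
    in_range = [(d, v) for d, v in measurements if start <= d <= end]
    # min() returns the first of several equal minima, i.e. the earliest date.
    return min(in_range, key=lambda item: item[1])


day, value = minimum_in_range(measurements, "2024-01-10", "2024-01-14")
print(day, value)
```

Computing aggregates like this outside the LLM and passing only the result into the prompt is one way to avoid this kind of off-range answer.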


In [14]:
QUESTION2a = "What was the air quality like last week?"

response2a = generate_response(
    QUESTION2a,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2a)

Finished: Reading data from Hopsworks, using ArrowFlight (0.83s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:
Date: 2024-03-12; Air Quality: 46.0
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0

Last week, the air quality was as follows:

- On 2024-03-12, the air quality was 46.0, which indicates very polluted air. It is not recommended to engage in outdoor activities on this day.
- On 2024-03-13, the air quality was 51.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.
- On 2024-03-14, the air quality was 41.0, which indicates moderately polluted air. It would be advisable to limit outdoor activities on this day.
- On 2024-03-15, the air quality was 54.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.
- On 2024-03-16, the air quality was 45.0, which indicates moderately po

In [15]:
QUESTION2 = "What was the air quality like yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (0.88s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:


Yesterday, on 2024-03-18, the air quality was 48.0, which indicates very polluted air. It is not recommended to engage in outdoor activities on this day.


In [16]:
QUESTION3 = "What will the air quality be like on 2024-03-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response3)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.35s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:


On 2024-03-20, the air quality is expected to be 50.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.


In [17]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response4)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.41s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:


On 2024-03-21, the air quality is expected to be 49.0, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.


In [18]:
QUESTION5 = "What will the air quality be like this Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response5)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.40s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:


On Sunday, 2024-03-24, the air quality is expected to be 48.0, which indicates very polluted air. It is not recommended to engage in outdoor activities on this day.


In [19]:
QUESTION7 = "What will the air quality be like for the rest of the week?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response7)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.37s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:
Date: 2024-03-19 00:00:00; Air Quality: 47.81
Date: 2024-03-20 00:00:00; Air Quality: 38.51
Date: 2024-03-21 00:00:00; Air Quality: 36.06
Date: 2024-03-22 00:00:00; Air Quality: 40.23
Date: 2024-03-23 00:00:00; Air Quality: 24.64
Date: 2024-03-24 00:00:00; Air Quality: 28.36
Date: 2024-03-25 00:00:00; Air Quality: 18.81

The air quality for the rest of the week is expected to be as follows:

- On Wednesday, 2024-03-20, the air quality is expected to be 38.51, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day.
- On Thursday, 2024-03-21, the air quality is expected to be 36.06, which indicates extrem

In [20]:
QUESTION = "Will the air quality be safe or not for the next week?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)


In [21]:
QUESTION = "Is tomorrow's air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.42s) 
🗓️ Today's date: Tuesday, 2024-03-19
📖 Air Quality Measurements:


On Wednesday, 2024-03-20, the air quality is expected to be 38.51, which indicates extremely polluted air. It is advisable to limit outdoor activities on this day. While it is not considered dangerous, it is not recommended for sensitive individuals or those with respiratory issues to engage in prolonged outdoor activities.


In [22]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Tuesday, 2024-03-19
📖 

Certainly! Air quality levels are typically measured on a scale, with different levels indicating varying degrees of air pollution. Here is a general breakdown of air quality levels:

1. Good (0-50): Air quality is considered good, and it is safe for everyone to engage in outdoor activities.
2. Moderate (51-100): Air quality is acceptable, but sensitive groups (such as children, the elderly, and those with respiratory issues) may want to limit prolonged exposure.
3. Poor (101-150): Air quality is not considered healthy, and groups sensitive to air pollution may experience health effects. It is advisable to limit outdoor activities.
4. Very Poor (151-200): Air quality is significantly polluted, and the general public may experience health effects. It is not recommended to engage in outdoor activities.
5. Hazardous (200+): Air quality is extremely polluted, and it is dangerous for everyone to engage in outdoor activities.

These levels may vary de
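The scale listed in this answer can be turned into a small lookup. This is a sketch that assumes exactly the thresholds quoted above. The scale used inside the assistant's prompt is not shown in this notebook and may differ. `air_quality_level` is a hypothetical helper, and a value of exactly 200 is treated as "Very Poor".

```python
from bisect import bisect_left

# Upper bounds of the first four bands from the answer above.
UPPER_BOUNDS = [50, 100, 150, 200]
LABELS = ["Good", "Moderate", "Poor", "Very Poor", "Hazardous"]


def air_quality_level(value: float) -> str:
    """Map a reading to the band whose upper bound is the first one >= value."""
    if value < 0:
        raise ValueError("air quality value must be non-negative")
    return LABELS[bisect_left(UPPER_BOUNDS, value)]


for reading in (38.51, 54.0, 250.0):
    print(reading, air_quality_level(reading))
```

Under these thresholds a reading of 38.51 falls in the "Good" band, while earlier answers describe the same value as "extremely polluted", so the wording of individual answers should not be read as following this scale.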

In [23]:
import gradio as gr
from transformers import pipeline
import numpy as np
import hopsworks
from xgboost import XGBRegressor
from functions.llm_chain import load_model, get_llm_chain, generate_response




In [24]:
# Initialize the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

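def transcribe(audio):
    # Gradio passes audio as a (sampling_rate, samples) tuple
    sr, y = audio
    y = y.astype(np.float32)
    # Downmix multi-channel audio to mono
    if y.ndim > 1 and y.shape[1] > 1:
        y = np.mean(y, axis=1)
    # Normalize to [-1, 1]; skip silent clips to avoid dividing by zero
    peak = np.max(np.abs(y))
    if peak > 0:
        y /= peak
    return transcriber({"sampling_rate": sr, "raw": y})["text"]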

def generate_query_response(user_query):
    response = generate_response(
        user_query,
        feature_view,
        model_llm,
        tokenizer,
        model_air_quality,
        llm_chain,
        verbose=False,
    )
    return response

def handle_input(text_input=None, audio_input=None):
    if audio_input is not None:
        user_query = transcribe(audio_input)
    else:
        user_query = text_input
    
    if user_query:
        return generate_query_response(user_query)
    else:
        return "Please provide input either via text or voice."

iface = gr.Interface(
    fn=handle_input,
    inputs=[gr.Textbox(placeholder="Type here or use voice input..."), gr.Audio()],
    outputs="text",
    title="🌤️ AirQuality AI Assistant 💬",
    description="Ask your questions about air quality or use your voice to interact."
)

iface.launch(share=True)


Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://2a42de7877ff3aa594.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://h
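The routing in `handle_input` (voice takes priority over text, with a fallback prompt when both are empty) can be exercised without loading Whisper or the LLM. The sketch below assumes the same branching as the cell above but injects the two dependencies as stubs. `make_handler` and the stub lambdas are hypothetical and not part of the notebook.

```python
from typing import Callable, Optional


def make_handler(
    transcribe_fn: Callable[[object], str],
    respond_fn: Callable[[str], str],
) -> Callable[..., str]:
    """Build a handler with the same branching as handle_input."""

    def handle(text_input: Optional[str] = None, audio_input: object = None) -> str:
        # Audio wins when both inputs are supplied.
        if audio_input is not None:
            user_query = transcribe_fn(audio_input)
        else:
            user_query = text_input
        if user_query:
            return respond_fn(user_query)
        return "Please provide input either via text or voice."

    return handle


# Stubs stand in for the ASR pipeline and the LLM chain.
handler = make_handler(lambda audio: "spoken question", lambda q: f"answer to: {q}")
print(handler(text_input="typed question"))
print(handler(text_input="typed question", audio_input=(16000, [0.0])))
print(handler())
```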





---