## <span style='color:#ff5f27'> 📝 Imports

In [1]:
from xgboost import XGBRegressor
import hopsworks
from functions.llm_chain import load_model, get_llm_chain, generate_response
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [2]:
project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Get_or_create the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1
)

# Initialize batch scoring
feature_view.init_batch_scoring(1)


## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts  to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [5]:
# Loading the XGBoost regressor model and label encoder from the saved model directory
# model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality

In [6]:
from functions.air_quality_data_retrieval import *
date_start = "2024-02-02"
date_end = "2024-02-04"
res = get_historical_data_in_date_range(date_start, date_end, feature_view, model_air_quality)
print(res)

Finished: Reading data from Hopsworks, using ArrowFlight (1.22s) 
         date  pm25
0  2024-02-02  22.0
1  2024-02-03  12.0
2  2024-02-04  17.0
3  2024-02-05  20.0


## <span style='color:#ff5f27'>⬇️ LLM Loading

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

tokenizer_config.json:   0%|          | 0.00/1.60k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/51.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/101 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/624 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

2024-03-20 10:49:25,327 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/120 [00:00<?, ?B/s]

## <span style='color:#ff5f27'>⛓️ LangChain

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

## <span style='color:#ff5f27'>🧬 Model Inference


In [9]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Wednesday, 2024-03-20
📖 

Hello! How can I assist you with air quality information?


In [10]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Wednesday, 2024-03-20
📖 

I am an AI Air Quality Assistant, designed to provide you with information about air quality in the city provided by you. I can answer your questions about air quality and offer advice based on the data you provide.


In [11]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (1.12s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0
Date: 2024-01-15; Air Quality: 8.0

The average air quality from 2024-01-10 to 2024-01-14 was 10.4. This indicates that the air quality during that period was generally good, with no need to worry about going outside.


In [12]:
QUESTION11 = "When and what was the air quality like last week?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (0.91s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-03-12; Air Quality: 46.0
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0
Date: 2024-03-19; Air Quality: 17.0

Last week, on 2024-03-12, the air quality was 46.0, indicating that the air quality was unhealthy for sensitive groups. On 2024-03-13, the air quality was 51.0, which is also unhealthy for sensitive groups. On 2024-03-14, the air quality improved to 41.0, which was considered unhealthy. On 2024-03-15, the air quality was 54.0, which is unhealthy for sensitive groups. On 2024-03-16, the air quality was 45.0, which is also unhealthy for sensitive groups. On 2024-03-19, the air quality improved to 17.0, which is considered safe for everyone.


In [13]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (0.93s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0
Date: 2024-01-15; Air Quality: 8.0

The minimum air quality from 2024-01-10 to 2024-01-14 was on 2024-01-15, with an air quality of 8.0. This indicates that the air quality during that period was generally good, with no need to worry about going outside.


In [14]:
QUESTION2a = "What was the air quality like last week?"

response2 = generate_response(
    QUESTION2a,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (1.02s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 41.0
Date: 2024-03-15; Air Quality: 54.0
Date: 2024-03-16; Air Quality: 45.0
Date: 2024-03-19; Air Quality: 17.0
Date: 2024-03-20; Air Quality: 17.0

Last week, the air quality was generally good. On 2024-03-19 and 2024-03-20, the air quality was 17.0, indicating that the air quality was safe for everyone. On 2024-03-15, the air quality was 54.0, which is unhealthy for sensitive groups. On 2024-03-16, the air quality was 45.0, which is also unhealthy for sensitive groups. On 2024-03-13, the air quality was 51.0, which is unhealthy for sensitive groups. On 2024-03-14, the air quality was 41.0, which was considered unhealthy.


In [15]:
QUESTION2 = "What was the air quality like yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (0.97s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-03-19; Air Quality: 17.0

Yesterday, the air quality was safe for everyone. The air quality measurement was 17.0, indicating that the air quality was safe for everyone.


In [16]:
QUESTION3 = "What will the air quality be like on 2024-03-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response3)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.53s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:


On 2024-03-20, the air quality was 17.0, indicating that the air quality was safe for everyone. You can go outside and enjoy the day without any concerns about the air quality.


In [17]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response4)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.46s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:


I'm sorry, but I can't predict the air quality for the day after tomorrow. The air quality can change depending on various factors such as weather, pollution sources, and other environmental conditions.


In [18]:
QUESTION5 = "What will the air quality be like this Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response5)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.56s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:


I'm sorry, but I can't predict the air quality for this Sunday. The air quality can change depending on various factors such as weather, pollution sources, and other environmental conditions.


In [19]:
QUESTION7 = "What will the air quality be like for the rest of the week?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response7)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.62s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-03-24 00:00:00; Air Quality: 41.25
Date: 2024-03-25 00:00:00; Air Quality: 58.89
Date: 2024-03-26 00:00:00; Air Quality: 47.22
Date: 2024-03-27 00:00:00; Air Quality: 39.18
Date: 2024-03-28 00:00:00; Air Quality: 39.91

I'm sorry, but I can't predict the air quality for the rest of the week. The air quality can change depending on various factors such as weather, pollution sources, and other environmental conditions.


In [20]:
QUESTION = "Will the air quality be safe or not for the next week?"

response = generate_response(
    QUESTION7, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.45s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:
Date: 2024-03-24 00:00:00; Air Quality: 41.25
Date: 2024-03-25 00:00:00; Air Quality: 58.89
Date: 2024-03-26 00:00:00; Air Quality: 47.22
Date: 2024-03-27 00:00:00; Air Quality: 39.18
Date: 2024-03-28 00:00:00; Air Quality: 39.91

I'm sorry, but I can't predict the air quality for the rest of the week. The air quality can change depending on various factors such as weather, pollution sources, and other environmental conditions.


In [21]:
QUESTION = "Is tomorrow's air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.43s) 
🗓️ Today's date: Wednesday, 2024-03-20
📖 Air Quality Measurements:


I'm sorry, but I can't predict the air quality for tomorrow. The air quality can change depending on various factors such as weather, pollution sources, and other environmental conditions.


In [22]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Wednesday, 2024-03-20
📖 

Of course! Air quality levels are typically measured using an index, such as the Air Quality Index (AQI), which ranges from 0 to 500. Here's a brief explanation of the different air quality levels:

0-50: Good air quality, which means the air is clean and poses little or no risk.

51-100: Moderate air quality, which means the air is generally clean, but there may be some health concerns for sensitive groups, such as the elderly or those with respiratory issues.

101-150: Unhealthy for sensitive groups, which means that although the air quality is still considered moderate, it may pose health risks for certain groups, such as children, the elderly, and those with respiratory issues.

151-200: Unhealthy air quality, which means that the air quality is not safe for the general public, particularly for those with respiratory issues or heart disease.

201-300: Very unhealthy air quality, which means that the air quality is hazardous and can cause s

In [23]:
import gradio as gr
from transformers import pipeline
import numpy as np
import hopsworks
from xgboost import XGBRegressor
from functions.llm_chain import load_model, get_llm_chain, generate_response


2024-03-20 10:51:51,163 INFO: generated new fontManager
2024-03-20 10:51:51,477 INFO: HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"


In [24]:
# Initialize the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim > 1 and y.shape[1] > 1:
        y = np.mean(y, axis=1)
    y /= np.max(np.abs(y))
    return transcriber({"sampling_rate": sr, "raw": y})["text"]

def generate_query_response(user_query):
    response = generate_response(
        user_query,
        feature_view,
        model_llm,
        tokenizer,
        model_air_quality,
        llm_chain,
        verbose=False,
    )
    return response

def handle_input(text_input=None, audio_input=None):
    if audio_input is not None:
        user_query = transcribe(audio_input)
    else:
        user_query = text_input
    
    if user_query:
        return generate_query_response(user_query)
    else:
        return "Please provide input either via text or voice."

iface = gr.Interface(
    fn=handle_input,
    inputs=[gr.Textbox(placeholder="Type here or use voice input..."), gr.Audio()],
    outputs="text",
    title="🌤️ AirQuality AI Assistant 💬",
    description="Ask your questions about air quality or use your voice to interact."
)

iface.launch(share=True)


config.json:   0%|          | 0.00/1.94k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/290M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/1.53k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/805 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.41M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

normalizer.json:   0%|          | 0.00/52.7k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/34.6k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/1.83k [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/185k [00:00<?, ?B/s]

Running on local URL:  http://127.0.0.1:7860
2024-03-20 10:52:00,100 INFO: HTTP Request: GET http://127.0.0.1:7860/startup-events "HTTP/1.1 200 OK"
2024-03-20 10:52:00,174 INFO: HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
2024-03-20 10:52:00,689 INFO: HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-03-20 10:52:00,944 INFO: HTTP Request: POST https://api.gradio.app/gradio-initiated-analytics/ "HTTP/1.1 200 OK"
2024-03-20 10:52:02,176 INFO: HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"
2024-03-20 10:52:11,599 INFO: HTTP Request: GET https://api.gradio.app/v2/tunnel-request "HTTP/1.1 200 OK"
2024-03-20 10:52:11,732 INFO: HTTP Request: GET https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64 "HTTP/1.1 200 OK"
Running on public URL: https://b28c6fa14cfcdba855.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://h



2024-03-20 10:52:13,992 INFO: HTTP Request: POST https://api.gradio.app/gradio-launched-telemetry/ "HTTP/1.1 200 OK"


Traceback (most recent call last):
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/gradio/queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/gradio/route_utils.py", line 253, in call_process_api
    output = await app.get_blocks().process_api(
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/gradio/blocks.py", line 1695, in process_api
    result = await self.call_function(
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/gradio/blocks.py", line 1235, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/srv/hops/anaconda/envs/theenv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread

---