## <span style='color:#ff5f27'> 📝 Imports

In [1]:
from xgboost import XGBRegressor
import hopsworks
from functions.llm_chain import load_model, get_llm_chain, generate_response
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [2]:
project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Get_or_create the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1
)

# Initialize batch scoring
feature_view.init_batch_scoring(1)


## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts  to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [5]:
# Loading the XGBoost regressor model and label encoder from the saved model directory
# model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality

In [6]:
from functions.air_quality_data_retrieval import *
date_start = "2024-02-02"
date_end = "2024-02-04"
res = get_historical_data_in_date_range(date_start, date_end, feature_view, model_air_quality)
print(res)

Finished: Reading data from Hopsworks, using ArrowFlight (1.32s) 
         date  pm25
0  2024-02-02  22.0
1  2024-02-03  12.0
2  2024-02-04  17.0


In [7]:
res = get_historical_data_in_date_range(date_start, date_end, feature_view, model_air_quality)


Finished: Reading data from Hopsworks, using ArrowFlight (1.22s) 


## <span style='color:#ff5f27'>⬇️ LLM Loading

In [8]:
import time
start_time = time.time()

# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

duration = time.time() - start_time
print(f"The code execution took {duration} seconds.")

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/624 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

2024-03-25 12:15:28,854 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/120 [00:00<?, ?B/s]

The code execution took 148.75465869903564 seconds.


## <span style='color:#ff5f27'>⛓️ LangChain

In [9]:
import time
start_time = time.time()


# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

duration = time.time() - start_time
print(f"The code execution took {duration} seconds.")

The code execution took 0.6608378887176514 seconds.


## <span style='color:#ff5f27'>🧬 Model Inference


In [10]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Monday, 2024-03-25
📖 

Hello! How can I help you with air quality information?


In [11]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-03-25
📖 

I am an AI Air Quality Assistant, here to help you with air quality information. I can provide you with the air quality indicators for a specific date or location, and offer advice on whether it's safe to go outside or not.


In [12]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (1.28s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The average air quality from 2024-01-10 to 2024-01-14 is 10.6. The air quality levels ranged from safe to moderate, and it would be suitable to go outside for most activities.


In [13]:
QUESTION11 = "When and what was the air quality like last week?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (1.18s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-03-19; Air Quality: 17.0
Date: 2024-03-20; Air Quality: 17.0
Date: 2024-03-21; Air Quality: 41.0
Date: 2024-03-22; Air Quality: 14.0
Date: 2024-03-23; Air Quality: 20.0

Last week, on 2024-03-19, the air quality was 17.0, which is considered moderate. The air quality levels ranged from moderate to high, and it was suitable to go outside for most activities.


In [14]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (1.28s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The minimum air quality from 2024-01-10 to 2024-01-14 was on 2024-01-13 with an air quality level of 8.0, which is considered safe. It would be suitable to go outside for most activities.


In [15]:
QUESTION2a = "What was the air quality like last week?"

response2 = generate_response(
    QUESTION2a,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (1.22s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-03-19; Air Quality: 17.0
Date: 2024-03-20; Air Quality: 17.0
Date: 2024-03-21; Air Quality: 41.0
Date: 2024-03-22; Air Quality: 14.0
Date: 2024-03-23; Air Quality: 20.0

Last week, on 2024-03-19, the air quality was 17.0, which is considered moderate. The air quality levels ranged from moderate to high, and it was suitable to go outside for most activities.


In [16]:
QUESTION2 = "What was the air quality like yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (1.31s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:


Yesterday, on 2024-03-24, the air quality was 15.0, which is considered moderate. The air quality levels ranged from moderate to high, and it was suitable to go outside for most activities.


In [17]:
QUESTION3 = "What will the air quality be like on 2024-03-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response3)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.55s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:


On 2024-03-20, the air quality was 16.0, which is considered moderate. The air quality levels ranged from moderate to high, and it was suitable to go outside for most activities.


In [18]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response4)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.65s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:


On the day after tomorrow, 2024-03-27, the air quality will be 18.0, which is considered moderate. The air quality levels ranged from moderate to high, and it was suitable to go outside for most activities.


In [19]:
QUESTION5 = "What will the air quality be like this Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response5)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.79s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:


On Sunday, 2024-03-31, the air quality will be 19.0, which is considered high. The air quality levels ranged from moderate to high, and it is recommended to avoid outdoor activities, especially for sensitive groups such as children, the elderly, and those with respiratory issues.


In [20]:
QUESTION7 = "What will the air quality be like for the rest of the week?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response7)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.57s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-03-25 00:00:00; Air Quality: 46.52
Date: 2024-03-26 00:00:00; Air Quality: 43.97
Date: 2024-03-27 00:00:00; Air Quality: 21.64
Date: 2024-03-28 00:00:00; Air Quality: 24.7
Date: 2024-03-29 00:00:00; Air Quality: 13.1
Date: 2024-03-30 00:00:00; Air Quality: 42.3
Date: 2024-03-31 00:00:00; Air Quality: 24.03

For the rest of the week, the air quality will range from moderate to high. Specifically, on 2024-03-26, the air quality will be 43.97, which is considered high. On 2024-03-27, the air quality will be 21.64, which is considered moderate. On 2024-03-28, the air quality will be 24.7, which is also considered moderate. On 2024-03-29,

In [21]:
QUESTION = "Will the air quality be safe or not for the next week?"

response = generate_response(
    QUESTION7, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.59s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:
Date: 2024-03-25 00:00:00; Air Quality: 46.52
Date: 2024-03-26 00:00:00; Air Quality: 43.97
Date: 2024-03-27 00:00:00; Air Quality: 21.64
Date: 2024-03-28 00:00:00; Air Quality: 24.7
Date: 2024-03-29 00:00:00; Air Quality: 13.1
Date: 2024-03-30 00:00:00; Air Quality: 42.3
Date: 2024-03-31 00:00:00; Air Quality: 24.03

For the rest of the week, the air quality will range from moderate to high. Specifically, on 2024-03-26, the air quality will be 43.97, which is considered high. On 2024-03-27, the air quality will be 21.64, which is considered moderate. On 2024-03-28, the air quality will be 24.7, which is also considered moderate. On 2024-03-29,

In [22]:
QUESTION = "Is tomorrow's air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.57s) 
🗓️ Today's date: Monday, 2024-03-25
📖 Air Quality Measurements:


On Monday, 2024-03-25, the air quality will be 43.97, which is considered high. Tomorrow, 2024-03-26, the air quality will be 21.64, which is considered moderate. While the air quality level is not dangerous, it is still recommended to avoid outdoor activities, especially for sensitive groups such as children, the elderly, and those with respiratory issues.


In [23]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-03-25
📖 

Certainly! Air quality levels are typically categorized into different levels, each with their own description and recommended actions. Here are some common air quality levels and their descriptions:

1. Good (0-50): Air quality is considered good when the air quality index (AQI) falls within this range. It is suitable for all outdoor activities.
2. Moderate (51-100): The air quality is acceptable, but sensitive groups such as children, the elderly, and those with respiratory issues should avoid prolonged outdoor activities.
3. High (101-150): The air quality is considered unhealthy for sensitive groups, and outdoor activities should be limited.
4. Very High (151-200): The air quality is unhealthy, and everyone should avoid prolonged outdoor activities, especially sensitive groups.
5. Hazardous (200+): The air quality is extremely unhealthy, and everyone should avoid all outdoor activities.

These categories are general guidelines, and the specif

In [24]:
import gradio as gr
from transformers import pipeline
import numpy as np
import hopsworks
from xgboost import XGBRegressor
from functions.llm_chain import load_model, get_llm_chain, generate_response


2024-03-25 12:18:05,390 INFO: generated new fontManager
2024-03-25 12:18:05,716 INFO: HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"


In [25]:
# Initialize the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim > 1 and y.shape[1] > 1:
        y = np.mean(y, axis=1)
    y /= np.max(np.abs(y))
    return transcriber({"sampling_rate": sr, "raw": y})["text"]

def generate_query_response(user_query):
    response = generate_response(
        user_query,
        feature_view,
        model_llm,
        tokenizer,
        model_air_quality,
        llm_chain,
        verbose=False,
    )
    return response

def handle_input(text_input=None, audio_input=None):
    if audio_input is not None:
        user_query = transcribe(audio_input)
    else:
        user_query = text_input
    
    if user_query:
        return generate_query_response(user_query)
    else:
        return "Please provide input either via text or voice."

iface = gr.Interface(
    fn=handle_input,
    inputs=[gr.Textbox(placeholder="Type here or use voice input..."), gr.Audio()],
    outputs="text",
    title="🌤️ AirQuality AI Assistant 💬",
    description="Ask your questions about air quality or use your voice to interact."
)

iface.launch(share=True)


config.json:   0%|          | 0.00/1.94k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/290M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/1.53k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/805 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.41M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

normalizer.json:   0%|          | 0.00/52.7k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/34.6k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/1.83k [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/185k [00:00<?, ?B/s]

Running on local URL:  http://127.0.0.1:7860
2024-03-25 12:18:15,242 INFO: HTTP Request: GET http://127.0.0.1:7860/startup-events "HTTP/1.1 200 OK"
2024-03-25 12:18:15,327 INFO: HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
2024-03-25 12:18:15,823 INFO: HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-03-25 12:18:16,101 INFO: HTTP Request: POST https://api.gradio.app/gradio-initiated-analytics/ "HTTP/1.1 200 OK"
2024-03-25 12:18:17,308 INFO: HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"
2024-03-25 12:18:22,772 INFO: HTTP Request: GET https://api.gradio.app/v2/tunnel-request "HTTP/1.1 200 OK"
2024-03-25 12:18:22,895 INFO: HTTP Request: GET https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64 "HTTP/1.1 200 OK"
Running on public URL: https://49e3bc82db8aedbe3e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://h



2024-03-25 12:18:25,193 INFO: HTTP Request: POST https://api.gradio.app/gradio-launched-telemetry/ "HTTP/1.1 200 OK"
Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Finished: Reading data from Hopsworks, using ArrowFlight (0.58s) 


---