## <span style='color:#ff5f27'> 📝 Imports

In [1]:
from xgboost import XGBRegressor
import hopsworks
from functions.llm_chain import load_model, get_llm_chain, generate_response
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [2]:
project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/8321
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> ⚙️ Feature View Creation</span>

In [3]:
# Retrieve feature groups
air_quality_fg = fs.get_feature_group(
    name='air_quality',
    version=1,
)
weather_fg = fs.get_feature_group(
    name='weather',
    version=1,
)

In [4]:
# Select features for training data.
selected_features = air_quality_fg.select(['date', 'pm25']).join(
    weather_fg.select(['temperature_2m_mean', 'precipitation_sum', 'wind_speed_10m_max', 'wind_direction_10m_dominant']), 
    on=['city'],
)

In [5]:
# Get_or_create the 'air_quality_fv' feature view
feature_view = fs.get_or_create_feature_view(
    name='air_quality_fv',
    version=2,
    query=selected_features,
)

# Initialize batch scoring
feature_view.init_batch_scoring(1)

data, _ = feature_view.training_data()
data.head(3)

Finished: Reading data from Hopsworks, using ArrowFlight (1.25s) 


Unnamed: 0,date,pm25,temperature_2m_mean,precipitation_sum,wind_speed_10m_max,wind_direction_10m_dominant
0,2017-10-04 00:00:00+00:00,13.0,10.587333,2.9,22.206486,248.656326
1,2017-10-05 00:00:00+00:00,9.0,8.433167,2.6,16.595179,306.52124
2,2017-10-06 00:00:00+00:00,8.0,8.247749,0.5,23.871555,320.408325


## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [6]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts  to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [7]:
# Loading the XGBoost regressor model and label encoder from the saved model directory
# model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality

## <span style='color:#ff5f27'>⬇️ LLM Loading

In [8]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


2024-03-13 15:48:17,606 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## <span style='color:#ff5f27'>⛓️ LangChain

In [9]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

## <span style='color:#ff5f27'>🧬 Model Inference


In [10]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Wednesday, 2024-03-13
📖 

Hello! How can I assist you with air quality information today?


In [11]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Wednesday, 2024-03-13
📖 

I am an AI-powered assistant designed to provide air quality information based on the data provided by the user. I am here to help you with any questions you may have about air quality.


In [12]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (0.99s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The average air quality from 2024-01-10 to 2024-01-14 was 10.6. This indicates that the air quality was generally moderate during that period.


In [13]:
QUESTION11 = "When and what was the maximum air quality from 2024-01-10 till 2024-01-14?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (1.00s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The maximum air quality from 2024-01-10 to 2024-01-14 was on 2024-01-13 with an air quality level of 14.0. This indicates that the air quality on that day was considered unhealthy for sensitive groups.


In [14]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (1.00s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-01-10; Air Quality: 9.0
Date: 2024-01-11; Air Quality: 8.0
Date: 2024-01-12; Air Quality: 9.0
Date: 2024-01-13; Air Quality: 14.0
Date: 2024-01-14; Air Quality: 13.0

The minimum air quality from 2024-01-10 to 2024-01-14 was on 2024-01-11 with an air quality level of 8.0. This indicates that the air quality on that day was considered healthy.


In [15]:
QUESTION2 = "What was the air quality yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (0.93s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-12; Air Quality: 46.0

The air quality yesterday, 2024-03-12, had an air quality level of 46.0. This indicates that the air quality was considered hazardous and it is not recommended to go outside.


In [16]:
QUESTION3 = "What will the air quality be like in 2024-03-18?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response3)

Finished: Reading data from Hopsworks, using ArrowFlight (0.99s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 30.84
Date: 2024-03-15; Air Quality: 30.84
Date: 2024-03-16; Air Quality: 30.84
Date: 2024-03-17; Air Quality: 30.84
Date: 2024-03-18; Air Quality: 30.84

The air quality on 2024-03-18 is expected to be at an air quality level of 30.84. This indicates that the air quality on that day was considered unhealthy for sensitive groups. It is recommended to limit outdoor activities.


In [17]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response4)

Finished: Reading data from Hopsworks, using ArrowFlight (1.11s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 30.84
Date: 2024-03-15; Air Quality: 30.84

The air quality the day after tomorrow, 2024-03-15, is expected to be at an air quality level of 30.84. This indicates that the air quality on that day was considered unhealthy for sensitive groups. It is recommended to limit outdoor activities.


In [18]:
QUESTION5 = "What will the air quality be like on Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response5)

Finished: Reading data from Hopsworks, using ArrowFlight (0.97s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 30.84
Date: 2024-03-15; Air Quality: 30.84
Date: 2024-03-16; Air Quality: 30.84
Date: 2024-03-17; Air Quality: 30.84

The air quality on Sunday, 2024-03-17, is expected to be at an air quality level of 51.0. This indicates that the air quality on that day was considered unhealthy for sensitive groups. It is recommended to limit outdoor activities.


In [19]:
QUESTION7 = "What will the air quality be like on March 16?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response7)

Finished: Reading data from Hopsworks, using ArrowFlight (1.20s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 30.84
Date: 2024-03-15; Air Quality: 30.84
Date: 2024-03-16; Air Quality: 30.84

The air quality on March 16 is expected to be at an air quality level of 30.84. This indicates that the air quality on that day was considered unhealthy for sensitive groups. It is recommended to limit outdoor activities.


In [20]:
QUESTION = "Is this level safe or not?"

response = generate_response(
    QUESTION7, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using ArrowFlight (0.89s) 
🗓️ Today's date: Wednesday, 2024-03-13
📖 Air Quality Measurements:
Date: 2024-03-13; Air Quality: 51.0
Date: 2024-03-14; Air Quality: 30.84
Date: 2024-03-15; Air Quality: 30.84
Date: 2024-03-16; Air Quality: 30.84

The air quality on March 16 is expected to be at an air quality level of 30.84. This indicates that the air quality on that day was considered unhealthy for sensitive groups. It is recommended to limit outdoor activities.


In [21]:
QUESTION = "Is this air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Wednesday, 2024-03-13
📖 

The air quality level is not dangerous, but it is not safe for everyone. It is recommended to limit outdoor activities, especially for sensitive groups such as children, the elderly, and those with respiratory issues.


In [22]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Wednesday, 2024-03-13
📖 

Certainly! Air quality levels are categorized based on the Air Quality Index (AQI), which is a measure of how clean or polluted the air is. The AQI ranges from 0 to 500, with higher numbers indicating more pollution. Here are the different air quality levels and their corresponding AQI ranges:

1. Good: AQI 0-50
2. Moderate: AQI 51-100
3. Unhealthy for Sensitive Groups: AQI 101-150
4. Unhealthy: AQI 151-200
5. Very Unhealthy: AQI 201-300
6. Hazardous: AQI 301-500

These categories help people understand the potential health risks associated with different air quality levels. For example, "Good" air quality is considered safe for everyone, while "Unhealthy" air quality may cause respiratory issues for sensitive groups or even the general population if the AQI is very high.


---