## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
!pip install -r requirements.txt --quiet

In [2]:
import joblib

from functions.llm_chain import load_model, get_llm_chain, generate_response

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [3]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5242
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>

In [4]:
# Retrieve the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1,
)

# Initialize batch scoring against training dataset version 1
feature_view.init_batch_scoring(1)

## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [5]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (0 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor model and label encoder from the saved model directory
model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
encoder = joblib.load(saved_model_dir + "/label_encoder.pkl")

# Display the retrieved XGBoost regressor model
model_air_quality

## <span style='color:#ff5f27'>⬇️ LLM Loading</span>

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


2024-03-08 10:10:27,333 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


## <span style='color:#ff5f27'>⛓️ LangChain</span>

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm, 
    tokenizer,
)



## <span style='color:#ff5f27'>🧬 Model Inference</span>


In [9]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_llm, 
    tokenizer,
    model_air_quality,
    encoder,
    llm_chain,
    verbose=True,
)

print(response7)

🗓️ Today's date: Friday, 2024-03-08
📖 

Hello! How can I assist you with air quality information today?


In [10]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_llm,
    tokenizer,
    model_air_quality,
    encoder,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Friday, 2024-03-08
📖 

I am an AI Air Quality Assistant, here to help you with air quality information.
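Every inference cell in this notebook passes the same six objects to `generate_response` and changes only the question. A small wrapper can bind those objects once, which also makes it harder to pass the wrong question variable by accident. The sketch below is a minimal illustration: the `generate_response` defined here is a hypothetical stand-in that only mirrors the positional order used in the cells above, and the `None` placeholders stand for the objects created earlier in the notebook.

```python
# Hypothetical stand-in that mirrors the argument order used in this notebook.
# It only echoes the question so the sketch runs without Hopsworks or the LLM.
def generate_response(question, feature_view, model_llm, tokenizer,
                      model_air_quality, encoder, llm_chain, verbose=False):
    return f"(answer to: {question})"


# Placeholders for the objects created in the earlier cells.
feature_view = model_llm = tokenizer = None
model_air_quality = encoder = llm_chain = None


def ask(question, verbose=True):
    """Call generate_response with the shared objects bound once."""
    return generate_response(
        question,
        feature_view,
        model_llm,
        tokenizer,
        model_air_quality,
        encoder,
        llm_chain,
        verbose=verbose,
    )


print(ask("Who are you?"))
```

In the notebook itself, the stand-in and the placeholders would be dropped so that `ask` closes over the real objects.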


In [11]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14 in New York?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (8.92s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1

The average air quality from January 10 to January 14 in New York was 6.7. This indicates that the air quality was generally moderate, and it is safe to go outside for most activities.
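The average quoted in the response can be cross-checked against the five measurements printed in the context above. The snippet below is a minimal sketch that copies those values by hand into a dictionary (the name `readings` is illustrative) and computes their arithmetic mean.

```python
from statistics import mean

# Measurements for New York as printed in the retrieved context above.
readings = {
    "2024-01-10": 7.2,
    "2024-01-11": 5.9,
    "2024-01-12": 10.8,
    "2024-01-13": 5.9,
    "2024-01-14": 5.1,
}

average = mean(readings.values())
print(round(average, 2))  # 6.98
```

The arithmetic mean of the listed values is 6.98, so the 6.7 stated by the LLM is best read as an approximation rather than an exact figure.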


In [12]:
QUESTION11 = "When and what was the maximum air quality from 2024-01-10 till 2024-01-14 in New York?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_llm,
    tokenizer,
    model_air_quality,
    encoder,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (8.57s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1

The maximum air quality from January 10 to January 14 in New York was on January 12, with an air quality of 10.8. This indicates that the air quality was poor on that day, and it is recommended to limit outdoor activities, especially for sensitive groups such as children, the elderly, and those with respiratory issues.


In [13]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14 in New York?"

response12 = generate_response(
    QUESTION12, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (8.65s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1

The minimum air quality from January 10 to January 14 in New York was on January 11, with an air quality of 5.9. This indicates that the air quality was generally good on that day, and it is safe to go outside for most activities.
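The maximum and minimum can be verified the same way. This sketch again copies the printed New York measurements into an illustrative `readings` dictionary and looks up the dates with the highest and lowest values.

```python
# Measurements for New York as printed in the retrieved context above.
readings = {
    "2024-01-10": 7.2,
    "2024-01-11": 5.9,
    "2024-01-12": 10.8,
    "2024-01-13": 5.9,
    "2024-01-14": 5.1,
}

# Pick the dates whose readings are largest and smallest.
max_date = max(readings, key=readings.get)
min_date = min(readings, key=readings.get)

print(max_date, readings[max_date])  # 2024-01-12 10.8
print(min_date, readings[min_date])  # 2024-01-14 5.1
```

By this check the maximum matches the response for the previous question, while the lowest listed reading is 5.1 on 2024-01-14 rather than 5.9 on 2024-01-11.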


In [14]:
QUESTION2 = "What was the air quality yesterday in London?"

response2 = generate_response(
    QUESTION2, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (8.70s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for London:
Date: 2024-03-07; Air Quality: 25.7

Yesterday in London, the air quality was 25.7, which indicates that the air quality was poor. It is recommended to limit outdoor activities, especially for sensitive groups such as children, the elderly, and those with respiratory issues.


In [15]:
QUESTION3 = "What will the air quality be like in London in 2024-03-10?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    encoder,
    llm_chain,
    verbose=True,
)

print(response3)

Finished: Reading data from Hopsworks, using ArrowFlight (8.77s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for London:
Date: 2024-03-08; Air Quality: 24.3
Date: 2024-03-09; Air Quality: 16.71
Date: 2024-03-10; Air Quality: 11.18

The air quality in London on March 10, 2024, was 11.18, which indicates that the air quality was unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities on that day for these groups.


In [16]:
QUESTION4 = "What will the air quality be like in Chicago the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response4)

Finished: Reading data from Hopsworks, using ArrowFlight (8.76s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for Chicago:
Date: 2024-03-08; Air Quality: 10.0
Date: 2024-03-09; Air Quality: 8.19
Date: 2024-03-10; Air Quality: 8.61

The air quality in Chicago the day after tomorrow, on March 10, 2024, was 8.61. This indicates that the air quality was unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities on that day for these groups.
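Questions such as "yesterday" and "the day after tomorrow" have to be resolved to concrete dates relative to the date printed in the output (Friday, 2024-03-08). How `generate_response` does this internally is not shown in this notebook; the following is only a sketch of the date arithmetic using the standard library.

```python
from datetime import date, timedelta

# The date reported in the notebook output.
today = date(2024, 3, 8)

yesterday = today - timedelta(days=1)
day_after_tomorrow = today + timedelta(days=2)

print(today.weekday())     # 4, i.e. Friday (Monday is 0)
print(yesterday)           # 2024-03-07
print(day_after_tomorrow)  # 2024-03-10
```

Both results agree with the dates that appear in the retrieved context for the London and Chicago questions.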


In [17]:
QUESTION5 = "What will the air quality be like in London on Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_llm, 
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response5)

Finished: Reading data from Hopsworks, using ArrowFlight (8.83s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for London:
Date: 2024-03-08; Air Quality: 24.3
Date: 2024-03-09; Air Quality: 16.71
Date: 2024-03-10; Air Quality: 11.18

On Sunday, the air quality in London is expected to be 16.71, which indicates that the air quality is unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities on that day for these groups.
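A weekday name such as "Sunday" can be mapped to a date and then looked up in the retrieved context. The sketch below assumes "Sunday" means the next Sunday on or after the date printed in the output; the `forecast` dictionary is copied by hand from the London values above.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6

today = date(2024, 3, 8)

# Days until the next Sunday (0 if today is already Sunday).
days_ahead = (SUNDAY - today.weekday()) % 7
next_sunday = today + timedelta(days=days_ahead)

# London values as printed in the retrieved context above.
forecast = {
    "2024-03-08": 24.3,
    "2024-03-09": 16.71,
    "2024-03-10": 11.18,
}

print(next_sunday, forecast[next_sunday.isoformat()])  # 2024-03-10 11.18
```

Under that assumption Sunday is 2024-03-10, whose listed value is 11.18; the 16.71 quoted in the response belongs to Saturday, 2024-03-09.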


In [18]:
QUESTION7 = "What will the air quality be like on March 9 in London?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    model_llm,
    tokenizer, 
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response7)

Finished: Reading data from Hopsworks, using ArrowFlight (8.80s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for London:
Date: 2024-03-08; Air Quality: 24.3
Date: 2024-03-09; Air Quality: 16.71

On March 9 in London, the air quality was 16.71, which indicates that the air quality was unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities on that day for these groups.


In [19]:
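QUESTION = "Is this level safe or not?"

# Note: QUESTION7 (the March 9 question) is passed below, not QUESTION,
# so the output repeats the answer from the previous cell.
response = generate_response(
    QUESTION7, 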
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality,
    encoder,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using ArrowFlight (8.76s) 
🗓️ Today's date: Friday, 2024-03-08
📖 Air Quality Measurements for London:
Date: 2024-03-08; Air Quality: 24.3
Date: 2024-03-09; Air Quality: 16.71

On March 9 in London, the air quality was 16.71, which indicates that the air quality was unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities on that day for these groups.


In [20]:
QUESTION = "Is this air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Friday, 2024-03-08
📖 

The air quality level of 16.71 in London is considered unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is not dangerous for everyone, but it is recommended to limit outdoor activities for sensitive groups.


In [21]:
QUESTION = "Can you please explain different air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_llm, 
    tokenizer,
    model_air_quality, 
    encoder,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Friday, 2024-03-08
📖 

Certainly! Air quality levels are usually measured using an index, such as the Air Quality Index (AQI) or the Pollution Standards Index (PSI). These indices provide a numerical value that represents the air quality at a particular location. The levels are usually categorized into different color-coded ranges, with each range representing a different level of air quality. Here's a general breakdown of the air quality levels:

1. Good (AQI 0-50): The air quality is considered good, and it is safe for everyone to breathe.
2. Moderate (AQI 51-100): The air quality is generally fine, but sensitive groups might experience some discomfort.
3. Unhealthy for Sensitive Groups (AQI 101-150): The air quality is unhealthy for sensitive groups such as children, the elderly, and those with respiratory issues. It is recommended to limit outdoor activities for these groups.
4. Unhealthy (AQI 151-200): The air quality is unhealthy for everyone, and sensitive groups should avoid prolonged outdoor activi
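
The bands in the response above can be expressed as a small lookup. This is a sketch only: it encodes just the four ranges that appear in the (truncated) output, the names `AQI_BANDS` and `aqi_category` are illustrative, and values outside the listed ranges return `None` instead of a guessed label.

```python
from typing import Optional

# The four bands listed in the response above: (low, high, label).
AQI_BANDS = [
    (0, 50, "Good"),
    (51, 100, "Moderate"),
    (101, 150, "Unhealthy for Sensitive Groups"),
    (151, 200, "Unhealthy"),
]


def aqi_category(aqi: int) -> Optional[str]:
    """Return the label of the band containing aqi, or None if not listed."""
    for low, high, label in AQI_BANDS:
        if low <= aqi <= high:
            return label
    return None  # outside the bands shown in the response


print(aqi_category(42))   # Good
print(aqi_category(120))  # Unhealthy for Sensitive Groups
print(aqi_category(250))  # None
```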

---