## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
!pip install -r requirements.txt --quiet

In [2]:
import joblib

from functions.llm_chain import (
    load_model, 
    get_llm_chain, 
    generate_response,
)

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [3]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5242


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>

In [4]:
# Retrieve the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1,
)

# Initialize batch scoring using training dataset version 1
feature_view.init_batch_scoring(training_dataset_version=1)

## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [5]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (0 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor model and label encoder from the saved model directory
model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
encoder = joblib.load(saved_model_dir + "/label_encoder.pkl")

# Display the retrieved XGBoost regressor model
model_air_quality

## <span style='color:#ff5f27'>⬇️ LLM Loading</span>

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


2024-05-13 16:25:44,848 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).





## <span style='color:#ff5f27'>⛓️ LangChain</span>

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)



## <span style='color:#ff5f27'>🧬 Model Inference</span>


In [9]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-05-13
📖 
I am an expert in air quality, here to help you understand the air quality in your city.


In [10]:
QUESTION1 = "What was the air quality from 2024-01-10 till 2024-01-14 in New York?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (8.20s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The average air quality in New York from January 10th to January 14th, 2024, was 6.9, which indicates moderate air quality. During this period, the air quality was mostly safe for most people, but sensitive groups, such as children and the elderly, may have experienced some health effects. It is generally recommended to go for a walk, but sensitive individuals may want to limit their outdoor activity.
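The average quoted in the response can be checked against the five readings listed above. The following is a minimal sketch using only the standard library; the `readings` list is copied by hand from the output, and the variable names are illustrative rather than part of the notebook's `functions` package.

```python
from statistics import mean

# Readings copied by hand from the verbose output above
# (New York, 2024-01-10 to 2024-01-14).
readings = [7.2, 5.9, 10.8, 5.9, 5.1]

# Arithmetic mean of the five daily values.
average = mean(readings)
print(f"Average over {len(readings)} days: {average:.2f}")
```

The readings sum to 34.9, so the mean is 6.98, slightly above the 6.9 stated in the generated text. Figures in LLM-written summaries are worth cross-checking against the retrieved context.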


In [11]:
QUESTION11 = "When and what was the maximum air quality from 2024-01-10 till 2024-01-14 in New York?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (8.23s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The maximum air quality in New York from 2024-01-10 to 2024-01-14 was on 2024-01-12 with an air quality of 10.8. This level is considered unhealthy for sensitive groups, and it is advisable to limit outdoor activities.


In [12]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14 in New York?"

response12 = generate_response(
    QUESTION12, 
    feature_view,  
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (8.18s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The minimum air quality in New York during that period was on January 14th, with an air quality measurement of 5.1. This is considered to be a good air quality, suitable for most outdoor activities.
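The minimum and maximum reported for this period can be verified by parsing the `Date: ...; Air Quality: ...` lines printed in the verbose output. This is a sketch under the assumption that the context always follows that line format; `parse_measurements` is a hypothetical helper, not a function from `functions.llm_chain`.

```python
import re
from datetime import date

# Context lines copied from the verbose output above.
context = """\
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
"""

# Matches one "Date: YYYY-MM-DD; Air Quality: X" line.
LINE_PATTERN = re.compile(r"Date: (\d{4}-\d{2}-\d{2}); Air Quality: ([\d.]+)")


def parse_measurements(text):
    """Return a list of (date, value) tuples found in the text (hypothetical helper)."""
    return [
        (date.fromisoformat(day), float(value))
        for day, value in LINE_PATTERN.findall(text)
    ]


measurements = parse_measurements(context)

# Select the entries with the highest and lowest reading.
max_day, max_value = max(measurements, key=lambda pair: pair[1])
min_day, min_value = min(measurements, key=lambda pair: pair[1])

print(f"Maximum: {max_value} on {max_day}")
print(f"Minimum: {min_value} on {min_day}")
```

Both results agree with the generated answers: the highest reading is 10.8 on 2024-01-12 and the lowest is 5.1 on 2024-01-14.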


In [13]:
QUESTION2 = "What was the air quality yesterday in London?"

response2 = generate_response(
    QUESTION2,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (9.45s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-12; Air Quality: 16.5
The air quality in London yesterday, on May 12, 2024, was 16.5. This falls within the safe range, so it would be safe for you to go for a walk or engage in outdoor activities.


In [14]:
QUESTION = "What was the air quality like last week in London?"

response = generate_response(
    QUESTION,
    feature_view, 
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using ArrowFlight (7.71s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-06; Air Quality: 16.4
Date: 2024-05-07; Air Quality: 14.2
Date: 2024-05-08; Air Quality: 15.1
Date: 2024-05-09; Air Quality: 23.4
Date: 2024-05-10; Air Quality: 26.2
Date: 2024-05-11; Air Quality: 23.1
Date: 2024-05-12; Air Quality: 16.5
Date: 2024-05-13; Air Quality: 10.5
Last week in London, the air quality was generally moderate to good. The measurements from May 6th to May 13th showed varying levels, with the highest being 26.2 on May 10th and the lowest being 10.5 on May 13th. Overall, the air quality was within safe limits for most activities, but it would be advisable to avoid prolonged outdoor activities on days with higher air quality readings, such as May 10th.
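One way to turn a phrase such as "last week" into concrete dates is to take the seven days preceding today. The sketch below is an assumption about the intent, not the logic inside `generate_response`, whose date handling is not shown in this notebook; `previous_days` is a hypothetical helper. Note that the context retrieved above also includes today's reading (2024-05-13).

```python
from datetime import date, timedelta


def previous_days(today, n_days=7):
    """Return the n_days calendar dates before `today`, oldest first (hypothetical helper)."""
    return [today - timedelta(days=offset) for offset in range(n_days, 0, -1)]


today = date(2024, 5, 13)  # "Today's date" from the output above
last_week = previous_days(today)
print(last_week[0], "to", last_week[-1])
```

Under this assumption the range runs from 2024-05-06 to 2024-05-12 and excludes today.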


In [15]:
QUESTION3 = "What will the air quality be like in London on 2024-05-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_air_quality,
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response3)

Finished: Reading data from Hopsworks, using ArrowFlight (7.96s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-13; Air Quality: 10.5
Date: 2024-05-14; Air Quality: 11.58
Date: 2024-05-15; Air Quality: 10.56
Date: 2024-05-16; Air Quality: 10.04
Date: 2024-05-17; Air Quality: 9.87
Date: 2024-05-18; Air Quality: 9.2
Date: 2024-05-19; Air Quality: 9.44
Date: 2024-05-20; Air Quality: 9.4
The air quality in London on May 20, 2024, is expected to be at a level of 9.4, which is considered safe and suitable for outdoor activities.


In [16]:
QUESTION4 = "What will the air quality be like in Chicago tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response4)

Finished: Reading data from Hopsworks, using ArrowFlight (7.95s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for Chicago:
Date: 2024-05-13; Air Quality: 10.4
Date: 2024-05-14; Air Quality: 10.2
Based on the air quality measurements for Chicago, the air quality level on May 14th is expected to be slightly better than today, with an Air Quality index of approximately 10.2. This level is considered safe for most people, but those with respiratory conditions may still want to limit their outdoor activity.


In [17]:
QUESTION5 = "What will the air quality be like in London next Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer, 
    llm_chain,
    verbose=True,
)

print(response5)

Finished: Reading data from Hopsworks, using ArrowFlight (7.96s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-13; Air Quality: 10.5
Date: 2024-05-14; Air Quality: 11.58
Date: 2024-05-15; Air Quality: 10.56
Date: 2024-05-16; Air Quality: 10.04
Date: 2024-05-17; Air Quality: 9.87
Date: 2024-05-18; Air Quality: 9.2
Date: 2024-05-19; Air Quality: 9.44
Based on the air quality measurements for London, the air quality on the following Sunday, May 19th, is expected to be at a level of 9.44. This is considered to be within the safe range, meaning it is safe to go outside and enjoy a walk.
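The date the response assigns to "next Sunday" can be reproduced with the standard library. This is a minimal sketch, assuming "next Sunday" means the first Sunday strictly after today; `next_weekday` is a hypothetical helper and not part of `functions.llm_chain`.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def next_weekday(today, target_weekday):
    """Return the first date strictly after `today` that falls on target_weekday."""
    days_ahead = (target_weekday - today.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # today is already the target weekday, so move a full week ahead
    return today + timedelta(days=days_ahead)


today = date(2024, 5, 13)  # Monday, per "Today's date" in the output above
print(next_weekday(today, SUNDAY))
```

For Monday 2024-05-13 this gives 2024-05-19, matching the date used in the answer.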


In [18]:
QUESTION7 = "What will the air quality be like on May 18 in London?"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response7)

Finished: Reading data from Hopsworks, using ArrowFlight (7.63s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-13; Air Quality: 10.5
Date: 2024-05-14; Air Quality: 11.58
Date: 2024-05-15; Air Quality: 10.56
Date: 2024-05-16; Air Quality: 10.04
Date: 2024-05-17; Air Quality: 9.87
Date: 2024-05-18; Air Quality: 9.2
The air quality on May 18 in London is expected to be safe for most people, as the air quality index is 9.2. However, if you have a respiratory condition or are particularly sensitive to air pollution, you may want to limit your outdoor activities.


In [19]:
QUESTION = "Can you please explain different PM2_5 air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Monday, 2024-05-13
📖 
Certainly! PM2.5 refers to particulate matter with a diameter of 2.5 micrometers or less, which can be inhaled into the lungs and cause health issues. The air quality levels for PM2.5 are as follows:

1. Good (below 12 µg/m³): At this level, the air quality is considered safe, and there is no need to worry about going for a walk or engaging in outdoor activities.

2. Moderate (12-18 µg/m³): The air quality is acceptable, but sensitive groups like children, the elderly, and those with respiratory issues may experience mild discomfort. It is still safe to go for a walk or engage in outdoor activities.

3. Poor (18-25 µg/m³): At this level, air quality is considered unhealthy for sensitive groups, and general discomfort may be experienced by the population. It is advisable to limit prolonged outdoor activities.

4. Very Poor (25-35 µg/m³): The air quality is unhealthy and can cause respiratory issues for most people. It is recommended to avoid prolonged outdoor activities and stay indoors
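The bands listed in the answer above can be expressed as a small lookup. This is a sketch that encodes only the thresholds quoted in the model's response; they are illustrative, not an official standard, and the response is cut off after the fourth band. Treating each lower bound as inclusive is an assumption, and `classify_pm25` is a hypothetical helper.

```python
# Upper bounds (µg/m³) and labels as quoted in the model's answer above.
# Assumption: a value equal to a boundary belongs to the higher band.
PM25_BANDS = [
    (12.0, "Good"),
    (18.0, "Moderate"),
    (25.0, "Poor"),
    (35.0, "Very Poor"),
]


def classify_pm25(value):
    """Return the label of the first band whose upper bound exceeds the value."""
    for upper_bound, label in PM25_BANDS:
        if value < upper_bound:
            return label
    # The answer is truncated, so values of 35 or more have no quoted label.
    return "Above the listed bands"


for reading in (9.4, 16.5, 26.2):
    print(reading, "->", classify_pm25(reading))
```

Applied to readings from earlier cells, these bands would label 9.4 as Good, 16.5 as Moderate, and 26.2 as Very Poor, which is a useful cross-check on the qualitative wording in the generated answers.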

---

## <span style='color:#ff5f27'>🧬 Inference with OpenAI</span>


In [20]:
from openai import OpenAI
import os
import getpass

from functions.llm_chain import generate_response_openai

In [21]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass.getpass('🔑 Enter your OpenAI API key: ')

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)



In [22]:
QUESTION = "What was the air quality like last week in London?"

response = generate_response_openai(   
    QUESTION,
    feature_view,
    model_air_quality,
    encoder,
    client,
    verbose=True,
)
print(response)

2024-05-13 16:31:31,396 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Finished: Reading data from Hopsworks, using ArrowFlight (10.46s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for London:
Date: 2024-05-06; Air Quality: 16.4
Date: 2024-05-07; Air Quality: 14.2
Date: 2024-05-08; Air Quality: 15.1
Date: 2024-05-09; Air Quality: 23.4
Date: 2024-05-10; Air Quality: 26.2
Date: 2024-05-11; Air Quality: 23.1
Date: 2024-05-12; Air Quality: 16.5
2024-05-13 16:31:52,983 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The air quality in London last week showed some variation. At the beginning of the week, the air quality was excellent, with levels such as 16.4, indicating very clean air, ideal for outdoor activities such as walking, running, or other forms of exercise without concern for air pollution. As the week progressed, air quality slightly decreased, reaching higher levels by the middle of 

In [23]:
QUESTION4 = "What will the air quality be like in Chicago tomorrow?"

response4 = generate_response_openai(
    QUESTION4,
    feature_view,
    model_air_quality,
    encoder,
    client,
    verbose=True,
)

print(response4)

2024-05-13 16:31:54,450 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Finished: Reading data from Hopsworks, using ArrowFlight (8.02s) 
🗓️ Today's date: Monday, 2024-05-13
📖 Air Quality Measurements for Chicago:
Date: 2024-05-13; Air Quality: 10.4
Date: 2024-05-14; Air Quality: 10.2
2024-05-13 16:32:11,342 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The air quality in Chicago tomorrow will be excellent, with a reading of 10.2, indicating very clean air. It will be a great day for outdoor activities, such as going for a walk or enjoying the park, as the air will be very safe to breathe.


---