## <span style='color:#ff5f27'> 📝 Colab Users - Uncomment & Run the Following 2 Cells </span>

In [1]:
# !pip install hopsworks --quiet
# !pip install xgboost==2.0.3 --quiet
# !pip install scikit-learn==1.4.1.post1 --quiet
# !pip install langchain==0.1.10 --quiet
# !pip install bitsandbytes==0.42.0 --quiet
# !pip install accelerate==0.27.2 --quiet
# !pip install transformers==4.38.2 --quiet

In [2]:
# !mkdir -p functions
# !cd functions && wget https://raw.githubusercontent.com/featurestorebook/mlfs-book/main/notebooks/ch03/functions/air_quality_data_retrieval.py 
# !cd functions && wget https://raw.githubusercontent.com/featurestorebook/mlfs-book/main/notebooks/ch03/functions/context_engineering.py
# !cd functions && wget https://raw.githubusercontent.com/featurestorebook/mlfs-book/main/notebooks/ch03/functions/llm_chain.py
# !cd functions && wget https://raw.githubusercontent.com/featurestorebook/mlfs-book/main/notebooks/ch03/functions/util.py

## <span style='color:#ff5f27'> 📝 Imports </span>

In [3]:
from xgboost import XGBRegressor
import hopsworks
from openai import OpenAI
from functions.llm_chain import (
    load_model, 
    get_llm_chain, 
    generate_response, 
    generate_response_openai,
)
import pandas as pd
import os
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [4]:
# If you haven't set the env variable 'HOPSWORKS_API_KEY', then uncomment the next line and enter your API key
# os.environ["HOPSWORKS_API_KEY"] = ""

project = hopsworks.login()
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1173659
Connected. Call `.close()` to terminate connection gracefully.


In [5]:
# Retrieve the existing 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1
)

# Initialize batch scoring
feature_view.init_batch_scoring(1)

weather_fg = fs.get_feature_group(
    name='weather',
    version=1,
)

## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [6]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 5 files)... DONE

In [7]:
# Load the XGBoost regressor from the saved model directory
model_air_quality = XGBRegressor()

model_air_quality.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
model_air_quality

## <span style='color:#ff5f27'>⬇️ LLM Loading </span>

In [8]:
import time
start_time = time.time()

# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model(model_id="imiraoui/OpenHermes-2.5-Mistral-7B-sharded")

duration = time.time() - start_time
print(f"The code execution took {duration} seconds.")

Loading model from disk


Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


The code execution took 7.7953715324401855 seconds.


## <span style='color:#ff5f27'>⛓️ LangChain </span>

In [9]:
import time
start_time = time.time()


# Create and configure a language model chain.
llm_chain = get_llm_chain(
    model_llm,
    tokenizer,
)

duration = time.time() - start_time
print(f"The code execution took {duration} seconds.")

The code execution took 0.07165884971618652 seconds.
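
The two timing cells above repeat the same start/stop pattern. As a minimal sketch, a context manager could wrap any step and report its duration. The `timed` helper is hypothetical and not part of the notebook's `functions` package, and the `time.sleep` call only stands in for a real step such as `load_model(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the wall-clock duration of the enclosed block (hypothetical helper)."""
    record = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Store the elapsed time so callers can log or assert on it later
        record["seconds"] = time.perf_counter() - start
        print(f"{label} took {record['seconds']:.2f} seconds.")


# Stand-in workload; in the notebook this would be the model or chain setup
with timed("Sleeping") as t:
    time.sleep(0.01)
```

`time.perf_counter()` is used here instead of `time.time()` because it is monotonic and intended for measuring short intervals.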


## <span style='color:#ff5f27'>🧬 Domain-specific Evaluation Harness </span>

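**Systematic evaluations** that run automatically in CI/CD pipelines are key to assessing the quality of models and RAG applications. The cells below send a fixed set of questions (greetings, historical lookups, forecasts, and general knowledge) through `generate_response` so that the answers can be inspected for regressions.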
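
To make such checks automatic rather than visual, the questions could be paired with simple pass/fail predicates. The following is a minimal sketch under stated assumptions: `EvalCase`, `run_eval`, the two checks, and `stub_respond` are all hypothetical names introduced for illustration, and the stub returns canned text so the example runs without a GPU or a Hopsworks connection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    question: str
    check: Callable[[str], bool]  # returns True when the answer is acceptable
    description: str


def no_role_token_leak(answer: str) -> bool:
    # A leaked chat-template role name ("assistant") at the start is a defect
    return not answer.lstrip().lower().startswith("assistant")


def ends_cleanly(answer: str) -> bool:
    # A reply cut off by the token limit usually lacks closing punctuation
    return answer.rstrip().endswith((".", "!", "?"))


def run_eval(respond: Callable[[str], str], cases: List[EvalCase]) -> List[Dict]:
    """Send every question to `respond` and record whether its check passed."""
    results = []
    for case in cases:
        answer = respond(case.question)
        results.append({
            "question": case.question,
            "description": case.description,
            "passed": bool(case.check(answer)),
        })
    return results


def stub_respond(question: str) -> str:
    # Canned replies standing in for the real LLM call (assumption for the demo)
    canned = {
        "Who are you?": "I am an expert in air quality analysis.",
        "What was the air quality like yesterday?":
            " assistant\nYesterday the air quality was 16.0",
    }
    return canned.get(question, "")


CASES = [
    EvalCase("Who are you?", no_role_token_leak, "no leaked role token"),
    EvalCase("Who are you?", ends_cleanly, "reply is not truncated"),
    EvalCase("What was the air quality like yesterday?", no_role_token_leak,
             "no leaked role token"),
    EvalCase("What was the air quality like yesterday?", ends_cleanly,
             "reply is not truncated"),
]

results = run_eval(stub_respond, CASES)
for r in results:
    print("PASS" if r["passed"] else "FAIL", "|", r["description"], "|", r["question"])
```

In this notebook, `stub_respond` would be replaced by a small wrapper that calls `generate_response` with the same arguments used in the cells below (`feature_view`, `weather_fg`, `model_air_quality`, `model_llm`, `tokenizer`, `llm_chain`, `verbose=False`). A CI job could then fail when any entry in `results` has `passed` set to `False`.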


In [10]:
QUESTION7 = "Hi!"

response7 = generate_response(
    QUESTION7,
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response7)

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


 
Hello! I can help you with air quality information for your city. Based on the context table provided, the air quality indicators for your city on Saturday, November 16, 2024 are as follows:

- Fine particulate matter (PM2.5): 12 µg/m³
- Ozone (O3): 30 ppb
- Nitrogen dioxide (NO2): 25 ppb
- Sulfur dioxide (SO2): 5 ppb

The air quality on this day is considered good. The PM2.5 level is within the safe range (12 µg/m³), and the ozone, nitrogen dioxide, and sulfur dioxide levels are also within the safe limits. You can feel comfortable going for a walk or engaging in outdoor activities.


In [11]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response)

 
I am an expert in air quality analysis. I can provide you with information about the current air quality in your city based on the context table you have provided.

To give you the most accurate information, I need to know the city you are referring to. Can you please provide me with the city name?


In [12]:
QUESTION1 = "What was the average air quality from 2024-01-10 till 2024-01-14?"

response1 = generate_response(
    QUESTION1, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response1)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.84s) 
 assistant
The average air quality from January 10th to January 14th was 10.4. During this period, the air quality was generally safe for most people, but it would be advisable to avoid outdoor activities if you have respiratory issues or are particularly sensitive to air pollution.


In [13]:
QUESTION11 = "When and what was the air quality like last week?"

response11 = generate_response(
    QUESTION11, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response11)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.49s) 
 
Last week, the air quality varied throughout the week. On Monday, November 10th, the air quality was at its best with a measurement of 10.0. This indicates excellent air quality, and it was safe for outdoor activities. On Tuesday, November 11th, the air quality slightly increased to 20.0, which is still considered good for outdoor activities. However, on Wednesday, November 12th, the air quality slightly deteriorated to 12.0, which is considered moderate and might not be the best for sensitive individuals. The air quality continued to fluctuate throughout the week, with measurements of 18.0 on Thursday, 16.0 on Friday and Saturday. Overall, the air quality was generally good, but there were some periods when it was moderate, so it's essential to check the air quality before engaging in outdoor activities.


In [14]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14?"

response12 = generate_response(
    QUESTION12, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response12)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.58s) 

The minimum air quality during that period was on 2024-01-11, with an air quality level of 8.0. This is considered good air quality, which is safe for outdoor activities and breathing. You can go for a walk or engage in outdoor activities without any significant concerns.


In [15]:
QUESTION2a = "What was the air quality like last week?"

response2 = generate_response(
    QUESTION2a,
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response2)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.16s) 

Last week, on November 10th, the air quality in the city was measured at 10.0. This indicates that the air quality was in the "Good" range, which means it is safe to go outside and engage in outdoor activities.


In [16]:
QUESTION2 = "What was the air quality like yesterday?"

response2 = generate_response(
    QUESTION2,
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response2)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.11s) 
 
Yesterday, on November 15th, the air quality in the city was classified as Moderate. The air quality measurement was 16.0, which is within the safe range. It is generally safe to go outside and engage in outdoor activities, although sensitive individuals may experience mild discomfort. It is recommended to limit prolonged exposure to outdoor activities, especially for those who are sensitive to air pollution.


In [17]:
QUESTION3 = "What will the air quality be like next Tuesday?"

response3 = generate_response(
    QUESTION3, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response3)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.01s) 
 
Next Tuesday, the air quality is expected to be moderate. The PM2.5 concentration is forecasted to be around 15 µg/m3, which is within the safe range of 0-35 µg/m3. This level of air quality is suitable for most outdoor activities, including walking and jogging. However, if you or someone in your family has a pre-existing respiratory condition, it is advisable to consult with a healthcare professional before engaging in prolonged outdoor activities.


In [18]:
QUESTION4 = "What will the air quality be like the day after tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response4)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.99s) 
 
The air quality on Monday, November 18th, is expected to be moderate. The air quality index (AQI) is predicted to be around 60. This level is generally considered safe for most people, but those who are sensitive to air pollution may want to limit their outdoor activities. It's a good day to go for a walk or engage in outdoor activities, but it's advisable to monitor the air quality throughout the day and avoid prolonged exposure if the AQI rises significantly.


In [19]:
QUESTION5 = "What will the air quality be like this Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response5)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.06s) 
 
This Sunday, the air quality in the city is expected to be moderate, with an Air Quality Index (AQI) of 65. This falls within the "moderate" category, which means the air quality is generally safe for most people to breathe, but sensitive groups such as children, the elderly, and those with respiratory issues may still experience some discomfort. It would be suitable for outdoor activities, but individuals should monitor their health and adjust their plans accordingly.


In [20]:
QUESTION7 = "What will the air quality be like for the rest of the week?"

response7 = generate_response(
    QUESTION7, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response7)

Reading data from Hopsworks, using Hopsworks Feature Query Service..   

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.99s) 
 assistant
Based on the air quality measurements provided, the air quality for the rest of the week is as follows:

- On Sunday, the air quality is expected to be safe for outdoor activities, with a reading of 25.03.
- On Monday, the air quality may be slightly higher than ideal for outdoor activities, with a reading of 25.83. It is still generally safe, but sensitive individuals may want to take precautions.
- On Tuesday, the air quality is expected to improve slightly, with a reading of 24.86. This is considered safe for outdoor activities.
- On Wednesday, the air quality is expected to be significantly higher, with a reading of 52.44. This level of air pollution may cause health concerns, especially for vulnerable populations, and it is advisable to limit outdoor activities.
- On Thursday, the air quality is expected to improve, with a reading of 13.73. This is considered safe for outdoor activitie

In [21]:
QUESTION = "Will the air quality be safe or not for the next week?"

response = generate_response(
    QUESTION,
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.02s) 
 assistant
Based on the air quality measurements provided, the air quality on November 16th is 25.03. This level is considered moderate, and it may be advisable to avoid prolonged outdoor activities. 

For the following days, the air quality is expected to improve. On November 17th, the air quality is expected to be 25.83, which is still in the moderate range. On November 18th, the air quality is expected to be 24.86, which is in the good range. 

The air quality on November 19th is expected to be 52.44, which is considered unhealthy. It is advised to limit outdoor activities and avoid prolonged exposure to the air during this time. 

Finally, on November 20th, the air quality is expected to improve to 13.73, which is in the good range. This is a safe level for outdoor activities. 

Overall, the air quality for the rest of the week is expected to fluctuate between moderate and good levels. It is advis

In [22]:
QUESTION = "Is tomorrow's air quality level dangerous?"

response = generate_response(
    QUESTION, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.06s) 
 assistant
Based on the air quality indicators for the city provided, tomorrow's air quality level is safe. The PM2.5 and PM10 levels are within the safe range, and the ozone concentration is also within the acceptable limit. You can go outside and enjoy your day without worrying about any health issues caused by air pollution.


In [23]:
QUESTION = "Can you please explain different PM2_5 air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view,
    weather_fg,
    model_air_quality,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=False,
)

print(response)

 
Certainly! PM2.5 levels are categorized into different air quality levels based on their concentration in the air. Here's a breakdown of these levels:

1. Good: PM2.5 levels below 12 µg/m³ indicate good air quality. This is ideal for outdoor activities and general well-being.
2. Moderate: PM2.5 levels between 12 and 35 µg/m³ are considered moderate. While it's not a significant health concern, sensitive groups like children and the elderly may experience respiratory issues.
3. Poor: PM2.5 levels ranging from 35 to 75 µg/m³ are categorized as poor. This level may cause respiratory issues for the general population, and it's advised to limit outdoor activities.
4. Very Poor: PM2.5 levels between 75 and 150 µg/m³ are considered very poor. This can lead to respiratory problems for most people, and it's recommended to limit outdoor activities and stay indoors if possible.
5. Severe: PM2.5 levels above 150 µg/m³ are classified as severe. This can lead to serious respiratory issues and shou
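
The bands listed in the reply above come from the LLM itself, not from an official air quality standard, so they are best treated as claims to verify. As a minimal sketch, the same bands can be written as a lookup function, which makes it easy to compare the model's wording in other answers against its own thresholds. `pm25_category` is a hypothetical helper, and treating each boundary value as belonging to the higher band is an assumption, since the reply leaves the edges ambiguous.

```python
import bisect

# Band edges in µg/m³ exactly as quoted in the LLM reply above (not an official standard)
_PM25_BOUNDS = [12, 35, 75, 150]
_PM25_LABELS = ["Good", "Moderate", "Poor", "Very Poor", "Severe"]


def pm25_category(value: float) -> str:
    """Map a PM2.5 concentration to the band named in the reply (hypothetical helper)."""
    if value < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    # bisect_right places a value equal to an edge into the higher band (assumption)
    return _PM25_LABELS[bisect.bisect_right(_PM25_BOUNDS, value)]


# Readings taken from the forecast answers earlier in this notebook
for reading in (13.73, 25.03, 52.44):
    print(reading, "->", pm25_category(reading))
```

Under these bands, 52.44 falls in "Poor" while 25.03 and 13.73 fall in "Moderate", which differs from the "good" label some of the earlier answers attach to readings in the 13 to 25 range.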

In [24]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [25]:
# !pip install openai --quiet
# !pip install gradio==3.40.1 --quiet

In [27]:
import gradio as gr
from transformers import pipeline
import numpy as np
from xgboost import XGBRegressor
from functions.llm_chain import load_model, get_llm_chain, generate_response


2024-11-16 18:01:23,795 INFO: HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"


In [28]:
# Initialize the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")

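def transcribe(audio):
    # Gradio passes audio as a (sample_rate, samples) tuple
    sr, y = audio
    y = y.astype(np.float32)
    # Down-mix multi-channel audio to mono
    if y.ndim > 1:
        y = y.mean(axis=1)
    # Peak-normalize, skipping empty or silent clips to avoid division by zero
    peak = np.max(np.abs(y)) if y.size else 0.0
    if peak > 0:
        y /= peak
    return transcriber({"sampling_rate": sr, "raw": y})["text"]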

def generate_query_response(user_query, method, openai_api_key=None):
    if method == 'Hermes LLM':        
        response = generate_response(
            user_query,
            feature_view,
            weather_fg,
            model_air_quality,
            model_llm,
            tokenizer,
            llm_chain,
            verbose=False,
        )
        return response
    
    elif method == 'OpenAI API' and openai_api_key:
        client = OpenAI(
            api_key=openai_api_key
        )
        
        response = generate_response_openai(   
            user_query,
            feature_view,
            weather_fg,
            model_air_quality,
            client=client,
            verbose=True,
        )
        return response
        
    else:
        return "Invalid method or missing API key."

def handle_input(text_input=None, audio_input=None, method='Hermes LLM', openai_api_key=""):
    if audio_input is not None:
        user_query = transcribe(audio_input)
    else:
        user_query = text_input
    
    # Check if OpenAI API key is required but not provided
    if method == 'OpenAI API' and not openai_api_key.strip():
        return "OpenAI API key is required for this method."

    if user_query:
        return generate_query_response(user_query, method, openai_api_key)
    else:
        return "Please provide input either via text or voice."
    

# Setting up the Gradio Interface
iface = gr.Interface(
    fn=handle_input,
    inputs=[
        gr.Textbox(placeholder="Type here or use voice input..."), 
        gr.Audio(), 
        gr.Radio(["Hermes LLM", "OpenAI API"], label="Choose the response generation method"),
        gr.Textbox(label="Enter your OpenAI API key (only if you selected OpenAI API):", type="password")
    ],
    outputs="text",
    title="🌤️ AirQuality AI Assistant 💬",
    description="Ask your questions about air quality or use your voice to interact. Select the response generation method and provide an OpenAI API key if necessary."
)

iface.launch(share=True)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


* Running on local URL:  http://127.0.0.1:7860
2024-11-16 18:01:55,669 INFO: HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2024-11-16 18:01:55,686 INFO: HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"
2024-11-16 18:01:56,159 INFO: HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-11-16 18:01:56,484 INFO: HTTP Request: GET https://api.gradio.app/v3/tunnel-request "HTTP/1.1 200 OK"
2024-11-16 18:01:56,557 INFO: HTTP Request: GET https://cdn-media.huggingface.co/frpc-gradio-0.3/frpc_linux_amd64 "HTTP/1.1 200 OK"
* Running on public URL: https://e8c06633b1c9cdbc81.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
2024-11-16 18:01:59,232 INFO: HTTP Request: HEAD https://e8c06633b1c9cdbc81.gradio.live "HTTP/1.1 200 OK"




Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (3.94s) 


---