# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"> **Air Quality** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## 🗒️ This notebook is divided into the following sections:

1. Download model and batch inference data
2. Make predictions, generate PNG for forecast
3. Store predictions in a monitoring feature group and generate PNG for hindcast

## <span style='color:#ff5f27'> 📝 Imports</span>

In [1]:
import datetime
import pandas as pd
from xgboost import XGBRegressor
import hopsworks
import json
from functions import util



In [23]:
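# Weather rows dated on or after this moment are treated as the forecast window
today = datetime.datetime.now()
# Only used by the commented-out feature-view cell further down
tomorrow = today + datetime.timedelta(days=1)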
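The `info()` output further down shows that `date` is stored as midnight UTC (`datetime64[us, UTC]`), while `datetime.datetime.now()` returns a naive local time. Here is a minimal stdlib sketch for the case where you want the forecast window to start at the next UTC midnight rather than at the current local time. The helper `start_of_tomorrow_utc` is hypothetical and not part of the notebook:

```python
import datetime

def start_of_tomorrow_utc(now=None):
    """Return tomorrow's date at 00:00 UTC as a timezone-aware datetime."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Convert to UTC first so "tomorrow" is based on the UTC calendar day
    now_utc = now.astimezone(datetime.timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + datetime.timedelta(days=1)

# Example with a hypothetical run time of 2024-03-21 20:35 UTC
run_time = datetime.datetime(2024, 3, 21, 20, 35, tzinfo=datetime.timezone.utc)
print(start_of_tomorrow_utc(run_time))  # 2024-03-22 00:00:00+00:00
```

For the run shown here, filtering with `weather_fg.date >= start_of_tomorrow_utc()` should select the same rows (2024-03-22 onward). It also no longer depends on the machine's local timezone.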

## <span style="color:#ff5f27;"> 📡 Connect to Hopsworks Feature Store </span>

In [3]:
project = hopsworks.login()
# A connection object is used below to read secrets
conn = hopsworks.connection()
fs = project.get_feature_store()

# The sensor location is stored as a JSON string in the secret SENSOR_LOCATION_JSON
location_str = conn.get_secrets_api().get_secret("SENSOR_LOCATION_JSON").value
location = json.loads(location_str)
country = location['country']
city = location['city']
street = location['street']
print(location_str)

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5240
Connected. Call `.close()` to terminate connection gracefully.
Connected. Call `.close()` to terminate connection gracefully.
{"country": "sweden", "city": "stockholm", "street": "stockholm-hornsgatan-108-gata"}
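The secret `SENSOR_LOCATION_JSON` holds a small JSON object, and the printed value above shows its shape. Below is a stdlib-only sketch of the same parsing step that raises a clearer error if a key is missing. The `parse_location` helper is hypothetical and is not part of `hopsworks`:

```python
import json

REQUIRED_KEYS = ("country", "city", "street")

def parse_location(location_str):
    """Parse the sensor-location JSON and return (country, city, street)."""
    location = json.loads(location_str)
    missing = [k for k in REQUIRED_KEYS if k not in location]
    if missing:
        raise KeyError(f"SENSOR_LOCATION_JSON is missing keys: {missing}")
    return tuple(location[k] for k in REQUIRED_KEYS)

# The value printed by the cell above
example = '{"country": "sweden", "city": "stockholm", "street": "stockholm-hornsgatan-108-gata"}'
parsed = parse_location(example)
print(parsed)  # ('sweden', 'stockholm', 'stockholm-hornsgatan-108-gata')
```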


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>


In [4]:
# Not used below: batch data is read directly from the 'weather' feature group
# feature_view = fs.get_feature_view(
#     name='air_quality_fv',
#     version=1,
# )

## <span style="color:#ff5f27;">🪝 Download the model from Model Registry</span>

In [5]:
mr = project.get_model_registry()

retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (1 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor from the model.json file in the downloaded model directory
# (an older pickle-based alternative: joblib.load(saved_model_dir + "/xgboost_regressor.pkl"))
retrieved_xgboost_model = XGBRegressor()

retrieved_xgboost_model.load_model(saved_model_dir + "/model.json")

# Displaying the retrieved XGBoost regressor model
retrieved_xgboost_model

## <span style="color:#ff5f27;">✨ Get Weather Forecast Features from the Feature Group</span>



In [25]:
weather_fg = fs.get_feature_group(
    name='weather',
    version=1,
)

batch_data = weather_fg.filter(weather_fg.date >= today).read()
batch_data

Finished: Reading data from Hopsworks, using ArrowFlight (0.44s) 


| | date | temperature_2m_mean | precipitation_sum | wind_speed_10m_max | wind_direction_10m_dominant | city |
|---|---|---|---|---|---|---|
| 0 | 2024-03-22 00:00:00+00:00 | 8.45 | 0.1 | 24.066206 | 248.039383 | stockholm |
| 1 | 2024-03-23 00:00:00+00:00 | 7.35 | 0.0 | 13.004921 | 265.23645 | stockholm |
| 2 | 2024-03-24 00:00:00+00:00 | 5.35 | 0.6 | 6.608722 | 60.642342 | stockholm |
| 3 | 2024-03-25 00:00:00+00:00 | 2.8 | 0.0 | 1.44 | 270.0 | stockholm |
| 4 | 2024-03-27 00:00:00+00:00 | 1.9 | 0.4 | 12.727921 | 28.739704 | stockholm |
| 5 | 2024-03-26 00:00:00+00:00 | 3.25 | 0.1 | 11.841756 | 160.463257 | stockholm |
| 6 | 2024-03-28 00:00:00+00:00 | 3.35 | 0.0 | 22.702845 | 345.302643 | stockholm |
| 7 | 2024-03-29 00:00:00+00:00 | 4.65 | 0.0 | 15.141414 | 18.004259 | stockholm |
| 8 | 2024-03-30 00:00:00+00:00 | 4.3 | 0.0 | 9.659814 | 63.435013 | stockholm |


In [None]:
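# Unused example cell: the 'sales' spine group (store keys, sale dates) is not part of
# the air-quality pipeline. get_or_create_spine_group presumably needs a DataFrame that
# contains the primary-key and event-time columns, which an empty DataFrame does not.
spine_df = pd.DataFrame()

spine_group = fs.get_or_create_spine_group(
    name="sales",
    version=1,
    description="Physical shop sales features",
    primary_key=['ss_store_sk'],
    event_time='sale_date',
    dataframe=spine_df,
)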

In [None]:
# batch_data = feature_view.get_batch_data(start_time=tomorrow, event_time=True, primary_key=True)
# pred_df = batch_data.drop(columns=['date'])
# print(feature_view.query.to_string())

### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [26]:
batch_data['predicted_pm25'] = retrieved_xgboost_model.predict(
    batch_data[['temperature_2m_mean', 'precipitation_sum', 'wind_speed_10m_max', 'wind_direction_10m_dominant']])
batch_data

| | date | temperature_2m_mean | precipitation_sum | wind_speed_10m_max | wind_direction_10m_dominant | city | predicted_pm25 |
|---|---|---|---|---|---|---|---|
| 0 | 2024-03-22 00:00:00+00:00 | 8.45 | 0.1 | 24.066206 | 248.039383 | stockholm | 18.452847 |
| 1 | 2024-03-23 00:00:00+00:00 | 7.35 | 0.0 | 13.004921 | 265.23645 | stockholm | 21.591589 |
| 2 | 2024-03-24 00:00:00+00:00 | 5.35 | 0.6 | 6.608722 | 60.642342 | stockholm | 46.81892 |
| 3 | 2024-03-25 00:00:00+00:00 | 2.8 | 0.0 | 1.44 | 270.0 | stockholm | 35.114464 |
| 4 | 2024-03-27 00:00:00+00:00 | 1.9 | 0.4 | 12.727921 | 28.739704 | stockholm | 25.125097 |
| 5 | 2024-03-26 00:00:00+00:00 | 3.25 | 0.1 | 11.841756 | 160.463257 | stockholm | 47.233841 |
| 6 | 2024-03-28 00:00:00+00:00 | 3.35 | 0.0 | 22.702845 | 345.302643 | stockholm | 17.074268 |
| 7 | 2024-03-29 00:00:00+00:00 | 4.65 | 0.0 | 15.141414 | 18.004259 | stockholm | 37.657612 |
| 8 | 2024-03-30 00:00:00+00:00 | 4.3 | 0.0 | 9.659814 | 63.435013 | stockholm | 44.778652 |


In [27]:
batch_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 7 columns):
 #   Column                       Non-Null Count  Dtype              
---  ------                       --------------  -----              
 0   date                         9 non-null      datetime64[us, UTC]
 1   temperature_2m_mean          9 non-null      float32            
 2   precipitation_sum            9 non-null      float32            
 3   wind_speed_10m_max           9 non-null      float32            
 4   wind_direction_10m_dominant  9 non-null      float32            
 5   city                         9 non-null      object             
 6   predicted_pm25               9 non-null      float32            
dtypes: datetime64[us, UTC](1), float32(5), object(1)
memory usage: 452.0+ bytes


### <span style="color:#ff5f27;">🤖 Saving the predictions (for monitoring) to a Feature Group</span>

In [None]:
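batch_data['street'] = street
batch_data['city'] = city
batch_data['country'] = country
# The feature-group read is not ordered by date, so sort before numbering rows
batch_data = batch_data.sort_values(by=['date']).reset_index(drop=True)
# Days between the forecast date and the day the forecast was made (base_date):
# 1 = tomorrow, 2 = the day after, and so on (assumes one row per consecutive day)
batch_data['days_before_forecast_day'] = range(1, len(batch_data)+1)
batch_data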
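`range(1, len(batch_data)+1)` labels rows by position, so it is only correct when the rows are in date order. The read above returned 2024-03-27 before 2024-03-26. The sketch below shows an alternative that derives the lead time from the dates themselves. It assumes the base date is the UTC day the notebook runs and that `date` is timezone-aware (mixing aware and naive timestamps would raise an error). The helper `lead_time_days` is hypothetical:

```python
import pandas as pd

def lead_time_days(dates, base_date):
    """Whole days between base_date and each forecast date (1 = tomorrow)."""
    base = pd.Timestamp(base_date).normalize()
    return (dates.dt.normalize() - base).dt.days

# Dates deliberately out of order, as the weather feature group may return them
df = pd.DataFrame({"date": pd.to_datetime(
    ["2024-03-25", "2024-03-27", "2024-03-26"], utc=True)})
df["days_before_forecast_day"] = lead_time_days(
    df["date"], pd.Timestamp("2024-03-21", tz="UTC"))
print(df)
```

With this approach, row order no longer matters. The labels also stay correct if a day is missing from the weather forecast.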

In [None]:
batch_data.info()

### Create Forecast Graph
Draw a graph of the predictions against their dates, save it as a PNG in the GitHub repo, and show it on GitHub Pages.
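The body of `util.plot_air_quality_forecast` is not shown in this notebook. The sketch below is a rough idea of such a chart, not its actual implementation. It assumes matplotlib and a DataFrame with `date` and `predicted_pm25` columns, plus `pm25` for a hindcast. The function `plot_pm25_sketch` is a hypothetical stand-in with the same call shape:

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_pm25_sketch(city, street, df, file_path, hindcast=False):
    """Hypothetical stand-in for util.plot_air_quality_forecast."""
    df = df.sort_values(by="date")
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(df["date"], df["predicted_pm25"], marker="o", label="Predicted PM2.5")
    if hindcast:
        # Overlay observed values when comparing past predictions with outcomes
        ax.plot(df["date"], df["pm25"], marker="x", label="Actual PM2.5")
    ax.set_title(f"PM2.5 for {city}, {street}")
    ax.set_xlabel("Date")
    ax.set_ylabel("PM2.5")
    ax.legend()
    fig.autofmt_xdate()  # tilt date labels so they do not overlap
    fig.savefig(file_path)
    return plt  # returned so callers can call .show(), as the notebook does
```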

In [None]:
file_path = "../../docs/air-quality/assets/img/pm25_forecast.png"
plt = util.plot_air_quality_forecast(city, street, batch_data, file_path)
plt.show()

In [None]:
# Get or create feature group
monitor_fg = fs.get_or_create_feature_group(
    name='aq_monitoring',
    description='Air Quality prediction monitoring',
    version=1,
    primary_key=['country','street','date', 'days_before_forecast_day'],
    event_time="date"
)

In [None]:
monitor_fg.insert(batch_data, wait=True)
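The primary key defined above includes `days_before_forecast_day`. A forecast for a given date made on one day and a forecast for the same date made on the next day therefore get different keys rather than colliding. The small pandas sketch below uses two hypothetical daily runs; the `make_run` helper and its rows are illustrative only. It checks that the four-column key stays unique while `date` alone does not:

```python
import pandas as pd

PRIMARY_KEY = ["country", "street", "date", "days_before_forecast_day"]

def make_run(base_date, n_days):
    """Hypothetical forecast run: one row per day after base_date."""
    base = pd.Timestamp(base_date, tz="UTC")
    return pd.DataFrame({
        "country": "sweden",
        "street": "stockholm-hornsgatan-108-gata",
        "date": [base + pd.Timedelta(days=d) for d in range(1, n_days + 1)],
        "days_before_forecast_day": list(range(1, n_days + 1)),
    })

# Two runs on consecutive days overlap on 2024-03-23 and 2024-03-24
runs = pd.concat([make_run("2024-03-21", 3), make_run("2024-03-22", 3)],
                 ignore_index=True)
print(runs.duplicated(subset=PRIMARY_KEY).any())  # False
print(runs.duplicated(subset=["date"]).any())     # True
```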

In [None]:
# air_quality_fg = fs.get_feature_group(
#     name='air_quality',
#     version=1,
# )
# a = air_quality_fg.read()
# a = a.sort_values(by=['date'])
# a

In [None]:
# Build the hindcast chart from only the forecasts made 1 day beforehand
monitoring_df = monitor_fg.filter(monitor_fg.days_before_forecast_day == 1).read()
monitoring_df

In [None]:
air_quality_fg = fs.get_feature_group(
    name='air_quality',
    version=1,
)
air_quality_df = air_quality_fg.read()
air_quality_df

In [None]:
outcome_df = air_quality_df[['date', 'pm25']]
preds_df = monitoring_df[['date', 'predicted_pm25']]

hindcast_df = pd.merge(preds_df, outcome_df, on="date")
hindcast_df = hindcast_df.sort_values(by=['date'])
hindcast_df
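`pd.merge` performs an inner join by default. The hindcast therefore contains only dates that have both a 1-day-ahead prediction and an observed `pm25` value, and forecasts for days that have not happened yet drop out. The sketch below uses hypothetical values (not real data) and names chosen so they do not overwrite the notebook's variables. It shows the inner join and an outer join with `indicator=True` that reveals which dates were dropped:

```python
import pandas as pd

# Hypothetical 1-day-ahead predictions and observations (not real data)
preds_example = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-20", "2024-03-21", "2024-03-22"], utc=True),
    "predicted_pm25": [30.0, 25.0, 18.5],
})
outcome_example = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-19", "2024-03-20", "2024-03-21"], utc=True),
    "pm25": [40.0, 28.0, 27.0],
})

# Inner join (the default): keeps only dates present in both frames
hindcast_example = pd.merge(preds_example, outcome_example, on="date").sort_values(by="date")

# Outer join with an indicator column: rows not marked "both" were dropped above
audit = pd.merge(preds_example, outcome_example, on="date", how="outer", indicator=True)
print(hindcast_example)
print(audit[audit["_merge"] != "both"])
```

Both `date` columns need the same timezone handling. pandas refuses to merge a timezone-aware datetime key with a naive one.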

### Plot the hindcast comparing predicted with actual values (1-day prior forecast)

In [None]:
file_path = "../../docs/air-quality/assets/img/pm25_hindcast_1day.png"
plt = util.plot_air_quality_forecast(city, street, hindcast_df, file_path, hindcast=True)
plt.show()

---