In [2]:
from feast import FeatureStore
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from joblib import dump

In [5]:
# Getting our FeatureStore
store = FeatureStore(repo_path=".")

In [7]:
# Retrieving the saved dataset and converting it to a DataFrame
training_df = store.get_saved_dataset(name="solar_power_dataset").to_df()



In [8]:
# Separating the features and labels
labels = training_df['TARGET']
features = training_df.drop(
    labels=['TARGET', 'event_timestamp', "DATA_ID", "SOURCE_KEY"],
    axis=1)

# Splitting the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features,
                                                    labels,
                                                    stratify=labels)

In [9]:
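# Creating and training LogisticRegression
reg = LogisticRegression()
# Columns are sorted by name so the feature order is deterministic;
# the same order must be used when calling the model at inference time
reg.fit(X=X_train[sorted(X_train)], y=y_train)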

# Saving the model
dump(value=reg, filename="model.joblib")

['model.joblib']

Once we have our model, we can use it for inference. Rather than loading the inference data directly from our .parquet files, we can have Feast fetch the latest feature values from them and save them in the online store of our feature repository. Reading from the online store enables low-latency prediction.

In a local environment, the performance difference between online and offline inference may be negligible. But if the source data is stored remotely, for example in a GCP bucket or in AWS cloud storage, the difference can be very noticeable.

Feast will fetch the latest feature values and store them in ./data/online_store.db.
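
To make "latest feature values" concrete, here is a minimal pure-Python sketch of the idea: for each entity, only the most recent row is kept for fast lookup. This is a conceptual illustration, not Feast's implementation. It assumes DATA_ID is the entity key, and the helper latest_per_entity, the feature name DC_POWER, and the sample values are all hypothetical.

```python
from datetime import datetime


def latest_per_entity(rows, entity_key="DATA_ID", ts_key="event_timestamp"):
    """Keep only the most recent row for each entity (hypothetical helper)."""
    latest = {}
    for row in rows:
        key = row[entity_key]
        # Replace the stored row only if this one is newer
        if key not in latest or row[ts_key] > latest[key][ts_key]:
            latest[key] = row
    return latest


# Hypothetical offline rows; DC_POWER and its values are illustrative only
offline_rows = [
    {"DATA_ID": 1, "event_timestamp": datetime(2020, 5, 15, 0, 0), "DC_POWER": 0.0},
    {"DATA_ID": 1, "event_timestamp": datetime(2020, 5, 15, 12, 0), "DC_POWER": 7500.0},
    {"DATA_ID": 2, "event_timestamp": datetime(2020, 5, 15, 6, 0), "DC_POWER": 1200.0},
]

# One row per entity, keyed by entity ID, ready for constant-time lookup
online_view = latest_per_entity(offline_rows)
print(online_view[1]["DC_POWER"])
```

Because the lookup structure holds a single row per entity, serving a prediction request needs only a key lookup instead of a scan over the full history, which is where the latency benefit comes from.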