In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =====

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

## 4. Building Multi-Stage Recommender Systems

Recommender systems (RecSys) are the engine of the modern internet and the catalyst for human decisions. Building a recommender system is challenging because it requires multiple stages (data preprocessing, offline training, item retrieval, filtering, ranking, ordering, etc.) to work together seamlessly and efficiently. The biggest challenges for new practitioners are the lack of understanding of what RecSys look like in the real world, and the gap between examples of simple models and a production-ready, end-to-end recommender system.

The figure below represents a four-stage recommender system. This is a more complex process than training and deploying a single model, and it is much closer to how recommender systems run in production.

<img src="./images/fourstage.png" width=800 height=400 />

In this lab, we are going to showcase how we can easily deploy a multi-stage recommender system on Triton Inference Server using the Merlin Systems library. Let's briefly go over the concepts in the figure.

- **Retrieval:** This step narrows down millions of items to thousands of candidates. We are going to train a Two-Tower item retrieval model to retrieve the relevant top-K candidate items.
- **Filtering:** This step excludes already-interacted or undesirable items from the candidate set, or applies business logic rules. Although this is an important step, we skip it in this example.
- **Scoring:** This is also known as ranking. Here the retrieved and filtered candidate items are scored. We are going to train a ranking model to use at the scoring step.
- **Ordering:** At this stage, we order the final set of items that we want to recommend to the user. Here, we can align the output of the model with business needs, constraints, or criteria.
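
To make the four stages concrete, here is a minimal NumPy sketch of the flow. It is only an illustration under stated assumptions: the embeddings are random toy data, the function names are hypothetical, and the "ranking model" is a stand-in sigmoid over a dot product rather than the models we train in this lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1,000 items with 16-dim embeddings and a single user.
n_items, dim = 1000, 16
item_embeddings = rng.normal(size=(n_items, dim))
user_embedding = rng.normal(size=dim)
already_seen = {3, 17, 42}  # item ids the user has already interacted with


def retrieve(user_vec, item_matrix, k):
    """Retrieval: keep the k items with the highest dot-product score."""
    scores = item_matrix @ user_vec
    return np.argsort(-scores)[:k]


def filter_candidates(candidates, seen):
    """Filtering: drop items the user has already interacted with."""
    return np.array([i for i in candidates if i not in seen])


def score(candidates, user_vec, item_matrix):
    """Scoring: stand-in for a ranking model (sigmoid of the dot product)."""
    logits = item_matrix[candidates] @ user_vec
    return 1.0 / (1.0 + np.exp(-logits))


def order(candidates, scores, top_k):
    """Ordering: sort by score and keep the final top_k recommendations."""
    idx = np.argsort(-scores)[:top_k]
    return candidates[idx], scores[idx]


candidates = retrieve(user_embedding, item_embeddings, k=100)
candidates = filter_candidates(candidates, already_seen)
scores = score(candidates, user_embedding, item_embeddings)
recommended, recommended_scores = order(candidates, scores, top_k=10)
print(recommended)
```

In a real system each function would be backed by its own component (an ANN index, business rules, a trained ranking model), but the data flow between the stages stays the same.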

To learn more about the four-stage recommender systems, you can listen to Even Oldridge's [Moving Beyond Recommender Models talk at KDD'21](https://www.youtube.com/watch?v=5qjiY-kLwFY&list=PL65MqKWg6XcrdN4TJV0K1PdLhF_Uq-b43&index=8) and read more in this [blog post](https://eugeneyan.com/writing/system-design-for-discovery/).

**Learning Objectives**

- Train a ranking and a retrieval model with [Merlin Models](https://github.com/NVIDIA-Merlin/models)
- Export the user query tower, user and item features, and item embeddings
- Create a feature store with Feast and register features to the feature repo

**GOAL:** In this lab, we build and deploy a multi-stage recommender system that predicts relevance scores for candidate items and then recommends the top-k most relevant items for a given user.

**Import Required Libraries**

In [2]:
import os

import glob
import cudf 
import pandas as pd
import numpy as np
import nvtabular as nvt
from nvtabular.ops import Filter
import gc
from datetime import datetime

from merlin.schema.tags import Tags
import merlin.models.tf as mm
from merlin.io.dataset import Dataset
from merlin.models.utils.dataset import unique_rows_by_features

import tensorflow as tf

2022-09-07 18:32:34.796091: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-07 18:32:36.950878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16255 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB-LS, pci bus id: 0000:8a:00.0, compute capability: 7.0


In [3]:
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

In [4]:
data_path = '/workspace/data/ecom/'
output_path = os.path.join(data_path,'processed_nvt')
output_path2 = os.path.join(data_path,'processed_filtered')
BASE_DIR = os.environ.get(
    "BASE_DIR", "/workspace/recsys_tutorial/"
)

Read processed parquet files as Dataset objects.

In [5]:
train_raw = Dataset(os.path.join(output_path, "train", "*.parquet"), part_size="500MB")
valid_raw = Dataset(os.path.join(output_path, "valid", "*.parquet"), part_size="500MB")



**Filter out the negative rows**

Here, we filter our datasets with the NVTabular `Filter()` operator to keep only the positive interaction rows, where `target==1`. We do that because we want to use the negative sampling technique when training our candidate retrieval and ranking models.
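
As a small illustration of what this row filter does, the sketch below applies the same `target == 1` condition to a toy pandas dataframe. The data is made up for this example; in the lab the filter runs through NVTabular on the full dataset.

```python
import pandas as pd

# Hypothetical toy interactions: target is 1 for a positive interaction, 0 otherwise.
interactions = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3],
        "product_id": [10, 11, 10, 12],
        "target": [1, 0, 1, 0],
    }
)

# Keep only the positive rows, mirroring the lambda passed to Filter().
positives = interactions[interactions["target"] == 1].reset_index(drop=True)
print(positives)
```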

In [6]:
inputs = train_raw.schema.column_names
outputs = inputs >> Filter(f=lambda df: df["target"] == 1)

In [7]:
workflow2 = nvt.Workflow(outputs)

workflow2.fit(train_raw)

workflow2.transform(train_raw).to_parquet(
    output_path=os.path.join(output_path2, "train")
)

workflow2.transform(valid_raw).to_parquet(
    output_path=os.path.join(output_path2, "valid")
)

In [8]:
workflow2.save(os.path.join(output_path2, "workflow2"))

**Read filtered parquet files as Dataset objects.**

In [9]:
train = Dataset(os.path.join(output_path2, "train", "*.parquet"), part_size="500MB")
valid = Dataset(os.path.join(output_path2, "valid", "*.parquet"), part_size="500MB")

In [10]:
train.schema

Unnamed: 0,name,tags,dtype,is_list,is_ragged,properties.num_buckets,properties.freq_threshold,properties.max_size,properties.start_index,properties.cat_path,properties.embedding_sizes.cardinality,properties.embedding_sizes.dimension,properties.domain.min,properties.domain.max
0,user_id,"(Tags.USER, Tags.USER_ID, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.user_id.parquet,350630.0,512.0,0.0,350629.0
1,ts_weekday,"(Tags.USER, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.ts_weekday.parquet,8.0,16.0,0.0,7.0
2,ts_hour,"(Tags.USER, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.ts_hour.parquet,25.0,16.0,0.0,24.0
3,product_id,"(Tags.CATEGORICAL, Tags.ITEM_ID, Tags.ITEM)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.product_id.parquet,51376.0,512.0,0.0,51375.0
4,cat_0,"(Tags.CATEGORICAL, Tags.ITEM)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.cat_0.parquet,14.0,16.0,0.0,13.0
5,cat_1,"(Tags.CATEGORICAL, Tags.ITEM)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.cat_1.parquet,61.0,16.0,0.0,60.0
6,cat_2,"(Tags.CATEGORICAL, Tags.ITEM)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.cat_2.parquet,90.0,20.0,0.0,89.0
7,brand,"(Tags.CATEGORICAL, Tags.ITEM)",int32,False,False,,0.0,0.0,0.0,.//categories/unique.brand.parquet,2653.0,132.0,0.0,2652.0
8,price,"(Tags.CONTINUOUS, Tags.ITEM)",float32,False,False,,,,,,,,,
9,relative_price,"(Tags.CONTINUOUS, Tags.ITEM)",float32,False,False,,,,,,,,,


### 4.1. Building and Training a Candidate Retrieval Model with Merlin Models

Industrial recommender systems must meet demanding requirements. One is to deliver recommendations within the expected latency (e.g., within milliseconds) to ensure a good user experience, which can require a significant amount of creativity and engineering. Another is to minimize infrastructure costs while meeting that latency target, which is yet another obstacle to overcome!

In large-scale recommender system pipelines, the size of the item catalog (number of unique items) might be on the order of millions or billions. At such scale, a typical setup is a two-stage pipeline, where a faster candidate retrieval model quickly extracts thousands of relevant items and then a more powerful ranking model (i.e., with more features and a more powerful architecture) ranks the top-k items that are going to be displayed to the user. Therefore, industrial recommender systems usually consist of candidate retrieval and ranking (scoring) stages. The candidate retrieval stage retrieves candidate items that are relevant to the user's interests, while the ranking stage sorts the candidate items by those interests.

In this notebook, we start with the first stage of a multi-stage recommender system: candidate retrieval. Because an ML-based candidate retrieval model needs to quickly score millions of items for a given user, popular choices are models that produce recommendation scores by just computing the dot product of the user embeddings and item embeddings. Examples are `Matrix Factorization (MF)`, which learns low-rank user and item embeddings, and the `Two-Tower architecture`, a neural network with two MLP towers, one fed with user features and one with item features, that output user and item embeddings. Such models can be served efficiently by indexing the trained item embeddings in an Approximate Nearest Neighbors (ANN) engine and, during inference, scoring the user embedding against the indexed item embeddings within the engine.
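
The dot-product scoring described above can be sketched with an exact (brute-force) top-k search, which is the baseline that an ANN engine approximates. This is a minimal sketch with random embeddings; the function name `top_k_by_dot_product` and the sizes are assumptions made for illustration.

```python
import numpy as np


def top_k_by_dot_product(user_vec, item_matrix, k):
    """Return the ids and scores of the k items with the largest dot product."""
    scores = item_matrix @ user_vec
    # argpartition finds the k best items without sorting the whole catalog.
    top = np.argpartition(-scores, k - 1)[:k]
    # Sort only those k items by descending score.
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


rng = np.random.default_rng(42)
item_matrix = rng.normal(size=(50_000, 64))  # toy item embeddings
user_vec = rng.normal(size=64)               # toy user embedding

top_ids, top_scores = top_k_by_dot_product(user_vec, item_matrix, k=10)
print(top_ids)
```

The exact search touches every item for every query, so its cost grows linearly with the catalog size. An ANN index trades a small amount of recall for much lower latency on large catalogs.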

#### Two-Tower Model

We are going to train a Two-Tower model for item retrieval. A Two-Tower Model consists of item (candidate) and user (query) encoder towers. With two towers, the model can learn representations (embeddings) for queries and candidates separately.


<img src="./images/twotower.png" width=400 height=400 />

**Negative Sampling** <br>


Many datasets for recommender systems contain implicit feedback, with logs of user interactions like clicks, add-to-cart events, purchases, or music listening events, rather than explicit ratings that reflect user preferences over items. To be able to learn from implicit feedback, we use the general (and naive) assumption that the interacted items are more relevant for the user than the non-interacted ones. Merlin Models provides some scalable negative sampling algorithms for the item retrieval task. In this example, we use the in-batch sampling algorithm, which uses the items that other users in the same mini-batch interacted with as negatives.

Now, let's build our Two-Tower model. In a nutshell, we aggregate all user features and feed them into the user tower, and feed the item features into the item tower. Then we compute the positive score by multiplying the user embedding with the item embedding, and sample negative items (read more about negative sampling [here](https://openreview.net/pdf?id=824xC-SgWgU) and [here](https://medium.com/mlearning-ai/overview-negative-sampling-on-recommendation-systems-230a051c6cd7)), whose item embeddings are also multiplied by the user embedding. Finally, we apply the loss function on top of the positive and negative scores.
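
The idea of in-batch negatives with a softmax (categorical cross-entropy) loss can be sketched in a few lines of NumPy. This is a simplified illustration, not the Merlin Models implementation: it assumes row `i` of the user embeddings is paired with row `i` of the item embeddings, treats every other item in the batch as a negative, and ignores refinements such as removing accidental false negatives.

```python
import numpy as np


def in_batch_softmax_loss(user_emb, item_emb):
    """Softmax cross-entropy where the diagonal holds the positive scores."""
    logits = user_emb @ item_emb.T  # (batch, batch) score matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Entry (i, i) is the positive pair; the rest of row i are in-batch negatives.
    return float(-np.mean(np.diag(log_probs)))


item_emb = np.eye(4)                 # toy item embeddings
aligned_users = 5.0 * np.eye(4)      # each user matches its own item
shuffled_users = np.roll(aligned_users, shift=1, axis=0)  # each user matches a wrong item

loss_aligned = in_batch_softmax_loss(aligned_users, item_emb)
loss_shuffled = in_batch_softmax_loss(shuffled_users, item_emb)
print(loss_aligned, loss_shuffled)
```

The loss is low when each user embedding scores its own item higher than the other items in the batch, and high otherwise, which is the signal that pushes the two towers to produce compatible embeddings.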

In [11]:
schema = train.schema
schema = schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['event_time_ts', 'user_id_raw', 'product_id_raw'])

In [12]:
model_tt = mm.TwoTowerModel(
    schema,
    query_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True),
    samplers=[mm.InBatchSampler()],
    embedding_options=mm.EmbeddingOptions(infer_embedding_sizes=True),
)

In [13]:
%%time
model_tt.compile(
    optimizer="adam",
    run_eagerly=False,
    loss="categorical_crossentropy",
    metrics=[mm.RecallAt(10), mm.NDCGAt(10)],
)
model_tt.fit(train, validation_data=valid, batch_size=1024 * 8, epochs=2)

Epoch 1/2
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


The sampler InBatchSampler returned no samples for this batch.




The sampler InBatchSampler returned no samples for this batch.


Epoch 2/2
CPU times: user 1min 48s, sys: 5.88 s, total: 1min 54s
Wall time: 36.2 s


<keras.callbacks.History at 0x7f45ab659b20>

#### 4.1.1. Exporting query (user) model

We export the query tower to use it later during the model deployment stage with Merlin Systems.

In [14]:
query_tower = model_tt.retrieval_block.query_block()
query_tower.save(os.path.join(BASE_DIR, "query_tower"))



INFO:tensorflow:Assets written to: /workspace/recsys_tutorial/query_tower/assets




### 4.2. Training a Ranking Model

In this section, we train a DLRM architecture as our ranking (scoring) model, using the negative sampling technique.

<img src="./images/DLRM.png" width=400 height=400 />

Define the schema object and remove the raw ID and timestamp columns from the schema.

In [15]:
schema = train.schema.without(['event_time_ts', 'user_id_raw', 'product_id_raw'])
target_column = schema.select_by_tag(Tags.TARGET).column_names[0]
target_column

'target'

Here we train the ranking model with negative sampling, as we did for the Two-Tower model. This time, we use the `UniformNegativeSampling` class.

It augments each batch of positive interactions with `n_per_positive` negatives per positive, sampled from the same batch.

In [16]:
from merlin.models.tf.data_augmentation.negative_sampling import UniformNegativeSampling
from merlin.models.tf.dataset import BatchedDataset

# do negative sampling on the fly
batch_size, n_per_positive = 2048, 64
add_negatives = UniformNegativeSampling(schema, n_per_positive, seed=42, return_tuple=True)
dataset = BatchedDataset(train, batch_size=batch_size, shuffle=True)
dataset = dataset.map(add_negatives)

We can see that our train dataset only has positive interactions.

In [17]:
train.to_ddf().compute().target.value_counts()

1    1474912
Name: target, dtype: int32

After negative sampling, we can check a batch and see that negatives are added to the positive interactions. The batch has at most `batch_size + batch_size * n_per_positive` rows; here it is slightly smaller.
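
The upper bound on the batch length is simple arithmetic, worked out below with the values from the cell above and the batch length printed in the next cell. Why the sampler returns fewer rows is not stated in this lab; one plausible explanation (an assumption here) is that sampled pairs that coincide with real positive interactions are discarded.

```python
batch_size, n_per_positive = 2048, 64

# Every positive row can contribute up to n_per_positive sampled negatives.
max_rows = batch_size + batch_size * n_per_positive
observed_rows = 132722  # length of the batch printed in the next cell

print(max_rows)                  # 133120
print(max_rows - observed_rows)  # 398
```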

In [18]:
inputs, target = next(iter(dataset))
print(target)

tf.Tensor(
[[1]
 [1]
 [1]
 ...
 [0]
 [0]
 [0]], shape=(132722, 1), dtype=int32)


In [19]:
model = mm.DLRMModel(
    schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask(target_column),
)

In [20]:
%%time
model.compile(optimizer='adam', run_eagerly=False, metrics=[], 
              weighted_metrics=[tf.keras.metrics.BinaryAccuracy(),tf.keras.metrics.AUC()]
             )
model.fit(dataset, epochs=2, class_weight = {0: 1, 1: n_per_positive}, train_metrics_steps=100)

Epoch 1/2
Epoch 2/2
CPU times: user 2min 47s, sys: 11 s, total: 2min 58s
Wall time: 2min 17s


<keras.callbacks.History at 0x7f45a9eef760>

We used the `class_weight` argument to penalize misclassification of the minority class (actual positive interactions) more heavily, so the model doesn't become biased towards the majority class (sampled negatives).
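
To see the effect of class weights, here is a minimal NumPy sketch of a weighted binary cross-entropy. It is an illustration only: the helper `weighted_bce` is hypothetical, and it normalizes by the total weight for readability, which is not necessarily how Keras reduces the loss internally.

```python
import numpy as np


def weighted_bce(y_true, y_pred, class_weight):
    """Binary cross-entropy where each example is weighted by its class."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-7, 1 - 1e-7)
    per_example = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    weights = np.where(y_true == 1, class_weight[1], class_weight[0])
    return float(np.sum(weights * per_example) / np.sum(weights))


# One positive and 64 sampled negatives; the model predicts 0.1 for every row.
y_true = np.array([1] + [0] * 64)
y_pred = np.full(65, 0.1)

unweighted = weighted_bce(y_true, y_pred, {0: 1, 1: 1})
weighted = weighted_bce(y_true, y_pred, {0: 1, 1: 64})
print(unweighted, weighted)
```

Without weights, always predicting a low score looks cheap because the single missed positive is averaged with 64 easy negatives. With a weight of 64 on the positive class, that one mistake counts as much as all the negatives together, so the loss is much higher.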

In [21]:
valid_dataset = BatchedDataset(valid, shuffle=False, batch_size = batch_size)
valid_dataset = valid_dataset.map(add_negatives)
metrics = model.evaluate(valid_dataset, return_dict=True)
metrics



{'loss': 0.8333698511123657,
 'binary_accuracy': 0.6017794013023376,
 'auc': 0.8256058096885681,
 'regularization_loss': 0.0}

In [22]:
model.save(os.path.join(BASE_DIR, "dlrm"))



INFO:tensorflow:Assets written to: /workspace/recsys_tutorial/dlrm/assets




### 4.3. Set up a feature store with Feast

In [23]:
!rm -rf $BASE_DIR/feature_repo
!cd $BASE_DIR && feast init feature_repo


Creating a new Feast repository in /workspace/recsys_tutorial/feature_repo.



[Feast](https://docs.feast.dev/) is an end-to-end open source feature store for machine learning. Feast (Feature Store) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to real-time models.

The cell above creates a new Feast repository called `feature_repo` under `BASE_DIR`.

You should see a message like `Creating a new Feast repository in ...` printed above. Now, we navigate to the `feature_repo` folder and remove the demo parquet file and the `example.py` file that are created by default.

In [24]:
feature_repo_path = os.path.join(BASE_DIR, "feature_repo")
if os.path.exists(f"{feature_repo_path}/example.py"):
    os.remove(f"{feature_repo_path}/example.py")
if os.path.exists(f"{feature_repo_path}/data/driver_stats.parquet"):
    os.remove(f"{feature_repo_path}/data/driver_stats.parquet")

In [25]:
user_features = (
    unique_rows_by_features(train_raw, [Tags.USER,Tags.TIME], Tags.USER_ID)
    .compute()
    .reset_index(drop=True)
)

**unique_rows_by_features**: A utility function that lets us easily extract unique user and item feature tables as cuDF dataframes. It extracts unique rows from a given dataset (`train_raw`) based on an ID-column tag (`Tags.USER_ID`); the features to return are selected by `features_tag` (`[Tags.USER, Tags.TIME]`).
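
Conceptually, this is similar to selecting the feature columns and keeping one row per ID. The pandas sketch below illustrates that idea on toy data; it is not the Merlin implementation, and which row is kept for each ID (here the first one) is an assumption of this example.

```python
import pandas as pd

# Hypothetical toy interactions with repeated users.
interactions = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "ts_weekday": [2, 2, 4, 4, 1],
        "ts_hour": [6, 6, 9, 9, 8],
        "product_id": [10, 11, 10, 12, 13],
    }
)

# Select the user-side columns, then keep a single row per user_id.
user_cols = ["user_id", "ts_weekday", "ts_hour"]
user_table = (
    interactions[user_cols]
    .drop_duplicates(subset="user_id", keep="first")
    .reset_index(drop=True)
)
print(user_table)
```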

In [26]:
user_features["datetime"] = user_features["event_time_ts"].astype("datetime64[ns]")
user_features["created"] = datetime.now()
user_features["created"] = user_features["created"].astype("datetime64[ns]")

In [27]:
user_features = user_features.drop(columns=['event_time_ts'])

In [28]:
user_features.head()

Unnamed: 0,user_id,ts_weekday,ts_hour,user_id_raw,datetime,created
0,1,2,6,478741761,2020-03-15 11:47:05,2022-09-07 18:36:20.526028
1,2,4,6,512402665,2020-01-13 11:47:29,2022-09-07 18:36:20.526028
2,3,2,9,512416542,2020-03-29 13:19:05,2022-09-07 18:36:20.526028
3,4,4,8,512454459,2020-03-23 12:35:18,2022-09-07 18:36:20.526028
4,5,2,6,512487885,2020-03-29 11:14:14,2022-09-07 18:36:20.526028


In [29]:
user_features.to_parquet(
    os.path.join(BASE_DIR, "feature_repo/data", "user_features.parquet")
)

In [30]:
item_features = (
    unique_rows_by_features(train_raw, [Tags.ITEM, Tags.TIME], Tags.ITEM_ID)
    .compute()
    .reset_index(drop=True)
)

In [31]:
item_features["datetime"] = item_features["event_time_ts"].astype("datetime64[ns]")
item_features["created"] = datetime.now()
item_features["created"] = item_features["created"].astype("datetime64[ns]")

In [32]:
item_features = item_features.drop(columns=['event_time_ts'])

In [33]:
item_features.head(2)

Unnamed: 0,product_id,cat_0,cat_1,cat_2,brand,price,relative_price,TE_user_id_target,TE_brand_target,TE_cat_1_target,TE_cat_2_target,product_id_raw,datetime,created
0,1,1,1,1,1,0.472491,0.061859,0.434125,0.530555,0.574812,0.910135,1004767,2020-03-31 16:24:49,2022-09-07 18:36:21.478107
1,2,1,1,1,2,1.546095,0.09492,-0.848214,0.431139,0.572524,0.908703,1005115,2020-03-31 14:33:00,2022-09-07 18:36:21.478107


In [34]:
# save to disk
item_features.to_parquet(
    os.path.join(BASE_DIR, "feature_repo/data", "item_features.parquet")
)

In [35]:
item_embs = model_tt.item_embeddings(
    Dataset(item_features, schema=schema), batch_size=1024
)
item_embs_df = item_embs.compute(scheduler="synchronous")



INFO:tensorflow:Assets written to: /tmp/tmpsi2hxblk/assets








In [36]:
# select only item_id together with embedding columns
item_embeddings = item_embs_df.drop(
    columns=['cat_0', 'cat_1', 'cat_2', 'brand', 'price',
       'relative_price', 'TE_user_id_target', 'TE_brand_target',
       'TE_cat_1_target', 'TE_cat_2_target']
)

In [37]:
item_embeddings.head(2)

Unnamed: 0,product_id,0,1,2,3,4,5,6,7,8,...,54,55,56,57,58,59,60,61,62,63
0,1,0.199455,-0.737148,-0.806116,0.673871,0.739071,0.20846,0.049455,-0.641033,0.23702,...,-0.78319,-0.52361,-0.795948,0.487675,-0.72492,-0.519836,-0.202461,0.449809,-1.110398,-0.467047
1,2,-0.172654,-0.519757,-1.771691,0.043944,-0.142695,-0.614831,0.644453,0.315116,1.239898,...,0.645124,0.017929,-0.412314,-0.921249,0.271537,-0.30716,0.632588,-0.344432,-2.812758,-1.156532


In [38]:
# save to disk
item_embeddings.to_parquet(os.path.join(BASE_DIR, "item_embeddings.parquet"))

### 4.4. Create feature definitions

Now we create our user and item feature definitions in the `user_features.py` and `item_features.py` files and save these files in the `feature_repo` folder.

In [39]:
file = open(os.path.join(BASE_DIR, "feature_repo/", "user_features.py"), "w")
file.write(
    """
from google.protobuf.duration_pb2 import Duration
import datetime
from feast import Entity, Feature, FeatureView, ValueType
from feast.infra.offline_stores.file_source import FileSource

user_features = FileSource(
    path="{}",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

user_raw = Entity(name="user_id_raw", value_type=ValueType.INT32, description="user id raw",)

user_features_view = FeatureView(
    name="user_features",
    entities=["user_id_raw"],
    ttl=Duration(seconds=86400 * 7),
    features=[
        Feature(name="ts_weekday", dtype=ValueType.INT32),
        Feature(name="ts_hour", dtype=ValueType.INT32),
        Feature(name="user_id", dtype=ValueType.INT32),
    ],
    online=True,
    input=user_features,
    tags=dict(),
)
""".format(
        os.path.join(BASE_DIR, "feature_repo/data/", "user_features.parquet")
    )
)
file.close()

In [40]:
with open(os.path.join(BASE_DIR, "feature_repo/", "item_features.py"), "w") as f:
    f.write(
        """
from google.protobuf.duration_pb2 import Duration
import datetime
from feast import Entity, Feature, FeatureView, ValueType
from feast.infra.offline_stores.file_source import FileSource

item_features = FileSource(
    path="{}",
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

item = Entity(name="product_id", value_type=ValueType.INT32, description="product id",)

item_features_view = FeatureView(
    name="item_features",
    entities=["product_id"],
    ttl=Duration(seconds=86400 * 7),
    features=[
        Feature(name="cat_0", dtype=ValueType.INT32),
        Feature(name="cat_1", dtype=ValueType.INT32),
        Feature(name="cat_2", dtype=ValueType.INT32),
        Feature(name="brand", dtype=ValueType.INT32),
        Feature(name="price", dtype=ValueType.FLOAT),
        Feature(name="relative_price", dtype=ValueType.FLOAT),
        Feature(name="TE_user_id_target", dtype=ValueType.FLOAT),
        Feature(name="TE_brand_target", dtype=ValueType.FLOAT),
        Feature(name="TE_cat_1_target", dtype=ValueType.FLOAT),
        Feature(name="TE_cat_2_target", dtype=ValueType.FLOAT),
        Feature(name="product_id_raw", dtype=ValueType.INT32),
    ],
    online=True,
    input=item_features,
    tags=dict(),
)
""".format(
            os.path.join(BASE_DIR, "feature_repo/data/", "item_features.parquet")
        )
    )

### Summary 

In this hands-on lab, we learned how to:
- train a Two-Tower model as the candidate retrieval model using the negative sampling technique
- train a DLRM model as the ranking model using the negative sampling technique
- export user features, item features, and item embeddings and save them
- set up a feature store with the open-source tool Feast and create feature definitions

For the first three steps, we used the [Merlin Models](https://github.com/NVIDIA-Merlin/models) library. Now we are ready to move on to our final lab, where we will build an ensemble graph and deploy multiple models as an ensemble to [Triton Inference Server (TIS)](https://github.com/triton-inference-server/server) using the [Merlin Systems](https://github.com/NVIDIA-Merlin/systems) library.

Please execute the cell below to shut down the kernel before moving on to the next notebook, `05-Deploying-multi-stage-RecSys-with-Merlin-Systems`.

In [41]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}