Merge branch 'refactor-etc-dep-minseokl' into 'main'

High-level Deprecation of ETC

See merge request dl/hugectr/hugectr!1479
NVIDIA-Merlin · Sep 27, 2023 · 969a9c1
2 parents d22df87 + c1cb00b

Showing 5 changed files with 14 additions and 4 deletions.
diff --git a/HugeCTR/src/embedding_training_cache/embedding_training_cache_impl.cpp b/HugeCTR/src/embedding_training_cache/embedding_training_cache_impl.cpp
@@ -42,7 +42,10 @@ EmbeddingTrainingCacheImpl<TypeKey>::EmbeddingTrainingCacheImpl(
     : embeddings_(embeddings),
       ps_manager_(ps_types, sparse_embedding_files, get_embedding_type(embeddings),
                   embedding_params, get_max_embedding_size_(), resource_manager, local_paths,
-                  hmem_cache_configs) {}
+                  hmem_cache_configs) {
+  HCTR_LOG_S(WARNING, WORLD) << "EmbeddingTrainingCache will be deprecated in a future release"
+                             << std::endl;
+}
 
 template <typename TypeKey>
 void EmbeddingTrainingCacheImpl<TypeKey>::load_(std::vector<std::string>& keyset_file_list) {

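Note on the constructor change above: it logs a warning every time an `EmbeddingTrainingCacheImpl` is constructed. For readers who reach the feature through the Python API, the same warn-on-construction pattern can be sketched in plain Python as below. This is only an illustration under stated assumptions: `LegacyCache` and `build_and_capture` are hypothetical names, not part of HugeCTR, and the sketch uses the standard `warnings` module rather than HugeCTR's logger.

```python
import warnings


class LegacyCache:
    """Hypothetical stand-in for a feature scheduled for deprecation."""

    def __init__(self, name):
        # Warn on every construction, mirroring the C++ constructor in the diff.
        warnings.warn(
            f"{name} will be deprecated in a future release",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        self.name = name


def build_and_capture(name):
    """Construct a LegacyCache and return the warning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often filtered out by default; show it here.
        warnings.simplefilter("always")
        LegacyCache(name)
    return [str(w.message) for w in caught]
```

Calling `build_and_capture("EmbeddingTrainingCache")` returns the list of messages raised during construction, which is a convenient way to check in a test that the warning is actually emitted.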
diff --git a/docs/source/api/python_interface.md b/docs/source/api/python_interface.md
@@ -99,7 +99,9 @@ solver = hugectr.CreateSolver(max_eval_batches = 300,
 
 ***
 
-#### CreateETC method
+#### CreateETC method (deprecated)
+
+**Warning**: this method will be deprecated in a future release.
 
 ```python
 hugectr.CreateETC()

diff --git a/docs/source/hugectr_embedding_training_cache.md b/docs/source/hugectr_embedding_training_cache.md
@@ -1,4 +1,4 @@
-# HugeCTR Embedding Training Cache
+# HugeCTR Embedding Training Cache (Deprecated)
 
 ```{contents}
 ---
@@ -10,6 +10,8 @@ backlinks: none
 
 ## Introduction to the HugeCTR Embedded Training Cache
 
+**Warning**: this feature will be deprecated in a future release.
+
 This document introduces the **Embedding Training Cache (ETC)** feature in HugeCTR for incremental training. The ETC allows training models with huge embedding tables that exceed the available GPU memory in size.
 
 Normally, the maximum model size in HugeCTR is limited by the hardware resources. A model with larger embedding tables will of course require more GPU memory. However, the amount of GPU's and, therefore, also the amount of GPU memory that can be fit into a single machine or a cluster is finite. This naturally upper-bounds the size of the models that can be executed in a specific setup. The ETC feature is designed to ease this restriction by prefetching portions of the embedding table to the GPU in the granularity of pass as they are required.

diff --git a/notebooks/README.md b/notebooks/README.md
@@ -98,7 +98,7 @@ The notebooks are located within the container and can be found in the `/HugeCTR
 
 Here's a list of notebooks that you can run:
 - [hugectr_e2e_demo_with_nvtabular.ipynb](hugectr_e2e_demo.ipynb): Notebook to preprocess data using NVTabular, train the model with HugeCTR, and do the offline inference with the HugeCTR HPS.
-- [continuous_training.ipynb](continuous_training.ipynb): Notebook to introduce how to deploy continued training with HugeCTR.
+- [continuous_training.ipynb](continuous_training.ipynb) (deprecated): Notebook to introduce how to deploy continued training with HugeCTR.
 - ~multi_gpu_offline_inference.ipynb~: It was deprecated. Check out [this HPS TRT notebook](hps_trt/notebooks/demo_for_tf_trained_model.ipynb) as an alternative.
 - [hps_demo.ipynb](hps_demo.ipynb): Demonstrate how to utilize HPS Python APIs together with ONNX Runtime APIs to create an ensemble inference model.
 - [training_and_inference_with_remote_filesystem.ipynb](training_and_inference_with_remote_filesystem.ipynb): Demonstrates how to train a model with data that is stored in a remote file system such as Hadoop HDFS and AWS S3.

diff --git a/notebooks/continuous_training.ipynb b/notebooks/continuous_training.ipynb
@@ -36,9 +36,12 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "**Warning**: The feature `Embedding Training Cache (ETC)` used by this notebook will be deprecated in the future release.\n",
+    "\n",
     "## Overview\n",
     "The notebook introduces how to use the Embedding Training Cache (ETC) feature in HugeCTR for the continuous training. The ETC feature is designed to handle recommendation models with huge embedding table by the incremental training method, which allows you to train such a model that the model size is much larger than the available GPU memory size.\n",
     "\n",