Commit

Merge branch 'refactor-etc-dep-minseokl' into 'main'
High-level Deprecation of ETC

See merge request dl/hugectr/hugectr!1479
minseokl committed Sep 27, 2023
2 parents d22df87 + c1cb00b commit 969a9c1
Showing 5 changed files with 14 additions and 4 deletions.
@@ -42,7 +42,10 @@ EmbeddingTrainingCacheImpl<TypeKey>::EmbeddingTrainingCacheImpl(
: embeddings_(embeddings),
ps_manager_(ps_types, sparse_embedding_files, get_embedding_type(embeddings),
embedding_params, get_max_embedding_size_(), resource_manager, local_paths,
-hmem_cache_configs) {}
+hmem_cache_configs) {
+HCTR_LOG_S(WARNING, WORLD) << "EmbeddingTrainingCache will be deprecated in a future release"
+<< std::endl;
+}

template <typename TypeKey>
void EmbeddingTrainingCacheImpl<TypeKey>::load_(std::vector<std::string>& keyset_file_list) {
4 changes: 3 additions & 1 deletion docs/source/api/python_interface.md
@@ -99,7 +99,9 @@ solver = hugectr.CreateSolver(max_eval_batches = 300,

***

-#### CreateETC method
+#### CreateETC method (deprecated)
+
+**Warning**: this method will be deprecated in a future release.

```python
hugectr.CreateETC()
4 changes: 3 additions & 1 deletion docs/source/hugectr_embedding_training_cache.md
@@ -1,4 +1,4 @@
-# HugeCTR Embedding Training Cache
+# HugeCTR Embedding Training Cache (Deprecated)

```{contents}
---
@@ -10,6 +10,8 @@ backlinks: none

## Introduction to the HugeCTR Embedded Training Cache

+**Warning**: this feature will be deprecated in a future release.
+
This document introduces the **Embedding Training Cache (ETC)** feature in HugeCTR for incremental training. The ETC allows training models with huge embedding tables that exceed the available GPU memory in size.

Normally, the maximum model size in HugeCTR is limited by the hardware resources. A model with larger embedding tables will of course require more GPU memory. However, the amount of GPU's and, therefore, also the amount of GPU memory that can be fit into a single machine or a cluster is finite. This naturally upper-bounds the size of the models that can be executed in a specific setup. The ETC feature is designed to ease this restriction by prefetching portions of the embedding table to the GPU in the granularity of pass as they are required.
2 changes: 1 addition & 1 deletion notebooks/README.md
@@ -98,7 +98,7 @@ The notebooks are located within the container and can be found in the `/HugeCTR

Here's a list of notebooks that you can run:
- [hugectr_e2e_demo_with_nvtabular.ipynb](hugectr_e2e_demo.ipynb): Notebook to preprocess data using NVTabular, train the model with HugeCTR, and do the offline inference with the HugeCTR HPS.
-- [continuous_training.ipynb](continuous_training.ipynb): Notebook to introduce how to deploy continued training with HugeCTR.
+- [continuous_training.ipynb](continuous_training.ipynb) (deprecated): Notebook to introduce how to deploy continued training with HugeCTR.
- ~multi_gpu_offline_inference.ipynb~: It was deprecated. Check out [this HPS TRT notebook](hps_trt/notebooks/demo_for_tf_trained_model.ipynb) as an alternative.
- [hps_demo.ipynb](hps_demo.ipynb): Demonstrate how to utilize HPS Python APIs together with ONNX Runtime APIs to create an ensemble inference model.
- [training_and_inference_with_remote_filesystem.ipynb](training_and_inference_with_remote_filesystem.ipynb): Demonstrates how to train a model with data that is stored in a remote file system such as Hadoop HDFS and AWS S3.
3 changes: 3 additions & 0 deletions notebooks/continuous_training.ipynb
@@ -36,9 +36,12 @@
]
},
{
+"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
+"**Warning**: The feature `Embedding Training Cache (ETC)` used by this notebook will be deprecated in the future release.\n",
+"\n",
"## Overview\n",
"The notebook introduces how to use the Embedding Training Cache (ETC) feature in HugeCTR for the continuous training. The ETC feature is designed to handle recommendation models with huge embedding table by the incremental training method, which allows you to train such a model that the model size is much larger than the available GPU memory size.\n",
"\n",