
Add TensorFlow 2.2.0 support (#46)
Add the support for TensorFlow 2.2.0 which matches
the code level used in the WML CE early access conda
channel.
smatzek committed Nov 2, 2020
1 parent 2157ec4 commit 0ec3709
Showing 5 changed files with 3,023 additions and 68 deletions.
34 changes: 5 additions & 29 deletions README.md
@@ -26,11 +26,14 @@ previously possible and, ultimately, generate more accurate results.

TFLMS is built into the `tensorflow-gpu` conda package so it is installed by
default when you install the GPU enabled TensorFlow from WML CE.
The support is currently available in the [WML CE conda channel](https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/#/).

The support is currently available for TensorFlow 2.2.0 in the [WML CE early access conda channel](https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/).

The support is currently available for TensorFlow 2.1.0 in the [WML CE conda channel](https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/#/).

For more information on this channel, how to add channels, and install
frameworks see [this WML CE install documentation](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.7.0/navigation/wmlce_install.htm).


# How to enable TFLMS

The TFLMS functionality is disabled by default in TensorFlow and needs to be
@@ -153,33 +156,6 @@ process have socket affinity with the GPU which allows the fastest
connection paths between system memory and GPU memory, which reduces the
training or inferencing time.

# Memory defragmentation
When using very large tensors or over the course of a very long training
operation, the model's memory allocation and usage pattern may lead to
fragmented GPU memory and out-of-memory errors. When this occurs, there is
enough free memory in the GPU for the next allocation, but it is in
non-contiguous blocks. In these cases, the process will fail and output a
message like this:

```
Enough free memory to satisfy the allocation request exists but it is fragmented.
Enabling Large Model Support defragmentation may avoid this failure.
```

TFLMS is capable of defragmenting sections of GPU memory to gather a
contiguous block large enough for the request. This feature waits for current
GPU computation to finish and then relocates active tensors so that free
memory blocks can coalesce into larger contiguous blocks.

Even with the GPU computation drained, moving active tensors carries
a risk of introducing NaN errors or other instability into the model. Despite
this risk, it has performed well in multi-week training runs with very large
tensors and frequent defragmentation calls.

Due to this possible instability, Large Model Support defragmentation
is disabled by default. It can be enabled along with LMS using the `tf.config.experimental.set_lms_defrag_enabled(True)` API or the
`config.gpu_options.experimental.lms_defrag_enabled=True` ConfigProto setting.
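
For releases that still include the defragmentation feature (this commit removes it from the README at the TensorFlow 2.2.0 level), a minimal sketch of enabling it alongside LMS at the top of a training script:

```python
import tensorflow as tf

# Minimal sketch: enable Large Model Support and its defragmentation
# feature before any model is built or executed. The ConfigProto form
# (config.gpu_options.experimental.lms_defrag_enabled = True) is the
# session-based equivalent for the defragmentation setting.
tf.config.experimental.set_lms_enabled(True)
tf.config.experimental.set_lms_defrag_enabled(True)
```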

# Model memory usage analysis with allocator statistics
TFLMS adds several APIs to obtain GPU memory allocator statistics such as
the number of allocations, the peak memory usage, the amount
58 changes: 36 additions & 22 deletions examples/AllocatorStats.md
@@ -68,6 +68,25 @@ Returns the limit of reservable memory.

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.

```python
tf.experimental.get_gpu_host_bytes_in_use(numa_node)
```
Returns the current number of bytes in use in the GPU host (CPU memory) allocator.

_Since: 2.2.0_

**Parameter:** `numa_node`: The ID of the NUMA node for the allocator.

```python
tf.experimental.get_gpu_host_peak_bytes_in_use(numa_node)
```
Returns the peak number of bytes in use in the GPU host (CPU memory) allocator.

_Since: 2.2.0_

**Parameter:** `numa_node`: The ID of the NUMA node for the allocator.
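
For illustration, a minimal sketch that polls both host-allocator statistics, assuming an LMS-enabled TensorFlow 2.2.0 build and NUMA node 0:

```python
import tensorflow as tf

# Minimal sketch, assuming an LMS-enabled TensorFlow 2.2.0 build:
# report GPU host (CPU memory) allocator usage for NUMA node 0,
# for example after a training step.
numa_node = 0
in_use = tf.experimental.get_gpu_host_bytes_in_use(numa_node)
peak = tf.experimental.get_gpu_host_peak_bytes_in_use(numa_node)
print("GPU host allocator: %d bytes in use, %d bytes at peak" % (in_use, peak))
```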


## Large Model Support Specific Statistics
The Large Model Support specific statistics provide information about Large
Model Support's memory management. The statistics use the following terms:
@@ -80,9 +99,6 @@ Inactive tensors are those tensors which are not currently being used by an
executing operation or a soon-to-be executing operation.
* reclaim bytes - Reclaimed bytes are the bytes of inactive tensors which have
been moved from GPU memory to the system (host) memory.
* defragmentation - A method of producing contiguous memory blocks by moving
active bytes to allow free memory blocks between the active bytes to coalesce
into larger contiguous blocks.


```python
@@ -114,41 +130,39 @@ Returns the number of reclaimed bytes.

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.


```python
tf.experimental.get_num_single_reclaims(gpu_id)
tf.experimental.get_current_bytes_reclaimed(gpu_id)
```
Large Model Support will reclaim the bytes of single tensors when possible.
This returns the number of times single tensors' bytes were reclaimed.
Returns the current number of reclaimed bytes.

_Since: 2.2.0_

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.


```python
tf.experimental.get_num_full_reclaims(gpu_id)
tf.experimental.get_peak_bytes_reclaimed(gpu_id)
```
When no single tensor reclamation is able to free enough GPU memory for the
allocation request, all tensors are reclaimed. This returns the number
of times all tensors were reclaimed.
Returns the peak number of reclaimed bytes.

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.
_Since: 2.2.0_

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.
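
A minimal sketch that reads both reclaim-byte counters added in 2.2.0, assuming GPU 0:

```python
import tensorflow as tf

# Minimal sketch, assuming GPU 0 and an LMS-enabled TensorFlow 2.2.0 build:
# compare the bytes currently reclaimed to host memory with the peak value.
gpu_id = 0
current_gib = tf.experimental.get_current_bytes_reclaimed(gpu_id) / 1073741824.0
peak_gib = tf.experimental.get_peak_bytes_reclaimed(gpu_id) / 1073741824.0
print("Reclaimed: %.2f GiB now, %.2f GiB at peak" % (current_gib, peak_gib))
```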

```python
tf.experimental.get_num_defragmentations(gpu_id)
tf.experimental.get_num_single_reclaims(gpu_id)
```
GPU memory may become fragmented such that there are no contiguous blocks which
can fulfill an allocation request, even after reclaiming all inactive
tensors. In this case, active tensors may be moved to allow free blocks to be
coalesced to produce a contiguous memory block large enough to fulfill the
allocation request. The defragmentation function of Large Model Support is
disabled by default. This API returns the number of times defragmentation was
performed.
Large Model Support will reclaim the bytes of single tensors when possible.
This returns the number of times single tensors' bytes were reclaimed.

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.


```python
tf.experimental.get_bytes_defragged(gpu_id)
tf.experimental.get_num_full_reclaims(gpu_id)
```
The number of bytes moved during GPU memory defragmentation.
When no single tensor reclamation is able to free enough GPU memory for the
allocation request, all tensors are reclaimed. This returns the number
of times all tensors were reclaimed.

**Parameter:** `gpu_id`: The zero indexed GPU ID for which to retrieve the statistic.
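
Taken together, the reclaim statistics give a quick picture of how often LMS intervened during a run. A minimal sketch, assuming GPU 0 and mirroring the counters logged by `examples/callbacks.py`:

```python
import tensorflow as tf

# Minimal sketch, assuming GPU 0: summarize LMS activity after training,
# using the same statistics the examples/callbacks.py logger records.
gpu_id = 0
print("allocations:    ", tf.experimental.get_num_allocs(gpu_id))
print("single reclaims:", tf.experimental.get_num_single_reclaims(gpu_id))
print("full reclaims:  ", tf.experimental.get_num_full_reclaims(gpu_id))
print("GiB reclaimed:  ",
      tf.experimental.get_bytes_reclaimed(gpu_id) / 1073741824.0)
```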
10 changes: 0 additions & 10 deletions examples/ManyModel.py
@@ -134,8 +134,6 @@ def get_callbacks(args):
def run_model(args):
if args.lms:
tf.config.experimental.set_lms_enabled(True)
if args.lms_defrag:
tf.config.experimental.set_lms_defrag_enabled(True)

image_dim = args.image_size
opt = tf.keras.optimizers.RMSprop()
@@ -209,14 +207,6 @@ def main():
help='Disable LMS (Default)')
parser.set_defaults(lms=False)

defrag_group = parser.add_mutually_exclusive_group(required=False)
defrag_group.add_argument('--lms_defrag', dest='lms_defrag',
action='store_true',
help='Enable LMS defragmentation')
defrag_group.add_argument('--no-lms_defrag', dest='lms_defrag',
action='store_false',
help='Disable LMS defragmentation (Default)')
parser.set_defaults(lms_defrag=False)
lms_stats = parser.add_mutually_exclusive_group(required=False)
lms_stats.add_argument('--lms_stats', dest='lms_stats', action='store_true',
help='Log LMS per-step stats to a file named '
9 changes: 2 additions & 7 deletions examples/callbacks.py
@@ -27,7 +27,7 @@
nvtx.nvtxMarkA.restype = None

STATS_KEYS = ['time', 'allocs', 'reclaim_ones',
'reclaim_alls', 'defrags', 'gib_reclaimed', 'gib_defragged']
'reclaim_alls', 'gib_reclaimed']

class CudaProfileCallback(Callback):
def __init__(self, profile_epoch, profile_batch_start, profile_batch_end):
@@ -66,9 +66,7 @@ def _get_stats(self):
stats['allocs'] = tf.experimental.get_num_allocs(self._gpu_id)
stats['reclaim_ones'] = tf.experimental.get_num_single_reclaims(self._gpu_id)
stats['reclaim_alls'] = tf.experimental.get_num_full_reclaims(self._gpu_id)
stats['defrags'] = tf.experimental.get_num_defragmentations(self._gpu_id)
stats['gib_reclaimed'] = tf.experimental.get_bytes_reclaimed(self._gpu_id) / 1073741824.0
stats['gib_defragged'] = tf.experimental.get_bytes_defragged(self._gpu_id) / 1073741824.0
return stats

def step_begin(self):
@@ -114,9 +112,7 @@ def write_step_stats(logfile, step_type, epoch, step_num, step_stats):
row.append(step_stats['allocs'])
row.append(step_stats['reclaim_ones'])
row.append(step_stats['reclaim_alls'])
row.append(step_stats['defrags'])
row.append(step_stats['gib_reclaimed'])
row.append(step_stats['gib_defragged'])
with open(logfile, 'a+', newline='') as csvfile:
statswriter = csv.writer(csvfile)
statswriter.writerow(row)
@@ -127,8 +123,7 @@ def write_step_log_header(logfile):
statswriter = csv.writer(csvfile)
statswriter.writerow(['step type', 'epoch', 'step',
'duration', 'allocs', 'reclaimOnes',
'reclaimAlls', 'defrags',
'GiB reclaimed', 'GiB defragged'])
'reclaimAlls', 'GiB reclaimed'])


class LMSStatsLogger(Callback):
