# Pruning

**Table of contents**<a id='toc0_'></a>    
- 1. [Pruning for on-device inference w/ XNNPACK](#toc1_)    
  - 1.1. [Setup](#toc1_1_)    
  - 1.2. [Build and train the dense model](#toc1_2_)    
  - 1.3. [Build the sparse model](#toc1_3_)    
    - 1.3.1. [Fine-tune the sparse model](#toc1_3_1_)    
  - 1.4. [Model conversion and benchmarking](#toc1_4_)    
  - 1.5. [Check the model size](#toc1_5_)    
  - 1.6. [Conclusion](#toc1_6_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## 1. <a id='toc1_'></a>[Pruning for on-device inference w/ XNNPACK](#toc0_)

This guide shows how to prune Keras model weights to reduce on-device inference latency with [XNNPACK](https://github.com/google/XNNPACK).

<br>

This guide presents the `tfmot.sparsity.keras.PruningPolicy` API and demonstrates how it can be used to accelerate mostly convolutional models on modern CPUs using [XNNPACK Sparse inference](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md#sparse-inference).

<br>

The guide covers the following steps of the model creation process:

* Build and train the dense baseline
* Fine-tune the model with pruning
* Convert to TFLite
* Benchmark on device

<br>

This guide doesn't cover best practices for fine-tuning with pruning. For more detail on that topic, see the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide.md).

### 1.1. <a id='toc1_1_'></a>[Setup](#toc0_)

In [1]:
# pip install -q tensorflow
# pip install -q tensorflow-datasets
# pip install -q tensorflow-model-optimization

In [1]:
import tempfile

import tensorflow as tf
import numpy as np

from tensorflow import keras
import tensorflow_datasets as tfds
import tensorflow_model_optimization as tfmot

%load_ext tensorboard

### 1.2. <a id='toc1_2_'></a>[Build and train the dense model](#toc0_)

We build and train a simple baseline CNN for a classification task on the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.

In [2]:
# Load CIFAR10 dataset.

(ds_train, ds_val, ds_test), ds_info = tfds.load("cifar10",
                                        split=['train[:90%]', 'train[90%:]', 'test'],
                                        as_supervised=True,
                                        with_info=True,
                                        )

# Normalize the input image so that each pixel value is between 0 and 1.
def normalize_img(img, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.image.convert_image_dtype(img, tf.float32), label




In [3]:
# Load the data in batches of 128 images.

batch_size = 128
def prepare_dataset(ds, buffer_size=None):
    # Pass every element of the dataset through normalize_img.
    ds = ds.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # Cache the preprocessed dataset in memory to speed up repeated iteration.
    ds = ds.cache()
    if buffer_size:
        ds = ds.shuffle(buffer_size)
    # Group the elements into batches of batch_size.
    ds = ds.batch(batch_size)
    # Asynchronously prefetch upcoming batches while the model trains.
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

ds_train = prepare_dataset(ds_train, ds_info.splits['train'].num_examples)
ds_val = prepare_dataset(ds_val)
ds_test = prepare_dataset(ds_test)

In [4]:
# Build the dense baseline model.

dense_model = keras.Sequential([keras.layers.InputLayer(input_shape=(32,32,3)),

                                # Pad one pixel of zeros on each side so the strided 'valid' convolution below still covers the image border.
                                keras.layers.ZeroPadding2D(padding=1),

                                keras.layers.Conv2D(filters=8,kernel_size=(3, 3),strides=(2, 2),padding='valid'),
                                keras.layers.BatchNormalization(),
                                keras.layers.ReLU(),

                                keras.layers.DepthwiseConv2D(kernel_size=(3, 3), padding='same'),
                                keras.layers.BatchNormalization(),
                                keras.layers.ReLU(),

                                keras.layers.Conv2D(filters=16, kernel_size=(1, 1)),
                                keras.layers.BatchNormalization(),
                                keras.layers.ReLU(),

                                keras.layers.ZeroPadding2D(padding=1),

                                keras.layers.DepthwiseConv2D(kernel_size=(3, 3), strides=(2, 2), padding='valid'),
                                keras.layers.BatchNormalization(),
                                keras.layers.ReLU(),

                                keras.layers.Conv2D(filters=32, kernel_size=(1, 1)),
                                keras.layers.BatchNormalization(),
                                keras.layers.ReLU(),

                                keras.layers.GlobalAveragePooling2D(),
                                keras.layers.Flatten(),
                                keras.layers.Dense(10)])
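To make the architecture above easier to follow, here is a small sketch that traces the spatial size and channel count through the convolutional layers. The helper names `conv_output_size` and `trace_shapes` are illustrative and not part of Keras; the sketch assumes the standard output-size rules for `'valid'` and `'same'` padding.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid' padding


def trace_shapes(input_size=32):
    """Return (layer, spatial_size, channels) for each conv layer of the model."""
    shapes = []
    # ZeroPadding2D(1) adds 2 to the spatial size before the strided convolution.
    size = conv_output_size(input_size + 2, kernel=3, stride=2, padding="valid")
    shapes.append(("Conv2D 3x3 stride 2", size, 8))
    size = conv_output_size(size, kernel=3, stride=1, padding="same")
    shapes.append(("DepthwiseConv2D 3x3", size, 8))
    size = conv_output_size(size, kernel=1, stride=1, padding="valid")
    shapes.append(("Conv2D 1x1", size, 16))
    size = conv_output_size(size + 2, kernel=3, stride=2, padding="valid")
    shapes.append(("DepthwiseConv2D 3x3 stride 2", size, 16))
    size = conv_output_size(size, kernel=1, stride=1, padding="valid")
    shapes.append(("Conv2D 1x1", size, 32))
    return shapes


for name, size, channels in trace_shapes():
    print(f"{name}: {size}x{size}x{channels}")
```

The last convolution produces an 8x8x32 feature map, which `GlobalAveragePooling2D` reduces to 32 values before the final `Dense(10)` layer.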

In [5]:
# Compile and train the dense model for 10 epochs.
dense_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])

In [6]:
dense_model.fit(
  ds_train,
  epochs=10,
  validation_data=ds_val,
  verbose=2)

Epoch 1/10




352/352 - 3s - loss: 1.9814 - accuracy: 0.2736 - val_loss: 2.2209 - val_accuracy: 0.1674 - 3s/epoch - 8ms/step
Epoch 2/10
352/352 - 1s - loss: 1.7085 - accuracy: 0.3657 - val_loss: 1.6562 - val_accuracy: 0.3934 - 935ms/epoch - 3ms/step
Epoch 3/10
352/352 - 1s - loss: 1.6018 - accuracy: 0.4130 - val_loss: 1.6059 - val_accuracy: 0.3984 - 814ms/epoch - 2ms/step
Epoch 4/10
352/352 - 1s - loss: 1.5480 - accuracy: 0.4352 - val_loss: 1.5534 - val_accuracy: 0.4264 - 795ms/epoch - 2ms/step
Epoch 5/10
352/352 - 1s - loss: 1.5107 - accuracy: 0.4527 - val_loss: 1.6707 - val_accuracy: 0.3994 - 785ms/epoch - 2ms/step
Epoch 6/10
352/352 - 1s - loss: 1.4785 - accuracy: 0.4693 - val_loss: 1.5060 - val_accuracy: 0.4508 - 771ms/epoch - 2ms/step
Epoch 7/10
352/352 - 1s - loss: 1.4525 - accuracy: 0.4775 - val_loss: 1.4310 - val_accuracy: 0.4810 - 747ms/epoch - 2ms/step
Epoch 8/10
352/352 - 1s - loss: 1.4259 - accuracy: 0.4860 - val_loss: 1.5433 - val_accuracy: 0.4500 - 916ms/epoch - 3ms/step
Epoch 9/10
352

<keras.callbacks.History at 0x7f3fbc0e2d00>

In [19]:
# Evaluate the dense model
_ , dense_model_accuracy = dense_model.evaluate(ds_test, verbose=1)
print("dense_model_accuracy = ", dense_model_accuracy)

dense_model_accuracy =  0.476500004529953


### 1.3. <a id='toc1_3_'></a>[Build the sparse model](#toc0_)

Following the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide.md), we apply the `tfmot.sparsity.keras.prune_low_magnitude` function with parameters that target on-device acceleration, namely the `tfmot.sparsity.keras.PruneForLatencyOnXNNPack` pruning policy.

In [8]:
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute the end step so that pruning finishes after 5 epochs.
end_epochs = 5

num_iterations_per_epoch = len(ds_train)  # Number of batches per epoch.
end_step = num_iterations_per_epoch * end_epochs
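As a worked check of this arithmetic, the sketch below recomputes the step count by hand. It assumes the `train[:90%]` split of CIFAR10's 50,000 training images and the batch size of 128 used earlier; the variable names are illustrative.

```python
import math

num_train_examples = 50_000 * 90 // 100   # 45,000 examples in train[:90%]
batch_size = 128
end_epochs = 5

# The final partial batch still counts as one step, hence the ceiling.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
end_step = steps_per_epoch * end_epochs

print("steps per epoch:", steps_per_epoch)
print("end step:", end_step)
```

The result of 352 steps per epoch agrees with the `352/352` counter in the dense model's training output, so pruning is scheduled to finish at step 1760.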



In [82]:
# Define parameters for pruning.
pruning_params = {
    'pruning_schedule':tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.25,
                                                            final_sparsity=0.75,
                                                            begin_step=0,
                                                            end_step=end_step),
    'pruning_policy':tfmot.sparsity.keras.PruneForLatencyOnXNNPack()
}
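To give a feel for how the target sparsity evolves under this schedule, here is a minimal pure-Python sketch of a polynomial decay curve. It assumes an exponent of 3 (which we believe is the default `power` of `PolynomialDecay`) and `end_step=1760` from the computation above; it ignores the schedule's update frequency, and the function name `polynomial_decay_sparsity` is hypothetical rather than part of the tfmot API.

```python
def polynomial_decay_sparsity(step, initial_sparsity=0.25, final_sparsity=0.75,
                              begin_step=0, end_step=1760, power=3):
    """Target sparsity at `step` for a polynomial decay schedule (sketch)."""
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))  # Clamp outside [begin_step, end_step].
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


for step in (0, 440, 880, 1760, 5000):
    print(step, round(polynomial_decay_sparsity(step), 4))
```

Under these assumptions the sparsity rises quickly at first and flattens as it approaches 75%, so most weights are removed early while the network still has many steps left to recover.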

In [83]:
# Try to apply pruning wrapper with pruning policy parameter.
try:
    model_for_pruning = prune_low_magnitude(dense_model, **pruning_params)
except ValueError as e:
    print(e)

The call to `prune_low_magnitude` raises a `ValueError` with the message `Could not find a GlobalAveragePooling2D layer with keepdims = True in all output branches`. The message indicates that the model isn't supported for pruning with the [tfmot.sparsity.keras.PruneForLatencyOnXNNPack](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/sparsity/keras/PruneForLatencyOnXNNPack) policy; specifically, the `GlobalAveragePooling2D` layer requires the parameter `keepdims=True`. Let's fix that and reapply the `prune_low_magnitude` function.

In [84]:
fixed_dense_model = keras.Sequential([keras.layers.InputLayer(input_shape=(32, 32, 3)),
                  
                    keras.layers.ZeroPadding2D(padding=1),

                    keras.layers.Conv2D(filters=8,kernel_size=(3, 3),strides=(2, 2),padding='valid'),
                    keras.layers.BatchNormalization(),
                    keras.layers.ReLU(),
                    
                    keras.layers.DepthwiseConv2D(kernel_size=(3, 3), padding='same'),
                    keras.layers.BatchNormalization(),
                    keras.layers.ReLU(),
                    
                    keras.layers.Conv2D(filters=16, kernel_size=(1, 1)),
                    keras.layers.BatchNormalization(),
                    keras.layers.ReLU(),

                    keras.layers.ZeroPadding2D(padding=1),

                    keras.layers.DepthwiseConv2D(kernel_size=(3, 3), strides=(2, 2), padding='valid'),
                    keras.layers.BatchNormalization(),
                    keras.layers.ReLU(),

                    keras.layers.Conv2D(filters=32, kernel_size=(1, 1)),
                    keras.layers.BatchNormalization(),
                    keras.layers.ReLU(),
                    
                    keras.layers.GlobalAveragePooling2D(keepdims=True),
                    keras.layers.Flatten(),
                    keras.layers.Dense(10)])

In [85]:
# Use the pretrained model for pruning instead of training from scratch.
fixed_dense_model.set_weights(dense_model.get_weights())

In [86]:
# Try to reapply pruning wrapper.
model_for_pruning = prune_low_magnitude(fixed_dense_model, **pruning_params)

The call to `prune_low_magnitude` finished without errors, which means the model is fully supported by the [tfmot.sparsity.keras.PruneForLatencyOnXNNPack](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/sparsity/keras/PruneForLatencyOnXNNPack) policy and can be accelerated using [XNNPACK Sparse inference](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md#sparse-inference).

#### 1.3.1. <a id='toc1_3_1_'></a>[Fine-tune the sparse model](#toc0_)
Following the [pruning example](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras.md), we fine-tune the sparse model starting from the weights of the dense model. Fine-tuning starts at 25% sparsity (25% of the weights are set to zero) and ends at 75% sparsity. The schedule reaches 75% at `end_step`, that is after 5 epochs, and the remaining 10 of the 15 epochs train at that fixed sparsity.
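The idea behind magnitude pruning can be sketched in a few lines of plain Python: for a target sparsity, the weights with the smallest absolute values are set to zero. This illustrates the general technique only, not tfmot's implementation, which works per layer on tensors with masks and follows the pruning policy; the function name `prune_low_magnitude_values` and the sample weights are made up for the example.

```python
def prune_low_magnitude_values(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed."""
    num_to_prune = int(round(len(weights) * sparsity))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
start = prune_low_magnitude_values(weights, 0.25)   # Sparsity at the start.
end = prune_low_magnitude_values(weights, 0.75)     # Sparsity at the end.

print("25% sparsity:", start)
print("75% sparsity:", end)
```

At 25% sparsity only the two smallest weights are removed; at 75% only the two largest survive, which is why the remaining weights need fine-tuning to compensate.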


In [87]:
logdir = tempfile.mkdtemp()  # Create a temporary directory for the pruning logs.

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep(),
  tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]

model_for_pruning.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])

model_for_pruning.fit(
  ds_train,
  epochs=15,
  validation_data=ds_val,
  callbacks=callbacks)

Epoch 1/15






Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f3ffc9aa0d0>

The logs show the progression of sparsity on a per-layer basis.

日誌顯示了稀疏性在每層基礎上的進展。

In [88]:
# Evaluate the pruned model.
_, pruned_model_accuracy = model_for_pruning.evaluate(ds_test, verbose=0)

print('Dense model test accuracy:', dense_model_accuracy)
print('Pruned model test accuracy:', pruned_model_accuracy)

Dense model test accuracy: 0.476500004529953
Pruned model test accuracy: 0.44690001010894775


In [89]:
# #docs_infra: no_execute
# %tensorboard --logdir={logdir}

After fine-tuning with pruning, test accuracy in this run is about 3 percentage points lower than the dense model's (44.7% versus 47.7%). Let's compare on-device latency using the [TFLite benchmark](https://www.tensorflow.org/lite/performance/measurement).

### 1.4. <a id='toc1_4_'></a>[Model conversion and benchmarking](#toc0_)

To convert the pruned model to TFLite, we need to replace the `PruneLowMagnitude` wrappers with the original layers via the `strip_pruning` function. Also, since the weights of the pruned model (`model_for_pruning`) are mostly zeros, we can apply the [tf.lite.Optimize.EXPERIMENTAL_SPARSITY](https://www.tensorflow.org/lite/api_docs/python/tf/lite/Optimize#EXPERIMENTAL_SPARSITY) optimization to store the resulting TFLite model efficiently. This optimization flag is not required for the dense model.

In [90]:
converter = tf.lite.TFLiteConverter.from_keras_model(dense_model)
dense_tflite_model = converter.convert()


In [91]:
# Save the dense TFLite model.
_, dense_tflite_file = tempfile.mkstemp('.tflite')  # Create a temporary file.
with open(dense_tflite_file, 'wb') as f:
  f.write(dense_tflite_model)

model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.EXPERIMENTAL_SPARSITY]
pruned_tflite_model = converter.convert()

# Save the pruned TFLite model.
_, pruned_tflite_file = tempfile.mkstemp('.tflite')
with open(pruned_tflite_file, 'wb') as f:
  f.write(pruned_tflite_model)


Following the instructions for the [TFLite Model Benchmarking Tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark), we build the tool, upload it to an Android device together with the dense and pruned TFLite models, and benchmark both models on the device.
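The upload and benchmark steps can be sketched as a small helper that builds the `adb` command lines without running them. The device paths and benchmark flags are the ones used in this guide; the local file names are placeholders (in the notebook they would be `dense_tflite_file` and `pruned_tflite_file`), and the helper names are hypothetical.

```python
import shutil

DEVICE_DIR = "/data/local/tmp"  # Device directory used by this guide's commands.


def push_command(local_path, device_name):
    """Build the `adb push` command that copies a model file to the device."""
    return ["adb", "push", str(local_path), f"{DEVICE_DIR}/{device_name}"]


def benchmark_command(device_name, num_runs=100, num_threads=1):
    """Build the `adb shell` command that runs benchmark_model on a model."""
    return [
        "adb", "shell", f"{DEVICE_DIR}/benchmark_model",
        f"--graph={DEVICE_DIR}/{device_name}",
        "--use_xnnpack=true",
        f"--num_runs={num_runs}",
        f"--num_threads={num_threads}",
    ]


# Placeholder local paths; substitute the files written by the converter.
commands = [
    push_command("dense.tflite", "dense_model.tflite"),
    push_command("pruned.tflite", "pruned_model.tflite"),
    benchmark_command("dense_model.tflite"),
    benchmark_command("pruned_model.tflite"),
]

if shutil.which("adb") is None:
    print("adb was not found on PATH; these commands would fail:")
for command in commands:
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` once `adb` is installed and a device is connected; checking `shutil.which("adb")` first avoids the "command not found" failure.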

In [92]:
!adb shell /data/local/tmp/benchmark_model \
    --graph=/data/local/tmp/dense_model.tflite \
    --use_xnnpack=true \
    --num_runs=100 \
    --num_threads=1

/bin/bash: adb: command not found


In [93]:
! adb shell /data/local/tmp/benchmark_model \
    --graph=/data/local/tmp/pruned_model.tflite \
    --use_xnnpack=true \
    --num_runs=100 \
    --num_threads=1

/bin/bash: adb: command not found


### 1.5. <a id='toc1_5_'></a>[Check the model size](#toc0_)

In [94]:
import pathlib
models_dir = pathlib.Path("/tmp/models/")
models_dir.mkdir(exist_ok=True, parents=True)

In [95]:
tflite_model_file = models_dir/"dense_tflite_model.tflite"
tflite_model_file.write_bytes(dense_tflite_model)

tflite_model_file = models_dir/"pruned_tflite_model.tflite"
tflite_model_file.write_bytes(pruned_tflite_model)

!ls -lh {models_dir}

total 24K
-rw-rw-r-- 1 cosmo cosmo 12K Jun 30 11:11 dense_tflite_model.tflite
-rw-rw-r-- 1 cosmo cosmo 11K Jun 30 11:11 pruned_tflite_model.tflite
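The raw file sizes above differ only slightly. One common way to see the storage effect of sparsity is to also compare compressed sizes, since runs of zeros compress well. The sketch below uses stand-in byte files rather than the real models, and `model_sizes` is a hypothetical helper; how much the real `.tflite` files shrink under gzip is an open question here, especially as `EXPERIMENTAL_SPARSITY` already changes how sparse weights are stored.

```python
import gzip
import pathlib
import random
import tempfile


def model_sizes(path):
    """Return (raw_bytes, gzipped_bytes) for the file at `path`."""
    data = pathlib.Path(path).read_bytes()
    return len(data), len(gzip.compress(data))


rng = random.Random(0)
dense_bytes = bytes(rng.getrandbits(8) for _ in range(4096))
# Keep every fourth byte and zero the rest, mimicking 75% sparsity.
sparse_bytes = bytes(b if i % 4 == 0 else 0 for i, b in enumerate(dense_bytes))

with tempfile.TemporaryDirectory() as tmp:
    dense_path = pathlib.Path(tmp) / "dense_like.bin"
    sparse_path = pathlib.Path(tmp) / "sparse_like.bin"
    dense_path.write_bytes(dense_bytes)
    sparse_path.write_bytes(sparse_bytes)
    dense_raw, dense_gz = model_sizes(dense_path)
    sparse_raw, sparse_gz = model_sizes(sparse_path)

print("dense-like  raw/gzip:", dense_raw, dense_gz)
print("sparse-like raw/gzip:", sparse_raw, sparse_gz)
```

To apply this to the notebook's files, `model_sizes` could be called on `models_dir/"dense_tflite_model.tflite"` and `models_dir/"pruned_tflite_model.tflite"` before they are deleted.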


In [96]:
# Delete the models in the folder.
!rm -rf {models_dir}/*



### 1.6. <a id='toc1_6_'></a>[Conclusion](#toc0_)

In this tutorial, we show how to create sparse models for faster on-device performance using the functionality introduced by the TF MOT API and XNNPACK. Sparse models can be smaller and faster than their dense counterparts while retaining much of their quality. In this run, the pruned TFLite file was slightly smaller (11K versus 12K) and test accuracy dropped by about 3 percentage points; on-device latency was not measured because `adb` was not available.

We encourage you to try this capability, which can be particularly important for deploying your models on device.