<a href="https://colab.research.google.com/github/Aliasgharshinwari/emgaxo/blob/main/emgaxo_full_tutorial_SW.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Clean Install of Miniconda**
Removes any existing Conda installation and performs a fresh install of Miniconda3 to ensure a clean environment manager.

In [1]:
# Clean any broken or old conda installs
!rm -rf /usr/local/miniconda

# Reinstall Miniconda cleanly
!wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!bash Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda
!rm Miniconda3-latest-Linux-x86_64.sh

PREFIX=/usr/local/miniconda
Unpacking bootstrapper...
Unpacking payload...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /usr/local/miniconda


### **Create Python 3.10 Environment**
Initializes Conda, accepts the Terms of Service, and creates a specific environment named `py310` running Python 3.10.12.

In [2]:
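# Initialize conda in this session.
# Note: each `!` command runs in its own subshell, so the shell hook below does not
# persist into later commands; the cells that follow call conda by its full path.
import sys
sys.path.append('/usr/local/miniconda/lib/python3.10/site-packages')
!eval "$(/usr/local/miniconda/bin/conda shell.bash hook)"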

# Accept Anaconda Terms of Service (for main + r channels)
!/usr/local/miniconda/bin/conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
!/usr/local/miniconda/bin/conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create a new environment with Python 3.10.12
!/usr/local/miniconda/bin/conda create -y -n py310 python=3.10.12

# Show environments
!/usr/local/miniconda/bin/conda env list

# Verify Python version in the new env
!/usr/local/miniconda/bin/conda run -n py310 python --version


accepted Terms of Service for https://repo.anaconda.com/pkgs/main
accepted Terms of Service for https://repo.anaconda.com/pkgs/r
Jupyter detected...
2 channel Terms of Service accepted
Retrieving notices: done
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/miniconda/envs/py310

  added / updated specs:
    - python=3.10.12


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python-3.10.12             |       h955ad1f_0        26.8 MB
    setuptools-80.9.0          |  py310h06a4308_0         1.4 MB
    wheel-0.45.1               |  py310h06a4308_0         115 KB
    --------------------------------------

### **Activate Environment in Colab**
Updates the system `PATH` variable to prioritize the executables from the newly created `py310` environment for the current session.

In [3]:
import os
os.environ['PATH'] = "/usr/local/miniconda/envs/py310/bin:" + os.environ['PATH']
!python --version


Python 3.10.12
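
To confirm that the `PATH` change took effect, the interpreter that a bare `python` resolves to can be checked from Python itself. The following is a minimal sketch; `prepend_to_path` is a hypothetical helper (not part of Colab or Conda), and the environment location is assumed from the cell above.

```python
import os
import shutil

def prepend_to_path(directory, env=None):
    """Return a PATH string with `directory` first, without duplicating it."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)

# Environment location assumed from the cell above.
ENV_BIN = "/usr/local/miniconda/envs/py310/bin"
new_path = prepend_to_path(ENV_BIN)

# shutil.which accepts an explicit search path, so the lookup can be
# inspected without mutating os.environ. It prints None if nothing is found.
print(shutil.which("python", path=new_path))
```

Filtering out an existing copy of the directory keeps `PATH` from growing each time the cell is re-run.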


### **Clean Up Previous Clones**
Removes any existing copies of the `emgaxo` repository to ensure a fresh clone.

In [4]:
!rm -rf emgaxo/

### **Clone and Install Project**
Clones the repository and installs the package in **editable mode**.

In [5]:
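!git clone https://github.com/Aliasgharshinwari/emgaxo.git
# `!cd` and `!source` run in throwaway subshells and do not affect later commands,
# so paths are given relative to the working directory and the py310 environment
# on PATH remains the active interpreter.
!pip install -e emgaxo/
# The virtual environment is created here but is not activated by the notebook.
!python3 -m venv emgaxo/emgaxo_env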

Cloning into 'emgaxo'...
remote: Enumerating objects: 654, done.
remote: Counting objects: 100% (106/106), done.
remote: Compressing objects: 100% (88/88), done.
remote: Total 654 (delta 55), reused 39 (delta 16), pack-reused 548 (from 2)
Receiving objects: 100% (654/654), 3.16 MiB | 8.72 MiB/s, done.
Resolving deltas: 100% (318/318), done.
Obtaining file:///content/emgaxo
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: emgaxo
  Building editable for emgaxo (pyproject.toml) ... done
  Created wheel for emgaxo: filename=emgaxo-0.1.1-0.editable-py3-none-any.whl size=3545 sha256=8bab69418fc5dc4ce86a92b738c76ab3252e0c756c9d817b5a99467afc26e625
  Stored in directory: /tmp/pip-ephem-wheel-cache-16uj85j7/wheels/bb/5b/35/75f76

### **Install Project Dependencies**
Installs the required Python libraries listed in `requirements.txt`.

In [6]:
!pip install -r emgaxo/requirements.txt

Collecting tensorflow==2.14.1 (from -r emgaxo/requirements.txt (line 1))
  Downloading tensorflow-2.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting tensorflow-estimator==2.14.0 (from -r emgaxo/requirements.txt (line 2))
  Downloading tensorflow_estimator-2.14.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting onnx==1.17.0 (from -r emgaxo/requirements.txt (line 3))
  Downloading onnx-1.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting onnxruntime-gpu==1.21.0 (from -r emgaxo/requirements.txt (line 4))
  Downloading onnxruntime_gpu-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting onnxruntime_extensions==0.14.0 (from -r emgaxo/requirements.txt (line 5))
  Downloading onnxruntime_extensions-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.6 kB)
Collecting tf2onnx==1.16.1 (from -r emgaxo/requirements.txt (line 6))
  Downloading tf

### **Step 1: Train Baseline Model**
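This step corresponds to training a fully connected model on the MNIST dataset and exporting it to ONNX format. To save time, a model that has already been trained and exported is provided in the repository (`./emgaxo/models/mnist_model.onnx`), so no training is run here. It is loaded and preprocessed in the next step.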

### **Preprocess ONNX Model**
Uses the ONNX Runtime quantization preprocessing tool to prepare the model graph for quantization. This step fuses each pair of `MatMul` and `Add` operators into a single `Gemm` operator.

In [20]:
!python -m onnxruntime.quantization.preprocess --input ./emgaxo/models/mnist_model.onnx --output ./emgaxo/models/mnist_model_infer.onnx
!mkdir -p ./models
!cp ./emgaxo/models/mnist_model_infer.onnx ./models/mnist_model_infer.onnx
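
The fusion is valid because a `MatMul` followed by a bias `Add` computes the same values as one `Gemm` with `alpha = beta = 1`. The sketch below illustrates that equivalence in plain Python on made-up numbers; it is not the ONNX Runtime implementation, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row (an Add with row broadcasting)."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Gemm without transposes: Y = alpha * (A @ B) + beta * C."""
    return [[alpha * v + beta * bc for v, bc in zip(row, c)] for row in matmul(a, b)]

x = [[1.0, 2.0], [3.0, 4.0]]                 # batch of two inputs (made up)
w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]     # 2x3 weight matrix (made up)
b = [0.1, 0.2, 0.3]                          # bias vector (made up)

unfused = add_bias(matmul(x, w), b)   # MatMul followed by Add
fused = gemm(x, w, b)                 # single Gemm node
print(unfused == fused)
print(fused)
```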

### **Step 2: Static Quantization**
Performs static quantization on the preprocessed model to produce a standard quantized GEMM (QGEMM) model. The `Gemm` nodes are converted to `QGemm` nodes, and the ReLU activation nodes are removed in the process.

In [21]:
!python emgaxo/MNIST_Tutorial_SW/02_mnist_static_quantization.py

2025-12-20 13:37:17.496604: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-20 13:37:17.538592: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-12-20 13:37:17.538647: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-12-20 13:37:17.538688: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-12-20 13:37:17.546375: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-20 13:37:17.546642: I tensorflow/core/platform/cpu_feature_guard.cc:182] This Tens
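
Static quantization maps floating-point tensors to 8-bit integers using a scale and a zero point fixed ahead of time from calibration data. The sketch below shows the generic affine uint8 scheme on made-up values. The calibration method used by the tutorial script is not shown in this notebook, so `calibrate` is a hypothetical min/max helper rather than the script's actual procedure.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to uint8: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point   # Python's round() rounds half to even
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map a uint8 value back to a float: x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale

def calibrate(values, qmin=0, qmax=255):
    """Derive scale and zero point from an observed min/max range (hypothetical)."""
    lo = min(min(values), 0.0)   # the range is widened to include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

activations = [-1.0, -0.25, 0.0, 0.5, 3.0]   # made-up calibration data
scale, zp = calibrate(activations)

for x in activations:
    q = quantize(x, scale, zp)
    print(x, q, round(dequantize(q, scale, zp), 4))
```

Each in-range value is recovered to within half a quantization step (`scale / 2`), which is the rounding error that the accuracy tests below measure at the model level.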

### **Test GEMM Accuracy**
Evaluates the accuracy of the standard FP32 model (GEMM) to check for accuracy degradation after the preprocessing step.

In [22]:
!python emgaxo/MNIST_Tutorial_SW/03_accuracy_tester_gemm.py

✅ MNIST test data already exists.
Average inference time: 0.0081 ms
Accuracy: 98.06%
Precision: 0.9807
Recall: 0.9806
F1 Score: 0.9806
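
The four reported metrics can be reproduced from predicted and true labels alone. The sketch below uses support-weighted averaging across classes, which is an assumption: the averaging mode of the tutorial script is not shown here, although the reported recall matching the accuracy is consistent with weighted averaging. The labels are made up.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn   # number of true samples of class c
        totals["precision"] += support * precision
        totals["recall"] += support * recall
        totals["f1"] += support * f1
    metrics = {name: value / n for name, value in totals.items()}
    metrics["accuracy"] = sum(t == p for t, p in pairs) / n
    return metrics

y_true = [0, 0, 0, 1, 1, 2]   # made-up labels
y_pred = [0, 0, 1, 1, 1, 2]   # made-up predictions
m = classification_metrics(y_true, y_pred, num_classes=3)
for name in ("accuracy", "precision", "recall", "f1"):
    print(f"{name}: {m[name]:.4f}")
```

With weighted averaging, recall always equals accuracy, because each class's recall is weighted by its share of the samples.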


### **Test QGEMM Accuracy**
Evaluates the accuracy of the standard quantized model (QGEMM) to check for accuracy degradation after quantization.

In [23]:
!python emgaxo/MNIST_Tutorial_SW/04_accuracy_tester_qgemm.py

2025-12-20 13:37:38.065053122 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 2 Memcpy nodes are added to the graph tf2onnx for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-12-20 13:37:38.065297038 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-12-20 13:37:38.065311395 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Average inference time: 0.0069 ms
Accuracy: 97.11%
Precision: 0.9714
Recall: 0.9711
F1 Score: 0.9711
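
An average inference time like the one reported above can be measured with a wall-clock timer around repeated calls. The sketch below is one way to do it, not the tutorial script's method. Whether the script times single samples or divides a batched run by the sample count is not shown. `dummy_model` is a stand-in for an inference session's run call.

```python
import time

def average_latency_ms(fn, samples, warmup=3):
    """Average wall-clock time of fn(sample) in milliseconds."""
    for sample in samples[:warmup]:
        fn(sample)   # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for sample in samples:
        fn(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(samples)

def dummy_model(x):
    """Stand-in for model inference; sums the input so there is work to time."""
    return sum(x)

# 200 made-up samples with 784 values each (the size of a flattened MNIST image).
samples = [[float(i)] * 784 for i in range(200)]
latency = average_latency_ms(dummy_model, samples)
print(f"Average inference time: {latency:.4f} ms")
```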


### **Step 3: Inject Custom Operators**
Modifies the ONNX graph by replacing the standard `QGemm` operators with custom `ApproxQGemm` operators. The multiplication inside `ApproxQGemm` is characterized by the `INIT_Value` specified in the script.

In [24]:
!python emgaxo/MNIST_Tutorial_SW/05_modifying_qgemm_with_ApproxQGemm.py

[DEBUG modify_model] INIT_Value = 68719476735
1 Successfully replaced sequential/dense/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
2 Successfully replaced sequential/dense_1/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense_1/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
3 Successfully replaced sequential/dense_2/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense_2/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
4 Successfully replaced sequential/dense_3/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense_3/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
5 Successfully replaced sequential/dense_4/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense_4/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
6 Successfully replaced sequential/dense_5/MatMul/MatMulAddFusion_quant (QGemm) with sequential/dense_5/MatMul/MatMulAddFusion_quant (ApproxQGemm)
68719476735
Mode
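
Conceptually, an approximate GEMM keeps the accumulation exact and swaps only the scalar multiplier. How `INIT_Value` configures the multiplier is defined by emgaxo and is not shown in this notebook. The sketch below therefore uses a hypothetical truncation-based multiplier purely as a stand-in, on made-up uint8 operands, and omits zero points and requantization.

```python
def exact_mul(a, b):
    """Reference multiplier."""
    return a * b

def truncated_mul(a, b, dropped_bits=2):
    """Hypothetical approximate multiplier: zero the low bits of both operands."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)

def int_gemm(a, b, mul):
    """Integer matrix product with a pluggable scalar multiplier."""
    return [[sum(mul(x, y) for x, y in zip(row, col)) for col in zip(*b)] for row in a]

a = [[200, 17, 3], [45, 128, 255]]    # uint8 activations (made up)
w = [[12, 250], [99, 7], [64, 31]]    # uint8 weights (made up)

exact = int_gemm(a, w, exact_mul)
approx = int_gemm(a, w, truncated_mul)
print("exact: ", exact)
print("approx:", approx)
```

Passing the multiplier as a function mirrors the idea of the operator swap: the surrounding graph stays the same while the arithmetic inside each node changes.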

### **Graph Cleanup**
Removes the `QuantizeLinear` and `DequantizeLinear` nodes from the graph, likely as an optimization for the custom operator workflow. With these nodes removed, the model takes uint8 values as input and produces uint8 values as output.

In [25]:
!python emgaxo/MNIST_Tutorial_SW/06_removing_quantize_linear.py
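
Removing a node from a graph also means reconnecting its consumers to whatever fed it. The sketch below shows that rewiring on a toy graph made of plain dictionaries. It does not use the ONNX API and is not the tutorial script. The tensor names and the simplified `QGemm` inputs are made up for illustration.

```python
def remove_nodes(nodes, op_types):
    """Drop nodes of the given types and reconnect consumers to their data input."""
    rename = {}   # output tensor of a removed node -> tensor that fed it
    kept = []
    for node in nodes:   # nodes are assumed to be topologically sorted
        inputs = [rename.get(name, name) for name in node["inputs"]]
        if node["op_type"] in op_types:
            # The first input is the data tensor; scale and zero point are dropped.
            rename[node["outputs"][0]] = inputs[0]
        else:
            kept.append({**node, "inputs": inputs})
    return kept, rename

graph = [
    {"op_type": "QuantizeLinear", "inputs": ["input", "in_scale", "in_zp"], "outputs": ["input_q"]},
    {"op_type": "QGemm", "inputs": ["input_q", "w0"], "outputs": ["h_q"]},
    {"op_type": "QGemm", "inputs": ["h_q", "w1"], "outputs": ["logits_q"]},
    {"op_type": "DequantizeLinear", "inputs": ["logits_q", "out_scale", "out_zp"], "outputs": ["logits"]},
]

cleaned, rename = remove_nodes(graph, {"QuantizeLinear", "DequantizeLinear"})
# Graph outputs must be renamed too, since the final node was removed.
graph_outputs = [rename.get(name, name) for name in ["logits"]]

for node in cleaned:
    print(node["op_type"], node["inputs"], "->", node["outputs"])
print("graph outputs:", graph_outputs)
```

After the cleanup, the first `QGemm` reads the graph input directly and the graph output is the last quantized tensor, so the caller is responsible for supplying and interpreting uint8 data.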

### **Reinstall ONNX Runtime GPU**
Forces a reinstall of `onnxruntime-gpu` to ensure the environment is clean and compatible with the custom operators.

In [26]:
!pip uninstall onnxruntime onnxruntime-gpu -y
!pip install onnxruntime-gpu==1.21.0

Found existing installation: onnxruntime-gpu 1.21.0
Uninstalling onnxruntime-gpu-1.21.0:
  Successfully uninstalled onnxruntime-gpu-1.21.0
Collecting onnxruntime-gpu==1.21.0
  Using cached onnxruntime_gpu-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Using cached onnxruntime_gpu-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (280.8 MB)
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.21.0
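
After the reinstall, the installed distribution can be checked with the standard library. This is a minimal sketch, and `installed_version` is a hypothetical helper. Because `!pip` resolves through the modified `PATH`, the check has to run under the same interpreter, for example by saving the snippet to a file and running it with `!python`; run directly in a notebook cell, it would inspect the kernel's own Python instead.

```python
from importlib import metadata

def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

# Report both distributions that the cell above uninstalls.
for name in ("onnxruntime", "onnxruntime-gpu"):
    print(name, installed_version(name))
```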


### **Step 4: Test Approximate Multiplier based MLP Accuracy**
Runs the final accuracy test using the custom `ApproxQGemm` operator to validate the performance of your approximate computing implementation.

In [27]:
!python emgaxo/MNIST_Tutorial_SW/07_accuracy_tester_ApproxQGemm.py

2025-12-20 13:38:12.652704301 [W:onnxruntime:, graph.cc:4401 CleanUnusedInitializersAndNodeArgs] Removing initializer 'dense_5_scale'. It is not used by any node and should be removed from the model.
2025-12-20 13:38:12.652727706 [W:onnxruntime:, graph.cc:4401 CleanUnusedInitializersAndNodeArgs] Removing initializer 'dense_5_zero_point'. It is not used by any node and should be removed from the model.
Average inference time: 1.8142 ms
Accuracy: 97.98%
Precision: 0.9799
Recall: 0.9798
F1 Score: 0.9798
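
The three accuracy tests can be compared side by side. The sketch below copies the accuracy and latency figures from the outputs above and computes the accuracy drop of each model relative to the FP32 baseline, along with the latency ratio between the approximate and the standard quantized model.

```python
# Accuracy in percent, copied from the three test outputs above.
accuracy = {"Gemm (FP32)": 98.06, "QGemm": 97.11, "ApproxQGemm": 97.98}
# Average inference time in milliseconds, copied from the same outputs.
latency_ms = {"Gemm (FP32)": 0.0081, "QGemm": 0.0069, "ApproxQGemm": 1.8142}

baseline = accuracy["Gemm (FP32)"]
drops = {name: round(baseline - acc, 2) for name, acc in accuracy.items()}

for name, acc in accuracy.items():
    print(f"{name:<12} {acc:6.2f}%  drop vs FP32: {drops[name]:.2f} pp  "
          f"latency: {latency_ms[name]:.4f} ms")

slowdown = latency_ms["ApproxQGemm"] / latency_ms["QGemm"]
print(f"ApproxQGemm / QGemm latency ratio: {slowdown:.1f}x")
```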
