# XOR Neural Network in PyTorch and Keras

## Introduction
This script demonstrates how to implement a simple neural network for solving the XOR problem using **PyTorch** and **Keras**. The XOR problem is a classic example of a problem that cannot be solved with a simple perceptron but requires a multi-layer neural network. We will:

1. Implement the XOR neural network in **PyTorch**.
2. Implement the same XOR neural network in **Keras**.
3. Train both models on XOR data.
4. Convert the trained Keras model to **TensorFlow Lite (TFLite)** for deployment on embedded systems like the **ESP32S3**.

## Understanding the Tools

### PyTorch
- PyTorch is an open-source machine learning framework developed by Facebook AI.
- It is widely used for deep learning research and production.
- It provides dynamic computation graphs, making it flexible and easy to debug.

### TensorFlow / Keras
- TensorFlow is a machine learning framework developed by Google.
- Keras is a high-level API built on top of TensorFlow that simplifies the process of building neural networks.
- TensorFlow Lite (TFLite) is a lightweight version of TensorFlow designed for mobile and embedded devices like **ESP32S3**.

---

## Step 1: Import Required Libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import tensorflow as tf
from tensorflow import keras

**Explanation:**
- `torch` is the main PyTorch module.
- `torch.nn` contains classes for building neural networks.
- `torch.optim` provides optimization algorithms.
- `numpy` is used for numerical operations.
- `tensorflow` and `keras` are used to implement the model in Keras.

---

## Step 2: Define the XOR Dataset

In [2]:
# XOR input and expected output
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
labels = np.array([[0], [1], [1], [0]], dtype=np.float32)

**Explanation:**
- The XOR dataset consists of four binary input pairs and their expected outputs.
- The goal of the network is to learn the XOR function.

---

## Step 3: Implement the XOR Neural Network in PyTorch

In [3]:
class XOR_NN_PyTorch(nn.Module):
    def __init__(self):
        super(XOR_NN_PyTorch, self).__init__()
        self.hidden = nn.Linear(2, 3)
        self.output = nn.Linear(3, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

**Explanation:**
- `nn.Linear(2, 3)`: A fully connected layer with **2 input neurons** and **3 hidden neurons**.
- `nn.Linear(3, 1)`: A second fully connected layer with **3 hidden neurons** and **1 output neuron**.
- `Sigmoid()`: Activation function that maps outputs to values between 0 and 1.

---

## Step 4: Train the PyTorch Model
**Choosing the Optimizer (How the Model Learns)**

Why not **SGD** (Stochastic Gradient Descent)?

- XOR is a non-linearly separable problem, so SGD struggles.
- It would need 30,000+ epochs to get decent results.

Why **Adam** (Adaptive Moment Estimation)?

- Adaptive learning rate: It adjusts how fast it learns at different times.
- Momentum-based: Helps avoid getting stuck in bad spots.
- Handles sparse gradients well – Useful for deep networks.
- Faster training: Works well for most deep learning tasks.
- Works out of the box – Requires less tuning than SGD.

Why **RMSprop** (Root Mean Square Propagation)?

- Good for small datasets.
- Helps stabilize learning but can sometimes learn too fast and overfit.
- Adapts the learning rate for each weight in the network by dividing the gradient by a moving average of its recent magnitudes

In [4]:
model = XOR_NN_PyTorch()
criterion = nn.MSELoss()
# XOR is not linearly separable, and SGD struggles with non-convex optimization problems
#optimizer = optim.SGD(model.parameters(), lr=0.1) # needs 30000 epochs
# adaptive optimizer, adjusts the learning rate dynamically
optimizer = optim.Adam(model.parameters(), lr=0.1)  # 🔥 Faster Convergence
# or RMSprop → ⚡ Works well with small datasets
#optimizer = optim.RMSprop( model.parameters(), lr=0.1) # a bit too good

# PyTorch models work with tensors instead of normal lists/arrays.
# convert the input data (data) and expected labels (labels) into PyTorch tensors so the model can process them
data_tensor = torch.tensor(data)
labels_tensor = torch.tensor(labels)

for epoch in range(10000):
    optimizer.zero_grad()      # Reset gradients to zero
    output = model(data_tensor)  # Forward pass: Get model's prediction
    loss = criterion(output, labels_tensor)  # Compute the loss
    loss.backward()            # Compute gradients (how to adjust weights)
    optimizer.step()           # Update model weights based on gradients

    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.6f}")

# ✅ Save the Trained Model
torch.save(model.state_dict(), "xor_demo.pth")
print("\n✅ Model trained and saved as xor_demo.pth")


Epoch 0, Loss: 0.254387
Epoch 1000, Loss: 0.000083
Epoch 2000, Loss: 0.000024
Epoch 3000, Loss: 0.000011
Epoch 4000, Loss: 0.000005
Epoch 5000, Loss: 0.000003
Epoch 6000, Loss: 0.000002
Epoch 7000, Loss: 0.000001
Epoch 8000, Loss: 0.000001
Epoch 9000, Loss: 0.000000

✅ Model trained and saved as xor_demo.pth


**What this code does**
### The Training Loop (How Learning Happens)

| Step | Code | Explanation |
| ----- | ----- | ----- |
| **1** | `optimizer.zero_grad()` | Clears gradients from the previous step to avoid accumulation. |
| **2** | `output = model(data_tensor)` | Feeds input data through the network (forward pass). |
| **3** | `loss = criterion(output, labels_tensor)` | Computes how far the prediction is from the correct label. |
| **4** | `loss.backward()` | Calculates gradients (derivatives) for each weight in the model. |
| **5** | `optimizer.step()` | Updates model weights using the computed gradients. |
| **6** | `if epoch % 1000 == 0:` | Every 1000 epochs, prints the current loss. |



### Discussion Questions

- What would happen if we used SGD instead of Adam?
- Why do we use a hidden layer in this model?
- How could we reduce training time further?
- How do we know if the model is overfitting?



**What would happen if we used SGD instead of Adam?**

- Slower Learning: SGD updates weights at a fixed learning rate, which is not ideal for XOR (a non-linearly separable problem).
- More Epochs Required: Would need 30,000+ epochs to reach good accuracy.
- Gets Stuck Easily: Can get trapped in local minima because it lacks adaptive learning rates.
- When is SGD Useful?: Works well for large datasets but struggles with small ones like XOR.

👉 Follow-up Questions:

- Why does Adam perform better here?
- When should we prefer SGD over Adam?
- What about RMSprop?

2️⃣ Why do we use a hidden layer in this model?

- XOR is Not Linearly Separable: A simple linear classifier (like logistic regression) cannot learn XOR.
- Captures Nonlinear Patterns: The hidden layer allows the model to map inputs to a curved decision boundary.
- More Neurons ≠ Always Better: Too many neurons may lead to overfitting.

👉 **Follow-up Questions:**

- What happens if we remove the hidden layer?
- Could a single hidden neuron solve XOR?
- What if we add more hidden layers?

3️⃣ How could we reduce training time further?

- Use a smaller number of epochs (e.g., 10,000 instead of 30,000) – does it still learn XOR?
- Reduce learning rate to prevent overshooting.
- Try different optimizers like RMSprop or AdamW.
- Batch Training: Instead of using the full dataset at once, could we train on mini-batches?
- Use GPU acceleration (PyTorch can run on CUDA).

👉 **Follow-up Questions:**

- Does reducing epochs harm accuracy?
- Would adding more neurons make training faster or slower?
- Could different activation functions (e.g., ReLU instead of Sigmoid) help?

4️⃣ How do we know if the model is overfitting?

- Loss is very low, but generalization is bad (i.e., works well on training data but fails on new data).
- Too many parameters in a small dataset can lead to overfitting.
- Look at Validation Loss: If training loss is low but validation loss is high, it's overfitting.
- Try Regularization (L2 weight decay) or dropout layers to prevent it.

👉 **Follow-up Questions:**

- If we train for even more epochs (e.g., 100,000), what happens?
- Would increasing the dataset size help reduce overfitting?
- Could we use data augmentation for an XOR problem?


### Prediction
We are testing the model to see how it works

In [5]:
import torch

# Ensure the model is in evaluation mode (disables dropout, batch norm updates)
model.eval()

# Convert input data to a PyTorch tensor
data_tensor = torch.tensor(data, dtype=torch.float32)

# Perform predictions (no_grad prevents tracking gradients for efficiency)
with torch.no_grad():
    predictions = model(data_tensor)

# Print predictions
print("\nNeural Network Predictions:")
for i, pred in enumerate(predictions):
    pred_value = pred.item()  # Convert tensor to Python float
    print(f"Inputs: {data[i]} -> Prediction: {pred_value:.4f} -> Rounded: {round(pred_value)}")



Neural Network Predictions:
Inputs: [0. 0.] -> Prediction: 0.0004 -> Rounded: 0
Inputs: [0. 1.] -> Prediction: 0.9996 -> Rounded: 1
Inputs: [1. 0.] -> Prediction: 0.9994 -> Rounded: 1
Inputs: [1. 1.] -> Prediction: 0.0004 -> Rounded: 0


---

## BONUS: Implement the XOR Neural Network in Keras

This is informational only. We are not using Keras for building models for embedded systems

PyTorch is more relevant
- ✅ PyTorch helps understand embedded AI better
- ✅ Easier transition to Edge AI tools (ESP capability, flexibility, optimization)
- ✅ Keras adds unnecessary abstraction

```!python
keras_model = keras.Sequential([
    keras.layers.Dense(3, activation='sigmoid', input_shape=(2,)),
    keras.layers.Dense(1, activation='sigmoid')
])

keras_model.compile(optimizer='sgd', loss='mean_squared_error')
keras_model.fit(data, labels, epochs=10000, verbose=0)
```


**Explanation:**
- `Sequential()`: Creates a simple feedforward neural network.
- `Dense(3, activation='sigmoid')`: Hidden layer with 3 neurons and sigmoid activation.
- `Dense(1, activation='sigmoid')`: Output layer with 1 neuron and sigmoid activation.
- `compile()`: Configures the model with **SGD** optimizer and **MSE** loss function.
- `fit()`: Trains the model for **10,000 epochs**.

## **Step 5:** Convert PyTorch Model to ONNX

First, we need to convert the trained PyTorch model into ONNX format, which acts as a bridge between PyTorch and TensorFlow.

It could be done with command line:
```
onnx2tf -i xor_model.onnx -o xor_model_tf -cotof
```

In [6]:
!pip install onnx

Collecting onnx
  Downloading onnx-1.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Downloading onnx-1.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.6/17.6 MB[0m [31m85.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: onnx
Successfully installed onnx-1.18.0


In [7]:
import torch
import onnx

# Load your trained PyTorch model
model = XOR_NN_PyTorch()  # Replace with your model class
model.load_state_dict(torch.load("xor_demo.pth", weights_only=True))
model.eval()

# Define input shape
dummy_input = torch.randn(1, 2)  # Same shape as your model input
# dummy_input = torch.tensor([[0.0, 0.0]], dtype=torch.float32) # or like that, start with zero

# Convert to ONNX
onnx_path = "xor_model.onnx"
torch.onnx.export(
    model,  # Use the original PyTorch model, not traced_model
    dummy_input,
    onnx_path,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=17  # Use a recent ONNX opset version
)

print(f"✅ ONNX model saved to {onnx_path}")

✅ ONNX model saved to xor_model.onnx


## **Step 6:** Convert ONNX to TensorFlow

We convert ONNX to TensorFlow using onnx-tf.

Install the ONNX-TensorFlow converter, the depencencies are a bit tricky

In [8]:
!pip install onnx2tf onnx-graphsurgeon ai_edge_litert sng4onnx onnx-simplifier onnxruntime sne4onnx

Collecting onnx2tf
  Downloading onnx2tf-1.27.10-py3-none-any.whl.metadata (149 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/149.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━[0m [32m143.4/149.2 kB[0m [31m5.2 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m149.2/149.2 kB[0m [31m2.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting onnx-graphsurgeon
  Downloading onnx_graphsurgeon-0.5.8-py2.py3-none-any.whl.metadata (8.2 kB)
Collecting ai_edge_litert
  Downloading ai_edge_litert-1.3.0-cp311-cp311-manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting sng4onnx
  Downloading sng4onnx-1.0.4-py3-none-any.whl.metadata (4.6 kB)
Collecting onnx-simplifier
  Downloading onnx_simplifier-0.4.36-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.3 kB)
Collecting onnxruntime
  Downloading onnxruntime-1.22.0-cp311-cp311-manylinux_2_27_x86_

### Run the conversion

In [9]:
import onnx
import onnx2tf

# Load ONNX model
onnx_model_path = "xor_model.onnx"
onnx_model = onnx.load(onnx_model_path)

# Convert to TensorFlow
onnx2tf.convert(
    input_onnx_file_path=onnx_model_path,
    output_folder_path="xor_model",
    copy_onnx_input_output_names_to_tflite=True  # Ensures input/output names are preserved
)
print("Converted to TensorFlow SavedModel format.")


Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃            ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Constant   │ 4              │ 4                │
│ Gemm       │ 2              │ 2                │
│ Sigmoid    │ 2              │ 2                │
│ Model Size │ 645.0B         │ 802.0B           │
└────────────┴────────────────┴──────────────────┘

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃            ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Constant   │ 4              │ 4                │
│ Gemm       │ 2              │ 2                │
│ Sigmoid    │ 2              │ 2                │
│ Model Size │ 802.0B         │ 802.0B           │
└────────────┴────────────────┴──────────────────┘

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━

### Prepare a Representative Dataset

For INT8 quantization, TensorFlow Lite requires a representative dataset to calibrate the model's dynamic ranges. This dataset should be representative of the input data the model will process during inference.

In [10]:
import numpy as np

def representative_dataset():
    for _ in range(100):
        # Provide data in the same shape as your model's input
        data = np.random.rand(1, 2).astype(np.float32)
        yield [data]


Ensure that the data generated matches the input shape and data type expected by your model.

### Convert to INT8 TFLite Model

Use TensorFlow Lite's converter to quantize the model to INT8.

In [11]:
import tensorflow as tf

# Load the TFLite model
tflite_model_path = "xor_model/xor_model_float32.tflite"
interpreter = tf.lite.Interpreter(model_path=tflite_model_path)

# Get model input details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print("Input details:", input_details)
print("Output details:", output_details)


Input details: [{'name': 'input', 'index': 0, 'shape': array([1, 2], dtype=int32), 'shape_signature': array([-1,  2], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
Output details: [{'name': 'Identity', 'index': 12, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1,  1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]


### Quantization from Float32 to Int16 or Int8 for ESP32S3

### Full Integer Quantization from Float32 (Recommended for ESP32S3)

To improve performance on embedded hardware, we should quantize weights and activations:

In [12]:
import tensorflow as tf

# Load the SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model("xor_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert the model
tflite_quant_model = converter.convert()

# Save the quantized model
with open("xor_model_int8.tflite", "wb") as f:
    f.write(tflite_quant_model)

print("INT8 quantized TFLite model saved as 'xor_model_int8.tflite'.")


INT8 quantized TFLite model saved as 'xor_model_int8.tflite'.


### Verify TFLite Model Properties
After conversion, let’s check if the model is properly quantized:

In [13]:
interpreter = tf.lite.Interpreter(model_path="xor_model_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print("Input details:", input_details)
print("Output details:", output_details)


Input details: [{'name': 'serving_default_input:0', 'index': 0, 'shape': array([1, 2], dtype=int32), 'shape_signature': array([-1,  2], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (0.0039170472882688046, -128), 'quantization_parameters': {'scales': array([0.00391705], dtype=float32), 'zero_points': array([-128], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
Output details: [{'name': 'PartitionedCall:0', 'index': 12, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1,  1], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (0.00390625, -128), 'quantization_parameters': {'scales': array([0.00390625], dtype=float32), 'zero_points': array([-128], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]


Looks good! Your model has been successfully quantized to UINT8 (full integer quantization):

✅ Input dtype: numpy.uint8 (was float32)
✅ Output dtype: numpy.uint8 (was float32)
✅ Quantization scales:

- Input: 0.0038925335
- Output: 0.00390625

🚀 XNNPACK delegate applied

This means optimized execution using TensorFlow Lite's XNNPACK backend, which is great for performance on embedded devices!

### Predictions
This section demonstrates how the trained neural network model makes predictions on XOR inputs. The goal of this demonstration is to showcase how a quantized neural network can accurately compute the XOR function while running efficiently on embedded hardware.

In the previous implementation, the model was tested by feeding XOR input values and comparing the predicted outputs to expected results. Now, we will use the quantized TensorFlow Lite model and run inference using the TFLite interpreter.

#### Code for Running Predictions

The following code loads the quantized model and performs inference:

In [14]:
import numpy as np
import tensorflow.lite as tflite

# Load the TFLite model
interpreter = tflite.Interpreter(model_path="xor_model_int8.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Define XOR input samples
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)

# Quantize inputs
scale, zero_point = input_details[0]['quantization']
inputs_q = np.round(inputs / scale + zero_point).astype(np.int8)

# Run inference
predictions = []
for x in inputs_q:
    x = np.expand_dims(x, axis=0)  # TFLite expects batch dimension
    interpreter.set_tensor(input_details[0]['index'], x)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])

    # Dequantize output
    output_scale, output_zero_point = output_details[0]['quantization']
    pred = (output_data.astype(np.float32) - output_zero_point) * output_scale

    predictions.append(pred[0][0])

# Print results
print("\nNeural Network Predictions:")
for i, pred in enumerate(predictions):
    print(f"Inputs: {inputs[i]} -> Prediction: {pred:.4f} -> Rounded: {round(pred)}")



Neural Network Predictions:
Inputs: [0. 0.] -> Prediction: 0.0000 -> Rounded: 0
Inputs: [0. 1.] -> Prediction: 0.9961 -> Rounded: 1
Inputs: [1. 0.] -> Prediction: 0.9961 -> Rounded: 1
Inputs: [1. 1.] -> Prediction: 0.0000 -> Rounded: 0


Let's evaluate our work

Measures Model File Sizes: Compares the storage requirements of the original PyTorch model and the quantized TFLite model.​
Evaluates Inference Speed: Assesses the time taken to perform inference on a sample input for both models.​
NVIDIA Developer Forums
Checks Prediction Consistency: Ensures that the quantized model's outputs are consistent with those of the original model.​

In [15]:
import numpy as np
import tensorflow as tf

# Load the int8 TFLite model
interpreter_int8 = tf.lite.Interpreter(model_path='xor_model_int8.tflite')
interpreter_int8.allocate_tensors()

# Retrieve input and output tensor details
input_details = interpreter_int8.get_input_details()
output_details = interpreter_int8.get_output_details()

# Get quantization parameters for input and output
input_scale, input_zero_point = input_details[0]['quantization']
output_scale, output_zero_point = output_details[0]['quantization']

# Prepare your float32 input data
input_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)

# Iterate over each input
for single_input in input_data:
    # Reshape input to match model's expected input shape [1, 2]
    single_input = np.expand_dims(single_input, axis=0)

    # Quantize the input data to int8
    input_data_int8 = np.round(single_input / input_scale + input_zero_point).astype(np.int8)

    # Set the tensor with the quantized data
    interpreter_int8.set_tensor(input_details[0]['index'], input_data_int8)

    # Run inference
    interpreter_int8.invoke()

    # Retrieve and dequantize the output data
    output_data_int8 = interpreter_int8.get_tensor(output_details[0]['index'])
    output_data = (output_data_int8.astype(np.float32) - output_zero_point) * output_scale

    print("Input:", single_input)
    print(f"Output {output_data_int8}, Dequantized {output_data}")


Input: [[0. 0.]]
Output [[-128]], Dequantized [[0.]]
Input: [[0. 1.]]
Output [[127]], Dequantized [[0.99609375]]
Input: [[1. 0.]]
Output [[127]], Dequantized [[0.99609375]]
Input: [[1. 1.]]
Output [[-128]], Dequantized [[0.]]


### Define Functions

In [16]:
import os
import time
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import tensorflow as tf

def get_file_size(file_path):
    """Returns the size of the file in kilobytes."""
    return os.path.getsize(file_path) / 1024

def measure_inference_time(model, input_data, model_type='pytorch'):
    """
    Measures the average inference time over the input data.

    Parameters:
    - model: The model to evaluate.
    - input_data: Data to run through the model.
    - model_type: 'pytorch' or 'tflite' indicating the type of model.

    Returns:
    - Average inference time in milliseconds.
    """
    if model_type == 'pytorch':
        model.eval()
        with torch.no_grad():
            start_time = time.time()
            for _ in range(10):  # Run multiple times for averaging
                for single_input in input_data:
                    model(single_input)
            total_time = time.time() - start_time
    elif model_type == 'tflite':
        interpreter = tf.lite.Interpreter(model_content=model)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()

        start_time = time.time()
        for _ in range(10):  # Run multiple times for averaging
            for single_input in input_data:
                # Ensure the input is in the correct shape and type
                input_data = np.expand_dims(single_input, axis=0).astype(input_details[0]['dtype'])
                interpreter.set_tensor(input_details[0]['index'], input_data)
                interpreter.invoke()
                interpreter.get_tensor(output_details[0]['index'])
        total_time = time.time() - start_time
    else:
        raise ValueError("Unsupported model type. Choose 'pytorch' or 'tflite'.")

    avg_time = (total_time / (10 * len(input_data))) * 1000  # Convert to milliseconds
    return avg_time

def evaluate_model_consistency(model, input_data, model_type='pytorch'):
    """
    Evaluates the consistency of the model's outputs over the input data.

    Parameters:
    - model_path: Path to the TFLite model file or PyTorch model.
    - input_data: Data to run through the model.
    - model_type: 'pytorch', 'tflite_float32', or 'tflite_int8' indicating the type of model.

    Returns:
    - Standard deviation of the outputs.
    """
    outputs = []

    if model_type == 'pytorch':
        model.eval()
        print( input_data )
        with torch.no_grad():
            for single_input in input_data:
                single_input = torch.tensor(single_input, dtype=torch.float32).unsqueeze(0)  # Ensure shape (1,2)
                output = model(single_input)
                outputs.append(output.numpy().flatten())  # Flatten for consistency

    elif model_type in ['tflite_float32', 'tflite_int8']:
        interpreter = tf.lite.Interpreter(model_content=model)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()

        # Retrieve input quantization parameters
        input_scale, input_zero_point = input_details[0]['quantization']
        output_scale, output_zero_point = output_details[0]['quantization']

        for single_input in input_data:
            # Ensure input is a 2D array with shape [1, input_dim]
            single_input = np.expand_dims(single_input, axis=0).astype(np.float32)

            if model_type == 'tflite_int8':
                # Quantize the input data for int8 model
                single_input = np.round(single_input / input_scale + input_zero_point).astype(np.int8)

            # Set the tensor with the appropriately formatted data
            interpreter.set_tensor(input_details[0]['index'], single_input)

            # Run inference
            interpreter.invoke()

            # Retrieve and dequantize the output data
            output_data = interpreter.get_tensor(output_details[0]['index'])
            if model_type == 'tflite_int8':
                output_data = (output_data.astype(np.float32) - output_zero_point) * output_scale

            outputs.append(output_data)

    else:
        raise ValueError("Unsupported model type. Choose 'pytorch', 'tflite_float32', or 'tflite_int8'.")

    outputs = np.array(outputs)
    outputs = outputs.flatten()
    print( outputs )

    expected_outputs = np.array([0, 1, 1, 0])
    print( expected_outputs )
    predictions = np.round(outputs)  # Round outputs to nearest binary value
    accuracy = np.mean(predictions == expected_outputs)
    return accuracy



### Load the PyTorch model

In [18]:
# Define the PyTorch model architecture
class XOR_NN_PyTorch(nn.Module):
    def __init__(self):
        super(XOR_NN_PyTorch, self).__init__()
        self.hidden = nn.Linear(2, 3)
        self.output = nn.Linear(3, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

# Load the model
pytorch_model_path = 'xor_demo.pth'
pytorch_model = XOR_NN_PyTorch()
pytorch_model.load_state_dict(torch.load(pytorch_model_path))
pytorch_model.eval()


XOR_NN_PyTorch(
  (hidden): Linear(in_features=2, out_features=3, bias=True)
  (output): Linear(in_features=3, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

### Load the TFLite models

In [19]:
# Paths to the TFLite models
tflite_float32_model_path = 'xor_model/xor_model_float32.tflite'
tflite_int8_model_path = 'xor_model_int8.tflite'

# Load the TFLite models
with open(tflite_float32_model_path, 'rb') as f:
    tflite_float32_model = f.read()

with open(tflite_int8_model_path, 'rb') as f:
    tflite_int8_model = f.read()


### Prepare the input data for each model

In [20]:
# XOR input data
xor_input_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)

# For PyTorch: Convert to torch tensors
pytorch_input_data = [torch.tensor(data, dtype=torch.float32) for data in xor_input_data]


### Evaluate the Models

Measure the size, inference time, and consistency for each model:

In [21]:
# Evaluate PyTorch model
pytorch_size = get_file_size(pytorch_model_path)
pytorch_avg_time = measure_inference_time(pytorch_model, pytorch_input_data, 'pytorch')
pytorch_consistency = evaluate_model_consistency(pytorch_model, pytorch_input_data, 'pytorch')

# Evaluate TFLite Float32 model
tflite_float32_size = get_file_size(tflite_float32_model_path)
tflite_float32_avg_time = measure_inference_time(tflite_float32_model, xor_input_data, 'tflite')
tflite_float32_consistency = evaluate_model_consistency(tflite_float32_model, xor_input_data, 'tflite_float32')

# Evaluate TFLite INT8 model
tflite_int8_size = get_file_size(tflite_int8_model_path)
tflite_int8_avg_time = measure_inference_time(tflite_int8_model, xor_input_data, 'tflite')
tflite_int8_consistency = evaluate_model_consistency(tflite_int8_model, xor_input_data, 'tflite_int8')

# Print results
print(f"PyTorch Model: Size = {pytorch_size} bytes, Avg Inference Time = {pytorch_avg_time:.6f} seconds")
print(f"TFLite Float32 Model: Size = {tflite_float32_size} bytes, Avg Inference Time = {tflite_float32_avg_time:.6f} seconds")
print(f"TFLite INT8 Model: Size = {tflite_int8_size} bytes, Avg Inference Time = {tflite_int8_avg_time:.6f} seconds")

print("Accuracy Check (Outputs):")
print(f"PyTorch Model Output: {pytorch_consistency}")
print(f"Model Accuracy: {pytorch_consistency * 100:.2f}%")
print(f"TFLite Float32 Model Output: {tflite_float32_consistency}")
print(f"TFLite INT8 Model Output: {tflite_int8_consistency}")

[tensor([0., 0.]), tensor([0., 1.]), tensor([1., 0.]), tensor([1., 1.])]
[4.4463607e-04 9.9964261e-01 9.9940622e-01 4.2789589e-04]
[0 1 1 0]
[4.4463668e-04 9.9964261e-01 9.9940628e-01 4.2789482e-04]
[0 1 1 0]
[0.         0.99609375 0.99609375 0.        ]
[0 1 1 0]
PyTorch Model: Size = 2.0859375 bytes, Avg Inference Time = 0.142491 seconds
TFLite Float32 Model: Size = 2.09375 bytes, Avg Inference Time = 0.065184 seconds
TFLite INT8 Model: Size = 2.6953125 bytes, Avg Inference Time = 0.037932 seconds
Accuracy Check (Outputs):
PyTorch Model Output: 1.0
Model Accuracy: 100.00%
TFLite Float32 Model Output: 1.0
TFLite INT8 Model Output: 1.0


### Interpretation

- Typically, quantized models like the TFLite INT8 version are expected to have smaller sizes due to reduced precision. However, in your case, the INT8 model is larger. This anomaly might result from additional overhead introduced during the quantization process or specific model architecture characteristics.​

- The TFLite INT8 model demonstrates the fastest inference time, aligning with expectations that lower-precision models execute computations more swiftly. The PyTorch model exhibits the slowest inference, which is consistent with observations that TensorFlow Lite often outperforms PyTorch in mobile environments due to its optimized architecture.

- The output consistency, as measured by standard deviation, indicates variability in model predictions. The PyTorch and TFLite Float32 models show some variability, while the TFLite INT8 model exhibits no variability (standard deviation of 0.0). This could suggest that the INT8 model is producing uniform outputs, potentially indicating a loss of sensitivity or dynamic range due to quantization. Such issues have been reported where INT8 quantization leads to degraded performance or incorrect confidence scores