
## **Training a deep learning model (Classical solution)**

```javascript
Since we made our dataset private on Hugging Face during phase 2, we need to log in to access it.
```

In [2]:
!huggingface-cli login --token ""

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [2]:
!pip install pennylane evaluate 

Defaulting to user installation because normal site-packages is not writeable
Collecting pennylane
  Using cached PennyLane-0.37.0-py3-none-any.whl.metadata (9.3 kB)
Collecting rustworkx (from pennylane)
  Using cached rustworkx-0.15.1-cp38-abi3-macosx_11_0_arm64.whl.metadata (9.9 kB)
Collecting autograd (from pennylane)
  Using cached autograd-1.6.2-py3-none-any.whl.metadata (706 bytes)
Collecting appdirs (from pennylane)
  Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting semantic-version>=2.7 (from pennylane)
  Using cached semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting autoray>=0.6.11 (from pennylane)
  Using cached autoray-0.6.12-py3-none-any.whl.metadata (5.7 kB)
Collecting pennylane-lightning>=0.37 (from pennylane)
  Downloading PennyLane_Lightning-0.37.0-cp39-cp39-macosx_11_0_arm64.whl.metadata (23 kB)
Using cached PennyLane-0.37.0-py3-none-any.whl (1.8 MB)
Using cached autoray-0.6.12-py3-none-any.whl (50 kB)
Downloading Pen

## **Imports**

In [4]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

from tqdm import tqdm
import pennylane as qml
import evaluate
from datasets import load_dataset
from transformers import (AutoModel, AutoConfig,
                          AutoImageProcessor,
                          Trainer, TrainingArguments)


## **Loading the dataset**

```javascript
We used the same dataset as in the classical version.
``` 

In [5]:
dataset = load_dataset("LaLegumbreArtificial/womanium-balance")
dataset["train"].features

{'image': Image(mode=None, decode=True, id=None),
 'label': ClassLabel(names=['GOOD', 'DAMAGE'], id=None)}

In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 14000
    })
    test: Dataset({
        features: ['image', 'label'],
        num_rows: 6000
    })
})

## **Preprocessing of the data**


```java
As with the classical model, the next cells preprocess the data so that it can be fed to the hybrid model.
```

In [7]:
checkpoint = "google/vit-base-patch16-224"

image_processor  = AutoImageProcessor.from_pretrained(checkpoint)


normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)
_transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])

def transforms(examples):
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

preprocessor_config.json:   0%|          | 0.00/160 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/69.7k [00:00<?, ?B/s]

Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.


In [8]:
dataset = dataset.with_transform(transforms)

```java
As a result, each image is randomly cropped and resized, converted into a tensor, and normalized, making it ready for input into the model.
```
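To make the normalization step concrete, here is a minimal sketch in plain Python. It assumes a per-channel mean and standard deviation of 0.5, which is what `image_processor.image_mean` and `image_processor.image_std` are expected to hold for this checkpoint. The helper name `normalize_pixel` is hypothetical and not part of the notebook.

```python
def normalize_pixel(value_0_255, mean=0.5, std=0.5):
    """Scale an 8-bit pixel to [0, 1] (as ToTensor does), then standardize it."""
    scaled = value_0_255 / 255.0
    return (scaled - mean) / std


# ASSUMPTION: mean = std = 0.5 for every channel.
print(round(normalize_pixel(0), 4))    # darkest possible pixel  -> -1.0
print(round(normalize_pixel(52), 4))   # a dark grey pixel       -> -0.5922
print(round(normalize_pixel(255), 4))  # brightest possible pixel -> 1.0
```

Under these assumptions every pixel ends up in the range [-1, 1], which is consistent with the values in the tensor printed below.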

In [9]:
dataset["train"][0]["pixel_values"]

tensor([[[-0.5922, -0.5843, -0.5765,  ..., -0.5373, -0.5451, -0.5451],
         [-0.5922, -0.5843, -0.5686,  ..., -0.5373, -0.5451, -0.5451],
         [-0.5922, -0.5765, -0.5608,  ..., -0.5294, -0.5373, -0.5373],
         ...,
         [-0.5765, -0.5608, -0.5373,  ..., -0.3020, -0.3020, -0.2941],
         [-0.5765, -0.5529, -0.5373,  ..., -0.3098, -0.3098, -0.2941],
         [-0.5686, -0.5451, -0.5373,  ..., -0.3176, -0.3176, -0.3020]],

        [[-0.5922, -0.5843, -0.5765,  ..., -0.5373, -0.5451, -0.5451],
         [-0.5922, -0.5843, -0.5686,  ..., -0.5373, -0.5451, -0.5451],
         [-0.5922, -0.5765, -0.5608,  ..., -0.5294, -0.5373, -0.5373],
         ...,
         [-0.5765, -0.5608, -0.5373,  ..., -0.3020, -0.3020, -0.2941],
         [-0.5765, -0.5529, -0.5373,  ..., -0.3098, -0.3098, -0.2941],
         [-0.5686, -0.5451, -0.5373,  ..., -0.3176, -0.3176, -0.3020]],

        [[-0.5922, -0.5843, -0.5765,  ..., -0.5373, -0.5451, -0.5451],
         [-0.5922, -0.5843, -0.5686,  ..., -0

## **Evaluation metrics**

```java
In this case, we chose accuracy as the primary metric. Given that the dataset is balanced between the two classes, additional metrics are not necessary at this time.
```

In [12]:
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
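As a rough illustration of what `compute_metrics` does, the sketch below reproduces the same two steps (arg-max over the class scores, then the fraction of matches) in plain Python. The function names and the example numbers are made up for illustration; the notebook itself relies on NumPy and the `evaluate` accuracy metric.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up logits for four images; columns are (GOOD, DAMAGE).
example_logits = [[0.9, -0.8], [-0.2, 0.7], [0.1, 0.3], [0.6, -0.1]]
example_labels = [0, 1, 0, 0]
print(accuracy_from_logits(example_logits, example_labels))  # 0.75
```

Here the third image is misclassified, so three of four predictions are correct.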

```java
A data collator assembles individual examples into batches for training or evaluating a model.
```

In [10]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()
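The idea behind a collator can be sketched in a few lines of plain Python. This is a simplified stand-in, not the library's implementation: the real collator stacks tensors, whereas the hypothetical `collate` function below only groups values into lists. It mirrors the renaming of `label` to `labels`, which is why the model's `forward` method takes a `labels` argument.

```python
def collate(examples):
    """Turn a list of per-example dicts into one dict of batched lists."""
    batch = {}
    for key in examples[0]:
        # ASSUMPTION: mimic the default collator's renaming of "label" to "labels".
        out_key = "labels" if key == "label" else key
        batch[out_key] = [example[key] for example in examples]
    return batch


# Two toy examples with shortened "images" (real ones have shape 3 x 224 x 224).
examples = [
    {"pixel_values": [0.1, 0.2], "label": 0},
    {"pixel_values": [0.3, 0.4], "label": 1},
]
print(collate(examples))
# {'pixel_values': [[0.1, 0.2], [0.3, 0.4]], 'labels': [0, 1]}
```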

## **- Quantum-Enhanced Vision Transformer -**

```java
In the next two cells, we created a hybrid model that combines a Vision Transformer architecture with a quantum circuit. The Vision Transformer processes the image input, and its output is passed through a dense layer. The resulting features are then fed into a quantum layer, implemented as a quantum circuit with 2 qubits. This quantum layer acts as a final transformation before producing the output logits, which can be used for classification. The quantum circuit is integrated into the model as a custom layer, enabling quantum computations on the features extracted by the Vision Transformer.
```

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

```java
Quantum Layer description:

qml.AngleEmbedding(inputs, wires=range(n_qubits)) - This function embeds classical input data into the quantum circuit by encoding it into the rotation angles of the qubits. It allows the input data to be represented in the quantum state, making it possible for the quantum circuit to process the data.

qml.BasicEntanglerLayers(weights, wires=range(n_qubits)) - This template applies a sequence of layers, each consisting of one single-parameter rotation per qubit (around the X axis by default; Y or Z can be selected) followed by CNOT gates that entangle the qubits. Entanglement is a key feature of quantum mechanics that correlates the states of the qubits so that they can no longer be described independently. The rotation angles are given by weights, a trainable tensor of shape (n_layers, n_qubits). The template makes it easy to build a trainable quantum circuit that can capture complex relationships in the data.

return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)] - After processing the input through the quantum circuit, the final quantum states of the qubits are measured. Specifically, the circuit measures the expectation value of the Pauli-Z operator for each qubit. This measurement provides a classical output from the quantum circuit, which can be further processed in the overall model.
```

In [None]:
n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

n_layers = 6
weight_shapes = {"weights": (n_layers, n_qubits)}
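To see what this circuit computes, the following sketch simulates it by hand with a four-amplitude state vector in plain Python. It is not PennyLane's implementation. It assumes the default X rotations in both templates and a single CNOT from wire 0 to wire 1 per layer (the two-qubit case), and all function names are hypothetical.

```python
import math


def apply_rx(state, theta, wire, n_qubits=2):
    """Apply RX(theta) to one wire; wire 0 is the most significant bit."""
    c, s = math.cos(theta / 2), -1j * math.sin(theta / 2)
    shift = n_qubits - 1 - wire
    new_state = list(state)
    for idx in range(len(state)):
        if (idx >> shift) & 1 == 0:
            partner = idx | (1 << shift)
            a0, a1 = state[idx], state[partner]
            new_state[idx] = c * a0 + s * a1
            new_state[partner] = s * a0 + c * a1
    return new_state


def apply_cnot(state, control, target, n_qubits=2):
    """Flip the target bit of every basis state whose control bit is 1."""
    c_shift, t_shift = n_qubits - 1 - control, n_qubits - 1 - target
    new_state = list(state)
    for idx in range(len(state)):
        if (idx >> c_shift) & 1:
            new_state[idx ^ (1 << t_shift)] = state[idx]
    return new_state


def expval_z(state, wire, n_qubits=2):
    """Expectation value of Pauli-Z on one wire (+1 for bit 0, -1 for bit 1)."""
    shift = n_qubits - 1 - wire
    return sum(abs(a) ** 2 * (1 if (i >> shift) & 1 == 0 else -1)
               for i, a in enumerate(state))


def circuit(inputs, weights):
    """Two-qubit sketch: angle embedding, entangling layers, Z measurements."""
    state = [1 + 0j, 0j, 0j, 0j]                   # start in |00>
    for wire, angle in enumerate(inputs):          # AngleEmbedding (X rotations)
        state = apply_rx(state, angle, wire)
    for layer in weights:                          # one entry per layer
        for wire, theta in enumerate(layer):
            state = apply_rx(state, theta, wire)
        state = apply_cnot(state, control=0, target=1)
    return [expval_z(state, wire) for wire in range(2)]


# Rotating wire 0 by pi flips it; the CNOT then flips wire 1 as well.
print([round(v, 6) for v in circuit([math.pi, 0.0], [[0.0, 0.0]])])  # [-1.0, -1.0]
```

Whatever the inputs and weights, each returned value lies between -1 and 1, because it is an average of +1 and -1 measurement outcomes.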


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

```java

Full Architecture Explanation: Quantum-Enhanced Vision Transformer Model

After creating the quantum layer, we developed a hybrid model called `QuantumEnhancedVisionTransformer`. This model is built on the "google/vit-base-patch16-224" Vision Transformer as the base architecture. The process flow is as follows:

1. Vision Transformer processing: The model first processes the input images with the Vision Transformer, which extracts high-level features from them.

2. Dropout layer: A dropout layer (p = 0.1) is applied to the pooled output of the Vision Transformer. It helps prevent overfitting by randomly setting a fraction of the input units to zero during training, which promotes generalization.

3. Dense layer: The features from the dropout layer are then passed through a dense (fully connected) layer, which reduces them from the Vision Transformer's hidden size to num_labels = 2 values, one per qubit.

4. Quantum layer: Finally, these values are fed into the quantum layer, which encodes them as rotation angles and applies the quantum circuit. This layer can potentially capture patterns and relationships in the data that classical layers might miss. Its output is used as the logits for the final predictions.

This hybrid model leverages the strengths of both classical deep learning (through the Vision Transformer) and quantum computing (through the quantum layer) to perform image classification.
```


In [11]:
class QuantumEnhancedVisionTransformer(nn.Module):
    def __init__(self, checkpoint, num_labels):
        super(QuantumEnhancedVisionTransformer, self).__init__()
        self.num_labels = num_labels

        # Create the model layers
        self.config = AutoConfig.from_pretrained(checkpoint, output_attentions=True, output_hidden_states=True)
        self.model = AutoModel.from_pretrained(checkpoint, config=self.config)
        self.dropout = nn.Dropout(0.1)
        self.dense1 = nn.Linear(self.config.hidden_size, num_labels)
        self.qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)


    def forward(self, pixel_values=None, labels=None):
        if pixel_values is None:
            raise ValueError("Wrong input")

        # Create the flow 
        outputs = self.model(pixel_values=pixel_values)
        pooled_output = outputs.pooler_output  

        # add custom layers
        dropout_output = self.dropout(pooled_output)
        dense_output = self.dense1(dropout_output)
        logits = self.qlayer(dense_output)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if loss is not None:
            return loss, logits
        else:
            return logits

# parameters of the model
num_labels = 2
num_epochs = 5

# Create an object of the model
model = QuantumEnhancedVisionTransformer(checkpoint=checkpoint, num_labels=num_labels)


model.safetensors:   0%|          | 0.00/346M [00:00<?, ?B/s]

Some weights of ViTModel were not initialized from the model checkpoint at google/vit-base-patch16-224 and are newly initialized: ['vit.pooler.dense.bias', 'vit.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## **Training and parameters**

```java
Here we define the hyperparameters used for the model, such as the learning rate, optimizer settings, number of epochs, and batch sizes for both training and evaluation.

To keep things simple and allow a direct comparison, we used the same parameters as for the classical model.
```

In [13]:
training_args = TrainingArguments(
    output_dir="Model_custom_pythorch_Q1",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    do_train=True,
    do_eval=True,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
)

trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 0     | 0.4404        | 0.394224        | 0.956167 |
| 2     | 0.4323        | 0.372869        | 0.967833 |
| 4     | 0.3993        | 0.357715        | 0.981333 |




TrainOutput(global_step=545, training_loss=0.45231932434467, metrics={'train_runtime': 3945.5042, 'train_samples_per_second': 17.742, 'train_steps_per_second': 0.138, 'total_flos': 0.0, 'train_loss': 0.45231932434467, 'epoch': 4.9771689497716896})
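The reported `global_step=545` can be checked against the training arguments with some quick arithmetic. The sketch below assumes that training ran on two devices. The notebook does not state this; it is a guess that happens to reproduce the reported number (with one device the same arithmetic gives 1090). The floor division reflects how optimizer steps per epoch are understood to be counted, and should be read as an approximation rather than the library's exact logic.

```python
import math

num_examples = 14000   # size of the training split
per_device_batch = 16  # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
epochs = 5             # num_train_epochs
num_devices = 2        # ASSUMPTION: not stated in the notebook

batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
updates_per_epoch = batches_per_epoch // grad_accum
total_updates = updates_per_epoch * epochs
effective_batch = per_device_batch * num_devices * grad_accum

print(batches_per_epoch, updates_per_epoch, total_updates, effective_batch)
# 438 109 545 128
```

Under this assumption each optimizer step sees 128 images, and five epochs amount to 545 steps.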

In [49]:
trainer.push_to_hub()

model_weights.pth:   0%|          | 0.00/346M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/LaLegumbreArtificial/Model_custom_pythorch_Q1/commit/341e693c1aa779060d6fdf07d3ed43a97df0a07a', commit_message='End of training', commit_description='', oid='341e693c1aa779060d6fdf07d3ed43a97df0a07a', pr_url=None, pr_revision=None, pr_num=None)

## **Save the weights**

```java
This section shows how we saved the model weights.
```

In [48]:
torch.save(model.state_dict(), "/kaggle/working/Model_custom_pythorch_Q1/model_weights.pth")

## **Load the model**

In [20]:
# Initialize the model architecture
model_2 = QuantumEnhancedVisionTransformer(checkpoint=checkpoint, num_labels=num_labels)

# Load the saved weights (map_location="cpu" also allows loading on a machine without a GPU)
model_2.load_state_dict(torch.load("model_weights.pth", map_location="cpu"))
model_2.eval()  # Set the model to evaluation mode

Some weights of ViTModel were not initialized from the model checkpoint at google/vit-base-patch16-224 and are newly initialized: ['vit.pooler.dense.bias', 'vit.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


CustomVisionModel(
  (model): ViTModel(
    (embeddings): ViTEmbeddings(
      (patch_embeddings): ViTPatchEmbeddings(
        (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      )
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (encoder): ViTEncoder(
      (layer): ModuleList(
        (0-11): 12 x ViTLayer(
          (attention): ViTSdpaAttention(
            (attention): ViTSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
            (output): ViTSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
          )
          (intermediate): ViTIntermediate(
            (dense): Linear(in_features

## **Predictions for testing**

```java
These last two cells show how to run predictions with the model. The data must first be preprocessed with the transforms() function and can then be passed to the model.
```

In [31]:
prediction_arr = []
for i in range(len(dataset["train"])):
    if i % 100 == 0:
        print(f"DONE {i}")
    pixel_values = dataset["train"][i]["pixel_values"]
    with torch.no_grad():
        outputs = model_2(pixel_values=pixel_values.reshape(1,3,224,224))
        logits = outputs if isinstance(outputs, torch.Tensor) else outputs[1]
        predictions = torch.argmax(logits, dim=-1)


    prediction_arr.append(predictions.item())


DONE 0
DONE 100
DONE 200
...
DONE 10000
...

## **Final accuracy**

```java
This is not the final test of the model. The accuracy below is computed on the training split, so a final evaluation phase is still needed to determine how well the model performs.
```

In [46]:
from sklearn.metrics import accuracy_score

accuracy_score(dataset["train"]["label"], prediction_arr)

0.9883571428571428