# Using PyTorch Lightning with Tune

(tune-vanilla-pytorch-lightning-ref)=

PyTorch Lightning is a framework which brings structure into training PyTorch models. It
aims to avoid boilerplate code, so you don't have to write the same training
loops all over again when building a new model.

```{image} /images/pytorch_lightning_full.png
:align: center
```

The main abstraction of PyTorch Lightning is the `LightningModule` class, which
should be extended by your application. There is [a great post on how to transfer your models from vanilla PyTorch to Lightning](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09).

The class structure of PyTorch Lightning makes it very easy to define and tune model
parameters. This tutorial shows you how to use Tune to find the best set of
parameters for your application, using an MNIST classifier as the example. Notably,
the `LightningModule` does not have to be altered at all for this, so you can
use it plug and play with your existing models, assuming their parameters are configurable!

:::{note}
To run this example, you will need to install the following:

```bash
$ pip install "ray[tune]" torch torchvision pytorch-lightning
```
:::

```{contents}
:backlinks: none
:local: true
```

## PyTorch Lightning classifier for MNIST

Let's start with the basic PyTorch Lightning implementation of an MNIST classifier.
This classifier does not include any tuning code at this point.

Our example builds on the MNIST example from the [blog post mentioned
earlier](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09).

First, we run some imports:

```python
import math
import os

import pytorch_lightning as pl
import torch
from filelock import FileLock
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST
```

And then there is the Lightning model adapted from the blog post.
Note that we left out the test set validation and made the model parameters
configurable through a `config` dict that is passed on initialization.
Also, we specify a `data_dir` where the MNIST data will be stored. Note that
we use a `FileLock` for downloading data so that the dataset is only downloaded
once per node.
Lastly, we added a new metric, the validation accuracy, to the logs.

```python
class LightningMNISTClassifier(pl.LightningModule):
    """
    This has been adapted from
    https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09
    """

    def __init__(self, config, data_dir=None):
        super(LightningMNISTClassifier, self).__init__()

        self.data_dir = data_dir or os.getcwd()

        self.layer_1_size = config["layer_1_size"]
        self.layer_2_size = config["layer_2_size"]
        self.lr = config["lr"]
        self.batch_size = config["batch_size"]

        # mnist images are (1, 28, 28) (channels, height, width)
        self.layer_1 = torch.nn.Linear(28 * 28, self.layer_1_size)
        self.layer_2 = torch.nn.Linear(self.layer_1_size, self.layer_2_size)
        self.layer_3 = torch.nn.Linear(self.layer_2_size, 10)

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        # Flatten each image into a vector of 28 * 28 values.
        x = x.view(batch_size, -1)

        x = self.layer_1(x)
        x = torch.relu(x)

        x = self.layer_2(x)
        x = torch.relu(x)

        x = self.layer_3(x)
        x = torch.log_softmax(x, dim=1)

        return x

    def cross_entropy_loss(self, logits, labels):
        # `forward` already applies log_softmax, so NLL loss gives cross entropy.
        return F.nll_loss(logits, labels)

    def accuracy(self, logits, labels):
        _, predicted = torch.max(logits.data, 1)
        correct = (predicted == labels).sum().item()
        accuracy = correct / len(labels)
        return torch.tensor(accuracy)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        logits = self.forward(x)
        loss = self.cross_entropy_loss(logits, y)
        accuracy = self.accuracy(logits, y)

        self.log("ptl/train_loss", loss)
        self.log("ptl/train_accuracy", accuracy)
        return loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        logits = self.forward(x)
        loss = self.cross_entropy_loss(logits, y)
        accuracy = self.accuracy(logits, y)
        return {"val_loss": loss, "val_accuracy": accuracy}

    def validation_epoch_end(self, outputs):
        # Average the per-batch metrics and log them for the callbacks.
        avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        avg_acc = torch.stack([x["val_accuracy"] for x in outputs]).mean()
        self.log("ptl/val_loss", avg_loss)
        self.log("ptl/val_accuracy", avg_acc)

    @staticmethod
    def download_data(data_dir):
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307, ), (0.3081, ))
        ])
        # The lock ensures that only one process per node downloads the data.
        with FileLock(os.path.expanduser("~/.data.lock")):
            return MNIST(data_dir, train=True, download=True, transform=transform)

    def prepare_data(self):
        mnist_train = self.download_data(self.data_dir)

        self.mnist_train, self.mnist_val = random_split(
            mnist_train, [55000, 5000])

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=int(self.batch_size))

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=int(self.batch_size))

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
        return optimizer


def train_mnist(config):
    model = LightningMNISTClassifier(config)
    trainer = pl.Trainer(max_epochs=10, enable_progress_bar=False)

    trainer.fit(model)
```

And that's it! You can now run `train_mnist(config)` to train the classifier, e.g.
like so:

```python
def train_mnist_no_tune():
    config = {
        "layer_1_size": 128,
        "layer_2_size": 256,
        "lr": 1e-3,
        "batch_size": 64
    }
    train_mnist(config)
```
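To get a feel for how the two layer sizes in `config` affect model size, the parameter count of the three `Linear` layers can be worked out by hand. The snippet below is a minimal sketch of that arithmetic in plain Python; the helper names `linear_params` and `count_parameters` are made up for illustration and are not part of PyTorch Lightning or Tune.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def count_parameters(layer_1_size: int, layer_2_size: int) -> int:
    """Trainable parameters of the three-layer MNIST classifier above."""
    return (
        linear_params(28 * 28, layer_1_size)
        + linear_params(layer_1_size, layer_2_size)
        + linear_params(layer_2_size, 10)
    )


# Configuration used in train_mnist_no_tune().
print(count_parameters(128, 256))  # 136074
```

Almost all parameters sit in the first layer, so `layer_1_size` has the largest effect on model size.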

## Tuning the model parameters

The parameters above should already give you a good accuracy of over 90%. However,
we might improve on this simply by changing some of the hyperparameters. For instance,
we might get an even higher accuracy with a larger batch size.

Instead of guessing the parameter values, let's use Tune to systematically try out
parameter combinations and find the best performing set.

First, we need some additional imports:

```python
from pytorch_lightning.loggers import TensorBoardLogger
from ray import air, tune
from ray.air import session
from ray.tune import CLIReporter, JupyterNotebookReporter
from ray.tune.integration.pytorch_lightning import (
    TuneReportCallback,
    TuneReportCheckpointCallback,
)
from ray.tune.schedulers import ASHAScheduler, PopulationBasedTraining
```

### Talking to Tune with a PyTorch Lightning callback

PyTorch Lightning introduced [Callbacks](https://pytorch-lightning.readthedocs.io/en/latest/extensions/callbacks.html)
that can be used to plug custom functions into the training loop. This way the original
`LightningModule` does not have to be altered at all. Also, we could use the same
callback for multiple modules.

Ray Tune comes with ready-to-use PyTorch Lightning callbacks. To report metrics
back to Tune after each validation epoch, we will use the `TuneReportCallback`:

```python
TuneReportCallback(
    {
        "loss": "ptl/val_loss",
        "mean_accuracy": "ptl/val_accuracy"
    },
    on="validation_end")
```

This callback takes the `ptl/val_loss` and `ptl/val_accuracy` values
from the PyTorch Lightning trainer and reports them to Tune as `loss`
and `mean_accuracy`, respectively.
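The renaming step can be pictured as a dictionary lookup. The following is only a sketch of the idea in plain Python, not the callback's actual implementation; `build_tune_report` and the logged values are hypothetical.

```python
def build_tune_report(logged_metrics: dict, metric_map: dict) -> dict:
    """Translate Lightning metric names into the names reported to Tune."""
    return {
        tune_name: float(logged_metrics[lightning_name])
        for tune_name, lightning_name in metric_map.items()
    }


# Hypothetical values standing in for what Lightning logged in one epoch.
logged = {"ptl/val_loss": 0.12, "ptl/val_accuracy": 0.97, "ptl/train_loss": 0.10}
report = build_tune_report(
    logged, {"loss": "ptl/val_loss", "mean_accuracy": "ptl/val_accuracy"}
)
print(report)
```

Only the metrics named in the mapping are forwarded; everything else that Lightning logged, such as `ptl/train_loss` here, is left out of the report.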

### Adding the Tune training function

Then we specify our training function. Note that we added the `data_dir` as a
parameter here so that each training run reads from a shared data location
instead of downloading the full MNIST dataset again.

We can also specify the number of epochs to train each model and the number
of GPUs to use for training. In addition, we create a TensorBoard logger that writes
log files directly into Tune's root trial directory. Otherwise PyTorch Lightning
would create subdirectories, and each trial would show up twice in TensorBoard:
once for Tune's logs and once for PyTorch Lightning's logs.

```python
def train_mnist_tune(config, num_epochs=10, num_gpus=0, data_dir="~/data"):
    data_dir = os.path.expanduser(data_dir)
    model = LightningMNISTClassifier(config, data_dir)
    trainer = pl.Trainer(
        max_epochs=num_epochs,
        # If fractional GPUs passed in, convert to int.
        gpus=math.ceil(num_gpus),
        logger=TensorBoardLogger(
            save_dir=os.getcwd(), name="", version="."),
        enable_progress_bar=False,
        callbacks=[
            TuneReportCallback(
                {
                    "loss": "ptl/val_loss",
                    "mean_accuracy": "ptl/val_accuracy"
                },
                on="validation_end")
        ])
    trainer.fit(model)
```

### Configuring the search space

Now we configure the parameter search space. We would like to choose between three
different layer and batch sizes. The learning rate should be sampled between
`0.0001` and `0.1`. Because this range spans several orders of magnitude, we use
`tune.loguniform()`, which samples uniformly in log space, so each order of magnitude
is equally likely to be drawn and small values are not underrepresented.

```python
config = {
    "layer_1_size": tune.choice([32, 64, 128]),
    "layer_2_size": tune.choice([64, 128, 256]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}
```
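To see why a log-uniform distribution suits the learning rate, the sketch below compares it with plain uniform sampling over the same range, using only the standard library. It is an illustration of the sampling idea, not Tune's own sampler; `sample_loguniform` is a hypothetical helper.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
log_samples = [sample_loguniform(1e-4, 1e-1, rng) for _ in range(10_000)]
uni_samples = [rng.uniform(1e-4, 1e-1) for _ in range(10_000)]

# Share of samples that fall into the smallest decade [1e-4, 1e-3).
below_log = sum(s < 1e-3 for s in log_samples) / len(log_samples)
below_uni = sum(s < 1e-3 for s in uni_samples) / len(uni_samples)
print(f"log-uniform below 1e-3: {below_log:.2f}")
print(f"uniform below 1e-3:     {below_uni:.2f}")
```

With log-uniform sampling roughly a third of the draws land in each of the three decades, whereas uniform sampling places only about 1% of the draws below `1e-3`.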

### Selecting a scheduler

In this example, we use an [Asynchronous Hyperband](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/)
scheduler. This scheduler decides at each iteration which trials are likely to perform
badly, and stops these trials. This way we don't waste any resources on bad hyperparameter
configurations.

```python
num_epochs = 10

scheduler = ASHAScheduler(
    max_t=num_epochs,
    grace_period=1,
    reduction_factor=2)
```
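The following sketch illustrates the successive-halving idea behind this scheduler in plain Python. It is a simplified, synchronous picture and not Ray's implementation: the real scheduler decides asynchronously as each trial reports. The geometric rung spacing is an assumption, and `asha_rungs`, `survivors` and the loss values are hypothetical.

```python
def asha_rungs(max_t: int, grace_period: int, reduction_factor: int) -> list:
    """Iterations at which trials are compared (assumed geometric spacing)."""
    rungs = []
    milestone = grace_period
    while milestone < max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs


def survivors(losses: dict, reduction_factor: int) -> list:
    """Keep the best 1/reduction_factor of trials at a rung (lower is better)."""
    keep = max(1, len(losses) // reduction_factor)
    return sorted(losses, key=losses.get)[:keep]


print(asha_rungs(max_t=10, grace_period=1, reduction_factor=2))  # [1, 2, 4, 8]

# Hypothetical validation losses of four trials after the first epoch.
print(survivors({"a": 0.45, "b": 2.33, "c": 0.12, "d": 0.72}, 2))  # ['c', 'a']
```

Under these assumptions a trial can be stopped after 1, 2, 4 or 8 epochs, and a larger `reduction_factor` prunes more aggressively at each rung.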

### Changing the CLI output

We instantiate a `CLIReporter` to specify which metrics we would like to see in our
output tables in the command line. This is optional, but can be used to make sure our
output tables only include information we would like to see.

```python
reporter = CLIReporter(
    parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
    metric_columns=["loss", "mean_accuracy", "training_iteration"])
```

### Passing constants to the train function

The `data_dir`, `num_epochs` and `num_gpus` we pass to the training function
are constants. To avoid including them as non-configurable parameters in the `config`
specification, we can use `tune.with_parameters` to wrap around the training function.

```python
gpus_per_trial = 0
data_dir = "~/data"

train_fn_with_parameters = tune.with_parameters(train_mnist_tune,
                                                num_epochs=num_epochs,
                                                num_gpus=gpus_per_trial,
                                                data_dir=data_dir)
```
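Conceptually, this wrapping resembles binding keyword arguments with `functools.partial` from the standard library. The sketch below shows only that analogy; it is not how `tune.with_parameters` is implemented, and `train_fn` is a stand-in for a real training function.

```python
from functools import partial


def train_fn(config, num_epochs=10, num_gpus=0, data_dir="~/data"):
    """Stand-in for a training function; returns what it would train with."""
    return {
        "lr": config["lr"],
        "epochs": num_epochs,
        "gpus": num_gpus,
        "dir": data_dir,
    }


# Bind the constants once; the resulting callable only needs `config`.
bound_fn = partial(train_fn, num_epochs=6, num_gpus=0, data_dir="data/")

print(bound_fn({"lr": 1e-3}))
```

The bound function has the single-argument signature that Tune expects, while the constants stay out of the search space.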

### Training with GPUs

We can specify how many resources Tune should request for each trial.
This also includes GPUs.

PyTorch Lightning takes care of moving the training to the GPUs. We
already made sure that our code is compatible with that, so there's
nothing more to do here other than to specify the number of GPUs
we would like to use:

```python
resources_per_trial = {"cpu": 1, "gpu": gpus_per_trial}
```

You can also specify {ref}`fractional GPUs for Tune <tune-parallelism>`,
allowing multiple trials to share GPUs and thus increase concurrency under resource constraints.
The `gpus_per_trial` value passed to Tune may be fractional, but the `gpus` argument
passed to `pl.Trainer` must still be an integer, which is why the training function
rounds it up with `math.ceil`.
When using fractional GPUs, it is your responsibility to make sure that
multiple trials can share a GPU and that there is enough memory to do so.
Ray does not handle this automatically.
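The relationship between the fractional value given to Tune and the integer given to the trainer can be sketched as follows. `gpu_plan` is a hypothetical helper, and the concurrency figure is only the upper bound implied by GPU resources; CPU and memory limits apply as well.

```python
import math


def gpu_plan(gpus_per_trial: float, gpus_available: int) -> dict:
    """Relate Tune's fractional request to the trainer's integer GPU count."""
    trainer_gpus = math.ceil(gpus_per_trial)
    if gpus_per_trial > 0:
        concurrent = int(gpus_available / gpus_per_trial)
    else:
        concurrent = None  # GPU resources do not limit concurrency
    return {"trainer_gpus": trainer_gpus, "max_concurrent_trials": concurrent}


print(gpu_plan(0.25, gpus_available=2))
print(gpu_plan(0, gpus_available=2))
```

With `gpus_per_trial=0.25` and two GPUs, up to eight trials could be scheduled at once, and each trial's trainer is still told to use one GPU.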

If you want to use multiple GPUs per trial, you should check out the
[Ray Lightning Library](https://github.com/ray-project/ray_lightning).
This library makes it easy to run multiple concurrent trials with Ray Tune, with each trial also running
in a distributed fashion using Ray.

### Putting it together

Lastly, we need to create a `Tuner()` object and start Ray Tune with `tuner.fit()`.

The full code looks like this:

```python
def tune_mnist_asha(num_samples=10, num_epochs=10, gpus_per_trial=0, data_dir="~/data"):
    config = {
        "layer_1_size": tune.choice([32, 64, 128]),
        "layer_2_size": tune.choice([64, 128, 256]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    }

    scheduler = ASHAScheduler(
        max_t=num_epochs,
        grace_period=1,
        reduction_factor=2)

    # Swap in CLIReporter when running outside a notebook.
    reporter = JupyterNotebookReporter(
        parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
        metric_columns=["loss", "mean_accuracy", "training_iteration"])

    train_fn_with_parameters = tune.with_parameters(train_mnist_tune,
                                                    num_epochs=num_epochs,
                                                    num_gpus=gpus_per_trial,
                                                    data_dir=data_dir)
    resources_per_trial = {"cpu": 1, "gpu": gpus_per_trial}

    tuner = tune.Tuner(
        tune.with_resources(
            train_fn_with_parameters,
            resources=resources_per_trial
        ),
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            scheduler=scheduler,
            num_samples=num_samples,
        ),
        run_config=air.RunConfig(
            local_dir="./runs",
            name="tune_mnist_asha",
            progress_reporter=reporter,
            log_to_file=True,
        ),
        param_space=config,
    )
    results = tuner.fit()

    print("Best hyperparameters found were: ", results.get_best_result().config)
```

In the example above, Tune runs 10 trials with different hyperparameter configurations.
An example output could look like so:

```{code-block} bash
:emphasize-lines: 12

  +------------------------------+------------+-------+----------------+----------------+-------------+--------------+----------+-----------------+----------------------+
  | Trial name                   | status     | loc   |   layer_1_size |   layer_2_size |          lr |   batch_size |     loss |   mean_accuracy |   training_iteration |
  |------------------------------+------------+-------+----------------+----------------+-------------+--------------+----------+-----------------+----------------------|
  | train_mnist_tune_63ecc_00000 | TERMINATED |       |            128 |             64 | 0.00121197  |          128 | 0.120173 |       0.972461  |                   10 |
  | train_mnist_tune_63ecc_00001 | TERMINATED |       |             64 |            128 | 0.0301395   |          128 | 0.454836 |       0.868164  |                    4 |
  | train_mnist_tune_63ecc_00002 | TERMINATED |       |             64 |            128 | 0.0432097   |          128 | 0.718396 |       0.718359  |                    1 |
  | train_mnist_tune_63ecc_00003 | TERMINATED |       |             32 |            128 | 0.000294669 |           32 | 0.111475 |       0.965764  |                   10 |
  | train_mnist_tune_63ecc_00004 | TERMINATED |       |             32 |            256 | 0.000386664 |           64 | 0.133538 |       0.960839  |                    8 |
  | train_mnist_tune_63ecc_00005 | TERMINATED |       |            128 |            128 | 0.0837395   |           32 | 2.32628  |       0.0991242 |                    1 |
  | train_mnist_tune_63ecc_00006 | TERMINATED |       |             64 |            128 | 0.000158761 |          128 | 0.134595 |       0.959766  |                   10 |
  | train_mnist_tune_63ecc_00007 | TERMINATED |       |             64 |             64 | 0.000672126 |           64 | 0.118182 |       0.972903  |                   10 |
  | train_mnist_tune_63ecc_00008 | TERMINATED |       |            128 |             64 | 0.000502428 |           32 | 0.11082  |       0.975518  |                   10 |
  | train_mnist_tune_63ecc_00009 | TERMINATED |       |             64 |            256 | 0.00112894  |           32 | 0.13472  |       0.971935  |                    8 |
  +------------------------------+------------+-------+----------------+----------------+-------------+--------------+----------+-----------------+----------------------+
```

As you can see in the `training_iteration` column, trials with a high loss
(and low accuracy) have been terminated early. The best performing trial used
`layer_1_size=128`, `layer_2_size=64`, `lr=0.000502428` and
`batch_size=32`.
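Reading the winner off the table amounts to taking the minimum over the `loss` column. The sketch below does this in plain Python with the values copied from the example output above; it only illustrates the selection and does not use Tune's result objects.

```python
# (loss, mean_accuracy, training_iteration) per trial, from the table above.
trials = {
    "00000": (0.120173, 0.972461, 10),
    "00001": (0.454836, 0.868164, 4),
    "00002": (0.718396, 0.718359, 1),
    "00003": (0.111475, 0.965764, 10),
    "00004": (0.133538, 0.960839, 8),
    "00005": (2.32628, 0.0991242, 1),
    "00006": (0.134595, 0.959766, 10),
    "00007": (0.118182, 0.972903, 10),
    "00008": (0.11082, 0.975518, 10),
    "00009": (0.13472, 0.971935, 8),
}

# metric="loss", mode="min": the best trial has the lowest loss.
best = min(trials, key=lambda name: trials[name][0])
# Trials that did not reach the full 10 iterations were stopped by ASHA.
stopped_early = [name for name, row in trials.items() if row[2] < 10]

print(best)           # 00008
print(stopped_early)  # ['00001', '00002', '00004', '00005', '00009']
```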

## Using Population Based Training to find the best parameters

The `ASHAScheduler` terminates poorly performing trials early.
Sometimes this stops trials that would have improved with more training steps
and might eventually have outperformed other configurations.

Another popular method for hyperparameter tuning, called
[Population Based Training](https://deepmind.com/blog/article/population-based-training-neural-networks),
instead perturbs hyperparameters during the training run. Tune implements PBT, and
we only need to make some slight adjustments to our code.

### Adding checkpoints to the PyTorch Lightning module

First, we need to introduce
another callback to save model checkpoints. Since Tune requires a call to
`session.report()` after creating a new checkpoint to register it, we will use
a combined reporting and checkpointing callback:

```python
TuneReportCheckpointCallback(
    metrics={
        "loss": "ptl/val_loss",
        "mean_accuracy": "ptl/val_accuracy"
    },
    filename="checkpoint",
    on="validation_end")
```

The `filename` argument (here `checkpoint`) is the name of the checkpoint file
within the checkpoint directory.

We also include checkpoint loading in our training function:

```python
def train_mnist_tune_checkpoint(config,
                                checkpoint_dir=None,
                                num_epochs=10,
                                num_gpus=0,
                                data_dir="~/data"):
    data_dir = os.path.expanduser(data_dir)
    kwargs = {
        "max_epochs": num_epochs,
        # If fractional GPUs passed in, convert to int.
        "gpus": math.ceil(num_gpus),
        "logger": TensorBoardLogger(
            save_dir=os.getcwd(), name="", version="."),
        "enable_progress_bar": False,
        "callbacks": [
            TuneReportCheckpointCallback(
                metrics={
                    "loss": "ptl/val_loss",
                    "mean_accuracy": "ptl/val_accuracy"
                },
                filename="checkpoint",
                on="validation_end")
        ]
    }

    # Resume from an existing checkpoint if Tune provides one.
    if checkpoint_dir:
        kwargs["resume_from_checkpoint"] = os.path.join(
            checkpoint_dir, "checkpoint")

    model = LightningMNISTClassifier(config=config, data_dir=data_dir)
    trainer = pl.Trainer(**kwargs)

    trainer.fit(model)
```

### Configuring and running Population Based Training

We need to call Tune slightly differently:

```python
def tune_mnist_pbt(num_samples=10, num_epochs=10, gpus_per_trial=0, data_dir="~/data"):
    config = {
        "layer_1_size": tune.choice([32, 64, 128]),
        "layer_2_size": tune.choice([64, 128, 256]),
        "lr": 1e-3,
        "batch_size": 64,
    }

    scheduler = PopulationBasedTraining(
        perturbation_interval=4,
        hyperparam_mutations={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": [32, 64, 128]
        })

    reporter = CLIReporter(
        parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
        metric_columns=["loss", "mean_accuracy", "training_iteration"])

    tuner = tune.Tuner(
        tune.with_resources(
            tune.with_parameters(
                train_mnist_tune_checkpoint,
                num_epochs=num_epochs,
                num_gpus=gpus_per_trial,
                data_dir=data_dir),
            resources={
                "cpu": 1,
                "gpu": gpus_per_trial
            }
        ),
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            scheduler=scheduler,
            num_samples=num_samples,
        ),
        run_config=air.RunConfig(
            # Use a separate experiment name so PBT results do not mix with ASHA's.
            name="tune_mnist_pbt",
            progress_reporter=reporter,
        ),
        param_space=config,
    )
    results = tuner.fit()

    print("Best hyperparameters found were: ", results.get_best_result().config)
```

Instead of passing search spaces for every entry of the `config` dict, we start
with fixed values for `lr` and `batch_size`, though we can still sample others, such as
the layer sizes. Additionally, we have to tell PBT how to perturb the hyperparameters
through `hyperparam_mutations`. Note that the layer sizes are not perturbed by PBT.
This is because we cannot simply change layer sizes during a training run, which is
what would happen in PBT.
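The perturbation step can be sketched in plain Python. This is a simplified illustration of the explore idea, not Ray's implementation: the scaling factors `0.8` and `1.2`, the clipping to the range, and the helper name `explore` are assumptions made for this example.

```python
import random


def explore(config: dict, mutations: dict, rng: random.Random) -> dict:
    """Perturb a copied config: scale continuous values, step through lists."""
    new_config = dict(config)
    for key, space in mutations.items():
        if isinstance(space, list):
            # Move to a neighbouring choice in the list.
            index = space.index(config[key])
            shift = rng.choice([-1, 1])
            new_config[key] = space[min(max(index + shift, 0), len(space) - 1)]
        else:
            # Scale by an assumed factor and clip to the allowed range.
            low, high = space
            scaled = config[key] * rng.choice([0.8, 1.2])
            new_config[key] = min(max(scaled, low), high)
    return new_config


rng = random.Random(0)
top_config = {"layer_1_size": 64, "layer_2_size": 128, "lr": 1e-3, "batch_size": 64}
mutations = {"lr": (1e-4, 1e-1), "batch_size": [32, 64, 128]}

new_config = explore(top_config, mutations, rng)
print(new_config)
```

Keys that do not appear in `mutations`, here the two layer sizes, are copied unchanged, so a model restored from a checkpoint keeps a compatible architecture.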

To test running both of our main scripts (`tune_mnist_asha` and `tune_mnist_pbt`), all you have to do is specify
a `data_dir` folder and run the scripts with reasonable parameters:

```python
# Create the shared data directory if it does not exist yet.
os.makedirs("data", exist_ok=True)
data_dir = "data/"

tune_mnist_asha(num_samples=16, num_epochs=6, gpus_per_trial=0, data_dir=data_dir)
```


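```{code-block} bash
Current time: 2023-04-15 07:02:33
Running for:  00:00:59.81
Memory:       10.9/16.0 GiB

+------------------------------+----------+-----------------+----------------+----------------+-------------+--------------+
| Trial name                   | status   | loc             |   layer_1_size |   layer_2_size |          lr |   batch_size |
|------------------------------+----------+-----------------+----------------+----------------+-------------+--------------|
| train_mnist_tune_977e2_00000 | RUNNING  | 127.0.0.1:57597 |            128 |            128 | 0.0644753   |           64 |
| train_mnist_tune_977e2_00001 | RUNNING  | 127.0.0.1:57609 |            128 |            128 | 0.00428978  |           32 |
| train_mnist_tune_977e2_00002 | RUNNING  | 127.0.0.1:57618 |             32 |            128 | 0.00812826  |           64 |
| train_mnist_tune_977e2_00003 | RUNNING  | 127.0.0.1:57628 |             32 |            256 | 0.000237569 |           64 |
| train_mnist_tune_977e2_00004 | RUNNING  | 127.0.0.1:57635 |             64 |            128 | 0.000176657 |           32 |
| train_mnist_tune_977e2_00005 | RUNNING  | 127.0.0.1:57646 |             64 |             64 | 0.0190303   |           32 |
| train_mnist_tune_977e2_00006 | RUNNING  | 127.0.0.1:57654 |             32 |            256 | 0.00155692  |           32 |
| train_mnist_tune_977e2_00007 | RUNNING  | 127.0.0.1:57661 |             32 |             64 | 0.00172033  |           32 |
| train_mnist_tune_977e2_00008 | PENDING  |                 |             64 |            128 | 0.00183517  |          128 |
| train_mnist_tune_977e2_00009 | PENDING  |                 |             64 |             64 | 0.0279331   |           64 |
+------------------------------+----------+-----------------+----------------+----------------+-------------+--------------+
```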


2023-04-15 07:01:31,743	INFO worker.py:1553 -- Started a local Ray instance.
[2m[36m(train_mnist_tune pid=57597)[0m   rank_zero_deprecation(
[2m[36m(train_mnist_tune pid=57597)[0m GPU available: False, used: False
[2m[36m(train_mnist_tune pid=57597)[0m TPU available: False, using: 0 TPU cores
[2m[36m(train_mnist_tune pid=57597)[0m IPU available: False, using: 0 IPUs
[2m[36m(train_mnist_tune pid=57597)[0m HPU available: False, using: 0 HPUs


[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]
  1%|▏         | 131072/9912422 [00:00<00:10, 915193.49it/s]
  4%|▍         | 393216/9912422 [00:00<00:05, 1725501.44it/s]
  9%|▊         | 851968/9912422 [00:00<00:03, 2891697.86it/s]
 30%|███       | 2981888/9912422 [00:00<00:01, 5207016.40it/s]
 37%|███▋      | 3670016/9912422 [00:00<00:01, 4819473.44it/s]
 47%|████▋     | 4620288/9912422 [00:01<00:00, 5877791.94it/s]
 54%|█████▍    | 5341184/9912422 [00:01<00:00, 5581754.83it/s]
 63%|██████▎   | 6258688/9912422 [00:01<00:00, 5662667.63it/s]
 74%|███████▍  | 7372800/9912422 [00:01<00:00, 6913254.33it/s]
 82%|████████▏ | 8159232/9912422 [00:01<00:00, 7079233.46it/s]
 90%|█████████ | 8945664/9912422 [00:01<00:00, 7044864.56it/s]
100%|██████████| 9912422/9912422 [00:01<00:00, 5611378.97it/s]


[2m[36m(train_mnist_tune pid=57597)[0m Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57597)[0m 
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s] 
100%|██████████| 28881/28881 [00:00<00:00, 737346.42it/s]


[2m[36m(train_mnist_tune pid=57597)[0m Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57597)[0m 
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]
  8%|▊         | 131072/1648877 [00:00<00:01, 931134.56it/s]
 24%|██▍       | 393216/1648877 [00:00<00:00, 1749457.93it/s]
 64%|██████▎   | 1048576/1648877 [00:00<00:00, 3706815.30it/s]
100%|██████████| 1648877/1648877 [00:00<00:00, 3522669.47it/s]


[2m[36m(train_mnist_tune pid=57597)[0m Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57597)[0m 
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
[2m[36m(train_mnist_tune pid=57597)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]m 
100%|██████████| 4542/4542 [00:00<00:00, 102099.43it/s]
[2m[36m(train_mnist_tune pid=57597)[0m 
[2m[36m(train_mnist_tune pid=57597)[0m   | Name    | Type   | Params
[2m[36m(train_mnist_tune pid=57597)[0m -----------------------------------
[2m[36m(train_mnist_tune pid=57597)[0m 0 | layer_1 | Linear | 100 K 
[2m[36m(train_mnist_tune pid=57597)[0m 1 | layer_2 | Linear | 16.5 K
[2m[36m(train_mnist_tune pid=57597)[0m 2 | layer_3 | Linear | 1.3 K 
[2m[36m(train_mnist_tune pid=57597)[0m -----------------------------------
[2m[36m(train_mnist_tune pid=57597)[0m 118 K     Trainable params
[2m[36m(train_mnist_tune pid=57597)[0m 0         Non-trainable params
[2m[36m(train_mnist_tune pid=57597)[0m 118 K     Total params
[2m[36m(train_mnist_tune pid=57597)[0m 0.473     Total estimated model params size (MB)


[2m[36m(train_mnist_tune pid=57597)[0m Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57597)[0m 


[2m[36m(train_mnist_tune pid=57597)[0m   rank_zero_warn(
[2m[36m(train_mnist_tune pid=57597)[0m   rank_zero_warn(


[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz


[2m[36m(train_mnist_tune pid=57609)[0m   rank_zero_deprecation(
[2m[36m(train_mnist_tune pid=57609)[0m GPU available: False, used: False
[2m[36m(train_mnist_tune pid=57609)[0m TPU available: False, using: 0 TPU cores
[2m[36m(train_mnist_tune pid=57609)[0m IPU available: False, using: 0 IPUs
[2m[36m(train_mnist_tune pid=57609)[0m HPU available: False, using: 0 HPUs


[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]
  1%|          | 98304/9912422 [00:00<00:14, 695577.64it/s]
  3%|▎         | 262144/9912422 [00:00<00:08, 1147967.77it/s]
  5%|▌         | 524288/9912422 [00:00<00:05, 1686490.68it/s]
  9%|▊         | 851968/9912422 [00:00<00:04, 2259684.58it/s]
 15%|█▍        | 1441792/9912422 [00:00<00:02, 3476519.37it/s]
 18%|█▊        | 1802240/9912422 [00:00<00:02, 2875819.93it/s]
 29%|██▉       | 2883584/9912422 [00:00<00:01, 5058356.04it/s]
 35%|███▌      | 3473408/9912422 [00:00<00:01, 4544648.09it/s]
 40%|████      | 3997696/9912422 [00:01<00:01, 3672083.44it/s]
 51%|█████     | 5079040/9912422 [00:01<00:00, 5248103.95it/s]
 58%|█████▊    | 5701632/9912422 [00:01<00:01, 2322631.98it/s]
 65%|██████▍   | 6422528/9912422 [00:02<00:01, 2932961.93it/s]
 70%|███████   | 6979584/9912422 [00:02<00:00, 3042612.04it/s]
 77%|███████▋  | 7634944/9912422 [00:02<00:00, 3619644.77it/s]
 83%|████████▎ | 8192000/9912422 [00:02<00:00, 3768203.97it/s]
 89%|████████▉ | 

[2m[36m(train_mnist_tune pid=57609)[0m Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw


100%|██████████| 9912422/9912422 [00:02<00:00, 3636436.98it/s]


[2m[36m(train_mnist_tune pid=57609)[0m 
[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
[2m[36m(train_mnist_tune pid=57609)[0m Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57609)[0m 
[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 658765.05it/s]


[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]
  6%|▌         | 98304/1648877 [00:00<00:01, 876783.02it/s]
 42%|████▏     | 688128/1648877 [00:00<00:00, 2448610.03it/s]
 83%|████████▎ | 1376256/1648877 [00:00<00:00, 4058955.58it/s]


[2m[36m(train_mnist_tune pid=57609)[0m Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
[2m[36m(train_mnist_tune pid=57609)[0m 
[2m[36m(train_mnist_tune pid=57609)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz


(train_mnist_tune pid=57609) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57609)   | Name    | Type   | Params
(train_mnist_tune pid=57609) -----------------------------------
(train_mnist_tune pid=57609) 0 | layer_1 | Linear | 100 K
(train_mnist_tune pid=57609) 1 | layer_2 | Linear | 16.5 K
(train_mnist_tune pid=57609) 2 | layer_3 | Linear | 1.3 K
(train_mnist_tune pid=57609) -----------------------------------
(train_mnist_tune pid=57609) 118 K     Trainable params
(train_mnist_tune pid=57609) 0         Non-trainable params
(train_mnist_tune pid=57609) 118 K     Total params
(train_mnist_tune pid=57609) 0.473     Total estimated model params size (MB)
(train_mnist_tune pid=57609) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57609)   rank_zero_warn(
(train_mnist_tune pid=57609)   rank_zero_warn(
(train_mnist_tune pid=57618)   rank_zero_deprecation(
(train_mnist_tune pid=57618) GPU available: False, used: False
(train_mnist_tune pid=57618) TPU available: False, using: 0 TPU cores
(train_mnist_tune pid=57618) IPU available: False, using: 0 IPUs
(train_mnist_tune pid=57618) HPU available: False, using: 0 HPUs
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57618) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57618) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57618) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57618) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57618) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57618)   | Name    | Type   | Params
(train_mnist_tune pid=57618) -----------------------------------
(train_mnist_tune pid=57618) 0 | layer_1 | Linear | 25.1 K
(train_mnist_tune pid=57618) 1 | layer_2 | Linear | 4.2 K
(train_mnist_tune pid=57618) 2 | layer_3 | Linear | 1.3 K
(train_mnist_tune pid=57618) -----------------------------------
(train_mnist_tune pid=57618) 30.6 K    Trainable params
(train_mnist_tune pid=57618) 0         Non-trainable params
(train_mnist_tune pid=57618) 30.6 K    Total params
(train_mnist_tune pid=57618) 0.123     Total estimated model params size (MB)
(train_mnist_tune pid=57618)   rank_zero_warn(
(train_mnist_tune pid=57618)   rank_zero_warn(
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57628) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628)   | Name    | Type   | Params
(train_mnist_tune pid=57628) -----------------------------------
(train_mnist_tune pid=57628) 0 | layer_1 | Linear | 25.1 K
(train_mnist_tune pid=57628) 1 | layer_2 | Linear | 8.4 K
(train_mnist_tune pid=57628) 2 | layer_3 | Linear | 2.6 K
(train_mnist_tune pid=57628) -----------------------------------
(train_mnist_tune pid=57628) 36.1 K    Trainable params
(train_mnist_tune pid=57628) 0         Non-trainable params
(train_mnist_tune pid=57628) 36.1 K    Total params
(train_mnist_tune pid=57628) 0.145     Total estimated model params size (MB)
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628)   rank_zero_warn(
(train_mnist_tune pid=57628)   rank_zero_warn(
(train_mnist_tune pid=57635)   rank_zero_deprecation(
(train_mnist_tune pid=57635) GPU available: False, used: False
(train_mnist_tune pid=57635) TPU available: False, using: 0 TPU cores
(train_mnist_tune pid=57635) IPU available: False, using: 0 IPUs
(train_mnist_tune pid=57635) HPU available: False, using: 0 HPUs
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57635) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635)   | Name    | Type   | Params
(train_mnist_tune pid=57635) -----------------------------------
(train_mnist_tune pid=57635) 0 | layer_1 | Linear | 50.2 K
(train_mnist_tune pid=57635) 1 | layer_2 | Linear | 8.3 K
(train_mnist_tune pid=57635) 2 | layer_3 | Linear | 1.3 K
(train_mnist_tune pid=57635) -----------------------------------
(train_mnist_tune pid=57635) 59.9 K    Trainable params
(train_mnist_tune pid=57635) 0         Non-trainable params
(train_mnist_tune pid=57635) 59.9 K    Total params
(train_mnist_tune pid=57635) 0.239     Total estimated model params size (MB)
(train_mnist_tune pid=57635)   rank_zero_warn(
(train_mnist_tune pid=57635)   rank_zero_warn(
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57646) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57646) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57646) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57646) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57646) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57646)   | Name    | Type   | Params
(train_mnist_tune pid=57646) -----------------------------------
(train_mnist_tune pid=57646) 0 | layer_1 | Linear | 50.2 K
(train_mnist_tune pid=57646) 1 | layer_2 | Linear | 4.2 K
(train_mnist_tune pid=57646) 2 | layer_3 | Linear | 650
(train_mnist_tune pid=57646) -----------------------------------
(train_mnist_tune pid=57646) 55.1 K    Trainable params
(train_mnist_tune pid=57646) 0         Non-trainable params
(train_mnist_tune pid=57646) 55.1 K    Total params
(train_mnist_tune pid=57646) 0.220     Total estimated model params size (MB)
(train_mnist_tune pid=57646)   rank_zero_warn(
(train_mnist_tune pid=57646)   rank_zero_warn(
(train_mnist_tune pid=57654)   rank_zero_deprecation(
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57654) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57654) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57654) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57654) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57654)   | Name    | Type   | Params
(train_mnist_tune pid=57654) -----------------------------------
(train_mnist_tune pid=57654) 0 | layer_1 | Linear | 25.1 K
(train_mnist_tune pid=57654) 1 | layer_2 | Linear | 8.4 K
(train_mnist_tune pid=57654) 2 | layer_3 | Linear | 2.6 K
(train_mnist_tune pid=57654) -----------------------------------
(train_mnist_tune pid=57654) 36.1 K    Trainable params
(train_mnist_tune pid=57654) 0         Non-trainable params
(train_mnist_tune pid=57654) 36.1 K    Total params
(train_mnist_tune pid=57654) 0.145     Total estimated model params size (MB)
(train_mnist_tune pid=57654) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57654)   rank_zero_warn(
(train_mnist_tune pid=57654)   rank_zero_warn(


| Trial name | date | done | episodes_total | experiment_id | hostname | iterations_since_restore | loss | mean_accuracy | node_ip | pid | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| train_mnist_tune_977e2_00001 | 2023-04-15_07-02-22 | False | | ee690cfea05d4fb18ab2e2505453dac7 | 20-MacBook-Pro.local | 1 | 0.21598 | 0.93551 | 127.0.0.1 | 57609 | 35.8776 | 35.8776 | 35.8776 | 1681534942 | 0 | | 1 | 977e2_00001 | 0.00870395 |


[2m[36m(train_mnist_tune pid=57628)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
[2m[36m(train_mnist_tune pid=57628)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57628) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57628)   | Name    | Type   | Params
(train_mnist_tune pid=57628) -----------------------------------
(train_mnist_tune pid=57628) 0 | layer_1 | Linear | 50.2 K
(train_mnist_tune pid=57628) 1 | layer_2 | Linear | 8.3 K
(train_mnist_tune pid=57628) 2 | layer_3 | Linear | 1.3 K
(train_mnist_tune pid=57628) -----------------------------------
(train_mnist_tune pid=57628) 59.9 K    Trainable params
(train_mnist_tune pid=57628) 0         Non-trainable params
(train_mnist_tune pid=57628) 59.9 K    Total params
(train_mnist_tune pid=57628) 0.239     Total estimated model params size (MB)
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57628) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57635) GPU available: False, used: False
(train_mnist_tune pid=57635) TPU available: False, using: 0 TPU cores
(train_mnist_tune pid=57635) IPU available: False, using: 0 IPUs
(train_mnist_tune pid=57635) HPU available: False, using: 0 HPUs
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57597) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


(train_mnist_tune pid=57597) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57597) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57597)   | Name    | Type   | Params
(train_mnist_tune pid=57597) -----------------------------------
(train_mnist_tune pid=57597) 0 | layer_1 | Linear | 50.2 K
(train_mnist_tune pid=57597) 1 | layer_2 | Linear | 4.2 K
(train_mnist_tune pid=57597) 2 | layer_3 | Linear | 650
(train_mnist_tune pid=57597) -----------------------------------
(train_mnist_tune pid=57597) 55.1 K    Trainable params
(train_mnist_tune pid=57597) 0         Non-trainable params
(train_mnist_tune pid=57597) 55.1 K    Total params
(train_mnist_tune pid=57597) 0.220     Total estimated model params size (MB)
(train_mnist_tune pid=57597) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


(train_mnist_tune pid=57635) Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
(train_mnist_tune pid=57635) Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
(train_mnist_tune pid=57661) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
(train_mnist_tune pid=57635)   | Name    | Type   | Params
(train_mnist_tune pid=57635) -----------------------------------
(train_mnist_tune pid=57635) 0 | layer_1 | Linear | 50.2 K
(train_mnist_tune pid=57635) 1 | layer_2 | Linear | 4.2 K
(train_mnist_tune pid=57635) 2 | layer_3 | Linear | 650
(train_mnist_tune pid=57635) -----------------------------------
(train_mnist_tune pid=57635) 55.1 K    Trainable params
(train_mnist_tune pid=57635) 0         Non-trainable params
(train_mnist_tune pid=57635) 55.1 K    Total params
(train_mnist_tune pid=57635) 0.220     Total estimated model params size (MB)
(train_mnist_tune pid=57661) Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz




2023-04-15 07:06:29,866	INFO tune.py:798 -- Total run time: 296.48 seconds (296.26 seconds for the tuning loop).


Best hyperparameters found were:  {'layer_1_size': 64, 'layer_2_size': 128, 'lr': 0.0018351694463587221, 'batch_size': 128}
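
For a sense of scale, the size of the network that this configuration describes can be worked out by hand. The sketch below assumes the classifier defined earlier consists of three fully connected layers (`layer_1`, `layer_2`, `layer_3`) that map 28×28 input pixels to 10 classes. `linear_params` and `mlp_params` are hypothetical helper names introduced only for this illustration.

```python
# Hypothetical helpers for illustration; not part of Ray Tune or PyTorch Lightning.


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_1_size: int, layer_2_size: int,
               n_inputs: int = 28 * 28, n_classes: int = 10) -> int:
    """Total trainable parameters of an assumed three-layer MLP."""
    return (
        linear_params(n_inputs, layer_1_size)
        + linear_params(layer_1_size, layer_2_size)
        + linear_params(layer_2_size, n_classes)
    )


# Configuration reported as best by the run above.
best_config = {
    "layer_1_size": 64,
    "layer_2_size": 128,
    "lr": 0.0018351694463587221,
    "batch_size": 128,
}

total = mlp_params(best_config["layer_1_size"], best_config["layer_2_size"])
print(f"Trainable parameters: {total}")
```

Only the two layer sizes enter this count; the learning rate and batch size change how the model is trained, not how large it is.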




In [None]:
tune_mnist_pbt(num_samples=10, num_epochs=6, gpus_per_trial=0, data_dir=data_dir)

If you have more resources available (e.g. a GPU), you can adjust the arguments above accordingly, for example by setting `gpus_per_trial` to a non-zero value.

An example output of a run could look like this:

```bash
+-----------------------------------------+------------+-------+----------------+----------------+-----------+--------------+-----------+-----------------+----------------------+
| Trial name                              | status     | loc   |   layer_1_size |   layer_2_size |        lr |   batch_size |      loss |   mean_accuracy |   training_iteration |
|-----------------------------------------+------------+-------+----------------+----------------+-----------+--------------+-----------+-----------------+----------------------|
| train_mnist_tune_checkpoint_85489_00000 | TERMINATED |       |            128 |            128 | 0.001     |           64 | 0.108734  |        0.973101 |                   10 |
| train_mnist_tune_checkpoint_85489_00001 | TERMINATED |       |            128 |            128 | 0.001     |           64 | 0.093577  |        0.978639 |                   10 |
| train_mnist_tune_checkpoint_85489_00002 | TERMINATED |       |            128 |            256 | 0.0008    |           32 | 0.0922348 |        0.979299 |                   10 |
| train_mnist_tune_checkpoint_85489_00003 | TERMINATED |       |             64 |            256 | 0.001     |           64 | 0.124648  |        0.973892 |                   10 |
| train_mnist_tune_checkpoint_85489_00004 | TERMINATED |       |            128 |             64 | 0.001     |           64 | 0.101717  |        0.975079 |                   10 |
| train_mnist_tune_checkpoint_85489_00005 | TERMINATED |       |             64 |             64 | 0.001     |           64 | 0.121467  |        0.969146 |                   10 |
| train_mnist_tune_checkpoint_85489_00006 | TERMINATED |       |            128 |            256 | 0.00064   |           32 | 0.053446  |        0.987062 |                   10 |
| train_mnist_tune_checkpoint_85489_00007 | TERMINATED |       |            128 |            256 | 0.001     |           64 | 0.129804  |        0.973497 |                   10 |
| train_mnist_tune_checkpoint_85489_00008 | TERMINATED |       |             64 |            256 | 0.0285125 |          128 | 0.363236  |        0.913867 |                   10 |
| train_mnist_tune_checkpoint_85489_00009 | TERMINATED |       |             32 |            256 | 0.001     |           64 | 0.150946  |        0.964201 |                   10 |
+-----------------------------------------+------------+-------+----------------+----------------+-----------+--------------+-----------+-----------------+----------------------+
```

As you can see, every trial ran for the full 10 iterations.
All trials ended with good parameter combinations, and all but one reached a mean accuracy above 0.96.
In some trials the parameters were perturbed along the way: the learning rate and batch size of trials `00002`, `00006`, and `00008`
differ from the values shared by the other trials. The best configuration reached a mean validation accuracy of `0.987062`.
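
The observations above can be checked directly against the table. The following is a minimal sketch in plain Python, not Ray Tune's own analysis API: the rows are copied from the example output, the starting values `lr=0.001` and `batch_size=64` are an assumption inferred from the rows that share them, and the factor `0.8` in `scaling_steps` is an assumption chosen because it matches the learning rates in the table. All helper names are hypothetical.

```python
import math

# (trial suffix, lr, batch_size, mean_accuracy), copied from the example table above.
rows = [
    ("00000", 0.001, 64, 0.973101),
    ("00001", 0.001, 64, 0.978639),
    ("00002", 0.0008, 32, 0.979299),
    ("00003", 0.001, 64, 0.973892),
    ("00004", 0.001, 64, 0.975079),
    ("00005", 0.001, 64, 0.969146),
    ("00006", 0.00064, 32, 0.987062),
    ("00007", 0.001, 64, 0.973497),
    ("00008", 0.0285125, 128, 0.913867),
    ("00009", 0.001, 64, 0.964201),
]

# Assumed starting values, inferred from the rows that all share them.
START_LR, START_BATCH_SIZE = 0.001, 64


def was_perturbed(lr, batch_size):
    """True if a trial's final lr or batch size differs from the assumed start."""
    return not math.isclose(lr, START_LR) or batch_size != START_BATCH_SIZE


def scaling_steps(lr, factor=0.8, max_steps=5):
    """Return n if lr is close to START_LR * factor**n for some n <= max_steps, else None."""
    for n in range(max_steps + 1):
        if math.isclose(lr, START_LR * factor ** n):
            return n
    return None


perturbed = [name for name, lr, bs, _ in rows if was_perturbed(lr, bs)]
best = max(rows, key=lambda row: row[3])  # highest mean_accuracy

print("Perturbed trials:", perturbed)
print("Best trial:", best[0], "with mean_accuracy", best[3])
for name, lr, _, _ in rows:
    if name in perturbed:
        print(name, "multiplications by 0.8:", scaling_steps(lr))
```

With the table's values, the learning rates of trials `00002` and `00006` match one and two multiplications by 0.8, while the learning rate of `00008` matches neither and was presumably resampled.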

In summary, PyTorch Lightning modules are easy to use with Tune. All it took was
importing one or two callbacks and writing a small wrapper function to find well-performing
parameter configurations.

## More PyTorch Lightning Examples

- {doc}`/tune/examples/includes/mnist_ptl_mini`:
  A minimal example of using [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
  to train an MNIST model. This example utilizes the Ray Tune-provided
  {ref}`PyTorch Lightning callbacks <tune-integration-pytorch-lightning>`.
  See also {ref}`this tutorial for a full walkthrough <tune-pytorch-lightning-ref>`.
- {ref}`A walkthrough tutorial for using Ray Tune with PyTorch Lightning <tune-pytorch-lightning-ref>`.
- {doc}`/tune/examples/includes/mlflow_ptl_example`: Example for using [MLflow](https://github.com/mlflow/mlflow/)
  and [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) with Ray Tune.