Commit 19aacb2

Merge branch 'master' into automated

jhaux committed Dec 13, 2019
2 parents 39a3c68 + 1feffe6

Showing 28 changed files with 423 additions and 403 deletions.
17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -6,8 +6,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
- Git integration adds all .py and .yaml files, not just tracked ones.
- Support for validation batches in train mode. The MinimalLoggingHook used in TemplateIterator logs them automatically under `root/train/validation`.
- `-d/--debug` flag to enable post-mortem debugging. Uses `pudb` if available, otherwise `pdb`.
- Logging of the command line used to start, the logging root, the git tag if applicable, and the hostname.
- Classes with fixed splits for the included datasets.
- Added `edexplore` for dataset exploration with streamlit: `edexplore -b <config.yaml>`
- Added Late Loading. You can now return functions in your examples, which will only be evaluated at the end of your data processing pipeline, allowing you to stack many filter operations on top of each other (see the sketch after this list).
- Added MetaView Dataset, which allows storing views on a base dataset without recalculating the labels every time.
- `TFBaseEvaluator` now parses the config file for the `fcond` flag to filter checkpoints, e.g. `edflow -e xxx --fcond "lambda c: any([str(n) in c for n in [240000, 320000]])"` will only evaluate checkpoints 240k and 320k.
- Added MetaDataset for easy data loading.
@@ -25,6 +30,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- CHANGELOG.md to document notable changes.
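
To make the Late Loading entry above concrete, here is a minimal, self-contained sketch of the idea; the class and helper names are illustrative, not edflow's actual API:

```python
class LateLoadingExamples:
    """Each example stores a zero-argument callable instead of the value."""

    def __init__(self, paths):
        self.paths = paths

    def __getitem__(self, idx):
        path = self.paths[idx]
        # Returning a callable defers the expensive read, so many filter
        # operations can be stacked without ever touching pixel data.
        return {"image": lambda: load_image(path)}


def load_image(path):
    # Stand-in loader; a real pipeline would decode an image file here.
    with open(path, "rb") as f:
        return f.read()


def materialize(example):
    # At the end of the pipeline, callables are evaluated to actual values.
    return {k: v() if callable(v) else v for k, v in example.items()}
```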

### Changed
- Saved config files use `-` instead of `:` in the filename to be consistent.
- No more `-e/--evaluation <config>` and `-t/--train <config>` options. Specify all configs under `-b/--base <config1> <config2>`. The default is evaluation mode; specify `-t/--train` for training mode.
- Specifying the model in the config is optional.
- The code root is determined by the import path of the iterator, not the model.
- When setting the `DatasetMixin` attribute `append_labels = True`, the labels are not added to the example directly but under the key `labels_`.
- Changed the tiling background color to white.
- Changed the interface of `edflow.data.dataset.RandomlyJoinedDataset` to improve it.
@@ -36,4 +45,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- `make_var` was broken for period variables because its subcommands lacked `**kwargs` in their definitions. This is fixed now.

### Removed
- It is no longer possible to start training and (multiple) evaluations at the same time. This simplifies a lot and was almost never used.
- Single '-' is no longer accepted for command-line specification of config parameters. Use '--'.
- It is no longer possible to pass callbacks as a list via the config.

### Fixed
- Image outputs in `template_pytorch` example.
- Negative numbers as values for keyword arguments are now properly parsed.
48 changes: 23 additions & 25 deletions README.md
@@ -140,13 +140,14 @@ n_classes: 10
```

#### Train
To start training, specify configuration files with the `-b/--base <config>`
command-line option, use the `-t/--train` flag to enable training mode and,
optionally, the `-n/--name <name>` option to more easily find your experiments
later on:


```
$ edflow -b template_tfe/config.yaml -t -n hello_tfe
[INFO] [train]: Starting Training.
[INFO] [train]: Instantiating dataset.
[INFO] [FashionMNIST]: Using split: train
@@ -194,7 +195,7 @@ Use `CTRL-C` to interrupt the training:
To resume training, run


edflow -b template_tfe/config.yaml -t -p logs/2019-08-05T18:55:20_hello_tfe/


It will load the last checkpoint in the project folder and continue training
@@ -203,23 +204,23 @@ This lets you easily adjust parameters without having to start training from
scratch, e.g.


edflow -b template_tfe/config.yaml -t -p logs/2019-08-05T18:55:20_hello_tfe/ --batch_size 32


will continue with an increased batch size. Instead of loading the latest
checkpoint, you can load a specific checkpoint by adding `-c <path to
checkpoint>`:


edflow -b template_tfe/config.yaml -t -p logs/2019-08-05T18:55:20_hello_tfe/ -c logs/2019-08-05T18:55:20_hello_tfe/train/checkpoints/model-1207.ckpt


#### Evaluate
Evaluation mode will write all outputs of `eval_op` to disk and prepare them
for consumption by your evaluation functions. Just remove the training flag `-t`:


edflow -b template_tfe/config.yaml -p logs/2019-08-05T18:55:20_hello_tfe/ -c logs/2019-08-05T18:55:20_hello_tfe/train/checkpoints/model-1207.ckpt


If `-c` is not specified, it will evaluate the latest checkpoint. The
@@ -251,7 +252,7 @@ def acc_callback(root, data_in, data_out, config):
# data_out contains all the keys that were specified in the eval_op
outputs = data_out[i]["outputs"]
# labels are also available on each example
loss = data_out[i]["labels_"]["loss"]

prediction = np.argmax(outputs, axis=0)
correct += labels == prediction
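
For orientation, a complete callback along the lines of the fragment above could look as follows. This is a sketch: the keys `class`, `outputs` and `labels_` follow the fragment, while the aggregation and the returned dict are illustrative assumptions:

```python
import numpy as np


def acc_callback(root, data_in, data_out, config):
    # data_in is the dataset the model was evaluated on; data_out holds the
    # eval_op outputs that evaluation mode wrote to disk.
    correct = 0
    losses = []
    for i in range(len(data_in)):
        labels = data_in[i]["class"]
        # data_out contains all the keys that were specified in the eval_op
        outputs = data_out[i]["outputs"]
        # per-example labels such as the loss live behind the "labels_" key
        losses.append(data_out[i]["labels_"]["loss"])
        prediction = np.argmax(outputs, axis=0)
        correct += int(labels == prediction)
    # Returning a summary dict is an assumption for illustration.
    return {"accuracy": correct / len(data_in), "mean_loss": float(np.mean(losses))}
```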
@@ -279,7 +280,6 @@ be found for PyTorch in `template_pytorch/edflow.py` and requires only slightly
different syntax:

```python
import torch
import torch.nn as nn
import torch.optim as optim
@@ -332,7 +332,7 @@ class Iterator(TemplateIterator):
# get inputs
inputs, labels = kwargs["image"], kwargs["class"]
inputs = torch.tensor(inputs)
inputs = inputs.permute(0, 3, 1, 2)
labels = torch.tensor(labels, dtype=torch.long)

# compute loss
@@ -352,7 +352,9 @@
min_loss = np.min(loss.detach().numpy())
max_loss = np.max(loss.detach().numpy())
return {
"images": {"inputs": inputs.detach().permute(0, 2, 3, 1).numpy()},
"scalars": {
"min_loss": min_loss,
"max_loss": max_loss,
@@ -374,7 +376,7 @@ You can experiment with it in the exact same way as [above](#TensorFlow-Eager).
For example, to [start training](#Train) run:


edflow -b template_pytorch/config.yaml -t -n hello_pytorch
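
A side note on the `permute` calls in the PyTorch template above: batches apparently arrive as NHWC arrays while `torch.nn` convolutions expect NCHW, hence the conversion on the way in and the inverse one for image logging. A quick standalone check of that round trip:

```python
import torch

x = torch.randn(8, 28, 28, 1)       # NHWC batch; the shapes are illustrative
nchw = x.permute(0, 3, 1, 2)        # -> (8, 1, 28, 28), what the model wants
back = nchw.permute(0, 2, 3, 1)     # -> (8, 28, 28, 1), back for image logging
assert torch.equal(x, back)         # the round trip is lossless
```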


See also [interrupt and resume](#interrupt-and-resume) and
@@ -386,7 +388,7 @@ See also [interrupt and resume](#interrupt-and-resume) and
edflow also supports graph-based execution, e.g.

cd examples
edflow -b mnist_tf/train.yaml -t -n hello_tensorflow

With TensorFlow 2.x going eager by default and TensorFlow 1.x supporting eager
execution, support for TensorFlow's 1.x graph
@@ -405,30 +407,26 @@ For more information, look into our [documentation](https://edflow.readthedocs.io/).
```
$ edflow --help
usage: edflow [-h] [-n description]
              [-b [base_config.yaml [base_config.yaml ...]]] [-t] [-p PROJECT]
              [-c CHECKPOINT] [-r] [-log LEVEL] [-d]

optional arguments:
  -h, --help            show this help message and exit
  -n description, --name description
                        postfix of log directory.
  -b [base_config.yaml [base_config.yaml ...]], --base [base_config.yaml [base_config.yaml ...]]
                        paths to base configs. Loaded from left-to-right.
                        Parameters can be overwritten or added with command-
                        line options of the form `--key value`.
  -t, --train           run in training mode
  -p PROJECT, --project PROJECT
                        path to existing project
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        path to existing checkpoint
  -r, --retrain         reset global step
  -log LEVEL, --log-level LEVEL
                        set the std-out logging level.
  -d, --debug           enable post-mortem debugging
```
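
Since `-b` configs are loaded left-to-right and `--key value` options override them, the effective config is a layered merge. A sketch of that semantics in plain Python (the helper is illustrative, not edflow's actual implementation, and a shallow merge is assumed):

```python
import yaml


def merge_configs(paths, cli_overrides):
    """Later config files and CLI options override earlier values."""
    config = {}
    for path in paths:  # e.g. the files passed via -b base.yaml exp.yaml
        with open(path) as f:
            config.update(yaml.safe_load(f) or {})
    config.update(cli_overrides)  # --key value pairs win last
    return config


# merge_configs(["base.yaml", "experiment.yaml"], {"batch_size": 32})
```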


2 changes: 1 addition & 1 deletion edflow/config/commandline_kwargs.py
@@ -23,7 +23,7 @@ def parse_unknown_args(unknown):
    kwargs = {}
    for i in range(len(unknown)):
        key = unknown[i]
        if key[:2] == "--":
            # Make sure that keys are only passed once
            if key in kwargs:
                raise ValueError("Double Argument: {} is passed twice".format(key))
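
Requiring the `--` prefix is what makes values like `-5` unambiguous (see the "negative numbers ... properly parsed" entry in the changelog). A standalone sketch of such a key/value loop, illustrative rather than the module's full code:

```python
def parse_cli_kwargs(unknown):
    """Turn ["--lr", "0.1", "--step", "-5"] into {"lr": "0.1", "step": "-5"}."""
    kwargs = {}
    i = 0
    while i < len(unknown):
        key = unknown[i]
        if not key.startswith("--"):
            raise ValueError("Options must be passed as '--key value': {}".format(key))
        if i + 1 >= len(unknown):
            raise ValueError("Missing value for {}".format(key))
        name = key[2:]
        if name in kwargs:
            raise ValueError("Double Argument: {} is passed twice".format(key))
        # Only '--' marks an option, so a negative number like -5 is
        # unambiguously a value, never mistaken for an option name.
        kwargs[name] = unknown[i + 1]
        i += 2
    return kwargs
```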
1 change: 0 additions & 1 deletion edflow/data/dataset.py
@@ -33,7 +33,6 @@
# even do that much and it would provide better documentation as this is
# actually our base class for datasets

- from edflow.main import traceable_method, get_implementations_from_config
from edflow.util import PRNGMixin


25 changes: 22 additions & 3 deletions edflow/datasets/celeba.py
@@ -75,14 +75,18 @@ def _prepare(self):
pickle.dump(data, f)
edu.mark_prepared(self.root)

    def _get_split(self):
        split = (
            "test" if self.config.get("test_mode", False) else "train"
        )  # default split
        if self.NAME in self.config:
            split = self.config[self.NAME].get("split", split)
        return split

    def _load(self):
        with open(self._data_path, "rb") as f:
            self._data = pickle.load(f)
        split = self._get_split()
        assert split in ["train", "test", "val"]
        self.logger.info("Using split: {}".format(split))
        if split == "train":
@@ -122,6 +126,21 @@ def __len__(self):
        return self._length


class CelebATrain(CelebA):
    def _get_split(self):
        return "train"


class CelebAVal(CelebA):
    def _get_split(self):
        return "val"


class CelebATest(CelebA):
    def _get_split(self):
        return "test"

if __name__ == "__main__":
    from edflow.util import pp2mkdtable

20 changes: 17 additions & 3 deletions edflow/datasets/cifar.py
@@ -83,14 +83,18 @@ def _prepare(self):
pickle.dump(data, f)
edu.mark_prepared(self.root)

    def _get_split(self):
        split = (
            "test" if self.config.get("test_mode", False) else "train"
        )  # default split
        if self.NAME in self.config:
            split = self.config[self.NAME].get("split", split)
        return split

    def _load(self):
        with open(self._data_path, "rb") as f:
            self._data = pickle.load(f)
        split = self._get_split()
        assert split in ["train", "test"]
        self.logger.info("Using split: {}".format(split))
        self.labels = {
@@ -119,6 +123,16 @@ def __len__(self):
        return self._length


class CIFAR10Train(CIFAR10):
    def _get_split(self):
        return "train"


class CIFAR10Test(CIFAR10):
    def _get_split(self):
        return "test"


if __name__ == "__main__":
    from edflow.util import pp2mkdtable

20 changes: 17 additions & 3 deletions edflow/datasets/fashionmnist.py
@@ -51,14 +51,18 @@ def _prepare(self):
pickle.dump(data, f)
edu.mark_prepared(self.root)

    def _get_split(self):
        split = (
            "test" if self.config.get("test_mode", False) else "train"
        )  # default split
        if self.NAME in self.config:
            split = self.config[self.NAME].get("split", split)
        return split

    def _load(self):
        with open(self._data_path, "rb") as f:
            self._data = pickle.load(f)
        split = self._get_split()
        assert split in ["train", "test"]
        self.logger.info("Using split: {}".format(split))
        if split == "test":
@@ -89,6 +93,16 @@ def __len__(self):
        return self._length


class FashionMNISTTrain(FashionMNIST):
    def _get_split(self):
        return "train"


class FashionMNISTTest(FashionMNIST):
    def _get_split(self):
        return "test"


if __name__ == "__main__":
    print("train")
    d = FashionMNIST()
20 changes: 17 additions & 3 deletions edflow/datasets/mnist.py
@@ -52,14 +52,18 @@ def _prepare(self):
pickle.dump(data, f)
edu.mark_prepared(self.root)

    def _get_split(self):
        split = (
            "test" if self.config.get("test_mode", False) else "train"
        )  # default split
        if self.NAME in self.config:
            split = self.config[self.NAME].get("split", split)
        return split

    def _load(self):
        with open(self._data_path, "rb") as f:
            self._data = pickle.load(f)
        split = self._get_split()
        assert split in ["train", "test"]
        self.logger.info("Using split: {}".format(split))
        if split == "test":
@@ -90,6 +94,16 @@ def __len__(self):
        return self._length


class MNISTTrain(MNIST):
    def _get_split(self):
        return "train"


class MNISTTest(MNIST):
    def _get_split(self):
        return "test"


if __name__ == "__main__":
    print("train")
    d = MNIST()
