This repository was archived by the owner on Jul 18, 2024. It is now read-only.
159 changes: 127 additions & 32 deletions README.md
@@ -17,22 +17,23 @@ PyTorchPipe (PTP) is a component-oriented framework that facilitates development
PTP frames training and testing procedures as _pipelines_ consisting of many components communicating through data streams.
Each such pipeline can consist of several components, including one task instance (providing batches of data), any number of trainable components (models) and additional components providing the required transformations and computations.


![Alt text](docs/source/img/data_flow_vqa_5_attention_gpu_loaders.png?raw=true "Exemplary multi-modal data flow diagram")


As a result, the training and testing procedures are no longer pinned to a specific task or model, while built-in mechanisms for compatibility checking (handshaking), configuration and global variables management, and statistics collection facilitate rapid development of complex pipelines and running diverse experiments.
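To make the idea of handshaking more concrete, here is a minimal sketch in plain Python. It assumes that each component declares the dimensions of the streams it produces and consumes, with `-1` marking a free dimension such as the batch size. `StreamDefinition`, `dims_compatible` and `handshake` are hypothetical names used for illustration only, not PTP's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StreamDefinition:
    """Hypothetical stream description: expected dimensions, with -1 meaning 'any size'."""
    dims: Tuple[int, ...]


def dims_compatible(produced: Tuple[int, ...], expected: Tuple[int, ...]) -> bool:
    """Shapes match if they have the same rank and every fixed (non -1) size agrees."""
    if len(produced) != len(expected):
        return False
    return all(p == e or -1 in (p, e) for p, e in zip(produced, expected))


def handshake(outputs: Dict[str, StreamDefinition],
              inputs: Dict[str, StreamDefinition]) -> List[str]:
    """Check a consumer's input streams against a producer's outputs; [] means compatible."""
    errors = []
    for name, expected in inputs.items():
        if name not in outputs:
            errors.append(f"stream '{name}' is not produced by any preceding component")
        elif not dims_compatible(outputs[name].dims, expected.dims):
            errors.append(f"stream '{name}': produced {outputs[name].dims}, expected {expected.dims}")
    return errors


# Usage: an encoder producing 4D feature maps, checked against two possible consumers.
encoder_outputs = {"feature_maps": StreamDefinition((-1, 16, 1, 1))}
reshaper_inputs = {"feature_maps": StreamDefinition((-1, 16, 1, 1))}
wrong_inputs = {"feature_maps": StreamDefinition((-1, 16))}  # rank mismatch
print(handshake(encoder_outputs, reshaper_inputs))  # []
print(handshake(encoder_outputs, wrong_inputs))     # one error message
```

Checking all connections before the first batch is processed means that a mis-wired pipeline fails immediately with a readable message, instead of crashing somewhere inside a forward pass.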

At its core, to _accelerate the computations_, PTP relies on PyTorch and extensively uses its mechanisms for distributing computations over CPUs/GPUs, including multi-process data loaders and multi-GPU data parallelism.
The models are _agnostic_ to those operations: one indicates whether to use them in the configuration files (data loaders) or by passing the adequate argument (--gpu) at run-time.

**Datasets:**
PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Tasks_ (grouped into three domains in the figure below):

* CLEVR, GQA, ImageCLEF VQA-Med 2019 (Visual Question Answering)
* MNIST, CIFAR-100 (Image Classification)
* WiLY (Language Identification)
* WikiText-2 / WikiText-103 (Language Modelling)
* ANKI (Machine Translation)

![Alt text](docs/source/img/components/ptp_tasks.png?raw=true)

Aside from providing batches of samples, the Task class automatically downloads the files associated with a given dataset (as long as the dataset is publicly available).
The diversity of those tasks (and the associated models) demonstrates the flexibility of the framework.
We are constantly working on incorporating new Tasks into PTP.
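The automatic download boils down to a "fetch only what is missing" check. A minimal sketch, assuming the dataset is a list of files in a local data directory; `ensure_files` and `fake_fetch` are hypothetical helpers, and a real `fetch` would wrap something like `urllib.request.urlretrieve` for a publicly available dataset.

```python
import os
import tempfile
from typing import Callable, List


def ensure_files(data_dir: str, filenames: List[str],
                 fetch: Callable[[str, str], None]) -> List[str]:
    """Fetch only the files missing from data_dir and return their names.

    `fetch(filename, destination_path)` is a caller-supplied, hypothetical download
    function, e.g. a thin wrapper around urllib.request.urlretrieve.
    """
    os.makedirs(data_dir, exist_ok=True)
    fetched = []
    for name in filenames:
        path = os.path.join(data_dir, name)
        if not os.path.exists(path):
            fetch(name, path)
            fetched.append(name)
    return fetched


def fake_fetch(name: str, path: str) -> None:
    """Stand-in for a real download: just creates an empty file."""
    open(path, "wb").close()


with tempfile.TemporaryDirectory() as tmp:
    first_run = ensure_files(tmp, ["train.gz", "test.gz"], fake_fetch)   # both fetched
    second_run = ensure_files(tmp, ["train.gz", "test.gz"], fake_fetch)  # nothing to do
```

Making the check idempotent means the same configuration can be run repeatedly without re-downloading data.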

**Pipelines:**
What people typically define as a _model_ in PTP is framed as a _pipeline_, consisting of many inter-connected components, with one or more _Models_ containing trainable elements.
@@ -41,35 +42,27 @@ The framework offers full flexibility and it is up to the programmer to choose t
Such a decomposition enables one to easily combine many components and models into pipelines, while the framework supports loading pretrained models, freezing them during training, saving them to checkpoints etc.
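In the tutorial configuration (`configs/tutorials/mnist_classification_convnet_softmax.yml`), every component of the `pipeline` section has a `priority` and names the streams it reads and writes. A minimal sketch of that execution model, assuming components run in ascending `priority` and exchange data through a shared dictionary of named streams; `run_pipeline` and the two toy components are hypothetical, not PTP's pipeline manager.

```python
from typing import Callable, Dict, List, Tuple

Streams = Dict[str, object]
Component = Tuple[str, int, Callable[[Streams], None]]  # (name, priority, forward function)


def run_pipeline(components: List[Component], batch: Streams) -> Streams:
    """Run components in ascending priority; each one reads and writes named streams."""
    streams = dict(batch)  # the task provides the initial streams, e.g. inputs and targets
    for _name, _priority, forward in sorted(components, key=lambda c: c[1]):
        forward(streams)
    return streams


def double(streams: Streams) -> None:
    """Toy component: reads 'inputs', writes 'doubled'."""
    streams["doubled"] = [2 * x for x in streams["inputs"]]


def total(streams: Streams) -> None:
    """Toy component: reads 'doubled', writes 'total'."""
    streams["total"] = sum(streams["doubled"])


# Listed out of order on purpose: the priorities, not the list position, decide the order.
result = run_pipeline([("total", 2, total), ("double", 1, double)], {"inputs": [1, 2, 3]})
print(result["total"])  # 12
```

Because components only agree on stream names, any of them can be swapped out (e.g. a different encoder) without touching the others, which is the point of the decomposition described above.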

**Model/Component Zoo:**
PTP provides several ready-to-use, out-of-the-box models, along with other non-trainable (but parametrizable) components.

![Alt text](docs/source/img/components/ptp_models.png?raw=true)

The model zoo includes several general-usage models, such as:
* Feed Forward Network (variable number of fully connected layers with activation functions and dropout)
* Recurrent Neural Network (different cell types with activation functions and dropout; a single model can work both as an encoder and as a decoder)

It also includes a few models that are specific to a given domain, but still quite general:
* Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
* General Image Encoder (wrapping several models from Torch Vision, e.g. VGG-16, ResNet-50, ResNet-152, DenseNet-121)
* Sentence Embeddings (encoding words using an embedding layer)

There are also some classical baselines, both for vision (e.g. LeNet-5) and for language, e.g. Seq2Seq (Sequence to Sequence model) or Attention Decoder (RNN-based decoder implementing Bahdanau-style attention).

PTP also offers several models useful for multi-modal fusion and reasoning:
* VQA Attention (simple question-driven attention over the image)
* Element Wise Multiplication (Multi-modal Low-rank Bilinear pooling, MLB)
* Multimodal Compact Bilinear Pooling (MCB)
* Multimodal Factorized Bilinear Pooling
* Relational Networks

![Alt text](docs/source/img/components/ptp_components_others.png?raw=true)

The framework also offers components useful when working with language, vision or other types of streams, e.g. text processing (Sentence Tokenizer, Sentence Indexer, Sentence One Hot Encoder, Label Indexer, BoW Encoder, Word Decoder) and tensor transformations (List to Tensor, Reshape Tensor, Reduce Tensor, Concatenate Tensor).
There are also several general-purpose components, from components calculating losses (NLL Loss) and statistics (Accuracy Statistics, Precision/Recall Statistics, BLEU Statistics etc.) to publishers and viewers (Stream Viewer, Stream File Exporter etc.).

**Workers:**
PTP workers are Python scripts that are _agnostic_ to the tasks/models/pipelines that they are supposed to work with.
@@ -107,9 +100,111 @@ This command will install all dependencies via pip_, while still enabling you to
More on that subject can be found in the setuptools documentation on [dev_mode](https://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode).


## Quick start: MNIST image classification with a simple ConvNet model

Consider a simple ConvNet model consisting of two parts:
* a few convolutional layers that accept the MNIST images and return feature maps, in general a 4D tensor (the first dimension being the batch size, a rule of thumb in PTP),
* one (or more) dense layers that accept the (flattened) feature maps and return predictions in the form of log-probability distributions (LogSoftmax as the last non-linearity).
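In the tutorial configuration, the two parts are connected by a `ReshapeTensor` component with `input_dims: [-1, 16, 1, 1]` and `output_dims: [-1, 16]`, so the batch dimension stays free (`-1`). A minimal stdlib sketch of how such a `-1` can be resolved, assuming the same element-count rule as PyTorch's `Tensor.reshape`; `resolve_shape` is a hypothetical helper, not the component's code.

```python
from math import prod
from typing import List, Sequence


def resolve_shape(input_shape: Sequence[int], output_dims: Sequence[int]) -> List[int]:
    """Replace a single -1 in output_dims so that the number of elements is preserved."""
    total = prod(input_shape)
    n_free = list(output_dims).count(-1)
    if n_free > 1:
        raise ValueError("at most one dimension may be -1")
    known = prod(d for d in output_dims if d != -1)  # prod() of an empty iterable is 1
    if (n_free == 0 and known != total) or (n_free == 1 and total % known != 0):
        raise ValueError(f"cannot reshape {tuple(input_shape)} into {tuple(output_dims)}")
    return [total // known if d == -1 else d for d in output_dims]


# A full batch of 64 feature maps of size 16x1x1, and the last, partial validation
# batch of 8 (5000 validation samples = 78 * 64 + 8).
full = resolve_shape((64, 16, 1, 1), [-1, 16])
last = resolve_shape((8, 16, 1, 1), [-1, 16])
print(full, last)  # [64, 16] [8, 16]
```

Keeping the batch dimension as `-1` is what lets the same pipeline handle the smaller last batch without any special casing.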

### Training the model

Assume that we will use the ```NLL Loss``` function and, additionally, want to monitor the ```Accuracy``` statistics.
The resulting pipeline is presented below.
The additional ```Answer Decoder``` component translates the predictions into class names, whereas ```Stream Viewer``` displays the content of the indicated data streams for a single sample randomly picked from the batch.
The associated ```mnist_classification_convnet_softmax.yml``` configuration file can be found in the ```configs/tutorials``` folder.


![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png?raw=true "Training of a simple ConvNet model on MNIST dataset")
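Because the classifier ends with LogSoftmax, the NLL loss only needs to pick the log-probability of the target class, and accuracy compares the arg-max prediction with the target. A plain-Python sketch of both computations, assuming PyTorch's default mean reduction for the loss; these helpers are illustrative and are not PTP's `NLLLoss` or `AccuracyStatistics` components.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: x_i - log(sum_j exp(x_j))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(log_probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target classes."""
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)


def accuracy(log_probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of samples whose arg-max prediction equals the target."""
    hits = sum(max(range(len(row)), key=row.__getitem__) == t
               for row, t in zip(log_probs, targets))
    return hits / len(targets)


# Toy batch of two samples with two classes (ties resolve to the first index).
batch = [log_softmax([0.0, 0.0]), log_softmax([2.0, 0.0])]
targets = [0, 1]
loss = nll_loss(batch, targets)
acc = accuracy(batch, targets)
print(acc)  # 0.5
```

LogSoftmax followed by NLL is equivalent to cross-entropy on the raw scores; keeping them separate lets downstream components (such as the answer decoder) consume log-probabilities directly.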


We will train the model with _ptp-offline-trainer_, a general _worker_ script that follows the classical training-validation, epoch-based methodology.
This means that, despite the presence of three sections (associated with the training, validation and test splits of the MNIST dataset), the trainer will consider only the content of the ```training``` and ```validation``` sections (plus ```pipeline```, containing the definition of the whole pipeline).
Let's run the training by calling the following from the command line:

```console
ptp-offline-trainer --c configs/tutorials/mnist_classification_convnet_softmax.yml
```

__Note__: Please call ```ptp-offline-trainer --h``` to learn more about the run-time arguments. To understand the structure of the main configuration file, please look at the trainer's default configuration file, located in the ```configs/default/workers``` folder.

The trainer logs the training and validation statistics to the console, along with additional information logged by the components, e.g. the contents of the streams:

```console
[2019-07-05 13:31:44] - INFO - OfflineTrainer >>> episode 006000; epoch 06; loss 0.1968410313; accuracy 0.9219
[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> End of epoch: 6
================================================================================
[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000860; epoch 06; loss 0.1799264401; loss_min 0.0302138925; loss_max 0.5467863679; loss_std 0.0761705562; accuracy 0.94593; accuracy_std 0.02871 [Full Training]
[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> Validating over the entire validation set (5000 samples in 79 episodes)
[2019-07-05 13:31:45] - INFO - stream_viewer >>> Showing selected streams for sample 20 (index: 55358):
'labels': One
'targets': 1
'predictions': tensor([-1.1452e+01, -1.6804e-03, -1.1357e+01, -1.1923e+01, -6.6160e+00,
-1.4658e+01, -9.6191e+00, -8.6472e+00, -9.6082e+00, -1.3505e+01])
'predicted_answers': One
```
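The numbers in this log can be cross-checked against the configuration. A small sketch, assuming that the two-element `indices` of `SubsetRandomSampler` are interpreted as a [start, end) range (consistent with the "5000 samples" reported above) and that the last, partial batch is kept. The word list is a hypothetical stand-in for the `label_word_mappings` global, and arg-max decoding is assumed for `Answer Decoder`.

```python
import math

BATCH_SIZE = 64  # batch_size: &b 64 in the tutorial configuration


def num_batches(num_samples: int, batch_size: int = BATCH_SIZE) -> int:
    """Batches per pass when the last, partial batch is kept."""
    return math.ceil(num_samples / batch_size)


train_batches = num_batches(55000 - 0)       # training indices [0, 55000]
valid_batches = num_batches(60000 - 55000)   # validation indices [55000, 60000]
test_batches = num_batches(10000)            # the MNIST test set has 10000 images

# Log-probabilities printed by stream_viewer for the sample shown above.
predictions = [-1.1452e+01, -1.6804e-03, -1.1357e+01, -1.1923e+01, -6.6160e+00,
               -1.4658e+01, -9.6191e+00, -8.6472e+00, -9.6082e+00, -1.3505e+01]
# Hypothetical index-to-word list standing in for the 'label_word_mappings' global.
words = ["Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine"]

best = max(range(len(predictions)), key=predictions.__getitem__)  # arg-max decoding
predicted_answer = words[best]
confidence = math.exp(predictions[best])  # from log-probability back to probability
print(train_batches, valid_batches, test_batches, predicted_answer)  # 860 79 157 One
```

The 860 and 79 batches match `episodes_aggregated 000860` and "79 episodes" in the log above, and 157 matches the `episodes_aggregated 000157` later reported by the processor on the test set.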

Note that whenever the validation loss decreases, the trainer automatically saves the pipeline to the checkpoint file:

```console
[2019-07-05 13:31:47] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000079; epoch 06; loss 0.1563445479; loss_min 0.0299939774; loss_max 0.5055227876; loss_std 0.0854654983; accuracy 0.95740; accuracy_std 0.02495 [Full Validation]
[2019-07-05 13:31:47] - INFO - mnist_classification_convnet_softmax >>> Exporting pipeline 'mnist_classification_convnet_softmax' parameters to checkpoint:
/users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+ Model 'image_encoder' [ConvNetEncoder] params saved
+ Model 'classifier' [FeedForwardNetwork] params saved
```
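The save-on-improvement rule can be sketched in a few lines. `BestCheckpointTracker` is a hypothetical name; in a real trainer the `True` branch would serialize the pipeline's parameters (e.g. with PyTorch's `torch.save`), which is omitted here to keep the sketch dependency-free.

```python
import math
from typing import List


class BestCheckpointTracker:
    """Hypothetical helper deciding when to save the pipeline, based on validation loss."""

    def __init__(self) -> None:
        self.best_loss = math.inf

    def should_save(self, validation_loss: float) -> bool:
        """Return True (and remember the new best) only when the loss improves."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            return True
        return False


tracker = BestCheckpointTracker()
# Hypothetical sequence of full-validation losses, one per validation pass.
decisions: List[bool] = [tracker.should_save(loss) for loss in [0.31, 0.22, 0.25, 0.16]]
print(decisions)  # [True, True, False, True]
```

Since only improvements trigger a save, the `_best.pt` file always holds the parameters with the lowest validation loss seen so far, even if later epochs overfit.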

After training finishes, the trainer reports the reason for termination and indicates where the experiment files (model checkpoint, log files, statistics etc.) can be found:

```console
[2019-07-05 13:32:33] - INFO - mnist_classification_convnet_softmax >>> Updated training status in checkpoint:
/users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
[2019-07-05 13:32:33] - INFO - OfflineTrainer >>>
================================================================================
[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Training finished because Converged (Full Validation Loss went below Loss Stop threshold of 0.15)
[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment finished!
[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/
```
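The `terminal_conditions` of the tutorial configuration decide when training ends. A hypothetical re-implementation of those checks, assuming that training stops as soon as the full validation loss goes below `loss_stop_threshold`, that a non-positive `early_stop_validations` disables early stopping, and that the episode/epoch limits are simple upper bounds; the order of the checks and the reason strings are illustrative, not PTP's code.

```python
from typing import Optional

# terminal_conditions copied from the 'training' section of the tutorial configuration.
TERMINAL_CONDITIONS = {
    "loss_stop_threshold": 0.15,
    "early_stop_validations": -1,
    "episode_limit": 10000,
    "epoch_limit": 10,
}


def termination_reason(tc: dict, validation_loss: float, episode: int, epoch: int,
                       validations_without_improvement: int) -> Optional[str]:
    """Return why training should stop, or None to keep training (hypothetical logic)."""
    if validation_loss < tc["loss_stop_threshold"]:
        return "Converged"
    # Assumption: a non-positive value disables early stopping.
    patience = tc["early_stop_validations"]
    if patience > 0 and validations_without_improvement >= patience:
        return "Early stopping"
    if episode >= tc["episode_limit"]:
        return "Episode limit reached"
    if epoch >= tc["epoch_limit"]:
        return "Epoch limit reached"
    return None


# Values from the epoch-6 full validation log line above: keep training.
print(termination_reason(TERMINAL_CONDITIONS, 0.1563445479, 6019, 6, 0))  # None
```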


### Testing the model

To test the generalization of the model, we will use _ptp-processor_, yet another general _worker_ script, which performs a single pass over the indicated set.


![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png?raw=true "Test of the pretrained model on test split of the MNIST dataset ")


```console
ptp-processor --load /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
```

__Note__: _ptp-processor_ uses the content of the _test_ section by default, but this can be changed at run-time. Please call ```ptp-processor --h``` to learn about the available run-time arguments.


```console
[2019-07-05 13:34:41] - INFO - Processor >>> episode 000313; episodes_aggregated 000157; loss 0.1464060694; loss_min 0.0352710858; loss_max 0.3801054060; loss_std 0.0669835582; accuracy 0.95770; accuracy_std 0.02471 [Full Set]
[2019-07-05 13:34:41] - INFO - Processor >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/test_20190705_133436/
```

__Note__: Please analyze the ```mnist_classification_convnet_softmax.yml``` configuration file (located in the ```configs/tutorials``` directory). Keep in mind that:
* all components come with default configuration files, located in the ```configs/default/components``` folder,
* all workers come with default configuration files, located in the ```configs/default/workers``` folder.
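Those default files are combined with the settings of the tutorial file. A minimal sketch of such a combination, assuming that values from the user's file take precedence and that nested sections are merged key by key; `merge_configs` and the default values below are hypothetical, not taken from `configs/default`.

```python
import copy
from typing import Any, Dict


def merge_configs(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay `overrides` on `defaults`: nested dicts are merged key by key, other values replaced."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults for a reshaping component, and a partial override in the
# spirit of the 'reshaper' section of the tutorial file.
defaults = {"streams": {"inputs": "inputs", "outputs": "outputs"}, "output_dims": [-1]}
overrides = {"streams": {"inputs": "feature_maps"}, "output_dims": [-1, 16]}
merged = merge_configs(defaults, overrides)
print(merged)
# {'streams': {'inputs': 'feature_maps', 'outputs': 'outputs'}, 'output_dims': [-1, 16]}
```

This is why the tutorial file can omit entries such as default stream names: anything not overridden falls back to the component's defaults.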


## Contributions

PTP is open for external contributions.
We follow the [Git Branching Model](https://nvie.com/posts/a-successful-git-branching-model/), in short:
* ```develop``` is the main branch, while ```master``` is used for releases only
* all changes are integrated by merging pull requests from feat/fix/other branches
* PTP is integrated with several DevOps tools monitoring the quality of code/pull requests
* we strongly encourage unit testing and Test-Driven Development
* we use projects and kanban to monitor issues/progress/etc.


## Maintainers

A project of the Machine Intelligence team, IBM Research AI, Almaden Research Center.

* Tomasz Kornuta (tkornut@us.ibm.com)

108 changes: 108 additions & 0 deletions configs/tutorials/mnist_classification_convnet_softmax.yml
@@ -0,0 +1,108 @@
# Training parameters:
training:
  task:
    type: MNIST
    batch_size: &b 64
    use_train_data: True
  # Use sampler that operates on a subset.
  sampler:
    type: SubsetRandomSampler
    indices: [0, 55000]
  # optimizer parameters:
  optimizer:
    type: Adam
    lr: 0.0001
  # settings parameters
  terminal_conditions:
    loss_stop_threshold: 0.15
    early_stop_validations: -1
    episode_limit: 10000
    epoch_limit: 10

# Validation parameters:
validation:
  task:
    type: MNIST
    batch_size: *b
    use_train_data: True # True because we are splitting the training set into training and validation
  # Use sampler that operates on a subset.
  sampler:
    type: SubsetRandomSampler
    indices: [55000, 60000]

# Testing parameters:
test:
  task:
    type: MNIST
    batch_size: *b
    use_train_data: False # Test set.

pipeline:
  # Model 1: 3 CNN layers.
  image_encoder:
    type: ConvNetEncoder
    priority: 1
    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
    streams:
      inputs: inputs
      feature_maps: feature_maps

  # Reshape inputs
  reshaper:
    type: ReshapeTensor
    input_dims: [-1, 16, 1, 1]
    output_dims: [-1, 16]
    priority: 2
    streams:
      inputs: feature_maps
      outputs: reshaped_maps
    globals:
      output_size: reshaped_maps_size

  # Model 2: 1 fully connected layer with softmax activation.
  classifier:
    type: FeedForwardNetwork
    priority: 3
    streams:
      inputs: reshaped_maps
      # Using default stream name, so the following could be removed (leaving it just for the clarity though).
      predictions: predictions
    globals:
      input_size: reshaped_maps_size
      prediction_size: num_classes


  # Loss
  nllloss:
    type: NLLLoss
    priority: 4
    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
    streams:
      targets: targets
      predictions: predictions

  accuracy:
    priority: 5
    type: AccuracyStatistics
    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
    streams:
      targets: targets
      predictions: predictions

  answer_decoder:
    priority: 6
    type: WordDecoder
    import_word_mappings_from_globals: True
    globals:
      word_mappings: label_word_mappings
    streams:
      inputs: predictions
      outputs: predicted_answers

  stream_viewer:
    priority: 7
    type: StreamViewer
    input_streams: labels, targets, predictions, predicted_answers


#: pipeline
Binary file added docs/source/img/components/ptp_components.png
Binary file added docs/source/img/components/ptp_models.png
Binary file added docs/source/img/components/ptp_tasks.png