IBM · tkornuta-ibm · Jul 5, 2019
diff --git a/README.md b/README.md
@@ -17,22 +17,23 @@ PyTorchPipe (PTP) is a component-oriented framework that facilitates development
 PTP frames training and testing procedures as _pipelines_ consisting of many components communicating through data streams.
 Each such a stream can consist of several components, including one task instance (providing batches of data), any number of trainable components (models) and additional components providing required transformations and computations.
 
+
+![Alt text](docs/source/img/data_flow_vqa_5_attention_gpu_loaders.png?raw=true "Exemplary multi-modal data flow diagram")
+
+
 As a result, the training & testing procedures are no longer pinned to a specific task or model, and built-in mechanisms for compatibility checking (handshaking), configuration and global variables management & statistics collection facilitate rapid development of complex pipelines and running diverse experiments.
 
 In its core, to _accelerate the computations_ on their own, PTP relies on PyTorch and extensively uses its mechanisms for distribution of computations on CPUs/GPUs, including multi-process data loaders and multi-GPU data parallelism.
-The models are _agnostic_ to those operations and one indicates whether to use them in configuration files (data loaders) or by passing adequate run-time arguments (--gpu).
+The models are _agnostic_ to those operations; one indicates whether to use them in configuration files (data loaders) or by passing the appropriate argument (--gpu) at run-time.
 
 **Datasets:**
-PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Tasks_ from the following task domains:
+PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Tasks_, categorized into three domains:
 
-  * CLEVR, GQA, ImageCLEF VQA-Med 2019 (Visual Question Answering)
-  * MNIST, CIFAR-100 (Image Classification)
-  * WiLY (Language Identification)
-  * WikiText-2 / WikiText-103 (Language Modelling)
-  * ANKI (Machine Translation)
+![Alt text](docs/source/img/components/ptp_tasks.png?raw=true)
 
 Aside of providing batches of samples, the Task class will automatically download the files associated with a given dataset (as long as the dataset is publicly available).
-The diversity of those tasks (and associated models) proves the flexibility of the framework, we are working on incorporation of new ones into PTP.
+The diversity of those tasks (and the associated models) proves the flexibility of the framework.
+We are constantly working on incorporating new Tasks into PTP.
 
 **Pipelines:**
 What people typically define as a _model_ in PTP is framed as a _pipeline_, consisting of many inter-connected components, with one or more _Models_ containing trainable elements.
@@ -41,35 +42,27 @@ The framework offers full flexibility and it is up to the programmer to choose t
 Such a decomposition enables one to easily combine many components and models into pipelines, whereas the framework supports loading of pretrained models, freezing during training, saving them to checkpoints etc.
 
 **Model/Component Zoo:**
-PTP provides several ready to use, out of the box components, from ones of general usage to very specialised ones:
+PTP provides several ready-to-use, out-of-the-box models and other non-trainable (but parametrizable) components.
 
-  * Feed Forward Network (Fully Connected layers with activation functions and dropout, variable number of hidden layers, general usage)
-  * Torch Vision Wrapper (wrapping several models from Torch Vision, e.g. VGG-16, ResNet-50, ResNet-152, DenseNet-121, general usage)
-  * Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
-  * LeNet-5 (classical baseline)
-  * Recurrent Neural Network (different kernels with activation functions and dropout, a single model can work both as encoder or decoder, general usage)
-  * Seq2Seq (Sequence to Sequence model, classical baseline)
-  * Attention Decoder (RNN-based decoder implementing Bahdanau-style attention, classical baseline)
-  * Sentence Embeddings (encodes words using embedding layer, general usage)
 
-Currently PTP offers the following models useful for multi-modal fusion and reasoning:
+![Alt text](docs/source/img/components/ptp_models.png?raw=true)
 
-  * VQA Attention (simple question-driven attention over the image)
-  * Element Wise Multiplication (Multi-modal Low-rank Bilinear pooling, MLB)
-  * Multimodel Compact Bilinear Pooling (MCB)
-  * Multimodal Factorized Bilinear Pooling
-  * Relational Networks
+The model zoo includes several general usage components, such as:
+  * Feed Forward Network (variable number of Fully Connected layers with activation functions and dropout)
+  * Recurrent Neural Network (different cell types with activation functions and dropout, a single model can work both as encoder or decoder)
 
-The framework also offers several components useful when working with text:
+It also includes a few models specific to a given domain, but still quite general:
+  * Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
+  * General Image Encoder (wrapping several models from Torch Vision)
+  * Sentence Embeddings (encoding words using the embedding layer)
+
+There are also some classical baselines for both the vision domain (e.g. LeNet-5) and the language domain, e.g. Seq2Seq (Sequence to Sequence model) or Attention Decoder (RNN-based decoder implementing Bahdanau-style attention).
+PTP also offers several models useful for multi-modal fusion and reasoning.
 
-  * Sentence Tokenizer
-  * Sentence Indexer
-  * Sentence One Hot Encoder
-  * Label Indexer
-  * BoW Encoder
-  * Word Decoder
+![Alt text](docs/source/img/components/ptp_components_others.png?raw=true)
 
-and several general-purpose components, from tensor transformations (List to Tensor, Reshape Tensor, Reduce Tensor, Concatenate Tensor), to components calculating losses (NLL Loss) and statistics (Accuracy Statistics, Precision/Recall Statistics, BLEU Statistics etc.) to viewers (Stream Viewer, Stream File Exporter etc.).
+The framework also offers components useful when working with language, vision or other types of streams (e.g. tensor transformations).
+There are also several general-purpose components, from components calculating losses and statistics to publishers and viewers.
 
 **Workers:**
 PTP workers are python scripts that are _agnostic_ to the tasks/models/pipelines that they are supposed to work with.
@@ -107,9 +100,111 @@ This command will install all dependencies via pip_, while still enabling you to
 More in that subject can be found in the following blog post on [dev_mode](https://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode).
 
 
+## Quick start: MNIST image classification with a simple ConvNet model
+
+Please consider a simple ConvNet model consisting of two parts: 
+  * a few convolutional layers that accept the MNIST images and return feature maps, in general a 4D tensor (the first dimension being the batch size, a rule of thumb in PTP),
+  * one (or more) dense layers that accept the (flattened) feature maps and return predictions in the form of log-probability distributions (LogSoftmax as the last non-linearity).
+
+### Training the model
+
+Assume that we use the ```NLL Loss``` function and also want to monitor the ```Accuracy``` statistics.
+The resulting pipeline is presented below.
+The additional ```Answer Decoder``` component translates the predictions into class names, whereas ```Stream Viewer``` displays the content of the indicated data streams for a single sample randomly picked from the batch.
+The associated ```mnist_classification_convnet_softmax.yml``` configuration file can be found in the ```configs/tutorials``` folder.
+
+
+![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png?raw=true "Training of a simple ConvNet model on MNIST dataset")
+
+
+We will train the model with _ptp-offline-trainer_, a general _worker_ script that follows the classical training-validation, epoch-based methodology.
+This means that, despite the presence of three sections (associated with the training, validation and test splits of the MNIST dataset), the trainer will consider only the content of the ```training``` and ```validation``` sections (plus ```pipeline```, containing the definition of the whole pipeline).
+Let's run the training by calling the following from the command line:
+
+```console
+ptp-offline-trainer --c configs/tutorials/mnist_classification_convnet_softmax.yml
+```
+
+__Note__: Please call ```ptp-offline-trainer --h``` to learn more about the run-time arguments. In order to understand the structure of the main configuration file, please look at the default configuration file of the trainer, located in the ```configs/default/workers``` folder.
+
+The trainer will log training and validation statistics to the console, along with additional information logged by the components, e.g. contents of the streams:
+
+```console
+[2019-07-05 13:31:44] - INFO - OfflineTrainer >>> episode 006000; epoch 06; loss 0.1968410313; accuracy 0.9219
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> End of epoch: 6
+================================================================================
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000860; epoch 06; loss 0.1799264401; loss_min 0.0302138925; loss_max 0.5467863679; loss_std 0.0761705562; accuracy 0.94593; accuracy_std 0.02871 [Full Training]
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> Validating over the entire validation set (5000 samples in 79 episodes)
+[2019-07-05 13:31:45] - INFO - stream_viewer >>> Showing selected streams for sample 20 (index: 55358):
+ 'labels': One
+ 'targets': 1
+ 'predictions': tensor([-1.1452e+01, -1.6804e-03, -1.1357e+01, -1.1923e+01, -6.6160e+00,
+        -1.4658e+01, -9.6191e+00, -8.6472e+00, -9.6082e+00, -1.3505e+01])
+ 'predicted_answers': One
+```
+
+Please note that whenever the validation loss goes down, the trainer will automatically save the pipeline to a checkpoint file:
+
+```console
+[2019-07-05 13:31:47] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000079; epoch 06; loss 0.1563445479; loss_min 0.0299939774; loss_max 0.5055227876; loss_std 0.0854654983; accuracy 0.95740; accuracy_std 0.02495 [Full Validation]
+[2019-07-05 13:31:47] - INFO - mnist_classification_convnet_softmax >>> Exporting pipeline 'mnist_classification_convnet_softmax' parameters to checkpoint:
+ /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+  + Model 'image_encoder' [ConvNetEncoder] params saved
+  + Model 'classifier' [FeedForwardNetwork] params saved
+```
+
+After the training finishes, the trainer will report the termination reason and indicate where the experiment files (model checkpoint, log files, statistics etc.) can be found:
+
+```console
+[2019-07-05 13:32:33] - INFO - mnist_classification_convnet_softmax >>> Updated training status in checkpoint:
+ /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>>
+================================================================================
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Training finished because Converged (Full Validation Loss went below Loss Stop threshold of 0.15)
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment finished!
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/
+```
+
+
+### Testing the model
+
+In order to test how well the model generalizes, we will use _ptp-processor_, yet another general _worker_ script that performs a single pass over the indicated set.
+
+
+![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png?raw=true "Test of the pretrained model on the test split of the MNIST dataset")
+
+
+```console
+ptp-processor --load /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+```
+
+__Note__: _ptp-processor_ uses the content of the _test_ section by default, but this can be changed at run-time. Please call ```ptp-processor --h``` to learn about the available run-time arguments.
+
+
+```console
+[2019-07-05 13:34:41] - INFO - Processor >>> episode 000313; episodes_aggregated 000157; loss 0.1464060694; loss_min 0.0352710858; loss_max 0.3801054060; loss_std 0.0669835582; accuracy 0.95770; accuracy_std 0.02471 [Full Set]
+[2019-07-05 13:34:41] - INFO - Processor >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/test_20190705_133436/
+```
+
+__Note__: Please analyze the ```mnist_classification_convnet_softmax.yml``` configuration file (located in the ```configs/tutorials``` directory). Keep in mind that:
+  * all components come with default configuration files, located in the ```configs/default/components``` folder,
+  * all workers come with default configuration files, located in the ```configs/default/workers``` folder.
+
+
+## Contributions
+
+PTP is open for external contributions.
+We follow the [Git Branching Model](https://nvie.com/posts/a-successful-git-branching-model/), in short:
+  * the ```develop``` branch is the main branch, the ```master``` branch is used for releases only
+  * all changes are integrated by merging pull requests from feat/fix/other branches
+  * PTP is integrated with several DevOps tools that monitor the quality of code/pull requests
+  * we strongly encourage unit testing and Test-Driven Development
+  * we use projects and kanban to monitor issues/progress/etc.
+
+
 ## Maintainers
 
-A project of the Machine Intelligence team, IBM Research, Almaden.
+A project of the Machine Intelligence team, IBM Research AI, Almaden Research Center.
 
 * Tomasz Kornuta (tkornut@us.ibm.com)
 

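A note on the `predictions` stream printed by the stream viewer in the README log above: since the classifier ends with LogSoftmax, the values are log-probabilities, so the predicted class is the arg-max and the per-sample NLL loss is the negated entry at the target index. The sketch below reproduces this by hand in plain Python from the logged numbers. It is not PTP code, and the `CLASS_NAMES` list is an assumption made for illustration (only the name `One` for index 1 appears in the log).

```python
import math

# Log-probabilities copied from the `predictions` stream in the log above.
log_probs = [-1.1452e+01, -1.6804e-03, -1.1357e+01, -1.1923e+01, -6.6160e+00,
             -1.4658e+01, -9.6191e+00, -8.6472e+00, -9.6082e+00, -1.3505e+01]
target = 1  # the logged `targets` value

# Assumed label names; only "One" (index 1) is confirmed by the log.
CLASS_NAMES = ["Zero", "One", "Two", "Three", "Four",
               "Five", "Six", "Seven", "Eight", "Nine"]

# Exponentiating log-probabilities should give a distribution summing to ~1.
total_probability = sum(math.exp(lp) for lp in log_probs)

# Predicted class: index of the largest log-probability.
predicted = max(range(len(log_probs)), key=lambda i: log_probs[i])

# NLL loss for a single sample: negated log-probability of the target class.
sample_nll = -log_probs[target]

print(CLASS_NAMES[predicted], round(total_probability, 3), sample_nll)
```

The tiny per-sample loss reflects a confident, correct prediction; the batch-level `loss` values in the log are averages over many such samples, some of them far less confident.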
diff --git a/configs/tutorials/mnist_classification_convnet_softmax.yml b/configs/tutorials/mnist_classification_convnet_softmax.yml
@@ -0,0 +1,108 @@
+# Training parameters:
+training:
+  task: 
+    type: MNIST
+    batch_size: &b 64
+    use_train_data: True
+  # Use sampler that operates on a subset.
+  sampler:
+    type: SubsetRandomSampler
+    indices: [0, 55000]
+  # optimizer parameters:
+  optimizer:
+    type: Adam
+    lr: 0.0001
+  # settings parameters
+  terminal_conditions:
+    loss_stop_threshold: 0.15
+    early_stop_validations: -1
+    episode_limit: 10000
+    epoch_limit: 10
+
+# Validation parameters:
+validation:
+  task:
+    type: MNIST
+    batch_size: *b
+    use_train_data: True  # True because we are splitting the training set into training and validation subsets
+  # Use sampler that operates on a subset.
+  sampler:
+    type: SubsetRandomSampler
+    indices: [55000, 60000]
+
+# Testing parameters:
+test:
+  task:
+    type: MNIST
+    batch_size: *b
+    use_train_data: False # Test set.
+
+pipeline:
+  # Model 1: 3 CNN layers.
+  image_encoder:
+    type: ConvNetEncoder
+    priority: 1
+    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
+    streams:
+      inputs: inputs
+      feature_maps: feature_maps
+
+  # Reshape inputs
+  reshaper:
+    type: ReshapeTensor
+    input_dims: [-1, 16, 1, 1]
+    output_dims: [-1, 16]
+    priority: 2
+    streams:
+      inputs: feature_maps
+      outputs: reshaped_maps
+    globals:
+      output_size: reshaped_maps_size
+
+  # Model 2: 1 Fully connected layer with softmax activation.
+  classifier:
+    type: FeedForwardNetwork 
+    priority: 3
+    streams:
+      inputs: reshaped_maps
+      # Using default stream name, so the following could be removed (leaving it just for the clarity though).
+      predictions: predictions
+    globals:
+      input_size: reshaped_maps_size
+      prediction_size: num_classes
+
+
+  # Loss
+  nllloss:
+    type: NLLLoss
+    priority: 4
+    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
+    streams:
+      targets: targets
+      predictions: predictions
+
+  accuracy:
+    priority: 5
+    type: AccuracyStatistics
+    # Using default stream names, so the following could be removed (leaving it just for the clarity though).
+    streams:
+      targets: targets
+      predictions: predictions
+
+  answer_decoder:
+    priority: 6
+    type: WordDecoder
+    import_word_mappings_from_globals: True
+    globals:
+      word_mappings: label_word_mappings
+    streams:
+      inputs: predictions
+      outputs: predicted_answers
+
+  stream_viewer:
+    priority: 7
+    type: StreamViewer
+    input_streams: labels, targets, predictions, predicted_answers
+
+
+#: pipeline
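To make the wiring in the `pipeline` section above easier to follow, here is a minimal plain-Python sketch that orders the components by `priority` and checks that every stream a component reads has already been produced. It is only an illustration of the idea behind the compatibility check described in the README, not PTP's actual handshaking code. Two things are assumptions: the set of streams emitted by the MNIST task (`inputs`, `targets`, `labels`) and the `loss` output of the `nllloss` component.

```python
# Each entry mirrors one component of the `pipeline` section:
# (name, priority, streams consumed, streams produced).
PIPELINE = [
    ("stream_viewer", 7, ["labels", "targets", "predictions", "predicted_answers"], []),
    ("image_encoder", 1, ["inputs"], ["feature_maps"]),
    ("reshaper", 2, ["feature_maps"], ["reshaped_maps"]),
    ("classifier", 3, ["reshaped_maps"], ["predictions"]),
    ("nllloss", 4, ["targets", "predictions"], ["loss"]),  # "loss" is assumed
    ("accuracy", 5, ["targets", "predictions"], []),
    ("answer_decoder", 6, ["predictions"], ["predicted_answers"]),
]

# Assumption: streams provided by the task before any component runs.
TASK_STREAMS = {"inputs", "targets", "labels"}


def execution_order(pipeline):
    """Return component names sorted by ascending priority."""
    return [name for name, *_ in sorted(pipeline, key=lambda c: c[1])]


def missing_streams(pipeline, task_streams):
    """Return (component, stream) pairs read before anything produced them."""
    available = set(task_streams)
    problems = []
    for name, _priority, consumed, produced in sorted(pipeline, key=lambda c: c[1]):
        for stream in consumed:
            if stream not in available:
                problems.append((name, stream))
        available.update(produced)
    return problems


print(execution_order(PIPELINE))
print(missing_streams(PIPELINE, TASK_STREAMS))
```

Listing the components in arbitrary order and sorting by priority mirrors the config, where the position of a block in the file does not matter but its `priority` does.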
diff --git a/docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png
diff --git a/docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png
diff --git a/docs/source/img/components/ptp_components.png b/docs/source/img/components/ptp_components.png
diff --git a/docs/source/img/components/ptp_components_others.png b/docs/source/img/components/ptp_components_others.png
diff --git a/docs/source/img/components/ptp_models.png b/docs/source/img/components/ptp_models.png
diff --git a/docs/source/img/components/ptp_tasks.png b/docs/source/img/components/ptp_tasks.png
diff --git a/docs/source/img/data_flow_vqa_5_attention_gpu_loaders.png b/docs/source/img/data_flow_vqa_5_attention_gpu_loaders.png
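As a closing sanity check, the episode counts reported in the README logs (860 for a full training pass, 79 for validation, 157 for the test set) follow from the sampler ranges and the batch size in `mnist_classification_convnet_softmax.yml`. The sketch below assumes that the sampler `indices` are read as half-open `[start, end)` ranges and that the last, partial batch is kept; both are interpretations consistent with the logs, not statements about PTP internals.

```python
import math

BATCH_SIZE = 64  # `batch_size: &b 64`, reused in other sections via `*b`

# Sampler `indices` interpreted as half-open [start, end) ranges (an assumption).
SPLITS = {
    "training": (0, 55000),
    "validation": (55000, 60000),
}
TEST_SET_SIZE = 10000  # size of the standard MNIST test split


def episodes_per_pass(num_samples, batch_size=BATCH_SIZE):
    """Batches needed to cover num_samples, keeping the last partial batch."""
    return math.ceil(num_samples / batch_size)


for name, (start, end) in SPLITS.items():
    print(name, end - start, episodes_per_pass(end - start))
print("test", TEST_SET_SIZE, episodes_per_pass(TEST_SET_SIZE))
```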