diff --git a/README.md b/README.md
index e5b5020..1ed8ba2 100644
--- a/README.md
+++ b/README.md
@@ -17,22 +17,23 @@ PyTorchPipe (PTP) is a component-oriented framework that facilitates development
 PTP frames training and testing procedures as _pipelines_ consisting of many components communicating through data streams.
 Each such pipeline can consist of several components, including one task instance (providing batches of data), any number of trainable components (models) and additional components providing required transformations and computations.
+
+![Alt text](docs/source/img/data_flow_vqa_5_attention_gpu_loaders.png?raw=true "Exemplary multi-modal data flow diagram")
+
+
 As a result, the training & testing procedures are no longer pinned to a specific task or model, and built-in mechanisms for compatibility checking (handshaking), configuration and global variables management & statistics collection facilitate rapid development of complex pipelines and running diverse experiments.

 At its core, to _accelerate the computations_ on their own, PTP relies on PyTorch and extensively uses its mechanisms for distribution of computations on CPUs/GPUs, including multi-process data loaders and multi-GPU data parallelism.
-The models are _agnostic_ to those operations and one indicates whether to use them in configuration files (data loaders) or by passing adequate run-time arguments (--gpu).
+The models are _agnostic_ to those operations and one indicates whether to use them in configuration files (data loaders) or by passing an adequate argument (--gpu) at run-time.

 **Datasets:**
-PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Tasks_ from the following task domains:
+PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Tasks_, categorized into three domains:

- * CLEVR, GQA, ImageCLEF VQA-Med 2019 (Visual Question Answering)
- * MNIST, CIFAR-100 (Image Classification)
- * WiLY (Language Identification)
- * WikiText-2 / WikiText-103 (Language Modelling)
- * ANKI (Machine Translation)
+![Alt text](docs/source/img/components/ptp_tasks.png?raw=true)

 Aside from providing batches of samples, the Task class will automatically download the files associated with a given dataset (as long as the dataset is publicly available).
-The diversity of those tasks (and associated models) proves the flexibility of the framework, we are working on incorporation of new ones into PTP.
+The diversity of those tasks (and the associated models) proves the flexibility of the framework.
+We are constantly working on incorporating new Tasks into PTP.

 **Pipelines:**
 What people typically define as a _model_ in PTP is framed as a _pipeline_, consisting of many inter-connected components, with one or more _Models_ containing trainable elements.
@@ -41,35 +42,27 @@ The framework offers full flexibility and it is up to the programmer to choose t
 Such a decomposition enables one to easily combine many components and models into pipelines, whereas the framework supports loading of pretrained models, freezing during training, saving them to checkpoints etc.

 **Model/Component Zoo:**
-PTP provides several ready to use, out of the box components, from ones of general usage to very specialised ones:
+PTP provides several ready-to-use, out-of-the-box models and other non-trainable (but parametrizable) components.
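+Every component instance is declared in the ```pipeline``` section of a configuration file by its ```type```, a ```priority``` defining its position in the execution order and, optionally, remapped stream names. A minimal sketch, reusing the ```StreamViewer``` component from the tutorial configuration added at the end of this diff (the instance name ```my_viewer``` is arbitrary):
+
+```yaml
+pipeline:
+  # Display the content of two data streams for a randomly picked sample.
+  my_viewer:
+    priority: 1
+    type: StreamViewer
+    input_streams: targets, predictions
+```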
- * Feed Forward Network (Fully Connected layers with activation functions and dropout, variable number of hidden layers, general usage)
- * Torch Vision Wrapper (wrapping several models from Torch Vision, e.g. VGG-16, ResNet-50, ResNet-152, DenseNet-121, general usage)
- * Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
- * LeNet-5 (classical baseline)
- * Recurrent Neural Network (different kernels with activation functions and dropout, a single model can work both as encoder or decoder, general usage)
- * Seq2Seq (Sequence to Sequence model, classical baseline)
- * Attention Decoder (RNN-based decoder implementing Bahdanau-style attention, classical baseline)
- * Sentence Embeddings (encodes words using embedding layer, general usage)

-Currently PTP offers the following models useful for multi-modal fusion and reasoning:
+![Alt text](docs/source/img/components/ptp_models.png?raw=true)

- * VQA Attention (simple question-driven attention over the image)
- * Element Wise Multiplication (Multi-modal Low-rank Bilinear pooling, MLB)
- * Multimodel Compact Bilinear Pooling (MCB)
- * Multimodal Factorized Bilinear Pooling
- * Relational Networks
+The model zoo includes several general-usage components, such as:
+ * Feed Forward Network (variable number of Fully Connected layers with activation functions and dropout)
+ * Recurrent Neural Network (different cell types with activation functions and dropout, a single model can work both as encoder or decoder)

-The framework also offers several components useful when working with text:
+It also includes a few models that are specific to a given domain, but still quite general:
+ * Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
+ * General Image Encoder (wrapping several models from Torch Vision)
+ * Sentence Embeddings (encoding words using the embedding layer)
+
+There are also some classical baselines, both for the vision domain (e.g. LeNet-5) and the language domain (e.g. Seq2Seq, a Sequence to Sequence model, and Attention Decoder, an RNN-based decoder implementing Bahdanau-style attention).
+PTP also offers several models useful for multi-modal fusion and reasoning.

- * Sentence Tokenizer
- * Sentence Indexer
- * Sentence One Hot Encoder
- * Label Indexer
- * BoW Encoder
- * Word Decoder
+![Alt text](docs/source/img/components/ptp_components_others.png?raw=true)

-and several general-purpose components, from tensor transformations (List to Tensor, Reshape Tensor, Reduce Tensor, Concatenate Tensor), to components calculating losses (NLL Loss) and statistics (Accuracy Statistics, Precision/Recall Statistics, BLEU Statistics etc.) to viewers (Stream Viewer, Stream File Exporter etc.).
+The framework also offers components useful when working with language, vision or other types of streams (e.g. tensor transformations).
+There are also several general-purpose components, from components calculating losses and statistics to publishers and viewers.

 **Workers:**
 PTP workers are python scripts that are _agnostic_ to the tasks/models/pipelines that they are supposed to work with.
@@ -107,9 +100,111 @@ This command will install all dependencies via pip_, while still enabling you to
 More in that subject can be found in the following blog post on [dev_mode](https://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode).
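+For reference, a development-mode installation typically boils down to the following (a sketch assuming the standard setuptools workflow described under the link above; the exact command is given in the installation section):
+
+```console
+python setup.py develop
+```
+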
+## Quick start: MNIST image classification with a simple ConvNet model
+
+Consider a simple ConvNet model consisting of two parts:
+ * a few convolutional layers accepting the MNIST images and returning feature maps, which form, in general, a 4D tensor (the first dimension being the batch size, a rule of thumb in PTP),
+ * one (or more) dense layers that accept the (flattened) feature maps and return predictions in the form of log-probability distributions (LogSoftmax as the last non-linearity).
+
+### Training the model
+
+Assume that we will use the ```NLL Loss``` function and, in addition, want to monitor the ```Accuracy``` statistic.
+The resulting pipeline is presented below.
+The additional ```Answer Decoder``` component translates the predictions into class names, whereas ```Stream Viewer``` displays the content of the indicated data streams for a single sample randomly picked from the batch.
+The associated ```mnist_classification_convnet_softmax.yml``` configuration file can be found in the ```configs/tutorials``` folder.
+
+
+![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png?raw=true "Training of a simple ConvNet model on the MNIST dataset")
+
+
+We will train the model with _ptp-offline-trainer_, a general _worker_ script that follows the classical, epoch-based training-validation methodology.
+This means that, despite the presence of three sections (associated with the training, validation and test splits of the MNIST dataset), the trainer will consider only the content of the ```training``` and ```validation``` sections (plus ```pipeline```, containing the definition of the whole pipeline).
+Let's run the training by calling the following from the command line:
+
+```console
+ptp-offline-trainer --c configs/tutorials/mnist_classification_convnet_softmax.yml
+```
+
+__Note__: Please call ```ptp-offline-trainer --h``` to learn more about the run-time arguments. In order to understand the structure of the main configuration file, please look at the default configuration file of the trainer, located in the ```configs/default/workers``` folder.
+
+The trainer will log the training and validation statistics on the console, along with additional information logged by the components, e.g. the contents of the streams:
+
+```console
+[2019-07-05 13:31:44] - INFO - OfflineTrainer >>> episode 006000; epoch 06; loss 0.1968410313; accuracy 0.9219
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> End of epoch: 6
+================================================================================
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000860; epoch 06; loss 0.1799264401; loss_min 0.0302138925; loss_max 0.5467863679; loss_std 0.0761705562; accuracy 0.94593; accuracy_std 0.02871 [Full Training]
+[2019-07-05 13:31:45] - INFO - OfflineTrainer >>> Validating over the entire validation set (5000 samples in 79 episodes)
+[2019-07-05 13:31:45] - INFO - stream_viewer >>> Showing selected streams for sample 20 (index: 55358):
+ 'labels': One
+ 'targets': 1
+ 'predictions': tensor([-1.1452e+01, -1.6804e-03, -1.1357e+01, -1.1923e+01, -6.6160e+00,
+        -1.4658e+01, -9.6191e+00, -8.6472e+00, -9.6082e+00, -1.3505e+01])
+ 'predicted_answers': One
+```
+
+Please note that whenever the validation loss goes down, the trainer will automatically save the pipeline to a checkpoint file:
+
+```console
+[2019-07-05 13:31:47] - INFO - OfflineTrainer >>> episode 006019; episodes_aggregated 000079; epoch 06; loss 0.1563445479; loss_min 0.0299939774; loss_max 0.5055227876; loss_std 0.0854654983; accuracy 0.95740; accuracy_std 0.02495 [Full Validation]
+[2019-07-05 13:31:47] - INFO - mnist_classification_convnet_softmax >>> Exporting pipeline 'mnist_classification_convnet_softmax' parameters to checkpoint:
+ /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+  + Model 'image_encoder' [ConvNetEncoder] params saved
+  + Model 'classifier' [FeedForwardNetwork] params saved
+```
+
+After the training finishes, the trainer will report the termination reason and indicate where the experiment files (model checkpoint, log files, statistics etc.) can be found:
+
+```console
+[2019-07-05 13:32:33] - INFO - mnist_classification_convnet_softmax >>> Updated training status in checkpoint:
+ /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>>
+================================================================================
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Training finished because Converged (Full Validation Loss went below Loss Stop threshold of 0.15)
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment finished!
+[2019-07-05 13:32:33] - INFO - OfflineTrainer >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/
+```
+
+
+### Testing the model
+
+In order to test the model's generalization, we will use _ptp-processor_, yet another general _worker_ script, which performs a single pass over the indicated set.
+
+
+![Alt text](docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png?raw=true "Test of the pretrained model on the test split of the MNIST dataset")
+
+
+```console
+ptp-processor --load /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/checkpoints/mnist_classification_convnet_softmax_best.pt
+```
+
+__Note__: _ptp-processor_ uses the content of the _test_ section by default, but this can be changed at run-time. Please call ```ptp-processor --h``` to learn about the available run-time arguments.
+
+
+```console
+[2019-07-05 13:34:41] - INFO - Processor >>> episode 000313; episodes_aggregated 000157; loss 0.1464060694; loss_min 0.0352710858; loss_max 0.3801054060; loss_std 0.0669835582; accuracy 0.95770; accuracy_std 0.02471 [Full Set]
+[2019-07-05 13:34:41] - INFO - Processor >>> Experiment logged to: /users/tomaszkornuta/experiments/mnist/mnist_classification_convnet_softmax/20190705_132624/test_20190705_133436/
+```
+
+__Note__: Please analyze the ```mnist_classification_convnet_softmax.yml``` configuration file (located in the ```configs/tutorials``` directory). Keep in mind that:
+ * all components come with default configuration files, located in the ```configs/default/components``` folder,
+ * all workers come with default configuration files, located in the ```configs/default/workers``` folder.
+
+
+## Contributions
+
+PTP is open for external contributions.
+We follow the [Git Branching Model](https://nvie.com/posts/a-successful-git-branching-model/), in short:
+ * the ```develop``` branch is the main branch, whereas the ```master``` branch is used for releases only
+ * all changes are integrated by merging pull requests from feat/fix/other branches
+ * PTP is integrated with several DevOps tools monitoring the quality of the code and pull requests
+ * we strongly encourage unit testing and Test-Driven Development
+ * we use projects and kanban boards to monitor issues/progress etc.
+
+
 ## Maintainers
 
-A project of the Machine Intelligence team, IBM Research, Almaden.
+A project of the Machine Intelligence team, IBM Research AI, Almaden Research Center.
 
 * Tomasz Kornuta (tkornut@us.ibm.com)
diff --git a/configs/tutorials/mnist_classification_convnet_softmax.yml b/configs/tutorials/mnist_classification_convnet_softmax.yml
new file mode 100644
index 0000000..d433952
--- /dev/null
+++ b/configs/tutorials/mnist_classification_convnet_softmax.yml
@@ -0,0 +1,108 @@
+# Training parameters:
+training:
+  task:
+    type: MNIST
+    batch_size: &b 64
+    use_train_data: True
+  # Use a sampler that operates on a subset.
+  sampler:
+    type: SubsetRandomSampler
+    indices: [0, 55000]
+  # Optimizer parameters:
+  optimizer:
+    type: Adam
+    lr: 0.0001
+  # Terminal conditions parameters:
+  terminal_conditions:
+    loss_stop_threshold: 0.15
+    early_stop_validations: -1
+    episode_limit: 10000
+    epoch_limit: 10
+
+# Validation parameters:
+validation:
+  task:
+    type: MNIST
+    batch_size: *b
+    use_train_data: True  # True because we are splitting the training set into validation and training parts.
+  # Use a sampler that operates on a subset.
+  sampler:
+    type: SubsetRandomSampler
+    indices: [55000, 60000]
+
+# Testing parameters:
+test:
+  task:
+    type: MNIST
+    batch_size: *b
+    use_train_data: False  # Test set.
+
+pipeline:
+  # Model 1: 3 CNN layers.
+  image_encoder:
+    type: ConvNetEncoder
+    priority: 1
+    # Using default stream names, so the following could be removed (leaving it here just for clarity though).
+    streams:
+      inputs: inputs
+      feature_maps: feature_maps
+
+  # Reshape the inputs.
+  reshaper:
+    type: ReshapeTensor
+    input_dims: [-1, 16, 1, 1]
+    output_dims: [-1, 16]
+    priority: 2
+    streams:
+      inputs: feature_maps
+      outputs: reshaped_maps
+    globals:
+      output_size: reshaped_maps_size
+
+  # Model 2: 1 fully connected layer with softmax activation.
+  classifier:
+    type: FeedForwardNetwork
+    priority: 3
+    streams:
+      inputs: reshaped_maps
+      # Using default stream name, so the following could be removed (leaving it here just for clarity though).
+      predictions: predictions
+    globals:
+      input_size: reshaped_maps_size
+      prediction_size: num_classes
+
+
+  # Loss function.
+  nllloss:
+    type: NLLLoss
+    priority: 4
+    # Using default stream names, so the following could be removed (leaving it here just for clarity though).
+    streams:
+      targets: targets
+      predictions: predictions
+
+  accuracy:
+    priority: 5
+    type: AccuracyStatistics
+    # Using default stream names, so the following could be removed (leaving it here just for clarity though).
+    streams:
+      targets: targets
+      predictions: predictions
+
+  answer_decoder:
+    priority: 6
+    type: WordDecoder
+    import_word_mappings_from_globals: True
+    globals:
+      word_mappings: label_word_mappings
+    streams:
+      inputs: predictions
+      outputs: predicted_answers
+
+  stream_viewer:
+    priority: 7
+    type: StreamViewer
+    input_streams: labels, targets, predictions, predicted_answers
+
+
+#: pipeline
diff --git a/docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png
new file mode 100644
index 0000000..0de29a1
Binary files /dev/null and b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_1_training.png differ
diff --git a/docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png
new file mode 100644
index 0000000..b1d30ba
Binary files /dev/null and b/docs/source/img/1_tutorials/data_flow_tutorial_mnist_2_test.png differ
diff --git a/docs/source/img/components/ptp_components.png b/docs/source/img/components/ptp_components.png
new file mode 100644
index 0000000..ea70a00
Binary files /dev/null and b/docs/source/img/components/ptp_components.png differ
diff --git a/docs/source/img/components/ptp_components_others.png b/docs/source/img/components/ptp_components_others.png
new file mode 100644
index 0000000..a959dae
Binary files /dev/null and b/docs/source/img/components/ptp_components_others.png differ
diff --git a/docs/source/img/components/ptp_models.png b/docs/source/img/components/ptp_models.png
new file mode 100644
index 0000000..ab08406
Binary files /dev/null and b/docs/source/img/components/ptp_models.png differ
diff --git a/docs/source/img/components/ptp_tasks.png b/docs/source/img/components/ptp_tasks.png
new file mode 100644
index 0000000..360d001
Binary files /dev/null and b/docs/source/img/components/ptp_tasks.png differ