diff --git a/README.md b/README.md
index ee9cd50..74c545e 100644
--- a/README.md
+++ b/README.md
@@ -11,52 +11,94 @@ ## Description

-PyTorchPipe (PTP) fosters the development of computational _pipelines_ and comparison of diverse neural network-based models.
+PyTorchPipe (PTP) is a component-oriented framework that facilitates development of computational _multi-modal pipelines_ and comparison of diverse neural network-based models.

PTP frames training and testing procedures as _pipelines_ consisting of many components communicating through data streams.
-Each such a stream can consist of several components, including one problem instance (providing batches of data), (zero-or-more) trainable models and (any number of) additional components providing required transformations and computations.
+Each such pipeline can consist of several components, including one problem instance (providing batches of data), any number of trainable components (models) and additional components providing required transformations and computations.

As a result, the training & testing procedures are no longer pinned to a specific problem or model, and built-in mechanisms for compatibility checking (handshaking), configuration management & statistics collection facilitate running diverse experiments.

-In its core, to _accelerate the computations_ on their own, PTP relies on PyTorch and extensively uses its mechanisms for distribution of computations on CPUs/GPUs.
+At its core, to _accelerate the computations_, PTP relies on PyTorch and extensively uses its mechanisms for distributing computations over CPUs/GPUs, including multi-threaded data loaders and multi-GPU data parallelism.
+The models are _agnostic_ to those operations: whether to use them is indicated in configuration files (data loaders) or by passing adequate run-time arguments (`--gpu`).
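+
+To make this concrete, the sketch below shows what a pipeline definition could look like in the YAML configuration style used throughout PTP: a problem, a trainable model and a loss component, connected through named data streams. This is a hypothetical sketch only; the component type names, priorities and stream keys are illustrative assumptions, not copied from the shipped configuration files.
+
+```yaml
+pipeline:
+  name: mnist_classification
+
+  # Problem: provides batches of data through the "inputs" and "targets" streams.
+  problem:
+    type: MNIST        # assumed type name
+    priority: 1
+
+  # Trainable model: consumes "inputs", produces "predictions".
+  model:
+    type: LeNet5       # assumed type name
+    priority: 2
+    streams:
+      inputs: inputs
+      predictions: predictions
+
+  # Loss: compares "predictions" against "targets"; handshaking verifies
+  # that the connected streams are compatible before the run starts.
+  loss:
+    type: NLLLoss      # assumed type name
+    priority: 3
+    streams:
+      targets: targets
+      predictions: predictions
+```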
+
+**Datasets:**
+PTP focuses on multi-modal reasoning combining vision and language. Currently it offers the following _Problems_ from both domains:
+
+ * ImageCLEF VQA-Med 2019 (Visual Question Answering)
+ * MNIST (Image Classification)
+ * WiLY (Language Identification)
+ * WikiText-2 / WikiText-103 (Language Modelling)
+ * ANKI (Machine Translation)
+
+Aside from providing batches of samples, the Problem class will also automatically download the files associated with a given dataset (as long as the dataset is publicly available).
+The diversity of those problems demonstrates the flexibility of the framework; we are working on incorporating new ones into PTP.
+
+**Model Zoo:**
+What people typically define as a _model_ is in PTP decomposed into components, with _Model_ being a derived class that contains trainable elements.
+Those components are loosely coupled and care only about the inputs they retrieve and the outputs they produce.
+The framework offers full flexibility and it is up to the programmer to choose the _granularity_ of their components/models.
+However, PTP provides several ready-to-use, out-of-the-box components, ranging from general-purpose ones to very specialized ones:
+
+ * Feed Forward Network (Fully Connected layers with activation functions and dropout, variable number of hidden layers, general usage)
+ * Torch Vision Wrapper (wrapping several models from Torch Vision, e.g. VGG-16, ResNet-50, ResNet-152, DenseNet-121, general usage)
+ * Convnet Encoder (CNNs with ReLU and MaxPooling, can work with different sizes of images)
+ * LeNet-5 (classical baseline)
+ * Recurrent Neural Network (different kernels with activation functions and dropout, a single model can work as either an encoder or a decoder, general usage)
+ * Seq2Seq (Sequence to Sequence model, classical baseline)
+ * Attention Decoder (RNN-based decoder implementing Bahdanau-style attention, classical baseline)
+ * Sentence Embeddings (encodes words using an embedding layer, general usage)
+
+Currently PTP offers the following models useful for multi-modal fusion and reasoning:
+
+ * VQA Attention (simple question-driven attention over the image)
+ * Element Wise Multiplication (Multi-modal Low-rank Bilinear pooling, MLB)
+ * Multimodal Compact Bilinear Pooling (MCB)
+ * Multimodal Factorized Bilinear Pooling
+ * Relational Networks
+
+The framework also offers several components useful when working with text:
+
+ * Sentence Tokenizer
+ * Sentence Indexer
+ * Sentence One Hot Encoder
+ * Label Indexer
+ * BoW Encoder
+ * Word Decoder
+
+and several general-purpose components, from tensor transformations (List to Tensor, Reshape Tensor, Reduce Tensor, Concatenate Tensor), through components calculating losses (NLL Loss) and statistics (Accuracy Statistics, Precision/Recall Statistics, BLEU Statistics etc.), to viewers (Stream Viewer, Stream File Exporter etc.).
+
+**Workers:**
+PTP workers are Python scripts that are _agnostic_ to the problems/models/pipelines that they are supposed to work with.
+Currently the framework offers two main workers (an example invocation follows the list):
+
+ * `ptp-online-trainer` (a flexible trainer creating separate instances of training and validation problems and training the models by feeding the created pipeline with batches of data, relying on the notion of an _episode_)
+ * `ptp-processor` (performing one pass over the samples returned by a given problem instance, useful for collecting scores on a test set, gathering answers for submissions to competitions etc.)
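+
+A typical run couples one of these workers with a configuration file, for example as below. The `--gpu` argument is described above; the configuration flag and file path are illustrative assumptions, not the verified command-line interface:
+
+```console
+ptp-online-trainer --config configs/mnist/mnist_classification.yml --gpu
+```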

## Installation

PTP relies on [PyTorch](https://github.com/pytorch/pytorch), so you need to install it first.
-Refer to the official installation [guide](https://github.com/pytorch/pytorch#installation) for its installation.
+Please refer to the official installation [guide](https://github.com/pytorch/pytorch#installation) for details.
It is easily installable via conda, or you can compile it from source to optimize it for your machine.

PTP is not (yet) available as a [pip](https://pip.pypa.io/en/stable/quickstart/) package, or on [conda](https://anaconda.org/pytorch/pytorch).
However, we provide the `setup.py` script and recommend using it for installation.
-First please clone the project repository::
+First please clone the project repository:

```console
git clone git@github.com:IBM/pytorchpipe.git
cd pytorchpipe/
```

-Then, install the dependencies by running::
-
-```console
-python setup.py install
-```
-
-This command will install all dependencies via pip_.
-
----
-**NOTE**
-
-If you plan to develop and introduce changes, please call the following command instead::
+Next, install the dependencies by running:

```console
python setup.py develop
```

-This will enable you to change the code of the existing components/workers and still be able to run them by calling the associated ``ptp-*`` commands.
+This command will install all dependencies via pip, while still enabling you to change the code of the existing components/workers and run them by calling the associated `ptp-*` commands.
+More on this subject can be found in the setuptools documentation on [development mode](https://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode).

----

## Maintainers

diff --git a/configs/default/components/models/attn_decoder_rnn..yml b/configs/default/components/models/attn_decoder_rnn..yml
deleted file mode 100644
index f676809..0000000
--- a/configs/default/components/models/attn_decoder_rnn..yml
+++ /dev/null
@@ -1,78 +0,0 @@
-# This file defines the default values for the GRU decoder with attention.
-
-####################################################################
-# 1. CONFIGURATION PARAMETERS that will be LOADED by the component.
-####################################################################
-
-# Size of the hidden state (LOADED)
-hidden_size: 100
-
-# Wether to include the last hidden state in the outputs
-output_last_state: False
-
-# Type of recurrent cell (LOADED)
-# -> Only GRU is supported
-
-# Number of "stacked" layers (LOADED)
-# -> Only a single layer is supported
-
-# Dropout rate (LOADED)
-# Default: 0 (means that it is turned off)
-dropout_rate: 0
-
-# Prediction mode (LOADED)
-# Options:
-# * Dense (passes every activation through output layer) |
-# * Last (passes only the last activation though output layer) |
-# * None (all outputs are discarded)
-prediction_mode: Dense
-
-# Enable FFN layer at the output of the RNN (before eventual feed back in the case of autoregression).
-# Useful if the raw outputs of the RNN are needed, for attention encoder-decoder for example.
-ffn_output: True
-
-# Length of generated output sequence (LOADED)
-# User must set it per task, as it is task specific.
-autoregression_length: 10
-
-# If true, output of the last layer will be additionally processed with Log Softmax (LOADED)
-use_logsoftmax: True
-
-streams:
-  ####################################################################
-  # 2. Keymappings associated with INPUT and OUTPUT streams.
-  ####################################################################
-
-  # Stream containing batch of encoder outputs (INPUT)
-  inputs: inputs
-
-  # Stream containing the inital state of the RNN (INPUT)
-  # The stream will be actually created only if `inital_state: Input`
-  input_state: input_state
-
-  # Stream containing predictions (OUTPUT)
-  predictions: predictions
-
-  # Stream containing the final output state of the RNN (output)
-  # The stream will be actually created only if `output_last_state: True`
-  output_state: output_state
-
-globals:
-  ####################################################################
-  # 3. Keymappings of variables that will be RETRIEVED from GLOBALS.
-  ####################################################################
-
-  # Size of the input (RETRIEVED)
-  input_size: input_size
-
-  # Size of the prediction (RETRIEVED)
-  prediction_size: prediction_size
-
-  ####################################################################
-  # 4. Keymappings associated with GLOBAL variables that will be SET.
-  ####################################################################
-
-  ####################################################################
-  # 5. Keymappings associated with statistics that will be ADDED.
-  ####################################################################
-