In [None]:
!curl -O https://raw.githubusercontent.com/deepjavalibrary/d2l-java/master/tools/fix-colab-gpu.sh && bash fix-colab-gpu.sh

## Prepare Java Kernel for Google Colab
Since Colab does not support Java natively, we need to run the following code to enable the Java kernel on Colab.

1. Run the cell below (click it and press Shift+Enter).
2. (Skip this step if you are training on CPU.) To use the GPU with MXNet in DJL 0.10.0, you need CUDA 10.1 or CUDA 10.2.
Since Colab supports CUDA 10.1, a few steps are needed to set up the environment.
Refresh the page (press F5) and stay on the Python runtime with GPU. Run the `fix-colab-gpu.sh` script.

Then make sure that you have switched to CUDA 10.1.
3. After that, switch the runtime to Java and the hardware to GPU. (This might require refreshing the page and switching the runtime.)

Now you can write Java code.

In [None]:
!curl -O https://raw.githubusercontent.com/deepjavalibrary/d2l-java/master/tools/colab_build.sh && bash colab_build.sh

# Encoder-Decoder Architecture
:label:`sec_encoder-decoder`

As we have discussed in 
:numref:`sec_machine_translation`,
machine translation
is a major problem domain for sequence transduction models,
whose input and output are
both variable-length sequences.
To handle inputs and outputs of this type,
we can design an architecture with two major components.
The first component is an *encoder*:
it takes a variable-length sequence as the input and transforms it into a state with a fixed shape.
The second component is a *decoder*:
it maps the encoded state of a fixed shape
to a variable-length sequence.
This is called an *encoder-decoder* architecture,
which is depicted in :numref:`fig_encoder_decoder`.

![The encoder-decoder architecture.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/encoder-decoder.svg?raw=1)
:label:`fig_encoder_decoder`

Let us take machine translation from English to French
as an example.
Given an input sequence in English:
"They", "are", "watching", ".",
this encoder-decoder architecture
first encodes the variable-length input into a state,
then decodes the state 
to generate the translated sequence token by token
as the output:
"Ils", "regardent", ".".
Since the encoder-decoder architecture
forms the basis
of different sequence transduction models
in subsequent sections,
this section will convert this architecture
into an interface that will be implemented later.
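
To make the shape contract concrete, here is a toy sketch in plain Java (no DJL).
The class name `ToyMeanEncoder`, the state size, and the hashing scheme are all hypothetical and chosen only for illustration; a real encoder would learn its parameters.
The only point is that inputs of different lengths map to states of the same fixed shape.

```java
import java.util.Arrays;
import java.util.List;

/** Toy encoder (hypothetical, for illustration only): maps any number of tokens to a fixed-size state. */
class ToyMeanEncoder {
    private final int stateSize;

    ToyMeanEncoder(int stateSize) {
        this.stateSize = stateSize;
    }

    /** Averages a one-hot-like vector per token; the result always has length stateSize. */
    double[] encode(List<String> tokens) {
        double[] state = new double[stateSize];
        for (String token : tokens) {
            // Bucket the token by its hash code; Math.floorMod keeps the index non-negative.
            int bucket = Math.floorMod(token.hashCode(), stateSize);
            state[bucket] += 1.0 / tokens.size();
        }
        return state;
    }
}
```

For instance, with `new ToyMeanEncoder(4)`, encoding `Arrays.asList("They", "are", "watching", ".")` and encoding `Arrays.asList("Hello", ".")` both return an array of length 4, even though the inputs differ in length.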

## Encoder

In the encoder interface,
we just specify that
the encoder takes variable-length sequences as the input.
The implementation will be provided 
by any model that inherits from this base `Encoder` class.


In [None]:
%load ../utils/djl-imports

In [None]:
/** The base encoder interface for the encoder-decoder architecture. */
public abstract class Encoder extends AbstractBlock {

    private static final byte VERSION = 1;

    public Encoder() {
        super(VERSION);
    }

    @Override
    protected abstract NDList forwardInternal(
            ParameterStore parameterStore,
            NDList inputs,
            boolean training,
            PairList<String, Object> params);

    @Override
    public Shape[] getOutputShapes(Shape[] inputShapes) {
        throw new UnsupportedOperationException("Not implemented");
    }
}

## Decoder

In the following decoder interface,
we add an `initState` function
to convert the encoder output (`encOutputs`)
into the encoded state.
Note that this step
may need extra inputs such as 
the valid length of the input,
which was explained
in :numref:`subsec_mt_data_loading`.
To generate a variable-length sequence token by token,
at every time step the decoder
can map an input (e.g., the token generated at the previous time step)
and the encoded state
into an output token at the current time step.

In [None]:
/** The base decoder interface for the encoder-decoder architecture. */
public abstract class Decoder extends AbstractBlock {

    private static final byte VERSION = 1;

    public NDArray attentionWeights;

    public Decoder() {
        super(VERSION);
    }

    @Override
    protected abstract NDList forwardInternal(
            ParameterStore parameterStore,
            NDList inputs,
            boolean training,
            PairList<String, Object> params);

    /** Converts the encoder output into the encoded state. */
    public abstract NDList initState(NDList encOutputs);

    @Override
    public Shape[] getOutputShapes(Shape[] inputShapes) {
        throw new UnsupportedOperationException("Not implemented");
    }
}
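
The token-by-token generation described in this section can be sketched as a loop in plain Java.
This is a minimal sketch, not the DJL implementation: `ToyStepDecoder`, the `<bos>` and `<eos>` markers, and the lookup table standing in for a learned model are all hypothetical.
In this toy, the "state" is simply a table from the previous token to the next token, and each step maps the previous token and the state to the current token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy decoder (hypothetical): generates tokens one at a time from a fixed state. */
class ToyStepDecoder {
    static final String BOS = "<bos>"; // assumed beginning-of-sequence marker
    static final String EOS = "<eos>"; // assumed end-of-sequence marker

    /** One decoding step: maps the previous token and the state to the next token. */
    static String step(String previousToken, Map<String, String> state) {
        // Unknown tokens end the sequence so that the toy always terminates.
        return state.getOrDefault(previousToken, EOS);
    }

    /** Repeats step() until EOS is produced or maxLength tokens have been generated. */
    static List<String> decode(Map<String, String> state, int maxLength) {
        List<String> output = new ArrayList<>();
        String previous = BOS;
        while (output.size() < maxLength) {
            String next = step(previous, state);
            if (EOS.equals(next)) {
                break;
            }
            output.add(next);
            previous = next;
        }
        return output;
    }
}
```

The length of the output is not fixed in advance: it depends on when the end-of-sequence marker is produced, with `maxLength` acting only as a safety cap.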


## Putting the Encoder and Decoder Together

In the end,
the encoder-decoder architecture
contains both an encoder and a decoder,
optionally with extra arguments.
In the forward propagation,
the output of the encoder
is used to produce the encoded state,
and this state
will be further used by the decoder as one of its inputs.


In [None]:
/** The base class for the encoder-decoder architecture. */
public class EncoderDecoder extends AbstractBlock {

    private static final byte VERSION = 1;

    public Encoder encoder;
    public Decoder decoder;

    public EncoderDecoder(Encoder encoder, Decoder decoder) {
        super(VERSION);

        this.encoder = encoder;
        this.addChildBlock("encoder", this.encoder);
        this.decoder = decoder;
        this.addChildBlock("decoder", this.decoder);
    }

    /** {@inheritDoc} */
    @Override
    public void initializeChildBlocks(NDManager manager, DataType dataType, Shape... inputShapes) {
    }

    @Override
    protected NDList forwardInternal(
            ParameterStore parameterStore,
            NDList inputs,
            boolean training,
            PairList<String, Object> params) {
        // inputs holds the encoder input first and the decoder input second.
        NDArray encX = inputs.get(0);
        NDArray decX = inputs.get(1);
        // Encode the source sequence, then convert the encoder output into the encoded state.
        NDList encOutputs = this.encoder.forward(parameterStore, new NDList(encX), training, params);
        NDList decState = this.decoder.initState(encOutputs);
        // Pass the decoder input, followed by the state, to the decoder.
        return this.decoder.forward(parameterStore, new NDList(decX).addAll(decState), training, params);
    }

    @Override
    public Shape[] getOutputShapes(Shape[] inputShapes) {
        throw new UnsupportedOperationException("Not implemented");
    }
}
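
Note how `forwardInternal` packs the decoder input and the state into a single list: `decX` comes first, followed by the elements of `decState`.
A concrete decoder would then presumably split its input the same way.
The plain-Java analogue below uses `List<double[]>` in place of `NDList`; the class `DecoderInputs` and its helpers `pack`, `decoderInput`, and `decoderState` are hypothetical and not part of DJL.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java analogue (hypothetical) of the [decX, state...] packing convention. */
class DecoderInputs {
    /** Builds a single list: the decoder input first, then every state element. */
    static List<double[]> pack(double[] decX, List<double[]> decState) {
        List<double[]> packed = new ArrayList<>();
        packed.add(decX);
        packed.addAll(decState);
        return packed;
    }

    /** The decoder input is the first element of the packed list. */
    static double[] decoderInput(List<double[]> packed) {
        return packed.get(0);
    }

    /** The state is everything after the first element. */
    static List<double[]> decoderState(List<double[]> packed) {
        return new ArrayList<>(packed.subList(1, packed.size()));
    }
}
```

Keeping the input at a fixed position means the state can contain any number of arrays without changing how the decoder finds its input.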

The term "state" in the encoder-decoder architecture
has probably inspired you to implement this
architecture using neural networks with states.
In the next section,
we will see how to apply RNNs to design 
sequence transduction models based on 
this encoder-decoder architecture.


## Summary

* The encoder-decoder architecture can handle inputs and outputs that are both variable-length sequences, and is thus suitable for sequence transduction problems such as machine translation.
* The encoder takes a variable-length sequence as the input and transforms it into a state with a fixed shape.
* The decoder maps the encoded state of a fixed shape to a variable-length sequence.


## Exercises

1. Suppose that we use neural networks to implement the encoder-decoder architecture. Do the encoder and the decoder have to be the same type of neural network?  
1. Besides machine translation, can you think of another application where the encoder-decoder architecture can be applied?
