
Attempting to write a relation network in Gluon #8625

Open

anjishnu (Contributor) commented Nov 12, 2017

I was trying to implement a Relation Network (https://arxiv.org/abs/1706.01427) and hit a runtime error. The error seems to be related to using for loops within the forward pass, which I assumed would be supported since the API is similar to PyTorch's.
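For reference, the paper's Relation Network computes RN(O) = f_φ(Σ_{i,j} g_θ(o_i, o_j)): a shared MLP g_θ is applied to every pair of objects, the pairwise outputs are summed, and the sum is passed through f_φ. The code below uses the bidirectional RNN states as the objects.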

The implementation is below:

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn, rnn

class RecurrentRelational(gluon.Block):
    def __init__(self, dim=100, num_layers=1, layout='TNC',
                 **kwargs):
        super(RecurrentRelational, self).__init__(**kwargs)
        self.key = 'recurrent-relational'
        with self.name_scope():
            # layers created in name_scope will inherit name space
            # from parent layer.
            #self.dropout = nn.Dropout(0.3)
            self.hidden_size = dim
            self.num_layers = num_layers
            self.layout = layout
            self.bn = nn.BatchNorm()

            # Recurrent encoder (bidirectional, so each step yields
            # 2 * hidden_size features)
            self.rnn = rnn.RNN(self.hidden_size, self.num_layers,
                               layout=self.layout, bidirectional=True)
            # Relational Network
            self.g_hidden = 100
            self.relational_g1 = nn.Dense(self.g_hidden, activation='relu')
            self.relational_g2 = nn.Dense(self.g_hidden, activation='relu')

            self.relational_f = nn.Dense(100, activation='relu')
            # End RN

            self.binary = nn.Dense(2)

    def activate_relation(self, relation_vector):
        g_z = self.relational_g1(relation_vector)
        g_z = self.bn(g_z)
        g_z = self.relational_g2(g_z)
        return g_z

    def activate_aggregation(self, aggregation):
        return self.relational_f(self.bn(aggregation))

    # Tuple unpacking in the signature, forward(self, (x1, x2)), is
    # Python-2-only syntax; take the two inputs as separate arguments.
    def forward(self, x1, x2):
        z1 = self.rnn(x1)
        z2 = self.rnn(x2)
        # NB: with layout='TNC' the first axis is time and the second is
        # batch, so these names are only accurate for layout='NTC'.
        batch_size, seq_len, hidden_dim = z1.shape
        num_objects = z1.shape[1]
        all_object_pairs = []

        # Pair every encoded object in z1 with every object in z2.
        for i in range(num_objects):
            first_object = z1[:, i, :]
            for j in range(num_objects):
                second_object = z2[:, j, :]
                relation_vector = mx.nd.concat(first_object, second_object, dim=1)
                all_object_pairs.append(relation_vector)

        all_relations = mx.nd.concat(*all_object_pairs, dim=0)
        z_rel = self.activate_relation(all_relations).reshape(
            (-1, num_objects * num_objects, self.g_hidden))
        z_agg = mx.nd.sum(z_rel, axis=1)  # sum over all object pairs
        return self.binary(self.activate_aggregation(z_agg))
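
A minimal usage sketch (my own example, not from the issue; shapes assume layout='TNC', i.e. (seq_len, batch, features), and rely on Gluon's deferred shape inference):

net = RecurrentRelational(dim=100)
net.initialize()
x1 = mx.nd.ones((7, 4, 100))  # hypothetical (seq_len, batch, features) input
x2 = mx.nd.ones((7, 4, 100))
out = net(x1, x2)             # binary logits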

The error I'm getting is:

libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [16:57:57] src/engine/./threaded_engine.h:347: [16:57:57] src/operator/tensor/./matrix_op-inl.h:964: CropAssign only supports kWriteTo

Is there a different way to implement this that may avoid this issue?

I guess I essentially need to do the equivalent of the code below, but with the all_relations array being a memory view of the original array rather than a copy. Does anyone know of a good tutorial or example of how to implement this with the NDArray API?

        # Intended replacement for the pair-building loop in forward();
        # batch_size, num_objects, hidden_dim, z1 and z2 are as defined there.
        num_relations = num_objects * num_objects
        #all_relations = []
        all_relations = mx.nd.zeros((batch_size * num_relations, hidden_dim * 2))
        for i in range(num_objects):
            first_object = z1[:, i, :]
            for j in range(num_objects):
                second_object = z2[:, j, :]
                relation_vector = mx.nd.concat(first_object, second_object, dim=1)
                start_index = ((i * num_objects) + j) * batch_size
                #all_relations.append(relation_vector)
                all_relations[start_index : start_index + batch_size] = relation_vector
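
One possible vectorized workaround (my own sketch, not from the thread): the in-place slice assignment above is what invokes the CropAssign operator, and repeatedly slicing z1 and z2 in a loop presumably also requires gradient accumulation in the backward pass that those operators did not support at the time. Building all pairs with broadcasting avoids both, assuming z1 and z2 are laid out as (batch, num_objects, hidden_dim):

        # (B, N, 1, H) broadcast against (B, 1, N, H) gives every pair.
        left = z1.expand_dims(2).broadcast_to(
            (batch_size, num_objects, num_objects, hidden_dim))
        right = z2.expand_dims(1).broadcast_to(
            (batch_size, num_objects, num_objects, hidden_dim))
        pairs = mx.nd.concat(left, right, dim=3)             # (B, N, N, 2H)
        all_relations = pairs.reshape((-1, 2 * hidden_dim))  # (B*N*N, 2H)

Rows here come out batch-major, so the later reshape back to (batch, num_objects * num_objects, g_hidden) lines up per sample.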
eric-haibin-lin (Member) commented:

@reminisce can you help take a look at this issue?

anjishnu (Contributor, Author) commented Nov 18, 2017

Relevant thread: https://discuss.mxnet.io/t/cross-product-style-architectures-with-gluon/271/3

The model doesn't throw an exception on the latest mainline branch built from source, but I haven't gotten the network to produce anything other than the same prediction for every sample.

And it throws an exception if I try to apply batch normalization to stabilize training.
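
One hedged guess from reading the code (mine, not from the thread): self.bn is a single nn.BatchNorm instance called at two different points, so both call sites share parameters and running statistics. Giving each site its own layer is a small change worth trying:

        # Sketch under that assumption: one BatchNorm per call site,
        # declared inside __init__'s name_scope().
        self.bn_g = nn.BatchNorm()  # use inside activate_relation
        self.bn_f = nn.BatchNorm()  # use inside activate_aggregation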
