Monitoring of non-Theano channels. #349

Closed · rizar opened this issue Feb 25, 2015 · 15 comments
rizar (Contributor) commented Feb 25, 2015

A good example of when this is necessary: BLEU score or WER on the validation set.

The main challenge is that we again need aggregation schemes, and I do not think there is any chance of sharing code with the aggregation schemes for Theano variables without making it very hard to read.

So I am thinking about:

class MonitoredQuantity(object):
    def initialize(self):
        pass

    def accumulate(self, batch):
        pass

    def readout(self):
        pass

just as we have for Theano variables, except that there is no longer any need for separate Aggregator and AggregationScheme classes. In a way, the class is the scheme and the object is the aggregator.

In most cases the user will simply pass a callback, and their callback will be wrapped into a default MonitoredQuantity descendant that does simple averaging. I would put this logic into DataStreamMonitoring, whereas the DatasetEvaluator would simply call the initialize, accumulate, and readout methods.
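A minimal sketch of such a default wrapper, assuming the interface above (the class name is hypothetical, not a settled API):

class AveragedCallbackQuantity(MonitoredQuantity):
    """Wrap a user callback and average its per-batch values."""
    def __init__(self, callback):
        self.callback = callback

    def initialize(self):
        self.total, self.batches_seen = 0.0, 0

    def accumulate(self, batch):
        # The callback computes one scalar per batch; we average them.
        self.total += self.callback(batch)
        self.batches_seen += 1

    def readout(self):
        return self.total / self.batches_seen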

If you guys like this plan, I think I will implement it pretty soon.

rizar added this to the Blocks 0.1 milestone Feb 25, 2015
bartvm (Member) commented Feb 25, 2015

@mohammadpz was asked to implement this for speech recognition, and I already discussed this with him. I proposed a simpler idea though, considering you said you were against implementing any kind of aggregation logic.

My idea then was to simply pass a callback function (basically the accumulate part only) and add a list of values to the log, which you would have to post-process/aggregate manually afterwards.

If you want to add aggregation, that's fine by me, but two things about your pseudo-code: your accumulate function needs not only the batch, it might also need the outputs of some of your Theano functions, so you'll need to be able to specify which ones it expects. You will also want to be able to specify which sources of the batch it should be given; otherwise you will need to rewrite your monitoring channel each time your model has different input sources (e.g. for the supervised and unsupervised cases).

So something like this: DatasetEvaluator(callback=(BLEU, ('features',), [beam_search_output])) would make accumulate expect the input sentence and the predicted sentence as outputs.

Maybe this is also a good moment to lift the requirement that our data stream provides exactly the same sources as the model inputs. It is quite possible that you will want to monitor something here that takes as input data that does not need to go into the model.

Lastly, thinking about it, all of this logic is almost too general just for monitoring, and we might want to call it "callback" instead. For example, this would be a nice place to save the samples of a denoising model to a figure. If you force that into a separate extension, you are forced to reimplement the logic for compiling the Theano function, iterating over the validation stream, etc., all of which would be handled here.

rizar (Contributor, Author) commented Feb 25, 2015

On 25.02.2015 13:32, Bart van Merriënboer wrote:

> @mohammadpz was asked to implement this for speech recognition, and I already discussed this with him. I proposed a simpler idea though, considering you said you were against implementing any kind of aggregation logic.

While it is true that I was against it, it now seems not so difficult, under the assumption that we do not try to integrate with the aggregation of Theano variables. That would be hell.

> My idea then was to simply pass a callback function (basically the accumulate part only) and add a list of values to the log which you would have to post-process/aggregate manually afterwards.

Hmm, that would make the log look quite ugly. Currently the printing extension prints all channels that do not start with an underscore. I would not like it to print these lists.

> If you want to add aggregation, that's fine by me, but two things about your pseudo code: Your accumulate function needs not only the batch, it might also need the output of some of your Theano functions.

The same for the callback you propose, right?

> You'll need to be able to define the ones it expects. You will also want to be able to define what sources of the batch it should be given, else you will need to re-write your monitor channel each time (e.g. for the supervised and unsupervised case) if your model has different input sources.

Fortunately, the PyLearn2-reminiscent problem you anticipate above seems not to apply: the data format DatasetEvaluator operates on is a dictionary of data from the different sources.

> So something like this: DatasetEvaluator(callback=(BLEU, ('features',), [beam_search_output])) would make accumulate expect the input sentence and the predicted sentence as an output.

I am not sure I understand this code; also, the beam search output is not a Theano variable. It is computed mostly in NumPy.

> Maybe this is also a good moment to lift the requirement that our data stream provides exactly the same sources as the model inputs. It's feasible that you will want to monitor something here that takes data as an input that doesn't need to go into the model.

That's quite a good point. But this requirement is a very good
protection from non-trivial bugs... I would make it switchable and on by
default.

> Lastly, thinking about it, all of this logic is almost too general just for monitoring, and we might want to call it "callback" instead. For example, this would be a nice place to save the samples of a denoising model to a figure. If you force that into a separate extension you're forced to reimplement the logic for compiling the Theano function, iterating over the validation stream, etc. which would all be handled here.

No problem, it can be used for other stuff, but I think that terminology-wise it is better to stick to the very clear use case of BLEU and WER. That way we will have readable code with no extra abstraction layer.



bartvm (Member) commented Feb 25, 2015

For printing I figured we could do something like print str(value)[:25] + '...' if len(str(value)) > 25 else value. But let's go with aggregation; I like that much better, since that way you can at least see the actual BLEU score instead of only calculating it afterwards (allowing you to e.g. do early stopping on it).

> The same for the callback you propose, right?

My point is just the following: any monitored quantity should be able to specify a list of data stream sources it needs, as well as a list of Theano variables whose values it wants. These Theano variables can then be compiled together with the rest of the DatasetEvaluator. Let's say I want to calculate the WER: I will need the target sentence (from the data stream) and the output sentence (let's say without beam search, so a Theano variable). I don't see the point of monitoring on the batches of the data stream alone. Consider this example:

class LevenshteinDistance(MonitoredQuantity):
    """Edit distance between language model predictions and targets.

    Assumes on-line learning (i.e. only one example per batch).

    """
    def __init__(self, vocab):
        self.vocab = vocab
        self.total_distance, self.examples_seen = 0, 0

    def accumulate(self, batch, outputs):
        target_words, = batch
        predicted_words, = outputs
        # levenshtein_distance is assumed to be defined elsewhere
        self.total_distance += levenshtein_distance(
            ' '.join(self.vocab[word] for word in target_words),
            ' '.join(self.vocab[word] for word in predicted_words)
        )
        self.examples_seen += 1

    def readout(self):
        return self.total_distance / float(self.examples_seen)

This would then be invoked by something like:

DatasetEvaluator(callbacks=[(LevenshteinDistance, ('targets',), [tensor.argmax(probs, axis=1)], vocab)])

This tells the DatasetEvaluator to compile the expression tensor.argmax(probs, axis=1) together with the rest of the Theano variables. It then calls LevenshteinDistance.accumulate each iteration with the requested data from the data stream ('targets', e.g. the next 3 words which the language model had to predict) and the outputs of the requested Theano variables (the predictions in this case).
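A rough sketch of the loop this implies, assuming input variables named after the data stream's sources; inputs, probs, vocab, and data_stream are assumed to exist, and none of these names are a settled API:

# Hypothetical internals of DatasetEvaluator for a single quantity.
compute_outputs = theano.function(inputs, [tensor.argmax(probs, axis=1)])

quantity = LevenshteinDistance(vocab)
quantity.initialize()
for batch in data_stream.get_epoch_iterator(as_dict=True):
    # Feed the compiled function from the batch, matching inputs by name.
    outputs = compute_outputs(*(batch[var.name] for var in inputs))
    # Pass the requested sources and the computed outputs.
    quantity.accumulate([batch[source] for source in ('targets',)], outputs)
result = quantity.readout()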

mpezeshki (Member) commented
@bartvm @rizar, thank you guys. I will try to understand your ideas and get back to you ASAP.

rizar (Contributor, Author) commented Feb 25, 2015

Okay, I like your idea with outputs; it is crystal clear. I still think that requesting sources is redundant and complicates the picture. Why not give accumulate the dictionary for the whole batch?

I would make the requested Theano variables a part of the MonitoredQuantity interface. That is:

DatasetEvaluator(quantities=[
    # supported now
    a_variable, 
    # the innovation
    LevenshteinDistance(requires=[tensor.argmax(probs, axis=1)]),
    # a shortcut for creating simple monitored quantities
    a_callback,  
])
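Under this proposal, the base class from the first comment would grow a requires attribute, something like the following sketch (names are tentative):

class MonitoredQuantity(object):
    def __init__(self, requires=None):
        # Theano expressions whose computed values are passed to
        # `accumulate` alongside the batch.
        self.requires = requires or []

    def initialize(self):
        pass

    def accumulate(self, batch, *outputs):
        pass

    def readout(self):
        pass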

bartvm (Member) commented Feb 25, 2015

Because you don't know what the names of the sources are. What might be called features in one dataset could be called input_words or features0 in another. You can't rely on a particular naming scheme for sources inside of the accumulate function, so you need to tell the evaluator somehow which ones to pass and in what order, so that the accumulate function knows how to consume them.

The alternative is to use a kind of RenameStream, but that gets really messy, because then the user also needs to change the names of his variables, just to please a monitoring channel.

rizar (Contributor, Author) commented Feb 25, 2015

If you do not want to rely on source names, you can add the input variables of your computation graph as requests. I just do not want two different ways of requesting data for one poor little class.

bartvm (Member) commented Feb 25, 2015

Two things:

1. That removes the option of requesting sources that the model doesn't use, which I think is important: e.g. there might be some sort of metadata that my model is not aware of but that I want to use for monitoring. As a random example, let's say my data comes from different domains, and I want to check the BLEU score difference between these domains. I could easily add a data source with this information and feed that to the extension, but there's no Theano variable for it.

2. How intelligent is Theano when it comes to outputting input variables? I just want to make sure we're not doubling the memory used or, worse, transferring data to the GPU and immediately transferring it back.

I understand that it seems overly complicated, but I gave it a bit of thought yesterday, and so far I don't see a simpler solution that is flexible enough. This is a pretty important callback, which should be able to do all sorts of crazy monitoring things, so I don't like the idea of hardcoding source names or limiting it to the same data the model consumes.

rizar (Contributor, Author) commented Feb 25, 2015

First: well, you can create a stand-alone variable with the same name as the source you need, and it will be matched with the corresponding data somewhere in DatasetEvaluator. This case is not an exception to the general scheme.

Second: if Theano is inefficient, we can handle input variables differently under the hood. The user would keep using variables only, which is very Block-ish. The data-source abstraction would be prohibited from spreading all over the Blocks code.
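For example (hypothetical code; it assumes a source named 'domains' and the requires interface sketched above), a free-standing variable could stand in for raw data such as the domain metadata mentioned earlier:

import theano.tensor as tensor

# A stand-alone variable that is not part of the model's graph; the
# evaluator would match it to the data source with the same name.
domains = tensor.ivector('domains')
quantity = LevenshteinDistance(
    vocab, requires=[tensor.argmax(probs, axis=1), domains])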

bartvm (Member) commented Feb 25, 2015

Not sure I find it Block-ish to create fake Theano variables that aren't part of any computation graph, but I guess it could keep the interface a bit cleaner.

I think that if we do that, we should still ease the restriction on the sources and input variables matching exactly. I would find it annoying if I removed a MonitoredQuantity and suddenly got an error because my data stream now provides a source that isn't used anymore. I think we should just raise a warning when there are unused sources, something along the lines of "WARNING: Your data stream is providing sources that are not being used by the model, it might be more efficient to not load these sources at all."
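A sketch of such a check (the attribute names are assumptions, not the actual Blocks API):

import warnings

# Hypothetical check inside the evaluator: warn about unused sources
# instead of raising an error.
unused_sources = set(data_stream.sources) - set(used_sources)
if unused_sources:
    warnings.warn('Your data stream is providing sources that are not '
                  'being used by the model ({}); it might be more '
                  'efficient to not load these sources at all.'
                  .format(', '.join(sorted(unused_sources))))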

bartvm (Member) commented Feb 25, 2015

@mohammadpz I think we've converged on a first concept :) Also, @rizar's interface as proposed in #349 (comment) is nicer than my idea of passing a tuple to callbacks=, so let's stick with that. Let me know if you have any questions, this has been a lengthy discussion, like usual :p

rizar (Contributor, Author) commented Feb 25, 2015

In DatasetEvaluator we do not have such a restriction at all, so removing your quantity will not change anything. But I do not mind switching to a warning there anyway.

Regarding the implementation: it will be way easier if in the first draft we do not try to integrate with the AggregationBuffer, but instead compile a separate function to provide inputs for these monitored quantities, even though this can be twice as slow in cases where everything has to be recomputed. After that we can try to have all the computations done in one Theano function.
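A first-draft sketch of that separate function, assuming the requires interface sketched earlier (quantities and inputs are hypothetical names):

from itertools import chain

import theano

# Hypothetical first draft: one extra compiled function for all the
# non-Theano quantities, kept separate from the AggregationBuffer.
required = list(chain.from_iterable(q.requires for q in quantities))
compute_required = theano.function(inputs, required)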

mpezeshki (Member) commented
@bartvm , sure :)

bartvm added and then removed the CCW label Mar 2, 2015
bartvm (Member) commented Mar 2, 2015

Just had a look, and it does seem we need to handle input variables for sources that are not part of the graph separately. If you do theano.function([x], [x]) it results in a DeepCopyOp, which is clearly overkill. However, maybe we can leave this optimization for a later ticket (let's just not forget to make the new issue after completing this one).
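For reference, a minimal snippet reproducing the behaviour described:

import theano
import theano.tensor as tensor

x = tensor.matrix('x')
f = theano.function([x], [x])
theano.printing.debugprint(f)  # the output is wrapped in a DeepCopyOp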

dwf (Contributor) commented Mar 23, 2015

Addressed in #524.

dwf closed this as completed Mar 23, 2015