# Monitoring of non-Theano channels #349
---

@mohammadpz was asked to implement this for speech recognition, and I already discussed it with him. I proposed a simpler idea, though, considering you said you were against implementing any kind of aggregation logic: simply pass a callback function (basically the accumulate part only) and add a list of values to the log, which you would then have to post-process/aggregate manually afterwards. If you want to add aggregation, that's fine by me, but two things about your pseudo-code:

Your accumulate function needs not only the batch; it might also need the output of some of your Theano functions. You'll need to be able to define which ones it expects. You will also want to be able to define which sources of the batch it should be given, else you will need to rewrite your monitor channel each time (e.g. for the supervised and unsupervised cases) if your model has different input sources. So something like this: …

Maybe this is also a good moment to lift the requirement that our data stream provides exactly the same sources as the model inputs. It is feasible that you will want to monitor something here that takes data as an input that doesn't need to go into the model.

Lastly, thinking about it, all of this logic is almost too general just for monitoring, and we might want to call it "callback" instead. For example, this would be a nice place to save the samples of a denoising model to a figure. If you force that into a separate extension, you're forced to reimplement the logic for compiling the Theano function, iterating over the validation stream, etc., which would all be handled here.
---

For printing I figured we could do something like …

My point is just the following: any monitoring quantity should be able to specify a list of data stream sources it needs, as well as a list of Theano variables from which it wants the value. These Theano variables can then be compiled together with the rest of the …

```python
class LevenshteinDistance(MonitoredQuantity):
    """Edit distance between language model predictions and targets.

    Assumes on-line learning (i.e. only one example per batch).
    """
    def __init__(self, vocab):
        self.vocab = vocab
        self.total_distance, self.examples_seen = 0, 0

    def accumulate(self, batch, outputs):
        target_words, = batch
        predicted_words, = outputs
        self.total_distance += levenshtein_distance(
            ' '.join(self.vocab[target_word] for target_word in target_words),
            ' '.join(self.vocab[predicted_word]
                     for predicted_word in predicted_words))
        self.examples_seen += 1

    def readout(self):
        return self.total_distance / float(self.examples_seen)
```

This would then be invoked by something like:

```python
DatasetEvaluator(callbacks=[
    (LevenshteinDistance, ('targets',), [tensor.argmax(probs, axis=1)], vocab)])
```

Telling the …
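The accumulate/readout protocol sketched above implies a driver loop in the evaluator. Here is a minimal pure-Python sketch of such a loop, with no Theano involved; `evaluate`, `MeanAbsoluteError`, and the batch/outputs shapes are illustrative stand-ins, not actual Blocks APIs:

```python
def evaluate(quantity, batches, compute_outputs):
    """Drive a MonitoredQuantity-style object over a data stream.

    `quantity` is assumed to expose accumulate(batch, outputs) and
    readout(); `compute_outputs` stands in for the compiled Theano
    function that produces the requested output values per batch.
    """
    for batch in batches:
        quantity.accumulate(batch, compute_outputs(batch))
    return quantity.readout()


class MeanAbsoluteError:
    """A toy quantity following the same accumulate/readout protocol."""

    def __init__(self):
        self.total, self.examples_seen = 0.0, 0

    def accumulate(self, batch, outputs):
        # One source (targets) and one requested output (predictions).
        (targets,), (predictions,) = batch, outputs
        self.total += sum(abs(t - p) for t, p in zip(targets, predictions))
        self.examples_seen += len(targets)

    def readout(self):
        return self.total / self.examples_seen
```

The real evaluator would additionally compile the requested Theano variables into the function it already builds for the monitored variables.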
---

Okay, I like your idea with outputs; it is crystal clear. I still think that requesting sources is redundant and complicates the picture. Why not give … I would make the requested Theano variables a part of the `DatasetEvaluator` call:

```python
DatasetEvaluator(quantities=[
    # supported now
    a_variable,
    # the innovation
    LevenshteinDistance(requires=[tensor.argmax(probs, axis=1)]),
    # a shortcut for creating simple monitored quantities
    a_callback,
])
```
---

Because you don't know what the names of the sources are. What might be called … The alternative is to use a kind of …
---

If you do not want to rely on source names, you can add the input variables of your computation graph as requests. I just do not want two different ways of requesting data for one poor little class.
---

Two things:

First, that removes the option of requesting sources that the model doesn't use, which I think is important: there might be some sort of metadata that my model is not aware of but that I want to use for monitoring. As a random example, let's say my data is from different domains, and I want to check the BLEU score difference between these domains. I could easily add a data source with this information and feed that to the extension, but there's no Theano variable for it.

Secondly, how intelligent is Theano when it comes to outputting input variables? I just want to make sure we're not doubling the memory used, or worse, transferring data to the GPU and immediately transferring it back.

I understand that it seems overly complicated, but I gave it a bit of thought yesterday, and so far I don't see a simpler solution that is flexible enough. This is a pretty important callback, which should be able to do all sorts of crazy monitoring things, so I don't like the idea of hardcoding source names, or limiting it to the same data the model consumes.
---

First: well, you can create a stand-alone variable with the same name as the source you need, and it will be matched with the corresponding data somewhere in …

Second: if Theano is inefficient, we can handle input variables differently under the hood. The user would keep using variables only, which is very Block-ish. The abstraction of a data source should be prohibited from spreading all over the Blocks code.
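As a rough illustration of the name-based matching suggested here, a stand-alone variable could be paired with a data source purely by name. This is a hypothetical pure-Python sketch; `Variable` and `match_requests_to_sources` are stand-ins, not actual Blocks or Theano code:

```python
class Variable:
    """Stand-in for a Theano variable: only its name matters here."""

    def __init__(self, name):
        self.name = name


def match_requests_to_sources(requests, batch, source_names):
    """Return the batch entry whose source name matches each requested
    variable's name, raising if some name has no matching source."""
    data = dict(zip(source_names, batch))
    missing = [v.name for v in requests if v.name not in data]
    if missing:
        raise ValueError("no source for: " + ", ".join(missing))
    return [data[v.name] for v in requests]
```

Under the hood, such variables would never be passed through a compiled Theano function, sidestepping the memory concern raised above.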
---

Not sure I find it Block-ish to create fake Theano variables that aren't part of any computation graph, but I guess it could keep the interface a bit cleaner. I think that if we do that, we should still ease the restriction on the sources and input variables matching exactly. I would find it annoying if I remove a …
---

@mohammadpz I think we've converged on a first concept :) Also, @rizar's interface as proposed in #349 (comment) is nicer than my idea of passing a tuple to …
---

In …

Regarding the implementation of this: it will be way easier if in the first draft we do not try to integrate with the …
---

@bartvm, sure :)
---

Just had a look, and it does seem we need to handle input variables for sources that are not part of the graph separately. If you do …
---

Addressed in #524.
---

Good example when it is necessary: BLEU score or WER on the validation set.

The main challenge is that we again need aggregation schemes, and I do not think there is any chance to share code with the aggregation schemes for Theano variables without making it very hard to read.

So I think about a `MonitoredQuantity` class with `initialize`, `accumulate` and `readout` methods, just as we have for Theano variables, except that there is no longer any need to have separate `Aggregator` and `AggregationScheme` classes. In a way the class is the scheme, and the object is the aggregator.

In most cases the user will simply pass a callback, and his callback will be wrapped into a default `MonitoredQuantity` descendant that will do simple averaging. I would put this logic into the `DataStreamMonitoring`, whereas the `DatasetEvaluator` would simply call the `initialize`, `accumulate` and `readout` methods.

If you guys like this plan, I think I will implement it pretty soon.
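The default-averaging wrapper described in this plan could look roughly like the following pure-Python sketch; `CallbackQuantity` is a hypothetical name, and the real implementation would live in `DataStreamMonitoring` and may differ:

```python
class MonitoredQuantity:
    """Minimal version of the proposed interface: the class plays the
    role of the aggregation scheme, the instance that of the aggregator."""

    def initialize(self):
        pass

    def accumulate(self, batch, outputs):
        raise NotImplementedError

    def readout(self):
        raise NotImplementedError


class CallbackQuantity(MonitoredQuantity):
    """Wraps a plain callback and simply averages its return values."""

    def __init__(self, callback):
        self.callback = callback
        self.initialize()

    def initialize(self):
        # Reset the running average before each pass over the stream.
        self.total, self.batches_seen = 0.0, 0

    def accumulate(self, batch, outputs):
        self.total += self.callback(batch, outputs)
        self.batches_seen += 1

    def readout(self):
        return self.total / self.batches_seen
```

With this, the `DatasetEvaluator` can treat a user-supplied callback exactly like any hand-written quantity: call `initialize`, `accumulate` per batch, then `readout`.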