Review #1 #156

colah opened this issue Apr 18, 2018 · 0 comments

colah commented Apr 18, 2018

The following peer review was solicited as part of the Distill review process.

The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to review this article.

Conflicts of Interest: Reviewer disclosed a minor potential conflict of interest.

Overall Comments

I think this is just OK.

I will list the things I like about it and the things I don't like, then I'll give my thoughts as I had them during the first read-through. I don't have a good sense of what the acceptance criteria are for this venue, so the people in charge of combining the reviews are going to have to make some kind of determination about that based on my comments.

Having read some other Distill articles, I can say that I don't think this one is currently as good as the one about attention (which it is spiritually similar to). I think it is much less good than the How Momentum Works article, which in my mind is the model for how these articles should be. To be precise, what I mean by 'good' is: did this help me understand the topic in a way that I wouldn't have been able to without a lot of effort on my own?

Things I Like

I like that you point out that a bunch of different models are doing something that is basically a special case of this one idea. That's a valuable contribution that helps me understand the world better.

I like that you put effort into giving good descriptions of those models, though I think it's hard to really do a thorough job of this in the space you have.

I like the idea of task representations, though I felt way too little time was spent on it.

I think the presentation is mostly nice and the diagrams are clear, though there are quite a few typos.

Things I Don't Like

To me the most interesting property of these models is the one you describe near the end:

It appears that modulating the values of FiLM parameters — which we reiterate are a tiny subset of the model’s parameters —  is sufficient to have drastic effects on the computational behaviour of the FiLM-ed network.

But after reading this article I don't feel like I understand this phenomenon any better!

IMO, way too much time is spent on the examples and not enough time is spent on understanding why these layers work the way that they do. This feels more like cataloguing than explaining to me.

Detailed Comments on Text

often required to handling


sometimes referred to multimodal learning.


is not typicaly


For a small number of class


Concatenation-based conditioning is equivalent to conditional biasing.

I like this diagram and I like that there is a footnote that deals with my obvious question about conv-nets.
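The quoted equivalence can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes a single fully-connected layer, and the names `W_x`, `W_z`, and the dimensions are illustrative. Splitting the weight matrix column-wise shows that the conditioning input `z` only ever contributes an input-independent offset, i.e. a conditional bias.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_out = 5, 3, 4           # illustrative sizes (assumed)

x = rng.normal(size=d_x)            # layer input
z = rng.normal(size=d_z)            # conditioning representation
W = rng.normal(size=(d_out, d_x + d_z))
b = rng.normal(size=d_out)

# Concatenation-based conditioning: feed [x, z] through one linear layer.
concat_out = W @ np.concatenate([x, z]) + b

# Same computation, rewritten: the columns of W that act on z
# produce a bias that depends only on the conditioning input.
W_x, W_z = W[:, :d_x], W[:, d_x:]
conditional_bias = W_z @ z + b
bias_out = W_x @ x + conditional_bias

print(np.allclose(concat_out, bias_out))
```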

How about multiplicative interactions, then?

Something of a non-sequitur.
Not a huge deal, but would be nice to explain why we might care about this?

Given that conditioning with additive

Suggest moving this up to before the diagram.

Can batch norm itself be described using this FiLM terminology?
Maybe you do this later in the text.

You can interact with the following fully-connected

Can I?
Oh yes I can, but it wasn't immediately clear I could change the params but not the input.

I actually don't see why you need to contrast FiLM with attention; to me they are obviously different.
I guess doing so doesn't hurt much (it does disrupt flow a little).

weights forms a 3-tensor.

It doesn't seem like you'd lose anything by just calling this a multidimensional array...

we can represent a feature-wise affine transformation using a bilinear transformation

At this point I'm not sure this is helping me understand things better?
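For readers with the same question, one way to make the quoted statement concrete is the sketch below. It is an illustration under stated assumptions, not the article's construction: the FiLM parameters are assumed to be linear in the conditioning input (`gamma = G @ z`, `beta = B @ z`), the input is augmented with a constant 1 to carry the shift, and the names `G`, `B`, and `W` are hypothetical. Under those assumptions the feature-wise affine transformation is a bilinear map whose weight array is mostly zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x = 3, 4                     # illustrative sizes (assumed)

z = rng.normal(size=d_z)            # conditioning input
x = rng.normal(size=d_x)            # features to be modulated
G = rng.normal(size=(d_x, d_z))     # assumed linear generator for the scale
B = rng.normal(size=(d_x, d_z))     # assumed linear generator for the shift

# Feature-wise affine transformation: scale and shift each feature.
gamma, beta = G @ z, B @ z
film_out = gamma * x + beta

# Equivalent bilinear form y_k = sum_ij z_i W[i, j, k] x_aug_j,
# where x_aug appends a constant 1 so the shift fits the same form.
x_aug = np.append(x, 1.0)
W = np.zeros((d_z, d_x + 1, d_x))
for k in range(d_x):
    W[:, k, k] = G[k]               # feature k is scaled only by itself
    W[:, d_x, k] = B[k]             # the constant input carries the shift
bilinear_out = np.einsum("i,ijk,j->k", z, W, x_aug)

print(np.allclose(film_out, bilinear_out))
```

The only nonzero slices of `W` are the "diagonal" entries and the constant row, which is one way to see the feature-wise transformation as a heavily constrained special case of a general bilinear layer.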

I like all of these examples of where FiLM is being used in the sense that they show you're talking about a generally applicable thing, but I'm not sure I can really say that I've understood the different machine learning algorithms being used just from the descriptions here.
Maybe that's ok?

I also have to say that as a reader I got a bit fatigued reading about all the examples.
I wonder if there is a way to break this up to be more nonlinear, where you can click on example links and they take you out to a different page or something?

OK as I continue reading I'm more convinced that the list of examples is too long in its current form.

The section on properties of the trained models is too short, I think.
There is much more about these models that could be productively explored or explained.

given that an artist’s style may vary widely over time

Idea - chunk up an artist's life into decades or something and see if they get more separated.

Here's a thing I would like more discussion about:
I can also just cluster tasks with any number of clustering algorithms.
How do the clusters given by this method differ from those?
To what extent could you replace these FiLM layers with some existing clustering algorithm and a lookup table?
