
Loading…
Chrome store feedback: "any.gs/linkbucks ads" #886
@Deathamns I don't get the same blocking results with Firefox with above test. Must be a frameId/parentFrameId mismatch with the Chrome API definition:
If the document of a (sub-)frame is loaded (type is main_frame or sub_frame), frameId indicates the ID of this frame, not the ID of the outer frame.
Could you look into this? Because of last time I tried to fix something in there, not sure anymore where in the platform-specific code to check.
Chromium:
µBlock.webRequest/onBeforeRequest(): type=main_frame, id=0, parent id=-1, url=http://www.any.gs/BeXpL
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=4, parent id=0, url=http://www.linkbucksmedia.com/director/?t=0c647d23627e6b421ad54cadf8a4813a6f2e7a3b
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=5, parent id=4, url=http://softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
Firefox:
µBlock.webRequest/onBeforeRequest(): type=main_frame, id=0, parent id=-1, url=http://www.any.gs/BeXpL
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=57, parent id=54, url=http://softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=57, parent id=54, url=https://dxnlihy2f7vx3.cloudfront.net/rdr/softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=57, parent id=54, url=http://fileplenty.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame, id=62, parent id=57, url=http://sub.enpley.info/N82BvNKW2d2e93df1b2d6389bb0ca7b055beb51af0rV2YVX0OXSYToyOntzOjI6InRzIjtpOjE0MjQ5MDg4ODg7czoxOiJmIjtzOjcyOiIvaG9tZS93d3cvYXNzZXRzL2JldHRlcl9pbnN0YWxsZXIvaW5zdGFsbGVycy9jbGkvc2V0dXBfMTQyNDg2MjI1NTk5NC5leGUiO30=
There is this filter in EasyList (I removed the unrelated domains):
|http:$subdocument,third-party,domain=linkbucksmedia.com
On Chromium I get this frame hierarchy:
-
www.any.gs/BeXpL-
www.linkbucksmedia.comsoftingo.com
-
The bug was that originally, uBlock did not evaluate softingo.com in the context of www.linkbucksmedia.com, but erroneously in the context of www.any.gs/BeXpL. So the test case above is now properly handled in Chromium with the fix.
Here is the hierarchy reported in Firefox:
www.any.gs/BeXpL-
fileplenty.com(<=dxnlihy2f7vx3.cloudfront.net<=softingo.com)sub.enpley.info
So the frame 57 (softingo.com, dxnlihy2f7vx3.cloudfront.net, fileplenty.com) is floating, i.e. not reported as being child of www.any.gs/BeXpL. I don't know what happened to www.linkbucksmedia.com, but it may be was blocked as a popup, not sure.
As far as I can tell, the frameIds are assigned correctly in frameModule.js, but the www.linkbucksmedia.com iframe somehow doesn't reach traffic.js.
I'll look into it later.
the frameIds are assigned correctly
Yes, I woke up realizing I interpreted wrongly the repeated frame id 57: it's not different frames with the same id, it's the same frame with different URLs -- probably redirection. I edited accordingly. From what I can see, it seems linkbuckets.com is closed as a popup window, so it never reaches onBeforeRequest().
Seems that it is not guaranteed that http-observer will immediately follow shouldLoad, meaning it can't pair the requests, i.e. the frame request will be considered behind-the-scene.
sL: http://www.any.gs/scripts/generated/key.js?t=***&49307698
ob: http://www.any.gs/scripts/generated/key.js?t=***&49307698
sL: http://www.linkbucksmedia.com/director/?t=***
sL: http://static.linkbucks.com/tmpl/mint/img/int_top_bg.gif
ob: http://static.linkbucks.com/tmpl/mint/img/int_top_bg.gif
ob: http://www.linkbucksmedia.com/director/?t=***
I have a long way to understand all the Firefox code, so excuse if this sounds naive, but from what I've been able to gathered, if httpObserver.lastRequest was a stack instead of a single entry, this would solve the pairing issue?
Order is not guaranteed, also shouldLoad in some cases sends double reports for the same net request, but actually there will be only one net request for the resource.
Sounds like an argument in favour of blocking in shouldLoad rather than http-observer, that we discussed briefly in #824. (I couldn't find the experimental code you mentioned for supporting this in the nosync stream, by the way)
@AlexVallat Now that gorhill mentioned redirects, I realized that redirects are handled in the background process, because they're depending on the http-observer (which can live only in background process), so the currently used sync-messaging still should be used for every request, since you never know when a redirect will happen.
@Deathamns I thought we concluded last time that redirection was no longer used, because it was only for the local mirroring feature, which has been removed...
@AlexVallat That is a different thing.
So, being able to pragmatically redirect http requests (via the extension) to a different URL was needed by local mirroring.
Being able to monitor http redirects that were made by the server/browser (for example by the Location http header) is still necessary.
@Deathamns Ah, right, I get it now. Sorry, thinking about the wrong thing. Yes, I can see how that is a giant pain. I suppose the intended way to deal with redirects is ContentPolicy.shouldProcess, only that is horribly broken and doesn't appear to ever actually get called.
So, in order to be able to catch a redirect to a blocked URL, we need to do it asynchronously to shouldProcess, and to decide whether to block that redirection or not we need context information that is only available in shouldLoad (specifically the owning frame and type of the request).
It seems that the only common identifying feature between what gets passed to shouldLoad and the nsIHttpChannel is the URI itself, so if order can't be relied on then I guess the only way to pair them is by a Map keyed on URI? That's pretty horrible, and you can't even guarantee that every URI that comes through shouldLoad will eventually hit the observer (some other policy could block it), which means the map could leak. Not a lot, but still the odd URL/details pair adding up over time...
The only other direction I can think of is abandoning ContentPolicy.shouldLoad and looking to acquire all the information it provides from nsIHttpChannel.loadInfo. This can at least in principle be done, I think, but I don't know how much e10s pain might be involved. In Firefox 36, at least, you can get:
.loadInfo.contentPolicyType
.loadInfo.loadingDocument
which at first glance might be enough to get the information that comes from shouldLoad.
.loadInfo.loadingDocument is null with e10s, which could have helped determining from which tab the request came from, but it is useless like this. Also, there would be still no way to determine the frameIds (which can be done only in frameModule.js with e10s enabled).
Wouldn't frameIds just come from nsIDOMWindowUtils.outerWindowID in the same way they do in frameModule.js? Or do you mean that that wouldn't work in e10s even with a non-null loadingDocument to pass to it? In any case, it looks like e10s has thrown a spanner into the works for the whole process of getting load info from the channel, so unless they change that before it's finalised, that will eventually be a non-starter too. Sigh.
nsIDOMWindowUtils can be used on Window objects, but windows are can be reached only via frame scripts with e10s.
which means the map could leak
Trivial to take care of, if we agree that entries in that map are short-lived. Have each entry timestamped to detect stale entries, and regularly houseclean the map for stale entries?
@gorhill As @AlexVallat wrote:
the only common identifying feature between what gets passed to shouldLoad and the nsIHttpChannel is the URI
A map is not viable, since a the exact same URL can be requested many times or from different tabs (which the http-observer is not aware of).
OK, got myself a portable copy of nightly for e10s (blegh) experimentation. It looks like there is still one useful bit of information inside ´´´.loadInfo´´´, which is ´´´.loadInfo.innerWindowID´´´. This corresponds to nsIDOMWindowUtils.currentInnerWindowID on the content side.
In practice, I found it appears to identify the parent frame ID, so any requests coming in get the innerWindowID of the document of the tab that is loading them. This is generally a good thing, because almost all the information we actually want is the parentFrame, but I think getting the ID of the inner frame itself could be an issue, I'm not sure if there's a way around that.
How could be innerWindowID helpful? How would you determine the tabId, frameIds or parentFrameIds from that?
But I guess, we're going off topic here. Maybe this should have an issue for itself.
Inner frames are a problem, yes - I'm still investigating how frameIds are used, and if there is anything that can be done for them. Apart from that, innerWindowId can be used as a direct replacement for parentFrameId. It is a unique identifier for the top level window, just as the current parentFrameId is.
For tabId, I'm pretty sure that will require iterating and access to the contentWindowAsCPOW to check against .currentInnerWindowID. There is supposed to be a Services.wm.getCurrentInnerWindowWithId to do this, just as there is a getOuterWindowWithId, but sadly I couldn't get either of those functions to return anything under either content or browser process.
Alternatively, it might be useful as a key for tying together shouldLoad and observer - at shouldLoad you know what innerWindowId the request is for, and that together with the URI should uniquely identify it for later looking up in handleRequest. It's still possible for there to be multiple URIs simultaneously in the same tab, but if that's the case, do we really care which one matches up with which channel? They would all be treated identically...
To get frameIds, we need outerWindow, since innerWindow will be different every time you refresh the page or navigate in history (more).
contentWindowAsCPOW - we want to fully support e10s, so no for this. Probably it wouldn't even work, since I added multiprocessCompatible flag in the install.rdf.
getCurrentInnerWindowWithId and getOuterWindowWithId - If these try to fetch the window objects, then this definitely won't work in e10s.
at shouldLoad you know what innerWindowID the request is for, and that together with the URI should uniquely identify it for later looking up in handleRequest
Unfortunately, innerWindowId (and outerWindowID) is still not unique enough, if I recall correctly, it's unique to each browser window only. However this definitely could strengthen the possibility to match the correct requests if a map were used (with small timestamp differences).
So, either a sync message is passed to the frame script to request a lookup to help find tab ID from innerWindowId, or shouldLoad must populate a map to lookup request details. That would have to be sync too, I guess, to ensure that it's arrived by the time the observer needs it.
innerWindowId seems to be unique to a frame, at least until navigation (as you mention), so that would seem to be unique enough when combined with a URI to me. If the same URI is loaded multiple times in the same frame, then I don't see how it matters exactly which request is matched up, as they are all going to be the same. The only problem would be if different frames or different tabs could have the same innerWindowId (and loaded the same URL), but I don't think that can happen.
With two browser windows open, you can keep navigating to the same URL in both windows and observe that the innerWindowID reported for each never matches - while they generally increment by one, they always skip over each other's used IDs.
I tried http://www.any.gs/BeXpL and Firefox passed the test with version at 02ab813. So I guess we can close this now?
@AlexVallat In which version exactly did they introduce loadInfo? Because in Firefox 29, it doesn't seem to work. If it's too new, then probably we can't use it, if we want support from version 24.
@gorhill I just tested, and it still can't pair them.
Because in Firefox 29, it doesn't seem to work. ... If it's too new, then probably we can't use it, if we want support from version 24.
May I ask why you want to support all the way down to Firefox 24? I can understand supporting down to the latest ESR release (31), but lower than that? What am I missing?
Not trying to criticize, just curious.
@Deathamns I traced it to http://hg.mozilla.org/mozilla-central/rev/6ca74bed32d8 but I'm not sure how to determine what release version that change actually landed in.
@AlexVallat Just checked (via portables). It is only supported from version 35, which is too fresh indeed.
Looking at this issue, it seems to be rare case. It seems that in every other case, the order is correct (shouldLoad -> http-observer). Probably saving only the last two requests would solve the issue.
@Daniellynet Because many people requested it, and if it's not too hard to keep it compatible, then why not. But yeah, I wouldn't bother either if this wasn't the case.
I can't reproduce the issue with Firefox.
At first I thought EasyList needed to be updated, but I still could reproduce the issue.
Looking into this closely, I found uBlock misinterpreted the
frameIdfor when a request is of typesub_frame. For a request of typesub_frame, the Chrome API says that this is not the frame in which the request occurs, but rather the frame id of the new frame itself. So in such case, theparentFrameIdneeds to be used.