Skip to content

Loading…

Chrome store feedback: "any.gs/linkbucks ads" #886

Closed
gorhill opened this Issue · 28 comments

4 participants

@gorhill

seems that i can't block those redirecting any.gs sites ads. Chrome always warning about "The site contains harmful programs". If i use "hpHosts’s Ad and tracking servers‎" it blocks whole site and all the others seems not to work... i'm trying this site: http://www.any.gs/BeXpL and i tested same in Adblock Plus and it's working fine

At first I thought EasyList needed to be updated, but I still could reproduce the issue.

Looking into this closely, I found uBlock misinterpreted the frameId for when a request is of type sub_frame. For a request of type sub_frame, the Chrome API says that this is not the frame in which the request occurs, but rather the frame id of the new frame itself. So in such case, the parentFrameId needs to be used.

@gorhill gorhill added a commit that closed this issue
@gorhill gorhill this fixes #886 b323a33
@gorhill gorhill closed this in b323a33
@gorhill

@Deathamns I don't get the same blocking results with Firefox with above test. Must be a frameId/parentFrameId mismatch with the Chrome API definition:

If the document of a (sub-)frame is loaded (type is main_frame or sub_frame), frameId indicates the ID of this frame, not the ID of the outer frame.

Could you look into this? Because of last time I tried to fix something in there, not sure anymore where in the platform-specific code to check.

@gorhill gorhill reopened this
@gorhill

Chromium:

µBlock.webRequest/onBeforeRequest(): type=main_frame, id=0,  parent id=-1, url=http://www.any.gs/BeXpL
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=4,  parent id=0,  url=http://www.linkbucksmedia.com/director/?t=0c647d23627e6b421ad54cadf8a4813a6f2e7a3b
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=5,  parent id=4,  url=http://softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011

Firefox:

µBlock.webRequest/onBeforeRequest(): type=main_frame, id=0,  parent id=-1, url=http://www.any.gs/BeXpL
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=57, parent id=54, url=http://softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=57, parent id=54, url=https://dxnlihy2f7vx3.cloudfront.net/rdr/softingo.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=57, parent id=54, url=http://fileplenty.com/clp/linkbucks3rgxw?sc=smb&sn=File%20Downloaded20Successfully&sd=http://go34down.com/dwnld/h/dan/Setup_lb.exe&ap=13090011
µBlock.webRequest/onBeforeRequest(): type=sub_frame,  id=62, parent id=57, url=http://sub.enpley.info/N82BvNKW2d2e93df1b2d6389bb0ca7b055beb51af0rV2YVX0OXSYToyOntzOjI6InRzIjtpOjE0MjQ5MDg4ODg7czoxOiJmIjtzOjcyOiIvaG9tZS93d3cvYXNzZXRzL2JldHRlcl9pbnN0YWxsZXIvaW5zdGFsbGVycy9jbGkvc2V0dXBfMTQyNDg2MjI1NTk5NC5leGUiO30=

There is this filter in EasyList (I removed the unrelated domains):

|http:$subdocument,third-party,domain=linkbucksmedia.com

On Chromium I get this frame hierarchy:

  • www.any.gs/BeXpL
    • www.linkbucksmedia.com
      • softingo.com

The bug was that originally, uBlock did not evaluate softingo.com in the context of www.linkbucksmedia.com, but erroneously in the context of www.any.gs/BeXpL. So the test case above is now properly handled in Chromium with the fix.

Here is the hierarchy reported in Firefox:

  • www.any.gs/BeXpL
  • fileplenty.com (<= dxnlihy2f7vx3.cloudfront.net <= softingo.com)
    • sub.enpley.info

So the frame 57 (softingo.com, dxnlihy2f7vx3.cloudfront.net, fileplenty.com) is floating, i.e. not reported as being child of www.any.gs/BeXpL. I don't know what happened to www.linkbucksmedia.com, but it may be was blocked as a popup, not sure.

@Deathamns

As far as I can tell, the frameIds are assigned correctly in frameModule.js, but the www.linkbucksmedia.com iframe somehow doesn't reach traffic.js.
I'll look into it later.

@gorhill

the frameIds are assigned correctly

Yes, I woke up realizing I interpreted wrongly the repeated frame id 57: it's not different frames with the same id, it's the same frame with different URLs -- probably redirection. I edited accordingly. From what I can see, it seems linkbuckets.com is closed as a popup window, so it never reaches onBeforeRequest().

@Deathamns

Seems that it is not guaranteed that http-observer will immediately follow shouldLoad, meaning it can't pair the requests, i.e. the frame request will be considered behind-the-scene.

sL: http://www.any.gs/scripts/generated/key.js?t=***&49307698
ob: http://www.any.gs/scripts/generated/key.js?t=***&49307698
sL: http://www.linkbucksmedia.com/director/?t=***
sL: http://static.linkbucks.com/tmpl/mint/img/int_top_bg.gif
ob: http://static.linkbucks.com/tmpl/mint/img/int_top_bg.gif
ob: http://www.linkbucksmedia.com/director/?t=***
@gorhill

I have a long way to understand all the Firefox code, so excuse if this sounds naive, but from what I've been able to gathered, if httpObserver.lastRequest was a stack instead of a single entry, this would solve the pairing issue?

@Deathamns

Order is not guaranteed, also shouldLoad in some cases sends double reports for the same net request, but actually there will be only one net request for the resource.

@AlexVallat
Collaborator

Sounds like an argument in favour of blocking in shouldLoad rather than http-observer, that we discussed briefly in #824. (I couldn't find the experimental code you mentioned for supporting this in the nosync stream, by the way)

@Deathamns

@AlexVallat Now that gorhill mentioned redirects, I realized that redirects are handled in the background process, because they're depending on the http-observer (which can live only in background process), so the currently used sync-messaging still should be used for every request, since you never know when a redirect will happen.

@AlexVallat
Collaborator

@Deathamns I thought we concluded last time that redirection was no longer used, because it was only for the local mirroring feature, which has been removed...

@Deathamns

@AlexVallat That is a different thing.
So, being able to pragmatically redirect http requests (via the extension) to a different URL was needed by local mirroring.
Being able to monitor http redirects that were made by the server/browser (for example by the Location http header) is still necessary.

@AlexVallat
Collaborator

@Deathamns Ah, right, I get it now. Sorry, thinking about the wrong thing. Yes, I can see how that is a giant pain. I suppose the intended way to deal with redirects is ContentPolicy.shouldProcess, only that is horribly broken and doesn't appear to ever actually get called.

So, in order to be able to catch a redirect to a blocked URL, we need to do it asynchronously to shouldProcess, and to decide whether to block that redirection or not we need context information that is only available in shouldLoad (specifically the owning frame and type of the request).

It seems that the only common identifying feature between what gets passed to shouldLoad and the nsIHttpChannel is the URI itself, so if order can't be relied on then I guess the only way to pair them is by a Map keyed on URI? That's pretty horrible, and you can't even guarantee that every URI that comes through shouldLoad will eventually hit the observer (some other policy could block it), which means the map could leak. Not a lot, but still the odd URL/details pair adding up over time...

The only other direction I can think of is abandoning ContentPolicy.shouldLoad and looking to acquire all the information it provides from nsIHttpChannel.loadInfo. This can at least in principle be done, I think, but I don't know how much e10s pain might be involved. In Firefox 36, at least, you can get:

.loadInfo.contentPolicyType
.loadInfo.loadingDocument

which at first glance might be enough to get the information that comes from shouldLoad.

@Deathamns

.loadInfo.loadingDocument is null with e10s, which could have helped determining from which tab the request came from, but it is useless like this. Also, there would be still no way to determine the frameIds (which can be done only in frameModule.js with e10s enabled).

@AlexVallat
Collaborator

Wouldn't frameIds just come from nsIDOMWindowUtils.outerWindowID in the same way they do in frameModule.js? Or do you mean that that wouldn't work in e10s even with a non-null loadingDocument to pass to it? In any case, it looks like e10s has thrown a spanner into the works for the whole process of getting load info from the channel, so unless they change that before it's finalised, that will eventually be a non-starter too. Sigh.

@Deathamns

nsIDOMWindowUtils can be used on Window objects, but windows are can be reached only via frame scripts with e10s.

@gorhill

which means the map could leak

Trivial to take care of, if we agree that entries in that map are short-lived. Have each entry timestamped to detect stale entries, and regularly houseclean the map for stale entries?

@Deathamns

@gorhill As @AlexVallat wrote:

the only common identifying feature between what gets passed to shouldLoad and the nsIHttpChannel is the URI

A map is not viable, since a the exact same URL can be requested many times or from different tabs (which the http-observer is not aware of).

@AlexVallat
Collaborator

OK, got myself a portable copy of nightly for e10s (blegh) experimentation. It looks like there is still one useful bit of information inside ´´´.loadInfo´´´, which is ´´´.loadInfo.innerWindowID´´´. This corresponds to nsIDOMWindowUtils.currentInnerWindowID on the content side.

In practice, I found it appears to identify the parent frame ID, so any requests coming in get the innerWindowID of the document of the tab that is loading them. This is generally a good thing, because almost all the information we actually want is the parentFrame, but I think getting the ID of the inner frame itself could be an issue, I'm not sure if there's a way around that.

@Deathamns

How could be innerWindowID helpful? How would you determine the tabId, frameIds or parentFrameIds from that?
But I guess, we're going off topic here. Maybe this should have an issue for itself.

@AlexVallat
Collaborator

Inner frames are a problem, yes - I'm still investigating how frameIds are used, and if there is anything that can be done for them. Apart from that, innerWindowId can be used as a direct replacement for parentFrameId. It is a unique identifier for the top level window, just as the current parentFrameId is.

For tabId, I'm pretty sure that will require iterating and access to the contentWindowAsCPOW to check against .currentInnerWindowID. There is supposed to be a Services.wm.getCurrentInnerWindowWithId to do this, just as there is a getOuterWindowWithId, but sadly I couldn't get either of those functions to return anything under either content or browser process.

Alternatively, it might be useful as a key for tying together shouldLoad and observer - at shouldLoad you know what innerWindowId the request is for, and that together with the URI should uniquely identify it for later looking up in handleRequest. It's still possible for there to be multiple URIs simultaneously in the same tab, but if that's the case, do we really care which one matches up with which channel? They would all be treated identically...

@Deathamns

To get frameIds, we need outerWindow, since innerWindow will be different every time you refresh the page or navigate in history (more).

contentWindowAsCPOW - we want to fully support e10s, so no for this. Probably it wouldn't even work, since I added multiprocessCompatible flag in the install.rdf.

getCurrentInnerWindowWithId and getOuterWindowWithId - If these try to fetch the window objects, then this definitely won't work in e10s.

at shouldLoad you know what innerWindowID the request is for, and that together with the URI should uniquely identify it for later looking up in handleRequest

Unfortunately, innerWindowId (and outerWindowID) is still not unique enough, if I recall correctly, it's unique to each browser window only. However this definitely could strengthen the possibility to match the correct requests if a map were used (with small timestamp differences).

@AlexVallat
Collaborator

So, either a sync message is passed to the frame script to request a lookup to help find tab ID from innerWindowId, or shouldLoad must populate a map to lookup request details. That would have to be sync too, I guess, to ensure that it's arrived by the time the observer needs it.

innerWindowId seems to be unique to a frame, at least until navigation (as you mention), so that would seem to be unique enough when combined with a URI to me. If the same URI is loaded multiple times in the same frame, then I don't see how it matters exactly which request is matched up, as they are all going to be the same. The only problem would be if different frames or different tabs could have the same innerWindowId (and loaded the same URL), but I don't think that can happen.

With two browser windows open, you can keep navigating to the same URL in both windows and observe that the innerWindowID reported for each never matches - while they generally increment by one, they always skip over each other's used IDs.

@gorhill

I tried http://www.any.gs/BeXpL and Firefox passed the test with version at 02ab813. So I guess we can close this now?

@Deathamns

@AlexVallat In which version exactly did they introduce loadInfo? Because in Firefox 29, it doesn't seem to work. If it's too new, then probably we can't use it, if we want support from version 24.

@gorhill I just tested, and it still can't pair them.

@TheSweetLily

@Deathamns

Because in Firefox 29, it doesn't seem to work. ... If it's too new, then probably we can't use it, if we want support from version 24.

May I ask why you want to support all the way down to Firefox 24? I can understand supporting down to the latest ESR release (31), but lower than that? What am I missing?

Not trying to criticize, just curious.

@AlexVallat
Collaborator

@Deathamns I traced it to http://hg.mozilla.org/mozilla-central/rev/6ca74bed32d8 but I'm not sure how to determine what release version that change actually landed in.

@Deathamns

@AlexVallat Just checked (via portables). It is only supported from version 35, which is too fresh indeed.
Looking at this issue, it seems to be rare case. It seems that in every other case, the order is correct (shouldLoad -> http-observer). Probably saving only the last two requests would solve the issue.

@Daniellynet Because many people requested it, and if it's not too hard to keep it compatible, then why not. But yeah, I wouldn't bother either if this wasn't the case.

@Deathamns Deathamns added a commit that referenced this issue
@Deathamns Deathamns Firefox: workaround for #886 cc47d13
@gorhill

I can't reproduce the issue with Firefox.

@gorhill gorhill closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.