FacebookBot mislabeled as fake #10
Comments
Two solutions come to my mind:
I don't like either of these approaches, as they add overhead.
Replying here to your comment and that above.
There is a risk of inflating the list with this approach. That could be critical, since some FB ranges may be quite wide, especially in IPv6 space. This is also why (2), "drop SegmentTree completely", is hardly an option. Otherwise we could try to remove smaller ranges that have wider parents. What's on my mind is to sort the list and
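The idea of removing smaller ranges that have wider parents could look roughly like the sketch below. It is plain Ruby and does not touch the segment_tree gem. The helper names `cidr_to_range` and `prune_nested` are hypothetical, and the CIDR blocks are illustrative inputs rather than Facebook's actual published list. It also assumes a single address family per call, since IPv4 and IPv6 integers should not be mixed in one list.

```ruby
require 'ipaddr'

# Hypothetical helper: turn a CIDR string into a range of integers.
def cidr_to_range(cidr)
  range = IPAddr.new(cidr).to_range
  (range.begin.to_i..range.end.to_i)
end

# Hypothetical helper: drop every range that is fully covered by a wider one.
# Sorting by start (ascending) and end (descending) puts a parent before its
# children, so a range is nested exactly when it does not extend past the
# last range that was kept.
def prune_nested(ranges)
  sorted = ranges.sort_by { |r| [r.begin, -r.end] }
  sorted.each_with_object([]) do |range, kept|
    kept << range if kept.empty? || range.end > kept.last.end
  end
end

# Illustrative input: the /24 sits inside the /19, the /21 is unrelated.
ranges = %w[69.171.251.0/24 69.171.224.0/19 31.13.24.0/21].map { |c| cidr_to_range(c) }
pruned = prune_nested(ranges)
puts "kept #{pruned.size} of #{ranges.size} ranges"
```

Partially overlapping ranges are kept as they are here. Merging them would be a separate step.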
There's also a third way that might solve this: sort the array of ranges by their end, whereas SegmentTree sorts it by the start. Then pass the sorted array and bypass SegmentTree's sort: SegmentTree.new(@sorted_ranges, true)
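One way to reason about this option before touching the gem is a plain-Ruby stand-in: sort the ranges by their end and binary-search for the first range whose end is not below the address. This is only a sketch and not segment_tree's actual lookup code. `cidr_to_range` and `find_end_sorted` are hypothetical names, and the two CIDR blocks are illustrative.

```ruby
require 'ipaddr'

# Hypothetical helper: turn a CIDR string into a range of integers.
def cidr_to_range(cidr)
  range = IPAddr.new(cidr).to_range
  (range.begin.to_i..range.end.to_i)
end

# Simplified stand-in for a tree lookup over ranges sorted by their end.
# Returns the matching range, or nil when the candidate does not cover the IP.
def find_end_sorted(sorted_by_end, ip)
  target = IPAddr.new(ip).to_i
  # First range whose end is at or past the target address.
  candidate = sorted_by_end.bsearch { |r| r.end >= target }
  candidate if candidate&.cover?(target)
end

# Illustrative data: a wide parent and a narrow range nested inside it.
ranges = %w[69.171.224.0/19 69.171.251.0/24].map { |c| cidr_to_range(c) }
sorted_by_end = ranges.sort_by(&:end)

%w[69.171.251.1 69.171.250.1].each do |ip|
  hit = find_end_sorted(sorted_by_end, ip)
  puts "#{ip}: #{hit ? 'matched' : 'no match'}"
end
```

In this simplified model the address from the report is matched, but 69.171.250.1, which only the wide parent covers, is not. The binary search stops at the narrow range that ends first. Whether the gem behaves the same way would have to be checked against its real implementation.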
Agreed! As I said I don't like either.
That was also something I had in mind, but I was considering those partial matches as well, and accounting for them was why I discarded the idea. I also think some other bot will need this workaround implemented. Let me know what you think of option 3, as well. So far it looks to me like the most viable approach, with no added overhead. I'm just not sure whether it'll work in practice :-)
IMHO it makes no sense to bother about them beforehand. That would be premature optimization. FB is rather unique in this regard so far.
This looks like the easiest approach, but it needs to be thought through carefully algorithm-wise... it takes time. This would also need careful testing on Legitbot's side. It's also worth mentioning that we would then depend on knowledge of segment_tree's internal structures.
I don't quite get what you mean by this. It's the same approach as used now, only the input array is pre-sorted, and SegmentTree is instructed not to sort it beforehand. Regarding the other points, I agree. What I'm thinking is to cobble together some tests, rewrite the Facebook part, and run it against the tests and then against real-world IP data to see whether it works. As a matter of fact, this bug surfaced by pure coincidence. I think at some point it will resolve itself, when Facebook changes their IP ranges, but who knows when that will be. P.S.: This suggestion might be a possible solution too.
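For the testing part, a brute-force reference check may be a useful starting point: a linear scan is slow, but it is not affected by overlapping or nested ranges, so any faster lookup can be compared against it. The sketch below is only an assumption about how such a harness could look. `covered_by_scan?` and `mismatches` are hypothetical names, and the CIDR list is illustrative and should be replaced with the real published ranges.

```ruby
require 'ipaddr'

# Illustrative CIDR list; substitute the real published ranges when testing.
CIDRS = %w[69.171.224.0/19 69.171.251.0/24].freeze
NETWORKS = CIDRS.map { |cidr| IPAddr.new(cidr) }.freeze

# Reference answer: check the address against every network in turn.
def covered_by_scan?(ip)
  addr = IPAddr.new(ip)
  NETWORKS.any? { |net| net.include?(addr) }
end

# Returns the addresses for which the lookup under test (passed as a block
# that returns true or false) disagrees with the linear scan.
def mismatches(ips)
  ips.reject { |ip| yield(ip) == covered_by_scan?(ip) }
end

# Usage: a deliberately broken lookup that rejects everything.
sample = %w[69.171.251.1 8.8.8.8]
puts mismatches(sample) { |_ip| false }.inspect
```

Feeding real-world addresses from access logs through `mismatches` with the tree-based lookup as the block would list every address where the two disagree.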
Published as
I'm starting to see Facebook Bot being labeled as a fake search engine when in reality the IP address is genuine. I think the issue is in how the SegmentTree is being built.
The IP in question is 69.171.251.1
I get the following in the console:
On the other hand:
The IPv4 SegmentTree range is too broad, yet the valid IP address is still not returned, so a legitimate bot is labeled as a fake one and gets blocked.