Imperva Incapsula prevent bot detection #319

Closed
clementbiron opened this issue Aug 19, 2021 · 3 comments
@clementbiron
Member

I am trying to add the Just Eat service with the following declaration:

{
  "name": "Just Eat",
  "documents": {
    "Terms of Service": {
      "fetch": "https://www.just-eat.ie/info/terms-and-conditions",
      "select": {
        "startBefore": "#just-eat-website-terms-and-conditions",
        "endBefore": "#ii.just-eat-voucher-terms-conditions"
      }
    },
    "Privacy Policy": {
      "fetch": "https://www.just-eat.ie/info/privacy-policy",
      "select": [".main-text"]
    },
    "Trackers Policy": {
      "fetch": "https://www.just-eat.ie/info/cookies-policy",
      "select": [".main-text"]
    }
  }
}

I get this error message:
Content inacessible: Error: The document cannot be accessed or its content can not be selected: The provided selector ".main-text" has no match in the web page https://www.just-eat.ie/info/cookies-policy.

The saved snapshot contains incorrect data:

<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>

Some research leads me to believe that the site is protected by Imperva's bot protection service (https://www.imperva.com/products/advanced-bot-protection-management/), which serves this script-only challenge page instead of the real content. The mechanism seems to be well explained here: https://www.imperva.com/blog/how-incapsula-client-classification-challenges-bots/
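To make this kind of failure easier to spot than a generic "selector has no match" error, a snapshot could be checked for the challenge page before selectors are applied. Below is a minimal sketch in Python, not the project's actual code. It assumes the challenge page always looks like the snapshot above: a script whose `src` contains `_Incapsula_Resource` and a body with no visible text. The function name `looks_like_incapsula_challenge` and the constant names are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: the marker seen in the snapshot above identifies the challenge page.
INCAPSULA_MARKER = "_Incapsula_Resource"


class _SnapshotScanner(HTMLParser):
    """Collects two facts: is the marker script present, and is there body text?"""

    def __init__(self):
        super().__init__()
        self.has_marker_script = False
        self._in_body = False
        self._in_script = False
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src") or ""
            if INCAPSULA_MARKER in src:
                self.has_marker_script = True
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Only count text that a reader would see: inside <body>, outside <script>.
        if self._in_body and not self._in_script:
            self.body_text.append(data)


def looks_like_incapsula_challenge(html):
    """Return True if the page has the marker script and no visible body text."""
    scanner = _SnapshotScanner()
    scanner.feed(html)
    scanner.close()
    visible_text = "".join(scanner.body_text).strip()
    return scanner.has_marker_script and not visible_text


# The snapshot from this issue, with the resource token shortened for readability.
BLOCKED_SNAPSHOT = """<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=abc,def">
</script>
<body>
</body></html>"""

# A hypothetical page where the declared selector would match.
NORMAL_PAGE = '<html><body><div class="main-text">Cookies policy</div></body></html>'
```

With a check like this, the fetch step could report "blocked by bot protection" instead of a selector mismatch. It only detects the block; it does not get around it.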

@martinratinaud
Member

I did 3 things on this matter

Here is the content of my communication to them

Hi,
My name is Martin Ratinaud, CTO at the French Embassy for Digital Affairs.

We are running the OpenSource project "Open Terms Archive" which aims at tracking 
ToS for every service in the world, in all languages and all countries.

As such we are implementing a crawler that tracks changes on ToS regularly.

Could we get in touch so that we become a known and trusted bot?

Thanks a lot

Check our websites here: 
https://www.opentermsarchive.org/en
https://disinfo.quaidorsay.fr/en

@martinratinaud
Member

I had a chat with Imperva and finally sent an email to support@imperva.com:

Hi,
My name is Martin Ratinaud, CTO at the French Embassy for Digital Affairs. Henri Verdier, in CC, is the ambassador.

We are running the OpenSource project "Open Terms Archive" which aims at tracking ToS for every service in the world, in all languages and all countries.

As such we are implementing a crawler that tracks changes on ToS regularly.

We know we are currently blocked by your services and would like our bot to be trusted by Imperva as a good bot (whitelisted) so that we are no longer blocked.

Thanks a lot

Check our websites here:
https://www.opentermsarchive.org/en
https://disinfo.quaidorsay.fr/en

@MattiSG
Member

MattiSG commented Apr 24, 2023

We do not actively work on #166 at the moment. We will reopen it when we prioritise this work again. In the meantime, feel free to add any additional relevant information specific to Imperva Incapsula to this issue.

@MattiSG MattiSG closed this as completed Apr 24, 2023