Form Authentication #434
Comments
It looks like this authentication method needs a custom solution. I suspect the <authFormParams>
<param name="(param name)">(param value)</param>
<!-- You can repeat this param tag as needed. -->
</authFormParams>
I doubt this will work. You may have to do some coding. One approach could be to extend If your authentication method uses a known standard, you can provide more information on it and we can turn this into a feature request. |
Hello!
but getting an error:
versions: |
GenericHttpClientFactory (github #434).
This is a validation error because validation was not updated to support the new feature. It can be ignored unless you start the collector with the An updated snapshot was just made to fix the validation error. |
Thank you for the quick response! I tested the issue with 2.8.1 (2017-12-08) snapshot. On
Got the following:
I believe Also analyzing server-side request variable, I haven't found The problem is more generic, I guess, because configuring
also doesn't bring Please feel free to ask for any additional information that would help you reproduce the error. But I should note, that |
I wrongfully assumed the value should always be present. I will modify the validation. But I doubt it will make a difference. Again, if your authentication is "standard", point me to documentation for it, or share credentials with me so I can try. Otherwise, your best bet is to implement the auth logic yourself (e.g., extending |
I don't think it is a "standard" practice, but in my current case there is a form authorization with additional fields, which can be empty. Providing form fields that are invisible to users but visible to crawlers, and applying some heuristics to their values, can be called a generic approach. Please see https://www.drupal.org/project/honeypot as an example. Thank you for your commit; I believe it doesn't break the class logic, and it is still generic enough. I also absolutely agree with you that more complex authorization form logic should be implemented by |
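To illustrate the honeypot point outside the collector, here is a minimal Python sketch. It assumes hypothetical field names (`name`, `pass`, `homepage`) and does not use any Norconex API. It only shows that an empty form field is still a distinct key in a URL-encoded POST body, so a client that drops empty parameters sends a different request than a browser would.

```python
from urllib.parse import urlencode, parse_qs


def encode_form(fields):
    """URL-encode form fields, keeping parameters whose value is empty."""
    return urlencode(fields)


# Hypothetical login form with an empty honeypot field named "homepage".
body = encode_form({"name": "crawler", "pass": "secret", "homepage": ""})
print(body)  # name=crawler&pass=secret&homepage=

# A server that keeps blank values still sees the honeypot key.
received = parse_qs(body, keep_blank_values=True)
print(received["homepage"])  # ['']
```

Whether a given server treats a missing field and an empty field differently depends on its own validation logic, so this is only a way to check what the client puts on the wire.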
May I point you again to a problem?
When checking the server-side request variable, I cannot see a parameter with the name 'param'. Is the problem on my side? The server uses a LAMP stack, and I am analysing the $_REQUEST and $_REQUEST vars. |
I know it's been a while, but do you still have the issue? If so, can you share your config along with authentication info to reproduce (privately if you want). |
Hi Pascal, Thanks for following up. We got sidetracked with some other issues but will revisit soon (likely Feb) |
I've got a scenario very similar to the original post. I have a website I am trying to extract information from. The pages I want to scrape are behind form-based authentication. The problem is that the login form generates a dynamic code, stored in a hidden form field, that needs to be sent back for authentication to work. Is there any way to add something like the following to the httpClientFactory:
<authPrefetch>
<prefetchURL> URL_of_the_original_form_with_the_dynamic_code </prefetchURL>
<prefetchParam>form_field_name_of_the_dynamic_hidden_field</prefetchParam>
</authPrefetch>
This would create a prefetch pass to the defined URL to get the dynamically generated form field and submit it with the actual httpClientFactory authentication submission. |
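The prefetch idea can be sketched in a few lines of Python. This is not an existing collector feature or API: `HiddenFieldParser`, `build_login_body`, the `form_token` field, and the sample HTML are all hypothetical, and the sketch works on an HTML string instead of making a real request. It parses the login form, picks out the requested hidden fields, and merges them into the body of the authentication POST.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def build_login_body(form_html, credentials, prefetch_params):
    """Merge the requested hidden fields from the form into the POST body."""
    parser = HiddenFieldParser()
    parser.feed(form_html)
    fields = dict(credentials)
    for name in prefetch_params:
        if name in parser.hidden:
            fields[name] = parser.hidden[name]
    return urlencode(fields)


# Stand-in for the page a prefetch request to the login URL would return.
sample_form = """
<form method="post" action="/login">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="form_token" value="abc123">
</form>
"""

body = build_login_body(
    sample_form, {"username": "user", "password": "pw"}, ["form_token"]
)
print(body)  # username=user&password=pw&form_token=abc123
```

This only helps when the token is present in the served HTML. A real client would usually also need to reuse the session cookies from the prefetch request, since such tokens are often tied to the session.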
Can you share the URL (or ideally your config)? Quite often the problem with your suggested approach is that the dynamic field value is populated via JavaScript, precisely to prevent automated scraping. Is that your case? JavaScript is not interpreted unless you use the PhantomJSDocumentFetcher. To get its integration with PhantomJS working, you probably need to modify the phantom.js script and do that magic yourself. I could confirm better with a URL to your login form. |
I'm also facing this issue. @AntonioAmore Did you ever find a resolution? I made a simple script that var_dumps($_POST); website:
2021-01-14 18:50:28 INFO - Performing FORM authentication at "https://example.com/test.php" (username=username; password=*****)
I've tried both 2.8.1 and 2.9.1, but the authFormParams field doesn't seem to do anything. My config looks like
|
@mattbucci Unfortunately, no. I solved my task in a different way. |
I'm trying to authenticate using form authentication. The form appears to generate a (hidden) unique token that needs to be sent back.
From the site:
Assuming I am interpreting this correctly, any thoughts on how to capture the token and send it back in httpClientFactory? Thanks!