Internal SharePoint website is giving a 403 Forbidden #917

michaelt16 · 2024-03-04T22:14:08Z

Hi Pascal,

I have a question about crawling an internal SharePoint site. Every time the crawler follows the internal links, I get a 403 Forbidden, even though I have set up login authentication. Is there anything else I should consider when trying to solve this issue?

For context, I am testing with a depth of 1. The first page also requires a login (that part works, and the crawler is able to crawl it), but when the crawler follows the sublinks (the SharePoint sites) it gets a 403 error, even though normally a single login is enough to access both.

What are some things I should look at when troubleshooting this? Let me know if you need my configuration or more context.

Thank you
-Michael

ohtwadi · 2024-03-08T22:18:29Z

Hi Michael,

The crawler offers generic NTLM support through the Apache HttpClient library. HttpClient supports a few NTLM protocol versions, but it may not support the one your server uses. Details on supported versions: https://hc.apache.org/httpcomponents-client-4.5.x/ntlm.html
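One way to see what the server actually expects is to request a failing sub-site URL without credentials and look at the status code and any `WWW-Authenticate` challenge headers. A 401 normally lists the schemes the server accepts (for example `NTLM` or `Negotiate`, the latter usually meaning Kerberos); a 403 with no challenge often suggests the request is being refused for a reason other than a failed NTLM handshake. The sketch below uses only the JDK's `java.net.http` client (Java 11+); the class name `AuthSchemeCheck` is hypothetical, and the simple parser assumes one challenge per header line, reading only its first token.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper (not part of Norconex): reports which
// authentication schemes a server advertises in its WWW-Authenticate headers.
final class AuthSchemeCheck {

    // Returns the first token of each challenge header, upper-cased.
    // Assumes one challenge per header line; blank values are skipped.
    static List<String> schemes(List<String> challengeHeaders) {
        List<String> result = new ArrayList<>();
        for (String header : challengeHeaders) {
            String trimmed = header.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int space = trimmed.indexOf(' ');
            String scheme = space < 0 ? trimmed : trimmed.substring(0, space);
            result.add(scheme.toUpperCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // args[0]: the URL of one of the sub-sites that returns 403 in the crawler.
        URI target = URI.create(args[0]);
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // show the raw first response
                .build();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        List<String> challenges = response.headers().allValues("WWW-Authenticate");
        System.out.println("Status: " + response.statusCode());
        System.out.println("Advertised schemes: " + schemes(challenges));
    }
}
```

If the list only shows `NEGOTIATE`, the server may expect Kerberos rather than NTLM, which would be worth raising with your administrator.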

You may also want to check with your system administrator whether there are extra security layers or special configuration requirements you need to be aware of. For example, you may need to pass custom HTTP headers or go through a proxy (look at <headers> and <proxySettings>).
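Before changing the crawler configuration, it can help to check outside the crawler whether a given header or proxy changes the server's response at all. The sketch below sends a plain GET with a map of headers, optionally through a proxy, using the JDK's `java.net.http` client (Java 11+). It does not perform NTLM (that client has no built-in NTLM support), so it only compares unauthenticated responses. The class name `HeaderProxyProbe`, the header value, and the default proxy port are placeholders, not values from this thread.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe (not part of Norconex): sends one GET request with
// custom headers, optionally via a proxy, and prints the status code.
final class HeaderProxyProbe {

    // Builds a GET request carrying every header from the given map.
    static HttpRequest buildRequest(URI target, Map<String, String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(target).GET();
        headers.forEach(builder::header);
        return builder.build();
    }

    // Routes through the given proxy, or connects directly when proxyHost is null.
    static HttpClient buildClient(String proxyHost, int proxyPort) {
        HttpClient.Builder builder = HttpClient.newBuilder();
        if (proxyHost != null) {
            builder.proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)));
        }
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        // args: <url> [proxyHost] [proxyPort]
        URI target = URI.create(args[0]);
        Map<String, String> headers = new LinkedHashMap<>();
        // Placeholder header: replace with whatever your administrator says is required.
        headers.put("User-Agent", "Mozilla/5.0 (compatible; crawler-test)");
        HttpClient client = buildClient(
                args.length > 1 ? args[1] : null,
                args.length > 2 ? Integer.parseInt(args[2]) : 8080);
        HttpResponse<Void> response = client.send(
                buildRequest(target, headers), HttpResponse.BodyHandlers.discarding());
        System.out.println("Status: " + response.statusCode());
    }
}
```

Running it once directly and once through the proxy (or with and without a header) shows whether that setting is worth mirroring in the crawler's <headers> or <proxySettings>.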

Finally, if all else fails, you can find out whether they offer a way to access your site via other authentication methods, or whether they can whitelist the crawler's IP, or some other workaround. There might be other network conditions you are not meeting with NTLM alone.

If you get a specific error from the crawler that suggests a bug, feel free to share your config and the exact error/logs here so we can look for a fix.

michaelt16 · 2024-03-13T19:20:52Z

Hi,

Thank you for your response. I switched the login to NTLM, but unfortunately it still gave me a 403 Forbidden error.

I was thinking of a possible solution, though I am not sure how well it would work: using some kind of Java browser bot alongside Norconex, since I was able to use a browser bot to log in to the SharePoint sites and retrieve their HTML content.
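One way to combine a browser bot with the crawler would be to let the bot perform the login, capture the session cookies it receives, and pass them to the crawler as a `Cookie` HTTP header. This is an untested assumption: it only works if the server accepts those cookies from a different client and they stay valid for the length of the crawl. The hypothetical helper below just formats captured cookies (for example, name/value pairs read from Selenium's `driver.manage().getCookies()`) into a single `Cookie` header value; the cookie names in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Norconex): turns cookies captured by a
// browser-automation login into one Cookie header value "a=1; b=2".
final class CookieHeader {

    static String format(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            String name = cookie.getKey();
            String value = cookie.getValue();
            // Reject names or values that would break the header's name=value; syntax.
            if (name.isEmpty() || name.contains("=") || name.contains(";")) {
                throw new IllegalArgumentException("Invalid cookie name: " + name);
            }
            if (value.contains(";")) {
                throw new IllegalArgumentException("Invalid cookie value for: " + name);
            }
            joiner.add(name + "=" + value);
        }
        return joiner.toString(); // empty string when no cookies were captured
    }

    public static void main(String[] args) {
        Map<String, String> captured = new LinkedHashMap<>();
        captured.put("sessionA", "abc123"); // hypothetical cookie names and values
        captured.put("sessionB", "xyz789");
        System.out.println("Cookie: " + format(captured));
    }
}
```

Running `main` prints `Cookie: sessionA=abc123; sessionB=xyz789`. A `LinkedHashMap` keeps the cookies in the order the bot returned them, which makes the header easy to compare with what the browser sent.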

stale · 2024-05-14T04:29:39Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label (applied automatically when inactive for too long) on May 14, 2024
