New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sitemap #666
Sitemap #666
Conversation
Looks great! No worries about the unit tests. Myself or someone else can assist with that, not a hard requirement currently. Will wait a short period for feedback on the 301/404 question, then merge in |
ps. Great job finding that sitemaps library! |
@fromcouch I think @john-shaffer's got some freshly-baked code in |
I'm just trying things out in https://github.com/WP2Static/wp2static-addon-advanced-crawling, and added a few changes I needed to make that work. The 301/404 handling is already implemented in https://github.com/WP2Static/wp2static/blob/3d6583b48660461a52d11cadb310698653ee8726/src/Crawler.php#L189-L214. You just need to do something like $crawler = new Crawler();
$response = $crawler->crawlURL( $url );
if ( $response ) {
// check $response['code']
} $response will be null if there are any connection issues or other non-HTTP errors. I'm seeing that even on localhost, because something, somewhere is trying to redirect to HTTPS. |
I have a preference to move logic to other classes. But ok, I will add this to Helper. Edit: crawlUrl method uses CrawlCache::rmUrl that will be dangerous because remove content from database, also uses Request::getUrl internally. Not a good idea. I will remove existUrl method but still using Request::getUrl |
You can remove the rmUrl call. I've already removed it in the advanced crawling add-on because it needs a record of which URLs have been detected. We'll have to deal with removing 404s some other way (maybe after crawling). |
I was changed Request and left Crawler untouched |
@fromcouch looking to merge this - is the user-agent string still pending change to |
@leonstafford changed user agent, could we make constant for this user agent string? |
@fromcouch good idea - can be used in other add-ons, too, when hitting APIs |
@leonstafford add to pending tasks XD |
Added new library in composer to made easy sitemap processing.
This modification only detects sitemaps, not crawl contest yet it and adds to URL list the sitemaps detected.
I have added a new function to Request.php to detect if URLs exists or not, please validate.
Finally, so sorry, i haven't time to learn phpunit and i can't added testing I will promise in near future write my own unit testing.
First part of #632 issue.