Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap #666

Merged
merged 9 commits into from Sep 23, 2020
Merged

Sitemap #666

merged 9 commits into from Sep 23, 2020

Conversation

fromcouch
Copy link
Contributor

Added new library in composer to made easy sitemap processing.

This modification only detects sitemaps, not crawl contest yet it and adds to URL list the sitemaps detected.

I have added a new function to Request.php to detect if URLs exists or not, please validate.

Finally, so sorry, i haven't time to learn phpunit and i can't added testing I will promise in near future write my own unit testing.

First part of #632 issue.

@leonstafford
Copy link
Contributor

Looks great!

No worries about the unit tests. Myself or someone else can assist with that, not a hard requirement currently.

Will wait a short period for feedback on the 301/404 question, then merge in

@leonstafford
Copy link
Contributor

ps. Great job finding that sitemaps library!

@leonstafford
Copy link
Contributor

@fromcouch I think @john-shaffer's got some freshly-baked code in master to help with the 301/404 issue you were having and we'll hear from him in future

@john-shaffer
Copy link
Contributor

I'm just trying things out in https://github.com/WP2Static/wp2static-addon-advanced-crawling, and added a few changes I needed to make that work.

The 301/404 handling is already implemented in https://github.com/WP2Static/wp2static/blob/3d6583b48660461a52d11cadb310698653ee8726/src/Crawler.php#L189-L214. You just need to do something like

$crawler = new Crawler();
$response = $crawler->crawlURL( $url );
if ( $response ) {
    // check $response['code']
}

$response will be null if there are any connection issues or other non-HTTP errors. I'm seeing that even on localhost, because something, somewhere is trying to redirect to HTTPS.

@fromcouch
Copy link
Contributor Author

fromcouch commented Sep 15, 2020

I have a preference to move logic to other classes. But ok, I will add this to Helper.

Edit: crawlUrl method uses CrawlCache::rmUrl that will be dangerous because remove content from database, also uses Request::getUrl internally. Not a good idea. I will remove existUrl method but still using Request::getUrl

@john-shaffer
Copy link
Contributor

You can remove the rmUrl call. I've already removed it in the advanced crawling add-on because it needs a record of which URLs have been detected. We'll have to deal with removing 404s some other way (maybe after crawling).

@fromcouch
Copy link
Contributor Author

I was changed Request and left Crawler untouched

@leonstafford
Copy link
Contributor

@fromcouch looking to merge this - is the user-agent string still pending change to WP2Static.com?

@fromcouch
Copy link
Contributor Author

@leonstafford changed user agent, could we make constant for this user agent string?

@leonstafford
Copy link
Contributor

@fromcouch good idea - can be used in other add-ons, too, when hitting APIs

@fromcouch
Copy link
Contributor Author

@leonstafford add to pending tasks XD

@leonstafford leonstafford merged commit 62f69e2 into elementor:master Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants