
sess_crawler-session re-redirect loop #47

Closed
csdougliss opened this issue Sep 10, 2014 · 5 comments
Comments

@csdougliss
Contributor

Hi Colin,

We use Nexcessnet Turpentine plugin (Varnish ESI) on our Magento website (along with your wonderful CM RedisSession module).

I've noticed an issue lately with Google Webmaster Tools. If I use Fetch as Google I often get a "redirected" or "temporarily unavailable" result. If I set Googlebot as my user agent in Firefox, I get a redirect loop.

When Varnish receives a first request it doesn't return the normal frontend cookie; it makes one up, and this works fine. However, I've noticed that when using the Googlebot user agent, no session cookie is generated at all. I believe this is the expected behavior, judging by the Nexcess demo site.

If I disable Cm_RedisSession, I can see the following in var/session:

sess_crawler-session

core|a:3:{s:23:"_session_validator_data";a:4:{s:11:"remote_addr";s:9:"127.0.0.1";s:8:"http_via";s:0:"";s:20:"http_x_forwarded_for";s:13:"195.26.57.129";s:15:"http_user_agent";s:72:"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";}s:13:"session_hosts";a:1:{s:13:"www.vax.co.uk";b:1;}s:8:"messages";O:34:"Mage_Core_Model_Message_Collection":2:{s:12:"*_messages";a:0:{}s:20:"*_lastAddedMessage";N;}}customer_uk|a:2:{s:23:"_session_validator_data";a:4:{s:11:"remote_addr";s:9:"127.0.0.1";s:8:"http_via";s:0:"";s:20:"http_x_forwarded_for";s:13:"195.26.57.129";s:15:"http_user_agent";s:72:"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";}s:13:"session_hosts";a:1:{s:13:"www.vax.co.uk";b:1;}}checkout|a:2:{s:23:"_session_validator_data";a:4:{s:11:"remote_addr";s:9:"127.0.0.1";s:8:"http_via";s:0:"";s:20:"http_x_forwarded_for";s:13:"195.26.57.129";s:15:"http_user_agent";s:72:"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";}s:13:"session_hosts";a:1:{s:13:"www.vax.co.uk";b:1;}}turpentine|a:3:{s:23:"_session_validator_data";a:4:{s:11:"remote_addr";s:9:"127.0.0.1";s:8:"http_via";s:0:"";s:20:"http_x_forwarded_for";s:13:"195.26.57.129";s:15:"http_user_agent";s:72:"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";}s:13:"session_hosts";a:1:{s:13:"www.vax.co.uk";b:1;}s:8:"messages";a:0:{}}

When I disable Cm_RedisSession, I no longer get a redirect loop when using Googlebot as my user agent.

nexcess/magento-turpentine#599

@csdougliss
Contributor Author

Could it have something to do with both servers (which are load balanced) writing to the same sess_crawler-session key (as they both use the same session name)?

The Redis session server is shared between them.

This is the redis-session output:

1410348399.529225 [3 127.0.0.1:55049] "SELECT" "3"
1410348399.529342 [3 127.0.0.1:55049] "HMSET" "sess_crawler-session" "data" ":l4:\xcf\x13\xf3\x1acore|a:4:{s:23:\"_session_validator_data\";$\x00\xf0\x1011:\"remote_addr\";s:9:\"127.0.0.1\x10\x00\xb08:\"http_via\x0f\x0000:\"\a\x00#20\x17\x00\xb0x_forwarded\n\x00\x00\x1c\x00\xf1\x0213:\"195.26.57.129\x15\x00\x1351\x00\xa0user_agent\x17\x00\xf0\x1e72:\"Mozilla/5.0 (compatible; Googlebot/2.1; +\x88\x00\x81://www.g\x1b\x00\xf2\x02.com/bot.html)\";}}\x00\x04\xf5\x00Phosts\xec\x00\x101\x10\x01\x00\x97\x00\x00;\x00\xe0vax.co.uk\";b:14\x00\xb08:\"last_url\x94\x00\"37\xab\x00\x03k\x00\x050\x00\xa3/cms/index\x06\x00\x00-\x00\xf2\x158:\"messages\";O:34:\"Mage_Core_Model_M \x00\xe1_Collection\":2\x92\x00v2:\"\x00*\x00_A\x00ba:0:{}_\x01\x00\x1a\x00\x00\x97\x00SAddedD\x00\xf0\x06\";N;}}customer_uk|a:5K\x00\x0f\xed\x01\xff\x18P2:\"idH\x01\x02f\x01\x05L\x01\xa4segment_id3\x023i:3\x91\x01\x1f}\xe8\x01d\xbfheckout|a:3\xe5\x01\xff\x1c\x0f\x96\x03a st\xf3\x03\x90uk_defaul\xb6\x01\x01\xe6\x03\x0f\x9b\x03\xff\x16\x8f}catalog\xee\x02\xff\x94\x90turpentinp\b\x0f\x9e\x04\xff*\x03\x9e\b\t\xb2\x04\x80a:0:{}}}" "lock" "0"
1410348399.529544 [3 127.0.0.1:55049] "HINCRBY" "sess_crawler-session" "writes" "1"
1410348399.529651 [3 127.0.0.1:55049] "EXPIRE" "sess_crawler-session" "7200"
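
As a way to check the shared-key theory, here is a minimal sketch (not from the issue itself) that reads the same hash back from each load-balanced node using the Credis client that Cm_RedisSession ships with. The "writes" field is the counter incremented by the HINCRBY shown above; if it keeps climbing on both nodes while only bots are browsing, every crawler hit is landing on this one session. The include path and database number are assumptions based on the MONITOR output.

<?php
// Hedged sketch: inspect the shared crawler session key from each app node.
// Adjust the path to wherever Credis ships in your install.
require_once 'lib/Credis/Client.php';

$redis = new Credis_Client('127.0.0.1', 6379);
$redis->select(3); // session DB 3, as seen in the SELECT above

// Shared write counter and remaining lifetime of the single crawler session.
echo "writes: " . $redis->hGet('sess_crawler-session', 'writes') . PHP_EOL;
echo "ttl:    " . $redis->ttl('sess_crawler-session') . PHP_EOL;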

@colinmollenhour
Owner

I don't know how Turpentine works, but in general having many users share one session id seems like a Bad Idea. I assume Turpentine is assigning the session id as "crawler-session", so a simple solution would be to disable this feature and have it return a GUID instead. Cm_RedisSession already has features for reducing the resources wasted on bots.
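
To make the "return a GUID" idea concrete, here is a minimal PHP 5.x-compatible sketch of generating a throwaway, unique session id per crawler request instead of the fixed "crawler-session". The function name is hypothetical; where exactly Turpentine builds the crawler session id is not shown in this thread.

<?php
// Hedged sketch: give each bot request its own disposable session id
// rather than one shared "crawler-session" id.
function generateCrawlerSessionId()
{
    // 26 lowercase hex characters, roughly the shape of a normal
    // Magento frontend session id.
    return substr(bin2hex(openssl_random_pseudo_bytes(16)), 0, 26);
}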

@csdougliss
Contributor Author

All crawlers matching the crawler regex (e.g. Googlebot, Bingbot, etc.) would indeed have the same session id of crawler-session.

Is there any way to cater for this on the Redis side? I can disable the feature on the Turpentine side, and this does in fact work; it just means session cookies are generated for bots, which normally isn't needed.

If that's what's needed I will do that.

I do wonder why it works with the default session handler; however, when testing I am down to only one node, so I'm not using the shared Redis session server, just local sessions on the same host.

@colinmollenhour
Owner

Not knowing any more about Turpentine, I'd say that is what you should do. The bot handling in Cm_RedisSession is very effective (depending on how it is configured); it cut my sessions down by over 60% in one case. So you may end up with dozens of bot sessions instead of one, but that is inconsequential. The bot handling also works well with crawlers that cloak themselves as real users.
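
Purely as an illustration of the idea behind that bot handling (not the module's actual code), the general approach is to give sessions from crawler user agents a much shorter lifetime so they expire quickly instead of piling up. The regex and lifetimes below are placeholders.

<?php
// Illustrative only: shorten the TTL for sessions created by bot user agents.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = (bool) preg_match('/bot|crawl|slurp|spider/i', $userAgent);

// e.g. expire bot sessions after a few minutes, real users after two hours
$lifetime = $isBot ? 300 : 7200;
// $redis->expire($sessionKey, $lifetime); // $redis and $sessionKey are placeholders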

@csdougliss
Contributor Author

Thanks :)
