404 Error if tx_realurl_urldata -> original_url contains cHash parameter #315

Closed
Elpiojo83 opened this issue Nov 2, 2016 · 22 comments

Comments

@Elpiojo83

commented Nov 2, 2016

Whenever the original_url contains a cHash, I get a 404 error. The error does not occur if I am logged in to the TYPO3 backend.

@dmitryd

Owner

commented Nov 2, 2016

Please update to the latest version before posting a report. With the latest version, it makes no difference whether you are logged in to the BE or not.

@Elpiojo83

Author

commented Nov 2, 2016

Sorry, but I am using version 2.1.4, which seems to be the latest available in TER.

@Elpiojo83

Author

commented Nov 2, 2016

I found that [FE][pageNotFoundOnCHashError] = 1 seems to be responsible for that behavior.

@dmitryd

Owner

commented Nov 3, 2016

> Sorry, but I am using version 2.1.4, which seems to be the latest available in TER.

This does not match this:

> The error does not occur if I am logged in to the TYPO3 backend.

Realurl worked differently before 2.1.4 depending on BE login. Now it works identically. So if the behavior changes depending on BE login, this is either something else or an old realurl version.

@dmitryd

Owner

commented Nov 3, 2016

Oh, is it possible that you enabled an option to include id in the cHash in the TYPO3 configuration? If yes, all your cHashes will become invalid in tx_realurl_urldata.

@kitzberger

commented Jan 22, 2017

We're having the same issue here.

Every now and then we're experiencing single pages being suddenly unreachable until we delete certain entries from said table. A recent example:

mysql> select * from tx_realurl_urldata where speaking_url like '%en/home/' and rootpage_id=873\G
*************************** 1. row ***************************
              uid: 46
              pid: 0
           crdate: 1482483101
          page_id: 956
      rootpage_id: 873
     original_url: L=&id=956
     speaking_url: en/home/
request_variables: {"id":"956","L":""}
           expire: 0
*************************** 2. row ***************************
              uid: 26048
              pid: 0
           crdate: 1483794298
          page_id: 956
      rootpage_id: 873
     original_url: L=0&cHash=9f53d73f2ddab4fc9bf9f82de587c425&id=956
     speaking_url: en/home/
request_variables: {"id":"956","L":"0","cHash":"9f53d73f2ddab4fc9bf9f82de587c425"}
           expire: 1487408328
*************************** 3. row ***************************
              uid: 2850
              pid: 0
           crdate: 1482493047
          page_id: 956
      rootpage_id: 873
     original_url: L=0&id=956
     speaking_url: en/home/
request_variables: {"id":"956","L":"0"}
           expire: 0
3 rows in set (0.01 sec)

As soon as we delete uid=26048 the page is accessible again. Any idea why that is exactly?

Should there be any cHash-Parameter in that table/column at all? If not should we check any weird link generation that led to that entry? Thanks in advance!

@kitzberger

commented Jan 23, 2017

I think now that the cHash parameter should not be in there. @dmitryd any idea how it might have ended up in there?

@dmitryd

Owner

commented Jan 27, 2017

It is very simple. Realurl takes the URL from TYPO3 "as is" and converts it from its /index.php form to a speaking URL. If TYPO3 provides a URL with cHash, realurl converts that. If TYPO3 provides a URL without cHash, realurl also converts that. Realurl does not care because it does not know if cHash should be there or not. It converts one-to-one.

Normally if there is /en/home/, which maps to L=0&cHash=9f53d73f2ddab4fc9bf9f82de587c425&id=956 and L=0&id=956, it means that both these URLs were passed from TYPO3 to realurl. Realurl does its work and converts them.

Check your sites and find what makes TYPO3 pass L=0&cHash=9f53d73f2ddab4fc9bf9f82de587c425&id=956 to realurl. This is your responsibility, not realurl's, because it is you who makes these URLs. You see, it is the original URL!

I made the same error once in the past. I had a language selector that used a custom typolink configuration with useCacheHash = 1. That created similar problems. The rootline created a link as index.php?id=1&L=0, but the language selector produced index.php?id=1&L=0&cHash=ec77b708dfb5aae6ca49e59208389a60. This was my error, which I fixed, and I got rid of the wrong URLs. It had absolutely nothing to do with realurl.

@kitzberger

commented Jan 27, 2017

Thanks for your feedback, @dmitryd. Now I can start debugging and know what I'm looking for ;-)

Just to be clear: a news single original_url should then always contain a cHash and the ones lacking cHash are the ones to get rid of, right?

Why is it troublesome having the same original_url twice (once with and once without cHash) in that table anyway? I imagine RealURL will pick the first entry it finds, right? Why is it resulting in a 404 then?

@dmitryd

Owner

commented Feb 6, 2017

> Just to be clear: a news single original_url should then always contain a cHash and the ones lacking cHash are the ones to get rid of, right?

Correct.

> Why is it troublesome having the same original_url twice (once with and once without cHash) in that table anyway? I imagine RealURL will pick the first entry it finds, right? Why is it resulting in a 404 then?

Ordering is not that simple. It depends on:

  1. expiration (non-expired entries first; if none exists, then the oldest URL, which is redirected to a newer one to let search engines update their index)
  2. cHash. If there are several URLs, then the URL with cHash is preferred. This works properly for most extensions but may break with an incorrectly made language selector. But there are more cases with extensions than with language selectors.
  3. language (if there is nothing for the currently detected language, then the default is checked, which is useful for content fallback)

It is complicated because this search happens when TYPO3 is not initialised yet. There is no page id, no TypoScript, nothing else. So realurl has to do a lot of guesswork and hope that it is correct.
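
To make the consequence of that ordering concrete, here is a minimal Python sketch of rules 1 and 2 applied to the three rows posted earlier in this thread. It is only a model of the behavior described above, not RealURL's actual code: the names `UrlDataRow` and `pick_entry` are made up for illustration, treating `expire = 0` as "never expires" is an assumption, and the redirect fallback for expired entries and the language rule are left out.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlDataRow:
    uid: int
    request_variables: str  # JSON string, as stored in tx_realurl_urldata
    expire: int             # assumption: 0 means "never expires"


def is_live(row: UrlDataRow, now: int) -> bool:
    """Rule 1: an entry counts as live if it has no expiry or has not expired yet."""
    return row.expire == 0 or row.expire > now


def has_chash(row: UrlDataRow) -> bool:
    """Rule 2 input: does the stored request contain a cHash variable?"""
    return "cHash" in json.loads(row.request_variables)


def pick_entry(rows: List[UrlDataRow], now: int) -> Optional[UrlDataRow]:
    live = [row for row in rows if is_live(row, now)]
    if not live:
        return None  # the redirect fallback for expired entries is not modelled
    # False sorts before True, so entries with cHash win; the uid tie-break
    # is an arbitrary choice made for this sketch.
    return min(live, key=lambda row: (not has_chash(row), row.uid))


# The three rows for en/home/ shown earlier in this thread.
rows = [
    UrlDataRow(46, '{"id":"956","L":""}', 0),
    UrlDataRow(26048, '{"id":"956","L":"0","cHash":"9f53d73f2ddab4fc9bf9f82de587c425"}', 1487408328),
    UrlDataRow(2850, '{"id":"956","L":"0"}', 0),
]

NOW = 1485000000  # a timestamp in January 2017, before row 26048 expires
print(pick_entry(rows, NOW).uid)
print(pick_entry([r for r in rows if r.uid != 26048], NOW).uid)
```

Under these assumptions the cHash row (uid 26048) is selected while it is live, and a row without cHash (uid 46) is selected once it is deleted. That would be consistent with the page becoming reachable again after deleting that row: a stale cHash is no longer handed to TYPO3.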

@mneuhaus

commented Apr 3, 2017

We're running into similar issues with TYPO3 7.6.16 and RealURL 2.1.9.
At some point a cache entry looking like L=1&cHash=250d8279401ba2228e586aa046da79a2&id=1 ends up in the table tx_realurl_urldata. This entry seems to be "favored" over an additional entry that exists without cHash, like L=1&id=1. At some point the cHash becomes invalid, which results in a 404 in the frontend.

If I delete this one specific cache entry, everything is fine again. I can create a custom entry in the tx_realurl_urldata table with an invalid cHash to reproduce the 404 issue.

I get that this isn't really a "bug" inside RealURL, because it can't guess what it gets as URL parameters and thus takes everything "as is".
My problem is that I'm unable to locate the "culprit" that actually creates the page link with cHash so far. Things I've tried:

  • regular page load
  • open frontend page from Backend
  • view frontend page in backend
  • ke_search indexing, indexed_search indexing

This issue is quite annoying currently, which is why I've implemented a dirty workaround for the site that is experiencing this issue:

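    /**
     * Dirty workaround: delete every tx_realurl_urldata row whose request
     * variables contain a cHash while the speaking URL itself does not.
     */
    public function fixBrokenRealUrlCacheCommand()
    {
        $GLOBALS['TYPO3_DB']->exec_DELETEquery('tx_realurl_urldata', 'request_variables LIKE "%cHash%" AND speaking_url NOT LIKE "%cHash%"');
    }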

PS: I've also tried a complete RealURL cache flush; it took about 1.5 days for the next 404 to appear.
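
A narrower diagnostic may help before deleting anything. The workaround above appears to remove every cached row that carries a cHash, including rows for URLs that should have one (news single views, as confirmed earlier in this thread). The sketch below only lists cHash rows that share a speaking URL with a non-cHash row. It runs against an in-memory SQLite stand-in with a reduced column set; the first three rows come from the example posted earlier, the fourth row is hypothetical, and the query would need checking against the real MySQL table before use.

```python
import sqlite3

# In-memory stand-in for the real table, reduced to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx_realurl_urldata ("
    "uid INTEGER, rootpage_id INTEGER, speaking_url TEXT, request_variables TEXT)"
)
conn.executemany(
    "INSERT INTO tx_realurl_urldata VALUES (?, ?, ?, ?)",
    [
        (46, 873, "en/home/", '{"id":"956","L":""}'),
        (26048, 873, "en/home/",
         '{"id":"956","L":"0","cHash":"9f53d73f2ddab4fc9bf9f82de587c425"}'),
        (2850, 873, "en/home/", '{"id":"956","L":"0"}'),
        # Hypothetical row: a URL that exists only in its cHash form.
        (9001, 873, "en/news/example/", '{"id":"957","L":"0","cHash":"hypothetical"}'),
    ],
)

# cHash rows that share a speaking URL with at least one non-cHash row.
SHADOWING_ROWS = """
SELECT a.uid, a.speaking_url
FROM tx_realurl_urldata AS a
WHERE a.request_variables LIKE '%"cHash"%'
  AND EXISTS (
      SELECT 1
      FROM tx_realurl_urldata AS b
      WHERE b.rootpage_id = a.rootpage_id
        AND b.speaking_url = a.speaking_url
        AND b.request_variables NOT LIKE '%"cHash"%'
  )
ORDER BY a.uid
"""

suspects = conn.execute(SHADOWING_ROWS).fetchall()
print(suspects)
```

With the sample data only uid 26048 is reported; the hypothetical news row is left alone because no non-cHash row exists for its speaking URL.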

@dmitryd

Owner

commented Apr 3, 2017

> At some point a cache entry looking like L=1&cHash=250d8279401ba2228e586aa046da79a2&id=1 ends up in the table tx_realurl_urldata

Remove useCacheHash = 1 from your language selector.

@dmitryd

Owner

commented Apr 3, 2017

> My problem is that I'm unable to locate the "culprit" that actually creates the page link with cHash so far.

Use the realurl_trace extension. It provides a stack trace for URLs that match a given regular expression.

@mneuhaus

commented Apr 8, 2017

Hey @dmitryd, thanks for the feedback. I disabled the cacheHash on the language selector; sadly, "broken" cache entries are still popping up.
I'd love to try the realurl_trace extension, but it's a bit pointless because I fail to find the spot that causes the faulty cache entry. I'll keep looking and will try to give feedback when I manage to find out more!

@bucha

commented Apr 18, 2017

Hi!

These lines are mainly for reference.

When I had this problem recently, I found out that an external marketing agency had launched a campaign on Google AdWords. This resulted in gclid parameters (Google Click Identifier) in the URLs. As with almost every parameter, this means that TYPO3 calculates a new cHash.

The problem here is that RealURL ignores this parameter alongside a few others:
'cache/ignoredGetParametersRegExp' => '/^(?:gclid|utm_[a-z]+|pk_campaign|pk_kwd|TSFE_ADMIN_PANEL.*)$/',

TYPO3 (by default) does not (LocalConfiguration.php):
'cHashExcludedParameters' => 'L, pk_campaign, pk_kwd, utm_source, utm_medium, utm_campaign, utm_term, utm_content'

So TYPO3 includes the gclid parameter in the cHash calculation, while RealURL ignores it but still receives the cHash. Therefore you end up with duplicate entries in the DB (cHash vs. non-cHash version).

All you have to do is add gclid to the list of parameters excluded from the cHash calculation, such as:
'cHashExcludedParameters' => 'gclid, L, pk_campaign, pk_kwd, utm_source, utm_medium, utm_campaign, utm_term, utm_content',
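
To check a setup for other parameters of this kind, a small Python sketch can compare the two settings quoted above. The regular expression and the exclusion list are copied from this comment (without the PHP regex delimiters); the function name `mismatched_parameters` and the list of candidate parameter names are made up for illustration, with `utm_id` standing in for any name that matches the `utm_[a-z]+` wildcard but is missing from the explicit TYPO3 list.

```python
import re

# RealURL setting quoted above, without the surrounding PHP "/" delimiters.
REALURL_IGNORED = re.compile(
    r"^(?:gclid|utm_[a-z]+|pk_campaign|pk_kwd|TSFE_ADMIN_PANEL.*)$"
)

# TYPO3 default list quoted above (before gclid is added).
TYPO3_EXCLUDED = (
    "L, pk_campaign, pk_kwd, utm_source, utm_medium, "
    "utm_campaign, utm_term, utm_content"
)


def mismatched_parameters(names, ignored_regex, excluded_list):
    """Return names that RealURL ignores but TYPO3 still includes in the cHash."""
    excluded = {item.strip() for item in excluded_list.split(",") if item.strip()}
    return sorted(
        name for name in names
        if ignored_regex.match(name) and name not in excluded
    )


# Illustrative parameter names to test; replace with names seen in your own logs.
candidates = ["gclid", "utm_source", "utm_id", "pk_campaign", "L", "id"]
print(mismatched_parameters(candidates, REALURL_IGNORED, TYPO3_EXCLUDED))
```

With the default list this reports `gclid` and `utm_id`. After adding `gclid` to `cHashExcludedParameters` as shown above, only the wildcard-only name remains.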

Now this is not the end of the story…

Just when I thought this problem was dealt with, it hit me again and again. After endless hours of depressing bug hunting, I accidentally found out that the site's search was triggering the error.
This was because the search's index tables still contained entries that held the gclid parameter.
When those search result items were listed, the typolink including gclid and cHash was created, and finally the duplicate entry in tx_realurl_urldata was inserted.

So I decided to simply delete the broken entries in the search index:
DELETE FROM index_phash WHERE cHashParams LIKE "%s:5:\"gclid\"%";

Hope this helps.

@romm

Contributor

commented Jun 1, 2017

@bucha thanks a lot for taking the time to write your message, it probably saved us hours of debugging work!

@houmark

commented Jun 27, 2017

I have also encountered this issue. Thanks a lot for the write-up, @bucha; time saved here as well. I guess adding gclid should help keep things consistent, and if other unusual campaign parameters show up in URLs, they should be ignored by both TYPO3 and realurl so that they are not a problem?

If that's the case, maybe @dmitryd should consider getting gclid added to the default core list as well, or removing it from realurl for consistency, to avoid what we are seeing here (maybe only if the core does not have it in the list; it should be possible for realurl to read the core configuration). This "issue" is pretty annoying because it happens "randomly", and in theory it could be seen as a security issue, as one could spam various TYPO3 sites with a gclid parameter in the URL to effectively disable the front page of any language version of that website. One would not gain much from it, but that's not the point; people also don't gain much from defacing sites.

@dmitryd

Owner

commented Jul 3, 2017

@houmark

> maybe @dmitryd should consider getting gclid added to the default core list as well, or removing it from realurl for consistency, to avoid what we are seeing here (maybe only if the core does not have it in the list; it should be possible for realurl to read the core configuration)

You can also propose a change to the core. Anybody can. You need it, so you do it 😉 I have enough work with realurl already.

As to removal from realurl, the answer is "no" because removing this parameter from realurl will cause various issues.

@dmitryd dmitryd closed this in 8cfb6f0 Jul 6, 2017

@dmitryd

Owner

commented Jul 6, 2017

RealURL will not check & add gclid to excluded cHash parameters if TYPO3 does not do that.

@Elypson

commented Aug 2, 2017

Might change this "not" to "now"; I had to read it several times to understand. Sounds good, though.
In my configuration, no utm_* was set in the TYPO3 configuration, so those links were an issue as well. Also, when realurl ignores utm_* and you can only set non-wildcard utm parameters in TYPO3, this is quite a big security hole. Whoever uses TYPO3 and realurl might be taken down easily.

@romm

Contributor

commented Aug 2, 2017

Hi @Elypson, you might want to take a look at #491, it has already been merged and will be part of the next release.

@Elypson

commented Aug 2, 2017

You are right! I have added a comment there, as well

cepheiVV added a commit to cepheiVV/TYPO3CMS-Guide-FrontendLocalization that referenced this issue Oct 20, 2017
Update Index.rst
Lot of people seems to have issues with realUrl and `useCacheHash = 1`.
Dmitry (creater of realurl) also said it's causing issues, so he removed it from his lang nav. 
dmitryd/typo3-realurl#315