Find a strategy to eliminate double indexing of similar pages. #406

Open
assem-ch opened this Issue Apr 18, 2014 · 9 comments

@assem-ch
Member

assem-ch commented Apr 18, 2014

  • Add "no index" to search pages whose URL differs only in parameters that do not affect the keywords in the output, such as: vocalized, script, view...
  • Add "no follow" to clickable links that should not be indexed, such as Juz:N, Hizb:2

This is the report from Google Webmaster Tools about this issue:

Googlebot encountered problems while crawling your site http://www.alfanous.org/.

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

More information about this issue

Here's a list of sample URLs with potential problems. However, this list may not include all problematic URLs on your site.

    http://www.alfanous.org/en/translation/?sortedby=score&recitation=14&query=ille-btig%C3%A2e&translation=en.transliteration&page=1&unit=translation&view=default
    http://www.alfanous.org/en/translation/?sortedby=mushaf&recitation=18&query=hixiyana&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/id/translation/?sortedby=score&recitation=14&query=lang%3Apt+AND+gid%3A1702&translation=en.transliteration&page=1&unit=translation&view=default
    http://www.alfanous.org/pt/aya/?query=%D8%A2%D8%AA%D9%8A%D9%86%D8%A7%D9%87%D8%A7&page=1&unit=aya
    http://www.alfanous.org/en/translation/?query=gid:2633%20AND%20lang:de&page=1&unit=translation
    http://www.alfanous.org/fr/translation/?query=kjenner&page=1&unit=translation
    http://www.alfanous.org/es/translation/?query=Hereafter)%3B&page=1&unit=translation
    http://www.alfanous.org/fr/?query=sura%3A%22At-Taghabun%22%20%2B%20aya_id%3A12&page=1&unit=aya
    http://www.alfanous.org/en/aya/?query=%D9%88%D9%8E%D9%85%D9%90%D9%86%D9%8E&page=8&unit=aya
    http://www.alfanous.org/ar/translation/?query=befall.&page=1&unit=translation
    http://www.alfanous.org/ku/translation/?sortedby=mushaf&recitation=18&query=ilmu&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/fr/aya/?query=aya_%3A%D8%B8%D9%8E%D8%A7%D9%84%D9%90%D9%85%D9%8E%D8%A9%D9%8C&page=1&unit=aya
    http://www.alfanous.org/ja/aya/?query=a_w%3A22&page=1&unit=aya
    http://www.alfanous.org/en/translation/?sortedby=score&recitation=14&query=biyun&translation=en.transliteration&page=1&unit=translation&view=default
    http://www.alfanous.org/ar/aya/?query=sura%3A%22%D8%A7%D9%84%D9%85%D8%A7%D8%A6%D8%AF%D8%A9%22+%2B+aya_id%3A75&page=1&unit=aya
    http://www.alfanous.org/fr/translation/?sortedby=mushaf&recitation=18&query=%5Buninhibited%5D&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/en/translation/?query=budur.%22&%22=&-=&page=1&unit=translation
    http://www.alfanous.org/fr/translation/?query=grain,&page=4&unit=translation
    http://www.alfanous.org/ku/aya/?query=%3E%D8%A7%D9%82%D8%B0%D9%81%D9%8A%D9%87&page=1&unit=aya
    http://www.alfanous.org/ku/aya/?sortedby=mushaf&recitation=18&query=%D9%8A%D9%8E%D8%B4%D9%8E%D8%A7%D8%A1%D9%8F&translation=en.ahmedali&page=1&unit=aya&view=default
    http://www.alfanous.org/es/translation/?sortedby=mushaf&recitation=18&query=Noastr%C4%83%3F&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/ms/aya/?query=%D8%A2%D9%8A%D8%A9_%3A%D9%85%D9%8F%D8%B9%D9%8E%D9%85%D9%91%D9%8E%D8%B1%D9%8D&page=1&unit=aya
    http://www.alfanous.org/en/translation/?query=lang%3Aes+AND+gid%3A4846&page=1&unit=translation
    http://www.alfanous.org/id/translation/?query=gid%3A623+AND+lang%3Aen&page=1&unit=translation
    http://www.alfanous.org/fr/?query=aya_:%D9%86%D9%8F%D9%83%D9%8E%D9%81%D9%91%D9%90%D8%B1%D9%92&%22=&page=1&unit=aya
    http://www.alfanous.org/en/translation/?query=gid:5524%20AND%20id:ml.abdulhameed&page=1&unit=translation
    http://www.alfanous.org/en/translation/?sortedby=mushaf&recitation=18&query=lang%3Ade+AND+gid%3A600&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/ml/translation/?query=lang%3Aen+AND+gid%3A4078&page=1&unit=translation
    http://www.alfanous.org/ku/translation/?sortedby=score&recitation=14&query=gid%3A5432&translation=en.transliteration&page=1&unit=translation&view=default
    http://www.alfanous.org/ku/translation/?sortedby=mushaf&recitation=18&query=ya%5D&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/ja/aya/?query=sura%3A%22Al-An'am%22%20%2B%20aya_id%3A53&page=1&unit=translation
    http://www.alfanous.org/en/translation/?query=rrug%C3%ABn&page=10&unit=translation
    http://www.alfanous.org/id/aya/?query=%3E%D8%B9%D9%86%D8%AF&page=1&unit=aya
    http://www.alfanous.org/es/?query=a_l%3A177&%22=&page=1&unit=aya
    http://www.alfanous.org/en/translation/?sortedby=mushaf&recitation=18&query=%D8%A7%D9%86%D8%B8%D8%B1%D9%88%D9%86%D8%A7%C2%BB&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/id/translation/?query=gid:3424%20AND%20id:en.yusufali&page=1&unit=translation
    http://www.alfanous.org/pt/translation/?query=rozpowszechni%C5%82o&page=1&unit=translation
    http://www.alfanous.org/es/translation/?query=feekum&page=1&unit=translation
    http://www.alfanous.org/fr/translation/?query=op%2C&page=1&unit=translation
    http://www.alfanous.org/ja/aya/?action=search&query=%D9%82%D9%8F%D9%88%D9%91%D9%8E%D8%A9%D9%8E&sortedby=relevance&page=1&unit=aya
    http://www.alfanous.org/en/?query=%3E%D9%88%D8%A7%D8%AC%D8%B9%D9%84&page=1&unit=aya
    http://www.alfanous.org/ar/?query=%D9%83_%D8%A2%3A19&page=8&unit=aya
    http://www.alfanous.org/id/translation/?query=gid%3A295+AND+id%3Atr.yuksel&-=&page=1&unit=translation
    http://www.alfanous.org/ar/translation/?query=gid:5812%20AND%20id:en.yusufali&page=1&unit=translation
    http://www.alfanous.org/ar/aya/?action=search&query=%D8%A2%D9%8A%D8%A9_:%D9%88%D9%8E%D9%85%D9%8E%D9%84%D9%8E%D8%A6%D9%90%D9%87%D9%90&sortedby=relevance&page=1&unit=aya
    http://www.alfanous.org/en/translation/?query=%C5%9F%C3%BCkretmeyecekler&page=1&unit=translation
    http://www.alfanous.org/pt/translation/?sortedby=mushaf&recitation=18&query=deseja&translation=en.ahmedali&page=1&unit=translation&view=default
    http://www.alfanous.org/pt/aya/?query=%D8%A3%D9%81%D8%B3%D8%AF%D9%88%D9%87%D8%A7&page=1&unit=aya
    http://www.alfanous.org/en/translation/?sortedby=mushaf&recitation=18&query=lang%3Aen+AND+gid%3A5933&translation=en.ahmedali&page=1&unit=translation&view=default
@mdebbar

Contributor

mdebbar commented Apr 18, 2014

Can we just redirect bots to common search pages instead of adding "no index"? For example: if a bot is requesting "?query=Allah&view=default" redirect it to "?query=Allah".

@assem-ch

Member Author

assem-ch commented Apr 18, 2014

Is there a way to do that?

@mdebbar

Contributor

mdebbar commented Apr 18, 2014

Yes. We have to read the User-Agent (from the HTTP headers) and compare it with a list of bot user agents: http://user-agent-string.info/list-of-ua/bots. For example, the user agent of Google's bot contains "Googlebot".
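
A minimal sketch of the User-Agent check described above, in Python. The token list `BOT_TOKENS` and the function name `is_bot` are assumptions made for illustration; they are not taken from the Alfanous code base, and a real deployment would probably rely on a maintained bot list such as the one linked above.

```python
# Hypothetical, deliberately short list of substrings that identify crawlers.
BOT_TOKENS = ("googlebot", "bingbot", "yandexbot")


def is_bot(user_agent):
    """Return True if the User-Agent header contains a known bot token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    # Substring match, because crawlers embed their name in a longer string.
    return any(token in ua for token in BOT_TOKENS)
```

For example, `is_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")` is true, while an ordinary browser string or a missing header yields false.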

@assem-ch

Member Author

assem-ch commented Apr 18, 2014

Will the bot think that it is crawling the start URL or the end URL? If it assumes it is indexing different URLs with the same content, that will look like "fake content".

@mdebbar

Contributor

mdebbar commented Apr 18, 2014

When they hit the start URL, they shouldn't get any content. We should respond with an HTTP redirect (code 302) so they go and load the end URL. Maybe redirection is a bad idea, I'm not sure.
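
The redirect idea could be sketched as a small WSGI application in Python. Everything named here is an assumption for illustration: `TRIVIAL_PARAMS` is a hypothetical set based on the parameters mentioned at the top of this issue, `BOT_TOKENS` is a one-entry placeholder, and `bot_redirect_target` and `app` are not part of the Alfanous code base.

```python
from urllib.parse import parse_qsl, urlencode

BOT_TOKENS = ("googlebot",)                        # placeholder bot list
TRIVIAL_PARAMS = {"view", "script", "vocalized"}   # hypothetical: do not change the keywords


def bot_redirect_target(path, query_string, user_agent):
    """Return the URL a bot should be redirected to, or None to serve the page."""
    if not any(token in (user_agent or "").lower() for token in BOT_TOKENS):
        return None
    params = parse_qsl(query_string, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k not in TRIVIAL_PARAMS]
    if len(kept) == len(params):
        return None  # nothing to strip; returning None avoids a redirect loop
    return path + ("?" + urlencode(kept) if kept else "")


def app(environ, start_response):
    """WSGI entry point: 302 for bots on non-canonical URLs, 200 otherwise."""
    target = bot_redirect_target(
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
        environ.get("HTTP_USER_AGENT", ""),
    )
    if target is not None:
        start_response("302 Found", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"search results would be rendered here"]
```

With these assumptions, a request from Googlebot for `/en/?query=Allah&view=default` is answered with a 302 pointing at `/en/?query=Allah`, and the same request from a regular browser is served normally.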

@assem-ch

Member Author

assem-ch commented Apr 18, 2014

I don't know which is better, but for the URLs shareable by users, redirecting seems better.

Here is Google's documentation on redirecting Googlebot to the smartphone version of pages:
https://developers.google.com/webmasters/smartphone-sites/redirects

@sneetsher

Member

sneetsher commented May 10, 2017

I see two issues:

  • multi-language support (URL switch/sub-directory)

    Related doc to fix it: Multi-regional and multilingual sites & Use hreflang for language and regional URLs

      ##for each supported language xx
      <link rel="alternate" href="http://www.alfanous.org/xx/..." hreflang="xx" />
      ##default too
      <link rel="alternate" href="http://www.alfanous.org/..." hreflang="x-default" />
    
  • trivial query parameters (query seems to be the only important one, as mdebbar mentioned)

    Related doc to fix it: Use canonical URLs

      ##canonical page with only `query=` parameter
      <link rel="canonical" href="https://www.alfanous.org/?query=..." />
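
A rough Python sketch of how the two sets of `<link>` tags above could be generated for a search page. The function name `head_links`, the `LANGS` subset, and the use of a single `BASE` host for every tag are assumptions for illustration only; the actual templates in the web interface may differ.

```python
from html import escape
from urllib.parse import urlencode

BASE = "http://www.alfanous.org"   # host as written in the alternate links above
LANGS = ("ar", "en", "fr")         # hypothetical subset of the supported languages


def head_links(query, langs=LANGS, base=BASE):
    """Return canonical and hreflang <link> tags that keep only `query=`."""
    qs = "?" + urlencode({"query": query})
    # Canonical page: language-neutral URL carrying only the query parameter.
    links = ['<link rel="canonical" href="%s" />' % escape(base + "/" + qs)]
    # One alternate per supported language sub-directory.
    for lang in langs:
        href = escape("%s/%s/%s" % (base, lang, qs))
        links.append('<link rel="alternate" href="%s" hreflang="%s" />' % (href, lang))
    # Default alternate for users whose language is not listed.
    links.append(
        '<link rel="alternate" href="%s" hreflang="x-default" />' % escape(base + "/" + qs)
    )
    return links
```

Calling `head_links("Allah", langs=("ar", "en"))` returns four tags: the canonical link, one alternate each for `ar` and `en`, and the `x-default` alternate, all pointing at URLs whose only parameter is `query=Allah`.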
    

Update to add another one:

sneetsher added a commit to sneetsher/alfanous that referenced this issue May 11, 2017

Add canonical link and clean alternative link in WUI, related to Alfanous-team#406 SEO issue

* Add canonical link to HTML header with only 'query' as parameter
* Remove the trailing '?' from alternative link in HTML header for empty requests

sneetsher added a commit to sneetsher/alfanous that referenced this issue May 22, 2018

Add pagination rel-links to all searches, Alfanous-team#406
 Apply escape characters filter on it
@sneetsher

Member

sneetsher commented May 22, 2018

Canonical, multi-language & pagination rel-links added in the linked PR.

  • "no index" seems unnecessary; the canonical link is expected to fix it.
  • "no follow" for Juz:N, Hizb:N, sura_arabic:الفاتحة ...
    I'm OK with this as a temporary workaround, but it seems to me that the correct way is to have a specific result page template for such pre-ordered collections of ayahs.
    Ayahs should be ordered as in the noble Quran by default, with no separation between ayahs. That's what the user would expect.
@assem-ch

Member Author

assem-ch commented May 23, 2018

Grouping by juz, hizb, and sura is intended to be covered by feature #302.

assem-ch added a commit that referenced this issue May 25, 2018

Merge pull request #481 from sneetsher/master
Add meta links, tags and microdata, related to SEO, issue #406