Crawlable removed? #169

polyclick · 2013-03-12T17:22:35Z

Why is the crawlable feature removed in 1.6? Any further explanation would be awesome. ;)

asual · 2013-03-14T19:24:51Z

I don't consider the crawlable mechanism a good long-term investment. The new History API removes the need for hashes completely and provides a much better base for search engine indexing. Here are sample queries that show how the content can be indexed:
https://www.google.com/search?q=site:asual.com+%22About+content.%22
http://search.yahoo.com/search?p=site%3Aasual.com+%22About+content.%22

benjaminhorner · 2013-04-11T13:13:24Z

Hi. I am not sure what your explanation demonstrates…
As far as I can see, Google is still recommending the #! method (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started).
What is the alternative you are talking about?

dbeja · 2013-06-12T15:17:41Z

If, for example, a user visiting the site with IE9 (there are still many) pastes the URL into Facebook to share it, the site is crawled wrongly, because the URL doesn't have the hashbang for old browsers.

trusktr · 2013-08-02T11:07:54Z

@asual As far as I know, the Google search bot indexes only the initial content it receives for the page; it doesn't wait for JavaScript to execute or for the script to load AJAX content. So in the example you linked, the initial content is "Home content", which is what should be indexed. How is the search bot able to index the "About content"?

Specifically, what needs to be done to make this work? I'm sure if you tell people to just "use the History API", it won't work for everyone.

If your About tab is indeed being indexed, then maybe the search bot is actually executing JavaScript and waiting some number of seconds to index the final generated content. But I highly doubt Google would make their servers do this for every single page on the internet; it would require an extremely high use of CPU and other resources on Google's end. That is presumably why they delegate this to developers to implement on their own servers, and why Google suggests using a headless browser to create snapshots.

Why would Google suggest that we use a headless browser if Google is already using one? Something's up here. How is your linked example not just a special (accidental?) case?

Maybe it is true: Google has decided to parse and execute the JavaScript on every single page of the internet, but I seriously doubt it. You'd need a really REALLY enormous computer to do that, probably more powerful than any computer we have today. There are billions of websites in existence, most of them using JavaScript. How can Google possibly index billions of sites (including executing all of their JavaScript)?

Let's think about it. Retrieving static HTML and indexing the content without executing JavaScript is one thing, but retrieving HTML and then executing JavaScript must be on average 10 to 10000 times slower depending on the site, and prone to crashes, lockups, endless loops, and take-advantage-of-Google hacks.

Is Google really doing this???

Just curious. Food for thought. :)

trusktr · 2013-08-02T23:40:13Z

Ok, I thought about this a little more:

Google will be able to index your content if you use the History API if and only if you send identical content from the server when that URL is visited in the browser, not if you rely on reading the URL then using AJAX to get and show content based on the URL.

Example: your site uses history.pushState and loads content via AJAX. If a user visits the URL you saved to the history, the server sends back the page in the same form as it appeared when it was generated by AJAX. In this case, the History API works fine. But if your app relies solely on AJAX to generate content (no matter what URL you visit, the server always sends exactly the same initial HTML, and the app then retrieves data based on the URL and displays it with AJAX), then Google will not be able to index the site.
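
To make the first case concrete, here is a minimal server-side sketch. It is Python purely for illustration, and everything in it (PAGES, LAYOUT, render_fragment, handle_request, the is_ajax flag) is a made-up name, not part of jQuery Address or any real framework. The point is only that one renderer produces the content for a URL, and the server either wraps it in the full page (direct visit, crawler) or returns it bare (AJAX navigation after pushState):

```python
from html import escape

# Hypothetical content store; a real app would use a database or templates.
PAGES = {
    "/": "Home content.",
    "/about": "About content.",
}

# Hypothetical page shell shared by every URL.
LAYOUT = (
    "<!doctype html><html><body>"
    '<div id="content">{content}</div>'
    "</body></html>"
)


def render_fragment(path):
    """Return the content HTML for a URL, or None if the URL is unknown."""
    text = PAGES.get(path)
    if text is None:
        return None
    return "<p>" + escape(text) + "</p>"


def handle_request(path, is_ajax=False):
    """Return (status, body) for a request.

    A direct visit gets the full page with the content already in it,
    so a crawler that never runs scripts still sees the right text.
    An AJAX request for the same URL gets only the fragment.
    """
    fragment = render_fragment(path)
    if fragment is None:
        return 404, "Not found"
    if is_ajax:
        return 200, fragment
    return 200, LAYOUT.format(content=fragment)
```

Because both responses come from the same render_fragment call, what a crawler receives for /about matches what a user sees after navigating there via AJAX.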

So simply telling someone to use the History API isn't the entire solution. The AJAX site still needs to be able to send an initial HTML snapshot of the page for each URL, which is a big task for someone who designed their site to retrieve data as JSON rather than HTML. Those users will have to make heavy modifications to ensure the app sends HTML on each initial request.

A solution for apps relying on JSON would be to use a headless browser on the initial request to generate the HTML and send that; from that moment on, the app can continue to use JSON as normal.
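
A rough sketch of that idea, again in Python only for illustration. The renderer argument is a stand-in for whatever actually drives the headless browser (that part is not shown and is an assumption), and SnapshotCache, max_age_seconds and clock are hypothetical names. Since running a headless browser on every request would be slow, the sketch caches each snapshot for a while:

```python
import time


class SnapshotCache:
    """Cache HTML snapshots produced by a (hypothetical) headless renderer."""

    def __init__(self, renderer, max_age_seconds=3600, clock=time.time):
        self._renderer = renderer      # callable: url -> rendered HTML string
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # url -> (timestamp, html)

    def get(self, url):
        """Return a snapshot for url, rendering it only if missing or stale."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        html = self._renderer(url)
        self._entries[url] = (now, html)
        return html
```

On an initial request the server would call get(url) and send the result; the app's scripts then take over and keep talking JSON. The cache lifetime is a trade-off between stale snapshots and rendering cost.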
