Site is Invisible to Google? #819

Closed
sinned opened this Issue May 20, 2016 · 38 comments

@sinned
sinned commented May 20, 2016

Hey all,

I'm trying to understand how to make my site show up in Google search, and when using Webmaster Tools, it appears that my site is blank to the Googlebot. For testing purposes, I deployed the existing version of this starter kit to http://react.dennisyang.com/

In build/webpack.config.js, I found that removing this from the config makes "Fetch as Google" show the site (but, oddly, the "what the public sees" view is still blank):

      production: {
        presets: ['react-optimize']
      }

Has anyone had any luck getting a site made with this starter kit indexed on Google?

Here's a screenshot of "Fetch as Google" using the latest react-redux-starter-kit build:
[screenshot: screen shot 2016-05-20 at 1.21.13 pm]

And, here's a screenshot of "Fetch as Google" with 'react-optimize' disabled:
[screenshot: screen shot 2016-05-20 at 1.21.44 pm]

Any thoughts? This fix appeared to make the homepage of my site show up in Google, but any page using Redux to grab content was not rendered by Google.

Thanks!

dennis.

@dkenzik
dkenzik commented May 21, 2016 edited

@sinned - In order to allow any search engine to completely index your app, you'll need to implement some server-side rendering. This turns your project into an isomorphic (or universal) app. There are numerous techniques for accomplishing this, but the gist is: spin up an Express server; make the server aware of your routes; use React's renderToString to inject the markup into your server-side template (a minimal sketch follows). This lets Express serve up any entry point and its data to the browser, after which React in the browser takes over. Here is a decent Smashing article on the topic.
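For example, a minimal sketch of that flow, assuming Express and the react-router 2.x/3.x API this starter kit used at the time (`./routes` stands in for your app's route config):

    const express = require('express')
    const React = require('react')
    const { renderToString } = require('react-dom/server')
    const { match, RouterContext } = require('react-router')
    const routes = require('./routes') // your app's route config

    const app = express()

    app.get('*', (req, res) => {
      // Match the incoming URL against the client-side route config.
      match({ routes, location: req.url }, (err, redirect, renderProps) => {
        if (err) return res.status(500).send(err.message)
        if (redirect) return res.redirect(302, redirect.pathname + redirect.search)
        if (!renderProps) return res.status(404).send('Not found')
        // Render the matched route tree to a string and embed it in the HTML
        // shell; the client bundle then takes over in the browser.
        const markup = renderToString(React.createElement(RouterContext, renderProps))
        res.send('<!doctype html><html><body><div id="root">' + markup +
                 '</div><script src="/app.js"></script></body></html>')
      })
    })

    app.listen(3000)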

You may also want to check out react-helmet. It will inject the necessary meta entries into your app's <head> for indexing by the engines. I'm not sure about support across all search engines, but I believe Google will index the results of react-helmet properly, and scrapers will make proper use of any Open Graph tags it emits. This may be all you need.
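For illustration, a sketch using react-helmet's props API (circa v3); ArticlePage and its article prop are hypothetical:

    import React from 'react'
    import Helmet from 'react-helmet'

    // Helmet hoists these entries into the document <head> when the page
    // renders (and exposes them via Helmet.rewind() during server rendering).
    const ArticlePage = ({ article }) => (
      <div>
        <Helmet
          title={article.title}
          meta={[
            { name: 'description', content: article.summary },
            { property: 'og:title', content: article.title },
            { property: 'og:description', content: article.summary },
          ]}
        />
        <h1>{article.title}</h1>
        <p>{article.body}</p>
      </div>
    )

    export default ArticlePage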

@yantakus
Contributor

Making an app isomorphic is absolutely not necessary for Google to index it. Google executes JavaScript and then indexes the output.

@dkenzik
dkenzik commented May 22, 2016

@web2style - Agree 100%. Google does try to index JS output. I was speaking more generally regarding best practices when dealing with such tech and crawlers in general. Thanks for the nudge back to the topic though, considering the OP was indeed referencing Google. I've added emphasis to my previous response for clarity.

@anthonybrown
anthonybrown commented May 24, 2016 edited

@dkenzik I agree. It's easy to render server-side first with React, so why not do it?

@nicolasiensen

In my last experience with single-page applications I had problems with the Facebook crawler, which doesn't execute your JavaScript to collect the meta tags 😞

@gabeweaver
gabeweaver commented Jun 3, 2016 edited

I've been running into this issue the past few days and have gone back and forth.

Desired Outcome: Simple static site powered by the starter kit - something like a basic marketing site or docs similar to GatsbyJS.

Best time to use server-side rendering: If you have to hit any third-party services/APIs before the user interface can provide any value to the end user. This is an excellent explanation of why.

  • Pros - It's straightforward and a common pattern, especially using something like react-dom-stream
  • Cons - It's another thing to manage, monitor, maintain, and generally be responsible for if we have no other external data dependencies. Additionally, serving an SPA will always be less performant than serving static HTML and then layering on the JS.

Best time to use client-side rendering - When you don't have any sort of API or data that needs to be queried, rendering client side is extremely performant and immediately returns the necessary HTML and CSS for the first view.

  • Pros - Easier to maintain, especially if the initial props are almost always identical. Hosting is typically much cheaper and more approachable via something like S3 or Surge.sh.
  • Cons - Webpack is optimized for SPAs, not acting as a static site generator. SEO is a challenge.

I'm currently exploring using Surge.sh to serve static assets. It's really awesome for several reasons:

  • It's ridiculously simple to use - literally 15 seconds to go from no hosting provider or configuration to publishing ./dist and being able to access it via a custom domain / sub domain in any browser. It was just as easy to trigger deploys via CircleCI after the tests pass when a feature branch gets merged into master.
  • It's free...and really affordable when you need the premium features - $13/mo gets you the basic configuration options, like SSL and redirects, that we'd typically need a server to accomplish.
  • It has a great caching strategy out of the box, which would simplify webpack configs and having to rely on bundle hashes and what not for cache busting.
  • Automatic GZIP compression decreases the current "production optimized and fractified" webpack output by as much as 75% in the few tests I've run...putting my app and vendor JS bundles + index.html + 200.html + CSS + a few images at 350kb...of which only index.html (~75-100kb) is initially served and rendered along with the CSS to make it look nice.

The only problem is...the project I'm working on needs to be SEO friendly. I've started exploring react-render-webpack-plugin after reading a few articles that seem promising.

Depending on how my experiment goes, I do think it's worthwhile to figure out how to give the starter kit a configuration option for client-side rendering (that is SEO friendly), server-side rendering (that is SEO friendly), or a true SPA that doesn't care about SEO, similar to how CSSModules is an option in the config...

Any other thoughts on how to solve for this?

@gabeweaver

Oh, and if I didn't mention it...Google won't crawl and/or render the webpack bundles as they are currently configured by default in the starter kit.

@chovy
chovy commented Jun 9, 2016

Forget about SPA and indexing.

@sauravskumar

@gabeweaver Did you try PhantomJS for server-side rendering for bots, and the SPA for normal clients?

@trungpham

We can support server-side rendering using this library: https://github.com/makeomatic/redux-connect
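For anyone curious, a sketch along the lines of redux-connect's README (the /api/posts endpoint and PostList component are made up, and fetch assumes a polyfill on the server):

    import React from 'react'
    import { asyncConnect } from 'redux-connect'

    const PostList = ({ posts }) => (
      <ul>{posts.map(p => <li key={p.id}>{p.title}</li>)}</ul>
    )

    // Each entry's promise is resolved on the server (via loadOnServer)
    // before renderToString runs, so crawlers receive fully rendered markup;
    // the resolved value is injected as a prop under `key`.
    export default asyncConnect([{
      key: 'posts',
      promise: () => fetch('/api/posts').then(res => res.json()),
    }])(PostList)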

@amrit92
amrit92 commented Jun 27, 2016

I am facing the same issue and looking for a solution. Can anyone point to a solution for SSR using this starter kit specifically? @trungpham Can you show an integration?

@yantakus
Contributor

I'm using v2.0 of this starter kit. It doesn't use code splitting, so that could be the difference. But I don't have any problems with indexing by Google. I generate a sitemap.xml and all the pages are indexed without any problems.
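For anyone trying the same route, a sketch of a build step that emits a sitemap (the route list and domain are placeholders):

    // Emit a sitemap.xml listing the app's routes so crawlers can discover
    // every page of the SPA without following JS-injected links.
    const fs = require('fs')

    const BASE_URL = 'https://example.com'
    const routes = ['/', '/about', '/posts']

    const urls = routes
      .map((route) => '  <url><loc>' + BASE_URL + route + '</loc></url>')
      .join('\n')

    fs.writeFileSync(
      'dist/sitemap.xml',
      '<?xml version="1.0" encoding="UTF-8"?>\n' +
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
      urls + '\n</urlset>\n'
    )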

@jhabdas
jhabdas commented Jun 29, 2016

As others have pointed out, the Goog will crawl an Ajax site. Bing will do so as well. But those aren't the only two search engines out there. Due to the limited time crawlers allocate to indexing sites, Ajax sites will be crawled more slowly. Probably not a huge deal for most sites unless you have a lot of pages changing very often.

The term "Universal" is not interchangeable with "isomorphic", as Universal apps like Este.js are anything but SEO friendly.

Been meaning to write on this topic for years, but hopefully my talk on Isomorphic React (including example app) will prove useful to some:

http://habd.as/talks/isomorphic-rendering-react/

@davezuko
Owner

Out of scope at the moment; we do not support universal rendering, so this is not much of a concern. Closing to clean up issues.

@davezuko davezuko closed this Jun 29, 2016
@nhagen
nhagen commented Sep 21, 2016

I'm not sure I'm seeing any smoking guns here in terms of why Google is unable to render pages built with this starter kit. React is capable of supporting Googlebot rendering. Rather than discuss server-side rendering, how can we isolate what is preventing Googlebot from rendering the page? A good question to ask: has anyone using this starter kit gotten their site indexed/rendered?

@jhabdas
jhabdas commented Sep 21, 2016 edited

Google and Bing have been Ajax crawling since at least 2012. Just look for Matt Cutts videos from around 2011; Bing actually announced the feature in '12. Pages in an SPA will look like a black hole to all non-Ajax crawlers (e.g. less sophisticated scrapers) and non-JS browsers such as elinks and lynx. Even though SPAs are crawled, some may still experience issues using Google Search Console tools to, for example, test schema.org stuff.

That said, if you're building an app, build an app. If you're building a website, go static or isomorphic for the best SEO and accessibility.

@nhagen
nhagen commented Sep 21, 2016

In our case we have several dependencies which aren't isomorphic, because they're either wrappers around non-React components or the maintainers just never considered running in a server environment (so merely importing them throws "document is not defined"). Moreover, changing from nginx to Node is not ideal. So there are at least a few reasons why server-side rendering might not be worth the trouble, especially when the benefits we seek should be attainable without it.
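(For reference, the usual workaround for such dependencies is to load them only when a DOM actually exists; a sketch, with the chart library name made up:)

    // Load browser-only modules lazily so importing this file on the server
    // doesn't throw "document is not defined".
    let Chart = null
    if (typeof document !== 'undefined') {
      Chart = require('some-dom-only-chart-lib') // hypothetical dependency
    }

    export function renderChart(el, data) {
      if (!Chart) return null // no-op during server-side rendering
      return new Chart(el, data)
    }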

I only care about google indexing for now, and I'm hoping to get the conversation started on what specifically is preventing that if it's something that is a common problem for users of this repo.

@ReLrO
ReLrO commented Oct 4, 2016

@nhagen Please let me know if you find a solution. I'll do the same.

@ReLrO
ReLrO commented Oct 5, 2016

@gabeweaver Do you have an example of a webpack config file that will enable Google crawling?

@jhabdas
jhabdas commented Oct 5, 2016 edited

@ReLrO https://github.com/jhabdas/lumpenradio-com/blob/master/tools/webpack.config.js.

It's not so much the webpack config as it is the architectural approach, which is isomorphic. As I mentioned earlier, Google and Bing will crawl non-isomorphic apps (those where the content is injected by JavaScript).

@ReLrO
ReLrO commented Oct 5, 2016

Thanks @jhabdas. I am trying to see if I can avoid the isomorphic approach. I deployed my webapp to AWS S3 and AWS CloudFront as a static website and I am trying to get Google to see it. I read that Google can see React-generated sites (that use async calls), but when using the starter kit, Google only sees a blank page. As people here have mentioned, it seems to be something to do with the configuration of the starter kit. Might be something to do with the Hot Module Replacement approach. I am not sure...

@jhabdas
jhabdas commented Oct 5, 2016

@ReLrO the black hole has nothing to do with this starter kit, and everything to do with JavaScript.

@ReLrO
ReLrO commented Oct 5, 2016

@jhabdas what do you mean?

@jhabdas
jhabdas commented Oct 5, 2016

@ReLrO I've been building SPAs since Backbone was introduced, and here's what a backbone website looked like: https://speakerdeck.com/jhabdas/isomorphic-rendering-with-react?slide=6

@ReLrO
ReLrO commented Oct 5, 2016

I understand, but it was also my understanding that Google can now crawl such websites (for example, http://chrisarasin.com/react-seo/). That doesn't seem to work for me, though...

@jhabdas
jhabdas commented Oct 5, 2016

@ReLrO have you registered your site with the google search console and added a page to your sitemap.xml file? If so, what happens once google indexes a page and you search for it?

@ReLrO
ReLrO commented Oct 5, 2016

I have, and the page is completely blank. Google also has a tool called Fetch as Google (https://support.google.com/webmasters/answer/6066468?hl=en) which allows you to see how the crawler sees your site, and Google sees it as a blank page. I've read articles showing that Google sees React-generated client-side pages, and then I read the comments in this discussion, and it seems like something enabled in the starter kit is causing this behavior.

Not sure I completely agree with the article you just sent. Serving static content off a CDN also has a lot of pros.

In any case, I guess I will have to move to server-side rendering if I want Google to index my site. Hopefully it will be an easy transition and I won't need to refactor a lot of code. Any pointers you can give me on how to do it quickly? I am using the starter kit with react-router and react-redux-router.

@jhabdas
jhabdas commented Oct 5, 2016 edited

Fetch is not how Google sees your site, and it apparently has the same issues it had three years ago. Please try what I suggested and let us know when you have an answer. I saw the same thing with Backbone (JS-injected content) back in 2013. And those sites were indeed crawled, despite Google's lackluster and FUD-inducing tooling.

@jhabdas
jhabdas commented Oct 5, 2016

Also, once you go isomorphic there's no going back. Once you start down the path, the Cheshire Cat will erase the tracks home.

@ReLrO
ReLrO commented Oct 5, 2016

Ok, thanks @jhabdas. I'll let you guys know after I index the page.

@jhabdas
jhabdas commented Oct 25, 2016

If you're looking to go isomorphic (yes, not universal), look no further than the React Production Starter. And thank David for his incredible work and generosity. Thanks David <3

@sauravskumar
sauravskumar commented Oct 25, 2016 edited

@jhabdas can you give a link to the exact starter kit you're talking about... (are you talking about this starter kit?)

@jhabdas
jhabdas commented Oct 25, 2016 edited

@sauravskumar indeed

@mstijak
mstijak commented Nov 13, 2016

If anybody is still interested, I wrote a blog post on how I managed to overcome this issue.

@jhabdas
jhabdas commented Nov 14, 2016 edited

@mstijak Thanks for the write-up. You're hitting on a very hot topic and I'm curious to see if your post takes off. A couple of pull quotes may serve you well in getting the view/read ratio to climb a little.

FWIW, I popped open CX Docs in the lynx browser and compared it with what you'd see on an isomorphic app and here's the difference:

[screenshot: CX Docs in lynx]

[screenshot: an isomorphic app in lynx]

While it does not apply to apps (because apps are apps), it's important that our news sites and blogs do not use JS magic, as we'd be doing some pretty heavy damage to the Great Library that is the Web. Also, be careful if you rely on a polyfill library to make your site work, as you're creating a single point of failure for the future. Regardless, thanks again for sharing and I hope your post does well on Medium.

And for anyone else who's interested, you can find some isomorphic boilerplates for React on Awesome React Boilerplates. Have fun out there!

@elyobo
elyobo commented Nov 14, 2016

For us, getting appropriate metadata in for sharing links on things like FB and Twitter was important as well, and they didn't seem to do a full load and execution of the JS to find it. We did limited server-side loading of data so that the key data was present, while the non-essential stuff still gets loaded client-side (a rough sketch of the idea follows).
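Something like this, as an Express handler; getPageMeta is a hypothetical placeholder for whatever loads the key data:

    const express = require('express')
    const app = express()

    app.get('*', (req, res) => {
      // getPageMeta: hypothetical loader returning { title, description }
      // for the requested path, i.e. only the data crawlers/scrapers need.
      getPageMeta(req.path).then((meta) => {
        res.send(
          '<!doctype html><html><head>' +
          '<title>' + meta.title + '</title>' +
          '<meta property="og:title" content="' + meta.title + '" />' +
          '<meta property="og:description" content="' + meta.description + '" />' +
          '</head><body><div id="root"></div>' +
          '<script src="/app.js"></script></body></html>'
        )
      })
    })

    app.listen(3000)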

@madshargreave

I sort of solved this by inlining the JavaScript in my index.html.

@jhabdas
jhabdas commented Dec 14, 2016 edited

Related conversation on Medium in case anyone wants to share their experiences.

@davezuko davezuko locked and limited conversation to collaborators Dec 14, 2016