Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logged In Users Favourites fetched on every page request #1640

Closed
kentfredric opened this issue Jan 29, 2016 · 21 comments
Closed

Logged In Users Favourites fetched on every page request #1640

kentfredric opened this issue Jan 29, 2016 · 21 comments
Assignees

Comments

@kentfredric
Copy link
Contributor

I was trying to diagnose why my page loads were taking so long, and I found some "fun" things.

Every page request fetches https://metacpan.org/account/favorite/list_as_json, logged in or not.

And when diagnosing the problem using webpagetest.org ( as recommended by fastly ), this URI stuck out as being a significant slowdown, taking 500ms to query, ... and then returning 403.

https://metacpan.org/source/RJBS/if-0.0606/Changes
snapshot

Upon performing this myself, I found that a lot of metadata is being fetched without an obvious application. Specifically, its strange to be seeing so much metadata about "who I am" and "my favourites" when I'm on a page unrelated to "me", and I'm not seeing how that data is useful except for the fact I have to be logged in somehow

@kentfredric
Copy link
Contributor Author

This behaviour appears to be one of the largest perfomance limiters for at least logged-out clients, as the 403 result can't be cached, but is still part of every request.

And so that nice big delay is incurred on every fetch, and blocks rendering the page.

Here's a test I did on a page I expected not to already be on the CDN from an Australian test machine connected at "Native" speed, which could very much be in the same cabinet as a fastly server.

http://www.webpagetest.org/result/160129_8V_M07/

There are 2 "jobs", and each job has 2 requests: The initial page load ( which is ~2.5 seconds ), and a second page load ( a refresh , and takes ~0.7 seconds ).

Of this refresh, 0.5 of those 0.7 seconds are consumed by the null list_as_json call.

There's a bunch of other ripe targets that could be improved on greatly, but this seems the largest, most obvious one, and the one with the least explanation for why it has to be there.

Other candidates for "bad" content that is uncached and potentially blocking page render:

https://cnstrc.com/js/ac.js?_=1454069053076 # no max age, no expires
https://api.coderwall.com/rjbs/endorsecount.png # expires in 60 seconds
https://secure.gravatar.com/avatar/cec6b9b88295ead26c65f8747a76561b?s=130&d=identicon # expires in 5 minutes

These also seem to have reasonably short caching times

7.3 minutes - https://metacpan.org/static/icons/metacpan-icon.png
26.0 minutes - https://metacpan.org/static/icons/favicon.ico
26.4 minutes - https://metacpan.org/static/fonts//fontawesome-webfont.woff?v=4.2.0
30.8 minutes - https://metacpan.org/static/images/sponsors/bytemark_logo.png
31.1 minutes - https://metacpan.org/static/images/metacpan-logo.png
44.3 minutes - https://metacpan.org/static/images/sponsors/liquidweb_color.png

And that font was another hot item for delays because of its size.

And here's another simulated fetch from a wellington server with similar (* but faster ) results:

http://www.webpagetest.org/result/160129_ZR_K03/

Hopefully after the biggest fish are fried I can see if this problem is still there or not.

@ranguard
Copy link
Member

@kentfredric the list_as_json is to customise the ratings [ 11 ++ ] on the page and it's a different colour if YOU have rated it, which could be in the search results or a module page, which is most of the site.

It's called on document ready https://github.com/CPAN-API/metacpan-web/blob/2c72394c65858a85418607b2eb6f96bb0669248e/root/static/js/cpan.js#L108

If you can delay it until later please do.

I should be able to speed up the miss dramatically (with some fastly config to not check the origin if there isn't a cookie), I'll have a look at that, maybe this weekend and update this ticket

@ranguard ranguard self-assigned this Jan 29, 2016
@kentfredric
Copy link
Contributor Author

Could you show me any pages that show the [++] widget other than

  • The distributions own index
  • The module itself
  • A search listing

I don't for instance see that widget anywhere on

  • Browse/
  • Anything in a "Source" rendering
  • The "Recent" queue
  • Another authors module list.

And in the case of the former, why is it handled through a JavaScript call instead of being part of the page fetch metadata? Is the HTML entirely identical when logged in and when not? ( I presume the answer to this is "Yes" for caching reasons )

Am I right in assuming this caching problem this this specific URI is prevalent only on un-authenticated access?

If so, and assuming the cache doesn't invalidate like hot-cakes, then that's "acceptable" and this perf issue on un-authenticated access is just hiding the real issues I'm seeing under the diagnosis tool.

Working out how to solve the observed "hmm, its sitting here dialing out to some other domain" that I spot from time to time is a decidely harder fruit to pick.

@kentfredric
Copy link
Contributor Author

The other question I have is why its such a slow request, even in the un-authenticated case.

Trying to reduce the cost of that request seems desirable, but I don't know where its getting tied down.

( For instance, how well does it scale vs the size of favourites?, how well does it scale vs the number of CPAN distributions and the number of authors? )

@ranguard
Copy link
Member

It's also used to work out if you are logged in or not (menu on the top right), this was done so the main content could be cached more at the CDN, just hasn't been implimented yet.

If you are logged in then the CDN caches the results, which is purged of you change anything.

So making the non-logged incase (which we are currently not caching at the CDN) fast should fix everything for list_of_json for all situations.

The speed for logged in users would be from:

https://github.com/CPAN-API/metacpan-web/blob/master/lib/MetaCPAN/Web/Controller/Account/Favorite.pm#L51

And for non-logged in users from:

https://github.com/CPAN-API/metacpan-web/blob/master/lib/MetaCPAN/Web/Controller/Account.pm#L17

If you want to investigate further

@ranguard
Copy link
Member

http://www.webpagetest.org/result/160130_XN_S06/1/details/

44ms for /account/favorite/list_as_json when not logged in from Australia (vs the 598 is was before)

Logged in is harder to test as I don't have a box I can proxy through, but that should always have been fast.

The font is available from:

http://cdnjs.com/libraries/font-awesome

So maybe send a patch for that link to be updated.

I'll leave this ticket open for a little, I do have work in progress to get fastly caching our HTML for long periods of time, and then just purging when there is a change to author/module - but not comiting to a timeframe on that atm

I'll leave this ticket open for a week or so, incase you want to submit patches against it, otherwise I'll close from there, thanks for pointing this out

@kentfredric
Copy link
Contributor Author

Just looking at that test I see you used the default, "Cable" for testing. Its a useful thing to look at to see how network latency impedes performance, but I did all my linked reports with the Fastly recommended connection profile of "Native", that doesn't do any latency-injection and traffic-shaping ( That way things like CPU being bottlenecked becomes a more obvious limiter in some places )

@kentfredric
Copy link
Contributor Author

Something else to consider: The activity widget on project pages seems somewhat expensive, and might be slowing the page load down. If you think it might be possible to lazily inject the image after the real page load is done, I could file a seperate bug for that.

Otherwise you essentially have to query the dist metadata for the page, and then again fetch and image based on the metadata for all versions of that dist, and that doesn't seem like it would be cheap.

Granted a lot of my problems are happening on "First cache" passes, because I'm possibly one of the only people on my fastly CDN, so the chances of it already being cached are low.

@ranguard
Copy link
Member

Ahh, ummm, I don't understand, that response (not logged in) is now being served directly from the VCL on Fastly's servers, e.g. it doesn't hit our origin at all.

  if( req.http.Cookie:metacpan !~ "\w+" && req.http.Cookie:metacpan_secure !~ "\w+" && req.url.path ~ "/account/favorite/list_as_json" ) {

    # ResponseObject: Empty list_as_json
    error 900 "Fastly Internal";

  }
...
if (obj.status == 900 ) {
   set obj.http.Content-Type = "application/json; charset=utf-8";   
   synthetic {"{}"};
   return(deliver);
}        

And there are lots in Australia:

https://www.fastly.com/network-map

@kentfredric to be honest opening issues isn't really going to do much, when my caching work is finished things will improve once something has been requested... but always better to fix even for 1st hit... if you have the time :)

@kentfredric
Copy link
Contributor Author

I was using the australia target because they have a wide variety of browsers, my local fastly is Wellington, but the WebPagetest only has IE for that origin.

I'm trying to use this to /explain/ why I'm getting slow performance, because the information I'm getting directly from the network log is ... not enlightening.

@kentfredric
Copy link
Contributor Author

For instance, here's my own trace of loading a page that stalled for a good 5 seconds "Reading from metacpan.org" ...

No context.

snapshot116

@ranguard
Copy link
Member

Totally agree we should try move as much as possible to as late as possible (e.g. the graph and anything else)

Just trying to look at the single request to list_as_json

http://www.webpagetest.org/result/160130_W5_WBP/ ( Australia - EC2 - Firefox - Native )

So a bit seems to be ssl certs... but probably a lot of combinations of things! Probably worth just cleaning up as much as possible one thing at a time (e.g. font -> cdn, delay little graph image load to as late as possible etc)

@kentfredric
Copy link
Contributor Author

Things get substantially worse for me when I disable uBlock and trace with firebug.

snapshot117

Blue line = I can see something
Red line = I can click another link.

Yes, it really sat there for 30 seconds saying "getting data from google-analytics".

snapshot118

I re-enabled ublock, but it seems paypal is on the "allowed" list ( And this requiest ran on rjbs profile which contains a paypal widget )

And if I tried to pretend I knew why this took 15 seconds, I'd by lying:
https://metacpan.org/source/RJBS/Tree-File-0.112/t
snapshot119

I mean, why did it spend 13 seconds to even ask for an item from cache!?!?

What sort of things are happening to make it take 13 seconds to reach this line? https://github.com/CPAN-API/metacpan-web/blob/2c72394c65858a85418607b2eb6f96bb0669248e/root/static/js/cpan.js#L513

( I can only assume that's what I'm seeing )

Its also worth noting how my problems appear to have gotten worse in the same time as I enabled firebug. I've known firebug to slow things down before, but this is ridiculous.

Firebug causing slowness in my brain implies "The slowdown is happening in the JS for some reason"

I mean, here, everything is cached locally except the json listing mentioned earlier, and it sits on its ass twiddling its thumbs for 13 seconds for the fun of it

snapshot120

( Yellow links are blocked by ublock )

I can't get a good profile without firebug, but firebug blows a lot of things up in performance ... heisenbugs :(

@kentfredric
Copy link
Contributor Author

Trying to debug the JS perf in case there's anything interesting with Opera.

Turns out it looks like a lot of the time in the page load ( as in, 6/7ths of the time ) is in "ParseHTML", and a particular hot node is "$.fn.textWidth", and that function appears to account for at least half of the total time spent in JavaScript.
snapshot121

The 9th line in this flame chart is mostly occupied by that function call, and they're all called from a single anonymous function in:

https://metacpan.org/_assets/a869e92bd8b69760f4c3b0a85075f0f9.js

line 806, which just so happens to occur in document.ready

$(document).ready(function(){processUserData();$(".ttip").tooltip();Mousetrap.bind('?',function(){$('#keyboard-shortcuts').modal();});Mousetrap.bind('s',function(e){$('#search-input').focus();e.preventDefault();});Mousetrap.bind('g s',function(e){});$('a[data-keyboard-shortcut]').each(function(index,element){Mousetrap.bind($(element).data('keyboard-shortcut'),function(){window.location=$(element).attr('href');});});$('table.tablesorter').each(function(){var table=$(this);var sortid=(MetaCPAN.storage.getItem("tablesorter:"+table.attr('id'))||table.attr('data-default-sort')||'0,0');sortid=JSON.parse("["+sortid+"]");var cfg={sortList:[sortid],textExtraction:function(node){var $node=$(node);var sort=$node.attr("sort");if(!sort)return $node.text();if($node.hasClass("date")){return(new Date(sort)).getTime();}else{return sort;}},headers:{}};table.find('thead th').each(function(i,el){if($(el).hasClass('no-sort')){cfg.headers[i]={sorter:false};}});table.tablesorter(cfg);});$('.tablesorter.remote th.header').each(function(){$(this).unbind('click');$(this).click(function(event){var $cell=$(this);var params=$.getUrlVars();params.sort='[['+this.column+','+this.count++%2+']]';var query=$.param(params);var url=window.location.href.replace(window.location.search,'');window.location.href=url+'?'+query;});});$('.relatize').relatizeDate();$('#search-input').keydown(function(event){if(event.keyCode=='13'&&event.shiftKey){event.preventDefault();var luckyField=document.createElement("input");luckyField.type='hidden';luckyField.name='lucky';luckyField.value='1';var form=event.target.form;form.appendChild(luckyField);form.submit();}});var search_input=$("#search-input");var input_group=search_input.parent('.input-group');var ac_width=(input_group.length?input_group:search_input).outerWidth();search_input.bind('modules_autocomplete',function(){$(this).autocomplete({serviceUrl:'/search/autocomplete',deferRequestBy:150,preventBadQueries:false,dataType:'json',lookupLitmit:20,paramName:'q',autoSelectFirst:false,noCache:false,triggerSelectOnValidInput:false,maxHeight:180,width:ac_width,transformResult:function(data){var result=$.map(data,function(row){return{data:row.documentation,value:row.documentation};});var uniq={};result=$.grep(result,function(row){uniq[row.value]=typeof(uniq[row.value])=='undefined'?0:uniq[row.value];return uniq[row.value]++<1;});return{suggestions:result};},onSelect:function(suggestion){document.location.href='/pod/'+suggestion.value;}});$('.autocomplete-suggestions').off('mouseover.autocomplete');$('.autocomplete-suggestions').off('mouseout.autocomplete');});search_input.bind('authors_autocomplete',function(){$(this).constructorAutocomplete({key:'bQssESEa8XUBKzSUjarO',directResults:true,maxHeight:400,transformResult:function(response){if(response['sections']&&response['sections']['url']){response['sections']['url'].forEach(function(dataset){if(dataset.value){dataset.value=dataset.value.replace(/^[^:]+: /,"");}});}

This code block is from the Syntax Highlighting code.

As to why syntax highlighting code would be aggressively wasting time doing nothing on the /recent page is anyones guess.

@kentfredric
Copy link
Contributor Author

Gah. No, its not from syntax highlighting, minified file lied to me with its comments.

Its the /cpan.js you linked earlier.

IDK, I think that's a lot of CPU Time spent putting cute little "..." on the end of things.

https://github.com/CPAN-API/metacpan-web/blame/2c72394c65858a85418607b2eb6f96bb0669248e/root/static/js/cpan.js#L298-L315

@kentfredric
Copy link
Contributor Author

I locally tested an alternative way to get the ellipsis's on.

  1. I used a tool in firefox to replace your assets js with my own, sans elipsis code
  2. I modified the CSS dynamically as follows:
 .ellipsis {
    white-space: nowrap;
    display: block;
    overflow: hidden;
+   text-overflow: " ... ";
+   padding-right: 1px;
 }

Result:

snapshot123
snapshot124

The leading and trailing space in the text-overflow field discourage the dots being part of the continuous text sequence ( so you don't get Foo-0.... ), and the right hand padding is added to prevent text hitting the bounding box and appearing joined with the next block.

@haarg
Copy link
Member

haarg commented Feb 1, 2016

That ends up hiding the end of the dist name and the version number, which are arguably the most important parts.

@kentfredric
Copy link
Contributor Author

Here's an alternative approach at a textWidth() implementation that lazily builds a table of character widths.

It however has the following limitations:

  • It assumes everywhere on the page that calls textWidth() uses the exact same font rendering and caches values for characters based on that assumption
  • It assumes character widths are invariant, that is, no side effects of context-specific letter spacing exist, and this is clearly wrong for ligatures and combining characters and any character that will have a missleading width value when rendered as <span>♯</span>
  • It assumes that the number of unique characters inside is smaller than the sum of ( number of ellipsis fields + ( number of elipsis fields that require shortening * number of steps to perform a shortening )

As 99.99% of CPAN Uploads have distnames that can be represented in US-ASCII, this will mean its a guaranteed advantage on any page with more than 90 .ellipsis fields, and a predictable advantage on any page with more than 36 ( Because lower case letters make up the bulk of distnames with numbers as a close second, followed by punctuation (./-) followed by occasional uppercase )

However, this tilts itself in a negative direction if somebody uploads a distname with every unicode character in existence, which would in a worst case, result in at least 109,000 calls trying to reflow the page to calculate the letter width. And I think that's a bit of a dangerous tradeoff.

  var char_cache = {};
  $.fn.textWidth = function () {
    var text = $(this).text();
    var html_orig = $(this).html();
    var changed_html = false;
    var span;
    var width = 0;
    for ( var c = 0; c < text.length; c++ ) {
      var the_char = text[c];
      var w = char_cache[the_char];
      if ( w ) {
        width = width + w;
        continue;
      }
      if ( !changed_html ) {
        changed_html = true;
        $(this).html('<span></span>');
        span = $(this).find('span:first');
      }
      span.text(the_char);
      w = span.width();
      char_cache[the_char] = w;
      width += w;
    }
    if( changed_html ) {
      $(this).html(html_orig);
    }
    return width;
  }

@kentfredric
Copy link
Contributor Author

Here's a quick proof-of-concept employing 3 table cells instead of 1 table cell with a bit of CSS shenanigans

overflow.html.txt

The important part is this:

<td class="lhs"><strong>A very long message goes here</strong></td>
<td class="middle"><strong></strong></td>
<td class="rhs"><span class="clip"><strong>A very long message goes here</strong></span>

renders as

snapshot126

Obviously the logic needs a bit of a cleanup, but its just proving something can work.

Though you're going to get things that look rather weird if you try copy-paste that text :)

oalders added a commit that referenced this issue Mar 15, 2016
Cache assets at fastly for a year #1640 - we can purge "assets" to clear
@haarg
Copy link
Member

haarg commented Nov 17, 2016

Is there anything network related here that was actually a problem that isn't resolved? While we will be improving our cache controls, I don't see any real issues here other than the abbreviation rendering (which is also #1661).

@kentfredric
Copy link
Contributor Author

I am no longer seeing 500ms requests for list_as_json, now under <100ms ( often less than 10ms )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants