This repository has been archived by the owner on May 9, 2019. It is now read-only.

highly efficient solution #11

Closed
wants to merge 12 commits into from

haoel
Contributor

@haoel haoel commented Mar 23, 2016

Using a while loop to add the padding character one at a time is inefficient.

We can do better: by using bit operations to keep doubling the padding string, we can significantly reduce the number of string concatenations.

The final code is below (updated on March 29):

module.exports = leftpad;

function leftpad (str, len, ch) {
  // convert `str` to a string
  str = str + '';

  // no padding needed
  len = len - str.length;
  if (len <= 0) return str;

  // convert `ch` to a string
  if (!ch && ch !== 0) ch = ' ';
  ch = ch + '';

  var pad = '';
  while (true) {
    if (len & 1) pad += ch;
    len >>= 1;
    if (len) ch += ch;
    else break;
  }
  return pad + str;
}
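
To make the doubling loop above easier to follow, here is a small trace sketch. The helper name `tracePad` is hypothetical and not part of the PR; it runs the same loop but records the chunk and the accumulated pad at every step. Each iteration looks at the lowest bit of `len`: if it is set, the current chunk is appended, and the chunk is then doubled for the next bit.

```javascript
// Hypothetical helper (not in the PR): same loop as leftpad, with a step log.
function tracePad(len, ch) {
  var pad = '';
  var steps = [];
  while (true) {
    if (len & 1) pad += ch;        // lowest bit set: append the current chunk
    steps.push({ chunk: ch, pad: pad });
    len >>= 1;                     // move on to the next bit of len
    if (len) ch += ch;             // double the chunk for that bit
    else break;
  }
  return { pad: pad, steps: steps };
}

var result = tracePad(5, '*');     // 5 is 101 in binary
console.log(result.pad);           // '*****' (1 star from bit 0, 4 stars from bit 2)
console.log(result.steps.length);  // 3 iterations, one per bit
```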

@haoel
Contributor Author

haoel commented Mar 24, 2016

By the way, the algorithm used in this PR should be the most efficient so far.

Two references on repeating a string N times support this:

1) Stack Overflow
http://stackoverflow.com/questions/202605/repeat-string-javascript

2) ES6 String.repeat() implementation
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat

Any comments are welcome!

if (!ch && ch !== 0) ch = ' ';

len = len - str.length;
if (len <=0) return str;

missing whitespace before 0

while (++i < len) {
str = ch + str;
ch = ch + '';
pad = '';

var must be added, or pad will be a global variable. Don't pollute global variables.

That's why I add 'use strict' in all my js files. You should too.

This was referenced Mar 24, 2016
@reverofevil

Wow, that's fast.

@sagivo

sagivo commented Mar 24, 2016

perfect. +1

@standy

standy commented Mar 24, 2016

Did some benchmarks here https://github.com/standy/bencmark-leftpad
Binary is ok even on small numbers in NodeJS

@toupeira

How about adding some test cases for those Unicode characters they have now?
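
As a rough sketch of what such test cases might look like: the function below is a copy of the `leftpad` from this PR, and the checks rely on the fact that `String.prototype.length` counts UTF-16 code units, not visible characters. The variable names `accented` and `emoji` are illustrative only.

```javascript
// Copy of the leftpad function from this PR, so the sketch runs on its own.
function leftpad(str, len, ch) {
  str = str + '';
  len = len - str.length;
  if (len <= 0) return str;
  if (!ch && ch !== 0) ch = ' ';
  ch = ch + '';
  var pad = '';
  while (true) {
    if (len & 1) pad += ch;
    len >>= 1;
    if (len) ch += ch;
    else break;
  }
  return pad + str;
}

var accented = '\u00e9';      // precomposed "é": one UTF-16 code unit
var emoji = '\uD83D\uDE00';   // U+1F600 as a surrogate pair: two code units

console.log(leftpad(accented, 3) === '  ' + accented); // true
console.log(leftpad(emoji, 3) === ' ' + emoji);        // true: only one space is added
console.log(leftpad('a', 3, emoji).length);            // 5: two emoji (4 code units) plus 'a'
```

So a string outside the Basic Multilingual Plane is padded less than it appears to need, and a pad character outside it makes the result longer than `len` in code units.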

@stevemao
Member

@haoel Would you like to submit a PR with benchmark tests so we can compare current and future PRs? Thanks.

@akira-cn akira-cn mentioned this pull request Mar 25, 2016
@haoel
Contributor Author

haoel commented Mar 25, 2016

Benchmark Report

3 Left-Pad Versions

Three versions of the left-pad function will be tested:

  • leftpad-original() is the current version in npm
  • leftpad-es6-repeat() is the version using ES6 String.repeat() method
  • leftpad-bit-ops() is the version I posted on this PR.

(Note: Array.join() is not considered here, because its performance is clearly much worse.)

3 Test Cases

I will have three groups of test cases as below:

  • Long: leftpad("abcd", 100, ' ');
  • Normal: leftpad("abcd", 10, ' ');
  • Short: leftpad("abcd", 5, ' ');

Each test case is run 1 million times for each version of left-pad, over 5 rounds.

Test Result

We can see that leftpad-bit-ops() performs best in all cases.

[Figure: perf_test, timing results in ms]

Test Source Code

function leftpad_orginal (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

function leftpad_es6_repeat(str, len, ch) {
  str = String(str);
  if (!ch && ch !== 0) ch = ' ';
  // repeat() exists only on strings, so convert e.g. a numeric 0 first
  ch = String(ch);
  var l = len - str.length;
  if (l > 0) return ch.repeat(l) + str;
  return str;
}

function leftpad_bit_ops(str, len, ch) {
  str = String(str);
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  if (len <= 0) return str;

  ch = ch + '';
  var pad = '';
  while (true) {
    if (len & 1) pad += ch;
    len >>= 1;
    if (len) ch += ch;
    else break;
  }
  return pad + str;
}

function test(fn, str, len, times) {
  var expected = " ".repeat(len - str.length) + str;
  len = expected.length;
  console.time(fn.name);
  // `i` is declared with var so it does not become an implicit global
  for (var i = 0; i < times; i++) {
    if (expected !== fn(str, len, ' ')) {
      console.log("error");
      break;
    }
  }
  console.timeEnd(fn.name);
}

for(var round=0; round<5; round++) {
  console.log("====== "+round + " ======");
  var times = 1000000;
  var str = "abcd"
  var len = 100;
  console.log("test(\""+str+"\", " + len + ")" );
  test(leftpad_orginal, str, len, times);
  test(leftpad_es6_repeat, str, len, times);
  test(leftpad_bit_ops, str, len, times);
  console.log("\n");

  str = "abcd"
  len = 10;
  console.log("test(\""+str+"\", " + len + ")" );
  test(leftpad_orginal, str, len, times);
  test(leftpad_es6_repeat, str, len, times);
  test(leftpad_bit_ops, str, len, times);
  console.log("\n");

  str = "abcd"
  len = 5;
  console.log("test(\""+str+"\", " + len + ")" );
  test(leftpad_orginal, str, len, times);
  test(leftpad_es6_repeat, str, len, times);
  test(leftpad_bit_ops, str, len, times);
  console.log("\n");
}

Please feel free to let me know of any concerns.

--Hao

@reverofevil

One-char pad time measurement is mostly noise; consider increasing the loop count and taking the median time across loops.

But hey, why should we use the correct implementation in Benchmark.js, which has been available for years?
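
For readers who want to try the "median across loops" suggestion without pulling in a dependency, a minimal sketch follows. The names `median` and `medianTimeNs` are hypothetical, and the timer assumes a Node.js version that provides `process.hrtime.bigint()`. It is a rough illustration, not a replacement for Benchmark.js, which handles warm-up and statistical analysis.

```javascript
// Median of a list of numbers (hypothetical helper).
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs `fn` `iterations` times per round and returns the median round time
// in nanoseconds. Assumes Node.js with process.hrtime.bigint().
function medianTimeNs(fn, iterations, rounds) {
  var samples = [];
  for (var r = 0; r < rounds; r++) {
    var start = process.hrtime.bigint();
    for (var i = 0; i < iterations; i++) fn();
    samples.push(Number(process.hrtime.bigint() - start));
  }
  return median(samples);
}

// Example: time a trivial one-character pad over 11 rounds.
var ns = medianTimeNs(function () { return ' ' + 'abcd'; }, 100000, 11);
console.log('median round time (ns):', ns);
```

The median is less sensitive than the mean to a single slow round caused by garbage collection or JIT compilation.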

@wyw

wyw commented Mar 25, 2016

Some thoughts of mine after reviewing the code. ☺️

  1. A more uniform coding style

    ch = ch + '' --> ch += '', as pad += ch, len >>= 1 and ch += ch.

  2. The difference between val + '' and val.toString()

    Reference: http://www.2ality.com/2012/03/converting-to-string.html
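
A short sketch of where the three conversions differ. The helper `attempt` and the sample values are illustrative only; the behavior shown follows standard JavaScript semantics.

```javascript
// Returns the converted value, or the error name if the conversion throws.
function attempt(convert, value) {
  try { return convert(value); }
  catch (e) { return e.name; }
}

var viaConcat = function (v) { return v + ''; };
var viaString = function (v) { return String(v); };
var viaToString = function (v) { return v.toString(); };

// null and undefined have no toString method.
console.log(attempt(viaConcat, null));    // 'null'
console.log(attempt(viaString, null));    // 'null'
console.log(attempt(viaToString, null));  // 'TypeError'

// With + '', an object's valueOf is tried before toString.
var obj = {
  valueOf: function () { return 42; },
  toString: function () { return 'obj'; }
};
console.log(attempt(viaConcat, obj));     // '42'
console.log(attempt(viaString, obj));     // 'obj'

// Symbols cannot be implicitly converted, but String() accepts them.
var sym = Symbol('s');
console.log(attempt(viaConcat, sym));     // 'TypeError'
console.log(attempt(viaString, sym));     // 'Symbol(s)'
```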

@haoel
Contributor Author

haoel commented Mar 26, 2016

@polkovnikov-ph Thanks for the comments.

Here are the reasons I didn't use Benchmark.js:

  • The algorithm reduces the number of concatenations from O(n) to O(log n), so I thought of the benchmark as just a bit of paperwork.
  • And, you know, "Dependence is like a box of chocolates..."

So, I just wanted to keep it simple and stupid. ;-)

However, the fact is that I didn't know about Benchmark.js before you mentioned it, because I am a C/C++/Java/Go/Python developer rather than a JS developer. ;-)

Anyway, I re-ran the performance test using Benchmark.js; the results are below:

Long: Original x 563,913 ops/sec ±2.18% (73 runs sampled)
Long: ES6 Repeat x 4,719,672 ops/sec ±2.27% (76 runs sampled)
Long: Bit Operation x 5,287,988 ops/sec ±1.62% (80 runs sampled)
Fastest is Long: Bit Operation

Normal: Original x 4,117,389 ops/sec ±5.52% (72 runs sampled)
Normal: ES6 Repeat x 7,235,634 ops/sec ±1.55% (79 runs sampled)
Normal: Bit Operation x 8,327,386 ops/sec ±1.88% (79 runs sampled)
Fastest is Normal: Bit Operation
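
The O(n) versus O(log n) point above refers to the number of concatenation operations. A minimal sketch that counts them is shown below; `countLinear` and `countDoubling` are hypothetical helpers, and they count operations only, not the cost of each one, which depends on how the engine represents strings.

```javascript
// Counts concatenations in the one-char-at-a-time loop.
function countLinear(len) {
  var count = 0;
  var str = '';
  for (var i = 0; i < len; i++) {
    str = ' ' + str;
    count++;
  }
  return count;
}

// Counts concatenations in the doubling loop (both pad += ch and ch += ch).
function countDoubling(len) {
  var count = 0;
  var pad = '';
  var ch = ' ';
  if (len <= 0) return 0;
  while (true) {
    if (len & 1) { pad += ch; count++; }
    len >>= 1;
    if (len) { ch += ch; count++; }
    else break;
  }
  return count;
}

// Pad lengths from the three benchmark cases: "abcd" padded to 5, 10 and 100.
[1, 6, 96].forEach(function (n) {
  console.log(n, countLinear(n), countDoubling(n));
});
// 1 1 1
// 6 6 4
// 96 96 8
```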

@@ -3,15 +3,18 @@ module.exports = leftpad;
function leftpad (str, len, ch) {
str = String(str);

How about str = '' + str; instead?
http://jsperf.com/number-to-string/2 (number to string jsperf)

@reverofevil

@haoel Wow, it's really faster than a builtin function. It seems the JS interpreter is using something like a copy-on-write rope, and won't instantiate the whole string unless needed. There is indirect evidence for this: s[i] access on a string is slower than on an array of single chars produced by s = s.split(''). It looks like your implementation is trading access performance for creation performance. Anyway, great job.

(I'm a Scala/C++ guy too. I just needed it a year ago while rewriting hyphenator.js. Microbenchmarking is an extremely tricky thing; under no circumstances would I opt to do it myself.)

@dainbrump

While there is not any dramatic difference, typecasting does slow things down a little bit. It may seem like splitting hairs, but while Haoel's function is possibly the smartest way to left-pad, the following is marginally faster. For those who depend on shaving milliseconds, I present to you Haoel's left-pad, ever so slightly improved.

function leftpad_bit_ops_improved (str, len, ch) {
  if (!ch && ch !== 0) ch = ' ';
  str += ''; ch += '';
  len = len - str.length;
  if (len <= 0) return str;

  var pad = '';
  while (true) {
    if (len & 1) pad += ch;
    len >>= 1;
    if (len) ch += ch;
    else break;
  }
  return pad + str;
}

The only real difference is that I've removed the String() call for str and instead append an empty string to both str and ch early in the function. This converts both variables to strings and keeps the code consistent, as noted by @YuanweiWu and @franklinjavier. Other than that, I could find no way to improve upon the implementation. Hats off to you, @haoel!

@haoel
Contributor Author

haoel commented Mar 29, 2016

@dainbrump your version looks good, I will update this pull request.

@stevemao
Member

What about caching common use-cases like what camwest#5 does? (Is ' ' a common use case?)
Would it be faster to use the O(n) loop when n <= 5 (or whatever the break-even point is)?
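
A minimal sketch of that hybrid idea, for discussion only. The name `leftpad_hybrid` and the `THRESHOLD` value are hypothetical; the real break-even point would have to be measured.

```javascript
// Hypothetical break-even point; 5 is taken from the question above, not measured.
var THRESHOLD = 5;

function leftpad_hybrid(str, len, ch) {
  str += '';
  len -= str.length;
  if (len <= 0) return str;
  if (!ch && ch !== 0) ch = ' ';
  ch += '';

  var pad = '';
  if (len <= THRESHOLD) {
    // Short pads: plain O(n) loop, one char per iteration.
    while (len--) pad += ch;
    return pad + str;
  }
  // Longer pads: the doubling loop from this PR.
  while (true) {
    if (len & 1) pad += ch;
    len >>= 1;
    if (len) ch += ch;
    else break;
  }
  return pad + str;
}

console.log(leftpad_hybrid('abcd', 5));        // ' abcd' (linear branch)
console.log(leftpad_hybrid('abcd', 12, '0'));  // '00000000abcd' (doubling branch)
```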

"license": "WTFPL"
"license": "WTFPL",
"dependencies": {
"benchmark": "^2.1.0"

This should be in devDependencies; I've already created a pull request: haoel#1

@dainbrump

@haoel: Considering what @stevemao shared, and considering that the default use case is to pad with spaces, what do you think about using an array of pre-padded strings ranging from 1 to 10 spaces, plus a conditional block for the default case? It seemed faster when I tested:

function leftpad_bit_ops_and_cached (str, len, ch) {
  var cache = [
    ' ', '  ', '   ', '    ', '     ', '      ',
    '       ', '        ', '         ', '          '
  ];
  if (!ch && ch !== 0) ch = ' ';
  str += ''; ch += '';
  len = len - str.length;
  if (len <= 0) return str;
  var pad = '';
  // Fastest so far for the default use case of padding with spaces
  if (ch === ' ') {
    if (len<=cache.length) {
      return cache[len-1]+str;
    } else {
      var div = (len/cache.length>>0);
      var rem = len-(div*cache.length);
      for (;div>0;div--) {
        pad += cache[cache.length-1];
      }
      if (rem) pad += cache[rem-1];
      return pad + str;
    }
  } else {
    while (true) {
      if (len & 1) pad += ch;
      len >>= 1;
      if (len) ch += ch;
      else break;
    }
    return pad + str;
  }
}

chore: move benchmark to devdep, add bench script
@haoel
Contributor Author

haoel commented Mar 30, 2016

@dainbrump the cache is a good idea. The only concern is that we can only do it for certain special chars (like the space in your example). Other cases might need to pad with '0' or other chars.

In my opinion, if we cannot generalize the code, we have to balance performance against clean code. In this case, I am inclined toward clean code.

But I am fully open to any other opinions.
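
For illustration, one possible way to generalize the cache to any pad string is sketched below. It is not what the PR does; `padCache`, `cachedPad` and `leftpad_cached` are hypothetical names, and the sketch assumes an environment with ES6 `Map`. The cache grows without bound, so it trades memory and extra code for speed on repeated calls.

```javascript
// Hypothetical cache: pad string -> array whose index n holds n copies of it.
var padCache = new Map();

function cachedPad(len, ch) {
  var pads = padCache.get(ch);
  if (!pads) {
    pads = [''];
    padCache.set(ch, pads);
  }
  // Extend lazily: each missing entry is the previous entry plus one more copy.
  for (var i = pads.length; i <= len; i++) {
    pads[i] = pads[i - 1] + ch;
  }
  return pads[len];
}

function leftpad_cached(str, len, ch) {
  str += '';
  len -= str.length;
  if (len <= 0) return str;
  if (!ch && ch !== 0) ch = ' ';
  ch += '';
  return cachedPad(len, ch) + str;
}

console.log(leftpad_cached('abcd', 10, '0')); // '000000abcd'
console.log(leftpad_cached(7, 3, 0));         // '007'
```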

@stevemao
Member

Can you remove the comments? I'm happy to merge this if no one opposes.

.run();

/*
function test(fn, str, len, times) {
Member

Sorry, keep the useful comments. I'm talking about this block.

@haoel
Contributor Author

haoel commented Apr 1, 2016

Sorry for the misunderstanding. It's done!

@@ -0,0 +1,69 @@
function leftpad_orginal (str, len, ch) {
Member

Sorry, I've been busy with other stuff. If you have time, could you put this function in a standalone js file and name it O(n), similar to the functions below, and put them all in a bench folder? If not, I'll do it myself. Thanks.

@haoel
Contributor Author

haoel commented Apr 16, 2016

@stevemao I am busy with another project as well; please feel free to make any changes you like! :-)

zhuangya added a commit to zhuangya/left-pad that referenced this pull request Apr 17, 2016
this should finish what @stevemao said in left-pad#11
@haoel
Contributor Author

haoel commented Apr 21, 2016

Thanks @zhuangya !

And @stevemao I think you can merge it now. ;-)

@zhuangya

@haoel you have to merge my PR against yours first, and you're welcome.

@stevemao stevemao closed this in 5d41a3d Apr 27, 2016