Skip to content

Sort huge file consisting of known set of characters

Jaroslav Shkarupilo edited this page Jul 30, 2016 · 5 revisions

Given: huge (GB+) text file that consists of digits from 0 to 9.

Task: print content to a console in a sorted order in optimal time.

Note: algorithm itself should run in optimal time, efficiency of different filesystem functions could be neglected.


Firstly you'll have to load all the file content into a string, then sort it in memory by applying any sorting algorithm that you're able to implement from scratch, than... Just kidding :)

Solution:

  • Read file stream character by character in a loop.
  • Count number of entries of every character.
  • Print characters from the smallest to the biggest in a quantity depending on saved counters.
  • Voilà! That's all. No bubble sort, no quick sort, no any other actual sorting applied at all.

PHP

function solution($filename) {
    $out = array_fill(0,10,0);
    $inp = fopen($filename,'r');
    
    // read at max. 10MB at a time
    while ($buf=fread($inp, 1024*1024*10)) {
        for ($i=0, $l=strlen($buf); $i<$l; $i++) {
            $out[intval($buf[$i])]++;
        }
    }
    
    fclose($inp);
    
    foreach ($out as $k=>$v) {
        for ($i=0; $i<$v; $i++) {
            echo $k;
        }
    }
}