Solution

Use an external sort, such as an external merge sort.

  1. We will assume that 20 gigabytes of data is too big to fit in memory.
  2. Let's assume we have "x" megabytes of memory available (where "x" megabytes is far smaller than 20 gigabytes).
  3. Divide the 20 gigabyte file into chunks of "x" megabytes each.
  4. Sort each chunk separately in memory (this can be done in parallel) and save it back to the file system.
  5. Merge the sorted chunks into a single, fully sorted file, for example with a k-way merge (a sketch follows this list).
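
As a concrete illustration, here is a minimal Python sketch of the five steps above. The line-based records, the `max_lines_in_memory` parameter (standing in for the "x" megabytes of memory), and the temporary-file handling are all assumptions made for illustration, not part of the original solution:

```python
import heapq
import os
import tempfile


def external_sort(input_path, output_path, max_lines_in_memory=1_000_000):
    """Sort a large text file of newline-terminated records using bounded memory."""
    chunk_paths = []

    # Phase 1: read the input in chunks that fit in memory, sort each
    # chunk, and write it to its own temporary file on disk.
    with open(input_path) as infile:
        while True:
            chunk = [line for _, line in zip(range(max_lines_in_memory), infile)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    # Phase 2: k-way merge of the sorted chunks. heapq.merge is lazy, so
    # only one buffered line per chunk is held in memory at a time.
    chunk_files = [open(p) for p in chunk_paths]
    try:
        with open(output_path, "w") as outfile:
            outfile.writelines(heapq.merge(*chunk_files))
    finally:
        for f in chunk_files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

A call like `external_sort("data.txt", "sorted.txt")` would then stream the sorted result to disk while keeping only one chunk in memory during phase 1, plus one buffered line per chunk during the merge.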

Time Complexity

O(n log n), the same as a standard merge sort.
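
One way to see why (a back-of-the-envelope count under the chunking above, not stated in the original): with n total elements split into chunks of x elements, sorting every chunk costs O(n log x) in total, and a k-way merge of the n/x chunks costs O(n log(n/x)), so

$$
O(n \log x) + O\!\left(n \log \frac{n}{x}\right) = O\!\left(n \left(\log x + \log \frac{n}{x}\right)\right) = O(n \log n)
$$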

Space Complexity

  • O(n) space is needed to store the data on the hard disk.
  • However, since we only have "x" megabytes of memory, our working-memory complexity is O(x).
  • If we could fully parallelize the algorithm across enough machines with "x" megabytes of memory each, sorting all 20 gigabytes of chunks simultaneously, then the total space in use at once would be O(n), distributed across those machines.
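
For the merge phase specifically, the O(x) memory bound can be preserved with block buffering (a standard refinement, assumed here rather than taken from the original): with k = n/x sorted chunks and a read buffer of b bytes per chunk, the merge holds about k · b bytes in memory, so choosing

$$
b \le \frac{x}{k} = \frac{x^2}{n}
$$

keeps the entire merge within the same "x"-megabyte budget.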