Simply sort a big .json or .ljson that does not fit in memory
This tool helps you sort a big json (or ljson) file that does not fit in memory. Given the batch_size that your machine can put in memory it will sort (based on the key) the whole file by reading as many times as necessary.
OS X, Linux & Windows: No specific requirements except python3 (3.5 and later).
pip install git+https://github.com/YaYaB/sort-big-json
usage: Sort a huge json file without loading in fully in RAM
[-h] [--input_file INPUT_FILE] [--batch_size BATCH_SIZE] [--key KEY]
[--sep SEP] [--is_json] [--output_file OUTPUT_FILE]
optional arguments:
-h, --help show this help message and exit
--input_file INPUT_FILE
Path to input file
--batch_size BATCH_SIZE
Batch size that can fit in memory
--key KEY Key or subkey used to sort
--sep SEP separator for nested key
--is_json Indicate if it is a json or ljson file
--output_file OUTPUT_FILE
Path to output sorted file
Please refer to here for examples.
The machine used has the following specs:
cpu: i7-6700HQ CPU @ 2.60GHz × 8
ram: 16Gb
Os: Ubuntu 18.04
SSD: 512Gb Toshiba M.2 2280 THNSN5512GPUK
The benchmark is the following: TODO
YaYaB
Distributed under the Apache license v2.0. See LICENSE
for more information.