
ProfoundNetworks/gzipi

gzipi

Tools for indexing compressed files (currently supporting gzip and zstandard) to support random-like access.

Installing

To install the library from source, run the following command:

$ python setup.py install

To install from PyPI, run:

$ pip install gzipi

Testing

$ make test
$ make lint

Repacking existing archives

If your archive has not been repacked yet, repack it first:

$ gzipi repack -f profiles.json.gz -i index.gzi -o repacked_profiles.json.gz --format json --field domain

This command produces the repacked archive and the index file.
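To illustrate why a repacked archive allows random-like access, here is a minimal sketch of the general idea, not gzipi's actual implementation or index format. It assumes each record is written as its own gzip member and that the index maps a key to a (start, length) pair; the function name `repack` and that index layout are hypothetical. A real tool would likely group many lines per member to keep the compression ratio reasonable.

```python
import gzip
import io
import json


def repack(lines, key_field):
    """Write each line as a separate gzip member and record its byte range.

    Hypothetical helper: returns (archive_bytes, index), where index maps
    the value of key_field to a (start_offset, length) tuple.
    """
    archive = io.BytesIO()
    index = {}
    for line in lines:
        key = json.loads(line)[key_field]
        start = archive.tell()
        archive.write(gzip.compress(line.encode("utf-8")))
        index[key] = (start, archive.tell() - start)
    return archive.getvalue(), index


records = [
    '{"domain": "example.com", "rank": 1}\n',
    '{"domain": "example.org", "rank": 2}\n',
]
archive_bytes, index = repack(records, "domain")

# Concatenated gzip members still form a valid gzip stream,
# so ordinary tools can decompress the whole archive.
whole = gzip.decompress(archive_bytes).decode("utf-8")
```

Because every member is independently decompressible, a reader that knows a member's byte range does not need to decompress anything before it.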

Retrieving data

To quickly retrieve data, you need a repacked archive and the index file.

Retrieving multiple keys provided via stdin:

$ cat domains_to_retrieve.txt | gzipi retrieve -f repacked_profiles.json.gz -i index.gzi --format json --field domain

Retrieving a single key:

$ gzipi search --input-file profiles.json.gz --index-file index.gzi --key google.com

Using local and S3 paths:

$ gzipi retrieve -k domains.txt -f s3://logs/2019.json.gz -i index.json.gz --format json -o data.json --field domain
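The lookup step can be sketched as follows. This is an assumption-laden illustration rather than gzipi's code: it assumes the archive consists of independent gzip members and that the index maps each key to a (start, length) byte range. The name `fetch` and the in-memory index are hypothetical.

```python
import gzip
import io
import json


def fetch(archive, index, key):
    """Seek to one gzip member and decompress only that byte range."""
    start, length = index[key]
    archive.seek(start)
    return gzip.decompress(archive.read(length)).decode("utf-8")


# Build a tiny two-member archive in memory to exercise fetch().
archive = io.BytesIO()
index = {}
for record in ({"domain": "example.com"}, {"domain": "example.org"}):
    start = archive.tell()
    payload = (json.dumps(record) + "\n").encode("utf-8")
    archive.write(gzip.compress(payload))
    index[record["domain"]] = (start, archive.tell() - start)

line = fetch(archive, index, "example.org")
```

The same pattern applies to remote storage in principle: a byte range request for (start, length) fetches only the member that holds the key.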

Indexing a file

If your gzip archive is already chunked, you can index it without repacking.

Indexing a file from stdin:

$ cat profiles.json.gz | gzipi index --format json --field id > index.json.gz

Indexing a local file:

$ gzipi profiles.json.bz -i profiles.json.gz -o index.json.gz --format csv --column 0 --delimiter ','
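An archive that is "already chunked" can be indexed by locating the boundaries of its gzip members. The sketch below shows one way to find them with the standard library, assuming "chunked" means a concatenation of independent gzip members; `member_offsets` is a hypothetical helper, not part of gzipi. It re-slices the input for every member, which is fine for illustration but not for large files.

```python
import gzip
import zlib


def member_offsets(data):
    """Return a list of (start, length) pairs, one per gzip member."""
    offsets = []
    pos = 0
    while pos < len(data):
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=31)
        decoder.decompress(data[pos:])
        # Bytes after the end of this member are left in unused_data.
        consumed = len(data) - pos - len(decoder.unused_data)
        offsets.append((pos, consumed))
        pos += consumed
    return offsets


first = gzip.compress(b'{"id": 1}\n')
second = gzip.compress(b'{"id": 2}\n')
offsets = member_offsets(first + second)
```

Each (start, length) pair can then be paired with the keys found in that member to form an index.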

Help

To get more information, run the following command:

$ gzipi --help
