Document parameters - what and how - for keyvi command line and python API #213

Open
netankit opened this issue Feb 17, 2017 · 5 comments

@netankit

Currently, we use the keyvi compiler option "floating_point_precision" for word embeddings in the sharding/compiling step. It would be nice to be able to pass this option through the keyvi command line and python API for any keyvi file whose values are vectors.

Ex: keyvi_compiler_options = {"minimization": "off", "floating_point_precision": "single"}

This would help reduce the size of massive keyvi files composed of vector values (e.g. document vectors: ~2.1B vectors, ~300 dimensions). I haven't been able to figure out how to use this feature. A standalone example with documentation would be useful.
@hendrikmuhs
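
To put the size concern in numbers, here is a back-of-the-envelope sketch. It assumes the alternative to "single" is 8-byte double precision and counts only the bare float components (4 or 8 bytes each per IEEE 754), ignoring keys, serialization overhead and compression, so the real file sizes will differ. The function name is made up for this illustration.

```python
# Rough raw-payload estimate; ignores keys, serialization overhead and compression.
NUM_VECTORS = 2_100_000_000   # ~2.1B vectors, from the issue text
DIMENSIONS = 300              # ~300 dimensions, from the issue text

# Size of one component: IEEE 754 binary32 vs. binary64 (assumed alternative).
BYTES_PER_COMPONENT = {"single": 4, "double": 8}


def raw_payload_bytes(precision: str) -> int:
    """Return the size of the bare float components for the given precision."""
    return NUM_VECTORS * DIMENSIONS * BYTES_PER_COMPONENT[precision]


for precision in ("double", "single"):
    terabytes = raw_payload_bytes(precision) / 1e12
    print(f"{precision}: {terabytes:.2f} TB")
```

Under these assumptions, single precision halves the raw vector payload.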

@hendrikmuhs
Contributor

Hey @netankit

you are right, all the config options lack documentation (and over time they have become quite a few).

This is how you do it on the cmdline:

keyvicompiler -i float.txt -o float.kv_s -d json -V floating_point_precision=single

(Since size is your concern, note that you can add compression as well: keyvicompiler -i float.txt -o float.kv_s -d json -V floating_point_precision=single -V compression=zlib )
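
If the compiler is driven from a script, the same command line can be assembled from an options dictionary. This is only a sketch: `build_keyvicompiler_command` is a hypothetical helper, not part of keyvi, and it relies solely on the flags shown above (`-i`, `-o`, `-d`, and one `-V key=value` per option).

```python
import shlex


def build_keyvicompiler_command(input_path, output_path, options, dictionary_type="json"):
    """Assemble a keyvicompiler argument list; each option becomes '-V key=value'."""
    command = ["keyvicompiler", "-i", input_path, "-o", output_path, "-d", dictionary_type]
    for key, value in options.items():
        command += ["-V", f"{key}={value}"]
    return command


command = build_keyvicompiler_command(
    "float.txt",
    "float.kv_s",
    {"floating_point_precision": "single", "compression": "zlib"},
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The list can be handed to `subprocess.run(command, check=True)` on a machine where keyvicompiler is installed.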

On the python side you pass it as a dictionary:

See https://github.com/cliqz-oss/keyvi/blob/master/pykeyvi/tests/json/json_dictionary_test.py#L65

cs = pykeyvi.JsonDictionaryCompiler(50000000, {'floating_point_precision': 'single'})

The first parameter is the memory limit, which has to be given in order to pass the parameter dictionary as the 2nd argument.

As on the command line, compression can be added with e.g. 'compression': 'zlib'.

@hendrikmuhs
Contributor

hendrikmuhs commented Feb 18, 2017

Note: The parameter parsing will change for 0.2 to make it more consistent. The memory limit, which is currently a separate parameter, will move into the parameter dictionary, so that all configuration is given by a python dictionary, or by a std::map<string, string> on the CPP side.

Changing title and label.

@hendrikmuhs changed the title from "[Feature Request] Single floating_point_precision for keyvi command line and python API" to "Document parameters - what and how - for keyvi command line and python API" on Feb 18, 2017
@netankit
Author

@hendrikmuhs Thanks for the detailed reply. I will use this for the time being. So, from v0.2 on, are keyvicompiler and keyviinspector going to be removed completely in favor of keyvi compile/dump?

@hendrikmuhs
Contributor

hendrikmuhs commented Feb 18, 2017

ah, got it. It seems the keyvi cli tool does not support parameters yet. Good point, we should add it.

What I meant with 0.2 is moving memory_limit into the parameters, so the python call would look like:

cs = pykeyvi.JsonDictionaryCompiler({'floating_point_precision': 'single', 'memory_limit_mb': '50'})
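
For code that still has the memory limit as a separate number, the move into the dictionary could be sketched as below. `to_v02_parameters` is a hypothetical helper, not part of pykeyvi, and it assumes that the old first argument is a byte count and that 1 MB means 1,000,000 bytes, which is what the two examples in this thread (50000000 and '50') suggest but do not confirm. Values are stored as strings to match the std::map<string, string> mentioned above.

```python
def to_v02_parameters(memory_limit_bytes, options=None):
    """Fold a separate memory limit into a single string-valued parameter dict.

    Assumption: the limit is given in bytes and 1 MB = 1,000,000 bytes.
    """
    params = dict(options or {})  # copy so the caller's dict is not modified
    params["memory_limit_mb"] = str(memory_limit_bytes // 1_000_000)
    return params


params = to_v02_parameters(50000000, {"floating_point_precision": "single"})
print(params)
```

The resulting dictionary has the same shape as the one passed to JsonDictionaryCompiler in the 0.2-style call above.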

There are no removal plans for keyvicompiler and/or keyviinspector. The keyvi cli (based on python) is just an alternative to the native tools. Use whatever you like.

The idea behind the keyvi cli is faster implementation: it is much, much easier to implement something in python + pykeyvi than to write it in the cpp app. That means we will probably implement new features in the keyvi cli only. But we will see.

@hendrikmuhs
Contributor

opened #214
