Add documents for command line arguments. #83

Closed
zheng-da opened this issue Apr 24, 2020 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@zheng-da
Contributor

We need to explain the arguments of commands.

@zheng-da zheng-da added this to TODO in small improvements Apr 24, 2020
@AlexMRuch

AlexMRuch commented Apr 30, 2020

I'd really appreciate this. For example, on https://aws-dglke.readthedocs.io/en/latest/train_user_data.html it's not clear what should go in --data_path and --data_files.

For example, --data_path says "to specify the path to the knowledge graph dataset"; however, I presume this means "to specify the path to the folder containing the knowledge graph dataset".

Also, --data_files says "to specify the triplets of a knowledge graph as well as node/relation ID mapping"; however, the order of these files is not immediately clear. For example, I would presume it follows the order of the files listed under udd_[h|r|t]:

DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv train.tsv valid.tsv test.tsv \
--format udd_hrt \
--model_name ComplEx \
--max_step 12000 --batch_size 1000 --neg_sample_size 200 --batch_size_eval 16 \
--hidden_dim 400 --gamma 19.9 --lr 0.25 --regularization_coef=1e-9 -adv \
--gpu 0 1 --async_update --force_sync_interval 1000 --log_interval 1000 \
--test

^^^ But the order isn't clear. It seems like entities.tsv and relations.tsv should go at the end, since if someone uses the raw_udd_[h|r|t] option this would keep the first three elements consistent: the training, validation, and testing files.

Perhaps there should be --data_tuple_files and --data_mapping_files options?
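
To make the suggestion concrete, here is a minimal Python sketch of how the two proposed options could be parsed. It is purely illustrative: --data_tuple_files and --data_mapping_files are the hypothetical flags proposed in this comment, not options dglke_train is known to accept, and the parser name, the helper split_tuple_files, and the "train, then valid, then test" ordering are assumptions made for the example.

```python
import argparse


def build_parser():
    # Hypothetical parser: the two split flags below are the proposal from
    # this comment, not options that dglke_train is known to accept.
    parser = argparse.ArgumentParser(prog="kg_train_sketch")
    parser.add_argument("--data_path", required=True,
                        help="folder that contains the dataset files")
    parser.add_argument("--data_tuple_files", nargs="+", required=True,
                        metavar="FILE",
                        help="train [valid] [test] triplet files, in that order")
    parser.add_argument("--data_mapping_files", nargs=2, default=None,
                        metavar=("ENTITY_FILE", "RELATION_FILE"),
                        help="entity and relation ID mapping files")
    return parser


def split_tuple_files(files):
    """Name the 1-3 triplet files by position: train, then valid, then test."""
    if not 1 <= len(files) <= 3:
        raise ValueError("expected 1 to 3 triplet files, got %d" % len(files))
    roles = ("train", "valid", "test")
    return dict(zip(roles, files))


args = build_parser().parse_args([
    "--data_path", "results_SXSW_2018",
    "--data_tuple_files", "train.tsv", "valid.tsv", "test.tsv",
    "--data_mapping_files", "entities.tsv", "relations.tsv",
])
tuple_files = split_tuple_files(args.data_tuple_files)
print(tuple_files)
print(args.data_mapping_files)
```

With the mapping files in their own option, the triplet files keep the same positions whether or not ID mappings are supplied, which is the consistency described above.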

UPDATE:
When I ran the code above, it gave me this output with FB15k in the checkpoint path, which doesn't seem right...

(dglke) amruch@wit:~/graphika/kg$ DGLBACKEND=pytorch dglke_train --data_path results_SXSW_2018 --data_files entities.tsv relations.tsv train.tsv valid.tsv test.tsv--format udd_hrt --model_name ComplEx --max_step 12000 --batch_size 1000 --neg_sample_size 200 --batch_size_eval 16 --hidden_dim 400 --gamma 19.9 --lr 0.25 --regularization_coef=1e-9 -adv --gpu 0 1 --async_update --force_sync_interval 1000 --log_interval 1000 --test
Using backend: pytorch
Logs are being recorded at: ckpts/ComplEx_FB15k_0/train.log
Reading train triples....
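
One possible reading of the FB15k in that log path, which nothing in this thread confirms: the run folder may be named from the model name plus a dataset-name setting that falls back to a built-in default when it is not set, independently of --data_path and --data_files. The Python sketch below only illustrates that naming pattern. The function checkpoint_folder, the constant DEFAULT_DATASET, and the counting scheme are assumptions for illustration, not code taken from DGL-KE.

```python
def checkpoint_folder(model_name, dataset, existing):
    """Build a '<model>_<dataset>_<n>' folder name.

    n is the number of existing folders that already share the prefix, so
    the first run gets 0. This naming scheme is an assumption made for
    illustration only.
    """
    prefix = "{}_{}_".format(model_name, dataset)
    n = sum(1 for name in existing if name.startswith(prefix))
    return prefix + str(n)


# Assumed fallback used when no dataset name is passed on the command line.
DEFAULT_DATASET = "FB15k"

folder = checkpoint_folder("ComplEx", DEFAULT_DATASET, existing=[])
print("ckpts/" + folder + "/train.log")  # prints ckpts/ComplEx_FB15k_0/train.log
```

Under that assumption, the folder name would reflect a label rather than the data actually read, so user-supplied files could still be the ones loaded.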

@zheng-da
Contributor Author

Thank you very much for your feedback. We'll prioritize it and provide documentation of the argument options.

If any of the explanations from --help are unclear, please post them here and we'll improve them. Thanks a lot for your help.

@AlexMRuch

AlexMRuch commented Apr 30, 2020 via email

@zheng-da
Contributor Author

zheng-da commented May 3, 2020

We need to clarify our documentation to address all of the questions in this issue: #84

@zheng-da zheng-da added the documentation Improvements or additions to documentation label May 4, 2020
@classicsong
Contributor

The docs for command line arguments were updated along with the 0.1.1 release.

@AlexMRuch

AlexMRuch commented Aug 27, 2020 via email
