Add Python API #82
One thing that would be especially helpful for a Python API would be a model class that, once trained, can do entity and edge prediction (e.g., https://graphvite.io/docs/latest/api/application.html#graphvite.application.KnowledgeGraphApplication). For example, if I have a list of entity nodes and relational edges, I may want to know either 1) the most likely (or top-k most likely) destination nodes for a set of source nodes, or 2) the probability that a certain type of edge exists between a source and a destination node. Right now, I plan to borrow code for evaluating pre-trained knowledge graph embeddings (https://aws-dglke.readthedocs.io/en/latest/hyper_param.html#evaluation-on-pre-trained-embeddings --> https://github.com/awslabs/dgl-ke/blob/master/python/dglke/eval.py) to try to do this on my own; however, it seems like this would be helpful for downstream tasks for many users. Please let me know if you think this would be useful; if I develop such a script, I can share it with you or develop it in a way that works within the repo.
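For concreteness, here is a minimal numpy sketch of the two queries described above, assuming TransE-style entity and relation embeddings. The function names `topk_destinations` and `edge_score` are hypothetical, not part of dgl-ke:

```python
import numpy as np

def topk_destinations(ent_emb, rel_emb, src, rel, k=3):
    """Rank every entity as a candidate destination for (src, rel)
    under a TransE-style score, -||h + r - t||, and return the top k."""
    scores = -np.linalg.norm(ent_emb[src] + rel_emb[rel] - ent_emb, axis=1)
    return np.argsort(-scores)[:k]

def edge_score(ent_emb, rel_emb, src, rel, dst):
    """Squash the TransE distance through a sigmoid to get a value in
    (0, 1). Note: this is an uncalibrated confidence, not a true
    probability."""
    dist = np.linalg.norm(ent_emb[src] + rel_emb[rel] - ent_emb[dst])
    return 1.0 / (1.0 + np.exp(dist))
```

A proper implementation would batch these queries and filter known training edges out of the ranking, but the scoring idea is the same.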
This is one of our motivations for creating a Python API, so it would be great if you could contribute. Could you share such a script with us once you have it? I think we should totally work together on this. We'll share our previous design of the Python API; it would be great if you could give us feedback.
Happily! Thanks so much for your interest! Should we open another issue for the entity/link prediction ticket? Would you like me to fork the repo?
My understanding is that you'd like to contribute a model class that evaluates pre-trained embeddings on various tasks: entity classification and link prediction. Is this right? We can create another ticket to have more focused discussions. As for development, you can fork the repo and make a PR. Before that, can we start with a discussion of the API definition? We'd like the API to be stable, so it would be great to finalize the API design before we write actual code.
Yes, that is correct. It would be wonderful to have a model class that can ingest pre-trained embeddings and then perform entity classification and link prediction, similar to what GraphVite's KnowledgeGraphApplication offers.

Thanks for opening another ticket. I'll likely have questions throughout the process, so that will help keep this issue cleaner in the event others have ideas or wish to contribute to the Python API. I will fork the repo and can begin work after you all have discussed the API and settled on a stable definition, as you requested. Thanks for the guidance!
The Python API is mainly defined for users to invoke KGE training in the Notebook environment. It doesn't support distributed training.

**Load data**

```python
# Load builtin datasets
kg = dglke.dataset.FB15k()

# Load users' own data (raw or pre-formatted data)
kg = dglke.dataset.load(train=load_rdf('/path/to/train/file'),
                        valid=load_rdf('/path/to/valid/file'),
                        test=None,
                        format='htr')
```

**Model load and creation**

When a model is created, it has to be associated with a knowledge graph. Since KGE models are transductive, a model is only valid on the knowledge graph it is attached to.

```python
model = dglke.TransE(dim=400)
model.attach_data(kg)
```

**Model training**

Training runs only on the knowledge graph associated with the model, and the model is saved afterwards. When the model is saved to disk, we save only the model embeddings and configurations.

```python
# When training a model, we need to provide the training data and
# specify all hyperparameters.
model.fit(num_epochs=10,
          gpus=[0, 1, 2, 3], batch_size=1000,
          neg_sample_size=400, lr=0.1,
          warm_start=False)
model.save('/path/to/save/model')
```

**Restart model training from a checkpoint**

Training knowledge graph embeddings may take a long time, so people will likely want to save KGE models periodically and restart training. We should allow KGE training to resume from a checkpoint.

```python
model = dglke.TransE(dim=400)
model.load('/path/to/trained/model')
model.attach_data(kg)
model.fit()  # This will lead to an error if there is no kg
```

**Model evaluation**

```python
model.eval(kg.test, filter_edges=kg.train, neg_size=1000,
           neg_sample_strategy='...')
triplets = load_rdf('..', format='htr')
model.link_prediction(triplets)
model.entity_embed    # get the entity embeddings
model.relation_embed  # get the relation embeddings
```
I've shared the API we defined a few months ago; we didn't have time to implement it, so I'd like to put it before the community and ask for feedback. @AlexMRuch, as a user, do you find this kind of API intuitive? As for the evaluation API, is this what you have in mind? Feel free to propose your own ideas and give us feedback on the other APIs. Thanks.
@AlexMRuch please feel free to open another ticket to discuss the evaluation API.
Wonderful. Thanks! Given the information you posted above, perhaps we can just continue the API setup and evaluation discussion here, as it seems this will involve creating the objects you mentioned. The API seems pretty clear to me and is very similar to what I had in mind; however, a few things are unclear.
Is there any interest in adding visualization (e.g., reduction to 2D or 3D)?

Hope those suggestions help and that you get other useful feedback from the community! Please let me know when you've heard back from others and when you'd like me to try to contribute some code to this effort. Thanks!
Yes, visualization is definitely desired. Do you have any suggestions on good visualization tools for large graphs?
Yes, much clearer! Thank you. I agree that calling
Ah, yes, that's a great idea – restarting training from a checkpoint. This will be very useful for how I plan to use it. If we wanted to add something like
What I mean by link prediction is where you have a source entity and a destination entity and you want to predict whether a particular kind of relation edge exists between them (e.g., are two Twitter users connected by a Retweet edge?). What I mean by entity prediction is where you have a source entity and a relation edge and you want to predict the most likely destination entity for that source–relation pair (e.g., whom is a given Twitter user most likely to retweet, or what are the top-k most likely destination entities?). So this is not really a "classification" problem – my mistake. It should be called entity prediction.

I hope that makes sense and that you agree with these ideas. I believe the idea of using KNN for entity prediction is why
In my work I usually run UMAP to knock the embedding dimensions down to 2 or 3 and then plot the reduced points. I haven't visualized knowledge graphs yet, however, so I don't know what extra complexity their embeddings will add compared to the ones I usually work with. For what it's worth, I also plan on using
Thanks for your suggestions on visualization. Your visualization looks very cool. The team will investigate visualization tools and try out the ones you suggested. I think we'll need your help.
Sounds great! I haven't used it for KGE before, but I have used it for other tasks (e.g., multi-label NLP classification). I presume it should port over relatively easily to KG tasks, given that you can frame the problem the same way: which learning rate minimizes the loss best.
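That framing can be sketched as a plain validation-loss sweep (no particular tuning library assumed; `pick_learning_rate` is a hypothetical helper):

```python
def pick_learning_rate(train_fn, val_loss_fn,
                       candidates=(1.0, 0.1, 0.01, 0.001)):
    """Run a short, fixed-budget training job at each candidate
    learning rate and keep the one whose resulting model scores the
    lowest validation loss."""
    best_lr, best_loss = None, float("inf")
    for lr in candidates:
        model = train_fn(lr)       # short training run at this lr
        loss = val_loss_fn(model)  # evaluate on held-out triplets
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss
```

In a KGE setting, `train_fn` would wrap a few epochs of `model.fit(lr=...)` and `val_loss_fn` would score held-out validation triplets.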
You're very welcome! Very happy to help where I can. Please don't hesitate to reach out!
I am no longer working at the same company where I used this library.
> On Wed, Jul 14, 2021, 8:09 PM Nabila Abraham wrote:
> hi @zheng-da and @AlexMRuch - is there any progress on this feature request?
Hey guys, I couldn't figure out whether the API has been released yet; I assume not. I really liked how you defined the API above. Best regards
The Python API is convenient for many use cases. It allows more customization and is very friendly for Jupyter Notebook users.