Interesting phenomenon #28
Comments
Hello @dstaehler! Your method of running Cleora seems fine. To investigate this phenomenon, let's do the following:
Let us know how it goes :)
Hello Barbara,

Thanks for your response. I use cosine similarity to compare the vectors. There are very likely connections of the form A -> C and C -> D somewhere in the dataset. However, for the examples where the strange effect occurs, I checked the nearest neighbors in vector space with a shortest-path scan over the whole dataset. The result: two nodes whose shortest path is 3 edges have a higher cosine similarity than a node pair whose shortest path is 1 edge. This occurs for almost all the examples I checked.

What I haven't done so far is take the node degree into account. That would require a full degree scan of the dataset up front, which I skipped because of the scanning complexity.

I also tried reducing n (n = 2): same effect. And I tried increasing the dimension (d = 512). I didn't try 1024, but 512 produced the same results.

So far I'm a bit lost as to what else I could do.

Best,
Doug
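For readers who want to repeat the two graph-side checks mentioned in this comment, here is a minimal Python sketch of a shortest-path lookup and a degree count. It assumes the edge list is treated as undirected, which may not match how the original scan was done. The names `build_graph`, `shortest_path_len`, and `degrees` are hypothetical helpers, not part of Cleora.

```python
from collections import Counter, deque


def build_graph(edges):
    """Build undirected adjacency sets from (source, target) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_len(adj, start, goal):
    """Return the BFS hop count from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def degrees(edges):
    """Count node degrees in a single pass over the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Toy chain a - b - c - d (assumption: edges are undirected).
edges = [("a", "b"), ("b", "c"), ("c", "d")]
adj = build_graph(edges)
print(shortest_path_len(adj, "a", "d"))  # 3
print(degrees(edges)["b"])               # 2
```

The degree count needs only one linear pass over the edge list, so it may be cheaper than the shortest-path scan. Whether degree explains the effect is left open here.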
Dear Doug, Another possibility is that your graph is bipartite. We discuss such graphs, with an explanation of what to do, in #29. Best!
Hello Barbara, thanks for your reply and the offer to have a look at the dataset. The graph is not bipartite. All the nodes belong to the same entity type. Regarding the example dataset, I'll drop you an email. Best, Doug
Hello Cleora team, this is a very interesting and clever solution for creating embeddings. However, I noticed a behavior that I cannot explain. When creating embeddings from a single column (one category with a single node type) that contains both the start and the end node of each edge (a simple edge list), nodes that are further apart in the graph get vectors that are closer to each other. E.g.: (a) -> (b) -> (c) -> (d)
As an edge list:
a b
b c
c d
The vectors of a and d are closer together than the vectors of a and b (by cosine similarity).
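The comparison described above can be sketched in Python as follows. This assumes each embedding line has the form `<node_id> <v1> <v2> ...`; the real output file may contain extra header lines or columns that would need to be skipped. `parse_embeddings` and `cosine` are hypothetical helpers, and the vectors are made-up placeholders for showing the mechanics, not results from the dataset.

```python
import math


def parse_embeddings(lines):
    """Parse lines of the assumed form '<node_id> <v1> <v2> ...'."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder vectors, chosen only to exercise the functions.
sample = ["a 1.0 0.0", "b 0.0 1.0", "d 1.0 1.0"]
emb = parse_embeddings(sample)
sim_ab = cosine(emb["a"], emb["b"])
sim_ad = cosine(emb["a"], emb["d"])
print("cos(a, b) =", sim_ab)
print("cos(a, d) =", sim_ad)
print("a closer to d than to b:", sim_ad > sim_ab)
```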
Volume: approx. 5.5 million nodes and 41 million edges
I created the embeddings with the following call:
--columns='complex::reflexive::nodes' -d=128 -i='node.edgelist' -n=4
As I understand the pattern, a reflexive relationship on a single complex column should cover an edge list with one category of node type. Am I doing something wrong with the configuration, or is this an issue?
A short tip would be much appreciated.
Best