TransE error: "ValueError: One of the provided node embedding computed with the TransE method contains NaN values." #10

Closed
realmarcin opened this issue Jun 14, 2022 · 7 comments

realmarcin commented Jun 14, 2022

When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:

ValueError Traceback (most recent call last)
in
----> 1 embedding = model.fit_transform(kg)

~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
164 graph=graph,
165 return_dataframe=return_dataframe,
--> 166 verbose=verbose
167 )
168

~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
--> 114 edge_type_embeddings= edge_type_embedding,
115 )
116

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in __init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
76 if np.isnan(numpy_embedding).any():
77 raise ValueError(
---> 78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

I am attaching a Jupyter notebook to reproduce the problem.
load_graph_and.ipynb.zip

The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz

@LucaCappelletti94
Member

Hello Marcin, in the provided Jupyter notebook you are loading the edge list using:

kg = Graph.from_csv(
    edge_path="./merged-kg_edges.tsv",
    sources_column_number=0,
    edge_list_edge_types_column_number=1,
    destinations_column_number=2,
    directed=False,
    name="kg-microbe"
)

However, this loads the id column as the source nodes, since this file is not a triples file like the other one.

[Screenshot 2022-06-14 at 20:10:40]
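One way to avoid hard-coding the wrong column numbers is to look them up from the header line of the edge file before calling Graph.from_csv. The following is only a sketch: it assumes the file names its columns `subject`, `predicate` and `object`, which is not confirmed in this thread, and `triple_column_numbers` and the example header are hypothetical names introduced for illustration.

```python
import csv
import io


def triple_column_numbers(header_line, delimiter="\t",
                          names=("subject", "predicate", "object")):
    """Return the 0-based positions of the source, edge type and destination columns.

    The default column names are an assumption; pass the names used by your file.
    """
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    missing = [name for name in names if name not in header]
    if missing:
        raise ValueError(f"Columns {missing} not found in header {header}")
    return tuple(header.index(name) for name in names)


# Hypothetical header for illustration; read the first line of the real file instead.
example_header = "id\tsubject\tpredicate\tobject\n"
print(triple_column_numbers(example_header))  # (1, 2, 3)
```

With a header like the hypothetical one above, the three returned values would be passed as sources_column_number, edge_list_edge_types_column_number and destinations_column_number respectively, so that the id column is skipped.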

If you load the graph from the automatic retrieval (which points to the same edge list) you should not encounter any issue:

from grape.datasets.kghub import KGMicrobe
kg = KGMicrobe()

Nonetheless, it is interesting that this causes such a peculiar issue; I will look into it.

sanyabt commented Jun 14, 2022

Hi @LucaCappelletti94, I ran into the same issue when computing embeddings on my graph: I ran the TransE model after loading an N-Triples file. Here is a screenshot of the graph loading, followed by the error.

[Screenshot 2022-06-14 at 7:28:00 PM]

ValueError Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 embedding = model.fit_transform(npkg)

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/cache_decorator/cache.py:597, in Cache._decorate_function.<locals>.wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py:163, in AbstractEmbeddingModel.fit_transform(self, graph, return_dataframe, verbose)
149 if graph.has_disconnected_nodes():
150 warnings.warn(
151 (
152 f"Please be advised that the {graph.get_name()} graph "
(...)
160 )
161 )
--> 163 result = self._fit_transform(
164 graph=graph,
165 return_dataframe=return_dataframe,
166 verbose=verbose
167 )
169 if not isinstance(result, EmbeddingResult):
170 raise NotImplementedError(
171 f"The embedding result produced by the {self.model_name()} method "
172 f"from the library {self.library_name()} implemented in the class "
173 f"called {self.__class__.__name__} does not return an Embeddingresult "
174 f"but returns an object of type {type(result)}."
175 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/embedders/ensmallen_embedders/transe.py:111, in TransEEnsmallen._fit_transform(self, graph, return_dataframe, verbose)
102 node_embedding = pd.DataFrame(
103 node_embedding,
104 index=graph.get_node_names()
105 )
106 edge_type_embedding = pd.DataFrame(
107 edge_type_embedding,
108 index=graph.get_unique_edge_type_names()
109 )
--> 111 return EmbeddingResult(
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
114 edge_type_embeddings= edge_type_embedding,
115 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/embedding_result.py:77, in EmbeddingResult.__init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
74 numpy_embedding = embedding
76 if np.isnan(numpy_embedding).any():
---> 77 raise ValueError(
78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."
81 )
83 self._embedding_method_name = embedding_method_name
84 self._node_embeddings = node_embeddings

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.
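The check that raises in the traceback, `np.isnan(numpy_embedding).any()`, only reports that at least one NaN exists somewhere in the matrix. As a rough illustration of the same idea that also says which rows are affected, here is a standard-library sketch on a toy matrix; `rows_with_nan` and the toy values are hypothetical and are not part of embiggen.

```python
import math


def rows_with_nan(embedding):
    """Return the indices of the rows that contain at least one NaN value."""
    return [
        index
        for index, row in enumerate(embedding)
        if any(math.isnan(value) for value in row)
    ]


# Toy 3 x 2 "embedding" with one corrupted row, for illustration only.
toy_embedding = [
    [0.1, 0.2],
    [float("nan"), 0.3],
    [0.5, 0.6],
]
print(rows_with_nan(toy_embedding))  # [1]
```

Mapping the returned row indices back to node names would show which nodes received NaN vectors, which can help relate the failure to the graph topology.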

@LucaCappelletti94
Member

Hello @sanyabt! Fortunately, your error is most likely caused only by the fact that the graph is loaded as directed and may contain trap nodes. Could you try running kg.get_trap_nodes_number()? If there are any, that is the cause, and I fixed it yesterday (I had forgotten about this corner case).
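For readers unfamiliar with the term, the sketch below illustrates trap nodes on a toy directed edge list. It assumes that a trap node is a node with incoming edges but no outgoing ones; this definition is an assumption made for the example, `trap_nodes` is a hypothetical helper, and none of this is GRAPE's implementation.

```python
def trap_nodes(directed_edges):
    """Return the nodes that appear as a destination but never as a source."""
    sources = {source for source, _ in directed_edges}
    destinations = {destination for _, destination in directed_edges}
    return sorted(destinations - sources)


# Toy directed graph: "c" can be entered but never left.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(trap_nodes(toy_edges))  # ['c']
```

Loading the same edges as undirected would also give "c" edges back to "a" and "b", so under this definition it would no longer be a trap node.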

@LucaCappelletti94
Member

I have also resolved the corner case caused by the other, peculiar undirected graph topology.

sanyabt commented Jun 15, 2022

Thank you! Do we need to update or reinstall grape for the fix?

@LucaCappelletti94
Member

It will be necessary, but @zommiommy is currently working on @pnrobinson's Printer issue. As soon as that is fixed, we will run the build procedure and deploy the updated version on PyPI. I will notify you here when we do.

We have added links to our Telegram, Discord, and Twitter accounts to the READMEs so that you can reach us easily.

@LucaCappelletti94
Member

We have deployed the updated versions on PyPI: GraPE version 0.1.3.
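To confirm that an environment has picked up the fixed release, the installed version can be compared against 0.1.3. This is a minimal sketch that assumes the distribution is installed under the name `grape` and uses plain numeric versions; the helper names are hypothetical.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.1.3' into (0, 1, 3), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"Unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    left, right = parse_version(installed), parse_version(minimum)
    width = max(len(left), len(right))
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return left >= right


def installed_version(package="grape"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("grape is not installed in this environment")
elif is_at_least(version, "0.1.3"):
    print(f"grape {version} includes the fix")
else:
    print(f"grape {version} predates 0.1.3; upgrade with pip")
```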
