[Feature request] Support for long neo4j-admin database import #335

Open
JPonsa opened this issue Apr 4, 2024 · 3 comments

Comments


JPonsa commented Apr 4, 2024

The issue described below is a Neo4j issue, not a BioCypher one. I am mentioning it here so that (i) people are aware of it and (ii) we can see whether a workaround is possible.

I discovered that, at least on Windows, there is a maximum length for the neo4j-admin database import command. I assume this applies to all commands run from the Neo4j terminal. In my case, importing a large list of files with a full path for each file meant that I exceeded this limit. The second half of the command was stripped, so it failed to execute. I had to move the data into the dbms folder (the same folder that contains the Neo4j bin) so that I could use shorter relative paths. I tried splitting the command in two (e.g. nodes and relationships), but that failed. I hit the same issue even when executing the command as a script. A quick Google search suggests this limit could be 8,191 characters, although I found references to other values.

It would be nice to have at least an alert/message somewhere flagging when the generated admin import command exceeds this limit.

  • Does the header need to be in a separate file? Could it be included in the part file, reducing the length of the command significantly?
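
A check along these lines could flag the problem before the import script is run. This is only a minimal sketch and not BioCypher's actual code: `check_command_length` and `WINDOWS_CMD_LIMIT` are hypothetical names, and the 8,191 figure is simply the value quoted above, assumed here rather than verified.

```python
import warnings

# Assumed limit: the 8,191-character figure mentioned above for Windows.
WINDOWS_CMD_LIMIT = 8191


def check_command_length(command: str, limit: int = WINDOWS_CMD_LIMIT) -> bool:
    """Return True if the command fits within the limit, else warn and return False."""
    if len(command) > limit:
        warnings.warn(
            f"Import command is {len(command)} characters long, "
            f"which exceeds the assumed limit of {limit}; "
            "it may be truncated by the shell."
        )
        return False
    return True
```

The check only looks at the length of the generated command string, so it could run on any platform when the import script is written.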
Collaborator

nilskre commented Apr 4, 2024

Thanks for the report.
On Linux and Mac, we have not yet experienced this issue.
As you already mentioned, the character limit for the neo4j-admin import call is caused by Neo4j (on Windows). Therefore, the error handling should also be done by Neo4j (and not by BioCypher itself).

What is the exact character limit? I would expect this to be documented in the Neo4j docs?
Are there already related issues on neo4j regarding this?

Author

JPonsa commented Apr 4, 2024

I could not find a mention of it on the Neo4j StackOverflow. I am planning to write a post there. However, this is likely not an issue with either Neo4j or BioCypher but with the operating system. Both Windows and Linux have a maximum command length in the terminal. I am not sure the numbers are right, but it seems to be just over 8K characters on Windows and 100K on Linux. On Windows this cannot be changed.

Although this is not a BioCypher issue/bug, BioCypher is the one producing the CSV files and the executable. For example, is it a BioCypher design decision to produce the header and part files separately? Could they be combined into a single file? If so, that would reduce the command length significantly and reduce the chances of hitting the issue. Or maybe someone who knows more about Neo4j knows how the load could be split into multiple commands to avoid the issue.
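
Combining the files could look roughly like the sketch below. It assumes the importer accepts the header as the first row of the data file, which should be checked against the Neo4j documentation; `merge_header_and_parts` is a hypothetical helper, not part of BioCypher.

```python
from pathlib import Path


def merge_header_and_parts(header_file, part_files, output_file):
    """Write the header line followed by the rows of every part file."""
    with open(output_file, "w", encoding="utf-8", newline="") as out:
        # The header file is expected to hold a single line of column definitions.
        header = Path(header_file).read_text(encoding="utf-8").rstrip("\r\n")
        out.write(header + "\n")
        for part in part_files:
            text = Path(part).read_text(encoding="utf-8")
            out.write(text)
            # Make sure the next part starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
```

With one merged file per node or relationship type, each import argument would need a single path instead of a header path plus a part path.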

The length of the command is, to a certain degree, tied to the complexity of the KG, as it grows directly with the number of node types and the number of relationships. My example has 17 node types and approx. 25 relationships. In the end, how many people hit this issue will be a numbers game, so I cannot estimate how frequent it will be or whether it will only affect Windows users.
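
To make the growth concrete, here is a rough sketch that builds the import arguments for a given number of node and relationship types. The file naming (`<name>-header.csv`, `<name>-part.*`) is an assumption about the generated output, and `build_import_args` is a hypothetical helper used only for illustration.

```python
def build_import_args(data_dir, node_types, rel_types):
    """Build illustrative import arguments: one header and one part pattern per type."""
    args = []
    for name in node_types:
        args.append(
            f'--nodes="{data_dir}/{name}-header.csv,{data_dir}/{name}-part.*"'
        )
    for name in rel_types:
        args.append(
            f'--relationships="{data_dir}/{name}-header.csv,{data_dir}/{name}-part.*"'
        )
    return " ".join(args)


# Example with 17 node types and 25 relationships (placeholder names and paths).
nodes = [f"Node{i}" for i in range(17)]
rels = [f"Rel{i}" for i in range(25)]
long_args = build_import_args("C:/Users/me/projects/kg/biocypher-out", nodes, rels)
short_args = build_import_args("import", nodes, rels)
print(len(long_args), len(short_args))
```

Because the directory appears twice per type, every extra character in the path adds two characters per node or relationship type, which is why moving the data next to the Neo4j bin folder shortened the command so much.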

So not a Bug, not a Must Have but a Nice to Have.

Collaborator

nilskre commented Apr 5, 2024

Alright, thanks for the explanation. Now I understand your point. We will keep this in mind for future discussion.

As a workaround in your case: have you tried the online mode, where BioCypher writes the nodes and edges directly to Neo4j (without the intermediate step of CSV files)? To use this, you need a running Neo4j instance. You then adapt the biocypher_config.yaml and set offline to false. You also need another config block that describes how the running Neo4j instance can be accessed, as shown here. BioCypher should then connect directly to the running Neo4j instance and write the produced nodes and edges there.
