Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from neo4jrestclient to neo4j for Neo4j 4.3 compatibility #343

Merged
merged 51 commits into from Jan 13, 2022

Conversation

kevinschaper
Copy link
Collaborator

It's not working yet, rather than querying directly from the driver it looks like a session & transaction are necessary

@kevinschaper
Copy link
Collaborator Author

I replaced http_driver.query with http_driver.session().run everywhere, but I think it's worth taking a second pass to add transactions.

It also appears to be hitting a constraint violation in the tests:

metadata = {'code': 'Neo.ClientError.Schema.ConstraintValidationFailed', 'message': "Node(10) already exists with label `biolink:NamedThing` and property `id` = 'A'"} ```

@kevinschaper
Copy link
Collaborator Author

@RichardBruskiewich I can't test against 3.5 locally, so I definitely need help with that.

I'm also not sure about the duplicate constraint validation messages, for example test_neo_source.py is giving

metadata = {'code': 'Neo.ClientError.Schema.ConstraintValidationFailed', 'message': "Node(10) already exists with label `biolink:NamedThing` and property `id` = 'A'"}

I see lots of similar messages when loading an actual kgx file, but my guess is that this might just be something to clean up for tidy output rather than an actual problem?

I don't know that it's strictly necessary for this PR, but I think it's probably worth looking at using transactions rather than just session().run() for each query.

This has examples of defining each query as it's own function to use with transactions: https://neo4j.com/docs/python-manual/current/session-api/ which might require a bit of refactoring

They're a little less elegant, but it might be reasonable to follow this example for just declaring explicit transactions
https://singerlinks.com/2020/12/neo4j-explicit-transactions-using-python/

@kevinschaper
Copy link
Collaborator Author

Also, I just realized that the connection problem that's getting hit in the tests might actually be incompatibility with Neo4j 3.5 (uh oh)

@kevinschaper kevinschaper changed the title A start at upgrading to the neo4j client for 4.3 compatibility Switch from neo4jrestclient to neo4j for Neo4j 4.3 compatibility Oct 8, 2021
@RichardBruskiewich
Copy link
Collaborator

Also, I just realized that the connection problem that's getting hit in the tests might actually be incompatibility with Neo4j 3.5 (uh oh)

Neo4j 4.* seems to have moved more definitively towards exclusive use of its Bolt port 7687. One suspects that this could be one particular source of gross major release backwards incompatibility with 3.* releases?

If KGX needs to support both releases, we may need to parameterize the calling of KGX to know which Neo4j version it is targeting, and this may also require the maintenance of two parallel code implementations, one supporting each version?

I can perhaps start taking a closer look at this next week sometime.

@RichardBruskiewich
Copy link
Collaborator

@cmungall @sierra-moxon, I've previously had the suspicion that that there is sufficient incompatibility between 3.5++ and 4.* Neo4j that one size of KGX neo4j sink/source won't fit all. I'm not sure how many KGX users currently depend on 3.5++, for which the pain of upgrade to 4.3++ would be significant.

Since freezing KGX at 3.5 does not seem to be a progressive option for the future, do we 1) completely abandon backwards compatibility to 3.5++ in the next KGX release and enforce 4.3++ compliance, or 2) config.yaml parameterize the user choice of Neo4j within a given KGX tool deployment environment, then support both the old (3.5+) and new (4.3+) compliant solutions in the code base (along with unit tests), perhaps marking 3.5++ as deprecated(?).

@kevinschaper
Copy link
Collaborator Author

It looks like the Neo4j 4.3 driver is compatibly with Neo4j 3.5: https://neo4j.com/developer/kb/neo4j-supported-versions/

The PyPi page lists something about slightly different defaults/details for connection strings that require encryption: https://pypi.org/project/neo4j/

@sierra-moxon
Copy link
Member

@RichardBruskiewich - @kevinschaper and I talked briefly with Justin in the KG hub meeting today. We probably can't support more than one version of neo4j. And the December demo looms large on the Translator side of this fence. What if we plan on migrating to the new version of neo4j, but do the work for it in a branch off master (and don't plan on merging until after December). It sounds like Kevin is ok with this idea, but also suggested we update the KGX README to document the supported version, and give our plan for upgrade. What do you think?

@PhillipsOwen
Copy link
Contributor

PhillipsOwen commented Oct 11, 2021

FWIW, I have been using KGX to load neo4j 3.5.4 graphs and then upgrading the ne04j 3.5.4 data files to neo4j 4.X.

I have a script that does it if anyone is interested.

@kevinschaper
Copy link
Collaborator Author

My one clarification is that I think after switching to the standard neo4j driver, it should still be possible to support 3.5, but I'm 100% on board with waiting until after December to eliminate the chance of downstream disruptions when folks have other priorities.

@RichardBruskiewich
Copy link
Collaborator

OK everyone. Sounds like a sensible plan: we're ultimately only going to support the latest Neo4j (i.e. 4.3) but that work to migrate over to this new version should be in a dedicated branch until after the December demo. We'll document status in the README as required.

Thanks all for the feedback. I'll start nibbling away at this starting sometime this week.

Richard Bruskiewich added 2 commits October 26, 2021 09:01
* master:
  Missing index link to meta-knowledge-graph command
* master:
  Update requirements.txt of KGX to be more permissive of its linkml version (this eliminates the designated_type error, but still has some other unit test failures)
@RichardBruskiewich
Copy link
Collaborator

@kevinschaper (@putmantime @sierra-moxon @cmungall) Moving right along...

The Neo4j company may be on a course to make the Community version of Neo4j a bit less palatable, to encourage folks to pay real money to use the technology.

e.g. See https://stackoverflow.com/questions/60429947/error-occurs-when-creating-a-new-database-under-neo4j-4-0

I'm hitting a few snags in the unit tests relating to this. I don't yet know how painful they are. Time will tell.

Is it the case that Translator will be running KGX only on Community versions of Neo4j or do we just expect serious biomedical systems to use a professional or better version?

@sierra-moxon
Copy link
Member

sierra-moxon commented Oct 27, 2021

I do not imagine many groups will use the professional/paid version of neo4j.

Richard Bruskiewich added 3 commits October 27, 2021 13:02
…_slate') to 'clean_database'; appended 'test_neo_to_graph_transform' test to 'gets_csv_to_neo_load()' to ensure sequential execution (tweaked 'neo' to 'neo4j' in name)
@RichardBruskiewich RichardBruskiewich marked this pull request as ready for review October 27, 2021 21:09
@RichardBruskiewich
Copy link
Collaborator

RichardBruskiewich commented Oct 27, 2021

HI @sierra-moxon (cc: @putmantime @kevinschaper), this branch still has unit test errors but most (all?) of them seem unrelated to the Neo4j 4.3 upgrade target of this branc (and are spun off as PR #351 and issue #353, for remedial attention).

I probably now just need to review and update a bit of the documentation, as discussed above, but the code otherwise ready for initial review.

I guess we keep this new code tagged for the post-December demo release (but available earlier, for Monarch use)?

@kevinschaper
Copy link
Collaborator Author

This is awesome, thank you for taking it over @RichardBruskiewich!

Waiting until December is still fine, it's only for local development that 3.5 isn't an option for me, so I've been running kgx from this branch when I want to load into a local neo4j. We have a cloud pipeline now too and it's fine using 3.5 with the current kgx release.

@RichardBruskiewich
Copy link
Collaborator

This is awesome, thank you for taking it over @RichardBruskiewich!

Waiting until December is still fine, it's only for local development that 3.5 isn't an option for me, so I've been running kgx from this branch when I want to load into a local neo4j. We have a cloud pipeline now too and it's fine using 3.5 with the current kgx release.

T'was what Tim P. assigned to me (job now "done" after much unavoidable procrastination).

@kevinschaper the other unit test failures seem completely unrelated to Neo4j (perhaps rather triggered by recent Biolink model changes plus ???!?). If what you need is the 4.3 Neo4j in your development, should work fine, but let me know if you encounter any issues.

@RichardBruskiewich
Copy link
Collaborator

I do not imagine many groups will use the professional/paid version of neo4j.

Yeah. Anyhow, the matter was not as serious as I expected.

@sonarcloud
Copy link

sonarcloud bot commented Jan 13, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

81.1% 81.1% Coverage
0.0% 0.0% Duplication

@sierra-moxon sierra-moxon merged commit 2e20cc5 into master Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants