open source amundsen neo4j backup scripts #196
Comments
cc @jinhyukchang |
@jinhyukchang @feng-tao update? |
So, the main thing I'd like to know is: is it sufficient to simply take a copy of the disk contents? Taking an actual backup through the neo4j commands requires Enterprise, afaik. I am trying out disk-based snapshots now, but I am not sure if it will work. Do the Lyft folks know? |
@javamonkey79 At Lyft, we use APOC to dump data and schema (which can be done w/o taking down the DB) and then upload to S3. Copying the db file could work, but I was afraid it might copy the file while the state is not consistent. |
@jinhyukchang super, thanks, I'll check out apoc then. |
Actually, this link would be a better one. |
Ok, I put a PR for this |
PR merged, this is done |
@jinhyukchang @feng-tao sadly, when I rolled this out to our prod instance, I found the files were strangely small. Thinking through this a little more, I think it is because the while wait loop is checking the file, but, I think the file does exist in an intermediate state. I'll work on a fix tomorrow. I'm not sure why this didn't happen in our QA env. |
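The intermediate-state problem described above can be illustrated with a wait loop that checks more than existence. This is a minimal sketch, not the script from the PR: the function name `wait_until_stable` and its parameters are hypothetical, and treating "size unchanged for a few polls" as "export finished" is a heuristic assumption, not a guarantee.

```python
import os
import time


def wait_until_stable(path, interval=1.0, stable_polls=3, timeout=600.0):
    """Return the file size once it is non-zero and unchanged for `stable_polls` polls.

    Raises TimeoutError if that does not happen within `timeout` seconds,
    so a missing file cannot keep the job waiting forever.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file is not there (yet)
        if size > 0 and size == last_size:
            unchanged += 1
            if unchanged >= stable_polls:
                return size
        else:
            unchanged = 0  # size changed or file missing: start counting again
        last_size = size
        time.sleep(interval)
    raise TimeoutError(f"{path} did not stabilise within {timeout} seconds")
```

A slow export that pauses longer than `interval * stable_polls` would still fool this check, so blocking on the export itself is the more robust fix.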
@jinhyukchang @feng-tao ok, I checked our runs again on prod (I left it running), and while the first few runs emitted very little/no data, the job did start working. I'm not sure why it would work after the first few runs. I'm still fairly sure it needs to block on the export call somehow, but I'm still trying to think of how to achieve that, since the export call is an async REST call. There seem to be a few choices:
I'll do some more testing today, to try and isolate the issue and provide a fix. cc @samshuster |
Hi @javamonkey79 , For example, this is the call we make and it's a blocked call. |
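For readers following along, the blocking pattern can be sketched as below. This is not the actual call Lyft makes (that call is not reproduced in this thread); `export_then_upload` and both command lists are hypothetical stand-ins. The only thing it shows is that running the export as a child process and waiting for it to exit removes the need to poll for the file.

```python
import subprocess
import sys


def export_then_upload(export_cmd, upload_cmd):
    """Run the export to completion, then run the upload.

    subprocess.run blocks until the child process exits, and check=True
    raises CalledProcessError on a non-zero exit code, so the upload is
    never started after a failed or unfinished export.
    """
    subprocess.run(export_cmd, check=True)
    subprocess.run(upload_cmd, check=True)


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real job would
    # substitute its own export and upload commands here.
    export_then_upload(
        [sys.executable, "-c", "print('export finished')"],
        [sys.executable, "-c", "print('upload finished')"],
    )
```

In a backup job, the first command would be whatever shell invocation triggers the dump, and the second the copy to S3. Both must be available in the same container, which is the constraint discussed in the following comments.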
hey @jinhyukchang thanks for the follow up:
yup, that's what I mean by option 3. To clarify it a bit though, it may be possible to share the bin files from one container in the neo4j pod to another, but then there may be environment strings and other setup that could cause problems. Sharing between pods in this way is not typical that I've seen. Basically, those pods are different containers. Right now, one container is the neo4j container, while the backup container is an aws cli based container. I think I may have to use the same image on both containers, which is probably what you're thinking. |
hey @jinhyukchang I have a new PR for this here. I want to let this run a few times, to make sure it works ok (I just rolled it out to our QA env). So (in case you are really on the ball today), please don't merge until after 12:00pm PST 2/18/20. I have already done some basic testing on it, to make sure it is good and so far it looks right. I will note, it took a little longer as there was a strange issue with our cluster neo4j pvc on QA. I think this contributed to some of the issues I saw. |
@jinhyukchang @feng-tao I apologize, but, there is some yet unknown issue with this. I am still working on it. Once I figure it out, I'll let it run for a week and then let you know when it is good again. The problem is, the while loop gets caught infinitely because the data file is not present. I thought the problem was related to the persistent volume, but, I updated it yesterday and it is still having issues. The really odd thing, is that the issue is isolated to our QA cluster. Our PROD cluster is running the backup cron job just fine. cc @samshuster |
No problem, @javamonkey79 |
just saw this, thanks @javamonkey79 , let us know once it is ready |
@feng-tao @jinhyukchang the changes are looking good in QA thus far, but, let's definitely stick to the 1 week rollout. |
@jinhyukchang I am observing a frustrating issue with regards to neo4j. I wonder if you have encountered it before, or, if it could be related to k8s setup. Basically:
The only differences that I've noted from what you've mentioned to my setup:
Does your process block other queries while backups are running? What sort of cadence are you running (daily, hourly, etc)? Have you tried out cypher-shell instead? tia.... |
@javamonkey79 Unfortunately, I haven't experienced your symptom. We perform a backup every 10 minutes, and it doesn't affect performance at all, let alone block other queries. The main difference I see is Could you try to make |
@jinhyukchang it's a little hard to find, but, this is the error I encountered: https://stackoverflow.com/q/21448081/27657 From there, you can look up in the docs (again, hard to find): https://neo4j.com/docs/operations-manual/3.3/configuration/ports/
|
Interesting.
Could you check your config and confirm if Neo4j is using it? |
@jinhyukchang I checked, and our version (which is the one here for the community as well btw) does not have the shell switch enabled:
I double checked, and it's not listening on 1337 either. I suppose I could try setting the flag to true and try running through neo4j-shell again. |
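A quick way to repeat the "is anything listening on 1337" check from inside a container that lacks the usual network tools is a plain TCP connect. This is a minimal sketch; the helper name `is_port_listening` is hypothetical, and a successful connect only shows that something accepts connections on that port, not that it is the neo4j shell server.

```python
import socket


def is_port_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print(is_port_listening("localhost", 1337))
```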
@jinhyukchang just a quick update: I've tested out the neo4j-shell approach and it has been working for a few days in our QA and DEV envs. We suspect that, b/c cypher-shell communicates over bolt, it is causing some port conflict issue, but we're not sure. I will let this run through the weekend and if it looks good Monday I'll roll it out to prod; after that, I think we can merge. |
Here is an example one time pod to restore, I'll add this to the docs on my next PR:
|