
open source amundsen neo4j backup scripts #196

Closed
javamonkey79 opened this issue Dec 6, 2019 · 24 comments
Labels
keep fresh Disables stalebot from closing an issue

Comments

@javamonkey79
Contributor

AC (acceptance criteria):

  • there will be scripts provided that allow amundsen neo4j data to be backed up (on a schedule) to cloud provider blob storage. AWS S3 makes the most sense; if others need other providers (e.g. Azure), they can provide an extension to this functionality
  • once these scripts are established, we should extend them to the k8s setup as well
@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Dec 7, 2019
@feng-tao
Member

cc @jinhyukchang
Hey Jin, could you help @javamonkey79? Thanks. Given it's almost the holiday season, I'm not sure we can get to it in 2019.

@javamonkey79
Contributor Author

@jinhyukchang @feng-tao update?

@javamonkey79
Contributor Author

So, the main thing I'd like to know is: is it sufficient to simply take a copy of the disk contents? Taking an actual backup through the neo4j commands requires Enterprise, AFAIK. I'm trying out disk-based snapshots now, but I'm not sure they will work. Do the Lyft folks know?

@jinhyukchang
Contributor

jinhyukchang commented Jan 22, 2020

@javamonkey79 In Lyft, we use APOC to dump the data and schema (which can be done without taking down the DB) and then upload to S3. Copying the db files could work, but I was afraid of the possibility that it might dump the files while the state is not consistent.

https://neo4j.com/developer/neo4j-apoc/

@javamonkey79
Contributor Author

@jinhyukchang super, thanks, I'll check out apoc then.

@jinhyukchang
Contributor

jinhyukchang commented Jan 23, 2020

Actually, this link would be a better one:
https://neo4j.com/docs/labs/apoc/current/export/

gabrielucelli pushed a commit to gabrielucelli/amundsen that referenced this issue Jan 28, 2020
* Implement APIs/Saga/Reducer for user 'own' and user 'read' resources
* Added bookmarks, read, own to profile page
* Refactor styles related to pagination and list items
@javamonkey79
Contributor Author

Ok, I put up a PR for this:

#281

@javamonkey79
Contributor Author

PR merged, this is done

@javamonkey79
Contributor Author

@jinhyukchang @feng-tao sadly, when I rolled this out to our prod instance, I found the files were strangely small. Thinking it through a little more, I believe the while/wait loop is checking for the file, but the file already exists in an intermediate state while the export is still running. I'll work on a fix tomorrow. I'm not sure why this didn't happen in our QA env.

@javamonkey79
Contributor Author

@jinhyukchang @feng-tao ok, I checked our runs again on prod (I left it running), and while the first few runs emitted very little/no data, the job did start working. I'm not sure why it would work after the first few runs. I'm still fairly sure it needs to block on the export call somehow, but I'm still trying to think of how to achieve that, since the export call is an async REST call. There seem to be a few choices:

  1. Check on the output of the invoked REST call, somehow. This might not be possible.
  2. Check the emitted output file length; it may be possible to count the records and match that up with the response of the REST call.
  3. Switch from the aws cli image I've been using to a neo4j image and run the commands locally in the pod instead of over REST. I ran into problems with this approach, which is why I took the approach I'm on. So, I'm hesitant to try this, but I may fall back to it.

I'll do some more testing today, to try and isolate the issue and provide a fix.

cc @samshuster
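For what it's worth, option 2's "block on the export" idea can be sketched as a polling loop that waits for the export file to stop growing before uploading. This is a hypothetical helper, not the PR's code; the function name, poll interval, and the simulated background writer are all illustrative:

```shell
# Hypothetical sketch of option 2: block until the exported file stops
# growing before uploading. The poll interval must exceed the writer's
# append interval, or two polls could see the same size mid-export.
wait_for_stable_file() {
  f="$1"
  prev=-1
  while :; do
    if [ ! -f "$f" ]; then sleep 2; continue; fi
    size=$(wc -c < "$f")
    if [ "$size" -gt 0 ] && [ "$size" -eq "$prev" ]; then break; fi
    prev=$size
    sleep 2
  done
  echo "export stable at $size bytes"
}

# Local stand-in for the async APOC export: a background writer that
# appends one line per second for three seconds.
tmp=$(mktemp)
( for i in 1 2 3; do printf 'row\n' >> "$tmp"; sleep 1; done ) &
wait_for_stable_file "$tmp"
wait
```

This only guarantees the file is quiescent, not complete, so matching the record count against the REST response (as in option 2) would still be a stronger check.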

@jinhyukchang
Contributor

> @jinhyukchang @feng-tao ok, I checked our runs again on prod (I left it running), and while the first few runs emitted very little/no data, the job did start working. I'm not sure why it would work after the first few runs. I'm still fairly sure it needs to block on the export call somehow, but I'm still trying to think of how to achieve that, since the export call is an async REST call. There seem to be a few choices:
>
>   1. Check on the output of the invoked REST call, somehow. This might not be possible.
>   2. Check the emitted output file length; it may be possible to count the records and match that up with the response of the REST call.
>   3. Switch from the aws cli image I've been using to a neo4j image and run the commands locally in the pod instead of over REST. I ran into problems with this approach, which is why I took the approach I'm on. So, I'm hesitant to try this, but I may fall back to it.
>
> I'll do some more testing today, to try and isolate the issue and provide a fix.
>
> cc @samshuster

Hi @javamonkey79,
We are not running Neo4j in a k8s environment, but is there a way to avoid the REST API and just use neo4j-shell within the Neo4j pod?

For example, this is the call we make, and it's a blocking call:

    echo "CALL apoc.export.graphml.all(${data_file}, {useTypes: true, readLabels: true});" | ( time ${NEO4J_BIN}/neo4j-shell - ) | tee -a ${BACKUP_LOG_FILE}
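One way to see why this form doesn't need a polling workaround: a shell pipeline doesn't return until its consumer exits, so the calling script naturally blocks for the duration of the export. A minimal illustration, with `sleep` plus `cat` standing in for neo4j-shell:

```shell
# The pipeline only completes once the right-hand side exits, so the
# script is blocked for the full (here, simulated) export duration.
start=$(date +%s)
echo "CALL apoc.export.graphml.all(...);" | ( sleep 2; cat ) > /dev/null
end=$(date +%s)
elapsed=$((end - start))
echo "blocked for ${elapsed}s"
```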

@javamonkey79
Contributor Author

hey @jinhyukchang thanks for the follow-up:

> Hi @javamonkey79 ,
> We are not running Neo4j in k8s environment, but is there a way not using REST API, but just use neo4j-shell within Neo4j pod?

yup, that's what I meant by option 3. To clarify a bit: it may be possible to share the bin files from one container in the neo4j pod with another, but then there may be environment variables and other setup that could cause problems. Sharing between containers in this way is not typical, in my experience. Right now, one container is the neo4j container, while the backup container is an aws-cli-based container. I think I may have to use the same image for both containers, which is probably what you're thinking.
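As a sketch of that "same image on both containers" idea (a hypothetical fragment; the container names, image tag, and backup.sh path are assumptions, not taken from the Amundsen chart), the backup sidecar could be run from the neo4j image so the shell tooling is already present:

```yaml
# Illustrative fragment only, not the actual chart.
spec:
  containers:
    - name: neo4j
      image: neo4j:3.3.0
    - name: neo4j-backup
      image: neo4j:3.3.0   # same image, so bin/neo4j-shell and cypher-shell exist
      command: ["/bin/sh", "-c", "/backup/backup.sh"]
```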

@javamonkey79
Contributor Author

hey @jinhyukchang I have a new PR for this here. I want to let it run a few times to make sure it works OK (I just rolled it out to our QA env). So (in case you are really on the ball today), please don't merge until after 12:00pm PST 2/18/20.

I have already done some basic testing on it to make sure it is good, and so far it looks right. I'll note that it took a little longer, as there was a strange issue with our cluster's neo4j PVC on QA; I think this contributed to some of the issues I saw.

@javamonkey79
Contributor Author

@jinhyukchang @feng-tao I apologize, but there is some as-yet-unknown issue with this. I am still working on it. Once I figure it out, I'll let it run for a week and then let you know when it is good again.

The problem is that the while loop waits forever because the data file is never present. I thought the problem was related to the persistent volume, but I updated it yesterday and it is still having issues.

The really odd thing is that the issue is isolated to our QA cluster. Our PROD cluster is running the backup cron job just fine.

cc @samshuster

@jinhyukchang
Contributor

No problem, @javamonkey79.
No rush here; let us know when it's ready.

@feng-tao
Member

just saw this, thanks @javamonkey79. Let us know once it is ready.

@javamonkey79
Contributor Author

@feng-tao @jinhyukchang the changes are looking good in QA so far, but let's definitely stick to the 1-week rollout.

@javamonkey79
Contributor Author

@jinhyukchang I am observing a frustrating issue with neo4j. I wonder if you have encountered it before, or if it could be related to the k8s setup. Basically:

  • the cron pod starts, installs pip and aws-cli, then makes the cypher call to export the schema/data
  • neo4j then basically locks up
  • neo4j no longer returns data
  • neo4j does, however, still respond to network requests in some limited capacity
  • the neo4j webui stays up, but has no data and will not query
  • there are no errors in any logs

The only differences I've noted between what you've mentioned and my setup:

  • I am using cypher-shell instead of neo4j-shell (I could not get neo4j-shell to work, and it seems to be deprecated anyway)
  • My invocations come from another container instead of from the neo4j container itself; I think this is the canonical approach to this sort of work.

Does your process block other queries while backups are running? What sort of cadence are you running (daily, hourly, etc.)? Have you tried cypher-shell instead? TIA.

@jinhyukchang
Contributor

@javamonkey79 Unfortunately, I haven't experienced your symptom. We perform a backup every 10 minutes and it doesn't affect performance at all, let alone block other queries.

The main difference I see is neo4j-shell vs cypher-shell. Your cypher-shell command is using the bolt protocol, which I suspect works differently from neo4j-shell.

Could you try to make neo4j-shell work? (Also, I didn't see any mention of neo4j-shell being deprecated.)
https://neo4j.com/developer/kb/using-neo4j-shell-neo4j-ce-3x/

@javamonkey79
Contributor Author

@jinhyukchang it's a little hard to find, but this is the error I encountered:

https://stackoverflow.com/q/21448081/27657

From there, you can look it up in the docs (again, hard to find):

https://neo4j.com/docs/operations-manual/3.3/configuration/ports/

> Neo4j-shell | 1337 | dbms.shell.port
>
> The neo4j-shell tool is being deprecated and it is recommended to discontinue its use. Supported tools that replace the functionality of neo4j-shell are described under Chapter 10, Tools.

@jinhyukchang
Contributor

jinhyukchang commented Feb 21, 2020

Interesting.
I was checking our config, and it's just using the default port:

    # Enable a remote shell server which Neo4j Shell clients can log in to.
    dbms.shell.enabled=true
    # The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
    #dbms.shell.host=127.0.0.1
    # The port the shell will listen on, default is 1337.
    #dbms.shell.port=1337

Could you check your config and confirm if Neo4j is using it?

@javamonkey79
Contributor Author

@jinhyukchang I checked, and our version (which is the one used here for the community as well, btw) does not have the shell switch enabled:

| Setting | Description | Value |
| -- | -- | -- |
| `dbms.shell.enabled` | Enable a remote shell server which Neo4j Shell clients can log in to. Only applicable to `neo4j-shell`. | `false` |

I double checked, and it's not listening on 1337 either.

I suppose I could try setting the flag to true and try running through neo4j-shell again.

@javamonkey79
Contributor Author

@jinhyukchang just a quick update; I've tested the neo4j-shell approach and it has been working for a few days in our QA and DEV envs. We suspect that because cypher-shell communicates over bolt, it is causing some port conflict issue, but we're not sure. I will let this run through the weekend, and if it looks good Monday I'll roll it out to prod; after that, I think we can merge.

@javamonkey79
Contributor Author

Here is an example one-time pod to restore; I'll add this to the docs in my next PR:

    apiVersion: v1
    kind: Pod
    metadata:
      name: restore-neo4j-from-latest
    spec:
      containers:
      - name: restore-neo4j-from-latest
        image: neo4j:3.3.0
        command:
         - "/bin/sh"
         - "-c"
         - |
            apk -v --update add --no-cache --quiet curl python py-pip && pip install awscli -q
            latest_backup=$(aws s3api list-objects-v2 --bucket "$BUCKET" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text)
            aws s3 cp s3://$BUCKET/$latest_backup /tmp
            tar -xf /tmp/$latest_backup -C /
            data_file=`ls /data|grep \.data`
            schema_file=`ls /data|grep \.schema`
            ./bin/neo4j-shell -host neo4j -file /data/$schema_file
            echo "CALL apoc.import.graphml('/data/$data_file', {useTypes: true, readLabels: true});" | /var/lib/neo4j/bin/neo4j-shell -host neo4j
        env:
          - name: BUCKET
            value: my-bucket-name
        volumeMounts:
          - name: data
            mountPath: /data        
      restartPolicy: OnFailure
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: neo4j-pvc
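The latest-backup selection in the pod above relies on a JMESPath sort by LastModified. The same idea, sketched against local files (the directory and filenames here are made up for illustration), is just "newest mtime first":

```shell
# Local analogue of the s3api query
#   reverse(sort_by(Contents, &LastModified))[:1].Key
# 'ls -t' lists newest-modified first, so head -n 1 picks the latest backup.
tmpd=$(mktemp -d)
touch -t 202001010000 "$tmpd/backup-old.tar"
touch -t 202002010000 "$tmpd/backup-new.tar"
latest_backup=$(ls -t "$tmpd" | head -n 1)
echo "$latest_backup"
```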

dorianj pushed a commit to dorianj/amundsen that referenced this issue Apr 25, 2021
feng-tao pushed a commit that referenced this issue May 7, 2021
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this issue Jun 30, 2022