
Recover data after docker-compose down #37

Closed
pedropalb opened this issue Apr 1, 2020 · 13 comments


@pedropalb

Hello!

To get rid of the error below, I ran docker-compose -f .\docker-compose-win10.yml down and then docker-compose -f .\docker-compose-win10.yml up -d.

Failed logging task to backend (2 lines, <500/100: events.add_batch/v1.0 (General data error: err=('2 document(s) failed to index.', [{'index': {'_index': 'events-log-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': 'event', '_id': '45d72e8d79724292a7f7b0a5f58fb681', 'status': 503, 'error': {'type': 'unavailable_shards_exception', 'reason': '[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [2] requests and a refresh]'}}}, {'index': {'_index': 'events-log-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': 'event', '_id': 'e61e98adaff94753afb633ef67afc017', 'status': 503, 'error': {'type': 'unavailable_shards_exception', 'reason': '[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [2] requests and a refresh]'}, 'data': {'timestamp': 158576280385, 'type': 'log', 'task': '63f6480b0a9a4d078a80c8748f27fc65', 'level': 'info', 'worker': 'pa-barbosa01', 'msg': 'Train for 10 steps, validate for 263 steps\nEpoch 1/5', '@timestamp': '2020-04-01T17:40:04.236Z', 'metric': '', 'variant': ''}}}]), extra_info=[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [2] requests and a refresh])>)

After that, I can't see any of the experiments I had run until then. I thought the services' containers had all their data mapped into the host filesystem (c:\opt\trains in my case).

How can I recover my experiments' data?

@bmartinn
Member

bmartinn commented Apr 1, 2020

Hi @pedropalb,

The services' containers indeed have their data folders mapped to the host file system... This might have something to do with an issue during the Elasticsearch container initialization - can you please share the container's log? You can get the log using the following command:

$ docker logs trains-elastic

@pedropalb
Author

It seems to be a problem with free disk space. But I believe that 23.2 GB (the current free space on my disk) should be enough. The used space in c:\opt\trains is only 250 MB, and the space allocated for the Docker disk image is 16 GB (of which only 3.1 GB is used). The log file follows:

trains-elastic-logs.txt

@bmartinn
Member

bmartinn commented Apr 1, 2020

Hi @pedropalb,

By default, the high watermark is 90% (see here), so the question is not how much free space you have on your disk, but what percentage of the disk is used - try freeing up some space to see if it helps.
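
If you want to see the percentages Elasticsearch itself calculates, one option (a sketch, assuming the Elasticsearch HTTP port is published on localhost:9200) is the cat allocation API, which reports disk.used, disk.avail and disk.percent per node:

$ curl -s "http://localhost:9200/_cat/allocation?v"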

Alternatively, you can configure the Elasticsearch container with a different watermark, set either as a percentage or as a hard-coded number of bytes - simply edit your docker-compose file and add a new line under the services / elasticsearch / environment section:

services:
  ...
  elasticsearch:
    ...
    environment:
      ...
      cluster.routing.allocation.disk.watermark.high: "15gb"

In the example above, Elasticsearch will hit the high watermark only when you have less than 15 gigabytes free on your disk.

Please let me know if that works for you :)

@pedropalb
Author

Oh, I see! I managed to free some space and it seems to have fixed the Elasticsearch issue. But still, all my data is gone after the docker-compose down and up.

Do we have to create a new user and credentials every time we restart the server? Can't I recover my data anymore?

trains-elastic-logs.txt

@bmartinn
Member

bmartinn commented Apr 2, 2020

Hi @pedropalb,

Do we have to create a new user and credentials every time we restart the server? Can't I recover my data anymore?

The user and credentials are stored in the configuration files, not in the Elasticsearch data - did you lose those as well?

Regarding Elasticsearch, the data should still be there - can you find and send the directory contents of the nodes folder inside the Elasticsearch data folder? It should be located in c:/opt/trains/data/elastic/nodes or thereabouts.
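
For example, from a Windows command prompt (adjust the path if your data folder is elsewhere):

dir /s c:\opt\trains\data\elastic\nodes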

@pedropalb
Author

It seems the Elasticsearch data is still there in the path you mentioned, but MongoDB is almost empty. I tried to query tasks, projects, users, etc. Everything is empty except the user collection, which has only the newest user I created. What goes to MongoDB and what goes to Elasticsearch?

I didn't lose the configuration file, but it only has the old credentials. I didn't specify a user in the config file; I did that through the Web UI. So after the restart, I had to create new credentials and replace them in the config file. With the old credentials I couldn't even use the APIClient().

@bmartinn
Member

bmartinn commented Apr 2, 2020

Hi @pedropalb,

You are correct in assuming that tasks, projects, etc. (including user credentials) are stored in MongoDB.
I now realize that your MongoDB data has somehow disappeared, which is very strange - I previously thought the issue was only Elastic-related.

Can you please share the trains-mongo docker container log?
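You can get it the same way as before:

$ docker logs trains-mongo
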
Also, can you see if there's anything in your C:\Users\Public\Documents\Hyper-V\Virtual hard disks folder? Maybe a file or sub-folder named mongodata?

A few other thoughts:

  1. Is it possible that your docker-compose file was somehow modified so that the mount point for the mongodb data folder changed? Did you update the docker-compose file or download a new one?
  2. Did you upgrade your Docker Desktop? To use Linux containers, you usually need to add mapped drives to the Shared Drives list in the Docker Desktop settings. However, Docker Desktop seems to be inconsistent about this feature: I can't find this setting anymore in the Docker Desktop settings page, but their troubleshooting page still says it's required (see Troubleshoot, under VOLUME MOUNTING REQUIRES SHARED DRIVES FOR LINUX CONTAINERS - the link there now points to nowhere...). If the mount silently failed (it shouldn't, but still), what you're seeing now is probably the result of the mongo data folder not being mapped outside of the docker container, in which case mongo simply creates a new, empty database inside the container that has nothing to do with the outside world - one way to check the mounts is sketched below.
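
A quick way to verify what is actually mounted into the container (assuming the container is named trains-mongo, as above):

$ docker inspect -f "{{ json .Mounts }}" trains-mongo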

@pedropalb
Author

Here is the MongoDB log:
trains-mongo-logs.txt

There is nothing in C:\Users\Public\Documents\Hyper-V\Virtual hard disks.

  1. I did docker-compose down, downloaded a newer docker-compose file, and did docker-compose up. But now, without changing the docker-compose file, every time I restart the server all the data is gone (I tested by creating a new project and restarting the server).

  2. I upgraded Docker. I'm using Docker Desktop 2.2.0.4 with Docker Engine 19.03.8. I believe this volume mapping is the source of my problems. I will check it.

Thanks!

@pedropalb
Author

pedropalb commented Apr 3, 2020

Hi @bmartinn!

The issue is the volume mapping for the MongoDB service in the docker-compose file. By default, MongoDB writes its data to /data/db, but in the docker-compose file we have mongodata:/data. The mount at /data captures no data at all and, consequently, neither does the volume trains_mongodata. When the container dies, all the mongo data in /data/db dies with it.

So the solution - I hope - is to change the mapping from mongodata:/data to mongodata:/data/db, so that MongoDB's records are mapped to the volume trains_mongodata.
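
For reference, the intended mapping would look something like this (a sketch following the structure of the watermark example above; assuming the service is named mongo as in the win10 compose file, with other keys elided):

services:
  ...
  mongo:
    ...
    volumes:
      - mongodata:/data/db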

I noticed that docker-compose-win10.yml has this issue but docker-compose.yml does not. The latter has no volume named trains_mongodata; it maps /data/db directly to a host directory, as is done with the other data sources.

Would this change impact any other TRAINS functionality?

Thanks!

@bmartinn
Member

bmartinn commented Apr 6, 2020

Hi @pedropalb,

I just found where the shared drives feature was moved to in the new Docker Desktop: please go to Settings / Resources / File Sharing and make sure your drive (the C drive, I assume) is marked for file sharing - let me know if that changes anything.

@pedropalb
Author

Hi @bmartinn,
I figured out what the problem was and reported the solution in my previous comment above.
Thanks.

@pedropalb
Author

Hi,
I noticed that in the docker-compose-win10.yml file, the volume mapping of the mongo container is still using /data. Have you tested this on Windows? As I reported above (#37 (comment)), it only worked for me after changing - mongodata:/data to - mongodata:/data/db.
https://github.com/allegroai/trains-server/blob/3bf5126d84a67e7dec193bec0f6eff165e25665f/docker-compose-win10.yml#L90

Is there any other place where you change Mongo's default data directory to /data?
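
One way to check from the outside (a sketch; mongod's data directory would show up as a --dbpath argument if it were overridden in the container's command):

$ docker inspect -f "{{ json .Config.Cmd }}" trains-mongo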

@pedropalb pedropalb reopened this Jun 16, 2020
@bmartinn
Member

bmartinn commented Jun 16, 2020

Hi @pedropalb
It seems the new Docker for Windows removed the need for a specific volume for the mongodb docker (which was the reason for mapping the parent /data folder instead of /data/db and /data/configdb). The docker-compose file for Windows has now been updated with a similar fix.
Let me know if the issue persists.
