[Bug]: Synology error seccomp unavailable when starting Elasticsearch #358

Closed
N72826 opened this issue Nov 2, 2022 · 17 comments
Labels
question Further information is requested

Comments

@N72826

N72826 commented Nov 2, 2022

Latest and Greatest

  • I'm running the latest version of Tube Archivist and have read the release notes.

Operating System

Synology

Your Bug Report

Describe the bug

Hello, I want to thank you for creating this project. I already solved the Redis issue. Archivist-es is continuously restarting; I will provide a log from Synology Docker here in a second. I primarily use Portainer to manage the containers, but since Elasticsearch keeps boot-looping, the container log page won't even load in Portainer.

Steps To Reproduce

attempting to start the containers

Expected behavior

archivist-es should run and stay active continuously instead of restarting itself every few seconds and consuming memory.
archivist-es.csv

Relevant log output

version: '3.3'

services:
  tubearchivist:
    container_name: tubearchivist
    restart: unless-stopped
    image: bbilly1/tubearchivist
    ports:
      - 8100:8000
    volumes:
      - /volume3/TA_Creators:/youtube
      - /volume3/docker/tubearchivist/cache:/cache
    environment:
      - ES_URL=http://10.10.0.215:9200
      - REDIS_HOST=archivist-redis
      - HOST_UID=1024
      - HOST_GID=100
      - TA_HOST=10.10.0.215
      - TA_PASSWORD=REDACTED
      - ELASTIC_PASSWORD=REDACTED
      - TZ=EST
    depends_on:
      - archivist-es
      - archivist-redis
  archivist-redis:
    image: redislabs/rejson
    container_name: archivist-redis
    restart: unless-stopped
    expose:
      - "6379"
    volumes:
      - /volume3/docker/tubearchivist/redis:/data
    depends_on:
      - archivist-es
  archivist-es:
    image: bbilly1/tubearchivist-es
    container_name: archivist-es
    restart: unless-stopped
    environment:
      - "xpack.security.enabled=true"
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - ELASTIC_PASSWORD=REDACTED
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /volume3/docker/tubearchivist/es:/usr/share/elasticsearch/data
    expose:
      - "9200"

Anything else?

I attached my log output above since that's the only place where I could, and Docker only allowed me to export it as a formatted CSV file. Apologies, and hopefully the solution is easily apparent.

@bbilly1
Member

bbilly1 commented Nov 3, 2022

What Synology device is that? I think the crucial part of the error is

seccomp unavailable: CONFIG_SECCOMP not compiled into kernel

You can find that error in a few places around the internet, for example here. Usually they suggest deactivating that security feature, at your own risk of course, by setting this environment variable on the ES container:

"bootstrap.system_call_filter=false"

I have never seen that error, which suggests that this is a Synology specific issue, maybe an older model running older software?
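
For reference, a minimal sketch of how that could look in the archivist-es service of the compose file above (untested on my side; disabling the seccomp filter is at your own risk):

  archivist-es:
    image: bbilly1/tubearchivist-es
    environment:
      - "xpack.security.enabled=true"
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - ELASTIC_PASSWORD=REDACTED
      # added line, untested: disables the seccomp bootstrap check
      - "bootstrap.system_call_filter=false"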

@bbilly1 bbilly1 added the question Further information is requested label Nov 3, 2022
@bbilly1 bbilly1 changed the title [Bug]: archivist-es erroring out [Bug]: Synology error seccomp unavailable when starting Elasticsearch Nov 3, 2022
@lamusmaser
Collaborator

The error seccomp unavailable: CONFIG_SECCOMP not compiled into kernel is a non-fatal error. I have my Synology system running ES and I can see the error as part of its instantiation logs.

Currently reviewing logs to see what is different between the two environments.

@lamusmaser
Collaborator

lamusmaser commented Nov 3, 2022

The fatal error is coming from another area:

2022-11-02T11:30:47.149264603Z,stdout,"{\"@timestamp\":\"2022-11-02T11:30:47.147Z\", \"log.level\":\"ERROR\", \"message\":\"fatal exception while booting Elasticsearch\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"main\",\"log.logger\":\"org.elasticsearch.bootstrap.Elasticsearch\",\"elasticsearch.node.name\":\"8aaff96761ad\",\"elasticsearch.cluster.name\":\"docker-cluster\",\"error.type\":\"java.lang.IllegalArgumentException\",\"error.message\":\"Could not load codec 'Lucene94'. Did you forget to add lucene-backward-codecs.jar?\",\"error.stack_trace\":\"java.lang.IllegalArgumentException: Could not load codec 'Lucene94'. Did you forget to add lucene-backward-codecs.jar?\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:515)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.parseSegmentInfos(SegmentInfos.java:404)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:363)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:299)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:88)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:77)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:809)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:67)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:60)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.gateway.PersistedClusterStateService.nodeMetadata(PersistedClusterStateService.java:322)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.env.NodeEnvironment.loadNodeMetadata(NodeEnvironment.java:599)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:326)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.node.Node.<init>(Node.java:456)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.node.Node.<init>(Node.java:311)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:214)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214)\n\tat org.elasticsearch.server@8.4.3/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\n\tSuppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (d96d01b). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\\"/usr/share/elasticsearch/data/_state/segments_13h\\")))\n\t\tat org.apache.lucene.core@9.3.0/org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:500)\n\t\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:370)\n\t\t... 15 more\nCaused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene94' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  
The current classpath supports the following names: [Lucene92, Lucene70, Lucene80, Lucene84, Lucene86, Lucene87, Lucene90, Lucene91, BWCLucene70Codec, Lucene62, Lucene60, SimpleText]\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:113)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.codecs.Codec.forName(Codec.java:118)\n\tat org.apache.lucene.core@9.3.0/org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:511)\n\t... 17 more\n\"}

Looking at my runtime, I do see the call for Lucene, but it succeeds by falling back to a previous version. Looking at the logs for both, however, I see that both make the successful call:
Issue ES

2022-11-02T11:30:43.172712792Z,stdout,"{\"@timestamp\":\"2022-11-02T11:30:42.962Z\", \"log.level\": \"INFO\", \"message\":\"loaded module [old-lucene-versions]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"main\",\"log.logger\":\"org.elasticsearch.plugins.PluginsService\",\"elasticsearch.node.name\":\"8aaff96761ad\",\"elasticsearch.cluster.name\":\"docker-cluster\"}

Stable ES

2022-08-26 15:51:21,stdout,"{\"@timestamp\":\"2022-08-26T15:51:21.967Z\", \"log.level\": \"INFO\", \"message\":\"loaded module [old-lucene-versions]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"main\",\"log.logger\":\"org.elasticsearch.plugins.PluginsService\",\"elasticsearch.node.name\":\"4de5af7914c8\",\"elasticsearch.cluster.name\":\"docker-cluster\"}

The only real difference I can see is the currently running version of ES:
Issue ES
version[8.4.3]
Stable ES
version[8.3.3]

Questions:

  1. Did you have a working configuration prior to the redis issue?
  2. Was this ES upgraded recently?
  3. If this was working previously, do you have the details of the previous ES version where it was working?

@N72826
Author

N72826 commented Nov 3, 2022

What Synology device is that? I think the crucial part of the error is

It's the FS1018 running DSM 7.1.1-42962 Update 1.

Usually they suggest deactivating that security feature, at your own risk of course, by setting this environment variable on the ES container:

"bootstrap.system_call_filter=false"

I tried adding that as an environment variable in the ES container and nothing changed.

Questions:
1. Did you have a working configuration prior to the redis issue?

I did have a working configuration, but since I have Watchtower automatically updating my other containers, I couldn't tell you when ES was last working, since the containers have to be recreated for the images to be updated. Before I realized there was a problem with the ES container, there was another issue with the Redis container, shown here: #354. So I removed the RDB file in the mounted folder and it began working again.

2. Was this ES upgraded recently?

I recreated the container with the suggested bbilly1/tubearchivist-es image because I thought that was the issue. Before that, I don't think it had been changed, because my compose file had docker.elastic.co/elasticsearch/elasticsearch:8.3.2 as the image for ES.

3. If this was working previously, do you have the details of the previous ES version where it was working?

The nodes file in the ES mounted folder says "written by Elasticsearch v8.3.2 to prevent a downgrade to a version prior to v8.0.0 which would result in data loss", so I'm certain that was the last stable version I was on. When I try loading docker.elastic.co/elasticsearch/elasticsearch:8.3.2 like before, I receive an error saying Lucene94 doesn't exist and the container continuously restarts.

@lamusmaser
Collaborator

OK, so this originally broke in a static configuration of docker.elastic.co/elasticsearch/elasticsearch:8.3.2, and after an upgrade (caused by changing to bbilly1/tubearchivist-es) and reversion, we still see the issue.

Let me take a closer look at the container to see if there might be something we can fix or replace to resolve the issue.

@lamusmaser
Collaborator

Alright, it looks like a corrupted index is causing the problem. Looking into another, similar issue, I was able to take a closer look at your error output.

This segment of the error output gives us the required details about which index is causing the problems:

Suppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (d96d01b). possibly transient resource issue,
 or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\\"/usr/share/elasticsearch/data/_state/segments_13h\\")))

This gives us a location of /usr/share/elasticsearch/data/_state/segments_13h that is having the problems. Back up the location, delete the data, then try to restart the container.
Note: This is from the perspective of the container; based on the docker-compose.yml that was provided, you'll want to target /volume3/docker/tubearchivist/es/_state/segments_13h for the backup/removal process.
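
For anyone following along, a rough sketch of that backup/removal from the Synology host over SSH, assuming the host path above (the backup folder name is just an example; may need sudo on Synology):

  # stop the restarting container before touching its data directory
  docker stop archivist-es

  # back up the corrupted segments file, then remove it from the data path
  mkdir -p /volume3/docker/tubearchivist/es_backup
  cp /volume3/docker/tubearchivist/es/_state/segments_13h /volume3/docker/tubearchivist/es_backup/
  rm /volume3/docker/tubearchivist/es/_state/segments_13h

  # start the container again and watch the logs
  docker start archivist-es
  docker logs -f archivist-es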

Let us know if that works out for you and if it changes the output of the logs.

@N72826
Author

N72826 commented Nov 4, 2022

[image]
Alright, I temporarily moved the segments file. Now I am receiving this error. Did I mess up by changing the container image to the latest version of Elasticsearch? Because now it seems I can't downgrade. I will momentarily edit this reply with the results of changing it back to 8.5.0.

@lamusmaser
Collaborator

lamusmaser commented Nov 4, 2022

ES doesn't like downgrading, but that image seems to indicate progress. Looking forward to the results of the upgrade to 8.5.X to see if this issue is now resolved overall.

@bbilly1
Member

bbilly1 commented Nov 4, 2022

ES 8.5.0 introduced breaking changes and a lot of other changes too; I haven't checked the release notes yet. Best to stick with the version in the docker-compose file.

Once you start changing things in the ES filesystem, you run the risk of making things worse. Maybe it's best to restore from a backup?

But yes, downgrading is not supported by ES...

@N72826
Author

N72826 commented Nov 4, 2022

[image]
Alright, it seems ES works now, but Tube Archivist can't connect. What's the difference between using "expose" and "ports"? I never had this issue back when I used ports.
I really wish I hadn't gone to the latest version of Elasticsearch. I checked here and updated the compose to use the latest version after I realized the container wasn't working properly, when in fact I probably could have solved it just by deleting the RDB file that was preventing the Redis container from running. :(

@lamusmaser
Collaborator

Expose should be used when services are going to be on the same network. Since these are all part of the same network (because they are started by the same docker-compose.yml), the internal services have access to that port on that container, but the host system does not.

The problem you are having is that you are attempting to connect to ES via the host IP. Change the ES_URL from http://10.10.0.215:9200 to http://archivist-es:9200 and it should connect successfully.
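
For reference, a sketch of the relevant change in the tubearchivist service from the compose file above (only the ES_URL value changes; the other lines stay as they are):

  tubearchivist:
    environment:
      # point TA at the ES container by its compose service name instead of the host IP
      - ES_URL=http://archivist-es:9200
      - REDIS_HOST=archivist-redis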

@lamusmaser
Collaborator

@N72826 have you been able to confirm that this configuration update allows the system to work as expected? If there are any additional problems, let us know and we can review and try to help with troubleshooting the cause.

@N72826
Author

N72826 commented Nov 7, 2022

@N72826 have you been able to confirm that this configuration update allows the system to work as expected? If there are any additional problems, let us know and we can review and try to help with troubleshooting the cause.

Yeah, that solved the connection issue, although it had worked the way I had it configured previously for some reason.

Like I said before, I think the primary issue I had was Redis being unable to read the RDB file, like in #354.
When I realized Tube Archivist wasn't working, I first suspected the ES container was outdated, since I had to manually update it in the past.

So I screwed myself over by setting the ES image to the latest. And after updating ES didn't fix it, I then found that other issue describing the fix for Redis.

But the damage was already done, and deleting the RDB file only fixed the Redis container, so I was left with an incompatible version of ES. Thank you guys for helping me figure that out. I appreciate those who contribute to this project because it has made my life easier.

The only thing I lost was my Tube Archivist settings; I had it set to auto-rescan subscriptions every 30 minutes and start downloads every hour. Scheduling the start downloads was easy, but I remember having trouble with scheduling the rescan.

I know the cron scheduling is non-standard and by design prevents you from using the wildcard (*) in the minutes value, but somehow I was able to schedule the rescan to occur every 30 minutes.

@lamusmaser
Collaborator

Every thirty minutes should be */30 * * for the timing, if you want it on 0 and 30. However, there is a note on the configuration page, under Scheduler Setup: "Avoid an unnecessary frequent schedule to not get blocked by YouTube. For that reason * or wildcard for minutes is not supported." This means that you'll have to hard-code the time slot, such as 0,30 * *, or set another configuration that is some multiple of 30 minutes apart, such as 15,45 * *.

@N72826
Author

N72826 commented Nov 7, 2022

Every thirty minutes should be */30 * * for the timing, if you want it on 0 and 30. However, there is a note on the configuration page, under Scheduler Setup: "Avoid an unnecessary frequent schedule to not get blocked by YouTube. For that reason * or wildcard for minutes is not supported." This means that you'll have to hard-code the time slot, such as 0,30 * *, or set another configuration that is some multiple of 30 minutes apart, such as 15,45 * *.

Yeah, the wildcard doesn't work, and I tried the comma-separated values but that also didn't work. So I'm not sure what I had it set to before, but I accomplished it somehow and never got blocked by YouTube.

This isn't even really an issue, more of a preference. The original issue I created this thread for has been solved; I just thought I might ask while I am here.

@bbilly1
Member

bbilly1 commented Nov 8, 2022

Yeah, that part of the wiki is outdated. After we had a few people coming here surprised that they got blocked by YouTube, I thought it best to limit the frequency to at most once per hour. I'll fix the wording...

But glad you figured things out here.

@bbilly1
Member

bbilly1 commented Nov 12, 2022

Closing this for now; better wording will be in the next release.

@bbilly1 bbilly1 closed this as completed Nov 12, 2022