
Trying to start fscrawler using docker-compose #1267

Closed
LeoPiresDeSouza opened this issue Sep 28, 2021 · 13 comments · Fixed by #1286
Labels
check_for_bug Needs to be reproduced

Comments

@LeoPiresDeSouza commented Sep 28, 2021

Hi, I really need your help.

I developed a system to build a Knowledge Base using Elasticsearch and FSCrawler 2.7. All the development was done on the Windows platform, and I was running FSCrawler as a Windows service.

Now I am trying to deploy the solution to a Linux environment using Docker. So far, the Elasticsearch and Kibana containers are ready to use, but I am having trouble starting FSCrawler. Every time I run my docker-compose I get the same error message:

[leo.souza@srvdocker-trn volumes]$ sudo docker-compose up fscrawler
elst_elasticSearch_dev01 is up-to-date
fscrawler is up-to-date
Attaching to fscrawler
fscrawler | /usr/bin/fscrawler: 47: /usr/bin/fscrawler: ps: not found
fscrawler | ERROR StatusLogger Reconfiguration failed: No configuration found for '55054057' at 'null' in 'null'
fscrawler | Exception in thread "main" java.util.NoSuchElementException
fscrawler | at java.base/java.util.Scanner.throwFor(Scanner.java:937)
fscrawler | at java.base/java.util.Scanner.next(Scanner.java:1478)
fscrawler | at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:254)

Here is my docker-compose.yml:

version: "2"

networks:
elasticSearch:
driver: bridge

volumes:
elasticSearch:
driver: local

services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
container_name: elst_elasticSearch_dev01
restart: unless-stopped
environment:
- "discovery.type=single-node"
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- xpack.security.enabled=$ELASTIC_SECURITY
- xpack.security.authc.api_key.enabled=$ELASTIC_SECURITY
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- elasticSearch:/usr/share/elasticsearch/data
ports:
- 192.168.0.250:9200:9200
networks:
- elasticSearch

ent-search:
image: docker.elastic.co/enterprise-search/enterprise-search:$ELASTIC_VERSION
container_name: elst_enterpriseSearch_dev01
restart: unless-stopped
depends_on:
- "elasticsearch"
environment:
- "JAVA_OPTS=-Xms512m -Xmx512m"
- ENT_SEARCH_DEFAULT_PASSWORD=$ELASTIC_PASSWORD
- "elasticsearch.username=elastic"
- "elasticsearch.password=unimedrecife#01"
- "elasticsearch.host=http://elasticsearch:9200"
- "allow_es_settings_modification=true"
- "secret_management.encryption_keys=[4a2cd3f81d39bf28738c10db0ca782095ffac07279561809eecc722e0c20eb09]"
- "elasticsearch.startup_retry.interval=15"
ports:
- 192.168.0.250:3002:3002
networks:
- elasticSearch

kibana:
image: docker.elastic.co/kibana/kibana:$ELASTIC_VERSION
container_name: elst_kibana_dev01
restart: unless-stopped
depends_on:
- "elasticsearch"
- "ent-search"
ports:
- 192.168.0.250:5601:5601
environment:
- "ELASTICSEARCH_HOSTS=http://elasticsearch:9200"
- "ENTERPRISESEARCH_HOST=http://ent-search:3002"
- "ELASTICSEARCH_USERNAME=elastic"
- ELASTICSEARCH_PASSWORD=$ELASTIC_PASSWORD
networks:
- elasticSearch

fscrawler:
image: dadoonet/fscrawler:$FSCRAWLER_VERSION
container_name: fscrawler
restart: always
volumes:
- /var/lib/fscrawler/volumes/config:/root/.fscrawler
- /var/lib/fscrawler/volumes/logs:/usr/share/fscrawler/logs
- /var/lib/fscrawler/volumes/documents/:/tmp/es:ro
depends_on:
- elasticsearch
command: fscrawler --rest atendimento
networks:
- elasticSearch

I expected FSCrawler to ask me about creating the new job "atendimento", so that I could edit the _settings.yaml file.
Please, what is going wrong here?

Thanks in advance.
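For context, a minimal _settings.yaml for a job like "atendimento" might look roughly like the sketch below. This is only an illustration: the Elasticsearch URL and document path are assumptions taken from the compose file above, and the password is a placeholder. Given the volume mapping, the file would sit at /var/lib/fscrawler/volumes/config/atendimento/_settings.yaml on the host (seen as /root/.fscrawler/atendimento/_settings.yaml inside the container).

---
name: "atendimento"
fs:
  # Path inside the container where the documents volume is mounted (read-only)
  url: "/tmp/es"
  update_rate: "15m"
elasticsearch:
  nodes:
    # Service name on the compose network above
    - url: "http://elasticsearch:9200"
  username: "elastic"
  # Placeholder only; use the value of ELASTIC_PASSWORD from your environment
  password: "changeme"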

LeoPiresDeSouza added the check_for_bug label Sep 28, 2021
@dadoonet (Owner)

I think you are hitting this bug: #1229
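For what it's worth, one way to look at the wrapper script the log complains about (purely illustrative; it assumes cat is available in the image and uses the same tag as in the compose file):

docker run --rm --entrypoint cat dadoonet/fscrawler:$FSCRAWLER_VERSION /usr/bin/fscrawler

The "ps: not found" message in the log above points at line 47 of that script.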

@LeoPiresDeSouza (Author)

Thank you. But is there anything I can do about it for now?

@dadoonet (Owner)

I'm afraid not.

You could try to apply a patch yourself and build the image locally. Otherwise you will need to wait for the fix.
I can't commit to any date, though.

@LeoPiresDeSouza (Author)

OK. For now, I am running FSCrawler as a service on CentOS Linux. It is running pretty well.
I will follow #1229.

Thanks.

@LeoPiresDeSouza (Author)

Do you know if this bug is specific to a Linux distribution? I am using CentOS 7.
What I want to know is whether there is any environment where I can use FSCrawler with Docker.

Thanks again.

@dadoonet (Owner)

It's a bug in the Docker image, so it will happen on every machine type.

@LeoPiresDeSouza (Author)

OK. Is it possible to run FSCrawler on a Linux machine as an orphan background job (nohup or setsid)?

@LeoPiresDeSouza (Author)

Well, I tried running it as a background job (using &) and got the same error as with docker-compose.
But with nohup and not in the background, it keeps indexing files even after I close the terminal window.

@dadoonet (Owner)

Maybe try with the --silent option?

@helsonxiao (Contributor)

You could try to revert these two commits and build your own Docker image until this bug is fixed.
1469917
1469917

@LeoPiresDeSouza (Author)

I tried "nohup ./fscrawler --silent myindex &" and "setsid ./fscrawler --silent myindex &".
In both cases fscrawler runs in background indexing files as expected.
But it seems to run as --trace and not as --silent. In nohup case, it fills $HOME/nohup.out with trace messages and in setsid case it sends trace messages to the console.

@LeoPiresDeSouza (Author)

For now I am running it as:

setsid ./fscrawler --silent myindex &>/dev/null &

@dadoonet (Owner)

This should have been fixed by #1286.

Try the 2.8-SNAPSHOT version:

docker rm fscrawler
docker run --name fscrawler -d -it -v $PWD/config:/root/.fscrawler -v $PWD/logs/$job_name:/usr/share/fscrawler/logs dadoonet/fscrawler:2.8-SNAPSHOT fscrawler --debug

If you want to run as a service (with docker-compose), indeed --silent is probably needed.
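For anyone wiring that back into docker-compose, the fscrawler service from the compose file above might then look something like this (a sketch only: it assumes the 2.8-SNAPSHOT tag and adds the --silent flag suggested earlier; the paths and job name are unchanged from the original file):

  fscrawler:
    image: dadoonet/fscrawler:2.8-SNAPSHOT
    container_name: fscrawler
    restart: always
    volumes:
      - /var/lib/fscrawler/volumes/config:/root/.fscrawler
      - /var/lib/fscrawler/volumes/logs:/usr/share/fscrawler/logs
      - /var/lib/fscrawler/volumes/documents/:/tmp/es:ro
    depends_on:
      - elasticsearch
    command: fscrawler --rest --silent atendimento
    networks:
      - elasticSearch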

dadoonet linked a pull request Oct 15, 2021 that will close this issue
3 participants