Training hangs and almost never reaches 100% (reached once or twice) #1726

Open
boyoma opened this issue Mar 16, 2023 · 3 comments
Labels
bug Something isn't working

Comments


boyoma commented Mar 16, 2023

Describe the bug
My installation looks fine (web embedding on another domain works too), but every time I press "train chatbot" the percentage rises slowly and eventually stops before reaching 100%, so I have to press the button again. Training has completed maybe once in 100 attempts. More often than not it gets stuck at 0%.

The exact same bot was first created and trained on localhost, where training worked fine: slow, but it completed.

I'm using a machine with 4 GB memory and an 80 GB disk, running Ubuntu 20.04 (LTS) x64.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'a bot'
  2. Click on 'train chatbot'
  3. See the error: training almost never reaches 100%

Expected behavior
Slow training is OK, but it should at least complete.

Environment (please complete the following information):

  • OS: Linux
  • Browser: Chrome
  • Browser Version: 111.0.5563.64
  • Botpress Version: 12.30.7
boyoma added the bug label Mar 16, 2023

cccaballero commented Apr 17, 2023

It seems I have the same problem. I installed Botpress on my local PC using the following docker-compose.yml to test:

version: '3'

services:
  botpress:
    image: botpress/server
    expose:
      - 3000
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres:secretpw@postgres:5435/botpress_db
    depends_on:
      - postgres
    volumes:
      - ./build/botpress/data:/botpress/data

  postgres:
    image: postgres:11.2-alpine
    expose:
      - 5435
    environment:
      PGPORT: 5435
      POSTGRES_DB: botpress_db
      POSTGRES_PASSWORD: secretpw
      POSTGRES_USER: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

This is all I can see in the logs:

botpress_1 | 04/17/2023 00:17:21.386 [NLU] training-queue [test01/87443a324c26b933.b7f4a95061d75566.3265.en] Training Queued.
botpress_1 | 04/17/2023 00:17:21.692 [NLU] Engine:training Training worker successfully started on process with pid 180.

For testing I am using a new bot called test01, created from the Small Talk template without any changes.

I have not been able to complete any training. The furthest I have been able to reach is 80%.

Contributor

sebburon commented Apr 20, 2023

How much memory is accessible to your Botpress containers?
Usually, when training stops between 80 and 99%, it's because the OS killed the training process for using too much memory.

Make sure your Botpress node has access to at least 3 GB of RAM.
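
One way to check whether the OS really killed a process for memory reasons is to look for OOM-killer entries in the kernel log. The following is a minimal sketch, not part of Botpress: the function names are made up for illustration, and the pattern assumes the usual Linux kernel wording "Killed process <pid> (<name>)", which can vary between kernel versions.

```python
import re
import subprocess

# Assumed kernel log wording, e.g.
# "Out of memory: Killed process 4321 (node) total-vm:..."
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in log_text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


def scan_dmesg() -> list[tuple[int, str]]:
    """Run dmesg on the host and scan its output (may need elevated privileges)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return find_oom_kills(result.stdout)
```

Calling `scan_dmesg()` on the Docker host right after a stalled training returns an empty list if no kill was logged. The pids in the kernel log are host pids, so they will generally not match the pid printed inside the container by the Botpress log.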

Thanks,


cccaballero commented Apr 24, 2023

@sebburon I don't think it's a memory problem. I don't have any limits defined for the Docker container, and I have plenty of RAM. This is what docker stats tells me:

MEM USAGE / LIMIT
573.6MiB / 38.88GiB
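
As a rough illustration, the MEM USAGE / LIMIT column can be compared against the 3 GB figure suggested above with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the column uses binary units (B, KiB, MiB, GiB, TiB) as in the output shown.

```python
import re

# Binary unit multipliers assumed for the MEM USAGE / LIMIT column.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Threshold taken from the advice in this thread (at least 3 GB of RAM).
MIN_BYTES = 3 * 1024**3


def parse_size(text: str) -> float:
    """Convert a string such as '573.6MiB' to a number of bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KMGT]?i?B)\s*", text)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_mem_column(column: str) -> tuple[float, float]:
    """Split 'USAGE / LIMIT' into (usage_bytes, limit_bytes)."""
    usage, limit = column.split("/")
    return parse_size(usage), parse_size(limit)


def limit_is_sufficient(column: str, minimum: float = MIN_BYTES) -> bool:
    """True if the container's memory limit is at least `minimum` bytes."""
    return parse_mem_column(column)[1] >= minimum
```

For the values above, `limit_is_sufficient("573.6MiB / 38.88GiB")` is true. Note that docker stats reports usage at the moment it is sampled, so a single reading does not show the peak reached during training.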

michaelmass transferred this issue from botpress/botpress Jun 22, 2023