Server crash while trying to open a job #7959

Open
2 tasks done
ralwing opened this issue May 29, 2024 · 4 comments
Labels
bug Something isn't working

Comments


ralwing commented May 29, 2024

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

  1. Create a new task with this collection of pcd files:

  2. Try to open the job

  3. The server returns a 500 error while loading the job, and the logs show a crash:

2024-05-28 13:21:42,169 DEBG 'uvicorn-0' stderr output:
[2024-05-28 13:21:42,167] ERROR django.request: Internal Server Error: /api/jobs/588/data
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 822, in send_packed_command
    self._sock.sendall(item)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 534, in thread_handler
    raise exc_info[1]
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = await get_response(request)
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 253, in _get_response_async
    response = await wrapped_callback(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 479, in __call__
    ret: _R = await loop.run_in_executor(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/current_thread_executor.py", line 40, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 538, in thread_handler
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/django/cvat/apps/engine/views.py", line 1903, in data
    return data_getter(request, db_job.segment.start_frame,
  File "/home/django/cvat/apps/engine/views.py", line 727, in __call__
    return super().__call__(request, start, stop, db_data)
  File "/home/django/cvat/apps/engine/views.py", line 657, in __call__
    buff, mime_type = frame_provider.get_chunk(self.number, self.quality)
  File "/home/django/cvat/apps/engine/frame_provider.py", line 207, in get_chunk
    return self._loaders[quality].get_chunk_path(chunk_number, quality, self._db_data)
  File "/home/django/cvat/apps/engine/cache.py", line 80, in get_task_chunk_data_with_mime
    item = self._get_or_set_cache_item(
  File "/home/django/cvat/apps/engine/cache.py", line 68, in _get_or_set_cache_item
    item = create_item()
  File "/home/django/cvat/apps/engine/cache.py", line 55, in create_item
    self._cache.set(key, item)
  File "/opt/venv/lib/python3.10/site-packages/django/core/cache/backends/redis.py", line 191, in set
    self._cache.set(key, value, self.get_backend_timeout(timeout))
  File "/opt/venv/lib/python3.10/site-packages/django/core/cache/backends/redis.py", line 108, in set
    client.set(key, value, ex=timeout)
  File "/opt/venv/lib/python3.10/site-packages/redis/commands/core.py", line 2302, in set
    return self.execute_command("SET", *pieces, **options)
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1258, in execute_command
    return conn.retry.call_with_retry(
  File "/opt/venv/lib/python3.10/site-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1262, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1248, in _disconnect_raise
    raise error
  File "/opt/venv/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1259, in <lambda>
    lambda: self._send_command_parse_response(
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1234, in _send_command_parse_response
    conn.send_command(*args)
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 840, in send_command
    self.send_packed_command(
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 833, in send_packed_command
    raise ConnectionError(f"Error {errno} while writing to socket. {errmsg}.")
redis.exceptions.ConnectionError: Error 104 while writing to socket. Connection reset by peer.

2024-05-28 13:21:42,170 DEBG 'uvicorn-0' stdout output:
INFO:     172.17.0.7:0 - "GET /api/jobs/588/data?org=&quality=compressed&type=chunk&number=0 HTTP/1.0" 500 Internal Server Error
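
The traceback shows the failure happening when the freshly built chunk is written to the cache (`self._cache.set(key, item)` inside `create_item`), not while the chunk is being prepared. As a minimal sketch of that get-or-set pattern, the standalone version below still returns the item when the write fails. `FlakyCache` and `get_or_set` are hypothetical names used for illustration only, not CVAT code, and the built-in `ConnectionError` stands in for `redis.exceptions.ConnectionError`.

```python
import logging

logger = logging.getLogger(__name__)


class FlakyCache:
    """Hypothetical in-memory cache that rejects values above a size limit."""

    def __init__(self, max_value_bytes):
        self.max_value_bytes = max_value_bytes
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        if len(value) > self.max_value_bytes:
            # Stand-in for the cache server dropping the connection mid-write.
            raise ConnectionError("Error 104 while writing to socket.")
        self._store[key] = value


def get_or_set(cache, key, create_item):
    """Return the cached item, or build it and try to cache it."""
    item = cache.get(key)
    if item is not None:
        return item
    item = create_item()
    try:
        cache.set(key, item)
    except ConnectionError as exc:
        # Assumption: serving the uncached item is acceptable in this case.
        logger.warning("Could not cache %s: %s", key, exc)
    return item
```

Whether falling back to an uncached response is appropriate depends on how expensive the chunk is to rebuild; the sketch only illustrates where the exception in the log originates.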

Expected Behavior

The job opens without crashing the server.

Possible Solution

When I create a task by splitting this set of files into several smaller ones, the task loads correctly.

Context

No response

Environment

Server version: 2.10.2

Core version: 14.1.0

Canvas version: 2.19.1

UI version: 1.61.3


docker ps
CONTAINER ID   IMAGE                                       COMMAND                  CREATED        STATUS                PORTS                                                                                          NAMES
313b516a88ec   gcr.io/iguazio/alpine:3.17                  "/bin/sh -c '/bin/sl…"   4 hours ago    Up 4 hours                                                                                                           nuclio-local-storage-reader
6ff6a699543f   cvat/server:v2.10.2                         "./backend_entrypoin…"   5 weeks ago    Up 2 days             8080/tcp                                                                                       cvat_server
bb31577677e0   cvat.onnx.wongkinyiu.yolov7:latest          "processor"              3 months ago   Up 2 days (healthy)   0.0.0.0:32768->8080/tcp, :::32768->8080/tcp                                                    nuclio-nuclio-onnx-wongkinyiu-yolov7
58d8c150e120   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_annotation
44faad47461b   quay.io/nuclio/dashboard:1.11.24-amd64      "/docker-entrypoint.…"   3 months ago   Up 2 days (healthy)   80/tcp, 0.0.0.0:8070->8070/tcp, :::8070->8070/tcp                                              nuclio
c92c9dc59126   cvat/ui:v2.10.2                             "/docker-entrypoint.…"   3 months ago   Up 2 days             80/tcp                                                                                         cvat_ui
08de5d80177a   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_quality_reports
2539f26b757b   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_webhooks
24d164eceea9   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_import
57a1d9beae70   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_utils
b719a17c7ab3   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_analytics_reports
db6e950d799a   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_export
8faca5315bed   timberio/vector:0.26.0-alpine               "/usr/local/bin/vect…"   3 months ago   Up 2 days                                                                                                            cvat_vector
9eccec23bd9d   traefik:v2.10                               "/entrypoint.sh trae…"   3 months ago   Up 2 days             0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 80/tcp, 0.0.0.0:8090->8090/tcp, :::8090->8090/tcp   traefik
88e8c039a55b   apache/kvrocks:2.7.0                        "kvrocks -c /var/lib…"   3 months ago   Up 2 days (healthy)   6666/tcp                                                                                       cvat_redis_ondisk
b7b44c5d9557   redis:7.2.3-alpine                          "docker-entrypoint.s…"   3 months ago   Up 2 days             6379/tcp                                                                                       cvat_redis_inmem
be87ecfe416d   clickhouse/clickhouse-server:23.11-alpine   "/entrypoint.sh"         3 months ago   Up 2 days             8123/tcp, 9000/tcp, 9009/tcp                                                                   cvat_clickhouse
3e39d6d1da34   postgres:15-alpine                          "docker-entrypoint.s…"   3 months ago   Up 2 days             5432/tcp                                                                                       cvat_db
cdca097c27e8   openpolicyagent/opa:0.45.0-rootless         "/opa run --server -…"   3 months ago   Up 2 days                                                                                                            cvat_opa
# docker --version
Docker version 23.0.1, build a5ee5b1
/ # docker images
REPOSITORY                                                                  TAG               IMAGE ID       CREATED         SIZE
cvat.pth.dschoerk.transt                                                    latest            5ff0598b1ad0   3 months ago    1.42GB
cvat.openvino.omz.public.mask_rcnn_inception_resnet_v2_atrous_coco          latest            35dbf34336f1   3 months ago    1.94GB
cvat.openvino.omz.public.mask_rcnn_inception_resnet_v2_atrous_coco.base     latest            a361c0d353cf   3 months ago    1.88GB
cvat.openvino.omz.public.faster_rcnn_inception_resnet_v2_atrous_coco        latest            0a6bf6da7cd4   3 months ago    1.73GB
cvat.openvino.omz.public.faster_rcnn_inception_resnet_v2_atrous_coco.base   latest            4c60dc98baff   3 months ago    1.67GB
cvat.openvino.omz.intel.text-detection-0004                                 latest            cd102ee7c8c6   3 months ago    1.5GB
cvat.openvino.omz.intel.text-detection-0004.base                            latest            527ab1dfda6e   3 months ago    1.45GB
cvat.openvino.omz.intel.semantic-segmentation-adas-0001                     latest            4a67fc92e65f   3 months ago    1.51GB
cvat.openvino.omz.intel.semantic-segmentation-adas-0001.base                latest            4b83055c9665   3 months ago    1.46GB
cvat.openvino.omz.intel.person-reidentification-retail-0277                 latest            045ada2caecb   3 months ago    1.63GB
cvat.openvino.omz.intel.person-reidentification-retail-0277.base            latest            57e5f5b3dbfc   3 months ago    1.57GB
cvat.openvino.omz.intel.face-detection-0205                                 latest            b553f4a167a8   3 months ago    1.51GB
cvat.openvino.omz.intel.face-detection-0205.base                            latest            beedfac5e088   3 months ago    1.46GB
cvat.openvino.dextr                                                         latest            93d12e723174   3 months ago    1.68GB
cvat.openvino.dextr.base                                                    latest            d5eddecadb29   3 months ago    1.63GB
cvat.onnx.wongkinyiu.yolov7                                                 latest            e443f3c05b37   3 months ago    770MB
cvat.openvino.base                                                          latest            f8c819411853   3 months ago    1.43GB
traefik                                                                     v2.10             ee69e8120b64   4 months ago    153MB
gcr.io/iguazio/alpine                                                       3.17              eaba187917cc   4 months ago    7.06MB
cvat/ui                                                                     v2.10.2           a83357de1feb   4 months ago    143MB
cvat/server                                                                 v2.10.2           ee6648bb036d   4 months ago    3.02GB
clickhouse/clickhouse-server                                                23.11-alpine      ddd2efb58fe7   4 months ago    910MB
postgres                                                                    15-alpine         478703aef7f8   5 months ago    240MB
apache/kvrocks                                                              2.7.0             373063f3f9d4   5 months ago    37.3MB
redis                                                                       7.2.3-alpine      d2d4688fcebe   5 months ago    41MB
grafana/grafana-oss                                                         10.1.2            31656ec60d2e   8 months ago    391MB
quay.io/nuclio/handler-builder-python-onbuild                               1.11.24-amd64     94caa75b7738   10 months ago   55.9MB
quay.io/nuclio/dashboard                                                    1.11.24-amd64     86a4ab0cb6f4   10 months ago   250MB
cvat/server                                                                 latest            19ef97c9cc19   17 months ago   4.71GB
timberio/vector                                                             0.26.0-alpine     d8ecc9831523   18 months ago   122MB
openpolicyagent/opa                                                         0.45.0-rootless   8723f2dc306a   19 months ago   84.3MB
cvat/ui                                                                     v2.2.0            822d202cfca2   20 months ago   51.2MB
cvat/server                                                                 v2.2.0            0dba6fa26ad3   20 months ago   4.63GB
postgres                                                                    10                1cad456b3a24   23 months ago   202MB
openvino/cvat_server                                                        latest            041f75bb1d7e   2 years ago     5.95GB
cvat_kibana                                                                 latest            5f2f95ad9ef4   2 years ago     493MB
cvat_logstash                                                               latest            a8ea37ce806a   2 years ago     674MB
cvat_elasticsearch                                                          latest            cac2fd48f1aa   2 years ago     678MB
postgres                                                                    10-alpine         2c86947136ab   2 years ago     79.9MB
openvino/cvat_ui                                                            v1.7.0            2b45ff0ccf48   2 years ago     49MB
openvino/cvat_server                                                        v1.7.0            7720e30a355d   2 years ago     4.71GB
openpolicyagent/opa                                                         0.34.2-rootless   f85ee8a15a91   2 years ago     71.9MB
traefik                                                                     v2.4              de1a7c9d5d63   2 years ago     92MB
redis                                                                       4.0-alpine        e3dd0e49bca5   4 years ago     20.4MB
quay.io/nuclio/uhttpc                                                       0.0.1-amd64       5c59b3d31aa8   6 years ago     3.96MB
ralwing added the bug label on May 29, 2024
Member

bsekachev commented May 29, 2024

https://stackoverflow.com/questions/64783283/connectionerror-error-104-while-writing-to-socket-connection-reset-by-peer

Try setting the chunk size lower than the default (for example, 16 images).
Another suggestion is to disable the "Use cache" option, although this will make task creation take longer.
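
To see why a smaller chunk size might help, here is a rough sketch that sums file sizes per chunk of consecutive frames. `chunk_totals` is a hypothetical helper, not part of CVAT, and it ignores compression, so the totals are only an approximation of what ends up in a single cache entry.

```python
def chunk_totals(file_sizes, chunk_size):
    """Sum file sizes (in bytes) for each chunk of `chunk_size` consecutive frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        sum(file_sizes[start:start + chunk_size])
        for start in range(0, len(file_sizes), chunk_size)
    ]


# Five files split into chunks of two frames each.
sizes = [100, 200, 300, 400, 500]
totals = chunk_totals(sizes, 2)  # [300, 700, 500]
```

Halving the chunk size roughly halves the largest total, which in turn shrinks the largest value written to the cache.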

Author

ralwing commented May 29, 2024

I have other sets of point clouds that are much bigger than this (10 GB), and they upload and open fine.

Author

ralwing commented May 29, 2024

Redis logs:

I20240529 07:42:07.455459   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 07:59:59.830586   103 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:12:58.690346   102 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:13:03.894992   105 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:28:58.435904   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:23:53.568164   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:24:00.185640   102 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:43:35.570396   106 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
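
In Redis, "invalid bulk length" is the protocol error reported when a bulk string's declared length is negative or exceeds the server's configured maximum. The sketch below checks a payload against such a limit before it is sent. It assumes a 512 MiB limit, which is the default of Redis's `proto-max-bulk-len` setting; the value actually configured on this deployment is not shown in the thread. `fits_in_bulk_string` is a hypothetical helper name.

```python
# Assumed limit: 512 MiB, the default of Redis's proto-max-bulk-len setting.
DEFAULT_MAX_BULK_LEN = 512 * 1024 * 1024


def fits_in_bulk_string(payload, max_bulk_len=DEFAULT_MAX_BULK_LEN):
    """Return True if the payload is small enough to send as one bulk string."""
    return len(payload) <= max_bulk_len
```

A check like this could be run on a chunk before `cache.set` to confirm or rule out value size as the cause.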

Author

ralwing commented May 29, 2024

Reducing the chunk size didn't help.
Disabling the "Use cache" option does help, but it triples the job size.
