
urllib3.exceptions.ProtocolError: Connection broken: IncompleteRead #170

Closed · phretor opened this issue Jul 1, 2022 · 2 comments
Labels: improvement (New feature or request)
Milestone: 5.0.0

phretor commented Jul 1, 2022

I'm running the following consumer on 20 replicas:

# rulematcher/rulematcher.py
...
    def process(self, task: Task) -> None:  # type: ignore
        headers = task.headers
        sample: ResourceBase = task.get_resource("sample")
        analysis = None

        if headers["type"] == "sample":
            if sample.content is None:
                return None

            log.info("Processing sample %s", sample.metadata.get("sha256"))
...

The if sample.content is None line randomly triggers this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 441, in _error_catcher
    yield
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 518, in read
    data = self._fp.read() if not fp_closed else b""
  File "/usr/local/lib/python3.7/http/client.py", line 478, in read
    s = self._safe_read(self.length)
  File "/usr/local/lib/python3.7/http/client.py", line 630, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(2097152 bytes read, 15508683 more expected)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/karton/core/karton.py", line 178, in internal_process
    self.process(self.current_task)
  File "/usr/local/lib/python3.7/site-packages/karton/rulematcher/rulematcher.py", line 260, in process
    if sample.content is None:
  File "/usr/local/lib/python3.7/site-packages/karton/core/resource.py", line 413, in content
    return self.download()
  File "/usr/local/lib/python3.7/site-packages/karton/core/resource.py", line 467, in download
    self._content = self.backend.download_object(self.bucket, self.uid)
  File "/usr/local/lib/python3.7/site-packages/karton/core/backend.py", line 600, in download_object
    return reader.read()
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 544, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 458, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(2097152 bytes read, 15508683 more expected)', IncompleteRead(2097152 bytes read, 15508683 more expected))

The culprit seems to be this method in karton/core/backend.py:

    def download_object(self, bucket: str, object_uid: str) -> bytes:
        """
        Download resource object from object storage.

        :param bucket: Bucket name
        :param object_uid: Object identifier
        :return: Content bytes
        """
        reader = self.minio.get_object(bucket, object_uid)
        try:
            return reader.read()
        finally:
            reader.release_conn()
            reader.close()

I have about 1000 active tasks. The MinIO stack is deployed as follows:

version: "3.9"

# Settings and configurations that are common for all containers
x-minio-common: &minio-common
  image: quay.io/minio/minio:RELEASE.2022-05-08T23-50-31Z
  command: server --console-address ":9001" http://minio{1...4}/data{1...2}
  environment:
    MINIO_ROOT_USER: "***"
    MINIO_ROOT_PASSWORD: "***"
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
    interval: 30s
    timeout: 20s
    retries: 3
  networks:
    - minio_internal
  deploy:
    restart_policy:
      condition: on-failure
    resources:
      limits:
        memory: 512M
        cpus: "0.5"

# Starts 4 Docker containers running MinIO server instances.
# Using an nginx reverse proxy for load balancing, you can
# access them through port 9000.
services:
  minio1:
    <<: *minio-common
    hostname: minio1
    volumes:
      - /data/shares/stor04/minio/data1-1:/data1
      - /data/shares/stor04/minio/data1-2:/data3

  minio2:
    <<: *minio-common
    hostname: minio2
    volumes:
      - /data/shares/stor04/minio/data2-1:/data1
      - /data/shares/stor04/minio/data2-2:/data2

  minio3:
    <<: *minio-common
    hostname: minio3
    volumes:
      - /data/shares/stor04/minio/data3-1:/data1
      - /data/shares/stor04/minio/data3-2:/data2

  minio4:
    <<: *minio-common
    hostname: minio4
    volumes:
      - /data/shares/stor04/minio/data4-1:/data1
      - /data/shares/stor04/minio/data4-2:/data2

  nginx:
    image: nginx:alpine
    hostname: minio
    volumes:
      - /data/stacks/minio/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - minio1
      - minio2
      - minio3
      - minio4
    networks:
      - minio_internal
      - minio_external
      - public
    ports:
      - 9000:9000
      - 9001:9001
    deploy:
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 256M
          cpus: "0.5"

networks:
  minio_external:
    external: true
    name: minio_external
  minio_internal:
    external: true
    name: minio_internal
  public:
    external: true
    name: public
psrok1 (Member) commented Jul 1, 2022

Actually, this looks like a connectivity issue. We sometimes have them as well with our MinIO cluster. Since they're usually temporary, I think a built-in download/upload retry should be implemented in that method.

Related issue: #18
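
A minimal sketch of what such a retry could look like inside download_object (the attempt count, the backoff, and retrying only on ProtocolError are assumptions for illustration, not something decided in this issue):

    import time

    from urllib3.exceptions import ProtocolError


    def download_object(self, bucket: str, object_uid: str, retries: int = 3) -> bytes:
        """
        Download resource object from object storage, retrying transient
        connection errors such as the IncompleteRead/ProtocolError above.
        """
        # "retries" is a hypothetical parameter used only for this sketch
        for attempt in range(retries):
            reader = self.minio.get_object(bucket, object_uid)
            try:
                return reader.read()
            except ProtocolError:
                if attempt == retries - 1:
                    # Out of attempts: propagate the original error
                    raise
                time.sleep(2 ** attempt)  # brief exponential backoff before retrying
            finally:
                reader.release_conn()
                reader.close()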

psrok1 added the improvement label on Jul 15, 2022
psrok1 added this to the 5.0.0 milestone on Jul 20, 2022
psrok1 (Member) commented Jul 22, 2022

Network interruptions should be handled much better by Botocore, which will be used instead of minio-py; let's see how it goes with v5.0.0.
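
For context, a minimal sketch of how botocore's built-in retry configuration can be enabled for S3-compatible storage like MinIO (endpoint, credentials, bucket and key are placeholders; this is not the actual v5.0.0 code):

    import boto3
    from botocore.config import Config

    # Placeholder endpoint and credentials; botocore retries transient request
    # failures on its own, up to max_attempts.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio:9000",
        aws_access_key_id="***",
        aws_secret_access_key="***",
        config=Config(retries={"max_attempts": 5, "mode": "standard"}),
    )

    body = s3.get_object(Bucket="karton", Key="object-uid")["Body"].read()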
