Memory consumption increase #38

Closed
pcorbel opened this issue Nov 5, 2019 · 2 comments

@pcorbel
Contributor

pcorbel commented Nov 5, 2019

Expected Behavior

When running target-redshift against a large, long-running stream extract,
the target should load all the data with stable resource consumption.

Current Behavior

When running target-redshift==0.2.1 for one large, long-running stream extract on the Python Docker image python:3.7.5-buster,
the target is ultimately killed by the kernel because of an OOM error.
The tap's next write to the now-closed pipe then fails, causing a BrokenPipeError when calling singer.write_message():

    sys.stdout.flush()
BrokenPipeError: [Errno 32] Broken pipe
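
The tap-side error can be reproduced in isolation. The sketch below is not target-redshift or singer code: it uses only the standard library, and `write_after_reader_exit` is a hypothetical name. It closes the read end of a pipe to stand in for the killed target, then writes to the pipe the way a tap writes to stdout. It assumes a POSIX system and CPython's default behaviour of ignoring SIGPIPE, so the write raises an exception instead of terminating the process.

```python
import os


def write_after_reader_exit() -> str:
    """Write to a pipe whose read end is closed and describe the outcome."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stand-in for the target (the reader) being killed
    try:
        os.write(write_fd, b'{"type": "RECORD"}\n')
    except BrokenPipeError as exc:
        # EPIPE is surfaced by Python as BrokenPipeError
        return f"{type(exc).__name__}: errno {exc.errno}"
    finally:
        os.close(write_fd)
    return "write succeeded"


if __name__ == "__main__":
    print(write_after_reader_exit())
```

This suggests the BrokenPipeError is only a symptom on the tap side; the cause to investigate is whatever made the target exit.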

Possible Solution

Either something like a weird data shape is causing us to hang onto old pointers/refs, or there's an honest-to-god memory leak.
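
One way to start telling these two hypotheses apart is to compare allocation snapshots with the standard library's `tracemalloc`. The sketch below is an assumption about how such a check could look, not code from target-redshift: `allocation_growth`, `buffer_records`, and `_retained` are hypothetical names, and `buffer_records` only imitates a buffer that keeps references alive.

```python
import tracemalloc


def allocation_growth(work, limit=5):
    """Run work() and return the source lines whose allocations grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() sorts by absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]


# Hypothetical stand-in for a record buffer that is never flushed.
_retained = []


def buffer_records(count=1000):
    """Keep a reference to every 1 KiB 'record' so none can be freed."""
    for _ in range(count):
        _retained.append(bytearray(1024))


if __name__ == "__main__":
    for stat in allocation_growth(buffer_records):
        print(stat)
```

If the growth is attributed to Python source lines, retained references are the likelier explanation. `tracemalloc` only sees allocations made through Python's allocators, so growth that does not show up here may come from a C extension instead.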

Steps to Reproduce

  1. Build the following Docker image with docker build -t memory_test .
FROM python:3.7.5-buster

RUN apt-get update --yes \
  && apt-get upgrade --yes \
  && apt-get install --yes \
    vim \
    htop

RUN pip install target-redshift==0.2.1
RUN echo '#!/bin/bash\n'\
'# Print the schema\n'\
'echo "{\"type\": \"SCHEMA\", \"stream\": \"engagements\", \"schema\": {\"properties\": {\"engagement_id\": {\"type\": [\"null\", \"integer\"]}, \"engagement\": {\"properties\": {\"id\": {\"type\": [\"null\", \"integer\"]}, \"portal_id\": {\"type\": [\"null\", \"integer\"]}, \"active\": {\"type\": [\"null\", \"boolean\"]}, \"created_at\": {\"type\": [\"null\", \"integer\"]}, \"last_updated\": {\"type\": [\"null\", \"integer\"]}, \"created_by\": {\"type\": [\"null\", \"integer\"]}, \"modified_by\": {\"type\": [\"null\", \"integer\"]}, \"owner_id\": {\"type\": [\"null\", \"integer\"]}, \"type\": {\"type\": [\"null\", \"string\"]}, \"timestamp\": {\"type\": [\"null\", \"integer\"]}, \"activity_type\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"associations\": {\"properties\": {\"contact_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"company_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"deal_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"owner_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"workflow_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"ticket_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"content_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"quote_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}}, \"type\": [\"null\", \"object\"]}, \"attachments\": {\"items\": {\"properties\": {\"id\": {\"type\": [\"null\", \"integer\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"metadata\": {\"properties\": {\"body\": {\"type\": [\"null\", \"string\"]}, \"from\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}, \"first_name\": {\"type\": [\"null\", \"string\"]}, \"last_name\": {\"type\": [\"null\", \"string\"]}}, \"type\": [\"null\", \"object\"]}, \"to\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"cc\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"bcc\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"subject\": {\"type\": [\"null\", \"string\"]}, \"html\": {\"type\": [\"null\", \"string\"]}, \"text\": {\"type\": [\"null\", \"string\"]}, \"status\": {\"type\": [\"null\", \"string\"]}, \"for_object_type\": {\"type\": [\"null\", \"string\"]}, \"start_time\": {\"type\": [\"null\", \"integer\"]}, \"end_time\": {\"type\": [\"null\", \"integer\"]}, \"title\": {\"type\": [\"null\", \"string\"]}, \"to_number\": {\"type\": [\"null\", \"string\"]}, \"from_number\": {\"type\": [\"null\", \"string\"]}, \"external_id\": {\"type\": [\"null\", \"string\"]}, \"duration_milliseconds\": {\"type\": [\"null\", \"integer\"]}, \"external_account_id\": {\"type\": [\"null\", \"string\"]}, \"recording_url\": {\"format\": \"uri\", \"type\": [\"null\", \"string\"]}, \"disposition\": {\"type\": [\"null\", \"string\"]}}, \"type\": [\"null\", \"object\"]}}, \"type\": \"object\"}, \"key_properties\": [\"engagement_id\"]}"\n'\
'# Print an endless stream of records\n'\
'let i=0 \n'\
'while true; do\n'\
'let i=i+1\n'\
'echo "{\"type\": \"RECORD\", \"stream\": \"engagements\", \"record\": {\"engagement\": {\"id\": ${i}, \"portal_id\": 123456, \"active\": true, \"created_at\": 1502097241136, \"last_updated\": 1566630888209, \"created_by\": 123456, \"modified_by\": 123456, \"owner_id\": 123456, \"type\": \"NOTE\", \"timestamp\": 1502097241136}, \"associations\": {\"contact_ids\": [], \"company_ids\": [], \"deal_ids\": [], \"owner_ids\": [123456], \"workflow_ids\": [], \"ticket_ids\": [], \"content_ids\": [], \"quote_ids\": []}, \"attachments\": [], \"metadata\": {\"body\": \"Hello im a body\"}, \"engagement_id\": ${i}, \"time_extracted\": \"2019-11-05T09:46:28.095053Z\"}}"\n'\
'done' >> /tmp/record_generator.sh

RUN chmod +x /tmp/record_generator.sh

RUN echo '{\n'\
'  "default_column_length": 65535,\n'\
'  "logging_level": "INFO",\n'\
'  "invalid_records_detect": false,\n'\
'  "invalid_records_threshold": 0,\n'\
'  "persist_empty_tables": true,\n'\
'  "max_batch_rows": 100000,\n'\
'  "redshift_host": "your_cluster",\n'\
'  "redshift_port": 5439,\n'\
'  "redshift_database": "your_db",\n'\
'  "redshift_username": "your_user",\n'\
'  "redshift_password": "your_password",\n'\
'  "redshift_schema": "backfill_hubspot",\n'\
'  "target_s3": {\n'\
'    "aws_access_key_id": "your_key",\n'\
'    "aws_secret_access_key": "your_access_key",\n'\
'    "bucket": "your_bucket",\n'\
'    "key_prefix": "your_prefix"\n'\
'  }\n'\
'}\n' >> /tmp/target_config.json
  2. Log into the container in interactive mode with docker run -it memory_test /bin/bash
  3. Update the /tmp/target_config.json file with your own credentials
  4. Run the pipeline in the background with nohup bash -c "/tmp/record_generator.sh | target-redshift --config /tmp/target_config.json" &
  5. Monitor the resource consumption with htop
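
htop is interactive, so it does not leave a record of how memory grows over time. As a complement to the monitoring step above, the sketch below samples the resident set size of a process from `/proc`. It assumes a Linux `/proc` filesystem, as in the Debian-based image used here; `rss_kib` and `log_rss` are hypothetical helper names, and the PID of the target-redshift process has to be supplied by the caller.

```python
import os
import time


def rss_kib(pid):
    """Return the resident set size of pid in KiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # the kernel reports this in kB
    except OSError:
        # no /proc on this platform, or the process has already exited
        return None
    return None


def log_rss(pid, interval_s=5.0, samples=3):
    """Collect `samples` RSS readings for pid, `interval_s` seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(rss_kib(pid))
        time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    # Demonstrates the helpers on the current process only.
    print(log_rss(os.getpid(), interval_s=1.0))
```

A steadily increasing series of readings for the target's PID would confirm the growth described in this issue and give a rough rate to compare between versions.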

Context (Environment)

I was trying to backfill a big stream (engagements from tap-hubspot), but the job always fails.

@pcorbel pcorbel closed this as completed Nov 5, 2019
@pcorbel
Contributor Author

pcorbel commented Nov 5, 2019

Needs further investigation.

@AlexanderMann
Collaborator

@pcorbel is there a reason you closed this?
