Scheduled email worker is reporting an error #9498

Closed
3 tasks done
sweileh opened this issue Apr 9, 2020 · 5 comments

sweileh commented Apr 9, 2020

I have an issue with Superset's scheduled email feature.

After installing Superset in containers using docker-compose, everything works except the scheduled email feature.
I followed the Superset installation and configuration page exactly (the section on email reports), but the worker that runs the email task reports the errors below.

Expected results

The worker should send the email successfully.

Actual results

When I send a test email, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/app/superset/app.py", line 114, in __call__
    return task_base.__call__(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/superset/tasks/schedules.py", line 372, in schedule_email_report
    deliver_dashboard(schedule)
  File "/app/superset/tasks/schedules.py", line 210, in deliver_dashboard
    driver = create_webdriver()
  File "/app/superset/tasks/schedules.py", line 155, in create_webdriver
    options.add_argument("--headless")
UnboundLocalError: local variable 'options' referenced before assignment

When I wait for the cron schedule to fire, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/app/superset/app.py", line 114, in __call__
    return task_base.__call__(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/superset/tasks/schedules.py", line 432, in schedule_hourly
    schedule_window(ScheduleType.dashboard.value, start_at, stop_at, resolution)
  File "/app/superset/tasks/schedules.py", line 414, in schedule_window
    schedule.crontab, start_at, stop_at, resolution=resolution
  File "/app/superset/tasks/schedules.py", line 392, in next_schedules
    if eta - previous < timedelta(seconds=resolution):
TypeError: unsupported type for timedelta seconds component: str
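
The second traceback can be reproduced outside Superset. The sketch below assumes the resolution reaches `timedelta` as text because it was read from an environment variable; `resolution_from_env` and `as_timedelta` are hypothetical names used only for illustration, not Superset code.

```python
from datetime import timedelta

# Assumption: the value arrives as a string, mirroring
# EMAIL_REPORTS_CRON_RESOLUTION=15 read from the environment.
resolution_from_env = "15"


def as_timedelta(seconds):
    """Build a timedelta, converting a numeric string to int first."""
    return timedelta(seconds=int(seconds))


# Passing the raw string raises the same kind of error as in the traceback.
try:
    timedelta(seconds=resolution_from_env)
    raised = False
except TypeError:
    raised = True

# Converting first gives a usable interval.
window = as_timedelta(resolution_from_env)
```

If this is what happens here, casting the setting to `int` in the configuration file should be enough to avoid the `TypeError`.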

Screenshots

[Screenshot from 2020-04-09 23-32-57]

[Screenshot from 2020-04-09 23-26-24]

This error appears when I send the test email:
[image]

This error appears when I wait for the cron schedule to fire:
[Screenshot from 2020-04-10 01-04-51]

How to reproduce the bug

I used the following Dockerfile (the same as the original, but with the web drivers added and ports 8088 and 5555 exposed, the latter for Flower).

######################################################################
# PY stage that simply does a pip install on our requirements
######################################################################
ARG PY_VER=3.6.9
FROM python:${PY_VER} AS superset-py

RUN mkdir /app \
        && apt-get update -y \
        && apt-get install -y --no-install-recommends \
            build-essential \
            default-libmysqlclient-dev \
            libpq-dev \
            libgtk-3-0 xvfb firefox-esr \
        && rm -rf /var/lib/apt/lists/*
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz
RUN tar -x geckodriver -zf geckodriver-v0.24.0-linux64.tar.gz -O > /usr/bin/geckodriver

RUN chmod +x /usr/bin/geckodriver
RUN rm geckodriver-v0.24.0-linux64.tar.gz
RUN wget -q "https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip" -O /tmp/chromedriver.zip \
    && unzip /tmp/chromedriver.zip -d /usr/bin/ \
    && rm /tmp/chromedriver.zip

# First, we just wanna install requirements, which will allow us to utilize the cache
# in order to only build if and only if requirements change
COPY ./requirements.txt /app/
RUN cd /app \
        && pip install --no-cache -r requirements.txt

######################################################################
# Node stage to deal with static asset construction
######################################################################
FROM node:10-jessie AS superset-node

ARG NPM_BUILD_CMD="build"
ENV BUILD_CMD=${NPM_BUILD_CMD}

# NPM ci first, as to NOT invalidate previous steps except for when package.json changes
RUN mkdir -p /app/superset-frontend
RUN mkdir -p /app/superset/assets
COPY ./docker/frontend-mem-nag.sh /
COPY ./superset-frontend/package* /app/superset-frontend/
RUN /frontend-mem-nag.sh \
        && cd /app/superset-frontend \
        && npm ci

# Next, copy in the rest and let webpack do its thing
COPY ./superset-frontend /app/superset-frontend
# This is BY FAR the most expensive step (thanks Terser!)
RUN cd /app/superset-frontend \
        && npm run ${BUILD_CMD} \
        && rm -rf node_modules

######################################################################
# Final lean image...
######################################################################
ARG PY_VER=3.6.9
FROM python:${PY_VER} AS lean

ENV LANG=C.UTF-8 \
    LC_ALL=C.UTF-8 \
    FLASK_ENV=production \
    FLASK_APP="superset.app:create_app()" \
    PYTHONPATH="/app/pythonpath" \
    SUPERSET_HOME="/app/superset_home" \
    SUPERSET_PORT=8088

RUN useradd --user-group --no-create-home --no-log-init --shell /bin/bash superset \
        && mkdir -p ${SUPERSET_HOME} ${PYTHONPATH} \
        && apt-get update -y \
        && apt-get install -y --no-install-recommends \
            build-essential \
            default-libmysqlclient-dev \
            libpq-dev \
            libgtk-3-0 xvfb firefox-esr \
        && rm -rf /var/lib/apt/lists/*
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz
RUN tar -x geckodriver -zf geckodriver-v0.24.0-linux64.tar.gz -O > /usr/bin/geckodriver

RUN chmod +x /usr/bin/geckodriver
RUN rm geckodriver-v0.24.0-linux64.tar.gz
RUN wget -q "https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip" -O /tmp/chromedriver.zip \
    && unzip /tmp/chromedriver.zip -d /usr/bin/ \
    && rm /tmp/chromedriver.zip

COPY --from=superset-py /usr/local/lib/python3.6/site-packages/ /usr/local/lib/python3.6/site-packages/
# Copying site-packages doesn't move the CLIs, so let's copy them one by one
COPY --from=superset-py /usr/local/bin/gunicorn /usr/local/bin/celery /usr/local/bin/flask /usr/bin/
COPY --from=superset-node /app/superset/static/assets /app/superset/static/assets
COPY --from=superset-node /app/superset-frontend /app/superset-frontend

## Lastly, let's install superset itself
COPY superset /app/superset
COPY setup.py MANIFEST.in README.md /app/
RUN cd /app \
        && chown -R superset:superset * \
        && pip install -e .
RUN Xvfb :10 -ac &
RUN export DISPLAY=:10

COPY ./docker/docker-entrypoint.sh /usr/bin/

WORKDIR /app

USER superset


I used the following configuration file (the same as the original, with the scheduled email configuration added):


import logging
import os

from werkzeug.contrib.cache import FileSystemCache
from celery.schedules import crontab

logger = logging.getLogger()

def get_env_variable(var_name, default=None):
    """Get the environment variable or raise exception."""
    try:
        return os.environ[var_name]
    except KeyError:
        if default is not None:
            return default
        else:
            error_msg = "The environment variable {} was missing, abort...".format(
                var_name
            )
            raise EnvironmentError(error_msg)


DATABASE_DIALECT = get_env_variable("DATABASE_DIALECT")
DATABASE_USER = get_env_variable("DATABASE_USER")
DATABASE_PASSWORD = get_env_variable("DATABASE_PASSWORD")
DATABASE_HOST = get_env_variable("DATABASE_HOST")
DATABASE_PORT = get_env_variable("DATABASE_PORT")
DATABASE_DB = get_env_variable("DATABASE_DB")

# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = "%s://%s:%s@%s:%s/%s" % (
    DATABASE_DIALECT,
    DATABASE_USER,
    DATABASE_PASSWORD,
    DATABASE_HOST,
    DATABASE_PORT,
    DATABASE_DB,
)

REDIS_HOST = get_env_variable("REDIS_HOST")
REDIS_PORT = get_env_variable("REDIS_PORT")

RESULTS_BACKEND = FileSystemCache('/app/superset_home/sqllab')

class CeleryConfig(object):
    BROKER_URL = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = "redis://%s:%s/1" % (REDIS_HOST, REDIS_PORT)
    CELERY_TASK_PROTOCOL = 1
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }


CELERY_CONFIG = CeleryConfig

# email configurations
ENABLE_SCHEDULED_EMAIL_REPORTS = get_env_variable("ENABLE_SCHEDULED_EMAIL_REPORTS")
EMAIL_NOTIFICATIONS = get_env_variable("EMAIL_NOTIFICATIONS")

# smtp server configuration
SMTP_HOST = get_env_variable("SMTP_HOST")
SMTP_STARTTLS = get_env_variable("SMTP_STARTTLS")
SMTP_SSL = get_env_variable("SMTP_SSL")
SMTP_USER = get_env_variable("SMTP_USER")
SMTP_PORT = get_env_variable("SMTP_PORT")
SMTP_PASSWORD = get_env_variable("SMTP_PASSWORD")
SMTP_MAIL_FROM = get_env_variable("SMTP_MAIL_FROM")

# Email reports - minimum time resolution (in minutes) for the crontab
EMAIL_REPORTS_CRON_RESOLUTION = get_env_variable("EMAIL_REPORTS_CRON_RESOLUTION")

# Email report configuration
# From address in emails
EMAIL_REPORT_FROM_ADDRESS = get_env_variable("EMAIL_REPORT_FROM_ADDRESS")

# Send bcc of all reports to this address. Set to None to disable.
# This is useful for maintaining an audit trail of all email deliveries.
EMAIL_REPORT_BCC_ADDRESS = get_env_variable("EMAIL_REPORT_BCC_ADDRESS")

# User credentials to use for generating reports
# This user should have permissions to browse all the dashboards and
# slices.
# TODO: In the future, login as the owner of the item to generate reports
EMAIL_REPORTS_USER = get_env_variable("EMAIL_REPORTS_USER")
EMAIL_REPORTS_SUBJECT_PREFIX = get_env_variable("EMAIL_REPORTS_SUBJECT_PREFIX")
EMAIL_REPORTS_WEBDRIVER = get_env_variable("EMAIL_REPORTS_WEBDRIVER")

# The base URL to query for accessing the user interface
WEBDRIVER_BASEURL = get_env_variable("WEBDRIVER_BASEURL")

#
# Optionally import superset_config_docker.py (which will have been included on
# the PYTHONPATH) in order to allow for local settings to be overridden
#
try:
    from superset_config_docker import *  # noqa
    import superset_config_docker

    logger.info(
        f"Loaded your Docker configuration at " f"[{superset_config_docker.__file__}]"
    )
except ImportError:
    logger.info("Using default Docker config...")
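
Every setting above that goes through `get_env_variable` ends up as a `str`, including the flags and the numeric ones. A minimal sketch of typed helpers follows; `get_env_bool` and `get_env_int` are hypothetical names, not part of Superset, and the set of accepted truthy spellings is an assumption.

```python
import os


def get_env_bool(var_name, default=False):
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.environ.get(var_name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; anything else is false.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def get_env_int(var_name, default=None):
    """Read an integer from the environment (hypothetical helper)."""
    raw = os.environ.get(var_name)
    if raw is None:
        if default is None:
            raise EnvironmentError(
                "The environment variable {} was missing".format(var_name)
            )
        return default
    return int(raw)
```

The distinction matters for flags: `bool("False")` is `True` in Python, so a setting such as `SMTP_STARTTLS=False` read as plain text would be treated as enabled by any code that only checks truthiness.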

The .env file

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
COMPOSE_PROJECT_NAME=superset

# database configurations (do not modify)
DATABASE_DB=<SOME_DB>
DATABASE_HOST=<SOME_HOST>
DATABASE_PASSWORD=<SOME_PASSWORD>
DATABASE_USER=<SOME_USER>

# database engine specific environment variables
# change the below if you prefer another database engine
DATABASE_PORT=5432
DATABASE_DIALECT=postgresql+psycopg2
POSTGRES_DB=<SOME_DB>
POSTGRES_USER=<SOME_USER>
POSTGRES_PASSWORD=<SOME_PASSWORD>

# Add the mapped in /app/pythonpath_docker which allows devs to override stuff
PYTHONPATH=/app/pythonpath:/app/pythonpath_docker
REDIS_HOST=redis
REDIS_PORT=6379

FLASK_ENV=production
SUPERSET_ENV=production
SUPERSET_LOAD_EXAMPLES=yes

# Define the secret key
SECRET_KEY=<SOME_KEY>

# Email configurations
ENABLE_SCHEDULED_EMAIL_REPORTS=True
EMAIL_NOTIFICATIONS=True

# smtp server configuration
SMTP_HOST="smtp.gmail.com"
SMTP_STARTTLS=False
SMTP_SSL=True
SMTP_USER="notifications"
SMTP_PORT=587
SMTP_PASSWORD="<SOME_PASSWORD>"
SMTP_MAIL_FROM="<SOME_EMAIL>"

# Email reports - minimum time resolution (in minutes) for the crontab
EMAIL_REPORTS_CRON_RESOLUTION=15

# Email report configuration
# From address in emails
EMAIL_REPORT_FROM_ADDRESS="<SOME_EMAIL>"

# Send bcc of all reports to this address. Set to None to disable.
# This is useful for maintaining an audit trail of all email deliveries.
EMAIL_REPORT_BCC_ADDRESS="<SOME_EMAIL>"

# User credentials to use for generating reports
# This user should have permissions to browse all the dashboards and
# slices.
# TODO: In the future, login as the owner of the item to generate reports
EMAIL_REPORTS_USER="<SOME_ADMIN_USER>"
EMAIL_REPORTS_SUBJECT_PREFIX="[Superset Report] "
EMAIL_REPORTS_WEBDRIVER="firefox"

# Window size - this will impact the rendering of the data
WEBDRIVER_WINDOW={"dashboard": (1600, 2000), "slice": (3000, 1200)}

# The base URL to query for accessing the user interface
WEBDRIVER_BASEURL="http://0.0.0.0:8088/"

I tried both web drivers (firefox and chrome) and got the same error with both.

Finally, below is my docker-compose file (similar to the original, but with an extra worker, a Celery beat service for email, and Flower added):

x-superset-build: &superset-build
  args:
    NPM_BUILD_CMD: build-dev
  context: ./
  dockerfile: Dockerfile
  target: dev
x-superset-depends-on: &superset-depends-on
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker/docker-init.sh:/app/docker-init.sh
  - ./docker/pythonpath_dev:/app/pythonpath
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend
  - superset_home:/app/superset_home

version: "3.7"
services:
  redis:
    image: redis:3.2
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - redis:/data

  superset:
    build: *superset-build
    env_file: docker/.env
    restart: unless-stopped
    ports:
      - 8088:8088
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-init:
    build: *superset-build
    command: ["/app/docker-init.sh"]
    env_file: docker/.env
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-node:
    image: node:10-jessie
    command: ["bash", "-c", "cd /app/superset-frontend && npm install --global webpack webpack-cli && npm install && npm run dev"]
    env_file: docker/.env
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-worker-1:
    build: *superset-build
    command: ["celery", "worker", "--pool=prefork", "--app=superset.tasks.celery_app:app", "-Ofair" , "-c 4"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-worker-2:
    build: *superset-build
    command: ["celery", "worker", "--pool=prefork", "--app=superset.tasks.celery_app:app", "-Ofair" , "-c 4"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-beat:
    build: *superset-build
    command: ["celery", "beat", "--app=superset.tasks.celery_app:app"]
    user: root
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-flower:
    build: *superset-build
    command: ["celery", "flower", "--app=superset.tasks.celery_app:app", "-Ofair"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    ports:
      - 5555:5555
    volumes: *superset-volumes


volumes:
  superset_home:
    external: false
  redis:
    external: false

Environment

All versions are exactly what is in master.

Checklist

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

P.S. I think the problem might be in the email report configuration, but I have been trying to solve it for almost two days. I thought it might be the web drivers, but the code does not reach that point: the UnboundLocalError occurs when the code tries to set the webdriver options to headless.

I also changed the Dockerfile to install Chrome as in miteshchavda's reply here and switched the driver to chrome, but I still got the same error.

Thanks in advance

@sweileh
Author

sweileh commented Apr 25, 2020

I'm closing this since I got no response.

@caldweln

@sweileh were you able to resolve this in the end? I'm experiencing this issue too.

@bkyryliuk
Member

After reading the code: https://github.com/apache/incubator-superset/blob/master/superset/tasks/schedules.py#L170

It looks like EMAIL_REPORTS_WEBDRIVER is not set to chrome or firefox, and that is causing the exception.
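
The failure mode described here can be sketched in isolation. This is not Superset's code; `pick_options` and `pick_options_strict` are hypothetical stand-ins for a factory that assigns a variable only inside `if`/`elif` branches, and the dictionaries stand in for real webdriver option objects.

```python
def pick_options(driver_name):
    """Hypothetical factory that only handles two driver names."""
    if driver_name == "firefox":
        options = {"browser": "firefox"}
    elif driver_name == "chrome":
        options = {"browser": "chrome"}
    # No else branch: any other value leaves `options` unassigned,
    # so the next line raises UnboundLocalError.
    options["headless"] = True
    return options


def pick_options_strict(driver_name):
    """Same idea, but unknown names fail with an explicit message."""
    if driver_name == "firefox":
        options = {"browser": "firefox"}
    elif driver_name == "chrome":
        options = {"browser": "chrome"}
    else:
        raise ValueError("Unsupported webdriver: {!r}".format(driver_name))
    options["headless"] = True
    return options
```

With the strict variant, the error message prints the exact value received (via `repr`), which makes stray whitespace or quote characters in the setting visible immediately.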

@sweileh
Author

sweileh commented Jun 23, 2020

Sorry for the late response.
No, I did not fix this issue. I'm planning to handle it externally since I need the email to be sent based on a trigger.
@bkyryliuk, I did set the driver to firefox, as shown above:
EMAIL_REPORTS_WEBDRIVER="firefox"
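
One possibility worth ruling out, offered as an assumption rather than a confirmed diagnosis: some env-file loaders pass the quotation marks through as part of the value, in which case the setting would be the nine-character string `"firefox"` and would not compare equal to `firefox`. The sketch below shows the mismatch and a defensive normalization; `strip_matching_quotes` is a hypothetical helper.

```python
def strip_matching_quotes(value):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


# Assumption: this is what the process sees if quotes are kept literally.
raw_value = '"firefox"'

matches_before = raw_value == "firefox"
matches_after = strip_matching_quotes(raw_value) == "firefox"
```

Printing `repr()` of the setting inside the worker container would show whether the quotes are actually present; removing the quotes from the .env file is the simpler fix if they are.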

@Hemanthdev

I am also facing an email issue with the Superset SMTP configuration.
