Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch #5177

Merged
merged 1 commit into from
May 14, 2019

Conversation

cong-zhu
Copy link
Contributor

@cong-zhu cong-zhu commented Apr 24, 2019

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    • Fix bug downloading incomplete logs from ElasticSearch
    • Use streaming logs download for memory efficiency

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • Use existing unit tests as this is not a new feature

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@pingzh

@cong-zhu cong-zhu force-pushed the congzhu-fix-es-download-logs branch 2 times, most recently from 9050587 to 288437f Compare April 25, 2019 06:31
@codecov-io
Copy link

codecov-io commented Apr 25, 2019

Codecov Report

Merging #5177 into master will increase coverage by <.01%.
The diff coverage is 87.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5177      +/-   ##
==========================================
+ Coverage   78.69%   78.69%   +<.01%     
==========================================
  Files         472      472              
  Lines       30082    30103      +21     
==========================================
+ Hits        23673    23691      +18     
- Misses       6409     6412       +3
Impacted Files Coverage Δ
airflow/utils/log/es_task_handler.py 97.36% <80%> (-2.64%) ⬇️
airflow/www/views.py 76.19% <90.47%> (+0.11%) ⬆️
airflow/models/taskinstance.py 92.59% <0%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 118492e...3f5c600. Read the comment docs.

@cong-zhu cong-zhu force-pushed the congzhu-fix-es-download-logs branch from 288437f to dc2d03c Compare April 25, 2019 17:24
@feng-tao
Copy link
Member

cc @KevinYang21

@KevinYang21
Copy link
Member

KevinYang21 commented Apr 28, 2019

@cong-zhu Do we have unit test for this feature before? If so we'll still need to update it so we capture the use case which this PR is fixing, if not we might want to add that test. Also the CI is failing now, if you're still working on it then you can mark the PR as a WIP PR by putting a [WIP] into the PR title.

@cong-zhu cong-zhu changed the title [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch [AIRFLOW-4084][WIP] Fix bug downloading incomplete logs from ElasticSearch May 1, 2019
@cong-zhu
Copy link
Contributor Author

cong-zhu commented May 1, 2019

@KevinYang21 Yes, we have unit test for this log download feature, though it's not testing against ElasticSearch. CI failed because of python version. I marked this PR as WIP.

@cong-zhu cong-zhu force-pushed the congzhu-fix-es-download-logs branch 2 times, most recently from 73d38a6 to d13d81c Compare May 2, 2019 21:38
@cong-zhu
Copy link
Contributor Author

cong-zhu commented May 2, 2019

@KevinYang21 PTAL, I fixed CI failure and added unit tests.

@cong-zhu cong-zhu changed the title [AIRFLOW-4084][WIP] Fix bug downloading incomplete logs from ElasticSearch [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch May 13, 2019
try:
metadata['max_offset'] = s[max_log_line - 1].execute()[-1].offset if max_log_line > 0 else 0
except Exception as e:
msg = 'Could not get current log size with log_id: {}, ' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we can just pass e to exception and log the msg separately if you want to keep it


try:
if ti is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: if ti:

@pingzh
Copy link
Contributor

pingzh commented May 13, 2019

LGTM

@cong-zhu cong-zhu force-pushed the congzhu-fix-es-download-logs branch from d13d81c to 3f5c600 Compare May 13, 2019 23:30
@cong-zhu
Copy link
Contributor Author

@KevinYang21, I updated the way to handle exception. This PR has been verified on my local.

@KevinYang21 KevinYang21 merged commit 56a038a into apache:master May 14, 2019
JCoder01 pushed a commit to JCoder01/airflow that referenced this pull request May 14, 2019
ashb pushed a commit that referenced this pull request Jun 7, 2019
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
@mik-laj
Copy link
Member

mik-laj commented Nov 23, 2019

I wonder why this change had to make changes to the views.py file. Why were the Task Handler not updated? In my opinion, we should check the existence of the download_logs key in the implementation of the handler logic and then disable the pagination mechanism. Now the abstraction of code is running away and it is possible that we are breaking other handlers because they expected different behavior.

I am working on documentation for this class and if we revert this change we will be able to do it as follows. In my opinion this is the original behavior of this code.

class TaskHandler(logging.Handler, ABC):
    """
    Handler that allows you to write and read information about a specific task.
    """
    @abstractmethod
    def read(
        self, task_instance: TaskInstance, try_number: Optional[int] = None, metadata: Optional[Dict] = None
    ) -> Tuple[List[str], List[Dict]]:
        """
        Read logs of given task instance.

        It supports log pagination. To do this, the first call to this function contains an empty metadata
        object. As a result, list of logs and list of metadata should be returned. The resulting
        metadata should contain the key ``end_of_logs``, which determines whether pagination should be
        continued. It is possible to return more metadata objects, but only the first is used, so
        you should always return a list with one item.

        The remaining keys in the dictionary are sent back to the method without changes, which means that
        if you add an additional key with a token or page number, you can expect that the key will be
        available in the next request for logs.

        If the metadata in the call contains the ``download_logs'' key, then full logs should be
        returned without pagination.

        :param task_instance: task instance object
        :param try_number: task instance try_number to read logs from. If None
           it returns all logs separated by try_number
        :param metadata: log metadata,
            can be used for steaming log reading and auto-tailing.
        :return: a list of logs and list of metadata objects.
        """
        ...

    @abstractmethod
    def set_context(self, task_instance: TaskInstance) -> None:
        """
        Provide task_instance context to airflow task handler.

        Different implementations provide different behavior. Examples of behavior are:

        * in the case of handlers writing to a file, it may start writing to another file;
        * for remote services, it can start adding labels to logs.

        This allows us to later search for logs for a single task

        :param task_instance: task instance object
        """
        ...

I would like to point out that in this PR there is a duplicate mechanism of handling the case when try_numbers is empty, i.e. the log for all try numbers is downloaded.

Previously it was part of the file_task_handler handler, and now it has been copied a second time to another place despite the es_task_handler extended from this handler.
https://github.com/apache/airflow/blob/master/airflow/utils/log/file_task_handler.py#L155-L164
https://github.com/apache/airflow/blob/master/airflow/utils/log/es_task_handler.py#L36
https://github.com/apache/airflow/blob/master/airflow/www/views.py#L595-L599

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants