-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log #3992
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3992 +/- ##
=========================================
+ Coverage 73.54% 74.04% +0.5%
=========================================
Files 421 421
Lines 27724 27790 +66
=========================================
+ Hits 20389 20577 +188
+ Misses 7335 7213 -122
Continue to review full report at Codecov.
|
@phani8996 can you post some gifs on this feature in work? |
@msumit I am not able to upload gif. I am attaching images for reference. |
@msumit This isn't the equivalent of |
airflow/bin/cli.py
Outdated
lines = fl.readlines() | ||
fl.close() | ||
out = "".join(l for l in lines[-num_lines:]) | ||
line = "***** Showing only last {num_lines} lines from {filename} *****" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messages like this don't belong in here - that is something that the FileTaskHandler etc should add.
airflow/www/views.py
Outdated
num_lines = conf.getint('webserver', 'num_lines') | ||
metadata["num_lines"] = num_lines | ||
if num_lines in tail_lines_list: | ||
tail_lines_list.remove(num_lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this conditional for? I can't work it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's to avoid duplicates. lets say user have added in config that dropdown should contain entries 100,200,500,1000 and user picked 100 as default number of lines to tail when user lands on get_logs page. Without this condition airflow ending showing 2 entries for 100 in dropdown. This will ensure no duplicates in such case
airflow/www/views.py
Outdated
tailing_required = conf.getboolean('webserver', 'tail_logs') | ||
if tailing_required and conf.has_option('webserver', 'num_lines'): | ||
num_lines = conf.getint('webserver', 'num_lines') | ||
metadata["num_lines"] = num_lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find a way of re-writing this. The nested conditions and checks make it hard to see what the intent here is. Perhaps some comment saying when/why we do things would help me understand this code.
I'm also notsure what the webserver -> tail_logs setting is for. Moving that to default config means you can add a comment to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am essentially using three fields for this feature. Boolean flag tail_logs
in webserver
to turn on and off this feature. tail_lines_list
in webserver
to show default entries in dropdown and finally num_lines
in webserver
to pick default number of lines to tail when user lands on get_logs
page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant comments have been added. @ashb please review.
That's true sumit. This feature doesn't tail logs in fly mode. I am planning to add that in a separate PR as this brings a lot of changes to current log viewing mechanism and this maybe unstable and experimental. |
except Exception as e: | ||
log = "*** Failed to load local log file: {}\n".format(location) | ||
log += "*** {}\n".format(str(e)) | ||
if num_lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of duplicate logic in this one with the tail_logs
function.
# Default number of lines to tail when page loads | ||
num_lines = 100 | ||
# Entries to be shown in dropdown | ||
tail_lines_list = 50,100,200,500,1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these options, since they are also hardcoded in Javascript.
# Flag to control tailing logs | ||
tail_logs = True | ||
# Default number of lines to tail when page loads | ||
num_lines = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe up this a bit? A proper Java stacktrace can be bigger than 100 lines.
airflow/bin/cli.py
Outdated
logpath = "{logdir}/{filename}".format(logdir=logdir, filename=filename) | ||
logsize = os.path.getsize(logpath) | ||
if logsize >= 100 * 1024 * 1024: | ||
p1 = subprocess.Popen(["tail", "-n " + str(num_lines), filename], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a subprocess for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For bigger log files reading whole file and returning last few lines is impractical. Based on file size we will rendering by tail or file reading in python
Tests seems to be failing at places unrelated to my code changes. Do anyone has any idea what is causing and how to fix it? |
@phani8996 Can you rebase onto master? To make sure that you're up to date. |
It is upto date with master. |
@phani8996 can you rebase? |
…into tail_logs
@ashb Can you help out with travis. I don't understand why 2 test runs are failing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feedback based on reading the python code.
I haven't looked at the JS yet, and I'd like to look at it in a browser before we merge this.
airflow/utils/helpers.py
Outdated
if current_position == 0: | ||
return None | ||
fl.seek(-num_bytes, 1) | ||
data = str(fl.read(num_bytes), 'utf-8', 'ignore') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this we should probably open the file in text mode in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening file in text mode won't allow seeking wrt current position or end position. Text mode allows seeking only wrt file starting.
@@ -317,3 +317,45 @@ def render_log_filename(ti, try_number, filename_template): | |||
task_id=ti.task_id, | |||
execution_date=ti.execution_date.isoformat(), | |||
try_number=try_number) | |||
|
|||
|
|||
def tail_file(filepath, lines): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see a test of this function please, at least covering one happy/working case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Please check
@@ -334,7 +334,7 @@ def get_byte_range_file(fl, num_bytes): | |||
if current_position == 0: | |||
return None | |||
fl.seek(-num_bytes, os.SEEK_CUR) | |||
data = str(fl.read(num_bytes), 'utf-8', 'ignore') | |||
data = fl.read(num_bytes).decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for Python2.7 support. It has been removed from the master branch
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Dear Airflow Maintainers,
This is a change in the way we render logs. This PR will give a dropdown of number of lines to be tailed with a refresh button. On refresh webserver fetch selected number of lines from local worker logs or remote worker logs. On can toggle this feature using config variable tail_logs in webserver, default number of lines to tail via config variable num_lines in webserver and items in dropdown can be controlled via config variable tail_lines_list in webserver in csv format
Please accept this PR that addresses the following issues:
https://issues.apache.org/jira/browse/AIRFLOW-620
Testing Done:
Manually tested on all major browsers.
This will add a dropdown of number of lines to be tailed and refresh button to get latest logs instead of refreshing the whole page.