Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include timestamps in the mdsip "errors" log file #2683

Open
mwinkel-dev opened this issue Jan 10, 2024 · 11 comments
Open

Include timestamps in the mdsip "errors" log file #2683

mwinkel-dev opened this issue Jan 10, 2024 · 11 comments
Labels
feature Indicates new feature requests or implementations tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server) US Priority

Comments

@mwinkel-dev
Copy link
Contributor

Affiliation
MIT PSFC

Description
Mdsip writes to two log files: access and errors. (These log files are typically located in /var/log/mdsplus/mdsipd.) The access file has timestamps for each log record, however the errors file does not. Having timestamps on the errors log records can be useful for troubleshooting some issues (e.g., thousands of mdsip connections in a day resulting in huge log files).

Example
See the timestamps in the access log file.

Additional context
Might be good to enhance both log files so they can easily be parsed by various tools (Python, spreadsheets, log file analysis tools. and so forth).

@mwinkel-dev mwinkel-dev added feature Indicates new feature requests or implementations tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server) labels Jan 10, 2024
@mwinkel-dev
Copy link
Contributor Author

mwinkel-dev commented Feb 12, 2024

@joshStillerman also suggested including the session id for the mdsip connection.

@ModestMC
Copy link

ModestMC commented Feb 13, 2024

Resurfacing this for visibility #2619 . I know it's different functionality, but we also would benefit from improved logging and can weigh in on what we encounter (eg. the ability to attribute who did what that broke something via say the PID for an error and the user whose access is associated with that PID).

@sflanagan
Copy link

Can we also add the username that the mdsip session belongs to into the .errors log?

Trying to associate an error message with the user that produced it is currently quite difficult.

@mwinkel-dev
Copy link
Contributor Author

mwinkel-dev commented Feb 14, 2024

Hi @sflanagan and @ModestMC,

Adding the username and PID are also a great suggestions!

I haven't looked at the code yet, so don't know what is possible.

But I do know that analyzing the existing access and errors logs don't contain enough information to make it easy to troubleshoot mdsip issues. (In recent months, I have spent a lot of time digging through the logs to investigate errors; it is presently a difficult task.)

Let me know if you want this upgraded to a US Priority feature.

Thanks,
-MarkW

@ModestMC
Copy link

@mwinkel-dev I can't speak for Sean offhand, but I do think this could potentially be very useful.

My thought (if you're open to it) is that before we try to decide whether this is something that should be US Priority, it'd be good to know how much work it would be to pull off. Would you be willing to spend 15-30 minutes looking into the underlying code that writes these logs and trying to evaluate how much work it'd be to implement? We know you guys have plenty of things to do. If it turns out this gets ugly, we call it a wash or find a compromise, but if you think it's not to bad I think we could sign on to it.

Cheers,
Mitchell

@mwinkel-dev
Copy link
Contributor Author

Hi @ModestMC,

I glanced at the logging code in November 2024 while researching Issue #2649. At that time, I noticed that the access log file has timestamps, but that the errors log file does not.

Thus, my hunch (perhaps wrong) is that upgrading errors to have timestamps will be straightforward. Just a matter of copying the existing timestamp code from the procedures that write to the access log.

However, adding the other desired attributes (process ID, user, mdsip session ID) might be more difficult. Have not investigated that, so don't know which attributes will be easy to include in the error messages.

-MarkW

@ModestMC
Copy link

Hey @mwinkel-dev,

We agree that it will be some work to incorporate the other desired attributes (process ID, user, session ID, etc) in the log file; is there any reason you couldn't include timestamps at least in the interim? Perfect is the enemy of good.

Thank you for putting in the work that you do.

-Mitchell

@mwinkel-dev
Copy link
Contributor Author

mwinkel-dev commented Apr 25, 2024

Hi @ModestMC,

Thanks for the recent post indicating that GA wants a fix for this issue. I have therefore upgraded this to "US Priority". Work on this issue will start in a day or two.

Because this involves the mdsip service, presumably this fix will also be needed in the ga_atlas branch.

After a preliminary investigation is done, I will update this GitHub issue with two estimates:

  1. days needed to add just a timestamp to the errors log file, and
  2. days needed to add other information (connection ID, process ID, user) to both the access and the errors log file.

Note that the complete fix will probably involve two PRs, one for each option listed above. GA can then decide if it makes sense to wait for both PRs so they can be included in a single release of ga_atlas. Or alternatively, to cherry-pick the option 1) PR immediately to ga_atlas, and defer the option 2) PR until a later release of ga_atlas.

@ModestMC
Copy link

Hey @mwinkel-dev,

I wouldn't go so far as to say this fix is 'needed'; the hope is that just adding timestamps is lowhanging fruit that would be easy to incorporate without breaking things.

If it turns out this will be time consuming, it should not delay the ga_atlas fixes for the dispatcher issues. I believe Sean is planning to update Atlas tomorrow, so my guess is that it'll be food for the next update.

It might be good to make a separate issue for major log improvements like connection/process ID and user, as I think they're an extension of this issue's scope (make no mistake, we still think they're very, very good ideas).

@ModestMC
Copy link

Something we wondered today; is it possible to have the logs include which user accessed a particular tree? Would also be nice to toggle such an option on and off

@joshStillerman
Copy link
Contributor

That is an interesting idea. If we go to using the system logs and journalctl they will be organized by connection / service. Regardless maybe a tool could be written to filter the agrigate of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature requests or implementations tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server) US Priority
Projects
None yet
Development

No branches or pull requests

4 participants