Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit writes to session file #243

Merged

Conversation

eliasyishak
Copy link
Contributor

@eliasyishak eliasyishak commented Feb 28, 2024

Fixes:

Related to:

This update makes it so that we only write to the session file only when the session id needs to be updated. Previously, we were writing with every event in order to keep track of when the last ping was so that we can keep the session alive or start a new one.

We now instead use the last modified timestamp for the session file as a stand in for when the last ping happened.

The file's format has also changed

// OLD
{"session_id": 123, "last_ping": 123}

// NEW
{"session_id": 123}

  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

@eliasyishak eliasyishak changed the title Limit read writes to session json Limit writes to session file Feb 28, 2024
Copy link

Package publishing

Package Version Status Publish tag (post-merge)
package:unified_analytics 5.8.6 ready to publish unified_analytics-v5.8.6
package:cli_config 0.1.2 already published at pub.dev
package:extension_discovery 2.0.1-wip WIP (no publish necessary)
package:graphs 2.3.2-wip WIP (no publish necessary)

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

'session_id': now.millisecondsSinceEpoch,
'last_ping': now.millisecondsSinceEpoch,
}));
sessionFile.writeAsStringSync('${now.millisecondsSinceEpoch}');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, when we discussed this yesterday, I was assuming this was the client ID, and we'd never need to re-write this. Since this is the session ID, this means we would still need to write to this when starting a new session, and another process could read from it in the middle of a write, do I have that correct?

If so, I have a few thoughts:

  1. using last modified timestamp still seems like an improvement, and we'd at least get less race conditions
  2. STILL using JSON is probably better, because then we would get a parse error vs. just getting the WRONG session_id
  3. let's just hard-code the json as a string, rather than invoking the overhead of jsonEncode
  4. maybe it's enough to catch the parse error, and then re-try 1 second later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all of these points. Removing the last ping key should be easy enough. Also with the refactor to prevent a huge amount of write operations, I think we may be able to fall back to attempting to paraeContents again?

If we see that this is still coming up in the error logs for the race condition, we can return to setting the sessionId back to the current timestamp. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be able to fall back to attempting to paraeContents again?

It would still be possible for the second parse to fail, right?

If we see that this is still coming up in the error logs for the race condition, we can return to setting the sessionId back to the current timestamp.

Not sure what you mean here. Are you proposing using something else other than the current timestamp for the sessionId?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be possible for the second parse to fail, right?

Yes, the parseContents method always has the possibility of failing.

Not sure what you mean here. Are you proposing using something else other than the current timestamp for the sessionId?

And what I meant here is monitoring the crash logs to see if we run into this race condition again... i would suspect that we would have a lot less instances of this error since we are now writing a lot less by checking File.lastModifiedSync method

Copy link
Contributor Author

@eliasyishak eliasyishak Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I think i found a better solution now in my latest commit

If we fail when we attempt Session.parseContents, we will now set the session id to be the current timestamp (within the catch blocks), and then we will also pass this timestamp to the file so that whatever we have set in the error handling blocks agrees with what is persisted. How does that sound?


Session({
required this.homeDirectory,
required this.fs,
required ErrorHandler errorHandler,
}) : sessionFile = fs.file(p.join(
homeDirectory.path, kDartToolDirectoryName, kSessionFileName)),
_sessionId = DateTime.now().millisecondsSinceEpoch,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it's "solving" the problem of LateInitializationError by assigning the variable the wrong value. Or am I misunderstanding how this works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, this value is getting set when calling the constructor and it will eventually get replaced when we run Session.initialize. This just makes it so that we don't have to make the variable late.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically not an invalid session id since later on in the code, we could be setting the this value in the file because we need a new session id if the old one had 30 mins of inactivity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could be setting the this value in the file because we need a new session id if the old one had 30 mins of inactivity

But if it has NOT been 30 minutes of inactivity, the correct session ID would be a different value, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct, we would have to decide if that is worth it to remove the late variable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lemme take a look at this later today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, no rush

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from discussion, maybe we should just make _sessionId nullable, and allow sending a null session id. this would also help when sending errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this makes sense with me, i also just confirmed that sending null for the session id is a valid json payload for GA4

// Initialize the session handler with the session_id and last_ping
// variables by parsing the json file
// Initialize the session handler with the session_id
// by parsing the json file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, I think we should refactor the error handling class so that it doesn't depend on session data AND remove the need for the log handler

This is because:

  1. Having session information for error reports is not necessary, this is just for maintainers of this package so we know what errors are coming up in the wild
  2. We can also remove the need for the log handler because there is no value in locally persisting the error event on the user's computer (this log file is used for the survey handler feature, we are not using error events for targeting users)

Copy link
Member

@christopherfujino christopherfujino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eliasyishak eliasyishak merged commit 24087e1 into dart-lang:main Mar 6, 2024
6 checks passed
@eliasyishak eliasyishak deleted the limit-read-writes-to-session-json branch March 6, 2024 21:32
@eliasyishak eliasyishak restored the limit-read-writes-to-session-json branch March 6, 2024 21:32
@eliasyishak eliasyishak deleted the limit-read-writes-to-session-json branch March 6, 2024 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants