Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull request for issue #28 #29

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mdiep-cese
Copy link
Contributor

This ticket closes issue #28

The getUUID function is changed so that it can accommodates logs where the field clientTime are not present.

Copy link
Contributor

@EandrewJones EandrewJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed fixes the problem. However, it's unclear why we need to construct an id from the individual field values in the log.

Questions:

  • How is this function used elsewhere in the codebase?
  • Do we want/need to be able to parse the id to retrieve those values in some way?
  • Do we want our ids to have a partial-logical ordering?

If the only requirement is to generate a uuid, then why not just use str(uuid.uuid4()) and call it a day? We'll never get a collision.

Tagging @Jyyjy or @amirmghaemi


# Add timestmap from clientTime if exists, otherwise use endTime information
if log["logType"] == "interval":
UUID = UUID + str(log["endTime"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use incremental syntax here:
UUID += str(log["endTime"])

if log["logType"] == "interval":
UUID = UUID + str(log["endTime"])
else:
UUID = UUID + str(log["clientTime"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

UUID = UUID + str(log["clientTime"])

# Add logtype information
UUID = UUID + str(log["logType"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too

# Add logtype information
UUID = UUID + str(log["logType"])

# Add type information, if it exists
if "type" in log:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to differentiate between two logs with the same session ID and same client/endTime but of different types?

@Jyyjy
Copy link
Contributor

Jyyjy commented Feb 13, 2024

This indeed fixes the problem. However, it's unclear why we need to construct an id from the individual field values in the log.

Questions:

* How is this function used elsewhere in the codebase?

* Do we want/need to be able to parse the id to retrieve those values in some way?

* Do we want our ids to have a partial-logical ordering?

If the only requirement is to generate a uuid, then why not just use str(uuid.uuid4()) and call it a day? We'll never get a collision.

Tagging @Jyyjy or @amirmghaemi

As the package is written, getUUID has to return the same value when the same log is passed in. str(uuid.uuid4()) will create a different uuid when the same log is passed. hash() might be a better option.

  1. getUUID isn't really used within the distill package. Users are expected to use it to create a dictionary mapping UUID to logs. Then that dictionary is whats passed to the segmentation functions. This is one of the biggest pains of working with distill, you have to manage the UUID's and dictionary of logs yourself.
  2. No, all that info is in the log, which the UUID (assuming the user set things up correctly) maps to.
  3. Not sure exactly what you mean. One of the assumptions of the segmentation functions is that the user sorts the log dictionary by clienttime.

Also, the reason @mdiep-cese ran into this issue is that interval logs have some inconsistencies in userale, and we have historically filtered out all interval logs. I'm not sure about the details, but that's been josh's guidance. This may be the relevant ticket. But my point is that nothing in this package is built to deal with interval logs.

Distill is even less mature than userale. The upside is that we can change things a lot without really affecting anyone. I'm team fresh rewrite.

Edit: found an old PR which sparked a discussion about this last year
UMD-ARLIS#18

@EandrewJones EandrewJones mentioned this pull request Feb 13, 2024
@EandrewJones
Copy link
Contributor

Let's move this discussion to a dedicated issue.

@mdiep-cese You can just make the minor changes for now and this should be good to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants