Usage Tracking and logging for Parsl project reporting #34

Closed
yadudoc opened this Issue Oct 4, 2017 · 7 comments

@yadudoc
Contributor

yadudoc commented Oct 4, 2017

For reporting purposes we need to capture the following information

  • Hash of Username, hostname (ip)
  • Hash of Parsl script (if possible)
  • Count of apps launched by DFK
  • Site IDs used
  • Total run time

A UDP packet with this info could be sent out from the Parsl script and this could be captured and logged on a server hosted on AWS.
Here are the components we'll need

  • Log info on the client side and send a UDP packet
  • Receive and log info to stable storage on our side.
  • Config option to opt-out of tracking
  • Documentation to clearly indicate the info collected and how to opt-out
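
A rough sketch of the client side and the receiving side described above, using only the Python standard library. Everything here is an assumption for illustration: the function names, the JSON payload, the field names, and the choice of SHA-256 are hypothetical and are not taken from the Parsl code base.

```python
import hashlib
import json
import socket


def anonymized_id(username, hostname):
    """Hash username and hostname together so neither is sent in the clear."""
    return hashlib.sha256(f"{username}@{hostname}".encode("utf-8")).hexdigest()


def build_report(username, hostname, app_count, site_ids, run_time_s):
    """Collect the fields listed in the issue into one dict (field names are hypothetical)."""
    return {
        "uid": anonymized_id(username, hostname),
        "app_count": app_count,
        "site_ids": sorted(site_ids),
        "run_time_s": run_time_s,
    }


def send_report(report, address, enabled=True):
    """Fire-and-forget UDP send. Returns bytes sent, or 0 if opted out or the send failed."""
    if not enabled:  # opt-out: nothing leaves the machine
        return 0
    payload = json.dumps(report).encode("utf-8")
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            return sock.sendto(payload, address)
    except OSError:
        # Tracking must never break the user's workflow.
        return 0


def receive_one(sock):
    """Server side: read one datagram from a bound UDP socket and decode it."""
    data, _sender = sock.recvfrom(4096)
    return json.loads(data.decode("utf-8"))
```

UDP gives no delivery guarantee, so the server should expect some reports to be lost; that seems acceptable for aggregate project reporting, and it keeps the client from ever blocking on the tracker.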

@yadudoc yadudoc added this to the Parsl-0.3.0 milestone Oct 4, 2017

@yadudoc yadudoc self-assigned this Oct 4, 2017

@yadudoc yadudoc modified the milestones: Parsl-0.3.0, Parsl-0.4.0 Dec 7, 2017

@danielskatz danielskatz changed the title from Usage Tracking and logging to Usage Tracking and logging for Parsl project reporting Dec 12, 2017

@kylechard

Collaborator

kylechard commented Jan 11, 2018

At this stage we should focus on the bare minimum (and least invasive).

Some of these stats can be sent at the beginning while others are sent at the end. Sending at the end is slightly risky as there might be failures that would kill the script.

Beginning

  • hash of username+IP (to avoid the millions of 'ubuntu' users)
  • UDP packet will include sending IP
  • start time

End

  • number of apps launched by DFK
  • number of app failures
  • number of sites used
  • total compute hours
  • end time

In the future we could add the following

  • site ID (when we have a site registry)
  • hash of the script

Other things

  • Opt-out option in the config: "usage_tracking = False"
  • Information on the Parsl site about anonymous usage tracking with details about what we track
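
One way the beginning/end split above could look in code. This is only a sketch: the class name, method names, and field names are hypothetical, and the `run_id` used to match an end message to its start message is an added assumption that is not in the list above.

```python
import hashlib
import time
import uuid


class UsageTracker:
    """Builds the start and end messages; sending them is left to the caller."""

    def __init__(self, username, ip, usage_tracking=True):
        self.enabled = usage_tracking  # mirrors the proposed "usage_tracking = False" opt-out
        # Hash username+IP together so the many 'ubuntu' users stay distinct.
        self.uid = hashlib.sha256(f"{username}{ip}".encode("utf-8")).hexdigest()
        # Assumption: a random per-run ID lets the server pair start and end messages.
        self.run_id = str(uuid.uuid4())

    def start_message(self, now=None):
        if not self.enabled:
            return None
        return {
            "run_id": self.run_id,
            "uid": self.uid,
            "start": time.time() if now is None else now,
        }

    def end_message(self, apps_launched, app_failures, sites_used,
                    compute_hours, now=None):
        if not self.enabled:
            return None
        return {
            "run_id": self.run_id,
            "apps_launched": apps_launched,
            "app_failures": app_failures,
            "sites_used": sites_used,
            "compute_hours": compute_hours,
            "end": time.time() if now is None else now,
        }
```

Because the start message goes out first, a run that crashes before the end message is sent still shows up in the logs as a start with no matching end.
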
@annawoodard

Collaborator

annawoodard commented Jan 12, 2018

We might also consider a username-based filter for devs, which would automatically set a flag to lump testing stats separately from user stats. Presumably for project reporting we only want user stats, and it could be a pain to disentangle them post hoc. Alternatively we could have a 'testing mode' configuration flag, but that makes it more likely we forget to set it sometimes and inflate the user stats.

@kylechard

Collaborator

kylechard commented Jan 12, 2018

Agreed. We actually discussed this same topic this afternoon. The thought was to have a testing mode for our usage and for our automated tests.

@annawoodard

Collaborator

annawoodard commented Jan 12, 2018

Sounds good! Were you guys thinking a testing mode configuration flag, or an automated filter? Everyone makes mistakes, so if we let it be possible for us to forget, I think we will forget from time to time.

Filter pros:

  1. Eliminates per-workflow possibility dev forgets to set the flag

Filter cons:

  1. Introduces per-dev possibility we forget to add dev to the filter
  2. Possible collisions (user gets filtered as dev); probably unlikely for now
  3. Less flexible: there may be situations where devs should count as users
  4. More work for new people to contribute

OK, maybe I'm overthinking this, but maybe filter+config flag is the way to go. In other words, if the config flag is not set, the default would be to evaluate the filter (so devs would by default run in testing mode). But the configuration flag would override that so that a dev could run in user mode, or a user could run in testing mode. That would eliminate cons 3) and 4).
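
To make the filter+config flag idea concrete, here is a minimal sketch of the resolution order. The function name, the three-valued flag (`None` meaning "not set"), and the example dev list are all hypothetical.

```python
from typing import AbstractSet, Optional


def resolve_testing_mode(config_flag: Optional[bool],
                         username: str,
                         dev_usernames: AbstractSet[str]) -> bool:
    """Return True if this run should be reported as a testing run."""
    if config_flag is not None:
        # An explicit setting always wins: a dev can run in user mode,
        # and a user can run in testing mode.
        return config_flag
    # Flag not set: fall back to the username filter, so devs default to testing mode.
    return username in dev_usernames
```

For example, with a dev list of `{"dev1"}`, `resolve_testing_mode(None, "dev1", {"dev1"})` evaluates the filter and returns `True`, while `resolve_testing_mode(False, "dev1", {"dev1"})` lets that dev be counted as a user.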

@yadudoc

Contributor

yadudoc commented Jan 12, 2018

I am in the process of adding a testing mode in which the usage stats contain a test flag, so that we can identify test runs and tell the origins apart. For automated tests, I'll be setting this flag.

I agree that the tests should not be counted, and there's a possibility that we might forget the flag while developing tests. There's little risk here since these would be short and we generally don't count scripts that run shorter than a minute. If necessary, we could also track the filename and exclude ones that start with test_, which is a convention we follow for all our tests.
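
On the reporting side, the test flag and the one-minute cutoff could be applied roughly like this. The record layout (`test`, `start`, `end` keys holding seconds) and the function name are assumptions; the filename filter is left out of the sketch on purpose.

```python
MIN_RUNTIME_S = 60  # runs shorter than a minute are generally not counted


def countable_runs(records, min_runtime_s=MIN_RUNTIME_S):
    """Keep only records that are not flagged as tests and ran long enough."""
    return [
        r for r in records
        if not r.get("test", False)
        and (r["end"] - r["start"]) >= min_runtime_s
    ]
```

A long-running test that forgot to set the flag would still pass this filter, which is the risk the flag is meant to reduce.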

@annawoodard

Collaborator

annawoodard commented Jan 12, 2018

> There's little risk here since these would be short and we generally don't count scripts that run shorter than a minute.

I'm definitely planning on testing with more realistic workflows that would run longer than a minute 😄 .

I'm fine with scrapping the username filter idea, but I would emphatically protest filtering on filenames. A lot of users consider the first time they run something to be a 'test' and will have files named "test_v10" and "test_v1_new_reallynew_v1_v2_latest" etc.

yadudoc added a commit that referenced this issue Jan 18, 2018

Support for anonymized usage tracking ref. #34
Track state of the dfk at the start and end of the workflow and
send minimal usage info via UDP back to tracking.parsl-project.org.

yadudoc added a commit that referenced this issue Jan 22, 2018

Merge remote-tracking branch 'origin/flowcontrol'
    This merge include key functionality for flowcontrol(#46)
    as well as usage_tracking (#34). Documentation is pending
    and will be added straight to the master branch.
@yadudoc

Contributor

yadudoc commented Jan 22, 2018

With this functionality merged, I think this issue should be closed and we should open a doc issue on documenting anonymized usage tracking and being upfront about it.

@yadudoc yadudoc closed this Jan 22, 2018

benclifford pushed a commit that referenced this issue Aug 9, 2018

Merge pull request #34 from Parsl/setup-requirements-before-import-Pa…
…rsl/parsl#139

Do not import parsl before dependencies are set up.

annawoodard pushed a commit that referenced this issue Sep 24, 2018

Merge pull request #34 from Parsl/setup-requirements-before-import-Pa…
…rsl/parsl#139

Do not import parsl before dependencies are set up.