This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

Metric collection and reporting functionality for Bundler #7298

Closed
wants to merge 93 commits into from

@Akrabut Akrabut commented Aug 13, 2019

This work was done as part of Google Summer of Code 2019.
This is a metric reporting system for Bundler; the backend API that will collect and instrument some of the metrics in rubygems.org is here.
This PR is for the purpose of getting feedback (the metrics are still being sent to localhost for testing purposes).
This project was mentored by @indirect.

Added functionality:

Metrics collection:

For all commands, the following are appended to ~/.bundle/metrics.yml:

Command used
Options if specified
Time taken to fully execute for commands that don't kill Bundler
Time taken to start executing for commands that do
Timestamp
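
As a rough illustration of the per-command record described above, here is a minimal sketch that appends one YAML document per command to a metrics file. The helper name `append_command_metrics`, the key names, and the ISO 8601 timestamp format are assumptions made for the example, not taken from the PR.

```ruby
require "yaml"
require "time"

# Hypothetical helper: appends one command record to the metrics file
# as its own YAML document, so earlier records are never rewritten.
def append_command_metrics(path, command:, options:, elapsed:, now: Time.now)
  record = {
    "command" => command,       # e.g. "install"
    "options" => options,       # options passed on the command line, if any
    "time_taken" => elapsed,    # seconds, measured by the caller
    "timestamp" => now.utc.iso8601,
  }
  # to_yaml starts each document with "---", so appended records
  # form a valid multi-document YAML stream.
  File.open(path, "a") { |file| file.write(record.to_yaml) }
  record
end
```

Because every record is a separate document, the whole file can later be read back in one pass with `YAML.load_stream`.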

For install/outdated/package/update/pristine, record the following system info and send it to the server along with all the metrics collected so far in metrics.yml, then truncate the file:

Everything listed above, plus:
A randomized hex ID
Remote git repository (hashed)
git version
rvm version
rbenv version
chruby version
Host system details
Ruby version
Bundler version
Rubygems version
Ruby engine
CI’s
Extra user agent strings
Gemfile gem count
Actually installed gem count
git gem count
Path gem count
Gem source count
List of gem sources (hashed)
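
To make the "randomized hex ID" and "hashed" entries in the list above concrete, here is a minimal sketch of how such a report could be assembled. SHA-256 is an assumption (the PR only says the values are hashed), and `anonymize`, `system_report`, and the key names are hypothetical.

```ruby
require "securerandom"
require "digest/sha2"

# Hypothetical helper: SHA-256 is an assumption; the PR only says "hashed".
def anonymize(value)
  Digest::SHA256.hexdigest(value)
end

# Hypothetical helper: builds a small subset of the system report.
def system_report(git_remote:, gem_sources:)
  {
    "request_id" => SecureRandom.hex(12),       # randomized hex ID
    "git_remote" => anonymize(git_remote),      # never sent in clear text
    "ruby_version" => RUBY_VERSION,
    "ruby_engine" => RUBY_ENGINE,
    "rubygems_version" => Gem::VERSION,
    "gem_source_count" => gem_sources.size,
    "gem_sources" => gem_sources.map { |source| anonymize(source) },
  }
end
```

Hashing keeps the values comparable across reports (the same remote always yields the same digest) without sending the original string.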

When gems are downloaded, also appends:

Gem download time

When gems are installed, also appends:

Gem installation time

When a Gemfile is resolved, also appends:

Gemfile resolve time
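
The download, installation, and resolve times above could all be taken with the same kind of wrapper. The sketch below is one possible way to do it with a monotonic clock; the `measure` helper is hypothetical and is not claimed to be how the PR measures time.

```ruby
# Hypothetical helper: runs the block and returns [result, seconds elapsed].
# A monotonic clock is used so the measurement is unaffected by changes
# to the system time while the block runs.
def measure
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Stand-in for a real resolve step.
resolved, resolve_time = measure { :resolved_specs }
```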

When an installation fails, appends the regular command metrics, and:

Name and version of the gem that failed to install

Opt out and in to metric collection

Opt out:

bundle config set disable_metrics true allows the user to opt out of metrics collection.
It deletes the metrics.yml file and stops collecting any further metrics.
It also skips all added metric functionality when running Bundler.

Opt in:

bundle config set disable_metrics false allows the user to opt back in to metric collection.

Default:

When no disable_metrics value is found in the global config file (or the file doesn't exist), Bundler behaves as if disable_metrics is false.
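
The default described above can be sketched as a small predicate. Here `settings` is a hypothetical stand-in for the parsed global config (a Hash, or nil when the file doesn't exist), and `metrics_enabled?` is not a name from the PR.

```ruby
# Hypothetical helper: decides whether metrics should be collected.
# `settings` stands in for the parsed global config; nil means no file.
def metrics_enabled?(settings)
  value = (settings || {}).fetch("disable_metrics", false)
  # Config values may be stored as strings, so accept both forms.
  ![true, "true"].include?(value)
end
```

With this shape, a missing file, a missing key, and an explicit `false` all lead to the same behavior, and only an explicit `true` opts the user out.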

All added functionality, including the added tests, passes the test suite locally on my system.

@hsbt
Member

hsbt commented Aug 14, 2019

This data is very sensitive. I think it should be enabled only as opt-in.

@Akrabut
Author

Akrabut commented Aug 14, 2019

@hsbt What data would you say is sensitive aside from the data we hash?
We currently don't collect any personally identifying info aside from the remote git repo (which is hashed on the Bundler side).

@hsbt
Member

hsbt commented Aug 14, 2019

All of the software versions are sensitive for a production system.

@Akrabut
Author

Akrabut commented Aug 14, 2019

@hsbt Any specific ones? Some metrics such as bundler_version, rubygems_version, ruby_version, host, command, options and ci have been collected for a long time in fetcher#user_agent without user consent.

If there are specific metrics you think would hurt users' privacy, maybe we can remove them or make them opt-in only.

I think we may get very few metrics if collection is disabled by default (opt-in only).

@hsbt
Member

hsbt commented Aug 14, 2019

some metrics such as bundler_version, rubygems_version, ruby_version, host, command, options and ci have been collected for a long time in fetcher#user_agent without the user consent.

Ah, I see. I have another concern. Why do the metrics send duplicated information like the Bundler, RubyGems, and other versions?

@indirect Should we publish a new privacy policy for users' system metrics and their usage? I have concerns about how this feature will be received by Ruby users.

@Akrabut
Author

Akrabut commented Aug 14, 2019

This data takes a lot of parsing when it's reported by the user_agent and has to be extracted from the Fastly logs.
The purpose of the project is to eventually replace the Fastly parsing. However, I'm not sure if the Bundler core team wants to change the user agent format to avoid sending duplicate data. That's up to you guys to decide, as I'm not sure what is being done with the user agent in the backend :p

@hsbt
Member

hsbt commented Aug 16, 2019

@Akrabut Please do not open a new pull request. We should keep the discussion and track the code changes here.

@hsbt hsbt reopened this Aug 16, 2019
@Akrabut
Author

Akrabut commented Aug 16, 2019

@hsbt Sorry, rebased this instead.

@deivid-rodriguez
Member

@Akrabut You can remove 11709ea from this PR. I'm fixing that in #7309 👍.

@Akrabut Akrabut force-pushed the metrics-project branch 3 times, most recently from c20387f to a362281 Compare August 24, 2019 13:59
@Akrabut
Author

Akrabut commented Aug 29, 2019

I think this is as far as I could get with the tests.
bundle exec now doesn't require anything, and only install/outdated/update/package/pristine require yaml, in order to use to_yaml and YAML.load_stream.
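
A minimal sketch of what such a lazy require could look like: YAML is loaded only inside the code path that actually reads the stored metrics, and the file is truncated once the records have been read. The `drain_metrics` helper is hypothetical and only illustrates the idea.

```ruby
# Hypothetical helper: returns every stored record and empties the file.
def drain_metrics(path)
  return [] unless File.exist?(path)

  # Lazy require: commands that never report (for example `bundle exec`)
  # do not pay the cost of loading YAML.
  require "yaml"

  records = YAML.load_stream(File.read(path))
  File.truncate(path, 0) # keep the file, drop the records already read
  records
end
```

A real implementation would presumably truncate only after the server has accepted the data; this sketch leaves the sending step out.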

@deivid-rodriguez
Member

I suggest using git bisect to find out which one of your commits is breaking the test suite.

The ultimate problem might not be in your changes and lie in RVM, but it seems clear that your changes are at least uncovering it, because the CI on the master branch is consistently green.

@Akrabut
Author

Akrabut commented Sep 2, 2019

I suggest using git bisect to find out which one of your commits is breaking the test suite.

The ultimate problem might not be in your changes and lie in RVM, but it seems clear that your changes are at least uncovering it, because the CI on the master branch is consistently green.

The problem is that I can't get RVM to function properly on either of my systems, even if I completely uninstall and delete all remnants of rbenv/chruby.
I'm not able to reproduce these test failures on either of my systems with either rbenv or chruby; the tests all pass locally.

I'll try to reproduce locally on another system that has only RVM installed and I'll report.

# this results in test failures in tests which don't expect errors.
# the command DOES run successfully!
# begin
# rvm_ver = `rvm --version`
Author

@Akrabut Akrabut Sep 7, 2019

The issue lay here.
After setting up RVM on a gitpod.io system, every RVM command ran successfully but also echoed:

Warning! PATH is not properly set up, /workspace/.rvm/bin is not at first place.
      Usually this is caused by shell initialization files. Search for PATH=... entries.
      You can also re-add RVM to your profile by running: rvm get stable --auto-dotfiles
      To fix it temporarily in this shell session run: rvm use .rvm
      To ignore this error add rvm_silence_path_mismatch_check_flag=1 to your ~/.rvmrc file.

This is what caused the tests to fail.
The way I see it, we can either:

  1. Not collect the user's RVM version, or
  2. Change all the failing tests to accept this warning.

I've commented out the rvm version collecting and the build is finally green!
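
A possible third option, sketched here only as an illustration: capture the command's stderr instead of letting it pass through to the terminal, which is what backticks do. This assumes the RVM warning is written to stderr, which the thread does not confirm; if it goes to stdout it would have to be filtered out of the result instead. The `tool_version` helper is hypothetical.

```ruby
require "open3"

# Hypothetical helper: returns the first line a version command prints to
# stdout, or nil when the tool is missing or exits unsuccessfully.
# stderr is captured and discarded, so warnings never reach the terminal.
def tool_version(*command)
  stdout, _stderr, status = Open3.capture3(*command)
  return nil unless status.success?

  stdout.lines.first&.strip
rescue Errno::ENOENT
  nil # the tool is not installed
end

rvm_version = tool_version("rvm", "--version")
```

Passing the command as separate arguments avoids going through a shell, so a missing executable surfaces as `Errno::ENOENT` and is treated as "not installed".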

@hsbt hsbt closed this Apr 14, 2021