Metric collection and reporting functionality for Bundler #7298

Akrabut · 2019-08-13T20:30:20Z

This work was done as part of Google Summer of Code 2019.
This is a metric reporting system for Bundler. The backend API that will collect and instrument some of these metrics in rubygems.org is here.
This PR is for the purpose of getting feedback (the metrics are still being sent to localhost for testing purposes).
This project has been mentored by @indirect.

Added functionality:

Metrics collection:

For all commands, the following are appended to ~/.bundle/metrics.yml:

Command used
Options, if specified
Time taken to fully execute (for commands that don't terminate Bundler)
Time taken to start executing (for commands that do)
Timestamp
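A minimal sketch of this append step, assuming each command's metrics are written as a separate YAML document and read back with YAML.load_stream (which a later comment mentions). The method names, hash keys and the MAX_METRICS_FILE_SIZE constant are hypothetical; the 100 MB cap only mirrors a later commit message and is not a confirmed implementation.

```ruby
require "yaml"

# Hypothetical size cap, mirroring the 100 MB limit mentioned in a later commit.
MAX_METRICS_FILE_SIZE = 100 * 1024 * 1024

# Appends one command's metrics as a separate YAML document.
# Returns false (and writes nothing) once the file has reached the size cap.
def record_command_metrics(path, command:, options:, time_taken:)
  return false if File.exist?(path) && File.size(path) >= MAX_METRICS_FILE_SIZE

  entry = {
    "command" => command,
    "options" => options,
    "time_taken" => time_taken,
    "timestamp" => Time.now.to_i,
  }
  # Hash#to_yaml begins with "---", so each append starts a new YAML document.
  File.open(path, "a") { |f| f.write(entry.to_yaml) }
  true
end

# Reads every appended document back as an array of hashes.
def read_command_metrics(path)
  return [] unless File.exist?(path)
  YAML.load_stream(File.read(path))
end
```

Appending whole documents keeps each write a single append and never rewrites earlier entries; YAML.load_stream then returns them as a list in the order they were recorded.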

For install/outdated/package/update/pristine, the following system info is recorded and sent to the server along with all the metrics collected so far in metrics.yml, after which the file is truncated:

All of the command metrics above, and:
A randomized hex ID
Remote git repository (hashed)
git version
rvm version
rbenv version
chruby version
Host system details
Ruby version
Bundler version
Rubygems version
Ruby engine
CI environments
Extra user agent strings
Gemfile gem count
Actually installed gem count
git gem count
Path gem count
Gem source count
List of gem sources (hashed)
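The hashing, random ID and send-then-truncate steps can be sketched roughly as follows. This is an illustration under stated assumptions: the payload keys, method names and the `sender` lambda (a stand-in for the HTTP request) are hypothetical, SHA-256 is an assumed choice of hash, and only a few of the listed fields are shown.

```ruby
require "digest"
require "securerandom"
require "yaml"

# Builds the system-info part of the report. Keys and argument names are
# illustrative; only a few of the listed fields are shown.
def build_system_payload(git_remote:, sources:, gemfile_gem_count:)
  {
    "request_id" => SecureRandom.hex(8), # randomized 16-character hex ID
    # Identifying strings are hashed locally, so only digests leave the machine.
    "git_remote" => git_remote && Digest::SHA256.hexdigest(git_remote),
    "sources" => sources.map { |source| Digest::SHA256.hexdigest(source) },
    "source_count" => sources.size,
    "gemfile_gem_count" => gemfile_gem_count,
    "ruby_version" => RUBY_VERSION,
    "ruby_engine" => RUBY_ENGINE,
  }
end

# Sends the payload plus every stored command entry, then truncates the file.
# `sender` stands in for the HTTP request; if it raises, the file is left intact.
def flush_metrics(path, payload, sender)
  stored = File.exist?(path) ? YAML.load_stream(File.read(path)) : []
  sender.call(payload.merge("command_metrics" => stored))
  File.write(path, "")
end
```

Truncating only after the sender returns means a failed request keeps the collected metrics for the next reporting command instead of silently dropping them.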

When gems are downloaded, the following is also appended:

Gem download time

When gems are installed, the following is also appended:

Gem installation time

When a Gemfile is resolved, the following is also appended:

Gemfile resolve time

When an installation fails, the regular command metrics are appended, along with:

Name and version of the gem that failed to install
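The per-phase timings and the failed-gem record could be captured along these lines. This is a sketch, not the PR's code: `timed`, `install_with_metrics` and the metric keys are hypothetical names. It uses a monotonic clock, which is an assumed design choice so that wall-clock adjustments during a long install cannot skew durations.

```ruby
# Runs the block, stores its duration (in seconds) under `key`, and returns
# the block's result. Uses a monotonic clock so system time changes don't matter.
def timed(metrics, key)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  metrics[key] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  result
end

# Times a gem installation; if it fails, records "name-version" and re-raises
# so the normal error handling still runs.
def install_with_metrics(metrics, name, version)
  timed(metrics, "gem_install_time") { yield }
rescue StandardError
  metrics["failed_gem"] = "#{name}-#{version}"
  raise
end
```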

Opting out of and back in to metric collection

Opt out:

bundle config set disable_metrics true allows the user to opt out of metrics collection.
This deletes the metrics.yml file and stops collecting any further metrics.
It also skips all of the added metrics functionality when running Bundler.

Opt in:

bundle config set disable_metrics false allows the user to opt back in to metric collection.

Default:

When no disable_metrics value is found in the global config file (or the file doesn't exist), Bundler behaves as if disable_metrics is false.
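The opt-out rule and its default can be sketched as below, assuming a plain Hash stands in for Bundler's global settings. The method names and Hash-based lookup are hypothetical and do not reflect Bundler's actual settings API; the logic follows the description above: only an explicit true disables metrics, and opting out deletes the metrics file.

```ruby
require "fileutils"

# `settings` is a stand-in for the global config. A missing key behaves as
# disable_metrics=false, so metrics stay enabled unless explicitly disabled.
def metrics_enabled?(settings)
  settings.fetch("disable_metrics", "false").to_s != "true"
end

# Returns true if metrics should be collected. When the user has opted out,
# removes any previously collected metrics file (no error if it's absent).
def apply_metrics_setting(settings, metrics_path)
  return true if metrics_enabled?(settings)
  FileUtils.rm_f(metrics_path)
  false
end
```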

All added functionality (including the new tests) passes the test suite locally on my system.

hsbt · 2019-08-14T08:52:28Z

This data is very sensitive. I think it should be opt-in only.

Akrabut · 2019-08-14T11:02:07Z

@hsbt What data would you say is sensitive, aside from the data we hash?
We currently don't collect any personally identifying info other than the remote git repo (which is hashed on the Bundler side).

hsbt · 2019-08-14T11:58:11Z

All of the software versions are sensitive information for production systems.

Akrabut · 2019-08-14T12:23:25Z

@hsbt Any specific ones? Some metrics, such as bundler_version, rubygems_version, ruby_version, host, command, options and ci, have been collected for a long time in fetcher#user_agent without user consent.

If there are specific metrics you think would hurt users' privacy, maybe we can remove them or make them opt-in only.

I think we may get very few metrics if collection is disabled by default (opt-in).

hsbt · 2019-08-14T12:38:31Z

> some metrics such as bundler_version, rubygems_version, ruby_version, host, command, options and ci have been collected for a long time in fetcher#user_agent without the user consent.

Ah, I see. I have another concern: why does Metrics send duplicate information, such as the Bundler, RubyGems, and other versions?

@indirect Should we publish a new privacy policy covering users' system metrics and how they are used? I'm concerned about how Ruby users will react to this feature.

Akrabut · 2019-08-14T12:57:09Z

This data takes a lot of parsing when it's reported via the user agent, and it has to be extracted from the Fastly logs.
The purpose of the project is to eventually replace the Fastly parsing. However, I'm not sure whether the Bundler core team wants to change the user agent format to avoid sending duplicate data; that's up to you to decide, as I'm not sure what is being done with the user agent in the backend :p


hsbt · 2019-08-16T11:22:07Z

@Akrabut Please don't open a new pull request. We should keep the discussion and track the code changes in one place.

Akrabut · 2019-08-16T11:30:05Z

@hsbt Sorry, rebased this instead.

deivid-rodriguez · 2019-08-16T16:18:46Z

@Akrabut You can remove 11709ea from this PR. I'm fixing that in #7309 👍.

Akrabut · 2019-08-29T13:46:06Z

This is as far as I could get with the tests, I think.
bundle exec now doesn't require anything, and only install/outdated/update/package/pristine require yaml, in order to use to_yaml and YAML.load_stream.
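The idea of loading YAML only for the reporting commands can be sketched like this. REPORTING_COMMANDS and prepare_metrics_for are hypothetical names used for illustration, not code from the PR.

```ruby
# Commands that send metrics, per the description above (hypothetical constant).
REPORTING_COMMANDS = %w[install outdated update package pristine].freeze

# Loads YAML only when the current command reports metrics, so commands such
# as `bundle exec` skip the extra require. Returns whether YAML was loaded.
def prepare_metrics_for(command)
  return false unless REPORTING_COMMANDS.include?(command)
  require "yaml" # provides Hash#to_yaml and YAML.load_stream
  true
end
```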

deivid-rodriguez · 2019-09-01T08:19:18Z

I suggest using git bisect to find out which one of your commits is breaking the test suite.

The ultimate problem might not be in your changes and lie in RVM, but it seems clear that your changes are at least uncovering it, because the CI on the master branch is consistently green.


Akrabut · 2019-09-02T13:53:43Z

> I suggest using git bisect to find out which one of your commits is breaking the test suite.
>
> The ultimate problem might not be in your changes and lie in RVM, but it seems clear that your changes are at least uncovering it, because the CI on the master branch is consistently green.

The problem is that I can't get RVM to function properly on either of my systems, even after completely uninstalling and deleting all remnants of rbenv/chruby.
I'm not able to reproduce these test failures on either of my systems with either rbenv or chruby; they all pass locally.

I'll try to reproduce locally on another system that has only RVM installed and I'll report.


Akrabut · 2019-09-07T15:54:03Z

lib/bundler/metrics.rb

+      # this results in test failures in tests which don't expect errors.
+      # the command DOES run successfully!
+      # begin
+      #   rvm_ver = `rvm --version`


This is where the issue was.
After I set up RVM on a gitpod.io system, every RVM command ran successfully but also printed this warning:

Warning! PATH is not properly set up, /workspace/.rvm/bin is not at first place. Usually this is caused by shell initialization files. Search for PATH=... entries. You can also re-add RVM to your profile by running: rvm get stable --auto-dotfiles To fix it temporarily in this shell session run: rvm use .rvm To ignore this error add rvm_silence_path_mismatch_check_flag=1 to your ~/.rvmrc file.

This is what caused the tests to fail.
The way I see it, we can either:

Not collect the user's RVM version, or

Change all the failing tests to accept this warning.

I've commented out the RVM version collection, and the build is finally green!
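A possible third option, sketched under the assumption that the failures come from the warning text leaking into output the tests check: capture both streams with Open3.capture3 instead of backticks (which only capture stdout) and extract just the version number. tool_version is a hypothetical helper. The source does not confirm whether RVM writes the warning to stdout or stderr. The warning text shown above contains no version-like number, so the regex should still find the version either way.

```ruby
require "open3"

# Runs a version command with stdout and stderr both captured, so warnings such
# as RVM's PATH notice are never echoed to the user or to test output.
# Returns the first version-like string on stdout, or nil if the tool is missing.
def tool_version(*command)
  stdout, _stderr, _status = Open3.capture3(*command)
  stdout[/\d+\.\d+(?:\.\d+)?/]
rescue SystemCallError # e.g. Errno::ENOENT when the executable doesn't exist
  nil
end
```

Usage would look like tool_version("rvm", "--version"), which returns a string such as "1.29.9" or nil.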


Akrabut mentioned this pull request Aug 14, 2019

Documentation page for Bundler metric reporting and collecting functionality rubygems/bundler-site#477 (closed)

esasse reviewed Aug 14, 2019 (review thread on lib/bundler/installer/gem_installer.rb, outdated and resolved)

This was referenced Aug 16, 2019:

Metric collection and reporting functionality for Bundler (refactored) Akrabut/bundler#4 (closed)
Metric collection and reporting functionality for Bundler (refactored) #7308 (closed)

Akrabut closed this Aug 16, 2019

hsbt reopened this Aug 16, 2019

Akrabut force-pushed the metrics-project branch 3 times, most recently from c20387f to a362281, August 24, 2019 13:59

Akrabut mentioned this pull request Aug 31, 2019

Gem author can see statistics about which ruby versions people use rubygems/rubygems.org#1439 (closed)

Akrabut force-pushed the metrics-project branch from 9b5333c to 3151d38, September 1, 2019 10:14

deivid-rodriguez reviewed Sep 1, 2019 (review thread on spec/spec_helper.rb, outdated and resolved)

deivid-rodriguez and others added 6 commits September 7, 2019 12:22

52a4e32 Revert "Remove unused filter"
This reverts commit ac282dc. Because the filter is actually used once: https://github.com/bundler/bundler/blob/3ec8165eec0a1af8c45ee258d3e7cb61fba875a2/spec/update/git_spec.rb#L162 My bad.
242fb74 send Bundler version over HTTP to a dummy server
7b210c7 Send all user-agent metrics over HTTP
6cddab3 metrics over HTTP exported to metrics.rb class
06606bf add comments and refractor
ac93fcb moved cis to metrics.rb, removed cis tests from fetcher_spec.rb, created metrics_spec.rb

Akrabut added 20 commits September 7, 2019 12:22

e20fd1d manually revert commit 11709ea, david rodriguez fixed this bundle doctor bug in another PR
da15aad fix missed new line at end of doctor file
5c44579 change HTTP connection from localhost to rubygems.org
45ed8f1 change HTTP connection to rubygems.org to a persistent connection, so if a connection has already been made we can use it to avoid creating a new one
825a8af refactor tests to properly function when using Travis
581db4d add man documentation for
9c6dada refactor more tests to work properly with Travis
22b8006 add more disable_metrics docs to man and refactor more methods to pass tests with Travis
352d3a9 reformat man pages to pass tests
38ec33e reformat bundle-config.1.txt again
fa1e8ad CI's fail on line 105 in metrics.rb when using the open function, it now rescues for it
1a0fa03 fix metrics_spec to work properly with CIs
6e6c951 rebase against bundler/bundler master branch
30b4ae6 sync bundle-config man page
f270b3e use local psyched_yaml instead of default gem psych
bc06079 refactor to not require anything when bundle exec is run, and to send metrics as YAML instead of JSON to avoid requiring JSON
3e30381 fix an elusive extra space
d6ffd44 fix URI and remove an extra require
33ff805 fix line 103 in doctor.rb which for some reason hasn't been changed in rebase
96ade20 try to fix metrics specs to not display warnings in Travis

Akrabut force-pushed the metrics-project branch from ebd201f to 96ade20, September 7, 2019 09:37

Akrabut added 5 commits September 7, 2019 12:50

fb699f5 remove rebase leftovers
de0778b refactor Metrics.record_and_send_full_info out of command files where its possible and into CLI#run, so that it appears twice instead of six times
c914023 comment out rvm version collection for now, as it may throw an error on rvm --version command which fails tests that do not expect errors.
0c279d8 fix multithreaded test errors by forcing initialization before each test
fec75fe ensure @command_metrics is initialized in specs before its used

Akrabut commented Sep 7, 2019

Akrabut added 2 commits September 7, 2019 19:03

dd94eb6 fix before blocks in metrics_spec.rb
01c73ff make most metrics class methods private, stop adding data to the metrics.yml file if file is larger than 100 MB and fix a failing test

hsbt closed this Apr 14, 2021
