Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic telemetry features #2314

Merged
merged 35 commits into from
Mar 21, 2022
Merged

Add basic telemetry features #2314

merged 35 commits into from
Mar 21, 2022

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented Mar 15, 2022

Proposed changes:

  • This PR adds basic telemetry features. It allows sending anonymous usage statistics to steer the development process and support continuous software improvements for all of Haystack's users.
  • A PR for updating the documentation on the Website is here: add telemetry documentation page haystack-website#268

The events contain an anonymous user id, which is a randomly generated uuid. There is no way to infer your identity from the user id or any other content of the event.

To prevent revealing a user's identity, the following properties will never be used by telemetry:

  • IP addresses
  • Host names
  • File paths
  • Queries
  • Document contents

You can have a look at the meta data that is shared about your setup by calling print_telemetry_report() from haystack/telemetry.py. If you would like to inspect all information that telemetry shares when you use Haystack, you can enable writing all events to a log file by calling enable_writing_events_to_file(). The default location of that file is ~/.haystack/telemetry.log.

Here is an exemplary event that is sent when tutorial 1 is executed via running Tutorial1_Basic_QA_Pipeline.py.

{
    "event": "tutorial 1 executed",
    "distinct_id": "9baab867-3bc8-438c-9974-a192c9d53cd1",
    "properties": {
        "os_family": "Darwin",
        "os_machine": "arm64",
        "os_version": "21.3.0",
        "haystack_version": "1.0.0",
        "python_version": "3.9.6",
        "torch_version": "1.9.0",
        "transformers_version": "4.13.0"
        "execution_env": "script",
        "context": null,
        "n_gpu": 0,
    },
}

Users can opt-out of telemetry via the following options.

  • disable telemetry by calling disable_telemetry() within Python or by setting the environment variable HAYSTACK_TELEMETRY_ENABLED to "False" directly
  • if using a bash shell, by adding the following line to the file ~/.bashrc to permanently disable telemetry: export HAYSTACK_TELEMETRY_ENABLED=False.
  • if using zsh, e.g., on MacOS, by adding that line to the file ~/.zshrc.
  • if using the standard command prompt in Windows, by setting a user-level environment variable via running the following command to permanently disable telemetry: setx HAYSTACK_TELEMETRY_ENABLED "False".
  • if using the Windows PowerShell the command is: [Environment]::SetEnvironmentVariable("HAYSTACK_TELEMETRY_ENABLED","False","User").

To Do

  • The posthog server needs to be changed from the development server to the production server.
  • The public Haystack demo needs to set the context variable.
  • Change the link to the documentation page about telemetry.
  • Make sure the CI does not generate events. All events sent by CI can now be filtered out via execution_env != "ci".
  • Make sure running api and ui tests locally does not generate events

Status (please check what you already did):

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@julian-risch julian-risch added the type:feature New feature or request label Mar 15, 2022
@julian-risch julian-risch changed the title add basic telemetry features Add basic telemetry features Mar 15, 2022
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I added many comments but most are minor and a few are simply architectural. There's only one related to model_name_or_path that might be worth looking into, and some concerns of mine about the generous use of except Exception: pass. Nothing blocking though 🙂

haystack/errors.py Outdated Show resolved Hide resolved
haystack/telemetry.py Show resolved Hide resolved
haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/telemetry.py Show resolved Hide resolved
test/test_telemetry.py Show resolved Hide resolved
test/test_telemetry.py Outdated Show resolved Hide resolved
haystack/nodes/base.py Show resolved Hide resolved
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🙂 As soon as the tests pass it can be merged. Good job! 👍

haystack/telemetry.py Outdated Show resolved Hide resolved
test/test_telemetry.py Outdated Show resolved Hide resolved
test/test_telemetry.py Show resolved Hide resolved
Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
I'd however get rid of the Thread in fire_and_forget as posthog already takes care of this and creating threads is a rather expensive operation (can take several 10s or 100s of ms). Note that currently we create and destroy a thread on any call.

haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/telemetry.py Outdated Show resolved Hide resolved
haystack/errors.py Show resolved Hide resolved
haystack/telemetry.py Show resolved Hide resolved
haystack/telemetry.py Show resolved Hide resolved
test/test_telemetry.py Outdated Show resolved Hide resolved
@julian-risch julian-risch merged commit ac5617e into master Mar 21, 2022
@julian-risch julian-risch deleted the telemetry branch March 21, 2022 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants