Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3525] [Bug] Option --no-send-anonymous-usage-stats creates different behavior compared to DBT_SEND_ANONYMOUS_USAGE_STATS=false #9336

Open
2 tasks done
mbarugelCA opened this issue Jan 4, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@mbarugelCA
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running time dbt ls -s something --no-send-anonymous-usage-stats generates these logs:

�[0m19:46:13.215433 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff98916350>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff96f84e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff96f87be0>]}


============================== 19:46:13.217242 | 57017d07-14d8-4178-884f-d2cdeda0a998 ==============================
�[0m19:46:13.217242 [info ] [MainThread]: Running with dbt=1.7.4
�[0m19:46:13.217977 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'fail_fast': 'False', 'version_check': 'True', 'log_path': '/usr/local/dbt_snowflake/logs', 'profiles_dir': '/usr/local/airflow/.dbt', 'debug': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'log_format': 'default', 'invocation_command': 'dbt ls -s something --no-send-anonymous-usage-stats', 'target_path': 'None', 'introspect': 'True', 'send_anonymous_usage_stats': 'True'}
�[0m19:46:13.409792 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '57017d07-14d8-4178-884f-d2cdeda0a998', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff9911d210>]}
�[0m19:46:13.456307 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '57017d07-14d8-4178-884f-d2cdeda0a998', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff95f31900>]}
�[0m19:46:13.457063 [info ] [MainThread]: Registered adapter: snowflake=1.7.1
�[0m19:46:13.518584 [debug] [MainThread]: checksum: 248e8aba381ba2d577dce7ab50d2010c2aac45d2b27e6e6214cdd96d2d41fabc, vars: {}, profile: , target: , version: 1.7.4
�[0m19:46:14.017287 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
�[0m19:46:14.017913 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
�[0m19:46:14.075769 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '57017d07-14d8-4178-884f-d2cdeda0a998', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff94b2c0d0>]}
�[0m19:46:14.194683 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '57017d07-14d8-4178-884f-d2cdeda0a998', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff94dcea70>]}
�[0m19:46:14.195421 [info ] [MainThread]: Found 337 models, 11 snapshots, 444 tests, 10 seeds, 2 operations, 145 sources, 0 exposures, 0 metrics, 683 macros, 0 groups, 0 semantic models
�[0m19:46:14.196130 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '57017d07-14d8-4178-884f-d2cdeda0a998', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff94dcea10>]}
�[0m19:46:14.199641 [warn ] [MainThread]: The selection criterion 'something' does not match any nodes
�[0m19:46:14.200179 [warn ] [MainThread]: No nodes selected!
�[0m19:46:14.201185 [debug] [MainThread]: Resource report: {"command_name": "list", "command_success": true, "command_wall_clock_time": 1.0115864, "process_user_time": 1.816159, "process_kernel_time": 0.270297, "process_mem_max_rss": "137156", "process_in_blocks": "0", "process_out_blocks": "0"}
�[0m19:46:14.201706 [debug] [MainThread]: Command `dbt ls` succeeded at 19:46:14.201608 after 1.01 seconds
�[0m19:46:14.202355 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff98916350>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff95f31900>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0xffff94dcc550>]}
�[0m19:46:14.203033 [debug] [MainThread]: Flushing usage events

and the time output is:

real    0m3.561s
user    0m2.040s
sys     0m0.281s

Whereas running time DBT_SEND_ANONYMOUS_USAGE_STATS=false dbt ls -s something generates these logs:



============================== 19:48:04.896626 | 66292a5a-377e-4a8a-a374-91bea051e347 ==============================
�[0m19:48:04.896626 [info ] [MainThread]: Running with dbt=1.7.4
�[0m19:48:04.898463 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/usr/local/airflow/.dbt', 'fail_fast': 'False', 'version_check': 'True', 'log_path': '/usr/local/dbt_snowflake/logs', 'debug': 'False', 'warn_error': 'None', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'log_format': 'default', 'invocation_command': 'dbt ls -s something', 'target_path': 'None', 'introspect': 'True', 'send_anonymous_usage_stats': 'False'}
�[0m19:48:05.135858 [info ] [MainThread]: Registered adapter: snowflake=1.7.1
�[0m19:48:05.187251 [debug] [MainThread]: checksum: 248e8aba381ba2d577dce7ab50d2010c2aac45d2b27e6e6214cdd96d2d41fabc, vars: {}, profile: , target: , version: 1.7.4
�[0m19:48:05.519577 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
�[0m19:48:05.520165 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
�[0m19:48:05.703670 [info ] [MainThread]: Found 337 models, 11 snapshots, 444 tests, 10 seeds, 2 operations, 145 sources, 0 exposures, 0 metrics, 683 macros, 0 groups, 0 semantic models
�[0m19:48:05.707770 [warn ] [MainThread]: The selection criterion 'something' does not match any nodes
�[0m19:48:05.708355 [warn ] [MainThread]: No nodes selected!
�[0m19:48:05.709225 [debug] [MainThread]: Resource report: {"command_name": "list", "command_success": true, "command_wall_clock_time": 0.8377126, "process_user_time": 1.595339, "process_kernel_time": 0.210098, "process_mem_max_rss": "137264", "process_in_blocks": "0", "process_out_blocks": "0"}
�[0m19:48:05.709759 [debug] [MainThread]: Command `dbt ls` succeeded at 19:48:05.709641 after 0.84 seconds
�[0m19:48:05.710292 [debug] [MainThread]: Flushing usage events

And time output is:

real    0m2.220s
user    0m1.814s
sys     0m0.219s

So using the env var config actually results in fewer events logged and a large improvement in runtime.

Expected Behavior

As I understand it, --no-send-anonymous-usage-stats and DBT_SEND_ANONYMOUS_USAGE_STATS=false should be identical.

Steps To Reproduce

Run:
time dbt ls -s something --no-send-anonymous-usage-stats
and
time DBT_SEND_ANONYMOUS_USAGE_STATS=false dbt ls -s something

Compare the logs and runtime.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@mbarugelCA mbarugelCA added bug Something isn't working triage labels Jan 4, 2024
@github-actions github-actions bot changed the title [Bug] Option --no-send-anonymous-usage-stats creates different behavior compared to DBT_SEND_ANONYMOUS_USAGE_STATS=false [CT-3525] [Bug] Option --no-send-anonymous-usage-stats creates different behavior compared to DBT_SEND_ANONYMOUS_USAGE_STATS=false Jan 4, 2024
@dbeatty10 dbeatty10 self-assigned this Jan 4, 2024
@dbeatty10
Copy link
Contributor

Thanks for reporting this @mbarugelCA 🙏

I was able to reproduce the same thing that you reported.

Reprex

First, I needed to make sure I had all the relevant environment variables unset:

echo $DBT_SEND_ANONYMOUS_USAGE_STATS
echo $DO_NOT_TRACK
unset DBT_SEND_ANONYMOUS_USAGE_STATS
unset DO_NOT_TRACK

From there, I ran all the following commands to see what dbt is doing:

dbt --debug ls -s something --send-anonymous-usage-stats | grep -Eo "'send_anonymous_usage_stats': '(.*)'"
DBT_SEND_ANONYMOUS_USAGE_STATS=true dbt --debug ls -s something | grep -Eo "'send_anonymous_usage_stats': '(.*)'"
dbt --debug ls -s something --no-send-anonymous-usage-stats | grep -Eo "'send_anonymous_usage_stats': '(.*)'"
DBT_SEND_ANONYMOUS_USAGE_STATS=false dbt --debug ls -s something | grep -Eo "'send_anonymous_usage_stats': '(.*)'" 
DO_NOT_TRACK=true dbt --debug ls -s something | grep -Eo "'send_anonymous_usage_stats': '(.*)'"

Explanation:

  • --debug writes debug-level logs to standard out (rather than needing to look in logs/dbt.log.
  • Then the grep -Eo "'send_anonymous_usage_stats': '(.*)'" part finds the log line that shows the config settings that dbt is using.

The output we'd expect is:

'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'

But depending on the config setting for send_anonymous_usage_stats within profiles.yml, I saw different behavior.

When send_anonymous_usage_stats is either set to True (default), None, or not set within profiles.yml:

'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'True'   ❌ 
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'

When send_anonymous_usage_stats is set to False within profiles.yml:

'send_anonymous_usage_stats': 'False'  ❌
'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'

Conclusion

👉 So it looks like the --send-anonymous-usage-stats / --no-send-anonymous-usage-stats CLI flags are being ignored rather than adhering to the standard precedence.

Acceptance criteria

The output from the above commands is always the following, regardless if the send_anonymous_usage_stats setting within profiles.yml is None, True, or False):

'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'True'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'
'send_anonymous_usage_stats': 'False'

@dbeatty10 dbeatty10 removed the triage label Jan 4, 2024
@dbeatty10 dbeatty10 removed their assignment Jan 4, 2024
mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Jan 8, 2024
…rue` (#4713)

## What are you changing in this pull request and why?

While responding to dbt-labs/dbt-core#9336, I
set `DO_NOT_TRACK=0` and examined the result. It does not behave the
same way as `DBT_SEND_ANONYMOUS_USAGE_STATS=True`.

Looking at the source code
[here](https://github.com/dbt-labs/dbt-core/blob/11cc71b75fa64b09888461339eb1eb3b394f9528/core/dbt/cli/flags.py#L252-L254)
explains why.

So we can safely just remove this line from the docs for
[`send_anonymous_usage_stats`](https://docs.getdbt.com/reference/global-configs/usage-stats)
to avoid confusion.

## Additional info

We first added support for the [Console Do Not
Track](https://consoledonottrack.com/) initiative within
dbt-labs/dbt-core#5000 as described in
dbt-labs/dbt-core#3540.

Any of the following are equivalent to
`DBT_SEND_ANONYMOUS_USAGE_STATS=False` (whether they are uppercase,
lowercase, or mixed case):
```
export DO_NOT_TRACK=1
export DO_NOT_TRACK=t
export DO_NOT_TRACK=true
export DO_NOT_TRACK=y
export DO_NOT_TRACK=yes
```

Any other values of `DO_NOT_TRACK` are ignored altogether and not have
any effect.

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
@dbeatty10
Copy link
Contributor

I'm not sure if it affects the behavior described in this issue or not, but we take an additional environment variable DO_NOT_TRACK into consideration here:

# Support console DO NOT TRACK initiative.
if os.getenv("DO_NOT_TRACK", "").lower() in ("1", "t", "true", "y", "yes"):
object.__setattr__(self, "SEND_ANONYMOUS_USAGE_STATS", False)

We mostly expose only a single environment variable to control flags / global configs, but this is the rare case (and maybe only one) in which we allow two different environment variable names:

  • DBT_SEND_ANONYMOUS_USAGE_STATS
  • DO_NOT_TRACK

According to the click docs here, envvar can also be:

a list of different environment variables where the first one is picked.

Also, click handles the BOOL parameter type like this:

The string values “1”, “true”, “t”, “yes”, “y”, and “on” convert to True. “0”, “false”, “f”, “no”, “n”, and “off” convert to False.

This happens to be exactly what is in the logic below, which would lower the barriers to that portion of the code refactor:

if os.getenv("DO_NOT_TRACK", "").lower() in ("1", "t", "true", "y", "yes"):

So we might be able to remove this code in favor of adding this here:

    envvar=["DO_NOT_TRACK", "DBT_SEND_ANONYMOUS_USAGE_STATS"],

At the very least, it would reduce the number of occurrences of os.getenv in the code base (which feels like tech debt). But there's a possibility it would also resolve the undesirable behavior described in this issue.

@b-per
Copy link
Contributor

b-per commented Apr 19, 2024

I just found out this same issue!

I was running dbt-duckdb on a place (no Internet), and saw that any run was hanging for 30s.

I suspected it was due to tracking so I tried adding --no-send-anonymous-usage-stats, but it was still hanging.
Now trying with no internet and DO_NOT_TRACK=1, it doesn't hang.

This seems related to this value:

{"buffer_size": 30} if SNOWPLOW_TRACKER_VERSION < Version("0.13.0") else {"batch_size": 30}

Should we raise another issue that if the client is not connected to the internet we should not hang for 30 secs?

@dbeatty10
Copy link
Contributor

Should we raise another issue that if the client is not connected to the internet we should not hang for 30 secs?

It seems reasonable to give the anonymous tracking a quick shot and give up after a very short amount of time (1 or 2 seconds total) if it isn't able to establish a connection.

If you want to open an issue for this, I'll label it as help wanted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants