(Re)consider hiding current month in Qubes userbase estimate chart (aka "Statistics") #3858

Closed
andrewdavidwong opened this Issue Apr 25, 2018 · 25 comments

Member

andrewdavidwong commented Apr 25, 2018

[Branched from the discussion in #3841.]

Some people misinterpret the lower bar height of the current month as a drop in users. They don't understand that the bar will continue to increase in height because there's still time left to go before the end of the month. Hiding the current month would prevent this misunderstanding.

CC: @woju, @marmarek, @rootkovska

@andrewdavidwong andrewdavidwong added this to the Documentation/website milestone Apr 25, 2018

Member

woju commented Apr 25, 2018

The current month is helpful for estimating the adoption rate around a release. Without it, we'd have to wait a full month before knowing how people install a new version. The recent R4.0 release is a good example: as of now, we wouldn't have any information about how people are adopting it.

Consider that the numbers on the vertical axis have little to do with the actual number of users or machines; what really matters is the relative change. The current month is actually a good example of how that works. So instead of hiding it, maybe we could better explain what this graph is really good for. Wouldn't it be a good idea to stop calling it "userbase estimation" and rename it "estimated adoption rate"?

Member

woju commented Apr 25, 2018

And while we're at it, maybe we could switch the Tor estimate to that proportional estimate. I still haven't done that, because it would introduce yet another source of confusion that I'd have to explain to anyone reading the graph.

Member

andrewdavidwong commented Apr 26, 2018

> The current month is helpful for estimating the adoption rate around a release. Without it, we'd have to wait a full month before knowing how people install a new version. The recent R4.0 release is a good example: as of now, we wouldn't have any information about how people are adopting it.

In that case, how about visually distinguishing the current month, e.g., using a dotted line for its border or blurring it, in order to indicate that it's still in the process of growing?

> Consider that the numbers on the vertical axis have little to do with the actual number of users or machines; what really matters is the relative change. The current month is actually a good example of how that works. So instead of hiding it, maybe we could better explain what this graph is really good for. Wouldn't it be a good idea to stop calling it "userbase estimation" and rename it "estimated adoption rate"?

But we've been treating it as an estimate of the number of users. In many posts in News, for example, we've pointed out that there are approximately 30k (or whatever number at the time) users.

Member

woju commented Apr 27, 2018

> In that case, how about visually distinguishing the current month, e.g., using a dotted line for its border or blurring it, in order to indicate that it's still in the process of growing?

That's doable. I'll see to it in a week, since this week is a holiday in Poland and I'm going to enjoy it.

> But we've been treating it as an estimate of the number of users. In many posts in News, for example, we've pointed out that there are approximately 30k (or whatever number at the time) users.

I don't think that's correct (I mean the number, not the fact that we said it). Maybe the order of magnitude is accurate, which would justify calling it "approximately 30k".

The longer this experiment runs, the less I trust it to yield an absolute number of users, and I think that's a feature, since we don't want to know too much about people for privacy reasons. But the rate of change is quite reliable, and that's what we need around releases. For example, we promised R3.2 support until spring 2019, but

Member

woju commented May 9, 2018

OK, so here is a new version:

  • Tango colours
  • current month one step lighter
  • Tor counting using new method suggested by a participant of Tor Summit 2016, via @mfc
  • annotation to make it clear that methodology changed

image

After the Tor change, the overall number of users didn't change much. The numbers for recent versions doubled, but for older versions they actually went down. I don't know why.

@andrewdavidwong @rootkovska @marmarek

Member

andrewdavidwong commented May 10, 2018

Thanks, @woju. Perhaps we could add some text explaining that the lighter shade means that the current month is not yet complete? Some readers probably won't notice that it's a shade lighter, and others may notice but not understand what it means.

Member

woju commented May 10, 2018

@andrewdavidwong Sure.

image

If there's nothing more, it goes live tonight as is.

h01ger commented May 10, 2018

Member

woju commented May 10, 2018

Thanks. I also like them.

Tor is shown as the darker areas. I don't know how to make that clear in the legend, and I'd rather not take up twice the amount of space.

The colours cycle, because there is an unbounded number of Qubes releases but a limited number of colours in Tango. (And I'd like to pick only one of yellow/brown/orange, because they look similar.) Maybe I'll try to label the actual graph areas and remove the legend.
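
The colour cycling described above could be sketched roughly as follows. This is only an illustration: the hex values are medium shades from the published Tango palette, but the selection of hues, the release labels, and the helper `assign_colours` are assumptions, not what the actual script does.

```python
from itertools import cycle

# Medium shades of a few Tango palette hues. Only one of
# butter/orange/chocolate is included, since they look similar.
# Which hues the real script uses is an assumption.
TANGO_MEDIUM = [
    "#3465a4",  # sky blue
    "#73d216",  # chameleon
    "#f57900",  # orange
    "#75507b",  # plum
    "#cc0000",  # scarlet red
]


def assign_colours(releases):
    """Map each release to a colour, wrapping around when colours run out."""
    return dict(zip(releases, cycle(TANGO_MEDIUM)))


# Hypothetical release labels, for illustration only.
colours = assign_colours(["r1", "r2", "r3.0", "r3.1", "r3.2", "r4.0"])
print(colours["r1"], colours["r4.0"])  # the sixth release reuses the first colour
```

With five colours and six releases, the sixth release wraps around to the first colour, which is why labelling the graph areas directly may be clearer than a legend.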

Member

woju commented May 11, 2018

@h01ger @andrewdavidwong What do you say?
image

h01ger commented May 11, 2018

Member

marmarek commented May 11, 2018

What does the red line mean?

Member

woju commented May 11, 2018

@marmarek I changed the methodology for counting Tor users to the proportional estimate that was floating around. It's described in the text below the graph.

@h01ger Yes, the older series are missing labels, but that's OK, since the point of this graph is to get an idea of when older releases have low enough usage to phase out support.

Member

woju commented May 11, 2018

I'd rather not force labels onto r{1..2}, since the annotation placement algorithm is complicated enough. I'd like this graph to need as little maintenance as possible, and past experience was that it had issues which grew over time. In the current iteration, I think I've cleared most of those time-related problems, and the script is approaching stability.

Member

woju commented May 11, 2018

@andrewdavidwong Could you update the /statistics/ page with a description of the new algorithm for counting Tor users? I don't know how to write it in proper, scientific English, but this is how it works:

  1. Normal users are counted as before: 1 unique IPv4 address = 1 user.
  2. With Tor, counting that way really counted exit nodes, so we now estimate Tor users as follows:
    tor_users = tor_requests * (plain_users / plain_requests)
    (with plain_users being the number of unique IP addresses, as noted in 1).

The idea was suggested by Henry de Valence, whom @mfc met at Tor Summit 2016.
Here is the relevant commit, with the important line highlighted: woju/qubes-stats@6f9c637#diff-ed1f7d020d4ec19f551f2a7aee66de36R182.
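
As a rough illustration of the formula above (not the code from the linked commit), the estimate could be written like this. The function name, the zero-traffic handling, and the numbers are all made up for the example.

```python
def estimate_tor_users(tor_requests, plain_users, plain_requests):
    """Proportional estimate: assume Tor users make as many requests per
    user as plain (non-Tor) users do."""
    if plain_requests == 0:
        # No plain traffic to calibrate against; returning 0 is an assumption.
        return 0.0
    return tor_requests * (plain_users / plain_requests)


# Made-up numbers: 1000 unique plain IPs made 4000 requests,
# and 800 requests arrived via Tor exit nodes.
print(estimate_tor_users(800, 1000, 4000))  # 200.0
```

Here the plain traffic works out to 0.25 users per request, so 800 Tor requests are estimated as 200 users.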

Member

andrewdavidwong commented May 12, 2018

If I understand correctly, the intuitive idea is to estimate the number of Tor users by assuming that the ratio of users to requests is the same for both clearnet users and Tor users. Is that right?

How were we counting them before?

Member

woju commented May 12, 2018

Yes, that's the idea. I don't know if that assumption is correct, though. I'm no statistician, but people who seemed wiser than me suggested it.

Before, we counted them like everyone else: 1 IP = 1 user. We just distinguished them by cross-referencing the official database of exit nodes (that cross-reference is still the same: an IP counts as Tor if there was an exit node active at that address, with up to 24 h tolerance). So we really counted exit nodes. You can see that the Tor number for stable releases saturates faster than that of plain HTTPS users. One conclusion would be that Tor users are more conscientious and update their systems faster, which is not that unreasonable. But the other explanation is that there is a problem with the measurement. Shouldn't the ratio of Tor users to all users remain approximately the same, irrespective of the total number?
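
The exit-node cross-reference could look roughly like the sketch below. The data layout (`exit_node_sightings` as a dict of IP to timestamps), the function name, and the symmetric interpretation of the 24 h tolerance are assumptions made for the example; the real script may differ.

```python
from datetime import datetime, timedelta

TOLERANCE = timedelta(hours=24)


def is_tor_request(ip, when, exit_node_sightings, tolerance=TOLERANCE):
    """Return True if `ip` was listed as an active exit node within
    `tolerance` of the request time `when`."""
    return any(
        abs(when - seen) <= tolerance
        for seen in exit_node_sightings.get(ip, ())
    )


# Hypothetical data: documentation-range IP addresses, made-up timestamps.
sightings = {"192.0.2.10": [datetime(2018, 5, 1, 12, 0)]}

print(is_tor_request("192.0.2.10", datetime(2018, 5, 2, 6, 0), sightings))     # True
print(is_tor_request("192.0.2.10", datetime(2018, 5, 3, 18, 0), sightings))    # False
print(is_tor_request("198.51.100.7", datetime(2018, 5, 1, 12, 0), sightings))  # False
```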

IIRC, there were some arguments for leaving it the old way, most of which boiled down to users' privacy: if they are using Tor, the argument goes, we should not only protect their identity but also obscure the number, or at least not make an active effort to improve its accuracy (if not omit it entirely). I don't recall who made that argument, but it may have been the Whonix community.

I personally think they're partially right, in that we're just not interested in the absolute number of people (and certainly not in their personal data), because ITL funding hardly depends on adoption rates, and that's orthogonal to Tor or any other technology. However, some people on the team depend on donations, so the question of the number comes back. Last but not least, some people derive their bragging rights from the current number of active users, so that's also important to them (that was one of the arguments for removing the "any" line, which was lower than the stacked graph).

So, given that the main purpose is to get an idea of how fast people upgrade to a new version and when to end support for obsolete versions, we're really interested in traffic per version. The secondary goal, estimating Qubes' popularity, was served by counting IPs (and not, for example, by counting requests and dividing by 4 on the grounds that by default people's machines check for updates once every 6 hours, because people's habits and Internet connections vary). But for Tor that felt wrong, so here's the new approach.
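
To illustrate why counting IPs was preferred over dividing requests by 4, here is a small sketch comparing the two on a made-up one-day log. The log format and both function names are hypothetical.

```python
def users_by_unique_ips(requests):
    """Count one user per distinct IP address (the approach described above)."""
    return len({ip for ip, _version in requests})


def users_by_request_rate(requests, checks_per_day=4):
    """The rejected alternative: divide one day's requests by the default
    number of update checks per day (24 h / 6 h = 4)."""
    return len(requests) / checks_per_day


# Hypothetical one-day log of (ip, version) pairs: one machine that was
# online all day, and one that was switched on only briefly.
day_log = [("192.0.2.1", "r4.0")] * 4 + [("192.0.2.2", "r3.2")] * 1

print(users_by_unique_ips(day_log))    # 2
print(users_by_request_rate(day_log))  # 1.25
```

The machine that was only briefly online is undercounted by the request-rate method, which is the kind of variation in habits and connections mentioned above.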

Sorry for the core dump. If you'd like, feel free to pick whatever is useful and maybe organise it into some kind of FAQ below the graph at /statistics/. And if something is still unclear, I'm happy to answer questions.

andrewdavidwong added a commit to QubesOS/qubes-doc that referenced this issue May 12, 2018

@andrewdavidwong andrewdavidwong referenced this issue in QubesOS/qubes-doc May 12, 2018

Merged

Add FAQ to Statistics page #649

andrewdavidwong added a commit to QubesOS/qubes-doc that referenced this issue May 13, 2018

Member

andrewdavidwong commented May 13, 2018

@woju: The FAQ is live. I suggest removing all the text in the lower-right corner of the image to avoid conflicting information in case the FAQ is updated in the future.

marmarek added a commit to QubesOS/qubesos.github.io that referenced this issue May 13, 2018

autoupdate: _doc
_doc:
    gpg: Good signature from "Andrew David Wong (Qubes Documentation Signing Key)" [ultimate]
    object 47665c8e263d6cd374865910c7a4d75dea7deac4
    type commit
    tag adw_47665c8e
    tagger Andrew David Wong <adw@andrewdavidwong.com> 1526236328 -0500

    Tag for commit 47665c8e263d6cd374865910c7a4d75dea7deac4

    47665c8 Merge branch 'awokd-patch-1'
    8d34aac Merge branch 'patch-1' of https://github.com/awokd/qubes-doc into awokd-patch-1
    6635f9e Merge branch 'stats-faq' (QubesOS/qubes-issues#3858)
    612d924 tag as R3.2 content
    4cb0c47 Update Statistics FAQ (#649)
    c937c8e Add FAQ to Statistics page (QubesOS/qubes-issues#3858)

Member

andrewdavidwong commented May 13, 2018

@woju: Please use an ISO 8601 format for the "last updated" date. We have an international audience, and region-specific date formats cause confusion.
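
For reference, a minimal sketch of the difference, assuming the graph is generated by a Python script and using an arbitrary example date:

```python
from datetime import date

last_updated = date(2018, 5, 13)  # example date only

iso_label = last_updated.isoformat()                # ISO 8601: YYYY-MM-DD
regional_label = last_updated.strftime("%d.%m.%Y")  # region-specific: dd.mm.yyyy

print(iso_label)       # 2018-05-13
print(regional_label)  # 13.05.2018
```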

Member

woju commented May 14, 2018

I'd retain the info about the red line for at least 3 years until it goes off the graph, since the graph may be copied and pasted into slide decks.

As for the date, I'll replace it in the .json file, for machine readability. But for the graph, as a European, I'd say dd.mm.yyyy is the international date format. It would be confusing if I used / as the separator, because then there would be a possibility that the date was written by someone from a certain offshore region. :->
https://en.wikipedia.org/wiki/Date_format_by_country

Member

marmarek commented May 14, 2018

> But for the graph, as a European, I'd say dd.mm.yyyy is the international date format.

ISO 8601 is the standard, regardless of what you think about it.

h01ger commented May 14, 2018

Member

woju commented May 14, 2018

Well, we don't order the graphs, OK, let it be that way. Goes live tonight.
image

@woju woju closed this May 14, 2018

Member

woju commented May 14, 2018

Guys, I'm closing this. There isn't much more room for improvement, and I've already spent a lot of time on it. Let's revisit it when the red line goes off the left side, and remove the comment then. Or when something breaks.

h01ger commented May 14, 2018
