(Re)consider hiding current month in Qubes userbase estimate chart (aka "Statistics") #3858
andrewdavidwong added the enhancement, C: website, and UX labels on Apr 25, 2018
andrewdavidwong added this to the Documentation/website milestone on Apr 25, 2018
andrewdavidwong assigned woju on Apr 25, 2018
woju
Apr 25, 2018
Member
The current month is helpful for estimating the adoption rate after a release. Otherwise we'd have to wait a full month before knowing how quickly people install a new version. The recent R4.0 release is a good example: as of now, we wouldn't yet have any information about how people are adopting it.
Consider that the numbers on the vertical axis have little connection with the actual number of users or machines; what really matters is the relative change. And the current month is actually a good example of how that works. So instead of hiding it, maybe we could provide a better explanation of what this graph is really good for. Wouldn't it be a good idea to stop calling it "userbase estimation" and rename it "estimated adoption rate" instead?
woju
Apr 25, 2018
Member
And while we're at it, maybe we could replace the Tor estimation with that proportional estimation. I still haven't done that because it would introduce yet another source of confusion, which I'd have to explain to anyone reading the graph.
andrewdavidwong
Apr 26, 2018
Member
> The current month is helpful for estimating the adoption rate after a release. Otherwise we'd have to wait a full month before knowing how quickly people install a new version. The recent R4.0 release is a good example: as of now, we wouldn't yet have any information about how people are adopting it.
In that case, how about visually distinguishing the current month, e.g., by using a dotted line for its border or blurring it, to indicate that it's still in the process of growing?
> Consider that the numbers on the vertical axis have little connection with the actual number of users or machines; what really matters is the relative change. And the current month is actually a good example of how that works. So instead of hiding it, maybe we could provide a better explanation of what this graph is really good for. Wouldn't it be a good idea to stop calling it "userbase estimation" and rename it "estimated adoption rate" instead?
But we've been treating it as an estimate of the number of users. In many posts in News, for example, we've pointed out that there are approximately 30k (or whatever number at the time) users.
woju
Apr 27, 2018
Member
> In that case, how about visually distinguishing the current month, e.g., by using a dotted line for its border or blurring it, to indicate that it's still in the process of growing?
That's doable. I'll see to it in a week, since this week is a holiday in Poland and I'm going to enjoy it.
> But we've been treating it as an estimate of the number of users. In many posts in News, for example, we've pointed out that there are approximately 30k (or whatever number at the time) users.
I don't think that's correct (I mean the number, not the fact that we said it). Maybe the order of magnitude is accurate, which would justify calling it "approximately 30k".
The longer this experiment runs, the less I trust it to yield an absolute number of users, and I think that's a feature, since we don't want to know too much about people for privacy reasons. But the rate of change is quite reliable, and that's what we need around releases. For example, we promised R3.2 support until spring 2019, but
woju
May 9, 2018
Member
OK, so here is a new version:
- Tango colours
- current month one step lighter
- Tor users counted using a new method suggested by a participant of Tor Summit 2016, via @mfc
- annotation to make it clear that the methodology changed
After the change to Tor counting, the overall number of users didn't change much. The counts for recent versions doubled, but for older versions the number actually went down. I don't know why.
andrewdavidwong
May 10, 2018
Member
Thanks, @woju. Perhaps we could add some text explaining that the lighter shade means that the current month is not yet complete? Some readers probably won't notice that it's a shade lighter, and others may notice but not understand what it means.
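A minimal Python sketch of how a chart script might detect that a plotted month is still in progress and generate the kind of explanatory note suggested here. The function names and the note wording are hypothetical; they are not taken from the actual qubes-stats script.

```python
import calendar
from datetime import date
from typing import Optional


def is_partial_month(year: int, month: int, today: date) -> bool:
    """True if (year, month) is the month containing `today`, i.e. still in progress."""
    return (year, month) == (today.year, today.month)


def partial_month_note(year: int, month: int, today: date) -> Optional[str]:
    """Return an annotation for the incomplete current month, else None."""
    if not is_partial_month(year, month, today):
        return None
    # monthrange() returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return (f"{year}-{month:02d} is not yet complete "
            f"(day {today.day} of {days_in_month}); its bar will keep growing")
```

The same predicate could drive both the lighter shade and the text, so the two never disagree.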
@andrewdavidwong Sure. If there is nothing more, it goes live tonight as is.
h01ger
May 10, 2018
On Thu, May 10, 2018 at 02:40:23AM -0700, Wojtek Porczyk wrote:

I find this very confusing:
- I don't see any gray areas in the bars, so no Tor users?
- why do r1 and r3.2 users have the same colors (in the legend in the upper left)?
- why do r2 and r4.0 users have the same colors (in the legend in the upper left)?
- why do r3.0 and r4.1 users have the same colors (in the legend in the upper left)?
- why do r3.1 and Tor users have the same colors (in the legend in the upper left)?
- what's the meaning of the darker colors of the bars on the right?
I do like the tango colors! :)
--
cheers,
Holger
woju
May 10, 2018
Member
Thanks. I also like them.
Tor users are the darker areas. I don't know how to make that clear in the legend, but I'd rather not take up twice as much space.
The colours cycle because there is an unbounded number of Qubes releases but a limited number of colours in Tango. (And I'd like to pick only one of yellow/brown/orange because they feel similar.) Maybe I'll try to label the actual graph areas and remove the legend.
@h01ger @andrewdavidwong What would you say?
h01ger
May 11, 2018
On Fri, May 11, 2018 at 09:50:03AM -0700, Wojtek Porczyk wrote:
> @h01ger @andrewdavidwong What would you say?

Much better, thanks!
(Just: which color are the 4.1 users? I suspect there are so few that I can't see them, so maybe omit the label for now? Also, I believe 2.x is missing a label.
And maybe add a note like "shaded/darker areas are Tor users"...)
--
cheers,
Holger
What does the red line mean?
woju
May 11, 2018
Member
@marmarek I changed the methodology of counting Tor users to the proportional estimate that was floating around. It's described in the text below the graph.
@h01ger Yes, older series are missing labels, but that's OK, since the purpose of this graph is to get an idea of when usage of older releases is low enough to phase out support.
woju
May 11, 2018
Member
I'd rather not force labels on r{1..2}, since the annotation placement algorithm is complicated enough. I'd like this graph to need as little maintenance as possible, and previous experience was that it had issues which grew over time. In the current iteration I think I've cleared most of those time-related problems, and the script is approaching stability.
woju
May 11, 2018
Member
@andrewdavidwong Could you update the /statistics/ page with a description of the new algorithm for counting Tor users? I don't know how to write it in proper, scientific English, but this is how it works:
- Normal users are counted as before: 1 unique IPv4 address = 1 user.
- For Tor, counting that way would really count exit nodes, so we count Tor users as follows:
  tor_users = tor_requests * (plain_users / plain_requests)
  (with plain_users being unique IP addresses, counted as noted above).
The idea was suggested by Henry de Valence, whom @mfc met at Tor Summit 2016.
Here is the relevant commit, with the important line highlighted: woju/qubes-stats@6f9c637#diff-ed1f7d020d4ec19f551f2a7aee66de36R182.
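The formula can be sketched in Python as follows. This is a minimal sketch of the stated formula, not the code from the qubes-stats commit; the function name is hypothetical, and the granularity at which the three counts are taken (per month, per release) is left to the caller as an assumption.

```python
def estimate_tor_users(tor_requests: int, plain_users: int,
                       plain_requests: int) -> float:
    """Proportional estimate of Tor users.

    Assumes Tor users make, on average, as many requests per user as
    clearnet ("plain") users, whose count is the number of unique IPv4
    addresses (1 unique address = 1 user).
    """
    if plain_requests <= 0:
        raise ValueError("plain_requests must be positive to form the ratio")
    # Equivalent to tor_requests * (plain_users / plain_requests);
    # multiplying first avoids rounding the ratio separately.
    return tor_requests * plain_users / plain_requests
```

With made-up numbers: if 1,000 unique plain IPs produce 40,000 requests (40 requests per user), then 6,000 Tor requests correspond to an estimated 150 Tor users.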
andrewdavidwong
May 12, 2018
Member
If I understand correctly, the intuitive idea is to estimate the number of Tor users by assuming that the ratio of users to requests is the same for both clearnet users and Tor users. Is that right?
How were we counting them before?
woju
May 12, 2018
Member
Yes, that's the idea. I don't know if that assumption is correct though. I'm no statistician, but people who seemed wiser than me suggested that.
Before, we counted them like everyone else: 1 IP = 1 user. We just distinguished them by cross-referencing the official database of exit nodes (that cross-reference is still the same: an IP counts as Tor if there was an exit node active at that address, with up to 24 h tolerance). So we were really counting exit nodes. You can see that the Tor count for stable releases saturates faster than the number of plain HTTPS users. One conclusion would be that Tor users are more conscientious and update their systems faster, which is not that unreasonable. But the other explanation is that there is a problem with the measurement. Shouldn't the ratio of Tor users to all users remain approximately the same, irrespective of the total number?
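The old counting method described here can be sketched in Python to show why it measured exit nodes. The shape of the exit-node data (IP mapped to activity windows) and all names are hypothetical; only the 24-hour tolerance and the "1 unique IP = 1 user" rule come from the description.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical exit-node data: exit IP -> (first_seen, last_seen) windows.
ExitNodeDB = Dict[str, List[Tuple[datetime, datetime]]]

TOLERANCE = timedelta(hours=24)


def is_tor_request(ip: str, when: datetime, exits: ExitNodeDB,
                   tolerance: timedelta = TOLERANCE) -> bool:
    """True if an exit node was active at `ip` within `tolerance` of `when`."""
    return any(first - tolerance <= when <= last + tolerance
               for first, last in exits.get(ip, []))


def old_style_counts(requests: Iterable[Tuple[str, datetime]],
                     exits: ExitNodeDB) -> Tuple[int, int]:
    """Old method: 1 unique IP = 1 user, split into (plain, Tor) counts.

    All Tor users leaving through one exit share its IP, so the Tor
    figure is really a count of exit nodes.
    """
    plain_ips, tor_ips = set(), set()
    for ip, when in requests:
        (tor_ips if is_tor_request(ip, when, exits) else plain_ips).add(ip)
    return len(plain_ips), len(tor_ips)
```

For example, three requests arriving through one exit at 198.51.100.7 (a documentation address) count as a single Tor "user", however many people actually sent them.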
IIRC there were some arguments for leaving it the old way, most of which boiled down to users' privacy: if they are using Tor, the argument goes, we should not only protect their identity but also obscure their number, or at least make no active effort to estimate it more accurately (if not omit it wholly). I don't recall who made that argument, but it may have been the Whonix community.
I personally think they're partially right, in that we're just not interested in the absolute number of people (and certainly not in their personal data), because ITL funding hardly depends on adoption rates, and that's orthogonal to Tor or any other technology. However, some people on the team depend on donations, so the question of the number comes back. Last but not least, some people derive their bragging rights from the current number of active users, so that's also important to them (that was one of the arguments for removing the "any" line, which was lower than the stacked graph).
So, given that the purpose is to get an idea of how fast people upgrade to new versions and when to end support for obsolete ones, what we're really interested in is traffic per version. The secondary goal, estimating Qubes' popularity, was served by counting IPs (and not, for example, by counting requests and dividing by 4, as the default update check every 6 hours would suggest, because people's habits and Internet connections vary). But for Tor that felt wrong, so here's the new approach.
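The rejected request-based alternative is simple arithmetic: a 6-hour check interval means 24 / 6 = 4 checks per machine per day. The Python sketch below (names hypothetical) shows why it breaks down when habits vary: it only counts a machine fully if it is online all day.

```python
HOURS_PER_DAY = 24
DEFAULT_CHECK_INTERVAL_HOURS = 6  # default update-check interval cited above


def machines_from_requests(total_requests: int, days: int,
                           interval_hours: float = DEFAULT_CHECK_INTERVAL_HOURS) -> float:
    """Naive estimate: assumes every machine is online all day and makes
    exactly 24 / interval_hours update checks per day (4 by default)."""
    checks_per_day = HOURS_PER_DAY / interval_hours
    return total_requests / (checks_per_day * days)
```

As a made-up example over 30 days: an always-on machine makes 120 checks, while a laptop online long enough for only one check a day makes 30. The naive estimate gives 150 / (4 × 30) = 1.25 machines, whereas counting unique IPs (assuming each machine keeps its own stable address) gives 2.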
Sorry for the core dump. If you'd like, feel free to pick whatever is useful and organise it into some kind of FAQ below the graph at /statistics/. And if something is still unclear, I'm happy to answer questions.
A commit that referenced this issue was added to QubesOS/qubes-doc on May 12, 2018.
andrewdavidwong referenced this issue in QubesOS/qubes-doc on May 12, 2018: Add FAQ to Statistics page #649 (Merged)
A commit that referenced this issue was added to QubesOS/qubes-doc on May 13, 2018.
andrewdavidwong
May 13, 2018
Member
@woju: The FAQ is live. I suggest removing all the text in the lower-right corner of the image to avoid conflicting information in case the FAQ is updated in the future.
A commit that referenced this issue was added to QubesOS/qubesos.github.io on May 13, 2018.
woju
May 14, 2018
Member
I'd keep the note about the red line for at least 3 years, until it goes off the graph, since the graph may be copied and pasted into slide decks.
As to the date, I'll replace it in the .json file for machine readability. But for the graph, as a European, I'd say dd.mm.yyyy is the international date format. Confusion would arise if I used / as the separator, because then the date might have been written by someone from a certain offshore region. :->
https://en.wikipedia.org/wiki/Date_format_by_country
marmarek
May 14, 2018
Member
> But for the graph, as a European, I'd say dd.mm.yyyy is the international date format.
ISO 8601 is the standard, regardless of what you think about it.
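Both formats are one call away in Python's standard `datetime` module. The sketch below also checks one practical advantage of ISO 8601's ordering: YYYY-MM-DD strings sort in chronological order, while dd.mm.yyyy strings do not. The helper names are hypothetical.

```python
from datetime import date


def iso_label(d: date) -> str:
    """ISO 8601 calendar date, YYYY-MM-DD."""
    return d.isoformat()


def dotted_label(d: date) -> str:
    """The dd.mm.yyyy form proposed in the thread."""
    return d.strftime("%d.%m.%Y")


def sorts_chronologically(dates, label) -> bool:
    """True if sorting the label strings yields chronological order."""
    return sorted(label(d) for d in dates) == [label(d) for d in sorted(dates)]
```

For instance, "01.12.2017" < "14.05.2018" < "31.01.2018" as strings, which puts January 2018 after May 2018; the ISO forms sort correctly.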
h01ger
May 14, 2018
On Mon, May 14, 2018 at 03:46:16AM -0700, Marek Marczykowski-Górecki wrote:
> > But for the graph, as a European, I'd say dd.mm.yyyy is the international date format.
> ISO 8601 _is_ the standard,
I agree with Marek here on ISO 8601 as the relevant standard. Qubes has a worldwide audience; it's not limited to Europe.
Just look at the top of https://qubes-os.org: at least half of the depicted experts are US Americans ;)
And ISO 8601's ordering makes a lot of sense as well.
--
cheers,
Holger
woju closed this on May 14, 2018
woju
May 14, 2018
Member
Guys, I'm closing this; there isn't much more room for improvement, and I've already spent a lot of time on it. Let's revisit it, and remove the comment, when the red line goes off the left side. Or when something breaks.
h01ger
May 14, 2018
On Mon, May 14, 2018 at 10:08:27AM -0700, Wojtek Porczyk wrote:
> Well, we don't order the graphs, OK, let it be that way. Goes live tonight.

Cool! Thanks for all your work on this!
--
cheers,
Holger




andrewdavidwong commented on Apr 25, 2018
[Branched from the discussion in #3841.]
Some people misinterpret the lower bar height of the current month as a drop in users. They don't understand that the bar will continue to increase in height because there's still time left to go before the end of the month. Hiding the current month would prevent this misunderstanding.
CC: @woju, @marmarek, @rootkovska