Arithmetic Median for relative values #365

Closed
Richargh opened this issue Feb 18, 2019 · 10 comments · Fixed by #894
Labels
pr-visualization Issues that touch the visualization pr(oject) which means web and desktop features.

Comments

@Richargh

Richargh commented Feb 18, 2019

Feature Request

If a metric is relative, the arithmetic median of the values should be shown when hovering over folders instead of their sum. Also show an arithmetic median symbol.

When I hover over a single building
Then the metric without any symbol is shown

Given I have a cc.json that contains relative attribute types
When I hover over a package
and the metric selected is a relative one
Then the arithmetic median for that metric with the correct symbol should be shown.

Calculating the Median

If you have the three numbers 2, 3 and 2000, the median is 3. If you have an even amount of numbers, it's apparently the sum of the two middle numbers, divided by two. At least that is my understanding, please confirm from a different source as well.
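
The rule described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not code from the project; the function name `median` is chosen for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)          # ascending order
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the single middle element.
        return ordered[middle]
    # Even count: sum of the two middle elements, divided by two.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 3, 2000]))     # odd count, prints 3
print(median([2, 3, 5, 2000]))  # even count, (3 + 5) / 2, prints 4.0
```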

This list of statistical symbols suggests that the median symbol looks like a wave (a tilde) above the variable, e.g. x̃.

Richargh added the pr-visualization label Feb 18, 2019
Richargh changed the title from "Average for relative values" to "Arithmetic Mean for relative values" May 29, 2019
@NearW
Contributor

NearW commented May 31, 2019

We don't create attributeTypes in the cc.json right now. They should tell us whether a metric value is to be interpreted as a relative or an absolute one. This needs to be implemented before or while working on this issue.

@NearW
Contributor

NearW commented Jun 11, 2019

We will still have problems with metrics that depend on rloc.
For example: I have a folder with 2 files. fileA has 1,000,000 lines of code and fileB has 20. fileA has a test coverage of 100% and fileB has 0%. We'd display 50% even though this value is not representative of that folder.

@Richargh
Author

Richargh commented Jun 14, 2019

We'd display 50% even though this value is not representative of that folder.

Well, it's correct from the point of view of the mean. But yeah, it's not representative. Do you have a different idea for an average? The one I know is the median, which is:

generally used for skewed distributions
A median can be computed by listing all numbers in ascending order and then locating the number in the centre of that distribution.

If you have the three numbers 2, 3 and 2000, the median is 3. If you have an even amount of numbers, it's apparently the sum of the two middle numbers, divided by two. For your example it's still 50% coverage. I see no way around that besides also displaying a deviation. Do you have another idea?
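
To make the two-file example concrete, here is a small Python sketch using the standard `statistics` module. The list `coverage` is just the example data from this thread, and the population standard deviation is only one possible reading of "deviation".

```python
from statistics import mean, median, pstdev

# Test coverage in percent for fileA and fileB from the example above.
coverage = [100, 0]

summary = {
    "mean": mean(coverage),         # (100 + 0) / 2
    "median": median(coverage),     # even count: average of the two middle values
    "deviation": pstdev(coverage),  # population standard deviation
}

# Mean and median agree at 50, and the deviation is as large as the
# value itself, which is the hint that 50% is not representative.
print(summary)
```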

@alschmut
Contributor

We could use this awesome lightning fa-icon to point out that the value might not be representative (when the deviation is too high. Well, when is it too high?). And on hover we could show the deviation value. Okay, I admit it's more of a joke, but it's an idea 😄

@Richargh
Author

@alschmut We might do that based on user feedback. We probably don't need it right away. In the situation that @NearW pointed out, the users can clearly see that there are only two buildings.

Richargh changed the title from "Arithmetic Mean for relative values" to "Arithmetic Median for relative values" Jun 17, 2019
@Richargh
Author

I changed the request to calculate the median instead of the mean.

@alexhunziker
Contributor

The median is not only applicable to relative values; there are multiple other metrics where the median makes more sense than summing them up, e.g. age in weeks, number of authors, ...

I suggest we change the name of the attribute types to reflect that (e.g. AttributeType.median) and set reasonable values in the importers. In future versions it may also make sense to let the user switch between different aggregation methods in the visualization.
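
As a rough sketch of what choosing the aggregation by attribute type could look like: the function name `aggregate` is hypothetical, and the type strings "absolute" and "relative" are taken from the cc.json excerpts in this thread. It is not the project's actual implementation.

```python
from statistics import median


def aggregate(values, attribute_type):
    """Aggregate the metric values of a folder's children.

    Hypothetical helper: "absolute" metrics are summed,
    "relative" metrics are reduced to their median.
    """
    if attribute_type == "absolute":
        return sum(values)
    if attribute_type == "relative":
        return median(values)
    raise ValueError(f"unknown attribute type: {attribute_type!r}")


rloc = [120, 30, 50]        # absolute: lines of code add up
coverage = [80, 100, 40]    # relative: percentages do not add up

print(aggregate(rloc, "absolute"))      # 200
print(aggregate(coverage, "relative"))  # 80
```

Letting the user switch aggregation methods later would then amount to passing a different type or function instead of reading it from the file.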

@alschmut
Contributor

@alexhunziker I totally agree! But I would not rename the attributeType relative to median inside the cc.json. In my opinion the word "relative" still describes the functionality correctly, and therefore a renaming would only cause more issues with existing cc.json files.

@alexhunziker
Contributor

Speaking of the .cc.json files; in our documentation we describe the AttributeType like this:

"attributesType": {
    "title": "attributesType list",
    "type": "object",
    "properties": {
        "sum": {
            "title": "sum of aggregated attribute numbers",
            "type": "number"
        },
        "average": {
            "title": "average of aggregated attribute numbers",
            "type": "number"
        }
    }
}

Whereas the cc.json of our demo file contains this:

"attributeTypes":{
    "nodes":[
        {"rloc":"absolute"},
        {"avgCommits":"absolute"},
        ...
    ]}

I believe this is not consistent with the above schema. Or am I wrong?
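
For illustration, this is how the shape used by the demo file could be read in Python. It is a sketch under assumptions: the helper name `node_attribute_types` is made up, and only the two entries visible above are included because the rest of the demo file is elided.

```python
import json

# Only the entries shown in the demo excerpt; the elided ones are left out.
demo = """
{
  "attributeTypes": {
    "nodes": [
      {"rloc": "absolute"},
      {"avgCommits": "absolute"}
    ]
  }
}
"""


def node_attribute_types(cc_json_text):
    """Flatten the list of single-key objects into one metric -> type mapping."""
    data = json.loads(cc_json_text)
    types = {}
    for entry in data.get("attributeTypes", {}).get("nodes", []):
        types.update(entry)
    return types


print(node_attribute_types(demo))
# {'rloc': 'absolute', 'avgCommits': 'absolute'}
```

The demo file maps metric names to type strings under "attributeTypes" / "nodes", while the documented schema names the key "attributesType" and lists numeric "sum" and "average" properties, so a reader written against one shape would not accept the other.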

@alschmut
Contributor

That is actually true. I think we should therefore address the JSON validation.

alexhunziker added a commit that referenced this issue Feb 4, 2020
alexhunziker added a commit that referenced this issue Mar 5, 2020
alexhunziker added a commit that referenced this issue Mar 5, 2020
alexhunziker added a commit that referenced this issue Mar 5, 2020
alexhunziker added a commit that referenced this issue Mar 12, 2020
alexhunziker added a commit that referenced this issue Mar 12, 2020
alexhunziker added a commit that referenced this issue Mar 12, 2020
alexhunziker added a commit that referenced this issue Mar 12, 2020
alexhunziker added a commit that referenced this issue Mar 12, 2020
alexhunziker added a commit that referenced this issue Mar 16, 2020
alexhunziker added a commit that referenced this issue Mar 21, 2020
alexhunziker added a commit that referenced this issue Mar 21, 2020
alexhunziker added a commit that referenced this issue Mar 21, 2020
alexhunziker added a commit that referenced this issue Mar 21, 2020
alexhunziker self-assigned this Mar 22, 2020
alexhunziker added a commit that referenced this issue Mar 28, 2020
NearW added a commit that referenced this issue Apr 8, 2020
NearW closed this as completed in #894 Apr 8, 2020
4 participants