Statistics library for .NET Standard 1.0.

Basic descriptive statistics algorithms, implemented sequentially in both synchronous and asynchronous variants.

Implementations are based on:

- `IEnumerable<T>` as extension methods (LINQ - `System.Linq`)
  - netstandard1.0
  - Implementation [DONE]
  - Tests [INPROGRESS]
- `Span<T>` and `Memory<T>` from `System.Memory`
  - netstandard1.1
  - Implementation [INPROGRESS]
  - Tests [INPROGRESS]
- `IAsyncEnumerable<T>` from ``
  - netstandard1.0
  - Implementation [INPROGRESS]
  - Tests [INPROGRESS]
- `ArrayList` (not encouraged, just an option for those who must use it in legacy code)
  - netstandard1.3
  - Implementation [INPROGRESS]
  - Tests [INPROGRESS]
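
A minimal sketch of the two main API shapes listed above, assuming an arithmetic mean over `double` values. The class `MeanSketch` and the method name `MeanArithmeticSketch` are hypothetical and do not reproduce the library's signatures. The `ReadOnlySpan<T>` overload needs C# 7.2 and `System.Memory` (or a runtime that ships `Span<T>`).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the library's actual API.
public static class MeanSketch
{
    // IEnumerable<T> shape: LINQ-like extension method, one pass over any sequence.
    public static double MeanArithmeticSketch(this IEnumerable<double> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        double sum = 0.0;
        long count = 0;
        foreach (double value in source)
        {
            sum += value;
            count++;
        }

        if (count == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return sum / count;
    }

    // Span<T> shape: same algorithm over contiguous memory, no enumerator allocation (C# 7.2+).
    public static double MeanArithmeticSketch(this ReadOnlySpan<double> source)
    {
        if (source.IsEmpty) throw new InvalidOperationException("Span contains no elements.");

        double sum = 0.0;
        for (int i = 0; i < source.Length; i++)
        {
            sum += source[i];
        }
        return sum / source.Length;
    }
}
```

The span overload reads contiguous memory by index. The `IEnumerable<T>` overload accepts any sequence, including lazy ones, and walks it in a single pass.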

NOTE: Parallel algorithms based on the Task Parallel Library (TPL) can be found in

The grouping into categories might be wrong (comments are welcome as issues), but the idea was simply to make implementation and maintenance easier.

Implemented and planned (TODOs):

- Measures of Central Tendency
  - Implemented
  - Tested
  - Average (Mean) Value algorithms
    - Arithmetic mean
      - Implemented
      - Tested
    - Weighted arithmetic mean
      - Implemented
      - Tested
    - Geometric mean
      - Implemented
      - Tested
    - Harmonic Mean
      - https://en.wikipedia.org/wiki/Harmonic_mean
      - [x] Implemented
      - [x] Tested
    - Quadratic AKA Root Mean Square
    - Cubic Mean
      - Implemented
      - Tested
    - Generalized AKA Power Mean
      - Implemented
      - Tested
    - Weighted Generalized Mean
      - Implemented
      - Tested
  - Median
    - Implemented
    - Tested
  - Modes
    - Implemented
    - Tested
- Summary tables
  - Frequencies
    - Frequency Counter
      - Implemented
      - Tested
    - Frequencies
      - Implemented
      - Tested
    - Frequency Distribution
      - Implemented
      - Tested
- Shape of a probability distribution
  - Skewness
    - Pearson's moment coefficient of skewness
    - Pearson's first skewness coefficient (mode skewness)
    - Pearson's second skewness coefficient (median skewness)
  - Kurtosis
    - Differences between formulas: for more references, see the documentation of the R package 'e1071', pages 27 and 47.
- Measures of Dispersion
  - Moments
    - Implemented
    - Tested
    - Generalised
      - Implemented
      - Tested
    - Central
      - Implemented
      - Tested
    - Central Absolute
      - Implemented
      - Tested
    - Raw
      - Implemented
      - Tested
    - Raw Absolute
      - Implemented
      - Tested
  - Variance (Sample and Population)
  - Standard Deviation (Sample and Population)
  - Range
  - Coefficient of variation
  - Quantile
    - [ ] Implemented
    - [ ] Tested
    - Quartile
      - Implemented
      - Tested
    - [ ] Percentile
      - Implemented
      - Tested
    - Interquartile range
      - Implemented
      - Tested
- Dependency Measures
  - Correlation
    - [ ] Pearson product-moment correlation
    - [ ] Spearman's rho
    - [ ] Kendall's tau
  - Covariance
- Inferential Methods (will be moved to a separate repo/nuget)
  - F Statistics
  - Student's t Statistic
  - Welch's t Statistic
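
The following sketch covers the mean algorithms in the list above. The arithmetic, quadratic, cubic and harmonic means are all special cases of the generalized (power) mean M_p = ((1/n) Σ x_i^p)^(1/p), with p = 1, 2, 3 and -1 respectively. The geometric mean is its limit as p tends to 0. The class and method names are hypothetical, and the sketch assumes strictly positive inputs wherever logarithms or negative powers are involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the mean family; names do not mirror the library's API.
public static class MeansSketch
{
    // Geometric mean through the mean of logarithms, which avoids overflowing the raw product.
    // Assumes every value is strictly positive.
    public static double GeometricMean(this IEnumerable<double> source)
    {
        double logSum = 0.0;
        int n = 0;
        foreach (double v in source) { logSum += Math.Log(v); n++; }
        if (n == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return Math.Exp(logSum / n);
    }

    // Generalized (power) mean M_p = ((1/n) * sum x^p)^(1/p); p = 0 is the geometric-mean limit.
    public static double PowerMean(this IEnumerable<double> source, double p)
    {
        if (p == 0.0) return source.GeometricMean();
        double sum = 0.0;
        int n = 0;
        foreach (double v in source) { sum += Math.Pow(v, p); n++; }
        if (n == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return Math.Pow(sum / n, 1.0 / p);
    }

    public static double ArithmeticMean(this IEnumerable<double> s) => s.PowerMean(1.0);
    public static double QuadraticMean(this IEnumerable<double> s) => s.PowerMean(2.0); // root mean square
    public static double CubicMean(this IEnumerable<double> s) => s.PowerMean(3.0);
    public static double HarmonicMean(this IEnumerable<double> s) => s.PowerMean(-1.0);

    // Weighted generalized mean ((sum w*x^p) / sum w)^(1/p); p = 1 is the weighted arithmetic mean,
    // p = 0 the weighted geometric mean exp(sum w*ln x / sum w).
    public static double WeightedPowerMean(this IEnumerable<double> source, IEnumerable<double> weights, double p)
    {
        double[] x = source.ToArray();
        double[] w = weights.ToArray();
        if (x.Length != w.Length) throw new ArgumentException("Values and weights must have the same length.");
        double wSum = w.Sum();
        if (x.Length == 0 || wSum == 0.0) throw new InvalidOperationException("Empty input or zero total weight.");

        double acc = 0.0;
        for (int i = 0; i < x.Length; i++)
            acc += p == 0.0 ? w[i] * Math.Log(x[i]) : w[i] * Math.Pow(x[i], p);

        return p == 0.0 ? Math.Exp(acc / wSum) : Math.Pow(acc / wSum, 1.0 / p);
    }
}
```

Routing everything through one power-mean routine keeps the special cases consistent. A dedicated implementation per mean would avoid the `Math.Pow` calls.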
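
This sketch covers the frequency, mode and quantile items; all names are hypothetical. Quantile definitions vary between packages. This one assumes linear interpolation between order statistics (R's default, type 7). The median (q = 0.5) and the interquartile range (Q3 - Q1) both follow from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; names and the quantile definition are assumptions, not the library's API.
public static class OrderStatisticsSketch
{
    // Frequency counter: value -> number of occurrences.
    public static Dictionary<T, int> FrequencyCount<T>(this IEnumerable<T> source)
    {
        var counts = new Dictionary<T, int>();
        foreach (T item in source)
        {
            counts.TryGetValue(item, out int c); // c is 0 when the value is not yet present
            counts[item] = c + 1;
        }
        return counts;
    }

    // Modes: every value sharing the highest frequency (a sample can be multimodal).
    public static List<T> ModesOf<T>(this IEnumerable<T> source)
    {
        Dictionary<T, int> counts = source.FrequencyCount();
        if (counts.Count == 0) return new List<T>();
        int max = counts.Values.Max();
        return counts.Where(kv => kv.Value == max).Select(kv => kv.Key).ToList();
    }

    // Quantile by linear interpolation between order statistics ("type 7"); q must be in [0, 1].
    public static double QuantileLinear(this IEnumerable<double> source, double q)
    {
        if (q < 0.0 || q > 1.0) throw new ArgumentOutOfRangeException(nameof(q));
        double[] sorted = source.OrderBy(v => v).ToArray();
        if (sorted.Length == 0) throw new InvalidOperationException("Sequence contains no elements.");

        double h = (sorted.Length - 1) * q;          // zero-based fractional position
        int lo = (int)Math.Floor(h);
        int hi = Math.Min(lo + 1, sorted.Length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    // Median: middle value for odd n, average of the two middle values for even n.
    public static double MedianOf(this IEnumerable<double> source) => source.QuantileLinear(0.5);

    public static double InterquartileRange(this IEnumerable<double> source)
    {
        double[] x = source.ToArray();
        return x.QuantileLinear(0.75) - x.QuantileLinear(0.25);
    }
}
```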
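
For the skewness and kurtosis items, the formula variants documented in the R package e1071 (its `type` argument, 1 to 3) are shown below. Here m_r is the r-th sample central moment and s is the sample standard deviation (n - 1 denominator). Pearson's mode and median coefficients are included for completeness.

```latex
m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^r,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2

% skewness, e1071 types 1, 2, 3
g_1 = \frac{m_3}{m_2^{3/2}},
\qquad
G_1 = g_1\,\frac{\sqrt{n(n-1)}}{n-2},
\qquad
b_1 = \frac{m_3}{s^3} = g_1\left(\frac{n-1}{n}\right)^{3/2}

% excess kurtosis, e1071 types 1, 2, 3
g_2 = \frac{m_4}{m_2^{2}} - 3,
\qquad
G_2 = \frac{\big((n+1)\,g_2 + 6\big)(n-1)}{(n-2)(n-3)},
\qquad
b_2 = \frac{m_4}{s^4} - 3 = (g_2 + 3)\left(1-\frac{1}{n}\right)^{2} - 3

% Pearson's first (mode) and second (median) skewness coefficients
\mathrm{Sk}_1 = \frac{\bar{x}-\mathrm{mode}}{s},
\qquad
\mathrm{Sk}_2 = \frac{3\,(\bar{x}-\mathrm{median})}{s}
```

The three types converge as n grows. They differ noticeably only for small samples.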

TODO:

- naming discussion

| To Consider | |
|---|---|
| Central Tendencies | OK |
| Dispersion | Variability seems to be OK |
| Shape | Was named Distribution Asymmetry / Roundness |
| Dependency / Dependencies | Correlation, Covariance |

Comparison of this library with:

- vanilla R (no special libs)
- vanilla Python (no special libs)
- Matlab/Octave

x = `IEnumerable<T>`; the C# entries are extension methods.

| | C# HolisticWare | Octave | R | Python |
|---|---|---|---|---|
| Central Tendencies | x.MeanArithmetic(); | mean(vector_list, "a") | mean(vector_list) | |
| | x.MeanGeometric(); | mean(vector_list, "g") | N/A (custom function) | |
| | x.MeanGeometricNaive(); | mean(vector_list, "g") | N/A (custom function) | |
| | x.MeanHarmonic(); | mean(vector_list, "h") | ??? | |
| | x.MeanSquared(); | meansq(vector_list) | ??? | |
| | x.MeanCubic(); | ??? | ???? | |
| | x.MeanWeighted(); | ??? | ???? | |
| | x.Median(); | ??? | ???? | |
| | x.MedianWeighted(); | ??? | ???? | |
| | x.Modes(); | ??? | ???? | |
| | x.ModesRank(); | ??? | ???? | |
| Dispersion | x.Moment(); | ??? | ???? | |
| | x.MomentCentral(); | ??? | ???? | |
| | x.MomentCentralAbsolute(); | ??? | ???? | |
| | x.MomentRaw(); | ??? | ???? | |
| | x.MomentRawAbsolute(); | ??? | ???? | |
| | x.Range(); | ??? | ???? | |
| | x.StandardDeviationPopulation(); | ??? | ???? | |
| | x.StandardDeviationSample(); | ??? | ???? | |
| | x.VariancePopulation(); | ??? | ???? | |
| | x.Variance(); | ??? | ???? | |
| Distribution | x.FrequencyCounter(); | ??? | ???? | |
| | x.FrequencyDistribution(); | ??? | ???? | |
| | x.Frequencies(); | ??? | ???? | |
| | x.Percentiles(); | ??? | ???? | |
| | x.PercentilesRank(); | ??? | ???? | |
| Distribution Asymm | x.Skewness(); | ??? | ???? | |
| Distribution Round | x.Kurtosis(); | ??? | ???? | |
| | x.KurtosisSample(); | ??? | ???? | |
| | x.KurtosisSampleExcess(); | ??? | ???? | |
| Dependency | x.Correlation(y); | ??? | ???? | |
| | x.CorrelationOptimized(y); | ??? | ???? | |
| | x.Covariance(y); | ??? | ???? | |
| Inferential | x.StudenttStatisticIndependent(y); | ??? | ???? | |
| | x.FStatistic(y); | ??? | ???? | |
| | x.Welcht(y); | ??? | ???? | |
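
This sketch illustrates what the Dispersion and Shape rows compute, under a few assumptions. Population variance divides by n and sample variance by n - 1. Skewness is the moment coefficient g1, and kurtosis is reported as excess kurtosis g2. The method names are hypothetical and only loosely follow the table. `IReadOnlyList<double>` is used so that repeated passes do not re-run a lazy query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the Dispersion and Shape rows; not the library's implementation.
public static class MomentsSketch
{
    // Raw moment about zero: (1/n) * sum x^k.
    public static double RawMoment(this IReadOnlyList<double> x, int k) =>
        x.Average(v => Math.Pow(v, k));

    // Central moment about the mean: (1/n) * sum (x - mean)^k.
    public static double CentralMoment(this IReadOnlyList<double> x, int k)
    {
        double mean = x.Average();
        return x.Average(v => Math.Pow(v - mean, k));
    }

    // Population variance divides by n; sample variance divides by n - 1 (Bessel's correction).
    public static double VariancePopulationSketch(this IReadOnlyList<double> x) => x.CentralMoment(2);

    public static double VarianceSampleSketch(this IReadOnlyList<double> x)
    {
        if (x.Count < 2) throw new InvalidOperationException("Sample variance needs at least 2 values.");
        return x.CentralMoment(2) * x.Count / (x.Count - 1);
    }

    public static double StdDevPopulationSketch(this IReadOnlyList<double> x) => Math.Sqrt(x.VariancePopulationSketch());
    public static double StdDevSampleSketch(this IReadOnlyList<double> x) => Math.Sqrt(x.VarianceSampleSketch());

    public static double RangeSketch(this IReadOnlyList<double> x) => x.Max() - x.Min();

    // Coefficient of variation: sample standard deviation relative to the mean.
    public static double CoefficientOfVariationSketch(this IReadOnlyList<double> x) =>
        x.StdDevSampleSketch() / x.Average();

    // Moment coefficient of skewness g1 = m3 / m2^(3/2).
    public static double SkewnessSketch(this IReadOnlyList<double> x) =>
        x.CentralMoment(3) / Math.Pow(x.CentralMoment(2), 1.5);

    // Excess kurtosis g2 = m4 / m2^2 - 3 (0 for a normal distribution).
    public static double KurtosisExcessSketch(this IReadOnlyList<double> x) =>
        x.CentralMoment(4) / Math.Pow(x.CentralMoment(2), 2) - 3.0;
}
```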
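
A similar sketch covers the Dependency and Inferential rows. It computes sample covariance (n - 1 denominator), Pearson's r, Student's t with pooled variance, and Welch's t with Welch-Satterthwaite degrees of freedom. Names are hypothetical, inputs are assumed to be paired for covariance and correlation, and p-values (which need the t distribution) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the Dependency and Inferential rows; names are illustrative only.
public static class TwoSampleSketch
{
    static double SampleVariance(IReadOnlyList<double> x)
    {
        double m = x.Average();
        return x.Sum(v => (v - m) * (v - m)) / (x.Count - 1);
    }

    // Sample covariance (n - 1 denominator) of paired observations.
    public static double CovarianceSample(this IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        if (x.Count != y.Count || x.Count < 2)
            throw new ArgumentException("Need two paired samples of equal length >= 2.");
        double mx = x.Average(), my = y.Average();
        double acc = 0.0;
        for (int i = 0; i < x.Count; i++) acc += (x[i] - mx) * (y[i] - my);
        return acc / (x.Count - 1);
    }

    // Pearson product-moment correlation r = cov(x, y) / (s_x * s_y).
    public static double PearsonCorrelation(this IReadOnlyList<double> x, IReadOnlyList<double> y) =>
        x.CovarianceSample(y) / Math.Sqrt(SampleVariance(x) * SampleVariance(y));

    // Student's t for two independent samples with pooled variance (assumes equal variances).
    public static double StudentTPooled(this IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        int n1 = x.Count, n2 = y.Count;
        double pooled = ((n1 - 1) * SampleVariance(x) + (n2 - 1) * SampleVariance(y)) / (n1 + n2 - 2);
        return (x.Average() - y.Average()) / Math.Sqrt(pooled * (1.0 / n1 + 1.0 / n2));
    }

    // Welch's t (unequal variances) and its Welch-Satterthwaite degrees of freedom.
    public static (double t, double df) WelchT(this IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        double a = SampleVariance(x) / x.Count;
        double b = SampleVariance(y) / y.Count;
        double t = (x.Average() - y.Average()) / Math.Sqrt(a + b);
        double df = (a + b) * (a + b) / (a * a / (x.Count - 1) + b * b / (y.Count - 1));
        return (t, df);
    }
}
```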

- Numerical Methods, Algorithms and Tools in C#
- `IEnumerable<T>` extension methods (LINQ-like)
- `async`/`await` API

  Each synchronous (blocking) method has a processor-bound async counterpart (based on TPL).

  - https://docs.microsoft.com/en-us/dotnet/csharp/async#recognize-cpu-bound-and-io-bound-work
  - https://docs.microsoft.com/en-us/dotnet/standard/async-in-depth
- TPL/PLINQ

  Parallel versions target .NET Standard 1.1, so they are placed in a separate NuGet package and repo:
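
Here is a minimal sketch of that pattern, assuming a simple sum-of-squares kernel (the names are hypothetical). It shows the blocking method, a CPU-bound async counterpart that offloads it with `Task.Run`, and a PLINQ version built on `AsParallel()`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a synchronous method with CPU-bound async and parallel counterparts.
public static class AsyncParallelSketch
{
    // Synchronous (blocking) version.
    public static double SumOfSquares(this IEnumerable<double> source)
    {
        double sum = 0.0;
        foreach (double v in source) sum += v * v;
        return sum;
    }

    // CPU-bound async counterpart: runs the same work on the thread pool via Task.Run,
    // so a UI or request thread can await it without blocking.
    public static Task<double> SumOfSquaresAsync(this IEnumerable<double> source, CancellationToken ct = default)
        => Task.Run(() => source.SumOfSquares(), ct);

    // PLINQ counterpart: partitions the input across cores. Floating-point addition order
    // differs from the sequential loop, so results may differ in the last bits.
    public static double SumOfSquaresParallel(this IEnumerable<double> source)
        => source.AsParallel().Sum(v => v * v);
}
```

The async version only moves the CPU-bound work to the thread pool; it does not make the computation faster. The PLINQ version is the one that uses several cores.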

There are 2 reasons for the lack of `IEnumerable<Nullable<T>>` implementations.

First, when nullable arithmetic is extended to the LINQ implementation, the results are not consistent:

```csharp
using System.Linq;

// lifted operators: a single null operand makes the result null
int? i_sum_1 = 2 + null;          // null
int? i_product_1 = 2 * null;      // null

int? i_sum_2 = 2 + 5 + null;      // null
int? i_product_2 = 2 * 5 * null;  // null

// generalized case (LINQ): Sum() skips null elements
int? i_sum_3 = (new int?[] { 2, 5, null }).Sum(); // 7
```

It is possible to avoid this with the `Aggregate()` extension method, but the decision was made to skip this implementation for the following reason.

Second, one of the first steps in analyzing and processing data is handling missing data, which is usually represented as `null` values. So every `IEnumerable<Nullable<T>>` must be converted/transformed to `IEnumerable<T>` prior to further analysis and processing.
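
The conversion step described above can be sketched as follows. `DropMissing`, `FillMissing` and `SumPropagatingNull` are hypothetical helpers. The last one shows the `Aggregate()` alternative, where a single `null` makes the whole sum `null`, as the lifted operators do.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the missing-data step: IEnumerable<Nullable<T>> -> IEnumerable<T>.
public static class MissingDataSketch
{
    // Listwise deletion: drop missing (null) values.
    public static IEnumerable<T> DropMissing<T>(this IEnumerable<T?> source) where T : struct
        => source.Where(v => v.HasValue).Select(v => v.Value);

    // Simple imputation: replace missing values with a caller-supplied value.
    public static IEnumerable<T> FillMissing<T>(this IEnumerable<T?> source, T fill) where T : struct
        => source.Select(v => v ?? fill);

    // Null-propagating sum via Aggregate(): any null makes the whole result null,
    // matching the behaviour of 2 + 5 + null.
    public static int? SumPropagatingNull(this IEnumerable<int?> source)
        => source.Aggregate((int?)0, (acc, v) => acc + v);
}
```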

- Octave
- API
  - flexible API for Skewness and Kurtosis
- testing
  - unit testing
    - Moments
    - Dispersion
- C# 7.2

  stub already in:
  `tests/unit-tests/UnitTests.CommonShared/Sync/DarkVaderTests/Tests20180119Dataset01/Tests011MeacuresCentralTendencies.MeanArithmetic.cs`

  - `Span<T>`, `ReadOnlySpan<T>`, `Memory<T>`, `ReadOnlyMemory<T>`
- optimizations
  - data caching
    - reuse of precalculated values
  - `async`/`await` tuning
  - parallel algorithms (separate repo/nuget)
- unit tests
  - currently (2018-02-24): 48
  - online calculator sources (references / links), mainly used for comparing results
- more algorithms
  - Multivariate Statistics
  - Inferential Statistics (Statistical Inference)
  - unit tests
- RX

The performance question boils down to the use of `for` vs `foreach`. The current implementation uses `for`. Once benchmarks have been added and tests conducted, the implementation might change.

- https://stackoverflow.com/questions/365615/in-net-which-loop-runs-faster-for-or-foreach
- https://codeblog.jonskeet.uk/2009/01/29/for-vs-foreach-on-arrays-and-lists/
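
This sketch shows the two loop styles for the same reduction (hypothetical names, no performance claim). It uses `for` with an indexer when the source is an `IList<double>` and `foreach` over the enumerator otherwise. Which style is faster depends on the static type (array, `List<T>` or interface) and is what the benchmarks mentioned above should settle.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: both methods compute the same sum with different loop styles.
public static class LoopSketch
{
    // Index-based loop; requires a source with an indexer and a Count.
    public static double SumFor(this IList<double> source)
    {
        double sum = 0.0;
        for (int i = 0; i < source.Count; i++) sum += source[i];
        return sum;
    }

    // Enumerator-based loop; works for any sequence, including lazy ones.
    public static double SumForeach(this IEnumerable<double> source)
    {
        double sum = 0.0;
        foreach (double v in source) sum += v;
        return sum;
    }

    // Dispatch: use `for` for arrays and lists, fall back to `foreach` for other sequences.
    public static double SumFast(this IEnumerable<double> source)
        => source is IList<double> list ? list.SumFor() : source.SumForeach();
}
```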

online calculators

Some ideas and inspiration came from the following libraries:

- Correlation
- LinqStatistics
- Meta.Numerics

Authors

- Darko Katovic - Katodix
  KIF (Faculty of Kinesiology, University of Zagreb, Croatia)
- Miljenko Cvjetko - moljac
  Microsoft (Xamarin Inc.), HolisticWare