HolisticWare-Libraries/HolisticWare.Core

HolisticWare.Core.Math.Statistics.Sequential

Statistics library for .NET Standard 1.0.

Basic descriptive statistics algorithms, implemented sequentially in both synchronous and asynchronous variants.

Implementations are based on:

  • IEnumerable<T> as extension methods (LINQ - System.Linq)

    • netstandard1.0

    • Implementation [DONE]

    • Tests [INPROGRESS]

  • Span<T> and Memory<T> from System.Memory

    • netstandard1.1

    • Implementation [INPROGRESS]

    • Tests [INPROGRESS]

  • IAsyncEnumerable<T> from ``

    • netstandard1.0

    • Implementation [INPROGRESS]

    • Tests [INPROGRESS]

  • ArrayList (not encouraged; offered only as an option for those who must use it in legacy code)

    • netstandard1.3

    • Implementation [INPROGRESS]

    • Tests [INPROGRESS]

NOTE: Parallel algorithms based on the Task Parallel Library (TPL) can be found in

Algorithms

The grouping below might be imperfect (comments are welcome as issues); the idea was simply to make implementation and maintenance easier.

Implemented and planned (TODOs):

  1. Measures of Central Tendency

    • Implemented

    • Tested

    1. Average (Mean) Value algorithms

      1. Arithmetic mean

      2. Weighted arithmetic mean

      3. Geometric mean

      4. Harmonic mean

        • https://en.wikipedia.org/wiki/Harmonic_mean

        • [x] Implemented

        • [x] Tested

      5. Quadratic mean, AKA root mean square

      6. Cubic mean

      7. Generalized mean, AKA power mean

      8. Weighted generalized mean

        • Implemented

        • Tested

  2. Summary tables

    • Frequencies

      • Frequency Counter

        • Implemented

        • Tested

      • Frequencies

        • Implemented

        • Tested

      • Frequency Distribution

        • Implemented

        • Tested

  3. Shape of a probability distribution

    • Skewness

      • Skewness (Alpha 3)

        • Pearson's moment coefficient of skewness

        • Pearson's first skewness coefficient (mode skewness)

        • Pearson's second skewness coefficient (median skewness)

    • Kurtosis

  4. Measures of Dispersion

  5. Dependency Measures

    • Correlation

      • [ ] Pearson product-moment correlation

      • [ ] Spearman's rho

      • [ ] Kendall's tau

    • Covariance

  6. Inferential Methods (will be moved to a separate repo/nuget)

    • F-statistic

    • Student's t-statistic

    • Welch's t-statistic

TODO:

  • naming discussion

    To consider:

    • Central Tendencies: OK

    • Dispersion / Variability: seems to be OK

    • Shape (was named Distribution)

      • Asymmetry

      • Roundness

    • Dependency / Dependencies

      • Correlation

      • Covariance

Comparison Table

Comparison of this library with

x = IEnumerable<T> (extension methods)

| Group | C# HolisticWare | Octave | R | Python |
|---|---|---|---|---|
| Central Tendencies | x.MeanArithmetic(); | mean(vector_list, "a") | mean(vector_list) | |
| | x.MeanGeometric(); | mean(vector_list, "g") | N/A (custom function) | |
| | x.MeanGeometricNaive(); | mean(vector_list, "g") | N/A (custom function) | |
| | x.MeanHarmonic(); | mean(vector_list, "h") | ??? | |
| | x.MeanSquared(); | meansq(vector_list) | ??? | |
| | x.MeanCubic(); | ??? | ???? | |
| | x.MeanWeighted(); | ??? | ???? | |
| | x.Median(); | ??? | ???? | |
| | x.MedianWeighted(); | ??? | ???? | |
| | x.Modes(); | ??? | ???? | |
| | x.ModesRank(); | ??? | ???? | |
| Dispersion | x.Moment(); | ??? | ???? | |
| | x.MomentCentral(); | ??? | ???? | |
| | x.MomentCentralAbsolute(); | ??? | ???? | |
| | x.MomentRaw(); | ??? | ???? | |
| | x.MomentRawAbsolute(); | ??? | ???? | |
| | x.Range(); | ??? | ???? | |
| | x.StandardDeviationPopulation(); | ??? | ???? | |
| | x.StandardDeviationSample(); | ??? | ???? | |
| | x.VariancePopulation(); | ??? | ???? | |
| | x.Variance(); | ??? | ???? | |
| Distribution | x.FrequencyCounter(); | ??? | ???? | |
| | x.FrequencyDistribution(); | ??? | ???? | |
| | x.Frequencies(); | ??? | ???? | |
| | x.Percentiles(); | ??? | ???? | |
| | x.PercentilesRank(); | ??? | ???? | |
| Distribution Asymmetry | x.Skewness(); | ??? | ???? | |
| Distribution Roundness | x.Kurtosis(); | ??? | ???? | |
| | x.KurtosisSample(); | ??? | ???? | |
| | x.KurtosisSampleExcess(); | ??? | ???? | |
| Dependency | x.Correlation(y); | ??? | ???? | |
| | x.CorrelationOptimized(y); | ??? | ???? | |
| | x.Covariance(y); | ??? | ???? | |
| Inferential | x.StudenttStatisticIndependent(y); | ??? | ???? | |
| | x.FStatistic(y); | ??? | ???? | |
| | x.Welcht(y); | ??? | ???? | |

Usage

Central Tendencies Measures
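
As a rough illustration of the mean algorithms listed in this README, the following is a minimal sketch written as LINQ extension methods over IEnumerable<double>. It is not the library's implementation: the class name and the `...Sketch` method names are hypothetical, chosen so they do not clash with the library's own `MeanArithmetic()`, `MeanGeometric()` and so on. The geometric mean is computed through logarithms and assumes strictly positive input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class MeanSketch
{
    // Arithmetic mean: sum(x) / n
    public static double MeanArithmeticSketch(this IEnumerable<double> x)
        => x.Average();

    // Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)
    public static double MeanWeightedSketch(this IEnumerable<double> x, IEnumerable<double> w)
    {
        double[] xs = x.ToArray();
        double[] ws = w.ToArray();
        if (xs.Length != ws.Length)
            throw new ArgumentException("Values and weights must have the same length.");
        return xs.Zip(ws, (v, weight) => v * weight).Sum() / ws.Sum();
    }

    // Geometric mean via logarithms: exp(mean(ln x)); assumes x_i > 0
    public static double MeanGeometricSketch(this IEnumerable<double> x)
        => Math.Exp(x.Average(v => Math.Log(v)));

    // Harmonic mean: n / sum(1 / x_i); assumes x_i != 0
    public static double MeanHarmonicSketch(this IEnumerable<double> x)
        => 1.0 / x.Average(v => 1.0 / v);

    // Generalized (power) mean with exponent p != 0: (mean(x^p))^(1/p)
    // p = 2 gives the quadratic mean (RMS), p = 3 the cubic mean.
    public static double MeanPowerSketch(this IEnumerable<double> x, double p)
        => Math.Pow(x.Average(v => Math.Pow(v, p)), 1.0 / p);

    public static void Main()
    {
        double[] data = { 1.0, 2.0, 4.0 };
        Console.WriteLine(data.MeanArithmeticSketch());
        Console.WriteLine(data.MeanGeometricSketch());
        Console.WriteLine(data.MeanHarmonicSketch());
        Console.WriteLine(data.MeanPowerSketch(2.0));
        Console.WriteLine(data.MeanWeightedSketch(new[] { 1.0, 1.0, 2.0 }));
    }
}
```

For the sample data {1, 2, 4}, the harmonic mean is at most the geometric mean, which is at most the arithmetic mean, which is at most the quadratic mean, as expected for positive inputs.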

Dispersion Measures
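
The difference between the population and sample variants named in the comparison table (for example `StandardDeviationPopulation()` versus `StandardDeviationSample()`) comes down to the divisor: n for the population, n - 1 for the sample. The sketch below illustrates this under that assumption. The class and the `...Sketch` method names are hypothetical and are not the library's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class DispersionSketch
{
    // Sum of squared deviations from the arithmetic mean.
    private static double SumSquaredDeviations(double[] x)
    {
        double mean = x.Average();
        return x.Sum(v => (v - mean) * (v - mean));
    }

    // Population variance: divide by n.
    public static double VariancePopulationSketch(this IEnumerable<double> x)
    {
        double[] a = x.ToArray();
        return SumSquaredDeviations(a) / a.Length;
    }

    // Sample variance: divide by n - 1 (Bessel's correction); needs n >= 2.
    public static double VarianceSampleSketch(this IEnumerable<double> x)
    {
        double[] a = x.ToArray();
        if (a.Length < 2)
            throw new ArgumentException("At least two values are required.");
        return SumSquaredDeviations(a) / (a.Length - 1);
    }

    public static double StandardDeviationPopulationSketch(this IEnumerable<double> x)
        => Math.Sqrt(x.VariancePopulationSketch());

    public static double StandardDeviationSampleSketch(this IEnumerable<double> x)
        => Math.Sqrt(x.VarianceSampleSketch());

    // Range: max - min.
    public static double RangeSketch(this IEnumerable<double> x)
    {
        double[] a = x.ToArray();
        return a.Max() - a.Min();
    }

    public static void Main()
    {
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };
        Console.WriteLine(data.VariancePopulationSketch());          // 4
        Console.WriteLine(data.StandardDeviationPopulationSketch()); // 2
        Console.WriteLine(data.VarianceSampleSketch());
        Console.WriteLine(data.RangeSketch());                       // 7
    }
}
```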

Distribution Measures
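
A frequency counter can be sketched with LINQ's GroupBy: count the occurrences of each distinct value, then optionally divide by the total to obtain relative frequencies. The names below are hypothetical, and treating "frequencies" as relative frequencies is an assumption made here for illustration; the library's `FrequencyCounter()` and `Frequencies()` may return different shapes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class FrequencySketch
{
    // Absolute frequency of each distinct value.
    public static Dictionary<T, int> FrequencyCounterSketch<T>(this IEnumerable<T> x)
    {
        return x.GroupBy(v => v).ToDictionary(g => g.Key, g => g.Count());
    }

    // Relative frequency of each distinct value (the values sum to 1).
    public static Dictionary<T, double> RelativeFrequenciesSketch<T>(this IEnumerable<T> x)
    {
        Dictionary<T, int> counts = x.FrequencyCounterSketch();
        double n = counts.Values.Sum();
        return counts.ToDictionary(kv => kv.Key, kv => kv.Value / n);
    }

    public static void Main()
    {
        int[] data = { 1, 2, 2, 3, 3, 3 };
        foreach (var kv in data.FrequencyCounterSketch().OrderBy(kv => kv.Key))
        {
            Console.WriteLine($"{kv.Key}: {kv.Value}");
        }
    }
}
```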

Distribution Asymmetry Measures

Distribution Roundness Measures
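
Both the asymmetry and the roundness measures can be expressed through central moments. The sketch below assumes the population (moment-based) definitions: Pearson's moment coefficient of skewness is m3 / m2^(3/2), kurtosis is m4 / m2^2, and excess kurtosis subtracts 3. The method names are hypothetical; the library's `Skewness()`, `Kurtosis()` and the sample variants may apply different corrections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class ShapeSketch
{
    // k-th central moment: mean((x_i - mean)^k)
    public static double MomentCentralSketch(this IEnumerable<double> x, int k)
    {
        double[] a = x.ToArray();
        double mean = a.Average();
        return a.Average(v => Math.Pow(v - mean, k));
    }

    // Pearson's moment coefficient of skewness: m3 / m2^(3/2)
    public static double SkewnessSketch(this IEnumerable<double> x)
    {
        double[] a = x.ToArray();
        return a.MomentCentralSketch(3) / Math.Pow(a.MomentCentralSketch(2), 1.5);
    }

    // Kurtosis: m4 / m2^2 (equals 3 for a normal distribution)
    public static double KurtosisSketch(this IEnumerable<double> x)
    {
        double[] a = x.ToArray();
        double m2 = a.MomentCentralSketch(2);
        return a.MomentCentralSketch(4) / (m2 * m2);
    }

    // Excess kurtosis: kurtosis - 3
    public static double KurtosisExcessSketch(this IEnumerable<double> x)
        => x.KurtosisSketch() - 3.0;

    public static void Main()
    {
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };
        Console.WriteLine(data.SkewnessSketch());
        Console.WriteLine(data.KurtosisSketch());
        Console.WriteLine(data.KurtosisExcessSketch());
    }
}
```

A positive skewness, as this sample data gives, indicates a longer right tail.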

Technical/Platform Implementation Details

IEnumerable<Nullable<T>> implementation

There are 2 reasons for the lack of IEnumerable<Nullable<T>> implementations:

1. Mathematical Consistency

When nullable arithmetic is carried over to LINQ, the results are not consistent: the lifted operators return null as soon as any operand is null, whereas LINQ's Sum() skips null values.

// requires: using System.Linq;
// suppose
int? i_sum_1     = 2 + null;        // null
int? i_product_1 = 2 * null;        // null
// 
// suppose
int? i_sum_2     = 2 + 5 + null;    // null
int? i_product_2 = 2 * 5 * null;    // null
// generalized case (LINQ)
// sum(i)
int? i_sum_3     = (new int?[] {2, 5, null}).Sum();      //  7 (nulls are skipped)

It is possible to avoid this inconsistency with the Aggregate() extension method, but the decision was made to skip this implementation for the following reason.
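
To illustrate the Aggregate() alternative mentioned above, the sketch below folds the sequence with the lifted + operator, so a single null makes the whole sum null, matching the scalar arithmetic. The helper name `SumPropagatingNull` is hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class NullableSumSketch
{
    // Folds with the lifted + operator: any null element yields a null result.
    public static int? SumPropagatingNull(this IEnumerable<int?> values)
    {
        return values.Aggregate((int?)0, (acc, v) => acc + v);
    }

    public static void Main()
    {
        int?[] data = { 2, 5, null };

        int? linqSum = data.Sum();                 // LINQ skips nulls
        int? foldedSum = data.SumPropagatingNull(); // null propagates

        Console.WriteLine(linqSum);             // 7
        Console.WriteLine(foldedSum.HasValue);  // False
    }
}
```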

2. Data Science Data Preparation (Data Munging/ Data Wrangling)

One of the first steps in analyzing and processing data is handling missing data, which is usually represented as null values. Every IEnumerable<Nullable<T>> must therefore be converted to IEnumerable<T> before further analysis and processing.
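
Two common ways to perform that conversion are to drop the missing values or to replace them with a chosen value. The following is a minimal sketch of both; the helper names `DropMissing` and `FillMissing` are hypothetical and not part of the library, and which strategy is appropriate depends on the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class MissingDataSketch
{
    // Drops null entries: IEnumerable<double?> -> IEnumerable<double>
    public static IEnumerable<double> DropMissing(this IEnumerable<double?> source)
    {
        return source.Where(v => v.HasValue).Select(v => v.Value);
    }

    // Replaces null entries with a caller-supplied value (simple imputation).
    public static IEnumerable<double> FillMissing(this IEnumerable<double?> source, double replacement)
    {
        return source.Select(v => v ?? replacement);
    }

    public static void Main()
    {
        double?[] raw = { 2.0, null, 4.0 };

        Console.WriteLine(raw.DropMissing().Average());      // 3
        Console.WriteLine(raw.FillMissing(0.0).Average());   // 2
    }
}
```

The two strategies give different means for the same input, which is one reason to make the choice explicit before calling any statistics method.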

Comparison

TODOs / Plans

Performance

The performance question boils down to the use of for versus foreach. The current implementation uses for.

Once benchmarks have been added and tests conducted, the implementation might change.
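
Until proper benchmarks exist, the two loop styles can be compared informally. The sketch below sums the same array with an indexed for loop and with foreach over IEnumerable<double>, timing each with Stopwatch. It is only a rough illustration with hypothetical names: a single Stopwatch run is not a reliable benchmark, and results vary with runtime, JIT warm-up, and collection type.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical sketch, not the library's code.
public static class LoopSketch
{
    // Indexed loop: requires an indexable collection such as an array.
    public static double SumFor(double[] x)
    {
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            sum += x[i];
        }
        return sum;
    }

    // Enumerator-based loop: works on any IEnumerable<double>.
    public static double SumForeach(IEnumerable<double> x)
    {
        double sum = 0.0;
        foreach (double v in x)
        {
            sum += v;
        }
        return sum;
    }

    public static void Main()
    {
        double[] data = Enumerable.Range(1, 1000000).Select(i => (double)i).ToArray();

        Stopwatch sw = Stopwatch.StartNew();
        double a = SumFor(data);
        sw.Stop();
        Console.WriteLine($"for:     sum = {a}, {sw.Elapsed.TotalMilliseconds} ms");

        sw.Restart();
        double b = SumForeach(data);
        sw.Stop();
        Console.WriteLine($"foreach: sum = {b}, {sw.Elapsed.TotalMilliseconds} ms");
    }
}
```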

References Links

Alternatives

Authors / Contributors

  • Authors

    • Darko Katovic - Katodix

      KIF (Faculty of Kinesiology, University of Zagreb, Croatia)

    • Miljenko Cvjetko - moljac

      Microsoft (Xamarin Inc.), HolisticWare
