Add more functions #3

PragTob · 2019-07-03T20:35:50Z

Or, well, extract statistics functions that we had already built but so far only used in the calculation of other values :)

New property 🎉 I think now it's harder to write an implementation that is wrong that passes the property than the other way around.
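Going by the commit titles in this PR, the two properties are that the counts sum up to the sample size and that every sample is in the frequency distribution. A minimal sketch of how they could be checked follows. The module name, the helper names and the plain `:rand`-based generator are assumptions for illustration only; the actual test suite may well use a property-testing library instead.

```elixir
defmodule FrequencyProperties do
  # Same reduce-based counting as in this PR's diff, repeated here so the
  # sketch is self-contained.
  def frequency_distribution(samples) do
    Enum.reduce(samples, %{}, fn sample, counts ->
      Map.update(counts, sample, 1, fn old_value -> old_value + 1 end)
    end)
  end

  # Property 1: the counts sum up to the sample size.
  def counts_sum_to_sample_size?(samples) do
    distribution = frequency_distribution(samples)
    Enum.sum(Map.values(distribution)) == length(samples)
  end

  # Property 2: every sample appears as a key in the distribution.
  def every_sample_present?(samples) do
    distribution = frequency_distribution(samples)
    Enum.all?(samples, fn sample -> Map.has_key?(distribution, sample) end)
  end

  # Hypothetical stand-in for a property-testing generator: `runs` lists of
  # random length (1..50) filled with random integers (1..10).
  def random_sample_lists(runs) do
    for _ <- 1..runs do
      size = :rand.uniform(50)
      for _ <- 1..size, do: :rand.uniform(10)
    end
  end
end

all_hold? =
  FrequencyProperties.random_sample_lists(100)
  |> Enum.all?(fn samples ->
    FrequencyProperties.counts_sum_to_sample_size?(samples) and
      FrequencyProperties.every_sample_present?(samples)
  end)

IO.inspect(all_hold?, label: "properties hold")
```

The two checks complement each other: an implementation that puts the whole sample count under a single key would still satisfy the first property, but it would fail the second for any input with more than one distinct value.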

MariliaMJ · 2019-07-04T14:28:47Z

lib/statistex.ex

+
+  def frequency_distribution(samples) do
+    Enum.reduce(samples, %{}, fn sample, counts ->
+      Map.update(counts, sample, 1, fn old_value -> old_value + 1 end)


💅 suggestion: This style is not exactly functional. Consider writing a private function based on recursion, like so:
https://elixir-lang.org/getting-started/recursion.html

Also, it's important to point out that, for large samples, Enum will become very slow. One should consider using Stream or even, for very large data sets, something GenStage-based like Flow: https://hexdocs.pm/flow/Flow.html
Bear in mind that Erlang was not built for mathematical computation, so it won't perform as well as other languages (Python or Java, for example). It was built for communication, parallel processing and scalability.
Take a look here: https://stackoverflow.com/questions/13629142/why-is-erlang-slower-than-java-on-all-these-small-math-benchmarks#13629562

PragTob · 2019-07-05

Hi there,

thanks for your input.

re functional: For me, FP is more about separating side effects from pure functions, and maybe immutable data structures. I'm well familiar with the concept of recursion and also use it in this library. I know lots of FP code uses recursion, but reduce is arguably also very functional. If you have a recursive implementation of this function that is more readable and/or faster, I'm happy to accept PRs.
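For comparison, here is a rough sketch of the two styles side by side. The `FrequencySketch` module and its function names are hypothetical and not part of Statistex; the reduce version mirrors the diff under review, while the recursive version is only one possible way a private recursive helper could look.

```elixir
defmodule FrequencySketch do
  # Reduce-based version, as in the diff under review.
  def with_reduce(samples) do
    Enum.reduce(samples, %{}, fn sample, counts ->
      Map.update(counts, sample, 1, fn old_value -> old_value + 1 end)
    end)
  end

  # Hypothetical recursive variant: a public entry point plus a private
  # tail-recursive helper that carries the counts as an accumulator.
  def with_recursion(samples), do: count(samples, %{})

  # Base case: no samples left, return the accumulated counts.
  defp count([], counts), do: counts

  # Recursive case: count the head, then recurse on the tail.
  defp count([sample | rest], counts) do
    count(rest, Map.update(counts, sample, 1, fn old_value -> old_value + 1 end))
  end
end

samples = [1, 2, 2, 3, 3, 3]
IO.inspect(FrequencySketch.with_reduce(samples))
IO.inspect(FrequencySketch.with_recursion(samples) == FrequencySketch.with_reduce(samples))
```

Both variants are pure and walk the input once. One difference worth noting: the recursive helper pattern matches on lists, so it only accepts lists, whereas the reduce version works on any enumerable.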

re performance: I'm also aware of the performance trade-offs of Erlang; please refer to the performance section of the README. Parallelization is currently not really a focus of Statistex but might be an option later. The major client of Statistex (benchee) already calculates the statistics in parallel for multiple data sets, so parallelizing within Statistex would be unlikely to yield performance improvements.
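To illustrate what "in parallel for multiple data sets" can mean, here is a minimal sketch using `Task.async_stream/2` from the standard library. This is not benchee's actual code; the `ParallelSketch` module and its functions are assumptions made up for this example.

```elixir
defmodule ParallelSketch do
  # Same reduce-based counting as in the diff, repeated so the sketch is
  # self-contained.
  def frequency_distribution(samples) do
    Enum.reduce(samples, %{}, fn sample, counts ->
      Map.update(counts, sample, 1, fn old_value -> old_value + 1 end)
    end)
  end

  # Hypothetical caller-side parallelism: one task per data set. Each
  # individual calculation stays sequential.
  def distributions(data_sets) do
    data_sets
    |> Task.async_stream(&frequency_distribution/1)
    |> Enum.map(fn {:ok, distribution} -> distribution end)
  end
end

data_sets = [[1, 1, 2], [3, 3, 3], [4, 5]]
IO.inspect(ParallelSketch.distributions(data_sets))
```

With the default options, `Task.async_stream/2` yields results in input order, so each distribution lines up with its data set. When the caller already spreads the data sets across cores like this, splitting a single calculation further has less room to help.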

PragTob added 4 commits July 3, 2019 19:48

Add separate variance function/value

28f081b

add separate frequency_distribution function

9a73091

Frequency distribution counts gotta sum up to sample size

7fcb588

New property 🎉 I think now it's harder to write an implementation that is wrong that passes the property than the other way around.

and every sample is in the frequency distribution

9b22014

MariliaMJ reviewed Jul 4, 2019

PragTob merged commit d1f0913 into master Jul 5, 2019

PragTob deleted the add-more-functions branch July 5, 2019 13:49
