# Object Oriented vs. Functional Programming in Data Science: Distribution

## Introduction

Two of the most popular programming paradigms in industry today are **Object Oriented Programming** and **Functional Programming**. In this article, we compare the two approaches in a Data Science context, starting with **Probability Distributions**.

## Objectives

Our objective is to create classes and functions that compute the **Probability Density Function**, the **Cumulative Distribution Function** and the **Inverse Cumulative Distribution Function** of three probability distributions: the **Normal Distribution**, **Student's t-Distribution** and the **F-Distribution**. We implement each with both **Object Oriented Programming** and **Functional Programming**.

## Code

### Importing Necessary Modules

For simplicity, we use the probability distributions provided by `scipy.stats`, which we can import directly from that module.

In [1]:
from scipy.stats import norm, t, f

### Object Oriented: Creating Class for Probability Distributions

For the **Object Oriented Programming** approach, we begin by creating a base class named `Distribution` with three methods:

- `.pdf`: Probability Density Function of the given distribution
- `.cdf`: Cumulative Distribution Function of the given distribution
- `.inverse`: Inverse Cumulative Distribution Function of the given distribution (the percent point function, `ppf`, in `scipy.stats`)

The class is initialized with `dist`, a distribution from `scipy.stats`, and `**kwargs`, any additional keyword arguments. `**kwargs` lets child classes pass distribution-specific parameters, for example the degrees of freedom, through to `scipy.stats`.

In [2]:
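class Distribution:
    """Thin wrapper around a `scipy.stats` distribution and its parameters."""

    def __init__(self, dist, **kwargs):
        self.dist = dist      # e.g. norm, t or f from scipy.stats
        self.kwargs = kwargs  # distribution parameters, e.g. df=100

    def pdf(self, x):
        # Probability density at x
        return self.dist.pdf(x, **self.kwargs)

    def cdf(self, x):
        # Probability that the variable is <= x
        return self.dist.cdf(x, **self.kwargs)

    def inverse(self, p):
        # Quantile at probability p (scipy.stats calls this ppf)
        return self.dist.ppf(p, **self.kwargs)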
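
As a minimal sketch of how `**kwargs` is forwarded, the base class can also be used directly. The example assumes the standard `loc` and `scale` keyword arguments that `scipy.stats` accepts for `norm`. The class is repeated so the snippet runs on its own, and the names `shifted`, `q` and `round_trip` as well as the values 10 and 2 are illustrative only.

```python
from scipy.stats import norm


class Distribution:
    # Repeated from the cell above so this snippet is self-contained.
    def __init__(self, dist, **kwargs):
        self.dist = dist
        self.kwargs = kwargs

    def pdf(self, x):
        return self.dist.pdf(x, **self.kwargs)

    def cdf(self, x):
        return self.dist.cdf(x, **self.kwargs)

    def inverse(self, p):
        return self.dist.ppf(p, **self.kwargs)


# A normal distribution with mean 10 and standard deviation 2 (illustrative values).
shifted = Distribution(norm, loc=10, scale=2)

p = 0.95
q = shifted.inverse(p)        # quantile at probability p
round_trip = shifted.cdf(q)   # applying the CDF to the quantile should recover p

print(q, round_trip)
```

Because `.inverse` wraps `ppf`, passing its result back through `.cdf` should return the original probability up to floating-point error.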

### Functional Programming: Creating functions for Probability Distributions

In this approach, we define three higher-order functions, each of which returns a function:

- `dist_pdf` returns a function that takes `x` and returns the **Probability Density Function** evaluated at `x`
- `dist_cdf` returns a function that takes `x` and returns the **Cumulative Distribution Function** evaluated at `x`
- `dist_inverse` returns a function that takes a probability and returns the **Inverse Cumulative Distribution Function** evaluated at that probability

Each function takes a `distribution` from `scipy.stats` and additional keyword arguments (for example, the degrees of freedom).

In [3]:
def dist_pdf(distribution, **kwargs):
    def dist(x):
        return distribution.pdf(x, **kwargs)
    return dist

def dist_cdf(distribution, **kwargs):
    def dist(x):
        return distribution.cdf(x, **kwargs)
    return dist

def dist_inverse(distribution, **kwargs):
    def dist(p):  # p is a probability, not a point x
        return distribution.ppf(p, **kwargs)
    return dist
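
The closures above fix the distribution's parameters and leave only the evaluation point open, which is partial application. As a hedged sketch, the standard library's `functools.partial` can express the same idea. `dist_cdf` is repeated so the snippet runs on its own, and `t_cdf_closure`, `t_cdf_partial` and the value `df=100` are chosen here for illustration.

```python
from functools import partial

from scipy.stats import t


def dist_cdf(distribution, **kwargs):
    # Closure version, repeated from the cell above.
    def dist(x):
        return distribution.cdf(x, **kwargs)
    return dist


# Two ways to fix df=100 and leave x as the only remaining argument.
t_cdf_closure = dist_cdf(t, df=100)
t_cdf_partial = partial(t.cdf, df=100)

x = 1.645
print(t_cdf_closure(x), t_cdf_partial(x))
```

Both callables forward to the same `t.cdf` call, so they should agree. The closure form keeps one uniform factory per quantity, while `partial` avoids writing the inner function by hand.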

### Normal Distribution

For the **Object Oriented Programming** approach, we create a subclass of `Distribution` and initialize it with `norm` from `scipy.stats`:

In [4]:
class NormalDistribution(Distribution):
    def __init__(self):
        super().__init__(norm)

We then create `normal_oop`, an instance of the `NormalDistribution` class, and `normal_fp_pdf`, `normal_fp_cdf` and `normal_fp_inverse` for the **Functional Programming** approach:

In [5]:
normal_oop = NormalDistribution()
normal_fp_pdf, normal_fp_cdf, normal_fp_inverse = dist_pdf(norm), dist_cdf(norm), dist_inverse(norm)

We then define test values to exercise both approaches:

In [6]:
test_value = 1.645
test_p = 0.95

This is the result of the **Object Oriented Programming** approach:

In [7]:
print(f"pdf of {test_value}", normal_oop.pdf(test_value))
print(f"cdf of {test_value}", normal_oop.cdf(test_value))
print(f"inverse of {test_p}", normal_oop.inverse(test_p))

pdf of 1.645 0.10311081109198142
cdf of 1.645 0.9500150944608786
inverse of 0.95 1.6448536269514722


And this is the result of the **Functional Programming** approach:

In [8]:
print(f"pdf of {test_value}", normal_fp_pdf(test_value))
print(f"cdf of {test_value}", normal_fp_cdf(test_value))
print(f"inverse of {test_p}", normal_fp_inverse(test_p))

pdf of 1.645 0.10311081109198142
cdf of 1.645 0.9500150944608786
inverse of 0.95 1.6448536269514722
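
As a sanity check on the `pdf` values above, the standard normal density has the closed form `exp(-x**2 / 2) / sqrt(2 * pi)`. The sketch below evaluates that formula with the standard library and compares it with `norm.pdf`. The helper name `standard_normal_pdf` is introduced here for illustration and is not part of the original code.

```python
import math

from scipy.stats import norm


def standard_normal_pdf(x):
    # Closed-form density of the standard normal distribution (mean 0, variance 1).
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)


test_value = 1.645
manual = standard_normal_pdf(test_value)   # computed from the formula
library = norm.pdf(test_value)             # computed by scipy.stats

print(manual, library)
```

The two numbers should agree up to floating-point error, which confirms that both approaches simply delegate to the standard normal density.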


### Student's t-Distribution

For the **Object Oriented Programming** approach, we create a subclass of `Distribution` and initialize it with `t` from `scipy.stats` and an additional parameter `df` for the degrees of freedom:

In [9]:
class tDistribution(Distribution):
    def __init__(self, df):
        super().__init__(t, df=df)

We again create an instance and functions for the two approaches, this time with 100 degrees of freedom:

In [10]:
test_t_df = 100

t_oop = tDistribution(test_t_df)
t_fp_pdf, t_fp_cdf, t_fp_inverse = dist_pdf(t, df=test_t_df), dist_cdf(t, df=test_t_df), dist_inverse(t, df=test_t_df)

This is the result of the **Object Oriented Programming** approach:

In [11]:
print(f"pdf of {test_value}", t_oop.pdf(test_value))
print(f"cdf of {test_value}", t_oop.cdf(test_value))
print(f"inverse of {test_p}", t_oop.inverse(test_p))

pdf of 1.645 0.10333092764232613
cdf of 1.645 0.9484451174124768
inverse of 0.95 1.66023432606575


This is the result of the **Functional Programming** approach:

In [12]:
print(f"pdf of {test_value}", t_fp_pdf(test_value))
print(f"cdf of {test_value}", t_fp_cdf(test_value))
print(f"inverse of {test_p}", t_fp_inverse(test_p))

pdf of 1.645 0.10333092764232613
cdf of 1.645 0.9484451174124768
inverse of 0.95 1.66023432606575


### F-Distribution

For the **Object Oriented Programming** approach, we create a subclass of `Distribution` and initialize it with `f` from `scipy.stats` and two additional parameters, `df1` and `df2`, for the numerator and denominator degrees of freedom:

In [13]:
class fDistribution(Distribution):
    def __init__(self, df1, df2):
        super().__init__(f, dfn=df1, dfd=df2)

We again create an instance and functions for the two approaches, with 10 numerator degrees of freedom and 10 denominator degrees of freedom:

In [14]:
test_f_dfn = 10
test_f_dfd = 10

f_oop = fDistribution(test_f_dfn, test_f_dfd)
f_fp_pdf, f_fp_cdf, f_fp_inverse = (
    dist_pdf(f, dfn=test_f_dfn, dfd=test_f_dfd),
    dist_cdf(f, dfn=test_f_dfn, dfd=test_f_dfd),
    dist_inverse(f, dfn=test_f_dfn, dfd=test_f_dfd),
)

This is the result of the **Object Oriented Programming** approach:

In [15]:
print(f"pdf of {test_value}", f_oop.pdf(test_value))
print(f"cdf of {test_value}", f_oop.cdf(test_value))
print(f"inverse of {test_p}", f_oop.inverse(test_p))

pdf of 1.645 0.27526271864167623
cdf of 1.645 0.7775043016623333
inverse of 0.95 2.9782370160823213


This is the result of the **Functional Programming** approach:

In [16]:
print(f"pdf of {test_value}", f_fp_pdf(test_value))
print(f"cdf of {test_value}", f_fp_cdf(test_value))
print(f"inverse of {test_p}", f_fp_inverse(test_p))

pdf of 1.645 0.27526271864167623
cdf of 1.645 0.7775043016623333
inverse of 0.95 2.9782370160823213
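
The side-by-side outputs in the sections above can also be compared programmatically. The following is a minimal sketch rather than part of the original notebook: it uses compact copies of the base class and the three factory functions (written as lambdas for brevity), and the names `cases`, `checks` and `all_match` are introduced here for illustration.

```python
import math

from scipy.stats import f, norm, t


class Distribution:
    # Compact copy of the base class defined earlier.
    def __init__(self, dist, **kwargs):
        self.dist = dist
        self.kwargs = kwargs

    def pdf(self, x):
        return self.dist.pdf(x, **self.kwargs)

    def cdf(self, x):
        return self.dist.cdf(x, **self.kwargs)

    def inverse(self, p):
        return self.dist.ppf(p, **self.kwargs)


# Compact copies of the three factory functions defined earlier.
def dist_pdf(distribution, **kwargs):
    return lambda x: distribution.pdf(x, **kwargs)


def dist_cdf(distribution, **kwargs):
    return lambda x: distribution.cdf(x, **kwargs)


def dist_inverse(distribution, **kwargs):
    return lambda p: distribution.ppf(p, **kwargs)


# The three distributions and parameters used in this article.
cases = [
    (norm, {}),
    (t, {"df": 100}),
    (f, {"dfn": 10, "dfd": 10}),
]
test_value, test_p = 1.645, 0.95

checks = []
for dist, params in cases:
    oop = Distribution(dist, **params)
    checks.append(math.isclose(oop.pdf(test_value), dist_pdf(dist, **params)(test_value)))
    checks.append(math.isclose(oop.cdf(test_value), dist_cdf(dist, **params)(test_value)))
    checks.append(math.isclose(oop.inverse(test_p), dist_inverse(dist, **params)(test_p)))

all_match = all(checks)
print(all_match)
```

Both styles forward to the same `scipy.stats` calls with the same arguments, so every comparison is expected to pass.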


## Conclusion

Both **Object Oriented Programming** and **Functional Programming** produce the same output for each probability distribution, although they take different approaches. We can treat a probability distribution as an object that has methods, or we can define pure functions built from the distribution's `pdf`, `cdf` and `ppf`.

However, it is up to the reader to decide which approach is simpler, more readable and more maintainable. This comes down to preference, as both approaches have advantages and disadvantages.