# Creating a Python package - Analyze a Gaussian Distribution
- Read dataset
- Calculate mean
- Calculate standard deviation
- Plot histogram
- Plot probability density function (PDF)

<br><br>
* **Gaussian Distribution PDF**:
$$f\left(x\ \middle| \ \mu , \ \sigma^{2} \right)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{-\left(x-\mu\right)^{2}}{2\sigma^{2}}}$$<br>

* **Binomial Distribution**:
The binomial distribution applies when each trial has exactly two mutually exclusive outcomes, conventionally labeled "success" and "failure". It gives the probability of observing $k$ successes in $n$ trials, where $p$ denotes the probability of success on a single trial. The distribution assumes that $p$ is fixed for all trials and that the trials are independent.

$$f\left(k,\ n,\ p\right) = \frac{n!}{k!\left(n-k\right)!}p^{k}\left(1-p\right)^{\left(n-k\right)}$$

where $p$ is the probability of success on a single trial, $n$ is the number of trials, and $k$ is the number of successes.

> * mean: $\mu = n\cdot p$<br>
> * variance: $\sigma^{2}=n\cdot p \cdot \left(1-p\right)$
> * standard deviation: $\sigma = \sqrt{n\cdot p \cdot \left(1-p\right)}$<br>

> Example:<br>
Let $k=2$, $n=2$, and $p=0.5$:
$$f\left(k=2,\ n=2,\ p=0.5\right)=\frac{2!}{2!\left(2-2\right)!}\left(0.5\right)^{2}\left(1-0.5\right)^{\left(2-2\right)}=\frac{2}{2}\left(0.25\right)\left(1\right)=0.25$$
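
The binomial formulas above can be checked numerically. The following is a minimal sketch, not part of the package itself; the function names `binomial_pmf` and `binomial_stats` are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    # math.comb(n, k) equals n! / (k! (n - k)!), the number of ways
    # to choose which k of the n trials are successes.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_stats(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)


print(binomial_pmf(2, 2, 0.5))  # 0.25, matching the worked example
print(binomial_stats(4, 0.5))   # (2.0, 1.0, 1.0)
```

Summing `binomial_pmf(k, n, p)` over every `k` from `0` to `n` should give 1 (up to floating-point error), which is a quick sanity check on any implementation.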

## ASIDE: Further Resources
If you would like to review the Gaussian (normal) distribution and binomial distribution, here are a few resources:

This free Udacity course, [Intro to Statistics](https://www.udacity.com/course/intro-to-statistics--st101), has a lesson on Gaussian distributions as well as the Binomial distribution.

This free course, [Intro to Descriptive Statistics](https://www.udacity.com/course/intro-to-descriptive-statistics--ud827), also has a Gaussian distributions lesson.

**Here are the Wikipedia articles:**
* [Gaussian Distributions Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution)
* [Binomial Distributions Wikipedia](https://en.wikipedia.org/wiki/Binomial_distribution)

## Gaussian PDF Code:
```python
import math
import matplotlib.pyplot as plt

class Gaussian():
    """
    Gaussian distribution class for calculating and 
    visualizing a Gaussian distribution.
    Attributes:
        mean (float) representing the mean value of the distribution
        stdev (float) representing the standard deviation of the distribution
        data (list of floats) a list of floats extracted from the data file
    """
    def __init__(self, mu = 0, sigma = 1):
        
        self.mean = mu
        self.stdev = sigma
        self.data = []


    def calculate_mean(self):
        """
        Method to calculate the mean of the data set.
        Args: 
            None
        
        Returns: 
            float: mean of the data set    
        """
        self.mean = sum(self.data) / len(self.data)
        
        return self.mean


    def calculate_stdev(self, sample=True):
        """
        Method to calculate the standard deviation of the data set.
        Args: 
            sample (bool): whether the data represents a sample or population
        Returns: 
            float: standard deviation of the data set    
        """
        mu = self.mean
        diff = [(x - mu) ** 2 for x in self.data]
        quantity = sum(diff)
        
        # Bessel's correction: divide by n - 1 for a sample, by n for a population
        n = len(self.data)
        if sample:
            n -= 1
        
        variance = quantity / n
        
        self.stdev = math.sqrt(variance)
        
        return self.stdev


    def read_data_file(self, file_name, sample=True):
        """
        Method to read in data from a txt file. The txt file should have
        one number (float) per line. The numbers are stored in the data attribute.
        After reading in the file, the mean and standard deviation are calculated.
        Args:
            file_name (string): name of a file to read from
            sample (bool): whether the data represents a sample or population
        Returns:
            None
        """

        # Read one number per line into a list; the with block closes the file
        with open(file_name) as file:
            data_list = []
            for line in file:
                data_list.append(float(line))

        self.data = data_list
        self.mean = self.calculate_mean()
        self.stdev = self.calculate_stdev(sample)


    def plot_histogram(self):
        """
        Method to output a histogram of the instance variable data using 
        matplotlib pyplot library.
        Args:
            None
        Returns:
            None
        """
        x_label = 'Data'
        y_label = 'Frequency'
        plt.hist(self.data)
        plt.title('Histogram of Data')
        plt.xlabel(x_label)
        plt.ylabel(y_label)


    def pdf(self, x):
        """
        Probability density function calculator for the gaussian distribution.
        Args:
            x (float): point for calculating the probability density function
        Returns:
            float: probability density function output
        """
        # Evaluate 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2)) step by step
        
        denom_quantity = 2 * math.pi * self.stdev ** 2
        denom = math.sqrt(denom_quantity)
        
        frac = 1 / denom
        
        exp_frac_num = (-1) * (x - self.mean) ** 2
        exp_frac_denom = 2 * self.stdev ** 2
        exp_frac = exp_frac_num / exp_frac_denom
        
        prob_dens_fct = frac * math.exp(exp_frac)
        
        return prob_dens_fct
```
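As a sanity check on the formulas used in the class, here is a minimal standalone sketch that repeats the mean, standard deviation, and PDF calculations as plain functions and compares them with Python's standard `statistics` module. The function names and the sample data are hypothetical, chosen only for illustration, and `statistics.NormalDist` assumes Python 3.8 or later.

```python
import math
import statistics


def mean(data):
    return sum(data) / len(data)


def stdev(data, sample=True):
    mu = mean(data)
    squared_diffs = sum((x - mu) ** 2 for x in data)
    # Divide by n - 1 for a sample, by n for a population
    n = len(data) - 1 if sample else len(data)
    return math.sqrt(squared_diffs / n)


def gaussian_pdf(x, mu, sigma):
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Hypothetical data set, not taken from any file in the package
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(data)
sigma_population = stdev(data, sample=False)
sigma_sample = stdev(data, sample=True)

print(mu)                # 5.0
print(sigma_population)  # 2.0

# Cross-check each result against the standard library
assert math.isclose(mu, statistics.mean(data))
assert math.isclose(sigma_sample, statistics.stdev(data))
assert math.isclose(sigma_population, statistics.pstdev(data))
assert math.isclose(
    gaussian_pdf(6.0, mu, sigma_population),
    statistics.NormalDist(mu, sigma_population).pdf(6.0),
)
```

The sample standard deviation is always slightly larger than the population value for the same data, because the sum of squared differences is divided by `n - 1` instead of `n`.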

## Magic Methods
Magic methods let you customize and override default Python behavior.

By default, Python cannot add two `Gaussian` instances together, but defining the `__add__` magic method on the class changes that behavior. The methods below belong inside the `Gaussian` class body:
```python
    def __add__(self, other):
        """
        Magic method to add together two Gaussian distributions
            When summing two independent Gaussian distributions, the mean
                value is the sum of the means of each Gaussian.
            When summing two independent Gaussian distributions, the standard
                deviation is the square root of the sum of squares,
                i.e. sqrt(stdev_one ** 2 + stdev_two ** 2)
        Args:
            other (Gaussian): Gaussian instance
        Returns:
            Gaussian: Gaussian distribution 
        """
        result = Gaussian()

        result.mean = self.mean + other.mean
        result.stdev = math.sqrt(self.stdev ** 2 + other.stdev ** 2)
        
        return result


    def __repr__(self):
        """
        Magic method to output the characteristics of the Gaussian instance
        Args:
            None
        Returns:
            string: characteristics of the Gaussian
        """
        return "mean {}, standard deviation {}".format(self.mean, self.stdev)
```
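
To show the magic methods in action, here is a minimal sketch using a stripped-down stand-in for the class, containing only the attributes that `__add__` and `__repr__` need. The variable names and the input values are hypothetical.

```python
import math


class Gaussian:
    """Stripped-down stand-in for the full class: only what the magic methods need."""

    def __init__(self, mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma

    def __add__(self, other):
        result = Gaussian()
        result.mean = self.mean + other.mean
        result.stdev = math.sqrt(self.stdev ** 2 + other.stdev ** 2)
        return result

    def __repr__(self):
        return "mean {}, standard deviation {}".format(self.mean, self.stdev)


gaussian_one = Gaussian(25, 3)
gaussian_two = Gaussian(30, 4)

# The + operator calls gaussian_one.__add__(gaussian_two)
gaussian_sum = gaussian_one + gaussian_two

# print() falls back to __repr__ because no __str__ is defined
print(gaussian_sum)  # mean 55, standard deviation 5.0
```

Note that the standard deviations do not simply add: 3 and 4 combine to 5, because it is the variances (9 and 16) that sum.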