# Fundamentals of Data Analysis

***


GG00411367 Ioan Domsa

# Normal Distribution

![Picture NormalDistribution](images/Img_NormalDistrib.png)

## What is Normal Distribution

A normal distribution is a type of continuous probability distribution in which most data points cluster toward the middle of the range, while the rest taper off symmetrically toward either extreme. 

The middle of the range is also known as the mean of the distribution.

The normal distribution is also known as the Gaussian distribution.

## Shape

A normal distribution is **symmetric** about its mean, which sits at the peak of the curve.

The graph is bell-shaped: the mean, median, and mode all have the same value and coincide with the peak of the curve.
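The symmetry and the shared location of the mean, median, and mode can be checked numerically. The snippet below is a minimal sketch that uses `statistics.NormalDist` from the Python standard library; the parameters (mean 10, standard deviation 2) are assumed example values, not taken from the text.

```python
from statistics import NormalDist

# Assumed example parameters: mean 10, standard deviation 2.
dist = NormalDist(mu=10, sigma=2)

# Symmetry: two points equally far from the mean have the same density.
left = dist.pdf(10 - 3)
right = dist.pdf(10 + 3)
print("density at 7 and at 13:", left, right)

# The peak is at the mean: the density there is higher than just beside it.
peak_is_at_mean = dist.pdf(10) > dist.pdf(9.9) and dist.pdf(10) > dist.pdf(10.1)
print("peak at the mean:", peak_is_at_mean)

# Mean, median, and mode all report the same value.
print("mean, median, mode:", dist.mean, dist.median, dist.mode)
```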

## Parameters


**1. Mean**

The mean is used as a measure of central tendency. It defines the location of the peak, and most of the data points are clustered around the mean.

Changing the value of the mean moves the curve to the left or right along the x-axis without changing its shape.


![Picture NormalDistribution](images/Img_ND-Mean.png)
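As a rough illustration of how the mean sets the location of the peak, the sketch below evaluates the density on a grid and reports where it is largest. The helper `peak_location`, the grid, and the three means are hypothetical choices made for this example.

```python
from statistics import NormalDist

# Grid from -10 to 10 in steps of 0.1 on which the density is evaluated.
xs = [i / 10 for i in range(-100, 101)]

def peak_location(mu, sigma=1.0):
    """Return the grid point where the normal density is highest."""
    dist = NormalDist(mu, sigma)
    return max(xs, key=dist.pdf)

# Only the mean changes; the standard deviation stays at 1.
for mu in (-2.0, 0.0, 3.0):
    print(f"mean = {mu}: peak found at x = {peak_location(mu)}")
```

In each case the peak is found at the mean itself, which is the left or right shift described above.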


**2. Standard Deviation**

The standard deviation measures the dispersion of the data points around the mean: it represents the typical distance between the mean and the observations.

A small standard deviation produces a tall, narrow curve, while a large standard deviation produces a lower, flatter curve.

![Picture NormalDistribution](images/Img_ND-Std.png)
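The effect of the standard deviation on the shape can be sketched in the same way. The example below compares the height of the peak and the width of the central 50% of the distribution for three assumed standard deviations; the names `heights` and `widths` are only illustrative.

```python
from statistics import NormalDist

heights = {}  # density at the mean, i.e. the height of the peak
widths = {}   # width of the interval holding the central 50% of the probability

for sigma in (0.5, 1.0, 2.0):
    dist = NormalDist(mu=0, sigma=sigma)
    heights[sigma] = dist.pdf(0)
    widths[sigma] = dist.inv_cdf(0.75) - dist.inv_cdf(0.25)
    print(f"sigma = {sigma}: peak height = {heights[sigma]:.4f}, "
          f"central 50% width = {widths[sigma]:.4f}")
```

As the standard deviation grows, the peak gets lower and the central interval gets wider.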


## Properties 

**It is symmetric**

This means that the distribution curve can be divided at the mean to produce two equal, mirror-image halves.

**The mean, median, and mode are equal**

The middle point of a normal distribution is also the point of maximum frequency, where the variable has the most observations, so the mean, median, and mode all fall at the same value.

**Empirical rule**

About 68.27% of all cases fall within +/- one standard deviation of the mean.

About 95.45% of all cases fall within +/- two standard deviations of the mean.

About 99.73% of all cases fall within +/- three standard deviations of the mean.
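The percentages in the empirical rule can be computed from the cumulative distribution function. This is a minimal sketch using the standard normal distribution from Python's `statistics` module; the helper name `within` is an assumption made for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Probability of falling within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {within(k) * 100:.2f}%")
# within +/- 1 standard deviation(s): 68.27%
# within +/- 2 standard deviation(s): 95.45%
# within +/- 3 standard deviation(s): 99.73%
```

The same percentages hold for any mean and standard deviation, because the rule is stated in units of standard deviations.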

**Skewness and kurtosis**

Skewness measures a distribution's degree of asymmetry. Since the normal distribution is perfectly symmetric, it has a skewness of zero.

Kurtosis measures the thickness of a distribution's tails. For a normal distribution, kurtosis is always equal to 3 (equivalently, an excess kurtosis of 0), which serves as the baseline against which other distributions are compared.
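To see these two values in practice, the sketch below draws random samples from a normal distribution and estimates skewness and kurtosis from their central moments. The parameters, sample size, and seed are assumed example values, and because the samples are random the estimates are only close to 0 and 3, not exactly equal.

```python
from statistics import NormalDist, fmean

# Assumed example: 100,000 samples with mean 5 and standard deviation 2.
samples = NormalDist(mu=5, sigma=2).samples(100_000, seed=42)
sample_mean = fmean(samples)

def central_moment(k):
    """Average of (x - mean)^k over the samples."""
    return fmean((x - sample_mean) ** k for x in samples)

variance = central_moment(2)
skewness = central_moment(3) / variance ** 1.5
kurtosis = central_moment(4) / variance ** 2

print(f"skewness is close to 0: {skewness:.3f}")
print(f"kurtosis is close to 3: {kurtosis:.3f}")
```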

**The total area under the curve is 1.**
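This property can be checked approximately by numerical integration. The sketch below applies the trapezoid rule over the range from eight standard deviations below the mean to eight above it; the parameters, the range, and the number of steps are assumptions chosen for the example.

```python
from statistics import NormalDist

# Assumed example parameters: mean 5, standard deviation 2.
dist = NormalDist(mu=5, sigma=2)

# Integrate from mean - 8*sigma to mean + 8*sigma; the tails beyond are negligible.
lower, upper = 5 - 8 * 2, 5 + 8 * 2
steps = 10_000
step = (upper - lower) / steps

# Trapezoid rule: interior points count fully, the two end points count half.
interior = sum(dist.pdf(lower + i * step) for i in range(1, steps))
area = (interior + (dist.pdf(lower) + dist.pdf(upper)) / 2) * step

print(f"approximate area under the curve: {area:.6f}")
```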

## Formula


$ f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} $

$ f(x) = \text{Probability Density Function} $

$ \sigma = \text{Standard Deviation} $

$ \mu = \text{Mean} $
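The formula translates directly into code. The function below is a minimal sketch written for this example (the name `normal_pdf` is hypothetical), and its result is compared with the `pdf` method of `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Evaluate the normal probability density function at x."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

# Compare the hand-written formula with the standard library at one point.
by_formula = normal_pdf(1.5, mu=0, sigma=1)
by_library = NormalDist(mu=0, sigma=1).pdf(1.5)
print(by_formula, by_library)
```

At the mean of the standard normal distribution the exponent is zero, so the density reduces to $ \frac{1}{\sqrt{2 \pi}} \approx 0.3989 $.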

## Mean, Median and Mode

The mean, median, and mode are measures of central tendency. In other words, they tell us where the “middle” of a data set is. Each of these statistics defines the middle differently:

1. The mean is the average of a data set. It is found by adding all of the numbers together and dividing by the number of items in the set.


2. The median is the middle of the set of numbers. It is found by ordering the set from lowest to highest and finding the exact middle (or the average of the two middle values when the set has an even number of items).


3. The mode is the most common number in a data set.
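The three definitions above can be written out by hand before turning to any library. The functions and the small data set below are hypothetical examples; the data set has an even number of items so that the median has to average the two middle values.

```python
from collections import Counter

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most frequent value; with ties, the first one encountered is returned."""
    return Counter(values).most_common(1)[0][0]

example = [4, 8, 8, 5, 3, 8]  # assumed example data
print(mean(example), median(example), mode(example))
# 6.0 6.5 8
```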

### Calculate the Mean, Median and Mode with Python libraries

In [4]:
# Import libraries

import numpy as np
import scipy.stats as ss

In [6]:
# example of dataset
dataset = [2, 2, 3, 4, 2, 1, 6, 7, 18]

* Calculate the Mean by adding all of the numbers together and dividing by the number of items

Mean = (2+2+3+4+2+1+6+7+18)/9 = 5


In [12]:
# Mean with Python

mean = np.mean(dataset)
print(f"Mean of the set is: {mean}")

Mean of the set is: 5.0


* Calculate the Median. Sort the set: [1, 2, 2, 2, 3, 4, 6, 7, 18]

Median = 3, as the number 3 is in the middle of the sorted set

In [16]:
# Median with Python

median = np.median(dataset)
print(f"Median of the set is: {median}")

Median of the set is: 3.0


* Calculate the Mode by finding the most common item in the dataset

Mode = 2 as number 2 appears most often in the dataset

In [18]:
# Mode with Python
# Depending on the SciPy version, the mode is returned as an array ([2]) or as a scalar (2)

mode = ss.mode(dataset)
print(f"Mode of the set is: {mode[0]}")

Mode of the set is: [2]


###  References

https://en.wikipedia.org/wiki/Normal_distribution

https://www.techtarget.com/whatis/definition/normal-distribution

https://corporatefinanceinstitute.com/resources/data-science/normal-distribution/

http://rovdownloads.com/blog/understanding-and-interpreting-s-curves-and-cdf-curves-2/

http://www.emerson.emory.edu/services/latex/latex_119.html

https://www.codymd.com/left-align-latex-equation-in-jupyter-notebook/

https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-median-mode/#mode

https://www.w3schools.com/python/python_ml_mean_median_mode.asp#:~:text=Mean%2C%20Median%2C%20and%20Mode&text=In%20Machine%20Learning%20(and%20in,Mode%20%2D%20The%20most%20common%20value