# Sample Standard Deviation 

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

- The standard deviation describes how closely the data cluster around the mean. For normally distributed data, the interval within one standard deviation of the mean on either side contains approximately 68.3% of the samples, within two standard deviations approximately 95.4%, and within three approximately 99.7%. Because the sample formula only estimates the true standard deviation, there will be some discrepancy between the actual normal distribution and the one modeled by the sample standard deviation.
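
The coverage percentages can be checked empirically. The following is a minimal sketch, assuming simulated data (10,000 draws with mean 50 and standard deviation 5, values chosen only for illustration) and a hypothetical helper `fraction_within`. The fractions should land near 68.3%, 95.4%, and 99.7% without matching them exactly, because both the mean and the standard deviation are estimated from the sample.

```python
import numpy as np

# Hypothetical data: 10,000 draws from a normal distribution (mean 50, sd 5).
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=50, scale=5, size=10_000)

mean = samples.mean()
s = samples.std(ddof=1)  # ddof=1 divides by n - 1 (sample standard deviation)


def fraction_within(k):
    """Fraction of samples within k sample standard deviations of the sample mean."""
    return np.mean(np.abs(samples - mean) <= k * s)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.3%}")
```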

![image.png](attachment:image.png)

# The Standard Normal Distribution

- A standard normal random variable is a normally distributed random variable with mean μ = 0 and standard deviation σ = 1. It will always be denoted by the letter Z.

![image.png](attachment:image.png)

 
## The Probability Density Function (PDF)
- The probability density function describes how the probability of a continuous random variable is distributed over its possible values. The density at a single point is not itself a probability; the probability that the variable lies within a range of values is the area under the PDF over that range. Sometimes it is also called a probability distribution function or just a probability function.
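
To illustrate the difference between a density and a probability, here is a small sketch for the standard normal distribution. It evaluates the density at one point with `norm.pdf` and then integrates the density numerically with `scipy.integrate.quad` to get a probability. The interval from -1 to 1 is an arbitrary choice for the example.

```python
from scipy.integrate import quad
from scipy.stats import norm

# Height of the standard normal curve at z = 0: a density, not a probability.
density_at_zero = norm.pdf(0)

# Area under the PDF between -1 and 1; quad returns (value, error estimate).
area, _ = quad(norm.pdf, -1, 1)

print(density_at_zero)
print(area, norm.cdf(1) - norm.cdf(-1))  # the two numbers should agree closely
```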

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## What is a Cumulative Distribution Function?

- The Cumulative Distribution Function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X takes a value less than or equal to x: F(x) = P(X ≤ x). CDF values are often tabulated, and such a table can be used to draw a CDF plot, for example in an Excel sheet.

- In other words, the CDF gives the cumulative probability up to a given value. It is used to find the probability that a random variable falls at or below a value, or between two values by taking the difference of two CDF values. For discrete distributions, the CDF sums the probabilities of all values up to the one specified; for continuous distributions, it gives the area under the probability density function up to the specified value.
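
The discrete and continuous cases can be sketched side by side. The fair six-sided die below is an illustrative assumption, and the variable names are chosen for this example only.

```python
import numpy as np
from scipy.stats import norm

# Discrete case: a fair six-sided die (illustrative example).
faces = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
cdf = np.cumsum(pmf)        # cdf[i] = P(X <= faces[i])
p_at_most_4 = cdf[3]        # P(X <= 4) = 4/6

# Continuous case: area under the standard normal PDF up to z = 1.
p_z_at_most_1 = norm.cdf(1)

# The probability of falling between two values is a difference of CDF values.
p_between = norm.cdf(1) - norm.cdf(-1)

print(dict(zip(faces.tolist(), cdf.round(4).tolist())))
print(p_at_most_4, p_z_at_most_1, p_between)
```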

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
![image.png](attachment:image.png)
![image-3.png](attachment:image-3.png)

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

In [2]:
from scipy.stats import norm

In [3]:
norm.cdf(1)

0.8413447460685429

In [4]:
norm.cdf(-1)

0.15865525393145707

In [6]:
(norm.cdf(1) - norm.cdf(-1))*100   # P[-1 <= Z <= 1] = 68.27%     mean +/- 1σ

68.26894921370858

In [7]:
(norm.cdf(2) - norm.cdf(-2))*100   # P[-2 <= Z <= 2] = 95.45%     mean +/- 2σ

95.44997361036415

In [8]:
(norm.cdf(3) - norm.cdf(-3))*100   # P[-3 <= Z <= 3] = 99.73%     mean +/- 3σ

99.73002039367398

# Z - Scores (Standardisation)

![image.png](attachment:image.png)

#### A z-score is a numerical measurement of how far a data point x is from the mean, expressed in standard deviations.
##### It is a standardized measure.

`If you’re asked to find standardized values, use this formula to make your calculations:`
`Standardized value: z = (X − μ) / σ`

`The symbols are:`

`X: the observation (a specific value that you are calculating the z-score for).`

`μ (mu): the mean.`

`σ (sigma): the standard deviation.`
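
As a quick sketch of the calculation, the hypothetical helper `z_score` below applies the formula to illustrative numbers (observation 11, mean 3, standard deviation 4). Passing the resulting z-score to `norm.cdf` should give the same probability as evaluating the original normal distribution directly.

```python
from scipy.stats import norm


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z = z_score(11, mu=3, sigma=4)              # (11 - 3) / 4 = 2.0
p_standard = norm.cdf(z)                    # P[Z <= 2] on the standard normal
p_direct = norm.cdf(11, loc=3, scale=4)     # P[X <= 11] on the original scale

print(z, p_standard, p_direct)
```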

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)


In [16]:
norm.cdf(2)  # cumulative probability at z = 2

0.9772498680518208

In [None]:
# P[X <= 11] when mean = 3, variance = 16 (so standard deviation = 4)

In [15]:
norm.cdf(11 , loc = 3 , scale = 4)    # P[x<=11] 
                                      # = P[(x-3)/4 <= (11-3)/4]
                                      # = P[(x-3)/4 <= (8)/4]
                                      # = P[Z <= 2]

0.9772498680518208

pmf_vs_pdf : https://www.youtube.com/playlist?list=PLTNMv857s9WVzutwxaMb0YZKW7hoveGLS

#### Variance and its square root, the standard deviation, are both measures of the spread or variability in data. Two or more data sets could have the same mean but very different variances.

##### Variance is based on the squared difference between each observation and the mean.

##### It is the average squared deviation: how far each data point is from the center (mean), on average, in squared units.

# Variance 

- `In probability and statistics, variance is the expected value of the squared deviation of a random variable from its mean. Informally, variance measures how far a set of (random) numbers is spread out from their mean value.`

- The value of variance is equal to the square of standard deviation, which is another central tool.

- Variance is symbolically represented by `σ², s², or Var(X)`.


As we know already, the variance is the square of standard deviation, i.e.,


`Variance = (Standard deviation)² = σ²`


The corresponding formulas are hence,

`Population standard deviation σ =`
![image.png](attachment:image.png)

 
 
and
`Sample standard deviation s =`
![image-2.png](attachment:image-2.png)
 
 
 
Where X (or x) = Value of Observations

μ = Population mean of all Values

n = Number of observations in the sample set

![image-3.png](attachment:image-3.png)
 
= Sample mean

N = Total number of values in the population
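
The two formulas differ only in the divisor. The sketch below assumes the usual definitions (divide by N for a population, by n − 1 for a sample) and uses a small set of illustrative values that are not taken from this notebook.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, not from the notebook
n = len(data)
mean = sum(data) / n                                # 5.0
squared_devs = sum((x - mean) ** 2 for x in data)   # 32.0

population_var = squared_devs / n         # divide by N
sample_var = squared_devs / (n - 1)       # divide by n - 1

population_std = math.sqrt(population_var)   # 2.0
sample_std = math.sqrt(sample_var)           # slightly larger than 2.0

print(population_var, population_std)
print(sample_var, sample_std)
```

The sample version is always the larger of the two for the same data, and the gap shrinks as n grows.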

![image.png](attachment:image.png)

In [17]:
# E[X^2] for a fair six-sided die: each squared face value times its probability 1/6
(1*(1/6)) + (4*(1/6))  + (9*(1/6)) + (16*(1/6)) + (25*(1/6)) + (36*(1/6))

15.166666666666666

In [19]:
(91/6) - ((7/2)**2)   # Var(X) = E[X^2] - (E[X])^2, with E[X^2] = 91/6 and E[X] = 7/2

2.916666666666666

In [20]:
35/12

2.9166666666666665

In [None]:
# Variance as an expected value: Var(X) = Σ ((x - E(X))^2) * P(x)

## Expected Value and Variance of a Discrete Random Variable

Expected Value (or mean) of a Discrete Random Variable

For a discrete random variable, the expected value, usually denoted as μ or `E(x)`, is calculated using:

![image.png](attachment:image.png)

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

## Variance of a Discrete Random Variable

![image.png](attachment:image.png)
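
The expected value and variance of a discrete random variable can be computed directly from the outcomes and their probabilities. The sketch below uses a fair six-sided die as the example and `fractions.Fraction` to keep the results exact; the variable names are chosen for illustration only.

```python
from fractions import Fraction

outcomes = range(1, 7)     # faces of a fair six-sided die
p = Fraction(1, 6)         # each face is equally likely

# E(X) = Σ x * P(x)
expected = sum(x * p for x in outcomes)

# Var(X) = Σ (x - E(X))^2 * P(x)
variance = sum((x - expected) ** 2 * p for x in outcomes)

# Equivalent shortcut: Var(X) = E(X^2) - (E(X))^2
shortcut = sum(x**2 * p for x in outcomes) - expected**2

print(expected, variance, shortcut)
```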

# Covariance 

- cov(x, y) = covariance between the variables x and y
- xi = data value of x
- yi = data value of y
- x-bar = mean of x
- y-bar = mean of y
- N = number of data values



![image.png](attachment:image.png)
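
A minimal sketch of the covariance calculation follows, using illustrative values. Since the divisor depends on whether the data are treated as a population (N) or a sample (N − 1), both versions are shown and compared with NumPy's `np.cov`.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])      # illustrative values
y = np.array([2, 4, 6, 8, 10])

# (xi - x-bar) * (yi - y-bar) for every pair of data values
products = (x - x.mean()) * (y - y.mean())

cov_population = products.sum() / len(x)         # divide by N
cov_sample = products.sum() / (len(x) - 1)       # divide by N - 1

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(x, y).
# Its default divides by N - 1; bias=True divides by N.
print(cov_population, np.cov(x, y, bias=True)[0, 1])
print(cov_sample, np.cov(x, y)[0, 1])
```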

## Joint Distribution

In [90]:
import scipy.stats


In [88]:
import scipy.stats

# loc is the mean and scale is the standard deviation;
# calling norm(...) returns a "frozen" distribution with those parameters fixed
scipy.stats.norm(loc=100, scale=12)

# Draw one random number from the distribution
scipy.stats.norm.rvs(loc=100, scale=12)

# Probability that the variable is LESS than or equal to 113:
# use the CDF (Cumulative Distribution Function)
scipy.stats.norm.cdf(113, 100, 12)
# Output: 0.86066975255037792
# or 86.07% probability

# Probability that the variable is GREATER than 125:
# use the SF (Survival Function), which equals 1 - CDF
scipy.stats.norm.sf(125, 100, 12)
# Output: 0.018610425189886332
# or 1.86%

# Value below which a given probability lies, here 98%:
# use the PPF (Percent Point Function), the inverse of the CDF
scipy.stats.norm.ppf(.98, 100, 12)
# Output: 124.64498692758187

124.64498692758187

In [91]:
scipy.stats.norm(loc=100, scale=12)


<scipy.stats._distn_infrastructure.rv_frozen at 0x13b076a7d30>
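
The `rv_frozen` object shown in the output is a distribution with `loc` and `scale` already fixed. As a small sketch, it can be stored in a variable (here the hypothetical name `dist`) and its methods called without repeating the parameters:

```python
import scipy.stats

dist = scipy.stats.norm(loc=100, scale=12)   # frozen: mean 100, standard deviation 12

p_le_113 = dist.cdf(113)    # P[X <= 113]
p_gt_125 = dist.sf(125)     # P[X > 125], equal to 1 - dist.cdf(125)
x_98 = dist.ppf(0.98)       # value with 98% of the probability below it

# ppf is the inverse of cdf, so the round trip recovers the probability.
round_trip = dist.cdf(x_98)

print(p_le_113, p_gt_125, x_98, round_trip)
```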

In [96]:
scipy.stats.norm.rvs(loc=100, scale=12)


86.58028716991166

In [98]:
scipy.stats.norm.cdf(113,100,12)


0.8606697525503779

In [99]:
scipy.stats.norm.sf(125,100,12)


0.018610425189886332

In [100]:
scipy.stats.norm.cdf(125,100,12)


0.9813895748101137

In [101]:
1-0.9813895748101137


0.018610425189886315

In [None]:
0.5,0.2,0.7

In [105]:
0.5*0.2*0.7

0.06999999999999999

In [174]:
arr = [float(x) for x in "1.764052345967664 0.4001572083672233 0.9787379841057392 2.240893199201458 1.8675579901499675 -0.977277879876411 0.9500884175255894 -0.1513572082976979 -0.10321885179355784 0.41059850193837233 0.144043571160878 1.454273506962975 0.7610377251469934 0.12167501649282841 0.44386323274542566 0.33367432737426683 1.4940790731576061 -0.20515826376580087 0.31306770165090136 -0.8540957393017248 -2.5529898158340787 0.6536185954403606 0.8644361988595057 -0.7421650204064419 2.2697546239876076 -1.4543656745987648 0.04575851730144607 -0.1871838500258336 1.5327792143584575 1.469358769900285 0.1549474256969163 0.37816251960217356 -0.8877857476301128 -1.980796468223927 -0.3479121493261526 0.15634896910398005 1.2302906807277207 1.2023798487844113 -0.3873268174079523 -0.30230275057533557 -1.0485529650670926 -1.4200179371789752 -1.7062701906250126 1.9507753952317897 -0.5096521817516535 -0.4380743016111864 -1.2527953600499262 0.7774903558319101 -1.6138978475579515 -0.2127402802139687 -0.8954665611936756 0.38".split()]

In [175]:
arr,len(arr)

([1.764052345967664,
  0.4001572083672233,
  0.9787379841057392,
  2.240893199201458,
  1.8675579901499675,
  -0.977277879876411,
  0.9500884175255894,
  -0.1513572082976979,
  -0.10321885179355784,
  0.41059850193837233,
  0.144043571160878,
  1.454273506962975,
  0.7610377251469934,
  0.12167501649282841,
  0.44386323274542566,
  0.33367432737426683,
  1.4940790731576061,
  -0.20515826376580087,
  0.31306770165090136,
  -0.8540957393017248,
  -2.5529898158340787,
  0.6536185954403606,
  0.8644361988595057,
  -0.7421650204064419,
  2.2697546239876076,
  -1.4543656745987648,
  0.04575851730144607,
  -0.1871838500258336,
  1.5327792143584575,
  1.469358769900285,
  0.1549474256969163,
  0.37816251960217356,
  -0.8877857476301128,
  -1.980796468223927,
  -0.3479121493261526,
  0.15634896910398005,
  1.2302906807277207,
  1.2023798487844113,
  -0.3873268174079523,
  -0.30230275057533557,
  -1.0485529650670926,
  -1.4200179371789752,
  -1.7062701906250126,
  1.9507753952317897,
  -0.509652

In [176]:
import numpy as np
arr = np.array(arr)

In [177]:
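std = arr.std()   # NumPy's default (ddof=0) is the population formula; arr.std(ddof=1) gives the sample standard deviation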
std

1.1133678186681364

In [178]:
mean = arr.mean()
mean

0.12524032797040804

In [179]:
upper = mean + (0.025774048*std)
upper

0.1539363235704159

In [180]:
lower  = mean - (0.025774048*std)

In [181]:
lower,upper

(0.0965443323704002, 0.1539363235704159)

In [182]:
for i in arr:
    if lower < i < upper:
        print("yess")

yess
yess


![image.png](attachment:image.png)