<hr/>

# Introduction to Data Science
**Tamás Budavári** - budavari@jhu.edu <br/>

- Probability Density Function (PDF)
- Cumulative Distribution Function (CDF)
- Moments
- Intro to Programming in Python

<hr/>

### Probability Density Function
- PDF a.k.a. Probability Distribution Density Function
- Probability of $x$ being between $a$ and $b$ for any $(a,b)$ is

> $\displaystyle P_{ab} = \int_a^b p(x)\,dx$

- Always normalized to 1

> $\displaystyle  \int_{-\infty}^{\infty} p(x)\,dx = 1$


- Example 1: uniform distribution on $(a,b)$

> $\displaystyle  U(x;a,b) = \frac{\pmb{1}_{ab}(x)}{b\!-\!a} $,
> where $\pmb{1}_{ab}(x)$ is 1 between $a$ and $b$, but 0 otherwise

- Example 2: Gaussian or normal distribution

> $\displaystyle  G\left(x;\mu,\sigma^2\right) = \frac{{1}}{\sqrt{2\pi\sigma^2}}\ \exp\left[{-\frac{(x\!-\!\mu)^2}{2 \sigma^2} }\right]$

- Example 3: Log-normal
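
The log-normal density is not written out here. For reference, the standard form in the same notation as the Gaussian of Example 2, with the label $L$ chosen purely for illustration, is

> $\displaystyle  L\left(x;\mu,\sigma^2\right) = \frac{1}{x\sqrt{2\pi\sigma^2}}\ \exp\left[-\frac{(\ln x\!-\!\mu)^2}{2 \sigma^2}\right]$ for $x>0$, and 0 otherwise

Here $\mu$ and $\sigma^2$ are the mean and variance of $\ln x$, not of $x$ itself.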

### Gauss on Money!

<!--<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/DEU-10m-anv.jpg/640px-DEU-10m-anv.jpg' width=400 align=left>-->

<img src='files/640px-DEU-10m-anv.jpg' width=400 align=left>

- Even the formula is printed on the banknote

<img src='files/10DM.jpg' width=400 align=left>

### Cumulative Distribution Function
- Integral of the PDF up to a given $x$: the probability of being less than $x$

> $\displaystyle \mathrm{CDF}(x) = \int_{-\infty}^{x} p(t)\,dt$
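
The normalization, the CDF integral, and the interval probability $P_{ab}$ can be checked numerically. The following is a minimal sketch, assuming a Gaussian with mean 0 and standard deviation 2 and using `scipy.integrate.quad`; the names `dist`, `x0`, `a` and `b` and their values are illustrative choices, not part of the notes.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Frozen Gaussian with mean 0 and standard deviation 2 (illustrative values)
dist = norm(loc=0, scale=2)

# Normalization: the PDF integrates to 1 over the whole real line
total, _ = quad(dist.pdf, -np.inf, np.inf)

# CDF at x0 as the integral of the PDF from -infinity up to x0
x0 = 1.5
cdf_numeric, _ = quad(dist.pdf, -np.inf, x0)

# Probability of falling between a and b, computed two ways
a, b = -1.0, 3.0
p_ab_integral, _ = quad(dist.pdf, a, b)
p_ab_cdf = dist.cdf(b) - dist.cdf(a)

print(f'total probability: {total:.6f}')
print(f'CDF({x0}): {cdf_numeric:.6f} vs {dist.cdf(x0):.6f}')
print(f'P_ab: {p_ab_integral:.6f} vs {p_ab_cdf:.6f}')
```

The last line illustrates that $P_{ab} = \mathrm{CDF}(b) - \mathrm{CDF}(a)$, which is usually the cheaper way to get interval probabilities.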

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import scipy
from scipy.stats import uniform

a, b = -1, 1
u = uniform(a, b-a)  # scipy takes loc and scale: uniform on [loc, loc+scale]

x = np.linspace(-6, 6, 1000)

plt.plot(x, u.pdf(x));
plt.plot(x, u.cdf(x));

u.support()

In [None]:
plt.plot(x, u.pdf(x), ':')
plt.plot(x, u.cdf(x), ':')

from scipy.stats import norm as gaussian

g = gaussian(0, 2)  # loc is the mean, scale is the standard deviation (not the variance)

plt.plot(x, g.pdf(x));
plt.plot(x, g.cdf(x));

g.support()

In [None]:
l = scipy.stats.lognorm(1)  # shape s=1: ln(x) is standard normal

plt.plot(x, l.pdf(x), color='C4')

plt.plot(x, u.pdf(x), ':', color='C0')
plt.plot(x, g.pdf(x), '--', color='C2')

l.support()

### Characterization of PDFs

- Expectation value of $X$

> $\displaystyle \mu = \mathbb{E}[X] = \int_{-\infty}^{\infty}\! x\ p(x)\,dx$

- Expectation value of any $f(X)$

> $\displaystyle \mathbb{E}[f(X)] = \int_{-\infty}^{\infty}\! f(x)\,p(x)\,dx$

- Moments 

> $\displaystyle \mathbb{E}[X^k]$
    
- Central moments 

> $\displaystyle \mathbb{E}\big[(X\!-\!\mu)^k\big]$

- Variance

> $\displaystyle \mathbb{Var}[X] = \mathbb{E}\big[(X\!-\!\mu)^2\big]$

- Standard deviation

> $\displaystyle \sigma = \sqrt{\mathbb{Var}[X]}$

- Normalized moments

> $\displaystyle \mathbb{E}\left[\left(\frac{X\!-\!\mu}{\sigma}\right)^k\right]$

- Skewness

> 3rd normalized moment ($k$=3)

- Kurtosis

> 4th normalized moment ($k$=4)
 


<img src="files/skew_kurt.png" width=400 align=left>
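
The definitions above translate directly into numerical integrals. The sketch below assumes a Gaussian with mean 1 and standard deviation 2; the helper names `expectation` and `normalized_moment` are hypothetical and introduced only for this illustration. One caveat: scipy's `stats(moments='mvsk')` reports the excess kurtosis, i.e. the 4th normalized moment minus 3.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

dist = norm(loc=1, scale=2)  # illustrative: mean 1, standard deviation 2

def expectation(f, dist):
    """E[f(X)] as the integral of f(x) p(x) over the real line."""
    value, _ = quad(lambda x: f(x) * dist.pdf(x), -np.inf, np.inf)
    return value

mu = expectation(lambda x: x, dist)             # first moment
var = expectation(lambda x: (x - mu)**2, dist)  # second central moment
sigma = np.sqrt(var)

def normalized_moment(k):
    """k-th normalized moment E[((X - mu) / sigma)^k]."""
    return expectation(lambda x: ((x - mu) / sigma)**k, dist)

skew = normalized_moment(3)
kurt = normalized_moment(4)

# scipy's values for comparison; its 'k' is the excess kurtosis
m, v, s, k = dist.stats(moments='mvsk')
print(mu, var, skew, kurt)
print(m, v, s, k)
```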

In [None]:
# mean, variance, skewness, excess kurtosis (scipy subtracts 3, so a Gaussian gives 0)
g.stats(moments='mvsk')

In [None]:
u.stats(moments='mvsk')

In [None]:
l.stats(moments='mvsk')

In [None]:
# multiple gaussians with the same standard deviation
gaussian([0,1,2],3).stats(moments='mv')

<h1><font color="darkblue">Python by Examples</font></h1>

- tuple, list, function, class, for, map,  lambda, import

- numpy, matplotlib 

In [None]:
# tuple
t = (1,'asdf')   # elements can have different types
t = 100, 0.1     # parentheses are optional
N, mu = t        # unpacking into separate variables
print (N)

In [None]:
# list
l = [1, 2, 3, 4, 5]

# numpy array
a = np.array([l, l], dtype=np.float64)
a

In [None]:
a.shape

In [None]:
# function
def f(x, k=2):
    return x**k

f3 = f(3)
print (f3)
f(2), f(2,2), f(2,3), f(2,k=4), f3

In [None]:
import math

# object-oriented programming
class Robot(object):
    
    def __init__(self, name, x=0, y=0, angle=0):
        self.name, self.x, self.y, self.angle = name, x, y, angle
        self.path = [(x,y)]
    
    def move(self, l=1):
        self.x += l * math.cos(self.angle)
        self.y += l * math.sin(self.angle)
        self.path.append((self.x, self.y))
        
    def left(self, a=math.pi/2):
        self.angle += a
        
    def right(self, a=math.pi/2):
        self.left(-a)

In [None]:
r = Robot('R2D2')
r.move()    # by 1 unit
r.left()    # 90 degrees
r.move(0.2)
r.left()
r.move(0.4)
r.right(np.pi/4)
r.move()

In [None]:
r.path # complete history

In [None]:
x, y = (c for c in zip(*r.path)) # unhomework to understand this line
plt.plot(x, y, 'ro-');
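
As a hint for the `zip(*r.path)` line: the star unpacks the list of points into separate arguments, and `zip` then groups all first elements together and all second elements together. A minimal sketch, using a made-up `path` list in place of `r.path`:

```python
# A made-up path of (x, y) points, standing in for r.path
path = [(0, 0), (1, 0), (1, 0.2), (0.6, 0.2)]

# zip(*path) is the same as zip(path[0], path[1], path[2], path[3]):
# it collects all first elements, then all second elements
xs, ys = zip(*path)

print(xs)  # (0, 1, 1, 0.6)
print(ys)  # (0, 0, 0.2, 0.2)
```

The generator expression `(c for c in zip(...))` only passes these two tuples through unchanged, so `x, y = zip(*r.path)` would give the same result.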

In [None]:
plt.plot(x, y, 'ro-', label=f"{r.name}'s path")
plt.legend()
plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.grid()
plt.savefig('robot.png', dpi=200)
plt.savefig('robot.pdf')

In [None]:
# lambda expressions
# (named square rather than g, so the Gaussian g defined earlier is not overwritten)
square = lambda x: x*x
square(2)

In [None]:
# using standard math 
import math

math.pi, math.sin(1.57)

In [None]:
# using numpy math
np.pi, np.sin(1.57)

In [None]:
# numpy functions also work on lists and arrays, elementwise
np.sin( [1.57, 3.14, np.pi] ) 

In [None]:
# arrays: vectors and matrices
import numpy as np

l = [1, 2, 3]
a = np.array([l, l], dtype=np.int32)

In [None]:
a

In [None]:
print (a.shape)
print (a.T)

In [None]:
[l, l]

In [None]:
a * a 

In [None]:
a.dot(a) # why does this fail?
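
One way to see why the product fails is to look at the shapes: a matrix product needs the number of columns of the first factor to match the number of rows of the second. A small sketch that rebuilds the same `a` and catches the error; the `failed` flag is only there for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.int32)  # shape (2, 3)

try:
    a.dot(a)  # (2, 3) times (2, 3): inner dimensions 3 and 2 differ
    failed = False
except ValueError as err:
    failed = True
    print(f'ValueError: {err}')

# Transposing one factor makes the inner dimensions agree
print(a.T.dot(a).shape)  # (3, 2) times (2, 3) gives (3, 3)
print(a.dot(a.T).shape)  # (2, 3) times (3, 2) gives (2, 2)
```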

In [None]:
b = a.T.dot(a)
b

In [None]:
a.T @ a

In [None]:
# slicing arrays
print (b)
b[:2, 1:2]

In [None]:
b < 5

In [None]:
b[b < 5]

In [None]:
i,j = np.where(b < 5)

print (f'i: {i}')
print (f'j: {j}')
print (f'elements: {b[i,j]}')
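
Boolean masks and `np.where` are two views of the same selection. The sketch below hard-codes the matrix that `a.T.dot(a)` produces for the `a` above, so that it runs on its own; `mask` is a name introduced here for illustration.

```python
import numpy as np

# Same values as b = a.T.dot(a) for a = [[1, 2, 3], [1, 2, 3]]
b = np.array([[2, 4, 6], [4, 8, 12], [6, 12, 18]])

mask = b < 5           # boolean array with the same shape as b
i, j = np.where(mask)  # row and column indices of the True entries

print(mask)
print(list(zip(i, j)))   # index pairs of the selected elements
print(b[mask], b[i, j])  # both select the same elements
```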

In [None]:
# componentwise operations
print (np.sin(l))

# slow python loop
for s in map(math.sin, l): 
    print (s)

In [None]:
[math.sin(x) for x in l] # a list comprehension is a little better, but still not numpy speed
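
The speed difference only shows up on larger inputs. A rough way to measure it, assuming an arbitrary test array of 100,000 values and the standard `timeit` module; the function names are made up for this sketch and the timings depend on the machine:

```python
import math
import timeit
import numpy as np

values = np.linspace(0, 1, 100_000)  # arbitrary test input

def with_loop():
    # Python-level loop: one math.sin call per element
    return [math.sin(v) for v in values]

def with_numpy():
    # single vectorized call over the whole array
    return np.sin(values)

# Both approaches give the same numbers
same = np.allclose(with_loop(), with_numpy())

# Time a few repetitions of each
t_loop = timeit.timeit(with_loop, number=5)
t_numpy = timeit.timeit(with_numpy, number=5)
print(f'same results: {same}')
print(f'list comprehension: {t_loop:.4f} s, numpy: {t_numpy:.4f} s')
```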