### Importing Libraries

In [2]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# SciPy

SciPy provides a large collection of functions for scientific and engineering applications, and many of them operate on NumPy arrays. In this course we will mainly use the scipy.stats submodule, which computes density and mass functions, cumulative distribution functions, and quantile functions for many probability distributions. We will use these functions in the second and third courses of this series, when we do more formal statistical analysis. If you are not familiar with probability distributions, you can skip this section.
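The three kinds of functions mentioned above map onto methods of a "frozen" scipy.stats distribution: pdf (continuous density), pmf (discrete mass), cdf (cumulative probability) and ppf (quantile function). The sketch below is a minimal illustration; the normal and Poisson parameters are arbitrary choices for demonstration, not values used elsewhere in the course.

```python
import numpy as np
from scipy import stats

# Freeze a normal distribution with mean 0 and standard deviation 1
normal = stats.norm(loc=0, scale=1)

density = normal.pdf(0.0)      # height of the density curve at 0
cum_prob = normal.cdf(1.96)    # P(X <= 1.96)
quantile = normal.ppf(0.975)   # x such that P(X <= x) = 0.975

# Discrete distributions expose pmf instead of pdf (illustrative Poisson, mean 3)
poisson = stats.poisson(mu=3)
mass = poisson.pmf(0)          # P(X = 0)

print(density, cum_prob, quantile, mass)
```

Note that ppf undoes cdf: the 0.975 quantile is about 1.96, and the CDF at 1.96 is about 0.975.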


As a starting example, suppose we want to evaluate the cumulative distribution function (CDF) of the standard normal distribution at zero. Since zero is the median of the standard normal distribution, the resulting cumulative probability should be 1/2.

- The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a specific value.

In [3]:
stats.norm.cdf(0)

0.5

- The CDF at 0, stats.norm.cdf(0), calculates the probability that a randomly chosen value from this distribution is less than or equal to 0.
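Because the CDF is the area under the density curve up to a point, the same value can be recovered by numerically integrating the standard normal density from negative infinity to 0. This sketch uses scipy.integrate.quad, which is not part of the original notebook and is used here only as an independent check.

```python
import numpy as np
from scipy import stats, integrate

# Area under the standard normal density from -infinity up to 0
area, abs_err = integrate.quad(stats.norm.pdf, -np.inf, 0)

# The CDF evaluated directly at 0 should agree with the integral
direct = stats.norm.cdf(0)

print(area, direct)
```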

Below are some additional examples of working with probability distributions:

In [6]:
# The median of a standard Student's t distribution with 10 degrees of freedom
print(stats.t(10).ppf(0.5))

# The 97.5 percentile of a standard Student's t distribution with 5 degrees of freedom
print(stats.t(5).ppf(0.975))

# The probability that a standard exponential value is less than or equal to 3
print(stats.expon.cdf(3))

# The height of the standard normal density function at 1
print(stats.norm.pdf(1))

# The probability of getting exactly 3 heads in 10 flips of a fair coin
print(stats.binom(10, 0.5).pmf(3))

# The probability of getting 3 or fewer heads in 10 flips of a fair coin
print(stats.binom(10, 0.5).cdf(3))

6.805747424058503e-17
2.570581835636314
0.950212931632136
0.24197072451914337
0.11718749999999999
0.171875
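A few relationships make these outputs easier to read. The t median prints as 6.805747424058503e-17 rather than 0 only because of floating-point rounding. ppf is the inverse of cdf, a binomial CDF is the running sum of its PMF (so 0.171875 = 176/1024), and the survival function sf gives the complementary upper tail. The sketch below checks each of these with the same distributions used above.

```python
import numpy as np
from scipy import stats

t10 = stats.t(10)

# ppf is the inverse of cdf: cdf(ppf(q)) recovers q
q = 0.975
roundtrip = t10.cdf(t10.ppf(q))

# The tiny printed median is zero up to floating-point rounding
median_is_zero = np.isclose(t10.ppf(0.5), 0.0)

# P(X <= 3) for 10 fair coin flips equals P(X=0) + P(X=1) + P(X=2) + P(X=3)
coin = stats.binom(10, 0.5)
pmf_sum = coin.pmf(np.arange(4)).sum()

# sf (survival function) is 1 - cdf, i.e. P(X > 3)
upper_tail = coin.sf(3)

print(roundtrip, median_is_zero, pmf_sum, upper_tail)
```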


## Data Solution 360

### Data Standardization

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

In [4]:
data = {'Price': [250000, 300000, 500000, 800000, 200000],
        'Size': [1200, 1800, 2500, 3200, 1500]}


df = pd.DataFrame(data)
print(df)


    Price  Size
0  250000  1200
1  300000  1800
2  500000  2500
3  800000  3200
4  200000  1500


In [5]:
# Create a StandardScaler object
scaler = StandardScaler()

# Standardize both columns to mean 0 and standard deviation 1
df[["Price", "Size"]] = scaler.fit_transform(df[["Price", "Size"]])

In [6]:
df

      Price      Size
0 -0.727273 -1.162192
1 -0.500000 -0.332055
2  0.409091  0.636438
3  1.772727  1.604931
4 -0.954545 -0.747123
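StandardScaler applies z = (x - mean) / std to each column, where std is the population standard deviation (ddof=0). The sketch below reproduces the table by hand and inspects the fitted mean_ and scale_ attributes. For Price, the mean is 410000 and the standard deviation is 220000, so the first row is (250000 - 410000) / 220000 ≈ -0.727273. The sketch also shows that inverse_transform maps the scaled values back to the original units.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'Price': [250000, 300000, 500000, 800000, 200000],
                   'Size': [1200, 1800, 2500, 3200, 1500]})

# Standardize by hand: subtract the column mean, divide by the population std
manual = (df - df.mean()) / df.std(ddof=0)

# Same transformation with scikit-learn
scaler = StandardScaler()
scaled = scaler.fit_transform(df)

# Per-column mean and std learned during fit
print(scaler.mean_, scaler.scale_)
print(manual.round(6))

# Undo the scaling to recover the original values
restored = scaler.inverse_transform(scaled)
```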


In [7]:
df.describe()

          Price      Size
count  5.000000  5.000000
mean   0.000000  0.000000
std    1.118034  1.118034
min   -0.954545 -1.162192
25%   -0.727273 -0.747123
50%   -0.500000 -0.332055
75%    0.409091  0.636438
max    1.772727  1.604931
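The std row shows 1.118034 rather than 1 because the two libraries use different denominators. StandardScaler scales to unit population standard deviation (ddof=0), while the pandas std used by describe() defaults to the sample standard deviation (ddof=1). With n = 5 rows, the ratio between the two is sqrt(5/4) ≈ 1.118034. A minimal check on the Price column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Price column as a single-feature 2-D array, as StandardScaler expects
x = np.array([[250000.0], [300000.0], [500000.0], [800000.0], [200000.0]])
z = StandardScaler().fit_transform(x).ravel()
n = len(z)

pop_std = z.std(ddof=0)     # what StandardScaler targets
sample_std = z.std(ddof=1)  # what DataFrame.describe() reports
ratio = np.sqrt(n / (n - 1))

print(pop_std, sample_std, ratio)
```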
