# NumPy Basics 3

# **📚Section 5-Statistical Analysis<a name='statistical_analysis'></a>**

Although NumPy is not a statistical analysis library, it does provide several descriptive statistics functions. The NumPy documentation groups them under “Order statistics”, “Averages and variances”, “Correlating” and “Histograms”, but all of these are descriptive statistics.

In [None]:
import numpy as np

# Creating a NumPy array
data = np.array([1, 2, 3, 4, 5])

# Calculate mean and median
mean_value = np.mean(data)
median_value = np.median(data)

# Calculate variance and standard deviation
# (NumPy defaults to the population versions, i.e. ddof=0)
variance_value = np.var(data)
std_deviation_value = np.std(data)
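The cell above uses NumPy's default population formulas. The sketch below, which reuses the same five-element array, shows how `ddof=1` switches to the sample versions, and touches the two other categories mentioned earlier (order statistics and histograms). The variable names are only illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Population variance (default, ddof=0) divides by n
pop_var = np.var(data)
# Sample variance (ddof=1) divides by n - 1
sample_var = np.var(data, ddof=1)
print("Population variance:", pop_var)   # 2.0
print("Sample variance:", sample_var)    # 2.5

# Order statistics: 25th, 50th and 75th percentiles
quartiles = np.percentile(data, [25, 50, 75])
print("Quartiles:", quartiles)

# Histogram: counts per bin and the bin edges
counts, edges = np.histogram(data, bins=2)
print("Counts:", counts, "Edges:", edges)
```

Use `ddof=1` when the array is a sample drawn from a larger population and you want an unbiased variance estimate.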

## **5.2-Working with Distributions**

NumPy's `np.random` module can draw samples from many probability distributions (normal, uniform, binomial, Poisson and others), which makes it easier to model and simulate data.

In [None]:
# Background Math Knowledge

# Normal Distribution: https://www.mathsisfun.com/data/standard-normal-distribution.html
# Common distributions: https://blog.dailydoseofds.com/p/nine-most-important-distributions

import numpy as np
# Generating random samples from a normal distribution
random_samples = np.random.normal(loc=0, scale=1, size=1000)
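Random draws differ on every run unless a seed is fixed. A minimal sketch of reproducible sampling with NumPy's `default_rng` generator follows; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# A seeded generator returns the same samples on every run
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0, scale=1, size=1000)

# Re-creating the generator with the same seed reproduces the draws
rng_again = np.random.default_rng(seed=42)
samples_again = rng_again.normal(loc=0, scale=1, size=1000)

print("Identical draws:", np.array_equal(samples, samples_again))
print("Sample mean:", samples.mean())  # close to loc=0
print("Sample std:", samples.std())    # close to scale=1
```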

## **5.3-Hypothesis Testing**

In [None]:
# Background Math Knowledge

# https://blog.minitab.com/en/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics

# One-sample t-test: is the mean of `data` (defined earlier) equal to 3?
from scipy.stats import ttest_1samp
t_statistic, p_value = ttest_1samp(data, popmean=3)

In [None]:
p_value
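To see where the t-statistic comes from, the sketch below recomputes it by hand and compares it with SciPy's result. It assumes `data` is still the array `[1, 2, 3, 4, 5]` from the first cell. Because the sample mean equals the hypothesized mean of 3, the statistic is 0 and the p-value is 1, so there is no evidence against the null hypothesis.

```python
import numpy as np
from scipy.stats import ttest_1samp

data = np.array([1, 2, 3, 4, 5])
popmean = 3

# t = (sample mean - hypothesized mean) / (sample std / sqrt(n))
n = data.size
standard_error = data.std(ddof=1) / np.sqrt(n)
t_manual = (data.mean() - popmean) / standard_error

t_scipy, p_value = ttest_1samp(data, popmean=popmean)
print("Manual t:", t_manual)
print("SciPy t:", t_scipy)
print("p-value:", p_value)
```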

## **5.4-Correlation and Regression** (Optional)

In [None]:
# Background Math Knowledge

# Simple Linear Regression: https://online.stat.psu.edu/stat501/lesson/1

data1 = np.array([3, 1, 5, 2, 4])
data2 = np.array([3, 1, 6, 7, 4])

# Calculate correlation coefficient
correlation_coefficient = np.corrcoef(data1, data2)[0, 1]
# Linear regression
slope, intercept = np.polyfit(data1, data2, 1)

In [None]:
intercept
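The fitted slope and intercept can be used to predict values and to check the quality of the fit. The following is a minimal sketch with the same two arrays. For a simple linear regression with an intercept, the coefficient of determination equals the squared correlation coefficient, which the last line checks.

```python
import numpy as np

data1 = np.array([3, 1, 5, 2, 4])
data2 = np.array([3, 1, 6, 7, 4])

# Fit y = slope * x + intercept by least squares
slope, intercept = np.polyfit(data1, data2, 1)

# Predicted values and residuals
predicted = slope * data1 + intercept
residuals = data2 - predicted

# Coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((data2 - data2.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

r = np.corrcoef(data1, data2)[0, 1]
print("R^2:", r_squared, "r^2:", r ** 2)
```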

## **5.5-Data Transformation**

In [None]:
# Data Centering
data = np.array([10, 20, 30, 40, 50])
mean = np.mean(data)
centered_data = data - mean

In [None]:
# Standardization
std_dev = np.std(data)
standardized_data = (data - mean) / std_dev

In [None]:
# Log Transformation
log_transformed_data = np.log(data)
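A quick check, sketched below, confirms that standardized data has mean 0 and standard deviation 1. It also shows `np.log1p`, which computes log(1 + x) and is a common choice when the data contains zeros, where `np.log` returns `-inf`. The `counts` array is made up for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Standardization: subtract the mean, divide by the standard deviation
standardized = (data - data.mean()) / data.std()
print("Mean:", standardized.mean())  # approximately 0
print("Std:", standardized.std())    # approximately 1

# log1p(x) = log(1 + x), defined at x = 0
counts = np.array([0, 9, 99])  # hypothetical data containing a zero
log_counts = np.log1p(counts)
print("log1p:", log_counts)
```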

## **5.6-Random Sampling**

Random sampling involves selecting a subset of data points from a larger dataset. NumPy also provides tools for generating random numbers from various probability distributions.

In [None]:
# Simple Random Sampling Without replacement
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
random_samples = np.random.choice(data, size=5, replace=False)

In [None]:
# Bootstrap Sampling: each row is one resample of len(data) values, drawn with replacement
num_samples = 1000
bootstrap_samples = np.random.choice(data, size=(num_samples, len(data)), replace=True)
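A typical use of bootstrap samples is to estimate the uncertainty of a statistic. The sketch below computes the mean of every resample and takes the 2.5th and 97.5th percentiles as an approximate 95% confidence interval (the percentile method). The seeded generator is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
num_samples = 1000

# Each row is one bootstrap resample of the original data
bootstrap_samples = rng.choice(data, size=(num_samples, len(data)), replace=True)

# Mean of each resample, then the percentile interval
bootstrap_means = bootstrap_samples.mean(axis=1)
ci_low, ci_high = np.percentile(bootstrap_means, [2.5, 97.5])
print("Sample mean:", data.mean())
print("95% bootstrap CI:", ci_low, ci_high)
```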

## **5.7-Measures of central tendency** (Optional)

In [None]:
import random

In [None]:
!pip install scipy

In [None]:
import scipy.stats as sp

In [None]:
poisson1 = sp.poisson(5).rvs(1000)
poisson2 = sp.poisson(50).rvs(1000)

In [None]:
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True)
fig.suptitle('Sampling from Poisson distribution')
ax1.hist(poisson1, bins=10)
ax1.set_title("Expectation of interval: 5")
ax2.hist(poisson2, bins=10)
ax2.set_title("Expectation of interval: 50");

In [None]:
import numpy as np
# Sample data
data = np.array([10, 20, 30, 40, 50, 20, 30, 40, 60])
# Mean
mean = np.mean(data)
print("Mean:", mean)
# Median
median = np.median(data)
print("Median:", median)
# Mode (np.bincount needs non-negative integers; on ties argmax returns the smallest value)
mode = np.argmax(np.bincount(data))
print("Mode:", mode)
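In this sample, 20, 30 and 40 each occur twice, so the data is multimodal and the `bincount` approach reports only the smallest mode. A small sketch using `np.unique`, which also works for non-integer data, returns all of them:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 20, 30, 40, 60])

# Unique values and how often each one occurs
values, counts = np.unique(data, return_counts=True)

# Keep every value whose count equals the maximum count
modes = values[counts == counts.max()]
print("Modes:", modes)  # [20 30 40]
```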

## **5.8-Measures of dispersion** (Optional)

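Measures of dispersion describe how spread out (stretched or squeezed) a distribution is; common examples are the range, the variance and the standard deviation. The cell below prints correlation and covariance matrices for the Poisson samples generated earlier. The diagonal of a covariance matrix holds each variable's variance.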

In [None]:
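# Reuses poisson1 and poisson2 from section 5.7
rand_matrix = np.random.rand(5, 5)
print(f"Pearson correlation matrix (poisson1 vs poisson2):\n{np.corrcoef(poisson1, poisson2)}\n")
# np.correlate defaults to mode='valid': for equal-length inputs it returns one sum of products
print(f"Cross-correlation (unnormalized):\n{np.correlate(poisson1, poisson2)}\n")
print(f"Covariance matrix (poisson1 vs poisson2):\n{np.cov(poisson1, poisson2)}\n")
# For a 2-D input, each row is treated as one variable
print(f"Pearson correlation matrix (rows of rand_matrix):\n{np.corrcoef(rand_matrix)}\n")
print(f"Covariance matrix (rows of rand_matrix):\n{np.cov(rand_matrix)}")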
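For single-variable measures of dispersion, the following sketch computes the range, interquartile range, sample variance and sample standard deviation. It uses the small array from section 5.7 rather than the random samples, so the results are deterministic.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 20, 30, 40, 60])

# Range: difference between the largest and smallest value
data_range = data.max() - data.min()

# Interquartile range: spread of the middle 50% of the data
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Sample variance and standard deviation (ddof=1)
variance = np.var(data, ddof=1)
std_dev = np.std(data, ddof=1)

print("Range:", data_range)  # 50
print("IQR:", iqr)           # 20.0
print("Variance:", variance)
print("Standard deviation:", std_dev)
```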