# Generating random numbers from probability distributions

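|Prefix|Returns|
|:---:|:---:|
|r|Random numbers drawn from the distribution|
|d|Probability density (continuous) or probability mass (discrete) at z, $f(z)$|
|p|Cumulative distribution function (CDF), $P[X\leq z]$|
|q|Quantile function, the inverse of the CDF|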
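The four prefixes are linked: `q` inverts `p`, and for a discrete distribution `p` is the running sum of `d` over the support. A minimal check using the normal and binomial functions (the argument values are arbitrary):

```r
# q* is the inverse of p*: qnorm(pnorm(z)) recovers z
z <- c(-1.5, 0, 0.7)
stopifnot(isTRUE(all.equal(qnorm(pnorm(z)), z)))

# For a discrete distribution, p* is the cumulative sum of d*
cdf_by_sum <- cumsum(dbinom(0:10, size = 10, prob = 0.3))
stopifnot(isTRUE(all.equal(cdf_by_sum, pbinom(0:10, size = 10, prob = 0.3))))

# r* draws random values; set.seed makes the draws reproducible
set.seed(1)
x <- rnorm(5)
print(x)
```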

## Normal distribution, $X \sim N(\mu,\sigma^{2})$

[Normal distribution](https://zh.wikipedia.org/wiki/正态分布): $E = \mu, Var = \sigma^2$, probability density function: $f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
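To connect the formula with R, a small sketch comparing a hand-written density with the built-in `dnorm`; the helper `normal_pdf` is hypothetical and defined only for this check:

```r
# Hypothetical helper: the normal PDF written out from the formula above
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 5, by = 0.5)
# Should agree with dnorm up to floating-point error
stopifnot(isTRUE(all.equal(normal_pdf(x, mu = 1, sigma = 2), dnorm(x, mean = 1, sd = 2))))
```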

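Kolmogorov-Smirnov (K-S) test, here checking whether 100 draws from rnorm are consistent with the standard normal CDF pnorm; a large p-value means no evidence of departure from normality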

In [49]:
ks.test(rnorm(100),'pnorm',alternative = 'two.sided')


	One-sample Kolmogorov-Smirnov test

data:  rnorm(100)
D = 0.061678, p-value = 0.8413
alternative hypothesis: two-sided


#### rnorm(n, mean = 0, sd = 1)

In [205]:
print(rnorm(10,0,1))  # equivalent to rnorm(10, mean = 0, sd = 1)

 [1]  0.9041245 -0.4704379  1.2161526  1.5277645 -0.8121163 -0.7178396
 [7] -1.2273338 -0.6300887  0.5166825 -0.7965553


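#### dnorm(x, mean = 0, sd = 1, log = FALSE): returns the normal probability density; dnorm(z) is the standard normal density f(x) evaluated at x = z. With log = TRUE it returns log(f(x)), the log-density, not a different distribution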

In [21]:
dnorm(1, mean = 1, sd = 1, log = FALSE)
dnorm(0)
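A short sketch of what `log = TRUE` actually returns, and why it is useful far in the tail where the plain density underflows:

```r
# log = TRUE returns the natural log of the density
stopifnot(isTRUE(all.equal(dnorm(3, log = TRUE), log(dnorm(3)))))

# Far in the tail the density underflows to 0, but the log-density stays finite
dnorm(40)              # 0 (underflow)
dnorm(40, log = TRUE)  # equals -800 - 0.5 * log(2 * pi)
```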

#### pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): returns the normal distribution function; e.g. pnorm(z) equals P[X≤z]

In [22]:
pnorm(1, mean = 1, sd = 1, log.p = FALSE)
pnorm(0)

#### qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): returns the quantile corresponding to probability p

In [30]:
qnorm(.85, mean = 1, sd = 1, log.p = FALSE)
qnorm(.5)

In [35]:
qnorm(c(0.05, 0.95),mean=1000,sd=100)    # bounds of the central 90% interval of N(1000, 100^2)
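The same bounds can be written as the mean plus or minus a standard-normal quantile times sd, and `lower.tail = FALSE` reads probabilities or quantiles from the upper tail directly. A sketch reusing the N(1000, 100^2) example:

```r
# Central 90% interval of N(1000, 100^2): 5% in each tail
bounds <- qnorm(c(0.05, 0.95), mean = 1000, sd = 100)

# Equivalently mean -/+ z * sd, with z the 95% standard-normal quantile (about 1.645)
z <- qnorm(0.95)
stopifnot(isTRUE(all.equal(bounds, 1000 + c(-1, 1) * z * 100)))

# lower.tail = FALSE works from the upper tail
stopifnot(isTRUE(all.equal(qnorm(0.05, lower.tail = FALSE), z)))
stopifnot(isTRUE(all.equal(pnorm(bounds[2], 1000, 100, lower.tail = FALSE), 0.05)))
```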

## Uniform distribution, $X \sim U(a, b)$
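The uniform distribution on [a, b] has $E = \frac{a+b}{2}, Var = \frac{(b-a)^2}{12}$ and density $\frac{1}{b-a}$. Its R functions follow the same r/d/p/q pattern, with `min` and `max` playing the roles of a and b. A short sketch (the interval [2, 6] is arbitrary):

```r
a <- 2; b <- 6

# Density is 1 / (b - a) inside [a, b]
stopifnot(dunif(3, min = a, max = b) == 1 / (b - a))

# CDF is linear: P[X <= x] = (x - a) / (b - a)
stopifnot(isTRUE(all.equal(punif(5, min = a, max = b), (5 - a) / (b - a))))

# Quantile inverts the CDF; the median is (a + b) / 2
stopifnot(qunif(0.5, min = a, max = b) == (a + b) / 2)

# Random draws all fall inside [a, b]
u <- runif(100, min = a, max = b)
stopifnot(all(u >= a & u <= b))
```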

## Binomial distribution (n Bernoulli trials), $X \sim B(n, p)$

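The discrete distribution of the number of successes in n independent Bernoulli trials, each with success probability p, is the [binomial distribution](https://zh.wikipedia.org/wiki/%E4%BA%8C%E9%A0%85%E5%88%86%E4%BD%88): $E = np, Var = np(1-p)$, probability mass function: ${\displaystyle f(k;n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}}$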
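The stated mean and variance can be recovered numerically from `dbinom` by summing over the whole support 0..n. A sketch using the n = 500, p = 0.4 example used below:

```r
n <- 500; p <- 0.4
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# The PMF sums to 1 over the support 0..n
stopifnot(isTRUE(all.equal(sum(pmf), 1)))

# Mean and variance computed from the PMF match np and np(1 - p)
mu <- sum(k * pmf)
v  <- sum((k - mu)^2 * pmf)
stopifnot(isTRUE(all.equal(mu, n * p)), isTRUE(all.equal(v, n * p * (1 - p))))
```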

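#### rbinom(n, size, prob): draws n random values, each the number of successes in size trials with success probability prob; size is the number of trials (zero or more)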

In [97]:
print(rbinom(8,500,.4))

[1] 183 217 192 209 210 197 193 192


#### dbinom(x, size, prob, log = FALSE): gives the probability mass at each point, returns&ensp;$P[X = x]$

In [99]:
print(dbinom(200,500,.4))

[1] 0.03639907


In [101]:
choose(500,200)*0.4^200*0.6^300    # same value computed directly with the binomial coefficient choose()

#### pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE): gives the cumulative probability, returns&ensp;$P[X\leq q]$

In [108]:
print(pbinom(200,500,.4))

[1] 0.5194108


#### qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE): returns the quantile (inverse CDF)

In [107]:
print(qbinom(.5, 500, .4))

[1] 200


## Geometric distribution, $X\sim G(p)$

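$E = \frac{1}{p}, Var = \frac{1-p}{p^2}$, probability mass function: $\Pr(X=k)=(1-p)^{k-1}p$, k = 1, 2, ...: the total number of trials in a sequence of Bernoulli trials run until the first success follows the [geometric distribution](https://zh.wikipedia.org/wiki/%E5%B9%BE%E4%BD%95%E5%88%86%E4%BD%88). R's geom functions instead count the failures before the first success, x = k - 1 = 0, 1, 2, ..., with $\Pr(X=x)=(1-p)^{x}p$ and mean $\frac{1-p}{p}$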
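Because R counts failures rather than trials, the trials-based PMF above is `dgeom` shifted by one. A sketch checking the shift and the failures-based mean $(1-p)/p$ (the tail beyond 2000 is assumed negligible for p = 0.3):

```r
p <- 0.5
k <- 1:10   # number of trials, as in the formula above

# Trials-based PMF (1 - p)^(k - 1) * p equals R's failures-based dgeom at k - 1
stopifnot(isTRUE(all.equal((1 - p)^(k - 1) * p, dgeom(k - 1, prob = p))))

# So P(first success on trial 1) = p is dgeom(0, p), while dgeom(1, p) = p * (1 - p)
stopifnot(isTRUE(all.equal(dgeom(0, 0.5), 0.5)), isTRUE(all.equal(dgeom(1, 0.5), 0.25)))

# Mean of R's failure count is (1 - p) / p, i.e. 1/p - 1
x <- 0:2000
stopifnot(isTRUE(all.equal(sum(x * dgeom(x, prob = 0.3)), (1 - 0.3) / 0.3)))
```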

#### rgeom(n, prob)

In [137]:
print(rgeom(5, .5))

[1]  1  0 10  0  1


#### dgeom(x, prob, log = FALSE)

In [153]:
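dgeom(1, .5)    # 0.25, not 0.5: R counts failures, so this is P(one failure, then success) = 0.5 * 0.5; the trials-based P(X=1) = p is dgeom(0, .5)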

#### pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)

In [151]:
pgeom(6, .5)

#### qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)

In [156]:
qgeom(.5, prob = .5)

## Hypergeometric distribution, $X \sim H(n, K, N)$

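$E = n\frac{K}{N}, Var = \displaystyle n{K \over N}{(N-K) \over N}{N-n \over N-1}$, probability mass function: $f(k;n,K,N)={{{K \choose k}{{N-K} \choose {n-k}}} \over {N \choose n}}$. When n items are drawn without replacement from N items of which K are marked, the number k of marked items drawn follows the [hypergeometric distribution](https://zh.wikipedia.org/wiki/%E8%B6%85%E5%87%A0%E4%BD%95%E5%88%86%E5%B8%83); with n = 1 it reduces to the Bernoulli distribution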

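R's parameters: m = number of white balls, n = number of black balls, k = number of balls drawn (so m = K, n = N - K, and k is the n of the formula above)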
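Since R's `n` and `k` mean something different from the formula's n and k, a sketch checking the mapping may help. The variable names `N`, `K`, `n_draw` and `obs` are illustrative, and the numbers match the 5 white / 7 black / 4 draws example below:

```r
# Formula parameters: population N, K marked items, n_draw draws
N <- 12; K <- 5; n_draw <- 4
obs <- 0:4   # possible numbers of marked items drawn

pmf_formula <- choose(K, obs) * choose(N - K, n_draw - obs) / choose(N, n_draw)

# R's dhyper(x, m, n, k) uses m = K, n = N - K, k = n_draw
stopifnot(isTRUE(all.equal(pmf_formula, dhyper(obs, m = K, n = N - K, k = n_draw))))

# Mean from the PMF agrees with n * K / N
stopifnot(isTRUE(all.equal(sum(obs * pmf_formula), n_draw * K / N)))
```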

#### rhyper(nn, m, n, k)

In [171]:
print(rhyper(4, 10, 5, 5))

[1] 2 3 3 3


#### dhyper(x, m, n, k, log = FALSE)

In [172]:
dhyper(2, 5, 7, 4)  # 5 white, 7 black, draw 4: probability of exactly 2 white

#### phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

In [175]:
dhyper(0, 5, 7, 4) + dhyper(1, 5, 7, 4) + dhyper(2, 5, 7, 4)
phyper(2, 5, 7, 4)

#### qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)

In [179]:
qhyper(.42, 5, 7, 4)
qhyper(.43, 5, 7, 4)    # smallest x with P[X≤x] ≥ p; P[X≤1] ≈ 0.424, so .42 gives 1 and .43 gives 2

## Poisson distribution, $X\sim\pi(\lambda), X\sim P(\lambda)$

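$E = \lambda, Var = \lambda$, probability mass function: $\Pr(X=k)=\frac{\lambda^k}{k!}e^{-\lambda}$: the number of events in a unit of time or area when events occur randomly and independently at rate λ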
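A sketch checking the PMF formula against `dpois` and confirming that mean and variance both equal λ (the sum is truncated at 100, assuming the remaining tail is negligible for λ = 3):

```r
lambda <- 3
k <- 0:10

# PMF from the formula lambda^k e^(-lambda) / k!
pmf_formula <- lambda^k * exp(-lambda) / factorial(k)
stopifnot(isTRUE(all.equal(pmf_formula, dpois(k, lambda))))

# Mean and variance from the (truncated) PMF both equal lambda
x <- 0:100
pmf <- dpois(x, lambda)
mu <- sum(x * pmf)
stopifnot(isTRUE(all.equal(mu, lambda)), isTRUE(all.equal(sum((x - mu)^2 * pmf), lambda)))
```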

#### rpois(n, lambda)

In [181]:
print(rpois(10, 3))

 [1] 5 7 1 4 1 1 2 2 3 4


#### dpois(x, lambda, log = FALSE)

In [189]:
dpois(5, 3)

#### ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)

In [190]:
dpois(0, 3) + dpois(1, 3) + dpois(2, 3)
ppois(2, 3)

#### qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)

In [191]:
qpois(.5, 3)

## Exponential distribution, $X\sim Exp(\lambda)$

$E = \lambda^{-1}, Var = \lambda^{-2}$, probability density function: $f(x;\lambda) = \lambda e^{-\lambda x}, x \geq 0$: the waiting time between independent random events. R's parameter rate is λ itself, so the mean is 1/rate
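A sketch confirming that R's `rate` is λ: the CDF matches $1-e^{-\lambda x}$, the mean is 1/rate (checked by numerical integration with `integrate`), and the median is $\ln 2/\lambda$. The rate 0.1 matches the rexp example below:

```r
rate <- 0.1   # lambda in the formula above

# CDF is 1 - exp(-lambda * x)
x <- c(0.5, 1, 10, 30)
stopifnot(isTRUE(all.equal(pexp(x, rate), 1 - exp(-rate * x))))

# Mean is 1 / rate (here 10), via numerical integration of x * f(x)
m <- integrate(function(t) t * dexp(t, rate), 0, Inf)$value
stopifnot(abs(m - 1 / rate) < 0.01)

# Median is log(2) / rate
stopifnot(isTRUE(all.equal(qexp(0.5, rate), log(2) / rate)))
```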

#### rexp(n, rate = 1)

In [200]:
print(rexp(5, .1))

[1] 10.515722  2.056978  8.941149  6.528677  6.985397


#### dexp(x, rate = 1, log = FALSE)

In [201]:
dexp(1, 1)


#### pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE)

In [203]:
pexp(1, .1)

#### qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)

In [204]:
qexp(.5, 1)

## t distribution, $Z\sim t(df)$

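If X follows the standard normal distribution $N(0,1)$, Y follows a $\chi^{2}(df)$ distribution, and X and Y are independent, then $Z=\frac{X}{\sqrt{Y/df}}$ follows the [t distribution](https://zh.wikipedia.org/wiki/%E5%AD%A6%E7%94%9Ft-%E5%88%86%E5%B8%83) with df degrees of freedom, written $Z\sim t(df)$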

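The smaller df is, the flatter the t curve: lower in the middle and with heavier tails on both sides. As df grows the t curve approaches the normal curve, and as df → ∞ it becomes the standard normal curve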

Statistic: $t=\frac{\overline{X}_{n}-\mu}{\sqrt{S^2/n}}$, which follows $t(n-1)$ for a normal sample; probability density function: $f(x)=\frac{\Gamma((df+1)/2)}{\sqrt{df\pi}\,\Gamma(df/2)\,(1+x^{2}/df)^{(df+1)/2}}$
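To tie the statistic to `pt`, a sketch that computes a one-sample t statistic and its two-sided p-value by hand and compares them with `t.test`; the sample is simulated purely for illustration:

```r
set.seed(42)
x <- rnorm(15, mean = 0.3)   # illustrative sample
mu0 <- 0

# t = (mean(x) - mu0) / sqrt(S^2 / n), with S^2 the sample variance
n <- length(x)
t_stat <- (mean(x) - mu0) / sqrt(var(x) / n)

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

res <- t.test(x, mu = mu0)
stopifnot(isTRUE(all.equal(unname(res$statistic), t_stat)),
          isTRUE(all.equal(res$p.value, p_val)))
```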

#### rt(n, df, ncp): ncp is the non-centrality parameter&ensp;$\delta$; except for rt(), it is only supported for abs(ncp) <= 37.62. If omitted, the central t distribution is used

In [62]:
print(rt(3,df=10, ncp = 10))
print(rt(4,df=20, ncp = 20))
print(rt(5,df=30, ncp = 30))

[1]  6.269172  9.161156 10.045868
[1] 19.82308 21.98510 18.05173 22.59347
[1] 26.78821 26.33829 37.51754 29.56637 29.07232


#### dt(x, df, ncp, log = FALSE): returns&ensp;$f(x)$

In [66]:
dt(2, df = 20)
dt(2, df = 10)
dt(4, df = 10, ncp = 4)

#### pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE): returns the CDF; with lower.tail = TRUE (default) probabilities are $P[X\leq q]$, otherwise $P[X > q]$

In [90]:
pt(4, df = 20, ncp = 4)
pt(4, df = 20, ncp = 4, lower.tail = FALSE)
pt(4, df = 20, ncp = 4) + pt(4, df = 20, ncp = 4, lower.tail = FALSE)

#### qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE): returns the quantile (inverse CDF)

In [78]:
qt(0.5, df = 20, ncp = 4)

## F distribution, $X\sim F(df_1,df_2)$

Let $X\sim\chi^{2}(df_1)$ and $Y\sim\chi^{2}(df_2)$ be independent; then $F=\frac{X/df_1}{Y/df_2}$ follows the [F distribution](https://zh.wikipedia.org/wiki/F-%E5%88%86%E5%B8%83) with $(df_1,df_2)$ degrees of freedom. In ANOVA and variance-ratio tests it appears as $F=\frac{s_{1}^{2}}{\sigma_{1}^{2}}\;/\;\frac{s_{2}^{2}}{\sigma_{2}^{2}}$

$E = \frac{df_{2}}{df_{2}-2},df_2>2$，$Var=\frac  {2df_{2}^{2}\,(df_{1}+df_{2}-2)}{df_{1}(df_{2}-2)^{2}(df_{2}-4)},df_2>4$
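As an illustration of the variance ratio, a sketch computing $s_1^2/s_2^2$ by hand (under $H_0: \sigma_1=\sigma_2$) and its two-sided p-value from `pf`, compared with `var.test`; both samples are simulated for illustration only:

```r
set.seed(7)
x <- rnorm(12, sd = 1)
y <- rnorm(20, sd = 1.5)

# Under H0: sigma1 = sigma2, F reduces to s1^2 / s2^2
f_stat <- var(x) / var(y)
df1 <- length(x) - 1; df2 <- length(y) - 1

# Two-sided p-value: twice the smaller tail probability of F(df1, df2)
p_lower <- pf(f_stat, df1, df2)
p_val <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(x, y)
stopifnot(isTRUE(all.equal(unname(res$statistic), f_stat)),
          isTRUE(all.equal(res$p.value, p_val)))
```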

#### rf(n, df1, df2, ncp)

In [121]:
print(rf(3,df1 = 10, df2 = 10, ncp = 10))

[1] 4.849814 2.658284 1.192684


#### df(x, df1, df2, ncp, log = FALSE)

In [122]:
df(2, 10, 10, 1)

#### pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)

In [123]:
pf(2, 10, 10, 1)

#### qf(p, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)

In [128]:
qf(.5, 10, 10)

## Chi-squared distribution, $X\sim\chi^{2}(df)$

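If the random variables $X_1,X_2,...,X_{df}$ are independent and each follows $N(0,1)$, then $\chi^{2}=\sum_{i=1}^{df}X_i^2$ follows the [chi-squared distribution](https://zh.wikipedia.org/wiki/%E5%8D%A1%E6%96%B9%E5%88%86%E4%BD%88) with df degrees of freedom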

$E = df, Var = 2df$, probability density function: ${\displaystyle f_{df}(x)={\frac {{\frac {1}{2}}^{\frac {df}{2}}}{\Gamma ({\frac {df}{2}})}}x^{{\frac {df}{2}}-1}e^{\frac {-x}{2}}}$
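A sketch checking the density formula against `dchisq` (the helper `chisq_pdf` is hypothetical, and `nu` is used for the degrees of freedom to avoid confusion with the F density `df()`), plus the df = 1 case where $\chi^2 = Z^2$:

```r
nu <- 10
x <- c(0.5, 2, 8, 15)

# Hypothetical helper: PDF from the formula above, with gamma() for the Gamma function
chisq_pdf <- function(x, nu) (1 / 2)^(nu / 2) / gamma(nu / 2) * x^(nu / 2 - 1) * exp(-x / 2)
stopifnot(isTRUE(all.equal(chisq_pdf(x, nu), dchisq(x, nu))))

# With df = 1, chi^2 = Z^2, so P[Z^2 <= q] = P[-sqrt(q) <= Z <= sqrt(q)]
q <- 3
stopifnot(isTRUE(all.equal(pchisq(q, df = 1), 2 * pnorm(sqrt(q)) - 1)))
```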

#### rchisq(n, df, ncp = 0)

In [109]:
print(rchisq(3, df = 20))

[1] 20.16848 13.16151 15.64013


#### dchisq(x, df, ncp = 0, log = FALSE): returns&ensp;$f(x)$

In [114]:
dchisq(2, df = 10)
dchisq(1, df = 10, ncp = 1)

#### pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

In [118]:
pchisq(23, df = 20, ncp = 4)
pchisq(23, df = 20, ncp = 4, lower.tail = FALSE)
pchisq(23, df = 20, ncp = 4) + pchisq(23, df = 20, ncp = 4, lower.tail = FALSE)

#### qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

In [116]:
qchisq(0.5, df = 20, ncp = 4)

## Beta distribution, $X\sim Be(\alpha,\beta)$

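Also called the [B distribution](https://zh.wikipedia.org/wiki/%CE%92%E5%88%86%E5%B8%83): a family of continuous probability distributions on the interval (0,1) with two shape parameters α, β > 0 (shape1 and shape2 in R)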

$E[X] = \frac{\alpha}{\alpha + \beta},E[\ln{X}]=\psi(\alpha)-\psi(\alpha+\beta),Var = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$, where ψ is the digamma function; probability density function: $f(x;\alpha,\beta)=\frac{x^{\alpha -1}(1-x)^{\beta -1}}{\mathrm{B}(\alpha ,\beta)}$
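A sketch verifying the density formula with R's `beta()` function and the two expectations by numerical integration; α = 2, β = 5 are arbitrary, and the tolerance is set loosely to allow for `integrate`'s default accuracy:

```r
a <- 2; b <- 5   # alpha, beta (shape1, shape2 in R)

# PDF from the formula, with beta() for the Beta function B(alpha, beta)
x <- c(0.1, 0.3, 0.7)
stopifnot(isTRUE(all.equal(x^(a - 1) * (1 - x)^(b - 1) / beta(a, b), dbeta(x, a, b))))

# E[X] = alpha / (alpha + beta)
m <- integrate(function(t) t * dbeta(t, a, b), 0, 1)$value
stopifnot(abs(m - a / (a + b)) < 1e-4)

# E[ln X] = digamma(alpha) - digamma(alpha + beta); guard t = 0 where log(t) is -Inf
ml <- integrate(function(t) ifelse(t > 0, log(t) * dbeta(t, a, b), 0), 0, 1)$value
stopifnot(abs(ml - (digamma(a) - digamma(a + b))) < 1e-4)
```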

#### rbeta(n, shape1, shape2, ncp = 0)

In [206]:
print(rbeta(5, 2, 2))

[1] 0.4697952 0.6384061 0.8721710 0.2423926 0.2575728


#### dbeta(x, shape1, shape2, ncp = 0, log = FALSE)，返回&ensp;$f(x)$

In [222]:
dbeta(.5, .5, .5)

#### pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)

In [221]:
pbeta(.5, .5, .5)

#### qbeta(p, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)

In [220]:
qbeta(.5, .5, .5)

# Permutations and combinations

Compute the number of combinations $C_6^3$

In [129]:
choose(6,3)

Compute the number of permutations $A_6^3$

In [130]:
choose(6,3) * factorial(3)
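Equivalently, $A_n^k = \frac{n!}{(n-k)!}$, and `combn` lists the combinations explicitly, one per column. A short sketch of both:

```r
# A_6^3 = 6! / 3!, the same as choose(6, 3) * 3!
stopifnot(isTRUE(all.equal(factorial(6) / factorial(3), choose(6, 3) * factorial(3))))

# combn(6, 3) enumerates every 3-element subset of 1:6 as a column
combs <- combn(6, 3)
stopifnot(ncol(combs) == choose(6, 3), nrow(combs) == 3)
```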