### Wisdom of the Crowd: Sampling and Describing

This activity tracks what happens when you sample from a known distribution: how the sample mean and variance behave as the sample size increases.  You will create a normal distribution with `scipy.stats`, draw samples of increasing size from it, and compute the mean and variance of each sample, producing plots similar to those from the lectures.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)



In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats

[Back to top](#-Index)

### Problem 1

#### $N(10, 4^2)$

**10 Points**

Below, use `scipy.stats.norm` to create a normal distribution with mean = 10 and scale (standard deviation) = 4, and assign it to `dist`.  Then loop over `i` from 1 to 999.  On each iteration, draw a sample of size `i` with the `.rvs()` method on `dist`, passing `i` as the `random_state` parameter as well.  Keep track of the sample means in the list `means`.  Uncomment the code to visualize the results.
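
As a warm-up, the following minimal sketch shows how a frozen `scipy.stats.norm` object and the `random_state` argument of `.rvs()` behave. The name `demo_dist`, the sample size of 5, and the seed 42 are illustrative assumptions and are not part of the graded task.

```python
import numpy as np
import scipy.stats as stats

# Frozen normal distribution: loc is the mean, scale is the standard deviation.
# demo_dist is a hypothetical name chosen so it does not clash with `dist`.
demo_dist = stats.norm(loc=10, scale=4)

# Drawing with the same size and the same seed reproduces the same sample.
a = demo_dist.rvs(size=5, random_state=42)
b = demo_dist.rvs(size=5, random_state=42)
same_draws = np.array_equal(a, b)

# The frozen object reports the parameters it was built with.
dist_mean = demo_dist.mean()
dist_std = demo_dist.std()

print(same_draws, dist_mean, dist_std)
```

Fixing `random_state` is what makes a result reproducible from run to run, which is why an exact comparison against a reference list can work.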

In [3]:
### GRADED
dist = ''
means = []
for i in range(1, 1000):
    pass
    
### BEGIN SOLUTION
dist = stats.norm(10, 4)
means = []
for i in range(1, 1000):
    sample = dist.rvs(i, random_state = i)
    means.append(np.mean(sample))
### END SOLUTION

### ANSWER CHECK
print(means[:5])

# plt.plot(range(1, 1000), means)
# plt.axhline(10, label = 'True Mean', color = 'red')
# plt.xlabel('N Participants')
# plt.ylabel('Sample Mean')
# plt.legend();

[16.497381454652967, 9.0539506507364, 13.095514389352422, 10.248202617565216, 11.91891698717907]


In [4]:
### BEGIN HIDDEN TESTS
dist_ = stats.norm(10, 4)
means_ = []
for i in range(1, 1000):
    sample_ = dist_.rvs(i, random_state = i)
    means_.append(np.mean(sample_))
#
#
#
assert means == means_
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 2

#### Mean and Variance of samples


Compute the variance of samples drawn from a normal distribution with mean = 10 and scale (standard deviation) = 4. The sample sizes will vary from 1 to 999.


Follow these steps to complete the task:

1. Utilize the `dist` object created in Problem 1.
2. Iterate over sample sizes ranging from 1 to 999.
3. For each sample size, generate a sample from the normal distribution using the `.rvs()` method on `dist`.
4. Calculate the variance of each sample using the `np.var()` function, then divide it by the sample size to estimate the variance of the sample mean.
5. Store the computed variances in a list named `variances`.

Generate each sample with the loop index as the random state (`random_state=i`), as in Problem 1, so that the results are reproducible and each variance is computed on the same sample as the corresponding mean.
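
One way to read the quantity tracked in this problem: the variance of a sample divided by its size estimates the variance of the sample mean, which for this distribution is $\sigma^2 / n = 16 / n$. The sketch below illustrates this for a single sample size; `demo_dist`, `n = 500`, and the seed 0 are assumptions chosen for illustration only.

```python
import numpy as np
import scipy.stats as stats

demo_dist = stats.norm(loc=10, scale=4)  # hypothetical stand-in for `dist`
n = 500                                  # illustrative sample size
sample = demo_dist.rvs(size=n, random_state=0)

# np.var defaults to ddof=0, which divides by n (the biased estimator).
pop_var = np.var(sample)

# ddof=1 divides by n - 1 instead (the unbiased estimator).
unbiased_var = np.var(sample, ddof=1)

# Dividing the sample variance by n estimates the variance of the sample mean.
var_of_mean = pop_var / n

# Theoretical value for comparison: sigma^2 / n with sigma = 4.
theoretical = 4**2 / n

print(pop_var, unbiased_var, var_of_mean, theoretical)
```

Note that with the default `ddof=0`, a sample of size 1 always has variance 0, which explains a leading `0.0` in a list built this way.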


In [5]:
### GRADED
variances = []
    
### BEGIN SOLUTION
variances = []
for i in range(1, 1000):
    sample = dist.rvs(i, random_state = i)
    variances.append(np.var(sample)/i)
### END SOLUTION

### ANSWER CHECK
print(variances[:5])

[0.0, 0.25990755125959586, 2.848675869599996, 1.7100201767562866, 3.28810836749552]


In [6]:
# fig, ax = plt.subplots(1, 2, figsize = (20, 5))
# ax[0].plot(range(1, 1000), means)
# ax[0].axhline(10, label = 'True Mean', color = 'red')
# ax[0].set_xlabel('N Participants')
# ax[0].set_ylabel('Sample Mean')
# ax[0].legend();

# ax[1].plot(range(1, 1000), variances)
# ax[1].set_xlabel('N Participants (log)')
# ax[1].set_ylabel('Sample Variance / N (log)')
# ax[1].set_xscale('log')
# ax[1].set_yscale('log')

In [7]:
### BEGIN HIDDEN TESTS
variances_ = []
for i in range(1, 1000):
    sample_ = dist_.rvs(i, random_state = i)
    variances_.append(np.var(sample_)/i)
#
#
#
assert variances == variances_
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 3

#### Interpreting the Results

Here, we can treat the sample mean as an estimator (a simple model) of the true mean of our data.  As we increase the number of participants in the samples, does our estimator get closer to the truth?  Answer "yes" or "no" as a string below for `ans3`.
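
To build intuition for this question, the sketch below repeats the sampling many times at a few sample sizes and measures how much the sample means spread around the true mean. The names `demo_dist`, `n_repeats`, and `spread`, the chosen sizes, and the seeds are all illustrative assumptions; the comparison value $4 / \sqrt{n}$ is the standard error of the mean for a standard deviation of 4.

```python
import numpy as np
import scipy.stats as stats

demo_dist = stats.norm(loc=10, scale=4)  # hypothetical stand-in for `dist`
n_repeats = 2000                         # number of repeated samples per size

spread = {}
for n in (10, 100, 1000):
    # Each row is one sample of size n; the seed is arbitrary but fixed.
    draws = demo_dist.rvs(size=(n_repeats, n), random_state=n)
    sample_means = draws.mean(axis=1)
    # Standard deviation of the sample means across the repeats.
    spread[n] = sample_means.std()
    # Compare with the theoretical standard error sigma / sqrt(n).
    print(n, round(spread[n], 3), round(4 / np.sqrt(n), 3))
```

The spread of the sample means shrinks as `n` grows, so a single sample mean from a larger sample is more likely to land near the true mean.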

In [8]:
### GRADED
ans3 = ''
    
### BEGIN SOLUTION
ans3 = 'yes'
### END SOLUTION

### ANSWER CHECK
print(type(ans3))

<class 'str'>


In [9]:
### BEGIN HIDDEN TESTS
ans_ = 'yes'
#
#
#
assert ans3 == ans_
### END HIDDEN TESTS