# Sampling

> Many of the most egregious statistical assertions are caused by good statistical methods applied to bad samples, not the opposite.

> Some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data is bogus.
- Naked Statistics

### Introduction

In previous lessons, we looked at statistics from the entire NBA, and from there, calculated summary statistics of different variables like height or weight.  In this lesson, we'll see how we can use sampling as an alternative to exploring data for an entire population.  

### Population and Sampling

Remember that in statistics, we define a **population** to be the collection of people, animals, or locations that the study is focused on.  Because it is often too costly to collect information on the entire population, we instead:

1. Select a representative subset of the population (the sample), and then
2. From that sample, draw inferences about the rest of the population
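
The two steps above can be sketched in a few lines. This is a minimal illustration, assuming a synthetic population of heights generated with NumPy (the numbers are made up and are not the NBA data from earlier lessons); the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Hypothetical population: 10,000 synthetic heights in inches (not real data).
population = rng.normal(loc=79, scale=3.5, size=10_000)

# Step 1: select a random subset of the population, without replacement.
sample = rng.choice(population, size=200, replace=False)

# Step 2: use the sample to estimate a quantity of the whole population.
population_mean = population.mean()
sample_mean = sample.mean()

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In this sketch we can check the estimate against the true population mean only because we generated the population ourselves. In a real study, the population mean is exactly what we do not know.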

The ideal is a *simple random sample* of the population.

> A **Simple Random Sample** (SRS) is one where every item in the population has an equal chance of being selected and enrolled in the survey.
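
One way to see the "equal chance" property is to draw many samples and count how often each item is picked. The sketch below assumes a hypothetical population of ten items labelled 0 through 9 and a sample size of three; with equal chances, each item should land in roughly 3 out of every 10 samples.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility

population_size = 10   # hypothetical tiny population, items labelled 0..9
sample_size = 3
n_draws = 20_000

# Count how often each item ends up in a sample.
counts = np.zeros(population_size, dtype=int)
for _ in range(n_draws):
    # replace=False means an item can appear at most once per sample.
    chosen = rng.choice(population_size, size=sample_size, replace=False)
    counts[chosen] += 1

# Share of samples that included each item; expected to be close to
# sample_size / population_size for every item.
selection_rate = counts / n_draws
print(selection_rate.round(3))
```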

### Sampling Procedure

1. Define the population

* What population is of interest in the study?
    * Think of the boundaries
* What is the **unit of analysis**? That is, are we looking to gather information on students, classrooms, or schools?


2. Find a sampling frame

> A **Sampling Frame** is the source material or device from which a sample is drawn.

* This is what we're sampling from.
* Examples include an electoral register, a student or phone directory, or homeowner lists.
* Other examples are lists kept by organizations such as the American Psychological Association or the American Medical Association, or business registration lists.
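
Because the sample is drawn from the frame rather than from the population itself, it can help to check how much of the population the frame covers. The sketch below uses made-up names for both the population and the frame; in practice the full population list is usually not available, which is why this is only an illustration.

```python
# Hypothetical population of residents and a directory that lists only some of them.
population = {"Ana", "Ben", "Caro", "Dev", "Eli", "Fay", "Gus", "Hana"}
sampling_frame = {"Ana", "Ben", "Dev", "Fay", "Hana"}

# Anyone in the population but missing from the frame can never be selected.
not_covered = population - sampling_frame

# Share of the population that appears in the frame.
coverage = len(sampling_frame & population) / len(population)

print(sorted(not_covered))
print(f"coverage: {coverage:.1%}")
```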

3. Random Selection

```python
from numpy import random

# Draw 100 random integers from 0 to 999 (the upper bound is exclusive).
# randint samples with replacement, so the same number can appear more than once.
random.randint(0, 1000, 100)
```

```
array([442, 534, 171, 945, 735, 662, 793, 988, 405, 550, 596, 659, 854,
       465, 843, 151, 488, 732, 240,  46, 288,  44, 185, 575, 738, 165,
       574, 112, 600, 790, 471,   8, 532, 225, 184, 163, 390, 139, 567,
       930, 158, 124, 694, 951, 385, 417, 418, 994, 380, 355, 237, 962,
       461, 492, 378,  74, 868, 340, 216, 297, 970, 630, 906, 212, 638,
       649, 781, 746, 974, 207, 361, 910, 610, 117, 165, 121, 597, 935,
        45, 785, 554, 478, 460, 603, 135, 948, 117,  86,  57, 655, 521,
       647, 203, 224, 937, 947,  59, 521, 625,  99])
```
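
Note that `randint` can return the same number more than once (165, 117, and 521 each appear twice in the output above), which would mean contacting the same entry twice. If we want 100 distinct entries, one option is to sample without replacement. The sketch below assumes a hypothetical frame of 1,000 placeholder names and uses NumPy's `Generator.choice`; the frame contents and seed are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

# Hypothetical sampling frame: 1,000 entries, e.g. rows of a directory.
frame = [f"person_{i}" for i in range(1000)]

# Draw 100 distinct positions; replace=False guarantees no entry is picked twice.
selected_idx = rng.choice(len(frame), size=100, replace=False)
selected = [frame[i] for i in selected_idx]

print(selected[:5])
```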

4. Seek responses/enrollment

5. Data Collection


### An example and some questions

Let's say that we want to find out how much individuals in a local town exercise each week.  To do so, we get a phone book from the town and randomly select individuals from it.  That evening, we go one by one through our selected individuals and call each of them to ask how many hours per week they spend at the gym, on average.  We write down the answer of whoever picks up the phone.

* What are some of the issues with the way the survey was conducted?  

* As a checklist, consider every step of the survey.  What could have gone wrong at each step? (Not all steps were done poorly, but it's still good to consider them all).

* Is there further information you would like to know about how the survey was conducted?

* How could you improve the survey?

### Summary

In this lesson, we learned about sampling.  We use sampling because it is often too costly to collect information on the entire population in question.  Ideally, we collect a simple random sample, where every item in the population has an equal chance of being selected and enrolled in the survey.

We then walked through the process of sampling, which involves (1) defining the population and unit of analysis, (2) finding a sampling frame, (3) randomly selecting from our sampling frame, (4) seeking responses or enrollment, and (5) data collection.

### Resources

* [Sampling Techniques - UCA](https://uca.edu/psychology/files/2013/08/Ch7-Sampling-Techniques.pdf)
* [Sampling Poverty Action Lab - Duflo](https://www.povertyactionlab.org/sites/default/files/documents/Using%20Randomization%20in%20Development%20Economics.pdf)
* [Sampling Wikipedia](https://en.wikipedia.org/wiki/Sampling_(statistics))

* [Sampling in RCTs](https://bpspubs.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-2125.1982.tb01429.)
