# Sampling with Python 


**Sampling** is the process of selecting a subset of individuals or observations from a larger population in order to estimate or make inferences about the population as a whole. It is often used in research to save time and resources, because studying a smaller, manageable subset is cheaper than studying the entire population.

The choice of sampling technique depends on the research question, the population, and the available resources. Five commonly used techniques are described below.


## Simple random sampling:

This is a basic sampling technique in which each individual or observation in the population has an equal chance of being selected for the sample. This can be done using a random number generator or a table of random numbers.
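The "equal chance" property can be checked empirically. The following is a minimal sketch, assuming a hypothetical population of ten labelled units and a seeded generator (both chosen only for illustration): it draws many samples without replacement and reports how often each unit is included.

```python
import random
from collections import Counter

# Hypothetical population of ten labelled units (an assumption for illustration).
population = list(range(1, 11))
sample_size = 5
trials = 20_000

rng = random.Random(42)  # seeded only so that the run is repeatable

# Count how often each unit appears in a sample drawn without replacement.
counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population, sample_size))

# Under simple random sampling, every unit should be included in roughly
# sample_size / len(population) of the draws (0.5 in this setup).
inclusion_rate = {unit: counts[unit] / trials for unit in population}
for unit, rate in inclusion_rate.items():
    print(unit, round(rate, 3))
```

The inclusion rates should all sit close to 0.5, with small differences caused by chance.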



### Use-Case Example: 

 In a study conducted by the Centers for Disease Control and Prevention (CDC) in the United States, a simple random sample of households was selected from a list of all households in the country to estimate the prevalence of diabetes among adults. The sample was selected using a computer-assisted method that randomly selected households from the list. The results of the study were used to inform public health policies and programs related to diabetes prevention and control.

In [None]:
import random

# population data
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# sample size
sample_size = 5

# simple random sampling
sample = random.sample(population, sample_size)

print(sample)


## Stratified sampling:
This technique involves dividing the population into subgroups or strata based on certain characteristics (e.g., age, gender, income, etc.) and then selecting a random sample from each stratum in proportion to its size. This helps ensure that the sample is representative of the population with respect to these characteristics.
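Proportional allocation rarely produces whole numbers, so the per-stratum sample sizes need a rounding rule. The sketch below is one possible approach, not the only one: it assumes a largest-remainder rule, and the helper names `proportional_allocation` and `stratified_sample` as well as the strata are hypothetical. It also assumes the requested sample size does not exceed the population size.

```python
import random

def proportional_allocation(strata, sample_size):
    """Split sample_size across strata in proportion to stratum size."""
    total = sum(len(members) for members in strata.values())
    # Ideal (possibly fractional) share for each stratum.
    quotas = {name: sample_size * len(members) / total
              for name, members in strata.items()}
    # Start from the whole-number part of each quota.
    allocation = {name: int(quota) for name, quota in quotas.items()}
    # Give the remaining units to the strata with the largest fractional parts.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

def stratified_sample(strata, sample_size, rng=random):
    """Draw a simple random sample inside each stratum, sized proportionally."""
    allocation = proportional_allocation(strata, sample_size)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample

# Hypothetical strata of unequal size (50, 30, and 20 units).
strata = {
    "under_30": list(range(0, 50)),
    "30_to_59": list(range(50, 80)),
    "60_plus": list(range(80, 100)),
}

allocation = proportional_allocation(strata, 7)
print(allocation)  # {'under_30': 4, '30_to_59': 2, '60_plus': 1}

sample = stratified_sample(strata, 7, random.Random(0))
print(sample)
```

Here the ideal shares are 3.5, 2.1, and 1.4, so the one unit left over after rounding down goes to the stratum with the largest remainder.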

### Use-Case Example: 
In a study conducted by researchers at the University of California, Los Angeles (UCLA), a stratified random sample of households was selected from a list of all households in Los Angeles County to estimate the prevalence of asthma among children. The population was divided into strata based on zip code and a random sample was selected from each stratum. The results of the study were used to identify areas with high rates of asthma and to develop targeted interventions to improve asthma management and reduce the burden of the disease in the community.

In [9]:
import random

# population data with strata
population = {
    'male': [18, 22, 25, 30, 32],
    'female': [20, 24, 26, 28, 35]
}

# sample size
sample_size = 6

# total number of individuals across all strata
population_size = sum(len(members) for members in population.values())

# stratified sampling with proportional allocation
sample = []
for stratum, members in population.items():
    # each stratum contributes in proportion to its share of the population
    stratum_sample_size = round(sample_size * len(members) / population_size)
    if stratum_sample_size > 0:
        stratum_sample = random.sample(members, min(stratum_sample_size, len(members)))
        sample.extend(stratum_sample)

print(sample)


## Cluster sampling: 
This technique involves dividing the population into clusters or groups (e.g., neighborhoods, schools, etc.) and then randomly selecting a sample of clusters. All individuals within the selected clusters are then included in the sample. This can be more efficient than simple random sampling when the population is large and dispersed.
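The two steps described above (pick clusters at random, then keep everyone in them) can be sketched as follows. The function name `cluster_sample`, the village names, and the household IDs are hypothetical and serve only as an illustration of one-stage cluster sampling.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """One-stage cluster sampling: pick whole clusters, keep every member."""
    # Step 1: randomly choose which clusters enter the sample.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 2: include every unit that belongs to a chosen cluster.
    members = [unit for name in chosen for unit in clusters[name]]
    return chosen, members

# Hypothetical villages, each listing its household IDs.
villages = {
    "village_a": [101, 102, 103],
    "village_b": [201, 202],
    "village_c": [301, 302, 303, 304],
    "village_d": [401, 402, 403],
}

chosen, households = cluster_sample(villages, 2, random.Random(7))
print(chosen)
print(households)
```

Note that the final sample size depends on which clusters are drawn, because clusters differ in size.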

### Use-Case Example:
In a study conducted by the World Health Organization (WHO) in several African countries, a cluster sample of households was selected from a list of all households in each country to estimate the prevalence of malaria among children. The population was divided into clusters based on villages or communities, and a random sample of clusters was selected. All households within the selected clusters were included in the sample. The results of the study were used to guide the development and implementation of malaria prevention and control strategies in the affected countries.

In [6]:
import random

# population data with clusters
population = {
    'cluster1': [1, 2, 3],
    'cluster2': [4, 5, 6],
    'cluster3': [7, 8, 9, 10]
}

# number of clusters to select
num_clusters = 2

# cluster sampling: randomly select whole clusters,
# then include every element of each selected cluster
selected_clusters = random.sample(list(population), num_clusters)
sample = []
for cluster in selected_clusters:
    sample.extend(population[cluster])

print(sample)


## Systematic sampling: 
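This technique involves selecting every nth individual from a list or sequence of individuals in the population. The value of n (the sampling interval) is determined by dividing the population size by the desired sample size. The starting position is often chosen at random from the first n individuals so that every individual has a chance of being selected.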
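As a minimal sketch of the interval calculation, the function below takes every nth element after a randomly chosen starting offset. The random start is a common refinement assumed here, and the name `systematic_sample` and the patient list are hypothetical. The sketch also assumes the sample size does not exceed the population size.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Take every interval-th element, starting at a random offset."""
    # The sampling interval: population size divided by sample size.
    interval = len(population) // sample_size
    # Random starting position within the first interval.
    start = rng.randrange(interval)
    # Step through the list, trimming any extra element left by rounding.
    return population[start::interval][:sample_size]

# Hypothetical list of 100 patient IDs, in order of arrival.
patients = list(range(1, 101))

sample = systematic_sample(patients, 10, random.Random(3))
print(sample)
```

With 100 patients and a sample of 10, the interval is 10, so consecutive selected IDs always differ by exactly 10.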

### Use-Case Example: 

In a study conducted by researchers at the University of Michigan, a systematic sample of patients was selected from a list of all patients who visited a primary care clinic over a certain period of time to estimate the prevalence of depression among the patients. Every 10th patient who visited the clinic was selected for inclusion in the sample. The results of the study were used to improve the screening and management of depression in primary care settings.

In [7]:
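# population data
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# sample size
sample_size = 5

# sampling interval: population size divided by sample size
step = len(population) // sample_size

# systematic sampling: take every step-th element,
# starting from the first element for simplicity
sample = population[0::step][:sample_size]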

print(sample)


[1, 3, 5, 7, 9]


## Convenience sampling:
This technique involves selecting individuals who are readily available and willing to participate in the study, such as friends, family, or colleagues. This is often used when time and resources are limited, but the sample may not be representative of the population.
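The risk of an unrepresentative sample can be illustrated with a small comparison. This is a sketch under stated assumptions: the ages are made-up values, listed in the order in which people happen to be reachable, and the convenience sample simply takes the first few.

```python
import random
from statistics import mean

# Hypothetical ages, in the order people happen to be reachable
# (for example, students sitting nearest the researcher come first).
ages = [18, 19, 19, 20, 21, 22, 25, 31, 40, 52]
sample_size = 5

# Convenience sample: whoever is easiest to reach, with no randomization.
convenience = ages[:sample_size]

# Simple random sample of the same size, drawn only for contrast.
random_draw = random.Random(1).sample(ages, sample_size)

print(mean(ages))         # 26.7
print(mean(convenience))  # 19.4
print(mean(random_draw))
```

Because the easiest people to reach are also the youngest in this made-up list, the convenience sample underestimates the population mean.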

### Use-Case Example: 

 In a study conducted by researchers at a university, a convenience sample of students was selected from a psychology course to examine the relationship between personality traits and academic performance. The researchers recruited participants by advertising the study in the course and offering extra credit to those who participated. The results of the study were used to generate hypotheses for future research, but the findings may not be generalizable to the wider population of students.


In [8]:
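# population data, listed in the order individuals happen to be available
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# sample size
sample_size = 5

# convenience sampling: take the first individuals that are readily available,
# with no randomization
sample = population[:sample_size]

print(sample)


[1, 2, 3, 4, 5]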
