## **Probability Sampling**

**1. Simple Random Sampling**

Each member of the population has an equal chance of being selected.

Example: Randomly selecting 100 students from a school of 5,000.

Best for: When the population is homogeneous (similar characteristics).

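Limitation: Requires a complete list of the population (a sampling frame), which can be costly to build for large populations, and small subgroups may be under-represented by chance.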

In [None]:
import numpy as np

# Create a population of 10,000 values
population = np.arange(1, 10001)

# Take a random sample of size 100
sample = np.random.choice(population, size=100, replace=False)

print("Random Sample:", sample[:10])

**2. Stratified Sampling**

The population is divided into subgroups (strata), and samples are taken proportionally from each group.

Example: A company surveying its employees samples each department (HR, IT, Sales) in proportion to its size.

Best for: When the population has distinct groups (e.g., gender, income level).

Limitation: Requires prior knowledge of population structure.

In [None]:
import pandas as pd
import numpy as np

# Create a population with categories
data = {'ID': np.arange(1, 101),
        'Department': np.random.choice(['HR', 'IT', 'Sales'], size=100)}

df = pd.DataFrame(data)

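# Perform proportional stratified sampling: draw 20% of each department,
# so larger departments contribute more rows than smaller ones
sampled_df = df.groupby('Department').sample(frac=0.2, random_state=42)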

print(sampled_df.head(10))

**3. Systematic Sampling**

Selects every k-th member from a population after a random start.

Example: Selecting every 10th customer from a store's customer database.

Best for: Large, ordered lists where selecting at a fixed interval is practical.

Limitation: Can introduce bias if the population has a hidden periodic pattern that lines up with the sampling interval k.

In [None]:
import numpy as np

population = np.arange(1, 1001)  # Population of 1,000 people
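k = 10  # Sampling interval: every 10th person

# Pick a random start within the first k positions, then select every k-th person
start = np.random.randint(0, k)
systematic_sample = population[start::k]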

print("Systematic Sample:", systematic_sample[:10])
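
To see how a pattern in the population can bias a systematic sample, here is a minimal sketch. The daily sales data and its weekly cycle are made up for illustration: when the interval `k` equals the length of the cycle, every selected element falls on the same point of the cycle.

```python
import numpy as np

# Hypothetical daily sales over 52 weeks with a weekly pattern:
# days 5 and 6 of each week (the weekend) sell 200, other days sell 100
days = np.arange(364)
sales = np.where(days % 7 >= 5, 200, 100)

k = 7  # Sampling interval equal to the period of the pattern

# Two systematic samples that differ only in their starting point
weekday_sample = sales[0::k]  # always lands on the same weekday
weekend_sample = sales[5::k]  # always lands on the same weekend day

print("True mean:", sales.mean())
print("Mean with start=0:", weekday_sample.mean())
print("Mean with start=5:", weekend_sample.mean())
```

Neither sample mean matches the true mean, because each sample sees only one day of the week. Choosing an interval that does not divide the period, or shuffling the list first, is one way to reduce this risk.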

**4. Cluster Sampling**

Instead of selecting individuals, entire groups (clusters) are randomly selected.

Example: Selecting 5 random schools from a city and surveying all students in those schools.

Best for: Geographically spread populations (e.g., cities, schools).

Limitation: Less precise than simple random sampling when the clusters differ significantly from one another.

In [None]:
import pandas as pd
import numpy as np

# Create a dataset with "clusters" (schools)
df = pd.DataFrame({'School_ID': np.repeat(np.arange(1, 11), 10), 'Student_ID': np.arange(1, 101)})

# Randomly select 3 clusters (schools)
selected_schools = np.random.choice(df['School_ID'].unique(), size=3, replace=False)

# Select all students from those schools
cluster_sample = df[df['School_ID'].isin(selected_schools)]

print(cluster_sample.head(10))

## **Non-Probability Sampling**

**1. Convenience Sampling**

Selecting the easiest individuals to reach.

Example: Surveying only people in a shopping mall instead of all city residents.

Limitation: High risk of bias because not everyone has an equal chance of being selected.
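
The bias can be illustrated with a small sketch. The resident data below is hypothetical: residents are listed by their distance from the mall, and the "easiest to reach" are assumed to be the nearest ones.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population: 1,000 residents ordered by distance from the mall
residents = pd.DataFrame({
    'Resident_ID': np.arange(1, 1001),
    'Distance_km': np.sort(rng.uniform(0, 20, size=1000)),
})

# Convenience sample: simply take the 50 nearest (easiest to reach) residents
convenience_sample = residents.head(50)

# Compare the sample with the full population
print("Population mean distance:", residents['Distance_km'].mean())
print("Convenience sample mean distance:", convenience_sample['Distance_km'].mean())
```

The sample mean is far below the population mean, since residents who live farther away had no chance of being selected.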

**2. Quota Sampling**

Like stratified sampling, the population is divided into subgroups and a quota is set for each, but individuals within each subgroup are not selected at random.

Example: A survey ensures 50 men and 50 women are chosen, but interviewers pick them non-randomly.

Limitation: Human selection bias can affect results.
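
A minimal sketch of quota sampling, assuming a hypothetical stream of passers-by in the order an interviewer meets them. The non-random choice is modeled here as "take the first people encountered in each group"; real interviewers may pick in other ways.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stream of 500 passers-by in the order they are encountered
people = pd.DataFrame({
    'Person_ID': np.arange(1, 501),
    'Gender': rng.choice(['Male', 'Female'], size=500),
})

quota = 50  # Required number of respondents per gender

# Non-random selection: keep the first `quota` people met in each group
quota_sample = people.groupby('Gender').head(quota)

print(quota_sample['Gender'].value_counts())
```

The quotas are met exactly, but everyone who arrived after the quotas were filled had no chance of selection, which is where the bias enters.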

**3. Snowball Sampling**

Existing participants refer further participants from among their contacts; useful for hard-to-reach groups.

Example: Studying drug users or undocumented immigrants by asking them to refer friends.

Limitation: Not representative of the entire population.
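
The referral process can be sketched as a wave-by-wave walk over a referral network. The network, the names in it, and the `snowball_sample` helper are all hypothetical, written only for this illustration.

```python
from collections import deque

# Hypothetical referral network: each participant lists the people they can refer
referrals = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': [],
    'F': ['G'],
    'G': [],
    'H': ['I'],  # H and I are not referred by anyone reachable from A
    'I': [],
}

def snowball_sample(seeds, referrals, max_size):
    """Recruit in waves: each recruit refers their contacts until max_size is reached."""
    recruited = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(recruited) < max_size:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
    return recruited

sample = snowball_sample(['A'], referrals, max_size=6)
print("Snowball sample:", sample)
```

Starting from `'A'`, participants `'H'` and `'I'` can never be recruited no matter how large `max_size` is, which shows why the result depends heavily on the initial seeds.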