## What is Systematic Sampling?
Systematic sampling is a probability sampling technique in which elements are selected from an ordered sampling frame at a fixed interval, known as the sampling interval. Instead of drawing every element at random, the process involves:

1. Ordering the population elements.
2. Randomly selecting a starting point within the first $k$ elements.
3. Choosing every $k$-th element after the start, where $k$ is the sampling interval.
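The three steps above can be sketched in plain Python on a list. This is a minimal sketch under a few assumptions. The population fits in memory, and its length is an exact multiple of the sample size. `sorted()` stands in for "ordering" only for illustration, and any fixed listing order works. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Hypothetical helper: every k-th element of an ordered population from a random start.

    Assumes len(population) is an exact multiple of sample_size.
    """
    rng = random.Random(seed)
    ordered = sorted(population)          # step 1: order the population (illustrative choice)
    k = len(ordered) // sample_size       # sampling interval
    start = rng.randrange(k)              # step 2: random zero-based start in [0, k - 1]
    return ordered[start::k]              # step 3: take every k-th element from the start

sample = systematic_sample(range(1, 101), sample_size=10, seed=42)
print(sample)  # 10 values, each 10 apart
```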

The formula for the sampling interval is:

$$ k = \frac{\text{Population Size}}{\text{Sample Size}} $$
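When the population size is not an exact multiple of the sample size, $k$ is commonly rounded down to a whole number. The sketch below assumes floor rounding and zero-based start positions, and `realized_sizes` is a hypothetical helper. It counts how many elements each possible start would select. With $N = 1000$ and $n = 100$, every start gives exactly 100 elements. With $N = 1005$, some starts give 101.

```python
def sampling_interval(population_size, sample_size):
    # k = N // n (floor division, an assumed rounding rule)
    return population_size // sample_size

def realized_sizes(population_size, sample_size):
    """Hypothetical helper: number of elements selected for each start in [0, k - 1]."""
    k = sampling_interval(population_size, sample_size)
    return {start: len(range(start, population_size, k)) for start in range(k)}

print(sampling_interval(1000, 100))                      # 10
print(sorted(set(realized_sizes(1000, 100).values())))   # [100]
print(sorted(set(realized_sizes(1005, 100).values())))   # [100, 101]
```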


#### Advantages of Systematic Sampling
1. **Simplicity:** Easy to implement, especially for large datasets.
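2. **Speed:** Requires only one random draw (the starting point), so it is often quicker to carry out than simple random sampling.
3. **Ensures Coverage:** Spreads the sample evenly across the ordered population, so no part of the list is left unrepresented by chance.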
4. **Good for Automation:** Can be programmatically executed without needing extensive manual effort.
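To illustrate the coverage and speed points above, the sketch below compares a systematic sample with a simple random sample of the same size. It measures the largest gap between consecutive selected positions. It assumes a population of 1,000 positions and a sample of 100, and uses Python's standard `random` module. `max_gap` is a hypothetical helper.

```python
import random

N, n = 1000, 100
k = N // n
rng = random.Random(0)

# Systematic: a single random draw, then fixed steps of k
start = rng.randrange(k)
systematic = list(range(start, N, k))

# Simple random sampling without replacement: n positions drawn at random
simple_random = sorted(rng.sample(range(N), n))

def max_gap(positions):
    """Hypothetical helper: largest distance between consecutive selected positions."""
    return max(b - a for a, b in zip(positions, positions[1:]))

print(max_gap(systematic))     # always equal to k
print(max_gap(simple_random))  # typically much larger than k
```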

#### Disadvantages of Systematic Sampling
1. **Risk of Pattern Bias:** If the data contains a hidden pattern that coincides with the sampling interval, the sample may become biased.
2. **Not Fully Random:** Only the starting point is random. Once it is chosen, the rest of the sample is fixed, so not every possible subset has the same chance of selection as in simple random sampling.
3. **Dependency on Sampling Frame:** The method requires a complete, ordered list of the population. The quality of the sample depends on that ordering being free of cyclic patterns.
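The pattern-bias risk is easy to reproduce with synthetic data. The sketch below assumes a made-up series that repeats every 10 positions and uses $k = 10$. Because the cycle length equals the sampling interval, every systematic sample sees a single phase of the cycle. The sample mean then depends entirely on the random start.

```python
import numpy as np

N, n = 1000, 100
k = N // n  # 10

# Synthetic data (assumption): the cycle 0, 1, ..., 9 repeated 100 times
values = np.tile(np.arange(10), N // 10)

print("population mean:", values.mean())  # 4.5
for start in range(k):
    sample = values[start::k]
    # Period == k, so each sample holds one value repeated n times
    print(start, sample.mean())
```

A start of 0 gives a sample mean of 0.0 and a start of 9 gives 9.0, while the population mean is 4.5.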

#### When to Use Systematic Sampling
1. When the population is large and a complete list (sampling frame) is available.
2. When data collection needs to be quick and cost-effective.
3. When the ordering of the population has no cyclic patterns that could align with the sampling interval.
4. When random sampling is impractical or unnecessary.

#### Python Code Example


```python
import numpy as np
import pandas as pd

# Population: 1000 rows with IDs 1-1000 and random integer values in 1-100
population = pd.DataFrame({
    "ID": range(1, 1001),                           # 1000 IDs
    "Value": np.random.randint(1, 101, size=1000),  # upper bound is exclusive, so values fall in 1-100
})

# We need 100 samples from the population
sample_size = 100

# Sampling interval: k = N // n = 1000 // 100 = 10
k = len(population) // sample_size

# Choose one random zero-based start position in [0, k - 1]
# (np.random.randint excludes its upper bound)
start_index = np.random.randint(0, k)

# Take every k-th row from the starting position to the end of the population
systematic_sampling = population.iloc[start_index::k]
```

```python
systematic_sampling
```

| Index | ID | Value |
|---|---|---|
| 5 | 6 | 60 |
| 15 | 16 | 34 |
| 25 | 26 | 72 |
| 35 | 36 | 35 |
| 45 | 46 | 27 |
| ... | ... | ... |
| 955 | 956 | 50 |
| 965 | 966 | 85 |
| 975 | 976 | 5 |
| 985 | 986 | 81 |


#### Output Example
Suppose the random start index is 5 (zero-based, i.e. the 6th row) and $k = 10$. The selected sample then contains the 6th, 16th, 26th, etc., rows of the dataset, so the systematic sample is spread evenly over the whole dataset.
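The selected rows follow directly from the start position and $k$. This sketch assumes the zero-based start index 5 seen in the output table (the row with ID 6) and the 1,000-row population with IDs 1 to 1000. The `Value` column is omitted because only the positions matter here.

```python
import pandas as pd

# Only the ID column is needed to check which rows are selected
population = pd.DataFrame({"ID": range(1, 1001)})
start_index, k = 5, 10

selected_ids = population["ID"].iloc[start_index::k].tolist()
print(selected_ids[:3], selected_ids[-1], len(selected_ids))  # [6, 16, 26] 996 100
```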