
### Stratified Sampling:
Stratified sampling is a sampling technique in which the population is divided into distinct, non-overlapping groups, called strata, based on specific characteristics (e.g., age, gender, income level). A random sample is then taken from each stratum, so every subgroup appears in the sample. Under proportional allocation, each subgroup's share of the sample matches its share of the population.

This method improves the representativeness of the sample, especially when the population is heterogeneous, that is, when the strata differ from one another on the variable being measured.

### Steps in Stratified Sampling:
1. Divide the Population into Strata:

    - Identify the relevant characteristic(s) and divide the population into non-overlapping subgroups (strata), so that each unit belongs to exactly one stratum.
2. Determine Sample Size for Each Stratum:

    - Decide how many units to draw from each stratum. This can be based on the size of the stratum relative to the entire population (proportional allocation) or a fixed number per stratum (equal allocation).
3. Random Sampling within Strata:

    - Perform simple random sampling within each stratum to select the required number of units.
4. Combine Samples:

    - Merge the samples from all strata to form the final stratified sample.
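
Step 2 is where rounding problems tend to appear, so here is a minimal sketch of both allocation rules. The helper name `allocate` and its `method` parameter are hypothetical, introduced only for this illustration. It uses integer arithmetic and hands any leftover units to the strata with the largest remainders, so the per-stratum sizes sum exactly to the requested total. It assumes every stratum is large enough to supply its allocation, which equal allocation does not guarantee for small strata.

```python
import pandas as pd


def allocate(stratum_counts: pd.Series, total: int, method: str = "proportional") -> pd.Series:
    """Return per-stratum sample sizes that sum exactly to `total` (hypothetical helper)."""
    if method == "equal":
        # Equal allocation: every stratum gets the same weight.
        weights = pd.Series(1, index=stratum_counts.index)
    else:
        # Proportional allocation: weight each stratum by its population count.
        weights = stratum_counts.astype(int)

    scaled = weights * total
    sizes = scaled // weights.sum()      # integer part of each stratum's share
    leftovers = scaled % weights.sum()   # remainder, used to rank strata
    shortfall = int(total - sizes.sum())

    if shortfall > 0:
        # Give one extra unit to the strata with the largest remainders.
        top = leftovers.sort_values(ascending=False, kind="stable").index[:shortfall]
        sizes.loc[top] += 1
    return sizes


# Stratum counts taken from the population distribution used in this notebook
counts = pd.Series({"Adult": 57, "Youth": 29, "Senior": 14})
proportional = allocate(counts, total=30)
equal = allocate(counts, total=30, method="equal")
print(proportional.to_dict())
print(equal.to_dict())
```

Largest-remainder rounding is one of several reasonable choices. Plain truncation is simpler but can leave the total sample short by a few units.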

import pandas as pd
import numpy as np

# Sample population data
data = {
    'Customer_ID': range(1, 101),  # 100 customers
    'Age_Group': np.random.choice(['Youth', 'Adult', 'Senior'], size=100, p=[0.3, 0.5, 0.2]),
    'Satisfaction_Score': np.random.randint(1, 6, size=100)  # Satisfaction scores (1–5)
}

df = pd.DataFrame(data)

# Display population distribution by strata
print("Population Distribution:")
print(df['Age_Group'].value_counts())

# Stratified sampling
sample_size = 30  # Total sample size
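strata_proportions = df['Age_Group'].value_counts(normalize=True)  # Share of the population in each stratum
# Proportional allocation. astype(int) truncates toward zero, so the per-stratum
# sizes can sum to less than sample_size (17 + 8 + 4 = 29 in the run shown below).
strata_sample_sizes = (strata_proportions * sample_size).astype(int)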

# Function to sample within each stratum
def stratified_sample(group):
    n = strata_sample_sizes[group.name]  # Get sample size for the current stratum
    return group.sample(n=n, random_state=42)

# Apply stratified sampling
stratified_sample_df = df.groupby('Age_Group', group_keys=False).apply(stratified_sample)

# Display results
print("\nStratified Sample Distribution:")
print(stratified_sample_df['Age_Group'].value_counts())
print("\nStratified Sample Data:")
print(stratified_sample_df)


Population Distribution:
Age_Group
Adult     57
Youth     29
Senior    14
Name: count, dtype: int64

Stratified Sample Distribution:
Age_Group
Adult     17
Youth      8
Senior     4
Name: count, dtype: int64

Stratified Sample Data:
    Customer_ID Age_Group  Satisfaction_Score
4             5     Adult                   3
11           12     Adult                   1
52           53     Adult                   3
25           26     Adult                   3
61           62     Adult                   2
98           99     Adult                   4
47           48     Adult                   1
55           56     Adult                   1
82           83     Adult                   2
24           25     Adult                   2
84           85     Adult                   4
7             8     Adult                   3
79           80     Adult                   4
46           47     Adult                   2
16           17     Adult                   2
30           31     Adult      


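As an alternative to a custom function passed to `apply`, pandas offers `DataFrameGroupBy.sample` (available in pandas 1.1 and later, as far as I know), which draws a fixed fraction or count from each group directly. The sketch below rebuilds a similar synthetic population under assumed names (`population`, `sample`, `comparison`) and a seeded generator so the run is reproducible. It is an illustration of proportional allocation, not a replacement for the cell above. Because `frac` is applied per group and rounded, the total sample size may differ slightly from 30.

```python
import numpy as np
import pandas as pd

# Seeded generator so the synthetic population is reproducible (an assumption;
# the original cell draws its data without a seed)
rng = np.random.default_rng(42)

population = pd.DataFrame({
    "Customer_ID": range(1, 101),
    "Age_Group": rng.choice(["Youth", "Adult", "Senior"], size=100, p=[0.3, 0.5, 0.2]),
    "Satisfaction_Score": rng.integers(1, 6, size=100),  # scores from 1 to 5
})

# Draw 30% of every stratum; the fraction is applied within each group
sample = population.groupby("Age_Group").sample(frac=0.3, random_state=42)

# Compare each stratum's share in the population and in the sample
comparison = pd.DataFrame({
    "population_share": population["Age_Group"].value_counts(normalize=True),
    "sample_share": sample["Age_Group"].value_counts(normalize=True),
})
print(comparison.round(2))
```

The two share columns should be close but not identical, since each stratum's sample size is rounded to a whole number of rows.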