# **Sampling Techniques**

## **Stratified Sampling**

In stratified sampling, we divide the population into distinct subgroups (strata) and sample a fixed number of items from each stratum.

In [13]:
import pandas as pd

# Sample data: each row belongs to one category (stratum)
data = {'category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C', 'C'],
        'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}

df = pd.DataFrame(data)

# Stratified sampling: draw n_samples rows at random from each stratum.
# GroupBy.sample keeps the grouping column in the result.
def stratified_sampling(df, stratify_by, n_samples):
    return df.groupby(stratify_by).sample(n=n_samples)

# Get 1 sample from each category (rows are random, so output varies between runs)
stratified_sample = stratified_sampling(df, 'category', 1)
print("Stratified Sample: \n", stratified_sample)

Stratified Sample: 
   category  value
0        A     10
3        B     25
9        C     55
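
The function above draws the same number of rows from every stratum. As a minimal sketch of a common variation, the example below draws the same fraction from each stratum instead, so larger strata contribute more rows. The helper name `proportional_stratified_sampling`, its `frac` and `seed` parameters, and the toy data are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data with strata of different sizes (A=2, B=4, C=6)
toy = pd.DataFrame({
    'category': ['A'] * 2 + ['B'] * 4 + ['C'] * 6,
    'value': list(range(12)),
})

def proportional_stratified_sampling(df, stratify_by, frac, seed=None):
    # Draw the same fraction of rows from every stratum
    return df.groupby(stratify_by).sample(frac=frac, random_state=seed)

proportional_sample = proportional_stratified_sampling(toy, 'category', 0.5, seed=0)

# Number of sampled rows per stratum
counts = proportional_sample['category'].value_counts().to_dict()
print(counts)
```

With a fraction of 0.5, each stratum should contribute half of its rows, so the sample keeps the relative sizes of the strata.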


## **Cluster Sampling**

In cluster sampling, the population is divided into clusters, and entire clusters are randomly selected.

In [9]:
import pandas as pd
import numpy as np 

# Sample data: each row belongs to one cluster
data = {'cluster': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'X', 'Y', 'Z', 'Z'],
        'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}
df = pd.DataFrame(data)

# Cluster sampling: randomly pick n_clusters clusters and keep all of their rows
def cluster_sampling(df, cluster_by, n_clusters):
    sampled_clusters = np.random.choice(df[cluster_by].unique(), n_clusters, replace=False)
    return df[df[cluster_by].isin(sampled_clusters)]

# Sample 2 clusters (the choice is random, so output varies between runs)
cluster_sample = cluster_sampling(df, 'cluster', 2)
print('\nCluster Sample:\n', cluster_sample)



Cluster Sample:
   cluster  value
0       X     10
1       X     15
2       Y     20
3       Y     25
6       X     40
7       Y     45
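
Because the clusters are chosen at random, the output above changes from run to run. One possible way to make the selection reproducible is to pass a seed to NumPy's `default_rng` generator, as sketched below. The helper name `seeded_cluster_sampling` and its `seed` parameter are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cluster': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'X', 'Y', 'Z', 'Z'],
    'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55],
})

def seeded_cluster_sampling(df, cluster_by, n_clusters, seed):
    # A seeded generator makes the cluster choice repeatable
    rng = np.random.default_rng(seed)
    clusters = df[cluster_by].unique()
    chosen = rng.choice(clusters, size=n_clusters, replace=False)
    # Keep every row that belongs to one of the chosen clusters
    return df[df[cluster_by].isin(chosen)]

first = seeded_cluster_sampling(df, 'cluster', 2, seed=42)
second = seeded_cluster_sampling(df, 'cluster', 2, seed=42)
print(first.equals(second))  # same seed, same clusters
```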


## **Systematic Sampling**

In systematic sampling, we select every k-th item from a list after a random starting point.

In [10]:
import pandas as pd
# Sample data
data = {'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}
df = pd.DataFrame(data)

# Systematic sampling: select every k-th row (k = step).
# This simplified version always starts at row 0 rather than at a random offset.
def systematic_sampling(df, step):
    return df.iloc[::step]

# Select every 2nd item
systematic_sample = systematic_sampling(df, 2)
print('\n Systematic Sample:\n', systematic_sample)


 Systematic Sample:
    value
0     10
2     20
4     30
6     40
8     50
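
The cell above always starts at the first row. A minimal sketch of the random starting point mentioned in the description could look like the following. The helper name `systematic_sampling_random_start` and the `seed` parameter are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]})

def systematic_sampling_random_start(df, step, seed=None):
    rng = np.random.default_rng(seed)
    # Random offset in the half-open range [0, step)
    start = int(rng.integers(0, step))
    # From that offset, take every step-th row
    return df.iloc[start::step]

random_start_sample = systematic_sampling_random_start(df, 3, seed=1)
print(random_start_sample)
```

Drawing the offset from the first `step` rows gives every row the same chance of being selected while keeping the fixed spacing between sampled rows.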


## **Convenience Sampling**

In convenience sampling, we select the samples that are easiest to access, such as the first few rows of a dataset.


In [11]:
import pandas as pd
# Sample data
data = {'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}
df = pd.DataFrame(data)  # build the DataFrame here so the cell does not rely on earlier cells

# Convenience sampling: select the first few rows
def convenience_sampling(df, n_samples):
    return df.head(n_samples)

# Select the first 3 items
convenience_sample = convenience_sampling(df, 3)
print("\nConvenience Sample:\n", convenience_sample)


Convenience Sample:
    value
0     10
1     15
2     20
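
Taking the first rows is simple, but the result may not represent the whole dataset. As a small illustration on the same data, which happens to be sorted in ascending order, the sketch below compares the mean of the first three rows with the mean of all rows. The variable names are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]})

# Mean of all 10 rows
population_mean = df['value'].mean()
# Mean of the first 3 rows only (the convenience sample)
convenience_mean = df.head(3)['value'].mean()

print(population_mean, convenience_mean)  # 32.5 15.0
```

On this sorted data the convenience sample underestimates the overall mean, which is the kind of bias this method can introduce.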
