### Random Sampling
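sample() is a built-in function of Python's random module. It returns a new list of a given length whose items are chosen at random
from a sequence such as a list, tuple, string, or range. It is used for random sampling without replacement. Since Python 3.11, a set must first be converted to a sequence (for example with list() or sorted()) before it is passed to sample().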

In [12]:
# Python 3 program to demonstrate
# the use of the sample() function.

# Import sample() from the random module
from random import sample

# Population to draw from
list1 = [1, 2, 3, 4, 5]

# Print a list of 3 distinct items chosen at random from list1
print(sample(list1, 3))

[3, 4, 5]
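
To make "without replacement" concrete, the following minimal sketch contrasts sample() with random.choices(), which draws with replacement. The seed value 42 and the variable names are illustrative assumptions, not part of the original notebook.

```python
import random

population = [1, 2, 3, 4, 5]

# A dedicated generator with a fixed seed (42 is an arbitrary choice)
# makes the draws repeatable from run to run.
rng = random.Random(42)

# sample() draws without replacement: each position in the
# population is used at most once.
without_replacement = rng.sample(population, 3)

# choices() draws with replacement: the same item may appear
# more than once in the result.
with_replacement = rng.choices(population, k=3)

print(without_replacement)
print(with_replacement)
```

Because list1 contains no duplicates, the result of sample() never repeats a value, whereas the result of choices() may.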


### Systematic Sampling

Systematic sampling is a type of probability sampling in which elements are selected from a large data set at a fixed interval. A random starting point is chosen, and from there every k-th element is added to the sample. The result is a small subset (sample) extracted from the large data set.

In [15]:
# Import in order to use inbuilt functions
import numpy as np
import pandas as pd

# Define total number of students
number_of_students = 15

# Create data dictionary
data = {'Id': np.arange(1, number_of_students+1).tolist(),
        'height': [159, 171, 158, 162, 162, 177, 160, 175, 168, 171, 178, 178, 173, 177, 164]}

# Transform dictionary into a data frame
df = pd.DataFrame(data)

display(df)

Unnamed: 0,Id,height
0,1,159
1,2,171
2,3,158
3,4,162
4,5,162
5,6,177
6,7,160
7,8,175
8,9,168
9,10,171


In [16]:
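# Define systematic sampling function
def systematic_sampling(df, step):
    # Select every step-th row position, starting from the first row (position 0)
    indexes = np.arange(0, len(df), step=step)
    # Pick those rows by position
    systematic_sample = df.iloc[indexes]
    return systematic_sample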


# Obtain a systematic sample and save it in a new variable
systematic_sample = systematic_sampling(df, 3)

# View sampled data frame
display(systematic_sample)

Unnamed: 0,Id,height
0,1,159
3,4,162
6,7,160
9,10,171
12,13,173
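
The function above always starts at the first row, while the definition of systematic sampling calls for a random starting point. A minimal sketch of that variant follows; the function name systematic_sampling_random_start and the seed parameter are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd


def systematic_sampling_random_start(df, step, seed=None):
    """Select every step-th row, starting at a random offset in [0, step)."""
    rng = np.random.default_rng(seed)
    # Random starting position; the upper bound of integers() is exclusive
    start = int(rng.integers(0, step))
    indexes = np.arange(start, len(df), step)
    return df.iloc[indexes]


# Same student data as in the example above
students_df = pd.DataFrame({
    'Id': np.arange(1, 16),
    'height': [159, 171, 158, 162, 162, 177, 160, 175,
               168, 171, 178, 178, 173, 177, 164],
})

# seed=0 is an arbitrary choice, used only to make the result repeatable
random_start_sample = systematic_sampling_random_start(students_df, step=3, seed=0)
print(random_start_sample)
```

With 15 rows and a step of 3, any starting offset yields 5 rows, so the sample size matches the fixed-start version.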


### Cluster Sampling
Cluster sampling is a type of probability sampling in which every element of the population has an equal chance of being selected. Subsets of the population, rather than individual elements, serve as the sampling units.

The population is divided into subsets or subgroups, called clusters, and from these clusters we randomly select whole clusters to use in the next step.

In [2]:
# import pandas
import pandas as pd

# import numpy
import numpy as np

# creating dictionary of data
data = {'N_numbers':np.arange(1,16)}

# creating dataframe
df = pd.DataFrame(data)
df

Unnamed: 0,N_numbers
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,10


In [3]:
# draw a random sample of 4 rows
# (note: this selects individual rows, not whole clusters)
samples = df.sample(4)
print(samples)

    N_numbers
13         14
3           4
1           2
6           7
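
The df.sample(4) call above draws individual rows. To select whole clusters, as the definition describes, one possible sketch is shown below. The cluster size of 3, the choice of 2 clusters, the seed, and the names cluster_df and cluster_sample are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

cluster_df = pd.DataFrame({'N_numbers': np.arange(1, 16)})

# Assumption: each consecutive block of 3 rows forms one cluster,
# which gives 5 clusters labelled 0 to 4.
cluster_df['cluster'] = np.arange(len(cluster_df)) // 3

# Randomly pick 2 whole clusters (seed fixed only for repeatability)
rng = np.random.default_rng(0)
chosen_clusters = rng.choice(cluster_df['cluster'].unique(), size=2, replace=False)

# Keep every row that belongs to one of the chosen clusters
cluster_sample = cluster_df[cluster_df['cluster'].isin(chosen_clusters)]
print(cluster_sample)
```

Every row of a chosen cluster ends up in the sample, so here the sample always contains 6 rows from exactly 2 clusters.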


### Stratified Sampling
Stratified sampling is a sampling technique used to obtain samples that best represent the population. It reduces selection bias by dividing the population into homogeneous subgroups called strata and randomly sampling data from each stratum (the singular form of strata).

In statistics, stratified sampling is used when the mean values of the strata are expected to differ. In machine learning, stratified sampling is commonly used to create test datasets for evaluating models, especially when the dataset is large and imbalanced.

In [4]:
import pandas as pd

# Create a dictionary of students
students = {
    'Name': ['Lisa', 'Kate', 'Ben', 'Kim', 'Josh', 'Alex', 'Evan', 'Greg', 'Sam', 'Ella'],
    'ID': ['001', '002', '003', '004', '005', '006', '007', '008', '009', '010'],
    'Grade': ['A', 'A', 'C', 'B', 'B', 'B', 'C', 'A', 'A', 'A'],
    'Category': [2, 3, 1, 3, 2, 3, 3, 1, 2, 1]
}

# Create dataframe from students dictionary
df = pd.DataFrame(students)

# view the dataframe
df

Unnamed: 0,Name,ID,Grade,Category
0,Lisa,1,A,2
1,Kate,2,A,3
2,Ben,3,C,1
3,Kim,4,B,3
4,Josh,5,B,2
5,Alex,6,B,3
6,Evan,7,C,3
7,Greg,8,A,1
8,Sam,9,A,2
9,Ella,10,A,1


In [5]:
# Randomly sample 2 students from each grade (stratum)
df.groupby('Grade', group_keys=False).apply(lambda x: x.sample(2))

Unnamed: 0,Name,ID,Grade,Category
7,Greg,8,A,1
9,Ella,10,A,1
3,Kim,4,B,3
5,Alex,6,B,3
6,Evan,7,C,3
2,Ben,3,C,1


In [6]:
# Randomly sample 60% of the students in each grade (stratum)
df.groupby('Grade', group_keys=False).apply(lambda x: x.sample(frac=0.6))

Unnamed: 0,Name,ID,Grade,Category
0,Lisa,1,A,2
8,Sam,9,A,2
7,Greg,8,A,1
3,Kim,4,B,3
4,Josh,5,B,2
2,Ben,3,C,1
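
As an alternative to groupby(...).apply(...), pandas 1.1 and later provide a sample() method directly on grouped data. The sketch below assumes such a pandas version; the random_state value and the variable names are illustrative choices.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ['Lisa', 'Kate', 'Ben', 'Kim', 'Josh', 'Alex', 'Evan', 'Greg', 'Sam', 'Ella'],
    'ID': ['001', '002', '003', '004', '005', '006', '007', '008', '009', '010'],
    'Grade': ['A', 'A', 'C', 'B', 'B', 'B', 'C', 'A', 'A', 'A'],
    'Category': [2, 3, 1, 3, 2, 3, 3, 1, 2, 1],
})

# Fixed number per stratum: 2 students from each grade
two_per_grade = students_df.groupby('Grade').sample(n=2, random_state=1)

# Proportional allocation: 60% of each grade, rounded to whole rows
sixty_percent = students_df.groupby('Grade').sample(frac=0.6, random_state=1)

print(two_per_grade)
print(sixty_percent)
```

Passing random_state makes the draw repeatable, which is useful when the same stratified split has to be regenerated later.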
