## **Proportional Sampling :**


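-- A technique for picking a datapoint from a given population / sample, where the choice depends on the value of one feature (here, 'feature1')

-- The probability of picking a datapoint is proportional to its feature value: datapoint i is picked with probability feature1[i] / (sum of all feature1 values)

-- Datapoints with larger feature values are therefore picked more often. The ordering Prob. of picking 1st element < Prob. of picking 2nd element < ..... < Prob. of picking nth element holds only when the datapoints are sorted in ascending order of the feature value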
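
As a small illustration of "probability proportional to feature value", the sketch below uses three hypothetical feature values (they are not the notebook's data) and divides each by the total. The variable names `values` and `probs` are made up for this example.

```python
import numpy as np

# Hypothetical feature values, deliberately not sorted
values = np.array([6.0, 1.0, 3.0])

# Pick probability of each datapoint = its value / sum of all values
probs = values / values.sum()

for i, (v, pr) in enumerate(zip(values, probs)):
    print(f"index {i}: value = {v:g}, pick probability = {pr:.2f}")
```

The first datapoint has the largest value, so it gets the largest probability even though it comes first in the list.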



### **Algorithm :**

1) Compute the sum of all the feature values. (Sum of **'feature1'**)

2) Normalize the feature values by dividing each one by that sum i.e. **'norm_feature_1'**

3) Compute the cumulative sum of the normalized feature values i.e. **'cum_norm_feature1'**

4) Generate a random probability value p, drawn uniformly between 0 and 1.

5) Pick a datapoint based on the probability value i.e. p

  - Pick the first datapoint whose cumulative normalized feature sum i.e. **'cum_norm_feature1'** is greater than p
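
The five steps above can be sketched as a single function. This is a minimal sketch, not the notebook's own code: the function name `proportional_sample` and its optional `p` and `rng` parameters are assumptions made for illustration, and it uses `np.searchsorted` to find the first cumulative value greater than `p` instead of an explicit loop.

```python
import numpy as np

def proportional_sample(values, p=None, rng=None):
    """Return the index of one datapoint, picked proportionally to its value."""
    values = np.asarray(values, dtype=float)
    # Steps 1-3: sum, normalize, cumulative sum
    cum = np.cumsum(values / values.sum())
    # Step 4: draw p uniformly from [0, 1) unless the caller supplies one
    if p is None:
        if rng is None:
            rng = np.random.default_rng()
        p = rng.random()
    # Step 5: first index whose cumulative value is strictly greater than p
    idx = int(np.searchsorted(cum, p, side="right"))
    # Guard against floating-point rounding in the last cumulative value
    return min(idx, len(values) - 1)

data = [5, 10, 7, 15, 20, 30, 15, 20, 25, 50]
idx = proportional_sample(data, p=0.18706288015954353)
print(idx, data[idx])
```

Passing `p` explicitly makes the result reproducible, which is convenient for checking individual cases by hand.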

In [41]:
# Import all Libraries:

import numpy as np
import pandas as pd

In [42]:
# Our Data :

data = [5,10,7,15,20,30,15,20,25,50]

# Let's create a Dataframe :

df = pd.DataFrame(data,columns=['feature1'])
df

   feature1
0         5
1        10
2         7
3        15
4        20
5        30
6        15
7        20
8        25
9        50


### **1. Compute the sum of feature values :**

In [43]:
# Count the datapoints :

n = df.shape[0]
print('No. of Datapoints : ',n)

# Compute the total sum of the feature values :
# Note : the name 'sum' shadows Python's built-in sum() for the rest of the notebook

sum = df['feature1'].sum()
print("\nSum of di's or feature1's value : ",sum)


No. of Datapoints :  10

Sum of di's or feature1's value :  197


### **2. Normalize the feature values :**

In [44]:
# Compute Normalized Feature value : norm_feature_1

df['norm_feature_1'] = df['feature1']/sum

print('\n Final DataFrame : \n')
print(df)


 Final DataFrame : 

   feature1  norm_feature_1
0         5        0.025381
1        10        0.050761
2         7        0.035533
3        15        0.076142
4        20        0.101523
5        30        0.152284
6        15        0.076142
7        20        0.101523
8        25        0.126904
9        50        0.253807


### **3. Compute the cumulative sum of normalized features :**

In [45]:
# Compute Cumulative Sum of Normalized Features :

df['cum_norm_feature1'] = df['norm_feature_1'].cumsum()

df

   feature1  norm_feature_1  cum_norm_feature1
0         5        0.025381           0.025381
1        10        0.050761           0.076142
2         7        0.035533           0.111675
3        15        0.076142           0.187817
4        20        0.101523           0.289340
5        30        0.152284           0.441624
6        15        0.076142           0.517766
7        20        0.101523           0.619289
8        25        0.126904           0.746193
9        50        0.253807           1.000000
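
One way to read the cumulative column is that each datapoint owns a slice of the interval from 0 to 1, ending at its cumulative value and starting where the previous one ended. The sketch below rebuilds those slices with NumPy only; the names `lower`, `upper` and `width` are assumptions introduced for this illustration.

```python
import numpy as np

feature1 = np.array([5, 10, 7, 15, 20, 30, 15, 20, 25, 50], dtype=float)

# Upper edge of each slice = cumulative normalized sum
upper = np.cumsum(feature1 / feature1.sum())
# Lower edge of each slice = previous upper edge (0 for the first datapoint)
lower = np.concatenate(([0.0], upper[:-1]))
# Width of each slice = probability of landing in it
width = upper - lower

for i in range(len(feature1)):
    print(f"{i}: [{lower[i]:.6f}, {upper[i]:.6f})  width = {width[i]:.6f}")
```

The width of each slice equals the normalized feature value, so a uniform random number lands in a slice with probability proportional to that datapoint's feature value.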


### **4. Generate a random probability value between 0 and 1 :**

In [46]:
# Generate a random number between 0 and 1 :
# This random number i.e. p acts as the probability used to pick a datapoint

p = np.random.uniform(0,1)

p

0.18706288015954353

### **5. Pick a Datapoint based on the prob. value (p) :**

In [47]:
# Proportional Sampling : pick the first datapoint whose cumulative normalized sum is greater than p

for i in range(n):
  if(p < df['cum_norm_feature1'][i]):
    pos = i
    print('Picked',pos,'th datapoint : ',df['feature1'][pos])
    break

Picked 3 th datapoint :  15
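
A single pick says little about whether the sampling is really proportional. A possible sanity check, sketched below under the assumption that a vectorized `np.searchsorted` lookup stands in for the loop, is to draw many values of p and compare how often each datapoint is picked with its normalized feature value. The seed, the number of draws and the variable names are arbitrary choices for this example.

```python
import numpy as np

data = np.array([5, 10, 7, 15, 20, 30, 15, 20, 25, 50], dtype=float)
expected = data / data.sum()          # normalized feature values
cum = np.cumsum(expected)             # cumulative normalized sums

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
draws = rng.random(200_000)           # many uniform values of p in [0, 1)

# For each p, index of the first cumulative value greater than p
picks = np.searchsorted(cum, draws, side="right")
picks = np.minimum(picks, len(data) - 1)   # guard against rounding at the top edge

observed = np.bincount(picks, minlength=len(data)) / len(draws)

for i in range(len(data)):
    print(f"{i}: expected = {expected[i]:.4f}, observed = {observed[i]:.4f}")
```

With enough draws the observed frequencies should sit close to the expected probabilities, and the datapoint with value 50 should be picked most often.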
