# Simple Random Sampling (SRS)
* In simple random sampling, the sample is drawn from the population completely at random, without any bias. Every element of the population therefore has an equal chance of being selected for the sample.

## Sampling Error
* **Sampling error = parameter - statistic**
* A parameter is any descriptive measure of the population (population mean, max, min, etc.), and a statistic is the same kind of descriptive measure computed on a sample (sample mean, max, min).
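As a minimal sketch of the definition above, the cell below computes a sampling error on a small toy population. The values, and the names `toy_population` and `toy_sample`, are made up for illustration and are not part of the funding dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy population, invented for illustration only
toy_population = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Parameter: a descriptive measure of the whole population
parameter = toy_population.mean()

# Statistic: the same measure computed on a random sample of 4 records
toy_sample = toy_population.sample(4, random_state=0)
statistic = toy_sample.mean()

# Sampling error = parameter - statistic
sampling_error = parameter - statistic
print(parameter, statistic, sampling_error)
```

The sign of the error shows the direction of the miss: a negative value means the sample statistic overestimates the population parameter.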

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('startup_funding.csv')
df.head()

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01-08-2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,02-08-2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,02-08-2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,02-08-2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,02-08-2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,


In [3]:
amount = df['AmountInUSD']
amount.head()

0    13,00,000
1          NaN
2          NaN
3     5,00,000
4     8,50,000
Name: AmountInUSD, dtype: object

In [4]:
# Drop all NaN records (reassign rather than use inplace=True, since `amount` is a column taken from df)
amount = amount.dropna()
amount.head()

0    13,00,000
3     5,00,000
4     8,50,000
5    10,00,000
6    26,00,000
Name: AmountInUSD, dtype: object

In [5]:
# To replace ',' with ''
a1 = amount.str.replace(',','')
a1.head()

0    1300000
3     500000
4     850000
5    1000000
6    2600000
Name: AmountInUSD, dtype: object

In [6]:
a1.iloc[0]

'1300000'

In [7]:
# Convert the cleaned strings to a numeric data type
a2 = pd.to_numeric(a1)
a2.head()

0    1300000
3     500000
4     850000
5    1000000
6    2600000
Name: AmountInUSD, dtype: int64

In [9]:
type(a2.iloc[0])

numpy.int64

In [10]:
population_average = a2.mean()
population_max = a2.max()
population_min = a2.min()

print(population_average)
print(population_max)
print(population_min)

12031073.099016393
1400000000
16000


## Implementing Simple Random Sampling
* **.sample(n, random_state):** draws `n` records from the population. By default it samples without replacement (`replace=False`), so no record is selected more than once.
* **random_state:** a seed for the random number generator. It ensures that `.sample()` returns the same sample every time it is run.
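The two points above can be checked with a short sketch. It assumes a stand-in population of the integers 0 to 999 (the name `demo` is hypothetical) rather than the funding data, so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in population: the integers 0..999
demo = pd.Series(range(1000))

# Same seed, so both calls should return the same 100 records
first = demo.sample(100, random_state=1)
second = demo.sample(100, random_state=1)
print(first.equals(second))

# Default replace=False: every index label appears at most once
print(first.index.is_unique)

# With replace=True the same record may be drawn more than once
with_replacement = demo.sample(100, replace=True, random_state=1)
print(len(with_replacement))
```

Leaving out `random_state` gives a different sample on each run, which is fine for exploration but makes results hard to reproduce.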

In [12]:
population = a2
sample_size = 100
# With no arguments, .sample() returns a single randomly chosen record
population.sample()

1762    37500000
Name: AmountInUSD, dtype: int64

In [16]:
sample = population.sample(sample_size,random_state=1)

In [17]:
sample

142      4000000
1931    16200000
309       325000
985     25000000
439        18000
          ...   
1347     3000000
1810    30000000
1721      150000
1913    30000000
777      1500000
Name: AmountInUSD, Length: 100, dtype: int64

In [18]:
sample.shape

(100,)

In [19]:
sample_avg = sample.mean()
sample_max = sample.max()
sample_min = sample.min()

print(sample_avg)
print(sample_max)
print(sample_min)

24592930.0
1400000000
18000
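
To tie the result back to the sampling error definition, the sketch below subtracts each sample statistic from the corresponding population parameter. The numbers are copied by hand from the outputs printed above, so the cell does not need the CSV file. The `*_error` names are introduced here for illustration.

```python
# Values copied from the outputs printed earlier in this notebook
population_average = 12031073.099016393
population_max = 1400000000
population_min = 16000

sample_avg = 24592930.0
sample_max = 1400000000
sample_min = 18000

# Sampling error = parameter - statistic
mean_error = population_average - sample_avg
max_error = population_max - sample_max
min_error = population_min - sample_min

print(mean_error)
print(max_error)
print(min_error)
```

The mean error comes out to roughly -12.56 million, meaning this sample overestimates the population mean by that amount. The sample max equals the population max, so the largest record (1,400,000,000) was drawn into the sample. In a sample of 100 that single record alone adds 14 million to the sample mean, which likely accounts for most of the gap.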
