In [5]:
import math

## **Objective**: calculate the sample size for a client survey

An optical office has hired us to conduct a patient satisfaction survey. They also want to learn how patients find out about the optician's services and in which media patients have seen advertising for the brand.
This exercise determines the sample size required for that survey.

Here is a Python function that can calculate the sample size needed for a survey, given the population size, the desired confidence level, and the allowable margin of error:

In [6]:
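from statistics import NormalDist

def sample_size(population, confidence, error):
    """Minimum sample size for estimating a proportion in a finite population."""
    # Two-sided z value for the confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    p = 0.5          # most conservative assumption: maximizes p * (1 - p)
    e = error / 100  # margin of error as a fraction
    n = (z**2 * p * (1 - p)) / (e**2)  # sample size for an infinite population
    # Finite population correction, rounded up to a whole respondent
    return math.ceil(n / (1 + ((n - 1) / population)))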


The formula is based on the normal approximation to the sampling distribution of a proportion. The calculation has two stages: first, the sample size needed for the desired precision in an effectively infinite population (n) is determined; then a finite population correction adjusts it for the actual population size, and the result is rounded up to an integer.
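Written out, the two stages computed by `sample_size` look as follows. The symbols $n_0$ (infinite-population sample size) and $N$ (population size) are introduced here for illustration; in the code they correspond to `n` and `population`, while $z$, $p$ and $e$ match the variables of the same name.

$$
n_0 = \frac{z^2 \, p \, (1 - p)}{e^2},
\qquad
n = \left\lceil \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \right\rceil
$$

As a worked example with $z = 1.96$, $p = 0.5$, $e = 0.05$ and $N = 1430$:

$$
n_0 = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} = 384.16,
\qquad
n = \left\lceil \frac{384.16}{1 + \dfrac{383.16}{1430}} \right\rceil = \lceil 302.98 \rceil = 303
$$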

### Things to remember when determining the size of your sample:

- Avoid sample sizes below 30. As a common rule of thumb, 30 is roughly the point at which the average of a sample starts to approximate the average of the population reliably; it is a guideline, not a proven threshold.

- The confidence level most commonly used is 95%, but 90% can work in some cases. 

Increase the sample size to meet specific needs of your project:

- For a higher confidence level, use a larger sample size

- To decrease the margin of error, use a larger sample size

- For greater statistical power (a better chance of detecting real differences), use a larger sample size
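The effect of the confidence level and the margin of error on the sample size can be checked numerically. The following is a minimal sketch, not the notebook's own function: the helper names `z_value` and `required_n` and the population of 1000 are assumptions chosen for illustration, and the z value is derived from the confidence level with `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

def z_value(confidence_pct):
    """Two-sided z value for a confidence level given in percent."""
    return NormalDist().inv_cdf(1 - (1 - confidence_pct / 100) / 2)

def required_n(population, confidence_pct, margin_pct, p=0.5):
    """Sample size for a proportion, with finite population correction."""
    z = z_value(confidence_pct)
    e = margin_pct / 100                    # margin of error as a fraction
    n0 = z**2 * p * (1 - p) / e**2          # infinite-population sample size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

illustrative_population = 1000  # hypothetical value, for illustration only

# Higher confidence or a smaller margin of error both increase the sample size
for confidence_pct in (90, 95, 99):
    for margin_pct in (5, 3):
        n = required_n(illustrative_population, confidence_pct, margin_pct)
        print(f"confidence {confidence_pct}%, margin {margin_pct}%: n = {n}")
```

Running the loop shows the sample size growing as the confidence level rises and as the margin of error shrinks.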

## Population analysis

The patient database lists 1850 patients registered for optical services. A quick analysis of the email field shows that 410 records have it blank and another 10 contain invalid addresses. Since the survey will be sent by email, the population is the remaining 1430 patients.
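The arithmetic behind the population figure can be sketched as below. The variable names are hypothetical; only the three counts come from the text above.

```python
registered = 1850     # patients registered in the database
blank_email = 410     # records with an empty email field
invalid_email = 10    # records with an invalid email address

# Only patients with a usable email address can receive the survey
reachable = registered - blank_email - invalid_email
coverage = reachable / registered

print(f"Reachable by email: {reachable} ({coverage:.1%} of registered patients)")
```

This also makes explicit what share of the registered patients the survey population actually covers.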

In [19]:
population = 1430
confidence = 95
error = 5

required_sample = sample_size(population, confidence, error)
print(f'Sample Size Required: {required_sample}, with a population: {population}, confidence: {confidence} and Margin error: {error}')

Sample Size Required: 303, with a population: 1430, confidence: 95 and Margin error: 5


### Changing the margin of error to 3%

In [20]:
population = 1430
confidence = 95
error = 3

required_sample2 = sample_size(population, confidence, error)
print(f'Sample Size Required: {required_sample2}, with a population: {population}, confidence: {confidence} and Margin error: {error}')

Sample Size Required: 612, with a population: 1430, confidence: 95 and Margin error: 3


### Survey response rate
The calculated sample size is the minimum number of responses needed to achieve the chosen confidence level and margin of error. For a survey, you also need to account for the estimated response rate to work out how many surveys to send out.

In our experience, only 62% of patients answer surveys, so:

In [22]:
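# Round up so the expected number of responses does not fall short of the sample size
populationToSurvey = math.ceil(required_sample * 100 / 62)
print(f'Patients to survey for a 5% margin of error, assuming a 62% response rate: {populationToSurvey}')

Patients to survey for a 5% margin of error, assuming a 62% response rate: 489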


In [24]:
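# Round up so the expected number of responses does not fall short of the sample size
populationToSurvey = math.ceil(required_sample2 * 100 / 62)
print(f'Patients to survey for a 3% margin of error, assuming a 62% response rate: {populationToSurvey}')

Patients to survey for a 3% margin of error, assuming a 62% response rate: 988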


To keep the handling and analysis of the survey responses manageable, the 5% margin of error is chosen.
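One way to put the two options side by side is to express the number of surveys to send as a share of the reachable population. This is a sketch under stated assumptions: `invitations_needed` is a hypothetical helper, the sample sizes 303 and 612 are the values `sample_size` returns for the 5% and 3% margins, and the count is rounded up, so it can be one higher than a figure obtained with `round`.

```python
import math

def invitations_needed(sample, response_rate):
    """Surveys to send so that the expected responses reach the sample size."""
    return math.ceil(sample / response_rate)

reachable = 1430             # patients with a valid email address
response_rate = 0.62         # response rate assumed in this exercise
samples = {5: 303, 3: 612}   # margin of error (%) -> required sample size

for margin, sample in samples.items():
    invitations = invitations_needed(sample, response_rate)
    share = invitations / reachable
    print(f"margin {margin}%: send {invitations} surveys "
          f"({share:.0%} of reachable patients)")
```

The 3% option requires contacting roughly twice as many patients as the 5% option, which is the trade-off behind the choice above.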