Imagine you want to know the average weight of the people in your large office. You could weigh every person, but in an office this size that would be a major undertaking. Bootstrapping lets you estimate the average, and the uncertainty around that estimate, from a much smaller sample. Here's how you could apply it to this scenario:

You work for a substantial technology company with 1,000 employees in the building and 200 on your floor. Instead of surveying everyone, which could be time-consuming, you decide to use bootstrapping to estimate the average weight.

On the first day, you weigh 50 randomly selected individuals, sampling without replacement so that no one is measured twice. These weights form your initial dataset.

In [None]:
import random
import numpy as np

# Set a seed value for reproducibility
random.seed(42)  # You can use any integer value as the seed


# Simulated list of weights (kg) of 50 people in the office
weights_kg = [random.uniform(50, 100) for _ in range(50)]

print("List of Weights (kg):", weights_kg)
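
The cell above simulates the 50 measurements directly. As a minimal sketch of what "without replacement" means in the survey step, the example below assumes a hypothetical list of 200 simulated floor weights (`floor_weights_kg`, which is not part of the original notebook) and draws 50 distinct people from it with `random.sample`. It uses its own `random.Random` instance so the seeded global generator used by the other cells is left untouched.

```python
import random

# Separate generator (assumed seed) so the notebook's global seed is unaffected
rng = random.Random(0)

# Hypothetical population: simulated weights (kg) for the 200 people on the floor
floor_weights_kg = [rng.uniform(50, 100) for _ in range(200)]

# Survey 50 distinct people by sampling their positions without replacement
surveyed_ids = rng.sample(range(len(floor_weights_kg)), k=50)
surveyed_weights_kg = [floor_weights_kg[i] for i in surveyed_ids]

print("People surveyed:", len(surveyed_ids))
print("Distinct people:", len(set(surveyed_ids)))  # same number: nobody is measured twice
```

Sampling positions rather than weights makes the "no one is measured twice" property easy to check, even if two people happened to have the same weight.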



However, instead of performing this survey repeatedly, you decide to leverage these 50 data points to generate multiple bootstrapped samples.

Each bootstrapped sample, or "bag," consists of 50 weights drawn at random from your initial dataset with replacement. This means that some weights might be selected multiple times in a single bag, while others might not be selected at all. For each bag, you calculate the mean of its 50 weights.

bag1 (50 samples with replacement)
bag2 (50 samples with replacement)
bag3 (50 samples with replacement)
.
.
bag100 (50 samples with replacement)
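
To make sampling with replacement concrete, here is a small sketch that builds a single bag and counts how many of the original people appear more than once or not at all. The data is simulated and the names (`original_sample`, `bag_indices`) are illustrative assumptions, not part of the cells in this notebook.

```python
import random
from collections import Counter

# Separate generator (assumed seed) so the notebook's global seed is unaffected
rng = random.Random(1)

# Stand-in for the 50 surveyed weights (kg); the values are simulated
original_sample = [rng.uniform(50, 100) for _ in range(50)]

# One bag: draw 50 positions from the original sample, with replacement
bag_indices = rng.choices(range(len(original_sample)), k=len(original_sample))
bag = [original_sample[i] for i in bag_indices]

# Count how often each original position was drawn
counts = Counter(bag_indices)
repeated = sum(1 for c in counts.values() if c > 1)
never_drawn = len(original_sample) - len(counts)

print("Bag size:", len(bag))
print("Distinct people in the bag:", len(counts))
print("People drawn more than once:", repeated)
print("People never drawn:", never_drawn)
```

On average, roughly 63% of the original data points show up in any given bag; the rest of the bag is made up of repeats.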



After performing this procedure 100 times, you end up with 100 different estimates for the average weight of your colleagues.

bag1 (50 samples with replacement) --> Avg1
bag2 (50 samples with replacement) --> Avg2
.
.
.
bag100 (50 samples with replacement) --> Avg100


The spread of these estimates shows how much the sample mean could vary from one survey to the next. Taking the 2.5th and 97.5th percentiles of the 100 means gives a 95% confidence interval. For instance, you might conclude, "With 95% confidence, the average weight of people in this company is between 140 and 160 pounds (about 64 to 73 kg)."

In [None]:

# Number of bootstrap samples: 100 bags drawn with replacement from weights_kg
num_bootstraps = 100

# Initialize an empty list to store bootstrapped sample means
bootstrap_sample_means = []

# Perform bootstrapping
for _ in range(num_bootstraps):
    # Randomly select 50 weights from the original sample, with replacement
    bootstrap_sample = [random.choice(weights_kg) for _ in range(len(weights_kg))]
    bootstrap_mean = np.mean(bootstrap_sample)  # Mean of the 50 resampled weights
    bootstrap_sample_means.append(bootstrap_mean)  # Store this bag's mean

# Calculate the 95% confidence interval
confidence_interval = np.percentile(bootstrap_sample_means, [2.5, 97.5])

print("Bootstrap Estimates of Mean Weight:", bootstrap_sample_means)
print("95% Confidence Interval:", confidence_interval)
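
With only 100 bags, the 2.5th and 97.5th percentiles rest on just a few values in each tail, so the interval endpoints can be noisy. The sketch below is one possible variation, not the method used above: it assumes 10,000 resamples and uses NumPy's `default_rng` to draw all bags at once. The data is simulated again and the names (`sample_kg`, `num_resamples`) are illustrative.

```python
import numpy as np

# NumPy generator with an assumed seed, independent of the `random` module
rng = np.random.default_rng(42)

# Stand-in for the 50 surveyed weights (kg); the values are simulated
sample_kg = rng.uniform(50, 100, size=50)

num_resamples = 10_000  # assumed value, larger than the 100 bags used above

# Each row holds the 50 positions of one bag, drawn with replacement
idx = rng.integers(0, sample_kg.size, size=(num_resamples, sample_kg.size))

# Mean of each bag, computed in one vectorized step
resample_means = sample_kg[idx].mean(axis=1)

# Percentile-based 95% interval, as in the loop version
ci_low, ci_high = np.percentile(resample_means, [2.5, 97.5])

print(f"Sample mean: {sample_kg.mean():.2f} kg")
print(f"95% percentile interval: [{ci_low:.2f}, {ci_high:.2f}] kg")
```

The loop version is easier to read line by line; the vectorized version trades that for speed when the number of resamples is large.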

In this context, bootstrapping helps you estimate the average weight of your colleagues in a way that's more feasible than measuring everyone individually. It lets you draw informed conclusions about a population from a single sample, even when collecting data from the entire population is impractical.

In [25]:
# Another way to write the same code, using explicit loops:


import random
import numpy as np

# Set a seed value for reproducibility
random.seed(42)  # You can use any integer value as the seed


# Simulated list of weights (kg) of 50 people in the office
weights_kg = [random.uniform(50, 100) for _ in range(50)]

print("List of Weights (kg):", weights_kg)

# Number of bootstrap samples
num_bootstraps = 100  #100 bags will be created with replacement from the original sample(weights_kg)

# Initialize an empty list to store bootstrapped sample means
bootstrap_sample_means = []

# Perform bootstrapping
for _ in range(num_bootstraps):
    bootstrap_sample = []  # Initialize an empty list for the bootstrapped sample
    for _ in range(len(weights_kg)):
        random_weight = random.choice(weights_kg)  # Randomly choose a weight from the original list
        bootstrap_sample.append(random_weight)  # Add the chosen weight to the bootstrapped sample
    
    bootstrap_mean = np.mean(bootstrap_sample)  # Calculate the mean of the bootstrapped sample
    bootstrap_sample_means.append(bootstrap_mean)  # Store the mean in the list of bootstrapped means


# Calculate the 95% confidence interval
confidence_interval = np.percentile(bootstrap_sample_means, [2.5, 97.5])

print("Bootstrap Estimates of Mean Weight:", bootstrap_sample_means)
print("95% Confidence Interval:", confidence_interval)


List of Weights (kg): [81.97133992289419, 51.250537761133344, 63.75146591845596, 61.16053690744114, 86.82356070820062, 83.83497437114556, 94.60897838524227, 54.346941631470806, 71.09609098426353, 51.48986097190352, 60.931898740180166, 75.26776440516812, 51.32679848419318, 59.94188253433242, 82.49422188897617, 77.24707403016083, 61.02203110203483, 79.46328419379543, 90.47152283389133, 50.32493798390305, 90.2909625916404, 84.90696974941135, 67.01252582589959, 57.773974990589075, 97.86065361033906, 66.82972725563134, 54.637292169007395, 54.8358188416732, 92.37471831737298, 80.18630156834456, 90.35641366371901, 86.4865893346909, 76.81140457273503, 98.65578819896854, 68.92671886041768, 77.60203156366134, 91.47023321264975, 80.9259876182123, 93.08534501553886, 78.8676072628381, 85.22859181074617, 52.29121918278311, 61.39491378257734, 64.46939818010536, 53.98959884618137, 61.639544318051506, 55.05007147048646, 63.89868015550461, 81.78422221322, 68.24160894850421]
Bootstrap Estimates of Mean W