# **Build 99% Confidence Interval Using Sample Standard Deviation**

In [2]:
import numpy as np
std = 0.02
df = np.array([[1.13],
               [1.55],
               [1.43],
               [0.92],
               [1.25],
               [1.36],
               [1.32],
               [0.85],
               [1.07],
               [1.48],
               [1.20],
               [1.33],
               [1.18],
               [1.22],
               [1.29]
              ])
df

array([[1.13],
       [1.55],
       [1.43],
       [0.92],
       [1.25],
       [1.36],
       [1.32],
       [0.85],
       [1.07],
       [1.48],
       [1.2 ],
       [1.33],
       [1.18],
       [1.22],
       [1.29]])

In [5]:
#MEAN
mean_value = df.mean()
print(float(mean_value))

1.2386666666666666


In [7]:
# STANDARD DEVIATION (NumPy's default ddof=0 divides by n)
std_value = df.std()
print(float(std_value))

0.18661427836285438


In [12]:
from scipy import stats
interval = stats.norm.interval(0.99, loc=mean_value, scale=std_value)
interval = tuple(float(x) for x in interval)
interval

(0.7579801399989947, 1.7193531933343384)

The code computes a 99% interval from the dataset's mean and standard deviation using `stats.norm.interval`. The steps are:

1. **Import Libraries:**
   - `numpy` (as `np`): Used for numerical operations, especially on arrays.
   - `scipy.stats`: Provides statistical functions, including those for confidence intervals.

2. **Define Data and Parameters:**
   - `std = 0.02`: This value is never used in this section. The interval is built from the standard deviation computed from the data, so the assignment looks like a leftover placeholder.
   - `df`: A NumPy array representing the dataset (15 measurements).

3. **Calculate Sample Mean:**
   - `mean_value = df.mean()`: Computes the average of the values in the `df` array.

4. **Calculate Sample Standard Deviation:**
   - `std_value = df.std()`: Calculates the standard deviation of the values in `df`. With NumPy's default `ddof=0` this divides by n, which is the *population* formula. The *sample* standard deviation, which divides by n - 1 and is the usual choice when the population standard deviation is unknown, requires `df.std(ddof=1)`.

5. **Compute the Confidence Interval:**
   - `interval = stats.norm.interval(0.99, loc=mean_value, scale=std_value)`: This is the core of the calculation.
     - `0.99`:  Specifies a 99% confidence level.
     - `loc=mean_value`:  The sample mean (calculated in step 3).
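     - `scale=std_value`: The standard deviation from step 4. For a confidence interval of the mean, the scale is normally the standard error, `std_value / np.sqrt(n)` with `n = 15`, rather than the standard deviation itself.

   The `stats.norm.interval` function uses the normal distribution to compute the interval. When the standard deviation is estimated from a sample this small (n = 15), the normal distribution is only an approximation; a t-distribution with n - 1 = 14 degrees of freedom is the more appropriate choice.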

6. **Convert to Floats and Display:**
   - `interval = tuple(float(x) for x in interval)`: The function returns a tuple (an immutable sequence) of NumPy `float64` values. This line converts them to standard Python floats so the output prints cleanly. It does not change the numbers and is not strictly necessary.
   - `interval`: Displays the computed confidence interval.
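
The difference between the two standard deviation formulas mentioned in step 4 can be seen directly. The sketch below is illustrative: it uses the same 15 measurements flattened into a one-dimensional array, and the names `data`, `pop_std`, and `sample_std` are introduced here and do not appear in the original cells.

```python
import numpy as np

# Same 15 measurements as df, flattened to one dimension (assumed equivalent)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

pop_std = data.std()            # ddof=0: divides by n
sample_std = data.std(ddof=1)   # ddof=1: divides by n - 1

print("ddof=0:", float(pop_std))
print("ddof=1:", float(sample_std))
```

The `ddof=1` value is always the larger of the two, by a factor of `sqrt(n / (n - 1))`.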

**In short:** The code takes a dataset, calculates its mean and standard deviation, and passes them to `stats.norm.interval` to obtain a range around the mean. For a confidence interval, the 99% describes the procedure rather than a single result: if the sampling process were repeated many times, about 99% of the intervals built this way would contain the true population mean.
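
The repeated-sampling reading of the confidence level can be checked with a small simulation. This is only a sketch under stated assumptions: the population mean (1.24) and standard deviation (0.19) are hypothetical values chosen to resemble the data, the seed and repetition count are arbitrary, and each interval is a t-based interval built on the standard error, which is not what the cell above computes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # arbitrary seed for reproducibility
true_mean, true_std = 1.24, 0.19      # hypothetical population parameters
n, repetitions = 15, 5000

# Draw many samples of size n from a normal population
samples = rng.normal(true_mean, true_std, size=(repetitions, n))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Half-width of a 99% t-interval for each sample
t_crit = stats.t.ppf(0.995, df=n - 1)
half_widths = t_crit * std_errors

# Fraction of intervals that contain the true mean
covered = np.abs(means - true_mean) <= half_widths
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.99, which is what the confidence level promises about the procedure.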

**Important Note about `std = 0.02`**: The initial assignment of `std = 0.02` has no effect because the code computes the standard deviation from the data with `df.std()` and passes that result as `scale`. The `std` variable is never used in the computations of this section. If the intent is to use a given standard deviation, pass `std` as the `scale` argument in place of `std_value`; otherwise the assignment can be removed.

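With these inputs, the code returns the 99% interval (0.7579801399989947, 1.7193531933343384). The range is centered on the sample mean, but its width is set by the standard deviation of the individual measurements, so it is wider than a 99% confidence interval for the population mean would be.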
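
A confidence interval for the mean that accounts for the sample size can be sketched as follows. This is an alternative to the cell above, not the original calculation: it assumes the measurements are roughly normal, uses `ddof=1` for the sample standard deviation, divides by `sqrt(n)` to get the standard error, and swaps the normal distribution for Student's t with n - 1 degrees of freedom. The names `data`, `std_error`, and `t_interval` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Same 15 measurements as df, flattened to one dimension
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

n = data.size
sample_mean = data.mean()
sample_std = data.std(ddof=1)          # sample standard deviation (n - 1)
std_error = sample_std / np.sqrt(n)    # standard error of the mean

# 99% interval from Student's t with n - 1 degrees of freedom
t_interval = stats.t.interval(0.99, df=n - 1, loc=sample_mean, scale=std_error)
t_interval = tuple(float(x) for x in t_interval)
print(t_interval)
```

For this data the result is roughly (1.09, 1.39), noticeably narrower than the range obtained with `scale=std_value`.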

# **Build 99% Confidence Interval Using Known Population Standard Deviation**

In [13]:
# population standard deviation
std = 0.2

In [15]:
from scipy import stats

# scale is the known population standard deviation
interval = stats.norm.interval(0.99, loc=df.mean(), scale=std)
interval = tuple(float(x) for x in interval)
interval

(0.7235008059568865, 1.7538325273764466)

- `stats.norm.interval`: Uses the normal distribution to compute the interval. Since the population standard deviation is known, the normal distribution is appropriate.
- `0.99`: Specifies a 99% confidence level.
- `loc=df.mean()`: Centers the interval at the sample mean.
- `scale=std`: Uses the provided population standard deviation (`std = 0.2`). For a confidence interval of the mean, the scale is normally the standard error, `std / np.sqrt(n)` with `n = 15`.
- `interval = tuple(float(x) for x in interval)`: Converts the endpoints to standard Python floats.

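With these inputs, the code returns the interval (0.7235008059568865, 1.7538325273764466), in millions of characters a print-head can print before failure. Because `scale` is the population standard deviation (0.2) rather than the standard error 0.2/√15, this range is wider than a 99% confidence interval for the mean number of characters would be.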
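
When the population standard deviation is known, a confidence interval for the mean is usually built from the standard error rather than from the standard deviation itself. The sketch below shows that variant under the same assumption of a known value of 0.2. The names `data`, `sigma`, and `z_interval` are introduced here for illustration and are not part of the original cells.

```python
import numpy as np
from scipy import stats

# Same 15 measurements as df, flattened to one dimension
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

sigma = 0.2                            # known population standard deviation
n = data.size
std_error = sigma / np.sqrt(n)         # standard error of the mean

# 99% z-interval centered on the sample mean
z_interval = stats.norm.interval(0.99, loc=data.mean(), scale=std_error)
z_interval = tuple(float(x) for x in z_interval)
print(z_interval)
```

For this data the result is roughly (1.11, 1.37) million characters.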