# <span style='color:#522258'>ASSIGNMENT 2</span>
## <span style='color:#522258'>ESTIMATION AND CONFIDENCE INTERVAL</span>

##### <span style='color:#C7253E'>Background</span>
    In quality control processes, especially when dealing with high-value items, destructive sampling is a necessary but costly method to ensure product quality. The test to determine whether an item meets the quality standards destroys the item, leading to the requirement of small sample sizes due to cost constraints.
##### <span style='color:#C7253E'>Scenario</span>
    A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.

##### <span style='color:#C7253E'>Data</span>
    A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
    1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29


#### <span style='color:#FABC3F'>PRE-REQUISITE STEPS</span>
##### <span style='color:#E85C0D'>STEP 1:</span>
    First, convert the given sample data into a NumPy array.
##### <span style='color:#E85C0D'>STEP 2:</span>
    Find the mean, sample standard deviation (std), sample size (n), and degrees of freedom (dof) of the sample data.
    Also record the confidence level given in the question and derive the alpha value (significance level) from it.

In [5]:
import numpy as np

sam = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# calculating the size of the sample
n = len(sam)

# calculating the mean of the sample
mean = sam.mean()

# calculating the sample standard deviation
# (ddof=1 divides by n - 1; NumPy's default ddof=0 gives the population formula)
std = sam.std(ddof=1)

# calculating the degrees of freedom (DOF) of the sample
dof = n - 1

# confidence level given in the question
ci = 0.99

# deriving the alpha value / significance level from the confidence level
alpha = 1 - ci

print(f'The sample size is {n}')
print(f'The mean of the sample is {round(mean, 2)}')
print(f'The standard deviation of the sample is {round(std, 2)}')
print(f'The Degree of Freedom of the sample is {dof}')
print(f'The confidence interval from the question is {ci}')
print(f'The significance value derived from CI is {round(alpha, 2)}')

The sample size is 15
The mean of the sample is 1.24
The standard deviation of the sample is 0.19
The Degree of Freedom of the sample is 14
The confidence interval from the question is 0.99
The significance value derived from CI is 0.01
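
One detail worth checking in this step is which standard deviation formula is being used. NumPy's `std` defaults to `ddof=0` (divide by n), while the sample standard deviation that pairs with the t-distribution divides by n - 1 (`ddof=1`). The sketch below compares the two on the same data. The names `pop_formula` and `sample_formula` are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

sam = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
n = len(sam)

# NumPy default: divide the sum of squared deviations by n
pop_formula = sam.std()

# Sample standard deviation: divide by n - 1 (Bessel's correction)
sample_formula = sam.std(ddof=1)

# The two values differ by a factor of sqrt(n / (n - 1))
ratio = sample_formula / pop_formula

print(f'ddof=0: {pop_formula:.4f}')
print(f'ddof=1: {sample_formula:.4f}')
print(f'ratio : {ratio:.4f}')
```

Both values round to 0.19 here, but the difference carries through to the margin of error, so the choice affects the final interval.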


### <span style='color:#FABC3F'>TASK 1:</span>
##### <span style='color:#C7253E'>Build 99% Confidence Interval Using Sample Standard Deviation</span>
    Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.


### <span style='color:#C0C78C'>EXPLANATION</span>
#### WHY WE ARE USING T DISTRIBUTION FOR THIS QUESTION
##### SAMPLE SIZE IS SMALL:
    When the sample size is small (< 30), the sample standard deviation is a noisy estimate of the population standard deviation. The t-distribution has heavier tails than the normal distribution, which accounts for this extra uncertainty. The method assumes that the underlying population is approximately normally distributed.

##### POPULATION STANDARD DEVIATION IS UNKNOWN
    The t-distribution is used when the population standard deviation is unknown and must be estimated from the sample. If the population standard deviation were known, we could use the z-distribution instead. Here it is unknown and we have only 15 observations, so we use the t-distribution to build the confidence interval.
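
To make the "heavier tails" argument concrete, the following sketch compares the two-sided 99% critical value of the t-distribution at several degrees of freedom with the corresponding z value. The degrees of freedom other than 14 are arbitrary choices for illustration, and `t_crits` is a hypothetical helper name.

```python
import scipy.stats as stats

confidence = 0.99
tail = (1 + confidence) / 2  # upper-tail quantile for a two-sided interval

# critical value from the standard normal distribution
z_crit = stats.norm.ppf(tail)

# critical values from the t-distribution for increasing degrees of freedom
t_crits = {df: stats.t.ppf(tail, df) for df in (4, 14, 29, 1000)}

for df, t_crit in t_crits.items():
    print(f'dof = {df:>4}: t critical = {t_crit:.3f}')
print(f'z critical = {z_crit:.3f}')
```

The t critical value is always larger than the z critical value and approaches it as the degrees of freedom grow, which is why a t-based interval is wider for small samples.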

### <span style='color:#C0C78C'>ANSWER TO TASK 1</span>
##### FINDING THE CRITICAL VALUE
    FORMULA => critical value = t(alpha/2, degree of freedom) 
    To read the critical value from a t-table, find the degrees of freedom (dof) in the leftmost column and the tail probability (alpha/2 for a two-sided interval) in the top row.


In [17]:
import scipy.stats as stats

#calculating critical value in t table
cv = stats.t.ppf(1 - alpha/2, dof)
print(f'The finding of critical value is {round(cv, 2)}')

The finding of critical value is 2.98


##### COMPUTING MARGIN OF ERROR
    FORMULA = critical value * (sample standard deviation / square root of sample size)

In [13]:
# sample standard deviation (ddof=1 divides by n - 1)
std = sam.std(ddof=1)

# cv is the t critical value computed in the previous cell
moe = cv * (std / np.sqrt(n))
print(f'We find the margin of error as {round(moe,2)}')

We find the margin of error as 0.15


##### FINDING THE UPPER AND LOWER BOUND
    Upper Bound = Sample Mean + Margin of Error
    Lower Bound = Sample Mean - Margin of Error

In [15]:
upper_bound = mean + moe
lower_bound = mean - moe

print(f'The lower bound of the sample data is {round(lower_bound, 2)}')
print(f'And the upper bound of the sample data is {round(upper_bound, 2)}')

The lower bound of the sample data is 1.09
And the upper bound of the sample data is 1.39
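
An independent way to check a t-based interval is to let SciPy compute it in a single call. The sketch below assumes the sample standard deviation is computed with `ddof=1`, which is the default of `scipy.stats.sem`. Note that `np.std` defaults to `ddof=0`, which would give a slightly narrower interval.

```python
import numpy as np
import scipy.stats as stats

sam = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# standard error of the mean; stats.sem uses ddof=1 by default
sem = stats.sem(sam)

# two-sided 99% interval from the t-distribution with n - 1 degrees of freedom
low, high = stats.t.interval(0.99, len(sam) - 1, loc=sam.mean(), scale=sem)

print(f'99% CI (t): ({round(low, 2)}, {round(high, 2)})')
# prints: 99% CI (t): (1.09, 1.39)
```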


### <span style='color:#C7253E'>Function for T-Test</span>

In [6]:
import numpy as np
import scipy.stats as stats
def ttest(data, confidence_interval):
    n=len(data)
    sample_mean = np.mean(data)
    # sample standard deviation (ddof=1 divides by n - 1)
    sample_std = np.std(data, ddof=1)

    t_critical = stats.t.ppf((1 + confidence_interval) / 2, df = n-1)
    standard_error = sample_std / np.sqrt(n)

    margin_of_error = t_critical * standard_error

    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return [round(lower_bound, 2), round(upper_bound, 2)]

### <span style='color:#C7253E'>Evaluating the T-Test Function</span>

In [9]:
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
ci = 0.99
result = ttest(data, ci)
print(f'The lower bound is {result[0]}, and upper bound is {result[1]}')

The lower bound is 1.09, and upper bound is 1.39


### <span style='color:#FABC3F'>TASK 2:</span>
##### <span style='color:#C7253E'>Build 99% Confidence Interval Using Known Population Standard Deviation</span>
    If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.


### <span style='color:#C0C78C'>EXPLANATION</span>
#### WHY WE ARE USING Z-TEST FOR THIS QUESTION
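##### POPULATION STANDARD DEVIATION IS KNOWN
    When the problem statement gives a known population standard deviation, we use the z-distribution (standard normal) instead of the t-distribution.

##### SIZE OF THE SAMPLE DATA
    When the population standard deviation is known, the sample size does not affect the choice of the z-distribution. With a small sample such as ours (n = 15), the interval still assumes that the population is approximately normally distributed.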

### <span style='color:#C0C78C'>ANSWER TO TASK 2</span>
##### FINDING THE CRITICAL VALUE
    FORMULA  => Critical Value = z(alpha/2)
    Use a z-table to look up the critical value.

In [21]:
z_critical = stats.norm.ppf(1 - alpha/2)
print(f'We are getting {round(z_critical, 2)} as critical value')

We are getting 2.58 as critical value


##### COMPUTING MARGIN OF ERROR
    FORMULA = critical value * (population standard deviation / square root of sample size)

In [22]:
# known population standard deviation from the question (in millions of characters)
p_std = 0.2

z_moe = z_critical * (p_std / np.sqrt(n))
print(f'The margin of error found to be {round(z_moe, 2)}')

The margin of error found to be 0.13


In [23]:
# calculate the lower and upper bound
z_low = mean - z_moe
z_upp = mean + z_moe

print(f'The lower bound of 99% CI is {round(z_low, 2)}')
print(f'The upper bound of 99% CI is {round(z_upp, 2)}')

The lower bound of 99% CI is 1.11
The upper bound of 99% CI is 1.37
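
As a cross-check for the known-sigma case, SciPy can produce the z-based interval directly. This is a minimal sketch that assumes the population standard deviation of 0.2 million characters given in Task 2. The names `sigma`, `low`, and `high` are illustrative.

```python
import numpy as np
import scipy.stats as stats

sam = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

sigma = 0.2  # known population standard deviation from the task statement
standard_error = sigma / np.sqrt(len(sam))

# two-sided 99% interval from the standard normal distribution
low, high = stats.norm.interval(0.99, loc=sam.mean(), scale=standard_error)

print(f'99% CI (z): ({round(low, 2)}, {round(high, 2)})')
# prints: 99% CI (z): (1.11, 1.37)
```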


### <span style='color:#C7253E'>Function for Z-Test</span>

In [12]:
import numpy as np
import scipy.stats as stats

def ztest(data, population_std, confidence_interval):
    n = len(data)
    sample_mean = np.mean(data)

    z_critical = stats.norm.ppf((1 + confidence_interval) / 2)

    standard_error = population_std / np.sqrt(n)

    margin_of_error = z_critical * standard_error

    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return [round(lower_bound, 2), round(upper_bound, 2)]

### <span style='color:#C7253E'>Evaluating the function for Z-Test</span>

In [13]:
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
ci = 0.99
p_std = 0.2
result = ztest(data,p_std, ci)
print(f'The lower bound is {result[0]}, and upper bound is {result[1]}')

The lower bound is 1.11, and upper bound is 1.37


### <span style='color:#C0C78C'>Summary of Confidence Intervals:</span>
##### Using the sample standard deviation (t-distribution):
    (1.09, 1.39) million characters, with the sample standard deviation computed using n - 1 in the denominator

##### Using the known population standard deviation (z-distribution):
    (1.11, 1.37) million characters