# Practical Example: Confidence Intervals

This notebook contains the solutions to the practical examples for Confidence Intervals.

```python
import pandas as pd
import numpy as np
from scipy import stats

# Load data for Tasks 1 and 2 (apartment prices)
file_path_re = '2.13.Practical-example.Descriptive-statistics-exercise-solution.xlsx'
df_re = pd.read_excel(file_path_re, sheet_name='365RE', header=4)
prices = df_re['Price'].dropna()  # drop empty cells so len(prices) counts only observed prices

# Load data for Task 3 (shoe shops)
file_path_ci = '3.17.Practical-example.Confidence-intervals-exercise-solution.xlsx'
df_ci = pd.read_excel(file_path_ci, sheet_name='Al Bundy', header=3)
```


## Task 1
**Calculate the 95% Confidence Interval for the mean price of apartments (Population variance known).**

**Assumption:** Population standard deviation is known to be $15,000.
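Written out, Task 1 uses the standard z-interval for a mean. Here x̄ is the sample mean, σ the known population standard deviation (15,000), n the sample size, and z_{α/2} the upper α/2 quantile of the standard normal distribution. That quantile is what `stats.norm.ppf((1 + confidence_level) / 2)` returns:

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad \alpha = 1 - 0.95 = 0.05, \quad z_{0.025} \approx 1.96
```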

```python
# Population standard deviation is known, so the interval uses the standard normal (z)
mean_price = prices.mean()
n = len(prices)
pop_std = 15000
confidence_level = 0.95
z_score = stats.norm.ppf((1 + confidence_level) / 2)  # about 1.96 for a 95% interval

margin_of_error = z_score * (pop_std / np.sqrt(n))
ci_lower = mean_price - margin_of_error
ci_upper = mean_price + margin_of_error

print(f"Mean Price: ${mean_price:,.2f}")
print(f"95% Confidence Interval (Known Variance): (${ci_lower:,.2f}, ${ci_upper:,.2f})")
```
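As a sanity check, the same z-interval can be wrapped in a small reusable function and compared with SciPy's `stats.norm.interval`, which builds the interval directly from the sampling distribution of the mean. This is a sketch on synthetic data; the function name `z_interval` and the generated prices are hypothetical, not part of the exercise files:

```python
import numpy as np
from scipy import stats

def z_interval(sample, sigma, confidence=0.95):
    """Two-sided CI for the mean when the population std dev `sigma` is known."""
    x = np.asarray(sample, dtype=float)
    x = x[~np.isnan(x)]                        # ignore missing values
    n = x.size
    z = stats.norm.ppf((1 + confidence) / 2)   # critical value, about 1.96 for 95%
    margin = z * sigma / np.sqrt(n)
    return x.mean() - margin, x.mean() + margin

# Synthetic prices (illustrative only, not the 365RE data)
rng = np.random.default_rng(0)
fake_prices = rng.normal(loc=280_000, scale=15_000, size=50)

lo, hi = z_interval(fake_prices, sigma=15_000)

# SciPy equivalent: normal distribution centred on the sample mean,
# scaled by the standard error sigma / sqrt(n)
ref_lo, ref_hi = stats.norm.interval(0.95, loc=fake_prices.mean(),
                                     scale=15_000 / np.sqrt(len(fake_prices)))

print(f"manual: ({lo:,.2f}, {hi:,.2f})")
print(f"scipy:  ({ref_lo:,.2f}, {ref_hi:,.2f})")
```

Because σ is fixed, the interval width depends only on n and the confidence level, not on the data.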

## Task 2
**Calculate the 95% Confidence Interval for the mean price of apartments (Population variance unknown).**

```python
# Population variance unknown: estimate it with the sample std dev (ddof=1)
# and use the t-distribution with n - 1 degrees of freedom
sample_std = prices.std(ddof=1)
t_score = stats.t.ppf((1 + confidence_level) / 2, df=n-1)

margin_of_error_t = t_score * (sample_std / np.sqrt(n))
ci_lower_t = mean_price - margin_of_error_t
ci_upper_t = mean_price + margin_of_error_t

print(f"Sample Std Dev: ${sample_std:,.2f}")
print(f"95% Confidence Interval (Unknown Variance): (${ci_lower_t:,.2f}, ${ci_upper_t:,.2f})")
```
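The unknown-variance case can be cross-checked the same way with `stats.t.interval`, using `stats.sem` (which defaults to `ddof=1`) as the scale. The sketch also shows why the t-interval is wider for small samples: its critical value exceeds 1.96. The function name `t_interval` and the synthetic sample are hypothetical:

```python
import numpy as np
from scipy import stats

def t_interval(sample, confidence=0.95):
    """Two-sided CI for the mean when the population variance is unknown."""
    x = np.asarray(sample, dtype=float)
    x = x[~np.isnan(x)]                          # ignore missing values
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)              # estimated standard error of the mean
    t = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return x.mean() - t * se, x.mean() + t * se

# Synthetic sample (illustrative only, not the 365RE data)
rng = np.random.default_rng(1)
sample = rng.normal(loc=280_000, scale=20_000, size=12)

lo, hi = t_interval(sample)

# SciPy equivalent: t distribution with n - 1 df, centred on the mean, scaled by the SEM
ref_lo, ref_hi = stats.t.interval(0.95, len(sample) - 1,
                                  loc=sample.mean(), scale=stats.sem(sample))

# With n = 12 the t critical value is noticeably larger than the z critical value
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)
z_crit = stats.norm.ppf(0.975)
print(f"t interval: ({lo:,.2f}, {hi:,.2f})")
print(f"t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")
```

As n grows, the t critical value approaches the z value and the two intervals converge.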

## Task 3
**Compare the sales of two shoe shops (Germany, GER1 vs GER2).**

Calculate the 95% Confidence Interval for the difference in mean sale price (GER1 minus GER2).
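The solution assumes both shops share a common population variance and pools the two sample variances. In symbols, with sample means x̄₁, x̄₂, sample variances s₁², s₂² and sample sizes n₁, n₂:

```latex
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\,
\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
```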

```python
# Filter data for Germany
germany_sales = df_ci[df_ci['Country'] == 'Germany']

# Split by shop: the 'Shop' column holds the shop codes (GER1, GER2).
# "Sales" is measured by the 'SalePrice' column.
ger1_sales = germany_sales.loc[germany_sales['Shop'] == 'GER1', 'SalePrice'].dropna()
ger2_sales = germany_sales.loc[germany_sales['Shop'] == 'GER2', 'SalePrice'].dropna()

n1 = len(ger1_sales)
n2 = len(ger2_sales)
mean1 = ger1_sales.mean()
mean2 = ger2_sales.mean()
var1 = ger1_sales.var(ddof=1)
var2 = ger2_sales.var(ddof=1)

# Pooled variance (assumes both shops have equal population variances)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_error = np.sqrt(pooled_var * (1/n1 + 1/n2))

# t critical value with n1 + n2 - 2 degrees of freedom
t_score_diff = stats.t.ppf((1 + confidence_level) / 2, df=n1 + n2 - 2)
margin_of_error_diff = t_score_diff * std_error

diff_mean = mean1 - mean2
ci_lower_diff = diff_mean - margin_of_error_diff
ci_upper_diff = diff_mean + margin_of_error_diff

print(f"Mean Sales GER1: ${mean1:,.2f}")
print(f"Mean Sales GER2: ${mean2:,.2f}")
print(f"Difference in Means: ${diff_mean:,.2f}")
print(f"95% CI for Difference: (${ci_lower_diff:,.2f}, ${ci_upper_diff:,.2f})")
```
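The pooled interval relies on the equal-variance assumption. A common alternative, not used in this solution, is Welch's interval, which keeps the two variances separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. A minimal sketch on synthetic data; the function name `welch_interval` and the generated shop data are hypothetical:

```python
import numpy as np
from scipy import stats

def welch_interval(a, b, confidence=0.95):
    """CI for mean(a) - mean(b) without assuming equal variances (Welch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]    # ignore missing values
    n1, n2 = a.size, b.size
    v1 = a.var(ddof=1) / n1                    # squared standard error, group a
    v2 = b.var(ddof=1) / n2                    # squared standard error, group b
    se = np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom (usually non-integer)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = stats.t.ppf((1 + confidence) / 2, df=df)
    diff = a.mean() - b.mean()
    return diff - t * se, diff + t * se, df

# Synthetic shops with clearly different spreads (not the Al Bundy data)
rng = np.random.default_rng(2)
shop_a = rng.normal(loc=150, scale=10, size=30)
shop_b = rng.normal(loc=140, scale=40, size=20)

lo, hi, df = welch_interval(shop_a, shop_b)
print(f"Welch 95% CI: ({lo:.2f}, {hi:.2f}), df = {df:.1f}")
```

With the notebook's variables it could be called as `welch_interval(ger1_sales, ger2_sales)`. If its result differs noticeably from the pooled interval, the equal-variance assumption deserves a closer look.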

**Interpretation:**
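If the 95% confidence interval contains 0, the data are consistent with equal mean sale prices, so the difference between the two shops is not statistically significant at the 5% level. This does not prove that the means are equal. If the whole interval lies above 0, GER1's mean sale price is significantly higher than GER2's; if it lies entirely below 0, it is significantly lower.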
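The rule above is the duality between confidence intervals and hypothesis tests. A two-sided 95% pooled t-interval contains 0 exactly when the matching two-sided pooled t-test (`scipy.stats.ttest_ind` with `equal_var=True`) gives p ≥ 0.05. A sketch on synthetic data, not the Al Bundy sales:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(loc=100, scale=15, size=40)
b = rng.normal(loc=108, scale=15, size=35)

# Pooled-variance 95% CI for mean(a) - mean(b), same formula as in Task 3
n1, n2 = a.size, b.size
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
diff = a.mean() - b.mean()
lo, hi = diff - t_crit * se, diff + t_crit * se

# Matching two-sided pooled t-test
res = stats.ttest_ind(a, b, equal_var=True)
contains_zero = lo <= 0 <= hi

print(f"CI: ({lo:.2f}, {hi:.2f}), contains 0: {contains_zero}")
print(f"p-value: {res.pvalue:.4f}, significant at 5%: {res.pvalue < 0.05}")
```

Both statements reduce to comparing |diff / se| with the same t critical value, which is why they always agree.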