# Confidence Intervals

A confidence interval is a range of values that is likely to contain the true value of a population parameter (such as a population mean or proportion) with a certain degree of confidence, **based on a sample from that population.**

For example, suppose we want to estimate the average height of all people in a particular city. We take a random sample of 100 people from that city and compute the sample mean height. The true population mean height may be different from the sample mean height due to chance variation in the sample. A confidence interval provides a range of values that we are reasonably certain contains the true population mean height, based on the observed sample mean height and the variability in the sample.

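**The level of confidence is typically set at a certain percentage, such as 95% or 99%.** This is the long-run proportion of intervals that would contain the true population parameter if the sampling and the interval calculation were repeated many times. It is not the probability that the parameter lies inside any one computed interval. The width of the confidence interval depends on the confidence level, the sample size, and the variability of the data.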
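To make the dependence on sample size and variability concrete, here is the usual t-based interval for a population mean. This is a sketch under the assumption that a t interval is the one being used. The symbols are introduced here for illustration: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $1-\alpha$ the confidence level.

$$
\bar{x} \;\pm\; t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}}
$$

The half-width shrinks as $n$ grows and widens as $s$ or the confidence level increases.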

**In summary, a confidence interval is a statistical tool that helps to quantify the uncertainty in an estimate of a population parameter by providing a range of plausible values for that parameter, with a specified level of confidence.**

In [1]:
# Install statsmodels before importing it
!pip install statsmodels
import pandas as pd
import seaborn as sns
import statsmodels.stats.api as sms



In [2]:
df = sns.load_dataset("tips")

About the dataset: the tips dataset contains records of restaurant bills and tips.

total_bill: total price of the meal (including tip and tax)

tip: tip amount

sex: gender of the payer

smoker: Are there any smokers in the group?

day: day of the week

time: time of the meal (e.g. Dinner)

size: How many people are in the group?

In [3]:
df.head()

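| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |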


In [4]:
df.describe().T

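| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_bill | 244.0 | 19.785943 | 8.902412 | 3.07 | 13.3475 | 17.795 | 24.1275 | 50.81 |
| tip | 244.0 | 2.998279 | 1.383638 | 1.0 | 2.0 | 2.9 | 3.5625 | 10.0 |
| size | 244.0 | 2.569672 | 0.9511 | 1.0 | 2.0 | 2.0 | 3.0 | 6.0 |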


Looking at the total_bill variable, we see that its mean is about 19.79.

**However, the question I want to answer with the confidence interval here is: "How low, and how high, could the true average total bill plausibly be?" In other words, what does the restaurant earn per bill on average in a pessimistic scenario, and in an optimistic one? Decisions such as payment schedules or salaries can then be planned around that range.**

Note that the interval describes the average bill, not the lowest or highest individual bill. We compute a confidence interval for the mean to answer this.

In [5]:
sms.DescrStatsW(df["total_bill"]).tconfint_mean()

(18.66333170435847, 20.908553541543164)
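As a cross-check, the same interval can be rebuilt by hand from the summary statistics printed by `df.describe()`. The sketch below assumes a two-sided t interval with `n - 1` degrees of freedom. The helper name `t_confint_mean` is made up for this illustration and is not part of statsmodels.

```python
import math
from scipy import stats

def t_confint_mean(mean, std, n, alpha=0.05):
    """Two-sided t confidence interval for a mean, from summary statistics.

    Hypothetical helper for illustration; `std` is the sample standard
    deviation (ddof=1), as reported by pandas' describe().
    """
    standard_error = std / math.sqrt(n)
    # Critical value of the t distribution with n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half_width = t_crit * standard_error
    return mean - half_width, mean + half_width

# Values copied from the total_bill row of the describe() output above
lower, upper = t_confint_mean(mean=19.785943, std=8.902412, n=244)
print(lower, upper)  # roughly (18.66, 20.91)

# A higher confidence level gives a wider interval
lower_99, upper_99 = t_confint_mean(19.785943, 8.902412, 244, alpha=0.01)
```

The 95% result agrees with the statsmodels output to the precision of the rounded inputs, and the 99% interval is wider than the 95% one.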

The sample mean of total_bill was about 19.79.
So what does the interval (18.66, 20.91) mean?

**If I drew 100 different samples of customers and computed a 95% confidence interval from each one, about 95 of those 100 intervals would contain the true average total bill.**

About 5 of the 100 intervals would miss it, and we cannot tell from a single sample whether ours is one of them.
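The repeated-sampling reading of a 95% confidence level can be checked with a small simulation. This is a sketch on made-up data: the population mean and standard deviation are hypothetical values chosen to resemble the bill data, and the population is assumed to be normal, which the real bills are not guaranteed to be.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population, loosely resembling the total_bill values
true_mean, true_sd = 20.0, 9.0
n, n_repeats = 244, 1000

def interval_covers(sample, mu, alpha=0.05):
    """Return True if the t interval built from `sample` contains `mu`."""
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)
    return (mean - t_crit * se) <= mu <= (mean + t_crit * se)

# Draw many samples and count how often the interval contains the true mean
hits = sum(
    interval_covers(rng.normal(true_mean, true_sd, n), true_mean)
    for _ in range(n_repeats)
)
coverage = hits / n_repeats
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. It is the interval that varies from sample to sample, while the true mean stays fixed.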



**With 95% confidence, the average bill paid by customers visiting the restaurant is between 18.66 and 20.91.**