# Mean
### Theory of Mean

The mean, often called the average (more precisely, the arithmetic mean), is a measure of central tendency in statistics: the sum of all values in a dataset divided by the number of values. It summarizes the dataset with a single representative value and is widely used in data analysis.

### Formula
For a dataset with values $x_1, x_2, \ldots, x_n$, the mean $\bar{x}$ is calculated as:

$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

### Properties
- The mean is sensitive to extreme values (outliers): a single very large or very small value can shift it noticeably.
- It is most representative when the data are roughly symmetric and free of outliers.
- The mean is used in many statistical analyses and is the basis for other measures such as variance and standard deviation.
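
To make the first property concrete, here is a minimal sketch using only the standard library. The small dataset and the extreme value of 100 are chosen purely for illustration; the median is computed alongside only as a point of comparison.

```python
import statistics

# Small illustrative dataset; 100 is a hypothetical outlier added for the demo.
values = [2, 4, 6, 8, 10]
with_outlier = values + [100]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# The mean moves a long way; the median barely changes.
print(f"Mean:   {mean_before} -> {mean_after:.2f}")
print(f"Median: {median_before} -> {median_after}")
```

One added value moves the mean from 6 to about 21.67, while the median only moves from 6 to 7.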

### Example
If the dataset is [2, 4, 6, 8, 10], the mean is:

$$
\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = 6
$$

# Mean using manual calculation

In [1]:
# Arithmetic mean: total of the values divided by how many there are
Data = [12, 15, 18, 20, 22, 25, 28, 30, 42, 51]
Mean = sum(Data) / len(Data)
print(f"Mean:{Mean}")

Mean:26.3


# Mean using numpy library

In [2]:
import numpy as np

Data = [12, 15, 18, 20, 22, 25, 28, 30, 42, 51]
Mean = np.mean(Data)  # np.mean accepts plain lists as well as arrays
print(f"Mean:{Mean}")

Mean:26.3


# Mean using statistics library

In [3]:
import statistics as stats  # part of the Python standard library

Data = [12, 15, 18, 20, 22, 25, 28, 30, 42, 51]
Mean = stats.mean(Data)
print(f"Mean:{Mean}")

Mean:26.3
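
The approaches above agree on ordinary data, but they can behave differently on edge cases such as an empty list. The sketch below compares the manual calculation with `statistics.mean`; `manual_mean` is a hypothetical helper name introduced here only to wrap the `sum`/`len` expression.

```python
import statistics

def manual_mean(values):
    # Hypothetical helper wrapping the sum/len calculation shown earlier.
    return sum(values) / len(values)

empty = []

# The manual version divides by len(empty) == 0.
try:
    manual_mean(empty)
except ZeroDivisionError as exc:
    manual_error = type(exc).__name__

# statistics.mean raises its own, more descriptive exception type.
try:
    statistics.mean(empty)
except statistics.StatisticsError as exc:
    stats_error = type(exc).__name__

print(manual_error)
print(stats_error)
```

In practice, checking for empty input before computing the mean avoids both errors. NumPy typically handles this case differently again, returning `nan` with a warning rather than raising.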


# Applying Mean on real world data

In [5]:
import seaborn as sns

# load_dataset fetches the example data online on first use, then caches it
iris = sns.load_dataset('iris')
iris.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [7]:
import seaborn as sns
import numpy as np

iris = sns.load_dataset('iris')
Mean = np.mean(iris["sepal_length"])
print(f"Mean of Sepal Length is {round(Mean, 3)}")

Mean of Sepal Length is 5.843
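
The same idea extends to several columns at once. As a rough sketch, the snippet below rebuilds only the five rows shown in the `head()` output (not the full dataset, so the numbers differ from the result above) and uses pandas' `DataFrame.mean` with `numeric_only=True` to skip the text column. The name `sample` is introduced here for illustration.

```python
import pandas as pd

# Only the first five iris rows, copied from the head() output;
# this is NOT the full dataset, so the means differ from the full-data result.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# numeric_only=True leaves out the non-numeric 'species' column.
column_means = sample.mean(numeric_only=True)
print(column_means.round(3))
```

Assuming the full `iris` DataFrame is loaded as in the cells above, the same call on `iris` should return the mean of every numeric column in one step.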


# Exercises: Mean (6 questions)

1. Compute the mean of the list `Data = [12, 15, 18, 20, 22, 25, 28, 30, 42, 51]` manually (use sum and length) and show the calculation.

2. Compute the mean of `Data` using numpy and using the statistics module. Verify both results match the manual calculation.

3. Using the `iris` DataFrame, compute the mean of the `sepal_length` column. Round the result to three decimal places.

4. Compute the mean `sepal_length` for each species in the `iris` dataset (setosa, versicolor, virginica). Which species has the largest mean sepal length?

5. Investigate outlier impact: remove the largest value in `Data` (51) and recompute the mean. How much did the mean change? Explain why the mean is sensitive to outliers.

6. For a dataset with clear outliers, which measure of central tendency would you prefer (mean, median, or mode)? Provide a short justification and give an example using `Data` or `iris`.
