1.3 - Mean
----

*Definition*

In the field of statistics, the term "mean" refers to a measure of central tendency that represents the typical or average value of a set of data.

The mean is calculated by adding all the values in the data set and then dividing this sum by the total number of values.
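
As a compact restatement of the sentence above, the calculation can be written as a formula. The symbols are notation assumed here for illustration, not taken from the text: $x_1, \dots, x_n$ are the values in the data set, $n$ is how many there are, and $\bar{x}$ is the mean.

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$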

It is a useful measure for understanding the central or representative value of a data set and is widely used in statistical analysis and decision making.

The mean is sensitive to extreme values in the data set, so it is important to take the context into account and to consider other measures of central tendency, such as the median and the mode, to obtain a complete understanding of the distribution of the data.
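
To illustrate how a single extreme value can shift the mean, here is a minimal sketch using Python's standard `statistics` module. The salary figures are hypothetical, and the last one is an outlier chosen only for this illustration.

```python
from statistics import mean, median

# Hypothetical monthly salaries; the last value is a deliberate outlier.
salaries = [200, 225, 250, 275, 350, 4000]

salary_mean = mean(salaries)      # pulled upward by the outlier
salary_median = median(salaries)  # middle of the sorted values, barely affected

print("Mean:", salary_mean)
print("Median:", salary_median)
```

Here the mean is roughly 883, while the median stays at 262.5, much closer to what most of the values look like.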

**Case 1: The mean for simple series of numbers**

Suppose a person wants to calculate the average monthly salary received over the last semester (six months).

The calculation is quite simple.

In [1]:
jan = 200
feb = 225
mar = 250
apr = 275
may = 350
jun = 400

months = 6

mean = (jan+feb+mar+apr+may+jun)/months
print("The average salary for this semester is:",mean)

The average salary for this semester is: 283.3333333333333
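
The same calculation can be sketched with the salaries stored in a list, so the number of months does not have to be typed by hand. The names `salaries`, `mean_manual` and `mean_stdlib` are assumptions introduced for this example; `fmean` comes from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import fmean

# Same six monthly salaries as in the cell above, stored in a list
salaries = [200, 225, 250, 275, 350, 400]

# Sum divided by the number of values; len() replaces the hand-written count
mean_manual = sum(salaries) / len(salaries)

# Standard-library equivalent, which always returns a float
mean_stdlib = fmean(salaries)

print("The average salary for this semester is:", mean_manual)
```

Keeping the values in a list means that adding or removing a month does not require updating a separate counter.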


**Case 2: The mean of a frequency distribution**

In a university, it is decided to record the number of subjects passed by each first-year student.

The result is as follows.

In [2]:
import pandas as pd

In [6]:
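# Initialize the data as a dictionary of lists.
# The subject counts are stored as strings here and converted to numbers later.
data = {'N of subjects passed': ['0', '1', '2', '3', '4', '5', '6'],
        'frequecy': [30, 150, 200, 300, 250, 200, 20]}

# Create the DataFrame
df = pd.DataFrame(data)

# Display all seven rows
df.head(7)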

|   | N of subjects passed | frequecy |
|---|---|---|
| 0 | 0 | 30 |
| 1 | 1 | 150 |
| 2 | 2 | 200 |
| 3 | 3 | 300 |
| 4 | 4 | 250 |
| 5 | 5 | 200 |
| 6 | 6 | 20 |


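In this case, we can see that the value "0" is repeated 30 times. This indicates that there are 30 students who did not pass any subject. Likewise, 150 students passed exactly one subject, and so on for the remaining rows of the table.

To calculate the mean here, each value must be weighted by its frequency: multiply each number of subjects by the number of students who passed that many, add the products, and divide by the total number of students.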

In [9]:
# Convert column to numeric
df['N of subjects passed'] = pd.to_numeric(df['N of subjects passed'])

# Calculate the mean
mean = (df['N of subjects passed'] * df['frequecy']).sum() / df['frequecy'].sum()
print("The mean is:", mean)

The mean is: 3.1043478260869564


We can also verify the result manually:

In [10]:
mean = (0*30+1*150+2*200+3*300+4*250+5*200+6*20)/1150
print("The mean is:",mean)

The mean is: 3.1043478260869564
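
The frequency-weighted calculation above can also be wrapped in a small reusable function. This is a sketch in plain Python, without pandas; the function name `weighted_mean` and the variable names are hypothetical and chosen for this example only.

```python
def weighted_mean(values, frequencies):
    """Return sum(value * frequency) / sum(frequency)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(v * f for v, f in zip(values, frequencies)) / total


# Same data as the table above
subjects_passed = [0, 1, 2, 3, 4, 5, 6]
students = [30, 150, 200, 300, 250, 200, 20]

print("The mean is:", weighted_mean(subjects_passed, students))
```

The two checks guard against mismatched columns and an empty table, cases the one-line expression would either silently mishandle or fail on with a less helpful error.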


**Case 3: Frequency distribution with data grouped into ranges**

In the previous case, each category was linked to a value. But sometimes we have to work with categories that represent a set of values.

When we work with intervals, we have to start by identifying each interval's lower limit, upper limit and class mark.

The class mark is the midpoint of an interval, that is, the average of its lower and upper limits.
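
Written as a formula, with notation assumed here for illustration ($l_i$ and $u_i$ for the lower and upper limits of interval $i$, and $c_i$ for its class mark):

$$c_i = \frac{l_i + u_i}{2}$$

For example, an interval running from 30 to 40 has the class mark $(30 + 40)/2 = 35$.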

In the following example, we can observe the way in which the salaries of a company's workers are distributed within different intervals. There are different numbers of workers within each salary range.

In [3]:
# Initialize the data as a dictionary of lists.
data = {'Lower Limit': [0, 10, 20, 30, 40, 50],
        'Upper Limit': [10, 20, 30, 40, 50, 60],
        'Number of Employees': [5, 15, 25, 30, 25, 10]}

# Create the DataFrame
df = pd.DataFrame(data)

# Display the table (it has six rows)
df.head(7)

|   | Lower Limit | Upper Limit | Number of Employees |
|---|---|---|---|
| 0 | 0 | 10 | 5 |
| 1 | 10 | 20 | 15 |
| 2 | 20 | 30 | 25 |
| 3 | 30 | 40 | 30 |
| 4 | 40 | 50 | 25 |
| 5 | 50 | 60 | 10 |


In [11]:
# Calculate the class mark (midpoint) of each interval
df['Mark of class'] = (df['Upper Limit']+df['Lower Limit'])/2
df.head(6)

|   | Lower Limit | Upper Limit | Number of Employees | Mark of class |
|---|---|---|---|---|
| 0 | 0 | 10 | 5 | 5.0 |
| 1 | 10 | 20 | 15 | 15.0 |
| 2 | 20 | 30 | 25 | 25.0 |
| 3 | 30 | 40 | 30 | 35.0 |
| 4 | 40 | 50 | 25 | 45.0 |
| 5 | 50 | 60 | 10 | 55.0 |


In [8]:
# Calculate the mean
mean = (df['Number of Employees'] * df['Mark of class']).sum() / df['Number of Employees'].sum()
print("The mean is:", mean)

The mean is: 32.72727272727273

As in the previous case, we can verify the result manually:


In [12]:
mean = (5*5+15*15+25*25+30*35+25*45+10*55)/110
print("The mean is:", mean)

The mean is: 32.72727272727273
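
The grouped calculation can be sketched as a single function that derives the class marks and then weights them by the number of employees. The function name `grouped_mean` and the variable names are hypothetical. Note that a mean obtained from grouped data is an approximation, because it treats every value in an interval as if it sat at the class mark.

```python
def grouped_mean(intervals, counts):
    """Approximate the mean of grouped data using each interval's class mark."""
    # Class mark: midpoint between the lower and upper limit of each interval
    marks = [(lower + upper) / 2 for lower, upper in intervals]
    # Weight each class mark by the number of observations in its interval
    return sum(m * c for m, c in zip(marks, counts)) / sum(counts)


# Same intervals and employee counts as the table above
salary_ranges = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
employees = [5, 15, 25, 30, 25, 10]

print("The mean is:", grouped_mean(salary_ranges, employees))
```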
