In [39]:
import numpy as np
import pandas as pd

## 1) 

Table 1 shows a frequency distribution of grades on a final examination in
college algebra. Find the quartiles of the distribution.

| Grade    | Number of Students |
|----------|--------------------|
| 90-99   | 9                  |
| 80-89    | 32                 |
| 70-79    | 43                 |
| 60-69    | 21                 |
| 50-59    | 11                 |
| 40-49    | 3                  |
| 30-39    | 1                  |
| **Total**| **120**            |



In [18]:
df_1 = pd.DataFrame({
    'Grade': ['90-99', '80-89', '70-79', '60-69', '50-59', '40-49', '30-39'],
    'NumberOfStudents': [9, 32, 43, 21, 11, 3, 1]
})

df_sorted = df_1.iloc[::-1].reset_index(drop=True)

df_sorted['CumulativeFrequency'] = df_sorted['NumberOfStudents'].cumsum()


total_students = df_sorted['NumberOfStudents'].sum()
Q1_position = total_students * 0.25
Q2_position = total_students * 0.50
Q3_position = total_students * 0.75

Q1 = df_sorted[df_sorted['CumulativeFrequency'] >= Q1_position].iloc[0]['Grade']
Q2 = df_sorted[df_sorted['CumulativeFrequency'] >= Q2_position].iloc[0]['Grade']
Q3 = df_sorted[df_sorted['CumulativeFrequency'] >= Q3_position].iloc[0]['Grade']



quartiles = {
    '1st Quartile (Q1)': Q1,
    '2nd Quartile (Q2)': Q2,
    '3rd Quartile (Q3)': Q3
}

quartiles

{'1st Quartile (Q1)': '60-69',
 '2nd Quartile (Q2)': '70-79',
 '3rd Quartile (Q3)': '80-89'}
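
The result above identifies the class interval that contains each quartile. To estimate a single value inside each class, one common textbook approach is linear interpolation within the class. The sketch below assumes class boundaries at 29.5, 39.5, and so on (half a unit below each lower class limit) and a class width of 10; the helper name `grouped_quartile` is introduced here only for illustration.

```python
import pandas as pd

# Classes in ascending order with their frequencies (from Table 1)
table = pd.DataFrame({
    "lower": [30, 40, 50, 60, 70, 80, 90],
    "freq": [1, 3, 11, 21, 43, 32, 9],
})
width = 10                      # assumed class width (30-39, 40-49, ...)
table["cum"] = table["freq"].cumsum()
n = int(table["freq"].sum())    # 120 students


def grouped_quartile(k):
    """Interpolated k-th quartile (k = 1, 2, 3) of the grouped data."""
    position = k * n / 4
    # First class whose cumulative frequency reaches the position
    idx = int((table["cum"] >= position).idxmax())
    boundary = table.loc[idx, "lower"] - 0.5   # assumed lower class boundary
    cum_before = table.loc[idx - 1, "cum"] if idx > 0 else 0
    return float(boundary + (position - cum_before) / table.loc[idx, "freq"] * width)


for k in (1, 2, 3):
    print(f"Q{k}: {grouped_quartile(k):.2f}")
```

Under these assumptions the estimates come out at roughly 66.6, 75.1, and 82.9, each inside the class found by the cumulative-frequency lookup.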

## 2)
On a final examination in statistics, the mean grade of a group of 150
students was 78 and the standard deviation was 8.0. In algebra, however, the mean final grade of the group was 73 and the standard deviation was 7.6. In which subject was there the greater (a) absolute dispersion and (b) relative dispersion?

---

### Given

$$
N = 150 \\
\mu_{\text{statistics}} = 78 \\
\sigma_{\text{statistics}} = 8.0 \\
\mu_{\text{algebra}} = 73 \\
\sigma_{\text{algebra}} = 7.6
$$


### For absolute dispersion

Since the standard deviation is an absolute measure of dispersion, we can compare the two standard deviations directly to see which subject has the greater absolute dispersion.

$$
\sigma_{\text{statistics}} = 8.0 \\
\sigma_{\text{algebra}} = 7.6
$$

Since $8.0 > 7.6$, **statistics has the greater absolute dispersion**.

### For relative dispersion

Relative dispersion is measured here by the **coefficient of variation (V)**, the ratio of the standard deviation to the mean:

$$
V_{\text{sub}} = \frac{\sigma_{\text{sub}}}{\mu_{\text{sub}}}
$$

This formula applies directly because we are given the standard deviation ($\sigma$) and the mean grade ($\mu$) for each subject.

Applying this formula:
$$
V_{\text{statistics}} = \frac{8.0}{78} \\
V_{\text{algebra}} = \frac{7.6}{73}
$$

In [21]:
print(f'Stats: {8.0/78} \nAlgebra: {7.6/73}')

Stats: 0.10256410256410256 
Algebra: 0.10410958904109588


$$
V_{\text{statistics}} \approx 0.1026 \\
V_{\text{algebra}} \approx 0.1041
$$

Or, in percentage form:

$$
V_{\text{statistics}} \approx 10.26\% \\
V_{\text{algebra}} \approx 10.41\%
$$

Since $10.41\% > 10.26\%$, **algebra has the greater relative dispersion**.

## 3)
Prove that the mean and standard deviation of a set of standard scores are equal to 0 and 1, respectively. Use the following problem to illustrate this: Convert the set 6, 2, 8, 7, 5 into standard scores.

---

### Given
$$
N = 5 \\
S = \{6,2,8,7,5\}
$$

### Calculating mean and standard deviation
The formulas to use are:
$$
\mu = \frac{\sum X}{N} \\
\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}
$$

Let us now compute

$$
\mu = \frac{6+2+8+7+5}{5}
$$

In [28]:
given = [6,2,8,7,5]
mu = sum(given)/len(given)
mu

5.6

$$
\mu = 5.6
$$

$$
\sigma = \sqrt{\frac{\sum (X_i - 5.6)^2}{5}}
$$

In [31]:
x_i_minus_mu_squared = [(x-mu)**2 for x in given]
std = (sum(x_i_minus_mu_squared) / 5)**0.5
std

2.0591260281974

$$
\sigma \approx 2.06
$$
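
As a cross-check, NumPy's `np.std` uses this same population formula by default (`ddof=0`), while `ddof=1` gives the sample standard deviation, which divides by $N - 1$ instead. A minimal sketch; the names `pop_std` and `sample_std` are chosen here for illustration:

```python
import numpy as np

given = [6, 2, 8, 7, 5]

# Population standard deviation: divides by N (NumPy's default, ddof=0)
pop_std = np.std(given)

# Sample standard deviation: divides by N - 1
sample_std = np.std(given, ddof=1)

print(f"population: {pop_std:.4f}, sample: {sample_std:.4f}")
```

The population value matches the manual computation; the sample value is larger because of the smaller divisor.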

### Calculating the standard score (Z)

$$
Z = \frac{X - \mu}{\sigma} \\
Z_1 = \frac{6 - 5.6}{2.06} \\
Z_2 = \frac{2 - 5.6}{2.06} \\
Z_3 = \frac{8 - 5.6}{2.06} \\
Z_4 = \frac{7 - 5.6}{2.06} \\
Z_5 = \frac{5 - 5.6}{2.06}
$$



In [40]:
res = [(x - mu) / std for x in given]
print(res)

print(f'Mean: {sum(res)/len(res)} \n Std: {np.std(res)}')

[0.19425717247145302, -1.7483145522430754, 1.1655430348287172, 0.6799001036500851, -0.2913857587071791]
Mean: 1.7763568394002506e-16 
 Std: 0.9999999999999999


$$
Z_1 \approx 0.19 \\
Z_2 \approx -1.75 \\
Z_3 \approx 1.17 \\
Z_4 \approx 0.68 \\
Z_5 \approx -0.29
$$

### Let us now verify
As computed above, **the mean and standard deviation of the set of standard scores are equal to 0 and 1, respectively** (up to floating-point rounding error). Averaging the rounded scores gives the same mean:

In [37]:
(0.19 - 1.75 + 1.17 + 0.68 -0.29)/5

-1.1102230246251566e-17
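
The numerical check illustrates the result for one data set. A short general argument, using the population formulas for $\mu$ and $\sigma$ given earlier and assuming $\sigma > 0$, might run as follows. For the mean of the standard scores:

$$
\overline{Z} = \frac{1}{N}\sum_{i=1}^{N} \frac{X_i - \mu}{\sigma} = \frac{1}{\sigma}\left(\frac{\sum_{i=1}^{N} X_i}{N} - \mu\right) = \frac{\mu - \mu}{\sigma} = 0
$$

For the variance, using $\overline{Z} = 0$:

$$
\sigma_Z^2 = \frac{1}{N}\sum_{i=1}^{N} \left(Z_i - \overline{Z}\right)^2 = \frac{1}{N}\sum_{i=1}^{N} \frac{(X_i - \mu)^2}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1
$$

so $\sigma_Z = 1$.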

## 4)
Three masses are measured as 20.48, 35.97, and 62.34 g, with standard deviations of 0.21, 0.46, and 0.54 g, respectively. Find the (a) mean and (b) standard deviation of the sum of the masses.

---
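
A sketch of the formulas involved, assuming the three measurements are independent (the problem statement does not say so explicitly): the mean of a sum is the sum of the means, and the variances add.

$$
\mu_{\text{sum}} = \mu_1 + \mu_2 + \mu_3 = 20.48 + 35.97 + 62.34 = 118.79 \text{ g} \\
\sigma_{\text{sum}} = \sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2} = \sqrt{0.21^2 + 0.46^2 + 0.54^2} = \sqrt{0.5473} \approx 0.74 \text{ g}
$$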

In [None]:
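given_mean = [20.48, 35.97, 62.34]
given_div = [0.21, 0.46, 0.54]

# (a) The mean of the sum is the sum of the means
mean_tot = sum(given_mean)
# (b) Assuming independent measurements, the variances add
std_tot = sum(s**2 for s in given_div) ** 0.5
mean_tot, std_tot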