In [1]:
import numpy as np
import pandas as pd
import itertools

## 1) 

Table 1 shows a frequency distribution of grades on a final examination in
college algebra. Find the quartiles of the distribution.

| Grade     | Number of Students |
|-----------|--------------------|
| 90-99     | 9                  |
| 80-89     | 32                 |
| 70-79     | 43                 |
| 60-69     | 21                 |
| 50-59     | 11                 |
| 40-49     | 3                  |
| 30-39     | 1                  |
| **Total** | **120**            |



In [3]:
df_1 = pd.DataFrame({
    'Grade': ['90-99', '80-89', '70-79', '60-69', '50-59', '40-49', '30-39'],
    'NumberOfStudents': [9, 32, 43, 21, 11, 3, 1]
})

df_sorted = df_1.iloc[::-1].reset_index(drop=True)

df_sorted['CumulativeFrequency'] = df_sorted['NumberOfStudents'].cumsum()


total_students = df_sorted['NumberOfStudents'].sum()
Q1_position = total_students * 0.25
Q2_position = total_students * 0.50
Q3_position = total_students * 0.75

Q1 = df_sorted[df_sorted['CumulativeFrequency'] >= Q1_position].iloc[0]['Grade']
Q2 = df_sorted[df_sorted['CumulativeFrequency'] >= Q2_position].iloc[0]['Grade']
Q3 = df_sorted[df_sorted['CumulativeFrequency'] >= Q3_position].iloc[0]['Grade']



quartiles = {
    '1st Quartile (Q1)': Q1,
    '2nd Quartile (Q2)': Q2,
    '3rd Quartile (Q3)': Q3
}

quartiles

{'1st Quartile (Q1)': '60-69',
 '2nd Quartile (Q2)': '70-79',
 '3rd Quartile (Q3)': '80-89'}
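
The result above identifies the class that contains each quartile. To get a single value per quartile, a common textbook approach interpolates linearly inside that class: $Q = L + \frac{pN - F}{f} \cdot c$, where $L$ is the lower class boundary, $F$ the cumulative frequency below the class, $f$ the class frequency, and $c$ the class width. The sketch below assumes class boundaries sit 0.5 below the lower limit (for example 59.5 for the 60-69 class); `grouped_quantile` and `classes` are hypothetical names, not part of the original notebook.

```python
# Hypothetical helper: linear interpolation of a quantile in grouped data.
# Assumption: integer class limits, so the lower boundary is (lower limit - 0.5).
classes = [  # (lower limit, upper limit, frequency), in ascending order
    (30, 39, 1), (40, 49, 3), (50, 59, 11), (60, 69, 21),
    (70, 79, 43), (80, 89, 32), (90, 99, 9),
]

def grouped_quantile(classes, fraction):
    total = sum(freq for _, _, freq in classes)
    position = fraction * total          # e.g. 0.25 * 120 = 30 for Q1
    cumulative = 0                       # frequency below the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            lower_boundary = lower - 0.5
            width = upper - lower + 1
            return lower_boundary + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("fraction must be between 0 and 1")

q1, q2, q3 = (grouped_quantile(classes, f) for f in (0.25, 0.50, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 66.64 75.08 82.94
```

Under these assumptions the quartiles are roughly 67, 75, and 83, each falling inside the class found above.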

## 2)
On a final examination in statistics, the mean grade of a group of 150
students was 78 and the standard deviation was 8.0. In algebra, however, the mean final grade of the group was 73 and the standard deviation was 7.6. In which subject was there the greater (a) absolute dispersion and (b) relative dispersion?

---

### Given

$$
\begin{align*}
N = 150 \\
\mu_{\text{statistics}} = 78 \\
\sigma_{\text{statistics}} = 8.0 \\
\mu_{\text{algebra}} = 73 \\
\sigma_{\text{algebra}} = 7.6
\end{align*}
$$


### For absolute dispersion

Since the standard deviation is an absolute measure of dispersion, we can compare the two standard deviations directly.

$$
\begin{align*}
\sigma_{\text{statistics}} = 8.0 \\
\sigma_{\text{algebra}} = 7.6
\end{align*}
$$

Since $8.0 > 7.6$, **statistics has the greater absolute dispersion**.

### For relative dispersion

Relative dispersion is measured here by the **coefficient of variation (V)**, the ratio of the standard deviation to the mean:

$$
V_{\text{sub}} = \frac{\sigma_{\text{sub}}}{\mu_{\text{sub}}}
$$

This formula applies because we are given both the standard deviation ($\sigma$) and the mean grade ($\mu$) for each subject.

Applying this formula:
$$
\begin{align*}
V_{\text{statistics}} = \frac{8.0}{78} \\
V_{\text{algebra}} = \frac{7.6}{73}
\end{align*}
$$

In [5]:
print(f'Stats: {8.0/78} \nAlgebra: {7.6/73}')

Stats: 0.10256410256410256 
Algebra: 0.10410958904109588


$$
\begin{align*}
V_{\text{statistics}} \approx 0.1026 \\
V_{\text{algebra}} \approx 0.1041
\end{align*}
$$

Or, in percentage form:

$$
\begin{align*}
V_{\text{statistics}} \approx 10.26\% \\
V_{\text{algebra}} \approx 10.41\%
\end{align*}
$$

Since $10.41\% > 10.26\%$, **algebra has the greater relative dispersion**.

## 3)
Prove that the mean and standard deviation of a set of standard scores are equal to 0 and 1, respectively. Use the following problem to illustrate this: Convert the set 6, 2, 8, 7, 5 into standard scores.

---

### Given
$$
\begin{align*}
N &= 5 \\
S &= \{6,2,8,7,5\}
\end{align*}
$$

### Calculating mean and standard deviation
The formulas to use are:
$$
\begin{align*}
\mu = \frac{\sum X}{N} \\ \\
\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}
\end{align*}
$$

Let us now compute

$$
\mu = \frac{6+2+8+7+5}{5}
$$

In [8]:
given = [6,2,8,7,5]
mu = sum(given)/len(given)
mu

5.6

$$
\mu = 5.6
$$

$$
\sigma = \sqrt{\frac{\sum (X_i - 5.6)^2}{5}}
$$

In [10]:
x_i_minus_mu_squared = [(x-mu)**2 for x in given]
std = (sum(x_i_minus_mu_squared) / 5)**0.5
std

2.0591260281974

$$
\sigma \approx 2.06
$$

### Calculating the standard score (Z)

$$
\begin{align*}
Z = \frac{X - \mu}{\sigma} \\
Z_1 = \frac{6 - 5.6}{2.06} \\
Z_2 = \frac{2 - 5.6}{2.06} \\
Z_3 = \frac{8 - 5.6}{2.06} \\
Z_4 = \frac{7 - 5.6}{2.06} \\
Z_5 = \frac{5 - 5.6}{2.06}
\end{align*}
$$



In [13]:
res = [(x - mu) / std for x in given]
print([(x - mu) / std for x in given])

print(f'Mean: {sum(res)/len(res)} \n Std: {np.std(res)}')

[0.19425717247145302, -1.7483145522430754, 1.1655430348287172, 0.6799001036500851, -0.2913857587071791]
Mean: 1.7763568394002506e-16 
 Std: 0.9999999999999999


$$
\begin{align*}
Z_1 &\approx 0.19 \\
Z_2 &\approx -1.75 \\
Z_3 &\approx 1.17 \\
Z_4 &\approx 0.68 \\
Z_5 &\approx -0.29
\end{align*}
$$

### Let us now verify
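As calculated above, **the mean and standard deviation of the set of standard scores are equal to 0 and 1, respectively** (up to floating-point rounding). As a further check, the rounded scores also average to approximately 0: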

In [15]:
(0.19 - 1.75 + 1.17 + 0.68 -0.29)/5

-1.1102230246251566e-17
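
The numerical check illustrates the claim for one data set. A general proof sketch, using the population formulas for $\mu$ and $\sigma$ given above and assuming $\sigma > 0$, could go as follows:

$$
\begin{align*}
\mu_Z &= \frac{1}{N}\sum_{i=1}^{N} Z_i = \frac{1}{N}\sum_{i=1}^{N} \frac{X_i - \mu}{\sigma} = \frac{1}{\sigma}\left(\frac{\sum X_i}{N} - \mu\right) = \frac{\mu - \mu}{\sigma} = 0 \\ \\
\sigma_Z^2 &= \frac{1}{N}\sum_{i=1}^{N} (Z_i - \mu_Z)^2 = \frac{1}{N}\sum_{i=1}^{N} \frac{(X_i - \mu)^2}{\sigma^2} = \frac{1}{\sigma^2} \cdot \frac{\sum (X_i - \mu)^2}{N} = \frac{\sigma^2}{\sigma^2} = 1
\end{align*}
$$

Taking the square root gives $\sigma_Z = 1$, so any set of standard scores has mean 0 and standard deviation 1.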

## 4)
Three masses are measured as 20.48, 35.97, and 62.34 g, with standard deviations of 0.21, 0.46, and 0.54 g, respectively. Find the (a) mean and (b) standard deviation of the sum of the masses.

---

In [42]:
given_mean = [20.48, 35.97, 62.34]
given_std = [0.21, 0.46, 0.54]

# Mean of a sum = sum of the means
sum_mean = sum(given_mean)
# Std of a sum = square root of the summed variances (assuming independent measurements)
sum_std = sum(x**2 for x in given_std)**0.5
print(f'Mean of the sum = {sum_mean:.2f} \nStandard Deviation of the sum = {sum_std:.4f}')

Mean of the sum = 118.79 
Standard Deviation of the sum = 0.7398
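
The computation above rests on two standard results. The second one assumes the three measurements are independent, which the problem does not state explicitly; the subscripts 1, 2, 3 are introduced here only to label the three masses.

$$
\begin{align*}
\mu_{\text{sum}} &= \mu_1 + \mu_2 + \mu_3 \\ \\
\sigma_{\text{sum}} &= \sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2} = \sqrt{0.21^2 + 0.46^2 + 0.54^2} = \sqrt{0.5473} \approx 0.74 \text{ g}
\end{align*}
$$

Note that the standard deviations themselves do not add; only the variances do.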


## 5)
The credit hour distribution at Metropolitan Technological College is as
follows:

| x    | 6   | 9   | 12  | 15  | 18  |
|------|-----|-----|-----|-----|-----|
| p(x) | 0.1 | 0.2 | 0.4 | 0.2 | 0.1 |

Find $\mu$ and $\sigma^2$. Give the 25 possible samples of size 2 (drawn with replacement), their means,
and their probabilities.

---

### Calculating the mean and variance

For a probability distribution, we could calculate the mean by:

$$
\mu = \sum (x \times p(x))
$$

In [48]:
x_given = [6,9,12,15,18]
p_x = [0.1,0.2,0.4,0.2,0.1]
x_p_x = [x_given[i]*p_x[i] for i in range(len(x_given))]
mean_prob = sum(x_p_x)
mean_prob

12.0

$$
\mu = 12
$$

For variance:

$$
\sigma^2 = \sum((x-\mu)^2 \times p(x))
$$

In [50]:
var_prob = sum([((x_given[i] - mean_prob)**2)*p_x[i] for i in range(len(x_given))])
var_prob

10.8

$$
\sigma^2 = 10.8
$$

In [58]:
# Look up p(x) for each value, reusing x_given and p_x defined above
prob_by_x = dict(zip(x_given, p_x))

def sample_prob(x):
    # Raises KeyError for a value that is not in the distribution
    return prob_by_x[x]

perm = list(itertools.product(x_given, x_given))
a = [x[0] for x in perm]
b = [x[1] for x in perm]
data = {
    'A' : a, 
    'B' : b
}
df_e = pd.DataFrame(data)
df_e['x_bar'] = (df_e['A'] + df_e['B']) / 2
df_e['prob_a'] = df_e['A'].apply(sample_prob)
df_e['prob_b'] = df_e['B'].apply(sample_prob)
df_e['pxa_pxb'] = df_e['prob_a'] * df_e['prob_b']
df_e

Unnamed: 0,A,B,x_bar,prob_a,prob_b,pxa_pxb
0,6,6,6.0,0.1,0.1,0.01
1,6,9,7.5,0.1,0.2,0.02
2,6,12,9.0,0.1,0.4,0.04
3,6,15,10.5,0.1,0.2,0.02
4,6,18,12.0,0.1,0.1,0.01
5,9,6,7.5,0.2,0.1,0.02
6,9,9,9.0,0.2,0.2,0.04
7,9,12,10.5,0.2,0.4,0.08
8,9,15,12.0,0.2,0.2,0.04
9,9,18,13.5,0.2,0.1,0.02


Recall our earlier mean computation:
$$
\mu = 12
$$

Now let's compute the mean of the sample means to confirm that it agrees with $\mu$.

In [54]:
df_e['x_bar'].mean()

12.0
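
The plain average above matches $\mu$ because the distribution is symmetric about 12. In general the 25 samples are not equally likely, so each sample mean should be weighted by its probability. The sketch below is one way to do that: it collapses the samples into the sampling distribution of $\bar{X}$ and checks $E[\bar{X}] = \mu$ and $\text{Var}[\bar{X}] = \sigma^2 / 2$. It assumes the two draws are independent (sampling with replacement); the names `xbar_dist`, `mean_xbar`, and `var_xbar` are illustrative and not part of the original notebook.

```python
from collections import defaultdict
from itertools import product

x_values = [6, 9, 12, 15, 18]
probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]
p = dict(zip(x_values, probabilities))

# Sampling distribution of the mean: add up P(a) * P(b) for every ordered
# pair (a, b) that produces the same sample mean (assumes independent draws).
xbar_dist = defaultdict(float)
for a, b in product(x_values, repeat=2):
    xbar_dist[(a + b) / 2] += p[a] * p[b]

# Probability-weighted mean and variance of the sample mean.
mean_xbar = sum(xb * pr for xb, pr in xbar_dist.items())
var_xbar = sum((xb - mean_xbar) ** 2 * pr for xb, pr in xbar_dist.items())

for xb in sorted(xbar_dist):
    print(f"x_bar = {xb:5.1f}  P = {xbar_dist[xb]:.2f}")
print(f"E[x_bar] = {mean_xbar:.2f}, Var[x_bar] = {var_xbar:.2f}")
```

The weighted mean comes out at 12 and the variance at 5.4, which is $\sigma^2 / n = 10.8 / 2$ for samples of size $n = 2$.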