# Array Creation and Basic Manipulation

In this section, we create a NumPy array of 20 random integers between 10 and 100 (inclusive), reshape it into a 4×5 matrix, extract the first two rows and last three columns, and compute the mean and standard deviation.

In [None]:
import numpy as np

# Create a NumPy array of 20 random integers between 10 and 100 (the upper bound 101 is exclusive)
arr = np.random.randint(10, 101, size=20)
print("Original Array:\n", arr)

# Reshape the array into a 4x5 matrix
arr_reshaped = arr.reshape(4, 5)
print("\nReshaped Array (4x5):\n", arr_reshaped)

# Extract the first two rows and last three columns
extracted = arr_reshaped[:2, -3:]
print("\nFirst two rows & last three columns:\n", extracted)

# Compute mean and standard deviation (np.std uses the population formula, ddof=0, by default)
mean_val = arr.mean()
std_val = arr.std()
print(f"\nMean of array: {mean_val:.2f}")
print(f"Standard deviation of array: {std_val:.2f}")
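
Because `np.random.randint` draws from NumPy's global random state, the array changes on every run. Below is a minimal sketch of a reproducible variant, assuming the newer `np.random.default_rng` Generator API is acceptable here; the seed value 42 is an arbitrary, hypothetical choice. It also checks the shapes produced by `reshape` and slicing, and shows that `reshape` returns a view of the original data when it can.

```python
import numpy as np

# Hypothetical fixed seed; any integer gives a repeatable sequence
rng = np.random.default_rng(seed=42)

# Like np.random.randint, Generator.integers excludes the upper bound by default,
# so 101 is needed to include 100
arr = rng.integers(10, 101, size=20)
arr_reshaped = arr.reshape(4, 5)      # 4 rows x 5 columns
extracted = arr_reshaped[:2, -3:]     # rows 0-1, columns 2-4

print(arr_reshaped.shape, extracted.shape)    # (4, 5) (2, 3)
# reshape of a contiguous array returns a view that shares memory with arr
print(np.shares_memory(arr, arr_reshaped))    # True
```

Because `arr_reshaped` is a view, assigning to one of its elements also changes `arr`; call `.copy()` if the two must stay independent.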

# Operations on 2D Arrays

We simulate a 10×5 array of scores (10 students, 5 subjects, integers from 40 to 100), then compute each student's average score and the highest and lowest scores in the whole dataset.

In [None]:
# Simulate students' scores: 10 students, 5 subjects
scores = np.random.randint(40, 101, size=(10, 5))
print("Students' Scores (10x5):\n", scores)

# Calculate average score per student
avg_per_student = scores.mean(axis=1)
print("\nAverage score per student:\n", avg_per_student)

# Highest and lowest score in the dataset
highest_score = scores.max()
lowest_score = scores.min()
print(f"\nHighest score in dataset: {highest_score}")
print(f"Lowest score in dataset: {lowest_score}")
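
The maximum and minimum alone do not say who scored them. The short sketch below uses a hypothetical seeded Generator so the example is repeatable. It converts the flat index returned by `argmax` into a (student, subject) pair with `np.unravel_index` and also reports per-subject extremes along `axis=0`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)             # hypothetical seed, for repeatability
scores = rng.integers(40, 101, size=(10, 5))    # 10 students x 5 subjects, 40-100

# argmax on a 2D array returns an index into the flattened array;
# unravel_index converts it back to (row, column) = (student, subject)
student, subject = np.unravel_index(scores.argmax(), scores.shape)
print(f"Highest score {scores.max()}: student {student + 1}, subject {subject + 1}")

# Reducing along axis=0 gives one value per subject (column)
print("Best score per subject: ", scores.max(axis=0))
print("Worst score per subject:", scores.min(axis=0))
```

If several cells share the maximum, `argmax` returns the first one in row-major order.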

# Working with 3D Arrays

Here, we create a 3D NumPy array with dimensions (3, 4, 2) filled with random integers between 1 and 20 (inclusive). We then sum along axis 1, take the element-wise maximum across the layers (axis 0), and flatten the array.

In [None]:
# Create a 3D array (3, 4, 2) with random integers 1-20
arr3d = np.random.randint(1, 21, size=(3, 4, 2))
print("3D Array (3x4x2):\n", arr3d)

# Sum along the second axis (axis=1): collapses the 4 rows of each layer, giving shape (3, 2)
sum_axis1 = arr3d.sum(axis=1)
print("\nSum across second axis (axis=1):\n", sum_axis1)

# Element-wise maximum across the 3 layers (axis=0), giving shape (4, 2)
max_across_layers = arr3d.max(axis=0)
print("\nElement-wise maximum across layers (axis=0):\n", max_across_layers)

# Flatten the entire 3D array into a 1D array
flattened = arr3d.flatten()
print("\nFlattened 1D array:\n", flattened)
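
The `axis` argument names the dimension that is collapsed, which is easy to misread on a 3D array. This sketch uses a deterministic `np.arange(24).reshape(3, 4, 2)` stand-in (an assumption, chosen so the results are predictable). It shows the resulting shapes, how to get the single largest value within each layer with `axis=(1, 2)`, and how `flatten` differs from `ravel`.

```python
import numpy as np

# Deterministic stand-in with the same (3, 4, 2) shape as arr3d
arr3d = np.arange(24).reshape(3, 4, 2)

print(arr3d.sum(axis=1).shape)    # (3, 2): the length-4 axis is summed away
print(arr3d.max(axis=0).shape)    # (4, 2): element-wise max over the 3 layers
per_layer_max = arr3d.max(axis=(1, 2))
print(per_layer_max)              # largest value in each layer: 7, 15, 23

# flatten always returns a copy; ravel returns a view when the data is contiguous
flat_copy = arr3d.flatten()
flat_view = arr3d.ravel()
print(np.shares_memory(arr3d, flat_copy), np.shares_memory(arr3d, flat_view))  # False True
```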

# Measures of Center and Spread

Given the dataset of $CO_2$ emissions (in metric tons per capita) from five countries: [25.4, 30.2, 22.5, 28.1, 35.0], we compute the mean, median, mode, range, and standard deviation, and comment on the spread.

In [None]:
from scipy import stats

co2 = np.array([25.4, 30.2, 22.5, 28.1, 35.0])

mean_co2 = co2.mean()
median_co2 = np.median(co2)
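# Every value is unique, so there is no meaningful mode; SciPy returns the smallest (22.5)
mode_co2 = stats.mode(co2, keepdims=True).mode[0]
range_co2 = co2.max() - co2.min()
std_co2 = co2.std()  # population standard deviation (ddof=0)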

print(f"Mean: {mean_co2:.2f}")
print(f"Median: {median_co2:.2f}")
print(f"Mode: {mode_co2}")
print(f"Range: {range_co2:.2f}")
print(f"Standard Deviation: {std_co2:.2f}")

**Comment on the spread:**  
The $CO_2$ emissions show a moderate spread: the range is 12.5 metric tons per capita (22.5 to 35.0), and the population standard deviation is about 4.26 around a mean of 28.24. The mean (28.24) and median (28.10) are close, which suggests a roughly symmetric distribution, although five values are too few to say much about its shape. The mode is not informative because every value is unique; in that case SciPy reports the smallest value (22.5).
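
NumPy's `std` divides by $n$ by default (`ddof=0`), whereas the sample standard deviation divides by $n - 1$. With only five countries the difference is noticeable, as this sketch shows. The coefficient of variation (standard deviation divided by the mean) is added as one way to express the spread relative to the typical value.

```python
import numpy as np

co2 = np.array([25.4, 30.2, 22.5, 28.1, 35.0])

pop_std = co2.std()            # ddof=0: divides by n (NumPy's default)
sample_std = co2.std(ddof=1)   # ddof=1: divides by n - 1 (sample estimate)
cv = pop_std / co2.mean()      # coefficient of variation: spread relative to the mean

print(f"Population SD: {pop_std:.2f}")     # 4.26
print(f"Sample SD:     {sample_std:.2f}")  # 4.76
print(f"CV:            {cv:.1%}")          # 15.1%
```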

# Hypothesis Testing

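We perform an independent two-sample t-test (Student's version, which assumes equal variances) to compare mean beef consumption between Argentina and Bangladesh, using five observations per country. We state the hypotheses, compute the t-statistic and p-value, and draw a conclusion at the 5% significance level.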

**Null Hypothesis ($H_0$):**  
The population mean beef consumption is the same in Argentina and Bangladesh ($\mu_A = \mu_B$).

**Alternative Hypothesis ($H_1$):**  
The population mean beef consumption differs between Argentina and Bangladesh ($\mu_A \neq \mu_B$).

In [None]:
argentina = np.array([60, 62, 58, 63, 59])
bangladesh = np.array([15, 12, 18, 14, 16])

t_stat, p_val = stats.ttest_ind(argentina, bangladesh)
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_val:.4f}")

alpha = 0.05  # significance level
if p_val < alpha:
    print("Conclusion: Reject the null hypothesis. There is a significant difference in mean beef consumption.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant difference detected.")
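
To make the test statistic less of a black box, the sketch below recomputes it from the pooled-variance formula that `stats.ttest_ind` uses by default (`equal_var=True`):

$$t = \frac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{1/n_A + 1/n_B}}, \qquad s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2},$$

with a two-sided p-value from the $t$ distribution with $n_A + n_B - 2$ degrees of freedom. It also runs Welch's variant (`equal_var=False`), which does not assume equal variances.

```python
import numpy as np
from scipy import stats

argentina = np.array([60, 62, 58, 63, 59])
bangladesh = np.array([15, 12, 18, 14, 16])

n1, n2 = len(argentina), len(bangladesh)
m1, m2 = argentina.mean(), bangladesh.mean()
v1, v2 = argentina.var(ddof=1), bangladesh.var(ddof=1)   # sample variances

# Pooled variance, as in ttest_ind's default equal_var=True (Student's t-test)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (m1 - m2) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)              # two-sided p-value

print(f"Manual t = {t_manual:.2f}, df = {df}, p = {p_manual:.2e}")

# Welch's t-test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(argentina, bangladesh, equal_var=False)
print(f"Welch t = {t_welch:.2f}, p = {p_welch:.2e}")
```

Here both groups have five observations, so the pooled and Welch standard errors coincide and the two t-statistics are equal (about 33.3). Only the degrees of freedom, and therefore the p-values, differ slightly.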

# Correlation Analysis

We compute the Pearson correlation coefficient between consumption ($x$) and $CO_2$ emissions ($y$), interpret the result, and explain what $r \approx 0$ would mean.

In [None]:
x = np.array([10, 15, 20, 25, 30])
y = np.array([30, 45, 50, 70, 85])

r, _ = stats.pearsonr(x, y)
print(f"Pearson correlation coefficient (r): {r:.2f}")
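
As a check on `stats.pearsonr`, the coefficient can be computed directly from its definition, $r = \sum (x_i - \bar{x})(y_i - \bar{y}) \big/ \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$. For this data the three sums are 675, 250, and 1870, which gives $r = 675/\sqrt{250 \cdot 1870} \approx 0.987$. The sketch below reproduces this and compares it with `np.corrcoef`.

```python
import numpy as np

x = np.array([10, 15, 20, 25, 30])
y = np.array([30, 45, 50, 70, 85])

# Deviations from the means
dx, dy = x - x.mean(), y - y.mean()
# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r_matrix = np.corrcoef(x, y)[0, 1]

print(f"{r_manual:.4f} {r_matrix:.4f}")   # 0.9872 0.9872
```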

**Interpretation:**  
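The Pearson correlation coefficient is about 0.99, close to 1, indicating a strong positive linear relationship between consumption and $CO_2$ emissions: higher consumption tends to go with higher emissions. Correlation alone does not show that consumption causes the emissions, and five data points are a small sample.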

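If $r \approx 0$, there is little to no *linear* relationship between the two variables, so a straight line fitted to one does not help predict the other. The variables can still be strongly related in a nonlinear way, so $r \approx 0$ does not by itself mean they are independent.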
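
Here is a minimal illustration of the nonlinear case, using made-up values (hypothetical, not part of the dataset above). On a range symmetric around zero, $y = x^2$ is completely determined by $x$, yet the Pearson coefficient is zero because the relationship is not linear.

```python
import numpy as np
from scipy import stats

# Hypothetical values symmetric around zero
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2            # y depends on x exactly, but not linearly

r, _ = stats.pearsonr(x, y)
print(f"r = {r:.3f}")  # zero up to floating-point rounding
```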

# Total Scores per Student

Given matrix $A$ of scores for 4 students (rows) in 3 subjects (columns), we compute each student's total score by summing along each row (`axis=1`).

In [None]:
A = np.array([
    [80, 70, 90],
    [60, 85, 75],
    [95, 88, 92],
    [70, 60, 65]
])

# Total score per student: sum each row (axis=1), reshaped to a 4x1 column vector
total_scores = A.sum(axis=1).reshape(-1, 1)
print("Total scores per student (column vector):\n", total_scores)
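
NumPy reductions accept `keepdims=True`, which keeps the summed axis with length 1. The following sketch shows an equivalent way to get the $4 \times 1$ column vector without the separate `reshape(-1, 1)` call.

```python
import numpy as np

A = np.array([
    [80, 70, 90],
    [60, 85, 75],
    [95, 88, 92],
    [70, 60, 65],
])

# keepdims=True keeps axis 1 with length 1, so the result is already (4, 1)
total_scores = A.sum(axis=1, keepdims=True)
print(total_scores.shape)      # (4, 1)
print(total_scores.ravel())    # [240 220 275 195]
```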

# Average Score per Subject

We compute the average score for each subject by calculating the mean of each column in matrix $A$.

In [None]:
# Average score per subject: mean of each column (axis=0), reshaped to a 1x3 row vector
avg_per_subject = A.mean(axis=0).reshape(1, -1)
print("Average score per subject (row vector):\n", avg_per_subject)
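
Both reductions can also be written as matrix products with a vector of ones, which connects them to the matrix multiplication used for the weighted grades: column means are $\frac{1}{n}\mathbf{1}^\top A$ and row totals are $A\mathbf{1}$. The sketch below checks this; for this $A$ the subject averages are 76.25, 75.75, and 80.5.

```python
import numpy as np

A = np.array([
    [80, 70, 90],
    [60, 85, 75],
    [95, 88, 92],
    [70, 60, 65],
])
n_students, n_subjects = A.shape

# Column means as (1/n) * 1^T A: a ones vector times A, divided by the row count
col_means = np.ones(n_students) @ A / n_students
# Row totals as A 1: A times a ones vector
row_totals = A @ np.ones(n_subjects)

print(np.allclose(col_means, A.mean(axis=0)))   # True
print(np.allclose(row_totals, A.sum(axis=1)))   # True
```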

# Weighted Final Grades

Using the weights vector $w = [0.5, 0.3, 0.2]$, we compute each student's weighted final grade via matrix multiplication.

In [None]:
w = np.array([0.5, 0.3, 0.2])
# A (4x3) @ w (3,) gives one weighted sum per student; .T would be a no-op on a 1-D array
G = (A @ w).reshape(-1, 1)
print("Weighted final grades (column vector):\n", G)
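
`A @ w` computes, for each student, $G_i = \sum_j A_{ij} w_j$. By hand, the first student gets $0.5 \cdot 80 + 0.3 \cdot 70 + 0.2 \cdot 90 = 79$. The short sketch below confirms the product against an explicit broadcast-and-sum and checks that the weights add up to 1, so each grade stays between that student's lowest and highest subject score.

```python
import numpy as np

A = np.array([
    [80, 70, 90],
    [60, 85, 75],
    [95, 88, 92],
    [70, 60, 65],
])
w = np.array([0.5, 0.3, 0.2])   # weights from the text

# Matrix-vector product: one weighted sum per row (student)
G_matmul = A @ w
# Same computation spelled out: broadcast w across each row, then sum the row
G_explicit = (A * w).sum(axis=1)

print(np.allclose(G_matmul, G_explicit))   # True
print(np.round(G_matmul, 2))               # weighted grades: 79, 70.5, 92.3, 66
print(np.isclose(w.sum(), 1.0))            # True: the weights sum to 1
```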

# Applying Subject Importance

Suppose Mathematics (column 0 of $A$) is considered twice as important as English and Science. We multiply the Math column by 2 to create $A'$, recompute the total scores, and compare them with the original totals.

In [None]:
# Multiply Math column by 2 to create A'
A_prime = A.copy()
A_prime[:, 0] *= 2

# Recompute total scores per student
total_scores_prime = A_prime.sum(axis=1).reshape(-1, 1)
print("Total scores per student with Math weighted (column vector):\n", total_scores_prime)

# Compare to previous totals
print("\nComparison of totals:")
for i in range(A.shape[0]):
    print(f"Student {i+1}: Original={total_scores[i, 0]}, Weighted Math={total_scores_prime[i, 0]}")

**Discussion:**  
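Weighting Mathematics more heavily raises each student's total by exactly that student's Math score (80, 60, 95, and 70 points respectively), so students who did well in Math gain the most. For these four students the ranking by total score is unchanged, but the gaps between them shift. The adjustment reflects the greater importance assigned to Mathematics in the overall evaluation.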
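
Doubling the Math column is equivalent to multiplying $A$ by the weight vector $[2, 1, 1]$, so each new total is the old total plus that student's Math score. The sketch below verifies this and compares the students' rankings before and after with `np.argsort` (students numbered from 1).

```python
import numpy as np

A = np.array([
    [80, 70, 90],
    [60, 85, 75],
    [95, 88, 92],
    [70, 60, 65],
])

totals = A.sum(axis=1)                        # original totals
totals_weighted = A @ np.array([2, 1, 1])     # Math (column 0) counted twice

# The increase equals each student's Math score
print(totals_weighted - totals)                            # [80 60 95 70]
print(np.array_equal(totals_weighted - totals, A[:, 0]))   # True

# Students ordered from highest to lowest total (numbered from 1)
print(np.argsort(-totals) + 1)                # [3 1 2 4]
print(np.argsort(-totals_weighted) + 1)       # [3 1 2 4]
```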