# Python for Social Science

<img src="../figures/PySocs_banner.png" width="50%" align="left">

# Python Data Types

## Dictionary

Wrapping up the previous topic, we now formally introduce dictionaries. The dictionary, or `dict`, is arguably the most important built-in Python data structure. It stores a collection of key-value pairs.

- Each **key** is unique and maps to a **value**.
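- Elements are accessed by key rather than by numeric position. (Since Python 3.7, dictionaries preserve insertion order, but they still have no positional index.)
- Keys must be hashable, which in practice means immutable types such as strings, numbers, and tuples.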
- Values can be any type (numbers, strings, lists, even other dictionaries).
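
The rule about keys can be checked directly. The sketch below uses a hypothetical `enrollment` dictionary (illustrative data, not part of the course materials) with tuple keys, and then shows that Python rejects a list as a key.

```python
# Hypothetical example data: (major, year) tuples as keys
enrollment = {("EDP", 2024): 35, ("SOC", 2024): 42}
print(enrollment[("EDP", 2024)])  # 35

# A list is mutable (unhashable), so Python refuses to use it as a key
try:
    bad = {["EDP", 2024]: 35}
except TypeError as err:
    error_name = type(err).__name__
    print(error_name, "-", err)
```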

### Syntax

```python
my_dict = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
}
```

Here's a simple dict containing some information on a student.

In [1]:
student = {
    "name": "Alice",
    "major": "EDP",
    "age": 30,
    "grades": ["A", "B", "A"]
}

print(student)

{'name': 'Alice', 'major': 'EDP', 'age': 30, 'grades': ['A', 'B', 'A']}


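You can access, insert, or set elements using the same square-bracket syntax as for a list, with a key in place of a numeric index. The `del` statement removes an element, and `in` tests whether a key exists.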

In [2]:
"name" in student

True

In [3]:
student["name"]

'Alice'

In [4]:
student["name"] = "Bob"

student

{'name': 'Bob', 'major': 'EDP', 'age': 30, 'grades': ['A', 'B', 'A']}

In [5]:
student["grades"][2]

'A'

In [6]:
student["final_grade"] = "A"

student

{'name': 'Bob',
 'major': 'EDP',
 'age': 30,
 'grades': ['A', 'B', 'A'],
 'final_grade': 'A'}

In [7]:
del student["age"]

student

{'name': 'Bob', 'major': 'EDP', 'grades': ['A', 'B', 'A'], 'final_grade': 'A'}

In [8]:
"age" in student

False

In [9]:
list(student.keys())

['name', 'major', 'grades', 'final_grade']

In [10]:
list(student.values())

['Bob', 'EDP', ['A', 'B', 'A'], 'A']

In [11]:
print(student.items())

dict_items([('name', 'Bob'), ('major', 'EDP'), ('grades', ['A', 'B', 'A']), ('final_grade', 'A')])


Notice that each item is a `(key, value)` tuple.

### Example: Survey Data Recoding - Redux

Recall the earlier example in which survey responses were recoded using a list comprehension. That approach worked, but a dictionary makes the same recoding cleaner and more straightforward.

In [2]:
# Create a dictionary to map scores to categories
mapping = {
    0: "Never",
    1: "Rarely",
    2: "Sometimes",
    3: "Often",
    4: "Always",
    None: "Unanswered"
}

mapping

{0: 'Never',
 1: 'Rarely',
 2: 'Sometimes',
 3: 'Often',
 4: 'Always',
 None: 'Unanswered'}

Now that we have a dictionary, we can use it in a list comprehension as follows:

In [8]:
# Your list of numeric responses
responses = [0, None, 3, 2, 1, 2, 4, 3, 4, 1, 0]

# Use list comprehension with dictionary lookup
categories = [mapping[x] for x in responses]

print(categories)

['Never', 'Unanswered', 'Often', 'Sometimes', 'Rarely', 'Sometimes', 'Always', 'Often', 'Always', 'Rarely', 'Never']


This works as long as `mapping` covers every value that can appear in `responses`. For example, the following raises a `KeyError`:

```python
responses = [0, None, 3, 2, 1, 2, 4, 3, 4, 1, 0, 99]
categories = [mapping[x] for x in responses]

KeyError: 99
```

So, how do we address the issue? 

Hint: `dict.get()`

In [None]:
# YOUR CODE HERE


#### Common Dictionary Methods

This table shows some common methods associated with dictionaries.

| Method                    | Description                      |
| ------------------------- | -------------------------------- |
| `dict.keys()`             | Returns all keys                 |
| `dict.values()`           | Returns all values               |
| `dict.items()`            | Returns key-value pairs          |
| `dict.get(key, default)`  | Safely get a value, with default |
| `dict.pop(key)`           | Remove key and return value      |
| `dict.popitem()`          | Remove last inserted key-value   |
| `dict.clear()`            | Remove all items                 |
| `dict.update(other_dict)` | Merge another dictionary         |
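
As a rough illustration of the mutating methods in the table, the sketch below applies `update()`, `pop()`, `popitem()`, and `clear()` to a small copy of the `student` dictionary. The values passed to `update()` are made up for the example.

```python
student = {"name": "Bob", "major": "EDP", "grades": ["A", "B", "A"]}

# update() adds new keys and overwrites existing ones
student.update({"age": 30, "major": "SOC"})
print(student)
# {'name': 'Bob', 'major': 'SOC', 'grades': ['A', 'B', 'A'], 'age': 30}

# pop() removes a key and returns its value
age = student.pop("age")
print(age)  # 30

# popitem() removes and returns the most recently inserted pair
last_pair = student.popitem()
print(last_pair)  # ('grades', ['A', 'B', 'A'])

# clear() empties the dictionary in place
remaining = dict(student)  # keep a copy to inspect before clearing
student.clear()
print(remaining)  # {'name': 'Bob', 'major': 'SOC'}
print(student)    # {}
```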


The fourth method `dict.get(key, default)` can be useful to map an unexpected value to a default value:

In [14]:
mapping.get(1, "Unanswered")

'Rarely'

In [15]:
mapping.get(99, "Unanswered")

'Unanswered'
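
Applied to the recoding example, one possible sketch replaces the square-bracket lookup with `.get()` inside the list comprehension, so that an unexpected code such as `99` falls back to a default label instead of raising a `KeyError`. Treating unexpected codes as "Unanswered" is an assumption made here; a separate label such as "Invalid" may suit your data better.

```python
mapping = {
    0: "Never", 1: "Rarely", 2: "Sometimes",
    3: "Often", 4: "Always", None: "Unanswered",
}
responses = [0, None, 3, 99]

# .get() returns the default whenever the key is missing from mapping
categories = [mapping.get(x, "Unanswered") for x in responses]
print(categories)  # ['Never', 'Unanswered', 'Often', 'Unanswered']
```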

#### Creating a Dictionary from Sequences

We can create a dictionary from two lists (more generally, sequences) that can be paired up element-wise. 

As a first shot, you may think something like this:

```python
mappings = {}
mappings.keys = [0, 1, 2, 3, 4]
mappings.values = ["Never", "Rarely", "Sometimes", "Often", "Always"]
mappings
```

❌ Unfortunately, that throws an error because `keys` and `values` are read-only *methods* of a dictionary, not *attributes* you can assign to.

We can achieve this using the `dict()` function and the `zip()` function as follows:

In [16]:
keys = [0, 1, 2, 3, 4]
values = ["Never", "Rarely", "Sometimes", "Often", "Always"]

mappings = dict(zip(keys, values))
print(mappings)

{0: 'Never', 1: 'Rarely', 2: 'Sometimes', 3: 'Often', 4: 'Always'}


`zip(keys, values)` pairs each `key` with its corresponding `value`:
(0, "Never"), (1, "Rarely"), ...

`dict(...)` converts those pairs into a dictionary.

You can't directly display what `zip(keys, values)` contains.

In [17]:
zip(keys, values)

<zip at 0x7fdbd8950a80>

`zip()` does not directly store all pairs in memory.
- It creates an **iterator** that generates pairs on the fly.
- Converting to `list()` or `tuple()` materializes the result for display or reuse.

In [18]:
list(zip(keys, values))

[(0, 'Never'), (1, 'Rarely'), (2, 'Sometimes'), (3, 'Often'), (4, 'Always')]

In [19]:
tuple(zip(keys, values))

((0, 'Never'), (1, 'Rarely'), (2, 'Sometimes'), (3, 'Often'), (4, 'Always'))
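
One practical consequence of `zip()` returning an iterator is that it can be consumed only once. The short sketch below (with a shortened version of the lists above) shows that a second pass over the same `zip` object yields nothing, which is why materializing with `list()` matters when the pairs are needed again.

```python
keys = [0, 1, 2]
values = ["Never", "Rarely", "Sometimes"]

pairs = zip(keys, values)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to yield

print(first_pass)   # [(0, 'Never'), (1, 'Rarely'), (2, 'Sometimes')]
print(second_pass)  # []
```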

#### Creating a Dictionary using Comprehension

We can also use a **dictionary comprehension** to create a dictionary as follows:

In [20]:
mappings = {k: v for k, v in zip(keys, values)}
print(mappings)

{0: 'Never', 1: 'Rarely', 2: 'Sometimes', 3: 'Often', 4: 'Always'}
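
A dictionary comprehension can also transform or filter an existing dictionary. As a small sketch, the code below inverts the mapping (labels become keys) and keeps only the higher-frequency categories. The names `label_to_score` and `frequent`, and the cutoff of 3, are chosen just for this illustration.

```python
mappings = {0: "Never", 1: "Rarely", 2: "Sometimes", 3: "Often", 4: "Always"}

# Swap keys and values: look up a score from its label
label_to_score = {label: score for score, label in mappings.items()}
print(label_to_score["Often"])  # 3

# Keep only the entries that satisfy a condition
frequent = {score: label for score, label in mappings.items() if score >= 3}
print(frequent)  # {3: 'Often', 4: 'Always'}
```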


For dictionaries, we can use a for-loop along with **f-strings** to display the key-value pairs:

In [21]:
for key, value in mappings.items():
    print(f"{key}: {value}")

0: Never
1: Rarely
2: Sometimes
3: Often
4: Always


In Python, **f-strings** (formatted string literals) provide a concise and powerful way to embed expressions inside strings.
- An **f-string** is simply a string prefixed with the letter `f` or `F`.
- Inside the string, you can embed Python expressions using curly braces `{}`.

In [22]:
name = "Alice"
where = "Wonderland"

message = f"My name is {name}, and I live in {where}."
print(message)

My name is Alice, and I live in Wonderland.


More examples with numbers...

In [23]:
pi = 3.14159
euler = 2.71828

f"My favorite numbers are {pi} and {euler}."

'My favorite numbers are 3.14159 and 2.71828.'

In [24]:
f"My favorite numbers are {pi:.3f} and {euler:.3f}."

'My favorite numbers are 3.142 and 2.718.'

In [25]:
f"pi + euler is {pi + euler:.2f}."

'pi + euler is 5.86.'
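
Beyond `.3f`, a few other format specifiers are often handy when reporting survey results. The sketch below uses made-up counts (`n_invited`, `n_responded`) to show a thousands separator, a percentage, and fixed-width alignment.

```python
# Hypothetical survey counts, for illustration only
n_invited = 1250
n_responded = 1071
rate = n_responded / n_invited

invited_text = f"Invited: {n_invited:,}"     # thousands separator
rate_text = f"Response rate: {rate:.1%}"     # percentage with 1 decimal

label = "Never"
count = 5
row_text = f"{label:<10}|{count:>3}"         # left- and right-aligned columns

print(invited_text)  # Invited: 1,250
print(rate_text)     # Response rate: 85.7%
print(row_text)      # Never     |  5
```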

### In-Depth Study: Dictionary Comprehension

Let's recall the `words` list we created earlier from *The Eyes of Texas*. We will build a dictionary of the **unique** words in `words` and count how many times each unique word appears, ignoring capitalization. This means, for example, that "The" and "the" are considered the same word.

We will first create this dictionary, `by_word`, using a for-loop, and then demonstrate how to do the same with a dictionary comprehension.

For reference, the `words` list contains the following words.

In [26]:
lyrics = """
The Eyes of Texas are upon you,
All the livelong day.
The Eyes of Texas are upon you,
You cannot get away.
Do not think you can escape them
At night or early in the morn --
The Eyes of Texas are upon you
Til Gabriel blows his horn.
"""

cleaned_text = lyrics.replace(",", "") \
                     .replace(".", "") \
                     .replace("-", "")
print(cleaned_text)


The Eyes of Texas are upon you
All the livelong day
The Eyes of Texas are upon you
You cannot get away
Do not think you can escape them
At night or early in the morn 
The Eyes of Texas are upon you
Til Gabriel blows his horn



In [27]:
words = cleaned_text.split()

print(words)

['The', 'Eyes', 'of', 'Texas', 'are', 'upon', 'you', 'All', 'the', 'livelong', 'day', 'The', 'Eyes', 'of', 'Texas', 'are', 'upon', 'you', 'You', 'cannot', 'get', 'away', 'Do', 'not', 'think', 'you', 'can', 'escape', 'them', 'At', 'night', 'or', 'early', 'in', 'the', 'morn', 'The', 'Eyes', 'of', 'Texas', 'are', 'upon', 'you', 'Til', 'Gabriel', 'blows', 'his', 'horn']


In [28]:
print(sorted(words))

['All', 'At', 'Do', 'Eyes', 'Eyes', 'Eyes', 'Gabriel', 'Texas', 'Texas', 'Texas', 'The', 'The', 'The', 'Til', 'You', 'are', 'are', 'are', 'away', 'blows', 'can', 'cannot', 'day', 'early', 'escape', 'get', 'his', 'horn', 'in', 'livelong', 'morn', 'night', 'not', 'of', 'of', 'of', 'or', 'the', 'the', 'them', 'think', 'upon', 'upon', 'upon', 'you', 'you', 'you', 'you']


In [29]:
f"There are {len(words)} words."

'There are 48 words.'

In [30]:
by_word = {}

for word in words:
    word = word.lower()
    if word not in by_word:
        by_word[word] = 1
    else:
        by_word[word] += 1
        
by_word

{'the': 5,
 'eyes': 3,
 'of': 3,
 'texas': 3,
 'are': 3,
 'upon': 3,
 'you': 5,
 'all': 1,
 'livelong': 1,
 'day': 1,
 'cannot': 1,
 'get': 1,
 'away': 1,
 'do': 1,
 'not': 1,
 'think': 1,
 'can': 1,
 'escape': 1,
 'them': 1,
 'at': 1,
 'night': 1,
 'or': 1,
 'early': 1,
 'in': 1,
 'morn': 1,
 'til': 1,
 'gabriel': 1,
 'blows': 1,
 'his': 1,
 'horn': 1}
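
The `if`/`else` inside the loop can be shortened with `dict.get()`, which returns a default of `0` the first time a word is seen. This is a minimal sketch on a shortened word list rather than the full lyrics.

```python
words = ["The", "Eyes", "of", "Texas", "the"]

by_word = {}
for word in words:
    word = word.lower()
    # Start from 0 if the word is new, then add one
    by_word[word] = by_word.get(word, 0) + 1

print(by_word)  # {'the': 2, 'eyes': 1, 'of': 1, 'texas': 1}
```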

Now, we reproduce `by_word` using a dictionary comprehension as follows:

In [31]:
words_lower = [word.lower() for word in words]

{word: words_lower.count(word) for word in set(words_lower)}

{'at': 1,
 'get': 1,
 'away': 1,
 'blows': 1,
 'in': 1,
 'eyes': 3,
 'his': 1,
 'not': 1,
 'all': 1,
 'the': 5,
 'cannot': 1,
 'night': 1,
 'you': 5,
 'can': 1,
 'or': 1,
 'day': 1,
 'early': 1,
 'livelong': 1,
 'texas': 3,
 'morn': 1,
 'of': 3,
 'are': 3,
 'them': 1,
 'gabriel': 1,
 'do': 1,
 'escape': 1,
 'til': 1,
 'horn': 1,
 'upon': 3,
 'think': 1}

Here, we annotate the components of the dictionary comprehension example:

- `words_lower`: a list of every word converted to lowercase.
- `set(words_lower)`: the *set* of unique lowercase words, so each word becomes a key exactly once.
- `words_lower.count(word)`: a list method that counts how many times `word` appears in `words_lower`.

Because a set is unordered, the keys appear in a different order than in the for-loop version, but the counts are identical.

If that seems like too much work, the `Counter` class from the standard-library `collections` module achieves the same thing as follows:

In [32]:
from collections import Counter

In [33]:
Counter(word.lower() for word in words)

Counter({'the': 5,
         'eyes': 3,
         'of': 3,
         'texas': 3,
         'are': 3,
         'upon': 3,
         'you': 5,
         'all': 1,
         'livelong': 1,
         'day': 1,
         'cannot': 1,
         'get': 1,
         'away': 1,
         'do': 1,
         'not': 1,
         'think': 1,
         'can': 1,
         'escape': 1,
         'them': 1,
         'at': 1,
         'night': 1,
         'or': 1,
         'early': 1,
         'in': 1,
         'morn': 1,
         'til': 1,
         'gabriel': 1,
         'blows': 1,
         'his': 1,
         'horn': 1})
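
A `Counter` behaves like a dictionary with a few extras. The sketch below, run on the first two lines of the lyrics only, shows `most_common()` and the fact that a missing key returns `0` instead of raising a `KeyError`.

```python
from collections import Counter

words = "The Eyes of Texas are upon you All the livelong day".split()
counts = Counter(word.lower() for word in words)

top = counts.most_common(1)   # the single most frequent word
print(top)                    # [('the', 2)]
print(counts["texas"])        # 1
print(counts["horn"])         # 0 (not in this excerpt, no KeyError)
print(sum(counts.values()))   # 11 words in total
```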

### Practice Exercise - Movie Ratings Database

You are given a small dataset of movies, where each movie has information such as genre, year released, and ratings from different reviewers.

Your task is to perform different operations to analyze this dataset using Python dictionaries.

In [12]:
movies = {
    "Inception": {
        "year": 2010,
        "genre": "Sci-Fi",
        "ratings": {"IMDB": 8.8, "RottenTomatoes": 87, "Metacritic": 74}
    },
    "The Dark Knight": {
        "year": 2008,
        "genre": "Action",
        "ratings": {"IMDB": 9.0, "RottenTomatoes": 94, "Metacritic": 84}
    },
    "Interstellar": {
        "year": 2014,
        "genre": "Sci-Fi",
        "ratings": {"IMDB": 8.6, "RottenTomatoes": 72, "Metacritic": 74}
    },
    "The Godfather": {
        "year": 1972,
        "genre": "Crime",
        "ratings": {"IMDB": 9.2, "RottenTomatoes": 97, "Metacritic": 100}
    },
    "Parasite": {
        "year": 2019,
        "genre": "Thriller",
        "ratings": {"IMDB": 8.6, "RottenTomatoes": 99, "Metacritic": 96}
    }
}

import pandas as pd

pd.DataFrame(movies)

|         | Inception | The Dark Knight | Interstellar | The Godfather | Parasite |
| ------- | --------- | --------------- | ------------ | ------------- | -------- |
| year    | 2010 | 2008 | 2014 | 1972 | 2019 |
| genre   | Sci-Fi | Action | Sci-Fi | Crime | Thriller |
| ratings | {'IMDB': 8.8, 'RottenTomatoes': 87, 'Metacriti... | {'IMDB': 9.0, 'RottenTomatoes': 94, 'Metacriti... | {'IMDB': 8.6, 'RottenTomatoes': 72, 'Metacriti... | {'IMDB': 9.2, 'RottenTomatoes': 97, 'Metacriti... | {'IMDB': 8.6, 'RottenTomatoes': 99, 'Metacriti... |
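
Because `movies` is a dictionary of dictionaries, values are reached by chaining keys from the outside in. The sketch below uses a one-movie excerpt of the dataset to show the pattern that the tasks build on.

```python
# One-movie excerpt of the dataset above
movies = {
    "Inception": {
        "year": 2010,
        "genre": "Sci-Fi",
        "ratings": {"IMDB": 8.8, "RottenTomatoes": 87, "Metacritic": 74},
    },
}

info = movies["Inception"]        # the inner dictionary for one movie
print(info["genre"])              # Sci-Fi

imdb = movies["Inception"]["ratings"]["IMDB"]  # three levels deep
print(imdb)                       # 8.8

# Loop over the innermost dictionary
for source, score in movies["Inception"]["ratings"].items():
    print(f"{source}: {score}")
```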


#### Task 1: Display all movie titles

Print out just the movie titles, one per line. You may find the following table helpful:

| Code                        | What Python Iterates Over | Notes                   |
| --------------------------- | ------------------------- | ----------------------- |
| `for x in dict:`            | **Keys**                  | Default behavior        |
| `for x in dict.keys():`     | **Keys**                  | Explicit, same as above |
| `for x in dict.values():`   | **Values**                | Values only             |
| `for k, v in dict.items():` | **(Key, Value) pairs**    | Most useful for both    |


In [13]:
# YOUR CODE HERE


Inception
The Dark Knight
Interstellar
The Godfather
Parasite


#### Task 2: Find movies released after 2010

Create a list of movie names released after 2010.

**Hint**: Check the "year" key inside each movie dictionary.

The expected output:

```python
['Interstellar', 'Parasite']
```

In [16]:
# YOUR CODE HERE


['Interstellar', 'Parasite']

#### Task 3: Get the average IMDB rating

Calculate the average IMDB rating of all movies.

In [20]:
# YOUR CODE HERE


8.84

#### Task 4: Add a new movie to the dictionary

Add "Oppenheimer" with this data:
- Year: 2023
- Genre: "Biography"
- Ratings: IMDB: 8.5, RottenTomatoes: 93, Metacritic: 88

In [2]:
# YOUR CODE HERE


# Introduction to Numpy

![NumPy](../figures/NumPy.png)

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. Along with pandas, NumPy is considered the lingua franca for scientific computation. It provides:

- Efficient array operations
- Mathematical functions
- Random number generation
- Linear algebra operations
- Integration with other libraries like pandas, Matplotlib, and SciPy

Although many data analysts prefer pandas for most statistical analyses, understanding array-oriented computing with NumPy is essential for advanced data analysis tasks.

We will focus primarily on NumPy’s array-based operations for data munging and cleaning, subsetting and filtering, transforming data, and performing key statistical operations. This includes tasks such as reshaping, aggregating, and summarizing data.

## Loading NumPy

In [None]:
import numpy as np

## NumPy Arrays

NumPy arrays are the core of the library. They are faster and more **memory-efficient** than Python lists. They also support vectorized computations, replacing loops and comprehensions.

### Creating NumPy Arrays

NumPy arrays are iterable objects similar to Python lists, but they are homogeneous: all elements must have the same data type.

We can create a NumPy array from a list as follows:

In [None]:
# From a Python list
my_list = [1, 2, 3, 4]
my_array = np.array(my_list)
print(my_array)  # [1 2 3 4]
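
The homogeneity requirement becomes visible when a list mixes types. In the sketch below, NumPy quietly converts the integers to floats so that every element shares one `dtype`. The exact integer dtype depends on the platform, so it is not asserted here.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float forces the whole array to float

print(ints.dtype)    # an integer dtype (int64 on most platforms)
print(mixed.dtype)   # float64
print(mixed)         # [1.  2.5 3. ]
```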

In [None]:
# Range of numbers
arr_range = np.arange(0, 10, 2)  # start, stop, step (note stop is exclusive)
print(arr_range)

# Linearly spaced numbers
arr_linspace = np.linspace(0, 1, 5)  # 5 numbers between 0 and 1
print(arr_linspace)

### Array Operations

NumPy supports element-wise operations, matrix operations, and aggregations. It promotes vectorized operations instead of loops for efficiency.

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Arithmetic operations
print(a + b)   # [5 7 9]
print(a - b)   # [-3 -3 -3]
print(a * b)   # [4 10 18] element-wise
print(a / b)   # [0.25 0.4 0.5]


To add a number to each element of a Python list, we needed a for-loop or a list comprehension. For example,

In [None]:
[x + 5 for x in my_list]

In NumPy, this task is accomplished through a vectorized computation as shown below:

In [None]:
my_array + 5

### Speed Advantage

As demonstrated in Wes McKinney's book (p. 85), NumPy offers a significant performance advantage, often running 10 to 100 times faster than pure Python. For example, consider a list of one million integers compared to an equivalent NumPy array.

In [None]:
my_list = list(range(1_000_000))
my_array = np.arange(1_000_000)

Let's divide each sequence by 2 and compare the time difference. 

In [None]:
%timeit [x / 2 for x in my_list]

In [None]:
%timeit my_array / 2
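
`%timeit` is an IPython magic command and only works inside Jupyter or IPython. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The array size and the number of repetitions below are arbitrary choices, and the timings will vary by machine.

```python
import timeit
import numpy as np

my_list = list(range(100_000))
my_array = np.arange(100_000)

# Time 20 repetitions of each approach
list_time = timeit.timeit(lambda: [x / 2 for x in my_list], number=20)
array_time = timeit.timeit(lambda: my_array / 2, number=20)

print(f"List comprehension: {list_time:.4f} s for 20 runs")
print(f"NumPy array:        {array_time:.4f} s for 20 runs")
```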

### Memory Efficiency

A NumPy array is a collection of elements of a single data type stored in **contiguous** memory locations. Because the elements are homogeneous, the array can be densely packed, which results in lower memory consumption. For large numerical data, always use NumPy arrays for better memory efficiency and performance.
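
A rough way to see the difference is to compare byte counts. In the sketch below, the list's footprint is estimated as the container plus each integer object, which is an approximation. The array's data buffer is exactly 8 bytes per element because `dtype=np.int64` is set explicitly.

```python
import sys
import numpy as np

n = 100_000
my_list = list(range(n))
my_array = np.arange(n, dtype=np.int64)

# Approximate list footprint: the list object plus every int it points to
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print("List (container + int objects):", list_bytes, "bytes")
print("NumPy array data buffer:", my_array.nbytes, "bytes")  # 800000
```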

### Mathematical functions

NumPy provides vectorized math operations, meaning they operate on entire arrays at once, which is much faster and cleaner than Python loops.

| Function            | Description                               |
| ------------------- | ----------------------------------------- |
| `np.add(x, y)`      | Element-wise addition                     |
| `np.subtract(x, y)` | Element-wise subtraction                  |
| `np.multiply(x, y)` | Element-wise multiplication               |
| `np.divide(x, y)`   | Element-wise division                     |
| `np.power(x, y)`    | Raise elements of `x` to the power of `y` |
| `np.mod(x, y)`      | Element-wise remainder                    |
| `np.sqrt(x)`        | Square root of `x`                        |
| `np.exp(x)`         | $e^x$                                     |
| `np.log(x)`         | Natural logarithm $\ln(x)$                |
| `np.log10(x)`       | Base-10 logarithm                         |

In [None]:
x = np.array([10, 20, 30, 40])
y = np.array([1, 2, 3, 4])

print("Add:", np.add(x, y))
print("Subtract:", np.subtract(x, y))
print("Multiply:", np.multiply(x, y))
print("Divide:", np.divide(x, y))
print("Power:", np.power(x, 2))
print("Modulus:", np.mod(x, y))
print("Square root:", np.sqrt(x))
print("Exponentiation:", np.exp(y))
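
The two logarithm functions from the table are not covered by the cell above. Here is a minimal sketch using hypothetical income values, a variable that is often log-transformed in social science research.

```python
import numpy as np

incomes = np.array([1_000, 10_000, 100_000])  # hypothetical values

log10_incomes = np.log10(incomes)
print(log10_incomes)                  # [3. 4. 5.]

natural = np.log(np.array([1.0, np.e]))
print(natural)                        # [0. 1.]

# exp() undoes log(), up to floating-point rounding
recovered = np.exp(np.log(incomes))
print(np.allclose(recovered, incomes))  # True
```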

### Rounding Functions

| Function         | Description                       |
| ---------------- | --------------------------------- |
| `np.round(x, n)` | Round to `n` decimal places       |
| `np.floor(x)`    | Round **down** to nearest integer |
| `np.ceil(x)`     | Round **up** to nearest integer   |
| `np.trunc(x)`    | Truncate decimal part             |

In [None]:
vals = np.array([1.234, 2.678, -3.456])

print("Round to 2 decimals:", np.round(vals, 2))
print("Floor:", np.floor(vals))
print("Ceil:", np.ceil(vals))
print("Truncate:", np.trunc(vals))
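
Two details are easy to miss. First, `np.round()` rounds values that are exactly halfway to the nearest *even* number rather than always rounding up. Second, `np.floor()` and `np.trunc()` differ for negative numbers. A small sketch:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded = np.round(halves)
print(rounded)            # [0. 2. 2. 4.]  (ties go to the even neighbor)

negative = np.array([-3.456])
print(np.floor(negative))  # [-4.]  (toward negative infinity)
print(np.trunc(negative))  # [-3.]  (toward zero)
```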

### Aggregation Functions

Aggregate functions reduce an array to a single value or along a specified axis.

| Function         | Purpose             |
| ---------------- | ------------------- |
| `np.size(arr)`   | Number of elements  |
| `np.sum(arr)`    | Sum of all elements |
| `np.mean(arr)`   | Mean (average)      |
| `np.median(arr)` | Median              |
| `np.std(arr)`    | Standard deviation  |
| `np.var(arr)`    | Variance            |
| `np.min(arr)`    | Minimum value       |
| `np.max(arr)`    | Maximum value       |


In [None]:
data = np.array([1, 3, 5, 7, 9])

print("N:", np.size(data))
print("Sum:", np.sum(data))
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data))
print("Variance:", np.var(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
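
By default, `np.std()` and `np.var()` divide by $N$ (the population formulas). Many statistics packages report the sample versions, which divide by $N - 1$. In NumPy this is controlled by the `ddof` ("delta degrees of freedom") argument, as this sketch shows:

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9])

pop_var = np.var(data)             # divides by N
sample_var = np.var(data, ddof=1)  # divides by N - 1

print("Population variance:", pop_var)     # 8.0
print("Sample variance:", sample_var)      # 10.0
print(f"Population SD: {np.std(data):.3f}")       # 2.828
print(f"Sample SD: {np.std(data, ddof=1):.3f}")   # 3.162
```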

### Aggregate Functions with Different Axes

For multidimensional arrays, `ndarray`, the NumPy aggregate functions can work with different axes. Let's go step-by-step to understand how aggregations like `sum()`, `mean()`, `max()`, etc., can **collapse** dimensions of arrays based on the axis you specify. 

The concept of axes is summarized in the following table:

| Axis       | Meaning in a 2D array                                                                    |
| ---------- | ---------------------------------------------------------------------------------------- |
| **axis=0** | Operates **down the rows**, collapsing **rows**, result is **one value per column**      |
| **axis=1** | Operates **across the columns**, collapsing **columns**, result is **one value per row** |

They can be concisely denoted as: axis=0 ↓    axis=1 →.

![Array Dimensions and Axes](../figures/array_dimensions_axes.png)

Source: Vaughan, L. (2023) Python Tools for Scientists.

In [None]:
# Create a 3x4 array
A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

print("Array A:\n", A)
print("\nShape:", A.shape)

#### Example: `np.sum()` in 2D

In [None]:
# Sum across rows (axis=0)
sum_axis0 = np.sum(A, axis=0)
print("Sum along axis=0 ↓ (column-wise):", sum_axis0)

# Sum across columns (axis=1)
sum_axis1 = np.sum(A, axis=1)
print("Sum along axis=1 → (row-wise):", sum_axis1)

# Sum of all elements (no axis specified)
total_sum = np.sum(A)
print("Sum of all elements:", total_sum)

#### Example: `np.sum()` in 3D

In [None]:
B = np.arange(1, 25).reshape(2, 3, 4)
print("Array B:\n", B)
print("Shape:", B.shape)

| Axis       | Meaning in 3D array                                             |
| ---------- | --------------------------------------------------------------- |
| **axis=0** | Operates **between blocks**, collapsing **the first dimension** |
| **axis=1** | Operates **between rows** inside each block                     |
| **axis=2** | Operates **between columns** inside each row                    |

In [None]:
print("Sum over axis=0:\n", np.sum(B, axis=0))  # Shape becomes (3,4)
print("\nSum over axis=1:\n", np.sum(B, axis=1))  # Shape becomes (2,4)
print("\nSum over axis=2:\n", np.sum(B, axis=2))  # Shape becomes (2,3)

### In-Depth Illustration

Imagine we are monitoring temperature readings (in °F) for 2 cities, over 3 days, at 4 different times per day:

- Cities: ["Austin", "Dallas"]
- Days: ["Day 1", "Day 2", "Day 3"]
- Times per day: ["Morning", "Afternoon", "Evening", "Night"]

Shape of Data
- Shape = (2 cities, 3 days, 4 time slots)

In [None]:
# Temperature readings: shape (2, 3, 4)
temps = np.array([
    # Austin (3 days x 4 times)
    [
        [85, 90, 88, 74],  # Day 1
        [86, 96, 89, 85],  # Day 2
        [81, 97, 90, 83]   # Day 3
    ],

    # Dallas (3 days x 4 times)
    [
        [82, 97, 95, 81],  # Day 1
        [83, 98, 96, 82],  # Day 2
        [84, 99, 92, 83]   # Day 3
    ]
])

cities = ["Austin", "Dallas"]
days = ["Day 1", "Day 2", "Day 3"]
times = ["Morning", "Afternoon", "Evening", "Night"]

print("Temperature Array:\n", temps)
print("Shape:", temps.shape)


#### Understanding the Axes

| Axis       | Represents                                 | Size |
| ---------- | ------------------------------------------ | ---- |
| **axis=0** | Cities                                     | 2    |
| **axis=1** | Days                                       | 3    |
| **axis=2** | Times (Morning, Afternoon, Evening, Night) | 4    |

We can access a specific element of `temps`:

```python
temps[city, day, time]
```

In [None]:
# Temperature in Austin, Day 3, Morning

austin_day3_morning = temps[0, 2, 0]
print("Austin, Day 3, Morning:", austin_day3_morning, "°F")
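
The same indexing pattern extends to whole slices by putting `:` in the position of any axis you want to keep. The sketch below repeats the `temps` array so it runs on its own, and uses `list.index()` to turn a label into its position, which is one convenient option rather than the only one.

```python
import numpy as np

temps = np.array([
    [[85, 90, 88, 74], [86, 96, 89, 85], [81, 97, 90, 83]],  # Austin
    [[82, 97, 95, 81], [83, 98, 96, 82], [84, 99, 92, 83]],  # Dallas
])
cities = ["Austin", "Dallas"]
times = ["Morning", "Afternoon", "Evening", "Night"]

# Everything for Austin: a (3 days x 4 times) block
print(temps[0].shape)          # (3, 4)

# Day 1 for both cities: one row per city
day1 = temps[:, 0, :]
print(day1)

# Dallas afternoons across all three days
dallas_afternoons = temps[cities.index("Dallas"), :, times.index("Afternoon")]
print(dallas_afternoons)       # [97 98 99]
```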

#### Practice Exercise 5

Collapse Across Days and Times → Mean per City. 

- We want to average across both days (axis=1) and times (axis=2), leaving only the city dimension. 
- Let's save the result as `mean_per_city`.
- Print `mean_per_city`.

In [None]:
# YOUR CODE HERE


#### Practice Exercise 6

Collapse Across Cities and Times → Mean per Day.

- Here, we average across axis=0 (cities) and axis=2 (times), keeping days.
- Let's save the result as `mean_per_day`.
- Print `mean_per_day`.

In [None]:
# YOUR CODE HERE


#### Practice Exercise 7

Collapse Across Cities and Days → Mean per Time Slot.

- Now, we average across axis=0 (cities) and axis=1 (days), leaving only time slots.
- Let's save the result as `mean_per_time`.
- Print `mean_per_time`.

In [None]:
# YOUR CODE HERE


#### Collapse All Axes → Overall Mean

If you omit axis, NumPy averages across all elements.

In [None]:
overall_mean = np.mean(temps)
print("Overall average temperature:", overall_mean.round(2))

| Operation                                       | Code                         | Remaining Shape |
| ----------------------------------------------- | ---------------------------- | --------------- |
| Mean **per city** (collapse days + times)       | `np.mean(temps, axis=(1,2))` | `(2,)`          |
| Mean **per day** (collapse cities + times)      | `np.mean(temps, axis=(0,2))` | `(3,)`          |
| Mean **per time slot** (collapse cities + days) | `np.mean(temps, axis=(0,1))` | `(4,)`          |
| Overall mean (collapse everything)              | `np.mean(temps)`             | Scalar          |
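
The tuple form of `axis` shown in the table works with other aggregations as well. As a sketch, the code below finds the hottest and coldest readings per city and the hottest reading per time slot, repeating the `temps` array so that it runs on its own.

```python
import numpy as np

temps = np.array([
    [[85, 90, 88, 74], [86, 96, 89, 85], [81, 97, 90, 83]],  # Austin
    [[82, 97, 95, 81], [83, 98, 96, 82], [84, 99, 92, 83]],  # Dallas
])

max_per_city = np.max(temps, axis=(1, 2))   # collapse days + times
min_per_city = np.min(temps, axis=(1, 2))
max_per_time = np.max(temps, axis=(0, 1))   # collapse cities + days

print("Max per city:", max_per_city)        # [97 99]
print("Min per city:", min_per_city)        # [74 81]
print("Max per time slot:", max_per_time)   # [86 99 96 85]
```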

## Multidimensional Arrays (`ndarray`)

One of the core objects in the NumPy library is the n-dimensional array object, **ndarray**. It represents a multidimensional, **homogeneous** array of elements and serves as a fundamental data structure for scientific computing in Python, enabling efficient storage, fast mathematical operations, and manipulation of large datasets.

### Creating `ndarray`

In [None]:
# Create a 2D array of shape (3, 4)
arr2d = np.array([[1, 2, 3, 4], 
                  [5, 6, 7, 8], 
                  [9, 10, 11, 12]])
print(arr2d)

You can use standard indexing to access (select or modify) a specific element of a 2-D array by its row index and column index.

In [None]:
arr2d[2, 2] # Row 2, Column 2

Alternatively, we can access this 2-d array in two steps, or recursively.

In [None]:
arr2d[2][2] # Row 2, Column 2

To generalize this in a higher dimensional array, let's consider a 3-dimensional array of shape `(2, 3, 4)`. 

In [None]:
# Create a 3D array of shape (2, 3, 4)
arr3d = np.array([
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]
    ],
    [
        [13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]
    ]
])

print(arr3d)

Before we discuss how to access the elements of this array, can you think of a simpler (easier) way to create the same array? We can create an `ndarray` by reshaping a one-dimensional array as follows:

In [None]:
arr3d = np.arange(1, 25).reshape((2, 3, 4))
print(arr3d)

Similar to the `range()` function, `np.arange()` returns evenly spaced values within a given interval.

```python
np.arange([start,] stop[, step,])
```

The only required argument is `stop`. The interval is half-open, that is, open on the right side: `[start, stop)` excludes `stop`. Try `np.arange?` for more information.

Now, `11` is in **Block** 0, **Row** 2, and **Column** 2. So, we can access the element as follows:

In [None]:
arr3d[0, 2, 2] 

or recursively

In [None]:
arr3d[0][2][2]

The second method is a little slower because each pair of brackets creates an intermediate array, but it works as well.
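
To see what the chained form does, the sketch below breaks `arr3d[0][2][2]` into its intermediate steps. The names `block`, `row`, and `value` are introduced only for this illustration.

```python
import numpy as np

arr3d = np.arange(1, 25).reshape((2, 3, 4))

block = arr3d[0]   # first step: a (3, 4) array
row = block[2]     # second step: a (4,) array
value = row[2]     # third step: a single element

print(block.shape, row.shape, value)   # (3, 4) (4,) 11

# The one-step form reaches the same element without the intermediates
print(arr3d[0, 2, 2] == value)         # True
```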

### Special Functions for creating `ndarray` 

Several helper functions are available in NumPy to generate special `ndarray` objects.

In [None]:
# Array of zeros
zeros = np.zeros((2, 3))
print(zeros)

In [None]:
# Array of ones
ones = np.ones((3, 2))
print(ones)

In [None]:
# Array of a specific number
full = np.full((3, 3), 99)
print(full)

In [None]:
# Identity matrix 3x3
I = np.eye(3)
print(I)

### Array Attributes

NumPy ndarrays have several **attributes** that are easily accessible and can be retrieved as follows:

In [None]:
arr = np.array([[1, 2, 3], 
                [4, 5, 6]])

print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2 dimensions
print(arr.size)   # 6 elements
print(arr.dtype)  # data type


### Indexing and Slicing `ndarray`

![Slicing a 2D array](../figures/slicing_2d_array.png)

Source: Vaughan, L. (2023) Python Tools for Scientists.

#### Indexing a 1D Array

In [None]:
arr = np.array([10, 20, 30, 40, 50])

# Indexing
print(arr[0])   # 10
print(arr[-1])  # 50

# Slicing
print(arr[1:4])   # [20 30 40]
print(arr[:3])    # [10 20 30]
print(arr[::2])   # [10 30 50]

#### Boolean Indexing and Filtering

In [None]:
arr = np.array([10, 20, 30, 40, 50])

# Condition
mask = arr > 25
print(mask)          # [False False  True  True  True]
print(arr[mask])     # [30 40 50]

# Combine conditions
print(arr[(arr > 20) & (arr < 50)])  # [30 40]
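
Conditions can also be combined with `|` (or) and negated with `~` (not). Each condition needs its own parentheses, and Python's `and`/`or` keywords do not work element-wise. The sketch below flags out-of-range codes in a hypothetical array of 0 to 4 survey scores.

```python
import numpy as np

# Hypothetical responses; valid codes are 0 through 4
scores = np.array([0, 3, 99, 2, -1, 4])

invalid = (scores < 0) | (scores > 4)   # True where the code is out of range

print(scores[invalid])    # [99 -1]
print(scores[~invalid])   # [0 3 2 4]
print(invalid.sum())      # 2 (True counts as 1)
```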


#### Indexing a 2D Array

In [None]:
arr2d = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

print(arr2d[0, 1])    # 20
print(arr2d[:, 1])    # all rows, column 1 -> [20 60 100]
print(arr2d[1, :])    # all columns, row 1 -> [50 60 70 80]

#### Fancy Indexing (Using Lists/Arrays of Indices)

Fancy indexing in NumPy refers to advanced indexing where you use arrays or lists of indices (instead of just slices or single integers) to access or modify multiple elements of a NumPy array at once.

In [None]:
# Selecting specific rows

print(arr2d[[0, 2]])  # Select rows 0 and 2

In [None]:
# Selecting specific columns

print(arr2d[:, [1, 3]]) # Select columns 1 and 3

In [None]:
# Selecting specific (row, column) pairs

print(arr2d[[0, 1, 2], [3, 2, 1]]) # Picks (0,3), (1,2), (2,1) → [40, 70, 100]

#### Mixing Slicing and Fancy Indexing

You can combine slicing and fancy indexing for more complex selections.

In [None]:

print(arr2d[0:2, [1, 3]])  # Select rows 0 and 1, columns 1 and 3

#### Reshaping and Stacking

In [None]:
arr = np.arange(1, 13)  # 1D array with 12 elements

# Reshape
arr2d = arr.reshape(3, 4)
print(arr2d)

# Flatten
print(arr2d.flatten())

# Stacking
a = np.array([1, 2])
b = np.array([3, 4])
print(np.vstack((a, b)))  # vertical
print(np.hstack((a, b)))  # horizontal
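
Two related conveniences are sketched below. Passing `-1` to `reshape()` lets NumPy infer that dimension from the array's size. Also, `ravel()` flattens like `flatten()` but returns a view of the original data when it can, whereas `flatten()` always copies. `np.shares_memory()` is used here just to check which is which.

```python
import numpy as np

arr = np.arange(1, 13)

arr2d = arr.reshape(3, -1)   # -1 means "work out this dimension for me"
print(arr2d.shape)           # (3, 4)

flat_copy = arr2d.flatten()  # always a new copy
flat_view = arr2d.ravel()    # a view here, because arr2d is contiguous

print(np.shares_memory(arr2d, flat_copy))  # False
print(np.shares_memory(arr2d, flat_view))  # True
```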


#### Summary of Indexing Methods

| Method               | Example         | Use Case              |
| -------------------- | --------------- | --------------------- |
| **Basic indexing**   | `arr[1, 2]`     | Single element        |
| **Row slice**        | `arr[1, :]`     | All columns of a row  |
| **Column slice**     | `arr[:, 2]`     | All rows of a column  |
| **Submatrix slice**  | `arr[0:2, 1:3]` | Subset block          |
| **Negative index**   | `arr[-1, -1]`   | Last row/column       |
| **Fancy indexing**   | `arr[:, [1,3]]` | Specific columns      |
| **Boolean indexing** | `arr[arr>50]`   | Conditional selection |


### View, Shallow Copy, and Deep Copy

In NumPy, understanding views, shallow copies, and deep copies is crucial because they determine how data is shared or duplicated in memory. These concepts affect performance and memory usage, and they explain why modifying one array can unexpectedly change another.

#### View in NumPy

A view is a different array object that shares the same data in memory with the original array.

- No data is copied.
- Changes made in one will reflect in the other.

Created using:
- Slicing (arr[1:4])
- reshape() (sometimes)
- .view() method

In [None]:
# Example: View via Slicing

a = np.array([1, 2, 3, 4, 5])
b = a[1:4]   # view
b[0] = 99    # change in view affects the original

print("Original array:", a)  # [ 1 99  3  4  5 ]
print("View array:", b)      # [99  3  4]

✔️  **Key point**: Both `a` and `b` share the same underlying data buffer.
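
If you are unsure whether an operation returned a view, you can check. The sketch below uses `np.shares_memory()` and the `.base` attribute. It also shows that fancy indexing returns a copy, and that `reshape()` falls back to a copy when the requested layout cannot be expressed on the existing buffer (here, after a transpose).

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

sliced = a[1:4]       # slicing gives a view
picked = a[[1, 2, 3]] # fancy indexing gives a copy

print(sliced.base is a)             # True
print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, picked))  # False

m = np.arange(6).reshape(2, 3)
print(np.shares_memory(m, m.reshape(3, 2)))  # True  (view)
print(np.shares_memory(m, m.T.reshape(6)))   # False (copy was needed)
```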

#### Shallow Copy

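A shallow copy creates a new array object, but the data buffer is still shared. NumPy's own documentation also calls the result of `.view()` a view; the term "shallow copy" is used here to stress that the new object carries its own metadata.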

- Think of it as a view with its own metadata, such as shape or dtype.
- Changes to the data affect both arrays.
- Changes to the shape of one don’t affect the other.

Created using:
- ndarray.view()
- Certain functions like reshape() (if possible)

In [None]:
# Example: Shallow Copy

a = np.array([[1, 2, 3], 
              [4, 5, 6]])

b = a.view()  # shallow copy

b.shape = (3, 2)  # metadata change only affects `b`

print("Shape of a:", a.shape)  # (2, 3)
print("Shape of b:", b.shape)  # (3, 2)

b[0, 0] = 99
print("\nOriginal array:\n", a)

✔️ **Key point**: The shape was independent, but data changes are shared.
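
Whether `reshape()` shares data depends on the memory layout of the array. The following sketch shows one case of each kind; the names `r_view` and `r_copy` are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r_view = a.reshape(3, 2)   # contiguous data: NumPy can reuse the buffer
r_copy = a.T.reshape(6)    # transposed layout cannot be flattened in place, so data is copied

print(np.shares_memory(a, r_view))  # True
print(np.shares_memory(a, r_copy))  # False
```

When it matters, checking with `np.shares_memory` is safer than guessing.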

#### Deep Copy

A deep copy creates a completely new array and data buffer.

- No data sharing between the original and the copy.
- Safe for independent modifications.

Created using:
- np.copy()
- ndarray.copy()

In [None]:
# Example: Deep Copy

a = np.array([1, 2, 3, 4, 5])
b = a.copy()   # deep copy

b[0] = 99
print("Original array:", a)  # [1 2 3 4 5]
print("Deep copy:", b)       # [99  2  3  4  5]

✔️ **Key point**: Changes to `b` do **NOT** affect `a`.

#### Comparison of View, Shallow Copy, and Deep Copy

| **Feature**        | **View** (Slice) | **Shallow Copy (`.view()`)** | **Deep Copy (`.copy()`)** |
| ------------------ | ---------------- | ---------------------------- | ------------------------- |
| New object created | ✅ Yes            | ✅ Yes                        | ✅ Yes                     |
| Shares data buffer | ✅ Yes            | ✅ Yes                        | ❌ No                      |
| Shape independence | ✅ Yes            | ✅ Yes                        | ✅ Yes                     |
| Data independence  | ❌ No             | ❌ No                         | ✅ Yes                     |
| Speed              | ⚡ Very Fast      | ⚡ Very Fast                  | 🐢 Slower (data copied)   |


#### When to Use Which

| **Scenario**                         | **Recommended**                                  |
| ------------------------------------ | ------------------------------------------------ |
| Large datasets, performance critical | **View** or **Shallow Copy** (avoid duplication) |
| Data must remain unchanged and safe  | **Deep Copy**                                    |
| Modifying shape only, not data       | **Shallow Copy**                                 |
| Completely independent copy needed   | **Deep Copy**                                    |

### Array Arithmetic

Computation with NumPy arrays is vectorized: arithmetic and comparison operators such as `+`, `*`, and `>` are applied element by element to whole arrays, with no explicit Python loop.

In [None]:
A = np.arange(1, 10).reshape((3, 3))
A

In [None]:
B = np.arange(9, 0, -1).reshape((3, 3))
B

In [None]:
A + B

In [None]:
B - A

In [None]:
A * B

In [None]:
A / B

In [None]:
A > B
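
To see what vectorization saves you, here is a sketch that computes `A + B` twice: once with explicit loops and once with the `+` operator. The name `looped` is illustrative.

```python
import numpy as np

A = np.arange(1, 10).reshape((3, 3))
B = np.arange(9, 0, -1).reshape((3, 3))

# Explicit Python loops: add matching elements one at a time
looped = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        looped[i, j] = A[i, j] + B[i, j]

# Vectorized: one expression, no explicit loop
vectorized = A + B

print(np.array_equal(looped, vectorized))  # True
print(vectorized)                          # every entry is 10
```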

### Array Broadcasting

A set of strict rules governs how NumPy performs computations on two arrays whose shapes differ.

![Rules of Broadcasting](../figures/broadcasting.png)

Source: Jake VanderPlas (2023). Python Data Science Handbook

#### Rule 1

If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

In [None]:
np.arange(3) + 5

#### Rule 2

If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

In [None]:
np.ones((3, 3)) + np.arange(3)

In [None]:
np.arange(3).reshape((3, 1)) + np.arange(3)

#### Rule 3

If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

```python
np.ones((3, 3)) + np.zeros((4, 4))

ValueError: operands could not be broadcast together with shapes (3,3) (4,4)
```
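
You can check the three rules without building any arrays. The sketch below uses `np.broadcast_shapes`, assuming NumPy 1.20 or later; the variable names are illustrative.

```python
import numpy as np

# Rules 1 and 2: (3,) is padded to (1, 3), then stretched to (3, 3)
shape_a = np.broadcast_shapes((3, 3), (3,))
print(shape_a)   # (3, 3)

# Rule 2 applied to both operands: (3, 1) and (1, 3) both stretch to (3, 3)
shape_b = np.broadcast_shapes((3, 1), (3,))
print(shape_b)   # (3, 3)

# Rule 3: sizes 3 and 4 disagree and neither is 1
try:
    np.broadcast_shapes((3, 3), (4, 4))
    compatible = True
except ValueError as err:
    compatible = False
    print("Not compatible:", err)
```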

### Random Number Functions

NumPy provides a powerful random module to generate random numbers from a variety of distributions.

| Function                             | Description                    |
| ------------------------------------ | ------------------------------ |
| `np.random.rand(n)`                  | Uniform random numbers \[0,1)  |
| `np.random.randn(n)`                 | Standard normal distribution   |
| `np.random.normal(loc, scale, size)` | Normal distribution ($\mu$, $\sigma$, $N$) |
| `np.random.randint(low, high, size)` | Random integers from `low` (inclusive) to `high` (exclusive) |
| `np.random.choice(arr)`              | Random selection from an array |
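
The functions in the table belong to NumPy's older `np.random` interface. The Generator interface created by `np.random.default_rng` offers close equivalents. The following is a sketch of that mapping; the seed and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2222)  # seed value chosen arbitrarily

u = rng.random(5)                            # uniform on [0, 1), like np.random.rand(5)
z = rng.standard_normal(5)                   # standard normal, like np.random.randn(5)
x = rng.normal(loc=50, scale=5, size=5)      # normal with mean 50 and standard deviation 5
k = rng.integers(low=1, high=11, size=5)     # integers 1 to 10 (high is exclusive)
c = rng.choice(np.array([10, 20, 30]))       # one random element of the array

print(u, z, x, k, c, sep="\n")
```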


### Reproducing Random Numbers

Random number generators produce pseudo-random numbers. As a result, the "random" numbers from these functions can be reproduced if given a specific seed number. This is useful for replication purposes.

In [None]:
# Create a random generator with a fixed seed
rng = np.random.default_rng(seed=2222)

# Generate 10 random numbers from N(μ=50, σ=5)
random_numbers = rng.normal(loc=50, scale=5, size=10)
print(random_numbers)
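
To see reproducibility in action, the sketch below creates two generators with the same seed and one with a different seed. The second seed (`1111`) is an arbitrary value chosen for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(seed=2222)
rng_b = np.random.default_rng(seed=2222)   # same seed as rng_a
rng_c = np.random.default_rng(seed=1111)   # different seed (arbitrary)

draw_a = rng_a.normal(loc=50, scale=5, size=10)
draw_b = rng_b.normal(loc=50, scale=5, size=10)
draw_c = rng_c.normal(loc=50, scale=5, size=10)

print(np.array_equal(draw_a, draw_b))  # True: same seed, same sequence
print(np.array_equal(draw_a, draw_c))  # False: different seed, different sequence
```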

### Visualizing Random Numbers

In [None]:
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=2222)
data = rng.normal(loc=50, scale=5, size=1000)

plt.hist(data, bins=30, edgecolor='black')
plt.title("Normal Distribution (μ=50, σ=5)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

### Matrix Operations

The following table summarizes matrix operations that are commonly used in numerical and statistical analyses:

| Operation             | NumPy Function          |
| --------------------- | ----------------------- |
| Addition              | `+`                     |
| Subtraction           | `-`                     |
| Scalar Multiplication | `*`                     |
| Matrix Multiplication | `@` or `np.dot`         |
| Transpose             | `.T`                    |
| Determinant           | `np.linalg.det`         |
| Inverse               | `np.linalg.inv`         |
| Rank                  | `np.linalg.matrix_rank` |
| Solve Equations       | `np.linalg.solve`       |
| Eigenvalues/Vectors   | `np.linalg.eig`         |
| SVD                   | `np.linalg.svd`         |


In [None]:
A = np.array([[1, 2], 
              [3, 4]])

B = np.array([[5, 6], 
              [7, 8]])

# Element-wise multiplication
print(A * B)

In [None]:
# Matrix multiplication
print(A @ B)

In [None]:
# Matrix multiplication with `dot()`
print(np.dot(A, B))

In [None]:
# Transpose
print(A.T)
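
The table above also lists `np.linalg.solve`, which the cells above do not demonstrate. Here is a minimal sketch that solves the system $Ax = b$; the right-hand side `b` is a made-up example.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
b = np.array([5, 11])   # hypothetical right-hand side

# Solve  1x + 2y = 5  and  3x + 4y = 11
x = np.linalg.solve(A, b)

print(x)                       # approximately [1. 2.]
print(np.allclose(A @ x, b))   # True: the solution satisfies the system
```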

### Matrix Algebra

Matrix algebra is essential in many areas such as:

- Machine learning
- Data science
- Computer graphics
- Statistics and optimization

NumPy makes it very efficient to handle matrices and perform linear algebra operations.

In NumPy:
- Vectors → 1D arrays
- Matrices → 2D arrays (numpy.ndarray)
- Higher-order tensors → arrays with three or more dimensions

### Creating Matrices

In [None]:
# Manually create a matrix
A = np.array([[1, 2], [3, 4]])
print("Matrix A:\n", A)

# Zeros and ones matrices
zero_matrix = np.zeros((2, 3))
ones_matrix = np.ones((3, 3))
print("\nZero Matrix:\n", zero_matrix)
print("\nOnes Matrix:\n", ones_matrix)

# Zeros and ones matching the shape of another array
zero_like_A = np.zeros_like(A)
ones_like_A = np.ones_like(A)
print("\nZero Like A:\n", zero_like_A)
print("\nOnes Like A:\n", ones_like_A)

# Identity matrix
I = np.eye(3)
print("\nIdentity Matrix:\n", I)

# Random matrix
random_matrix = np.random.rand(2, 2)  # uniform distribution on [0, 1)
print("\nRandom Matrix:\n", random_matrix)

### Matrix Properties

In [None]:
print("Shape of A:", A.shape, "\n")
print("Transpose of A:\n", A.T, "\n")
print("Flatten A:", A.flatten())

A brief aside: we can reproduce the result of the `flatten()` method with a list comprehension (which returns a Python list rather than an array).

In [None]:
[x for row in A for x in row]

Can you flatten this?

In [None]:
A = np.array([
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]
    ],
    [
        [13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]
    ]
])

In [None]:
A.flatten()

I mean using a list comprehension.

In [None]:
np.array([val for depth in A for row in depth for val in row])

### Matrix Multiplication

In [None]:
A = np.array([[1, 2], [3, 4]])
print("Matrix A:\n", A, "\n")

B = np.array([[5, 6], [7, 8]])
print("Matrix B:\n", B)

# Element-wise multiplication
print("\nElement-wise A * B:\n", A * B)

# Matrix multiplication
print("\nMatrix multiplication (A @ B):\n", A @ B)

# Matrix multiplication A.dot(B)
print("\nMatrix multiplication (A.dot(B)):\n", A.dot(B))

### Determinant, Inverse, and Rank

In [None]:
det_A = np.linalg.det(A)
print("Determinant of A:", det_A)

# Inverse of A
A_inv = np.linalg.inv(A)
print("\nInverse of A:\n", A_inv)

# Verify A @ A_inv = Identity
print("\nA @ A_inv:\n", A @ A_inv)

# Rank of A
rank_A = np.linalg.matrix_rank(A)
print("\nRank of A:", rank_A)

The rank of a matrix is the maximum number of linearly independent rows or columns it contains, which is equivalent to the dimension of the vector space spanned by its rows or columns.
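
A matrix whose rows are not linearly independent makes the definition concrete. In the sketch below, the matrix `S` is a made-up example whose second row is twice the first.

```python
import numpy as np

# Hypothetical matrix: the second row is 2 x the first row
S = np.array([[1, 2],
              [2, 4]])

rank_S = np.linalg.matrix_rank(S)
det_S = np.linalg.det(S)

print("Rank of S:", rank_S)         # 1: only one linearly independent row
print("Determinant of S:", det_S)   # zero, up to floating-point rounding

# A matrix with determinant zero has no inverse
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("No inverse:", err)
```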

### Eigenvalues and Eigenvectors

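An eigenvector $v$ of a square matrix $A$ is a nonzero vector that $A$ only rescales, by a factor $\lambda$ called the eigenvalue:

$$Av = \lambda v$$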

In [None]:
A = np.array([[4, 2],
              [1, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues (λ):", eigenvalues)
print("\nEigenvectors (v):\n", eigenvectors)
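
We can check the result against the defining equation. `np.linalg.eig` returns the eigenvectors as the columns of the second array, so column `i` pairs with eigenvalue `i`. The loop below is a small sketch of that check; the names `lam` and `v` are illustrative.

```python
import numpy as np

A = np.array([[4, 2],
              [1, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same position
for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    print(lam, np.allclose(A @ v, lam * v))   # True for every pair
```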

### Practice Exercise 8

Create two 3×3 matrices `M1` and `M2` with random **integers** between 1 and 10 using `np.random.randint(low, high, size)`. Note that `high` is exclusive, so pass `high=11` to include 10.

Compute: 
   - `M1 + M2`
   - `M1 - M2`
   - Element-wise multiplication
   - Matrix multiplication `M1 @ M2`

The expected results should resemble the following (note that the numbers may differ):

```python
M1 = 
 [[10  3  3]
 [ 6  3  2]
 [ 7  7  9]] 

M2 = 
 [[4 5 3]
 [7 5 1]
 [7 1 7]] 

M1 * M2 = 
 [[40 15  9]
 [42 15  2]
 [49  7 63]] 

M1 @ M2 = 
 [[ 82  68  54]
 [ 59  47  35]
 [140  79  91]] 
```

In [None]:
# YOUR CODE HERE


### Practice Exercise 9

Given matrix:
$$
C = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}
$$

- Find the determinant, `det_C`.
- Find the inverse, `C_inv`.
- Verify $C @ C^{-1} = I$.

The expected results are as follows:

```python
Determinant of C: 1.0 

Inverse of C:
 [[ 3. -5.]
 [-1.  2.]] 

Verification C @ C_inv:
 [[1. 0.]
 [0. 1.]]
```

In [None]:
# YOUR CODE HERE


### Common Statistical Functions

The table below summarizes a set of common statistical functions available in NumPy:

| Function          | Description                                                  |
| ----------------- | ------------------------------------------------------------ |
| `np.mean()`       | Compute the arithmetic mean along the specified axis         |
| `np.median()`     | Compute the median along the specified axis                  |
| `np.std()`        | Compute the standard deviation                               |
| `np.var()`        | Compute the variance                                         |
| `np.min()`        | Minimum value of the array                                   |
| `np.max()`        | Maximum value of the array                                   |
| `np.percentile()` | Compute the q-th percentile of data along the specified axis |
| `np.quantile()`   | Compute quantiles of data along the specified axis           |
| `np.corrcoef()`   | Correlation coefficient matrix                               |
| `np.cov()`        | Covariance matrix                                            |
| `np.sum()`        | Sum of array elements                                        |
| `np.prod()`       | Product of array elements                                    |
| `np.cumsum()`     | Cumulative sum of elements                                   |
| `np.cumprod()`    | Cumulative product of elements                               |
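
Several of these functions accept an `axis` argument: `axis=0` works down the rows (one result per column) and `axis=1` works across the columns (one result per row). The following sketch uses a small made-up array `X`.

```python
import numpy as np

# Hypothetical data: 4 respondents (rows) x 3 survey items (columns)
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(np.median(X))            # 6.5: median of all 12 values
print(np.max(X, axis=0))       # [10 11 12]: maximum of each column
print(np.sum(X, axis=1))       # [ 6 15 24 33]: sum of each row
print(np.percentile(X, 50))    # 6.5: the 50th percentile equals the median
print(np.cumsum(X[0]))         # [1 3 6]: running total of the first row
```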


### Statistical Operations

In this illustration, we apply several of these functions to perform an eigen decomposition of a dataset's covariance matrix in four steps.

- **Step 1**: Create a 5×3 dataset with random values.
- **Step 2**: Center the data (subtract column mean).
- **Step 3**: Compute the covariance matrix.
- **Step 4**: Find the eigenvalues and eigenvectors.

In [None]:
# Step 1: Create a 5×3 dataset with random values.
data = np.random.rand(5, 3)
print("Original data:\n", data)

In [None]:
# Step 2: Center the data (subtract column mean).
mean = data.mean(axis=0)
centered_data = data - mean
print("\nCentered data:\n", centered_data)

In [None]:
# Step 3: Compute the covariance matrix.
cov_matrix = np.cov(centered_data.T)
print("\nCovariance matrix:\n", cov_matrix)

In [None]:
# Step 4: Eigen decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
print("\nEigenvalues:", eigenvalues)
print("\nEigenvectors:\n", eigenvectors)
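
Because a covariance matrix is symmetric, `np.linalg.eigh` is an alternative to `np.linalg.eig`: it is intended for symmetric matrices and returns real eigenvalues in ascending order. The sketch below repeats the four steps with a seeded generator (the seed is arbitrary) and adds the share of variance along each eigenvector; the names `order` and `explained` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=2222)   # arbitrary seed for reproducibility
data = rng.random((5, 3))                # 5 observations, 3 variables

# rowvar=False treats columns as variables; np.cov centers the data itself
cov_matrix = np.cov(data, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Reorder from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance along each eigenvector
explained = eigenvalues / eigenvalues.sum()
print(explained)
```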

## NumPy Array Illustration

### Data Normalization

In [None]:
data = np.array([10, 20, 30, 40, 50])
normalized = (data - data.mean()) / data.std()
print(normalized)


You can recreate this normalization using a list comprehension without using NumPy’s vectorized operations or specialized methods: `.mean()` and `.std()`.

In [None]:
data = [10, 20, 30, 40, 50]

normalized = [
    (x - sum(data)/len(data)) / (sum((y - sum(data)/len(data))**2 for y in data)/len(data))**0.5
    for x in data
]

print(normalized)

⚠️ Note: This works, but it recomputes the mean and the standard deviation for every element, which is less efficient than storing intermediate results.

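Using the `:=` operator (the **walrus operator**) inside the comprehension does not help here: the comprehension's expression is evaluated again for every element, so the mean and the standard deviation would still be recomputed each time. A more efficient version computes the mean (`m`) and the standard deviation (`s`) once, before the comprehension, and reuses them for all elements.

In [None]:
data = [10, 20, 30, 40, 50]

m = sum(data) / len(data)                               # mean, computed once
s = (sum((y - m)**2 for y in data) / len(data))**0.5    # population std, computed once

normalized = [(x - m) / s for x in data]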

print(normalized)


Python has no built-in function for the mean or the standard deviation of a list in the core language. You have to use a module, such as `statistics` from the standard library, or write your own function.

The following example uses the `statistics` module:

In [None]:
import statistics

In [None]:
data = [10, 20, 30, 40, 50]

mean = statistics.mean(data)
std = statistics.stdev(data)  # sample std (divides by n - 1); statistics.pstdev() matches np.std()
normalized = [(x - mean) / std for x in data]

print(normalized)

So, what's the point in doing all that when we have NumPy? Exactly, we should use NumPy's methods and vectorized computations!

Here's a comparison table showing how to compute the mean and the standard deviation in Python using three approaches: the `statistics` module, NumPy, and manual calculation.

| Feature                | `statistics` module                                                                                                  | `NumPy`                                                                                                   | Manual calculation                                                                                               |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| **Import needed?**     | Yes: `import statistics`                                                                                             | Yes: `import numpy as np`                                                                                 | No                                                                                                               |
| **Data type**          | Python list (or any iterable)                                                                                        | Python list or NumPy array                                                                                | Python list                                                                                                      |
| **Mean**               | `statistics.mean(data)`                                                                                              | `np.mean(data)` or `data_array.mean()`                                                                    | `sum(data)/len(data)`                                                                                            |
| **Standard deviation** | `statistics.stdev(data)` → sample std <br> `statistics.pstdev(data)` → population std                                | `np.std(data)` → population std <br> `np.std(data, ddof=1)` → sample std                                  | `(sum((x - mean)**2 for x in data)/len(data))**0.5` → population std <br> `(sum((x - mean)**2 for x in data)/(len(data)-1))**0.5` → sample std |
| **Performance**        | Good for small datasets                                                                                              | Very fast, optimized for large datasets                                                                   | Slower for large datasets                                                                                        |
| **Example**            | `import statistics` <br> `data = [10,20,30]` <br> `mean = statistics.mean(data)` <br> `std = statistics.stdev(data)` | `import numpy as np` <br> `data = np.array([10,20,30])` <br> `mean = data.mean()` <br> `std = data.std()` | `data = [10,20,30]` <br> `mean = sum(data)/len(data)` <br> `std = (sum((x-mean)**2 for x in data)/len(data))**0.5` |
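
The sample and population versions give different numbers, so it is worth checking which one a function returns. The following sketch compares the two libraries on the same data; the names `pop_np` and `samp_np` are illustrative.

```python
import statistics
import numpy as np

data = [10, 20, 30, 40, 50]

pop_np = np.std(data)             # population std (divides by n)
samp_np = np.std(data, ddof=1)    # sample std (divides by n - 1)

print(np.isclose(pop_np, statistics.pstdev(data)))   # True
print(np.isclose(samp_np, statistics.stdev(data)))   # True
print(pop_np, samp_np)            # the sample std is the larger of the two
```

If normalized values from two tools disagree slightly, a mismatch between `n` and `n - 1` is a likely cause.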


### Practice Exercise 10

Here’s a practice exercise using student test scores. It will help you practice NumPy's statistical functions on a dataset resembling a classroom scenario.

In [None]:
# Test scores for 5 students across 4 subjects (Math, English, Science, History)
# Rows = Students, Columns = Subjects
scores = np.array([
    [85, 78, 92, 88],   # Student A
    [76, 85, 84, 80],   # Student B
    [90, 92, 88, 95],   # Student C
    [65, 70, 72, 68],   # Student D
    [88, 82, 85, 87]    # Student E
])

students = np.array(["Student A", "Student B", "Student C", "Student D", "Student E"])
subjects = np.array(["Math", "English", "Science", "History"])

#### (a) Average score for each student (row-wise mean)

Expected output:

```python
Student A: 85.75
Student B: 81.25
Student C: 91.25
Student D: 68.75
Student E: 85.50
```

In [None]:
# YOUR CODE HERE


#### (b) Average score for each subject (column-wise mean)

Expected output:

```python
Math: 80.8
English: 81.4
Science: 84.2
History: 83.6
```

In [None]:
# YOUR CODE HERE


### Wrap Up

That's it for now. Please...
- Finish the DC course "Data Manipulation with pandas" by noon 9/22.

BY PRINTING YOUR NAME BELOW, YOU CONFIRM THAT THE EXERCISES YOU SUBMITTED IN THIS NOTEBOOK ARE YOUR OWN AND THAT YOU DID NOT USE AI TO ASSIST WITH YOUR WORK.

In [None]:
# PRINT YOUR NAME
print("Enter Your Name Here")