# How to Use Arrays (ndarray)

```{note}
This page was not shared with MUDE students in 2023-2024 (year 2).

It may have been a new page, or a modified page from year 1.

There may be pages in year 1 and year 2 that are nearly identical, or have significant modifications. Modifications usually were to reformat the notebooks to fit in a jupyter book framework better.
```

This notebook is based on the NumPy lesson from [Aalto Scientific Computing: Python for Scientific Computing](https://github.com/AaltoSciComp/python-for-scicomp/) and [W3Schools](https://www.w3schools.com/python/numpy/).

## Indexing and Slicing

NumPy has many ways to extract values out of arrays:

- You can select a single element
- You can select rows or columns
- You can select ranges where a condition is true.

An example of some ways of indexing is shown in the following image (credits GeeksForGeeks):

<img src="https://media.geeksforgeeks.org/wp-content/uploads/Numpy1.jpg" alt="indexing" style="width:400px;"/>



Clever and efficient use of these operations is a key to NumPy's speed. 

<font color='red'>Reminder: In Python, indexing starts at zero, so the first element of a list or NumPy array has index 0!</font>

In [None]:
import numpy as np

a = np.arange(16).reshape(4, 4)        # 4x4 matrix from 0 to 15
print(f'a:\n{a}\n')
print(f'a[0]:\n{a[0]}\n')              # first row
print(f'a[:,0]:\n{a[:,0]}\n')          # first column
print(f'a[1:3,1:3]:\n{a[1:3,1:3]}\n')  # middle 2x2 array

print(f'a[(0, 1), (1, 1)]:\n{a[(0, 1), (1, 1)]}')               # second element of first and second row as array
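
The last line of the cell above uses integer-array indexing, which pairs up the row and column indices rather than selecting a block. The sketch below contrasts this with a few related forms (negative indices, stepped slices, and `np.ix_` for selecting a block); the variable names are illustrative only.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)    # same 4x4 matrix as above

last_row = a[-1]                   # negative indices count from the end
every_other_col = a[:, ::2]        # step of 2: columns 0 and 2
picked = a[[0, 1], [1, 1]]         # elements (0, 1) and (1, 1), not a 2x2 block
block = a[np.ix_([0, 1], [1, 2])]  # rows 0-1 crossed with columns 1-2

print(f'last_row:\n{last_row}\n')
print(f'every_other_col:\n{every_other_col}\n')
print(f'picked:\n{picked}\n')
print(f'block:\n{block}')
```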

You can also perform *boolean indexing* on arrays, such as shown below:

In [None]:
print(f'a > 7:\n{a > 7}\n')        # creates boolean matrix of same size as a 
print(f'a[a > 7]:\n{a[a > 7]}\n')  # array with matching values of above criterion
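
Boolean masks can also be combined and used for assignment. The following is a minimal sketch on the same 4x4 matrix; the conditions and the names `mask` and `clipped` are chosen only for illustration.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

mask = (a > 3) & (a % 2 == 0)     # element-wise AND; the parentheses are required
evens_above_3 = a[mask]           # 1-D array with the matching values
count = np.count_nonzero(mask)    # number of entries that satisfy the condition

clipped = a.copy()                # work on a copy so that a stays unchanged
clipped[clipped > 10] = 10        # a mask can also select entries for assignment

print(f'evens_above_3:\n{evens_above_3}\n')
print(f'count: {count}\n')
print(f'clipped:\n{clipped}')
```

Use `&` and `|` for element-wise "and" and "or" on arrays; the Python keywords `and` and `or` raise an error when applied to arrays with more than one element.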

---
### <font color='red'>Exercise</font>

For the reshaped taxi ride duration array `taxi_weeks`, create the following arrays using slicing:
- An array containing only the daily total durations of *Fridays*
- An array containing the *Monday* total durations from week 2 up to week 5
- An array containing only entries with a total duration of more than 600 minutes

In [None]:
fridays = 'Your code here'
print(fridays)

mondays_week_2_to_5 = 'Your code here'
print(mondays_week_2_to_5)

total_duration_over_6000 = 'Your code here'
print(total_duration_over_6000)

### <font color='red'>Exercise</font>

The reshaped array `taxi_weeks` currently starts on a Friday because that is the first day of the year. People often prefer the first column of the array to correspond to Monday instead. Using *slicing* and *reshaping*, create a new version of `taxi_weeks` from the `durations` array in which the first column represents Monday and chronological order is maintained.

> Hint: It is easier if you remove some observations at the beginning and the end because they are not part of a full week of observations.

In [None]:
taxi_weeks_monday = 'Your code here'

Again, we visualise the result:

In [None]:
labels = ['monday', 'tuesday', 'wednesday', 'thursday','friday', 'saturday', 'sunday']
plot_taxi_weeks(taxi_weeks_monday,labels)

---
## Array reshaping
Arrays can be [reshaped](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) in many different ways, as long as the number of entries in the new shape does not differ from the number of entries in the original array. 
For example, the following array can be reshaped into a 3 by 3 array:

<img src="./1dim.png" alt="drawing" width="600"/>

By reshaping this array into a 3 by 3 array using the default reading order, the following array is created:

<img src="./2dim.png" alt="drawing" style="width:200px;"/>


In [None]:
arr = np.arange(10)
print(f'original:\n{arr}')
print(f'\n5 rows and 2 columns:\n{arr.reshape((5, 2))}')
print(f'\n2 rows and 5 columns:\n{arr.reshape((2, -1))}') # -1 lets NumPy infer the length of that dimension
print(f'\n1 row and 5 columns:\n{arr.reshape((1, 5))}')   # This action will cause an error because 
                                                          # 10 entries do not fit in a 1 by 5 array
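
The "default reading order" mentioned above is row-major (`order='C'`). As a small sketch, the block below compares it with column-major order (`order='F'`) and catches the `ValueError` that NumPy raises when the sizes do not match; the 6-element array is an illustrative choice, not the taxi data.

```python
import numpy as np

arr = np.arange(6)

row_major = arr.reshape((2, 3))             # default order='C': filled row by row
col_major = arr.reshape((2, 3), order='F')  # order='F': filled column by column

try:
    arr.reshape((4, 2))                     # 6 entries cannot fill 4 x 2 = 8 slots
except ValueError as err:
    message = str(err)                      # keep the error text for inspection

print(f'row_major:\n{row_major}\n')
print(f'col_major:\n{col_major}\n')
print(f'error: {message}')
```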

---
### <font color='red'>Exercise</font>

Reshape the taxi array loaded in the previous exercise such that the array columns represent weekdays and the array rows represent the different weeks in the period covered by the data set. Note that the first day of 2016 was a *Friday*, so the week representation in the columns will start at *Friday*.

In [None]:
taxi_weeks = 'Your code here'

A visualization of the reshaped array:

In [None]:
from plotting_functions import plot_taxi_weeks
plot_taxi_weeks(taxi_weeks, labels = ['friday','saturday','sunday','monday','tuesday','wednesday','thursday'])

---
## View vs copy
Consider the cell below, in which `b` is created by slicing `a` and then modified:

In [None]:
a = np.eye(4)         # Create an array
print(f'a:\n{a}\n')   # Print a

b = a[:,0]            # Set variable b as the first column of a
b[0] = 5              # Set the first element of b to 5
print(f'b:\n{b}\n')   # print b

print(f'a:\n{a}\n')   # print a again

The change in ``b`` has also changed the array ``a``!
This is because ``b`` is merely a *view* of a part of array ``a``.  Both
variables point to the same memory. Hence, if one is changed, the other
one also changes! If you need to keep the original array as is, use `np.copy(a)` or `a.copy()`.

In [None]:
a = np.eye(4)         # Create an array
print(f'a:\n{a}\n')   # Print a 

b = np.copy(a)[:,0]   # Set variable b as the first column of a copy of a
b[0] = 5              # Set the first element of b to 5
print(f'b:\n{b}\n')   # print b

print(f'a:\n{a}\n')   # print a again
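
If it is unclear whether two arrays share data, `np.shares_memory` can be used to check. The sketch below also illustrates that basic slicing returns a view, whereas integer-array and boolean indexing return copies; the variable names are illustrative.

```python
import numpy as np

a = np.eye(4)

col_view = a[:, 0]           # basic slicing returns a view
col_copy = a[:, 0].copy()    # explicit copy of just this column
fancy = a[[0, 1]]            # integer-array indexing returns a copy
masked = a[a > 0]            # boolean indexing also returns a copy

view_shares = np.shares_memory(a, col_view)
copy_shares = np.shares_memory(a, col_copy)
fancy_shares = np.shares_memory(a, fancy)
masked_shares = np.shares_memory(a, masked)

print(f'view shares memory with a:   {view_shares}')
print(f'copy shares memory with a:   {copy_shares}')
print(f'fancy shares memory with a:  {fancy_shares}')
print(f'masked shares memory with a: {masked_shares}')
```

Note that `a[:, 0].copy()` copies only the selected column, while `np.copy(a)[:, 0]` copies the whole array first and then takes a view of that copy. Both leave `a` untouched.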

---
## Saving and loading arrays
When working with arrays, it might be useful to save or load an array to a file on your computer. This can be done using the `np.save()` and `np.load()` functions respectively:

In [None]:
arr = np.linspace(0, 10, 11)  # Create an array
print(f'arr:\n{arr}')

np.save('arr.npy', arr)       # Save the array to a file on your computer
arr = None                    # Set the arr variable to None
print(f'arr:\n{arr}')

arr = np.load('arr.npy')      # Load the array from the created .npy file 
print(arr)

You have now saved `arr.npy`, so you can use it later and in different scripts! It is also possible to load csv or txt files using the `np.loadtxt()` function. By passing the string that represents the delimiter character, a txt or csv file can be loaded as an array:

In [None]:
arr_from_csv = np.loadtxt('./numpy_files/example_data.csv', delimiter=',')  # This file uses the comma as the separating character

arr_from_txt = np.loadtxt('./numpy_files/example_data.txt', delimiter='\t') # This file uses a tab as the separating character

print(f'array from csv file:\n{arr_from_csv}\n')
print(f'array from txt file:\n{arr_from_txt}')
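
Text files can be written with `np.savetxt()` as well. The following is a minimal round-trip sketch that writes to a temporary directory so that it does not depend on the files in `numpy_files`; the file name `demo.csv`, the header text, and the data values are made up for illustration.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.0, 2.5], [2.0, 3.5], [3.0, 4.5]])  # made-up example values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.csv')            # hypothetical file name
    np.savetxt(path, data, delimiter=',', header='day,value')  # header is written as a '#' comment line
    loaded = np.loadtxt(path, delimiter=',')            # comment lines are skipped by default
    first_column = np.loadtxt(path, delimiter=',', usecols=0)  # read only column 0

print(f'loaded:\n{loaded}\n')
print(f'first_column:\n{first_column}')
```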

---
### <font color='red'>Exercise</font>

Load the provided file `taxi_duration.txt` using the `np.loadtxt` function. The text file contains two columns: one with the day of the year, and the other with the daily total duration of taxi rides on that day. Check the number of days in your loaded dataset. You can preview the file in a text editor if you want.

In [None]:
taxis = 'Your code here'
# print(f'The dataset is {len(taxis)} days long.')

Now, visualize the dataset by running the cell below:

In [None]:
from numpy_files.plotting_functions import plot_taxi_time_series
plot_taxi_time_series(taxis)