<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# What to expect in this chapter

You should know how to store data using:
- Lists
- Arrays
- Dictionaries

# 1 Subsetting: Indexing and Slicing

You will often need to select a subset of the data in a list (or array); this is called ***subsetting***. One form of this is picking a *single element*, called **indexing** (you already know how to do this from the previous chapter). Another is *selecting a range of elements*, called **slicing**.

So, in summary:

- **Subsetting** -> means to select a subset of the data.
- **Indexing** -> refers to selecting one element.
- **Slicing** -> refers to selecting a range of elements.

## 1.1 Lists & Arrays in 1D | Subsetting & Indexing

Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. The various syntaxes for these are shown in the table below.

The following applies to **both** lists and arrays.

In [6]:
import numpy as np
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

In [4]:
x = py_list
print(x)

['a1', 'b2', 'c3', 'd4', 'e5', 'f6', 'g7', 'h8', 'i9', 'j10']


In [5]:
x = np_array
print(x)

['a1' 'b2' 'c3' 'd4' 'e5' 'f6' 'g7' 'h8' 'i9' 'j10']


| **Syntax**  | **Result**                      | **Output**                        | **Note**                                 |
|-------------|---------------------------------|-----------------------------------|------------------------------------------|
| `x[0]`      | First element                   | `'a1'`                            |                                          |
| `x[-1]`     | Last element                    | `'j10'`                           |                                          |
| `x[0:3]`    | Index 0 to 2                    | `['a1','b2','c3']`                | Gives 3 − 0 = 3 elements                 |
| `x[1:6]`    | Index 1 to 5                    | `['b2','c3','d4','e5','f6']`      | Gives 6 − 1 = 5 elements                 |
| `x[1:6:2]`  | Index 1 to 5 in steps of 2      | `['b2','d4','f6']`                | Gives every other of 6 − 1 = 5 elements  |
| `x[5:]`     | Index 5 to the end              | `['f6','g7','h8','i9','j10']`     | Gives `len(x)` − 5 = 5 elements          |
| `x[:5]`     | Index 0 to 4                    | `['a1','b2','c3','d4','e5']`      | Gives 5 − 0 = 5 elements                 |
| `x[5:2:-1]` | Index 5 to 3 (i.e., in reverse) | `['f6','e5','d4']`                | Gives 5 − 2 = 3 elements                 |
| `x[::-1]`   | Reverses the list               | `['j10','i9','h8',...,'b2','a1']` |                                          |

**Remember**

Remember, slicing in Python can be a bit tricky.
If you slice with `[i:j]`, the slice will start at `i` and end at `j-1`, giving you a total of `j-i` elements.
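
The rule above can be checked directly. The following is a small sketch that reuses `py_list` and `np_array` from earlier; the values `i = 1` and `j = 6` are only illustrative choices.

```python
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

i, j = 1, 6                      # illustrative start and end indices

for x in (py_list, np_array):    # the same slicing syntax works for both
    piece = x[i:j]
    # The slice starts at index i, stops just before index j,
    # and therefore holds j - i elements.
    assert piece[0] == x[i]
    assert piece[-1] == x[j - 1]
    assert len(piece) == j - i
    print(type(x).__name__, len(piece))
```

If any of the `assert` lines were wrong, Python would stop with an `AssertionError`, so a silent run confirms the rule for both the list and the array.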

## 1.2 Arrays only | Subsetting by masking

One of the most powerful things you can do with NumPy arrays is subsetting by `masking`. 

To make sense of this, consider the following.

In [7]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])


Here, `np_array > 3` asks the question ‘Which elements of `np_array` are greater than 3?’ The answer comes in a ‘Yes’/‘No’, or `True`/`False`, format, with one value per element. We can use these `True`/`False` values to ask NumPy to show **only those elements that are** **`True`**:

In [8]:
np_array[my_mask]

array([ 4,  5,  6,  7,  8,  9, 10])

This is why the term ‘masking’ is used. The `True`/`False` answer acts like a mask allowing only the `True` subset to be seen.

**Remember**

Remember that subsetting by masking **only** works with NumPy arrays.

Instead of creating another variable, we can also do all of this succinctly as:

In [9]:
np_array[np_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])
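
To see what "masking only works with NumPy arrays" means in practice, here is a small sketch. It assumes a plain list `py_list` holding the same numbers (this name is introduced only for the comparison) and uses a list comprehension as one possible list-based alternative.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
np_array = np.array(py_list)

# Comparing a whole list with a number is not supported in Python.
try:
    py_list > 3
    list_comparison_works = True
except TypeError:
    list_comparison_works = False

# One alternative for a plain list: a list comprehension.
from_list = [value for value in py_list if value > 3]

# The array version uses a mask.
from_array = np_array[np_array > 3]

print(list_comparison_works)    # False
print(from_list)                # [4, 5, 6, 7, 8, 9, 10]
print(from_array)               # [ 4  5  6  7  8  9 10]
```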

Example 1

In [10]:
np_array[~(np_array > 3)]                 # '~' means 'NOT'

array([1, 2, 3])

We can **invert** our mask by using the `~`.

`~` is called the *Bitwise Not* operator.

Example 2

In [11]:
np_array[(np_array > 3) & (np_array < 8)]     # '&' means 'AND'

array([4, 5, 6, 7])

We can combine one mask **AND** another mask.

(`AND` will show something only if both masks are true.)

Example 3

In [None]:
np_array[(np_array < 3) | (np_array > 8)]     # '|' means 'OR'

We can combine one mask **OR** another mask.
(`OR` will show something if either mask is true.)

**Remember**

- Always use the Bitwise NOT (`~`), Bitwise OR (`|`) and Bitwise AND (`&`) when combining masks with NumPy.
- Always use brackets to clarify what you are asking the mask to do.
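
The sketch below illustrates both reminders. It compares `&` with Python's keyword `and`, and shows what happens when the brackets are left out. The flag names `keyword_and_works` and `no_brackets_works` are made up for this illustration.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Element-wise AND of two masks: this works.
both = np_array[(np_array > 3) & (np_array < 8)]
print(both)                     # [4 5 6 7]

# Python's 'and' needs a single True/False from each mask,
# which NumPy refuses to guess for an array with several elements.
try:
    np_array[(np_array > 3) and (np_array < 8)]
    keyword_and_works = True
except ValueError:
    keyword_and_works = False

# Without brackets, '&' is evaluated before '>' and '<',
# so this is not the comparison we intended, and it also fails.
try:
    np_array[np_array > 3 & np_array < 8]
    no_brackets_works = True
except ValueError:
    no_brackets_works = False

print(keyword_and_works, no_brackets_works)    # False False
```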

## 1.3 Lists & Arrays in 2D | Indexing & Slicing

The differences between lists and arrays become even more apparent with higher dimensional lists and arrays, especially when you try indexing and slicing in higher dimensions.

Let’s consider the following 2D list.

In [13]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

Example 1

Question: What is at position 4 (index 3)?

In [15]:
py_list_2d[3]

[4, 'D']

In [16]:
np_array_2d[3]

array(['4', 'D'], dtype='<U11')

Example 2

Question: What is the FIRST element at position 4 (index 3)?

In [18]:
py_list_2d[3][0]

4

In [17]:
np_array_2d[3, 0]

'4'

Notice how the syntax for arrays uses just a single pair of square brackets (`[ ]`).

Example 3

Question: What are the first three elements?

In [None]:
py_list_2d[:3]

In [None]:
np_array_2d[:3]

Example 4

In [19]:
py_list_2d[:3][0]

[1, 'A']

In [20]:
np_array_2d[:3, 0]

array(['1', '2', '3'], dtype='<U11')

You might think that `py_list_2d[:3][0]` will yield the first elements (i.e., `[1, 2, 3]`) of all the sub-lists up to index 2.

No! Instead, it gives the first element of the list you get from `py_list_2d[:3]`.

Notice how differently NumPy arrays work: `np_array_2d[:3, 0]` does give the first element of each of the first three rows.

Example 5

In [None]:
py_list_2d[3:6][0]

In [None]:
np_array_2d[3:6, 0]

In [None]:
np_array_2d[:, 0]

If you want ‘everything’ you just use `:`.
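
For completeness, here is one possible way to get the 'first column' out of the 2D list as well. This sketch uses a list comprehension, which is a swapped-in technique not covered above; the names `first_col_list` and `first_col_array` are only for illustration.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)

# List: visit each of the first three sub-lists and take its first element.
first_col_list = [row[0] for row in py_list_2d[:3]]

# Array: rows and columns are selected inside one pair of brackets.
first_col_array = np_array_2d[:3, 0]

print(first_col_list)     # [1, 2, 3]
print(first_col_array)    # ['1' '2' '3']
```

Note that the array version returns strings, because NumPy converted the mixed numbers and letters to a single type when the array was created.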

## 1.4 Growing lists

NumPy arrays are invaluable, and their slicing syntax (e.g. `[:3,0]`) is more intuitive than that of lists. So, why do we even bother with lists? One advantage of lists is the ease and efficiency with which they grow. NumPy arrays are fantastic for fast math operations, **provided you do not change their size**<sup>1</sup>. So, this section only discusses how to grow a list.
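
As a brief illustration of why growing an array is different, the sketch below uses `np.append()`, which is not discussed in this chapter. It returns a new array instead of changing the existing one.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)      # returns a new array; 'a' is left untouched

print(a)                 # [1 2 3]
print(b)                 # [1 2 3 4]
print(b is a)            # False
```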

Example 1

Creating a larger list from a smaller one.

In [23]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

Example 2

Three ways to grow a list by appending one element at a time.

In [27]:
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [25]:
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [26]:
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]

If you are wondering, there **are** differences between these three versions. Their execution speeds are different; the version with `append()` runs about 1.5 times faster than the rest!
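
Speed aside, the three versions also differ in whether they build a new list or change the existing one. The sketch below makes this visible with a second name, `alias`, pointing at the original list; `alias` and the `*_result` variables are names invented for this illustration.

```python
x = [1]
alias = x          # a second name for the same list
x = x + [2]        # builds a new list and rebinds x to it
plus_result = (alias, x, alias is x)
print(plus_result)         # ([1], [1, 2], False)

x = [1]
alias = x
x += [2]           # changes the existing list in place
iadd_result = (alias, x, alias is x)
print(iadd_result)         # ([1, 2], [1, 2], True)

x = [1]
alias = x
x.append(2)        # also changes the existing list in place
append_result = (alias, x, alias is x)
print(append_result)       # ([1, 2], [1, 2], True)
```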

Example 3

Here are three ways of incorporating multiple elements.

Notice the difference between the effects of `extend()` and `append()`.

In [28]:
x = [1, 2, 3]
x += [4, 5, 6]
x

[1, 2, 3, 4, 5, 6]

In [29]:
x=[1, 2, 3]
x.extend([4, 5, 6])
x

[1, 2, 3, 4, 5, 6]

In [30]:
x=[1, 2, 3]
x.append([4, 5, 6])
x

[1, 2, 3, [4, 5, 6]]

`append()` adds whatever it is given to the end of the list as a single element, while `extend()` adds each of the elements it is given to the end of the list.

In other words, `append()` can only add a single element, while `extend()` can add multiple elements.
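
A small sketch of the same difference with inputs other than a list. It relies on the fact that `extend()` accepts any iterable (a tuple, a range, or even a string); the values are arbitrary.

```python
x = [1, 2, 3]
x.extend((4, 5))         # a tuple: each item is added
x.extend(range(6, 8))    # a range: each number is added
x.extend("ab")           # a string: each character is added
print(x)                 # [1, 2, 3, 4, 5, 6, 7, 'a', 'b']

y = [1, 2, 3]
y.append("ab")           # the whole string is added as ONE element
print(y)                 # [1, 2, 3, 'ab']
```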

# ***Additional Information***

Elements can also be added to the start or middle of a list using the `insert()` method. Its syntax is `insert(position, element)`.

For example:

In [37]:
name = ["David", "Ryan", "Fauzan", "Vincent"]
name.insert(2, "You")
print(name)

['David', 'Ryan', 'You', 'Fauzan', 'Vincent']


Output: `['David', 'Ryan', 'You', 'Fauzan', 'Vincent']`
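
The example above inserts into the middle of the list. As a short sketch of the other cases, position `0` places the element at the start, and position `len(name)` places it at the end (the added names are arbitrary).

```python
name = ["David", "Ryan", "Fauzan", "Vincent"]

name.insert(0, "You")           # position 0 puts the element at the start
print(name)                     # ['You', 'David', 'Ryan', 'Fauzan', 'Vincent']

name.insert(len(name), "Me")    # position len(name) puts it at the end, like append()
print(name)                     # ['You', 'David', 'Ryan', 'Fauzan', 'Vincent', 'Me']
```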

*Removing Elements*

To remove an unwanted element from a list, we can use the `remove()` method.

For example:

In [40]:
friends = ["David", "Ryan", "Fauzan", "Vincent"]
friends.remove("Vincent")
print(friends)

['David', 'Ryan', 'Fauzan']


Output: `["David", "Ryan", "Fauzan"]`
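
Two details of `remove()` are worth a quick sketch: it removes only the first matching element, and it raises a `ValueError` if the element is not in the list. The list with a repeated name below is a hypothetical variation of the example above.

```python
friends = ["David", "Ryan", "David", "Vincent"]   # hypothetical list with a repeated name

friends.remove("David")        # only the first "David" is removed
print(friends)                 # ['Ryan', 'David', 'Vincent']

try:
    friends.remove("Fauzan")   # "Fauzan" is not in this list
except ValueError:
    print("'Fauzan' is not in the list")

# Checking first avoids the error.
if "Vincent" in friends:
    friends.remove("Vincent")
print(friends)                 # ['Ryan', 'David']
```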

*Changing Elements*

Sometimes we want to change an element. We could remove it and insert a new one using the previous methods, but there is a better and faster way: we can overwrite the element directly, just as we would a variable:

In [None]:
friends = ["David", "Ryan", "Fauzan", "Vincent"]
friends[1] = "You"
print(friends)

Output: `["David", "You", "Fauzan", "Vincent"]`

# Some loose ends

## 1.5 Tuples

Tuples are another kind of sequence that functions much like a list: they have elements which are indexed starting at 0.

Tuples are similar to lists, except they use `( )` and cannot be changed after creation (i.e., they are **immutable**).

In [None]:
a=(1, 2, 3)     # Define tuple

We can access its data…

In [None]:
print(a[0])    # Access data

But, we cannot change the data.

In [None]:
# The following will NOT work
a[0]=-1
a[0]+= [10]

Tuples are more efficient:

- Since Python does not have to build tuple structures to be modifiable, they are simpler and more efficient than lists in terms of memory use and performance.
- So, when we make “temporary variables” in our programs, we prefer tuples over lists.
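
The two points above can be probed with a short sketch. It catches the error raised when changing a tuple and compares the sizes reported by `sys.getsizeof()`. The exact byte counts depend on the Python version, and the comparison assumes the standard CPython interpreter.

```python
import sys

a = (1, 2, 3)     # tuple
b = [1, 2, 3]     # list with the same elements

# Trying to change a tuple raises a TypeError.
try:
    a[0] = -1
except TypeError as error:
    print("Cannot change a tuple:", error)

# Size in bytes of the tuple and list objects themselves
# (the tuple is the smaller of the two in CPython).
print(sys.getsizeof(a), sys.getsizeof(b))
```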


## 1.6 Be VERY careful when copying

Variables in Python have subtle features that might make your life miserable if you are not careful. You should be particularly mindful when making copies of lists and arrays.

For example, if you want to copy a list, you might be tempted to do the following:

In [33]:
x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!

The correct way to do this is as follows:

In [35]:
x=[1, 2, 3]
y=x.copy()
z=x.copy()

**Note:** At this stage, you only have to know that you must use `copy()` to be safe; you **do not** have to understand why. However, if you want to, please refer to the discussion on [mutable and immutable objects](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/2_storing-data_good.html#sec-python-variables).
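
If you would like to see what goes wrong without `copy()`, here is a small sketch (it shows the effect, not the underlying reason). The array names `a`, `b` and `c` are introduced only for this illustration.

```python
import numpy as np

x = [1, 2, 3]
y = x            # y is just another name for the same list
z = x.copy()     # z is a separate list with the same contents

x.append(4)
print(y)         # [1, 2, 3, 4]  (changed along with x)
print(z)         # [1, 2, 3]     (unaffected)

a = np.array([1, 2, 3])
b = a            # same array, second name
c = a.copy()     # separate array

a[0] = 100
print(b[0])      # 100  (changed along with a)
print(c[0])      # 1    (unaffected)
```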

# Exercises & Self-Assessment

In [None]:
x = ('Glenn', 'Sally', 'Joseph')
print(x[2])
# Output: Joseph

y = (1, 9, 2)
print(y)
# Output: (1, 9, 2)


In [44]:
friends = ["David", "Ryan", "Fauzan", "Vincent"]
friends.append("You")
print(friends)

# Output: ['David', 'Ryan', 'Fauzan', 'Vincent', 'You']

friends = ["David", "Ryan", "Fauzan", "Vincent"]
friends.extend(["You", "Me"])
print(friends)

# Output: ['David', 'Ryan', 'Fauzan', 'Vincent', 'You', 'Me']

friends = ["David", "Ryan", "Fauzan", "Vincent"]
friends.append(["You", "Me"])
print(friends)
print(len(friends))

# Output: ['David', 'Ryan', 'Fauzan', 'Vincent', ['You', 'Me']]
# In this case, ['You', 'Me'] is regarded as ONE element.
# You can confirm this by checking the length of "friends".

['David', 'Ryan', 'Fauzan', 'Vincent', 'You']
['David', 'Ryan', 'Fauzan', 'Vincent', 'You', 'Me']
['David', 'Ryan', 'Fauzan', 'Vincent', ['You', 'Me']]
5


## Footnotes

1. The gains in speed are due to NumPy doing things to all the elements in the array in one go. For this, the data needs to be stored in a specific order in memory. Adding or removing elements hinders this optimization. When you change the size of a NumPy array, NumPy destroys the existing array and creates a new one, making it extremely inefficient.