<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# What to expect in this chapter

You should now know how to store data using lists, arrays and dictionaries. I will now show you more details on accessing and modifying these structures. This is important because most of what you do with programming is related to accessing and changing data. You will also gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.

# 1 Subsetting: Indexing and Slicing

You will often need to select a subset of the data in a list (or array); this is called subsetting. One form of subsetting is picking a single element, called indexing (see the previous chapter). Another is selecting a range of elements, called slicing.

So, in summary:

- Subsetting means to ‘select’.
- Indexing refers to selecting one element.
- Slicing refers to selecting a range of elements.

## 1.1 Lists & Arrays in 1D | Subsetting & Indexing

Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. The various syntaxes for these are shown in the table below.

The following applies to both lists and arrays.

In [2]:
import numpy as np

In [3]:
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

# Pick one
x = py_list  # OR
x = np_array

__There are various ways you can slice.__
The table below shows the slicing syntax and the output each one gives.

| Syntax    | Meaning                         | Output                          | Note                                     |
|-----------|---------------------------------|---------------------------------|------------------------------------------|
| `x[0]`      | First element                   | `'a1'`                            |                                          |
| `x[-1]`     | Last element                    | `'j10'`                           |                                          |
| `x[0:3]`    | Index 0 to 2                    | `['a1','b2','c3']`                | Gives  `3 − 0 = 3` elements                |
| `x[1:6]`    | Index 1 to 5                    | `['b2','c3','d4','e5','f6']`      | Gives  `6 − 1 = 5` elements                |
| `x[1:6:2]`  | Index 1 to 5 in steps of 2      | `['b2','d4','f6']`                | Gives every other of  `6 − 1 = 5` elements |
| `x[5:]`     | Index 5 to the end              | `['f6','g7','h8','i9','j10']`     | Gives `len(x) − 5 = 5` elements            |
| `x[:5]`     | Index 0 to 4                    | `['a1','b2','c3','d4','e5']`      | Gives  `5 − 0 = 5` elements                |
| `x[5:2:-1]` | Index 5 to 3 (i.e., in reverse) | `['f6','e5','d4']`                | Gives  `5 − 2 = 3` elements                |
| `x[::-1]`   | Reverses the list               | `['j10','i9','h8',...,'b2','a1']` |                                          |
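
The rows of the table can be checked directly. The following is a minimal sketch that reuses the same `py_list`; the variable names `first_five`, `every_other`, `backwards` and `array_slice` are only for illustration.

```python
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

# start:stop:step -- the stop index itself is never included
first_five = py_list[:5]         # indices 0, 1, 2, 3, 4
every_other = py_list[1:6:2]     # indices 1, 3, 5
backwards = py_list[5:2:-1]      # indices 5, 4, 3

print(first_five)    # ['a1', 'b2', 'c3', 'd4', 'e5']
print(every_other)   # ['b2', 'd4', 'f6']
print(backwards)     # ['f6', 'e5', 'd4']

# The same slice on the array picks the same elements,
# but the result is an array rather than a list.
array_slice = np_array[1:6:2]
print(array_slice.tolist())      # ['b2', 'd4', 'f6']
```

Note that `.tolist()` is used here only to turn the array back into a plain list for printing.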

## 1.2 Arrays only | Subsetting by masking

One of the most powerful things you can do with NumPy arrays is subsetting by __masking__. To make sense of this, consider the following.

In [4]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

In [5]:
#Now I can use this True/False array to ask NumPy to
#show me only those elements that are True.
np_array[my_mask]

array([ 4,  5,  6,  7,  8,  9, 10])

This is why I used the term ‘masking’. The `True`/`False` array acts like a mask, allowing only the `True` subset to be seen. __Masking only works with NumPy arrays.__

In [6]:
#Instead of creating another variable, I can do all of this succinctly as:
np_array[np_array > 3]


array([ 4,  5,  6,  7,  8,  9, 10])
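
To see what "masking only works with NumPy arrays" means in practice, here is a small sketch comparing a plain list with an array. The list comprehension is one common list alternative, not something the notes above prescribe; the names `filtered_list` and `filtered_array` are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A list cannot be compared element by element with a number...
try:
    py_list > 3
    list_comparison_works = True
except TypeError:
    list_comparison_works = False

# ...so a usual list alternative is a list comprehension.
filtered_list = [value for value in py_list if value > 3]

# The NumPy version, for comparison.
np_array = np.array(py_list)
filtered_array = np_array[np_array > 3]

print(list_comparison_works)      # False
print(filtered_list)              # [4, 5, 6, 7, 8, 9, 10]
print(filtered_array.tolist())    # [4, 5, 6, 7, 8, 9, 10]
```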

__A few more examples!__

- We can __invert__ our mask by using `~`, the __bitwise NOT__ operator.

In [7]:
np_array[~(np_array > 3)]                 # '~' means 'NOT'

array([1, 2, 3])

- We can combine one mask __AND__ (`&`) another mask.
(`AND` will show something only if both masks are true.)

In [8]:
np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'

array([4, 5, 6, 7])

- We can combine one mask __OR__ (`|`) another mask. (`OR` will show something if either mask is true.)

In [9]:
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

array([ 1,  2,  9, 10])

### Remember
- Always use the bitwise NOT (`~`), bitwise OR (`|`) and bitwise AND (`&`) when combining masks with NumPy; the keywords `not`, `or` and `and` do not work element by element on arrays.
- Always use brackets to clarify what you are asking the mask to do.
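
The two reminders above can be illustrated with a short sketch. It shows what happens when the brackets are left out, and when the keyword `and` is used instead of `&`. The variable names holding the errors are made up for this example.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# With brackets: each comparison is done first, then the masks are combined.
good = np_array[(np_array > 3) & (np_array < 8)]
print(good.tolist())   # [4, 5, 6, 7]

# Without brackets: & binds more tightly than > and <, so Python
# evaluates 3 & np_array first and the expression no longer means
# what we intended. NumPy raises a ValueError here.
try:
    np_array[np_array > 3 & np_array < 8]
    no_brackets_error = None
except ValueError as error:
    no_brackets_error = error

# The keyword 'and' expects a single True/False, not an array of them.
try:
    np_array[(np_array > 3) and (np_array < 8)]
    and_keyword_error = None
except ValueError as error:
    and_keyword_error = error

print(type(no_brackets_error).__name__)   # ValueError
print(type(and_keyword_error).__name__)   # ValueError
```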


## 1.3 Lists & Arrays in 2D | Indexing & Slicing

The differences between lists and arrays become even more apparent with higher dimensional lists and arrays, especially when you try _indexing and slicing in higher dimensions_.

In [10]:
#Let’s consider the following 2D list
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

__Example 1__: What is at index 3?

In [11]:
#For a list
py_list_2d[3]

[4, 'D']

In [12]:
#For an array
np_array_2d[3]

array(['4', 'D'], dtype='<U21')
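
Notice the quotes around `'4'` and the `dtype='<U21'` in the output: when a list mixes numbers and text, `np.array()` stores everything as text. The sketch below shows this on a shortened version of the list; using `.astype(int)` to get the numbers back is my own suggestion, not something from the notes.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

# The list keeps the original types...
print(type(py_list_2d[0][0]).__name__)   # int

# ...but the array stores every element as text.
print(np_array_2d.dtype.kind)            # U  (Unicode string)

# Assumption: if we need the numbers back, we can convert that column.
numbers = np_array_2d[:, 0].astype(int)
print(numbers.tolist())                  # [1, 2, 3]
```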

__Example 2__: What is the FIRST element at index 3?

In [14]:
#For a list
py_list_2d[3][0]


4

In [16]:
#For an array
np_array_2d[3, 0] # a single pair of square brackets [] 

'4'

__Example 3__: What are the first 3 elements?

In [17]:
# For a list
py_list_2d[:3]

[[1, 'A'], [2, 'B'], [3, 'C']]

In [18]:
#For an array
np_array_2d[:3]


array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U21')

__Example 4__: Getting the FIRST of the first 3 element pairs

In [19]:
#For a list
py_list_2d[:3][0]

[1, 'A']

In [21]:
#For an array
np_array_2d[:3, 0] #NOTICE! it's different!

array(['1', '2', '3'], dtype='<U21')

__For a list, `[:3][0]` slices first and then picks the first pair from that slice. For an array, `[:3, 0]` returns the first value of each of the first 3 pairs.__
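
The difference in Example 4 is easier to see when the steps are split up. This is a minimal sketch on a shortened list; the list comprehension at the end is one possible way to get a 'column' out of a list, and the names `first_three` and `first_values` are illustrative.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"]]
np_array_2d = np.array(py_list_2d)

# List: two separate steps. [:3] makes a shorter list, [0] takes its first item.
first_three = py_list_2d[:3]
print(first_three[0])                      # [1, 'A']

# Array: one step. Rows 0 to 2, and column 0 of each.
print(np_array_2d[:3, 0].tolist())         # ['1', '2', '3']

# To get the same 'column' from a list, loop over the rows.
first_values = [row[0] for row in py_list_2d[:3]]
print(first_values)                        # [1, 2, 3]
```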

__Example 5__: Getting the first of the elements within a specified range.

Here's the list again just for reference
```python
#Let’s consider the following 2D list
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

```


In [22]:
#For a list
#First element of the slice covering indices 3 to 5 (inclusive)
py_list_2d[3:6][0]

[4, 'D']

In [4]:
#YZ suggestion:
#Notice how [3:6] selects the elements at indices 3, 4 and 5 (index 6 is excluded)
print(py_list_2d[3:6])

[[4, 'D'], [5, 'E'], [6, 'F']]


In [23]:
#For an array
np_array_2d[3:6, 0]
# This gives the first value of each pair at indices 3 to 5 (index 6 is excluded)


array(['4', '5', '6'], dtype='<U21')

__If you want ‘everything’ you just use `:` in an array__

In [24]:
np_array_2d[:, 0]

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')
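
The `:` works the same way in either position of the square brackets. A short sketch on a smaller array follows; the names `column_0`, `column_1` and `row_0` are made up for this example.

```python
import numpy as np

np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"]])

column_0 = np_array_2d[:, 0]   # every row, column 0
column_1 = np_array_2d[:, 1]   # every row, column 1
row_0 = np_array_2d[0, :]      # row 0, every column (same as np_array_2d[0])

print(np_array_2d.shape)       # (3, 2)
print(column_1.tolist())       # ['A', 'B', 'C']
print(row_0.tolist())          # ['1', 'A']
```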

## 1.4 Growing lists

NumPy arrays are invaluable, and their slicing syntax (e.g. `[:3,0]`) is more intuitive than that of lists.

So, why do we even bother with lists? __One advantage of lists is their ease and efficiency in growing.__ NumPy arrays are fantastic for fast math operations, provided you do not change their size. We'll not discuss how to change the size of a NumPy array here in this course. 

__Instead, let me show you how to grow a list.__ This will be _useful_ later; for instance when you try to solve _differential equations_ numerically.

__Example 1:__ creating a larger list from a smaller one with `*`


In [25]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

__Example 2:__ three ways to grow a list by appending one element at a time

In [27]:
#Syntax type 1

#We start with x=[1]
x=[1]

#Then we grow it with + one at a time
x= x + [2]
x= x + [3]
x= x + [4]
x


[1, 2, 3, 4]

In [30]:
#Syntax type 2

#Start with x=[1]
x=[1]

#Then we grow it with += one at a time
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [29]:
#Syntax type 3

#Start with x=[1]
x=[1]

#Then we grow it with .append one at a time
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]

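All three give the same list. One difference is that `x = x + [2]` builds a new list each time, whereas `+=` and `.append()` change the existing list. Their execution speeds also differ; the version with `append()` runs about 1.5 times faster than the rest!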
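
If you want to check the speed claim on your own machine, one option is the standard-library `timeit` module. This is a rough sketch, not a careful benchmark: the three helper functions, the list length (1000) and the number of repeats (100) are all arbitrary choices of mine, and the timings will vary with your computer and Python version.

```python
import timeit

def grow_with_plus(n):
    x = []
    for i in range(n):
        x = x + [i]      # builds a brand-new list every time
    return x

def grow_with_plus_equals(n):
    x = []
    for i in range(n):
        x += [i]         # extends the existing list in place
    return x

def grow_with_append(n):
    x = []
    for i in range(n):
        x.append(i)      # adds one element in place
    return x

# All three give the same result...
same_result = (grow_with_plus(100)
               == grow_with_plus_equals(100)
               == grow_with_append(100))
print(same_result)       # True

# ...but they may take different amounts of time.
timings = {}
for function in (grow_with_plus, grow_with_plus_equals, grow_with_append):
    timings[function.__name__] = timeit.timeit(lambda: function(1000), number=100)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```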

__Example 3:__ three ways of incorporating multiple elements.

Notice the difference between the effects of `extend()` and `append()`.

In [38]:
#Syntax 1 +=
x = [1, 2, 3]
x += [4, 5, 6]
x
#In a Jupyter notebook, ending a cell with x displays its value, much like print(x).
#This is a notebook feature; a plain Python script shows nothing unless you call print(x).

[1, 2, 3, 4, 5, 6]



In [36]:
#Syntax 2 .extend
x=[1, 2, 3]
x.extend([4, 5, 6])
x
#Makes a 1d list

[1, 2, 3, 4, 5, 6]

In [37]:
#Syntax 3 .append
x=[1, 2, 3]
x.append([4, 5, 6])
x
#Makes a nested list: the whole [4, 5, 6] becomes a single element (a list in a list)

[1, 2, 3, [4, 5, 6]]
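
To hint at why growing lists is useful for solving differential equations numerically, here is a minimal sketch. Everything specific in it is my own assumption for illustration: the equation (dy/dt = -y with y(0) = 1), the step size, the simple Euler update and the stopping rule. The point is only that we do not know in advance how many values we will collect, so we `.append()` as we go and convert to arrays at the end.

```python
import numpy as np

# Assumed example problem: dy/dt = -y, starting from y = 1 at t = 0
dt = 0.1                      # step size (an arbitrary choice)
t, y = 0.0, 1.0

times, values = [t], [y]      # start with one element each

# Keep stepping until y drops below 0.5; the number of steps
# is not known beforehand, which is where lists are convenient.
while y > 0.5:
    y = y + dt * (-y)         # one Euler step
    t = t + dt
    times.append(t)           # grow the lists as we go
    values.append(y)

# Convert to arrays once the size is settled
times = np.array(times)
values = np.array(values)
print(len(times), len(values))   # 8 8
```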

# Some loose ends

## 1.5 Tuples

Before we end this section, I must introduce you to another data storage structure called a __tuple.__

Tuples are __similar to lists, except they use ( ) and cannot be changed after creation__ (i.e., they are ___immutable___).

Let me first create a simple tuple.

In [39]:
a=(1, 2, 3)     # Define tuple

In [40]:
# Access its data...
print(a[0])    # Access the first element in it


1


In [42]:
# But, we can't change the data.
a[0]=-1

TypeError: 'tuple' object does not support item assignment

In [43]:
# Note: this error is not about the tuple; a[0] is an int and [10] is a list, so += fails first.
a[0]+= [10]

TypeError: unsupported operand type(s) for +=: 'int' and 'list'
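
Although a tuple cannot be changed in place, you can always build a new tuple from an old one. The sketch below shows two possible ways of doing so; the names `b`, `c` and `temp` are made up for this example.

```python
a = (1, 2, 3)

# Item assignment fails because the tuple cannot be changed in place.
try:
    a[0] = -1
    assignment_worked = True
except TypeError:
    assignment_worked = False
print(assignment_worked)      # False

# Building a NEW tuple from the old one is allowed.
b = (-1,) + a[1:]             # note the comma in the one-element tuple (-1,)
print(b)                      # (-1, 2, 3)
print(a)                      # (1, 2, 3)  -- the original is untouched

# Another route: go through a list, then back to a tuple.
temp = list(a)
temp[0] = -1
c = tuple(temp)
print(c)                      # (-1, 2, 3)
```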

## 1.6 Be VERY careful when copying

Variables in Python have subtle features that might make your life miserable if you are not careful. You should be particularly mindful when making copies of lists and arrays.


For example, if you want to copy a list, you might be tempted to do the following; PLEASE DON’T!



In [45]:
x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!
z

[1, 2, 3]

__Why? What's happening:__

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*1i1NuGvSvgsgcwxshyFGOw.png)

- For mutable objects (like a list), assigning with `=` just gives the same object another name; it carries the exact same ID. In our example above, `x`, `y` and `z` all refer to the same object. This can cause issues downstream, because to Python they are all the same list (e.g. if you `.append()` to `y` to grow it, you also grow `x` and `z` unintentionally).

- `.copy()` creates an entirely new list with its own unique ID and assigns it to the new name. This avoids downstream appending mishaps, since `x`, `y` and `z` are now separate lists.


In [47]:
# The CORRECT way to do this is as follows

x=[1, 2, 3]
y=x.copy()
z=x.copy()
z

[1, 2, 3]
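
The difference between `=` and `.copy()` only shows up once something is changed. Here is a small sketch that makes it visible for both a list and an array; the names `alias`, `arr` and `arr_copy` are illustrative.

```python
import numpy as np

x = [1, 2, 3]
y = x              # second name for the SAME list
z = x.copy()       # a new, separate list

y.append(4)        # change the list through the name y

print(x)           # [1, 2, 3, 4]  -- x changed too
print(z)           # [1, 2, 3]     -- the copy did not
print(y is x)      # True
print(z is x)      # False

# NumPy arrays behave the same way with '=' and also have .copy()
arr = np.array([1, 2, 3])
alias = arr
arr_copy = arr.copy()

alias[0] = 99
print(arr.tolist())        # [99, 2, 3]
print(arr_copy.tolist())   # [1, 2, 3]
```

Here `is` checks whether two names point to the same object, which is the same question as asking whether their IDs match.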

# Exercises & Self-Assessment

The exercises are in the Storing Data (Good) exercise file.




## Footnotes