<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# What to expect in this chapter

- More details on accessing and modifying lists, NumPy arrays and dictionaries.

This is important because most of what you do with programming is related to accessing and changing data. 

- Gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.



# 1 Subsetting: Indexing and Slicing

You will often need to select a subset (<span style="color:orange">subsetting</span>) of the data in a list (or array).

One form of this is picking a single element, called <span style="color:orange">indexing</span> (from the previous chapters).

Another option is to select a range of elements - <span style="color:orange">slicing</span>.

**Subsetting** - selecting part of the data

**Indexing** - selecting one element

**Slicing** - selecting a range of elements

## 1.1 Lists & Arrays in 1D | Indexing & Slicing

Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. An optional third number sets the step. The various syntaxes for these are shown in the examples below.

The following applies to **both** lists and arrays.


In [2]:
import numpy as np

In [10]:
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

x = py_list

In [17]:
x[0]     #first element

'a1'

In [18]:
x[-1]	 #last element

'j10'

In [19]:
x[0:3]	 #index 0 to 2

['a1', 'b2', 'c3']

In [20]:
x[1:6]	#index 1 to 5

['b2', 'c3', 'd4', 'e5', 'f6']

In [21]:
x[1:6:2] #index 1 to 5 in steps of 2

['b2', 'd4', 'f6']

In [22]:
x[5:]	#index 5 to the end

['f6', 'g7', 'h8', 'i9', 'j10']

In [23]:
x[:5]	#index 0 to 4

['a1', 'b2', 'c3', 'd4', 'e5']

In [24]:
x[5:2:-1]	#index 5 to 3 (i.e., in reverse)

['f6', 'e5', 'd4']

In [25]:
x[::-1]	 #reverses the list

['j10', 'i9', 'h8', 'g7', 'f6', 'e5', 'd4', 'c3', 'b2', 'a1']

If you slice with <span style="color:purple">[i:j]</span>, the slice will start at <span style="color:purple">i</span> and end at <span style="color:purple">j-1</span>, giving you a total of <span style="color:purple">j-i</span> elements.
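As a quick check of the slicing rule, here is a minimal sketch that counts the elements in a slice and confirms that the same syntax works on a 1D array. The names `letters`, `letters_array`, `i` and `j` are made up for this illustration.

```python
import numpy as np

letters = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
letters_array = np.array(letters)

i, j = 1, 6
list_slice = letters[i:j]           # starts at index i, stops before index j
array_slice = letters_array[i:j]    # the same syntax works on a 1D array

print(list_slice)                          # ['b2', 'c3', 'd4', 'e5', 'f6']
print(len(list_slice) == j - i)            # True
print(array_slice.tolist() == list_slice)  # True
```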



## 1.2 Arrays only | Subsetting by masking

One of the most powerful things you can do with NumPy arrays is subsetting by **masking**. To make sense of this, consider the following.



In [26]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

The expression `np_array > 3` asks, for every element, "Is this greater than 3?". The answer comes back in a ‘Yes’/‘No’ or True/False format, one value per element. I can use this True/False format to ask NumPy to show me only those that are True by:


In [29]:
np_array[my_mask]   #only works with NumPy arrays

array([ 4,  5,  6,  7,  8,  9, 10])

This is why I used the term ‘masking’. The True/False answer acts like a mask allowing only the True subset to be seen.



Instead of creating another variable, I can also do all of this succinctly as:

In [32]:
np_array[np_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])
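Masking only works with arrays, so it may help to see what the equivalent looks like for a plain list. The sketch below is one possible way to do it, using a list comprehension; the names `numbers_list` and `numbers_array` are made up for this illustration.

```python
import numpy as np

numbers_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numbers_array = np.array(numbers_list)

# NumPy: the comparison builds the mask, and the mask selects the elements
from_array = numbers_array[numbers_array > 3]

# Plain list: numbers_list > 3 raises a TypeError,
# so we test the elements one at a time instead
from_list = [n for n in numbers_list if n > 3]

print(from_list)                          # [4, 5, 6, 7, 8, 9, 10]
print(from_array.tolist() == from_list)   # True
```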

### Example 1

In [34]:
np_array[~(np_array > 3)]                 # '~' means 'NOT'

# We can invert our mask by using the ~, called the Bitwise Not operator.

array([1, 2, 3])

### Example 2



In [38]:
np_array[(np_array > 3) & (np_array < 8)]    # '&' means 'AND'

# We can combine one mask AND another mask. 
# AND will show something only if both masks are true.

array([4, 5, 6, 7])

### Example 3


In [39]:
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

# We can combine one mask OR another mask.
# OR will show something if either mask is true.

array([ 1,  2,  9, 10])

- Always use the Bitwise NOT(~), Bitwise OR(|) and Bitwise AND(&) when combining masks with NumPy.
- Always use brackets to clarify what you are asking the mask to do.
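To see why these two rules matter, here is a small sketch of what happens when they are ignored. The array `values` and the flags `and_failed` and `no_brackets_failed` are made up for this illustration.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Correct: each comparison in its own brackets, joined with &
good = values[(values > 3) & (values < 8)]
print(good)                                  # [4 5 6 7]

# Python's 'and' needs a single True/False, which a whole mask cannot give
and_failed = False
try:
    values[(values > 3) and (values < 8)]
except ValueError as error:
    and_failed = True
    print("'and' fails:", error)

# Without brackets, & is applied before > and <, so this is read as
# values > (3 & values) < 8, which also fails
no_brackets_failed = False
try:
    values[values > 3 & values < 8]
except ValueError as error:
    no_brackets_failed = True
    print("missing brackets fail:", error)
```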


## 1.3 Lists & Arrays in 2D | Indexing & Slicing

The differences between lists and arrays become even more apparent with higher dimensional lists and arrays, especially when you try indexing and slicing in higher dimensions.

Let’s consider the following 2D list.




In [40]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

### Example 1

What is at position 4 (index 3)?

In [45]:
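py_list_2d[3]      # list version gives [4, 'D'] (not displayed; only the last result of a cell is shown)
np_array_2d[3]     # array version, shown below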

array(['4', 'D'], dtype='<U21')

### Example 2

What is the FIRST element at position 4 (index 3)?



In [46]:
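py_list_2d[3][0]   # list version gives 4 (an integer; not displayed)
np_array_2d[3, 0]  # array version gives '4' (a string), shown below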

'4'

### Example 3

What are the first three elements?

In [47]:
py_list_2d[:3]     # list version gives [[1, 'A'], [2, 'B'], [3, 'C']] (not displayed)
np_array_2d[:3]    # array version, shown below

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U21')

### Example 4

In [48]:
py_list_2d[:3][0]    # list version gives [1, 'A'] (not displayed)
np_array_2d[:3, 0]   # array version, shown below

array(['1', '2', '3'], dtype='<U21')

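You might think that `py_list_2d[:3][0]` will yield the first elements (i.e., [1, 2, 3]) of all the sub-lists up to index 2.
No! Instead, it gives the first element of the list you get from `py_list_2d[:3]`, which is `[1, 'A']`. Only the array syntax `np_array_2d[:3, 0]` gives the first element of each row.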
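If you do need the first element of each sub-list from a plain list, one possible approach is a list comprehension. The sketch below uses a shortened 2D list; the names `pairs`, `pairs_array`, `first_of_slice` and `first_column` are made up for this illustration.

```python
import numpy as np

pairs = [[1, "A"], [2, "B"], [3, "C"], [4, "D"]]
pairs_array = np.array(pairs)

first_of_slice = pairs[:3][0]                   # the first sub-list of the slice
first_column = [row[0] for row in pairs[:3]]    # first element of each sub-list

print(first_of_slice)                 # [1, 'A']
print(first_column)                   # [1, 2, 3]
print(pairs_array[:3, 0].tolist())    # ['1', '2', '3']
```

Note that the array holds the numbers as strings, because a NumPy array stores a single data type and this one was built from a mix of numbers and letters.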

### Example 5

In [51]:
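py_list_2d[3:6][0]    # list version gives [4, 'D'], the first sub-list of the slice (not displayed)
np_array_2d[3:6, 0]   # array version gives the first element of rows 3 to 5, shown below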

array(['4', '5', '6'], dtype='<U21')

In [53]:
np_array_2d[:, 0]  # ':' selects every row, so this is the first element of all rows

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')

## 1.4 Growing lists

NumPy arrays are invaluable, and their slicing syntax (e.g. <span style="color:orange">[:3,0]</span>) is more intuitive than that of lists. 

So, why do we even bother with lists? 

One advantage of lists is their ease and efficiency in growing. NumPy arrays are fantastic for fast math operations, provided you do not change their size.
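As a rough illustration of the difference, the sketch below adds one element to a list and to an array. It uses `np.append()`, which returns a new, longer array rather than lengthening the original; the variable names are made up for this illustration.

```python
import numpy as np

grow_list = [1, 2, 3]
grow_list.append(4)                        # the list itself gets longer

fixed_array = np.array([1, 2, 3])
longer_array = np.append(fixed_array, 4)   # builds and returns a new array

print(grow_list)       # [1, 2, 3, 4]
print(fixed_array)     # [1 2 3]   (unchanged)
print(longer_array)    # [1 2 3 4]
```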



### Example 1

Creating a larger list from a smaller one.


In [56]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

### Example 2

Three ways to grow a list by appending one element at a time.



In [58]:
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [59]:
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [60]:
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]

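There are differences between these three versions. The first (<span style="color:orange">x = x + [2]</span>) builds a new list each time, while <span style="color:orange">+=</span> and <span style="color:orange">append()</span> modify the existing list.

Their execution speeds are also different; the version with <span style="color:orange">append()</span> runs about 1.5 times faster than the rest, although the exact ratio will depend on your machine and Python version.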
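If you want to measure the speeds yourself, here is a minimal sketch using the standard `timeit` module. The function names, the list length `n` and the number of repeats are arbitrary choices for this illustration, and the timings you see will vary from machine to machine.

```python
import timeit

def grow_with_plus(n):
    x = []
    for i in range(n):
        x = x + [i]      # builds a brand-new list each time
    return x

def grow_with_plus_equals(n):
    x = []
    for i in range(n):
        x += [i]         # extends the existing list
    return x

def grow_with_append(n):
    x = []
    for i in range(n):
        x.append(i)      # adds one element to the existing list
    return x

n = 1_000
for grow in (grow_with_plus, grow_with_plus_equals, grow_with_append):
    seconds = timeit.timeit(lambda: grow(n), number=100)
    print(f"{grow.__name__}: {seconds:.4f} s")
```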



### Example 3

Here are three ways of incorporating multiple elements.
Notice the difference between the effects of <span style="color:orange">extend()</span> and <span style="color:orange">append()</span>.



In [61]:
x = [1, 2, 3]
x += [4, 5, 6]
x

[1, 2, 3, 4, 5, 6]

In [62]:
x=[1, 2, 3]
x.extend([4, 5, 6])
x

[1, 2, 3, 4, 5, 6]

In [63]:
x=[1, 2, 3]
x.append([4, 5, 6])
x

[1, 2, 3, [4, 5, 6]]

# Some loose ends

## 1.5 Tuples

Tuples are similar to lists, except they use ( ) and cannot be changed after creation (i.e., they are immutable).

Let me first create a simple tuple.

In [64]:
a=(1, 2, 3)     # Define tuple

In [65]:
print(a[0])    # Access data

1


In [66]:
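# Neither line below will work, as the items of a tuple cannot be changed
a[0] = -1     # raises the TypeError shown below
a[0] += 10    # would fail the same way (never reached, as the line above already failed)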

TypeError: 'tuple' object does not support item assignment
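If you need a changed version of a tuple, one option is to build a new tuple from pieces of the old one. This is only a sketch; the name `b` is made up for this illustration.

```python
a = (1, 2, 3)

try:
    a[0] = -1                 # item assignment is not allowed
except TypeError as error:
    print(error)              # 'tuple' object does not support item assignment

b = (-1,) + a[1:]             # build a new tuple instead; a is untouched
print(a)                      # (1, 2, 3)
print(b)                      # (-1, 2, 3)
```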

## 1.6 Be VERY careful when copying

Variables in Python have subtle features that might make your life miserable if you are not careful. 

You should be particularly mindful when making copies of lists and arrays.



For example, if you want to copy a list, you might be tempted to do the following; PLEASE DON’T!



In [67]:
x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!

In [68]:
#correct method
x=[1, 2, 3]
y=x.copy()
z=x.copy()

Note: At this stage, you only have to know that you must use copy() to be safe; you do not have to understand why. 
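For the curious, here is a minimal sketch of what goes wrong. Writing `y = x` does not make a second list; it only gives the same list a second name, so a change made through one name shows up under the other.

```python
x = [1, 2, 3]
y = x            # y is another name for the same list
z = x.copy()     # z is a separate list with the same contents

x.append(4)

print(y)         # [1, 2, 3, 4]  (changed along with x)
print(z)         # [1, 2, 3]     (unaffected)
print(y is x)    # True
print(z is x)    # False
```

NumPy arrays have a `copy()` method that is used in the same way.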

