<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

In [4]:
#Before I forget, 
import numpy as np

# Subsetting: Indexing and Slicing

To summarize the words in the heading, 
- **Subsetting**: Select data from a list or array (for this course)
    - **Indexing**: A form of subsetting. It means choosing just one element.
    - **Slicing**: Also a form of subsetting. It means choosing a range of elements.  

## Lists & Arrays in 1D | Subsetting & Indexing

To begin this, let us create a list (and an array) for us to play around with. 

In [3]:
py_list = ["Old","Mcdonald","had","a","farm","ee","aai","eey","aay","oo"]
np_array = np.array(py_list)

#We can assume x=py_list or x=np_array

Now the following results will work with both py_list and np_array. 

**Indexing**
|Syntax|Result|Output|Note|
|---|---|---|---|
|```x[0]```|First element|```'Old'```|Remember, zero-based indexing|
|```x[-1]```|Last element|```'oo'```|Negative indices count from the end, starting at -1|

**Slicing**
|Syntax|Result|Output|Note|
|---|---|---|---|
|```x[0:4]```|Index 0 to 3 (Position 1 to 4)|```['Old','Mcdonald','had','a']```|$4-0=4$ elements|
|```x[4:7]```|Index 4 to 6 (Position 5 to 7)|```['farm','ee','aai']```|$7-4=3$ elements|
|```x[1:7:2]```|Index 1 to 6 in steps of 2 (Position 2,4,6)|```['Mcdonald','a','ee']```|Range spans $7-1=6$ indices, every 2nd one gives 3 elements|
|```x[7:]```|Index 7 to the end (Position 8 to 10)|```['eey','aay','oo']```|```len(x)```$-7$ elements|
|```x[:7]```|Index 0 to 6 (Position 1 to 7)|```['Old','Mcdonald','had','a','farm','ee','aai']```|Gives $7-0=7$ elements|
|```x[6:1:-1]```|Index 6 to 2, reversed (Position 7 to 3)|```['aai','ee','farm','a','had']```|Gives $6-1 =5$ elements|
|```x[::-1]```|Reverse the list|```['oo','aay','eey','aai','ee','farm','a','had','Mcdonald','Old']```|Magic|


TLDR: With the default step of 1, a ```[i:j]``` slice returns $j-i$ elements. 
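
The counting rule can be checked directly. A minimal sketch, using a hypothetical list `words` with the same contents as `py_list`, also shows what happens with a step or with an index past the end:

```python
# Hypothetical list with the same contents as py_list above
words = ["Old", "Mcdonald", "had", "a", "farm", "ee", "aai", "eey", "aay", "oo"]

# With the default step of 1, [i:j] returns j - i elements
i, j = 4, 7
count_plain = len(words[i:j])        # 7 - 4 = 3

# With a step, only every step-th index in the range is kept
count_stepped = len(words[1:7:2])    # indices 1, 3, 5

# Indices beyond the end are clipped rather than raising an error
count_clipped = len(words[7:100])    # indices 7, 8, 9

print(count_plain, count_stepped, count_clipped)  # 3 3 3
```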

In [5]:
x=np_array
print(x[0:4], x[4:7] , x[1:7:2] , x[7:] , x[:7] , x[6:1:-1] , x[::-1])

['Old' 'Mcdonald' 'had' 'a'] ['farm' 'ee' 'aai'] ['Mcdonald' 'a' 'ee'] ['eey' 'aay' 'oo'] ['Old' 'Mcdonald' 'had' 'a' 'farm' 'ee' 'aai'] ['aai' 'ee' 'farm' 'a' 'had'] ['oo' 'aay' 'eey' 'aai' 'ee' 'farm' 'a' 'had' 'Mcdonald' 'Old']


## Arrays only | Subsetting by masking

Masking is a fun thing we can do only with NumPy arrays. 

In [6]:
#Creating another list, because mcdonald is tiring
numpy_array = np.array([2,4,6,8,9,24,3278234242344])
mask = numpy_array>95
mask

array([False, False, False, False, False, False,  True])

Basically, a mask assigns a ```True``` or ```False``` value to each element of the array. Indexing the array with the mask returns only the elements marked ```True```. 

In [7]:
numpy_array[mask]

array([3278234242344], dtype=int64)

All of this can actually be done in one line, 

In [9]:
numpy_array[numpy_array<84]

array([ 2,  4,  6,  8,  9, 24], dtype=int64)
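
Plain lists do not support this kind of comparison. As a rough sketch (the names `numbers`, `small` and `small_np` are made up for illustration), comparing a list with a number raises a `TypeError`, and the closest list equivalent of a mask is a list comprehension:

```python
import numpy as np

numbers = [2, 4, 6, 8, 9, 24, 3278234242344]

# A list cannot be compared with a number element by element
try:
    numbers < 84
    list_comparison_worked = True
except TypeError:
    list_comparison_worked = False

# The list equivalent of masking is a list comprehension
small = [n for n in numbers if n < 84]

# The NumPy mask selects the same values
values = np.array(numbers)
small_np = values[values < 84]

print(list_comparison_worked)  # False
print(small)                   # [2, 4, 6, 8, 9, 24]
print(small_np.tolist())       # [2, 4, 6, 8, 9, 24]
```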

### Other masking operations

**Inverting**

To do this, we can use the Bitwise Not operator, ```~```

In [10]:
numpy_array[~(numpy_array<84)]

array([3278234242344], dtype=int64)

**Combining masks with AND**

To do this, we can use the Bitwise And operator, ```&```

In [14]:
numpy_array[(numpy_array<5) & (numpy_array>5000)] #no number is both <5 and >5000, so we expect an empty array

array([], dtype=int64)

In [13]:
numpy_array[(numpy_array<5000) & (numpy_array>5)]

array([ 6,  8,  9, 24], dtype=int64)

**Combining Masks with OR**

To do this, we can use the Bitwise Or operator, ```|```

In [15]:
numpy_array[(numpy_array<5)|(numpy_array>5000)]

array([            2,             4, 3278234242344], dtype=int64)
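
The parentheses around each comparison matter, because `&` and `|` bind more tightly than `<` and `>`. A small sketch, assuming the same numbers as above (the variable names are illustrative), shows that NumPy's `np.logical_and`, `np.logical_or` and `np.logical_not` functions give the same masks as the operators, and that Python's `and` keyword does not work on arrays:

```python
import numpy as np

values = np.array([2, 4, 6, 8, 9, 24, 3278234242344])

# Masks built with the bitwise operators
both = (values < 5000) & (values > 5)
either = (values < 5) | (values > 5000)
inverted = ~(values < 84)

# The same masks built with NumPy's logical functions
same_and = np.array_equal(both, np.logical_and(values < 5000, values > 5))
same_or = np.array_equal(either, np.logical_or(values < 5, values > 5000))
same_not = np.array_equal(inverted, np.logical_not(values < 84))

# The `and` keyword needs a single True/False, which an array cannot give
try:
    (values < 5000) and (values > 5)
    keyword_and_worked = True
except ValueError:
    keyword_and_worked = False

print(same_and, same_or, same_not, keyword_and_worked)  # True True True False
```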

## Lists & Arrays in 2D | Indexing & Slicing

To get started with this section, time to create a 2D list.

In [5]:
d2_list = [
    [1,'Twinkle Twinkle'],[2,'Little'],[3,'Stars'],[4,'How'],[5,'Wonder'],[6,'What'],[7,'you'],[8,'are'],[9,'up'],[10,'above']
]
d2_array = np.array(d2_list)

Now, we can look at some examples. We will notice the differences between lists and arrays soon. 

**Indexing one element**

In [7]:
print(d2_list[5])
d2_array[5]

[6, 'What']


array(['6', 'What'], dtype='<U15')

As seen above, the single index call for both lists and arrays is the same. 

**Indexing an element inside a sublist**

In [8]:
print(d2_list[3][0])
d2_array[3,0]

4


'4'

As seen above, arrays require only one pair of square brackets ```[]``` to index one element, while lists need one pair per level. 
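
The quotes around `'4'` in the array output are worth noting. A NumPy array holds a single data type, so when `np.array` receives a mix of integers and strings it stores everything as strings (the `<U15` in the outputs above). A minimal sketch with a shortened, hypothetical stand-in for `d2_list`:

```python
import numpy as np

# Shortened stand-in for d2_list
rows = [[1, "Twinkle Twinkle"], [2, "Little"], [3, "Stars"]]
arr = np.array(rows)

list_item = rows[1][0]    # still an int in the list
array_item = arr[1, 0]    # stored as a string in the array

print(type(list_item).__name__)   # int
print(arr.dtype.kind)             # U, meaning a Unicode string type

# One way to get the numbers back is to convert the column explicitly
ids = arr[:, 0].astype(int)
```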

**Indexing multiple elements**

In [9]:
print(d2_list[:5])
d2_array[:5]

[[1, 'Twinkle Twinkle'], [2, 'Little'], [3, 'Stars'], [4, 'How'], [5, 'Wonder']]


array([['1', 'Twinkle Twinkle'],
       ['2', 'Little'],
       ['3', 'Stars'],
       ['4', 'How'],
       ['5', 'Wonder']], dtype='<U15')

As seen above, both return the first 5 rows (sublists) when sliced this way. 

**Indexing a list of elements in a sublist**

In [10]:
print(d2_list[:3][0])
d2_array[:3,0]

[1, 'Twinkle Twinkle']


array(['1', '2', '3'], dtype='<U15')

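Here, NumPy arrays are more convenient, because they return the first element of each of the first three rows. The list does not produce this output: ```d2_list[:3]``` is evaluated first and gives the first three sublists, and ```[0]``` then picks the first of those sublists.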
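
A list can still produce the same column, just not through indexing alone. One option is a list comprehension, sketched here with a shortened stand-in for `d2_list` (the name `rows` is illustrative):

```python
# Shortened stand-in for d2_list
rows = [[1, "Twinkle Twinkle"], [2, "Little"], [3, "Stars"], [4, "How"]]

# Slicing first and indexing second picks one whole sublist
first_sublist = rows[:3][0]

# A list comprehension visits each sublist and takes its first element
first_column = [row[0] for row in rows[:3]]

print(first_sublist)  # [1, 'Twinkle Twinkle']
print(first_column)   # [1, 2, 3]
```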

**More indexing for multiple elements in a sublist**

In [11]:
print(d2_list[3:6][0])
print(d2_array[3:6,0])

[4, 'How']
['4' '5' '6']


As before, only the NumPy array returns the first element of each selected row. The list returns the first sublist of the slice. 

In [12]:
d2_array[:,0]

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U15')

Using ```:``` on its own selects every row, so this returns the whole first column. 
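
As a small illustration (the array `grid` is made up for this sketch), a lone `:` keeps every index along its axis, so it can select a whole column, a whole row, or everything:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

first_column = grid[:, 0]   # every row, column 0
last_row = grid[-1, :]      # last row, every column
everything = grid[:, :]     # every row and every column

print(first_column)      # [1 4]
print(last_row)          # [4 5 6]
print(everything.shape)  # (2, 3)
```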

## Growing lists

While NumPy slicing and subsetting are very useful, lists are easier to grow. Moreover, NumPy arrays are very **fussy about their size**, so changing their size is generally not a good idea. 
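
To illustrate the point about size, here is a rough sketch comparing the two (variable names are illustrative). `np.append` does exist, but it returns a new array each time rather than growing the original:

```python
import numpy as np

grow_list = [1, 2, 3]
grow_list.append(4)                        # the same list object gets longer

fixed_array = np.array([1, 2, 3])
bigger_array = np.append(fixed_array, 4)   # a new array is created

print(grow_list)                     # [1, 2, 3, 4]
print(fixed_array.tolist())          # [1, 2, 3], the original is unchanged
print(bigger_array.tolist())         # [1, 2, 3, 4]
print(bigger_array is fixed_array)   # False
```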

In [13]:
#Creating another list
x=[1,2,3,4,5,6,7,8,9,10]

**Increasing list size by repeating the list**

In [14]:
x*5

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10]

```x*5``` returns a new list containing the contents of ```x``` repeated five times. 
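
A short sketch with a smaller, illustrative list, showing that `*` leaves the original list untouched and that the name only refers to the longer list once the result is assigned back:

```python
x = [1, 2, 3]

repeated = x * 5
length_after_multiply = len(x)   # x is still 3 elements long
print(len(repeated), length_after_multiply)  # 15 3

# Assigning the result back is what makes x itself longer
x = x * 2
print(x)  # [1, 2, 3, 1, 2, 3]
```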

**Appending one element at a time**

In [15]:
x=[1]
x=x+[2]+[3]
x

[1, 2, 3]

In [17]:
x=[9]
x+=[5]
x

[9, 5]

In [19]:
x=[6]
x.append(8)
x

[6, 8]

As seen in the various syntax options above, we can append one element at a time to the list and make the list grow as well. **The addition here represents list concatenation**, not arithmetic.
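
The three options are not quite identical under the hood. As a hedged sketch using `id()` to track the list object: `x=x+[...]` builds a new list and rebinds the name, while `+=` and `.append` change the existing list in place.

```python
x = [1]
original_id = id(x)

x = x + [2]                        # builds a new list and rebinds x
plus_made_new = id(x) != original_id

current_id = id(x)
x += [3]                           # extends the existing list in place
x.append(4)                        # also changes the list in place
in_place_kept = id(x) == current_id

print(x, plus_made_new, in_place_kept)  # [1, 2, 3, 4] True True
```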

With respect to the speed of the different operations, 

In [24]:
%%timeit
x=[1]
x.append(2)

59.3 ns ± 1.54 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [25]:
%%timeit
x=[1]
x=x+[2]

126 ns ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [26]:
%%timeit
x=[1]
x+=[2]

135 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


As seen above, ```.append``` is roughly twice as fast as the other two in these measurements.
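
Outside a notebook, the same comparison can be sketched with the standard `timeit` module. The loop count `n` is an arbitrary choice, and the timings depend on the machine and Python version, so no output is shown here:

```python
import timeit

n = 100_000  # number of repetitions per measurement (arbitrary choice)

# Each statement mirrors one of the %%timeit cells above
t_append = timeit.timeit("x = [1]; x.append(2)", number=n)
t_plus = timeit.timeit("x = [1]; x = x + [2]", number=n)
t_iadd = timeit.timeit("x = [1]; x += [2]", number=n)

for label, total in [("append", t_append), ("x = x + [2]", t_plus), ("x += [2]", t_iadd)]:
    # Convert the total time in seconds to nanoseconds per loop
    print(f"{label}: {total / n * 1e9:.1f} ns per loop")
```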

**Incorporating multiple elements**

In [27]:
x=[4,5,6,7,8,4]
x+=[1,1,1,1,1,1,1]
x

[4, 5, 6, 7, 8, 4, 1, 1, 1, 1, 1, 1, 1]

In [28]:
x=[35,32523,5235,235,2342,53,456,456,4]
x.extend([241234,123123,123123,123])
x

[35, 32523, 5235, 235, 2342, 53, 456, 456, 4, 241234, 123123, 123123, 123]

In [30]:
x=[23423423,42344,54756,8,907,867,75,64,563,45234]
x.append([34,523523423,42353456546,45653453,6536456])
x

[23423423,
 42344,
 54756,
 8,
 907,
 867,
 75,
 64,
 563,
 45234,
 [34, 523523423, 42353456546, 45653453, 6536456]]

As seen above, ```.append``` adds the whole list as a single element, creating a sublist inside the original list. However, ```.extend``` adds each element individually, so the list grows without a sublist.
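
The difference is easy to see from the lengths. A minimal sketch with two short, illustrative lists:

```python
a = [1, 2, 3]
a.append([4, 5])     # the list [4, 5] becomes one new element

b = [1, 2, 3]
b.extend([4, 5])     # 4 and 5 are added as two separate elements

print(len(a), a[-1])  # 4 [4, 5]
print(len(b), b[-1])  # 5 5
```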

# Some loose ends

## Tuples

Tuples are an immutable data type (they cannot be changed after creation, which can be useful at times). They are defined using parentheses, ```()```.

In [31]:
a=(9,8,7,6,4)
print(a[4])

4


As seen above, the data can be accessed. 

In [33]:
#this will not work because tuples are immutable
a[0]=-4   #Error --> TypeError: 'tuple' object does not support item assignment
a+=[10]   #never reached; a tuple cannot be concatenated with a list either

TypeError: 'tuple' object does not support item assignment
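
A minimal sketch of catching this error, and of building a new tuple instead of changing the old one. Concatenating with another tuple works because it creates a new object (the names `b` and `c` are illustrative):

```python
a = (9, 8, 7, 6, 4)

# Item assignment fails, so catch the error instead of stopping the program
try:
    a[0] = -4
    assignment_worked = True
except TypeError as err:
    assignment_worked = False
    print("TypeError:", err)

# New tuples can be built from the old one
b = a + (10,)          # note the trailing comma for a one-element tuple
c = (-4,) + a[1:]      # "replace" the first element by building a new tuple

print(a)  # (9, 8, 7, 6, 4)
print(b)  # (9, 8, 7, 6, 4, 10)
print(c)  # (-4, 8, 7, 6, 4)
```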

## Be VERY careful when copying

As already stated in **SP2273 Fundamentals (Nice)**, the following should NOT be done when the intention is to make a copy.

In [34]:
x=[4,67,235,123]
y=x
z=x

The issue with the above is that all three names refer to the same location in memory, 

In [35]:
print(id(x),id(y),id(z))

2549181406080 2549181406080 2549181406080


This implies that if we make a change through any of ```x```, ```y``` or ```z```, the other two will be automatically affected.

In [38]:
z+=[1] #changes the shared list in place; the two 1s in the output below suggest this cell was run twice
print(x)
y

[4, 67, 235, 123, 1, 1]


[4, 67, 235, 123, 1, 1]

Therefore, to make an independent copy of a list, we use ```.copy()```.

In [39]:
x=[1,2,3,3253,3,46354,6,345,323424]
y=x.copy()
z=x.copy()
print(id(x),id(y),id(z))

2549181563648 2549181572864 2549181543552


As seen above, ```x```, ```y``` and ```z``` are now different objects. 
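
One caveat: for a list, `.copy()` makes a shallow copy. The outer list is new, but any sublists inside it are still shared. For nested lists such as `d2_list`, `copy.deepcopy` from the standard library copies the inner lists as well. A small sketch with an illustrative nested list:

```python
import copy

nested = [[1, "Twinkle"], [2, "Little"]]
shallow = nested.copy()          # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][1] = "CHANGED"         # modify an inner list through the original

print(shallow[0][1])  # CHANGED, because the inner list is shared
print(deep[0][1])     # Twinkle, because the deep copy is independent
print(shallow is nested, shallow[0] is nested[0])  # False True
```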