<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# Chapter Summary

- Python lists, Numpy arrays, dictionaries, tuples
- accessing data from:
    1. list & array: use the **index** (counting starts from zero)  
       `list/array_name[index]`
    2. dictionary: use the paired **key**  
       `dictionary_name[key]`  
       access all keys  
       `dictionary_name.keys()`  
       access all values  
       `dictionary_name.values()`  
- operations on lists & arrays  
    - `len()` finds the number of elements  
    - `.shape` finds the array's shape  
    - add another list, add a number, multiply, square  
- list & array mathematics  
    - Python: `sum()`, `max()`, `min()` (base Python function)
    - Array: `.sum()`, `.max()`, `.min()`, `.mean()`, `.std()`

# Lists, Arrays & Dictionaries

data structures can influence **how you think about data**  
(non-comprehensive) ways to store & manipulate data:  
1. lists
2. numpy arrays
3. dictionaries
4. tuples
5. dataframes (Data Processing basket in the Applications part)
6. classes (Nice chapter)

## Let’s compare

**!!!**  
- dictionaries use a **key** and an **associated value** separated by a `:`
- a dictionary holds the real and superhero names in one structure, while we need two lists (or arrays) for the same data  
- for lists and arrays, the **order matters** (i.e. ‘Iron Man’ must be in the same position as ‘Tony Stark’ for things to work)

In [5]:
# Python (py) Lists
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [2]:
# Numpy (np) Arrays
import numpy as np
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

In [4]:
# Dictionary
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}
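To see why "order matters" for the two lists but not for the dictionary, here is a minimal sketch that looks up Tony Stark's superhero name both ways. It reuses the data above; the variable names `position`, `from_lists` and `from_dict` are just illustrative.

```python
# Two parallel lists: matching data must sit at the same position
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# Dictionary: each real name (key) is paired with its superhero name (value)
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange",
}

# Lists: find Tony Stark's position, then rely on the other list using the same order
position = py_real_names.index("Tony Stark")   # 1
from_lists = py_super_names[position]

# Dictionary: the key leads directly to its paired value
from_dict = superhero_info["Tony Stark"]

print(from_lists, from_dict)   # Iron Man Iron Man
```

If one list were reordered, `from_lists` would silently return the wrong name, whereas the dictionary lookup would not be affected.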

## Accessing data from a list (or array)

Python is a **zero-indexed** language = it starts counting at 0  
to access a particular element of a list or array, specify the **index** corresponding to its position (the first element is at index 0)

In [8]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [9]:
py_real_names[0]

'Natasha Romanoff'

In [10]:
py_super_names[0]

'Black Widow'
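Because counting starts at 0, a list of 3 elements only has the indices 0, 1 and 2. A small sketch of what happens one step past the end:

```python
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# 3 elements, so the valid indices are 0, 1 and 2
print(py_real_names[2])        # Stephen Strange (the third element)

# Index 3 would be a fourth element, which does not exist
try:
    py_real_names[3]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```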

a **negative index** counts from the back of the list  
e.g. index -1 gives the last element  
useful because you don't need to know the list's size

In [11]:
py_super_names[-1]

'Doctor Strange'
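A short sketch of how negative indices map onto ordinary ones: for a list of length `n`, index `-k` refers to the same element as index `n - k`.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

n = len(py_super_names)            # 3 elements

# index -k counts from the back and refers to the same element as index n - k
last = py_super_names[-1]          # same as py_super_names[2]
second_last = py_super_names[-2]   # same as py_super_names[1]

print(last)          # Doctor Strange
print(second_last)   # Iron Man
```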

## Accessing data from a dictionary

a dictionary has a **key-value** structure  
access a value using its paired key:

In [12]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                 
# use key
superhero_info["Natasha Romanoff"]

'Black Widow'

access all the keys and all the values:

In [15]:
superhero_info.keys()
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])

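QUESTION: why didn't it show all the keys?  
Jupyter only displays the value of the **last** expression in a cell, so `superhero_info.keys()` was computed but not shown; use `print()` on each (or put them in separate cells) to see both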
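A minimal sketch that displays both the keys and the values by wrapping each call in `print()`. The `list()` conversion at the end is an extra step, useful if you want to index into the keys.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange",
}

# print() each result so both are shown, not just the last expression in the cell
print(superhero_info.keys())
print(superhero_info.values())

# list() turns the keys view into an ordinary list that can be indexed
all_keys = list(superhero_info.keys())
print(all_keys[0])   # Natasha Romanoff
```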

## Higher dimensional lists

instead of using two lists to store the info (above), we can use a 2D list (or array)  
e.g.

In [17]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

*from the tutor's comments:*  
- however, you can only extract a value from a list if you know its index  
- if you don't know the index but only know the key (e.g. the real name):  
    - use if-else operations or some other function to find the index of the key, then extract the corresponding value
    - tedious & slow for both human & computer
- downside of a dictionary:
    - requires more memory
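To make the tutor's point concrete, here is a sketch of the "find the key first" approach on the 2D list, next to the dictionary version. `find_super_name` is a hypothetical helper written only for this illustration, not a built-in.

```python
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

def find_super_name(info, real_name):
    """Hypothetical helper: scan each [real, super] pair until the real name matches."""
    for real, superhero in info:
        if real == real_name:
            return superhero
    return None   # real name not found

print(find_super_name(py_superhero_info, 'Stephen Strange'))   # Doctor Strange

# The dictionary version needs no search: the key leads straight to the value
superhero_info = dict(py_superhero_info)    # builds {real: super} from the pairs
print(superhero_info['Stephen Strange'])    # Doctor Strange
```

The loop has to check the pairs one by one, so it gets slower as the list grows; the dictionary lookup does not need to scan.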

# Lists vs. Arrays

Overall:  
an operation on a list works on the **whole** list  
an operation on an array works on **each individual element** of the array

## Size

`len()` = find **number of elements** in lists or arrays

In [18]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [19]:
len(py_list_2d)

10

In [20]:
len(np_array_2d)

10

`.shape` = a property/attribute of a NumPy array that gives its **shape**, i.e. the number of elements along each dimension (here `(rows, columns)`)  
it is not a function, so no `()`

In [21]:
np_array_2d.shape

(10, 2)
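`.shape` is not limited to 2D arrays. A small sketch comparing a 1D and a 2D array (smaller than the ones above), also using `.ndim`, the standard NumPy attribute for the number of dimensions:

```python
import numpy as np

np_array_1d = np.array([1, 2, 3, 4, 5])
np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"]])

print(np_array_1d.shape)   # (5,)   one dimension with 5 elements
print(np_array_2d.shape)   # (3, 2) 3 rows, 2 columns
print(np_array_2d.ndim)    # 2      number of dimensions

# len() only reports the size of the first dimension (the number of rows)
print(len(np_array_2d))    # 3
```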

## Arrays are fussy about type

a NumPy array can only hold **one type** of data  
**!!!** if the Python list mixes numbers and strings, the numbers are converted to **strings** when the NumPy array is created

*from tutor's comments:*  
- NumPy converts integers, floats, and strings into the least "restrictive" type present in the original list  
- **integers > floats > strings in terms of "restrictiveness"**  
- the element at index 0 (the very well-behaved 3) is converted to an integer, float, or string depending on what the other elements in the array are

In [3]:
#Try the following:
print(type(np.array([3,3])[0]))
print(type(np.array([3,3.0])[0]))
print(type(np.array([3,"3"])[0]))

<class 'numpy.int32'>
<class 'numpy.float64'>
<class 'numpy.str_'>
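The same conversions can be read off the array's `dtype` directly. A small sketch, assuming NumPy is installed; `dtype.kind` is a one-letter code (`'i'` integer, `'f'` float, `'U'` Unicode string). The integer type may show up as `int64` rather than `int32`, depending on the platform and NumPy version.

```python
import numpy as np

examples = [[3, 3], [3, 3.0], [3, "3"]]
kinds = []

for values in examples:
    arr = np.array(values)
    kinds.append(arr.dtype.kind)
    # arr[0] is the "well behaved" 3, stored using the array's single dtype
    print(values, "->", arr.dtype, "| first element:", arr[0], type(arr[0]).__name__)

print(kinds)   # ['i', 'f', 'U']
```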


In [26]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [25]:
# List
py_list

[1, 1.5, 'A']

In [27]:
# Array
np_array

array(['1', '1.5', 'A'], dtype='<U32')

QUESTION: what does `dtype=` mean

*from tutor's comments:*

dtype stands for data type, which describes how the computer actually stores the data.

In reality, storing and using data is never truly automatic. We need to allocate specific bits to represent specific things, and the computer needs to record some extra bits to remember how we did this allocation. The good thing about Python is that it mostly sweeps this under the rug so you don't have to handle it, but more low-level programming languages (as in closer to the hardware, not lousier) often require you to specify the data type you want to use to store your data.

In this case, dtype='<U32' means a **U**nicode string of up to **32 characters** per element (NumPy stores each character in 4 bytes), with the `<` sign indicating the byte order (little-endian, i.e. least significant byte first).
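A short sketch for inspecting the dtype of the mixed array above (same `[1, 1.5, 'A']` list). Because each Unicode character takes 4 bytes, a `<U32` element should occupy 4 × 32 = 128 bytes, which `.itemsize` reports. The `.astype(float)` step is an extra illustration of converting numeric strings back into numbers.

```python
import numpy as np

np_array = np.array([1, 1.5, 'A'])

print(np_array.dtype)        # <U32
print(np_array.dtype.kind)   # U   -> Unicode string
print(np_array.itemsize)     # 128 -> bytes per element (32 characters x 4 bytes)

# Strings are not numbers: convert explicitly before doing arithmetic
numbers = np.array(['1', '1.5']).astype(float)
print(numbers + 1)           # the floats 2.0 and 2.5
```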

## Adding a number

adding a number to a list **won't work**  
adding a number to an array **works**, and adds the number to every element

In [28]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [29]:
np_array

array([1, 2, 3, 4, 5])

In [31]:
# adding number to list
py_list + 10

TypeError: can only concatenate list (not "int") to list

In [32]:
# adding number to array
np_array + 10

array([11, 12, 13, 14, 15])
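For comparison, a minimal sketch of what a list needs in order to get the same result: an explicit loop, written here as a list comprehension.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# A list needs an explicit loop (here a list comprehension) to add 10 to every element
list_plus_10 = [x + 10 for x in py_list]

# An array does it element by element automatically
array_plus_10 = np_array + 10

print(list_plus_10)             # [11, 12, 13, 14, 15]
print(array_plus_10.tolist())   # [11, 12, 13, 14, 15]
```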

## Adding another list

adding lists = concatenation (`+` joins them end to end)  
adding arrays = element-wise addition

In [33]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [34]:
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [35]:
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])
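Each behaviour can be obtained for the other type as well. A sketch using `zip()` to add lists element by element, and `np.concatenate` to join arrays end to end:

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]
np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

# Element-wise addition with lists: pair up matching elements with zip()
pairwise_sum = [a + b for a, b in zip(py_list_1, py_list_2)]
print(pairwise_sum)       # [11, 22, 33, 44, 55]

# Joining arrays end to end (what + does for lists): np.concatenate
joined = np.concatenate([np_array_1, np_array_2])
print(joined.tolist())    # [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
```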

## Multiplying by a Number

multiplying a list by an integer = repeats the list that many times (the list grows)  
multiplying an array by a number = multiplies each element by that number

In [36]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [37]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [38]:
np_array*2

array([ 2,  4,  6,  8, 10])
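A small sketch showing that list "multiplication" is really repetition, so it only accepts whole numbers, while an array can be scaled by any number. The names `list_times_float_failed` and `scaled` are just illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Repeating a list twice is the same as adding it to itself
print(py_list * 2 == py_list + py_list)   # True

# A list can only be "multiplied" by an integer...
list_times_float_failed = False
try:
    py_list * 2.5
except TypeError as err:
    list_times_float_failed = True
    print("TypeError:", err)

# ...whereas an array scales each element by any number
scaled = np_array * 2.5
print(scaled.tolist())                    # [2.5, 5.0, 7.5, 10.0, 12.5]
```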

## Squaring

squaring a list **won't work**  
squaring an array will **square each element** of the array

In [39]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [40]:
py_list**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [41]:
np_array**2

array([ 1,  4,  9, 16, 25])
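To square every element of a list you need to loop over it yourself. A minimal sketch comparing that with the array version:

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Squaring each element of a list needs a list comprehension
squares_list = [x**2 for x in py_list]

# For an array, ** works element-wise (same as multiplying the array by itself)
squares_array = np_array**2

print(squares_list)                                   # [1, 4, 9, 16, 25]
print((squares_array == np_array * np_array).all())   # True
```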

## Asking questions

asking questions (comparisons) of a list **won't work** as expected: `==` compares the whole list, and `>` raises an error  
asking questions of an array **checks against each element**

In [5]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [6]:
py_list == 3     # Works, but what IS the question?

False

*from tutor's comments:*  

the above works, but the question it asks is:  
is the variable *py_list* (the whole list) equal to the integer 3?  
since py_list is a list, it is *not* equal to the integer 3, $\therefore$ the result is False

In [44]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'
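If you want to ask these questions of a list, you have to phrase them differently. A sketch using the `in` operator and list comprehensions; the variable names are just illustrative.

```python
py_list = [1, 2, 3, 4, 5]

whole_list_is_3 = py_list == 3                   # compares the whole list with 3
contains_3 = 3 in py_list                        # is 3 one of the elements?
equals_3 = [x == 3 for x in py_list]             # check each element, like an array
bigger_than_3 = [x > 3 for x in py_list]

print(whole_list_is_3)   # False
print(contains_3)        # True
print(equals_3)          # [False, False, True, False, False]
print(bigger_than_3)     # [False, False, False, True, True]
```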

In [47]:
np_array == 3  

array([False, False,  True, False, False])

In [46]:
np_array > 3  

array([False, False, False,  True,  True])

Python treats False as 0 and True as 1  
so count how many Trues there are with `sum(array_name > number)`:

In [48]:
sum(np_array>3)

2

because 4 and 5 are > 3 (two Trues)
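A few equivalent ways to count the Trues, as a small sketch; `np.count_nonzero` and the `.sum()`/`.mean()` methods are standard NumPy, and taking the mean of a boolean array gives the *fraction* of Trues.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])
is_big = np_array > 3               # [False, False, False, True, True]

print(True + True)                  # 2, since True counts as 1
print(sum(is_big))                  # 2, base-Python sum()
print(is_big.sum())                 # 2, NumPy method
print(np.count_nonzero(is_big))     # 2, counts the True entries
print(is_big.mean())                # 0.4, fraction of elements greater than 3
```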

## Mathematics

**!!!** note the difference in syntax (`.` vs `()`) between base-Python functions and NumPy array methods  
Python: `sum()`, `max()`, `min()` (base Python function)  
Array: `.sum()`, `.max()`, `.min()`, `.mean()`, `.std()`

In [49]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [52]:
sum(py_list)

15

In [53]:
max(py_list)

5

In [54]:
min(py_list)

1

In [55]:
py_list.sum()   #WON'T WORK

AttributeError: 'list' object has no attribute 'sum'

*from tutor's comments:*  
This is probably because list methods are meant to **apply to any list**. A `.sum()` method would not make sense if your list were full of non-number objects, like strings.  
The list methods that Python does have include:  
`.append()`, `.extend()`, `.insert()`, `.remove()`, `.pop()`, `.index()`, `.count()`, `.sort()`, `.reverse()`, `.copy()`, `.clear()`
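A quick sketch of the first five list methods in action; each comment shows the list after that line runs.

```python
py_list = [1, 2, 3]

py_list.append(4)        # add one element at the end            -> [1, 2, 3, 4]
py_list.extend([5, 6])   # add every element of another list     -> [1, 2, 3, 4, 5, 6]
py_list.insert(0, 0)     # insert the value 0 at index 0          -> [0, 1, 2, 3, 4, 5, 6]
py_list.remove(3)        # remove the first element equal to 3    -> [0, 1, 2, 4, 5, 6]
last = py_list.pop()     # remove and return the last element (6) -> [0, 1, 2, 4, 5]

print(py_list, last)     # [0, 1, 2, 4, 5] 6
```

Note that these methods change the list in place instead of returning a new list (`.pop()` returns the removed element).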

----

In [56]:
np_array.sum()

15

In [57]:
np_array.max()

5

In [58]:
np_array.min()

1

In [59]:
np_array.mean()

3.0

In [60]:
np_array.std()

1.4142135623730951
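To see what `.mean()` and `.std()` compute, here is a sketch that reproduces them in base Python. NumPy's `.std()` divides by N by default (population standard deviation); passing `ddof=1` divides by N − 1 instead (sample standard deviation). NumPy also offers function forms such as `np.sum()` and `np.max()`.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Base-Python equivalents of .mean() and .std() (population standard deviation)
mean = sum(py_list) / len(py_list)
std = (sum((x - mean)**2 for x in py_list) / len(py_list)) ** 0.5

print(mean, np_array.mean())                  # 3.0 3.0
print(std, np_array.std())                    # both about 1.41421356

# Function forms of the same operations
print(np.sum(np_array), np.max(np_array))     # 15 5

# ddof=1 divides by N - 1, giving the sample standard deviation
print(np_array.std(ddof=1))                   # about 1.58113883
```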