<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

In [1]:
import numpy as np

# What to expect in this chapter

1. **Lists**
1. **Numpy arrays**
1. **Dictionaries**
1. **Tuples**
1. Dataframes
1. Classes

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

In [None]:
# Python Lists
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [None]:
# Numpy Arrays
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

In [None]:
# Dictionary
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

- Dictionaries store key-value pairs, with each key separated from its value by a colon (:).
- The dictionary very elegantly holds the real and superhero names in one structure, whereas we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters: ‘Iron Man’ must be in the same position as ‘Tony Stark’ for the pairing to work.
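
To make the point about ordering concrete, here is a small sketch of looking up Tony Stark's superhero name both ways. It reuses the data defined above; the variable names position, from_lists and from_dict are only illustrative.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# With two lists, first find the position in one list,
# then use the same position in the other list.
position = py_real_names.index("Tony Stark")
from_lists = py_super_names[position]

# With a dictionary, the key takes us straight to the value.
from_dict = superhero_info["Tony Stark"]

print(position, from_lists, from_dict)
```

If the two lists were ever reordered independently, the list lookup would silently return the wrong name, while the dictionary lookup would be unaffected.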

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we need to use an index corresponding to the data’s position.

Python is a zero-indexed language, meaning it starts counting at 0. 

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [3]:
py_real_names[0]

'Natasha Romanoff'

In [4]:
py_super_names[0]

'Black Widow'

Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.

In [None]:

py_super_names[-1]   # Reverse indexing

'Doctor Strange'

In [6]:
py_super_names[-2]

'Iron Man'

## 1.3 Accessing data from a dictionary

Dictionaries hold data (values) paired with a key, i.e. you can access a value by using its key instead of a position.

In [7]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}               

In [8]:
superhero_info["Natasha Romanoff"]

'Black Widow'

In [None]:
superhero_info.keys() # access all keys

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [10]:
superhero_info.values() # access all values

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
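
Keys and values can also be read together. The sketch below assumes the same superhero_info dictionary and uses the dictionary method items(), which yields each key with its value as a pair (a tuple); the names real_name, super_name and pairs are illustrative.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# Loop over every key-value pair in the dictionary
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# Collect the pairs into a list of tuples
pairs = list(superhero_info.items())
```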

## 1.4 Higher dimensional lists

Unlike with a dictionary, we needed two lists to store the corresponding real and superhero names. An obvious way around the need to have two lists is to have a 2D list (or array) as follows.

In [11]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
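
A short sketch of how data might be pulled out of this 2D structure: the first index picks the row and the second picks the item within that row. The array np_superhero_info is not part of the notes above; it is created here only to show the equivalent array syntax.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
np_superhero_info = np.array(py_superhero_info)

row = py_superhero_info[1]              # the whole second row
super_name = py_superhero_info[1][1]    # second row, second item
np_super_name = np_superhero_info[1, 1] # arrays also accept row, column

print(row)
print(super_name)
print(np_super_name)
```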

# 2 Lists vs. Arrays

## 2.1 Size

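The len() function tells us how many elements there are in a list or array. For a 2D list or array, it counts only the rows (the outermost elements); the shape of an array gives its size along every dimension.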

In [12]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [16]:
len(np_array_2d)


10

In [17]:
len(py_list_2d)


10

In [None]:
np_array_2d.shape # shape is an attribute, not a function, so no brackets are needed

(10, 2)

## 2.2 Arrays are fussy about type

Arrays insist on holding a single data type, whereas lists are more accommodating.

In [19]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [20]:
py_list

[1, 1.5, 'A']

In [21]:
np_array

array(['1', '1.5', 'A'], dtype='<U32')

- In the array, all the numbers have been converted to text (note the quotes ' '), because the array needs one type that can hold every element.
- When dealing with datasets that mix numbers and text, you must be mindful of this restriction.
- The type can be changed (typecast) using the array method astype().
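
As a minimal sketch of typecasting, the example below converts an array of number-like text into floats with astype(). The array np_text is made up for illustration; note that the conversion only works when every element looks like a number, so the mixed array above (which contains 'A') would raise a ValueError instead.

```python
import numpy as np

np_text = np.array(['1', '1.5', '2'])   # numbers stored as text
np_numbers = np_text.astype(float)      # returns a NEW array of floats

print(np_text.dtype)
print(np_numbers.dtype)
print(np_numbers + 1)                   # arithmetic now works
```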

## 2.3 Adding a number

In [22]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [None]:
np_array + 10 # 10 is added to each element

array([11, 12, 13, 14, 15])

In [24]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list

## 2.4 Adding another list

In [25]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [26]:
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [27]:
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

Adding two lists joins them into one longer list (concatenation), whereas adding two arrays adds their elements pair by pair.

## 2.5 Multiplying by a Number

In [28]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [30]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [29]:
np_array*2

array([ 2,  4,  6,  8, 10])

Multiplying by a number makes a list grow by repeating its contents, whereas an array multiplies each of its elements by the number.

## 2.6 Squaring

In [31]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [32]:
np_array**2

array([ 1,  4,  9, 16, 25])

In [33]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

You can square every element of an array in one go, but the same cannot be done directly with a list.
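
If the data has to stay in a list, one common workaround (not covered in the notes above) is a list comprehension that squares the elements one at a time. The name squared is illustrative.

```python
py_list = [1, 2, 3, 4, 5]

# Build a new list by squaring each element in turn
squared = [x**2 for x in py_list]

print(squared)   # [1, 4, 9, 16, 25]
```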

## 2.7 Asking questions

In [34]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [None]:
py_list == 3   # compares the whole list with 3; a list is never equal to an int, so always False

False

In [39]:
py_list > 3

TypeError: '>' not supported between instances of 'list' and 'int'

In [None]:
np_array == 3  # checks if each element == 3

array([False, False,  True, False, False])

In [None]:
np_array > 3  # check if each element > 3

array([False, False, False,  True,  True])
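
The True/False answers from an array can be put to use. The sketch below, with illustrative names mask, selected and count, keeps only the elements that pass the test and counts them; the last line shows one way to ask the same question of a list, one element at a time.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

mask = np_array > 3          # one True/False answer per element
selected = np_array[mask]    # keep only the elements where mask is True
count = mask.sum()           # True counts as 1, False as 0

# A list has to be asked element by element
list_answers = [x > 3 for x in py_list]

print(selected, count, list_answers)
```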

## 2.8 Mathematics

In [41]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [42]:
print(sum(py_list))    # sum() is a base Python function
print(max(py_list))     # max() is a base Python function
print(min(py_list))   # min() is a base Python function
print(np_array.sum())
print(np_array.max())
print(np_array.min())
print(np_array.mean())
print(np_array.std())

15
5
1
15
5
1
3.0
1.4142135623730951
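
A note on the last number: by default, the std() method of an array computes the population standard deviation (it divides by the number of elements). The sketch below compares it with the sample version, obtained by passing ddof=1, and with the matching functions from Python's built-in statistics module. The variable names are illustrative.

```python
import statistics
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

population_std = np_array.std()        # default: divide by N
sample_std = np_array.std(ddof=1)      # divide by N - 1

py_population_std = statistics.pstdev(py_list)
py_sample_std = statistics.stdev(py_list)

print(population_std, py_population_std)
print(sample_std, py_sample_std)
```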


## Footnotes

Arrays are better suited to mathematical work because operations such as +, * and comparisons apply to every element at once.