<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

*pandas: try later*

# What to expect in this chapter

### Multiple ways to store data
1. **Lists**
1. **Numpy arrays**
1. Dictionaries
1. **Tuples**
1. Dataframes
1. **Classes**

# 1 Lists, Arrays & Dictionaries

### Python lists

In [None]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

### Numpy array

In [None]:
import numpy as np   # Needed before np.array() can be used

np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

### Dictionary

In [None]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

## 1.1 Let’s compare

- Dictionaries use a key and an associated value separated by a colon (`:`).
- The dictionary very elegantly holds the real and superhero names in one structure while we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters, i.e. ‘Iron Man’ must be in the same position as ‘Tony Stark’ for things to work.
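
To make the comparison concrete, here is a minimal sketch of looking up Tony Stark's superhero name both ways. The names `position`, `from_lists` and `from_dict` are my own illustrative choices, not part of the original notes.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# With two lists we first find the position of the real name,
# then use that same position in the other list.
position = py_real_names.index("Tony Stark")   # hypothetical helper variable
from_lists = py_super_names[position]

# With a dictionary the key leads straight to the value.
from_dict = superhero_info["Tony Stark"]

print(from_lists, from_dict)
```

Both routes give the same answer, but the list version only works while the two lists stay in matching order.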

## 1.2 Accessing data from a list (or array)

*Lists and arrays start counting from zero.*

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
py_real_names[0]

'Natasha Romanoff'

In [3]:
py_super_names[2]    # Forward indexing 
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

In [4]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'

*Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.*
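
As a quick check of the note above, this small sketch compares forward and reverse indexing on the same list. The variable names `last_forward`, `last_reverse` and `second_last` are assumptions made for illustration.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

# Forward indexing needs the size: the last index is len(...) - 1.
last_forward = py_super_names[len(py_super_names) - 1]

# Reverse indexing does not need the size at all.
last_reverse = py_super_names[-1]
second_last = py_super_names[-2]

print(last_forward == last_reverse)
print(second_last)
```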

## 1.3 Accessing data from a dictionary

In [5]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [6]:
superhero_info["Natasha Romanoff"]

'Black Widow'

*Dictionaries hold data (values) paired with a key, i.e. you can access the value (in this case, the superhero name) by using the real name as the key.*

In [7]:
superhero_info.keys()     # Show all the keys

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [8]:
superhero_info.values()   # Show all the values

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
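
Beyond `.keys()` and `.values()`, dictionaries also have an `.items()` method that gives each key together with its value. It is not used in the notes above, so treat this as an optional sketch; `pairs` is an illustrative name.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# .items() yields (key, value) pairs, so both can be used in one loop.
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# Converting to a list makes the pairs easy to inspect.
pairs = list(superhero_info.items())
```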

## 1.4 Higher dimensional lists

In [9]:
# using a 2D list
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
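
A 2D list is a list of lists, so reading a single name takes two indices: the first picks the inner list, the second picks the item inside it. A minimal sketch, with `second_row`, `real_name` and `super_name` as illustrative names:

```python
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

second_row = py_superhero_info[1]        # The whole inner list
real_name = py_superhero_info[1][0]      # First item of that inner list
super_name = py_superhero_info[1][1]     # Second item of that inner list

print(second_row)
print(real_name, "->", super_name)
```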

# 2 Lists vs. Arrays

## 2.1 Size

In [12]:
import numpy as np

In [20]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

# 2D = 2 dimensions -> think of it as a matrix

In [22]:
np.array(py_list_2d)

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C'],
       ['4', 'D'],
       ['5', 'E'],
       ['6', 'F'],
       ['7', 'G'],
       ['8', 'H'],
       ['9', 'I'],
       ['10', 'J']], dtype='<U21')

In [14]:
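len(py_list_2d)      # Number of rows in the list (not displayed)
len(np_array_2d)     # Number of rows in the array (not displayed)
np_array_2d.shape    # Only this last expression is displayed

(10, 2)

*Notice the absence of brackets ( ) after shape above. This is because shape is not a function. Instead, it is a property or attribute of the NumPy array.*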

## 2.2 Arrays are fussy about type

In [16]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [18]:
py_list # integer, float, string

[1, 1.5, 'A']

In [19]:
np_array # Every element became a string; holding only one type of data is part of what lets arrays be fast

array(['1', '1.5', 'A'], dtype='<U32')

*Remember that NumPy arrays tolerate only a single type.*
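
One way to see which single type NumPy settled on is to inspect the array's `dtype`. The sketch below uses `dtype.kind`, a one-letter code (`'i'` for integer, `'f'` for float, `'U'` for Unicode string); the array names are illustrative assumptions.

```python
import numpy as np

only_ints = np.array([1, 2, 3])          # All integers stay integers
ints_and_float = np.array([1, 1.5])      # The integer is converted to a float
with_text = np.array([1, 1.5, 'A'])      # Everything is converted to a string

print(only_ints.dtype.kind)
print(ints_and_float.dtype.kind)
print(with_text.dtype.kind)
```

NumPy picks a type that can represent every element, which is why one string is enough to turn the whole array into strings.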

## 2.3 Adding a number

In [23]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [24]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list

In [25]:
np_array + 10 #add 10 to each number in the array

array([11, 12, 13, 14, 15])

## 2.4 Adding another list

In [26]:
py_list+[10]

[1, 2, 3, 4, 5, 10]

In [27]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [28]:
py_list_1 + py_list_2 # Concatenates the lists

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [29]:
np_array_1 + np_array_2 # Adds the numbers element by element

array([11, 22, 33, 44, 55])
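
If element-by-element addition is needed with plain lists, one common approach (my addition, not something covered in the notes) is a list comprehension with `zip()`. The name `elementwise` is illustrative.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# zip() pairs up items at the same position; we add each pair.
elementwise = [a + b for a, b in zip(py_list_1, py_list_2)]

# The NumPy version does the same thing with a plain +.
np_result = np.array(py_list_1) + np.array(py_list_2)

print(elementwise)
print(np_result)
```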

## 2.5 Multiplying by a Number

In [30]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [31]:
py_list*2 # The same list repeated twice; py_list**2 doesn't work

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [32]:
np_array*2 # Each number in the array is doubled

array([ 2,  4,  6,  8, 10])

## 2.6 Squaring

In [33]:
np_array**2

array([ 1,  4,  9, 16, 25])

## 2.7 Asking questions

In [34]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [35]:
py_list == 3     # Works, but what IS the question?

False

In [36]:
np_array == 3  

array([False, False,  True, False, False])

In [37]:
np_array > 3  

array([False, False, False,  True,  True])
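
The True/False array from a question like `np_array > 3` can be used as a mask to pull out the matching elements. This is a small sketch of that idea, alongside the list-comprehension equivalent for a plain list; `mask`, `selected`, `count` and `list_answers` are illustrative names.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# A plain list has to be asked element by element.
list_answers = [x > 3 for x in py_list]

# The array answers for every element at once.
mask = np_array > 3
selected = np_array[mask]   # Keep only the elements where mask is True
count = mask.sum()          # True counts as 1, so this counts the matches

print(list_answers)
print(selected, count)
```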

## 2.8 Mathematics

In [38]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [40]:
sum(py_list)     # sum() is a base Python function

15

In [41]:
max(py_list)     # max() is a base Python function

5

In [42]:
min(py_list)     # min() is a base Python function

1

In [43]:
np_array.sum()   # Same total as sum(np_array); timing with %timeit typically shows the NumPy method is faster, though a data set this small is too tiny for the difference to matter

np.int64(15)

*If a 2D list is converted to a NumPy array, np_array.sum() gives the sum of every number in the array. To sum along one direction only (e.g. each row or each column), use the axis argument in NumPy; see the Footnotes.*

In [44]:
np_array.max()

np.int64(5)

In [45]:
np_array.min()

np.int64(1)

In [46]:
np_array.mean()

np.float64(3.0)

In [47]:
np_array.std()

np.float64(1.4142135623730951)
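
To see where the value above comes from, this sketch recomputes the standard deviation by hand. By default `.std()` divides by the number of elements (the population form); passing `ddof=1` divides by one less (the sample form). The variable names are illustrative assumptions.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])

mean = np_array.mean()                          # 3.0
deviations = np_array - mean                    # Distance of each value from the mean
manual_std = np.sqrt((deviations**2).mean())    # Square, average, square root

sample_std = np_array.std(ddof=1)               # Divides by n - 1 instead of n

print(manual_std, np_array.std())
print(sample_std)
```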

## Footnotes

In [49]:
py_list = [[1, 1], [2, 1], [3, 1], [4, 1], [5, 1]]
np_array = np.array(py_list)

In [50]:
np_array.sum(axis=0)   # Sum down the rows: one total per column

array([15,  5])

In [51]:
np_array.sum(axis=1)   # Sum across the columns: one total per row

array([2, 3, 4, 5, 6])