<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# What to expect in this chapter

Different ways to store and manipulate data:
1. Lists
1. NumPy arrays
1. Dictionaries
1. Tuples
1. Dataframes
1. Classes

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

Python lists:

In [5]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

print(py_super_names)

['Black Widow', 'Iron Man', 'Doctor Strange']


Numpy arrays:

In [4]:
import numpy as np
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

print(np_real_names)

['Natasha Romanoff' 'Tony Stark' 'Stephen Strange']


When printed, a Python list shows its elements separated by commas, while a NumPy array shows them separated only by spaces.

Dictionaries: 

In [None]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}


Dictionaries store key-value pairs, with each key separated from its value by a colon (`:`).

A dictionary holds the real names and the superhero names together in one structure. With lists (or arrays), we need two separate structures for the same data, and matching names must sit at the same positions in both.

In this chapter, 'arrays' refers to NumPy arrays and 'lists' refers to Python lists.
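To see how the two representations relate, here is a small sketch that builds the same dictionary from the two parallel lists using the built-in `zip()` and `dict()` functions. It only produces the correct pairs because the two lists are in matching order.

```python
# Two parallel lists: item i of one list belongs with item i of the other
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# zip() pairs up items that share a position;
# dict() turns each (real name, superhero name) pair into a key: value entry
superhero_info = dict(zip(py_real_names, py_super_names))

print(superhero_info)
```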

## 1.2 Accessing data from a list (or array)

Python is a zero-indexed language, meaning it starts counting at 0. So if you want to access a particular element in the list (or array), you need to specify the relevant index starting from zero.

![](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/python-zero-indexed-counting.png)

The following code outputs `'Natasha Romanoff'` because she is at the first position (index `0`) of `py_real_names`.

In [1]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

py_real_names[0]

'Natasha Romanoff'

In [2]:
py_super_names[2]

'Doctor Strange'

A negative index counts from the back of the list: index `-1` gives the last element, `-2` the second-to-last, and so on. This is useful because the length of a list may change, and `-1` always gives the last element regardless of how long the list is.
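A short sketch of why `-1` is convenient: it is equivalent to `len(...) - 1`, and it keeps pointing at the last element after the list grows. The extra name `"Captain Marvel"` is only an illustrative addition, not part of the original data.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

# -1 refers to the same position as len(py_super_names) - 1
print(py_super_names[-1] == py_super_names[len(py_super_names) - 1])  # True

# Illustrative only: grow the list by one element
py_super_names.append("Captain Marvel")

# -1 now refers to the new last element; -2 to the one before it
print(py_super_names[-1])  # Captain Marvel
print(py_super_names[-2])  # Doctor Strange
```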

In [3]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'

## 1.3 Accessing data from a dictionary

Dictionaries hold data (values), each paired with a key. Here, you can access a superhero name (the value) by using the real name as the key:

In [4]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [5]:
superhero_info["Natasha Romanoff"]

'Black Widow'

It is possible to access all the keys and all the values as follows:

In [6]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [7]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
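The `dict_keys` and `dict_values` objects above are views rather than ordinary lists. A minimal sketch of two common follow-ups: converting them to lists with `list()` so they can be indexed, and looping over keys and values together with `.items()`. It assumes Python 3.7 or later, where dictionaries keep their insertion order.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# list() turns the views into ordinary, indexable lists
real_names = list(superhero_info.keys())
super_names = list(superhero_info.values())
print(real_names[0])  # Natasha Romanoff

# .items() yields (key, value) pairs, so both can be unpacked in one loop
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")
```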

## 1.4 Higher dimensional lists

A list of lists (in each inner list, element 0 is the real name and element 1 is the superhero name):

In [8]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
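To read from a list of lists, you can chain two indices: the first selects an inner list (a "row") and the second selects an item inside it. A brief sketch using the same data:

```python
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

# First index picks the inner list, second index picks the item within it
print(py_superhero_info[1])      # ['Tony Stark', 'Iron Man']
print(py_superhero_info[1][0])   # Tony Stark
print(py_superhero_info[-1][1])  # Doctor Strange (negative indices work too)
```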

# 2 Lists vs. Arrays

## 2.1 Size

The `len()` function works for both lists and arrays.

In [10]:
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [12]:
len(py_list_2d)

10

In [13]:
len(np_array_2d)

10

In [14]:
np_array_2d.shape

(10, 2)

Notice the absence of parentheses `()` after `shape` above. This is because `shape` is not a function; it is a property (attribute) of the NumPy array. For a 2D array, `len()` returns only the number of rows, which is the first number in `shape`.
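A small sketch of how `len()` and `shape` relate for a 2D array, using a shortened version of `py_list_2d` (three rows instead of ten). It also uses the `ndim` attribute, which gives the number of dimensions; like `shape`, it is accessed without parentheses.

```python
import numpy as np

# Shortened version of py_list_2d, for illustration
py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

# shape is a tuple (rows, columns), so it can be unpacked directly
rows, cols = np_array_2d.shape
print(rows, cols)  # 3 2

# len() counts only along the first dimension (the rows)
print(len(np_array_2d) == np_array_2d.shape[0])  # True

# ndim is another attribute: the number of dimensions
print(np_array_2d.ndim)  # 2
```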

## 2.2 Arrays are fussy about type

One prominent difference between lists and arrays is that an array insists on a single data type, whereas a list can mix types freely. In the following example, notice how the numbers are converted to strings (shown inside quotes `' '`) when we create the NumPy array.

In [15]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [16]:
py_list

[1, 1.5, 'A']

In [17]:
np_array

array(['1', '1.5', 'A'], dtype='<U32')

All the elements have been converted to strings (`dtype='<U32'` means Unicode strings of up to 32 characters) because an array can hold only one data type.

However, this is an annoyance rather than a problem, as we can easily change the type (typecast) using the array's `astype()` method (covered in a later chapter).
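As a preview, here is a minimal sketch of `astype()` converting an array of numeric strings into floating-point numbers. The `'A'` from the example above is deliberately left out: a string like `'A'` cannot be converted to a number, so `astype(float)` would raise a `ValueError` for it.

```python
import numpy as np

# Numbers that have ended up stored as strings
np_strings = np.array(["1", "1.5", "2"])
print(np_strings.dtype)  # <U3 (strings of up to 3 characters)

# astype() returns a NEW array with the requested type;
# np_strings itself is unchanged
np_numbers = np_strings.astype(float)
print(np_numbers)
print(np_numbers.dtype)  # float64
```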

## 2.3 Adding a number

In [18]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

The following code adds 10 to every element of the array.

In [19]:
np_array + 10

array([11, 12, 13, 14, 15])

The same idea does not work with `py_list + 10`. For lists, `+` means concatenation, and a list can only be concatenated with another list (same type to same type).

In [20]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list
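If you do want to add 10 to every element of a list, one option (a sketch, not the only way) is a list comprehension, which visits each element and builds a new list. The second line shows that `+` does work when the other operand is also a list, but it concatenates rather than adds.

```python
py_list = [1, 2, 3, 4, 5]

# Build a new list by adding 10 to each element in turn
py_plus_10 = [x + 10 for x in py_list]
print(py_plus_10)      # [11, 12, 13, 14, 15]

# + between two lists joins them; it does not add element-wise
print(py_list + [10])  # [1, 2, 3, 4, 5, 10]
```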

## 2.4 Adding another list

In [21]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [22]:
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [23]:
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

So, adding lists joins them into a longer list, while adding arrays is an element-wise operation. Let's see what happens when we add two arrays of different lengths:

In [24]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50, 60, 70, 80]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)
np_array_1 + np_array_2

ValueError: operands could not be broadcast together with shapes (5,) (8,) 

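This fails because the shapes `(5,)` and `(8,)` are not compatible: element-wise addition of two 1D arrays requires them to have the same length. More generally, NumPy requires the shapes to be compatible under its broadcasting rules; adding a single number, as in `np_array + 10`, is one such compatible case.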
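A sketch of two ways around the error, under the assumption that using only the first five elements of the longer array is acceptable: slicing (`[:5]`) makes the lengths match. The second part illustrates that shapes only need to be *compatible*, not identical: comparing shapes from the right, each pair of sizes must be equal or one of them must be 1, so a `(3, 1)` column and a `(4,)` row broadcast to `(3, 4)`. The arrays `col` and `row` are made up for illustration.

```python
import numpy as np

np_array_1 = np.array([1, 2, 3, 4, 5])
np_array_2 = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# Use only the first 5 elements of the longer array so the shapes match
print(np_array_1 + np_array_2[:5])  # [11 22 33 44 55]

# Illustrative broadcasting: a (3, 1) column plus a (4,) row gives (3, 4)
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
print((col + row).shape)  # (3, 4)
```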

## 2.5 Multiplying by a Number

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [25]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [27]:
np_array*2

array([ 2,  4,  6,  8, 10])

As before, arithmetic on an array is applied element-wise, whereas multiplying a list by an integer repeats the whole list and leaves the individual elements unchanged.
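A quick sketch confirming that `py_list * 2` is just the list added to itself, and showing one way (a list comprehension) to double each element of a list instead:

```python
py_list = [1, 2, 3, 4, 5]

# Multiplying by an integer n repeats the list n times
print(py_list * 2 == py_list + py_list)  # True

# To double each element, loop over the list
py_doubled = [2 * x for x in py_list]
print(py_doubled)  # [2, 4, 6, 8, 10]
```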

## 2.6 Squaring

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [28]:
np_array**2


array([ 1,  4,  9, 16, 25])

In [29]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

Raising a list to a power is not defined in Python, so this raises a `TypeError`.
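Two ways to get the squares when the data starts out as a list: square each element with a list comprehension, or convert the list to a NumPy array first and let `**` work element-wise. A minimal sketch:

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]

# Option 1: stay with lists and square each element
py_squares = [x**2 for x in py_list]
print(py_squares)  # [1, 4, 9, 16, 25]

# Option 2: convert to an array, where ** is element-wise
np_squares = np.array(py_list)**2
print(np_squares)
```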

## 2.7 Asking questions

In [30]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

For a list, `==` compares the whole list with `3`. A list is never equal to an integer, so the result is a single `False` rather than an element-by-element answer.

In [34]:
py_list == 3 

False

For an array, `==` compares each element with 3 and returns an array of `True`/`False` values.

In [32]:
np_array == 3 

array([False, False,  True, False, False])

Similarly, the following checks each element and returns `True` wherever the element is greater than 3:

In [33]:
np_array > 3  

array([False, False, False,  True,  True])
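A short sketch of what these answers are useful for. For a list, the `in` operator asks whether a value appears anywhere. For an array, the array of `True`/`False` values can be used inside square brackets to keep only the elements where the answer is `True` (Boolean indexing), and summing it counts the `True` values.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# For a list: is 3 anywhere in it?
print(3 in py_list)  # True

# For an array: keep only the elements greater than 3
mask = np_array > 3
print(np_array[mask])

# True counts as 1 and False as 0, so sum() counts the matches
print(mask.sum())  # 2
```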

## 2.8 Mathematics

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [36]:
print(sum(py_list))     # sum() is a base Python function
print(max(py_list))    # max() is a base Python function
print(min(py_list))     # min() is a base Python function
print(np_array.sum())
print(np_array.max())
print(np_array.min())
print(np_array.mean())
print(np_array.std())

15
5
1
15
5
1
3.0
1.4142135623730951
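To see where the value `1.4142135623730951` comes from, here is a sketch that computes the standard deviation by hand: the square root of the mean of the squared deviations from the mean. By default, `np_array.std()` divides by N (the population standard deviation); passing `ddof=1` divides by N - 1 instead (the sample standard deviation).

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])

mean = np_array.sum() / len(np_array)  # 3.0

# Deviations are -2, -1, 0, 1, 2; their squares average to 2.0
std_manual = np.sqrt(((np_array - mean) ** 2).mean())
print(std_manual)  # 1.4142135623730951

# Agrees with the built-in method (default divides by N)
print(np.isclose(std_manual, np_array.std()))  # True

# Sample standard deviation divides by N - 1 instead
print(np_array.std(ddof=1))
```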


# Exercises & Self-Assessment

In [None]:



# Your solution here




## Footnotes