<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# What to expect in this chapter

In [3]:
import numpy as np

It is essential to have ways to store and manipulate data easily and efficiently beyond the simple variables we have encountered so far. You have already met the list and dictionary in a previous chapter. However, there are several more; here is a (non-comprehensive) list.

1. Lists
2. Numpy arrays
3. Dictionaries
4. Tuples
5. Dataframes
6. Classes

I cannot overemphasize how important it is for you to understand how to store, retrieve and modify data in programming. These abstract structures shape how you think about data, and this will ultimately aid (or hinder) your ability to conjure up algorithms to solve problems.

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

Python Lists

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

NumPy Arrays

In [3]:
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

Dictionary

In [4]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

- Dictionaries use a key and an associated value separated by a colon (:).
- The dictionary very elegantly holds the real and superhero names in one structure, while we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters. That is, ‘Iron Man’ must be in the same position as ‘Tony Stark’ for things to work.
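Because the two lists stay in sync only through their positions, one common way to turn them into a dictionary is to pair them with Python's built-in zip(). A minimal sketch, reusing the list names above:

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# zip() pairs up items that share the same position, so this only
# gives the right answer if both lists are in the same order
superhero_info = dict(zip(py_real_names, py_super_names))

print(superhero_info["Tony Stark"])  # Iron Man
```

If one list were shuffled, the dictionary would silently pair the wrong names, which is exactly why the order matters.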

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we need to use an index corresponding to the data’s position. Python is a zero-indexed language, meaning it starts counting at 0. So if you want to access a particular element in the list (or array), you need to specify the relevant index starting from zero. 

In [5]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [6]:
py_real_names[0]

'Natasha Romanoff'

In [7]:
py_super_names[0]

'Black Widow'

Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.

In [8]:
py_super_names[2]    # Forward indexing 
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

In [9]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'
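For a list of any length, index -1 is shorthand for len(list) - 1. A quick check, plus what happens when an index runs past the end (the variable names last_forward and last_reverse are just for illustration):

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

last_forward = py_super_names[len(py_super_names) - 1]  # index 2
last_reverse = py_super_names[-1]                        # same element
print(last_forward == last_reverse)  # True

# Valid indices run from 0 to len - 1 (or -len to -1); anything else fails
try:
    py_super_names[3]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```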

## 1.3 Accessing data from a dictionary

Dictionaries hold data (values) paired with a key. That is, you can access the value (in this case, the superhero name) using the real name as a key. Here is how it works:

In [10]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [11]:
superhero_info["Natasha Romanoff"]

'Black Widow'

If you want, you can access all the keys and all the values as follows:

In [12]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [13]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
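Two related dictionary methods are often handy: .items() returns each key and value together, and .get() returns a default instead of raising a KeyError when the key is missing. A short sketch ("Bruce Banner" and "Unknown" are just example inputs):

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# .items() gives (key, value) pairs, handy for looping over both at once
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# superhero_info["Bruce Banner"] would raise a KeyError;
# .get() returns the supplied default instead
print(superhero_info.get("Bruce Banner", "Unknown"))  # Unknown
```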

## 1.4 Higher dimensional lists

Unlike with a dictionary, we needed two lists to store the corresponding real and superhero names. An obvious way around the need to have two lists is to have a 2D list (or array) as follows.

In [14]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
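To read from a 2D list, we index twice: first the row, then the position within that row. A NumPy array built from the same list also accepts a single [row, column] index. A sketch (np_superhero_info is a name chosen for illustration):

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

# Row 1 is Tony Stark's pair; position 0 is the real name, 1 the superhero name
print(py_superhero_info[1][0])   # Tony Stark
print(py_superhero_info[1][1])   # Iron Man

np_superhero_info = np.array(py_superhero_info)
print(np_superhero_info[1, 1])   # Iron Man
print(np_superhero_info[:, 1])   # every superhero name (column 1)
```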

# 2 Lists vs. Arrays

Lists and arrays have some similarities but more differences.

## 2.1 Size

To find out how many elements a list or array holds, we can use the len() function for both. However, arrays also offer other options, such as the shape attribute.

In [4]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [5]:
len(py_list_2d)  # len() is a function; it counts the top-level elements (the 10 inner lists)

10

In [6]:
len(np_array_2d)  # For arrays, len() gives the size of the first axis (the rows)

10

In [8]:
np_array_2d.shape  # Notice the absence of parentheses ( ) after shape.
                   # shape is not a function; it is a property (attribute) of the NumPy array.

(10, 2)
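Two other array attributes are worth knowing: ndim (the number of dimensions) and size (the total number of elements). Comparing them with len() makes clear that len() only counts along the first axis. A sketch:

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)

print(len(np_array_2d))   # 10: rows only (first axis)
print(np_array_2d.shape)  # (10, 2): rows and columns
print(np_array_2d.ndim)   # 2: number of dimensions
print(np_array_2d.size)   # 20: total number of elements (10 x 2)
```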

## 2.2 Arrays are fussy about type

One prominent difference between lists and arrays is that arrays insist on having only a single data type; lists are more accommodating. 

Consider the following example and notice how the numbers are converted to strings (note the quotes ' ') when we create the NumPy array.



In [9]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [10]:
py_list  #for lists

[1, 1.5, 'A']

In [12]:
np_array  #for arrays
          #Remember that NumPy arrays tolerate only a single type

array(['1', '1.5', 'A'], dtype='<U32')
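NumPy picks a single type that can hold every element. With only numbers, it upcasts integers to floats rather than to strings. A sketch that inspects the dtype attribute (the variable names are illustrative):

```python
import numpy as np

ints_and_floats = np.array([1, 1.5, 2])
print(ints_and_floats)        # the 1 and 2 become floats
print(ints_and_floats.dtype)  # float64

mixed = np.array([1, 1.5, 'A'])
print(mixed.dtype.kind)       # U: Unicode string
print(type(mixed[0]))         # a NumPy string type, not int
```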

## 2.3 Adding a number

In [13]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [14]:
py_list + 10        # Won't work for lists

TypeError: can only concatenate list (not "int") to list

In [15]:
np_array + 10

array([11, 12, 13, 14, 15])
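If you do need to add a number to every element of a plain list, one idiomatic option is a list comprehension, which builds a new list one element at a time. A sketch for comparison (py_list_plus_10 is an illustrative name):

```python
py_list = [1, 2, 3, 4, 5]

# Build a new list by adding 10 to each element in turn
py_list_plus_10 = [x + 10 for x in py_list]
print(py_list_plus_10)  # [11, 12, 13, 14, 15]
```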

## 2.4 Adding another list

In [16]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [17]:
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [18]:
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

So, adding lists joins (concatenates) them into a longer list, while adding arrays adds corresponding elements mathematically.
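To get the array-style, element-by-element sum with plain lists, one approach is to pair the elements with zip() inside a list comprehension. A sketch; note that zip() silently stops at the shorter list if the lengths differ, whereas NumPy would complain about mismatched shapes:

```python
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# zip() pairs the elements that share a position
elementwise_sum = [a + b for a, b in zip(py_list_1, py_list_2)]
print(elementwise_sum)  # [11, 22, 33, 44, 55]
```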

## 2.5 Multiplying by a number

In [19]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [20]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [21]:
np_array*2

array([ 2,  4,  6,  8, 10])

So multiplying a list by a number repeats it (the list grows), whereas multiplying an array multiplies each of its elements by the number!
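Because list "multiplication" is really repetition, it only accepts whole numbers. Multiplying by a float makes the difference obvious. A sketch (raised and scaled are illustrative names):

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

raised = False
try:
    py_list * 2.5           # Won't work: a list can't be repeated 2.5 times
except TypeError as err:
    raised = True
    print("TypeError:", err)

scaled = np_array * 2.5     # Each element is multiplied by 2.5
print(scaled)
```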

## 2.6 Squaring

In [23]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [24]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [25]:
np_array**2

array([ 1,  4,  9, 16, 25])
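A common slip when squaring is to write ^, which in Python (and NumPy) is bitwise XOR, not a power. A sketch comparing the options, with np.square() as NumPy's named equivalent of **2 and a list comprehension for the plain list:

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

squared = np_array**2                 # element-wise power
also_squared = np.square(np_array)    # same result via a NumPy function
not_squared = np_array ^ 2            # bitwise XOR, NOT squaring!

print(squared)       # [ 1  4  9 16 25]
print(not_squared)   # [3 0 1 6 7]

# For a plain list, square each element with a list comprehension
py_squared = [x**2 for x in py_list]
print(py_squared)    # [1, 4, 9, 16, 25]
```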

## 2.7 Asking questions

In [26]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [27]:
py_list == 3     # Works, but what IS the question?

False

In [28]:
np_array == 3  

array([False, False,  True, False, False])

In [29]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'

In [30]:
np_array > 3  

array([False, False, False,  True,  True])
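The list comparison py_list == 3 asks whether the whole list equals the number 3, which it never does. The array comparison asks the question of each element, and the resulting True/False array can be used as a mask to pick out matching elements. A sketch (mask is an illustrative name):

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# For a list, "is 3 one of the elements?" is asked with the in operator
print(3 in py_list)              # True

# For an array, a boolean mask selects only the elements where it is True
mask = np_array > 3
print(np_array[mask])            # [4 5]

# Filtering a list needs a comprehension instead
print([x for x in py_list if x > 3])  # [4, 5]
```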

## 2.8 Mathematics

In [31]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [32]:
sum(py_list)     # sum() is a base Python function

15

In [33]:
np_array.sum()

15

In [34]:
max(py_list)     # max() is a base Python function

5

In [35]:
np_array.max()

5

In [36]:
min(py_list)     # min() is a base Python function

1

In [37]:
np_array.min()

1

In [38]:
py_list.sum()   # Won't work!

AttributeError: 'list' object has no attribute 'sum'

In [39]:
np_array.mean()

3.0

In [40]:
np_array.std()

1.4142135623730951
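By default, np_array.std() computes the population standard deviation (it divides by N, i.e. ddof=0). A sketch checking it against the formula and against the standard library's statistics module for a plain list (by_formula is an illustrative name):

```python
import statistics
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Population standard deviation: square root of the mean squared deviation
by_formula = np.sqrt(((np_array - np_array.mean())**2).mean())
print(np.isclose(by_formula, np_array.std()))                  # True

# statistics.pstdev() is the base-Python equivalent for lists
print(np.isclose(statistics.pstdev(py_list), np_array.std()))  # True

# ddof=1 gives the sample standard deviation (divides by N - 1)
print(np_array.std(ddof=1))
```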