<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

![](https://imgs.xkcd.com/comics/2018_cve_list.png)

Figure 1: From [xkcd](https://xkcd.com/).

**Before Proceeding**

- You need to load the NumPy package to use NumPy arrays. You can import it using an alias as follows:
  
  `import numpy as np`

# What to expect in this chapter

- Learn to store and manipulate data easily and efficiently, going beyond the simple variables we have encountered so far (for example, with **lists** and **dictionaries**).
- Here are several more ways to store and manipulate data:
  1. **Lists**
  2. **NumPy arrays**
  3. **Dictionaries**
  4. **Tuples**
  5. Dataframes
  6. Classes
 
**Lists, NumPy arrays, dictionaries and tuples** will be discussed in this chapter.

**Important**

How you store, retrieve and modify data in programming will influence how you think about data.

This will ultimately aid (or hinder) your ability to conjure up algorithms to solve problems.

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

Here are three ways of storing the same data, using `lists`, `arrays` and `dictionaries`.

**Python Lists**

In [None]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

**NumPy Arrays**

In [2]:
import numpy as np
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

**Dictionary**

In [None]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

Notice:

- Dictionaries use a **key** and an associated **value** separated by a `:`
- The **dictionary** very elegantly holds the real and superhero names in *one structure* while we need **two lists (or arrays)** for *the same data*.
- For lists and arrays, the order matters. i.e. ‘Iron Man’ must be in the same position as ‘Tony Stark’ for things to work.
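
As a rough illustration of the last two points, the sketch below looks up the superhero name for 'Tony Stark' in both ways. The names `position`, `from_lists` and `from_dict` are made up for this example, and the square-bracket indexing it uses is explained in the next section.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# With two lists, first find the position of the real name,
# then use the same position in the other list.
position = py_real_names.index("Tony Stark")
from_lists = py_super_names[position]

# With a dictionary, the key leads directly to the value.
from_dict = superhero_info["Tony Stark"]

print(from_lists, from_dict)
```

If the two lists ever fall out of step (say, one is sorted and the other is not), the list version silently returns the wrong name, while the dictionary keeps each pair together.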

Lists (and arrays) offer many features that dictionaries don’t and vice versa.

Which data storage strategy to choose will depend on the problem you are trying to solve.

**Remember**

Three (3) basic ways of storing data:

1. Lists
2. NumPy arrays
3. Dictionaries.

**Notes**:

- `py` and `np` in front of the variable are added just for clarity. Actually, you can use any name for the variables provided that they are not a Python keyword like `for`, `if`, `while`, `in`, `try`, `except`, `or`, `not`, `elif`, `else`, etc.
- **`Arrays`** is short for **`NumPy arrays`**.
- **`Lists`** is short for **`Python lists`**.

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we need to use an **index operator** corresponding to the data’s position. Python is a **zero-indexed language**, meaning it **starts counting at 0**. 

Here is **a way to remember** it:

Python was invented by Guido van Rossum in the Netherlands, a country in Europe. In Europe, the floors of a hotel are numbered starting from 0, not 1. Likewise, Python's index starts at 0, not 1.

In conclusion, if you want to access a particular element in the list (or array), you need to specify the relevant index starting from zero. The image below shows the relationship between the position and index.

![](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/python-zero-indexed-counting.png)

In [3]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

Example 1

In [4]:
py_real_names[0]

'Natasha Romanoff'

The output will be `'Natasha Romanoff'`

Example 2

In [None]:
py_super_names[0]

The output will be `'Black Widow'`

We can also use negative indexing to count from the back of the list. 

To access the last element of the list, use the index `[-1]`. To access the second-last element, use `[-2]`, and so on.

This is super useful because we can easily access the last element without knowing the list size. If we use forward indexing, we need to know the size of the list.

Example 3

In [5]:
py_super_names[2]    # Forward indexing 
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

The output will be `'Doctor Strange'`

In [9]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'

The output will be `'Doctor Strange'`

Additionally, we can retrieve all the elements of a list using `[:]`, as in the following example:

In [10]:
py_super_names[:]

['Black Widow', 'Iron Man', 'Doctor Strange']

This will give `['Black Widow', 'Iron Man', 'Doctor Strange']` as the output.

We can also select more than one element.

Example:

In [11]:
py_super_names[0:2]

['Black Widow', 'Iron Man']

This will give `['Black Widow', 'Iron Man']` as an output.

Note that `[0:2]` means 'I want the elements from index 0 up to, but not including, index 2'. So, you will get 'Black Widow' (index 0) and 'Iron Man' (index 1), but not 'Doctor Strange' (index 2).

If you want 'Iron Man' and 'Doctor Strange' instead, you can use this code:

In [14]:
py_super_names[1:]

['Iron Man', 'Doctor Strange']

Other equivalent code to produce 'Iron Man' and 'Doctor Strange':

In [15]:
py_super_names[1:3]

['Iron Man', 'Doctor Strange']

In [17]:
py_super_names[1:5]

['Iron Man', 'Doctor Strange']

**Fun Fact**

You can even write code like this:

`py_super_names[1:1000]`

It will still give you the same output, which is `['Iron Man', 'Doctor Strange']`.

The `1000` at the end of the slice asks for elements up to, but not including, index 1000. Because the last element in this list has index 2, Python simply ignores the rest and returns everything up to the last element we have.

In [16]:
py_super_names[1:1000]

['Iron Man', 'Doctor Strange']
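
This forgiving behaviour applies to slices only. As a small sketch (the variable `raised_error` is just a helper for this example), asking for a single element at an index that does not exist raises an `IndexError`:

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

# A slice that runs past the end is quietly shortened.
print(py_super_names[1:1000])

# A single index past the end is an error.
try:
    py_super_names[1000]
    raised_error = False
except IndexError as error:
    raised_error = True
    print("IndexError:", error)
```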

**Remember:**

Data in lists (and arrays) must be accessed using a *zero-based index*.

# Additional Information

## You can also use the indexing concept for strings.

Example 1

![](https://trainingcommit.notion.site/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fd7554274-50bc-48a5-80fc-184ad0e186df%2F6110e792-97b4-4d5a-ae9b-3cd668d58811%2FUntitled.png?table=block&id=c84987b8-e920-4701-a01c-9e59ad240ee2&spaceId=d7554274-50bc-48a5-80fc-184ad0e186df&width=860&userId=&cache=v2)

Consider the string 'probe'!

To access 'e' which has an index of 4, we can use the code below:

In [21]:
a = 'probe'
print(a[4])

e


Again, to access the same letter 'e', we can use the negative indexing code below:

In [22]:
print(a[-1])

e


Example 2

We use the same string as in *Example 1*.

We can also slice strings. For example:

In [23]:
print(a[2:4])

ob


Again, the index 4 means 'until, but not including 4'.

## You can know the length of a string or a list using the `len()` function

Example for a string

In [29]:
print(len(a))

5


Note:

Although the indexing of `a` stops at index 4, its length is still 5, because counting starts at index 0. The same applies to lists.

Example for a list

In [30]:
print(len(py_super_names))

3


## 1.3 Accessing data from a dictionary

Dictionaries hold data (values) paired with a key, i.e., you can access the value (in this case, the superhero name) using the real name as a key. Here is how it works:

In [24]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [25]:
superhero_info["Natasha Romanoff"]

'Black Widow'

This will give `'Black Widow'` as an output.

**Remember**

Remember that dictionaries have a key-value structure.

If you want, you can access all the keys and all the values as follows:

In [26]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [27]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
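
Two related tools may be handy here; the sketch below is only an illustration. `items()` gives each key together with its value, and `in` or `get()` lets you deal with a key that might be missing. The key `"Peter Parker"` and the fallback text `"Unknown"` are made up for this example.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# items() yields each key together with its value.
for real_name, super_name in superhero_info.items():
    print(real_name, "is", super_name)

# superhero_info["Peter Parker"] would raise a KeyError,
# so check with `in` first, or use get() with a fallback value.
is_known = "Peter Parker" in superhero_info
fallback = superhero_info.get("Peter Parker", "Unknown")
print(is_known, fallback)
```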

## 1.4 Higher dimensional lists

Unlike with a dictionary, we needed *two lists* to store the corresponding real and superhero names. An obvious way around the need to have two lists is to have a **2D list (or array)** as follows.

In [28]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

This list pairs `'Natasha Romanoff'` with `'Black Widow'`, `'Tony Stark'` with `'Iron Man'`, and `'Stephen Strange'` with `'Doctor Strange'`.
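
To get data out of a 2D list, you index twice: first the row, then the position within that row. The sketch below shows this, along with the NumPy version, which also accepts both indices inside one pair of square brackets. The name `np_superhero_info` is introduced here only for illustration.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
np_superhero_info = np.array(py_superhero_info)

print(py_superhero_info[1])        # The whole second row
print(py_superhero_info[1][1])     # Second row, second element
print(np_superhero_info[1, 1])     # Same element from the NumPy array
```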

# 2 Lists vs. Arrays

Lists and arrays have some *similarities* but **more differences**. It is important to know these differences to make full use of them.

## 2.1 Size

Often, you need to know how many elements there are in lists or arrays. We can use the `len()` function for this purpose for both lists and arrays. However, **arrays also offer other options**.

In [31]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [35]:
len(py_list_2d)

10

In [36]:
len(np_array_2d)

10

In [37]:
np_array_2d.shape

(10, 2)

Notice the absence of brackets `( )` in `shape` above. This is because `shape` is **not** a function. Instead, it is a property or *attribute* of the NumPy array.
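
Besides `shape`, NumPy arrays have a few other size-related attributes. The sketch below uses a shorter array than the one above (a made-up three-row example) to show `ndim` and `size` alongside `shape`.

```python
import numpy as np

np_small_2d = np.array([[1, "A"], [2, "B"], [3, "C"]])

print(np_small_2d.shape)   # (number of rows, number of columns)
print(np_small_2d.ndim)    # Number of dimensions
print(np_small_2d.size)    # Total number of elements
```

Like `shape`, both `ndim` and `size` are attributes, so they are written without `( )`.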

## 2.2 Arrays are fussy about type

Recall : [data types](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/1_storing-data_need.html#sec-basic-good-type) (e.g., `int`, `float`, `str`).

One prominent difference between lists and arrays is that **arrays** insist on having only a **single data type**; **lists** are **more accommodating**. 

Consider the following example and notice how the numbers are converted to text, i.e., strings (shown in quotes, `' '`), when we create the NumPy array.

In [39]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

**Lists**

In [41]:
py_list

[1, 1.5, 'A']

**Arrays**

In [42]:
np_array

array(['1', '1.5', 'A'], dtype='<U32')

The output will be `array(['1', '1.5', 'A'], dtype='<U32')`.

When dealing with datasets with both numbers and text, you must be mindful of this restriction. However, this is just an annoyance and not a problem, as we can easily change the type (typecast) using the ‘hidden’ function `astype()`.
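
Here is a minimal sketch of typecasting with the NumPy array method `astype()`. The array `np_text` is a made-up example that contains only numbers stored as text; an entry such as `'A'` cannot be turned into a number and would raise a `ValueError`.

```python
import numpy as np

np_text = np.array(['1', '1.5', '2'])    # Numbers stored as text
np_numbers = np_text.astype(float)       # Typecast the text to floats

print(np_text.dtype)
print(np_numbers)
print(np_numbers + 10)                   # Arithmetic now works
```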

**Remember**:

Remember that NumPy arrays tolerate only a single type.

## 2.3 Adding a number

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [None]:
np_array + 10

**Lists**

In [None]:
py_list + 10        # Won't work!

**Arrays**

`array([11, 12, 13, 14, 15])`
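
For completeness, here is a sketch of what happens with the list and one common way to get the same result without NumPy. The list comprehension and the name `py_list_plus_10` are additions for illustration only.

```python
py_list = [1, 2, 3, 4, 5]

# Adding a number to a list raises a TypeError.
try:
    py_list + 10
except TypeError as error:
    print("TypeError:", error)

# A list comprehension adds 10 to the elements one by one.
py_list_plus_10 = [x + 10 for x in py_list]
print(py_list_plus_10)
```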

## 2.4 Adding another list

In [None]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [None]:
py_list_1 + py_list_2
np_array_1 + np_array_2

**Lists**

`[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]`

**Arrays**

`array([11, 22, 33, 44, 55])`

# ***Conclusion***:

**Adding lists** causes them to *grow* while **adding arrays** is an *element-wise operation*.
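
If you do want element-wise addition with plain lists, one possible approach (an illustration, not the only way) is to pair up the elements with the built-in `zip()` function. The name `py_sums` is made up for this example.

```python
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# zip() pairs the first elements, then the second elements, and so on.
py_sums = [a + b for a, b in zip(py_list_1, py_list_2)]
print(py_sums)
```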

## 2.5 Multiplying by a Number

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [None]:
py_list*2
np_array*2

**Lists**

`[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]`

**Arrays**

`array([ 2,  4,  6,  8, 10])`

# ***Conclusion***:

**Multiplying** by a number makes a *list grow*, whereas an **array multiplies** its *elements by the number*!

## 2.6 Squaring

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [None]:
np_array**2

**Lists**

In [None]:
py_list**2                      # Won't work!  

**Arrays**

`array([ 1,  4,  9, 16, 25])`

# ***Conclusion***:

**Exponentiation** *does not work* on lists, whereas an **array** raises *each of its elements* to the given power!

## 2.7 Asking questions

In [43]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [45]:
py_list == 3     # Works, but what IS the question?

False

In [53]:
py_list = [3, 3, 3, 3, 3]
py_list == 3

False

In the example above, `py_list == 3` compares the list as a whole with the integer `3`; it does not compare each element. Since a list is never equal to an integer, the answer is `False` regardless of what the list contains, even for `[3, 3, 3, 3, 3]`.

In [46]:
np_array == 3

array([False, False,  True, False, False])

In [47]:
np_array > 3  

array([False, False, False,  True,  True])

**Lists**

Example 1

The output: `False`

Example 2

In [None]:
py_list > 3      # Won't work!

**Arrays**

Example 1

The output: `array([False, False,  True, False, False])`

Example 2

The output: `array([False, False, False,  True,  True])`

# ***Conclusion***:

**Arrays** compare each of their elements with the number we give, but **lists** do not.
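
The sketch below shows, as an illustration, how similar questions can still be asked of a list, and how the `True`/`False` answers from an array can be used to pick out elements. The variable names `has_three`, `list_answers` and `big_values` are made up for this example.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Lists: ask whether 3 appears anywhere, or check the elements one by one.
has_three = 3 in py_list
list_answers = [x > 3 for x in py_list]

# Arrays: use the True/False answers to keep only the matching elements.
big_values = np_array[np_array > 3]

print(has_three)
print(list_answers)
print(big_values)
```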

## 2.8 Mathematics

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [None]:
sum(py_list)     # sum() is a base Python function
max(py_list)     # max() is a base Python function
min(py_list)     # min() is a base Python function
np_array.sum()
np_array.max()
np_array.min()
np_array.mean()
np_array.std()

**Lists**

Example 1

The output: `15`

Example 2

The output: `5`

Example 3

The output: `1`

Example 4

In [None]:
py_list.sum()   # Won't work!

**Arrays**

Example 1

The output: `15`

Example 2

The output: `5`

Example 3

The output: `1`

Example 4

The output: `3.0`

Example 5

The output: `1.4142135623730951`

# ***Conclusion***:

To work with **lists**, call a built-in function and pass the list to it, e.g., `sum(py_list)`. To work with **arrays**, you can call a method on the array itself, e.g., `np_array.sum()`.
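
Base Python has no built-in `mean()` or `std()` function for lists. One option, sketched here as an assumption about what you might reach for, is the standard-library `statistics` module. Its `pstdev()` function computes the population standard deviation, which is what `np_array.std()` computes by default.

```python
import statistics
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

list_mean = statistics.mean(py_list)     # Mean of the list
list_std = statistics.pstdev(py_list)    # Population standard deviation

print(list_mean, np_array.mean())
print(list_std, np_array.std())
```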

**Remember**:

(**roughly speaking**) an operation on a list works on the **whole** list. In contrast, an operation on an array works on the **individual elements** of the array.

# Exercises & Self-Assessment

## Footnotes

1. For example, think of how easy it is to do row or column manipulations of data when put into a spreadsheet format.