<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

----------------------------------------------------
# What to Expect in This Chapter
----------------------------------------------------

1. We need data (information) to **solve problems** and **obtain solutions**.
1. Hence, we need ways to store and manipulate data easily and efficiently.
1. Some examples include:
    - **Lists**
    - **Numpy arrays**
    - **Dictionaries**
    - **Tuples**
    - Dataframes
    - Classes
      
Items in **bold** will be covered in this unit.

<div class="alert alert-block alert-danger">
<b>Important:</b> 
    <p>1. It is SUPER DUPER important for you to understand how to <b>store</b>, <b>retrieve</b> and <b>modify</b> data in programming. This is because these abstract structures will <b>influence how you think about data</b>. (For example, think of how easy it is to do row or column manipulations of data when it is put into a spreadsheet format.) This will ultimately aid (or hinder) your ability to conjure up algorithms to solve problems.</p>
</div>

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

**python lists**

`py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]`
`py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]`

**numpy arrays:** 
- `py` and `np` are added in front of the variable names for clarity. You can choose any name for a variable (provided that it is not a Python keyword like `for` or `if`).
- In these notes, ‘arrays’ refers to ‘NumPy arrays’ and ‘lists’ refers to ‘Python lists’.


In [74]:
import numpy as np

`np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])`\
`np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])`

**dictionary**:

```
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}
```

**Notice**:
- Dictionaries use a key and an associated value separated by a `:`.
- The dictionary very elegantly holds the real and superhero names in one structure, while we need two lists (or arrays) for the same data.
- For lists and arrays, the order **matters**. I.e. `"Iron Man"` must be in the same position as `"Tony Stark"` for things to work.
- Lists (and arrays) offer many features that dictionaries *don’t* and vice versa. Choosing a data storage strategy depends on the problem you are trying to solve.
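
To make the comparison concrete, here is a small sketch that looks up Tony Stark's superhero name both ways. The names `position`, `from_lists` and `from_dict` are my own and are only used for this illustration.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# Two lists: find the position of the real name first,
# then use the SAME position in the other list.
position = py_real_names.index("Tony Stark")
from_lists = py_super_names[position]

# Dictionary: the real name is the key, so one step is enough.
from_dict = superhero_info["Tony Stark"]

print(from_lists, from_dict)
```

The two-list version only works because both lists are kept in the same order, which is exactly why the order matters.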

<div class="alert alert-block alert-success">
<b>Remember:</b>
<p>There are three basic ways of storing data:</p>
    <ol type="1">
        <li>lists,</li>
        <li>NumPy arrays and</li>
        <li>dictionaries. </li>
    </ol>  
</div>

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we need to use an **index** corresponding to the data’s **position**. Python is a **zero-indexed language**, meaning it starts counting at **0**. So if you want to access a particular element in the list (or array), you need to specify the relevant index starting from zero. The image below shows the relationship between the position and index.

![](https://phyweb.physics.nus.edu.sg/~chammika/sp2273/docs/python_basics/03_storing-data/python-zero-indexed-counting.png)

Using the following list:


In [75]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]


In [76]:
py_real_names[0]

'Natasha Romanoff'

In [77]:
py_super_names[0]

'Black Widow'

Using a **negative index** allows us to count from the **back** of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element **without knowing** the list size.

In [78]:
py_super_names[2]    # Forward indexing 
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

In [79]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'

<div class="alert alert-block alert-success">
<b>Remember:</b>
<p>1. Data in lists (and arrays) must be accessed using a <b>zero-based index</b>.  </p>  
</div>
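
As a quick check of how forward and reverse indexing relate, the sketch below uses `len()` to show that index `-1` points at the same element as index `n - 1`. The variable names (`n`, `last_forward`, and so on) are just my own labels.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

n = len(py_super_names)               # number of elements (3)

last_forward = py_super_names[n - 1]  # last element, counting from the front
last_reverse = py_super_names[-1]     # last element, counting from the back
first_reverse = py_super_names[-n]    # -n wraps all the way back to the first element

print(last_forward, last_reverse, first_reverse)
```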

## 1.3 Accessing data from a dictionary

Dictionaries hold **data** (values) paired with a **key**, i.e. you can **access the value** (in this case, the superhero name) using the real name as a **key**.

In [80]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [81]:
superhero_info["Natasha Romanoff"]

'Black Widow'

<div class="alert alert-block alert-success">
<b>Remember:</b>
<p>1. Dictionaries have a <b>key-value structure</b>. </p>  
</div>

- To access **all** the keys or **all** the values, use `.keys()` and `.values()`:

In [82]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [83]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
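
If we want each key together with its value, one option (a small sketch, not the only way) is the dictionary's `.items()` method, which gives key-value pairs that we can loop over. The names `keys_list` and `pairs` are my own.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# .keys() gives a special 'view' object; list() turns it into a plain list.
keys_list = list(superhero_info.keys())

# .items() gives (key, value) pairs.
pairs = list(superhero_info.items())

for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")
```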

## 1.4 Higher dimensional lists

- Unlike with a dictionary, we needed two lists to store the corresponding real and superhero names
- A way around the need to have two lists is to have a **2D list** (or array) as follows.


In [84]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
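
To get data back out of a 2D list, we need two indices: the first picks the row and the second picks the item within that row. The sketch below also converts the list to a NumPy array (`np_superhero_info` is a name I made up) to show the `[row, column]` style of indexing that arrays allow.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

np_superhero_info = np.array(py_superhero_info)

row = py_superhero_info[1]         # the whole second row
py_name = py_superhero_info[1][1]  # second row, second item (list style)
np_name = np_superhero_info[1, 1]  # same element (array style)

print(row, py_name, np_name)
```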

# 2 Lists vs. Arrays

Lists and arrays have some similarities but more **differences**...

## 2.1 Size

Often, you need to know **how many** elements there are in lists or arrays. We can use the `len()` function for this purpose for both lists and arrays. However, arrays also offer other methods...

In [85]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

**Lists**:

In [86]:
len(py_list_2d)

10

**Arrays**

In [87]:
len(np_array_2d)

10

In [88]:
np_array_2d.shape

(10, 2)

Notice the **absence** of brackets `( )` after `shape` above. This is because `shape` is **not** a function. Instead, it is a **property** or attribute of the NumPy array.
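
Besides `shape`, NumPy arrays have a couple of other size-related attributes, `ndim` and `size`. The sketch below uses a shorter array than the one above just to keep the numbers easy to check; `rows` and `cols` are my own names.

```python
import numpy as np

np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"]])

rows, cols = np_array_2d.shape   # shape is a tuple, so we can unpack it
n_dims = np_array_2d.ndim        # number of dimensions
n_items = np_array_2d.size       # total number of elements (rows * cols)

print(rows, cols, n_dims, n_items)
```

Like `shape`, these are attributes, so they are written without brackets.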

## 2.2 Arrays are fussy about type

Please recall the previous discussion about data types (e.g., `int`, `float`, `str`). One prominent difference between lists and arrays is that arrays insist on having only a single data type; lists are more accommodating. Consider the following example and notice how the numbers are converted to text (strings, shown in quotes `' '`) when we create the NumPy array.

In [89]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

**Lists**

In [90]:
py_list

[1, 1.5, 'A']

**Arrays**

In [91]:
np_array

array(['1', '1.5', 'A'], dtype='<U32')

When dealing with datasets with both numbers and text, you must be mindful of this restriction. However, this is just an annoyance and not a problem, as we can easily change the **type** (typecast) using the ‘hidden’ function `astype()`. 
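
Here is a minimal sketch of typecasting with `astype()`. I am assuming an array that contains only number-like strings (`np_numbers` and `as_float` are my own names); an entry such as `'A'` cannot be turned into a number and would raise a `ValueError`.

```python
import numpy as np

np_numbers = np.array(['1', '1.5', '2'])   # numbers stored as text

as_float = np_numbers.astype(float)        # a NEW array of floats

print(np_numbers.dtype, as_float.dtype)
print(as_float + 1)                        # arithmetic now works
```

Note that `astype()` does not change the original array; it returns a converted copy.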

<div class="alert alert-block alert-success">
<b>Remember:</b>
<p>1. NumPy arrays tolerate only a single type.</p>  
</div>

## 2.3 Adding a number

In [2]:
import numpy as np                   # Needed if the kernel was restarted

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

**Lists**

In [3]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list

To add an integer to every element of a list, we can use a list comprehension:

In [5]:
integer = 10                  # The integer you want to add

new_py_list = [x + integer for x in py_list]  # New list holding the results

print(new_py_list)            # View the results

[11, 12, 13, 14, 15]


**Arrays**

In [94]:
np_array + 10

array([11, 12, 13, 14, 15])

## 2.4 Adding another list

In [95]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

**Lists**

In [96]:
py_list_1 + py_list_2           # Joins the two lists end to end (concatenation), not a mathematical addition.

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

**Arrays**

In [97]:
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

- Adding lists joins them into one longer list, while adding arrays is an element-wise operation.
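
If we do want element-wise addition with plain lists, one possible approach (a sketch, with `elementwise` as my own name) is to pair up the elements with `zip()` inside a list comprehension:

```python
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# zip() pairs the first elements, then the second elements, and so on.
elementwise = [a + b for a, b in zip(py_list_1, py_list_2)]

print(elementwise)
```

This gives the same numbers as `np_array_1 + np_array_2`, but as a list rather than an array.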

## 2.5 Multiplying by a Number

In [98]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**Lists**:

In [99]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

**Arrays**:

In [100]:
np_array*2

array([ 2,  4,  6,  8, 10])

Multiplying a list by a number repeats its contents (so the list grows), whereas multiplying an array by a number multiplies each of its elements by that number!

## 2.6 Squaring

In [6]:
import numpy as np                   # Needed if the kernel was restarted

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

**Lists**:

In [7]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

To square the numbers in a list:

- List comprehension method

In [8]:
new_py_list = [n**2 for n in py_list]        
print(new_py_list)


[1, 4, 9, 16, 25]


- For other methods, click [here](https://levelup.gitconnected.com/10-ways-to-square-a-list-of-numbers-in-python-b85f710a939b).

**Arrays**:

In [103]:
np_array**2

array([ 1,  4,  9, 16, 25])

## 2.7 Asking questions

In [104]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**Lists**

Example 1:

In [106]:
py_list == 3     # Works, but what IS the question?

False

Example 2:

In [107]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'

**Arrays**

Example 3:

In [108]:
np_array == 3  

array([False, False,  True, False, False])

Example 4:

In [109]:
np_array > 3  

array([False, False, False,  True,  True])
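
The `True`/`False` array from a question like `np_array > 3` can be used to pick out the matching elements (often called a mask). For lists, we can ask similar questions with `in` or a list comprehension. This is only a sketch; `has_three`, `py_bigger`, `mask`, `np_bigger` and `how_many` are names I made up.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Lists: ask the question element by element.
has_three = 3 in py_list                    # Is 3 somewhere in the list?
py_bigger = [x for x in py_list if x > 3]   # Keep only elements bigger than 3

# Arrays: use the True/False array as a mask.
mask = np_array > 3
np_bigger = np_array[mask]                  # Keep only elements where mask is True
how_many = mask.sum()                       # True counts as 1, False as 0

print(has_three, py_bigger, np_bigger, how_many)
```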

## 2.8 Mathematics

In [110]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**Lists**:

Example 1:

In [111]:
sum(py_list)     # sum() is a base Python function


15

Example 2:

In [112]:
max(py_list)     # max() is a base Python function


5

Example 3:

In [113]:
min(py_list)     # min() is a base Python function


1

Example 4:

In [68]:
py_list.sum()   # Won't work!

AttributeError: 'list' object has no attribute 'sum'

**Arrays**:

Example 5:

In [70]:
np_array.sum()

15

Example 6:

In [71]:
np_array.max()

5

Example 7:

In [72]:
np_array.min()

1

Example 8:

In [73]:
np_array.mean()

3.0

Example 9:

In [None]:
np_array.std()
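
Lists do not have `.mean()` or `.std()` methods. One way around this (a sketch, assuming we are happy to use Python's built-in `statistics` module) is shown below. As far as I understand, `np_array.std()` calculates the population standard deviation by default, so I compare it with `statistics.pstdev()`.

```python
import math
import statistics

import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

list_mean = statistics.mean(py_list)    # mean of a plain list
list_std = statistics.pstdev(py_list)   # population standard deviation
array_std = np_array.std()              # NumPy's default is also the population version

print(list_mean, list_std, array_std)
print(math.isclose(list_std, array_std))
```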

<div class="alert alert-block alert-success">
<b>Remember:</b>
<p>1. (<b>roughly speaking</b>) an operation on a list works on the <b>whole</b> list. In contrast, an operation on an array works on the <b>individual elements </b> of the array.</p>  
</div>

# Exercises & Self-Assessment

In [None]:



# Your solution here




## Footnotes