<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# What to expect in this chapter

We learnt Python as a tool to help us understand science and solve problems related to science.

To do this, we must interact with information/data and transform it to yield a solution.

For this, it is essential to have ways to store and manipulate data easily and efficiently, beyond the simple variables we have encountered so far.

Python offers a variety of ways to store and manipulate data.

We have seen the **list** and **dictionary** in the previous chapter.

However, there are several more; here is a (non-comprehensive) list.

1. Lists
2. Numpy arrays
3. Dictionaries
4. Tuples
5. Dataframes
6. Classes

In these chapters on basics, only Python **lists**, Numpy **arrays**, **dictionaries** and **tuples** will be discussed.

If you want to learn about **dataframes**, refer to the Data Processing basket in the Applications part.

**Classes** are an advanced topic that will be covered in the Nice Chapter.

It is VERY VERY important to understand how to store, retrieve and modify data in programming. This is because these abstract structures will **influence how you think about data**. This will ultimately aid (or hinder) your ability to conjure up algorithms to solve problems.

In [1]:
import numpy as np

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

We will see how to store the same information (in this case, some superhero data) using <span style="color:orange">lists</span>, <span style="color:orange">arrays</span> and <span style="color:orange">dictionaries</span>.

### Python Lists

In [4]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

### Numpy Arrays

In [5]:
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

### Dictionary

In [7]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

Notice:
- Dictionaries use a key and an associated value separated by <span style="color:purple">:</span>
- The dictionary very elegantly holds the real and superhero names in one structure, while we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters. That is, 'Iron Man' must be in the same position as 'Tony Stark' for things to work.
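As a small sketch of the points above, the code below looks up Tony Stark's superhero name in both ways. The use of `list.index()` to find the position is an illustrative choice made here, not something the chapter relies on.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# With two lists, we first find the position in one list
# and then use the same position in the other list.
position = py_real_names.index("Tony Stark")
from_lists = py_super_names[position]

# With a dictionary, the key takes us straight to the value.
from_dict = superhero_info["Tony Stark"]

print(from_lists, from_dict)
```

If the two lists ever fall out of step (e.g., one is sorted and the other is not), the list-based lookup silently returns the wrong name; the dictionary does not have this problem.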

Lists (and arrays) offer many features that dictionaries don’t and vice versa.

There are 3 basic ways of storing data:
1. Lists
2. NumPy arrays
3. Dictionaries

- <span style="color:purple">py</span> and <span style="color:purple">np</span> are placed in front of the variable names for clarity. You can choose any name for a variable, provided that it is not a Python keyword such as <span style="color:purple">for</span> or <span style="color:purple">if</span>.
- arrays means NumPy arrays and lists means Python lists.

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we need to use an index corresponding to the data’s position. 

Python is a zero-indexed language, meaning it starts counting at 0. So if you want to access a particular element in the list (or array), you need to specify the relevant index starting from zero. The image below shows the relationship between the position and index.



<img src = "https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/python-zero-indexed-counting.png">

In [12]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

### Example 1

In [15]:
py_real_names[0]

'Natasha Romanoff'

### Example 2

py_super_names[0]

### Example 3

Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.

In [17]:
py_super_names[2]    # Forward indexing 
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

In [18]:
py_super_names[-1]   # Reverse indexing

'Doctor Strange'

Data in lists (and arrays) must be accessed using a zero-based index.
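The following is a minimal sketch that puts forward and reverse indexing side by side. It assumes the same superhero list as above and also checks that a NumPy array accepts the same indices.

```python
import numpy as np

py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
np_super_names = np.array(py_super_names)

first = py_super_names[0]                  # Index 0 is the first element

# Forward indexing to the last element needs the size of the list.
last_forward = py_super_names[len(py_super_names) - 1]

# Reverse indexing does not need the size.
last_reverse = py_super_names[-1]

# The same indices work for a NumPy array.
np_last = np_super_names[-1]

print(first, last_forward, last_reverse, np_last)
```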



## 1.3 Accessing data from a dictionary

Dictionaries hold data (values) paired with a key, i.e., you can access the value (in this case, the superhero name) using the real name as a key.



In [19]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [20]:
superhero_info["Natasha Romanoff"]

'Black Widow'

Remember that dictionaries have a key-value structure.



If you want, you can access all the keys and all the values as follows:



In [21]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [22]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
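The objects returned by `keys()` and `values()` are not lists. The sketch below shows one way to turn them into ordinary lists, and uses `items()` (a standard dictionary method not discussed above) to go through the keys and values together.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

# Convert the keys and values into ordinary Python lists.
real_names = list(superhero_info.keys())
super_names = list(superhero_info.values())

# items() gives each key together with its value.
for real_name, super_name in superhero_info.items():
    print(real_name, "is", super_name)
```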

## 1.4 Higher dimensional lists

Unlike with a dictionary, we needed two lists to store the corresponding real and superhero names. An obvious way around the need to have two lists is to have a 2D list (or array) as follows.



In [23]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
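A 2D list needs two indices to reach a single element: the first picks the row and the second picks the item within that row. The sketch below illustrates this. The array version, `np_superhero_info`, is a name introduced here for illustration; with arrays, both indices can also go inside one pair of square brackets.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

row = py_superhero_info[1]            # The whole second row
super_name = py_superhero_info[1][1]  # Second row, second item

# Hypothetical array version of the same data.
np_superhero_info = np.array(py_superhero_info)
np_super_name = np_superhero_info[1, 1]

print(row, super_name, np_super_name)
```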

# 2 Lists vs. Arrays

Lists and arrays have some similarities but more differences. It is important to know these differences so that you can pick the structure that suits the task.

Here are a few quick examples of using lists and arrays. These will allow you to appreciate the versatility that each offers.



## 2.1 Size

Often, you need to know how many elements there are in lists or arrays. We can use the <span style="color:purple">len()</span> function for this purpose for both lists and arrays. However, arrays also offer other options.



In [24]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [25]:
len(py_list_2d)
len(np_array_2d)
np_array_2d.shape

(10, 2)

Notice the absence of brackets <span style="color:purple">()</span> in <span style="color:purple">shape</span> above.

This is because <span style="color:purple">shape</span> is **not** a function.

It is a property or <span style="color:orange">attribute</span> of the NumPy array.
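A notebook cell displays only the output of its last line. As a sketch, the code below uses `print()` so that all three results are visible, with a shorter 2D list than the one above to keep it compact.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

# len() counts only the outermost elements (the rows).
print(len(py_list_2d))      # 3
print(len(np_array_2d))     # 3

# shape is an attribute (no brackets) and reports every dimension.
print(np_array_2d.shape)    # (3, 2)
```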

## 2.2 Arrays are fussy about type

Please recall the previous discussion about data types (e.g., <span style="color:purple">int</span>, <span style="color:purple">float</span>, <span style="color:purple">str</span>). 

One prominent difference between lists and arrays is that arrays insist on having only a single data type; lists are more accommodating. Consider the following example and notice how the numbers are converted to strings (shown in quotes, <span style="color:purple">' '</span>) when we create the NumPy array.



In [36]:
py_list = [1, 1.5, 'A']        #Lists [1, 1.5, 'A']
np_array = np.array(py_list)   #array(['1', '1.5', 'A'], dtype='<U32')

In [33]:
py_list
np_array

array(['1', '1.5', 'A'], dtype='<U32')

When dealing with datasets with both numbers and text, you must be mindful of this restriction. However, this is just an annoyance and not a problem, as we can easily change the type (typecast) using the ‘hidden’ function <span style="color:purple">astype()</span>.

Remember that NumPy arrays tolerate only a single type.
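Here is a minimal sketch of typecasting with NumPy's `astype()` method. The array `np_text` is made up for illustration and contains only text that looks like numbers; a value such as `'A'` cannot be converted to a float and would raise an error.

```python
import numpy as np

np_text = np.array(['1', '1.5', '2'])     # Numbers stored as text

# astype() returns a NEW array with the requested type.
np_numbers = np_text.astype(float)

print(np_text.dtype)       # A text (unicode) type
print(np_numbers.dtype)    # float64
print(np_numbers + 10)     # Arithmetic now works
```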


## 2.3 Adding a number

In [39]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [41]:
np_array + 10                        

array([11, 12, 13, 14, 15])

In [42]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list

## 2.4 Adding another list

In [43]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [44]:
py_list_1 + py_list_2
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

So, adding lists causes them to grow (the second list is joined to the end of the first), while adding arrays is an element-wise operation.
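Since a notebook cell shows only its last output, the sketch below prints both results so the difference can be seen directly. The variable names `list_sum` and `array_sum` are introduced here for illustration.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

list_sum = py_list_1 + py_list_2      # Joins the two lists end to end
array_sum = np_array_1 + np_array_2   # Adds matching elements

print(list_sum)     # [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
print(array_sum)    # [11 22 33 44 55]
```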

## 2.5 Multiplying by a Number

In [45]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [46]:
py_list*2
np_array*2

array([ 2,  4,  6,  8, 10])

So multiplying by a number makes a list grow, whereas an array multiplies its elements by the number!
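As with addition, only the array result is displayed by the cell above. The sketch below prints both results; `list_doubled` and `array_doubled` are names chosen here for illustration.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

list_doubled = py_list * 2      # Repeats the list twice
array_doubled = np_array * 2    # Doubles each element

print(list_doubled)     # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(array_doubled)    # [ 2  4  6  8 10]
```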



## 2.6 Squaring

In [48]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [49]:
np_array**2

array([ 1,  4,  9, 16, 25])

In [50]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

## 2.7 Asking questions

In [51]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [52]:
py_list == 3     # Works, but what IS the question?
np_array == 3  
np_array > 3  

array([False, False, False,  True,  True])

In [53]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'
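The sketch below prints the answer to each question separately. For the list, `== 3` asks whether the whole list is equal to the number 3, which it is not; for the array, the same question is asked of every element in turn. The variable names are introduced here for illustration.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

list_equals = py_list == 3       # Is the WHOLE list equal to 3?
array_equals = np_array == 3     # Is EACH element equal to 3?
array_greater = np_array > 3     # Is EACH element greater than 3?

print(list_equals)      # False
print(array_equals)     # [False False  True False False]
print(array_greater)    # [False False False  True  True]
```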

## 2.8 Mathematics

In [54]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [55]:
sum(py_list)     # sum() is a base Python function
max(py_list)     # max() is a base Python function
min(py_list)     # min() is a base Python function
np_array.sum()
np_array.max()
np_array.min()
np_array.mean()
np_array.std()

1.4142135623730951
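The cell above displays only the last result (the standard deviation). As a sketch, the code below prints each value. Note that, by default, NumPy's `std()` divides by the number of elements (the population standard deviation), which is why the answer here is the square root of 2.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Base Python functions work on the list.
print(sum(py_list), max(py_list), min(py_list))

# The array carries its own versions of these, and more.
print(np_array.sum(), np_array.max(), np_array.min())
print(np_array.mean())    # 3.0
print(np_array.std())     # 1.4142135623730951
```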

**REMEMBER**

**Roughly speaking**, an operation on a list works on the **whole** list.

In contrast, an operation on an array works on the **individual elements** of the array.