<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (need)</span></div>

# What to expect in this chapter

To understand science and solve scientific problems, we must be able to interact with data and transform it to yield solutions. To do so, we need ways to store and manipulate data that go beyond simple variables. Here are some ways to store data:

1. Lists
2. NumPy Arrays
3. Dictionaries
4. Dataframes
5. Classes

However, we will only discuss Python `lists`, NumPy `arrays`, `dictionaries` and `tuples`.
<br> <span style='color:Red'> *Dataframes can be accessed in the Data processing basket in the applications part*</span>

# Lists, Arrays & Dictionaries

## Let's compare
____

Here are some ways to store information using <span style = 'color: orange'> lists, arrays </span> and <span style='color:orange'>dictionaries </span>.  

**Python Lists**

In [10]:
py_super_names = ['Black Widow', 'Iron Man', 'Doctor Strange']
py_real_names = ['Natasha Romanoff', 'Tony Stark', 'Stephen Strange']

**NumPy Arrays**

In [5]:
import numpy as np    # Import NumPy first; otherwise np.array() raises NameError: name 'np' is not defined

In [6]:
np_super_names = np.array(['Black Widow', 'Iron Man', 'Doctor Strange'])
np_real_names = np.array(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

**Dictionary**

In [18]:
superhero_info = {
    'Natasha Romanoff':'Black Widow',
    'Tony Stark':'Iron Man',
    'Stephen Strange':'Doctor Strange'
}

Notes: 
- Each dictionary entry pairs a key with its associated value, separated by a `:`.
- The dictionary elegantly holds the real and superhero names in one structure, whereas we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters: "Iron Man" must be in the same position as "Tony Stark" for things to work.
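
To make these notes concrete, here is a minimal sketch (not part of the original notebook) that looks up the same name both ways. The name `paired_info` is just an illustrative choice; `zip()` and `dict()` are built into Python.

```python
py_super_names = ['Black Widow', 'Iron Man', 'Doctor Strange']
py_real_names = ['Natasha Romanoff', 'Tony Stark', 'Stephen Strange']

# With two lists, the only link between the names is their shared position
position = py_real_names.index('Tony Stark')   # where 'Tony Stark' sits
from_lists = py_super_names[position]          # same position in the other list

# zip() pairs the lists position by position;
# dict() turns each pair into a key: value entry
paired_info = dict(zip(py_real_names, py_super_names))
from_dict = paired_info['Tony Stark']

print(from_lists)
print(from_dict)
```

If the two lists were in different orders, the list lookup would silently return the wrong name, while the dictionary keeps each pair together.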

## Accessing Data from a list (or array)
_____

We use an index corresponding to the data's position. 
<br> Python is a zero-indexed language, meaning that it starts counting at 0. Hence, the first element has index 0.
Here's an image for our reference!

![](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/python-zero-indexed-counting.png)

In [11]:
py_super_names = ['Black Widow', 'Iron Man', 'Doctor Strange']
py_real_names = ['Natasha Romanoff', 'Tony Stark', 'Stephen Strange']

In [12]:
#Example 1
py_super_names[2]

'Doctor Strange'

In [14]:
#Example 2
py_super_names[0]

'Black Widow'

In [15]:
#Example 3: Forward indexing counts from the front of the list
py_super_names[2]

'Doctor Strange'

In [17]:
#Example 4: Negative (reverse) indexing counts from the back of the list. E.g. -1 gives the last element
py_super_names[-1]

'Doctor Strange'
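
As a quick check of how the two ways of counting relate, this small sketch (my own addition, with the illustrative names `n` and `matches`) compares index `-k` with index `n - k`, where `n` is the length of the list.

```python
py_super_names = ['Black Widow', 'Iron Man', 'Doctor Strange']
n = len(py_super_names)

# Index -k and index n - k should refer to the same element
matches = [py_super_names[-k] == py_super_names[n - k] for k in range(1, n + 1)]

for k in range(1, n + 1):
    print(-k, n - k, py_super_names[-k])

print(matches)
```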

## Accessing Data from a dictionary
____

- A dictionary stores each value under a key, so we can access the value using its key. Here is an example of how it works:

In [None]:
superhero_info = {
    'Natasha Romanoff':'Black Widow',
    'Tony Stark':'Iron Man',
    'Stephen Strange':'Doctor Strange'
}

In [21]:
superhero_info["Natasha Romanoff"]

'Black Widow'

In [22]:
superhero_info["Black Widow"]

KeyError: 'Black Widow'

<span style='color:red'>*We can only access the values using the keys, not the other way round* </span>

In [23]:
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [24]:
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
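
If we do want to look things up by superhero name, one possible workaround is to build a second dictionary with the pairs swapped. This is only a sketch: `real_name_of` is a name I made up, and it assumes that every value is unique (otherwise later pairs overwrite earlier ones).

```python
superhero_info = {
    'Natasha Romanoff': 'Black Widow',
    'Tony Stark': 'Iron Man',
    'Stephen Strange': 'Doctor Strange'
}

# .items() gives (key, value) pairs; swap them to key the new dictionary by superhero name
real_name_of = {hero: real for real, hero in superhero_info.items()}
print(real_name_of['Black Widow'])

# .get() returns a default instead of raising KeyError when the key is missing
missing = superhero_info.get('Black Widow', 'not a key')
print(missing)
```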

## Higher Dimensional Lists
_______

A way to get around needing two lists is to use a 2D list (or array), as follows :)

In [25]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
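
To get data back out of a 2D structure, we index twice: first the row, then the column. The sketch below is my own illustration; `np_superhero_info` is a name I introduced for the array version.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
np_superhero_info = np.array(py_superhero_info)

print(py_superhero_info[1])       # the whole second row (a list)
print(py_superhero_info[1][0])    # list: row index, then column index
print(np_superhero_info[1, 0])    # array: row and column inside one pair of brackets
```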

# Lists vs Arrays

## Size
_____

We can use the `len()` function to find out how many elements there are in a list or an array. For a 2D list or array, this is the number of rows.

In [27]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [28]:
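len(py_list_2d)       # 10, the number of rows (not displayed: only the last expression is shown)
len(np_array_2d)      # 10, the length of the first dimension (also not displayed)
np_array_2d.shape     # (rows, columns)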

(10, 2)

*`shape` is not a function; it is an attribute of the NumPy array, so it is written without parentheses.*
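
Here is a shorter sketch of the same idea (a 3-row version I made up for illustration) that prints each result, so nothing is hidden by the notebook. It also uses the array attribute `size`, which gives the total number of elements.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

print(len(py_list_2d))       # number of rows in the list
print(len(py_list_2d[0]))    # number of items in the first row
print(len(np_array_2d))      # length of the array's first dimension
print(np_array_2d.shape)     # (rows, columns)
print(np_array_2d.size)      # total number of elements
```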

## Arrays are fussy about type
_____

Common data types: `int`, `float`, `str`
<br> One difference between lists and arrays is that arrays insist on having a single data type; lists are more accommodating.

In [31]:
py_list=[1,1.5,'A']
np_array = np.array(py_list)

In [30]:
py_list
np_array

array(['1', '1.5', 'A'], dtype='<U32')

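Here the numbers became strings so that all the elements share a single type. We'll learn about the hidden function `astype()`, which converts an array to another type, in a later chapter.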
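
As a small preview, the sketch below shows which type NumPy settles on for a few inputs, using the array attribute `dtype`. The variable names are my own, and the last two lines are only a taste of the NumPy array method `astype()`; treat them as an illustration rather than the full story.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 1.5])
with_text = np.array([1, 1.5, 'A'])

print(ints.dtype)            # an integer type (the exact width depends on the platform)
print(mixed_numbers.dtype)   # integers are promoted to floats
print(with_text.dtype)       # everything is converted to strings

# Strings that look like numbers can be converted back with astype()
number_strings = np.array(['1', '1.5', '2'])
as_floats = number_strings.astype(float)
print(as_floats + 1)
```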

## Adding a number
______

In [36]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [37]:
np_array + 10

array([11, 12, 13, 14, 15])

<span style='color:red'>*`py_list + 10` won't work*</span>
<br> For lists, `+` means concatenation, so `py_list + 10` tries to join an integer (10) onto a list (`py_list`). Python raises a `TypeError` because only another list can be concatenated to a list.
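
To see the error for ourselves without stopping the notebook, and to get the element-by-element result from a plain list, we could try something like this. It is a sketch using a list comprehension, which is one common workaround; `plus_ten` is an illustrative name.

```python
py_list = [1, 2, 3, 4, 5]

# Catch the error so that the rest of the cell still runs
try:
    py_list + 10
except TypeError as error:
    print('TypeError:', error)

# A list comprehension adds 10 to each element, one at a time
plus_ten = [x + 10 for x in py_list]
print(plus_ten)
```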

## Adding another list
_______

In [38]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [45]:
print(py_list_1+py_list_2)
print(np_array_1+ np_array_2)

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
[11 22 33 44 55]


## Multiplying by a Number
____

In [41]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [44]:
print(py_list*2)
print(np_array*2)


[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[ 2  4  6  8 10]


*Multiplying a list by a number repeats the list, making it grow; multiplying an array by a number multiplies each of its elements by that number.*

## Squaring
_____

In [46]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [48]:

print(np_array **2)

[ 1  4  9 16 25]


In [51]:
print(py_list **2)

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

*`py_list ** 2` won't work!*

## Asking questions
____

In [52]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [53]:
py_list == 3     # Works, but what IS the question?
np_array == 3  
np_array > 3  

array([False, False, False,  True,  True])

In [56]:
py_list == 3     # Is the whole list equal to 3? (a single True/False)
np_array == 3    # Which elements of the array are equal to 3?
np_array > 3     # Which elements of the array are greater than 3?

array([False, False, False,  True,  True])
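
Since the notebook only displays the last result, here is a sketch that stores each answer in a variable (names are my own) and also shows how similar questions could be asked of a plain list.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# List: the comparison is about the list as a whole, so we get one answer
whole_list_is_3 = (py_list == 3)
contains_3 = 3 in py_list                    # "Is 3 somewhere in the list?"
greater_than_3 = [x > 3 for x in py_list]    # ask each element in turn

# Array: one answer per element; summing counts the True values
count_greater = (np_array > 3).sum()

print(whole_list_is_3, contains_3)
print(greater_than_3)
print(count_greater)
```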

## Mathematics
_____

In [57]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [61]:
print(sum(py_list))     # sum() is a base Python function
print(max(py_list))     # max() is a base Python function
print(min(py_list))     # min() is a base Python function
print(np_array.sum())
print(np_array.max())
print(np_array.min())
print(np_array.mean())
print(np_array.std())

15
5
1
15
5
1
3.0
1.4142135623730951
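
One detail worth checking is which standard deviation `std()` reports. By default NumPy divides by N (the population standard deviation); passing `ddof=1` divides by N - 1 instead. The sketch below compares both with Python's built-in `statistics` module, which I am bringing in only for the comparison.

```python
import statistics
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

population_std = np_array.std()        # divides by N (NumPy's default, ddof=0)
sample_std = np_array.std(ddof=1)      # divides by N - 1

print(population_std, statistics.pstdev(py_list))   # population versions
print(sample_std, statistics.stdev(py_list))        # sample versions
```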


**<span style='color:green'>(Roughly speaking) an operation on a list works on the whole list. In contrast, an operation on an array works on the individual elements of the array.</span>**