<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# What to expect in this chapter

We are learning Python as a tool to help us understand science and solve problems related to science. To do this, we must interact with information/data and transform them to yield a solution. For this, it is essential to have ways to store and manipulate data easily and efficiently beyond the simple variables we have encountered so far. Python offers a variety of ways to store and manipulate data. You have already met the list and dictionary in a previous chapter. However, there are several more; here is a (non-comprehensive) list.

- Lists
- Numpy arrays
- Dictionaries
- Tuples
- Dataframes
- Classes
  
In these chapters on basics, I will only discuss Python lists, Numpy arrays, dictionaries and tuples. If you want to learn about dataframes please look at the Data Processing basket in the Applications part. 

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

The same information can be stored in different ways.

__Python lists__

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

__NumPy arrays__

In [1]:
import numpy

In [7]:
np_super_names = numpy.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = numpy.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

__Dictionary__

In [6]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}


## 1.2 Accessing data from a list (or array)

Note that Python uses zero-based indexing, meaning the first position is 0 and not 1.

In [9]:
py_real_names[0] #to recall the first real name in the python list

'Natasha Romanoff'

In [10]:
py_super_names[0] #to recall the first super name in the python list

'Black Widow'

__Reverse indexing__

Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.



In [11]:
py_super_names[2]    # This is Forward indexing (like the first example)
                     # We need to know the size 
                     # beforehand for this to work.

'Doctor Strange'

In [14]:
py_super_names[-1]   # This is Reverse indexing
                     # I.e. accessing from the back of the list
                     # -1 would be Doctor Strange, -2 would be Iron Man, and so on

'Doctor Strange'
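A minimal sketch of how forward and reverse indexing relate, reusing the `py_super_names` list from above. The variable names `n`, `last_forward` and `last_reverse` are only illustrative. For a list of length `n`, the index `-k` refers to the same element as the index `n - k`.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

n = len(py_super_names)               # Number of elements in the list

# Forward indexing needs the size; reverse indexing does not.
last_forward = py_super_names[n - 1]
last_reverse = py_super_names[-1]
print(last_forward == last_reverse)   # True

# -2 is the second-last element, i.e. the same as index n - 2.
print(py_super_names[-2])             # Iron Man
```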

## 1.3 Accessing data from a dictionary

Dictionaries hold data (values) paired with keys, i.e. you can access a value (in this case, the superhero name) using its key (in this case, the real name). Here is how it works:



In [15]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                

# so you have the "paired key" : "data"

In [16]:
superhero_info["Natasha Romanoff"]

'Black Widow'

In [17]:
superhero_info["Stephen Strange"]

'Doctor Strange'

In [18]:
superhero_info["Iron Man"]

KeyError: 'Iron Man'

The lookup above raises an error because you're trying to look up using the data (value) instead of the paired key.
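If you are not sure whether a key exists, one possible way to avoid the `KeyError` is to check with `in` or to use the dictionary's `.get()` method. This is only a sketch; the fallback string `"unknown"` and the variable names are my own choices for illustration.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange",
}

# 'in' checks the keys, not the values.
has_tony = "Tony Stark" in superhero_info       # True
has_iron_man = "Iron Man" in superhero_info     # False, 'Iron Man' is a value

# .get() returns the fallback instead of raising a KeyError.
found = superhero_info.get("Tony Stark", "unknown")
missing = superhero_info.get("Iron Man", "unknown")

print(has_tony, has_iron_man, found, missing)
```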

In [19]:
#Access all the keys 
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [21]:
#Access all the values
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
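Alongside `.keys()` and `.values()`, dictionaries also have `.items()`, which gives each key together with its value. A small sketch, assuming the same `superhero_info` dictionary; the loop variable names and `pairs` are illustrative.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange",
}

# Each item is a (key, value) pair.
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# The pairs can also be collected into a list of tuples.
pairs = list(superhero_info.items())
```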

__Remember that dictionaries have a key-value structure.__

## 1.4 Higher dimensional lists

In a dictionary, we can store everything in one {}. Alternatively, if we want to keep the paired data together, we can make a 2D list, i.e. a list within a list.

In [22]:
#A 2D list that pairs the data together
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
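
To get data out of a 2D list, index twice: the first index picks the inner list (the pair) and the second picks the element within that pair. A minimal sketch using the same data; the names `first_pair`, `first_super` and `last_real` are illustrative.

```python
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

first_pair = py_superhero_info[0]       # The whole first inner list
first_super = py_superhero_info[0][1]   # Second element of the first pair
last_real = py_superhero_info[-1][0]    # First element of the last pair

print(first_pair)
print(first_super)
print(last_real)
```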


# 2 Lists vs. Arrays

Lists and arrays have some similarities but more differences. Learning about both will allow you to appreciate the versatility that each offers.

## 2.1 Size

__Knowing the size of lists and arrays with len()__


In [30]:
#making a python list
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

#Making the python list an array
np_array_2d = numpy.array(py_list_2d)   # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [27]:
len(py_list_2d) #Counts the 10 pairs of data and tells you the number

10

In [28]:
len(np_array_2d) #Counts the 10 pairs of data and tells you the number

10

In [31]:
np_array_2d.shape #Counts the 10 pairs of data and tells you that they are grouped in 2s.

(10, 2)

Notice the absence of brackets ( ) in shape above. This is because shape is not a function. Instead, it is a property or attribute of the Numpy array.
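
A short sketch contrasting `len()`, `.shape` and `.size` on a smaller array than the one above. The three-row array and the variable names are made up for illustration.

```python
import numpy as np

np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"]])

n_outer = len(np_array_2d)            # Length of the first dimension only
n_rows, n_cols = np_array_2d.shape    # shape is an attribute, so no ()
n_total = np_array_2d.size            # Total number of elements

print(n_outer, n_rows, n_cols, n_total)
```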

## 2.2 Arrays are fussy about type

Arrays insist on having only a single data type. Lists are more accommodating.
Consider the following example and notice how the numbers are converted to strings (shown in quotes, ' ') when we create the NumPy array.

In [3]:
import numpy as np

In [6]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [7]:
#Lists
py_list

[1, 1.5, 'A']

In [9]:
#Vs the array, which won't jive well with the 'A'
np_array

array(['1', '1.5', 'A'], dtype='<U32')

This is not a super big deal and is easily amended with `astype()`, which typecasts (i.e. changes the type of) the array. More on that later.
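
As a rough sketch of typecasting with `astype()`: the conversion only works when every element can actually be read as the new type, so I use an array of purely numeric strings here (my own example data). With the `'A'` from the example above, I would expect the conversion to fail with a `ValueError`.

```python
import numpy as np

np_strings = np.array(['1', '1.5', '2'])    # All elements are strings
np_floats = np_strings.astype(float)        # A NEW array of floats
print(np_floats)

# 'A' cannot be read as a number, so the typecast fails.
try:
    np.array(['1', '1.5', 'A']).astype(float)
    converted = True
except ValueError:
    converted = False
print(converted)                            # False
```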

## 2.3 Adding a number

In [10]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [12]:
#Lists
py_list + 10
# This won't work. As the error message below says, + on a list can only concatenate another list, not an integer.

TypeError: can only concatenate list (not "int") to list

In [15]:
# But with an Array we can arithmetically add 10
np_array + 10

array([11, 12, 13, 14, 15])
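
To get the same result with a plain list, one option is to go through the elements one at a time, for example with a list comprehension. This is just a sketch for comparison; the names `py_list_plus_10` and `np_array_plus_10` are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# With a list, add 10 to each element explicitly.
py_list_plus_10 = [x + 10 for x in py_list]

# With an array, the + 10 is applied to every element automatically.
np_array_plus_10 = np_array + 10

print(py_list_plus_10)             # [11, 12, 13, 14, 15]
print(np_array_plus_10.tolist())   # [11, 12, 13, 14, 15]
```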

## 2.4 Adding another list

In [18]:
# Let's start by making 2 lists/arrays
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [19]:
#Lists can be added together to form a larger list by the + sign
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [21]:
#Arrays are arithmetically added together with the + sign
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])

## 2.5 Multiplying by a Number

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [22]:
#Lists can be multiplied to repeat their elements x times, leading to a larger list
py_list*3

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [26]:
#Arrays are arithmetically multiplied by the factor
np_array*3

array([ 3,  6,  9, 12, 15])

## 2.6 Squaring

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [28]:
#You can't square the elements stored in a list
py_list**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [29]:
#You can square the elements stored in an array arithmetically
np_array**2

array([ 1,  4,  9, 16, 25])

## 2.7 Asking questions

In [None]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         


In [30]:
#List Example 1
py_list == 3
#Asks: is the whole list equal to the number 3?

False

In [34]:
#List Example 1 cont
py_list == [1, 2, 3, 4, 5]
#This then is...

True

In [35]:
#List Example 2
py_list > 3      # Won't work! '>' not supported between instances of list and integers

TypeError: '>' not supported between instances of 'list' and 'int'

In [37]:
#Array Example 1
np_array == 3
#In an array format, == will cycle through each element to check if it is true for each one individually and return...

array([False, False,  True, False, False])

In [38]:
#Array Example 2
#It follows in the same way for inequality signs
np_array > 3

array([False, False, False,  True,  True])
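
For comparison, here is a sketch of how the same element-by-element question could be asked of a list, by looping over it with a list comprehension. It also counts how many elements pass the test, using the fact that `True` counts as 1 when summed. The variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# A list needs an explicit loop to test each element.
list_answers = [x > 3 for x in py_list]

# An array tests each element on its own.
array_answers = np_array > 3

# True counts as 1, so summing gives the number of elements above 3.
n_above_3 = int(array_answers.sum())

print(list_answers)
print(n_above_3)
```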

## 2.8 Mathematics

In [39]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         


__Summation (Lists)__

In [40]:
#Lists Example 1
sum(py_list)     # sum() is a base Python function

15

__Maximum (Lists)__

In [42]:
#Lists Example 2
max(py_list)     # max() is a base Python function

5

__Minimum (Lists)__

In [43]:
min(py_list)     # min() is a base Python function

1

__Careful with syntax for lists__

In [44]:
py_list.sum()   # Won't work!

AttributeError: 'list' object has no attribute 'sum'

__Summation (Arrays)__

In [47]:
np_array.sum()

15

__Maximum (Arrays)__

In [48]:
np_array.max()

5

__Minimum (Arrays)__

In [49]:
np_array.min()

1

__Mean (Arrays)__

In [50]:
np_array.mean()

3.0

__Standard Deviation (Arrays)__

In [51]:
np_array.std()

1.4142135623730951

__TLDR: an operation on a list works on the whole list. In contrast, an operation on an array works on the individual elements of the array.__

# Exercises & Self-Assessment

Yuan Zhe, do I do the Storing Data Need exercises here or in the separate Storing Data Need exercises file? 

In [None]:



# Your solution here




## Footnotes