In [1]:
from IPython.display import HTML

HTML("""
    <link rel="stylesheet" href="../fonts/cmun-bright.css">
    <style type='text/css'>
        * {
            font-family: Computer Modern Bright !important;
        }
    </style>
""")

<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

In [2]:
import numpy as np

# What to expect in this chapter

Python Built-in Primitives
- `int`
- `float`
- `str`
- `bool`

Python Built-in Non-primitives
- `list`
- `set`
- `tuple`
- `dict`

Numpy
- arrays

Pandas
- dataframes

# 3.1 Lists, Arrays & Dictionaries

## 3.1.1 Let’s compare

We can store data element by element in Python lists!

In [3]:
school_abbrev = ["SoC", "FoS", "FASS", "FoE"]
school_long = ["School of Computing", "Faculty of Science", "Faculty of Arts and Social Sciences", "Faculty of Engineering"]

print("Abbreviations:", school_abbrev)
print("Full Names:", school_long)

Abbreviations: ['SoC', 'FoS', 'FASS', 'FoE']
Full Names: ['School of Computing', 'Faculty of Science', 'Faculty of Arts and Social Sciences', 'Faculty of Engineering']


As NumPy arrays (note that the printed output has no commas),

In [4]:
school_abbrev_np = np.array(school_abbrev)
school_long_np = np.array(school_long)

print("Abbreviations:", school_abbrev_np)
print("Full Names:", school_long_np)

Abbreviations: ['SoC' 'FoS' 'FASS' 'FoE']
Full Names: ['School of Computing' 'Faculty of Science'
 'Faculty of Arts and Social Sciences' 'Faculty of Engineering']


or as Python sets, which do not allow duplicate values and are unordered and unindexed (notice that the printed order differs from the original list):

In [5]:
school_abbrev_set = set(school_abbrev)
school_long_set = set(school_long)

print("Abbreviations Set:", school_abbrev_set)
print("Full Names Set:", school_long_set)

Abbreviations Set: {'SoC', 'FASS', 'FoE', 'FoS'}
Full Names Set: {'Faculty of Arts and Social Sciences', 'Faculty of Engineering', 'Faculty of Science', 'School of Computing'}
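A small sketch of what "no duplicates" and "unindexed" mean in practice. The list `codes` is made up for illustration and deliberately contains repeats:

```python
codes = ["SoC", "FoS", "SoC", "FASS", "FoS"]   # illustrative list with repeats

unique_codes = set(codes)                 # duplicates are dropped
print(len(codes), len(unique_codes))      # 5 3

print("FoS" in unique_codes)              # membership tests still work: True

try:
    unique_codes[0]                       # sets have no positions, so indexing fails
except TypeError as err:
    print("TypeError:", err)              # 'set' object is not subscriptable
```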


Alternatively, we can use Python dictionaries, which map keys to values:

In [6]:
school_dict = dict(zip(school_abbrev, school_long))

print("School Dictionary:", school_dict)

School Dictionary: {'SoC': 'School of Computing', 'FoS': 'Faculty of Science', 'FASS': 'Faculty of Arts and Social Sciences', 'FoE': 'Faculty of Engineering'}


from which we can recover the keys and values as separate lists:

In [7]:
school_abbrev_recover = [key for key, value in school_dict.items()]
school_long_recover = [value for key, value in school_dict.items()]

print("Abbreviations List:", school_abbrev_recover)
print("Full Names List:", school_long_recover)

Abbreviations List: ['SoC', 'FoS', 'FASS', 'FoE']
Full Names List: ['School of Computing', 'Faculty of Science', 'Faculty of Arts and Social Sciences', 'Faculty of Engineering']


## 3.1.2 Accessing data from a list (or array)

Python lists and numpy arrays are indexed starting from `0`!

In [8]:
print(school_abbrev[0])
print(school_long[0])

SoC
School of Computing


This is in contrast to some other languages (e.g. MATLAB, R and Julia) that index from `1`, like everyday speech! Negative indices count backwards from the end, so `[-1]` is the last element:

In [9]:
print(school_abbrev[-1])
print(school_long[3])

FoE
Faculty of Engineering
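A quick sketch of a few more ways to index and slice a list, reusing the same abbreviations:

```python
school_abbrev = ["SoC", "FoS", "FASS", "FoE"]

print(school_abbrev[-1])     # last element: FoE
print(school_abbrev[-2])     # second-last element: FASS
print(school_abbrev[1:3])    # from index 1 up to (not including) 3: ['FoS', 'FASS']
print(school_abbrev[::-1])   # the whole list reversed: ['FoE', 'FASS', 'FoS', 'SoC']
```

The same slicing syntax also works on 1D NumPy arrays.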


## 3.1.3 Accessing data from a dictionary

Accessing a value by its key:

In [10]:
print(school_dict["SoC"])

School of Computing
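Square brackets raise an error if the key does not exist, while `dict.get` can return a fallback instead. A minimal sketch; `"BIZ"` is a hypothetical key that is deliberately missing:

```python
school_dict = {"SoC": "School of Computing", "FoS": "Faculty of Science"}

print(school_dict.get("FoS"))                # Faculty of Science
print(school_dict.get("BIZ", "Not found"))   # missing key, so the default is returned: Not found

try:
    school_dict["BIZ"]                       # square brackets raise on a missing key
except KeyError as err:
    print("KeyError:", err)                  # KeyError: 'BIZ'
```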


We can also recover all of the keys and values at once:

In [11]:
print(school_dict.keys())
print(school_dict.values())

dict_keys(['SoC', 'FoS', 'FASS', 'FoE'])
dict_values(['School of Computing', 'Faculty of Science', 'Faculty of Arts and Social Sciences', 'Faculty of Engineering'])


These return `dict_keys` and `dict_values` view objects. They are iterable, but they are not lists, so they cannot be indexed:

In [12]:
dictkeys = school_dict.keys()
dictvalues = school_dict.values()

# dictkeys[1]     # TypeError: 'dict_keys' object is not subscriptable
# dictvalues[1]   # TypeError: 'dict_values' object is not subscriptable

They must first be converted into lists (e.g. `list(school_dict.keys())`) if you need to index them!
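A sketch of converting the views into lists. One reason to convert deliberately: a view keeps tracking later changes to the dictionary, whereas `list(...)` takes a one-off snapshot. (Dictionaries keep insertion order from Python 3.7 onwards, so the keys and values line up.)

```python
school_dict = {"SoC": "School of Computing",
               "FoS": "Faculty of Science",
               "FASS": "Faculty of Arts and Social Sciences"}

keys_list = list(school_dict.keys())        # a real list that CAN be indexed
values_list = list(school_dict.values())
print(keys_list[1], "->", values_list[1])   # FoS -> Faculty of Science

# A view follows later changes to the dict; the list does not
keys_view = school_dict.keys()
school_dict["FoE"] = "Faculty of Engineering"
print(len(keys_view), len(keys_list))       # 4 3
```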

## 3.1.4 Higher dimensional lists

In [13]:
school_2d_list = [list(pair) for pair in zip(school_abbrev, school_long)]

print("2D List of Schools:")
for pair in school_2d_list:
    print(pair)

2D List of Schools:
['SoC', 'School of Computing']
['FoS', 'Faculty of Science']
['FASS', 'Faculty of Arts and Social Sciences']
['FoE', 'Faculty of Engineering']
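Elements of a nested list are reached by chaining indices: the first picks the row, the second picks the item inside it. A small sketch with three of the rows:

```python
school_2d_list = [["SoC", "School of Computing"],
                  ["FoS", "Faculty of Science"],
                  ["FASS", "Faculty of Arts and Social Sciences"]]

print(school_2d_list[1])      # second row: ['FoS', 'Faculty of Science']
print(school_2d_list[1][0])   # first item of the second row: FoS

# Pulling out a "column" from plain lists needs a comprehension
abbrevs = [row[0] for row in school_2d_list]
print(abbrevs)                # ['SoC', 'FoS', 'FASS']
```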


# 3.2 Lists vs. Arrays

Similarly, we can cast the zipped pairs into a 2D NumPy array:

In [14]:
combined_list = list(zip(school_abbrev, school_long))
school_2d_array = np.array(combined_list)

print("2D Array of Schools:")
print(school_2d_array)

2D Array of Schools:
[['SoC' 'School of Computing']
 ['FoS' 'Faculty of Science']
 ['FASS' 'Faculty of Arts and Social Sciences']
 ['FoE' 'Faculty of Engineering']]


## 3.2.1 Size

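The `len` function returns the number of items along the first (outermost) dimension, while NumPy arrays also have a `shape` attribute giving the size along every dimension: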

In [15]:
print(len(school_2d_list))

print(len(school_2d_array))
print(school_2d_array.shape)

4
4
(4, 2)
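Besides `shape`, NumPy arrays expose `ndim` (number of dimensions) and `size` (total number of elements). A sketch on a smaller 3 × 2 array:

```python
import numpy as np

school_2d_array = np.array([["SoC", "School of Computing"],
                            ["FoS", "Faculty of Science"],
                            ["FASS", "Faculty of Arts and Social Sciences"]])

print(len(school_2d_array))    # 3: length of the first axis (rows) only
print(school_2d_array.shape)   # (3, 2): size along every axis
print(school_2d_array.ndim)    # 2: number of axes
print(school_2d_array.size)    # 6: total number of elements
```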


## 3.2.2 Arrays are fussy about type

NumPy array elements must all share a single (homogeneous) type, which makes a lot of sense for computation, whereas Python lists can contain a mixture of types and structures (which is definitely more convenient!)

In [16]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

print(py_list)
print(np_array)

[1, 1.5, 'A']
['1' '1.5' 'A']


NumPy converts the mixture of `int`, `float` and `str` into all `str` (note the quotes around every element in the output).
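The chosen type can be inspected through the array's `dtype`, and requested explicitly with the `dtype=` argument. A short sketch:

```python
import numpy as np

mixed = np.array([1, 1.5, "A"])
print(mixed.dtype.kind)    # U (fixed-width Unicode string)

numbers = np.array([1, 1.5, 2])
print(numbers.dtype)       # float64: the ints are promoted to float

as_floats = np.array([1, 2, 3], dtype=float)   # ask for a type explicitly
print(as_floats)           # [1. 2. 3.]
```

Without a string in the mix, NumPy picks the "widest" numeric type instead of falling back to strings.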

## 3.2.3 Adding a number (to every element)

Adding a number to every element of a Python list needs an explicit loop, but NumPy does it in a single operation!

In [17]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# py_list + 10        # Won't work!
np_array += 10

print(np_array)

[11 12 13 14 15]
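For comparison, a sketch of the plain-list route: `+` with an `int` raises a `TypeError`, so each element has to be visited explicitly (here with a list comprehension):

```python
py_list = [1, 2, 3, 4, 5]

try:
    py_list + 10                       # lists only concatenate with other lists
except TypeError as err:
    print("TypeError:", err)

shifted = [x + 10 for x in py_list]    # add 10 to each element, one at a time
print(shifted)                         # [11, 12, 13, 14, 15]
```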


## 3.2.4 Adding another list

Python `list`s have `append` (adds one element) and `extend` (adds every element of another iterable), which are already very convenient compared to languages where you manage memory allocation yourself...

In [18]:
py_list.append(21)
print(py_list)

py_list.extend([31])
print(py_list)

[1, 2, 3, 4, 5, 21]
[1, 2, 3, 4, 5, 21, 31]
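The difference between the two shows up when the argument is itself a list. A minimal sketch:

```python
a = [1, 2]
a.append([3, 4])   # append adds its argument as ONE element
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # extend adds each element of the iterable
print(b)           # [1, 2, 3, 4]
```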


The `+` operator concatenates Python lists, whereas for NumPy arrays it adds element-wise (so the shapes must be compatible). To join NumPy arrays, use `np.concatenate` instead:

In [19]:
py_list += [41, 51, 61]
print(py_list)

# np_array + [21, 31]     # Won't work: shapes (5,) and (2,) can't be broadcast together
np_array = np.concatenate((np_array, [21, 31]))
print(np_array)

np_array += np.array([10] * 7)
print(np_array)

[1, 2, 3, 4, 5, 21, 31, 41, 51, 61]
[11 12 13 14 15 21 31]
[21 22 23 24 25 31 41]
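A small sketch of the rules at work: same-shape arrays add element by element, a scalar is "broadcast" to every element, and mismatched shapes raise a `ValueError`. The arrays `x` and `y` are made up for illustration:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

print(x + y)                   # element-wise sum: [11 22 33]
print(x + 5)                   # scalar broadcast to every element: [6 7 8]
print(np.concatenate((x, y)))  # joining instead of adding: [ 1  2  3 10 20 30]

try:
    x + np.array([1, 2])       # shapes (3,) and (2,) are incompatible
except ValueError as err:
    print("ValueError:", err)
```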


## 3.2.5 Multiplying by a Number

Multiplying a Python list by an integer repeats the list, while multiplying a NumPy array scales every element. How convenient that neither needs an explicit loop!

In [20]:
py_list *= 2
print(py_list)

np_array *= 10
print(np_array)

[1, 2, 3, 4, 5, 21, 31, 41, 51, 61, 1, 2, 3, 4, 5, 21, 31, 41, 51, 61]
[210 220 230 240 250 310 410]
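The contrast on a tiny example, plus the comprehension you would need to actually scale a plain list:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

print(py_list * 2)                # repetition: [1, 2, 3, 1, 2, 3]
print(np_array * 2)               # scaling: [2 4 6]
print([2 * x for x in py_list])   # scaling a list needs a loop: [2, 4, 6]
```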


## 3.2.6 Squaring

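and of course, powers! Careful though: `**= 10` raises every element to the 10th power (not the square), and results such as 410**10 (about 1.3e26) are far beyond what the array's 64-bit integer type can hold (about 9.2e18). NumPy silently overflows (wraps around), producing meaningless and even negative values: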

In [21]:
# py_list **= 2     # Won't work! (TypeError)

np_array **= 10

print(np_array)

[ 3349867528234288128  3006740969875832832  6580473194599359488
   769096289001406464 -3905460460108995584  7096233308316427264
  3547309027658376192]
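A sketch of actual squaring on a fresh array, and of two ways to avoid the silent integer wraparound: compute in floating point, or use Python's arbitrary-precision `int`:

```python
import numpy as np

np_array = np.array([210, 220, 230])
print(np_array ** 2)                # element-wise square: [44100 48400 52900]

# The largest value a 64-bit integer can hold
print(np.iinfo(np.int64).max)       # 9223372036854775807

# 410**10 is about 1.3e26, so it only fits in a float (or a Python int)
big_float = np.array([410], dtype=np.float64) ** 10
big_int = 410 ** 10                 # Python ints never overflow
print(big_float, big_int)
```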


## 3.2.7 Asking questions

Comparison operators on a Python list act on the entire list as a single object:

In [22]:
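print(py_list == 3)     # False: the whole list is compared to the integer 3

# print(py_list > 3)      # Won't work! (TypeError: '>' not supported between instances of 'list' and 'int')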

False


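Comparison operators on NumPy arrays work element-wise, returning a Boolean array that can be used as a mask (the lone `False` in the second output is the element that overflowed to a negative value):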

In [23]:
print(np_array == 3)

print(np_array > 3)

[False False False False False False False]
[ True  True  True  True False  True  True]
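A sketch of using such a Boolean array as a mask: indexing with it keeps only the `True` positions, and summing it counts the matches. The values are made up for illustration:

```python
import numpy as np

values = np.array([5, -2, 8, 0, 3])

mask = values > 2            # Boolean array: [ True False  True False  True]
print(values[mask])          # keep only the True positions: [5 8 3]
print(mask.sum())            # True counts as 1, so this counts the matches: 3
print(np.any(values < 0), np.all(values < 10))   # True True

# The closest plain-list equivalent needs a comprehension
py_values = [5, -2, 8, 0, 3]
print([x for x in py_values if x > 2])   # [5, 8, 3]
```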


## 3.2.8 Mathematics

Built-in Python functions that work on lists:

In [24]:
print(sum(py_list))
print(max(py_list))
print(min(py_list))

440
61
1


NumPy array methods (these act on the overflowed values from the power step above, which is why the results look strange):

In [25]:
print(np_array.sum())
print(np_array.max())
print(np_array.min())
print(np_array.mean())

1997515783867143168
7096233308316427264
-3905460460108995584
2.920608551082385e+18
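On multi-dimensional arrays these methods also accept an `axis` argument, which reduces along one dimension instead of over every element. A sketch on a small 2 × 3 array:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())           # all elements: 21
print(scores.sum(axis=0))     # down each column: [5 7 9]
print(scores.sum(axis=1))     # along each row: [ 6 15]
print(scores.mean(axis=1))    # row means: [2. 5.]
```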


# Exercises & Self-Assessment

In [26]:
def quicksortR(arr):
    # Base case: an array of length 0 or 1 is already sorted
    if len(arr) <= 1:
        return arr

    # Recursive case
    pivot = arr[-1]     # Pivot (taken from the end for convenience)
    rest = arr[:-1]     # Everything except the pivot itself

    less_or_equal = rest[rest <= pivot]   # Smaller (or equal) values go to the left
    greater = rest[rest > pivot]          # Greater values go to the right

    # Using <= keeps duplicates of the pivot instead of silently dropping them
    return np.concatenate((quicksortR(less_or_equal), [pivot], quicksortR(greater)))

arr = np.random.rand(10)
sorted_arr = quicksortR(arr)
print("Sorted array:", sorted_arr)

Sorted array: [0.46890704 0.64677779 0.68321367 0.6863613  0.72695895 0.77412277
 0.80226095 0.81134662 0.92150797 0.93432245]
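One way to self-check a sorting function is to compare it against `np.sort`. This sketch repeats the quicksort so it runs on its own, and uses a seeded `np.random.default_rng` with a small integer range so the test data contains plenty of duplicates (the seed and sizes are arbitrary choices):

```python
import numpy as np

def quicksortR(arr):
    # Copy of the exercise function, so this block is self-contained
    if len(arr) <= 1:
        return arr
    pivot = arr[-1]
    rest = arr[:-1]
    return np.concatenate((quicksortR(rest[rest <= pivot]),
                           [pivot],
                           quicksortR(rest[rest > pivot])))

rng = np.random.default_rng(0)           # seeded so the check is repeatable
test_arr = rng.integers(0, 5, size=20)   # small range, so many duplicates

result = quicksortR(test_arr)
print(np.array_equal(result, np.sort(test_arr)))   # True
```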


# Appendix/Extra!

## Summary of the differences between NumPy `array`s and the Python built-in `list`!

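Python built-in `list`
1. General-purpose computing/tasks/scripts
2. Dynamic/inhomogeneous datatyping
   - elements can be of different types, e.g. `int`, `float`, `str`
   - more versatility
3. More memory consumption (every element is a full Python object that stores its own)
   - type
   - reference count
   - object value
4. Performance
   - slower for numerical work, since operations go through Python objects one at a time
   - but cheap to grow in place with `append`/`extend`
5. Versatility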

`numpy` `array`
1. Designed for scientific computing
2. Homogeneous datatyping
   - elements must be of one (same) type, e.g. `int`, `float`, `str`
   - more efficient computation
3. More efficient memory utilisation
   - compact data storage!
4. Performance
   - vectorized operations without explicit iteration!
   - faster
5. Specific functionalities
   - science libraries support!

## Examples!

`list` -- handling a collection of items of different types, while performing basic operations like adding or removing items, but not performing complex numerical computations.

In [27]:
shopping_list = ["carrots", 4, "bananas", 2, "bread", 5, "croissant", 100000]

shopping_list.append("eggs")
shopping_list.append(12)

# Remove "bananas" together with its quantity (the item right after it)
i = shopping_list.index("bananas")
del shopping_list[i:i + 2]

print("Updated shopping list:", shopping_list)


Updated shopping list: ['carrots', 4, 'bread', 5, 'croissant', 100000, 'eggs', 12]


`numpy` `array` -- matrix multiplication

In [28]:
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

C = np.dot(A, B)

print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)
print("\nMatrix C (matrix product of A and B):")
print(C)


Matrix A:
[[1 2 3]
 [4 5 6]]

Matrix B:
[[ 7  8]
 [ 9 10]
 [11 12]]

Matrix C (matrix product of A and B):
[[ 58  64]
 [139 154]]
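NumPy also provides the `@` operator for matrix multiplication, which gives the same result as `np.dot` for 2D arrays. Note that `*` on arrays is element-wise, not a matrix product. A short sketch reusing `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

print(A @ B)   # matrix product, same as np.dot(A, B): [[ 58  64] [139 154]]
print(A * A)   # element-wise product (shapes must match), NOT a matrix product
```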


# Footnotes
Referenced [Storing Data (Need)](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/1_storing-data_need.html)