<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

In [3]:
import numpy as np

# What to expect in this chapter

Below is a non-exhaustive list of ways to store and manipulate data:
1. Lists
2. NumPy arrays
3. Dictionaries
4. Tuples
5. Dataframes
6. Classes

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

**Python Lists**

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
print(py_super_names)
print(py_real_names)

['Black Widow', 'Iron Man', 'Doctor Strange']
['Natasha Romanoff', 'Tony Stark', 'Stephen Strange']


**NumPy Arrays**

In [3]:
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])
print(np_super_names)
print(np_real_names)

['Black Widow' 'Iron Man' 'Doctor Strange']
['Natasha Romanoff' 'Tony Stark' 'Stephen Strange']


**Dictionary**

In [9]:
superhero_info = {"Natasha Romanoff": "Black Widow", "Tony Stark": "Iron Man", "Stephen Strange": "Doctor Strange"}
print(superhero_info["Natasha Romanoff"])

Black Widow


Things to note:
- A dictionary stores key-value pairs; each key is separated from its value by ```:```.
- A dictionary can hold both the real and superhero names in one structure, whereas lists or arrays need two separate structures for the same data.
- With two lists or arrays, the order matters: the element at a given position in one must correspond to the element at the same position in the other.
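
To make the point about ordering concrete, here is a minimal sketch that looks up a superhero name through the shared position in the two lists and then builds the dictionary from the same lists. The variable `position` is introduced only for illustration; `zip()` and `dict()` are built-in Python functions.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# With two lists, the pairing is implicit: the same index is used in both.
position = py_real_names.index("Tony Stark")
print(py_super_names[position])

# zip() pairs elements by position; dict() turns the pairs into key-value entries.
superhero_info = dict(zip(py_real_names, py_super_names))
print(superhero_info["Tony Stark"])
```

If either list were reordered on its own, the index-based lookup would silently return the wrong name, whereas the dictionary keeps each pair together.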

## 1.2 Accessing data from a list (or array)

Elements of a list or array are accessed by position, and indexing starts from 0. Negative indices count from the end, so ```-1``` refers to the last element.

In [14]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
print(py_real_names[0])
print(py_super_names[0])
print(py_super_names[2]) # Forward indexing
print(py_super_names[-1]) # Reverse indexing

Natasha Romanoff
Black Widow
Doctor Strange
Doctor Strange


## 1.3 Accessing data from a dictionary

Values in dictionaries can be accessed by using their keys:

In [15]:
superhero_info = {"Natasha Romanoff": "Black Widow", "Tony Stark": "Iron Man", "Stephen Strange": "Doctor Strange"}
print(superhero_info["Natasha Romanoff"])

Black Widow


In [26]:
print(superhero_info.keys())    # A dict_keys view object, not a list
print(superhero_info.values())  # A dict_values view object, not a list
superhero_names = list(superhero_info.keys())  # Convert the keys (the real names) into a list
print(superhero_names)


dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])
dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
['Natasha Romanoff', 'Tony Stark', 'Stephen Strange']
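
Keys and values can also be read together. The sketch below uses ```.items()```, a standard dictionary method that is not otherwise used in this chapter; the names `real_name`, `super_name` and `pairs` are chosen only for illustration.

```python
superhero_info = {"Natasha Romanoff": "Black Widow", "Tony Stark": "Iron Man", "Stephen Strange": "Doctor Strange"}

# .items() yields (key, value) pairs, which can be unpacked in a loop.
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# Like .keys() and .values(), the result is a view; list() converts it.
pairs = list(superhero_info.items())
print(pairs[0])
```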


## 1.4 Higher dimensional lists

2D lists or arrays can be used instead of two separate lists to store the corresponding real and superhero names.

In [1]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'], ['Tony Stark', 'Iron Man'], ['Stephen Strange', 'Doctor Strange']]
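
As a rough illustration of how such a 2D structure might be read back, the sketch below indexes the nested list twice and then does the same with a NumPy array. The array name `np_superhero_info` is an assumption made for this example.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

# The first index picks an inner list, the second picks an element inside it.
first_pair = py_superhero_info[0]
real_name = py_superhero_info[0][0]
super_name = py_superhero_info[0][1]
print(first_pair, real_name, super_name)

# A NumPy array accepts both indices in a single pair of brackets.
np_superhero_info = np.array(py_superhero_info)
print(np_superhero_info[0, 1])
print(np_superhero_info.shape)
```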

# 2 Lists vs. Arrays

## 2.1 Size

In [4]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d) # Uses the py_list_2d list to create a new NumPy Array

In [8]:
print(len(py_list_2d))
print(len(np_array_2d))
np_array_2d.shape # Attribute of a NumPy Array

10
10


(10, 2)

NumPy arrays require a consistent shape: every row must contain the same number of elements. A ragged list like ```[[1, 2], [1]]``` is fine, but ```np.array([[1, 2], [1]])``` cannot form a regular 2D array, and recent NumPy versions raise an error when asked to create it.
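
The following sketch tries the ragged case directly. The exact behaviour depends on the installed NumPy version (newer versions raise a ```ValueError```, older ones only warn and fall back to an array of objects), so the code handles both; the flag `rejected` is a name introduced for this example.

```python
import warnings
import numpy as np

ragged = [[1, 2], [1]]  # Rows of different lengths are fine in a list
print(len(ragged[0]), len(ragged[1]))

try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # Older NumPy versions only warn here
        arr = np.array(ragged)
    rejected = False
    print("Created an array with dtype", arr.dtype, "and shape", arr.shape)
except ValueError as err:
    rejected = True
    print("NumPy rejected the ragged list:", err)
```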

## 2.2 Arrays are fussy about type

One key difference between lists and arrays is that arrays insist on having only a single data type, while lists are more accommodating.

In [9]:
py_list = [1, 1.5, "A"]
np_array = np.array(py_list)
print(py_list)
print(np_array)

[1, 1.5, 'A']
['1' '1.5' 'A']


In the output above, NumPy converted every number to a string so that all elements share one type. Keep this restriction in mind when dealing with datasets that contain both numbers and text.
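
One way to see this restriction at work is to inspect the array's ```dtype``` attribute. The sketch below is illustrative only; the names `numbers_only`, `mixed` and `recovered` are assumptions made for this example.

```python
import numpy as np

numbers_only = np.array([1, 1.5])
mixed = np.array([1, 1.5, "A"])

print(numbers_only.dtype)  # A floating-point type: the int was promoted
print(mixed.dtype)         # A Unicode string type: the numbers became text

# Strings that look like numbers can be converted back explicitly.
recovered = mixed[:2].astype(float)
print(recovered + 1)
```

Until they are converted back, the "numbers" in `mixed` behave as text, so arithmetic on them will not work as expected.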

## 2.3 Adding a number

In [10]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
print(np_array + 10) # py_list + 10 raises a TypeError

[11 12 13 14 15]
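
For comparison, here is a minimal sketch of what happens with the list, and one possible way to get the same result without NumPy by using a list comprehension. The name `shifted` is introduced for illustration.

```python
py_list = [1, 2, 3, 4, 5]

# A list cannot be added to a number directly.
try:
    py_list + 10
except TypeError as err:
    print("TypeError:", err)

# A list comprehension applies the addition to each element in turn.
shifted = [x + 10 for x in py_list]
print(shifted)
```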


## 2.4 Adding another list

Adding two lists joins them end to end (concatenation), while adding two arrays of the same shape adds their corresponding elements.

In [11]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]
np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

print(py_list_1 + py_list_2)
print(np_array_1 + np_array_2)

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
[11 22 33 44 55]
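
Each structure can also be made to do what the other does by default. The sketch below shows one possible approach: ```zip()``` for element-wise addition of lists, and ```np.concatenate()``` for joining arrays. The names `summed` and `joined` are assumptions for this example.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]
np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

# Element-wise addition of two lists: pair the elements up, then add each pair.
summed = [a + b for a, b in zip(py_list_1, py_list_2)]
print(summed)

# Joining two arrays end to end, as + does for lists.
joined = np.concatenate([np_array_1, np_array_2])
print(joined)
```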


## 2.5 Multiplying by a Number

Multiplying a list by an integer repeats the list that many times, whereas multiplying an array by a number multiplies each element by that number.

In [12]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
print(py_list*2)
print(np_array*2)

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[ 2  4  6  8 10]


## 2.6 Squaring

Squaring an array squares each element, whereas squaring a list (```py_list**2```) raises a ```TypeError```.

In [13]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
np_array**2

array([ 1,  4,  9, 16, 25])

## 2.7 Asking questions

In [15]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [23]:
print(py_list == 3)
# print(py_list > 3) raises a TypeError
print(np_array == 3)
print(np_array > 3)

False
[False False  True False False]
[False False False  True  True]



In [27]:
print(sum(np_array > 3))
# Sums the Boolean array produced by np_array > 3, where False counts as 0 and True counts as 1

2


For lists, checking equality against an integer such as 3 is allowed, but it always returns ```False``` because a list and an integer are different types of object.

Checking whether a list is greater than or less than an integer is not possible and raises a ```TypeError```. A list can only be ordered against another list.

For arrays, the comparison is applied to each element individually, producing an array of ```True``` and ```False``` values.
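
The same questions can still be asked of a list, just less directly. The sketch below shows a few possible approaches using only built-in Python; all variable names other than `py_list` are introduced for illustration.

```python
py_list = [1, 2, 3, 4, 5]

# Membership: is 3 anywhere in the list?
contains_three = 3 in py_list
print(contains_three)

# Element-wise comparisons need a list comprehension.
equal_to_three = [x == 3 for x in py_list]
greater_than_three = [x > 3 for x in py_list]
print(equal_to_three)
print(greater_than_three)

# True counts as 1, so sum() counts the elements that pass the test.
count_greater = sum(greater_than_three)
print(count_greater)

# Comparing two lists works, element by element from the left.
print(py_list < [1, 2, 4])
```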

In [36]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list * 5)
mask = np_array > 3 # The mask is a Boolean array containing only True and False values
print(np_array[mask]) # Returns a new array with only the elements where the mask is True

[4 5 4 5 4 5 4 5 4 5]


## 2.8 Mathematics

In [25]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# sum(), max() and min() are built-in Python functions
print(sum(py_list))
print(max(py_list))
print(min(py_list))

print(np_array.sum())
print(np_array.max())
print(np_array.min())
print(np_array.mean()) # returned as float
print(np_array.std()) # returned as float

15
5
1
15
5
1
3.0
1.4142135623730951
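
Lists have no ```.mean()``` or ```.std()``` methods. One possible substitute, sketched below, is the ```statistics``` module from the Python standard library, which is not otherwise used in this chapter. By default ```np_array.std()``` computes the population standard deviation, so ```statistics.pstdev()``` is the matching function; the names `list_mean` and `list_std` are assumptions for this example.

```python
import math
import statistics

py_list = [1, 2, 3, 4, 5]

list_mean = statistics.mean(py_list)
list_std = statistics.pstdev(py_list)  # Population standard deviation

print(list_mean)
print(list_std)

# For this data the population standard deviation is the square root of 2.
print(math.isclose(list_std, math.sqrt(2)))
```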


In general, operations on a list tend to act on the list as a whole, while operations on an array tend to act on each element individually.

## Footnotes