<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

In [1]:
import numpy as np

# What to expect in this chapter

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

**Python lists**

In [2]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

**Numpy arrays**

In [3]:
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])

**Dictionary**

In [4]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}

- Dictionaries store each key together with its associated value, separated by a colon (:).
- The dictionary holds the real and superhero names in one structure, whereas we need two lists (or arrays) for the same data.
- For lists and arrays, the order matters. For example, ‘Iron Man’ must be at the same position in its list as ‘Tony Stark’ is in the other list for the pairing to work.
- The py and np prefixes on the variable names are only for clarity. Any valid name works, provided it is not a Python keyword such as for or if.
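One way to see why the order matters is to build the dictionary from the two lists with Python's built-in zip(), which pairs elements by position. The sketch below repeats the list definitions so it runs on its own; the reversed list is an illustrative mistake, not real data.

```python
# Same data as the cells above, repeated so this block runs on its own
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

# zip() pairs the i-th real name with the i-th superhero name,
# so this only gives correct pairs if both lists use the same order
superhero_info = dict(zip(py_real_names, py_super_names))
print(superhero_info)

# If one list is in a different order, the pairs come out wrong without any error
shuffled = dict(zip(py_real_names, reversed(py_super_names)))
print(shuffled["Natasha Romanoff"])  # Doctor Strange (wrong pairing!)
```

This is why the dictionary is the safer choice when the data comes in pairs: each key carries its own value, so nothing depends on two separate lists staying aligned.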

## 1.2 Accessing data from a list (or array)

To access data from lists (and arrays), we use an index corresponding to the data’s position. Python is a zero-indexed language, meaning it starts counting at 0: the first element has index 0, the second has index 1, and so on.

In [5]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]

In [7]:
py_real_names[0]   # Use 0 for the first name

'Natasha Romanoff'

In [9]:
py_super_names[2]  # Use 2 for the third name

'Doctor Strange'

In [10]:
py_super_names[2]    # Forward indexing counts from the front.
                     # To reach the last element this way, we need
                     # to know the size beforehand.

'Doctor Strange'

In [14]:
py_super_names[-1]   # Reverse indexing counts from the back: -1 is always the last element, so we don't need to know the size

'Doctor Strange'

In [13]:
print(py_real_names[1], 'is', py_super_names[1])

Tony Stark is Iron Man
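The sketch below compares forward and reverse indexing for the last element, shows that NumPy arrays use the same zero-based indices, and shows what happens when an index runs past the end of the list.

```python
import numpy as np

py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
np_super_names = np.array(py_super_names)

# Forward indexing to the last element needs the size, via len()
last_forward = py_super_names[len(py_super_names) - 1]
# Reverse indexing does not
last_reverse = py_super_names[-1]
print(last_forward == last_reverse)  # True

# Arrays use the same zero-based indexing
print(np_super_names[0])  # Black Widow

# A list of 3 elements has indices 0, 1, 2; index 3 is out of range
try:
    py_super_names[3]
except IndexError as err:
    print("IndexError:", err)
```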


## 1.3 Accessing data from a dictionary

Dictionaries store each value paired with a key, so you can retrieve a value (here, the superhero name) by using its key (here, the real name).

In [15]:
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange"
}                  

In [16]:
superhero_info["Natasha Romanoff"]

'Black Widow'

In [18]:
superhero_info.keys()  # Returns all the keys

dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])

In [20]:
superhero_info.values() # Returns all the values

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
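Keys and values can also be read together. The minimal sketch below loops over the pairs with .items() and compares square-bracket access with the .get() method for a key that is missing. "Peter Parker" is only an illustrative key that is not in the dictionary.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony Stark": "Iron Man",
    "Stephen Strange": "Doctor Strange",
}

# .items() gives (key, value) pairs, which unpack neatly in a for loop
for real_name, super_name in superhero_info.items():
    print(real_name, "is", super_name)

# Square brackets raise a KeyError for a key that is not in the dictionary
try:
    superhero_info["Peter Parker"]
except KeyError:
    print("No such key")

# .get() returns a fallback value instead of raising an error
lookup = superhero_info.get("Peter Parker", "Unknown")
print(lookup)  # Unknown
```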

## 1.4 Higher dimensional lists

Unlike with a dictionary, we needed two lists to store the corresponding real and superhero names. One way around this is to use a 2D list (or array): a list whose elements are themselves lists, one pair of names per row.

In [22]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],     # A list of lists: one [real name, superhero name] pair per row
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
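To read from a 2D list, we index twice: the first index picks the row and the second picks the position within that row. A NumPy array built from the same data accepts both indices inside one pair of brackets and can also pull out a whole column, as sketched below.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]

# Row 1, then column 0 (real name) or column 1 (superhero name)
tony_real = py_superhero_info[1][0]
tony_super = py_superhero_info[1][1]
print(tony_real, "is", tony_super)  # Tony Stark is Iron Man

# The equivalent 2D array takes [row, column]
np_superhero_info = np.array(py_superhero_info)
print(np_superhero_info[1, 1])  # Iron Man

# ':' means "every row", so this selects the entire superhero-name column
all_super_names = np_superhero_info[:, 1]
print(all_super_names)
```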

# 2 Lists vs. Arrays

In [23]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [32]:
py_list_2d

[[1, 'A'],
 [2, 'B'],
 [3, 'C'],
 [4, 'D'],
 [5, 'E'],
 [6, 'F'],
 [7, 'G'],
 [8, 'H'],
 [9, 'I'],
 [10, 'J']]

In [33]:
np_array_2d

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C'],
       ['4', 'D'],
       ['5', 'E'],
       ['6', 'F'],
       ['7', 'G'],
       ['8', 'H'],
       ['9', 'I'],
       ['10', 'J']], dtype='<U11')

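**The list keeps the numbers as integers alongside the letters, but the array has converted every number into a string (note the quotes around them). Lists can hold mixed data types; a NumPy array stores all its elements as a single type.**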
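The conversion can be checked directly by asking for the type of the same element in each structure, as in this short sketch (it uses a smaller version of the 2D data above).

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

first_list_item = py_list_2d[0][0]
first_array_item = np_array_2d[0, 0]

print(type(first_list_item))   # <class 'int'>
print(type(first_array_item))  # <class 'numpy.str_'>
print(np_array_2d.dtype.kind)  # U, i.e. a Unicode string type
```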

## 2.1 Size

The len() function tells us how many elements are in a list (or array). For 2D data, it counts the rows (the outer elements).

In [24]:
len(py_list_2d) #for lists

10

In [29]:
len(np_array_2d) #for arrays

10

**OR**

In [31]:
np_array_2d.shape # shape is an attribute, not a function, so no brackets are needed; it gives (rows, columns)

(10, 2)
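For a 2D list, len() only sees the outer level, so counting the columns means asking for the length of one row. An array reports its dimensions directly. The sketch below also uses the array attributes ndim (number of dimensions) and size (total number of elements).

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

n_rows = len(py_list_2d)     # 3 rows
n_cols = len(py_list_2d[0])  # 2 elements in the first row
print(n_rows, n_cols)

print(np_array_2d.shape)  # (3, 2)
print(np_array_2d.ndim)   # 2
print(np_array_2d.size)   # 6, i.e. 3 * 2
```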

## 2.2 Arrays are fussy about type

In [35]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [38]:
py_list

[1, 1.5, 'A']

In [36]:
np_array    # All elements became strings; the .astype() method can convert an array to another type

array(['1', '1.5', 'A'], dtype='<U32')
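The .astype() method returns a new array with the requested type. As a sketch: an array of number-like strings converts cleanly to floats, but the mixed array above cannot, because 'A' is not a number.

```python
import numpy as np

# Number-like strings can be converted to floats
text_numbers = np.array(['1', '1.5', '2'])
numbers = text_numbers.astype(float)
print(numbers + 1)  # arithmetic now works

# The mixed array cannot be converted back, because 'A' is not a number
mixed = np.array([1, 1.5, 'A'])
try:
    mixed.astype(float)
except ValueError as err:
    print("ValueError:", err)
```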

## 2.3 Adding a number

In [39]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [40]:
py_list + 50        # Won't work!

TypeError: can only concatenate list (not "int") to list

In [42]:
np_array + 50   # Adds 50 to every element of the array

array([51, 52, 53, 54, 55])
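To get the same result with a plain list, we have to loop over the elements ourselves. A list comprehension is one common way to do this, sketched below next to the array version.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]

# A list needs an explicit loop, written here as a list comprehension
py_result = [x + 50 for x in py_list]

# An array does the same thing in one step
np_result = np.array(py_list) + 50

print(py_result)                        # [51, 52, 53, 54, 55]
print(np_result.tolist() == py_result)  # True
```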

## 2.4 Adding another list

In [43]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [45]:
py_list_1 + py_list_2  # Concatenates: the second list is appended to the first

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [46]:
np_array_1 + np_array_2 # Adds corresponding elements

array([11, 22, 33, 44, 55])

**Adding lists makes them grow (the second list is appended to the first), while adding arrays is an element-wise operation.**
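Each structure can still do the other operation, just not with +. The sketch below adds lists element by element using zip(), joins arrays end to end with np.concatenate(), and shows that element-wise addition of arrays requires matching lengths.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# Element-wise addition with plain lists: zip() pairs up the elements
pairwise_sum = [a + b for a, b in zip(py_list_1, py_list_2)]
print(pairwise_sum)  # [11, 22, 33, 44, 55]

# Joining arrays end to end needs np.concatenate(), not +
joined = np.concatenate([np.array(py_list_1), np.array(py_list_2)])
print(joined)

# Element-wise addition needs compatible shapes; these lengths differ
try:
    np.array([1, 2, 3]) + np.array([10, 20])
except ValueError as err:
    print("ValueError:", err)
```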

## 2.5 Multiplying by a Number

In [48]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [49]:
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [50]:
np_array*2

array([ 2,  4,  6,  8, 10])

**Multiplying a list by a number makes it grow (the list is repeated), whereas multiplying an array multiplies each element by the number.**
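Both behaviours have everyday uses. In the sketch below, list repetition builds a list of placeholder zeros, and other arithmetic operators (here / and -) act element-wise on an array just as * does.

```python
import numpy as np

# List repetition is handy for building a list of placeholder values
zeros_list = [0] * 3
print(zeros_list)  # [0, 0, 0]

# Division and subtraction on an array are element-wise too
np_array = np.array([1, 2, 3, 4, 5])
halves = np_array / 2
minus_one = np_array - 1
print(halves)
print(minus_one)
```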

## 2.6 Squaring

In [51]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [52]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [53]:
np_array**2

array([ 1,  4,  9, 16, 25])
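NumPy also provides functions for the same kind of element-wise operation. As a sketch, np.square() matches ** 2, and np.sqrt() reverses it element by element (the result is an array of floats).

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])

squared = np_array ** 2
# np.square() is the function form of the same element-wise operation
print(np.array_equal(squared, np.square(np_array)))  # True

# np.sqrt() undoes the squaring, element by element
roots = np.sqrt(squared)
print(roots)
```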

## 2.7 Asking questions

In [54]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**Lists**

In [59]:
py_list == 3     # Works, but it asks whether the WHOLE list equals 3, which is False. To check if 3 is in the list, use: 3 in py_list

False

In [56]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'

**Arrays**

In [62]:
np_array   # Comparisons on an array are applied to every element

array([1, 2, 3, 4, 5])

In [57]:
np_array == 3 

array([False, False,  True, False, False])

In [58]:
np_array > 3

array([False, False, False,  True,  True])

In [65]:
sum(np_array > 3)  # True counts as 1 and False as 0, so the sum is 0+0+0+1+1

2
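The True/False array from a comparison is also useful on its own. The sketch below uses the in keyword for the list question, then uses the comparison result as a mask that selects the matching elements, and counts them with np.count_nonzero().

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# The membership question for a list uses 'in'
has_three = 3 in py_list
print(has_three)  # True

# A boolean array can select the elements where it is True
mask = np_array > 3
over_three = np_array[mask]
print(over_three)  # [4 5]

# Counting the True values without sum()
count = np.count_nonzero(mask)
print(count)  # 2
```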

## 2.8 Mathematics

In [66]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**List**

In [68]:
sum(py_list)     # sum() is a base Python function

15

In [69]:
max(py_list)     # max() is a base Python function

5

In [70]:
min(py_list)     # min() is a base Python function

1

In [78]:
py_list.sum()   # Won't work! Lists have no .sum() method

AttributeError: 'list' object has no attribute 'sum'

**Array**

In [72]:
np_array.sum()

15

In [73]:
np_array.max()

5

In [74]:
np_array.min()

1

In [75]:
np_array.mean()

3.0

In [76]:
np_array.std()

1.4142135623730951
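By default, .std() gives the population standard deviation, dividing by the number of elements N; for 1 to 5 this is sqrt((4 + 1 + 0 + 1 + 4) / 5) = sqrt(2), which matches the output above. Passing ddof=1 divides by N - 1 instead (the sample standard deviation). The sketch below checks the formula by hand and also shows that NumPy functions such as np.sum() accept a plain list.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Population standard deviation by hand: sqrt(mean of squared deviations)
mean = np_array.mean()
manual_std = np.sqrt(((np_array - mean) ** 2).mean())
print(np.isclose(manual_std, np_array.std()))  # True

# Sample standard deviation divides by N - 1 instead of N
sample_std = np_array.std(ddof=1)
print(sample_std)

# NumPy functions also work directly on Python lists
total = np.sum(py_list)
print(total)  # 15
```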

**Roughly speaking, an operation on a list acts on the list as a whole, whereas an operation on an array acts on each of its individual elements.**