<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Need)</span></div>

# What to expect in this chapter

In [1]:
import numpy as np

**Different ways to store and manipulate data in Python**
1. Lists
2. NumPy arrays
3. Dictionaries
4. Tuples
5. Dataframes
6. Classes

# 1 Lists, Arrays & Dictionaries

## 1.1 Let’s compare

**Python lists**

In [6]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony 老师", "Benedict Cucumber"]

**NumPy arrays**

In [5]:
np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony 老师", "Benedict Cucumber"])
# With two separate lists (or arrays), order matters: names are paired by position.

**Dictionary**

In [4]:
superhero_info = {
    "Natasha Romanoff":"Black Widow",
    "Tony老师":"Iron Man",
    "Benedict Cucumber":"Doctor Strange"
}
# A dictionary pairs each key with its value, separated by a colon (:).
# It elegantly holds the real and superhero names in one structure.
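
If the names already sit in two parallel lists, one possible way to build such a dictionary is `zip()`, which pairs elements by position. This is only a sketch; the variable names `real_names`, `super_names`, and `info` are chosen for illustration and do not appear elsewhere in these notes.

```python
# Two parallel lists: element i of one belongs with element i of the other.
real_names = ["Natasha Romanoff", "Tony 老师", "Benedict Cucumber"]
super_names = ["Black Widow", "Iron Man", "Doctor Strange"]

# zip() pairs the lists position by position;
# dict() turns each (real, super) pair into a key: value entry.
info = dict(zip(real_names, super_names))

print(info["Natasha Romanoff"])   # Black Widow
```

Because the pairing is purely positional, shuffling one list without the other would silently produce wrong pairs.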

## 1.2 Accessing data from a list (or array)

Python is a **zero-indexed language**, meaning it starts counting at 0. 

If you want to access a particular element in the list or array, you need to specify the relevant index starting from zero.

In [7]:
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"] # 'Black Widow' has index 0; 'Doctor Strange' has index 2
py_real_names = ["Natasha Romanoff", "Tony 老师", "Benedict Cucumber"]

In [10]:
py_real_names[2] # Use square brackets [] to index, NOT parentheses ()

'Benedict Cucumber'

In [11]:
py_super_names[2]

'Doctor Strange'

In [12]:
# We can also use a negative index to count from the back of the list
# Useful to access the last element without knowing the list size
py_super_names[-1]

'Doctor Strange'
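
As a quick check of how positive and negative indices relate, the sketch below (the variable name `names` is illustrative) shows that index `-1` and index `len(names) - 1` refer to the same element, and that an index past the end raises an `IndexError`.

```python
names = ["Black Widow", "Iron Man", "Doctor Strange"]

last_by_negative = names[-1]              # count from the back
last_by_length = names[len(names) - 1]    # count from the front
print(last_by_negative == last_by_length) # True

try:
    names[3]    # valid indices are 0, 1, 2 (or -1, -2, -3)
except IndexError as err:
    print("IndexError:", err)
```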

## 1.3 Accessing data from a dictionary

Dictionaries hold **data (values)** paired with **keys**. This is called a **key-value structure**.

In this case, the superhero's real name is the **key** and their superhero name is the **value**.

In [14]:
superhero_info = {
    "Natasha Romanoff":"Black Widow",
    "Tony老师":"Iron Man",
    "Benedict Cucumber":"Doctor Strange"
}

In [17]:
superhero_info['Benedict Cucumber'] # Use [] to look up a value by its key; the quotes are needed because this key is a string

'Doctor Strange'

In [22]:
superhero_info['Doctor Strange'] # Does not work the other way round: a value cannot be used as a key

KeyError: 'Doctor Strange'

In [18]:
# To access all keys
superhero_info.keys()

dict_keys(['Natasha Romanoff', 'Tony老师', 'Benedict Cucumber'])

In [20]:
# To access all values
superhero_info.values()

dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
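
Three related patterns are sketched below, assuming the same `superhero_info` dictionary: `.get()` returns a fallback instead of raising `KeyError`, `.items()` yields each key together with its value, and a reversed dictionary allows lookup by superhero name. The names `missing` and `real_name_of` are illustrative only.

```python
superhero_info = {
    "Natasha Romanoff": "Black Widow",
    "Tony老师": "Iron Man",
    "Benedict Cucumber": "Doctor Strange"
}

# .get() returns the second argument when the key is absent (no KeyError).
missing = superhero_info.get("Doctor Strange", "not a key")
print(missing)                       # not a key

# .items() gives (key, value) pairs, handy in a for-loop.
for real, hero in superhero_info.items():
    print(f"{real} -> {hero}")

# Swap keys and values to look up a real name from a superhero name.
real_name_of = {hero: real for real, hero in superhero_info.items()}
print(real_name_of["Doctor Strange"])   # Benedict Cucumber
```

Reversing a dictionary like this only behaves well when every value is unique; duplicate values would overwrite one another.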

## 1.4 Higher dimensional lists

Instead of using two separate lists for the real and superhero names, we can store both in a single **2D list (or array)**.

In [31]:
# This is a nested Python list, NOT a NumPy array.
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony 老师', 'Iron Man'],
                     ['Benedict Cucumber', 'Doctor Strange']]
py_superhero_info[1][1]  # Both indices (row 1, column 1) are needed to pick out a single item.
# A whole row or column of elements can be collected using a **for-loop**.
# Advantage of a list: it is quick to build.

'Iron Man'
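
To pull out a whole row or column, a loop (here written as a list comprehension) works for a nested list, while a NumPy array supports slicing directly. The sketch assumes `numpy` is imported as `np`; `row`, `hero_column`, `np_superhero_info`, and `np_column` are illustrative names.

```python
import numpy as np

py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony 老师', 'Iron Man'],
                     ['Benedict Cucumber', 'Doctor Strange']]

# A row is just one of the inner lists.
row = py_superhero_info[1]

# A column has to be collected item by item from every inner list.
hero_column = [pair[1] for pair in py_superhero_info]

# With a 2D array, [:, 1] means "every row, column 1".
np_superhero_info = np.array(py_superhero_info)
np_column = np_superhero_info[:, 1]

print(row)
print(hero_column)
print(np_column)
```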

# 2 Lists vs. Arrays

Generally, an operation on a **list** works on the **whole list**. A list can accommodate more than one data type.

In contrast, an operation on an **array** works on the **individual elements** of the array. An array can hold only one data type.

## 2.1 Size

We can use **len()** to find how many elements a list or array has. For a 2D list or array, this is the number of rows.

In [32]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)      # Reusing the Python list 
                                        # to create a NEW
                                        # NumPy array

In [36]:
print(id(py_list_2d), id(np_array_2d))   # They are not the same, not linked
print(type(py_list_2d), type(np_array_2d)) # They are also not the same type

3125213105856 3125213206832
<class 'list'> <class 'numpy.ndarray'>


In [37]:
# To know the size of a list
len(py_list_2d)

10

In [38]:
# To know the size of an array
len(np_array_2d)

10

In [40]:
np_array_2d.shape # (10, 2) means that the array has 10 rows and 2 columns

(10, 2)

<font color='blue'>

In np_array_2d.shape, shape does not need () because it is an attribute, not a function.

**An attribute is a property or characteristic associated with an object.**

Accessing an attribute retrieves this data without performing any computation or modification. Attributes are accessed using dot notation.

**A function is a reusable block of code that performs a specific task.**

When you call a function, you instruct the program to execute the code inside it.

Functions are called using parentheses after the function name (function_name()). A function may or may not take input parameters (arguments), and it may or may not return a result.
</font>
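
The difference shows up directly in the syntax. The sketch below uses a small array made up for illustration (not the 10-row one above): attributes such as `shape`, `size`, and `ndim` are read without parentheses, while `len()` and `.sum()` are called with them.

```python
import numpy as np

np_array_2d = np.array([[1, 2], [3, 4], [5, 6]])

# Attributes: stored properties, read with dot notation and no ()
print(np_array_2d.shape)   # (3, 2) -> 3 rows, 2 columns
print(np_array_2d.size)    # 6      -> total number of elements
print(np_array_2d.ndim)    # 2      -> number of dimensions

# Calls: code is executed, so parentheses are required
print(len(np_array_2d))    # 3      -> length of the first dimension (rows)
print(np_array_2d.sum())   # 21     -> a function attached to the array (a method)
```

Note that `len()` only reports the first dimension, which is why `shape` is the more informative choice for 2D data.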

## 2.2 Arrays are fussy about type

An array insists on a **single data type**; a list is more accommodating.

In [41]:
py_list = [1, 1.5, 'A']
np_array = np.array(py_list)

In [42]:
py_list

[1, 1.5, 'A']

In [43]:
np_array # The array converts every value (int and float included) to a string, hence the quotes

array(['1', '1.5', 'A'], dtype='<U32')

<font color = 'blue'>
U32 is a **Unicode string data type**.

**Unicode** is an international character encoding standard that provides a unique number for every character across languages and scripts, making almost all characters accessible across platforms, programs, and devices.

**String Data Type**: In NumPy, strings are represented using a specific data type. In <U32, U indicates a Unicode string, 32 is the maximum length of each element in characters, and < marks little-endian byte order.

Some other data type codes include:
**U (Unicode string), S (byte string; a is an older alias), i (integer), f (floating point)**

U is suitable for representing text data, while S is useful for handling binary data or interfacing with external systems that expect byte strings.
</font>
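
The data type an array settled on can be inspected through its `dtype` attribute, and strings that look like numbers can be converted back with `.astype()`. A minimal sketch; the names `mixed`, `numbers`, `text_numbers`, and `as_float` are illustrative.

```python
import numpy as np

# Mixing numbers with a string forces everything to become a string.
mixed = np.array([1, 1.5, 'A'])
print(mixed.dtype.kind)       # U  (Unicode string)

# Mixing only ints and floats gives a floating-point array instead.
numbers = np.array([1, 1.5])
print(numbers.dtype)          # float64

# Strings that hold numbers can be converted explicitly.
text_numbers = np.array(['1', '1.5'])
as_float = text_numbers.astype(float)
print((as_float + 1).tolist())   # [2.0, 2.5]
```

Calling `.astype(float)` on `mixed` would fail, since `'A'` cannot be read as a number.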

## 2.3 Adding a number

In [44]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         # Reusing the Python list
                                     # to create a NEW
                                     # NumPy array

In [46]:
py_list + 10        # Won't work!

TypeError: can only concatenate list (not "int") to list

In [47]:
np_array + 10

array([11, 12, 13, 14, 15])
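
For a plain list, the same result needs an explicit loop over the elements. One common way is a list comprehension, sketched here with the illustrative name `plus_ten`.

```python
py_list = [1, 2, 3, 4, 5]

# Visit each element x and build a new list of x + 10.
plus_ten = [x + 10 for x in py_list]
print(plus_ten)   # [11, 12, 13, 14, 15]
```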

## 2.4 Adding another list

Adding two **lists** joins them into **one longer list** (concatenation).

Adding two **arrays** is an **element-wise operation**.

In [48]:
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

In [49]:
#for list
py_list_1 + py_list_2

[1, 2, 3, 4, 5, 10, 20, 30, 40, 50]

In [50]:
#for array
np_array_1 + np_array_2

array([11, 22, 33, 44, 55])
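
Two consequences are worth seeing side by side: element-wise addition of lists has to be written out by pairing items (here with `zip()`), and element-wise addition of arrays expects compatible shapes. This is a sketch; `pairwise` is an illustrative name and the mismatched arrays are made up for the demonstration.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]

# Pair items by position, then add each pair.
pairwise = [a + b for a, b in zip(py_list_1, py_list_2)]
print(pairwise)   # [11, 22, 33, 44, 55]

# Arrays of different lengths (3 vs 2) cannot be added element by element.
try:
    np.array([1, 2, 3]) + np.array([10, 20])
except ValueError as err:
    print("ValueError:", err)
```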

## 2.5 Multiplying by a Number

Multiplying a **list** by an integer **repeats its contents**, so the list grows.

Multiplying an **array** by a number **multiplies each element by that number**.

In [52]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [53]:
#for list
py_list*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [56]:
#for array
np_array*2

array([ 2,  4,  6,  8, 10])

In [58]:
np_array*np_array # Element-wise product, so each element is squared

array([ 1,  4,  9, 16, 25])

## 2.6 Squaring

In [59]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

In [60]:
py_list**2                      # Won't work!  

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [61]:
np_array**2

array([ 1,  4,  9, 16, 25])

In [62]:
np_array**4

array([  1,  16,  81, 256, 625], dtype=int32)

## 2.7 Asking questions

In [63]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

In [67]:
py_list == 3     # Works, but what IS the question?

False

In [68]:
py_list > 3      # Won't work!

TypeError: '>' not supported between instances of 'list' and 'int'

In [69]:
np_array==3

array([False, False,  True, False, False])

In [70]:
np_array>3

array([False, False, False,  True,  True])
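
The True/False array produced by such a question can itself be used to select or count elements, often called a boolean mask. The sketch below also shows the closest list equivalents; `mask` is an illustrative name.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

mask = np_array > 3
print(np_array[mask])           # [4 5]  -> keep only elements where mask is True
print(mask.sum())               # 2      -> True counts as 1
print(mask.any(), mask.all())   # True False

# List equivalents need 'in' or an explicit loop.
print(3 in py_list)                  # True
print([x > 3 for x in py_list])      # [False, False, False, True, True]
```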

## 2.8 Mathematics

In [71]:
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)         

**For list**

In [72]:
sum(py_list)     # sum() is a base Python function

15

In [73]:
max(py_list)     # max() is a base Python function

5

In [74]:
min(py_list)     # min() is a base Python function

1

In [75]:
py_list.sum()   # Won't work!

AttributeError: 'list' object has no attribute 'sum'

**For array**

In [76]:
np_array.sum()

15

In [77]:
np_array.max()

5

In [78]:
np_array.min()

1

In [79]:
np_array.mean()

3.0

In [80]:
np_array.std()

1.4142135623730951
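
By default `.std()` divides by the number of elements (the population standard deviation); passing `ddof=1` divides by one fewer (the sample standard deviation). For plain lists, the standard library's `statistics` module offers comparable functions. A sketch with illustrative variable names:

```python
import statistics
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

population_std = np_array.std()        # divides by N,     about 1.414
sample_std = np_array.std(ddof=1)      # divides by N - 1, about 1.581
print(population_std, sample_std)

# Base-Python counterparts that work directly on a list
list_mean = statistics.mean(py_list)          # 3
list_pstdev = statistics.pstdev(py_list)      # population, matches .std()
list_stdev = statistics.stdev(py_list)        # sample, matches .std(ddof=1)
print(list_mean, list_pstdev, list_stdev)
```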