# Module 10: Jupyter Notebook and Python Data Types

## Overview

While MATLAB excels in specialized numerical computations and simulations, Python's strength lies in its flexibility, extensive community support (open-source), and applications across a wide array of disciplines, making it a popular choice for diverse programming tasks.

Python is: 
1. the preferred language for data analysis, machine learning, and artificial intelligence tasks due to libraries like NumPy, Pandas, SciPy, and scikit-learn.
2. extensively used in scientific research and engineering computations due to its libraries like SciPy, SymPy, and Matplotlib, facilitating advanced scientific simulations and data visualization.

__What is a Python library?__
<ul>
<li>A Python library is a collection of Python modules containing functions, methods, and classes that allow you to perform specific actions without writing your own code for these tasks.</li>
<li>Python libraries are designed to simplify the programming process, and they cover a wide range of domains, such as data analysis, web development, machine learning, scientific computing, and more.</li>
<li>When you import a library into your Python script or program, you gain access to its predefined functions and classes.</li>
<li>By utilizing libraries, developers can save time and effort: they can leverage the existing solutions provided by these libraries, allowing for faster development and more efficient code. Python's extensive library ecosystem is one of the reasons why it is so popular among developers for various applications.</li>
</ul>
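
As a minimal sketch of what importing a library looks like, the cell below uses `math`, a module from Python's standard library. The choice of `math` and the variable names `root` and `circle_area` are only for illustration; `math` is not one of the libraries listed above.

```python
# Import the standard-library math module to gain access to its functions
import math

# sqrt and pi are provided by the module, so we do not write them ourselves
root = math.sqrt(16)
circle_area = math.pi * 2**2  # area of a circle with radius 2

print(root)
print(circle_area)
```

After the `import` line, everything inside the module is reached with the dot notation `math.name`.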

## Getting Started with Jupyter Notebook

There are two main types of cells in a Jupyter notebook: **Markdown** and **Code**. This is an example of a Markdown cell, which simply means that the cell does not run code. It is used for text, images, links, etc.

In [1]:
# This is an example of a code cell
# In a code cell, we use "#" to start our comments, which will not be executed.
# The following line, however, is python code
2+2

4

Some useful commands to make your text look better:
* **bold**
* *italic*

Similar to LaTeX:
* $\sin(x)$
* $\int_0^1 e^x dx$

## Python Data Types

We will be working with many different Python data types, both built-in data types as well as data types defined in external libraries, like the library NumPy. It may seem that Python has an endless number of data types, but focus this week on really understanding the data types that are presented. In upcoming units, we may see less common data types that are presented once or twice, just as a way to show you that they exist.

### lists and strings

We start with two data types that seem very different but which share a lot of functionality.

We can:
* **Make a string using quotation marks (either single or double quotation marks).** 
* **Make a list by using square brackets.**

Here we define the variable `s` to be the string `"Hello, world"`.

You can check the type of an object by using the `type` function.

In [2]:
s = "Hello, world"

In [3]:
type(s) # Similar to the "class()" function in MATLAB

str

In [4]:
print(s)

Hello, world


Just like we made a string by using quotation marks, here we make a list by using square brackets.

In [5]:
mylist = [3,1,4,1]

In [6]:
mylist2 = [3 1 4 1] # Unlike MATLAB, we have to use commas in Python!

SyntaxError: invalid syntax. Perhaps you forgot a comma? (1712655810.py, line 1)

In [7]:
type(mylist)

list

Here is one of the many similarities between strings and lists (as well as many other data types in Python): you can use the following indexing notation to extract a part of the object.  
Notice that **numbering in Python starts at 0.** We can refer to the first element as the "zeroth" element or the "initial" element. For example, the initial element in the string `Hello, world` is the letter `H`.

In [8]:
s[0]

'H'

In [9]:
mylist[0]

3

Another similarity between strings and lists is that both have a notion of length.

In [10]:
len(s)

12

In [11]:
len(mylist)

4

Because indexing starts at 0 and the string `s` has length 12, using `s[12]` will raise an error.

In [12]:
s[len(s)]

IndexError: string index out of range

To get the last element in a string (or list) `s`, you can use `s[len(s)-1]`, but much more common is to use the negative indexing shorthand `s[-1]` (you can read the ``-`` as "count from the end").

In [13]:
s[len(s)-1]

'd'

In [14]:
s[-1]

'd'
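
The relationship between negative and ordinary indices can be sketched as follows. The variable names are hypothetical; the string is the same `s` used above.

```python
s = "Hello, world"

# s[-k] is shorthand for s[len(s) - k]
second_to_last = s[-2]
same_element = s[len(s) - 2]

# The most negative valid index is -len(s), which points at the initial element
first = s[-len(s)]

print(second_to_last, same_element, first)
```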

**slicing**: similar to the colon command from MATLAB (``a:dx:b``), but with a few key syntax differences. We use slicing in Python to extract a sequence of elements.
* **The right endpoint is not included**. For example: Calling `s[1:4]` will include `s[1]`, `s[2]`, `s[3]`, but it will not include `s[4]`.
* **When slicing with specified step size, the step size goes at the end** (like ``a:b:dx``). For example: `s[1:7:2]` will include `s[1]`, `s[3]`, `s[5]`, but not `s[7]` because right endpoints are not included. 

In [15]:
s[1:4]

'ell'

In [16]:
s

'Hello, world'

The string `s[:5]` is the same as `s[0:5]`.  One benefit of the "right endpoint is not included" convention is that `s[:5]` will be length 5.

In [17]:
s[:5]

'Hello'

In [18]:
s[0:5]

'Hello'

Similarly, the notation `s[3:]` will go from the element `s[3]` to the end.

In [19]:
s[3:]

'lo, world'

In [20]:
mylist[1:]

[1, 4, 1]

In [21]:
mylist

[3, 1, 4, 1]

Example: `s[1:7:2]` will include `s[1]`, `s[3]`, `s[5]`, but not `s[7]` because right endpoints are not included.

In [22]:
s[1:7:2]

'el,'

The following gets all the even-indexed elements in `s`.

In [23]:
s[::2]

'Hlo ol'

Similarly, `s[1::2]` gets all the odd-indexed elements in `s` (we start at index 1 and go up in steps of 2).

In [24]:
s[1::2]

'el,wrd'
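
A few more slicing variations follow from the same rules. This is only a sketch with hypothetical variable names; the negative step in the last example is not covered above and is included as an extension of the step-size idea.

```python
s = "Hello, world"

# Omitting both endpoints copies the whole string
whole = s[:]

# Negative indices work in slices too: the last five characters
tail = s[-5:]

# A negative step walks backwards, so this reverses the string
backwards = s[::-1]

print(whole)
print(tail)
print(backwards)
```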

### lists, tuples, and sets

In the previous section, we considered two data types (lists and strings) which seem quite different.  Here we will look at three data types (lists, tuples, and sets) that on the surface seem very similar.  The difference between lists and tuples is more subtle, but sets are very different from these other two.

As we already saw, you can construct a list using square brackets.  To construct a tuple, you can use round parentheses, and to construct a set you can use curly brackets.

In [25]:
mylist = [3,1,4,1]

mytuple = (3,1,4,1)

myset = {3,1,4,1}

In [26]:
type(mytuple)

tuple

In [27]:
type(myset)

set

Here is a first difference with sets: like the mathematical notion of a set, sets in Python **do not allow repeated elements**.  Notice how there is only a single 1 shown in `myset`, even though we defined `myset` using `myset = {3,1,4,1}`.

In [28]:
myset

{1, 3, 4}
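
One practical use of this behavior is removing duplicates from a list. The sketch below assumes a made-up list called `readings`; `set` and `sorted` are built-in Python functions.

```python
readings = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # hypothetical data with repeats

# Converting to a set discards repeated elements
unique_readings = set(readings)

# Sort the set to get a list back in a predictable order
unique_sorted = sorted(unique_readings)

print(len(readings), len(unique_readings))
print(unique_sorted)
```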

Here is a major difference between sets and the other two: sets do not support indexing.  This is because, like the mathematical notion of a set, **sets in Python do not have a notion of order**, so there is no "zeroth" element in a set.

In [29]:
myset[0]

TypeError: 'set' object is not subscriptable

In [30]:
mytuple[0]

3

In [31]:
mylist[0]

3

Sets, tuples, and lists all have a notion of length.

In [32]:
len(mytuple)

4

In [33]:
len(myset)

3

In [34]:
mylist[2]

4

Here is a first difference between lists and tuples: you can change elements in a list but not in a tuple. This is usually described by saying that tuples are **immutable**. \
In addition, tuples are **hashable** objects (provided all of their elements are hashable). A hashable object has a hash value, an integer computed from its contents, that never changes during the object's lifetime. This allows Python to look the object up quickly in sets and dictionaries; the price is that its contents must stay fixed.

In [35]:
mylist[2] = 17

In [36]:
mylist

[3, 1, 17, 1]

In [37]:
mytuple[2]

4

In [38]:
mytuple[2] = 17

TypeError: 'tuple' object does not support item assignment

You can access the hash value of a tuple (the value itself is not meaningful to us at this point).

In [39]:
hash(mytuple)

5667174679445670316

In [40]:
hash(mylist) # lists and sets are not hashable

TypeError: unhashable type: 'list'

Another consequence of lists being mutable and tuples being immutable is that lists have an `append` method and tuples do not. Sets are mutable but also lack `append`: the English word "append" implies "adding to the end", which does not make sense for an unordered set, so sets use `add` instead.

In [41]:
mylist.append(8)

In [42]:
mylist

[3, 1, 17, 1, 8]

In [43]:
mytuple.append(8) # tuples have no "append" method, as they are immutable

AttributeError: 'tuple' object has no attribute 'append'

In [44]:
myset.add(8) # Alternatively, you may use "add" to add an element to a set
myset

{1, 3, 4, 8}

It might seem like lists are better than tuples in every way.  However, you can use tuples in certain situations where lists are not allowed. 

In [45]:
newset = {3,1,(4,1)} # here (4,1) is a tuple, and is an element of this "newset".

In [46]:
type(newset)

set

In [47]:
len(newset)

3

In [48]:
newset2 = {3,1,[4,1]}

TypeError: unhashable type: 'list'

Set elements must be **hashable**. A tuple is hashable, as long as its elements are all hashable.
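
The condition "as long as its elements are all hashable" can be checked directly. The sketch below uses `try`/`except`, which has not been introduced in this module, only so that the cell keeps running after the error; the variable names are hypothetical.

```python
# A tuple of hashable elements (here, integers) is itself hashable
flat_tuple = (4, 1)
flat_hash = hash(flat_tuple)

# A tuple that contains a list is still a tuple, but it is not hashable
nested_tuple = (4, [1])
try:
    hash(nested_tuple)
    nested_is_hashable = True
except TypeError as err:
    nested_is_hashable = False
    print("TypeError:", err)

print(isinstance(flat_hash, int), nested_is_hashable)
```

So `{3, 1, (4, 1)}` is a valid set, while a set containing `(4, [1])` would raise the same `TypeError`.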

### ranges and for loops

Our next data type is commonly used with for-loops, and should remind you of the colon command (``a:dx:b``) from MATLAB.

In [49]:
mystring = "Hello, world"
mylist = [3,1,4,1]
mytuple = (3,1,4,1)
myset = {3,1,4,1}

Here is the basic syntax for **for-loops** in Python. Key things to notice:

* The ``:`` at the end of the first line
* The indentation to enter the for-loop


In [50]:
for x in mylist:
    print(x)

3
1
4
1


In [51]:
for x in mystring:
    print(x)

H
e
l
l
o
,
 
w
o
r
l
d


The indentation is critical to Python programming! \
Unlike MATLAB, which closes each code block with the keyword `end`, Python relies on correct indentation to define the structure of your code:

In [52]:
for x in mylist:
print(x)

IndentationError: expected an indented block after 'for' statement on line 1 (3994173541.py, line 2)

In the following example, Python repeats all three `print` statements on every iteration of the for loop. (**White space matters!**)

In [53]:
for x in mystring:
    print(x)
    print(type(x))
    print(len(x))

H
<class 'str'>
1
e
<class 'str'>
1
l
<class 'str'>
1
l
<class 'str'>
1
o
<class 'str'>
1
,
<class 'str'>
1
 
<class 'str'>
1
w
<class 'str'>
1
o
<class 'str'>
1
r
<class 'str'>
1
l
<class 'str'>
1
d
<class 'str'>
1


On the other hand, in the following, only the `print(x)` command is indented, so only that command is repeated by the for loop.  The other two `print` commands are executed after the for loop has completed.

In [54]:
for x in mystring:
    print(x) # only this line is in the loop!
print(type(x))
print(len(x))

H
e
l
l
o
,
 
w
o
r
l
d
<class 'str'>
1


In [55]:
for x in mytuple:
    print(x)

3
1
4
1


In [56]:
for x in myset:
    print(x)

1
3
4


In [57]:
myset # recall that there is no order in a set

{1, 3, 4}

#### Remark: Indexing of a list
Sometimes, you may want to extract an arbitrary subset of a list/string. For example, suppose you want to grab the 1st, 4th, and 3rd elements of `mylist` (in that order). Following MATLAB intuition, you might write:

In [58]:
mylist[(0,3,2)]

TypeError: list indices must be integers or slices, not tuple

You **CANNOT** directly use a list/tuple as the indices to subset a list/string!

One alternative is to loop over the indices with **for** (introduced above), as in the cells below; another is to use a library such as **NumPy**.

In [59]:
[mylist[i] for i in [0,3,2]] # You may iterate over either a list or a tuple

[3, 1, 4]

In [60]:
[mylist[i] for i in (0,3,2)]

[3, 1, 4]

If you loop over a **set**, the order is determined by the set's internal hash table. The order of the output is NOT guaranteed.

In [61]:
[mylist[i] for i in {0,3,2}] 

[3, 4, 1]
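
If the indices happen to be stored in a set and a predictable order is needed, one option is to sort them first. This is a sketch, not the only approach; `index_set` and `picked` are hypothetical names.

```python
mylist = [3, 1, 4, 1]
index_set = {0, 3, 2}

# sorted() returns a list in increasing order, regardless of how the set iterates
picked = [mylist[i] for i in sorted(index_set)]

print(picked)
```

Sorting gives increasing index order only; a specific order such as 0, 3, 2 still needs a list or tuple of indices.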

A **range** object in Python is a built-in iterable that represents a sequence of integers.

Here is our first example of a range object.  In this case, it is used as a quick way to repeat the print statement 5 times.

In [62]:
range(5)

range(0, 5)

In [63]:
for i in range(5):
    print("Hello, world")

Hello, world
Hello, world
Hello, world
Hello, world
Hello, world


The syntax in ``range`` is very similar to the slicing syntax from above: in particular, 
* **The step size comes at the end.**
* **The right endpoint is not included.**

In [64]:
for i in range(2,10,3):
    print(i)

2
5
8
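
Because a range object does not display its contents, a quick way to inspect one is to convert it to a list. The sketch below also checks the endpoint rule in a case where the steps would land exactly on the endpoint; `lands_on_endpoint` is a hypothetical name.

```python
# A range object does not display its contents; converting to a list does
print(list(range(5)))         # start defaults to 0, step defaults to 1
print(list(range(2, 10, 3)))

# The right endpoint is excluded even when the steps would land exactly on it
lands_on_endpoint = list(range(2, 11, 3))
print(lands_on_endpoint)
```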


==================================================

_**<font color = blue>In-class Exercise 1</font>**_: Modify the above for-loop, output from 10 to 3.

In [65]:
# write your code below


==================================================

An expression such as `range(10,3,-1)`, which counts down from 10 and stops before reaching 3, is its own type of object, a `range` object.

In [66]:
myrange = range(10,3,-1)

In [67]:
type(myrange)

range

In [68]:
myrange

range(10, 3, -1)

In [69]:
myrange2 = range(3,1000)

In [70]:
myrange2[0] # the first element

3

In [71]:
myrange2[-1] # the last element, cannot go beyond the ending point

999

Range objects support slicing.  That's not too important; maybe the most interesting thing about slicing in the context of range objects is that it again produces a range object.

In [72]:
# recall that: myrange2 = range(3,1000), and [10:100:4] has the starting point 10, ending point (not included) 100, step size 4
# so we have: myrange2[10]=13, myrange2[100]=103 
myrange2[10:100:4]

range(13, 103, 4)

You can also compute the length of a range object, the same way as you can compute the length of a string, list, tuple, or set.

In [73]:
# There are a total of 998 integers from 3 to 1000 (inclusive). Since the right endpoint is not included, we have 997 elements.
len(myrange2)

997
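
The counting argument in the comment above can be extended to any positive step size. The helper `expected_length` below is hypothetical (it is not part of Python) and assumes `step > 0`; it is only a sketch to compare against `len`.

```python
def expected_length(start, stop, step):
    """Count the values start, start+step, ... that stay below stop (step > 0)."""
    if stop <= start:
        return 0
    # Ceiling division using only integers
    return (stop - start + step - 1) // step

print(expected_length(3, 1000, 1), len(range(3, 1000)))
print(expected_length(2, 10, 3), len(range(2, 10, 3)))
```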

### ints, floats, and bools

==================================================

_**<font color = blue>In-class Exercise 2</font>**_: Execute the following commands.

In [74]:
# 1. What do these two lines mean?
z = 10/2
z == 5

True

In [75]:
# 2. Compare the outputs.
True
true

NameError: name 'true' is not defined

In [76]:
# 3. Check the data type
False
type(False)

bool

==================================================

``bool`` -- Boolean values, which can be either **True** or **False**. Booleans are often used in programming for decision-making and conditional statements.

Since `z == 5` is `True`, can we use `range(z)` instead of `range(5)`? 

In [77]:
range(z)

TypeError: 'float' object cannot be interpreted as an integer

The problem is that `z` was defined as `10/2`. In Python, the division operator `/` always returns a **floating point number** (like a decimal, not an integer), even when the result is a whole number, and range objects can only be created using **integers.**

In [78]:
type(5)

int

In [79]:
type(z)

float
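
Two possible ways around this error are sketched below: convert the float with `int`, or use the floor-division operator `//`, which returns an integer when both operands are integers. The variable `w` is a hypothetical name.

```python
z = 10 / 2       # true division always produces a float, here 5.0
w = 10 // 2      # floor division of two ints produces an int, here 5

print(type(z), type(w))

# Either of these can be passed to range
print(list(range(int(z))))
print(list(range(w)))
```

Note that `int` discards any fractional part, so it is only a safe fix when the float is known to hold a whole number.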

Because computers can only specify decimals (floats, real numbers) to a finite degree of precision, some subtleties are inevitable.  One consequence is that it is **almost never correct to ask if two floating-point numbers are equal.**  Here is an example where two floats are obviously mathematically equal, but Python reports them as being unequal.

In [80]:
(0.1 + 0.1 + 0.1) == 0.3

False

This is not unique to Python:
* If you test it in *MATLAB*, the result will also be **false**;
* If you test it in *Mathematica*, the result will be **true**, because Mathematica's `==` ignores differences in the last few binary digits when comparing approximate numbers.
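
A common alternative to `==` for floats is to ask whether two values are close enough. The sketch below uses `math.isclose` from the standard library, plus a hand-written version with an absolute tolerance of `1e-12`; that tolerance is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.1 + 0.1

# The stored value is very slightly off from 0.3
print(total)

# Exact comparison fails, but a tolerance-based comparison succeeds
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)  # default relative tolerance is 1e-09

# The same idea written by hand with an explicit absolute tolerance
within_tolerance = abs(total - 0.3) < 1e-12

print(exactly_equal, close_enough, within_tolerance)
```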

When you want to express `≠` in Python, you need to use `!=` (in MATLAB, we use `~=` instead).

In [81]:
1 + 1 != 2

False

### dictionaries

A dictionary is one of the most common data types; it provides a powerful tool to associate keys with values. Dictionaries start off looking pretty similar to sets, but are an entirely different data type. Here are some key characteristics:
* **Store a collection of key-value pairs.**
* **Ordered by insertion (guaranteed since Python 3.7), but not indexable by position.**
* **Mutable.**
* **Each key in a dictionary must be unique, and it is used to access its corresponding value.**
* **Dictionaries are defined using curly braces {}, and key-value pairs are separated by colons.**

In [82]:
# This example is provided by GPT-3.5
# Creating a dictionary
my_dict = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
}

# Accessing values using keys
print(my_dict["key1"])  # Output: "value1"
print(my_dict["key2"])  # Output: "value2"


value1
value2


In [83]:
d = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}

In [84]:
type(d)

dict

==================================================

_**<font color = blue>In-class Exercise 3</font>**_: Execute the following commands.

In [85]:
d["C"]

70

In [86]:
d[20]

KeyError: 20

In [87]:
d[0]

KeyError: 0

==================================================

You should use the "key" to access its "value".
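
When it is not known in advance whether a key exists, the `KeyError` can be avoided by checking first. The sketch below uses the `in` operator and the dictionary method `get`; the fallback string `"no such grade"` is a made-up placeholder.

```python
d = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}

# "in" tests whether something is a key (it does not look at the values)
has_c = "C" in d
has_70 = 70 in d

# get returns the value if the key exists, and a fallback otherwise
c_cutoff = d.get("C")
missing = d.get("E", "no such grade")

print(has_c, has_70, c_cutoff, missing)
```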

Here is an example of adding a new key to the dictionary.  Notice how it gets displayed without quotation marks, because `0` is not the same as `"0"` (the first one is an integer, while the second one is a string).

In [88]:
d[0] = [3,1,4,1]

In [89]:
d

{'A': 90, 'B': 80, 'C': 70, 'D': 60, 'F': 0, 0: [3, 1, 4, 1]}

We can also change a value associated to a key using the same kind of syntax.  Notice how the old value gets deleted; there cannot be repeated keys in a dictionary.

In [90]:
d["B"] = 85

In [91]:
d

{'A': 90, 'B': 85, 'C': 70, 'D': 60, 'F': 0, 0: [3, 1, 4, 1]}

Recall from the exercise above that `d["C"]` is how you access the value associated with the key `"C"`.

Here we add a new key, and the value is itself a dictionary.  The two things to notice in this next example are that both `0` and `"0"` show up as keys, and that a dictionary can be a value in a dictionary.

In [92]:
d["0"] = {"hello": "first word", "world": "second word"}

In [93]:
d

{'A': 90,
 'B': 85,
 'C': 70,
 'D': 60,
 'F': 0,
 0: [3, 1, 4, 1],
 '0': {'hello': 'first word', 'world': 'second word'}}

Remember that **lists were not allowed to go in sets, but tuples were allowed.**  It is the same with keys in dictionaries. Values in dictionaries can be anything, but the keys need to be **hashable**.

In [94]:
d[[4,1]] = 1

TypeError: unhashable type: 'list'

But making the tuple `(4,1)` a key in our dictionary works fine.

In [95]:
d[(4,1)] = 1

In [96]:
d

{'A': 90,
 'B': 85,
 'C': 70,
 'D': 60,
 'F': 0,
 0: [3, 1, 4, 1],
 '0': {'hello': 'first word', 'world': 'second word'},
 (4, 1): 1}

Now you may notice that dictionaries are quite similar to sets, and you are right! Their underlying implementations are nearly the same: both are built on top of hash tables, which is what makes lookups fast.
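
Dictionaries can also be used in for-loops. The sketch below assumes a shortened version of the grade dictionary and uses the methods `items`, `keys`, and `values`; the loop variable names are arbitrary.

```python
d = {"A": 90, "B": 85, "C": 70}

# Looping over a dictionary visits its keys
for key in d:
    print(key)

# items() yields (key, value) tuples, unpacked here into two variables
for grade, cutoff in d.items():
    print(grade, cutoff)

# keys and values can each be converted to a list
grade_list = list(d.keys())
cutoff_list = list(d.values())
print(grade_list, cutoff_list)
```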

### Converting from one type to another

In [97]:
mylist = [3,1,4,1]

Creating a tuple out of `mylist` does not actually change `mylist` itself.

In [98]:
tuple(mylist)

(3, 1, 4, 1)

In [99]:
type(mylist)

list

If you actually want to use the tuple you've created, you need to assign it to a variable.

In [100]:
x = tuple(mylist)

In [101]:
x

(3, 1, 4, 1)

You can also convert a tuple to a list:

In [102]:
mytuple = (3,1,4,1)
list(mytuple)

[3, 1, 4, 1]

Many of the objects we have seen so far can be converted into other data types.  For example: string --> list.

In [103]:
s = "Hello world"

In [104]:
list(s) # a list of characters

['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

In [105]:
for x in s:
    print(x)

H
e
l
l
o
 
w
o
r
l
d


Converting a list back to a string is less straightforward:

In [106]:
t = ['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
# Here, join concatenates a list of strings into a single string, with a separator between them. 
# The separator here is nothing ("")
"".join(t) 

'Hello world'

In [107]:
"-".join(t) # Use hyphen as the separator

'H-e-l-l-o- -w-o-r-l-d'
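
Note that `join` only works when every element of the list is already a string. For a list of numbers, one possible approach is to convert each element with `str` first, as sketched below with the hypothetical names `digits` and `digit_strings`.

```python
digits = [3, 1, 4, 1]

# join only accepts strings, so convert each element with str first
digit_strings = [str(n) for n in digits]
joined = "".join(digit_strings)
with_commas = ", ".join(digit_strings)

print(joined)
print(with_commas)
```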

Here is an example where conversion does not work.  This particular list `mylist` cannot be converted to a dictionary.

In [108]:
mylist

[3, 1, 4, 1]

In [109]:
dict(mylist)

TypeError: cannot convert dictionary update sequence element #0 to a sequence

We can convert a list to a dictionary, if we’re a little more careful and make it easy for Python to gather key-value pairs.  

For example, this list of length-2 tuples can be converted.  Python interprets the 0th element in each tuple as the key and the 1st element in each tuple as the value.  (Remember that keys can't be repeated in a dictionary, which is why we only see the key `1` one time. **Each key can only have one value.**)

In [110]:
mylist2 = [(3,"a"),(1,"b"),(1,"c")]

In [111]:
dict(mylist2) # the value to the key "1" has been overwritten

{3: 'a', 1: 'c'}
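
If the keys and values start out in two separate lists, the built-in function `zip` can build the list of pairs. This is a sketch with made-up lists `grades` and `cutoffs`.

```python
grades = ["A", "B", "C"]
cutoffs = [90, 80, 70]

# zip pairs the elements up position by position: ("A", 90), ("B", 80), ...
pairs = list(zip(grades, cutoffs))
lookup = dict(pairs)

print(pairs)
print(lookup)
```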

A surprisingly useful conversion (that is often done automatically) is going from the Boolean values `True` and `False` to the integer values `1` and `0` (respectively).  One reason this is useful is that if you add together the elements in a list or array of Trues and Falses, the result will be exactly the number of Trues.

In [112]:
int(True)

1

In [113]:
int(False)

0

You can also add an integer to a Boolean value, and Python will automatically convert the Boolean value to an integer before doing the addition.  This functionality itself is not very important, but it's very important to remember in general that **`True` corresponds to `1` and that `False` corresponds to `0`.**

In [114]:
4+True

5
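
The counting trick mentioned above can be sketched as follows: build a list of Booleans from a comparison, then add them up with the built-in `sum`. The names `flags` and `count_big` are hypothetical.

```python
mylist = [3, 1, 4, 1]

# Each comparison produces a Boolean
flags = [x > 2 for x in mylist]
print(flags)

# Adding them up treats True as 1 and False as 0, so this counts the Trues
count_big = sum(flags)
print(count_big)
```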

### Timing comparisons

A fundamental advantage of sets over lists and tuples is that you can search in a set (whose elements are hashable) much faster.

In [115]:
10000000

10000000

That integer is pretty hard to read.  Notice that we **cannot** make exponents in Python using the caret symbol `^`; instead you have to use `**`. In Python, `^` means bitwise XOR (you don't need to worry about this).

In [116]:
10^7

13

In [117]:
10**7

10000000

Or you may use the scientific notation `e`. Here, `aeb` means $a \times 10^{b}$.\
`b` could be positive, negative or zero, but it MUST be an integer.

In [118]:
2.5e-2 # You do not need the brackets for negative exponents

0.025

In [119]:
10000000 == 1e7

True
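
Two details are worth a quick check. First, scientific notation produces a float, so it needs converting before use in `range`. Second, Python (3.6 and later) allows underscores inside integer literals as another way to make large numbers readable. The names below are hypothetical.

```python
# Underscores between digits are ignored by Python (version 3.6 and later)
readable = 10_000_000

# Scientific notation always produces a float, even for whole numbers
sci = 1e7

print(readable == 10**7, type(readable))
print(sci == readable, type(sci))

# Convert explicitly when an integer is required, e.g. for range
n = int(sci)
print(len(range(n)))
```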

==================================================

_**<font color = blue>In-class Exercise 4</font>**_: Make a range object from 0 to 10000000, with step size 3. Then convert it into a list, into a tuple, and into a set.

In [120]:
# write your code below


==================================================

Here we use the operator `in` to check if `0` is in the corresponding object.

In [121]:
0 in mylist

False

In [122]:
1 in mylist

True

Here is a special feature of Jupyter notebooks, where we can time how long an operation takes.

In [123]:
%%timeit
0 in myset

9.49 ns ± 0.0436 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


In [124]:
%%timeit
0 in mytuple

19.6 ns ± 0.078 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


In [125]:
%%timeit
0 in mylist

20.3 ns ± 0.0568 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
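
The gap between sets and lists grows with the size of the collection, and it is largest when the value being searched for is absent, since a list must then be scanned from start to finish. The sketch below uses the standard-library `timeit` module (which also works outside Jupyter) on hypothetical test data. The collection size and the number of repetitions are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

# Hypothetical test data: the multiples of 3 below one million
big_list = list(range(0, 10**6, 3))
big_set = set(big_list)

# -1 is absent, so the list must be scanned from start to finish,
# while the set can rule it out with a single hash lookup
list_time = timeit.timeit(lambda: -1 in big_list, number=20)
set_time = timeit.timeit(lambda: -1 in big_set, number=20)

print(f"list: {list_time:.6f} s for 20 searches")
print(f"set:  {set_time:.6f} s for 20 searches")
```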


## Other useful links

Christopher's YouTube video playlist: https://www.youtube.com/watch?v=w3HxawiC-7s&list=PLHfGN68wSbbIuSS-Y1Y5zYDPN59OaggpA