# Introduction

This is a markdown cell. In edit mode, we use plain text to describe how we want the material to be presented. If we want bullets, we use asterisks or numbers:

* first item
* second item
* and so on.

When we "run" this cell, the formatted output is displayed on the web-page.

### Interactive Computing

Now let's take a look at how we work with Python within this environment.

In [2]:
the_world_is_flat = True

if the_world_is_flat:
    print("Be careful not to fall off!")

Be careful not to fall off!


In [3]:
2 + 2

4

In [4]:
17 / 3

5.666666666666667

In [5]:
w = 4.0
h = w * 2.3

There are "magic" commands in a notebook that allow you to perform useful actions. For instance, the following command lists the objects you have created in this session.

In [6]:
%whos

Variable            Type     Data/Info
--------------------------------------
h                   float    9.2
the_world_is_flat   bool     True
w                   float    4.0


You can explore other useful magic commands using the following commands. `%quickref` displays a quick reference card for them:

In [None]:
%quickref

In [None]:
%hist

In [None]:
%rerun

In [None]:
%recall

In [None]:
5.666666666666667

In [None]:
113.09733552923255

This markdown cell contains LaTeX code:
\begin{equation}
x = 1
\end{equation}

In [None]:
%%latex

\[
1 = \frac{1}{2} + \frac{1}{2} = \frac{3}{4} + \frac{1}{4}
\]

## Python Data Structures

### Slice Operator

The `:` operator is known as the *slice* operator. It is useful for extracting subsequences from list-like objects.

In Python, strings like the one below are like "lists" of individual characters.

In [None]:
word = 'Python'
word[0] # first element

In [None]:
word[-1] # last element (useful when you do not know the total number of elements)

In [None]:
len(word)

In [None]:
word[0::2] # starting from the first element till the last element, every 2nd position

Here is a pictorial representation of how slices should be envisioned:

![python_slice.png](attachment:python_slice.png)

The negative indices on the second row are an alternate, right-to-left way we can index strings.
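As a short sketch of the picture above, the cell below slices `word` with both positive and negative indices. The variable names other than `word` are illustrative only.

```python
word = 'Python'

# A slice is written word[start:stop:step]; the "stop" position is excluded.
first_two = word[:2]      # characters at positions 0 and 1
last_two = word[-2:]      # the final two characters
middle = word[1:-1]       # everything except the first and last character
backwards = word[::-1]    # a negative step walks right-to-left

print(first_two, last_two, middle, backwards)
```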

### Lists, Tuples and Dictionaries

The main types of containers in Python are:
* lists, which are defined with `[ ]`. Lists are mutable.
* tuples, which are defined with `( )`. Tuples are immutable.
* dictionaries, which are defined with `{ }`. Dictionaries have keys and values. They are also mutable.

Here's what we mean by mutable. The entries in a list (or dictionary) can be overwritten, but not those in a tuple:

In [None]:
x = [1, 3, 5, 7, 8, 9, 10]

# This is OK, because "x" is a list.
x[3] = 17     
print(x)

In [None]:
# This is not OK, because x_tuple is a tuple
x_tuple = (1, 3, 5, 6, 8, 9, 10)
x_tuple[3] = 17 

Lists, tuples and dictionaries are *iterables*. This means that they can be *iterated over* in `for` loops, which visit their elements one at a time.

In [None]:
for ele in x:
    print(ele)

The `range` function allows us to create an iterable conveniently...

In [None]:
x_range = range(1, 11, 2)
x_range

In [None]:
for ele in x_range:
    print(ele)

In [None]:
for n in x_range:
    print(f'{n} squared: {n ** 2}, cubed: {n ** 3}.')

...which we can convert to a list to see all of its elements.

In [None]:
list(x_range)

In `print`, we have used an f-string (short for a formatted string literal).
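An f-string can also carry a format specification after a colon inside the braces. The following sketch shows a few common ones. The variable names are illustrative.

```python
ratio = 17 / 3

two_places = f'{ratio:.2f}'   # fixed-point, two decimal places
padded = f'{ratio:10.3f}'     # right-aligned in a field of width 10
percent = f'{0.256:.1%}'      # multiply by 100 and append a percent sign

print(two_places)
print(padded)
print(percent)
```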

Dictionaries contain keys and their values.

In [None]:
comics = {'marvel': 1, 'dc': 0}
comics.keys()
# list(comics.keys())

In [None]:
?comics.keys

In [None]:
comics.items()
# list(comics.items())

In [None]:
comics.values()
# list(comics.values())

We index a dictionary object by a key to retrieve the associated value. If the key does not exist, a `KeyError` is raised.

In [None]:
comics['marvel']
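When a key might be missing, a few standard options are available. This is a small sketch; the key `'image'` is a made-up example that is deliberately absent from `comics`.

```python
comics = {'marvel': 1, 'dc': 0}

# Test for membership before indexing.
has_image = 'image' in comics

# get() returns a default value instead of raising an error.
image_value = comics.get('image', -1)

# Indexing with a missing key raises KeyError.
try:
    comics['image']
except KeyError as err:
    missing_key = err.args[0]

print(has_image, image_value, missing_key)
```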

Within an f-string, the `!r` conversion inserts the `repr()` of a value, so a string is shown with its quotes:

In [None]:
key = 'test'
print(f'Here\'s the {key!r}')
print(f'Here\'s the {key}')

In [None]:
repr(key)

We then iterate over the items of a dictionary, which are 2-tuples of key-value pairs. Here, we unpack them immediately in the `for` statement:

In [None]:
for key, value in comics.items():
    # Note that we use single-quotes inside a double-quoted
    # string. We could instead use double-quotes inside a
    # single-quoted string.
    print(f"Here's the key '{key}',")
    print(f"and the associated value '{value}'.")
    # Insert a newline.
    print()

Notice that unpacking the key-value 2-tuple within the `for` statement of a loop
is a bit more convenient than doing it in the loop's body.

## String Methods

### String Formatting

Text data in Python is handled with `str`-type objects. Like tuples, strings are iterable and immutable. String operations in Python are fast. There is a comprehensive suite of methods for dealing with strings, from concatenating to capitalising to searching through them.

In [None]:
test_str = 'Where in the World is Carmen San Diego?'

Remember that a string is immutable, so this will not work:

In [None]:
test_str[5] = 'z'

In [None]:
count = 0
for character in test_str:
    if character.isupper():
        count += 1
print(f'There were {count} upper-case characters in the sentence.')

In [None]:
test_str.lower()

To join strings, we can use the `+` operator, the `str.join()` method, or, if they are string literals in the same expression, we can just place them next to each other separated by whitespace.

In [None]:
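x = "Where shall "
y = "we "
# Adjacent string literals are concatenated automatically.
# This does not work with variables, so x and y need the + operator.
new_str = "Where shall " "we " "go today?"
print(new_str)
print(x + y + "go today?")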

In [None]:
' '.join(["Where", "shall", "we", "go", "today?"])

In [None]:
"Where " + "shall" + " we go" + " today?"

To find simple patterns, we may turn to the `find`, `replace`, `startswith` and `endswith` methods.

In [None]:
test_str.find('Carmen')

In [None]:
test_str[test_str.find('Carmen'):]

In [None]:
test_str.replace('Carmen', 'John')
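The `startswith` and `endswith` methods return booleans, while `find` returns `-1` when the substring is absent. The sketch below illustrates both points. The search term `'Waldo'` is a made-up example that does not occur in `test_str`.

```python
test_str = 'Where in the World is Carmen San Diego?'

starts_with_where = test_str.startswith('Where')
is_question = test_str.endswith('?')

# find() returns -1 for a missing substring, so check the result
# before using it as a slice index.
position = test_str.find('Waldo')
if position == -1:
    tail = ''
else:
    tail = test_str[position:]

print(starts_with_where, is_question, position, repr(tail))
```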

For more complex patterns, we might have to turn to *regular expressions*.

### Regular Expressions

Regular expressions are a mini-language for specifying a search pattern. Regexes are not specific to Python; similar syntax is used in Perl, Ruby, Awk, and R. They can be difficult to decipher at first glance, so if you use them, I recommend describing each pattern in a comment, so that you will know what you were doing when you come back to it the next day.

One of the best write-ups for learning about regexes is the Regular Expression HOWTO in the Python documentation. I recommend going through it
[here](https://docs.python.org/3/howto/regex.html).

In [None]:
# The re library allows us to execute common regex
# operations in Python.
import re

p = re.compile('a[bcd]*b')

The following are meta-characters when creating regular expressions:

. ^ $ * + ? { } [ ] \ | ( )

In the expression `a[bcd]*b` above, the square brackets denote a character class (one of the listed characters must match at that position), while the asterisk means that the preceding element may repeat zero or more times. The asterisk is a greedy matcher: it tries to match as many times as possible.
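To see what "greedy" means in practice, the sketch below compares `*` with its non-greedy counterpart `*?`. The `*?` qualifier is part of Python's `re` syntax but is not used elsewhere in this notebook.

```python
import re

text = 'abcbd'

# Greedy: [bcd]* grabs as many characters as it can, then backtracks
# just far enough for the final "b" to match.
greedy = re.match(r'a[bcd]*b', text).group()

# Non-greedy: [bcd]*? takes as few characters as possible.
lazy = re.match(r'a[bcd]*?b', text).group()

print(greedy, lazy)
```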

After we create a regular expression object, we can match it against a string:

In [None]:
out = p.match('abcbd')

In [None]:
out.span()

In [None]:
'abcbd'[0:4]

In [None]:
out.group()

The returned match object contains information about the match:
* `out.group()` returns the matched string
* `out.start()` and `out.end()` return the start and end indices of the match
* `out.span()` returns a tuple with the start and end indices

In [None]:
out.start()

In [None]:
out.end()

`p.match()` only matches at the beginning of a string. If we wish to match anywhere, we use `p.search()`.

In [None]:
# p.match/search/findall('abtyroiutrabb')
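The difference between the three methods can be sketched as follows. The string `'xyzabcb'` is a made-up example chosen so that the pattern does not match at position 0.

```python
import re

p = re.compile('a[bcd]*b')
text = 'xyzabcb'

at_start = p.match(text)     # None: the string does not begin with "a"
anywhere = p.search(text)    # first match found anywhere in the string

# findall() returns every non-overlapping match as a list of strings.
all_matches = p.findall('abtyroiutrabb')

print(at_start, anywhere.span(), all_matches)
```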

In [None]:
# p.match('abcbd').start()
p.match('abcbd').group()

`*` matches the preceding element (here, the character class `[bcd]`) 0 or more times, so this will also match:

In [None]:
p.match('ab').group()

The `+` operator matches at least once, so the following match returns `None`.

In [None]:
p2 = re.compile('a[bcd]+b')
p2.match('ab') is None

Suppose we have a list of email addresses and wish to find only those that are nus.edu.sg addresses:

In [None]:
email_adds = ['e1234@nus.edu.sg', 'a2305@nus.edu.sg', 'eric@gmail.com', 
              'george@yahoo.com', 'e1234@ntu.edu.sg', 'nus.edu.sg@ntu.edu.sg']
# Escape the dots: an unescaped "." matches any character.
re1 = re.compile(r'nus\.edu\.sg')

In [None]:
re1.match(email_adds[5])

In [None]:
ids = []
for x, y in enumerate(email_adds):
    if re1.match(y) is not None:
        ids.append(x)
ids

That did not work because `match` only matches at the beginning of a string, so it found just the unusual address `nus.edu.sg@ntu.edu.sg`.

In [None]:
re1.search(email_adds[0])

In [None]:
ids = []
for x, y in enumerate(email_adds):
    if re1.search(y) is not None:
        ids.append(x)

In [None]:
ids

In [None]:
emails_that_match = []
for x in ids:
    emails_that_match.append(email_adds[x])

In [None]:
emails_that_match

In [None]:
[email_adds[x] for x in ids]

That did not work either, because the unusual address `nus.edu.sg@ntu.edu.sg` also contains `nus.edu.sg`.

We need to pick up only those where nus.edu.sg is at the end of the string.

In [None]:
# "$" anchors the pattern at the end of the string; the dots are escaped.
re2 = re.compile(r'nus\.edu\.sg$')
ids = []
for x, y in enumerate(email_adds):
    if re2.search(y) is not None:
        ids.append(x)
[email_adds[x] for x in ids]

In [None]:
[email_adds[x] for x, y in enumerate(email_adds) if re2.search(y) is not None]
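Two details are worth a quick sketch here. An unescaped `.` in a pattern matches any character, and for a fixed suffix like this one, `str.endswith` is a simpler alternative to a regex. The address in `fake` is a made-up string used only to show the difference.

```python
import re

email_adds = ['e1234@nus.edu.sg', 'a2305@nus.edu.sg', 'eric@gmail.com',
              'george@yahoo.com', 'e1234@ntu.edu.sg', 'nus.edu.sg@ntu.edu.sg']

loose = re.compile('nus.edu.sg$')        # "." matches any character
strict = re.compile(r'@nus\.edu\.sg$')   # literal dots, preceded by "@"

fake = 'e1234@nusXeduXsg'                # hypothetical, not a real domain
loose_hit = loose.search(fake) is not None
strict_hit = strict.search(fake) is not None

# No regex needed when the suffix is a fixed string.
nus_adds = [e for e in email_adds if e.endswith('@nus.edu.sg')]

print(loose_hit, strict_hit, nus_adds)
```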

### Functions

Functions encapsulate a set of instructions to be carried out on inputs, and can return an output.
To define a function in Python, we use the `def` keyword.

In [None]:
def test_function(x):
    print('You typed', x)
    
test_function('test')
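`test_function` only prints. As a small sketch of a function that returns a value, consider the following. The name `rectangle_area` and its default argument are hypothetical, chosen to mirror the earlier `w` and `h` example.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height defaults to 1.0."""
    return width * height

# The returned value can be stored and reused, unlike printed output.
area = rectangle_area(4.0, 2.3)
default_area = rectangle_area(4.0)

print(area, default_area)
```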

Sometimes, we do not need the permanence of a named, defined function. In those situations, where we just need a one-line function once, we can use a lambda function.

In [None]:
y = ['a', 'b', 'd']

# Here, we pass an anonymous lambda function to map, without
# binding it to a name. map is lazy: the function is applied
# to each element of "y" only when the result is consumed.
map(lambda x: 'You typed ' + x, y)

The `map` function returns the output in an iterator, which we can convert to a list:

In [None]:
list(map(lambda x: 'You typed ' + x, y))

We encourage you to use the `map` function in place of explicit `for` loops where you can:

In [None]:
list(map(re2.search, email_adds))

Another quick way to iterate and produce output in list form is to use a *list comprehension*.

In [None]:
['You typed ' + x for x in y]

What if we wished to match only the userid in each of the email addresses from earlier? In other words, we wish to match only up to the `@` symbol in each address.

Let's do it in two ways: Using the `split()` method, and then using a look-ahead assertion.

In [None]:
import pprint

In [None]:
pp = pprint.PrettyPrinter(indent=4)

In [None]:
pp.pprint(email_adds)

In [None]:
p1 = re.compile('@')
userids = [p1.split(x)[0] for x in email_adds]
#userids = [x.split('@')[0] for x in email_adds]

In [None]:
userids

In [None]:
p2 = re.compile('^.*(?=@)')
userids = [p2.match(x).group() for x in email_adds]

In [None]:
userids
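The look-ahead assertion `(?=@)` requires an `@` to follow, but does not include it in the match. The sketch below contrasts it with two alternatives; the capturing-group pattern is an illustration and is not used elsewhere in this notebook.

```python
import re

address = 'eric@gmail.com'

# Look-ahead: "@" must come next, but is not consumed.
with_lookahead = re.match(r'^.*(?=@)', address).group()

# Without the look-ahead, "@" becomes part of the match.
without_lookahead = re.match(r'^.*@', address).group()

# A capturing group is another way to keep only the userid.
with_group = re.match(r'^([^@]+)@', address).group(1)

print(with_lookahead, without_lookahead, with_group)
```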

### Modules and Packages

You may have noticed we "imported" the `re` module earlier. What did we mean by that?

* A Python module is a file containing Python definitions (of functions and constants) and statements.
* Instead of re-typing functions every time, we can simply load the module. We would then have access to the functions.
* We access objects within the module using the "dot" notation.
* There are several modules that ship with the default Python installation.
* Packages are collections of modules.
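The points above can be sketched with modules from the standard library. The alias `stats` is an arbitrary choice for this example.

```python
import math                   # load the whole module
from math import sqrt         # import a single name from a module
import statistics as stats    # import a module under a shorter alias

root = math.sqrt(16)          # "dot" notation: module.function
same_root = sqrt(16)          # imported name, no prefix needed
average = stats.mean([1, 2, 3])

print(root, same_root, average)
```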

In [None]:
import numpy as np
import pandas as pd

In [None]:
np.max([1, 2, 3])

### Object-Oriented Programming

We can create new object-types in Python, which contain their own data and methods.
In the example below, we create the `Circle` class. This class definition is a blueprint
that will allow us to create `Circle`-objects. These objects need to be initialized
with a radius. After initialization, we can query how big a given `Circle`-object is.

In [None]:
from math import pi

class Circle:
    """ A simple class definition 
    
    c0 = Circle()
    c0.radius
    
    """
    # If no radius is specified, default to 1.
    def __init__(self, radius=1.0):
        self.radius = radius
    
    def set_radius(self, new_radius):
        """ Set radius of circle"""
        self.radius = new_radius
    
    def get_area(self):
        """ Compute area of circle"""
        return pi * (self.radius ** 2)

In [None]:
c1 = Circle(3.2)
c1.get_area()

In [None]:
c1.set_radius(6)
c1.get_area()
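Each object created from a class carries its own data. The sketch below repeats a condensed copy of the `Circle` class so that it runs on its own, then shows that changing one instance leaves another untouched. The names `small` and `large` are illustrative.

```python
from math import pi

class Circle:
    """Condensed copy of the Circle class defined above."""

    def __init__(self, radius=1.0):
        self.radius = radius

    def set_radius(self, new_radius):
        self.radius = new_radius

    def get_area(self):
        return pi * (self.radius ** 2)

small = Circle()      # uses the default radius of 1.0
large = Circle(6)

# Changing one instance does not affect the other.
small.set_radius(2)

print(small.radius, small.get_area())
print(large.radius, large.get_area())
```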