# Introduction

This is a markdown cell. In edit mode, we use plain text to describe how we want the material to be presented. If we want bullets, we use asterisks or numbers:

* first item
* second item
* and so on.

When we "run" this cell, the formatted output is displayed on the web-page.

### Interactive Computing

Now let's take a look at how we work with Python within this environment.

In [13]:
the_world_is_flat = True

if the_world_is_flat:
    print("Be careful not to fall off!")

Be careful not to fall off!


In [14]:
2 + 2

4

In [15]:
17 / 3

5.666666666666667

In [16]:
w = 4.0
h = w * 2.3

A notebook provides "magic" commands that allow you to perform useful actions. For instance, the following command lists the objects you have created in this session.

In [17]:
%whos

Variable            Type      Data/Info
---------------------------------------
cond                int       5646375237
el                  int       5646375237
h                   float     9.2
np                  module    <module 'numpy' from '/ho<...>kages/numpy/__init__.py'>
string              str       thankyou
the_world_is_flat   bool      True
w                   float     4.0


If you only need the variable names, without the full details, use `%who`.


In [18]:
%who

cond	 el	 h	 np	 string	 the_world_is_flat	 w	 


You can explore other useful magic commands using the following command:

In [19]:
%quickref


IPython -- An enhanced Interactive Python - Quick Reference Card

obj?, obj??      : Get help, or more help for object (also works as
                   ?obj, ??obj).
?foo.*abc*       : List names in 'foo' containing 'abc' in them.
%magic           : Information about IPython's 'magic' % functions.

Magic functions are prefixed by % or %%, and typically take their arguments
without parentheses, quotes or even commas for convenience.  Line magics take a
single % and cell magics are prefixed with two %%.

Example magic function calls:

%alias d ls -F   : 'd' is now an alias for 'ls -F'
alias d ls -F    : Works if 'alias' not a python name
alist = %alias   : Get list of aliases to 'alist'
cd /usr/share    : Obvious. cd -<tab> to choose from visited dirs.
%cd??            : See help AND source for magic %cd
%timeit x=10     : time the 'x=10' statement with high precision.
%%timeit x=2**100
x**100           : time 'x**100' with a setup of 'x=2**100'; setup code is not
                   co

In [20]:
%hist

import numpy as np
import pandas as pd
print("print something 'hi'")
print('i said "hi"')
string = "thank" + 'you'
string
print(f'{el}')
el = 123
print(f'{el}')
el = 123
print(f'''{el}''')
el = 5646375237
print(f'''{el}''')
el = 5646375237
print(f'''SELECT *{el}''')
el = 5646375237
print(f'''SELECT *
FROM table{el}''')
cond = 5646375237
print(f'''SELECT *
FROM table
WHERE {cond}''')
out
the_world_is_flat = True

if the_world_is_flat:
    print("Be careful not to fall off!")
2 + 2
17 / 3
w = 4.0
h = w * 2.3
%whos
%who
%quickref
%hist


In [21]:
%rerun 2  # re-run input cell n (here n = 2): shows the code, then its output

=== Executing: ===
print("print something 'hi'")
=== Output: ===
print something 'hi'


In [22]:
%recall 2  # recall input cell n (here n = 2): places its code in the next cell without running it

In [None]:
print("print something 'hi'")

In [23]:
the_world_is_flat = True

if the_world_is_flat:
    print("Be careful not to fall off!")

Be careful not to fall off!


In [24]:
the_world_is_flat = True

if the_world_is_flat:
    print("Be careful not to fall off!")

Be careful not to fall off!


In [25]:
5.666666666666667

5.666666666666667

In [26]:
5.666666666666667

5.666666666666667

In [27]:
113.09733552923255

113.09733552923255

This markdown cell contains LaTeX code:
\begin{equation}
x = 1
\end{equation}

In [28]:
%%latex

\[
1 = \frac{1}{2} + \frac{1}{2} = \frac{3}{4} + \frac{1}{4}
\]

<IPython.core.display.Latex object>

## Python Data Structures

### Slice Operator

The `:` operator is known as the *slice* operator. It is useful for extracting sequences from list-like objects.

In Python, strings like the one below are like "lists" of individual characters.

In [29]:
word = 'Python'
word[0] # first element

'P'

In [30]:
word[-1] # last element (useful when you do not know the total number of elements)

'n'

In [31]:
len(word)

6

In [32]:
word[0::2] # starting from the first element till the last element, every 2nd position

'Pto'

Here is a pictorial representation of how slices should be envisioned:

![python_slice.png](attachment:python_slice.png)

The negative indices on the second row are an alternate, right-to-left way we can index strings.
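As a small sketch of how positive and negative indices combine in slices (assuming the picture follows the usual convention that indices point between characters), the same `word` can be cut up as follows:

```python
word = 'Python'

# word[a:b] runs from position a up to, but not including, position b.
first_two = word[0:2]        # 'Py'
last_two = word[-2:]         # 'on'
middle = word[2:-2]          # 'th'
reversed_word = word[::-1]   # 'nohtyP' (a step of -1 walks right to left)

print(first_two, last_two, middle, reversed_word)
```

Omitting the start or the end of a slice means "from the beginning" or "to the end" respectively.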

### Lists, Tuples and Dictionaries

The main types of containers in Python are 
* lists, which are defined with `[ ]`. Lists are mutable.
* tuples, which are defined with `( )`. Tuples are immutable.
* dictionaries, which are defined with `{ }`. Dictionaries have keys and values. They are also mutable.

Here's what we mean by mutable. The entries in a list (or dictionary) can be overwritten, but not those in a tuple:

In [33]:
x = [1, 3, 5, 7, 8, 9, 10]

# This is OK, because "x" is a list.
x[3] = 17     
print(x)

[1, 3, 5, 17, 8, 9, 10]


In [34]:
# This is not OK, because x_tuple is a tuple
x_tuple = (1, 3, 5, 6, 8, 9, 10)
x_tuple[3] = 17 

TypeError: 'tuple' object does not support item assignment

Lists, tuples and dictionaries are *iterables*. This means that they can be *iterated over* in `for` loops. This is a convenient concept in Python.

In [35]:
for ele in x:
    print(ele)

1
3
5
17
8
9
10


The `range` function allows us to create an iterable conveniently...

In [36]:
x_range = range(1, 11, 2)
x_range

range(1, 11, 2)

In [37]:
for ele in x_range:
    print(ele)

1
3
5
7
9


In [38]:
for x in x_range:
    print(f'{x} squared: {x ** 2}, cubed: {x ** 3}.')

1 squared: 1, cubed: 1.
3 squared: 9, cubed: 27.
5 squared: 25, cubed: 125.
7 squared: 49, cubed: 343.
9 squared: 81, cubed: 729.


...which we can convert to a list to see all of its elements.

In [39]:
list(x_range)

[1, 3, 5, 7, 9]

In `print`, we have used an f-string (short for a formatted string literal).
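A brief sketch of what else an f-string can do may help here. The variable names and values below are made up for illustration:

```python
x = 7
ratio = 17 / 3

# Any expression can appear inside the braces.
line1 = f'{x} squared: {x ** 2}'
# A format specification after ":" controls the presentation.
line2 = f'{ratio:.2f}'   # two decimal places
line3 = f'{x:>5}'        # right-aligned in a field of width 5

print(line1)
print(line2)
print(repr(line3))
```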

Dictionaries contain keys and their values.

In [40]:
comics = {'marvel': 1, 'dc': 0}
comics.keys()
# list(comics.keys())

dict_keys(['marvel', 'dc'])

In [41]:
?comics.keys

Docstring: D.keys() -> a set-like object providing a view on D's keys
Type:      builtin_function_or_method

In [42]:
comics.items()
# list(comics.items())

dict_items([('marvel', 1), ('dc', 0)])

In [43]:
comics.values()
# list(comics.values())

dict_values([1, 0])

We index a dictionary by a key to retrieve the associated value, if the key exists.

In [44]:
comics['marvel']

1
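What happens when the key does not exist can be sketched as follows. The key `'image'` is a hypothetical example that is not in the dictionary:

```python
comics = {'marvel': 1, 'dc': 0}

# Indexing with a missing key raises KeyError.
try:
    comics['image']
except KeyError as err:
    print(f'Missing key: {err}')

# dict.get returns None (or a chosen default) instead of raising.
missing = comics.get('image')
missing_with_default = comics.get('image', -1)

# The "in" operator tests for a key before indexing.
has_dc = 'dc' in comics

print(missing, missing_with_default, has_dc)
```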

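We can also iterate over the items of a dictionary, which are 2-tuples of key-value pairs. Before doing so, note the `!r` conversion in an f-string: it inserts the `repr()` of a value, so a string appears with its quotes. The loop after that unpacks each item directly in the `for` statement: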

In [45]:
key = 'test'
print(f'Here\'s the {key!r}')
print(f'Here\'s the {key}')

Here's the 'test'
Here's the test


In [46]:
key.__repr__()

"'test'"

In [47]:
for key, value in comics.items():
    # Note that we use single-quotes inside a double-quoted
    # string. We could instead use double-quotes inside a
    # single-quoted string.
    print(f"Here's the key '{key}',")
    print(f"and the associated item '{value}'.")
    # Insert a newline.
    print()

Here's the key 'marvel',
and the associated item '1'.

Here's the key 'dc',
and the associated item '0'.



Notice that splitting the item 2-tuple within the `for` statement of a loop
is a bit more convenient than doing it in the loop's body.
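To make the comparison concrete, here is a minimal sketch of both styles side by side, reusing the `comics` dictionary from above. The list names are only for illustration:

```python
comics = {'marvel': 1, 'dc': 0}

# Unpacking in the loop body: each item arrives as one 2-tuple.
in_body = []
for item in comics.items():
    key, value = item
    in_body.append(f'{key}={value}')

# Unpacking in the for statement: same result, one line fewer.
in_header = []
for key, value in comics.items():
    in_header.append(f'{key}={value}')

print(in_body == in_header)
```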

## String Methods

### String Formatting

Text data in Python is handled with `str`-type objects. Like tuples, strings are iterable and immutable. String operations in Python are fast. There is a comprehensive suite of methods for dealing with strings, from concatenating to capitalising to searching through them.

In [48]:
test_str = 'Where in the World is Carmen San Diego?'

Remember that a string is immutable so this will not work:

In [49]:
test_str[5] = 'z'

TypeError: 'str' object does not support item assignment

In [50]:
count = 0
for character in test_str:
    if character.isupper():
        count += 1
print(f'There were {count} upper-case characters in the sentence.')

There were 5 upper-case characters in the sentence.


In [51]:
test_str.lower()

'where in the world is carmen san diego?'

To join strings, we can use the `+` operator, the `str.join()` method, or, if they are string literals in the same expression, we can just place them next to each other separated by whitespace.

In [52]:
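x = "Where shall "
y = "we "
# Adjacent string literals are joined automatically. This works for
# literals only, so the variables x and y would need "+" instead.
new_str = "Where shall " "we " "go today?"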
print(new_str)

Where shall we go today?


In [53]:
' '.join(["Where", "shall", "we", "go", "today?"])

'Where shall we go today?'

In [54]:
"Where " + "shall" + " we go" + " today?"

'Where shall we go today?'

To find simple patterns, we may turn to the `find`, `replace`, `startswith` and `endswith` methods.

In [55]:
test_str.find('Carmen')

22

In [56]:
test_str[test_str.find('Carmen'):]

'Carmen San Diego?'

In [57]:
test_str.replace('Carmen', 'John')

'Where in the World is John San Diego?'
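The `startswith` and `endswith` methods are not shown above, so here is a short sketch using the same `test_str`. The substring `'Waldo'` is a made-up example of text that is absent:

```python
test_str = 'Where in the World is Carmen San Diego?'

starts = test_str.startswith('Where')      # True
ends = test_str.endswith('Diego?')         # True
ends_without_q = test_str.endswith('Diego')  # False: the string ends with "?"

# find returns -1 when the substring is absent.
not_found = test_str.find('Waldo')

print(starts, ends, ends_without_q, not_found)
```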

For more complex patterns, we might have to turn to *regular expressions*.

### Regular Expressions

Regular expressions are a mini-language for specifying a search pattern. Regexes are not specific to Python; similar syntax is used in Perl, Ruby, Awk, and R. They can be difficult to decipher at first glance, so if you use them, I recommend describing each one in a comment, so that you will know what you were doing when you come back to it the next day.

One of the best write-ups to learn about regexes is from the Python HOWTOs page. I recommend going through it 
[here](https://docs.python.org/3/howto/regex.html).

In [None]:
print()

In [58]:
# The re library allows us to execute common regex
# operations in Python.
import re

p = re.compile('a[bcd]*b')

The following are meta-characters when creating regular expressions:

. ^ $ * + ? { } [ ] \ | ( )

In the expression `a[bcd]*b` above, the square brackets denote a character class (any one of the listed characters may match), while the asterisk means that the preceding element may repeat zero or more times. The asterisk is a greedy matcher: it tries to match as many repetitions as possible.

After we create a regular expression object, we can match it against a string:

In [59]:
out = p.match('abcbd')

In [60]:
out

<re.Match object; span=(0, 4), match='abcb'>

In [61]:
out.span()

(0, 4)

In [62]:
'abcbd'[0:4]

'abcb'

In [63]:
out.group()

'abcb'

The returned match object contains information about the match:
* `out.group()` returns the matched string
* `out.start()` and `out.end()` return the start and end indices of the match
* `out.span()` returns a tuple with the start and end indices

In [68]:
out.start()

0

In [69]:
out.end()

4

Using `p.match` matches at the beginning of a string. If we wish to match anywhere, we use `p.search()`.

In [70]:
# p.match/search/findall('abtyroiutrabb')
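The difference between `match`, `search` and `findall` can be sketched on a made-up string in which the pattern occurs twice, but not at the beginning. The string `text` is an assumption chosen for illustration:

```python
import re

p = re.compile('a[bcd]*b')
text = 'xxabcb--abb'

at_start = p.match(text)      # None: the string does not start with a match
anywhere = p.search(text)     # first match anywhere in the string
all_matches = p.findall(text) # every non-overlapping match, as strings

print(at_start)
print(anywhere.group(), anywhere.span())
print(all_matches)
```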

In [71]:
# p.match('abcbd').start()
p.match('abcbd').group()

'abcb'

`*` matches the preceding element (here, the character class) zero or more times, so this will also match:

In [72]:
p.match('ab').group()

'ab'

The `+` operator requires at least one repetition, so the following match returns `None`.

In [73]:
p2 = re.compile('a[bcd]+b')
p2.match('ab') is None

True

Suppose we have a list of email addresses and wish to find only those that are nus.edu.sg addresses:

In [66]:
email_adds = ['e1234@nus.edu.sg', 'a2305@nus.edu.sg', 'eric@gmail.com', 
              'george@yahoo.com', 'e1234@ntu.edu.sg', 'nus.edu.sg@ntu.edu.sg']
# Raw string with escaped dots, so that "\." matches a literal dot only.
re1 = re.compile(r'[1-9]*nus\.edu\.sg')

In [68]:
re1.match(email_adds[0])

In [76]:
ids = []
for x, y in enumerate(email_adds):
    if re1.match(y) is not None:
        ids.append(x)
ids

[5]

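That did not work because `match` only looks for a match at the beginning of a string, so only the one address that starts with `nus.edu.sg` was found. To match anywhere in the string, we use `search`: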

In [77]:
re1.search(email_adds[0])

<re.Match object; span=(6, 16), match='nus.edu.sg'>

In [78]:
ids = []
for x, y in enumerate(email_adds):
    if re1.search(y) is not None:
        ids.append(x)

In [79]:
ids

[0, 1, 5]

In [80]:
emails_that_match = []
for x in ids:
    emails_that_match.append(email_adds[x])

In [81]:
emails_that_match

['e1234@nus.edu.sg', 'a2305@nus.edu.sg', 'nus.edu.sg@ntu.edu.sg']

In [82]:
[email_adds[x] for x in ids]

['e1234@nus.edu.sg', 'a2305@nus.edu.sg', 'nus.edu.sg@ntu.edu.sg']

That did not work because of the funny address nus.edu.sg@ntu.edu.sg. 

We need to pick up only those where nus.edu.sg is at the end of the string.

In [83]:
# "$" anchors the pattern to the end of the string; "\." is a literal dot.
re2 = re.compile(r'nus\.edu\.sg$')
ids = []
for x, y in enumerate(email_adds):
    if re2.search(y) is not None:
        ids.append(x)
[email_adds[x] for x in ids]

['e1234@nus.edu.sg', 'a2305@nus.edu.sg']

In [84]:
[email_adds[x] for x, y in enumerate(email_adds) if re2.search(y) is not None]

['e1234@nus.edu.sg', 'a2305@nus.edu.sg']
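One detail worth keeping in mind: in a regular expression, an unescaped `.` matches any character, and a pattern anchored only at the end still accepts longer domain names. The sketch below compares a loose and a stricter pattern. The addresses in `candidates` are invented for illustration, and the stricter pattern is one possible choice rather than the only correct one:

```python
import re

loose = re.compile('nus.edu.sg$')        # "." matches any character
strict = re.compile(r'@nus\.edu\.sg$')   # literal dots, domain must follow "@"

# Hypothetical addresses: one genuine, two look-alikes.
candidates = ['e1234@nus.edu.sg', 'bob@nusxedu.sg', 'amy@bonus.edu.sg']

loose_hits = [c for c in candidates if loose.search(c)]
strict_hits = [c for c in candidates if strict.search(c)]

print(loose_hits)
print(strict_hits)
```

The loose pattern accepts all three addresses, while the stricter one keeps only the first.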

### Functions

Functions encapsulate a set of instructions that are carried out on inputs and can produce an output.
To define a function in Python, we use the `def` keyword.

In [85]:
def test_function(x):
    print('You typed', x)
    
test_function('test')

You typed test


Sometimes, we do not need the permanence of a named, defined function. In those situations, where we just need a one-line function once, we can use a lambda function.

In [86]:
y = ['a', 'b', 'd']

# Here, we apply the lambda function to "y", without
# storing it in memory after the map operation has completed.
map(lambda x: 'You typed ' + x, y)

<map at 0x7f43e578db40>

The `map` function returns the output in an iterator, which we can convert to a list:

In [87]:
list(map(lambda x: 'You typed ' + x, y))

['You typed a', 'You typed b', 'You typed d']

We encourage you to use the `map` function in place of explicit `for` loops where possible:

In [88]:
list(map(re2.search, email_adds))

[<re.Match object; span=(6, 16), match='nus.edu.sg'>,
 <re.Match object; span=(6, 16), match='nus.edu.sg'>,
 None,
 None,
 None,
 None]

Another quick way to iterate and produce output in list form is to use *list comprehension*.

In [89]:
['You typed ' + x for x in y]

['You typed a', 'You typed b', 'You typed d']
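To see that these approaches are interchangeable, here is a small sketch that builds the same list three ways and then adds a filter to the comprehension. The variable names are chosen only for this example:

```python
y = ['a', 'b', 'd']

# 1. Explicit for loop.
with_loop = []
for x in y:
    with_loop.append('You typed ' + x)

# 2. map with a lambda function.
with_map = list(map(lambda x: 'You typed ' + x, y))

# 3. List comprehension.
with_comp = ['You typed ' + x for x in y]

# A comprehension can also filter while it iterates.
filtered = ['You typed ' + x for x in y if x != 'b']

print(with_loop == with_map == with_comp)
print(filtered)
```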

What if we wished to extract only the user ID from each of the email addresses earlier? In other words, we only wish to match up to the `@` symbol in each address.

Let's do it in two ways: Using the `split()` method, and then using a look-ahead assertion.

In [90]:
import pprint

In [91]:
pp = pprint.PrettyPrinter(4)

In [92]:
pp.pprint(email_adds)

[   'e1234@nus.edu.sg',
    'a2305@nus.edu.sg',
    'eric@gmail.com',
    'george@yahoo.com',
    'e1234@ntu.edu.sg',
    'nus.edu.sg@ntu.edu.sg']


In [93]:
p1 = re.compile('@')
userids = [p1.split(x)[0] for x in email_adds]
#userids = [x.split('@')[0] for x in email_adds]

In [94]:
userids

['e1234', 'a2305', 'eric', 'george', 'e1234', 'nus.edu.sg']

In [95]:
p2 = re.compile('^.*(?=@)')
userids = [p2.match(x).group() for x in email_adds]

In [96]:
userids

['e1234', 'a2305', 'eric', 'george', 'e1234', 'nus.edu.sg']
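A look-ahead assertion checks what follows without consuming it. The sketch below illustrates this with the same pattern: the match stops just before the `@`. The string `'a@b@c'` is a contrived input, included only to show that the greedy `.*` runs up to the last `@`:

```python
import re

p2 = re.compile('^.*(?=@)')
address = 'e1234@nus.edu.sg'
m = p2.match(address)

userid = m.group()                 # text before the "@"
next_char = address[m.end()]       # the "@" itself was not consumed

# With several "@" symbols, the greedy ".*" extends to the last one.
greedy = p2.match('a@b@c').group()

print(userid, next_char, greedy)
```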

### Modules and Packages

You may have noticed we "imported" the `re` module earlier. What did we mean by that?

* A Python module is a file containing Python definitions (of functions and constants) and statements.
* Instead of re-typing functions every time, we can simply load the module. We would then have access to the functions.
* We access objects within the module using the "dot" notation.
* There are several modules that ship with the default Python installation.
* Packages are collections of modules.
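The points above can be sketched with modules from the standard library, so no extra installation is assumed. The alias `st` is an arbitrary choice for this example:

```python
import math               # load the whole module
from math import pi       # or import a single name from it
import statistics as st   # or give the module a shorter alias

root = math.sqrt(16)      # dot notation: module.function
same_pi = pi == math.pi   # both names refer to the same constant
average = st.mean([1, 2, 3])

print(root, same_pi, average)
```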

In [2]:
print("print something 'hi'")

print something 'hi'


In [3]:
print('i said "hi"')

i said "hi"


In [4]:
string = "thank" + 'you'
string

'thankyou'

In [None]:
'''SELECT * FROM table'''

In [None]:
table.col_name

In [11]:
cond = 5646375237
print(f'''SELECT *
FROM table
WHERE {cond}''')

SELECT *
FROM table
WHERE 5646375237


In [1]:
import numpy as np
import pandas as pd

ModuleNotFoundError: No module named 'pandas'

In [1]:
np.max([1, 2, 3])

NameError: name 'np' is not defined

### Object-Oriented Programming

We can create new object-types in Python, which contain their own data and methods.
In the example below, we create the `Circle` class. This class definition is a blueprint
that will allow us to create `Circle`-objects. These objects need to be initialized
with a radius. After initialization, we can query how big a given `Circle`-object is.

In [None]:
from math import pi

class Circle:
    """ A simple class definition 
    
    c0 = Circle()
    c0.radius
    
    """
    # If no radius is specified, default to 1.
    def __init__(self, radius = 1.0):
        self.radius = radius
    
    def set_radius(self, new_radius):
        """ Set radius of circle"""
        self.radius = new_radius
    
    def get_area(self):
        """ Compute area of circle"""
        return pi * (self.radius ** 2)

In [None]:
c1 = Circle(3.2)
c1.get_area()

In [None]:
c1.set_radius(6)
c1.get_area()
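Since the cells above are shown without output, the following sketch checks the expected behaviour with `math.isclose` instead of quoting printed values. The class is repeated in condensed form only so that the block runs on its own:

```python
from math import pi, isclose

class Circle:
    """Condensed copy of the class above, repeated so this block runs alone."""
    def __init__(self, radius=1.0):
        self.radius = radius

    def set_radius(self, new_radius):
        self.radius = new_radius

    def get_area(self):
        return pi * (self.radius ** 2)

c0 = Circle()                 # no radius given, so the default applies
c1 = Circle(3.2)
area_before = c1.get_area()
c1.set_radius(6)              # the same object now reports a new area
area_after = c1.get_area()

print(c0.radius)
print(isclose(area_before, pi * 3.2 ** 2))
print(isclose(area_after, 36 * pi))
```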