# Introduction to Python and Natural Language Technologies

__Lecture 01-2, Introduction to Python__

__Sept 09, 2020__

__Judit Ács__

# About this part of the course

## Goal

- upper intermediate level Python
- will cover some advanced concepts
- focus on string manipulation

## Prerequisites

- intermediate level in at least one object-oriented programming language
- must know: _class, instance, method, operator overloading, basic IO handling_
- good to know: _static method, property, mutability, garbage collection_

## Links to course material

[Official website](https://python-nlp.github.io/)

[Official Github repository](https://github.com/bmeaut/python_nlp_2020_fall)

# Jupyter

- Jupyter, formerly known as IPython Notebook, is a web application that allows you to create and share documents with live code, equations, visualizations etc.
- Jupyter notebooks are JSON files with the extension `.ipynb`
- Can be converted to HTML, PDF, LaTeX etc.
- Can render images, tables, graphs, LaTeX equations
- Large number of extensions called 'nbextensions'
  - `jupyter-vim-binding` is used in this lecture

- Content is organized into cells

## Cell types


1. code cell: Python/R/Lua/etc. code
2. raw cell: raw text
3. markdown cell: formatted text using Markdown

## Code cell

In [None]:
print("Hello world")

The value of the last expression in the cell is displayed:

In [None]:
2 + 3
3 + 4

This can be a tuple of multiple values

In [None]:
2 + 3, 3 + 4, "hello " + "world"

## Markdown cell

**This is in bold**

*This is in italics*

| This | is |
| --- | --- |
| a | table |

and this is a pretty LaTeX equation:

$$
\oint_{\partial\Omega} \mathbf{E}\cdot\mathrm{d}\mathbf{S} = \frac{1}{\varepsilon_0} \iiint_\Omega \rho \,\mathrm{d}V
$$

## Using Jupyter

### Command mode and edit mode

Jupyter has two modes: command mode and edit mode

1. Command mode: perform non-edit operations on selected cells (can select more than one cell)
  - Selected cells are marked blue
2. Edit mode: edit a single cell
  - The cell being edited is marked green

### Switching between modes

1. Esc: Edit mode -> Command mode
2. Enter or double click: Command mode -> Edit mode

### Running cells

1. Ctrl + Enter: run cell
2. Shift + Enter: run cell and select next cell
3. Alt + Enter: run cell and insert new cell below

## User input

Jupyter has a widget for the built-in `input` function. This __halts__ the execution until some input is provided. Note the * in place of the execution counter:

In [None]:
input("Please input something: ")

## Cell magic

Special commands can modify a single cell's behavior, for example:

In [None]:
%%time

for x in range(1000000):
    pass

In [None]:
%%timeit

x = 2

In [None]:
%%writefile hello.py

print("Hello world from BME")

For a complete list of magic commands:

In [None]:
%lsmagic

## Under the hood

- Each notebook is run by its own _Kernel_ (Python interpreter)
  - The kernel can be interrupted or restarted through the Kernel menu
  - **Always** run `Kernel -> Restart & Run All` before submitting homework to make sure that your notebook behaves as expected
- All cells share a single namespace

In [None]:
my_name = 12

In [None]:
my_name + 1

Cells can be run in arbitrary order; the execution count shows the order in which they were actually run.

In [None]:
print("this is run first")

In [None]:
print("this is run afterwards. Note the execution count on the left.")

# The Python programming language

## History of Python


- Python started as a hobby project of Dutch programmer Guido van Rossum in 1989.
- Python 1.0 in 1994
- Python 2.0 in 2000
  - Cycle-detecting garbage collector
  - Unicode support
- Python 3.0 in 2008
  - Backward incompatible
- Python2 End-of-Life (EOL) date was postponed from 2015 to 2020

## Guido van Rossum, <s>Benevolent Dictator for Life</s> Stepped down in 2018
 
Guido van Rossum at OSCON 2006. by
[Doc Searls](https://www.flickr.com/photos/docsearls/)
licensed under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/)
 <img width="400" alt="portfolio_view" src="https://upload.wikimedia.org/wikipedia/commons/6/66/Guido_van_Rossum_OSCON_2006.jpg">

## Python community and development

- Python Software Foundation: nonprofit organization based in Delaware, US
- Managed through PEPs (Python Enhancement Proposal)
    - Public discussion for example [PEP 3000 about Python 3.0](https://www.python.org/dev/peps/pep-3000/)
- Strong community inclusion
- Large standard library
- Very large third-party module repository called PyPI (Python Package Index)
- pip installer

In [None]:
import antigravity

## Python neologisms

- the Python community has a number of made-up expressions
- _Pythonic_: following Python's conventions, Python-like
- _Pythonist_ or _Pythonista_: good Python programmer

# Developing in Python

## Notebooks

- Jupyter
- [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/): Jupyter + IDE-like features
- [Google Colab](https://colab.research.google.com): online notebook with GPU access

## IDEs

- [VSCode](https://code.visualstudio.com/): free, cross-platform, Python plugin, command line support
- [PyCharm](https://www.jetbrains.com/pycharm/): free Community edition, cross-platform

## Command line tools

- [VIM](https://www.vim.org/) or [neovim](https://neovim.io/) + [tmux](https://github.com/tmux/tmux/wiki): small, runs everywhere, modal editing, steep learning curve, Python plugins, very mature
    - VSCode and PyCharm have VIM editing mode
- [Emacs](https://www.emacswiki.org/emacs/PythonProgrammingInEmacs): another CLI editor, built-in Python support

## PEP8, the Python style guide

- widely accepted style guide for Python
- [PEP8](https://www.python.org/dev/peps/pep-0008/) co-authored by Guido himself, 2001

Specifies:

- indentation
- line length
- module imports
- class names, function names etc.

We shall use PEP8 throughout this course. You are expected to follow it in your homework.
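
As a rough illustration of the conventions listed above, the sketch below follows PEP8 naming and layout. The names `MAX_TOKENS`, `TokenCounter` and `count_tokens` are made up for this example and are not part of the course material.

```python
import re  # imports go at the top of the file, one module per line

MAX_TOKENS = 100  # constants: UPPER_CASE_WITH_UNDERSCORES


class TokenCounter:
    """Class names use CapWords."""

    def count_tokens(self, text):
        """Function and method names use lower_case_with_underscores."""
        # 4 spaces per indentation level, lines at most 79 characters long
        tokens = re.findall(r"\w+", text)
        return min(len(tokens), MAX_TOKENS)


counter = TokenCounter()
print(counter.count_tokens("PEP8 makes code easier to read"))
```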

# General properties of Python

## Whitespaces

Whitespace indentation instead of curly braces, no need for semicolons:

In [None]:
n = 12
if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")

## Dynamic typing

Type checking is performed at run-time as opposed to compile-time (C++):

In [None]:
n = 2
print(type(n))

n = 2.1
print(type(n))

n = "foo"
print(type(n))
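
Because types are checked at run time, a type error only surfaces when the offending operation is actually executed. A minimal sketch, using a hypothetical helper `add_one`:

```python
def add_one(value):
    # No type is declared for `value`; the + operator is checked when it runs
    return value + 1


int_result = add_one(2)      # works for int
float_result = add_one(2.5)  # works for float
print(int_result, float_result)

error_caught = False
try:
    add_one("foo")           # str + int is only rejected at run time
except TypeError as e:
    error_caught = True
    print("TypeError:", e)
```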

## Assignment

Assignment differs from other imperative languages:

- in C++ `i = 2` translates to _typed variable named i receives a copy of numeric value 2_
- in Python `i = 2` translates to _name i receives a reference to object of numeric type of value 2_

The built-in function `id` returns the object's identity, an integer that is unique among objects alive at the same time:

In [None]:
i = 2
print(id(i))

i = 3
print(id(i))

The `is` operator compares two objects' identities:

In [None]:
a = 2
b = a
print(a is b)  # same as print(id(a) == id(b))
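
Identity should not be confused with equality. The sketch below, with lists chosen purely for illustration, contrasts `is` with `==`:

```python
a = [1, 2, 3]
b = a            # b is another name for the same list object
c = [1, 2, 3]    # c is a different object with equal contents

print(a is b)    # True: same identity
print(a == c)    # True: equal values
print(a is c)    # False: two distinct objects
```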

String concatenation results in a new object:

In [None]:
s = "foo"
old_id = id(s)

s += "bar"
print(old_id == id(s))

Numerical operations also result in new objects. We will talk about this in detail next week.

In [None]:
a = 2
b = a
print(id(a) == id(b))
a += 1
print(a is b)

In CPython, integers from -5 to 256 are preallocated since these numbers are used frequently, so `is` may return `True` for them even when the values were computed separately.

More information [here](https://github.com/satwikkansal/wtfPython#-is-is-not-what-it-is).

More crazy stuff in [WTFPython](https://github.com/satwikkansal/wtfpython)

In [None]:
for n in range(-8, 0):
    print(n, n is n + 1 - 1)
    
for n in range(253, 260):
    print(n, n is n + 1 - 1)

# Simple statements

## Conditional expressions

### if, elif, else

In [None]:
#n = int(input())
n = 12

if n < 0:
    print("N is negative")
elif n > 0:
    print("N is positive")
else:
    print("N is neither positive nor negative")

### Ternary conditional operator

- a one-line conditional expression
- the order of operands differs from C's `?:` operator; the C version of abs would look like this:

~~~C
int x = -2;
int abs_x = x>=0 ? x : -x;
~~~
- should only be used for very short statements


`<expr1> if <condition> else <expr2>`

In [None]:
n = -2
abs_n = n if n >= 0 else -n
abs_n

## Lists

- lists are the most frequently used built-in containers
- basic operations: indexing, length, append, extend
- lists will be covered in detail next week

In [None]:
l = []  # empty list
l.append(2)
l.append(2)
l.append("foo")

len(l), l
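
The cell above only uses `append` and `len`. A short sketch of the other two basic operations mentioned, indexing and `extend`, on the same kind of list:

```python
l = [2, 2, "foo"]

first = l[0]      # indexing starts at 0
last = l[-1]      # negative indices count from the end
print(first, last)

l.extend([3, 4])  # extend adds every element of another iterable
print(len(l), l)
```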

## Iteration

### Iterating a list

In [None]:
for e in ["foo", "bar"]:
    print(e)

## `enumerate`: iterating with an index

In [None]:
for idx, element in enumerate(["foo", "bar"]):
    print(idx, element)
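
`enumerate` also accepts a `start` argument when the numbering should not begin at 0. A small sketch:

```python
words = ["foo", "bar"]

# start=1 makes the index begin at 1 instead of 0
numbered = list(enumerate(words, start=1))
print(numbered)
```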

## `range`: Iterating over a range of integers

By default `range` starts from 0. The Python loop below corresponds to this C++ loop:
~~~C++
for (int i=0; i<5; i++)
    cout << i << endl;
~~~

In [None]:
for i in range(5):
    print(i)

Specifying the start of the range:

In [None]:
for i in range(2, 5):
    print(i)

Specifying the step. Note that in this case we need to specify all three positional arguments.

In [None]:
for i in range(0, 10, 2):
    print(i)

Negative values:

In [None]:
for i in range(-3, 0):
    print(i)

In [None]:
for i in range(-3, 0, -1):
    print(i)

In [None]:
for i in range(0, -3, -1):
    print(i)
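
Wrapping the three calls above in `list` makes it easier to see which values each one produces. Note that the second range is empty. The variable names are only for illustration.

```python
ascending = list(range(-3, 0))        # [-3, -2, -1]
empty = list(range(-3, 0, -1))        # []: 0 cannot be reached from -3 with a negative step
descending = list(range(0, -3, -1))   # [0, -1, -2]

print(ascending, empty, descending)
```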

## `break` and `continue`

- `break`: allows early exit from a loop
- `continue`: allows early jump to next iteration

In [None]:
for i in range(10):
    if i % 2 == 0:
        continue
    print(i)

In [None]:
for i in range(10):
    if i > 4:
        break
    print(i)

## `else`

__`else`__ can be used with `for`: the `else` block runs only if the loop finished without hitting `break`:

In [None]:
numbers = [3, -1, 5, 3, 7]

for n in numbers:
    if n % 2 == 0:
        break
else:
    print("Found no even numbers.")
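
To see both outcomes side by side, the sketch below wraps the same loop in a hypothetical helper `first_even` and calls it with and without an even number in the list:

```python
def first_even(numbers):
    """Describe the first even number in `numbers` (hypothetical helper)."""
    for n in numbers:
        if n % 2 == 0:
            result = "Found {}".format(n)
            break  # leaving via break skips the else block
    else:
        # runs only when the loop finished without break
        result = "Found no even numbers."
    return result


print(first_even([3, -1, 5, 3, 7]))
print(first_even([3, 4, 5]))
```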

## while

In [None]:
i = 0
while i < 5:
    print(i)
    i += 1
i

There is no `do...while` loop in Python.
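
A common way to get the same effect is `while True` with a `break` at the end of the body. A minimal sketch, where the list `values` is made-up data standing in for user input:

```python
values = [7, 3, 0, 5]  # made-up data standing in for user input
position = 0
seen = []

while True:                 # the body always runs at least once
    current = values[position]
    seen.append(current)
    position += 1
    if current == 0:        # the exit condition is checked at the end
        break

print(seen)
```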

# Functions

Functions can be defined using the `def` keyword:

In [None]:
def foo():
    print("this is a function")
     
foo()

## Function arguments

1. positional
2. named or keyword arguments

Keyword arguments must follow positional arguments:

In [None]:
def foo(arg1, arg2, arg3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
    
foo(1, 2, arg3="asdfs")
# foo(1, arg3="asdfs", 2)  # raises SyntaxError

In [None]:
foo(1, arg3=2, arg2=29)

## Default arguments

- arguments can have default values
- default arguments must follow non-default arguments

In [None]:
def foo(arg1, arg2, arg3=3):
# def foo(arg1, arg2=2, arg3):  # raises SyntaxError
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
foo(1, 2)

Default arguments need not be specified when calling the function

In [None]:
foo(1, 2)

Keyword arguments can be specified in any order:

In [None]:
foo(arg1=1, arg3=33, arg2=222)

If more than one argument has a default value, either can be skipped:

In [None]:
def foo(arg1, arg2=2, arg3=3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
    
foo(11, 33)
print("")
foo(11, arg3=33)

This mechanism makes functions with a very large number of arguments practical, since callers only pass the ones they need. The popular data analysis library `pandas` has functions with dozens of arguments, for example:

~~~python
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
 ~~~
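
In practice callers leave most such arguments at their defaults and name only the few they want to change. The same pattern can be tried without installing anything, using the built-in `sorted` and its `key` and `reverse` keyword arguments:

```python
words = ["pear", "Apple", "fig"]

# all defaults: plain ascending order (uppercase sorts before lowercase)
default_order = sorted(words)

# override two keyword arguments by name, leave everything else alone
custom_order = sorted(words, key=str.lower, reverse=True)

print(default_order)
print(custom_order)
```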

## Variable number of arguments

__`args` and `kwargs`__

- both positional and keyword arguments can be captured in arbitrary numbers using the `*` and `**` operators
- positional arguments are captured in a tuple

In [None]:
def arbitrary_positional_f(*args):
    print(type(args))
    for arg in args:
        print(arg)
        
arbitrary_positional_f(1, 2, -1)
# arbitrary_positional_f(1, 2, arg=-1)  # raises TypeError

Keyword arguments are captured in a dictionary:

In [None]:
def arbitrary_keyword_f(**kwargs):
    print(type(kwargs))
    for argname, value in kwargs.items():
        print(argname, value)
        
arbitrary_keyword_f(arg1=1, arg2=12)
# arbitrary_keyword_f(12, arg=12)  # TypeError

We usually capture both:

In [None]:
def arbitrary_arg_f(*args, **kwargs):
    if args:
        print("Positional arguments")
        for arg in args:
            print(arg)
    else:
        print("No positional arguments")
    if kwargs:
        print("Keyword arguments")
        for argname, value in kwargs.items():
            print(argname, value)
    else:
        print("No keyword arguments")
        
arbitrary_arg_f()
arbitrary_arg_f(12, -2, param1="foo")
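
One typical reason for capturing both is to forward the arguments to another function unchanged. The sketch below assumes two hypothetical helpers, `logged_call` and `power`. At the call site `*` and `**` unpack the tuple and the dictionary again.

```python
def logged_call(func, *args, **kwargs):
    """Call func with whatever arguments were passed in (hypothetical helper)."""
    print("calling", func.__name__, "with", args, kwargs)
    # * and ** at the call site unpack the tuple and the dictionary again
    return func(*args, **kwargs)


def power(base, exponent=2):
    return base ** exponent


squared = logged_call(power, 3)
cubed = logged_call(power, 3, exponent=3)
print(squared, cubed)
```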

## The return statement

- functions may return more than one value
  - a tuple of the values is returned
- without an explicit return statement `None` is returned
- an empty return statement returns `None`

In [None]:
def foo(n):
    if n < 0:
        return "negative"
    if 0 <= n < 10:
        return "positive", n
    # return None
    # return

print(foo(-2))
print(foo(3), type(foo(3)))
print(foo(12))
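
A returned tuple is usually unpacked directly at the call site. A short sketch with a hypothetical helper `min_max`:

```python
def min_max(values):
    # the comma builds a tuple, so two values travel back as one object
    return min(values), max(values)


result = min_max([4, -1, 9])
print(type(result), result)

smallest, largest = min_max([4, -1, 9])  # tuple unpacking at the call site
print(smallest, largest)
```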

# Exception handling

Fully typed exception handling:

In [None]:
try:
    int("abc")
except ValueError as e:
    print(type(e), e)
    print(e)

More than one `except` clause may be defined, ordered from most specific to least specific:

In [None]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except ValueError as e:
    print("ValueError caught")
except Exception as e:
    print("Other exception caught: {}".format(type(e)))

## More than one type of exception can be handled in the same except clause

In [None]:
def age_printer(age):
    next_age = age + 1
    # str + int raises TypeError on purpose, so the TypeError clause is exercised
    print("Next year your age will be " + next_age)
    
try:
    your_age = input()
    your_age = int(your_age)
    age_printer(your_age)
except ValueError:
    print("ValueError caught")
except TypeError:
    print("TypeError caught")

The same with a single `except` clause that handles both types:

In [None]:
def age_printer(age):
    next_age = age + 1
    print("Next year your age will be " + next_age)
    
try:
    your_age = input()
    your_age = int(your_age)
    age_printer(your_age)
except (ValueError, TypeError) as e:
    print("{} caught".format(type(e).__name__))

## `except` without an exception type

- without specifying a type, `except` catches everything, including `KeyboardInterrupt` and `SystemExit`, and the exception object is not bound to a name

In [None]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except ValueError:
    print("ValueError caught")
except:
#except Exception as e:
    print("Something else caught")
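
The difference between a bare `except` and `except Exception` shows up with exceptions such as `KeyboardInterrupt`, which derive from `BaseException` but not from `Exception`. A minimal sketch that raises one by hand instead of waiting for Ctrl+C:

```python
caught_by = []

try:
    try:
        raise KeyboardInterrupt()  # simulates the user pressing Ctrl+C
    except Exception:
        # not reached: KeyboardInterrupt is not a subclass of Exception
        caught_by.append("except Exception")
except:  # bare except: catches everything, including BaseException subclasses
    caught_by.append("bare except")

print(caught_by)
```

This is one reason `except Exception` is usually preferred over a bare `except`.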

- the empty `except` must be the last except block since it blocks all others
- `SyntaxError` otherwise

In [None]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
#except:
    #print("Something else caught")
except ValueError:
    print("ValueError caught")

## Base class' except clauses catch derived classes too

`ValueError` subclasses `Exception` so the second `except` never runs:

In [None]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except Exception as e:
    print("Exception caught: {}".format(type(e)))
except ValueError:
    print("ValueError caught")
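
The same behaviour can be checked without user input. The sketch below inspects the class hierarchy and then records which clause handled a `ValueError`:

```python
print(issubclass(ValueError, Exception))
print(ValueError.__mro__)  # the chain of base classes, most specific first

try:
    int("abc")
except Exception as e:
    # the base-class clause comes first, so it wins
    handler = "Exception clause, got " + type(e).__name__
except ValueError:
    # never reached
    handler = "ValueError clause"

print(handler)
```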

## `finally`

The `finally` block is guaranteed to run regardless of whether an exception was raised:

In [None]:
try:
    age = int(input())
except Exception as e:
    print(type(e), e)
finally:
    print("this always runs")
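
`finally` also runs when the `try` or `except` block leaves the function with `return`. A sketch with a hypothetical helper `parse_age` that records what happened in a list:

```python
events = []


def parse_age(text):
    try:
        return int(text)  # finally still runs before the function returns
    except ValueError:
        events.append("invalid input: " + text)
        return None
    finally:
        events.append("finally ran for " + text)


valid = parse_age("42")
invalid = parse_age("abc")
print(valid, invalid)
print(events)
```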

## `else`

Try-except blocks may have an `else` clause that **only** runs if no exception was raised

In [None]:
try:
    age = int(input())
except ValueError as e:
    print("Exception", e)
else:
    print("No exception was raised")
    # raise Exception("Raising an exception in else")
finally:
    print("this always runs")
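
To make the order of the clauses visible without typing input, the sketch below uses a hypothetical helper `trace` that returns the names of the clauses that ran:

```python
def trace(text):
    """Return the names of the clauses that ran (hypothetical helper)."""
    steps = []
    try:
        int(text)
        steps.append("try")
    except ValueError:
        steps.append("except")
    else:
        steps.append("else")      # only when no exception was raised
    finally:
        steps.append("finally")   # in both cases
    return steps


print(trace("12"))
print(trace("twelve"))
```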

### `raise` keyword

- `raise` throws/raises an exception
- an empty `raise` in an `except` block reraises the exception

In [None]:
try:
    int("not a number")
except Exception:
    # important log message
    # raise
    pass
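
A typical use of the empty `raise` is to log a problem and still let the caller deal with it. A minimal sketch, where the helper `to_int` and the list `log` are assumptions made for this example:

```python
log = []


def to_int(text):
    try:
        return int(text)
    except ValueError:
        # record the problem first
        log.append("could not convert {!r}".format(text))
        raise  # then re-raise the same exception for the caller


try:
    to_int("not a number")
except ValueError as e:
    log.append("caller saw: {}".format(type(e).__name__))

print(log)
```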

### Defining exceptions

Any type that subclasses `Exception` (`BaseException` to be exact) can be used as an exception object:

In [None]:
class NegativeAgeError(Exception):
    pass

try:
    age = int(input())
    if age < 0:
        raise NegativeAgeError("Age cannot be negative. Invalid age: {}".format(age))
except NegativeAgeError as e:
    print(e)
except Exception as e:
    print("Something else happened. Caught {}, with message {}".format(type(e), e))
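
Custom exceptions can also carry extra data and can subclass a more specific built-in type. The variant below is only a sketch: it redefines `NegativeAgeError` as a subclass of `ValueError` and adds a hypothetical `age` attribute and a helper `check_age`.

```python
class NegativeAgeError(ValueError):
    """Hypothetical variant that subclasses ValueError instead of Exception."""

    def __init__(self, age):
        super().__init__(
            "Age cannot be negative. Invalid age: {}".format(age))
        self.age = age  # keep the offending value on the exception object


def check_age(age):
    if age < 0:
        raise NegativeAgeError(age)
    return age


try:
    check_age(-3)
except ValueError as e:  # also catches NegativeAgeError, its subclass
    message = str(e)
    bad_age = e.age

print(message, bad_age)
```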

Using exceptions for trial-and-error is considered _Pythonic_:

In [None]:
try:
    v = input()
    int(v)
except ValueError:
    print("not an int")
else:
    print("looks like an int")
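
This style is often called EAFP ("easier to ask forgiveness than permission"), as opposed to LBYL ("look before you leap"). The sketch below compares the two with hypothetical helpers `parse_lbyl` and `parse_eafp`. The hand-written check is deliberately simple and disagrees with `int` on input with surrounding whitespace.

```python
def parse_lbyl(text):
    # "look before you leap": test first, then convert
    if text.lstrip("-").isdigit():
        return int(text)
    return None


def parse_eafp(text):
    # "easier to ask forgiveness than permission": just try it
    try:
        return int(text)
    except ValueError:
        return None


for candidate in ["12", "-7", " 42 ", "4x"]:
    print(repr(candidate), parse_lbyl(candidate), parse_eafp(candidate))
```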

# Zen of Python

In [None]:
import this