# Introduction to Python and Natural Language Technologies

## Lecture 01, Introduction to Python

### September 6, 2017

# About this part of the course

### Goal

- upper intermediate level Python
- will cover some advanced concepts
- focus on string manipulation

### Prerequisites

- intermediate level in at least one object oriented programming language
- must know: _class, instance, method, operator overloading, basic IO handling_
- good to know: _static method, property, mutability, garbage collection_

### Course material

[Official Github repository](https://github.com/bmeaut/python_nlp_2017_fall)

- the slideshow notebooks will be pushed right before each lecture, so you can follow along on your own computer

### Homework

- one homework for this part
- released in Week 4
- deadline: end of Week 7

# Jupyter


- Jupyter (formerly known as IPython Notebook) is a web application that allows you to create and share documents with live code, equations, visualizations, etc.
- Jupyter notebooks are JSON files with the extension `.ipynb`
- can be converted to HTML, PDF, LaTeX, etc.
- can render images, tables, graphs and LaTeX equations

- content is organized into cells
- 3 basic cell types
  1. code cell: Python/R/Lua/etc. code
  2. raw cell: raw text
  3. markdown cell: formatted text using Markdown

# Code cell

In [None]:
print("Hello world")

Only the value of the last expression in a cell is displayed:

In [None]:
2 + 3
3 + 4

To display several values at once, make the last expression a tuple:

In [None]:
2 + 3, 3 + 4, "hello " + "world"
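A minimal sketch of why the cell above displays a tuple: in Python it is the comma, not the parentheses, that builds a tuple, so `2 + 3, 3 + 4` is a single expression whose value is a 2-tuple. The name `pair` is only illustrative.

```python
# The comma creates the tuple; parentheses are optional here
pair = 2 + 3, 3 + 4
print(type(pair))  # <class 'tuple'>
print(pair)        # (5, 7)

# Writing the parentheses explicitly produces an equal tuple
print(pair == (5, 7))  # True
```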

# Markdown cell

**This is in bold**

*This is in italics*

| This | is |
| --- | --- |
| a | table |

and this is a pretty LaTeX equation:

$$
\iint_{\partial\Omega} \mathbf{E}\cdot\mathrm{d}\mathbf{S} = \frac{1}{\varepsilon_0} \iiint_\Omega \rho \,\mathrm{d}V
$$

# Using Jupyter

## Command mode and edit mode

Jupyter has two modes: command mode and edit mode

1. Command mode: perform non-edit operations on selected cells (can select more than one cell)
  - selected cells are marked blue
2. Edit mode: edit a single cell
  - the cell being edited is marked green
  

### Switching between modes

1. Esc: Edit mode -> Command mode
2. Enter or double click: Command mode -> Edit mode

## Running cells

1. Ctrl + Enter: run cell
2. Shift + Enter: run cell and select next cell
3. Alt + Enter: run cell and insert new cell below

# Cell magic

Special commands (cell magics) can modify a single cell's behavior, for example:

In [None]:
%%time

for x in range(100000):
    pass

In [None]:
%%timeit

x = 2

In [None]:
%%writefile hello.py

print("Hello world")

In [None]:
%%python2

print "Hello world"

For a complete list of magic commands:

In [None]:
%lsmagic
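Cell magics only work inside Jupyter/IPython. In a plain Python script, the standard library `timeit` module gives a comparable measurement to `%%timeit`; the sketch below is such a substitute, not a description of how the magic is implemented. The statements being timed are the same toy examples as above.

```python
import timeit

# Run the statement one million times and return the total time in seconds
elapsed = timeit.timeit("x = 2", number=1_000_000)
print(f"{elapsed:.4f} s for 1,000,000 runs")

# repeat() performs several independent measurements; the minimum is the least noisy
runs = timeit.repeat("for x in range(1000): pass", number=100, repeat=3)
print(f"best of 3: {min(runs):.4f} s")
```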

# Course material - Jupyter slides

Jupyter notebooks can be converted to slides and rendered with [Reveal.js](https://github.com/hakimel/reveal.js/) just like this course material.

This slideshow is a single Jupyter notebook which means:
- you can view it as a notebook on Github
- you can run and modify it on your own computer
- you can render it using Reveal.js

~~~
jupyter-nbconvert --to slides 01_Python_introduction.ipynb --reveal-prefix=reveal.js --post serve
~~~

More on Jupyter slides:

[10 min video on Jupyter slides](https://www.youtube.com/watch?v=EOpcxy0RA1A)

- some cells may be skipped during presentations
  - skipped cells contain extra material that will not be covered in the exam
- all notebooks should run without errors using `Kernel -> Restart & Run All`
  - code samples that would raise an exception are commented out
- this live presentation uses the RISE Jupyter extension

# Under the hood

- each notebook is run by its own _Kernel_ (Python interpreter)
  - the kernel can be interrupted or restarted through the Kernel menu
  - **always** run `Kernel -> Restart & Run All` before submitting homework to make sure that your notebook behaves as expected
- all cells share a single namespace
- cells can be run in arbitrary order; the execution count shows the order in which they were run

In [None]:
print("this is run first")

In [None]:
print("this is run afterwards. Note the execution count on the left.")

## The input and output of code cells can be accessed

Previous output:

In [None]:
42

In [None]:
_

Second-to-last output:

In [None]:
"first"

In [None]:
"second"

In [None]:
__

In [None]:
__

Third-to-last output:

In [None]:
___

The output of the cell with execution count N can also be accessed as the variable `_N` (for example `_5`). This variable is only defined if that cell produced an output.

Here is a way to list all defined output variables (you will understand the code in 3 weeks):

In [None]:
list(filter(lambda x: x.startswith('_') and x[1:].isdigit(), globals()))

## Inputs can be accessed similarly

Previous input:

In [None]:
_i

Input of the cell with execution count N (here N = 2):

In [None]:
_i2

# The Python programming language

# History of Python


- Python started as a hobby project of the Dutch programmer Guido van Rossum in 1989.
- Python 1.0 in 1994
- Python 2.0 in 2000
  - cycle-detecting garbage collector
  - Unicode support
- Python 3.0 in 2008
  - backward incompatible
- the End-of-Life (EOL) date of Python 2 was postponed from 2015 to 2020

# Benevolent Dictator for Life

<img width="400" alt="portfolio_view" src="https://upload.wikimedia.org/wikipedia/commons/6/66/Guido_van_Rossum_OSCON_2006.jpg">
Guido van Rossum at OSCON 2006, by [Doc Searls](https://www.flickr.com/photos/docsearls/), licensed under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/)

# Python community and development

- the Python Software Foundation (PSF) is a nonprofit organization based in Delaware, US
- language development is managed through PEPs (Python Enhancement Proposals)
- strong focus on community inclusion
- large standard library
- very large third-party module repository called PyPI (Python Package Index)
- packages are installed with the `pip` installer



In [None]:
import antigravity

## Python neologisms

- the Python community has a number of made-up expressions
- _Pythonic_: following Python's conventions, Python-like
- _Pythonist_ or _Pythonista_: good Python programmer

# General properties of Python

## Whitespace

- code blocks are delimited by indentation instead of curly braces
- no semicolons are needed at the end of statements

In [None]:
n = 11
if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")

## Dynamic typing

- type checking is performed at run time, as opposed to compile time in statically typed languages such as C++

In [None]:
n = 2
print(type(n))

n = 2.1
print(type(n))

n = "foo"
print(type(n))
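A minimal sketch of what run-time type checking means in practice: the hypothetical function `add` below has no type declarations, so it works for any operands that support `+`, and a type mismatch is only detected when the offending call actually executes.

```python
def add(a, b):
    # No type declarations: whatever objects are passed in are used as-is
    return a + b

print(add(2, 3))          # 5
print(add(2.5, 1))        # 3.5
print(add("foo", "bar"))  # foobar

# The mismatch is only detected when this call actually runs
try:
    add("foo", 1)
except TypeError as err:
    print("TypeError:", err)
```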

## Assignment

assignment differs from that of other imperative languages:

- in C++ `i = 2` translates to _typed variable named i receives a copy of numeric value 2_
- in Python `i = 2` translates to _name i receives a reference to object of numeric type of value 2_

the built-in function `id` returns the object's identity, so rebinding `i` to a different object changes `id(i)`:

In [None]:
i = 2
print(id(i))

i = 3
print(id(i))
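The reference semantics become visible with a mutable object such as a list. In this sketch (names are illustrative), `b = a` copies no data: both names refer to the same list, so a change made through one name is seen through the other, while `list(a)` creates a separate object.

```python
a = [1, 2, 3]
b = a            # b is bound to the same list object, nothing is copied
b.append(4)
print(a)         # [1, 2, 3, 4]: the change is visible through both names
print(a is b)    # True: same object
print(id(a) == id(b))  # True

c = list(a)      # list() creates a new list object with the same elements
print(c == a, c is a)  # True False
```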

# Simple statements

## if, elif, else

In [None]:
#n = int(input())
n = 12

if n < 0:
    print("N is negative")
elif n > 0:
    print("N is positive")
else:
    print("N is neither positive nor negative")

## Conditional expressions

- one-line conditional _expressions_ (they produce a value, they are not statements)
- the order of operands differs from C's `condition ? expr1 : expr2` operator:
- should only be used for short, simple expressions


`<expr1> if <condition> else <expr2>`

In [None]:
n = -2
abs_n = n if n >= 0 else -n
abs_n
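A small sketch comparing the conditional expression with the equivalent `if`/`else` statement; the variable names are illustrative. The comment shows the C form for reference.

```python
n = 7
# C equivalent: parity = n % 2 == 0 ? "even" : "odd";
parity = "even" if n % 2 == 0 else "odd"
print(parity)  # odd

# The same logic written as an if/else statement
if n % 2 == 0:
    parity_stmt = "even"
else:
    parity_stmt = "odd"
print(parity == parity_stmt)  # True
```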

## Lists

- lists are the most frequently used built-in containers
- basic operations: indexing, length, append, extend
- lists will be covered in detail next week

In [None]:
l = []  # empty list
l.append(2)
l.append(2)
l.append("foo")

len(l), l

In [None]:
l[1] = "bar"
l.extend([-1, True])
len(l), l
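A short sketch of indexing and of the difference between `append` and `extend`: `append` adds its argument as a single element (even if it is a list), while `extend` adds each element of the iterable it receives. The lists are illustrative.

```python
l = [2, 2, "foo"]
print(l[0], l[-1])   # 2 foo  (negative indices count from the end)

a = [1, 2]
a.append([3, 4])     # append adds one element: the list itself
print(a)             # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])     # extend adds each element of the iterable
print(b)             # [1, 2, 3, 4]
print(len(a), len(b))  # 3 4
```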

## for, range

### Iterating a list

In [None]:
for e in ["foo", "bar"]:
    print(e)

### Iterating over a range of integers

The same in C++: `for (int i=0; i<5; i++) cout << i << endl;`

By default `range` starts from 0.

In [None]:
for i in range(5):
    print(i)

specifying the start of the range:

In [None]:
for i in range(2, 5):
    print(i)

specifying the step. Note that in this case we need to specify all three positional arguments.

In [None]:
for i in range(0, 10, 2):
    print(i)
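Converting a `range` to a list is a quick way to check which integers it produces. Note that the stop value is always excluded, and a negative step counts downwards; the variable names are illustrative.

```python
first_five = list(range(5))         # [0, 1, 2, 3, 4]
from_two = list(range(2, 5))        # [2, 3, 4]
evens = list(range(0, 10, 2))       # [0, 2, 4, 6, 8]
countdown = list(range(10, 0, -3))  # [10, 7, 4, 1]

print(first_five, from_two, evens, countdown)
print(len(range(0, 10, 2)))  # 5
```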

## while

In [None]:
i = 0
while i < 5:
    print(i)
    i += 1
    

There is no `do...while` loop in Python.
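When the loop body must run at least once, a common idiom is a `while True` loop that checks the exit condition at the end of the body and leaves with `break`. In this sketch the condition `i < 5` is false from the start, yet the body still runs once.

```python
i = 10
executed = []
while True:
    executed.append(i)  # the body runs at least once, even though i >= 5
    i += 1
    if not i < 5:       # exit check placed after the body, like do...while
        break
print(executed)  # [10]
```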

## break and continue

- `break`: allows early exit from a loop
- `continue`: allows early jump to next iteration

In [None]:
for i in range(10):
    if i % 2 == 0:
        continue
    print(i)

In [None]:
for i in range(10):
    if i > 4:
        break
    print(i)
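The two statements are often combined. This illustrative sketch sums odd numbers, skipping even ones with `continue` and stopping with `break` before the running sum would exceed 20.

```python
total = 0
for i in range(100):
    if i % 2 == 0:
        continue        # skip even numbers
    if total + i > 20:
        break           # stop before the running sum would exceed 20
    total += i
print(total)  # 16  (1 + 3 + 5 + 7)
```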

# Functions

# Defining functions

Functions can be defined using the `def` keyword:

In [None]:
def foo():
    print("this is a function")
    
foo()

# Function arguments

1. positional
2. named or keyword arguments

keyword arguments must follow positional arguments

In [None]:
def foo(arg1, arg2, arg3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
    
foo(1, 2, 3)

In [None]:
foo(1, arg3=2, arg2=2)
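Putting a keyword argument before a positional one is rejected when the code is parsed, not when it runs. The sketch below uses the built-in `compile` so the error can be shown without stopping the notebook; `compile` only parses the call, so `foo` is never actually called.

```python
source = "foo(arg1=1, 2, 3)"  # keyword argument before positional ones
try:
    compile(source, "<example>", "eval")
    rejected = False
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
print(rejected)  # True
```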

# Default arguments

- arguments can have default values
- default arguments must follow non-default arguments

In [None]:
def foo(arg1, arg2, arg3=3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)

Default arguments need not be specified when calling the function

In [None]:
foo(1, 2)

In [None]:
foo(arg1=1, arg3=33, arg2=222)
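The rule that default arguments must follow non-default ones is also enforced at parse time. This sketch compiles a definition of a hypothetical function `bar` that violates it; the exact error message differs between Python versions, so only the exception type is relied on.

```python
source = "def bar(arg1=1, arg2):\n    pass"  # non-default after default
try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
print(rejected)  # True
```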

This mechanism makes functions with a very large number of arguments convenient to call, since callers only specify the arguments that differ from the defaults.

For example, the popular data analysis library `pandas` has functions with dozens of arguments, such as `read_csv`:

~~~python
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=False, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)
~~~
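A much smaller function in the same spirit: several keyword arguments with sensible defaults, so that each caller overrides only what differs from the common case. `split_fields` and its parameters are hypothetical and illustrative, not part of `pandas` or any other library.

```python
def split_fields(line, sep=",", strip=True, skip_empty=False, lower=False):
    """Split one line of text into fields (hypothetical helper)."""
    fields = line.split(sep)
    if strip:
        fields = [f.strip() for f in fields]  # remove surrounding whitespace
    if skip_empty:
        fields = [f for f in fields if f]     # drop empty fields
    if lower:
        fields = [f.lower() for f in fields]  # normalize case
    return fields

print(split_fields("a, b,,C"))                   # ['a', 'b', '', 'C']
print(split_fields("a, b,,C", skip_empty=True))  # ['a', 'b', 'C']
print(split_fields("a;B", sep=";", lower=True))  # ['a', 'b']
```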

# The return statement

- functions may return more than one value
  - a tuple of the values is returned
- without an explicit return statement, `None` is returned

In [None]:
def foo(n):
    if n < 0:
        return "negative"
    if 0 <= n < 10:
        return "positive", n
    
print(foo(-2))
print(foo(3))
print(foo(12))
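Since multiple return values arrive as a tuple, they are usually unpacked into separate names right at the call site. A sketch with the illustrative functions `min_max` and `no_return`:

```python
def min_max(values):
    # The two values are packed into a single tuple
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
print(result)        # (1, 5)

# The tuple can be unpacked into separate names
lo, hi = min_max([3, 1, 4, 1, 5])
print(lo, hi)        # 1 5

def no_return():
    x = 1  # no return statement, so the call evaluates to None

print(no_return())   # None
```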

# Zen of Python

In [None]:
import this