# Topic 0: Introduction to Python (Part 1)

This is a Jupyter notebook, a web-based interactive computational environment. 
- Cells can contain markdown or code. 
- To run a code cell, press shift+Enter. 
- Jupyter prints the output of a cell directly beneath it.

This session is designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. 

- Run all of the code cells as you work through the notebook. 
- Try to understand what is happening in each code cell and predict the output before running it.
- Complete all of the exercises.
- Solutions to all exercises are provided, but please avoid loading the solution until you have had a go at solving it yourself.


Run the following cell twice: first to load some setup code, then again to run it.

In [1]:
# %load ../setup
import sys
#sys.path.append(r'T:\Departments\Informatics\LanguageEngineering') 
sys.path.append(r'/Users/davidw/Documents/teach/NLE/resources')
#sys.path.append(r'\\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources)
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import collections
from collections import defaultdict,Counter
from itertools import zip_longest
from IPython.display import display
from random import seed
get_ipython().magic('matplotlib inline')
import random
import math
import matplotlib.pylab as pylab
%matplotlib inline
params = {'legend.fontsize': 'large',
          'figure.figsize': (15, 5),
         'axes.labelsize': 'large',
         'axes.titlesize':'large',
         'xtick.labelsize':'large',
         'ytick.labelsize':'large'}
pylab.rcParams.update(params)
from pylab import rcParams
from operator import itemgetter, attrgetter, methodcaller
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import seaborn as sns
import csv



## Python types

### String
Strings are enclosed in double or single quotes in Python.

In [3]:
print('Hello World')

Hello World


In [4]:
print("Hello World")

Hello World


In [5]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
print("'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'")

'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'


In [6]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
print('"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."')

"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."


As an alternative to calling `print` explicitly, when a cell is run Jupyter displays the value of the last expression in the cell. Only the last value is shown, so the following cell displays just its third string. Try running it.

In [8]:
"Hello World"
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'
"My first Python"

'My first Python'

### Integer

In [9]:
75

75

### Float

In [10]:
6.3646

6.3646

When a string contains just digits, the function `int` will **cast** that string to an integer.

In [11]:
# give the type of the string '623'
type('623')

str

In [12]:
# cast the string '623' to an integer
int('623')

623

In [13]:
# give the type that results from casting the string '623' to an integer.
type(int('623'))

int
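
As a small sketch of where casting stops working (the variable names here are illustrative, not part of the lab code): `int` raises a `ValueError` when the string is not a whole number, while `float` accepts a string containing a decimal point.

```python
# Illustrative names; not part of the lab code.
whole = int("623")          # digits only: casting succeeds
decimal = float("6.3646")   # float() accepts a decimal point

try:
    int("6.3646")           # int() rejects a string with a decimal point
    cast_failed = False
except ValueError:
    cast_failed = True

print(whole, decimal, cast_failed)
```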

## Basic operations

Strings can be joined using `+`

In [15]:
"Hello " + "World" + " Batman" 

'Hello World Batman'

Standard operators are used on integers and floats: `+`, `-`, `*`, and `/`.

In [16]:
7 - 3 + 5

9

In [17]:
3.5*8/4

7.0

For floor division (the result rounded down to the nearest integer), use `//`.

In [18]:
7//2

3

Use `**` for exponentiation - e.g. `3**2 = 3^2`.

In [20]:
# This is equivalent to 2*2*2*2*2*2
2**6

64

Use double equals, `==`, to check equality.

In [21]:
5*4 == 2*10

True

The modulo operator `%` returns the remainder after integer division.  
e.g. 13 divided by 5 is 2 with 3 left over, so `13 % 5 == 3`.

In [22]:
7%3

1

In [23]:
4 % 2

0
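
Floor division and modulo fit together: the quotient times the divisor, plus the remainder, gives back the original number. A short check of that relationship, using illustrative variable names:

```python
dividend = 13
divisor = 5

quotient = dividend // divisor    # floor division
remainder = dividend % divisor    # what is left over

print(quotient, remainder)
print(quotient * divisor + remainder == dividend)
```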

## Python error reports
e.g. when attempting to join a **string** and an **integer**

In [32]:
"Hello" + 3

TypeError: must be str, not int
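
One way to avoid this error is to convert the integer to a string with `str` before joining. The sketch below uses a made-up variable name, `greeting`:

```python
# str() converts the integer 3 to the string "3", so + can join two strings.
greeting = "Hello" + str(3)
print(greeting)
```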

### Exercise
In the empty cell below, write a single-line Python expression to print "Hello world! My name is" joined with another string containing your name.

In [27]:
 "Hello world! My name is" + " Alex" 

'Hello world! My name is Alex'

In [28]:
# %load solutions/hello

## Python identifiers
Assign a value (e.g. a string, integer or float) to a variable name using a single equals sign.

In [29]:
student_name = "Adam"
student_name = "Alex"

In [33]:
student_age = 21
student_age = 22

Operations can be carried out as before, using the variable names.

In [34]:
student_age/2

11.0

We can update values associated with a variable using the operators `+=` , `-=` , `/=`, and `*=`.

- For example, `+=` adds the number on the right to the current value.

This is a useful shortcut - take your time to play around and familiarise yourself with this syntax.

In [35]:
#Note that each time you run this cell, it will add to the stored value again (5, then 12).
student_age += 5
student_age += 12
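
To see what the update operators do without changing `student_age`, here is a minimal sketch on a separate, illustrative variable called `counter`. Each line is shorthand for the longer form shown in its comment.

```python
counter = 10
counter += 5   # same as counter = counter + 5, giving 15
counter -= 3   # same as counter = counter - 3, giving 12
counter *= 2   # same as counter = counter * 2, giving 24
counter /= 4   # same as counter = counter / 4, giving 6.0 (division gives a float)
print(counter)
```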

In [36]:
age_next_year = student_age + 1

### Exercise
In the cell below, assign appropriate values to the variables `my_name`, `my_age`, and `years_at_sussex`.

In [59]:
my_name = "Alex"
my_age = 22
years_at_sussex = 3
print(my_name)
print(my_age)
years_at_sussex

Alex
22


3

In [None]:
# %load solutions/age
my_name = "David Weir"
my_age = 57
years_at_sussex = 26


### Exercise
In the cell below subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`.

In [40]:
age_started_sussex = my_age - years_at_sussex

In [41]:
# %load solutions/age_started


### Exercise
In the cell below, practice using the `**`, `+=`, `-=`, `/=`, and `*=` operators to update these values.

In [50]:
years_at_sussex -= 1
print(years_at_sussex)
age_started_sussex += 1
print(age_started_sussex)

-1
20


## Dynamic typing
The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.

In [51]:
type(student_name)

str

In [52]:
type(student_age)

int

As Python has dynamic typing, if a variable name is assigned to a new value of different type, the variable's type will change accordingly.

In [53]:
student_age = "Twenty"
type(student_age)

str
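
A minimal sketch of the same idea with an illustrative variable called `value`: the type belongs to the object currently assigned, not to the name, so the same operator can behave differently after reassignment.

```python
value = 20
type_before = type(value)       # int
doubled_number = value + value  # addition: 40

value = "Twenty"
type_after = type(value)        # str
doubled_text = value + value    # joining: "TwentyTwenty"

print(type_before, doubled_number)
print(type_after, doubled_text)
```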

### Exercise
In the cell below, reassign your `my_age` and `years_at_sussex` variables from `int` values to strings giving the numbers in words. Print the types of these variables before and after.

In [60]:
print(type(my_age), type(years_at_sussex))
my_age = "Twenty two"
years_at_sussex = "2"
print(type(my_age), type(years_at_sussex))

<class 'int'> <class 'int'>
<class 'str'> <class 'str'>


In [None]:
# %load solutions/dynamic_typing
print(type(my_age),type(years_at_sussex))
my_age = "fifty seven"
years_at_sussex = "twenty six"
print(type(my_age),type(years_at_sussex))


## Lists

Lists are initialised using square brackets, with objects separated by commas.

In [2]:
primes = [2, 3, 5, 7, 11]
type(primes)

list

Lists can contain any data type.

In [3]:
list_of_strings =['string','another string','a third string']
list_of_strings

['string', 'another string', 'a third string']

'Empty' lists with no elements can also be initialised.

In [4]:
empty_list = []

Indexing into lists uses square brackets.
- Note that indexing starts from zero.

In [5]:
primes[0]

2

A colon, `:`, can be used to take a slice of a list between two indices.
- Note that this will start from the first index, up to but NOT including the second index.

In [6]:
primes[1:4]

[3, 5, 7]

If either index is omitted, the slice will go to the beginning/end of the list.

In [10]:
primes[:-2]

[2, 3, 5]

To index from the end of the list use negative numbers.

In [73]:
primes[-1]

11

In [75]:
primes[-3:]

[5, 7, 11]
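
The indexing and slicing rules above can be summarised in one short sketch. The list `letters` is an illustrative example, not part of the lab data.

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]      # indexing starts at zero
last = letters[-1]      # negative indices count from the end
middle = letters[1:4]   # from index 1 up to, but not including, index 4
head = letters[:2]      # start omitted: the slice begins at the start of the list
tail = letters[-2:]     # end omitted: the slice runs to the end of the list

print(first, last, middle, head, tail)
```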

To test for list membership use the keyword `in`.

In [76]:
5 in primes

True

In [77]:
6 in primes

False

The function `len` gives the length of a list.

In [78]:
len(primes)

5

To append an element to a list use `append`.

In [79]:
primes.append(13)

In [80]:
primes.append(17)

In [81]:
primes

[2, 3, 5, 7, 11, 13, 17]

Using `append` with a list as parameter adds the list as a single element - producing a list that contains a list as its last element.

In [82]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

[2, 3, 5, 7, 11, 13, [17, 19]]

When we want to add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists.

In [83]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes

[2, 3, 5, 7, 11, 13, 17, 19]
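
The difference between `append` and `+=` shows up clearly in the length of the result. A minimal sketch with two illustrative lists:

```python
appended = [2, 3, 5]
appended.append([7, 11])   # the whole list becomes one new element

concatenated = [2, 3, 5]
concatenated += [7, 11]    # each element is added individually

print(appended, len(appended))
print(concatenated, len(concatenated))
```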

To write a `for` loop that iterates over a list, use the keywords `for` and `in`, end the line with a colon `:`, and indent the body of the loop to indicate its scope.

In [84]:
for prime in primes:
    print(prime,"is a prime")

2 is a prime
3 is a prime
5 is a prime
7 is a prime
11 is a prime
13 is a prime
17 is a prime
19 is a prime
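
Indentation decides what belongs to the loop. In the following sketch, which uses an illustrative list called `numbers`, the indented line runs once per element and the unindented `print` runs once after the loop has finished.

```python
numbers = [2, 3, 5, 7]
total = 0
for number in numbers:
    total += number   # indented: part of the loop body
print(total)          # not indented: runs after the loop ends
```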


### Exercise
In the cell below initialise the variable `squares` to be a list of the square numbers from 1 to 16 inclusive.

In [88]:
squares = [1, 4, 9, 16]
squares

[1, 4, 9, 16]

In [87]:
# %load solutions/squares
squares = [1,4,9,16]
squares


[1, 4, 9, 16]

### Exercise
In the cell below append the next square number to the list `squares`.

In [91]:
# squares.append(25)
squares += [25]
squares

[1, 4, 9, 16, 25]

In [None]:
# %load solutions/extend_squares
squares += [25]
squares


### Exercise
In the cell below make a list of the next two square numbers and concatenate this with `squares`.

In [92]:
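# Note: append adds [36, 49] as a single nested element (see the output below);
# `squares += [36, 49]` would concatenate the two lists instead.
squares.append([36, 49])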
squares

[1, 4, 9, 16, 25, [36, 49]]

In [94]:
# %load solutions/more_squares
more_squares = [36,49]
squares += more_squares
squares


[1, 4, 9, 16, 25, [36, 49], 36, 49]

### Exercise
In the cell below, check how many items are now in the list.

In [95]:
length = len(squares)
length

8

In [None]:
# %load solutions/squares_length
len(squares)


### Exercise
In the cell below use indexing to print just the first 3 and last 3 items in the list `squares`

In [110]:
print(squares[:3])
print(squares[-3:])

[1, 4, 9]
[[36, 49], 36, 49]


In [11]:
# %load solutions/first_last_three
print(squares[:3])
print(squares[-3:])


NameError: name 'squares' is not defined

### Exercise
In the cell below, use a `for` loop to print each item in the list `squares` on its own line, as part of a sentence. The output should look like this:
```
The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49
```

In [122]:
print("The first square in the list is", squares[0])
for square in squares[1:-1]:
    print("The next square in the list is ", square)
print("The last square in the list is ", squares[-1])

The first square in the list is 1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  [36, 49]
The next square in the list is  36
The last square in the list is  49


In [None]:
# %load solutions/print_squares
print(squares)
print("The first square in the list is ",squares[0])
for square in squares[1:-1]:
    print("The next square in the list is ",square)
print("The last square in the list is ", squares[-1])


## Strings

In [13]:
# Here we assign the string "Hello World" as the value of a variable called hello_world
hello_world = "Hello World"

String indexing is similar to list indexing, but works on a character-by-character basis.

In [14]:
hello_world[0]

'H'

In [15]:
hello_world[7]

'o'

In [24]:
hello_world[:-3]

'Hello Wo'

In [127]:
hello_world[-40]

IndexError: string index out of range
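
For a string of length `n`, the valid indices run from `-n` to `n - 1`; anything outside that range raises an `IndexError`. A small sketch of those limits, using an illustrative variable called `text`:

```python
text = "Hello World"
length = len(text)              # 11 characters

first_char = text[-length]      # most negative valid index: same as text[0]
last_char = text[length - 1]    # largest valid index: same as text[-1]

try:
    text[length]                # one position past the end
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_char, last_char, out_of_range)
```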

Test for substring presence using the keyword `in`.

In [128]:
"w" in hello_world

False

In [129]:
"W" in hello_world

True

In [130]:
"llo" in hello_world

True

Find the length of a string using `len`.

Note that the count includes every character: spaces, tabs and punctuation as well as letters and digits.

In [131]:
len(hello_world)

11

In [132]:
hello_world+="!"
hello_world

'Hello World!'

In [133]:
len(hello_world)

12

Iterating over a string involves similar syntax to list iteration, but works on a character-by-character basis.

In [134]:
for char in hello_world:
    print ("the character >>>", char, "<<< is present")

the character >>> H <<< is present
the character >>> e <<< is present
the character >>> l <<< is present
the character >>> l <<< is present
the character >>> o <<< is present
the character >>>   <<< is present
the character >>> W <<< is present
the character >>> o <<< is present
the character >>> r <<< is present
the character >>> l <<< is present
the character >>> d <<< is present
the character >>> ! <<< is present


To parse a string into words, use the `split` method, which returns a list of the tokens in the string. 

By default, it separates based on whitespace.

In [None]:
sentence = "This is a sample sentence"
words = sentence.split()
print(words)
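
`split` also accepts a separator argument. The sketch below uses an illustrative comma-separated string, `record`, to contrast the default whitespace behaviour with an explicit separator.

```python
record = "red,green,blue"

by_whitespace = record.split()   # no whitespace, so the whole string is one token
by_comma = record.split(",")     # split wherever a comma occurs

print(by_whitespace)
print(by_comma)
```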

To check for the presence of a token in a list of words use the `in` keyword.

In [None]:
"sample" in words

In [None]:
"Hello" in words

### Exercise
In the empty cell below, assign the string `"It was the best of times, it was the worst of times"` to the variable `opening_line`.

In [25]:
opening_line = "It was the best of times, it was the worst of times"

In [None]:
# %load solutions/assign_string
opening_line = "It was the best of times, it was the worst of times"


### Exercise
In the empty cell below, check whether `'worst'` appears in `opening_line`.

In [137]:
"worst" in opening_line

True

In [None]:
# %load solutions/worst_in
'worst' in opening_line


### Exercise
In the empty cell below make a list of the words in `opening_line`, assigned to the variable `dickens_words`, and iterate over `dickens_words`, printing one word per line.

In [147]:
dickens_words = opening_line.split()
for word in dickens_words:
    print("the word >>>", word, "<<< is present")

the word >>> It <<< is present
the word >>> was <<< is present
the word >>> the <<< is present
the word >>> best <<< is present
the word >>> of <<< is present
the word >>> times, <<< is present
the word >>> it <<< is present
the word >>> was <<< is present
the word >>> the <<< is present
the word >>> worst <<< is present
the word >>> of <<< is present
the word >>> times <<< is present


In [None]:
# %load solutions/print_sentence
dickens_words = opening_line.split()
for word in dickens_words:
    print ("the word >>>", word, "<<< is present")


### Exercise
In the empty cell below check whether `'blurst'` appears in the list you made.

In [148]:
"blurst" in dickens_words

False

In [None]:
# %load solutions/blurst_check
'blurst' in dickens_words
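
Note that `in` behaves differently on a string and on a list of tokens: on a string it tests for a substring, while on a list it tests for a whole element. A short sketch of the difference, with illustrative variable names:

```python
line = "It was the best of times, it was the worst of times"
tokens = line.split()

in_string = "time" in line        # substring test: "time" occurs inside "times"
in_tokens = "time" in tokens      # list test: no token is exactly "time"
with_comma = "times," in tokens   # punctuation stays attached to its token

print(in_string, in_tokens, with_comma)
```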


## Conditions and booleans

In [None]:
if 2 > 3:
    print ("yes")
else:
    print ("no")

Here are some useful string *shape* methods, which test what kinds of character a string contains.

In [150]:
"This".isalpha()

True

In [151]:
"This,".isalpha()

False

In [152]:
"M25".isalpha()

False

In [153]:
"M25".isalnum()

True

In [154]:
"463".isdigit()

True
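
These methods are typically used inside conditions. As a rough sketch, assuming an illustrative token list, the loop below sorts tokens into purely alphabetic words and purely numeric strings; tokens such as `"M25"` and `"today!"` match neither test and are skipped.

```python
tokens = ["Route", "M25", "closed", "at", "463", "today!"]

words = []
numbers = []
for token in tokens:
    if token.isalpha():     # letters only
        words.append(token)
    if token.isdigit():     # digits only
        numbers.append(token)

print(words)
print(numbers)
```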

In [155]:
# non zero numbers are TRUE
print ("yes" if 15 else "no")

yes


In [156]:
# zero is FALSE
print ("yes" if 0 else "no")

no


In [157]:
# non empty lists are TRUE
print ("yes" if ["one element"] else "no")

yes


In [158]:
# the empty list is FALSE
print ("yes" if [] else "no")

no


In [159]:
# non empty character strings are TRUE
print ("yes" if "Hello" else "no")

yes


In [160]:
# the empty string is FALSE
print ("yes" if "" else "no")

no
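
The built-in `bool` function shows directly how a value is treated in a condition. A short sketch collecting the cases above into one illustrative list:

```python
values = [15, 0, ["one element"], [], "Hello", ""]

truth_values = []
for value in values:
    truth_values.append(bool(value))
    print(value, "->", bool(value))
```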


Boolean statements can be combined using `and`. Both must be true for the combination to be evaluated as `True`.

In [161]:
True and True

True

In [162]:
False and True

False

Boolean statements can be combined using `or`. At least one statement must be true for the combination to be evaluated as `True`.

In [163]:
False or True

True

In [164]:
True or False

True

A boolean statement can be negated using `not`.

In [165]:
not True

False

In [166]:
not False

True
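
The three operators can be combined in a single expression. As an illustrative sketch (the variable names are assumptions, not part of the lab code), the following tests whether a token mixes letters and digits:

```python
token = "M25"

# Alphanumeric, but neither all letters nor all digits.
is_mixed = token.isalnum() and not token.isalpha() and not token.isdigit()

# All letters, or all digits.
is_plain = token.isalpha() or token.isdigit()

print(is_mixed, is_plain)
```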