# Topic 0: Introduction to Python

This is a Jupyter notebook, a web-based interactive computational environment. Cells can contain markdown or code. To run a code cell, press Shift+Enter. Jupyter automatically prints the output of a cell beneath it.

This session is designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. Try to understand what is happening in each code cell and predict the output before running it.

You should also try editing the code snippets, and complete the 13 exercises.


## Python types

### 1) String

Strings are enclosed in double or single quotes in Python.

In [1]:
'Hello World'

'Hello World'

In [None]:
"Single quotes require one less key press!"

In [None]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
"'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'"

In [None]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

### 2) Integer

`75`

### 3) Float

`6.3646`

Note that a string may contain just a number. This can be used to generate an integer using the `int` function.

In [2]:
type('623')

str

In [3]:
int('623')

623

In [2]:
# Generate an integer from the string '623' and then check its type
type(int('623'))

int
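
The same pattern works for other conversions. As a small sketch (the variable names below are illustrative and are not used elsewhere in this notebook), `float` turns a suitable string into a float, and `str` turns a number into a string:

```python
# Illustrative names only
price_text = '6.25'
price = float(price_text)    # string -> float
count_text = str(75)         # integer -> string

print(price, type(price))
print(count_text, type(count_text))
```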

## Basic operations

Strings can be joined using `+`

In [None]:
"Hello " + "World"

Standard operators are used on integers and floats: `+`, `-`, `*`, and `/`.

In [8]:
 7 - 3 + 5

9

In [9]:
3.5*8/4

7.0

For floor division (rounding down to the nearest integer), use `//`.

In [7]:
7//2

3

Use `**` for exponentiation - e.g. `3**2 = 3^2`.

In [10]:
# This is equivalent to 2*2*2*2*2
2**5

32

Use double equals, `==`, to check equality.

In [11]:
5*4 == 2*10

True

Modulo operator `%` returns the remainder after integer division.  
e.g. 13/5 = 2 with 3 leftover, so `13%5=3`.

In [12]:
7%3

1

In [13]:
4 % 2

0
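
Floor division and modulo fit together: the quotient from `//` and the remainder from `%` recombine to give the original number. A short sketch using the 13 and 5 from the text above (the variable names are illustrative):

```python
quotient = 13 // 5     # floor division
remainder = 13 % 5     # what is left over

print(quotient, remainder)
# Recombining the two parts gives back the original number
print(quotient * 5 + remainder == 13)
```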

### Python error reports

e.g. when attempting to join a **string** and an **integer**

In [14]:
"Hello" + 3

TypeError: must be str, not int
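
One way to avoid this error, sketched below, is to convert the integer to a string with `str` before joining. The variable name `greeting` is only for illustration.

```python
# Convert the integer 3 to the string '3', then join the two strings
greeting = "Hello" + str(3)
print(greeting)
```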

### Exercise 1
a) In the empty cell below write a single line Python expression to print "Hello world! My name is", joined with another string containing your name

In [2]:
print("Hello world! My name is " + "David Weir")

Hello world! My name is David Weir


**b) Use Python as a calculator - practice using addition, subtraction, multiplication, division and exponentiation and check that you get the results you expect**

## Python identifiers
**Assign a variable name to any value (eg string, integer, float) using a single equals sign**

In [15]:
student_name = "Adam"
student_name

'Adam'

In [None]:
student_age = 21
student_age

**Operations can be carried out as before, using the variable names**

In [None]:
student_age/2

**We can update the value associated with a variable using the operators `+=`, `-=`, `/=` and `*=`**

**For example, `+=` adds the number on the right to the current value**

**This is a useful shortcut - take your time to play around and familiarise yourself with this syntax**

In [None]:
#Note that each time you run this cell, it will add 5 to the stored value.
student_age+=5
student_age

In [None]:
age_next_year=student_age+1
age_next_year
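
The other update operators follow the same pattern as `+=`. The sketch below uses an illustrative variable called `score` (not part of the notebook) and applies each operator in turn:

```python
score = 10      # illustrative starting value
score += 5      # same as score = score + 5
score -= 3      # same as score = score - 3
score *= 2      # same as score = score * 2
score /= 4      # same as score = score / 4 (division gives a float)
print(score)
```

Because `/` always produces a float, the final value is a float even though every number involved is a whole number.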

### Exercise 2

**Create variables called `my_name` (a string), `my_age` and `years_at_sussex` (both integers)**

**Subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`**

**Practice using the `+=`, `-=`, `/=` and `*=` operators to update these values**

## Dynamic typing
**The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.**

In [None]:
type(student_name)

In [None]:
type(student_age)

**As Python has dynamic typing, if a variable name is assigned to a new value of different type, the variable's type will change accordingly**

In [None]:
student_age = "Twenty"
type(student_age)

**Try it for yourself - reassign your `my_age` and `years_at_sussex` `int` variables to strings giving the number in words. Check the type of these variables before and after**

## Lists

**Lists are initialised using square brackets, with objects separated by commas**

In [18]:
primes = [2, 3, 5, 7, 11]
type(primes)

list

**Lists can contain any data type**

In [None]:
list_of_strings =['string','another string','a third string']
list_of_strings

**'Empty' lists with no elements can also be initialised**

In [34]:
empty_list = []

### Indexing

**Also uses square brackets**

**Indexing starts from zero**

In [19]:
primes[0]
    

2

**Use indexing with colon : to take a slice of list between two indices**
**Note that this will start from the first index, up to but NOT including the second index**

In [None]:
primes[1:3]

**If either index is omitted, the slice will go to the beginning/end of the list**

In [None]:
primes[:3]

### Indexing from the end of the list

**Using negative numbers**

In [20]:
primes[-1]

11

In [21]:
primes[-2:]

[7, 11]
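
The slicing forms above can be compared side by side. This is a minimal sketch on an illustrative list called `letters`, chosen so that the positions are easy to follow:

```python
letters = ['a', 'b', 'c', 'd', 'e']   # illustrative list

middle = letters[1:3]     # indices 1 and 2 (index 3 is not included)
first_two = letters[:2]   # from the start, up to but not including index 2
last_two = letters[-2:]   # from the second-to-last element to the end
from_two = letters[2:]    # from index 2 to the end

print(middle, first_two, last_two, from_two)
```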

### Testing for list membership

**Using the keyword `in`**

In [22]:
5 in primes

True

In [23]:
6 in primes

False

### Getting the number of elements in a list

**Using the function `len`**

In [24]:
len(primes)

5

### Appending an element to a list

In [26]:
primes.append(13)

In [27]:
primes.append(17)

In [28]:
primes

[2, 3, 5, 7, 11, 13, 17]

### Appending a list to another list  

**Using `append` with a list as parameter adds the list as a single element - a 'list of lists'**

In [29]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

[2, 3, 5, 7, 11, 13, [17, 19]]

**So if we want to add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists**

In [30]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes

[2, 3, 5, 7, 11, 13, 17, 19]
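
The list method `extend` is an alternative to `+=` for this purpose. The sketch below uses an illustrative list called `evens` so that `primes` is left unchanged:

```python
evens = [2, 4, 6]        # illustrative list
evens.extend([8, 10])    # adds each element individually, like +=
print(evens)
print(len(evens))
```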

### Iterate over a list

**To write a for loop in Python, use the keywords `for` and `in`, a colon `:`, and indentation**

In [31]:
for prime in primes:
    print (prime,"is a prime")

2 is a prime
3 is a prime
5 is a prime
7 is a prime
11 is a prime
13 is a prime
17 is a prime
19 is a prime


### Exercise 3

**Make a list of square numbers from 1 to 16 inclusive**

**Append the next square number to this list**

**Make a list of the next two square numbers and concatenate this with the original list**

**Check how many items are in the list now**

**Use indexing to print just the first 3 and last 3 items in the list**

**Print each item in the list on its own line, as part of a sentence**

## Strings

In [35]:
# Here we save a string "Hello World" into a variable called hello_world
hello_world = "Hello World"

### Indexing into a string

**Note the similar syntax to list indexing; here it works on a character-by-character basis**

In [None]:
hello_world[0]

In [None]:
hello_world[7]

In [None]:
hello_world[-3:]

In [None]:
hello_world[-6]

### Testing for substring presence

**Using the keyword `in`**

In [36]:
"w" in hello_world

False

In [37]:
"W" in hello_world

True

In [38]:
"llo" in hello_world

True

### Length of a string

**Again, we use `len`. Note that the output value is a count including spaces, tabs and non-alphanumeric characters**

In [39]:
len(hello_world)

11

In [40]:
hello_world+="!"
hello_world

'Hello World!'

In [41]:
len(hello_world)

12

### Printing strings

**The notebook will automatically print the value of the last expression in a cell. Other Python development environments will usually require an explicit call to `print`**

In [42]:
print(hello_world)

Hello World!


### Iterate over a string

**Note the similar syntax to list iteration; here it works on a character-by-character basis**

In [43]:
for char in hello_world:
    print ("the character >>>", char, "<<< is present")

the character >>> H <<< is present
the character >>> e <<< is present
the character >>> l <<< is present
the character >>> l <<< is present
the character >>> o <<< is present
the character >>>   <<< is present
the character >>> W <<< is present
the character >>> o <<< is present
the character >>> r <<< is present
the character >>> l <<< is present
the character >>> d <<< is present
the character >>> ! <<< is present


### Parsing a string into words
**The `split` method returns a list of the tokens in a string. By default, it splits on whitespace**

In [44]:
sentence = "This is a sample sentence"
words = sentence.split()
print (words)

['This', 'is', 'a', 'sample', 'sentence']
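
`split` can also be given a separator as an argument, in which case it splits on that separator instead of whitespace. A small sketch, using an illustrative comma-separated string:

```python
record = "Bart,10,Springfield"   # illustrative comma-separated string
fields = record.split(",")       # split on commas instead of whitespace
print(fields)
print(len(fields))
```

Note that every field is still a string, including `'10'`.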


### Checking the presence of a token in a list of words

**`words` is just a list, so use the `in` keyword, as before**

In [45]:
"sample" in words

True

In [46]:
"Hello" in words

False

### Exercise 4: 
**Make a variable called opening_line, set to a string containing the following sentence: "It was the best of times, it was the worst of times"**

**Check whether 'worst' appears in opening_line**

**Make a list of the words in `opening_line`, assigned to the variable name `dickens_words`, and iterate over `dickens_words`, printing one word per line**

**Check whether 'blurst' appears in the list you made**

### Other Python types: functions

**Functions are defined using the keyword 'def', a function name, and a list of parameters in parentheses. Don't forget the ':'**

**The body of the function starts on the next line, and must be indented**

In [47]:
def double(number):
    return number * 2

In [48]:
double(13)

26

In [49]:
type(double)

function

In [50]:
def add_question_mark(string):
    return string+"?"

In [51]:
add_question_mark("what's your name")

"what's your name?"

In [52]:
def print_first_half(string):
    half_length_of_string = len(string)//2 #use floor division as indices must be integers
    return string[:half_length_of_string]

In [53]:
print_first_half('hi how are you doing?')

'hi how are'
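
Functions can take more than one parameter, and a parameter can be given a default value that is used when the caller leaves it out. The function `greet` below is a hypothetical example, not part of the lab code:

```python
def greet(name, greeting="Hello"):
    # greeting has a default value, so it is optional when calling
    return greeting + " " + name

default_greeting = greet("Adam")
custom_greeting = greet("Adam", "Goodbye")
print(default_greeting)
print(custom_greeting)
```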

### Exercise 5

**a) Define a function called `square` that returns its input parameter squared (hint: check the 'Basic operations' section above for the Python syntax for exponentiation)**

**b) Define a function that takes a sentence string as input, and returns a list of the words in the sentence**

# Other Python types: classes

In [55]:
#The pass statement is a null operation used here as a placeholder; class definitions would go here
class Example:
    pass

In [56]:
Example

__main__.Example

In [57]:
type(Example)

type

**Creating an instance of a class (remember every class defines a type)**

In [58]:
my_example = Example()

In [59]:
my_example

<__main__.Example at 0x10e7346d8>

In [60]:
type(my_example)

__main__.Example
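
A class usually does more than `pass`. As a rough sketch of what a fuller definition might look like, the hypothetical `Student` class below stores two pieces of data for each instance and defines one method. The class, its attributes and the method name are all assumptions made for illustration:

```python
class Student:
    # __init__ runs when an instance is created and stores its data
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # A method is a function that belongs to the class
    def describe(self):
        return self.name + " is " + str(self.age)

adam = Student("Adam", 21)
description = adam.describe()
print(description)
print(type(adam))
```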

**It's easy to mix types. Here is a mixed type list:**

In [61]:
[21, "Brighton", double, Example, my_example, []]

## Sets  

**These are *unordered* collections of *unique* elements (note the curly brackets)**

In [63]:
unique_numbers = {1, 2, 2, 2, 3}
unique_numbers

{1, 2, 3}

In [64]:
type(unique_numbers)

set

**To initialise an empty set, use `set()`**

In [65]:
new_set = set()
type(new_set)

set

### Number of elements in a set - what does it count?

In [66]:
len(unique_numbers)

3

### Checking the presence of an element in a set

**Do this using the keyword `in`, as with lists**

In [67]:
2 in unique_numbers

True
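
Elements can be added to an existing set with the `add` method. Because a set only holds unique elements, adding something that is already present changes nothing. A minimal sketch, using an illustrative set called `colours`:

```python
colours = {"red", "green"}   # illustrative set
colours.add("blue")          # a new element is added
colours.add("red")           # already present, so the set is unchanged
print(len(colours))
print("blue" in colours)
```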

### Iterating over a set

**The syntax is also similar to iterating over a list: remember to use `for`, `in`, `:` and indentation**

In [68]:
for number in unique_numbers:
    print (number * 3)


3
6
9


In [69]:
for number in unique_numbers:
    print (double(number))

2
4
6


###  Exercise 6
**Create a function called get_vocabulary that takes a *list* of words as input, and returns a *set* of unique words**

**Use your function to create `dickens_vocab`, the set of unique words in `opening_line` (from Exercise 4)**

## Dictionaries

**A dictionary is a collection of `key : value` pairs. Keys are used to index the dictionary, and the main operations are storing a value with a key, and then extracting a specific value using its key. Each key in a given dictionary must be unique. Since Python 3.7, dictionaries keep their entries in insertion order; in earlier versions the order was not guaranteed.**

**A dictionary is initialised with curly braces. This can contain comma-separated key:value pairs. Note the use of ':' to map a key to a value**

In [70]:
simpsons_ages = {"Bart":10, "Lisa":8, "Homer" : "thirty something"}
simpsons_ages

{'Bart': 10, 'Homer': 'thirty something', 'Lisa': 8}

In [71]:
type(simpsons_ages)

dict

### How to access the values of keys in a dictionary

In [72]:
simpsons_ages["Homer"]

'thirty something'

In [None]:
simpsons_ages['Bart']

### Getting the number of elements in a dictionary

**Just like getting the length of a list, we use the function `len`**

In [73]:
len(simpsons_ages)

3

### Checking the presence of a key in a dictionary

In [None]:
"Marge" in simpsons_ages

In [None]:
"Bart" in simpsons_ages

In [74]:
simpsons_ages["Lisa"]

8

### Accessing a key that does not exist is an error

In [75]:
simpsons_ages["Krusty"]

KeyError: 'Krusty'
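
If a key might be missing, the dictionary method `get` looks it up without raising an error. The sketch below uses a small illustrative dictionary called `ages` rather than `simpsons_ages`:

```python
ages = {"Bart": 10, "Lisa": 8}             # illustrative dictionary

lisa_age = ages.get("Lisa")                # key present: returns its value
krusty_age = ages.get("Krusty")            # key absent: returns None
krusty_or_default = ages.get("Krusty", "unknown")   # key absent: returns the given default

print(lisa_age, krusty_age, krusty_or_default)
```

Unlike a `defaultdict` lookup, `get` does not add the missing key to the dictionary.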

### Adding a new key:value entry to the dictionary

In [76]:
simpsons_ages["Marge"] = 34
simpsons_ages["Marge"]

34

### Exercise 7
**Add two extra key-value pairs to the dictionary, each consisting of a name and corresponding age**

### Iterating over keys in the dictionary

In [None]:
for person in simpsons_ages: 
     print (person)

### Iterating over the items (key-value pairs) of a dictionary

In [77]:
for item in simpsons_ages.items():
     print (item)

('Bart', 10)
('Lisa', 8)
('Homer', 'thirty something')
('Marge', 34)


### Iterating over the keys and values of a dictionary

In [78]:
#Note that 'person' and 'age' here are arbitrary variable names, and can be replaced with any two names, e.g. 'key' and 'value'
for person, age in simpsons_ages.items():
     print (person," is ", age, " years old")

Bart  is  10  years old
Lisa  is  8  years old
Homer  is  thirty something  years old
Marge  is  34  years old


### Exercise 8
**Make a new dictionary called 'shapes' where the keys are names of shapes and the values are the corresponding number of sides**

**Iterate over the keys and values, printing each key and value in a sentence (eg 'a triangle has 3 sides')**

### Making dictionaries which have a default value if not specified

In [80]:
# To do this, we need to use a class that is not built-in, so we import it
import collections
word_counts = collections.defaultdict(int)
# the "int" parameter will create entries with a default value of 0
word_counts

defaultdict(int, {})

In [81]:
type(word_counts)

collections.defaultdict

In [82]:
len(word_counts)

0

In [83]:
"This" in word_counts

False

In [84]:
word_counts["This"]

0

In [85]:
# an entry has been automatically created with the default value of 0, just by querying the default dictionary
"This" in word_counts

True

In [86]:
len(word_counts)

1

In [87]:
# we can add a new entry with a value of 1
word_counts["is"] += 1
#querying this key in the default dictionary makes an entry with the default value of 0, and we add 1 to this
word_counts["is"]

1

In [None]:
# we can also update the value of a key
word_counts["is"] += 5
word_counts["is"]

6
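
The default value does not have to be a number. As a sketch, passing `list` instead of `int` gives every new key an empty list as its starting value, which is handy for grouping items. The data and the name `words_by_letter` are illustrative:

```python
import collections

# Group words by their first letter; each new key starts as an empty list
words_by_letter = collections.defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    words_by_letter[word[0]].append(word)

print(words_by_letter["a"])
print(words_by_letter["b"])
```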

### Exercise 9: 

**Write a Python program that will print, one word per line, each word in dickens_words together with the number of times that word appears in dickens_words.**

# Files
### Files have a file path
**Here we save a string containing a file path to a variable**

In [None]:
#Make sure the file path points to a valid file
input_file_path = "N:/nle_notebooks/sample_text.txt"

**We now use the file path variable to `open` the file. We need to do this before reading from or writing to it.**

In [None]:
input_file = open(input_file_path)
type(input_file)

**Use the `read` method to read the entire file contents into a `str` variable called `input_text`**

In [None]:
input_text = input_file.read()
type(input_text)

In [None]:
input_text

### When you are done with the file, close it

In [None]:
input_file.close()

### After the file has been closed it cannot be read any more

In [None]:
input_text = input_file.read()
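
A common way to make sure a file is always closed is to open it in a `with` block, which closes the file automatically when the block ends. The sketch below first writes a small temporary file so that it does not depend on `sample_text.txt`; the file name and contents are made up for illustration.

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained
example_path = os.path.join(tempfile.gettempdir(), "nle_with_example.txt")
with open(example_path, "w") as output_file:
    output_file.write("a short sample text")

# The file is closed automatically at the end of the with block
with open(example_path) as example_file:
    example_text = example_file.read()

print(example_text)
print(example_file.closed)

# Tidy up the temporary file
os.remove(example_path)
```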

### Exercise 10: 
**Write code that will open the file "sample_text.txt", then print, one word per line, each word in the file together with the number of times that word appears in the file.**

# Tuples

**A tuple consists of a number of values separated by commas. These can be different types. It is initialised with parentheses, containing its objects separated by commas.**

In [None]:
person = ("Jon", 14, "jon@thewall.com")
person

In [None]:
type(person)

### Counting the number of elements in a tuple

In [None]:
len(person)

### Indexing into a tuple
**Works similarly to indexing into a list**

In [None]:
person[0]

In [None]:
person[-2:]

### Tuples as values in dictionaries

In [None]:
#Note that each key is a string, and each value is a tuple
people = {"Joffrey":(12, "Baratheon", "joff@kingslanding.com"), "Jon":(14, "Snow", "jon@thewall.com")}
people["Joffrey"]

In [None]:
### Jon's age - we access this using the dictionary key, and then indexing within the value:
people["Jon"][0]

In [None]:
### Joffrey's email
people["Joffrey"][2]

In [None]:
#  list everyone's first and last names:
for person, record in people.items():
     print (person, record[1])
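
Instead of indexing into a tuple, its elements can be assigned to separate variables in one step, often called unpacking. A minimal sketch, using an illustrative tuple with the same shape as the values in `people`:

```python
record = (14, "Snow", "jon@thewall.com")   # illustrative tuple
age, surname, email = record               # one variable per element
print(surname, "is", age)
print(email)
```

This is the same mechanism that lets a `for` loop over `.items()` use two loop variables.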

### Exercise 11
**Create a dictionary called **address_book**, with at least 3 key-value entries. Each should consist of a person's name in string format (the key), and a tuple with corresponding pieces of information about them (the value)**

**Iterate over the address book, printing information about each person into a sentence**

# The range function

**This produces a sequence of numbers in a specified range. The numbers are generated on demand rather than stored in a list**

In [88]:
indices = range(0,5)
indices

range(0, 5)

In [89]:
type(indices)

range

In [90]:
len(indices)

5

**The output from `range` can be used as a sequence of indices**

In [91]:
for i in indices:
    print (words[i])

This
is
a
sample
sentence


**If `range` is given a single parameter, it will create a range starting from zero**

In [None]:
for i in range(10):
    print (i)
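
`range` also accepts an optional third argument, the step size. Wrapping a range in `list` shows all of its values at once. A short sketch with illustrative variable names:

```python
from_two = list(range(2, 6))         # start at 2, stop before 6
every_third = list(range(0, 10, 3))  # the third argument is the step size
print(from_two)
print(every_third)
```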

**Use `range` to print a list of the first 10 integers**

**Use `range` to print a list of the first 10 cubes**

# The zip function

**The zip function is used to 'match' the corresponding elements between multiple iterables.** 

**It takes multiple iterables as arguments, and returns an iterator of tuples, where the i-th tuple consists of the i-th element from each of the input iterables**

**In the example below, we 'zip together' `words` and `indices` into a series of tuples called `word_positions`. For example, the 3rd element of `word_positions` contains the 3rd element of `words` and the 3rd element of `indices`.**

In [None]:
word_positions = zip(words, indices)
type(word_positions)

In [None]:
for word, position in word_positions:
     print ("'",word, "' is in position", position)
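
In Python 3, `zip` returns an iterator, which can only be read through once. If the pairs are needed more than once, one option is to store them in a list. The sketch below uses small illustrative inputs to show the difference:

```python
pairs = zip(["a", "b"], [1, 2])   # illustrative inputs
first_pass = list(pairs)          # reads every pair from the iterator
second_pass = list(pairs)         # the iterator is now used up

print(first_pass)
print(second_pass)
```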


### Exercise 12
**Use `range` to produce a list of numbers 1-12 called `one_to_twelve`**

**Use `zip` to produce a series of tuples called `words_and_numbers` containing elements from `one_to_twelve` and `dickens_words`**

**Iterate over `words_and_numbers`, printing each word and its corresponding number as part of a sentence**

# The map function

**This takes a function and an iterable (e.g. a list) as arguments. It then applies the function to every item of the iterable. Finally it returns an iterator over the results.**

In [None]:
#First we make a function, which we will pass to the map function in the next cell
natural_numbers = range(5)
def square(n):
    return n**2

square(5)

In [None]:
squared_numbers = map(square, natural_numbers)
for i in squared_numbers:
    print (i)

In [None]:
def decorate(char):
     return "*" + char + "*"

decorate("A")

In [None]:
decorated_characters = map(decorate, "Hello")
type(decorated_characters)

In [None]:
decorated_characters = map(decorate, "Hello")
for char in (decorated_characters):
     print (char)
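
`map` works with built-in functions as well as ones you define yourself, and its result can be turned into a list with `list`. A small sketch that applies the built-in `len` to each word in an illustrative list:

```python
sample_words = ["a", "sample", "sentence"]     # illustrative list
word_lengths = list(map(len, sample_words))    # apply len to every word
print(word_lengths)
```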

**Write a function called `add_exclamation` which adds a '!' to the input string**

**Use `map` and `add_exclamation` to print each word in `dickens_words`, followed by an exclamation point**

# The if statement

In [None]:
if 2 > 3:
    print ("yes")
else:
    print ("no")
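
When there are more than two cases, `elif` (short for "else if") adds further tests between `if` and `else`. Only the first branch whose condition is true is run. The function `describe_number` below is a hypothetical example:

```python
def describe_number(n):
    # The conditions are checked from top to bottom
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

print(describe_number(-4))
print(describe_number(0))
print(describe_number(7))
```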

**String shape functions**

In [None]:
"This".isalpha()

In [None]:
"This,".isalpha()

In [None]:
"M25".isalpha()

In [None]:
"M25".isalnum()

In [None]:
"463".isdigit()

# Truth in Python

In [138]:
# non zero numbers are TRUE
print ("yes" if 15 else "no")

yes


In [139]:
# zero is FALSE
print ("yes" if 0 else "no")

no


In [140]:
# non empty lists are TRUE
print ("yes" if ["one element"] else "no")

yes


In [141]:
# the empty list is FALSE
print ("yes" if [] else "no")

no


In [142]:
# non empty character strings are TRUE
print ("yes" if "Hello" else "no")

yes


In [143]:
# the empty string is FALSE
print ("yes" if "" else "no")

no


**Boolean statements can be combined using 'and'. Both must be true for the combination to be evaluated as True**

In [None]:
True and True

In [None]:
False and True

**Boolean statements can be combined using 'or'. At least one statement must be true for the combination to be evaluated as True**

In [None]:
False or True

In [None]:
True or False
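
In practice, `and` and `or` usually combine comparisons rather than literal `True` and `False` values, and `not` reverses a result. A minimal sketch, with illustrative variable names and thresholds:

```python
age = 21                                     # illustrative value
is_working_age = age >= 18 and age < 65      # both comparisons must hold
is_child_or_senior = age < 18 or age >= 65   # at least one must hold

print(is_working_age)
print(is_child_or_senior)
print(not is_working_age)                    # not turns True into False
```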

# List comprehension

**Create a list of squares**


In [92]:
[x**2 for x in range (4)]

[0, 1, 4, 9]

In [93]:
squares = [x*x for x in range(4)]
type(squares)

list

In [None]:
len(squares)

**Create a list of decorated characters**

In [94]:
["*" + char + "*" for char in "Hello"]

['*H*', '*e*', '*l*', '*l*', '*o*']

**Create a list of even numbers**


In [None]:
[double(n) for n in range(4)]

**Define a function that returns True for even numbers**

In [None]:
#Remember the modulo operator % returns the remainder after integer division
def is_even(n):
    return not n % 2

In [None]:
is_even(8)

In [None]:
is_even(7)

**Create a list of the squares of the even numbers below 15**

In [None]:
[square(n) for n in range(15) if is_even(n)]
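
A list comprehension with a condition is a compact way of writing a `for` loop that contains an `if`. The sketch below builds the same list both ways, writing the squaring and the evenness test inline so that it runs on its own; the variable names are illustrative.

```python
# The loop version: start with an empty list and append matching items
even_squares = []
for n in range(15):
    if n % 2 == 0:
        even_squares.append(n**2)

# The equivalent list comprehension
even_squares_comprehension = [n**2 for n in range(15) if n % 2 == 0]

print(even_squares)
print(even_squares == even_squares_comprehension)
```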

### Exercise 13
**Create a list of the odd numbers in the range 0-20**

**Create a list of numbers in the range 0-20 that are both odd AND divisible by 3**