# Week 1: Introduction to Python (Part 1)

This is a Jupyter notebook, a web-based interactive computational environment. 
- Cells can contain markdown or code. 
- To run a code cell, press shift+Enter. 
- Jupyter prints the output of a cell directly beneath it.

This session is designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. 

- Run all of the code cells as you work through the notebook. 
- Try to understand what is happening in each code cell and predict the output before running it.
- Complete all of the exercises.
- Discuss answers and ask questions!


## Python types

### String
Strings are enclosed in double or single quotes in Python.

In [None]:
print('Hello World')

In [None]:
print("Hello World")

In [None]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
print("'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'")

In [None]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
print('"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."')

As an alternative to calling `print` explicitly, Jupyter displays the value of the last line of code in a cell when the cell is run. Try running the following cell.

In [None]:
"Hello World"
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

### Integer

In [None]:
75

### Float

In [None]:
6.3646

When a string contains just digits, the function `int` will **cast** that string to an integer.

In [None]:
# give the type of the string '623'
type('623')

In [None]:
# cast the string '623' to an integer
int('623')

In [None]:
# give the type that results from casting the string '623' to an integer.
type(int('623'))
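The casts above can be extended in a small sketch. The variable names here are illustrative and not part of the notebook, and `try`/`except` is used only to show that `int` rejects a string that is not a whole number.

```python
# Casting between strings and numbers (illustrative values only)
digits = "623"
as_int = int(digits)          # string -> integer
as_float = float("6.3646")    # string -> float
back_to_str = str(as_int)     # integer -> string

# int() raises a ValueError when the string is not a valid integer
try:
    int("6.3646")
    cast_ok = True
except ValueError:
    cast_ok = False

print(as_int + 1, as_float, back_to_str, cast_ok)  # 624 6.3646 623 False
```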

## Basic operations

Strings can be joined using `+`

In [None]:
"Hello " + "World"

Standard operators are used on integers and floats: `+`, `-`, `*`, and `/`.

In [None]:
7 - 3 + 5

In [None]:
100*200*1000000

In [None]:
3.5*8/4

For floor division (the result rounded down to the nearest integer), use `//`.

In [None]:
7//2

Use `**` for exponentiation, e.g. `3**2` is 3 squared. Note that `^` is not exponentiation in Python.

In [None]:
# This is equivalent to 2*2*2*2*2
2**5

In [None]:
10**4

Use double equals, `==`, to check equality.

In [None]:
5*4 == 2*10

The modulo operator `%` returns the remainder after integer division.  
e.g. 13 divided by 5 is 2 with 3 left over, so `13 % 5` is `3`.

In [None]:
7%3

In [None]:
4 % 2
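As a rough illustration of how `/`, `//` and `%` relate, the sketch below rebuilds a number from its quotient and remainder. The variable names are made up for this example.

```python
a, b = 13, 5

quotient = a // b        # floor division: 2
remainder = a % b        # modulo: 3
true_division = a / b    # ordinary division always gives a float: 2.6

# The quotient and remainder together recover the original number
rebuilt = quotient * b + remainder   # 13

# Floor division rounds down, so negative results move away from zero
negative_floor = -7 // 2             # -4

# A common use of %: a number is even when the remainder after dividing by 2 is 0
is_even = (4 % 2 == 0)               # True

print(quotient, remainder, true_division, rebuilt, negative_floor, is_even)
```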

## Python error reports
Python reports an error when an operation is not supported, e.g. when attempting to join a **string** and an **integer**.

In [None]:
"Hello" + 3

### Exercise 1
In the empty cell below, write a single-line Python expression to print "Hello world! My name is", joined with another string containing your name.

In [7]:
print("Hello world! My name is", "Jacob")

Hello world! My name is Jacob


## Python identifiers
Assign a variable name to any value (e.g. string, integer, float) using a single equals sign.

In [None]:
student_name = "Adam"
student_name

In [None]:
student_age = 21
student_age

Operations can be carried out as before, using the variable names.

In [None]:
student_age/2

We can update values associated with a variable using the operators `+=` , `-=` , `/=`, and `*=`.

- For example, `+=` adds the number on the right to the current value.

This is a useful shortcut - take your time to play around and familiarise yourself with this syntax.

In [None]:
# Note that each time you run this cell, it will add 5 to the stored value.
student_age += 5
student_age

In [None]:
age_next_year = student_age + 1
age_next_year
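The update operators can be tried one after another, as in the sketch below. It uses a separate, made-up variable `example_age` so that it does not change `student_age`.

```python
example_age = 21

example_age += 5   # same as example_age = example_age + 5  -> 26
example_age -= 1   # same as example_age = example_age - 1  -> 25
example_age *= 2   # same as example_age = example_age * 2  -> 50
example_age /= 4   # same as example_age = example_age / 4  -> 12.5

# Division with / always produces a float, so the type has changed from int
print(example_age, type(example_age))  # 12.5 <class 'float'>
```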

### Exercise 2a
In the cell below, assign appropriate values to the variables `my_name`, `my_age`, and `years_at_sussex`.

In [15]:
my_name = "Jacob Cons"
my_age = 20
years_at_sussex = 0

### Exercise 2b
In the cell below subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`.

In [16]:
age_started_sussex = my_age - years_at_sussex

### Exercise 2c
In the cell below practice using the `**`,  `+=` , `-=`, `/=`, and `*=` operators to update these values.

In [17]:
my_age += my_age ** 2
years_at_sussex += 1
print(my_age, years_at_sussex)

420 1


## Dynamic typing
The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.

In [11]:
type(student_name)

NameError: name 'student_name' is not defined

In [12]:
type(student_age)

NameError: name 'student_age' is not defined

As Python has dynamic typing, if a variable name is assigned to a new value of a different type, the variable's type will change accordingly.

In [13]:
student_age = "Twenty"
type(student_age)

str
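A minimal sketch of the same idea, using a hypothetical variable `value` that is reassigned twice. The type belongs to the value currently stored, not to the variable name.

```python
value = 21
first_type = type(value)     # int

value = "Twenty-one"
second_type = type(value)    # str

value = 21.0
third_type = type(value)     # float

print(first_type.__name__, second_type.__name__, third_type.__name__)  # int str float
```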

### Exercise 3
In the cell below, reassign your `my_age` and `years_at_sussex` `int` variables to `str` values giving the number in words. Print the type of these variables before and after.

In [18]:
print(type(my_age), type(years_at_sussex))
# Reassign each variable to a string giving the number in words
my_age = "four hundred and twenty"
years_at_sussex = "one"
print(type(my_age), type(years_at_sussex))

<class 'int'> <class 'int'>
<class 'str'> <class 'str'>


## Lists

Lists are initialised using square brackets, with objects separated by commas.

In [None]:
primes = [2, 3, 5, 7, 11]
type(primes)

Lists can contain any data type.

In [None]:
list_of_strings =['string','another string','a third string']
list_of_strings

'Empty' lists with no elements can also be initialised.

In [None]:
empty_list = []

Indexing into lists uses square brackets.
- Note that indexing starts from zero.

In [None]:
primes[0]

A colon, `:`, can be used to take a slice of a list between two indices.
- Note that this will start from the first index, up to but NOT including the second index.

In [None]:
primes[1:3]

If either index is omitted, the slice will go to the beginning/end of the list.

In [None]:
primes[:3]

To index from the end of the list use negative numbers.

In [None]:
primes[-1]

In [None]:
primes[-2:]
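The indexing and slicing rules above can be summarised in one short sketch. The list `letters` is made up for this example so that `primes` is left unchanged.

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]         # 'a' (indexing starts from zero)
middle = letters[1:3]      # ['b', 'c'] (index 3 is not included)
start = letters[:3]        # ['a', 'b', 'c'] (from the beginning)
rest = letters[3:]         # ['d', 'e'] (to the end)
last = letters[-1]         # 'e' (counting from the end)
last_two = letters[-2:]    # ['d', 'e']

# Splitting a list at any index and rejoining the two slices gives the original list
same_again = letters[:2] + letters[2:] == letters   # True

print(first, middle, start, rest, last, last_two, same_again)
```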

To test for list membership use the keyword `in`.

In [None]:
5 in primes

In [None]:
6 in primes

The function `len` gives the length of a list.

In [None]:
len(primes)

To append an element to a list use `append`.

In [None]:
primes.append(13)

In [None]:
primes.append(17)

In [None]:
primes

Using `append` with a list as parameter adds the list as a single element - producing a list that contains a list as its last element.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

When we want to add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes
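The difference between `append` and `+=` can be seen side by side in the sketch below. It also uses the list method `extend`, which is not covered in this notebook but behaves like `+=` here. All list names are illustrative.

```python
nested = [2, 3, 5]
nested.append([7, 11])      # the whole list becomes ONE new element
# nested is now [2, 3, 5, [7, 11]]

flat = [2, 3, 5]
flat += [7, 11]             # each element is added individually
# flat is now [2, 3, 5, 7, 11]

also_flat = [2, 3, 5]
also_flat.extend([7, 11])   # same effect as += for lists

print(len(nested), len(flat), flat == also_flat)  # 4 5 True
```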

To write a `for` loop that iterates over a list, use the keywords `for` and `in`, followed by a colon `:`. Indentation indicates the scope of the body of the loop.

In [None]:
for prime in primes:
    print(prime,"is a prime")
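To show how indentation marks the body of a loop, here is a small sketch that keeps a running total. The names `small_primes` and `total` are made up for this example.

```python
small_primes = [2, 3, 5, 7]
total = 0

for prime in small_primes:
    # Both indented lines belong to the loop body and run once per element
    total += prime
    print(prime, "added, running total is", total)

# This line is not indented, so it runs once, after the loop has finished
print("Finished, total is", total)  # Finished, total is 17
```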

### Exercise 4a
In the cell below initialise the variable `squares` to be a list of the square numbers from 1 to 16 inclusive.

In [19]:
squares = [x**2 for x in range(1, 17)]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256]


### Exercise 4b
In the cell below append the next square number to the list `squares`.

In [22]:
squares.append(17**2)

### Exercise 4c
In the cell below make a list of the next two square numbers and concatenate this with `squares`.

In [21]:
squares += [18**2, 19**2]

### Exercise 4d
In the cell below, check how many items are in the list now.

In [23]:
len(squares)

19

### Exercise 4e
In the cell below, use indexing to print just the first 3 and last 3 items in the list `squares`.

In [25]:
print(squares[0:3], squares[-3:])

[1, 4, 9] [324, 361, 289]


### Exercise 4f
In the cell below, use a `for` loop to print each item in the list `squares` on its own line, as part of a sentence. The output should look like this:
```
The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49
```

In [33]:
print("The first square in the list is", squares[0])
for s in squares[1:-1]:
    print("The next square in the list is", s)
print("The last square in the list is", squares[-1])

The first square in the list is 1
The next square in the list is 4
The next square in the list is 9
The next square in the list is 16
The next square in the list is 25
The next square in the list is 36
The next square in the list is 49
The next square in the list is 64
The next square in the list is 81
The next square in the list is 100
The next square in the list is 121
The next square in the list is 144
The next square in the list is 169
The next square in the list is 196
The next square in the list is 225
The next square in the list is 256
The next square in the list is 324
The last square in the list is 361


## Strings

In [None]:
# Here we assign the string "Hello World" as the value of a variable called hello_world
hello_world = "Hello World"

String indexing is similar to list indexing, but works on a character-by-character basis.

In [None]:
hello_world[0]

In [None]:
hello_world[7]

In [None]:
hello_world[-3:]

In [None]:
hello_world[-40]

Test for substring presence using the keyword `in`.

In [None]:
"w" in hello_world

In [None]:
"W" in hello_world

In [None]:
"llo" in hello_world

Find the length of a string using `len`.

Note that the count includes spaces, tabs and non-alphanumeric characters.

In [None]:
len(hello_world)

In [None]:
hello_world+="!"
hello_world

In [None]:
len(hello_world)
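The string operations covered so far can be combined in one sketch. The string `phrase` is an arbitrary example chosen for illustration.

```python
phrase = "Natural Language Engineering"

first_char = phrase[0]       # 'N'
last_char = phrase[-1]       # 'g'
first_word = phrase[:7]      # 'Natural'
length = len(phrase)         # 28, counting the two spaces

# Substring tests are case-sensitive
has_capitalised = "Language" in phrase   # True
has_lowercase = "language" in phrase     # False

print(first_char, last_char, first_word, length, has_capitalised, has_lowercase)
```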

Iterating over a string involves similar syntax to list iteration, but works on a character-by-character basis.

In [None]:
for char in hello_world:
    print ("the character >>>", char, "<<< is present")

The `split` method provides a simplistic way to parse a string into words. By default, it separates on whitespace and returns a list of *tokens*. We will learn more about tokenisation in week 2.

An optional separator string can be passed to `split` as an argument. See the difference if you change the following cell so that the second line reads `words = sentence.split('s')`.



In [None]:
sentence = "This is a sample sentence"
words = sentence.split()
print(words)
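As a hedged illustration of why `split` is called simplistic, the sketch below splits a made-up string first on whitespace and then on a comma followed by a space. Punctuation stays attached to the tokens when splitting on whitespace.

```python
colours = "red, green, blue and yellow"

# Default: split on whitespace, so commas stay attached to the words
tokens = colours.split()
print(tokens)   # ['red,', 'green,', 'blue', 'and', 'yellow']

# With a separator argument: split wherever ", " occurs
parts = colours.split(", ")
print(parts)    # ['red', 'green', 'blue and yellow']

# Membership tests look for an exact match with a whole token
print("red" in tokens, "red," in tokens, "red" in parts)   # False True True
```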

To check for the presence of a token in a list of words use the `in` keyword.

In [None]:
"sample" in words

In [None]:
"Hello" in words

### Exercise 5a
In the empty cell below, assign the string `"It was the best of times, it was the worst of times"` to the variable `opening_line`.

In [35]:
opening_line = "It was the best of times, it was the worst of times"

### Exercise 5b
In the empty cell below, check whether `'worst'` appears in `opening_line`.

In [36]:
"worst" in opening_line.split()

True

### Exercise 5c
In the empty cell below make a list of the words in `opening_line`, assigned to the variable `dickens_words`, and iterate over `dickens_words`, printing one word per line.

In [37]:
dickens_words = opening_line.split()
for word in dickens_words:
    print(word)
    

It
was
the
best
of
times,
it
was
the
worst
of
times


### Exercise 5d
In the empty cell below check whether `'blurst'` appears in the list you made.

In [38]:
"blurst" in dickens_words

False

## Conditions and booleans

In [None]:
if 2 > 3:
    print ("yes")
else:
    print ("no")

There are some useful string *shape* methods, which form part of the string (`str`) class and can be used to test for certain types of string. Work out what each of the following tests for:
- `astring.isalpha()`
- `astring.isalnum()`
- `astring.isdigit()`

In [None]:
"This".isalpha()

In [None]:
"This,".isalpha()

In [None]:
"M25".isalpha()

In [None]:
"M25".isalnum()

In [None]:
"463".isdigit()

In [None]:
# non zero numbers are TRUE
print ("yes" if 15 else "no")

In [None]:
# zero is FALSE
print ("yes" if 0 else "no")

In [None]:
# non empty lists are TRUE
print ("yes" if ["one element"] else "no")

In [None]:
# the empty list is FALSE
print ("yes" if [] else "no")

In [None]:
# non empty character strings are TRUE
print ("yes" if "Hello" else "no")

In [None]:
# the empty string is FALSE
print ("yes" if "" else "no")
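The same truth values can be checked directly with the built-in function `bool`, which is not used elsewhere in this notebook. The list `samples` below simply collects the six values tested above.

```python
samples = [15, 0, ["one element"], [], "Hello", ""]

truth_values = []
for sample in samples:
    # bool() converts any value to True or False using the rules above
    truth_values.append(bool(sample))

print(truth_values)  # [True, False, True, False, True, False]
```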

Boolean statements can be combined using `and`. Both must be true for the combination to be evaluated as `True`.

In [None]:
True and True

In [None]:
False and True

Boolean statements can be combined using `or`. At least one statement must be true for the combination to be evaluated as `True`.

In [None]:
False or True

In [None]:
True or False

A boolean statement can be negated using `not`.

In [None]:
not True

In [None]:
not False
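In practice, `and`, `or` and `not` are usually applied to the results of comparisons rather than to literal `True` and `False`. The sketch below uses a hypothetical `age` value to illustrate this.

```python
age = 20

is_adult = age >= 18                    # True
is_teenager = age >= 13 and age <= 19   # False, because 20 > 19

adult_not_teen = is_adult and not is_teenager   # True
adult_or_teen = is_adult or is_teenager         # True
not_adult = not is_adult                        # False

# The result of a combination can be used directly as a condition
if adult_not_teen:
    print("yes")
else:
    print("no")
```

With `age = 20` this prints `yes`; try changing `age` to 16 and predict the result before running it.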