# Week 1: Introduction to Python (Part 1)

This is a Python notebook file 
- it has the file extension **.ipynb** 
- it can be run in a web-based interactive computational environment such as **Google Colab** or Anaconda's Jupyter notebook environment

Cells in a Python notebook can contain **markdown** (like this one) or **code**.
- To run a cell, press shift+Enter (or click on **Run**). 
- The output from a code cell will be printed beneath it.
- If you run a markdown cell, the text will be formatted according to the markdown instructions.  
    - If you are using Colab, click edit on this cell and see how the text was actually input.
    - If you are using Anaconda, double click on this cell and see how the text was actually input.


The notebooks this week are designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. 

- Run all of the code cells as you work through the notebook. 
- Try to understand what is happening in each code cell and predict the output before running it.
- Get used to adding your own cells (both code and text) wherever you want to try things out.
    - You can use the + icon to add a cell. Find out how to change between adding markdown and code cells in whichever environment you are using.
- Complete all of the exercises.
- Discuss answers and ask questions!

Add some more notes here


## 1.1.1 Python types

We are going to start by looking at some basic datatypes in Python.
- String
- Integer
- Float

### Strings
A String is a datatype used to represent text.  In Python, Strings can be enclosed in double or single quotes.
- 'Hello World'
- "Hello World"  

Quite often we want to display or print strings to output - we can do this with Python's built-in *print()* function.  We will look more at functions later - but for now, *print()* is a function which takes one or more arguments (specified in the () after the keyword *print*).  The arguments will be printed in the output when the cell is run.

Run the code in the cells below by clicking on them and then pressing "shift"+"enter" (or by clicking on the play button next to the cell in Google Colab).

In [1]:
print('Hello World')

Hello World


In [2]:
print("Hello World")

Hello World


In [3]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
print("'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'")

'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'


In [4]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
print('"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."')

"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."


As an alternative to using the explicit `print` function, when a cell is run, the notebook will display the value of the **last line of code** in a cell, provided that line is an expression. Values on earlier lines are not displayed. Try running the following cell.

In [5]:
"Hello World"
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

### Integers
Integers represent whole numbers

In [6]:
75

75

### Floats
Floats represent decimal or floating point numbers

In [7]:
6.3646

6.3646

When a string contains just digits, the function `int` will **cast** that string to an integer.

In [8]:
# give the type of the string '623'
type('623')

str

In [9]:
# cast the string '623' to an integer
int('623')

623

In [10]:
# give the type that results from casting the string '623' to an integer.
type(int('623'))

int

Thinking about types is really important as the type of a variable or value determines what operations can be carried out on it.  
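As a small sketch of why this matters, the cell below converts between the basic types using the built-in functions `int`, `float` and `str`. The variable names are illustrative and are not used elsewhere in this notebook.

```python
# Illustrative names; not used elsewhere in this notebook.
digits = '623'

as_int = int(digits)        # string -> integer
as_float = float('6.3646')  # string -> float
back_to_str = str(as_int)   # integer -> string

print(as_int + 1)           # arithmetic works on the integer
print(back_to_str + '1')    # '+' joins the two strings
```

The same `+` symbol gives `624` for the integer and `'6231'` for the string, because the operation depends on the type of the values.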

## 1.1.2 Basic operations

Now we will look at some operations which can be carried out on the basic datatypes.

Strings can be joined using `+`

In [11]:
"Hello " + "World"

'Hello World'

Standard mathematical operators can be used on integers and floats: `+`, `-`, `*`, and `/`.

In [12]:
7 - 3 + 5

9

Note that '+' is overloaded.  If it is applied to strings then it causes them to be concatenated.  If it is applied to integers or floats then it causes them to be added.

In [13]:
'6'+'2'

'62'

In [14]:
6+2

8

In [15]:
100*200*1000000

20000000000

In [16]:
3.5*8/4

7.0

If we want to use floor division (rounded down to nearest integer), we use `//`.

In [17]:
7//2

3

Use `**` for exponentiation - e.g. `3**2` is 3 squared. Note that `^` is a different operator in Python and does not perform exponentiation.

In [18]:
# This is equivalent to 2*2*2*2*2
2**5

32

In [19]:
10**4

10000

Use double equals, `==`, to check equality.

In [20]:
5*4 == 2*10

True

The modulo operator `%` returns the remainder after integer division.  
e.g. 13 divided by 5 is 2 with 3 left over, so `13 % 5` gives `3`.

In [21]:
7%3

1

In [22]:
4 % 2

0
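The three division-related operators can be compared side by side. The sketch below uses illustrative variable names and shows that the floor quotient and the remainder together reconstruct the original number.

```python
dividend = 13   # illustrative values
divisor = 5

true_quotient = dividend / divisor     # '/' always gives a float
floor_quotient = dividend // divisor   # whole-number part of the division
remainder = dividend % divisor         # what is left over

print(true_quotient, floor_quotient, remainder)
# The floor quotient and remainder reconstruct the original number.
print(floor_quotient * divisor + remainder == dividend)
```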

## 1.1.3 Python error reports

Sometimes your code won't work and will generate an error.  You need to get used to looking at error reports and seeing what the type of error is and where it has occurred.

For example, in the code below a `TypeError` occurs when attempting to concatenate a **string** and an **integer**.  This is because the operator `+` cannot combine a string with a number: both values must be strings (concatenation) or both must be numbers (addition).

In [23]:
"Hello" + 3

TypeError: can only concatenate str (not "int") to str
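One way to avoid this error, sketched below with an illustrative variable `count`, is to convert one of the values so that both sides of `+` have compatible types.

```python
count = 3  # illustrative value

# Convert the integer to a string before concatenating...
message = "Hello " + str(count)
print(message)

# ...or convert a digit string to an integer before adding.
total = int("3") + count
print(total)
```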

### **Exercise 1**
In the empty cell below write a single line Python expression to print "Hello world! My age is", joined with another string containing your age.

In [24]:
"Hello world! My age is "+str(21)

'Hello world! My age is 21'

## 1.1.4 Python identifiers

Normally, we want to store values in variables so we can use them later.

We can create a new variable (with any name) and **assign** a value to it (e.g., string, integer, float) using a single equals sign.


In [25]:
student_name = "Adam"

The code above didn't generate any output - it just stored the string value "Adam" in the variable called *student_name*.  To see the current value of any variable, we can use the print function or just run a cell containing the variable name alone (or as the last line of the cell).

In [26]:
print(student_name)

Adam


In [27]:
student_name

'Adam'

You can use any name you like for a variable.  Note that variable names differ from Strings.  Strings are values which are enclosed in either single or double quotes.  Variable names do not have quotes.  A common mistake is to forget the quotes when you assign a String to a variable - this will generate an error (assuming you don't also have a variable with this name): 

In [28]:
student_name=Bill

NameError: name 'Bill' is not defined

In [29]:
student_name="Bill"

Be careful not to choose names which are also Python keywords or the names of built-in functions such as `print`.  These are highlighted in a different colour in the notebook (usually green).  

In [30]:
print = "Adam"

I have now overwritten the inbuilt function print with a String.  This means I can no longer use the *print()* function.

In [31]:
print(student_name)

TypeError: 'str' object is not callable

Even if you go back and change or delete the offending cell, the print function appears to be gone.  

Go back and try deleting the cell where we assigned the value "Adam" to print, and then calling the print function again.

In [32]:
print(student_name)

TypeError: 'str' object is not callable

You can fix this by using the `del` statement or by restarting the runtime environment.  To restart the runtime environment, go to the **Kernel** menu (the **Runtime** menu in Colab) and select the restart option.  Note, the **kernel** is the name used for the server or process which is actually running your notebook and your cells of code.  By restarting it, you are effectively rebooting and making everything as if you had just opened the notebook (and not run any cells). 

In [33]:
del print
print(student_name)

Bill


So, when choosing variable names, remember:
- don't use keywords or the names of built-in functions (or anything which might be one).  
- you could add numbers or extra words to the ends of your variable names to avoid accidentally overwriting built-in names

It is also best to: 
- use meaningful variable names.  This will help you remember what the variables store (and help other people read your code)
- use _ to join separate words to form a single variable name.  This is a Python convention which is different to the convention of using camelCase (e.g., studentName) in other languages such as Java.
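If you are unsure whether a name is already taken, one possible check uses the standard library modules `keyword` and `builtins`. This is only a sketch, and the result variable names are illustrative.

```python
import builtins
import keyword

# 'for' is a reserved keyword: it cannot be used as a variable name at all.
for_is_keyword = keyword.iskeyword('for')
# 'print' is not a keyword, which is why Python allowed us to overwrite it...
print_is_keyword = keyword.iskeyword('print')
# ...but it is the name of a built-in function.
print_is_builtin = hasattr(builtins, 'print')

print(for_is_keyword, print_is_keyword, print_is_builtin)
```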

In [34]:
student_age = 21
student_age

21

Operations can be carried out as before, using the variable names.

In [35]:
student_age/2

10.5

We can update values associated with a variable using the operators `+=` , `-=` , `/=`, and `*=`.

- For example, `+=` adds the number on the right to the current value.

This is a useful shortcut - take your time to play around and familiarise yourself with this syntax.

In [36]:
#Run this cell multiple times to see what happens.
#Note that each time you run this cell, it will add 5 to the stored value.
student_age += 5
student_age

26

In [37]:
age_next_year=student_age+1
age_next_year

27
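Each update operator is shorthand for an ordinary assignment. The sketch below uses an illustrative variable `score` and notes the equivalent long form in the comments.

```python
score = 10        # illustrative variable

score += 5        # same as: score = score + 5   (now 15)
score -= 3        # same as: score = score - 3   (now 12)
score *= 2        # same as: score = score * 2   (now 24)
score /= 4        # same as: score = score / 4   (now 6.0, a float)
print(score)
```

Note that `/=` turns the stored value into a float, because `/` always produces a float.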

### **Exercise 2a**
In the cell below, assign appropriate values to the variables `my_name`, `my_age`, and `years_at_sussex`.

In [39]:
my_name="Julie"
my_age=21
years_at_sussex=20

### **Exercise 2b**
In the cell below subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`.

In [40]:
age_started_sussex=my_age-years_at_sussex
age_started_sussex

1

### **Exercise 2c**
In the cell below practice using the `**`,  `+=` , `-=`, `/=`, and `*=` operators to update these values.

In [41]:
my_age+=2
my_age

23

In [42]:
years_at_sussex*=2
years_at_sussex

40

In [43]:
age_started_sussex-=10
age_started_sussex

-9

## 1.1.5 Dynamic typing
The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.

In [44]:
type(student_name)

str

In [45]:
type(student_age)

int

As Python has dynamic typing, if a variable name is assigned to a value of a different type, the variable's type will change accordingly.

In [46]:
student_age = "Twenty"
type(student_age)

str

### **Exercise 3**
In the cell below reassign your `my_age` and `years_at_sussex` `int` variables to `string` giving the number in words. Print the type of these variables before and after.

In [47]:
print(type(my_age))
my_age="twenty one"
print(type(my_age))

<class 'int'>
<class 'str'>


## 1.1.6 Lists

We are now going to look at a more complex data structure.  A list is an ordered collection of other data types.

Lists are initialised using square brackets, with objects separated by commas.

In [48]:
primes = [2, 3, 5, 7, 11]
type(primes)

list

Lists can contain any data type.

In [49]:
list_of_strings =['string','another string','a third string']
list_of_strings

['string', 'another string', 'a third string']

'Empty' lists with no elements can also be initialised.

In [50]:
empty_list = []

Indexing into lists uses square brackets.
- Note that indexing starts from zero.

In [51]:
primes[0]

2

In [52]:
primes[2]

5

A colon, `:`, can be used to take a slice of a list between two indices.
- Note that this will start from the first index, up to but NOT including the second index.

In [53]:
primes[1:3]

[3, 5]

If either index is omitted, the slice will go to the beginning/end of the list.

In [54]:
primes[:3]

[2, 3, 5]

To index from the end of the list use negative numbers.

In [55]:
primes[-1]

11

In [56]:
primes[-2:]

[7, 11]
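The indexing and slicing rules above can be collected in one place. The sketch below uses an illustrative list of letters so that positions are easy to follow.

```python
letters = ['a', 'b', 'c', 'd', 'e']  # illustrative list

middle = letters[1:4]        # indices 1, 2 and 3 (4 is not included)
first_two = letters[:2]      # from the start, up to but not including index 2
from_third = letters[2:]     # from index 2 to the end
last = letters[-1]           # the last element
all_but_last = letters[:-1]  # everything except the last element

print(middle, first_two, from_third, last, all_but_last)
```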

To test for list membership use the keyword `in`.

In [57]:
5 in primes

True

In [58]:
6 in primes

False

The function `len` gives the length of a list.

In [59]:
len(primes)

5

To append an element to a list use `append`.

In [60]:
primes.append(13)

In [61]:
primes.append(17)

In [62]:
primes

[2, 3, 5, 7, 11, 13, 17]

Using `append` with a list as parameter adds the list as a single element - producing a list that contains a list as its last element.

In [63]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

[2, 3, 5, 7, 11, 13, [17, 19]]

That probably isn't what we wanted to do.  
If we want to add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists.  

In [64]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes

[2, 3, 5, 7, 11, 13, 17, 19]
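The difference is easiest to see by comparing lengths. The sketch below uses illustrative lists and the list method `extend`, which is not introduced in this notebook but behaves like `+=` in this situation.

```python
evens = [2, 4, 6]          # illustrative lists
nested = [2, 4, 6]

evens.extend([8, 10])      # adds 8 and 10 as separate elements
nested.append([8, 10])     # adds one element, which is itself a list

print(evens, len(evens))
print(nested, len(nested))
```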

Quite often when we have a list, we want to do the same thing to everything in that list.  That requires us to write some code to **iterate over the list**.  The simplest way to do this is with a **for** loop.

To write a for loop that iterates over a list, we use the keywords `for` and `in`, `:`, and **indentation** to indicate the scope of the body of the loop.


In [65]:
for prime in primes:
    print(str(prime)+" is a prime")

2 is a prime
3 is a prime
5 is a prime
7 is a prime
11 is a prime
13 is a prime
17 is a prime
19 is a prime


The indentation is generated with a tab (or in some environments 4 spaces) and is really important.   It tells Python which commands should be repeated.  Try these:

In [66]:
for prime in primes:
    print(str(prime)+ " is a prime")
print("I love python")

2 is a prime
3 is a prime
5 is a prime
7 is a prime
11 is a prime
13 is a prime
17 is a prime
19 is a prime
I love python


In [67]:
for prime in primes:
    print(str(prime)+ " is a prime")
    print("I love python")

2 is a prime
I love python
3 is a prime
I love python
5 is a prime
I love python
7 is a prime
I love python
11 is a prime
I love python
13 is a prime
I love python
17 is a prime
I love python
19 is a prime
I love python


There always has to be at least one command in a block that is being repeated, otherwise you'll get an IndentationError.  Expect to see this one a lot (especially if you cut and paste code!)

In [68]:
for prime in primes:
print(str(prime)+ " is a prime")
print("I love python")

IndentationError: expected an indented block (<ipython-input-68-7b446eeaa7b1>, line 2)

In the code above we could have used any variable name instead of prime. 

In [69]:
for alien_planet in primes:
    print(alien_planet,"is a prime")

2 is a prime
3 is a prime
5 is a prime
7 is a prime
11 is a prime
13 is a prime
17 is a prime
19 is a prime


It is usually best practice to consider the loop variable, alien_planet, as **local** to the loop and not try to access it from outside the loop.  However, if you do, you will get the last value that it had during the iteration.

In [70]:
alien_planet

19
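A common pattern combines a `for` loop with an initially empty list and `append`. The following is a minimal sketch with illustrative names; the comments mark which lines are repeated and which run only once.

```python
numbers = [1, 2, 3, 4]   # illustrative list
doubled = []             # start with an empty list

for number in numbers:
    # This indented line runs once for each element.
    doubled.append(number * 2)

# This line is not indented, so it runs once, after the loop has finished.
print(doubled)
```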

### **Exercise 4a**
In the cell below initialise the variable `squares` to be a list of the square numbers from 1 to 16 inclusive.

In [71]:
squares=[1,4,9,16]

### **Exercise 4b**
In the cell below append the next square number to the list `squares`.

In [72]:
squares.append(25)

### **Exercise 4c**
In the cell below make a list of the next two square numbers and concatenate this with `squares`.

In [73]:
more_squares=[36,49]
squares+=more_squares

### **Exercise 4d**
In the cell  below check how many items are in the list now.

In [74]:
len(squares)

7

### **Exercise 4e**
In the cell below use indexing to print just the first 3 and last 3 items in the list `squares`

In [75]:
squares[:3]+squares[-3:]

[1, 4, 9, 25, 36, 49]

### **Exercise 4f**
In the cell below, use a `for` loop to print each item in the list `squares` on its own line, as part of a sentence. The output should look like this:
```
The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49
```

In [77]:
print("The first square in the list is",squares[0])
for number in squares[1:-1]:
    print("The next square in the list is",number)
print("The last square in the list is",squares[-1])

The first square in the list is 1
The next square in the list is 4
The next square in the list is 9
The next square in the list is 16
The next square in the list is 25
The next square in the list is 36
The last square in the list is 49


## 1.1.7 Strings

We are now going to take a bit more of an in-depth look at Strings.  

We often think of Strings as atomic data types, like integers and floats, out of which we might make other more complex types (e.g., lists) but which can't be broken down any further.  But actually, a String can be thought of as a complex datatype - it is a **list** of **characters**. We just have an easier way of writing it (as a String, e.g., 'Adam') rather than using the square brackets notation ['A','d','a','m'].   

However, Python lets us use a lot of list functionality straightforwardly on Strings.
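To make the comparison concrete, the built-in function `list` can turn a string into an actual list of its characters. This is only an illustration with hypothetical variable names; you rarely need to do this in practice.

```python
name = 'Adam'                 # illustrative string
characters = list(name)       # a real list of single-character strings

print(characters)
print(name[0] == characters[0])       # indexing gives the same character
print(len(name) == len(characters))   # and the lengths agree
```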

In [78]:
# Here we assign the string "Hello World" as the value of a variable called hello_world
hello_world = "Hello World"

String indexing is similar to list indexing, but works on a character-by-character basis.

In [79]:
hello_world[0]

'H'

In [80]:
hello_world[7]

'o'

In [81]:
hello_world[-3:]

'rld'

In [82]:
hello_world[-40]

IndexError: string index out of range

Can you work out why the error above was generated?

We can also easily test for substring presence using the keyword `in`.

In [83]:
"w" in hello_world

False

In [84]:
"W" in hello_world

True

In [85]:
"llo" in hello_world

True

We can also find the length of a string using `len`.

Note that the output value is a count including spaces, tabs and non-alphanumeric characters.

In [86]:
len(hello_world)

11

In [87]:
hello_world+="!"
hello_world

'Hello World!'

In [88]:
len(hello_world)

12

We can iterate over a string with the same syntax as in normal list iteration.  However, it now works on a character-by-character basis.  In other words, in each iteration of the loop, the loop variable will be the next character in the string.

In [89]:
for char in hello_world:
    print ("the character >>>", char, "<<< is present")

the character >>> H <<< is present
the character >>> e <<< is present
the character >>> l <<< is present
the character >>> l <<< is present
the character >>> o <<< is present
the character >>>   <<< is present
the character >>> W <<< is present
the character >>> o <<< is present
the character >>> r <<< is present
the character >>> l <<< is present
the character >>> d <<< is present
the character >>> ! <<< is present


The `split` method provides a simplistic way to parse a string into words.   By default, it separates based on whitespace and returns a list of *tokens*.   We will learn more about tokenisation in week 2.

An optional separator string can be passed to `split` as an argument.  See the difference if you change the following cell so that the second line reads `words = sentence.split('s')`



In [90]:
sentence = "This is a sample sentence"
words = sentence.split()
print(words)

['This', 'is', 'a', 'sample', 'sentence']
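As a further sketch of how the separator argument changes the result, the cell below splits an illustrative comma-separated string, and also shows how the default behaviour treats repeated spaces.

```python
record = "apple,banana,cherry"    # illustrative string
fields = record.split(',')        # split on commas instead of whitespace
print(fields)

spaced = "  two   words  "        # extra spaces at the ends and in the middle
tokens = spaced.split()           # default: runs of whitespace count as one separator
print(tokens)
```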


To check for the presence of a token in a list of words, we use the `in` keyword.

In [91]:
"sample" in words

True

In [92]:
"Hello" in words

False
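Note that `in` behaves differently on a string and on a list of words. The sketch below re-creates the same sentence so that it can be run on its own: on a string, `in` tests for a substring, whereas on a list it tests whether some element is exactly equal to the value.

```python
sentence = "This is a sample sentence"   # same sentence as above
words = sentence.split()

in_string = "sam" in sentence    # True: 'sam' is a substring of the string
in_list = "sam" in words         # False: no list element is exactly 'sam'
whole_word = "sample" in words   # True: 'sample' is an element of the list

print(in_string, in_list, whole_word)
```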

### **Exercise 5a**
In the empty cell below  assign the string `"It was the best of times, it was the worst of times"` to the variable `opening_line`.

In [93]:
opening_line="It was the best of times, it was the worst of times"

### **Exercise 5b**
In the empty cell below check whether 'worst' appears in opening_line.

In [94]:
"worst" in opening_line

True

### **Exercise 5c**
In the empty cell below make a list of the words in `opening_line`, assigned to the variable `dickens_words`, and iterate over `dickens_words`, printing one word per line.

In [95]:
dickens_words=opening_line.split()
for word in dickens_words:
    print(">>",word)

>> It
>> was
>> the
>> best
>> of
>> times,
>> it
>> was
>> the
>> worst
>> of
>> times


### **Exercise 5d**
In the empty cell below check whether `'blurst'` appears in the list you made.

In [96]:
"blurst" in dickens_words

False

## 1.1.8 Conditions and booleans

Finally, we are going to take a quick look at conditional statements.  In the code below, note the use of the keywords if and else as well as the presence of the colons (:) and the indentation.

In [97]:
if 2 > 3:
    print ("yes")
else:
    print ("no")

no


In [98]:
if len(words) > 10:
    print("it's a long sentence")
else:
    print("it's a short sentence")

it's a short sentence


There are some useful string *shape* methods, which form part of the String class and can be used to test for certain types of string.  Work out what each of the following tests for:
- astring.isalpha()
- astring.isalnum()
- astring.isdigit()

In [99]:
"This".isalpha()

True

In [100]:
"This,".isalpha()

False

In [101]:
"M25".isalpha()

False

In [102]:
"M25".isalnum()

True

In [103]:
"463".isdigit()

True
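These methods are often used inside conditional statements. The sketch below labels a few illustrative tokens; it uses `elif`, which is not introduced in this notebook and means "otherwise, if", so each test is only tried when the ones above it have failed.

```python
tokens = ["This", "M25", "463", "times,"]   # illustrative tokens
labels = []

for token in tokens:
    if token.isdigit():
        labels.append("digits only")
    elif token.isalpha():          # only checked if the test above failed
        labels.append("letters only")
    elif token.isalnum():
        labels.append("letters and digits")
    else:
        labels.append("contains other characters")

print(labels)
```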

Boolean statements can be combined using `and`. Both must be true for the combination to be evaluated as `True`.

In [104]:
True and True

True

In [105]:
False and True

False

Boolean statements can be combined using `or`. At least one statement must be true for the combination to be evaluated as `True`.

In [106]:
False or True

True

In [107]:
True or False

True

A boolean statement can be negated using `not`.

In [108]:
not True

False

In [109]:
not False

True
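In practice, `and`, `or` and `not` are usually applied to the results of comparisons and tests rather than to the literal values `True` and `False`. A minimal sketch, with illustrative values:

```python
word = "sample"                      # illustrative values
words = ["This", "is", "a", "sample", "sentence"]

is_long_word = word.isalpha() and len(word) > 5   # both tests must hold
is_missing = not (word in words)                  # negates the membership test
either = len(words) > 10 or word in words         # at least one test must hold

print(is_long_word, is_missing, either)
```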