# Topic 0: Introduction to Python - Part 1

This is a Jupyter notebook, a web-based interactive computational environment.
- Cells can contain markdown or code. 
- To run a code cell, press shift+Enter. 
- Jupyter displays the output of a cell beneath it.

This session is designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. 

- Run all of the code cells as you work through the notebook. 
- Try to understand what is happening in each code cell and predict the output before running it.
- Complete all of the exercises.
- Solutions to all exercises are provided, but please avoid loading the solution until you have had a go at solving it yourself.


Run the following cell twice: the first run loads some setup code into the cell, and the second run executes that code.

In [None]:
%load ../setup

## Python types

### String
Strings are enclosed in double or single quotes in Python.

In [None]:
print('Hello World')

In [None]:
print("Hello World")

In [None]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
print("'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'")

In [None]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
print('"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."')

As an alternative to calling the `print` function explicitly, Jupyter displays the value of the last expression in a cell when the cell is run. Try running the following cell: only the value of the second string is shown.

In [None]:
"Hello World"
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

### Integer

In [None]:
75

### Float

In [None]:
6.3646

When a string contains just digits, the function `int` will **cast** that string to an integer.

In [None]:
# give the type of the string '623'
type('623')

In [None]:
# cast the string '623' to an integer
int('623')

In [None]:
# give the type that results from casting the string '623' to an integer.
type(int('623'))

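The other basic types have cast functions too. The sketch below (the variable names are only for illustration) casts between `str`, `int` and `float`, and shows that `int` refuses a string that does not contain a whole number.

```python
# Casting between the basic types with the built-in functions int, float and str
as_int = int("623")          # string of digits -> integer
as_float = float("6.3646")   # string containing a decimal point -> float
as_text = str(75)            # integer -> string
truncated = int(6.9)         # float -> integer: the fractional part is dropped

try:
    int("six")               # not digits, so the cast fails
except ValueError as err:
    error_message = str(err)

print(as_int, as_float, as_text, truncated)  # 623 6.3646 75 6
print(error_message)
```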
## Basic operations

Strings can be joined using `+`

In [None]:
"Hello " + "World"

Standard operators are used on integers and floats: `+`, `-`, `*`, and `/`.

In [None]:
7 - 3 + 5

In [None]:
3.5*8/4

For floor division (division rounded down to the nearest integer), use `//`.

In [None]:
7//2

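Use `**` for exponentiation, e.g. `3**2` is 3 squared, which is 9. (Note that `^` does not mean exponentiation in Python; it is the bitwise XOR operator.)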

In [None]:
# This is equivalent to 2*2*2*2*2
2**5

Use double equals, `==`, to check equality.

In [None]:
5*4 == 2*10

The modulo operator `%` returns the remainder after integer division. For example, 13 divided by 5 is 2 with 3 left over, so `13 % 5` is `3`.

In [None]:
7%3

In [None]:
4 % 2

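Floor division and modulo fit together: for whole numbers `a` and `b`, `a == (a // b) * b + a % b`. The sketch below checks this for 13 and 5, and also shows that `/` always produces a float and that `//` rounds *down* (towards minus infinity), not towards zero.

```python
a, b = 13, 5
quotient = a // b    # floor division: 2
remainder = a % b    # modulo: 3
assert a == quotient * b + remainder  # 2 * 5 + 3 == 13

print(quotient, remainder)   # 2 3
print(7 / 2, 8 / 4)          # 3.5 2.0  ('/' always gives a float)
print(-7 // 2)               # -4       (rounded down, not towards zero)
```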
## Python error reports
Python reports an error when it cannot carry out an operation, e.g. when attempting to join a **string** and an **integer**:

In [None]:
"Hello" + 3

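The error above is a `TypeError`: `+` does not know how to combine a `str` with an `int`. One way round it, sketched below, is to cast the integer to a string with `str` first. The `try`/`except` block is used here only so that the example runs without stopping at the error.

```python
try:
    "Hello" + 3                  # str + int is not allowed
except TypeError:
    joining_failed = True
else:
    joining_failed = False

greeting = "Hello " + str(3)     # cast the integer to a string first
print(joining_failed, greeting)  # True Hello 3
```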
### Exercise
In the empty cell below, write a single-line Python expression that prints "Hello world! My name is" joined with another string containing your name.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/hello

## Python identifiers
Assign a variable name to any value (e.g. a string, integer or float) using a single equals sign.

In [None]:
student_name = "Adam"
student_name

In [None]:
student_age = 21
student_age

Operations can be carried out as before, using the variable names.

In [None]:
student_age/2

We can update the value associated with a variable using the operators `+=`, `-=`, `/=` and `*=`.

- For example, `+=` adds the number on the right to the current value.

This is a useful shortcut - take your time to play around and familiarise yourself with this syntax.

In [None]:
# Note that each time you run this cell, it will add 5 to the stored value.
student_age += 5
student_age

In [None]:
age_next_year = student_age + 1
age_next_year

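Each of the update operators is shorthand for an ordinary assignment: `x += 5` means `x = x + 5`, and so on. The sketch below applies them in turn to a hypothetical variable `value`; note that `/=` always produces a float, and that `**=` works the same way for exponentiation.

```python
value = 10
value += 5    # value = value + 5  -> 15
value -= 3    # value = value - 3  -> 12
value *= 2    # value = value * 2  -> 24
value /= 4    # value = value / 4  -> 6.0 (division always gives a float)
value **= 2   # value = value ** 2 -> 36.0
print(value)  # 36.0
```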
### Exercise
In the cell below, assign appropriate values to the variables `my_name`, `my_age`, and `years_at_sussex`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/age

### Exercise
In the cell below subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/age_started

### Exercise
In the cell below, practise using the `**`, `+=`, `-=`, `/=` and `*=` operators to update these values.

## Dynamic typing
The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.

In [None]:
type(student_name)

In [None]:
type(student_age)

Because Python is dynamically typed, a variable name can be reassigned to a value of a different type, and the variable's type changes accordingly.

In [None]:
student_age = "Twenty"
type(student_age)

### Exercise
In the cell below, reassign your `int` variables `my_age` and `years_at_sussex` to strings (`str`) that give the number in words. Print the type of these variables before and after.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/dynamic_typing

## Lists

Lists are initialised using square brackets, with objects separated by commas.

In [None]:
primes = [2, 3, 5, 7, 11]
type(primes)

Lists can contain any data type.

In [None]:
list_of_strings = ['string', 'another string', 'a third string']
list_of_strings

'Empty' lists with no elements can also be initialised.

In [None]:
empty_list = []

Indexing into lists uses square brackets.
- Note that indexing starts from zero.

In [None]:
primes[0]

A colon, `:`, can be used to take a slice of a list between two indices.
- Note that the slice starts at the first index and goes up to, but does NOT include, the second index.

In [None]:
primes[1:3]

If either index is omitted, the slice will go to the beginning/end of the list.

In [None]:
primes[:3]

To index from the end of the list use negative numbers.

In [None]:
primes[-1]

In [None]:
primes[-2:]

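The slicing rules above can be summarised in one cell. This is a sketch using the same `primes` list as above; each comment gives the value that is printed.

```python
primes = [2, 3, 5, 7, 11]

print(primes[1:3])   # [3, 5]      from index 1 up to, but not including, index 3
print(primes[:3])    # [2, 3, 5]   start omitted: from the beginning
print(primes[3:])    # [7, 11]     end omitted: to the end of the list
print(primes[-1])    # 11          the last element
print(primes[-2:])   # [7, 11]     the last two elements
print(primes[:])     # [2, 3, 5, 7, 11]  both omitted: a copy of the whole list
```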
To test for list membership use the keyword `in`.

In [None]:
5 in primes

In [None]:
6 in primes

The function `len` gives the length of a list.

In [None]:
len(primes)

To append an element to a list use `append`.

In [None]:
primes.append(13)

In [None]:
primes.append(17)

In [None]:
primes

Using `append` with a list as its argument adds the whole list as a single element, producing a list that contains a list as its last element.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

To add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes

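To compare the three ways of adding to a list side by side, here is a minimal sketch (the list names are illustrative). `extend` is the list method equivalent of `+=`, while `+` builds a brand-new list and leaves both originals unchanged.

```python
with_append = [2, 3, 5]
with_append.append([7, 11])      # adds ONE element: the whole list [7, 11]

with_extend = [2, 3, 5]
with_extend.extend([7, 11])      # adds each element separately, like +=

original = [2, 3, 5]
with_plus = original + [7, 11]   # '+' builds a new list; original is unchanged

print(with_append)   # [2, 3, 5, [7, 11]]
print(with_extend)   # [2, 3, 5, 7, 11]
print(with_plus)     # [2, 3, 5, 7, 11]
print(original)      # [2, 3, 5]
```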
To write a `for` loop that iterates over a list, use the keywords `for` and `in`, end the line with a colon `:`, and indent the body of the loop.

In [None]:
for prime in primes:
    print(prime,"is a prime")

### Exercise
In the cell below initialise the variable `squares` to be a list of the square numbers from 1 to 16 inclusive.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/squares

### Exercise
In the cell below append the next square number to the list `squares`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/extend_squares

### Exercise
In the cell below make a list of the next two square numbers and concatenate this with `squares`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/more_squares

### Exercise
In the cell below, check how many items are in the list now.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/squares_length

### Exercise
In the cell below, use indexing to print just the first 3 and last 3 items in the list `squares`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/first_last_three

### Exercise
In the cell below, use a `for` loop to print each item in the list `squares` on its own line, as part of a sentence. The output should look like this:
```
The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49
```

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/print_squares

## Strings

In [None]:
# Here we assign the string "Hello World" as the value of a variable called hello_world
hello_world = "Hello World"

String indexing is similar to list indexing, but works on a character-by-character basis.

In [None]:
hello_world[0]

In [None]:
hello_world[7]

In [None]:
hello_world[-3:]

In [None]:
hello_world[-6]

Test for substring presence using the keyword `in`.

In [None]:
"w" in hello_world

In [None]:
"W" in hello_world

In [None]:
"llo" in hello_world

Find the length of a string using `len`.

Note that the count includes every character, including spaces, tabs and punctuation.

In [None]:
len(hello_world)

In [None]:
hello_world += "!"
hello_world

In [None]:
len(hello_world)

Iterating over a string involves similar syntax to list iteration, but works on a character-by-character basis.

In [None]:
for char in hello_world:
    print("the character >>>", char, "<<< is present")

To split a string into words, use the `split` method, which returns a list of the tokens in the string.

By default, it separates based on whitespace.

In [None]:
sentence = "This is a sample sentence"
words = sentence.split()
print(words)

To check for the presence of a token in a list of words use the `in` keyword.

In [None]:
"sample" in words

In [None]:
"Hello" in words

### Exercise
In the empty cell below, assign the string `"It was the best of times, it was the worst of times"` to the variable `opening_line`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/assign_string

### Exercise
In the empty cell below, check whether `'worst'` appears in `opening_line`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/worst_in

### Exercise
In the empty cell below make a list of the words in `opening_line`, assigned to the variable `dickens_words`, and iterate over `dickens_words`, printing one word per line.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/print_sentence

### Exercise
In the empty cell below check whether `'blurst'` appears in the list you made.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/blurst_check

## Functions
Functions are defined using the keyword `def`, followed by a function name, and a list of parameters in parentheses. Don't forget the `:` after the closing parenthesis.

The body of the function starts on the next line, and must be indented.

In [None]:
def double(number):
    return number * 2

In [None]:
double(13)

In [None]:
type(double)

In [None]:
def add_question_mark(string):
    return string + "?"

In [None]:
add_question_mark("what's your name")

In [None]:
def print_first_half(string):
    half_length_of_string = len(string)//2 #use floor division as indices must be integers
    return string[:half_length_of_string]

In [None]:
print_first_half('hi how are you doing?')

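A function that uses `print` is not the same as one that uses `return`: printing only displays a value, while returning hands the value back so it can be stored or used in further calculations. A function with no `return` statement returns `None`. The function names in this sketch are hypothetical.

```python
def print_double(number):
    print(number * 2)        # displays the result but returns nothing

def return_double(number):
    return number * 2        # hands the result back to the caller

printed = print_double(13)   # displays 26
returned = return_double(13)

print(printed)               # None
print(returned + 1)          # 27: a returned value can be used in further calculations
```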
### Exercise
In the empty cell below, define a function called `square` that returns its input parameter squared.

Hint: check the 'Basic operations' section above for the Python syntax for exponentiation.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/square_function

### Exercise
In the empty cell below define a function `makelist` that takes a sentence string as an input, and returns a list of the words in the sentence.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/make_list

### Comments and docstrings

Look at the code in the code cell below.

- The first block of text (in triple quotes) is a docstring. By convention, every Python function should begin with a docstring of this form, describing what the function does, its parameters and its return value.
- Comments in the code itself are introduced by a `#`, either on a separate line or appended to the end of a line. Python ignores the rest of a line after a `#`.

When you press shift+Enter to execute the cell, the function is defined in the kernel and can be called from any cell in the notebook. The function definition ends at the first line that is no longer indented (here, the first line after the comment `# Here is the argument:`). After defining the function, the kernel continues executing the rest of the cell, which calls the function.

Notice how the programme splits a character string at newline characters. This works because `split` is a built-in method of the string type (`str`), so every string can be split in this way.
- `\n` is the newline character.
- Use `\t` to split tab-separated text, for example lines read from a tab-separated file.
- If you call `split()` with no argument, any run of whitespace is treated as a single delimiter. This has the advantage that a double space does not produce an empty string between the two words.


In [None]:

def count_paragraphs(input_text):
    """
    A paragraph is defined as the text before a newline character, i.e. "\n".
    Take a character string, split it into paragraphs, count them
    and return the count.
    :param input_text: a character string containing paragraph marks
    :return: integer, the number of paragraphs
    """
    
    # The following statement creates a list of strings by breaking
    # up input_text wherever a "\n" character occurs
    
    paragraphs = input_text.split("\n")  
    
    # The len() function counts the number of elements in the list
    
    return len(paragraphs)


# Here is the argument:

sample_text = "This is a sample sentence01 showing 7 different token types: alphabetic, numeric, alphanumeric, Title, UPPERCASE, CamelCase and punctuation!\nSentences like that should not exist. They're too artificial.\nA REAL sentence looks different. It has flavour to it. You can smell it; it's like Pythonic code, you know?\nHave you heard of 'code smell'? Google it if you haven't."
print(sample_text)

# Here is the function call:

print("Number of paragraphs:", count_paragraphs(sample_text))


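The bullet points about `split` earlier in this section can be checked on a small made-up string that mixes a double space, a tab and a newline. This is a sketch; the comments show the lists that are printed.

```python
text = "one  two\tthree\nfour"

print(text.split())      # ['one', 'two', 'three', 'four']  any run of whitespace
print(text.split(" "))   # ['one', '', 'two\tthree\nfour']   each single space
print(text.split("\n"))  # ['one  two\tthree', 'four']        newlines only
print(text.split("\t"))  # ['one  two', 'three\nfour']        tabs only
```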
### Exercise
In the blank cell below examine the contents of the variable `sample_text` with and without the print function.
- Notice that execution of the first cell means the variable is now in the kernel and accessible to any cell.
- Use the same box to try printing the variable `input_text`. What happens? Why?

## Classes

In [None]:
# The pass statement is a null operation, used here as a placeholder where the class body would go
class MyClass:
    pass

In [None]:
MyClass

In [None]:
type(MyClass)

Creating an instance of a class (remember every class defines a type).

In [None]:
my_example = MyClass()

In [None]:
my_example

In [None]:
type(my_example)

Lists can mix types freely. Here is a list containing an integer, a string, a function, a class, an instance of that class and an empty list:

In [None]:
[21, "Brighton", double, MyClass, my_example, []]

## Sets  
These are *unordered* collections of *unique* elements.

Note the use of curly brackets rather than the square brackets used for lists.

In [None]:
unique_numbers = {1, 2, 2, 2, 3}
unique_numbers

In [None]:
type(unique_numbers)

To initialise an empty set, use `set()`. (Empty curly brackets, `{}`, create an empty dictionary, not a set.)

In [None]:
new_set = set()
type(new_set)

To add an element to a set use the method `add`.

In [None]:
unique_numbers.add(5)

Use `len` to give the number of elements in a set.

In [None]:
len(unique_numbers)

To check the presence of an element in a set, use the keyword `in`, just as for lists and strings.

In [None]:
2 in unique_numbers

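The sketch below gathers the set operations above in one place, redefining `unique_numbers` so that it runs on its own. It also shows that adding an element that is already present has no effect, and that casting a list with `set` removes its repetitions.

```python
unique_numbers = {1, 2, 2, 2, 3}
print(len(unique_numbers))       # 3: the repeated 2s are stored once

unique_numbers.add(5)
unique_numbers.add(5)            # already present, so nothing changes
print(len(unique_numbers))       # 4

print(2 in unique_numbers, 4 in unique_numbers)   # True False

words = ["to", "be", "or", "not", "to", "be"]
vocabulary = set(words)          # removes the repetitions
print(len(words), len(vocabulary))                # 6 4
```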
Iterating over a set

The syntax for iterating over a set is similar to that used when iterating over a list. 

Remember to use `for`, `in`, `:` and indentation.

In [None]:
for number in unique_numbers:
    print(number * 3)

In [None]:
for number in unique_numbers:
    print(double(number))

### Exercise
In the empty cell below, create a function called `get_vocabulary` that takes a *list* of words as input and returns a *set* of those words.

Use your function `get_vocabulary` to create the set `dickens_vocab`, the set of unique words in `opening_line` (see above).

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/get_vocabulary

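The code in the next cell shows another way to create a vocabulary with the set datatype: this version of `get_vocabulary` takes a string rather than a list, and splits it into words itself.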

In [None]:
def get_vocabulary(input_text):
    """
    A word is defined as a character string delimited by
    whitespace (spaces, tabs or newlines).
    Given an input string, split it into words and
    return the set of unique words in the input.
    :param input_text: Character string with some text
    :return: The set of unique words in the input.
    """
    
    list_of_words = input_text.split()
    
    # The following line takes the list of words,
    # removes repetitions and creates a set:

    return set(list_of_words)

# The following loop is repeated for every word in the set
# Note that a set is just one of many iterable objects:


for word in get_vocabulary(sample_text):
    print(word)

## Dictionaries
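A dictionary is a collection of key:value pairs. (Since Python 3.7, dictionaries preserve the order in which keys were inserted; in earlier versions they were unordered.)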

Keys are used to index the dictionary.

The main operations are storing a value with a key, and then extracting a specific value using its key. 

Each key in a given dictionary must be unique. 

A dictionary is initialised with curly braces. This can contain comma-separated key:value pairs. 

Note the use of ':' to map a key to a value.

In [None]:
simpsons_ages = {"Bart": 10, "Lisa": 8, "Homer": "thirty something"}
simpsons_ages

In [None]:
type(simpsons_ages)

Access the value stored under a key using square brackets.

In [None]:
simpsons_ages["Homer"]

In [None]:
simpsons_ages['Bart']

Get the number of entries in a dictionary.

Just as for a list, we use the function `len`.

In [None]:
len(simpsons_ages)

Checking the presence of a key in a dictionary.

In [None]:
"Marge" in simpsons_ages

In [None]:
"Bart" in simpsons_ages

Accessing a key that does not exist is an error: Python raises a `KeyError`.

In [None]:
simpsons_ages["Krusty"]

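If you are not sure whether a key exists, the dictionary method `get` avoids the error: it returns `None`, or a default value you supply, when the key is missing. The sketch below contrasts it with square-bracket access.

```python
simpsons_ages = {"Bart": 10, "Lisa": 8, "Homer": "thirty something"}

print(simpsons_ages.get("Bart"))               # 10
print(simpsons_ages.get("Krusty"))             # None
print(simpsons_ages.get("Krusty", "unknown"))  # unknown

try:
    simpsons_ages["Krusty"]
except KeyError as err:
    print("KeyError:", err)                    # KeyError: 'Krusty'
```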
Adding a new key:value entry to the dictionary.

In [None]:
simpsons_ages["Marge"] = 34
simpsons_ages["Marge"]

### Exercise
In the blank cell below, add two extra key-value pairs to the dictionary `simpsons_ages`, each consisting of a name and corresponding age.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/more_simpsons

Use a `for` loop to iterate over *keys* in the dictionary.

In [None]:
for person in simpsons_ages:
    print(person)

Use the `items` method to iterate over the key-value pairs of a dictionary.

In [None]:
for item in simpsons_ages.items():
    print(item)

In [None]:
# Note that 'person' and 'age' here are arbitrary variable names, and can be replaced with any two names, e.g. 'key' and 'value'
for person, age in simpsons_ages.items():
    print(person, "is", age, "years old")

### Exercise
In the blank cell below make a new dictionary called `polygons` where the keys are names of shapes and the values are the corresponding number of sides.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/polygons

### Exercise
In the blank cell below iterate over the keys and values, printing each key and value in a sentence (eg 'a triangle has 3 sides').

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/print_polygons

We can make a dictionary that supplies a default value for any key that has not been set.

In [None]:
# To do this, we need a class that is not built in, so we import it from the standard library module collections
import collections
word_counts = collections.defaultdict(int)
# the "int" parameter will create entries with a default value of 0
word_counts

In [None]:
type(word_counts)

In [None]:
len(word_counts)

In [None]:
"This" in word_counts

In [None]:
word_counts["This"]

In [None]:
# an entry has been automatically created with the default value of 0, just by querying the default dictionary
"This" in word_counts

In [None]:
len(word_counts)

In [None]:
# we can add a new entry with a value of 1
word_counts["is"] += 1
#querying this key in the default dictionary makes an entry with the default value of 0, and we add 1 to this
word_counts["is"]

In [None]:
# we can also update the value of a key
word_counts["is"] += 5
word_counts["is"]

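The reason to use a `defaultdict` for counting is that `+=` on a missing key of an ordinary dictionary raises a `KeyError`, whereas the `defaultdict` first creates the entry with the default value 0. A minimal sketch (the variable names are illustrative):

```python
import collections

plain = {}
plain_failed = False
try:
    plain["is"] += 1          # "is" is not a key yet, so this raises KeyError
except KeyError:
    plain_failed = True

counts = collections.defaultdict(int)
counts["is"] += 1             # the entry is created as 0, then 1 is added

print(plain_failed, counts["is"])   # True 1
```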
### Exercise
In the empty cell below write code that will print, one word per line, each word in `dickens_words` together with the number of times that word appears in `dickens_words`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/print_dickens_counts 

## Files
Every file has a file path. In the cell below, the variable `input_file_path` holds a string containing a file path.

In [None]:
# Make sure the file path points to a valid file on your machine
#input_file_path = "N:/nle_notebooks/sample_text.txt"
input_file_path = "/Users/davidw/Documents/teach/NLE/NLE Notebooks/Topic 0/sample_text.txt"

We now use the file path variable to *open* the file. We need to do this before reading/writing to it.

In [None]:
input_file = open(input_file_path)
type(input_file)

Use the `read` method to read the entire file contents into a `str` variable called `input_text`.

In [None]:
input_text = input_file.read()
type(input_text)

In [None]:
input_text

When you are done with the file, close it.

In [None]:
input_file.close()

After the file has been closed, it cannot be read any more: Python raises a `ValueError`.

In [None]:
input_text = input_file.read()

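In practice, files are usually opened with a `with` statement, which closes the file automatically when the indented block ends, even if an error occurs. The sketch below first writes a small file into a temporary directory (created with the standard `tempfile` module, so the example does not depend on `sample_text.txt`; the file name `example.txt` is made up) and then reads it back.

```python
import os
import tempfile

# Create a small file in a fresh temporary directory so the example runs anywhere
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")
with open(path, "w") as output_file:
    output_file.write("first line\nsecond line\n")

# Read it back; the file is closed automatically at the end of the with block
with open(path) as input_file:
    text = input_file.read()

print(input_file.closed)   # True
print(text.split("\n"))    # ['first line', 'second line', '']
```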
### Exercise
In the blank cell below, write a function, `print_word_counts`, that takes a file path as an argument, opens the file, then prints, one word per line, each word in the file together with the number of times that word appears in the file.

Test your function by running it on the `sample_text.txt`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/print_word_counts

## Tuples

A tuple is a sequence of values, which can be of different types. It is written as comma-separated values enclosed in parentheses.

In [None]:
person = ("Jon", 14, "jon@thewall.com")
person

In [None]:
type(person)

Use `len` to count the number of elements in a tuple.

In [None]:
len(person)

Indexing into a tuple is similar to indexing into a list.

In [None]:
person[0]

In [None]:
person[-2:]

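Two further things are worth knowing about tuples: their elements can be unpacked into separate variables in a single assignment, and, unlike lists, tuples cannot be changed after they are created (they are *immutable*). A short sketch:

```python
person = ("Jon", 14, "jon@thewall.com")

name, age, email = person      # unpack the three elements into three variables
print(name, age)               # Jon 14

try:
    person[1] = 15             # tuples cannot be modified
except TypeError:
    print("tuples are immutable")
```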
It can be useful to use tuples as values in dictionaries.

In [None]:
# Note that each key is a string, and each value is a tuple
people = {"Joffrey":(12, "Baratheon", "joff@kingslanding.com"), "Jon":(14, "Snow", "jon@thewall.com")}
people["Joffrey"]

In [None]:
### Jon's age - we access this using the dictionary key, and then indexing within the value:
people["Jon"][0]

In [None]:
### Joffrey's email
people["Joffrey"][2]

In [None]:
# list everyone's first and last names:
for person, record in people.items():
    print(person, record[1])

### Exercise
In the blank cell below create a dictionary called `address_book`, with at least 3 key-value entries. Each should consist of a person's name in string format (the key), and a tuple with corresponding pieces of information about them (the value).

Once you've done that, iterate over the address book, printing information about each person into a sentence.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/prime_ministers

### Exercise
Make sure that you understand the code in the following cell. It calculates the number of sentences in each paragraph of a text. 

Can you see where tuples are being used?

In [None]:
def count_sentences_per_paragraph(input_text):
    """
    Given an input text:
     - assign a number to each paragraph, starting from 1,
     - count the number of sentences in each paragraph,
     - output a list of all paragraph numbers together
       with the number of sentences in it.

    :param input_text: A character string possibly containing
                        ". " (a period followed by a space) to
                        separate sentences and paragraph marks
                        "\n" to separate paragraphs.
    :return: A list of ordered pairs (tuples) where the first
            element of the pair is the paragraph number and
            the second element is the number of sentences in
            that paragraph.
            Sample output: [(1, 1), (2, 2), (3, 3), (4, 1)]
    """
    
    paragraphs = input_text.split("\n")
    sentences_per_paragraph = list()  # create an empty list
    paragraph_number = 0
    for paragraph in paragraphs:
        paragraph_number += 1
        sentences = paragraph.split(". ")
        number_of_sentences = len(sentences)
        
        # create a tuple with the paragraph number and the number of sentences
        # in it, then append the tuple to the list:
        
        sentences_per_paragraph.append((paragraph_number, number_of_sentences))
    return sentences_per_paragraph


print(sample_text)
for para, count in count_sentences_per_paragraph(sample_text):
    print("paragraph {0} contains {1} sentence(s)".format(para,count))

## The range function

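`range` produces a `range` object: a sequence of integers from a start value up to, but not including, a stop value. The numbers are generated on demand rather than stored in a list.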

In [None]:
indices = range(0,5)
indices

In [None]:
type(indices)

In [None]:
len(indices)

The output from a `range` can be used as a sequence of indices.

In [None]:
for i in indices:
    print(words[i])

If `range` is given a single parameter, it creates a range that starts at zero and stops just before that number.

In [None]:
for i in range(10):
    print(i)

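Wrapping a `range` in `list` shows the numbers it produces. Besides the one- and two-argument forms, `range` accepts an optional third argument, the step between numbers. A short sketch (the variable names are illustrative):

```python
first_five = list(range(5))
two_to_five = list(range(2, 6))
in_steps_of_three = list(range(0, 10, 3))

print(first_five)          # [0, 1, 2, 3, 4]
print(two_to_five)         # [2, 3, 4, 5]  the stop value 6 is excluded
print(in_steps_of_three)   # [0, 3, 6, 9]  counting in steps of 3
print(len(range(2, 6)))    # 4
```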
### Exercise
In the blank cell below use `range` to print a list of the first 10 integers.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/first_ten_ints

### Exercise
In the cell below use `range` to print a list of the first 10 cubes.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/first_ten_cubes

## The zip function

The zip function is used to pair up the corresponding elements of multiple iterables.

It takes multiple iterables as arguments and returns an iterator of tuples, where the i-th tuple consists of the i-th element from each of the input iterables. The pairing stops when the shortest input runs out.

In the example below, we 'zip together' `words` and `indices` into a series of tuples called `word_positions`. For example, the 3rd tuple in `word_positions` contains the 3rd element of `words` and the 3rd element of `indices`.

In [None]:
words = 'It was the best of times, it was the worst of times'.split()
indices = range(len(words))
word_positions = zip(words, indices)
type(word_positions)

In [None]:
for word, position in word_positions:
    print("'{0}' is in position {1}".format(word,position))


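Because `zip` returns an iterator, it can only be used once: after a loop has gone through it, it is empty. This is why the loop above prints nothing if it is run a second time without re-running the previous cell. The sketch below shows this, and also shows that `zip` stops at the end of the shorter input.

```python
words = ["the", "cat", "sat"]
pairs = zip(words, range(len(words)))

first_pass = list(pairs)    # [('the', 0), ('cat', 1), ('sat', 2)]
second_pass = list(pairs)   # []  the iterator has already been used up
print(first_pass)
print(second_pass)

print(list(zip(["a", "b", "c"], [1, 2])))   # [('a', 1), ('b', 2)]  stops at the shorter input
```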
### Exercise
In the blank cell below, write a function, `show_word_positions`, that takes a file path as its argument. The function should read the text from the file, split the text on whitespace, and then print out each word and its position as in the above example.

Test your function on `sample_text.txt`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/show_word_positions

The cell below contains a new version of the code that calculates the number of sentences in each paragraph of a text. 

This version makes use of `range` and `zip`.

In [None]:
def count_sentences_per_paragraph(input_text):
    """
    Given an input text:
     - assign a number to each paragraph,
     - count the number of sentences in each paragraph,
     - output a list of all paragraph numbers together
       with the number of sentences in it.

    :param input_text: A character string possibly containing
                        periods "." to separate sentences and
                        paragraph marks "\n" to separate
                        paragraphs.
    :return: A list of ordered pairs (tuples) where the first
            element of the pair is the paragraph number and
            the second element is the number of sentences in
            that paragraph.
            Sample output: [(0, 1), (1, 3), (2, 3), (3, 1)]
    """
    
    paragraphs = input_text.split("\n")
    sentence_counts = []
    for paragraph in paragraphs:
        number_of_sentences = count_sentences(paragraph)
        sentence_counts.append(number_of_sentences)
    
    # Create a range with the paragraph numbers we need:
    
    paragraph_numbers = range(len(paragraphs))
    
    # Pair each paragraph number with its sentence count; zip returns an
    # iterator, so convert it to a list to match the docstring:
    sentences_per_paragraph = list(zip(paragraph_numbers, sentence_counts))
    return sentences_per_paragraph

def count_sentences(paragraph):
    """
    A sentence is a character string delimited by a period "."
    Given an input paragraph, return the number of sentences
    in it.
    :param paragraph: Character string with sentences.
    :return: number of sentences in the input paragraph
    """
    
    sentences = paragraph.split(".")
    return len(sentences)

for para, count in count_sentences_per_paragraph(sample_text):
    print("paragraph {0} contains {1} sentence(s)".format(para,count))

If you are zipping lists that may be of different lengths and want to pad out any 'missing' elements, you will find `zip_longest` from the `itertools` module useful. By default it fills the gaps with `None`.

In [None]:
from itertools import zip_longest

listA = ["the","cat","sat"]
listB = ["a","dog","lay","down"]
for elem in zip_longest(listA,listB):
    print(elem)

### Enumerate
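Python provides a built-in function, `enumerate`, that pairs each element of an iterable with a counter, so it can be used instead of `range` and `zip`. An optional second argument sets the number to start counting from.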

In [None]:
for a,b in enumerate(['The','Holy','Grail']): 
    print(a,b)

In [None]:
for a,b in enumerate(['The','Holy','Grail'],1): 
    print(a,b)

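`enumerate(items)` produces the same pairs as `zip(range(len(items)), items)`, with the counter first. A quick check, as a sketch:

```python
words = ["The", "Holy", "Grail"]

with_zip = list(zip(range(len(words)), words))
with_enumerate = list(enumerate(words))

print(with_enumerate)               # [(0, 'The'), (1, 'Holy'), (2, 'Grail')]
print(with_zip == with_enumerate)   # True
print(list(enumerate(words, 1)))    # [(1, 'The'), (2, 'Holy'), (3, 'Grail')]
```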
### Exercise
In the empty cell below, adapt the code that calculates the number of sentences in each paragraph of a text so that it uses `enumerate` rather than `range` and `zip`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/count_sentences_per_paragraph

## The map function
This takes a function and an iterable (e.g. a list) as arguments. It applies the function to every item of the iterable, returning an iterator over the results (wrap it in `list` if you need a list).

In [None]:
# First we make a function, which we will pass to the map function in the next cell
natural_numbers = range(5)
def square(n):
    return n**2

square(5)

In [None]:
squared_numbers = map(square, natural_numbers)
for i in squared_numbers:
    print(i)

In [None]:
def decorate(char):
    return "*" + char + "*"

decorate("A")

In [None]:
decorated_characters = map(decorate, "Hello")
type(decorated_characters)

In [None]:
decorated_characters = map(decorate, "Hello")
for char in decorated_characters:
    print(char)

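Any function that takes one argument can be passed to `map`, including built-in functions such as `len` and string methods such as `str.upper`. Because `map` returns an iterator, this sketch wraps it in `list` to see all the results at once.

```python
words = "It was the best of times".split()

lengths = list(map(len, words))
shouted = list(map(str.upper, words))

print(lengths)   # [2, 3, 3, 4, 2, 5]
print(shouted)   # ['IT', 'WAS', 'THE', 'BEST', 'OF', 'TIMES']
```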
### Exercise
In the blank cell below, write a function called `add_exclamation` that adds a `'!'` to the end of the input string. Then use `map` with `add_exclamation` to print each word in `dickens_words` followed by an exclamation mark.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/add_exclamations

## Conditions and booleans

In [None]:
if 2 > 3:
    print("yes")
else:
    print("no")

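Here are some useful string methods that test a string's *shape*: whether it consists only of letters (`isalpha`), of letters and digits (`isalnum`), or of digits (`isdigit`).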

In [None]:
"This".isalpha()

In [None]:
"This,".isalpha()

In [None]:
"M25".isalpha()

In [None]:
"M25".isalnum()

In [None]:
"463".isdigit()

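The shape methods also have some less obvious cases. This sketch calls them on a few awkward inputs. The example strings are chosen only for illustration.

```python
# The empty string fails every shape test: each method needs at least one character.
print("".isalpha(), "".isdigit(), "".isalnum())   # False False False

# isdigit accepts only digit characters, so a decimal point or minus sign fails.
print("3.5".isdigit(), "-7".isdigit())             # False False

# Mixed case and accented letters still count as alphabetic.
print("CamelCase".isalpha(), "café".isalpha())     # True True

# An underscore is neither a letter nor a digit.
print("snake_case".isalnum())                      # False
```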
In [None]:
# non zero numbers are TRUE
print ("yes" if 15 else "no")

In [None]:
# zero is FALSE
print ("yes" if 0 else "no")

In [None]:
# non empty lists are TRUE
print ("yes" if ["one element"] else "no")

In [None]:
# the empty list is FALSE
print ("yes" if [] else "no")

In [None]:
# non empty character strings are TRUE
print ("yes" if "Hello" else "no")

In [None]:
# the empty string is FALSE
print ("yes" if "" else "no")
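The built-in `bool` function returns the truth value that `if` uses, so it is a quick way to check any value. Look at the later entries in the list. The string `"0"` and the list `[0]` are non-empty, so they count as true even though they contain a zero. `None` and the empty dictionary `{}` count as false.

```python
# bool() shows the truth value Python uses in an if statement.
values = [15, 0, 0.0, ["one element"], [], "Hello", "", "0", [0], None, {}]
for value in values:
    print(repr(value), "->", bool(value))
```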

Boolean expressions can be combined using `and`. Both must be true for the combination to evaluate to `True`.

In [None]:
True and True

In [None]:
False and True

Boolean expressions can be combined using `or`. At least one of them must be true for the combination to evaluate to `True`.

In [None]:
False or True

In [None]:
True or False

A Boolean expression can be negated using `not`.

In [None]:
not True

In [None]:
not False
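In practice, `and`, `or` and `not` are often combined with the string shape methods. The three helper functions below are hypothetical, written only for this illustration. Note that `and` stops evaluating at the first false operand, which is what keeps `token[0]` from failing on an empty string.

```python
def is_capitalised_word(token):
    # Hypothetical helper. `and` stops at the first false operand, so for an
    # empty string token[0] (which would raise IndexError) is never evaluated.
    return token.isalpha() and token[0].isupper()

def is_word_or_number(token):
    # Hypothetical helper: true if either test passes.
    return token.isalpha() or token.isdigit()

def is_other(token):
    # Hypothetical helper: `not` flips the combined result.
    return not is_word_or_number(token)

for token in ["Title", "UPPERCASE", "sample", "7", "M25", ""]:
    print(repr(token), is_capitalised_word(token), is_word_or_number(token), is_other(token))
```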

### Exercise
In the next code cell we see code that determines the kinds of tokens found in a list. A token is a specific occurrence of a basic unit of lexical processing, typically a word or an item of punctuation.

- Study the programme, in particular the string methods. These are very useful in NLP.
- Experiment with the string methods using the empty cell until you understand how they work in special cases such as a single space and a single punctuation mark.
- The programme will only assign one feature to each token. Are there any cases where more than one feature should be assigned?

In [None]:
def make_tokens(input_text):
    """
    Take an input text, split it into tokens, find the
    token's shape, make a feature
    vector with the token itself and its shape, return
    a list of all token feature vectors found in the input.
    :param input_text: A character string containing spaces
    :return: A list of token feature vectors (token, shape).
        Sample output: [('a', 'alpha'), ('7', 'digit'), ('A27', 'alnum')]
    """
    
    # Here we define a token as being delimited by a whitespace:
    
    tokens = input_text.split()
    return map(make_token_feature_vector, tokens)


def make_token_feature_vector(token):
    """
    Given a token, extract its shape and return a
    vector with the token itself and its shape
    :param token: A character string
    :return: A tuple (token, shape)
    """
    
    if token.isalpha():
        return (token, "alpha")
    elif token.isdigit():
        return (token, "digit")
    elif token.isalnum():
        return (token, "alnum")
    elif token in ",:;":  
        return (token, "punctuation")
    elif token in ".!?":  
        return (token, "sentence_end")
    elif token == "\n":  
        return (token, "paragraph_end")
    else:
        return (token, "other")


for token in make_tokens(sample_text):
    print(token)

## List comprehension

List comprehensions can be used to create a list of squares.


In [None]:
[x**2 for x in range (4)]

In [None]:
squares = [x*x for x in range(4)]
type(squares)

In [None]:
len(squares)

List comprehensions can be used to create a list of decorated characters.

In [None]:
["*" + char + "*" for char in "Hello"]

List comprehensions can be used to create a list of even numbers.


In [None]:
[double(n) for n in range(4)]

The following function, `is_even`, returns `True` for even numbers and `False` otherwise.

In [None]:
# Remember that the modulo operator % returns the remainder after integer division.
# An even n gives remainder 0, which counts as False, so `not n % 2` is True.
def is_even(n):
    return not n % 2

In [None]:
is_even(8)

In [None]:
is_even(7)

List comprehensions can also filter. Here our `is_even` function selects the even numbers below 15, and the resulting list holds their squares.

In [None]:
[square(n) for n in range(15) if is_even(n)]
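One way to read a list comprehension is as shorthand for a loop that appends to a list. This sketch builds the same list both ways. The `if` clause does the filtering, and the expression at the front does the transforming.

```python
def square(n):
    return n**2

def is_even(n):
    return not n % 2

# The list comprehension...
comprehension = [square(n) for n in range(15) if is_even(n)]

# ...builds the same list as this explicit loop:
loop = []
for n in range(15):
    if is_even(n):              # the `if` clause filters
        loop.append(square(n))  # the expression at the front transforms

print(comprehension)          # [0, 4, 16, 36, 64, 100, 144, 196]
print(comprehension == loop)  # True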

### Exercise
In the blank cell below create a list of the odd numbers in the range 0-20.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/odd_list

### Exercise
In the blank cell below create a list of numbers in the range 0-20 that are both odd AND divisible by 3.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/odd_div_by_three

We now take one last look at the code that counts the number of sentences in each paragraph of a text.

This version uses `map` to apply a function to every element of a list. The advantage is that it is no longer necessary to initialise a list and append elements to it in a loop.

In [None]:
def count_sentences_per_paragraph(input_text):
    """
    Given an input text:
     - assign a number to each paragraph,
     - count the number of sentences in each paragraph,
     - output a list of all paragraph numbers together
       with the number of sentences in it.

    :param input_text: A character string possibly containing
                        periods "." to separate sentences and
                        paragraph marks "\n" to separate
                        paragraphs.
    :return: A list of ordered pairs (tuples) where the first
            element of the pair is the paragraph number and
            the second element is the number of sentences in
            that paragraph.
            Sample output: [(0, 1), (1, 3), (2, 3), (3, 1)]
    """
    
    paragraphs = input_text.split("\n")
    
    # Apply the count_sentences function to every element of paragraphs;
    # map returns an iterator over the results, which we call sentence_counts:
    
    sentence_counts = map(count_sentences, paragraphs)
    paragraph_numbers = range(len(paragraphs))
    sentences_per_paragraph = zip(paragraph_numbers, sentence_counts)
    return sentences_per_paragraph

def count_sentences(paragraph):
    """
    A sentence is a character string delimited by a period "."
    Given an input paragraph, return the number of sentences
    in it.
    :param paragraph: Character string with sentences.
    :return: number of sentences in the input paragraph
    """
    
    sentences = paragraph.split(".")
    return len(sentences)

for para, count in count_sentences_per_paragraph(sample_text):
    print("paragraph {0} contains {1} sentence(s)".format(para,count))

The use of `map` in the code above is considered acceptable Python style. However, `map` can also produce more complicated code that is difficult to read and would be considered poor style. In such cases it is good practice to use a list comprehension instead.
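To illustrate the readability point, here are two ways of computing the squares of the even numbers below 15, the same calculation as earlier in this notebook. The first nests `map` inside the built-in `filter` function, which keeps only the items for which a function returns a true value. `filter` has not been introduced above, so treat this as a side illustration.

```python
def square(n):
    return n**2

def is_even(n):
    return not n % 2

# Nesting map and filter works, but has to be read inside-out:
with_map = list(map(square, filter(is_even, range(15))))

# The comprehension reads left to right and says the same thing:
with_comprehension = [square(n) for n in range(15) if is_even(n)]

print(with_map == with_comprehension)  # True
```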

### Exercise
Make a copy of the code cell above and move it to be below this cell. Then adapt the code to use a list comprehension instead of `map`.

In [None]:
# uncomment the next line and then run the cell to load a solution
# %load ../Solutions/0/count_sentences_per_paragraph_list_comp

### Exercise
There is a problem with the code in the cell above. The same problem exists in all our versions of this programme. 

- First look at the code and see if you can see where the problem lies. 
- Next, if you haven't found it, do some experimenting with the `split` method. Do you really understand how it works?
- Try loading the file `sample_corpus_2.txt` and running the programme on it. It is best to carry out a separate experiment in a new cell.
- Study the input and output until you understand what the problem was. 


### Lazy generators
We now introduce lazy generators, an important kind of function in Python. A lazy generator does not calculate all of its results at once. It produces them one at a time, as they are requested during iteration. The `enumerate` function we saw earlier works in the same lazy way.

You can define a generator function by using `yield` instead of `return`. Calling such a function does not run its body straight away. Instead, it returns a generator object. Each time the generator is asked for a value (for example, by the next iteration of a `for` loop), the body runs until it reaches a `yield`. At that point it hands over the yielded value, suspends execution without terminating, and returns control to the caller. The next time a value is requested, it resumes from exactly where it left off. A generator does not need to have a single `yield`: it can yield from one place the first time and from another place the next time.

The cell below shows a simple function written in both forms so that you can see the difference. Notice that you cannot use the results in the same way. A returned result is passed back directly as a value. Calling the generator version instead gives you a generator object, and to get at the yielded values you must iterate over it, for example with a `for` loop or by passing it to `list`.

In [None]:
def return_count_to_ten():
    return range(1,11)


def yield_count_to_ten():
    for i in range(1, 11):
        yield i

        
l = return_count_to_ten()
print(l)
    
i = yield_count_to_ten()
print ('yield')
print(i)

l = list(yield_count_to_ten())
print(l)

for i in yield_count_to_ten():
    print(i)
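A generator can also be stepped through by hand with the built-in `next` function, which asks for one value at a time. The sketch below uses a made-up generator with two `yield` statements in different places, plus `print` calls that show exactly when each part of the body runs. Passing a second argument to `next` supplies a default to return once the generator is exhausted, instead of raising `StopIteration`.

```python
def two_greetings():
    # Hypothetical generator, used only to show when each part of the body runs.
    print("starting")      # runs when the first value is requested
    yield "hello"
    print("resuming")      # runs when the second value is requested
    yield "goodbye"
    print("finished")      # runs when a third value is requested

gen = two_greetings()          # nothing is printed yet: the body has not started
first = next(gen)              # prints "starting", returns "hello"
second = next(gen)             # prints "resuming", returns "goodbye"
third = next(gen, "no more")   # prints "finished", then returns the default

print(first, second, third)    # hello goodbye no more
```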


The previous programme delimited tokens by looking for spaces between them. You should have noticed that it doesn't work very well, because punctuation attached to a word (as in `alphabetic,`) ends up inside the token. We need a better way to do this and, ideally, a separate function to do it.

Because the code is hard to follow, here is a summary of the logic of the new function, `split_tokens(input_text)`.

The function reads the whole string one character at a time, adding ordinary characters to the `token` variable.
- When it encounters a delimiter (one of the punctuation characters or the newline listed in `DELIMITERS`), it first yields the token collected so far, if there is one. It then yields the delimiter character itself, because an item of punctuation is a token in its own right.
- When it encounters a space, it yields the token collected so far, if there is one, but does not yield the space.
- After yielding a token, it resets the `token` variable to an empty string.


In [None]:
def make_tokens(input_text):
    """
    Take an input text, split it into tokens, find the
    token's shape, make a feature
    vector with the token itself and its shape, return
    a list of all token feature vectors found in the input.
    :param input_text: A character string containing spaces
    :return: A list of token feature vectors (token, shape).
        Sample output: [('a', 'alpha'), ('7', 'digit'), ('A27', 'alnum')]
    """
    
    # Now it's up to the split_tokens function to decide what a token is.
    # A list comprehension creates a list by extracting elements from
    # an iterable object. Because split_tokens uses the "yield" statement,
    # calling it returns a generator object, which Python can iterate over:
    
    tokens = [token for token in split_tokens(input_text)]
    return map(make_token_feature_vector, tokens)


def split_tokens(input_text):
    """
    This function decides how to delimit a token. It takes an input
    string, iterates over it character by character; it collects
    constituent characters in the output token; punctuation characters
    are considered delimiters therefore become tokens of their own; the
    space character is removed from tokens. Yield each found token at
    a time.
    :param input_text: A character string containing a mix of text and delimiter characters.
    :yield A character string which is either free from delimiters or
        is a delimiter itself.
    """

    DELIMITERS = ",:!?.\n"
    token = ""
    for char in input_text:
        if char in DELIMITERS:  # test if the input character is a delimiter (substring presence)
            
            # Character strings, lists, etc, have a logical truth value in Python;
            # an empty string is False, if it has characters it is True.
            
            if not token:  # same as token == ""
                yield char
            else:
                
                # Return token to the calling program, but next time this function
                # is called, continue from
                # the next statement rather than from the beginning of the function:
                
                yield token  # After yielding control to the calling program,
                             # this function will execute the next statement:
                token = ""  # Pick up execution from here.
                yield char
        elif char == " ":
            if token:  # same as token != ""
                yield token
                token = ""
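        else:
            token += char

    # If the text does not end with a delimiter or a space, the last token
    # is still held in the variable, so yield it too:
    if token:
        yield token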

for token in make_tokens(sample_text):
    print(token)

Notice how the function `split_tokens` yields each token instead of returning a result. This means that each time the next token is requested, it continues from the point where it last stopped.

### Exercise
In the empty cell below try calling the function `split_tokens` on `sample_text`. What happens?

Notice that the programme does not simply call `split_tokens`. It places the call inside a list comprehension, which iterates over the generator that the call returns. Another common way to collect the yielded values would be with a `for` loop.
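The common ways of collecting the values a generator yields are sketched below. `words_in` is a hypothetical stand-in for `split_tokens`, kept deliberately simple so the collecting code is easy to see.

```python
def words_in(text):
    """Hypothetical generator: yields the whitespace-separated words of text."""
    for word in text.split():
        yield word

text = "the cat sat"

# 1. A list comprehension (as make_tokens does with split_tokens):
by_comprehension = [word for word in words_in(text)]

# 2. A for loop that appends to a list:
by_loop = []
for word in words_in(text):
    by_loop.append(word)

# 3. Passing the generator straight to list():
by_list = list(words_in(text))

print(by_comprehension == by_loop == by_list)  # True
```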

## Pandas dataframes
We will be using tables in various ways later in the module. We now look at how to store tables as Pandas dataframes. 

If you want more details, a good starting point is [10 Minutes to Pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html).

First, let's create some data to put in the table. These are meant to be the results of an experiment that we have undertaken.

To do this we create a list of tuples, where each tuple is a row in the table.
- We use `display` rather than `print` as it produces a nicer looking table.

Run the cell and make sure you understand the code.

In [None]:
results = [
    (10,0.674),
    (20,0.708),
    (30,0.721),
    (40,0.744),
    (50,0.748),
    (60,0.759),
    (70,0.762),
    (80,0.769),
    (90,0.773),
    (100,0.775)]
df = pd.DataFrame(results,columns = ["Sample Size","Accuracy"])
display(df)

### Making a table from columns
We now create the same dataframe in a different way. This time we specify the contents by giving a list for each column.
- The column lists are `zip`ped together to create the same list of tuples we saw above, one tuple for each row of the table.
- `zip` returns an iterator of tuples, so we pass it to `list` to obtain the list of tuples.

In [None]:
sample_sizes = list(range(10,110,10))
scores = [0.674,0.708,0.721,0.744,0.748,0.759,0.762,0.769,0.773,0.775]
df = pd.DataFrame(list(zip(sample_sizes,scores)),columns = ["Sample Size","Accuracy"])
display(df)
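Pandas also accepts a dictionary that maps each column name to its list of values, which avoids the `zip` step altogether. This sketch builds the table both ways and checks with `DataFrame.equals` that the results match. It uses `print` rather than `display` so that it also runs outside a notebook.

```python
import pandas as pd

sample_sizes = list(range(10, 110, 10))
scores = [0.674, 0.708, 0.721, 0.744, 0.748, 0.759, 0.762, 0.769, 0.773, 0.775]

# Row-wise: zip the columns into one tuple per row, then name the columns.
df_rows = pd.DataFrame(list(zip(sample_sizes, scores)), columns=["Sample Size", "Accuracy"])

# Column-wise: a dict maps each column name to its list of values.
df_cols = pd.DataFrame({"Sample Size": sample_sizes, "Accuracy": scores})

print(df_rows.equals(df_cols))    # True
print(df_cols["Accuracy"].max())  # 0.775
```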

### Plotting data in a dataframe
In the following cell we see how to plot the dataframe containing our pretend experimental results.
- Note that some of the settings are determined by code in the first cell of the notebook.
- `x=0` indicates that the first column of the data provides the values on the x-axis.
- See [pandas.DataFrame.plot](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html) for more details.

In [None]:
ax = df.plot(kind="bar",x=0,legend=False,title="Experimental Results",yticks=(0.6,0.65,0.7,0.75,0.8))
# set the x-axis label
ax.set_xlabel("Sample Size")
# set the y-axis label
ax.set_ylabel("Accuracy")
# set the y axis range 
ax.set_ylim(0.6,0.8)

Suppose we have results for two competing methods. 

We will have three rather than two columns in our dataframe:
- the first column holds the sample size
- the second column holds one set of results
- the third column holds a second set of results

Run the cell below.

In [None]:
sample_sizes = list(range(10,110,10))
your_results = [0.674,0.708,0.721,0.744,0.748,0.759,0.762,0.769,0.773,0.775]
my_results = [0.774,0.788,0.801,0.844,0.852,0.855,0.860,0.862,0.863,0.864]

df = pd.DataFrame(list(zip(sample_sizes,your_results,my_results)),columns = ["Sample Size","Your Score","My Score"])
display(df)

Now we show how to visualise these results.
- This time we want a legend.
- We also need to expand the range shown on the y-axis, since some of the new results are above 0.8.

Run the following cell.

In [None]:
ax = df.plot(kind="bar",x=0,title="Experimental Results",yticks=(0.6,0.65,0.7,0.75,0.8,0.85,0.9))
# set the x-axis label
ax.set_xlabel("Sample Size")
# set the y-axis label
ax.set_ylabel("Accuracy")
# set the y axis range 
ax.set_ylim(0.6,0.9)

The next cell shows a related Pandas structure, the `Series`: a one-dimensional array with a label for each value. When a Series is created from a dictionary, the keys become the labels (the index). A Series can be plotted in much the same way as a dataframe.

In [None]:
data = {"a": 1., "b":2.,"c":3.}
s = pd.Series(data)
print(s)
s.plot.bar()

### Running a python program
We now look at the differences between three ways of running a Python program.

The first is the way used in the above examples: simply typing or pasting the code into a notebook (or console) and running it.

Very similar to the first way is to import the code from a file or module into a notebook (or console). When you import a module, Python automatically runs it: it reads every line in the file and executes it. If the module contains function definitions, executing them means creating the functions. If it contains code that calls functions, Python will make those calls and run the functions. Within a single session, a module is only run the first time it is imported. Later imports reuse the module that is already loaded.

The third way is to run the module from the command line by typing `python` followed by the file name, including the `.py` suffix (for example, `python Exercise.py`).

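Python behaves the same way for the second and third methods: the whole file is executed. However, it is often useful to have a module that runs its main code when started from the command line but not when it is imported. This lets you import its functions, and perhaps some variables, without running anything. To achieve this, modules often include the line `if __name__ == "__main__":`, as in the cell below.

Python sets the special variable `__name__` to `"__main__"` when a file is run directly (and also in a notebook), but to the module's name when the file is imported. So the indented code under this line runs when the file is called from the command line, but not when it is imported.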
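The sketch below demonstrates this behaviour end to end. It writes a tiny hypothetical module, `demo_module.py`, to a temporary directory and then uses it in two ways. First it runs the file with the current Python interpreter via `subprocess`, and the guarded line is printed. Then it imports the file with `importlib`, and nothing is printed. The module name and its contents are assumptions made only for this illustration.

```python
import contextlib
import importlib
import io
import subprocess
import sys
import tempfile
from pathlib import Path

# A hypothetical module, written to a temporary file for this demonstration.
MODULE_SOURCE = '''
def greet():
    return "hello"

if __name__ == "__main__":
    print("running as a script")
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_module.py"
    path.write_text(MODULE_SOURCE)

    # Third way: run the file with the Python interpreter.
    # Inside the file __name__ is "__main__", so the guarded block runs.
    result = subprocess.run([sys.executable, str(path)],
                            capture_output=True, text=True)
    script_output = result.stdout

    # Second way: import it. Now __name__ is "demo_module",
    # so the guarded block is skipped and only greet() is defined.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new file is noticed
    try:
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            demo_module = importlib.import_module("demo_module")
        import_output = captured.getvalue()
    finally:
        sys.path.remove(tmp)

print(repr(script_output))                       # 'running as a script\n'
print(repr(import_output))                       # ''
print(demo_module.__name__, demo_module.greet()) # demo_module hello
```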

The cell below contains the programme for the tokens exercise. It is also stored in a file named `Exercise.py`. You don't need to read the code, as nothing has changed apart from one line added for testing, which appears only in the saved file.

In [None]:
def make_tokens(input_text):
    """
    Take an input text, split it into tokens, find the
    token's shape, make a feature
    vector with the token itself and its shape, return
    a list of all token feature vectors found in the input.
    :param input_text: A character string containing spaces
    :return: A list of token feature vectors (token, shape).
        Sample output: [('a', 'alpha'), ('7', 'digit'), ('A27', 'alnum')]
    """
    
    # Now it's up to the split_tokens function to decide what a token is.
    # A list comprehension creates a list by extracting elements from
    # an iterable object. Because split_tokens uses the "yield" statement,
    # calling it returns a generator object, which Python can iterate over:
    
    tokens = [token for token in split_tokens(input_text)]
    return map(make_token_feature_vector, tokens)


def make_token_feature_vector(token):
    
    """
    Given a token, extract its shape and return a
    vector with the token itself and its shape
    :param token: A character string
    :return: A tuple (token, shape)
    """
    
    if token.isalpha():
        return (token, "alpha")
    elif token.isdigit():
        return (token, "digit")
    elif token.isalnum():
        return (token, "alnum")
    elif token in ",:;":  
        return (token, "punctuation")
    elif token in ".!?":  
        return (token, "sentence_end")
    elif token == "\n":  
        return (token, "paragraph_end")
    else:
        return (token, "other")



def split_tokens(input_text):
    
    """
    This function decides how to delimit a token. It takes an input
    string, iterates over it character by character; it collects
    constituent characters in the output token; punctuation characters
    are considered delimiters therefore become tokens of their own; the
    space character is removed from tokens. Yield each found token at
    a time.
    :param input_text: A character string containing a mix of text and delimiter characters.
    :yield A character string which is either free from delimiters or
        is a delimiter itself.
    """
    
    # First decide what characters delimit a token:
    DELIMITERS = ",:!?.\n"
    
    token = ""
    for char in input_text:
        
        if char in DELIMITERS:  # test if the input character is a delimiter (substring presence)
            
            # Character strings, lists, etc, have a logical truth value in Python;
            # an empty string is False, if it has characters it is True.
            
            if not token:  # same as token == ""
                yield char
            else:
                
                # Return token to the calling program, but next time this function
                # is called, continue from
                # the next statement rather than from the beginning of the function:
                
                yield token  # After yielding control to the calling program,
                             # this function will execute the next statement:
                token = ""  # Pick up execution from here.
                yield char
        elif char == " ":
            if token:  # same as token != ""
                yield token
                token = ""
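        else:
            token += char

    # If the text does not end with a delimiter or a space, the last token
    # is still held in the variable, so yield it too:
    if token:
        yield token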
            
sample_text = "This is a sample sentence01 showing 7 different token types: alphabetic, numeric, alphanumeric, Title, UPPERCASE, CamelCase and punctuation!\nSentences like that should not exist. They're too artificial.\nA REAL sentence looks different. It has flavour to it. You can smell it; it's like Pythonic code, you know?\nHave you heard of 'code smell'? Google it if you haven't."            

if __name__ == "__main__":
    for token in make_tokens(sample_text):
        print(token)

### Exercise
Try the following.

1. Execute the cell above and look at what happens.

2. In the empty cell below execute:  
`import Exercise`  
Note the capital letter in the filename. 
It should not run the programme. 

To understand what has happened, run each of the following commands one at a time:  
`print(noone)`  
`print(Exercise.noone)`  
`from Exercise import noone`  
`print(noone)` 

The variable `noone` did not exist in the original programme (it was assigned in the test line that was added to the file).
- Notice the difference between the two types of import. Using the second type is more convenient as you don't have to specify the namespace to access functions and variables.
- For this reason people sometimes use the command  
`from module import *`  
However, this is dangerous, because it can silently overwrite names that already exist and Python will not warn you. Using the import command in this way is considered bad practice. You can sometimes get away with it when importing your own module, but avoid it with library modules.
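The danger is easy to demonstrate with the standard `math` module, which defines its own `pow` function. The built-in `pow` returns an integer for integer arguments. After `from math import *`, however, the name `pow` silently refers to `math.pow`, which always returns a float, and nothing warns you that the built-in has been replaced. Because this sketch really does replace `pow`, run it in a fresh notebook or script.

```python
# Before the wildcard import, pow is Python's built-in function,
# which returns an int for int arguments.
before = pow(2, 3)

from math import *  # brings in every public name in math, including math.pow

# pow now silently refers to math.pow, which always returns a float.
after = pow(2, 3)

print(before, after)  # 8 8.0
```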



A note on terminology: the word "parse" can simply mean to read and process input sequentially. In NLP it also has the specific meaning of analysing text to determine its syntactic structure. To avoid confusion, please be aware that this exercise uses the first meaning.

The programme below constructs a nested list by reading some input text and looking for delimiters.
- First it runs our `make_tokens` function.
- Then it reads one token at a time to construct sentences. Each sentence is a list.
- When the end of a sentence is reached, a new empty list is created and the next sentence is read.
- When it reaches the end of a paragraph, all the sentences in that paragraph are kept together in a list, and a new paragraph is started.

There are various ways this could be done. The method below uses generators, which we have seen before. Results are delivered using the `yield` statement instead of `return`, so the function does not exit but resumes from the same place the next time a value is requested.
- Using generators is often a good way to write clear, simple code.
- Another advantage of generators is that data is processed only as it is needed. This makes it possible to handle very large inputs that would use up too much memory if they were built as complete lists.
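To see why delivering results on demand matters for large inputs, consider an extreme case: a hypothetical generator that never stops. Building a list of all its output would run forever. However, `itertools.islice` from the standard library asks the generator for only as many items as it needs.

```python
from itertools import islice

def numbered_lines():
    """Hypothetical generator: yields an endless stream of numbered lines."""
    n = 0
    while True:
        n += 1
        yield "line " + str(n)

# A list of every line could never be built, but a generator only produces
# lines when asked, so we can take just the first three.
first_three = list(islice(numbered_lines(), 3))
print(first_three)  # ['line 1', 'line 2', 'line 3']
```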

### Exercise
Execute the cell below and study the code until you understand how it works.

In [None]:
def parse_text(input_text):
    """
    A parsed text is defined as a list of parsed paragraphs.
    Given an input text, parse its paragraphs and return a list
    with the results.
    :param input_text: A character string with paragraphs
    :return: A list of parsed paragraphs
    """
    
    return [paragraph for paragraph in parse_paragraphs(input_text)]


def parse_paragraphs(input_text):
    """
    A parsed paragraph is defined as a list of parsed sentences.
    Given an input text, parse its sentences; if the sentence is
    actually the end of a paragraph, then yield the previous
    sentences packed as a list.
    :param input_text: a character string containing paragraphs
                       and sentences.
    :yield: A list of sentences up to the end of the paragraph.
    """
    
    paragraph = list()
    for sentence in parse_sentences(input_text):
        
        # We expect parse_sentences to return "paragraph_end"
        # when it encounters an end of paragraph mark.
        
        if sentence == "paragraph_end":
            yield paragraph
            paragraph = list()
        else:
            paragraph.append(sentence)
    yield paragraph


def parse_sentences(input_text):
    """
    A parsed sentence is defined as a list of token vectors
    :param input_text: a character string containing paragraphs,
                       sentences and token vectors.
    :yield: A list of token vectors up to the end of a sentence.
    """
    
    token_vectors = make_tokens(input_text)  
    sentence = list()
    
    # Since a token vector is a tuple (token, shape) we can unpack it
    # automatically as we iterate over the list of token vectors:
    
    for token, shape in token_vectors:
        if shape == "sentence_end":
            yield sentence
            sentence = list()
        elif shape == "paragraph_end":
            if sentence:
                yield sentence
                sentence = list()
            yield "paragraph_end"
        else:
            sentence.append((token, shape))
    if sentence:
        yield sentence



print("************************** SENTENCES IN THE PARSED TEXT:")
for sentence in parse_sentences(sample_text):
    print(sentence)
print("************************** PARAGRAPHS IN THE PARSED TEXT:")

for paragraph in parse_paragraphs(sample_text):
    print(paragraph)

print("************************** PARSED TEXT:")
print(parse_text(sample_text))

### Exercise
The programme in the cell below selects a character at random from the nested list generated by the previous programme. Run the cell and make sure you can understand what it is doing.

1. Recall that we defined a token vector to be an ordered pair (token, shape). Accessing the token or the shape with the code `token_vector[0]` or `token_vector[1]` is difficult to read. It is better to define the indices as named constants. By convention (see PEP 8), constants are given upper-case names and are defined at module level, i.e. in the global scope. Do you agree that this improves readability?

2. Notice how the code indexes into the nested list and then into the character string. The line indexing the character string could have been written as:  

`character = parsed_text[paragraph_coord][sentence_coord][token_coord][TOKEN][character_coord]`

Do you think this would have made the programme more readable?
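If you want to go one step further than index constants, the standard library's `collections.namedtuple` creates tuple types whose fields can also be read by name. The sketch below is only one possible alternative and is not used elsewhere in this notebook. The `TokenVector` name is hypothetical. Because a named tuple is still a tuple, indexing and unpacking such as `for token, shape in token_vectors` keep working.

```python
from collections import namedtuple

# A hypothetical alternative to the TOKEN/SHAPE index constants:
# fields can be accessed by name, but a TokenVector is still a tuple.
TokenVector = namedtuple("TokenVector", ["token", "shape"])

vector = TokenVector("Title", "alpha")
print(vector.token, vector.shape)  # Title alpha
print(vector[0] == vector.token)   # True: index access still works
token, shape = vector              # and so does tuple unpacking
```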

In [None]:
import random

TOKEN = 0  
SHAPE = 1

def get_random_character_coordinates_in_text(parsed_text):
    """
    Given a parsed text, such as the one produced by parse_text,
    return a random character within the text, together with its
    coordinates.
    :param parsed_text: A nested list with token vectors within
        sentence lists within paragraph lists.
    :return: A vector where the elements are: the random character,
        the paragraph, sentence, token and character coordinates.
        Sample output: ('f', 3, 1, 2, 1)
    """

    # Generate a random index within a valid range:
    
    paragraph_coord = random.randrange(len(parsed_text))
    sentence_coord = random.randrange(len(parsed_text[paragraph_coord]))
    token_coord = random.randrange(len(parsed_text[paragraph_coord][sentence_coord]))
    token = parsed_text[paragraph_coord][sentence_coord][token_coord][TOKEN]
    character_coord = random.randrange(len(token))
        
    
    # With the obtained random coordinates, access the input parsed text:
    
    character = token[character_coord]
    
    return character, paragraph_coord, sentence_coord, token_coord, character_coord


parsed_text = parse_text(sample_text)
for _ in range(10):
    print(get_random_character_coordinates_in_text(parsed_text))