## Side Note: Type Checking

Most of the time your code will know the data type implicitly, but it is useful to be able to set it explicitly, and to check it when you're unsure what type of data a variable holds.

In [None]:
print ( type('what type is this string?') )
print ( type(20) )
print ( type(20.0) )

our_list = [1,2,3]
print ( type(our_list) )


In some cases we can even change types...

In [None]:
number_string = '1250'
print(number_string)
print(type(number_string))

number_int = int(number_string)
print(number_int)
print(type(number_int))

In [None]:
# but conversions are tricky and must make sense: 'Hello!' is not a number, so this raises a ValueError
print (int('Hello!'))
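
The conversion above stops the code with a `ValueError`. A minimal sketch of one way to handle that case, assuming a hypothetical helper called `to_int_or_none` (it is not part of the original notebook):

```python
def to_int_or_none(text):
    """Return text converted to an int, or None if it cannot be converted."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as 'Hello!'
        return None

print(to_int_or_none('1250'))
print(to_int_or_none('Hello!'))
```

Returning `None` is only one possible design choice; in other situations it may be better to let the error stop the code so the problem is noticed.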

In [None]:
# we can use type checking to ensure that our code is running as expected.
# The `assert` statement checks that something is True and, if not, raises an `AssertionError` to stop the code.

assert type(number_string) is str
print('So far so good!')

assert type(number_int) is int
print('Another hurdle crossed!')

# Some more code happens and somehow number_int gets turned into a string!
number_int = str(number_int)

# number_int is now a string, so this assertion fails and the final print is never reached.
assert type(number_int) is int

print('Success!!')
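
As an aside, Python's built-in `isinstance` is a common alternative to comparing `type(...)` directly. The sketch below reuses the variable names from the cells above; the other names (`is_number`, `flag_is_int`, `exact_type_is_int`) are made up for illustration.

```python
number_string = '1250'
number_int = int(number_string)

# isinstance(value, some_type) returns True or False
assert isinstance(number_string, str)
assert isinstance(number_int, int)

# It also accepts a tuple of types, which is handy for "any kind of number"
is_number = isinstance(20.0, (int, float))
print(is_number)

# Unlike `type(x) is int`, isinstance also accepts subclasses.
# bool is a subclass of int, so the two checks disagree here.
flag_is_int = isinstance(True, int)
exact_type_is_int = type(True) is int
print(flag_is_int, exact_type_is_int)
```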

### Side Note: Chaining
Chaining is where the result of one operation is passed immediately to another operation.
For example, we could start with our `friends` string from earlier.

>friends = 'Rachel, Monica, Ross, Joey and Chandler'

Let's say we want to make everything lower case, split the string into a list by separating at each comma, then isolate the string 'joey and chandler', split again by spaces, and then select the word chandler. We could do this as a series of individual lines, assigning the result of each operation to a new variable, or we could chain the methods together.

Watch what happens as we add each method.

In [1]:
friends = 'Rachel, Monica, Ross, Joey and Chandler' # We'll rerun the variable assignment just to make sure.

In [2]:
friends.lower()

'rachel, monica, ross, joey and chandler'

In [3]:
friends.lower().split(',')

['rachel', ' monica', ' ross', ' joey and chandler']

In [4]:
friends.lower().split(',')[3]

' joey and chandler'

In [5]:
friends.lower().split(',')[3].split()

['joey', 'and', 'chandler']

In [6]:
joey = friends.lower().split(',')[3].split()[0]
joey

'joey'

In [7]:
# NOTE:  We did not need all those intermediary steps to achieve this result....
chandler = friends.lower().split(',')[3].split()[2]
chandler

'chandler'

In [8]:
# Note that this is purely to demonstrate chaining.
# A far more efficient approach would be to slice 'Chandler' off the end and then use the .lower() method.

friends[-8:].lower()

'chandler'

### Side Note Key Points
1. In programming the same ends can be achieved using a range of different means.
2. Chaining is a useful technique to save time and avoid making too many variables. 
3. However in Python, readability is highly valued. Sometimes it is better to have more lines of code that are intelligible to a human reader than to just have a single line of code that does 100 operations.
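
For comparison, here is a sketch of the same 'chandler' extraction written as individual lines rather than one chain. The intermediate variable names (`lowered`, `names`, `last_entry`, `words`) are illustrative choices, not names used elsewhere in the notebook.

```python
friends = 'Rachel, Monica, Ross, Joey and Chandler'

lowered = friends.lower()    # 'rachel, monica, ross, joey and chandler'
names = lowered.split(',')   # a list of four strings, split at each comma
last_entry = names[3]        # ' joey and chandler'
words = last_entry.split()   # ['joey', 'and', 'chandler']
chandler = words[2]

print(chandler)
```

This takes more lines than the chained version, but each step has a name, which can make the code easier to read and to debug.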

<a id='dictionaries'></a>

## 6. Dictionaries
Lists are useful because they provide an ordered collection of values. The value of dictionaries is that they pair keys with values. This allows you to refer to a dictionary's key and get back the associated value. They are helpful because they provide a label for a value, and they can be used to ensure pairs of values remain linked together.

<img src="https://raw.githubusercontent.com/Minyall/sc207_materials/master/images/dictionary.png">



In [None]:
# Dictionaries can be initialised in a few ways, here are two approaches...

my_empty_dictionary = dict()
my_empty_dictionary_bracket = {}

In [None]:
# we would normally initialise with values

test_dict_a = dict(town='Colchester', firstname='Joe', surname='Joeson')
test_dict_b = {'town': 'Colchester', 'firstname':'Joe' ,'surname': 'Joeson'}

In [None]:
# We can look at a whole dictionary
print (test_dict_a)

In [None]:
# and you can recall individual values by giving their keys in square brackets

print(test_dict_a['town'])
print(test_dict_b['surname'])
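
Square brackets raise a `KeyError` if the key is not in the dictionary. A short sketch of the built-in `.get()` method, which returns `None` or a default of your choosing instead. The variable names `pet`, `pet_or_default` and `town` are made up for this example.

```python
test_dict_a = dict(town='Colchester', firstname='Joe', surname='Joeson')

# Square brackets fail loudly when the key is missing
try:
    test_dict_a['pet']
except KeyError as error:
    print('Missing key:', error)

# .get() returns None for a missing key...
pet = test_dict_a.get('pet')
# ...or a default value if you supply one as the second argument
pet_or_default = test_dict_a.get('pet', 'no pet recorded')
# For keys that do exist it behaves like square brackets
town = test_dict_a.get('town')

print(pet, pet_or_default, town)
```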

In [None]:
# we can check whether keys are in a dictionary

print ('town' in test_dict_a)
print ('pet' in test_dict_b)

In [None]:
# and we can add to an existing dictionary easily...

test_dict_a['pet'] = 'Horse'

In [None]:
print(test_dict_a)
print ('pet' in test_dict_a)

In [None]:
# and remove values as well using the del statement and providing the dictionary and key

del test_dict_a['pet']

print(test_dict_a)
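
If you want to keep the value you are removing, the built-in `.pop()` method is an alternative to `del`. This is a small sketch; the names `removed` and `missing` are illustrative.

```python
test_dict_a = dict(town='Colchester', firstname='Joe', surname='Joeson', pet='Horse')

# pop removes the key and hands back its value
removed = test_dict_a.pop('pet')
print(removed)
print('pet' in test_dict_a)

# A second argument is returned if the key is not there, instead of raising a KeyError
missing = test_dict_a.pop('pet', None)
print(missing)
```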

### Side Note: Dictionary Order

For most of Python's history there was no guarantee that a dictionary would retain its order. Just because you added a particular `key:value` pair last, it didn't mean it would stay at the end of the dictionary.

People who built a dictionary whose values were meant to match the order of a list often got nasty surprises later when they tried to match the list to the dictionary by position.

In CPython 3.6 dictionaries began to retain insertion order as an implementation detail, and since Python 3.7 this __has__ been a guaranteed part of the language. Even so, the community often treats them as if they were unordered, partly for backwards compatibility with older scripts and partly because of conventions that built up around the earlier limitation.
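
A small sketch of what retained order looks like in practice, assuming Python 3.7 or later (where insertion order is guaranteed by the language). The dictionary and variable names here are made up for illustration.

```python
ordered = {}
ordered['first'] = 1
ordered['second'] = 2
ordered['third'] = 3

# list(a_dictionary) gives the keys in the order they were added
keys_in_order = list(ordered)
print(keys_in_order)

# Deleting a key and adding it again moves it to the end
del ordered['first']
ordered['first'] = 1
keys_after_readd = list(ordered)
print(keys_after_readd)
```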

In [None]:
# dictionaries can hold all sorts of data types - it's often easier to spread them across multiple lines

test_dict_b = {'town': 'Colchester', # make sure you separate value entries with a comma
               'firstname':'Joe' ,
               'surname': 'Joeson',
               'age':32,
               'friend_list': ['Donald','Theresa','Kim'],
               'clever_function': lambda x: x*2}

for key, value in test_dict_b.items():
    print('***')
    print(key)
    print(value)
    
test_dict_b['clever_function'](2)

In [None]:
# One use of lists in dictionaries is to create a labelled set of data imagining the keys as the column headers,
# and the list under each key the row values for that column. The lists will retain the row order.

firstname = ['Donald','Theresa','Kim']
surname = ['Trump','May','Jong-un']

data = {'first_name': firstname,
        'surname': surname}

data

# Dictionaries have other functions, and are often used as part of much larger data structures. 
# For our purposes 'Pandas' is a much better way to handle data in Python, which we'll cover in a later session.
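
Because the lists keep their row order, the columns can be stitched back into rows with the built-in `zip`. This is only a sketch of the idea; the names `rows`, `first` and `last` are illustrative.

```python
firstname = ['Donald', 'Theresa', 'Kim']
surname = ['Trump', 'May', 'Jong-un']

data = {'first_name': firstname,
        'surname': surname}

# zip pairs up the items at the same position in each list
rows = list(zip(data['first_name'], data['surname']))

for first, last in rows:
    print(first, last)
```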

# *Exercises: Dictionaries*

In [None]:
# 1. Initialise a new dictionary with three keys, `first_name`, `surname`, and `favourite_animal`. Choose some values.



In [None]:
# 2. Add another key to your dictionary called `secret_plan` and provide a value. 
# Print the dictionary to check it worked.


In [None]:
# 3. Delete your secret plan from your dictionary....! Print to check you're safe.



<a id='task'></a>


# 9. Final Task - Create your own word frequency script.
<img src="https://raw.githubusercontent.com/Minyall/sc207_materials/master/images/alice.png" align="right">


Your task, using everything you have been shown above, is to create a script that can take a text of any length and count how many times each unique word occurs in it. You will need two functions.

1. The first should take our `test_string`, make everything lowercase, remove any punctuation and return a list of individual word tokens. To help you we have provided a `remove_punctuation` function already (see if you can work out how it works).

2. The second function should take a list of string word tokens (the output of the first function) and return a dictionary where each key is a unique word and each value is a count of how many times that word occurs.

Once you've created this dictionary you can use the `print_sorted_dict` function provided to see what words occur the most.

### Tips:
- Make sure you run the two cells with the `test_string` and the helper functions.
- Remember you can make new cells by...
    - Going into command mode by clicking to the left of a cell (making the coloured bar turn blue) and then...
    - tapping 'a' or 'b' on your keyboard to create a new cell either above or below.
- If you need to delete a cell go into command mode and then double tap 'd' on the keyboard.

In [None]:
test_string = "Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down "\
"to look about her and to wonder what was going to happen next. First, she tried to look down and make out what she "\
"was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they "\
"were filled with cupboards and book-shelves; here and there she saw maps and pictures hung upon pegs. She took down "\
"a jar from one of the shelves as she passed; it was labelled ‘ORANGE MARMALADE’, but to her great disappointment it "\
"was empty: she did not like to drop the jar for fear of killing somebody, so managed to put it into one of the "\
"cupboards as she fell past it. ‘Well!’ thought Alice to herself, ‘after such a fall as this, I shall think "\
"nothing of tumbling down stairs! How brave they’ll all think me at home! Why, I wouldn’t say anything about "\
"it, even if I fell off the top of the house!’ (Which was very likely true.) Down, down, down. Would the fall "\
"never come to an end! ‘I wonder how many miles I’ve fallen by this time?’ she said aloud. ‘I must be getting "\
"somewhere near the centre of the earth. Let me see: that would be four thousand miles down, I think’ (for, "\
"you see, Alice had learnt several things of this sort in her lessons in the schoolroom, and though this was not "\
"a very good opportunity for showing off her knowledge, as there was no one to listen to her, still it was good "\
"practice to say it over) ‘yes, that’s about the right distance but then I wonder what Latitude or Longitude "\
"I’ve got to?’ (Alice had no idea what Latitude was, or Longitude either, but thought they were nice grand words "\
"to say.)"

In [None]:
# Some helper functions for you...
from string import punctuation
punctuation = [char for char in punctuation]
punctuation.extend(["‘","’"])

def remove_punctuation(text):
    return ''.join([char for char in text if char not in punctuation])

def print_sorted_dict(your_dictionary):
    for word in sorted(your_dictionary, key=your_dictionary.get, reverse=True):
        print(f'{word}: {your_dictionary[word]}')
    return
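
To get a feel for the two ideas the helpers rely on, here is a sketch on toy data. The names `toy_sentence`, `toy_counts`, `without_punctuation` and `words_by_count` are hypothetical and not part of the task.

```python
from string import punctuation

# 1. Keeping only the characters that are not punctuation
toy_sentence = 'Down, down, down!'
without_punctuation = ''.join([char for char in toy_sentence if char not in punctuation])
print(without_punctuation)

# 2. Sorting a dictionary's keys by their values, largest first.
#    key=toy_counts.get tells sorted() to look up each word's count.
toy_counts = {'rabbit': 2, 'alice': 5, 'well': 3}
words_by_count = sorted(toy_counts, key=toy_counts.get, reverse=True)
print(words_by_count)
```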

In [None]:
# Function 1 Here

In [None]:
# Function 2 Here

In [None]:
cleaned_text = #output of function 1 on test_string
my_counter = #output of function 2 on cleaned_text

In [None]:
print_sorted_dict(my_counter)

In [None]:
# in the case above the values for sentences and keywords must be passed to the function.
# however you can also have optional values to create more complex systems.

# In this example we have the lower_output keyword. Keywords must have a default value defined, in this case False 

def sentence_filter_lower_option(sentences, filter_words, lower_output=False): 
    filtered = []
    for word in filter_words:
        for sent in sentences:
            if word in sent:
                if lower_output:
                    sent = sent.lower()
                filtered.append(sent)
    return filtered

In [None]:
print(sentence_filter_lower_option(sentences, ['Trump']))
print(sentence_filter_lower_option(sentences, ['Trump'], lower_output=True))
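
The same pattern of required and optional arguments, shown on a smaller self-contained sketch. The function `greet` and its parameters are hypothetical and exist only for illustration.

```python
def greet(name, greeting='Hello', shout=False):
    """Build a greeting; `greeting` and `shout` are optional keywords with defaults."""
    message = f'{greeting}, {name}!'
    if shout:
        message = message.upper()
    return message

default_call = greet('Joe')                                # both defaults used
keyword_call = greet('Joe', shout=True)                    # override one keyword
both_call = greet('Joe', greeting='Goodbye', shout=True)   # override both

print(default_call)
print(keyword_call)
print(both_call)
```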

In [None]:
# you can use lambda to make a quick function on the fly which can often be useful

tiny_percent_func = lambda fraction, total: (fraction/total)*100 # often a full func is more readable.

tiny_percent_func(6, 20)
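
For comparison, a sketch of the same calculation written as a full function. The name `percent` is an illustrative choice.

```python
def percent(fraction, total):
    """Return `fraction` as a percentage of `total`."""
    return (fraction / total) * 100

tiny_percent_func = lambda fraction, total: (fraction / total) * 100

result = percent(6, 20)
print(result)

# Both versions perform exactly the same calculation
print(result == tiny_percent_func(6, 20))
```

The full function has room for a docstring and shows its own name in error messages, which is part of why it is often the more readable option.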

### Indexing Strings
Each character in a string is also assigned an 'index' number, allowing you to refer to positions in the string. This means you can *slice* the string to extract certain parts of it. This creates a new string and does not change the original variable.

In [None]:
our_string = "Please don't slice me!"

print(our_string[0])
print(our_string[:10])
print(our_string[10:])

print(our_string)

<img src="https://raw.githubusercontent.com/Minyall/sc207_materials/master/images/hello.png">

Indexes start at 0, and if you use a range Python will return every character **up to but not including** the upper number. You can also select characters from the end of a string in reverse by using a negative index number.

Indexing can be used to return characters at specific positions.

'Hello'[0]
> H

'Hello'[1]
> e

An example of an index range or 'slice' would be 'Hello'[1:4], which returns the characters from index 1 (the 2nd character, remember 0 indexing) up to, *but not including*, index 4 (the 5th character).

> ell

If we wanted to include the character at index 4 as well, we would use 'Hello'[1:5]
> ello

If an index value is left blank it is interpreted as either the beginning or end of the string.

'Hello'[:3]
> Hel

'Hello'[3:]
> lo

In [None]:
# [Start Index:End Index]

print( 'Hello'[0] ) # Return just character 0
print( 'Hello'[1:] ) # Beginning from character 1 until the end of the string
print( 'Hello'[:4] ) # From the beginning of the String up to but not including character 4
print( 'Hello'[2:4] ) # Beginning with character 2, and ending up to but not including character 4

In [None]:
print( 'Hello' [-4:])
print( 'Hello' [-3:-1])
print( 'Hello' [-5:])
print( 'Hello' [:-2])
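
Negative indexes count back from the end of the string, so -1 is the last character. The sketch below repeats the slices from the cell above with the result of each noted alongside; the variable names are made up for illustration.

```python
word = 'Hello'

last_char = word[-1]           # 'o'     the final character
last_four = word[-4:]          # 'ello'  from 4 characters before the end, to the end
middle = word[-3:-1]           # 'll'    stops before the final character
whole = word[-5:]              # 'Hello' -5 reaches back to the very first character
without_last_two = word[:-2]   # 'Hel'   everything except the final two characters

print(last_char, last_four, middle, whole, without_last_two)

# Slicing always builds a new string, so the original is unchanged
print(word)
```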