# Python basics

## Getting help

To get help from the interactive console, pass the name of any object (function, variable, etc.) to the ``help()`` function:

In [1]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



If the above gives a ``SyntaxError``, you are using Python 2 rather than Python 3.x. In this case, put this line at the top of your scripts / modules:

In [2]:
from __future__ import print_function

Then you can call `help(print)` as above.

If you are using IPython/Jupyter, you can append a ``?`` to an object's name, like this:

In [3]:
print?

This brings up the "docstring" attached to an object (variable, function, class, ...).

## Variables

Variables store things. Python has these basic variable types: ``int`` (integer), ``float``, ``str`` (string), ``list``, ``tuple``, ``dict`` (dictionary), ``set``.

In [4]:
# How to create new variables:
myint = 5
myfloat = 7.1
mystr = 'au'
mylist = [1, 3, 5]
mytuple = (1, 3, 5)
mydict = {'au': 'Australia', 'jp': 'Japan', 'uk': 'United Kingdom'}
myset = {'au', 'jp', 'uk'}

Variable names can be any combination of letters, digits and underscores, as long as the first character is not a digit:

In [5]:
# Valid and common format for naming variables

all_lowercase_with_underscores = 'valid'

# Valid but unusual to find in Python
camelCaseVariable = 'unusual in Python, but still valid'
onewordlowercase = 'Hard to read, use underscores'

In [6]:
# Invalid:
1var = 0    # number cannot be first character
var! = 0    # symbols like ! have special meanings

SyntaxError: invalid syntax (<ipython-input-6-bccf36553a91>, line 2)
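
If you want to test a candidate name without triggering a ``SyntaxError``, one option is the sketch below. The helper name `is_valid_name` is hypothetical and introduced only for illustration; it relies on the built-in `str.isidentifier()` method and the standard `keyword` module.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` could be used as a variable name."""
    # str.isidentifier() applies the letters/digits/underscore rule
    # (including "no digit first"); keyword.iskeyword() rules out
    # reserved words such as `if` or `for`.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['my_var', 'camelCase', '1var', 'var!', 'for']:
    print(candidate, is_valid_name(candidate))
```

Reserved words such as `for` pass the character rule but are rejected because Python uses them for its own syntax.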

## Conditions

Python uses ``if`` / ``elif`` / ``else`` blocks for conditional flow control:

In [7]:
control_mean = 1.0
treatment_mean = 1.2

In [8]:
if treatment_mean > control_mean:
    print('The treatment appears to have some positive effect')

The treatment appears to have some positive effect


In [9]:
if treatment_mean > control_mean:
    print('The treatment appears to have some positive effect.')
elif treatment_mean == control_mean:
    print("The treatment doesn't appear to have any effect.")
else:
    print('The treatment appears to have a negative effect.')

The treatment appears to have some positive effect.


Python has English-like conditional operators for boolean (True/False) expressions: ``and``, ``or``, ``not``:

In [10]:
if 0 < treatment_mean and treatment_mean < 2:
    print('The effect of the treatment is between 0 and 2')
else:
    print('The effect of the treatment is not between 0 and 2')

The effect of the treatment is between 0 and 2


The above conditions can be rewritten more simply as follows:

In [11]:
if 0 < treatment_mean < 2:
    print('The effect of the treatment is between 0 and 2')
else:
    print('The effect of the treatment is not between 0 and 2')

The effect of the treatment is between 0 and 2


Yet another way of writing this is with ``not``:

In [12]:
if not (0 < treatment_mean < 2):
    print('The effect of the treatment is not between 0 and 2')
else:
    print('The effect of the treatment is between 0 and 2')

The effect of the treatment is between 0 and 2
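
To see how the chained comparison and ``not`` relate, the small sketch below stores each boolean result in a variable. The variable names `inside`, `outside_not` and `outside_or` are made up for this illustration.

```python
treatment_mean = 1.2

# Chained comparison: True when the value lies strictly between 0 and 2
inside = 0 < treatment_mean < 2

# Negating the chained comparison ...
outside_not = not (0 < treatment_mean < 2)

# ... gives the same answer as testing each side separately with `or`
outside_or = treatment_mean <= 0 or treatment_mean >= 2

print(inside, outside_not, outside_or)
```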


## Functions

In [13]:
def greet(name):
    """
    Creates a greeting for the person.
    """
    return "Hello " + name + "!"

In [14]:
# You can call help on your new function, which will read the docstring
help(greet)

Help on function greet in module __main__:

greet(name)
    Creates a greeting for the person.



In [15]:
greet("Robert")

'Hello Robert!'

Here is another function:

In [1]:
def bmi(mass, height):
    """
    Returns the Body Mass Index for a person's mass in kg and height in metres
    """
    return mass / height**2

Notice that ``**`` is the power (exponentiation) operator in Python. Watch out: `^` gives a bitwise exclusive or!
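
A quick check of the difference, as a minimal sketch (the variable names are illustrative only):

```python
power = 2 ** 3   # exponentiation: 2 * 2 * 2
xor = 2 ^ 3      # bitwise exclusive or of 0b10 and 0b11

print(power)     # 8
print(xor)       # 1
```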


### Function keyword arguments

Functions can have optional arguments ("keyword arguments"):

In [16]:
def greet(name, is_morning=True):
    """
    Creates a greeting for the person, based on the time.
    """
    if is_morning:
        return "Good morning " + name + "!"
    else:
        return "Hello " + name + "!"

In [17]:
name = "Robert"
greet(name, is_morning=True)

'Good morning Robert!'

In [18]:
greet(name, is_morning=False)

'Hello Robert!'
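
The sketch below shows a few other ways such a function can be called. It repeats the definition of `greet` so that it runs on its own; the result variable names are illustrative only.

```python
def greet(name, is_morning=True):
    """Creates a greeting for the person, based on the time."""
    if is_morning:
        return "Good morning " + name + "!"
    return "Hello " + name + "!"

# Leaving the optional argument out uses its default value (True)
default_greeting = greet("Robert")

# An optional argument can also be given by position
positional = greet("Robert", False)

# When every argument is named, the order no longer matters
by_keyword = greet(is_morning=False, name="Robert")

print(default_greeting)
print(positional)
print(by_keyword)
```

Naming the argument at the call site, as in `is_morning=False`, is usually easier to read than a bare `False`.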

### Exercise

(a). Write a function to determine if a number is odd or even. Hint: Use the modulo operator to get the remainder:

`7 % 5` is 2

`175 % 50` is 25

(b). Write a function to convert a height, given in feet and inches (as two parameters to our function), into centimeters. Hint: convert feet and inches to total inches first. 

(c). Make the number of inches default to zero if not given.

In [None]:
# See solutions/oddsevens.py

# See solutions/convert_height.py

# See solutions/convert_height_default.py

**Extended exercise**
You can pass functions themselves around like normal variables in Python (they are "first-class objects", much like any other value). One function that is useful here is the `itemgetter` function from the `operator` module. Example code to use it is below. Use this function to sort the list `people` by each person's height. Check the help for the `sorted` function.

In [26]:
from operator import itemgetter

In [27]:
get_height = itemgetter("height")  # Check help of the function

In [28]:
people = [
    {"name": "Ed", "height": 176},
    {"name": 'Andrew', "height": 188},
    {"name": 'Kath', "height": 170}
]

You can pass keyword arguments into a function by using double-star on a dictionary that maps each keyword name to its value. Test this by creating a dictionary of this form and calling `greet(**my_dict)`.

## Lists

Lists are defined by square brackets, like these:

In [19]:
mylist1 = [1, 3, 5]
names = ['McCain', 'Barrasso', 'Ayotte', 'Alexander', 'Akaka']

In [20]:
type(names)

list

The ``list`` function also produces lists when passed a sequence:

In [21]:
numbers = list(range(10))
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We will come back to ``range()`` shortly when describing loops.

Here's how to retrieve, add and delete items from lists:

In [22]:
names

['McCain', 'Barrasso', 'Ayotte', 'Alexander', 'Akaka']

In [23]:
# Retrieving items from the list
names[0]     # The first element has index 0 !

'McCain'

In [24]:
names[2]     # third element

'Ayotte'

In [25]:
names[2:]    # a slice: multiple items: 3rd to end

['Ayotte', 'Alexander', 'Akaka']

In [26]:
names[:2]     # equivalent to names[0:2]. Another slice.

['McCain', 'Barrasso']

Python also supports "extended slicing" with an optional step size. The syntax is:

In [None]:
mylist[start:stop:step]

Not specifying a start is the same as saying "start from the start of the list", and not specifying a stop is the same as "go to the end of the list".

In [27]:
names[::2]    # every other name

['McCain', 'Ayotte', 'Akaka']

Here is an easy way to reverse the order of a list:

In [28]:
names[::-1]

['Akaka', 'Alexander', 'Ayotte', 'Barrasso', 'McCain']
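
The three parts of an extended slice are easier to follow on a list of numbers. This is a minimal sketch; the list `digits` and the result names are chosen only for illustration.

```python
digits = list(range(10))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

evens = digits[0:10:2]          # start 0, stop 10, step 2
middle = digits[2:8:3]          # indices 2 and 5 (8 is excluded)
odds_reversed = digits[::-2]    # from the end, every second item, backwards

print(evens)          # [0, 2, 4, 6, 8]
print(middle)         # [2, 5]
print(odds_reversed)  # [9, 7, 5, 3, 1]
```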

In [29]:
# Appending items onto the end
names.append('Obama')       # to add a single item
names += ['Booker', 'Sanders'] # to add one or more items
names.extend(['Kirk', 'Bennet'])  # equivalent to the above
names

['McCain',
 'Barrasso',
 'Ayotte',
 'Alexander',
 'Akaka',
 'Obama',
 'Booker',
 'Sanders',
 'Kirk',
 'Bennet']

In [30]:
# Deleting an item with a given index
del names[0]
names[0:3]

['Barrasso', 'Ayotte', 'Alexander']

In [31]:
# Delete three more from the beginning:
del names[:3]
names

['Akaka', 'Obama', 'Booker', 'Sanders', 'Kirk', 'Bennet']

In [32]:
del names[-1]     # delete the last item
print(names)

['Akaka', 'Obama', 'Booker', 'Sanders', 'Kirk']


### Exercise

1. Create a list of everyone's first name in the room
2. Find the index of your name (using the ``names.index()`` method).
3. Create an alphabetically sorted list using the ``sorted()`` function.
4. Retrieve a list of names after yours in the alphabet.
5. Are there any duplicate first names?

In [None]:
# See solutions/names.py

**Extended Exercise**
You can index and loop over strings as if they were lists-of-characters. Using a dictionary to count, how many times does each character appear in your list?

#### "List comprehension": a compact way of creating a list from another sequence

In [33]:
sorted_names = sorted(names)
sorted_indices = [sorted_names.index(name) for name in ['Akaka', 'Obama', 'Booker']]

In [34]:
sorted_indices

[0, 3, 1]

In [35]:
bignames = [name.upper() for name in names]
bignames

['AKAKA', 'OBAMA', 'BOOKER', 'SANDERS', 'KIRK']

## Loops

Loops are blocks of code run repeatedly, introduced by ``for`` (most often) or ``while``:

### Looping through sequences

Here is how we loop through an iterable sequence such as a list (or file or set ...):

In [36]:
for name in names:
    print(name)

Akaka
Obama
Booker
Sanders
Kirk


### How to loop $n$ times

Use the ``range()`` function to loop several times:

In [37]:
n = 5
for index in range(n):
    print(index)

0
1
2
3
4


Now look at the parameters (called arguments) that the range function can take. Try this in IPython:

In [38]:
range?

This brings up the "docstring" for the ``range`` function. If not using IPython, you can do this:

In [39]:
print(range.__doc__)

range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).


We can see the values in the range by turning it into a concrete list:

In [40]:
list(range(5, 10))

[5, 6, 7, 8, 9]

Notice that the endpoint is *excluded*. The number of items returned by ``range(start, stop)`` is ``stop - start``.
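
The optional ``step`` argument mentioned in the docstring works the same way. A short sketch, with illustrative variable names:

```python
evens = list(range(0, 10, 2))       # count up in steps of 2
countdown = list(range(5, 0, -1))   # a negative step counts down
n_items = len(range(5, 10))         # stop - start items

print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [5, 4, 3, 2, 1]
print(n_items)     # 5
```

The stop value is excluded in both directions, so the countdown ends at 1 rather than 0.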

### Exercise: powers of 2

Loop through the first 64 powers of 2 starting at 0. Print them, one per line, like this:

In [None]:
2**0 = 1
2**1 = 2
2**2 = 4
2**3 = 8
...

Here is a less common kind of loop with ``while``:

In [41]:
i = 100
while i > 0:
    print(i)
    i //= 2         # equivalent to i = i // 2   (integer division)
print("Finished!")

100
50
25
12
6
3
1
Finished!


### Indentation

Note that indentation controls the flow in Python. Compare the following two ``for`` loops and notice the different behaviour:

In [42]:
for name in names:
    name_upper = name.upper()    # uppercase
    print(name_upper)

AKAKA
OBAMA
BOOKER
SANDERS
KIRK


In [43]:
for name in names:
    name_upper = name.upper()    # uppercase
print(name_upper)

KIRK


### List comprehensions
A 'list comprehension' is a compact piece of syntax to create a list in one line. It has a for loop inside the list square brackets, like this:

In [44]:
squares = [number**2 for number in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [61]:
names_upper = [name.upper() for name in names if len(name) == 4]
names_upper

['KIRK']
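
A comprehension with a condition is shorthand for a loop that appends inside an ``if``. The sketch below builds the same list both ways, using a hypothetical `words` list defined just for this example.

```python
words = ['Akaka', 'Obama', 'Booker', 'Sanders', 'Kirk']

# Explicit loop: start with an empty list and append matching items
five_letter_loop = []
for word in words:
    if len(word) == 5:
        five_letter_loop.append(word.upper())

# The same result as a list comprehension
five_letter = [word.upper() for word in words if len(word) == 5]

print(five_letter_loop)
print(five_letter)
```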

### Exercise

1. Loop through each of the names, and only print out the name if it starts with an A
2. Loop through each of the names, and only print them out if their name *doesn't* start with an A
3. Update your code to return a list of the names that don't start with an A, instead of printing them out.

**Hint:** You can get the first letter of a string using `word[0]`. Also, remember to indent the lines. You'll need one indent for the loop, and another for the if conditional.

**Extended exercise**
Repeat the exercises with a list comprehension instead of a for loop.

## Dictionaries
Dictionaries map from keys to values, much like how a phone book maps someone's name to their phone number.

This is how we look up values in a dictionary: through the key

In [69]:
heights = {'Ed': 176, 'Andrew': 188, 'Kath': 170}
heights['Ed']    # Get the value corresponding to the "Ed" key

176

In [70]:
heights['Robert'] = 180  # Add a new entry

In [71]:
heights

{'Andrew': 188, 'Ed': 176, 'Kath': 170, 'Robert': 180}

Dicts all have these important 'methods': keys(), values(), items().

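Since Python 3.7, dicts preserve insertion order, so these methods return entries in the order in which the keys were first added. In earlier versions the order was not guaranteed. (In every version, the order is consistent across the three methods.)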

In [72]:
heights.keys()

dict_keys(['Ed', 'Andrew', 'Kath', 'Robert'])

In [73]:
heights.values()

dict_values([176, 188, 170, 180])

In [74]:
heights.items()

dict_items([('Ed', 176), ('Andrew', 188), ('Kath', 170), ('Robert', 180)])

In [75]:
len(heights)   # number of keys == number of values == number of items

4
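
One way to convince yourself that the three methods line up entry for entry is the sketch below. It uses the built-in `zip` function, which has not been introduced above, to pair each key with the value in the same position.

```python
heights = {'Ed': 176, 'Andrew': 188, 'Kath': 170}

# Pair the first key with the first value, the second with the second, ...
pairs_from_zip = list(zip(heights.keys(), heights.values()))

# items() yields the same (key, value) pairs directly
pairs_from_items = list(heights.items())

print(pairs_from_zip == pairs_from_items)
```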

### How to loop through dicts:

In [76]:
# Method 1
for key in heights:  # equivalent to: `for key in heights.keys():`
    # do something with key and heights[key]
    # e.g.
    print(key, heights[key])

Ed 176
Andrew 188
Kath 170
Robert 180


In [77]:
# Method 2
for (key, value) in heights.items():
    # do something with key and value
    # e.g.
    print(key, value)

Ed 176
Andrew 188
Kath 170
Robert 180


Warning: Dictionaries iterate in insertion order in Python 3.7 and later. In earlier versions the order can't be guaranteed.

**Exercise - Dictionaries**

1. Create a dictionary that has the capital city of a few different countries.
2. Loop through the dictionary, and print out the capital for each country

**Extended Exercise - Dictionaries**

Create a dictionary inverting function, that takes a dictionary as input, and swaps keys and values around. For instance, the heights dictionary would be inverted to:

`{188:'Andrew', 176:'Ed', 170:'Kath', 180:'Robert'}`

Update the code to deal with the possibility that there can be two (or more) keys mapping to the same value.

In [78]:
# See solutions/capitals.py

## Strings

Strings are created by the use of quotes, and are list-like structures that have additional properties.

In [79]:
my_string = "Python is awesome!"

In [80]:
type(my_string)

str

In [81]:
len(my_string)

18

Unlike some other languages, Python has no "char" data type; a character is just a string of length one:

In [82]:
letter = my_string[0]

In [83]:
letter, type(letter), len(letter)

('P', str, 1)

Let's read in a text file to get a long list of words:

In [84]:
with open("/data/alice_in_wonderland.txt") as f:  # Create a file object called f
    text = f.read()

In [85]:
type(text), len(text)

(str, 144410)

In [86]:
text_lower = text.lower()  # Convert everything to lowercase

In [87]:
text_lower.count("a")  # how many times this letter appears

8791

In [88]:
set(text_lower)  # The unique characters in the string

{'\n',
 ' ',
 '!',
 '"',
 "'",
 '(',
 ')',
 '*',
 ',',
 '-',
 '.',
 ':',
 ';',
 '?',
 '[',
 ']',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z'}

### Presence of a substring

You can test whether one string contains another using the ``in`` operator:

In [89]:
'de' in 'abcdefg'

True

In [90]:
'ag' in 'abcdefg'

False

In [91]:
'!' not in 'abcdefg'

True

String objects also have a variety of useful methods. You can list these in IPython by using tab completion:

In [92]:
s = 'Some string'

In [None]:
s.<TAB>     # hit the TAB key

Here is a list of string methods:

capitalize
casefold
center
count
encode
endswith
expandtabs
find
format
format_map
index
isalnum
isalpha
isdecimal
isdigit
isidentifier
islower
isnumeric
isprintable
isspace
istitle
isupper
join
ljust
lower
lstrip
maketrans
partition
replace
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill

To get help on any one of them, use the ``help()`` function or a trailing ``?`` in IPython:

In [95]:
help(s.split)

Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings
    
    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.
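
The options described in this help text can be tried out on a small string. This is only a sketch; the string `line` and the result names are made up for illustration.

```python
line = 'alpha beta  gamma,delta'

# No separator: split on any run of whitespace, dropping empty strings
by_whitespace = line.split()

# Explicit separator: split on every comma
by_comma = line.split(',')

# maxsplit limits the number of splits, counting from the left
first_split_only = line.split(' ', maxsplit=1)

print(by_whitespace)
print(by_comma)
print(first_split_only)
```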



String "literals" can be introduced in two other main ways: with an `r` prefix (raw strings) and with a triple quote (multiline string). Here are some examples:

In [81]:
broken_path = 'C:\Documents and Settings\norman'
print(broken_path)

C:\Documents and Settings
orman


In [82]:
# To avoid needing double backslashes to "escape" them, use an
# r prefix:
path = r'C:\Documents and Settings\norman'
print(path)

C:\Documents and Settings\norman


In [83]:
# A multiline string:
names = '''
Fred
George
Alice
Wen Lu
Alexei
'''

In [84]:
names

'\nFred\nGeorge\nAlice\nWen Lu\nAlexei\n'

In [85]:
# The `strip` method of strings is useful for removing any leading and
# trailing whitespace characters:
names.strip()

'Fred\nGeorge\nAlice\nWen Lu\nAlexei'
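
Once the surrounding whitespace is gone, the `splitlines` method from the list above turns such a block into a list with one entry per line. A minimal sketch, using a shorter hypothetical string `names_block`:

```python
names_block = '''
Fred
Wen Lu
'''

stripped = names_block.strip()      # removes the leading and trailing newlines
name_list = stripped.splitlines()   # one list entry per line

print(repr(stripped))
print(name_list)
```

Note that `strip` only touches the two ends of the string, so the space inside `'Wen Lu'` is kept.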

### Exercises: String methods
Find the right string methods for the following:

1. Convert the string ``'great britain'`` to upper case and title case (``Great Britain``).
2. Split this string into a list of words:

In [None]:
logline = 'String splitting can be done via the split method.'

3. Test whether the string contains any words beginning with ``s`` and ending with ``ing``.
4. Use list indexing to obtain a list of bi-grams, i.e. pairs of words occurring next to each other. The first bi-gram in the sentence is `["String", "splitting"]`.

In [None]:
# See solutions/string_splitting.py

### Extended exercise: creating your own string function

1. Create a function that extracts bi-grams from any given string.
2. Extend the function to take a parameter, n, which is the length of n-gram to extract. For instance, giving `n=3` will return tri-grams (length 3), such as `["String", "splitting", "can"]`.

In [None]:
# See solutions/ngrams.py

## Exceptions

Sometimes your program doesn't work the way you intended, or you want to call attention to something wrong. You can use exceptions to do this.

For example, our height conversion function from earlier shouldn't really accept inches values of 12 or more, but it does.

In [88]:
def convert_height(feet, inches=0):
    total_inches = 12 * feet + inches
    return total_inches * 2.54

In [89]:
feet = 4
inches = 16
convert_height(feet, inches)

162.56

We can raise an exception when the number of inches is 12 or more. This alerts the user that they need to convert their data into a better format.

In [90]:
def convert_height(feet, inches=0):
    if inches >= 12:
        raise ValueError("Inches should be less than 12.")
    total_inches = 12 * feet + inches
    return total_inches * 2.54

In [91]:
convert_height(5, 16)
convert_height(5, 6)

ValueError: Inches should be less than 12.

Exceptions stop your code from running, so the second line above was never run.

You can see a list of all of the built-in exception types in the official Python documentation: https://docs.python.org/3/library/exceptions.html

We can *catch* exceptions, if we expect them and know what to do with them:

In [92]:
try:
    convert_height(5, 16)
except ValueError:
    print("I shouldn't have done that")

I shouldn't have done that
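
If you also want the message that was passed to ``raise``, you can bind the exception to a name with ``as``. The sketch below repeats `convert_height` so that it runs on its own; the names `err`, `message` and `height_cm` are illustrative only.

```python
def convert_height(feet, inches=0):
    """Convert a height in feet and inches to centimetres."""
    if inches >= 12:
        raise ValueError("Inches should be less than 12.")
    total_inches = 12 * feet + inches
    return total_inches * 2.54

try:
    height_cm = convert_height(5, 16)
except ValueError as err:
    # `err` is the exception object; str(err) is the message given to raise
    message = str(err)
    height_cm = None     # fall back to a placeholder value
    print("Could not convert:", message)
```

Catching only ``ValueError``, rather than every exception, means that unrelated bugs still surface as errors.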


### Exercises

1. As our function computes heights, it doesn't make sense for the values to be negative. Raise an exception if the given parameters are negative.
2. It may be possible to have a negative inches value. In other words, "5 feet, minus 2 inches" would mean the height of someone who is two inches shorter than 5 feet tall. Change the exception to only be raised if the total height is negative.


### Extended Exercises

Investigate using `type(my_variable)` to determine whether the user has entered a number or not. If they haven't entered a number, try to convert it to a number using `float(my_variable)`. If that fails, raise an Exception of an appropriate type.