# Python warm up

To refresh your knowledge in preparation for today's lecture, we'll quickly review Python's built-in, general-purpose "container" data types: **list**, **tuple**, **range**, **set** and **dict**. We'll use a number of the common *sequence operations* as we go.

## Sequence Types

First, let us recall that the most primitive form of **sequence** is the *string*. The string object is, in fact, an *immutable sequence* of Unicode characters:

In [1]:
magicWord = 'abracadabra'

# Note the escaped single quote \' in the string handed to print()
# len() returns the length of a sequence
print('Today\'s magic word "{0}" has {1} characters.'.format(magicWord, len(magicWord))) 

Today's magic word "abracadabra" has 11 characters.


Like all sequence types, strings support the element indexer [i] and slicer [i:j:k] operations:

In [2]:
magicWord[0] # Recall: indexers begin with 0. Consequently, the first element in the sequence is retrieved with [0]

'a'

In [3]:
magicWord[-4:-2] # Recall: we use negative indexes to access elements starting from the end of the sequence

'ab'

**Exercise:** Extract the word 'cadabra'.

In [4]:
cadabra = magicWord[4:]
cadabra

'cadabra'

_**Question**_: What is the *third parameter* in the slicer operation good for?

In [5]:
# What is the third parameter (the step) good for?
cadabra[::2]

'cdba'

We use the **step** parameter to specify the *interval* of the slice operation. In the above example, we select every second item in our string. Because **start** and **stop** are omitted, the slice runs from the first character (index 0) to the end of the string.
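
To make the three slice parameters concrete, here is a minimal sketch on the same word. The variable names are chosen purely for illustration and do not appear elsewhere in this notebook.

```python
word = 'abracadabra'

# [start:stop:step] - stop is exclusive, omitted values fall back to defaults
every_second = word[::2]     # whole string, every 2nd character
middle_step3 = word[1:10:3]  # indexes 1, 4 and 7
backwards = word[::-1]       # a negative step walks from the end to the start

print(every_second)  # arcdba
print(middle_step3)  # bca
print(backwards)     # arbadacarba
```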

Strings support all the common sequence operations, such as x in s, len(s), s.count(x), min(s), max(s) and s.index(x), as well as the overloaded operators + and * for concatenation and repetition, respectively.
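
As a quick illustration of these operations, the following sketch applies each of them to our magic word (the name `word` is only used here):

```python
word = 'abracadabra'

print('cad' in word)         # True: membership test
print(len(word))             # 11
print(word.count('a'))       # 5
print(min(word), max(word))  # a r (smallest and largest character)
print(word.index('c'))       # 4: position of the first 'c'
print('abra' + 'cadabra')    # abracadabra: concatenation
print('ab' * 3)              # ababab: repetition
```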

**Exercise:** Duplicate our _magic_ word 10 times.

In [6]:
# ten times the magic!
magicWordTenTimes = magicWord * 10
magicWordTenTimes

'abracadabraabracadabraabracadabraabracadabraabracadabraabracadabraabracadabraabracadabraabracadabraabracadabra'

**Exercise:** Check whether the word cadabra appears in our magic word.

In [7]:
#look for cadabra in our magic word
'cadabra' in magicWord

True

### Tuples

**Tuples are immutable sequences** typically used to store *heterogeneous* data. Tuples are enclosed in parentheses: **( )**.

Can we turn our magicWord string into a tuple?

Yes we can! We pass it to the **tuple()** constructor.

**Exercise:** Convert our magicWord into a _tuple_

In [8]:
magicTuple = tuple(magicWord)
magicTuple

('a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a')

**Question:** Can we make changes to our tuple?

In [9]:
#Try to edit our Tuple
magicTuple[0] = 'b'

TypeError: 'tuple' object does not support item assignment

In [10]:
#Any idea on how we could expand our tuple
magicList = list(magicTuple)
# magicList = list(magicWord)
# 

In [12]:
str(magicList)

"['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']"

In [13]:
magicList += ['']

In [14]:
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a', '']

In [15]:
# ABBA
magicList.append('ABBA')

In [16]:
magicList = magicList + ['ABBA']

In [17]:
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a', '', 'ABBA', 'ABBA']

In [18]:
magicList + magicTuple

TypeError: can only concatenate list (not "tuple") to list

In [26]:
cat = ('Cosmo', 'British shorthair', 'black', 3.5)
cat

('Cosmo', 'British shorthair', 'black', 3.5)

**Note**: the *parentheses* in the printed output tell you that cat is a tuple. Use the **type()** function to confirm what kind of object our cat is:

In [27]:
#your code below.
type(cat)

tuple

How would you return the *name* of our cat?

In [28]:
# your code below. Hint: index 
cat[0]

'Cosmo'

**Question**: what is the problem or inconvenience of using the tuple[i] operation?

There must be a better way! And there is...

In [37]:
#create a cat object based on a namedtuple, 'Animal' to store attributes for Name, Type, Colour and Weight
#cat = collections.namedtuple('Animal', ['Name', 'Type', 'Colour', 'Weight'])
#myCat = cat('Cosmo', 'British shorthair', 'black', 3.5)

#myCat

In [16]:
import collections

In [17]:
cat = collections.namedtuple('Animal', ['Name', 'Type', 'Colour', 'Weight'])

In [18]:
mycat = cat('Cosmo', 'British shorthair', 'black', 3.5)
your_cat = cat('Blau', 'Iranian', 'white', 7)

In [22]:
dict(mycat)

ValueError: dictionary update sequence element #0 has length 5; 2 is required

In [20]:
your_cat

Animal(Name='Blau', Type='Iranian', Colour='white', Weight=7)

Now, with a named tuple we can use the "dot" syntax on the object to retrieve any of its elements, e.g.

In [39]:
mycat.Name

'Cosmo'

In [40]:
your_cat.Name

'Blau'

What colour is Cosmo? And what type of cat is he?

In [41]:
# your code below.
mycat.Type

'British shorthair'

In [43]:
mycat.Colour

'black'

In [44]:
cat('abc')

TypeError: __new__() missing 3 required positional arguments: 'Type', 'Colour', and 'Weight'

We can't change our cat object because it's a tuple. We could, however, convert it into a dictionary:

**Exercise:** Convert our cat into a dictionary

In [46]:
#Conversion (Python 3.8+ returns a plain dict here; older versions return an OrderedDict)
mycat_dict = mycat._asdict()
mycat_dict

OrderedDict([('Name', 'Cosmo'),
             ('Type', 'British shorthair'),
             ('Colour', 'black'),
             ('Weight', 3.5)])

In [47]:
mycat_dict['Colour']

'black'

In [48]:
mycat_dict['Weight']

3.5

**Exercise:** Add an Owner to our dictionary.

In [49]:
# Add owner
mycat_dict['Owner'] = 'Superman'
mycat_dict

In [51]:
mycat_dict['Owner']

'Superman'

In [53]:
# fails
mycat.Owner ='Superman'

AttributeError: 'Animal' object has no attribute 'Owner'
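
Besides converting to a dictionary, a named tuple can also produce a modified *copy* of itself. The sketch below assumes the same `Animal` fields as above; the names `cosmo` and `heavier_cosmo` are illustrative only.

```python
import collections

Animal = collections.namedtuple('Animal', ['Name', 'Type', 'Colour', 'Weight'])
cosmo = Animal('Cosmo', 'British shorthair', 'black', 3.5)

# _replace() does not modify cosmo; it returns a new Animal with the changed field
heavier_cosmo = cosmo._replace(Weight=4.0)

print(cosmo.Weight)          # 3.5 (the original is unchanged)
print(heavier_cosmo.Weight)  # 4.0
print(Animal._fields)        # ('Name', 'Type', 'Colour', 'Weight')
```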

### Lists

**Lists are mutable sequences** typically used to store *homogeneous* data. Lists are enclosed in square brackets: **[ ]**.

You can use [] to create an empty list.

A string can also be cast to a list using the **list()** constructor:

**Exercise:** Convert our magicWord into a list

In [54]:
magicWord

'abracadabra'

In [55]:
magicList = list(magicWord) #magicWord to list

In [56]:
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

**Exercise:** Sort the items in the list using the **sort()** method:

In [57]:
magicList.sort()
magicList

['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']

**Question:** How many times does the letter a appear in our list?

In [58]:
count_a = magicList.count('a')
count_a

5

In [59]:
sum(item == 'a' for item in magicList)

5

In [60]:
len([item for item in magicList if item == 'a'])

5

In [61]:
c = 0
for item in magicList:
    if item == 'a':
        c = c+1
c

5

In [62]:
magicList

['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']

Unlike tuples, which are immutable, we can make changes to a list. *Lists are mutable objects*. To add elements to a list, use the **append()** or **insert()** methods.

**Exercise:** Add a Z to the beginning and an X to the end.

In [None]:
## magicList


In [63]:
#magicList = ['Z'] + magicList + ['X']
# insert or append
magicList.append('X')

In [64]:
magicList

['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r', 'X']

In [65]:
magicList.insert(0,'Z')

In [66]:
magicList

['Z', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r', 'X']

In [67]:
magicList.insert(3,'XX')
magicList

['Z', 'a', 'a', 'XX', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r', 'X']

**Exercise:** Given a string "hocus pocus", append each character in this string to the magicList using the iterator pattern.

In [69]:
hocus_pocus = 'hocus pocus'
hocus_pocus

'hocus pocus'

In [70]:
elements = (item for item in hocus_pocus)

In [72]:
type(elements)

generator

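In [73]:
next(elements)

'h'

In [75]:
# A generator remembers its position: 'h' and 'o' were already consumed by
# next() calls (cell 74 is not shown), so the loop resumes at 'c'
for item in elements:
    print(item)

c
u
s
 
p
o
c
u
s

In [76]:
# The generator is now exhausted, so another next() raises StopIteration
next(elements)

StopIteration: 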

In [2]:
h = list("hocus pocus")
a = []
for item in h:
    a.append(item)

In [3]:
a

['h', 'o', 'c', 'u', 's', ' ', 'p', 'o', 'c', 'u', 's']

In [77]:
hocus_pocus = list("hocus pocus")
newListWithHocusPocus = magicList + hocus_pocus 

In [78]:
newListWithHocusPocus

['Z',
 'a',
 'a',
 'XX',
 'a',
 'a',
 'a',
 'b',
 'b',
 'c',
 'd',
 'r',
 'r',
 'X',
 'h',
 'o',
 'c',
 'u',
 's',
 ' ',
 'p',
 'o',
 'c',
 'u',
 's']
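
Note that `+` builds a *new* list rather than appending to the existing one. A minimal sketch of two in-place alternatives follows; `magic_list` and `other_list` are illustrative names, and a shortened start list is assumed to keep the example small.

```python
magic_list = list('abra')

# Option 1: append one character at a time (strings are iterable)
for character in 'hocus pocus':
    magic_list.append(character)

# Option 2: extend() appends every item of an iterable in one call
other_list = list('abra')
other_list.extend('hocus pocus')

print(magic_list == other_list)  # True
print(len(magic_list))           # 15
```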

To remove elements from a list we can use the **del** statement with an *index*, **remove()** with a *value* or **pop()** to *remove and return* an element.

In [83]:
magicList = list("abracadabra") 
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

In [82]:
magicList.pop()
magicList.pop()
magicList.pop()
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a']

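In [85]:
magicList = list("abracadabra")  # start again from the full word
magicList.remove('a')  # removes only the first matching value
magicList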

['b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

In [86]:
magicList = list("abracadabra") 

In [88]:
del magicList[:3]

In [89]:
magicList

['a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

In [79]:
#delete the last 3 items using del
# changedList = None
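
One possible way to solve this exercise is sketched below, using a fresh copy of the list under the illustrative name `magic_list`:

```python
magic_list = list('abracadabra')

# A negative start index counts from the end, so [-3:] selects the last three items
del magic_list[-3:]

print(magic_list)  # ['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a']
```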

In [91]:
magicList = list("abracadabra") 
magicList

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

We can combine del with a slicer to delete a range of values.

In [92]:
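# remove the 'b' and the 'r' (indexes 1 and 2) from the list
# note: delete_sliced is a second name for the same list, not a copy
delete_sliced = magicList 
del delete_sliced[1:3]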

In [93]:
delete_sliced

['a', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

To *remove all elements* we could use del with a slicer that has no start and stop indexes, [:], or, more simply, the **clear()** method:

In [96]:
del magicList[:]

In [97]:
magicList

[]

In [98]:
del magicList

In [99]:
magicList

NameError: name 'magicList' is not defined

In [None]:
magicList = list("abracadabra")  # re-create the list, because del magicList removed the name
magicList.clear() # Note: the same as del magicList[:]
magicList

### Ranges

A **range** is an *immutable ordered sequence of integers*.
To create a range object we use **range(start, stop, step)**, where the start and step parameters are optional and the stop value itself is excluded from the sequence:

In [100]:
r1 = range(10)

In [101]:
r1

range(0, 10)

In [102]:
list(r1)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [103]:
numbers = list(range(0,24,2))
numbers

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

In [105]:
r2 = range(0,110,10) # note that the range starts at 0 and the stop value (110) is excluded
list(r2)

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

In [107]:
r3 = range(10,110,10)
list(r3)

[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

In [109]:
numbers = list(range(0,100))

In [111]:
numbers[0:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [113]:
res = []
for item in numbers:
    if item % 2 == 0:
        res.append(item)

In [114]:
res[0:10]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Tip: We can use a **list comprehension** to iterate over the sequence, additionally defining a predicate to select only the even-numbered elements.

In [None]:
#print every even number in the list
# HINT: FOR IN?
[item for item in numbers if item % 2 == 0]
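
As a cross-check, the same even numbers can be produced without a predicate by giving range() a step of 2. This is only a sketch; `evens_filtered` and `evens_stepped` are illustrative names.

```python
numbers = list(range(0, 100))

# Filter with a predicate ...
evens_filtered = [item for item in numbers if item % 2 == 0]

# ... or let range() generate only the even numbers by using a step of 2
evens_stepped = list(range(0, 100, 2))

print(evens_filtered == evens_stepped)  # True
print(evens_filtered[:5])               # [0, 2, 4, 6, 8]
```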

## Mapping Types

### Dictionaries

A **dictionary** is a *mutable table which maps hashable keys to arbitrary values*. Since Python 3.7, a dictionary preserves the insertion order of its keys.

**dict.keys()** returns a live view object of the dictionary's keys, **dict.values()** returns a view of the values and **dict.items()** returns a view of (key, value) pairs as 2-tuples, which can be unpacked into k and v elements when iterating.

The key:value pairs of a dict object are enclosed in curly braces, **{ }**.
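
The "live" nature of a view object is easiest to see in a small experiment. The dictionary `props` below is a made-up example and not part of the data set used later.

```python
props = {'wand': 1, 'hat': 2}
keys_view = props.keys()

# The view is "live": it reflects changes made to the dictionary afterwards
props['rabbit'] = 3
print(list(keys_view))  # ['wand', 'hat', 'rabbit']

# items() yields (key, value) pairs that can be unpacked in a loop
for k, v in props.items():
    print(k, v)
```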

Let us take the following dictionary, which maps favourite sports to countries:

In [159]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Ski': 'Austria',
          'Drinking Beer': 'Austria',
          'Taekwondo': 'South Korea'}

In [118]:
sports['Drinking Beer']

'Austria'

Display only the sports in the dataset

In [119]:
# Hint. Ask yourself, are the sports used as the key or value in the dictionary?
sports.keys()

dict_keys(['Archery', 'Golf', 'Sumo', 'Ski', 'Drinking Beer', 'Taekwondo'])

Display only countries in the dataset

In [121]:
# Hint. Ask yourself, are the countries used as the key or value in the dictionary?
countries = sports.values()

In [122]:
countries

dict_values(['Bhutan', 'Scotland', 'Japan', 'Austria', 'Austria', 'South Korea'])

Again, and this time, please remove the duplicates.

In [123]:
#Hint. What convenient container type swallows duplicates to return only distinct values?
set(countries)

{'Austria', 'Bhutan', 'Japan', 'Scotland', 'South Korea'}

Show us which country loves to play Golf:

In [124]:
sports["Golf"]

'Scotland'

In [125]:
sports

{'Archery': 'Bhutan',
 'Golf': 'Scotland',
 'Sumo': 'Japan',
 'Ski': 'Austria',
 'Drinking Beer': 'Austria',
 'Taekwondo': 'South Korea'}

In [127]:
sports.keys()

dict_keys(['Archery', 'Golf', 'Sumo', 'Ski', 'Drinking Beer', 'Taekwondo'])

In [141]:
for key in sports.keys():
    country = sports[key]
    hobby = key
    if country == 'Austria':
        print('Austria loves ' + hobby)

Austria loves Ski
Austria loves Drinking Beer


What are the favourite sports of Austria?

In [144]:
# Hint: define a list comprehension using the dictionary view object returned by items() and filter on the v elements
[key for key, value in sports.items() if value == 'Austria']

['Ski', 'Drinking Beer']

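In [146]:
# pop() removes a key and returns its value; we pop from a copy here so that
# the original sports dictionary stays intact for the next cells
sports.copy().pop('Ski')

'Austria'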

In [149]:
sports

{'Archery': 'Bhutan',
 'Golf': 'Scotland',
 'Sumo': 'Japan',
 'Ski': 'Austria',
 'Drinking Beer': 'Austria',
 'Taekwondo': 'South Korea'}

It appears that we have an error in our data set. Could you please clean the data?

In [150]:
del sports['Drinking Beer']
sports

{'Archery': 'Bhutan',
 'Golf': 'Scotland',
 'Sumo': 'Japan',
 'Ski': 'Austria',
 'Taekwondo': 'South Korea'}

In [151]:
del sports
sports

NameError: name 'sports' is not defined

It seems like we forgot to add some data to our data set. Would you please add the following pairs to the dictionary: Football = USA, Cricket = India, Baseball = Venezuela

In [154]:
# re-run the cell that defines sports first, because del sports removed the name
sports # add other_sports

{'Archery': 'Bhutan',
 'Golf': 'Scotland',
 'Sumo': 'Japan',
 'Ski': 'Austria',
 'Drinking Beer': 'Austria',
 'Taekwondo': 'South Korea'}

Other sports we need to add.

In [155]:
other_sports = {'Soccer': 'Spain',
          'Golf': 'Scotland',
          'Baseball': 'USA',
          'Ski': 'Canada'}

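Oops: 'Ski' appears in both dictionaries with different countries, so we need a different data structure to combine them without losing data. Any ideas?

In [156]:
# Combine the DataSet 
# Note: update() overwrites the value of an existing key, so 'Ski' now maps to 'Canada'
sports.update(other_sports)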

In [161]:
sports['Ski'] = list()

In [162]:
sports['Ski'].append('Austria')

In [163]:
sports['Ski'].append('Canada')

In [165]:
sports['Ski']

['Austria', 'Canada']
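
The manual steps above can be generalised. The following is a minimal sketch, not the only approach, that merges two dictionaries into a dictionary of lists. It uses shortened copies of the data, and `combined` is an illustrative name.

```python
sports = {'Golf': 'Scotland', 'Ski': 'Austria'}
other_sports = {'Golf': 'Scotland', 'Ski': 'Canada', 'Baseball': 'USA'}

combined = {}
for dataset in (sports, other_sports):
    for sport, country in dataset.items():
        # setdefault() returns the existing list for this sport, or stores a new empty one
        countries = combined.setdefault(sport, [])
        if country not in countries:  # skip duplicates such as Golf/Scotland
            countries.append(country)

print(combined)
# {'Golf': ['Scotland'], 'Ski': ['Austria', 'Canada'], 'Baseball': ['USA']}
```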

## Lambda

You may have seen the keyword lambda appear in this week's content, and you'll certainly see it more often as you spend more time with Python and data science. Lambdas are Python's way of creating anonymous functions. These are the same as other functions, but they have no name. The intent is that they're simple or short-lived, and it's easier to write out the function in one line than to go to the trouble of creating a named function. 

The lambda syntax is fairly simple. But it might take a bit of time to get used to. 

In [None]:
# You declare a lambda function with the word lambda followed by a list of arguments, 
# followed by a colon and then a single expression and this is key. 
# There's only one expression to be evaluated in a lambda. 
# The expression value is returned on execution of the lambda. 
# The return of a lambda is a function reference. 
# So in this case, you would execute my_function and pass in three different parameters
# (note that the third parameter, c, is accepted but not used in the expression). 
my_function = lambda a, b, c : a + b

In [None]:
my_function(1, 2, 3)

Note that lambda parameters can have default values, just like those of named functions, but you can't have complex logic inside of the lambda itself because you're limited to a single expression. 
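
To illustrate the single-expression rule, here is a short sketch comparing a named function with a lambda. All names (`add`, `add_lambda`, `sign`, `words`) are made up for this example.

```python
# A named function and the equivalent lambda
def add(a, b):
    return a + b

add_lambda = lambda a, b: a + b
print(add(1, 2) == add_lambda(1, 2))  # True

# A conditional expression is still a single expression, so it is allowed
sign = lambda n: 'negative' if n < 0 else 'non-negative'
print(sign(-3))  # negative

# Lambdas are often passed inline, for example as a sort key
words = ['hocus', 'abracadabra', 'pocus', 'sim']
print(sorted(words, key=lambda w: len(w)))  # ['sim', 'hocus', 'pocus', 'abracadabra']
```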

Convert this to lambda

In [7]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

#option 1, using a lambda inside a traditional for loop
#for person in people:
#    f = (lambda p: p.split()[0] + ' ' + p.split()[-1])
#    n = f(person)
#    print(n)
    #print(split_title_and_name(person))
    #print(split_title_and_name(person) == (lambda person:???))

#option 2, how it could be done using lambda and map:


print(list(map(split_title_and_name, people)))  # map() is lazy, so wrap it in list() to see the results
#list(map(lambda p: p.split()[0] + ' ' + p.split()[-1], people))
#list(map(split_title_and_name, people)) == list(map(???))

['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']
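
One possible conversion is sketched below. It repeats the definitions from the cell above so that it runs on its own; `with_function` and `with_lambda` are illustrative names.

```python
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

# map() returns a lazy iterator, so we wrap it in list() to see the values
with_function = list(map(split_title_and_name, people))
with_lambda = list(map(lambda p: p.split()[0] + ' ' + p.split()[-1], people))

print(with_lambda)
# ['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']
print(with_function == with_lambda)  # True
```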


## List comprehension

Redefine our times_tables in a list comprehension

In [None]:
def times_tables():
    lst = []
    for i in range(10):
        for j in range (10):
            lst.append(i*j)
    return lst

times_tables() == [???]

In [3]:
# The comprehension must build a flat list of products (no inner brackets),
# in the same order as the nested loops in times_tables()
times_tables() == [i*j for i in range(10) for j in range(10)]

True

Here's a harder question which brings a few things together.

Many organizations have user ids which are constrained in some way. Imagine you work at an internet service provider and the user ids are all two letters followed by two numbers (e.g. aa49). Your task at such an organization might be to hold a record on the billing activity for each possible user.

Write an initialisation line as a single list comprehension which creates a list of all possible user ids. Assume letters are all lower case.

In [None]:
lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'

answer = [???]
correct_answer == answer
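
A possible solution is sketched below. It assumes that an id is simply the four characters joined together, as in the example aa49.

```python
lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'

# Four nested loops in one comprehension: two letters followed by two digits
answer = [a + b + c + d
          for a in lowercase for b in lowercase
          for c in digits for d in digits]

print(len(answer))            # 67600 = 26 * 26 * 10 * 10
print(answer[0], answer[-1])  # aa00 zz99
print('aa49' in answer)       # True
```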

## Numpy

Numpy is a package widely used in the data science community which lets us work efficiently with arrays and matrices in Python. 

In [None]:
#First, let's import Numpy as np. 
import numpy as np

Now let's make our first array. We can start by creating a list and converting it to an array. 

In [None]:
#Create a python list of numbers

With **np.array()** you can convert a list to a numpy array

In [None]:
#Convert your list to a numpy array

We can do it more succinctly by passing the list directly.

In [None]:
# pass list directly to the np.array method

Now let's make multidimensional arrays (matrices) by passing in a list of lists. We pass in two lists with three elements each, and we get a two by three array. 

In [None]:
# np.array with 2 lists

We can check the dimensions by using the **shape** attribute. 

In [None]:
#check with shape
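
The empty cells above could be filled in along the following lines. This is only a sketch, assuming NumPy is installed; the variable names and values are illustrative.

```python
import numpy as np

my_list = [1, 2, 3]
x = np.array(my_list)                    # convert an existing list
y = np.array([4, 5, 6])                  # or pass the list directly
m = np.array([[7, 8, 9], [10, 11, 12]])  # a list of lists gives a 2-D array

print(x.shape)  # (3,)
print(m.shape)  # (2, 3)
```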

For the **arange** function, we pass in a start, a stop, and a step size, and it returns evenly spaced values within a given interval. 

In [None]:
n = np.arange(0,30,2)
n

So suppose we wanted to convert this array of numbers to a three by five array. We can use reshape to do that. 

In [None]:
#reshape (3,5)

Use **resize** to change the shape and size of our array in place.

In [None]:
#resize
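
The difference between the two can be sketched as follows, assuming NumPy is installed: reshape returns a new array object and leaves the original shape alone, while the resize method changes the array in place. The names `reshaped` and `resized` are illustrative.

```python
import numpy as np

n = np.arange(0, 30, 2)      # 15 values: 0, 2, ..., 28
reshaped = n.reshape(3, 5)   # returns a 3x5 array; n itself keeps its shape
print(n.shape, reshaped.shape)  # (15,) (3, 5)

resized = np.arange(0, 30, 2)
resized.resize((3, 5))       # changes the array in place and returns None
print(resized.shape)         # (3, 5)
```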

**ones()** returns an array of ones, **zeros()** returns an array of zeros and **eye()** returns an identity matrix.

In [None]:
# call ones

You can use the times-operator **\*** on a Python list to replicate its items before converting it to an array, or use **repeat** to repeat each element of an array.

In [None]:
#Replicate using *
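
A small sketch of these helpers, assuming NumPy is installed (the shapes and values are chosen only for illustration):

```python
import numpy as np

print(np.ones((2, 3)))   # 2x3 array filled with 1.0
print(np.zeros((2, 3)))  # 2x3 array filled with 0.0
print(np.eye(3))         # 3x3 identity matrix

# * on a Python list repeats the whole list before it becomes an array
tiled = np.array([1, 2, 3] * 3)
print(tiled)     # [1 2 3 1 2 3 1 2 3]

# np.repeat repeats each element in turn
repeated = np.repeat([1, 2, 3], 3)
print(repeated)  # [1 1 1 2 2 2 3 3 3]
```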


### Operations

Performing elementwise addition, subtraction, multiplication, and division is straightforward, as is raising all the numbers of an array to a power. 

In [None]:
x = np.array([[1, 2, 3] * 2 ,[4, 5, 6]* 2] *2)
#try out different operations

For those familiar with linear algebra, the dot product can be computed using the **dot** function. We can also take the transpose of an array using the **T** attribute, which swaps the rows and columns. 

In [None]:
# calculate a dot product and transpose
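
The elementwise operations, the dot product and the transpose can be tried out on small arrays. This is a sketch assuming NumPy is installed; `x`, `y` and `z` hold illustrative values.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)     # [5 7 9]
print(x * y)     # [ 4 10 18]
print(x ** 2)    # [1 4 9]
print(x.dot(y))  # 32 = 1*4 + 2*5 + 3*6

z = np.array([[1, 2, 3], [4, 5, 6]])
print(z.shape, z.T.shape)  # (2, 3) (3, 2)
```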

Numpy also has many useful math functions that we can use. Let's look at a few commonly used ones. 

In [None]:
#Create an array of random values using numpy.random.rand()
# random_matr = np.random.rand(3,2)

We can look at the sum of the values in the array, the maximum and minimum, or the mean and standard deviation. 

In [None]:
#try them out

To find the index of a maximum or minimum value, we can use argmax and argmin. 


In [None]:
# ;-)
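
For a reproducible illustration, the sketch below uses a small fixed array instead of random values. It assumes NumPy is installed, and the array `a` is made up for this example.

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

print(a.sum())                 # 3
print(a.max(), a.min())        # 5 -4
print(a.mean())                # 0.6
print(a.std())                 # standard deviation
print(a.argmax(), a.argmin())  # 4 0 (indexes of the largest and smallest value)
```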

#### Indexing and slicing just like with tuples and lists

In [None]:
r = np.arange(36)
r.resize([6,6])
print(r)

Get the value of the 2nd row and 2nd column

Now let's use colon notation to get a slice of the third row and columns three to six. We can also do something like get the first two rows and all of the columns except the last. 

Try out the '<' or '>' operator and figure out what happens

In [None]:
#reassign any value bigger than 30
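
The indexing tasks above might be approached as in the following sketch, which assumes NumPy is installed. It uses reshape instead of resize to build the same 6x6 array, and it caps values at 30 as one arbitrary choice for the reassignment.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

print(r[1, 1])           # 7: 2nd row, 2nd column (indexes start at 0)
print(r[2, 2:6])         # [14 15 16 17]: third row, columns three to six
print(r[:2, :-1].shape)  # (2, 5): first two rows, all columns except the last

mask = r > 30            # a comparison gives a boolean array of the same shape
print(r[mask])           # [31 32 33 34 35]
r[r > 30] = 30           # reassign every value bigger than 30
print(r.max())           # 30
```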