# LO3 - Data Manipulation

## Data Storage Types

In LO2 we worked with strings and numbers. Now we turn our attention to the next logical thing: data structures that allow us to build more complicated functionality.  Python has several built-in container types that support most use cases.  They include:
1. List
2. Tuple
3. Dictionary
4. Set
5. Frozen Set

Of these, tuple and frozen set are immutable; that is, they cannot be changed after they have been created.

To create an instance of any of these containers, we call its name as a constructor with no arguments, and we get an empty instance (which is not very useful for immutable types).

list()
dict()
tuple()
set()
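
As a quick check of the constructors above, the following sketch builds one empty instance of each container (including `frozenset()`, which the list above omits) and uses `hasattr` to show that the immutable types lack the in-place methods of their mutable counterparts. The variable name `empty_containers` is only for illustration.

```python
# Build one empty instance of each built-in container type
empty_containers = [list(), dict(), tuple(), set(), frozenset()]
for container in empty_containers:
    print(type(container).__name__, len(container))

# Mutable containers expose methods that change them in place;
# the immutable ones do not
print(hasattr(list(), 'append'))      # True
print(hasattr(tuple(), 'append'))     # False
print(hasattr(set(), 'add'))          # True
print(hasattr(frozenset(), 'add'))    # False
```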

### Lists


A list is an ordered sequence of zero or more objects, which can be of different types.  It is usually created by putting square brackets [] around a comma-separated list of values:

In [1]:
a = ['bacon','ham', 3.14, 4, [10,20,30]]

In [2]:
fruit = ['Lime', 'Orange', 'Pear', 'Apple']

In [3]:
[['ax', 'bx', 'cx', 'dx', 'ex'], ['ay', 'by', 'cy', 'dy', 'ey'], ['az', 'bz', 'cz', 'dz', 'ez']]

[['ax', 'bx', 'cx', 'dx', 'ex'],
 ['ay', 'by', 'cy', 'dy', 'ey'],
 ['az', 'bz', 'cz', 'dz', 'ez']]

There are several ways to add or remove items from a list:

In [4]:
fruit.append('Cherry')
print(fruit)

['Lime', 'Orange', 'Pear', 'Apple', 'Cherry']


In [5]:
fruit.insert(2, 'Mango')
print(fruit)

['Lime', 'Orange', 'Mango', 'Pear', 'Apple', 'Cherry']


In [6]:
fruit.append(['Grapes', 'Banana'])
print(fruit)

['Lime', 'Orange', 'Mango', 'Pear', 'Apple', 'Cherry', ['Grapes', 'Banana']]


In [7]:
fruit.extend(['Cherry', 'Banana'])
print(fruit)

['Lime', 'Orange', 'Mango', 'Pear', 'Apple', 'Cherry', ['Grapes', 'Banana'], 'Cherry', 'Banana']
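
The difference between `append` and `extend` in the two cells above is easy to miss, so here is a small side-by-side sketch. It uses two throwaway lists (`basket` and `crate`, names chosen only for this example) so the `fruit` list is left untouched.

```python
basket = ['Lime', 'Orange']
basket.append(['Pear', 'Apple'])    # adds ONE item: the list itself
print(basket)                       # ['Lime', 'Orange', ['Pear', 'Apple']]
print(len(basket))                  # 3

crate = ['Lime', 'Orange']
crate.extend(['Pear', 'Apple'])     # adds EACH item of the argument
print(crate)                        # ['Lime', 'Orange', 'Pear', 'Apple']
print(len(crate))                   # 4
```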


In [8]:
fruit.remove('Cherry')
print(fruit)

['Lime', 'Orange', 'Mango', 'Pear', 'Apple', ['Grapes', 'Banana'], 'Cherry', 'Banana']


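In [9]:
fruit.pop()  # removes and returns the last item
print(fruit)

['Lime', 'Orange', 'Mango', 'Pear', 'Apple', ['Grapes', 'Banana'], 'Cherry']


In [10]:
fruit.pop(2)  # removes and returns the item at index 2
print(fruit)

['Lime', 'Orange', 'Pear', 'Apple', ['Grapes', 'Banana'], 'Cherry']


In [11]:
fruit.pop()
print(fruit)

['Lime', 'Orange', 'Pear', 'Apple', ['Grapes', 'Banana']]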


There are other functions and methods available to gather information about the state of the list:

In [12]:
len(fruit)

5

In [14]:
'Apple' in fruit

True

In [None]:
'Apple' not in fruit

In [15]:
fruit.count('Apple')

1

In [None]:
fruit.index('Apple')

In [None]:
fruit.index('Apple',1)
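
One detail worth knowing about `index`: it raises a `ValueError` when the item is not in the list. The sketch below shows the optional start argument and one way to handle a missing item. The `colors` list is made up for this example.

```python
colors = ['red', 'green', 'blue', 'green']

print(colors.index('green'))       # 1, the first match
print(colors.index('green', 2))    # 3, the search starts at index 2

# index() raises ValueError when the item is missing,
# so test with "in" first or catch the error
try:
    position = colors.index('purple')
except ValueError:
    position = None
print(position)                    # None
```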

## List Comprehension

Python makes it easy to work with lists.  One of the most common uses of a list is to iterate over its elements with a for loop.  When the goal of the loop is to build a new list, a list comprehension does it in a single expression.

In [16]:
a = [i for i in range(10)]
print(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [17]:
b = [i**2 for i in range(10)]
print(b)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [18]:
c = [[i, i**2, i**3] for i in range(10)]
print(c)

[[0, 0, 0], [1, 1, 1], [2, 4, 8], [3, 9, 27], [4, 16, 64], [5, 25, 125], [6, 36, 216], [7, 49, 343], [8, 64, 512], [9, 81, 729]]


In [19]:
d = [[i, i**2, i**3] for i in range(10) if i % 2]
print(d)

[[1, 1, 1], [3, 9, 27], [5, 25, 125], [7, 49, 343], [9, 81, 729]]
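
To see what the filtered comprehension above is doing, it can help to write it out as an ordinary loop. This sketch rebuilds `d` both ways and confirms the results match; `d_loop` is a name introduced only here.

```python
# Comprehension form, as in the cell above
d = [[i, i**2, i**3] for i in range(10) if i % 2]

# Equivalent explicit loop
d_loop = []
for i in range(10):
    if i % 2:                          # i % 2 is 1 (truthy) for odd numbers
        d_loop.append([i, i**2, i**3])

print(d == d_loop)                     # True
```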


### In Class Exercise

Create a list **comprehension** that produces this output: 
[['ax', 'bx', 'cx', 'dx', 'ex'], ['ay', 'by', 'cy', 'dy', 'ey'], ['az', 'bz', 'cz', 'dz', 'ez']]

Please do not use the method we discussed earlier. 

Hint: you will have to nest two list comprehensions.


In [20]:
# Cell is for in class exercise

#Remove answer before providing to students! 
e = [[i+j for i in 'abcde'] for j in 'xyz']

## Sorting and Reordering

In [21]:
fruit.remove(['Grapes', 'Banana'])
print(fruit)

['Lime', 'Orange', 'Pear', 'Apple']


In [22]:
sorted_fruit = sorted(fruit)
print(sorted_fruit)

['Apple', 'Lime', 'Orange', 'Pear']


In [23]:
sorted_fruit == fruit

False

In [24]:
fruit.sort()
sorted_fruit == fruit

True
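
The cells above rely on a distinction that trips up many newcomers: `sorted()` returns a new list, while the `sort()` method reorders the list in place and returns `None`. The sketch below shows this, along with the optional `key` argument. The `words` list is made up for illustration.

```python
words = ['pear', 'Apple', 'lime', 'Orange']

result = words.sort()     # sorts in place...
print(result)             # None (...and returns nothing useful)
print(words)              # ['Apple', 'Orange', 'lime', 'pear']

# Uppercase letters sort before lowercase ones by default;
# a key function changes what is compared
print(sorted(words, key=str.lower))   # ['Apple', 'lime', 'Orange', 'pear']
```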

### Reversing a List

In [25]:
r_fruit = list(reversed(fruit))
print(r_fruit)

['Pear', 'Orange', 'Lime', 'Apple']


In [26]:
fruit.reverse()
print(fruit)

['Pear', 'Orange', 'Lime', 'Apple']


In [27]:
r_fruit == fruit

True

In [None]:
sorted(r_fruit, reverse=True)

## Tuples

Like a list, a tuple is an ordered sequence of zero or more objects of any type.  Unlike a list, a tuple is immutable, which means it cannot be modified once created, so it lacks methods such as append and sort.

In [34]:
a = (1,2,'three, four')

In [32]:
len(a)

3

In [33]:
sorted(a) #note you can't sort multiple types

TypeError: '<' not supported between instances of 'str' and 'int'

In [None]:
a.index(2)

In [None]:
a.count(2)

In [None]:
b = '5','6','7' #alternate way of declaring a tuple
type(b)

In [None]:
c_raw = '1'
c_tuple = '1',
c_raw == c_tuple

In [None]:
d_raw = ('d')
d_tuple = ('d',)
d_raw == d_tuple
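
A short sketch of what immutability means in practice: assigning to an item of a tuple raises a `TypeError`, so "changing" a tuple really means building a new one. The `point` tuple and the other names are hypothetical.

```python
point = (3, 4)

try:
    point[0] = 10                  # item assignment is not allowed
except TypeError as error:
    print('TypeError:', error)

x, y = point                       # unpacking reads the items out
print(x, y)                        # 3 4

# Build a new tuple instead of modifying the old one
moved = (point[0] + 1, point[1])
print(moved)                       # (4, 4)
print(point)                       # (3, 4)
```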

## Index and Slice Notation

While solving real-world problems you will often want to grab a subsection of a list, tuple, or other sequence.  In Python, ordered data structures (list, tuple, str, etc.) can be sliced and indexed using slice and index notation.  Remember that indexes in Python start at 0.

In [None]:
animals = ['tiger', 'lions', 'bears', 'monkey', 'human']

In [None]:
animals[1]

In [None]:
animals[1] = 'chimp' #pull that up Jamie...
print(animals)

In [None]:
animals[1:3]

In [None]:
animals[:3] #starts at the beginning

In [None]:
animals[2:] #goes to the end

In [None]:
animals[-2:]

In [None]:
animals[1:6:2] #optional step parameter
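
A few slicing rules are implied by the cells above but not stated: the stop index is excluded, a slice that runs past the end is simply cut short, and a slice is a new list. The sketch below illustrates each one on a separate list, `zoo` (a name used only here), so `animals` is not disturbed.

```python
zoo = ['tiger', 'lions', 'bears', 'monkey', 'human']

# The stop index is excluded, so [1:3] returns items 1 and 2
print(zoo[1:3])         # ['lions', 'bears']

# A slice that runs past the end is truncated rather than raising
print(zoo[3:100])       # ['monkey', 'human']

# Plain indexing past the end does raise
try:
    zoo[100]
except IndexError:
    print('no item at index 100')

# A slice is a new list; changing it leaves the original alone
first_two = zoo[:2]
first_two[0] = 'panther'
print(zoo[0])           # tiger
```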

### In Class Exercise

1. Show that slicing the animals list with a step of -1 is equivalent to using the list and reversed functions on the same list.

In [None]:
#Answer - Remove before giving to students
animals[::-1] == list(reversed(animals))

## Dictionaries

A dictionary stores key-value pairs.  The keys in a dictionary must be unique and hashable, which in practice means immutable types such as strings and numbers.  The values can be of any type.  

In [None]:
music = {"rock": 11, "jazz": 2}
music['rap'] = 7
del music['jazz']
'rock' in music

In [None]:
'jazz' in music

In [None]:
11 in music # membership tests check keys, not values

In [None]:
music['rock']

There are several additional methods that make dictionaries easy to work with:

In [None]:
music.items()

In [None]:
music.keys()

In [None]:
music.values()

In [None]:
music.get('rock')

In [None]:
music.clear()
print(music)
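
The difference between `get` and square-bracket lookup only shows up when a key is missing. The sketch below uses a made-up `inventory` dictionary to show both behaviors and a typical loop over `items()`.

```python
inventory = {'apples': 3, 'pears': 0}

print(inventory.get('apples'))       # 3
print(inventory.get('plums'))        # None, no error for a missing key
print(inventory.get('plums', 0))     # 0, an explicit default

# Square brackets raise KeyError for a missing key
try:
    inventory['plums']
except KeyError:
    print('plums is not a key')

# items() yields (key, value) pairs
for name, count in inventory.items():
    print(name, count)
```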

## Sets

A set is an unordered data structure that can only contain unique objects.  Adding something that already exists in a set does nothing (and does not cause an error).

In [None]:
numbers = set([1,1,1,1,1,3,3,3,3,3,2,2,2,2,3,3,3,4])
letters = set('TheQuickBrownFoxJumpedOverTheLazyDog'.lower())
numbers.add(4)
print(numbers)

In [None]:
numbers.add(5)
print(numbers)

In [None]:
numbers.update([3,4,5,6,7])
print(numbers)

In [None]:
numbers.pop()

In [None]:
print(numbers)

In [None]:
numbers.remove(7)

In [None]:
print(numbers)

You can compute the union, intersection, difference, and symmetric difference of sets:

In [None]:
house_pets = {'dog', 'cat', 'fish'}
farm_animals = {'cow', 'sheep', 'pig', 'dog', 'cat'}

house_pets & farm_animals

In [None]:
house_pets | farm_animals

In [None]:
house_pets ^ farm_animals # symmetric difference

In [None]:
house_pets - farm_animals # difference: items in house_pets that are not in farm_animals
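
Each set operator also has a named method, and the same operations work on a frozen set. The sketch below compares the two spellings and shows that difference depends on operand order. `frozen_pets` is a name introduced only for this example.

```python
house_pets = {'dog', 'cat', 'fish'}
farm_animals = {'cow', 'sheep', 'pig', 'dog', 'cat'}

# Each operator has an equivalent method
print(house_pets.intersection(farm_animals) == (house_pets & farm_animals))           # True
print(house_pets.union(farm_animals) == (house_pets | farm_animals))                  # True
print(house_pets.symmetric_difference(farm_animals) == (house_pets ^ farm_animals))   # True

# Difference depends on the order of the operands
print(house_pets - farm_animals)            # {'fish'}
print(sorted(farm_animals - house_pets))    # ['cow', 'pig', 'sheep']

# A frozenset supports the same operations but cannot be changed
frozen_pets = frozenset(house_pets)
print((frozen_pets & farm_animals) == (house_pets & farm_animals))   # True
print(hasattr(frozen_pets, 'add'))                                   # False
```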

### In Class Exercise

1. If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9.  The sum of these multiples is 23.  Find the sum of all the multiples of 3 or 5 below 1000.

In [None]:
# Cell for in class exercise - remove answer before giving to students

total = 0  # named total so the built-in sum() is not shadowed

for i in range(1000):
    if i % 3 == 0 or i % 5 == 0:
        total += i
        print(i)

print(total)

## File IO

Easy to do in Python!

In [None]:
myfile = open('data.txt', 'w')
myfile.write("writing data to file")
myfile.close()

data.txt should now be in the folder you are running this notebook from. Go check!

Note: opening an existing file in w mode erases its original contents!

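There are only a few modes which are commonly used:
1. w - writing (creates the file, or erases it if it already exists)
2. r - reading
3. a - appending
4. r+ - reading and writing
5. b - binary mode (combined with one of the above, e.g. rb or wb)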
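
The modes above can be compared in a small experiment. This sketch is not part of the original notebook: it uses a `with` block (which closes the file automatically) and a temporary directory so that no real files are touched. The file name `demo.txt` is made up.

```python
import os
import tempfile

# Work in a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')

    with open(path, 'w') as f:      # 'w' creates the file
        f.write('first line\n')

    with open(path, 'a') as f:      # 'a' adds to the end
        f.write('second line\n')

    with open(path, 'r') as f:      # 'r' reads it back
        contents = f.read()

    with open(path, 'w') as f:      # 'w' again erases the old contents
        f.write('replaced\n')

    with open(path, 'r') as f:
        replaced = f.read()

print(repr(contents))    # 'first line\nsecond line\n'
print(repr(replaced))    # 'replaced\n'
```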

### File-like Objects

With the amount of RAM available in modern computers, using an in-memory file-like object instead of a file on disk can give your application a big performance gain.

In [None]:
import io
mystringfile = io.StringIO()
mystringfile.write("This is my data!")
mystringfile.read() # cursor is at the end, so this returns an empty string

In [None]:
mystringfile.seek(0) # put the cursor back at the start
mystringfile.read()
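
The practical appeal of a file-like object is that code written for real files accepts it unchanged. The sketch below passes an `io.StringIO` to a small helper, `count_lines`, which is a hypothetical function written for this example, and also shows `getvalue()`, which returns the whole contents regardless of where the cursor is.

```python
import io

def count_lines(handle):
    """Count the lines in any object that behaves like an open text file."""
    return len(handle.readlines())

buffer = io.StringIO("alpha\nbeta\ngamma\n")   # starts with the cursor at 0
print(count_lines(buffer))                     # 3

# The cursor is now at the end, so read() finds nothing more...
print(repr(buffer.read()))                     # ''

# ...but getvalue() ignores the cursor position
print(repr(buffer.getvalue()))                 # 'alpha\nbeta\ngamma\n'
```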

### In Class Exercise

Write a word count function that takes a text file and returns a dictionary containing the count for each word.  Remove all punctuation except apostrophes.  Lowercase all words.

In [None]:
# Cell is for all In Class Exercise


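#Remove answer before giving to students
import string

# Every punctuation character except the apostrophe
punctuation = string.punctuation.replace("'", "")

def word_count(path):
    """Return a dict mapping each lowercased word in the file to its count."""
    with open(path, 'r') as myfile:
        mydata = myfile.read().lower()
    # Strip punctuation (keeping apostrophes) before splitting on whitespace
    mydata = mydata.translate(str.maketrans('', '', punctuation))
    mydict = {}
    for word in mydata.split():
        mydict[word] = mydict.get(word, 0) + 1
    return mydict

print(word_count('data.txt'))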

## Functional Programming

At a basic level there are two types of programming paradigms:
1. imperative or procedural programming (everything in this course so far)
2. declarative or functional programming

Imperative programming focuses on telling a computer how to change a program's state step by step.  Most programming follows this paradigm.  It's tightly aligned to how a computer actually works physically.  These instructions can be organized into functions and objects, but both of these improvements remain imperative at heart.

Declarative programming focuses on expressing what the program should do, not necessarily how it should be done.  Functional programming is the most common style of declarative programming.  It treats a program as if it is made up of mathematical-style functions: for a given input x, applying function f will always give you the same output f(x), and x itself will remain unchanged afterwards.

The primary distinction between procedural and functional programming is that a procedural function may have side effects: it may change the state of its inputs or something outside of itself, giving you a different result when you run it a second time.  Functional programming avoids side effects, ensuring that functions don't modify anything outside of themselves.

### An Example

The function below is NOT functional:

In [None]:
a = 0
def increment():
    global a
    a += 1
    
print(a)

increment()

print(a)

This function is functional:

In [None]:
def increment(a):
    return a + 1
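
Side effects are not limited to global variables. A function that modifies a list it was given also changes state outside itself. The sketch below contrasts two hypothetical helpers, `add_item_impure` and `add_item_pure`, written only for this illustration.

```python
def add_item_impure(items, item):
    items.append(item)        # side effect: changes the caller's list
    return items

def add_item_pure(items, item):
    return items + [item]     # builds and returns a new list

original = [1, 2]

result = add_item_pure(original, 3)
print(original)               # [1, 2]     (unchanged)
print(result)                 # [1, 2, 3]

add_item_impure(original, 3)
print(original)               # [1, 2, 3]  (the input was modified)
```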

## Map Reduce

A map is a function that takes two arguments: another function and a collection of items.  It will:
1. Run the function on each item of the original collection
2. Return a new collection containing the results
3. Leave the original collection unchanged

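In Python the collection only needs to be iterable (a list, tuple, or string, for example).  In Python 3, map returns a lazy iterator, which is why the examples below wrap it in list() before printing.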

In [None]:
#Example 1: String Lengths
string_lengths = map(len, ["dog", "cat", "zebra", "turtle"])
print(list(string_lengths))

Example 2: Cubing

In [None]:
cubes = map(lambda x: x**3, [0,1,2,3,4])
print(list(cubes))
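
One consequence of the `list(...)` calls in the examples above: in Python 3 the object returned by `map` can only be consumed once. The sketch below shows this, and also the equivalent list comprehension for comparison. The variable names are only for illustration.

```python
animals_to_measure = ["dog", "cat", "zebra", "turtle"]

lengths = map(len, animals_to_measure)
print(list(lengths))    # [3, 3, 5, 6]
print(list(lengths))    # []  (the iterator is already used up)

# The same result written as a list comprehension
print([len(word) for word in animals_to_measure])   # [3, 3, 5, 6]

# The original collection is unchanged either way
print(animals_to_measure)   # ['dog', 'cat', 'zebra', 'turtle']
```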

## Lambdas

A lambda lets you define and use an unnamed function.  The arguments go between the lambda keyword and the colon, and the expression after the colon is returned implicitly (i.e. without an explicit return statement).

Lambdas are most useful when:
1. your function is simple
2. you only need to use it once

Consider the usual way:

In [None]:
def cube(x):
    return x**3

print(cube(4))

In [None]:
# we could have done this instead
cubes = map(cube, [0,1,2,3,4])
print(list(cubes))
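
Lambdas are not limited to `map`. Another common place for a simple, use-once function is the `key` argument of `sorted`. The data in this sketch is made up for illustration.

```python
pairs = [('pear', 3), ('apple', 7), ('lime', 1)]

# Sort by the number in each pair rather than by the name
by_count = sorted(pairs, key=lambda pair: pair[1])
print(by_count)     # [('lime', 1), ('pear', 3), ('apple', 7)]

# Without a key, tuples are compared item by item, so names decide
print(sorted(pairs))    # [('apple', 7), ('lime', 1), ('pear', 3)]
```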

### In Class Exercise

Write a lambda that adds 5 to a number, then use it to add 5 to each number in a list.

In [None]:
#Remove before handing out to students
func = lambda x: x+5
print(list(map(func, [1,2,3,4,5,6])))

Now back to map.  Here is a procedural way to take a list of real names and assign each one a random code name:

In [None]:
import random

names = ['Jared', 'Gavin', 'Walter', 'Mike', 'Brett', 'Hugh']
code_names = ['Eagle', 'Hawk', 'Seagull', 'Heron', 'Sparrow', 'Raven']

for i in range(len(names)):
    names[i] = random.choice(code_names)

print(names)

### In Class Exercise

Rewrite the above code functionally (use map and lambda).

In [None]:
#Remove before supplying to students

names = ['Jared', 'Gavin', 'Walter', 'Mike', 'Brett', 'Hugh']

covernames = map(lambda x: random.choice(['Eagle', 'Hawk', 'Seagull', 'Heron', 'Sparrow', 'Raven']), names)
                 
print(list(covernames))

## Reduce

Reduce is the counterpart to map.  Given a function and a collection of items, it uses the function to combine them into a single value and returns that result.

The function passed to reduce has some restrictions.  It must take two arguments: an accumulator and an update value.  The update value works as it did with map; it is set to each item in the collection, one by one.  The accumulator is new.  It receives the output from the previous function call, thus "accumulating" the combined value from item to item through the collection.

In [None]:
import functools
total = functools.reduce(lambda a, x: a + x, [0,1,2,3,4])  # named total so the built-in sum() is not shadowed
print(total)
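
To watch the accumulator at work, the sketch below swaps the lambda for a named function that prints each step. It also passes the optional third argument to `functools.reduce`, an initial accumulator value, which is what makes reducing an empty collection safe. The names `add`, `running_total` and `product` are chosen only for this example.

```python
import functools

def add(accumulator, value):
    """Combine step that also reports what reduce passes in."""
    print(f"accumulator={accumulator}, value={value}")
    return accumulator + value

# The third argument, 0, is the starting accumulator
running_total = functools.reduce(add, [1, 2, 3, 4], 0)
print(running_total)                    # 10

# With a starting value, an empty collection just returns it
print(functools.reduce(add, [], 0))     # 0

# Any two-argument combining function works, e.g. a product
product = functools.reduce(lambda a, x: a * x, [1, 2, 3, 4])
print(product)                          # 24
```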