# Data Manipulation in Python

In [None]:
# No imports today



# Objectives

- Write functions to transform data
- Construct list and dictionary comprehensions
- Extract data from nested data structures

# Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

## Creating Functions

The first line will read:

```python
def function_name():
    ...  # the indented function body goes here
```

Any arguments to the function will go in the parentheses.

Let's try building a function that will automate the task of finding how many times a given number can be evenly divided by 2.

In [None]:
# Let's code it!
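
Here's one way we might code it, as a minimal sketch. The name `count_twos` and the choice to raise an error for 0 are my own assumptions, not part of the exercise statement.

```python
def count_twos(n):
    """Return how many times the integer n can be evenly divided by 2."""
    if n == 0:
        # 0 is divisible by 2 indefinitely, so there is no finite answer
        raise ValueError("0 can be divided by 2 without end")
    count = 0
    while n % 2 == 0:
        n //= 2       # integer division keeps n an int
        count += 1
    return count


print(count_twos(40))  # 40 -> 20 -> 10 -> 5, so 3
print(count_twos(7))   # 7 is odd, so 0
```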



## Calling Functions

To _call_ a function, simply type its name, along with any necessary arguments in parentheses.

In [None]:
# Let's call it!
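
To keep this sketch self-contained it calls built-in functions rather than one of our own; the list `words` is made up for illustration. The same pattern (name, then arguments in parentheses) applies to any function we define ourselves.

```python
words = ['alpha', 'beta', 'gamma']

# Calling: the name followed by parentheses containing the arguments
n_words = len(words)
longest = max(words, key=len)

# Without parentheses, the name refers to the function object itself
same_function = len

print(n_words)               # 3
print(longest)               # alpha (the first of the longest words)
print(same_function(words))  # 3
```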



## Default Argument Values

Sometimes we'll want the argument(s) of our function to have default values.

In [6]:
def cheers(person='aaron', job='data scientist', age=30):
    return f'Hooray for {person}. You\'re a {job} and you\'re {age}!'

In [7]:
cheers('greg', 'scientist', 130)

"Hooray for greg. You're a scientist and you're 130!"

In [8]:
cheers(job='scientist', age=130, person='greg')

"Hooray for greg. You're a scientist and you're 130!"

In [9]:
cheers('cristian', 'git enthusiast')

"Hooray for cristian. You're a git enthusiast and you're 30!"

In [10]:
cheers()

"Hooray for aaron. You're a data scientist and you're 30!"

# Lists

## List Methods

Make sure you're comfortable with the following list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element if no index is given)
- `.extend()`: adds the elements in the input iterable to the end of a list
- `.index()`: returns the index of the first place in a list where the argument is found
- `.remove()`: removes the first occurrence of the input value
- `.count()`: returns the number of occurrences of the input element in a list
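
Here's a quick run through all six methods on a made-up list called `fruits`. The comments show the state of the list after each step.

```python
fruits = ['apple', 'banana']

fruits.append('cherry')            # ['apple', 'banana', 'cherry']
fruits.extend(['banana', 'date'])  # ['apple', 'banana', 'cherry', 'banana', 'date']

first_banana = fruits.index('banana')  # 1
n_bananas = fruits.count('banana')     # 2

last = fruits.pop()      # removes and returns 'date'
first = fruits.pop(0)    # removes and returns 'apple'
fruits.remove('banana')  # removes only the first 'banana'

print(fruits)  # ['cherry', 'banana']
```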

Question: What's the difference between `.remove()` and `del`?

<details>
    <summary>
        Answer here
    </summary>
    .remove() removes the first element that matches a given value;<br/>
    del removes an element (or a slice) by position
</details>
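
A small sketch of that difference, using a made-up list `letters`:

```python
letters = ['a', 'b', 'c', 'b', 'd']

letters.remove('b')   # by value: drops the first 'b'
print(letters)        # ['a', 'c', 'b', 'd']

del letters[0]        # by position: drops index 0
print(letters)        # ['c', 'b', 'd']

del letters[1:]       # del also accepts a slice
print(letters)        # ['c']

# .remove() raises ValueError if the value is absent
try:
    letters.remove('z')
except ValueError:
    print("'z' is not in the list")
```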

## List Comprehension

List comprehension is a handy way of generating a new list from existing iterables.

Suppose I start with a simple list.

In [3]:
primes = [2, 3, 5, 7, 11, 13, 17, 19]

What I want to do now is build a new list comprising the doubles of these primes. I can do this with list comprehension!

The syntax is: `[f(x) for x in <iterable> if <condition>]`, where the `if <condition>` part is optional.

In [4]:
prime_doubles = [x*2 for x in primes]
prime_triples = [x*3 for x in primes]

In [5]:
prime_doubles

[4, 6, 10, 14, 22, 26, 34, 38]
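
The examples so far leave out the `if <condition>` part of the syntax. Here's a short sketch that uses it; the name `big_prime_squares` is just for illustration.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]

# Keep only the primes above 10, then square each one
big_prime_squares = [x**2 for x in primes if x > 10]
print(big_prime_squares)  # [121, 169, 289, 361]
```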

##### Aside: List Comprehensions Vs. `for`-Loops

Yes, I could do the same work with `for`-loops:

In [11]:
prime_doubles2 = []
for prime in primes:
    prime_doubles2.append(prime*2)
prime_doubles2

[4, 6, 10, 14, 22, 26, 34, 38]

In [12]:
prime_doubles == prime_doubles2

True

But list comprehensions have advantages: the syntax is more compact, and they are usually somewhat faster, since they avoid the repeated `.append()` call. Also, you'll see them in other people's code, so you'll have to know how to work with them!
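
If you'd like to check the speed difference yourself, here's a rough sketch using the standard-library `timeit` module. The helper names `with_loop` and `with_comprehension` are my own, and the timings will vary by machine and Python version, so treat the result as indicative only.

```python
import timeit

primes = [2, 3, 5, 7, 11, 13, 17, 19]

def with_loop():
    doubles = []
    for prime in primes:
        doubles.append(prime * 2)
    return doubles

def with_comprehension():
    return [prime * 2 for prime in primes]

# Both approaches build the same list
assert with_loop() == with_comprehension()

# Time 100,000 runs of each; exact numbers differ from run to run
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)
print(f'loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s')
```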

### Another List Comprehension Example

I can use list comprehension to build a list from objects other than lists:

In [13]:
names = ('Alan Turing', 'Charles Babbage', 'Ada Lovelace',
        'Anita Borg', 'Steve Wozniak', 'Andrew Ng')

splits = [name.split() for name in names]
splits

[['Alan', 'Turing'],
 ['Charles', 'Babbage'],
 ['Ada', 'Lovelace'],
 ['Anita', 'Borg'],
 ['Steve', 'Wozniak'],
 ['Andrew', 'Ng']]

In [14]:
[name1[0]+'. '+name2[0]+'.' for (name1, name2) in splits]

['A. T.', 'C. B.', 'A. L.', 'A. B.', 'S. W.', 'A. N.']

### Exercises

1. Use a list comprehension to extract the odd numbers from this set:

In [15]:
nums = set(range(1000))

In [20]:
odd_nums = [x for x in nums if x % 2 != 0]
print(odd_nums)

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, ... (output truncated)

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2. Use a list comprehension to take the first character of each string from the following list of words.

In [21]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [24]:
first_chars = [word[0] for word in words]
print(first_chars)

['c', 'o', 'm', 'p', 'r', 'e', 'h', 'e', 'n', 's', 'i', 'o', 'n']


In [27]:
last_chars = [word[-1] for word in words]
print(last_chars)

['n', 'm', 'y', 'm', 'm', 'm', 'n', 'm', 'n', 'r', 'e', 'n', 'm']


3. Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [30]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [33]:
filtered_names = [name + "?" for name in names if name[0] == "R"]
print(filtered_names)

['Randy?', 'Robert?', 'Ranjit?', 'Richard?', 'Ravdeep?']


In [38]:
filtered_names = [name + "?" for name in names if name.startswith("R")]
print(filtered_names)

['Randy?', 'Robert?', 'Ranjit?', 'Richard?', 'Ravdeep?']


# Dictionaries

## Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's key-value tuples
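
In Python 3 these methods return view objects rather than lists, so they stay in sync with the dictionary. A quick sketch with a made-up dictionary `stock`:

```python
stock = {'apples': 3, 'pears': 5}

keys = stock.keys()        # a view, not a list
print(list(keys))          # ['apples', 'pears']

stock['plums'] = 7         # the view reflects later changes
print(list(keys))          # ['apples', 'pears', 'plums']

print(sum(stock.values()))     # 15
print(list(stock.items())[0])  # ('apples', 3)
```

Wrap a view in `list()` when you need indexing or a snapshot that won't change.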

## Dictionary Comprehension

Much like list comprehension, I can use dictionary comprehension to build dictionaries from existing iterables.

In [40]:
my_dict = {'who': 'flatiron school', 'what': 'data science',
           'when': 'now', 'where': 'here', 'why': '$',
           'how': 'python'}

Remember that the `.items()` method will return a collection of pairs (two-element tuples):

In [41]:
my_dict.items()

dict_items([('who', 'flatiron school'), ('what', 'data science'), ('when', 'now'), ('where', 'here'), ('why', '$'), ('how', 'python')])

So I can use a pair of variables to range over it:

In [42]:
{k: v + '!' for k, v in my_dict.items() if k.startswith('w')}

{'who': 'flatiron school!',
 'what': 'data science!',
 'when': 'now!',
 'where': 'here!',
 'why': '$!'}

The same thing works for any collection of pairs:

In [43]:
{k**2: v**2 for k, v in [(0, 1), (2, 3), (4, 5)]}

{0: 1, 4: 9, 16: 25}

### `zip`

Remember that `zip` is a handy way of pairing up two or more iterables:

In [44]:
dict(zip(range(5), ['apple', 'orange', 'banana', 'lime', 'blueberry']))

{0: 'apple', 1: 'orange', 2: 'banana', 3: 'lime', 4: 'blueberry'}

In [45]:
# Zipping multiple iterables together
tuple(zip(range(1, 5), 'a'*4, 'b'*4, 'c'*4, 'd'*4, 'e'*4))

((1, 'a', 'b', 'c', 'd', 'e'),
 (2, 'a', 'b', 'c', 'd', 'e'),
 (3, 'a', 'b', 'c', 'd', 'e'),
 (4, 'a', 'b', 'c', 'd', 'e'))
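
One thing to watch for: `zip` stops at the shortest iterable. Here's a small sketch with made-up lists, plus `itertools.zip_longest` from the standard library for when you'd rather pad than truncate.

```python
from itertools import zip_longest

names = ['a', 'b', 'c']
scores = [90, 80]

# zip silently drops 'c' because scores has only two elements
pairs = list(zip(names, scores))
print(pairs)   # [('a', 90), ('b', 80)]

# zip_longest pads the shorter iterable with fillvalue instead
padded = list(zip_longest(names, scores, fillvalue=None))
print(padded)  # [('a', 90), ('b', 80), ('c', None)]
```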

#### Dictionary Comprehension Using `zip`

In [46]:
{k: v for k, v in zip(range(5), range(0, 10, 2))}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}

In [47]:
scores = [.858, .873, .868]
{'model' + str(j+1): scores[j] for j in range(3)}

{'model1': 0.858, 'model2': 0.873, 'model3': 0.868}

### Exercises

1. Use a dictionary comprehension to pair up the countries in the first list with their corresponding capitals in the second list:

In [48]:
list1 = ['USA', 'France', 'Canada', 'Thailand']
list2 = ['Washington', 'Paris', 'Ottawa', 'Bangkok']

In [49]:
{k: v for k, v in zip(list1, list2)}

{'USA': 'Washington',
 'France': 'Paris',
 'Canada': 'Ottawa',
 'Thailand': 'Bangkok'}

<details>
<summary>Answer
    </summary>
    <code>{country: capital for (country, capital) in zip(list1, list2)}</code> <br/> OR <br/>
    <code>dict(zip(list1, list2))</code>
    </details>

2. Use a dictionary comprehension to make each of the characters in the following list a key with the value 'fictional character'.

In [55]:
chars = ['Pinocchio', 'Gilgamesh', 'Kumar Patel', 'Toby Flenderson']

In [60]:
{name: 'fictional character' for name in chars}

{'Pinocchio': 'fictional character',
 'Gilgamesh': 'fictional character',
 'Kumar Patel': 'fictional character',
 'Toby Flenderson': 'fictional character'}

<details>
    <summary>Answer</summary>
    <code>{char: 'fictional character' for char in chars}</code>
    </details>

# Nesting

Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions.

In [61]:
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

In [62]:
[[item[0] for item in small_list] for small_list in lists]

[['m', 'a', 'n'], ['r', 'c', 's']]
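
To see what the nested comprehension is doing, here's the equivalent pair of `for`-loops, followed by a related pattern: putting two `for` clauses in a single comprehension, which flattens the result instead of nesting it. The names `first_letters` and `flat` are just for illustration.

```python
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

# Loop equivalent of the nested comprehension
first_letters = []
for small_list in lists:
    inner = []
    for item in small_list:
        inner.append(item[0])
    first_letters.append(inner)
print(first_letters)  # [['m', 'a', 'n'], ['r', 'c', 's']]

# Two `for` clauses in one comprehension flatten instead of nesting
flat = [item for small_list in lists for item in small_list]
print(flat)  # ['morning', 'afternoon', 'night', 'read', 'code', 'sleep']
```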

## Nested Structures

It will be well worth your while to practice accessing data in complex structures. Consider the following:

In [63]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

**Q**: How would we access 'I Am a Bunny'?
<br/>
**A**: The outermost "layer" has a name: 'customers', and that object is a dictionary:
<br/>
`customers`
<br/>
The key we are interested in is 'pat', since that's where 'I Am a Bunny' is located:
<br/>
`customers['pat']`
<br/>
The value corresponding to the key 'pat' is also a dictionary, and in this "lower-down" dictionary, the key we are interested in is 'purchases':
<br/>
`customers['pat']['purchases']`
<br/>
The value corresponding to the key 'purchases' is yet another dictionary, and here the key of interest is 'books':
<br/>
`customers['pat']['purchases']['books']`
<br/>
The value corresponding to the key 'books' is a list, and 'I Am a Bunny' is the second element in that list:
<br/>
`customers['pat']['purchases']['books'][1]`

In [64]:
customers['pat']['purchases']['books'][1]

'I Am a Bunny'
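
When a structure is unfamiliar, it can help to peel off one layer at a time and check the type at each step. The sketch below uses a trimmed copy of `customers` so that it runs on its own; the 'games' key is a hypothetical missing key used to show `.get()`.

```python
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'], 'books': []}, 'id': 1},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3},
}

# Peel one layer at a time, checking the type at each step
layer = customers['pat']
print(type(layer).__name__)   # dict
layer = layer['purchases']['books']
print(type(layer).__name__)   # list
print(layer[1])               # I Am a Bunny

# .get() returns a default instead of raising KeyError for a missing key
games = customers['pat']['purchases'].get('games', [])
print(games)                  # []
```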

## Exercises

1. From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [65]:
phone_nos = [
    {'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
    {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
    {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
    {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
    {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}
]

home_phone_list = [{person['name']: person['nums']['home']} for person in phone_nos]

print(home_phone_list)

[{'greg': 1234567}, {'max': 9876543}, {'erin': 3333333}, {'joél': 2222222}, {'sean': 9999999}]

<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

2. From the customers dictionary above, build a dictionary where the customers' names are the keys and the movies they've bought are the values.


<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code> <br/>
    OR <br/>
    <code>{k: v['purchases']['movies'] for k, v in customers.items()}</code>
    </details>

# More Exercises

1. Build a function that will return $2^n$ for an input $n$.

In [67]:
# defining a function
def power_of_two(n):
    return 2**n

<details>
    <summary>Answer</summary>
    <code>
def expo(n):
    return 2**n</code>
    </details>

2. Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [None]:
x = ["7", "8", "9", "10"]
# int() can't convert a whole list, so convert each string in turn
print([int(s) for s in x])

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))\
    for string in string_list]</code>
    </details>
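
Here's an alternative sketch that keeps only the digit characters instead of chaining `.replace()` calls. Note that this is a different technique from the answer above: it strips every non-digit character, not just parentheses, hyphens, and spaces. The function name `digits_only` and the sample numbers are made up for illustration.

```python
def digits_only(phone_strings):
    """Convert phone-number strings to integers, keeping only the digits."""
    return [int(''.join(ch for ch in s if ch.isdigit())) for s in phone_strings]


sample = ['(555) 123-4567', '555-987-6543']  # made-up numbers
print(digits_only(sample))  # [5551234567, 5559876543]
```

Keep in mind that converting to `int` drops any leading zeros, whichever approach you use.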

3. Build a function that returns the mode of a list of numbers.

In [69]:
def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    return [num for num in counts.keys() if counts[num] == max(counts.values())]

<details>
    <summary>Answer</summary>
        <code>
def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    return [num for num in counts.keys() if counts[num] == max(counts.values())]</code>
    </details>
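
The answer above calls `lst.count()` once per element and recomputes the maximum inside the comprehension, which gets slow for long lists. Here's a sketch of the same idea using `collections.Counter` from the standard library, which tallies everything in one pass. The name `mode_counter` is my own, and like the original it returns a list so that ties are all reported.

```python
from collections import Counter


def mode_counter(lst):
    """Return every value tied for the highest count, in first-seen order."""
    if not lst:
        return []  # match the original: an empty list has no mode
    counts = Counter(lst)
    top = max(counts.values())  # computed once, not once per element
    return [num for num, count in counts.items() if count == top]


print(mode_counter([1, 2, 2, 3, 3]))  # [2, 3]
print(mode_counter([4, 4, 1]))        # [4]
```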