# Data Manipulation with Python

## Learning Goals

- Construct list and dictionary comprehensions
- Extract data from nested data structures
- Write functions to transform data

## Lists

### List Methods

Make sure you're comfortable with the following list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element if no index is given)
- `.extend()`: adds the elements in the input iterable to the end of a list
- `.index()`: returns the index of the first occurrence of the argument in a list (raises `ValueError` if it is absent)
- `.remove()`: removes the first element equal to the input value
- `.count()`: returns the number of occurrences of the input element in a list
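
Here is a quick sketch of these methods in action. The list `letters` is made up for illustration and is not part of the lesson's data.

```python
# Hypothetical example list, made up for illustration
letters = ['a', 'b', 'c']

letters.append('d')           # ['a', 'b', 'c', 'd']
letters.extend(['b', 'e'])    # ['a', 'b', 'c', 'd', 'b', 'e']

first_b = letters.index('b')  # 1: position of the first 'b'
b_count = letters.count('b')  # 2

popped = letters.pop(0)       # removes and returns 'a'
last = letters.pop()          # no index given: removes and returns the last element, 'e'
letters.remove('b')           # removes only the first 'b'

print(letters)                # ['c', 'd', 'b']
```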

Question: What's the difference between `.remove()` and `del`?

<details>
    <summary>
        Answer here
    </summary>
    <code>.remove()</code> removes an element by value;<br/>
    <code>del</code> removes an element by position
    </details>

### List Comprehension

List comprehension is a handy way of generating a new list from existing iterables.

Suppose I start with a simple list.

In [1]:
primes = [2, 3, 5, 7, 11, 13, 17, 19]

What I want to do now is build a new list comprising the doubles of the primes. I can do this with list comprehension!

The syntax is: `[f(x) for x in <iterable> if <condition>]`, where the `if <condition>` part is optional.

In [2]:
prime_doubles = [x*2 for x in primes]
prime_triples = [x*3 for x in primes]

In [3]:
prime_doubles

[4, 6, 10, 14, 22, 26, 34, 38]
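
The optional `if` clause filters the iterable before the expression is applied. A small sketch, where the threshold of 10 is an arbitrary choice for illustration:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]

# Keep only the primes greater than 10, then double them
big_prime_doubles = [x * 2 for x in primes if x > 10]
print(big_prime_doubles)  # [22, 26, 34, 38]
```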

#### Aside: List Comprehensions Vs. `for`-Loops

Yes, I could do the same work with `for`-loops:

In [4]:
prime_doubles2 = []
for prime in primes:
    prime_doubles2.append(prime*2)
prime_doubles2

[4, 6, 10, 14, 22, 26, 34, 38]

But list comprehensions are more concise, and they are usually somewhat faster than the equivalent `for`-loop with `.append()`. Also, you'll see them in other people's code, so you'll have to know how to work with them!
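
One way to check the speed difference yourself is the standard library's `timeit` module. This is only a rough sketch: the helper names `with_loop` and `with_comprehension` are made up here, and the timings will vary by machine and Python version.

```python
import timeit

primes = [2, 3, 5, 7, 11, 13, 17, 19]

def with_loop():
    result = []
    for prime in primes:
        result.append(prime * 2)
    return result

def with_comprehension():
    return [prime * 2 for prime in primes]

# Both approaches build the same list
assert with_loop() == with_comprehension()

# Time 100,000 calls of each function
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)
print(f'for-loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s')
```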

#### End of Aside

I can use list comprehension to build a list from objects other than lists:

In [6]:
my_dict = dict(zip(range(5), 'aeiou'))
my_dict
[v for k, v in my_dict.items() if k % 4 == 0]

['a', 'u']

In [7]:
names = ['Alan Turing', 'Charles Babbage', 'Ada Lovelace',
        'Anita Borg', 'Steve Wozniak', 'Andrew Ng']

splits = [name.split() for name in names]

[name1[0]+'. '+name2[0]+'.' for (name1, name2) in splits]

['A. T.', 'C. B.', 'A. L.', 'A. B.', 'S. W.', 'A. N.']

### Exercises

1. Use a list comprehension to extract the odd numbers from this set:

In [12]:
nums = set(range(1000))
odds = [x for x in nums if x % 2 == 1]
len(odds)

500

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2. Use a list comprehension to take the first character of each string from the following list of words.

In [15]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']
initials = [word[0].upper() for word in words]
initials

['C', 'O', 'M', 'P', 'R', 'E', 'H', 'E', 'N', 'S', 'I', 'O', 'N']

<details>
    <summary>Answer
    </summary>
    <code>[word[0] for word in words]</code>
    </details>

3. Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [18]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']
r_names = [name + "?" for name in names if name[0] == "R"]
r_names

['Randy?', 'Robert?', 'Ranjit?', 'Richard?', 'Ravdeep?']

<details>
<summary>Answer
    </summary>
    <code>[name+'?' for name in names if name[0] == 'R']</code>
    </details>

## Dictionaries

### Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view object of the dictionary's keys
- `.values()`: returns a view object of the dictionary's values
- `.items()`: returns a view object of key-value tuples
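
In Python 3 these methods return view objects rather than lists. The sketch below, using a made-up `stock` dictionary, shows that a view tracks later changes to the dictionary and needs converting to a list before it can be indexed.

```python
# Hypothetical dictionary, made up for illustration
stock = {'apples': 3, 'pears': 5}

keys = stock.keys()
print(keys)                # dict_keys(['apples', 'pears'])

# A view reflects later changes to the dictionary
stock['plums'] = 7
print(list(keys))          # ['apples', 'pears', 'plums']

# Views can't be indexed directly; convert to a list first
first_value = list(stock.values())[0]   # 3
pairs = list(stock.items())             # [('apples', 3), ('pears', 5), ('plums', 7)]
```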

### Dictionary Comprehension

Much like list comprehension, I can use dictionary comprehension to build dictionaries from existing iterables.

In [24]:
my_dict = {'who': 'flatiron school', 'what': 'data science',
           'when': 'now', 'where': 'here', 'why': '$',
           'how': 'python'}

Remember that the `.items()` method will return a collection of key-value pairs (2-tuples):

In [25]:
my_dict.items()

dict_items([('who', 'flatiron school'), ('what', 'data science'), ('when', 'now'), ('where', 'here'), ('why', '$'), ('how', 'python')])

So I can use a pair of variables to range over it:

In [26]:
{k: v + '!' for k, v in my_dict.items() if k.startswith('w')}

{'who': 'flatiron school!',
 'what': 'data science!',
 'when': 'now!',
 'where': 'here!',
 'why': '$!'}

The same thing works for any collection of pairs:

In [27]:
{k**2: v**2 for k, v in [(0, 1), (2, 3), (4, 5)]}

{0: 1, 4: 9, 16: 25}

#### `zip`
Remember that `zip` is a handy way of pairing up two or more iterables:

In [28]:
dict(zip(range(5), ['apple', 'orange', 'banana', 'lime', 'blueberry']))

{0: 'apple', 1: 'orange', 2: 'banana', 3: 'lime', 4: 'blueberry'}

In [29]:
tuple(zip(range(1, 5), 'a'*4, 'b'*4, 'c'*4, 'd'*4, 'e'*4))

((1, 'a', 'b', 'c', 'd', 'e'),
 (2, 'a', 'b', 'c', 'd', 'e'),
 (3, 'a', 'b', 'c', 'd', 'e'),
 (4, 'a', 'b', 'c', 'd', 'e'))
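
One thing to watch for: when the iterables have different lengths, `zip` stops at the shortest one and silently drops the rest. A short sketch with made-up inputs, plus `itertools.zip_longest` as one way to pad instead:

```python
from itertools import zip_longest

numbers = [1, 2, 3, 4]
letters = 'ab'

# zip stops as soon as the shortest iterable runs out
pairs = list(zip(numbers, letters))
print(pairs)   # [(1, 'a'), (2, 'b')]

# zip_longest pads the shorter iterable with a fill value
padded = list(zip_longest(numbers, letters, fillvalue='?'))
print(padded)  # [(1, 'a'), (2, 'b'), (3, '?'), (4, '?')]
```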

#### Dictionary Comprehension Using `zip`

In [30]:
{k: v for k, v in zip(range(5), range(0, 10, 2))}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}

In [31]:
scores = [.858, .873, .868]
{'model' + str(j+1): scores[j] for j in range(3)}

{'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
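
The last example indexes into `scores` with `range(3)`. An alternative sketch uses `enumerate`, which avoids hard-coding the length of the list:

```python
scores = [.858, .873, .868]

# enumerate yields (index, value) pairs; start=1 makes the labels begin at model1
model_scores = {f'model{j}': score for j, score in enumerate(scores, start=1)}
print(model_scores)  # {'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
```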

### Exercises

1. Use a dictionary comprehension to pair up the countries in the first list with their corresponding capitals in the second list:

In [20]:
list1 = ['USA', 'France', 'Canada', 'Thailand']
list2 = ['Washington', 'Paris', 'Ottawa', 'Bangkok']

capitals = {k: v for k, v in zip(list1, list2)}
capitals

{'USA': 'Washington',
 'France': 'Paris',
 'Canada': 'Ottawa',
 'Thailand': 'Bangkok'}

<details>
<summary>Answer
    </summary>
    <code>{country: capital for (country, capital) in zip(list1, list2)}</code>
    </details>

2. Use a dictionary comprehension to make each of the characters in the following list a key with the value 'fictional character'.

In [37]:
chars = ['Pinocchio', 'Gilgamesh', 'Kumar Patel', 'Toby Flenderson']
characters = {name: 'fictional character' for name in chars}
characters

{'Pinocchio': 'fictional character',
 'Gilgamesh': 'fictional character',
 'Kumar Patel': 'fictional character',
 'Toby Flenderson': 'fictional character'}

<details>
    <summary>Answer</summary>
    <code>{char: 'fictional character' for char in chars}</code>
    </details>

## Nesting

Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions.

In [39]:
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

In [40]:
[[item[0] for item in small_list] for small_list in lists]

[['m', 'a', 'n'], ['r', 'c', 's']]
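
To see what the nested comprehension is doing, it may help to write out the equivalent nested `for`-loops. The sketch also shows a related form, two `for` clauses in a single comprehension, which flattens the result instead of preserving the nesting.

```python
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

# Equivalent nested for-loops
first_letters = []
for small_list in lists:
    inner = []
    for item in small_list:
        inner.append(item[0])
    first_letters.append(inner)
print(first_letters)  # [['m', 'a', 'n'], ['r', 'c', 's']]

# Two for-clauses in ONE comprehension produce a flat list
flat = [item[0] for small_list in lists for item in small_list]
print(flat)  # ['m', 'a', 'n', 'r', 'c', 's']
```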

### Nested Structures

It will be well worth your while to practice accessing data in complex structures. Consider the following:

In [41]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

**Q**: How would we access 'I Am a Bunny'?
<br/>
**A**: The outermost "layer" has a name: 'customers', and that object is a dictionary:
<br/>
`customers`
<br/>
The key we are interested in is 'pat', since that's where 'I Am a Bunny' is located:
<br/>
`customers['pat']`
<br/>
The value corresponding to the key 'pat' is also a dictionary, and in this "lower-down" dictionary, the key we are interested in is 'purchases':
<br/>
`customers['pat']['purchases']`
<br/>
The value corresponding to the key 'purchases' is yet another dictionary, and here the key of interest is 'books':
<br/>
`customers['pat']['purchases']['books']`
<br/>
The value corresponding to the key 'books' is a list, and 'I Am a Bunny' is the second element in that list:
<br/>
`customers['pat']['purchases']['books'][1]`

In [42]:
customers['pat']['purchases']['books'][1]

'I Am a Bunny'
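
The same layer-by-layer reasoning can be sketched in code by checking the type at each step. The sketch also shows `.get()`, which returns a default instead of raising `KeyError` when a key is missing; the key 'games' is made up to demonstrate that.

```python
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'], 'books': []}, 'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']}, 'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']}, 'id': 3},
}

# Walk down one layer at a time, printing the type found at each step
layer = customers
for key in ['pat', 'purchases', 'books']:
    layer = layer[key]
    print(key, '->', type(layer).__name__)
# pat -> dict
# purchases -> dict
# books -> list

# 'games' is a hypothetical key that does not exist in the data
games = customers['pat']['purchases'].get('games', [])
print(games)  # []
```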

### Exercises

1. From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [54]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

home_phones = [{item['name'].title():
                str(item['nums']['home'])[:3] + '-' +
                str(item['nums']['home'])[3:]}
               for item in phone_nos]
home_phones

[{'Greg': '123-4567'},
 {'Max': '987-6543'},
 {'Erin': '333-3333'},
 {'Joél': '222-2222'},
 {'Sean': '999-9999'}]

<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

2. From the customers dictionary above, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [61]:
movie_purchases = {customer.title(): customers[customer]['purchases']['movies'] for customer in customers.keys()}
movie_purchases

{'Bill': ['Terminator', 'Elf'], 'Dolph': ['It Happened One Night'], 'Pat': []}

<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code>
    </details>

## Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

### Creating Functions

The first line will read:

'def' + _your function's name_ + '( )' + ':'

Any arguments to the function will go in the parentheses.

Let's try building a function that will automate the task of finding how many times a given number can be evenly divided by 2.

In [87]:
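def divide_by_two_repeatedly(num):
    # Count how many times num can be halved before it becomes odd.
    # Zero is divisible by 2 forever, so return 0 rather than loop endlessly.
    if num == 0:
        return 0
    counter = 0
    while num % 2 == 0:
        num //= 2  # floor division keeps num an integer
        counter += 1
    return counter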

### Calling Functions

To _call_ a function, simply type its name, along with any necessary arguments in parentheses.

In [89]:
divide_by_two_repeatedly(64)

6

### Default Argument Values

Sometimes we'll want the argument(s) of our function to have default values.
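
As a minimal sketch of how defaults behave, the made-up function `greet` below can be called with no arguments, with positional arguments, or with keyword arguments that override only some of the defaults.

```python
# Hypothetical function, made up for illustration
def greet(name='world', punctuation='!'):
    return f'Hello, {name}{punctuation}'

print(greet())                               # Hello, world!
print(greet('Ada'))                          # Hello, Ada!
print(greet(punctuation='?'))                # Hello, world?
print(greet(punctuation='.', name='Grace'))  # Hello, Grace.
```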

In [83]:
def cheers(person='aaron', job='data scientist', age=30):
    return f'Hooray for {person.title()}. You\'re a {job} and you\'re {str(age)}!'

In [84]:
cheers('greg', 'scientist', 130)

"Hooray for Greg. You're a scientist and you're 130!"

In [85]:
cheers('cristian', 'git enthusiast', 93)

"Hooray for Cristian. You're a git enthusiast and you're 93!"

In [86]:
cheers()

"Hooray for Aaron. You're a data scientist and you're 30!"

### Exercises

1. Build a function that will return $2^n$ for an input $n$.

In [94]:
def two_to_the_power_of(power):
    return f'Two to the power of {str(power)} is {str(2 ** power)}!'

two_to_the_power_of(0.5)

'Two to the power of 0.5 is 1.4142135623730951!'

<details>
    <summary>Answer</summary>
    <code>
def expo(n):
    return 2**n</code>
    </details>

2. Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [None]:
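def tidy_phone_nos(list_of_phone_numbers):
    # Strip '(', ')', '-', and spaces from each string, then convert to int
    return [int(num.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))
            for num in list_of_phone_numbers]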

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string):
    return int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))</code>
    </details>
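
An alternative sketch uses the standard `re` module to drop every non-digit character in one step. Note the assumptions: this removes more than the four characters named in the exercise, the function name `tidy_phone_nos_regex` and the sample numbers are made up, and converting to `int` discards any leading zeros.

```python
import re

def tidy_phone_nos_regex(list_of_phone_numbers):
    """Return each phone number string as an integer, keeping digits only."""
    return [int(re.sub(r'\D', '', number)) for number in list_of_phone_numbers]

# Sample numbers are made up for illustration
print(tidy_phone_nos_regex(['(555) 123-4567', '555-987-6543']))
# [5551234567, 5559876543]
```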

3. Build a function that returns the mode of a list of numbers.

<details>
    <summary>Answer</summary>
        <code>
def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    return [num for num in counts.keys() if counts[num] == max(counts.values())]</code>
    </details>

In [None]:
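def get_mode(list_of_numbers):
    # Count how many times each value appears
    frequency_dict = {}
    for item in list_of_numbers:
        if item in frequency_dict:
            frequency_dict[item] += 1
        else:
            frequency_dict[item] = 1

    # An empty input has no mode
    if not frequency_dict:
        return []

    # Collect every value that reaches the highest count (there may be ties)
    highest_count = max(frequency_dict.values())
    mode = []
    for value, count in frequency_dict.items():
        if count == highest_count:
            mode.append(value)

    return mode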
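
The counting step can also be handed to `collections.Counter` from the standard library. This is a sketch of one alternative, not the lesson's intended solution. The name `get_mode_with_counter` is made up, and it returns a list so that ties are all reported.

```python
from collections import Counter

def get_mode_with_counter(list_of_numbers):
    """Return a list of the most frequent value(s) in list_of_numbers."""
    counts = Counter(list_of_numbers)
    if not counts:
        return []  # an empty input has no mode
    highest = max(counts.values())
    return [num for num, count in counts.items() if count == highest]

print(get_mode_with_counter([1, 2, 2, 3, 3]))  # [2, 3]
```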

