![sslogo](https://github.com/stratascratch/stratascratch.github.io/raw/master/assets/sslogo.jpg)

# Functions, Lambda Functions, List Comprehensions, Loops

## Functions

The following cell defines a function called `square`.

All named functions in Python are defined using the `def` keyword (short for define), followed by the function name and the function's parameters in parentheses. The definition line must end with a colon (`:`), otherwise Python raises a `SyntaxError`.

A function can do anything, and it usually produces a value as its output. That value is handed back to the caller of the function (returned) using the `return` keyword followed by the value.

Functions should be named after their purpose so that we know what to expect from them.

In this case the `square` function squares its input number.

The `**` operator is exponentiation in Python (e.g. sqrt(2) = 2 `**` 0.5, 16 = 2 `**` 4).

In [0]:
def square(x): 
    return x**2 

Calling our custom-defined function is the same as calling any other function.

In [5]:
x = 2
y = square(x)
print(y)

4


Calling our function using named arguments

Named arguments mean that when calling the function we do not pass the arguments just as values (e.g. 2) but as name, value pairs (e.g. x = 2).

Another example: assuming we have some function `func(a, b, c)`, we can call it as `func(1, 2, 3)` or as `func(a = 1, b = 2, c = 3)` or even `func(b = 2, c = 3, a = 1)`.

When using this calling convention we can order the arguments however we want.

In [6]:
y = square(x = 2)
print(y)

4
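
The reordering of named arguments described above can be checked with a small sketch. The function `describe` is a hypothetical stand-in for `func(a, b, c)` and is not part of the lesson.

```python
# `describe` is a hypothetical three-parameter function used only for illustration.
def describe(a, b, c):
    return "a={0}, b={1}, c={2}".format(a, b, c)

positional = describe(1, 2, 3)          # arguments matched by position
named = describe(a=1, b=2, c=3)         # arguments matched by name
reordered = describe(b=2, c=3, a=1)     # same names, different order

print(positional)
print(named == positional and reordered == positional)
```

All three calls bind the same values to the same parameters, so they return the same string.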


### Exercises

#### Exercise #1

Define a function called `pythonize_expression` which takes an input string and replaces all `^` with `**`, all `&` with `and` and all `|` with `or`.

Hint: Visit https://docs.python.org/3.7/library/string.html#string.replace

In [10]:
def pythonize_expression (inpstring):

  inpstring=inpstring.replace('^','**')
  inpstring=inpstring.replace('&', 'and')
  inpstring=inpstring.replace('|', 'or')

  #print(inpstring)
  return inpstring

inpstring="""i am learnign ^ data science ^^ &&& ||||"""
print(inpstring)


new_string=pythonize_expression(inpstring)
print(new_string)

i am learnign ^ data science ^^ &&& ||||
i am learnign ** data science **** andandand orororor


#### Exercise #2

Define three functions f, g, h and call them in succession from the starting argument 10.
- f should double the input number
- g should multiply it with 10
- h should cube it

The wanted result is h(g(f(10)))

In [16]:
def f(inpnumber):
  # double the input
  return inpnumber * 2

def g(inpnumber):
  # multiply the input by 10
  return inpnumber * 10

def h(inpnumber):
  # cube the input
  return inpnumber ** 3

mynumber=10
x=f(mynumber)
print(x)
y=g(x)   # feed the result of f into g
print(y)
z=h(y)   # feed the result of g into h, so z is h(g(f(10)))
print(z)

20
200
8000000


#### Exercise #3

Define a function called `animal_speak` which takes two parameters

- First parameter is the name of the animal (one of "horse", "elephant", "duck" or "snake")
- Second parameter is the loudness (one of "loud" or "quiet"). If "loud", print the sound using upper case letters.

Each animal has a characteristic sound:
- horse has "neigh"
- elephant has "trumpet"
- duck has "quack"
- snake has "sss"

In [27]:
def animal_speak(animal_name, loudness):
  voice=''

  if (animal_name=='horse'):
    voice='neigh'
  elif (animal_name=='elephant'):
    voice='trumpet'
  elif (animal_name=='duck'):
    voice='quack'
  elif (animal_name=='snake'):
    voice='sss'
  else:
    print("Wrong animal")
    voice='error'

  if (loudness=='loud'):
    print("For {}, the sound is {}".format(animal_name, voice.upper()))
  else:
    print("For {}, the sound is {}".format(animal_name, voice.lower()))

animal_speak('horse', 'loud')
animal_speak('elephant', 'quiet')

For horse, the sound is NEIGH
For elephant, the sound is trumpet


## Lambda Functions

The function square we defined above has a name and we can call it with that name.

There also exists another variety of functions which are not named; they are called lambdas.

It might seem weird to create a function and not give it a name but consider the case of a function used only once.
This happens much more often than you may think.

A lambda is defined with the `lambda` keyword, followed by a list of parameters separated by commas, followed by a colon (:) and then a **single** Python expression. Python's syntax allows only one expression in a lambda body; if you need more than that, you should probably write a named function.

Examples:
- `lambda x, y, z: (x * y) / z`
- `lambda array: array.shape`
- `lambda string: string.split(' ')[0]`
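
A lambda becomes useful once it is either bound to a name or passed to another function. The following is a minimal sketch; the names `ratio`, `first_word` and `words` are made up for illustration.

```python
# Bind lambdas to names so they can be called later.
ratio = lambda x, y, z: (x * y) / z
first_word = lambda string: string.split(' ')[0]

print(ratio(2, 3, 4))               # 1.5
print(first_word("data science"))   # data

# Pass a lambda directly as the `key` argument of the built-in sorted().
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)                    # ['fig', 'pear', 'banana']
```

The last call is the typical "used only once" case: the lambda exists only for the duration of the `sorted` call.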

### Exercises

#### Exercise #1

Define a lambda function to calculate e^x as $$((1 + \frac{1}{n})^n)^x$$ for n = 1000

In [29]:
n=1000
lambda x:  ((1+(1/n)) ** n) ** x

<function __main__.<lambda>>

#### Exercise #2

Define a lambda function to get first and last elements of a list as a new list

Hint: Last element is at position -1

In [30]:
lambda items : [items[0], items[-1]]

<function __main__.<lambda>>

#### Exercise #3

Define a lambda function which takes 4 arguments, first two are lists (l1 and l2) and second two are positions (i1 and i2) and takes the elements at corresponding positions and multiplies them.

The body of the lambda is: `l1[i1] * l2[i2]` and you need to define the parameters.

In [31]:
lambda l1,l2,i1,i2 : l1[i1]*l2[i2]

<function __main__.<lambda>>

## Function references

When you define functions and their parameters you usually think of parameters either as being some simple type (integer, real number, character string, array, datetime) or some class object.

It is possible to define functions whose parameters are other functions. This might be an abstract concept but it is the basis of many day-to-day tasks in data science so it is super important to understand.

An example is given in the following cell.

The `operate` function takes two parameters:
- `operating_function` which is a function that will do something with our number
- `number` which is what is passed to `operating_function` as input

It prints the result of calling the function **referenced** by the first parameter with the second parameter as input.

Our two functions which we use as operations are:
- `is_powerful` which checks if a number is powerful and says True or False.
- `make_powerful` which makes small numbers powerful by making them 9001 

The crucial line is `operate(is_powerful, 8000)`.
- We pass `is_powerful` **without any brackets** as the reference to the `is_powerful` function
- We also pass 8000 as the number we want to operate

The bold part is very important because if we used brackets we would be calling the `is_powerful` function.
We do not want to call it here, only tell our `operate` function which function it should use as its `operating_function`.

In [32]:
def operate(operating_function, number):
    print(operating_function(number))
    
def is_powerful(number):
    if number > 9000:
        return True
    else:
        return False
    
def make_powerful(number):
    if number <= 9000:
        return 9001
    else:
        return number
    
operate(is_powerful, 8000)

operate(make_powerful, 9000)

False
9001
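
The same idea works with built-in functions, since they can be passed by reference too. The sketch below uses a hypothetical helper `apply_twice` (not part of the lesson) together with the built-ins `str.upper`, `len` and `map`.

```python
def apply_twice(function, value):
    # Call the referenced function on the value, then again on the result.
    return function(function(value))

def add_three(number):
    return number + 3

# Functions are passed without brackets, exactly like is_powerful above.
print(apply_twice(add_three, 10))      # 16
print(apply_twice(str.upper, "abc"))   # ABC

# map() takes a function reference and applies it to every item.
lengths = list(map(len, ["ham", "eggs"]))
print(lengths)                         # [3, 4]
```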


### Exercises

#### Exercise #1

What do you think is the output of the following cell?

In [33]:
operate(is_powerful, make_powerful(8000))

True


#### Exercise #2

Make functions `is_weak` and `make_weak` and use them as references with `operate` function.

In [34]:
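def is_weak(number):
  # the opposite of is_powerful: anything that is not over 9000
  if number <= 9000:
    return True
  else:
    return False

def make_weak(number):
  # mirror of make_powerful: pull powerful numbers down to 9000
  if number > 9000:
    return 9000
  else:
    return number

operate(is_weak, make_weak(8000))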


True


### Practical Examples of Functions

Import the NFL combine dataset

In [35]:
import pandas as pd
import numpy as np
import psycopg2 as ps


In [42]:
host_name = 'db-strata.stratascratch.com'
dbname = 'db_strata'
user_name = 'your_username' #enter username and password from profile tab in Strata Scratch
pwd = 'your_password'
port = '5432'

try:
    conn = ps.connect(host=host_name,database=dbname,user=user_name,password=pwd,port=port)
except ps.OperationalError as e:
    raise e
else:
    print('Connected!')

Connected!


In [43]:
#Make the database call
cur = conn.cursor()
cur.execute(""" 
            SELECT *  FROM datasets.nfl_combine; 
            """)
data = cur.fetchall()
colnames = [desc[0] for desc in cur.description] #grab the column names
conn.commit()

#create the dataframe
data=pd.DataFrame(data)
data.columns = colnames

#close the cursor (the connection itself stays open)
cur.close()


In [38]:
data.head()


#### Example #1

Turn name to upper case letters

Here we see a practical example of function references.

As the reference we use `str.upper` which just makes all characters uppercase.

The `apply` function of a Series applies/calls the function on each item and stores the result in a new Series.

We use the Series which is behind the column called name.

In [0]:
data.name.apply(str.upper).head() 

0    AMEER ABDULLAH
1    NELSON AGHOLOR
2         JAY AJAYI
3    KWON ALEXANDER
4      MARIO ALFORD
Name: name, dtype: object

#### Example #2

Reverse order of first and last name with lambda

Here we use a lambda function.

This time `apply` works over rows (axis = 1) but the logic is the same.

In [0]:
data.apply(lambda row: '{0}, {1}'.format(row['lastname'], row['firstname']), axis=1).head()

0    Abdullah, Ameer
1    Agholor, Nelson
2         Ajayi, Jay
3    Alexander, Kwon
4      Alford, Mario
dtype: object

#### Example #3

Reverse order of first and last name with named functions and references.

In [0]:
def name_reverse(row):
    # last name first, then first name
    return '{0}, {1}'.format(row['lastname'], row['firstname'])

data.apply(name_reverse, axis=1).head()

0    Abdullah, Ameer
1    Agholor, Nelson
2         Ajayi, Jay
3    Alexander, Kwon
4      Alford, Mario
dtype: object

### Exercises

#### Exercise #1

Label colleges as Ivy League or not Ivy League

In [0]:
def label_ivy_league(college_name):
    ivy_colleges = ["Brown", "Columbia", "Cornell", "Dartmouth", "Harvard", "Pennsylvania", "Princeton", "Yale"]
    
    if college_name in ivy_colleges:
        return 'Ivy League'
    else:
        return 'Not Ivy League'
    
data.college.apply(label_ivy_league)

#### Exercise #2

Convert a lambda to a named function and a named function to a lambda

Lambda to convert: `lambda line: len(line.replace('\n', '').split(' '))`

Named function to convert:

```
def rotate2(message):
    a = message.pop()
    b = message.pop()
    
    message = message + [a, b]
    
    return message
```

Hint 1: rotate2(['h', 'e', 'l', 'l', 'o']) is ['h', 'e', 'l', 'o', 'l']

Hint 2: when using + lists are concatenated together (e.g. [1, 2, 3] + [4, 5, 6] = [1, 2, 3, 4, 5, 6])

In [0]:
# Lambda to named function
def count_words(line):
    line = line.replace('\n', '')
    words = line.split(' ')
    count = len(words)
    return count

# named function to lambda
def rotate2(message):
    a = message.pop()
    b = message.pop()
    
    message = message + [a, b]
    
    return message

print(rotate2(['h', 'e', 'l', 'l', 'o']))

lambda message: message + [message.pop(), message.pop()]

#### Exercise #3

Use apply over a row

Make a description of each player using their name, college and position attributes.

Hint: Use a lambda

In [0]:
data.apply(lambda row: '{0}, from {1}, is playing as {2}'.format(row['name'], row.college, row.position), axis=1)

## For loops

For loops are an essential piece of the programming puzzle because they let us run the same code over every item in a collection, however large it is.

In Python, for loops have the syntactic structure:

```
for x in iterable:
    do something with x
```
    
There are many iterables in python but for now we will stick with the two main ones:
- list
- range

In the following example we see a loop over a list where we just print each element.

In [44]:
languages = ["C", "C++", "Perl", "Python"]
for x in languages:
    print(x)

C
C++
Perl
Python


#### More complicated For loop

A `for` loop normally ends in one of two ways:
- When the iterable is finished (for example we reached the end of the list)
- Forcefully using `break`

In this example we would forcefully exit the loop using `break` if we reached spam in our list of edibles.

Notice the added `else:`. This `else` is not linked to the `if` above it but to the `for`. You can see that by tracing the lines of indentation.

So what does that mean? It means that if the loop ended naturally (without `break`) the code in the `else` section will be executed.

In [45]:
edibles = ["ham","eggs","nuts"]
for food in edibles:
    if food == "spam":
        print("No more spam please!")
        break
    print("Great, delicious {0}".format(food))
else:
    print("I am so glad: No spam!")
    
print("Finally, I finished stuffing myself")

Great, delicious ham
Great, delicious eggs
Great, delicious nuts
I am so glad: No spam!
Finally, I finished stuffing myself
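
The list above contains no spam, so `break` is never reached. The sketch below wraps the same loop in a hypothetical helper, `taste`, that collects the messages instead of printing them, so both paths can be compared side by side.

```python
def taste(edibles):
    # Hypothetical helper: same loop as above, but returns the messages.
    messages = []
    for food in edibles:
        if food == "spam":
            messages.append("No more spam please!")
            break
        messages.append("Great, delicious {0}".format(food))
    else:
        # Runs only when the loop was NOT ended by break.
        messages.append("I am so glad: No spam!")
    return messages

with_spam = taste(["ham", "spam", "nuts"])
without_spam = taste(["ham", "eggs"])

print(with_spam)      # the else branch is skipped
print(without_spam)   # the else branch runs
```

With spam in the list the loop stops early, "nuts" is never visited, and the `else` message is missing from the result.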


#### Range iterable

```
for x in range(1, 5):
    print(x)
```

will print 1, 2, 3 and 4 but not 5, because the end value is excluded.

`range(start, end)` is the standard way to iterate over a sequence of integers in vanilla Python (numpy has a similar `arange` function).

There is also a version of range which takes a step parameter: `range(start, end, step)`

In [46]:
for x in range(1, 5):
    print(x)

1
2
3
4
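
The step form of `range` can be sketched as follows. Wrapping the range in `list()` is only done here to make the generated numbers visible.

```python
# Every second number starting at 1: the end value 10 is excluded.
odds = list(range(1, 10, 2))
print(odds)        # [1, 3, 5, 7, 9]

# A negative step counts downwards; the end value 0 is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)   # [5, 4, 3, 2, 1]

# To include 100 the end value has to be one step past it.
tens = list(range(0, 101, 10))
print(tens)
```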


#### Nested for loops

Sometimes we want to iterate in more than a single dimension (e.g. we need to work with matrices).

To do that we have to use nested for loops.

In [47]:
for x in range(0, 5):
    for y in range(0, 3):
        print("ROW {0} COLUMN {1}".format(x, y))

ROW 0 COLUMN 0
ROW 0 COLUMN 1
ROW 0 COLUMN 2
ROW 1 COLUMN 0
ROW 1 COLUMN 1
ROW 1 COLUMN 2
ROW 2 COLUMN 0
ROW 2 COLUMN 1
ROW 2 COLUMN 2
ROW 3 COLUMN 0
ROW 3 COLUMN 1
ROW 3 COLUMN 2
ROW 4 COLUMN 0
ROW 4 COLUMN 1
ROW 4 COLUMN 2
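
For the matrix case mentioned above, a nested loop can index into a list of lists. This is a minimal sketch with a made-up 2 x 3 matrix.

```python
# A made-up matrix with 2 rows and 3 columns, stored as a list of lists.
matrix = [[1, 2, 3],
          [4, 5, 6]]

total = 0
for row_index in range(len(matrix)):                    # outer loop: rows
    for column_index in range(len(matrix[row_index])):  # inner loop: columns
        total += matrix[row_index][column_index]

print(total)   # 21
```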


### Exercises

#### Exercise #1

Iterate over triples of numbers in a list, sum them up and print the result.

[2, 4, 6, 8, 10] => [2 + 4 + 6, 4 + 6 + 8, 6 + 8 + 10]

Hint: use the range function

In [52]:

# 2 4 6
# 4 6 8
# 6 8 10

numbers = [2, 4, 6, 8, 10]

for i in range(0, len(numbers) - 2):
    a = numbers[i]
    b = numbers[i + 1]
    c = numbers[i + 2]
    
    print(a + b + c)

12
18
24


#### Exercise #2

You are given two lists, one containing fruit names and the other containing the sweetness level of each fruit. Your goal is to print out each fruit along with a binary label for its sweetness (SOUR if sweetness is below 0.5, else SWEET).

Hint: Use range and index the two lists using the `i` from range. Use the len function.

In [56]:
fruitlist=['apple','orange','banana','bitterguard']
sweetlist=[0.6,0.5,0.5, 0.1]

ind="SWEET"
for a,b in (zip(fruitlist, sweetlist)):
  if b<0.5:
    ind="SOUR"
  else:
    ind="SWEET"

  print ("{} is {}".format(a, ind))

apple is SWEET
orange is SWEET
banana is SWEET
bitterguard is SOUR


#### Exercise #3

Print out the multiplication of every combination of first 5 odd and first 5 even numbers.

[1, 3, 5] and [2, 4, 6] would give `1 * 2, 1 * 4, 1 * 6, 3 * 2, 3 * 4, 3 * 6` and so on

Hint 1: Use a nested for loop

Hint 2: Use a range with step or a manually created list

In [57]:
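list1=list(range(1, 10, 2))   # first 5 odd numbers: 1, 3, 5, 7, 9
list2=list(range(2, 11, 2))   # first 5 even numbers: 2, 4, 6, 8, 10

result=[]
for elem1 in list1:
  for elem2 in list2:
    result.append(elem1*elem2)

print(result)

[2, 4, 6, 8, 10, 6, 12, 18, 24, 30, 10, 20, 30, 40, 50, 14, 28, 42, 56, 70, 18, 36, 54, 72, 90]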


### List Comprehensions
A list comprehension builds a list in a single expression and does the same work as a longer for loop that appends to a list. Comprehensions are harder to read at first, but they save a lot of repetitive loop code.

#### List Comprehension Example #1
Long format

In [58]:
example_1 = []
for x in [1,2,3,4]:
    calculation = x ** 2
    example_1.append(calculation)
print(example_1)

[1, 4, 9, 16]


List comprehension

In [59]:
[x ** 2 for x in [1,2,3,4]]

[1, 4, 9, 16]

#### List Comprehension Example #2
Long format

In [0]:
example_2 = []
for x in [1,2,3,4]:
    if x%2 == 0:
        example_2.append(x)
print (example_2)

[2, 4]


List comprehension

In [60]:
[x for x in [1,2,3,4] if x%2 == 0]

[2, 4]

#### List Comprehension Example #3
Long format

In [0]:
example_3 = []
for x in [1,2,3,4]:
    if x%2==0:
        example_3.append('even')
    else:
        example_3.append('odd')
print(example_3)

['odd', 'even', 'odd', 'even']


List comprehension

In [61]:
['even' if x%2==0 else 'odd' for x in [1,2,3,4]]

['odd', 'even', 'odd', 'even']

#### Short form if

You can see that we use the short form of if in list comprehensions.

The syntax is: `result_true if condition else result_false`

The following two blocks of code are equivalent.

```
if condition:
    result_true
else:
    result_false
```

```result_true if condition else result_false```
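
One practical difference is that the short form is an expression, so it produces a value that can be returned or stored directly. A small sketch, using two hypothetical functions that should always agree:

```python
def label_long(x):
    # Long form: an if/else statement that assigns the result.
    if x % 2 == 0:
        label = 'even'
    else:
        label = 'odd'
    return label

def label_short(x):
    # Short form: the conditional expression is the value itself.
    return 'even' if x % 2 == 0 else 'odd'

labels_long = [label_long(x) for x in [1, 2, 3, 4]]
labels_short = [label_short(x) for x in [1, 2, 3, 4]]
print(labels_long == labels_short)   # True
```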

## Using functions and loops to clean data

### Really messy dataset

We must load a new dataset where we can showcase how to apply our newfound knowledge to clean data.

And as a bonus I will use a function to do so. You can reuse this function to load your own datasets from StrataScratch.

In [63]:
host_name = 'db-strata.stratascratch.com'
dbname = 'db_strata'
port = '5432'
user_name='your_username' #enter username and password from profile tab in Strata Scratch
pwd='your_password'

try:
    conn = ps.connect(host=host_name,database=dbname,user=user_name,password=pwd,port=port)
except ps.OperationalError as e:
    raise e
else:
    print('Connected!')

def get_dataset(dataset_name):
    #Write SQL below to pull datasets 
    cur = conn.cursor()
    cur.execute(""" 
                SELECT *  FROM datasets.{0}; 
                """.format(dataset_name))
    data = cur.fetchall()
    colnames = [desc[0] for desc in cur.description] 
    conn.commit()

    #create the pandas dataframe
    dataframe = pd.DataFrame(data, columns=colnames)

    #close the cursor (the connection itself stays open)
    cur.close()
    
    return dataframe

innerwear_amazon = get_dataset("innerwear_amazon_com")

innerwear_amazon.head()

Connected!



In [0]:
innerwear_amazon.dtypes

product_name        object
mrp                 object
price               object
pdp_url             object
brand_name          object
product_category    object
retailer            object
description         object
rating              object
review_count         int64
style_attributes    object
total_sizes         object
available_size      object
color               object
dtype: object

**This tutorial is assuming you have some experience with regular expressions and that you have read the guide for cleaning data in pandas.**
- https://colab.research.google.com/drive/1TlPeG9clc2UOE-pXteTRfy8CkfVbQn-p

In [0]:
import re

#### Example 1

Fix prices (price and mrp columns) to become numbers and of type float64.

In [0]:
# get the first element of a list only if it has elements otherwise return NaN
def safe_get(items):
    if len(items) > 0:
        return items[0]
    else:
        return np.nan

# $ is also a special sign and we must escape it in regexp;
# the raw string (r"...") passes the backslashes to the regexp engine unchanged
def convert_prices(p):
    return safe_get(re.findall(r"\$(\d{2}\.\d{2})", p))

for column in ["price", "mrp"]:
    innerwear_amazon[column] = innerwear_amazon[column].apply(convert_prices).astype(np.float64)

innerwear_amazon.dtypes

product_name         object
mrp                 float64
price               float64
pdp_url              object
brand_name           object
product_category     object
retailer             object
description          object
rating               object
review_count          int64
style_attributes     object
total_sizes          object
available_size       object
color                object
dtype: object
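
To see what the price pattern does and does not match, here is a small standalone sketch. The sample strings are made up and are not taken from the dataset, and `first_price` is a hypothetical helper that combines the two functions above without needing numpy.

```python
import math
import re

# Same pattern as in convert_prices: a dollar sign, two digits, a dot, two digits.
PRICE_PATTERN = r"\$(\d{2}\.\d{2})"

def first_price(text):
    # Return the first matched price as a float, or NaN when nothing matches.
    matches = re.findall(PRICE_PATTERN, text)
    return float(matches[0]) if matches else float("nan")

# Made-up sample strings for illustration only.
samples = ["$36.00", "Sale: $19.99 USD", "N/A", "$5.99", "$123.45"]
parsed = [first_price(s) for s in samples]
print(parsed)
```

Note that the pattern requires exactly two digits directly after the dollar sign, so in this sketch both "$5.99" and "$123.45" come back as NaN. Whether that matters depends on the price range in the data.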

#### Example 2

The `available_size` attribute is a comma-separated string of sizes that comes in two variations:
- Small, Medium, Large for Panties
- 30B, 30C, 30DD etc for Bras

For further analysis (let's say we want to count the total number of available Medium Panties or 30D Bras) we want to create one new column per size (Small, Medium, etc. and 32DD, 34C, etc.) that holds 1 if that size is present in the row's available sizes and 0 otherwise.

This will be a bigger example and the explanations are in code comments so make sure to take it slowly and go step by step.

In [0]:
all_sizes = innerwear_amazon.available_size

# we use this set to store a list of all possible sizes
possibles_sizes_set = set()

# we can iterate over Series as they are iterables too
for sizes_str in all_sizes:
    # sizes_str is a string and we make it into a list using split by comma
    sizes = sizes_str.split(",")
    # we now have a list of sizes but they have extra whitespace which we strip away
    for size in sizes:
        size = size.strip(" ")
        
        # we add this size to our set using union and making a one-element set out of size
        possibles_sizes_set = possibles_sizes_set.union({ size })
    
# Now we have a set of all possible sizes which will be the names of our new columns.

print("The possible sizes are: ")
print(possibles_sizes_set)

The possible sizes are: 
{'One Size', '36I', '32D', '2X Apparel', '44DD', '38DD', '30DDD', '42G', '36DD', '32DDD', '44H', '2X Plus', '36DDD', 'XL', '34H', '24-26 Plus', '40DDD', '44G', 'Medium', '34A', '42D', '1X Apparel', '38B', 'Small', '40B', '40G', '32H', '34D', '36G', '34DD', '46DD', '44DDD', '36H', '42DD', '46H', '30C', '32', '40I', '40C', '40H', 'S', 'XS', '36C', '36A', '42C', '32DD', '30G', '2X (20W-22W)', '36B', '48H', '40D', '3X Apparel', '32A', '30B', '34', 'X-Large', '32B', 'X-Small', '44D', '38DDD', '34C', '38H', '36', '1X Plus', '38A', 'L', 'Null', '30DD', '38D', '38I', '2X', '44C', 'M', '34I', '36D', '38G', '38C', '46DDD', '34DDD', '40DD', '32C', '1X', 'One Size (fits 0-12)', '42DDD', '38', '46D', 'Large', '30D', '3X', '46G', '32G', '34G', '34B', '42H'}


In [0]:
# We now build our new dataframe using the data we just got
rows = []

for sizes_str in all_sizes:
    # sizes_str is a string and we make it into a list using split by comma
    sizes = sizes_str.split(",")
    
    # now we can use a list comprehension to remove whitespace
    sizes = [sz.strip(" ") for sz in sizes]
    
    # This is magic for now, just take it that it builds a row whose columns are our sizes
    # and whose cells are all 0 and that row is in form of a dictionary.
    one_row = dict(zip(possibles_sizes_set, [0] * len(possibles_sizes_set)))
    
    for size in sizes:
        one_row[size] = 1
                    
    # We store our new row in a list called rows.
    rows.append(one_row)
    
print("Example row after sizes are marked")
print(rows[0])

Example row after sizes are marked
{'One Size': 0, '36I': 0, '32D': 1, '2X Apparel': 0, '44DD': 0, '38DD': 0, '30DDD': 0, '42G': 0, '36DD': 0, '32DDD': 0, '44H': 0, '2X Plus': 0, '36DDD': 0, 'XL': 0, '34H': 0, '24-26 Plus': 0, '40DDD': 0, '44G': 0, 'Medium': 0, '34A': 1, '42D': 0, '1X Apparel': 0, '38B': 1, 'Small': 0, '40B': 0, '40G': 0, '32H': 0, '34D': 1, '36G': 0, '34DD': 1, '46DD': 0, '44DDD': 0, '36H': 0, '42DD': 0, '46H': 0, '30C': 1, '32': 0, '40I': 0, '40C': 0, '40H': 0, 'S': 0, 'XS': 0, '36C': 1, '36A': 1, '42C': 0, '32DD': 1, '30G': 0, '2X (20W-22W)': 0, '36B': 1, '48H': 0, '40D': 0, '3X Apparel': 0, '32A': 0, '30B': 1, '34': 0, 'X-Large': 0, '32B': 1, 'X-Small': 0, '44D': 0, '38DDD': 0, '34C': 1, '38H': 0, '36': 0, '1X Plus': 0, '38A': 0, 'L': 0, 'Null': 0, '30DD': 1, '38D': 0, '38I': 0, '2X': 0, '44C': 0, 'M': 0, '34I': 0, '36D': 1, '38G': 0, '38C': 1, '46DDD': 0, '34DDD': 0, '40DD': 0, '32C': 1, '1X': 0, 'One Size (fits 0-12)': 0, '42DDD': 0, '38': 0, '46D': 0, 'Large': 0
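
The "magic" line that builds `one_row` can be unpacked step by step. The sketch below uses a short made-up list of sizes instead of the full set.

```python
# Made-up list of sizes standing in for possibles_sizes_set.
sizes = ["Small", "Medium", "Large"]

# [0] * n repeats the single-element list n times.
zeros = [0] * len(sizes)
print(zeros)      # [0, 0, 0]

# zip pairs the i-th size with the i-th zero.
pairs = list(zip(sizes, zeros))
print(pairs)      # [('Small', 0), ('Medium', 0), ('Large', 0)]

# dict turns the (key, value) pairs into a dictionary: one row of all zeros.
one_row = dict(pairs)
one_row["Medium"] = 1   # mark a size as present
print(one_row)

# dict.fromkeys is a built-in shortcut for the same all-zero row.
same_row = dict.fromkeys(sizes, 0)
```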

In [0]:
# We have a list of dictionaries and we must now build a dictionary of lists so we can build a data frame.
n_rows = len(rows)

# there exist dictionary comprehensions too
final_dict = {  size : np.zeros(shape=n_rows, dtype=bool)
                for size in possibles_sizes_set }

# We go row by row to build our final dictionary
for i in range(0, n_rows):
    # this is how we iterate over dictionaries
    for size, present in rows[i].items():
        # think of this like a matrix, size is columns, i is rows
        final_dict[size][i] = present

final_dict

{'1X': array([False, False, False, ..., False, False, False]),
 '1X Apparel': array([False, False, False, ..., False, False, False]),
 '1X Plus': array([False, False, False, ..., False, False, False]),
 '24-26 Plus': array([False, False, False, ..., False, False, False]),
 '2X': array([False, False, False, ..., False, False, False]),
 '2X (20W-22W)': array([False, False, False, ..., False, False, False]),
 '2X Apparel': array([False, False, False, ..., False, False, False]),
 '2X Plus': array([False, False, False, ..., False, False, False]),
 '30B': array([ True, False, False, ..., False, False, False]),
 '30C': array([ True, False, False, ..., False, False, False]),
 '30D': array([ True, False, False, ..., False, False, False]),
 '30DD': array([ True, False, False, ..., False, False, False]),
 '30DDD': array([False, False, False, ..., False, False, False]),
 '30G': array([False, False, False, ..., False, False, False]),
 '32': array([False, False, False, ..., False, False, False]),
 '

In [0]:
# Turn our final dictionary to a dataframe
sizes_dataframe = pd.DataFrame(final_dict)

# Whew no wonder data scientists get over $100k
sizes_dataframe.head()

Unnamed: 0,One Size,36I,32D,2X Apparel,44DD,38DD,30DDD,42G,36DD,32DDD,...,38,46D,Large,30D,3X,46G,32G,34G,34B,42H
0,False,False,True,False,False,False,False,False,False,False,...,False,False,False,True,False,False,False,False,True,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,True,False,False,True,True,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,True,False,False,True,True,False,True,True,...,False,False,False,True,False,False,True,True,False,False


In [0]:
# Find how many bras are of size 30G
print(sizes_dataframe['30G'].sum())
# Find out how many panties are Medium
print(sizes_dataframe['Medium'].sum())

1434
3625


### Exercises

#### Exercise #1

Create a list comprehension that converts temperatures from Celsius to Fahrenheit for 0 to 100 in increments of 10. (hint: use the range with step function)

In [66]:
celsius_list=np.arange(0,101, 10)
print(celsius_list)

fahrenheit_list=[(x*(9/5)+32) for x in celsius_list ]
print(fahrenheit_list)

[  0  10  20  30  40  50  60  70  80  90 100]
[32.0, 50.0, 68.0, 86.0, 104.0, 122.0, 140.0, 158.0, 176.0, 194.0, 212.0]


Update your list comprehension to check the data type and return an error message if the data type isn't numeric. (hint: range returns integer values)

In [69]:
for elem in celsius_list:
  print(type(elem))

<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>


In [71]:
# np.number covers every NumPy numeric type; int and float cover the built-in ones
fahrenheit_list=[(x*(9/5)+32) if isinstance(x, (int, float, np.number)) else "error" for x in celsius_list ]
print(fahrenheit_list)

[32.0, 50.0, 68.0, 86.0, 104.0, 122.0, 140.0, 158.0, 176.0, 194.0, 212.0]


#### Exercise #2

Check if an input list is a palindrome with a list comprehension 

A palindrome is a list whose elements satisfy l[0] = l[-1], l[1] = l[-2], l[2] = l[-3] and so on.

For example: [1, 2, 2, 1] is a palindrome with result [True, True] but [1, 2, 3, 1] is not with result [True, False]

The result should be a list of boolean values, one per compared pair, where each value is True if the two elements of that pair are equal. The input list is a palindrome only if every value is True.

Hint: Make the list comprehension over the results of range function

In [0]:
input_list = [1, 2, 2, 1]
length=len(input_list)

In [0]:
result=[input_list[i]==input_list[length-i-1] for i in range(0, length//2) ]

In [75]:
result

[True, True]

#### Exercise #3

Take the results from exercise 2 and use a for loop to get a single value

Hint: Use the `and` operator

In [76]:
result = [True, True]

output = True

for b in result:
    output = output and b

print(output)

True
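
As an alternative to the loop with `and`, the built-in `all()` reduces a list of booleans to a single value. The sketch below combines it with the comprehension from exercise 2 in a hypothetical function `is_palindrome`.

```python
def is_palindrome(values):
    # Compare each element in the first half with its mirror in the second half.
    half = len(values) // 2
    checks = [values[i] == values[-i - 1] for i in range(half)]
    # all() is True only if every check is True (and True for an empty list).
    return all(checks)

print(is_palindrome([1, 2, 2, 1]))   # True
print(is_palindrome([1, 2, 3, 1]))   # False
print(is_palindrome([1, 2, 1]))      # True, the middle element needs no partner
```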


#### Exercise for whole lesson

We want to know what physical and athletic characteristics lead to an athlete being drafted in the 1st vs 2nd vs 3rd, etc. rounds for a specific position. Using the NFL combine dataset, pick one position to evaluate, and analyze the differences between athletes drafted in the different rounds.

Hints: There's no right or wrong answer but here are a few things to think about

- Based on the athlete's physical and athletic characteristics, is there an aggregate metric you can create (similar to column nflgrade)?
- How accurate is that metric in determining whether or not an athlete is drafted in the 1st round vs 2nd round, etc? You might need to create an if/else statement inside a for loop to evaluate the accuracy of the metric for each athlete before evaluating the performance of the metric.
- Another method is to take a look at the averages of each attribute to evaluate whether or not the athlete is above or below the average for each round.