<a href="https://colab.research.google.com/github/vanderbilt-data-science/p4ai-essentials/blob/main/3_iteration_and_tidbits_solns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Conditional Execution and Functions
> Performing operations based on conditions and reusable code

In today's lesson, we'll cover conditional execution. We'll continue learning the syntax and grammar of the Python language to effectively communicate our goals to Python.

In this lesson, you'll learn:
* Communicating conditional execution syntax to Python

Let's get started!


In [2]:
#@title Quick Review on Data Structures
#@markdown We'll start off with a quick review of creating data structures. Let's start by creating lists,
#@markdown and then we'll use them to create a dictionary.
#@markdown
#@markdown Don't forget that you can always use the `Show Code` button for help!
#@markdown
#@markdown 1. Create a list called `weight_kgs`. It should have the values 25.0, 20.22, 17.83, 10.22, and 8.05
#@markdown 2. Create a list called `height_cm`. It should have the values 68.0, 57.99, 45.21, 36.2, and 10.22
#@markdown 3. Create a list called `neck_circ_cm`. It should have the values 45.2, 50.35, 55.2, 40.88 and 5.06
#@markdown 4. Create a list called `back_length_cm`. It should have the values 63.2, 50.25, 43.8, 50.1, and 12.5
#@markdown 5. Create a list called `chest_circ_cm`. It should have the values 78.2, 86.92, 53.9, 71.2, and 25.5
#@markdown 6. Create a list called `breed`. It should contain the string values of Afghan Hound, Airedale Terrier, Staffordshire Terrier,
#@markdown Australian Shepherd, and Toy Poodle.
#@markdown 7. Create a dictionary called `dog_data` of 1-6, where the keys are the names given for the list (i.e., weight_kgs)
#@markdown and the values are the lists themselves.

#1.
weight_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]

#2
height_cm = [68.0, 57.99, 45.21, 36.2, 10.22]

#3
neck_circ_cm = [45.2, 50.35, 55.2, 40.88, 5.06]

#4
back_length_cm = [63.2, 50.25, 43.8, 50.1, 12.5]

#5
chest_circ_cm = [78.2, 86.92, 53.9, 71.2, 25.5]

#6
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier', 'Australian Shepherd', 'Toy Poodle']

#7
dog_data = {'weight_kgs': weight_kgs,
            'height_cm': height_cm,
            'neck_circ_cm': neck_circ_cm,
            'back_length_cm': back_length_cm,
            'chest_circ_cm' : chest_circ_cm,
            'breed': breed}

In [3]:
#Solution to 1-6 here
#1.
weight_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]

#2
height_cm = [68.0, 57.99, 45.21, 36.2, 10.22]

#3
neck_circ_cm = [45.2, 50.35, 55.2, 40.88, 5.06]

#4
back_length_cm = [63.2, 50.25, 43.8, 50.1, 12.5]

#5
chest_circ_cm = [78.2, 86.92, 53.9, 71.2, 25.5]

#6
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier', 'Australian Shepherd', 'Toy Poodle']

In [None]:
#Solution to 7 here. Make sure to display it so that you can see that it is correct.
dog_data = {'weight_kgs': weight_kgs,
            'height_cm': height_cm,
            'neck_circ_cm': neck_circ_cm,
            'back_length_cm': back_length_cm,
            'chest_circ_cm' : chest_circ_cm,
            'breed': breed}

#display the dictionary to check it
dog_data

What have we done? We've created a small dog dataset. With what we already know, we can perform a substantial number of data analytics operations. Consider filtering: if only we knew how to conditionally select data based on some quality of the data, we could filter out "rows" of the data. Let's explore conditional execution.

## Writing conditional execution code
Conditional execution takes the form of one or more `if`, `if-else`, or `if-elif-else` statements. The syntax for each looks like so:

### `if` statements
```
if condition:
  #code block for if condition true
```

### `if-else` statements
`if-else` statements allow binary, mutually exclusive decisions:

```
if condition:
  #code block for if condition true
else:
  #code block for if condition false
```

### `if-elif-else` statements

`if-elif-else` statements allow multiple, mutually exclusive decisions:
```
if condition A:
  #code block for if condition A true
elif condition B:
  #code block for if condition B true
elif condition C:
  #code block for if condition C true
else:
  #code block for none of the above conditions are true
```
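
As a concrete illustration of the three forms, here is a minimal sketch. The variable `weight_kg`, the thresholds, and the labels are hypothetical and chosen only for illustration; they are not rules from the lesson's dataset.

```python
# Hypothetical example: label a dog by weight (thresholds are illustrative assumptions)
weight_kg = 17.83

# if: the block runs only when the condition is True
if weight_kg > 20:
    print('heavy dog')

# if-else: exactly one of the two blocks runs
if weight_kg > 20:
    size_label = 'large'
else:
    size_label = 'not large'

# if-elif-else: conditions are checked top to bottom,
# and only the first branch whose condition is True runs
if weight_kg > 20:
    size_class = 'large'
elif weight_kg > 10:
    size_class = 'medium'
else:
    size_class = 'small'

print(size_label, size_class)
```

Note that Python uses the indentation under each `if`, `elif`, and `else` line to decide which statements belong to that branch.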

Let's see what this looks like for our code.

## Guided Learning

### Guided Learning 1 - `if`
What if we wanted to know only the types of terriers we had in the dataset? Let's start with a basic exploration of known fields, where we can use the breed list that we created.

In [4]:
#show breeds
breed

['Afghan Hound',
 'Airedale Terrier',
 'Staffordshire Terrier',
 'Australian Shepherd',
 'Toy Poodle']

In [9]:
# syntax to try if
test_index = 3
if 'Terrier' in breed[test_index]:
    print(breed[test_index])

In [10]:
# try this using the for syntax we've seen
for dog_breed in breed:
    if 'Terrier' in dog_breed:
        print(dog_breed)

Airedale Terrier
Staffordshire Terrier
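
Printing is handy for inspection, but filtering usually means keeping the matches for later use. A minimal sketch, assuming we want the matching breeds in a new list (the name `terriers` is introduced here for illustration):

```python
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier',
         'Australian Shepherd', 'Toy Poodle']

# Collect, rather than print, every breed whose name contains 'Terrier'
terriers = []
for dog_breed in breed:
    if 'Terrier' in dog_breed:
        terriers.append(dog_breed)

print(terriers)
```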


### Guided Learning 2 - `if-else`
Let's say we want to process the `Toy` dog breeds differently from the others. If the dog breed is a `Toy` dog breed, we'll do one operation. Otherwise, we'll do another. How can we do this?

In [11]:
#differential processing for Toy breeds vs other breeds
if 'Toy' in breed[0]:
    print('Operation on Toy breed for', breed[0])
else:
    print('Different operation on non-toy breed for', breed[0])

Different operation on non-toy breed for Afghan Hound


In [12]:
# try this using the for syntax we've seen
for dog_breed in breed:
    if 'Toy' in dog_breed:
        print('Operation on Toy breed for', dog_breed)
    else:
        print('Different operation on non-toy breed for', dog_breed) 

Different operation on non-toy breed for Afghan Hound
Different operation on non-toy breed for Airedale Terrier
Different operation on non-toy breed for Staffordshire Terrier
Different operation on non-toy breed for Australian Shepherd
Operation on Toy breed for Toy Poodle


### Guided Learning 3 - `if-elif-else`
Suppose we found an error in our data for all dog breeds that start with the letter A, and we needed to perform an operation on them. Additionally, we need a different operation on Staffordshires, and everything else should be processed the same way. How could we do this?

In [18]:
#differential processing for starts_with A breeds vs Staffordshires vs other breeds
test_index = 4
if breed[test_index].startswith('A'):
    print(breed[test_index], 'with separate processing for A')
elif 'Staffordshire' in breed[test_index]:
    print(breed[test_index], 'with separate processing for Staffordshire')
else:
    print(breed[test_index], 'with standard processing for other dog breeds')

Toy Poodle with standard processing for other dog breeds


### Mutual exclusivity of `if-else` and `if-elif-else` statements
Suppose we found an error in our data for all dog breeds that start with the letter A, and we needed to perform an operation on them. Additionally, we need a different operation on Terriers, and everything else should be processed the same way. How could we do this?

In [20]:
#differential processing for starts_with A breeds vs Terriers vs other breeds
for dog_breed in breed:
    if dog_breed.startswith('A'):
        print(dog_breed, 'with separate processing for A')
    elif 'Terrier' in dog_breed:
        print(dog_breed, 'with separate processing for Terrier')
    else:
        print(dog_breed, 'with standard processing for other dog breeds')

Afghan Hound with separate processing for A
Airedale Terrier with separate processing for A
Staffordshire Terrier with separate processing for Terrier
Australian Shepherd with separate processing for A
Toy Poodle with standard processing for other dog breeds


<p style='color:red; font-weight:bold'> Do you see an error in execution here? How do you think you would fix it? </p>

<h4 style='color:blue; font-weight:bold'> Life Pro Tip </h4>
<p style='color:blue; font-style:italic'> Just because it runs doesn't mean it's right. </p>
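
One possible fix is sketched below. It assumes that Terrier processing should take priority over the starts-with-A processing; the lesson does not say which rule wins, so this ordering is an assumption. Because only the first true branch of an `if-elif-else` chain runs, checking the Terrier condition first routes Airedale Terrier to the Terrier branch. The list `processing` is a hypothetical helper used to record which branch handled each breed.

```python
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier',
         'Australian Shepherd', 'Toy Poodle']

processing = []  # hypothetical: records the branch taken for each breed
for dog_breed in breed:
    # Assumption: the Terrier rule wins when both conditions are true
    if 'Terrier' in dog_breed:
        processing.append('Terrier')
    elif dog_breed.startswith('A'):
        processing.append('A')
    else:
        processing.append('standard')
    print(dog_breed, '->', processing[-1])
```

If both operations should apply to Airedale Terrier, the two checks would need to be separate `if` statements rather than an `if` followed by `elif`.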

## Guided Exercises
Let's see what some real examples look like.

### Mutating a new feature
Let's create a new feature of the dataset, called `weight_capped`. For this exercise, we cap dog weights at 20.0 kg: if a dog weighs more than 20.0 kg, its value in the new list should be set to 20.0.

We want to create this feature in a new dictionary called `dog_weight_cap`. This should be exactly the same as the `dog_data` dictionary, but with the new field added.

Let's see how we can do this.

In [5]:
# Calculate weight_capped list
weight_capped = []
for weight in dog_data['weight_kgs']:
    if weight > 20:
        weight_capped.append(20.0)
    else:
        weight_capped.append(weight)

weight_capped

[20.0, 20.0, 17.83, 10.22, 8.05]
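
The same cap can be written more compactly. Here is a sketch using the built-in `min` inside a list comprehension; the names `cap_kg` and `weight_capped_alt` are introduced here for illustration and are not part of the lesson's code.

```python
weight_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]
cap_kg = 20.0

# min(weight, cap_kg) returns the weight itself unless it exceeds the cap
weight_capped_alt = [min(weight, cap_kg) for weight in weight_kgs]

print(weight_capped_alt)
```

The explicit `if-else` loop is easier to extend when each branch needs more than one statement, while the `min` version suits a simple cap like this one.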

In [6]:
# Create copy of dictionary
dog_weight_cap = dog_data.copy()

# Create this new feature in the dictionary
dog_weight_cap['weight_capped'] = weight_capped

# Show
dog_weight_cap

{'weight_kgs': [25.0, 20.22, 17.83, 10.22, 8.05],
 'height_cm': [68.0, 57.99, 45.21, 36.2, 10.22],
 'neck_circ_cm': [45.2, 50.35, 55.2, 40.88, 5.06],
 'back_length_cm': [63.2, 50.25, 43.8, 50.1, 12.5],
 'chest_circ_cm': [78.2, 86.92, 53.9, 71.2, 25.5],
 'breed': ['Afghan Hound',
  'Airedale Terrier',
  'Staffordshire Terrier',
  'Australian Shepherd',
  'Toy Poodle'],
 'weight_capped': [20.0, 20.0, 17.83, 10.22, 8.05]}
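
A caveat worth knowing: `dict.copy()` makes a shallow copy, so the new dictionary has its own set of keys but shares the same list objects with the original. Adding a key, as done here, leaves `dog_data` untouched, but changing a shared list in place would show up in both dictionaries. A small sketch with a toy dictionary (`original`, `shallow`, and `deep` are illustrative names):

```python
import copy

original = {'weight_kgs': [25.0, 20.22]}

shallow = original.copy()        # new dict, same inner list object
deep = copy.deepcopy(original)   # new dict and a new inner list

shallow['weight_capped'] = [20.0, 20.0]   # adding a key affects only the copy
shallow['weight_kgs'].append(17.83)       # in-place change is visible through both dicts

print('weight_capped' in original)   # False
print(original['weight_kgs'])        # [25.0, 20.22, 17.83]
print(deep['weight_kgs'])            # [25.0, 20.22]
```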

In [4]:
#@title Try it Yourself! - Mutating a class column
#@markdown In this exercise, we will add another column to the dataset, which will be an integer reflecting the class.
#@markdown The classes are given by the following mapping:
#@markdown 2: Toy breeds
#@markdown 1: Dogs named for country of origin
#@markdown 0: Dog breeds not given by {2,1}
#@markdown
#@markdown 1. Create a list called `dog_class`. Populate it based on the breeds in dog_data.
#@markdown 2. Create a copy of the dictionary called `dog_data_cls`.
#@markdown 3. Add the `dog_class` list to the dictionary with key `dog_class`.

#1
dog_class = []
for dog_breed in dog_data['breed']:
    if 'Toy' in dog_breed:
        # class 2: Toy breeds
        dog_class.append(2)
    elif 'American' in dog_breed or 'Australian' in dog_breed or 'Afghan' in dog_breed:
        # class 1: dogs named for country of origin
        dog_class.append(1)
    else:
        # class 0: everything else
        dog_class.append(0)

#2
dog_data_cls = dog_data.copy()

#3
dog_data_cls['dog_class'] = dog_class

dog_data_cls

{'weight_kgs': [25.0, 20.22, 17.83, 10.22, 8.05],
 'height_cm': [68.0, 57.99, 45.21, 36.2, 10.22],
 'neck_circ_cm': [45.2, 50.35, 55.2, 40.88, 5.06],
 'back_length_cm': [63.2, 50.25, 43.8, 50.1, 12.5],
 'chest_circ_cm': [78.2, 86.92, 53.9, 71.2, 25.5],
 'breed': ['Afghan Hound',
  'Airedale Terrier',
  'Staffordshire Terrier',
  'Australian Shepherd',
  'Toy Poodle'],
 'dog_class': [1, 0, 0, 1, 2]}

## An excruciating motivation for "standard techniques"
Now, let's see what an analytics exercise would look like using lists and for loops. What if we wanted to exclude all of the rows of data for terriers?

In [22]:
# view the data once more
dog_data

#let's view this in a more reasonable way (note: you're about to experience the reason why pandas
#is a standard technique for data scientists)
import pandas as pd
display(pd.DataFrame(dog_data))

| | weight_kgs | height_cm | neck_circ_cm | back_length_cm | chest_circ_cm | breed |
|---|---|---|---|---|---|---|
| 0 | 25.0 | 68.0 | 45.2 | 63.2 | 78.2 | Afghan Hound |
| 1 | 20.22 | 57.99 | 50.35 | 50.25 | 86.92 | Airedale Terrier |
| 2 | 17.83 | 45.21 | 55.2 | 43.8 | 53.9 | Staffordshire Terrier |
| 3 | 10.22 | 36.2 | 40.88 | 50.1 | 71.2 | Australian Shepherd |
| 4 | 8.05 | 10.22 | 5.06 | 12.5 | 25.5 | Toy Poodle |


In [38]:
#@markdown Start by saying in words what you need to do.
#@markdown A good way to begin is by writing some comments that you will fill in in the code cell below.
#@markdown You can use the `Show Code` button below for some assistance.

# An option: non-destructive of original dictionary
## Figure out which indices to keep which are non-terrier
# iteration over breed:
    # is breed not 'Terrier'?
        # add index to keep_list

## Create a new dictionary based on newly constructed lists
# for key and list for each key:
    #iterate over all columns in dataset
        #if the index is to be kept, add the value to the column
    #reconstruct the key-value pair for the new dictionary

# An advanced option: use zip and restructure dictionary
# restructure dictionary as row-major
# find index of breed
# iteration over breed:
    # get indices of non-terrier
# return only matching indices

In [39]:
# Add your comments here.

In [49]:
# An option: non-destructive of original dictionary
no_terriers = {}

keep_inds = []

# iteration over breed:
for ind, dog_breed in enumerate(breed):
    # is breed not 'Terrier'?
    if 'Terrier' not in dog_breed:
        # add index to keep_inds
        keep_inds.append(ind)
        
# for each key and its column list:
for key, column_list in dog_data.items():

    new_column_list = []
    #iterate over all values in this column of the old dataset
    for ind, value in enumerate(column_list):
        
        #if the index is to be kept, add the value to the column
        if ind in keep_inds:
            new_column_list.append(value)
    
    #reconstruct the key-value pair for the new dictionary
    no_terriers[key] = new_column_list
        
# print to make sure
no_terriers

{'weight_kgs': [25.0, 10.22, 8.05],
 'height_cm': [68.0, 36.2, 10.22],
 'neck_circ_cm': [45.2, 40.88, 5.06],
 'back_length_cm': [63.2, 50.1, 12.5],
 'chest_circ_cm': [78.2, 71.2, 25.5],
 'breed': ['Afghan Hound', 'Australian Shepherd', 'Toy Poodle']}

In [50]:
#make sure original data isn't destroyed
dog_data

{'weight_kgs': [25.0, 20.22, 17.83, 10.22, 8.05],
 'height_cm': [68.0, 57.99, 45.21, 36.2, 10.22],
 'neck_circ_cm': [45.2, 50.35, 55.2, 40.88, 5.06],
 'back_length_cm': [63.2, 50.25, 43.8, 50.1, 12.5],
 'chest_circ_cm': [78.2, 86.92, 53.9, 71.2, 25.5],
 'breed': ['Afghan Hound',
  'Airedale Terrier',
  'Staffordshire Terrier',
  'Australian Shepherd',
  'Toy Poodle']}
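
The "advanced option" outlined in the planning comments (use `zip` and restructure the dictionary as row-major) can be sketched as follows. This is one possible reading of that outline, not the lesson's official solution. It uses a reduced copy of the dataset to stay short, and the names `keys`, `breed_pos`, `rows`, `kept_rows`, and `no_terriers_zip` are introduced here for illustration.

```python
# Reduced copy of the dataset (three columns only, to keep the sketch short)
dog_data = {
    'weight_kgs': [25.0, 20.22, 17.83, 10.22, 8.05],
    'height_cm': [68.0, 57.99, 45.21, 36.2, 10.22],
    'breed': ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier',
              'Australian Shepherd', 'Toy Poodle'],
}

keys = list(dog_data.keys())
breed_pos = keys.index('breed')   # position of the breed value within each row

# Restructure as row-major: one tuple per dog
rows = list(zip(*dog_data.values()))

# Keep only the rows whose breed does not contain 'Terrier'
kept_rows = [row for row in rows if 'Terrier' not in row[breed_pos]]

# Transpose back to column-major and rebuild a new dictionary
no_terriers_zip = {key: list(column)
                   for key, column in zip(keys, zip(*kept_rows))}

print(no_terriers_zip)
```

Like the loop-based version, this builds new lists and leaves `dog_data` unchanged. One limitation of this sketch: if no rows survive the filter, the rebuilt dictionary comes back empty rather than holding empty columns.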