In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("week-4.ipynb")

# Week 4 Lecture Notebook

## List Comprehension

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

In [None]:
programming_languages = ["python", "R", "julia", 
                         "Java", "C", "C#", "C++", 
                         "java script", "go", "swift"]

# Initialize empty list
my_list = []

for word in programming_languages: # Iterate through each word in the list
    if "a" not in word:            # Check that the word does not contain "a"
        my_list.append(word)       # Add the word to the list if there is no "a"
my_list

We can also keep only the words in the `programming_languages` list that contain no vowels. The check is case sensitive because `vowels` holds only lowercase letters.

In [None]:
# Initialize empty list
my_list = []

# Create a list of vowels
vowels = ["a", "e", "i", "o", "u"]

for word in programming_languages:                   # Iterate through each word in the list
    if not any(letter in vowels for letter in word): # Check each letter in the word to see if it's a vowel
        my_list.append(word)                         # Add the word to the list if it does not contain a vowel

print(my_list)

With list comprehension you can do all that with only one line of code.
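
As a minimal sketch of the pattern, the loop-and-append idiom above collapses into a single expression of the form `[expression for item in iterable if condition]`. The list `fruits` and the other names below are made up for illustration and are not part of the lecture data.

```python
# Hypothetical data, used only to illustrate the syntax
fruits = ["apple", "kiwi", "cherry", "fig", "plum"]

# Loop version: keep the fruits whose name does not contain "e"
no_e_loop = []
for fruit in fruits:
    if "e" not in fruit:
        no_e_loop.append(fruit)

# Equivalent list comprehension: [expression for item in iterable if condition]
no_e = [fruit for fruit in fruits if "e" not in fruit]

# The expression can also transform each item; the condition is optional
lengths = [len(fruit) for fruit in fruits]

print(no_e)      # ['kiwi', 'fig', 'plum']
print(lengths)   # [5, 4, 6, 3, 4]
```

Both versions build the same list; the comprehension just states the filter and the result in one place.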

<!-- BEGIN QUESTION -->

**Example 1.** Let's practice list comprehension.

In [None]:
import numpy as np

# Create a list of the first 10 odd integers
my_odd_integers = ...

# Create a list of the programming languages
my_languages = ...

# Create a list of words without "a" from the programming languages list
no_a_list = ...

# Create a list of words without vowels from the programming languages list
no_vowel_list = ...

print("My odd integers list:", my_odd_integers)
print("My list:", my_languages)
print("My no a list:", no_a_list)
print("My no vowel list:", no_vowel_list)

<!-- END QUESTION -->

## `Pandas`

####  `.loc`

The `.loc` indexer selects a **group** of rows and columns by label(s) or with a boolean array. It is primarily label based.

Allowed inputs are:

* A single label, e.g. `5` (interpreted as a label of the index, and never as an integer position along the index) or `'a'`.

* A list or array of labels, e.g. `['a', 'b', 'c']`.

* A slice object with labels, e.g. `'a':'f'`.

* A boolean array of the same length as the axis being sliced, e.g. `[True, False, True]`.
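
The four kinds of input listed above can be tried on a small table. This is only a sketch: the DataFrame `df` and its labels are hypothetical and have nothing to do with the data loaded later in the notebook.

```python
import pandas as pd

# Hypothetical DataFrame with string labels as the index
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

single = df.loc["a"]                    # single label -> Series for that row
several = df.loc[["a", "c"]]            # list of labels -> DataFrame
sliced = df.loc["a":"b"]                # label slice -> includes BOTH endpoints
masked = df.loc[[True, False, True]]    # boolean list, one entry per row

print(sliced)
```

Unlike ordinary Python slicing, a label slice with `.loc` includes the stop label, so `sliced` contains rows `a` and `b`.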

Run the cell below to load `pandas`.

In [None]:
import pandas as pd

Do you believe in UFOs? How many reported sightings would have to occur before you believed UFOs really exist?

Run the cell below to load the `ufo.csv` file.

In [None]:
ufo = pd.read_csv('data/ufo.csv')
ufo

<!-- BEGIN QUESTION -->

**Example 2.** What are all the locations where a sighting took place?

In [None]:
# Using df.column_name and the unique method
...

In [None]:
# Using bracket notation
...

In [None]:
# Using bracket notation and the .value_counts method
ufo['State'].value_counts()

<!-- END QUESTION -->

**Example 3.** Let's look at the column names.

In [None]:
ufo.columns

**Example 4.** We can use `.loc` to select the first row. A `Series` is returned with the column names as its index.

In [None]:
# Single bracket notation returns a Series
ufo.loc[0]

**Example 5.** What if we want to return a `DataFrame`?

In [None]:
# Double bracket notation returns a DataFrame
ufo.loc[[0]]

**Example 6.** Return the first three rows.

In [None]:
# First three rows (index labels 0, 1, and 2); .loc includes the end label
ufo.loc[0:2]

**Example 7.** We can use a row slice and choose a column by name.

In [None]:
# First three rows (labels 0, 1, and 2) and only the City column. Returns a Series.
ufo.loc[0:2, 'City']

In [None]:
# First three rows (labels 0, 1, and 2) and only the City column. Returns a DataFrame.
ufo.loc[0:2, ['City']]

**Example 8.** We can use a row slice and choose multiple columns by name.

In [None]:
# First three rows (labels 0, 1, and 2) and the City and State columns. Returns a DataFrame.
ufo.loc[0:2, ['City', 'State']]

Using `.loc` with an open-ended slice (`0:`) selects every row for the chosen columns.

In [None]:
ufo.loc[0:, ['City', 'State']]

A simplified version of the previous command.

In [None]:
ufo[['City', 'State']]

Let's set the index to the `State`.

In [None]:
ufo_state = ufo.set_index('State')
ufo_state.head()

Now we can slice the rows by the index.

**Example 9.** Find all the rows where the state is North Carolina.

In [None]:
# Using .loc we can slice by index name and column name
ufo_state.loc['NC', ['City']]
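
To see why this returns several rows, here is a small sketch with a made-up table (the `sightings` DataFrame below is hypothetical, not the contents of `ufo.csv`). After `set_index`, the same label can appear many times in the index, and `.loc` returns every row that carries it.

```python
import pandas as pd

# Hypothetical sightings table, used only for illustration
sightings = pd.DataFrame({
    "City": ["Raleigh", "Austin", "Durham"],
    "State": ["NC", "TX", "NC"],
})

# Move the State column into the index; labels may repeat
by_state = sightings.set_index("State")

# A label that appears more than once selects every matching row
nc_rows = by_state.loc["NC", ["City"]]
print(nc_rows)
```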

## `Pandas`

#### `.iloc`

The `.iloc` indexer retrieves rows and columns by integer position rather than by label. It accepts only position-based inputs (a single integer, a list of integers, or an integer slice) or a boolean array, and positions start at 0.

**Example 10.** Select the first three rows and the first column. Return a dataframe.

In [None]:
# Use .iloc to select the first three rows and the first column. 
# Return a dataframe.
ufo.iloc[0:3, [0]]

In [None]:
# Use .iloc to select the first three rows and the City and State columns. 
# Return a dataframe.
ufo.iloc[0:3, [0, 3]]
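
One difference between the two indexers is easy to miss: a slice in `.iloc` excludes the stop position, while a slice in `.loc` includes the stop label. The sketch below uses a hypothetical DataFrame whose index labels are deliberately different from the row positions.

```python
import pandas as pd

# Hypothetical DataFrame; index labels 10, 20, 30 differ from positions 0, 1, 2
df = pd.DataFrame(
    {"City": ["Ithaca", "Holyoke", "Valley City"], "State": ["NY", "CO", "ND"]},
    index=[10, 20, 30],
)

by_position = df.iloc[0:2]    # positions 0 and 1; the stop position is excluded
by_label = df.loc[10:20]      # labels 10 through 20; the stop label is included

first_city = df.iloc[0, 0]    # row position 0, column position 0

print(by_position.equals(by_label))  # True
```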

## `Pandas`

#### `.groupby`

pandas provides a `.groupby` method that operates on a `DataFrame`. It takes one column name, or a list of column names, to group the data by and returns a `GroupBy` object.

In [None]:
ufo.groupby('State')

We can aggregate using one or more operations over a specified axis.

In [None]:
# This returns the count of all columns with the state as the index
ufo.groupby('State').count()

In [None]:
# This returns the count of all columns with integers as the index
...

In [None]:
# This returns the count of all columns with integers as the index
# then we select the State and City
# then rename the columns
...

In [None]:
weight = pd.read_csv('data/baby.csv')
weight

The aggregation functions can be found [here](https://cmdlinetips.com/2019/10/pandas-groupby-13-functions-to-aggregate/).

In [None]:
weight.groupby('Maternal Smoker', as_index=False).mean()

In [None]:
weight.groupby('Maternal Smoker').agg({'Birth Weight': 'mean', 'Maternal Age':'max'})

In [None]:
weight.groupby('Maternal Smoker').agg({'Birth Weight': ['mean', 'max']})
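
The aggregation patterns above can be checked by hand on a tiny table. This is a sketch on hypothetical data (the `df` below is not `baby.csv`). It also shows named aggregation, an alternative form of `.agg` that was not used above and that lets you choose the output column names.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "weight": [100, 120, 90, 110, 130],
    "age": [25, 31, 22, 40, 28],
})

# One aggregation per column, passed as a dictionary
summary = df.groupby("group").agg({"weight": "mean", "age": "max"})

# Named aggregation: output_name=(input_column, function)
named = df.groupby("group").agg(
    mean_weight=("weight", "mean"),
    oldest=("age", "max"),
)

print(summary)
print(named)
```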

## Dictionary

A dictionary is a mutable Python container that stores mappings of unique keys to values. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Dictionaries are written with curly brackets (`{ }`), including key-value pairs separated by commas (`,`). A colon (`:`) separates each key from its value.

In [None]:
country_capitals = {"USA":"Washington D.C.", "France":"Paris", "India":"New Delhi"}
country_capitals

In [None]:
conferences  = {"ACC": ["Boston College", "Clemson", "Duke", "Georgia Tech", 
                        "Florida State", "NC State", "Syracuse", "Louisville", 
                        "Miami", "UNC", "Notre Dame", "Pittsburgh", "Virginia", 
                        "Virginia Tech", "Wake Forest"],
               "CIAA": ["Bowie State", "Chowan", "Claflin", "Elizabeth City State", "Fayetteville State",
                       "Johnson C Smith", "Lincoln", "Livingstone", "Shaw", "St. Augustine", "Virginia State",
                       "Virginia Union", "Winston Salem State"]
               }
conferences
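
Once a dictionary exists, values are read back by key. The sketch below uses a made-up dictionary (`inventory` is hypothetical) to show lookup, a safe lookup with `.get`, membership testing, overwriting a value, and iteration.

```python
# Hypothetical dictionary, used only for illustration
inventory = {"apples": 3, "pears": 5}

apple_count = inventory["apples"]         # look up a value by its key
plum_count = inventory.get("plums", 0)    # .get returns a default instead of raising KeyError
has_pears = "pears" in inventory          # membership tests check the keys

inventory["pears"] = 6                    # assigning to an existing key overwrites its value

for item, count in inventory.items():     # iterate over key-value pairs
    print(item, count)
```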

**Example 11.** We can add key:value pairs to a dictionary.

In [None]:
country_capitals["UAE"] = "Abu Dhabi"
country_capitals

In [None]:
nfc = {}
nfc['south'] = ['Panthers']
nfc

In [None]:
# Append Falcons to the list
...
nfc

In [None]:
# Extend the list with multiple items
# Add the Buccaneers and Saints to the list
...
nfc

**Example 12.** We can combine two lists into a dictionary using the `dict` and `zip` functions.

The `zip()` function returns a zip object, which is an iterator of [tuples](https://www.geeksforgeeks.org/python-tuples/): the first items of the passed iterables are paired together, then the second items are paired together, and so on.

In [None]:
# A tuple
my_tuple = (1, 3, 5, 7, 9)
my_tuple

In [None]:
keys = ['a', 'b', 'c']
values = [1, 2, 3]

zip(keys, values)

In [None]:
my_dictionary = dict(zip(keys, values))
my_dictionary
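
Because a zip object is an iterator, displaying it directly does not show the pairs. A minimal sketch, using the same kind of made-up lists, that materializes the pairs with `list` and shows what happens when the inputs differ in length:

```python
keys = ["a", "b", "c", "d"]
values = [1, 2, 3]

# Wrap the iterator in list() to see the tuples it produces
pairs = list(zip(keys, values))
print(pairs)      # [('a', 1), ('b', 2), ('c', 3)]

# zip stops at the shortest input, so "d" has no partner and is dropped
mapping = dict(zip(keys, values))
print(mapping)    # {'a': 1, 'b': 2, 'c': 3}
```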

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

When done exporting, download the .zip file by finding it in the file browser on the left side of the screen, then right-click and select **Download**. You'll submit this .zip file for the assignment in Canvas to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)