### Using Jupyter Notebooks and Pandas to Work with Data

## A. Introduction


* **Jupyter Notebooks** allow anyone to run **Python** code in any browser, without the need to use the terminal or command line

* **Jupyter Notebooks** are organized as 'cells', which can be **commentary** (like this one, which is static) or **code** (like those below, which produce dynamic output in the form of charts or tabular data frames).

* To run an individual cell, use the **`arrow/run`** command at the top of the Notebook, or just press **`Shift + Enter`** on your keyboard.


In [None]:
print("This is my First Notebook")

This is my First Notebook


# Introduction
In Python, as in other programming languages, we declare **variables**, then use **functions** to transform them in various ways.  There are functions (and libraries) that help us work with numbers, text, statistics, graphs, etc.

Read more about Python with this easy reference guide:  https://www.w3schools.com/python/python_intro.asp



---




# Data Types
It's important to understand **data types**:

*   **integer**:  any whole number, such as 1, 2, 10

*   **float**:  decimal numbers, such as 2.25, 6.875

*   **string**:   any sequence of characters (letters, digits, spaces, punctuation), surrounded by quotation marks (double or single)
*   **Boolean**:  `True` or `False` (these are reserved words in Python, so they cannot be used as variable names)

---
# Collections
These can also be combined as various kinds of **collections**:

*   **lists**:  an ordered sequence presented in brackets and separated by commas, such as **["Blues", "Gospel", "Country"]** or **[1.0, 4.0, 6.0]**.  Lists can contain duplicates
*   **sets**:  an unordered collection; it cannot contain duplicates
*   **tuples**: a special kind of list that cannot be altered, presented as **("Jazz", "Classical", "Folk")** or **(1.25, 4.76, 8.03)**

*   **dictionaries**:  a series of **key : value** pairs contained within braces, such as **{"artist" : "McCartney, Paul", "title" : "Yesterday"}**

Read more about Python data types here:  https://www.w3schools.com/python/python_datatypes.asp







# Checking Data Types

Check the **data type** of any item:

`type(item)`

In [None]:
item_1 = 'guitar'
item_2 = 1
item_3 = 2.25
item_4 = True
item_5 = ["Blues", "Gospel", "Country"] 
item_6 = ("Jazz", "Classical", "Folk")
item_7 = {"artist_surname" : "McCartney", "artist_given_name" : "Paul", "title" : "Yesterday"}

In [None]:
type(item_2)

int

# Numbers, Strings, Floats, and Booleans

Depending on the data type, it's possible to perform various **operations** on them.

* You **can** add/subtract **integers** and **floats** to each other
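* You **cannot** add an **integer** to a **string** (this raises a `TypeError`).  Note that Python treats **Booleans** as the integers 1 and 0, so `True + 1` returns `2`, although this is rarely what you want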
* Many other kinds of operators can be used to compare items, test for thresholds, etc.  
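
The rules above can be sketched as follows. This is only an illustration; the variable names (`whole`, `decimal`, `label`) are made up for the example and are not used elsewhere in this Notebook.

```python
whole = 1          # an integer
decimal = 2.25     # a float
label = "guitar"   # a string

# Integers and floats can be added; the result is a float
total = whole + decimal
print(total)

# Adding an integer to a string raises a TypeError
try:
    whole + label
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Converting the integer to a string first makes the addition work
combined = str(whole) + " " + label
print(combined)
```

The `try` / `except` pair is used here only so that the cell keeps running after the error; without it, the Notebook would stop and display the `TypeError`.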

**Integers** and **floats** can be used with various mathematical tests:

```
Equals: a == b
Not Equals: a != b
Less than: a < b
Less than or equal to: a <= b
Greater than: a > b
Greater than or equal to: a >= b
```


Read about operators here:  https://www.w3schools.com/python/python_operators.asp
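
As a small sketch of the comparison operators listed above (the values of `a` and `b` are arbitrary, chosen only for illustration), each test returns a Boolean:

```python
a = 10
b = 2.5

# Each comparison returns a Boolean (True or False)
print(a == b)   # equals
print(a != b)   # not equals
print(a < b)    # less than
print(a >= b)   # greater than or equal to

# The result can be stored in a variable like any other value
is_larger = a > b
print(type(is_larger))
```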


**Strings** are text.  There is a wide variety of built-in methods that allow you to work with them to:

* change lower/upper case
* strip out certain characters (like punctuation)
* split texts (at spaces, for example)
* find substrings, or first/last characters
* etc!

See more [here](https://www.w3schools.com/python/python_strings_methods.asp).
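
A brief sketch of the string methods mentioned above, assuming a sample string `title` invented for this example:

```python
title = "  The Magic Flute!  "

# Remove the surrounding spaces, then the trailing punctuation
clean_title = title.strip().strip("!")

# Change case
lower_title = clean_title.lower()
upper_title = clean_title.upper()

# Split the text at spaces into a list of words
words = clean_title.split(" ")

# Find the position of a substring (-1 would mean "not found")
position = clean_title.find("Magic")

# First and last characters, using index positions
first_char = clean_title[0]
last_char = clean_title[-1]

print(clean_title)
print(lower_title, upper_title)
print(words)
print(position, first_char, last_char)
```

Note that none of these methods changes `title` itself; each one returns a new string (or list), which is why the results are saved under new variable names.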



# Your Turn with Data Types
Try **adding** some of the variables from above to each other.  
Which combinations return errors?  Why?



```
item_2 + item_3
```



In [None]:
item_2 + item_3

3.25

# Collections:  Lists, Dictionaries, Sets, and Tuples

In Python, items gathered together are called a **collection**. There are four types, each with its own properties: 

* **Lists** are used to store an **ordered sequence of items** in a single variable.  Items are separated by commas, and square brackets surround the whole.  Items that are text (strings) are surrounded by quotation marks; numbers are not.  See more [here](https://www.w3schools.com/python/python_lists.asp).

```
list_of_genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 
```

* **Sets** are like lists, but are **not ordered**, and cannot contain duplicates. See more [here](https://www.w3schools.com/python/python_sets.asp).




```
set(list_of_genres)
{'Blues', 'Country', 'Gospel', 'Hip-Hop'}

```
Turn a set back into a list:  

```
list(set(list_of_genres))
```

* **Dictionaries** are used to store data values in **key:value pairs**. Like lists, they are ordered (as of Python 3.7 they keep the order in which pairs were added). Like sets, they do not allow duplicate keys, although values can repeat. Keys and values that are text are surrounded by quotation marks. Key:value pairs are separated by commas. Curly brackets surround the whole. Dictionaries can also contain other dictionaries, as 'nested' dictionaries. See more [here](https://www.w3schools.com/python/python_dictionaries.asp) and below.

```
my_dict = {"artist_first_name" : "Wolfgang Amadeus",
"artist_last_name" : "Mozart",
"work_title" : "The Magic Flute",
"work_genre" : "singspiel",
"date" : "1791",
"first_performance_place" : "Vienna"}
```

* **Tuples** are a special type of collection:  ordered (like lists), but unchangeable (it is not possible to add or remove items). See more [here](https://www.w3schools.com/python/python_tuples.asp). 
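
To illustrate what "unchangeable" means for a tuple, here is a minimal sketch (the names `genres_tuple`, `genres_list`, and `new_tuple` are invented for this example):

```python
genres_tuple = ("Jazz", "Classical", "Folk")

# Reading from a tuple works just as it does for a list
print(genres_tuple[0])
print(len(genres_tuple))

# Trying to change an item raises a TypeError
try:
    genres_tuple[0] = "Blues"
except TypeError as error:
    print("TypeError:", error)

# To "change" a tuple, build a new one (here by way of a list)
genres_list = list(genres_tuple)
genres_list.append("Blues")
new_tuple = tuple(genres_list)
print(new_tuple)
```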






# Working with Lists
In the case of lists, we often need to:
* **Add or remove items**, as in `my_list.append(another_item)` or `my_list.remove(an_item)`
* Find out **how many items** are in a list (that is, the "length"), such as `len(my_list)`
* Find the **unique items** in a list (that is, the "set"), such as `set(my_list)`.
* Find **how many unique items** there are in a given list, which is the "length" of the "set": `len(set(my_list))`.  Note the nested parentheses!
* Find particular **items by their index** (= position in the list): `my_list[0]` (remember that the index of first position is always "0").  The last item in a list is `my_list[-1]` (a "negative index" counts back from the end, starting with "-1").
* Find the **index (= position)** of a particular item: `my_list.index(item_name)`
* **Sort** the list: ``my_list.sort()``, or in reverse alphabetical order: ``my_list.sort(reverse = True)``

Note that there are many other ways to work with lists, including methods for finding a **range** of items ("the first 10 items", "every other item", "all but the first and last items").
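
Ranges of items like these are usually selected with **slices**, written as `my_list[start:stop:step]`. The sketch below uses a sample list invented for this example:

```python
my_list = ["Blues", "Gospel", "Country", "Hip-Hop", "Jazz", "Folk"]

# The first three items (the item at the "stop" position is not included)
first_three = my_list[:3]

# Every other item, starting with the first
every_other = my_list[::2]

# All but the first and last items
middle_items = my_list[1:-1]

print(first_three)
print(every_other)
print(middle_items)
```

A slice returns a new list and leaves the original list unchanged.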

Read more about working with **Lists** here:  https://www.w3schools.com/python/python_lists.asp

Try them out:  https://www.w3schools.com/python/python_lists_exercises.asp

# Your turn with lists:


* **Create a list** of names, like `my_list = ["Paul McCartney", "John Lennon"]`
* Find the **length** of the list
* Print the **first** item in the list
* Find the **last** item on the list (even if you don't know how long the list is!)
* Assign the **last** item in the list to a new variable name, 'last', and print that
* Find the **index position** of a particular name on your list
* **Add** an item to the end of the list
* **Remove** a particular value from the list (regardless of its position)
* **Remove** the item at a particular index position
* **Add a duplicate** of the second item on the list to the end of the list
* Now that the list has a duplicate value in it, find the **unique values**



Read the methods: https://www.w3schools.com/python/python_lists_methods.asp





In [None]:
list_of_genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 

In [None]:
list_of_genres.index('Country')

2

In [None]:
list_of_genres.append("Western")

In [None]:
list_of_genres.remove("Gospel")

In [None]:
list_of_genres.append(list_of_genres[0])

In [None]:
list_of_genres.pop(2)

'Hip-Hop'

In [None]:
set(list_of_genres)

{'Blues', 'Country', 'Western'}

In [None]:
list(set(list_of_genres))

['Blues', 'Country', 'Western']

# Dictionaries
You can think of a dictionary like a small catalog, with a series of unique "keys" and their associated "values", like:

```
my_dict = {"artist_first_name" : "Wolfgang Amadeus",
"artist_last_name" : "Mozart",
"work_title" : "The Magic Flute",
"work_genre" : "singspiel",
"date" : "1791",
"first_performance_place" : "Vienna"}
```
The values can repeat, but the keys in any dictionary must be unique.

There are various ways to:

* list the **keys**:  `my_dict.keys()`
* list the **values**:  `my_dict.values()`
* list all the **items** (both the keys and values): `my_dict.items()`
* list the **value for a particular key**: `my_dict["date"]`
* update the **value for a particular key**: `my_dict["date"] = "1789"`
* add a **key/value pair**:  `my_dict["language"] = "German"`
* remove a **key/value pair**: `my_dict.pop("language")`

It is also possible to encounter **nested dictionaries**, in which one dictionary contains another.  More about these below under **for loops**.


More [here](https://www.w3schools.com/python/python_dictionaries.asp).
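
The dictionary operations listed above can be tried together in one cell. This is a sketch using a shortened version of `my_dict`; the `in` test at the end is an extra that is not covered in the list above:

```python
my_dict = {"artist_last_name": "Mozart",
           "work_title": "The Magic Flute",
           "date": "1791"}

# List the keys, the values, and the key/value pairs
print(list(my_dict.keys()))
print(list(my_dict.values()))
print(list(my_dict.items()))

# Add a new key/value pair, then update an existing value
my_dict["language"] = "German"
my_dict["work_title"] = "Die Zauberflöte"

# pop() removes a key and returns its value
removed_value = my_dict.pop("language")
print(removed_value)

# Check whether a key exists before asking for its value
has_language = "language" in my_dict
print(has_language)
```

Asking for a key that does not exist, as in `my_dict["language"]` after the `pop()`, raises a `KeyError`, which is why the `in` test can be useful.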






# Your Turn with Dictionaries

Select a musical object (a work, a musical event, a CD, an instrument) and create a dictionary that captures important information about it. There should be at least five unique keys in your dictionary.

Then try the following:
* **Populate** the keys/value pairs
* **List** the keys: `my_dict.keys()`
* Save that list as a new variable `list_keys = list(my_dict.keys())`.  Note the nested parentheses!
* **Find values** for some of the keys
* **Update** the values for certain keys
* Pick some related fields in the dictionary and join them up as a new variable, like:

```
Artist_Full_Name = my_dict["artist_first_name"] + " " + my_dict["artist_last_name"]

```

or

```
Artist_Surname_Name_Sort = my_dict["artist_last_name"] + " " + my_dict["artist_first_name"]

```

**Nested Dictionaries** are ones in which one dictionary contains another.  For example, this dictionary describes the works in a concert, each with its own details about composer, title, genre, etc.:

```
my_concert = {
    "work_1": {
        "composer_first_name" : "Wolfgang Amadeus",
        "composer_last_name" : "Mozart",
        "work_title" : "The Magic Flute",
        "work_genre" : "singspiel",
        "date" : "1791",
        "first_performance_place" : "Vienna"},
    "work_2": {
        "composer_first_name" : "Giuseppe",
        "composer_last_name" : "Verdi",
        "work_title" : "Aïda",
        "work_genre" : "opera",
        "date" : "1871",
        "first_performance_place" : "Cairo"}
}
```

Access the top-level key 'work_1':

```
my_concert['work_1']

```

Access all the keys nested within 'work_1':

```
my_concert['work_1'].keys()


```
Access the 'work_title' key nested within 'work_1':

```
my_concert['work_1']['work_title']
```
Add a key/value pair to one item:

```
my_concert['work_1']["librettist"] = 'Schikaneder'
```


In [None]:
my_dict = {"artist_first_name" : "Wolfgang Amadeus",
"artist_last_name" : "Mozart",
"work_title" : "The Magic Flute",
"work_genre" : "singspiel",
"date" : "1791",
"first_performance_place" : "Vienna"}

In [None]:
type(my_dict["date"])

str

In [None]:
a = list(my_dict.keys())
a

['artist_first_name',
 'artist_last_name',
 'work_title',
 'work_genre',
 'date',
 'first_performance_place']

In [None]:
Artist_Full_Name = my_dict["artist_first_name"] + " " + my_dict["artist_last_name"]
Artist_Full_Name


'Wolfgang Amadeus Mozart'

In [None]:
my_concert['work_1'].keys()

dict_keys(['composer_first_name', 'composer_last_name', 'work_title', 'work_genre', 'date', 'first_performance_place'])

In [None]:
my_concert['work_1']['work_title']

'The Magic Flute'

In [None]:
my_concert['work_1']["librettist"] = 'Schikaneder'


In [None]:
my_concert['work_1']

{'composer_first_name': 'Wolfgang Amadeus',
 'composer_last_name': 'Mozart',
 'date': '1791',
 'first_performance_place': 'Vienna',
 'librettist': 'Schikaneder',
 'work_genre': 'singspiel',
 'work_title': 'The Magic Flute'}

In [None]:
# IDEA for HW:
# Provide commands with errors, and ask them to correct 
# would need to make it clear what we're looking for by way of the correct answer.


# IDEA for HW:
Provide commands with errors, and ask students to correct them.
We would need to make it clear what we're looking for by way of the correct answer.
Examples:

* we want the last item from a list but something is wrong



# "If" statements

"If" statements allow you to perform **logical tests** on your data, such as:

```
Equals: a == b
Not Equals: a != b
Less than: a < b
Less than or equal to: a <= b
Greater than: a > b
Greater than or equal to: a >= b
```
Examples of "if" statements:

```
item_value = 100
if item_value > 10:
  print(True)

```

or

```
group_name = "Beatles"
if group_name.startswith("B"):
  print(True)
```
Note that "if" statements can be multi-stage, with "if" followed by "elif" (another condition to test if the first condition is not met), and "else" (a default result if none of the previous tests are true). More about "if" statements [here](https://www.w3schools.com/python/python_conditions.asp).
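
A minimal sketch of a multi-stage test with "if", "elif", and "else". The thresholds and the labels ("large", "medium", "small") are arbitrary choices made for this example:

```python
item_value = 100

if item_value > 1000:
    size = "large"       # first test: not met for 100
elif item_value > 10:
    size = "medium"      # second test: met, so the rest is skipped
else:
    size = "small"       # default if no test is met

print(size)
```

Python checks the conditions from top to bottom and runs only the first branch whose test is true.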

In [None]:
item_value = 100
if item_value > 10:
  print(True)

True


In [None]:
group_name = "Beatles"
if group_name.startswith("B"):
  print(True)

True


# Your Turn with "if" Statements

Using the list you created above, try various logical tests to find:

* integers or other values above or below a given threshold
* text strings that contain certain letters, or words (or that do not contain them)

# "For" Loops
"For" loops allow you to iterate over the items in any collection, performing the same operation or function on each.

"if" statement within a "for" loop:

```
genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 
for genre in genres:
  if genre.startswith("B"):
    print(genre)

```

or, to print all those that do _not_ satisfy the condition:

```
genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 
for genre in genres:
  if genre.startswith("B"):
    pass
  else:
    print(genre)
```

Note that **list comprehension** allows you to work with lists without the need to create separate **for** loops.  For example:

```
b_list = [x for x in genres if x.startswith("B")]
```

More about **list comprehension** [here](https://www.w3schools.com/python/python_lists_comprehension.asp).
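
To see how a list comprehension relates to a "for" loop, the sketch below builds the same list both ways. The names `b_list_loop` and `upper_genres` are invented for this example:

```python
genres = ["Blues", "Gospel", "Country", "Hip-Hop"]

# A "for" loop that collects matching items in a new list
b_list_loop = []
for genre in genres:
    if genre.startswith("B"):
        b_list_loop.append(genre)

# The same result as a list comprehension
b_list = [x for x in genres if x.startswith("B")]

# A comprehension can also transform each item
upper_genres = [x.upper() for x in genres]

print(b_list_loop)
print(b_list)
print(upper_genres)
```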

In [None]:
genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 
for genre in genres:
  if genre.startswith("B"):
    print(genre)

Blues


In [None]:
genres =  ["Blues", "Gospel", "Country", "Hip-Hop"] 
for genre in genres:
  if genre.startswith("B"):
    pass
  else:
    print(genre)

Gospel
Country
Hip-Hop


In [None]:
b_list = [x for x in genres if x.startswith("B")]
b_list

['Blues']


### For Loops with Nested Dictionaries

Now create a **nested** dictionary based on the one you made above.  This could be a *series of related items* (like songs on a playlist, or instruments in a collection, or scores on your shelf). The IDs (or keys) for each item will need to be unique at the highest level, but the keys within each item can repeat.

Sample nested dictionary of works in an opera season:

```
my_operas = {
    "work_1": {
        "composer_first_name" : "Wolfgang Amadeus",
        "composer_last_name" : "Mozart",
        "work_title" : "The Magic Flute",
        "work_genre" : "singspiel",
        "date" : "1791",
        "first_performance_place" : "Vienna"},
    "work_2": {
        "composer_first_name" : "Giuseppe",
        "composer_last_name" : "Verdi",
        "work_title" : "Aïda",
        "work_genre" : "opera",
        "date" : "1871",
        "first_performance_place" : "Cairo"}
}
```


Then try the following to iterate through each work in the dictionary, and then print the individual key-value pairs for each work:



```
for work_id, work_info in my_operas.items():
    print("\nWork ID:", work_id)
    
    for key in work_info:
        print(key + ':', work_info[key])
```
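
The same kind of loop can be combined with an "if" test to pick out particular works. The sketch below uses a shortened version of the sample dictionary; the names `titles` and `later_works` and the cut-off year of 1800 are invented for this example:

```python
my_operas = {
    "work_1": {"composer_last_name": "Mozart",
               "work_title": "The Magic Flute",
               "date": "1791"},
    "work_2": {"composer_last_name": "Verdi",
               "work_title": "Aïda",
               "date": "1871"},
}

# Collect one field from every work
titles = [work_info["work_title"] for work_info in my_operas.values()]
print(titles)

# Which works date from after 1800?
# The dates are stored as strings, so convert them with int() before comparing
later_works = []
for work_id, work_info in my_operas.items():
    if int(work_info["date"]) > 1800:
        later_works.append(work_id)
print(later_works)
```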


In [None]:
my_operas = {
    "work_1": {
        "composer_first_name" : "Wolfgang Amadeus",
        "composer_last_name" : "Mozart",     
        "work_title" : "The Magic Flute",
        "work_genre" : "singspiel",
        "date" : "1791",
        "first_performance_place" : "Vienna"},
    "work_2": {
        "composer_first_name" : "Giuseppe",
        "composer_last_name" : "Verdi",     
        "work_title" : "Aïda",
        "work_genre" : "opera",
        "date" : "1871",
        "first_performance_place" : "Cairo"}
}




In [None]:
my_operas.keys()

dict_keys(['work_1', 'work_2'])

In [None]:
for work_id, work_info in my_operas.items():
    print("\nWork ID:", work_id)
    
    for key in work_info:
        print(key + ':', work_info[key])


Work ID: work_1
composer_first_name: Wolfgang Amadeus
composer_last_name: Mozart
work_title: The Magic Flute
work_genre: singspiel
date: 1791
first_performance_place: Vienna

Work ID: work_2
composer_first_name: Giuseppe
composer_last_name: Verdi
work_title: Aïda
work_genre: opera
date: 1871
first_performance_place: Cairo
