# Sample Script
This sample script reads a spreadsheet of data and generates a report. This notebook introduces the Python coding concepts you need to understand the following code.

In [None]:
'''
Generate an alphabetized report of artists' names and the alphabetized tags associated with their works.
For example, an artist with three tags should be formatted as follows:
ARTIST: TAG1, TAG2, TAG3
If an artist has no tags associated with their works, omit them from the report.
'''

import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/ecds/intro-to-python/main/MetObjects1000.csv'

data = pd.read_csv(url)
data = data.replace(np.nan, '', regex=True)
data = data.astype(str)

artist_dict = {}
for index, row in data.iterrows():
    artists = row['Artist Display Name']
    tags = row['Tags']

    artists = artists.split('|')
    tags = tags.split('|')

    for artist in artists:
        if artist not in artist_dict.keys():
            artist_dict[artist] = list(tags)  # copy, so artists in the same row do not share one list
        else:
            artist_dict[artist].extend(tags)

for artist in sorted(artist_dict.keys()):
    if artist != '':
        tags = [x for x in artist_dict[artist] if x != '']
        tags = list(set(tags))
        tags.sort()
        if len(tags) > 0:
            print(artist + ': ' + ', '.join(tags))

# Python Basics
The print function. `print()` prints the specified message to the screen or another output device.
You can print a string, or another object, which will be converted to a string.

In [None]:
print("Hello World!")

You can print multiple objects, and you can specify the separator.

In [None]:
x = "Hello"
y = "world!"
print(x, y, sep=" ")

Syntax - You MUST indent correctly or you will get a syntax error in Python. Indentation is used to determine scope.

In [None]:
if 5 > 2:
  print("Five is greater than two!")

Let's try it without the indentation. You get an error!

In [None]:
if 5 > 2:
print("Five is greater than two!")

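You can pick the number of spaces (four is the convention recommended by the PEP 8 style guide), but every line within the same block must be indented by the same amount or you will get an error. The two cells below use different indentation widths and both run correctly.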

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
  if len(dessert) > 6:
      print('Yum, I love ' + dessert + '!')
  else:
      pass

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
  if len(dessert) > 6:
    print('Yum, I love ' + dessert + '!')
  else:
    pass

## Declaring variables

In [None]:
string_example = 'I am a string variable'
print(string_example)

In [None]:
number_example = 5
print(number_example)

Keep in mind that Python has no separate command for declaring a variable, so assigning to an existing name overwrites its value without warning.

In [None]:
print(string_example)

In [None]:
string_example = "I am an exciting new string!"
print(string_example)

## Data types
Some of the main data types in Python are:
int (integer), float (floating-point number), str (string)
Numeric types like integer and float:

In [None]:
number1 = 3.333
type(number1)

A number written in scientific notation is also a float:

In [None]:
floatnumber = 35e3
type(floatnumber)

In [None]:
number2 = 3
type(number2)

In [None]:
number1 + number2

In [None]:
number3 = number1 + number2
type(number3)

Strings:

In [None]:
string1 = "Hello. How are you? "
type(string1)

For multi-line strings, use three quotes: """ or '''

In [None]:
a = """Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."""
print(a)

A string is a sequence of individual characters, which can be accessed by index using square brackets. Indexing starts at 0, so `a[1]` is the second character.

In [None]:
a = "Hello, World!"
print(a[1])
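
Here is a small sketch of a few more ways to index a string. Negative indexes and ranges are explained in more detail in the Lists section; the variable names here are illustrative only.

```python
greeting = "Hello, World!"

first_char = greeting[0]     # index 0 is the first character
last_char = greeting[-1]     # negative indexes count from the end
first_word = greeting[0:5]   # characters at indexes 0 through 4

print(first_char)   # H
print(last_char)    # !
print(first_word)   # Hello
```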

Boolean:

In [None]:
boolean1 = True
type(boolean1)

You can set a variable to a specific data type.

In [None]:
x = str("Hello World")
x = int(20)
x = float(20.5)

Two more important data types are list (lists) and dict (dictionaries). We're going to cover those more in a bit.

You can use Python for calculations, which may come up within more complex scripts. For example, adding numbers.

In [None]:
num = 3

mean = (num + 10)/2
print(mean)

You can also use Python for concatenating strings.

In [None]:
string2 = "I am fine, thank you."

print(string1 + string2)

You could also save these to a new variable.

In [None]:
string3 = string1 + string2

You can change between data types using these functions.
int(), float(), str(), list(), tuple(), set()

In [None]:
number = '3'
type(number)

In [None]:
number = int(number)  # store the converted value; int() does not change the variable on its own
type(number)

In [None]:
word = 3
print(type(word))   # print so both types are shown, not only the last one
word2 = str(word)
print(type(word2))

Why do this conversion? Try this example.
Start with the text you provide.

In [None]:
word1 = 'I have '
word3 = ' apples'

Then let's use `word` from above (the integer 3) as the number to include.

In [None]:
print(word1 + word + word3)

Whoa! Python raises a TypeError because it cannot concatenate a string and an integer.
What if we use our converted word2?

In [None]:
print(word1 + word2 + word3)
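
As a minimal sketch of the same idea, the cell below builds the sentence two ways: with an explicit `str()` conversion, and with an f-string, which converts the value inside the curly brackets for you. The variable names `count`, `message_concat` and `message_fstring` are illustrative and are not used elsewhere in this notebook.

```python
count = 3  # an integer, like `word` above

# Option 1: convert explicitly with str() before concatenating
message_concat = 'I have ' + str(count) + ' apples'

# Option 2: an f-string converts the value inside the braces to text
message_fstring = f'I have {count} apples'

print(message_concat)
print(message_fstring)
```

Both lines print the same sentence, so which one to use is largely a matter of readability.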

You can also convert between integer and float. You may need specific data types in some applications.

In [None]:
x = 1    # int
y = 2.7  # float

Convert from int to float:

In [None]:
a = float(x)

Convert from float to int:

In [None]:
b = int(y)

print(a)
print(b)
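
One detail worth checking for yourself: `int()` drops the fractional part instead of rounding. The short sketch below compares it with the built-in `round()` function; the variable names are illustrative only.

```python
y = 2.7

truncated = int(y)     # drops the fractional part
rounded = round(y)     # rounds to the nearest integer
negative = int(-2.7)   # truncation moves toward zero, not downward

print(truncated)  # 2
print(rounded)    # 3
print(negative)   # -2
```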

# Lists

A list is a data type in Python that allows you to store multiple items in a single variable.

A list is created using square brackets:

In [None]:
empty_list = []
desserts_list = ['pie', 'cake', 'ice cream', 'cookies']
primes_list = [2, 3, 5, 7, 11]
mixed_data_list = [5, 'cake', 4.7, 'cookies']

You can use the len() function to count the number of items in a list:

In [None]:
len(desserts_list)

## Adding new items to a list
Lists are ordered: items keep their positions unless you change the list yourself. If you add a new item to a list using the append() function, it will be added to the end of the list:

In [None]:
desserts_list.append('candy')
print(desserts_list)

You can make an empty list and fill it up with values using append:

In [None]:
dogs = []
dogs.append('Fido')
dogs.append('Buddy')
dogs.append('Ralph')
print(dogs)

You can combine lists using + or the extend() function:

In [None]:
group_1 = ['Matthew', 'Karla', 'Michele']
group_2 = ['David', 'Eddie', 'Annie']
combined_group = group_1 + group_2
print(combined_group)

In [None]:
flavors = ['chocolate', 'vanilla', 'mint']
more_flavors = ['orange', 'pineapple']
flavors.extend(more_flavors)
print(flavors)
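
The two approaches differ in one way that is easy to miss, sketched below with shortened versions of the lists above: `+` builds a new list and leaves the originals alone, while `extend()` changes the list it is called on and returns `None`.

```python
group_1 = ['Matthew', 'Karla']
group_2 = ['David', 'Eddie']

# + creates a new list; group_1 still has two items afterwards
combined = group_1 + group_2
print(group_1)
print(combined)

# extend() modifies group_1 in place and returns None
result = group_1.extend(group_2)
print(group_1)
print(result)
```

Because `extend()` returns `None`, writing `group_1 = group_1.extend(group_2)` would throw the list away.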

## Indexing a list
List items are indexed by numbers. **In Python indexing starts with 0.** So the first item in a list has an index value of 0 and the last item in a list of n items has an index value of n-1.

You can access an item in a list using square brackets around the index value:

In [None]:
beatles = ['John', 'Paul', 'George', 'Ringo']
print(beatles[0]) # Print the first item in list.

In [None]:
print(beatles[3]) # Print the last (4th) item in the list.

Sometimes you want to access the last item in a list but you don't know how many items are in it. In that case, you can use a negative index value:

In [None]:
print(beatles[-1]) # Print the last item in the list.

In [None]:
print(beatles[-2]) # Print the second to the last item in the list.

You can also access a range of items in a list using index values. Notice that the range below prints the items up to **but NOT including** the upper index value.

In [None]:
print(beatles[1:3]) # Print the second and third items in the list

If you leave out the start value, the range will return index values starting at 0 and up to but not including the index value listed:

In [None]:
print(beatles[:2]) # Print the first two items in the list.
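
A few more range patterns follow the same rules. This is a short sketch using the same list; the variable names on the left are illustrative only.

```python
beatles = ['John', 'Paul', 'George', 'Ringo']

from_third = beatles[2:]   # leave out the end value to go through the last item
last_two = beatles[-2:]    # negative start values count from the end
everyone = beatles[:]      # leave out both values to copy the whole list

print(from_third)  # ['George', 'Ringo']
print(last_two)    # ['George', 'Ringo']
print(everyone)    # ['John', 'Paul', 'George', 'Ringo']
```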

## Changing values in a list
Now that you know how to index a list, you can use this to change items in a list. Let's replace Brussels sprouts with candy in the list below:

In [None]:
desserts = ['cake', 'ice cream', 'Brussels sprouts', 'cookies']
desserts[2] = 'candy'
print(desserts)

You can also replace a range of items:

In [None]:
desserts = ['cake', 'ice cream', 'Brussels sprouts', 'asparagus', 'green beans', 'pie']
# Let's swap out all the vegetables for sweets
desserts[2:5] = ['lollipops', 'pudding', 'sorbet']
print(desserts)

Suppose we want to insert butterscotch at the second position in our list, between cake and ice cream. We can use the insert function to do this using an index value of 1 (remember we start counting at zero!). 

In [None]:
desserts = ['cake', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.insert(1, 'butterscotch')
print(desserts)

## Removing items from a list
The simplest way to remove an item from a list is to provide its value to the remove() function like so:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.remove('cake')
print(desserts)

You can also use the pop() function to remove an item based on its index value:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.pop(-2) # Remove sorbet
print(desserts)

If you don't provide pop() a value, it removes the last item in a list:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.pop() # Remove pie
print(desserts)

Alternatively, you can use the keyword del to remove an item:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
del desserts[1] # Remove butterscotch
print(desserts)
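
Two details about these tools are sketched below, using a shortened list. First, `pop()` hands back the item it removed, so you can store it. Second, `remove()` raises a ValueError if the value is not in the list, so it can help to check with `in` first.

```python
desserts = ['cake', 'butterscotch', 'ice cream']

# pop() returns the removed item
removed = desserts.pop(0)
print(removed)
print(desserts)

# Check that the value is present before calling remove()
if 'pie' in desserts:
    desserts.remove('pie')
else:
    print('pie is not in the list, so nothing was removed.')
```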

## Looping through a list
It is very common in Python to want to loop through a list of items and do something to each one. There are a number of ways to do this. The simplest is perhaps using a for loop:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
    print('I love ' + dessert + '!')

Notice in the above example that we use `dessert` as a placeholder variable when looping through the items in the list. Python doesn't understand English grammar so we don't have to use the singular version of desserts. We could use anything we want as a placeholder variable in a for loop. For example we could have written `for x in desserts`. But using `for dessert in desserts` is closer to natural language and is therefore a common convention.

Also note that the print statement is indented. When writing a for loop, all the code that does things to the placeholder variable should be indented. This lets Python know that you're still working within the loop.

We can also loop through a list using the range() function and the len() function to refer to items by their index number:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
# Print the first letter of each dessert
for i in range(len(desserts)):
    print(desserts[i][0])

Let's summarize what just happened. We used `i` as a placeholder variable to loop through the range of index values in the list. The len() function finds the total number of items and the range function creates a range from 0 up to but not including the number of items in the list. So with a list of seven items, this will index from index number 0 through the index number 6. In essence, we're looping through every item in the list. Then we print the item by referring to it with the index number `[i]` and access the first letter by indexing the string itself with `[0]`. 
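
If you need both the index number and the item, Python's built-in `enumerate()` function is a common alternative to `range(len(...))`. The sketch below uses a shortened list, and the `labels` list is only there to collect the results.

```python
desserts = ['cake', 'butterscotch', 'ice cream']

labels = []
# enumerate() yields the index and the item together on each pass
for i, dessert in enumerate(desserts):
    labels.append(str(i) + ': ' + dessert)

print(labels)  # ['0: cake', '1: butterscotch', '2: ice cream']
```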

Another way to loop through the items is using a **list comprehension**, sometimes referred to as a Pythonic loop. It looks like this:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
# Make a list of the number of characters in each dessert
dessert_lengths = [len(x) for x in desserts]
print(dessert_lengths)

This time we used `x` as a placeholder variable. We created a new list `dessert_lengths` to store the data. We filled it up by iterating over each dessert `x` and capturing its length using the len() function. List comprehensions are a compact syntax for doing something to each item in a list. 
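
To make the connection concrete, here is the same task written both ways: as a regular for loop with `append()`, and as a list comprehension. The names `lengths_loop` and `lengths_comprehension` are illustrative only.

```python
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']

# The long way: start with an empty list and append inside a for loop
lengths_loop = []
for x in desserts:
    lengths_loop.append(len(x))

# The compact way: a list comprehension
lengths_comprehension = [len(x) for x in desserts]

print(lengths_loop)
print(lengths_loop == lengths_comprehension)  # True
```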

## Lists and Conditional Statements
It's very common in Python to apply conditional statements to items in a loop. Let's start with an example in which we print out one statement if the dessert's name is more than six characters long and a different statement if it is six characters or fewer:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
    if len(dessert) > 6:
        print('Yum, I love ' + dessert + '!')
    else:
        print('Yuck, I do not like ' + dessert + '.')

Notice the indentation in the above example. The if/else block is indented within the for loop. Also, the print statements are indented from within the if/else block. For each dessert in the list, the code checks if the length is greater than 6. If it is, the first print statement gets executed. If not, the second print statement gets executed. This is a simple example of what is called **flow control**. 

Sometimes you want to skip over items that do not meet a condition. You can use the keyword `pass`, which does nothing, to do this. Let's repeat the last example except this time we will not print anything if the item is six characters or fewer:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
    if len(dessert) > 6:
        print('Yum, I love ' + dessert + '!')
    else:
        pass

In addition to `if` and `else`, there is also `elif` which is short for "else if". You use it to evaluate conditions after an initial `if` statement but before a final `else` statement. For example:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
for dessert in desserts:
    if len(dessert) > 6:
        print('Yum, I love ' + dessert + '!')
    elif len(dessert) > 4:
        print('Nice, I sort of like ' + dessert + '.')
    else:
        pass

Notice that sorbet is the only dessert to which the print statement in the `elif` block was applied. That's because its length (6) is not greater than 6 but it is greater than 4, so it caused the second print statement to execute.

You can also use conditional statements on list comprehensions:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
yummy_desserts = [x for x in desserts if len(x) > 6]
for dessert in yummy_desserts:
    print('Yum, I love ' + dessert + '!')

## Sorting Lists
Lists have access to a **method** called `sort()` that will arrange them alphanumerically in ascending order:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.sort()
print(desserts)

You can pass an additional argument to the `sort()` method to sort in descending order:

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts.sort(reverse=True)
print(desserts)

The `sort()` method is handy but it sorts capitalized strings before lowercase ones. We can supply another argument to the `sort()` method to perform case-insensitive sorting:

In [None]:
desserts = ['cake', 'butterscotch', 'Ice cream', 'lollipops', 'Pudding', 'sorbet', 'pie']
desserts.sort(key=str.lower)
print(desserts)

Another way to sort a list is with the `sorted()` function. While the `sort()` method rearranges the list in place and returns `None`, `sorted()` leaves the original list untouched and returns a new sorted list. You can read about the difference here: https://discuss.codecademy.com/t/what-is-the-difference-between-sort-and-sorted/349679 and here: https://realpython.com/python-sort/

In [None]:
desserts = ['cake', 'butterscotch', 'ice cream', 'lollipops', 'pudding', 'sorbet', 'pie']
desserts = sorted(desserts)
print(desserts)
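
The difference between the two is easiest to see side by side. In this short sketch, `new_list` and `result` are illustrative names.

```python
desserts = ['pie', 'cake', 'sorbet']

# sorted() returns a new list and leaves desserts as it was
new_list = sorted(desserts)
print(desserts)   # ['pie', 'cake', 'sorbet']
print(new_list)   # ['cake', 'pie', 'sorbet']

# sort() rearranges desserts itself and returns None
result = desserts.sort()
print(desserts)   # ['cake', 'pie', 'sorbet']
print(result)     # None
```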

## Strings to Lists to Strings
Sometimes you encounter data separated by a character like `|` and you want to turn it into a list. You can accomplish this with the `split()` method:

In [None]:
data_string = "Value1|Value2|Value3"
data_list = data_string.split("|")
print(data_list)

To concatenate a list of strings with a character, you can use the `join()` method:

In [None]:
data_list = ['Value1', 'Value2', 'Value3']
print('|'.join(data_list))

Remember that you cannot concatenate strings and numbers:

In [None]:
data_list = ['Value1', 5, "Banana"]
print(','.join(data_list))

In [None]:
# Let's try that again by first converting all values to strings
data_list = ['Value1', 5, "Banana"]
data_list = [str(x) for x in data_list]
print(','.join(data_list))

# Dictionaries
A dictionary is used to store data in `key:value` pairs. It uses curly brackets `{}` to enclose the data:

In [2]:
cat_dict = {
    "name": "Amos",
    "breed": "Siamese", 
    "age": 6
}
print(cat_dict)

{'name': 'Amos', 'breed': 'Siamese', 'age': 6}


You can access a value in a dictionary using the key:

In [None]:
# Print the name of the cat in cat_dict
print(cat_dict["name"])

Dictionaries have a `get()` method that can also return the value of a key:

In [None]:
# Print the age of the cat in cat_dict
print(cat_dict.get("age"))
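
The two ways of reading a value behave differently when the key does not exist. The sketch below assumes a key named `"color"` that is not in the dictionary; it also uses `try`/`except`, which this notebook does not otherwise cover, only to show the error without stopping the cell.

```python
cat_dict = {"name": "Amos", "breed": "Siamese", "age": 6}

# get() returns None when the key is missing
color = cat_dict.get("color")
print(color)

# A second argument supplies a fallback value instead of None
color_or_default = cat_dict.get("color", "unknown")
print(color_or_default)

# Square brackets raise a KeyError for a missing key
try:
    print(cat_dict["color"])
except KeyError:
    print("There is no 'color' key in cat_dict.")
```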

You can also access the keys in a dictionary. The `keys()` method returns a view of the keys that you can loop over like a list:

In [None]:
print(cat_dict.keys())

In [None]:
# Print each key in the dictionary on a separate line:
for key in cat_dict.keys():
    print(key)

You can access the values in a dictionary the same way with the `values()` method:

In [None]:
for value in cat_dict.values():
    print(value)
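
When you need the key and the value together, dictionaries also have an `items()` method. A minimal sketch, where the `lines` list is only there to collect the results:

```python
cat_dict = {"name": "Amos", "breed": "Siamese", "age": 6}

lines = []
# items() yields each key together with its value
for key, value in cat_dict.items():
    lines.append(key + ': ' + str(value))

for line in lines:
    print(line)
```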

## Changing, Adding and Removing Data
You can update a value by referencing its key:

In [None]:
cat_dict['age'] = 9
print(cat_dict)

Alternatively, you can use the `update()` method to change the data. Note that you must enclose the key:value pair in curly brackets.

In [None]:
cat_dict.update({"age": 12})
print(cat_dict)

You can add a new key:value pair to the dictionary as follows:

In [None]:
cat_dict["favorite_snack"] = "catnip"
print(cat_dict)

In [3]:
# Or use the update method:
cat_dict.update({'reproductive_status': "neutered"})
print(cat_dict)

{'name': 'Amos', 'breed': 'Siamese', 'age': 12, 'favorite_snack': 'catnip', 'reproductive_status': 'neutered'}


Delete items from a dictionary using the `pop()` method:

In [None]:
cat_dict.pop("favorite_snack")
print(cat_dict)

You can also create an empty dictionary and fill it up with values:

In [None]:
temperatures = {}
temperatures['Jordan'] = 98.6
temperatures['Maria'] = 98.2
temperatures['Alyson'] = 99.1
print(temperatures)

## More complex data structures
So far we have only used strings and numbers in our dictionaries. But the values of a dictionary can be anything, including lists and even other dictionaries.

In [None]:
favorite_foods = {}
favorite_foods["Laura"] = ['spaghetti', 'french toast', 'tacos']
favorite_foods["Paul"] = ['kiwi', 'bagels', 'hamburgers', 'eggs']
print(favorite_foods)

Above we defined a variable as an empty dictionary and then populated it with data. For each key:value pair, the key is a name and the value is a list of foods. We can access items in that list of foods with an index:

In [None]:
# Print the last favorite food listed for Laura:
print(favorite_foods['Laura'][-1])

Once data is stored in a dictionary, you can retrieve it and format it as a string of data:

In [None]:
for person in favorite_foods.keys():
    print(person + ": " + ', '.join(favorite_foods[person]))

Nested dictionaries are a great way to store information about a series of objects. When we made the `cat_dict` earlier, it contained key:value pairs of data about a single cat. But we could create a nested dictionary to store the same information about multiple cats:

In [None]:
cats = {
    "cat1": {
        "name": "Gus",
        "breed": "Sphynx",
        "age": 2
    },
    "cat2": {
        "name": "Raven",
        "breed":  "Devon Rex",
        "age":  7
    },
    "cat3": {
        "name": "Tamara",
        "breed": "Persian",
        "age": 11
    }
}
print(cats)

We can iterate over the cats and print their names:

In [None]:
for cat in cats.keys():
    print(cats[cat]["name"])

Or we could use the same approach to print a statement about them:

In [None]:
for cat in cats.keys():
    print(cats[cat]["name"] + ' is a ' + str(cats[cat]["age"]) + "-year-old " + cats[cat]["breed"] + " cat.")
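
Nested dictionaries combine naturally with the conditional list comprehensions from the Lists section. As a sketch, the cell below collects the names of the cats older than five; the cutoff of five and the name `older_cats` are arbitrary choices for illustration.

```python
cats = {
    "cat1": {"name": "Gus", "breed": "Sphynx", "age": 2},
    "cat2": {"name": "Raven", "breed": "Devon Rex", "age": 7},
    "cat3": {"name": "Tamara", "breed": "Persian", "age": 11}
}

# values() yields each inner dictionary, so info["age"] is one cat's age
older_cats = [info["name"] for info in cats.values() if info["age"] > 5]
print(older_cats)  # ['Raven', 'Tamara']
```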

# Sample Script Revisited
Congratulations on making it through the workshop notebook! Let's revisit the script and see if it makes more sense now.

The first lines of the script import and use Python code libraries `pandas` and `numpy` to harvest the data and transform it into a table of string data. We then use a `pandas` method called `iterrows()` to loop over each row in the data and manipulate it. 

For each row of data, we're only interested in the Artist Display Name and Tags columns. So we create a variable for each and access the data. Notice that each row behaves like a dictionary (it is a `pandas` Series), so you access the value of a cell by using the column name: `row['Artist Display Name']`. Since the data has multiple values separated by pipes, we convert `artists` and `tags` into lists using the `split()` method.

Next we loop through each artist in the list. We fill up an empty dictionary by adding the artist name as the key and the list of tags as the value. We need an if/else statement to accomplish this. The first time the artist comes up in the data, we want to use the tags in that row as the value but if the artist is already in the dictionary, we want to add more tags to the list using `extend()`. 

Finally, we can iterate over the keys in our artist dictionary and print out the report line by line. The original dataset had many objects without an artist, so our dictionary has a key that is an empty string with several tags as its value. We use the line `if artist != '':` to ignore this special case. Then we use a list comprehension to eliminate empty strings in the list of tags. Next we use `set()` to obtain the unique tags and wrap it with `list()` to convert it back to a proper list. We then sort the list of tags alphabetically and, if there's at least one tag listed for the artist, we print out the line of the report by concatenating the artist's name and the tags.

In [None]:
'''
Generate an alphabetized report of artists' names and the alphabetized tags associated with their works.
For example, an artist with three tags should be formatted as follows:
ARTIST: TAG1, TAG2, TAG3
If an artist has no tags associated with their works, omit them from the report.
'''

import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/ecds/intro-to-python/main/MetObjects1000.csv'

data = pd.read_csv(url)
data = data.replace(np.nan, '', regex=True)
data = data.astype(str)

artist_dict = {}
for index, row in data.iterrows():
    artists = row['Artist Display Name']
    tags = row['Tags']

    artists = artists.split('|')
    tags = tags.split('|')

    for artist in artists:
        if artist not in artist_dict.keys():
            artist_dict[artist] = list(tags)  # copy, so artists in the same row do not share one list
        else:
            artist_dict[artist].extend(tags)

for artist in sorted(artist_dict.keys()):
    if artist != '':
        tags = [x for x in artist_dict[artist] if x != '']
        tags = list(set(tags))
        tags.sort()
        if len(tags) > 0:
            print(artist + ': ' + ', '.join(tags))
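
If you want to experiment with the grouping logic without downloading the spreadsheet, the sketch below runs the same steps on a few made-up rows. The artist names and tags are hypothetical stand-ins, not real data from the Met dataset, and plain dictionaries take the place of the `pandas` rows. It copies each tag list with `list(tags)` so that two artists from the same row never share one list object.

```python
# Hypothetical rows standing in for the spreadsheet (not real Met data)
rows = [
    {'Artist Display Name': 'Artist A|Artist B', 'Tags': 'Birds|Flowers'},
    {'Artist Display Name': 'Artist A', 'Tags': 'Portraits|Birds'},
    {'Artist Display Name': '', 'Tags': 'Vases'},
    {'Artist Display Name': 'Artist C', 'Tags': ''},
]

artist_dict = {}
for row in rows:
    artists = row['Artist Display Name'].split('|')
    tags = row['Tags'].split('|')
    for artist in artists:
        if artist not in artist_dict:
            artist_dict[artist] = list(tags)  # copy the list for each artist
        else:
            artist_dict[artist].extend(tags)

report = []
for artist in sorted(artist_dict):
    if artist != '':
        # drop empty strings, remove duplicates, then sort alphabetically
        tags = sorted(set(x for x in artist_dict[artist] if x != ''))
        if len(tags) > 0:
            report.append(artist + ': ' + ', '.join(tags))

for line in report:
    print(line)
```

Artist C has no tags and the row with no artist has an empty name, so neither appears in the output.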