### Citations (unfinished)

**Resources used in the creation of this guide**

1. http://python-textbook.pythonhumanities.com/01_intro/01_01-02_introduction_to_python.html

# Python Basics

## Introduction

Python is a powerful yet approachable programming language, making it ideal for use in this course. While Python has many applications, our focus in Encoding Music will be using Python to understand, transform, and create data related to music. The objective of this guide is to introduce you to the basic syntactical structure of Python while learning about four key topics:

1. Data and Data Structures
2. Logic
3. Loops
4. Functions

You may already be familiar with Python, in which case you can use this guide as a refresher or reference.

## Where are we?

Traditionally, Python is written in a `.py` file, like `hello.py`, consisting only of Python code. However, in this course, we'll write and run code in Jupyter notebooks. Unlike `.py` files, Jupyter notebooks let you break up code with sections of formatted text (written in "Markdown"), easily run isolated sections of code, and view the output of your code in one place.

Let's get started!

## Data and Data Structures

### Hello World

Later, we'll take data and analyze it. But we need to be able to get Python to show us the results - to create output. Python creates output using the `print()` function.

In [2]:
# Run this cell
print("Hello, World!")

Hello, World!


As you can see, the `print()` function will cause some output - whatever is between those parentheses - to display below the code cell. You can try changing the text, running the cell again, and seeing what happens. Just be sure to keep the quotation marks in place.

You may have also noticed plain English writing in the code cell: `# Run this cell`. This is called a comment, and allows you to write small notes, instructions, or descriptions in your code. Writing `#` tells Python to ignore everything that follows the `#` in that line.

### Variables

Sometimes, we might want Python to remember something, so we can use it later. Python gives us the ability to store things in the computer's memory using **variables**. With a variable, you essentially *name* a location in the computer's memory, then put something there. For example:

In [15]:
# Run this cell
x = "Hello, World!"

This is called *assigning a value* (`"Hello, World!"`) *to a variable* (`x`). Now we can try printing `"Hello, World!"` again, but this time by referencing the variable `x`.

In [16]:
# Run this cell
print(x)

Hello, World!


A nice thing about Jupyter is that if you put a variable in the **last line** of a code cell and run it, Jupyter will output that variable.

In [17]:
# Run this cell
x

'Hello, World!'

This also demonstrates a fundamental principle in Jupyter - **all your code exists in the same world**. This means if you assign a value to `x` in one cell, and reference `x` in another cell, that will work because both cells exist in the same world.

#### Things to keep in mind

As you begin coding, here are some important ideas to keep in mind. You can try testing these out in the code block below.

##### Case Sensitivity

Everything in Python is case-sensitive. Running `print(X)` will give you an error, because you haven't yet named a location in memory `X`. Running `Print(x)` will give you an error because the capitalization is not correct.
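As a small illustration of case sensitivity, the sketch below uses two hypothetical variable names, `note` and `Note`, that are not part of this guide's running example. Python treats them as two unrelated variables because their capitalization differs.

```python
# Two names that differ only in capitalization are two separate variables.
note = "C"   # lowercase name
Note = "D"   # different name, different value

print(note)  # prints C
print(Note)  # prints D
```

This is why `print(X)` fails if only `x` has been assigned: as far as Python is concerned, `X` has never been created.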

##### Nothing Counts Unless You Run It

You might decide you want to output `"got it"` in the below cell, after you correct the capitalization of the variable `x`. If you go to the earlier cell where we first assigned a value to `x` and change that value to `"got it"`, you have to **run that cell**. Only *then* can you return to the cell below, run it, and get the new output `"got it"`. This can become difficult to keep track of with larger files - if Python ever acts like code you wrote doesn't exist, first make sure you've run all the code you've written!

In [18]:
# Experiment with the below issues
print(X)    # How can this be made to run? And how can you change the output value?

NameError: name 'X' is not defined

### Data Types

Let's say you're doing a project on the Beatles, and you're collecting some data about them. You already know how to create a variable for the band name - `"The Beatles"`, just like `"Hello, World!"`, is text. But you'll want to store other types of data, too. To begin, consider:

1. Name of band 
2. Number of albums released
3. Number of years the band was together
4. Whether or not they are in the Hall of Fame (HOF)
5. List of all band members

You could simply store all these as text - for example, `band_members = "John, Paul, George, and Ringo"`. But Python has simple, built-in ways to express different types of data.

In [None]:
# Run this cell
band_name = "The Beatles"   # string
albums_released = 13        # integer
years_together = 7.6        # float
in_hof = True               # boolean
band_members = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"] # list

These are five of many different data types in Python. Just by writing the data in a specific way (e.g. including quotation marks, a decimal point, or brackets), we can tell Python what type of data we want the variable to be recognized as. We can verify that Python understands this using the `type()` function, which outputs the type of the data between the parentheses.

In [None]:
# Run this cell
print(type(band_name))
print(type(albums_released))
print(type(years_together))
print(type(in_hof))
print(type(band_members))

<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'list'>



These are some of the most important **data types** in Python. But why does it matter what data type a variable is? How do data types give us greater capabilities when working with Python?

Python handles each data type differently. When you **access** and **operate on** your data, you'll have to do so in different ways depending on the data type. You'll also see that different operations are possible on different data types. When you're deciding how to store your data, your principal consideration should be what you intend to do with that data. The sections below cover each type in turn.
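To make this concrete, here is a minimal sketch comparing the same symbols, `+` and `*`, on an integer and on a string. The variable names `albums_as_int` and `albums_as_str` are hypothetical and chosen only for this illustration.

```python
# The same operators behave differently depending on the data type.
albums_as_int = 13     # integer
albums_as_str = "13"   # string (note the quotation marks)

print(albums_as_int + 1)     # arithmetic addition: 14
print(albums_as_str + "1")   # joins the two strings: 131

print(albums_as_int * 2)     # arithmetic multiplication: 26
print(albums_as_str * 2)     # repeats the string: 1313
```

If you plan to do math with a value, storing it as a number rather than as text saves you a conversion later.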

#### Strings

`"this is a string"`

* Get just part of a string


##### Description

*(abbreviated `str`)* A **string** of characters (letters, digits, spaces, punctuation, and so on) surrounded by quotation marks. The quotation marks can be double `"text"` or single `'text'`, but never mixed `"not this!'`. We use strings for text, `"anything from a long phrase or paragraph"` to a single character `"c"`, or even an empty string `""`.

##### Creation

`text1 = "Encoding Music"`

`text2 = 'Music 255'`

##### Accessing

Usually, you'll want the entire string, which you can get using the variable name.

`print(text2)`

Sometimes, though, you might just want part of a string. Say you just want the course number for Encoding Music, `"255"`. You can get this through a technique known as **string slicing**.

A string is a sequence of characters, and each character has a numbered position called an **index**, counting from 0. For example, in `"Music 255"`:

```
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|'M'|'u'|'s'|'i'|'c'|' '|'2'|'5'|'5'|
```

You can access just the '2' using the **index** of that character. To the right of the variable name, enclose the index in brackets: `text2[6]`.

See what happens:

In [27]:
# Run this cell
text2 = 'Music 255'
text2[6]

'2'

To access several characters in a string, you can specify a **range**. Ranges are specified in the format `[start:end]`, but *with a catch*: the end is **not inclusive**. The range `[6:8]` won't work to get "255", because that range gets all the indices **starting from 6** and **up to, *but not including*, 8**.

In [None]:
# Run this cell
text2[6:8]

'25'

So if the index of the last character you want is `8`, your range should end in `9`.

In [None]:
# Run this cell
text2[6:9]

'255'

If you want a **substring** to go up to the end of a string, you can also leave the end of a range blank: `text2[6:]`. This captures every character from index 6 onwards.
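The short sketch below shows a range with a blank `end`, and also the mirror case of a blank `start`, which begins at the first character. The variable names `number` and `word` are hypothetical.

```python
text2 = 'Music 255'

number = text2[6:]   # from index 6 to the end of the string
word = text2[:5]     # from the beginning up to, but not including, index 5

print(number)  # 255
print(word)    # Music
```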

You can also refer to indices by their position relative to the **end** of the string. Here is what those alternative indices look like:

```
| -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
|'M' |'u' |'s' |'i' |'c' |' ' |'2' |'5' |'5' |
```

Thus, another way to get the substring "255" is using `text2[-3:]`.

In [None]:
# Run this cell
text2[-3:]

'255'

##### Operating

There's an optional third value when specifying a range: `step` (`string[start:end:step]`). By default, `step` is set to `1`. This means that within the specified range, your substring will move forward from index to index in increments of 1.

However, this can be changed to any nonzero integer. For example, we can take the substring `text2[::2]`. Since we left `start` and `end` blank, this range automatically includes the beginning and end of the string (you don't need to leave a whitespace to leave `start` or `end` blank). See what happens:

In [29]:
# Run this cell
text2[::2]

'Msc25'

As you can see, this returns **every other character** in the string. If we set `step` to a negative integer, the string is traversed in **reverse**. For example, `text2[::-1]` will return the original string in reverse:

In [33]:
# Run this cell
text2[::-1]

'552 cisuM'

However, keep in mind that reversing the direction changes what the `start` and `end` of your substring should be. If you just want to reverse the word `Music`, your new `start` is at the character `c`, in index position `4`. If you set the `end` to `0`, you'll exclude the character `M`, since the `end` of a range is not inclusive. The simplest way to reverse all of `Music` is to leave `end` blank: `text2[4::-1]`.

In [36]:
# Run this cell
text2[4::-1]

'cisuM'
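To see the excluded `M` for yourself, this sketch compares an `end` of `0` with a blank `end`. The variable names are hypothetical.

```python
text2 = 'Music 255'

with_end_zero = text2[4:0:-1]   # stops before index 0, so 'M' is left out
with_end_blank = text2[4::-1]   # continues through the first character

print(with_end_zero)   # cisu
print(with_end_blank)  # cisuM
```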

##### Operating with Built-In Methods

Python provides a number of **methods** for operating on strings. Note the usage of the term "returns": when a Python method is used, the output of that operation is called what the method **returns**. Here are some key methods:

* `.title()`: Returns the string in title case
* `.upper()`: Returns the string in upper case
* `.lower()`: Returns the string in lower case
* `.replace()`: Returns the string with characters replaced as specified
* `.find()`: Returns the index of the first occurrence of a given substring in the string, or -1 if not found
* `.strip()`: Returns a string without the specified trailing/leading characters
* `.split()`: Returns a list of substrings from the original string that were separated by the specified value

To see more, visit the [W3Schools resource on strings](https://www.w3schools.com/python/python_ref_string.asp).

To understand how these work, see the below example:


In [14]:
# Run this cell
text1 = 'encoding music'
print("Original: " + text1)

text1 = text1.title()
print("Title case: " + text1)

text1 = text1.upper()
print("Upper case: " + text1)

text1 = text1.lower()
print("Lower case: " + text1)

text1 = text1.replace(" ", "_") # Replaces all instances of " " with "_"
print("Replacement: " + text1)

print("Find:")
print(text1.find('_'))      # Returns the index of the character "_" without changing the string

text2 = '--Music 255---'
print("Original: " + text2)

text2 = text2.strip('-')    # Removes "-" from the beginning and end of the string until the
                            # string neither starts nor ends with "-"
print("Stripped: " + text2)

text2 = text2.split(' ')    # Splits the string based on the separator " " (single space)
print("Split:")
print(text2)                # Need to print the list on a new line, since you can't add
                            # a string to a list.

Original: encoding music
Title case: Encoding Music
Upper case: ENCODING MUSIC
Lower case: encoding music
Replacement: encoding_music
Find:
8
Original: --Music 255---
Stripped: Music 255
Split:
['Music', '255']


You may have noticed that the output from the `.find()` and `.split()` methods was displayed differently from the other outputs. This is because `.find()` returns an integer and `.split()` returns a list, and neither can be added to a string with `+`:

`print("Find: " + 8)` would raise a `TypeError`.

`print("Split: " + ['Music', '255'])` would raise a `TypeError`.

There is a more elegant solution to this problem: **f-strings**. F-strings (short for **formatted strings**) are a way to cleanly incorporate data into a string. For example: `print(f"Split: {text2}")`

Placing `f` before the string indicates an f-string. Then, within an f-string, anything within curly brackets `{` `}` is automatically converted to a string that can be outputted. Try it:

In [15]:
# Run this cell
print(f"Find: {text1.find('_')}")
print(f"Split: {text2}")

Find: 8
Split: ['Music', '255']
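An f-string can hold several pairs of curly brackets, and each pair can contain a calculation as well as a variable name. The sketch below reuses the Beatles values from earlier in this guide. The variable names `summary` and `doubled` are hypothetical.

```python
band_name = "The Beatles"
albums_released = 13
years_together = 7.6

# Several values of different types in one f-string
summary = f"{band_name} released {albums_released} albums in about {years_together} years."
print(summary)

# The curly brackets can also contain a calculation
doubled = f"Twice as many albums would be {albums_released * 2}."
print(doubled)
```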


You can also manually convert other data types to strings using the `str()` function. For example, say you have the variables `a = 'Music'` and `b = 255`. If you want to use these two variables to create the string `Music 255`, you might want to convert `b` from an integer to a string.

You can do this using `str(b)`:

In [16]:
# Run this cell
a = 'Music'
b = 255

b = str(b)
print(type(b))  # See that b has been converted to a string

# Now add a and b, with a space in between, and print it
course = a + ' ' + b
print(course)

<class 'str'>
Music 255


You can also perform the opposite conversion! You can convert a string to an integer using `int()`. This will only work if the string represents a whole number, for example `int('255')`:

In [5]:
# Run this cell
x = int('255')  # converting a string to an integer

type(x)

int
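The sketch below shows what happens when the string does not represent a whole number: Python raises a `ValueError`. It uses `try` and `except`, which this guide has not introduced, only so that the example can run to completion. The names `course_number` and `conversion_failed` are hypothetical.

```python
course_number = int('255')   # works: the string represents a whole number
print(course_number + 1)     # 256, since course_number is now an integer

conversion_failed = False
try:
    int('Music')             # fails: 'Music' does not represent a number
except ValueError as error:
    conversion_failed = True
    print(f"Could not convert: {error}")
```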

Now let's see what we can do with integers in Python.

#### Integers

##### Description

*(abbreviated `int`)* A number that has no decimal or fractional part (e.g. `3`, `0`, `-10`).

##### Creation

Any number without a decimal is automatically interpreted as an integer.

`x = 5`

`x = -22`

##### Operating

When Python recognizes a variable is an integer, you can do math with it:

In [13]:
# Run this cell
print(10 + 5)   # Addition
print(10 - 5)   # Subtraction
print(10 * 5)   # Multiplication
print(10 ** 5)  # Exponentiation 10^5
print(10 / 5)   # Division
print(10 % 5)   # Remainder

15
5
50
100000
2.0
0


You may have noticed that the output from division, `10 / 5`, came out with a decimal. Since division can result in **floats**, numbers with a fractional part, the `/` operator **always** returns a float, even when the result is a whole number.
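If you want a whole-number result instead, Python also has a floor division operator, `//`. It is not otherwise covered in this guide. A brief sketch with hypothetical variable names:

```python
true_division = 7 / 2     # regular division always gives a float
floor_division = 7 // 2   # floor division rounds down to a whole number

print(true_division)         # 3.5
print(floor_division)        # 3
print(type(10 / 5))          # <class 'float'>
print(type(floor_division))  # <class 'int'>
```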

#### Floats

##### Description

A number that can have a decimal or fractional part, but doesn't have to (e.g. `3.14159`, `-7.2`, `5.0`).

##### Creation

Just adding a decimal to a number makes Python interpret it as a float, even when that decimal is 0.

`x = 4.2`

`x = -5.0`

##### Operation

Operating with floats works the same way as with integers, and for our usage there is no real difference between the two. Python is flexible, so if you perform an operation that combines integers and floats, Python will treat both as floats (and thus give a float as an output).

`3.5 + 4` will output `7.5`, for example.

Even when the decimal is 0, involving a float will cause the output to be a float: `1 * 1.0` will output `1.0`.
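You can check both claims with `type()`. This is a minimal sketch, and the variable names are hypothetical:

```python
mixed_sum = 3.5 + 4       # float + integer
mixed_product = 1 * 1.0   # integer * float
whole_sum = 2 + 3         # integer + integer, for comparison

print(mixed_sum, type(mixed_sum))          # 7.5 <class 'float'>
print(mixed_product, type(mixed_product))  # 1.0 <class 'float'>
print(whole_sum, type(whole_sum))          # 5 <class 'int'>
```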

You can convert floats to integers using `int()`, which just strips a number of its decimal places: see what happens when you write `int(7.8)`.

In [18]:
int(7.8)

7
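Note that `int()` cuts off the decimal part rather than rounding. The sketch below contrasts it with the built-in `round()` function, which this guide does not otherwise cover. The variable names are hypothetical.

```python
truncated_positive = int(7.8)    # decimal part removed, not rounded up
truncated_negative = int(-7.8)   # same for negative numbers
rounded = round(7.8)             # rounds to the nearest whole number

print(truncated_positive)  # 7
print(truncated_negative)  # -7
print(rounded)             # 8
```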

#### Booleans

##### Description

*(abbreviated `bool`)* A value that is either `True` or `False`, and must be capitalized exactly as shown. Booleans are written **without** quotation marks -- if quotation marks are included, Python will interpret the variable as a string.

##### Creation

You can create a boolean by writing either `True` or `False` without quotation marks.

`b = True`

`b = False`
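The difference that quotation marks make can be checked with `type()`. In this sketch, `in_hof_text` is a hypothetical variable added only for comparison:

```python
in_hof = True          # boolean: no quotation marks
in_hof_text = "True"   # string: the quotation marks make this text

print(type(in_hof))       # <class 'bool'>
print(type(in_hof_text))  # <class 'str'>
```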

##### Operation

Booleans are important because of the operations possible with them. We'll explore the ways you can operate with booleans when we get to **logic** and **conditionals**.

#### Lists

A collection of values, written within brackets, and separated by commas. A list can contain any data type within it, and can contain multiple data types at once. A list can also contain another list (called a **nested list**).

For example:

-   `["John", "Paul", "George", "Ringo"]`
-   `["Apple", 2024, 0.8, [True, True, False]]`
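This guide does not walk through accessing list items, but lists use the same bracket and index notation shown earlier for strings. The sketch below is an illustration under that assumption. The variable names are hypothetical, and `len()` is a built-in function that has not been introduced yet.

```python
band_members = ["John", "Paul", "George", "Ringo"]
mixed = ["Apple", 2024, 0.8, [True, True, False]]

first_member = band_members[0]     # indices start at 0, as with strings
last_member = band_members[-1]     # negative indices count from the end
first_two = band_members[0:2]      # ranges work too; the end is not inclusive
nested_value = mixed[3][2]         # index 3 is the nested list; then take its index 2
member_count = len(band_members)   # number of items in the list

print(first_member)   # John
print(last_member)    # Ringo
print(first_two)      # ['John', 'Paul']
print(nested_value)   # False
print(member_count)   # 4
```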



It's important to think about what you want to do with data before choosing how to store it. Python only has certain tools available for certain data types, which makes your decision of data types important.

Let's return to our example project, collecting data about the Beatles. Now that we've established the importance of data types and covered the most important ones, we can go even further.

#### Dictionaries

*(abbreviated `dict`)* For our project, we want to know the full name of each Beatle, when they were born, and what instrument they played. But we don't yet have a good way of storing a group of related data. We could try to use a list:

`john = ["John Lennon", 1940, "guitar"]`

But this isn't particularly helpful. If we're looking at this list, it's not immediately clear what each item in the list is meant to be. **Dictionaries** solve that issue by allowing you to specify a **key** for each **value**.

For the value `"John Lennon"`, we might specify the key `"full_name"` like this:

`"full_name": "John Lennon"`

This is a **key-value pair**. 

##### Description

A **dictionary** is a collection of these key-value pairs, surrounded by curly brackets `{` `}` and separated by commas. Here is an implementation of some dictionaries in our example:

In [None]:
# Run this cell
band_name = "The Beatles"   # string
albums_released = 13        # integer
years_together = 7.6        # float
in_hof = True               # boolean
band_members = [
    {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},           # dict
    {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},   # dict
    {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},       # dict
    {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"}             # dict
] # list

# See what happens when you output a list of dictionaries
band_members

[{'full_name': 'John Lennon', 'birth_year': 1940, 'instrument': 'guitar'},
 {'full_name': 'Paul McCartney',
  'birth_year': 1942,
  'instrument': 'bass guitar'},
 {'full_name': 'George Harrison', 'birth_year': 1943, 'instrument': 'guitar'},
 {'full_name': 'Ringo Starr', 'birth_year': 1940, 'instrument': 'drums'}]

As you can see, the list of strings we had previously is now a much richer list of dictionaries, containing much more information. You can also see how the various data types work together: a **list** of **dictionaries**, each containing **strings** and **integers**.
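To read a value back out of a dictionary, put its key in brackets after the variable name. The sketch below contrasts this with the list version from earlier, where you have to remember which position holds which piece of data. The variable names `john_list`, `full_name`, and `birth_year` are hypothetical.

```python
john_list = ["John Lennon", 1940, "guitar"]
john = {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"}

# With a list, you need to know that index 1 holds the birth year
print(john_list[1])        # 1940

# With a dictionary, the key says what the value means
full_name = john["full_name"]
birth_year = john["birth_year"]
print(f"{full_name} was born in {birth_year}.")
```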

This can be extended as far as you need - for example, you might decide that each band member should actually have a **list** of every instrument they played, rather than just one.

For example John Lennon's key-value pair `"instrument": "guitar"` could be replaced by:

`"instruments": ["guitar", "harmonica", "piano", "violin", "trumpet"]`

Just remember to be consistent - it makes your life much easier if every band member's dictionary has data stored in the same way.

Let's make one last big change. We have five variables storing data about the same subject, the Beatles. That seems like a perfect candidate for a dictionary. We can replace these variables with key-value pairs:

In [None]:
# Run this cell
beatles = {
    "band_name": "The Beatles",
    "albums_released": 13,
    "years_together": 7.6,
    "in_hof": True,
    "band_members": [
        {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},
        {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},
        {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},
        {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"}
    ]
}

# See what happens when you output the new dictionary
beatles

{'band_name': 'The Beatles',
 'albums_released': 13,
 'years_together': 7.6,
 'in_hof': True,
 'band_members': [{'full_name': 'John Lennon',
   'birth_year': 1940,
   'instrument': 'guitar'},
  {'full_name': 'Paul McCartney',
   'birth_year': 1942,
   'instrument': 'bass guitar'},
  {'full_name': 'George Harrison', 'birth_year': 1943, 'instrument': 'guitar'},
  {'full_name': 'Ringo Starr', 'birth_year': 1940, 'instrument': 'drums'}]}

Dictionaries are a great way to store a variety of data related to one entity.
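Because the values in `beatles` include a list of dictionaries, you can chain brackets to reach nested data: a key for the outer dictionary, an index for the list, then a key for the inner dictionary. This is a minimal sketch of that idea. The variable name `drummer` is hypothetical, and `len()` is a built-in function that this guide has not introduced.

```python
beatles = {
    "band_name": "The Beatles",
    "albums_released": 13,
    "years_together": 7.6,
    "in_hof": True,
    "band_members": [
        {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},
        {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},
        {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},
        {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"}
    ]
}

print(beatles["band_name"])                      # The Beatles

drummer = beatles["band_members"][3]             # fourth dictionary in the list
print(drummer["full_name"])                      # Ringo Starr

print(beatles["band_members"][1]["instrument"])  # bass guitar
print(len(beatles["band_members"]))              # 4
```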

You're now familiar with **strings**, **integers**, **floats**, **booleans**, **lists**, and **dictionaries**. With these data types, we've assembled a collection of data related to the Beatles. Now it's time to do something with that data.

For our project, 