### Citations (unfinished)

**Resources used in the creation of this guide**

1. http://python-textbook.pythonhumanities.com/01_intro/01_01-02_introduction_to_python.html

# Python Basics

## Introduction

Python is a powerful yet approachable programming language, making it ideal for use in this course. While Python has many applications, our focus in Encoding Music will be using Python to understand, transform, and create data related to music. The objective of this guide is to introduce you to the basic syntactical structure of Python while learning about four key topics:

1. Data and Data Structures
2. Logic
3. Loops
4. Functions

You may already be familiar with Python, in which case you can use this guide as a refresher or reference.

## Where are we?

Traditionally, Python is written in a `.py` file, like `hello.py`, consisting only of Python code. However, in this course, we'll write and run code in Jupyter notebooks. Unlike `.py` files, Jupyter notebooks let you break up code with sections of formatted text (called "Markdown"), easily run isolated sections of code, and view the output of your code in one place.

Let's get started!

## Data and Data Structures

### Hello World

Later, we'll take data and analyze it. But first, we need to get Python to show us the results - to create output. Python creates output using the `print()` function.

In [42]:
# Run this cell
print("Hello, World!")

Hello, World!


As you can see, the `print()` function will cause some output - whatever is between those parentheses - to display below the code cell. You can try changing the text, running the cell again, and seeing what happens. Just be sure to keep the quotation marks in place.

You may have also noticed plain English writing in the code cell: `# Run this cell`. This is called a comment, and allows you to write small notes, instructions, or descriptions in your code. Writing `#` tells Python to ignore everything that follows the `#` in that line.

### Variables

Sometimes, we might want Python to remember something, so we can use it later. Python gives us the ability to store things in the computer's memory using **variables**. With a variable, you essentially *name* a location in the computer's memory, then put something there. For example:

In [43]:
# Run this cell
x = "Hello, World!"

This is called *assigning a value* (`"Hello, World!"`) *to a variable* (`x`). Now we can try printing `"Hello, World!"` again, but this time by referencing the variable `x`.

In [44]:
# Run this cell
print(x)

Hello, World!


A nice thing about Jupyter is that if you put a variable in the **last line** of a code cell and run it, Jupyter will output that variable.

In [45]:
# Run this cell
x

'Hello, World!'

This also demonstrates a fundamental principle in Jupyter - **all your code exists in the same world**. This means if you assign a value to `x` in one cell, and reference `x` in another cell, that will work because both cells exist in the same world.

#### Things to keep in mind

As you begin coding, here are some important ideas to keep in mind. You can try testing these out in the code block below.

##### Case Sensitivity

Everything in Python is case-sensitive. Running `print(X)` will give you an error, because you haven't yet named a location in memory `X`. Running `Print(x)` will also give you an error, because Python's function is named `print`, with a lowercase `p`.
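As a minimal sketch of case sensitivity, the cell below uses two hypothetical variable names, `note` and `Note`, which are not part of the examples in this guide. Because the names differ in capitalization, Python treats them as two separate variables:

```python
# `note` and `Note` are hypothetical names chosen for illustration
note = "C"
Note = "G"

print(note)           # the value stored in lowercase `note`
print(Note)           # the value stored in capitalized `Note`
print(note == Note)   # False: the two variables hold different values
```

In practice, it is usually clearer to avoid names that differ only in capitalization.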

##### Nothing Counts Unless You Run It

You might decide you want to output `"got it"` in the cell below, after you correct the capitalization of the variable `x`. If you go to the earlier cell where we first assigned a value to `x` and change that value to `"got it"`, you have to **run that cell**. Only *then* can you return to the cell below, run it, and get the new output `"got it"`. This can become difficult to keep track of with larger files - if Python ever acts like code you wrote doesn't exist, first make sure you've run all the code you've written!

In [46]:
# Experiment with the below issues
print(X)    # How can this be made to run? And how can you change the output value?

NameError: name 'X' is not defined

### Data Types

Let's say you're doing a project on the Beatles, and you're collecting some data about them. You already know how to create a variable for the band name - `"The Beatles"`, just like `"Hello, World!"`, is text. But you'll want to store other types of data, too. To begin, consider:

1. Name of band 
2. Number of albums released
3. Number of years the band was together
4. Whether or not they are in the Hall of Fame (HOF)
5. List of all band members

You could simply store all of these as text - for example, `band_members = "John, Paul, George, and Ringo"`. But Python has simple, built-in ways to express different types of data.

In [None]:
# Run this cell
band_name = "The Beatles"   # string
albums_released = 13        # integer
years_together = 7.6        # float
in_hof = True               # boolean
band_members = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"] # list

These are five different data types in Python. Just by writing the data in a specific way (e.g. including quotation marks, a decimal point, or brackets), we tell Python what type of data we want the variable to be recognized as. We can verify that Python understands this using the `type()` function, which outputs the type of the data between the parentheses.

In [None]:
# Run this cell
print(type(band_name))
print(type(albums_released))
print(type(years_together))
print(type(in_hof))
print(type(band_members))

<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'list'>


These are some of the most important data types in Python. Here's a breakdown of what they mean:

#### Strings

*(abbreviated `str`)* A **string** is a sequence of characters (letters, digits, spaces, punctuation) surrounded by quotation marks. The quotation marks can be double `"text"` or single `'text'`, but never mixed `"not this!'`. We use strings for text, from `"a long phrase or paragraph"` to a single character `"c"`, or even an empty string `""`.
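The cell below is a small sketch of these rules. The variable names are made up for illustration, and `len()` is a built-in Python function (not yet introduced in this guide) that counts the characters in a string:

```python
# Double and single quotation marks produce the same string
double_quoted = "Abbey Road"
single_quoted = 'Abbey Road'
print(double_quoted == single_quoted)   # True

# A string can be a single character, or even empty
one_character = "c"
empty = ""
print(len(one_character))   # 1
print(len(empty))           # 0

# Double quotes on the outside let you use an apostrophe inside the text
title = "Sgt. Pepper's Lonely Hearts Club Band"
print(title)
```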

#### Integers

*(abbreviated `int`)* A number that has no decimal or fractional part (e.g. `3`, `0`, `-10`).

#### Floats

*(short for "floating-point number")* A number written with a decimal point. The fractional part can be zero (e.g. `3.14159`, `-7.2`, `5.0`).
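To see the difference between integers and floats, here is a short sketch with two hypothetical variables, `whole` and `decimal`. It assumes Python 3, where the `/` operator always produces a float:

```python
whole = 5        # integer: no decimal point
decimal = 5.0    # float: written with a decimal point

print(type(whole))             # <class 'int'>
print(type(decimal))           # <class 'float'>
print(whole == decimal)        # True: the two numbers are equal in value
print(type(whole + decimal))   # <class 'float'>: mixing the two gives a float
print(type(10 / 2))            # <class 'float'>: division with / gives a float
```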

#### Booleans

*(abbreviated `bool`)* A value that is either `True` or `False`, and must be capitalized exactly as shown. Booleans are written **without** quotation marks -- if quotation marks are included, Python will interpret the variable as a string.
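A quick sketch of the difference the quotation marks make. The name `in_hof_text` is a hypothetical variable added only for comparison:

```python
in_hof = True           # boolean: no quotation marks
in_hof_text = "True"    # string: the quotation marks turn it into text

print(type(in_hof))            # <class 'bool'>
print(type(in_hof_text))       # <class 'str'>
print(in_hof == in_hof_text)   # False: a boolean is not equal to a string
```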

#### Lists

A collection of values, written within brackets, and separated by commas. A list can contain any data type within it, and can contain multiple data types at once. A list can also contain another list (called a **nested list**).

For example:

* `["John", "Paul", "George", "Ringo"]`
* `["Apple", 2024, 0.8, [True, True, False]]`
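The second list above can be explored with a short sketch. It relies on two things this guide has not introduced yet: the built-in `len()` function, and indexing with square brackets, where the first item is at position `0`. The variable name `mixed` is made up for illustration:

```python
mixed = ["Apple", 2024, 0.8, [True, True, False]]

print(len(mixed))        # 4: the nested list counts as a single item
print(type(mixed[0]))    # <class 'str'>
print(type(mixed[3]))    # <class 'list'>
print(mixed[3][2])       # False: the third item of the nested list
```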

### Why do Data Types Matter?

Python comes with a variety of powerful functions that work with our data. But the way these functions work, and whether or not they work at all, depends on what data type Python thinks it's dealing with.

An obvious example is addition: writing `x = 5 + 5` will set `x` to `10`. However, writing `y = "5" + "5"` will set `y` to `"55"` - what's going on here? Try it out:

In [None]:
x = 5 + 5
print(x)

y = "5" + "5"
print(y)

10
55


When you put quotation marks around `"5"`, you're telling Python to treat it as a string. When you tell Python to add strings, it will just combine them in order (e.g. `"code" + " in " + "python"` results in `"code in python"`). This is called **string concatenation**.
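If you need to move between the two behaviors, Python's built-in `int()` and `str()` functions convert a value from one type to the other. A minimal sketch, using a hypothetical variable `text_five`:

```python
text_five = "5"

# Convert the strings to integers to get arithmetic addition
print(int(text_five) + int(text_five))   # 10

# Convert the integer to a string to get concatenation
print(str(5) + text_five)                # 55

# Mixing the two types without converting does not work:
# print(text_five + 5)   # uncommenting this line raises a TypeError
```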

As another example, take the list of the Beatles, `["John", "Paul", "George", "Ringo"]`. Python can perform special operations on lists using something called **methods**.

Say you store this list in the variable `l`:

`l = ["John", "Paul", "George", "Ringo"]`

You can use Python's `reverse()` method on the list like this:

`l.reverse()`

In [None]:
l = ["John", "Paul", "George", "Ringo"]
l.reverse()

l

['Ringo', 'George', 'Paul', 'John']

However, you can't use this on a string, because strings don't have a `reverse()` method:

In [None]:
s = "John, Paul, George, Ringo"
s.reverse()

s

AttributeError: 'str' object has no attribute 'reverse'
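One possible workaround, sketched below, is to turn the string into a list first. This uses two string methods that this guide has not covered: `split()`, which breaks a string into a list at a given separator, and `join()`, which combines a list of strings back into one string. The names `names` and `reversed_text` are hypothetical:

```python
s = "John, Paul, George, Ringo"

# Break the string into a list at every ", "
names = s.split(", ")
names.reverse()           # now that the data is a list, reverse() works
print(names)              # ['Ringo', 'George', 'Paul', 'John']

# Combine the reversed list back into a single string
reversed_text = ", ".join(names)
print(reversed_text)      # Ringo, George, Paul, John
```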

It's important to think about what you want to do with data before choosing how to store it. Python only has certain tools available for certain data types, which makes your choice of data type important.

Let's return to our example project, collecting data about the Beatles. Now that we've established the importance of data types and covered the most important ones, we can go even further.

#### Dictionaries

*(abbreviated `dict`)* For our project, we want to know the full name of each Beatle, when they were born, and what instrument they played. But we don't yet have a good way of storing a group of related data. We could try to use a list:

`john = ["John Lennon", 1940, "guitar"]`

But this isn't particularly helpful. If we're looking at this list, it's not immediately clear what each item in the list is meant to be. **Dictionaries** solve that issue by allowing you to specify a **key** for each **value**.

For the value `"John Lennon"`, we might specify the key `"full_name"` like this:

`"full_name": "John Lennon"`

This is a **key-value pair**. A **dictionary** is a collection of these key-value pairs, surrounded by curly brackets `{` `}` and separated by commas. Here is an implementation of some dictionaries in our example:

In [47]:
# Run this cell
band_name = "The Beatles"   # string
albums_released = 13        # integer
years_together = 7.6        # float
in_hof = True               # boolean
band_members = [
    {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},           # dict
    {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},   # dict
    {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},       # dict
    {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"}             # dict
] # list

# See what happens when you output a dictionary
band_members

[{'full_name': 'John Lennon', 'birth_year': 1940, 'instrument': 'guitar'},
 {'full_name': 'Paul McCartney',
  'birth_year': 1942,
  'instrument': 'bass guitar'},
 {'full_name': 'George Harrison', 'birth_year': 1943, 'instrument': 'guitar'},
 {'full_name': 'Ringo Starr', 'birth_year': 1940, 'instrument': 'drums'}]

As you can see, the list of strings we had previously is now a much richer list of dictionaries, containing much more information. You can also see how the various data types work together: a **list** of **dictionaries**, each containing **strings** and **integers**.
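The keys are also how you get values back out of a dictionary. As a minimal sketch, the cell below stores one band member in a hypothetical variable `john` and looks up values by writing the key in square brackets. The `keys()` method is a standard dictionary method, not something introduced earlier in this guide:

```python
john = {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"}

print(john["full_name"])     # John Lennon
print(john["birth_year"])    # 1940
print(list(john.keys()))     # ['full_name', 'birth_year', 'instrument']
```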

These structures can be nested as deeply as you need - for example, you might decide that each band member should actually have a **list** of every instrument they played, rather than just one.

For example, John Lennon's key-value pair `"instrument": "guitar"` could be replaced by:

`"instruments": ["guitar", "harmonica", "piano", "violin", "trumpet"]`

Just remember to be consistent - it makes your life much easier if every band member's dictionary has data stored in the same way.
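Here is a sketch of what one band member's dictionary might look like with that change, stored in a hypothetical variable `john`. The lookups use square-bracket indexing and the built-in `len()` function, neither of which has been introduced in this guide yet:

```python
john = {
    "full_name": "John Lennon",
    "birth_year": 1940,
    "instruments": ["guitar", "harmonica", "piano", "violin", "trumpet"],
}

print(type(john["instruments"]))   # <class 'list'>
print(len(john["instruments"]))    # 5
print(john["instruments"][0])      # guitar: the first item in the list
```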

Let's make one last big change. We have five variables storing data about the same subject, the Beatles. That seems like a perfect candidate for a dictionary. We can replace these variables with key-value pairs:

In [48]:
# Run this cell
beatles = {
    "band_name": "The Beatles",
    "albums_released": 13,
    "years_together": 7.6,
    "in_hof": True,
    "band_members": [
        {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},
        {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},
        {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},
        {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"}
    ]
}

# See what happens when you output the new dictionary
beatles

{'band_name': 'The Beatles',
 'albums_released': 13,
 'years_together': 7.6,
 'in_hof': True,
 'band_members': [{'full_name': 'John Lennon',
   'birth_year': 1940,
   'instrument': 'guitar'},
  {'full_name': 'Paul McCartney',
   'birth_year': 1942,
   'instrument': 'bass guitar'},
  {'full_name': 'George Harrison', 'birth_year': 1943, 'instrument': 'guitar'},
  {'full_name': 'Ringo Starr', 'birth_year': 1940, 'instrument': 'drums'}]}

Dictionaries are a great way to store a variety of data related to one entity.
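To read a value out of a nested structure like this one, you can chain lookups from the outside in. The sketch below repeats a shortened copy of the `beatles` dictionary so that it runs on its own. The variable `first_member` is hypothetical, and list positions start at `0`:

```python
# Shortened copy of the dictionary, for illustration only
beatles = {
    "band_name": "The Beatles",
    "band_members": [
        {"full_name": "John Lennon", "birth_year": 1940, "instrument": "guitar"},
        {"full_name": "Paul McCartney", "birth_year": 1942, "instrument": "bass guitar"},
        {"full_name": "George Harrison", "birth_year": 1943, "instrument": "guitar"},
        {"full_name": "Ringo Starr", "birth_year": 1940, "instrument": "drums"},
    ],
}

print(beatles["band_name"])                        # The Beatles

# Step 1: get the list of members. Step 2: pick one member. Step 3: pick a key.
first_member = beatles["band_members"][0]
print(first_member["full_name"])                   # John Lennon
print(beatles["band_members"][3]["instrument"])    # drums
```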

You're now familiar with **strings**, **integers**, **floats**, **booleans**, **lists**, and **dictionaries**. With these data types, we've assembled a collection of data related to the Beatles. Now it's time to do something with that data.

For our project, 