# Introduction to Python

_Written by Kevin Anchukaitis, last updated August 18, 2023_

This notebook provides a general introduction to Python functionality.  In future notebooks, we'll look specifically at using and applying other packages and libraries.

## Data Types and Operations

Python is a powerful, object-oriented, open source programming language.  It is one of the most commonly used languages in data science and is increasingly used across academic disciplines and the private sector.  However, if you are accustomed to other languages like MATLAB or R, or are a novice programmer, it can take a while to get used to the Pythonic way of doing things.  This notebook seeks to provide you with a basic introduction to Python with a focus on the syntax and the type of mathematics we'll do in this class.  Other notebooks in this directory describe reading and writing data, and plotting and visualizing data and analyses.

Python is an **interpreted** language (like MATLAB or R), which means you don't need to compile code for it to run.  This allows Python code to be relatively easy to debug, largely portable (e.g. cross-platform), easily readable, and tractable.  The primary trade-off is that it will generally be slower than a compiled language like FORTRAN.  It is also very extensible - a large community has written and curated packages of code that allow you to do a lot with relatively little coding.

Python is also easily usable in Jupyter notebooks (like the one you're using now).  This makes it good for both teaching and sharing your code (and results and figures) with others.  We'll talk more about how to use Jupyter notebooks in the introductory part of the class.

In notebooks and Python code in general, you can comment out lines of code so they aren't read when the code runs. This can be useful for documenting what a line of code does or for troubleshooting or debugging.  To comment out a single line or a part of a line you can use `#`.

To begin with though, Python has the capacity to function as a simple calculator.  You can do arithmetic operations in a straightforward and intuitive way:

In [None]:
3 + 4 # this is a comment. When you run this code cell, you will see the answer displayed below the cell

In [None]:
4/3 # when you run this code cell, you will see the answer displayed below the cell

Like other programming languages, you can also use **variables** to stand in for numbers and perform operations on those:

In [None]:
a = 3
b = 4

a+b  # when you run this code cell, you will see the answer displayed below the cell

In [None]:
b/a

In [None]:
b * a

You can also do comparisons (more on this later in the notebook):

In [None]:
a < b

In [None]:
a == b

You can also assign the answer to an equation with variables (or numbers) to another variable.  Here we also use the `print` function to have the result displayed after the code block:

In [None]:
c = b * a
print(c) # when you assign the answer to an equation to a variable, the answer doesn't automatically print to the notebook

Python also allows you to assign multiple values at once.  For example:

In [None]:
x, y, z = 1, 2, 3
print(x)
print(y)
print(z)

You can also return multiple values on a single line in a similar way:

In [None]:
x, y # will return (1, 2) as output from this cell

There are some other rather sophisticated ways you can assign values to variables.  Here is one way you might encounter:

You can use the `+=` assignment operator to increase the current variable by a certain amount.  For instance, let's say I have a variable `x` and want to add 3 to it, but keep that number assigned to `x`.  See what happens:

In [None]:
x = 4 # x is equal to 4
x += 3 # equivalent to x = x + 3 = 7
print(x) # x will now be equal to 7 because of the '+=' used above

In general, using an operator (+, -, /, *, etc) with an equal sign operates this way to change the value of the original variable.  You can see a full list of these assignment operators here: [https://www.w3schools.com/python/gloss_python_assignment_operators.asp](https://www.w3schools.com/python/gloss_python_assignment_operators.asp)
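
As a small sketch of that pattern with other operators (the variable name `total` and its values are made up for illustration):

```python
# A few more augmented assignment operators (illustrative values only)
total = 10
total -= 4     # same as total = total - 4
print(total)   # 6

total *= 3     # same as total = total * 3
print(total)   # 18

total /= 2     # same as total = total / 2; the / operator always gives a float
print(total)   # 9.0

total **= 2    # same as total = total ** 2
print(total)   # 81.0
```

Note that the variable quietly changed from an `int` to a `float` once division was involved.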

There are lots of **_mathematical_** operations you can do, not just addition, subtraction, division, and multiplication as above.  In base Python, for instance, calculating an exponent is done with the '**' operator:

In [None]:
2**2 # raise 2 to the power of 2

You can see mathematical operators, comparisons, and other possible simple operations you can do with numbers and variables here: https://docs.python.org/3/library/stdtypes.html
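
A brief sketch of a few of those other operations follows. The numbers are arbitrary, and `math` is a module from Python's standard library that has not been introduced above:

```python
import math  # standard-library module with common mathematical functions

print(7 // 2)             # floor division: 3
print(7 % 2)              # remainder (modulo): 1
print(abs(-4.5))          # absolute value: 4.5
print(round(3.14159, 2))  # round to 2 decimal places: 3.14
print(math.sqrt(16))      # square root: 4.0
```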

## Data Types

Data types in Python are important to understand, as each has a set of properties that determine how they can be used and how they interact.  The basic data types are:

* `int`: Integers, or whole numbers
* `float`: Floating point numbers, or real numbers (those with decimals)
* `str`: Strings, or text data
* `list`: A sequence of values with specific properties 
* `tuple`: Another sequence of values with specific properties different from a `list` 
* `dict`: Dictionaries, which map a given `key` to a given `value`
* `bool`: Boolean values - essentially, True or False 

So far in this notebook we've looked at the first two in the list, which we've simply viewed as individual numbers, or what we might call **numeric data types**, and some basic ways you can operate on them.  Numeric data types include integers (`int`) as well as floats (`float`, or floating point numbers, e.g. numbers with decimals).  They can also be complex numbers, although these will seldom occur in this class.

You can see the type of a variable using the `type` function:


In [None]:
this_number = 1.333
type(this_number) # returns 'float' because the number is a floating point precision number (number with decimals)


In [None]:
that_integer = 7
type(that_integer) # returns 'int' because 7 is an integer
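
It can help to see how the numeric types interact. The following is a minimal sketch with made-up variable names (`whole`, `fraction`, `mixed`), using the built-in `int()` and `float()` conversion functions:

```python
whole = 7        # an int
fraction = 0.5   # a float

mixed = whole + fraction
print(mixed, type(mixed))  # 7.5 <class 'float'>: int + float gives a float

print(int(7.9))      # 7: int() truncates toward zero, it does not round
print(float(7))      # 7.0
print(type(4 / 2))   # <class 'float'>: the / operator always returns a float
print(type(2 + 3j))  # <class 'complex'>: j marks the imaginary part
```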


Let's now spend some time looking at other data types we'll use.

### Using strings in Python

We can also have strings, or sequences of characters.  In Python you can assign strings to a variable using either single quote marks or double:

In [None]:
d = "a string using a double quote"
f= 'a string using a single quote'
print(d)
print(f)

As with floats and integers, you can use `type` to see the type of a string variable:

In [None]:
type(d)

You can also return both the value and the type on a single line:

In [None]:
d, type(d)

As you can see above, either single or double quotes work when we create strings in Python.  Double quotes, however, are useful if you anticipate that your string might contain single quote marks (apostrophes), since an apostrophe would otherwise end a single-quoted string early. For example:

In [None]:
my_string = "We're looking for the person/persons responsible"
print(my_string)
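
The choice of quote marks takes care of apostrophes, but backslashes are a separate matter: inside a normal string, a backslash starts an *escape sequence*, whichever quotes you use. Here is a short sketch (the variable names are only for illustration):

```python
apostrophe = "We're here"            # double quotes let the apostrophe through
escaped_quote = 'We\'re here'        # or escape the apostrophe inside single quotes
print(apostrophe == escaped_quote)   # True

two_lines = "first\nsecond"          # \n is the escape sequence for a new line
print(two_lines)

literal_backslash = "person\\persons"      # \\ produces one literal backslash
raw_backslash = r"person\persons"          # raw string: backslashes kept as typed
print(literal_backslash == raw_backslash)  # True
```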

You can concatenate strings thusly:

In [None]:
print(d,"concatenated with",f) # concatenate with commas
print(d+" concatenated with "+f) # concatenate with plus signs and white space
print(d,'concatenated with',f) # concatenate with commas and a single quote
print(d+' concatenated with '+f)  # concatenate with plus signs and single quotes and white space


You can't concatenate strings and numbers directly.  But, you could convert a number to string using `str()`:

In [None]:
my_age = 47
"You are " + str(my_age)

Another way to do something similar is with **f-strings**.  f-strings allow you to do string interpolation, that is, to insert the value of a variable directly into a string:

In [None]:
f"You are {my_age}! That's really old!"


This approach provides great flexibility and simpler formatting when creating longer strings:

In [None]:
your_age = 29
f"You are {my_age}! I am only {your_age}!"


f-strings represent an improvement on the way Python used to handle string formatting and concatenation.  You can see a useful tutorial here: [https://realpython.com/python-f-strings/](https://realpython.com/python-f-strings/)
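
f-strings can also control how a number is displayed. The sketch below uses two standard format specifiers, with arbitrary example values:

```python
ratio = 4 / 3
count = 1234567

print(f"ratio = {ratio:.2f}")  # two decimal places: ratio = 1.33
print(f"count = {count:,}")    # thousands separators: count = 1,234,567
print(f"sum = {3 + 4}")        # any expression can go inside the braces: sum = 7
```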

Because strings are a particular data type in Python, there is a set of operations that you can perform on them that wouldn't make sense if applied, for instance, to an integer.  For example, we can ask Python to show our string in all upper case:

In [None]:
d.upper()

In [None]:
d.swapcase() # swap the current case for the other case

### An aside on functions, methods, and objects

You'll notice that we've called functions in two different ways thus far.  For instance, we called `type(d)` to return the data type of variable `d`.  In the code cells above, we operated on our string variable `d` by using the `d.upper()` and `d.swapcase()` syntax.  What is going on?

The main difference between these two ways of calling functions is that some functions are specific to a particular data type, so they are attached directly to variables of that type. For example, the `upper()` function would not make sense for any data type other than strings, so Python attaches it to the string data type specifically and directly. This is in contrast to something like the `type` function, which can take *any* data type as input, so it is left as a generic function.  In general, when we call something in `variable.function()` form, we will call this a function call just like when we do `function(variable)`, although you will also see the `variable.function()` form called a method.  We'll see more of this as we go along.
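
A minimal sketch of the distinction, using the built-in `len` function as a second example of a generic function (it has not been introduced above):

```python
d = "a string using a double quote"

print(len(d))          # len() is generic: it accepts strings ...
print(len([1, 2, 3]))  # ... and lists, among other types: 3

print(d.upper())       # upper() is a method attached to strings
print(str.upper(d))    # the same method, reached through the str type itself

print(hasattr(7, "upper"))  # False: integers simply have no upper() method
```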

### Using Lists in Python

The **list** is another common data type, which can hold more than one value at a time and allow us to access (or index into) these.  Lists can hold numbers or strings.  Let's look at a simple example - note that we use **square brackets** to indicate the new list:

In [None]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
fruits # just typing the list name here will show the list in the notebook output

In [None]:
print(fruits) # accomplishes the same thing as above

In [None]:
type(fruits) # this will tell us what type of data structure fruits is ...

Because Python is an object-oriented language, any [data type](https://docs.python.org/3/library/stdtypes.html#) or class (which in Python are now essentially the same thing) may have a set of methods that are associated with that data type.  Let's continue using our list called `fruits` to look at some examples of how this works.

For instance, the `list` class (which is what `fruits` is) has several methods you can use (see [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)), including one called `count`.  For instance, if we wanted to know how many times the string _apple_ appears in the list fruits, we could type the following command:

In [None]:
fruits.count('apple') # will return the number of times 'apple' appears in the list 'fruits'

Python also allows us to locate the position of a value in a list.  Let's see if we can find the location of the first instance of 'banana' using the `.index` method:

In [None]:
fruits.index('banana') # returns the position of the first instance of banana in the list 'fruits'

**Whoa**, **wait**.  If we count the number of words in the 'fruits' list until we find the first instance of 'banana' we would likely count it as being in position 4, not 3.  What's going on here? 

### An aside on zero-based counting in Python

The answer is that Python is a zero-index language.  Instead of starting to count at 1, Python counts starting at zero. So in the 'fruits' list, orange is in position 0 (the 0th position), apple is in position 1, pear is in position 2, and banana is in position 3!

Yes, this is really counterintuitive and I personally still struggle with it (especially coming from MATLAB and FORTRAN which start numbering with 1).  The reason for zero indexing is largely [mathematically motivated](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html), where describing e.g. ranges of values is more naturally thought about with respect to zero, whereas index schemes that start at 1 are more typical of counting things.  Unfortunately, most of us do tend to think of numbers as primarily for counting ('I have 2 eggs', 'Take the 3rd right turn').  See discussion and links here: https://craftofcoding.wordpress.com/2017/03/12/why-1-based-indexing-is-ok/

This is definitely a challenge to using Python - we will encounter a number of non-intuitive consequences of this aspect of Python during the class. 

The only intuitive way of thinking about this I can offer you at this point is this:  Imagine Python indexing as describing the solution to the following description: 'Once I am in a starting position, how far do I have to go to get to the result I'm looking for?'.  In the above example, if you start in the first position (orange), you need to move 3 positions to get to banana (or 'Starting at orange, I need to move three positions, 1st to apple, 2nd to pear, and 3rd to banana').  You could also think of how we talk about birthdays - e.g. your 1st birthday is actually the anniversary of your birth day (your 0th birthday).
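
One way to see the positions Python assigns is to print each item next to its index. This sketch uses the built-in `enumerate` function, which has not been introduced above, and writes a simple loop before loops have been formally covered:

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# enumerate pairs each item with its zero-based position
for position, fruit in enumerate(fruits):
    print(position, fruit)  # the first line printed is: 0 orange

print(len(fruits))  # 7 items, so the valid positions run from 0 to 6
```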

### Now back to lists

In Python, you can specify a specific element or elements of a data structure using _indexing_, but make sure you obey the zero-indexing rules of Python.

One example of this is to specify the position or location in a list - that is, we can ask Python to reveal the value of the first, second, third, etc. position in a list.  Just remember, Python starts counting at zero!

So, if `fruits` is currently a list of strings with these values:

['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

... let's ask Python to show us the string values in particular positions in the list:

In [None]:
fruits[0] # in Python, square brackets allow you to specify the elements of a data structure by position in the structure

In [None]:
fruits[2]  # in Python, square brackets allow you to specify the elements of a data structure by position in the structure

### An aside on indexing in Python

One of the most common things you'll need to do with lists (or other sequences) is get individual items out of them. To do so you need to "index" into the sequence, which basically means to point to the location in the sequence where the item or items that you want are. As discussed above, _**Python is a zero-based language,**_ so the first item in a sequence type has an index of 0 (the 0th position or location). To index into a sequence you use **square brackets** with integer value(s) inside. See below for examples of how to get items out of an example list.

A common pitfall is confusing indexing and function calls. Indexing uses square brackets [] and function calls use parentheses ().  Of course, creating a list _also_ uses square brackets ... If you try to use parentheses for indexing when you should have used brackets, you will receive an error.
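
Here is a sketch of what that error looks like. The list `letters` is made up for this example, and the `try`/`except` construct (not covered above) is used only so the cell keeps running after the error:

```python
letters = ['a', 'b', 'c']

print(letters[0])  # square brackets index into the list: a

try:
    letters(0)     # parentheses try to *call* the list as if it were a function
except TypeError as err:
    caught = str(err)
    print("TypeError:", caught)
```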

Let's look a bit more at indexing, before we return to the other properties of lists.  First, let's create a simple list for us to use:

In [None]:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list[0]

We can return multiple index values on the same line, e.g.:

In [None]:
my_list[0], my_list[3]

We can ask for values in other ways as well.  For instance, if we use an index of `-1` we will get the last item in the list:

In [None]:
my_list[-1] # in general, using the negative means to count backwards from the end of the sequence

You can of course get more than a single value out of a sequence.  The concept of getting some sub-sequence out of a sequence is commonly referred to as `slicing`. In Python, the syntax for slicing is `sequence_variable[start:stop:step]` where `start`, `stop`, and `step` are all (optional) integer values. There are some examples below.


Indexing is where the zero-based nature of Python gets really confusing.  It might help to visualize `my_list` and the zero-based indexing using the following figure:



![sequence_visualized.jpeg](attachment:sequence_visualized.jpeg)

Using the above figure as a guide, here are several approaches to indexing that might be useful and instructive in understanding Python's approach to this critical programming feature:

In [None]:
print(my_list[:])        # Get everything in the list
print(my_list[5:])       # Get the item at index 5 and everything after it
print(my_list[:5])       # Get everything up to but NOT including index 5
print(my_list[3:5])      # Get the items at index 3 and index 4
print(my_list[0:None])   # Get everything by starting at zero and going to the end
print(my_list[0:None:2]) # Get every other item, starting at index 0
print(my_list[1::3])     # Get every 3rd item, starting at index 1

Note how the numbers in the slice specifiers can be optional - for instance `my_list[5:]` basically says 'start at position 5 and go (:) to the end'.  `my_list[1::3]` essentially says 'start from position 1, go (:) to the end, and step by 3'. 

You'll notice some oddities above though.  Specifically, notice how `my_list[3:5]` only returns the values at index 3 and 4, but not 5.  Or that `my_list[:5]` doesn't include the value at index position 5.  You'll encounter this throughout Python - the stop value of a slice or range is not itself included.  This convention goes hand in hand with zero-based indexing, but it is deeply counterintuitive to me.  It may be easier to think about index numbers not as indicating a position in this case, but rather 'cut points':

![image.png](attachment:image.png)


In this way of visualizing indexing, you could think of [0:1] as implying 'start at 0 and cut at 1', which would give you a list containing only the first value, 1:

In [None]:
print(my_list[0:1])    
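
The 'cut points' picture has some convenient consequences, sketched below with the same `my_list` (the names `start`, `stop`, and `piece` are just for illustration):

```python
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

start, stop = 3, 5
piece = my_list[start:stop]
print(piece)                       # [4, 5]
print(len(piece) == stop - start)  # True: the length is simply stop minus start

# Cutting at the same point twice never duplicates or drops an item
print(my_list[:5] + my_list[5:] == my_list)  # True

# A slice always returns a list, even when it holds a single item
print(my_list[0:1], my_list[0])    # [1] 1
```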


Indexing is an incredibly important part of a programming language, and unfortunately Python's zero-based nature can make this more difficult (and dangerous) than it ought to be.  Especially as a beginner, whenever you encounter an indexing (or indexing-like) situation, make sure you are 100% sure what values Python is using. 

### Now, back to lists


Another somewhat non-intuitive and potentially `dangerous` aspect of Python is that applying a method to a data object can _sometimes_ change that object.  Look at the following method applied to 'fruits':

In [None]:
fruits.reverse() # this changes the order of 'fruits'
fruits

Note that the reversal of the order of the list above doesn't just _show_ you the reverse (e.g. it doesn't just print it to the screen or give you the opportunity to assign the reverse to a new variable, as MATLAB would) - it actually **applies** the reverse method to the existing list `fruits` and so changes the list itself from that point onward!

Also observe the empty parentheses in the `fruits.reverse()` command.  Empty parentheses indicate that the method or function you are calling either takes no additional input arguments at all or it has some default arguments or behavior that are not necessary to specify.

Let's look at a similar method, `.sort` - you can call this method without any additional input:

In [None]:
fruits.sort()
fruits

Now your list is sorted (alphabetically, from A to Z).  What if we wanted to sort a list in the opposite order, however, starting with Z and going to A?  It turns out there is a parameter you can pass, `reverse=True`, that will sort in descending order:

In [None]:
fruits.sort(reverse=True)
fruits

See how we passed the parameter option to the method - Python uses this form of 'option_name = value' to pass instructions to methods and to overrule the default behavior.  
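
If you want a sorted copy while leaving the original list untouched, the built-in `sorted` function (not introduced above) is one option. This sketch uses a made-up list called `colors` so that `fruits` is not affected:

```python
colors = ['red', 'blue', 'green']

alphabetical = sorted(colors)  # built-in function: returns a NEW list
print(alphabetical)            # ['blue', 'green', 'red']
print(colors)                  # ['red', 'blue', 'green'] (unchanged)

backwards = sorted(colors, reverse=True)  # accepts the same keyword option
print(backwards)               # ['red', 'green', 'blue']

result = colors.sort()         # method: sorts the list in place ...
print(result)                  # None (... and returns nothing useful)
print(colors)                  # ['blue', 'green', 'red']
```

A common mistake is writing `colors = colors.sort()`, which replaces the list with `None`.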

Lists are **mutable** in Python, meaning that we can add, remove, and change their elements. So, we can append new values to our list using `.append`:

In [None]:
fruits.append('pineapple')
fruits

We can also change elements of a list by referencing their position (using square brackets) and using the equal sign to set that position to a new value:

In [None]:
fruits[1] = 'mango'
fruits
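
A few other list methods that add or remove elements are sketched below, again on a made-up list called `colors` so that `fruits` is left alone:

```python
colors = ['red', 'green', 'blue']

colors.insert(1, 'yellow')  # put 'yellow' at index 1, shifting the rest right
print(colors)               # ['red', 'yellow', 'green', 'blue']

colors.remove('green')      # remove the first item equal to 'green'
print(colors)               # ['red', 'yellow', 'blue']

last = colors.pop()         # remove AND return the last item
print(last, colors)         # blue ['red', 'yellow']
```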

### Tuples? What are tuples? 

In addition to **lists**, there is another important Python data type called a **tuple** (**tuh**-pl).  The word comes from mathematics, where it refers to a (finite) sequence or ordered list of numbers.  A tuple is therefore similar to a list, but has different properties and methods. Generally speaking, a tuple is a **structured** sequence of values.  Whereas lists are typically used as collections of individual elements of the same kind, tuples usually hold multiple pieces of heterogeneous information that are related to one another.

A tuple can be created by placing all these related elements inside **parentheses** (), separated by commas:

In [None]:
us_president = ('Joe', 'Biden', '2021-01-20', 'Democratic')
type(us_president)

We can still access elements of a tuple in a familiar way (keeping in mind the oddity of zero-indexing):

In [None]:
print(us_president[2])
print(us_president[0], us_president[1]) # print already puts a space between the items


Unlike lists, tuples are _immutable_. This data structure doesn’t allow changing, adding, or removing individual elements, because the elements are all linked.  So, running `us_president[0] = 'Donald'` would result in an error.  
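
A sketch of that error, together with tuple *unpacking* (the same multiple-assignment idea used earlier in this notebook). The `try`/`except` construct is used here only so the cell keeps running after the error:

```python
us_president = ('Joe', 'Biden', '2021-01-20', 'Democratic')

try:
    us_president[0] = 'Donald'  # tuples do not allow item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Unpacking assigns each element of the tuple to its own name
first, last, inaugurated, party = us_president
print(first, last)              # Joe Biden
```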

### Lists or tuples? 

Tuples, as far as I know, don't exist in most other programming languages.  So you're probably asking 'What is the difference?' and eventually you'll ask 'Should I be using a list or a tuple?'.

To be honest, I'm still working through these questions myself. 

There are some good resources on Stack Overflow:

List vs tuple, when to use each?
https://stackoverflow.com/questions/1708510/list-vs-tuple-when-to-use-each

What's the difference between lists and tuples?
https://stackoverflow.com/questions/626759/whats-the-difference-between-lists-and-tuples

From those posts, a few points:

* By convention, tuples hold heterogeneous data, while lists hold homogeneous sequences (Python itself does not enforce this).
* 'Tuples have structure, lists have order'
* You can't add elements to a tuple. Tuples have no `append` method.
* You can't remove elements from a tuple. Tuples have no `remove` method.
* Tuples typically use a little less memory than lists and can be slightly faster to create
* Because they are immutable, tuples can act as keys in a dictionary (see below)
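
The last point can be sketched as follows, looking ahead to the dictionary section below. The grid coordinates and labels are invented purely for illustration:

```python
# Hypothetical example: (row, column) tuples used as dictionary keys
grid_labels = {(0, 0): 'origin', (2, 3): 'station A'}
print(grid_labels[(2, 3)])   # station A

try:
    bad = {[0, 0]: 'origin'} # a list is mutable, so it cannot be used as a key
except TypeError as err:
    key_message = str(err)
    print("TypeError:", key_message)
```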

### Using sets in Python

Python also has a data structure called a **set**.  Sets are again similar to lists and tuples, but also have their own rules and characteristics.  Sets are unordered and cannot contain duplicates (even if you have 30 entries for the same thing, the resulting set will only have a single instance of that thing), and they are defined using curly braces {}:

In [None]:
animals = {'penguin', 'cat', 'dog', 'lion', 'lion'}
animals

Sets are mutable like lists, allowing us to add and remove their elements, but the way of doing so is somewhat different, because sets are _unordered_:

In [None]:
animals.add('jaguar')
animals # note that jaguar wasn't appended to the end, like happened when we added to a list
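
Sets also support the usual operations from mathematics. A minimal sketch, using made-up sets `zoo` and `pets` so that `animals` is not modified:

```python
zoo = {'penguin', 'cat', 'dog', 'lion', 'lion'}
print(len(zoo))                      # 4: the duplicate 'lion' is stored only once

pets = {'cat', 'dog', 'hamster'}
print(zoo & pets == {'cat', 'dog'})  # & is the intersection: True
print('hamster' in (zoo | pets))     # | is the union: True

zoo.discard('penguin')               # remove an element (no error if it is absent)
print('penguin' in zoo)              # False

# Converting a list to a set drops duplicates (and discards the order)
print(set([1, 2, 2, 3, 3, 3]) == {1, 2, 3})  # True
```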

### Dictionaries in Python

Whereas sets are unordered collections of individual values, **dictionaries** (`dict`) are collections of data stored as _**key-value pairs**_ (since Python 3.7, a dictionary also remembers the order in which its keys were inserted).  The keys have to be unique within that dictionary.  For instance, let's say we want to have a data structure where each person is associated with a number (like your student number):

In [None]:
student_numbers = {'jane': 1234, 'bill': 5678}
print(type(student_numbers)) # only the last line of a cell is displayed automatically, so we use print here
print(student_numbers)

One of the nice things about a dictionary is we can retrieve values by the key they are associated with:

In [None]:
student_numbers['bill']

You can also build dictionaries explicitly with the `dict` command and a collection of key-value pairs: 

In [None]:
dict([('jane', 1234), ('bill', 5678)])

Or using simple equal signs to indicate the key-value pair:

In [None]:
dict(jane=1234,bill=5678)

Because dictionaries are data types with specific methods that make sense, you can query the keys or values.  For instance:

In [None]:
student_numbers.keys() # returns the keys in the dictionary

In [None]:
student_numbers.values() # returns the values in the dictionary
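
Dictionaries are mutable, so entries can be added or changed after creation. The sketch below uses invented names and numbers ('ana', 'omar') and a simple loop, which has not been formally covered above:

```python
student_numbers = {'jane': 1234, 'bill': 5678}

student_numbers['ana'] = 9012   # add a new key-value pair
student_numbers['bill'] = 5679  # overwrite the value for an existing key

print(student_numbers.get('omar'))     # None: .get avoids an error for a missing key
print(student_numbers.get('omar', 0))  # 0: an optional default value

for name, number in student_numbers.items():  # loop over key-value pairs
    print(name, number)

print('jane' in student_numbers)  # True: 'in' checks the keys
```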

### Booleans and Comparisons

Booleans (`bool`) are binary variables that can take 2 values: `True` or `False` (note here the capitalization). One of the ways we'll use these is to check if certain conditions are met.  For example, if we are interested in equalities or inequalities between numeric data, the answer to a comparison between values will be a boolean.  We can also use the same approach to see if, for instance, a certain word is contained in a list of strings.

Here is a simple example:

In [None]:
x = 3
y = 21
x < y # will return True because 3 is indeed less than 21

In [None]:
x == y # will return False because 3 is not equal to 21

Here are some of the comparisons you might find yourself making:

* `==`: equal to
* `!=`: not equal to
* `<`: less than
* `<=`: less than or equal to
* `>`: greater than
* `>=`: greater than or equal to
  

Python also can use logical operators:

* `and`: returns True if both or all statements are True
* `or`: returns True if any statements are True
* `not`: returns True only if the statement is False
   
Let's see how this works in practice:

In [None]:
x = 3
y = 21

print((x<y) and (x==y)) # clearly False

In [None]:
print((x<y) or (x==y)) # True because the first statement is True
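
The `not` operator from the list above, plus Python's chained comparisons, can be sketched in the same way:

```python
x = 3
y = 21

print(not (x == y))  # True: x == y is False, and not flips it
print(x != y)        # True: the same test written with !=
print(0 < x < y)     # True: chained comparison, same as (0 < x) and (x < y)
print(type(x < y))   # <class 'bool'>: a comparison produces a boolean
```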

We can also use similar operations on strings, for instance to find out whether a specific word exists in a string:

In [None]:
my_string = 'The quick brown fox jumped over the lazy dog'
'dog' in my_string # returns True because the word dog is indeed in the string my_string

In [None]:
'cat' in my_string # returns False because 'cat' is not found in my_string
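
The same `in` operator works for checking whether a value is contained in a list of strings, as mentioned at the start of this section. A short sketch with a made-up list called `basket`:

```python
basket = ['orange', 'apple', 'pear']
print('apple' in basket)      # True
print('grape' not in basket)  # True: 'not in' is the opposite test

# For strings, 'in' matches any substring, not only whole words
print('the' in 'other')       # True
print('Dog' in 'lazy dog')    # False: the test is case-sensitive
```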