# MIT-WHOI JP 2025 Summer Math Review: Introduction to Python
## *Created by Zoe Aarons, Fadime Stemmer*
##### *Last Modified: 07-16-2025*
This notebook utilizes resources from the Environmental Bioinformatics course by Harriet Alexander, Maria Pachiadaki, Carolyn Tepolt, and Arianna Krinos. 

## Welcome to Python!
### This is a Jupyter Notebook that will walk us through the basics of Python, data science applications, and relevant libraries.

Python is an open source, object-oriented programming language with useful applications in data science, numerical modeling, GIS, and developing open source software.

## Python Basics
### Print Statements and Variables 


You can print a string or integer with the command print():

In [1]:
# Print a string by specifying with '':
print('Hello World')

# Print an integer by typing the number without '':
print(2025)

Hello World
2025


Unlike some other programming languages, you do not need to declare what type of data a variable is going to hold. Any data type can be assigned to a variable with `=`, and the same variable can later be reassigned to a value of a different type. For example:

In [2]:
my_name = 'Poseidon'
blue = 'red'
apple = 5

Variables persist between cells (until they are actively changed by assigning a new value). So, when you set a variable in one cell it is going to have the same value further down.

In [3]:
# print both our variables
print(my_name)
print(apple)

Poseidon
5


We can also assign and print variables of different data types (together):

In [4]:
# Combine strings & variables in a print statement with commas
print('The god of the ocean is:', my_name)

The god of the ocean is: Poseidon


Strings can also be indexed, meaning we can select a single letter or several letters from a string by giving the position in square brackets [ ]. A slice `[start:stop]` includes the position `start` but stops just before `stop`. **NOTE: Indexing in Python starts at 0, so if you want to grab the 1st letter, you will need to write 0 inside the brackets.**

In [5]:
# index the 3rd letter of the variable my_name
print(my_name[2])

# index the first 4 letters of the variable my_name
print(my_name[0:4])

# we can also use negative numbers to index in reverse
print(my_name[-2:])

s
Pose
on
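
Slices can also take an optional step as a third value, `[start:stop:step]`. The short sketch below uses an illustrative variable `ocean_god` (not one of the variables defined above) to show a few variations:

```python
# illustrative example string (hypothetical variable name)
ocean_god = 'Poseidon'

# every second letter, starting from the first
every_other = ocean_god[::2]
print(every_other)

# the whole string reversed (a step of -1 walks backwards)
reversed_name = ocean_god[::-1]
print(reversed_name)

# the last letter on its own
last_letter = ocean_god[-1]
print(last_letter)
```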


Sometimes it can be useful to identify the data type (especially for debugging). You can do that with the `type` function. This is especially helpful when using numeric values, as Python can be a bit finicky about the difference between integers and floats.

In [6]:
print(type(my_name))
print(type(apple))

<class 'str'>
<class 'int'>
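
As a small sketch of the integer/float distinction mentioned above (the variable names here are made up for illustration), the cell below shows that regular division returns a float and that `int()` and `float()` convert between the two types:

```python
whole = 7          # an integer
decimal = 7.0      # a float

# regular division always returns a float, even for two integers
quotient = 6 / 3
print(quotient, type(quotient))

# int() and float() convert between the two types
# note: int() truncates toward zero, it does not round
truncated = int(3.9)
as_float = float(whole)
print(truncated, as_float)

# an integer and a float with the same value compare as equal
same_value = (whole == decimal)
print(same_value)
```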


Variables can also be used in any calculation you want. For example:

In [7]:
favorite_number = 24
favorite_number_squared = favorite_number ** 2

You can also take information and pass it into another text string that is printed:

In [8]:
print("My favorite number is" , favorite_number, "that number squared is:", favorite_number_squared)

My favorite number is 24 that number squared is: 576


Or alternatively using the `.format()` method. (NOTE: `.format()` is a method that is specific to strings. Methods are like functions but are tied to specific data types.)


In [9]:
print("My favorite number is {} that number squared is {}".format(favorite_number, favorite_number_squared))

My favorite number is 24 that number squared is 576
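
Since Python 3.6 there is a third option, the f-string, which places the variable names directly inside the { } fields. The sketch below redefines the two variables so that it runs on its own; the `ratio` line is an extra illustration of a format specifier and is not part of the original example:

```python
favorite_number = 24
favorite_number_squared = favorite_number ** 2

# prefix the string with f and put variable names (or expressions) inside { }
message = f"My favorite number is {favorite_number} and that number squared is {favorite_number_squared}"
print(message)

# a format specifier after a colon controls how numbers are displayed
ratio = f"{favorite_number / 7:.2f}"  # two decimal places
print(ratio)
```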


You can also create user-defined variables with the `input` function: 

In [10]:
favorite_number = input('What is your favorite number?')

# the + sign concatenates strings
print('My favorite number is ' + favorite_number + '!')

What is your favorite number?
My favorite number is !
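
Keep in mind that `input` always returns a string, even when the user types a number. The sketch below uses a hard-coded string `user_text` as a stand-in for what `input` would return (so the cell runs without prompting) to show why a conversion is needed before doing math:

```python
# stand-in for the value returned by input(); hypothetical, for illustration only
user_text = '24'
print(type(user_text))

# multiplying a string repeats it rather than doing arithmetic
repeated = user_text * 2
print(repeated)

# convert to an integer first to get the numeric result
doubled = int(user_text) * 2
print(doubled)
```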


#### Basic Python operators
|Sign | Description |
|--- | ---  |
|+ | addition |
|- | subtraction|
|* | multiplication|
|/ | division|
|// | floor division (rounds the result down to a whole number)|
|** | exponentiation|
|% | modulus (aka the remainder in division)|

You can find more information about Python operators & variable assignment [here](https://www.educative.io/blog/object-oriented-programming).
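
A quick sketch of the less familiar operators from the table, using two arbitrary example values `a` and `b`:

```python
a = 17
b = 5

print(a / b)    # division
print(a // b)   # floor division
print(a % b)    # modulus (remainder)
print(a ** 2)   # exponentiation

# floor division and modulus fit together: a == (a // b) * b + (a % b)
reconstructed = (a // b) * b + (a % b)
print(reconstructed)
```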

### Libraries
One of the best features of Python is its libraries: packages, usually open source, that you can import in order to use their built-in functions. There are likely thousands out there, ranging from simple mathematical operations all the way to complex plotting & machine learning algorithms. Below I've outlined a couple of libraries I've found useful when working with Python. Many of these are also foundational libraries that are used often:

#### Click on the Library Name for a link to their website

__[numpy](https://numpy.org/):__ basic arithmetic operations all the way to complex mathematical functions

__[matplotlib](https://matplotlib.org/):__ basic plotting library *(import as matplotlib.pyplot)*

__[pandas](https://pandas.pydata.org/):__ super useful library for data processing and organization!! one of the main reasons to use python

__[seaborn](https://seaborn.pydata.org/#):__  fancier plotting library (aka prettier plots)

__[sklearn](https://scikit-learn.org/stable/):__ machine learning library with learning data sets and useful algorithms

__[cmocean](https://matplotlib.org/cmocean/):__ colormaps for matplotlib designed specifically for commonly measured oceanographic parameters


In order to use libraries in Python, we have to import them into the script. Typically this is done at the very beginning of files, which you will see in the applied examples notebooks.



In [11]:
import numpy as np #the "as" portion of the code allows you to rename a library (so you can type less when using it a lot!)
import matplotlib.pyplot as plt #I am accessing a specific portion of a library using the dot syntax
import pandas as pd

#### Numpy
Let's explore some functionalities of Numpy. We can load in a datafile (.txt) using `np.loadtxt`.

In [15]:
# load a file using numpy
data = np.loadtxt('./random.txt',delimiter=',')
print(data)

[[  10.   20.]
 [  50.   23.]
 [ 100.   33.]
 [ 200.   57.]
 [ 300.  103.]
 [ 400.  178.]
 [ 500.  212.]
 [ 600.  234.]
 [ 700.  240.]
 [ 800.  239.]
 [ 900.  240.]
 [1000.  241.]]


Next let's check some useful mathematical/statistical operations:

In [16]:
# access pi from the numpy library
print(np.pi)

# calculate the mean and standard deviation of a list of values
scores = [98,72,85,53]

# find the mean of the dataset
print(np.mean(scores))

# find the standard deviation of the dataset
print(np.std(scores))

# determine the sum of all elements in the list
np.sum(scores)

3.141592653589793
77.0
16.62828914831589


308
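
One detail worth knowing: by default `np.std` computes the population standard deviation (dividing by N). If you want the sample standard deviation (dividing by N - 1), you can pass `ddof=1`. A minimal sketch, reusing the same scores:

```python
import numpy as np

scores = [98, 72, 85, 53]

# default (ddof=0): population standard deviation, divides by N
population_std = np.std(scores)

# ddof=1: sample standard deviation, divides by N - 1
sample_std = np.std(scores, ddof=1)

print(population_std)
print(sample_std)
```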

There is much more one can do with numpy - it is always useful to check out the documentation to explore functionalities and find out how to use the respective tools. You will practice using the documentation in the exercises.

### Attributes vs. Methods
The variables we have created previously are also called **objects** and are the building blocks for our code. They have inherent **attributes**, or properties about themselves we can access and use to inform our code. We can also perform **methods** on objects, which are a series of operations we can use to manipulate, extract, and/or alter the attributes of the object.

Attributes are accessed with dot notation and no parentheses: `object.attribute`. This returns a specific property of the object, such as the `shape` of a numpy array.

Methods also use dot notation, but are called with parentheses: `object.method()`.

Python also has built-in functions that are not tied to one type of object. They take the object as an argument: `function(object)`.

Some useful built-in functions:

|Function | Description |
|:------------- |:-------------|
|`help()`|Shows documentation describing what a function, method, or object does.|
|`print()`|Prints the string/integer/variable that is specified in ( ).|
|`len()`|Returns the length of a string, list, or other collection as an integer.|

Some useful methods: 

|Method | Description |
| :------------- |:-------------|
|`STRING.format()`| Specific to strings. `.format()` inserts values (of any type) into the fields marked by { } in the string.|
|`LIST.append()`|Adds an item to the end of a list. Only takes one argument.|
|`"string".upper()`|Returns a copy of the string with all letters in uppercase.|
|`"string".lower()`|Returns a copy of the string with all letters in lowercase.|
|`SET_1.union(SET_2)`|Combines sets 1 and 2 without repeating values|
|`SET_1.intersection(SET_2)`|Finds values that appear in both sets|
|`SET_1.difference(SET_2)`|Finds all values that appear in set_1 but not in set_2|
|`SET_1.issubset(SET_2)`|Tests whether every value in set_1 also appears in set_2.|
|`DICTIONARY.keys()`|Returns the keys of the dictionary|
|`DICTIONARY.values()`|Returns the values of the dictionary|
|`STRING.strip()`|Strips whitespace () or specific characters ('CHARACTERS') off both ends of a string|
|`STRING.split()[INDEX]`|Splits a string into a list based on a provided delimiter (whitespace by default). Adding an index selects one element of the resulting list.|
|`STRING.count()`|Counts the number of occurrences of a character or substring within a string|
|`STRING.join(LIST)`|Joins the elements of a list of strings, using the string as the separator|
|`VARIABLE.startswith('string')`|Test whether a string starts with a specified character/set of characters|
|`VARIABLE.endswith('string')`|Test whether a string ends with a specified character/set of characters|
|`STRING.replace('string_to_replace', 'replacement_string')`|Replaces specified character or set of characters with another|
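
A few of these methods are easy to mix up, in particular which object the method is called on. The sketch below uses made-up example values (`words`, `vowels`) to illustrate `join`, `count`, `startswith`, `upper`, and `issubset`:

```python
# hypothetical example values, chosen only for illustration
words = ['sea', 'surface', 'temperature']

# join is called on the separator string and takes a list of strings
joined = '_'.join(words)
print(joined)

# count, startswith, and upper are called on the string itself
e_count = joined.count('e')
starts_with_sea = joined.startswith('sea')
shouted = joined.upper()
print(e_count, starts_with_sea, shouted)

# issubset takes another set (or any iterable) as its single argument
vowels = {'a', 'e', 'i', 'o', 'u'}
is_subset = {'a', 'e'}.issubset(vowels)
print(is_subset)
```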

#### Exercise - String operations
Let's look at some examples of string methods.

In [25]:
# create a string with all the department names:
depts = '#AOPE; BIO; MC&G; MG&G; PO#'
print(depts)

# remove hashtags
depts = depts.strip('#')
print(depts)

# change the semicolons to commas
depts = depts.replace(';',',')
print(depts)

# split the string after each comma
depts = depts.split(', ')

# note that this function returns a list with all the new strings!
print(depts)

#AOPE; BIO; MC&G; MG&G; PO#
AOPE; BIO; MC&G; MG&G; PO
AOPE, BIO, MC&G, MG&G, PO
['AOPE', 'BIO', 'MC&G', 'MG&G', 'PO']


If you ever get confused on what a specific method or function does, you can use the `help()` function to access documentation:

In [26]:
help(type)

Help on class type in module builtins:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict, **kwds) -> a new type
 |  
 |  Methods defined here:
 |  
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(self, /)
 |      Specialized __dir__ implementation for types.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __instancecheck__(self, instance, /)
 |      Check if an object is an instance.
 |  
 |  __or__(self, value, /)
 |      Return self|value.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __ror__(self, value, /)
 |      Return value|self.
 |  
 |  __setattr__(self, name, value, /)
 |      Implement setattr(self, name, value).
 |  
 |  __sizeof__(self, /)
 |      Return mem

### Variables with Arrays, Lists, Dictionaries, and Sets
#### Lists
An `array` is a collection of data that is referred to by a single variable name.

A `list` is an ordered, one-dimensional collection of values. Lists are created by placing values inside square brackets [ ] and assigning them to a variable name. Values do not need to be of the same type (i.e. you can have a mixture of strings, integers, floats, and booleans). You can also have a list contained within a list. Lists are always ordered (they keep the order in which you added the items).

Let's try making a list that contains your favorite number, letter, and fruit:

In [27]:
# Create a list called favorites
Favorites = [23, 'R', 'Blueberries']

Arrays and lists can also be indexed, since they are ordered. 

In [28]:
# Call the 3rd element in the list "Favorites" using indexing
Favorites[2]

'Blueberries'

We can use methods to modify lists. Let's add another element to the end of the list using one of the methods we learned about previously.

In [29]:
# Add your favorite sport to the end of the list
Favorites.append('Taekwondo')
print(Favorites)

[23, 'R', 'Blueberries', 'Taekwondo']
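
A few more list operations, sketched with a made-up list `stations` (not part of the notebook's data): counting elements, slicing, replacing an element, and indexing into a nested list.

```python
# hypothetical example list, for illustration only
stations = ['A1', 'B2', 'C3', 'D4']

# len() gives the number of elements
n_stations = len(stations)
print(n_stations)

# slicing returns a new list with the selected elements
first_two = stations[0:2]
print(first_two)

# lists are mutable: assign to an index to replace an element
stations[1] = 'B7'
print(stations)

# a list inside a list is indexed one level at a time
nested = [1, [2, 3], 4]
inner_value = nested[1][0]
print(inner_value)
```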


### Arrays
Arrays are useful when working with numeric data. One of the options for working with numeric data is the package numpy. Since we already imported numpy before, we can directly get to work!

Here is a summary of useful numpy functions (also, don't forget that you can use help() to get more instructions on how to use certain functionalities):

|Function | Description |
|:------------- |:-------------|
|`np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])`|Stores numeric data, whether imported or created within Python. Numpy arrays are analogous to one of the main data structures in MATLAB, and come with a series of accompanying functions built into the numpy library. The depth of the bracket nesting sets the number of dimensions of the array: each inner list here is one row of a 2D array.|
|`np.ones_like(ARRAY)`|Creates an array with the same shape and data type as a previously defined array, filled with 1s.|
|`np.zeros_like(ARRAY)`|Creates an array with the same shape and data type as a previously defined array, filled with 0s.|


In [2]:
#help(np)

In [31]:
# create a 1D numpy array
data1D = np.array([1,2,3])
print(data1D)

# to create a multidimensional array, add each row's values in another [] within the np.array command
# note: despite its name, data3D is a 2D array (3 rows x 3 columns)
data3D = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(data3D)

# create an array of ones with the same shape as the array from the lines above
ones_data3D = np.ones_like(data3D)
print(ones_data3D)

# create a zeros array based on the same array
zeros_data3D = np.zeros_like(data3D)
print(zeros_data3D)

[1 2 3]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]
[[0 0 0]
 [0 0 0]
 [0 0 0]]


To determine the dimensions of an array, you can access its `shape` attribute. You can also rearrange the rows and columns of an array with the `reshape` method. The numpy convention for 2D arrays is (rows, columns).

In [32]:
# obtain the dimensions of our current array
print(data3D.shape)

# reshape our current 3x3 array into a single column of 9 values (shape (9, 1))
# note: if you have an unknown number of rows or columns to convert to, you can input -1 and numpy will figure it out!
data1D = data3D.reshape(-1,1)
print(data1D)

(3, 3)
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
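
Note that `reshape(-1, 1)` produces a 2D array with a single column. The sketch below, using a small made-up array `grid`, shows how to check the number of dimensions with `ndim` and how to get a truly one-dimensional array by passing a single `-1`:

```python
import numpy as np

# a 2D array with 2 rows and 3 columns (illustrative values)
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # shape is an attribute, so no parentheses
print(grid.ndim)    # number of dimensions

# flatten to a 1D array by passing a single -1
flat = grid.reshape(-1)
print(flat.shape)

# rearrange into 3 rows; -1 lets numpy work out the number of columns
tall = grid.reshape(3, -1)
print(tall)
```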


### Sets
Sets in Python look a bit like lists at first, but they serve a different purpose. Sets are unordered collections of unique elements-- this means that the same item cannot occur more than once and that sets cannot be indexed. So, why would you want to use them? Well, they end up being very useful if you ever want to check whether something is present in a group or identify what the unique members of a group are. Sets are made using the function set().

In [33]:
# Create a list of colors
colors = ['blue', 'blue', 'red', 'green']

# Create a set 
uniquecolors = set(colors)
print(uniquecolors)

{'blue', 'red', 'green'}


Sets have some really powerful methods associated with them. For example, you can rapidly run math-like set comparison with `.union()`, `.intersection()`, `.difference()`. Try running these commands with the sets defined below.

In [3]:
begin_in_A = set(['apple', 'ashen', 'architecture'])
end_in_E = set(['stare', 'bane', 'apple', 'frazzle'])

# Union - Combines the objects in both sets and saves in new set
union = begin_in_A.union(end_in_E)

# Intersection - Output is objects present in both compared sets
intersection = begin_in_A.intersection(end_in_E)

# Difference - Output is objects that are in begin_in_A but removes everything that is also in end_in_E
difference = begin_in_A.difference(end_in_E)

print(union, intersection, difference)


{'apple', 'ashen', 'stare', 'architecture', 'frazzle', 'bane'} {'apple'} {'architecture', 'ashen'}
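
Checking whether something is present in a set is done with the `in` keyword. The sketch below uses two made-up sets (`sampled_sites`, `all_sites`) to show membership testing, `issubset`, and how to get an ordered list back out of a set:

```python
# hypothetical example sets, for illustration only
sampled_sites = {'north', 'south', 'east'}
all_sites = {'north', 'south', 'east', 'west'}

# the "in" keyword tests membership
has_west = 'west' in sampled_sites
print(has_west)

# issubset checks whether every element of one set is in another
is_subset = sampled_sites.issubset(all_sites)
print(is_subset)

# sets are unordered; sorted() returns the members as an ordered list
ordered_sites = sorted(sampled_sites)
print(ordered_sites)
```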


### Dictionaries

A dictionary (sometimes called a hash or map in other languages) is a different type of container. It is much like it sounds-- it is a dictionary. As with a paper dictionary, where you look up a word and get a definition (or multiple definitions), a dictionary will take a key and look up any associated value. Values can be anything that you can store to a variable: strings, lists, sets, other dictionaries. Dictionaries can be as simple as associating one string with another string or can be much more complicated.

You can initiate a dictionary with {}:

In [35]:
# create a dictionary of model parameters
params = {'sea water depth': 5, 'sediment discharge': 1000, 'island width': 300}

# print our sediment discharge value
print(params['sediment discharge'])

# we can update existing values in our dictionary by assigning to the associated key
params['sea water depth'] = 10
print(params)

1000
{'sea water depth': 10, 'sediment discharge': 1000, 'island width': 300}
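
Beyond looking up and updating values, it is common to add new keys and to loop over a dictionary. A minimal sketch, using a made-up dictionary `settings` so that `params` above is left untouched:

```python
# hypothetical parameter names and values, for illustration only
settings = {'depth': 5, 'discharge': 1000}

# assigning to a key that does not exist yet adds a new entry
settings['width'] = 300

# .items() yields each key together with its value
for key, value in settings.items():
    print(key, '=', value)

# .get() returns a default instead of raising an error when a key is missing
slope = settings.get('slope', 0.01)
print(slope)

# .keys() can be converted to a list (insertion order is preserved)
key_list = list(settings.keys())
print(key_list)
```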


## Loops and Conditions
### Looping
`for` loops iterate through a sequence of items (a list, a range of numbers, etc.) and perform a task at each step. They take the general form:


In [36]:
# List with the world's oceans:
oceans = ['Arctic', 'Antarctic', 'Atlantic', 'Indian', 'Pacific']

for item in oceans:
    print(item)


Arctic
Antarctic
Atlantic
Indian
Pacific


Unlike bash, Python relies on white space to define the different components of a program. In Python the `for` statement always ends with a `:`. Everything that should be repeated must then be indented below it (by convention, four spaces), and this indented block is treated as the body of the loop.

For loops can loop over any of the collections we defined above: lists, strings, sets, dictionaries, etc.
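
Two built-in helpers come up constantly with `for` loops: `range()` for looping over numbers and `enumerate()` for getting the position along with each item. A short sketch with made-up variable names:

```python
# range(n) produces the integers 0, 1, ..., n-1
squares = []
for i in range(5):
    squares.append(i ** 2)
print(squares)

# looping over a string visits one character at a time
letters = []
for letter in 'sea':
    letters.append(letter)
print(letters)

# enumerate() gives the position and the item together
numbered = []
for position, name in enumerate(['Atlantic', 'Pacific']):
    numbered.append((position, name))
    print(position, name)
```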

> Exercise: We have a dictionary that gives us the complementary base to a specified base. Let's write a for loop that lets us iterate over a DNA sequence and gives us the complementary sequence!


In [37]:
# Complementary base dictionary
comp_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

# DNA sequence: 
sequence = 'ATGCCTGGGATAT'
complement = ''

# for loop: 
for base in sequence:
    complement += comp_dict[base]

print(complement)

TACGGACCCTATA
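
The loop above builds the complement of the sequence, base by base, in the original order. If you need the reverse complement (the complement read in the opposite direction), one possible approach is to reverse the result with a slice. The sketch below repeats the dictionary and sequence so that it runs on its own:

```python
comp_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
sequence = 'ATGCCTGGGATAT'

# build the complement base by base
complement = ''
for base in sequence:
    complement += comp_dict[base]

# reverse it with a slice that steps backwards through the string
reverse_complement = complement[::-1]
print(reverse_complement)
```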


### Conditions - Deciding with if and else
Often in programming you want the computer to make decisions for you. You may want to do one thing to one set of files and something different with another set. The `if` statement is the most commonly used technique for such decision making. You can think of it as a fork in the road: based on a binary True/False choice at this fork you will do one thing or another.

If statements are written with the keyword `if` and a `:`, as with for loops. `else`, when added, gives the option for what to do if the first statement is not true. `elif` (short for "else if") tests a further condition when the previous ones were not true.

Let's write a simple if statement:

In [4]:
# generate an input statement asking for a random integer
num = int(input('Input a random integer:'))

# print different statements based on numeric value
if num > 0:
    print(num,'is positive!')
elif num==0: ## the '==' means the object value has to match EXACTLY with the criteria
    print(num,'is zero!')
else:
    print(num,'is negative!')

Input a random integer: 1


1 is positive!


You can also add multiple criteria to conditional statements. We use `and` when we need all criteria to be met, and `or` when at least one criterion needs to be met.

In [5]:
# generate an input statement asking for a random integer
num = int(input('Input a random integer:'))

# create a conditional statement based on whether the number is between 1 and 10 (inclusive)
print('Is your number between 1-10?')
if (num >= 1) and (num <= 10):
    print('Yes!')
else:
    print('No!')

Input a random integer: 2


Is your number between 1-10?
Yes!
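
For completeness, here is a sketch of the `or` and `not` keywords, along with Python's chained comparison syntax for range checks. A fixed value of `num` is assumed here so the cell runs without prompting:

```python
# hypothetical fixed value, used instead of input()
num = 12

# "or" is True when at least one condition holds
outside_range = (num < 1) or (num > 10)
print(outside_range)

# Python also allows chained comparisons for range checks
in_range = 1 <= num <= 10
print(in_range)

# "not" flips a True/False value
flipped = not in_range
print(flipped)
```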


## Functions
Functions are a great way to streamline your code and make it more human friendly. They allow you to apply the same set of commands to some new variable or value. So you only have to write a function once-- but can reuse it forever!

To write functions in Python we use the keyword `def`, which is short for definition. `def` is followed by the name of the function that you are writing. As with for loops etc. above, white space is used to define the code block that will be included in the function. After the function name you have a set of parentheses in which you list all the parameters that should be passed to the function.

In [40]:
# Let's define the function addx which takes two parameters as inputs and adds them together. 
bignumber = 134709457
def addx(input, x):
    output = input + x
    print(input, 'plus', x, 'equals', output)

Simply defining a function does not run it. Note, if you paste this into a cell in Python nothing will be printed. To execute a function you must call it.

In [41]:
# Let's add 5 to bignumber:
addx(bignumber, 5)

# Let's assign the result of our calculation to the variable biggernumber.
biggernumber = addx(2, 40000)
print(biggernumber)

134709457 plus 5 equals 134709462
2 plus 40000 equals 40002
None


Notice that `biggernumber` is `None`. The function printed its result, but the value stored in `output` was never passed back to the code that called the function, so there was nothing to assign.

Broadly, variables defined within a function have a scope only within that function. Variables defined outside of a function are accessible throughout.

* The variable `bignumber` is a global variable. This means that it was defined outside of a function and is visible (accessible) everywhere (globally).
* The variables `output`, `input`, and `x` are local variables within the function `addx`. These variables are only accessible within the function `addx` and do not get added to the global namespace.
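
The scoping rules above can be checked directly. In this sketch (the function `double` and its variable names are made up for illustration), a local variable is used inside the function, and an attempt to read it from outside raises a `NameError`:

```python
def double(value):
    # "doubled_inside" is a local variable: it exists only while the function runs
    doubled_inside = value * 2
    return doubled_inside

doubled = double(21)
print(doubled)

# trying to use the local name outside the function raises a NameError
try:
    print(doubled_inside)
    local_visible = True
except NameError:
    local_visible = False
    print('doubled_inside is not defined outside of double()')
```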

So, what if you wanted to do something with the variable output from `addx`? Any variable that you want to access from a function has to be returned to the global name space. This can be done with `return`.

In [42]:
def addx(input, x):
    output = input + x
    print(input, 'plus', x, 'equals', output)
    return output

# let's run addx again:
biggernumber = addx(2, 40000)
print(biggernumber)

2 plus 40000 equals 40002
40002
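
Two more features of functions are worth a quick sketch: parameters can have default values, and a function can return several values at once. The function `describe` below is a made-up example, not part of the notebook's exercises:

```python
def describe(values, label='data'):
    # label has a default value, so callers may leave it out
    smallest = min(values)
    largest = max(values)
    # returning several values packs them into a tuple
    return label, smallest, largest

# the returned tuple can be unpacked into separate variables
name, low, high = describe([3, 9, 1])
print(name, low, high)

# keyword arguments override the default
name2, low2, high2 = describe([3, 9, 1], label='depths')
print(name2, low2, high2)
```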


## Applied Exercises!
This is the end of the general Python review! Now let's practice our (maybe) newly acquired skills with some exercises! See GitHub for the files!