# A brief introduction to using Python in Astronomy

The purpose of this notebook is to be a short reference guide that introduces the [Python](https://www.python.org/) programming language in a way that is useful for Astronomy. This notebook is not meant to be an introduction to programming, so some familiarity with basic concepts will be assumed. This notebook is also not meant to be a comprehensive overview of all the features of Python. The examples and especially the explanations will be kept short, but links to an explanation in an external resource are often provided. Learning by changing the codeblocks and seeing what happens is highly encouraged. Some remarks in the notebook are written in _italics_. Those can be skipped but might be useful or interesting for someone already more familiar with the subject. Some of the code below is written in a way that demonstrates Python syntax, but might not be the most elegant or practical way of doing things.

## Python

### Hello, World!

Programming languages are often introduced with an implementation of the "Hello, World!" program. This program simply displays the message "Hello, World!" to illustrate the basic syntax of the language. In Python it can be implemented as

In [None]:
print('Hello, World!')

_The code above is written in Python 3 and would be slightly different in Python 2, where ``print`` was a statement, not a function. Given that Python 2 is officially discontinued we will not mention it any further._

It could also be implemented as

In [None]:
print("Hello, World!") # This uses " instead of '

In both cases the ``print()`` function outputs its argument, which is the string "Hello, World!". In the first codeblock the string was constructed using single quotes ``'`` and in the second it was done using double quotes ``"``. According to the [Python Style Guide](https://www.python.org/dev/peps/pep-0008/#string-quotes) either option is fine, but the same style should be used throughout the code.

The second implementation also includes an inline comment, beginning with ``#``. Anything written after a ``#`` is ignored by Python, but can be used to comment the source code.

The argument of the ``print()`` function does not have to be a string; other objects are automatically converted to a string before printing.

In [None]:
print(1.234)

### F-strings

Python supports [Literal String Interpolation](https://www.python.org/dev/peps/pep-0498/) in the form of f-strings. They allow expressions to be embedded directly in a string literal in a human-readable way; the expressions are evaluated when the string is created and their values are inserted into the result.

In [None]:
a = 1
b = 2
print(f'The sum of {a} and {b} is {a+b}.')
print(f'{a} divided by {b} is {a/b}')

An f-string is written with ``f`` immediately preceding the first (single or double) quote and all expressions to be evaluated are written inside curly brackets. The values inside the curly brackets can be further formatted with the [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#format-specification-mini-language), as demonstrated below.

In [None]:
e = 2.718281828459
print(f'The number e is roughly {e:.3f}.')
print(f'The number e^4 is roughly {e**4:.2f}.')
print(f'The number e^4 is roughly {e**4:.2e}.')
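
The format specification can also control field width, alignment, thousands separators and percentages. The following is a small sketch with an arbitrary illustrative value; the trailing ``|`` is not part of the syntax and only marks where each padded field ends.

```python
value = 1234.5678
print(f'{value:10.2f}|')   # Right-aligned in a field of width 10
print(f'{value:<10.1f}|')  # Left-aligned in a field of width 10
print(f'{value:,.0f}')     # Thousands separator, no decimals
print(f'{0.256:.1%}')      # Fraction shown as a percentage
```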

### Operators

Python supports many arithmetic operations. In the examples above we have already used addition ``+``, division ``/`` and exponentiation ``**``. The arithmetic operations available in Python are demonstrated below.

In [None]:
a = 4
b = 5
print(a+b)  # Addition
print(a-b)  # Subtraction
print(a*b)  # Multiplication
print(a/b)  # Division
print(a%b)  # Modulus
print(a//b) # Floor division
print(a**b) # Exponentiation
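
Floor division and modulus are linked by the identity ``a == (a//b)*b + a%b``. Floor division rounds towards negative infinity, so the results for negative operands may be surprising. The sketch below checks this with the built-in ``divmod()`` function; the operand values are arbitrary.

```python
for a, b in ((7, 2), (-7, 2)):
    quotient, remainder = divmod(a, b)  # Same as (a//b, a%b)
    print(f'{a}//{b} = {quotient}, {a}%{b} = {remainder}')
    print(quotient*b + remainder == a)  # The identity holds in both cases
```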

An arithmetic operator can be combined with a following ``=`` to form an augmented assignment operator, such as ``+=`` or ``*=``, which updates the value of an existing variable instead of storing the result in a new variable. Their use is demonstrated below.

In [None]:
a, b = 4, 5
c = a+b
print(a, b, c)
a += b
print(a, b)
c = a*b
print(a, b, c)
a *= b
print(a, b)

Python also includes comparison operators that result in Booleans ``True`` or ``False``.

In [None]:
a, b = 4, 5
print(f'Is {a} equal to {b}? {a == b}')
print(f'Is {a} not equal to {b}? {a != b}')
print(f'Is {a} less than {b}? {a < b}')
print(f'Is {a} greater than {b}? {a > b}')
print(f'Is {a} less than or equal to {b}? {a <= b}')
print(f'Is {a} greater than or equal to {b}? {a >= b}')

The logical operators ``and``, ``or`` and ``not`` are also available.

In [None]:
a = True
b = False
print(f'a is {a} and b is {b}.')
print(f'a and b is {a and b}.')
print(f'a or b is {a or b}.')
print(f'a is {a} so not a is {not a}.')

Comparison operators can be chained without having to explicitly use logical operators.

In [None]:
a, b, c, d = 1, 4, 9, 13

print((a < b) and (b < c) and (c < d))
print(a < b < c < d)
print()
print((a < b) and (b > c) and (c < d))
print(a < b > c < d)
print()
print((a <= b) and (b <= c) and (c <= a))
print(a <= b <= c <= a)

### Sequences

Variables can be collected together into sequences. Elements in a sequence can be accessed through indexing. The first element in a Python sequence has index 0. Negative indices are counted from the end of the sequence starting from -1.

In [None]:
seq = [1, 2, 3]
print(f'The sequence is {seq}.')
print(f'The first element of the sequence is {seq[0]}.')
print(f'The second element of the sequence is {seq[1]}.')
print(f'The second to last element of the sequence is {seq[-2]}.')
print(f'The last element of the sequence is {seq[-1]}.')

_We defined ``seq`` using square brackets, which makes it a [list](https://docs.python.org/3/tutorial/introduction.html#lists)._

We can check if an element is in the sequence or not.

In [None]:
a = 2
print(f'Is a={a} in the sequence {seq}? {a in seq}')
print(f'Is a={a} not in the sequence {seq}? {a not in seq}')

### If-statements

If-statements allow code to be conditionally executed. Python uses indentation to group lines of code. The [Python Style Guide](https://www.python.org/dev/peps/pep-0008/#indentation) recommends using 4 spaces per indentation level.

In [None]:
a = 45
if isinstance(a, int): # We want to be sure that a is an integer
    if a > 0:
        if a%2 == 0: 
            print(f'{a} is positive and even.')
        else:
            print(f'{a} is positive and odd.')
    else:
        print(f'{a} is non-positive.')
else:
    print(f'{a} is not an integer.')
print('This sentence will be printed no matter what.')

We can get rid of one level of indentation with the if-elif-else construction.

In [None]:
if isinstance(a, int): # We want to be sure that a is an integer
    if a <= 0:
        print(f'{a} is non-positive.')
    elif a%2:
        print(f'{a} is positive and odd.')
    else:
        print(f'{a} is positive and even.')
else:
    print(f'{a} is not an integer.')
print('This sentence will be printed no matter what.')

The code above uses numerical values ``a%2`` (either 0 or 1 for any integer) as Booleans. In Python 0 is considered ``False`` and any other numerical value is considered ``True``. ``False`` can be converted to 0 and ``True`` can be converted to 1.

In [None]:
a, b = 0, -1
print(bool(a), bool(b))
print(int(False), int(True))
print(float(False), float(True))
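
Because ``True`` and ``False`` behave like 1 and 0 in arithmetic, summing the results of comparisons counts how many of them hold. A minimal sketch with made-up values:

```python
values = [3, -1, 0, 7, -5]
n_positive = sum(v > 0 for v in values)  # Each True counts as 1
print(f'{n_positive} of {len(values)} values are positive.')
print(True + True)
```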

### Loops

Sometimes it is useful to execute some code multiple times. This can be achieved using a while-loop. Loops follow indentation rules analogous to if-statements.

In [None]:
# This code will print all non-negative integers less than 10 using a while-loop
i = 0
i_max = 10
while i < i_max:
    print(i)
    i += 1

A while-loop will continue running as long as the condition following the while statement is true, but it can also be interrupted using the ``break`` statement.

In [None]:
i = 0
while True:
    if i >= i_max:
        break
    print(i)
    i += 1

If we wish to print only odd numbers we could use the ``continue`` statement. This will interrupt the current iteration of the loop and start the next one.

In [None]:
i = 0
while i < i_max:
    i += 1
    if not i%2:
        continue
    print(i)

Because we know the range of numbers we wish to print beforehand it would be better to use a for-loop together with ``range()``. _A [range object](https://docs.python.org/3/library/stdtypes.html#range) is not a list._

In [None]:
for i in range(10):
    print(i)

In [None]:
for i in range(2, 10):
    print(i)

In [None]:
for i in range(2, 10, 3):
    print(i)

In [None]:
for i in range(10, 2, -3):
    print(i)

The ``break`` and ``continue`` statements work in a for-loop just like in a while-loop.

A Python for-loop does not need to be used together with ``range()``; any iterable, _i.e. an instance of a class that properly implements the [\_\_iter\_\_() method](https://docs.python.org/dev/library/stdtypes.html#iterator-types),_ will do. Printing the elements of an iterable can be achieved simply as

In [None]:
seq = [1, 2, 5, 8]
for elem in seq:
    print(elem)

Should we wish to obtain elements together with their indices the following will work, though clumsily.

In [None]:
for i in range(len(seq)):
    print(f'Element with index {i} is {seq[i]}.')

A more elegant way of doing this is 

In [None]:
for i, elem in enumerate(seq):
    print(f'Element with index {i} is {elem}.')

It is also possible to loop through multiple iterables at once with ``zip()``.

In [None]:
letters = ['a', 'b', 'c']
numbers = (1, 2, 3)
booleans = [True, False]
for letter, number, boolean in zip(letters, numbers, booleans):
    print(letter, number, boolean)

_We defined ``numbers`` using parentheses, which makes it a [tuple](https://docs.python.org/3/library/stdtypes.html#Tuples)._
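
Note that ``zip()`` stops as soon as the shortest iterable is exhausted, which is why the loop above prints only two lines. If the remaining elements are needed, one option is ``zip_longest()`` from the standard ``itertools`` module, sketched here with the same example lists:

```python
from itertools import zip_longest

letters = ['a', 'b', 'c']
booleans = [True, False]
print(list(zip(letters, booleans)))                          # Stops after two pairs
print(list(zip_longest(letters, booleans, fillvalue=None)))  # Pads the shorter iterable
```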

New lists can be created from existing iterables through [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).

In [None]:
numbers = range(10)
squares_of_even_numbers = [n**2 for n in numbers if n%2 == 0]
print(squares_of_even_numbers)
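
For comparison, the list comprehension above should be equivalent to the following explicit loop, which builds the same list one element at a time:

```python
squares_of_even_numbers = []
for n in range(10):
    if n%2 == 0:
        squares_of_even_numbers.append(n**2)
print(squares_of_even_numbers)
```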

### Slicing

We already saw how to use indexing to access individual elements from a sequence, but by using slicing we can access subsequences. When slicing a sequence from index a to b we access the indices $\left[a,b\right)$. For example ``seq[2:4]`` will include ``seq[2]`` and ``seq[3]``, but not ``seq[4]``.

In [None]:
seq = list(range(10))
print(seq)
print(seq[2:])
print(seq[:5])
print(seq[2:5])
print(seq[::2])
print(seq[1::2])
print(seq[1:2])

It is worth pointing out that ``seq[1]`` and ``seq[1:2]`` do not have the same output. The former returns the element with index 1 whereas the latter returns a new sequence that has one element, which is the element with index 1.
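
Negative indices and negative steps can be used in slices too. A short sketch using the same kind of sequence as above:

```python
seq = list(range(10))
print(seq[-3:])     # The last three elements
print(seq[:-3])     # Everything except the last three elements
print(seq[::-1])    # The whole sequence in reverse order
print(seq[7:2:-2])  # From index 7 down to (but excluding) index 2 in steps of 2
```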

### Functions

Python allows for the definition of functions _and classes_ which are chunks of code that can be called by other parts of the code. We could implement the "Hello, World!" program using functions.

In [None]:
def hello_world():
    print("Hello, World!")

We have defined the function but no message was printed because we have not yet called it. 

In [None]:
hello_world()

The ``hello_world()`` function always prints the same message, but we could also write a function that returns a value depending on its input.

In [None]:
def is_positive(x):
    return x > 0

for elem in (5, -5):
    print(f'Is {elem} positive? {is_positive(elem)}')

Functions can be used in the definitions of other functions. _Recursion is available too._

In [None]:
def is_negative(x):
    if x == 0:
        return False
    return not is_positive(x)

for elem in (5, -5, 0):
    print(f'Is {elem} negative? {is_negative(elem)}')

Function arguments can have default values in which case they don't have to be provided in the function call.

In [None]:
def is_odd(x, verbose=False):
    answer = bool(x%2)
    if verbose:
        if answer:
            print(f'{x} is odd.')
        else:
            print(f'{x} is not odd.')
    return answer

# One of these three is not like the others
a = 3
print(is_odd(a))
print()
print(is_odd(a, verbose=False))
print()
print(is_odd(a, verbose=True))

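Functions have their own namespace, which means that variables assigned within a function are separate from variables outside it, even if they share a name. A function can still read a variable from the enclosing scope if it does not assign that name itself, as happens with ``b`` below.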

In [None]:
def namespace_example():
    a = 5
    print(f'This function thinks a={a}, b={b}.' )
    
a = 3
b = 7
namespace_example()
print(f'But outside the function a={a}, b={b}.')
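
A practical consequence is that assigning to a name inside a function does not change a variable of the same name outside it; the usual way to get a result out of a function is to return it. The function ``increment()`` and the variable ``counter`` below are hypothetical names used only for illustration.

```python
def increment(counter):
    counter = counter + 1  # Rebinds only the local name
    return counter

counter = 0
increment(counter)            # The returned value is discarded
print(counter)                # The outer variable is unchanged
counter = increment(counter)  # Store the returned value explicitly
print(counter)
```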

### Importing

It can often be useful to import code from pre-existing modules or packages. It is possible to import entire modules or individual classes, functions or constants.

In [None]:
from numpy import pi    # Importing a single constant

print(pi)

In [None]:
import numpy as np      # Importing the entire module

print(np.pi)

## NumPy

### Motivation

A Python list can hold elements of different data types. Arithmetic operations can be defined in a meaningful way for many different datatypes, though with different outcomes.

In [None]:
for elem in (5, '5', [5]):
    print(f'2 times {elem}, a {type(elem)} instance, is {2*elem}')

In scientific computing it is very often necessary to perform a large number of operations with many numerical values. Using Python lists for storing numbers in resource-intensive calculations means that the Python interpreter needs to check every single one of them to determine if it is indeed numerical and what an arithmetic operation performed with it means. This overhead slows the code down. The way to get around this limitation is to write vectorized code with the [NumPy](https://numpy.org/) package.

We shall first illustrate the speed difference between basic Python and NumPy by implementing functions that compute pairwise differences between numbers. A more thorough explanation on how to vectorize code using NumPy will be presented below.

We will use the [IPython ``%timeit`` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html) to time the code execution. You shouldn't expect ``%timeit`` to work outside a Jupyter notebook.

In [None]:
def pairwise_differences_with_for_loops(arr):
    diffs = []
    for elem1 in arr:
        diffs.append([])
        for elem2 in arr:
            diffs[-1].append(elem1-elem2)
    return diffs

def pairwise_differences_with_list_comprehensions(arr):
    return [[elem1-elem2 for elem2 in arr] for elem1 in arr]

small_list = [1, 4, 9]
print(pairwise_differences_with_for_loops(small_list))
large_list = list(range(1000))
%timeit pairwise_differences_with_for_loops(large_list)

print(pairwise_differences_with_list_comprehensions(small_list))
%timeit pairwise_differences_with_list_comprehensions(large_list)

In [None]:
def pairwise_differences_with_numpy(arr):
    return arr[:,np.newaxis]-arr

small_array = np.array(small_list)
diffs = (pairwise_differences_with_numpy(small_array))
print(diffs)
large_array = np.array(large_list)
%timeit pairwise_differences_with_numpy(large_array)

We can see that using list comprehensions is roughly twice as fast as using for-loops, but using NumPy is roughly 50 times faster still.

### Arrays

One of the key concepts that allows NumPy to perform so much better is the NumPy array. An array is a collection of elements of the same datatype that has some size, i.e. total number of elements in it, and some number of dimensions or axes, which is the number of indices required to identify an element. An array also has a shape, which is the size of the array along all the different axes.

In [None]:
for arr in (small_array, diffs):
    print(f'Array:\n{arr}', f'Size: {arr.size}', f'Number of axes: {arr.ndim}', f'Shape: {arr.shape}', sep='\n')
    print()

Arrays can be generated from Python lists as we have done above, they can be initialized with some default value, or they can be obtained by manipulating other arrays.

In [None]:
false_arr = np.zeros(5, dtype=bool)
print(false_arr)
int_ones = np.ones(4, dtype=int)
print(int_ones)
float_ones = np.ones((2,3))
print(float_ones)
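
Arrays of evenly spaced values and arrays obtained by manipulating other arrays are also common. The sketch below uses ``np.arange()``, ``np.linspace()``, ``reshape()`` and the transpose attribute ``.T``; the variable names are illustrative only.

```python
import numpy as np

steps = np.arange(0, 10, 2)         # Like range(), but returns an array
grid = np.linspace(0, 1, 5)         # 5 evenly spaced values, endpoints included
table = np.arange(6).reshape(2, 3)  # The same 6 elements arranged in 2 rows and 3 columns
print(steps, grid, table, table.T, sep='\n')  # .T swaps the two axes
```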

### Vectorization

Vectorized code handles arrays as a whole instead of looping through their elements and handling them individually. Suppose we have an array and we wish to create a new array with the values doubled.

In [None]:
def double_not_vectorized(arr):
    return [2*elem for elem in arr]

def double_vectorized(arr):
    return 2*arr

small_list = [1, 4, 9]
print(double_not_vectorized(small_list))
small_array = np.array(small_list)
print(double_vectorized(small_array))

Here is another example where we increment the elements by one.

In [None]:
print(f'Initial list: {small_list}')

# Plain Python with a list comprehension
incremented_list = [elem+1 for elem in small_list]
    
print(f'Incremented list: {incremented_list}')
print()

# NumPy 
print(f'Initial array: {small_array}')
print(f'Incremented array: {small_array+1}')

Many NumPy functions can also be applied to arrays element-wise.

In [None]:
angles = np.arange(0, 361, 45)
print(f'Angles are {angles} degrees')
angles = np.deg2rad(angles)
print(f'Angles are {angles} radians')
print(f'Cosines are {np.cos(angles)}')

Some functions can be applied to the array as a whole or along some specific axis.

In [None]:
print(f'Array:\n{diffs}')
print(f'Total sum: {np.sum(diffs)}')
for i in range(2):
    print(f'Sums along axis {i}: {np.sum(diffs, axis=i)}')
print(f'Maximum value: {np.max(diffs)}')
for i in range(2):
    print(f'Maximum values along axis {i}: {np.max(diffs, axis=i)}')

### Slicing

Slicing NumPy arrays uses syntax similar to slicing basic Python sequences, but arrays can be sliced independently along different axes.

In [None]:
print(diffs, diffs[1:], diffs[::2], diffs[1:,:-1], diffs[:,2], sep='\n\n')

Note that NumPy arrays with multiple dimensions are indexed with tuples that specify (or not) the index along each dimension, i.e. ``diffs[i,j]`` for the i-th row and j-th column. Something similar can be constructed with nested Python lists, where each element of the outer list is a list holding one row. These are then accessed by first indexing the row and then the column, like ``diffs[i][j]``.

### Masking

Sometimes we wish to perform operations only on a subset of array elements that satisfy some condition. This can be achieved with masking. Suppose we have an array of integers and we wish to double its odd values but leave the even values unchanged.

In [None]:
print(f'Original array:\n{diffs}')
mod_diffs = diffs.copy()
mod_diffs[diffs%2 == 1] *= 2
print(f'Modified array:\n{mod_diffs}')

In the above code the expression ``diffs%2 == 1`` creates a mask of Boolean values. Only the subset of values corresponding to ``True`` in the mask are doubled.
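
Masks can be combined with the element-wise operators ``&`` (and), ``|`` (or) and ``~`` (not), and ``np.where()`` offers a way of choosing between two values based on a mask. A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([-3, -1, 0, 2, 5, 8])
mask = (values > 0) & (values%2 == 0)   # Parentheses are needed around each comparison
print(values[mask])                     # Positive even values
print(np.where(values < 0, 0, values))  # Negative values replaced by 0, others kept
```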

### Broadcasting

If two arrays have the same shape then it is possible to perform element-wise operations.

In [None]:
a = np.arange(3)
print(a)
b = np.arange(4,9,2)
print(b)
print(a-b)
print(a*b)

Sometimes it is also possible to do this even if the arrays have different shapes. This is known as [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html). We will not repeat the rules of broadcasting here, but we do offer a brief description of how we used it to compute the pairwise differences with the function ``pairwise_differences_with_numpy()``.

In [None]:
print(small_array)                         # The input array
print(small_array.shape)                   # Input is a 1-dimensional array
print(diffs)                               # The expected output
print()
column_vector = small_array[:,np.newaxis]  # We can add another axis to the array without changing its elements
print(column_vector)                       # We now have a column vector
print(column_vector.shape)
print()
row_vector = small_array[np.newaxis,]      # We could also convert the input into a row vector
print(row_vector)
print(row_vector.shape)
print()
# Broadcasting stretches the column vector to a matrix that has the i-th element in i-th row. The row vector gets
# stretched to a matrix that has the j-th element in the j-th column. The difference of these matrices has in its 
# i,j position the difference of elements with indices i and j, which is exactly what we want.
print(column_vector-row_vector)            
print()
# We don't have to store the row and column vectors, so we could just write
print(small_array[:,np.newaxis]-small_array[np.newaxis,])
print()
# But according to broadcasting rules the second broadcasting can be performed implicitly
print(small_array[:,np.newaxis]-small_array)

The initial array has the shape (3,), so the table of pairwise differences must have the shape (3,3). It could be tempting to convert ``small_array`` into the correct shape by taking the outer product with an array of ones.

In [None]:
temp = np.outer(small_array, np.ones(small_array.shape, dtype=int))
print(temp)

Finding the pairwise differences can now be done explicitly if we think of this intermediate 2D array as a matrix and apply the transposing operation.

In [None]:
print(temp-temp.T)

The problem with this approach is that the temporary matrix ``temp`` needs to be stored in memory. For a matrix of such a small size this shortcoming is not noticeable, but for larger datasets it could well be. Broadcasting achieves the same outcome without creating and storing temporary matrices and also with less code.
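
Another common use of broadcasting is subtracting per-column or per-row values from a 2D array. The sketch below uses a small made-up array; subtracting the column means works directly, whereas the row means first need an extra axis so that the shapes are compatible.

```python
import numpy as np

data = np.array([[1., 10.], [2., 20.], [3., 30.]])  # Shape (3, 2)
column_means = data.mean(axis=0)                    # Shape (2,)
print(data - column_means)                          # (3, 2) and (2,) broadcast directly
row_means = data.mean(axis=1)                       # Shape (3,)
print(data - row_means[:, np.newaxis])              # (3,) must first become (3, 1)
```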

## Matplotlib

Python can be used for more than just computing your results; it can also be used for visualizing them. A popular package for this is [Matplotlib](https://matplotlib.org/), which works well with NumPy arrays. The following illustrates some basic usage.

In [None]:
from matplotlib import pyplot as plt

angles = np.arange(361)
plt.plot(angles, np.cos(np.deg2rad(angles)))
plt.title('A plot')
plt.xlabel('Angles [deg]')
plt.ylabel('Cosines')
plt.show()
plt.close()

Matplotlib understands basic $\TeX$ commands.

In [None]:
plt.plot(angles, np.cos(np.deg2rad(angles)))
plt.title('A plot')
plt.xlabel(r'$\alpha$ [deg]')
plt.ylabel(r'$\cos\alpha$')
plt.show()
plt.close()

In [None]:
plt.scatter(angles[::10], np.cos(np.deg2rad(angles[::10])), label=r'$\cos\alpha$', marker='*', color='y')
plt.plot(angles, np.sin(np.deg2rad(angles)), label=r'$\sin\alpha$', linestyle='--')
plt.title('A plot with a legend')
plt.xlabel(r'$\alpha$ [deg]')
plt.ylabel('Trigonometric functions')
plt.legend()
plt.show()
plt.close()

## Astropy

[Astropy](https://www.astropy.org/) is a very useful package for using Python in Astronomy. Here we will limit ourselves to demonstrating only two useful aspects of Astropy.

### Units

Astropy can handle [physical quantities](https://docs.astropy.org/en/stable/units/) that have some value in some unit system. Many physical constants are also built in.

In [None]:
from astropy import units as u
from astropy.constants import G

angles = np.arange(0, 361, 45)*u.degree
print(angles)
print(np.sin(angles))                    # We do not have to explicitly convert degrees to radians
print(angles.to(u.rad))                  # But we can if we want to
print()

r = 1*u.au
t = 1*u.yr
v = 2*np.pi*r/t
print(v)
print(v.to(u.km/u.s))
print()

print(G)

### Tables

Astropy also implements [Tables](https://docs.astropy.org/en/stable/table/index.html) that allow data to be grouped and handled together. _Astropy Tables are very similar to pandas dataframes, but they support multidimensional columns._ A QTable is a Table that can have physical quantities (i.e. with units) as columns. The following demonstrates some basic functionality.

In [None]:
from astropy.table import QTable

# Creating a QTable
labels = ['Earth', 'Jupiter', 'Sun']
m = [1*u.M_earth, 1*u.M_jupiter, 1*u.M_sun]
r = [1*u.R_earth, 1*u.R_jupiter, 1*u.R_sun]
data = QTable((labels, m, r), names=['name', 'mass', 'radius'])
print(data)
print()
print(data.info)
print()

# It is possible to add new columns
data['density'] = (data['mass']/(4*np.pi/3*data['radius']**3)).to(u.g/u.cm**3)
print(data)
print()

# We can filter data based on the values of some columns
print(data[data['density'] < 2*u.g/u.cm**3])
print()

# We can easily access data for a specific object
print(data[data['name'] == 'Sun'])