
# Using Python for your science


This is an introduction to beginning your scientific Python programming. If you haven't done so yet, please download and install the latest versions of Anaconda (www.anaconda.com/download/) and SunPy (docs.sunpy.org/en/stable/guide/installation/).

Let's make sure that your installation of Anaconda is up to date. From the terminal, type:


To keep all of the Anaconda packages updated, use the following command:

Now, let's launch an IPython (interactive Python) session.

# Getting started with Python

The first thing to note is that, by itself, Python is just a platform. You can think of Python as an engine: it is very powerful, but unless you connect it to a transmission, axle, and wheels, it isn't very effective at getting you down the road.

Fortunately, in most cases you don't need to reinvent the wheel: you can just find one that someone else built to fit your needs.

The Anaconda distribution comes with literally thousands of tools to help you work with data.

Let's explore some basic tools. NumPy is the fundamental package for scientific computing with Python. Let's load it into our current session.

In [1]:
import numpy as np

Notice the syntax we are using: `import` "the library" `as` "what we want to call it."
Now, whenever we want to use a tool from NumPy, we can prefix it with `np.`

Next, we will want to plot our data, so let's grab some plotting tools. We will use a package called matplotlib, a 2D plotting library that produces all kinds of figures. We don't need everything that matplotlib can do, so let's import just its `pyplot` module, which provides simple commands for making line plots and many other kinds of plots.

In [2]:
import matplotlib.pyplot as plt

Again notice the syntax we are using: `import` "library.tool" `as` "what we want to call it."

Okay, we have the tools `np` and `plt` defined. You have to load the tools and libraries you want to use every time you write a program or start a new Python session.

Now let's put our tools to use:

In [4]:
dt = 0.1
Fs = 1/dt
t = np.arange(0,Fs,dt)

We can define variables dynamically (i.e., we don't have to declare their type) and use conventional assignments to make new variables.
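You can see this in action with the built-in `type()` function, which reports the type Python inferred for each value. Apart from `dt` and `Fs`, the variable names below are arbitrary:

```python
# Python infers each variable's type from the value assigned to it
dt = 0.1
Fs = 1/dt
print(type(dt), type(Fs))     # both are float

n = 100                       # an integer literal gives an int
label = "time"                # a quoted literal gives a str
print(type(n), type(label))

# The same name can later be rebound to a value of a different type
n = 2.5
print(type(n))                # now a float
```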

We also use a tool within `np` to create `t`: `np.arange`. It creates an evenly spaced array with the syntax `output = np.arange(start, stop, step)`. Note that `stop` itself is not included, which is why `t` ends at 9.9 rather than 10.
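Here is a short example of the `stop` behavior. It also shows `np.linspace`, a related NumPy function that fixes the number of points instead of the step. The example values are arbitrary:

```python
import numpy as np

# np.arange(start, stop, step): stop itself is NOT included
a = np.arange(0, 1, 0.25)     # values 0.0, 0.25, 0.5, 0.75
print(a)

# The same call pattern as in the tutorial gives 100 values, 0.0 ... 9.9
dt = 0.1
Fs = 1/dt
t = np.arange(0, Fs, dt)
print(t.size, t[-1])

# With a non-integer step, floating-point rounding can make the element
# count surprising; np.linspace(start, stop, num) fixes the count instead
# and includes stop by default.
b = np.linspace(0, 1, 5)      # values 0.0, 0.25, 0.5, 0.75, 1.0
print(b)
```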

So what does `t` look like? We just type the variable or use a `print` statement:

In [5]:
t

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1,
       5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4,
       6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7,
       7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. ,
       9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])

In [6]:
print(t)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
 1.8 1.9 2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5
 3.6 3.7 3.8 3.9 4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.  5.1 5.2 5.3
 5.4 5.5 5.6 5.7 5.8 5.9 6.  6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.  7.1
 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.  8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9
 9.  9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9]


But what are the dimensions of `t`? How many elements are in it? 

Python is an object-oriented programming language, so the variable `t` is actually more than just a simple array: it is an object that carries both its data and functions that work on that data. It is self-aware! So let's ask `t` about itself.

In [7]:
t.min

<function ndarray.min>

Um... what just happened here? Python handed back the method `min` itself (essentially saying "yep, that is a valid function of t") but didn't actually run it. To get the value out, we need to call the method by adding parentheses:

In [8]:
t.min()

0.0

Also we can ask:

In [9]:
t.max()

9.9

In [11]:
t.mean()

4.95

In [12]:
t.var()

8.332500000000001

These all have the form `t.something()`, where `something()` is a method: a function that belongs to the object `t`.
To find out how many elements are in the array:

In [13]:
t.size

100

To add up all of the elements:

In [15]:
t.sum()

495.0

If you want to see all of the attributes and methods already available in `t`, type `t.` and then press Tab. This will dynamically show you everything that exists. Go ahead, try it!

You'll note that `.size` and `.shape` don't have parentheses after them, but `.min()` and `.var()` do. This is because the size and the shape are attributes of the array: stored values that you simply look up. `.min()` and `.var()` are methods, functions that have to be called to do their work. The parentheses are also where you pass any options you want to use, such as `axis`.
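Here is a side-by-side look at attributes and methods on the same `t`. The `ddof` option of `var()` is one example of an option passed inside the parentheses:

```python
import numpy as np

t = np.arange(0, 10, 0.1)     # same 100 values as before

# Attributes: stored properties, looked up without parentheses
print(t.shape)    # (100,)  one dimension holding 100 elements
print(t.size)     # 100
print(t.ndim)     # 1
print(t.dtype)    # float64

# Methods: functions attached to the array, run by calling them with ()
print(callable(t.min))   # True: without () you only get the method itself
print(t.min())           # 0.0

# Parentheses also carry options: ddof=1 divides by N - 1 instead of N
pop_var = t.var()
sample_var = t.var(ddof=1)
print(pop_var, sample_var)
```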

Let's create a 2D array of random numbers:

nishu.karna@cfa.harvard.edu


In [16]:
rand_nn = np.random.randn(t.size, t.size)

We can find the minimum values along one dimension of the array...

In [17]:
rand_nn.min(axis=0)

array([-2.43752279, -2.50980223, -2.14946974, -2.12835402, -2.48885121,
       -2.66815063, -2.39643338, -2.32681024, -2.167472  , -2.43013792,
       -2.19668089, -2.90032615, -3.83948898, -2.24710512, -3.23761091,
       -2.41436247, -2.52106913, -2.32695829, -1.94353468, -2.14067824,
       -2.79143637, -3.43074167, -2.62526324, -2.80010999, -2.27525274,
       -2.39092103, -2.94210409, -2.38005724, -2.12563683, -2.41620011,
       -2.19120729, -2.37625704, -3.22887521, -1.95837071, -2.37631951,
       -2.44421933, -2.54688608, -2.4130378 , -2.92428852, -1.58561542,
       -3.64182881, -1.84658489, -2.08812991, -3.06747715, -2.09441163,
       -2.46380535, -2.02609883, -2.49990946, -2.64101054, -2.83231044,
       -2.1997834 , -2.63546053, -2.19134745, -2.42727384, -2.02317128,
       -2.49873229, -2.36028874, -2.45644155, -2.71064612, -2.82236034,
       -2.00903524, -3.02725132, -3.38060821, -2.18939625, -2.32459767,
       -2.47743888, -2.36796645, -2.12629654, -1.90357254, -2.39

...or along the other dimension:

In [18]:
rand_nn.min(axis=1)

array([-2.50980223, -2.01741295, -2.83231044, -1.98778982, -2.44421933,
       -2.16210688, -3.28528514, -2.59125335, -3.64182881, -2.46403974,
       -2.25080282, -2.24710512, -2.82236034, -2.31583208, -2.33807862,
       -1.98556639, -2.3981012 , -1.54930226, -2.39643338, -2.35691551,
       -2.08824716, -3.24136347, -2.23736698, -2.62526324, -2.38005724,
       -3.38060821, -2.29196251, -2.26432893, -2.47743888, -2.20269038,
       -2.39640824, -1.99260645, -2.72459234, -2.39014812, -2.63546053,
       -2.38211242, -2.43752279, -2.20942642, -1.84411744, -2.92428852,
       -2.14116615, -2.43013792, -2.43126713, -2.16825194, -2.18222669,
       -1.96648481, -2.49873229, -2.94210409, -2.19903802, -3.83948898,
       -2.46380535, -2.90032615, -2.50780234, -3.67231761, -2.36796645,
       -2.34645072, -2.52106913, -2.13137386, -3.23761091, -2.79143637,
       -2.19629162, -3.43074167, -3.02725132, -2.32459767, -2.21652786,
       -2.8832096 , -2.22671101, -2.37776464, -2.9184864 , -2.06

Or the whole array.

In [21]:
rand_nn.min()            # no axis given: minimum of the whole array
rand_nn.min(axis=None)   # the same thing, with the default written out

-3.839488979742032

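With the `axis` option, the `min( )` method finds the minimum along a particular dimension of `rand_nn`. `axis=0` takes the minimum down each column (over the rows), giving one value per column. `axis=1` takes the minimum across each row, giving one value per row.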

We also used `axis=None`. `None` is a special value in Python that indicates **no value**. Here it means that `axis` has no assigned value, so `min( )` falls back to its default behavior, which is to give you the minimum value over the entire array. `None` is in fact the default for `axis`, which is why `rand_nn.min()` gives the same result.
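On a small array you can check the `axis` behavior by eye. Here is an example with an arbitrary 2 by 3 array:

```python
import numpy as np

a = np.array([[3, 7, 1],
              [4, 0, 9]])     # shape (2, 3): 2 rows, 3 columns

col_min = a.min(axis=0)       # [3 0 1]: minimum down each column
row_min = a.min(axis=1)       # [1 0]:   minimum across each row
all_min = a.min(axis=None)    # 0:       minimum of the whole array
print(col_min, row_min, all_min, a.min())   # a.min() also gives 0
```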


Now let's create an array of random numbers with the same number of elements as `t`. We will use the NumPy module `np.random`, but there are lots of ways to generate random numbers.

To see the options `np.random` offers for generating random numbers, type `np.random.` and then press Tab.
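One option worth knowing about is seeding, which makes "random" results repeatable from run to run. This example assumes NumPy 1.17 or later (for `np.random.default_rng`). The seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce identical sequences
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
x1 = rng1.standard_normal(100)   # normal samples, like np.random.randn(100)
x2 = rng2.standard_normal(100)
print(np.array_equal(x1, x2))    # True

# The older interface used in this tutorial can be seeded too
np.random.seed(42)
y1 = np.random.randn(100)
np.random.seed(42)
y2 = np.random.randn(100)
print(np.array_equal(y1, y2))    # True
```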

Let's sample from a normal distribution:

In [22]:
rand_n = np.random.randn(t.size)

Notice that we can use the `t.size` attribute of `t` without having to define another variable or call a separate function.

Let's plot our variables. To do this, we will use matplotlib's `plot` function.

In [25]:
plt.ion()
plt.plot(t,rand_n)



[<matplotlib.lines.Line2D at 0x11f3efc18>]

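Outside of interactive mode, matplotlib builds the plot in memory and you need a separate command, `plt.show()`, to display it on the screen. Yeah, it's a pain, but you get used to it. Calling `plt.ion()` first, as we did above, turns on interactive mode so that figures appear as soon as you draw on them.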

Let's now generate some more complex data:

In [27]:
r = np.exp(-t/0.05)
r_dat = np.convolve(rand_n, r)*dt
r_dat.size

199

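The convolution returned an array that is nearly twice the size of our other working data sets. By default, `np.convolve` returns the full convolution, which has 100 + 100 - 1 = 199 elements. We should trim that down. The syntax for slicing an array uses square brackets: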

In [28]:
r_dat = r_dat[:t.size]

This means we want the elements of `r_dat` from the beginning up to, but not including, index `t.size` (which is 100), i.e. the first 100 elements. Let's create one more data set to work with using some more of NumPy's tools (the function `np.sin` and the constant `np.pi`).
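Here is an example with arbitrary short arrays. It shows the length of a full convolution and a few more forms of slicing:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
k = np.array([1., 0.5, 0.25])

full = np.convolve(x, k)       # default mode='full'
print(full.size)               # 7 = 5 + 3 - 1

# Keep the first x.size elements, as done with r_dat[:t.size]
trimmed = full[:x.size]
print(trimmed.size)            # 5

# Slices are start:stop, and stop is excluded
middle = x[1:3]                # elements at index 1 and 2
print(middle)
print(x[:2], x[-2:])           # first two and last two elements
```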

In [29]:
ss = 0.1*np.sin(2*np.pi*t) +r_dat

What does all of this generated data look like? Why don't we make five different plots to highlight some different plotting capabilities.

First, we will define a plotting space, but instead of showing just one plot this time, we will fill it with several. In the top-left panel we plot the generated data itself. Below that, we plot the magnitude spectrum of the data on a linear scale and on a dB (logarithmic) scale. Finally, we show the wrapped and unwrapped phase spectrum. All of this can be done within matplotlib.

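The `subplot` command takes the number of rows, the number of columns, and a panel index. `plt.subplot(3,2,1)` means we want a 3 by 2 grid of plots and want to draw on the first of its six panels. Panels are numbered from 1, left to right and then top to bottom.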
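To see how the panel index maps onto the grid, here is an example that labels each of the six panels with the `subplot` call that created it. The label text is arbitrary:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
for index in range(1, 7):
    ax = plt.subplot(3, 2, index)   # 3 rows, 2 columns, panel number `index`
    # write the call in the middle of its panel (axes coordinates 0..1)
    ax.text(0.5, 0.5, f"subplot(3, 2, {index})",
            ha="center", va="center", transform=ax.transAxes)
# with interactive mode off, call plt.show() here to display the figure
```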

In [30]:
plt.subplot(3,2,1)
plt.plot(t,ss)

plt.subplot(3,2,3)
plt.magnitude_spectrum(ss, Fs=Fs)

plt.subplot(3,2,4)
plt.magnitude_spectrum(ss, Fs=Fs, scale='dB')


plt.subplot(3,2,5)
plt.angle_spectrum(ss, Fs=Fs)

plt.subplot(3,2,6)
plt.phase_spectrum(ss, Fs=Fs)


(array([  3.14159265,   0.20090279,  -2.62641382,  -3.33472149,
         -5.1592235 ,  -7.65371201,  -4.60354742,  -7.2272593 ,
         -4.43830268,  -4.48633749,  -1.42699672,  -4.38811385,
         -3.49460405,  -6.48781605,  -9.11673975, -11.68322175,
         -9.05047997,  -8.5245342 , -11.36274784, -13.28836727,
        -16.02057281, -16.93292625, -18.94426422, -22.00171275,
        -20.05003189, -20.37788161, -17.73756018, -20.44803617,
        -22.7119837 , -25.28847114, -25.20331697, -28.07197637,
        -30.77224266, -31.08073579, -29.41570953, -27.13386257,
        -24.13733629, -26.9483361 , -29.82561786, -26.87708655,
        -24.51930856, -21.60899852, -23.97514467, -26.37871116,
        -23.32690153, -22.89113239, -21.39441954, -20.44098   ,
        -20.04750799, -18.42889498, -15.70796327]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
        1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
        2.6, 2.7, 2.8, 2.9, 3. , 3

Using matplotlib, we can showcase our generated data set in several different ways. Of course this is just an example and there are many more types of plots that matplotlib can generate. To see what else is possible, visit matplotlib.org/gallery.html for examples. 

# Basic Programming Techniques in Python

To showcase some other basic programming techniques using the skills we just learned, let's estimate $\pi$ numerically.

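One of the easiest ways of doing this is the Monte Carlo method. We pick random points uniformly inside the unit square and count how many fall inside the quarter circle of radius 1 centered on the origin. This is equivalent to a circle inscribed in a square, since the area ratio is $\pi/4$ in both cases. The fraction of points inside approaches that ratio, so $\pi \approx 4 \times$ (points inside) / (total points).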
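For comparison with the step-by-step loop version built below, here is a compact version of the whole idea using NumPy arrays instead of a loop. It uses `np.random.default_rng` (NumPy 1.17+) with an arbitrary seed so the run is repeatable. The number of points is also an arbitrary choice:

```python
import numpy as np

n_points = 100_000
rng = np.random.default_rng(seed=0)   # seeded so the run is repeatable

# Draw all x and y coordinates at once, uniformly in [0, 1)
x = rng.uniform(size=n_points)
y = rng.uniform(size=n_points)

# A point is inside the quarter circle if its distance from (0, 0) is < 1
inside = np.sqrt(x**2 + y**2) < 1     # boolean array, one entry per point

# Fraction inside ~ (area of quarter circle) / (area of square) = pi / 4
pi_estimate = 4.0 * inside.mean()
print('Pi =', pi_estimate)
```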


In [31]:
import numpy as np

Next, we will initialize our counting variables:

In [32]:
total = 0
inside = 0

Now we will set up a for-loop to calculate our points. Loops in Python look a bit different from those in other programming languages. First, notice that Python marks the body of a loop by indentation (conventionally four spaces). Unlike Fortran or IDL, there is no need for statements such as 'begin' and 'end', since the indentation shows where the loop starts and ends.

In [33]:
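for ii in range(10000):
    # pick a random point in the unit square
    x_coord = np.random.uniform()
    y_coord = np.random.uniform()
    
    # distance of the point from the origin
    r = np.sqrt(x_coord**2 + y_coord**2)
    
    # count it if it falls inside the quarter circle of radius 1
    if r < 1:
        inside += 1
        
    total += 1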
    

In [34]:
print('Pi=', 4.0*inside/(1.0*total))

Pi= 3.1284


You will notice something else too: the loop variable `ii` is never used inside the loop. We don't need to update it ourselves, because the `for` loop automatically assigns `ii` each successive value produced by `range( )`. Here we only need the loop to run 10000 times. We can make the iteration count explicit by wrapping the sequence in `enumerate( )`.

In `for iteration, ii in enumerate(range(10000)):`, `iteration` holds the iteration number (0, 1, ..., 9999) while `ii` holds whatever item the sequence supplies at that iteration. That item could be any data type. To drive this point home, let's show a trivial example:

In [37]:
for value in 'Jack':
    print(value)
print()   # print an empty line between the two loops

for iteration, value in enumerate('Jack'):
    print(iteration, value)



J
a
c
k

0 J
1 a
2 c
3 k


A few other things to notice: we are using `np.random` again, as well as `np.sqrt`. To raise a number to a power, Python uses a double asterisk (`**`). We also encounter an `if` statement. Python conditionals need no 'then' keyword. A colon ends the condition, and indentation marks the beginning and end of the block.
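Here are both pieces of syntax on their own. The value `r = 0.8` and the labels are arbitrary, and `elif`/`else` extend the plain `if` used in the loop:

```python
import numpy as np

square = 3**2            # ** raises to a power: 9
root = 2**0.5            # a fractional exponent gives a root
print(square, root, np.sqrt(2))   # root and np.sqrt(2) agree

r = 0.8
# No 'then': the colon ends each condition, indentation marks each block
if r < 1:
    location = 'inside'
elif r == 1:
    location = 'on the circle'
else:
    location = 'outside'
print(location)          # inside
```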

# Writing A Function

Let's take this same Python script and turn it into a simple function that we can call over and over in a loop.

In [38]:
import numpy as np

In [39]:
def pitest():
    xtemp = np.random.uniform()
    ytemp = np.random.uniform()
    
    r =np.sqrt(xtemp**2 + ytemp**2)
    
    if r<1:
        return True
    return False

Functions in Python use the same indentation to mark where they start and end as loops and conditional statements do. This particular function, `pitest( )`, does not take any input, but if it did, the input would go inside the parentheses. Now let's put the new function in a loop and see how it works:

In [41]:
count = 0
inside = 0

for i in range(10000):
    is_inside = pitest()
    
    inside += is_inside
    count += 1.
    
print('Pi =', ((inside/count)*4.))

Pi = 3.1076
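`pitest( )` takes no input. Here is a function that does: a hypothetical `estimate_pi` that takes the number of points as a parameter and returns the estimate. The name and parameter are illustrative, not part of the original code:

```python
import numpy as np

def estimate_pi(n_points):
    """Estimate pi from n_points random points (hypothetical helper)."""
    inside = 0
    for _ in range(n_points):
        x = np.random.uniform()
        y = np.random.uniform()
        if np.sqrt(x**2 + y**2) < 1:
            inside += 1
    return 4.0 * inside / n_points

result = estimate_pi(10000)   # the input goes inside the parentheses
print('Pi =', result)
```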


Python has a boolean data type with two values, `True` and `False`. In arithmetic these behave as the integers 1 and 0 respectively, which is why `inside += is_inside` counts the points that land inside the circle.
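Here is this behavior on its own, using an arbitrary list of flags:

```python
flags = [True, False, True, True]

print(True + True)              # 2
print(int(False))               # 0
print(sum(flags))               # 3: adding booleans counts the True values
print(isinstance(True, int))    # True: bool is a subclass of int
```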

# Running Your Own Function
So now I have written a script; how do I run it from the IPython command line? Let's say that the function we wrote, __'pitest'__, is saved in a file named __'Pifunction.py'__. First, change to the directory where your program is located.

Then import your script, using its file name without the `.py` extension.

And then run your function like we have all along:
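Here are these steps written as a self-contained Python script. It assumes the file is named `Pifunction.py` and contains `pitest()` as defined earlier. Because the script cannot rely on your own folder, it first writes that file into a temporary directory, which stands in for "change to the directory where your program is located":

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Setup for this sketch only: write pitest() into a file named Pifunction.py
# inside a temporary folder. Normally this file already sits in your folder.
source = '''import numpy as np

def pitest():
    xtemp = np.random.uniform()
    ytemp = np.random.uniform()

    r = np.sqrt(xtemp**2 + ytemp**2)

    if r < 1:
        return True
    return False
'''
workdir = tempfile.mkdtemp()
Path(workdir, "Pifunction.py").write_text(source)

# Python looks for modules in the folders listed in sys.path. An interactive
# session already includes the current folder, which is why changing into the
# script's folder is enough there; here we add the temporary folder instead.
sys.path.insert(0, workdir)
importlib.invalidate_caches()   # make sure the newly written file is seen

import Pifunction   # module name = file name without ".py"

# Call the function through the module name, just like np.random.uniform()
n_points = 10000
inside = sum(Pifunction.pitest() for _ in range(n_points))
pi_estimate = 4.0 * inside / n_points
print('Pi =', pi_estimate)
```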