%matplotlib inline

# Solar Data Processing with Python

Now that we have a grasp of the basics of Python, we can turn to the reason for installing it in the first place: analyzing solar data. Let's take a closer look at some examples of solar data analysis.

We will be using SunPy to access solar data. SunPy is a Python package designed to connect the powerful tools that exist in other Python libraries with current repositories of solar data. With SunPy we will show how to download solar data sets from the VSO, calibrate them to industry standards, and plot and overlay a time series.

# Downloading Data With SunPy Through The VSO

Data is the lifeblood of solar research. We have queried the HEK, but how do we search and download data from the VSO? SunPy has a VSO module that interacts directly with the VSO so we can search, sort, and download solar data. First, we need to load the VSO client:

Now, let's construct a basic query and ask for all EIT images on Jan 1, 2001 with wavelengths between 170 and 180 Angstroms.  

Notice that the syntax is similar to the one we used to query the HEK database. To communicate the exact parameters we want the VSO to search over, we use the VSO attributes module (vso.attrs). To find out which VSO attributes exist, you can type 'help(vso.attrs)', look online at sunpy.org, or type 'vso.attrs.' followed by the Tab key to see a list.

SunPy expects units to be specified where they make physical sense. So we must specify that the wavelengths we are looking for are in angstroms (rather than nanometers or even kilometers). The units package specifies angstroms as units.AA.

How many results did we get?

What do the results look like? Let's print the first one. 

You can see that the results include the meta-data but not the data itself. Now let's download the images in our data query from the VSO to our current working directory. 

Each file downloaded from the VSO is saved under the filename '{file}' with the suffix .fits appended. The '{file}' placeholder is replaced by the file name that the VSO provides for each file.

# Fitting A Gaussian to a Spectral Line


One of the most common data types in solar data processing is a time series. A time series is a measurement of how one physical parameter changes as a function of time. This example shows how to fit a Gaussian to a spectral line, and it is kept as "real world" as possible.

First, let's import some useful libraries. 

Next we need to load in the data set we want to work with:

So what did we get when we opened the file? Let's take a look:

We got 4 items in the list. Let's take a look at the first one:

This doesn't contain any useful information. Let's take a look at the second item:

Alright, now we are getting somewhere. This has wavelength information in units of 'nm' and accuracy information without units. Let's take a look at the other elements of the list we got:

So it looks like we are working with some wavelength data, spectral information, irradiance data, etc. 

# Plotting Spectral Data
Let's take a look at some of the data we've got. 

So now we have a plot of wavelength vs. irradiance. We can see there is one major spike in the data. Let's filter the data so that we just have that one spike. 

This function, "np.logical_and", is similar to a "where" statement in IDL. We can see that "w" is now an array of true and false values. To take a subsection of our data where our filter is true:
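A minimal sketch of this masking step on synthetic data follows. The wavelength grid and the bounds 30.25 and 30.55 are assumed values for illustration, not the ones used with the real spectrum.

```python
import numpy as np

# Synthetic stand-in for the wavelength axis in nm (not the real file contents).
wavelength = np.linspace(30.0, 31.0, 11)

# Boolean mask: True where 30.25 < wavelength < 30.55 (assumed bounds).
w = np.logical_and(wavelength > 30.25, wavelength < 30.55)

# Boolean indexing keeps only the elements where the mask is True.
sub_wavelength = wavelength[w]

print(w)
print(sub_wavelength)
```

The mask has the same length as the original array, so it can be reused to select the matching irradiance samples.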

Now, we need to add some units to this data. The header of the file tells us what the units are:

Let's do the same thing to the irradiance data. 

What have we got? Let's plot it and take a look:

# Fit He II 304 line with a Gaussian

Now that we have extracted the He II line from our total spectrum, we want to fit it with a Gaussian. To do this we will make use of a couple of packages in Astropy. We will initialize the Gaussian fit with some approximations (max, center, FWHM):
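One way such approximations could be estimated directly from the data is sketched below with NumPy alone. The line profile is synthetic and the names `amp_guess`, `center_guess`, `fwhm_guess`, and `sigma_guess` are hypothetical; the tutorial's own guesses may simply have been read off the plot.

```python
import numpy as np

# Synthetic line profile standing in for the extracted He II data (assumed values).
x = np.linspace(30.2, 30.6, 401)                    # wavelength, nm
y = 1.0 * np.exp(-0.5 * ((x - 30.38) / 0.02) ** 2)  # irradiance, arbitrary units

# Amplitude guess: the largest sample. Center guess: where it occurs.
amp_guess = y.max()
center_guess = x[np.argmax(y)]

# FWHM guess: width of the region at or above half the maximum.
above_half = x[y >= 0.5 * amp_guess]
fwhm_guess = above_half[-1] - above_half[0]

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
sigma_guess = fwhm_guess / (2.0 * np.sqrt(2.0 * np.log(2.0)))

print(amp_guess, center_guess, fwhm_guess, sigma_guess)
```

The conversion from FWHM to standard deviation matters because Gaussian models are usually parameterized by the standard deviation rather than the FWHM.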

Now let's define a fitting method and produce a fit:

Let's take a look at some of the qualities of our fitted gaussian:

Our guesses weren't too bad, but we overestimated the standard deviation by about a factor of 10. The variable 'g' has the fitted parameters of our Gaussian but it doesn't actually contain an array. To plot it over the data, we need to create an array of values. We will make an array from 30.2 to 30.6 with 1000 points in it.

Finding the values of our fit at each location is easy:

Now we can plot it:

# Integrating Under the Curve

Let's find the area under the curve we just created. We can numerically integrate it easily:
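A minimal sketch of the numerical integration, written out as an explicit trapezoidal rule in NumPy, is given below. The Gaussian parameters are assumed values standing in for the fitted ones, and the closed-form area $A \sigma \sqrt{2\pi}$ is included only as a sanity check. NumPy also ships this rule as a built-in function (`np.trapz`, or `np.trapezoid` in NumPy 2.0 and later).

```python
import numpy as np

# Sampled Gaussian standing in for the fitted curve (assumed parameters).
amplitude, mean, stddev = 1.0, 30.38, 0.02
x_g = np.linspace(30.2, 30.6, 1000)
y_g = amplitude * np.exp(-0.5 * ((x_g - mean) / stddev) ** 2)

# Trapezoidal rule: average adjacent samples, multiply by the spacing, sum.
area_numeric = np.sum(0.5 * (y_g[1:] + y_g[:-1]) * np.diff(x_g))

# Closed form for a Gaussian integrated over the whole real line.
area_exact = amplitude * stddev * np.sqrt(2.0 * np.pi)

print(area_numeric, area_exact)
```

The two values agree closely here because the integration window extends many standard deviations on either side of the line center.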

# Produce a Light Curve of He II 304 From All Spectra in File

The file we downloaded is a hyper-spectrum. This means that the spectrum changes over time. Can we find how the intensity of the line changes over time using the same fitting tools that we just showcased? Sure, we just need to put everything into a loop. 