# Lesson 1: Timeseries Data and Exoplanets


## Learning Goals: 
- Understand how missions like TESS and Kepler look for repeated changes in brightness to detect planets.
- Learn to plot a light curve using mission-generated light curve (LC) and target pixel (TPF) files.
- List common uses of timeseries data.

## How are exoplanets discovered?
NASA has an excellent summary of the [five main techniques astronomers have used to discover exoplanets](https://exoplanets.nasa.gov/alien-worlds/ways-to-find-a-planet/). They are:
- Transits (the method we'll discuss)
- Microlensing
- Astrometry
- Radial velocity measurements
- Direct imaging

The TESS mission is optimized to look for planets using the transit method. It does this by "staring" at a 24° x 96° field of view for 27 days; each such observation is referred to as a sector. NASA's Goddard Space Flight Center has an incredible [video showing how TESS scanned the sky over its two-year primary mission](https://youtu.be/evHF_mnIdj4?feature=shared&t=26), which we've also embedded in the cell below:

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/evHF_mnIdj4?si=UaPNulrr-ZZ_Mdop&amp;start=26" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

### What is a transit? 
A transit occurs when a planet passes between a star and an observer. Transits reveal an exoplanet, not because we directly observe it from many light-years away, but because the planet passing in front of its host star slightly dims its light; we see the planet's "shadow". The dimming is most obvious in light curves: graphs showing the intensity of light over time. 

<img src="https://upload.wikimedia.org/wikipedia/commons/8/88/Exoplanet_transit_method.gif" width="500">
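
To get a feel for how small this dimming is, a common first-order approximation (not stated in this lesson, and ignoring effects such as limb darkening and grazing geometries) is that the fractional drop in brightness equals the ratio of the planet's disk area to the star's disk area, $(R_p/R_s)^2$. The sketch below applies it to a Jupiter-sized planet crossing a Sun-sized star; the radii are nominal values chosen for illustration, not measurements of any particular system.

```python
# Approximate transit depth: fraction of the stellar disk blocked by the planet.
# Radii are nominal values (km) used only for illustration.
R_JUPITER_KM = 71_492.0   # nominal equatorial radius of Jupiter
R_SUN_KM = 695_700.0      # nominal radius of the Sun


def transit_depth(planet_radius_km, star_radius_km):
    """Return the fractional dimming, assuming a dark planet and a uniform stellar disk."""
    return (planet_radius_km / star_radius_km) ** 2


depth = transit_depth(R_JUPITER_KM, R_SUN_KM)
print(f"Fractional dimming: {depth:.4f} ({depth * 100:.2f}%)")
```

A dip of roughly one percent is why the plots later in this lesson zoom in to a narrow range of normalized flux.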

In addition to exoplanets, TESS data are also useful when analyzing a variety of astronomical systems including binary stars, asteroseismic signals, and transient objects. These applications share the same fundamental need for a high-precision measurement of brightness, which TESS delivers.

## Discovering WASP-153b: Plotting a light curve

Let's look for a transit of our own. [WASP-153b](https://exoplanetarchive.ipac.caltech.edu/overview/WASP-153%20b#planet_WASP-153-b_collapsible) is a gas giant exoplanet that orbits a G-type star. It has a mass of 0.39 Jupiters, and takes only 3.3 days to complete one orbit of its star; it's 0.048 AU from its star. Its discovery was announced in 2017. 

Given the planet's large size and proximity to its host star, it should be easy to spot the transit.

### Imports

We'll use a "standard" suite of packages in this Notebook.

* `astropy.io.fits` to read FITS files
* `matplotlib` to create plots
* `numpy` for fast mathematical operations

To access the cloud data, we need:

* `astroquery.mast` to search for and select data
* `s3fs` to access cloud files as though they were local

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import s3fs

from astropy.io import fits
from astroquery.mast import Observations
from astropy.wcs import WCS

We also need to run the following to access cloud data:

In [None]:
Observations.enable_cloud_dataset() # use cloud data when possible
fs = s3fs.S3FileSystem(anon=True)   # read cloud data as though local

### Query For Observations, Filter for Files

We already know our target name: WASP-153. We need to add some filters to our query to narrow things down:
* Mission: we'll limit our search to data from the TESS mission.
* Time series data: in the context of TESS, this will eliminate full-frame images from our results. This is helpful, as there are thousands of them per sector.
* Sector: this is selected arbitrarily to narrow down the number of search results.

In [None]:
# Query for TESS time series Observations of our target
TESS_table = Observations.query_criteria(objectname="WASP-153"
                                         , obs_collection="TESS"
                                         , dataproduct_type='timeseries'
                                         , sequence_number = 40
                                         ) 

# Get associated science products for each Observation
data_products = Observations.get_product_list(TESS_table) 

# Keep only the science products
filtered = Observations.filter_products(data_products, productType="SCIENCE")

# Be selective about the columns we display
cols = ['obs_id', 'description', 'productSubGroupDescription']
filtered[cols]

Excellent, we've filtered down to three results. The data validation timeseries is a product produced by TESS as part of the automated exoplanet detection routine.

More of interest to us are the light curve and target pixel file. Let's focus on the target pixel file for now, since it will help us understand exactly how TESS data are generated. For later convenience, we'll prepare to open both files.

In [None]:
# Filter for the light curve files; we don't need this now
lc_prod = Observations.filter_products(data_products
                                      , productSubGroupDescription = "LC")

# Filter for the target pixel files
tp_prod = Observations.filter_products(data_products
                                       , productSubGroupDescription = "TP")

One last step: convert these into cloud URIs. See the [previous notebook](../00-the-cloud.ipynb) in this series for more detail about cloud access.

In [None]:
# Saving this for later
lc_uri = Observations.get_cloud_uris(lc_prod)

# We'll use this now; let's print it out to make sure it makes sense
tp_uri = Observations.get_cloud_uris(tp_prod)
tp_uri

Great. We have the URI for the target pixel file. Now let's dive in and do some analysis!

### Handling the Target Pixel File

Before we actually read any data from this file, we should figure out what's stored in it. We'll do that by calling the `info()` method on the opened file.

In [None]:
with fs.open(tp_uri[0], "rb") as f:
    with fits.open(f, "readonly") as hdulist:
        hdulist.info()

The `APERTURE` HDU gives us information about the aperture used to process the image. The `PIXELS` HDU contains the actual brightness data we'll need to build our light curve. Again, for convenience, let's read in our data now.

In [None]:
with fs.open(tp_uri[0], "rb") as f:
    with fits.open(f, "readonly") as hdulist:
        pixels = hdulist[1].data
        aperture = hdulist[2].data

Let's take a look at the aperture first.

#### Plotting the Aperture

The output of `info()` above tells us that the dimensions of the aperture are 11x11. What exactly does the aperture tell us? Let's plot it to see if it makes sense:

In [None]:
# Start figure and axis
fig, ax = plt.subplots()

# Display the pixels as an image
cax = ax.imshow(aperture, origin="lower")

# Add a color bar
cbar = fig.colorbar(cax)

# Add a title to the plot
fig.suptitle("WASP-153b Aperture: Sector 40")
plt.show()

Hm. This doesn't look much like a star, so what's going on here?

##### Exercise: what's going on here?

Let's narrow the focus of this exercise to three questions:

1. How many distinct values are used in the aperture?
2. What are these distinct values?
3. The integer being displayed is part of a 9-digit binary number. The [TESS Archive Manual's Chapter on Data Products](https://outerspace.stsci.edu/display/TESS/2.0+-+Data+Product+Overview), specifically the section on "Aperture Mask Image Flags", discusses what these values mean. Can you figure out what these values correspond to?

In [None]:
# TYPE YOUR ANSWER HERE

In [None]:
# hint for 1/2: there is a numpy function to get you here in one line
# hint for 3: use np.binary_repr

The bright yellow region at center is the "optimal aperture", which is selected by cross-matching positions with the [Gaia mission](https://archive.stsci.edu/missions-and-data/gaia) and setting a brightness threshold. 

The seemingly random teal pixels around the star are the "background pixels". Since TESS is subject to stray light, these pixels help to distinguish real changes in the target star's brightness.

All other pixels have a value of 257, indicating they were collected by the spacecraft (1), from CCD output D (256).
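
The values described above can be inspected with the two functions named in the hints. The sketch below uses a small made-up array rather than the real aperture, so the pixel layout is an illustrative assumption; only the values 257 and 267 and the flags 1 (collected), 2 (optimal aperture), and 256 (CCD output D) are taken from this lesson.

```python
import numpy as np

# Toy 3x3 stand-in for the aperture array (the real one is 11x11).
toy_aperture = np.array([
    [257, 257, 257],
    [257, 267, 267],
    [257, 267, 257],
])

# Distinct values, and how many pixels carry each one
values, counts = np.unique(toy_aperture, return_counts=True)

for value, count in zip(values, counts):
    # Show each value as a 9-digit binary string, most significant bit first
    bits = np.binary_repr(value, width=9)
    print(f"{value}: {bits} ({count} pixels)")
```

Reading each string from right to left, the last digit is the 1 flag, the digit before it is the 2 flag, and the first digit is the 256 flag.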

#### Pixel Data: Handling and Plotting

Pixel data is slightly more complex, at least in terms of how it is structured. It contains many subgroups of data:

In [None]:
pixels.columns

Notable among these columns are `TIME` and `FLUX`. Those are the two basic ingredients we need to create a time series plot!

Some of the other columns have straightforward meanings; `FLUX_ERR`, for example, is the error in the measured flux. For details about the other columns, you can read the [TESS Science Data Products Description Document](https://archive.stsci.edu/missions/tess/doc/EXP-TESS-ARC-ICD-TM-0014.pdf#page=24). This manual is aimed at a more technical audience, and is therefore quite information dense.

Let's extract the flux and time data.

In [None]:
# Grab the time and flux data
times = pixels["TIME"]
flux = pixels["FLUX"]

# What is the shape (dimensions) of the flux data?
np.shape(flux)

The easiest way to think about the flux data is as a stack of 11x11 images, in this case 20309 of them. Let's start by examining the first image:

In [None]:
# extract the first image
first = flux[0, :, :]


# identical settings to plot above
# Start figure and axis
fig, ax = plt.subplots()

# Display the pixels as an image: use the first image
cax = ax.imshow(first, origin="lower")

# Add a color bar
cbar = fig.colorbar(cax, label="e-/s")

# Add a title to the plot
fig.suptitle("WASP-153b: Image 1")
plt.show()

As expected, our 11x11 image is not a particularly high resolution view of this star; this is the price paid for the enormous field of view of the detectors.

It's worth noting that the brightness in this image is measured in electrons per second (e-/s) for each pixel; TESS does not calibrate this to physical units.

### The Tricky Part: Adding Up Brightnesses

We have 20,000 images. How do we turn this into a graph of brightness of our target star over time?

Fortunately, we have the solution: add up all of the values that fall into the optimal aperture. In general, you would figure this out by identifying which pixels have the optimal aperture bit (`=2`) set. Since that involves parsing the binary representation, it's a bit complex for this lesson. Instead, we'll generate our mask by asking the simpler question: where does the aperture equal 267?

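**Caution:** this shortcut only works when every other flag matches too. The value 267 includes the flag for CCD output D (256), so the test fails for targets read out through any of the other three outputs; that is roughly 75% of the time!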
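
For reference, the bit test mentioned above takes only one line with a bitwise AND. The sketch below compares it with the equality shortcut on made-up apertures. The array values are illustrative assumptions, and the readout flag of 32 used for the second target is assumed purely to show a value other than 256.

```python
import numpy as np

OPTIMAL_BIT = 2  # flag for "pixel is in the optimal aperture", as described in this lesson

# Made-up aperture for a target on CCD output D (256 flag set)
aperture_d = np.array([[257, 267], [267, 257]])

# Made-up aperture for a hypothetical target with a different readout flag (32, assumed)
aperture_other = aperture_d - 256 + 32

for name, ap in [("output D", aperture_d), ("other output", aperture_other)]:
    by_equality = ap == 267            # shortcut: matches one exact value only
    by_bit = (ap & OPTIMAL_BIT) > 0    # general: checks just the optimal-aperture flag
    print(f"{name}: equality selects {by_equality.sum()} pixels, "
          f"bit test selects {by_bit.sum()} pixels")
```

The equality test used in the next cell is a shortcut for the same idea.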

In [None]:
# set the optimal aperture
optimal = aperture==267

# plot: are we selecting the star?
plt.imshow(optimal, origin="lower")

Excellent. We now have a way to "slice" each image with the optimal aperture. Adding up the flux in each slice will give us the brightness at that moment in time.

`numpy` is a wonderful library that will make this summation quite easy for us by handling two important details:
1. Ignoring `NaN` values in the flux
2. Summing over each image, but not all of the data: we expect to get back 20000 results from this operation, not a final, collapsed sum. Setting `axis=1` is how we tell numpy to do this.
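
To see what these two details look like in practice, here is a minimal sketch on a toy stack of images. The array sizes and values are made up for illustration; only the indexing pattern and the `np.nansum(..., axis=1)` call mirror what this lesson uses.

```python
import numpy as np

# Toy stack of 4 images, each 3x3 (the real data are ~20,000 images of 11x11)
toy_flux = np.arange(36, dtype=float).reshape(4, 3, 3)
toy_flux[1, 1, 1] = np.nan  # pretend one pixel has no valid measurement

# Boolean mask selecting two pixels of each image
toy_mask = np.zeros((3, 3), dtype=bool)
toy_mask[1, 1] = True
toy_mask[1, 2] = True

# Indexing with the mask keeps the image axis and flattens the selected pixels
selected = toy_flux[:, toy_mask]
print(selected.shape)   # (4, 2): 4 images, 2 selected pixels each

# Sum over the pixel axis only, treating NaN as zero
per_image = np.nansum(selected, axis=1)
print(per_image)
```

The result has one value per image, which is exactly the shape a light curve needs.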

In [None]:
# sum all 20000 images individually
flux_sum = np.nansum(flux[:, optimal], axis=1)
len(flux_sum)

Great! We have our brightnesses.

### Plot the Time Series and Compare

We've already done all of the hard work. Now let's combine this brightness information with the timestamps.

For additional clarity, let's normalize the data in our plot. Since we can't rule out outliers, median makes the most sense to use.
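
The effect of an outlier on the two choices is easy to see with a toy example. The numbers below are made up for illustration and are not taken from the TESS data.

```python
import numpy as np

# Toy brightness series: a steady star with one wild outlier and one missing value
toy_sum = np.array([1000.0, 1001.0, 999.0, 1000.0, 5000.0, np.nan])

mean_level = np.nanmean(toy_sum)      # pulled upward by the outlier
median_level = np.nanmedian(toy_sum)  # stays at the typical brightness

print(f"mean = {mean_level:.1f}, median = {median_level:.1f}")

# Dividing by the median puts the typical brightness at 1.0
toy_norm = toy_sum / median_level
```

With the median as the reference level, the quiet parts of the light curve sit near 1.0 regardless of how extreme the outliers are.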

In [None]:
# calculate the median flux
med_flux = np.nanmedian(flux_sum)

# normalize the flux
norm_flux = flux_sum/med_flux

# plot the flux vs. time with a point size of two, in semi-transparent black
plt.scatter(times, norm_flux, s=2, alpha=0.3, c="k")

# outliers cloud the view, focus on relevant section of data
plt.ylim(0.98, 1.05)

# add labels
plt.ylabel("Normalized flux")
plt.xlabel("TESS Barycentric Julian Date")

Wow! The regularly-spaced dips are an unmistakable sign of a transiting exoplanet. We've done it!
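
As a rough sanity check on the number of dips, the sketch below divides the length of a sector by the orbital period, using the 27-day and 3.3-day figures quoted earlier in this lesson. It assumes continuous coverage, so treat the result as a rough estimate; any gaps in the data can hide a transit.

```python
sector_days = 27.0   # approximate sector length quoted earlier in this lesson
period_days = 3.3    # orbital period of WASP-153b quoted earlier in this lesson

# Number of orbits that fit in one sector
n_orbits = sector_days / period_days
print(f"About {n_orbits:.1f} orbits per sector")
```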

## Discovering the Easier Way: LC Files

Of course, this is rather tedious to do yourself each time. For selected targets (generally around 20000 per sector), the mission data processing pipeline produces light curves. Let's look at this now:

In [None]:
with fs.open(lc_uri[0], "rb") as f:
    with fits.open(f, "readonly") as hdulist:
        hdulist.info()

The aperture data in this file is identical to the aperture data from the target pixel file that we opened earlier. Let's extract the light curve data so we can make a comparison plot.

In [None]:
with fs.open(lc_uri[0], "rb") as f:
    with fits.open(f, "readonly") as hdulist:
        print(hdulist[1].columns)
        sap_flux = hdulist[1].data["SAP_FLUX"]
        pdcsap_flux = hdulist[1].data["PDCSAP_FLUX"]

Of note here are the `SAP_FLUX` (simple aperture photometry) and `PDCSAP_FLUX` (pre-search data conditioning) columns. We've actually just done our own SAP processing, by "simply" adding up the brightness within the aperture. In fact, if we plot the `SAP_FLUX`, we should see the same figure:

In [None]:
# set up the figure
fig, ax = plt.subplots(2, figsize=(12,8))

# FIRST PLOT
ax[0].scatter(times, norm_flux, s=2, alpha=0.3, c="k")
ax[0].set_ylim(0.98, 1.02)
ax[0].set_title("Manual SAP")
# add labels
ax[0].set_ylabel("Normalized flux")

# SECOND PLOT: TESS DATA
ax[1].set_title("SAP_FLUX: From TESS")
ax[1].set_ylim(0.98, 1.02)
ax[1].scatter(times, sap_flux/np.nanmedian(sap_flux), s=2, alpha=0.3, c="k")
# add labels
ax[1].set_ylabel("Normalized flux")
ax[1].set_xlabel("TESS Barycentric Julian Date")

These figures are indistinguishable by eye, and indeed they are nearly$^*$ identical.

$^*$ See the optional exercise at the end of this notebook.

## Higher Quality: PDCSAP

The highest quality data is actually stored in the `PDCSAP_FLUX` column. These light curves are created by detrending the data, using effects common to all stars on the CCD. In this way, systematic effects are removed from the signal.

In [None]:
# calculate the normalized pdc_values
pdc_norm = pdcsap_flux/np.nanmedian(pdcsap_flux)

# start the figure
plt.figure(figsize=(10, 6))

# plot the normalized flux with the same limits and labels
plt.scatter(times, pdc_norm, s=2, c="k")
plt.ylim(0.98, 1.02)
plt.ylabel("Normalized flux")
plt.xlabel("TESS Barycentric Julian Date")

Note the flatter profile of the PDC curve compared to the SAP curve. This level of noise reduction is especially helpful as you do a more thorough analysis of the data, particularly through Fourier transforms.

## Next time on "MAST Summer Webinar"...
In the next lesson, we'll talk more about other uses for TESS timeseries data, and delve into separating interesting signals from noise. Stay tuned!

## Optional Exercise: What is Going on Here? Part Two

If you check our calculated values against the mission-generated SAP values, you'll find something puzzling. They aren't the same. 

In [None]:
plt.figure(figsize=(12, 6))
plt.scatter(times, norm_flux-sap_flux/np.nanmedian(sap_flux), s=2, c="k")
plt.ylabel(r"$\Delta$ normalized flux")

There are many approaches you might take to solving this mystery, but you might find it helpful to work through these guiding questions:
* What feature or pattern of this plot do you notice?
* Does this pattern change in different parts of the plot? Where does it change?
* Can you think of a better value to use for the y-axis?

## Acknowledgements

If you write a paper using TESS data from MAST, please acknowledge this using the following template:

> This paper includes data collected with the TESS mission, obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the TESS mission is provided by the NASA Explorer Program. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5–26555.

Any published work that uses Astroquery [should include a citation](https://github.com/astropy/astroquery/blob/main/astroquery/CITATION). The citation can also be printed in a code cell with `astroquery.__citation__`, as long as the astroquery package is imported.

### About this Notebook:
If you have comments or questions on this notebook, please open a [GitHub issue on tike_content](https://github.com/spacetelescope/tike_content/issues/new) or contact us through the [Archive Help Desk e-mail](mailto:archive@stsci.edu).

**Authors:** Thomas Dutkiewicz, Emma Lieb, Scott Fleming

**Last Updated:** May 2024

[Top of Page](#top)

<img style="float:right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>