# Data Retrieval
---

In this notebook, you will learn how to retrieve the data that you would like to fit. You will also learn how to prepare the file for use with Allesfitter.

#### Importing
The first step is to import the packages that we need in order to accomplish this goal.

In [None]:
import lightkurve as lk
import pandas as pd

#### Retrieving the Data

This tutorial uses lightkurve to retrieve the light curve data for a target*. This data is from the Mikulski Archive for Space Telescopes (MAST). While MAST provides other data products, we only need the light curve (lc) data sets.

You can search for a target in several ways, for example by TIC ID or by the planet's name. For more information, visit the lightkurve documentation. Here I use the TIC ID. For example, if I wanted to create a curve for HD 189733 (TOI 4470.01 and TIC ID 256364928), I would replace "TIC#" with "TIC256364928".

Run the cell below for your target to see your options.

###### * if you would like to use data from an instrument besides Kepler or TESS, you will need to use a different method.

In [None]:
search_result = lk.search_lightcurve("TIC#")
print(search_result)

Take a look at your different options. You may notice different missions, sectors, and cadences. Select the data set you would like to use and replace "0" in the download cell below with the number corresponding to that data set. For my phase curve of HD 189733, I want data from the TESS mission at 120s cadence. As a result, I chose number 1.
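
If the list of options is long, the search itself can be narrowed. The following is a minimal sketch, assuming lightkurve 2.x, where `search_lightcurve` accepts `mission` and `exptime` arguments. The helper name `search_tess_two_minute` is hypothetical and only used here for illustration.

```python
def search_tess_two_minute(target):
    """Search MAST for TESS light curves of `target` at 120 s cadence.

    Hypothetical helper for illustration; it is not part of lightkurve.
    """
    # Imported inside the function so that defining the helper
    # does not require lightkurve until it is actually called.
    import lightkurve as lk

    # mission and exptime restrict the results to TESS data with 120 s exposures
    return lk.search_lightcurve(target, mission="TESS", exptime=120)
```

Calling `search_tess_two_minute("TIC256364928")` would then return a shorter table, and the row numbers in that table are the ones to use when downloading.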

The other aspect of the cell below that you should modify is the name and directory of the file. This will become your inst.csv file. Change the directory to your own, and replace "inst" with the name of the instrument. This should match what you list for INST in the settings.csv file. Make sure to change "allesfit/inst.csv" everywhere in this tutorial.

Since my data is from TESS, and I would like the file to be in my allesfit subfolder, my path is 'allesfit/TESS.csv'. 

In [None]:
lc = search_result[0].download()
lc.to_csv('allesfit/inst.csv')

Take a look at the csv file that you created. Currently, it has more columns of data than are needed. To produce fits for light curves or phase curves, only time, flux, and flux_err are needed.

In [None]:
file = pd.read_csv('allesfit/inst.csv')
data = file.drop(columns=['timecorr','cadenceno','centroid_col','centroid_row','sap_flux','sap_flux_err','sap_bkg','sap_bkg_err','pdcsap_flux','pdcsap_flux_err','quality','psf_centr1','psf_centr1_err','psf_centr2','psf_centr2_err','mom_centr1','mom_centr1_err','mom_centr2','mom_centr2_err','pos_corr1','pos_corr2'])
data.rename(columns = {'time':'#time'}, inplace = True)
data.to_csv('allesfit/inst.csv', index=False)
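
The cell above lists every column to drop, which will raise an error if your file does not contain one of those columns (for example, data from a different mission or pipeline). A possible alternative is to select the three needed columns instead. The sketch below uses a small made-up table in place of the real file, so the values and the extra columns are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for the downloaded table (values are illustrative only)
raw = pd.DataFrame({
    'time': [1.0, 2.0, 3.0],
    'flux': [100.0, 101.0, 99.0],
    'flux_err': [1.0, 1.1, 0.9],
    'quality': [0, 0, 0],
    'cadenceno': [10, 11, 12],
})

# Keep only the needed columns, then rename 'time' to '#time'
trimmed = raw[['time', 'flux', 'flux_err']].rename(columns={'time': '#time'})

print(list(trimmed.columns))
```

With your own data, `raw` would be the result of `pd.read_csv` on your file, and `trimmed` would be written back with `to_csv(..., index=False)` as in the cell above.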

#### Formatting the Data

Now we have the data that we need, but before we run Allesfitter, a couple of adjustments have to be made.

In [None]:
data = pd.read_csv('allesfit/inst.csv')
data

Your data may have some flux and/or flux_err values that are missing. These will look like empty cells in your csv file, or will say NaN in the DataFrame above. The next step is to delete the rows with missing information. Write a piece of code that accomplishes this. If you would like to see a possible solution, see the box below.

In [None]:
### Write your code here to delete rows with NaN values. Check the box below for a hint/solution



In [None]:
#There are many different ways that you could have completed this. Here are two of them:

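#version 1: loop over the rows and drop any row that contains a missing value
for index, row in data.iterrows():
    # pandas reads empty cells as NaN, so test with isna() rather than comparing to ''
    if row.isna().any():
        data = data.drop(index)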


#version 2:
data.dropna(subset=['#time','flux','flux_err'], inplace=True)

#print the data to see if/how it has changed
data

In [None]:
data.to_csv('allesfit/inst.csv', index=False)
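
To see what this step does, it can help to try it on a tiny table first. The sketch below uses made-up values (an assumption for illustration, not real data) with two incomplete rows, removes them with `dropna`, and counts how many missing values remain.

```python
import pandas as pd

nan = float('nan')

# Made-up table: the second row is missing flux, the third is missing flux_err
toy = pd.DataFrame({
    '#time': [1.0, 2.0, 3.0, 4.0],
    'flux': [100.0, nan, 99.0, 101.0],
    'flux_err': [1.0, 1.1, nan, 0.9],
})

# Drop every row that has a missing value in any of the three columns
cleaned = toy.dropna(subset=['#time', 'flux', 'flux_err'])

# Count the missing values left over after cleaning
remaining_missing = int(cleaned.isna().sum().sum())

print(len(toy), len(cleaned), remaining_missing)
```

The same `isna().sum().sum()` check can be run on your own `data` before saving it.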

Next, the flux needs to be normalized. Rather than an absolute flux, this will yield a relative flux. Don't forget to propagate the error.

In [None]:
data = pd.read_csv('allesfit/inst.csv')

reference_flux = data['flux'].median()

data['flux'] = data['flux'] / reference_flux

# propagating the error: dividing by a constant scales the error by the same constant
data['flux_err'] = data['flux_err'] / reference_flux

data.to_csv('allesfit/inst.csv',index=False)
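
As a quick check on the normalization, the sketch below repeats the same steps on a small made-up table (the values are assumptions for illustration only). Like the cell above, it treats the median as an exact constant, so the uncertainty on the median itself is ignored. Under that assumption, the normalized flux has a median of 1 and the fractional error of each point is unchanged.

```python
import pandas as pd

# Made-up fluxes and errors (illustrative values only)
toy = pd.DataFrame({
    'flux': [98.0, 100.0, 102.0],
    'flux_err': [1.0, 2.0, 1.0],
})

# Fractional error of each point before normalizing
relative_err_before = toy['flux_err'] / toy['flux']

# Same steps as the tutorial cell: divide flux and error by the median flux
reference_flux = toy['flux'].median()
toy['flux'] = toy['flux'] / reference_flux
toy['flux_err'] = toy['flux_err'] / reference_flux

# Fractional error of each point after normalizing
relative_err_after = toy['flux_err'] / toy['flux']

# Largest change in fractional error; this should be zero up to rounding
max_change = (relative_err_after - relative_err_before).abs().max()
print(toy['flux'].median(), max_change)
```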

Congratulations!! Your inst.csv file is now ready for use!

Try following this tutorial for your own target, or visit the other tutorials on how to set up for your fit.

This tutorial uses lightkurve (Lightkurve Collaboration, 2018) and pandas (The Pandas Development Team, 2010).