## **Datasets**

A Jupyter notebook covering datasets in PHOEBE. It roughly follows the tutorial provided at https://phoebe-project.org/docs/2.4/.

### Setup

Let's quickly install PHOEBE (if needed), load it and other libraries, set up the logger, and load the default binary Bundle.

In [None]:
# !pip install phoebe

In [2]:
import phoebe as phb
import numpy as np
import matplotlib.pyplot as plt

In [3]:
logger = phb.logger()
bSystem = phb.default_binary()

### What are Datasets

A dataset tells PHOEBE how and at what times to compute the model. It can hold actual observational data, times at which you want to compute a synthetic model, or both.

You need to include a dataset, even if it doesn't contain any observational data, to compute a synthetic model.

To add a dataset, you provide the function in `phb.parameters.dataset` that matches the type of data you are dealing with, along with any observational data.

The full list of datasets is:
1. **lc** - light curves
2. **rv** - radial velocity curves
3. **lp** - spectral line profiles
4. **orb** - orbit/positional data
5. **mesh** - discretized mesh of stars

*The orb and mesh datasets cannot accept actual observations, so they have no `times` parameter. They only have the `compute_times` and `compute_phases` parameters.*
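
As a quick reference, the list and note above can be summarised in a small lookup table. This is only a sketch in plain Python: `DATASET_KINDS` and `time_parameters` are hypothetical names made up for illustration and are not part of PHOEBE. It assumes that every kind other than orb and mesh has a `times` parameter, as the note above implies.

```python
# Dataset kinds from the list above: (description, can hold observations?).
# Per the note above, orb and mesh cannot hold observations.
DATASET_KINDS = {
    'lc': ('light curves', True),
    'rv': ('radial velocity curves', True),
    'lp': ('spectral line profiles', True),
    'orb': ('orbit/positional data', False),
    'mesh': ('discretized mesh of stars', False),
}


def time_parameters(kind):
    """Hypothetical helper: time-like parameter names expected for a kind."""
    _, accepts_observations = DATASET_KINDS[kind]
    names = ['compute_times', 'compute_phases']
    if accepts_observations:
        # Only kinds that can hold observations get a `times` parameter.
        names.insert(0, 'times')
    return names


print(time_parameters('lc'))   # ['times', 'compute_times', 'compute_phases']
print(time_parameters('orb'))  # ['compute_times', 'compute_phases']
```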

### Datasets without Observational Data

The simplest case is a dataset without observational data, used when you only want to compute a synthetic model. You only need to provide an array of times, the type of data, and how to compute it.

Let's add an orbit dataset which will track the positions and velocities of the primary and secondary at the provided times.

In [4]:
bSystem.add_dataset(
    phb.dataset.orb, # Type of dataset - orbital
    compute_times = phb.linspace(0, 10, 20), # Compute times for synthetic data
    dataset = 'orb01', # Name of dataset
    component = ['primary', 'secondary'] # What the dataset applies to - in this case applies to the primary and secondary component
)

<ParameterSet: 50 parameters | contexts: constraint, figure, compute, dataset>

It is better to use `phb.linspace` (or the equivalent PHOEBE functions) rather than `np.linspace`, even though they produce the same values, because the PHOEBE versions are constructed in a way that plays well with the Bundle.

`bSystem.add_dataset` accepts the type of dataset either as a function from `phb.dataset` or as a string giving its name.

If a list of components is not provided to `bSystem.add_dataset`, it will be assumed based on the dataset type. **lc** and **mesh** only attach at the system level (component=None), whilst **rv** and **orb** attach to each star.

In [5]:
bSystem.add_dataset(
    'rv', # Type of dataset - radial velocity
    times = phb.linspace(0, 10, 20), # Times of data - notice it is *times* and not *compute_times*, because rv can have observational data. If there is no data at those times, it will run the synthetic model.
    dataset = 'rv01'
)

<ParameterSet: 83 parameters | contexts: constraint, figure, compute, dataset>

In [6]:
print(bSystem['times@rv01'].components)

['primary', 'secondary']


### Datasets with Observational Data

Datasets with observational data are loaded by passing arrays to `bSystem.add_dataset`. The arrays are attached to the same components as the times. This makes sense for **lc**, which attaches at the system level.

In [7]:
bSystem.add_dataset(
    'lc', # Type of dataset - light curve
    times = [0, 1], # Times where data is
    fluxes = [1, 0.5], # Data
    dataset = 'lc01' 
)

<ParameterSet: 80 parameters | contexts: constraint, figure, compute, dataset>

In [8]:
print(bSystem['fluxes@lc01@dataset'])

Parameter: fluxes@lc01@dataset
                       Qualifier: fluxes
                     Description: Observed flux
                           Value: [1.  0.5] W / m2
                  Constrained by: 
                      Constrains: None
                      Related to: None



Attaching the same data to all components is not always desired, for example with **rv** datasets. For a single-lined RV, where the dataset attaches to only one component, this is fine. For a double-lined RV, however, you need to pass different arrays to the primary and secondary.

This can be done using a dictionary whose keys are the targeted components.

In [9]:
bSystem.add_dataset(
    'rv',
    times = [0, 1], # One time per RV value for each component
    rvs = {
        'primary' : [-3,3], # Data for the primary
        'secondary' : [4, -4] # Data for the secondary
    },
    dataset = 'rv02'
)

<ParameterSet: 49 parameters | contexts: constraint, figure, compute, dataset>

In [10]:
print(bSystem['rvs@rv02@dataset'])

ParameterSet: 2 parameters
         rvs@primary@rv02@dataset: [-3.  3.] km / s
       rvs@secondary@rv02@dataset: [ 4. -4.] km / s
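
When passing per-component dictionaries like this, it is easy to end up with arrays that don't line up with the times. Here is a minimal sanity check in plain Python, assuming each component's array should hold exactly one value per time. `mismatched_components` is a hypothetical helper written for this notebook, not a PHOEBE function.

```python
def mismatched_components(times, values_by_component):
    """Return the components whose array length differs from len(times)."""
    return [
        component
        for component, values in values_by_component.items()
        if len(values) != len(times)
    ]


rvs = {
    'primary': [-3, 3],    # Data for the primary
    'secondary': [4, -4],  # Data for the secondary
}

# Two times and two RVs per component: nothing to report.
print(mismatched_components([0, 1], rvs))       # []

# Three times but only two RVs per component: both are flagged.
print(mismatched_components([0, 0.5, 1], rvs))  # ['primary', 'secondary']
```

Running a check like this before `bSystem.add_dataset` catches length mismatches early.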


If the data comes from a file, you first need to read it into arrays (using something like `np.loadtxt`) before you can pass it to the dataset.
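
As a rough sketch of that step, the cell below reads a three-column text table with `np.loadtxt`. The column layout (time, flux, sigma) and the in-memory `io.StringIO` stand-in for a file are assumptions made for illustration. With a real file you would pass its path instead.

```python
import io

import numpy as np

# Stand-in for a file on disk, so the example runs without any extra files.
raw = io.StringIO(
    "# time[d] flux sigma\n"
    "0.0 1.00 0.01\n"
    "0.5 0.52 0.01\n"
    "1.0 0.99 0.01\n"
)

# unpack=True returns one array per column; lines starting with '#' are skipped.
times, fluxes, sigmas = np.loadtxt(raw, unpack=True)

print(times.shape, fluxes.shape, sigmas.shape)
```

The resulting arrays could then be passed as `times=times, fluxes=fluxes, sigmas=sigmas` to `bSystem.add_dataset('lc', ...)`.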

Datasets also do not accept phases, so you either need to convert between phases and times using `bSystem.to_phase` and `bSystem.to_time`, or you can use `compute_phases` when creating a dataset and flip the dependency over to `compute_times` (discussed in Compute.ipynb).
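
To make the idea of the conversion concrete, here is a plain-Python sketch of folding times into phases and back. It is not PHOEBE's implementation: the helper names `fold_to_phase` and `phase_to_time`, the constant period (no period change), and the choice to wrap phases into [-0.5, 0.5) are all assumptions for illustration. In practice, use `bSystem.to_phase` and `bSystem.to_time`, which read the ephemeris from the Bundle.

```python
def fold_to_phase(times, period, t0=0.0):
    """Fold times (days) into phases in [-0.5, 0.5), assuming a constant period."""
    phases = []
    for t in times:
        phase = ((t - t0) / period) % 1.0  # fraction of a cycle, in [0, 1)
        if phase >= 0.5:
            phase -= 1.0                   # wrap into [-0.5, 0.5)
        phases.append(phase)
    return phases


def phase_to_time(phases, period, t0=0.0, cycle=0):
    """Map phases back to times (days) within the chosen orbital cycle."""
    return [t0 + (phase + cycle) * period for phase in phases]


print(fold_to_phase([0.0, 0.5, 1.5, 2.0], period=2.0))  # [0.0, 0.25, -0.25, 0.0]
print(phase_to_time([-0.25, 0.0, 0.25], period=2.0))    # [-0.5, 0.0, 0.5]
```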

## Light Curve Datasets and Options

As this project will primarily deal with lightcurves, this section is dedicated to mastering lc datasets. Let's quickly wipe the default binary and start fresh.

In [11]:
bSystem = phb.default_binary()

Let's add a lightcurve dataset to the Bundle and view all the parameters it introduced. Some are hidden, so `check_visible=False` needs to be passed.

In [12]:
bSystem.add_dataset('lc')
print(bSystem.get_dataset(kind='lc', check_visible=False))

ParameterSet: 32 parameters
               times@lc01@dataset: [] d
              fluxes@lc01@dataset: [] W / m2
            passband@lc01@dataset: Johnson:V
    intens_weighting@lc01@dataset: energy
       compute_times@lc01@dataset: [] d
C     compute_phases@lc01@dataset: []
       phases_period@lc01@dataset: period
         phases_dpdt@lc01@dataset: dpdt
           phases_t0@lc01@dataset: t0_supconj
        mask_enabled@lc01@dataset: True
         mask_phases@lc01@dataset: []
        solver_times@lc01@dataset: auto
              sigmas@lc01@dataset: [] W / m2
          sigmas_lnf@lc01@dataset: -inf
          pblum_mode@lc01@dataset: component-coupled
     pblum_component@lc01@dataset: primary
       pblum_dataset@lc01@dataset: 
              pbflux@lc01@dataset: 1.0 W / m2
             l3_mode@lc01@dataset: flux
                  l3@lc01@dataset: 0.0 W / m2
             l3_frac@lc01@dataset: 0.0
             exptime@lc01@dataset: 0.0 s
     ld_mode@primary@lc01@dataset: interp
   ld

The important parameters are explained below.

### times
Handles times for observed data. Units are days.

### fluxes
Handles observed fluxes corresponding to a given time. Units are W/m2.

### sigmas
Handles uncertainty on observed fluxes. Units are W/m2.

### compute_times / compute_phases

Times at which `bSystem.run_compute` computes the model. If empty, the `times` parameter is used instead. Units are days (`compute_times`) or dimensionless (`compute_phases`).
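
The fallback described above can be pictured with a couple of lines of plain Python. This only illustrates the rule as stated here. `effective_compute_times` is a hypothetical helper, not how PHOEBE implements it internally.

```python
def effective_compute_times(times, compute_times):
    """Hypothetical helper: use compute_times if given, else fall back to times."""
    if len(compute_times) > 0:
        return list(compute_times)
    return list(times)


observed_times = [0.0, 1.0]

# Empty compute_times: the model is evaluated at the observed times.
print(effective_compute_times(observed_times, []))                # [0.0, 1.0]

# Non-empty compute_times: these take over for the model.
print(effective_compute_times(observed_times, [0.0, 0.25, 0.5]))  # [0.0, 0.25, 0.5]
```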
