# The `JADE` code: Tutorial #2

*Author:* Mara Attia

***

This tutorial focuses on **atmospheric characterization** using the `JADE` code. It assumes you are already acquainted with the basics of the code, i.e., that you have already worked through the first tutorial.

Compared to the first tutorial, the present one makes more advanced use of the `JADE` code. In the terminology of the JADE use cases (see [Attia et al. 2025]()), we will go through **RS1**, **RS2**, and **FA4**. Consequently, we will perform **retrieval modeling** (for the first two use cases) as well as **forward modeling** with **multiple simulations**.

## Use cases RS1, RS2: Retrieval modeling, interior characterization

The `JADE` code can be employed to **constrain the internal structure** of a planet, as well as its **atmospheric contents**, which correspond to the **RS1** and **RS2** use cases, respectively. Both operations are done at the same time by the same routine, `routines/compo_retrieval.py`.

Before diving into the details of how to use this script, let us first clarify what `JADE` can actually constrain. Again, we refer the reader to [Attia et al. (2021)](https://ui.adsabs.harvard.edu/abs/2021A%26A...647A..40A/abstract) for the full details of the implemented physics. The internal structure of a planet, as modeled by `JADE`, is the following:

- An *iron nucleus* ($\alpha$-Fe) at the very core,
- A *silicate mantle* (perovskite) surrounding it,
- A *light atmosphere* (H/He and a trace amount of metals) topping the solid material.

Therefore, starting from observational measurements of the mass and radius of a planet (as well as its orbital parameters), we will **constrain the mass fraction of the different internal structure components** (except for the atmospheric metallicity, fixed by the user), and in doing so **assess degeneracies** between all these parameters.

***

Even if we will not make use of the `routines/main.py` file as would be done for regular simulations, **an input file is still required**. An example one has been provided for this use case in `input/examples/example_rs.txt`. Have a thorough look at it before proceeding. You may notice it is almost the same as the FC example encountered in the first tutorial, with the following main differences:

- `age = 4000.`: we will constrain the interior at the time variable $t_\bullet = 4\,{\rm Gyr}$. The age parameter controls the intrinsic temperature of the planet $T_{\rm int}$ (and its equilibrium temperature $T_{\rm eq}$ in the `stellar_lum = tabular` mode).
- `lazy_init = True`: fast initialization of the `JADE` code. **This setting should always be switched on for this use case, and for this use case only** (unless you really know what you are doing).
- `planet_sma = 0.05`: the semi-major axis of the planet is $a_\bullet = a(t = t_\bullet) = 0.05\,{\rm AU}$.
- `planet_ecc = 0.2`: the eccentricity of the planet is $e_\bullet = e(t = t_\bullet) = 0.2$.

Compared to the first tutorial, the planet is ten times closer-in and eccentric, which typically corresponds to what would happen shortly after the decoupling from a ZLK-inducing perturber. **The planet parameters in the input file correspond here to the system age**, where the retrieval will be applied (currently observed parameters, denoted with a $\bullet$ subscript throughout the present tutorial). In this regard, `Mpl = 0.06` here means that the present mass used for the retrieval is $M_{\rm p,\,\bullet} = 0.06\,M_{\rm J} \simeq 19 M_\oplus$. This is conceptually different from the regular usage of the `JADE` code, in which the orbital parameters and the planet mass in the input file correspond to the starting point of the simulation (at `t_init`, typically after the disk dispersal, like in the first tutorial).

We are aware that the system age is often poorly constrained in the literature. If you have no clue about the age of your star, you can just **set it to a relatively high value (e.g., `age = 6000.`)**, because (i) most observed exoplanets have old hosts, and (ii) $T_{\rm int}$ drops quite quickly (in $\sim 1\,{\rm Gyr}$) to $\lesssim 100\,{\rm K}$ and plateaus at $\lesssim 50\,{\rm K}$ after $4 - 8\,{\rm Gyr}$ ([Mordasini 2020](https://ui.adsabs.harvard.edu/abs/2020A%26A...638A..52M/abstract)), so the exact age has a minimal influence on the internal structure. Nonetheless, **it is important to accurately define `Lbol`** (in `stellar_lum = analytic` mode), since it has a major influence on the retrieval. You can compute it using $L_{\rm bol} = 16 \pi \sigma_{\rm B} \sqrt{1 - e_\bullet^2} a_\bullet^2 T_{\rm eq,\,\bullet}^4$ ([Attia et al. 2021](https://ui.adsabs.harvard.edu/abs/2021A%26A...647A..40A/abstract)), where $\sigma_{\rm B}$ is the Stefan–Boltzmann constant.
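
As a worked example of the formula above, the following sketch evaluates $L_{\rm bol}$ in SI units and converts it to erg/s. The function name `lbol_from_teq` and the example equilibrium temperature are illustrative assumptions and not part of `JADE`. The unit expected by the `Lbol` field is also not stated here, so check the input file documentation before copying a value.

```python
import math

SIGMA_B = 5.670374419e-8   # Stefan-Boltzmann constant [W m^-2 K^-4]
AU      = 1.495978707e11   # Astronomical unit [m]

def lbol_from_teq(sma_au, ecc, teq):
    """Bolometric luminosity [W] from L = 16 pi sigma sqrt(1 - e^2) a^2 Teq^4."""
    a = sma_au*AU
    return 16.*math.pi*SIGMA_B*math.sqrt(1. - ecc**2)*a**2*teq**4

# Hypothetical equilibrium temperature, for illustration only
Teq_example = 1200. #[K]

# Orbit of the RS example: a = 0.05 AU, e = 0.2
Lbol_W   = lbol_from_teq(0.05, 0.2, Teq_example)
Lbol_cgs = Lbol_W*1e7 #[erg/s]
print(f'Lbol = {Lbol_W:.3e} W = {Lbol_cgs:.3e} erg/s')
```

Because $L_{\rm bol} \propto T_{\rm eq,\,\bullet}^4$, a 5% error on the equilibrium temperature translates into roughly a 20% error on the luminosity.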

*Note:* the values set in the `Mcore`, `YHe`, and `fmantle` parameters of the input file do not matter, as they will be constrained by the retrieval. Nonetheless, it is important to fix a `Zmet` value (trace amount, $Z \leq 10^{-4}$, see discussion in [Attia et al. 2025]()).

***

Let us now have a look at `routines/compo_retrieval.py`, which constrains the internal structure using an MCMC exploration (by means of the `emcee` package, [Foreman-Mackey et al. 2013](https://ui.adsabs.harvard.edu/abs/2013PASP..125..306F/abstract)). All you have to do is **fill in the various fields of the `USER INPUT` section**. They have already been prefilled for this tutorial. Again, have a thorough look at the file, especially the `USER INPUT` section and the comments there, before proceeding. In particular, you can see via the `Rp_med`, `Rp_sig_up`, and `Rp_sig_dn` fields that the observational constraint on the planet radius is set to $R_{\rm p,\,\bullet} = 3.8^{+0.2}_{-0.1}\,R_\oplus$.

**Warning:** if you run the retrieval locally, be careful with the number of cores `nc` you use for the MCMC. The latter will call the atmospheric structure integrator within the log-probability function, which is itself parallelized on a certain number of cores defined by the `parallel` setting in the input file. Hence, make sure that using `nc` $\times$ `parallel` cores does not overload your computer. If you are running the retrieval on a cluster, the number of cores you ask for (using, e.g., `NUM_THREADSPROCESSES` with [Slurm](https://slurm.schedmd.com/documentation.html)) should be exactly `nc` $\times$ `parallel`.
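
A quick sanity check of the core budget can be sketched as follows. The helper `total_cores` is a hypothetical name used for illustration, and the values of `nc` and `parallel` are those quoted for the provided example run.

```python
import os

def total_cores(nc, parallel):
    """Total number of cores used by the retrieval: MCMC cores times integrator cores."""
    return nc*parallel

nc, parallel = 2, 5  # values quoted for the provided example run
needed    = total_cores(nc, parallel)
available = os.cpu_count() or 1  # os.cpu_count() may return None
print(f'Requested {needed} cores, {available} available locally.')
if needed > available:
    print('Warning: nc x parallel exceeds the local core count.')
```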

***

Once all the relevant data are entered in the `USER INPUT` section, run the script on the terminal with the input file as an argument (relative to `input/`), after you make sure you are in the `routines/` folder.

    python compo_retrieval.py examples/example_rs.txt

Various information will be printed to standard output. Depending on the requested number of samples (defined by the `ns` setting) and workers (`nw`), the MCMC can take a long time (a few hours, even days, so it is not a bad idea to run it on a cluster). You can also run it in successive chunks through the `reuse` setting. Since the intention of this tutorial is pedagogical, the results are already provided in `saved_data/examples/example_rs/`. In particular, the retrieved values are in `fit_results.log` within the latter folder.

Even though the `compo_retrieval.py` routine conveniently lets you plot the results (using the `chain_path` and `corner_path` settings), we will do it manually here by loading and manipulating the MCMC backend `mcmc.h5` in the aforementioned folder, in order to break down the different steps (the routine essentially does the same thing as what we will do here). You could also choose to proceed in this way to have more control over the postprocessing and plotting stages.

In [None]:
# Importing necessary packages

import emcee
import corner
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Loading the MCMC results

#--------------------------------
# Use the same values for these 5 variables as in 'compo_retrieval.py'

mcmc_path = '../saved_data/examples/example_rs/mcmc.h5'
nw        = 16
Rp_med    = 3.8
Rp_sig_up = 0.2
Rp_sig_dn = 0.1

#--------------------------------
# Constructing the sampler

NDIM    = 3 
logP    = lambda x:x #Dummy function just to initialize the sampler
backend = emcee.backends.HDFBackend(mcmc_path)
sampler = emcee.EnsembleSampler(nw, NDIM, logP, backend=backend)
samples = sampler.get_chain()

In [None]:
# Postprocessing

#--------------------------------
# Burn-in phase (half of the samples)

ns = backend.iteration
nb = ns//2
flat_samples = sampler.get_chain(discard=nb, flat=True)
print(f'Applying burn-in phase (discarding first {nb} iterations)... Shape after burn-in/flattening: \
{flat_samples.shape}.')

#--------------------------------
# Reformatting the samples to have the following shape

#... clean_samples[:, 0]: atmosphere mass fraction
#... clean_samples[:, 1]: silicate mantle mass fraction
#... clean_samples[:, 2]: iron nucleus mass fraction
#... clean_samples[:, 3]: atmospheric helium mass fraction
#... clean_samples[:, 4]: computed planet radius (returned as a blob in the log-probability function of the MCMC)

clean_samples        = np.zeros([flat_samples.shape[0], flat_samples.shape[1] + 2])
clean_samples[:, :2] = flat_samples[:, :2]
clean_samples[:, 2]  = 1. - (flat_samples[:, 0] + flat_samples[:, 1])
clean_samples[:, 3]  = flat_samples[:, 2]
clean_samples[:, 4]  = sampler.get_blobs(discard=nb, flat=True)

#--------------------------------
# Cleaning nonphysical configurations

clean_samples = clean_samples[(clean_samples[:, 2] >= 0.) & (clean_samples[:, 4] > -np.inf)]
print(f'Cleaning nonphysical configurations... Shape after cleaning/reformatting: {clean_samples.shape}.')

In [None]:
# Plotting the chains

#... Burn-in samples are in red
#... Kept samples are in blue

fig, axes = plt.subplots(NDIM, figsize=(10, 1.5*NDIM), sharex=True, dpi=300)

labels = [r'$f_{\rm H/He}$', r'$f_{\rm Si}$', '$Y$']

for i in range(NDIM):
    ax = axes[i]
    x = np.arange(1, ns + 1)
    y = samples[:, :, i]
    ax.plot(x[:nb + 1], y[:nb + 1], c='tomato',     alpha=0.3)
    ax.plot(x[nb:],     y[nb:],     c='dodgerblue', alpha=0.3)
    ax.set_xlim(1, ns)
    ax.set_ylabel(labels[i])

axes[-1].set_xlabel('Step Number')

fig.tight_layout()
plt.show()

In [None]:
# Plotting the corner plot

#... Medians and +/- 1 sigma envelopes, as derived from the retrieval, are the black dashed vertical lines
#... Median of the observed radius is the orange vertical line (bottom right)
#... +/- 1 sigma envelope of the observed radius is the two orange dotted vertical lines (bottom right)

fig, axes = plt.subplots(NDIM + 2, NDIM + 2, figsize=(10, 10), dpi=300)
fig = corner.corner(clean_samples, quantiles=(0.159, 0.5, 0.841), fig=fig, plot_datapoints=False, verbose=False)
ax = np.array(axes).reshape((NDIM + 2, NDIM + 2))

ax[-1][-1].vlines(Rp_med, 0, 1, transform=ax[-1][-1].get_xaxis_transform(), colors='orange')
ax[-1][-1].vlines([Rp_med - Rp_sig_dn, Rp_med + Rp_sig_up], 0, 1, transform=ax[-1][-1].get_xaxis_transform(), 
                  colors='orange', linestyles='dotted')

for _ax in ax.flatten():
    _ax.tick_params(rotation=0.)

labels = [r'$f_{\rm H/He}$', r'$f_{\rm Si}$', r'$f_{\rm Fe}$', r'$Y$', r'$R_{\rm p}$ [$R_\oplus$]']
for i in range(NDIM + 1):
    ax[-1, i].set_xlabel(labels[i])
    ax[i + 1, 0].set_ylabel(labels[i + 1])

fig.tight_layout()
plt.show()

You can see that **the atmospheric mass fraction converged quite satisfactorily** to $f_{\rm H/He} = 0.091^{+0.029}_{-0.034}$ (value retrieved from `fit_results.log`). Its posterior is not exactly Gaussian because the constraining planet radius itself follows a skewed distribution. The silicate mass fraction $f_{\rm Si}$ (and consequently the iron mass fraction $f_{\rm Fe}$), on the other hand, did not converge: their posteriors are simply flat. Additionally, the retrieval seems to favor a high atmospheric helium mass fraction, whose posterior follows a ramp shape dropping at $Y = 0.4$ due to the uniform prior $\mathcal{U}(0,\,0.4)$ imposed by the `YBOUNDS` setting in `compo_retrieval.py`.

A good diagnostic for convergence, aside from visual inspection of the chains and corner plot, is how well **the posterior planet radius distribution matches its observational constraint**. 
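
This comparison can be made quantitative with a few lines of `numpy`. The sketch below uses a synthetic array as a stand-in for the radius column `clean_samples[:, 4]`, so the printed posterior is not an actual `JADE` result. The helper name `summarize` is hypothetical.

```python
import numpy as np

def summarize(values):
    """Median and +/- 1 sigma envelope from the 15.9th, 50th, and 84.1th percentiles."""
    lo, med, hi = np.percentile(values, [15.9, 50., 84.1])
    return med, hi - med, med - lo

# Synthetic stand-in for clean_samples[:, 4] (NOT actual JADE output)
rng = np.random.default_rng(0)
fake_radii = rng.normal(3.8, 0.15, size=20000)

Rp_med, Rp_sig_up, Rp_sig_dn = 3.8, 0.2, 0.1  # observational constraint [R_Earth]
med, sig_up, sig_dn = summarize(fake_radii)
print(f'Posterior radius: {med:.2f} +{sig_up:.2f} -{sig_dn:.2f} R_Earth')
print(f'Observed radius:  {Rp_med:.2f} +{Rp_sig_up:.2f} -{Rp_sig_dn:.2f} R_Earth')
```

In the notebook, replacing `fake_radii` with `clean_samples[:, 4]` gives the posterior radius to compare against `Rp_med`, `Rp_sig_up`, and `Rp_sig_dn`. The percentiles match the `quantiles` passed to `corner.corner` above.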

*Note:* for this example, the provided MCMC chains in `mcmc.h5` have `ns = 5000` samples and `nw = 16` workers. It took about 30 hours to run with `nc = 2` $\times$ `parallel = 5` cores. You could try to run it by yourself for more iterations to see how the code works. All the postprocessing and plotting code shown above will still work independently of the number of samples.

## Use case FA4: Forward modeling, maximum initial mass

Now that we have a likely composition for our fictitious planet, we can go a step further and **determine its maximum initial mass**, which represents the **FA4** use case.

To this effect, we will perform multiple simulations where evaporation is the exclusive operating process. They will be launched on a range of different initial planet masses, but with an unchanged core (iron nucleus + silicate mantle, derived from the previously investigated RS1) and atmospheric composition (RS2). **They will hence differ by their atmosphere mass**. We will assume a fixed orbit for all of them, namely the one corresponding to the present-day configuration.

In doing so, the initial planet mass yielding a **mass compatible with the current one at the upper limit of the system age** can be interpreted as the maximum initial mass. Indeed, an invariant close-in orbit during the entire planet's lifetime (*in situ* formation or early-on disk migration) represents the locus of maximum possible erosion, and any initial mass that is too high to evaporate enough to match the present one can definitely be excluded.

*Note:* technically, this interpretation is correct only if the migration occurs inward. In practice, this condition is always satisfied for our close-in targets.

***

Let us say that the age of the system including its error bar is $t_\bullet = 4 \pm 1\,{\rm Gyr}$, and that the planet mass including its error bar is $M_{\rm p,\,\bullet} = 0.06 \pm 0.01\,M_{\rm J}$. The total core mass follows from the previously derived atmospheric mass fraction, $M_{\rm core} = (1 - f_{\rm H/He})M_{\rm p,\,\bullet}$. We will fix the atmospheric helium mass fraction to $Y = 0.4$ given the retrieval results, and the silicate-to-iron mass ratio to 2:1 (similar to Earth) as the retrieval shows no preference for any value.
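
The arithmetic behind these choices is spelled out below as a minimal sketch. Interpreting the 2:1 silicate-to-iron ratio as a silicate share of 2/3 of the core mass is our assumption; it appears to be what the `fmantle` value of `0.66` used in the FA4 input encodes.

```python
MJ2E = 317.8284065946748 # [M_Jup] to [M_Earth], as used later in this tutorial

Mp_now = 0.06   # present-day planet mass [M_Jup]
f_atm  = 0.091  # retrieved atmospheric mass fraction

Mcore = (1. - f_atm)*Mp_now  # total core mass [M_Jup]
# Assumption: a 2:1 silicate-to-iron mass ratio means 2/3 of the core is silicate
f_sil = 2./3.

print(f'Mcore = {Mcore:.5f} M_Jup = {Mcore*MJ2E:.2f} M_Earth')
print(f'Silicate share of the core = {f_sil:.3f}')
```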

For the sake of this tutorial, we will explore initial masses from $0.08\,M_{\rm J}$ up to $3 \times M_{\rm p,\,\bullet} = 0.18\,M_{\rm J}$, with a step of $0.02\,M_{\rm J}$, i.e., the full error bar of $M_{\rm p,\,\bullet}$, since this is the precision of the observational measurement here.

One could create the various input files by hand, but it would be tedious to do, especially for more complex cases. Instead, we will make use of the `utils/exploration.py` util, which has been designed specifically for this purpose. **Make sure you get acquainted with its contents, including the documentation within, before proceeding**. You can directly edit this file every time you need it, or duplicate it under a different name for every exploration (e.g., `exploration_fa4.py` here).

The two most important settings are the `constant_param` dictionary, containing the **fixed parameters** of the various input files to create, and the `variable_param` list, containing the **parameters that will vary** from an input file to another. According to the aforementioned information, and to ensure consistency with the RS example, they should be set to the following:

    constant_param = {
	
        'output_freq' :      '100',
        'output_npts' :      '1000',
        'age' :              '5000.',
        't_init' :           '10.',

        'dyn' :              'False',
        'orderdyn' :         '4',
        'perturber' :        'False',
        'tides' :            'False',
        'relat' :            'False',
        'roche' :            'False',

        'atmo' :             'True',
        'atmo_acc' :         'False',
        'atmo_grid_path' :   '',
        'evap' :             'True',
        't_atmo' :           '',
        'parallel' :         '9',

        'lazy_init' :        'False',
        'reuse' :            'False',
        'simul' :            'True',

        'Ms' :               '1.',
        'Rs' :               '1.',
        'ks' :               '0.01',
        'Qs' :               '1e5',
        'alphas' :           '0.08',
        'spins' :            '85.',
                  
        'Mcore' :            '0.06*(1-0.091)',
        'Mpl' :              '',
        'Rpl' :              '',
        'kpl' :              '0.25',
        'Qpl' :              '1e4',
        'alphapl' :          '0.25',
        'spinpl' :           '55.',
                  
        'stellar_lum' :      'analytic',
        'stellar_lum_path' : '',
        'Lbol' :             '4e33',
        'LX_Lbol_sat' :      '',
        'tau_X_bol_sat' :    '',
        'alpha_X_bol' :      '',

        'YHe' :              '0.4',
        'Zmet' :             '0.0001',
        'fmantle' :          '0.66',
                  
        'Mpert' :            '1.',

        'planet_sma' :       '0.05',
        'planet_ecc' :       '0.2',
        'planet_incl' :      '90.',
        'planet_lambd' :     '0.',
        'planet_omega' :     '90.',
        'planet_Omega' :     '90.',
                  
        'pert_sma' :         '10.',
        'pert_ecc' :         '0.00001',
        'pert_incl' :        '15.',
        'pert_lambd' :       '0.',
        'pert_omega' :       '90.',
        'pert_Omega' :       '90.',

    }
    
    variable_param = [{'Mpl':['0.08', '0.10', '0.12', '0.14', '0.16', '0.18']}]
    
All values, in either setting, should **always be strings**.

*Note:* you can put simple math formulae in the input files (as done in `Mcore`) as long as there are no blank spaces in them.
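
For longer grids, the list of string values can be generated rather than typed by hand. This is only a sketch: the variable names `M_start`, `M_step`, and `n_values` are hypothetical and not part of `exploration.py`.

```python
# Hypothetical helper variables: build the list of initial masses as strings
M_start, M_step, n_values = 0.08, 0.02, 6  # [M_Jup]
Mpl_values = [f'{M_start + k*M_step:.2f}' for k in range(n_values)]

variable_param = [{'Mpl': Mpl_values}]
print(variable_param)
```

Formatting with a fixed number of decimals avoids floating-point artifacts (such as `0.14000000000000001`) leaking into the input files.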

For consistency with the other example input files, we will specify

- `input_tree = 'examples/'`, and
- `name = 'example_fa4'`.

This way, the input files will be stored in `input/examples/example_fa4/` and will be called `example_fa4_000001.txt`, `example_fa4_000002.txt`, etc. You may notice that you can thus create up to 999 999 input files with this util, but you can always modify the script if you require more. It can seem like a big number (especially given that we will only create six input files now), but it is needed for large parameter space explorations (such as in [Attia et al. 2025]()).

***

Once you are all set, go to the `utils/` folder and run the exploration script:

    python exploration.py
    
The standard output will inform you about the location of the created input files. The util also creates **an info file** (in the same folder as the multiple input files), detailing the fixed and variable parameters within each input file.

Then, run the six simulations sequentially, after you make sure you are in the `routines/` folder:

    python main.py examples/example_fa4/example_fa4_000001.txt
    python main.py examples/example_fa4/example_fa4_000002.txt
    [...]
    python main.py examples/example_fa4/example_fa4_000006.txt
    
Each run should not take more than a few minutes.
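
Instead of typing the six commands, the runs can be chained from Python. The sketch below only builds and prints the commands by default; the helper names `build_commands` and `run_all` are hypothetical, and the script must be launched from the `routines/` folder for the actual runs to work.

```python
import subprocess
import sys

def build_commands(n_sims, tree='examples/example_fa4', name='example_fa4'):
    """Return one 'python main.py <input>' command per simulation."""
    return [[sys.executable, 'main.py', f'{tree}/{name}_{i:06d}.txt']
            for i in range(1, n_sims + 1)]

def run_all(commands, dry_run=True):
    """Run the commands one after the other; only print them if dry_run is True."""
    for cmd in commands:
        print(' '.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing run

commands = build_commands(6)
run_all(commands, dry_run=True)  # set dry_run=False from within routines/
```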

We will now analyze the results, using `utils/output.py` like in the first tutorial.

In [None]:
# Importing the output util
import sys
sys.path.insert(1, '../utils/')
from output import JADE_output

# Unit conversion constant
MJ2E = 317.8284065946748 #[M_Jup] to [M_Earth]

In [None]:
# We will store the multiple simulations in this list
sims_fa4 = []

# Number of simulations
n_sims = 6

# Iterating over the simulations
for i in range(1, n_sims + 1):
    
    # Defining the input file and output folder paths
    input_file_fa4    = f'../input/examples/example_fa4/example_fa4_{i:06d}.txt'
    output_folder_fa4 = f'../saved_data/examples/example_fa4/example_fa4_{i:06d}/'

    # Creating a JADE_output instance
    sim_fa4 = JADE_output(txt=input_file_fa4, npz=output_folder_fa4, verbose=False)

    # Setting a uniform time variable
    sim_fa4.set_time()
    
    # Storing the JADE_output instance
    sims_fa4.append(sim_fa4)

In [None]:
# We will store the multiple parameters to be plotted in these lists
ts_fa4  = [] #Time [Gyr]
Mps_fa4 = [] #Mass [M_Earth]

# Iterating over the simulations
for sim_fa4 in sims_fa4:
    
    # Extracting parameters
    t_fa4  = sim_fa4.t*1e-9  #Time [Gyr]
    Mp_fa4 = sim_fa4.Mp*MJ2E #Mass [M_Earth]
    
    # Storing parameters
    ts_fa4.append(t_fa4)
    Mps_fa4.append(Mp_fa4)
    
# Storing the core mass [M_Earth] (identical in all simulations, so the last one is used)
Mc_fa4 = sim_fa4.Mc*MJ2E

In [None]:
# Observational constraints

from matplotlib.patches import Rectangle

Mp_med = 0.06*MJ2E #Median of the mass [M_Earth]
Mp_sig = 0.01*MJ2E #Standard deviation of the mass [M_Earth]
t_med  = 4.        #Median of the age [Gyr]
t_sig  = 1.        #Standard deviation of the age [Gyr]

In [None]:
# Plotting

fig, ax = plt.subplots(figsize=(5, 4), dpi=300)

ax.axhline(Mc_fa4, c='black', ls='--', label=r'$M_{\rm core}$')
ax.add_patch(Rectangle((t_med - t_sig, Mp_med - Mp_sig), 2*t_sig, 2*Mp_sig, fc='gold', alpha=.3))
ax.plot([t_med - t_sig, t_med + t_sig], [Mp_med, Mp_med], c='orange', ls=':', label=r'$M_{\rm p,\,\bullet}$')

for i in range(n_sims):
    c     = 'red' if i == 2 else 'darkblue'
    ls    = '-.'  if i == 2 else '-'
    alpha = 1.    if i == 2 else 1. - i/n_sims
    ax.plot(ts_fa4[i], Mps_fa4[i], c=c, ls=ls, alpha=alpha)

ax.legend()
    
ax.set_xlabel('Time [Gyr]')
ax.set_ylabel(r'$M_{\rm p}$ [$M_\oplus$]')

fig.tight_layout()
plt.show()

Upon visual inspection, one can see that the **two lowest mass planets evaporated too much** (already below the upper limit of the currently observed mass $M_{\rm p,\,\bullet}$ at the upper limit of the system age $t_\bullet$) while the **four highest mass planets did not evaporate enough** (above that threshold). A conservative maximum initial mass is consequently the red dash–dotted simulation, i.e., $M_{\rm p,\,0,\,max} = 0.12\,M_{\rm J} \simeq 38\,M_\oplus$. One could for example confidently conclude that our example mini-Neptune **could never have been a Saturn-mass planet** (let alone a Jupiter analog).
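
The visual criterion can also be checked numerically by interpolating each track at the upper limit of the age and comparing it to the upper limit of the mass. The sketch below runs on toy linear tracks, not on actual `JADE` output, and the helper name `first_too_massive` is hypothetical.

```python
import numpy as np

def first_too_massive(ts, Mps, t_max, Mp_max):
    """Index of the first track still above Mp_max at t_max, or None.

    Tracks must be ordered by increasing initial mass, each with increasing time.
    """
    final = np.array([np.interp(t_max, t, Mp) for t, Mp in zip(ts, Mps)])
    above = np.flatnonzero(final > Mp_max)
    return int(above[0]) if above.size else None

# Toy linear tracks in M_Jup (NOT actual JADE output), for illustration only
M0s  = [0.08, 0.10, 0.12, 0.14]
ends = [0.056, 0.065, 0.075, 0.10]
t    = np.linspace(0., 5., 6)  # [Gyr]
ts   = [t]*len(M0s)
Mps  = [np.linspace(M0, Mf, t.size) for M0, Mf in zip(M0s, ends)]

idx = first_too_massive(ts, Mps, t_max=5., Mp_max=0.07)
print(f'Conservative maximum initial mass: {M0s[idx]} M_Jup')
```

With the notebook variables, the equivalent call would be `first_too_massive(ts_fa4, Mps_fa4, t_med + t_sig, Mp_med + Mp_sig)`, provided the simulations are stored by increasing initial mass, as is the case here.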

*Note:* one could try to refine the coarse $M_{\rm p,\,0}$ grid in order to reach a simulation that ends as close to the top right corner of the orange shaded rectangle as possible, which would improve the accuracy of the derived $M_{\rm p,\,0,\,max}$. This can be conveniently done using the same `exploration.py` file, by adding more values **at the end** of the `Mpl` list inside `variable_param`. The util will **create new input files accordingly while keeping the old ones intact**, and update the info file. This is also how more complex parameter space explorations, with several dimensions, can be iteratively refined. Nonetheless, we recall that more accuracy in this case might not be well-motivated given the observational precision on $M_{\rm p,\,\bullet}$.

***

**Congratulations, you have reached the end of this tutorial.** We hope it has made clear how to use the `JADE` code for atmospheric characterization, and given you a first intuition of its exploration capabilities.