# Cosmological structure

This notebook shows how to use `twinLab` to create a surrogate model for the cosmological matter power spectrum. This quantity can be measured from observations of galaxy clustering or weak gravitational lensing, and it can also be calculated theoretically using $N$-body simulations.

## Configuration

We need to supply our credentials to use `twinLab`.

In [None]:
# Standard imports
import os

# Third-party imports
import dotenv

In [None]:
# User and group details
USER_NAME = "mead"
GROUP_NAME = "digilab"

# Campaign
CAMPAIGN_ID = "cosmology"

# Data files
TRAINING_DATA = "cosmology.csv"
EVALUATION_DATA = "eval.csv"
GRID_DATA = "grid.csv"

# Directories
CAMPAIGN_DIR = "./resources/campaigns/cosmology"
DATASETS_DIR = "./resources/datasets"

In [None]:
# File paths
DATASET_PATH = os.path.join(DATASETS_DIR, TRAINING_DATA)
EVALUATION_PATH = os.path.join(CAMPAIGN_DIR, EVALUATION_DATA)
GRID_PATH = os.path.join(CAMPAIGN_DIR, GRID_DATA)

# Write to screen
print(f"Grid........ {GRID_PATH}")
print(f"Dataset..... {DATASET_PATH}")
print(f"Evaluate.... {EVALUATION_PATH}")
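
Before uploading anything, it can help to confirm that these files are actually present. The following is a minimal sketch using only the standard library; `missing_files` is a hypothetical helper introduced here for illustration and is not part of `twinLab`. The paths are rebuilt inside the block so that it runs on its own.

```python
import os


def missing_files(paths):
    """Return the subset of `paths` that do not exist as files on disk."""
    return [path for path in paths if not os.path.isfile(path)]


# The same paths as above, rebuilt here so the sketch is self-contained
paths = [
    os.path.join("./resources/datasets", "cosmology.csv"),
    os.path.join("./resources/campaigns/cosmology", "eval.csv"),
    os.path.join("./resources/campaigns/cosmology", "grid.csv"),
]

# Report anything that is missing; prints nothing if all files are present
for path in missing_files(paths):
    print(f"Missing file: {path}")
```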

Create a `.env` file from `.env.example`, and fill in your `USER_NAME` and `GROUP_NAME`.

In [None]:
!cp .env.example .env

dotenv_file = dotenv.find_dotenv()
_ = dotenv.set_key(dotenv_file, "USER_NAME", USER_NAME)
_ = dotenv.set_key(dotenv_file, "GROUP_NAME", GROUP_NAME)

### Library

Import the `twinLab` client with:

In [None]:
# Standard imports
from pprint import pprint

# Third-party imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# twinLab
import twinlab as tl

Ensure that the correct group and user names are reported.
These are used to track client usage.

## Run

### Upload dataset

We'll use the CSV data in `resources/datasets/cosmology.csv` to train our emulator:

In [None]:
tl.upload_dataset(DATASET_PATH)

**NOTE:** If your dataset is larger than 6 MB, you should use `tl.upload_big_dataset(DATASET_PATH)` instead.
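
The choice between the two upload functions can be made from the file size. This is a sketch only: `needs_big_upload` is a hypothetical helper, and reading the limit as 6 × 1024 × 1024 bytes is an assumption, since the exact definition of the threshold is not stated here.

```python
import os

# Assumed interpretation of the 6 MB limit (binary megabytes)
SIZE_LIMIT_BYTES = 6 * 1024 * 1024


def needs_big_upload(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

With this helper, one would call `tl.upload_big_dataset(DATASET_PATH)` when `needs_big_upload(DATASET_PATH)` is true, and `tl.upload_dataset(DATASET_PATH)` otherwise.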

### List datasets

Check which datasets are available to train with:

In [None]:
tl.list_datasets()

### View dataset statistics

You can query the dataset to provide a statistical summary:

In [None]:
tl.query_dataset(TRAINING_DATA)
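
The exact form of the summary returned by `tl.query_dataset` is not shown here. For a local cross-check of a CSV file, `pandas` offers a comparable per-column summary through `DataFrame.describe`. The sketch below uses a small synthetic table; the column names are borrowed from this notebook, but the value ranges are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two input columns (ranges are illustrative only)
rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({
    "Omega_c": rng.uniform(0.2, 0.3, size=100),
    "h": rng.uniform(0.6, 0.8, size=100),
})

# Per-column count, mean, standard deviation, minimum, quartiles and maximum
summary = df_demo.describe()
print(summary)
```

For the real data, the same call would be applied to `pd.read_csv(DATASET_PATH)`.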

### Training

Set the emulator training parameters:

In [None]:
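# Set parameters
power_ratio = True  # Outputs are the ratio P(k)/P_lin(k) rather than P(k) (sets the plot labels below)
power_log = True  # Outputs are stored as logarithms and are exponentiated before plotting
nk = 100  # Number of k values, with one output column per value
inputs = ["Omega_c", "Omega_b", "h", "ns", "sigma_8"]
outputs = [f"k{i}" for i in range(nk)]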
params = {
    "filename": TRAINING_DATA,
    "inputs": inputs,
    "outputs": outputs,
    "decompose_outputs": True,
    "output_explained_variance": 0.999,
    "train_test_split": 900,
}
pprint(params, compact=True, sort_dicts=False)
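
The `decompose_outputs` and `output_explained_variance` settings suggest that the 100 output columns are compressed into a smaller number of components before training. How `twinLab` does this internally is not described here, so the following is only a sketch of the general idea, assuming a standard principal-component truncation: keep the smallest number of components whose cumulative explained variance reaches the threshold. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def n_components_for_variance(Y, threshold=0.999):
    """Smallest number of principal components of `Y` (samples x outputs)
    whose cumulative explained-variance fraction reaches `threshold`."""
    Y_centred = Y - Y.mean(axis=0)
    singular_values = np.linalg.svd(Y_centred, compute_uv=False)
    variance = singular_values**2
    cumulative = np.cumsum(variance) / np.sum(variance)
    # First index at which the cumulative fraction reaches the threshold
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, len(cumulative))


# Synthetic outputs: 200 samples of 100 values built from 3 underlying modes
rng = np.random.default_rng(seed=1)
modes = rng.normal(size=(3, 100))
weights = rng.normal(size=(200, 3))
Y = weights @ modes

print(n_components_for_variance(Y, threshold=0.999))
```

Because smooth power spectra are strongly correlated between neighbouring $k$ values, a truncation of this kind can reduce the number of quantities that need to be emulated.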

Train the emulator:

In [None]:
tl.train_campaign(params, CAMPAIGN_ID, server="cloud")

Check which campaigns are ready with:

In [None]:
tl.list_campaigns()

View the metadata of an emulator with:

In [None]:
response = tl.query_campaign(CAMPAIGN_ID)
pprint(response, compact=True)

### Sample

Sample the trained emulator with:

In [None]:
df_eval_mean, df_eval_std = tl.sample_campaign(EVALUATION_PATH, CAMPAIGN_ID)
df_train_mean, df_train_std = tl.sample_campaign(DATASET_PATH, CAMPAIGN_ID)

Read in the training data, the evaluation data, and the grid of $k$ values on which $P(k)$ is evaluated:

In [None]:
df_train = pd.read_csv(DATASET_PATH)
df_grid = pd.read_csv(GRID_PATH)
df_eval = pd.read_csv(EVALUATION_PATH)

In [None]:
# Plotting parameters
nsig = [1, 2]
alpha_data = 0.5
alpha_model = 0.5
plot_band = True
npow = 10

# Plot power
plt.subplots(1, npow, figsize=(35., 3.), sharex=True, sharey=True)
grid = df_grid.iloc[0].values
for i in range(npow):
    plt.subplot(1, npow, i+1)
    if power_ratio:
        plt.axhline(1., color="black", lw=0.5)
    eval_true = df_eval[outputs].iloc[i].values  # Avoid shadowing the built-in eval()
    mean = df_eval_mean.iloc[i].values
    err = df_eval_std.iloc[i].values
    if power_log:
        eval_true, mean = np.exp(eval_true), np.exp(mean)
    plt.plot(grid, eval_true, color="black", alpha=alpha_data)
    if plot_band:
        for sig in nsig:
            if power_log:
                ymin, ymax = np.exp(-sig*err), np.exp(sig*err)
                ymin, ymax = mean*ymin, mean*ymax
            else:
                ymin, ymax = -sig*err, sig*err
                ymin, ymax = mean+ymin, mean+ymax
            plt.fill_between(grid, ymin, ymax, color="blue", lw=0., alpha=alpha_model/sig)
    else:
        plt.plot(grid, mean, color="blue", alpha=alpha_model)
    plt.xlabel(r"$k/h\mathrm{Mpc}^{-1}$")
    plt.xscale("log")
    if i==0: 
        if power_ratio:
            plt.ylabel(r"$P(k)/P^\mathrm{lin}(k)$")
        else:
            plt.ylabel(r"$P(k)/(h^{-1}\mathrm{Mpc})^3$")
    plt.yscale("log")
plt.show()
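
When `power_log` is true, the emulator mean and standard deviation refer to the logarithm of the power, so the uncertainty band in the plot above is multiplicative: $\exp(\mu \pm n\sigma) = \exp(\mu)\,\exp(\pm n\sigma)$. The sketch below isolates that conversion; `log_space_band` is a hypothetical helper and the numbers are made up for illustration.

```python
import numpy as np


def log_space_band(log_mean, log_std, n_sigma=1):
    """Convert a mean and standard deviation of ln(P) into a band on P."""
    centre = np.exp(log_mean)
    lower = centre * np.exp(-n_sigma * log_std)
    upper = centre * np.exp(n_sigma * log_std)
    return lower, centre, upper


# Illustrative values only
log_mean = np.log(np.array([1.0, 2.0, 4.0]))
log_std = np.array([0.1, 0.1, 0.2])
lower, centre, upper = log_space_band(log_mean, log_std, n_sigma=2)
print(lower, centre, upper)
```

The resulting band is asymmetric about the centre in linear space, and if the uncertainty is Gaussian in log space then the centre is the median of the implied log-normal distribution rather than its mean.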

In [None]:
# Parameters
nsig = [1, 2]
alpha_data = 0.5
alpha_model = 0.5
color_model = "blue"
plot_train = True
alpha_train = 0.5
color_train = "red"
dr = 2.
plot_band = True
ncos = 50
nrow = 5

# Calculations
ncol = ncos//nrow

# Plot
grid = df_grid.iloc[0].values
plt.subplots(nrow, ncol, figsize=(30, 2.5*nrow), sharex=True, sharey=True)
for i in range(ncos):
    plt.subplot(nrow, ncol, i+1)
    plt.axhline(0., color="black", lw=1)  # Zero-residual line, drawn after selecting the panel
    eval_true = df_eval[outputs].iloc[i].values  # Avoid shadowing the built-in eval()
    eval_mean = df_eval_mean.iloc[i].values
    eval_err = df_eval_std.iloc[i].values
    train = df_train[outputs].iloc[i].values
    train_mean = df_train_mean.iloc[i].values
    train_err = df_train_std.iloc[i].values
    if power_log:
        eval_true, eval_mean = np.exp(eval_true), np.exp(eval_mean)
        train, train_mean = np.exp(train), np.exp(train_mean)
    if plot_band:
        for sig in nsig:
            if power_log:
                ymin, ymax = np.exp(-sig*eval_err), np.exp(sig*eval_err)
                ymin, ymax = 100.*((eval_mean*ymin)/eval_true-1.), 100.*((eval_mean*ymax)/eval_true-1.)
            else:
                ymin, ymax = -sig*eval_err, sig*eval_err
                ymin, ymax = 100.*((eval_mean+ymin)/eval_true-1.), 100.*((eval_mean+ymax)/eval_true-1.)
            plt.fill_between(grid, ymin, ymax, color=color_model, lw=0, alpha=alpha_model/sig)
            if plot_train:
                if power_log:
                    ymin, ymax = np.exp(-sig*train_err), np.exp(sig*train_err)
                    ymin, ymax = 100.*((train_mean*ymin)/train-1.), 100.*((train_mean*ymax)/train-1.)
                else:
                    ymin, ymax = -sig*train_err, sig*train_err
                    ymin, ymax = 100.*((train_mean+ymin)/train-1.), 100.*((train_mean+ymax)/train-1.)
                plt.fill_between(grid, ymin, ymax, color=color_train, lw=0, alpha=alpha_train/sig)
    else:
        y = 100.*(eval_mean/eval_true-1.)
        plt.plot(grid, y, color=color_model, alpha=alpha_model)
        if plot_train:
            y = 100.*(train_mean/train-1.)
            plt.plot(grid, y, color=color_train, alpha=alpha_train)
    if i//ncol==nrow-1: plt.xlabel(r"$k/h\mathrm{Mpc}^{-1}$")
    plt.xscale("log")
    if i%ncol==0: plt.ylabel(r"$P_\mathrm{model}(k)/P_\mathrm{truth}(k)-1$ [%]")
plt.tight_layout()
plt.show()
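
The residual panels can be condensed into two numbers: the typical percentage error, and the fraction of points at which the truth lies inside the predicted band. For well-calibrated Gaussian uncertainties, the latter should be roughly 68% at $1\sigma$ and 95% at $2\sigma$. The helpers below are hypothetical and the arrays are toy values, not results from this emulator.

```python
import numpy as np


def percent_residual(model, truth):
    """Percentage residual, matching the quantity on the y-axis above."""
    return 100.0 * (model / truth - 1.0)


def coverage(log_model, log_std, log_truth, n_sigma=1):
    """Fraction of points where the truth lies within n_sigma predicted
    standard deviations of the predicted mean (all in log space)."""
    inside = np.abs(log_truth - log_model) <= n_sigma * log_std
    return float(np.mean(inside))


# Toy values for illustration only
log_model = np.zeros(4)
log_std = np.full(4, 0.1)
log_truth = np.array([0.05, -0.05, 0.15, 0.25])

print(coverage(log_model, log_std, log_truth, n_sigma=1))
print(coverage(log_model, log_std, log_truth, n_sigma=2))
print(percent_residual(np.exp(log_model), np.exp(log_truth)))
```

For the real data, one would pass whole arrays such as `df_eval_mean.values`, `df_eval_std.values` and `df_eval[outputs].values`, assuming these share the same shape and column order.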

### Delete emulator

Delete a trained emulator with:

In [None]:
# tl.delete_campaign(CAMPAIGN_ID)

### Delete dataset

Delete an existing dataset with:

In [None]:
# tl.delete_dataset(TRAINING_DATA)