The goal of this task is to test two variants of fitting a power law model to data. Let's imagine a hypothetical multi-tokamak dataset in which the power threshold for access to the H-mode was measured together with the plasma parameters. The dataset contains the following variables:

* `P_lh`: power threshold of the L-H transition. The dependent variable.
* `n_e`: electron density.
* `a`: plasma size.
* `B_T`: magnetic field strength.
* `R`: plasma shape factor.
* `q_95`: safety factor at 95% of the plasma boundary.

We assume that the data can be described by a power law scaling:

$$P_{lh} = \alpha_0 n_e^{\alpha_{ne}} a^{\alpha_{a}} B_T^{\alpha_{BT}} R^{\alpha_{R}} q_{95}^{\alpha_{q95}}$$

with normally distributed noise of unknown amplitude on `P_lh` only. This is admittedly unrealistic: in a real situation there would also be significant noise on the independent variables, systematic errors, etc. See e.g. [[Verdoolaege 2021](https://iopscience.iop.org/article/10.1088/1741-4326/abdb91/meta)] for a real-life power law analysis.

Your task is to compare two different approaches to fit this data.

1. Use the traditional approach to deriving power law coefficients from experimental data, which has been used in many high-profile fusion papers. That is, first take the logarithm of all variables (both the predictors and `P_lh`) and then fit a linear model:

$$\log P_{lh} = \log \alpha_0 + \alpha_{ne} \log n_e + \alpha_{a} \log a + \alpha_{BT} \log B_T + \alpha_{R} \log R + \alpha_{q95} \log q_{95}$$

2. Fit the power law directly to the data:

$$P_{lh} = \alpha_0 n_e^{\alpha_{ne}} a^{\alpha_{a}} B_T^{\alpha_{BT}} R^{\alpha_{R}} q_{95}^{\alpha_{q95}}$$

3. What is your prediction of the power threshold for a new (big) tokamak with the following parameters? (The parameters are, of course, not really realistic.)
* `n_e = 9 x 10^19 m^-3`
* `a = 8 m`
* `B_T = 9.5 T`
* `R = 10 m`
* `q_95 = 3`

4. Send us your prediction to learn whether your new tokamak will achieve the H-mode, and thus fusion ignition, or not :)
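
The difference between approaches 1 and 2 can be made explicit by writing out the noise model. The following is a sketch under the additive-noise assumption stated earlier; the symbols $f$, $x_i$, $\varepsilon_i$ and $\sigma$ are introduced here for illustration only and are not part of the original notation. With $x_i$ denoting the plasma parameters of measurement $i$ and $f(x_i; \alpha)$ the power law evaluated at $x_i$:

$$P_{lh,i} = f(x_i; \alpha) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$

The direct fit minimizes $\sum_i \left(P_{lh,i} - f(x_i; \alpha)\right)^2$, which matches this noise model. The log-linear fit instead minimizes $\sum_i \left(\log P_{lh,i} - \log f(x_i; \alpha)\right)^2$. Taking the logarithm of the model gives

$$\log P_{lh,i} = \log f(x_i; \alpha) + \log\left(1 + \frac{\varepsilon_i}{f(x_i; \alpha)}\right)$$

so the residual in log space is no longer normally distributed, and its spread depends on the size of $f(x_i; \alpha)$. The two fits can therefore be expected to give different coefficients, and the logarithm is undefined for any measurement with $P_{lh,i} \le 0$.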

Hint: Investigate the data first and think about how to handle the different scales of the variables and any missing values. Think about the uncertainties of the fitted parameters.
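
One possible way to follow the hint is sketched below. It uses a small made-up table (`df`) in place of the real file, so the numbers are illustrative only; the column names follow the data shown in the next section. Dropping incomplete rows and rescaling `ne` to units of 10^19 m^-3 and `P_lh` to MW are assumptions about what is reasonable here, not the prescribed solution.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the real table (values are made up).
df = pd.DataFrame({
    "ne": [4.4e19, 9.6e19, np.nan, 6.4e19],
    "a": [1.3, 6.7, 3.8, 5.6],
    "BT": [6.8, 1.8, 2.5, 9.1],
    "R": [1.5, 5.8, 5.9, 6.7],
    "q95": [1.9, 9.1, 5.5, 8.4],
    "P_lh": [8.8e5, 9.0e6, 4.5e6, np.nan],
})

# Count missing values per column before deciding how to treat them.
n_missing = df.isna().sum()

# Simplest option: drop incomplete rows (imputation would be an alternative).
clean = df.dropna().reset_index(drop=True)

# Bring ne and P_lh to order-one magnitudes by changing units.
scaled = clean.assign(ne=clean["ne"] / 1e19, P_lh=clean["P_lh"] / 1e6)

print(n_missing)
print(scaled.describe().loc[["min", "max"]])
```

Rescaling does not change the fitted exponents, only the prefactor, but it tends to make a nonlinear optimizer better conditioned.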

# Data

In [12]:
import pandas as pd

In [13]:
# real data for the full power law fit

data = pd.read_csv('data/01-02-power_law_scaling-data.csv')
data.head()

| | ne | a | BT | R | q95 | P_lh |
|---|---|---|---|---|---|---|
| 0 | 4.370861e+19 | 1.282863 | 6.778285 | 1.465135 | 1.928115 | 881529.2 |
| 1 | 9.556429e+19 | 6.727694 | 1.75726 | 5.782192 | 9.122976 | 8957567.0 |
| 2 | 7.587945e+19 | 3.829204 | 2.454658 | 5.865716 | 5.547271 | 4518826.0 |
| 3 | 6.387926e+19 | 5.577136 | 9.086988 | 6.736869 | 8.438117 | 9286648.0 |
| 4 | 2.404168e+19 | 9.168098 | 6.457862 | 7.534822 | 3.880446 | 6872281.0 |


In [14]:
# logarithmized data for the linear fit

data_log = pd.read_csv('data/01-02-power_law_scaling-log_data.csv')
data_log.head()

| | log_ne | log_a | log_BT | log_R | log_q95 | log_P_lh |
|---|---|---|---|---|---|---|
| 0 | 19.640567 | 0.10818 | 0.83112 | 0.165878 | 0.285133 | 5.945237 |
| 1 | 19.980296 | 0.827866 | 0.244836 | 0.762092 | 0.960137 | 6.95219 |
| 2 | 19.880124 | 0.583108 | 0.389991 | 0.768321 | 0.744079 | 6.655026 |
| 3 | 19.80536 | 0.746411 | 0.95842 | 0.828458 | 0.926246 | 6.967859 |
| 4 | 19.380965 | 0.962279 | 0.810089 | 0.877073 | 0.588882 | 6.837101 |
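
The logged values above appear to be base-10 logarithms (for example, 19.640567 ≈ log10(4.370861e19)). A minimal sketch of the log-linear fit (approach 1) with ordinary least squares is given below. It runs on synthetic data (`demo_log`) with made-up exponents rather than on the real file, and `np.linalg.lstsq` is just one of several ways to do the fit; the standard errors assume independent noise of constant variance in log space.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = ["log_ne", "log_a", "log_BT", "log_R", "log_q95"]

# Synthetic stand-in for data_log; exponents and intercept are made up.
true_exponents = np.array([0.7, 1.0, 0.8, 0.5, -0.3])
true_intercept = -8.0
n = 200
demo_log = pd.DataFrame(rng.uniform(0.0, 1.0, size=(n, 5)), columns=cols)
demo_log["log_ne"] += 19.0  # densities between 1e19 and 1e20 m^-3
demo_log["log_P_lh"] = (
    true_intercept
    + demo_log[cols].to_numpy() @ true_exponents
    + rng.normal(0.0, 0.05, n)
)

# Design matrix: a column of ones for log10(alpha_0), then the log predictors.
A = np.column_stack([np.ones(n), demo_log[cols].to_numpy()])
y = demo_log["log_P_lh"].to_numpy()
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Standard errors from the usual OLS covariance sigma^2 (A^T A)^-1.
resid = y - A @ beta
sigma2 = resid @ resid / (n - A.shape[1])
stderr = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))

for name, b, s in zip(["log10(alpha_0)"] + cols, beta, stderr):
    print(f"{name:>15s}: {b:8.3f} +/- {s:.3f}")
```

With base-10 logs, the prefactor is recovered as `10 ** beta[0]`, and the remaining entries of `beta` are the exponents directly.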


In [15]:
# new data for the prediction
data_new = pd.DataFrame({
    'n_e': [9e19],
    'a': [8.],
    'B_T': [9.5],
    'R': [10.],
    'q_95': [3.]
})

data_new

| | n_e | a | B_T | R | q_95 |
|---|---|---|---|---|---|
| 0 | 9e+19 | 8.0 | 9.5 | 10.0 | 3.0 |
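
A possible sketch of the direct fit (approach 2) and the prediction for the new machine follows. It again uses synthetic data with made-up parameters, so the printed number is not the answer to the task. The function `power_law`, the choice of `scipy.optimize.curve_fit`, the log-linear starting point, and the rescaling of `ne` to 10^19 m^-3 are all assumptions made for illustration. Note that `data_new` uses the column names `n_e`, `B_T` and `q_95`, whereas the data file uses `ne`, `BT` and `q95`, so the columns are renamed before predicting.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def power_law(X, a0, a_ne, a_a, a_BT, a_R, a_q95):
    """Power law; X has one row per predictor (ne, a, BT, R, q95)."""
    ne, a, BT, R, q95 = X
    return a0 * ne**a_ne * a**a_a * BT**a_BT * R**a_R * q95**a_q95

rng = np.random.default_rng(1)
cols = ["ne", "a", "BT", "R", "q95"]

# Synthetic stand-in for the real data; all parameter values are made up.
# ne is in units of 1e19 m^-3 and P_lh in MW to keep magnitudes of order one.
true_params = np.array([0.05, 0.7, 1.0, 0.8, 0.5, -0.3])
n = 300
X = rng.uniform(1.0, 10.0, size=(5, n))
P = power_law(X, *true_params) + rng.normal(0.0, 0.2, n)  # additive noise

# Starting point from a log-linear fit on the rows with positive P.
pos = P > 0
A = np.column_stack([np.ones(pos.sum()), np.log(X[:, pos]).T])
beta, *_ = np.linalg.lstsq(A, np.log(P[pos]), rcond=None)
p0 = np.concatenate([[np.exp(beta[0])], beta[1:]])

# Direct nonlinear least-squares fit in linear space.
popt, pcov = curve_fit(power_law, X, P, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties

# New machine: rename columns to match the training data and rescale ne.
data_new = (
    pd.DataFrame({"n_e": [9e19], "a": [8.0], "B_T": [9.5],
                  "R": [10.0], "q_95": [3.0]})
    .rename(columns={"n_e": "ne", "B_T": "BT", "q_95": "q95"})
    .assign(ne=lambda d: d["ne"] / 1e19)
)
x_new = data_new[cols].to_numpy().T  # shape (5, 1)
P_new = power_law(x_new, *popt)[0]

# Rough uncertainty: sample parameters from N(popt, pcov) and re-predict.
samples = rng.multivariate_normal(popt, pcov, size=2000)
P_samples = power_law(x_new, *samples.T)
lo, hi = np.percentile(P_samples, [2.5, 97.5])
print(f"P_lh = {P_new:.2f} MW (95% interval {lo:.2f} to {hi:.2f})")
```

The sampled interval reflects only the parameter uncertainty under a Gaussian approximation; it does not include the measurement noise on a new observation or any extrapolation error.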
