The goal of this task is to test two variants of fitting a power law model to data. Let's imagine a hypothetical multi-tokamak dataset in which the power threshold for access to the H-mode was measured together with the following plasma parameters:

* `P_lh`: power threshold of the L-H mode transition. The dependent variable.
* `n_e`: electron density.
* `a`: plasma size.
* `B_T`: magnetic field strength.
* `R`: plasma shape factor.
* `q_95`: safety factor at 95% of the plasma boundary.

We assume that the data can be described by a power law scaling:

$$P_{lh} = \alpha_0 n_e^{\alpha_{ne}} a^{\alpha_{a}} B_T^{\alpha_{BT}} R^{\alpha_{R}} q_{95}^{\alpha_{q95}}$$

with normally distributed noise of unknown amplitude on `P_lh` only (this is admittedly unrealistic; in a real experiment there would also be significant noise on the independent variables).
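
A short sketch of why this noise model matters for the comparison below. Write $f(x)$ for the right-hand side of the scaling law and assume, for illustration only, that the noise is additive, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, and small compared with the signal, $|\varepsilon| \ll f(x)$. Using the natural logarithm:

$$\log P_{lh} = \log\bigl(f(x) + \varepsilon\bigr) = \log f(x) + \log\left(1 + \frac{\varepsilon}{f(x)}\right) \approx \log f(x) + \frac{\varepsilon}{f(x)} - \frac{\varepsilon^2}{2 f(x)^2}$$

Under these assumptions the noise in log space no longer has constant variance: its spread scales like $\sigma / f(x)$ and its mean is shifted by roughly $-\sigma^2 / (2 f(x)^2)$, so measurements with a small threshold are the noisiest points in a log-space fit. For base-10 logarithms every term after $\log f(x)$ is divided by $\ln 10$. Measurements with $f(x) + \varepsilon \le 0$ have no logarithm at all.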

Your task is to compare two different approaches to fit this data.

1. Use the traditional approach to deriving power law coefficients from experimental data, which has been used in many high-profile fusion papers, i.e. first take the logarithm of the data and then fit a linear model:

$$\log P_{lh} = \log \alpha_0 + \alpha_{ne} \log n_e + \alpha_{a} \log a + \alpha_{BT} \log B_T + \alpha_{R} \log R + \alpha_{q95} \log q_{95}$$

2. Fit the power law directly to the data:

$$P_{lh} = \alpha_0 n_e^{\alpha_{ne}} a^{\alpha_{a}} B_T^{\alpha_{BT}} R^{\alpha_{R}} q_{95}^{\alpha_{q95}}$$

3. What's your prediction of the power threshold for a new (big) tokamak with the following parameters? (The parameters are of course not really realistic.)
* `n_e = 9 x 10^19 m^-3`
* `a = 8 m`
* `B_T = 9.5 T`
* `R = 10 m`
* `q_95 = 3`

4. Send us your prediction to learn whether your new tokamak will achieve the H-mode and thus fusion ignition or not :)
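
The two approaches in steps 1 and 2 can be tried out on synthetic data before touching the real dataset. The following is a minimal sketch, not the reference solution: the two-variable model `power_law`, the "true" parameters, the noise level and the random seed are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def power_law(X, a0, a1, a2):
    # Hypothetical two-variable power law: y = a0 * x1**a1 * x2**a2
    x1, x2 = X
    return a0 * x1**a1 * x2**a2

# Made-up ground truth and synthetic measurements with additive normal noise on y only
true_params = np.array([2.0, 0.7, 1.5])
X = rng.uniform(1.0, 10.0, size=(2, 500))
y = power_law(X, *true_params) + rng.normal(0.0, 5.0, size=X.shape[1])

# Approach 1: linear least squares in log space (only points with y > 0 have a logarithm)
mask = y > 0
A = np.column_stack((np.ones(mask.sum()), np.log10(X[0, mask]), np.log10(X[1, mask])))
coef, *_ = np.linalg.lstsq(A, np.log10(y[mask]), rcond=None)
beta_lin = np.array([10.0**coef[0], coef[1], coef[2]])  # undo the base-10 log on the intercept

# Approach 2: non-linear least squares on the original scale, started from the linear result
beta_nl, cov_nl = curve_fit(power_law, X, y, p0=beta_lin)

print("true      :", true_params)
print("log-linear:", beta_lin)
print("direct    :", beta_nl)
```

Because the noise here is additive on `y`, the direct fit matches the assumed noise model, while the log-space fit has to discard non-positive measurements and treats the remaining noise as if it were constant in log space.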

Hint: Investigate the data first and think about how to handle the different scales of the variables and the missing values. Think about the uncertainties of the fitted parameters.
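
One possible way to act on the hint, shown as a sketch on a tiny made-up table (the values and the new column names `ne_19` and `P_lh_MW` are hypothetical): drop rows with missing or non-positive thresholds, then rescale the variables so that they are all of order one.

```python
import numpy as np
import pandas as pd

# Tiny made-up table with one missing density and one negative threshold
df = pd.DataFrame({
    "ne": [4.4e19, 9.6e19, 7.6e19, np.nan],
    "P_lh": [8.8e5, 9.0e6, -1.2e5, 4.5e6],
})

clean = df.dropna()                       # remove rows with missing values
clean = clean[clean["P_lh"] > 0].copy()   # a negative threshold has no logarithm

# Rescale to convenient units so that all fit variables are of order one
clean["ne_19"] = clean["ne"] / 1e19       # density in units of 10^19 m^-3
clean["P_lh_MW"] = clean["P_lh"] / 1e6    # power in MW

print(clean)
```

In a power law, rescaling a variable only changes the prefactor: fitting against `ne / 1e19` instead of `ne` multiplies $\alpha_0$ by $(10^{19})^{\alpha_{ne}}$ and leaves the exponents untouched, which keeps the prefactor away from extremely small values.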

# Data

In [40]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.optimize import curve_fit

def P_lh(P, a_0, a_n_e, a_a, a_B_T, a_R, a_q_95):
    # power law model; P holds the independent variables in the order (n_e, a, B_T, R, q_95)
    n_e, a, B_T, R, q_95 = P
    return a_0 * n_e**a_n_e * a**a_a * B_T**a_B_T * R**a_R * q_95**a_q_95

In [31]:
# real data for the full power law fit

data = pd.read_csv('data/01-02-power_law_scaling-data.csv')

data = data[data >= 0].dropna()  # masks negative entries (e.g. negative P_lh) as NaN, then drops every row containing NaN

data.head(10)

| | ne | a | BT | R | q95 | P_lh |
|---|---|---|---|---|---|---|
| 0 | 4.370861e+19 | 1.282863 | 6.778285 | 1.465135 | 1.928115 | 881529.2 |
| 1 | 9.556429e+19 | 6.727694 | 1.75726 | 5.782192 | 9.122976 | 8957567.0 |
| 2 | 7.587945e+19 | 3.829204 | 2.454658 | 5.865716 | 5.547271 | 4518826.0 |
| 3 | 6.387926e+19 | 5.577136 | 9.086988 | 6.736869 | 8.438117 | 9286648.0 |
| 4 | 2.404168e+19 | 9.168098 | 6.457862 | 7.534822 | 3.880446 | 6872281.0 |
| 5 | 2.403951e+19 | 3.24363 | 1.082773 | 9.782669 | 9.059709 | 1840149.0 |
| 7 | 8.795585e+19 | 7.79996 | 6.971516 | 3.906608 | 1.097539 | 13749710.0 |
| 8 | 6.410035e+19 | 3.059183 | 1.045554 | 8.156676 | 9.148438 | 3705082.0 |
| 11 | 9.729189e+19 | 2.450992 | 7.227057 | 1.706107 | 9.550558 | 3747442.0 |
| 12 | 8.491984e+19 | 9.367279 | 6.867651 | 1.228157 | 9.555464 | 3674357.0 |


In [3]:
# logarithmized data for the linear fit

data_log = pd.read_csv('data/01-02-power_law_scaling-log_data.csv')
data_log = data_log.dropna()  # drops rows that contain NaN values (e.g. a missing log_P_lh)

data_log.head()

| | log_ne | log_a | log_BT | log_R | log_q95 | log_P_lh |
|---|---|---|---|---|---|---|
| 0 | 19.640567 | 0.10818 | 0.83112 | 0.165878 | 0.285133 | 5.945237 |
| 1 | 19.980296 | 0.827866 | 0.244836 | 0.762092 | 0.960137 | 6.95219 |
| 2 | 19.880124 | 0.583108 | 0.389991 | 0.768321 | 0.744079 | 6.655026 |
| 3 | 19.80536 | 0.746411 | 0.95842 | 0.828458 | 0.926246 | 6.967859 |
| 4 | 19.380965 | 0.962279 | 0.810089 | 0.877073 | 0.588882 | 6.837101 |


In [None]:
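# design matrix: a column of ones for the intercept, then log_ne, log_a, log_BT, log_R, log_q95
X_log = np.column_stack((np.ones(len(data_log)), data_log.iloc[:, :5].to_numpy()))
Y_log = data_log.iloc[:, 5]  # log_P_lh

beta_log = np.linalg.lstsq(X_log, Y_log, rcond=None)[0]  # linear model
# back-transform the intercept to alpha_0; np.exp is only correct for natural-log data,
# base-10 logs (as the log_ne values of about 19-20 suggest) would need 10**beta_log[0]
beta_log[0] = np.exp(beta_log[0])
beta_log  # final result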

array([0.11631819, 0.38002769, 0.99560878, 0.36141438, 0.44630932,
       0.00522906])
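
The logged columns shown earlier look like base-10 values (for example `log_ne` is about 19.64 where `ne` is about 4.37e19), and the base matters when the intercept is converted back to $\alpha_0$. A minimal sketch on noise-free toy data (the prefactor 3.0 and exponent 0.5 are made up) illustrates the difference:

```python
import numpy as np

# Toy power law y = 3 * x**0.5, fitted as a straight line in base-10 log space
x = np.array([1.0, 10.0, 100.0, 1000.0])
y = 3.0 * x**0.5

A = np.column_stack((np.ones_like(x), np.log10(x)))
(c0, c1), *_ = np.linalg.lstsq(A, np.log10(y), rcond=None)

a0_base10 = 10.0**c0    # inverse of log10: recovers the prefactor 3.0
a0_natural = np.exp(c0)  # inverse of the natural log: gives a different number here

print(c1, a0_base10, a0_natural)
```

The exponents are unaffected by the choice of base as long as both sides use the same logarithm; only the prefactor depends on using the matching inverse.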

In [None]:
# new data for the prediction
data_new = pd.DataFrame({
    'n_e': [9e19],
    'a': [8.],
    'B_T': [9.5],
    'R': [10.],
    'q_95': [3.]
})


P_lh_log=P_lh(data_new.values[0,:],*beta_log) # P_lh from data_log using linear fit
P_lh_log


223925832.38867867

In [None]:
X = data.iloc[:, :5].to_numpy()  # columns: ne, a, BT, R, q95
Y = data.iloc[:, 5]  # P_lh

# non-linear model, using the result of the linear fit as the initial guess p0
popt, pcov = curve_fit(P_lh, X.T, Y, p0=beta_log, maxfev=2000)

popt

array([ 3.17175675e-14,  9.22736056e-01,  1.44351055e+00,  5.11483199e-01,
        7.76894486e-01, -2.98282749e-03])
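
`curve_fit` also returns a covariance matrix, from which parameter uncertainties can be estimated. The sketch below uses a made-up one-variable model and synthetic data (the names `model`, `popt_demo` and `pcov_demo` are hypothetical) to show the usual recipe: one-sigma errors from the diagonal, and correlations from the normalised matrix.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a0, a1):
    # Hypothetical one-variable power law
    return a0 * x**a1

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
y = model(x, 2.0, 1.5) + rng.normal(0.0, 1.0, x.size)  # synthetic data, made-up parameters

popt_demo, pcov_demo = curve_fit(model, x, y, p0=[1.0, 1.0])

perr = np.sqrt(np.diag(pcov_demo))       # one-sigma uncertainty of each parameter
corr = pcov_demo / np.outer(perr, perr)  # parameter correlation matrix

print("parameters   :", popt_demo)
print("uncertainties:", perr)
print("correlation  :", corr[0, 1])
```

The prefactor and the exponent are typically strongly anti-correlated in such fits, so quoting the individual errors alone understates how well a prediction is constrained; the full covariance is needed when propagating the uncertainty to a predicted threshold.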

In [47]:
P_lh_non=P_lh(data_new.values[0,:],*popt) # P_lh from data using non-linear fit
P_lh_non

31115683.465297226

In [None]:
P_lh_log/P_lh_non  # P_lh_non is approx. 7 times smaller than P_lh_log

7.196558373478095