# 🔢 **Symbolic Regression - Analysis of equations**  

This analysis uses the file "optimization_sr.db", which was produced by the pilot version of script_sr.py. That version treated the binary and unary operators as hyperparameters to be optimized; in the new version, the operators are set as default parameters of the Symbolic Regression (SR) model. The objective here is to visualize the expressions obtained by SR with the optimized parameters.

***

### 📚 **Importing libraries**

In [1]:
import sympy
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import optuna
from sklearn.model_selection import train_test_split
from sklearn.cluster import BisectingKMeans

from script_sr import Clustering_SR, cross_validation


Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython


### 📂 **Importing and splitting the data**

In [2]:
data = pd.read_csv("../data/train.csv")

In [3]:
X = data.drop(columns=["critical_temp"])
y = data["critical_temp"]

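# Note the unpacking order: with test_size=0.9, train_test_split returns the 10% split
# first and the 90% split second, so X_train/y_train receive 90% of the rows
# and X_test/y_test the remaining 10%.
X_test, X_train, y_test, y_train = train_test_split(
    X, y, test_size=0.9, random_state=1702
)
# Clip the target to a small positive floor so that it is strictly positive.
y_test = np.clip(y_test, 1e-6, None)
y_train = np.clip(y_train, 1e-6, None)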
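The swapped unpacking above is easy to misread. A minimal sketch on synthetic data (the 100-row arrays are made up for illustration) confirms that, with `test_size=0.9`, the names on the "train" side end up holding 90% of the rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data: row i is [3i, 3i+1, 3i+2] and its target is i.
X_demo = np.arange(300, dtype=float).reshape(100, 3)
y_demo = np.arange(100, dtype=float)

# Same call pattern as in the notebook: the second output of each pair is the
# 90% split, and it is assigned to the "train" names.
X_test_demo, X_train_demo, y_test_demo, y_train_demo = train_test_split(
    X_demo, y_demo, test_size=0.9, random_state=1702
)

print(len(X_train_demo), len(X_test_demo))  # 90 10
```

The conventional form `X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.9, random_state=1702)` gives the same proportions but does not necessarily select the same rows, so keeping the original call preserves consistency with the stored study.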

### 📔 **Loading the study and its parameters**

In [4]:
study_name = "optimization_clusters_gam_teste15_07_2025_15_40_36"
# The pilot version of script_sr.py included the word "gam" in the study name; this was fixed in the new version.
storage_url = "sqlite:///../Optuna_files/optimization_sr.db"

study_sr = optuna.load_study(study_name=study_name, storage=storage_url)

In [5]:
print(f"The best params were: {study_sr.best_params}")

The best params were: {'clusterer': 'bisecting_kmeans', 'n_clusters_bkmeans': 17, 'bisecting_strategy': 'largest_cluster', 'n_iterations': 23, 'maxsize': 29, 'maxdepth': 14, 'n_binary_operators': 9, 'binary_operators_0': '*', 'binary_operators_1': '^', 'binary_operators_2': '*', 'binary_operators_3': '*', 'binary_operators_4': '+', 'binary_operators_5': '*', 'binary_operators_6': '+', 'binary_operators_7': '+', 'binary_operators_8': '-', 'n_unary_operators': 19, 'unary_operators_0': 'sin', 'unary_operators_1': 'sin', 'unary_operators_2': 'neg', 'unary_operators_3': 'tan', 'unary_operators_4': 'cube', 'unary_operators_5': 'cos', 'unary_operators_6': 'sqrt', 'unary_operators_7': 'inv', 'unary_operators_8': 'sin', 'unary_operators_9': 'tan', 'unary_operators_10': 'abs', 'unary_operators_11': 'exp', 'unary_operators_12': 'exp', 'unary_operators_13': 'log', 'unary_operators_14': 'log', 'unary_operators_15': 'sqrt', 'unary_operators_16': 'abs', 'unary_operators_17': 'neg', 'unary_operators_1
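In the pilot version, each operator slot was a separate Optuna parameter (`n_binary_operators`, `binary_operators_0`, ...), so the same operator can appear several times. A minimal sketch for turning such a flat dictionary back into deduplicated operator lists; the helper `collect_operators` is hypothetical and not part of script_sr.py, and the dictionary is a shortened, illustrative version of the parameters printed above.

```python
def collect_operators(params: dict, kind: str) -> list[str]:
    """Rebuild an operator list from flat Optuna keys (hypothetical helper).

    Reads ``n_{kind}_operators`` and ``{kind}_operators_{i}`` (kind is
    "binary" or "unary"), skips missing slots and drops duplicates while
    keeping the first-seen order.
    """
    n = params.get(f"n_{kind}_operators", 0)
    ops = []
    for i in range(n):
        op = params.get(f"{kind}_operators_{i}")
        if op is not None and op not in ops:
            ops.append(op)
    return ops


# Shortened, illustrative version of study_sr.best_params.
best = {
    "n_binary_operators": 4,
    "binary_operators_0": "*",
    "binary_operators_1": "^",
    "binary_operators_2": "*",
    "binary_operators_3": "+",
    "n_unary_operators": 3,
    "unary_operators_0": "sin",
    "unary_operators_1": "sin",
    "unary_operators_2": "neg",
}
print(collect_operators(best, "binary"))  # ['*', '^', '+']
print(collect_operators(best, "unary"))   # ['sin', 'neg']
```

Applied to the nine binary slots printed above, the same logic would yield `['*', '^', '+', '-']`.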

### 🦾 **Training the model**

In [6]:
# n_clusters and bisecting_strategy follow study_sr.best_params
clusterer = BisectingKMeans(n_clusters=17,
                            bisecting_strategy="largest_cluster")

In [7]:
# n_iterations, maxsize and maxdepth follow study_sr.best_params
model = Clustering_SR(clusterer=clusterer,
                      n_iterations=23,
                      maxsize=29,
                      maxdepth=14,
                      select_k_features=20)

In [8]:
model.fit(X_train, y_train)

Using features ['entropy_atomic_mass' 'mean_fie' 'range_fie' 'std_fie' 'wtd_std_fie'
 'wtd_range_atomic_radius' 'wtd_std_atomic_radius' 'wtd_mean_FusionHeat'
 'entropy_FusionHeat' 'range_FusionHeat' 'wtd_std_FusionHeat'
 'mean_ThermalConductivity' 'wtd_gmean_ThermalConductivity'
 'entropy_ThermalConductivity' 'wtd_entropy_ThermalConductivity'
 'std_ThermalConductivity' 'wtd_mean_Valence' 'wtd_gmean_Valence'
 'wtd_entropy_Valence' 'wtd_std_Valence']
Using features ['wtd_range_atomic_mass' 'wtd_mean_fie' 'wtd_std_fie' 'mean_atomic_radius'
 'wtd_mean_atomic_radius' 'wtd_gmean_atomic_radius'
 'wtd_range_atomic_radius' 'wtd_std_atomic_radius' 'range_Density'
 'wtd_mean_ElectronAffinity' 'wtd_range_ElectronAffinity'
 'wtd_mean_FusionHeat' 'wtd_gmean_FusionHeat' 'range_FusionHeat'
 'wtd_range_FusionHeat' 'std_FusionHeat' 'wtd_std_FusionHeat'
 'wtd_entropy_ThermalConductivity' 'wtd_range_ThermalConductivity'
 'wtd_std_ThermalConductivity']
Using features ['mean_atomic_mass' 'wtd_mean_atomic_ma

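Model parameters (`Clustering_SR`):

| Parameter | Value |
|---|---|
| clusterer | `BisectingKMea...n_clusters=17)` |
| n_iterations | 23 |
| maxsize | 29 |
| maxdepth | 14 |
| select_k_features | 20 |

Clusterer parameters (`BisectingKMeans`):

| Parameter | Value |
|---|---|
| n_clusters | 17 |
| init | `'random'` |
| n_init | 1 |
| random_state | `None` |
| max_iter | 300 |
| verbose | 0 |
| tol | 0.0001 |
| copy_x | True |
| algorithm | `'lloyd'` |
| bisecting_strategy | `'largest_cluster'` |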


### 🔢 **Viewing equations**

#### **Cluster 1**

In [12]:
model.expressions_[0]

| | complexity | loss | equation |
|---|---|---|---|
| 0 | 1 | 22.927567 | `6.466067` |
| 1 | 3 | 21.490635 | `wtd_std_FusionHeat + 7.004014` |
| 2 | 4 | 19.164429 | `square(wtd_std_FusionHeat + 3.0578759)` |
| 3 | 6 | 18.182203 | `square(wtd_std_FusionHeat + 3.0578759) + entro...` |
| 4 | 10 | 18.014282 | `((12.306311 / exp(sqrt(abs(wtd_std_FusionHeat)...` |
| 5 | 12 | 16.226456 | `((0.40876484 ^ cos(exp(wtd_gmean_Valence) - sq...` |
| 6 | 19 | 16.10318 | `(exp(square(cos(wtd_gmean_Valence - 0.96376354...` |
| 7 | 21 | 15.511872 | `((0.40876484 ^ (cos(exp(wtd_gmean_Valence) - s...` |
| 8 | 22 | 15.458944 | `abs(entropy_ThermalConductivity + ((abs(wtd_st...` |
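One way to pick an equation from this Pareto front is to look at how fast the loss drops per unit of added complexity. PySR uses a similar "score" (the negative derivative of the log-loss with respect to complexity) in its own model selection; here it is recomputed from the two columns shown above, as a sketch of one possible criterion rather than the one used to choose the equation printed below.

```python
import numpy as np

# complexity and loss columns of model.expressions_[0] (cluster 1), copied from the table above
complexity = np.array([1, 3, 4, 6, 10, 12, 19, 21, 22])
loss = np.array([22.927567, 21.490635, 19.164429, 18.182203, 18.014282,
                 16.226456, 16.10318, 15.511872, 15.458944])

# score_i = -(log(loss_i) - log(loss_{i-1})) / (complexity_i - complexity_{i-1});
# the simplest equation has no predecessor and gets a score of 0.
score = np.zeros_like(loss)
score[1:] = -np.diff(np.log(loss)) / np.diff(complexity)

best = int(np.argmax(score))
print(complexity[best], loss[best])
```

With these values, the largest score belongs to the complexity-4 equation `square(wtd_std_FusionHeat + 3.0578759)`; more complex equations lower the loss further, but at a smaller gain per unit of complexity.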


In [18]:
print(model.expressions_[0].iloc[8]["equation"])

abs(entropy_ThermalConductivity + ((abs(wtd_std_atomic_radius + (((wtd_mean_Valence - -0.7350441) / range_FusionHeat) / (inv(square(inv(wtd_mean_Valence))) + -1.3224036))) ^ 0.25417194) - -4.169578))


Since `inv(square(inv(x)))` equals $x^2$, the selected equation can be written as:

$$
T_c = \left|
\mathrm{entropy\_ThermalConductivity}
+
\left|
\mathrm{wtd\_std\_atomic\_radius}
+
\frac{\mathrm{wtd\_mean\_Valence} + 0.7350441}
{\mathrm{range\_FusionHeat} \cdot \left( \mathrm{wtd\_mean\_Valence}^2 - 1.3224036 \right)}
\right|^{0.25417194}
+ 4.169578
\right|
$$
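The LaTeX for this cluster is written by hand. A minimal sketch of deriving it with SymPy (already imported at the top of the notebook) instead: PySR's `square`, `cube`, `inv` and `neg` are not SymPy functions, so they are supplied as local definitions. The mapping `PYSR_OPS` and the helper `pysr_to_sympy` are our own assumptions, not part of script_sr.py; SymPy's parser reads `^` as a power by default, matching PySR's notation.

```python
import sympy

# Hypothetical mapping from PySR operator names to SymPy expressions.
PYSR_OPS = {
    "square": lambda x: x**2,
    "cube": lambda x: x**3,
    "inv": lambda x: 1 / x,
    "neg": lambda x: -x,
    "abs": sympy.Abs,
}


def pysr_to_sympy(equation: str) -> sympy.Expr:
    """Parse a PySR equation string; unknown names become SymPy symbols."""
    return sympy.sympify(equation, locals=PYSR_OPS)


eq_cluster1 = (
    "abs(entropy_ThermalConductivity + ((abs(wtd_std_atomic_radius + "
    "(((wtd_mean_Valence - -0.7350441) / range_FusionHeat) / "
    "(inv(square(inv(wtd_mean_Valence))) + -1.3224036))) ^ 0.25417194) - -4.169578))"
)
expr = pysr_to_sympy(eq_cluster1)

# SymPy folds inv(square(inv(v))) into v**2 automatically.
print(sympy.latex(expr))
```

The printed LaTeX uses SymPy's own ordering of terms, and underscores in the feature names may be rendered as subscripts, so some manual tidying is still useful before pasting it into a Markdown cell.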

#### **Cluster 2**

In [22]:
model.expressions_[1]

| | complexity | loss | equation |
|---|---|---|---|
| 0 | 1 | 38.62166 | `5.9513054` |
| 1 | 2 | 33.488266 | `square(wtd_range_atomic_radius)` |
| 2 | 3 | 28.33331 | `wtd_mean_atomic_radius / 0.18650644` |
| 3 | 4 | 15.821792 | `cube(wtd_range_FusionHeat - wtd_std_fie)` |
| 4 | 5 | 15.821318 | `abs(cube(wtd_range_FusionHeat - wtd_std_fie))` |
| 5 | 6 | 14.611463 | `cube(wtd_range_FusionHeat - wtd_std_fie) - -1....` |
| 6 | 7 | 14.032359 | `abs(cube(wtd_range_FusionHeat - wtd_std_fie) -...` |
| 7 | 8 | 13.745005 | `(cube(wtd_range_FusionHeat - wtd_std_fie) - wt...` |
| 8 | 16 | 12.490854 | `abs(((wtd_gmean_atomic_radius + wtd_std_Fusion...` |
| 9 | 18 | 11.998098 | `abs(((wtd_gmean_atomic_radius + (wtd_std_Fusio...` |


In [24]:
print(model.expressions_[1].iloc[12]["equation"])

square(wtd_gmean_atomic_radius) + square(sin((wtd_std_FusionHeat + 0.4829266) * ((wtd_std_FusionHeat - wtd_gmean_atomic_radius) - (wtd_entropy_ThermalConductivity * wtd_gmean_atomic_radius))) - abs((mean_atomic_radius - wtd_mean_atomic_radius) - cos(wtd_range_ElectronAffinity)))


$$
T_c =
\begin{split}
& \left( \mathrm{wtd\_gmean\_atomic\_radius} \right)^2
+ \left( 
\sin\left(
\left( \mathrm{wtd\_std\_FusionHeat} + 0.4829266 \right) 
\cdot 
\left( 
\mathrm{wtd\_std\_FusionHeat} - \mathrm{wtd\_gmean\_atomic\_radius} 
- \mathrm{wtd\_entropy\_ThermalConductivity} \cdot \mathrm{wtd\_gmean\_atomic\_radius}
\right)
\right) \right. \\
& \left. \quad
- 
\left| 
\mathrm{mean\_atomic\_radius} - \mathrm{wtd\_mean\_atomic\_radius} 
- \cos\left( \mathrm{wtd\_range\_ElectronAffinity} \right)
\right|
\right)^2
\end{split}
$$
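To reuse the selected cluster 2 equation outside of PySR, it can be transcribed into NumPy. A minimal sketch: the function name `tc_cluster2` is hypothetical, the input is assumed to be a pandas DataFrame (or any mapping from column name to array) containing the listed features, and the formula is presumably only meaningful for samples assigned to this cluster.

```python
import numpy as np


def tc_cluster2(df):
    """NumPy transcription of the printed cluster 2 equation (hypothetical helper)."""
    g = np.asarray(df["wtd_gmean_atomic_radius"], dtype=float)
    s = np.asarray(df["wtd_std_FusionHeat"], dtype=float)
    w = np.asarray(df["wtd_entropy_ThermalConductivity"], dtype=float)
    m = np.asarray(df["mean_atomic_radius"], dtype=float)
    wm = np.asarray(df["wtd_mean_atomic_radius"], dtype=float)
    ea = np.asarray(df["wtd_range_ElectronAffinity"], dtype=float)

    # square(g) + square(sin(...) - abs(...)), term by term as printed by PySR
    inner = np.sin((s + 0.4829266) * ((s - g) - w * g))
    return g**2 + (inner - np.abs((m - wm) - np.cos(ea))) ** 2


# Toy input with made-up feature values.
toy = {
    "wtd_gmean_atomic_radius": [0.0, 1.0],
    "wtd_std_FusionHeat": [0.0, 0.0],
    "wtd_entropy_ThermalConductivity": [0.0, 0.0],
    "mean_atomic_radius": [0.0, 0.0],
    "wtd_mean_atomic_radius": [0.0, 0.0],
    "wtd_range_ElectronAffinity": [0.0, 0.0],
}
print(tc_cluster2(toy))
```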



#### **Cluster 3**

In [25]:
model.expressions_[2]

| | complexity | loss | equation |
|---|---|---|---|
| 0 | 1 | 10.523952 | `4.396335` |
| 1 | 3 | 10.190161 | `wtd_mean_Valence + 4.154043` |
| 2 | 5 | 9.871449 | `cos(exp(wtd_mean_ThermalConductivity)) + 3.687...` |
| 3 | 6 | 9.734819 | `exp(inv(cos(cos(cos(mean_Density)))))` |
| 4 | 7 | 9.69269 | `(cos(exp(wtd_mean_ThermalConductivity)) + 3.25...` |
| 5 | 9 | 9.351548 | `exp(inv(cos(sin(mean_Density)))) + cos(exp(wtd...` |
| 6 | 11 | 9.244647 | `cos(exp(wtd_mean_ThermalConductivity)) + exp(i...` |
| 7 | 12 | 9.03562 | `(exp(inv(cos(cos(cos(mean_Density))))) + cos(e...` |
| 8 | 13 | 8.67232 | `abs(cos(exp(wtd_mean_ThermalConductivity)) + e...` |


In [27]:
print(model.expressions_[2].iloc[8]["equation"])

abs(cos(exp(wtd_mean_ThermalConductivity)) + exp(inv(cos(cos(cos(mean_Density * wtd_gmean_Valence))))))


$$
\begin{aligned}
T_c =\ & 
\left| 
\cos\left( 
\exp\left( \mathrm{wtd\_mean\_ThermalConductivity} \right) 
\right)
+ 
\exp\left( 
\frac{1}{
\cos\left( 
\cos\left( 
\cos\left( 
\mathrm{mean\_Density} \cdot \mathrm{wtd\_gmean\_Valence} 
\right)
\right)
\right)
}
\right)
\right|
\end{aligned}
$$
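Because the exponential term only sees its input through three nested cosines, this expression is bounded for any real inputs. A short derivation (our own, not part of the PySR output), with $u = \mathrm{mean\_Density} \cdot \mathrm{wtd\_gmean\_Valence}$ and using that $\cos$ is even and decreasing on $[0, \pi]$:

$$
\begin{aligned}
\cos u &\in [-1,\ 1], \\
\cos(\cos u) &\in [\cos 1,\ 1] \approx [0.5403,\ 1], \\
\cos(\cos(\cos u)) &\in [\cos 1,\ \cos(\cos 1)] \approx [0.5403,\ 0.8576], \\
\exp\left( \frac{1}{\cos(\cos(\cos u))} \right) &\in \left[ e^{1/\cos(\cos 1)},\ e^{1/\cos 1} \right] \approx [3.209,\ 6.365].
\end{aligned}
$$

Adding $\cos(\exp(\mathrm{wtd\_mean\_ThermalConductivity})) \in [-1, 1]$ keeps the sum within roughly $[2.209,\ 7.365]$. The sum is always positive, so the absolute value has no effect, and this equation cannot predict $T_c$ outside that band.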


#### **Cluster 4**

In [28]:
model.expressions_[3]

| | complexity | loss | equation |
|---|---|---|---|
| 0 | 1 | 10.190784 | `3.2611098` |
| 1 | 2 | 9.104302 | `square(wtd_gmean_atomic_radius)` |
| 2 | 3 | 9.064628 | `wtd_gmean_atomic_radius * 2.108322` |
| 3 | 4 | 8.464461 | `square(wtd_mean_atomic_radius) - wtd_mean_fie` |
| 4 | 5 | 8.310187 | `abs(square(wtd_mean_atomic_radius) - wtd_gmean...` |
| 5 | 7 | 7.507406 | `abs(square(wtd_mean_atomic_radius) + wtd_std_a...` |
| 6 | 9 | 7.100302 | `abs(cos(wtd_gmean_atomic_radius - wtd_gmean_fi...` |
| 7 | 10 | 7.100302 | `abs((wtd_gmean_atomic_radius * cos(wtd_gmean_a...` |
| 8 | 11 | 7.019707 | `abs((wtd_gmean_atomic_radius * (cos(wtd_gmean_...` |
| 9 | 13 | 7.006529 | `abs(sqrt(square(1.7190316 - (wtd_gmean_atomic_...` |


In [29]:
print(model.expressions_[3].iloc[16]["equation"])

abs((cos(wtd_gmean_fie - (wtd_gmean_atomic_radius - cos(0.5337268 ^ (wtd_mean_atomic_radius / -2.5628972)))) * (-1.9834003 - (0.4087335 ^ cos((1.393756 ^ wtd_std_Density) / abs(range_Density))))) * wtd_gmean_atomic_radius)


$$
\begin{aligned}
T_c = \biggl|\, & \cos\Bigl( \mathrm{wtd\_gmean\_fie} - \bigl( \mathrm{wtd\_gmean\_atomic\_radius} - \cos\bigl( 0.5337268^{\,\mathrm{wtd\_mean\_atomic\_radius} / (-2.5628972)} \bigr) \bigr) \Bigr) \\
& \cdot \Bigl( -1.9834003 - 0.4087335^{\cos\left( 1.393756^{\mathrm{wtd\_std\_Density}} / \left| \mathrm{range\_Density} \right| \right)} \Bigr) \cdot \mathrm{wtd\_gmean\_atomic\_radius} \,\biggr|
\end{aligned}
$$





#### **Cluster 5**

In [30]:
model.expressions_[4]

| | complexity | loss | equation |
|---|---|---|---|
| 0 | 1 | 225.74194 | `16.463936` |
| 1 | 3 | 145.85725 | `square(square(mean_atomic_mass))` |
| 2 | 4 | 140.40585 | `exp(wtd_mean_atomic_mass * -1.9090296)` |
| 3 | 5 | 130.79886 | `(mean_atomic_mass * wtd_mean_atomic_mass) * 6....` |
| 4 | 6 | 122.854095 | `0.977742 / (exp(gmean_atomic_mass) ^ 1.9156873)` |
| 5 | 9 | 116.8114 | `abs((wtd_mean_ElectronAffinity * -8.700231) * ...` |
| 6 | 10 | 108.511154 | `abs((wtd_mean_fie + wtd_mean_ElectronAffinity)...` |
| 7 | 11 | 98.51867 | `abs(((4.7067037 - abs(wtd_mean_ThermalConducti...` |
| 8 | 12 | 95.589195 | `(wtd_range_FusionHeat + 2.3533518) * (((mean_a...` |
| 9 | 13 | 92.933395 | `abs(((wtd_range_FusionHeat + 2.5320327) * ((me...` |


In [31]:
print(model.expressions_[4].iloc[13]["equation"])

abs(abs(cube(wtd_std_Valence) + (cube(wtd_std_Valence) + ((wtd_range_FusionHeat + 2.3533518) * ((mean_atomic_mass * wtd_mean_ElectronAffinity) + 1.266521)))) - wtd_mean_atomic_mass)


Since the two `cube(wtd_std_Valence)` terms add up to $2\,\mathrm{wtd\_std\_Valence}^3$, the selected equation reads:

$$
T_c = \Bigl|\,
\bigl|\, 2\,\mathrm{wtd\_std\_Valence}^3
+ \left( \mathrm{wtd\_range\_FusionHeat} + 2.3533518 \right)
\cdot \left( \mathrm{mean\_atomic\_mass} \cdot \mathrm{wtd\_mean\_ElectronAffinity} + 1.266521 \right)
\bigr|
- \mathrm{wtd\_mean\_atomic\_mass}
\,\Bigr|
$$
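A quick numerical check, on random made-up inputs, that merging the two `cube(wtd_std_Valence)` terms of the printed PySR expression into `2 * wtd_std_Valence**3` leaves its value unchanged. The short variable names are only for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for wtd_std_Valence, wtd_range_FusionHeat, mean_atomic_mass,
# wtd_mean_ElectronAffinity and wtd_mean_atomic_mass (random values).
v, r, m, ea, wm = rng.normal(size=(5, 1000))

# Direct transcription of the printed PySR expression.
nested = np.abs(np.abs(v**3 + (v**3 + ((r + 2.3533518) * ((m * ea) + 1.266521)))) - wm)

# Simplified form: cube(v) + cube(v) -> 2 * v**3.
simplified = np.abs(np.abs(2 * v**3 + (r + 2.3533518) * (m * ea + 1.266521)) - wm)

print(np.max(np.abs(nested - simplified)))
```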
