
# 👥 **Subgroup Discovery**  



<p align="justify">
Subgroup Discovery (SGD) is a data mining technique that aims to discover “SG rules”, which filter and locally describe regions of the search space. These rules make it possible to identify patterns in the data. [1]
</p>

To this end, propositions $\pi$ about the target $\phi$ are created, with the aim of finding descriptors that maximize the quality function $Q$:

$$
Q(\tilde{P}, SG) = \left(\frac{S_{SG}}{S_{\tilde{P}}}\right)^{\gamma} \cdot \left(u(SG, \tilde{P})\right)^{(1 - \gamma)}
$$

Where:
* $SG$ is the subgroup;
* $Q$ represents the trade-off between coverage (generality) and utility (exceptionality) of the SG rules that are being created;
* $\tilde{P}$ is a subgroup of $P$, representing the area of $P$ that is being analysed;
* $S_{SG}$ and $S_{\tilde{P}}$ represent the number of points in their respective subspaces ($SG$ and $\tilde{P}$). The ratio between these two quantities is the coverage;
* The function $u$ measures how exceptional $SG$ is in comparison to $\tilde{P}$;
* $\gamma$ is the parameter to be optimized. It weights coverage against utility in $Q$.
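
As a rough illustration of how $\gamma$ shifts the balance between coverage and utility, the quality function can be sketched in a few lines of Python. The function name `quality`, the restriction of $\gamma$ to $[0, 1]$, and the numbers used are assumptions made only for this example; the utility $u$ is passed in as a precomputed, non-negative value.

```python
def quality(size_sg: int, size_p: int, utility: float, gamma: float) -> float:
    """Q = coverage**gamma * utility**(1 - gamma), following the formula above."""
    # Assumption for this sketch: gamma is restricted to [0, 1].
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    coverage = size_sg / size_p  # S_SG / S_P~
    return coverage ** gamma * utility ** (1.0 - gamma)


# Hypothetical subgroup: 250 of 1000 points, with utility 4.0
q_utility_only = quality(250, 1000, 4.0, gamma=0.0)   # only utility counts
q_balanced = quality(250, 1000, 4.0, gamma=0.5)       # geometric mean of both
q_coverage_only = quality(250, 1000, 4.0, gamma=1.0)  # only coverage counts
print(q_utility_only, q_balanced, q_coverage_only)
```

With $\gamma = 0$ the score reduces to the utility, and with $\gamma = 1$ it reduces to the coverage; values in between trade one against the other.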

***

###  📚 **Importing libraries & data**

In [2]:
import pysubgroup as ps
import pandas as pd 
import numpy as np


In [None]:
data = pd.read_csv("../data/train.csv")

### 👥 **Subgroup Discovery**

In Python, the pysubgroup module [2] is one of the tools available for subgroup discovery. The first step is to indicate whether the target is categorical or numerical. Next, we define the search space by providing the data and indicating which column must be ignored (the target). We can then set the number of subgroups to be returned (``result_set_size``). Other parameters include ``depth`` (the maximum number of selectors combined in a subgroup description) and ``qf`` (the quality function).

In [4]:
target = ps.NumericTarget("critical_temp")

searchspace = ps.create_selectors(data, ignore = ["critical_temp"])

print(searchspace)

task = ps.SubgroupDiscoveryTask(
    data, 
    target, 
    searchspace, 
    result_set_size= 10, 
    depth=3, 
    qf = ps.StandardQFNumeric(1.0)
)

result = ps.BeamSearch().execute(task)

df = result.to_dataframe()

[number_of_elements<3, number_of_elements: [3:4[, number_of_elements: [4:5[, number_of_elements: [5:6[, number_of_elements>=6, mean_atomic_mass<68.2298, mean_atomic_mass: [68.2298:78.7169571428571[, mean_atomic_mass: [78.7169571428571:90.2786[, mean_atomic_mass: [90.2786:104.3656[, mean_atomic_mass>=104.3656, wtd_mean_atomic_mass<51.2498976923077, wtd_mean_atomic_mass: [51.2498976923077:57.0657057692308[, wtd_mean_atomic_mass: [57.0657057692308:71.3846088888889[, wtd_mean_atomic_mass: [71.3846088888889:92.0409836185819[, wtd_mean_atomic_mass>=92.0409836185819, gmean_atomic_mass<52.6599017543343, gmean_atomic_mass: [52.6599017543343:60.7391827775422[, gmean_atomic_mass: [60.7391827775422:69.9918310747709[, gmean_atomic_mass: [69.9918310747709:82.7933193768878[, gmean_atomic_mass>=82.7933193768878, wtd_gmean_atomic_mass<34.8923750099752, wtd_gmean_atomic_mass: [34.8923750099752:36.3794591442759[, wtd_gmean_atomic_mass: [36.3794591442759:55.4031727897437[, wtd_gmean_atomic_mass: [55.40317

By saving the result in a dataframe, we can see the SG rules found by the algorithm.

In [5]:
df

Unnamed: 0,quality,subgroup,size_sg,size_dataset,mean_sg,mean_dataset,std_sg,std_dataset,median_sg,median_dataset,max_sg,max_dataset,min_sg,min_dataset,mean_lift,median_lift
0,265161.823252,range_ThermalConductivity: [399.97:399.97[,10376,21263,59.976522,34.421219,31.082988,34.253557,63.5,20.0,143.0,185.0,0.001,0.00021,1.742429,3.175
1,216612.226302,range_ThermalConductivity: [399.97:399.97[ AND...,6109,21263,69.879105,34.421219,27.982203,34.253557,76.0,20.0,143.0,185.0,0.3,0.00021,2.030117,3.8
2,214039.436256,range_ThermalConductivity: [399.97:399.97[ AND...,5986,21263,70.177891,34.421219,27.828179,34.253557,76.0,20.0,143.0,185.0,0.3,0.00021,2.038797,3.8
3,214039.436256,range_ThermalConductivity: [399.97:399.97[ AND...,5986,21263,70.177891,34.421219,27.828179,34.253557,76.0,20.0,143.0,185.0,0.3,0.00021,2.038797,3.8
4,210264.333216,range_atomic_radius>=205 AND range_fie>=810.60,6333,21263,67.622598,34.421219,29.75667,34.253557,74.3,20.0,143.0,185.0,0.1,0.00021,1.964561,3.715
5,209669.687984,range_atomic_radius>=205,6366,21263,67.35708,34.421219,29.919101,34.253557,74.0,20.0,143.0,185.0,0.1,0.00021,1.956848,3.7
6,201474.547362,range_FusionHeat: [12.88:12.90[,7820,21263,60.185228,34.421219,31.561079,34.253557,64.55,20.0,143.0,185.0,0.001,0.00021,1.748492,3.2275
7,199761.972516,range_fie>=810.60,7258,21263,61.944224,34.421219,32.575919,34.253557,68.3,20.0,143.0,185.0,0.066,0.00021,1.799594,3.415
8,199692.372439,range_FusionHeat: [12.88:12.90[ AND range_Ther...,7701,21263,60.351926,34.421219,31.422558,34.253557,65.0,20.0,143.0,185.0,0.001,0.00021,1.753335,3.25
9,165134.640993,range_ThermalConductivity: [399.97:399.97[ AND...,3993,21263,75.777252,34.421219,25.385438,34.253557,80.0,20.0,143.0,185.0,1.0,0.00021,2.201469,4.0
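
The `quality` column can be checked by hand. The values in the table are consistent with a score of the form $n_{SG}^{a} \cdot (\text{mean}_{SG} - \text{mean}_{dataset})$, where $a$ is the exponent passed to `ps.StandardQFNumeric` (here `1.0`). The sketch below recomputes the score of row 0 from the rounded values shown in the table, so a small rounding difference is expected; the variable names are chosen only for this example.

```python
# Values copied from row 0 of the results table
size_sg = 10376
mean_sg = 59.976522
mean_dataset = 34.421219
a = 1.0  # exponent passed to ps.StandardQFNumeric in the task definition

# Subgroup size (to the power a) times the shift of the subgroup mean
quality_row0 = size_sg ** a * (mean_sg - mean_dataset)

# mean_lift appears to be the ratio of the two means
mean_lift_row0 = mean_sg / mean_dataset

print(quality_row0, mean_lift_row0)
```

Both results agree with the `quality` and `mean_lift` columns of row 0 up to rounding. With `a = 1.0`, large subgroups are favoured, which may explain why the top-ranked rules each cover thousands of materials.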


In [6]:
grupo1 = df["subgroup"][0]
grupo2 = df["subgroup"][1]
grupo3 = df["subgroup"][2]
grupo4 = df["subgroup"][3]
grupo5 = df["subgroup"][4]
grupo6 = df["subgroup"][5]
grupo7 = df["subgroup"][6]
grupo8 = df["subgroup"][7]
grupo9 = df["subgroup"][8]
grupo10 = df["subgroup"][9]

In [7]:
grupo1

(range_ThermalConductivity: [399.97342:399.97417[)
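
The notation `[a:b[` denotes a half-open interval, that is, `a <= x < b`. The cells that follow translate each rule into a pandas filter by hand. A small helper along the following lines could build the same masks; `interval_mask` and the toy frame are hypothetical names and data introduced only for this sketch.

```python
import pandas as pd


def interval_mask(df: pd.DataFrame, column: str, low=None, high=None) -> pd.Series:
    """Boolean mask for low <= df[column] < high; a bound set to None is ignored."""
    mask = pd.Series(True, index=df.index)
    if low is not None:
        mask &= df[column] >= low   # closed on the left
    if high is not None:
        mask &= df[column] < high   # open on the right
    return mask


# Toy frame with made-up values, only to show the boundary behaviour
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
selected = toy[interval_mask(toy, "x", low=2.0, high=4.0)]
print(selected["x"].tolist())
```

Conjunctions such as `A and B` then become `mask_a & mask_b`, and one-sided selectors such as `x>=205` use only the `low` bound.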

In [8]:
df1 = data[(data["range_ThermalConductivity"] >= 399.97342) & 
           (data["range_ThermalConductivity"] < 399.97417)]
display(df1)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21150,4,46.846625,36.933336,41.451165,30.353185,1.293330,1.135530,47.54660,15.188437,18.288714,...,2.962500,2.912951,2.609441,1.271181,1.074221,4,1.387500,1.639360,1.698851,117.5
21151,4,46.846625,36.981448,41.451165,30.380291,1.293330,1.181948,47.54660,14.490375,18.288714,...,2.925000,2.912951,2.586929,1.271181,1.119688,4,1.275000,1.639360,1.664144,98.0
21152,3,43.847167,36.885225,37.530778,30.326102,0.974604,1.061829,47.54660,7.886800,20.248023,...,3.000000,2.884499,2.632148,0.950271,1.011404,4,1.000000,1.885618,1.732051,132.0
21246,5,70.406250,43.096221,59.725961,31.931306,1.497519,1.466096,79.96060,11.170606,29.321360,...,2.147710,2.701920,2.103026,1.494365,1.315617,4,1.002954,1.549193,0.589456,78.0


In [9]:
grupo2

(range_ThermalConductivity: [399.97342:399.97417[ and range_fie>=810.6)

In [10]:
df2 = data[(data["range_ThermalConductivity"] >= 399.97342) & 
           (data["range_ThermalConductivity"] < 399.97417) & 
           (data["range_fie"] >= 810.6)]
display(df2)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20653,4,105.313925,76.633000,73.088376,42.605417,1.125848,1.158646,188.38390,31.383691,71.695924,...,2.181818,2.213364,2.153013,1.368922,1.198849,1,0.909091,0.433013,0.385695,33.0
20687,4,76.444563,52.655516,59.356672,36.043792,1.199541,1.285546,121.32760,14.859852,43.823354,...,2.080000,2.213364,2.065938,1.368922,1.222100,1,0.800000,0.433013,0.271293,54.1
20688,4,76.444563,52.163682,59.356672,35.653137,1.199541,1.287804,121.32760,14.660470,43.823354,...,2.078927,2.213364,2.065039,1.368922,1.215189,1,0.816101,0.433013,0.269624,60.6
20689,4,76.444563,51.936769,59.356672,35.474335,1.199541,1.288830,121.32760,14.568482,43.823354,...,2.078431,2.213364,2.064624,1.368922,1.211942,1,0.823529,0.433013,0.268849,75.3


In [11]:
grupo3

(range_ThermalConductivity: [399.97342:399.97417[ and range_atomic_radius>=205)

In [12]:
df3 = data[(data["range_ThermalConductivity"] >= 399.97342) & 
           (data["range_ThermalConductivity"] < 399.97417) & 
           (data["range_atomic_radius"] >= 205)]
display(df3)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20653,4,105.313925,76.633000,73.088376,42.605417,1.125848,1.158646,188.38390,31.383691,71.695924,...,2.181818,2.213364,2.153013,1.368922,1.198849,1,0.909091,0.433013,0.385695,33.0
20687,4,76.444563,52.655516,59.356672,36.043792,1.199541,1.285546,121.32760,14.859852,43.823354,...,2.080000,2.213364,2.065938,1.368922,1.222100,1,0.800000,0.433013,0.271293,54.1
20688,4,76.444563,52.163682,59.356672,35.653137,1.199541,1.287804,121.32760,14.660470,43.823354,...,2.078927,2.213364,2.065039,1.368922,1.215189,1,0.816101,0.433013,0.269624,60.6
20689,4,76.444563,51.936769,59.356672,35.474335,1.199541,1.288830,121.32760,14.568482,43.823354,...,2.078431,2.213364,2.064624,1.368922,1.211942,1,0.823529,0.433013,0.268849,75.3


In [13]:
grupo4

(range_ThermalConductivity: [399.97342:399.97417[ and range_atomic_radius>=205 and range_fie>=810.6)

In [14]:
df4 = data[(data["range_ThermalConductivity"] >= 399.97342) & 
           (data["range_ThermalConductivity"] < 399.97417) & 
           (data["range_atomic_radius"] >= 205) & 
           (data["range_fie"] >= 810.6)
           ]
display(df4)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20653,4,105.313925,76.633000,73.088376,42.605417,1.125848,1.158646,188.38390,31.383691,71.695924,...,2.181818,2.213364,2.153013,1.368922,1.198849,1,0.909091,0.433013,0.385695,33.0
20687,4,76.444563,52.655516,59.356672,36.043792,1.199541,1.285546,121.32760,14.859852,43.823354,...,2.080000,2.213364,2.065938,1.368922,1.222100,1,0.800000,0.433013,0.271293,54.1
20688,4,76.444563,52.163682,59.356672,35.653137,1.199541,1.287804,121.32760,14.660470,43.823354,...,2.078927,2.213364,2.065039,1.368922,1.215189,1,0.816101,0.433013,0.269624,60.6
20689,4,76.444563,51.936769,59.356672,35.474335,1.199541,1.288830,121.32760,14.568482,43.823354,...,2.078431,2.213364,2.064624,1.368922,1.211942,1,0.823529,0.433013,0.268849,75.3


In [15]:
grupo5

(range_atomic_radius>=205 and range_fie>=810.6)

In [16]:
df5 = data[(data["range_atomic_radius"] >= 205) & 
           (data["range_fie"] >= 810.6)
           ]
display(df5)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21077,7,100.616296,49.210285,77.516930,33.963068,1.750116,1.784134,184.59060,11.575336,59.833315,...,2.015000,1.919471,2.000657,1.908535,1.430624,2,0.980000,0.534522,0.234041,121.6
21078,7,100.616296,50.096729,77.516930,34.324405,1.750116,1.832276,184.59060,10.723850,59.833315,...,1.995000,1.919471,1.973113,1.908535,1.451696,2,0.970000,0.534522,0.273816,126.8
21079,7,100.616296,50.983173,77.516930,34.689586,1.750116,1.853785,184.59060,10.723850,59.833315,...,1.975000,1.919471,1.945949,1.908535,1.465459,2,0.970000,0.534522,0.307205,126.9
21080,7,100.616296,52.756061,77.516930,35.431646,1.750116,1.846444,184.59060,10.723850,59.833315,...,1.935000,1.919471,1.892737,1.908535,1.475131,2,0.970000,0.534522,0.361628,106.7


In [17]:
grupo6

(range_atomic_radius>=205)

In [18]:
df6 = data[data["range_atomic_radius"] >= 205]
display(df6)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21077,7,100.616296,49.210285,77.516930,33.963068,1.750116,1.784134,184.59060,11.575336,59.833315,...,2.015000,1.919471,2.000657,1.908535,1.430624,2,0.980000,0.534522,0.234041,121.6
21078,7,100.616296,50.096729,77.516930,34.324405,1.750116,1.832276,184.59060,10.723850,59.833315,...,1.995000,1.919471,1.973113,1.908535,1.451696,2,0.970000,0.534522,0.273816,126.8
21079,7,100.616296,50.983173,77.516930,34.689586,1.750116,1.853785,184.59060,10.723850,59.833315,...,1.975000,1.919471,1.945949,1.908535,1.465459,2,0.970000,0.534522,0.307205,126.9
21080,7,100.616296,52.756061,77.516930,35.431646,1.750116,1.846444,184.59060,10.723850,59.833315,...,1.935000,1.919471,1.892737,1.908535,1.475131,2,0.970000,0.534522,0.361628,106.7


In [19]:
grupo7

(range_FusionHeat: [12.878:12.9[)

In [20]:
df7 = data[(data["range_FusionHeat"] >= 12.878) & 
           (data["range_FusionHeat"] < 12.9)]
display(df7)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21077,7,100.616296,49.210285,77.516930,33.963068,1.750116,1.784134,184.59060,11.575336,59.833315,...,2.015000,1.919471,2.000657,1.908535,1.430624,2,0.980000,0.534522,0.234041,121.6
21078,7,100.616296,50.096729,77.516930,34.324405,1.750116,1.832276,184.59060,10.723850,59.833315,...,1.995000,1.919471,1.973113,1.908535,1.451696,2,0.970000,0.534522,0.273816,126.8
21079,7,100.616296,50.983173,77.516930,34.689586,1.750116,1.853785,184.59060,10.723850,59.833315,...,1.975000,1.919471,1.945949,1.908535,1.465459,2,0.970000,0.534522,0.307205,126.9
21080,7,100.616296,52.756061,77.516930,35.431646,1.750116,1.846444,184.59060,10.723850,59.833315,...,1.935000,1.919471,1.892737,1.908535,1.475131,2,0.970000,0.534522,0.361628,106.7


In [21]:
grupo8

(range_fie>=810.6)

In [22]:
df8 = data[data["range_fie"] >= 810.6]
display(df8)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21236,7,64.073395,70.874211,50.714667,55.172697,1.752162,1.285364,122.90607,32.921377,39.330466,...,3.223750,2.712353,3.023002,1.860419,1.415529,4,1.245000,1.195229,1.132999,18.6
21237,7,64.073395,70.249216,50.714667,54.865817,1.752162,1.336901,122.90607,31.185058,39.330466,...,3.223750,2.712353,3.023002,1.860419,1.444974,4,1.245000,1.195229,1.132999,24.1
21238,7,64.073395,68.999225,50.714667,54.257168,1.752162,1.416413,122.90607,27.712421,39.330466,...,3.223750,2.712353,3.023002,1.860419,1.485762,4,1.245000,1.195229,1.132999,26.6
21239,7,64.073395,68.624228,50.714667,54.075894,1.752162,1.436019,122.90607,26.670630,39.330466,...,3.223750,2.712353,3.023002,1.860419,1.494806,4,1.245000,1.195229,1.132999,27.9


In [23]:
grupo9

(range_FusionHeat: [12.878:12.9[ and range_ThermalConductivity: [399.97342:399.97417[)

In [24]:
df9 = data[(data["range_FusionHeat"] >= 12.878) & 
           (data["range_FusionHeat"] < 12.9) &
           (data["range_ThermalConductivity"] >= 399.97342) &
           (data["range_ThermalConductivity"] < 399.97417)] 

display(df9)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20689,4,76.444563,51.936769,59.356672,35.474335,1.199541,1.288830,121.32760,14.568482,43.823354,...,2.078431,2.213364,2.064624,1.368922,1.211942,1,0.823529,0.433013,0.268849,75.3
20877,5,50.802080,60.368385,34.574599,48.107981,1.350123,1.223318,80.67900,24.363476,33.165975,...,2.818182,2.091279,2.546986,1.467734,1.255416,4,1.377622,1.356466,1.367128,8.0
21026,4,76.517718,57.175142,59.310096,35.891368,1.197273,0.943560,122.90607,36.451199,44.289459,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,22.9
21027,4,76.517718,56.845450,59.310096,35.785209,1.197273,0.978234,122.90607,34.994964,44.289459,...,2.265000,2.213364,2.226866,1.368922,1.046982,1,1.101429,0.433013,0.441333,32.2


In [25]:
grupo10

(range_ThermalConductivity: [399.97342:399.97417[ and wtd_gmean_Valence<2.07303817332395)

In [26]:
df10 = data[(data["wtd_gmean_Valence"] < 2.07303817332395) & 
           (data["range_ThermalConductivity"] >= 399.97342) &
           (data["range_ThermalConductivity"] < 399.97417)] 

display(df10)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
16,5,69.171250,47.505319,54.872765,33.319073,1.419173,1.428952,121.32760,14.303962,41.809011,...,2.076923,2.168944,2.063362,1.594167,1.285132,1,1.000000,0.400000,0.266469,82.0
37,4,96.451652,74.924734,69.689342,52.781220,1.158346,1.216987,152.93481,28.998617,60.166149,...,2.083333,2.213364,2.068732,1.368922,1.326177,1,0.416667,0.433013,0.276385,93.2
38,4,96.451652,74.397947,69.689342,52.599312,1.158346,1.181297,152.93481,31.287400,60.166149,...,2.066667,2.213364,2.054799,1.368922,1.305501,1,0.466667,0.433013,0.249444,80.5
39,4,88.944468,55.091790,66.361592,36.155211,1.181795,1.328127,122.90607,12.512169,51.968828,...,2.076923,2.213364,2.063362,1.368922,1.201823,1,0.846154,0.433013,0.266469,48.0
45,4,92.209100,50.945200,67.869108,32.670052,1.172280,1.367867,135.96460,6.879600,55.307695,...,2.080000,2.213364,2.065938,1.368922,1.132727,1,0.960000,0.433013,0.271293,8.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20651,4,76.444563,48.728075,59.356672,33.039869,1.199541,1.302088,121.32760,13.267725,43.823354,...,2.071429,2.213364,2.058771,1.368922,1.162020,1,0.928571,0.433013,0.257539,80.0
20687,4,76.444563,52.655516,59.356672,36.043792,1.199541,1.285546,121.32760,14.859852,43.823354,...,2.080000,2.213364,2.065938,1.368922,1.222100,1,0.800000,0.433013,0.271293,54.1
20688,4,76.444563,52.163682,59.356672,35.653137,1.199541,1.287804,121.32760,14.660470,43.823354,...,2.078927,2.213364,2.065039,1.368922,1.215189,1,0.816101,0.433013,0.269624,60.6
20689,4,76.444563,51.936769,59.356672,35.474335,1.199541,1.288830,121.32760,14.568482,43.823354,...,2.078431,2.213364,2.064624,1.368922,1.211942,1,0.823529,0.433013,0.268849,75.3


Note that the subgroups above overlap, so they cannot be used directly to train models such as clustering models. There is an alternative, however: according to the feature importance analysis in the previous project, "range_ThermalConductivity" is the most important feature in this dataset for predicting the critical temperature. Since subgroup discovery identified a subgroup defined by this feature, we can train the class Clustering_GLM, for example, twice: once on the materials that are in the subgroup and once on those that aren't.
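
One way to quantify the overlap between two subgroups is the Jaccard index of their boolean masks. The sketch below uses a toy frame with made-up values; `jaccard_overlap` is a hypothetical helper, not part of pysubgroup. It also shows that a mask and its negation split the data into two disjoint parts, which is the idea behind the in/out split.

```python
import pandas as pd


def jaccard_overlap(mask_a: pd.Series, mask_b: pd.Series) -> float:
    """Rows covered by both masks divided by rows covered by at least one."""
    union = (mask_a | mask_b).sum()
    if union == 0:
        return 0.0
    return float((mask_a & mask_b).sum() / union)


# Toy data with made-up values
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
mask_a = toy["a"] >= 3   # covers rows 2, 3, 4
mask_b = toy["b"] < 40   # covers rows 0, 1, 2

overlap = jaccard_overlap(mask_a, mask_b)

# A mask and its complement partition the rows without overlap
inside = toy[mask_a]
outside = toy[~mask_a]
print(overlap, len(inside), len(outside))
```

An index of 0 means the two subgroups are disjoint, and 1 means they select exactly the same rows.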

In [27]:
df_in = data[(data["range_ThermalConductivity"] >= 399.97342) & 
           (data["range_ThermalConductivity"] < 399.97417)]
display(df_in)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
0,4,88.944468,57.862692,66.361592,36.116612,1.181795,1.062396,122.90607,31.794921,51.968828,...,2.257143,2.213364,2.219783,1.368922,1.066221,1,1.085714,0.433013,0.437059,29.0
2,4,88.944468,57.885242,66.361592,36.122509,1.181795,0.975980,122.90607,35.741099,51.968828,...,2.271429,2.213364,2.232679,1.368922,1.029175,1,1.114286,0.433013,0.444697,19.0
3,4,88.944468,57.873967,66.361592,36.119560,1.181795,1.022291,122.90607,33.768010,51.968828,...,2.264286,2.213364,2.226222,1.368922,1.048834,1,1.100000,0.433013,0.440952,22.0
4,4,88.944468,57.840143,66.361592,36.110716,1.181795,1.129224,122.90607,27.848743,51.968828,...,2.242857,2.213364,2.206963,1.368922,1.096052,1,1.057143,0.433013,0.428809,23.0
5,4,88.944468,57.795044,66.361592,36.098926,1.181795,1.225203,122.90607,20.687458,51.968828,...,2.214286,2.213364,2.181543,1.368922,1.141474,1,1.000000,0.433013,0.410326,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21150,4,46.846625,36.933336,41.451165,30.353185,1.293330,1.135530,47.54660,15.188437,18.288714,...,2.962500,2.912951,2.609441,1.271181,1.074221,4,1.387500,1.639360,1.698851,117.5
21151,4,46.846625,36.981448,41.451165,30.380291,1.293330,1.181948,47.54660,14.490375,18.288714,...,2.925000,2.912951,2.586929,1.271181,1.119688,4,1.275000,1.639360,1.664144,98.0
21152,3,43.847167,36.885225,37.530778,30.326102,0.974604,1.061829,47.54660,7.886800,20.248023,...,3.000000,2.884499,2.632148,0.950271,1.011404,4,1.000000,1.885618,1.732051,132.0
21246,5,70.406250,43.096221,59.725961,31.931306,1.497519,1.466096,79.96060,11.170606,29.321360,...,2.147710,2.701920,2.103026,1.494365,1.315617,4,1.002954,1.549193,0.589456,78.0


In [28]:
df_outside = data[~((data["range_ThermalConductivity"] >= 399.97342) & 
                           (data["range_ThermalConductivity"] < 399.97417))]
display(df_outside)

Unnamed: 0,number_of_elements,mean_atomic_mass,wtd_mean_atomic_mass,gmean_atomic_mass,wtd_gmean_atomic_mass,entropy_atomic_mass,wtd_entropy_atomic_mass,range_atomic_mass,wtd_range_atomic_mass,std_atomic_mass,...,wtd_mean_Valence,gmean_Valence,wtd_gmean_Valence,entropy_Valence,wtd_entropy_Valence,range_Valence,wtd_range_Valence,std_Valence,wtd_std_Valence,critical_temp
1,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,26.00
12,5,92.729214,58.201829,73.132787,36.259297,1.449309,1.026457,122.90607,36.932426,47.094633,...,2.264286,1.888175,2.221652,1.557113,1.040517,2,1.135714,0.632456,0.456864,27.00
13,5,92.729214,58.518416,73.132787,36.396602,1.449309,1.057755,122.90607,36.161939,47.094633,...,2.257143,1.888175,2.210679,1.557113,1.047221,2,1.128571,0.632456,0.468606,27.00
14,5,92.729214,59.468178,73.132787,36.811646,1.449309,1.114758,122.90607,35.741099,47.094633,...,2.235714,1.888175,2.178087,1.557113,1.057441,2,1.114286,0.632456,0.501579,26.00
15,5,92.729214,61.051113,73.132787,37.513930,1.449309,1.146919,122.90607,35.741099,47.094633,...,2.200000,1.888175,2.124829,1.557113,1.053346,2,1.114286,0.632456,0.550325,27.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21257,3,89.389833,89.389833,63.694713,63.694713,0.782574,0.782574,164.13150,54.710500,73.156893,...,4.666667,4.578857,4.578857,1.078992,1.078992,2,0.666667,0.942809,0.942809,3.43
21258,4,106.957877,53.095769,82.515384,43.135565,1.177145,1.254119,146.88130,15.504479,65.764081,...,3.555556,3.223710,3.519911,1.377820,0.913658,1,2.168889,0.433013,0.496904,2.44
21260,2,99.663190,95.609104,99.433882,95.464320,0.690847,0.530198,13.51362,53.041104,6.756810,...,4.800000,4.472136,4.781762,0.686962,0.450561,1,3.200000,0.500000,0.400000,1.98
21261,2,99.663190,97.095602,99.433882,96.901083,0.690847,0.640883,13.51362,31.115202,6.756810,...,4.690000,4.472136,4.665819,0.686962,0.577601,1,2.210000,0.500000,0.462493,1.84


In [29]:
from Optimization_clustering import optimization



In [30]:
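from sklearn.model_selection import train_test_split

X = df_in.drop(columns=["critical_temp"])
y = df_in["critical_temp"]

# train_test_split returns (X_train, X_test, y_train, y_test). The names below
# are swapped, so with test_size=0.9 the variables X_train / y_train hold 90%
# of the rows and X_test / y_test hold the remaining 10%.
X_test, X_train, y_test, y_train = train_test_split(
    X, y, test_size=0.9, random_state=1702
)

# Keep the target strictly positive (floor at 1e-6).
y_test = np.clip(y_test, 1e-6, None)
y_train = np.clip(y_train, 1e-6, None)

# Optimization
resultado = optimization(X_train, y_train)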
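
As a sanity check on the split, here is a minimal sketch on synthetic data (the toy arrays are assumptions and stand in for `df_in`). `train_test_split` returns its pieces in the order train, test, train, test, so unpacking them into swapped names with `test_size=0.9` leaves the variable called `X_train` holding 90% of the rows. The clip to `1e-6` keeps the target strictly positive, which is presumably needed because Gamma-type GLM families are only defined for positive responses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the features and the target (not the notebook's data).
X = np.arange(200).reshape(100, 2)
y = np.linspace(0.0, 50.0, 100)  # contains an exact zero on purpose

# Same unpacking order as in the notebook cell: the names are swapped
# relative to the return order (train, test, train, test).
X_test, X_train, y_test, y_train = train_test_split(
    X, y, test_size=0.9, random_state=1702
)
print(X_train.shape, X_test.shape)

# Floor the target at 1e-6 so that no value is zero or negative.
y_train = np.clip(y_train, 1e-6, None)
y_test = np.clip(y_test, 1e-6, None)
print(y_train.min() > 0, y_test.min() > 0)
```

Writing `X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, ...)` would give the same 90/10 proportions with conventional naming, but it would select different rows for the same `random_state`, so the logged results would not be reproduced exactly.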


[I 2025-07-12 14:33:51,838] A new study created in RDB with name: optimization_clusters_glm_teste12_07_2025_14_33_46
[I 2025-07-12 14:34:24,545] Trial 0 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:34:25,711] Trial 1 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:34:51,639] Trial 2 finished with value: 697573625010.7021 and parameters: {'clusterer': 'bisecting_kmeans', 'n_clusters_bkmeans': 19, 'bisecting_strategy': 'biggest_inertia', 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 2 with value: 697573625010.7021.
[I 2025-07-12 14:36:02,907] Trial 3 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:36:03,474] Trial 4 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:37:01,414] Trial 5 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:37:01,725] Trial 6 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:38:48,406] Trial 7 finished with value: 34496.64309561457 and parameters: {'clusterer': 'kmeans', 'n_clusters': 6, 'distribution': 'gamma'}. Best is trial 7 with value: 34496.64309561457.
[I 2025-07-12 14:38:49,228] Trial 8 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:40:02,629] Trial 9 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 14:40:20,716] Trial 10 finished with value: 3916.113201586234 and parameters: {'clusterer': 'kmeans', 'n_clusters': 10, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 10 with value: 3916.113201586234.
[I 2025-07-12 14:42:21,991] Trial 11 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:43:45,464] Trial 12 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 14:45:01,363] Trial 13 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 14:46:47,549] Trial 14 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.




NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:47:35,860] Trial 15 pruned. 
[I 2025-07-12 14:49:01,105] Trial 16 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:50:55,908] Trial 17 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:57:10,185] Trial 18 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 14:59:07,988] Trial 19 finished with value: 34496.64309561457 and parameters: {'clusterer': 'kmeans', 'n_clusters': 6, 'distribution': 'gamma'}. Best is trial 10 with value: 3916.113201586234.
[I 2025-07-12 14:59:26,182] Trial 20 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:01:30,529] Trial 21 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:01:56,938] Trial 22 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:02:04,327] Trial 23 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:07:31,980] Trial 24 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:07:34,907] Trial 25 finished with value: 3172056371614520.0 and parameters: {'clusterer': 'kmeans', 'n_clusters': 1, 'distribution': 'gaussian', 'link_gaussian': 'log'}. Best is trial 10 with value: 3916.113201586234.
[I 2025-07-12 15:08:11,562] Trial 26 finished with value: 20.243920210604614 and parameters: {'clusterer': 'kmeans', 'n_clusters': 2, 'distribution': 'gamma'}. Best is trial 26 with value: 20.243920210604614.
[I 2025-07-12 15:08:11,908] Trial 27 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:08:37,270] Trial 28 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:08:52,970] Trial 29 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:11:50,666] Trial 30 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:11:55,971] Trial 31 finished with value: 19.436242713146882 and parameters: {'clusterer': 'kmeans', 'n_clusters': 1, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 31 with value: 19.436242713146882.
[I 2025-07-12 15:12:01,223] Trial 32 finished with value: 19.436242713146882 and parameters: {'clusterer': 'kmeans', 'n_clusters': 1, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 31 with value: 19.436242713146882.
[I 2025-07-12 15:14:13,793] Trial 33 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:17:32,425] Trial 34 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:18:08,793] Trial 35 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:18:22,957] Trial 36 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:19:08,690] Trial 37 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:19:45,321] Trial 38 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:20:26,202] Trial 39 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:21:13,877] Trial 40 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:21:26,487] Trial 41 finished with value: 956.8795926869337 and parameters: {'clusterer': 'kmeans', 'n_clusters': 9, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 31 with value: 19.436242713146882.
[I 2025-07-12 15:22:04,724] Trial 42 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:22:46,819] Trial 43 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:22:48,769] Trial 44 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:23:06,969] Trial 45 finished with value: 18.418579208974233 and parameters: {'clusterer': 'kmeans', 'n_clusters': 3, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 15:24:01,377] Trial 46 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:24:32,268] Trial 47 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:27:32,141] Trial 48 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:27:42,121] Trial 49 finished with value: 7047413.779711388 and parameters: {'clusterer': 'bisecting_kmeans', 'n_clusters_bkmeans': 11, 'bisecting_strategy': 'largest_cluster', 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 15:28:20,620] Trial 50 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:28:52,955] Trial 51 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:29:29,839] Trial 52 finished with value: 61.14842191698876 and parameters: {'clusterer': 'kmeans', 'n_clusters': 4, 'distribution': 'gaussian', 'link_gaussian': 'log'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 15:29:49,531] Trial 53 finished with value: 19.877653016458467 and parameters: {'clusterer': 'kmeans', 'n_clusters': 5, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 15:31:42,024] Trial 54 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:32:48,601] Trial 55 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:33:59,478] Trial 56 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:34:40,116] Trial 57 finished with value: 61.14842191698876 and parameters: {'clusterer': 'kmeans', 'n_clusters': 4, 'distribution': 'gaussian', 'link_gaussian': 'log'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 15:34:50,665] Trial 58 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:35:14,515] Trial 59 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:36:08,263] Trial 60 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:36:58,297] Trial 61 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:37:54,803] Trial 62 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:38:10,478] Trial 63 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:39:37,170] Trial 64 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:43:40,885] Trial 65 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:44:26,787] Trial 66 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 15:48:28,924] Trial 67 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:52:49,819] Trial 68 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 15:53:43,271] Trial 69 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:54:29,996] Trial 70 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 15:58:09,676] Trial 71 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 16:02:36,416] Trial 72 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 16:02:58,345] Trial 73 finished with value: 93066.92021459788 and parameters: {'clusterer': 'bisecting_kmeans', 'n_clusters_bkmeans': 6, 'bisecting_strategy': 'biggest_inertia', 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.

NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:07:10,664] Trial 75 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 16:07:23,422] Trial 76 finished with value: 956.8795926869337 and parameters: {'clusterer': 'kmeans', 'n_clusters': 9, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:07:35,849] Trial 77 finished with value: 956.8795926869337 and parameters: {'clusterer': 'kmeans', 'n_clusters': 9, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:08:29,302] Trial 78 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:08:38,379] Trial 79 finished with value: 18.790926973229386 and parameters: {'clusterer': 'kmeans', 'n_clusters': 2, 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:08:59,771] Trial 80 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 16:09:43,687] Trial 81 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 16:10:36,545] Trial 82 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:11:23,019] Trial 83 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:11:38,555] Trial 84 finished with value: 94758642742.82312 and parameters: {'clusterer': 'bisecting_kmeans', 'n_clusters_bkmeans': 18, 'bisecting_strategy': 'largest_cluster', 'distribution': 'gaussian', 'link_gaussian': 'identity'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:11:51,314] Trial 85 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 16:11:51,767] Trial 86 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:12:54,464] Trial 87 finished with value: 34496.64309561457 and parameters: {'clusterer': 'kmeans', 'n_clusters': 6, 'distribution': 'gamma'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:17:26,522] Trial 88 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:18:05,956] Trial 89 pruned. 


Input contains infinity or a value too large for dtype('float64').


[I 2025-07-12 16:18:43,555] Trial 90 finished with value: 19.651955081449575 and parameters: {'clusterer': 'kmeans', 'n_clusters': 2, 'distribution': 'gaussian', 'link_gaussian': 'log'}. Best is trial 45 with value: 18.418579208974233.
[I 2025-07-12 16:19:28,490] Trial 91 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:24:03,292] Trial 92 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


[I 2025-07-12 16:25:01,114] Trial 93 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


[I 2025-07-12 16:28:59,665] Trial 94 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


  return np.sum(resid / self.family.variance(mu)) / self.df_resid
  return np.sum(resid / self.family.variance(mu)) / self.df_resid
[I 2025-07-12 16:33:40,071] Trial 95 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


  return np.sum(resid / self.family.variance(mu)) / self.df_resid
[I 2025-07-12 16:37:51,534] Trial 96 pruned. 


The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.


  return (endog - mu) ** 2
  resid = np.power(self.endog - mu, 2) * self.iweights
  return 1. / (self.link.deriv(mu)**2 * self.variance(mu))
[I 2025-07-12 16:38:45,166] Trial 97 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


  return np.exp(z)
  return 1. / (endog * mu ** 2) * (endog - mu) ** 2
  return 1. / (endog * mu ** 2) * (endog - mu) ** 2
  return np.sum(resid / self.family.variance(mu)) / self.df_resid
  return np.sum(resid / self.family.variance(mu)) / self.df_resid
  return 1. / (self.link.deriv(mu)**2 * self.variance(mu))
  return 1. / (self.link.deriv(mu)**2 * self.variance(mu))
  wlsendog = (lin_pred + self.family.link.deriv(mu) * (self.endog-mu)
[I 2025-07-12 16:38:47,849] Trial 98 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


  return np.power(z, 1. / self.power)
[I 2025-07-12 16:38:48,151] Trial 99 pruned. 


NaN, inf or invalid value detected in weights, estimation infeasible.


The best trial was trial 45, with an RMSE of 18.418579208974233. Now, we can test another
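
For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The snippet below is a minimal sketch of that definition using only the standard library. The function name `rmse` and the toy values are illustrative assumptions and are not taken from the notebook's objective function.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    # Squared residual for each (observed, predicted) pair
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Mean of the squared residuals, then the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (hypothetical, not from the dataset)
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(rmse(observed, predicted))
```

Because RMSE is expressed in the same units as the target, a lower value indicates a closer fit, which is why the trial with the smallest value is reported as the best.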

### 🗃️ **References!**

[1] FOPPA, Lucas; SCHEFFLER, Matthias. Towards a Multi-Objective Optimization of Subgroups for the Discovery of Materials with Exceptional Performance. 2023. Available at: <http://arxiv.org/abs/2311.10381>. Accessed on: 12 Jul. 2025.

[2] Lemmerich, F., & Becker, M. (2018, September). pysubgroup: Easy-to-use subgroup discovery in Python. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 658-662.

