# **Truck Delay Classification**

In Part 2, we go deeper into the machine-learning pipeline. We retrieve data from the feature store, create a train-validation-test split, one-hot encode the categorical features, scale the numerical features, and use Weights and Biases to track experiments as we build logistic regression, random forest, and XGBoost models.

We then explore hyperparameter tuning with sweeps, discuss grid and random search, and finally deploy a Streamlit application on AWS.

## **Data Retrieval from Hopsworks**

In [1]:
import hopsworks

In [2]:
import pandas as pd

In [3]:
project = hopsworks.login()

2024-12-29 17:05:05,663 INFO: Initializing external client
2024-12-29 17:05:05,663 INFO: Base URL: https://c.app.hopsworks.ai:443
2024-12-29 17:05:07,159 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1208501


In [4]:
fs = project.get_feature_store()

In [5]:
final_data = fs.get_feature_group('final_data', version = 1)

In [6]:
query = final_data.select_all()

In [7]:
final_merge = query.read()

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.79s) 


In [8]:
final_merge.head()

Unnamed: 0,unique_id,truck_id,route_id,departure_date,estimated_arrival,delay,route_avg_temp,route_avg_wind_speed,route_avg_precip,route_avg_humidity,...,driver_id,name,gender,age,experience,driving_style,ratings,vehicle_no,average_speed_mph,is_midnight
0,7223,42839716,R-936011ac,2019-01-19 07:00:00+00:00,2019-01-26 07:54:00+00:00,1,83.066667,8.2,0.0,75.333333,...,a7c336d9-2,Jason Figueroa,male,48,13,proactive,5,42839716,59.82,1
1,11943,14316933,R-7278d1fc,2019-02-12 07:00:00+00:00,2019-02-12 19:45:00+00:00,0,60.75,4.25,0.0,57.25,...,000546f3-a,Joshua Hayes,male,37,3,Unknown,7,14316933,60.67,0
2,6490,69577118,R-63bd6f39,2019-01-22 07:00:00+00:00,2019-01-23 01:52:48+00:00,1,78.8,7.4,0.0,74.8,...,b2555587-8,Brendan Jacobs,male,44,10,proactive,2,69577118,59.82,1
3,6448,31295807,R-ab336987,2019-01-22 07:00:00+00:00,2019-01-23 00:01:48+00:00,1,66.8,7.4,0.0,55.4,...,a484098f-c,Brendan Mcdowell,male,52,16,conservative,7,31295807,54.78,1
4,11469,32053728,R-34c87f19,2019-02-10 07:00:00+00:00,2019-02-11 18:36:00+00:00,1,63.375,7.375,0.0125,56.125,...,c157ada0-b,Ronald Smith,male,44,21,conservative,6,32053728,29.22,1


## Data Preprocessing

In [9]:
final_merge.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12308 entries, 0 to 12307
Data columns (total 49 columns):
 #   Column                          Non-Null Count  Dtype                  
---  ------                          --------------  -----                  
 0   unique_id                       12308 non-null  int32                  
 1   truck_id                        12308 non-null  int64                  
 2   route_id                        12308 non-null  object                 
 3   departure_date                  12308 non-null  datetime64[us, Etc/UTC]
 4   estimated_arrival               12308 non-null  datetime64[us, Etc/UTC]
 5   delay                           12308 non-null  int64                  
 6   route_avg_temp                  12308 non-null  float64                
 7   route_avg_wind_speed            12308 non-null  float64                
 8   route_avg_precip                12308 non-null  float64                
 9   route_avg_humidity              12308 n

In [10]:
datetime_columns = [
    "departure_date",
    "estimated_arrival",
    "estimated_arrival_nearest_hour",
    "departure_date_nearest_hour"
]

for column in datetime_columns:
    final_merge[column] = final_merge[column].dt.tz_localize(None)

print(final_merge.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12308 entries, 0 to 12307
Data columns (total 49 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   unique_id                       12308 non-null  int32         
 1   truck_id                        12308 non-null  int64         
 2   route_id                        12308 non-null  object        
 3   departure_date                  12308 non-null  datetime64[us]
 4   estimated_arrival               12308 non-null  datetime64[us]
 5   delay                           12308 non-null  int64         
 6   route_avg_temp                  12308 non-null  float64       
 7   route_avg_wind_speed            12308 non-null  float64       
 8   route_avg_precip                12308 non-null  float64       
 9   route_avg_humidity              12308 non-null  float64       
 10  route_avg_visibility            12308 non-null  float64       
 11  ro
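To make the effect of `tz_localize(None)` concrete, here is a minimal sketch on a toy Series. The two timestamps are copied from the sample rows above; the variable names `aware` and `naive` are illustrative and not part of the notebook. The call drops the UTC timezone while keeping the wall-clock time unchanged, which is why the dtype in the second `info()` output no longer carries `Etc/UTC`.

```python
import pandas as pd

# Toy timezone-aware series (values taken from the sample rows above)
aware = pd.Series(pd.to_datetime(["2019-01-19 07:00:00+00:00",
                                  "2019-02-12 07:00:00+00:00"]))

# tz_localize(None) removes the timezone but keeps the wall-clock time
naive = aware.dt.tz_localize(None)

print(aware.dt.tz)    # timezone of the original series
print(naive.dt.tz)    # no timezone after localizing to None
print(naive.iloc[0])  # same clock time as before
```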

In [11]:
final_merge.isna().sum()

unique_id                           0
truck_id                            0
route_id                            0
departure_date                      0
estimated_arrival                   0
delay                               0
route_avg_temp                      0
route_avg_wind_speed                0
route_avg_precip                    0
route_avg_humidity                  0
route_avg_visibility                0
route_avg_pressure                  0
route_description                   0
estimated_arrival_nearest_hour      0
departure_date_nearest_hour         0
origin_id                           0
destination_id                      0
distance                            0
average_hours                       0
origin_temp                         4
origin_wind_speed                   4
origin_description                  0
origin_precip                       4
origin_humidity                     4
origin_visibility                   4
origin_pressure                     4
destination_

In [12]:
# Let's check the rows where origin temp is null
final_merge[final_merge['origin_temp'].isnull()]

Unnamed: 0,unique_id,truck_id,route_id,departure_date,estimated_arrival,delay,route_avg_temp,route_avg_wind_speed,route_avg_precip,route_avg_humidity,...,driver_id,name,gender,age,experience,driving_style,ratings,vehicle_no,average_speed_mph,is_midnight
1608,7322,18091756,R-112b790b,2019-01-25 07:00:00,2019-01-27 02:40:48,1,66.555556,6.888889,0.0,90.888889,...,e975a383-c,Neil Herring,male,45,7,proactive,3,18091756,58.02,1
4483,7505,24746768,R-b5f9418a,2019-01-25 07:00:00,2019-01-27 14:35:24,0,47.454545,9.090909,0.0,70.636364,...,3d91387f-2,William Anderson III,male,50,0,conservative,4,24746768,40.69,1
6058,7607,22916520,R-78ee1f97,2019-01-25 07:00:00,2019-01-28 10:08:24,0,57.5,10.142857,0.0,78.214286,...,ffedbf74-a,Thomas Ochoa,male,57,19,proactive,6,22916520,63.64,1
7917,7537,24654257,R-21472caf,2019-01-25 07:00:00,2019-01-27 16:50:24,0,69.0,12.363636,0.018182,79.181818,...,f110642c-1,Marc Walters,male,47,5,proactive,3,24654257,61.93,1


In [13]:
# Let's check the rows where origin humidity is null
# The nulls fall in the same rows, so let's find out which origin city this is
final_merge[final_merge['origin_humidity'].isnull()]

Unnamed: 0,unique_id,truck_id,route_id,departure_date,estimated_arrival,delay,route_avg_temp,route_avg_wind_speed,route_avg_precip,route_avg_humidity,...,driver_id,name,gender,age,experience,driving_style,ratings,vehicle_no,average_speed_mph,is_midnight
1608,7322,18091756,R-112b790b,2019-01-25 07:00:00,2019-01-27 02:40:48,1,66.555556,6.888889,0.0,90.888889,...,e975a383-c,Neil Herring,male,45,7,proactive,3,18091756,58.02,1
4483,7505,24746768,R-b5f9418a,2019-01-25 07:00:00,2019-01-27 14:35:24,0,47.454545,9.090909,0.0,70.636364,...,3d91387f-2,William Anderson III,male,50,0,conservative,4,24746768,40.69,1
6058,7607,22916520,R-78ee1f97,2019-01-25 07:00:00,2019-01-28 10:08:24,0,57.5,10.142857,0.0,78.214286,...,ffedbf74-a,Thomas Ochoa,male,57,19,proactive,6,22916520,63.64,1
7917,7537,24654257,R-21472caf,2019-01-25 07:00:00,2019-01-27 16:50:24,0,69.0,12.363636,0.018182,79.181818,...,f110642c-1,Marc Walters,male,47,5,proactive,3,24654257,61.93,1


In [14]:
# Fetch the routes data
routes_data = fs.get_feature_group('routes_details_fg', version=1)

routes_data_query = routes_data.select_all()

routes_df = routes_data_query.read(read_options={"use_hive": True})

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.34s) 


In [15]:
# Find the routes whose origin city has no weather information on 25 January
# All four routes share a single origin city
routes_df[routes_df.route_id.isin(['R-112b790b', 'R-78ee1f97','R-b5f9418a', 'R-21472caf'])]

Unnamed: 0,route_id,origin_id,destination_id,distance,average_hours,event_time
6,R-b5f9418a,C-f8f01604,C-4fe0fa24,2779.33,55.59,2023-08-23 00:00:00+00:00
438,R-21472caf,C-f8f01604,C-2e349ccd,2892.14,57.84,2023-08-23 00:00:00+00:00
702,R-112b790b,C-f8f01604,C-d3bb431c,2183.94,43.68,2023-08-23 00:00:00+00:00
1289,R-78ee1f97,C-f8f01604,C-f5ed4c15,3757.02,75.14,2023-08-23 00:00:00+00:00


In [16]:
# Let's check if we have any information on this city
# Fetching the weather data
weather_data = fs.get_feature_group('city_weather_details_fg', version=1)

weather_query = weather_data.select_all()

weather_df = weather_query.read(read_options={"use_hive": True})

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.14s) 


In [17]:
# Filter the weather data by city and date
# There is no weather information for this city on this date, so we will remove these rows
# It is still important to check with the business about the missing information
weather_df[(weather_df.city_id=='C-f8f01604')&(weather_df.date==pd.to_datetime('2019-01-25'))]

Unnamed: 0,city_id,date,hour,temp,wind_speed,description,precip,humidity,visibility,pressure,chanceofrain,chanceoffog,chanceofsnow,chanceofthunder


In [18]:
# Drop the rows

final_merge = final_merge.dropna(
    subset=['origin_temp', 'origin_wind_speed', 'origin_precip',
            'origin_humidity', 'origin_visibility', 'origin_pressure']
).reset_index(drop=True)

In [19]:
# Let's verify the dropped null values
final_merge.isna().sum()

unique_id                           0
truck_id                            0
route_id                            0
departure_date                      0
estimated_arrival                   0
delay                               0
route_avg_temp                      0
route_avg_wind_speed                0
route_avg_precip                    0
route_avg_humidity                  0
route_avg_visibility                0
route_avg_pressure                  0
route_description                   0
estimated_arrival_nearest_hour      0
departure_date_nearest_hour         0
origin_id                           0
destination_id                      0
distance                            0
average_hours                       0
origin_temp                         0
origin_wind_speed                   0
origin_description                  0
origin_precip                       0
origin_humidity                     0
origin_visibility                   0
origin_pressure                     0
destination_
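The drop-and-verify step above can be sketched on a toy frame. The column values and the names `toy` and `cleaned` are made up for illustration; only the `dropna(subset=...)` and `reset_index(drop=True)` calls mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: one row is missing both origin weather readings
toy = pd.DataFrame({
    "route_id": ["R-a", "R-b", "R-c"],
    "origin_temp": [60.0, np.nan, 72.5],
    "origin_humidity": [55.0, np.nan, 80.0],
})

# Drop rows with a null in any of the listed columns, then renumber the index
cleaned = toy.dropna(subset=["origin_temp", "origin_humidity"]).reset_index(drop=True)

print(len(toy), "->", len(cleaned))  # rows before and after
print(cleaned.isna().sum().sum())    # total nulls remaining
```

Resetting the index matters here because later cells rely on positional row order; without it, the surviving rows would keep their old, gapped labels.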

In [20]:
final_merge

Unnamed: 0,unique_id,truck_id,route_id,departure_date,estimated_arrival,delay,route_avg_temp,route_avg_wind_speed,route_avg_precip,route_avg_humidity,...,driver_id,name,gender,age,experience,driving_style,ratings,vehicle_no,average_speed_mph,is_midnight
0,7223,42839716,R-936011ac,2019-01-19 07:00:00,2019-01-26 07:54:00,1,83.066667,8.200,0.0000,75.333333,...,a7c336d9-2,Jason Figueroa,male,48,13,proactive,5,42839716,59.82,1
1,11943,14316933,R-7278d1fc,2019-02-12 07:00:00,2019-02-12 19:45:00,0,60.750000,4.250,0.0000,57.250000,...,000546f3-a,Joshua Hayes,male,37,3,Unknown,7,14316933,60.67,0
2,6490,69577118,R-63bd6f39,2019-01-22 07:00:00,2019-01-23 01:52:48,1,78.800000,7.400,0.0000,74.800000,...,b2555587-8,Brendan Jacobs,male,44,10,proactive,2,69577118,59.82,1
3,6448,31295807,R-ab336987,2019-01-22 07:00:00,2019-01-23 00:01:48,1,66.800000,7.400,0.0000,55.400000,...,a484098f-c,Brendan Mcdowell,male,52,16,conservative,7,31295807,54.78,1
4,11469,32053728,R-34c87f19,2019-02-10 07:00:00,2019-02-11 18:36:00,1,63.375000,7.375,0.0125,56.125000,...,c157ada0-b,Ronald Smith,male,44,21,conservative,6,32053728,29.22,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12299,3551,22239531,R-18b58420,2019-01-13 07:00:00,2019-01-13 11:03:36,0,60.000000,5.500,0.0000,79.000000,...,7efdb081-c,Tyler Smith,male,47,11,proactive,5,22239531,56.31,0
12300,4912,20946964,R-b27b0bae,2019-01-17 07:00:00,2019-01-18 22:15:36,1,50.250000,5.875,0.0000,73.250000,...,a2927162-a,Phillip Escobar,male,39,6,proactive,7,20946964,63.96,1
12301,1042,28529541,R-e9be806f,2019-01-04 07:00:00,2019-01-04 12:55:12,0,78.666667,7.000,0.0000,55.666667,...,f7b58246-e,Brian Goodwin,male,39,6,Unknown,7,28529541,59.09,0
12302,9703,27799951,R-48ee4120,2019-02-03 07:00:00,2019-02-04 02:36:36,0,46.200000,11.400,0.0000,59.600000,...,2f2e8d57-9,Joshua Martinez,male,46,9,conservative,3,27799951,38.87,1


### **Train-Validation-Test Split**

In [22]:
#selecting necessary columns and removing id columns

cts_cols=['route_avg_temp', 'route_avg_wind_speed',
       'route_avg_precip', 'route_avg_humidity', 'route_avg_visibility',
       'route_avg_pressure', 'distance', 'average_hours',
       'origin_temp', 'origin_wind_speed', 'origin_precip', 'origin_humidity',
       'origin_visibility', 'origin_pressure',
       'destination_temp','destination_wind_speed','destination_precip',
       'destination_humidity', 'destination_visibility','destination_pressure',
        'avg_no_of_vehicles', 'truck_age','load_capacity_pounds', 'mileage_mpg',
        'age', 'experience','average_speed_mph']


cat_cols=['route_description',
       'origin_description', 'destination_description',
        'accident', 'fuel_type',
       'gender', 'driving_style', 'ratings','is_midnight']


target=['delay']



In [23]:
# Checking the date range
final_merge['estimated_arrival'].min(), final_merge['estimated_arrival'].max()

(Timestamp('2019-01-01 07:04:48'), Timestamp('2019-02-14 16:06:00'))

In [24]:
# Splitting the data into training, validation, and test sets based on date

train_df = final_merge[final_merge['estimated_arrival'] <= pd.to_datetime('2019-01-30')]

validation_df = final_merge[(final_merge['estimated_arrival'] > pd.to_datetime('2019-01-30')) &

                            (final_merge['estimated_arrival'] <= pd.to_datetime('2019-02-07'))]

test_df = final_merge[final_merge['estimated_arrival'] > pd.to_datetime('2019-02-07')]
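A small sketch of the same date-based split on toy data may help clarify where the boundaries fall. The timestamps and the names `toy`, `train_end`, and `valid_end` are illustrative assumptions. Note that a bare date such as `'2019-01-30'` parses to midnight at the start of that day, so an arrival at 08:00 on 30 January lands in the validation set, not the training set.

```python
import pandas as pd

# Toy arrivals spanning the same date range as the real data
toy = pd.DataFrame({
    "estimated_arrival": pd.to_datetime([
        "2019-01-15 10:00", "2019-01-30 00:00", "2019-01-30 08:00",
        "2019-02-07 00:00", "2019-02-07 12:00", "2019-02-14 16:06",
    ])
})

train_end = pd.to_datetime("2019-01-30")  # midnight at the start of Jan 30
valid_end = pd.to_datetime("2019-02-07")  # midnight at the start of Feb 7

arrival = toy["estimated_arrival"]
train = toy[arrival <= train_end]
valid = toy[(arrival > train_end) & (arrival <= valid_end)]
test = toy[arrival > valid_end]

# Every row lands in exactly one split
print(len(train), len(valid), len(test))
```

Splitting by time rather than at random keeps later trips out of the training set, so validation and test scores reflect how the model would behave on future data.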

In [25]:
X_train=train_df[cts_cols+cat_cols]

y_train=train_df['delay']



In [26]:
X_valid = validation_df[cts_cols + cat_cols]

y_valid = validation_df['delay']

X_test=test_df[cts_cols+cat_cols]

y_test=test_df['delay']

### Data Preprocessing

In [28]:
load_capacity_mode = X_train['load_capacity_pounds'].mode()

load_capacity_mode

0    3000.0
Name: load_capacity_pounds, dtype: float64

In [29]:
X_train['load_capacity_pounds']=X_train['load_capacity_pounds'].fillna(load_capacity_mode.iloc[0])
X_valid['load_capacity_pounds']=X_valid['load_capacity_pounds'].fillna(load_capacity_mode.iloc[0])
X_test['load_capacity_pounds']=X_test['load_capacity_pounds'].fillna(load_capacity_mode.iloc[0])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
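The warning above is pandas flagging that `X_train`, `X_valid`, and `X_test` were created by selecting from another DataFrame, so it cannot tell whether the assignment is meant to affect the parent frame. One common way to avoid it, sketched here on a toy frame with illustrative names, is to take an explicit `.copy()` when creating the subset.

```python
import pandas as pd

df = pd.DataFrame({
    "load_capacity_pounds": [3000.0, None, 3000.0, 4000.0, None],
    "delay": [0, 1, 0, 1, 0],
})

# .copy() makes the subset an independent DataFrame, so the assignment below
# is unambiguous and does not trigger SettingWithCopyWarning
X = df[["load_capacity_pounds"]].copy()

# Impute with the most frequent value, as in the notebook
mode_value = X["load_capacity_pounds"].mode().iloc[0]
X["load_capacity_pounds"] = X["load_capacity_pounds"].fillna(mode_value)

print(X["load_capacity_pounds"].isna().sum())   # nulls left in the copy
print(df["load_capacity_pounds"].isna().sum())  # the original frame is untouched
```

In the notebook this would correspond to appending `.copy()` where the feature frames are first created, for example `train_df[cts_cols + cat_cols].copy()`.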


In [30]:
X_train.isna().sum()

route_avg_temp             0
route_avg_wind_speed       0
route_avg_precip           0
route_avg_humidity         0
route_avg_visibility       0
route_avg_pressure         0
distance                   0
average_hours              0
origin_temp                0
origin_wind_speed          0
origin_precip              0
origin_humidity            0
origin_visibility          0
origin_pressure            0
destination_temp           0
destination_wind_speed     0
destination_precip         0
destination_humidity       0
destination_visibility     0
destination_pressure       0
avg_no_of_vehicles         0
truck_age                  0
load_capacity_pounds       0
mileage_mpg                0
age                        0
experience                 0
average_speed_mph          0
route_description          0
origin_description         0
destination_description    0
accident                   0
fuel_type                  0
gender                     0
driving_style              0
ratings       

In [31]:
X_valid.isna().sum()

route_avg_temp             0
route_avg_wind_speed       0
route_avg_precip           0
route_avg_humidity         0
route_avg_visibility       0
route_avg_pressure         0
distance                   0
average_hours              0
origin_temp                0
origin_wind_speed          0
origin_precip              0
origin_humidity            0
origin_visibility          0
origin_pressure            0
destination_temp           0
destination_wind_speed     0
destination_precip         0
destination_humidity       0
destination_visibility     0
destination_pressure       0
avg_no_of_vehicles         0
truck_age                  0
load_capacity_pounds       0
mileage_mpg                0
age                        0
experience                 0
average_speed_mph          0
route_description          0
origin_description         0
destination_description    0
accident                   0
fuel_type                  0
gender                     0
driving_style              0
ratings       

In [32]:
X_test.isna().sum()


route_avg_temp             0
route_avg_wind_speed       0
route_avg_precip           0
route_avg_humidity         0
route_avg_visibility       0
route_avg_pressure         0
distance                   0
average_hours              0
origin_temp                0
origin_wind_speed          0
origin_precip              0
origin_humidity            0
origin_visibility          0
origin_pressure            0
destination_temp           0
destination_wind_speed     0
destination_precip         0
destination_humidity       0
destination_visibility     0
destination_pressure       0
avg_no_of_vehicles         0
truck_age                  0
load_capacity_pounds       0
mileage_mpg                0
age                        0
experience                 0
average_speed_mph          0
route_description          0
origin_description         0
destination_description    0
accident                   0
fuel_type                  0
gender                     0
driving_style              0
ratings       

### Encoding

In [34]:
# Importing the One-Hot Encoder and pickle's dump function
from sklearn.preprocessing import OneHotEncoder
from pickle import dump

In [35]:
# Creating the One-Hot Encoder
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')

In [36]:
final_merge.head(2)

Unnamed: 0,unique_id,truck_id,route_id,departure_date,estimated_arrival,delay,route_avg_temp,route_avg_wind_speed,route_avg_precip,route_avg_humidity,...,driver_id,name,gender,age,experience,driving_style,ratings,vehicle_no,average_speed_mph,is_midnight
0,7223,42839716,R-936011ac,2019-01-19 07:00:00,2019-01-26 07:54:00,1,83.066667,8.2,0.0,75.333333,...,a7c336d9-2,Jason Figueroa,male,48,13,proactive,5,42839716,59.82,1
1,11943,14316933,R-7278d1fc,2019-02-12 07:00:00,2019-02-12 19:45:00,0,60.75,4.25,0.0,57.25,...,000546f3-a,Joshua Hayes,male,37,3,Unknown,7,14316933,60.67,0


In [37]:
encode_columns = ['route_description', 'origin_description', 'destination_description', 'fuel_type', 'gender', 'driving_style']

In [38]:
# Fitting the encoder on the training data
encoder.fit(X_train[encode_columns])

In [39]:
# Generating names for the new one-hot encoded features
encoded_features = list(encoder.get_feature_names_out(encode_columns))

In [40]:
encoded_features

['route_description_Blizzard',
 'route_description_Blowing snow',
 'route_description_Clear',
 'route_description_Cloudy',
 'route_description_Fog',
 'route_description_Freezing drizzle',
 'route_description_Freezing fog',
 'route_description_Heavy rain',
 'route_description_Heavy rain at times',
 'route_description_Heavy snow',
 'route_description_Light drizzle',
 'route_description_Light freezing rain',
 'route_description_Light rain',
 'route_description_Light rain shower',
 'route_description_Light sleet',
 'route_description_Light sleet showers',
 'route_description_Light snow',
 'route_description_Mist',
 'route_description_Moderate or heavy freezing rain',
 'route_description_Moderate or heavy rain shower',
 'route_description_Moderate or heavy rain with thunder',
 'route_description_Moderate or heavy sleet',
 'route_description_Moderate or heavy sleet showers',
 'route_description_Moderate or heavy snow showers',
 'route_description_Moderate or heavy snow with thunder',
 'route

In [41]:
# Transforming the training, validation, and test sets

X_train[encoded_features] = encoder.transform(X_train[encode_columns])

X_valid[encoded_features] = encoder.transform(X_valid[encode_columns])

X_test[encoded_features] = encoder.transform(X_test[encode_columns])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [42]:
# Dumping the encoder for future use
with open('truck_data_encoder.pkl', 'wb') as f:
    dump(encoder, f)

In [43]:
# Dropping the original categorical features

X_train = X_train.drop(encode_columns, axis=1)

X_valid = X_valid.drop(encode_columns, axis=1)

X_test = X_test.drop(encode_columns, axis=1)

### Scaling numerical features

In [45]:
# Import Scaler
from sklearn.preprocessing import StandardScaler

In [46]:
scaler = StandardScaler()
X_train[cts_cols] = scaler.fit_transform(X_train[cts_cols])
X_valid[cts_cols] = scaler.transform(X_valid[cts_cols])
X_test[cts_cols] = scaler.transform(X_test[cts_cols])
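The pattern above (`fit_transform` on the training set, `transform` on the others) means the validation and test sets are scaled with the training mean and standard deviation. A minimal sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (illustrative values)
train = np.array([[10.0], [20.0], [30.0]])
valid = np.array([[20.0], [40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
valid_scaled = scaler.transform(valid)      # reuses the training statistics

print(scaler.mean_, scaler.scale_)  # statistics learned from the training data
print(valid_scaled.ravel())         # 20.0 maps to 0 because it equals the train mean
```

Fitting the scaler on validation or test data would leak information about those sets into preprocessing, which is why only `transform` is used for them.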

In [47]:
# Dump the scaler to use in transforming test data

with open('truck_data_scaler.pkl', 'wb') as f:
    dump(scaler, f)
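For completeness, a saved preprocessing object can be loaded back later and reused on new rows. The sketch below round-trips a toy scaler through a temporary file; the file name `toy_scaler.pkl` and the data are assumptions for illustration, not the notebook's artifacts.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

# Write to a temporary folder here; the notebook writes to the working directory
path = os.path.join(tempfile.mkdtemp(), "toy_scaler.pkl")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Later (for example, in an inference script), load it back and reuse it
with open(path, "rb") as f:
    restored = pickle.load(f)

new_rows = np.array([[2.0], [4.0]])
same = np.allclose(scaler.transform(new_rows), restored.transform(new_rows))
print(same)  # the restored scaler applies the same transformation
```

Pickled scikit-learn objects are generally only safe to load with the same library version that wrote them, so it helps to pin the version alongside the artifact.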

## **Model Building and Experimentation**

Experiment tracking means organizing and documenting the details of each model experiment, such as hyperparameters, performance metrics, and data versions. It ensures reproducibility, simplifies model comparisons, fosters collaboration, and supports data-driven decision-making.

A model registry is a centralized platform for managing, versioning, and monitoring machine learning models throughout their lifecycle. It facilitates version control, enhances collaboration, offers deployment capabilities, preserves metadata, and improves reproducibility.

## **Connecting to W&B (Weights and Biases)**

Weights and Biases (W&B) is a collaborative platform designed to assist machine learning practitioners in tracking, visualizing, and analyzing experiments. It offers robust tools for experiment management, performance visualization, and team collaboration. Key features include:

* Experiment Tracking:
W&B enables logging of parameters, metrics, and artifacts for machine learning experiments, such as hyperparameters, model architectures, training metrics, evaluation results, and visualizations.

* Visualization:
Create detailed visualizations like charts, graphs, and plots to analyze experiment results and observe how parameter changes affect model performance over time.

* Hyperparameter Sweeps:
Automate hyperparameter tuning by defining ranges of values. W&B conducts multiple experiments with different combinations, helping identify the best hyperparameter configuration.

* Collaboration and Reproducibility:
Share and reproduce experiment results easily within teams, ensuring that all members can replicate and understand each other's work.

* Model Registry:
Use W&B’s model registry to version, organize, and compare trained models, facilitating consistent tracking and deployment.

* Integrations:
W&B integrates seamlessly with popular machine learning frameworks like TensorFlow, PyTorch, and Scikit-learn, making it adaptable to various workflows.
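As a rough illustration of the sweeps feature described in the list above, a sweep is defined by a configuration dictionary with a search `method`, a `metric` to optimize, and a `parameters` search space. The parameter names and values below are hypothetical choices for a random forest, and the training function `train` mentioned in the comments is not defined here; this is a sketch of the configuration shape, not the notebook's actual sweep.

```python
from itertools import product

# Hypothetical search space for a random forest; values are illustrative
sweep_config = {
    "method": "random",  # "grid" tries every combination instead
    "metric": {"name": "f1_score", "goal": "maximize"},
    "parameters": {
        "n_estimators": {"values": [20, 50, 100]},
        "max_depth": {"values": [5, 10, 20]},
        "min_samples_split": {"values": [2, 8]},
    },
}

# Size of the full grid: the number of runs a grid search would need
grid = list(product(*(p["values"] for p in sweep_config["parameters"].values())))
print(len(grid))

# With wandb installed and a training function `train` defined (not shown here),
# the sweep would be started roughly like this:
# sweep_id = wandb.sweep(sweep_config, project="Truck Delay Classification")
# wandb.agent(sweep_id, function=train, count=10)
```

Grid search evaluates every combination in the space, while random search samples a fixed number of combinations (here capped by `count`), which is usually cheaper when the grid is large.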

In [49]:
pip install wandb

Note: you may need to restart the kernel to use updated packages.


In [50]:
# Import Libraries
import wandb
import joblib
import os

In [83]:
wandb.login()

True

In [52]:
USER_NAME = "ahujavaibhav2001-st-lawrence-college-org"

PROJECT_NAME = "Truck Delay Classification"

### Classification Evaluation Metrics

In [54]:
# Importing training libraries and evaluation metrics

from sklearn.metrics import f1_score, recall_score, confusion_matrix, roc_auc_score

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from xgboost import XGBClassifier

from sklearn.model_selection import GridSearchCV

In [55]:
# Evaluation function
# Columns needed to compare metrics
comparison_columns = ['Model_Name', 'Train_F1score', 'Train_Recall', 'Valid_F1score', 'Valid_Recall', 'Test_F1score', 'Test_Recall']

comparison_df = pd.DataFrame()



def evaluate_models(model_name, model_defined_var, X_train, y_train, X_valid, y_valid, X_test, y_test):
  ''' This function predicts and evaluates various models for classification'''

  # train predictions
  y_train_pred = model_defined_var.predict(X_train)
  # train performance
  train_f1_score = f1_score(y_train, y_train_pred)
  train_recall = recall_score(y_train, y_train_pred)

  # validation predictions
  y_valid_pred = model_defined_var.predict(X_valid)
  # validation performance
  valid_f1_score = f1_score(y_valid, y_valid_pred)
  valid_recall = recall_score(y_valid, y_valid_pred)

  # test predictions
  y_pred = model_defined_var.predict(X_test)
  # test performance
  test_f1_score = f1_score(y_test, y_pred)
  test_recall = recall_score(y_test, y_pred)

  # Printing performance
  print("Train Results")
  print(f'F1 Score: {train_f1_score}')
  print(f'Recall Score: {train_recall}')
  print(f'Confusion Matrix: \n{confusion_matrix(y_train, y_train_pred)}')
  print(f'Area Under Curve: {roc_auc_score(y_train, y_train_pred)}')

  print(" ")

  print("Validation Results")
  print(f'F1 Score: {valid_f1_score}')
  print(f'Recall Score: {valid_recall}')
  print(f'Confusion Matrix: \n{confusion_matrix(y_valid, y_valid_pred)}')
  print(f'Area Under Curve: {roc_auc_score(y_valid, y_valid_pred)}')

  print(" ")

  print("Test Results")
  print(f'F1 Score: {test_f1_score}')
  print(f'Recall Score: {test_recall}')
  print(f'Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
  print(f'Area Under Curve: {roc_auc_score(y_test, y_pred)}')

  # Saving our results
  global comparison_columns
  metric_scores = [model_name, train_f1_score, train_recall, valid_f1_score, valid_recall, test_f1_score, test_recall]
  final_dict = dict(zip(comparison_columns, metric_scores))
  return final_dict


final_list = []
def add_dic_to_final_df(final_dict):
  global final_list
  final_list.append(final_dict)
  global comparison_df
  comparison_df = pd.DataFrame(final_list, columns=comparison_columns)
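One detail worth knowing about the "Area Under Curve" lines printed by `evaluate_models`: `roc_auc_score` is called on hard 0/1 predictions, not on predicted probabilities. For binary labels that value equals balanced accuracy (the mean of recall and specificity). The sketch below, on synthetic data with illustrative names, compares it with the threshold-independent ROC AUC computed from `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Synthetic, imbalanced binary data standing in for the truck-delay features
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X)              # hard 0/1 predictions
scores = model.predict_proba(X)[:, 1]  # probability of the positive class

auc_from_labels = roc_auc_score(y, labels)  # what evaluate_models prints
auc_from_scores = roc_auc_score(y, scores)  # threshold-independent ROC AUC

print(round(auc_from_labels, 4), round(auc_from_scores, 4))
# For binary hard labels, the "AUC" is the same number as balanced accuracy
print(round(balanced_accuracy_score(y, labels), 4))
```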


### Logistic Regression

In [57]:
y_train.value_counts().to_dict()[1]

2651

In [58]:
weights = len(X_train)/(2*(y_train.value_counts().to_dict()[0])), len(X_train)/(2*(y_train.value_counts().to_dict()[1]))
weights

(0.7387858043595749, 1.5469634100339495)
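The manual formula above, `n_samples / (n_classes * class_count)`, is the same heuristic scikit-learn applies for `class_weight='balanced'`. The sketch below rebuilds a label vector with class counts consistent with the outputs in this notebook (5551 on-time and 2651 delayed training rows) and checks that both approaches agree.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Rebuild a label vector with the training class counts: 5551 zeros, 2651 ones
y = np.array([0] * 5551 + [1] * 2651)

# Manual formula from the cell above: n_samples / (n_classes * class_count)
manual = (len(y) / (2 * (y == 0).sum()), len(y) / (2 * (y == 1).sum()))

# scikit-learn's "balanced" heuristic uses the same formula
balanced = compute_class_weight(class_weight="balanced",
                                classes=np.array([0, 1]), y=y)

print(manual)
print(balanced)
```

Passing `class_weight='balanced'` directly to the estimator should therefore give the same weighting without computing the tuple by hand.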

In [59]:
# Define model
log_reg = LogisticRegression(random_state=13, class_weight={0:weights[0], 1:weights[1]})
# fit it
log_reg.fit(X_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
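The message above means the solver stopped at its iteration cap (the scikit-learn default is `max_iter=100`) before converging. A minimal sketch of one remedy, raising `max_iter`, is shown here on synthetic, scaled data; the dataset and the value `1000` are assumptions for illustration, and the right cap for the truck data may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training features
X, y = make_classification(n_samples=1000, n_features=20, random_state=13)
X = StandardScaler().fit_transform(X)

# Default max_iter is 100; a larger cap gives the solver room to converge
model = LogisticRegression(random_state=13, max_iter=1000)
model.fit(X, y)

# n_iter_ holds the iterations actually used; below the cap means it converged
print(model.n_iter_[0], "iterations used out of", model.max_iter)
```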


In [60]:
logistic_results = evaluate_models("Logistic Regression", log_reg, X_train, y_train, X_valid, y_valid, X_test, y_test)

add_dic_to_final_df(logistic_results)

Train Results
F1 Score: 0.5748006379585326
Recall Score: 0.6797434930215013
Confusion Matrix: 
[[3734 1817]
 [ 849 1802]]
Area Under Curve: 0.6762075418629394
 
Validation Results
F1 Score: 0.6087408949011447
Recall Score: 0.6956004756242569
Confusion Matrix: 
[[999 496]
 [256 585]]
Area Under Curve: 0.6819139501867103
 
Test Results
F1 Score: 0.7251675807434491
Recall Score: 0.7428214731585518
Confusion Matrix: 
[[720 245]
 [206 595]]
Area Under Curve: 0.7444677313979288
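As a sanity check, the test metrics above can be reproduced by hand from the printed confusion matrix, whose rows are actual classes and whose columns are predicted classes. The variable names are illustrative.

```python
# Test confusion matrix printed above: rows are actual, columns are predicted
tn, fp = 720, 245
fn, tp = 206, 595

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# With hard labels, roc_auc_score reduces to the mean of recall and specificity
specificity = tn / (tn + fp)
auc_from_labels = (recall + specificity) / 2

print(round(f1, 4), round(recall, 4), round(auc_from_labels, 4))
# 0.7252 0.7428 0.7445
```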


### Logistic Regression - Experiment Tracking

In [150]:
import joblib

w = {0: weights[0], 1: weights[1]}

def train_logistic_model(X_train=X_train, y_train=y_train, X_valid=X_valid, y_valid=y_valid, X_test=X_test, y_test=y_test):
    features = X_train.columns

    with wandb.init(project=PROJECT_NAME) as run:
        config = wandb.config
        params = {"random_state": 13, "class_weight": w}

        model = LogisticRegression(**params)

        model.fit(X_train, y_train)
        
        # train predictions
        y_train_pred = model.predict(X_train)
        # train performance
        train_f1_score = f1_score(y_train, y_train_pred)


        # validation predictions
        y_valid_pred = model.predict(X_valid)
        # validation performance
        valid_f1_score = f1_score(y_valid, y_valid_pred)

        
        # test predictions
        y_preds = model.predict(X_test)
        y_probas = model.predict_proba(X_test)

        score = f1_score(y_test, y_preds)
        print(f"F1_score Train: {round(train_f1_score, 4)}")
        print(f"F1_score Valid: {round(valid_f1_score, 4)}")
        print(f"F1_score Test: {round(score, 4)}")


        wandb.log({"f1_score_train": train_f1_score})
        wandb.log({"f1_score_valid": valid_f1_score})
        wandb.log({"f1_score": score})

        wandb.sklearn.plot_classifier(model, X_train, X_test, y_train, y_test,
                                            y_preds, y_probas, labels= None, model_name='LogisticRegression', feature_names=features)

        model_artifact = wandb.Artifact(
                    "LogisticRegression", type="model",metadata=dict(config))

        joblib.dump(model, "log-truck-model.pkl")
        model_artifact.add_file("log-truck-model.pkl")

        # Log the artifact to wandb
        run.log_artifact(model_artifact)

In [152]:
train_logistic_model(X_train, y_train,X_valid, y_valid, X_test, y_test)



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


F1_score Train: 0.5748
F1_score Valid: 0.6087
F1_score Test: 0.7252


wandb: 
wandb: Plotting LogisticRegression.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.72517
f1_score_train,0.5748
f1_score_valid,0.60874


### Random Forest

In [91]:
# define model
w = {0: weights[0], 1: weights[1]}
random_f = RandomForestClassifier(n_estimators=20, class_weight=w, random_state=7)
random_f.fit(X_train, y_train)

randomf_results = evaluate_models("Random Forest", random_f,X_train, y_train, X_valid, y_valid, X_test, y_test)
add_dic_to_final_df(randomf_results)

Train Results
F1 Score: 0.9925897776933308
Recall Score: 0.985288570350811
Confusion Matrix: 
[[5551    0]
 [  39 2612]]
Area Under Curve: 0.9926442851754055
 
Validation Results
F1 Score: 0.525984251968504
Recall Score: 0.3971462544589774
Confusion Matrix: 
[[1400   95]
 [ 507  334]]
Area Under Curve: 0.6668005519786526
 
Test Results
F1 Score: 0.6245059288537549
Recall Score: 0.49313358302122345
Confusion Matrix: 
[[896  69]
 [406 395]]
Area Under Curve: 0.7108154961738242
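
The scores printed above can be recomputed by hand from the confusion matrices, which makes it easier to see why F1 drops so sharply outside the training set. The sketch below is a minimal, standard-library illustration; `metrics_from_confusion` is a hypothetical helper, not part of the notebook's `evaluate_models`. It assumes the scikit-learn layout `[[TN, FP], [FN, TP]]`. The reported "Area Under Curve" values are numerically consistent with an AUC computed from hard class predictions, which for a binary classifier equals the mean of recall and specificity; that is an inference from the numbers, not from the `evaluate_models` source.

```python
def metrics_from_confusion(cm):
    """Return precision, recall, F1 and balanced accuracy for a 2x2 matrix.

    Rows are true classes and columns are predicted classes, i.e. the
    layout [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # share of delayed trips that were caught
    specificity = tn / (tn + fp)       # share of on-time trips kept as on-time
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Random Forest validation confusion matrix, copied from the output above.
rf_valid = metrics_from_confusion([[1400, 95], [507, 334]])
for name, value in rf_valid.items():
    print(f"{name}: {value:.4f}")
```

For the validation matrix this reproduces the reported F1 (about 0.526) and recall (about 0.397): the model predicts few false delays, but it misses 507 of the 841 delayed trips.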


In [166]:


def train_random_forest(X_train=X_train, y_train=y_train, X_valid=X_valid, y_valid=y_valid, X_test=X_test, y_test=y_test):
  features = X_train.columns
  labels=["delay"]

  with wandb.init(project=PROJECT_NAME) as run:
      config = wandb.config

      rand_model = RandomForestClassifier(n_estimators=20, class_weight=w, random_state=7)

      rand_model.fit(X_train, y_train)
      # train predictions
      y_train_pred = rand_model.predict(X_train)
      # train performance
      train_f1_score = f1_score(y_train, y_train_pred)


      # validation predictions
      y_valid_pred = rand_model.predict(X_valid)
      # validation performance
      valid_f1_score = f1_score(y_valid, y_valid_pred)


      # test predictions
      y_preds = rand_model.predict(X_test)
      y_probas = rand_model.predict_proba(X_test)

      score = f1_score(y_test, y_preds)
      print(f"F1_score Train: {round(train_f1_score, 4)}")
      print(f"F1_score Valid: {round(valid_f1_score, 4)}")
      print(f"F1_score Test: {round(score, 4)}")


      wandb.log({"f1_score_train": train_f1_score})
      wandb.log({"f1_score_valid": valid_f1_score})
      wandb.log({"f1_score": score})



      wandb.sklearn.plot_classifier(rand_model, X_train, X_test, y_train, y_test, y_preds, y_probas, labels=None,
                                                          model_name='RandomForestClassifier', feature_names=features)

      model_artifact = wandb.Artifact(
                  "RandomForestClassifier", type="model",metadata=dict(config))

      joblib.dump(rand_model, "rand-truck-model.pkl")
      model_artifact.add_file("rand-truck-model.pkl")

      # Log the artifact to wandb
      run.log_artifact(model_artifact)

In [168]:
train_random_forest(X_train, y_train,X_valid, y_valid, X_test, y_test)



F1_score Train: 0.9926
F1_score Valid: 0.526
F1_score Test: 0.6245


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.62451
f1_score_train,0.99259
f1_score_valid,0.52598


### XGBoost

In [104]:
# import xgboost
import xgboost as xgb

# Convert training, validation, and test sets to DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train initial model
params = {'objective': 'multi:softmax', 'num_class': 2, 'seed': 7}
num_rounds = 30
xgbmodel = xgb.train(params, dtrain, num_rounds, evals=[(dvalid, 'validation')], early_stopping_rounds=10)

xgb_results = evaluate_models("XGB", xgbmodel, dtrain, y_train, dvalid, y_valid, dtest, y_test)
add_dic_to_final_df(xgb_results)

[0]	validation-mlogloss:0.61447
[1]	validation-mlogloss:0.57315
[2]	validation-mlogloss:0.54766
[3]	validation-mlogloss:0.53639
[4]	validation-mlogloss:0.53454
[5]	validation-mlogloss:0.52965
[6]	validation-mlogloss:0.52972
[7]	validation-mlogloss:0.53009
[8]	validation-mlogloss:0.53387
[9]	validation-mlogloss:0.53433
[10]	validation-mlogloss:0.53538
[11]	validation-mlogloss:0.53662
[12]	validation-mlogloss:0.53636
[13]	validation-mlogloss:0.53512
[14]	validation-mlogloss:0.53708
Train Results
F1 Score: 0.7110494683310218
Recall Score: 0.5801584307808374
Confusion Matrix: 
[[5414  137]
 [1113 1538]]
Area Under Curve: 0.7777390964929227
 
Validation Results
F1 Score: 0.5916542473919523
Recall Score: 0.47205707491082044
Confusion Matrix: 
[[1391  104]
 [ 444  397]]
Area Under Curve: 0.701245928759758
 
Test Results
F1 Score: 0.6825633383010432
Recall Score: 0.5717852684144819
Confusion Matrix: 
[[882  83]
 [343 458]]
Area Under Curve: 0.7428874528600907
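
With `early_stopping_rounds=10`, training is meant to halt once the validation metric has gone that many rounds without improving, which is why the log ends well short of the requested 30 rounds. The sketch below simply reads the logged values to find the best round. It is an illustration using only the numbers printed above and makes no assumption about XGBoost's internal bookkeeping.

```python
# Validation mlogloss per boosting round, copied from the training log above.
valid_mlogloss = [
    0.61447, 0.57315, 0.54766, 0.53639, 0.53454,
    0.52965, 0.52972, 0.53009, 0.53387, 0.53433,
    0.53538, 0.53662, 0.53636, 0.53512, 0.53708,
]

# Index of the lowest validation loss (lower mlogloss is better).
best_round = min(range(len(valid_mlogloss)), key=valid_mlogloss.__getitem__)
best_loss = valid_mlogloss[best_round]

# Number of logged rounds after the best one, none of which improved on it.
rounds_without_improvement = len(valid_mlogloss) - 1 - best_round

print(f"Best round: {best_round} (mlogloss {best_loss})")
print(f"Logged rounds since the best score: {rounds_without_improvement}")
```

On a `Booster` trained with early stopping, the same index is available as `xgbmodel.best_iteration`.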


In [142]:
import joblib
import xgboost as xgb

def train_xgb_model(X_train=X_train, y_train=y_train, X_valid=X_valid, y_valid=y_valid, X_test=X_test, y_test=y_test):
  features = X_train.columns
  labels=["delay"]

  with wandb.init(project=PROJECT_NAME) as run:
      config = wandb.config


      # Convert training, validation, and test sets to DMatrix
      dtrain = xgb.DMatrix(X_train, label=y_train)
      dvalid = xgb.DMatrix(X_valid, label=y_valid)
      dtest = xgb.DMatrix(X_test, label=y_test)

      # Train initial model
      params = {'objective': 'multi:softmax', 'num_class': 2, 'seed': 7}  # same seed as the earlier XGBoost cell
      num_rounds = 30
      xgbmodel = xgb.train(params, dtrain, num_rounds, evals=[(dvalid, 'validation')], early_stopping_rounds=10)
        
      # train predictions
      y_train_pred = xgbmodel.predict(dtrain)
      # train performance
      train_f1_score = f1_score(y_train, y_train_pred)


      # validation predictions
      y_valid_pred = xgbmodel.predict(dvalid)
      # validation performance
      valid_f1_score = f1_score(y_valid, y_valid_pred)


      # test predictions
      y_preds = xgbmodel.predict(dtest)
      score = f1_score(y_test, y_preds)
      print(f"F1_score Train: {round(train_f1_score, 4)}")
      print(f"F1_score Valid: {round(valid_f1_score, 4)}")
      print(f"F1_score Test: {round(score, 4)}")


      wandb.log({"f1_score_train": train_f1_score})
      wandb.log({"f1_score_valid": valid_f1_score})
      wandb.log({"f1_score": score})


      model_artifact = wandb.Artifact(
                  "XGBoost", type="model",metadata=dict(config))

      joblib.dump(xgbmodel, "xgb-truck-model.pkl")
      model_artifact.add_file("xgb-truck-model.pkl")

      # Log the artifact to wandb
      run.log_artifact(model_artifact)

In [144]:
train_xgb_model(X_train, y_train,X_valid, y_valid, X_test, y_test)



[0]	validation-mlogloss:0.61447
[1]	validation-mlogloss:0.57315
[2]	validation-mlogloss:0.54766
[3]	validation-mlogloss:0.53639
[4]	validation-mlogloss:0.53454
[5]	validation-mlogloss:0.52965
[6]	validation-mlogloss:0.52972
[7]	validation-mlogloss:0.53009
[8]	validation-mlogloss:0.53387
[9]	validation-mlogloss:0.53433
[10]	validation-mlogloss:0.53538
[11]	validation-mlogloss:0.53662
[12]	validation-mlogloss:0.53636
[13]	validation-mlogloss:0.53512
[14]	validation-mlogloss:0.53708
F1_score Train: 0.711
F1_score Valid: 0.5917
F1_score Test: 0.6826


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.68256
f1_score_train,0.71105
f1_score_valid,0.59165


In [122]:
comparison_df

| | Model_Name | Train_F1score | Train_Recall | Valid_F1score | Valid_Recall | Test_F1score | Test_Recall |
|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.574801 | 0.679743 | 0.608741 | 0.6956 | 0.725168 | 0.742821 |
| 1 | Random Forest | 0.99259 | 0.985289 | 0.525984 | 0.397146 | 0.624506 | 0.493134 |
| 2 | XGB | 0.711049 | 0.580158 | 0.591654 | 0.472057 | 0.682563 | 0.571785 |


### **Model Building Summary**

Logistic Regression:

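* Training F1 (0.57) and recall (0.68) are relatively low, suggesting the model underfits the training data. The convergence warning above also shows that the solver stopped before converging.
* Validation (F1 0.61, recall 0.70) and test (F1 0.73, recall 0.74) scores are higher than the training scores. Consistency across splits matters more than any single high score, so this gap deserves a closer look before relying on the model.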

Random Forest:

* Training F1 (0.99) and recall (0.99) are very high, which points to overfitting.
* On the validation set, F1 drops to 0.53 and recall to 0.40, far below the training scores and the lowest of the three models.
* Test F1 (0.62) and recall (0.49) are better than on validation but still the lowest of the three models.
* To address the overfitting, we can reduce the complexity of the Random Forest model, for example by limiting the depth of the trees.

XGBoost:

* Training F1 (0.71) and recall (0.58) are moderate, indicating a reasonable fit without the memorization seen in the Random Forest.
* Validation F1 (0.59) is reasonable, but recall (0.47) shows the model still misses more than half of the delayed trips.
* Test F1 (0.68) and recall (0.57) are above the validation scores, suggesting decent generalization to unseen data.
* Of the two tree-based models, XGBoost seems the more promising; hyperparameter tuning may improve its performance further.
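
One way to make the overfitting comparison concrete is to look at the gap between training and validation F1 for each model. The sketch below is a minimal illustration that hard-codes the F1 values shown in `comparison_df`; the dictionary layout and variable names are assumptions made for this example.

```python
# F1 scores copied from comparison_df above (as displayed).
f1_scores = {
    "Logistic Regression": {"train": 0.574801, "valid": 0.608741, "test": 0.725168},
    "Random Forest": {"train": 0.992590, "valid": 0.525984, "test": 0.624506},
    "XGB": {"train": 0.711049, "valid": 0.591654, "test": 0.682563},
}

# A large positive gap means the model scores much better on data it has seen.
gaps = {
    model: round(scores["train"] - scores["valid"], 4)
    for model, scores in f1_scores.items()
}
most_overfit = max(gaps, key=gaps.get)

for model, gap in gaps.items():
    print(f"{model:<20} train - valid F1 gap: {gap:+.4f}")
print("Largest gap:", most_overfit)
```

The Random Forest shows by far the largest gap, while Logistic Regression has a slightly negative one because it scores higher on validation than on training.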

### Hyperparameter Sweeps
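
This section tunes the Random Forest with a W&B grid sweep. A grid search evaluates every combination of the listed values, so the number of runs is the product of the list lengths. The sketch below enumerates the grid defined in `sweep_configs` further down using only the standard library. The iteration order is chosen to mirror the order of the logged runs and is not a statement about how W&B schedules them.

```python
from itertools import product

# Parameter values copied from sweep_configs in this section.
grid = {
    "max_depth": [None, 5, 10, 15, 20],
    "min_samples_split": [2, 4, 8, 12],
    "n_estimators": [8, 12, 16, 20],
}

# product() varies the last parameter fastest, so n_estimators cycles first.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(f"Total runs in the grid: {len(combinations)}")
for combo in combinations[:5]:
    print(combo)
```

With 5 × 4 × 4 values the grid contains 80 runs. A random search would instead sample a fixed number of combinations from the same space, which is useful when the full grid is too expensive.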

In [125]:
import joblib
w = {0: weights[0], 1: weights[1]}
def train_rf_model(X_train=X_train, y_train=y_train, X_valid=X_valid,y_valid=y_valid, X_test=X_test, y_test=y_test):
    features = X_train.columns

    with wandb.init(project=PROJECT_NAME) as run:
        config = wandb.config

        model = RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            min_samples_split=config["min_samples_split"],
            random_state=7,
            class_weight=w
        )
        model.fit(X_train, y_train)
        
        # train predictions
        y_train_pred = model.predict(X_train)
        # train performance
        train_f1_score = f1_score(y_train, y_train_pred)
        

        # validation predictions
        y_valid_pred = model.predict(X_valid)
        # validation performance
        valid_f1_score = f1_score(y_valid, y_valid_pred)
      

        y_preds = model.predict(X_test)
        y_probas = model.predict_proba(X_test)

        score = f1_score(y_test, y_preds)
        print(f"F1_score Train: {round(train_f1_score, 4)}")
        print(f"F1_score Valid: {round(valid_f1_score, 4)}")
        print(f"F1_score Test: {round(score, 4)}")


        wandb.log({"f1_score_train": train_f1_score})
        wandb.log({"f1_score_valid": valid_f1_score})
        wandb.log({"f1_score": score})

        wandb.sklearn.plot_classifier(model, X_train, X_test, y_train, y_test, y_preds, y_probas, labels=None,
                                                          model_name='RandomForestClassifier', feature_names=features)

        model_artifact = wandb.Artifact(
            "RandomForestClassifier", type="model",metadata=dict(config))
        joblib.dump(model, "random_f_tuned.pkl")
        model_artifact.add_file("random_f_tuned.pkl")
        run.log_artifact(model_artifact)

In [127]:
random_f.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': {0: 0.7387858043595749, 1: 1.5469634100339495},
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 20,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 7,
 'verbose': 0,
 'warm_start': False}
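
The `class_weight` values in the output above are consistent with the common "balanced" heuristic, where each class is weighted by `n_samples / (n_classes * class_count)`. The sketch below is a hedged illustration: `balanced_class_weights` is a hypothetical helper, and the class counts are read off the rows of the training confusion matrix printed earlier rather than taken from the cell that actually computed `weights`.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Training-set class counts from the training confusion matrix:
# 5551 on-time trips (class 0) and 39 + 2612 = 2651 delayed trips (class 1).
train_counts = {0: 5551, 1: 2651}
weights_sketch = balanced_class_weights(train_counts)
print(weights_sketch)
```

The minority class (delayed trips) receives roughly twice the weight of the majority class, so misclassifying a delay costs more during training.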

In [129]:
sweep_configs = {
    "method": "grid",
    "metric": {
        # Rank runs by validation F1; "f1_score" is the held-out test score.
        "name": "f1_score_valid",
        "goal": "maximize"
    },
    "parameters": {
        "n_estimators": {
            "values": [8, 12, 16, 20]
        },
        "max_depth": {
            "values": [None, 5, 10, 15, 20]
        },
        "min_samples_split": {
            "values": [2, 4, 8, 12]
        }
    }
}
# Then we initialize the sweep and run the sweep agent.

sweep_id = wandb.sweep(
    sweep=sweep_configs,
    project=PROJECT_NAME
)



wandb.agent(
    project=PROJECT_NAME,
    sweep_id=sweep_id,
    function=train_rf_model
)

Create sweep with ID: 9mnowi7x
Sweep URL: https://wandb.ai/ahujavaibhav2001-st-lawrence-college/Truck%20Delay%20Classification/sweeps/9mnowi7x


wandb: Agent Starting Run: 0mvgc8vn with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 8


F1_score Train: 0.9587
F1_score Valid: 0.5023
F1_score Test: 0.5928


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.59277
f1_score_train,0.95865
f1_score_valid,0.50233


wandb: Agent Starting Run: y8wlz40v with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 12


F1_score Train: 0.9765
F1_score Valid: 0.5313
F1_score Test: 0.622


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.62205
f1_score_train,0.97646
f1_score_valid,0.53135


wandb: Agent Starting Run: 7smg0nv7 with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 16


F1_score Train: 0.9872
F1_score Valid: 0.5182
F1_score Test: 0.6144


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.61441
f1_score_train,0.9872
f1_score_valid,0.51817


wandb: Agent Starting Run: hnt0d1aj with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 20


F1_score Train: 0.9926
F1_score Valid: 0.526
F1_score Test: 0.6245


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.62451
f1_score_train,0.99259
f1_score_valid,0.52598


wandb: Agent Starting Run: 46t64y1y with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 8


F1_score Train: 0.9528
F1_score Valid: 0.5234
F1_score Test: 0.5981


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.59813
f1_score_train,0.95279
f1_score_valid,0.52345


wandb: Agent Starting Run: 3dyvw5os with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 12


F1_score Train: 0.9686
F1_score Valid: 0.5628
F1_score Test: 0.638


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.63804
f1_score_train,0.96858
f1_score_valid,0.56278


wandb: Agent Starting Run: ary915c5 with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 16


F1_score Train: 0.9807
F1_score Valid: 0.5811
F1_score Test: 0.6539


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.65393
f1_score_train,0.98066
f1_score_valid,0.58108


wandb: Agent Starting Run: 42niig9g with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 20


F1_score Train: 0.9886
F1_score Valid: 0.5816
F1_score Test: 0.6641


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.66412
f1_score_train,0.98859
f1_score_valid,0.58157


wandb: Agent Starting Run: fffub3o0 with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 8


F1_score Train: 0.9273
F1_score Valid: 0.5517
F1_score Test: 0.6738


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.67376
f1_score_train,0.92731
f1_score_valid,0.55168


wandb: Agent Starting Run: 70onpre9 with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 12


F1_score Train: 0.9502
F1_score Valid: 0.5776
F1_score Test: 0.6812


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.68119
f1_score_train,0.95021
f1_score_valid,0.57756


wandb: Agent Starting Run: 82ny0ukq with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 16


F1_score Train: 0.9585
F1_score Valid: 0.5949
F1_score Test: 0.6965


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69647
f1_score_train,0.95848
f1_score_valid,0.59495


wandb: Agent Starting Run: 2oqeunna with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 20


F1_score Train: 0.9665
F1_score Valid: 0.6103
F1_score Test: 0.6976


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69761
f1_score_train,0.96646
f1_score_valid,0.61032


wandb: Agent Starting Run: 6zlcqz86 with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 8


F1_score Train: 0.8946
F1_score Valid: 0.6086
F1_score Test: 0.6764


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.67641
f1_score_train,0.89463
f1_score_valid,0.60864


wandb: Agent Starting Run: n927x0hg with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 12


F1_score Train: 0.9097
F1_score Valid: 0.6214
F1_score Test: 0.6903


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69034
f1_score_train,0.90973
f1_score_valid,0.62144


wandb: Agent Starting Run: 4qoekrjt with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 16


F1_score Train: 0.9156
F1_score Valid: 0.6284
F1_score Test: 0.7087


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70872
f1_score_train,0.91561
f1_score_valid,0.62842


wandb: Agent Starting Run: 977m2ybo with config:
wandb: 	max_depth: None
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 20


F1_score Train: 0.9239
F1_score Valid: 0.6349
F1_score Test: 0.7085


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70848
f1_score_train,0.92385
f1_score_valid,0.63485


wandb: Agent Starting Run: ekibddlm with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 8


F1_score Train: 0.584
F1_score Valid: 0.6621
F1_score Test: 0.7371


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.73711
f1_score_train,0.58399
f1_score_valid,0.66215


wandb: Agent Starting Run: dlkxkgxq with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 12


F1_score Train: 0.5845
F1_score Valid: 0.6644
F1_score Test: 0.7396


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.7396
f1_score_train,0.58455
f1_score_valid,0.66439


wandb: Agent Starting Run: 5xzwufxf with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 16


F1_score Train: 0.5846
F1_score Valid: 0.6644
F1_score Test: 0.742


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.74198
f1_score_train,0.58462
f1_score_valid,0.66441


wandb: Agent Starting Run: xrqdlx6a with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 20


F1_score Train: 0.5801
F1_score Valid: 0.6659
F1_score Test: 0.7495


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.74953
f1_score_train,0.58014
f1_score_valid,0.66592


wandb: Agent Starting Run: ocqt31hk with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 8


F1_score Train: 0.584
F1_score Valid: 0.6621
F1_score Test: 0.7371


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.73711
f1_score_train,0.58399
f1_score_valid,0.66215


wandb: Agent Starting Run: nkgqsl69 with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 12


F1_score Train: 0.5845
F1_score Valid: 0.6644
F1_score Test: 0.7396


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7396
f1_score_train,0.58455
f1_score_valid,0.66439


wandb: Agent Starting Run: q0g1qj5q with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 16


F1_score Train: 0.5833
F1_score Valid: 0.6667
F1_score Test: 0.746


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.746
f1_score_train,0.58335
f1_score_valid,0.66667


wandb: Agent Starting Run: dzvqyebo with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 20


F1_score Train: 0.5781
F1_score Valid: 0.6644
F1_score Test: 0.7503


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75031
f1_score_train,0.57814
f1_score_valid,0.66443


wandb: Agent Starting Run: 312r7m4i with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 8


F1_score Train: 0.5718
F1_score Valid: 0.6637
F1_score Test: 0.7588


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75883
f1_score_train,0.5718
f1_score_valid,0.66372


wandb: Agent Starting Run: 6ql1nq9b with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 12


F1_score Train: 0.5736
F1_score Valid: 0.6622
F1_score Test: 0.7646


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76463
f1_score_train,0.5736
f1_score_valid,0.6622


wandb: Agent Starting Run: v1ywre3x with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 16


F1_score Train: 0.5749
F1_score Valid: 0.6644
F1_score Test: 0.7641


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76406
f1_score_train,0.57485
f1_score_valid,0.66444


wandb: Agent Starting Run: c272bs20 with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 20


F1_score Train: 0.5688
F1_score Valid: 0.6648
F1_score Test: 0.7606


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76063
f1_score_train,0.56878
f1_score_valid,0.66482


wandb: Agent Starting Run: en3dovn2 with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 8


F1_score Train: 0.5716
F1_score Valid: 0.6645
F1_score Test: 0.7596


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75959
f1_score_train,0.57161
f1_score_valid,0.66445


wandb: Agent Starting Run: 17i9aft2 with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 12


F1_score Train: 0.5732
F1_score Valid: 0.6663
F1_score Test: 0.7641


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76406
f1_score_train,0.57318
f1_score_valid,0.66629


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: r79ao9je with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 16


F1_score Train: 0.5768
F1_score Valid: 0.6655
F1_score Test: 0.7648


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76481
f1_score_train,0.57679
f1_score_valid,0.66555


wandb: Agent Starting Run: 6f82axmm with config:
wandb: 	max_depth: 5
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 20


F1_score Train: 0.5694
F1_score Valid: 0.6644
F1_score Test: 0.7623


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76231
f1_score_train,0.56941
f1_score_valid,0.66445


wandb: Agent Starting Run: h25wwl1x with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 8


F1_score Train: 0.6871
F1_score Valid: 0.6507
F1_score Test: 0.7337


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.73373
f1_score_train,0.68709
f1_score_valid,0.65066


wandb: Agent Starting Run: dcefzft2 with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 12


F1_score Train: 0.692
F1_score Valid: 0.6546
F1_score Test: 0.7497


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.74967
f1_score_train,0.69204
f1_score_valid,0.65462


wandb: Agent Starting Run: 1zqrkaba with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 16


F1_score Train: 0.7008
F1_score Valid: 0.6529
F1_score Test: 0.7533


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75333
f1_score_train,0.70085
f1_score_valid,0.6529


wandb: Agent Starting Run: 9xpluibx with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 20


F1_score Train: 0.7007
F1_score Valid: 0.662
F1_score Test: 0.7475


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7475
f1_score_train,0.70067
f1_score_valid,0.66202


wandb: Agent Starting Run: hn83exbk with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 8


F1_score Train: 0.6729
F1_score Valid: 0.6359
F1_score Test: 0.7293


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.72926
f1_score_train,0.67295
f1_score_valid,0.63586


wandb: Agent Starting Run: vgx2aepp with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 12


F1_score Train: 0.6813
F1_score Valid: 0.6556
F1_score Test: 0.7505


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75049
f1_score_train,0.68127
f1_score_valid,0.65562


wandb: Agent Starting Run: txd4zsvm with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 16


F1_score Train: 0.6958
F1_score Valid: 0.6524
F1_score Test: 0.7541


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7541
f1_score_train,0.69584
f1_score_valid,0.65245


wandb: Agent Starting Run: p7fuja46 with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 20


F1_score Train: 0.6915
F1_score Valid: 0.6521
F1_score Test: 0.7567


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75672
f1_score_train,0.69147
f1_score_valid,0.65207


wandb: Agent Starting Run: lph9bi7v with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 8


F1_score Train: 0.6842
F1_score Valid: 0.6154
F1_score Test: 0.7296


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.72962
f1_score_train,0.68417
f1_score_valid,0.61538


wandb: Agent Starting Run: 3smedy6x with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 12


F1_score Train: 0.6858
F1_score Valid: 0.6393
F1_score Test: 0.7294


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.72936
f1_score_train,0.68579
f1_score_valid,0.63934


wandb: Agent Starting Run: aiczqoyk with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 16


F1_score Train: 0.6938
F1_score Valid: 0.6569
F1_score Test: 0.7561


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75615
f1_score_train,0.69384
f1_score_valid,0.65694


wandb: Agent Starting Run: 3k3xfd7w with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 20


F1_score Train: 0.6858
F1_score Valid: 0.6496
F1_score Test: 0.7588


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.75885
f1_score_train,0.68576
f1_score_valid,0.64963


wandb: Agent Starting Run: 1elx8vnv with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 8


F1_score Train: 0.6657
F1_score Valid: 0.639
F1_score Test: 0.7352


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7352
f1_score_train,0.66565
f1_score_valid,0.63896


wandb: Agent Starting Run: 2be8y9tu with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 12


F1_score Train: 0.6756
F1_score Valid: 0.654
F1_score Test: 0.7425


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.74252
f1_score_train,0.67557
f1_score_valid,0.65396


wandb: Agent Starting Run: hptp8eiw with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 16


F1_score Train: 0.6749
F1_score Valid: 0.6483
F1_score Test: 0.7623


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76234
f1_score_train,0.67488
f1_score_valid,0.64834


wandb: Agent Starting Run: uqjisrx0 with config:
wandb: 	max_depth: 10
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 20


F1_score Train: 0.6728
F1_score Valid: 0.6506
F1_score Test: 0.7622


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.76215
f1_score_train,0.67281
f1_score_valid,0.65056


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: rbzihoj0 with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 12


F1_score Train: 0.8799
F1_score Valid: 0.5954
F1_score Test: 0.7048


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.70479
f1_score_train,0.87985
f1_score_valid,0.59542


wandb: Agent Starting Run: 6cmse08o with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 16


F1_score Train: 0.896
F1_score Valid: 0.5967
F1_score Test: 0.7221


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7221
f1_score_train,0.89604
f1_score_valid,0.59666


wandb: Agent Starting Run: n6olf1qv with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 20


F1_score Train: 0.8975
F1_score Valid: 0.5992
F1_score Test: 0.7222


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.72222
f1_score_train,0.89751
f1_score_valid,0.59916


wandb: Agent Starting Run: 9pr93eng with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 8


F1_score Train: 0.8388
F1_score Valid: 0.5394
F1_score Test: 0.6855


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.68551
f1_score_train,0.83881
f1_score_valid,0.53944


wandb: Agent Starting Run: vumj9m23 with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 12


F1_score Train: 0.8519
F1_score Valid: 0.5725
F1_score Test: 0.7069


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.70688
f1_score_train,0.8519
f1_score_valid,0.57245


wandb: Agent Starting Run: 12fnmmj6 with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 16


F1_score Train: 0.8624
F1_score Valid: 0.5994
F1_score Test: 0.7204


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


Run history:
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

Run summary:
f1_score,0.7204
f1_score_train,0.8624
f1_score_valid,0.59943


wandb: Agent Starting Run: 54zcxkj5 with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 20


F1_score Train: 0.8601
F1_score Valid: 0.6213
F1_score Test: 0.723


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.72304
f1_score_train,0.86012
f1_score_valid,0.62132


wandb: Agent Starting Run: umhk3sam with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 8


F1_score Train: 0.7989
F1_score Valid: 0.5889
F1_score Test: 0.6872


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.68724
f1_score_train,0.79886
f1_score_valid,0.58894


wandb: Agent Starting Run: l92le2ae with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 12


F1_score Train: 0.8182
F1_score Valid: 0.621
F1_score Test: 0.7126


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.71258
f1_score_train,0.81824
f1_score_valid,0.62097


wandb: Agent Starting Run: i17w6kzi with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 16


F1_score Train: 0.8232
F1_score Valid: 0.6255
F1_score Test: 0.7089


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70895
f1_score_train,0.82319
f1_score_valid,0.62551


wandb: Agent Starting Run: jxz61mxg with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 20


F1_score Train: 0.8271
F1_score Valid: 0.6236
F1_score Test: 0.7041


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70413
f1_score_train,0.82712
f1_score_valid,0.62355


wandb: Agent Starting Run: j1a6xhbn with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 8


F1_score Train: 0.7851
F1_score Valid: 0.6216
F1_score Test: 0.7011


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70113
f1_score_train,0.78511
f1_score_valid,0.62159


wandb: Agent Starting Run: xa1fjdzv with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 12


F1_score Train: 0.788
F1_score Valid: 0.6399
F1_score Test: 0.723


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.72299
f1_score_train,0.78798
f1_score_valid,0.63989


wandb: Agent Starting Run: q2udy4n8 with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 16


F1_score Train: 0.7946
F1_score Valid: 0.6396
F1_score Test: 0.7237


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.7237
f1_score_train,0.79458
f1_score_valid,0.63957


wandb: Agent Starting Run: 6lms02sy with config:
wandb: 	max_depth: 15
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 20


F1_score Train: 0.8002
F1_score Valid: 0.6386
F1_score Test: 0.7196


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.71961
f1_score_train,0.80025
f1_score_valid,0.63861


wandb: Agent Starting Run: 2phxpj7n with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 8


F1_score Train: 0.9382
F1_score Valid: 0.5175
F1_score Test: 0.606


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.60601
f1_score_train,0.93817
f1_score_valid,0.51752


wandb: Agent Starting Run: pja8ou2v with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 12


F1_score Train: 0.9516
F1_score Valid: 0.5526
F1_score Test: 0.6285


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.62848
f1_score_train,0.95156
f1_score_valid,0.55255


wandb: Agent Starting Run: uxergbcr with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 16


F1_score Train: 0.9641
F1_score Valid: 0.558
F1_score Test: 0.6591


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.65906
f1_score_train,0.9641
f1_score_valid,0.55797


wandb: Agent Starting Run: tg6289u3 with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 2
wandb: 	n_estimators: 20


F1_score Train: 0.9721
F1_score Valid: 0.5706
F1_score Test: 0.6621


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.66209
f1_score_train,0.97213
f1_score_valid,0.57057


wandb: Agent Starting Run: r24wokmt with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 8


F1_score Train: 0.9197
F1_score Valid: 0.5568
F1_score Test: 0.5902


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.59024
f1_score_train,0.91975
f1_score_valid,0.5568


wandb: Agent Starting Run: duied6bt with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 12


F1_score Train: 0.933
F1_score Valid: 0.5918
F1_score Test: 0.6409


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.64085
f1_score_train,0.93299
f1_score_valid,0.59182


wandb: Agent Starting Run: 5a11f8r7 with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 16


F1_score Train: 0.9482
F1_score Valid: 0.6011
F1_score Test: 0.6612


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.66116
f1_score_train,0.94821
f1_score_valid,0.60113


wandb: Agent Starting Run: ne9fsc95 with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 4
wandb: 	n_estimators: 20


F1_score Train: 0.9561
F1_score Valid: 0.6147
F1_score Test: 0.6751


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.67509
f1_score_train,0.95613
f1_score_valid,0.61473


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: 8b2bbhub with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 8


F1_score Train: 0.883
F1_score Valid: 0.5923
F1_score Test: 0.6585


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.65847
f1_score_train,0.88296
f1_score_valid,0.59229


wandb: Agent Starting Run: q8psr2ca with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 12


F1_score Train: 0.9004
F1_score Valid: 0.6129
F1_score Test: 0.6667


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.66667
f1_score_train,0.90038
f1_score_valid,0.61293


wandb: Agent Starting Run: fcq19scy with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 16


F1_score Train: 0.9196
F1_score Valid: 0.6233
F1_score Test: 0.6804


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.68035
f1_score_train,0.91964
f1_score_valid,0.62332


wandb: Agent Starting Run: 8oz3ljmr with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 8
wandb: 	n_estimators: 20


F1_score Train: 0.9251
F1_score Valid: 0.6181
F1_score Test: 0.6863


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.6863
f1_score_train,0.92514
f1_score_valid,0.61808


wandb: Agent Starting Run: dcs4nk2u with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 8


F1_score Train: 0.8545
F1_score Valid: 0.5949
F1_score Test: 0.6957


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69571
f1_score_train,0.85454
f1_score_valid,0.59485


wandb: Agent Starting Run: ykxb6i36 with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 12


F1_score Train: 0.8655
F1_score Valid: 0.6061
F1_score Test: 0.6986


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69859
f1_score_train,0.86555
f1_score_valid,0.6061


wandb: Agent Starting Run: acdw0f1z with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 16


F1_score Train: 0.8758
F1_score Valid: 0.6169
F1_score Test: 0.7043


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.70429
f1_score_train,0.87578
f1_score_valid,0.61689


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: rgyx4uez with config:
wandb: 	max_depth: 20
wandb: 	min_samples_split: 12
wandb: 	n_estimators: 20


F1_score Train: 0.8785
F1_score Valid: 0.6319
F1_score Test: 0.7145


wandb: 
wandb: Plotting RandomForestClassifier.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.71449
f1_score_train,0.87849
f1_score_valid,0.63194


wandb: Sweep Agent: Waiting for job.
wandb: Sweep Agent: Exiting.
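
With this many sweep runs, it helps to compare them side by side rather than read the log. The sketch below is only an illustration: it copies the scores of four runs from the output above into a plain Python list (the list, the dictionary keys, and the `generalization_gap` helper are hypothetical, not part of the notebook). It picks the run with the highest validation F1 and reports the train-validation gap as a rough overfitting signal. Selection uses the validation score, so the test score stays an unbiased final check.

```python
# Scores copied from four of the sweep runs logged above (a subset, for illustration).
runs = [
    {"run": "xa1fjdzv", "max_depth": 15, "min_samples_split": 12, "n_estimators": 12,
     "train": 0.7880, "valid": 0.6399, "test": 0.7230},
    {"run": "q2udy4n8", "max_depth": 15, "min_samples_split": 12, "n_estimators": 16,
     "train": 0.7946, "valid": 0.6396, "test": 0.7237},
    {"run": "tg6289u3", "max_depth": 20, "min_samples_split": 2, "n_estimators": 20,
     "train": 0.9721, "valid": 0.5706, "test": 0.6621},
    {"run": "rgyx4uez", "max_depth": 20, "min_samples_split": 12, "n_estimators": 20,
     "train": 0.8785, "valid": 0.6319, "test": 0.7145},
]


def generalization_gap(run):
    """Train F1 minus validation F1; a large gap suggests overfitting."""
    return round(run["train"] - run["valid"], 4)


# Choose the configuration by validation F1, never by test F1.
best = max(runs, key=lambda r: r["valid"])

for r in sorted(runs, key=lambda r: r["valid"], reverse=True):
    print(f'{r["run"]}: valid={r["valid"]:.4f} gap={generalization_gap(r):.4f}')

print("Best by validation F1:", best["run"])
```

Among these four runs, the deeper trees with `min_samples_split: 2` fit the training set almost perfectly but have the lowest validation score, while the more constrained settings generalize better.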


In [None]:
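# Imports this cell relies on
import joblib
import wandb
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Per-class weights; `weights` is defined earlier in the notebook
w = {0: weights[0], 1: weights[1]}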

def train_xgb_model(X_train, y_train, X_valid, y_valid, X_test, y_test):
    features = X_train.columns
    labels = ["delay"]

    # Hyperparameter grid for tuning
    param_grid = {
        'max_depth': [3, 5, 7, 10],
        'learning_rate': [0.01, 0.05, 0.1, 0.2],
        'n_estimators': [50, 100, 200],
        'min_child_weight': [1, 3, 5],
        'subsample': [0.6, 0.8, 1.0],
        'colsample_bytree': [0.6, 0.8, 1.0]
    }

    # Initialize WandB run
    with wandb.init(project=PROJECT_NAME) as run:
        config = wandb.config

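        # Initialize the XGBoost model.
        # Note: XGBClassifier has no `class_weight` parameter, so XGBoost ignores it
        # (this is the source of the `class_weight` "are not used" warning in the output).
        # To weight the positive class, XGBoost provides `scale_pos_weight` instead.
        xgbmodel = XGBClassifier(
            random_state=7,
            class_weight=w
        )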

        # Perform GridSearchCV to find the best hyperparameters
        grid_search = GridSearchCV(estimator=xgbmodel, param_grid=param_grid, cv=3, scoring='f1', verbose=1, n_jobs=-1)
        grid_search.fit(X_train, y_train)

        # Get the best model
        best_model = grid_search.best_estimator_

        # Train predictions
        y_train_pred = best_model.predict(X_train)
        train_f1_score = f1_score(y_train, y_train_pred)

        # Validation predictions
        y_valid_pred = best_model.predict(X_valid)
        valid_f1_score = f1_score(y_valid, y_valid_pred)

        # Test predictions
        y_preds = best_model.predict(X_test)
        y_probas = best_model.predict_proba(X_test)
        score = f1_score(y_test, y_preds)

        print(f"F1_score Train: {round(train_f1_score, 4)}")
        print(f"F1_score Valid: {round(valid_f1_score, 4)}")
        print(f"F1_score Test: {round(score, 4)}")

        # Log the metrics to WandB
        wandb.log({"f1_score_train": train_f1_score})
        wandb.log({"f1_score_valid": valid_f1_score})
        wandb.log({"f1_score": score})

        # Plot the classifier
        wandb.sklearn.plot_classifier(
            best_model, X_train, X_test, y_train, y_test, y_preds, y_probas, labels=None,
            model_name='XGBoost', feature_names=features
        )

        # Log the model artifact
        model_artifact = wandb.Artifact("XGBoostClassifier", type="model", metadata=dict(config))
        joblib.dump(best_model, "xgb_tuned_model.pkl")
        model_artifact.add_file("xgb_tuned_model.pkl")
        run.log_artifact(model_artifact)

# Call the function with your data
train_xgb_model(X_train, y_train, X_valid, y_valid, X_test, y_test)



Fitting 3 folds for each of 1296 candidates, totalling 3888 fits


Parameters: { "class_weight" } are not used.



F1_score Train: 0.9772
F1_score Valid: 0.6164
F1_score Test: 0.6945


wandb: 
wandb: Plotting XGBoost.
wandb: Logged feature importances.
wandb: Logged confusion matrix.
wandb: Logged summary metrics.
wandb: Logged class proportions.
wandb: Logged calibration curve.
wandb: Logged roc curve.
wandb: Logged precision-recall curve.


0,1
f1_score,▁
f1_score_train,▁
f1_score_valid,▁

0,1
f1_score,0.69449
f1_score_train,0.97724
f1_score_valid,0.61641
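
The `class_weight` warning in the output above means XGBoost dropped that argument, so the grid search ran without any class weighting. XGBoost's own parameter for binary imbalance is `scale_pos_weight`, which is commonly set to the ratio of negative to positive samples. The sketch below shows how that ratio could be computed. The labels and the `positive_class_weight` helper are hypothetical, and the `XGBClassifier` call is left as a comment because it was not run for this notebook.

```python
from collections import Counter


def positive_class_weight(y):
    """Return n_negative / n_positive for binary labels (0 = on time, 1 = delayed)."""
    counts = Counter(y)
    if counts[1] == 0:
        raise ValueError("No positive samples: cannot compute scale_pos_weight.")
    return counts[0] / counts[1]


# Hypothetical labels for illustration; in the notebook this would be y_train.
y_example = [0, 0, 0, 1, 0, 1, 0, 0]
spw = positive_class_weight(y_example)
print(spw)  # 6 negatives / 2 positives = 3.0

# Assumed usage with XGBoost (not executed here):
# from xgboost import XGBClassifier
# xgbmodel = XGBClassifier(random_state=7, scale_pos_weight=spw)
```

Retraining with this parameter would change the scores reported above, so the numbers in this notebook should be read as results for an unweighted XGBoost model.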


## **Conclusion**

In this phase, we addressed the critical issue of delayed truck shipments in the logistics industry by enhancing our ability to predict delays. Here's a concise summary of our approach:

* Leveraged the Hopsworks feature store for efficient data retrieval.
* Split the data into training, validation, and test sets so that models are evaluated on unseen data, and handled missing values before training.
* Trained logistic regression, random forest, and XGBoost models, and compared them by F1 score on the training, validation, and test sets.
* Tuned hyperparameters with Weights & Biases sweeps (random forest) and `GridSearchCV` (XGBoost). In the runs logged above, training F1 is well above validation F1 (for example, 0.9772 vs. 0.6164 for the tuned XGBoost model), so the tuned models still overfit.
* Built a Streamlit application to provide an interactive interface, making insights and predictions easily accessible.
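
To make the cost of the tuning step concrete, the sketch below counts the candidates in the XGBoost grid used earlier and draws a random subset of them, which is the idea behind random search. It uses only the standard library. The budget of 50 candidates and the seed are arbitrary assumptions for illustration; in scikit-learn, `RandomizedSearchCV` with its `n_iter` parameter plays this role.

```python
import itertools
import random

# Same grid as in the XGBoost cell above.
param_grid = {
    "max_depth": [3, 5, 7, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [50, 100, 200],
    "min_child_weight": [1, 3, 5],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

# Grid search evaluates every combination of values.
keys = list(param_grid)
all_candidates = [
    dict(zip(keys, values)) for values in itertools.product(*param_grid.values())
]
n_folds = 3
print(len(all_candidates))            # 1296 candidates
print(len(all_candidates) * n_folds)  # 3888 fits with 3-fold cross-validation

# Random search evaluates only a fixed budget of combinations.
budget = 50  # hypothetical budget
rng = random.Random(7)
sampled = rng.sample(all_candidates, k=budget)
print(len(sampled) * n_folds)         # 150 fits for the same 3 folds
```

The counts match the `GridSearchCV` log line reported for the XGBoost cell (1296 candidates, 3888 fits). A random subset trades exhaustive coverage for a much smaller number of fits.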