# AIML-for-5g-energy-consumption-modelling

The project brief focuses on the energy consumption challenges that come with 5G networks. It outlines the importance of understanding the parameters and methods that impact the energy consumption of base stations in these networks.

Objectives:
Objective A: Develop a model able to estimate the energy consumed by different base station products. The model should consider various engineering configurations, traffic conditions, and energy-saving methods.

Objective B: Achieve generalization capabilities across different base station products. The model should estimate the energy consumption of a new base station product based on measurements from existing ones.

Objective C: Achieve generalization capabilities across different base station configurations. The model should predict the energy consumption of newly configured parameters based on a small number of real network configuration parameters.

The project is part of the "AI for Good - International Telecommunication Union (ITU)" initiative, which aims to identify practical applications of AI to advance the United Nations Sustainable Development Goals.

Now that we understand the project brief, let us work through a solution in code.

In [24]:
import pandas as pd

In [25]:
# # Mount the drive where the downloaded files are stored
# from google.colab import drive
# drive.mount('/content/drive')

In [5]:
# Load the datasets
base_station_info_df = pd.read_csv('Base_Station_basic_information.csv')
cell_level_data_df = pd.read_csv('Cell_level_data.csv')
energy_consumption_data_df = pd.read_csv('Energy_consumption_data.csv')
sample_submission_df = pd.read_csv('SampleSubmission.csv')
power_consumption_prediction_df = pd.read_csv('power_consumption_prediction.csv')


base_station_info_df.head()

Unnamed: 0,BS,CellName,RUType,Mode,Frequency,Bandwidth,Antennas,TXpower
0,B_0,Cell0,Type1,Mode2,365.0,20,4,6.875934
1,B_1,Cell0,Type2,Mode2,532.0,20,4,6.875934
2,B_2,Cell0,Type1,Mode2,365.0,20,4,6.875934
3,B_3,Cell0,Type2,Mode2,532.0,20,4,6.875934
4,B_4,Cell0,Type2,Mode2,532.0,20,4,6.875934


In [31]:
sample_submission_df.head()

Unnamed: 0,ID,Energy
0,2023-01-01 06:00:00_B_0,0
1,2023-01-01 11:00:00_B_0,0
2,2023-01-01 12:00:00_B_0,0
3,2023-01-01 13:00:00_B_0,0
4,2023-01-01 23:00:00_B_0,0


In [33]:
energy_consumption_data_df['Energy']

0        64.275037
1        55.904335
2        57.698057
3        55.156951
4        56.053812
           ...    
92624    14.648729
92625    14.648729
92626    13.452915
92627    13.602392
92628    13.303438
Name: Energy, Length: 92629, dtype: float64

Now that we have loaded and inspected all the dataset files, we can start working on the objectives mentioned in the project brief. The first objective is to develop a model able to estimate the energy consumed by different base station products.

We start with data preprocessing and exploration.

In [6]:
# Check for missing values in each dataset
missing_values = {
    'Base Station Info': base_station_info_df.isnull().sum(),
    'Cell Level Data': cell_level_data_df.isnull().sum(),
    'Energy Consumption Data': energy_consumption_data_df.isnull().sum(),
    'Sample Submission': sample_submission_df.isnull().sum(),
    'Power Consumption Prediction': power_consumption_prediction_df.isnull().sum()
}

missing_values


{'Base Station Info': BS           0
 CellName     0
 RUType       0
 Mode         0
 Frequency    0
 Bandwidth    0
 Antennas     0
 TXpower      0
 dtype: int64,
 'Cell Level Data': Time        0
 BS          0
 CellName    0
 load        0
 ESMode1     0
 ESMode2     0
 ESMode3     0
 ESMode4     0
 ESMode5     0
 ESMode6     0
 dtype: int64,
 'Energy Consumption Data': Time      0
 BS        0
 Energy    0
 dtype: int64,
 'Sample Submission': ID        0
 Energy    0
 dtype: int64,
 'Power Consumption Prediction': Time          0
 BS            0
 Energy    26139
 w             0
 dtype: int64}

The 'Power Consumption Prediction' dataset has missing values in the 'Energy' column. Since this dataset seems to be a part of the submission file, we will not use it for model training. The other datasets do not have any missing values.

Next, let's merge the relevant datasets into a single dataframe that we can use for model training and testing. We first merge 'Cell Level Data' with 'Energy Consumption Data' on the 'Time' and 'BS' columns, and then join 'Base Station Info' on the 'BS' and 'CellName' columns.

In [7]:
# Merge the datasets
merged_df = pd.merge(cell_level_data_df, energy_consumption_data_df, on=['Time', 'BS'])
merged_df = pd.merge(merged_df, base_station_info_df, on=['BS', 'CellName'])

# Display the first few rows of the merged dataset
merged_df.head()


Unnamed: 0,Time,BS,CellName,load,ESMode1,ESMode2,ESMode3,ESMode4,ESMode5,ESMode6,Energy,RUType,Mode,Frequency,Bandwidth,Antennas,TXpower
0,1/1/2023 1:00,B_0,Cell0,0.487936,0.0,0.0,0.0,0,0.0,0.0,64.275037,Type1,Mode2,365.0,20,4,6.875934
1,1/1/2023 2:00,B_0,Cell0,0.344468,0.0,0.0,0.0,0,0.0,0.0,55.904335,Type1,Mode2,365.0,20,4,6.875934
2,1/1/2023 3:00,B_0,Cell0,0.193766,0.0,0.0,0.0,0,0.0,0.0,57.698057,Type1,Mode2,365.0,20,4,6.875934
3,1/1/2023 4:00,B_0,Cell0,0.222383,0.0,0.0,0.0,0,0.0,0.0,55.156951,Type1,Mode2,365.0,20,4,6.875934
4,1/1/2023 5:00,B_0,Cell0,0.175436,0.0,0.0,0.0,0,0.0,0.0,56.053812,Type1,Mode2,365.0,20,4,6.875934
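
It is worth checking row counts after a merge like this one. Energy is reported per base station while load is reported per cell, so a base station with more than one cell presumably has its energy reading repeated on every cell row (the merged frame has 98,084 rows against 92,629 energy rows). The sketch below reproduces the effect on toy data; the column names follow the files above, but the values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the real files (values are made up for illustration).
cells = pd.DataFrame({
    'Time': ['1/1/2023 1:00', '1/1/2023 1:00', '1/1/2023 1:00'],
    'BS': ['B_0', 'B_1', 'B_1'],
    'CellName': ['Cell0', 'Cell0', 'Cell1'],
    'load': [0.49, 0.20, 0.05],
})
energy = pd.DataFrame({
    'Time': ['1/1/2023 1:00', '1/1/2023 1:00'],
    'BS': ['B_0', 'B_1'],
    'Energy': [64.3, 40.1],
})

# validate='many_to_one' raises a MergeError if (Time, BS) is not unique in `energy`.
merged = pd.merge(cells, energy, on=['Time', 'BS'], how='inner', validate='many_to_one')

# B_1 has two cells, so its single energy reading appears on two rows.
rows_per_reading = merged.groupby(['Time', 'BS']).size()
print(len(energy), len(merged))
print(rows_per_reading.max())
```

If the repeated target turns out to matter, one option is to aggregate the cell-level features per base station and timestamp before training.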


The datasets have been successfully merged. The merged dataset contains the following columns:

* Time: The timestamp of the data.
* BS: The base station identifier.
* CellName: The name of the cell.
* load: The load on the cell.
* ESMode1 - ESMode6: The energy-saving modes.
* Energy: The energy consumed by the base station.
* RUType: The Radio Unit (RU) type.
* Mode: The operation mode of the base station.
* Frequency: The operating frequency (in MHz).
* Bandwidth: The bandwidth (in MHz).
* Antennas: The number of antennas.
* TXpower: The transmission power (in dBm).

Next, let's perform some basic data exploration to understand the data better. We will start by checking the data types of each column and then look at the summary statistics of the numerical columns.

In [8]:
# Check the data types of each column
data_types = merged_df.dtypes

# Get the summary statistics of the numerical columns
summary_statistics = merged_df.describe()

data_types, summary_statistics


(Time          object
 BS            object
 CellName      object
 load         float64
 ESMode1      float64
 ESMode2      float64
 ESMode3      float64
 ESMode4        int64
 ESMode5      float64
 ESMode6      float64
 Energy       float64
 RUType        object
 Mode          object
 Frequency    float64
 Bandwidth      int64
 Antennas       int64
 TXpower      float64
 dtype: object,
                load       ESMode1       ESMode2       ESMode3  ESMode4  \
 count  98084.000000  98084.000000  98084.000000  98084.000000  98084.0   
 mean       0.242025      0.039012      0.037728      0.000082      0.0   
 std        0.236798      0.187290      0.184540      0.002393      0.0   
 min        0.000000      0.000000      0.000000      0.000000      0.0   
 25%        0.055922      0.000000      0.000000      0.000000      0.0   
 50%        0.157570      0.000000      0.000000      0.000000      0.0   
 75%        0.358069      0.000000      0.000000      0.000000      0.0   
 max      

The merged dataset contains both numerical and categorical columns. The categorical columns are 'Time', 'BS', 'CellName', 'RUType', and 'Mode'. The other columns are numerical.

The summary statistics show that the 'ESMode4' column has a constant value of 0, so it does not provide any useful information and can be dropped. The 'Energy' column, which is our target variable, has a minimum value of 0.75 and a maximum value of 100.

Next, let's encode the categorical variables and normalize the numerical variables.

In [9]:
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Drop the 'ESMode4' column
merged_df = merged_df.drop(columns=['ESMode4'])

# Encode the categorical variables, keeping one fitted encoder per column
# so the original labels can be recovered later with inverse_transform
label_encoders = {}
for column in ['BS', 'CellName', 'RUType', 'Mode']:
    label_encoders[column] = LabelEncoder()
    merged_df[column] = label_encoders[column].fit_transform(merged_df[column])

# Normalize the numerical variables
numeric_columns = ['load', 'ESMode1', 'ESMode2', 'ESMode3', 'ESMode5', 'ESMode6',
                   'Frequency', 'Bandwidth', 'Antennas', 'TXpower']
scaler = StandardScaler()
merged_df[numeric_columns] = scaler.fit_transform(merged_df[numeric_columns])

# Display the first few rows of the processed dataset
merged_df.head()


Unnamed: 0,Time,BS,CellName,load,ESMode1,ESMode2,ESMode3,ESMode5,ESMode6,Energy,RUType,Mode,Frequency,Bandwidth,Antennas,TXpower
0,1/1/2023 1:00,0,0,1.038494,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,64.275037,0,1,-0.029199,0.714609,0.891641,0.466964
1,1/1/2023 2:00,0,0,0.432622,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,55.904335,0,1,-0.029199,0.714609,0.891641,0.466964
2,1/1/2023 3:00,0,0,-0.203798,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,57.698057,0,1,-0.029199,0.714609,0.891641,0.466964
3,1/1/2023 4:00,0,0,-0.082948,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,55.156951,0,1,-0.029199,0.714609,0.891641,0.466964
4,1/1/2023 5:00,0,0,-0.281206,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,56.053812,0,1,-0.029199,0.714609,0.891641,0.466964
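
To make the integer codes in the table above easier to read: `LabelEncoder` numbers the distinct labels in sorted order, and for strings that order is lexicographic. The sketch below mimics this in plain Python; `label_encode` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct label to its position in the sorted list of labels."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[value] for value in values], classes


codes, classes = label_encode(['B_0', 'B_2', 'B_10', 'B_1', 'B_2'])
print(classes)  # ['B_0', 'B_1', 'B_10', 'B_2']
print(codes)    # [0, 3, 2, 1, 3]
```

Because 'B_10' sorts before 'B_2', an encoded value such as 580 does not have to correspond to base station 'B_580'. The codes also imply an ordering that has no physical meaning, which tree-based models tolerate better than linear ones.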


In [56]:
power_consumption_prediction_df['w'].unique()

array([1, 5])

In [57]:
merged_df['Mode'].unique()

array([1, 0])

The categorical variables have been encoded, and the numerical variables have been normalized.
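
For reference, `StandardScaler` replaces each value x with z = (x - mean) / std, using the population standard deviation of the column. The sketch below applies the same formula to a few toy values and then cross-checks the first 'load' value against the summary statistics printed earlier (mean 0.242025, std 0.236798). The toy list is made up for illustration.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


loads = [0.10, 0.20, 0.30, 0.40]  # toy values
z = standardize(loads)
print([round(value, 4) for value in z])

# First 'load' value of the merged data, using the statistics from describe()
z_first = (0.487936 - 0.242025) / 0.236798
print(round(z_first, 4))  # close to the 1.038494 shown in the processed table
```

The small difference from 1.038494 comes from rounding and from `describe()` reporting the sample standard deviation. In a stricter setup, the mean and standard deviation would be computed on the training rows only and then applied to the test rows.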

Next, we need to split the data into features (X) and target (y), and then split the data into training and testing sets. The target variable is 'Energy', and the rest of the columns, except 'Time', will be used as features.

In [10]:
from sklearn.model_selection import train_test_split

# Define the features and the target
X = merged_df.drop(columns=['Time', 'Energy'])
y = merged_df['Energy']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train.shape, X_test.shape, y_train.shape, y_test.shape


((78467, 14), (19617, 14), (78467,), (19617,))

The data has been successfully split into training and testing sets. The training set contains 78,467 samples, and the testing set contains 19,617 samples.
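
One caveat, given Objective B: a random row-wise split places rows from the same base station in both sets, so the test score says little about base stations the model has never seen. A possible alternative is to hold out whole base stations. The sketch below shows the idea in plain Python; `split_by_group` is a hypothetical helper, and scikit-learn's `GroupShuffleSplit` provides comparable behaviour.

```python
import random


def split_by_group(groups, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that no group appears in both sets."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    n_test = max(1, round(len(unique_groups) * test_fraction))
    test_groups = set(unique_groups[:n_test])
    train_idx = [i for i, group in enumerate(groups) if group not in test_groups]
    test_idx = [i for i, group in enumerate(groups) if group in test_groups]
    return train_idx, test_idx


# Toy base station labels, one entry per row of the dataset
bs_labels = ['B_0', 'B_0', 'B_1', 'B_1', 'B_2', 'B_3', 'B_3', 'B_4']
train_idx, test_idx = split_by_group(bs_labels)

train_bs = {bs_labels[i] for i in train_idx}
test_bs = {bs_labels[i] for i in test_idx}
print(sorted(train_bs), sorted(test_bs))
```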

Now we can start building the model. Since this is a regression problem, we begin with a simple linear regression model as a baseline and then try more complex models if necessary.

In [11]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Define the model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict the target for the testing set
y_pred = model.predict(X_test)

# Calculate the root mean squared error (RMSE)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

rmse


6.494765755484195

The root mean squared error (RMSE) of the linear regression model on the testing set is approximately 6.49, expressed in the same units as the 'Energy' column. It measures the typical size of the prediction error, so lower is better, and it serves as the baseline for the models that follow.
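
As a reminder of what the metric computes, RMSE is the square root of the mean of the squared differences between the actual and predicted values. A minimal sketch with made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [64.0, 56.0, 58.0, 55.0]      # toy energy values
predicted = [60.0, 58.0, 58.0, 51.0]   # toy predictions
print(rmse(actual, predicted))  # errors 4, -2, 0, 4 -> sqrt(36 / 4) = 3.0
```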

Next, let's try a more complex model to see if we can improve the performance. We will use the Random Forest Regressor, which is an ensemble learning method that can be used for both classification and regression tasks.

In [18]:
from sklearn.ensemble import RandomForestRegressor

# Define the model
rf_model = RandomForestRegressor(random_state=42, max_depth=None, min_samples_leaf=1, min_samples_split=10, n_estimators=120)

# Fit the model to the training data
rf_model.fit(X_train, y_train)

# Predict the target for the testing set
y_rf_pred = rf_model.predict(X_test)

# Calculate the RMSE
rf_rmse = np.sqrt(mean_squared_error(y_test, y_rf_pred))

rf_rmse


2.9564642600967836

In [65]:
test_data= power_consumption_prediction_df.drop(['Time', 'Energy'], axis= 1)
test_data.head()
# test_data1= label_encoder.fit_transform(test_data)

Unnamed: 0,BS,w
0,B_0,1
1,B_0,1
2,B_0,1
3,B_0,1
4,B_0,1


In [59]:
test_pred = rf_model.predict(X_test)  # X_test assumed: it matches the (19617,) shape below
test_pred.shape

(19617,)

In [23]:
from sklearn.ensemble import RandomForestRegressor

# Define the model
rf_model = RandomForestRegressor(random_state=42, max_depth=30, min_samples_leaf=1, min_samples_split=10, n_estimators=500)

# Fit the model to the training data
rf_model.fit(X_train, y_train)

# Predict the target for the testing set
y_rf_pred = rf_model.predict(X_test)

# Calculate the RMSE
rf_rmse = np.sqrt(mean_squared_error(y_test, y_rf_pred))

rf_rmse

2.949130578487861

The RMSE of the Random Forest model on the testing set is approximately 2.95 (2.96 with the first configuration), which is a significant improvement over the Linear Regression model's 6.49.

Next, let's try another model, the Gradient Boosting Regressor, which is an ensemble learning method that builds an additive model in a forward stage-wise fashion.

In [15]:
from sklearn.ensemble import GradientBoostingRegressor

# Define the model
gb_model = GradientBoostingRegressor(random_state=42)

# Fit the model to the training data
gb_model.fit(X_train, y_train)

# Predict the target for the testing set
y_gb_pred = gb_model.predict(X_test)

# Calculate the RMSE
gb_rmse = np.sqrt(mean_squared_error(y_test, y_gb_pred))

gb_rmse

4.337373321220264

The RMSE of the Gradient Boosting model on the testing set is approximately 4.34, which is better than the Linear Regression model but not as good as the Random Forest model.
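
To illustrate what "forward stage-wise" means, the sketch below implements a tiny gradient boosting loop for squared error in plain Python: each stage fits a one-split tree (a stump) to the current residuals and adds a scaled copy of it to the prediction. It is a simplified illustration on made-up one-dimensional data, not how `GradientBoostingRegressor` is implemented internally; `fit_stump` and `boost` are hypothetical helpers.

```python
def fit_stump(x, residuals):
    """Find the single threshold on 1-D inputs that minimises squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_stages=20, learning_rate=0.1):
    """Stage-wise additive model: start from the mean, then fit residuals."""
    base = sum(y) / len(y)
    stumps = []
    predictions = [base] * len(y)
    mse_history = []
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi) for xi, pi in zip(x, predictions)]
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y))

    def predict(xi):
        return base + learning_rate * sum(stump(xi) for stump in stumps)

    return predict, mse_history


# Toy data: cell load against energy (values invented for illustration)
x = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]
y = [14.0, 15.5, 20.0, 24.0, 31.0, 40.0, 48.0, 55.0]
predict, mse_history = boost(x, y)
print(round(mse_history[0], 2), round(mse_history[-1], 2))
```

The training error shrinks a little with every stage, which is why the number of stages and the learning rate are the main parameters to tune for this family of models.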

Considering the performance of the models, the Random Forest model seems to be the best choice for this task.

Next, we should tune the hyperparameters of the Random Forest model to see if we can further improve its performance. We can use grid search with cross-validation for this purpose.

In [54]:
# from sklearn.model_selection import GridSearchCV

# # Define the parameter grid
# param_grid = {
#     'n_estimators': [50, 100, 150],
#     'max_depth': [None, 10, 20, 30],
#     'min_samples_split': [2, 5, 10],
#     'min_samples_leaf': [1, 2, 4],
# }

# # Define the grid search
# grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid,
#                            cv=3, n_jobs=-1, verbose=2, scoring='neg_mean_squared_error')

# # Fit the grid search to the data
# grid_search.fit(X_train, y_train)

# # Get the best parameters
# best_params = grid_search.best_params_

# best_params


Let's use the Random Forest model fitted above (with the tuned settings rather than the scikit-learn defaults) to make predictions on the held-out test split and prepare a submission-style file.

The submission file should contain two columns: 'ID' and 'Energy'. The 'ID' column combines the 'Time' and 'BS' values of each row, and the 'Energy' column contains the predicted energy consumption.

In [30]:
X_test

Unnamed: 0,BS,CellName,load,ESMode1,ESMode2,ESMode3,ESMode5,ESMode6,RUType,Mode,Frequency,Bandwidth,Antennas,TXpower
72183,580,0,0.230643,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,6,1,1.196163,0.714609,-0.091681,0.466964
90260,751,0,1.481380,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,6,1,1.196163,0.714609,-0.091681,0.466964
73136,590,0,-0.652014,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,0,1,-0.029199,0.714609,-0.091681,0.466964
48411,362,0,-0.642513,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,8,1,-1.320598,-1.942535,-0.583343,-2.473784
64203,508,0,0.800341,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,9,1,-0.029199,0.714609,-0.091681,0.466964
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77038,627,0,-0.860337,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,8,1,-1.320598,-1.056820,-0.091681,-1.003410
10661,868,0,-0.018854,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,6,1,1.196163,0.714609,-0.091681,0.466964
38201,261,1,-0.546338,-0.208296,-0.204442,-0.03441,-0.004496,4.618806,0,1,0.425579,-2.473964,-0.091681,1.942239
33522,217,0,1.790733,-0.208296,-0.204442,-0.03441,-0.004496,-0.203789,9,1,-0.029199,0.714609,-0.091681,0.466964


In [None]:
# Prepare the submission file
submission_df = X_test.copy()
submission_df['Energy'] = y_rf_pred
submission_df['Time'] = merged_df.loc[submission_df.index, 'Time']
submission_df['ID'] = submission_df['Time'] + ' x ' + submission_df['BS'].astype(str)
submission_df = submission_df[['ID', 'Energy']]

# Save the submission file
submission_file = '/content/drive/MyDrive/AI_ML/submission_rf.csv'
submission_df.to_csv(submission_file, index=False)

submission_df.head()

Unnamed: 0,ID,Energy
72183,1/6/2023 1:00 x 580,24.439462
90260,1/2/2023 22:00 x 751,32.717489
73136,1/1/2023 2:00 x 590,31.476831
48411,1/2/2023 20:00 x 362,13.248132
64203,1/5/2023 2:00 x 508,41.600897


In [None]:
sample_submission_df.head()

Unnamed: 0,ID,Energy
0,2023-01-01 06:00:00_B_0,0
1,2023-01-01 11:00:00_B_0,0
2,2023-01-01 12:00:00_B_0,0
3,2023-01-01 13:00:00_B_0,0
4,2023-01-01 23:00:00_B_0,0
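
Comparing the two outputs, the IDs written above (for example `1/6/2023 1:00 x 580`) do not follow the sample format (`2023-01-01 06:00:00_B_0`): the timestamp layout and the separator differ, and 'BS' holds the encoded integer rather than the original name. The sketch below builds an ID in the sample format. It assumes the source timestamps are month/day/year and that the original 'BS' strings are still available (for example from a copy kept before encoding); `make_submission_id` is a hypothetical helper.

```python
from datetime import datetime


def make_submission_id(time_str, bs_name, source_format='%m/%d/%Y %H:%M'):
    """Format a 'Time' string and a base station name as '<timestamp>_<BS>'."""
    timestamp = datetime.strptime(time_str, source_format)
    return f"{timestamp:%Y-%m-%d %H:%M:%S}_{bs_name}"


print(make_submission_id('1/1/2023 6:00', 'B_0'))  # 2023-01-01 06:00:00_B_0
```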


In [None]:
# Prepare the submission file for the Linear Regression model
submission_lr_df = X_test.copy()
submission_lr_df['Energy'] = y_pred
submission_lr_df['Time'] = merged_df.loc[submission_lr_df.index, 'Time']
submission_lr_df['ID'] = submission_lr_df['Time'] + ' x ' + submission_lr_df['BS'].astype(str)
submission_lr_df = submission_lr_df[['ID', 'Energy']]

# Save the submission file
submission_lr_file = '/content/drive/MyDrive/AI_ML/submission_lr.csv'
submission_lr_df.to_csv(submission_lr_file, index=False)

# Prepare the submission file for the Gradient Boosting model
submission_gb_df = X_test.copy()
submission_gb_df['Energy'] = y_gb_pred
submission_gb_df['Time'] = merged_df.loc[submission_gb_df.index, 'Time']
submission_gb_df['ID'] = submission_gb_df['Time'] + ' x ' + submission_gb_df['BS'].astype(str)
submission_gb_df = submission_gb_df[['ID', 'Energy']]

# Save the submission file
submission_gb_file = '/content/drive/MyDrive/AI_ML/submission_gb.csv'
submission_gb_df.to_csv(submission_gb_file, index=False)

submission_lr_df.head(), submission_gb_df.head()


(                         ID     Energy
 72183   1/6/2023 1:00 x 580  25.971260
 90260  1/2/2023 22:00 x 751  34.384028
 73136   1/1/2023 2:00 x 590  38.985013
 48411  1/2/2023 20:00 x 362  10.059740
 64203   1/5/2023 2:00 x 508  30.146510,
                          ID     Energy
 72183   1/6/2023 1:00 x 580  25.527620
 90260  1/2/2023 22:00 x 751  32.631790
 73136   1/1/2023 2:00 x 590  38.794447
 48411  1/2/2023 20:00 x 362  13.133785
 64203   1/5/2023 2:00 x 508  35.853600)

Let's revisit the model selection and hyperparameter tuning processes.

The RandomizedSearchCV and GridSearchCV techniques took too long to run because the dataset is quite large and the parameter grid is extensive.
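
A quick count shows why the exhaustive search is expensive. Using the parameter grid from the commented-out cell above with 3-fold cross-validation:

```python
from itertools import product

# Same grid as in the commented-out GridSearchCV cell
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
cv_folds = 3

combinations = list(product(*param_grid.values()))
total_fits = len(combinations) * cv_folds
print(len(combinations), total_fits)  # 108 324
```

That is 324 random forests trained on roughly two thirds of the 78,467 training rows each, plus one final refit on the full training set when `refit=True` (the default).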

An alternative approach could be to use a more efficient model selection technique like Bayesian optimization. Bayesian optimization is a global optimization technique for noisy black-box functions. It works by building a probabilistic model of the function and using it to select the most promising hyperparameters to evaluate in the true objective function.

We will use Bayesian optimization, through the hyperopt library, to tune the hyperparameters of the Random Forest model. hyperopt's `tpe.suggest` algorithm (Tree-structured Parzen Estimator) proposes each new trial based on the results of the earlier ones.

Let's start by defining the parameter space and the objective function.

In [None]:
%pip install hyperopt



In [None]:
from hyperopt import fmin, tpe, hp
from sklearn.model_selection import cross_val_score

# Define the parameter space
space = {
    'n_estimators': hp.quniform('n_estimators', 50, 150, 1),
    'max_depth': hp.choice('max_depth', [None, 10, 20, 30]),
    'min_samples_split': hp.quniform('min_samples_split', 2, 10, 1),
    'min_samples_leaf': hp.quniform('min_samples_leaf', 1, 4, 1),
}

# Define the objective function
def objective(params):
    params = {
        'n_estimators': int(params['n_estimators']),
        'max_depth': params['max_depth'],
        'min_samples_split': int(params['min_samples_split']),
        'min_samples_leaf': int(params['min_samples_leaf']),
    }
    rf_model = RandomForestRegressor(**params, random_state=42)
    score = cross_val_score(rf_model, X_train, y_train, cv=3, scoring='neg_mean_squared_error').mean()
    return -score

# Run the optimization
best = fmin(fn=objective, space=space, max_evals=10, rstate=np.random.default_rng(42), algo=tpe.suggest)

best


100%|██████████| 10/10 [05:26<00:00, 32.69s/trial, best loss: 9.404909773921807]


{'max_depth': 0,
 'min_samples_leaf': 1.0,
 'min_samples_split': 10.0,
 'n_estimators': 120.0}
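
A note on reading this result: `fmin` reports `hp.choice` parameters as the index of the chosen option and `hp.quniform` parameters as floats. So `'max_depth': 0` refers to the first entry of `[None, 10, 20, 30]`, that is `None`, not a depth of zero. hyperopt provides `space_eval(space, best)` to map choice indices back to values; the sketch below does the same by hand without needing hyperopt installed. `decode_best` is a hypothetical helper.

```python
import math

MAX_DEPTH_CHOICES = [None, 10, 20, 30]  # same order as in the search space


def decode_best(best, max_depth_choices=MAX_DEPTH_CHOICES):
    """Turn the raw fmin result into keyword arguments for RandomForestRegressor."""
    return {
        'n_estimators': int(best['n_estimators']),
        'max_depth': max_depth_choices[best['max_depth']],
        'min_samples_split': int(best['min_samples_split']),
        'min_samples_leaf': int(best['min_samples_leaf']),
    }


raw_best = {'max_depth': 0, 'min_samples_leaf': 1.0,
            'min_samples_split': 10.0, 'n_estimators': 120.0}
params = decode_best(raw_best)
print(params)

# The reported best loss is a mean squared error; its square root is the RMSE.
cv_rmse = math.sqrt(9.404909773921807)
print(round(cv_rmse, 2))  # 3.07
```

The decoded values match the settings of the first Random Forest cell above (`n_estimators=120`, `max_depth=None`, `min_samples_split=10`, `min_samples_leaf=1`).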

In [None]:
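# Define the cross-validation function
def cross_val_score_rmse(model, X, y):
    scores = cross_val_score(model, X, y, cv=3, scoring='neg_mean_squared_error')
    return np.sqrt(-scores.mean())

# Candidate hyperparameter sets to compare (assumed here for illustration:
# the hyperopt result above and the scikit-learn defaults)
hyperparams = [
    {'n_estimators': 120, 'max_depth': None, 'min_samples_split': 10, 'min_samples_leaf': 1},
    {},
]

# Evaluate the model's performance for each set of hyperparameters
results = []
for params in hyperparams:
    candidate_model = RandomForestRegressor(random_state=42, **params)
    candidate_rmse = cross_val_score_rmse(candidate_model, X_train, y_train)
    results.append((candidate_rmse, params))

# Get the best set of hyperparameters (compare on RMSE only, since dicts cannot be ordered)
best_rmse, best_params = min(results, key=lambda result: result[0])

best_rmse, best_params
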

In [None]:
import argparse


def get_args():
    """
    Get training arguments
    """
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument(
        "--data_dir",
        default=None,
        type=str,
        help="The input data dir. Should contain the training files for the CoNLL-2003 NER task.",
    )
    parser.add_argument(
        "--model_type",
        default=None,
        type=str,
        #help="Model type selected in the list: " + ", ".join(MODEL_CLASSES.keys()),
    )
    parser.add_argument(
        "--model_name_or_path",
        default=None,
        type=str,
        help="Path to pre-trained model or shortcut name selected in the list: " + ", ",
    )
    parser.add_argument(
        "--input_dir",
        default=None,
        type=str,
        required=False,
        help="The input model directory.",
    )
    parser.add_argument(
        "--output_dir",
        default=None,
        type=str,
        help="The output directory where the model predictions and checkpoints will be written.",
    )

    # Other parameters
    parser.add_argument(
        "--labels",
        default="",
        type=str,
        help="Path to a file containing all labels. If not specified, CoNLL-2003 labels are used.",
    )
    parser.add_argument(
        "--config_name", default="", type=str, help="Pretrained config name or path if not the same as model_name"
    )
    parser.add_argument(
        "--tokenizer_name",
        default="",
        type=str,
        help="Pretrained tokenizer name or path if not the same as model_name",
    )
    parser.add_argument(
        "--cache_dir",
        default="",
        type=str,
        help="Where do you want to store the pre-trained models downloaded from s3",
    )
    parser.add_argument(
        "--max_seq_length",
        default=128,
        type=int,
        help="The maximum total input sequence length after tokenization. Sequences longer "
        "than this will be truncated, sequences shorter will be padded.",
    )
    parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
    parser.add_argument("--do_finetune", action="store_true", help="Whether to run fine-tuning.")
    parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
    parser.add_argument("--do_predict", action="store_true", help="Whether to run predictions on the test set.")
    parser.add_argument(
        "--evaluate_during_training",
        action="store_true",
        help="Whether to run evaluation during training at each logging step.",
    )
    parser.add_argument(
        "--do_lower_case", action="store_true", help="Set this flag if you are using an uncased model."
    )

    parser.add_argument("--per_gpu_train_batch_size", default=8, type=int, help="Batch size per GPU/CPU for training.")
    parser.add_argument(
        "--per_gpu_eval_batch_size", default=8, type=int, help="Batch size per GPU/CPU for evaluation."
    )
    parser.add_argument(
        "--gradient_accumulation_steps",
        type=int,
        default=1,
        help="Number of updates steps to accumulate before performing a backward/update pass.",
    )
    parser.add_argument("--learning_rate", default=5e-5, type=float, help="The initial learning rate for Adam.")
    parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight decay if we apply some.")
    parser.add_argument("--adam_epsilon", default=1e-8, type=float, help="Epsilon for Adam optimizer.")
    parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
    parser.add_argument(
        "--num_train_epochs", default=3.0, type=float, help="Total number of training epochs to perform."
    )
    parser.add_argument(
        "--max_steps",
        default=-1,
        type=int,
        help="If > 0: set total number of training steps to perform. Override num_train_epochs.",
    )
    parser.add_argument("--warmup_steps", default=0, type=int, help="Linear warmup over warmup_steps.")

    parser.add_argument("--logging_steps", type=int, default=500, help="Log every X updates steps.")
    parser.add_argument("--save_steps", type=int, default=500, help="Save checkpoint every X updates steps.")
    parser.add_argument(
        "--eval_all_checkpoints",
        action="store_true",
        help="Evaluate all checkpoints starting with the same prefix as model_name ending and ending with step number",
    )
    parser.add_argument("--no_cuda", action="store_true", help="Avoid using CUDA when available")
    parser.add_argument(
        "--overwrite_output_dir", action="store_true", help="Overwrite the content of the output directory"
    )
    parser.add_argument(
        "--overwrite_cache", action="store_true", help="Overwrite the cached training and evaluation sets"
    )
    parser.add_argument("--seed", type=int, default=42, help="random seed for initialization")

    parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
    parser.add_argument("--server_ip", type=str, default="", help="For distant debugging.")
    parser.add_argument("--server_port", type=str, default="", help="For distant debugging.")
