
Training is extremely slow on Gluonts [Torch] #3081

Open
khawar-islam opened this issue Dec 13, 2023 · 1 comment
Labels
bug Something isn't working

Comments


khawar-islam commented Dec 13, 2023

## Description

I am quite frustrated: I am training a model and training is very slow on an RTX 3080. I am training on 500 CSV files. Any help would be appreciated.

## To Reproduce

```python
import os

import pandas as pd

from gluonts.dataset.common import ListDataset
from gluonts.torch.model.deepar import DeepAREstimator

# Load your dataset
# Base directory where the folders are located
base_dir = '/media/cvpr/CM_1/coremax_cpu_usage/coremax_cpu/rnd'

# List of folder names
folders = ['2013-7', '2013-8', '2013-9']

# Initialize an empty DataFrame to store all data
all_data = pd.DataFrame()

# Iterate over each folder and read each file
for folder in folders:
    folder_path = os.path.join(base_dir, folder)
    for file in os.listdir(folder_path):
        if file.endswith('.csv'):
            file_path = os.path.join(folder_path, file)
            temp_df = pd.read_csv(file_path, delimiter=';')
            temp_df.columns = temp_df.columns.str.strip()  # Strip whitespace from column names here
            all_data = pd.concat([all_data, temp_df], ignore_index=True)

print(all_data)

# Convert timestamp to datetime and set it as the index
all_data['Timestamp'] = pd.to_datetime(all_data['Timestamp [ms]'], unit='ms')
all_data.set_index('Timestamp', inplace=True)

# Prepare the dataset for GluonTS
training_data = ListDataset([{
    "start": all_data.index[0],
    "target": all_data['CPU usage [MHZ]'].values,
    "feat_dynamic_real": all_data[
        ['CPU cores', 'Memory usage [KB]', 'Disk read throughput [KB/s]', 'Disk write throughput [KB/s]',
         'Network received throughput [KB/s]', 'Network transmitted throughput [KB/s]']].values.T
}], freq="1min")  # Change "1min" to the actual frequency of your data

# Define the DeepAR estimator and train it
estimator = DeepAREstimator(
    prediction_length=12,  # Adjust based on how far you want to predict
    context_length=24,  # Context length should be at least as long as the prediction length
    freq="1min",  # Change to your data's frequency
    batch_size=64,
    trainer_kwargs={"max_epochs": 1, "accelerator": "gpu"},
)
predictor = estimator.train(training_data=training_data)
```
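
If each CSV file holds an independent series, `ListDataset` also accepts one entry per series instead of a single concatenated target; below is a minimal sketch, assuming the same folder layout and column names as above (the `load_series` helper is hypothetical):

```python
import os

import pandas as pd

from gluonts.dataset.common import ListDataset


def load_series(base_dir, folders, freq="1min"):
    """Build one ListDataset entry per CSV file (sketch, not the original code)."""
    entries = []
    for folder in folders:
        folder_path = os.path.join(base_dir, folder)
        for file in sorted(os.listdir(folder_path)):
            if not file.endswith(".csv"):
                continue
            df = pd.read_csv(os.path.join(folder_path, file), delimiter=";")
            df.columns = df.columns.str.strip()  # Strip whitespace from column names
            df["Timestamp"] = pd.to_datetime(df["Timestamp [ms]"], unit="ms")
            df = df.set_index("Timestamp")
            entries.append({
                "start": df.index[0],  # one series per file
                "target": df["CPU usage [MHZ]"].values,
            })
    return ListDataset(entries, freq=freq)
```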

## Error message or code output

```
Epoch 0: | | 3/? [08:49<00:00, 0.01it/s, v_num=22]
```


## Environment
- Operating system: Ubuntu 20.04
- Python version: 3.8.18
- GluonTS version: 0.14.3
- MXNet version: not applicable (using the PyTorch backend)

khawar-islam added the bug label Dec 13, 2023
lostella (Contributor) commented Dec 15, 2023

@khawar-islam what is the performance when running on CPU?

I'm not sure you can expect great performance with a DeepAR model (at least with default hyperparameters) since it's based on a recurrent neural network: this makes the model operations non-parallelizable, hence the GPU utilization will be extremely low.
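
For a quick comparison, here is a minimal sketch of the CPU run, reusing the `training_data` and hyperparameters from the snippet above; `accelerator="cpu"` is the standard PyTorch Lightning trainer option, and the timing code is only illustrative:

```python
import time

from gluonts.torch.model.deepar import DeepAREstimator

# Same estimator as in the issue, but trained on CPU to compare wall-clock time per epoch.
cpu_estimator = DeepAREstimator(
    prediction_length=12,
    context_length=24,
    freq="1min",
    batch_size=64,
    trainer_kwargs={"max_epochs": 1, "accelerator": "cpu"},
)

start = time.time()
cpu_predictor = cpu_estimator.train(training_data=training_data)
print(f"CPU epoch took {time.time() - start:.1f}s")
```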
