# InfluxDB Logger Example

This notebook is a small demo of how to use gpumon in Jupyter notebooks, along with some convenience methods for working with GPUs.
To run it you will need PyTorch and Torchvision installed, as well as the Python InfluxDB client.

To install PyTorch and its associated requirements, run the following:
```bash
conda install pytorch torchvision cuda80 -c pytorch
```

To install the Python InfluxDB client:
```bash
pip install influxdb
```
See [here](https://github.com/influxdata/influxdb-python) for more details on the InfluxDB client.

In [37]:
from gpumon import device_count, device_name

In [38]:
device_count() # Returns the number of GPUs available

4

In [39]:
device_name() # Returns the type of GPU available

'Tesla P40'

Let's create a simple CNN and train it on the CIFAR-10 dataset to see the load on our GPUs.

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms

In [2]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=4)

classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified


In [3]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

In [4]:
net.cuda()

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
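
The `16 * 5 * 5` used in `fc1` and in `x.view(...)` follows from how the convolution and pooling layers shrink a 32x32 CIFAR-10 image. Here is a minimal sketch of that arithmetic in plain Python, assuming no padding and the kernel sizes and strides shown in the model summary above. The helper names `conv_out` and `pool_out` are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width/height of an unpadded max-pool (ceil_mode=False)."""
    return (size - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32x32
size = pool_out(conv_out(size, 5), 2, 2)   # conv1 -> 28, pool -> 14
size = pool_out(conv_out(size, 5), 2, 2)   # conv2 -> 10, pool -> 5
flat_features = 16 * size * size           # conv2 has 16 output channels
print(size, flat_features)                 # 5 400
```

The result matches the `in_features=400` reported for `fc1`.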

In [5]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

In [6]:
from gpumon.influxdb import log_context

In [7]:
display_every_minibatches=100

Be careful to specify the correct host and credentials in the context below.

In [8]:
with log_context('localhost', 'admin', 'password', 'gpudb', 'gpuseries'):
    for epoch in range(20):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

            # wrap them in Variable
            inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.data[0]
        print('[%d] loss: %.3f' %
              (epoch + 1, running_loss / (i+1)))

    print('Finished Training')

[1] loss: 2.301
[2] loss: 2.241
[3] loss: 1.954
[4] loss: 1.778
[5] loss: 1.670
[6] loss: 1.593
[7] loss: 1.532
[8] loss: 1.475
[9] loss: 1.422
[10] loss: 1.370
[11] loss: 1.330
[12] loss: 1.289
[13] loss: 1.254
[14] loss: 1.221
[15] loss: 1.194
[16] loss: 1.161
[17] loss: 1.137
[18] loss: 1.111
[19] loss: 1.086
[20] loss: 1.069
Finished Training
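
This notebook does not show how `log_context` works internally. As a rough mental model only, a logging context of this kind can be sketched as a context manager that samples measurements on a background thread for as long as the `with` block runs. Everything below (`periodic_logger`, `sample_fn`, `sink`, the fake sample) is hypothetical and is not gpumon's implementation; a real logger would read GPU metrics and write them to InfluxDB instead of appending to a list.

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def periodic_logger(sample_fn, sink, interval_s=0.01):
    """Append sample_fn() to sink every interval_s seconds until the block exits."""
    stop = threading.Event()

    def _run():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            sink.append(sample_fn())

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    try:
        yield sink
    finally:
        # Runs even if the body raises, so the sampler always shuts down.
        stop.set()
        worker.join()


def fake_sample():
    # Hypothetical measurement; stands in for real GPU readings.
    return {"time": time.time(), "Utilization": 0}


samples = []
with periodic_logger(fake_sample, samples):
    time.sleep(0.1)  # stand-in for the training loop

print(f"collected {len(samples)} samples")
```

Keeping the sampling on a separate thread means the training loop needs no logging calls of its own, which is why the loop above contains only ordinary PyTorch code.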


If you had your Grafana dashboard running, you should have seen the measurements there. You can also pull the data from the database using the InfluxDB Python client.

In [22]:
from influxdb import InfluxDBClient, DataFrameClient

In [14]:
client = InfluxDBClient(host='localhost', username='admin', password='password', database='gpudb')

In [15]:
client.get_list_measurements()

[{'name': 'gpuseries'}]

In [30]:
data = client.query('select * from gpuseries limit 10;')

In [31]:
type(data)

influxdb.resultset.ResultSet

In [32]:
data

ResultSet({'('gpuseries', None)': [{'time': '2018-04-02T15:00:00.452204032Z', 'GPU': '0', 'Memory Used': 8685355008, 'Memory Used Percent': 36.14988257180032, 'Memory Utilization': 70, 'Power': 94728, 'Temperature': 39, 'Utilization': 90, 'timestamp': '2018-04-02 15:00:00.452204'}, {'time': '2018-04-02T15:00:00.499869184Z', 'GPU': '1', 'Memory Used': 6447693824, 'Memory Used Percent': 26.83636700881325, 'Memory Utilization': 56, 'Power': 57664, 'Temperature': 42, 'Utilization': 73, 'timestamp': '2018-04-02 15:00:00.499869'}, {'time': '2018-04-02T15:00:00.54717312Z', 'GPU': '2', 'Memory Used': 8507097088, 'Memory Used Percent': 35.40794365628589, 'Memory Utilization': 55, 'Power': 57827, 'Temperature': 38, 'Utilization': 68, 'timestamp': '2018-04-02 15:00:00.547173'}, {'time': '2018-04-02T15:00:00.595969024Z', 'GPU': '3', 'Memory Used': 2947547136, 'Memory Used Percent': 12.268178185359254, 'Memory Utilization': 38, 'Power': 62068, 'Temperature': 36, 'Utilization': 73, 'timestamp': '201

In [23]:
df_client = DataFrameClient(host='localhost', username='admin', password='password', database='gpudb')

In [33]:
df = df_client.query('select * from gpuseries limit 100;')['gpuseries']

In [36]:
df.head(100)

| time | GPU | Memory Used | Memory Used Percent | Memory Utilization | Power | Temperature | Utilization | timestamp |
|---|---|---|---|---|---|---|---|---|
| 2018-04-02 15:00:00.452204032+00:00 | 0 | 8685355008 | 36.149883 | 70 | 94728 | 39 | 90 | 2018-04-02 15:00:00.452204 |
| 2018-04-02 15:00:00.499869184+00:00 | 1 | 6447693824 | 26.836367 | 56 | 57664 | 42 | 73 | 2018-04-02 15:00:00.499869 |
| 2018-04-02 15:00:00.547173120+00:00 | 2 | 8507097088 | 35.407944 | 55 | 57827 | 38 | 68 | 2018-04-02 15:00:00.547173 |
| 2018-04-02 15:00:00.595969024+00:00 | 3 | 2947547136 | 12.268178 | 38 | 62068 | 36 | 73 | 2018-04-02 15:00:00.595969 |
| 2018-04-02 15:00:02.519879168+00:00 | 0 | 9931063296 | 41.334726 | 0 | 57226 | 36 | 0 | 2018-04-02 15:00:02.519879 |
| 2018-04-02 15:00:02.565956096+00:00 | 1 | 7693402112 | 32.021211 | 0 | 56027 | 40 | 0 | 2018-04-02 15:00:02.565956 |
| 2018-04-02 15:00:02.615364096+00:00 | 2 | 9752805376 | 40.592787 | 0 | 56572 | 36 | 0 | 2018-04-02 15:00:02.615364 |
| 2018-04-02 15:00:02.661623040+00:00 | 3 | 3641704448 | 15.157376 | 1 | 185043 | 35 | 7 | 2018-04-02 15:00:02.661623 |
| 2018-04-02 15:00:04.063194112+00:00 | 0 | 3687841792 | 22.428377 | 0 | 69760 | 36 | 0 | 2018-04-02 15:00:04.063194 |
| 2018-04-02 15:00:04.117285120+00:00 | 1 | 4067426304 | 16.929300 | 0 | 57086 | 39 | 0 | 2018-04-02 15:00:04.117285 |
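
Once the measurements are in Python, a common next step is to summarise them per GPU. Here is a minimal sketch using only the standard library, with the `GPU` and `Utilization` values copied from the ten rows shown above. The variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (GPU, Utilization) pairs copied from the ten rows shown above.
rows = [
    ("0", 90), ("1", 73), ("2", 68), ("3", 73),
    ("0", 0), ("1", 0), ("2", 0), ("3", 7),
    ("0", 0), ("1", 0),
]

# Collect the utilization readings for each GPU.
by_gpu = defaultdict(list)
for gpu, utilization in rows:
    by_gpu[gpu].append(utilization)

# Average the readings per GPU, ordered by GPU id.
mean_utilization = {gpu: mean(values) for gpu, values in sorted(by_gpu.items())}
for gpu, value in mean_utilization.items():
    print(f"GPU {gpu}: {value:.1f}%")
```

With the DataFrame itself, the same summary should be obtainable with `df.groupby('GPU')['Utilization'].mean()`.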
