# Calculate Dataset Mean

This notebook computes the mean of each color channel over all images in the dataset. For each channel, we add up the pixel values of every image, accumulate these sums across the whole dataset, and finally divide by the total number of pixels (per channel) in all images.
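Written out as a formula, this is roughly the following. The symbols are introduced here for illustration: $K$ is the number of images, image $k$ has $H_k \times W_k$ pixels, and $x^{(k)}_{h,w,c}$ is its raw 8-bit value at row $h$, column $w$, channel $c$. Because the code below lets Keras resize every image to its default 256×256, $P$ reduces to $256 \cdot 256 \cdot K$.

$$
\mu_c = \frac{1}{P} \sum_{k=1}^{K} \sum_{h=1}^{H_k} \sum_{w=1}^{W_k} \frac{x^{(k)}_{h,w,c}}{255},
\qquad
P = \sum_{k=1}^{K} H_k W_k
$$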

In [1]:
# Import necessary libraries
import os
import pickle

import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from tqdm import tqdm

Using TensorFlow backend.


In [2]:
train_dir = "./train"
datagen = ImageDataGenerator()

# flow_from_directory loads images with PIL in RGB order and, by default, resizes
# them to 256x256 (target_size). The iterator loops forever, so we request exactly
# len(generator) batches with shuffle=False to visit each image exactly once.
generator = datagen.flow_from_directory(train_dir, batch_size=1, shuffle=False)

CHANNEL_NUM = 3
pixel_num = 0  # Number of pixels per channel across all images in the dataset
channel_sum = np.zeros(CHANNEL_NUM)  # Per-channel sum of pixel values over all images

for _ in tqdm(range(len(generator))):
    images, _labels = next(generator)  # images has shape (1, H, W, CHANNEL_NUM), RGB order
    im = images.astype(np.float32) / 255.0  # Scale pixel values to [0, 1]
    pixel_num += im.size // CHANNEL_NUM
    channel_sum += np.sum(im, axis=(0, 1, 2))

  0%|                                                                                         | 0/5756 [00:00<?, ?it/s]

Found 5756 images belonging to 2 classes.


100%|██████████████████████████████████████████████████████████████████████████████| 5756/5756 [01:20<00:00, 87.57it/s]

In [3]:
# Calculate the mean across each channel. The generator already yields RGB
# images, so channel_sum is in [R, G, B] order and needs no reversal.
rgb_mean = list(channel_sum / pixel_num)

In [4]:
print('The mean across each channel [R, G, B]: ', rgb_mean)

The mean across each channel [R, G, B]:  [0.48115942940770784, 0.48115942940770784, 0.48115942940770784]
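To sanity-check the accumulate-then-divide approach, here is a small sketch on synthetic random images (not the notebook's dataset). The helper `streaming_channel_mean` is a hypothetical name introduced here; it mirrors the loop above, assuming each image arrives as an `(H, W, 3)` array already scaled to [0, 1], and compares the result with a plain `mean` over all pixels stacked together.

```python
import numpy as np

CHANNEL_NUM = 3

def streaming_channel_mean(images):
    """Per-channel mean over every pixel of every image, accumulated one image at a time.

    Hypothetical helper for illustration; mirrors the notebook's loop.
    """
    pixel_num = 0
    channel_sum = np.zeros(CHANNEL_NUM)
    for im in images:  # each im has shape (H, W, CHANNEL_NUM), values in [0, 1]
        pixel_num += im.shape[0] * im.shape[1]
        channel_sum += im.sum(axis=(0, 1))
    return channel_sum / pixel_num

# Synthetic images of different sizes, scaled to [0, 1] as in the notebook
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(h, w, CHANNEL_NUM)) / 255.0
          for h, w in [(4, 5), (8, 8), (2, 3)]]

stream_mean = streaming_channel_mean(images)

# Reference: flatten all pixels into one (total_pixels, CHANNEL_NUM) array
all_pixels = np.concatenate([im.reshape(-1, CHANNEL_NUM) for im in images])
print(np.allclose(stream_mean, all_pixels.mean(axis=0)))  # True
```

Summing first and dividing once weights every pixel equally. Averaging per-image means would instead give every image equal weight, which only differs when image sizes differ; here `flow_from_directory` resizes all images to the same size, so the two coincide.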


In [5]:
# Store the mean in a pickle file
pickle_file = os.path.join(".", "Mean.pickle")

try:
    # The with-block closes the file even if pickling fails
    with open(pickle_file, "wb") as f:
        save = {"train_mean": rgb_mean}
        pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
except Exception as e:
    print("Unable to save data to", pickle_file, ":", e)
    raise
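A possible way to consume the saved file later is sketched below: read `train_mean` back and subtract it from new images. The helpers `load_train_mean` and `center_image` are hypothetical names. To keep the snippet runnable on its own, it writes a stand-in `Mean.pickle` with a placeholder mean of 0.5 per channel, not the value computed above.

```python
import os
import pickle
import tempfile

import numpy as np

def load_train_mean(pickle_file):
    """Read the per-channel [R, G, B] mean stored under the 'train_mean' key."""
    with open(pickle_file, "rb") as f:
        return np.asarray(pickle.load(f)["train_mean"], dtype=np.float32)

def center_image(im, train_mean):
    """Scale an (H, W, 3) RGB uint8 image to [0, 1] and subtract the channel mean."""
    return im.astype(np.float32) / 255.0 - train_mean  # broadcasts over H and W

# Stand-in for Mean.pickle with a placeholder mean (NOT the notebook's result)
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_file = os.path.join(tmp_dir, "Mean.pickle")
    with open(pickle_file, "wb") as f:
        pickle.dump({"train_mean": [0.5, 0.5, 0.5]}, f, pickle.HIGHEST_PROTOCOL)
    train_mean = load_train_mean(pickle_file)

image = np.full((2, 2, 3), 255, dtype=np.uint8)  # an all-white dummy image
centered = center_image(image, train_mean)
print(centered[0, 0])  # [0.5 0.5 0.5]
```

The division by 255 has to match the scaling used when the mean was computed; subtracting a [0, 1]-scale mean from raw 0-255 pixels would mix scales.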