
How do I prepare my input data? I have a folder of pngs, what do I do with them? #138

Open
xjcl opened this issue Nov 7, 2020 · 10 comments

Comments


xjcl commented Nov 7, 2020

I tried putting the images directly in the data/ directory as instructed on the README.md page, but this just leads to the following error:

FileNotFoundError: [Errno 2] No such file or directory: './data/faces_emore/lfw/meta/sizes'

Someone in this issue suggested using prepare_data.py from the following repository (which, by the way, is also included in this repo under backup/):

https://github.com/TreB1eN/InsightFace_Pytorch#323-prepare-dataset--for-training

But that doesn't seem to work with plain .pngs either; it appears to be looking for some sort of .rec file:

mxnet.base.MXNetError: [17:27:25] src/io/local_filesys.cc:209: Check failed: allow_null:  LocalFileSystem::Open "data/faces_emore/train.rec": No such file or directory

Any advice? Thanks for your attention.


JoMe2704 commented Dec 7, 2020

The code is quite confusing, as it contains several different methods for alignment and normalization. After a lot of experimentation, I found that the pre-trained IR-152 network works best if the images are preprocessed as follows:

  • Read in the image
  • If the image is grayscale, convert it to a 3-channel color image
  • Perform face detection using the script align/detector.py to obtain a face bounding box
  • Extend the face bounding box to a square box using the function convert_to_square in the script align/box_utils.py
  • Crop the image to the square box
  • Resize the image to 128x128
  • Center-crop the image to 112x112
  • If necessary (i.e. if the image was read with cv2), convert the image from BGR to RGB
  • Convert the image to a tensor
  • Normalize with the mean and std used in training (for IR-152, that's mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5])

That is essentially the procedure implemented in extract_feature_v1.py and extract_feature_v2.py, except that those scripts assume the images have already been cropped to the square face bounding box. You can use the following code:

import torch
import torchvision.transforms as transforms
import numpy as np
from PIL import Image
import detector
import box_utils

def l2_norm(input, axis=1):
    norm = torch.norm(input, 2, axis, True)
    output = torch.div(input, norm)
    return output

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])

for inpicfile in inpiclist:
    pil_image = Image.open(inpicfile)
    if pil_image.mode != 'RGB':
        # convert grayscale (or palette) images to 3-channel RGB
        rgbimg = Image.new("RGB", pil_image.size)
        rgbimg.paste(pil_image)
        pil_image = rgbimg
    MIN_FACE_SIZE = 0.2 * float(pil_image.size[0])  # adjust this to your needs
    detector.detect_faces(pil_image, min_face_size=MIN_FACE_SIZE)
    box = box_utils.convert_to_square(box)
    a, b, c, d = np.array(box[0, 0:4], dtype=int)
    croppedImg = pil_image.crop((a, b, c, d))
    croppedImg = croppedImg.resize((128, 128), resample=Image.BILINEAR)
    croppedImg = croppedImg.crop((8, 8, 120, 120))  # center crop 128x128 -> 112x112
    inputTensor = transform(croppedImg)
    inputTensor = inputTensor.unsqueeze(0)
    with torch.no_grad():
        embedding = l2_norm(backbone(inputTensor.to('cpu')).cpu())  # change this to your CUDA device


xjcl commented Dec 21, 2020

Thanks a lot for going to the effort of writing this answer. I've decided to use the Azure Face API rather than rolling my own, but your answer might be useful for someone else. I can't edit your answer; could you perhaps format the Python code as a code block?


AGenchev commented Jan 2, 2021

@JoMe2704 I have a question: does
transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) map the pixel values of the 3 channels into the range [0, 1]?
If so, and I want to map them into the range [-1, +1], will this work:
transforms.Normalize(mean=[0, 0, 0], std=[1, 1, 1]) ?


JoMe2704 commented Jan 3, 2021

No, transforms.ToTensor() already maps the pixel values to tensor values in the range [0, 1]. The subsequent transformation transforms.Normalize( mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) maps each tensor value x to (x - 0.5)/0.5. See https://jhui.github.io/2018/02/09/PyTorch-Data-loading-preprocess_torchvision/ for an explanation. Thus, the resulting tensor has values in the range [-1, 1].

The transform transforms.Normalize( mean=[0, 0, 0], std=[1, 1, 1]) would leave the tensor values unchanged (x -> (x-0)/1), resulting in values in the range [0,1].
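A minimal sketch to verify this mapping (the tiny test image and variable names here are just for illustration):

import numpy as np
import torchvision.transforms as transforms
from PIL import Image

# a tiny 2x2 RGB test image with pixel values 0, 128 and 255
arr = np.array([[[0, 0, 0], [128, 128, 128]],
                [[255, 255, 255], [0, 128, 255]]], dtype=np.uint8)
img = Image.fromarray(arr)

to_tensor = transforms.ToTensor()
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

t = to_tensor(img)                        # values in [0, 1]: 0 -> 0.0, 255 -> 1.0
print(t.min().item(), t.max().item())     # 0.0 1.0

t = normalize(t)                          # x -> (x - 0.5) / 0.5, so values in [-1, 1]
print(t.min().item(), t.max().item())     # -1.0 1.0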

@DrewdropLife

Hi~ I don't understand what extracting these features is useful for. I want to get the output label of the input image. What should I do?

@sriktrako

Hi @JoMe2704 @changxinC,
I am trying to train on a dataset, but I am not able to figure out the data format required for training. Currently my data is inside

D:/face.evoLVe.PyTorch/data/dataV1/

Inside the dataV1 directory the data is organized as follows:
-> id1/
   -> 1.jpg
   -> ...
-> id2/
   -> 1.jpg
   -> ...
-> ...

The data is already aligned and resized to 112 using the align script provided in the repo.
When I run train.py, I get a file-not-found error. I see a lot of people facing the same issue of not being able to get the data into the correct format.

It would help a lot of people if you could explain how to get the dataset into the correct format for training. Help would be much appreciated, thank you.


JoMe2704 commented Aug 9, 2021

Hi~ I don't understand what extracting these features is useful for. I want to get the output label of the input image. What should I do?

While the network has been trained to classify the subjects (persons) in the training sets, this doesn't help you if you look at images of persons that aren't in the training set. In order to use the network for images of any person, you use the features (embeddings) of the final layer. These can be used to measure the similarity of faces: the euclidean distance between the embeddings of two face images is a measure of their dissimilarity. Depending on the variance of your images (face pose, facial expression, illumination, sharpness, ageing), you can set a threshold on the distance to decide whether two images depict the same person.
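For example, a minimal sketch of such a comparison (embedding1, embedding2 and the threshold value are placeholders; a suitable threshold has to be calibrated on your own data):

import torch

def same_person(embedding1, embedding2, threshold=1.2):
    # embeddings are assumed to be L2-normalized 1x512 tensors,
    # e.g. the output of l2_norm(backbone(...)) from the code above;
    # the default threshold of 1.2 is only a placeholder
    distance = torch.norm(embedding1 - embedding2, p=2).item()
    return distance < threshold, distance

is_same, dist = same_person(embedding1, embedding2)
print(f"distance = {dist:.3f}, same person: {is_same}")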


JoMe2704 commented Aug 9, 2021

I am trying to train on a dataset, but I am not able to figure out the data format required for training. [...] It would help a lot of people if you could explain how to get the dataset into the correct format for training.

Sorry, I have no idea. I used my own images and scripts, and I haven't done any training yet.

@chocokassy

Hello, when I use the code you posted above, I set inpiclist to the path of my picture (/home/face.evoLVe/max/150.jpg), but when I run it I get errors that 'box' is not defined and 'backbone' is not defined. How can I solve this? Thank you!


JoMe2704 commented Sep 1, 2021

Hello, when I use the code you posted above, I set inpiclist to the path of my picture (/home/face.evoLVe/max/150.jpg), but when I run it I get errors that 'box' is not defined and 'backbone' is not defined. How can I solve this? Thank you!

This was just the relevant code for face alignment, not a complete script. You need to initialize and load the model first.

# Load torch model
from model_irse import IR_152  # This example uses IR-152
MODELPATH = '/home/face.evoLVe/model_IR152.pth'  # change this to the path of your model file
backbone = IR_152((112, 112))
backbone.load_state_dict(torch.load(MODELPATH, map_location=torch.device('cpu')))  # change this to CUDA if you use a GPU
backbone.eval()
torch.set_grad_enabled(False)

inpiclist should be a list of file paths, not a single file path.

import os
IMGPATH = '/home/face.evoLVe/images/'  # Change this to your image folder
inpiclist = [os.path.join(IMGPATH, f) for f in os.listdir(IMGPATH) if f.endswith('.jpg')]

There was an error in my code, right in the line before you get the error:
box, points = detector.detect_faces(pil_image, min_face_size = MIN_FACE_SIZE)

So, here is the complete code:


import os
import torch
import torchvision.transforms as transforms
import numpy as np
from PIL import Image
import detector
import box_utils

def l2_norm(input, axis=1):
    norm = torch.norm(input, 2, axis, True)
    output = torch.div(input, norm)
    return output

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])

# Load torch model
from model_irse import IR_152  # This example uses IR-152
MODELPATH = '/home/face.evoLVe/model_IR152.pth'  # change this to the path of your model file
backbone = IR_152((112, 112))
backbone.load_state_dict(torch.load(MODELPATH, map_location=torch.device('cpu')))  # change this to CUDA if you use a GPU
backbone.eval()
torch.set_grad_enabled(False)

IMGPATH = '/home/face.evoLVe/images/'  # Change this to your image folder
inpiclist = [os.path.join(IMGPATH, f) for f in os.listdir(IMGPATH) if f.endswith('.jpg')]
for inpicfile in inpiclist:
    pil_image = Image.open(inpicfile)
    if pil_image.mode != 'RGB':
        # convert grayscale (or palette) images to 3-channel RGB
        rgbimg = Image.new("RGB", pil_image.size)
        rgbimg.paste(pil_image)
        pil_image = rgbimg
    MIN_FACE_SIZE = 0.2 * float(pil_image.size[0])  # adjust this to your needs
    box, points = detector.detect_faces(pil_image, min_face_size=MIN_FACE_SIZE)
    box = box_utils.convert_to_square(box)
    a, b, c, d = np.array(box[0, 0:4], dtype=int)
    croppedImg = pil_image.crop((a, b, c, d))
    croppedImg = croppedImg.resize((128, 128), resample=Image.BILINEAR)
    croppedImg = croppedImg.crop((8, 8, 120, 120))  # center crop 128x128 -> 112x112
    inputTensor = transform(croppedImg)
    inputTensor = inputTensor.unsqueeze(0)
    with torch.no_grad():
        embedding = l2_norm(backbone(inputTensor.to('cpu')).cpu())  # change 'cpu' to your CUDA device
