## Validation Operations

### File format validation 
`validate.extension(file_paths, valid_formats=['.jpg', '.png', '.jpeg'])`: This function checks whether each file has a valid format by comparing its file extension against `valid_formats`.

Parameters:
- `file_paths`: A list of file paths to check.
- `valid_formats`: A list of valid file extensions (default is `['.jpg', '.png', '.jpeg']`).

Returns:
- A list of booleans, one per path: `True` if the file has a valid format, `False` otherwise.

In [1]:
from cv_utility import validate

In [2]:
file_path1 = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image/IMG_2645.JPG" 
file_path2 = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/cv_utility/validate/test_validate.py"
file_path3 = "xyz.csv"
print(validate.extension(file_paths=[file_path1,file_path2,file_path3]))


[True, False, False]
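
The check above can be approximated with the standard library. This is a minimal sketch, not the library's implementation: the name `has_valid_extension` is hypothetical, and the case-insensitive comparison is an assumption inferred from `IMG_2645.JPG` passing against the lowercase defaults.

```python
import os


def has_valid_extension(file_paths, valid_formats=(".jpg", ".png", ".jpeg")):
    """Return one bool per path: True when the extension is in valid_formats."""
    # Assumption: extensions are compared case-insensitively.
    allowed = {ext.lower() for ext in valid_formats}
    # os.path.splitext returns (root, ext); ext keeps its leading dot.
    return [os.path.splitext(path)[1].lower() in allowed for path in file_paths]


print(has_valid_extension(["IMG_2645.JPG", "test_validate.py", "xyz.csv"]))
# [True, False, False]
```

Note that a check like this only inspects the file name, so a renamed non-image file would still pass.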


### Path validation 
`validate.path(file_path)`: This function checks if the specified path exists and is a file.

Parameters:
- `file_path`: The path to check.

Returns:
- `True` if the path exists and is a file.
- `False` otherwise.

In [3]:
print(validate.path(file_path=file_path1))
print(validate.path(file_path=file_path2))
print(validate.path(file_path=file_path3))

True
True
False
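
"Exists and is a file" is stricter than "exists": a directory exists but is not a file. The sketch below shows that distinction with `os.path.isfile`. The helper name `is_existing_file` is hypothetical, and the library may implement the check differently.

```python
import os
import tempfile


def is_existing_file(file_path):
    """True only when the path exists and points to a regular file."""
    return os.path.isfile(file_path)


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "sample.txt")
    with open(sample, "w") as handle:
        handle.write("demo")

    print(is_existing_file(sample))                                # True: a regular file
    print(is_existing_file(tmp_dir))                               # False: a directory
    print(is_existing_file(os.path.join(tmp_dir, "missing.txt")))  # False: does not exist
```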


### Check for corrupt or invalid image
`validate.image(file_path)`: This function checks if the image file is valid or corrupt by attempting to load it.

Parameters:
- `file_path`: The path of the image file.

Returns:
- `True` if the image is valid.
- `False` if the image is corrupt or invalid.

In [4]:
print(validate.image(file_path=file_path1))
print(validate.image(file_path=file_path2))

True
False


### Detect Duplicate Images in a Directory Using Hash Comparison
`validate.find_content_duplicate(dir)`: This function detects duplicate images in a directory by comparing hashes of their content.

Parameters:
- `dir`: The directory to search for duplicate images.

Returns:
- A list of duplicate file names found in the directory.

In [5]:
duplicate = validate.find_content_duplicate(dir="/mnt/03modeling/5_Rohit/Bodytype_Classification/Body_type_extracts/hatchback")
print(duplicate)
duplicate = validate.find_content_duplicate(dir="/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image")
print(duplicate)

[]


['batch1_dent_img_000014 copy.png', 'batch1_dent_img_000014.png']


### Find common files in two directories by name
`validate.find_dir_common_files(dir1, dir2)`: This function finds files that have the same name in both directories. Only the names are compared, not the file contents.

Parameters:
- `dir1`: Path to the first directory.
- `dir2`: Path to the second directory.

Returns:
- List of common file names.


In [6]:
duplicate = validate.find_dir_common_files(dir1="/mnt/03modeling/5_Rohit/Bodytype_Classification/Body_type_extracts/hatchback/" , dir2 = "/mnt/03modeling/5_Rohit/Bodytype_Classification/Body_type_extracts/suv/" )
print(duplicate)
duplicate = validate.find_dir_common_files(dir1="/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image" , dir2 = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image" )
duplicate

[]


['batch1_dent_img_000009.png',
 'batch1_dent_img_000014.png',
 'Front_Bumper_(Fascia)_Carrier_Single_Motor_RGB.png',
 'batch1_dent_img_000014 copy.png',
 'Auction_A1_20221106_130911.png',
 'Rear_Energy_Absorber_RGB.png',
 'gettyimages-1387151277-612x612.jpg',
 'IMG_2645.JPG',
 '3.jpeg',
 'apple.jpg',
 'batch1_dent_img_000014 copy 2.png',
 'cv_utility_logo.jpg']
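
A name-based comparison amounts to a set intersection of the two directory listings. The sketch below assumes only top-level regular files are considered and sorts the result for stable output (the output above is unsorted). `common_file_names` is a hypothetical name, not part of `cv_utility`.

```python
import os
import tempfile


def common_file_names(dir1, dir2):
    """Return the sorted file names present in both directories (top level only)."""
    def file_names(directory):
        return {
            name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        }

    return sorted(file_names(dir1) & file_names(dir2))


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory, names in [(first, ["a.png", "b.png"]), (second, ["b.png", "c.png"])]:
        for name in names:
            open(os.path.join(directory, name), "w").close()
    print(common_file_names(first, second))  # ['b.png']
```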

### Check image dimensions and size
`validate.image_size_and_dim(file_path, min_dim=(50, 50), max_size_mb=5)`: This function checks if an image meets the required minimum dimensions and maximum file size without fully loading the image.

Parameters:
- `file_path`: The path of the image to check.
- `min_dim`: Minimum dimensions (width, height) for the image (default is (50, 50)).
- `max_size_mb`: Maximum file size in megabytes (default is 5 MB).

Returns:
- `True` if the image meets the criteria.
- `False` otherwise.

In [7]:
from cv_utility import image_ops
image_1_path = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image/apple.jpg"
image_2_path = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image/gettyimages-1387151277-612x612.jpg"
image_1 = image_ops.load_rgb(image_1_path)
image_2 = image_ops.load_rgb(image_2_path)
image_1 = image_ops.resize(image_1,(320,320))
image_2 = image_ops.resize(image_2,(320,320))

In [8]:
print(image_1.shape)  # shape of the resized in-memory array; the checks below read the original file on disk
print(validate.image_size_and_dim(file_path=image_1_path))
print(validate.image_size_and_dim(file_path=image_1_path,min_dim=(2000,2000)))
print(validate.image_size_and_dim(file_path=image_1_path,min_dim=(500,500) , max_size_mb=1))
print(validate.image_size_and_dim(file_path=image_1_path,min_dim=(100,100) , max_size_mb=2))

(320, 320, 3)
True
False
True
True
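
One way to check dimensions "without fully loading the image" is to read only the file header. The sketch below does this for PNG files, where width and height are stored in the `IHDR` chunk within the first 24 bytes. It is an illustration only: the helper names are hypothetical, other formats such as JPEG need different parsing, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
import os
import struct
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(file_path):
    """Read (width, height) from the IHDR chunk, touching only the first 24 bytes."""
    with open(file_path, "rb") as handle:
        header = handle.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])  # big-endian unsigned 32-bit integers


def meets_size_and_dim(file_path, min_dim=(50, 50), max_size_mb=5):
    """True when the PNG is at least min_dim (width, height) and at most max_size_mb."""
    try:
        width, height = png_dimensions(file_path)
    except (OSError, ValueError):
        return False
    size_mb = os.path.getsize(file_path) / (1024 * 1024)
    return width >= min_dim[0] and height >= min_dim[1] and size_mb <= max_size_mb


with tempfile.TemporaryDirectory() as tmp_dir:
    # Header-only stand-in for a 320x240 PNG; enough for this check, not a viewable image.
    path = os.path.join(tmp_dir, "header_only.png")
    with open(path, "wb") as handle:
        handle.write(PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 320, 240))

    print(png_dimensions(path))                             # (320, 240)
    print(meets_size_and_dim(path))                         # True
    print(meets_size_and_dim(path, min_dim=(2000, 2000)))   # False
```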


### Check mask size with image
`validate.match_mask_size(image_path, mask_path)`: This function checks if the mask size matches the image size for segmentation without fully loading the image and mask.

Parameters:
- `image_path`: The path to the image file.
- `mask_path`: The path to the mask file.

Returns:
- `True` if the sizes of the image and mask match.
- `False` otherwise.

In [9]:
mask1_path = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/mask/batch1_dent_img_000009.png"
image1_path = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image/batch1_dent_img_000009.png"
mask2_path= "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/mask/batch1_dent_img_000014.png"
image2_path = "/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image/batch1_dent_img_000014.png"

In [10]:
print(validate.match_mask_size(image_path=image1_path , mask_path=mask1_path))
print(validate.match_mask_size(image_path=image2_path , mask_path=mask2_path))
print(validate.match_mask_size(image_path=image2_path , mask_path=image_1_path))

True
True
False


### Search in directory
`validate.search_dir(dir, filename)`: This function checks if a file exists in the specified directory.

Parameters:
- `dir`: The directory to search in.
- `filename`: The name of the file to search for.

Returns:
- `True` if the file exists in the directory.
- `False` otherwise.

In [11]:
print(validate.search_dir(dir="/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image" , filename="apple.jpg"))
print(validate.search_dir(dir="/home/azureuser/cloudfiles/code/Users/rohit.chandra/scripts/cv_utility/examples/test_image" , filename="apple.png"))

True
False


### Search in CSV
`validate.search_csv(csv_path, column, to_search)`: This function checks if a value exists in a specific column of a CSV file.

Parameters:
- `csv_path`: The path to the CSV file.
- `column`: The column name or index to search in.
- `to_search`: The value to search for in the column.

Returns:
- `True` if the value is found in the specified column.
- `False` otherwise.

In [12]:
print(validate.search_csv(csv_path="/mnt/03modeling/5_Rohit/Bodytype_Classification/val.csv" , column="image_name" , to_search="IMG_6601.JPG"))
print(validate.search_csv(csv_path="/mnt/03modeling/5_Rohit/Bodytype_Classification/val.csv" , column="image_name" , to_search="IMG_601.JPG"))

True
False
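
A column lookup like this can be sketched with the standard `csv` module. The assumptions here are not confirmed by the library: the file has a header row, values are compared as exact strings, and an integer `column` is a zero-based position. `value_in_csv_column` is a hypothetical name.

```python
import csv
import os
import tempfile


def value_in_csv_column(csv_path, column, to_search):
    """True when `to_search` appears in the given column (header name or zero-based index)."""
    with open(csv_path, newline="") as handle:
        if isinstance(column, int):
            reader = csv.reader(handle)
            next(reader, None)  # assumption: the first row is a header, so skip it
            return any(len(row) > column and row[column] == to_search for row in reader)
        reader = csv.DictReader(handle)
        return any(row.get(column) == to_search for row in reader)


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "val.csv")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerows([["image_name", "label"], ["IMG_6601.JPG", "suv"]])

    print(value_in_csv_column(csv_path, "image_name", "IMG_6601.JPG"))  # True
    print(value_in_csv_column(csv_path, "image_name", "IMG_601.JPG"))   # False
    print(value_in_csv_column(csv_path, 1, "suv"))                      # True
```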
