### `Step-2` : Image Resolution Filter
- High-quality images help the model learn more accurately. After around 3,000 images had been collected, the dataset was moved to an Ubuntu system for faster processing with Apache Spark (PySpark). A file system shared between Ubuntu and Windows was used to move the data across.
- The dataset was then copied from the local file system into HDFS (Hadoop Distributed File System) for storage and processing. The following commands were run in the Ubuntu terminal to perform the transfer:
  - `hdfs dfs -mkdir -p /user1/dangerous_animals`
  - `hdfs dfs -put /home/hduser/Desktop/dangerous_animals/* /user1/dangerous_animals`
  - `hdfs dfs -chmod 777 /user1/dangerous_animals`
  - `hdfs dfs -ls /user1/dangerous_animals`
- Once the data was in HDFS, I applied an image resolution filter to remove low-resolution images (width or height below 1,000 pixels) across all image types. The cell below shows the run for the `scorpion` folder.
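
The rule behind the filter is easy to misread: an image counts as low resolution when either its width or its height is below the threshold, so a wide but short image is removed too. Here is a minimal sketch of that rule in plain Python. The helper `is_low_resolution` and its parameter names are illustrative and not part of the notebook.

```python
def is_low_resolution(width, height, min_width=1000, min_height=1000):
    """Return True if either dimension is below its minimum (hypothetical helper)."""
    return width < min_width or height < min_height


# A few sizes checked against the 1000 x 1000 threshold used in this step
for size in [(1920, 1080), (1200, 800), (1000, 1000), (640, 480)]:
    verdict = "delete" if is_low_resolution(*size) else "keep"
    print(size, verdict)
```

Because the comparison is strict, an image that is exactly 1000 x 1000 is kept.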



In [1]:
from pyspark.sql import SparkSession
from PIL import Image
from io import BytesIO

# Create a SparkSession, or reuse the one already running in this notebook
spark = SparkSession.builder.appName("ImageFilterLowPixel").getOrCreate()

# HDFS path to the target folder
hdfs_folder = "hdfs://localhost:9000/user1/dangerous_animals/scorpion/"

# Pixel threshold for deletion (width, height)
pixel_threshold = (1000, 1000)

# Return True if either dimension of the image is below the pixel threshold
def is_low_pixel_image(file_path):
    try:
        # Fetch the file's bytes from HDFS to the driver (one Spark job per file)
        image_data = spark.sparkContext.binaryFiles(file_path).collect()[0][1]
        # Image.open is lazy: the size is available without decoding the pixel data
        with Image.open(BytesIO(image_data)) as image:
            width, height = image.size

        return width < pixel_threshold[0] or height < pixel_threshold[1]
    except Exception as e:
        # Unreadable files are reported and kept rather than deleted
        print(f"Error processing {file_path}: {e}")
        return False

# List files in the HDFS folder
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
folder = jvm.org.apache.hadoop.fs.Path(hdfs_folder)
fs = folder.getFileSystem(hadoop_conf)  # file system that owns this path
status = fs.listStatus(folder)

# Keep only image files (the extension check ignores case)
image_extensions = ('.jpg', '.jpeg', '.png')
image_files = [
    entry.getPath().toString() for entry in status
    if entry.isFile() and entry.getPath().toString().lower().endswith(image_extensions)
]

# Debug: Print found files
print(f"Files found in {hdfs_folder}:")
for file in image_files:
    print(file)

# Track deleted files
deleted_count = 0

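# Delete low-resolution images
for file_path in image_files:
    print(f"Checking file: {file_path}")
    if is_low_pixel_image(file_path):
        print(f"Deleting low resolution image: {file_path}")
        try:
            # Second argument is the recursive flag; False because each path is a single file.
            # fs.delete returns False when nothing was removed, so only count real deletions.
            if fs.delete(spark.sparkContext._jvm.org.apache.hadoop.fs.Path(file_path), False):
                print(f"Successfully deleted: {file_path}")
                deleted_count += 1
            else:
                print(f"Failed to delete {file_path}")
        except Exception as e:
            print(f"Failed to delete {file_path}: {e}")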

# Summary
print(f"Total deleted images: {deleted_count}")
print("Processing completed.")


Files found in hdfs://localhost:9000/user1/dangerous_animals/scorpion/:
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000001.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000002.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000003.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000004.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000005.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000007.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000008.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000009.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000010.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000011.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000012.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000013.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000014.jpg
hdfs://localhost:9000/user1/dangerous_animals/scorpion/000015.jpg
hdfs

Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000002.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000003.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000004.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000005.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000007.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000007.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000007.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000008.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000008.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000008.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000009.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dang

Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000051.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000051.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000052.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000052.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000052.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000053.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000053.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000053.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000054.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000054.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/000054.jpg
Checking file: hd

Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_18.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_18.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_2.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_2.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_2.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_21.jpg
Deleting low resolution image: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_21.jpg
Successfully deleted: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_21.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_3.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorpion_37.jpg
Checking file: hdfs://localhost:9000/user1/dangerous_animals/scorpion/scorp
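
The cell above starts a separate Spark job for every file, because `binaryFiles` is called once per path and the bytes are collected to the driver. As a possible alternative, the sketch below reads the whole folder with a single `binaryFiles` call and checks the sizes in parallel, returning only the paths that fall below the threshold. The names `find_low_res_paths` and `pillow_size` and the `get_size` parameter are assumptions made for illustration, not part of the notebook, and nothing is deleted here.

```python
from io import BytesIO


def pillow_size(content):
    """Return (width, height) of an encoded image using Pillow."""
    from PIL import Image  # imported inside the function, where it actually runs

    with Image.open(BytesIO(content)) as image:
        return image.size


def find_low_res_paths(spark, folder, threshold=(1000, 1000), get_size=pillow_size):
    """List paths under `folder` whose width or height is below `threshold`."""
    min_width, min_height = threshold

    def is_low(pair):
        _path, content = pair
        try:
            width, height = get_size(content)
        except Exception:
            return False  # unreadable files are kept, as in the cell above
        return width < min_width or height < min_height

    # binaryFiles yields (path, bytes) pairs; the filter runs across partitions
    return spark.sparkContext.binaryFiles(folder).filter(is_low).keys().collect()
```

With the session from the cell above, `find_low_res_paths(spark, hdfs_folder, pixel_threshold)` would return the candidate paths, which can be reviewed before each one is passed to `fs.delete`. Passing the size function as a parameter is only a convenience for trying the logic without Spark or Pillow installed.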