# **Proyecto - Plant Status**

Para este proyecto, se utiliz√≥ el dataset disponible en la plataforma Kaggle, espec√≠ficamente el titulado https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset/data Este conjunto de datos contiene im√°genes de diversas plantas en diferentes estados de salud, incluyendo tanto condiciones patol√≥gicas como muestras saludables.

Cabe mencionar que algunas de las clases presentes en el dataset √∫nicamente incluyen im√°genes de plantas en estado saludable. A pesar de esta limitaci√≥n, se decidi√≥ continuar trabajando con dichas clases para mantener la diversidad de especies vegetales representadas en el conjunto de datos.

Posteriormente, se realiz√≥ una reorganizaci√≥n del dataset con el fin de facilitar el procesamiento y la clasificaci√≥n. Para ello, se agruparon las im√°genes seg√∫n el tipo de planta, creando una estructura de carpetas nombradas con el nombre correspondiente a cada especie. Dentro de cada una de estas carpetas se almacenaron las im√°genes clasificadas por su estado, lo que permite una manipulaci√≥n m√°s ordenada y eficiente durante el desarrollo del modelo.

## ***Configuraci√≥n del entorno y Extracci√≥n***

### ***Verificaci√≥n y uso de GPU (CUDA) para el procesamiento de im√°genes***

Ahora, en el siguiente fragmento de c√≥digo se realiza una verificaci√≥n del entorno para comprobar si CUDA est√° disponible. Esto nos permite utilizar la GPU personal para acelerar el procesamiento de im√°genes durante el entrenamiento del modelo:

In [1]:
import torch

print("¬øCUDA disponible?:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("‚úÖ GPU detectada:", torch.cuda.get_device_name(0))
else:
    print("‚ùå No se detect√≥ GPU")


¬øCUDA disponible?: True
‚úÖ GPU detectada: NVIDIA GeForce RTX 3050


### ***Definici√≥n y verificaci√≥n de la ruta del dataset***


Antes de cargar las im√°genes del dataset, es fundamental asegurarse de que la ruta hacia la carpeta que contiene los datos est√© correctamente definida. En el siguiente fragmento de c√≥digo, se especifica la ruta local donde se encuentra almacenado el dataset y se verifica su existencia en el sistema. Esto permite detectar posibles errores tempranamente si la ruta es incorrecta o si los archivos no se han descargado adecuadamente.

In [2]:
import os

# Definir la ruta del dataset
dataset_path = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"

# Verificar que la carpeta existe
if os.path.exists(dataset_path):
    print(f"¬°El dataset se encontr√≥ en: {dataset_path}!")
else:
    print(f"Error: No se encontr√≥ la carpeta en {dataset_path}. Verifica la ruta.")

¬°El dataset se encontr√≥ en: C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset!


## ***An√°lisis exploratorio***

### ***Estructura del dataset***

Ver con que datos o carpetas se esta trabajando a lo largo de este colab

In [3]:
import os

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"

def list_folders_clean(directory):
    # Verificar que la carpeta existe
    if not os.path.exists(directory):
        print(f"Error: No se encontr√≥ la carpeta en {directory}.")
        return
    
    visited = set()
    folders_by_origin = {"color": [], "grayscale": [], "segmented": []}
    
    # Recorrer el directorio
    for root, dirs, _ in os.walk(directory):
        depth = root[len(directory):].count(os.sep)
        folder_name = os.path.basename(root)
        if root == directory:
            folder_name = "plantvillage dataset"
        
        full_path = os.path.normpath(os.path.join(root))
        if full_path not in visited and depth >= 1:
            visited.add(full_path)
            
            # Determinar el tipo (color, grayscale, segmented)
            origin = ""
            parent_path = os.path.dirname(root)
            parent_name = os.path.basename(parent_path)
            if depth == 1:
                origin = folder_name
            elif depth >= 2 and parent_name in ["color", "grayscale", "segmented"]:
                origin = parent_name
            elif depth >= 2:
                grandparent_path = os.path.dirname(parent_path)
                grandparent_name = os.path.basename(grandparent_path)
                if grandparent_name in ["color", "grayscale", "segmented"]:
                    origin = grandparent_name
            
            # Guardar carpetas de nivel 2 o mayor
            if depth >= 2 and origin:
                is_simple = "___" not in folder_name
                folders_by_origin[origin].append((depth, folder_name, is_simple))
    
    # Imprimir carpeta ra√≠z
    print(f"üìÅ plantvillage dataset")
    print(f"{'=' * 50}")
    
    # Imprimir carpetas por tipo
    for origin in ["color", "grayscale", "segmented"]:
        if folders_by_origin[origin]:
            print(f"\n{origin.upper()}")
            print(f"{'=' * 50}")
            for depth, folder_name, is_simple in sorted(folders_by_origin[origin], key=lambda x: x[1]):
                indent = "  " * (depth - 1)
                icon = "üå±" if is_simple else "üçÉ"
                print(f"{indent}{icon} {folder_name}")

if __name__ == "__main__":
    list_folders_clean(DATASET_PATH)

üìÅ plantvillage dataset

COLOR
    üå± Apple
      üçÉ Apple___Apple_scab
      üçÉ Apple___Black_rot
      üçÉ Apple___Cedar_apple_rust
      üçÉ Apple___healthy
    üå± Blueberry
      üçÉ Blueberry___healthy
    üå± Cherry
      üçÉ Cherry_(including_sour)___Powdery_mildew
      üçÉ Cherry_(including_sour)___healthy
    üå± Corn
      üçÉ Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
      üçÉ Corn_(maize)___Common_rust_
      üçÉ Corn_(maize)___Northern_Leaf_Blight
      üçÉ Corn_(maize)___healthy
    üå± Grape
      üçÉ Grape___Black_rot
      üçÉ Grape___Esca_(Black_Measles)
      üçÉ Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
      üçÉ Grape___healthy
    üå± Orange
      üçÉ Orange___Haunglongbing_(Citrus_greening)
    üå± Peach
      üçÉ Peach___Bacterial_spot
      üçÉ Peach___healthy
    üå± Pepper
      üçÉ Pepper,_bell___Bacterial_spot
      üçÉ Pepper,_bell___healthy
    üå± Potato
      üçÉ Potato___Early_blight
      üçÉ Potato___La

El dataset PlantVillage contiene im√°genes de m√∫ltiples especies de plantas en tres formatos: color, escala de grises y segmentado. Cada especie incluye diferentes estados, que abarcan desde plantas saludables hasta diversas enfermedades comunes.

Este conjunto de datos ofrece una amplia variedad de condiciones para el entrenamiento y evaluaci√≥n de modelos de clasificaci√≥n y diagn√≥stico de enfermedades en plantas, lo que lo convierte en un recurso valioso para proyectos de aprendizaje autom√°tico en agricultura.

### ***Formato o extension de las imagenes***

Se verific√≥ la extensi√≥n de las im√°genes en el dataset y, para nuestro trabajo, se utilizar√°n √∫nicamente las im√°genes con extensi√≥n .jpg.

In [4]:
import os
from collections import defaultdict

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"

def analyze_image_extensions(directory):
    # Verificar que la carpeta existe
    if not os.path.exists(directory):
        print(f"Error: No se encontr√≥ la carpeta en {directory}.")
        return
    
    # Almacenar conteo de extensiones
    ext_counts = defaultdict(int)
    total_files = 0
    file_extensions = ('.jpg', '.jpeg', '.png', '.bmp', '.gif')
    
    # Recorrer el directorio
    for root, _, files in os.walk(directory):
        for file in files:
            ext = os.path.splitext(file)[1].lower()
            if ext in file_extensions:
                ext_counts[ext] += 1
                total_files += 1
    
    # Imprimir resumen
    print(f"{'=' * 30}")
    print("Extensiones de Im√°genes")
    print(f"{'=' * 30}")
    print(f"{'Extensi√≥n':<10} {'Conteo':>10}")
    print("-" * 22)
    for ext in sorted(ext_counts):
        print(f"{ext:<10} {ext_counts[ext]:>10}")
    print(f"\nTotal archivos: {total_files}")

if __name__ == "__main__":
    analyze_image_extensions(DATASET_PATH)

Extensiones de Im√°genes
Extensi√≥n      Conteo
----------------------
.jpg           162916

Total archivos: 162916


Se encontr√≥ la siguiente distribuci√≥n de extensiones en las im√°genes del dataset:

* Archivos con extensi√≥n .jpeg: 2

* Archivos con extensi√≥n .jpg: 162,912

* Archivos con extensi√≥n .png: 2

En total, el dataset contiene 162,916 archivos de imagen.

Dado que la gran mayor√≠a de las im√°genes utilizan la extensi√≥n .jpg, se requerir√° unificar todas las extensiones al formato .jpg para facilitar el procesamiento, estandarizaci√≥n y evitar errores por incompatibilidades en la lectura o filtrado por extensi√≥n.

### ***Resoluci√≥nes dentro del dataset por planta y tipo de Imagen***

Utilizaremos CUDA para procesar las im√°genes directamente en nuestra tarjeta gr√°fica.
A continuaci√≥n, se muestra un ejemplo de c√≥digo para verificar que CUDA est√© funcionando correctamente:

In [3]:
import cupy as cp
print(f"CUDA version: {cp.cuda.runtime.runtimeGetVersion()}")
print(f"GPU devices: {cp.cuda.runtime.getDeviceCount()}")

CUDA version: 12090
GPU devices: 1


Ahora analizaremos las dimensiones de las im√°genes por cada carpeta (color, grayscale, y segmented) y por cada planta dentro del dataset.
Debido a posibles cuellos de botella en el procesamiento, este an√°lisis se realizar√° planta por planta y carpeta por carpeta, de forma secuencial. 

***Conclusi√≥n del an√°lisis de dimensiones de las im√°genes***



Durante el an√°lisis del dataset, se detectaron dimensiones distintas a 256 x 256, particularmente en la carpeta segmented de las plantas Peach, Strawberry y Potato.

Sin embargo, para asegurar la compatibilidad con el modelo a utilizar ‚ÄîResNet, el cual requiere entradas de 224 x 224 p√≠xeles‚Äî, ser√° necesario redimensionar todas las im√°genes a 224 x 224, sin importar su dimensi√≥n original.

Esta transformaci√≥n garantiza una entrada homog√©nea al modelo, evitando errores durante la etapa de entrenamiento o inferencia.

#### ***Color***

##### ***Apple***

In [6]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Apple"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3171/3171 [00:01<00:00, 3091.00it/s]


Dimensiones de Im√°genes (Apple - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               3171

Total im√°genes procesadas: 3171


##### ***Blueberry***

In [7]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Blueberry"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1502/1502 [00:00<00:00, 5103.20it/s]

Dimensiones de Im√°genes (Blueberry - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               1502

Total im√°genes procesadas: 1502





##### ***Cherry***

In [8]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Cherry"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1906/1906 [00:00<00:00, 4925.06it/s]

Dimensiones de Im√°genes (Cherry - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               1906

Total im√°genes procesadas: 1906





##### ***Corn***

In [9]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Corn"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3852/3852 [00:00<00:00, 4767.34it/s]

Dimensiones de Im√°genes (Corn - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               3852

Total im√°genes procesadas: 3852





##### ***Grape***

In [10]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Grape"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4062/4062 [00:00<00:00, 4864.67it/s]

Dimensiones de Im√°genes (Grape - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               4062

Total im√°genes procesadas: 4062





##### ***Orange***

In [12]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Orange"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5507/5507 [00:01<00:00, 3979.02it/s]

Dimensiones de Im√°genes (Orange - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               5507

Total im√°genes procesadas: 5507





##### ***Peach***

In [13]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Peach"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2657/2657 [00:00<00:00, 4653.24it/s]

Dimensiones de Im√°genes (Peach - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               2657

Total im√°genes procesadas: 2657





##### ***Pepper***

In [14]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Pepper"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2475/2475 [00:00<00:00, 5195.23it/s]

Dimensiones de Im√°genes (Pepper - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               2475

Total im√°genes procesadas: 2475





##### ***Potato***

In [16]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Potato"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes:   0%|          | 0/2152 [00:00<?, ?it/s]

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2152/2152 [00:00<00:00, 4981.51it/s]

Dimensiones de Im√°genes (Potato - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               2152

Total im√°genes procesadas: 2152





##### ***Raspberry***

In [17]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Raspberry"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 371/371 [00:00<00:00, 4818.16it/s]

Dimensiones de Im√°genes (Raspberry - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256                371

Total im√°genes procesadas: 371





##### ***Soybean***

In [18]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Soybean"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5090/5090 [00:01<00:00, 4975.56it/s]

Dimensiones de Im√°genes (Soybean - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               5090

Total im√°genes procesadas: 5090





##### ***Squash***

In [19]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Squash"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1835/1835 [00:00<00:00, 3553.04it/s]

Dimensiones de Im√°genes (Squash - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               1835

Total im√°genes procesadas: 1835





##### ***Strawberry***

In [20]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Strawberry"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1565/1565 [00:00<00:00, 5212.53it/s]

Dimensiones de Im√°genes (Strawberry - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256               1565

Total im√°genes procesadas: 1565





##### ***Tomato***

In [21]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Tomato"
# Categor√≠a a procesar
CATEGORY = "color"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 18160/18160 [00:03<00:00, 4927.47it/s]

Dimensiones de Im√°genes (Tomato - COLOR)
Dimensi√≥n           Conteo
---------------------------
256x256              18160

Total im√°genes procesadas: 18160





#### ***Grayscale***

##### ***Apple***

In [22]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Apple"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3171/3171 [00:00<00:00, 4704.76it/s]

Dimensiones de Im√°genes (Apple - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               3171

Total im√°genes procesadas: 3171





##### ***Blueberry***

In [2]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Blueberry"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1502/1502 [00:00<00:00, 5900.12it/s]

Dimensiones de Im√°genes (Blueberry - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               1502

Total im√°genes procesadas: 1502





##### ***Cherry***

In [5]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Cherry"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1906/1906 [00:00<00:00, 4933.46it/s]

Dimensiones de Im√°genes (Cherry - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               1906

Total im√°genes procesadas: 1906





##### ***Corn***

In [6]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Corn"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3852/3852 [00:00<00:00, 5632.23it/s]

Dimensiones de Im√°genes (Corn - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               3852

Total im√°genes procesadas: 3852





##### ***Grape***

In [7]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Grape"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4062/4062 [00:00<00:00, 5439.98it/s]

Dimensiones de Im√°genes (Grape - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               4062

Total im√°genes procesadas: 4062





##### ***Orange***

In [8]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Orange"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5507/5507 [00:01<00:00, 5465.90it/s]

Dimensiones de Im√°genes (Orange - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               5507

Total im√°genes procesadas: 5507





##### ***Peach***

In [9]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Peach"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2657/2657 [00:00<00:00, 5805.21it/s]

Dimensiones de Im√°genes (Peach - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               2657

Total im√°genes procesadas: 2657





##### ***Pepper***

In [10]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Pepper"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2475/2475 [00:00<00:00, 5093.92it/s]

Dimensiones de Im√°genes (Pepper - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               2475

Total im√°genes procesadas: 2475





##### ***Potato***

In [12]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Potato"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2152/2152 [00:00<00:00, 5764.15it/s]

Dimensiones de Im√°genes (Potato - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               2152

Total im√°genes procesadas: 2152





##### ***Raspberry***

In [13]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Raspberry"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 371/371 [00:00<00:00, 5640.02it/s]

Dimensiones de Im√°genes (Raspberry - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256                371

Total im√°genes procesadas: 371





##### ***Soybean***

In [14]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Soybean"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5090/5090 [00:00<00:00, 5485.34it/s]

Dimensiones de Im√°genes (Soybean - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               5090

Total im√°genes procesadas: 5090





##### ***Squash***

In [15]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Squash"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1835/1835 [00:00<00:00, 5462.27it/s]

Dimensiones de Im√°genes (Squash - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               1835

Total im√°genes procesadas: 1835





##### ***Strawberry***

In [16]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Strawberry"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1565/1565 [00:00<00:00, 4875.42it/s]

Dimensiones de Im√°genes (Strawberry - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256               1565

Total im√°genes procesadas: 1565





##### ***Tomato***

In [18]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Tomato"
# Categor√≠a a procesar
CATEGORY = "grayscale"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 18160/18160 [00:03<00:00, 5530.09it/s]

Dimensiones de Im√°genes (Tomato - GRAYSCALE)
Dimensi√≥n           Conteo
---------------------------
256x256              18160

Total im√°genes procesadas: 18160





#### ***Segmented***

##### ***Apple***

In [19]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Apple"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3171/3171 [00:00<00:00, 4925.75it/s]

Dimensiones de Im√°genes (Apple - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               3171

Total im√°genes procesadas: 3171





##### ***Blueberry***

In [20]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Blueberry"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1502/1502 [00:00<00:00, 4747.68it/s]

Dimensiones de Im√°genes (Blueberry - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               1502

Total im√°genes procesadas: 1502





##### ***Cherry***

In [21]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Cherry"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1906/1906 [00:00<00:00, 5193.47it/s]

Dimensiones de Im√°genes (Cherry - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               1906

Total im√°genes procesadas: 1906





##### ***Corn***

In [22]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Corn"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3852/3852 [00:01<00:00, 3468.22it/s]

Dimensiones de Im√°genes (Corn - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               3852

Total im√°genes procesadas: 3852





##### ***Grape***

In [23]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Grape"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4063/4063 [00:00<00:00, 4969.45it/s]

Dimensiones de Im√°genes (Grape - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               4063

Total im√°genes procesadas: 4063





##### ***Orange***

In [24]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Orange"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5507/5507 [00:01<00:00, 5067.89it/s]

Dimensiones de Im√°genes (Orange - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               5507

Total im√°genes procesadas: 5507





##### ***Peach***
(Otras dimensiones aparte de: 256x 256)

In [27]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Peach"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2657/2657 [00:00<00:00, 5261.41it/s]

Dimensiones de Im√°genes (Peach - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               2655
324x512                  1
466x512                  1

Total im√°genes procesadas: 2657





##### ***Pepper***

In [28]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Pepper"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2475/2475 [00:00<00:00, 5333.59it/s]

Dimensiones de Im√°genes (Pepper - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               2475

Total im√°genes procesadas: 2475





##### ***Potato***

In [29]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Potato"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2152/2152 [00:00<00:00, 5096.48it/s]

Dimensiones de Im√°genes (Potato - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               2152

Total im√°genes procesadas: 2152





##### ***Raspberry***

In [30]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Raspberry"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 371/371 [00:00<00:00, 4685.99it/s]

Dimensiones de Im√°genes (Raspberry - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256                371

Total im√°genes procesadas: 371





##### ***Soybean***

In [31]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Soybean"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 5090/5090 [00:00<00:00, 5365.50it/s]

Dimensiones de Im√°genes (Soybean - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               5090

Total im√°genes procesadas: 5090





##### ***Squash***

In [32]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Squash"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1835/1835 [00:00<00:00, 4558.84it/s]

Dimensiones de Im√°genes (Squash - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               1835

Total im√°genes procesadas: 1835





##### ***Strawberry***
(Otras dimensiones aparte de: 256x 256)

In [33]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Strawberry"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1565/1565 [00:00<00:00, 5115.83it/s]

Dimensiones de Im√°genes (Strawberry - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256               1564
470x512                  1

Total im√°genes procesadas: 1565





##### ***Tomato***
(Otras dimensiones aparte de: 256x 256)

In [34]:
import os
from PIL import Image
import cupy as cp
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
# Nombre de la planta a procesar
PLANT_NAME = "Tomato"
# Categor√≠a a procesar
CATEGORY = "segmented"
# N√∫mero de hilos para paralelizaci√≥n
MAX_WORKERS = 16

def collect_image_paths(directory):
    file_paths = []
    for root, _, files in os.walk(directory):
        if PLANT_NAME in root:
            for file in files:
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    file_paths.append(os.path.join(root, file))
    return file_paths

def process_image(file_path):
    try:
        with Image.open(file_path) as img:
            return (img.size[0], img.size[1])
    except Exception as e:
        print(f"Error al procesar {file_path}: {e}")
        return None

def analyze_plant_dimensions():
    category_path = os.path.join(DATASET_PATH, CATEGORY, PLANT_NAME)
    if not os.path.exists(category_path):
        print(f"Error: No se encontr√≥ la carpeta {category_path}.")
        return None, None, 0
    
    file_paths = collect_image_paths(category_path)
    dimensions = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Usar tqdm para mostrar progreso
        with tqdm(total=len(file_paths), desc="Procesando im√°genes") as pbar:
            for dim in executor.map(process_image, file_paths):
                if dim is not None:
                    dimensions.append(dim)
                pbar.update(1)
    
    try:
        dim_array = cp.array(dimensions, dtype=cp.int32)
        unique_dims, counts = np.unique(dim_array.get(), axis=0, return_counts=True)
    except Exception as e:
        print(f"Error en el c√°lculo de dimensiones: {e}")
        return None, None, len(dimensions)
    
    return unique_dims, counts, len(dimensions)

if __name__ == "__main__":
    unique_dims, counts, total_dims = analyze_plant_dimensions()
    if unique_dims is not None and counts is not None:
        print(f"{'=' * 30}")
        print(f"Dimensiones de Im√°genes ({PLANT_NAME} - {CATEGORY.upper()})")
        print(f"{'=' * 30}")
        print(f"{'Dimensi√≥n':<15} {'Conteo':>10}")
        print("-" * 27)
        for dim, count in zip(unique_dims, counts):
            print(f"{f'{dim[0]}x{dim[1]}':<15} {count:>10}")
        print(f"\nTotal im√°genes procesadas: {total_dims}")

Procesando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 18160/18160 [00:03<00:00, 5134.67it/s]

Dimensiones de Im√°genes (Tomato - SEGMENTED)
Dimensi√≥n           Conteo
---------------------------
256x256              18159
335x512                  1

Total im√°genes procesadas: 18160





### ***Detecci√≥n de im√°genes corruptas o vac√≠as***


Se realizar√° una revisi√≥n del dataset para identificar im√°genes corruptas o vac√≠as, con el objetivo de determinar si es necesario eliminarlas o corregirlas antes de continuar con el preprocesamiento y entrenamiento del modelo. Esta validaci√≥n es fundamental para garantizar la calidad y consistencia de los datos.

In [5]:
import os
from PIL import Image
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
MAX_WORKERS = 16

def collect_image_paths(input_dir):
    """Recolecta rutas de im√°genes .jpg, .jpeg y .png con barra de progreso."""
    file_paths = []
    for root, _, files in tqdm(os.walk(input_dir), desc="Recolectando im√°genes"):
        for file in files:
            if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                file_paths.append(os.path.join(root, file))
    return file_paths

def check_image(file_path):
    """Verifica si una imagen es v√°lida o est√° corrupta."""
    try:
        if os.path.getsize(file_path) == 0:
            return file_path, "Archivo vac√≠o (0 bytes)"
        with Image.open(file_path) as img:
            img.verify()
            img = Image.open(file_path)
            img.load()
            if img.size[0] == 0 or img.size[1] == 0:
                return file_path, "Dimensiones inv√°lidas"
        return file_path, None
    except Exception as e:
        return file_path, f"Error: {str(e)}"

def check_dataset(input_dir, max_workers=MAX_WORKERS):
    """Verifica im√°genes en paralelo y genera informe."""
    file_paths = collect_image_paths(input_dir)
    corrupted_images = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_file = {executor.submit(check_image, path): path for path in file_paths}
        for future in tqdm(as_completed(future_to_file), total=len(file_paths), desc="Verificando im√°genes"):
            file_path, error = future.result()
            if error:
                corrupted_images.append((file_path, error))

    # Resumen
    print(f"\n{'=' * 20}")
    print(f"Total im√°genes: {len(file_paths)}")
    print(f"Im√°genes corruptas: {len(corrupted_images)}")
    if corrupted_images:
        print("\nIm√°genes con problemas:")
        for file_path, error in corrupted_images:
            print(f"  {file_path}: {error}")
        output_file = r"C:\Users\Arys\Desktop\Proyecto - 2\corrupted_images.txt"
        with open(output_file, 'w') as f:
            for file_path, _ in corrupted_images:
                f.write(f"{file_path}\n")
        print(f"\nLista de im√°genes corruptas guardada en: {output_file}")
    else:
        print("‚úÖ No se encontraron im√°genes corruptas.")

    return corrupted_images

if __name__ == "__main__":
    check_dataset(DATASET_PATH)

Recolectando im√°genes: 161it [00:00, 268.70it/s]
Verificando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 162916/162916 [01:07<00:00, 2430.00it/s]



Total im√°genes: 162916
Im√°genes corruptas: 0
‚úÖ No se encontraron im√°genes corruptas.


### ***Buscar balanceo entre clases***

Se analizar√° la distribuci√≥n de im√°genes entre las distintas clases ‚Äîo en este caso, entre los diferentes estados de salud de las plantas‚Äî con el objetivo de verificar si existe un balance adecuado.
Este an√°lisis permitir√° identificar posibles desbalances que, en etapas posteriores, podr√≠an requerir t√©cnicas de reajuste de cantidades o asignaci√≥n de pesos por clase durante el entrenamiento del modelo, para evitar sesgos y mejorar el rendimiento general.

***Conclusi√≥n del an√°lisis***

Tras el an√°lisis realizado, se lleg√≥ a las siguientes conclusiones:

Balance entre clases:
Se observ√≥ un desbalance significativo en la cantidad de im√°genes entre distintas clases (estados de salud de las plantas). Esto indica que ser√° necesario aplicar t√©cnicas de balanceo, ya sea ajustando la cantidad de muestras por clase o utilizando pesos diferenciados durante el entrenamiento del modelo. Esta medida es crucial para evitar sesgos que afecten negativamente la capacidad de generalizaci√≥n del modelo, especialmente al emplear arquitecturas como ResNet.

Verificaci√≥n de im√°genes corruptas o vac√≠as:
Se ejecut√≥ un script para identificar im√°genes que presentaran errores como archivos vac√≠os, da√±os en la codificaci√≥n o dimensiones inv√°lidas.
El resultado mostr√≥ que existen im√°genes corruptas en el dataset, por lo que ser√° necesario eliminarlas o corregirlas antes de continuar con el preprocesamiento.
Se gener√≥ un archivo con la lista de im√°genes problem√°ticas para facilitar su depuraci√≥n:

In [12]:
import os
import pandas as pd
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
MAX_WORKERS = 16
OUTPUT_CSV = r"C:\Users\Arys\Desktop\Proyecto - 2\class_counts.csv"

def collect_class_counts(input_dir):
    """Recolecta conteos de im√°genes .jpg, .jpeg y .png por clase con barra de progreso."""
    class_counts = {}
    subdirs = ['color', 'grayscale', 'segmented']

    for subdir in tqdm(subdirs, desc="Procesando subdirectorios"):
        subdir_path = os.path.join(input_dir, subdir)
        if not os.path.exists(subdir_path):
            continue

        for plant in os.listdir(subdir_path):
            plant_path = os.path.join(subdir_path, plant)
            if not os.path.isdir(plant_path):
                continue

            for state in os.listdir(plant_path):
                state_path = os.path.join(plant_path, state)
                if not os.path.isdir(state_path):
                    continue

                images = [f for f in os.listdir(state_path) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
                if images:
                    class_name = f"{subdir}_{plant}_{state}"
                    class_counts[class_name] = len(images)

    return class_counts

def generate_balance_report(class_counts):
    """Genera informe de balance de clases."""
    if not class_counts:
        print("Error: No se encontraron clases con im√°genes.")
        return

    # Convertir a DataFrame y ordenar
    df = pd.DataFrame(list(class_counts.items()), columns=['Class', 'Count']).sort_values(by='Count', ascending=False)

    # Estad√≠sticas
    total_images = df['Count'].sum()
    num_classes = len(df)
    mean_count = df['Count'].mean()
    std_count = df['Count'].std() if len(df['Count']) > 1 else 0
    min_count = df['Count'].min()
    max_count = df['Count'].max()

    # Imprimir resumen
    print(f"\n{'=' * 20}")
    print(f"Total im√°genes: {total_images}")
    print(f"Total clases: {num_classes}")
    print(f"Media de im√°genes por clase: {mean_count:.2f}")
    print(f"Desviaci√≥n est√°ndar: {std_count:.2f}")
    print(f"M√≠nimo: {min_count} ({df.loc[df['Count'].idxmin(), 'Class']})")
    print(f"M√°ximo: {max_count} ({df.loc[df['Count'].idxmax(), 'Class']})")

    # Evaluar balance
    imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')
    print(f"Relaci√≥n de desbalance: {imbalance_ratio:.2f}")
    print("‚ö†Ô∏è Dataset desbalanceado. Considera balancear las clases." if imbalance_ratio > 2 and num_classes > 1 else "‚úÖ Dataset razonablemente balanceado.")

    # Guardar conteos en CSV
    df.to_csv(OUTPUT_CSV, index=False)
    print(f"Conteo guardado en: {OUTPUT_CSV}")

    # Mostrar clases extremas
    print("\nClases con m√°s im√°genes:")
    print(df.head(5)[['Class', 'Count']].to_string(index=False) if len(df) >= 5 else "No hay datos suficientes.")
    print("\nClases con menos im√°genes:")
    print(df.tail(5)[['Class', 'Count']].to_string(index=False) if len(df) >= 5 else "No hay datos suficientes.")

def main():
    """Ejecuta el an√°lisis de balance de clases."""
    class_counts = collect_class_counts(DATASET_PATH)
    generate_balance_report(class_counts)

if __name__ == "__main__":
    main()

Procesando subdirectorios: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3/3 [00:00<00:00, 14.42it/s]


Total im√°genes: 162916
Total clases: 114
Media de im√°genes por clase: 1429.09
Desviaci√≥n est√°ndar: 1260.43
M√≠nimo: 152 (color_Potato_Potato___healthy)
M√°ximo: 5507 (color_Orange_Orange___Haunglongbing_(Citrus_greening))
Relaci√≥n de desbalance: 36.23
‚ö†Ô∏è Dataset desbalanceado. Considera balancear las clases.
Conteo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\class_counts.csv

Clases con m√°s im√°genes:
                                                    Class  Count
    color_Orange_Orange___Haunglongbing_(Citrus_greening)   5507
grayscale_Orange_Orange___Haunglongbing_(Citrus_greening)   5507
segmented_Orange_Orange___Haunglongbing_(Citrus_greening)   5507
  segmented_Tomato_Tomato___Tomato_Yellow_Leaf_Curl_Virus   5357
  grayscale_Tomato_Tomato___Tomato_Yellow_Leaf_Curl_Virus   5357

Clases con menos im√°genes:
                                   Class  Count
grayscale_Apple_Apple___Cedar_apple_rust    275
segmented_Apple_Apple___Cedar_apple_rust    275
           color_




## ***Preprocesamiento***

### ***Conversi√≥n de formatos de imagen***

Se identificaron 2 archivos con extensi√≥n .jpeg y 2 archivos con extensi√≥n .png dentro del dataset.
Para mantener la uniformidad en el preprocesamiento, todas estas im√°genes fueron convertidas al formato .jpg, eliminando las versiones originales con extensiones distintas. Esto garantiza una estructura de datos homog√©nea para las etapas posteriores del flujo de trabajo.

In [13]:
import os
from PIL import Image
from tqdm import tqdm

# Ruta del dataset local
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"

def convert_only_png_jpeg_to_jpg_replace(input_dir):
    """Convierte im√°genes .png y .jpeg a .jpg en el mismo directorio, eliminando los originales."""
    target_extensions = ('.png', '.jpeg')
    converted_count = {'jpeg': 0, 'png': 0}
    total_files_processed = 0

    # Recorre directorios y busca im√°genes .png y .jpeg
    for root, _, files in tqdm(os.walk(input_dir), desc="Procesando directorios"):
        for file in files:
            ext = os.path.splitext(file)[1].lower()
            if ext in target_extensions:
                input_path = os.path.join(root, file)
                output_path = os.path.join(root, os.path.splitext(file)[0] + '.jpg')

                try:
                    # Convierte la imagen a RGB y guarda como .jpg
                    with Image.open(input_path) as img:
                        if img.mode != 'RGB':
                            img = img.convert('RGB')
                        img.save(output_path, 'JPEG', quality=95)
                    converted_count[ext[1:]] += 1
                    total_files_processed += 1
                    # Elimina el archivo original
                    os.remove(input_path)
                except Exception as e:
                    print(f"Error al procesar {input_path}: {e}")

    # Imprime resumen de conversiones
    print(f"\n{'=' * 20}")
    print(f"Im√°genes .jpeg convertidas: {converted_count['jpeg']}")
    print(f"Im√°genes .png convertidas: {converted_count['png']}")
    print(f"Total archivos procesados: {total_files_processed}")
    if total_files_processed == 0:
        print("No se encontraron im√°genes .png o .jpeg.")

    return converted_count

if __name__ == "__main__":
    convert_only_png_jpeg_to_jpg_replace(DATASET_PATH)

Procesando directorios: 160it [00:00, 333.34it/s]


Im√°genes .jpeg convertidas: 0
Im√°genes .png convertidas: 0
Total archivos procesados: 0
No se encontraron im√°genes .png o .jpeg.





### ***Redimenzionamiento de imagenes a 224***

Se redimensionaron todas las im√°genes a 224 x 224 p√≠xeles, ya que la mayor√≠a se encontraba en dimensiones como 256 x 256 u otras variantes. Esta estandarizaci√≥n es necesaria para asegurar la compatibilidad con el modelo ResNet, que requiere una entrada fija de dicha dimensi√≥n.

In [14]:
import os
from PIL import Image
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset"
OUTPUT_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_resized"
TARGET_SIZE = (224, 224)
MAX_WORKERS = 16

def collect_image_paths(input_dir):
    """Recolecta rutas de im√°genes .jpg."""
    file_paths = []
    # Recorre directorios y recolecta im√°genes .jpg
    for root, _, files in tqdm(os.walk(input_dir), desc="Recolectando im√°genes"):
        for file in files:
            if file.lower().endswith('.jpg'):
                file_paths.append(os.path.join(root, file))
    return file_paths

def resize_image(file_path, output_dir, target_size):
    """Redimensiona una imagen .jpg y la guarda en el directorio de salida."""
    try:
        # Crea subdirectorio de salida manteniendo la estructura
        relative_path = os.path.relpath(os.path.dirname(file_path), DATASET_PATH)
        output_subdir = os.path.join(output_dir, relative_path)
        os.makedirs(output_subdir, exist_ok=True)
        output_path = os.path.join(output_subdir, os.path.basename(file_path))

        # Abre, convierte a RGB si es necesario, redimensiona y guarda
        with Image.open(file_path) as img:
            if img.mode != 'RGB':
                img = img.convert('RGB')
            img_resized = img.resize(target_size, Image.LANCZOS)
            img_resized.save(output_path, 'JPEG', quality=95)
        return file_path, output_path, None
    except Exception as e:
        return file_path, None, str(e)

def resize_images(input_dir, output_dir, target_size, max_workers=MAX_WORKERS):
    """Redimensiona im√°genes .jpg en paralelo y genera informe."""
    file_paths = collect_image_paths(input_dir)
    resized_count = 0
    errors = []

    # Procesa im√°genes en paralelo
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_file = {executor.submit(resize_image, path, output_dir, target_size): path for path in file_paths}
        for future in tqdm(as_completed(future_to_file), total=len(file_paths), desc="Redimensionando im√°genes"):
            input_path, output_path, error = future.result()
            if error:
                errors.append((input_path, error))
            else:
                resized_count += 1

    # Imprime resumen
    print(f"\n{'=' * 20}")
    print(f"Im√°genes redimensionadas: {resized_count}")
    print(f"Errores: {len(errors)}")
    if errors:
        print("\nArchivos con errores:")
        for path, error in errors:
            print(f"  {path}: {error}")
    if resized_count == 0:
        print("No se encontraron im√°genes .jpg.")

if __name__ == "__main__":
    resize_images(DATASET_PATH, OUTPUT_PATH, TARGET_SIZE)

Recolectando im√°genes: 160it [00:00, 279.71it/s]
Redimensionando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 162916/162916 [02:51<00:00, 949.76it/s] 



Im√°genes redimensionadas: 162916
Errores: 0


### ***Balanceo de clases por cada planta y estado***

In [15]:
import os
import pandas as pd
from tqdm import tqdm

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_resized"
OUTPUT_WEIGHTS_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights"
OUTPUT_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "plant_weights.csv")
os.makedirs(OUTPUT_WEIGHTS_DIR, exist_ok=True)

def apply_plant_balancing(class_counts):
    """Aplica balanceo por planta con ponderaci√≥n inversa."""
    # Extrae plantas √∫nicas
    plants = set('_'.join(cls.split('_')[1:2]) for cls in class_counts['Class'])
    balanced_data = []

    # Procesa cada planta
    for plant in tqdm(plants, desc="Balanceando plantas"):
        # Filtra clases de la planta actual
        plant_classes = [cls for cls in class_counts['Class'] if f"_{plant}_" in cls]
        plant_data = class_counts[class_counts['Class'].isin(plant_classes)].copy()

        if plant_data.empty:
            continue

        # Calcula pesos inversos para balanceo
        total_images = plant_data['Count'].sum()
        if total_images == 0:
            continue
        plant_data['Weight'] = total_images / plant_data['Count']
        weight_sum = plant_data['Weight'].sum()
        plant_data['Weight'] = plant_data['Weight'] / weight_sum

        # Genera lista de im√°genes por clase
        for _, row in plant_data.iterrows():
            img_parts = row['Class'].split('_')
            img_type = img_parts[0]
            plant_name = img_parts[1]
            state = '_'.join(img_parts[2:])
            state_path = os.path.join(DATASET_PATH, img_type, plant_name, state)

            if os.path.exists(state_path):
                images = [f for f in os.listdir(state_path) if f.lower().endswith('.jpg')]
                for img in images[:min(len(images), row['Count'])]:
                    balanced_data.append([row['Class'], img, row['Weight']])

    # Guarda datos balanceados en CSV
    if balanced_data:
        balanced_df = pd.DataFrame(balanced_data, columns=['Class', 'Image', 'Weight'])
        balanced_df.to_csv(OUTPUT_WEIGHTS, index=False)
        print(f"\n{'=' * 20}")
        print(f"Total im√°genes balanceadas: {len(balanced_df)}")
        print(f"Clases procesadas: {len(balanced_df['Class'].unique())}")
        print(f"Pesos guardados en: {OUTPUT_WEIGHTS}")
    else:
        print("Error: No se generaron datos balanceados. Verifica las rutas y el CSV.")

# Carga conteos y aplica balanceo
if __name__ == "__main__":
    class_counts = pd.read_csv(r"C:\Users\Arys\Desktop\Proyecto - 2\class_counts.csv")
    apply_plant_balancing(class_counts)

Balanceando plantas: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 14/14 [00:05<00:00,  2.56it/s]



Total im√°genes balanceadas: 162916
Clases procesadas: 114
Pesos guardados en: C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\plant_weights.csv


### ***Cargamos los pesos***

In [16]:
import pandas as pd
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import transforms
from tqdm import tqdm

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_resized"
WEIGHTS_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\plant_weights.csv"
BATCH_SIZE = 32
NUM_WORKERS = 4

# Transformaciones para entrenamiento
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

class PlantDataset(Dataset):
    """Carga im√°genes .jpg con pesos para balanceo."""
    def __init__(self, weights_df, root_dir, transform=None):
        self.data = weights_df
        self.root_dir = root_dir
        self.transform = transform
        self.class_to_idx = {cls: idx for idx, cls in enumerate(weights_df['Class'].unique())}

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        """Obtiene imagen, etiqueta y peso."""
        row = self.data.iloc[idx]
        img_parts = row['Class'].split('_')
        img_path = f"{self.root_dir}/{img_parts[0]}/{img_parts[1]}/{'_'.join(img_parts[2:])}/{row['Image']}"

        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        label = self.class_to_idx[row['Class']]
        weight = row['Weight']
        return image, label, weight

def create_weighted_dataloader(weights_path, root_dir, batch_size=BATCH_SIZE, num_workers=NUM_WORKERS):
    """Crea DataLoader con muestreo ponderado."""
    # Carga datos balanceados
    weights_df = pd.read_csv(weights_path)
    
    # Crea dataset con barra de progreso
    dataset = PlantDataset(weights_df, root_dir, transform=train_transforms)
    
    # Configura WeightedRandomSampler
    weights = torch.tensor(weights_df['Weight'].values, dtype=torch.float)
    sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

    # Crea DataLoader
    dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, num_workers=num_workers)
    
    # Imprime resumen
    print(f"\n{'=' * 20}")
    print(f"Im√°genes cargadas: {len(dataset)}")
    print(f"Clases √∫nicas: {len(dataset.class_to_idx)}")
    
    return dataloader

if __name__ == "__main__":
    # Configura DataLoader con barra de progreso
    with tqdm(total=1, desc="Configurando DataLoader") as pbar:
        dataloader = create_weighted_dataloader(WEIGHTS_PATH, DATASET_PATH)
        pbar.update(1)

Configurando DataLoader: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00,  4.08it/s]


Im√°genes cargadas: 162916
Clases √∫nicas: 114





## ***Divisi√≥n del dataset***


In [17]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

# Configuraci√≥n
WEIGHTS_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\plant_weights.csv"
OUTPUT_WEIGHTS_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights"
TRAIN_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "train_weights.csv")
VAL_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "val_weights.csv")
TEST_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "test_weights.csv")
os.makedirs(OUTPUT_WEIGHTS_DIR, exist_ok=True)

def split_dataset(weights_path, train_ratio=0.7, val_ratio=0.15):
    """Divide el dataset en conjuntos de entrenamiento, validaci√≥n y prueba."""
    if not os.path.exists(weights_path):
        raise FileNotFoundError(f"No se encontr√≥ {weights_path}")
    
    df = pd.read_csv(weights_path)
    if not all(col in df.columns for col in ['Class', 'Image', 'Weight']):
        raise ValueError("El CSV debe contener 'Class', 'Image' y 'Weight'")
    
    if len(df) < 10:  # Validaci√≥n m√≠nima
        raise ValueError("El dataset es demasiado peque√±o para dividir")
    
    num_classes = len(df['Class'].unique())
    print(f"Clases encontradas: {num_classes}")
    
    # Divide en entrenamiento y resto (validaci√≥n + prueba)
    train_df, temp_df = train_test_split(
        df, train_size=train_ratio, stratify=df['Class'], random_state=42
    )
    val_size = val_ratio / (1 - train_ratio)
    val_df, test_df = train_test_split(
        temp_df, train_size=val_size, stratify=temp_df['Class'], random_state=42
    )
    
    # Guarda los conjuntos
    train_df.to_csv(TRAIN_WEIGHTS, index=False)
    val_df.to_csv(VAL_WEIGHTS, index=False)
    test_df.to_csv(TEST_WEIGHTS, index=False)
    
    print(f"\n{'=' * 20}")
    print(f"Total im√°genes: {len(df)}")
    print(f"Entrenamiento: {len(train_df)} im√°genes ({len(train_df['Class'].unique())} clases)")
    print(f"Validaci√≥n: {len(val_df)} im√°genes ({len(val_df['Class'].unique())} clases)")
    print(f"Prueba: {len(test_df)} im√°genes ({len(test_df['Class'].unique())} clases)")
    print(f"CSVs guardados en: {OUTPUT_WEIGHTS_DIR}")

if __name__ == "__main__":
    try:
        split_dataset(WEIGHTS_PATH)
    except Exception as e:
        print(f"Error: {str(e)}")

Clases encontradas: 114

Total im√°genes: 162916
Entrenamiento: 114041 im√°genes (114 clases)
Validaci√≥n: 24437 im√°genes (114 clases)
Prueba: 24438 im√°genes (114 clases)
CSVs guardados en: C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights


## ***Entrenamiento del modelo***

In [5]:
import os
import pandas as pd
from PIL import Image
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import models, transforms
from torchvision.models import ResNet18_Weights
from tqdm import tqdm

# =================== CONFIGURACI√ìN ===================
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_resized"
WEIGHTS_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights"
TRAIN_WEIGHTS = os.path.join(WEIGHTS_DIR, "train_weights.csv")
VAL_WEIGHTS = os.path.join(WEIGHTS_DIR, "val_weights.csv")
TEST_WEIGHTS = os.path.join(WEIGHTS_DIR, "test_weights.csv")
MODEL_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# =================== TRANSFORMACIONES ===================
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# =================== DATASET PERSONALIZADO ===================
class PlantDataset(Dataset):
    def __init__(self, weights_df, root_dir, transform=None, class_to_idx=None):
        self.data = weights_df
        self.root_dir = root_dir
        self.transform = transform
        self.class_to_idx = class_to_idx

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        img_parts = row['Class'].split('_')
        img_path = f"{self.root_dir}/{img_parts[0]}/{img_parts[1]}/{'_'.join(img_parts[2:])}/{row['Image']}"
        try:
            image = Image.open(img_path).convert('RGB')
            if self.transform:
                image = self.transform(image)
            label = self.class_to_idx[row['Class']]
            return image, label
        except Exception as e:
            print(f"Error al cargar {img_path}: {str(e)}")
            raise

# =================== DATALOADER ===================
def create_dataloader(weights_path, root_dir, transform, class_to_idx, batch_size=64, use_sampler=True):
    if not os.path.exists(weights_path):
        raise FileNotFoundError(f"No se encontr√≥ {weights_path}")
    weights_df = pd.read_csv(weights_path)
    if not all(col in weights_df.columns for col in ['Class', 'Image', 'Weight']):
        raise ValueError(f"El CSV debe contener 'Class', 'Image' y 'Weight'")
    
    dataset = PlantDataset(weights_df, root_dir, transform=transform, class_to_idx=class_to_idx)
    sampler = None
    if use_sampler:
        weights = torch.tensor(weights_df['Weight'].values, dtype=torch.float)
        sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
    dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, shuffle=not use_sampler)
    print(f"Cargado {weights_path}: {len(dataset)} im√°genes, {len(class_to_idx)} clases")
    return dataloader

# =================== ENTRENAMIENTO ===================
def train_model(train_loader, val_loader, num_epochs=20, early_stop_patience=3):
    model = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    num_classes = len(train_loader.dataset.class_to_idx)
    model.fc = nn.Sequential(
        nn.Dropout(0.3),
        nn.Linear(model.fc.in_features, num_classes)
    )
    model = model.to(DEVICE)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=0.0005)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2)

    best_acc = 0.0
    epochs_without_improvement = 0

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        for images, labels in tqdm(train_loader, desc=f"Entrenando √©poca {epoch+1}/{num_epochs}", leave=True):
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        train_loss = running_loss / len(train_loader)
        train_acc = 100 * correct / total

        # VALIDACI√ìN
        model.eval()
        val_correct = 0
        val_total = 0
        with torch.no_grad():
            for images, labels in tqdm(val_loader, desc="Validando", leave=False):
                images, labels = images.to(DEVICE), labels.to(DEVICE)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()

        val_acc = 100 * val_correct / val_total
        scheduler.step(val_acc)

        print(f"\n√âpoca {epoch+1}: P√©rdida Entrenamiento: {train_loss:.4f}, Precisi√≥n Entrenamiento: {train_acc:.2f}%")
        print(f"Precisi√≥n Validaci√≥n: {val_acc:.2f}%")

        # EARLY STOPPING Y SAVE MODEL
        if val_acc > best_acc:
            best_acc = val_acc
            epochs_without_improvement = 0
            try:
                torch.save(model.state_dict(), MODEL_PATH)
                print(f"Mejor modelo guardado en: {MODEL_PATH}")
            except Exception as e:
                print(f"Error al guardar el modelo: {str(e)}")
        else:
            epochs_without_improvement += 1

        if epochs_without_improvement >= early_stop_patience:
            print(f"Parando temprano en √©poca {epoch+1}")
            break

    return model

# =================== EVALUACI√ìN ===================
def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in tqdm(test_loader, desc="Evaluando en prueba", leave=False):
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_acc = 100 * correct / total
    print(f"\n{'=' * 20}")
    print(f"Precisi√≥n en prueba: {test_acc:.2f}%")
    return test_acc

# =================== MAIN ===================
if __name__ == "__main__":
    try:
        # Verificar CSVs
        for path in [TRAIN_WEIGHTS, VAL_WEIGHTS, TEST_WEIGHTS]:
            if not os.path.exists(path):
                raise FileNotFoundError(f"No se encontr√≥ {path}")
        
        # Crear mapeo global de clases
        train_df = pd.read_csv(TRAIN_WEIGHTS)
        all_classes = sorted(train_df['Class'].unique())
        class_to_idx = {cls: idx for idx, cls in enumerate(all_classes)}

        with tqdm(total=3, desc="Configurando DataLoaders", leave=False) as pbar:
            train_loader = create_dataloader(TRAIN_WEIGHTS, DATASET_PATH, train_transforms, class_to_idx, use_sampler=True)
            pbar.update(1)
            val_loader = create_dataloader(VAL_WEIGHTS, DATASET_PATH, val_transforms, class_to_idx, use_sampler=False)
            pbar.update(1)
            test_loader = create_dataloader(TEST_WEIGHTS, DATASET_PATH, val_transforms, class_to_idx, use_sampler=False)
            pbar.update(1)

        model = train_model(train_loader, val_loader)
        evaluate_model(model, test_loader)
    except Exception as e:
        print(f"Error: {str(e)}")

                                                                       

Cargado C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\train_weights.csv: 114041 im√°genes, 114 clases
Cargado C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\val_weights.csv: 24437 im√°genes, 114 clases
Cargado C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\test_weights.csv: 24438 im√°genes, 114 clases


Entrenando √©poca 1/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [14:12<00:00,  2.09it/s]
                                                            


√âpoca 1: P√©rdida Entrenamiento: 0.5696, Precisi√≥n Entrenamiento: 84.06%
Precisi√≥n Validaci√≥n: 79.73%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 2/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [14:41<00:00,  2.02it/s]
                                                            


√âpoca 2: P√©rdida Entrenamiento: 0.2960, Precisi√≥n Entrenamiento: 90.78%
Precisi√≥n Validaci√≥n: 84.75%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 3/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:43<00:00,  2.16it/s]
                                                            


√âpoca 3: P√©rdida Entrenamiento: 0.2565, Precisi√≥n Entrenamiento: 91.88%
Precisi√≥n Validaci√≥n: 87.87%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 4/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:26<00:00,  2.21it/s]
                                                            


√âpoca 4: P√©rdida Entrenamiento: 0.2243, Precisi√≥n Entrenamiento: 92.82%
Precisi√≥n Validaci√≥n: 88.47%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 5/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:02<00:00,  2.28it/s]
                                                            


√âpoca 5: P√©rdida Entrenamiento: 0.2126, Precisi√≥n Entrenamiento: 93.16%
Precisi√≥n Validaci√≥n: 89.50%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 6/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:02<00:00,  2.28it/s]
                                                            


√âpoca 6: P√©rdida Entrenamiento: 0.1913, Precisi√≥n Entrenamiento: 93.81%
Precisi√≥n Validaci√≥n: 82.50%


Entrenando √©poca 7/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:02<00:00,  2.28it/s]
                                                            


√âpoca 7: P√©rdida Entrenamiento: 0.1832, Precisi√≥n Entrenamiento: 94.06%
Precisi√≥n Validaci√≥n: 90.17%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 8/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:55<00:00,  2.13it/s]
                                                            


√âpoca 8: P√©rdida Entrenamiento: 0.1696, Precisi√≥n Entrenamiento: 94.43%
Precisi√≥n Validaci√≥n: 91.06%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 9/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:53<00:00,  2.14it/s]
                                                            


√âpoca 9: P√©rdida Entrenamiento: 0.1636, Precisi√≥n Entrenamiento: 94.52%
Precisi√≥n Validaci√≥n: 93.45%
Mejor modelo guardado en: C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth


Entrenando √©poca 10/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [13:52<00:00,  2.14it/s]
                                                            


√âpoca 10: P√©rdida Entrenamiento: 0.1564, Precisi√≥n Entrenamiento: 94.78%
Precisi√≥n Validaci√≥n: 92.07%


Entrenando √©poca 11/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [15:45<00:00,  1.88it/s]
                                                            


√âpoca 11: P√©rdida Entrenamiento: 0.1505, Precisi√≥n Entrenamiento: 95.08%
Precisi√≥n Validaci√≥n: 91.06%


Entrenando √©poca 12/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1782/1782 [16:57<00:00,  1.75it/s]
                                                            


√âpoca 12: P√©rdida Entrenamiento: 0.1493, Precisi√≥n Entrenamiento: 95.03%
Precisi√≥n Validaci√≥n: 91.74%
Parando temprano en √©poca 12


                                                                      


Precisi√≥n en prueba: 91.85%




## ***Pruebas con camara***

In [1]:
import cv2
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms
import pandas as pd
import numpy as np
import os
import time
from tqdm import tqdm

# ================= CONFIGURACI√ìN =================
MODEL_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\normal_model.pth"
WEIGHTS_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\balanced_weights\train_weights.csv"
CAPTURED_IMAGES_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\captured_images"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DEBUG = True  # Activa la depuraci√≥n para ver la imagen umbralizada

os.makedirs(CAPTURED_IMAGES_DIR, exist_ok=True)

# ================ TRANSFORMACI√ìN =================
test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# ================ CARGAR MODELO ==================
def load_model_and_classes(model_path, weights_path):
    """Carga el modelo y el mapeo de clases."""
    weights_df = pd.read_csv(weights_path)
    all_classes = sorted(weights_df['Class'].unique())
    class_to_idx = {cls: idx for idx, cls in enumerate(all_classes)}
    idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}
    
    model = models.resnet18(pretrained=False)
    model.fc = nn.Sequential(
        nn.Dropout(0.3),
        nn.Linear(model.fc.in_features, len(class_to_idx))
    )
    model.load_state_dict(torch.load(model_path, map_location=DEVICE))
    model = model.to(DEVICE)
    model.eval()
    return model, idx_to_class

def normalize_brightness(frame):
    """Normaliza el brillo de la imagen."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hsv[:, :, 2] = cv2.normalize(hsv[:, :, 2], None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def detect_leaf(frame, min_area=500):
    """Detecta una hoja con umbral adaptativo y devuelve el recuadro con contornos mejorados."""
    # Normaliza el brillo
    frame = normalize_brightness(frame)
    
    # Convierte a escala de grises y aplica desenfoque
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    
    # Aplica umbral adaptativo (mantenemos el original que te gusta)
    thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
                                   cv2.THRESH_BINARY_INV, 21, 5)
    
    # Ligero refinamiento morfol√≥gico para mejorar contornos sin cambiar la esencia
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    cleaned = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)
    
    # Muestra la imagen umbralizada para depuraci√≥n (si DEBUG=True)
    if DEBUG:
        cv2.imshow("Umbral", cleaned)
    
    # Encuentra contornos
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    
    if contours:
        # Selecciona el contorno m√°s grande
        largest_contour = max(contours, key=cv2.contourArea)
        area = cv2.contourArea(largest_contour)
        if area > min_area:
            x, y, w, h = cv2.boundingRect(largest_contour)
            if w > 30 and h > 30 and w < frame.shape[1] * 0.9 and h < frame.shape[0] * 0.9:
                # Dibuja el contorno en la imagen original
                cv2.drawContours(frame, [largest_contour], -1, (0, 255, 0), 2)
                return (x, y, x+w, y+h), frame[y:y+h, x:x+w]
    
    return None, frame

def predict_image(image, model, idx_to_class, transform):
    """Predice la clase de una imagen y devuelve la confianza."""
    image = transform(image).unsqueeze(0).to(DEVICE)
    with torch.no_grad():
        outputs = model(image)
        probs = torch.softmax(outputs, dim=1)
        confidence, predicted = torch.max(probs, 1)
    return idx_to_class[predicted.item()], confidence.item()

def capture_and_predict():
    """Captura im√°genes, detecta una hoja y predice en tiempo real."""
    model, idx_to_class = load_model_and_classes(MODEL_PATH, WEIGHTS_PATH)
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Error: No se pudo abrir la c√°mara.")
        return

    # Reducir la resoluci√≥n de la c√°mara para mejorar FPS
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    print("Predicciones en tiempo real. Presiona 'q' para guardar imagen, 'Esc' para salir.")
    
    # Variables para estabilizaci√≥n
    last_bbox = None
    last_predicted_class = "Buscando hoja..."
    last_confidence = 0.0
    stable_count = 0
    STABLE_THRESHOLD = 3  # N√∫mero de frames consecutivos para considerar una detecci√≥n estable
    
    with tqdm(total=1, desc="Capturando desde c√°mara", leave=True) as pbar:
        while True:
            ret, frame = cap.read()
            if not ret:
                print("Error: No se pudo leer el frame.")
                break

            # Detecta una hoja
            bbox, roi = detect_leaf(frame, min_area=500)
            predicted_class = last_predicted_class
            confidence = last_confidence

            if bbox:
                # Compara con la detecci√≥n anterior para estabilizar
                if last_bbox and abs(bbox[0] - last_bbox[0]) < 50 and abs(bbox[1] - last_bbox[1]) < 50:
                    stable_count += 1
                else:
                    stable_count = 1
                    last_bbox = bbox
                
                if stable_count >= STABLE_THRESHOLD:
                    x1, y1, x2, y2 = bbox
                    roi_pil = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
                    predicted_class, confidence = predict_image(roi_pil, model, idx_to_class, test_transforms)
                    last_predicted_class = predicted_class
                    last_confidence = confidence
                    # Dibuja el rect√°ngulo y la predicci√≥n
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                    cv2.putText(frame, f"Clase: {predicted_class}", (x1, y1-10), 
                                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
                    cv2.putText(frame, f"Confianza: {confidence*100:.2f}%", (x1, y1-30), 
                                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 255), 2)
            else:
                stable_count = 0
                last_bbox = None
                predicted_class = "Buscando hoja..."
                confidence = 0.0
                last_predicted_class = predicted_class
                last_confidence = confidence

            # Muestra la predicci√≥n en la parte superior
            cv2.putText(frame, f"Pred: {predicted_class}", (10, 30), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(frame, f"Conf: {confidence*100:.2f}%", (10, 60), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)
            cv2.imshow("C√°mara", frame)

            key = cv2.waitKey(30)  # 30ms (~33 FPS) para un feed m√°s fluido
            if key == ord('q') and bbox:  # Solo guarda si se detecta una hoja
                roi_pil = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
                save_path = os.path.join(CAPTURED_IMAGES_DIR, 
                                       f"captured_{predicted_class}_{confidence*100:.2f}_{int(time.time())}.jpg")
                roi_pil.save(save_path)
                print(f"Imagen guardada: {save_path}")
                pbar.update(0)
            elif key == 27:  # Esc
                break

        cap.release()
        cv2.destroyAllWindows()
        if DEBUG:
            cv2.destroyWindow("Umbral")

if __name__ == "__main__":
    capture_and_predict()



Predicciones en tiempo real. Presiona 'q' para guardar imagen, 'Esc' para salir.


Capturando desde c√°mara:   0%|          | 0/1 [03:19<?, ?it/s]


error: OpenCV(4.12.0) D:\a\opencv-python\opencv-python\opencv\modules\highgui\src\window_w32.cpp:1261: error: (-27:Null pointer) NULL window: 'Umbral' in function 'cvDestroyWindow'


## ***Generaci√≥n del dataset augmentado y Carga de pesos***

In [18]:
import os
import pandas as pd
from PIL import Image
import torch
from torchvision import transforms
from torchvision.transforms.functional import to_pil_image
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed

# Configuraci√≥n
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_resized"
AUGMENTED_OUTPUT_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_augmented"
AUGMENTED_WEIGHTS = r"C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights\augmented_plant_weights.csv"
AUGMENTATIONS_PER_IMAGE = 5
TARGET_SIZE = (224, 224)
MAX_WORKERS = 16

# Transformaciones para aumento de datos
augment_transforms = transforms.Compose([
    transforms.ToTensor(),  # Convertir imagen PIL a tensor
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
])

def collect_image_paths(input_dir):
    """Recolecta rutas de im√°genes .jpg."""
    file_paths = []
    for root, _, files in tqdm(os.walk(input_dir), desc="Recolectando im√°genes"):
        for file in files:
            if file.lower().endswith('.jpg'):
                file_paths.append(os.path.join(root, file))
    return file_paths

def augment_and_save_image(file_path, output_dir, augmentations_per_image):
    """Genera im√°genes aumentadas y las guarda."""
    results = []
    try:
        # Obtener la ruta relativa y la clase
        relative_path = os.path.relpath(os.path.dirname(file_path), DATASET_PATH)
        output_subdir = os.path.join(output_dir, relative_path)
        os.makedirs(output_subdir, exist_ok=True)
        
        # Generar nombre de clase (incluyendo prefijo: color, grayscale, segmented)
        img_class_parts = relative_path.replace('\\', '/').split('/')
        if len(img_class_parts) < 2:  # Asegurarse de que hay al menos tipo (color/grayscale/segmented) y planta
            return [(file_path, None, None, None, "Ruta relativa inv√°lida")]
        img_class = '_'.join(img_class_parts)  # Ejemplo: color_Apple_Apple_scab
        
        with Image.open(file_path) as img:
            img = img.convert('RGB')
            base_name = os.path.splitext(os.path.basename(file_path))[0]
            
            # Guardar imagen original
            output_path = os.path.join(output_subdir, f"{base_name}.jpg")
            try:
                img.resize(TARGET_SIZE, Image.LANCZOS).save(output_path, 'JPEG', quality=95)
                results.append((file_path, output_path, img_class, 1.0, None))
            except Exception as e:
                results.append((file_path, None, img_class, None, f"Error al guardar imagen original: {str(e)}"))
            
            # Generar im√°genes aumentadas
            for i in range(augmentations_per_image):
                try:
                    aug_img = augment_transforms(img)  # Aplica transformaciones (img se convierte a tensor)
                    aug_img_pil = to_pil_image(aug_img)  # Convierte tensor a PIL
                    aug_path = os.path.join(output_subdir, f"{base_name}_aug_{i}.jpg")
                    aug_img_pil.save(aug_path, 'JPEG', quality=95)  # Guardar con PIL
                    results.append((file_path, aug_path, img_class, 0.5, None))  # Peso menor para aumentadas
                except Exception as e:
                    results.append((file_path, None, img_class, None, f"Error al generar imagen aumentada {i}: {str(e)}"))
                
        return results
    except Exception as e:
        return [(file_path, None, None, None, f"Error general: {str(e)}")]

def augment_dataset(input_dir, output_dir, augmentations_per_image):
    """Aumenta el dataset y genera CSV con pesos."""
    os.makedirs(os.path.dirname(AUGMENTED_WEIGHTS), exist_ok=True)
    file_paths = collect_image_paths(input_dir)
    all_results = []
    
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        future_to_file = {executor.submit(augment_and_save_image, path, output_dir, augmentations_per_image): path for path in file_paths}
        for future in tqdm(as_completed(future_to_file), total=len(file_paths), desc="Aumentando im√°genes"):
            all_results.extend(future.result())
    
    # Separar resultados v√°lidos y errores
    valid_results = [r for r in all_results if r[1] is not None and r[2] is not None]
    errors = [r for r in all_results if r[4] is not None]
    
    # Crear DataFrame
    data = [(r[2], os.path.basename(r[1]), r[3]) for r in valid_results]
    if not data:
        print(f"Error: No se generaron im√°genes v√°lidas. Total errores: {len(errors)}")
        print("\nArchivos con errores:")
        for _, _, _, _, error in errors:
            print(f"  {error}")
        return
    
    df = pd.DataFrame(data, columns=['Class', 'Image', 'Weight'])
    
    # Calcular pesos inversos por clase
    class_counts = df['Class'].value_counts()
    total_images = len(df)
    df['Weight'] = df['Class'].apply(lambda x: total_images / class_counts[x])
    weight_sum = df['Weight'].sum()
    df['Weight'] = df['Weight'] / weight_sum
    
    df.to_csv(AUGMENTED_WEIGHTS, index=False)
    
    # Resumen
    print(f"\n{'=' * 20}")
    print(f"Im√°genes totales (originales + aumentadas): {len(df)}")
    print(f"Clases √∫nicas: {len(df['Class'].unique())}")
    print(f"Pesos guardados en: {AUGMENTED_WEIGHTS}")
    print(f"Errores: {len(errors)}")
    if errors:
        print("\nArchivos con errores:")
        for _, _, _, _, error in errors:
            print(f"  {error}")

if __name__ == "__main__":
    augment_dataset(DATASET_PATH, AUGMENTED_OUTPUT_PATH, AUGMENTATIONS_PER_IMAGE)

Recolectando im√°genes: 160it [00:00, 252.37it/s]
Aumentando im√°genes: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 162916/162916 [1:11:09<00:00, 38.15it/s]



Im√°genes totales (originales + aumentadas): 977496
Clases √∫nicas: 114
Pesos guardados en: C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights\augmented_plant_weights.csv
Errores: 0


## ***Divisi√≥n del dataset augmentado***

In [19]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

# Configuraci√≥n
AUGMENTED_WEIGHTS = r"C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights\augmented_plant_weights.csv"
OUTPUT_WEIGHTS_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights"
TRAIN_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "aug_train_weights.csv")
VAL_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "aug_val_weights.csv")
TEST_WEIGHTS = os.path.join(OUTPUT_WEIGHTS_DIR, "aug_test_weights.csv")
os.makedirs(OUTPUT_WEIGHTS_DIR, exist_ok=True)

def split_augmented_dataset(weights_path, train_ratio=0.7, val_ratio=0.15, sample_fraction=None):
    """Divide el dataset aumentado en entrenamiento, validaci√≥n y prueba."""
    if not os.path.exists(weights_path):
        raise FileNotFoundError(f"No se encontr√≥ {weights_path}")

    df = pd.read_csv(weights_path)
    if not all(col in df.columns for col in ['Class', 'Image', 'Weight']):
        raise ValueError("El CSV debe contener 'Class', 'Image' y 'Weight'")

    if len(df) < 10:
        raise ValueError("El dataset es demasiado peque√±o para dividir")

    if sample_fraction is not None and 0 < sample_fraction < 1:
        df = df.groupby('Class', group_keys=False).apply(lambda x: x.sample(frac=sample_fraction, random_state=42)).reset_index(drop=True)
        print(f"Se ha muestreado el {sample_fraction * 100:.1f}% del dataset para pruebas")

    num_classes = len(df['Class'].unique())
    print(f"Clases encontradas: {num_classes}")

    # Divide en entrenamiento y resto (validaci√≥n + prueba)
    train_df, temp_df = train_test_split(
        df, train_size=train_ratio, stratify=df['Class'], random_state=42
    )
    val_size = val_ratio / (1 - train_ratio)
    val_df, test_df = train_test_split(
        temp_df, train_size=val_size, stratify=temp_df['Class'], random_state=42
    )

    # Guarda los conjuntos
    train_df.to_csv(TRAIN_WEIGHTS, index=False)
    val_df.to_csv(VAL_WEIGHTS, index=False)
    test_df.to_csv(TEST_WEIGHTS, index=False)

    print(f"\n{'=' * 20}")
    print(f"Total im√°genes: {len(df)}")
    print(f"Entrenamiento: {len(train_df)} im√°genes ({len(train_df['Class'].unique())} clases)")
    print(f"Validaci√≥n: {len(val_df)} im√°genes ({len(val_df['Class'].unique())} clases)")
    print(f"Prueba: {len(test_df)} im√°genes ({len(test_df['Class'].unique())} clases)")
    print(f"CSVs guardados en: {OUTPUT_WEIGHTS_DIR}")

if __name__ == "__main__":
    try:
        # Usa sample_fraction=0.05 para trabajar solo con el 5% del dataset
        split_augmented_dataset(AUGMENTED_WEIGHTS, sample_fraction=0.30)
    except Exception as e:
        print(f"Error: {str(e)}")


  df = df.groupby('Class', group_keys=False).apply(lambda x: x.sample(frac=sample_fraction, random_state=42)).reset_index(drop=True)


Se ha muestreado el 30.0% del dataset para pruebas
Clases encontradas: 114

Total im√°genes: 293258
Entrenamiento: 205280 im√°genes (114 clases)
Validaci√≥n: 43988 im√°genes (114 clases)
Prueba: 43990 im√°genes (114 clases)
CSVs guardados en: C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights


## ***Entrenamiento del modelo con imagenes augmentadas y originales***

In [None]:
import os
import pandas as pd
from PIL import Image
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import models, transforms
from torchvision.models import ResNet18_Weights
from tqdm import tqdm

# =================== CONFIGURACI√ìN ===================
DATASET_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\plantvillage-dataset_augmented"
WEIGHTS_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights"
TRAIN_WEIGHTS = os.path.join(WEIGHTS_DIR, "aug_train_weights.csv")
VAL_WEIGHTS = os.path.join(WEIGHTS_DIR, "aug_val_weights.csv")
TEST_WEIGHTS = os.path.join(WEIGHTS_DIR, "aug_test_weights.csv")
MODEL_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\models"
os.makedirs(MODEL_DIR, exist_ok=True)
MODEL_PATH = os.path.join(MODEL_DIR, "augmented_model.pth")
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 32
NUM_WORKERS = 0
NUM_EPOCHS = 20
EARLY_STOP_PATIENCE = 3
METRICS_LOG = os.path.join(MODEL_DIR, "training_metrics.csv")

if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.backends.cudnn.benchmark = True

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

class PlantDataset(Dataset):
    def __init__(self, weights_df, root_dir, transform=None, class_to_idx=None):
        self.data = weights_df
        self.root_dir = root_dir
        self.transform = transform
        self.class_to_idx = class_to_idx

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        img_parts = row['Class'].split('_')
        img_path = f"{self.root_dir}/{img_parts[0]}/{img_parts[1]}/{'_'.join(img_parts[2:])}/{row['Image']}"
        try:
            image = Image.open(img_path).convert('RGB')
            if self.transform:
                image = self.transform(image)
            label = self.class_to_idx[row['Class']]
            return image, label
        except Exception as e:
            print(f"Error al cargar {img_path}: {str(e)}")
            raise

def create_dataloader(weights_path, root_dir, transform, class_to_idx, batch_size=BATCH_SIZE, use_sampler=True):
    weights_df = pd.read_csv(weights_path)
    dataset = PlantDataset(weights_df, root_dir, transform=transform, class_to_idx=class_to_idx)
    sampler = WeightedRandomSampler(torch.tensor(weights_df['Weight'].values, dtype=torch.float), len(weights_df), replacement=True) if use_sampler else None
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler, shuffle=not use_sampler, num_workers=NUM_WORKERS, pin_memory=torch.cuda.is_available())

def train_model(train_loader, val_loader, start_epoch=1, num_epochs=NUM_EPOCHS, early_stop_patience=EARLY_STOP_PATIENCE):
    model = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    num_classes = len(train_loader.dataset.class_to_idx)
    model.fc = nn.Sequential(nn.Dropout(0.3), nn.Linear(model.fc.in_features, num_classes))
    model = model.to(DEVICE)

    checkpoint_path = os.path.join(MODEL_DIR, f"model_epoch_{start_epoch - 1}.pth")
    if os.path.exists(checkpoint_path):
        model.load_state_dict(torch.load(checkpoint_path))
        print(f"Modelo cargado desde {checkpoint_path}")

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=0.0005)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2)

    best_acc = 0.0
    epochs_without_improvement = 0

    if start_epoch == 1 and os.path.exists(METRICS_LOG):
        os.remove(METRICS_LOG)

    for epoch in range(start_epoch, num_epochs + 1):
        model.train()
        running_loss, correct, total = 0.0, 0, 0

        for images, labels in tqdm(train_loader, desc=f"Entrenando √©poca {epoch}/{num_epochs}"):
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        train_loss = running_loss / len(train_loader)
        train_acc = 100 * correct / total

        model.eval()
        val_correct, val_total = 0, 0
        with torch.no_grad():
            for images, labels in tqdm(val_loader, desc="Validando"):
                images, labels = images.to(DEVICE), labels.to(DEVICE)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()

        val_acc = 100 * val_correct / val_total
        scheduler.step(val_acc)

        print(f"\n√âpoca {epoch}: P√©rdida Entrenamiento: {train_loss:.4f}, Precisi√≥n Entrenamiento: {train_acc:.2f}%")
        print(f"Precisi√≥n Validaci√≥n: {val_acc:.2f}%")

        torch.save(model.state_dict(), os.path.join(MODEL_DIR, f"model_epoch_{epoch}.pth"))
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), MODEL_PATH)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        with open(METRICS_LOG, "a") as f:
            if epoch == start_epoch:
                f.write("Epoch,TrainLoss,TrainAcc,ValAcc\n")
            f.write(f"{epoch},{train_loss:.4f},{train_acc:.2f},{val_acc:.2f}\n")

        if epochs_without_improvement >= early_stop_patience:
            print(f"Parando temprano en √©poca {epoch}")
            break
    return model

def evaluate_model(model, test_loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in tqdm(test_loader, desc="Evaluando en prueba"):
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f"\n{'='*20}\nPrecisi√≥n en prueba: {100 * correct / total:.2f}%")
    return 100 * correct / total

if __name__ == "__main__":
    train_df = pd.read_csv(TRAIN_WEIGHTS)
    class_to_idx = {cls: idx for idx, cls in enumerate(sorted(train_df['Class'].unique()))}

    train_loader = create_dataloader(TRAIN_WEIGHTS, DATASET_PATH, train_transforms, class_to_idx)
    val_loader = create_dataloader(VAL_WEIGHTS, DATASET_PATH, val_transforms, class_to_idx, use_sampler=False)
    test_loader = create_dataloader(TEST_WEIGHTS, DATASET_PATH, val_transforms, class_to_idx, use_sampler=False)

    # Cambia aqu√≠ si quieres reanudar desde cierta √©poca
    model = train_model(train_loader, val_loader, start_epoch=1)
    evaluate_model(model, test_loader)


Entrenando √©poca 1/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [44:34<00:00,  2.40it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [09:22<00:00,  2.44it/s]



√âpoca 1: P√©rdida Entrenamiento: 0.8958, Precisi√≥n Entrenamiento: 72.66%
Precisi√≥n Validaci√≥n: 90.49%


Entrenando √©poca 2/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [34:27<00:00,  3.10it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:26<00:00,  9.39it/s]



√âpoca 2: P√©rdida Entrenamiento: 0.5359, Precisi√≥n Entrenamiento: 82.55%
Precisi√≥n Validaci√≥n: 91.58%


Entrenando √©poca 3/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [30:26<00:00,  3.51it/s]
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:34<00:00,  8.92it/s]



√âpoca 3: P√©rdida Entrenamiento: 0.4505, Precisi√≥n Entrenamiento: 85.12%
Precisi√≥n Validaci√≥n: 93.74%


Entrenando √©poca 4/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [29:26<00:00,  3.63it/s]
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:22<00:00,  9.63it/s]



√âpoca 4: P√©rdida Entrenamiento: 0.3990, Precisi√≥n Entrenamiento: 86.76%
Precisi√≥n Validaci√≥n: 94.64%


Entrenando √©poca 5/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [29:55<00:00,  3.57it/s]
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:47<00:00,  8.22it/s]



√âpoca 5: P√©rdida Entrenamiento: 0.3641, Precisi√≥n Entrenamiento: 87.75%
Precisi√≥n Validaci√≥n: 94.97%


Entrenando √©poca 6/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [30:46<00:00,  3.47it/s]
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:33<00:00,  8.96it/s]



√âpoca 6: P√©rdida Entrenamiento: 0.3392, Precisi√≥n Entrenamiento: 88.53%
Precisi√≥n Validaci√≥n: 94.46%


Entrenando √©poca 7/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [32:57<00:00,  3.24it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [03:32<00:00,  6.47it/s]



√âpoca 7: P√©rdida Entrenamiento: 0.3191, Precisi√≥n Entrenamiento: 89.12%
Precisi√≥n Validaci√≥n: 95.76%


Entrenando √©poca 8/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [29:03<00:00,  3.68it/s] 
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:29<00:00,  9.18it/s]



√âpoca 8: P√©rdida Entrenamiento: 0.3041, Precisi√≥n Entrenamiento: 89.67%
Precisi√≥n Validaci√≥n: 95.96%


Entrenando √©poca 9/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [28:40<00:00,  3.73it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [03:05<00:00,  7.41it/s]



√âpoca 9: P√©rdida Entrenamiento: 0.2879, Precisi√≥n Entrenamiento: 90.16%
Precisi√≥n Validaci√≥n: 95.09%


Entrenando √©poca 10/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [32:55<00:00,  3.25it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [03:05<00:00,  7.40it/s]



√âpoca 10: P√©rdida Entrenamiento: 0.2782, Precisi√≥n Entrenamiento: 90.45%
Precisi√≥n Validaci√≥n: 96.10%


Entrenando √©poca 11/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [29:49<00:00,  3.59it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [03:11<00:00,  7.17it/s]



√âpoca 11: P√©rdida Entrenamiento: 0.2723, Precisi√≥n Entrenamiento: 90.66%
Precisi√≥n Validaci√≥n: 95.89%


Entrenando √©poca 12/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [32:07<00:00,  3.33it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:45<00:00,  8.31it/s]



√âpoca 12: P√©rdida Entrenamiento: 0.2605, Precisi√≥n Entrenamiento: 90.99%
Precisi√≥n Validaci√≥n: 96.62%


Entrenando √©poca 13/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [30:30<00:00,  3.50it/s] 
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [03:20<00:00,  6.86it/s]



√âpoca 13: P√©rdida Entrenamiento: 0.2562, Precisi√≥n Entrenamiento: 91.22%
Precisi√≥n Validaci√≥n: 95.99%


Entrenando √©poca 14/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [29:52<00:00,  3.58it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:38<00:00,  8.66it/s]



√âpoca 14: P√©rdida Entrenamiento: 0.2506, Precisi√≥n Entrenamiento: 91.36%
Precisi√≥n Validaci√≥n: 95.70%


Entrenando √©poca 15/20: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 6415/6415 [28:22<00:00,  3.77it/s]  
Validando: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [02:40<00:00,  8.59it/s]



√âpoca 15: P√©rdida Entrenamiento: 0.2422, Precisi√≥n Entrenamiento: 91.56%
Precisi√≥n Validaci√≥n: 96.50%
Parando temprano en √©poca 15


Evaluando en prueba: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1375/1375 [08:41<00:00,  2.64it/s]


Precisi√≥n en prueba: 96.38%





: 

## ***Predicci√≥n con c√°mara (modelo augmentado)***

In [3]:
import cv2
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import models
from torchvision.models import ResNet18_Weights
import os
import pandas as pd
from PIL import Image
import numpy as np
import textwrap

# ================= CONFIGURACI√ìN =================
MODEL_PATH = r"C:\Users\Arys\Desktop\Proyecto - 2\models\augmented_model.pth"
CLASSES_CSV = r"C:\Users\Arys\Desktop\Proyecto - 2\augmented_weights\aug_train_weights.csv"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
CAPTURED_IMAGES_DIR = r"C:\Users\Arys\Desktop\Proyecto - 2\captured_images"
os.makedirs(CAPTURED_IMAGES_DIR, exist_ok=True)

# ================ CLASES =================
train_df = pd.read_csv(CLASSES_CSV)
classes = sorted(train_df['Class'].unique())
class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}

# ================ TRANSFORMACIONES =================
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),  # Aumentado para manejar vistas variables
    transforms.RandomRotation(180),  # Para manejar perspectivas (arriba/abajo)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# ================ CARGAR MODELO =================
model = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, len(classes))
)
model.load_state_dict(torch.load(MODEL_PATH, map_location=DEVICE))
model = model.to(DEVICE)
model.eval()

# ================ CAPTURA DE VIDEO =================
cap = cv2.VideoCapture(0)
cv2.namedWindow("Detecci√≥n en tiempo real", cv2.WINDOW_NORMAL)
cv2.resizeWindow("Detecci√≥n en tiempo real", 900, 600)
cv2.namedWindow("Umbral Blanco y Negro", cv2.WINDOW_NORMAL)
cv2.resizeWindow("Umbral Blanco y Negro", 300, 300)

print("Presiona 'q' para salir, 's' para guardar imagen.")
while True:
    ret, frame = cap.read()
    if not ret:
        print("No se pudo capturar imagen.")
        break

    # Detecci√≥n de hoja usando color verde ampliado
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower_green = np.array([15, 20, 20])  # Rango m√°s amplio para "peach"
    upper_green = np.array([100, 255, 255])
    mask = cv2.inRange(hsv, lower_green, upper_green)
    blurred = cv2.GaussianBlur(mask, (9, 9), 0)
    _, thresh_green = cv2.threshold(blurred, 80, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    cleaned = cv2.morphologyEx(thresh_green, cv2.MORPH_CLOSE, kernel, iterations=4)

    # Umbral blanco y negro para depuraci√≥n
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, thresh_bw = cv2.threshold(gray, 70, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    bbox = None
    roi = frame
    if contours:
        largest_contour = max(contours, key=cv2.contourArea)
        area = cv2.contourArea(largest_contour)
        if area > 300:  # Reducido para detectar hojas individuales
            x, y, w, h = cv2.boundingRect(largest_contour)
            if w > 30 and h > 30 and w < frame.shape[1] * 0.95 and h < frame.shape[0] * 0.95:
                cv2.drawContours(frame, [largest_contour], -1, (0, 255, 0), 2)
                bbox = (x, y, x + w, y + h)
                roi = frame[y:y+h, x:x+w]
                cv2.putText(frame, f"√Årea: {area:.0f}", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

    # Preprocesamiento y predicci√≥n
    rgb_image = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(rgb_image)
    input_tensor = transform(pil_image).unsqueeze(0).to(DEVICE)

    with torch.no_grad():
        outputs = model(input_tensor)
        probs = torch.softmax(outputs, dim=1)
        confidence, predicted = torch.max(probs, 1)
        label = idx_to_class[predicted.item()] if confidence.item() > 0.6 else "Desconocido"

    # Mostrar texto
    wrapped_text = textwrap.wrap(label, width=40)
    y = 30
    for line in wrapped_text:
        cv2.putText(frame, line, (10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        y += 25
    cv2.putText(frame, f"Conf: {confidence.item()*100:.2f}%", (10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)

    # Mostrar ventanas
    cv2.imshow("Detecci√≥n en tiempo real", frame)
    cv2.imshow("Umbral Blanco y Negro", thresh_bw)

    # Salir o guardar
    key = cv2.waitKey(50) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('s') and bbox:
        roi_pil = Image.fromarray(rgb_image)
        save_path = os.path.join(CAPTURED_IMAGES_DIR, f"captured_{label}_{confidence.item()*100:.2f}_{int(time.time())}.jpg")
        roi_pil.save(save_path)
        print(f"Imagen guardada: {save_path}")

cap.release()
cv2.destroyAllWindows()
cv2.destroyWindow("Umbral Blanco y Negro")

Presiona 'q' para salir, 's' para guardar imagen.


error: OpenCV(4.12.0) D:\a\opencv-python\opencv-python\opencv\modules\highgui\src\window_w32.cpp:1261: error: (-27:Null pointer) NULL window: 'Umbral Blanco y Negro' in function 'cvDestroyWindow'
