# Data Exploration

In this notebook, we will explore the dataset used to train the YOLO models. We will visualize sample images, check how samples are distributed across classes, and look for characteristics of the data, such as class imbalance, that may affect training.
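
One characteristic worth checking early is image resolution. YOLO training typically resizes every image to a fixed network input size, so widely varying resolutions or aspect ratios are worth spotting before training. The sketch below summarizes image sizes from a list of array shapes. The helper `summarize_image_sizes` and the example shapes are hypothetical. In this notebook the shapes would come from `[img.shape for img in images]`, assuming `load_dataset` returns NumPy arrays as OpenCV does.

```python
from collections import Counter

def summarize_image_sizes(shapes):
    """Summarize image resolutions from a list of array shapes.

    Each shape is assumed to be (height, width) or (height, width, channels),
    matching `img.shape` for images loaded with OpenCV.
    """
    if not shapes:
        raise ValueError("no image shapes given")
    # Count how many images share each (width, height) resolution
    sizes = Counter((s[1], s[0]) for s in shapes)
    # Aspect ratio expressed as width / height
    aspect_ratios = [s[1] / s[0] for s in shapes]
    return {
        "num_images": len(shapes),
        "unique_sizes": dict(sizes),
        "min_aspect": min(aspect_ratios),
        "max_aspect": max(aspect_ratios),
    }

# Hypothetical shapes for illustration only
example_shapes = [(480, 640, 3), (480, 640, 3), (720, 1280, 3)]
summary = summarize_image_sizes(example_shapes)
print(summary)
```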

In [1]:
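```python
import cv2
import matplotlib.pyplot as plt
import pandas as pd
from src.data.dataset_loader import load_dataset  # project-specific loader

# Load the dataset (replace with the actual dataset location)
data_path = 'path/to/your/dataset'
images, annotations = load_dataset(data_path)

# Display the first few images side by side with their annotations
def display_samples(images, annotations, num_samples=5):
    # Never try to show more images than the dataset contains
    num_samples = min(num_samples, len(images))
    plt.figure(figsize=(15, 10))
    for i in range(num_samples):
        plt.subplot(1, num_samples, i + 1)
        # Images are assumed to be BGR (OpenCV's default); matplotlib expects RGB
        img = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB)
        plt.imshow(img)
        plt.title(f'Annotation: {annotations[i]}')
        plt.axis('off')
    plt.tight_layout()
    plt.show()

display_samples(images, annotations)
```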
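
The titles above print each annotation as-is. If the labels follow the common YOLO text format (one line per object: `class_id x_center y_center width height`, with coordinates normalized to [0, 1]), boxes can be converted to pixel corners and drawn on the images. This format is an assumption, since the output of `load_dataset` is not shown here, and the helpers `parse_yolo_line` and `to_pixel_corners` are hypothetical. A minimal sketch of the parsing and conversion:

```python
def parse_yolo_line(line):
    """Parse one detection label line: 'class_id x_center y_center width height'."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(p) for p in parts[1:])
    return class_id, x_c, y_c, w, h

def to_pixel_corners(x_c, y_c, w, h, img_width, img_height):
    """Convert a normalized center-format box to pixel corners (x1, y1, x2, y2)."""
    x1 = (x_c - w / 2) * img_width
    y1 = (y_c - h / 2) * img_height
    x2 = (x_c + w / 2) * img_width
    y2 = (y_c + h / 2) * img_height
    # Round to integer pixel coordinates for drawing
    return round(x1), round(y1), round(x2), round(y2)

# Hypothetical label line for a 640x480 image
class_id, *box = parse_yolo_line("2 0.5 0.5 0.25 0.5")
corners = to_pixel_corners(*box, img_width=640, img_height=480)
print(class_id, corners)  # 2 (240, 120, 400, 360)
```

The corners can then be passed to `cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness)` before calling `plt.imshow`. Keeping the conversion separate from drawing makes it easy to check against a known box.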


## Class Distribution

Next, we will analyze how samples are distributed across classes to check whether the dataset is balanced or whether some classes are under-represented.

In [2]:
```python
# Assumes one class label per entry in `annotations`; counts occurrences of each label
class_distribution = pd.Series(annotations).value_counts()
class_distribution.plot(kind='bar', figsize=(10, 5), color='skyblue')
plt.title('Class Distribution')
plt.xlabel('Class')
plt.ylabel('Number of Samples')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
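
`value_counts()` on a Series of annotations counts one value per entry. In object detection an image can contain several objects, so the number of object instances per class is usually the more informative statistic. The sketch below assumes each image's annotation can be reduced to a list of class IDs. The helpers `class_instance_counts` and `imbalance_ratio` and the sample labels are hypothetical. It counts instances across all images and reports the ratio between the most and least frequent class.

```python
from collections import Counter

def class_instance_counts(per_image_labels):
    """Count object instances per class across all images."""
    counts = Counter()
    for labels in per_image_labels:
        counts.update(labels)  # each label is one object instance
    return counts

def imbalance_ratio(counts):
    """Ratio of the most frequent class count to the least frequent one."""
    if not counts:
        raise ValueError("no labels to count")
    return max(counts.values()) / min(counts.values())

# Hypothetical labels: class IDs of the objects in each of four images
per_image_labels = [[0, 0, 1], [0], [2, 0], [1, 0]]
counts = class_instance_counts(per_image_labels)
ratio = imbalance_ratio(counts)
print(dict(counts))  # {0: 5, 1: 2, 2: 1}
print(ratio)         # 5.0
```

A large ratio suggests that rare classes may need more data or augmentation. The ratio only covers classes that appear at least once, so classes with zero instances should be checked separately against the full class list.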

## Conclusion

In this notebook, we explored the dataset by visualizing sample images and analyzing the class distribution. These observations, in particular any class imbalance, inform the subsequent model training and evaluation steps.