# Data Exploration

This notebook explores the dataset used for billboard detection. It loads the processed data, reviews summary statistics, and plots the distribution of billboard sizes and the number of billboards per category.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name; sns.set is an alias for it)
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset
data_path = '../data/processed/billboard_data.csv'  # Update with the correct path
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()

In [3]:
# Summary statistics (describe() covers only numeric columns by default)
data.describe()
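
Because `describe()` skips non-numeric columns by default, it can help to also check missing values and category frequencies. The following is a minimal sketch on a small made-up frame. The column names `size` and `category` come from the cells in this notebook, but the `toy` frame and its values are hypothetical stand-ins for the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; values are invented for illustration.
toy = pd.DataFrame({
    "size": [12.0, 48.0, np.nan, 30.5, 48.0],
    "category": ["digital", "static", "static", None, "digital"],
})

# Number of missing values in each column
missing = toy.isna().sum()

# include="all" adds count/unique/top/freq rows for non-numeric columns
summary = toy.describe(include="all")

# Category frequencies, keeping missing values as their own bucket
category_counts = toy["category"].value_counts(dropna=False)

print(missing)
print(summary)
print(category_counts)
```

On the real data, the same three calls can be applied to `data` instead of `toy`.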

In [4]:
# Visualize the distribution of billboard sizes
plt.figure(figsize=(10, 6))
sns.histplot(data['size'], bins=30, kde=True)
plt.title('Distribution of Billboard Sizes')
plt.xlabel('Size')
plt.ylabel('Count')
plt.show()
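
A histogram shows the shape of the size distribution, and a few numbers can back up what the plot suggests. The sketch below computes quartiles, skewness, and outliers using the common 1.5 × IQR rule. The `sizes` values are hypothetical, and the 1.5 × IQR threshold is an assumed convention rather than something this dataset prescribes.

```python
import pandas as pd

# Hypothetical size values; replace with data['size'] on the real dataset.
sizes = pd.Series([10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 200.0], name="size")

# Quartiles and interquartile range
q1 = sizes.quantile(0.25)
q3 = sizes.quantile(0.75)
iqr = q3 - q1

# Flag values outside the 1.5 * IQR fences (an assumed, conventional threshold)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sizes[(sizes < lower_fence) | (sizes > upper_fence)]

# Positive skewness indicates a long right tail
skewness = sizes.skew()

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"skewness={skewness:.2f}")
print(outliers)
```

If the skewness is strongly positive, plotting the histogram on a log scale (for example with `log_scale=True` in `sns.histplot`) may make the bulk of the distribution easier to read.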

In [5]:
# Visualize the count of billboards by category
plt.figure(figsize=(12, 6))
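# Pass the frame and column by keyword; recent seaborn versions no longer treat a positional Series as x
sns.countplot(data=data, x='category')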
plt.title('Count of Billboards by Category')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

## Conclusion

This notebook loaded the billboard detection dataset, reviewed its summary statistics, and plotted the distribution of billboard sizes and the number of billboards per category. Further analysis, such as checking for missing values or examining how size varies across categories, could build on these plots.