# 📦 Simple Data Loading

<div style="background-color: #e3f2fd; padding: 15px; border-radius: 5px; border-left: 5px solid #2196F3;">
<b>📓 Information</b><br>
<b>Level:</b> Basic<br>
<b>Time:</b> 10 minutes<br>
<b>Dataset:</b> Wine (sklearn)
</div>

## 🎯 Objectives
- ✅ Create a `DBDataset` in the simplest way
- ✅ Understand the automatic train/test split
- ✅ Control `test_size` and `random_state`

In [None]:
import pandas as pd
from sklearn.datasets import load_wine
from deepbridge import DBDataset

# Load Wine dataset
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target

print(f"Dataset: {df.shape}")
df.head()

## Automatic Split (80/20)

In [None]:
# Create DBDataset - Automatic split!
dataset = DBDataset(
    data=df,  # Complete DataFrame
    target_column='target',  # Target column name
    test_size=0.2,  # 20% for test
    random_state=42  # Reproducibility
)

print("✅ DBDataset created!")
print(f"Train: {len(dataset.train_data)} samples")
print(f"Test: {len(dataset.test_data)} samples")
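
For comparison, the same kind of 80/20 split can be sketched directly with scikit-learn. This is only an illustration of the concept, under the assumption that `DBDataset` performs a comparable shuffled split; the notebook does not show its internals, and the names `train_df` and `test_df` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Rebuild the same DataFrame used in this notebook
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Manual 80/20 split (assumption: comparable to what DBDataset does for us)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

print(f"Train: {len(train_df)} samples")
print(f"Test: {len(test_df)} samples")
```

With `DBDataset`, this step is handled at construction time, so the split data is available right away through `train_data` and `test_data`.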

## Explore Properties

In [None]:
# Access data
print("📊 DBDataset Properties:\n")
print(f"Features: {dataset.features[:5]}...")  # First 5
print(f"Target: {dataset.target_name}")
print(f"Categorical features: {dataset.categorical_features}")
print(f"Numerical features: {len(dataset.numerical_features)}")
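
As a sanity check on these properties, the column types of the Wine data can be inspected with plain pandas. This is a dtype-based sketch only; how `DBDataset` actually decides which features are categorical is not shown in this notebook, and the names `numeric_cols` and `non_numeric_cols` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Feature columns only (no target)
wine = load_wine()
features = pd.DataFrame(wine.data, columns=wine.feature_names)

# Split columns by dtype: numeric vs. everything else
numeric_cols = features.select_dtypes(include="number").columns.tolist()
non_numeric_cols = features.select_dtypes(exclude="number").columns.tolist()

print(f"Numeric columns: {len(numeric_cols)}")
print(f"Non-numeric columns: {non_numeric_cols}")
```

Every Wine feature is a floating-point measurement, so a dtype-based check finds no non-numeric columns.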

## Different test_size Values

In [None]:
# Compare different splits
for test_size in [0.1, 0.2, 0.3]:
    ds = DBDataset(data=df, target_column='target', test_size=test_size, random_state=42)
    print(f"test_size={test_size}: Train={len(ds.train_data)}, Test={len(ds.test_data)}")
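
The sizes printed above can be estimated by hand. The sketch below assumes the scikit-learn rounding convention, where the test set gets `ceil(test_size * n_samples)` rows and the training set gets the remainder. Whether `DBDataset` rounds the same way is an assumption, and `expected_split_sizes` is a hypothetical helper, not part of deepbridge.

```python
import math

def expected_split_sizes(n_samples, test_size):
    """Return (n_train, n_test), rounding the test set up as scikit-learn does."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_samples = 178  # number of rows in the Wine dataset
for test_size in [0.1, 0.2, 0.3]:
    n_train, n_test = expected_split_sizes(n_samples, test_size)
    print(f"test_size={test_size}: Train={n_train}, Test={n_test}")
```

Because 178 is not a multiple of 10, the actual test fraction is slightly larger than the requested `test_size`.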

## 🎉 Conclusion
- ✅ `DBDataset` creates the train/test split automatically
- ✅ `test_size` controls the proportion of data held out for testing
- ✅ `random_state` makes the split reproducible
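
The effect of `random_state` can be illustrated with scikit-learn's `train_test_split` on a plain index array. This is a sketch of the general idea, assuming `DBDataset` forwards `random_state` to a seeded shuffle in a similar way; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(178)  # one index per Wine sample

# Same seed twice, then a different seed
_, test_a = train_test_split(indices, test_size=0.2, random_state=42)
_, test_b = train_test_split(indices, test_size=0.2, random_state=42)
_, test_c = train_test_split(indices, test_size=0.2, random_state=7)

same_seed_match = np.array_equal(test_a, test_b)
different_seed_match = np.array_equal(test_a, test_c)

print(f"Same seed, same test rows: {same_seed_match}")
print(f"Different seed, same test rows: {different_seed_match}")
```

Fixing the seed means anyone rerunning the notebook evaluates on the same held-out rows, which keeps reported metrics comparable across runs.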

**Next:** `02_pre_separated_data.ipynb`