# Create the Dataset - Apples, Oranges, and Lemons
This notebook creates a synthetic dataset representing three types of fruit:
- Apples
- Oranges
- Lemons

Each fruit is described by two features:
- **Weight (grams)**
- **Size (cm)**

Outliers are included to add complexity. The visualization maps **Size (cm)** to the x-axis and **Weight (grams)** to the y-axis and uses fruit images as data points.
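
One way such outliers could be added is sketched here. The helper `add_outliers`, its parameters (`n_outliers`, `scale`, `seed`), and the small `demo` frame are hypothetical and not part of the notebook. The sketch assumes that an outlier is simply a point drawn around a fruit's mean with a much larger spread than the rest of that fruit's samples.

```python
import numpy as np
import pandas as pd


def add_outliers(df, n_outliers=5, scale=3.0, seed=0):
    """Return a copy of df with n_outliers extra rows per label.

    Hypothetical helper: each extra row is drawn around the label's mean,
    using a standard deviation `scale` times larger than the label's own,
    so the new points are more likely to fall far from the cluster.
    """
    rng = np.random.default_rng(seed)
    extra = []
    for label, group in df.groupby("Label"):
        extra.append(pd.DataFrame({
            "Size": rng.normal(group["Size"].mean(),
                               scale * group["Size"].std(), n_outliers),
            "Weight": rng.normal(group["Weight"].mean(),
                                 scale * group["Weight"].std(), n_outliers),
            "Label": label,
        }))
    return pd.concat([df, *extra], ignore_index=True)


# Tiny stand-in dataset with the same columns as the notebook's DataFrame
demo = pd.DataFrame({
    "Size": [7.4, 7.6, 8.4, 8.6, 6.4, 6.6],
    "Weight": [178.0, 182.0, 148.0, 152.0, 128.0, 132.0],
    "Label": ["Apple", "Apple", "Orange", "Orange", "Lemon", "Lemon"],
})
with_outliers = add_outliers(demo, n_outliers=2)
print(with_outliers["Label"].value_counts())
```

Wider-variance draws are not guaranteed to be extreme, so a stricter variant might reject any draw that falls within a chosen distance of the group mean.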

In [1]:
import numpy as np
import pandas as pd

# 1. Generate Data
sample_size = 100 # samples per fruit 
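np.random.seed(42)  # fixed seed so the dataset is reproducible

# Size (cm): normally distributed around each fruit's mean, std 0.3
apple_size = np.random.normal(7.5, 0.3, sample_size)
orange_size = np.random.normal(8.5, 0.3, sample_size)
lemon_size = np.random.normal(6.5, 0.3, sample_size)

# Weight (grams): normally distributed around each fruit's mean, std 10
apple_weight = np.random.normal(180, 10, sample_size)
orange_weight = np.random.normal(150, 10, sample_size)
lemon_weight = np.random.normal(130, 10, sample_size)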

sizes = np.concatenate([apple_size, orange_size, lemon_size])
weights = np.concatenate([apple_weight, orange_weight, lemon_weight])
labels = ["Apple"] * sample_size + ["Orange"] * sample_size + ["Lemon"] * sample_size

# 2. Combine Data into DataFrame
data = pd.DataFrame({"Size": sizes, "Weight": weights, "Label": labels})

# Display a random sample of 10 rows from the dataset
display(data.sample(10))

# Save the data to CSV without the index column
data.to_csv("fruits.csv", index=False)

| | Size | Weight | Label |
|---|---|---|---|
| 79 | 6.903729 | 171.917017 | Apple |
| 12 | 7.572589 | 188.254163 | Apple |
| 204 | 6.086699 | 123.493574 | Lemon |
| 137 | 8.403382 | 136.198985 | Orange |
| 99 | 7.429624 | 192.378163 | Apple |
| 47 | 7.817137 | 165.925362 | Apple |
| 205 | 6.218652 | 125.128746 | Lemon |
| 15 | 7.331314 | 180.210038 | Apple |
| 42 | 7.465306 | 182.449666 | Apple |
| 190 | 8.366046 | 149.920274 | Orange |
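
A random sample of 10 rows only hints at how the classes are distributed. As a quick sanity check, the per-fruit mean and standard deviation can be compared against the parameters passed to `np.random.normal`. The sketch below rebuilds the dataset with the same seed and draw order as the cell above so that it runs on its own; the names `size_means`, `weight_means`, and `summary` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset with the same seed and draw order as the cell above:
# all sizes first (Apple, Orange, Lemon), then all weights.
sample_size = 100
np.random.seed(42)
size_means = {"Apple": 7.5, "Orange": 8.5, "Lemon": 6.5}
weight_means = {"Apple": 180, "Orange": 150, "Lemon": 130}

sizes = np.concatenate(
    [np.random.normal(m, 0.3, sample_size) for m in size_means.values()]
)
weights = np.concatenate(
    [np.random.normal(m, 10, sample_size) for m in weight_means.values()]
)
labels = np.repeat(list(size_means), sample_size)

data = pd.DataFrame({"Size": sizes, "Weight": weights, "Label": labels})

# Per-fruit mean and standard deviation of both features
summary = data.groupby("Label")[["Size", "Weight"]].agg(["mean", "std"]).round(2)
print(summary)
```

If the means land close to the configured values and the standard deviations near 0.3 and 10, the generation step behaved as intended.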
