# Data Generation and Splitting

In this section, we generated synthetic data for predicting the surface area of a cylinder and split it into training and testing sets. The steps were:

- **Data Generation**: We created a synthetic dataset of 50,000 samples, drawing the cylinder radius and height independently from a uniform distribution on [0, 100). The surface area was computed as 2πr(r + h), the total area of a closed cylinder (side plus both end caps).

- **Data Saving**: The complete dataset was saved as `Dataset/complete_dataset.csv`. We then split it into training and testing sets (80/20) using scikit-learn's `train_test_split` with `random_state=42`, so the split is reproducible.

- **Data Export**: The training and testing sets were exported as `Dataset/train.csv` and `Dataset/test.csv`, respectively, for use in machine learning models.
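
For reference, the closed-cylinder formula that the code applies can be written out as below. The symbols $r$ (radius), $h$ (height), and $A$ (surface area) are chosen here for illustration; the code names them `radius`, `height`, and `surface_area`.

$$
A = \underbrace{2\pi r^2}_{\text{two end caps}} + \underbrace{2\pi r h}_{\text{side}} = 2\pi r\,(r + h)
$$

As a quick worked case, $r = 1$ and $h = 2$ give $A = 2\pi \cdot 1 \cdot (1 + 2) = 6\pi \approx 18.85$.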


In [2]:
import os

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Generate data: radius and height drawn uniformly from [0, 100)
num_samples = 50000
rng = np.random.default_rng(42)  # seeded so the generated data is reproducible
radius = rng.uniform(low=0, high=100, size=num_samples)
height = rng.uniform(low=0, high=100, size=num_samples)
# Total surface area of a closed cylinder: 2*pi*r*(h + r)
surface_area = 2 * np.pi * radius * (height + radius)

data = {'radius': radius, 'height': height, 'surface_area': surface_area}
df = pd.DataFrame(data)

# Save the complete dataset (create the output directory if it is missing)
os.makedirs('Dataset', exist_ok=True)
df.to_csv('Dataset/complete_dataset.csv', index=False)

# Split data into train and test sets (80/20)
train, test = train_test_split(df, test_size=0.2, random_state=42)
train.to_csv('Dataset/train.csv', index=False)
test.to_csv('Dataset/test.csv', index=False)
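
A quick sanity check of the generate-and-split steps might look like the sketch below. It is not part of the original notebook: the helper `make_cylinder_frame`, the `_check` variable names, and the smaller sample size of 1,000 are assumptions made for illustration, and everything stays in memory so no files are written.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_cylinder_frame(num_samples, seed=0):
    """Hypothetical helper: build a small cylinder dataset in memory."""
    rng = np.random.default_rng(seed)
    radius = rng.uniform(low=0, high=100, size=num_samples)
    height = rng.uniform(low=0, high=100, size=num_samples)
    surface_area = 2 * np.pi * radius * (height + radius)
    return pd.DataFrame(
        {'radius': radius, 'height': height, 'surface_area': surface_area}
    )


df_check = make_cylinder_frame(1000)
train_check, test_check = train_test_split(
    df_check, test_size=0.2, random_state=42
)

# The 80/20 split should yield 800 training rows and 200 test rows.
split_sizes = (len(train_check), len(test_check))

# No row should appear in both sets.
overlap = train_check.index.intersection(test_check.index)

# The stored target should match the closed-cylinder formula row by row.
recomputed = 2 * np.pi * df_check['radius'] * (
    df_check['height'] + df_check['radius']
)
formula_ok = bool(np.allclose(recomputed, df_check['surface_area']))

print(split_sizes, len(overlap), formula_ok)
```

Running the same three checks on the exported CSV files (after loading them with `pd.read_csv`) would be a natural extension, since it would also confirm that the round trip through disk preserved the columns.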



