# Synthetic Data Generation for Classification Problems

<img align="left" width="130" src="https://raw.githubusercontent.com/PacktPublishing/Amazon-SageMaker-Cookbook/master/Extra/cover-small-padded.png"/>

This notebook contains the code for one of the recipes in the book [Machine Learning with Amazon SageMaker Cookbook: 80 proven recipes for data scientists and developers to perform ML experiments and deployments](https://www.amazon.com/Machine-Learning-Amazon-SageMaker-Cookbook/dp/1800567030).

### How to do it...

In [None]:
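from sklearn.datasets import make_blobs

# Generate 5,000 two-feature points around 2 centers (one per class);
# cluster_std sets a different spread for each cluster.
X, y = make_blobs(n_samples=5000, centers=2,
                  cluster_std=[6, 4], n_features=2,
                  random_state=40)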
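
As a quick sanity check on the generated data, the sketch below repeats the same `make_blobs` call and inspects the array shapes and the number of samples per class. The `numpy` import and the names `labels` and `counts` are additions for illustration and are not part of the recipe.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the recipe above.
X, y = make_blobs(n_samples=5000, centers=2,
                  cluster_std=[6, 4], n_features=2,
                  random_state=40)

# X holds one row per sample and one column per feature;
# y holds the integer cluster (class) label of each sample.
print(X.shape, y.shape)

# Count how many samples fall in each class.
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

With an integer `n_samples` and two centers, `make_blobs` divides the samples evenly, so each class should receive half of the 5,000 points.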

In [None]:
import pandas as pd

# Put the label in the first column, followed by the two features.
all_dataset = pd.DataFrame(
    dict(label=y, a=X[:, 0], b=X[:, 1]))
print(all_dataset)

In [None]:
from matplotlib import pyplot

# One color per class label.
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = all_dataset.groupby('label')

# Draw each class as its own scatter series on the shared axes.
for key, group in grouped:
    group.plot(ax=ax, kind='scatter',
               x='a', y='b',
               label=key,
               color=colors[key])

pyplot.show()

In [None]:
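from sklearn.model_selection import train_test_split

# Hold out 20% of the data as the test set.
train_val, test = train_test_split(all_dataset,
                                   test_size=0.2,
                                   random_state=0)

# Split the remaining 80% into training (75%) and validation (25%),
# which gives a 60/20/20 split of the full dataset.
training, validation = train_test_split(train_val,
                                        test_size=0.25,
                                        random_state=0)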
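
The two chained `train_test_split` calls work out to a 60/20/20 split: 20% of all rows go to the test set, then 25% of the remaining 80% (20% overall) goes to validation. The sketch below confirms the resulting sizes on a placeholder frame with 5,000 rows; the frame `df` and the dictionary `sizes` are illustrative names, not part of the recipe.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame with the same number of rows as the recipe's dataset
# (the column values here are illustrative only).
df = pd.DataFrame({"label": [i % 2 for i in range(5000)],
                   "a": range(5000),
                   "b": range(5000)})

# First split: 20% of all rows go to the test set.
train_val, test = train_test_split(df, test_size=0.2, random_state=0)

# Second split: 25% of the remaining 80% (20% overall) goes to validation.
training, validation = train_test_split(train_val, test_size=0.25,
                                        random_state=0)

sizes = {"training": len(training),
         "validation": len(validation),
         "test": len(test)}
print(sizes)
```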

In [None]:
training

In [None]:
validation

In [None]:
test

In [None]:
!mkdir -p tmp

In [None]:
# Write each split without a header row or index column;
# the label stays in the first column.
training.to_csv('tmp/training_data.csv', header=False, index=False)
validation.to_csv('tmp/validation_data.csv', header=False, index=False)
test.to_csv('tmp/test_data.csv', header=False, index=False)
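
Writing with `header=False` and `index=False` produces plain `label,a,b` rows. Several SageMaker built-in algorithms that accept CSV training input (XGBoost and Linear Learner, for example) expect this layout: no header row and the target in the first column. The sketch below shows the round trip on a tiny made-up frame written to a temporary directory; the names `sample`, `raw_lines`, and `restored` are illustrative and do not appear in the recipe.

```python
import os
import tempfile

import pandas as pd

# Tiny illustrative frame with the same column order as the recipe.
sample = pd.DataFrame({"label": [0, 1], "a": [1.5, -2.0], "b": [0.25, 3.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    sample.to_csv(csv_path, header=False, index=False)

    # Raw file contents: one "label,a,b" row per sample, no header line.
    with open(csv_path) as f:
        raw_lines = f.read().splitlines()

    # Read it back; header=None because the file has no header row.
    restored = pd.read_csv(csv_path, header=None)

print(raw_lines)
print(restored)
```

Because the file carries no column names, `read_csv` labels the columns `0`, `1`, and `2`, with the class label in column `0`.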

In [None]:
# Replace with the name of an S3 bucket you own.
s3_bucket = "sagemaker-cookbook-bucket"
prefix = "chapter05"
path = f"s3://{s3_bucket}/{prefix}/input"

In [None]:
!aws s3 cp tmp/training_data.csv {path}/training_data.csv
!aws s3 cp tmp/validation_data.csv {path}/validation_data.csv
!aws s3 cp tmp/test_data.csv {path}/test_data.csv
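
The `aws s3 cp` commands above could also be expressed in Python with `boto3`. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured; the helpers `build_s3_key` and `upload_splits` are hypothetical names introduced here and are not part of the recipe.

```python
def build_s3_key(prefix, filename):
    """Return the object key used by the recipe: <prefix>/input/<filename>."""
    return f"{prefix}/input/{filename}"


def upload_splits(bucket, prefix, local_dir="tmp"):
    """Upload the three CSV files to S3 (requires AWS credentials)."""
    import boto3  # imported here so build_s3_key works without boto3

    s3 = boto3.client("s3")
    for name in ("training_data.csv", "validation_data.csv", "test_data.csv"):
        # upload_file(Filename, Bucket, Key)
        s3.upload_file(f"{local_dir}/{name}", bucket, build_s3_key(prefix, name))


# Only the key is computed here; nothing is uploaded until
# upload_splits is called explicitly.
print(build_s3_key("chapter05", "training_data.csv"))
```

Calling `upload_splits(s3_bucket, prefix)` with the values defined earlier should place the files at the same S3 locations as the three CLI commands.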