Building a model with the SageMaker BlazingText built-in algorithm to predict whether an SMS message is spam or not. BlazingText is a variant of fastText, which in turn builds on word2vec. The dataset used for the model is stored in a public S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/daria-hlibova-test?region=us-east-2&tab=objects. If the bucket cannot be opened, the train and validation datasets are also included in this repository (train.csv and val.csv).
The repository covers the following steps:
- Downloading the dataset from the S3 bucket.
- Transforming and preparing the dataset, because the BlazingText algorithm requires a special data format (in supervised mode, each line is the label prefixed with `__label__`, followed by the space-separated tokens of the message).
- Splitting the data into train and validation datasets.
- Uploading the train and validation datasets to the S3 bucket.
- Building and training the model.
- Deploying the model to an endpoint.
- Making predictions and testing the model.
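The transformation and splitting steps above can be sketched in plain Python. This is a minimal sketch, not the code from this repository: it assumes a hypothetical input CSV with a header row and two columns (label first, then message text), labels such as ham/spam, a simple lowercase regex tokenizer, and a seeded random split. The helper names `to_blazingtext_line`, `read_rows`, `prepare`, and `write_lines` are hypothetical.

```python
import csv
import io
import random
import re

# Assumption: input CSV has a header row, then (label, message text) columns.
# Helper names below are hypothetical, not taken from this repository.

# A token is a run of letters/digits/apostrophes, or a single punctuation mark.
TOKEN_RE = re.compile(r"[a-z0-9']+|[^\sa-z0-9']")


def to_blazingtext_line(label, text):
    """Format one (label, message) pair as a BlazingText supervised-mode line."""
    # Lowercase the message and split punctuation into separate tokens,
    # so every token is separated by a single space.
    tokens = TOKEN_RE.findall(text.lower())
    return "__label__" + label.strip().lower() + " " + " ".join(tokens)


def read_rows(csv_file):
    """Yield (label, text) pairs from a CSV file object with a header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) >= 2:
            yield row[0], row[1]


def prepare(rows, val_fraction=0.2, seed=42):
    """Convert rows to BlazingText lines and split them into train/validation."""
    lines = [to_blazingtext_line(label, text) for label, text in rows]
    random.Random(seed).shuffle(lines)  # reproducible shuffle before splitting
    n_val = int(len(lines) * val_fraction)
    return lines[n_val:], lines[:n_val]


def write_lines(path, lines):
    """Write one BlazingText example per line (e.g. to a train or validation file)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    sample = io.StringIO(
        "label,text\n"
        "ham,Ok lar... Joking wif u oni\n"
        "spam,WINNER!! Claim your prize now.\n"
    )
    train, val = prepare(read_rows(sample), val_fraction=0.5)
    print(train)
    print(val)
```

The resulting train and validation files would then be uploaded to S3 and passed to the BlazingText estimator as the train and validation channels, with the `mode` hyperparameter set to `supervised`. Shuffling with a fixed seed keeps the split reproducible between runs.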