
Files

notebook_ims
source_pytorch
source_sklearn
1_Data_Exploration.ipynb
2_Plagiarism_Feature_Engineering.ipynb
3_Training_a_Model.ipynb
Plagiarism Detector Review.md
README.md
helpers.py
problem_unittests.py


Plagiarism Project, Machine Learning Deployment Using SageMaker


Project

The goal of this project is to build a plagiarism detector that examines a text file and performs binary classification, labeling the file as either plagiarized or not plagiarized depending on how similar it is to a provided source text.
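
To make the binary labels concrete, here is a minimal Python sketch that maps a per-file plagiarism category to a class label. The category names (`non`, `cut`, `light`, `heavy`, `orig`) and the function `to_class_label` are hypothetical and are used only for illustration. They are not taken from this repository's code.

```python
# Hypothetical category names; "orig" marks the source texts themselves,
# which are compared against but not classified.
PLAGIARIZED_CATEGORIES = {"cut", "light", "heavy"}


def to_class_label(category):
    """Map a category to 1 (plagiarized) or 0 (not); source texts get None."""
    if category == "orig":
        return None
    if category == "non":
        return 0
    if category in PLAGIARIZED_CATEGORIES:
        return 1
    raise ValueError(f"unknown category: {category!r}")


if __name__ == "__main__":
    for cat in ["non", "cut", "light", "heavy", "orig"]:
        print(cat, to_class_label(cat))
```

Keeping the source texts out of the labeled set (returning `None`) reflects that they are the reference each answer is compared against, not data points to classify.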

There are three main notebooks:

Notebook 1: Data Exploration

  • Load in the corpus of plagiarism text data.
  • Explore the existing data features and the data distribution.
  • This first notebook is not required in your final project submission.
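
A minimal sketch of the exploration step using only the Python standard library. It assumes a hypothetical metadata CSV with one row per answer file and the columns `File`, `Task`, and `Category`. The inline sample rows are made up for illustration and are not the project's data.

```python
import csv
import io
from collections import Counter

# Made-up sample in the assumed layout: one row per file, with the task it
# answers and its plagiarism category. Column names are hypothetical.
SAMPLE_CSV = """File,Task,Category
student1_taska.txt,a,non
student1_taskb.txt,b,cut
student2_taska.txt,a,light
student2_taskb.txt,b,heavy
orig_taska.txt,a,orig
"""


def load_corpus_info(fileobj):
    """Read the per-file metadata into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))


def summarize(rows):
    """Count files per plagiarism category and per task to see the distribution."""
    by_category = Counter(row["Category"] for row in rows)
    by_task = Counter(row["Task"] for row in rows)
    return by_category, by_task


if __name__ == "__main__":
    rows = load_corpus_info(io.StringIO(SAMPLE_CSV))
    categories, tasks = summarize(rows)
    print(categories.most_common())
    print(sorted(tasks.items()))
```

Counting by category shows whether the classes are balanced. Counting by task shows whether every task is represented.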

Notebook 2: Feature Engineering

  • Clean and pre-process the text data.
  • Define features for comparing the similarity of an answer text and a source text, and extract similarity features.
  • Select "good" features by analyzing the correlations between different features.
  • Create train/test .csv files that hold the relevant features and class labels for train/test data points.
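
The feature-engineering steps above can be sketched in plain Python. This is a minimal illustration that rests on several assumptions. The containment score (shared word n-grams divided by the answer's n-gram count) and the word-level longest-common-subsequence score are common similarity measures chosen here for illustration. The 0.9 correlation cutoff is arbitrary. Every function name (`preprocess`, `containment`, `lcs_norm_word`, `select_uncorrelated`, `write_feature_csv`) is hypothetical and is not taken from `helpers.py` or the notebooks. Writing the label in the first column with no header is also an assumption about the CSV layout the training step expects.

```python
import csv
import io
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple of words)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def containment(answer, source, n):
    """Share of the answer's n-grams (with multiplicity) that also occur in the source."""
    a_counts, s_counts = ngram_counts(answer, n), ngram_counts(source, n)
    total = sum(a_counts.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    return sum((a_counts & s_counts).values()) / total


def lcs_norm_word(answer, source):
    """Word-level longest common subsequence, divided by the answer length."""
    prev = [0] * (len(source) + 1)
    for a_word in answer:
        curr = [0]
        for j, s_word in enumerate(source, start=1):
            if a_word == s_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(answer) if answer else 0.0


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0


def select_uncorrelated(features, threshold=0.9):
    """Greedily keep features whose |r| with every already-kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def write_feature_csv(fileobj, labels, feature_rows):
    """Write one row per data point: class label first, then features, no header."""
    writer = csv.writer(fileobj)
    for label, feats in zip(labels, feature_rows):
        writer.writerow([label, *feats])


if __name__ == "__main__":
    answer = preprocess("The cat sat on the mat.")
    source = preprocess("The dog sat on the mat.")
    feats = [containment(answer, source, 1), containment(answer, source, 2),
             lcs_norm_word(answer, source)]
    out = io.StringIO()
    write_feature_csv(out, [1], [feats])
    print(out.getvalue())
```

The greedy selection keeps the first feature of any highly correlated group. Nearly duplicate features add little information for a small classifier.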

Notebook 3: Train and Deploy Your Model in SageMaker

  • Upload your train/test feature data to S3.
  • Define a binary classification model and a training script.
  • Train your model and deploy it using SageMaker.
  • Evaluate your deployed classifier.
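
Here is a minimal, standard-library sketch of what a training script and its evaluation step might look like. It relies on several assumptions:

- The script reads its model and input directories from the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables, which SageMaker sets inside training containers (the latter assumes the input channel is named `train`). It falls back to local paths when they are absent.
- The data is a headerless CSV with the label in the first column.
- The model is a plain logistic regression fit by batch gradient descent, chosen only for illustration.

The project's actual scripts live in `source_sklearn` and `source_pytorch` and are not reproduced here. All function names below are hypothetical. Uploading to S3 and deploying an endpoint are not shown.

```python
import argparse
import csv
import io
import math
import os


def parse_args(argv=None):
    """Read paths from SageMaker-style environment variables, with local fallbacks."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--lr", type=float, default=0.5)
    # Tolerate extra arguments (e.g. other hyperparameters) the launcher may pass.
    return parser.parse_known_args(argv)[0]


def read_label_first_csv(fileobj):
    """Split a headerless CSV into integer labels (first column) and float features."""
    labels, rows = [], []
    for record in csv.reader(fileobj):
        labels.append(int(float(record[0])))
        rows.append([float(v) for v in record[1:]])
    return labels, rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, rows):
    """Label a row 1 (plagiarized) when its predicted probability is at least 0.5."""
    weights, bias = model
    return [int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0) for x in rows]


def evaluate(labels, preds):
    """Accuracy, precision, and recall for binary labels, with 1 as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    args = parse_args()
    # Demo on a tiny in-memory dataset; a real script would read from args.data_dir.
    data = io.StringIO("1,0.9,0.8\n1,0.8,0.9\n0,0.1,0.2\n0,0.2,0.1\n")
    labels, rows = read_label_first_csv(data)
    model = train_logistic(rows, labels, args.epochs, args.lr)
    print(evaluate(labels, predict(model, rows)))
```

For a deployed classifier, the same `evaluate` function would be applied to the test labels and the predictions returned by the endpoint. Precision and recall are reported alongside accuracy because a detector that misses plagiarized files can still look accurate on an imbalanced test set.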


Last modified: 8 November 2019
