This is the final solution for the Plagiarism Detector project. It examines a text file and performs binary classification, labeling the file as either plagiarized or not depending on how similar it is to a provided source text.
This project is broken down into three main notebooks:
- Notebook 1: Data Exploration
- Notebook 2: Feature Engineering
- Notebook 3: Training and Deploying Model in SageMaker
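
As a rough illustration of the kind of similarity feature the feature engineering step might produce, the sketch below computes an n-gram containment score between an answer text and a source text. The containment measure, the function names `ngrams` and `containment`, and the sample strings are assumptions made for this example; they are not taken from the notebooks themselves.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(answer_text, source_text, n=1):
    """Fraction of the answer's distinct n-grams that also occur in the source."""
    answer_grams = ngrams(answer_text, n)
    if not answer_grams:
        # An empty (or too short) answer shares nothing with the source.
        return 0.0
    source_grams = ngrams(source_text, n)
    return len(answer_grams & source_grams) / len(answer_grams)


# Hypothetical sample texts, for illustration only.
source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over a sleeping dog"
original = "a completely different sentence about cats"

print(containment(copied, source, n=2))    # 0.625
print(containment(original, source, n=2))  # 0.0
```

A score near 1 means most of the answer's phrases also appear in the source, and a score near 0 means almost none do. Scores like these, possibly computed for several values of `n`, could then serve as input features for the classifier.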
This project uses the following software and Python libraries:
- Python
- NumPy
- pandas
- scikit-learn
- RandomForestClassifier (scikit-learn's random forest model)
- Amazon SageMaker