MOSS-UTFSM is a tool that helps teachers and teaching assistants at the Universidad Técnica Federico Santa María detect plagiarism in homework. It uses Stanford's MOSS (Measure Of Software Similarity) system for detection and the Amazon S3 service to store the plagiarism reports.
- Python 3 or later
- MOSS User ID, you can get one here
- Amazon S3 Bucket for plagiarism reports storage
- Clone this repository and navigate into it:
  git clone https://github.com/VadokDev/MOSS-UTFSM
  cd MOSS-UTFSM
- Rename the .env.example file to .env
- Configure the environment variables in the .env file:
  - MOSS_ID=
  - ENDPOINT_URL=
  - AWS_ACCESS_KEY_ID=
  - AWS_SECRET_ACCESS_KEY=
  - REGION_NAME=
- Install the dependencies from requirements.txt (using a virtual environment is recommended):
  pip install -r requirements.txt
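Before the first run, it can help to confirm that every variable listed above has a value. The following is a minimal, standard-library-only sketch of such a check. It is not part of MOSS-UTFSM: the helper names `parse_env` and `missing_keys` are hypothetical, the sample values are placeholders, and the project itself may load the .env file in a different way.

```python
# Variable names taken from the .env template described above.
REQUIRED_KEYS = (
    "MOSS_ID",
    "ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "REGION_NAME",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


# Placeholder content standing in for a partially filled .env file.
sample = "MOSS_ID=123456\nENDPOINT_URL=\n# comment\nREGION_NAME=us-east-1\n"
print(missing_keys(parse_env(sample)))
```

To check a real file, pass its contents instead of the sample string, for example `parse_env(open(".env", encoding="utf-8").read())`.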
- Create a new Course folder inside data to store the students and homework files, for example, CSJ-INF131-2021-01.
- Create a students folder inside the Course folder and put the student lists exported from SIGA (.xls) in it, for example, data/CSJ-INF131-2021-01/students.
- Download the students' homework zip file from AULA and extract it into a new folder, named after the homework, inside the Course folder, for example, data/CSJ-INF131-2021-01/T1.
- Run the program with:
  python main.py [Language] [CourseFolder] [Homework] [SimilarityPercent]
  Where:
  - Language: the programming language used for plagiarism detection; see the list of available languages in MossService.py
  - CourseFolder: name of the folder created in data to store the student and homework files
  - Homework: name of the homework folder inside the CourseFolder
  - SimilarityPercent: similarity percentage threshold used to classify two homework submissions as plagiarism
  Example:
  python main.py python CSJ-INF131-2021-01 T1 60
- Copy the MOSS URL output; that URL stores the complete MOSS plagiarism report.
- Go to the results folder inside the Course folder and open the folder named after the homework (for example, data/CSJ-INF131-2021-01/results/T1). There you'll find:
  - web/: folder with the entire MOSS report website
  - [section]/: folder with the high-similarity homework of students in [section] (for example, Section 201 students)
  - inter/: folder with the high-similarity homework of students from different sections
  - not found/: folder with the high-similarity homework of students who weren't found in the students .xls files
  - [homework] [timestamp].xls: Excel report detailing the high-similarity homework and their students
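The way flagged homework ends up in the [section]/, inter/ and not found/ folders can be illustrated with a short sketch. This is not the project's actual code: the function names, the data shapes (a student-to-section dictionary and `(student_a, student_b, similarity)` tuples), the sample names, and the choice to treat a similarity equal to the threshold as flagged are all assumptions made for illustration.

```python
def result_folder(student_a, student_b, sections):
    """Pick the results subfolder for one flagged pair of students."""
    section_a = sections.get(student_a)
    section_b = sections.get(student_b)
    # A student missing from the .xls lists sends the pair to "not found".
    if section_a is None or section_b is None:
        return "not found"
    # Both students in the same section: use that section's folder.
    if section_a == section_b:
        return str(section_a)
    # Students from different sections.
    return "inter"


def group_flagged_pairs(pairs, sections, threshold):
    """Group pairs whose similarity reaches the threshold by results folder."""
    grouped = {}
    for student_a, student_b, similarity in pairs:
        # Assumption: a similarity equal to the threshold counts as flagged.
        if similarity < threshold:
            continue
        folder = result_folder(student_a, student_b, sections)
        grouped.setdefault(folder, []).append((student_a, student_b, similarity))
    return grouped


# Made-up students and similarity values, for illustration only.
sections = {"ana": "201", "ben": "201", "carla": "202"}
pairs = [
    ("ana", "ben", 85),
    ("ana", "carla", 72),
    ("ben", "dario", 64),
    ("carla", "ben", 40),
]
for folder, flagged in group_flagged_pairs(pairs, sections, threshold=60).items():
    print(folder, flagged)
```

With a threshold of 60, the pair below the threshold is dropped and the remaining three pairs fall into a section folder, inter, and not found respectively.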
- Make the program easier to use
- Improve the code (sorry, it's awful)
- Add testing
- Add a web interface
- Group same students in just one row
- Charts
- Improve this README (any suggestion is welcome)
- Improve my English (any suggestion is welcome x2)
- @soachishti - For his moss.py interface for MOSS
- @cristiancs - For helping me implement the plagiarism report storage on Amazon S3
- Stanford University - MOSS developers & maintainers
This project is licensed under the MIT License - see the LICENSE file for details.