Plagiarism Detection Script

This Python script checks a set of text documents for potential plagiarism. It represents each document as a Term Frequency-Inverse Document Frequency (TF-IDF) vector and computes the cosine similarity between every pair of documents. Pairs with high similarity are candidates for closer review.

Usage

Place the text documents (.txt files) in the same directory as the script.
Run the script:
```
python plagiarism_detection.py
```
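The script prints each pair of documents with its cosine similarity score. Because TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1 (identical term weighting); higher scores indicate greater overlap and therefore potential plagiarism.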

Requirements

Make sure to install the required dependencies before running the script:

pip install scikit-learn

How it works

The script reads all .txt files in the current directory and stores their content in a list (student_notes).
It then vectorizes the text using the TF-IDF vectorizer from scikit-learn.
Cosine similarity is calculated between each pair of documents to identify potential plagiarism.
The script outputs pairs of documents along with their cosine similarity scores.
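
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual source: the function name `score_pairs`, the UTF-8 encoding, and the output format are assumptions made for the example, while `TfidfVectorizer` and `cosine_similarity` are the scikit-learn utilities the description refers to.

```python
from itertools import combinations
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_pairs(names, texts):
    """Return (name_a, name_b, score) for every unordered pair of documents.

    Hypothetical helper for illustration; the real script may be organized differently.
    """
    # Each row of the matrix is the TF-IDF vector of one document.
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Entry [i, j] is the cosine similarity between documents i and j.
    similarity = cosine_similarity(tfidf)
    return [
        (names[i], names[j], float(similarity[i, j]))
        for i, j in combinations(range(len(names)), 2)
    ]


if __name__ == "__main__":
    # Read every .txt file in the current directory (UTF-8 is an assumption).
    paths = sorted(Path(".").glob("*.txt"))
    student_notes = [p.read_text(encoding="utf-8") for p in paths]
    if len(student_notes) < 2:
        print("Need at least two .txt files to compare.")
    else:
        for a, b, score in score_pairs([p.name for p in paths], student_notes):
            print(f"{a} vs {b}: {score:.3f}")
```

Comparing unordered pairs with `combinations` avoids reporting each pair twice or comparing a document with itself.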

Contributing

Feel free to contribute by submitting issues or pull requests.

License

This project is licensed under the MIT License.

Acknowledgments

The script uses scikit-learn for TF-IDF vectorization and cosine similarity calculations.
