<a href="https://colab.research.google.com/github/CodeCrafter-101/Plagiarism-Checker/blob/main/Plagiarism_Checker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Plagiarism-Checker
## Definition
- Plagiarism Checker is a Python tool that compares two or more text documents and reports how similar each pair is. It represents every document as a TF-IDF vector and computes the cosine similarity between those vectors, so a higher score indicates more textual overlap.
- Students, educators, and writers can use the similarity scores to spot document pairs that may warrant a closer look for plagiarism.
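
To make the idea of cosine similarity concrete, here is a minimal standard-library sketch. It is an illustration only: it uses raw word counts rather than the TF-IDF weights the notebook relies on, and the helper names `term_counts` and `cosine` are hypothetical, not part of the project.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(counts_a, counts_b):
    """Cosine similarity between two sparse term-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with any other
    return dot / (norm_a * norm_b)


a = term_counts("the cat sat on the mat")
b = term_counts("the cat lay on the rug")
print(round(cosine(a, b), 3))  # 0.75
```

The two sentences share "the" (twice), "cat", and "on", giving a dot product of 6 against a norm product of 8, hence 0.75. Identical texts score 1.0 and texts with no words in common score 0.0.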


In [12]:
# 📚 Plagiarism Checker using TF-IDF & Cosine Similarity
# Import the required modules

import os
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [13]:
# Get all .txt files in the current directory
def get_text_files():
    return [doc for doc in os.listdir() if doc.endswith('.txt')]

In [14]:
# Read the content of each file; Path.read_text closes the file after reading
from pathlib import Path

def read_files(file_list):
    return [Path(file).read_text(encoding='utf-8') for file in file_list]

In [15]:
# Vectorize using TF-IDF
def vectorize(texts):
    return TfidfVectorizer().fit_transform(texts).toarray()

In [16]:
# Calculate the cosine similarity between two document vectors
def similarity_score(vec1, vec2):
    return cosine_similarity([vec1], [vec2])[0][0]
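
The function above compares one pair at a time. As an alternative sketch, `cosine_similarity` can also be called with a single matrix, in which case it returns the similarity of every row against every other row. The sample texts here are hypothetical and chosen so that the third shares no vocabulary with the first two.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample texts; the third shares no words with the first two.
sample_texts = ["the cat sat", "the dog sat", "quantum field theory"]
vectors = TfidfVectorizer().fit_transform(sample_texts).toarray()

# With a single argument, every row is compared with every other row.
pairwise = cosine_similarity(vectors)

print(pairwise.shape)  # (3, 3)
print(pairwise[0, 2])  # no shared vocabulary, so the similarity is zero
```

The resulting matrix is symmetric with ones on the diagonal, so only the entries above the diagonal carry distinct pairwise scores.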

In [17]:
# Check for plagiarism across all file combinations
def check_plagiarism(files, vectors):
    results = set()
    for i in range(len(files)):
        for j in range(i + 1, len(files)):
            file_a, file_b = files[i], files[j]
            sim = similarity_score(vectors[i], vectors[j])
            results.add((file_a, file_b, round(sim * 100, 2)))
    return results
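
The nested loops visit each unordered pair of files exactly once. A small sketch of the same enumeration using `itertools.combinations` is shown here with hypothetical file names; it is an equivalent way to express the pairing, not a change to the function above.

```python
from itertools import combinations

# Hypothetical file names, used only to show which pairs get compared.
files = ["a.txt", "b.txt", "c.txt"]

# combinations(..., 2) yields each unordered pair of indices exactly once,
# matching the (i, j) pairs with i < j produced by the nested loops.
pairs = [(files[i], files[j]) for i, j in combinations(range(len(files)), 2)]
print(pairs)
# [('a.txt', 'b.txt'), ('a.txt', 'c.txt'), ('b.txt', 'c.txt')]
```

For n files this produces n(n-1)/2 comparisons, so the number of similarity scores grows quadratically with the number of documents.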

In [18]:
# Main runner
if __name__ == "__main__":
    student_files = get_text_files()

    if len(student_files) < 2:
        print("❌ Not enough text files to compare. Add at least two `.txt` files.")
    else:
        student_notes = read_files(student_files)
        vectors = vectorize(student_notes)

        print("\n📊 Plagiarism Results:")
        print("=" * 40)
        results = check_plagiarism(student_files, vectors)

        for file1, file2, score in sorted(results, key=lambda x: -x[2]):
            print(f"🔍 {file1} ⟷ {file2} → Similarity: {score}%")



📊 Plagiarism Results:
🔍 doc2.txt ⟷ doc1.txt → Similarity: 45.52%
