
## `RollingHash` Class

### Initialization (`__init__`):
- `base`: The base value for the rolling hash.
- `mod`: The modulo value for the rolling hash.
- `hash_value`: The current hash value, initially 0.
- `base_power`: The base raised to the number of characters currently in the hash, modulo `mod`; initially 1.

### Append Method (`append`):
- Updates the hash value by appending a character.
- Multiplies the current hash value by the base, adds the code point of the new character (`ord(char)`), and reduces the result modulo `mod`.
- Multiplies `base_power` by the base, also modulo `mod`.
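Appending one character at a time is Horner's rule for evaluating a polynomial in the base. The sketch below illustrates this with two standalone helpers, `append_all` and `polynomial_hash`; both names are hypothetical and are not part of the `RollingHash` class.

```python
def append_all(text, base=256, mod=10**9 + 7):
    """Fold characters into the hash one at a time, as the append step does."""
    hash_value = 0
    for char in text:
        hash_value = (hash_value * base + ord(char)) % mod
    return hash_value


def polynomial_hash(text, base=256, mod=10**9 + 7):
    """Evaluate the same hash directly as a polynomial in `base`."""
    n = len(text)
    # The character at position i carries the coefficient base**(n - 1 - i).
    return sum(ord(c) * pow(base, n - 1 - i, mod) for i, c in enumerate(text)) % mod


print(append_all("abcde"))       # 262505719
print(polynomial_hash("abcde"))  # 262505719
```

Both forms agree, which is why the leading character can later be removed by subtracting its code point times the matching power of the base.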

### Skip Method (`skip`):
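- Updates the hash value by removing the leading (oldest) character.
- Multiplies `base_power` by the modular inverse of the base, then subtracts the skipped character's code point times the new `base_power` from the hash value.
- Relies on `pow(base, -1, mod)`, which requires Python 3.8+ and a base that is coprime to `mod`.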
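To see why the skip step works, the following sketch removes the leading character of `"abcde"` and checks the result against a hash of `"bcde"` computed from scratch. The helpers `poly_hash` and `remove_leading` are hypothetical stand-ins written for this illustration, assuming base 256 and modulus 10^9 + 7.

```python
BASE, MOD = 256, 10**9 + 7


def poly_hash(text):
    """Reference hash: fold the characters in from left to right."""
    value = 0
    for char in text:
        value = (value * BASE + ord(char)) % MOD
    return value


def remove_leading(hash_value, base_power, char):
    """Drop the leading character, mirroring the skip step described above."""
    # pow(BASE, -1, MOD) is the modular inverse (Python 3.8+, gcd(BASE, MOD) == 1).
    base_power = (base_power * pow(BASE, -1, MOD)) % MOD
    hash_value = (hash_value - ord(char) * base_power) % MOD
    return hash_value, base_power


# State after appending "abcde": the hash itself and BASE**5 % MOD.
hash_value, base_power = poly_hash("abcde"), pow(BASE, 5, MOD)
hash_value, base_power = remove_leading(hash_value, base_power, "a")
print(hash_value == poly_hash("bcde"))  # True
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction.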

### Get Hash Method (`get_hash`):
- Returns the current hash value.


In [1]:
class RollingHash:
    def __init__(self, base, mod):
        """
        Initializes the RollingHash object.

        :param base: Base value for the rolling hash.
        :param mod: Modulo value for the rolling hash.
        """
        self.base = base
        self.mod = mod
        self.hash_value = 0
        self.base_power = 1

    def append(self, char):
        """
        Appends a character to the rolling hash.

        :param char: The character to append.
        """
        self.hash_value = (self.hash_value * self.base + ord(char)) % self.mod
        self.base_power = (self.base_power * self.base) % self.mod

    def skip(self, char):
        """
        Skips a character from the rolling hash.

        :param char: The character to skip.
        """
        self.base_power = (self.base_power * pow(self.base, -1, self.mod)) % self.mod
        self.hash_value = (self.hash_value - ord(char) * self.base_power) % self.mod

    def get_hash(self):
        """
        Returns the current hash value.

        :return: The current hash value.
        """
        return self.hash_value




In [18]:

# Test Case 1: Default RollingHash with base 256 and modulo 10^9 + 7
rolling_hash = RollingHash(256, 10**9 + 7)

# Append characters 'abcde'
for char in "abcde":
    rolling_hash.append(char)
print("Test Case 1 - Hash value after appending 'abcde':", rolling_hash.get_hash())

# Skip 'a' and append 'f'
rolling_hash.skip("a")
rolling_hash.append("f")
print("Test Case 1 - Hash value after skipping 'a' and appending 'f':", rolling_hash.get_hash())

# Test Case 2: Custom RollingHash with base 128 and modulo 10^9 + 9
custom_rolling_hash = RollingHash(128, 10**9 + 9)

# Append characters 'hello'
for char in "hello":
    custom_rolling_hash.append(char)
print("Test Case 2 - Custom RollingHash hash value for 'hello':", custom_rolling_hash.get_hash())

# Test Case 3: RollingHash with a larger base and modulo
larger_rolling_hash = RollingHash(997, 10**9 + 21)

# Append characters 'python'
for char in "python":
    larger_rolling_hash.append(char)
print("Test Case 3 - Larger RollingHash hash value for 'python':", larger_rolling_hash.get_hash())

# Test Case 4: RollingHash with a very large base and modulo
very_large_rolling_hash = RollingHash(10**6 + 3, 10**9 + 9)

# Append characters 'hashing'
for char in "hashing":
    very_large_rolling_hash.append(char)
print("Test Case 4 - Very Large RollingHash hash value for 'hashing':", very_large_rolling_hash.get_hash())



## `hash_string` Function

- Initializes a `RollingHash` object with a specified base and modulo.
- Iterates through each character in the input string.
- Calls the `append` method of the `RollingHash` object to update the hash value.
- Returns the final hash value of the entire string.

In [2]:
def hash_string(input_str, base=256, mod=10**9 + 7):
    """
    Hashes a string using a rolling hash.

    :param input_str: Input string to be hashed.
    :param base: Base value for the rolling hash.
    :param mod: Modulo value for the rolling hash.
    :return: Hash value of the string.
    """
    hash_obj = RollingHash(base, mod)
    for char in input_str:
        hash_obj.append(char)
    return hash_obj.get_hash()

## `substring_matching` Function

- Gets the lengths of the pattern and the text.
- If the pattern is longer than the text, immediately returns `False`.
- Calculates the hash value of the pattern using the `hash_string` function.
- Creates an instance of the `RollingHash` class for rolling hash calculations.
- Iterates through each window of the text that has the same length as the pattern.
- Updates the window's hash in constant time by skipping the character that leaves the window and appending the one that enters it.
- Compares the hash value of the current substring with the hash value of the pattern.
- If they match, compares the actual substrings.
- Returns `True` if a match is found, otherwise returns `False`.
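The final comparison of the actual substrings matters because two different strings can share a hash value. The sketch below makes this visible by shrinking the modulus to 101, a deliberately tiny value chosen only for illustration; `small_hash` and `find_collision` are hypothetical helpers.

```python
from itertools import product


def small_hash(text, base=256, mod=101):
    """Same append-style hash, but with a tiny modulus so collisions are common."""
    value = 0
    for char in text:
        value = (value * base + ord(char)) % mod
    return value


def find_collision(alphabet="abcdefghijklmnopqrstuvwxyz", length=2, mod=101):
    """Return the first two distinct strings found with the same hash, or None."""
    seen = {}
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        key = small_hash(candidate, mod=mod)
        if key in seen:
            return seen[key], candidate
        seen[key] = candidate
    return None


# 676 two-letter strings map onto only 101 hash values, so a collision must exist.
first, second = find_collision()
print(first, second, small_hash(first), small_hash(second))
```

With a modulus near 10^9 collisions are far rarer, but they remain possible, so a hash match alone is not proof of a substring match.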


In [3]:
def substring_matching(text, pattern):
    """
    Checks if a pattern is a substring of the given text using rolling hash.

    :param text: The complete text.
    :param pattern: The pattern to be searched in the text.
    :return: True if pattern is a substring of text, False otherwise.
    """
    pattern_length = len(pattern)
    text_length = len(text)

    if pattern_length > text_length:
        return False

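    pattern_hash = hash_string(pattern)
    rolling_hash = RollingHash(256, 10**9 + 7)

    # Load the first window into the rolling hash so that later
    # skip/append calls start from the correct state.
    for char in text[:pattern_length]:
        rolling_hash.append(char)

    for i in range(text_length - pattern_length + 1):
        if i > 0:
            # Slide the window: drop the character that left, add the one that entered.
            rolling_hash.skip(text[i - 1])
            rolling_hash.append(text[i + pattern_length - 1])

        window_hash = rolling_hash.get_hash()
        # Confirm with a direct comparison to rule out hash collisions.
        if window_hash == pattern_hash and text[i:i + pattern_length] == pattern:
            return True

    return False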

## `plagiarism_detection` Function

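- Creates a list of tuples, each containing the index of a document and its hash value.
- Sorts the list of tuples by hash value, so documents with equal hashes end up next to each other.
- Iterates through the sorted list and compares each entry with its predecessor.
- When two neighbouring hashes are equal, adds the pair of document indices to the `similar_pairs` list.
- Returns the list of pairs. Because whole-document hashes are compared, a pair is reported only when the two documents are identical (or, rarely, when their hashes collide); documents that merely share passages are not detected.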
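The sort-and-compare-neighbours procedure can be sketched in a self-contained form as follows. The helpers `poly_hash` and `equal_hash_pairs` are hypothetical names used only for this illustration, and the sample documents are made up.

```python
def poly_hash(text, base=256, mod=10**9 + 7):
    """Whole-string hash built by appending one character at a time."""
    value = 0
    for char in text:
        value = (value * base + ord(char)) % mod
    return value


def equal_hash_pairs(documents):
    """Sort (hash, index) tuples and pair up neighbours whose hashes are equal."""
    hashed = sorted((poly_hash(doc), index) for index, doc in enumerate(documents))
    pairs = []
    for (prev_hash, prev_index), (cur_hash, cur_index) in zip(hashed, hashed[1:]):
        if prev_hash == cur_hash:
            pairs.append((prev_index, cur_index))
    return pairs


documents = [
    "The quick brown fox.",
    "The quick brown fox?",  # differs from document 0 in the last character
    "The quick brown fox.",  # exact copy of document 0
    "The quick brown fox!",  # differs from document 0 in the last character
]
print(equal_hash_pairs(documents))  # [(0, 2)]
```

Only the exact copy is reported; the two near-copies hash to different values. Since only neighbours are compared, three identical documents would yield two pairs rather than all three combinations.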

In [4]:
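def plagiarism_detection(documents):
    """
    Finds pairs of documents whose rolling hashes are equal.

    :param documents: List of document strings.
    :return: List of (index, index) tuples for documents with equal hash values.
    """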
    document_hashes = [(i, hash_string(doc)) for i, doc in enumerate(documents)]
    document_hashes.sort(key=lambda x: x[1])

    similar_pairs = []

    for i in range(1, len(document_hashes)):
        if document_hashes[i][1] == document_hashes[i - 1][1]:
            similar_pairs.append((document_hashes[i - 1][0], document_hashes[i][0]))

    return similar_pairs




## Test Functions

### Test `RollingHash` Class:

- Creates a `RollingHash` object, appends characters to it, and prints the final hash value.

In [5]:
# Test Cases

rolling_hash = RollingHash(256, 10**9 + 7)
for char in "abcde":
    rolling_hash.append(char)
print(rolling_hash.get_hash())  # Expected output: The hash value of "abcde"




262505719


### Test `substring_matching` Function:

- Tests if the `substring_matching` function correctly identifies whether a pattern is a substring of the given text.


In [6]:
text = "ababcababcabc"
pattern1 = "abc"
pattern2 = "xyz"
print(substring_matching(text, pattern1))  # Expected output: True
print(substring_matching(text, pattern2))  # Expected output: False


True
False


### Test `plagiarism_detection` Function:

- Tests if the `plagiarism_detection` function correctly identifies pairs of documents with equal hash values. The three sample documents all differ, so no pairs are expected.


In [7]:
documents = ["This is a sample document.",
             "Sample document for testing plagiarism.",
             "Another document with similar content."]
result = plagiarism_detection(documents)
print(result)  # Expected output: [] (no two documents are identical)


[]
