# 1. Palindrome Check
Problem: Write a function to check whether a given string is a palindrome or not (i.e., it reads the same forward and backward).

Why it's useful for ML: Palindrome checking is useful in natural language processing (NLP) tasks such as text normalization, sequence alignment, or even data cleaning when dealing with structured text data.

In [None]:
def str_palindrome(str1):
  return str1 == str1[::-1]

In [None]:
print(str_palindrome('madam'))
print(str_palindrome('hello'))

True
False
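
For the text-normalization use mentioned above, a check usually has to ignore case, spaces, and punctuation. A minimal sketch, assuming only letters and digits should be compared (the name `normalized_palindrome` is hypothetical and not part of the original notebook):

```python
def normalized_palindrome(text):
  # Assumption: only alphanumeric characters matter, and case is ignored.
  cleaned = [ch.lower() for ch in text if ch.isalnum()]
  return cleaned == cleaned[::-1]

print(normalized_palindrome('A man, a plan, a canal: Panama'))  # True
print(normalized_palindrome('Hello, world'))                    # False
```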


# 2. Anagram Check
Problem: Given two strings, write a function to check whether one string is an anagram of the other (i.e., they contain the same characters with the same frequencies).

Why it's useful for ML: Comparing character counts is a simple form of the frequency-based comparison used in text normalization, feature engineering for text, and deduplication in document retrieval systems.

In [None]:
from collections import Counter

def str_anagram(str1, str2):
  return Counter(str1) == Counter(str2)

In [None]:
print(str_anagram('listen','silent'))
print(str_anagram('hello','world'))

True
False
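
For the deduplication use case, pairwise checks can be avoided by grouping words under a shared key. A minimal sketch, assuming the sorted letters of a word serve as that key (the name `group_anagrams` is hypothetical and not part of the original notebook):

```python
from collections import defaultdict

def group_anagrams(words):
  # Words that are anagrams of each other share the same sorted-letter key.
  groups = defaultdict(list)
  for word in words:
    groups[''.join(sorted(word))].append(word)
  return list(groups.values())

print(group_anagrams(['listen', 'silent', 'enlist', 'hello']))
# [['listen', 'silent', 'enlist'], ['hello']]
```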


# 3. Longest Substring Without Repeating Characters
Problem: Given a string, find the length of the longest substring that contains no repeating characters.

In [None]:
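def longest_substring(str1):
  # Sliding window: last_seen maps each character to its most recent index.
  last_seen = {}
  start = 0
  longest = 0

  for i, ch in enumerate(str1):
    # A repeat inside the window moves the window start past the earlier copy.
    if ch in last_seen and last_seen[ch] >= start:
      start = last_seen[ch] + 1
    last_seen[ch] = i
    longest = max(longest, i - start + 1)
  return longest


In [None]:
print(longest_substring('abcdfabcdeabcabba'))

6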


# 4. String Compression (Run-Length Encoding)
Problem: Implement a basic form of string compression using run-length encoding (RLE). Replace each run of consecutive identical characters with the character followed by the length of the run. For example, "aabbbcc" should become "a2b3c2".

In [None]:
from itertools import groupby

def string_compression(str1):
  # groupby yields one group per run of consecutive equal characters,
  # so 'aabaa' becomes 'a2b1a2' instead of merging the separated runs.
  return ''.join(ch + str(len(list(run))) for ch, run in groupby(str1))

In [None]:
print(string_compression('aabbbccddddddd'))

a2b3c2d7
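
One way to sanity-check an RLE encoder is to decode its output and compare the result with the input. A minimal sketch of a decoder, assuming the original text contains no digits so that every token is one character followed by its count (the name `string_decompression` is hypothetical and not part of the original notebook):

```python
import re

def string_decompression(compressed):
  # Each token is one non-digit character followed by its decimal run length.
  # Assumption: the uncompressed text itself contains no digits.
  pairs = re.findall(r'(\D)(\d+)', compressed)
  return ''.join(ch * int(count) for ch, count in pairs)

print(string_decompression('a2b3c2d7'))  # aabbbccddddddd
```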


# 5. Substring Search (Rabin-Karp Algorithm)
Problem: Implement a substring search algorithm, like the Rabin-Karp algorithm, to find all occurrences of a pattern within a given string.

In [None]:
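def search_substring(str1, pattern):
  # Naive search: compare the pattern against every window of the same length.
  # An empty pattern is treated as having no matches.
  if not pattern:
    return []

  n, m = len(str1), len(pattern)
  result = []
  for i in range(n - m + 1):
    if str1[i:i + m] == pattern:
      result.append(i)
  return result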


In [None]:
print(search_substring('hello world hello world hello hello hello world world world','world'))

[6, 18, 42, 48, 54]
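
The Rabin-Karp idea named in the heading is to compare a rolling hash of each window against the hash of the pattern, and to compare the actual characters only when the hashes agree. A minimal sketch, assuming a polynomial hash with `base=256` and a large prime `mod` (the function name and both defaults are illustrative choices, not taken from the original notebook):

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
  n, m = len(text), len(pattern)
  if m == 0 or m > n:
    return []

  # Weight of the leading character in an m-character window.
  high = pow(base, m - 1, mod)

  # Hash the pattern and the first window of the text.
  p_hash = w_hash = 0
  for k in range(m):
    p_hash = (p_hash * base + ord(pattern[k])) % mod
    w_hash = (w_hash * base + ord(text[k])) % mod

  result = []
  for i in range(n - m + 1):
    # Equal hashes may still be a collision, so confirm with a real comparison.
    if p_hash == w_hash and text[i:i + m] == pattern:
      result.append(i)
    if i < n - m:
      # Roll the window: drop text[i], append text[i + m].
      w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
  return result

print(rabin_karp('hello world hello world hello hello hello world world world', 'world'))
# [6, 18, 42, 48, 54]
```

Updating the hash costs constant time per shift, so the expected running time is linear in the combined length of the text and the pattern, with the character comparison acting as a guard against hash collisions.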


# 6. Word Count (from a Large Text)
Problem: Given a large block of text (like a document or paragraph), write a function to count the occurrences of each word.

Why it's useful for ML: This is a classic NLP problem, useful in text classification, topic modeling, and sentiment analysis where you need to analyze the frequency of terms in a document.

In [None]:

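import re
from collections import Counter

def word_count(text):
  # Lowercase and keep only runs of letters and apostrophes, so that
  # 'cold.' and 'cold' count as the same word.
  words = re.findall(r"[a-z']+", text.lower())

  # most_common() returns (word, count) pairs, highest count first.
  return Counter(words).most_common()

In [None]:
print(word_count("It's so cold. I am shivering. Get me a blanket. cold."))

[('cold', 2), ("it's", 1), ('so', 1), ('i', 1), ('am', 1), ('shivering', 1), ('get', 1), ('me', 1), ('a', 1), ('blanket', 1)]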


In [None]:
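# Scratch cell: the (s1 + 1) x (s2 + 1) table of zeros used by the
# edit-distance dynamic program in section 7.
s1, s2 = 4, 5
dp = [[0] * (s2 + 1) for _ in range(s1 + 1)]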

In [None]:
dp

[[0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0]]

# 7. Levenshtein Distance (Edit Distance)
Problem: Implement a function to compute the Levenshtein distance (edit distance) between two strings. This is the minimum number of operations (insertions, deletions, substitutions) required to convert one string into the other.

In [None]:
def levenshtein_distance(s1: str, s2: str) -> int:
    len_s1, len_s2 = len(s1), len(s2)
    dp = [[0] * (len_s2 + 1) for _ in range(len_s1 + 1)]

    for i in range(len_s1 + 1):
        for j in range(len_s2 + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])

    return dp[len_s1][len_s2]


In [None]:
print(levenshtein_distance("kitten", "sitting"))

3
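
Each cell of the table depends only on the current and previous rows, so the full grid does not have to be kept in memory. A minimal sketch of a two-row variant, assuming the same unit costs for insertion, deletion, and substitution (the name `levenshtein_two_rows` is hypothetical and not part of the original notebook):

```python
def levenshtein_two_rows(s1, s2):
    # prev holds the distances for the previous row of the full table.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

print(levenshtein_two_rows('kitten', 'sitting'))  # 3
```

This keeps the running time proportional to `len(s1) * len(s2)` while reducing memory to two rows of length `len(s2) + 1`.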
