**Install Required Library**

In [1]:
!pip install pyspellchecker



!pip install pyspellchecker: Installs the pyspellchecker library, which is used for spelling correction.

**Import Required Libraries**

In [2]:
import string
import nltk
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from spellchecker import SpellChecker
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True



*   import ...: Imports necessary libraries and modules.
    *   string: Used for string operations.
    *   nltk and its submodules (stopwords, WordNetLemmatizer): Used for tokenization, stopword removal, and lemmatization.
    *   re: Used for regular expressions.
    *   SpellChecker: Used for spelling correction.
*   nltk.download(...): Downloads required NLTK data files for stopwords, tokenization, and lemmatization.



**Lowercase Conversion**

In [3]:
def lowercase(text):
  text = text.lower()
  print("Lowercase: ", text)
  return text
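
For plain English text, `str.lower()` is enough. If the input may contain other scripts, Python's built-in `str.casefold()` is a more aggressive alternative intended for caseless matching. The sketch below is a swapped-in technique, not part of the original pipeline, and `lowercase_casefold` is a hypothetical name; it shows one character where the two methods differ.

```python
def lowercase_casefold(text):
    # Hypothetical variant of lowercase(): casefold() also folds characters
    # such as the German sharp s, which lower() leaves unchanged.
    return text.casefold()

word = "Straße"
print(word.lower())              # straße
print(lowercase_casefold(word))  # strasse
```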

**Remove Punctuation**

In [4]:
def remove_punctuation(text):
  translator = str.maketrans('', '', string.punctuation)
  text = text.translate(translator)
  print("Punctuation Removed: ", text)
  return text
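
`str.maketrans('', '', string.punctuation)` builds a translation table whose third argument lists characters to delete. Deleting can glue words together: "state-of-the-art" becomes "stateoftheart". A possible alternative, sketched below with a hypothetical helper `punctuation_to_spaces`, maps every punctuation character to a space instead and then collapses repeated whitespace.

```python
import string

# Third argument: characters mapped to None, i.e. deleted.
DELETE_PUNCT = str.maketrans('', '', string.punctuation)
# First two arguments must have equal length: each punctuation char maps to ' '.
SPACE_PUNCT = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def punctuation_to_spaces(text):
    # Hypothetical helper: replace punctuation with spaces, then normalize whitespace.
    return ' '.join(text.translate(SPACE_PUNCT).split())

sample = "state-of-the-art, isn't it?"
print(sample.translate(DELETE_PUNCT))  # stateoftheart isnt it
print(punctuation_to_spaces(sample))   # state of the art isn t it
```

The trade-off is visible in the second line: hyphenated compounds stay separate words, but contractions split into fragments such as "isn t".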

**Remove Special Characters**

In [5]:
def remove_special_chars(text):
  text = re.sub(r'[^\w\s]','',text)
  print("Special Characters Removed: ", text)
  return text
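
On the ASCII sample texts, this step changes nothing after `remove_punctuation`: the "Punctuation Removed" and "Special Characters Removed" lines in the sample output are identical. The two steps are not equivalent in general. `string.punctuation` only lists ASCII punctuation, while `[^\w\s]` on a Python `str` pattern is Unicode-aware, and `\w` includes the underscore. The comparison below uses made-up inputs and hypothetical function names.

```python
import re
import string

def strip_ascii_punct(text):
    # Same approach as remove_punctuation(): delete ASCII punctuation only.
    return text.translate(str.maketrans('', '', string.punctuation))

def strip_non_word(text):
    # Same pattern as remove_special_chars(): drop anything that is not a
    # word character (Unicode letters, digits, underscore) or whitespace.
    return re.sub(r'[^\w\s]', '', text)

for s in ["«Hola» ¡amigo!", "snake_case #tag"]:
    print(strip_ascii_punct(s), "|", strip_non_word(s))
# «Hola» ¡amigo | Hola amigo
# snakecase tag | snake_case tag
```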

**Remove Stopwords**

In [6]:
def remove_stopwords(text):
  stop_words = set(stopwords.words('english'))
  tokens = nltk.word_tokenize(text)
  filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
  filtered_text = ' '.join(filtered_tokens)
  print("Stopwords Removed: ", filtered_text)
  return filtered_text
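
The filtering itself is a set-membership test per token. The dependency-free sketch below uses a tiny hypothetical stopword set (NLTK's English list is much larger and its exact contents are not reproduced here) and a whitespace split instead of `nltk.word_tokenize`, which splits contractions such as "I'm" into separate tokens. It also builds the set once rather than on every call, and it shows why step order matters: once punctuation has been stripped, a contraction listed as a stopword no longer matches.

```python
# Hypothetical, tiny stopword set built once at module level.
STOP_WORDS = frozenset({"i", "i'm", "a", "and", "this", "is", "an", "of"})

def filter_stopwords(text, stop_words=STOP_WORDS):
    # Keep only tokens that are not in the stopword set.
    return ' '.join(tok for tok in text.split() if tok not in stop_words)

print(filter_stopwords("this is an example of text cleaning"))  # example text cleaning
print(filter_stopwords("i'm a student"))  # student
print(filter_stopwords("im a student"))   # im student  (apostrophe already removed)
```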

**Standardize Text (Lemmatization)**

In [7]:
def standardize_text(text):
  tokens = nltk.word_tokenize(text)
  lemmatizer = WordNetLemmatizer()
  standardized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
  standardized_text = ' '.join(standardized_tokens)
  print("Standardized Text: ", standardized_text)
  return standardized_text
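
`lemmatizer.lemmatize(token)` treats every token as a noun by default, which is why "using" survives unchanged in the sample output. A common refinement, sketched here as an assumption rather than part of the original notebook, is to tag tokens first (for example with `nltk.pos_tag`, which needs its own tagger data download) and pass a WordNet part of speech through the `pos` argument. The hypothetical helper `penn_to_wordnet` maps Penn Treebank tags to the single-letter codes WordNet uses.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS code (hypothetical helper).

    'a' = adjective, 'v' = verb, 'n' = noun, 'r' = adverb; these equal
    nltk.corpus.wordnet.ADJ, VERB, NOUN and ADV. Other tags fall back to
    noun, which is also WordNetLemmatizer's default.
    """
    if tag.startswith('J'):
        return 'a'
    if tag.startswith('V'):
        return 'v'
    if tag.startswith('R'):
        return 'r'
    return 'n'

# Intended use (needs NLTK plus tagger data, so not executed here):
#   tagged = nltk.pos_tag(nltk.word_tokenize(text))
#   lemmas = [lemmatizer.lemmatize(tok, pos=penn_to_wordnet(tag)) for tok, tag in tagged]
```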

**Correct Spelling**

In [8]:
def correct_spelling(text):
  spell = SpellChecker()
  # correction() can return None when it finds no candidate; keep the original word in that case
  corrected_text = ' '.join([spell.correction(word) or word for word in text.split()])
  print("Corrected Text: ", corrected_text)
  return corrected_text
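
pyspellchecker describes itself as based on Peter Norvig's spelling corrector: generate strings within a small edit distance of the input and pick the most frequent one found in a word-frequency dictionary. The stdlib-only sketch below illustrates that idea at edit distance 1 (the library's default search goes up to distance 2) with a tiny hypothetical frequency table; it is not the library's actual code.

```python
import string

# Hypothetical word-frequency table standing in for pyspellchecker's dictionary.
WORD_FREQ = {"little": 50, "title": 20, "field": 30, "experience": 10, "hello": 40}

def edits1(word):
    """Every string one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, freq=WORD_FREQ):
    """Return word if known, else its most frequent known neighbor at distance 1."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    # Fall back to the input, like `spell.correction(word) or word` in the notebook.
    return max(candidates, key=freq.get) if candidates else word

print(correct("litle"))    # little ("title" is also one edit away but less frequent)
print(correct("fielld"))   # field
print(correct("mahiyat"))  # mahiyat (no known word within one edit)
```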

**Define Text Cleaning Pipeline**

In [9]:
def clean_text(text):
  cleaned_text = lowercase(text)
  cleaned_text = remove_punctuation(cleaned_text)
  cleaned_text = remove_special_chars(cleaned_text)
  cleaned_text = remove_stopwords(cleaned_text)
  cleaned_text = standardize_text(cleaned_text)
  cleaned_text = correct_spelling(cleaned_text)
  return cleaned_text
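
`clean_text` hard-codes the order of the six steps. Keeping the steps in a sequence instead makes it easy to drop, add, or reorder one. A minimal sketch using `functools.reduce` and a hypothetical `make_pipeline` helper, with two dependency-free stand-ins for the notebook's first steps:

```python
import string
from functools import reduce

def make_pipeline(*steps):
    # Hypothetical helper: returns a function that applies steps left to right.
    def run(text):
        return reduce(lambda acc, step: step(acc), steps, text)
    return run

def to_lower(text):
    return text.lower()

def drop_punct(text):
    return text.translate(str.maketrans('', '', string.punctuation))

basic_clean = make_pipeline(to_lower, drop_punct)
print(basic_clean("Hello World!"))  # hello world
```

With the notebook's functions, `make_pipeline(lowercase, remove_punctuation, remove_stopwords)` would, for example, skip the lemmatization and spelling steps.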

**Apply Cleaning Pipeline to Sample Texts**

In [10]:
text1 = "Hello World! This is an example of text cleaning using Python."
cleaned_text1 = clean_text(text1)
separator = '*'*100
print(separator)
print("Cleaned text: ",cleaned_text1)

print("\n")

text2 = "Hi! I'm Mahiyat and I'm a CSE undergraduate. I love working with #data and #NLPs. I have a litle experience in this fielld."
cleaned_text2 = clean_text(text2)
print(separator)
print("Cleaned text: ",cleaned_text2)

Lowercase:  hello world! this is an example of text cleaning using python.
Punctuation Removed:  hello world this is an example of text cleaning using python
Special Characters Removed:  hello world this is an example of text cleaning using python
Stopwords Removed:  hello world example text cleaning using python
Standardized Text:  hello world example text cleaning using python
Corrected Text:  hello world example text cleaning using python
****************************************************************************************************
Cleaned text:  hello world example text cleaning using python


Lowercase:  hi! i'm mahiyat and i'm a cse undergraduate. i love working with #data and #nlps. i have a litle experience in this fielld.
Punctuation Removed:  hi im mahiyat and im a cse undergraduate i love working with data and nlps i have a litle experience in this fielld
Special Characters Removed:  hi im mahiyat and im a cse undergraduate i love working with data and nlps i have a li



*   Defines two sample texts (text1 and text2).
*   Cleans each text using the clean_text function.
*   Prints the intermediate result of every step (each cleaning function prints its own output), then a separator line and the final cleaned text.
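
Printing every intermediate step suits a two-sentence demo but becomes noisy for a larger batch. One option that leaves the cleaning functions untouched is to capture their printed output with `contextlib.redirect_stdout`. The sketch below uses a hypothetical `clean_quietly` helper and a stand-in cleaner (`demo_clean`, also hypothetical) so it runs without NLTK or pyspellchecker; passing `cleaner=clean_text` would apply the notebook's full pipeline.

```python
import contextlib
import io
import string

def demo_clean(text):
    # Hypothetical stand-in for clean_text: prints like the notebook's steps.
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    print("Cleaned: ", text)
    return text

def clean_quietly(texts, cleaner=demo_clean):
    # Redirect everything printed by the cleaner into a buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        results = [cleaner(t) for t in texts]
    return results, buffer.getvalue()

results, log = clean_quietly(["Hello World!", "Hi, there."])
print(results)  # ['hello world', 'hi there']
```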




**Summary**<br>
This notebook builds a sequential pipeline for cleaning text data: converting to lowercase, removing punctuation and special characters, removing stopwords, lemmatizing, and correcting spelling. The cleaning functions are applied in that order to two sample texts, and each function prints its result so the effect of every step is visible.