# String Manipulation

## What is String Manipulation?

String manipulation is the process of **modifying, processing, and transforming text data** in Python. Since strings are everywhere (tweets, logs, reports, chatbot messages), this skill is essential, especially in **AI/ML** and **NLP**.

Strings in Python are **immutable sequences of characters**, written between single, double, or triple quotes. Because they are immutable, every string method returns a new string rather than changing the original. Python provides built-in methods to clean, format, slice, split, and validate text. Operations such as converting to lowercase, removing punctuation, or extracting keywords are crucial when dealing with **messy, real-world data**.

In AI/ML pipelines, especially NLP tasks, string manipulation is typically one of the first steps in **preprocessing text** before feeding it into models. Mastery of string methods helps you clean datasets, tokenize inputs, and prepare data for tasks like classification, embeddings, and sentiment analysis.

Syntax:

```python
text1 = "Hello AI"
text2 = 'Machine Learning'
text3 = """This is
a multi-line string"""
```

### Common String Manipulation Operations

**Accessing Characters:** In Python, strings are sequences, meaning you can access individual characters by their index position. Indexing starts at 0 for the first character. Negative indices count from the end of the string, with -1 referring to the last character.

```python
text = "Python"
print(text[0])
print(text[-1])
```

Output:

```
P
n
```
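
As a small sketch of the indexing rules above (the variable names are illustrative only): an index past the end raises `IndexError`, and because strings are immutable, assigning to an index raises `TypeError`, so a "changed" string has to be built as a new object.

```python
text = "Python"

# Valid indices run from 0 to len(text) - 1 (or from -len(text) to -1).
last_index = len(text) - 1
print(text[last_index])  # same character as text[-1]

# Indexing past the end raises IndexError.
try:
    text[10]
except IndexError as exc:
    index_error = str(exc)
    print("IndexError:", index_error)

# Strings are immutable: item assignment raises TypeError.
try:
    text[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead of modifying the old one.
new_text = "J" + text[1:]
print(new_text)
```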


**Slicing Strings:** Slicing extracts a portion of a string using the syntax `[start:end:step]`. The start index is inclusive, while the end index is exclusive. With a positive step, an omitted start means the first character and an omitted end means the end of the string; the step defaults to 1. A negative step walks backwards through the string, so `[::-1]` reverses it entirely.

```python
text = "DeepLearning"
print(text[0:4])
print(text[4:])
print(text[::-1])
```

Output:

```
Deep
Learning
gninraeLpeeD
```
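
A few more slice forms, sketched on the same string. The variable names are illustrative; the point is that steps and negative indices combine freely, and that out-of-range slice bounds are clamped instead of raising an error.

```python
text = "DeepLearning"

every_second = text[::2]   # characters at indices 0, 2, 4, ...
last_eight = text[-8:]     # the final eight characters
middle = text[4:-3]        # from index 4 up to, but not including, the last three
beyond = text[4:100]       # an end bound past the string is clamped, no error

print(every_second)
print(last_eight)
print(middle)
print(beyond)
```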


**Changing Case:** Python provides methods to convert strings between uppercase, lowercase, and title case. These operations are useful for standardizing text data, especially in text processing and analysis.

```python
text = "  Hello World  "
print(text.lower())
print(text.upper())
print(text.title())
```

Output:

```
  hello world  
  HELLO WORLD  
  Hello World  
```
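
Two caveats worth a quick sketch, using example strings that are not from the text above. For case-insensitive comparison, `casefold()` is more aggressive than `lower()` and handles characters such as the German "ß". Also, `title()` capitalizes the letter after any non-letter, which gives surprising results with apostrophes.

```python
a = "Straße"
b = "STRASSE"

# lower() leaves "ß" unchanged, so the two strings still differ.
lower_equal = a.lower() == b.lower()

# casefold() maps "ß" to "ss", which makes the comparison succeed.
casefold_equal = a.casefold() == b.casefold()

# title() treats the apostrophe as a word boundary.
title_quirk = "it's ai".title()

print(lower_equal, casefold_equal, title_quirk)
```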


**Trimming Spaces:** The strip() method removes leading and trailing whitespace (spaces, tabs, and newlines) from a string. Use lstrip() to trim only the left side or rstrip() to trim only the right side. Whitespace inside the string is left untouched. These methods are essential for cleaning user inputs and data from external sources.

```python
text = "  Hello World  "
print(text.strip())
print(text.lstrip())
print(text.rstrip())
```

Output:

```
Hello World
Hello World  
  Hello World
```
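
The strip family also accepts an optional argument naming the characters to remove. A minimal sketch with made-up inputs; note that the argument is treated as a set of characters, not as a prefix or suffix.

```python
raw = "\t  Hello World \n"
stripped = raw.strip()          # tabs and newlines count as whitespace too

tagged = "###AI###"
no_hashes = tagged.strip("#")   # remove a specific character from both ends

# Every leading or trailing "x" or "y" is removed, in any order.
mixed = "xyxhixyy".strip("xy")

# Whitespace between words is never touched.
inner = "  deep   learning  ".strip()

print(repr(stripped), no_hashes, mixed, repr(inner))
```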


**Replace Text:** The replace() method substitutes all occurrences of a substring with another substring. This is useful for correcting misspellings, standardizing terms, or removing unwanted characters.

```python
sentence = "AI is the future"
print(sentence.replace("AI", "Artificial Intelligence"))
```

Output:

```
Artificial Intelligence is the future
```


**Splitting and Joining:** The split() method divides a string into a list of substrings. With no argument it splits on any run of whitespace and ignores leading and trailing whitespace; with an argument it splits on that exact delimiter. Conversely, join() is called on the delimiter string and combines an iterable of strings into a single string, inserting the delimiter between elements. These operations are fundamental in tokenization for NLP.

```python
sentence = "AI is transforming everything"
words = sentence.split()
print("-".join(words))
```

Output:

```
AI-is-transforming-everything
```
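
The difference between `split()` and `split(" ")` is easy to miss, so here is a short sketch with illustrative inputs. It also shows the optional `maxsplit` argument and a common idiom for collapsing irregular whitespace.

```python
messy = "AI  is\ttransforming\neverything"

# No argument: split on any run of whitespace.
tokens = messy.split()

# Explicit delimiter: consecutive spaces produce empty strings.
spaced = "AI  is".split(" ")

# maxsplit limits how many splits are made (here: only the first "=").
key_value = "key=a=b".split("=", 1)

# split() followed by join() normalizes whitespace to single spaces.
normalized = " ".join(messy.split())

print(tokens)
print(spaced)
print(key_value)
print(normalized)
```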


**Membership Testing:** The `in` and `not in` operators check whether a substring exists within a string. The check is case-sensitive, so "ai" is not found in "AI is transforming everything". This is useful for conditional processing based on text content.

```python
sentence = "AI is transforming everything"
print("AI" in sentence)
print("ML" not in sentence)
print("ML" in sentence)
```

Output:

```
True
True
False
```


**Cleaning Text:** In real-world applications, especially NLP, text often needs to be cleaned before analysis. This typically involves multiple string operations chained together: converting to lowercase, removing punctuation, stripping whitespace, etc. Clean text generally helps the accuracy of text analysis and machine learning models. The example below removes only commas, so the exclamation mark remains in the output.

```python
raw_text = "  Welcome to AI, Sujit!   "
clean_text = raw_text.lower().strip().replace(",", "")
print(clean_text)
```

Output:

```
welcome to ai sujit!
```
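
To remove all ASCII punctuation rather than one character at a time, one option is `str.translate()` with a table built from `string.punctuation`. This is a sketch of that alternative, not the approach used in the cell above; it leaves non-ASCII punctuation (such as curly quotes) in place.

```python
import string

raw_text = "  Welcome to AI, Sujit!   "

# Translation table that deletes every character in string.punctuation.
remove_punct = str.maketrans("", "", string.punctuation)

clean_text = raw_text.lower().translate(remove_punct).strip()
print(clean_text)
```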


**String Methods:** Methods like isalpha(), isdigit(), and isalnum() help validate string content. isalpha() checks whether all characters are alphabetic, isdigit() whether all characters are digits, and isalnum() whether every character is either alphabetic or numeric. The isspace() method checks whether a string consists only of whitespace characters. Each method returns a boolean, and each returns False for an empty string. They are commonly used for input validation and data cleaning.

```python
s = "AI123"
print(s.isalpha())
print(s.isdigit())
print(s.isalnum())

print(" ".isspace())
print("".isspace())
```

Output:

```
False
False
True
True
False
```
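
These checks are strict, character-by-character tests, which matters for numeric input: a sign or decimal point makes `isdigit()` return False. The sketch below illustrates this and shows one possible alternative, a hypothetical `parse_int` helper that simply tries the conversion.

```python
print("-5".isdigit())       # False: the minus sign is not a digit
print("3.14".isdigit())     # False: the dot is not a digit
print("AI 123".isalnum())   # False: the space is neither a letter nor a digit
print("".isalpha())         # False: empty strings never pass these checks


def parse_int(value):
    """Return value converted to int, or None if it is not a valid integer."""
    try:
        return int(value.strip())
    except ValueError:
        return None


print(parse_int("-5"))
print(parse_int(" 42 "))
print(parse_int("abc"))
```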


**Finding and Counting:** The count() method returns how many non-overlapping times a substring appears in a string. The find() method returns the index of the first occurrence of a substring, or -1 if it is not found. Both are case-sensitive and match literal text. These methods are useful for simple text analysis and for extracting information from strings.

```python
text = "AI and AI is the future"
print(text.count("AI"))
print(text.find("AI"))
print(text.find("ML"))
```

Output:

```
2
0
-1
```
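
Since find() accepts an optional start position, it can be called repeatedly to locate every occurrence. The `find_all` helper below is a hypothetical sketch, not a built-in; it collects non-overlapping matches, which is the same way count() tallies them.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume searching just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions


text = "AI and AI is the future"
print(find_all(text, "AI"))
print(find_all(text, "ML"))
```

A related method, index(), behaves like find() but raises `ValueError` instead of returning -1 when the substring is missing.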


**String Formatting:** F-strings (formatted string literals) provide a concise way to embed expressions inside string literals using curly braces {}. Introduced in Python 3.6, they offer a more readable alternative to older formatting approaches. F-strings are commonly used in applications requiring dynamic text generation, like reports, logs, or user interfaces.

```python
name = "Sujit"
field = "AI/ML"
print(f"Hello {name}, welcome to {field}!")
```

Output:

```
Hello Sujit, welcome to AI/ML!
```
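
F-strings also accept a format specification after a colon, which controls width, alignment, and number formatting. A brief sketch; the `accuracy` and `epoch` values are made up for illustration.

```python
name = "Sujit"
epoch = 7
accuracy = 0.91234

# <10 left-aligns in 10 columns, >3 right-aligns in 3, .2% formats as a percentage.
line = f"{name:<10}|{epoch:>3}|{accuracy:.2%}"

two_decimals = f"{3.14159:.2f}"   # fixed-point with two decimal places
thousands = f"{1234567:,}"        # comma as thousands separator

print(line)
print(two_decimals)
print(thousands)
```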


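**Length of String:** The len() function returns the number of characters in a string. This fundamental operation is used in many string processing tasks, such as validation, truncation, or calculating text statistics. More precisely, len() counts Unicode code points rather than bytes, so a character written as a single code point counts as 1 regardless of how many bytes its encoding takes. Characters built from several code points, such as a letter followed by a combining accent or some emoji, count as more than 1.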

```python
text = "Deep Learning"
print(len(text))
```

Output:

```
13
```
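
The sketch below illustrates how len() treats Unicode text, using "é" written two ways as an example. It also shows that normalizing with the standard-library `unicodedata` module can make the two forms comparable.

```python
import unicodedata

precomposed = "\u00e9"    # "é" as a single code point
decomposed = "e\u0301"    # "e" followed by a combining acute accent

print(len(precomposed))                   # code points in the precomposed form
print(len(decomposed))                    # code points in the decomposed form
print(len(precomposed.encode("utf-8")))   # bytes in the UTF-8 encoding

# NFC normalization composes the pair into the single code point.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)
```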


**Escape Characters:** Escape sequences begin with a backslash (`\`) and let you include characters that would otherwise end the string or be hard to type. Common escape sequences include `\"` for a double quotation mark, `\n` for a newline, `\t` for a tab, and `\\` for a literal backslash. They're essential when working with text containing special characters or formatting.

```python
quote = "He said, \"AI is the future!\""
print(quote)
```

Output:

```
He said, "AI is the future!"
```
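
A short sketch of a few other escape sequences, plus raw strings (prefix `r`), which turn off escape processing. The file path is a made-up example.

```python
newline_text = "line1\nline2"   # \n is a single newline character
tabbed = "name\tvalue"          # \t is a single tab character

# A literal backslash must be doubled in a normal string...
windows_path = "C:\\new\\table.txt"
# ...or written once in a raw string, where \n and \t are not escapes.
raw_path = r"C:\new\table.txt"

print(newline_text)
print(tabbed)
print(windows_path == raw_path)
print(len("a\nb"), len(r"a\nb"))
```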


**Ordinal & Character Conversion:** The ord() function converts a character to its Unicode code point (an integer), while chr() does the opposite. These functions are useful for character-level manipulations, encryption/decryption algorithms, and working with character encodings in text processing.

```python
print(ord('A'))
print(chr(65))
```

Output:

```
65
A
```
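
As an illustration of character-level manipulation with ord() and chr(), here is a toy Caesar shift. The `caesar_shift` function is a hypothetical helper that handles only ASCII letters; it is a teaching example, not a secure cipher.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, wrapping around the alphabet."""
    result = []
    for char in text:
        if "a" <= char <= "z":
            base = ord("a")
        elif "A" <= char <= "Z":
            base = ord("A")
        else:
            result.append(char)  # leave digits, spaces, punctuation unchanged
            continue
        # Convert to 0-25, shift with wrap-around, convert back to a character.
        result.append(chr((ord(char) - base + shift) % 26 + base))
    return "".join(result)


encoded = caesar_shift("AI is fun!", 3)
decoded = caesar_shift(encoded, -3)
print(encoded)
print(decoded)
```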


**String Alignment:** Methods like center(), ljust(), and rjust() align strings within a specified width by padding with chosen characters. These are particularly useful for formatting text output in console applications, creating fixed-width reports, or aligning elements in simple text-based interfaces.

```python
s = "AI"
print(s.center(10, '*'))
print(s.ljust(10, '-'))
print(s.rjust(10, '-'))
```

Output:

```
****AI****
AI--------
--------AI
```


**Removing Special Characters:** This example uses a generator expression inside join() to keep only alphanumeric and whitespace characters, filtering out everything else. This technique is common in text preprocessing for NLP and in data cleaning. It can also strip unwanted symbols from user input, although filtering characters alone does not make input safe where security matters.

```python
text = "AI!!! is great###"
cleaned = ''.join(char for char in text if char.isalnum() or char.isspace())
print(cleaned)
```

Output:

```
AI is great
```
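
The same filtering can be sketched with the standard-library `re` module. One difference to keep in mind: the character class used here covers ASCII letters and digits only, whereas isalnum() also accepts letters and digits from other scripts.

```python
import re

text = "AI!!! is great###"

# Delete every character that is not an ASCII letter, a digit, or whitespace.
cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)

# Collapse runs of whitespace into single spaces.
collapsed = re.sub(r"\s+", " ", "AI   is \t great").strip()

print(cleaned)
print(collapsed)
```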


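**Counting Words:** This example shows a basic NLP preprocessing technique: counting word frequencies. It converts the text to lowercase, splits it on whitespace, and builds a dictionary of word counts. Because punctuation is not removed, tokens such as 'future.' keep their trailing period, and because the keys come from a set, the order of the printed dictionary can vary between runs. This fundamental text analysis approach is the foundation for more complex NLP tasks like keyword extraction, topic modeling, and text summarization.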

```python
text = "AI is the future. AI will change everything."
words = text.lower().split()
word_count = {word: words.count(word) for word in set(words)}
print(word_count)
```

Output:

```
{'will': 1, 'ai': 2, 'the': 1, 'future.': 1, 'is': 1, 'everything.': 1, 'change': 1}
```
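
A variant of the same idea, sketched with `collections.Counter`, which counts in a single pass instead of calling count() once per word. It also strips ASCII punctuation from the ends of each token so that "future." and "future" are counted together.

```python
import string
from collections import Counter

text = "AI is the future. AI will change everything."

# Lowercase, split on whitespace, then trim punctuation from both ends of each token.
words = [word.strip(string.punctuation) for word in text.lower().split()]
words = [word for word in words if word]  # drop tokens that were only punctuation

word_count = Counter(words)
print(word_count["ai"])
print(word_count.most_common(1))
```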


### Exercises

Q1. Write a program that takes a string input and removes all whitespace and converts it to lowercase.

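```python
string = "AB C D EF G"
# split() with no argument breaks on any whitespace (spaces, tabs, newlines),
# so joining the pieces removes all of it, not only spaces.
print("".join(string.split()).lower())
```

Output:

```
abcdefg
```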


Q2. Write a program to count the number of words in a sentence.

```python
sentence = "My name is Sujit Chaudhary"
words = sentence.split()
print("Number of words in the sentence:", len(words))
```

Output:

```
Number of words in the sentence: 5
```


Q3. Write a function to check whether a string is a palindrome.

```python
def is_palindrome(s):
    s = s.lower().replace(" ", "")
    return s == s[::-1]


print(is_palindrome("ABBA"))
print(is_palindrome("Hello"))
```

Output:

```
True
False
```
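
The solution above ignores case and spaces but not punctuation. A possible extension, shown here as a hypothetical `is_palindrome_loose` function, keeps only alphanumeric characters before comparing, so that phrases with commas and colons are handled too.

```python
def is_palindrome_loose(s):
    """Check for a palindrome, ignoring case, spaces, and punctuation."""
    chars = [char for char in s.casefold() if char.isalnum()]
    return chars == chars[::-1]


print(is_palindrome_loose("A man, a plan, a canal: Panama"))
print(is_palindrome_loose("Hello"))
```

Note that under this definition an empty string, or one containing only punctuation, counts as a palindrome.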


Q4. Write a program to find and replace all instances of the word “AI” with “Artificial Intelligence”.

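```python
string = "AI is the future. AI will change everything."
print(string.replace("AI", "Artificial Intelligence"))
```

Output:

```
Artificial Intelligence is the future. Artificial Intelligence will change everything.
```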
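
One caveat: replace() works on substrings, so it also rewrites "AI" inside longer words. If only the standalone word should change, a regular expression with word boundaries (`\b`) is one option. The sentence below is a made-up example chosen to show the difference.

```python
import re

text = "AI helps. PAINT is not AI, and neither is RAIN."

# Substring replacement also changes the "AI" inside PAINT and RAIN.
naive = text.replace("AI", "Artificial Intelligence")

# \b matches only at word boundaries, so PAINT and RAIN are left alone.
whole_word = re.sub(r"\bAI\b", "Artificial Intelligence", text)

print(naive)
print(whole_word)
```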


### Summary

**String manipulation in Python** is the process of transforming, processing, and analyzing text data using built-in methods and techniques. Strings are **immutable sequences of characters**, and Python provides a rich set of tools to work with them efficiently.

Common operations include:

- Accessing and slicing text using indices
- Changing case (e.g., `upper()`, `lower()`, `title()`)
- Trimming whitespace and cleaning input
- Replacing, splitting, and joining strings
- Checking membership and string properties (`isalpha()`, `isdigit()`)
- Formatting dynamic text using `f-strings`
- Aligning text or converting characters (`ord()`, `chr()`)

These skills are **essential in AI/ML workflows**, especially in **Natural Language Processing (NLP)**, where raw text must be cleaned and standardized before use. Whether you're removing punctuation, extracting keywords, or tokenizing sentences, string manipulation ensures that the data fed into your models is clean and consistent.

For any AI/ML engineer, **string manipulation is the foundation** of working with real-world, unstructured data, powering everything from chatbots and sentiment analysis to data scraping and document classification.
