# String Manipulation
Strings are one of the most fundamental data types, especially in text analysis. A string is simply a sequence of characters. In this notebook, we'll explore how to slice, dice, and transform strings to get them into the format we need.

## Basic Operations: Concatenation and Repetition
You can combine (concatenate) strings using the + operator or repeat them with the * operator.

In [3]:
# Concat
greeting = 'Hello'
name = 'world'

full_greeting = greeting + ' ' + name
print(full_greeting)

Hello world


In [4]:
# Repetition
new_greeting = 'Hello' * 10
print(new_greeting)

HelloHelloHelloHelloHelloHelloHelloHelloHelloHello
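
One detail worth a quick sketch: the + operator only joins strings to other strings, so a number has to be converted with str() first. The values below (`word`, `count`) are made up purely for illustration.

```python
# Hypothetical values, chosen only for illustration
word = "text"
count = 3

# '+' only joins strings, so convert the integer with str() first
label = word + " x" + str(count)
print(label)

# '*' is handy for building a divider line as wide as the label
divider = "-" * len(label)
print(divider)
```

Leaving out the str() call would raise a TypeError, because Python does not convert the integer to a string automatically.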


### Slicing and Indexing
Because strings are sequences, you can access individual characters by **indexing** and sections of a string by **slicing**. Remember that Python indexing starts at 0, so the first character in the string is at index 0.

In [5]:
sentence = "Text analysis is fun!"

# Get the first character (at index 0)
print(f"First character: {sentence[0]}")

# Get the last character
print(f"Last character: {sentence[-1]}")

# Slice from index 5 up to (but not including) index 13
analysis_word = sentence[5:13]
print(f"A slice: {analysis_word}")

# Slice from the beginning up to (but not including) index 4
print(f"Slice from start: {sentence[:4]}")

# Slice from index 17 to the end
print(f"Slice to end: {sentence[17:]}")


First character: T
Last character: !
A slice: analysis
Slice from start: Text
Slice to end: fun!
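
Slices also accept negative indices and an optional third value, the step. The following is a small sketch that reuses the same `sentence`; the other variable names are just for illustration.

```python
sentence = "Text analysis is fun!"

# Negative indices count from the end, so this takes the last four characters
last_four = sentence[-4:]
print(last_four)

# A third slice value sets the step; a step of 2 takes every second character
every_other = sentence[:4:2]
print(every_other)

# A step of -1 walks backwards, which reverses the string
reversed_sentence = sentence[::-1]
print(reversed_sentence)
```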


### Common String Methods
Python strings come with many powerful built-in methods for common tasks.
- lower() / upper(): Convert the entire string to lowercase or uppercase.
- strip(): Remove whitespace (spaces, tabs, newlines) from the beginning and end of a string.
- replace(old, new): Replace all occurrences of a substring with another.
- split(delimiter): Split the string into a list of substrings based on a delimiter.
- find(substring): Return the starting index of the first occurrence of a substring, or -1 if it is not found.
- join(list_of_strings): Join the elements of a list into a single string, using the string the method is called on as the separator.

In [6]:
raw_text = "   Here is a messy string.   "

# Cleaning up
clean_text = raw_text.strip().lower()
print(f"Cleaned: '{clean_text}'")

# Replacing text
replaced_text = clean_text.replace("messy", "clean")
print(f"Replaced: '{replaced_text}'")

# Splitting into a list of words
words = replaced_text.split(" ")
print(f"Words: {words}")

# Joining the list back into a string
new_sentence = " | ".join(words)
print(f"Joined: '{new_sentence}'")

# Finding a substring
position = replaced_text.find("string")
print(f"'string' found at index: {position}")

Cleaned: 'here is a messy string.'
Replaced: 'here is a clean string.'
Words: ['here', 'is', 'a', 'clean', 'string.']
Joined: 'here | is | a | clean | string.'
'string' found at index: 16
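
Two edge cases of these methods are easy to trip over, so here is a minimal sketch of them. The `messy` string is a made-up example, not part of the data above.

```python
# Hypothetical input with repeated spaces, for illustration only
messy = "  too   many    spaces  "

# split(" ") splits on every single space, leaving empty strings behind
print(messy.split(" "))

# split() with no argument splits on runs of whitespace and drops the empties
tokens = messy.split()
print(tokens)

# find() returns -1 rather than raising an error when nothing matches
missing = messy.find("tabs")
print(missing)
```

For text that has not been cleaned yet, split() with no argument is usually the safer way to get a list of words.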


### F-strings for Formatting
Formatted string literals (f-strings) are the modern way to embed expressions inside string literals. Prefix the string with an f and place your variables or expressions inside {}.

In [7]:
project = "Text Analysis"
year = 2025
status = "Ongoing"

# Create a summary using an f-string
summary = f"Project '{project}' started in {year}. Current status: {status}."
print(summary)

Project 'Text Analysis' started in 2025. Current status: Ongoing.
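
The braces can hold more than plain variable names. As a rough sketch, the cell below embeds a method call and some arithmetic, and uses a format spec after a colon. The `documents` and `share` values are invented for the example.

```python
project = "Text Analysis"
year = 2025
documents = 1234567   # hypothetical count, for illustration only
share = 0.4567        # hypothetical fraction, for illustration only

# Any expression can go inside the braces, including method calls and arithmetic
line_one = f"{project.upper()} wraps up in {year + 1}."
print(line_one)

# A format spec after ':' controls presentation:
# ',' adds thousands separators and '.1%' shows a percentage with one decimal
line_two = f"{documents:,} documents, {share:.1%} processed."
print(line_two)
```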
