# Python Strings

## Objectives

- Understand what strings are and how they can be manipulated in Python.
- Explore practical examples of string operations, including indexing and slicing.
- Learn various Python string methods and operations that can be applied for data manipulation.

## Background

Strings in Python are immutable sequences of characters: once a string is created, it cannot be changed. They are one of the most common data types in programming. They are used for text processing tasks such as manipulating, parsing, and formatting text, and for pattern matching. Because strings come up so often, mastering string handling is essential in data preprocessing, notably for natural language processing (NLP), data cleaning, and data preparation.
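
A minimal sketch of what immutability means in practice. Assigning to an index of a string raises a `TypeError`, so "modifying" a string really means building a new one. The example string is made up for illustration.

```python
s = 'Hello World!'

# Strings are immutable: assigning to an index raises TypeError
try:
    s[0] = 'J'
    changed = True
except TypeError as err:
    changed = False
    print('TypeError:', err)

# To "change" a string, build a new one (here with slicing and +)
t = 'J' + s[1:]
print(t)  # Jello World!
print(s)  # Hello World! (the original is unchanged)
```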

## Datasets Used

This notebook does not use external datasets. Instead, it focuses on hardcoded string literals to demonstrate the concepts. This approach lets learners focus on the mechanics of string manipulation without the complexity of loading and handling data.

## String Definition

In Python, strings are surrounded by either single or double quotation marks.

`'hello'` is the same as `"hello"`.

In [1]:
print('Hello World!')

Hello World!


In [2]:
print("Hello World!")

Hello World!


In [3]:
s = 'Hello World!'

In [4]:
print(s)

Hello World!


In [5]:
type(s)

str

Suppose you want to print `I'm a string`. How could you do it? There are two options: surround the string with double quotes, or escape the apostrophe with a backslash (`\'`).

In [6]:
print("I'm a string")

I'm a string


In [7]:
print('I\'m a string')

I'm a string
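
The backslash escape used above also works for other special characters. The following short sketch shows a few common escape sequences (`\"`, `\\`, `\n`, `\t`). The example strings are made up for illustration.

```python
# Escaped double quotes inside a double-quoted string
quote = "She said \"hi\""
print(quote)  # She said "hi"

# A literal backslash must itself be escaped
path = 'C:\\temp'
print(path)  # C:\temp

# \t is a tab and \n is a newline; each counts as a single character
text = 'a\tb\nc'
print(text)
print(len(text))  # 5
```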


### Multiline strings

You can assign a multiline string to a variable by using three quotes, either `"""` or `'''`:

In [8]:
a = """This is an 
example of a  
multiline string."""
print(a)

This is an 
example of a  
multiline string.


In [9]:
b = '''This is another 
example of a  
multiline string.'''
print(b)

This is another 
example of a  
multiline string.
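
A triple-quoted string is an ordinary `str` whose line breaks are stored as `\n` characters. The quick check below redefines `a` so that the snippet runs on its own. This version omits the trailing spaces of the cell above.

```python
a = """This is an
example of a
multiline string."""

# The line breaks are stored as newline characters
print(a.count('\n'))  # 2

# splitlines() returns one string per line, without the newlines
print(a.splitlines())
```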


### Strings as arrays

Strings behave like arrays (sequences) of characters:
- Python does not have a character data type. A single character is simply a string with a length of 1.
- Indexing starts at 0, so `s[0]` is the first character.
- Elements of a string are accessed using square brackets.

In [10]:
s

'Hello World!'

In [11]:
print(s[0])
print(s[1])
print(s[2])
print(s[3])
print(s[4])

H
e
l
l
o
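
Because a string is a sequence, a `for` loop visits its characters one at a time. Indexing past the end raises an `IndexError`. Here is a small sketch; `word` is an illustrative name not used elsewhere in the notebook.

```python
word = 'Hello'

# A for loop visits the characters one at a time
chars = []
for ch in word:
    chars.append(ch)
print(chars)  # ['H', 'e', 'l', 'l', 'o']

# Each element is itself a string of length 1
print(type(word[0]), len(word[0]))  # <class 'str'> 1

# Indexing beyond the last position raises IndexError
s = 'Hello World!'
try:
    s[12]
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print('IndexError:', err)
```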


**len()**: returns the length of the string, that is, its number of characters.

In [12]:
len(s)

12
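
Indexing starts at 0, so the last valid index is `len(s) - 1`, as this quick check shows:

```python
s = 'Hello World!'

last = len(s) - 1
print(last)     # 11
print(s[last])  # !

# The empty string has length 0
print(len(''))  # 0
```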

## Slicing

You can return a range of characters by using the slice syntax.

`s[a:b]` returns the characters from index `a` (included) up to index `b` (excluded), that is, positions `a` to `b-1`.

In [13]:
# from position 1 to 5-1:
print(s[1:5])

ello


In [14]:
# from position 0 to 5-1:
print(s[0:5])

Hello


In [15]:
# You do not need to specify 0 (first position)
print(s[:5])

Hello


In [16]:
# You do not need to specify the last position.
print(s[6:])

World!
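
Slices also accept an optional third value, the step: `s[a:b:step]`. The sketch below takes every second character and then reverses the string with a step of -1. It also shows that slicing with out-of-range bounds does not raise an error, unlike indexing.

```python
s = 'Hello World!'

# Every second character, starting at index 0
print(s[::2])    # HloWrd

# A negative step walks backwards: this reverses the string
print(s[::-1])   # !dlroW olleH

# Out-of-range slice bounds are clipped instead of raising IndexError
print(s[6:100])  # World!
```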


### Negative Indexing

Use negative indexes to count positions from the end of the string: `-1` is the last character, `-2` the second to last, and so on.

|**H**|**e**|**l**|**l**|**o**|   |**W**|**o**|**r**|**l**|**d**|**!**|
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [17]:
# You access the last character using the index -1:
print(s[-1])

!


In [18]:
print(s[-6:-1])

World


In [19]:
# From -6 to the last position:
print(s[-6:])

World!


In [20]:
print(s[-12:-7])

Hello


In [21]:
# You do not need to specify the first position.
print(s[:-7])

Hello
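
Negative indexes are shorthand for counting back from the length: `s[-k]` refers to the same character as `s[len(s) - k]`. The short check below confirms this equivalence on the example string.

```python
s = 'Hello World!'
n = len(s)

# s[-k] and s[n - k] point at the same character for k = 1..n
same = all(s[-k] == s[n - k] for k in range(1, n + 1))
print(same)  # True

# The same rule applies to slice bounds
print(s[-6:] == s[n - 6:])  # True
```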


## Python Methods for Working with Strings

String methods return new values. They never change the original string, because strings are immutable.

**strip()**: removes any whitespace from the beginning and the end of the string and returns the trimmed version. To keep the result, assign it back to the variable, as done below.

In [22]:
s = '   Hello World!    '
s = s.strip()
print(s)

Hello World!
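
`strip()` has two one-sided variants, `lstrip()` and `rstrip()`. All three accept an optional string of characters to remove instead of whitespace. Here is a brief sketch; the padded strings are made up for illustration, and `repr()` is used only to make the remaining spaces visible.

```python
padded = '  Hi  '

print(repr(padded.lstrip()))  # 'Hi  ' (only the left side is trimmed)
print(repr(padded.rstrip()))  # '  Hi' (only the right side is trimmed)

# The argument is a set of characters to remove, not a prefix or suffix
cleaned = 'xxHello!!x'.strip('x!')
print(cleaned)  # Hello
```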


**lower()**: returns the string in lower case

In [23]:
print(s.lower())

hello world!


**upper()**: returns the string in upper case

In [24]:
print(s.upper())

HELLO WORLD!
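
`lower()` and `upper()` are handy for case-insensitive comparisons. A family of `is...()` methods can also validate a string's contents by returning `True` or `False`. This minimal sketch uses a few of them: `isupper()`, `islower()`, `isdigit()`, and `isalpha()`.

```python
s = 'Hello World!'

# Case-insensitive comparison: normalize both sides first
same_text = s.lower() == 'HELLO world!'.lower()
print(same_text)  # True

# Validation methods return booleans
print(s.upper().isupper())      # True
print(s.islower())              # False
print('2024'.isdigit())         # True
print('Hello'.isalpha())        # True
print('Hello World'.isalpha())  # False (the space is not a letter)
```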


**capitalize()**: returns a copy of the string with the first character in upper case and the rest in lower case

In [25]:
l = 'hello world'
l.capitalize()

'Hello world'
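
`capitalize()` also lowercases the remaining characters, and it affects only the first character of the whole string. To capitalize every word, `title()` can be used instead. The mixed-case input below is made up for illustration.

```python
first_only = 'hELLO wORLD'.capitalize()
print(first_only)  # Hello world

every_word = 'hello world'.title()
print(every_word)  # Hello World
```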

**replace()**: returns a copy of the string in which every occurrence of a substring is replaced with another string

In [26]:
print(s.replace('World', 'Earth'))

Hello Earth!
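
By default, `replace()` substitutes every occurrence. An optional third argument limits how many replacements are made. As with all string methods, the original string is left untouched:

```python
s = 'Hello World!'

all_l = s.replace('l', 'L')
print(all_l)    # HeLLo WorLd!

first_l = s.replace('l', 'L', 1)
print(first_l)  # HeLlo World!

print(s)        # Hello World! (unchanged)
```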


**find()**: searches the string for a specified value and returns its position

In [27]:
s.find('o')

4

**find()** returns -1 if the substring does not appear:

In [28]:
s.find('u')

-1

If the substring appears more than once, `find()` returns the position of the first occurrence.

In [29]:
s.find('l')

2

**rfind()**: searches the string for a specified value and returns the position of its last occurrence

In [30]:
s.rfind('l')

9
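
`find()` also accepts an optional start position. This makes it possible to locate every occurrence by searching again just after the previous match. When only a yes/no answer is needed, the `in` operator is simpler. A sketch of both:

```python
s = 'Hello World!'

# Collect every position of 'l' by restarting the search after each match
positions = []
pos = s.find('l')
while pos != -1:
    positions.append(pos)
    pos = s.find('l', pos + 1)
print(positions)  # [2, 3, 9]

# Membership test with the `in` operator
print('World' in s)  # True
print('u' in s)      # False
```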

**count()**: Returns the number of times a specified value occurs in a string

In [31]:
s.count('u')

0

In [32]:
s.count('l')

3
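
`count()` also works with multi-character substrings. It counts non-overlapping occurrences, scanning from left to right, as the last line shows:

```python
s = 'Hello World!'

n_o = s.count('o')
print(n_o)   # 2

n_lo = s.count('lo')
print(n_lo)  # 1

# Non-overlapping: 'aaaa' holds two separate 'aa', not three
n_aa = 'aaaa'.count('aa')
print(n_aa)  # 2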

**String concatenation**: the `+` operator joins strings end to end

In [33]:
name = 'Jane'
last_name = 'Doe'
print(name + ' ' + last_name) 

Jane Doe
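
Two related operations follow. The `*` operator repeats a string an integer number of times. `str.join()` concatenates a list of strings with a separator between them, which is often cleaner than chaining several `+`. A short sketch:

```python
name = 'Jane'
last_name = 'Doe'

# Repetition: multiplying a string by an integer
line = '-' * 10
print(line)        # ----------
print('ab' * 3)    # ababab

# join() places the separator between the items of a list
full = ' '.join([name, last_name])
print(full)        # Jane Doe
print(', '.join(['a', 'b', 'c']))  # a, b, c
```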


**format()**: takes the passed arguments, formats them, and places them in the string wherever the `{}` placeholders are.

In [34]:
age = 30
s1 = 'My name is Anna, I am {}'
print(s1.format(age))

My name is Anna, I am 30


The `format()` method accepts any number of arguments, which are placed into the placeholders in the order they are passed.

In [35]:
age = 33
name = 'Jane'
last_name = 'Doe'
s2 = 'My name is {} {}, I am {}'
print(s2.format(name, last_name, age))

My name is Jane Doe, I am 33


You can put index numbers inside the placeholders (`{0}`, `{1}`, ...) to choose which argument goes where, regardless of the order in which the arguments are passed.

In [36]:
age = 33
name = 'Jane'
last_name = 'Doe'
s2 = 'My name is {1} {2}, I am {0}'
print(s2.format(age, name, last_name))

My name is Jane Doe, I am 33
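
Placeholders can also be named. A format specification after a colon controls how a value is rendered, for example with two decimal places. Since Python 3.6, f-strings offer an often more readable alternative to `format()`, because the expressions go directly inside the braces. In this sketch, `height` is a hypothetical value added for illustration.

```python
name = 'Jane'
last_name = 'Doe'
age = 33
height = 1.7  # hypothetical value, for illustration only

# Named placeholders with format()
s3 = 'My name is {first} {last}, I am {age}'
intro1 = s3.format(first=name, last=last_name, age=age)
print(intro1)

# Equivalent f-string (Python 3.6+)
intro2 = f'My name is {name} {last_name}, I am {age}'
print(intro2)

# Format specification: two decimal places
h = f'Height: {height:.2f} m'
print(h)  # Height: 1.70 m
```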


**split()**: splits the string at each occurrence of the separator and returns a list of substrings

In [37]:
a = 'Hello, World!'
print(a.split(','))

['Hello', ' World!']
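
Called without arguments, `split()` splits on runs of whitespace and discards empty strings. The `maxsplit` argument limits the number of splits. Joining the pieces with the same separator rebuilds the original string. The example inputs below are made up for illustration.

```python
a = 'Hello, World!'
parts = a.split(',')
print(parts)  # ['Hello', ' World!']

# No separator: split on any whitespace, ignoring repeats
words = '  one  two\tthree '.split()
print(words)  # ['one', 'two', 'three']

# maxsplit limits the number of splits performed
limited = 'a,b,c,d'.split(',', maxsplit=2)
print(limited)  # ['a', 'b', 'c,d']

# join() with the same separator reverses the split
print(','.join(parts) == a)  # True
```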


## Conclusions

Key takeaways:
- The ability to perform basic string operations such as concatenation, multiplication, and accessing elements.
- Knowledge of string methods for transformation, search, and validation.
- A conceptual understanding of the immutability of strings in Python and how this affects how strings are manipulated.
