# Strings

Strings are sequences of characters.

In Python specifically, strings are sequences of **Unicode characters**, which means they can store characters from any language in the world.

- Creating Strings
- Accessing Strings
- Adding Chars to Strings
- Editing Strings
- Deleting Strings
- Operations on Strings
- String Functions

`Unicode` is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium.

## Creating Strings

In [12]:
s = 'hello'
s = "hello"
# multiline strings
s = '''hello'''
s = """hello
world
!"""
# s = str('hello')
print(s)

hello 
world
!


Why are there multiple ways to create strings?
- Single quotes and double quotes create single-line strings.
    - If a string contains a single quote, use double quotes to avoid escaping it.
    - If a string contains double quotes, use single quotes to avoid escaping them.
- Triple quotes create strings that can span multiple lines; either `'''` or `"""` can be used.
- `str()` is a constructor that converts other data types to strings.
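
The points above can be sketched as follows. The variable names are only illustrative.

```python
# Pick the quote style that avoids escaping.
single_inside = "it's raining outside"   # apostrophe inside double quotes
double_inside = 'She said "hello"'       # double quotes inside single quotes
escaped = 'it\'s raining outside'        # same text, but the quote must be escaped

# Triple quotes keep the line break typed in the source.
multi = """hello
world"""

# str() converts other data types to their string form.
number_text = str(42)
float_text = str(3.5)

print(single_inside == escaped)   # True
print(multi.count("\n"))          # 1
print(number_text + float_text)   # 423.5
```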

In [15]:
"it's raining outside"

"it's raining outside"

## Accessing Substrings from a String

### Indexing
- Each character in a string has a unique index associated with it.
- Indexing starts from 0 for the first character, 1 for the second character, and so on.
- Negative indexing is also supported, where -1 represents the last character, -2 represents the second last character, and so on.

In [18]:
# Positive Indexing
s = 'hello world'
print(s[0])  #h
print(s[1])  #e
print(s[2])  #l
print(s[3])  #l
print(s[4])  #o

h
e
l
l
o


In [20]:
# Negative Indexing
s = 'hello world'
print(s[-1])  #d
print(s[-2])  #l
print(s[-3])  #r
print(s[-4])  #o
print(s[-5])  #w

d
l
r
o
w


### Slicing
- Slicing is used to extract a substring from a string.
- The syntax for slicing is `string[start:end:step]`, where:
    - `start` is the index to start the slice (inclusive).
    - `end` is the index to end the slice (exclusive).
    - `step` is the step size (optional, default is 1).

**Slicing does not modify the original string; it returns a new substring.**

In [21]:
s = "python is awesome"
print(s[0:6])  #python
print(s[7:9])  #is
print(s[10:17])  #awesome

python
is
awesome


- If `start` is omitted, it defaults to 0.
- If `end` is omitted, it defaults to the length of the string.
- If `step` is omitted, it defaults to 1.

In [22]:
print(s[:6])  #python
print(s[7:])  #is awesome
print(s[:])  #python is awesome

python
is awesome
python is awesome


In [23]:
print(s[::2])  #pto saeoe
print(s[::3])  #ph  em
print(s[::4])  #posee

pto saeoe
ph  em
posee


- Negative step values can be used to reverse the string.

In [24]:
print(s[::-1])  #emosewa si nohtyp (reversed string) ==> important
print(s[::-2])  #eoeas otp
print(s[::-3])  #esaioy

emosewa si nohtyp
eoeas otp
esaioy
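
A common use of the reversed slice is a palindrome check. This is a minimal sketch; `is_palindrome` is a hypothetical helper name, not a built-in.

```python
def is_palindrome(text):
    # Hypothetical helper: a palindrome reads the same as its reverse.
    return text == text[::-1]

print(is_palindrome("level"))   # True
print(is_palindrome("python"))  # False
```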


In [36]:
print(s[-1:-6:-1])  #emose
print(s[-6:-1])  #wesom
print(s[-6:])  #wesome

emose
wesom
wesome


In [32]:
print(s[6:0:-1])  # nohty (starts at the space at index 6, stops before index 0)
print(s[10:6:-1])  #a si
print(s[17:10:-1])  #emosew
print(s[5:15:2])  #ni ws
print(s[15:5:-2])  #msw i

 nohty
a si
emosew
ni ws
msw i


- If `start` is greater than `end` and `step` is positive, an empty string is returned, so ensure that start < end for a positive step.
- If `start` is less than `end` and `step` is negative, an empty string is returned, so ensure that start > end for a negative step.
- If `step` is 0, a `ValueError` is raised.

In [35]:
print("Empty Strings below")
print(s[5:15:-2])  #empty string
print(s[15:5:2])  #empty string
# print(s[5:15:0])  #ValueError

Empty Strings below




- If `start` or `end` are out of bounds, they are adjusted to fit within the string length.

In [31]:
print(s[20:25])  #empty string
print(s[-25:5])  #pytho
print(s[5:25])  #n is awesome


pytho
n is awesome
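
Note that this adjustment applies only to slicing. Indexing a single out-of-range position raises an `IndexError`, as this short sketch shows.

```python
s = "python is awesome"

# Slicing adjusts out-of-range bounds instead of failing.
clamped = s[5:100]
empty = s[100:]
print(clamped)      # n is awesome
print(repr(empty))  # ''

# Indexing a single position is not adjusted; it raises IndexError.
try:
    s[100]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```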


## Editing and Deleting in Strings

In [None]:
# Editing
s = 'hello world'
s[0] = 'H'  #TypeError: 'str' object does not support item assignment
print(s)

# Python strings are immutable

Strings are immutable, which means that once a string is created, it cannot be modified. Any operation that seems to modify a string actually creates a new string.
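
To get the effect of an edit, build a new string and, if needed, rebind the name to it. A minimal sketch:

```python
s = 'hello world'

# Build a new string: a replacement character plus the rest of the original.
capitalized = 'H' + s[1:]

print(capitalized)  # Hello world
print(s)            # hello world (the original is unchanged)

# Rebinding the name makes it refer to the new string object.
s = capitalized
print(s)            # Hello world
```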

In [None]:
# Deleting
s = 'hello world'
del s
print(s)  #NameError: name 's' is not defined

In [None]:
# Deleting a part of string using slicing is also not possible since strings are immutable
s = 'hello world'
del s[-1:-5:2]  #TypeError: 'str' object doesn't support item deletion
print(s)
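
A part of a string can still be left out by creating a new string from the pieces to keep. A small sketch, with illustrative variable names:

```python
s = 'hello world'

# Leave out the last five characters by keeping everything before them.
without_tail = s[:-5]

# Leave out the character at index 4 by joining the parts around it.
without_index_4 = s[:4] + s[5:]

print(repr(without_tail))  # 'hello '
print(without_index_4)     # hell world
print(s)                   # hello world (the original is unchanged)
```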

## Operations on Strings

- `Arithmetic Operations :` It includes concatenation (+) and repetition (*) on strings.
- `Relational Operations :` It includes comparison operators like ==, !=, <, >, <=, >= to compare strings lexicographically.
- `Logical Operations :` It includes logical operators like and, or, not to combine multiple conditions involving strings.
- `Loops on Strings :` It includes iterating over each character in a string using loops like for and while.
- `Membership Operations :` It includes checking if a substring exists within a string using in and not in operators.

#### Arithmetic Operations

In [3]:
print('Rajasthan' + ' ' + 'Haryana')

Rajasthan Haryana


In [4]:
print('Haryana '*5)

Haryana Haryana Haryana Haryana Haryana 


In [5]:
print("*"*50)
# Commonly used to create separators or borders in console output

**************************************************


#### Relational Operations

In [6]:
'Haryana' != 'Haryana'

False

In [8]:
'Haryana' > 'Rajasthan'
# False: 'H' (72) comes before 'R' (82), so 'Haryana' is lexicographically smaller

False

In [20]:
'Haryana' > 'haryana'
# False: 'Haryana' is lexicographically smaller because the code point of 'H' is 72 and that of 'h' is 104

False

In [9]:
'Haryana' >= 'Haryana'

True
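
The comparisons above can be checked with the built-in `ord()`, which returns the Unicode code point of a character. A short sketch:

```python
# ord() returns the Unicode code point of a single character.
codes = (ord('H'), ord('R'), ord('h'))
print(codes)  # (72, 82, 104)

# Comparison is decided at the first position where the strings differ.
print('Haryana' < 'Rajasthan')  # True, because 72 < 82
print('Haryana' < 'haryana')    # True, because 72 < 104

# If one string is a prefix of the other, the shorter one is smaller.
print('Har' < 'Haryana')        # True
```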

#### Logical Operations

In Python, logical operations can be performed on strings using the `and`, `or`, and `not` operators. These operators evaluate the truthiness of the strings involved in the operation.

In Python, non-empty strings are considered `True` and empty strings are considered `False`.

In [10]:
'Haryana' and 'Rajasthan'
# Both strings are non-empty (truthy). `and` needs every operand to be truthy, so it evaluates both and returns the last one.

'Rajasthan'

In [12]:
'Haryana' and ''
# The first string is truthy but the second is empty (falsy). `and` returns the first falsy operand it finds, here ''.

''

In [14]:
'' and 'Rajasthan'

''

In [15]:
'Haryana' or 'Rajasthan'
# The first string is non-empty (truthy). `or` needs only one truthy operand, so it stops there and returns it.

'Haryana'

In [16]:
'Rajasthan' or 'Haryana'

'Rajasthan'

In [17]:
'' or 'Haryana'
# The first string is empty (falsy), so `or` moves on to the second operand and returns it.

'Haryana'

In [18]:
'Haryana' or ''

'Haryana'

In [21]:
not 'hello'

False

In [22]:
not ""

True
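
One practical use of this behaviour is supplying a default when a string is empty. The sketch below assumes a hypothetical helper named `display_name`.

```python
def display_name(name):
    # Hypothetical helper: fall back to a placeholder when name is empty.
    return name or 'Anonymous'

print(display_name('Ritesh'))  # Ritesh
print(display_name(''))        # Anonymous

# bool() shows the truth value that and/or/not rely on.
# A string containing only a space is still non-empty, so it is truthy.
truth_values = (bool('hello'), bool(''), bool(' '))
print(truth_values)  # (True, False, True)
```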

#### Loops on Strings

In [23]:
for ch in "Haryana":
    print(ch)

H
a
r
y
a
n
a


In [25]:
for ch in "Harayana":
    print("Rajasthan")
# prints Rajasthan 8 times because there are 8 characters in the string "Harayana"

Rajasthan
Rajasthan
Rajasthan
Rajasthan
Rajasthan
Rajasthan
Rajasthan
Rajasthan


#### Membership Operations

In [26]:
'n' in 'Haryana'

True

In [27]:
'n' not in 'Haryana'

False
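
Membership also works for substrings longer than one character, and the match is case-sensitive. A short sketch:

```python
word = 'Haryana'

has_yan = 'yan' in word    # 'yan' appears as a contiguous run
has_ran = 'ran' in word    # the letters exist, but not side by side in this order
has_lower_h = 'h' in word  # only an uppercase 'H' is present
has_empty = '' in word     # the empty string is found in every string

print(has_yan, has_ran, has_lower_h, has_empty)  # True False False True
```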

## Common Functions
These functions work on strings as well as tuples, lists, sets, dictionaries, and other iterables.

- `len() :` Returns the length of the string.
- `min() :` Returns the smallest character in the string, compared by Unicode code point (the same as the ASCII value for ASCII characters).
- `max() :` Returns the largest character in the string, compared by Unicode code point.
- `sorted() :` Returns a sorted list of the characters in the string.

In [28]:
len('hello world')

11

In [29]:
max('hello world')

'w'

In [30]:
min('hello world')
# Here space has the smallest ASCII value of 32

' '

In [32]:
sorted('hello world', reverse=True)  # ascending by default; reverse=True sorts in descending order
# returns a list of characters, not a string

['w', 'r', 'o', 'o', 'l', 'l', 'l', 'h', 'e', 'd', ' ']
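
Since `sorted()` returns a list, the characters can be joined back into a string with `''.join(...)`. A minimal sketch:

```python
chars = sorted('hello world')
print(chars)  # [' ', 'd', 'e', 'h', 'l', 'l', 'l', 'o', 'o', 'r', 'w']

# Join the sorted characters back into a single string.
sorted_text = ''.join(chars)
print(repr(sorted_text))  # ' dehllloorw'

# Uppercase letters have smaller code points, so they sort before lowercase ones.
mixed = sorted('bAa')
print(mixed)  # ['A', 'a', 'b']
```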

## Capitalize/Title/Upper/Lower/Swapcase

`capitalize()` : It converts the first character to uppercase and the rest to lowercase.

In [56]:
s = 'hello world'
print(s.capitalize())   # it returns a new string and does not modify the original string
print(s)

Hello world
hello world


**Here the original string is not modified because strings are immutable.**

`title()` : It converts the first character of each word to uppercase and the remaining characters to lowercase.

In [34]:
s.title()

'Hello World'

`upper()` : It converts all characters to uppercase.

In [35]:
s.upper()

'HELLO WORLD'

`lower()` : It converts all characters to lowercase.

In [None]:
'Hello World'.lower()  # 'hello world'

`swapcase()` : It converts uppercase characters to lowercase and vice versa.

In [36]:
'HeLlO WorLD'.swapcase()

'hElLo wORld'
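
Two things worth knowing when using these methods: converting both sides with `lower()` is a simple way to compare strings while ignoring case, and `title()` treats an apostrophe as a word boundary. A short sketch:

```python
a = 'Haryana'
b = 'HARYANA'

exact_match = a == b
ignore_case_match = a.lower() == b.lower()
print(exact_match)        # False
print(ignore_case_match)  # True

# title() starts a new "word" after the apostrophe.
titled = "it's raining".title()
print(titled)  # It'S Raining
```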

## Count/Find/Index

`count(substring)` : It returns the number of occurrences of a substring in the string.

In [37]:
'my name is ritesh swami'.count('i')

3

`find(substring)` : It returns the lowest index of the substring if found in the string. If not found, it returns -1.

In [40]:
'my name is ritesh swami'.find('x')

-1

`index(substring)` : It returns the lowest index of the substring if found in the string. If not found, it raises a `ValueError`.

In [None]:
'my name is ritesh swami'.index('x')  #ValueError
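
The difference matters when the substring may be missing: check the result of `find()` against -1, or wrap `index()` in `try`/`except`. A minimal sketch:

```python
text = 'my name is ritesh swami'

# find() returns the lowest index of a match.
pos = text.find('ritesh')
print(pos)  # 11

# find() signals "not found" with -1, so check before using the result.
missing = text.find('x')
if missing == -1:
    print('find: not found')

# index() raises ValueError instead, so handle the exception.
try:
    text.index('x')
    index_failed = False
except ValueError:
    index_failed = True
    print('index: not found')
```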

## endswith/startswith

`endswith(suffix)` : It returns `True` if the string ends with the specified suffix, otherwise it returns `False`.

In [48]:
'my name is ritesh swami'.endswith('ami')

True

`startswith(prefix)` : It returns `True` if the string starts with the specified prefix, otherwise it returns `False`.

In [46]:
'my name is ritesh swami'.startswith('my n')

True
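
Both methods also accept a tuple of candidates and return `True` if any one of them matches. A short sketch with an illustrative file name:

```python
filename = 'report.csv'

# A tuple lets several suffixes or prefixes be tested in one call.
is_table = filename.endswith(('.csv', '.tsv'))
is_data_or_log = filename.startswith(('data', 'log'))

# The check is case-sensitive.
matches_upper = filename.startswith('Report')

print(is_table, is_data_or_log, matches_upper)  # True False False
```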

## format
It is used to format strings by embedding variables or expressions within a string.

syntax:
```python
'string with placeholders {}'.format(values)
```
The order of the values should match the order of the placeholders. Alternatively, put an index inside `{}` to choose which value goes where; indices start at 0.

In [51]:
name = 'Ritesh'
state = 'Haryana'
print('My name is {} and I am from {}'.format(name, state))

My name is Ritesh and I am from Haryana


In [55]:
name = 'Ritesh'
state = 'Haryana'
'Hi my name is {1} and I am from {0}'.format(name,state)
# {1} refers to state and {0} refers to name, so the values appear swapped

'Hi my name is Haryana and I am from Ritesh'
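
Besides positional indexes, `format()` also accepts named placeholders, and an index can be reused. The last line shows an f-string (available in Python 3.6 and later), which is a different syntax for the same result. A minimal sketch:

```python
name = 'Ritesh'
state = 'Haryana'

# An index can appear more than once.
by_index = '{0} is from {1}. {0} likes Python.'.format(name, state)

# Named placeholders are filled from keyword arguments.
by_name = 'My name is {n} and I am from {s}'.format(n=name, s=state)

# f-string: expressions are written directly inside the braces.
f_string = f'My name is {name} and I am from {state}'

print(by_index)  # Ritesh is from Haryana. Ritesh likes Python.
print(by_name)   # My name is Ritesh and I am from Haryana
print(by_name == f_string)  # True
```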

## isalnum/ isalpha/ isdigit/ isidentifier

`isalnum()` : It returns `True` if all characters in the string are alphanumeric (letters and numbers) and there is at least one character, otherwise it returns `False`.

In [58]:
'Ritesh123'.isalnum()
# True: every character is a letter or a digit

True

In [60]:
'Ritesh'.isalnum()

True

Here all characters are letters, so it returns `True`: letters alone also count as alphanumeric.

In [59]:
'ritesh@123'.isalnum()
# False because of special character '@'

False

`isalpha()` : It returns `True` if all characters in the string are alphabets and there is at least one character, otherwise it returns `False`.

In [61]:
'nitish'.isalpha()

True

In [None]:
'Ritesh123'.isalpha()  # False because of the digits

`isdigit()` : It returns `True` if all characters in the string are digits and there is at least one character, otherwise it returns `False`.

In [63]:
'123abc'.isdigit()

False

In [65]:
'123456'.isdigit()

True
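
Because a minus sign and a decimal point are not digits, `isdigit()` returns `False` for negative numbers and decimals. One way to test whether text holds a whole number is to let `int()` decide; `is_integer_text` below is a hypothetical helper written for this sketch.

```python
print('-5'.isdigit())    # False: '-' is not a digit
print('3.14'.isdigit())  # False: '.' is not a digit
print(''.isdigit())      # False: there must be at least one character

def is_integer_text(text):
    # Hypothetical helper: int() raises ValueError for text it cannot parse.
    try:
        int(text)
        return True
    except ValueError:
        return False

print(is_integer_text('-5'))    # True
print(is_integer_text('3.14'))  # False
```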

`isidentifier()` : It returns `True` if the string is a valid identifier (variable name) in Python, otherwise it returns `False`. A valid identifier must start with a letter (a-z, A-Z) or an underscore (_) and can be followed by letters, digits (0-9), or underscores.

In [66]:
'first-name'.isidentifier()

False

In [67]:
'_first_name'.isidentifier()

True
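
`isidentifier()` checks only the form of the name, so reserved keywords such as `class` also return `True`. The standard-library `keyword` module can be combined with it; `is_usable_name` is a hypothetical helper for this sketch.

```python
import keyword

print('class'.isidentifier())      # True: the form is valid
print(keyword.iskeyword('class'))  # True: but it is a reserved keyword
print('1name'.isidentifier())      # False: cannot start with a digit

def is_usable_name(name):
    # Hypothetical helper: valid form and not a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_usable_name('class'))        # False
print(is_usable_name('_first_name'))  # True
```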

## Split/Join

`split(delimiter)` : It splits the string into a list of substrings based on the specified delimiter. If no delimiter is provided, it splits on whitespace by default.

In [70]:
'my name is ritesh swami'.split()

['my', 'name', 'is', 'ritesh', 'swami']

In [71]:
'my,name,is,ritesh,swami'.split(',')

['my', 'name', 'is', 'ritesh', 'swami']

In [74]:
'my name is ritesh swami'.split('is')
# Here it splits the string at each occurrence of the substring 'is'

['my name ', ' ritesh swami']

`join(iterable)` : It joins the elements of an iterable (like a list or tuple) into a single string, with the specified string as the separator.

In [75]:
" ".join(['hi', 'my', 'name', 'is', 'ritesh'])

'hi my name is ritesh'

In [76]:
"$".join(['hi', 'my', 'name', 'is', 'ritesh'])

'hi$my$name$is$ritesh'
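
Two details are easy to miss: `split()` with no argument collapses runs of whitespace, while `split(' ')` does not, and `join()` accepts only strings. A minimal sketch:

```python
text = 'my  name is   ritesh'

# No argument: any run of whitespace is one separator.
words = text.split()
print(words)  # ['my', 'name', 'is', 'ritesh']

# Explicit ' ': every single space is a separator, leaving empty strings.
pieces = text.split(' ')
print(pieces)  # ['my', '', 'name', 'is', '', '', 'ritesh']

# split() followed by join() normalizes the spacing.
normalized = ' '.join(words)
print(normalized)  # my name is ritesh

# join() needs strings, so convert other types first.
numbers = ','.join(str(n) for n in [1, 2, 3])
print(numbers)  # 1,2,3
```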

## Replace
It replaces all occurrences of a specified substring with another substring.

syntax:
```python
string.replace(old, new, count)
```
- `old` : The substring to be replaced.
- `new` : The substring to replace with.
- `count` : (Optional) The maximum number of occurrences to replace. If not provided, all occurrences are replaced.

In [77]:
'hi my name is ritesh'.replace('ritesh', 'RITESH SWAMI')
# Here it replaces all occurrences of the substring 'ritesh' with 'RITESH SWAMI'

'hi my name is RITESH SWAMI'

In [78]:
'hi my name is ritesh ritesh'.replace('ritesh', 'RITESH SWAMI', 1)

'hi my name is RITESH SWAMI ritesh'

Always remember that the original string is not changed, because strings are immutable; `replace()` returns a new string with the replacements made.
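
A short sketch that makes this visible, and also shows that replacing with an empty string removes the substring:

```python
s = 'hi my name is ritesh'
t = s.replace('ritesh', 'RITESH SWAMI')

print(s)  # hi my name is ritesh (unchanged)
print(t)  # hi my name is RITESH SWAMI

# Replacing with '' removes every occurrence.
no_dashes = 'a-b-c'.replace('-', '')
print(no_dashes)  # abc

# If the substring is not found, the text comes back unchanged.
same = s.replace('xyz', '!')
print(same == s)  # True
```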

## Strip
It removes leading and trailing whitespace (spaces, tabs, newlines) from the string. It can also remove specified characters from both ends of the string.

- Real-world use cases:
    - Cleaning user input by removing extra spaces in form fields, especially in names, addresses, etc.
    - Preprocessing text data for analysis or machine learning.
    - Formatting strings for display or storage.
    - Preparing strings for comparison or searching.

syntax:
```python
string.strip(chars)
```
- `chars` : (Optional) A string specifying the set of characters to be removed. If not provided, it removes whitespace by default.

In [79]:
'   hello world   '.strip()

'hello world'

In [81]:
'***hello world***'.strip('*')
# Here it removes all leading and trailing '*' characters from the string

'hello world'
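
A few related details, sketched below: `lstrip()` and `rstrip()` work on one end only, whitespace inside the string is never touched, and `chars` is treated as a set of characters rather than as a prefix or suffix.

```python
padded = '   hello   '

left = padded.lstrip()   # removes leading whitespace only
right = padded.rstrip()  # removes trailing whitespace only
print(repr(left))   # 'hello   '
print(repr(right))  # '   hello'

# Whitespace between words is kept.
inner = '  hello   world  '.strip()
print(repr(inner))  # 'hello   world'

# chars is a set: any of these characters is removed from both ends.
domain = 'www.example.com'.strip('cmowz.')
print(domain)  # example
```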