In [1]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Analytics and Statistics using Python
## S02: String
- Strings and Tuples
- Accessing Strings
- Formatting Strings
- String slices

<img src='../../prasami_images/prasami_color_tutorials_small.png' width='400' alt="By Pramod Sharma : pramod.sharma@prasami.com" align = "left"/>

## Strings

Text is represented by the string (`str`) data type. Any sequence of characters enclosed in single, double or triple quotes is a string. Python provides many string methods and built-in functions for working with strings. For example, the built-in function `len()` returns the length of a string.

In [2]:
letter = 'P'                # A string can be a single character or a longer piece of text

print(letter)               # P

print(len(letter))          # 1

greeting = 'Hello, World!'  # Single or double quotes both work: "Hello, World!" is the same string

print(greeting)             # Hello, World!

print(len(greeting))        # 13

sentence = "Welcome to PG-DHPCAP"

print(sentence)

P
1
Hello, World!
13
Welcome to PG-DHPCAP


In [3]:
# Multiline string is created by using triple single (''') or triple double quotes ("""). See the example below
para = '''Python is a popular programming language. 
It was created by Guido van Rossum, and released in 1991. 
It is used for, web development (server-side), software development, mathematics, and system scripting.'''
print (para)

Python is a popular programming language. 
It was created by Guido van Rossum, and released in 1991. 
It is used for, web development (server-side), software development, mathematics, and system scripting.


## String Concatenation

Merging or connecting strings is called concatenation.

In [4]:
firstName = 'Mohan'
lastName = 'Sharma'
space = ' '
fullName = firstName  +  space + lastName
print(fullName) # Mohan Sharma

Mohan Sharma


In [5]:
# Alternatively
fullName = f'{firstName} {lastName}'
print ('Full Name: ', fullName) # Full Name:  Mohan Sharma

Full Name:  Mohan Sharma


In [6]:
# Also
fullName = ' '.join([firstName, lastName])
print ('Full Name: ', fullName) # Full Name:  Mohan Sharma

Full Name:  Mohan Sharma


In [7]:

# Checking the length of a string using len() built-in function
print(len(firstName))  # 5
print(len(lastName))   # 6
print(len(firstName) < len(lastName)) # True
print(len(fullName)) # 12

5
6
True
12
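The `+` operator used above joins strings only to other strings. A minimal sketch, using a hypothetical `age` variable, of what happens when a number is concatenated directly and two common ways around it:

```python
first_name = 'Mohan'
age = 30  # hypothetical value, for illustration only

# Mixing str and int with + raises a TypeError
try:
    message = first_name + ' is ' + age
except TypeError as err:
    print('TypeError:', err)

# Option 1: convert the number explicitly with str()
message = first_name + ' is ' + str(age)
print(message)    # Mohan is 30

# Option 2: let an f-string do the conversion
message_f = f'{first_name} is {age}'
print(message_f)  # Mohan is 30
```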


### Escape Sequences in Strings

In Python, a backslash (`\`) followed by a character forms an escape sequence. The most common escape sequences are:

- `\n`: new line
- `\t`: tab (a single tab character; how wide it looks depends on the display, commonly 8 columns)
- `\\`: backslash
- `\'`: single quote (')
- `\"`: double quote (")

Now, let us see the use of the above escape sequences with examples.

In [8]:
print('Python is a popular programming language. \n It was created by Guido van Rossum, and released in 1991.') # line break

Python is a popular programming language. 
 It was created by Guido van Rossum, and released in 1991.


In [9]:
print('Events   \tGold\tSilver\tBronze\tBrass') # \t inserts one tab character; the display aligns text to the next tab stop

print('Athletics\t0\t1\t0\t0')

print('Archery  \t0\t0\t0\t1')

print('Badminton\t0\t0\t0\t1')

print('Hockey   \t0\t0\t1\t0')

print('Shooting \t0\t0\t3\t3')

Events   	Gold	Silver	Bronze	Brass
Athletics	0	1	0	0
Archery  	0	0	0	1
Badminton	0	0	0	1
Hockey   	0	0	1	0
Shooting 	0	0	3	3


In [10]:
print('This is a backslash  symbol (\\)') # To write a backslash

print('In every programming language it starts with \'Hello, World!\'') # \' puts a single quote inside a single-quoted string

This is a backslash  symbol (\)
In every programming language it starts with 'Hello, World!'
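Although an escape sequence is written with two characters, it stands for just one character in the string. A short check, which also shows that a tab is not the same as four spaces:

```python
# Each escape sequence is a single character
print(len('\n'))   # 1
print(len('\t'))   # 1
print(len('\\'))   # 1, a single backslash

# A tab character is not four spaces
print('\t' == '    ')   # False

# repr() shows the escape sequences instead of their effect
print(repr('Gold\tSilver'))   # 'Gold\tSilver'
```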


### String formatting

#### Old Style String Formatting (% Operator)

In Python there are several ways of formatting strings. In this section, we will cover some of them.

The `%` operator formats a set of values, usually enclosed in a tuple (an immutable, fixed-size sequence), into a format string. The format string contains normal text together with "argument specifiers", special symbols such as `%s`, `%d`, `%f` and `%.Nf`:

- `%s` - string (or any object with a string representation, such as numbers)
- `%d` - integers
- `%f` - floating-point numbers
- `%.Nf` - floating-point numbers with N digits after the decimal point


In [11]:
# Strings only
language = 'Python'
formated_string = 'I am %s %s. I am learning %s.' %(firstName, lastName, language)
print(formated_string)

I am Mohan Sharma. I am learning Python.


In [12]:
# Strings  and numbers
radius = 2
pi = 3.14
area = pi * radius ** 2
formated_string = 'The area of circle with a radius %d is %.2f.' %(radius, area) # .2f means 2 digits after the decimal point
formated_string

'The area of circle with a radius 2 is 12.56.'

In [13]:
# We can even print collections:
medals = ['Gold', 'Silver', 'Bronze', 'Brass']

events =  {'Athletics': [0, 1, 0, 0],
'Archery': [0, 0, 0, 1],
'Badminton': [0, 0, 0, 1],
'Hockey': [0, 0, 1, 0],
'Shooting': [0, 0, 3, 3]}

print ('Proposed medals are %s' % medals )
print ('Medal tally would have been %s' % events)

Proposed medals are ['Gold', 'Silver', 'Bronze', 'Brass']
Medal tally would have been {'Athletics': [0, 1, 0, 0], 'Archery': [0, 0, 0, 1], 'Badminton': [0, 0, 0, 1], 'Hockey': [0, 0, 1, 0], 'Shooting': [0, 0, 3, 3]}
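Lists and dictionaries print fine with `%s`, but a tuple behaves differently: `%` treats a tuple on its right-hand side as the list of arguments. A small sketch, using a hypothetical `podium` tuple, of the problem and the usual fix:

```python
podium = ('Gold', 'Silver', 'Bronze')  # hypothetical tuple for illustration

# Three values but only one %s specifier: raises TypeError
try:
    print('Podium: %s' % podium)
except TypeError as err:
    print('TypeError:', err)

# Wrap the tuple in a one-element tuple so it is treated as a single value
text = 'Podium: %s' % (podium,)
print(text)   # Podium: ('Gold', 'Silver', 'Bronze')
```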


#### New Style String Formatting (str.format)

In [14]:
formated_string = 'I am {} {}. I teach {}'.format(firstName, lastName, language)
print(formated_string)

a = 4
b = 3

print('{} + {} = {}'.format(a, b, a + b)) # by default, values fill the placeholders in order

print('{0} - {1} = {2}'.format(a, b, a - b)) # positional indexes choose which argument goes where

print('{1} * {2} = {0}'.format(a * b, a, b)) # indexes can also reorder the arguments

print('{} / {} = {:.2f}'.format(a, b, a / b)) # :.2f keeps two digits after the decimal point

print('{0:d} % {1:d} = {2:d}'.format(a, b, a % b)) # :d formats each value as a decimal integer

print('{} // {} = {}'.format(a, b, a // b))

print('{} ** {} = {}'.format(a, b, a ** b))

I am Mohan Sharma. I teach Python
4 + 3 = 7
4 - 3 = 1
4 * 3 = 12
4 / 3 = 1.33
4 % 3 = 1
4 // 3 = 1
4 ** 3 = 64


In [15]:
# Strings  and numbers
radius = 2.0
pi = 3.14
area = pi * radius ** 2
formated_string = 'Using pi = {0:.2f}, the area of a circle of radius {1:3.1f} is {2:6.2f}.'.format(pi, radius, area) # .2f: 2 decimals; 6.2f: width 6, 2 decimals
print(formated_string)


Using pi = 3.14, the area of a circle of radius 2.0 is  12.56.
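`str.format()` can also fill placeholders by name, and its format specifiers can control width and alignment. A brief sketch of both (the field names `name` and `language` are chosen just for this example):

```python
# Named placeholders are filled from keyword arguments
line = '{name} is learning {language}'.format(name='Mohan', language='Python')
print(line)     # Mohan is learning Python

# Width and alignment: < left, > right, ^ centre
header = '{:<10}|{:>6}|{:^8}|'.format('Event', 'Gold', 'Silver')
print(header)   # Event     |  Gold| Silver |
```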


#### String Interpolation / f-Strings (Python 3.6+)

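Another way of formatting strings is string interpolation with f-strings (formatted string literals). An f-string is prefixed with `f`, and each expression inside curly braces is evaluated and inserted at that position, optionally followed by a format specifier such as `:.2f`. This is usually the most convenient way of building output strings.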

In [16]:
a = 4
b = 3
print(f'{a} + {b} = {a + b}')
print(f'{a} - {b} = {a - b}')
print(f'{a} * {b} = {a * b}')
print(f'{a} / {b} = {a / b:.2f}')
print(f'{a} % {b} = {a % b}')
print(f'{a} // {b} = {a // b}')
print(f'{a} ** {b} = {a ** b}')

4 + 3 = 7
4 - 3 = 1
4 * 3 = 12
4 / 3 = 1.33
4 % 3 = 1
4 // 3 = 1
4 ** 3 = 64
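Format specifiers work inside f-strings just as they do in `str.format()`. Since Python 3.8, a trailing `=` inside the braces also prints the expression text next to its value, which is handy for quick debugging:

```python
a = 4
b = 3

# Same format specifier syntax as str.format()
ratio = f'{a / b:.3f}'
print(ratio)   # 1.333

# Python 3.8+: '=' shows the expression along with its value
debug = f'{a=}, {b=}, {a * b=}'
print(debug)   # a=4, b=3, a * b=12
```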


### Python Strings as Sequences of Characters

Python strings are sequences of characters, and share their basic methods of access with other Python ordered sequences of objects – lists and tuples. The simplest way of extracting single characters from strings (and individual members from any sequence) is to unpack them into corresponding variables.

#### Unpacking Characters

In [17]:
language = 'Python'
a,b,c,d,e,f = language # unpacking sequence characters into variables
print(a) # P
print(b) # y
print(c) # t
print(d) # h
print(e) # o
print(f) # n

P
y
t
h
o
n
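Unpacking only works when the number of variables matches the number of characters. A short sketch of the failure case and of starred unpacking, which collects the leftover characters into a list:

```python
language = 'Python'

# Three names for six characters raises a ValueError
try:
    a, b, c = language
except ValueError as err:
    print('ValueError:', err)   # too many values to unpack ...

# A starred name gathers the remaining characters into a list
first, *middle, last = language
print(first)    # P
print(middle)   # ['y', 't', 'h', 'o']
print(last)     # n
```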


#### Accessing Characters in Strings by Index

In Python, as in most programming languages, counting starts from zero. Therefore the first character of a string is at index 0 and the last character is at index `len(string) - 1`.

<img src = '../../images/asp_nb_fig1.png'>

In [18]:
language = 'Python'

firstLetter = language[0]

print(firstLetter) # P

P


In [19]:
secondLetter = language[1]

print(secondLetter) # y

y


In [20]:
lastIndex =  len(language) - 1

lastLetter = language[lastIndex]

print(lastLetter) # n

n


In [21]:
# If we want to start from right end we can use negative indexing. -1 is the last index.

lastLetter = language[-1]

print(lastLetter) # n
secondLast = language[-2]
print(secondLast) # o

n
o
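Valid indexes run from 0 to `len - 1`, or from `-len` to `-1` when counting from the right. Anything outside that range raises an `IndexError`, as this small check shows:

```python
language = 'Python'

# The most negative valid index points at the first character
print(language[-len(language)])   # P, the same as language[0]

# Index 6 is one past the last character
try:
    print(language[6])
except IndexError as err:
    print('IndexError:', err)   # string index out of range
```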


#### Slicing Python Strings

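In Python we can slice strings into substrings with the syntax `string[start:stop]`. The slice begins at index `start` and ends just before index `stop`; either index can be left out to mean "from the beginning" or "to the end".
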
In [22]:
firstThree = language[0:3] # from index 0 up to, but not including, index 3
print(firstThree) # Pyt

lastThree = language[3:6]
print(lastThree) # hon

# Another way
lastThree = language[-3:]
print(lastThree)   # hon

lastThree = language[3:]
print(lastThree)   # hon

Pyt
hon
hon
hon
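Unlike indexing, slicing never raises an `IndexError`: out-of-range bounds are clipped to the string, and a slice whose start lies after its stop is simply empty. A quick check:

```python
language = 'Python'

# Out-of-range slice bounds are clipped
print(language[0:100])        # Python
print(repr(language[10:]))    # '' (empty string)

# start after stop gives an empty string
print(repr(language[4:2]))    # ''

# Splitting at any index i and joining the halves gives back the original
i = 2
print(language[:i] + language[i:] == language)   # True
```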


#### Reversing a String

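We can reverse a string with the slice `[::-1]`: leaving out start and stop covers the whole string, and a step of -1 walks through it backwards.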


In [23]:

greeting = 'Hello, World!'

print(greeting[::-1]) # !dlroW ,olleH

!dlroW ,olleH


#### Skipping Characters While Slicing

It is possible to skip characters while slicing by adding a third value, the step: `string[start:stop:step]`.

In [24]:
pto = language[0:6:2] # every second character: indexes 0, 2 and 4
print(pto) # Pto

Pto


### String Methods

Python strings come with many methods for inspecting, searching and transforming text. Some commonly used string methods are shown in the following examples:
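Strings are immutable, so none of these methods changes the string it is called on; each one returns a new value. A short demonstration:

```python
language = 'Python'

# Individual characters cannot be reassigned
try:
    language[0] = 'J'
except TypeError as err:
    print('TypeError:', err)

# Methods return a new string and leave the original untouched
upper = language.upper()
print(upper)      # PYTHON
print(language)   # Python

# To "change" a character, build a new string from slices
jython = 'J' + language[1:]
print(jython)     # Jython
```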

**capitalize():** Returns a copy of the string with its first character in uppercase and the rest in lowercase

In [25]:
sentence2 = 'neeraj wins silver medal in Olympic 2024'
print(sentence2.capitalize()) # 'Neeraj wins silver medal in olympic 2024', the rest is lower-cased

Neeraj wins silver medal in olympic 2024


**count():** Returns the number of non-overlapping occurrences of a substring, `count(substring, start, end)`. The optional `start` and `end` arguments limit the search to the slice `string[start:end]`, so the character at `end` is not included.

In [26]:
sentence = 'Neeraj Chopra wins silver medal in Olympic 2024'
print(sentence.count('e')) # 4
print(sentence.count('i', 8, 35)) # 3, counting only within sentence[8:35]
print(sentence.count('ra')) # 2

4
3
2
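Because matches are non-overlapping, `count()` can return fewer occurrences than you might expect. A few quick checks:

```python
# Overlapping matches are not counted
print('aaaa'.count('aa'))      # 2, not 3
print('banana'.count('ana'))   # 1

# An empty substring matches at every position, including the end
print('abc'.count(''))         # 4
```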


**endswith():** Checks if a string ends with a specified ending

In [27]:
print(sentence.endswith('pic'))   # False
print(sentence.endswith('24'))    # True

False
True


**expandtabs():** Returns a copy of the string in which each tab character is replaced by spaces up to the next tab stop. Tab stops occur every `tabsize` columns; the default `tabsize` is 8.

In [28]:
sentence2 = 'Neeraj\tChopra\twins\tsilver\tmedal\tin\tOlympic\t2024'
# Print a column ruler (1234567890...) to make the tab stops visible
for i in range(1,61):
    print(i%10, end='')
print()
print(sentence2.expandtabs())     # default: tab stops every 8 columns
print(sentence2.expandtabs(1))    # tabsize 1: each tab becomes a single space
print(sentence2.expandtabs(10))   # tab stops every 10 columns

123456789012345678901234567890123456789012345678901234567890
Neeraj  Chopra  wins    silver  medal   in      Olympic 2024
Neeraj Chopra wins silver medal in Olympic 2024
Neeraj    Chopra    wins      silver    medal     in        Olympic   2024


**find():** Returns the index of the first occurrence of a substring, if not found returns -1

In [29]:
print(sentence.find('e'))  # 1
print(sentence.find('c')) # 41, not 7: find() is case-sensitive, so the uppercase 'C' at index 7 does not match

1
41
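When you only need to know whether a substring is present, the `in` operator is simpler than `find()`. Since both are case-sensitive, a common trick is to lower-case the text before searching:

```python
sentence = 'Neeraj Chopra wins silver medal in Olympic 2024'

# Membership test without needing an index
print('Chopra' in sentence)   # True
print('chopra' in sentence)   # False, the search is case-sensitive

# Case-insensitive search: lower-case the text first
print(sentence.lower().find('chopra'))   # 7
```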


**rfind():** Right find: returns the index of the last occurrence of a substring, or -1 if it is not found

In [30]:
print(sentence.rfind('m'))  # 38
print(sentence.rfind('ra')) # 11
print(sentence.rfind('az')) # -1, not found

38
11
-1


**index():** Like `find()`, returns the lowest index of a substring; optional `start` and `end` arguments limit the search (defaults: 0 and the length of the string). Unlike `find()`, it raises a `ValueError` if the substring is not found.

In [31]:
sub_string = 'ra'
print(sentence.index(sub_string))  # 3

try:
    print(sentence.index(sub_string, 30)) # error
except ValueError as e:
    print (f'Error: {e}')

3
Error: substring not found


**rindex():** Like `rfind()`, returns the highest index of a substring, with the same optional `start` and `end` arguments; it raises a `ValueError` if the substring is not found.

In [32]:
sub_string = 'ra'
print(sentence.rindex(sub_string))  # 11
try:
    print(sentence.rindex(sub_string, 30)) # error
except ValueError as e:
    print (f'Error: {e}')

11
Error: substring not found


**isalnum():** Checks if all characters in the string are alphanumeric (letters or digits) and the string is not empty

In [33]:
sentence1 = 'AnalyticsandStatisticsusingPython'
print(sentence1, ':', sentence1.isalnum()) # True
print(sentence, ':', sentence.isalnum()) # False, a space is not alphanumeric
print(''.join(sentence.split()), ':', ''.join(sentence.split()).isalnum()) # True once the spaces are removed

AnalyticsandStatisticsusingPython : True
Neeraj Chopra wins silver medal in Olympic 2024 : False
NeerajChoprawinssilvermedalinOlympic2024 : True


**isalpha():** Checks if all characters in the string are letters (a-z, A-Z, and letters from other alphabets)

In [34]:
print(sentence1.isalpha()) # True

print(sentence.isalpha()) # False, the string contains spaces and digits

print(''.join(sentence.split()), ':', ''.join(sentence.split()).isalpha()) # still False, the digits 2024 remain

True
False
NeerajChoprawinssilvermedalinOlympic2024 : False


**isdecimal():** Checks if all characters in a string are decimal digits (0-9, plus decimal digits from other scripts)

In [35]:
print(sentence, ':', sentence.isdecimal())  # False

sentence2 = '123'
print(sentence2, ':', sentence2.isdecimal())  # True

sentence2 = '\u00B2'
print(sentence2, ':', sentence2.isdecimal())   # False

sentence2 = '12 3'
print(sentence2, ':', sentence2.isdecimal())  # False, space not allowed

sentence2 = '12.3'
print(sentence2, ':', sentence2.isdecimal())  # False, '.' is not a decimal digit

Neeraj Chopra wins silver medal in Olympic 2024 : False
123 : True
² : False
12 3 : False
12.3 : False


**isdigit():** Checks if all characters in a string are numbers (0-9 and some other unicode characters for numbers)

In [36]:
sentence2 = '30'
print(sentence2, ':', sentence2.isdigit())   # True
sentence2 = '\u00B2'
print(sentence2, ':', sentence2.isdigit())   # True

30 : True
² : True


**isnumeric():** Checks if all characters in a string are numeric characters. It is like isdigit() but accepts more symbols, such as fractions like ½.

In [37]:
num = '10'
print(num, ':', num.isnumeric()) # True
num = '\u00BD' # ½
print(num, ':',num.isnumeric()) # True
num = '10.5'
print(num, ':',num.isnumeric()) # False
num = '10\u00BD'
print(num, ':',num.isnumeric()) # True, every character is numeric

10 : True
½ : True
10.5 : False
10½ : True
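In short, every decimal character is a digit and every digit is numeric, but not the other way round. A side-by-side comparison on the characters used above:

```python
samples = ['5', '\u00B2', '\u00BD']   # plain digit, superscript two, one half

# For each character: (isdecimal, isdigit, isnumeric)
results = {ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric()) for ch in samples}
for ch in samples:
    print(ch, *results[ch])
# 5 True True True
# ² False True True
# ½ False False True
```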


**isidentifier()**: Checks for a valid identifier - it checks if a string is a valid variable name.

In [49]:
sentence2 = '123Silver4Bronze'
print(sentence2.isidentifier()) # False, because it starts with a number
sentence2 = 'oneSilverFourBronze'
print(sentence2.isidentifier()) # True

False
True


**islower():** Checks if all alphabet characters in the string are lowercase.

In [39]:
sentence2 = 'one silver four bronze'
print(sentence2.islower()) # True
print(sentence.islower()) # False

True
False


**isupper():** Checks if all alphabet characters in the string are uppercase

In [40]:
print(sentence2.isupper()) #  False
sentence2 = 'ONE SILVER FOUR BRONZE'
print(sentence2.isupper()) # True

False
True


**join():** Returns a single string made by concatenating the strings in an iterable, with the separator string placed between them

In [41]:
ml_tech = ['Python', 'NumPy', 'Pandas', 'matplotlib']
result = ' '.join(ml_tech)
print(result) # 'Python NumPy Pandas matplotlib'

result = '# '.join(ml_tech)
print(result) # Python# NumPy# Pandas# matplotlib

Python NumPy Pandas matplotlib
Python# NumPy# Pandas# matplotlib


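**strip():** Removes any combination of the given characters from both the beginning and the end of the string, stopping at the first character that is not in the set. With no argument, it removes whitespace.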

In [42]:
print(sentence1.strip('Anho')) # 'alyticsandStatisticsusingPyt'

alyticsandStatisticsusingPyt
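Most often `strip()` is called without arguments to trim whitespace, and `lstrip()` / `rstrip()` trim only one side. The argument is always treated as a set of characters, not as a prefix or suffix:

```python
padded = '   Python   '

# No argument: remove leading and trailing whitespace
print(repr(padded.strip()))    # 'Python'
print(repr(padded.lstrip()))   # 'Python   '
print(repr(padded.rstrip()))   # '   Python'

# The argument is a set of characters removed from both ends
print('www.example.com'.strip('w.moc'))   # example
```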


**replace():** Replaces substring with a given string

In [43]:
print(sentence1.replace('Python', 'Pytorch')) # 'AnalyticsandStatisticsusingPytorch'

AnalyticsandStatisticsusingPytorch


**split():** Splits the string into a list, using the given separator, or runs of whitespace if no separator is given

In [44]:
print(sentence.split()) # default split is at white space
sentence2 = 'Analytics,and,Statistics,using,Python'
print(sentence2.split(',')) # ['Analytics', 'and', 'Statistics', 'using', 'Python']

['Neeraj', 'Chopra', 'wins', 'silver', 'medal', 'in', 'Olympic', '2024']
['Analytics', 'and', 'Statistics', 'using', 'Python']
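Calling `split()` with no argument is not the same as `split(' ')`: the first treats any run of whitespace as one separator, while the second splits at every single space. The optional `maxsplit` argument limits the number of splits:

```python
text = 'Python  is   fun'   # irregular spacing

print(text.split())      # ['Python', 'is', 'fun']
print(text.split(' '))   # ['Python', '', 'is', '', '', 'fun']

# maxsplit=2: at most two splits, the rest stays together
print('a,b,c,d'.split(',', 2))   # ['a', 'b', 'c,d']
```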


**title():** Returns a title-cased string: the first letter of each word is uppercase and the remaining letters are lowercase.

In [45]:
sentence2 = 'AnalyticsandStatisticsusingPython'
print(sentence2.title()) # Analyticsandstatisticsusingpython: with no spaces, the whole string is one word

Analyticsandstatisticsusingpython
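With spaces between the words, `title()` behaves as expected. Note that it starts a new "word" after any non-letter character, which can give surprising results with apostrophes:

```python
print('analytics and statistics using python'.title())
# Analytics And Statistics Using Python

# The letter after the apostrophe is treated as the start of a word
print("it's python".title())   # It'S Python
```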


**swapcase():** Converts all uppercase characters to lowercase and all lowercase characters to uppercase characters

In [46]:
print(sentence2.swapcase())   # aNALYTICSANDsTATISTICSUSINGpYTHON

aNALYTICSANDsTATISTICSUSINGpYTHON


## Integers vs. Strings of Digits
You might wonder why telephone numbers are best represented as strings rather than integers. Consider what would happen to STD and ISD codes, which begin with leading zeros.

In [47]:
02265870412

SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers (689994431.py, line 1)

In [None]:
code = 02265870412

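Not exactly what we wanted, is it? Python 3 does not allow leading zeros in decimal integer literals, because in Python 2 a leading zero marked an octal number (octal literals now use the `0o` prefix). Even when the digits are converted with `int()`, the leading zero is silently dropped.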

The takeaway: telephone numbers (and other numbers that may contain leading zeros) should be represented as strings of digits, not integers.
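A small illustration with the same digits: `int()` accepts a string with a leading zero, but the resulting integer no longer remembers it, while the string keeps every digit:

```python
phone = '02265870412'   # stored as a string of digits

as_int = int(phone)     # int() accepts leading zeros inside a string
print(as_int)           # 2265870412, the leading zero is gone
print(len(phone), len(str(as_int)))   # 11 10

# String operations still work on the digit string
print(phone[:3])        # 022, the first three digits
```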

### Practice Problems
#### Problem Statement 1:

Write a Python program that checks if a given string is a palindrome (reads the same forward and backward).

In [48]:
pal_str = 'abbcbba'
no_pal = 'abcde'

print (pal_str == pal_str[::-1])

print (no_pal == no_pal[::-1])

True
False
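The check above compares raw characters, so inputs such as 'Madam' or phrases with spaces would not be recognised. One common extension, which goes beyond the original problem statement, ignores case and non-alphanumeric characters; `is_palindrome` is a hypothetical helper name:

```python
def is_palindrome(text):
    """Return True if text reads the same both ways, ignoring case,
    spaces and punctuation (hypothetical helper for illustration)."""
    cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome('abbcbba'))             # True
print(is_palindrome('Madam'))               # True
print(is_palindrome('Never odd or even'))   # True
print(is_palindrome('abcde'))               # False
```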


#### Problem Statement 2:
Write a Python program that takes a full name as input in the format "First Last".
- Make sure it is in "title case", i.e. the first letter of each name is capitalized.
- Assign the first name and last name to two separate variables at once.
- Format the output to display the name as "Last, First".
- Print the initials of the name using string slicing.

Example input: full_name = 'Donald tRump'

In [49]:
# Step 1: Input full name
full_name = 'Donald tRump' #input("Enter your full name (First Last): ")

# Step 2: convert to lower case
full_name = full_name.lower()
print (full_name)

# Step 3: Assigning multiple values at once
first_name, last_name = full_name.split()

# Step 4: change to title case
first_name, last_name = first_name.title(), last_name.title()

# Step 5: Format the output as "Last, First"
formatted_name = f"{last_name}, {first_name}"
print("Formatted Name:", formatted_name)

# Step 6: Use string slicing to get the initials
initials = first_name[:1] + last_name[:1]
print("Initials:", initials)


donald trump
Formatted Name: Trump, Donald
Initials: DT


#### Problem Statement 3:
Write a Python program that:

1. Takes a sentence as input.
2. Counts the number of vowels in the sentence.
3. Reverses the sentence using string slicing.
4. Replaces all occurrences of a specific word in the sentence with another word using basic string operations.

In [50]:
# Step 1: Input sentence
sentence = 'Python is fun' #input("Enter a sentence: ")

# Step 2: Count vowels
vowels = "aeiouAEIOU"
vowel_count = sum(1 for char in sentence if char in vowels)
print("Number of vowels:", vowel_count)

# Step 3: Reverse the sentence using string slicing
reversed_sentence = sentence[::-1]
print("Reversed Sentence:", reversed_sentence)

# Step 4: Replace a word in the sentence
old_word = 'fun' # input("Enter the word to replace: ")
new_word = 'exciting' #input("Enter the new word: ")
modified_sentence = sentence.replace(old_word, new_word)
print("Modified Sentence:", modified_sentence)


Number of vowels: 3
Reversed Sentence: nuf si nohtyP
Modified Sentence: Python is exciting
