
# Strings and Regular Expressions

The purpose of this notebook is to demonstrate string manipulation and the use of regular expressions, commonly known as `regex`, to perform advanced operations. It is tempting to reach for regex at the slightest opportunity, even for problems that Python's built-in string methods handle elegantly. As powerful as regex is, patterns cost time to compile, are usually slower than the equivalent string method, and are harder to read, so they are best reserved for problems that string methods cannot solve cleanly. The following problems showcase both regex and string functions.

Learning Objectives:
- String functions
- String manipulation
- Slicing and rearranging strings
- Regular expression usage
- Iteration with strings
- List comprehension
***

In [None]:
# Importing all necessary libraries in the topmost cell
# Run this cell once before running other cells
import re
from collections import Counter
import unicodedata

1. A Python program to calculate the length of a string.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

# Method 1: Naive
counter = 0
for char in test_string:
    counter += 1
print(f"The string {test_string} is {counter} characters long - Method 1")

# Method 2: Standard Library
print(f"The string {test_string} is {len(test_string)} characters long - Method 2")

2. A Python program to count the number of characters (character frequency) in a string.<br><br> 
```
Sample String : google.com 
Expected Result: {'o': 3, 'g': 2, '.': 1, 'e': 1, 'l': 1, 'm': 1, 'c': 1}
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

# Method 1: Naive
chars_one = {}  # creating a dictionary to store the data

for char in test_string:
    if char in chars_one:
        chars_one[char] += 1
    else:
        chars_one[char] = 1

print(f"The count of all unique characters is: {chars_one} - Method 1")

# Method 2: Standard Library
chars_two = {}

for keys in test_string:
    chars_two[keys] = chars_two.get(keys, 0) + 1

print(f"The count of all unique characters is: {chars_two} - Method 2")
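
The `Counter` class imported in the first cell is not used by either method above. As a possible third method, it can build the same frequency table in a single call. This is a minimal sketch that uses the sample string from the exercise instead of user input; the names `sample_string` and `chars_three` are illustrative.

```python
from collections import Counter

sample_string = "google.com"  # sample string from the exercise, used instead of input()

# Counter is a dict subclass that maps each character to its count
chars_three = Counter(sample_string)

print(f"The count of all unique characters is: {dict(chars_three)} - Method 3")
print(chars_three.most_common(2))  # the two most frequent characters with their counts
```

`most_common` is convenient when only the top few characters matter, which the plain dictionary versions would need an explicit sort to produce.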

3. A Python program to get a string made of the first 2 and the last 2 chars from a given string. If the string length is less than 2, return a 4-letter string made by repeating the character instead of the empty string. <br><br>
```
Sample String: 'Teinstein' Expected Result : 'Tein'
Sample String : 'Te' Expected Result : 'TeTe'
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

if len(test_string) < 2:
    # a single character is repeated to build a 4-letter string
    new_string = test_string * 4
else:
    # slicing also covers 2-character strings: 'Te' -> 'Te' + 'Te'
    new_string = test_string[:2] + test_string[-2:]

print(f"The new string is {new_string}")

4. A Python program to get a string from a given string where all occurrences of its first char have been changed to "$", except the first char itself. <br><br>
```
Sample String : 'restart' 
Expected Result : 'resta$t'
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

leading_char = test_string[0]
new_string = leading_char + test_string.replace(leading_char, '$')[1:]

print(f"Result: {new_string}")
        

5. A Python program to get a single string from two given strings, separated by a space
and swap the first two characters of each string. <br><br>
```
Sample String : 'abc', 'xyz' 
Expected Result : 'xyc abz'
```

In [None]:
string_one = input("Enter the first string ").strip()  # accepting user input
string_two = input("Enter the second string ").strip()

final_string = string_two[0:2] + string_one[2:] + " " + string_one[0:2] + string_two[2:]
print(f"Result: {final_string}")


6. A Python program to add 'ing' at the end of a given string (length should be at least
3). If the given string already ends with 'ing' then add 'ly' instead. If the string length of the
given string is less than 3, leave it unchanged. <br><br>
```
Sample String : 'abc' Expected Result : 'abcing'
Sample String : 'string' Expected Result : 'stringly'
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

new_string = test_string  # strings shorter than 3 characters stay unchanged
if len(test_string) >= 3:
    if test_string.endswith("ing"):
        new_string = test_string + "ly"
    else:
        new_string = test_string + "ing"
        
print(new_string)

7. A Python program to find the first occurrence of the substrings 'not' and 'poor' in a
given string. If 'poor' follows 'not', replace the whole 'not'...'poor' substring with 'good'. Return
the resulting string. <br><br>
```
Sample String: 'The lyrics are not poor!' 
Expected Result: 'The lyrics are good!'
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

str_poor = test_string.find("poor")
str_not = test_string.find("not")

# find() returns -1 when the substring is absent; index 0 is a valid position
if str_not >= 0 and str_poor > str_not:
    test_string = test_string[:str_not] + "good" + test_string[str_poor + 4:]

print(f"Result: {test_string}")

8. A program that takes your full name as input and displays the abbreviations of the
first and middle names except the last name which is displayed as it is. <br><br>
```
Sample String: 'Jehangir Ratanji Dadabhoy Tata' 
Expected Result: 'J.R.D. Tata'
```

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

names = test_string.split()
initials = ".".join([name[0] for name in names[:-1]])

if len(names) > 1:
    print(f"Your shortened name is {initials}. {names[-1]}")
else:
    print(f"Your name is {test_string}")

9. A program to create a new string with all the consonants deleted from a string.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

new_string = ""
for char in test_string:
    # keep vowels and anything that is not a letter (digits, spaces, punctuation)
    if char in "aeiouAEIOU" or not char.isalpha():
        new_string += char

print(f"The string without consonants: {new_string}")

10. A Python program to check that a string contains only a certain set of characters
using regular expressions (in this case a-z, A-Z and 0-9).

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
pattern = re.compile(r'[A-Za-z0-9]+')  # the range 'A-z' would also admit [ \ ] ^ _ and `

if pattern.fullmatch(test_string) is not None:
    print("The string matches")
else:
    print("The string does not match")

11. A Python program that matches a string that has an 'a' followed by zero or more b's.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
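pattern = re.compile(r'ab*')  # 'a' followed by as many b's as are present (possibly none)

if pattern.search(test_string) is not None:  # search scans the whole string, not just its start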
    print("The string matches")
else:
    print("The string does not match")

12. A Python program that matches a string that has an a followed by one or more b's.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
pattern = re.compile(r'ab+')  # 'a' followed by at least one b

if pattern.search(test_string) is not None:  # search scans the whole string, not just its start
    print("The string matches")
else:
    print("The string does not match")

13. A Python program that matches a string that has an 'a' followed by anything, ending in 'b'.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
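pattern = re.compile(r'a.*b$')  # an 'a', then anything, with 'b' as the final character

if pattern.search(test_string) is not None:  # the 'a' may appear anywhere before the final 'b'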
    print("The string matches")
else:
    print("The string does not match")
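
Exercises 11 to 13 depend on two details that are easy to miss: which matching method is called, and whether a quantifier is greedy or lazy. The sketch below compares `match`, `search` and `fullmatch` on one hypothetical sample string (`text` is an illustrative value, not part of the exercises), then contrasts `ab+` with `ab+?`.

```python
import re

text = "xxabbbyy"  # hypothetical sample string
pattern = re.compile(r'ab+')

at_start = pattern.match(text)      # None: match only tries position 0
anywhere = pattern.search(text)     # finds 'abbb' starting at index 2
whole = pattern.fullmatch(text)     # None: the entire string must match
print(at_start, anywhere, whole)

greedy = re.search(r'ab+', text).group()   # greedy: takes as many b's as possible
lazy = re.search(r'ab+?', text).group()    # lazy: stops after the first b
print(greedy, lazy)
```

When a pattern is only used for a yes/no test, greedy and lazy quantifiers give the same answer; the difference shows up in what the match object captures.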

14. A Python program to search some literals in a string. <br><br>
```
Sample text: 'The quick brown fox jumps over the lazy dog.' 
Searched words: 'fox', 'dog', 'horse'
```

In [None]:
while True:
    test_string = input("Enter a string to search, enter E to exit\n").strip()  # accepting user input
    if test_string in ['e', 'E']:
        break
    else:
        while True:
            search = input("Enter a search term, or S to stop\n").strip()
            if search in ['s', 'S']:
                break
            elif search in test_string:  # substring test, so 'dog' is found in 'dog.'
                print(f"{search} exists in the string.")
            else:
                print(f"{search} does not exist in the string.")


15. A Python program to validate IPv4 and IPv6 addresses using regular expressions.

In [None]:
test_string = input("Enter an IP ").strip()  # accepting user input
pattern_ipv4 = re.compile(r"(([0-9]|[1-9][0-9]|1[0-9][0-9]|"
                          r"2[0-4][0-9]|25[0-5])\.){3}"
                          r"([0-9]|[1-9][0-9]|1[0-9][0-9]|"
                          r"2[0-4][0-9]|25[0-5])")
# Only the full, uncompressed form of IPv6 (eight groups) is covered here
pattern_ipv6 = re.compile(r"([0-9a-fA-F]{1,4}:){7}"
                          r"[0-9a-fA-F]{1,4}")

# fullmatch rejects inputs that merely contain a valid address
if pattern_ipv4.fullmatch(test_string):
    print("Matched a valid IPv4 address")
elif pattern_ipv6.fullmatch(test_string):
    print("Matched a valid IPv6 address")
else:
    print("The IP is invalid")
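
In keeping with the advice to avoid regex where the standard library already does the job, address validation can also be delegated to the `ipaddress` module. The sketch below is one possible alternative, not a replacement for the exercise; `classify_ip` is a hypothetical helper name and the candidate strings are made-up examples.

```python
import ipaddress

def classify_ip(candidate):
    """Return 'IPv4', 'IPv6', or None for an invalid address (hypothetical helper)."""
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        # ip_address raises ValueError for anything that is not a valid address
        return None
    return f"IPv{address.version}"

for candidate in ["192.168.0.1", "256.1.1.1", "2001:db8::1", "not an ip"]:
    print(candidate, "->", classify_ip(candidate))
```

Unlike the regex above, this also accepts compressed IPv6 notation such as `2001:db8::1`.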

16. A Python program to validate RFC 3339 date time format using regular expressions. <br><br>
```
Example: 
2002-10-02T10:00:00-05:00 
2002-10-02T15:00:00Z 
2002-10-02T15:00:00.05Z
```

In [None]:
test_string = input("Enter a date ").strip()  # accepting user input
# Checks the format only, not value ranges; allows fractional seconds and +/- offsets
pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})')

if pattern.fullmatch(test_string):
    print("Matched a valid RFC 3339 date")
else:
    print("The date is invalid")

17. A Python program to validate simple email addresses using regular expressions. Note that a regex covering every valid email address would be unmanageably long, so this pattern handles only common forms.

In [None]:
test_string = input("Enter an email ").strip()  # accepting user input
pattern = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')  # a '|' inside [] is a literal pipe

if pattern.fullmatch(test_string):  # fullmatch rejects trailing characters after the address
    print("Matched a valid email")
else:
    print("Invalid email")

18. A Python program to validate Indian phone numbers using regular expressions.

In [None]:
test_string = input("Enter a phone number ").strip()  # accepting user input
pattern = re.compile(r'(0|91)?[6-9][0-9]{9}')  # optional 0 or 91 prefix; mobile numbers start with 6-9

if pattern.fullmatch(test_string):  # fullmatch rejects extra trailing digits
    print("Valid phone number matched")
else:
    print("Invalid phone number")

19. A Python program to validate if two given strings are identical. (Password validation)

In [None]:
string_one = input("Enter the first string ").strip()  # accepting user input
string_two = input("Enter the second string ").strip()

if string_one != string_two:
    print("Passwords don't match")
else:
    print("Passwords matched successfully")

20. A Python program to reverse a string.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

# Method 1
new_str_1 = test_string[::-1]

# Method 2
new_str_2 = "".join(reversed(test_string))

print(f"The reversed string is {new_str_1} - Method 1")
print(f"The reversed string is {new_str_2} - Method 2")

21. A Python program to check if a password matches the format using a regular expression. It must contain at least 8 and at most 20 characters with at least one lowercase, one uppercase, one special symbol, one digit and no whitespace.

In [None]:
test_string = input("Enter your password ").strip()  # accepting user input
pattern = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,20}$")  # raw string keeps \d intact

if re.match(pattern, test_string):
    print("Matched a valid password")
else:
    print("Invalid password")


22. A Python program to convert alternate characters of a string to uppercase.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

new_string = ""
for index, char in enumerate(test_string):
    # enumerate gives each character's own position, even when characters repeat
    if index % 2 == 0:
        new_string += char.upper()
    else:
        new_string += char

print(f"{new_string}")
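
The learning objectives mention list comprehensions, and this exercise is a natural place for one. The sketch below builds the same result in a single expression; `sample_string` is an illustrative value used instead of user input.

```python
sample_string = "balloon"  # illustrative input with repeated characters

# Build the list of characters, uppercasing those at even positions, then join them
new_string = "".join([
    char.upper() if position % 2 == 0 else char
    for position, char in enumerate(sample_string)
])

print(new_string)
```

Because `enumerate` supplies the position directly, repeated characters such as the double `l` and double `o` are each handled at their own index.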

23. A Python program to check for palindromes.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

def palindrome(text):
    # compare characters from both ends, moving towards the middle
    for i in range(len(text) // 2):
        if text[i] != text[len(text) - i - 1]:
            return False
    return True  # reached only when every pair matched

if palindrome(test_string):
    print("The string is a palindrome")
else:
    print("The string is not a palindrome")
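
A palindrome check can also be written with the slicing technique from the string reversal exercise. The sketch below shows that shorter form, plus a relaxed variant that ignores case and punctuation; both helper names are hypothetical, and what the relaxed variant ignores is an assumption about what should count as a palindrome.

```python
def is_palindrome(text):
    """Hypothetical helper: a string is a palindrome if it equals its reverse."""
    return text == text[::-1]

def is_palindrome_relaxed(text):
    """Hypothetical variant: ignore case and any non-alphanumeric characters."""
    cleaned = [char.lower() for char in text if char.isalnum()]
    return cleaned == cleaned[::-1]

print(is_palindrome("level"))
print(is_palindrome("levels"))
print(is_palindrome_relaxed("A man, a plan, a canal: Panama"))
```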

24. A Python program to check if a substring exists in a given string using regular expressions.

In [None]:
str1 = input("Enter a string ").strip()  # accepting user input
str2 = input("Enter the search string ").strip()

if re.search(re.escape(str2), str1):  # escape so the search string is treated as literal text
    print(f"{str2} exists in {str1}")
else:
    print(f"{str2} is not present")
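
A search term typed by a user may contain characters that have a special meaning in a regular expression. The sketch below uses made-up values (`text` and `needle` are illustrative) to compare the `in` operator, `re.search` with `re.escape`, and `re.search` on the unescaped term.

```python
import re

text = "Total cost: $5.00 (approx.)"  # made-up sample text
needle = "$5.00"                      # contains the regex metacharacters '$' and '.'

found_with_in = needle in text
found_escaped = bool(re.search(re.escape(needle), text))
found_unescaped = bool(re.search(needle, text))  # '$' is read as an end-of-string anchor

print(found_with_in, found_escaped, found_unescaped)
```

For a plain substring test the `in` operator is the simplest choice; regex becomes worthwhile only when the search term is itself meant to be a pattern.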


25. A Python program to slice a string in half and interchange the halves.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
strlen = len(test_string)

top_half, bottom_half = test_string[:strlen//2], test_string[strlen//2:]

print(f"The new string is {bottom_half + top_half}")

26. A Python program to automatically extract a username given an email.

In [None]:
test_string = input("Enter an email ").strip()  # accepting user input
pattern = re.compile(r'@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')  # matches the '@domain' part to be removed

username = re.sub(pattern, '', test_string)
print(f"Your username is {username}")
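
This task is another case where a built-in string method may be enough. The sketch below uses `str.partition` to split on the first `@`; `extract_username` is a hypothetical helper, and returning `None` when no `@` is present is an assumption about the desired behaviour.

```python
def extract_username(email):
    """Hypothetical helper: return the text before the first '@', or None if there is none."""
    username, separator, _domain = email.partition("@")
    # partition returns an empty separator when '@' does not occur in the string
    return username if separator else None

print(extract_username("jane.doe@example.com"))
print(extract_username("no-at-sign"))
```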

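27. A Python program to convert a given Unicode string to its closest plain ASCII equivalent.

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input

# NFKD splits accented characters into a base letter plus combining marks; the marks are then dropped
print(unicodedata.normalize('NFKD', test_string).encode('ascii', 'ignore').decode('ascii'))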

28. A Python program to shift each letter 10 positions forward in the alphabet, wrapping around after 'z' (a Caesar shift).

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
shift = 10
new_str = ""
for char in test_string:
    if 'A' <= char <= 'Z':  # Uppercase shift
        new_str += chr((ord(char) + shift - 65) % 26 + 65)
    elif 'a' <= char <= 'z':  # Lowercase shift
        new_str += chr((ord(char) + shift - 97) % 26 + 97)
    else:  # digits, spaces and punctuation are left unchanged
        new_str += char

print(f"The new string is {new_str}")

29. A Python program to unshift each character back to the original string. (Using cyclic properties of shifting)

In [None]:
test_string = input("Enter a string ").strip()  # accepting user input
shift = 16  # 26 - 10: shifting by 10 and then by 16 completes a full rotation of the alphabet

new_str = ""
for char in test_string:
    if 'A' <= char <= 'Z':  # Uppercase shift
        new_str += chr((ord(char) + shift - 65) % 26 + 65)
    elif 'a' <= char <= 'z':  # Lowercase shift
        new_str += chr((ord(char) + shift - 97) % 26 + 97)
    else:  # digits, spaces and punctuation are left unchanged
        new_str += char

print(f"The new string is {new_str}")
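
The two shifting exercises share the same arithmetic, so the round trip can be checked with one function. The sketch below is a possible refactoring, with `shift_letters` as a hypothetical helper name and `original` as an illustrative input; it shows that a shift of 10 followed by a shift of 16 restores the text because 10 + 16 = 26.

```python
def shift_letters(text, shift):
    """Hypothetical helper: rotate ASCII letters by `shift` positions, leaving other characters alone."""
    shifted = []
    for char in text:
        if 'A' <= char <= 'Z':
            shifted.append(chr((ord(char) - 65 + shift) % 26 + 65))
        elif 'a' <= char <= 'z':
            shifted.append(chr((ord(char) - 97 + shift) % 26 + 97))
        else:
            shifted.append(char)  # digits, spaces and punctuation pass through
    return "".join(shifted)

original = "Hello, World!"  # illustrative input
encoded = shift_letters(original, 10)
decoded = shift_letters(encoded, 16)  # 10 + 16 = 26, a full rotation

print(encoded)
print(decoded)
```

Because Python's `%` operator returns a non-negative result for a positive modulus, `shift_letters(encoded, -10)` undoes the shift as well.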

30. A Python program to validate co-ordinates in a strict latitude/longitude format using regular expressions. <br><br>

```
Sample Strings: 
+90.0, -127.554334
45, 180
-90, -180
-90.000, -180.0000
+90, +180
47.1231231, 179.99999999
```

In [None]:
test_string = input("Enter co-ordinates ").strip()  # accepting user input
pattern = re.compile(r'^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|'
                     r'([1-9]?\d))(\.\d+)?)$')  # both fragments are raw strings so \d and \. stay intact

if re.search(pattern, test_string):
    print("Matched valid co-ordinates")
else:
    print("Invalid co-ordinates")

31. A Python program to validate Windows file paths (local drive or network share) using regular expressions. <br><br>
```
Sample Strings: 
\\192.168.0.1\folder\file.pdf
\\192.168.0.1\my folder\folder.2\file.gif
c:\my folder\abc abc.docx
c:\my-folder\another_folder\abc.v2.docx
```

In [None]:
test_string = input("Enter a filepath ").strip()  # accepting user input
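# A drive letter or a leading backslash, then folders, then a file with an allowed extension
pattern = re.compile(r'^(?:[a-z]:|\\)(\\[a-z_\-\s0-9\.]+)+\.(txt|gif|pdf|doc|docx|xls|xlsx)$', re.IGNORECASE)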

if re.search(pattern, test_string):
    print("Matched a valid filepath")
else:
    print("Invalid filepath")