 
# Python String Concepts

Python strings are sequences of characters enclosed in quotes. They are one of the most commonly used data types in Python programming. The sections below walk through the key concepts with examples.

## Basic String Creation

Strings in Python can be created using single quotes (`'`), double quotes (`"`), or triple quotes (`'''` or `"""`) for multi-line strings.




In [12]:
# Different ways to create strings
single_quoted = 'Hello, World!'
double_quoted = "Python Programming"
multi_line = '''This is a
multi-line string
in Python'''


In [13]:
# Basic string operations
print(single_quoted)
print(double_quoted)
print(multi_line)

Hello, World!
Python Programming
This is a
multi-line string
in Python


In [14]:
# String length
print(f"Length of 'Hello, World!': {len(single_quoted)}")


Length of 'Hello, World!': 13


In [15]:
# String indexing (accessing characters)
print(f"First character: {single_quoted[0]}")
print(f"Last character: {single_quoted[-1]}")

First character: H
Last character: !


In [16]:
# String slicing: [start:stop:step], where the stop index is exclusive
print(f"Substring (2-7): {single_quoted[2:7]}")
print(f"Every second character: {single_quoted[::2]}")
print(f"Reverse string: {single_quoted[::-1]}")

Substring (2-7): llo, 
Every second character: Hlo ol!
Reverse string: !dlroW ,olleH
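
One detail the slicing cell does not show is how out-of-range positions behave. A minimal sketch, reusing the `single_quoted` value from the earlier cell: a slice that runs past the end is clipped, while indexing a single out-of-range position raises `IndexError`.

```python
single_quoted = 'Hello, World!'

# Slices are clipped to the string's bounds instead of raising
tail = single_quoted[7:100]
empty = single_quoted[50:]
print(repr(tail))   # 'World!'
print(repr(empty))  # ''

# Indexing a single position outside the string raises IndexError
error_message = None
try:
    single_quoted[50]
except IndexError as exc:
    error_message = str(exc)
    print(f"IndexError: {error_message}")
```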


In [17]:
# String concatenation
greeting = "Hello"
name = "Python"
message = greeting + ", " + name + "!"
print(message)

Hello, Python!
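
When more than a few pieces need to be combined, `str.join` is a common alternative to chaining `+`. The sketch below uses made-up lists (`parts`, `numbers`) purely for illustration.

```python
parts = ["Hello", "Python", "World"]

# str.join places the separator between the items of an iterable
joined = ", ".join(parts)
print(joined)  # Hello, Python, World

# Non-string items must be converted first, otherwise join raises TypeError
numbers = [1, 2, 3]
numbers_joined = "-".join(str(n) for n in numbers)
print(numbers_joined)  # 1-2-3
```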


In [18]:
# String methods
text = "python programming"
print(f"Uppercase: {text.upper()}")
print(f"Capitalize: {text.capitalize()}")
print(f"Replace: {text.replace('python', 'awesome python')}")
print(f"Split: {text.split()}")
print(f"Count 'p': {text.count('p')}")
print(f"Find 'gram': {text.find('gram')}")

Uppercase: PYTHON PROGRAMMING
Capitalize: Python programming
Replace: awesome python programming
Split: ['python', 'programming']
Count 'p': 2
Find 'gram': 10


In [25]:
# String formatting
name = "Alice"
age = 30
# Using f-strings (Python 3.6+)
print(f"{name} is {age} years old")
# Using format() method
print("{} is {} years old".format(name, age))
# Using % operator
print("%s is %d years old" % (name, age))

Alice is 30 years old
Alice is 30 years old
Alice is 30 years old
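
f-strings and `format()` also accept a format specification after a colon, which controls width, alignment, and precision. A short sketch with illustrative values (`balance` and `score` are not part of the original cells):

```python
name = "Alice"
balance = 1234567.891
score = 0.8567

right_aligned = f"{name:>10}"   # pad on the left to 10 characters
left_aligned = f"{name:<10}"    # pad on the right to 10 characters
money = f"{balance:,.2f}"       # thousands separator, two decimals
percent = f"{score:.1%}"        # multiply by 100, one decimal, add %
zero_padded = f"{42:05d}"       # zero-pad an integer to 5 digits

print(f"[{right_aligned}]")  # [     Alice]
print(f"[{left_aligned}]")   # [Alice     ]
print(money)                 # 1,234,567.89
print(percent)               # 85.7%
print(zero_padded)           # 00042
```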


In [26]:
# String checking methods
print(f"Is alphabetic? {'abc'.isalpha()}")
print(f"Is numeric? {'123'.isnumeric()}")
print(f"Is alphanumeric? {'abc123'.isalnum()}")
print(f"Starts with 'py'? {text.startswith('py')}")
print(f"Ends with 'ing'? {text.endswith('ing')}")

Is alphabetic? True
Is numeric? True
Is alphanumeric? True
Starts with 'py'? True
Ends with 'ing'? True



## Useful Standard-Library Modules for String Manipulation

When working with strings in more advanced ways, these standard-library modules can be helpful:

1. **re** - Regular expressions for pattern matching in strings
2. **string** - Additional string constants and utilities
3. **textwrap** - Functions for wrapping and filling text
4. **difflib** - Helpers for computing string differences
5. **unicodedata** - Unicode character database
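
As a rough illustration of what the modules listed above offer, the sketch below makes one small call to each of `string`, `textwrap`, `difflib`, and `unicodedata` (`re` is covered in its own section). The sample strings are arbitrary.

```python
import difflib
import string
import textwrap
import unicodedata

# string: ready-made character constants
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# textwrap: break a long line into lines of at most `width` characters
long_text = "Python strings are sequences of characters enclosed in quotes."
wrapped = textwrap.wrap(long_text, width=30)
for line in wrapped:
    print(line)

# difflib: similarity ratio between two strings (1.0 means identical)
ratio = difflib.SequenceMatcher(None, "python", "pythons").ratio()
print(round(ratio, 3))

# unicodedata: look up a character's name and normalize text
e_acute = "\u00e9"
char_name = unicodedata.name(e_acute)
print(char_name)  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", e_acute)
print(len(decomposed))  # 2 (base letter plus combining accent)
```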

Strings are immutable in Python, which means once created, you cannot change their content. Any operation that appears to modify a string actually creates a new string.
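
The immutability described above can be checked directly. In this minimal sketch, item assignment fails, a method call leaves the original untouched, and "modifying" a string really means building a new one and rebinding the name.

```python
text = "python"

# Item assignment is not supported on str objects
assignment_failed = False
try:
    text[0] = "P"
except TypeError as exc:
    assignment_failed = True
    print(f"TypeError: {exc}")

# Methods return a new string and leave the original unchanged
original = text
capitalized = text.capitalize()
print(original)     # python
print(capitalized)  # Python

# To "change" a string, build a new one and rebind the name
text = "P" + text[1:]
print(text)  # Python
```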

Understanding string manipulation is fundamental to Python programming as strings are used extensively in data processing, file handling, and user interactions.

# Python's `re` Module (Regular Expressions)

The `re` module in Python provides tools for working with regular expressions, which are sequences of characters that define search patterns. Regular expressions are useful for text processing, validation, and extraction.

## Basic Concepts and Examples


In [24]:
import re

# Basic pattern matching
text = "The rain in Spain falls mainly in the plain."

# Simple search
match = re.search(r"rain", text)
print(f"Found 'rain' at position: {match.start() if match else 'Not found'}")

# Find all occurrences
all_matches = re.findall(r"in", text)
print(f"All occurrences of 'in': {all_matches} (count: {len(all_matches)})")

# Match at beginning of string
starts_with = re.match(r"The", text)
print(f"Starts with 'The': {bool(starts_with)}")

# Split string by pattern
split_text = re.split(r" ", text, maxsplit=3)
print(f"Split by spaces (max 3): {split_text}")

# Replace pattern
replaced = re.sub(r"Spain", "France", text)
print(f"After replacement: {replaced}")

# Using special characters and metacharacters
# \d - digit, \w - word character, \s - whitespace
phone_number = "Call me at 555-123-4567 or (555) 987-6543"
numbers = re.findall(r"\d{3}-\d{3}-\d{4}", phone_number)
print(f"Phone numbers found: {numbers}")

# Character classes with []
vowels = re.findall(r"[aeiou]", text.lower())
print(f"Vowels found: {vowels} (count: {len(vowels)})")

# Quantifiers: *, +, ?, {n}, {n,}, {n,m}
words = re.findall(r"\b\w{4,}\b", text)  # Words with 4+ characters
print(f"Words with 4+ characters: {words}")

# Groups with ()
email_text = "Contact us at info@example.com or support@company.org"
emails = re.findall(r"(\w+)@(\w+)\.(\w+)", email_text)
print(f"Email parts (username, domain, TLD): {emails}")

# Named groups
pattern = r"(?P<username>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)"
for match in re.finditer(pattern, email_text):
    print(f"Username: {match.group('username')}, Domain: {match.group('domain')}, TLD: {match.group('tld')}")

# Greedy vs. non-greedy matching
html = "<div>First content</div><div>Second content</div>"
greedy = re.findall(r"<div>(.*)</div>", html)
non_greedy = re.findall(r"<div>(.*?)</div>", html)
print(f"Greedy match: {greedy}")
print(f"Non-greedy match: {non_greedy}")

# Flags
text_multiline = """First line
SECOND line
Third LINE"""

# Case-insensitive search
case_insensitive = re.findall(r"line", text_multiline, re.IGNORECASE)
print(f"Case-insensitive matches: {case_insensitive}")

# Multiline mode - ^ and $ match start/end of each line
multiline_matches = re.findall(r"^.*line$", text_multiline, re.MULTILINE)
print(f"Lines ending with 'line': {multiline_matches}")

# Compiling a pattern once for reuse (clearer, and avoids a cache lookup on every call)
pattern = re.compile(r"\b\w{3}\b")  # 3-letter words
three_letter_words = pattern.findall(text)
print(f"Three-letter words: {three_letter_words}")

# Lookahead and lookbehind assertions
text_with_prices = "Products: $10, €20, $30"
# Positive lookahead: match $ only if followed by digits
dollars = re.findall(r"\$(?=\d+)", text_with_prices)
# Positive lookbehind: match digits only if preceded by $
dollar_amounts = re.findall(r"(?<=\$)\d+", text_with_prices)
print(f"Dollar symbols: {dollars}")
print(f"Dollar amounts: {dollar_amounts}")

Found 'rain' at position: 4
All occurrences of 'in': ['in', 'in', 'in', 'in', 'in', 'in'] (count: 6)
Starts with 'The': True
Split by spaces (max 3): ['The', 'rain', 'in', 'Spain falls mainly in the plain.']
After replacement: The rain in France falls mainly in the plain.
Phone numbers found: ['555-123-4567']
Vowels found: ['e', 'a', 'i', 'i', 'a', 'i', 'a', 'a', 'i', 'i', 'e', 'a', 'i'] (count: 13)
Words with 4+ characters: ['rain', 'Spain', 'falls', 'mainly', 'plain']
Email parts (username, domain, TLD): [('info', 'example', 'com'), ('support', 'company', 'org')]
Username: info, Domain: example, TLD: com
Username: support, Domain: company, TLD: org
Greedy match: ['First content</div><div>Second content']
Non-greedy match: ['First content', 'Second content']
Case-insensitive matches: ['line', 'line', 'LINE']
Lines ending with 'line': ['First line', 'SECOND line']
Three-letter words: ['The', 'the']
Dollar symbols: ['$', '$']
Dollar amounts: ['10', '30']
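
The cell above prints only a few fields from each match. As a small follow-up sketch, the match object returned by `search()` also exposes the full match, its position, and a dictionary of named groups. Because `search()` returns `None` when nothing matches, the result is checked before use.

```python
import re

email_text = "Contact us at info@example.com or support@company.org"
pattern = re.compile(r"(?P<username>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)")

match = pattern.search(email_text)
if match is not None:
    print(match.group(0))     # info@example.com
    print(match.span())       # (14, 30)
    print(match.groupdict())  # named groups as a dict

# No match: search() returns None rather than raising
missing = pattern.search("no address here")
print(missing)  # None
```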



## Key Regular Expression Concepts

1. **Basic Functions**:
   - `re.search()` - Find first match
   - `re.match()` - Match at beginning of string
   - `re.findall()` - Find all matches
   - `re.finditer()` - Return iterator of match objects
   - `re.sub()` - Substitute matches
   - `re.split()` - Split string by pattern

2. **Special Characters**:
   - `.` - Any character except newline
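   - `\d` - Digit (any Unicode decimal digit for `str` patterns; only `[0-9]` with `re.ASCII`)
   - `\D` - Non-digit
   - `\w` - Word character (Unicode letters, digits, and underscore; only `[a-zA-Z0-9_]` with `re.ASCII`)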
   - `\W` - Non-word character
   - `\s` - Whitespace
   - `\S` - Non-whitespace

3. **Anchors**:
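   - `^` - Start of string (and start of each line with `re.MULTILINE`)
   - `$` - End of string, or just before a trailing newline (and end of each line with `re.MULTILINE`)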
   - `\b` - Word boundary

4. **Quantifiers**:
   - `*` - 0 or more
   - `+` - 1 or more
   - `?` - 0 or 1
   - `{n}` - Exactly n
   - `{n,}` - n or more
   - `{n,m}` - Between n and m, inclusive

5. **Common Flags**:
   - `re.IGNORECASE` or `re.I` - Case-insensitive matching
   - `re.MULTILINE` or `re.M` - Multi-line mode (`^` and `$` match at the start and end of each line)
   - `re.DOTALL` or `re.S` - Dot matches newline
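
The anchors and flags listed above interact, which is easiest to see side by side. The sketch below reuses the three-line sample from the earlier cell and compares each pattern with and without the relevant flag.

```python
import re

text_multiline = """First line
SECOND line
Third LINE"""

# Without flags, ^ matches only at the very start of the string
default_starts = re.findall(r"^\w+", text_multiline)
# With re.MULTILINE, ^ also matches right after each newline
multiline_starts = re.findall(r"^\w+", text_multiline, re.MULTILINE)
print(default_starts)    # ['First']
print(multiline_starts)  # ['First', 'SECOND', 'Third']

# Without re.DOTALL, "." does not cross a newline, so this finds nothing
default_dot = re.search(r"First.*LINE", text_multiline)
# With re.DOTALL, "." matches newlines and the pattern spans all three lines
dotall_dot = re.search(r"First.*LINE", text_multiline, re.DOTALL)
print(default_dot)             # None
print(dotall_dot is not None)  # True

# Flags are combined with the | operator
combined = re.findall(r"^.*line$", text_multiline, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['First line', 'SECOND line', 'Third LINE']
```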

Regular expressions are powerful but can be complex. It's often helpful to test them in a regex tester tool when developing complex patterns. The `re` module is essential for text processing tasks like validation, extraction, and transformation of structured text data.
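
Alongside an external tester, a complex pattern can be documented and checked in code. The sketch below is one possible approach, not a complete phone-number validator: it writes a hypothetical `PHONE` pattern with `re.VERBOSE`, so the pattern can carry comments, and checks it against a small table of expected results. The pattern and the test cases are assumptions made for illustration, based on the two phone formats used earlier.

```python
import re

# Hypothetical pattern covering the two phone formats from the earlier cell
PHONE = re.compile(
    r"""
    (?:\(\d{3}\)\s | \d{3}-)   # area code: (555) plus a space, or 555-
    \d{3}-\d{4}                # exchange and line number
    """,
    re.VERBOSE,
)

# Each case pairs an input with the matches we expect to find in it
cases = [
    ("Call me at 555-123-4567", ["555-123-4567"]),
    ("or (555) 987-6543", ["(555) 987-6543"]),
    ("no number: 55-123-4567", []),
]

for text, expected in cases:
    found = PHONE.findall(text)
    assert found == expected, f"{text!r}: got {found}, expected {expected}"
print("all pattern checks passed")
```

Keeping such cases next to the pattern makes it easier to see what breaks when the pattern is later changed.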