# Chapter 8: More About Strings

## A. String Topics Covered:
1. Basic String Operations
2. String Slicing
3. Testing, Searching, and Manipulating Strings

### 1. Basic String Operations:
- **Accessing Individual Characters:**
  - Use a for loop to iterate over each character.
  - Use indexing to access characters (e.g., `character = my_string[i]`).
  - Avoid `IndexError` exceptions by using the `len()` function to check that an index is within the string's length.

- **String Concatenation:**
  - Use the `+` operator to concatenate strings.
  - Use the `+=` operator for appending strings to existing variables.

- **Immutability of Strings:**
  - Strings cannot be modified after creation.
  - Concatenation creates new strings rather than modifying existing ones.

### 2. String Slicing:
- Slicing format: `string[start:end]`.
- If omitted, `start` defaults to 0 and `end` defaults to `len(string)`; the character at index `end` is not included.
- Slicing can include a step value and negative indices.

### 3. Testing, Searching, and Manipulating Strings:
- **Membership Operators:**
  - Use `in` and `not in` to check for substrings within strings.

- **String Methods:**
  - **Testing Methods:**
    - `isalnum()`, `isalpha()`, `isdigit()`, `islower()`, `isspace()`, `isupper()`.

  - **Modification Methods:**
    - `lower()`, `lstrip()`, `rstrip()`, `strip()`, `upper()`.

  - **Search and Replace Methods:**
    - `endswith()`, `find()`, `replace()`, `startswith()`.

- **Repetition Operator:**
  - Use the `*` symbol to repeat strings (e.g., `string_to_copy * n`).

- **Splitting Strings:**
  - Use the `split()` method to convert a string into a list of words.
  - Specify different delimiters for splitting.



### 1. Basic String Operations:

#### 1.1. Accessing Individual Characters:

```python
my_string = "Hello, World!"

# Using a for loop
for any_character in my_string:
    print(any_character)

# Using indexing
first_char = my_string[0]
print("First character:", first_char)

last_char = my_string[-1]
print("Last character:", last_char)
```

Output:

```
H
e
l
l
o
,
 
W
o
r
l
d
!
First character: H
Last character: !
```
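The outline above says to use `len()` to avoid an `IndexError`. A minimal sketch of that idea follows; `char_at` is a hypothetical helper name chosen for illustration, not a built-in function.

```python
my_string = "Hello, World!"

def char_at(text, i):
    """Hypothetical helper: return text[i], or None if i is out of range."""
    # Valid indices run from -len(text) up to len(text) - 1
    if -len(text) <= i < len(text):
        return text[i]
    return None

print(char_at(my_string, 4))    # o
print(char_at(my_string, 20))   # None

# Without the check, an out-of-range index raises IndexError
try:
    print(my_string[20])
except IndexError as err:
    print("IndexError:", err)   # IndexError: string index out of range

# range(len(...)) produces only valid indices, so this loop never fails
for i in range(len(my_string)):
    if my_string[i] == "o":
        print("Found 'o' at index", i)
```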


#### 1.2. Concatenation

```python
# String concatenation
greeting = "Hello"
name = "Alice"
message = greeting + ", " + name + "!"
print(message)

# Using the += operator
greeting += " everyone!"
print("Updated greeting:", greeting)
```

Output:

```
Hello, Alice!
Updated greeting: Hello everyone!
```
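Two details the cell above does not show: `+` only joins strings to strings, so a number must be converted with `str()` first, and `+=` is handy for building a string piece by piece. The values below are illustrative.

```python
count = 3
# "Items: " + count would raise TypeError because count is an int
label = "Items: " + str(count)
print(label)  # Items: 3

# Build a sentence one word at a time with +=
sentence = ""
for word in ["Strings", "are", "fun"]:
    if sentence:          # add a separator only between words
        sentence += " "
    sentence += word
print(sentence)  # Strings are fun
```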


#### 1.3. Immutability

```python
# Demonstrating immutability
immutable_string = "Hello"
# immutable_string[0] = 'h'  # Uncommenting this line raises:
# TypeError: 'str' object does not support item assignment
print("Immutable string:", immutable_string)

# Concatenation creates a new string
new_string = immutable_string + " World"
print("New string:", new_string)
```

Output:

```
Immutable string: Hello
New string: Hello World
```
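Because strings are immutable, operations that seem to change a string actually build a new one and rebind the variable. A short sketch: a second name that still refers to the original object is unaffected, and string methods return new strings rather than altering the original.

```python
original = "Hello"
alias = original            # both names refer to the same string object

original += " World"        # creates a new string and rebinds 'original'
print(original)             # Hello World
print(alias)                # Hello  (the old string is unchanged)

# String methods also return new strings instead of changing the original
shout = alias.upper()
print(shout, alias)         # HELLO Hello
```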

### 2. String Slicing

```python
my_string = "Hello, World!"

# Basic slicing
substring = my_string[0:5]
print("Substring (0:5):", substring)

# Omitting start index
substring_start = my_string[:5]
print("Substring (:5):", substring_start)

# Omitting end index
substring_end = my_string[7:]
print("Substring (7:):", substring_end)

# Using negative indices
substring_negative = my_string[-6:]
print("Substring (-6:):", substring_negative)

# Including step value
substring_step = my_string[::2]
print("Substring with step (::2):", substring_step)
```

Output:

```
Substring (0:5): Hello
Substring (:5): Hello
Substring (7:): World!
Substring (-6:): World!
Substring with step (::2): Hlo ol!
```
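A few more slicing cases: a step combined with explicit bounds, a negative end index, a negative step that reverses the string, and out-of-range bounds, which slicing clamps instead of raising `IndexError`.

```python
my_string = "Hello, World!"

print(my_string[0:5:2])      # Hlo  (indices 0, 2, 4)
print(my_string[-6:-1])      # World  (the end index is excluded)
print(my_string[::-1])       # !dlroW ,olleH  (a negative step walks backwards)

# Unlike indexing, slicing clamps out-of-range bounds instead of raising IndexError
print(my_string[5:100])      # , World!
print(repr(my_string[20:]))  # ''
```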

### 3. Testing, Searching, and Manipulating Strings

#### 3.1. Membership Operators:


```python
my_string = "Hello, World!"
contains_hello = "Hello" in my_string
print("Contains 'Hello':", contains_hello)

contains_python = "Python" not in my_string
print("Does not contain 'Python':", contains_python)
```

Output:

```
Contains 'Hello': True
Does not contain 'Python': True
```
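Membership tests compare characters exactly, so they are case-sensitive. A short sketch showing the difference and one common workaround (lowercasing first):

```python
my_string = "Hello, World!"

print("hello" in my_string)          # False: the comparison is case-sensitive
print("hello" in my_string.lower())  # True after normalizing case

# Membership tests read naturally in an if statement
if "World" in my_string:
    print("Greeting mentions the world")
```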


#### 3.2. String Methods:

```python
# Testing methods
alphanumeric = "abc123".isalnum()
print("Is alphanumeric:", alphanumeric) # Output: True

alpha = "abc".isalpha()
print("Is alphabetic:", alpha) # Output: True

digit = "123".isdigit()
print("Is digit:", digit) # Output: True

lowercase = "abc".islower()
print("Is lowercase:", lowercase) # Output: True

uppercase = "ABC".isupper()
print("Is uppercase:", uppercase)  # Output: True

# Modification methods
original_string = "   Hello, World!   "
print("Original string:", original_string)

lower_string = original_string.lower()
print("Lowercase string:", lower_string)

stripped_string = original_string.strip()
print("Stripped string:", stripped_string)

# Search and replace methods
found_index = original_string.find("World")
print("Index of 'World':", found_index)

replaced_string = original_string.replace("World", "Universe")
print("Replaced string:", replaced_string)
```

Output:

```
Is alphanumeric: True
Is alphabetic: True
Is digit: True
Is lowercase: True
Is uppercase: True
Original string:    Hello, World!   
Lowercase string:    hello, world!   
Stripped string: Hello, World!
Index of 'World': 10
Replaced string:    Hello, Universe!   
```
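The summary also lists `isspace()`, `upper()`, `lstrip()`, `rstrip()`, `startswith()`, and `endswith()`, which the cell above does not exercise. A short sketch of each, plus the `-1` that `find()` returns when the substring is absent:

```python
original_string = "   Hello, World!   "

print("   ".isspace())    # True
print("".isspace())       # False: an empty string has no whitespace characters
print(original_string.upper())

# repr() makes the remaining leading/trailing spaces visible
print(repr(original_string.lstrip()))  # 'Hello, World!   '
print(repr(original_string.rstrip()))  # '   Hello, World!'

clean = original_string.strip()
print(clean.startswith("Hello"))   # True
print(clean.endswith("!"))         # True
print(clean.find("Python"))        # -1: substring not found
```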


### 4. Repetition Operator:


```python
repeated_string = "abc" * 4
print("Repeated string:", repeated_string)
```

Output:

```
Repeated string: abcabcabcabc
```
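A common use of the repetition operator is drawing a line that matches the length of some text. The sketch below underlines a title (the title string is just an example) and shows that repeating zero times yields an empty string.

```python
title = "Chapter 8: More About Strings"
underline = "=" * len(title)   # repeat '=' once per character of the title
print(title)
print(underline)

print(repr("abc" * 0))         # '': repeating zero times gives an empty string
```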


### 5. Splitting Strings:

```python
# Splitting with the default separator (any run of whitespace)
my_string = "One two , three , four"
word_list = my_string.split()
print("Split on whitespace:", word_list)

# Splitting with a specific separator
my_string_with_slashes = "1/2/3/4/5"
number_list = my_string_with_slashes.split('/')
print("Split by '/':", number_list)
```

Output:

```
Split on whitespace: ['One', 'two', ',', 'three', ',', 'four']
Split by '/': ['1', '2', '3', '4', '5']
```
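`split()` with no argument splits on any run of whitespace and drops empty strings, while `split(" ")` splits on every single space. Splitting on the comma and stripping each piece is one way to clean up the comma-separated example above. A sketch:

```python
spaced = "a  b c"
print(spaced.split())      # ['a', 'b', 'c']
print(spaced.split(" "))   # ['a', '', 'b', 'c']: the double space yields an empty string

my_string = "One two , three , four"
parts = []
for part in my_string.split(","):
    parts.append(part.strip())   # remove the spaces around each piece
print(parts)               # ['One two', 'three', 'four']

# The optional maxsplit argument limits the number of splits
print("key=value=more".split("=", 1))  # ['key', 'value=more']
```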


### 6. String Tokens:

```python
# Default split: whitespace is the delimiter
flavors = 'peach raspberry strawberry vanilla'  # avoid naming a variable 'str', which hides the built-in
tokens = flavors.split()
print("Tokens:", tokens)

# Example with a custom delimiter
my_address = 'www.example.com'
tokens_custom = my_address.split('.')
print("Custom tokens:", tokens_custom)
```

Output:

```
Tokens: ['peach', 'raspberry', 'strawberry', 'vanilla']
Custom tokens: ['www', 'example', 'com']
```
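Tokens are often converted or recombined after splitting. A sketch, assuming a date string in month/day/year form (the date itself is made up), that splits it into tokens, converts them to integers, and joins the tokens back with a different delimiter using `str.join()`:

```python
date_string = "11/25/2024"          # illustrative month/day/year date
month, day, year = date_string.split("/")
print(int(month), int(day), int(year))   # 11 25 2024

tokens = date_string.split("/")
print("-".join(tokens))             # 11-25-2024
```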


### 7. Replacing a Character at an Index:

Because strings are immutable, this example converts the string to a list, changes one element, and joins the list back into a new string.

```python
def replace_char_at_index(input_string, index, new_char):
    if index < 0 or index >= len(input_string):
        raise ValueError("Index is out of range")

    # Convert the string to a list to allow modification
    string_list = list(input_string)
    string_list[index] = new_char

    # Join the list back into a single string with str.join()
    modified_string = ''.join(string_list)
    return modified_string

# Example usage
original_string = "Hello, World!"
index_to_replace = 7
new_character = 'P'

try:
    print("Original string:", original_string)
    original_string = replace_char_at_index(original_string, index_to_replace, new_character)
    print("Modified string:", original_string)
except ValueError as e:
    print(e)
```

Output:

```
Original string: Hello, World!
Modified string: Hello, Porld!
```
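Slicing offers an alternative to the list round-trip above. The sketch below (`replace_char_slicing` is a hypothetical name) builds the new string from the part before the index, the new character, and the part after it:

```python
def replace_char_slicing(input_string, index, new_char):
    """Return a copy of input_string with the character at index replaced."""
    if index < 0 or index >= len(input_string):
        raise ValueError("Index is out of range")
    # Keep everything before index, insert new_char, keep everything after
    return input_string[:index] + new_char + input_string[index + 1:]

print(replace_char_slicing("Hello, World!", 7, "P"))  # Hello, Porld!
```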


## B. Regular Expressions in Python (NO TEST)

Regular expressions (regex) are powerful tools for working with text, enabling complex pattern matching, searching, and text manipulation. Python's `re` module provides support for regex operations. Here's an overview and some examples:

### 1. Importing the `re` Module

```python
import re
```

### 2. Basic Functions

1. `re.search(pattern, string)`: Searches for the first location where the regex pattern produces a match.
2. `re.match(pattern, string)`: Checks for a match only at the beginning of the string.
3. `re.findall(pattern, string)`: Returns a list of all matches in the string.
4. `re.finditer(pattern, string)`: Returns an iterator yielding match objects for all matches.
5. `re.sub(pattern, repl, string)`: Replaces matches with a specified replacement.

### 3. Examples

**Basic Pattern Matching**

```python
import re

text = "The rain in Spain"

# Search for "rain" in the text
match = re.search(r"rain", text)
if match:
    print("Found 'rain':", match.group())
```

**Matching at the Beginning of a String**

```python
# Check if the string starts with "The"
match = re.match(r"The", text)
if match:
    print("String starts with 'The'")
```
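The difference between `re.match()` and `re.search()` shows up when the pattern is not at the start of the string. Both return a match object on success and `None` otherwise, so the result can be tested directly in an `if`:

```python
import re

text = "The rain in Spain"

print(re.match(r"rain", text))           # None: "rain" is not at index 0
found = re.search(r"rain", text)
if found:
    print(found.group(), found.span())   # rain (4, 8)
```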

**Finding All Matches**

```python
# Find all occurrences of the letter 'a'
matches = re.findall(r"a", text)
print("All matches for 'a':", matches)
```
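Two variations on `re.findall()`: when the pattern contains one capture group, only the captured text is returned, and an optional flag such as `re.IGNORECASE` changes how matching works.

```python
import re

text = "The rain in Spain"

# With one capture group, findall returns just the captured text
ain_prefixes = re.findall(r"(\w+)ain", text)
print(ain_prefixes)                              # ['r', 'Sp']

# IGNORECASE lets "the" match "The"; without it there is no match
print(re.findall(r"the", text, re.IGNORECASE))   # ['The']
print(re.findall(r"the", text))                  # []
```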

**Using Metacharacters**

```python
# Find words starting with 'S' followed by one or more word characters (\w+)
matches = re.findall(r"\bS\w+", text)
print("Words starting with 'S':", matches)
```
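A few other common metacharacters, sketched against the same sentence: `\b` marks a word boundary, `.` matches any single character except a newline, and `^` / `$` anchor a match to the start / end of the string.

```python
import re

text = "The rain in Spain"

first_letters = re.findall(r"\b\w", text)
print(first_letters)                      # ['T', 'r', 'i', 'S']: first letter of each word
print(re.findall(r"r..n", text))          # ['rain']
print(bool(re.search(r"Spain$", text)))   # True: '$' anchors the match at the end
print(bool(re.search(r"^rain", text)))    # False: '^' anchors at the start
```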

**Using Quantifiers**

```python
# Find sequences of digits
text_with_numbers = "My phone number is 123-456-7890"
matches = re.findall(r"\d+", text_with_numbers)
print("Sequences of digits:", matches)
```
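`+` is only one quantifier. A sketch of three others: `?` (zero or one), `{n}` (exactly n), and `*` (zero or more). The sample strings are illustrative.

```python
import re

# ? makes the preceding character optional
spellings = re.findall(r"colou?r", "color or colour")
print(spellings)                                   # ['color', 'colour']

# {n} requires exactly n repetitions
text_with_numbers = "My phone number is 123-456-7890"
print(re.findall(r"\d{4}", text_with_numbers))    # ['7890']

# * allows zero or more repetitions
print(re.findall(r"ab*", "a ab abbb"))             # ['a', 'ab', 'abbb']
```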

**Using Groups**

```python
# Extract the area code, prefix, and line number with capture groups
phone = "123-456-7890"
match = re.search(r"(\d{3})-(\d{3})-(\d{4})", phone)
if match:
    print("Area code:", match.group(1))
    print("Prefix:", match.group(2))
    print("Line number:", match.group(3))
```
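Groups can also be retrieved all at once with `groups()`, or given names with the `(?P<name>...)` syntax and read back by name. The group names `area`, `prefix`, and `line` are chosen for illustration.

```python
import re

phone = "123-456-7890"
pattern = r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})"
match = re.search(pattern, phone)
if match:
    print(match.groups())         # ('123', '456', '7890')
    print(match.group("area"))    # 123
    print(match.groupdict())      # {'area': '123', 'prefix': '456', 'line': '7890'}
```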

**Replacing Patterns**

```python
# Replace digits with 'X'
replaced = re.sub(r"\d", "X", text_with_numbers)
print("Replaced text:", replaced)
```

**Using `re.finditer`**

```python
# Find all matches and their positions
iterator = re.finditer(r"\d", text_with_numbers)
for match in iterator:
    print("Match at position:", match.start())
```
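Each match object from `re.finditer()` also reports the matched text and where it ends. When a pattern is reused, it can be compiled once with `re.compile()`; the resulting pattern object has the same `finditer()` method.

```python
import re

text_with_numbers = "My phone number is 123-456-7890"
digit_runs = re.compile(r"\d+")   # compile once, reuse the pattern object

for match in digit_runs.finditer(text_with_numbers):
    print(match.group(), match.start(), match.end())
# 123 19 22
# 456 23 26
# 7890 27 31
```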

### 4. Example: Email Validation

```python
import re

email = "user@example.com"

# Simple email validation pattern
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
if re.match(pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")
```

Output:

```
Valid email address
```
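The same simplified pattern can be checked against several inputs. This sketch uses `re.fullmatch()`, which only succeeds when the pattern covers the entire string, so the `^` and `$` anchors are not needed. The candidate addresses are made up, and the pattern is a classroom approximation, not a full implementation of the email address standard.

```python
import re

# Simplified pattern; it does not implement the full email address standard
pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

candidates = ["user@example.com", "first.last+tag@mail.example.org",
              "no-at-sign.com", "user@domain"]
for candidate in candidates:
    # fullmatch succeeds only if the pattern matches the whole string
    status = "valid" if re.fullmatch(pattern, candidate) else "invalid"
    print(candidate, "->", status)
```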


### 5. Comprehensive Example

```python
import re

text = "Contact us at support@example.com or sales@example.com"

# Find all email addresses
emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", text)
print("Email addresses found:", emails)

# Replace email addresses with a placeholder
masked_text = re.sub(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", "[email]", text)
print("Masked text:", masked_text)

# Extract domain names from the email addresses
for email in emails:
    domain = re.search(r"@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)$", email)
    if domain:
        print("Domain:", domain.group(1))
```

Output:

```
Email addresses found: ['support@example.com', 'sales@example.com']
Masked text: Contact us at [email] or [email]
Domain: example.com
Domain: example.com
```
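The domain extraction above can also be done in a single step: with one capture group after the `@`, `re.findall()` returns just the domains, and a `set` removes duplicates.

```python
import re

text = "Contact us at support@example.com or sales@example.com"
# One capture group makes findall return only the part after '@'
domains = re.findall(r"@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", text)
print(domains)          # ['example.com', 'example.com']
print(set(domains))     # {'example.com'}
```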



## C. Natural Language Processing (NLP) (NO TEST) 

### 1. NLP
NLP is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language. It involves various techniques and tools to enable machines to read, understand, and derive meaning from human languages.

### 2. Key NLP Tasks and Techniques
- Tokenization: Splitting text into individual words or tokens.
- Part-of-Speech Tagging (POS): Identifying the grammatical parts of speech in a sentence.
- Named Entity Recognition (NER): Detecting and classifying named entities (e.g., people, places, organizations) in text.
- Sentiment Analysis: Determining the sentiment expressed in a piece of text (e.g., positive, negative, neutral).
- Text Classification: Categorizing text into predefined categories.
- Machine Translation: Translating text from one language to another.
- Text Summarization: Producing a shorter version of a text that captures the main points.
- Question Answering: Providing answers to questions posed in natural language.
- Text Generation: Generating coherent and contextually relevant text.
- Libraries: Common Python libraries for NLP workflows include NLTK and spaCy.
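Tokenization, the first task in the list, can be approximated with the string and regex tools from this chapter. The sketch below uses only the standard library (`re` and `collections.Counter`), not NLTK or spaCy, and its tokenizer is deliberately naive: `\w+` drops punctuation and would split a contraction such as "don't" into two tokens. The sample sentence is illustrative.

```python
import re
from collections import Counter

text = "The rain in Spain stays mainly in the plain."

# Lowercase, then treat each run of word characters as one token
tokens = re.findall(r"\w+", text.lower())
print(tokens)

# Count how often each token appears
counts = Counter(tokens)
print(counts["the"], counts["in"], counts["spain"])   # 2 2 1
```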

### Examples:
- Colab notebook: https://colab.research.google.com/drive/1xGSFxrjEtipL1TQvWOgKfGZIZZHluc48?usp=sharing
- Loom video: https://www.loom.com/share/c6697fd242464c0ab93923885ab4a6dd?sid=0579e9c3-9ee8-4948-9234-fbe98843c1fd