### **➡️ String Functions in NumPy**

##### **Why do we need String Manipulation during Data Analysis?**
String manipulation is an essential part of **data cleaning and preprocessing**, especially when dealing with textual or categorical data. Below are key reasons:

---
**1️⃣ Data Cleaning**
1. **`Whitespace Removal:`** Real-world string data often contains unnecessary leading or trailing spaces. Cleaning them ensures **data consistency** and avoids mismatched comparisons.

2. **`Case Consistency:`** Converting all strings to lowercase or uppercase helps maintain **uniformity** for matching and comparisons --> `"Apple"` and `"apple"` should be treated the same.
---
**2️⃣ Data Transformation**
1. **`Concatenation:`** Combine multiple string fields into one, which is useful for creating **composite keys** or merging data columns --> Combining `"First Name"` and `"Last Name"` into `"Full Name"`.

2. **`Replacement and Substitution:`** Correct typos or replace parts of strings with new values, which is crucial for **data correction** --> Replacing `"N.A."` with `"Not Available"`.
---
**3️⃣ Data Parsing and Extraction**
1. **`Splitting:`** Break a string into smaller parts for better organization --> Splitting `"John Doe"` into `"John"` and `"Doe"`.

2. **`Pattern Matching:`** Extract specific data patterns like **emails, phone numbers, or dates** using string functions or regular expressions. This helps in **data validation and structured analysis**.
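
As a rough illustration of the cleaning steps listed above, the sketch below strips whitespace and normalizes case so that equivalent values compare equal. The `raw` array is made-up sample data, not taken from any dataset in this notebook.

```python
import numpy as np

# Hypothetical raw values with inconsistent whitespace and casing
raw = np.array(['  Apple', 'apple ', 'APPLE', 'Banana'])

# Strip surrounding whitespace first, then lowercase everything
cleaned = np.char.lower(np.char.strip(raw))

print(cleaned)             # ['apple' 'apple' 'apple' 'banana']
print(cleaned == 'apple')  # [ True  True  True False]
print(np.unique(cleaned))  # ['apple' 'banana']
```

Without the cleaning step, `np.unique(raw)` would report four distinct values instead of two.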

In [1]:
import numpy as np

#### **➡️ Concatenation and Repetition in NumPy String Functions**
NumPy provides powerful string operations through the `np.char` module, allowing you to perform **element-wise** string manipulations on arrays.

---
**➡️ Summary**
| Operation      | Function               | Description                                                         |
| -------------- | ---------------------- | ------------------------------------------------------------------- |
| Concatenation  | `np.char.add()`        | Joins corresponding string elements                                 |
| Repetition     | `np.char.multiply()`   | Repeats each string element multiple times                          |
| Capitalization | `np.char.capitalize()` | Capitalizes the first letter of each string and lowercases the rest |

**1️⃣ String Concatenation (`np.char.add`)**: joins (adds) the corresponding elements of two string arrays.

In [2]:
arr1 = np.array(['Hello', 'Good'])
arr2 = np.array([' World', ' Morning'])

print(np.char.add(arr1, arr2))

['Hello World' 'Good Morning']
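
The composite-key idea mentioned earlier (building a full name from first and last names) can be sketched with two `np.char.add` calls. The `first` and `last` arrays are hypothetical sample columns; the scalar `' '` is broadcast against every element.

```python
import numpy as np

# Hypothetical name columns
first = np.array(['John', 'Jane'])
last = np.array(['Doe', 'Smith'])

# Append a space to every first name, then append the last names
full = np.char.add(np.char.add(first, ' '), last)

print(full)  # ['John Doe' 'Jane Smith']
```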


**2️⃣ String Repetition (`np.char.multiply`)**

In [3]:
arr = np.array(['Hello', 'Good'])

print(np.char.multiply(arr, 3))

['HelloHelloHello' 'GoodGoodGood']
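
The repeat count does not have to be a single number. As a small sketch with made-up values, an integer array of the same shape gives each element its own count:

```python
import numpy as np

arr = np.array(['ab', 'xy', 'z'])
counts = np.array([1, 2, 3])  # one repeat count per element

repeated = np.char.multiply(arr, counts)

print(repeated)  # ['ab' 'xyxy' 'zzz']
```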


**3️⃣ Capitalizing Strings (`np.char.capitalize`)**

In [4]:
arr = np.array(['hello', 'good morning'])

print(np.char.capitalize(arr))

['Hello' 'Good morning']


In [None]:
import numpy as np

arr1 = np.array([['hello', 'good'], ['morning', 'night']])
arr2 = np.array([['world', 'morning'], ['everyone', 'moon']])

add = np.char.add(arr1, arr2) # Concatenate corresponding elements
print(add)

capitalize = np.char.capitalize(add) # Capitalize first letter of each element
print(capitalize)

multiply = np.char.multiply(capitalize, 2) # Repeat each element twice
print(multiply)


[['helloworld' 'goodmorning']
 ['morningeveryone' 'nightmoon']]
[['Helloworld' 'Goodmorning']
 ['Morningeveryone' 'Nightmoon']]
[['HelloworldHelloworld' 'GoodmorningGoodmorning']
 ['MorningeveryoneMorningeveryone' 'NightmoonNightmoon']]


#### **➡️ Case Conversion in NumPy String Functions**
NumPy provides several string manipulation functions under the `np.char` module for **case conversion** and **text formatting**.

---
**➡️ Summary**
| Operation     | Function          | Description                               |
| ------------- | ----------------- | ----------------------------------------- |
| Title Case    | `np.char.title()` | Capitalizes the first letter of each word |
| Lowercase     | `np.char.lower()` | Converts all characters to lowercase      |
| Uppercase     | `np.char.upper()` | Converts all characters to uppercase      |

**1️⃣ Title Case Conversion (`np.char.title`)**

In [12]:
arr = np.array(['hello world', 'good morning'])

print(np.char.title(arr))

['Hello World' 'Good Morning']


**2️⃣ Lowercase Conversion (`np.char.lower`)**

In [13]:
arr = np.array(['Hello', 'Good Morning'])

print(np.char.lower(arr))

['hello' 'good morning']


**3️⃣ Uppercase Conversion (`np.char.upper`)**

In [14]:
arr = np.array(['Hello', 'Good Morning'])

print(np.char.upper(arr))

['HELLO' 'GOOD MORNING']


In [15]:
import numpy as np

arr1 = np.array([['HeLLo WoRLd', 'goOd'], ['MorninG AWESOME', 'NIGHT']])

lower = np.char.lower(arr1) # Convert all characters to lowercase
print(lower)

title = np.char.title(arr1) # Converts the first letter of every word to uppercase
print(title)

concatenate = np.char.add(arr1, arr1) # Concatenates the strings element-wise
print(concatenate)

[['hello world' 'good']
 ['morning awesome' 'night']]
[['Hello World' 'Good']
 ['Morning Awesome' 'Night']]
[['HeLLo WoRLdHeLLo WoRLd' 'goOdgoOd']
 ['MorninG AWESOMEMorninG AWESOME' 'NIGHTNIGHT']]
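
`np.char.capitalize` and `np.char.title` are easy to confuse. The sketch below, using made-up mixed-case input, shows that `capitalize` only uppercases the first character of the whole string, while `title` does so for every word. Both lowercase the remaining letters.

```python
import numpy as np

arr = np.array(['hELLO wORLD', 'good MORNING'])

capitalized = np.char.capitalize(arr)  # first character of each string only
titled = np.char.title(arr)            # first character of each word

print(capitalized)  # ['Hello world' 'Good morning']
print(titled)       # ['Hello World' 'Good Morning']
```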


#### **➡️ String Splitting and Joining in NumPy**
NumPy provides powerful string manipulation functions for splitting and joining strings within arrays.

---
**➡️ Summary**
| Operation       | Function               | Description                                                       |
| --------------- | ---------------------- | ----------------------------------------------------------------- |
| **`Split`**           | `np.char.split()`      | Splits each string in the array into a list of substrings        |
| **`Join`**            | `np.char.join()`       | Inserts a separator between the characters of each string        |
| **`Splitlines`**      | `np.char.splitlines()` | Splits strings at line breaks                                     |

**🔹 Splitting Space-Separated Strings: `np.char.split()`**

In [16]:
import numpy as np

arr = np.array(['hello world', 'good morning'])
print(np.char.split(arr)) # Output: [list(['hello', 'world']) list(['good', 'morning'])]

[list(['hello', 'world']) list(['good', 'morning'])]
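
`np.char.split` also accepts a `sep` argument for delimiters other than whitespace. The sketch below uses hypothetical comma-separated records and pulls out the first field of each one. Because the result is an object array of Python lists, a list comprehension is one simple way to pick a field.

```python
import numpy as np

# Hypothetical comma-separated records
records = np.array(['John,Doe,42', 'Jane,Smith,37'])

parts = np.char.split(records, sep=',')  # object array of Python lists

# Take the first field from each list
first_names = np.array([p[0] for p in parts])

print(parts[0])     # ['John', 'Doe', '42']
print(first_names)  # ['John' 'Jane']
```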


**🔹 Splitting Strings Based on Line Separators: `np.char.splitlines()`**

In [17]:
arr = np.array(['hello\nworld', 'good\nmorning\neveryone'])

print(np.char.splitlines(arr)) # Output: [list(['hello', 'world']) list(['good', 'morning', 'everyone'])]

[list(['hello', 'world']) list(['good', 'morning', 'everyone'])]


**🔹 Joining Characters with a Separator: `np.char.join()`**

In [18]:
arr = np.array(['hello', 'world'])
separator = '-'

print(np.char.join(separator, arr)) # Output: ['h-e-l-l-o' 'w-o-r-l-d']

['h-e-l-l-o' 'w-o-r-l-d']
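
Note that `np.char.join` works on the characters of each string, not on lists of words. If the goal is to rejoin the word lists produced by `np.char.split`, one possible approach (a sketch, not the only way) is to fall back on Python's own `str.join`:

```python
import numpy as np

arr = np.array(['hello world', 'good morning'])

words = np.char.split(arr)  # object array of word lists

# Join the words of each element with a hyphen using plain Python
joined = np.array(['-'.join(w) for w in words])

print(joined)  # ['hello-world' 'good-morning']
```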


In [19]:
import numpy as np

arr = np.array(['hello world', 'good morning'])
print(np.char.split(arr))

arr1 = np.array(['helloworld', 'goodmorning'])
print(np.char.split(arr1))       # no whitespace to split on, so each string stays whole

arr2 = np.array(['hello\nworld', 'good\nmorning\neveryone'])
print(arr2)                     # how do the elements look before splitlines
print(np.char.splitlines(arr2))

arr3 = np.array(['hello', 'world'])
separator = '-'
print(np.char.join(separator, arr3))

[list(['hello', 'world']) list(['good', 'morning'])]
[list(['helloworld']) list(['goodmorning'])]
['hello\nworld' 'good\nmorning\neveryone']
[list(['hello', 'world']) list(['good', 'morning', 'everyone'])]
['h-e-l-l-o' 'w-o-r-l-d']


#### **➡️ String Searching in NumPy**
NumPy provides functions to **search for substrings** within string arrays. You can find the position (index) of a substring in each string element using `np.char.find()` and `np.char.rfind()`.

---
**🔹 Summary**
| Function          | Description                                                                 | Returns                    | Example Output |
| ----------------- | --------------------------------------------------------------------------- | -------------------------- | -------------- |
| `np.char.find()`  | Finds the **first occurrence** (lowest index) of a substring in each string | Index or `-1` if not found | `[4, 1, 4]`    |
| `np.char.rfind()` | Finds the **last occurrence** (highest index) of a substring in each string | Index or `-1` if not found | `[7, 6, 11]`   |


➡️ **`np.char.find()`**: Returns the **lowest index** in each string where the substring is found. If the substring is not found, it returns **-1**.

In [20]:
import numpy as np

arr = np.array(['hello world', 'good morning', 'hello everyone'])
substring = 'o'

print(np.char.find(arr, substring)) # Output: [ 4  1  4 ]

[4 1 4]


➡️ **`np.char.rfind()`**: Returns the **highest index** in each string where the substring is found. If the substring is not found, it also returns **-1**.

In [21]:
arr = np.array(['hello world', 'good morning', 'hello everyone'])
substring = 'o'

print(np.char.rfind(arr, substring)) # Output: [ 7  6 11 ]

[ 7  6 11]
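
Because a missing substring is reported as `-1`, the result of `np.char.find` can be turned into a boolean mask for filtering. The following is a minimal sketch with made-up strings:

```python
import numpy as np

arr = np.array(['hello world', 'good morning', 'numpy'])

positions = np.char.find(arr, 'or')  # -1 where 'or' does not occur
contains = positions >= 0            # boolean mask of matches

print(positions)      # [ 7  6 -1]
print(arr[contains])  # ['hello world' 'good morning']
```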


In [24]:
import numpy as np

arr = np.array([
    ['hello world', 'good morning'],
    ['numpy array', 'substring find']
])
user_input = input("Enter the substring to search for: ")

print(f"Lowest / First occurrence of the substring {user_input}", np.char.find(arr, user_input))
print(f"Highest / Last occurrence of the substring {user_input}", np.char.rfind(arr, user_input))

Lowest / First occurrence of the substring n [[-1  8]
 [ 0  7]]
Highest / Last occurrence of the substring n [[-1 10]
 [ 0 12]]


#### **➡️ String Modification in NumPy**
NumPy provides functions to **modify strings** element-wise in arrays. This is especially useful for **data cleaning** and text preprocessing.

---
**➡️ Summary:**
| Function                    | Description                              | Example Output                                  |
| --------------------------- | ---------------------------------------- | ----------------------------------------------- |
| `np.char.strip()`           | Removes leading and trailing whitespace  | `['hello  world' 'beautiful day' 'numpy']`      |
| `np.char.lstrip()`          | Removes only leading whitespace          | `['hello  world ' 'beautiful day  ' 'numpy ']`  |
| `np.char.rstrip()`          | Removes only trailing whitespace         | `['  hello  world' '  beautiful day' ' numpy']` |
| `np.char.replace(old, new)` | Replaces substring `old` with `new`      | `['hello universe' 'goodbye universe']`         |

➡️ **`np.char.strip()`**: Removes **leading and trailing whitespace** from each string element in the array. Spaces **within the string are not removed**.

In [None]:
import numpy as np

arr = np.array(['  hello  world ', '  beautiful day  ', ' numpy '])
print(np.char.strip(arr)) # Output: ['hello  world' 'beautiful day' 'numpy']

['hello  world' 'beautiful day' 'numpy']


➡️ **`np.char.lstrip()`**: **Removes leading whitespace** (spaces at the beginning) from each string element.

In [26]:
arr = np.array(['  hello  world ', '  beautiful day  ', ' numpy '])

print(np.char.lstrip(arr)) # Output: ['hello  world ' 'beautiful day  ' 'numpy ']

['hello  world ' 'beautiful day  ' 'numpy ']


➡️ **`np.char.rstrip()`**: **Removes trailing whitespace** (spaces at the end) from each string element.

In [27]:
arr = np.array(['  hello  world ', '  beautiful day  ', ' numpy '])

print(np.char.rstrip(arr)) # Output: ['  hello  world' '  beautiful day' ' numpy']

['  hello  world' '  beautiful day' ' numpy']


➡️ **`np.char.replace()`**: **Replaces all occurrences** of a substring with a new substring in each string element.

In [28]:
arr = np.array(['hello world', 'goodbye world'])

print(np.char.replace(arr, 'world', 'universe')) # Output: ['hello universe' 'goodbye universe']

['hello universe' 'goodbye universe']
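
Both functions take optional arguments that the examples above do not use: `np.char.strip` accepts a set of characters to remove instead of whitespace, and `np.char.replace` accepts a `count` that limits how many occurrences are replaced. The sketch below applies them to made-up values, reusing the `"N.A."` to `"Not Available"` correction mentioned at the start of this notebook.

```python
import numpy as np

# Hypothetical messy values
arr = np.array(['**N.A.**', '--ok--', 'N.A. / N.A. pending'])

# Remove any leading or trailing '*' and '-' characters
trimmed = np.char.strip(arr, '*-')

# Replace every occurrence, then only the first occurrence
fixed_all = np.char.replace(trimmed, 'N.A.', 'Not Available')
fixed_first = np.char.replace(trimmed, 'N.A.', 'Not Available', count=1)

print(trimmed)      # ['N.A.' 'ok' 'N.A. / N.A. pending']
print(fixed_all)    # third element: 'Not Available / Not Available pending'
print(fixed_first)  # third element: 'Not Available / N.A. pending'
```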


In [30]:
import numpy as np

comments = np.array([
    ['  Hello [NAME], how are you?  ', 'Welcome [NAME]!  '],
    ['  [NAME], you have a new message. ', '  Your order, [NAME], is ready.']
])

name = 'hrishi'

# Update your code below this line
strip_word = np.char.strip(comments)
print(np.char.replace(strip_word, "[NAME]", name))

[['Hello hrishi, how are you?' 'Welcome hrishi!']
 ['hrishi, you have a new message.' 'Your order, hrishi, is ready.']]


#### **➡️ Character Checking Functions in NumPy**
NumPy provides functions to **check the type of characters** in string arrays. These functions return a **boolean array** of the same shape.

---
**🔹 Summary**
| Function              | Description                                                                 | Example Output                          |
| --------------------- | --------------------------------------------------------------------------- | --------------------------------------- |
| `np.char.isdigit()`   | Checks if strings contain only digits                                       | `[ True False  True False  True]`       |
| `np.char.isalpha()`   | Checks if strings contain only alphabetic characters                        | `[ True  True False  True False]`       |
| `np.char.isnumeric()` | Checks if strings contain only numeric characters (digits, fractions, etc.) | `[ True False False False  True False]` |
| `np.column_stack()`   | Stacks 1D arrays as columns into a 2D array                                 | `[[1 4] [2 5] [3 6]]`                   |


🔹 **`np.char.isdigit()`**: Checks if each string element contains **only digits (0-9)**.

In [39]:
import numpy as np

arr = np.array(['123', '456a', '7890', 'abc', '42'])
print(np.char.isdigit(arr)) # Output: [ True False  True False  True]

[ True False  True False  True]


🔹 **`np.char.isalpha()`**: Checks if each string element contains **only alphabetic characters (for example a-z, A-Z)**.

In [32]:
arr = np.array(['abc', 'Hello', 'world123', 'XYZ', '42'])

print(np.char.isalpha(arr)) # Output: [ True  True False  True False]

[ True  True False  True False]


🔹 **`np.char.isnumeric()`**: Checks if each string element contains **only numeric characters**. Numeric characters include **digits and characters** representing numbers in other number systems (e.g., **fractions, subscripts, superscripts**).

In [None]:
arr = np.array(['123', '456a', 'IV', 'abc', '42', "1.2"])
result = np.char.isnumeric(arr)

print(result) # Output: [ True False False False  True False]

[ True False False False  True False]
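
The difference between `isdigit` and `isnumeric` only shows up on less common characters. The sketch below uses made-up inputs, including the single-character fraction `½` (written as `'\u00bd'`), to show one case where the two disagree. Signs and decimal points are not numeric characters, so strings like `'-5'` and `'1.2'` fail both checks.

```python
import numpy as np

# '\u00bd' is the single character "½" (vulgar fraction one half)
arr = np.array(['42', '\u00bd', 'IV', '1.2', '-5'])

digits = np.char.isdigit(arr)
numeric = np.char.isnumeric(arr)

print(digits)   # [ True False False False False]
print(numeric)  # [ True  True False False False]
```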


🔹 **`np.column_stack()`**: Stacks **1D arrays as columns into a 2D array**. Useful to combine multiple arrays column-wise.

In [44]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

combined = np.column_stack((arr1, arr2))
print(combined)
# Output:
# [[1 4]
#  [2 5]
#  [3 6]]

[[1 4]
 [2 5]
 [3 6]]


In [45]:
import numpy as np

data = np.array([
    ['Alice', '25'],
    ['Bob123', '30'],
    ['Charlie', 'Twenty'],
    ['Diana', '40'],
    ['Eve', '32years']
])
name = np.char.isalpha(data[:, 0])
age = np.char.isnumeric(data[:, 1])

stacked_result = np.column_stack((name, age))
print(stacked_result)

[[ True  True]
 [False  True]
 [ True False]
 [ True  True]
 [ True False]]
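
One way to act on the validation result above is to keep only the rows where every check passed. This is a sketch that repeats the same sample `data`. The names `valid`, `clean`, and `ages` are introduced here for illustration.

```python
import numpy as np

data = np.array([
    ['Alice', '25'],
    ['Bob123', '30'],
    ['Charlie', 'Twenty'],
    ['Diana', '40'],
    ['Eve', '32years']
])

name_ok = np.char.isalpha(data[:, 0])
age_ok = np.char.isnumeric(data[:, 1])

# A row is valid only if both of its checks are True
valid = np.column_stack((name_ok, age_ok)).all(axis=1)

clean = data[valid]             # rows for Alice and Diana
ages = clean[:, 1].astype(int)  # safe to convert once validated

print(clean)
print(ages)  # [25 40]
```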
