### Working with Text Data

### 1: Creating a Series
#### This code creates a pandas Series of names that also includes a missing value (np.nan) and a numeric-looking string ('1234'), then prints it.

In [1]:
import pandas as pd
import numpy as np

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])
print(s)

0      Ali
1    Aslam
2     Umar
3    Akram
4      NaN
5     1234
6     Azam
dtype: object
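Because the Series above mixes strings with np.nan, it is stored with the generic object dtype. The following is a minimal sketch, assuming pandas 1.0 or later, that requests the dedicated nullable string dtype instead. With this dtype, missing values display as `<NA>`:

```python
import numpy as np
import pandas as pd

# Request pandas' dedicated string dtype instead of the generic object dtype
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'], dtype="string")

print(s.dtype)    # the extension string dtype
print(s.isna())   # the missing entry is still detected as missing
```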


### 2: Convert Strings to Lowercase
#### The str.lower() method converts all strings in the Series to lowercase. Missing values (np.nan) remain unchanged.

In [2]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])
print(s.str.lower())

0      ali
1    aslam
2     umar
3    akram
4      NaN
5     1234
6     azam
dtype: object
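Lowercasing is a common way to compare strings without regard to case. Here is a short sketch on the same data. The comparison target 'ali' is only an illustrative value:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])

# Normalize case first, then compare; a missing value never equals a string, so it yields False
is_ali = s.str.lower() == 'ali'
print(is_ali)
```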


### 3: Convert Strings to Uppercase
#### The str.upper() method converts all strings in the Series to uppercase. Missing values (np.nan) are not affected.

In [3]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])
print(s.str.upper())

0      ALI
1    ASLAM
2     UMAR
3    AKRAM
4      NaN
5     1234
6     AZAM
dtype: object


### 4: Calculate Length of Strings
#### The str.len() method returns the number of characters in each string. Missing values give NaN. NumPy integer arrays cannot hold NaN, so the lengths are stored as float64 (3.0, 5.0, ...).

In [4]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])
print(s.str.len())

0    3.0
1    5.0
2    4.0
3    5.0
4    NaN
5    4.0
6    4.0
dtype: float64
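The lengths above are float64 only because of the missing value. One option, assuming pandas 1.0 or later, is to convert them to the nullable integer dtype `Int64`:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram', np.nan, '1234', 'Azam'])

# str.len() yields float64 because NaN cannot live in a NumPy int64 array;
# the nullable Int64 dtype keeps whole numbers and marks the gap as <NA>
lengths = s.str.len().astype("Int64")
print(lengths)
```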


### 5: Strip Whitespace
#### The str.strip() method removes leading and trailing whitespace from each string in the Series.

In [5]:
s = pd.Series(['Ali ', ' Aslam', 'Umar', 'Akram'])
print(s)
print("After Stripping:")
print(s.str.strip())

0      Ali 
1     Aslam
2      Umar
3     Akram
dtype: object
After Stripping:
0      Ali
1    Aslam
2     Umar
3    Akram
dtype: object
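`str.strip()` has two one-sided companions, `str.lstrip()` and `str.rstrip()`. All three also accept an optional set of characters to remove in place of whitespace. Here is a small sketch with illustrative padded values:

```python
import pandas as pd

s = pd.Series(['  Ali  ', '__Aslam__'])

left = s.str.lstrip()          # remove leading whitespace only
right = s.str.rstrip()         # remove trailing whitespace only
underscores = s.str.strip('_') # strip underscores instead of whitespace

print(left.tolist())
print(right.tolist())
print(underscores.tolist())
```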


### 6: Split Strings by Pattern
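#### The str.split(' ') method splits each string in the Series on the given separator (here a single space) and returns a list of substrings for each element. A string that does not contain the separator becomes a one-element list. The separator can be a plain string or a regular expression. A single-character pattern is treated literally, and longer patterns are treated as regular expressions by default; pass regex=False (pandas 1.4+) to treat them literally.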

In [6]:
s = pd.Series(['Ali', 'Aslam Umar', 'Akram', 'Azam'])
print(s)
print("Split Pattern:")
print(s.str.split(' '))

0           Ali
1    Aslam Umar
2         Akram
3          Azam
dtype: object
Split Pattern:
0            [Ali]
1    [Aslam, Umar]
2          [Akram]
3           [Azam]
dtype: object
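Passing `expand=True` returns a DataFrame with one column per piece in place of a Series of lists, which is often easier to work with. Strings with fewer pieces are padded with missing values. Here is a sketch on the same data:

```python
import pandas as pd

s = pd.Series(['Ali', 'Aslam Umar', 'Akram', 'Azam'])

# One column per split piece; shorter rows are padded with missing values
parts = s.str.split(' ', expand=True)
print(parts)
```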


### 7: Concatenate Strings with Separator
#### In plain Python, the string method `join()` concatenates a list of strings with a separator: `', '.join(['apple', 'banana', 'cherry'])` returns `'apple, banana, cherry'`. The pandas counterpart for a Series is str.cat(sep='_'), which concatenates all strings in the Series into a single string, separated by the given delimiter (here _).

In [7]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.cat(sep='_'))

Ali_Aslam_Umar_Akram
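This sketch shows the plain-Python `join()` call next to `str.cat()` on a Series that contains a missing value. By default `str.cat()` leaves missing values out. The `na_rep` argument substitutes a placeholder for them (the '-' used here is an arbitrary choice):

```python
import numpy as np
import pandas as pd

# Plain Python: join a list with ', ' between items
joined = ', '.join(['apple', 'banana', 'cherry'])
print(joined)

s = pd.Series(['Ali', np.nan, 'Umar'])
skipped = s.str.cat(sep='_')              # missing value is omitted
filled = s.str.cat(sep='_', na_rep='-')   # missing value replaced by '-'
print(skipped)
print(filled)
```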


### 8: Get Dummy Variables
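#### The str.get_dummies() method splits each string on a separator ('|' by default) and creates one binary (0/1) indicator column per unique value. The columns are sorted alphabetically. Each string here holds a single name, so every row has exactly one 1.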

In [8]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.get_dummies())

   Akram  Ali  Aslam  Umar
0      0    1      0     0
1      0    0      1     0
2      0    0      0     1
3      1    0      0     0
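A single cell can hold several values. In that case `str.get_dummies()` splits them on the separator and marks each one. Here is a sketch with illustrative multi-name entries:

```python
import pandas as pd

s = pd.Series(['Ali|Umar', 'Aslam', 'Umar'])

# Each cell is split on '|'; every distinct name becomes a 0/1 column
dummies = s.str.get_dummies(sep='|')
print(dummies)
```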


### 9: Check if Strings Contain a Pattern
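#### The str.contains(' ') method checks whether each string in the Series contains a space character and returns a boolean Series. By default the pattern is treated as a regular expression; pass regex=False to match it literally.
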
In [9]:
s = pd.Series(['Ali', 'Aslam Umar', 'Akram', 'Azam'])
print(s.str.contains(' '))

0    False
1     True
2    False
3    False
dtype: bool
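`str.contains()` also takes `case`, `regex` and `na` arguments. The following sketch uses illustrative values to show a case-insensitive match and a literal match, with missing values reported as False:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Ali', 'aslam umar', 'A.k', np.nan])

# case=False ignores case; na=False reports missing values as False
case_insensitive = s.str.contains('ASLAM', case=False, na=False)

# regex=False matches '.' literally (as a regex, '.' would match any character)
literal_dot = s.str.contains('.', regex=False, na=False)

print(case_insensitive.tolist())
print(literal_dot.tolist())
```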


### 10: Replace Substrings
#### The str.replace('@', '$') method replaces every occurrence of @ with $ in each string. Strings that do not contain the substring are returned unchanged. The regex argument controls whether the pattern is a literal string or a regular expression. Since pandas 2.0 the default is regex=False (literal), so pass regex=True explicitly when you need a regular expression.

In [10]:
s = pd.Series(['Ali', 'Aslam@Umar', 'Akram', 'Azam'])
print(s)
print("After replacing @ with $:")
print(s.str.replace('@', '$'))

0           Ali
1    Aslam@Umar
2         Akram
3          Azam
dtype: object
After replacing @ with $:
0           Ali
1    Aslam$Umar
2         Akram
3          Azam
dtype: object
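This sketch contrasts literal and regular-expression replacement. It passes `regex` explicitly so the behavior does not depend on the pandas version's default. The sample strings are illustrative:

```python
import pandas as pd

s = pd.Series(['Aslam@Umar', 'Ali#Akram', 'a.b'])

# Regular expression: replace either '@' or '#' with '_'
regex_result = s.str.replace(r'[@#]', '_', regex=True)

# Literal: '.' is matched as a dot, not as "any character"
literal_result = s.str.replace('.', '-', regex=False)

print(regex_result.tolist())
print(literal_result.tolist())
```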


### 11: Repeat Strings
#### The str.repeat(2) method repeats each string in the Series twice. It behaves like Python's `*` operator on strings, where `'hello' * 3` returns `'hellohellohello'`. The repeat count can be a single integer or a sequence with one count per element. In plain Python, multiplying a string by zero or a negative integer returns an empty string.

In [11]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.repeat(2))

0        AliAli
1    AslamAslam
2      UmarUmar
3    AkramAkram
dtype: object
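Below is a minimal plain-Python sketch of a standalone repeat function with the behavior described above. The name `repeat_string` is hypothetical. It is followed by `str.repeat()` with one count per element:

```python
import pandas as pd

def repeat_string(text: str, times: int) -> str:
    """Hypothetical helper: return `text` repeated `times` times, or '' if times <= 0."""
    if times <= 0:
        return ""
    return text * times

print(repeat_string("hello", 3))   # hellohellohello
print(repeat_string("hello", -2))  # empty string

# str.repeat also accepts a sequence: one repeat count per element
s = pd.Series(['Ali', 'Umar'])
repeated = s.str.repeat([1, 3])
print(repeated.tolist())
```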


### 12: Count Occurrences of a Pattern
#### The str.count('a') method counts the non-overlapping occurrences of a pattern in each string. The pattern is interpreted as a regular expression and matching is case-sensitive, so 'Ali' returns 0 because its only "a" is uppercase. A string with no match simply returns 0, so no special handling is needed.

In [12]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print("The number of 'a's in each string:")
print(s.str.count('a'))

The number of 'a's in each string:
0    0
1    1
2    1
3    1
dtype: int64
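There are two sketches here. The first is a plain-Python counter built on `re.findall`, which naturally returns 0 when nothing matches (the helper name `count_pattern` is hypothetical). The second is a case-insensitive count in pandas using a character class:

```python
import re
import pandas as pd

def count_pattern(text: str, pattern: str) -> int:
    """Hypothetical helper: count non-overlapping regex matches; 0 if none."""
    return len(re.findall(pattern, text))

print(count_pattern('Ali', 'a'))    # 0, no special case needed
print(count_pattern('Aslam', 'a'))  # 1

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
# '[aA]' matches either case, so the uppercase initial 'A' is counted too
counts = s.str.count('[aA]')
print(counts.tolist())
```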


### 13: Check if Strings Start with a Pattern
#### The str.startswith('A') method checks whether each string begins with the given prefix, here the letter A. It returns a boolean Series that is True where the string starts with A and False otherwise. The check is case-sensitive.

In [13]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print("Strings that start with 'A':")
print(s.str.startswith('A'))

Strings that start with 'A':
0     True
1     True
2    False
3     True
dtype: bool
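`str.startswith()` is case-sensitive. To get a case-insensitive check, lowercase the strings first. Passing `na=False` makes missing entries count as False. Here is a sketch with illustrative values:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Ali', 'aslam', 'Umar', np.nan])

# Lowercase first so 'A' and 'a' both match; na=False maps missing values to False
starts_with_a = s.str.lower().str.startswith('a', na=False)
print(starts_with_a.tolist())
```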


### 14: Check if Strings End with a Pattern
#### The str.endswith('m') method checks whether each string ends with the given suffix, here the letter m, and returns a boolean Series. Like Python's built-in `str.endswith()`, the check is case-sensitive.

In [14]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print("Strings that end with 'm':")
print(s.str.endswith('m'))

Strings that end with 'm':
0    False
1     True
2    False
3     True
dtype: bool
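The same test can be written as a regular expression anchored with `$`. That form becomes useful once the condition is more than a fixed suffix. This sketch checks that both forms agree on the example data:

```python
import pandas as pd

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])

ends_m = s.str.endswith('m')
ends_m_regex = s.str.contains('m$')   # '$' anchors the match at the end of the string
print(ends_m.equals(ends_m_regex))

# A regex can express a set of suffixes, e.g. names ending in 'm' or 'r'
ends_m_or_r = s.str.contains('[mr]$')
print(ends_m_or_r.tolist())
```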


### 15: Find Position of a Pattern

The str.find('a') method returns, for each string, the lowest index at which the substring 'a' occurs, or -1 if it does not occur. The search is case-sensitive, so 'Ali' returns -1 because its only "a" is the uppercase "A".

Finding a substring efficiently is a classic problem, and the **KMP (Knuth-Morris-Pratt) algorithm** is a well-known solution. It is described here as background on string searching. It is not necessarily what pandas runs internally.

1. **Inputs and output**:  
   - The function takes two inputs: the **string** to search in and the **pattern** to search for.  
   - It returns the **position** of the first occurrence of the pattern, or `-1` if the pattern is not found.

2. **Key idea**:  
   - Before searching, KMP precomputes a table from the pattern. For each prefix of the pattern, the table records the length of its longest proper prefix that is also a suffix.  
   - When a mismatch occurs, the table tells the algorithm how much of the pattern is still matched. The position in the string therefore never moves backward, and comparisons that are known to fail are skipped.

3. **Time complexity**:  
   - **O(n + m)**, where `n` is the length of the string and `m` is the length of the pattern.

4. **Space complexity**:  
   - **O(m)** for the precomputed table, where `m` is the length of the pattern.

In [15]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.find('a'))

0   -1
1    3
2    2
3    3
dtype: int64
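The KMP idea described above can be sketched in plain Python. This is a teaching implementation of the classic algorithm, not the routine pandas calls, and the function names `build_prefix_table` and `kmp_find` are hypothetical. Its results are compared against Python's built-in `str.find`:

```python
from typing import List


def build_prefix_table(pattern: str) -> List[int]:
    """For each position i, the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to the next shorter prefix-suffix
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0  # same convention as str.find
    table = build_prefix_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # reuse the partial match instead of restarting
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1  # start index of the full match
    return -1


for word in ['Ali', 'Aslam', 'Umar', 'Akram']:
    print(word, kmp_find(word, 'a'), word.find('a'))
```

Each loop over the text advances `i` exactly once per character, and the fallbacks through `table` are bounded by the matches made so far. This bound is where the O(n + m) running time comes from.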


### 16: Find All Occurrences of a Pattern
#### The str.findall('a') method returns, for each string, a list of all non-overlapping matches of the pattern. The pattern is interpreted as a regular expression, so the method works like Python's `re.findall`. It returns the matched text, not the positions where the matches occur. A string with no match yields an empty list.

In [16]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.findall('a'))

0     []
1    [a]
2    [a]
3    [a]
dtype: object
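`str.findall()` returns the matched text rather than where it occurs. To collect positions, you can use Python's `re.finditer` instead. This sketch applies it element-wise with `Series.apply` and assumes there are no missing values, since `re.finditer` needs a string. It also shows the `flags` argument of `str.findall()` for case-insensitive matching:

```python
import re
import pandas as pd

s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])

# m.start() is the index where each non-overlapping match begins
positions = s.apply(lambda text: [m.start() for m in re.finditer('a', text)])
print(positions)

# Case-insensitive matches via the flags argument of str.findall
matches = s.str.findall('a', flags=re.IGNORECASE)
print(matches.tolist())
```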


### 17: Swap Case of Strings 
#### The str.swapcase() method converts uppercase characters to lowercase and lowercase characters to uppercase in each string, mirroring Python's built-in `str.swapcase()`.

In [17]:
s = pd.Series(['Ali', 'Aslam', 'Umar', 'Akram'])
print(s.str.swapcase())

0      aLI
1    aSLAM
2     uMAR
3    aKRAM
dtype: object


### 18: Check if Strings are Lowercase
#### The str.islower() method checks each string and returns a boolean Series. A string counts as lowercase when all of its cased characters are lowercase and it contains at least one cased character. Digits and punctuation are ignored.

In [18]:
s = pd.Series(['ali', 'Aslam', 'umar', 'Akram'])
print(s.str.islower())

0     True
1    False
2     True
3    False
dtype: bool
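For an object-dtype Series like this one, pandas applies Python's `str.islower()` to each element, and that method only looks at cased characters. Here is a sketch with illustrative edge cases:

```python
import pandas as pd

s = pd.Series(['ali', 'ali1', '123', ''])

# 'ali1': the digit is ignored; '123' and '': no cased characters, so False
lower_flags = s.str.islower()
print(lower_flags.tolist())
```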


### 19: Check if Strings are Uppercase
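#### The str.isupper() method checks whether all cased characters in each string are uppercase and returns a boolean Series. The string must contain at least one cased character; digits and punctuation are ignored.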

In [19]:
s = pd.Series(['ALI', 'Aslam', 'UMAR', 'Akram'])
print(s.str.isupper())

0     True
1    False
2     True
3    False
dtype: bool


### 20: Check if Strings are Numeric
#### The str.isnumeric() method checks each string and returns a boolean Series. A value is True when the string is non-empty and every character is numeric. Strings such as '12.5' or '-3' return False because '.' and '-' are not numeric characters.


In [20]:
s = pd.Series(['123', 'Aslam', '456', 'Akram'])
print(s.str.isnumeric())

0     True
1    False
2     True
3    False
dtype: bool
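`str.isnumeric()` tests characters, not whether a string parses as a number. This sketch contrasts it with `pd.to_numeric(..., errors='coerce')`, which converts what it can and turns the rest into NaN. The sample values are illustrative:

```python
import pandas as pd

s = pd.Series(['123', '12.5', '-3', 'Aslam'])

# Character test: '.' and '-' are not numeric characters
char_test = s.str.isnumeric()
print(char_test.tolist())

# Parse test: values that convert to numbers are non-missing after coercion
parse_test = pd.to_numeric(s, errors='coerce').notna()
print(parse_test.tolist())
```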
