#### Pandas Tutorial - Part 51

This notebook covers various Series methods including:
- Normalizing datetime values with `dt.normalize()`
- Formatting datetime values with `dt.strftime()`
- String extraction methods: `str.extractall()`, `str.find()`, and `str.findall()`

In [None]:
import pandas as pd
import re

##### Normalizing Datetime Values with `dt.normalize()`

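The `dt.normalize()` method resets the time component of each value to midnight (00:00:00) while keeping the date, the datetime dtype, and any timezone. This is useful when only the calendar date matters, for example when grouping or comparing by day.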

In [None]:
# Create a datetime Series (lowercase 'h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
idx = pd.date_range(start='2023-01-01 10:00', freq='h', periods=5)
s = pd.Series(idx)
print("Original datetime Series:")
print(s)

In [None]:
# Normalize the datetime values
s_normalized = s.dt.normalize()
print("Normalized datetime Series:")
print(s_normalized)

In [None]:
# Create a timezone-aware datetime Series ('Asia/Kolkata' is the current name for the legacy 'Asia/Calcutta' zone)
idx_tz = pd.date_range(start='2023-01-01 10:00', freq='h', periods=5, tz='Asia/Kolkata')
s_tz = pd.Series(idx_tz)
print("Original datetime Series with timezone:")
print(s_tz)

In [None]:
# Normalize the datetime values with timezone
s_tz_normalized = s_tz.dt.normalize()
print("Normalized datetime Series with timezone:")
print(s_tz_normalized)

In [None]:
# Create a datetime Series with different dates
dates = ['2023-01-01 10:30:45', '2023-01-02 12:15:30', '2023-01-03 18:45:00']
s_mixed = pd.Series(pd.to_datetime(dates))
print("Original datetime Series with different dates:")
print(s_mixed)

In [None]:
# Normalize the datetime values
s_mixed_normalized = s_mixed.dt.normalize()
print("Normalized datetime Series with different dates:")
print(s_mixed_normalized)
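
One place where dropping the time component helps is grouping by calendar day. The following is a minimal sketch; the `readings` table and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical readings taken at irregular times across two days
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 08:15:00",
        "2023-01-01 17:40:00",
        "2023-01-02 09:05:00",
        "2023-01-02 23:59:59",
    ]),
    "value": [10, 20, 30, 40],
})

# normalize() keeps the datetime dtype but resets the time to 00:00:00,
# so rows from the same calendar day share a single key
readings["day"] = readings["timestamp"].dt.normalize()

# Sum the values per day
daily_totals = readings.groupby("day")["value"].sum()
print(daily_totals)
```

Because the `day` column is still a datetime column, it can be used with other `dt` accessors or for resampling later on.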

##### Formatting Datetime Values with `dt.strftime()`

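The `dt.strftime()` method converts datetime values to strings using a format string built from the standard Python `strftime` directives (such as `%Y`, `%m`, and `%d`). The result is a Series of strings, not datetimes.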

In [None]:
# Create a datetime Series (lowercase 'h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
rng = pd.date_range(pd.Timestamp("2023-01-01 09:00"), periods=5, freq='h')
s = pd.Series(rng)
print("Original datetime Series:")
print(s)

In [None]:
# Format datetime values with strftime
s_formatted = s.dt.strftime("%Y-%m-%d %H:%M:%S")
print("Formatted datetime Series:")
print(s_formatted)

In [None]:
# Format datetime values as day/month/year, dropping the time
s_formatted_short = s.dt.strftime("%d/%m/%Y")
print("Formatted datetime Series (short format):")
print(s_formatted_short)

In [None]:
# Format datetime values with day name
s_formatted_day = s.dt.strftime("%A, %B %d, %Y")
print("Formatted datetime Series with day name:")
print(s_formatted_day)

In [None]:
# Format the time only, using a 12-hour clock with AM/PM
s_formatted_time = s.dt.strftime("%I:%M %p")
print("Formatted datetime Series with time only:")
print(s_formatted_time)

In [None]:
# Create a timezone-aware datetime Series (lowercase 'h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2)
rng_tz = pd.date_range(pd.Timestamp("2023-01-01 09:00"), periods=5, freq='h', tz='Europe/Berlin')
s_tz = pd.Series(rng_tz)
print("Original datetime Series with timezone:")
print(s_tz)

In [None]:
# Format datetime values with timezone
s_tz_formatted = s_tz.dt.strftime("%Y-%m-%d %H:%M:%S %Z")
print("Formatted datetime Series with timezone:")
print(s_tz_formatted)
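
Since `dt.strftime()` returns text, the result no longer supports datetime operations. The sketch below, using two illustrative timestamps, shows the formatted values as plain strings and one way to parse them back by passing the same format string to `pd.to_datetime()`.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2023-01-01 09:00", "2023-03-15 14:30"]))

# strftime() produces text; each element of the result is a Python str
labels = stamps.dt.strftime("%d/%m/%Y %H:%M")
print(labels.tolist())

# Parse the text back with the same format string to recover datetimes
restored = pd.to_datetime(labels, format="%d/%m/%Y %H:%M")
print((restored == stamps).all())
```

Passing an explicit `format` avoids ambiguity between day-first and month-first dates when parsing.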

##### String Extraction Methods

Pandas provides several methods for extracting and finding patterns in strings.

###### Extracting All Matches with `str.extractall()`

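The `str.extractall()` method extracts the capture groups from every match of a regular expression pattern. It returns a DataFrame with one row per match and one column per group. The row index gains an extra level named `match` that numbers the matches within each original string, and strings without any match do not appear in the result.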

In [None]:
# Create a Series with strings
s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
print("Original Series:")
print(s)

In [None]:
# Extract all matches with one group
result = s.str.extractall(r"[ab](\d)")
print("Result of extractall with one group:")
print(result)

In [None]:
# Extract all matches with named group
result_named = s.str.extractall(r"[ab](?P<digit>\d)")
print("Result of extractall with named group:")
print(result_named)

In [None]:
# Extract all matches with two groups
result_two_groups = s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
print("Result of extractall with two groups:")
print(result_two_groups)

In [None]:
# Make the letter group optional: 'c1' now matches too, with NaN in the letter column
result_optional = s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)")
print("Result of extractall with optional group:")
print(result_optional)

In [None]:
# Create a Series with more complex strings
s_complex = pd.Series(['foo 123 bar 456', 'bar 789 foo', 'foo 123 456'])
print("Original Series with complex strings:")
print(s_complex)

In [None]:
# Extract all numbers
result_complex = s_complex.str.extractall(r'(\d+)')
print("Result of extractall for numbers:")
print(result_complex)

In [None]:
# Extract words and numbers
result_complex_words = s_complex.str.extractall(r'(?P<word>foo|bar) (?P<number>\d+)')
print("Result of extractall for words and numbers:")
print(result_complex_words)
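
The two-level index returned by `str.extractall()` can be used to summarize matches per original row. This is a small sketch that reuses the first example Series from this section; the variable names are chosen for illustration.

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
matches = s.str.extractall(r"[ab](?P<digit>\d)")

# The result has a two-level index: the original label and a "match" counter
print(matches.index.names)

# Count how many matches each original row produced (rows with none are absent)
match_counts = matches.groupby(level=0).size()
print(match_counts)

# Keep only the first match per row, indexed by the original labels
first_matches = matches.xs(0, level="match")
print(first_matches)
```

Selecting match `0` this way gives roughly what `str.extract()` would return, except that rows without a match are dropped rather than filled with NaN.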

###### Finding Substrings with `str.find()`

The `str.find()` method returns, for each string, the lowest index at which the substring occurs, or -1 if it is not found. The argument is treated as a literal substring, not a regular expression.

In [None]:
# Create a Series with strings
s = pd.Series(['apple', 'banana', 'cherry'])
print("Original Series:")
print(s)

In [None]:
# Find substring 'a'
result = s.str.find('a')
print("Result of find('a'):")
print(result)

In [None]:
# Find substring 'an'
result_an = s.str.find('an')
print("Result of find('an'):")
print(result_an)

In [None]:
# Find substring 'z'
result_z = s.str.find('z')
print("Result of find('z'):")
print(result_z)

In [None]:
# Find substring 'a', starting the search at index 1 (positions are still relative to the full string)
result_start = s.str.find('a', 1)
print("Result of find('a', 1):")
print(result_start)

In [None]:
# Find substring 'a', searching only from index 1 up to (but not including) index 3
result_start_end = s.str.find('a', 1, 3)
print("Result of find('a', 1, 3):")
print(result_start_end)
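
The -1 returned by `str.find()` is a sentinel, not a usable position: used as an index it would silently refer to the last character. A minimal sketch of turning the result into a boolean mask before filtering, using the same fruit Series:

```python
import pandas as pd

s = pd.Series(["apple", "banana", "cherry"])

positions = s.str.find("an")

# Convert the sentinel into a boolean mask before using it to filter
found = positions != -1
print(s[found])

# str.contains with regex=False yields the same mask directly
same_mask = s.str.contains("an", regex=False)
print((found == same_mask).all())
```

When only presence matters, `str.contains()` is the more direct choice; `str.find()` is useful when the position itself is needed.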

###### Finding All Occurrences with `str.findall()`

The `str.findall()` method returns, for each string, a list of all non-overlapping matches of a regular expression. Strings without a match yield an empty list.

In [None]:
# Create a Series with strings
s = pd.Series(['Lion', 'Monkey', 'Rabbit'])
print("Original Series:")
print(s)

In [None]:
# Find all occurrences of 'Monkey'
result = s.str.findall('Monkey')
print("Result of findall('Monkey'):")
print(result)

In [None]:
# Find all occurrences of 'MONKEY'
result_upper = s.str.findall('MONKEY')
print("Result of findall('MONKEY'):")
print(result_upper)

In [None]:
# Find all occurrences of 'MONKEY' with case-insensitive flag
result_ignore_case = s.str.findall('MONKEY', flags=re.IGNORECASE)
print("Result of findall('MONKEY', flags=re.IGNORECASE):")
print(result_ignore_case)

In [None]:
# Create a Series with more complex strings
s_complex = pd.Series(['apple and banana', 'orange, apple and pear', 'apple, orange'])
print("Original Series with complex strings:")
print(s_complex)

In [None]:
# Find all occurrences of fruits
result_complex = s_complex.str.findall(r'apple|banana|orange|pear')
print("Result of findall for fruits:")
print(result_complex)

In [None]:
# Find all words
result_words = s_complex.str.findall(r'\w+')
print("Result of findall for words:")
print(result_words)
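
Because each element returned by `str.findall()` is a Python list, the result can be post-processed with other Series methods. The sketch below reuses the fruit strings from above to count matches per row and across the whole Series.

```python
import pandas as pd

s = pd.Series(["apple and banana", "orange, apple and pear", "apple, orange"])

fruits = s.str.findall(r"apple|banana|orange|pear")

# str.len() also works on lists, so this counts the matches in each row
per_row = fruits.str.len()
print(per_row)

# explode() flattens the lists into one row per match for aggregation
overall = fruits.explode().value_counts()
print(overall)
```

If only the per-row count is needed, `str.count()` with the same pattern avoids building the intermediate lists.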

##### Conclusion

In this notebook, we've explored various Series methods in pandas:

1. `dt.normalize()`: Resets times to midnight (00:00:00) while keeping the date and timezone, which is useful when only the calendar date matters.
2. `dt.strftime()`: Formats datetime values to strings using a specified format, allowing for customized date and time representations.
3. String extraction methods:
   - `str.extractall()`: Extracts groups from all matches of a regular expression pattern, returning a DataFrame with one row for each match and one column for each group.
   - `str.find()`: Returns the lowest index at which a literal substring occurs in each string, or -1 if it is not found.
   - `str.findall()`: Finds all non-overlapping matches of a regular expression, returning a list of matches for each string.

Together, these methods cover common cleanup tasks for datetime and text columns in pandas: truncating timestamps to dates, rendering them as text, and locating or extracting patterns in strings.