#### Pandas Tutorial - Part 50

This notebook covers two groups of Series and DataFrame methods:
- Cross-sectioning with `xs()`
- Working with timezones using `dt.tz_localize()` and `dt.tz_convert()`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pytz

%matplotlib inline

##### Cross-sectioning with `xs()`

The `xs()` method returns a cross-section from a Series or DataFrame with a MultiIndex.

In [2]:
# Create a DataFrame with MultiIndex
d = {'num_legs': [4, 4, 2, 2],
     'num_wings': [0, 0, 2, 2],
     'class': ['mammal', 'mammal', 'mammal', 'bird'],
     'animal': ['cat', 'dog', 'bat', 'penguin'],
     'locomotion': ['walks', 'walks', 'flies', 'walks']}
df = pd.DataFrame(data=d)
df = df.set_index(['class', 'animal', 'locomotion'])
print("DataFrame with MultiIndex:")
print(df)

DataFrame with MultiIndex:
                           num_legs  num_wings
class  animal  locomotion                     
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2


In [3]:
# Get values at specified index
print("Cross-section for 'mammal':")
print(df.xs('mammal'))

Cross-section for 'mammal':
                   num_legs  num_wings
animal locomotion                     
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2


In [4]:
# Get values at several indexes
print("Cross-section for ('mammal', 'dog'):")
print(df.xs(('mammal', 'dog')))

Cross-section for ('mammal', 'dog'):
            num_legs  num_wings
locomotion                     
walks              4          0


In [5]:
# Get values at specified index and level
print("Cross-section for 'cat' at level 1:")
print(df.xs('cat', level=1))

Cross-section for 'cat' at level 1:
                   num_legs  num_wings
class  locomotion                     
mammal walks              4          0


In [6]:
# Get values at specified index and level with drop_level=False
print("Cross-section for 'cat' at level 1 with drop_level=False:")
print(df.xs('cat', level=1, drop_level=False))

Cross-section for 'cat' at level 1 with drop_level=False:
                          num_legs  num_wings
class  animal locomotion                     
mammal cat    walks              4          0


In [7]:
# Get values at specified index and level by position
print("Cross-section for 'walks' at level 2:")
print(df.xs('walks', level=2))

Cross-section for 'walks' at level 2:
                num_legs  num_wings
class  animal                      
mammal cat             4          0
       dog             4          0
bird   penguin         2          2
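
The `level` argument also accepts a list, so one call can select on several levels at once. The sketch below rebuilds the same animal DataFrame and selects on `class` and `locomotion` by name; the variable name `walking_mammals` is only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this block runs on its own.
df = pd.DataFrame(
    {
        "num_legs": [4, 4, 2, 2],
        "num_wings": [0, 0, 2, 2],
        "class": ["mammal", "mammal", "mammal", "bird"],
        "animal": ["cat", "dog", "bat", "penguin"],
        "locomotion": ["walks", "walks", "flies", "walks"],
    }
).set_index(["class", "animal", "locomotion"])

# Pass one key per level; both selected levels are dropped from the result,
# leaving only the 'animal' level in the index.
walking_mammals = df.xs(("mammal", "walks"), level=["class", "locomotion"])
print(walking_mammals)
```

Naming the levels rather than giving positions keeps the call readable if the index order changes later.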


In [8]:
# Create a Series with MultiIndex
s = pd.Series([1, 2, 3, 4], 
              index=pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), 
                                              ('b', 'one'), ('b', 'two')],
                                             names=['letter', 'number']))
print("Series with MultiIndex:")
print(s)

Series with MultiIndex:
letter  number
a       one       1
        two       2
b       one       3
        two       4
dtype: int64


In [9]:
# Get values at specified index
print("Cross-section for 'a':")
print(s.xs('a'))

Cross-section for 'a':
number
one    1
two    2
dtype: int64


In [10]:
# Get values at specified index and level
print("Cross-section for 'one' at level 'number':")
print(s.xs('one', level='number'))

Cross-section for 'one' at level 'number':
letter
a    1
b    3
dtype: int64


##### Working with Timezones

Pandas provides methods for working with timezones in datetime Series.

###### Localizing Timezones with `dt.tz_localize()`

The `dt.tz_localize()` method localizes tz-naive datetime Series to a given timezone.

In [11]:
# Create a datetime Series
s = pd.Series(pd.date_range('2023-01-01', periods=5))
print("Original datetime Series (tz-naive):")
print(s)

Original datetime Series (tz-naive):
0   2023-01-01
1   2023-01-02
2   2023-01-03
3   2023-01-04
4   2023-01-05
dtype: datetime64[ns]


In [12]:
# Localize to UTC
s_utc = s.dt.tz_localize('UTC')
print("Datetime Series localized to UTC:")
print(s_utc)

Datetime Series localized to UTC:
0   2023-01-01 00:00:00+00:00
1   2023-01-02 00:00:00+00:00
2   2023-01-03 00:00:00+00:00
3   2023-01-04 00:00:00+00:00
4   2023-01-05 00:00:00+00:00
dtype: datetime64[ns, UTC]


In [13]:
# Localize to US/Eastern
s_eastern = s.dt.tz_localize('US/Eastern')
print("Datetime Series localized to US/Eastern:")
print(s_eastern)

Datetime Series localized to US/Eastern:
0   2023-01-01 00:00:00-05:00
1   2023-01-02 00:00:00-05:00
2   2023-01-03 00:00:00-05:00
3   2023-01-04 00:00:00-05:00
4   2023-01-05 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]


In [14]:
# Localize to Europe/London
s_london = s.dt.tz_localize('Europe/London')
print("Datetime Series localized to Europe/London:")
print(s_london)

Datetime Series localized to Europe/London:
0   2023-01-01 00:00:00+00:00
1   2023-01-02 00:00:00+00:00
2   2023-01-03 00:00:00+00:00
3   2023-01-04 00:00:00+00:00
4   2023-01-05 00:00:00+00:00
dtype: datetime64[ns, Europe/London]


In [15]:
# Create a datetime Series during DST transition
s_dst = pd.to_datetime(pd.Series(['2018-10-28 01:30:00',
                                 '2018-10-28 02:00:00',
                                 '2018-10-28 02:30:00',
                                 '2018-10-28 02:00:00',
                                 '2018-10-28 02:30:00',
                                 '2018-10-28 03:00:00',
                                 '2018-10-28 03:30:00']))
print("Datetime Series during DST transition:")
print(s_dst)

Datetime Series during DST transition:
0   2018-10-28 01:30:00
1   2018-10-28 02:00:00
2   2018-10-28 02:30:00
3   2018-10-28 02:00:00
4   2018-10-28 02:30:00
5   2018-10-28 03:00:00
6   2018-10-28 03:30:00
dtype: datetime64[ns]


In [16]:
# Localize with ambiguous='infer'
s_dst_cet = s_dst.dt.tz_localize('CET', ambiguous='infer')
print("Datetime Series localized to CET with ambiguous='infer':")
print(s_dst_cet)

Datetime Series localized to CET with ambiguous='infer':
0   2018-10-28 01:30:00+02:00
1   2018-10-28 02:00:00+02:00
2   2018-10-28 02:30:00+02:00
3   2018-10-28 02:00:00+01:00
4   2018-10-28 02:30:00+01:00
5   2018-10-28 03:00:00+01:00
6   2018-10-28 03:30:00+01:00
dtype: datetime64[ns, CET]
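
`ambiguous='infer'` relies on the order of the values to tell the two occurrences apart. When the data carries no such ordering, `ambiguous='NaT'` marks the ambiguous times as missing instead. The sketch below uses a shorter, illustrative Series and also shows that the default (`ambiguous='raise'`) refuses to guess.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-10-28 01:30:00",
                              "2018-10-28 02:30:00",
                              "2018-10-28 03:30:00"]))

# 02:30 occurs twice on this date in CET, so it becomes NaT;
# the other two wall-clock times are unambiguous and are localized normally.
s_nat = s.dt.tz_localize("CET", ambiguous="NaT")
print(s_nat)

# The default, ambiguous='raise', raises on the ambiguous value. The exception
# class has differed between pandas versions, so this sketch catches broadly.
try:
    s.dt.tz_localize("CET")
    raised = False
except Exception as exc:
    raised = True
    print(f"Raised: {type(exc).__name__}")
```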


In [17]:
# Localize with explicit ambiguous array
s_ambiguous = pd.to_datetime(pd.Series(['2018-10-28 01:20:00',
                                       '2018-10-28 02:36:00',
                                       '2018-10-28 03:46:00']))
s_ambiguous_cet = s_ambiguous.dt.tz_localize('CET', ambiguous=np.array([True, True, False]))
print("Datetime Series localized to CET with explicit ambiguous array:")
print(s_ambiguous_cet)

Datetime Series localized to CET with explicit ambiguous array:
0   2018-10-28 01:20:00+02:00
1   2018-10-28 02:36:00+02:00
2   2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]


In [18]:
# Create a datetime Series with nonexistent times (during DST spring forward)
s_nonexistent = pd.to_datetime(pd.Series(['2015-03-29 02:30:00',
                                         '2015-03-29 03:30:00']))
print("Datetime Series with nonexistent times:")
print(s_nonexistent)

Datetime Series with nonexistent times:
0   2015-03-29 02:30:00
1   2015-03-29 03:30:00
dtype: datetime64[ns]


In [19]:
# Localize with nonexistent='shift_forward'
s_nonexistent_forward = s_nonexistent.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
print("Datetime Series localized with nonexistent='shift_forward':")
print(s_nonexistent_forward)

Datetime Series localized with nonexistent='shift_forward':
0   2015-03-29 03:00:00+02:00
1   2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]


In [20]:
# Localize with nonexistent='shift_backward'
s_nonexistent_backward = s_nonexistent.dt.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
print("Datetime Series localized with nonexistent='shift_backward':")
print(s_nonexistent_backward)

Datetime Series localized with nonexistent='shift_backward':
0   2015-03-29 01:59:59.999999999+01:00
1             2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]


In [21]:
# Localize with nonexistent=Timedelta: shift nonexistent times forward by one hour
s_nonexistent_timedelta = s_nonexistent.dt.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1h'))
print("Datetime Series localized with nonexistent=Timedelta('1h'):")
print(s_nonexistent_timedelta)

Datetime Series localized with nonexistent=Timedelta('1h'):
0   2015-03-29 03:30:00+02:00
1   2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
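
If shifting a nonexistent time in either direction would misrepresent the data, `nonexistent='NaT'` marks it as missing instead. A minimal sketch using the same two wall-clock times (the name `s_gap` is illustrative):

```python
import pandas as pd

s_gap = pd.Series(pd.to_datetime(["2015-03-29 02:30:00",
                                  "2015-03-29 03:30:00"]))

# 02:30 does not exist in Europe/Warsaw on this date (clocks jump from 02:00
# to 03:00), so it becomes NaT; 03:30 exists and is localized as usual.
s_gap_nat = s_gap.dt.tz_localize("Europe/Warsaw", nonexistent="NaT")
print(s_gap_nat)
```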


###### Converting Timezones with `dt.tz_convert()`

The `dt.tz_convert()` method converts tz-aware datetime Series from one timezone to another.
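
The difference from `dt.tz_localize()` is easy to miss: localizing keeps the wall-clock reading and attaches a zone, while converting keeps the instant and changes the wall-clock reading. The sketch below illustrates this with a single value; it uses `America/New_York`, the canonical name for the zone that `US/Eastern` links to.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2023-01-01 12:00:00"]))

# tz_localize: the clock still reads 12:00, now interpreted as New York time.
eastern = naive.dt.tz_localize("America/New_York")

# tz_convert: same instant, re-expressed in UTC (New York is UTC-5 in January).
utc = eastern.dt.tz_convert("UTC")

print(eastern.iloc[0])
print(utc.iloc[0])

# Both values refer to the same moment in time.
same_instant = eastern.iloc[0] == utc.iloc[0]
print(same_instant)
```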

In [22]:
# Create a tz-aware datetime Series
dti = pd.date_range(start='2014-08-01 09:00', freq='h', periods=3, tz='Europe/Berlin')
s = pd.Series(dti)
print("Original tz-aware datetime Series (Europe/Berlin):")
print(s)

Original tz-aware datetime Series (Europe/Berlin):
0   2014-08-01 09:00:00+02:00
1   2014-08-01 10:00:00+02:00
2   2014-08-01 11:00:00+02:00
dtype: datetime64[ns, Europe/Berlin]


In [23]:
# Convert to US/Eastern
s_eastern = s.dt.tz_convert('US/Eastern')
print("Datetime Series converted to US/Eastern:")
print(s_eastern)

Datetime Series converted to US/Eastern:
0   2014-08-01 03:00:00-04:00
1   2014-08-01 04:00:00-04:00
2   2014-08-01 05:00:00-04:00
dtype: datetime64[ns, US/Eastern]


In [24]:
# Convert to Asia/Tokyo
s_tokyo = s.dt.tz_convert('Asia/Tokyo')
print("Datetime Series converted to Asia/Tokyo:")
print(s_tokyo)

Datetime Series converted to Asia/Tokyo:
0   2014-08-01 16:00:00+09:00
1   2014-08-01 17:00:00+09:00
2   2014-08-01 18:00:00+09:00
dtype: datetime64[ns, Asia/Tokyo]


In [25]:
# Convert to UTC
s_utc = s.dt.tz_convert('UTC')
print("Datetime Series converted to UTC:")
print(s_utc)

Datetime Series converted to UTC:
0   2014-08-01 07:00:00+00:00
1   2014-08-01 08:00:00+00:00
2   2014-08-01 09:00:00+00:00
dtype: datetime64[ns, UTC]


In [26]:
# Remove timezone information
s_naive = s.dt.tz_convert(None)
print("Datetime Series with timezone information removed:")
print(s_naive)

Datetime Series with timezone information removed:
0   2014-08-01 07:00:00
1   2014-08-01 08:00:00
2   2014-08-01 09:00:00
dtype: datetime64[ns]
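
Note that `dt.tz_convert(None)` converts to UTC before dropping the timezone, which is why the output above shows 07:00 rather than 09:00. To drop the timezone while keeping the local wall-clock reading, `dt.tz_localize(None)` can be used instead. A short sketch contrasting the two (variable names are illustrative):

```python
import pandas as pd

s = pd.Series(pd.date_range("2014-08-01 09:00", periods=1, tz="Europe/Berlin"))

# Convert to UTC, then drop the timezone: 09:00+02:00 becomes 07:00.
as_utc_naive = s.dt.tz_convert(None)

# Drop the timezone and keep the Berlin wall-clock reading: stays 09:00.
as_local_naive = s.dt.tz_localize(None)

print(as_utc_naive.iloc[0])
print(as_local_naive.iloc[0])
```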


In [27]:
# Try to convert a tz-naive Series
s_naive = pd.Series(pd.date_range('2023-01-01', periods=3))
print("Tz-naive datetime Series:")
print(s_naive)

try:
    s_naive.dt.tz_convert('UTC')
except TypeError as e:
    print(f"\nError: {e}")

Tz-naive datetime Series:
0   2023-01-01
1   2023-01-02
2   2023-01-03
dtype: datetime64[ns]

Error: Cannot convert tz-naive timestamps, use tz_localize to localize
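
As the error message suggests, the fix is to localize first and convert afterwards. The sketch below assumes the naive values were recorded in UTC; that assumption has to come from knowledge of the data, since pandas cannot infer it.

```python
import pandas as pd

s_naive = pd.Series(pd.date_range("2023-01-01", periods=3))

# Step 1: declare which zone the naive values are in (assumed to be UTC here).
# Step 2: convert the now tz-aware values to the target zone.
s_tokyo = s_naive.dt.tz_localize("UTC").dt.tz_convert("Asia/Tokyo")
print(s_tokyo)
```

Midnight UTC corresponds to 09:00 in Tokyo, so each value moves forward by nine hours on the clock while still referring to the same instant.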


##### Practical Applications of Timezone Handling

In [28]:
# Create a datetime Series with timestamps from different timezones
timestamps = [
    '2023-01-01 08:00:00',  # New York
    '2023-01-01 14:00:00',  # London
    '2023-01-01 23:00:00',  # Tokyo
]
locations = ['New York', 'London', 'Tokyo']
timezones = ['US/Eastern', 'Europe/London', 'Asia/Tokyo']

# Create a DataFrame
df = pd.DataFrame({
    'timestamp': pd.to_datetime(timestamps),
    'location': locations,
    'timezone': timezones
})
print("Original DataFrame:")
print(df)

Original DataFrame:
            timestamp  location       timezone
0 2023-01-01 08:00:00  New York     US/Eastern
1 2023-01-01 14:00:00    London  Europe/London
2 2023-01-01 23:00:00     Tokyo     Asia/Tokyo


In [29]:
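# Localize each timestamp to its corresponding timezone.
# Note: a tz-aware datetime64 column stores a single timezone. In the output
# below, the column takes the zone of the first value assigned (US/Eastern), so
# the London and Tokyo rows appear as the same instants in US/Eastern time.
for i, row in df.iterrows():
    df.loc[i, 'localized_timestamp'] = row['timestamp'].tz_localize(row['timezone'])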

print("DataFrame with localized timestamps:")
print(df)

DataFrame with localized timestamps:
            timestamp  location       timezone       localized_timestamp
0 2023-01-01 08:00:00  New York     US/Eastern 2023-01-01 08:00:00-05:00
1 2023-01-01 14:00:00    London  Europe/London 2023-01-01 09:00:00-05:00
2 2023-01-01 23:00:00     Tokyo     Asia/Tokyo 2023-01-01 09:00:00-05:00


In [30]:
# Convert all timestamps to UTC
df['utc_timestamp'] = df['localized_timestamp'].apply(lambda x: x.tz_convert('UTC'))
print("DataFrame with UTC timestamps:")
print(df)

DataFrame with UTC timestamps:
            timestamp  location       timezone       localized_timestamp  \
0 2023-01-01 08:00:00  New York     US/Eastern 2023-01-01 08:00:00-05:00   
1 2023-01-01 14:00:00    London  Europe/London 2023-01-01 09:00:00-05:00   
2 2023-01-01 23:00:00     Tokyo     Asia/Tokyo 2023-01-01 09:00:00-05:00   

              utc_timestamp  
0 2023-01-01 13:00:00+00:00  
1 2023-01-01 14:00:00+00:00  
2 2023-01-01 14:00:00+00:00  


In [31]:
# Check if all timestamps are at the same UTC time
utc_times = df['utc_timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')
print("UTC times:")
print(utc_times)
print(f"\nAll timestamps are at the same UTC time: {utc_times.nunique() == 1}")

UTC times:
0    2023-01-01 13:00:00
1    2023-01-01 14:00:00
2    2023-01-01 14:00:00
Name: utc_timestamp, dtype: object

All timestamps are at the same UTC time: False
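
When the goal is only a common UTC column, the intermediate mixed-zone column can be skipped. The sketch below is one possible alternative, not the notebook's method: it localizes each value with its own zone, converts to UTC, and builds the column in one step. It uses `America/New_York` in place of `US/Eastern` (the same zone under its canonical name), and `utc_values` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2023-01-01 08:00:00", "2023-01-01 14:00:00", "2023-01-01 23:00:00"]
        ),
        "location": ["New York", "London", "Tokyo"],
        "timezone": ["America/New_York", "Europe/London", "Asia/Tokyo"],
    }
)

# Localize each wall-clock time in its own zone, then express it in UTC.
utc_values = [
    ts.tz_localize(tz).tz_convert("UTC")
    for ts, tz in zip(df["timestamp"], df["timezone"])
]

# utc=True makes the resulting column a single tz-aware UTC datetime column.
df["utc_timestamp"] = pd.to_datetime(utc_values, utc=True)
print(df[["location", "utc_timestamp"]])
```

Storing everything in UTC and converting to a local zone only for display avoids the single-zone-per-column limitation altogether.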


##### Conclusion

In this notebook, we explored two groups of pandas methods:

1. Cross-sectioning with `xs()`, which selects data from a Series or DataFrame with a MultiIndex by a key at one or more levels, optionally keeping the selected level with `drop_level=False`.
2. Working with timezones using `dt.tz_localize()`, which attaches a timezone to tz-naive values (with `ambiguous` and `nonexistent` handling DST edge cases), and `dt.tz_convert()`, which re-expresses tz-aware values in another timezone.

Together, these methods cover selection on hierarchical indexes and time series data that spans multiple timezones.