# <center><div style="width: 370px;"> ![Panel Data](pictures/Panel_Data.jpg)

# <center> Date and Text with `.dt` and `.str`

In [1]:
import numpy as np
import pandas as pd

## .dt accessor

`Series` has an accessor, `.dt`, that succinctly returns datetime-like properties for the
*values* of the Series, provided the Series is datetime- or period-like.
The result is a Series indexed like the original.

In [2]:
s = pd.Series(pd.date_range("20130101 09:10:12", periods=4))
s

0   2013-01-01 09:10:12
1   2013-01-02 09:10:12
2   2013-01-03 09:10:12
3   2013-01-04 09:10:12
dtype: datetime64[ns]

In [3]:
s.dt.hour

0    9
1    9
2    9
3    9
dtype: int32

In [4]:
s.dt.second

0    12
1    12
2    12
3    12
dtype: int32

In [5]:
s.dt.day

0    1
1    2
2    3
3    4
dtype: int32

This enables nice expressions like this:

In [6]:
s[s.dt.day == 2]

1   2013-01-02 09:10:12
dtype: datetime64[ns]
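
The same boolean-mask idea works with other `.dt` properties. A minimal sketch, assuming a hypothetical series `dates` that is not part of the example above, which keeps only the weekend days:

```python
import pandas as pd

# Hypothetical data: seven consecutive days starting on Tuesday 2013-01-01.
dates = pd.Series(pd.date_range("2013-01-01", periods=7))

# dt.dayofweek numbers Monday as 0 and Sunday as 6,
# so values of 5 or more are Saturday and Sunday.
weekend = dates[dates.dt.dayofweek >= 5]
print(weekend)
```

The mask `dates.dt.dayofweek >= 5` is itself a boolean Series with the same index as `dates`, which is why it can be used directly for selection.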

In [7]:
stz = s.dt.tz_localize("US/Eastern")
stz

0   2013-01-01 09:10:12-05:00
1   2013-01-02 09:10:12-05:00
2   2013-01-03 09:10:12-05:00
3   2013-01-04 09:10:12-05:00
dtype: datetime64[ns, US/Eastern]

In [8]:
stz.dt.tz

<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

You can also chain these types of operations:

In [9]:
s.dt.tz_localize("UTC").dt.tz_convert("US/Eastern")

0   2013-01-01 04:10:12-05:00
1   2013-01-02 04:10:12-05:00
2   2013-01-03 04:10:12-05:00
3   2013-01-04 04:10:12-05:00
dtype: datetime64[ns, US/Eastern]
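
The difference between the two steps in the chain is easy to miss: `tz_localize` attaches a zone to naive timestamps without shifting the wall-clock time, while `tz_convert` re-expresses the same instants in another zone. A small sketch, assuming hypothetical timestamps recorded in UTC and an illustrative target zone of `Asia/Tokyo`:

```python
import pandas as pd

# Hypothetical naive timestamps, assumed here to have been recorded in UTC.
naive = pd.Series(pd.date_range("2013-01-01 09:10:12", periods=2))

# Step 1: label the naive values as UTC (the hour stays at 9).
utc = naive.dt.tz_localize("UTC")

# Step 2: express the same instants in Tokyo time (UTC+9, no DST).
tokyo = utc.dt.tz_convert("Asia/Tokyo")

print(utc.dt.hour.tolist())
print(tokyo.dt.hour.tolist())
```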

You can also format datetime values as strings with `Series.dt.strftime()` which
supports the same format as the standard `strftime()`.

In [10]:
s = pd.Series(pd.date_range("20130101", periods=4))
s

0   2013-01-01
1   2013-01-02
2   2013-01-03
3   2013-01-04
dtype: datetime64[ns]

In [11]:
s.dt.strftime("%Y/%m/%d")

0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
dtype: object

In [12]:
s = pd.Series(pd.period_range("20130101", periods=4))
s

0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
dtype: period[D]

In [13]:
s.dt.strftime("%Y/%m/%d")

0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
dtype: object
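
Any combination of the standard format directives can be used. A brief sketch, assuming a hypothetical series `stamps` with a time component, that builds day-first labels with hours and minutes:

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component.
stamps = pd.Series(pd.date_range("2013-01-01 09:10:12", periods=2, freq="D"))

# %d, %m, %Y are day, month, year; %H and %M are 24-hour hour and minute.
labels = stamps.dt.strftime("%d.%m.%Y %H:%M")
print(labels.tolist())
```

The result holds strings rather than datetimes, so further `.dt` operations are no longer available on `labels`.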

The `.dt` accessor also works for period and timedelta dtypes.

In [14]:
s = pd.Series(pd.period_range("20130101", periods=4, freq="D"))
s

0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
dtype: period[D]

In [15]:
s.dt.year

0    2013
1    2013
2    2013
3    2013
dtype: int64

In [16]:
s.dt.day

0    1
1    2
2    3
3    4
dtype: int64

In [17]:
s = pd.Series(pd.timedelta_range("1 day 00:00:05", periods=4, freq="s"))
s

0   1 days 00:00:05
1   1 days 00:00:06
2   1 days 00:00:07
3   1 days 00:00:08
dtype: timedelta64[ns]

In [18]:
s.dt.days

0    1
1    1
2    1
3    1
dtype: int64

In [19]:
s.dt.seconds

0    5
1    6
2    7
3    8
dtype: int32

In [20]:
s.dt.components

   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        5             0             0            0
1     1      0        0        6             0             0            0
2     1      0        0        7             0             0            0
3     1      0        0        8             0             0            0
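
Note that `.dt.seconds` reports only the seconds component of each value and ignores whole days. When the full duration is needed as a single number, `.dt.total_seconds()` can be used instead. A minimal sketch, rebuilding the same timedelta series under the hypothetical name `deltas`:

```python
import pandas as pd

# Same values as the timedelta example: 1 day plus 5, 6, 7, 8 seconds.
deltas = pd.Series(pd.timedelta_range("1 day 00:00:05", periods=4, freq="s"))

# .dt.seconds drops the day part; .dt.total_seconds() includes it
# (1 day = 86400 seconds) and returns floats.
component = deltas.dt.seconds
total = deltas.dt.total_seconds()

print(component.tolist())
print(total.tolist())
```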


> Note
> 
> `Series.dt` raises an `AttributeError` if the Series does not hold datetime-like values.
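
The failure can be observed directly. A small sketch, assuming a hypothetical integer series `numbers`; it catches both `AttributeError` (what recent pandas releases raise) and `TypeError`, so it does not depend on the exact exception class:

```python
import pandas as pd

# Hypothetical data: plain integers, not datetime-like.
numbers = pd.Series([1, 2, 3])

error = None
try:
    numbers.dt.year
except (AttributeError, TypeError) as exc:
    # Both classes are caught defensively; only one of them is raised.
    error = exc

# Show which exception class was raised and its message.
print(type(error).__name__, error)
```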

## Vectorized string methods

Series is equipped with a set of string processing methods that make it easy to
operate on each element of the array. Perhaps most importantly, these methods
exclude missing/NA values automatically. These are accessed via the Series’s
`str` attribute and generally have names matching the equivalent (scalar)
built-in string methods. For example:

In [21]:
s = pd.Series(
    ["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"],
    dtype="string",
)

In [22]:
s

0       A
1       B
2       C
3    Aaba
4    Baca
5    <NA>
6    CABA
7     dog
8     cat
dtype: string

In [24]:
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5    <NA>
6    caba
7     dog
8     cat
dtype: string

In [25]:
s.str.upper()

0       A
1       B
2       C
3    AABA
4    BACA
5    <NA>
6    CABA
7     DOG
8     CAT
dtype: string

In [26]:
s.str.len()

0       1
1       1
2       1
3       4
4       4
5    <NA>
6       4
7       3
8       3
dtype: Int64
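
Because missing values propagate as `<NA>`, boolean-returning methods also yield `<NA>` for those rows. A minimal sketch, assuming a hypothetical series `words`, that uses the `na` parameter of `str.startswith` to treat missing entries as non-matches so the result can serve as a filter:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
words = pd.Series(["A", "Aaba", np.nan, "dog"], dtype="string")

# na=False fills the result for missing entries with False instead of <NA>.
mask = words.str.startswith("A", na=False)

print(words[mask].tolist())
```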

In [27]:
s = pd.Series(
    ['Nika Shakarami', 'Sarina EsmaeilZadeh', 'Mahsa Amini']
)

In [28]:
s.str.split(' ')

0         [Nika, Shakarami]
1    [Sarina, EsmaeilZadeh]
2            [Mahsa, Amini]
dtype: object

In [29]:
s.str.split(' ').str.get(0)

0      Nika
1    Sarina
2     Mahsa
dtype: object

In [30]:
s.str.split(' ', expand=True)

        0             1
0    Nika     Shakarami
1  Sarina  EsmaeilZadeh
2   Mahsa         Amini
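
The expanded result is an ordinary DataFrame, so its columns can be renamed. A short sketch, where the column labels `first` and `last` are illustrative choices and `n=1` limits the split to the first space:

```python
import pandas as pd

names = pd.Series(["Nika Shakarami", "Sarina EsmaeilZadeh", "Mahsa Amini"])

# n=1 performs at most one split, so the result has exactly two columns
# even if a name contains additional spaces.
parts = names.str.split(" ", n=1, expand=True)
parts.columns = ["first", "last"]  # hypothetical column labels

print(parts)
```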


Powerful pattern-matching methods are provided as well, but note that
pattern-matching generally uses regular expressions by default (and in some cases
always uses them).
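
As an illustration of the regex default, here is a small sketch on a hypothetical series `codes`. It contrasts `str.contains` with and without `regex=False`, and uses `str.extract` to pull out the first run of digits:

```python
import pandas as pd

# Hypothetical data.
codes = pd.Series(["a1", "b22", "c", "v1.0"])

# The pattern is treated as a regular expression by default: \d matches any digit.
has_digit = codes.str.contains(r"\d")

# With regex=False the pattern is a literal string, so "." matches only a real dot
# (as a regex, "." would match any character).
has_dot = codes.str.contains(".", regex=False)

# extract returns the first capture group; rows without a match become missing.
digits = codes.str.extract(r"(\d+)", expand=False)

print(has_digit.tolist())
print(has_dot.tolist())
print(digits.tolist())
```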

> **Note:**
> 
> Prior to pandas 1.0, string methods were only available on `object`-dtype
> `Series`. pandas 1.0 added the `StringDtype`, which is dedicated
> to strings. See Text data types for more.
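
One visible difference between the two dtypes is how numeric results handle missing values. A brief sketch, assuming two hypothetical series that differ only in dtype:

```python
import pandas as pd

# Hypothetical data: the same values stored with two different dtypes.
as_object = pd.Series(["a", None, "c"], dtype=object)
as_string = pd.Series(["a", None, "c"], dtype="string")

# With object dtype the missing length is NaN, which forces a float result;
# with the dedicated string dtype it is <NA> in a nullable integer result.
print(as_object.str.len().dtype)
print(as_string.str.len().dtype)
```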

Please see [Vectorized String Methods](https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods) for a complete
description.