<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="media/ensmp-25-alpha.png" /></span>
</div>

## VIII) dealing with **temporality** in $\texttt{pandas}$

In [None]:
import numpy as np
import pandas as pd

**3 ways** to talk about **temporality**
   - **date** or **time**: an **instant**, e.g. **right now**
   - **time duration**: a length of time, e.g. **3 hours** (also called a **delta**)
   - **time period**: an **interval of time**, e.g. a **date** plus a **duration**
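As a preview of section 2), here is a minimal sketch with one $\texttt{pandas}$ object for each of the three notions (the variable names are only illustrative):

```python
import pandas as pd

# an instant: a date and a time of day
instant = pd.Timestamp('2019-09-04 14:00:00')

# a duration: a length of time, not tied to any date
duration = pd.Timedelta(hours=3)

# a period: a span of time, here the whole day of September 4th, 2019
period = pd.Period('2019-09-04', freq='D')

print(type(instant).__name__, type(duration).__name__, type(period).__name__)
# an instant plus a duration is again an instant
print(instant + duration)                              # 2019-09-04 17:00:00
# the instant falls inside the period
print(period.start_time <= instant <= period.end_time)  # True
```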

### 1) date and time intervals in **numpy**

#### a) date in **numpy** $\texttt{numpy.datetime64}$

   - dates in $\texttt{pandas}$ are based on $\texttt{numpy.datetime64}$
   - the format is **'year-month-day hour:minute:second'**
   - the numbers are **zero-padded** ($09$ and not $9$)

In [None]:
np.datetime64('2019-09-04 14:00:09')
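A small sketch of how $\texttt{numpy}$ reads such strings: the **unit** of the $\texttt{datetime64}$ is inferred from the precision of the string, and non-zero-padded fields are rejected:

```python
import numpy as np

# numpy infers the unit from the precision of the string
day = np.datetime64('2019-09-04')
second = np.datetime64('2019-09-04 14:00:09')
print(np.datetime_data(day.dtype))     # ('D', 1): the unit is the day
print(np.datetime_data(second.dtype))  # ('s', 1): the unit is the second

# the ISO 8601 'T' separator is accepted as well as a space
print(np.datetime64('2019-09-04T14:00:09') == second)  # True

# fields that are not zero-padded are rejected
try:
    np.datetime64('2019-9-4')
except ValueError as e:
    print('ValueError:', e)
```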

#### b) **time duration** in **numpy** $\texttt{numpy.timedelta64}$
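   - subtracting two $\texttt{numpy.datetime64}$ gives a $\texttt{numpy.timedelta64}$ (negative below, since the first date is the earlier one)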

In [None]:
np.datetime64('2019-09-04 14:00:09') - np.datetime64('2019-09-11 09:27:09')
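A sketch of the usual operations on $\texttt{numpy.timedelta64}$ values: dividing by a one-hour delta converts a duration to hours, and a delta can be added to a date:

```python
import numpy as np

start = np.datetime64('2019-09-04 14:00:09')
end = np.datetime64('2019-09-11 09:27:09')

# end - start is positive, since start is the earlier date
delta = end - start
print(delta)                            # 588420 seconds

# convert to hours by dividing by a one-hour timedelta64
print(delta / np.timedelta64(1, 'h'))   # 163.45

# a timedelta64 can be added to a datetime64
print(start + np.timedelta64(3, 'h'))   # 2019-09-04T17:00:09
```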

### 2) temporality in **pandas**

#### a) dates in **pandas** $\texttt{pandas.Timestamp}$

In [None]:
pd.Timestamp(0) # the Unix epoch: an integer is read as nanoseconds since 1970-01-01 00:00:00

*"the Unix time is the number of seconds that have elapsed since 00:00:00 Thursday, 1 January 1970"*
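A sketch of the conversion in both directions, assuming **naive** timestamps (no time zone attached): the $\texttt{unit}$ parameter of $\texttt{pandas.to_datetime}$ reads an integer as a number of seconds, and dividing a duration by one second gives the Unix time back:

```python
import pandas as pd

epoch = pd.Timestamp(0)                  # 0 nanoseconds after the epoch
print(epoch)                             # 1970-01-01 00:00:00

# a Unix time in seconds is converted with unit='s'
ts = pd.to_datetime(1570197600, unit='s')
print(ts)                                # 2019-10-04 14:00:00

# and back: whole seconds elapsed since the epoch
print((ts - epoch) // pd.Timedelta(seconds=1))   # 1570197600
```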

In [None]:
pd.Timestamp('2019-10-04 14:00:00') 

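   - if your dates have a **specific format**, use $\texttt{pandas.to_datetime}$ with the $\texttt{format}$ parameter
   - the format uses $\texttt{strftime}$ directives: **%Y** is the 4-digit year (2019), **%y** the 2-digit year (19), **%m** the month, **%d** the day of the month, **%H** the hour (00 to 23), **%M** the minute, **%S** the second, ...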

In [None]:
pd.to_datetime('2019|10|04 14;00;07', format='%Y|%m|%d %H;%M;%S') 
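Another sketch with the same directives: a 2-digit year (**%y**) and the day written before the month; $\texttt{strftime}$ uses the directives the other way round, to turn a $\texttt{Timestamp}$ back into a string:

```python
import pandas as pd

# two-digit year (%y) and day written before month
d = pd.to_datetime('04/10/19 14:00', format='%d/%m/%y %H:%M')
print(d)                        # 2019-10-04 14:00:00

# the same directives format a Timestamp back into a string
print(d.strftime('%Y|%m|%d'))   # 2019|10|04
```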

#### b) time duration in **pandas** $\texttt{pandas.Timedelta}$

   - it is a **duration**, i.e. a length of time
   - with no reference to a precise **date**

   - e.g. the **duration** between **two** dates

In [None]:
pd.Timestamp('2019-09-04 14:00:00') - pd.Timestamp('2019-09-04 8:36:57')

In [None]:
pd.Timestamp('2019-10-04 14:00:00') - pd.Timestamp('2019-09-04 8:36:57')

In [None]:
#pd.Timedelta?
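A sketch of what can be done with a $\texttt{Timedelta}$: read its components, build one directly from a string or from keywords, and add it to a $\texttt{Timestamp}$:

```python
import pandas as pd

# the two subtractions above
gap_same_day = pd.Timestamp('2019-09-04 14:00:00') - pd.Timestamp('2019-09-04 08:36:57')
gap_month = pd.Timestamp('2019-10-04 14:00:00') - pd.Timestamp('2019-09-04 08:36:57')
print(gap_same_day)                                  # 0 days 05:23:03
print(gap_month)                                     # 30 days 05:23:03
print(gap_month.days, gap_same_day.total_seconds())  # 30 19383.0

# a Timedelta can also be built directly, from a string or from keywords
three_hours = pd.Timedelta('3 hours')
print(three_hours == pd.Timedelta(hours=3))              # True
print(pd.Timestamp('2019-09-04 08:36:57') + three_hours)  # 2019-09-04 11:36:57
```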

#### c) time period in **pandas** $\texttt{pandas.Period}$

   - a **period** is a **span of time**: a starting **date** plus a **duration** fixed by a **frequency** (one day, one month, ...)

In [None]:
d = pd.Timestamp('2019-10-04 14:00:00')

In [None]:
d.to_period('M') # the month containing d

In [None]:
d.to_period('D') # the day containing d
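A sketch of what a $\texttt{Period}$ knows about itself: where it starts and ends, and how arithmetic moves it by whole periods:

```python
import pandas as pd

d = pd.Timestamp('2019-10-04 14:00:00')

day = d.to_period('D')
month = d.to_period('M')
print(day, month)               # 2019-10-04 2019-10

# a Period knows where it starts and where it ends
print(month.start_time)         # 2019-10-01 00:00:00
print(month.end_time)           # 2019-10-31 23:59:59.999999999

# arithmetic moves by whole periods
print(day + 1)                  # 2019-10-05
print(month - 1)                # 2019-09
```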

### 3) columns of dates for $\texttt{pandas.DataFrame}$

#### a) in an already created **dataframe**

   - you have a **dataset** with a column of **dates**

In [None]:
df = pd.DataFrame({'time': ['2019/12/25 23:59', '2019/12/31 23:59'],
                   'holidays': ['Christmas', 'New Year']})

   - the **time** values are plain Python **strings**

In [None]:
type(df.loc[0, 'time'])   

   - you can **convert** these **strings** into **date** objects
   - with the $\texttt{pandas.to_datetime}$ function

In [None]:
df['time'] = pd.to_datetime(df['time'])

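   - note that $\texttt{pandas.to_datetime}$ applied to a $\texttt{Series}$ returns a $\texttt{Series}$ of dates (dtype $\texttt{datetime64}$), while applied to a **list** or an **array** of strings it returns an **index of dates** ($\texttt{pandas.DatetimeIndex}$)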

In [None]:
df.dtypes

   - remember that $\texttt{pandas}$ datetimes rely on $\texttt{numpy.datetime64}$ (hence the $\texttt{datetime64}$ dtype)
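A sketch checking this on the same small data frame, and showing the $\texttt{.dt}$ accessor, which gives access to the date components of a column of dates:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': ['2019/12/25 23:59', '2019/12/31 23:59'],
                   'holidays': ['Christmas', 'New Year']})
df['time'] = pd.to_datetime(df['time'])

# the underlying numpy array has a datetime64 dtype
print(df['time'].to_numpy().dtype)
print(np.issubdtype(df['time'].dtype, np.datetime64))   # True

# the .dt accessor exposes the date components of the column
print(df['time'].dt.day.tolist())          # [25, 31]
print(df['time'].dt.day_name().tolist())   # ['Wednesday', 'Tuesday']
```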

   - you can **index** the **data frame** by a column of dates ($\texttt{set_index}$ returns a new data frame, $\texttt{df}$ itself is unchanged)

In [None]:
df.set_index('time')
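Once the index is made of dates, rows can be selected with **date strings**. A sketch, rebuilding the same two-row data frame:

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2019/12/25 23:59', '2019/12/31 23:59']),
                   'holidays': ['Christmas', 'New Year']})
indexed = df.set_index('time')

# all rows of one day
print(indexed.loc['2019-12-31'])
# all rows of one month
print(indexed.loc['2019-12'])
# a range of dates (both bounds are included)
print(indexed.loc['2019-12-20':'2019-12-26'])
```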

#### b) creating **date** type while reading the **csv** file

   - you can **convert** the dates while **reading** a **csv** file
   - and index your data frame by the date

   - we first write the data frame to a file, without the index

In [None]:
df.to_csv('foo.csv', index=False)

   - we **parse** the date while we **read** the csv-file

In [None]:
df = pd.read_csv('foo.csv', parse_dates=['time'])

In [None]:
df.dtypes

In [None]:
df.head()
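The same round trip can be sketched without touching the disk, using an in-memory buffer ($\texttt{io.StringIO}$) in place of **foo.csv**, to compare a read with and without $\texttt{parse_dates}$:

```python
import io
import pandas as pd

df = pd.DataFrame({'time': ['2019/12/25 23:59', '2019/12/31 23:59'],
                   'holidays': ['Christmas', 'New Year']})

# write to an in-memory text buffer instead of foo.csv
buffer = io.StringIO()
df.to_csv(buffer, index=False)
text = buffer.getvalue()

# without parse_dates, the dates come back as plain strings
raw = pd.read_csv(io.StringIO(text))
print(type(raw.loc[0, 'time']))

# with parse_dates, they come back as datetime64
parsed = pd.read_csv(io.StringIO(text), parse_dates=['time'])
print(parsed.dtypes)
```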

   - we **index** the data frame by **date** while we read the csv-file

In [None]:
df = pd.read_csv('foo.csv', parse_dates=['time'], index_col='time')

In [None]:
df.head()

In [None]:
df.index

   - you have a new type: $\texttt{pandas.DatetimeIndex}$
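A sketch of what a $\texttt{DatetimeIndex}$ offers: the date components and date arithmetic are available directly on the index (the CSV content below is an in-memory copy of **foo.csv**):

```python
import io
import pandas as pd

csv_text = 'time,holidays\n2019/12/25 23:59,Christmas\n2019/12/31 23:59,New Year\n'
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'], index_col='time')

print(isinstance(df.index, pd.DatetimeIndex))          # True
# date components are available directly on the index
print(df.index.year.tolist(), df.index.day.tolist())   # [2019, 2019] [25, 31]
# the difference between two index entries is a Timedelta
print(df.index[1] - df.index[0])                       # 6 days 00:00:00
```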

   - for **unusual date formats**, indicate the **parser function** to use
   - in the file **test_date.csv**, the '/' are replaced by '|' in the date strings

In [None]:
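def my_date_parser(d):
    # converts the '|'-separated date strings of test_date.csv
    return pd.to_datetime(d, format='%Y|%m|%d %H:%M')

# note: since pandas 2.0, date_parser is deprecated in favour of date_format='%Y|%m|%d %H:%M'
df = pd.read_csv('test_date.csv', parse_dates=['time'], index_col='time', date_parser=my_date_parser)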

In [None]:
df.head()
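Since $\texttt{date_parser}$ is deprecated from pandas 2.0 onwards (where $\texttt{read_csv}$ accepts $\texttt{date_format}$ instead), here is a sketch of an approach that does not depend on the version: read the column as strings, then convert it with $\texttt{pandas.to_datetime}$ and an explicit $\texttt{format}$. The CSV content is a hypothetical stand-in for **test_date.csv**, kept in memory:

```python
import io
import pandas as pd

# hypothetical content in the style of test_date.csv: '|' instead of '/'
csv_text = 'time,holidays\n2019|12|25 23:59,Christmas\n2019|12|31 23:59,New Year\n'

# read the column as plain strings, then convert with an explicit format
df = pd.read_csv(io.StringIO(csv_text))
df['time'] = pd.to_datetime(df['time'], format='%Y|%m|%d %H:%M')
df = df.set_index('time')
print(df)
```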

### 4) when dates are **wrong**, you can **ignore** or **coerce** them

   - by default, an invalid date raises an **error** (February has no 30th day)

In [None]:
try:
    pd.to_datetime('30/02/2019')
except ValueError as e:
    print(e)

   - with $\texttt{errors='ignore'}$, the input is returned **unchanged**

In [None]:
pd.to_datetime('30/02/2019', errors='ignore') # returns the string itself, no date is created (deprecated since pandas 2.2)

   - with $\texttt{errors='coerce'}$, the invalid date is replaced by a **missing** date

In [None]:
pd.to_datetime('30/02/2019', errors='coerce') # this is Not a Time

   - it is the $\texttt{pandas}$ **object** $\texttt{pandas.NaT}$ (Not a Time)
   - the usual **missing-value** methods ($\texttt{isna}$, $\texttt{fillna}$, $\texttt{dropna}$) treat **NaT** like **NaN**
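A sketch on a whole column, where a single invalid date becomes **NaT** and is then handled like any missing value:

```python
import pandas as pd

raw = pd.Series(['2019-02-28', '2019-02-30', '2019-03-01'])
dates = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(dates.isna().tolist())        # [False, True, False]

# drop the invalid dates ...
print(dates.dropna().tolist())
# ... or replace them with a chosen default
filled = dates.fillna(pd.Timestamp('2019-02-28'))
print(filled)

# NaT propagates through arithmetic, like NaN
print(dates - pd.Timestamp('2019-02-01'))
```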

# ANNEXES

## 1) Dealing with unicode in $\texttt{pandas}$


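   - the codec names accepted by the $\texttt{encoding}$ parameter of $\texttt{pandas.read_csv}$ are Python's standard encodings: https://docs.python.org/3/library/codecs.html#standard-encodings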
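A sketch of why the codec name matters: a small CSV (hypothetical content) is encoded in latin-1, read back correctly with $\texttt{encoding='latin-1'}$, and fails with the default utf-8 codec:

```python
import io
import pandas as pd

# bytes of a small CSV file encoded in latin-1 (hypothetical content)
data = 'word\ncafé\nnaïve\n'.encode('latin-1')

# reading with the matching codec gives back the accented text
df = pd.read_csv(io.BytesIO(data), encoding='latin-1')
print(df['word'].tolist())      # ['café', 'naïve']

# the default codec is utf-8, which cannot decode these bytes
try:
    pd.read_csv(io.BytesIO(data))
except UnicodeDecodeError as e:
    print('UnicodeDecodeError:', e)
```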