# Date formats and time series

Pandas was developed in the context of financial modeling, so it offers extensive functionality for dates, times, and time-based indexing. Its main concepts are:
- **Time stamps (instants)** - references to particular moments in time (e.g., August 4 at 17:00).
        numpy.datetime64, DatetimeIndex
- **Time intervals and periods** - references to a span of time between two instants. Periods are the special case in which all the intervals have the same length.
        numpy.datetime64, PeriodIndex
- **Time deltas** - references to the amount of time elapsed since an instant (a duration).
        numpy.timedelta64, TimedeltaIndex
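
As a rough illustration of the three concepts above, the sketch below builds one scalar and one index of each kind. The variable names are only illustrative and are not used elsewhere in this notebook.

```python
import pandas as pd

# An instant in time: a single Timestamp and an index of them
instant = pd.Timestamp("2015-08-04 17:00")
instants = pd.DatetimeIndex(["2015-08-04", "2015-08-05"])

# A period: the whole month of August 2015, and a range of monthly periods
period = pd.Period("2015-08", freq="M")
periods = pd.period_range("2015-08", periods=2, freq="M")

# A time delta: an elapsed duration, and an index of durations
delta = pd.Timedelta(days=1, hours=2)
deltas = pd.to_timedelta([1, 2], unit="D")

print(type(instants).__name__, type(periods).__name__, type(deltas).__name__)
print(instant + delta)  # adding a duration to an instant gives a new instant
```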



### Date formats in Pandas

Pandas includes flexible date *parsing*:

In [34]:
import numpy as np
import pandas as pd

In [1]:
?pd.to_datetime

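Note: the docstring reproduced below comes from an older pandas release (it refers to version 0.18.1); some parameters, such as `box` and `coerce`, are no longer available in recent versions.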


    pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, coerce=None, unit=None, infer_datetime_format=False)

***
***
    Parameters

    arg : string, datetime, list, tuple, 1-d array, Series
        or DataFrame/dict-like (added in version 0.18.1)

    errors : {'ignore', 'raise', 'coerce'}, default 'raise'

        - If 'raise', then invalid parsing will raise an exception
        - If 'coerce', then invalid parsing will be set as NaT
        - If 'ignore', then invalid parsing will return the input
    dayfirst : boolean, default False
        Specify a date parse order if `arg` is str or its list-likes.
        If True, parses dates with the day first, eg 10/11/12 is parsed as
        2012-11-10.
        Warning: dayfirst=True is not strict, but will prefer to parse
        with day first (this is a known bug, based on dateutil behavior).
    yearfirst : boolean, default False
        Specify a date parse order if `arg` is str or its list-likes.

        - If True parses dates with the year first, eg 10/11/12 is parsed as
          2010-11-12.
        - If both dayfirst and yearfirst are True, yearfirst is preceded (same
          as dateutil).

        Warning: yearfirst=True is not strict, but will prefer to parse
        with year first (this is a known bug, based on dateutil behavior).

        .. versionadded: 0.16.1

    utc : boolean, default None
        Return UTC DatetimeIndex if True (converting any tz-aware
        datetime.datetime objects as well).
    box : boolean, default True
        - If True returns a DatetimeIndex
        - If False returns ndarray of values.
    format : string, default None
        strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
        all the way up to nanoseconds.
    exact : boolean, True by default

        - If True, require an exact format match.
        - If False, allow the format to match anywhere in the target string.

    unit : string, default 'ns'
        unit of the arg (D,s,ms,us,ns) denote the unit in epoch
        (e.g. a unix timestamp), which is an integer/float number.
    infer_datetime_format : boolean, default False
        If True and no `format` is given, attempt to infer the format of the
        datetime strings, and if it can be inferred, switch to a faster
        method of parsing them. In some cases this can increase the parsing
        speed by ~5-10x.
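
To make a few of the parameters above concrete, here is a minimal sketch of `errors`, `dayfirst`, and `unit`. The input values are made up for illustration, and only options that should behave the same way across pandas versions are used.

```python
import pandas as pd

# errors="coerce": entries that cannot be parsed become NaT instead of raising
parsed = pd.to_datetime(["2015-07-04", "not a date"], errors="coerce")

# dayfirst=True: read 10/11/2012 as 10 November rather than 11 October
day_first = pd.to_datetime("10/11/2012", dayfirst=True)
month_first = pd.to_datetime("10/11/2012")

# unit="s": interpret a number as seconds elapsed since the Unix epoch
from_epoch = pd.to_datetime(86400, unit="s")

print(parsed)
print(day_first, month_first, from_epoch)
```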

In [2]:
import pandas as pd
pd.to_datetime("4th of July, 2015"), pd.to_datetime("January 19/20"),pd.to_datetime("2020/August/3")

(Timestamp('2015-07-04 00:00:00'),
 Timestamp('2020-01-19 00:00:00'),
 Timestamp('2020-08-03 00:00:00'))

In [14]:
from datetime import datetime  # pd.datetime is no longer available in recent pandas
pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015','2015-Jul-6', '07-07-2015', '20150708'])

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)

The date format can also be set explicitly:

In [55]:
pd.to_datetime("2010, 5/10",format="%Y, %d/%m")

Timestamp('2010-10-05 00:00:00')

The individual components of a date can be extracted separately:

<img src='formatosfecha.png' width=900>

In [16]:
# extract the components of the date
date=pd.to_datetime("4th of July, 2015 3:17:59 PM")

lista=['%A','%w','%d','%W','%m','%Y','%H','%M','%S']

valores=[date.strftime(k) for k in lista]

nombres=['Day of the week','Numeric day of the week (Sunday=0)',
         'Day of the month','Week of the year','Month of the year','Year','Hour (24h)','Minute','Second']

info=pd.DataFrame(valores,index=nombres,columns=[date])
info

Unnamed: 0,2015-07-04 15:17:59
Day of the week,Saturday
Numeric day of the week (Sunday=0),6
Day of the month,04
Week of the year,26
Month of the year,07
Year,2015
Hour (24h),15
Minute,17
Second,59
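
The same components are also exposed as attributes of a `Timestamp`, which avoids going through format strings. The sketch below is an alternative to the `strftime` approach, not a replacement for it. Note that `dayofweek` counts from Monday = 0, whereas the `%w` format code counts from Sunday = 0.

```python
import pandas as pd

date = pd.Timestamp("2015-07-04 15:17:59")

components = {
    "year": date.year,
    "month": date.month,
    "day": date.day,
    "hour": date.hour,
    "minute": date.minute,
    "second": date.second,
    "dayofweek": date.dayofweek,   # Monday=0, so a Saturday is 5
    "day_name": date.day_name(),
}
print(components)

# The strftime code %w uses a different convention (Sunday=0)
print(date.strftime("%w"))
```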


The formats are based on `numpy.datetime64`, whose time units and the ranges they can represent are given in the following table:

|Code    | Meaning     | Time span (relative) | Time span (absolute)   |
|--------|-------------|----------------------|------------------------|
| ``Y``  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| ``M``  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| ``W``  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| ``D``  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| ``h``  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| ``m``  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| ``s``  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| ``ms`` | Millisecond | ± 2.9e8 years        | [ 2.9e8 BC, 2.9e8 AD]  |
| ``us`` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| ``ns`` | Nanosecond  | ± 292 years          | [ 1678 AD, 2262 AD]    |
| ``ps`` | Picosecond  | ± 106 days           | [ 1969 AD, 1970 AD]    |
| ``fs`` | Femtosecond | ± 2.6 hours          | [ 1969 AD, 1970 AD]    |
| ``as`` | Attosecond  | ± 9.2 seconds        | [ 1969 AD, 1970 AD]    |

More documentation on time formats is available on the numpy website: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
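
The trade-off between resolution and range in the table can be seen directly. The sketch below shows how numpy infers the unit from the input string, and prints the limits of a pandas `Timestamp`, assuming the default nanosecond resolution.

```python
import numpy as np
import pandas as pd

day = np.datetime64("2015-07-04")            # unit inferred as days
minute = np.datetime64("2015-07-04 12:00")   # unit inferred as minutes
next_days = day + np.arange(3)               # integer arithmetic in day units

print(day.dtype, minute.dtype)
print(next_days)

# Nanosecond resolution limits the representable range to roughly 1677-2262
print(pd.Timestamp.min, pd.Timestamp.max)
```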

### Generating dates in a range

In [17]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [74]:
pd.date_range('2015-07-03', '2015-12-10',freq='M')

DatetimeIndex(['2015-07-31', '2015-08-31', '2015-09-30', '2015-10-31',
               '2015-11-30'],
              dtype='datetime64[ns]', freq='M')

In [75]:
pd.date_range('2015-07-03', '2015-12-10',freq='B') # business days as the frequency

DatetimeIndex(['2015-07-03', '2015-07-06', '2015-07-07', '2015-07-08',
               '2015-07-09', '2015-07-10', '2015-07-13', '2015-07-14',
               '2015-07-15', '2015-07-16',
               ...
               '2015-11-27', '2015-11-30', '2015-12-01', '2015-12-02',
               '2015-12-03', '2015-12-04', '2015-12-07', '2015-12-08',
               '2015-12-09', '2015-12-10'],
              dtype='datetime64[ns]', length=115, freq='B')

In [19]:
pd.date_range('2015-07-03', periods=8) # the number of periods can be given instead of an end date

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [20]:
pd.date_range('2015-07-03', periods=2)

DatetimeIndex(['2015-07-03', '2015-07-04'], dtype='datetime64[ns]', freq='D')

In [38]:
# generates 5 weekly dates (Sundays by default) starting from the initial date
pd.date_range('2015-07-03', periods=5,freq='W')

DatetimeIndex(['2015-07-05', '2015-07-12', '2015-07-19', '2015-07-26',
               '2015-08-02'],
              dtype='datetime64[ns]', freq='W-SUN')

In [39]:
# generates 5 month-end dates
pd.date_range('2015-07-03', periods=5,freq='M')

DatetimeIndex(['2015-07-31', '2015-08-31', '2015-09-30', '2015-10-31',
               '2015-11-30'],
              dtype='datetime64[ns]', freq='M')
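
Frequencies can also carry a multiplier or an anchor. As a small sketch (the start date is reused from the cells above, the variable names are illustrative), `'2D'` steps two days at a time and `'W-MON'` anchors the weekly frequency on Mondays instead of the default Sundays.

```python
import pandas as pd

# Every second day, starting on the given date
every_two_days = pd.date_range("2015-07-03", periods=4, freq="2D")

# Weekly dates anchored on Mondays; 2015-07-03 is a Friday,
# so the first date generated is the following Monday
mondays = pd.date_range("2015-07-03", periods=3, freq="W-MON")

print(every_two_days)
print(mondays)
```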

`to_timedelta` converts an array of numeric values into time deltas expressed in the given time unit:

In [40]:
pd.to_timedelta(np.arange(12), 'D')

TimedeltaIndex([ '0 days',  '1 days',  '2 days',  '3 days',  '4 days',
                 '5 days',  '6 days',  '7 days',  '8 days',  '9 days',
                '10 days', '11 days'],
               dtype='timedelta64[ns]', freq=None)

Time deltas can be added to instants to generate new time indexes.

In [41]:
date

Timestamp('2015-07-04 15:17:59')

In [48]:
rangomeses=date + pd.to_timedelta(np.arange(1,2000,90), 'D')
rangomeses

DatetimeIndex(['2015-07-05 15:17:59', '2015-10-03 15:17:59',
               '2016-01-01 15:17:59', '2016-03-31 15:17:59',
               '2016-06-29 15:17:59', '2016-09-27 15:17:59',
               '2016-12-26 15:17:59', '2017-03-26 15:17:59',
               '2017-06-24 15:17:59', '2017-09-22 15:17:59',
               '2017-12-21 15:17:59', '2018-03-21 15:17:59',
               '2018-06-19 15:17:59', '2018-09-17 15:17:59',
               '2018-12-16 15:17:59', '2019-03-16 15:17:59',
               '2019-06-14 15:17:59', '2019-09-12 15:17:59',
               '2019-12-11 15:17:59', '2020-03-10 15:17:59',
               '2020-06-08 15:17:59', '2020-09-06 15:17:59',
               '2020-12-05 15:17:59'],
              dtype='datetime64[ns]', freq=None)

Subtracting two instants yields a `Timedelta`; subtracting an instant from a `DatetimeIndex`, as here, yields a `TimedeltaIndex`:


In [73]:
rangomeses - rangomeses[0]

TimedeltaIndex([   '0 days',   '90 days',  '180 days',  '270 days',
                 '360 days',  '450 days',  '540 days',  '630 days',
                 '720 days',  '810 days',  '900 days',  '990 days',
                '1080 days', '1170 days', '1260 days', '1350 days',
                '1440 days', '1530 days', '1620 days', '1710 days',
                '1800 days', '1890 days', '1980 days'],
               dtype='timedelta64[ns]', freq=None)
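
A short sketch of both cases, using made-up dates: the difference between two `Timestamp` objects is a single `Timedelta`, while subtracting a `Timestamp` from a `DatetimeIndex` gives a `TimedeltaIndex` whose `days` attribute returns plain integers.

```python
import pandas as pd

start = pd.Timestamp("2015-07-04")
end = pd.Timestamp("2015-12-25")
elapsed = end - start                 # a single Timedelta

index = pd.date_range("2015-07-04", periods=3, freq="D")
offsets = index - index[0]            # a TimedeltaIndex

print(elapsed.days, elapsed.total_seconds())
print(list(offsets.days))
```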

### Converting them to a period format

The main values of the `freq` parameter are:

| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |


In [49]:
rangomeses.to_period('M')

PeriodIndex(['2015-07', '2015-10', '2016-01', '2016-03', '2016-06', '2016-09',
             '2016-12', '2017-03', '2017-06', '2017-09', '2017-12', '2018-03',
             '2018-06', '2018-09', '2018-12', '2019-03', '2019-06', '2019-09',
             '2019-12', '2020-03', '2020-06', '2020-09', '2020-12'],
            dtype='period[M]', freq='M')

In [52]:
rangomeses.to_period('A')

PeriodIndex(['2015', '2015', '2016', '2016', '2016', '2016', '2016', '2017',
             '2017', '2017', '2017', '2018', '2018', '2018', '2018', '2019',
             '2019', '2019', '2019', '2020', '2020', '2020', '2020'],
            dtype='period[A-DEC]', freq='A-DEC')

In [53]:
rangomeses.to_period('Q')

PeriodIndex(['2015Q3', '2015Q4', '2016Q1', '2016Q1', '2016Q2', '2016Q3',
             '2016Q4', '2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1',
             '2018Q2', '2018Q3', '2018Q4', '2019Q1', '2019Q2', '2019Q3',
             '2019Q4', '2020Q1', '2020Q2', '2020Q3', '2020Q4'],
            dtype='period[Q-DEC]', freq='Q-DEC')
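
Unlike a timestamp, a period stands for a whole span of time. The sketch below, with illustrative names, shows the boundaries of a monthly period, period arithmetic, and the conversion back to timestamps with `to_timestamp` (which by default uses the start of each period).

```python
import pandas as pd

july = pd.Period("2015-07", freq="M")

print(july.start_time)   # first instant of the month
print(july.end_time)     # last instant of the month
print(july + 1)          # arithmetic moves in whole periods

# A range of monthly periods, converted back to timestamps
months = pd.period_range("2015-07", periods=3, freq="M")
starts = months.to_timestamp()
print(starts)
```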

### Dates as indexes for Series

Time indexes can be set on Series, and subsets can be selected by date:

In [67]:
index = pd.DatetimeIndex(['2014-07-03', '2014-08-04',
                          '2015-07-06', '2015-08-14'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-03    0
2014-08-04    1
2015-07-06    2
2015-08-14    3
dtype: int64

In [68]:
data['2014-07-03':'2015-07-04']

2014-07-03    0
2014-08-04    1
dtype: int64

In [69]:
data['2015']

2015-07-06    2
2015-08-14    3
dtype: int64

In [70]:
data.loc[data.index.month==7]

2014-07-03    0
2015-07-06    2
dtype: int64

In [71]:
data.loc[data.index.day<6]

2014-07-03    0
2014-08-04    1
dtype: int64

In [78]:
seriemeses=pd.Series(np.arange(len(rangomeses)),index=rangomeses)
seriemeses

2015-07-05 15:17:59     0
2015-10-03 15:17:59     1
2016-01-01 15:17:59     2
2016-03-31 15:17:59     3
2016-06-29 15:17:59     4
2016-09-27 15:17:59     5
2016-12-26 15:17:59     6
2017-03-26 15:17:59     7
2017-06-24 15:17:59     8
2017-09-22 15:17:59     9
2017-12-21 15:17:59    10
2018-03-21 15:17:59    11
2018-06-19 15:17:59    12
2018-09-17 15:17:59    13
2018-12-16 15:17:59    14
2019-03-16 15:17:59    15
2019-06-14 15:17:59    16
2019-09-12 15:17:59    17
2019-12-11 15:17:59    18
2020-03-10 15:17:59    19
2020-06-08 15:17:59    20
2020-09-06 15:17:59    21
2020-12-05 15:17:59    22
dtype: int64

In [80]:
seriemeses.loc[seriemeses.index.dayofweek==1] # extract the Tuesdays (dayofweek counts from Monday=0)

2016-09-27 15:17:59     5
2018-06-19 15:17:59    12
2020-03-10 15:17:59    19
dtype: int64
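
Because `dayofweek` counts from Monday = 0, it can help to check a weekday filter against `day_name()`. The sketch below uses a small made-up series rather than `seriemeses`.

```python
import pandas as pd

index = pd.DatetimeIndex(["2016-09-26", "2016-09-27", "2016-10-01"])
series = pd.Series([0, 1, 2], index=index)

# Show the numeric weekday next to its name
weekday_table = pd.DataFrame(
    {"dayofweek": index.dayofweek, "day_name": index.day_name()},
    index=index,
)
print(weekday_table)

# dayofweek == 1 selects Tuesdays, not Mondays
tuesdays = series[series.index.dayofweek == 1]
print(tuesdays)
```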