# Operations

`DataFrame` objects offer several basic operations; the complete list is available in the [documentation](https://pandas.pydata.org/docs/user_guide/basics.html).

## Date ranges

In [1]:
import pandas as pd
import numpy as np

# Date range to use as the index of a DataFrame
index = pd.date_range("7/15/2022", periods=20)

index


DatetimeIndex(['2022-07-15', '2022-07-16', '2022-07-17', '2022-07-18',
               '2022-07-19', '2022-07-20', '2022-07-21', '2022-07-22',
               '2022-07-23', '2022-07-24', '2022-07-25', '2022-07-26',
               '2022-07-27', '2022-07-28', '2022-07-29', '2022-07-30',
               '2022-07-31', '2022-08-01', '2022-08-02', '2022-08-03'],
              dtype='datetime64[ns]', freq='D')
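
As a minimal sketch, the same range can also be requested with an ISO-formatted start date, by giving both endpoints, or restricted to business days. The variable names below are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Same 20 daily dates, written with an unambiguous ISO start date
by_periods = pd.date_range("2022-07-15", periods=20, freq="D")

# Equivalent range given by its two endpoints (both are included)
by_endpoints = pd.date_range(start="2022-07-15", end="2022-08-03")

# Business days only (Monday to Friday) inside the same window
business_days = pd.date_range(start="2022-07-15", end="2022-08-03", freq="B")

print(by_periods.equals(by_endpoints))
print(len(business_days))
```

Using `periods` is convenient when the number of rows is known in advance, while `start`/`end` fits better when the window itself is the known quantity.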

## Quick queries

In [21]:
# Use the date range as the index of a DataFrame filled with random values
df = pd.DataFrame(np.random.randn(20, 4), index=index, columns=["A", "B", "C", "D"])

df

| | A | B | C | D |
|---|---|---|---|---|
| 2022-07-15 | -0.011961 | 1.710416 | 2.325046 | -1.669884 |
| 2022-07-16 | -1.216634 | -0.498198 | 0.705776 | 0.542406 |
| 2022-07-17 | 0.848479 | -2.158653 | 1.637913 | -0.88182 |
| 2022-07-18 | -0.005994 | 0.00017 | 0.540943 | -0.78759 |
| 2022-07-19 | -0.944669 | -1.314748 | -0.414227 | -0.652729 |
| 2022-07-20 | -1.063026 | 0.120792 | 1.360617 | -0.972178 |
| 2022-07-21 | 0.887138 | -0.332656 | -1.055303 | -1.02434 |
| 2022-07-22 | 1.313917 | 0.323939 | 0.969588 | -0.214606 |
| 2022-07-23 | -0.518019 | 0.42458 | -0.395986 | -0.616451 |
| 2022-07-24 | 0.012873 | -0.432329 | -0.814812 | -0.518714 |
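
Because `np.random.randn` draws new values on every execution, the numbers printed by different cells of this notebook need not match one another. One possible way to get repeatable values, sketched here under the assumption that NumPy's `default_rng` generator is acceptable, is to seed the generator explicitly. The seed value and the names `df_seeded` and `df_again` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2022-07-15", periods=20)
columns = ["A", "B", "C", "D"]

# A seeded generator yields the same pseudo-random numbers on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed
df_seeded = pd.DataFrame(rng.standard_normal((20, 4)), index=index, columns=columns)

# Rebuilding with the same seed reproduces the values exactly
rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((20, 4)), index=index, columns=columns)

print(df_seeded.equals(df_again))
```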


In [3]:
# First five rows (head)
df.head()

| | A | B | C | D |
|---|---|---|---|---|
| 2022-07-15 | 1.183673 | 0.155926 | 0.91359 | 0.153647 |
| 2022-07-16 | -0.82985 | 0.918131 | -0.645803 | 0.370965 |
| 2022-07-17 | 0.491792 | -1.128323 | 2.523705 | -0.342127 |
| 2022-07-18 | -0.583506 | -0.379722 | -0.619071 | 0.865903 |
| 2022-07-19 | -0.87216 | -0.445559 | -0.61038 | -0.731612 |


In [4]:
# First three rows
df.head(3)

| | A | B | C | D |
|---|---|---|---|---|
| 2022-07-15 | 1.183673 | 0.155926 | 0.91359 | 0.153647 |
| 2022-07-16 | -0.82985 | 0.918131 | -0.645803 | 0.370965 |
| 2022-07-17 | 0.491792 | -1.128323 | 2.523705 | -0.342127 |


In [5]:
# Last five rows (tail)
df.tail()

| | A | B | C | D |
|---|---|---|---|---|
| 2022-07-30 | -1.776082 | 0.422872 | 0.442382 | -0.56108 |
| 2022-07-31 | -0.54336 | -1.131246 | -0.982543 | 0.911901 |
| 2022-08-01 | -0.747388 | -0.104183 | -1.107131 | -0.832021 |
| 2022-08-02 | 0.125725 | -0.59913 | -0.390472 | -0.431658 |
| 2022-08-03 | -1.351646 | -0.355741 | 0.085301 | 0.661144 |


In [6]:
# Last three rows
df.tail(3)

| | A | B | C | D |
|---|---|---|---|---|
| 2022-08-01 | -0.747388 | -0.104183 | -1.107131 | -0.832021 |
| 2022-08-02 | 0.125725 | -0.59913 | -0.390472 | -0.431658 |
| 2022-08-03 | -1.351646 | -0.355741 | 0.085301 | 0.661144 |
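
Both `head` and `tail` are positional slices, so they line up with `iloc`, and they also accept a negative count. The sketch below uses a small stand-in frame named `demo` (hypothetical data, not the notebook's `df`) to make the selected rows easy to verify.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with six rows (hypothetical data)
demo = pd.DataFrame(
    {"A": np.arange(6)},
    index=pd.date_range("2022-07-15", periods=6),
)

first_three = demo.head(3)        # same rows as demo.iloc[:3]
last_two = demo.tail(2)           # same rows as demo.iloc[-2:]
all_but_last_two = demo.head(-2)  # negative n: everything except the last 2 rows

print(len(first_three), len(last_two), len(all_but_last_two))
```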


## Unique values

In [7]:
# Define a DataFrame with data of different types
df = pd.DataFrame({
    'enteros': [100, 200, 300, 400],
    'decimales': [3.14, 2.72, 1.618, 3.14],
    'cadenas': ['hola', 'adiós', 'hola', 'adiós']})

df

| | enteros | decimales | cadenas |
|---|---|---|---|
| 0 | 100 | 3.14 | hola |
| 1 | 200 | 2.72 | adiós |
| 2 | 300 | 1.618 | hola |
| 3 | 400 | 3.14 | adiós |


In [8]:
# Array of the unique values in a column
df['cadenas'].unique()

array(['hola', 'adiós'], dtype=object)

In [9]:
# Number of unique values in a column
df['cadenas'].nunique()

2

In [10]:
# Series with each unique value of a column and its count
df['cadenas'].value_counts()

cadenas
hola     2
adiós    2
Name: count, dtype: int64
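
When proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. The following is a minimal sketch on a stand-in frame named `df_demo` (an illustrative name) that repeats the `cadenas` column from above.

```python
import pandas as pd

# Stand-in frame repeating the 'cadenas' column used above
df_demo = pd.DataFrame({"cadenas": ["hola", "adiós", "hola", "adiós"]})

# Absolute frequency of each unique value (returned as a Series)
counts = df_demo["cadenas"].value_counts()

# Relative frequency: fractions that add up to 1
shares = df_demo["cadenas"].value_counts(normalize=True)

print(counts["hola"], shares["hola"])
```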

## Applying functions

In [11]:
# Built-in method of the column Series
df['decimales'].sum()

10.618

In [12]:
# Apply a built-in function
df['cadenas'].apply(len)

0    4
1    5
2    4
3    5
Name: cadenas, dtype: int64

In [13]:
# Apply a user-defined function
def doblar(n):
    return n*2

df['enteros'].apply(doblar)

0    200
1    400
2    600
3    800
Name: enteros, dtype: int64

In [14]:
# Apply an anonymous (lambda) function
df['enteros'].apply(lambda n: n/3)

0     33.333333
1     66.666667
2    100.000000
3    133.333333
Name: enteros, dtype: float64
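
For simple arithmetic and string operations like the ones above, pandas also offers vectorized forms that avoid calling a Python function once per element and are typically faster on large columns. A minimal sketch of the equivalents, using a stand-in frame `df_demo` (an illustrative name) with the same data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "enteros": [100, 200, 300, 400],
    "cadenas": ["hola", "adiós", "hola", "adiós"],
})

doubled = df_demo["enteros"] * 2        # same values as .apply(doblar)
thirds = df_demo["enteros"] / 3         # same values as .apply(lambda n: n/3)
lengths = df_demo["cadenas"].str.len()  # same values as .apply(len)

print(doubled.tolist(), lengths.tolist())
```

`apply` remains the general tool when no vectorized operation expresses the logic directly.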

In [15]:
# Permanently delete a column (modifies df in place)
del df['decimales']

In [16]:
df

| | enteros | cadenas |
|---|---|---|
| 0 | 100 | hola |
| 1 | 200 | adiós |
| 2 | 300 | hola |
| 3 | 400 | adiós |
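
Since `del` changes the DataFrame in place, it may be worth knowing two related options: `drop`, which returns a new DataFrame and leaves the original intact, and `pop`, which removes the column in place but hands it back as a Series. A sketch on a stand-in frame `df_demo` (an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "enteros": [100, 200, 300, 400],
    "decimales": [3.14, 2.72, 1.618, 3.14],
})

# drop returns a new DataFrame; df_demo still has both columns afterwards
without_decimals = df_demo.drop(columns=["decimales"])
columns_after_drop = list(df_demo.columns)

# pop removes the column from df_demo in place and returns it as a Series
removed = df_demo.pop("decimales")
columns_after_pop = list(df_demo.columns)

print(columns_after_drop, columns_after_pop)
```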


## Retrieving indexes

In [17]:
# Column labels (the column index)
df.columns

Index(['enteros', 'cadenas'], dtype='object')

In [18]:
# Row index
df.index

RangeIndex(start=0, stop=4, step=1)
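
Both attributes return pandas `Index` objects, which behave much like immutable sequences. A short sketch of common follow-up operations, on a stand-in frame `df_demo` (an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "enteros": [100, 200, 300, 400],
    "cadenas": ["hola", "adiós", "hola", "adiós"],
})

column_names = df_demo.columns.tolist()     # plain Python list of labels
has_enteros = "enteros" in df_demo.columns  # membership test on the labels
row_labels = df_demo.index.tolist()         # RangeIndex expanded to a list
n_rows, n_cols = df_demo.shape              # (rows, columns)

print(column_names, row_labels, n_rows, n_cols)
```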

## Sorting

In [19]:
# Sort by a column (returns a new DataFrame; inplace=False by default)
df.sort_values(by='enteros')

| | enteros | cadenas |
|---|---|---|
| 0 | 100 | hola |
| 1 | 200 | adiós |
| 2 | 300 | hola |
| 3 | 400 | adiós |


In [20]:
# Sort by a column in descending order (inplace=False by default)
df.sort_values(by='enteros', ascending=False)

| | enteros | cadenas |
|---|---|---|
| 3 | 400 | adiós |
| 2 | 300 | hola |
| 1 | 200 | adiós |
| 0 | 100 | hola |
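
`sort_values` also accepts a list of columns with one sort direction per column, and `sort_index` orders rows by their index labels instead of by values. A minimal sketch on a stand-in frame `df_demo` (an illustrative name) holding the same data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "enteros": [100, 200, 300, 400],
    "cadenas": ["hola", "adiós", "hola", "adiós"],
})

# Sort by 'cadenas' ascending, breaking ties by 'enteros' descending
by_two = df_demo.sort_values(by=["cadenas", "enteros"], ascending=[True, False])

# sort_index puts the rows back in index-label order
restored = by_two.sort_index()

print(by_two.index.tolist(), restored.index.tolist())
```

Neither call modifies `df_demo`; each returns a new DataFrame, in line with the `inplace=False` default noted above.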
