#### Pandas Idioms: Usage Conventions

Pandas idioms are common, efficient usage patterns that users of Python's pandas library rely on to manipulate and analyze data. They make routine tasks both more readable and more efficient.

##### Basic usage examples

In [15]:
# example DataFrame used to demonstrate the idioms
import pandas as pd

data = {
    'category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Groceries', 'Groceries'],
    'product': ['Laptop', 'Smartphone', 'Shirt', 'Pants', 'Apples', 'Bananas'],
    'price': [1000, 500, 50, 40, 3, 2],
    'quantity_sold': [5, 10, 15, 8, 30, 25],
    'sale_date': ['2024-06-01', '2024-06-02', '2024-06-01', '2024-06-02', '2024-06-01', '2024-06-02']
}

df = pd.DataFrame(data)

In [16]:
# 1. Efficient selection and filtering of data
select_df = df[["category", "product"]]
print("Selection: ")
print(select_df)

electronics_df = df.loc[df['category'] == 'Electronics']
print("Filtering: ")
print(electronics_df)


Selection: 
      category     product
0  Electronics      Laptop
1  Electronics  Smartphone
2     Clothing       Shirt
3     Clothing       Pants
4    Groceries      Apples
5    Groceries     Bananas
Filtering: 
      category     product  price  quantity_sold   sale_date
0  Electronics      Laptop   1000              5  2024-06-01
1  Electronics  Smartphone    500             10  2024-06-02
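
Filtering often needs more than one condition. The sketch below is one way to combine conditions and pick columns in the same `.loc` call. It uses a shortened stand-in for the example frame, and the 600 price threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Shortened stand-in for the example DataFrame above
df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
    'product': ['Laptop', 'Smartphone', 'Shirt', 'Pants'],
    'price': [1000, 500, 50, 40],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (df['category'] == 'Electronics') & (df['price'] > 600)

# .loc takes the row mask and the column list in a single step
expensive_electronics = df.loc[mask, ['product', 'price']]
print(expensive_electronics)
```

Passing the mask and the column list together avoids chained indexing such as `df[mask]['price']`, which can return a temporary copy.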


In [17]:
# 2. Assigning new columns and modifying existing ones
df['total_sales'] = df['price'] * df['quantity_sold']
print("New column assignment: ")
print(df)

New column assignment: 
      category     product  price  quantity_sold   sale_date  total_sales
0  Electronics      Laptop   1000              5  2024-06-01         5000
1  Electronics  Smartphone    500             10  2024-06-02         5000
2     Clothing       Shirt     50             15  2024-06-01          750
3     Clothing       Pants     40              8  2024-06-02          320
4    Groceries      Apples      3             30  2024-06-01           90
5    Groceries     Bananas      2             25  2024-06-02           50
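
When the original frame should stay unchanged, `DataFrame.assign` is a common alternative to direct column assignment. The following is a minimal sketch on a reduced frame; the `sales_share` column is a hypothetical addition used only to show that later arguments can use columns created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({'price': [1000, 50, 3], 'quantity_sold': [5, 15, 30]})

# assign returns a new DataFrame and leaves df untouched;
# each lambda receives the frame as built so far
with_totals = df.assign(
    total_sales=lambda d: d['price'] * d['quantity_sold'],
    sales_share=lambda d: d['total_sales'] / d['total_sales'].sum(),
)
print(with_totals)
```

Because `assign` returns a new object, it fits naturally inside a chain of method calls.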


In [18]:
# 3. Using aggregation methods
category_sales = df.groupby('category')['total_sales'].sum()
print("Aggregation: ")
print(category_sales)


Aggregation: 
category
Clothing        1070
Electronics    10000
Groceries        140
Name: total_sales, dtype: int64
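
A single `groupby` can also produce several statistics at once through named aggregation. This sketch assumes a reduced version of the frame; the output column names `total`, `average` and `n_products` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
    'total_sales': [5000, 5000, 750, 320],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby('category').agg(
    total=('total_sales', 'sum'),
    average=('total_sales', 'mean'),
    n_products=('total_sales', 'count'),
)
print(summary)
```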


In [19]:
# 4. Applying functions at the row or column level
df['discounted_price'] = df['price'].apply(lambda x: x * 0.9)
print("Column-wise function: ")
print(df)


Column-wise function: 
      category     product  price  quantity_sold   sale_date  total_sales  \
0  Electronics      Laptop   1000              5  2024-06-01         5000   
1  Electronics  Smartphone    500             10  2024-06-02         5000   
2     Clothing       Shirt     50             15  2024-06-01          750   
3     Clothing       Pants     40              8  2024-06-02          320   
4    Groceries      Apples      3             30  2024-06-01           90   
5    Groceries     Bananas      2             25  2024-06-02           50   

   discounted_price  
0             900.0  
1             450.0  
2              45.0  
3              36.0  
4               2.7  
5               1.8  
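
For simple arithmetic like this discount, `apply` is not strictly needed. A vectorized expression gives the same values and is usually faster on large columns. The sketch below compares the two on the same prices.

```python
import pandas as pd

df = pd.DataFrame({'price': [1000, 500, 50, 40, 3, 2]})

# Element by element: the lambda runs once per value
via_apply = df['price'].apply(lambda x: x * 0.9)

# Vectorized: the multiplication runs on the whole column at once
vectorized = df['price'] * 0.9

# Both approaches yield the same Series (within floating-point tolerance)
pd.testing.assert_series_equal(via_apply, vectorized)
```

`apply` remains the right tool when the logic cannot be written as column arithmetic, for example a function with several branches.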


In [20]:
# 5. Merging and combining DataFrames
additional_data = {
    'category': ['Electronics', 'Clothing', 'Groceries'],
    'tax_rate': [0.15, 0.10, 0.05]
}
additional_df = pd.DataFrame(additional_data)

merged_df = pd.merge(df, additional_df, on='category')
print("Merged DataFrames: ")
print(merged_df)

Merged DataFrames: 
      category     product  price  quantity_sold   sale_date  total_sales  \
0  Electronics      Laptop   1000              5  2024-06-01         5000   
1  Electronics  Smartphone    500             10  2024-06-02         5000   
2     Clothing       Shirt     50             15  2024-06-01          750   
3     Clothing       Pants     40              8  2024-06-02          320   
4    Groceries      Apples      3             30  2024-06-01           90   
5    Groceries     Bananas      2             25  2024-06-02           50   

   discounted_price  tax_rate  
0             900.0      0.15  
1             450.0      0.15  
2              45.0      0.10  
3              36.0      0.10  
4               2.7      0.05  
5               1.8      0.05  
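
By default `pd.merge` performs an inner join, so rows whose key has no match are dropped without notice. The following sketch shows one way to keep such rows and check the join. The `Toys` category is a hypothetical row added to create a mismatch.

```python
import pandas as pd

sales = pd.DataFrame({
    'category': ['Electronics', 'Clothing', 'Toys'],  # 'Toys' is hypothetical
    'price': [1000, 50, 20],
})
tax_rates = pd.DataFrame({
    'category': ['Electronics', 'Clothing', 'Groceries'],
    'tax_rate': [0.15, 0.10, 0.05],
})

merged = sales.merge(
    tax_rates,
    on='category',
    how='left',              # keep every row of sales, matched or not
    validate='many_to_one',  # raise if a category repeats in tax_rates
    indicator=True,          # add a _merge column with the source of each row
)
print(merged)
```

Rows marked `left_only` in `_merge` are the ones an inner join would have dropped.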


In [21]:
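# 6. Efficient handling of missing data
# assign the result back: calling fillna(inplace=True) on a column selection
# triggers a warning in recent pandas versions and may not update df
df['discounted_price'] = df['discounted_price'].fillna(df['price'])
print("Handling missing data: ")
print(df)

# this example DataFrame has no missing data, so nothing changes

Handling missing data: 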
      category     product  price  quantity_sold   sale_date  total_sales  \
0  Electronics      Laptop   1000              5  2024-06-01         5000   
1  Electronics  Smartphone    500             10  2024-06-02         5000   
2     Clothing       Shirt     50             15  2024-06-01          750   
3     Clothing       Pants     40              8  2024-06-02          320   
4    Groceries      Apples      3             30  2024-06-01           90   
5    Groceries     Bananas      2             25  2024-06-02           50   

   discounted_price  
0             900.0  
1             450.0  
2              45.0  
3              36.0  
4               2.7  
5               1.8  
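
Since the example frame has no gaps, the effect is easier to see on a frame that does contain one. This is a small sketch with a made-up missing value in the second row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'price': [1000, 500, 50],
    'discounted_price': [900.0, np.nan, 45.0],  # second value is missing on purpose
})

# Fill each missing discounted price with the price from the same row
df['discounted_price'] = df['discounted_price'].fillna(df['price'])
print(df)
```

Passing a Series to `fillna` aligns on the index, so each gap is filled from its own row rather than with a single constant.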


In [23]:
# 7. Pivot and unpivot (building pivot tables)

# build a pivot table showing total sales by category and sale date
pivot_df = df.pivot_table(index='category', columns='sale_date', values='total_sales', aggfunc='sum')
print(pivot_df)

# return to long format (unpivot)
melted_df = pivot_df.reset_index().melt(id_vars=['category'], value_vars=['2024-06-01', '2024-06-02'], var_name='sale_date', value_name='total_sales')
print(melted_df)

sale_date    2024-06-01  2024-06-02
category                           
Clothing            750         320
Electronics        5000        5000
Groceries            90          50
      category   sale_date  total_sales
0     Clothing  2024-06-01          750
1  Electronics  2024-06-01         5000
2    Groceries  2024-06-01           90
3     Clothing  2024-06-02          320
4  Electronics  2024-06-02         5000
5    Groceries  2024-06-02           50
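
In the table above every category has a sale on every date. When a combination is missing, the pivot contains `NaN` for that cell. The sketch below assumes a reduced frame where Clothing has no sale on the second date and uses `fill_value` to put a zero there instead.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Clothing'],
    'sale_date': ['2024-06-01', '2024-06-02', '2024-06-01'],
    'total_sales': [5000, 5000, 750],
})

# fill_value replaces the NaN of missing category/date combinations
pivot_df = df.pivot_table(index='category', columns='sale_date',
                          values='total_sales', aggfunc='sum', fill_value=0)
print(pivot_df)

# Without value_vars, melt unpivots every column that is not in id_vars
melted_df = pivot_df.reset_index().melt(id_vars='category',
                                        var_name='sale_date',
                                        value_name='total_sales')
print(melted_df)
```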


##### Usage examples (in practice)

In [25]:
# libraries and DataFrame for the example
import pandas as pd

df = pd.read_csv('../datasets/census.csv')
df.head()

,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,...,RDOMESTICMIG2011,RDOMESTICMIG2012,RDOMESTICMIG2013,RDOMESTICMIG2014,RDOMESTICMIG2015,RNETMIG2011,RNETMIG2012,RNETMIG2013,RNETMIG2014,RNETMIG2015
0,40,3,6,1,0,Alabama,Alabama,4779736,4780127,4785161,...,0.002295,-0.193196,0.381066,0.582002,-0.467369,1.030015,0.826644,1.383282,1.724718,0.712594
1,50,3,6,1,1,Alabama,Autauga County,54571,54571,54660,...,7.242091,-2.915927,-3.012349,2.265971,-2.530799,7.606016,-2.626146,-2.722002,2.59227,-2.187333
2,50,3,6,1,3,Alabama,Baldwin County,182265,182265,183193,...,14.83296,17.647293,21.845705,19.243287,17.197872,15.844176,18.559627,22.727626,20.317142,18.293499
3,50,3,6,1,5,Alabama,Barbour County,27457,27457,27341,...,-4.728132,-2.50069,-7.056824,-3.904217,-10.543299,-4.874741,-2.758113,-7.167664,-3.978583,-10.543299
4,50,3,6,1,7,Alabama,Bibb County,22915,22919,22861,...,-5.527043,-5.068871,-6.201001,-0.177537,0.177258,-5.088389,-4.363636,-5.403729,0.754533,1.107861


In [26]:
# chain methods to make the code more readable
# where() sets rows that fail the condition to NaN, and dropna() then removes every row containing a NaN
(df.where(df['SUMLEV']==50)
    .dropna()
    .set_index(['STNAME','CTYNAME'])
    .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

STNAME,CTYNAME,SUMLEV,REGION,DIVISION,STATE,COUNTY,CENSUS2010POP,Estimates Base 2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,...,RDOMESTICMIG2011,RDOMESTICMIG2012,RDOMESTICMIG2013,RDOMESTICMIG2014,RDOMESTICMIG2015,RNETMIG2011,RNETMIG2012,RNETMIG2013,RNETMIG2014,RNETMIG2015
Alabama,Autauga County,50.0,3.0,6.0,1.0,1.0,54571.0,54571.0,54660.0,55253.0,55175.0,...,7.242091,-2.915927,-3.012349,2.265971,-2.530799,7.606016,-2.626146,-2.722002,2.592270,-2.187333
Alabama,Baldwin County,50.0,3.0,6.0,1.0,3.0,182265.0,182265.0,183193.0,186659.0,190396.0,...,14.832960,17.647293,21.845705,19.243287,17.197872,15.844176,18.559627,22.727626,20.317142,18.293499
Alabama,Barbour County,50.0,3.0,6.0,1.0,5.0,27457.0,27457.0,27341.0,27226.0,27159.0,...,-4.728132,-2.500690,-7.056824,-3.904217,-10.543299,-4.874741,-2.758113,-7.167664,-3.978583,-10.543299
Alabama,Bibb County,50.0,3.0,6.0,1.0,7.0,22915.0,22919.0,22861.0,22733.0,22642.0,...,-5.527043,-5.068871,-6.201001,-0.177537,0.177258,-5.088389,-4.363636,-5.403729,0.754533,1.107861
Alabama,Blount County,50.0,3.0,6.0,1.0,9.0,57322.0,57322.0,57373.0,57711.0,57776.0,...,1.807375,-1.177622,-1.748766,-2.062535,-1.369970,1.859511,-0.848580,-1.402476,-1.577232,-0.884411
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Wyoming,Sweetwater County,50.0,4.0,8.0,56.0,37.0,43806.0,43806.0,43593.0,44041.0,45104.0,...,1.072643,16.243199,-5.339774,-14.252889,-14.248864,1.255221,16.243199,-5.295460,-14.075283,-14.070195
Wyoming,Teton County,50.0,4.0,8.0,56.0,39.0,21294.0,21294.0,21297.0,21482.0,21697.0,...,-1.589565,0.972695,19.525929,14.143021,-0.564849,0.654527,2.408578,21.160658,16.308671,1.520747
Wyoming,Uinta County,50.0,4.0,8.0,56.0,41.0,21118.0,21118.0,21102.0,20912.0,20989.0,...,-17.755986,-4.916350,-6.902954,-14.215862,-12.127022,-18.136812,-5.536861,-7.521840,-14.740608,-12.606351
Wyoming,Washakie County,50.0,4.0,8.0,56.0,43.0,8533.0,8533.0,8545.0,8469.0,8443.0,...,-11.637475,-0.827815,-2.013502,-17.781491,1.682288,-11.990126,-1.182592,-2.250385,-18.020168,1.441961
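
The output above shows integer columns such as `SUMLEV` as floats (`50.0`), because `where` inserts `NaN` before `dropna` removes the rows. A boolean mask is one alternative that filters rows directly and keeps the original dtypes. The sketch uses a three-row stand-in built from values in the census preview, not the full file.

```python
import pandas as pd

# Three-row stand-in for the census data (values taken from the preview above)
df = pd.DataFrame({
    'SUMLEV': [40, 50, 50],
    'STNAME': ['Alabama', 'Alabama', 'Alabama'],
    'CTYNAME': ['Alabama', 'Autauga County', 'Baldwin County'],
    'CENSUS2010POP': [4779736, 54571, 182265],
})

# where + dropna: integer columns pass through NaN and become floats
masked = df.where(df['SUMLEV'] == 50).dropna()

# Boolean mask: rows are filtered directly and integers stay integers
counties = (df[df['SUMLEV'] == 50]
            .set_index(['STNAME', 'CTYNAME'])
            .rename(columns={'CENSUS2010POP': 'Census 2010 Pop'}))

print(masked.dtypes)
print(counties.dtypes)
```

The mask also leaves rows with legitimately missing values in place, whereas `dropna()` removes any row that contains a `NaN` in any column.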


In [27]:
# apply a function to every value in a column

# .apply(get_state_region) calls the function once per value in the column;
# the function can be passed directly, so no lambda wrapper is needed

def get_state_region(x):
    northeast = ['Connecticut', 'Maine', 'Massachusetts', 'New Hampshire',
                 'Rhode Island','Vermont','New York','New Jersey','Pennsylvania']
    midwest = ['Illinois','Indiana','Michigan','Ohio','Wisconsin','Iowa',
               'Kansas','Minnesota','Missouri','Nebraska','North Dakota',
               'South Dakota']
    south = ['Delaware','Florida','Georgia','Maryland','North Carolina',
             'South Carolina','Virginia','District of Columbia','West Virginia',
             'Alabama','Kentucky','Mississippi','Tennessee','Arkansas',
             'Louisiana','Oklahoma','Texas']
    west = ['Arizona','Colorado','Idaho','Montana','Nevada','New Mexico','Utah',
            'Wyoming','Alaska','California','Hawaii','Oregon','Washington']

    if x in northeast:
        return "Northeast"
    elif x in midwest:
        return "Midwest"
    elif x in south:
        return "South"
    else:
        # any name not listed above falls through to "West"
        return "West"

df['state_region'] = df['STNAME'].apply(get_state_region)
df[['STNAME','state_region']].head()

,STNAME,state_region
0,Alabama,South
1,Alabama,South
2,Alabama,South
3,Alabama,South
4,Alabama,South
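
A lookup of this kind can also be written as a dictionary passed to `Series.map`. The sketch below reuses the same state lists. The names `REGION_STATES` and `STATE_TO_REGION` are illustrative, and `Puerto Rico` is included only to show what happens to a name outside the lists.

```python
import pandas as pd

REGION_STATES = {
    'Northeast': ['Connecticut', 'Maine', 'Massachusetts', 'New Hampshire',
                  'Rhode Island', 'Vermont', 'New York', 'New Jersey',
                  'Pennsylvania'],
    'Midwest': ['Illinois', 'Indiana', 'Michigan', 'Ohio', 'Wisconsin', 'Iowa',
                'Kansas', 'Minnesota', 'Missouri', 'Nebraska', 'North Dakota',
                'South Dakota'],
    'South': ['Delaware', 'Florida', 'Georgia', 'Maryland', 'North Carolina',
              'South Carolina', 'Virginia', 'District of Columbia',
              'West Virginia', 'Alabama', 'Kentucky', 'Mississippi',
              'Tennessee', 'Arkansas', 'Louisiana', 'Oklahoma', 'Texas'],
    'West': ['Arizona', 'Colorado', 'Idaho', 'Montana', 'Nevada', 'New Mexico',
             'Utah', 'Wyoming', 'Alaska', 'California', 'Hawaii', 'Oregon',
             'Washington'],
}

# Invert the mapping once: state name -> region name
STATE_TO_REGION = {state: region
                   for region, states in REGION_STATES.items()
                   for state in states}

df = pd.DataFrame({'STNAME': ['Alabama', 'Wyoming', 'Ohio', 'Vermont', 'Puerto Rico']})

# map looks each value up in the dict; names without an entry become NaN
df['state_region'] = df['STNAME'].map(STATE_TO_REGION)
print(df)
```

Unlike the `if`/`else` version, a name missing from the lists shows up as `NaN` instead of being labeled "West", which makes gaps in the lookup visible.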
