# Case 1: How global energy production tends to change over time

### Purpose

This case aims to strengthen skills in using pandas to apply functions to Series and to handle large datasets. Some of the methods/functions used are:

* `drop_duplicates()`
* `apply()`
* `value_counts()`
* `reset_index()`
* `fillna()`
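For a quick orientation, the sketch below applies each of these methods to a tiny, made-up DataFrame. The values are invented and only the column names mirror the dataset. `groupby()` is included only to produce an index that `reset_index()` can act on.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the dataset, values are invented
toy = pd.DataFrame({
    "country_or_area": ["Austria", "Austria", "Belgium", "Belgium"],
    "year": [2013, 2013, 2014, 2014],
    "quantity": [5.0, 5.0, None, 35.0],
})

# drop_duplicates(): remove rows that repeat exactly (the second Austria row)
unique_rows = toy.drop_duplicates()

# value_counts(): how often each value appears in a Series
counts = toy["country_or_area"].value_counts()

# fillna(): replace missing values, here with 0
filled = toy["quantity"].fillna(0)

# apply(): call a function on every element of a Series
doubled = filled.apply(lambda q: q * 2)

# reset_index(): move the group labels from the index back into a column
totals = toy.groupby("country_or_area")["quantity"].sum().reset_index()

print(unique_rows)
print(counts)
print(doubled.tolist())  # [10.0, 10.0, 0.0, 70.0]
print(totals)
```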

### Context

The worldwide production, consumption, import and export of electricity are complex and interesting for several reasons. Each country must track a wide range of information to make sure it produces enough electricity. It must also weigh these needs against their medium-term financial implications and environmental consequences.

You are an analyst at a non-governmental organization (NGO) that reports on global energy trends. Your department has obtained a large CSV file, but its size and format make it hard for your colleagues to extract relevant information from it in Excel. To make matters worse, it contains thousands of variables and they are not sure which ones are relevant. You have therefore been put in charge of supporting your team with data and insights they can turn into written reports.

The task is to answer the following questions:

1. How much energy is produced?
2. How much energy is consumed?
3. How much energy is imported and exported?
4. How much of this energy is renewable?

**The data needed to answer these questions are stored in the file all_energy_statistics.csv, obtained from http://data.un.org/Explorer.aspx**

```python
# Import libraries
import pandas as pd
```

```python
# Read the dataset
df = pd.read_csv("data/all_energy_statistics.csv")
```

```python
df["unit"].unique()  # Rows are not directly comparable because they use different units
```

```text
array(['Metric tons,  thousand', 'Terajoules', 'Kilowatts,  thousand',
       'Kilowatt-hours, million', 'Cubic metres, thousand', 'Metric Tons'],
      dtype=object)
```

As a first exercise, we can explore how many countries or areas the dataset includes, as well as its categories. For this we use `drop_duplicates()`, which removes repeated values from the Series it is called on. It keeps the first occurrence of each value, so the index labels below are those of each value's first row.

```python
# country_or_area
df["country_or_area"].drop_duplicates()
```

```text
0                                          Austria
2                                          Belgium
8                                          Czechia
10                                         Finland
26                                          France
                            ...                   
212765                                      Tuvalu
212938                    United States Virgin Is.
213088                       Wallis and Futuna Is.
362966    Commonwealth of Independent States (CIS)
399113                         Antarctic Fisheries
Name: country_or_area, Length: 243, dtype: object
```

```python
# category
df["category"].drop_duplicates()
```

```text
0                                   additives_and_oxygenates
3018                                            animal_waste
4940                                              anthracite
9834                                       aviation_gasoline
28005                                                bagasse
                                 ...                        
1037653                                    total_electricity
1171569                                total_refinery_output
1177352                                              uranium
1178036    white_spirit_and_special_boiling_point_industr...
1188115                                     wind_electricity
Name: category, Length: 71, dtype: object
```

The dataset contains 243 unique countries/areas and 71 unique categories. We can also determine the time window covered by the data (1990 to 2014):

```python
print(df["year"].min())
print(df["year"].max())
```

```text
1990
2014
```
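If only the counts are needed, `nunique()` returns the number of distinct values directly instead of listing them. Here is a small sketch on invented rows, not taken from the real file. On the full dataset, `df["country_or_area"].nunique()` should agree with the `Length: 243` above as long as the column has no missing values, because `nunique()` ignores NaN by default.

```python
import pandas as pd

# Hypothetical sample mimicking a few columns of all_energy_statistics.csv
sample = pd.DataFrame({
    "country_or_area": ["Austria", "Austria", "Belgium", "France"],
    "category": ["anthracite", "bagasse", "anthracite", "uranium"],
    "year": [1990, 2014, 2001, 1995],
})

n_countries = sample["country_or_area"].nunique()  # distinct countries/areas
n_categories = sample["category"].nunique()        # distinct categories
year_range = (sample["year"].min(), sample["year"].max())

print(n_countries, n_categories, year_range)
```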


### Using functions to process text

The `commodity_transaction` column can be somewhat chaotic because it mixes upper-case letters and symbols. Standardizing it first makes tasks easier, such as finding the rows that contain the word "production".

A first way to standardize the text is to convert it all to lower case with the `lower` method, available through the `.str` accessor of string Series. Calling `str.lower()` on a Series returns another Series in which every value is lower case.

```python
df["commodity_transaction"].str.lower()
```

```text
0           additives and oxygenates - exports
1           additives and oxygenates - exports
2           additives and oxygenates - exports
3           additives and oxygenates - exports
4           additives and oxygenates - exports
                          ...                 
1189477    electricity - total wind production
1189478    electricity - total wind production
1189479    electricity - total wind production
1189480    electricity - total wind production
1189481    electricity - total wind production
Name: commodity_transaction, Length: 1189482, dtype: object
```

The `commodity_transaction` column uses symbols such as `-` to separate words. pandas can count how many times this symbol appears in each distinct description:

```python
# First drop repeated descriptions, then count the "-" symbols in each one,
# and finally show how many descriptions have each count
df["commodity_transaction"].drop_duplicates().str.count("-").value_counts()
```

```text
1    1845
2     538
0      57
3      12
Name: commodity_transaction, dtype: int64
```
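The same chain can be checked on a handful of invented descriptions. Note that `str.count()` treats its argument as a regular expression. A plain hyphen works as-is, but characters such as `.` or `(` would need escaping.

```python
import pandas as pd

# Hypothetical descriptions with 0, 1 and 2 hyphens
descriptions = pd.Series([
    "Electricity - Gross production",
    "Electricity - Gross production",  # duplicate, removed below
    "Additives and Oxygenates - Exports",
    "Electricity - total tide, wave - production",
    "Uranium production",
])

hyphen_counts = (
    descriptions.drop_duplicates()  # keep each description once
    .str.count("-")                 # hyphens per description
    .value_counts()                 # how many descriptions have 0, 1, 2... hyphens
)
print(hyphen_counts)
```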

### Exercise 1

Filter the dataset to the descriptions that do not contain the `-` symbol.

```python
df["commodity_transaction"][df["commodity_transaction"].str.count("-") == 0].drop_duplicates().head()
```

```text
533715                 From chemical sources – Autoproducer
533859    From chemical sources – Autoproducer – CHP plants
533920    From chemical sources – Autoproducer – Heat pl...
534028                From combustible fuels – Autoproducer
534664    From combustible fuels – Autoproducer – CHP pl...
Name: commodity_transaction, dtype: object
```

At first glance the condition seems not to hold. These descriptions, however, use longer dashes as well as hyphens (`-`): the rows above contain en dashes (`–`, U+2013), and em dashes (U+2014) may appear too. To handle this, every dash variant can be replaced with a hyphen using `str.replace()`:

```python
# Lower-case the text and replace en dashes (\u2013) and em dashes (\u2014) with hyphens
df["clean_transaction"] = (
    df["commodity_transaction"].str.lower().str.replace("[\u2013\u2014]", "-", regex=True)
)
df["clean_transaction"].unique()
```

```text
array(['additives and oxygenates - exports',
       'additives and oxygenates - imports',
       'additives and oxygenates - production', ...,
       'white spirit and special boiling point industrial spirits - transformation',
       'white spirit and special boiling point industrial spirits - transformation in petrochemical plants',
       'electricity - total wind production'], dtype=object)
```
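Here is a minimal sketch of the normalization on invented strings. It assumes the only dash variants present are the en dash (U+2013) and the em dash (U+2014). The `\u` escapes keep both characters visible in the source:

```python
import pandas as pd

# Hypothetical descriptions mixing a hyphen, an en dash (U+2013) and an em dash (U+2014)
raw = pd.Series([
    "Electricity - Gross production",
    "From chemical sources \u2013 Autoproducer",
    "From combustible fuels \u2014 Autoproducer",
])

# The character class [\u2013\u2014] matches either dash; regex=True makes this explicit
clean = raw.str.lower().str.replace("[\u2013\u2014]", "-", regex=True)

print(clean.tolist())
print(clean.str.count("-").tolist())  # every description now has exactly one hyphen
```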

### Selecting rows of interest

Rows can also be filtered according to the words that appear in their descriptions. For example, the following keeps the descriptions that contain the word "import":

```python
df["clean_transaction"][df["clean_transaction"].str.contains("import")].drop_duplicates()
```

```text
1108326    electricity - imports
Name: clean_transaction, dtype: object
```
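Matching is case-sensitive by default, so the search above relies on the lower-cased `clean_transaction` column. An alternative sketch, on invented data, searches the original text directly. It uses `case=False` to ignore case and `na=False` so that missing descriptions count as non-matches:

```python
import pandas as pd

# Hypothetical transaction descriptions, including a missing value
transactions = pd.Series([
    "Electricity - imports",
    "Additives and Oxygenates - Imports",
    "Electricity - exports",
    None,
])

# case=False ignores upper/lower case; na=False treats missing values as "no match"
mask = transactions.str.contains("import", case=False, na=False)
print(transactions[mask].tolist())
```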

```python
df.head()
```

|   | country_or_area | commodity_transaction | year | unit | quantity | quantity_footnotes | category | clean_transaction |
|---|---|---|---|---|---|---|---|---|
| 0 | Austria | Additives and Oxygenates - Exports | 1996 | Metric tons, thousand | 5.0 | NaN | additives_and_oxygenates | additives and oxygenates - exports |
| 1 | Austria | Additives and Oxygenates - Exports | 1995 | Metric tons, thousand | 17.0 | NaN | additives_and_oxygenates | additives and oxygenates - exports |
| 2 | Belgium | Additives and Oxygenates - Exports | 2014 | Metric tons, thousand | 0.0 | NaN | additives_and_oxygenates | additives and oxygenates - exports |
| 3 | Belgium | Additives and Oxygenates - Exports | 2013 | Metric tons, thousand | 0.0 | NaN | additives_and_oxygenates | additives and oxygenates - exports |
| 4 | Belgium | Additives and Oxygenates - Exports | 2012 | Metric tons, thousand | 35.0 | NaN | additives_and_oxygenates | additives and oxygenates - exports |


Filtering rows is useful, but it is also worth studying the electricity-related values of `commodity_transaction` in more detail, in particular those for renewable sources.

```python
# Electricity transactions of interest: demand, production, trade and renewable sources
keep_values = [
    "Electricity - Gross demand",
    "Electricity - Gross production",
    "Electricity - imports",
    "Electricity - exports",
    "Electricity - total hydro production",
    "Electricity - total wind production",
    "Electricity - total solar production",
    "Electricity - total geothermal production",
    "Electricity - total tide, wave production",
]

# Keep only these transactions
df_filtered = df[df["commodity_transaction"].isin(keep_values)]

# Pivot on commodity_transaction: its values become columns, keeping only `quantity`
# (pivot_table averages any duplicate country/year/transaction rows by default)
df_countries = pd.pivot_table(
    df_filtered,
    values="quantity",
    index=["country_or_area", "year"],
    columns="commodity_transaction",
)

df_countries.head(3)
```

| country_or_area | year | Electricity - Gross demand | Electricity - Gross production | Electricity - exports | Electricity - imports | Electricity - total geothermal production | Electricity - total hydro production | Electricity - total solar production | Electricity - total tide, wave production | Electricity - total wind production |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 1990 | 1055.0 | 1128.0 | NaN | NaN | NaN | 764.0 | NaN | NaN | NaN |
| Afghanistan | 1991 | 945.0 | 1015.0 | NaN | NaN | NaN | 690.0 | NaN | NaN | NaN |
| Afghanistan | 1992 | 789.0 | 703.0 | NaN | 131.0 | NaN | 478.0 | NaN | NaN | NaN |
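To see what the filter and the pivot do without loading the full file, here is a sketch on a few invented long-format rows. `isin()` keeps only the listed transactions, and `pivot_table()` turns each transaction into a column. When several rows share a cell it averages them by default (`aggfunc="mean"`), and cells with no data become NaN.

```python
import pandas as pd

# Invented long-format rows: one row per country, year and transaction
long_df = pd.DataFrame({
    "country_or_area": ["A", "A", "A", "B", "B"],
    "year": [2014, 2014, 2013, 2014, 2014],
    "commodity_transaction": [
        "Electricity - Gross production",
        "Electricity - total hydro production",
        "Electricity - Gross production",
        "Electricity - Gross production",
        "Electricity - imports",  # not in `keep`, filtered out below
    ],
    "quantity": [100.0, 40.0, 90.0, 50.0, 5.0],
})

keep = ["Electricity - Gross production", "Electricity - total hydro production"]

wide = pd.pivot_table(
    long_df[long_df["commodity_transaction"].isin(keep)],
    values="quantity",
    index=["country_or_area", "year"],
    columns="commodity_transaction",
)
print(wide)  # B has no hydro row, so that cell is NaN
```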


```python
# Give the columns short names (the list must follow the current column order)
df_countries.columns = ["demand", "production", "exports", "imports", "geothermal", "hydro", "solar", "tide", "wind"]

df_countries
```

| country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 1990 | 1055.0 | 1128.0 | NaN | NaN | NaN | 764.0 | NaN | NaN | NaN |
| Afghanistan | 1991 | 945.0 | 1015.0 | NaN | NaN | NaN | 690.0 | NaN | NaN | NaN |
| Afghanistan | 1992 | 789.0 | 703.0 | NaN | 131.0 | NaN | 478.0 | NaN | NaN | NaN |
| Afghanistan | 1993 | 780.0 | 695.0 | NaN | 130.0 | NaN | 475.0 | NaN | NaN | NaN |
| Afghanistan | 1994 | 770.0 | 687.0 | NaN | 128.0 | NaN | 472.0 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Zimbabwe | 2010 | 9317.3 | 8602.9 | 694.4 | 1681.7 | NaN | 5762.8 | NaN | NaN | NaN |
| Zimbabwe | 2011 | 9645.5 | 9177.2 | 988.2 | 1578.7 | NaN | 5201.8 | NaN | NaN | NaN |
| Zimbabwe | 2012 | 9425.2 | 9148.6 | 700.9 | 1076.1 | NaN | 5387.3 | NaN | NaN | NaN |
| Zimbabwe | 2013 | 9919.7 | 9498.8 | 1189.3 | 1722.0 | NaN | 4981.8 | NaN | NaN | NaN |
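Assigning a list to `.columns` works only because the pivot sorts the transaction names, so the list must match that order exactly. Upper-case letters sort first, which is why "Gross" comes before "exports". A more defensive alternative is `rename()` with an explicit mapping. The two-column table in the sketch below is invented, and `rename()` silently skips keys that are not present:

```python
import pandas as pd

# Invented wide table with the original transaction names as columns
wide = pd.DataFrame(
    {"Electricity - Gross production": [100.0], "Electricity - total hydro production": [40.0]},
    index=pd.MultiIndex.from_tuples([("A", 2014)], names=["country_or_area", "year"]),
)

# Map each original name to its short name; unlike assigning a list to .columns,
# this does not depend on the order in which the columns appear
short_names = {
    "Electricity - Gross demand": "demand",
    "Electricity - Gross production": "production",
    "Electricity - exports": "exports",
    "Electricity - imports": "imports",
    "Electricity - total geothermal production": "geothermal",
    "Electricity - total hydro production": "hydro",
    "Electricity - total solar production": "solar",
    "Electricity - total tide, wave production": "tide",
    "Electricity - total wind production": "wind",
}
renamed = wide.rename(columns=short_names)
print(renamed.columns.tolist())  # ['production', 'hydro']
```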


We may want to rank the countries by production. The `sort_values()` method does this when the column name is passed through its `by` argument:

```python
df_countries = df_countries.sort_values(by="production", ascending=False)  # Sort by production, descending
df_countries.head()
```

| country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind |
|---|---|---|---|---|---|---|---|---|---|---|
| China | 2014 | 5219096.0 | 5649583.4 | 18158.0 | 6750.0 | NaN | 1064337.0 | 15189.0 | NaN | 156078.0 |
| China | 2013 | 5016127.0 | 5431637.4 | 18669.0 | 7438.0 | NaN | 920291.0 | 5564.0 | NaN | 141197.0 |
| China | 2012 | 4609729.0 | 4987553.0 | 17653.0 | 6874.0 | NaN | 872107.0 | NaN | NaN | 95978.0 |
| China | 2011 | 4319132.0 | 4713019.0 | 19307.0 | 6562.0 | NaN | 698945.0 | NaN | NaN | 70331.0 |
| United States | 2010 | 4153664.0 | 4378422.0 | 19107.0 | 45083.0 | 17577.0 | 286333.0 | 3934.0 | NaN | 95148.0 |


To turn the two index levels (`country_or_area` and `year`) back into regular columns, use the `reset_index()` method:

```python
df_countries = df_countries.reset_index()
df_countries.head()
```

|   | country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 2014 | 5219096.0 | 5649583.4 | 18158.0 | 6750.0 | NaN | 1064337.0 | 15189.0 | NaN | 156078.0 |
| 1 | China | 2013 | 5016127.0 | 5431637.4 | 18669.0 | 7438.0 | NaN | 920291.0 | 5564.0 | NaN | 141197.0 |
| 2 | China | 2012 | 4609729.0 | 4987553.0 | 17653.0 | 6874.0 | NaN | 872107.0 | NaN | NaN | 95978.0 |
| 3 | China | 2011 | 4319132.0 | 4713019.0 | 19307.0 | 6562.0 | NaN | 698945.0 | NaN | NaN | 70331.0 |
| 4 | United States | 2010 | 4153664.0 | 4378422.0 | 19107.0 | 45083.0 | 17577.0 | 286333.0 | 3934.0 | NaN | 95148.0 |


This new DataFrame is easier to work with. Each row holds the `quantity` values for one country and year, with one column per transaction.

```python
# The number of countries/areas with data varies by year: 197 in 1990 and 1991 versus 229 in
# 2012-2014 (possibly because states were created or dissolved, or started reporting later)
df_countries["year"].value_counts()
```

```text
2012    229
2013    229
2014    229
2011    226
2010    226
2009    226
2007    226
2008    226
2006    225
2005    225
2002    224
2004    224
2003    224
2001    221
1994    220
1996    220
1995    220
1999    220
1997    220
2000    220
1998    220
1992    219
1993    219
1991    197
1990    197
Name: year, dtype: int64
```
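`value_counts()` orders its result by frequency, which is why the years above appear shuffled. Chaining `sort_index()` lists them chronologically instead. A sketch on invented rows:

```python
import pandas as pd

# Invented (country, year) rows after the pivot and reset_index()
rows = pd.DataFrame({
    "country_or_area": ["A", "B", "A", "B", "C"],
    "year": [1990, 1990, 2014, 2014, 2014],
})

# sort_index() orders the counts by year instead of by frequency
reports_per_year = rows["year"].value_counts().sort_index()
print(reports_per_year)
```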

## Exploring the growth of energy and renewable energy production

So far, the data have been filtered down to the electricity produced by each country, including the amount from each renewable source. Next, additional information is added in search of interesting patterns for the analysis.

### Exercise 2

Replace the missing values in the filtered DataFrame with 0.

```python
df_countries = df_countries.fillna(0)
```

Next, we add the total amount of renewable energy produced and its share, i.e. the fraction of total production that comes from renewable sources. Despite its name, `renewable_percent` is stored as a proportion (for example, 0.218707 for China in 2014) rather than multiplied by 100.

```python
# .sum(axis="columns") adds across each row (horizontally) instead of down each column (the default)
df_countries["renewable_total"] = df_countries[["hydro", "wind", "solar", "geothermal", "tide"]].sum(axis="columns")
df_countries["renewable_percent"] = df_countries["renewable_total"] / df_countries["production"]
df_countries
```

|   | country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind | renewable_total | renewable_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 2014 | 5219096.0 | 5649583.4 | 18158.0 | 6750.0 | 0.0 | 1064337.0 | 15189.0 | 0.0 | 156078.0 | 1235604.0 | 0.218707 |
| 1 | China | 2013 | 5016127.0 | 5431637.4 | 18669.0 | 7438.0 | 0.0 | 920291.0 | 5564.0 | 0.0 | 141197.0 | 1067052.0 | 0.196451 |
| 2 | China | 2012 | 4609729.0 | 4987553.0 | 17653.0 | 6874.0 | 0.0 | 872107.0 | 0.0 | 0.0 | 95978.0 | 968085.0 | 0.194100 |
| 3 | China | 2011 | 4319132.0 | 4713019.0 | 19307.0 | 6562.0 | 0.0 | 698945.0 | 0.0 | 0.0 | 70331.0 | 769276.0 | 0.163224 |
| 4 | United States | 2010 | 4153664.0 | 4378422.0 | 19107.0 | 45083.0 | 17577.0 | 286333.0 | 3934.0 | 0.0 | 95148.0 | 402992.0 | 0.092040 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5527 | Lesotho | 1994 | 310.0 | 0.0 | 0.0 | 310.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 5528 | Lesotho | 1995 | 324.0 | 0.0 | 0.0 | 324.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 5529 | Lesotho | 1996 | 335.0 | 0.0 | 0.0 | 335.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 5530 | Lesotho | 1997 | 395.0 | 0.0 | 0.0 | 395.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
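The rows for Lesotho show `NaN` because their production is 0, and 0 / 0 is undefined. A small sketch with invented numbers reproduces this. It also shows how to express the share on a 0-100 scale if a true percentage is preferred; the `renewable_pct_100` column name is hypothetical.

```python
import pandas as pd

# Invented rows; the second mimics a country that produces nothing and only imports
toy = pd.DataFrame({
    "production": [100.0, 0.0],
    "hydro": [40.0, 0.0],
    "wind": [10.0, 0.0],
})

toy["renewable_total"] = toy[["hydro", "wind"]].sum(axis="columns")
# 0.0 / 0.0 gives NaN in pandas, so rows without production have no defined share
toy["renewable_percent"] = toy["renewable_total"] / toy["production"]

# Hypothetical extra column: the same share on a 0-100 scale
toy["renewable_pct_100"] = toy["renewable_percent"] * 100
print(toy)
```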


### Exercise 3

Considering only the most recent year (2014), which 5 countries have the highest share of renewable production, and which 5 have the lowest?

```python
# Top 5
df_countries[df_countries["year"] == 2014].sort_values(by="renewable_percent", ascending=False).head(5)
```

|   | country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind | renewable_total | renewable_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2655 | Albania | 2014 | 7791.43 | 4724.43 | 183.45 | 3250.45 | 0.0 | 4724.43 | 0.0 | 0.0 | 0.0 | 4724.43 | 1.0 |
| 3924 | Lesotho | 2014 | 783.48 | 515.2 | 2.92 | 271.2 | 0.0 | 515.2 | 0.0 | 0.0 | 0.0 | 515.2 | 1.0 |
| 2357 | Bhutan | 2014 | 2085.46 | 7003.86 | 4991.9 | 187.37 | 0.0 | 7003.36 | 0.0 | 0.0 | 0.0 | 7003.36 | 0.999929 |
| 1008 | Paraguay | 2014 | 13432.0 | 55282.3 | 41400.1 | 0.0 | 0.0 | 55276.4 | 0.0 | 0.0 | 0.0 | 55276.4 | 0.999893 |
| 1704 | Iceland | 2014 | 17475.0 | 18122.0 | 0.0 | 0.0 | 5238.0 | 12873.0 | 0.0 | 0.0 | 8.0 | 18119.0 | 0.999834 |


```python
# Bottom 5
df_countries[df_countries["year"] == 2014].sort_values(by="renewable_percent").head(5)
```

|   | country_or_area | year | demand | production | exports | imports | geothermal | hydro | solar | tide | wind | renewable_total | renewable_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4368 | Chad | 2014 | 206.0 | 225.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2105 | Trinidad and Tobago | 2014 | 9531.0 | 9891.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4020 | Northern Mariana Islands | 2014 | 418.9 | 418.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4434 | Grenada | 2014 | 194.0 | 199.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4044 | Djibouti | 2014 | 396.0 | 402.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
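`nlargest()` and `nsmallest()` are a compact alternative to `sort_values()` followed by `head()`. Many countries tie at a share of 0.0 in 2014, so which five appear in the bottom list is somewhat arbitrary. Passing `keep="all"` returns every row tied at the cut-off, even if that means more than five. A sketch on invented data:

```python
import pandas as pd

# Invented 2014 shares; D has no value (NaN)
shares_2014 = pd.DataFrame({
    "country_or_area": ["A", "B", "C", "D"],
    "renewable_percent": [1.0, 0.25, 0.0, None],
})

top = shares_2014.nlargest(2, "renewable_percent")      # highest shares
bottom = shares_2014.nsmallest(2, "renewable_percent")  # lowest shares
print(top["country_or_area"].tolist(), bottom["country_or_area"].tolist())
```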


### Exercise 4

Repeat the previous analysis, this time for the countries above the 90th percentile of production and those below the 10th percentile.

```python
umbral = df_countries["production"].quantile(0.9)  # 90th percentile of production over all years
df_countries[(df_countries["year"] == 2014) & (df_countries["production"] > umbral)].sort_values(
    by="renewable_percent", ascending=False
).head(5)
```

Both parts of the mask must come from `df_countries`. Suppose the year condition is built from `df["year"]` instead, i.e. from the original long DataFrame, whose index labels are unrelated to those of `df_countries`. pandas then aligns the mask by label and warns that the Boolean Series key will be reindexed. The rows returned come from arbitrary years (for example Brazil 2004 and Canada 1994) rather than from 2014.
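The exercise also asks about the countries below the 10th percentile. The sketch below returns both groups for a given year. `extremes_by_production` is a hypothetical helper. Like the cell above, it computes the percentiles over all years of `production`; computing them over 2014 alone would be an equally reasonable choice. The usage lines run it on invented data:

```python
import pandas as pd


def extremes_by_production(data, year, low_q=0.1, high_q=0.9, n=5):
    """Hypothetical helper: rank the largest and smallest producers of `year` by renewable share."""
    # Thresholds over all years of `production`, as in the cell above
    high = data["production"].quantile(high_q)
    low = data["production"].quantile(low_q)
    # Every condition is built from `data` itself, so the boolean masks share its index
    in_year = data["year"] == year
    large = data[in_year & (data["production"] > high)]
    small = data[in_year & (data["production"] < low)]
    return (
        large.sort_values(by="renewable_percent", ascending=False).head(n),
        small.sort_values(by="renewable_percent", ascending=False).head(n),
    )


# Invented data: ten countries in 2014 with production 1..10
toy = pd.DataFrame({
    "country_or_area": list("ABCDEFGHIJ"),
    "year": [2014] * 10,
    "production": [float(p) for p in range(1, 11)],
    "renewable_percent": [0.5, 0.2, 0.3, 0.1, 0.0, 0.4, 0.6, 0.7, 0.8, 0.9],
})

large, small = extremes_by_production(toy, 2014)
print(large[["country_or_area", "production", "renewable_percent"]])
print(small[["country_or_area", "production", "renewable_percent"]])
```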


These results raise the question of how the share of renewable energy has changed over time. `pivot_table()` can be used again for this, now with the years as columns:

```python
cambios_renovables = pd.pivot_table(df_countries, values="renewable_percent", columns="year", index=["country_or_area"])
cambios_renovables = cambios_renovables.reset_index()
cambios_renovables[["country_or_area", 2014, 1990]]
```

|   | country_or_area | 2014 | 1990 |
|---|---|---|---|
| 0 | Afghanistan | 0.853235 | 0.677305 |
| 1 | Albania | 1.000000 | 0.876134 |
| 2 | Algeria | 0.003954 | 0.008383 |
| 3 | American Samoa | 0.007009 | 0.000000 |
| 4 | Andorra | 0.894322 | 1.000000 |
| ... | ... | ... | ... |
| 236 | Yemen Arab Rep. (former) | NaN | 0.000000 |
| 237 | Yemen, Dem. (former) | NaN | 0.000000 |
| 238 | Yugoslavia, SFR (former) | NaN | 0.242374 |
| 239 | Zambia | 0.971630 | 0.994853 |


```python
# Add the difference between the last year (2014) and the first (1990)
cambios_renovables["diff"] = cambios_renovables[2014] - cambios_renovables[1990]
cambios_renovables.sort_values(by="diff", ascending=False).head(5)
```

|   | country_or_area | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | ... | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 86 | Greenland | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.511877 | 0.524299 | 0.520495 | 0.462766 | 0.555153 | 0.597898 | 0.600357 | 0.664492 | 0.683475 | 0.683475 |
| 185 | Sierra Leone | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.101942 | 0.3 | 0.690148 | 0.666609 | 0.873821 | 0.873776 | 0.660995 | 0.641097 | 0.653569 | 0.653569 |
| 75 | French Guiana | 0.0 | 0.0 | 0.0 | 0.0 | 0.215247 | 0.288889 | 0.422222 | 0.488889 | 0.116998 | ... | 0.734637 | 0.671855 | 0.650307 | 0.60355 | 0.538991 | 0.572738 | 0.695652 | 0.604757 | 0.605495 | 0.605495 |
| 20 | Belize | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.27027 | 0.401316 | 0.413174 | 0.415094 | ... | 0.847718 | 0.822754 | 0.947476 | 0.792914 | 0.63381 | 0.629416 | 0.49194 | 0.55813 | 0.507055 | 0.507055 |
| 58 | Denmark | 0.024555 | 0.020933 | 0.030679 | 0.031259 | 0.028833 | 0.032836 | 0.023254 | 0.044071 | 0.069248 | ... | 0.134463 | 0.183157 | 0.189999 | 0.185361 | 0.201637 | 0.27835 | 0.338458 | 0.33527 | 0.42538 | 0.400824 |


```python
# Repeat the analysis for the largest producers only (total production above the 90th percentile)
threshold = df_countries.production.quantile(0.9)
df_countries_large = df_countries[df_countries.production > threshold]

renewable_change = pd.pivot_table(
    df_countries_large, values="renewable_percent", index=["country_or_area"], columns="year",
).reset_index()[["country_or_area", 1990, 2014]]

renewable_change["diff"] = renewable_change[2014] - renewable_change[1990]
renewable_change.sort_values(by="diff", ascending=False).head(5)
```

|   | country_or_area | 1990 | 2014 | diff |
|---|---|---|---|---|
| 23 | Spain | 0.172486 | 0.389797 | 0.217312 |
| 12 | Italy | 0.176856 | 0.370429 | 0.193573 |
| 29 | United Kingdom | 0.022512 | 0.132286 | 0.109774 |
| 19 | Poland | 0.024305 | 0.065491 | 0.041187 |
| 1 | Australia | 0.095988 | 0.135047 | 0.039059 |


```python
renewable_change.sort_values(by="diff").head(5)
```

|   | country_or_area | 1990 | 2014 | diff |
|---|---|---|---|---|
| 2 | Brazil | 0.927691 | 0.653072 | -0.274618 |
| 9 | India | 0.247679 | 0.123335 | -0.124344 |
| 24 | Sweden | 0.498512 | 0.48908 | -0.009432 |
| 13 | Japan | 0.115881 | 0.114409 | -0.001472 |
| 3 | Canada | 0.615727 | 0.620037 | 0.00431 |


### Major energy importers and exporters

Next, we want to identify the countries with the largest energy exports and imports. Answer:

1. Which countries have the largest total exports and imports?
2. What percentage of production is exported?
3. What percentage of energy demand is imported?

**1. Which countries have the largest total exports and imports?**

```python
# Top importers (2014)
df_countries_2014 = df_countries[df_countries["year"] == 2014]
top_importers = df_countries_2014.sort_values(by="imports", ascending=False)
top_importers[["country_or_area", "imports"]].head(10)
```

|   | country_or_area | imports |
|---|---|---|
| 8 | United States | 66511.0 |
| 287 | Italy | 46747.0 |
| 122 | Germany | 40435.0 |
| 138 | Brazil | 33778.0 |
| 658 | Netherlands | 32855.0 |
| 818 | Switzerland | 28530.0 |
| 882 | Austria | 26712.0 |
| 241 | United Kingdom | 23244.0 |
| 810 | Belgium | 21791.0 |
| 854 | Finland | 21622.0 |


```python
# Top exporters (2014)
top_exporters = df_countries_2014.sort_values(by="exports", ascending=False)
top_exporters[["country_or_area", "exports"]].head(10)
```

|   | country_or_area | exports |
|---|---|---|
| 158 | France | 75063.0 |
| 122 | Germany | 74320.0 |
| 111 | Canada | 58421.0 |
| 1008 | Paraguay | 41400.1 |
| 818 | Switzerland | 34021.0 |
| 493 | Sweden | 29475.0 |
| 723 | Czechia | 28142.0 |
| 536 | Norway | 21932.0 |
| 0 | China | 18158.0 |
| 658 | Netherlands | 18128.0 |
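The question asks for the largest importers and exporters "in total", while the cells above look at 2014 only. One possible reading is the total over all years, which a `groupby()` sum provides. A sketch on invented rows:

```python
import pandas as pd

# Invented country-year rows
trade = pd.DataFrame({
    "country_or_area": ["A", "A", "B", "B"],
    "year": [2013, 2014, 2013, 2014],
    "imports": [10.0, 20.0, 5.0, 50.0],
    "exports": [25.0, 20.0, 30.0, 0.0],
})

# Sum over all years per country, then rank
totals = trade.groupby("country_or_area")[["imports", "exports"]].sum()
top_importers_total = totals.sort_values("imports", ascending=False)
top_exporters_total = totals.sort_values("exports", ascending=False)
print(top_importers_total.head(10))
print(top_exporters_total.head(10))
```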


**2. What percentage of production is exported?**

```python
# Exports as a fraction of production
top_exporters["prod_export"] = top_exporters["exports"] / top_exporters["production"]
top_exporters[["country_or_area", "prod_export"]].head(10)
```

|   | country_or_area | prod_export |
|---|---|---|
| 158 | France | 0.13338 |
| 122 | Germany | 0.118383 |
| 111 | Canada | 0.089026 |
| 1008 | Paraguay | 0.748885 |
| 818 | Switzerland | 0.474048 |
| 493 | Sweden | 0.191817 |
| 723 | Czechia | 0.327141 |
| 536 | Norway | 0.154096 |
| 0 | China | 0.003214 |
| 658 | Netherlands | 0.175289 |


**3. What percentage of energy demand is imported?**

```python
# Imports as a fraction of demand
top_importers["demand_import"] = top_importers["imports"] / top_importers["demand"]
top_importers[["country_or_area", "demand_import"]].sort_values(by="demand_import", ascending=False).head(10)
```

|   | country_or_area | demand_import |
|---|---|---|
| 2920 | Luxembourg | 1.098643 |
| 4145 | State of Palestine | 0.936188 |
| 5186 | Liechtenstein | 0.897799 |
| 3833 | China, Macao SAR | 0.872499 |
| 4474 | Benin | 0.856809 |
| 4641 | Jersey | 0.834362 |
| 3531 | Afghanistan | 0.827048 |
| 3603 | Republic of Moldova | 0.799904 |
| 2693 | Lithuania | 0.787232 |
| 4670 | Andorra | 0.766052 |


## Using apply() to apply functions to rows and columns

`apply()` is a pandas method that calls a given function on every element of a Series, or on every row or column of a DataFrame, without having to access the elements explicitly. In that respect it resembles `sum()`, which also operates on a whole column at once.

### Example

The following rules label countries according to their exports:

* If a country exports less than 500 units per year, it is labeled "Less than 500"
* If it exports between 500 and 5,000 units per year, it is labeled "Between 500 and 5,000"
* If it exports between 5,000 and 50,000 units per year, it is labeled "Between 5,000 and 50,000"
* If it exports more than 50,000 units per year, it is labeled "More than 50,000"

```python
def assign_label(value):
    """Assign a label to a country according to its reported exports."""
    if value < 500:
        label = "Less than 500"
    elif value < 5000:
        label = "Between 500 and 5,000"
    elif value < 50000:
        label = "Between 5,000 and 50,000"
    else:
        label = "More than 50,000"
    return label
```

```python
df_countries_2014 = df_countries[df_countries["year"] == 2014]
```

```python
df_countries_2014["exports"].apply(assign_label)
```

```text
0       Between 5,000 and 50,000
8       Between 5,000 and 50,000
42                 Less than 500
58      Between 5,000 and 50,000
65                 Less than 500
                  ...           
5372               Less than 500
5380               Less than 500
5441               Less than 500
5474               Less than 500
5489               Less than 500
Name: exports, Length: 229, dtype: object
```
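For fixed numeric ranges like these, `pd.cut()` is a vectorized alternative to `apply()`. The sketch below assumes the same thresholds as `assign_label`. `right=False` closes each interval on the left, so exactly 500 falls in "Between 500 and 5,000", which matches the `value < 500` test:

```python
import pandas as pd

# Invented export values, including the boundary cases 500 and 5,000
exports = pd.Series([0.0, 499.0, 500.0, 4999.9, 5000.0, 75063.0])

bins = [float("-inf"), 500, 5000, 50000, float("inf")]
labels = ["Less than 500", "Between 500 and 5,000", "Between 5,000 and 50,000", "More than 50,000"]

# right=False builds intervals like [500, 5000), mirroring the `value < 5000` checks
categories = pd.cut(exports, bins=bins, labels=labels, right=False)
print(categories.tolist())
```

One behavioral difference is worth noting. `pd.cut()` leaves a missing value as NaN. In `assign_label`, a NaN fails every `<` comparison and would therefore be labeled "More than 50,000". That case does not arise here, because `fillna(0)` was applied earlier.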

## Final remarks

This case covered some basic pandas functionality, specifically:

1. Working with large datasets and exploring the columns (variables) of interest
2. Pivoting between rows and columns
3. Using `apply()` to build and apply functions