# Data with Pandas II

Use the Pandas library to filter data, and understand how dataframe copies and the inplace parameter work.

In [1]:
# Import the Pandas library. If it is not installed, run --> pip install pandas
import pandas as pd

## Dataset: Winter Olympic medals
This dataset is available on the web and lists the Winter Olympic medal winners between 1924 and 2006.

In [2]:
url = 'http://winterolympicsmedals.com/medals.csv'
df = pd.read_csv(url)
df

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


## Some useful functions for working with dataframes

**1. df.info():** Gives us information about the columns: null and non-null counts, data types, and memory usage, among others.

**2. df.head(N):** Shows the first rows of the dataframe. N is the number of rows; if omitted, 5 rows are shown.

**3. df.tail(N):** Shows the last rows of the dataframe. N is the number of rows; if omitted, 5 rows are shown.

**4. df.shape:** Returns a tuple with the number of rows and columns of the dataframe.

**5. df.describe():** For the **numeric columns** (int64, float64, etc.), returns statistics such as the minimum, maximum, mean and standard deviation, among others.

**6. df.columns:** Returns an Index object with the names of the dataframe's columns (list(df.columns) converts it to a list).

**7. df.drop([column_name] / [index], axis = 0/1):** Removes rows or columns and returns a new dataframe. With axis = 0 it removes rows by index label; with axis = 1 it removes columns by name.

**8. df[column].unique():** Returns the unique values of that column of the dataframe.
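
The attributes and methods listed above can be tried quickly on a tiny dataframe. The sketch below uses hypothetical toy data (the name `toy` and its values are assumptions for illustration, not part of the Olympic dataset) to show that `df.columns` is an Index and that `drop` leaves the original dataframe untouched.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration (not the Olympic dataset)
toy = pd.DataFrame({
    "Year": [1924, 1924, 1928],
    "NOC": ["AUT", "BEL", "CAN"],
    "Medal": ["Silver", "Bronze", "Gold"],
})

n_rows, n_cols = toy.shape            # tuple: (rows, columns)
column_names = list(toy.columns)      # toy.columns is an Index; list() converts it

# drop returns a new dataframe; toy itself is not modified
without_first_row = toy.drop([0], axis=0)
without_medal = toy.drop(["Medal"], axis=1)

unique_years = toy["Year"].unique()   # NumPy array with the distinct values
```

Because `drop` returns a new object, the result has to be assigned to a variable if it is needed later.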

In [3]:
# 1. Dataframe information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2311 entries, 0 to 2310
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Year          2311 non-null   int64 
 1   City          2311 non-null   object
 2   Sport         2311 non-null   object
 3   Discipline    2311 non-null   object
 4   NOC           2311 non-null   object
 5   Event         2311 non-null   object
 6   Event gender  2311 non-null   object
 7   Medal         2311 non-null   object
dtypes: int64(1), object(7)
memory usage: 144.6+ KB


In [4]:
# 2. Head of the dataframe
df.head()

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold


In [5]:
# 3. Tail of the dataframe
df.tail()

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold
2310,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,W,Silver


In [6]:
# 4. Number of rows and columns
df.shape

(2311, 8)

In [7]:
# 5. Description of the numeric variables
df.describe()

Unnamed: 0,Year
count,2311.0
mean,1980.361748
std,22.089091
min,1924.0
25%,1968.0
50%,1988.0
75%,1998.0
max,2006.0


In [8]:
# 6. Columns
df.columns

Index(['Year', 'City', 'Sport', 'Discipline', 'NOC', 'Event', 'Event gender',
       'Medal'],
      dtype='object')

In [9]:
# 7. Drop rows by index: remove rows 0 to 4
df.drop([0, 1, 2, 3, 4], axis=0)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
5,1924,Chamonix,Biathlon,Biathlon,FIN,military patrol,M,Silver
6,1924,Chamonix,Skating,Figure skating,FIN,pairs,X,Silver
7,1924,Chamonix,Skating,Speed skating,FIN,10000m,M,Gold
8,1924,Chamonix,Skating,Speed skating,FIN,10000m,M,Silver
9,1924,Chamonix,Skating,Speed skating,FIN,1500m,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


In [42]:
# 7. Drop columns by name
df.drop(['Sport', 'NOC'], axis=1)

Unnamed: 0,Year,City,Discipline,Event,Event gender,Medal
1727,1998,Nagano,Cross Country S,4x5km relay,W,Bronze
640,1968,Grenoble,Ski Jumping,K120 individual (90m),M,Bronze
641,1968,Grenoble,Bobsleigh,two-man,M,Bronze
642,1968,Grenoble,Bobsleigh,four-man,M,Bronze
643,1968,Grenoble,Alpine Skiing,downhill,M,Bronze
...,...,...,...,...,...,...
1139,1988,Calgary,Figure skating,individual,M,Silver
1136,1988,Calgary,Nordic Combined,individual,M,Silver
1134,1988,Calgary,Alpine Skiing,super-G,M,Silver
1088,1984,Sarajevo,Ice Hockey,ice hockey,M,Silver


In [45]:
#8. Unique
df['Discipline'].unique()

array(['Cross Country S', 'Ski Jumping', 'Bobsleigh', 'Alpine Skiing',
       'Figure skating', 'Biathlon', 'Speed skating', 'Freestyle Ski.',
       'Curling', 'Skeleton', 'Short Track S.', 'Snowboard', 'Luge',
       'Nordic Combined', 'Ice Hockey'], dtype=object)

# Data handling with Pandas

## 1. Create a subset of the data using columns

In [11]:
# Define the candidate columns
columnas = ['Year', 'Event', 'Medal']

# Create a subset with those columns
sub_set = df[columnas]
sub_set

Unnamed: 0,Year,Event,Medal
0,1924,individual,Silver
1,1924,individual,Gold
2,1924,pairs,Gold
3,1924,four-man,Bronze
4,1924,ice hockey,Gold
...,...,...,...
2306,2006,Half-pipe,Silver
2307,2006,Half-pipe,Gold
2308,2006,Half-pipe,Silver
2309,2006,Snowboard Cross,Gold


## 2. Create a subset of the data with conditions

In [12]:
# Example 1: the first 99 records (positions 0 to 98). The end of a slice is excluded, so df[0:100] would return the first 100

sub_set = df[0:99]
sub_set

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
...,...,...,...,...,...,...,...,...
94,1932,Lake Placid,Skating,Speed skating,CAN,10000m,M,Bronze
95,1932,Lake Placid,Skating,Speed skating,CAN,1500m,M,Bronze
96,1932,Lake Placid,Skating,Speed skating,CAN,1500m,M,Silver
97,1932,Lake Placid,Skating,Speed skating,CAN,5000m,M,Bronze


In [13]:
# Example 2: using a condition.

sub_set = df[df['Year']>=1980]
sub_set

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
896,1980,Lake Placid,Luge,Luge,AUT,doubles,X,Bronze
897,1980,Lake Placid,Skiing,Alpine Skiing,AUT,downhill,M,Gold
898,1980,Lake Placid,Skiing,Alpine Skiing,AUT,downhill,M,Silver
899,1980,Lake Placid,Skiing,Alpine Skiing,AUT,downhill,W,Gold
900,1980,Lake Placid,Skiing,Alpine Skiing,AUT,giant slalom,M,Bronze
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


In [14]:
# Example 3: using more than one condition
# Element-wise logical operators: AND --> &, OR ---> |. Each condition must be wrapped in parentheses
sub_set = df[(df['Year']>=1980) & (df['NOC']=='USA')]
sub_set

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
999,1980,Lake Placid,Ice Hockey,Ice Hockey,USA,ice hockey,M,Gold
1000,1980,Lake Placid,Skating,Figure skating,USA,individual,M,Bronze
1001,1980,Lake Placid,Skating,Figure skating,USA,individual,W,Silver
1002,1980,Lake Placid,Skating,Speed skating,USA,10000m,M,Gold
1003,1980,Lake Placid,Skating,Speed skating,USA,1000m,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold
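
The same pattern extends to the other logical operators. The following sketch, on hypothetical toy data (the `toy` dataframe and its values are assumptions for illustration), combines conditions with `|` (OR) and `~` (NOT), and uses `isin` as a shorthand for several equality checks on one column.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "Year": [1976, 1980, 1984, 2006],
    "NOC": ["USA", "AUT", "USA", "CAN"],
})

# OR: rows from before 1980 or belonging to Canada
before_1980_or_can = toy[(toy["Year"] < 1980) | (toy["NOC"] == "CAN")]

# NOT: rows that do not belong to the USA
not_usa = toy[~(toy["NOC"] == "USA")]

# isin: shorthand for several OR-ed equality checks on the same column
usa_or_can = toy[toy["NOC"].isin(["USA", "CAN"])]
```

The parentheses around each condition are needed because `&` and `|` bind more tightly than comparison operators such as `>=` and `==`.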


## 3. Create a subset of the data with LOC

In [15]:
# Let's look at the first 5 rows of the dataframe
df.head(5)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold


The df.loc[index, column] indexer lets us filter the data by index label and also select columns.

In [16]:
df.loc[3]

Year                 1924
City             Chamonix
Sport           Bobsleigh
Discipline      Bobsleigh
NOC                   BEL
Event            four-man
Event gender            M
Medal              Bronze
Name: 3, dtype: object

In [17]:
# With double brackets, loc returns a dataframe instead of a Series
df.loc[[3]]

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze


In [18]:
# Rows 3 to 5 (with loc, both ends of the slice are included)
df.loc[3:5]

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
5,1924,Chamonix,Biathlon,Biathlon,FIN,military patrol,M,Silver


In [19]:
# Rows 0, 3, 4 and 5
df.loc[[0, 3, 4, 5]]

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
5,1924,Chamonix,Biathlon,Biathlon,FIN,military patrol,M,Silver


In [20]:
# Rows 0, 1 and 4 + columns: Year, NOC and Event
df.loc[[0, 1, 4],['Year', 'NOC', 'Event']]

Unnamed: 0,Year,NOC,Event
0,1924,AUT,individual
1,1924,AUT,individual
4,1924,CAN,ice hockey


In [21]:
# Keep only the rows that correspond to Belgium
bel = df.loc[:, 'NOC'] == 'BEL'
sub_set_belgica = df.loc[bel]
sub_set_belgica

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
53,1928,St. Moritz,Skating,Figure skating,BEL,individual,M,Bronze
191,1948,St. Moritz,Bobsleigh,Bobsleigh,BEL,four-man,M,Silver
192,1948,St. Moritz,Skating,Figure skating,BEL,pairs,X,Gold
1638,1998,Nagano,Skating,Speed skating,BEL,5000m,M,Bronze


In [22]:
# The variable bel indicates, for each row of the dataframe, whether it meets the condition NOC == 'BEL'
bel

0       False
1       False
2       False
3        True
4       False
        ...  
2306    False
2307    False
2308    False
2309    False
2310    False
Name: NOC, Length: 2311, dtype: bool
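
A boolean mask like this one can also be counted and combined with a column selection. The sketch below uses hypothetical toy data (the names `toy`, `mask` and `bel_medals` are assumptions for illustration) to show both ideas.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "Year": [1924, 1928, 1948],
    "NOC": ["BEL", "AUT", "BEL"],
    "Medal": ["Bronze", "Gold", "Silver"],
})

mask = toy["NOC"] == "BEL"       # boolean Series, one value per row
n_matches = int(mask.sum())      # True counts as 1, so the sum is the number of matches

# Rows selected by the mask, columns selected by name, in a single loc call
bel_medals = toy.loc[mask, ["Year", "Medal"]]
```

Storing the mask in a variable makes it possible to reuse the same condition in several selections.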

## 4. Create a subset of the data with ILOC
The iloc indexer selects elements of a DataFrame based on their position.

Its syntax is dataframe.iloc[rows, columns]

Where **rows** and **columns** are the integer positions (starting at 0) of the rows and columns to select, in the order they appear in the object.
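
The difference between loc (labels) and iloc (positions) is easiest to see when the index labels do not match the row positions. The sketch below assumes a hypothetical toy dataframe whose index is 10, 20, 30, 40; it is not taken from the Olympic dataset.

```python
import pandas as pd

# Hypothetical toy data; the labels deliberately differ from the positions
toy = pd.DataFrame(
    {"City": ["Chamonix", "St. Moritz", "Lake Placid", "Turin"]},
    index=[10, 20, 30, 40],
)

by_label = toy.loc[20, "City"]     # row labelled 20
by_position = toy.iloc[1, 0]       # second row, first column: the same cell

label_slice = toy.loc[10:30]       # label slices include both endpoints
position_slice = toy.iloc[0:2]     # position slices exclude the end
```

In the Olympic dataframe the default index runs from 0 upwards, so labels and positions coincide until the rows are filtered or reordered.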

In [23]:
# Select the first 3 columns and all the rows
df.iloc[:, 0:3]

Unnamed: 0,Year,City,Sport
0,1924,Chamonix,Skating
1,1924,Chamonix,Skating
2,1924,Chamonix,Skating
3,1924,Chamonix,Bobsleigh
4,1924,Chamonix,Ice Hockey
...,...,...,...
2306,2006,Turin,Skiing
2307,2006,Turin,Skiing
2308,2006,Turin,Skiing
2309,2006,Turin,Skiing


In [24]:
# Select the first 10 rows and the first 3 columns of the dataframe
df.iloc[0:10, 0:3]

Unnamed: 0,Year,City,Sport
0,1924,Chamonix,Skating
1,1924,Chamonix,Skating
2,1924,Chamonix,Skating
3,1924,Chamonix,Bobsleigh
4,1924,Chamonix,Ice Hockey
5,1924,Chamonix,Biathlon
6,1924,Chamonix,Skating
7,1924,Chamonix,Skating
8,1924,Chamonix,Skating
9,1924,Chamonix,Skating


## 5. Copying dataframes
We often need to copy a variable in order to modify it without altering the original.

Let's see what happens when we try this with a dataframe.

In [25]:
# Recall our dataframe
df

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


In [26]:
# We attempt a copy with a plain assignment
copy_df = df
copy_df

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


In [27]:
# Let's edit a record in copy_df
copy_df['City'][0] = 'Buenos Aires'
copy_df.head(2)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  copy_df['City'][0] = 'Buenos Aires'


Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Buenos Aires,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold


In [28]:
# Now let's look at the original dataframe!
df.head(2)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Buenos Aires,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold


We can see that the update was applied to **both dataframes!!** The assignment copy_df = df does not copy anything: both names refer to the same dataframe object.

To work with an independent copy we must use the copy() method.
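
The difference between an alias and a real copy can be checked with Python's `is` operator. This is a minimal sketch on hypothetical toy data (the names `original`, `alias` and `independent` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
original = pd.DataFrame({"City": ["Chamonix", "Turin"]})

alias = original                 # a second name for the same object; no data is copied
independent = original.copy()    # copy() makes a deep copy by default

same_object = alias is original            # the two names point to one dataframe
separate_object = independent is original  # the copy is a different object

# Editing the copy leaves the original untouched
independent.loc[0, "City"] = "Buenos Aires"
```

Here `same_object` is True and `separate_object` is False, which is why edits made through an alias appear in the original while edits to the copy do not.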

In [29]:
# Reload the original dataframe
df = pd.read_csv(url)

# Create the copy of the dataframe
copy_df = df.copy(deep=True)

In [30]:
# Let's edit a record in copy_df
copy_df['City'][1] = 'Buenos Aires'
copy_df.head(2)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  copy_df['City'][1] = 'Buenos Aires'


Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Buenos Aires,Skating,Figure skating,AUT,individual,W,Gold


In [31]:
# Display the original dataframe again and check that this time it was not modified!
df.head(2)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold


### Avoiding the warning: A value is trying to be set on a copy of a slice from a DataFrame

If you were paying attention, a warning appeared each time we edited a value in the City column, in either of the two dataframes. Nevertheless, the operation completed without problems.

To assign a value at a specific position of the dataframe correctly, we should use loc.
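
As a minimal sketch of this recommendation, the example below assigns values with a single indexing operation on the dataframe itself. The toy data is hypothetical, and `.at` is shown only as the scalar-only counterpart of loc.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"Year": [1924, 1924], "City": ["Chamonix", "Chamonix"]})

# One indexing operation on the dataframe itself: row label 0, column 'Year'
toy.loc[0, "Year"] = 2030

# .at accesses a single cell by row label and column label
toy.at[1, "City"] = "Buenos Aires"
```

Chained indexing such as `toy['Year'][0] = ...` first extracts a column and then assigns into that intermediate object, which is what triggers the warning. Depending on the pandas version, that form may not modify the dataframe at all, so the single-step form is the safer habit.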

In [32]:
# Using chained assignment we reproduce the warning.
copy_df['Year'][0]=2022

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  copy_df['Year'][0]=2022


In [33]:
copy_df.head(2)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,2022,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Buenos Aires,Skating,Figure skating,AUT,individual,W,Gold


In [34]:
# Using loc, the new value is also assigned, but without the warning.
copy_df.loc[[0],['Year']] = 2030

In [35]:
copy_df.head(2)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,2030,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Buenos Aires,Skating,Figure skating,AUT,individual,W,Gold


## 6. Modifying a dataframe using the inplace parameter

We often need to modify a dataframe according to specific needs, for example:
* Deleting columns or rows
* Filling null values or modifying extreme values
* Sorting the data in ascending or descending order
* Etc.

In [36]:
# Example: we need to sort the dataframe by medal type
df.sort_values('Medal', ascending=True)

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1727,1998,Nagano,Skiing,Cross Country S,ITA,4x5km relay,W,Bronze
640,1968,Grenoble,Skiing,Ski Jumping,NOR,K120 individual (90m),M,Bronze
641,1968,Grenoble,Bobsleigh,Bobsleigh,ROU,two-man,M,Bronze
642,1968,Grenoble,Bobsleigh,Bobsleigh,SUI,four-man,M,Bronze
643,1968,Grenoble,Skiing,Alpine Skiing,SUI,downhill,M,Bronze
...,...,...,...,...,...,...,...,...
1139,1988,Calgary,Skating,Figure skating,CAN,individual,M,Silver
1136,1988,Calgary,Skiing,Nordic Combined,AUT,individual,M,Silver
1134,1988,Calgary,Skiing,Alpine Skiing,AUT,super-G,M,Silver
1088,1984,Sarajevo,Ice Hockey,Ice Hockey,TCH,ice hockey,M,Silver


In [37]:
# However, if we print df again we see that the row order we wanted was not stored
df

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
0,1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1,1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
2,1924,Chamonix,Skating,Figure skating,AUT,pairs,X,Gold
3,1924,Chamonix,Bobsleigh,Bobsleigh,BEL,four-man,M,Bronze
4,1924,Chamonix,Ice Hockey,Ice Hockey,CAN,ice hockey,M,Gold
...,...,...,...,...,...,...,...,...
2306,2006,Turin,Skiing,Snowboard,USA,Half-pipe,M,Silver
2307,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Gold
2308,2006,Turin,Skiing,Snowboard,USA,Half-pipe,W,Silver
2309,2006,Turin,Skiing,Snowboard,USA,Snowboard Cross,M,Gold


In [38]:
# One alternative is to keep a copy of the dataframe with the values sorted (sort_values already returns a new dataframe, so .copy() is optional)
copy = df.sort_values('Medal', ascending=True).copy()

In [39]:
copy

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1727,1998,Nagano,Skiing,Cross Country S,ITA,4x5km relay,W,Bronze
640,1968,Grenoble,Skiing,Ski Jumping,NOR,K120 individual (90m),M,Bronze
641,1968,Grenoble,Bobsleigh,Bobsleigh,ROU,two-man,M,Bronze
642,1968,Grenoble,Bobsleigh,Bobsleigh,SUI,four-man,M,Bronze
643,1968,Grenoble,Skiing,Alpine Skiing,SUI,downhill,M,Bronze
...,...,...,...,...,...,...,...,...
1139,1988,Calgary,Skating,Figure skating,CAN,individual,M,Silver
1136,1988,Calgary,Skiing,Nordic Combined,AUT,individual,M,Silver
1134,1988,Calgary,Skiing,Alpine Skiing,AUT,super-G,M,Silver
1088,1984,Sarajevo,Ice Hockey,Ice Hockey,TCH,ice hockey,M,Silver


**However, sorting the medals may be just the first of many tasks we perform on the dataframe.**

So, to keep the changes in the original dataframe, we add the parameter **inplace = True.**

In [40]:
# Try again
df.sort_values('Medal', ascending=True, inplace=True)

In [41]:
# Print the dataframe again and now we see that it is sorted!
df

Unnamed: 0,Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1727,1998,Nagano,Skiing,Cross Country S,ITA,4x5km relay,W,Bronze
640,1968,Grenoble,Skiing,Ski Jumping,NOR,K120 individual (90m),M,Bronze
641,1968,Grenoble,Bobsleigh,Bobsleigh,ROU,two-man,M,Bronze
642,1968,Grenoble,Bobsleigh,Bobsleigh,SUI,four-man,M,Bronze
643,1968,Grenoble,Skiing,Alpine Skiing,SUI,downhill,M,Bronze
...,...,...,...,...,...,...,...,...
1139,1988,Calgary,Skating,Figure skating,CAN,individual,M,Silver
1136,1988,Calgary,Skiing,Nordic Combined,AUT,individual,M,Silver
1134,1988,Calgary,Skiing,Alpine Skiing,AUT,super-G,M,Silver
1088,1984,Sarajevo,Ice Hockey,Ice Hockey,TCH,ice hockey,M,Silver


### The inplace = True parameter is commonly used with the following Pandas methods:

* dropna()
* drop_duplicates()
* fillna()
* query()
* rename()
* reset_index()
* sort_index()
* sort_values()
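
One detail worth keeping in mind with these methods is their return value. The sketch below, on hypothetical toy data (the names `toy`, `sorted_toy` and `result` are assumptions for illustration), contrasts the two styles using sort_values.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"Medal": ["Silver", "Gold", "Bronze"]})

# Without inplace, the method returns a new, sorted dataframe
sorted_toy = toy.sort_values("Medal")

# With inplace=True, the dataframe itself is modified and the call returns None
result = toy.sort_values("Medal", inplace=True)
```

Because the in-place call returns None, writing `df = df.sort_values('Medal', inplace=True)` would replace the dataframe with None. Either reassign the result without inplace, or use inplace=True without reassigning.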