# Introduction to Pandas

<div style="text-align: justify; font-size: 1.1rem;">
This notebook covers basic concepts for using Pandas in Python for data analysis and data science.

All the material collected here comes from Frank Andrade's introductory Data Science course on YouTube.

Link to the course:
<a href="https://www.youtube.com/watch?v=zAIWnwqHGok" target="_blank">Curso de Data Science en Python Desde Cero</a>

Note that the images summarizing the theory were taken from Frank Andrade's course itself.

</div>

In [1]:
import pandas as pd
import numpy as np



## Pandas vs. Excel and Terminology

![Image 1](./images/image1.png)

![Image 2](./images/image2.png)

![Image 3](./images/image3.png)

![Image 4](./images/image4.png)

![Image 5](./images/image5.png)

## Creating DataFrames

### Theory summary

![Image 7](./images/image7.png)

![Image 8](./images/image8.png)

![Image 9](./images/image9.png)

### From arrays

In [2]:
def createIdxAndFeaturesNames(rowNum, colNum):
    # build the column labels "Col1".."ColN" and the row labels "Id1".."IdM"
    features = ["Col" + str(col) for col in range(1, colNum + 1)]
    indexes = ["Id" + str(row) for row in range(1, rowNum + 1)]
    return (indexes, features)

#### List

In [3]:
# create the nested list
data_list = [
    [1, 4], # row 1
    [2, 5], # row 2
    [3, 4] # row 3
]

In [4]:
# create the DataFrame
list_cols = len(data_list[0])
list_rows = len(data_list)

indexes, features = createIdxAndFeaturesNames(list_rows, list_cols)

# if we do not pass index and column names, pandas labels them automatically with integers starting at 0
df_list = pd.DataFrame(data_list, index=indexes, columns=features)

In [5]:
# display the DataFrame
df_list

Unnamed: 0,Col1,Col2
Id1,1,4
Id2,2,5
Id3,3,4


#### Numpy

In [6]:
# create the array (a NumPy array is homogeneous: every element shares one dtype, numeric here)
data = np.array([
    [1, 4], # row 1
    [2, 5], # row 2
    [3, 4] # row 3
])

In [7]:
# create the DataFrame
rows, cols = data.shape

indexes, features = createIdxAndFeaturesNames(rows, cols)

# if we do not pass index and column names, pandas labels them automatically with integers starting at 0
df = pd.DataFrame(data, index=indexes, columns=features)

In [8]:
# display the DataFrame
df

Unnamed: 0,Col1,Col2
Id1,1,4
Id2,2,5
Id3,3,4
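
As a complement to the comment about automatic labels, here is a minimal sketch of what pandas assigns when no `index` or `columns` are passed. The names `data_demo` and `df_default` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array, same shape as the one used above
data_demo = np.array([[1, 4], [2, 5], [3, 4]])

# No index/columns given: pandas falls back to integer labels starting at 0
df_default = pd.DataFrame(data_demo)

default_index = list(df_default.index)      # row labels
default_columns = list(df_default.columns)  # column labels
```

Both label lists start at 0, which is why explicit names were built with the helper function in the cells above.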


### From dictionaries

In [9]:
# create the lists; each list holds the values of one whole DataFrame column
states = ["California", "Texas", "Florida", "New York"]
population = [39098766, 12563789, 34987232, 19875378]

In [10]:
# create the dictionary
dict_states = {
    "State": states, # State column
    "Population": population # Population column
}

In [11]:
# create the DataFrame; here the dictionary keys become the column names
df_states = pd.DataFrame(dict_states)

In [12]:
# display the DataFrame
df_states

Unnamed: 0,State,Population
0,California,39098766
1,Texas,12563789
2,Florida,34987232
3,New York,19875378


### From a CSV file

In [13]:
# read the CSV file
df_exams = pd.read_csv("./datasets/StudentsPerformance.csv")

In [14]:
# display the DataFrame
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77


## Displaying a DataFrame

<div style="text-align: justify; font-size: 1.1rem;">

By default, Pandas does not show every row of a DataFrame (as in the previous case with the CSV file). It shows only a subset of rows, enough to give a summarized idea of how the data is structured.

Even so, there are methods and options to change how many rows are displayed.
</div>
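
One way to change the number of displayed rows without touching the global setting is `pd.option_context`, which applies an option only inside a `with` block. The sketch below uses a hypothetical 100-row frame (`df_demo`) rather than the course dataset.

```python
import pandas as pd

# Hypothetical 100-row frame, used only to illustrate the display option
df_demo = pd.DataFrame({"value": range(100)})

rows_before = pd.get_option("display.max_rows")

# Inside the block at most 10 rows are rendered
with pd.option_context("display.max_rows", 10):
    truncated_text = repr(df_demo)

# On exit the previous setting is restored automatically
rows_after = pd.get_option("display.max_rows")
```

Compared with `pd.set_option`, this avoids having to remember to set the option back afterwards.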

In [15]:
# show the first 5 rows of the DataFrame
df_exams.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75


In [16]:
# show the last 5 rows of the DataFrame
df_exams.tail()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77
999,female,group D,some college,free/reduced,none,77,86,86


In [17]:
# show the first n rows of the DataFrame
df_exams.head(10)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
5,female,group B,associate's degree,standard,none,71,83,78
6,female,group B,some college,standard,completed,88,95,92
7,male,group B,some college,free/reduced,none,40,43,39
8,male,group D,high school,free/reduced,completed,64,64,67
9,female,group B,high school,free/reduced,none,38,60,50


In [18]:
# show the last n rows of the DataFrame
df_exams.tail(10)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
990,male,group E,high school,free/reduced,completed,86,81,75
991,female,group B,some high school,standard,completed,65,82,78
992,female,group D,associate's degree,free/reduced,none,55,76,76
993,female,group D,bachelor's degree,free/reduced,none,62,72,74
994,male,group A,high school,standard,none,63,63,62
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77
999,female,group D,some college,free/reduced,none,77,86,86


In [19]:
# shape attribute
df_exams.shape

(1000, 8)

In [20]:
# show up to n rows (here, all 1000)
pd.set_option("display.max_rows", 1000)
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
5,female,group B,associate's degree,standard,none,71,83,78
6,female,group B,some college,standard,completed,88,95,92
7,male,group B,some college,free/reduced,none,40,43,39
8,male,group D,high school,free/reduced,completed,64,64,67
9,female,group B,high school,free/reduced,none,38,60,50


## Basic attributes, functions, and methods

### Theory summary

### Examples

In [21]:
pd.set_option("display.max_rows", 11) # limit the display again (the pandas default is 60; pd.reset_option("display.max_rows") restores it)

In [22]:
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77


#### Attributes

In [23]:
# shape attribute (number of rows and columns)
df_exams.shape

(1000, 8)

In [24]:
# index attribute
df_exams.index

RangeIndex(start=0, stop=1000, step=1)

In [25]:
# columns attribute (the column labels)
df_exams.columns

Index(['gender', 'race/ethnicity', 'parental level of education', 'lunch',
       'test preparation course', 'math score', 'reading score',
       'writing score'],
      dtype='object')

In [26]:
# data type of each column (column name -> dtype)
df_exams.dtypes

gender                         object
race/ethnicity                 object
parental level of education    object
lunch                          object
test preparation course        object
math score                      int64
reading score                   int64
writing score                   int64
dtype: object

#### Methods

In [27]:
# show the first 5 rows
df_exams.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75


In [28]:
# show general information about the DataFrame
df_exams.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64 
 6   reading score                1000 non-null   int64 
 7   writing score                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB


In [29]:
# get summary statistics of the DataFrame (numeric columns only)
df_exams.describe()

Unnamed: 0,math score,reading score,writing score
count,1000.0,1000.0,1000.0
mean,66.089,69.169,68.054
std,15.16308,14.600192,15.195657
min,0.0,17.0,10.0
25%,57.0,59.0,57.75
50%,66.0,70.0,69.0
75%,77.0,79.0,79.0
max,100.0,100.0,100.0


#### Functions

In [30]:
# get the size of the DataFrame (number of rows)
len(df_exams)

1000

In [31]:
# largest index value
max(df_exams.index)

999

In [32]:
# smallest index value
min(df_exams.index)

0

In [33]:
# get the type of the DataFrame object itself
type(df_exams)

pandas.core.frame.DataFrame

In [34]:
# round the numeric values of the DataFrame (here, to two decimals)
round(df_exams, 2)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77


## Selecting columns

### Selecting 1 column

#### Syntax 1

In [35]:
# select one column with [] -> this returns a Series
df_exams['gender']

0      female
1      female
2      female
3        male
4        male
        ...  
995    female
996      male
997    female
998    female
999    female
Name: gender, Length: 1000, dtype: object

In [36]:
# check the type of a column
type(df_exams['gender'])

pandas.core.series.Series

In [37]:
# Series: attributes and methods - a Series shares many attributes and methods with a DataFrame
df_exams['gender'].index
df_exams['gender'].head()

0    female
1    female
2    female
3      male
4      male
Name: gender, dtype: object

#### Syntax 2

In [38]:
# select a column with .column_name
df_exams.gender

0      female
1      female
2      female
3        male
4        male
        ...  
995    female
996      male
997    female
998    female
999    female
Name: gender, Length: 1000, dtype: object

In [39]:
# limitation: this form does not work for column names that contain spaces
df_exams.math score

SyntaxError: invalid syntax (3957362103.py, line 2)

In [40]:
df_exams["math score"]

0      72
1      69
2      90
3      47
4      76
       ..
995    88
996    62
997    59
998    68
999    77
Name: math score, Length: 1000, dtype: int64
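
The two syntaxes can be compared side by side on a small frame. The sketch below uses hypothetical data (`df_demo`) and also shows a second limitation of dot access that the cells above do not cover: a column whose name matches an existing DataFrame attribute or method (here `count`) is shadowed by it.

```python
import pandas as pd

# Hypothetical frame; "count" is chosen on purpose to clash with DataFrame.count
df_demo = pd.DataFrame({
    "gender": ["female", "male"],
    "math score": [72, 47],
    "count": [1, 2],
})

by_brackets = df_demo["math score"]  # brackets work for any column label
by_attribute = df_demo.gender        # dot access works: valid identifier, no name clash
clash = df_demo.count                # this is the DataFrame.count method, not the column
count_column = df_demo["count"]      # brackets still reach the column
```

For that reason, bracket selection is usually the safer choice in code meant to be reused.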

### Selecting 2 or more columns

In [41]:
# select 2 columns with [[]] -> this returns a DataFrame
df_exams[["gender", "math score"]]

Unnamed: 0,gender,math score
0,female,72
1,female,69
2,female,90
3,male,47
4,male,76
...,...,...
995,female,88
996,male,62
997,female,59
998,female,68


In [42]:
# check the type of the selection
type(df_exams[["gender", "math score"]])

pandas.core.frame.DataFrame

In [43]:
# select 2 or more columns with [[]]
df_exams[["gender", "math score", "reading score", "writing score"]]

Unnamed: 0,gender,math score,reading score,writing score
0,female,72,72,74
1,female,69,90,88
2,female,90,95,93
3,male,47,57,44
4,male,76,78,75
...,...,...,...,...
995,female,88,99,95
996,male,62,55,55
997,female,59,71,65
998,female,68,78,77


In [44]:
# we cannot select 2 or more columns with dot notation
df_exams."gender", "math score"

SyntaxError: invalid syntax (563264676.py, line 2)

## Adding a new column

### Adding a new column with a single value

In [45]:
# add a new column to the DataFrame
df_exams['language score'] = 70 # every row gets this constant value

In [46]:
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score
0,female,group B,bachelor's degree,standard,none,72,72,74,70
1,female,group C,some college,standard,completed,69,90,88,70
2,female,group B,master's degree,standard,none,90,95,93,70
3,male,group A,associate's degree,free/reduced,none,47,57,44,70
4,male,group C,some college,standard,none,76,78,75,70
...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,70
996,male,group C,high school,free/reduced,none,62,55,55,70
997,female,group C,high school,free/reduced,completed,59,71,65,70
998,female,group D,some college,standard,completed,68,78,77,70


### Adding a new column with an array

<p style="text-align: justify; font-size: 1.1rem;">The DataFrame has 1000 rows/observations, so we need to create an array of 1000 elements.</p>
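
To see why the lengths must match, here is a minimal sketch with a hypothetical 3-row frame (`df_demo`): assigning an array of a different length raises a `ValueError`, while an array sized with `len(df_demo)` is accepted.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row frame standing in for the 1000-row dataset
df_demo = pd.DataFrame({"math score": [72, 69, 90]})

try:
    df_demo["language score"] = np.arange(0, 5)  # 5 values for 3 rows
    mismatch_error = None
except ValueError as err:
    mismatch_error = err  # pandas rejects the length mismatch

# Sizing the array from the frame itself avoids the problem
df_demo["language score"] = np.arange(0, len(df_demo))
```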

In [49]:
# create an array of 1000 elements
language_score = np.arange(0, 1000)

In [50]:
# check the length of the array
print(len(language_score))
print(len(language_score) == len(df_exams))

1000
True


In [51]:
# add a new column to the DataFrame from an array
df_exams['language score'] = language_score
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score
0,female,group B,bachelor's degree,standard,none,72,72,74,0
1,female,group C,some college,standard,completed,69,90,88,1
2,female,group B,master's degree,standard,none,90,95,93,2
3,male,group A,associate's degree,free/reduced,none,47,57,44,3
4,male,group C,some college,standard,none,76,78,75,4
...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,995
996,male,group C,high school,free/reduced,none,62,55,55,996
997,female,group C,high school,free/reduced,completed,59,71,65,997
998,female,group D,some college,standard,completed,68,78,77,998


In [52]:
# create random integers from 1 to 99 (the upper bound, 100, is exclusive)
int_language_score = np.random.randint(1, 100, size=len(df_exams))

In [53]:
# the lower bound is inclusive and the upper bound is exclusive (only the last expression is displayed)
min(int_language_score)
max(int_language_score)

99
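
The inclusive/exclusive bounds can also be checked with NumPy's newer `Generator` API, which is a swapped-in alternative to the `np.random.randint` call used above. In this sketch the seed value and the variable names are arbitrary choices for illustration; `endpoint=True` makes the upper bound inclusive.

```python
import numpy as np

# Seeded generator so the draw is reproducible (seed 0 is an arbitrary choice)
rng = np.random.default_rng(seed=0)

# Upper bound exclusive: values fall in 1..99
scores_exclusive = rng.integers(1, 100, size=1000)

# Upper bound inclusive: values fall in 1..100
scores_inclusive = rng.integers(1, 100, size=1000, endpoint=True)

exclusive_range = (scores_exclusive.min(), scores_exclusive.max())
inclusive_range = (scores_inclusive.min(), scores_inclusive.max())
```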

In [54]:
df_exams['language score'] = int_language_score
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score
0,female,group B,bachelor's degree,standard,none,72,72,74,77
1,female,group C,some college,standard,completed,69,90,88,86
2,female,group B,master's degree,standard,none,90,95,93,11
3,male,group A,associate's degree,free/reduced,none,47,57,44,71
4,male,group C,some college,standard,none,76,78,75,89
...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,68
996,male,group C,high school,free/reduced,none,62,55,55,16
997,female,group C,high school,free/reduced,completed,59,71,65,37
998,female,group D,some college,standard,completed,68,78,77,10


In [55]:
# create random floats between 1 and 100
np.random.uniform(1, 100, size=len(df_exams))

array([ 1.60597111, 68.8031974 , 53.64978845, 67.28633278, 62.71069119,
       72.99819451, 11.56249689, 68.77575269, 44.96872154, 19.40286104,
       99.2896438 , 30.55548947, 26.72163667, 39.28519363, 74.03005032,
       83.3058971 , 10.64330487, 20.33838081, 74.54297345,  3.43963189,
       98.94417892, 38.73654516, 11.28948618, 53.77785396, 31.68019963,
       98.23938202, 43.4655229 , 91.72497644,  5.00971261, 40.57980439,
       30.88114642, 36.07093534, 21.89157625, 17.50149473, 79.83534855,
        8.32866454, 31.27998496, 36.13551767,  6.53196828, 74.02545775,
       31.67464395, 63.25801445, 89.68109067, 58.6102211 , 38.85831318,
       29.418246  , 12.12707997, 22.53980462, 26.96948799, 77.63882662,
       70.28240759, 44.999465  , 72.76467561,  6.51992223, 31.31448065,
       92.74856951, 70.61158723, 92.43972424, 99.50048703, 31.79633682,
       69.74398085, 57.12523886, 91.13983384, 89.50180942, 78.36045022,
       67.0373192 , 29.22576657, 52.88880551, 82.86466165, 90.95

## Operations on DataFrames

### Operations on columns

In [56]:
# select a column and compute the sum of all its values
df_exams["math score"].sum()

66089

In [57]:
# count, mean, standard deviation, minimum, and maximum
print(df_exams["math score"].count()) # number of records
print(df_exams["math score"].mean()) # mean
print(df_exams["math score"].std()) # standard deviation
print(df_exams["math score"].min())
print(df_exams["math score"].max())

1000
66.089
15.16308009600945
0
100


In [58]:
# get all of these statistics in a single call with .describe()
df_exams.describe()

Unnamed: 0,math score,reading score,writing score,language score
count,1000.0,1000.0,1000.0,1000.0
mean,66.089,69.169,68.054,49.101
std,15.16308,14.600192,15.195657,28.696346
min,0.0,17.0,10.0,1.0
25%,57.0,59.0,57.75,24.0
50%,66.0,70.0,69.0,47.0
75%,77.0,79.0,79.0,74.25
max,100.0,100.0,100.0,99.0


### Operations on rows

In [59]:
# add up the three score columns row by row (at this point the variable holds the sum, not yet the average)
average = df_exams["math score"] + df_exams["reading score"] + df_exams["writing score"]
average

0      218
1      247
2      278
3      148
4      229
      ... 
995    282
996    172
997    195
998    223
999    249
Length: 1000, dtype: int64

In [60]:
type(average)

pandas.core.series.Series

In [61]:
# compute the average and assign the result to a new column
df_exams['average'] = (average / 3).round(decimals=2)

In [62]:
# display the DataFrame
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
0,female,group B,bachelor's degree,standard,none,72,72,74,77,72.67
1,female,group C,some college,standard,completed,69,90,88,86,82.33
2,female,group B,master's degree,standard,none,90,95,93,11,92.67
3,male,group A,associate's degree,free/reduced,none,47,57,44,71,49.33
4,male,group C,some college,standard,none,76,78,75,89,76.33
...,...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,68,94.00
996,male,group C,high school,free/reduced,none,62,55,55,16,57.33
997,female,group C,high school,free/reduced,completed,59,71,65,37,65.00
998,female,group D,some college,standard,completed,68,78,77,10,74.33
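
The same row-wise average can be written without adding the columns by hand, using `mean(axis=1)` over the selected columns. This is an alternative sketch, not the course's method; `df_demo` and `score_cols` are hypothetical names, and the two rows reuse the scores of the first two students shown above.

```python
import pandas as pd

# Scores of the first two rows of the dataset, copied into a small frame
df_demo = pd.DataFrame({
    "math score": [72, 69],
    "reading score": [72, 90],
    "writing score": [74, 88],
})

score_cols = ["math score", "reading score", "writing score"]

# axis=1 averages across columns, producing one value per row
df_demo["average"] = df_demo[score_cols].mean(axis=1).round(2)
```

This form scales better when more score columns are added, since only the list of column names changes.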


## Value_Counts

In [63]:
# count the elements of the "gender" column

# len function
print(len(df_exams["gender"]))

# .count() method (counts non-null values only)
print(df_exams["gender"].count())


1000
1000


In [64]:
# count the elements of "gender" per category
df_exams["gender"].value_counts()

gender
female    518
male      482
Name: count, dtype: int64

In [65]:
# get the relative frequency (each category's share of the total, as a fraction)
df_exams["gender"].value_counts(normalize=True)

gender
female    0.518
male      0.482
Name: proportion, dtype: float64

In [66]:
# count the elements of the "parental level of education" column per category
df_exams["parental level of education"].value_counts()

parental level of education
some college          226
associate's degree    222
high school           196
some high school      179
bachelor's degree     118
master's degree        59
Name: count, dtype: int64

In [67]:
type(df_exams["parental level of education"].value_counts())

pandas.core.series.Series

In [68]:
# get the relative frequency and round it to 2 decimals
df_exams["parental level of education"].value_counts(normalize=True).round(2)

parental level of education
some college          0.23
associate's degree    0.22
high school           0.20
some high school      0.18
bachelor's degree     0.12
master's degree       0.06
Name: proportion, dtype: float64
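
As a quick check of what `normalize=True` computes, the sketch below compares it with dividing the counts by the number of elements. The Series `genders_demo` is hypothetical and has no missing values.

```python
import pandas as pd

# Hypothetical 4-element Series with no missing values
genders_demo = pd.Series(["female", "female", "male", "female"])

counts = genders_demo.value_counts()                # absolute frequency
shares = genders_demo.value_counts(normalize=True)  # relative frequency

# Same result computed by hand: counts divided by the number of elements
manual_shares = counts / len(genders_demo)
```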

## Sorting a DataFrame

In [69]:
# sort by a column (ascending by default)
df_exams.sort_values(by="math score")

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
59,female,group C,some high school,free/reduced,none,0,17,10,63,9.00
980,female,group B,high school,free/reduced,none,8,24,23,85,18.33
17,female,group B,some high school,free/reduced,none,18,32,28,52,26.00
787,female,group B,some college,standard,none,19,38,32,15,29.67
145,female,group C,some college,free/reduced,none,22,39,33,97,31.33
...,...,...,...,...,...,...,...,...,...,...
625,male,group D,some college,standard,completed,100,97,99,21,98.67
623,male,group A,some college,standard,completed,100,96,86,5,94.00
451,female,group E,some college,standard,none,100,92,97,83,96.33
962,female,group E,associate's degree,standard,none,100,100,100,53,100.00


In [70]:
# sort by a column in descending order
df_exams.sort_values(by="math score", ascending=False)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
962,female,group E,associate's degree,standard,none,100,100,100,53,100.00
625,male,group D,some college,standard,completed,100,97,99,21,98.67
458,female,group E,bachelor's degree,standard,none,100,100,100,3,100.00
623,male,group A,some college,standard,completed,100,96,86,5,94.00
451,female,group E,some college,standard,none,100,92,97,83,96.33
...,...,...,...,...,...,...,...,...,...,...
145,female,group C,some college,free/reduced,none,22,39,33,97,31.33
787,female,group B,some college,standard,none,19,38,32,15,29.67
17,female,group B,some high school,free/reduced,none,18,32,28,52,26.00
980,female,group B,high school,free/reduced,none,8,24,23,85,18.33


In [71]:
# sort in descending order by multiple columns
df_exams.sort_values(by=["math score", "reading score"], ascending=False)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
149,male,group E,associate's degree,free/reduced,completed,100,100,93,46,97.67
458,female,group E,bachelor's degree,standard,none,100,100,100,3,100.00
916,male,group E,bachelor's degree,standard,completed,100,100,100,78,100.00
962,female,group E,associate's degree,standard,none,100,100,100,53,100.00
625,male,group D,some college,standard,completed,100,97,99,21,98.67
...,...,...,...,...,...,...,...,...,...,...
145,female,group C,some college,free/reduced,none,22,39,33,97,31.33
787,female,group B,some college,standard,none,19,38,32,15,29.67
17,female,group B,some high school,free/reduced,none,18,32,28,52,26.00
980,female,group B,high school,free/reduced,none,8,24,23,85,18.33


In [72]:
# sort_values does not modify the original DataFrame; it returns a sorted copy
# to sort the original in place:
df_exams.sort_values(by=["math score", "reading score"], ascending=False, inplace=True)

In [73]:
df_exams

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
149,male,group E,associate's degree,free/reduced,completed,100,100,93,46,97.67
458,female,group E,bachelor's degree,standard,none,100,100,100,3,100.00
916,male,group E,bachelor's degree,standard,completed,100,100,100,78,100.00
962,female,group E,associate's degree,standard,none,100,100,100,53,100.00
625,male,group D,some college,standard,completed,100,97,99,21,98.67
...,...,...,...,...,...,...,...,...,...,...
145,female,group C,some college,free/reduced,none,22,39,33,97,31.33
787,female,group B,some college,standard,none,19,38,32,15,29.67
17,female,group B,some high school,free/reduced,none,18,32,28,52,26.00
980,female,group B,high school,free/reduced,none,8,24,23,85,18.33
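
A common alternative to `inplace=True` is to assign the returned copy to a name. The sketch below uses a hypothetical 3-row frame (`df_demo`) to show that the original stays untouched, and that `ignore_index=True` renumbers the rows of the result from 0.

```python
import pandas as pd

# Hypothetical 3-row frame
df_demo = pd.DataFrame({"math score": [72, 100, 47]})

# The sorted copy keeps the original row labels
sorted_demo = df_demo.sort_values(by="math score", ascending=False)

# ignore_index=True replaces them with 0, 1, 2, ...
renumbered_demo = df_demo.sort_values(by="math score", ascending=False, ignore_index=True)

original_order = df_demo["math score"].tolist()  # unchanged by either call
```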


In [75]:
# sort in ascending order (the default) using a key function
df_exams.sort_values(by="race/ethnicity", key=lambda col: col.str.lower())

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score,language score,average
741,female,group A,associate's degree,free/reduced,none,37,57,56,76,50.00
151,male,group A,bachelor's degree,standard,none,77,67,68,53,70.67
811,male,group A,high school,free/reduced,none,45,47,49,46,47.00
112,male,group A,associate's degree,standard,none,54,53,47,20,51.33
25,male,group A,master's degree,free/reduced,none,73,74,72,33,73.00
...,...,...,...,...,...,...,...,...,...,...
751,male,group E,some college,standard,none,68,72,65,21,68.33
915,female,group E,some college,standard,none,68,70,66,2,68.00
592,male,group E,bachelor's degree,standard,none,68,68,64,6,66.67
479,male,group E,associate's degree,standard,none,76,71,67,25,71.33
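
In the dataset every value of this column already starts with a lowercase "group", so the key has no visible effect there. The sketch below uses hypothetical mixed-case labels (`df_demo`) to show what the `key` argument changes: without it, uppercase letters sort before lowercase ones.

```python
import pandas as pd

# Hypothetical labels with mixed case
df_demo = pd.DataFrame({"group": ["group b", "Group A", "group C"]})

# Plain sort compares characters by code point, so "C" comes before "b"
plain_order = df_demo.sort_values(by="group")["group"].tolist()

# The key lowercases the values for comparison only; the stored values are unchanged
keyed_order = df_demo.sort_values(
    by="group", key=lambda col: col.str.lower()
)["group"].tolist()
```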
