# Data wrangling with `dplyr`

**Author: Nayeli Luis**

**email**: <nayeli.luis@ciencias.unam.mx>

## Initial steps

* Load the library
* Load the data


In [1]:
#----Install the package----
# install.packages("tidyverse")

#----Load the package----
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
# Import the data
pokemon <- read_csv("../00_datasets/pokemon.csv")

Rows: 801 Columns: 41
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): abilities, capture_rate, classfication, japanese_name, name, type1...
dbl (34): against_bug, against_dark, against_dragon, against_electric, again...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [3]:
# Check what kind of object pokemon is
class(pokemon)

In [4]:
# Look at the first rows
head(pokemon)

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.5,2.0,2,1,⋯,88.1,1,65,65,45,grass,poison,6.9,1,0
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.5,2.0,2,1,⋯,88.1,2,80,80,60,grass,poison,13.0,1,0
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.5,2.0,2,1,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Blaze', 'Solar Power']",0.5,1,1,1.0,0.5,1.0,0.5,1,1,⋯,88.1,4,60,50,65,fire,,8.5,1,0
"['Blaze', 'Solar Power']",0.5,1,1,1.0,0.5,1.0,0.5,1,1,⋯,88.1,5,80,65,80,fire,,19.0,1,0
"['Blaze', 'Solar Power']",0.25,1,1,2.0,0.5,0.5,0.5,1,1,⋯,88.1,6,159,115,100,fire,flying,90.5,1,0


The dataset was downloaded from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/pokemon?resource=download).

In [5]:
#----Initial data exploration----
# Get the dimensions of the dataset
dim(pokemon)

In [6]:
# Get the column names
colnames(pokemon)

The `abilities` column is a string holding a Python-style list of the abilities that each Pokémon can have.
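
Because `abilities` is stored as a plain string, it may need to be split before it can be analysed. The following is a minimal sketch in base R. `parse_abilities` is a hypothetical helper name that does not appear in the original notebook, and the sketch assumes that ability names never contain commas or single quotes.

```r
# Hypothetical helper: turn "['Overgrow', 'Chlorophyll']" into a character vector.
parse_abilities <- function(x) {
  cleaned <- gsub("\\[|\\]|'", "", x)       # drop brackets and single quotes
  strsplit(cleaned, ", ", fixed = TRUE)     # one character vector per input string
}

parsed <- parse_abilities(c("['Overgrow', 'Chlorophyll']", "['Shed Skin']"))
parsed[[1]]
parsed[[2]]
```

The result is a list with one character vector per row, so it could be stored in a list-column if that is useful for later steps.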

In [7]:
# Explore a specific column
# pokemon$type1

# Explore only the first few elements
head(pokemon$type1, 30)


In [8]:
head(pokemon$classfication, 20)

## Data wrangling

When a data frame is very large, we rarely use all of it. We usually focus on a few variables and a few observations, which means building *subsets* of the original dataset.

There are two ways to do this:

1. Select columns
2. Filter rows



## `dplyr`

### `select()`

In [9]:
#----select()----
# select() picks columns

# We only want a few columns, not all 41

columnas <- c('abilities', 'name', 'type1', 'classfication', 'is_legendary')
# all_of() selects the columns named in a character vector
select(pokemon, all_of(columnas))
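
`all_of()` is strict: if any name in the vector is missing from the data, the call fails. Its sibling `any_of()` silently skips missing names. A small sketch on made-up data (the `toy` tibble and the `not_a_column` name are illustrative assumptions, not part of the dataset):

```r
library(dplyr)

# Toy data standing in for the real pokemon tibble.
toy <- tibble(name = c("Bulbasaur", "Charmander"), type1 = c("grass", "fire"))

wanted <- c("name", "type1", "not_a_column")

# any_of() keeps the columns that exist and ignores the rest.
lenient <- select(toy, any_of(wanted))
colnames(lenient)

# all_of() raises an error when a requested column is missing.
strict <- tryCatch(select(toy, all_of(wanted)), error = function(e) "error")
strict
```

Using `all_of()` in the cell above is a reasonable choice when a misspelled column name should be caught early.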

In [17]:
# Select a range of columns by position
select(pokemon, 10:15) 

against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.25,1,2.0,1,1
1,0.25,1,2.0,1,1
1,0.25,1,2.0,1,1
1,0.5,2,0.5,1,1
1,0.5,2,0.5,1,1
1,0.25,0,1.0,1,1


In [11]:
# Select columns in reverse order
select(pokemon, 20:1)

attack,against_water,against_steel,against_rock,against_psychic,against_poison,against_normal,against_ice,against_ground,against_grass,against_ghost,against_flying,against_fire,against_fight,against_fairy,against_electric,against_dragon,against_dark,against_bug,abilities
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
49,0.5,1.0,1.0,2,1.0,1,2.0,1.0,0.25,1,2.0,2.0,0.50,0.5,0.5,1,1,1.00,"['Overgrow', 'Chlorophyll']"
62,0.5,1.0,1.0,2,1.0,1,2.0,1.0,0.25,1,2.0,2.0,0.50,0.5,0.5,1,1,1.00,"['Overgrow', 'Chlorophyll']"
100,0.5,1.0,1.0,2,1.0,1,2.0,1.0,0.25,1,2.0,2.0,0.50,0.5,0.5,1,1,1.00,"['Overgrow', 'Chlorophyll']"
52,2.0,0.5,2.0,1,1.0,1,0.5,2.0,0.50,1,1.0,0.5,1.00,0.5,1.0,1,1,0.50,"['Blaze', 'Solar Power']"
64,2.0,0.5,2.0,1,1.0,1,0.5,2.0,0.50,1,1.0,0.5,1.00,0.5,1.0,1,1,0.50,"['Blaze', 'Solar Power']"
104,2.0,0.5,4.0,1,1.0,1,1.0,0.0,0.25,1,1.0,0.5,0.50,0.5,2.0,1,1,0.25,"['Blaze', 'Solar Power']"
48,0.5,0.5,1.0,1,1.0,1,0.5,1.0,2.00,1,1.0,0.5,1.00,1.0,2.0,1,1,1.00,"['Torrent', 'Rain Dish']"
63,0.5,0.5,1.0,1,1.0,1,0.5,1.0,2.00,1,1.0,0.5,1.00,1.0,2.0,1,1,1.00,"['Torrent', 'Rain Dish']"
103,0.5,0.5,1.0,1,1.0,1,0.5,1.0,2.00,1,1.0,0.5,1.00,1.0,2.0,1,1,1.00,"['Torrent', 'Rain Dish']"
30,1.0,1.0,2.0,1,1.0,1,1.0,0.5,0.50,1,2.0,2.0,0.50,1.0,1.0,1,1,1.00,"['Shield Dust', 'Run Away']"


In [27]:
# Column 1
pokemon[1]

abilities
<chr>
"['Overgrow', 'Chlorophyll']"
"['Overgrow', 'Chlorophyll']"
"['Overgrow', 'Chlorophyll']"
"['Blaze', 'Solar Power']"
"['Blaze', 'Solar Power']"
"['Blaze', 'Solar Power']"
"['Torrent', 'Rain Dish']"
"['Torrent', 'Rain Dish']"
"['Torrent', 'Rain Dish']"
"['Shield Dust', 'Run Away']"


In [26]:
# The base R way: rows 1 to 10 of the chosen columns
pokemon[1:10,columnas] 

abilities,name,type1,classfication,is_legendary
<chr>,<chr>,<chr>,<chr>,<dbl>
"['Overgrow', 'Chlorophyll']",Bulbasaur,grass,Seed Pokémon,0
"['Overgrow', 'Chlorophyll']",Ivysaur,grass,Seed Pokémon,0
"['Overgrow', 'Chlorophyll']",Venusaur,grass,Seed Pokémon,0
"['Blaze', 'Solar Power']",Charmander,fire,Lizard Pokémon,0
"['Blaze', 'Solar Power']",Charmeleon,fire,Flame Pokémon,0
"['Blaze', 'Solar Power']",Charizard,fire,Flame Pokémon,0
"['Torrent', 'Rain Dish']",Squirtle,water,Tiny Turtle Pokémon,0
"['Torrent', 'Rain Dish']",Wartortle,water,Turtle Pokémon,0
"['Torrent', 'Rain Dish']",Blastoise,water,Shellfish Pokémon,0
"['Shield Dust', 'Run Away']",Caterpie,bug,Worm Pokémon,0


In [3]:
# Select columns using a character pattern:
# Option 1: contains() looks for a literal string
select(pokemon, contains("against"))

against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0
0.25,1,1,2.0,0.5,0.50,0.5,1.0,1,0.25,0.0,1.0,1,1.0,1,4.0,0.5,2.0
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,1.0,1.0,0.50,2.0,2.0,1,0.50,0.5,1.0,1,1.0,1,2.0,1.0,1.0


In [4]:
# Option 2: matches() takes a regular expression
select(pokemon, matches("against")) 

against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0
0.25,1,1,2.0,0.5,0.50,0.5,1.0,1,0.25,0.0,1.0,1,1.0,1,4.0,0.5,2.0
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5
1.00,1,1,1.0,1.0,0.50,2.0,2.0,1,0.50,0.5,1.0,1,1.0,1,2.0,1.0,1.0
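
Both options return the same 18 columns here because the pattern `"against"` contains no regular-expression metacharacters. The two helpers differ once the pattern does: `contains()` searches for the literal text, while `matches()` interprets it as a regular expression. A minimal sketch on made-up data (the `toy` tibble is an illustrative assumption):

```r
library(dplyr)

# Toy data with a few columns shaped like the real ones.
toy <- tibble(
  name = "Bulbasaur",
  against_fire = 2, against_water = 0.5, against_ice = 2,
  attack = 49
)

# Regex alternation only works with matches().
by_regex <- select(toy, matches("^against_(fire|water)$"))
colnames(by_regex)

# contains() looks for the literal text "(fire|water)", which no column has.
by_literal <- select(toy, contains("(fire|water)"))
ncol(by_literal)
```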


In [5]:
# Select by data type
select(pokemon, where(is.numeric)) 

against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,⋯,height_m,hp,percentage_male,pokedex_number,sp_attack,sp_defense,speed,weight_kg,generation,is_legendary
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,⋯,0.7,45,88.1,1,65,65,45,6.9,1,0
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,⋯,1.0,60,88.1,2,80,80,60,13.0,1,0
1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,⋯,2.0,80,88.1,3,122,120,80,100.0,1,0
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,⋯,0.6,39,88.1,4,60,50,65,8.5,1,0
0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,⋯,1.1,58,88.1,5,80,65,80,19.0,1,0
0.25,1,1,2.0,0.5,0.50,0.5,1.0,1,0.25,⋯,1.7,78,88.1,6,159,115,100,90.5,1,0
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,⋯,0.5,44,88.1,7,50,64,43,9.0,1,0
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,⋯,1.0,59,88.1,8,65,80,58,22.5,1,0
1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,⋯,1.6,79,88.1,9,135,115,78,85.5,1,0
1.00,1,1,1.0,1.0,0.50,2.0,2.0,1,0.50,⋯,0.3,45,50.0,10,20,20,45,2.9,1,0


### `filter()`

In [23]:
#----filter()----
# Build subsets of rows with filter()
# It relies on relational, logical and membership operators, because it filters on the values in a column.
filter(pokemon, type1 == "fire")

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,88.1,4,60,50,65,fire,,8.5,1,0
"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,88.1,5,80,65,80,fire,,19.0,1,0
"['Blaze', 'Solar Power']",0.25,1.0,1.0,2.0,0.5,0.5,0.5,1.0,1.0,⋯,88.1,6,159,115,100,fire,flying,90.5,1,0
"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,24.6,37,50,65,65,fire,ice,,1,0
"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,24.6,38,81,100,109,fire,ice,,1,0
"['Intimidate', 'Flash Fire', 'Justified']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,75.4,58,70,50,60,fire,,19.0,1,0
"['Intimidate', 'Flash Fire', 'Justified']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,75.4,59,100,80,95,fire,,155.0,1,0
"['Run Away', 'Flash Fire', 'Flame Body']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,50.0,77,65,65,90,fire,,30.0,1,0
"['Run Away', 'Flash Fire', 'Flame Body']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,50.0,78,80,80,105,fire,,95.0,1,0
"['Flame Body', 'Vital Spirit']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,⋯,75.4,126,100,85,93,fire,,44.5,1,0


### Relational operators
Operators that compare values.

* `==` equal to
* `!=` not equal to
* `>=` greater than or equal to
* `>` greater than
* `<=` less than or equal to
* `<` less than

### Logical operators

* `&` AND operator: both expressions must be true for the result to be true
* `|` OR operator: at least one of the expressions must be true for the result to be true
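
Both operators work element by element on vectors, which is what `filter()` relies on. A short sketch with made-up weights (the `weight` vector is illustrative only):

```r
weight <- c(5, 50, 150, 500)

# AND: TRUE only where both comparisons hold.
both <- weight > 100 & weight < 300
both    # FALSE FALSE  TRUE FALSE

# OR: TRUE where at least one comparison holds.
either <- weight > 100 | weight < 300
either  # TRUE TRUE TRUE TRUE
```

Every number is either above 100 or below 300, so the OR version is true for all four values, while the AND version keeps only the value inside the range.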

In [24]:
# Everything that differs from a given value
filter(pokemon, type1 != "fire") 

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,1,65,65,45,grass,poison,6.9,1,0
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,2,80,80,60,grass,poison,13.0,1,0
"['Overgrow', 'Chlorophyll']",1.0,1,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Torrent', 'Rain Dish']",1.0,1,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,7,50,64,43,water,,9.0,1,0
"['Torrent', 'Rain Dish']",1.0,1,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,8,65,80,58,water,,22.5,1,0
"['Torrent', 'Rain Dish']",1.0,1,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,9,135,115,78,water,,85.5,1,0
"['Shield Dust', 'Run Away']",1.0,1,1,1.0,1.0,0.50,2.0,2.0,1,⋯,50.0,10,20,20,45,bug,,2.9,1,0
['Shed Skin'],1.0,1,1,1.0,1.0,0.50,2.0,2.0,1,⋯,50.0,11,25,25,30,bug,,9.9,1,0
"['Compoundeyes', 'Tinted Lens']",0.5,1,1,2.0,1.0,0.25,2.0,2.0,1,⋯,50.0,12,90,80,70,bug,flying,32.0,1,0
"['Shield Dust', 'Run Away']",0.5,1,1,1.0,0.5,0.25,2.0,2.0,1,⋯,50.0,13,20,20,50,bug,poison,3.2,1,0


In [25]:
# How many categories does type1 have, and how many rows are in each?
dplyr::count(pokemon, type1) 

type1,n
<chr>,<int>
bug,72
dark,29
dragon,27
electric,39
fairy,18
fighting,28
fire,52
flying,3
ghost,27
grass,78


Choose the Pokémon whose type is rock, water, grass or fire.

In [30]:
# Choose the Pokémon that are rock, water, grass or fire
# Option 1. Using operators

# What is the difference between AND and OR? Which one should we use?

# AND returns no rows here: a single type1 value cannot equal four different strings at once
filter(pokemon, type1 == "rock" & type1 == "water" & type1 == "grass" & type1 == "fire")

“number of rows of result is not a multiple of vector length (arg 2)”
“number of rows of result is not a multiple of vector length (arg 2)”
“number of rows of result is not a multiple of vector length (arg 2)”
“number of rows of result is not a multiple of vector length (arg 2)”


abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>


In [33]:
filter(pokemon, type1 == "rock" | type1 == "water" | type1 == "grass" | type1 == "fire") %>%
    head()

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,1,65,65,45,grass,poison,6.9,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,2,80,80,60,grass,poison,13.0,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,88.1,4,60,50,65,fire,,8.5,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,88.1,5,80,65,80,fire,,19.0,1,0
"['Blaze', 'Solar Power']",0.25,1.0,1,2.0,0.5,0.5,0.5,1.0,1,⋯,88.1,6,159,115,100,fire,flying,90.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,7,50,64,43,water,,9.0,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,8,65,80,58,water,,22.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,9,135,115,78,water,,85.5,1,0
"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,24.6,37,50,65,65,fire,ice,,1,0


This works, but it is cumbersome. A membership operator does the same job more concisely.

In [34]:
tipos_pokemones <- c("rock", "water", "grass", "fire")
filter(pokemon, type1 %in% tipos_pokemones) 

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,1,65,65,45,grass,poison,6.9,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,2,80,80,60,grass,poison,13.0,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.5,2.0,2.0,1,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,88.1,4,60,50,65,fire,,8.5,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,88.1,5,80,65,80,fire,,19.0,1,0
"['Blaze', 'Solar Power']",0.25,1.0,1,2.0,0.5,0.5,0.5,1.0,1,⋯,88.1,6,159,115,100,fire,flying,90.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,7,50,64,43,water,,9.0,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,8,65,80,58,water,,22.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.0,0.5,1.0,1,⋯,88.1,9,135,115,78,water,,85.5,1,0
"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",0.50,1.0,1,1.0,0.5,1.0,0.5,1.0,1,⋯,24.6,37,50,65,65,fire,ice,,1,0


In [38]:
# Filter on a numeric variable
# Check the minimum and maximum of the numeric variable
summary(pokemon$weight_kg) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   0.10    9.00   27.30   61.38   64.80  999.90      20 

In [40]:
# Now do the filtering
# Should we use OR or AND?
filter(pokemon, weight_kg > 100 | weight_kg < 300 ) 

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,1,65,65,45,grass,poison,6.9,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,2,80,80,60,grass,poison,13.0,1,0
"['Overgrow', 'Chlorophyll']",1.00,1.0,1,0.5,0.5,0.50,2.0,2.0,1,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.00,0.5,1.0,1,⋯,88.1,4,60,50,65,fire,,8.5,1,0
"['Blaze', 'Solar Power']",0.50,1.0,1,1.0,0.5,1.00,0.5,1.0,1,⋯,88.1,5,80,65,80,fire,,19.0,1,0
"['Blaze', 'Solar Power']",0.25,1.0,1,2.0,0.5,0.50,0.5,1.0,1,⋯,88.1,6,159,115,100,fire,flying,90.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,7,50,64,43,water,,9.0,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,8,65,80,58,water,,22.5,1,0
"['Torrent', 'Rain Dish']",1.00,1.0,1,2.0,1.0,1.00,0.5,1.0,1,⋯,88.1,9,135,115,78,water,,85.5,1,0
"['Shield Dust', 'Run Away']",1.00,1.0,1,1.0,1.0,0.50,2.0,2.0,1,⋯,50.0,10,20,20,45,bug,,2.9,1,0


In [53]:
# Find out which Pokémon were left out
confusion <- filter(pokemon, weight_kg > 100 | weight_kg < 300)

# The name column has no repeated values
length(unique(pokemon$name))

# Dimensions of the data frame that is causing the confusion
dim(confusion)

# Store the names of the Pokémon in confusion as a vector
nombres_confusion <- confusion$name

# Keep the rows whose names are not in that vector
los_excluidos <- filter(pokemon, !name %in% nombres_confusion)

In [6]:
filter(pokemon, weight_kg > 100 & weight_kg < 300 )

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Intimidate', 'Flash Fire', 'Justified']",0.50,1.0,1.0,1.0,0.5,1.00,0.50,1.00,1.0,⋯,75.4,59,100,80,95,fire,,155.0,1,0
"['Guts', 'No Guard', 'Steadfast']",0.50,0.5,1.0,1.0,2.0,1.00,1.00,2.00,1.0,⋯,75.4,68,65,85,55,fighting,,130.0,1,0
"['Thick Fat', 'Hydration', 'Ice Body']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,87,70,95,70,water,ice,120.0,1,0
"['Shell Armor', 'Skill Link', 'Overcoat']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,91,85,45,70,water,ice,132.5,1,0
"['Rock Head', 'Sturdy', 'Weak Armor']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,95,30,45,70,rock,ground,210.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,111,30,30,25,ground,rock,115.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,112,45,45,40,ground,rock,120.0,1,0
"['Intimidate', 'Moxie']",0.50,1.0,1.0,4.0,1.0,0.50,0.50,1.00,1.0,⋯,50.0,130,70,130,81,water,flying,235.0,1,0
"['Water Absorb', 'Shell Armor', 'Hydration']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,131,85,95,60,water,ice,220.0,1,0
"['Inner Focus', 'Multiscale']",0.50,1.0,2.0,1.0,2.0,0.50,0.50,1.00,1.0,⋯,50.0,149,100,100,80,dragon,flying,210.0,1,0


In [54]:
los_excluidos$weight_kg
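
The OR condition is true for every non-missing weight, so the only rows it can drop are those where `weight_kg` is `NA`, which `summary()` reported as 20 rows above. A comparison with `NA` yields `NA`, and `filter()` keeps only the rows where the condition is `TRUE`. A minimal sketch on made-up data (the `toy` tibble is an illustrative assumption):

```r
library(dplyr)

toy <- tibble(
  name = c("a", "b", "c", "d"),
  weight_kg = c(6.9, 155, 999.9, NA)
)

# The comparison itself returns NA for the missing weight.
condition <- toy$weight_kg > 100 | toy$weight_kg < 300
condition  # TRUE TRUE TRUE NA

# filter() drops the NA row and keeps everything else.
kept <- filter(toy, weight_kg > 100 | weight_kg < 300)
kept$name
```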

In [68]:
filter(pokemon, weight_kg >= 100 & weight_kg <= 300 ) 

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.00,1.0,1.0,0.5,0.5,0.50,2.00,2.00,1.0,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Intimidate', 'Flash Fire', 'Justified']",0.50,1.0,1.0,1.0,0.5,1.00,0.50,1.00,1.0,⋯,75.4,59,100,80,95,fire,,155.0,1,0
"['Guts', 'No Guard', 'Steadfast']",0.50,0.5,1.0,1.0,2.0,1.00,1.00,2.00,1.0,⋯,75.4,68,65,85,55,fighting,,130.0,1,0
"['Thick Fat', 'Hydration', 'Ice Body']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,87,70,95,70,water,ice,120.0,1,0
"['Shell Armor', 'Skill Link', 'Overcoat']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,91,85,45,70,water,ice,132.5,1,0
"['Rock Head', 'Sturdy', 'Weak Armor']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,95,30,45,70,rock,ground,210.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,111,30,30,25,ground,rock,115.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,112,45,45,40,ground,rock,120.0,1,0
"['Intimidate', 'Moxie']",0.50,1.0,1.0,4.0,1.0,0.50,0.50,1.00,1.0,⋯,50.0,130,70,130,81,water,flying,235.0,1,0
"['Water Absorb', 'Shell Armor', 'Hydration']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,131,85,95,60,water,ice,220.0,1,0


In [69]:
# Use between(); both endpoints are included
filter(pokemon, between(weight_kg, 100, 300)) 

abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,⋯,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
"['Overgrow', 'Chlorophyll']",1.00,1.0,1.0,0.5,0.5,0.50,2.00,2.00,1.0,⋯,88.1,3,122,120,80,grass,poison,100.0,1,0
"['Intimidate', 'Flash Fire', 'Justified']",0.50,1.0,1.0,1.0,0.5,1.00,0.50,1.00,1.0,⋯,75.4,59,100,80,95,fire,,155.0,1,0
"['Guts', 'No Guard', 'Steadfast']",0.50,0.5,1.0,1.0,2.0,1.00,1.00,2.00,1.0,⋯,75.4,68,65,85,55,fighting,,130.0,1,0
"['Thick Fat', 'Hydration', 'Ice Body']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,87,70,95,70,water,ice,120.0,1,0
"['Shell Armor', 'Skill Link', 'Overcoat']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,91,85,45,70,water,ice,132.5,1,0
"['Rock Head', 'Sturdy', 'Weak Armor']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,95,30,45,70,rock,ground,210.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,111,30,30,25,ground,rock,115.0,1,0
"['Lightningrod', 'Rock Head', 'Reckless']",1.00,1.0,1.0,0.0,1.0,2.00,0.50,0.50,1.0,⋯,50.0,112,45,45,40,ground,rock,120.0,1,0
"['Intimidate', 'Moxie']",0.50,1.0,1.0,4.0,1.0,0.50,0.50,1.00,1.0,⋯,50.0,130,70,130,81,water,flying,235.0,1,0
"['Water Absorb', 'Shell Armor', 'Hydration']",1.00,1.0,1.0,2.0,1.0,2.00,1.00,1.00,1.0,⋯,50.0,131,85,95,60,water,ice,220.0,1,0


### Exercise
Build a subset with the columns name, type 1, classification, abilities, weight and legendary status for the water, fire, ice and electric Pokémon.

In [71]:
# Look at the column names again
colnames(pokemon)

In [74]:
# Build vectors with the requested columns and types
columnas2 <- c("name", 'type1','classfication','abilities', 'weight_kg', 'is_legendary')
tipos <- c('water', 'ice', 'fire', 'electric')
pokemon2 <- pokemon %>%
    select(all_of(columnas2)) %>%
    filter(type1 %in% tipos)

head(pokemon2)

name,type1,classfication,abilities,weight_kg,is_legendary
<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>
Charmander,fire,Lizard Pokémon,"['Blaze', 'Solar Power']",8.5,0
Charmeleon,fire,Flame Pokémon,"['Blaze', 'Solar Power']",19.0,0
Charizard,fire,Flame Pokémon,"['Blaze', 'Solar Power']",90.5,0
Squirtle,water,Tiny Turtle Pokémon,"['Torrent', 'Rain Dish']",9.0,0
Wartortle,water,Turtle Pokémon,"['Torrent', 'Rain Dish']",22.5,0
Blastoise,water,Shellfish Pokémon,"['Torrent', 'Rain Dish']",85.5,0
Pikachu,electric,Mouse Pokémon,"['Static', 'Lightningrod']",6.0,0
Raichu,electric,Mouse Pokémon,"['Static', 'Lightningrod', 'Surge Surfer']",,0
Vulpix,fire,Fox Pokémon,"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",,0
Ninetales,fire,Fox Pokémon,"['Flash Fire', 'Drought', 'Snow Cloak', 'Snow Warning']",,0


### `arrange()`
Sorts the data frame by the values in a column. By default it sorts in ascending order.

In [76]:
#----arrange()----
# Sort the Pokémon names alphabetically
arrange(pokemon2, name) 

name,type1,classfication,abilities,weight_kg,is_legendary
<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>
Alomomola,water,Caring Pokémon,"['Healer', 'Hydration', 'Regenerator']",31.6,0
Ampharos,electric,Light Pokémon,"['Static', 'Plus']",61.5,0
Araquanid,water,Water Bubble Pokémon,"['Water Bubble', 'Water Absorb']",82.0,0
Arcanine,fire,Legendary Pokémon,"['Intimidate', 'Flash Fire', 'Justified']",155.0,0
Articuno,ice,Freeze Pokémon,"['Pressure', 'Snow Cloak']",55.4,1
Avalugg,ice,Iceberg Pokémon,"['Own Tempo', 'Ice Body', 'Sturdy']",505.0,0
Azumarill,water,Aquarabbit Pokémon,"['Thick Fat', 'Huge Power', 'Sap Sipper']",28.5,0
Barboach,water,Whiskers Pokémon,"['Oblivious', 'Anticipation', 'Hydration']",1.9,0
Basculin,water,Hostile Pokémon,"['Reckless', 'Rock Head', 'Adaptability', 'Mold Breaker']",18.0,0
Beartic,ice,Freezing Pokémon,"['Snow Cloak', 'Slush Rush', 'Swift Swim']",260.0,0


In [77]:
# Sort by name alphabetically, then by weight from heaviest to lightest
arrange(pokemon2, name, desc(weight_kg)) 

name,type1,classfication,abilities,weight_kg,is_legendary
<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>
Alomomola,water,Caring Pokémon,"['Healer', 'Hydration', 'Regenerator']",31.6,0
Ampharos,electric,Light Pokémon,"['Static', 'Plus']",61.5,0
Araquanid,water,Water Bubble Pokémon,"['Water Bubble', 'Water Absorb']",82.0,0
Arcanine,fire,Legendary Pokémon,"['Intimidate', 'Flash Fire', 'Justified']",155.0,0
Articuno,ice,Freeze Pokémon,"['Pressure', 'Snow Cloak']",55.4,1
Avalugg,ice,Iceberg Pokémon,"['Own Tempo', 'Ice Body', 'Sturdy']",505.0,0
Azumarill,water,Aquarabbit Pokémon,"['Thick Fat', 'Huge Power', 'Sap Sipper']",28.5,0
Barboach,water,Whiskers Pokémon,"['Oblivious', 'Anticipation', 'Hydration']",1.9,0
Basculin,water,Hostile Pokémon,"['Reckless', 'Rock Head', 'Adaptability', 'Mold Breaker']",18.0,0
Beartic,ice,Freezing Pokémon,"['Snow Cloak', 'Slush Rush', 'Swift Swim']",260.0,0
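
Since every name in this dataset is unique, the second sort key in the call above never has a tie to break, which is why the output is identical to the previous cell. The effect of `desc()` shows up when the first key repeats. A minimal sketch on made-up data (the `toy` tibble and its values are illustrative, not taken from the dataset):

```r
library(dplyr)

toy <- tibble(
  type1 = c("water", "fire", "water", "fire"),
  name = c("A", "B", "C", "D"),
  weight_kg = c(9, 90.5, 85.5, 8.5)
)

# Sort by type alphabetically, then heaviest first within each type.
sorted <- arrange(toy, type1, desc(weight_kg))
sorted$name  # "B" "D" "C" "A"
```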


## `mutate()`
Adds new variables and keeps the existing ones.

In [7]:
#----mutate()----
# Add up all the `against` variables.
# Option 1
pokemon %>% 
  select(name, contains("against")) %>% 
  mutate(total = rowSums(select(., -name))) %>% 
  select(name, total)

name,total
<chr>,<dbl>
Bulbasaur,19.25
Ivysaur,19.25
Venusaur,19.25
Charmander,18.00
Charmeleon,18.00
Charizard,18.50
Squirtle,18.00
Wartortle,18.00
Blastoise,18.00
Caterpie,19.50


In [8]:
# Option 2
pokemon %>% 
  select(name, contains("against")) %>% 
  mutate(total = reduce(select(., -name), `+`)) 

name,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water,total
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Bulbasaur,1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5,19.25
Ivysaur,1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5,19.25
Venusaur,1.00,1,1,0.5,0.5,0.50,2.0,2.0,1,0.25,1.0,2.0,1,1.0,2,1.0,1.0,0.5,19.25
Charmander,0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0,18.00
Charmeleon,0.50,1,1,1.0,0.5,1.00,0.5,1.0,1,0.50,2.0,0.5,1,1.0,1,2.0,0.5,2.0,18.00
Charizard,0.25,1,1,2.0,0.5,0.50,0.5,1.0,1,0.25,0.0,1.0,1,1.0,1,4.0,0.5,2.0,18.50
Squirtle,1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5,18.00
Wartortle,1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5,18.00
Blastoise,1.00,1,1,2.0,1.0,1.00,0.5,1.0,1,2.00,1.0,0.5,1,1.0,1,1.0,0.5,0.5,18.00
Caterpie,1.00,1,1,1.0,1.0,0.50,2.0,2.0,1,0.50,0.5,1.0,1,1.0,1,2.0,1.0,1.0,19.50
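
A third way to get the same row totals, shown here as a sketch rather than as the notebook's own method, is to combine `rowSums()` with `across()`, which avoids the `.` placeholder. It assumes dplyr 1.0.0 or later, and the `toy` tibble is made-up data:

```r
library(dplyr)

toy <- tibble(
  name = c("Bulbasaur", "Charmander"),
  against_fire = c(2, 0.5),
  against_water = c(0.5, 2),
  against_ice = c(2, 0.5)
)

# across() hands the selected columns to rowSums() as a data frame.
totals <- toy %>%
  mutate(total = rowSums(across(contains("against"))))

totals$total  # 4.5 3.0
```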


## `group_by()` and `summarise()`

These functions compute descriptive statistics for each category of a grouping variable.

Example:
What is the mean weight of the Pokémon for each type?

In [84]:
#----group_by() y summarise()----
pokemon2 %>% 
    group_by(type1) %>%
    summarise(across(weight_kg, .fns = list(media = mean))) 

type1,weight_kg_media
<chr>,<dbl>
electric,
fire,
ice,103.26087
water,51.07193


In [85]:
# Check that there are NAs
filter(pokemon2, type1 == "electric") %>%
    select(weight_kg) 

weight_kg
<dbl>
6.0
""
6.0
60.0
10.4
66.6
30.0
24.5
52.6
2.0


In [86]:
# Drop the NAs inside mean() so that every group gets a value
pokemon2 %>% 
    group_by(type1) %>%
    summarise(across(weight_kg, .fns = list(media = ~ mean(.x, na.rm = TRUE))))

type1,weight_kg_media
<chr>,<dbl>
electric,37.94474
fire,66.096
ice,103.26087
water,51.07193
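
When a mean is computed with `na.rm = TRUE`, it can help to report how many values were dropped in each group. A minimal sketch on made-up data (the `toy` tibble and the column names `n`, `n_missing` and `mean_weight` are illustrative choices):

```r
library(dplyr)

toy <- tibble(
  type1 = c("fire", "fire", "water", "water", "water"),
  weight_kg = c(8.5, NA, 9, 22.5, 85.5)
)

resumen <- toy %>%
  group_by(type1) %>%
  summarise(
    n = n(),                                     # rows per group
    n_missing = sum(is.na(weight_kg)),           # missing weights per group
    mean_weight = mean(weight_kg, na.rm = TRUE)  # mean of the non-missing weights
  )

resumen
```

In this toy example the `fire` group has two rows but only one usable weight, which the plain mean alone would not reveal.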
