# Descriptive Statistics with Python and R


## Index:
* [Data-set](#1)
* [Selecting](#2)
* [Sampling](#3)
* [Filtering](#4)
* [Mutate](#5)
* [Arrange](#6)
* [Rename](#7)
* [Gather](#8)
* [Spread](#9)
* [Separate](#10)
* [Unite](#11)
* [Joins](#12)
  * [Inner Join](#13)
  * [Full Join](#14)
  * [Left Join](#15)
  * [Right Join](#16)
  * [Semi Join](#17)
  * [Anti Join](#18)
  * [Union](#19)
  * [Intersect](#20)
  * [Difference](#21)
* [Concatenate](#22)
* [Group and Summarize](#23)
* [Other useful functions](#24)

## Data-Set <a class="anchor" id="1"></a>

We first load the data set that we will use throughout most of this article:

Working with `Python`:

In [92]:
import pandas as pd

from IPython.display import display
pd.options.display.max_columns = None

import warnings
warnings.filterwarnings('ignore')

In [93]:
url = 'https://raw.githubusercontent.com/FabioScielzoOrtiz/Estadistica4all-blog/main/Descriptive%20Statisitcs%20in%20Python%20and%20R/datosECV_Aragon.csv'

data_Python = pd.read_csv(url)

data_Python

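|  | ca | datosECVmas16.prov | nomprov | datosECVmas16.gen | edad | nac | neduc | sitlab | ingnorm | horas | factorel |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 22 | Huesca | 1 | 3 | 1 | 3 | 1 | 21237.1 | 36.42 | 393.7 |
| 1 | 2 | 22 | Huesca | 2 | 2 | 1 | 2 | 1 | 17810.8 | 31.72 | 393.7 |
| 2 | 2 | 22 | Huesca | 1 | 1 | 1 | 2 | 1 | 11889.1 | 31.88 | 393.7 |
| 3 | 2 | 22 | Huesca | 1 | 1 | 1 | 2 | 1 | 16000.5 | 38.18 | 393.7 |
| 4 | 2 | 22 | Huesca | 1 | 1 | 1 | 2 | 3 | 21169.6 | 0.00 | 393.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1231 | 2 | 50 | Zaragoza | 1 | 2 | 1 | 2 | 1 | 11760.6 | 28.79 | 2206.1 |
| 1232 | 2 | 50 | Zaragoza | 1 | 2 | 1 | 2 | 2 | 19321.6 | 0.00 | 124.4 |
| 1233 | 2 | 50 | Zaragoza | 2 | 2 | 1 | 2 | 1 | 19924.8 | 37.21 | 124.4 |
| 1234 | 2 | 50 | Zaragoza | 2 | 3 | 1 | 1 | 3 | 13042.5 | 0.00 | 246.5 |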
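
As a quick sanity check after loading, it can help to look at the shape and the column types. The sketch below is only a stand-in for the real load: it reads the first three rows shown above from an in-memory string instead of the remote URL, so it runs offline. The names `csv_text` and `sample` are hypothetical and do not appear in the notebook.

```python
import io

import pandas as pd

# First three rows of the table above, written as CSV text.
# This is a small stand-in for the remote file, not the full data set.
csv_text = """ca,datosECVmas16.prov,nomprov,datosECVmas16.gen,edad,nac,neduc,sitlab,ingnorm,horas,factorel
2,22,Huesca,1,3,1,3,1,21237.1,36.42,393.7
2,22,Huesca,2,2,1,2,1,17810.8,31.72,393.7
2,22,Huesca,1,1,1,2,1,11889.1,31.88,393.7
"""

# read_csv accepts any file-like object, so StringIO works like a file here.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # one inferred type per column
```

With the real file, the same two calls on `data_Python` would report the full number of rows and the same eleven columns.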


In [94]:
# pip install dfply 

In [95]:
from dfply import *

In [96]:
list(range(2,10)) 

[2, 3, 4, 5, 6, 7, 8, 9]

In [97]:
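# Keep the columns at positions 2 to 9 (nomprov to horas):
# transpose, slice the rows by position, then transpose back.
data_Python = (data_Python.T >> row_slice(list(range(2, 10)))).T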

data_Python

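|  | nomprov | datosECVmas16.gen | edad | nac | neduc | sitlab | ingnorm | horas |
|---|---|---|---|---|---|---|---|---|
| 0 | Huesca | 1 | 3 | 1 | 3 | 1 | 21237.1 | 36.42 |
| 1 | Huesca | 2 | 2 | 1 | 2 | 1 | 17810.8 | 31.72 |
| 2 | Huesca | 1 | 1 | 1 | 2 | 1 | 11889.1 | 31.88 |
| 3 | Huesca | 1 | 1 | 1 | 2 | 1 | 16000.5 | 38.18 |
| 4 | Huesca | 1 | 1 | 1 | 2 | 3 | 21169.6 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1231 | Zaragoza | 1 | 2 | 1 | 2 | 1 | 11760.6 | 28.79 |
| 1232 | Zaragoza | 1 | 2 | 1 | 2 | 2 | 19321.6 | 0.0 |
| 1233 | Zaragoza | 2 | 2 | 1 | 2 | 1 | 19924.8 | 37.21 |
| 1234 | Zaragoza | 2 | 3 | 1 | 1 | 3 | 13042.5 | 0.0 |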
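
One side effect of the transpose round trip is worth knowing: transposing a frame with mixed column types produces `object` columns, and they stay `object` after transposing back. The sketch below compares that approach with direct positional selection through `iloc`, which is offered here as an alternative and is not the notebook's own method. It uses a three-row sample copied from the rows shown earlier; `sample`, `via_transpose`, and `via_iloc` are hypothetical names.

```python
import pandas as pd

# Three rows copied from the table shown earlier (not the full data set).
sample = pd.DataFrame({
    "ca": [2, 2, 2],
    "datosECVmas16.prov": [22, 22, 22],
    "nomprov": ["Huesca", "Huesca", "Huesca"],
    "datosECVmas16.gen": [1, 2, 1],
    "edad": [3, 2, 1],
    "nac": [1, 1, 1],
    "neduc": [3, 2, 2],
    "sitlab": [1, 1, 1],
    "ingnorm": [21237.1, 17810.8, 11889.1],
    "horas": [36.42, 31.72, 31.88],
    "factorel": [393.7, 393.7, 393.7],
})

# Transpose, keep rows 2..9, transpose back (plain-pandas version of the cell above).
via_transpose = sample.T.iloc[2:10].T

# Select the columns at positions 2..9 directly.
via_iloc = sample.iloc[:, 2:10]

print(list(via_iloc.columns))
print(via_transpose.dtypes["ingnorm"])  # object
print(via_iloc.dtypes["ingnorm"])       # float64
```

Both approaches keep the same eight columns. If numeric operations on `ingnorm` or `horas` follow, the `iloc` version avoids having to convert the columns back to numbers.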


In [98]:
# Rename two columns; the result is displayed but not assigned back to data_Python.
data_Python >> rename(provincia=X.nomprov, ingresos=X.ingnorm)

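|  | provincia | datosECVmas16.gen | edad | nac | neduc | sitlab | ingresos | horas |
|---|---|---|---|---|---|---|---|---|
| 0 | Huesca | 1 | 3 | 1 | 3 | 1 | 21237.1 | 36.42 |
| 1 | Huesca | 2 | 2 | 1 | 2 | 1 | 17810.8 | 31.72 |
| 2 | Huesca | 1 | 1 | 1 | 2 | 1 | 11889.1 | 31.88 |
| 3 | Huesca | 1 | 1 | 1 | 2 | 1 | 16000.5 | 38.18 |
| 4 | Huesca | 1 | 1 | 1 | 2 | 3 | 21169.6 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1231 | Zaragoza | 1 | 2 | 1 | 2 | 1 | 11760.6 | 28.79 |
| 1232 | Zaragoza | 1 | 2 | 1 | 2 | 2 | 19321.6 | 0.0 |
| 1233 | Zaragoza | 2 | 2 | 1 | 2 | 1 | 19924.8 | 37.21 |
| 1234 | Zaragoza | 2 | 3 | 1 | 1 | 3 | 13042.5 | 0.0 |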
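
For comparison, the same renaming can be sketched with pandas alone through `DataFrame.rename(columns=...)`. This is an alternative to the dfply call above, not the notebook's own code. Because the mapping uses plain strings, it also handles a column name that contains a dot, such as `datosECVmas16.gen`. The two-row `sample` frame and the name `renamed` are hypothetical.

```python
import pandas as pd

# Two rows copied from the table above, restricted to the columns being renamed.
sample = pd.DataFrame({
    "nomprov": ["Huesca", "Huesca"],
    "datosECVmas16.gen": [1, 2],
    "ingnorm": [21237.1, 17810.8],
})

# The mapping is {old name: new name}; rename returns a new DataFrame.
renamed = sample.rename(columns={
    "nomprov": "provincia",
    "datosECVmas16.gen": "genero",
    "ingnorm": "ingresos",
})

print(list(renamed.columns))
```

As with the dfply version, the original frame is left unchanged unless the result is assigned back.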


Working with `R`:

In [99]:
import rpy2

In [100]:
%load_ext rpy2.ipython


In [101]:
import rpy2.robjects as robjects

In [102]:
%%R

library(tidyverse)

url = 'https://raw.githubusercontent.com/FabioScielzoOrtiz/Estadistica4all-blog/main/Descriptive%20Statisitcs%20in%20Python%20and%20R/datosECV_Aragon.csv'

data_R <- read_csv(url)

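# Keep columns 3 to 10 (nomprov to horas), matching the Python selection above
data_R <- data_R %>% select(3:10)

# Rename three columns (new name = old name)
data_R <- data_R %>% rename("genero"="datosECVmas16.gen",
         "provincia"="nomprov", "ingresos"="ingnorm")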

data_R <- as.data.frame(data_R)

Rows: 1236 Columns: 11
-- Column specification --------------------------------------------------------
Delimiter: ","
chr  (1): nomprov
dbl (10): ca, datosECVmas16.prov, datosECVmas16.gen, edad, nac, neduc, sitla...

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [103]:
%%R

head(data_R)

  provincia genero edad nac neduc sitlab ingresos horas
1    Huesca      1    3   1     3      1  21237.1 36.42
2    Huesca      2    2   1     2      1  17810.8 31.72
3    Huesca      1    1   1     2      1  11889.1 31.88
4    Huesca      1    1   1     2      1  16000.5 38.18
5    Huesca      1    1   1     2      3  21169.6  0.00
6    Huesca      1    2   1     2      1  16001.3 34.52
