# Data Manipulation in Python for R users

## Index:
* [Head and Tail](#1)
* [Selecting](#2)
* [Sampling](#3)
* [Filtering](#4)
* [Mutate](#5)
* [Arrange](#6)
* [Rename](#7)
* [Gather](#8)
* [Spread](#9)
* [Separate](#10)
* [Unite](#11)
* [Joins](#12)
* *  [Inner Join](#13)
* * [Full Join](#14)
* * [Left Join](#15)
* * [Right Join](#16)
* * [Semi Join](#17)
* * [Anti Join](#18)
* * [Union](#19)
* * [Intersect](#20)
* * [Difference](#21)
*  [Concatenate](#22)
*  [Group and Summarize](#23)
*  [Other useful functions](#24)



This article is about data manipulation in Python, but it is written with data manipulation in R in mind. For that reason we are not going to focus on Pandas, the classic Python package for this purpose. Instead, we will use a package inspired by dplyr, which is probably the most used R package for data manipulation.

`Dfply` is a Python package whose main goal is to replicate the dplyr R package. Other Python packages, such as `dplython`, pursue the same goal, but `Dfply` is more developed.

All credit goes to Kiefer Katovich, the developer of the `Dfply` package.

Here is the repository of the package, on which this article is based:

https://github.com/kieferk/dfply

In [3]:
import pandas as pd

from IPython.display import display
pd.options.display.max_columns = None

import warnings
warnings.filterwarnings('ignore')

We can install `dfply` as follows:

In [4]:
# pip install dfply 

In [5]:
from dfply import *
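
The examples in this article chain operations with `>>`, which plays the role of `%>%` in R. As a rough intuition only, the sketch below shows one way such a pipe can be built in plain Python with the `__rrshift__` method. The `Pipe` class and the `take` helper are hypothetical names made up for this illustration; this is not dfply's actual implementation.

```python
class Pipe:
    """Wrap a function so that `value >> Pipe(...)` calls it on `value`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, left):
        # Python falls back to this method when `left` does not
        # implement `>>` itself (a plain list does not).
        return self.func(left, *self.args, **self.kwargs)


def take(n):
    """Hypothetical verb: keep the first n items, similar in spirit to head."""
    return Pipe(lambda rows, k: rows[:k], n)


rows = [{"price": 100}, {"price": 250}, {"price": 175}]
first_two = rows >> take(2)
print(first_two)
```

Because the left operand is passed as the first argument of the wrapped function, several such steps can be chained from left to right, just like a dplyr pipeline.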

We load the data-set that we will work with.

We will use the following URL to obtain the data:

https://raw.githubusercontent.com/FabioScielzoOrtiz/Estadistica4all-blog/main/Data%20Manipulation%20in%20Python/properties_data.csv

In [6]:
url = 'https://raw.githubusercontent.com/FabioScielzoOrtiz/Estadistica4all-blog/main/Data%20Manipulation%20in%20Python/properties_data.csv'
data = pd.read_csv(url)
data

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),25.206500,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True


## Head and Tail <a class="anchor" id="1"></a>

We can see the head of the data-set:

In [7]:
data >> head(5)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.0,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False


We can also see the tail of the data-set:

In [8]:
data >> tail(5)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),25.2065,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True
1904,7706643,Jumeirah Lake Towers,25.07913,55.154713,760887,800,951.11,1,2,High,False,False,True,True,True,True,True,True,True,True,True,True,False,False,False,False,False,False,False,True,True,True,True,False,False,False,True,False
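
For comparison, plain Pandas exposes the same two operations as data-frame methods. The sketch below uses a small toy data-frame with hypothetical values instead of the properties data-set, so that it runs without downloading anything.

```python
import pandas as pd

# Toy data with hypothetical values, standing in for the properties data-set.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "price": [100, 250, 175, 300, 90, 410],
})

first_rows = toy.head(2)  # the first 2 rows
last_rows = toy.tail(2)   # the last 2 rows

print(first_rows)
print(last_rows)
```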


## Selecting <a class="anchor" id="2"></a>

Selecting columns of the data-set:

In [9]:
data >> select(X.price , X.size_in_sqft, X.no_of_bedrooms , X.no_of_bathrooms)

Unnamed: 0,price,size_in_sqft,no_of_bedrooms,no_of_bathrooms
0,2700000,1079,1,2
1,2850000,1582,2,2
2,1150000,1951,3,5
3,2850000,2020,2,3
4,1729200,507,0,1
...,...,...,...,...
1900,1500000,1087,2,2
1901,1230000,760,1,2
1902,2900000,1930,3,5
1903,675000,740,1,2


In [10]:
data >> select( 4, 7, 8)

Unnamed: 0,price,no_of_bedrooms,no_of_bathrooms
0,2700000,1,2
1,2850000,2,2
2,1150000,3,5
3,2850000,2,3
4,1729200,0,1
...,...,...,...
1900,1500000,2,2
1901,1230000,1,2
1902,2900000,3,5
1903,675000,1,2
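
The two selections above pick columns by name and by zero-based position. A rough Pandas counterpart, sketched on a toy data-frame with hypothetical values, looks like this:

```python
import pandas as pd

# Toy data with hypothetical values.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "neighborhood": ["A", "B", "C"],
    "price": [100, 250, 175],
    "size_in_sqft": [900, 1200, 1000],
})

# Select columns by name.
by_name = toy[["price", "size_in_sqft"]]

# Select the same columns by zero-based position.
by_position = toy.iloc[:, [2, 3]]

print(by_name)
```

As in the `select(4, 7, 8)` call, positions start at 0, which differs from R, where positions start at 1.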


We can also select all the columns (variables) except some of them:

In [11]:
data >> select( ~X.neighborhood , ~X.id )

Unnamed: 0,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,25.206500,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True


Note that if you are going to use the inversion operator `~`, you must place it before the symbolic `X`.
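
In plain Pandas, excluding columns is usually written with `drop`. A minimal sketch on a toy data-frame with hypothetical values:

```python
import pandas as pd

# Toy data with hypothetical values.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "neighborhood": ["A", "B", "C"],
    "latitude": [25.0, 25.1, 25.2],
    "price": [100, 250, 175],
})

# Keep every column except the two listed ones.
without_id_cols = toy.drop(columns=["neighborhood", "id"])

print(list(without_id_cols.columns))
```

`drop` returns a new data-frame and leaves `toy` unchanged.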

In [12]:
data >> select(starts_with('p')) >> head(5)

Unnamed: 0,price,price_per_sqft,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool
0,2700000,2502.32,True,False,False,False,False
1,2850000,1801.52,False,False,False,False,False
2,1150000,589.44,False,False,False,True,False
3,2850000,1410.89,True,False,False,False,False
4,1729200,3410.65,False,False,False,False,False


Now we are going to see more specific functions for selecting columns, such as `starts_with` (used above), `ends_with`, `contains` and `columns_between`.

In [13]:
data >> select( ~ starts_with('p')) >> head(5)

Unnamed: 0,id,neighborhood,latitude,longitude,size_in_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.0,55.138932,1079,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,1582,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1951,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,25.227295,55.341761,2020,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,25.114275,55.139764,507,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,True,True,True,True,False,False,True,True,False


In [14]:
data >> select( ends_with('e')) >> head(5)

Unnamed: 0,latitude,longitude,price,concierge,maid_service
0,25.0,55.138932,2700000,True,False
1,25.106809,55.151201,2850000,False,False
2,25.063302,55.137728,1150000,False,False
3,25.227295,55.341761,2850000,True,False
4,25.114275,55.139764,1729200,False,False


In [15]:
data >> select(contains('of'))

Unnamed: 0,no_of_bedrooms,no_of_bathrooms,view_of_landmark,view_of_water
0,1,2,False,True
1,2,2,False,True
2,3,5,True,True
3,2,3,False,False
4,0,1,True,True
...,...,...,...,...
1900,2,2,True,True
1901,1,2,False,True
1902,3,5,False,False
1903,1,2,False,True
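
The name-based helpers above (`starts_with`, `ends_with`, `contains`) can be mimicked in plain Pandas with boolean masks over the column names. This is only a sketch of equivalent selections on a toy data-frame with hypothetical values, not a description of how dfply implements them.

```python
import pandas as pd

# Toy data with hypothetical values; only the column names matter here.
toy = pd.DataFrame({
    "id": [1, 2],
    "price": [100, 250],
    "price_per_sqft": [1.5, 2.0],
    "latitude": [25.0, 25.1],
    "no_of_bedrooms": [1, 2],
    "view_of_water": [True, False],
})

# Each mask is a boolean array with one entry per column.
starts_p = toy.loc[:, toy.columns.str.startswith("p")]
not_starts_p = toy.loc[:, ~toy.columns.str.startswith("p")]
ends_e = toy.loc[:, toy.columns.str.endswith("e")]
contains_of = toy.loc[:, toy.columns.str.contains("of")]

print(list(starts_p.columns))
print(list(contains_of.columns))
```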


In [16]:
data >> select(columns_between( X.latitude , X.size_in_sqft , inclusive=True))

Unnamed: 0,latitude,longitude,price,size_in_sqft
0,25.000000,55.138932,2700000,1079
1,25.106809,55.151201,2850000,1582
2,25.063302,55.137728,1150000,1951
3,25.227295,55.341761,2850000,2020
4,25.114275,55.139764,1729200,507
...,...,...,...,...
1900,25.176892,55.310712,1500000,1087
1901,25.166145,55.276684,1230000,760
1902,25.206500,55.345056,2900000,1930
1903,25.073858,55.229844,675000,740


In [17]:
data >> select( ~ columns_between( X.latitude , X.size_in_sqft , inclusive=True))

Unnamed: 0,id,neighborhood,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True


In [18]:
data >> select(columns_between( X.latitude , X.size_in_sqft , inclusive=False))

Unnamed: 0,latitude,longitude,price
0,25.000000,55.138932,2700000
1,25.106809,55.151201,2850000
2,25.063302,55.137728,1150000
3,25.227295,55.341761,2850000
4,25.114275,55.139764,1729200
...,...,...,...
1900,25.176892,55.310712,1500000
1901,25.166145,55.276684,1230000
1902,25.206500,55.345056,2900000
1903,25.073858,55.229844,675000


In [19]:
data >> select(columns_between( 2 , 5 , inclusive=True))

Unnamed: 0,latitude,longitude,price,size_in_sqft
0,25.000000,55.138932,2700000,1079
1,25.106809,55.151201,2850000,1582
2,25.063302,55.137728,1150000,1951
3,25.227295,55.341761,2850000,2020
4,25.114275,55.139764,1729200,507
...,...,...,...,...
1900,25.176892,55.310712,1500000,1087
1901,25.166145,55.276684,1230000,760
1902,25.206500,55.345056,2900000,1930
1903,25.073858,55.229844,675000,740


In [20]:
( data >> select(columns_between( 2 , 5 , inclusive=False)) ) 

Unnamed: 0,latitude,longitude,price
0,25.000000,55.138932,2700000
1,25.106809,55.151201,2850000
2,25.063302,55.137728,1150000
3,25.227295,55.341761,2850000
4,25.114275,55.139764,1729200
...,...,...,...
1900,25.176892,55.310712,1500000
1901,25.166145,55.276684,1230000
1902,25.206500,55.345056,2900000
1903,25.073858,55.229844,675000
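
As the outputs above show, `inclusive=False` leaves out the end column but keeps the start column. The same ranges can be sketched in plain Pandas, where label slices include both end points and positional slices exclude the stop position. The toy data-frame below uses hypothetical values and mirrors the first columns of the data-set.

```python
import pandas as pd

# Toy data with hypothetical values; column order mirrors the start of the data-set.
toy = pd.DataFrame({
    "id": [1, 2],
    "neighborhood": ["A", "B"],
    "latitude": [25.0, 25.1],
    "longitude": [55.1, 55.2],
    "price": [100, 250],
    "size_in_sqft": [900, 1200],
    "quality": ["Low", "High"],
})

# Label slices in Pandas include both end points.
inclusive_by_label = toy.loc[:, "latitude":"size_in_sqft"]

# Positional slices exclude the stop position, so 2:6 covers positions 2 to 5.
inclusive_by_position = toy.iloc[:, 2:6]

# Stopping one position earlier leaves out the end column.
exclusive_by_position = toy.iloc[:, 2:5]

print(list(inclusive_by_label.columns))
print(list(exclusive_by_position.columns))
```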


In [21]:
data >> select(columns_to(end_col=X.price , inclusive=True))

Unnamed: 0,id,neighborhood,latitude,longitude,price
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000
3,6326063,Culture Village,25.227295,55.341761,2850000
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200
...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000
1902,7706389,Dubai Creek Harbour (The Lagoons),25.206500,55.345056,2900000
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000


In [22]:
data >> select(columns_from(start_col=X.price))

Unnamed: 0,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True
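
Open-ended label slices give a rough Pandas counterpart of the two selections above: everything up to and including a column, and everything from a column onwards. A sketch on a toy data-frame with hypothetical values:

```python
import pandas as pd

# Toy data with hypothetical values.
toy = pd.DataFrame({
    "id": [1, 2],
    "neighborhood": ["A", "B"],
    "price": [100, 250],
    "size_in_sqft": [900, 1200],
    "quality": ["Low", "High"],
})

# All columns from the first one up to and including "price".
up_to_price = toy.loc[:, :"price"]

# All columns from "price" to the last one.
from_price = toy.loc[:, "price":]

print(list(up_to_price.columns))
print(list(from_price.columns))
```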


We can also select rows of a data-set with `row_slice`:

In [23]:
data >> row_slice([10,15])

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
10,6473651,Downtown Dubai,25.198796,55.271342,3550000,1918,1850.89,3,4,Low,False,True,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False
15,6526806,Palm Jumeirah,25.132021,55.151405,2349990,1109,2119.02,1,1,Medium,False,False,True,True,True,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
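
Selecting rows by position corresponds to `iloc` in plain Pandas. The sketch below, on a toy data-frame with hypothetical values, picks two separate rows and then a consecutive range of rows.

```python
import pandas as pd

# Toy data with hypothetical values.
toy = pd.DataFrame({
    "id": [10, 11, 12, 13, 14, 15],
    "price": [100, 250, 175, 300, 90, 410],
})

# Two separate rows, chosen by zero-based position.
two_rows = toy.iloc[[1, 4]]

# A consecutive range of positions, written as a list and as a slice.
row_range = toy.iloc[list(range(2, 5))]
same_range = toy.iloc[2:5]

print(two_rows)
print(row_range)
```

Note that `range(2, 5)` stops before 5, so it covers positions 2, 3 and 4.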


In [52]:
list(range(2,10)) # the positions 2, 3, ..., 9 (like 2:9 in R)

data >> row_slice(list(range(2,10)))

We can also use `row_slice` to select columns by position, by applying it to the transposed data-set:

In [25]:
(data.T >> row_slice( [2,3,5,7]) ).T 
# selects the 3rd, 4th, 6th and 8th columns of the data-frame (positions 2, 3, 5, 7)

# DataFrame.T is the way in Pandas to transpose a data-frame

Unnamed: 0,latitude,longitude,size_in_sqft,no_of_bedrooms
0,25.0,55.138932,1079,1
1,25.106809,55.151201,1582,2
2,25.063302,55.137728,1951,3
3,25.227295,55.341761,2020,2
4,25.114275,55.139764,507,0
...,...,...,...,...
1900,25.176892,55.310712,1087,2
1901,25.166145,55.276684,760,1
1902,25.2065,55.345056,1930,3
1903,25.073858,55.229844,740,1
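
One caveat of the transpose trick: when the columns have different types, transposing forces Pandas to store everything as generic `object` values, and transposing back does not restore the original types. The sketch below, on a toy data-frame with hypothetical values, compares the round trip with a direct positional selection through `iloc`.

```python
import pandas as pd

# Toy data with hypothetical values and mixed column types.
toy = pd.DataFrame({
    "neighborhood": ["A", "B"],
    "latitude": [25.0, 25.1],
    "price": [100, 250],
})

# Transposing twice gives back the same shape, but the column types are lost.
round_trip = toy.T.T

# Selecting columns by position directly keeps the original types.
direct = toy.iloc[:, [1, 2]]

print(round_trip.dtypes)
print(direct.dtypes)
```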


## Sampling <a class="anchor" id="3"></a>

We can extract a random sample of the rows of a data-set as follows:

In [26]:
data >> sample(frac=0.15, replace=False)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
1287,7662808,Jumeirah Village Circle,25.049772,55.204243,1199999,1361,881.70,2,3,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
1821,7698876,Jumeirah Village Circle,25.048095,55.206373,610502,812,751.85,1,2,Ultra,True,False,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1404,7669561,Dubai Marina,25.086376,55.147360,949999,619,1534.73,1,1,Medium,False,True,True,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,False,True,False
493,7566522,Town Square,25.010570,55.289787,1100000,1462,752.39,3,3,Low,False,True,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,False,False
122,7282005,Dubai Hills Estate,25.115760,55.248755,700000,646,1083.59,1,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
612,7590274,World Trade Center,25.223728,55.285118,2800000,2938,953.03,3,3,Medium,True,True,True,False,True,True,True,False,False,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
1441,7670734,Jumeirah Village Circle,25.058746,55.209389,426000,390,1092.31,0,1,Medium,False,False,True,False,True,True,True,False,False,True,True,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
870,7623173,Culture Village,25.226946,55.343628,1000000,707,1414.43,1,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
533,7580761,The Views,25.093319,55.168887,1300000,1342,968.70,2,2,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [27]:
data >> sample(n=15, replace=False)


Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
1686,7688910,Greens,25.094173,55.169437,1780000,2050,868.29,3,3,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
688,7600585,Palm Jumeirah,25.113366,55.13609,1395000,1953,714.29,2,2,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1438,7670619,Business Bay,25.18902,55.282216,1500000,1741,861.57,2,3,Medium,False,True,True,False,True,True,False,False,False,False,True,False,False,False,True,False,False,False,False,False,True,True,False,True,False,False,False,False
1096,7647809,Jumeirah Village Circle,25.056706,55.207907,603000,997,604.81,1,2,Medium,False,True,True,True,True,True,True,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,True,False,False,False,False
1233,7658635,Jumeirah Lake Towers,25.067019,55.139829,1305000,1338,975.34,2,3,Medium,False,True,True,False,True,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,True,True,False,False,False,False,True,False
1344,7666763,Al Barari,25.099422,55.312967,4300000,4026,1068.06,3,4,Low,False,False,True,False,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
941,7630166,Dubai Marina,25.086376,55.14736,2175000,1312,1657.77,2,3,Medium,False,True,True,False,True,True,False,False,False,True,True,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,True,False
1090,7647086,Jumeirah Village Circle,25.059555,55.201659,750000,1321,567.75,2,3,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
877,7623483,Mirdif,25.22724,55.441623,936000,780,1200.0,1,2,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,False
1895,7705052,Dubai Marina,25.081243,55.14512,1350000,1578,855.51,2,4,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
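
The rows drawn above change on every run. In plain Pandas, `DataFrame.sample` accepts a `random_state` argument that makes the draw reproducible. The sketch below uses a toy data-frame with hypothetical values; whether dfply's `sample` forwards this argument to Pandas is an assumption that is not checked here.

```python
import pandas as pd

# Toy data with hypothetical values.
toy = pd.DataFrame({"id": range(20), "price": range(100, 120)})

# The same seed gives the same rows on every call.
sample_a = toy.sample(n=5, replace=False, random_state=42)
sample_b = toy.sample(n=5, replace=False, random_state=42)

# frac=0.15 of 20 rows draws 3 rows.
fraction = toy.sample(frac=0.15, replace=False, random_state=42)

print(sample_a)
print(len(fraction))
```

With `replace=False` no row can appear twice in a sample, so `n` cannot exceed the number of rows.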


## Filtering <a class="anchor" id="4"></a>

We can obtain the rows with distinct values of one variable (column), keeping the first row found for each value, as follows:

In [28]:
data >> distinct(X.price)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
5,6356784,Palm Jumeirah,25.114275,55.139764,3119900,1015,3073.79,1,2,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1885,7703691,Palm Jumeirah,25.103972,55.149621,31440000,6542,4805.87,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,False,False,True,True,True,True,True,True,False,False,True,True,True
1886,7703701,Dubai Harbour,25.099380,55.141275,4856888,2088,2326.10,3,3,Medium,True,True,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,True,True,True,True,False,False,False,True,False
1894,7704943,Downtown Dubai,25.191107,55.269910,980888,1088,901.55,1,1,Medium,False,True,True,False,False,True,False,False,True,True,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,True
1896,7705124,Downtown Dubai,25.196489,55.272126,18040888,5253,3434.40,4,4,Medium,True,True,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,True,False,False,False,False
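
In the output above, row 3 is missing because its price (2850000) already appears in row 1, so only the first row for each price is kept. This keep-first behaviour can be sketched on a toy data-frame with hypothetical ids and a repeated price, using plain Pandas:

```python
import pandas as pd

# Toy data with hypothetical ids; the price 2850000 appears twice.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": [2700000, 2850000, 1150000, 2850000],
})

# Keep the first row for each distinct price and drop later repeats.
first_per_price = toy.drop_duplicates(subset="price", keep="first")

print(first_per_price)
```

The original index labels are preserved, which is why gaps appear in the index of the result.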


We get the same result with `drop_duplicates`, which is a `pandas` DataFrame method:

In [29]:
data.drop_duplicates('price')

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
5,6356784,Palm Jumeirah,25.114275,55.139764,3119900,1015,3073.79,1,2,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1885,7703691,Palm Jumeirah,25.103972,55.149621,31440000,6542,4805.87,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,False,False,True,True,True,True,True,True,False,False,True,True,True
1886,7703701,Dubai Harbour,25.099380,55.141275,4856888,2088,2326.10,3,3,Medium,True,True,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,True,True,True,True,False,False,False,True,False
1894,7704943,Downtown Dubai,25.191107,55.269910,980888,1088,901.55,1,1,Medium,False,True,True,False,False,True,False,False,True,True,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,True
1896,7705124,Downtown Dubai,25.196489,55.272126,18040888,5253,3434.40,4,4,Medium,True,True,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,True,False,False,False,False
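
As a small illustration of how `drop_duplicates` decides which row survives, the sketch below uses a toy DataFrame (hypothetical values, not the properties data). By default the first row of each duplicated value is kept; the `keep` argument changes that.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the properties data set)
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": [100, 200, 100, 300],
})

# Keep the first row found for each distinct price (default: keep="first")
first = toy.drop_duplicates(subset="price")

# Keep the last row found for each distinct price instead
last = toy.drop_duplicates(subset="price", keep="last")

print(first["id"].tolist())
print(last["id"].tolist())
```

The original row index is preserved in both results, which is why the index in the output above has gaps.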


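We can filter rows (observations) using conditions on the variables (columns) as follows. Conditions separated by commas are combined with a logical AND, so the next two cells return the same rows: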

In [30]:
data >> filter_by(X.quality == 'Low' , X.price > 500000 )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
8,6376886,Palm Jumeirah,25.106668,55.149275,2100000,2186,960.66,3,3,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
10,6473651,Downtown Dubai,25.198796,55.271342,3550000,1918,1850.89,3,4,Low,False,True,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False
11,6493425,Dubai Marina,25.075017,55.137997,2094999,1058,1980.15,2,3,Low,False,True,True,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
12,6493433,Dubai Marina,25.075017,55.137997,1049999,609,1724.14,1,2,Low,False,True,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1884,7703575,Jumeirah Beach Residence,25.072569,55.126527,3300000,1180,2796.61,2,3,Low,False,True,False,False,True,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
1887,7703806,City Walk,25.208262,55.262829,1396000,1672,834.93,1,2,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1889,7703914,Palm Jumeirah,25.106668,55.149275,1400000,1173,1193.52,1,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1891,7704040,Culture Village,25.226946,55.343628,7000000,4068,1720.75,4,6,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [31]:
data >> filter_by( (X.quality == 'Low') &  (X.price > 500000) )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
8,6376886,Palm Jumeirah,25.106668,55.149275,2100000,2186,960.66,3,3,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
10,6473651,Downtown Dubai,25.198796,55.271342,3550000,1918,1850.89,3,4,Low,False,True,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False
11,6493425,Dubai Marina,25.075017,55.137997,2094999,1058,1980.15,2,3,Low,False,True,True,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
12,6493433,Dubai Marina,25.075017,55.137997,1049999,609,1724.14,1,2,Low,False,True,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1884,7703575,Jumeirah Beach Residence,25.072569,55.126527,3300000,1180,2796.61,2,3,Low,False,True,False,False,True,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
1887,7703806,City Walk,25.208262,55.262829,1396000,1672,834.93,1,2,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1889,7703914,Palm Jumeirah,25.106668,55.149275,1400000,1173,1193.52,1,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1891,7704040,Culture Village,25.226946,55.343628,7000000,4068,1720.75,4,6,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


The same operation can also be written with bracket indexing on `X`:

In [32]:
data >> filter_by( (X['quality'] == 'Low') &  (X['price'] > 500000) )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
8,6376886,Palm Jumeirah,25.106668,55.149275,2100000,2186,960.66,3,3,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
10,6473651,Downtown Dubai,25.198796,55.271342,3550000,1918,1850.89,3,4,Low,False,True,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False
11,6493425,Dubai Marina,25.075017,55.137997,2094999,1058,1980.15,2,3,Low,False,True,True,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
12,6493433,Dubai Marina,25.075017,55.137997,1049999,609,1724.14,1,2,Low,False,True,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1884,7703575,Jumeirah Beach Residence,25.072569,55.126527,3300000,1180,2796.61,2,3,Low,False,True,False,False,True,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
1887,7703806,City Walk,25.208262,55.262829,1396000,1672,834.93,1,2,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1889,7703914,Palm Jumeirah,25.106668,55.149275,1400000,1173,1193.52,1,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1891,7704040,Culture Village,25.226946,55.343628,7000000,4068,1720.75,4,6,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
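
For comparison, here is a minimal sketch of the same kind of AND filter in plain `pandas`, on a toy DataFrame with hypothetical values. The parentheses around each condition are needed because `&` binds more tightly than `==` and `>` in Python.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the properties data set)
toy = pd.DataFrame({
    "quality": ["Low", "Medium", "Low", "High"],
    "price": [600000, 700000, 400000, 900000],
})

# Boolean mask: both conditions must hold
mask_result = toy[(toy["quality"] == "Low") & (toy["price"] > 500000)]

# The same filter written as a query string
query_result = toy.query("quality == 'Low' and price > 500000")

print(mask_result)
```

Both forms select only the first row of the toy data, the one that is `Low` quality and priced above 500000.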


More examples of filter conditions, using OR (`|`), NOT (`~`) and inequality (`!=`):

In [33]:
data >> filter_by( (X.quality == 'Low') | (X.price > 500000) )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),25.206500,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True


In [34]:
data >> filter_by( ~ ( (X.quality == 'Low') | (X.price > 500000) ) )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
49,7033393,DAMAC Hills,25.016736,55.251010,365000,451,809.31,0,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
55,7075159,Jumeirah Lake Towers,25.065736,55.137452,375000,450,833.33,0,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,True,True,False,False,False,True,True,False
146,7346962,Dubai Sports City,25.044572,55.218948,390000,849,459.36,1,2,Medium,False,False,True,False,True,True,False,False,False,True,True,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,False
160,7376067,Dubai Sports City,25.042264,55.217360,410000,969,423.12,1,2,Medium,False,True,True,False,True,True,False,False,True,False,True,False,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False
197,7419623,Jumeirah Village Circle,25.063265,55.215131,410000,738,555.56,1,1,Medium,False,True,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1845,7700464,Jumeirah Village Circle,25.046296,55.200783,443850,422,1051.78,0,1,Medium,False,True,False,False,True,True,True,False,False,True,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False
1849,7700952,Jumeirah Lake Towers,25.071246,55.140806,499000,672,742.56,1,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
1857,7701727,Jumeirah Lake Towers,25.078148,55.148277,400888,403,994.76,0,1,Medium,False,False,True,True,True,True,True,False,True,True,True,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False
1874,7702410,Town Square,25.003730,55.297034,488888,698,700.41,1,2,Medium,False,True,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,True


In [35]:
data >> filter_by( (X.quality != 'Low') )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
5,6356784,Palm Jumeirah,25.114275,55.139764,3119900,1015,3073.79,1,2,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),25.206500,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True


In [36]:
data >> filter_by( ~ (X.price > 500000) )

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
49,7033393,DAMAC Hills,25.016736,55.251010,365000,451,809.31,0,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
55,7075159,Jumeirah Lake Towers,25.065736,55.137452,375000,450,833.33,0,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,True,True,False,False,False,True,True,False
146,7346962,Dubai Sports City,25.044572,55.218948,390000,849,459.36,1,2,Medium,False,False,True,False,True,True,False,False,False,True,True,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,False
160,7376067,Dubai Sports City,25.042264,55.217360,410000,969,423.12,1,2,Medium,False,True,True,False,True,True,False,False,True,False,True,False,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False
194,7414649,Jumeirah Village Triangle,25.043352,55.193510,310000,425,729.41,0,1,Low,False,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1848,7700652,Dubai Residence Complex,25.091311,55.378277,270000,460,586.96,0,1,Low,False,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False
1849,7700952,Jumeirah Lake Towers,25.071246,55.140806,499000,672,742.56,1,1,Medium,False,True,True,False,True,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,False,False
1857,7701727,Jumeirah Lake Towers,25.078148,55.148277,400888,403,994.76,0,1,Medium,False,False,True,True,True,True,True,False,True,True,True,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False
1874,7702410,Town Square,25.003730,55.297034,488888,698,700.41,1,2,Medium,False,True,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,True
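
The negated OR used above follows De Morgan's law: "not (A or B)" selects the same rows as "(not A) and (not B)". The sketch below checks this in plain `pandas` on a toy DataFrame with hypothetical values.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the properties data set)
toy = pd.DataFrame({
    "quality": ["Low", "Medium", "Low", "High", "Medium"],
    "price": [600000, 700000, 400000, 900000, 300000],
})

low = toy["quality"] == "Low"
expensive = toy["price"] > 500000

# NOT (low OR expensive)
negated_or = toy[~(low | expensive)]

# (NOT low) AND (NOT expensive): the same rows by De Morgan's law
and_of_negations = toy[~low & ~expensive]

print(negated_or)
```

Only the last toy row survives: it is neither `Low` quality nor priced above 500000.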


## Mutate <a class="anchor" id="5"></a>

We can add a new variable to the data-set as follows:

In [37]:
data >> mutate(new_variable=X.price/1000) >> select(X.price , X.new_variable)

Unnamed: 0,price,new_variable
0,2700000,2700.000
1,2850000,2850.000
2,1150000,1150.000
3,2850000,2850.000
4,1729200,1729.200
...,...,...
1900,1500000,1500.000
1901,1230000,1230.000
1902,2900000,2900.000
1903,675000,675.000


In [38]:
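import numpy as np

# random_integers is deprecated; randint excludes the upper bound, so 51 keeps the 0-50 range
y = np.random.randint(0, 51, size=len(data))

data >> mutate(new_variable = y) >> select(X.price , X.quality , X.new_variable)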

Unnamed: 0,price,quality,new_variable
0,2700000,Medium,29
1,2850000,Medium,11
2,1150000,Medium,49
3,2850000,Low,43
4,1729200,Medium,3
...,...,...,...
1900,1500000,Ultra,37
1901,1230000,Medium,49
1902,2900000,Medium,39
1903,675000,Medium,29


We can also add more than one variable at the same time:

In [39]:
# random_integers is deprecated; randint excludes the upper bound, so 51 keeps the 0-50 range
y = np.random.randint(0 , 51 , size=len(data))

( data >> mutate(new_1 = y , new_2 = X.price/1000) >> 
          select(X.price , X.new_1 , X.new_2) )

Unnamed: 0,price,new_1,new_2
0,2700000,25,2700.000
1,2850000,38,2850.000
2,1150000,27,1150.000
3,2850000,17,2850.000
4,1729200,12,1729.200
...,...,...,...
1900,1500000,50,1500.000
1901,1230000,21,1230.000
1902,2900000,26,2900.000
1903,675000,50,675.000
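
Because the random column changes on every run, the outputs above cannot be reproduced exactly. A minimal sketch of one way around this, assuming NumPy's `default_rng` generator with a fixed seed, and using `pandas` `assign` as a rough counterpart of `mutate` (the toy prices are copied from the first rows shown above; the seed value is arbitrary):

```python
import numpy as np
import pandas as pd

# Three prices copied from the first rows of the output above
toy = pd.DataFrame({"price": [2700000, 2850000, 1150000]})

# A seeded generator returns the same numbers on every run
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound (50) inclusive
y = rng.integers(0, 50, size=len(toy), endpoint=True)

# assign adds several columns at once, similar in spirit to mutate
result = toy.assign(new_1=y, new_2=toy["price"] / 1000)

print(result)
```

Re-creating the generator with the same seed yields the same `new_1` column.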


`transmute` works like `mutate`, except that it keeps only the newly created variables.

In [40]:
data >> transmute(new_1 = y , new_2 = X.price/1000)

Unnamed: 0,new_1,new_2
0,25,2700.000
1,38,2850.000
2,27,1150.000
3,17,2850.000
4,12,1729.200
...,...,...
1900,50,1500.000
1901,21,1230.000
1902,26,2900.000
1903,50,675.000
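
`pandas` has no single method named `transmute`, but the same effect can be sketched by adding the columns with `assign` and then selecting only those columns. The toy DataFrame and the `new_columns` dictionary below are hypothetical names used for illustration.

```python
import pandas as pd

# Toy data (hypothetical ids; prices copied from the output above)
toy = pd.DataFrame({"id": [1, 2, 3], "price": [2700000, 2850000, 1150000]})

# New columns to create, keyed by their names
new_columns = {
    "new_1": [10, 20, 30],
    "new_2": toy["price"] / 1000,
}

# mutate-like: original columns plus the new ones
mutated = toy.assign(**new_columns)

# transmute-like: only the new columns
transmuted = mutated[list(new_columns)]

print(transmuted)
```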


## Arrange <a class="anchor" id="6"></a>

We can sort the rows by a variable (column) as follows:

In [41]:
data >> arrange(X.price, ascending=True)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
1592,7683016,International City,25.173301,55.402315,220000,484,454.55,0,1,Low,False,True,True,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True
1610,7684626,Dubai Production City (IMPZ),25.036803,55.200909,230000,380,605.26,0,1,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,True,False
1609,7684618,Dubai Production City (IMPZ),25.036803,55.200909,230000,362,635.36,0,1,Medium,False,True,True,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,True,False
749,7607826,Dubai Silicon Oasis,25.115934,55.390236,245000,306,800.65,0,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1499,7675284,Dubai Silicon Oasis,25.115934,55.390236,250000,353,708.22,0,1,Low,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
427,7545873,Business Bay,25.188299,55.288975,30950000,7922,3906.84,4,4,Medium,False,True,False,False,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,False,False
1885,7703691,Palm Jumeirah,25.103972,55.149621,31440000,6542,4805.87,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,False,False,True,True,True,True,True,True,False,False,True,True,True
576,7586332,Palm Jumeirah,25.103550,55.168509,34314000,9576,3583.33,4,5,Medium,False,True,True,False,True,True,False,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
1869,7702221,Palm Jumeirah,25.103972,55.149621,34340000,8722,3937.17,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,True,False,True,True,True,True,True,True,False,False,True,True,True


In [42]:
data >> arrange(X.price, ascending=False)

Unnamed: 0,id,neighborhood,latitude,longitude,price,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
989,7636368,Palm Jumeirah,25.103550,55.168509,35000000,7346,4764.50,4,5,Low,False,True,True,False,True,False,False,False,True,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
1869,7702221,Palm Jumeirah,25.103972,55.149621,34340000,8722,3937.17,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,True,False,True,True,True,True,True,True,False,False,True,True,True
576,7586332,Palm Jumeirah,25.103550,55.168509,34314000,9576,3583.33,4,5,Medium,False,True,True,False,True,True,False,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
1885,7703691,Palm Jumeirah,25.103972,55.149621,31440000,6542,4805.87,4,6,High,True,False,True,False,False,True,False,False,True,True,True,False,True,False,True,False,False,True,True,True,True,True,True,False,False,True,True,True
427,7545873,Business Bay,25.188299,55.288975,30950000,7922,3906.84,4,4,Medium,False,True,False,False,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1499,7675284,Dubai Silicon Oasis,25.115934,55.390236,250000,353,708.22,0,1,Low,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False
749,7607826,Dubai Silicon Oasis,25.115934,55.390236,245000,306,800.65,0,1,Low,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1609,7684618,Dubai Production City (IMPZ),25.036803,55.200909,230000,362,635.36,0,1,Medium,False,True,True,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,True,False
1610,7684626,Dubai Production City (IMPZ),25.036803,55.200909,230000,380,605.26,0,1,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False,False,True,False
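
For readers who also work with plain `pandas`, a rough counterpart of `arrange` is `sort_values`. The sketch below, on a toy DataFrame with hypothetical values, also shows sorting by two columns with a different direction for each.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the properties data set)
toy = pd.DataFrame({
    "neighborhood": ["B", "A", "B", "A"],
    "price": [300, 100, 200, 400],
})

# Single key, descending
by_price_desc = toy.sort_values("price", ascending=False)

# Two keys: neighborhood ascending, then price descending within each neighborhood
by_two_keys = toy.sort_values(["neighborhood", "price"], ascending=[True, False])

print(by_two_keys)
```

As in the outputs above, the original row index travels with each row; `reset_index(drop=True)` can be chained if a fresh 0..n-1 index is preferred.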


## Rename <a class="anchor" id="7"></a>

We can rename variables as follows, using the pattern `new_name=X.old_name`:

In [43]:
data >> rename(Name1=X.id, Name2=X.price) 

Unnamed: 0,Name1,neighborhood,latitude,longitude,Name2,size_in_sqft,price_per_sqft,no_of_bedrooms,no_of_bathrooms,quality,maid_room,unfurnished,balcony,barbecue_area,built_in_wardrobes,central_ac,childrens_play_area,childrens_pool,concierge,covered_parking,kitchen_appliances,lobby_in_building,maid_service,networked,pets_allowed,private_garden,private_gym,private_jacuzzi,private_pool,security,shared_gym,shared_pool,shared_spa,study,vastu_compliant,view_of_landmark,view_of_water,walk_in_closet
0,5528049,Palm Jumeirah,25.000000,55.138932,2700000,1079,2502.32,1,2,Medium,False,False,True,True,False,True,True,False,True,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False
1,6008529,Palm Jumeirah,25.106809,55.151201,2850000,1582,1801.52,2,2,Medium,False,False,True,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,False
2,6034542,Jumeirah Lake Towers,25.063302,55.137728,1150000,1951,589.44,3,5,Medium,True,True,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,True,True,False,False,False,True,True,True
3,6326063,Culture Village,25.227295,55.341761,2850000,2020,1410.89,2,3,Low,False,True,True,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
4,6356778,Palm Jumeirah,25.114275,55.139764,1729200,507,3410.65,0,1,Medium,False,False,False,False,True,True,False,False,False,True,True,False,False,True,False,False,False,False,False,True,True,True,True,False,False,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1900,7705450,Mohammed Bin Rashid City,25.176892,55.310712,1500000,1087,1379.94,2,2,Ultra,False,True,True,True,True,True,True,True,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,True,True
1901,7706287,Mohammed Bin Rashid City,25.166145,55.276684,1230000,760,1618.42,1,2,Medium,False,False,True,False,True,True,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,True,True
1902,7706389,Dubai Creek Harbour (The Lagoons),25.206500,55.345056,2900000,1930,1502.59,3,5,Medium,True,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False
1903,7706591,Jumeirah Village Circle,25.073858,55.229844,675000,740,912.16,1,2,Medium,False,True,True,False,True,True,True,False,False,True,True,False,False,False,False,True,False,False,False,True,True,True,False,False,False,False,True,True
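
For readers who want to compare with plain pandas, the same renaming can be sketched with `DataFrame.rename`. The small frame `toy` below is made up for illustration and is not part of the properties data.

```python
import pandas as pd

# Toy frame standing in for the properties data (illustrative values only).
toy = pd.DataFrame({"id": [1, 2], "price": [100, 200], "quality": ["Low", "High"]})

# pandas maps old names to new names, the reverse of dfply's rename(new=X.old).
renamed = toy.rename(columns={"id": "Name1", "price": "Name2"})
print(list(renamed.columns))
```

Note that neither version modifies the original data frame; both return a new one.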


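## Gather <a class="anchor" id="8"></a>

`gather` reshapes the data from wide to long format: the listed columns are stacked into a key column and a value column.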

In [44]:
( data >> gather('variable', 'value', ['price', 'quality']) >> 
select(X.size_in_sqft , X.no_of_bedrooms, X.variable, X.value) )

Unnamed: 0,size_in_sqft,no_of_bedrooms,variable,value
0,1079,1,price,2700000
1,1582,2,price,2850000
2,1951,3,price,1150000
3,2020,2,price,2850000
4,507,0,price,1729200
...,...,...,...,...
3805,1087,2,quality,Ultra
3806,760,1,quality,Medium
3807,1930,3,quality,Medium
3808,740,1,quality,Medium
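
As a rough point of comparison, the wide-to-long reshape can be sketched in plain pandas with `DataFrame.melt`. This is only an approximation of what `gather` does, shown on a two-row toy frame built from the first rows of the output above.

```python
import pandas as pd

# Two rows taken from the output above, kept small on purpose.
toy = pd.DataFrame({
    "size_in_sqft": [1079, 1582],
    "price": [2700000, 2850000],
    "quality": ["Medium", "Medium"],
})

# Stack 'price' and 'quality' into a key column and a value column.
long = toy.melt(id_vars=["size_in_sqft"], value_vars=["price", "quality"],
                var_name="variable", value_name="value")
print(long)
```

The number of rows doubles because each original row contributes one row per gathered column.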


In [45]:
data >> gather('variable', 'value') 

Unnamed: 0,variable,value
0,id,5528049
1,id,6008529
2,id,6034542
3,id,6326063
4,id,6356778
...,...,...
72385,walk_in_closet,True
72386,walk_in_closet,True
72387,walk_in_closet,False
72388,walk_in_closet,True


In [46]:
data_long = ( data >> gather('variable', 'value', ['price', 'quality'], add_id=True) >> 
select(X._ID , X.size_in_sqft , X.no_of_bedrooms, X.variable, X.value) )
# add_id=True adds an _ID column identifying the original rows; 
# it is needed to spread the data back to wide format afterwards
data_long

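## Spread <a class="anchor" id="9"></a>

`spread` is the inverse of `gather`: it turns the key and value columns back into one column per key.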

In [None]:
data_long >> spread(X.variable, X.value)

Unnamed: 0,_ID,size_in_sqft,no_of_bedrooms,price,quality
0,0,1079,1,2700000,Medium
1,1,1582,2,2850000,Medium
2,2,1951,3,1150000,Medium
3,3,2020,2,2850000,Low
4,4,507,0,1729200,Medium
...,...,...,...,...,...
1900,1900,1087,2,1500000,Ultra
1901,1901,760,1,1230000,Medium
1902,1902,1930,3,2900000,Medium
1903,1903,740,1,675000,Medium
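
The long-to-wide step can be sketched in plain pandas with `DataFrame.pivot`. This is a minimal approximation on a hand-made long frame, not dfply's implementation; it also shows why a row identifier such as `_ID` is required, since the pivot needs to know which values belong to the same original row.

```python
import pandas as pd

# Hand-made long frame: two original rows, two gathered variables.
long = pd.DataFrame({
    "_ID": [0, 1, 0, 1],
    "variable": ["price", "price", "quality", "quality"],
    "value": [2700000, 2850000, "Medium", "Medium"],
})

# One column per distinct key, one row per _ID.
wide = long.pivot(index="_ID", columns="variable", values="value").reset_index()
wide.columns.name = None  # drop the leftover 'variable' label on the column axis
print(wide)
```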


We can separate and unite the data-set columns:

In [None]:
X1 = pd.Series([ "Femenino" , "Masculino", "Masculino", "Femenino" ])
X2 = pd.Series([ 12, 32, 30, -13])
X3 = pd.Series([  '1-a-3' , '1-b', '1-c-3-4', '9-d-1', '10' ])
X4 = pd.Series([ 22, 5, -30, 23])

df = pd.DataFrame( {"Sex": X1 , "X2": X2 , "X3": X3 , "X4": X4} )
df

Unnamed: 0,Sex,X2,X3,X4
0,Femenino,12.0,1-a-3,22.0
1,Masculino,32.0,1-b,5.0
2,Masculino,30.0,1-c-3-4,-30.0
3,Femenino,-13.0,9-d-1,23.0
4,,,10,


## Separate <a class="anchor" id="10"></a>

Separate one column into several columns:

In [None]:
df >> separate(X.X3, ['col1', 'col2'], remove=True, convert=True,
                                       extra='drop', fill='right')

Unnamed: 0,Sex,X2,X4,col1,col2
0,Femenino,12.0,22.0,1,a
1,Masculino,32.0,5.0,1,b
2,Masculino,30.0,-30.0,1,c
3,Femenino,-13.0,23.0,9,d
4,,,,10,


In [None]:
df >> separate(X.X3, ['col1', 'col2', 'col3', 'col4'], remove=True, convert=True,
                                                     extra='drop', fill='right')

Unnamed: 0,Sex,X2,X4,col1,col2,col3,col4
0,Femenino,12.0,22.0,1,a,3.0,
1,Masculino,32.0,5.0,1,b,,
2,Masculino,30.0,-30.0,1,c,3.0,4.0
3,Femenino,-13.0,23.0,9,d,1.0,
4,,,,10,,,


In [None]:
df >> separate(X.X3, ['col1', 'col2', 'col3', 'col4'], remove=True, convert=True,
                                                       extra='drop', fill='left')

Unnamed: 0,Sex,X2,X4,col1,col2,col3,col4
0,Femenino,12.0,22.0,,1,a,3
1,Masculino,32.0,5.0,,,1,b
2,Masculino,30.0,-30.0,1.0,c,3,4
3,Femenino,-13.0,23.0,,9,d,1
4,,,,,,,10


In [None]:
df >> separate(X.X3, ['col1', 'col2', 'col3', 'col4'], remove=False, convert=True,
                                                       extra='drop', fill='left')

Unnamed: 0,Sex,X2,X3,X4,col1,col2,col3,col4
0,Femenino,12.0,1-a-3,22.0,,1,a,3
1,Masculino,32.0,1-b,5.0,,,1,b
2,Masculino,30.0,1-c-3-4,-30.0,1.0,c,3,4
3,Femenino,-13.0,9-d-1,23.0,,9,d,1
4,,,10,,,,,10


In [None]:
df >> separate(X.X3, ['col1', 'col2', 'col3', 'col4'], sep=[1,4] ,
               remove=False, convert=True,
               extra='drop', fill='left')

Unnamed: 0,Sex,X2,X3,X4,col1,col2,col3,col4
0,Femenino,12.0,1-a-3,22.0,1,-a-,3,
1,Masculino,32.0,1-b,5.0,1,-b,,
2,Masculino,30.0,1-c-3-4,-30.0,1,-c-,3-4,
3,Femenino,-13.0,9-d-1,23.0,9,-d-,1,
4,,,10,,1,0,,
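
For comparison, splitting on a delimiter can be sketched in plain pandas with `Series.str.split(..., expand=True)`. This sketch only mirrors the `fill='right'` case and keeps every piece as a string (there is no counterpart of `convert=True` here); the column names are the same illustrative ones used above.

```python
import pandas as pd

s = pd.Series(["1-a-3", "1-b", "1-c-3-4", "9-d-1", "10"], name="X3")

# One column per piece; shorter rows are padded with missing values on the right.
parts = s.str.split("-", expand=True)
parts.columns = ["col1", "col2", "col3", "col4"]
print(parts)

# Keeping only the first two pieces is similar in spirit to extra='drop'.
first_two = parts[["col1", "col2"]]
```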


In [None]:
from numpy import nan  # the NaN alias was removed in NumPy 2.0

X1 = pd.Series([ "Femenino" , "Masculino", "Masculino", "Femenino" ])
X2 = pd.Series([ 12, 32, 30, -13])
X3 = pd.Series([  'a-3' , '+b', 10, nan ])
X4 = pd.Series([ 22, 5, -30, 23])

df = pd.DataFrame( {"Sex": X1 , "X2": X2 , "X3": X3 , "X4": X4} )
df

Unnamed: 0,Sex,X2,X3,X4
0,Femenino,12,a-3,22
1,Masculino,32,+b,5
2,Masculino,30,10,-30
3,Femenino,-13,,23


## Unite <a class="anchor" id="11"></a>

Unite several columns into one column:

In [None]:
df >> unite('united',  ['X3', 'X4'], remove=False, na_action='maintain')

['X3', 'X4'] _ False maintain


Unnamed: 0,Sex,X2,X3,X4,united
0,Femenino,12,a-3,22,a-3_22
1,Masculino,32,+b,5,+b_5
2,Masculino,30,10,-30,10_-30
3,Femenino,-13,,23,


In [None]:
df >> unite('united',  ['X3', 'X4'], remove=False, na_action='as_string')

['X3', 'X4'] _ False as_string


Unnamed: 0,Sex,X2,X3,X4,united
0,Femenino,12,a-3,22,a-3_22
1,Masculino,32,+b,5,+b_5
2,Masculino,30,10,-30,10_-30
3,Femenino,-13,,23,nan_23


In [None]:
df >> unite('united',  ['X3', 'X4'], remove=True, na_action='maintain')

['X3', 'X4'] _ True maintain


Unnamed: 0,Sex,X2,united
0,Femenino,12,a-3_22
1,Masculino,32,+b_5
2,Masculino,30,10_-30
3,Femenino,-13,


In [None]:
df >> unite('united',  ['X3', 'X4'], remove=False, na_action='maintain', sep='**')

['X3', 'X4'] ** False maintain


Unnamed: 0,Sex,X2,X3,X4,united
0,Femenino,12,a-3,22,a-3**22
1,Masculino,32,+b,5,+b**5
2,Masculino,30,10,-30,10**-30
3,Femenino,-13,,23,
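
The `na_action='maintain'` behaviour (a missing input gives a missing result) can be sketched in plain pandas as follows. This is an approximation on a toy frame `df_u`, which is a hypothetical name, and not dfply's own code.

```python
import pandas as pd

df_u = pd.DataFrame({"X3": ["a-3", "+b", None], "X4": [22, 5, 23]})

# Paste the columns together as text, using "_" as separator.
united = df_u["X3"].astype(str) + "_" + df_u["X4"].astype(str)

# Keep the result only where every input column is present.
complete = df_u[["X3", "X4"]].notna().all(axis=1)
united = united.where(complete)
print(united)
```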


In [None]:
df >> unite('united',  ['X3', 'X4'], remove=False, na_action='maintain', sep='//')

## Joins <a class="anchor" id="12"></a>

We can join data-sets as well:

In [None]:
Alumnos = pd.DataFrame({
        'Name':['Pepe','Juan','Eva', 'Rosa', 'Ricardo'],
        'DNI':['2234X', '3987Y', '5412U' , '7814J' , '2233K'],
        'Age': [15 , 17 , 16 , 13, 14 ]
    })

Repetidores = pd.DataFrame({
    'Name':['Pepe','Juan','Rosa'],
    'Final_Score':[5, 6, 7.5],
    'Repeater' : ['yes', 'yes', 'yes']
})

Deportistas = pd.DataFrame({
    'Name':['Eva','Juan'],
    'Sport':['Basket', 'Soccer'],
    'Sporty': [True , True]
})

In [None]:
Alumnos

Unnamed: 0,Name,DNI,Age
0,Pepe,2234X,15
1,Juan,3987Y,17
2,Eva,5412U,16
3,Rosa,7814J,13
4,Ricardo,2233K,14


In [None]:
Repetidores

Unnamed: 0,Name,Final_Score,Repeater
0,Pepe,5.0,yes
1,Juan,6.0,yes
2,Rosa,7.5,yes


In [None]:
Deportistas

Unnamed: 0,Name,Sport,Sporty
0,Eva,Basket,True
1,Juan,Soccer,True


### Inner Join <a class="anchor" id="13"></a>

In [None]:
Alumnos >> inner_join(Repetidores, by='Name')

Unnamed: 0,Name,DNI,Age,Final_Score,Repeater
0,Pepe,2234X,15,5.0,yes
1,Juan,3987Y,17,6.0,yes
2,Rosa,7814J,13,7.5,yes


### Full Join <a class="anchor" id="14"></a>

In [None]:
Alumnos >> full_join(Repetidores, by='Name')

Unnamed: 0,Name,DNI,Age,Final_Score,Repeater
0,Pepe,2234X,15,5.0,yes
1,Juan,3987Y,17,6.0,yes
2,Eva,5412U,16,,
3,Rosa,7814J,13,7.5,yes
4,Ricardo,2233K,14,,


### Left Join <a class="anchor" id="15"></a>

In [None]:
Alumnos >>  left_join(Repetidores, by='Name')

Unnamed: 0,Name,DNI,Age,Final_Score,Repeater
0,Pepe,2234X,15,5.0,yes
1,Juan,3987Y,17,6.0,yes
2,Eva,5412U,16,,
3,Rosa,7814J,13,7.5,yes
4,Ricardo,2233K,14,,


In [None]:
Deportistas >>  left_join(Repetidores, by='Name')

Unnamed: 0,Name,Sport,Sporty,Final_Score,Repeater
0,Eva,Basket,True,,
1,Juan,Soccer,True,6.0,yes


In [None]:
Repetidores >>  left_join(Deportistas, by='Name')

Unnamed: 0,Name,Final_Score,Repeater,Sport,Sporty
0,Pepe,5.0,yes,,
1,Juan,6.0,yes,Soccer,True
2,Rosa,7.5,yes,,


### Right Join <a class="anchor" id="16"></a>

In [None]:
Deportistas >>  right_join(Repetidores, by='Name') 

# equivalent to Repetidores >>  left_join(Deportistas, by='Name')

Unnamed: 0,Name,Sport,Sporty,Final_Score,Repeater
0,Pepe,,,5.0,yes
1,Juan,Soccer,True,6.0,yes
2,Rosa,,,7.5,yes


In [None]:
Repetidores >>  right_join(Deportistas, by='Name') 

# equivalent to Deportistas >>  left_join(Repetidores, by='Name')

Unnamed: 0,Name,Final_Score,Repeater,Sport,Sporty
0,Eva,,,Basket,True
1,Juan,6.0,yes,Soccer,True
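
The four joins seen so far correspond to the `how` argument of `DataFrame.merge` in plain pandas. A minimal sketch on reduced versions of the tables above (fewer columns, lowercase names to avoid clashing with the originals):

```python
import pandas as pd

alumnos = pd.DataFrame({"Name": ["Pepe", "Juan", "Eva", "Rosa", "Ricardo"],
                        "Age": [15, 17, 16, 13, 14]})
repetidores = pd.DataFrame({"Name": ["Pepe", "Juan", "Rosa"],
                            "Final_Score": [5, 6, 7.5]})

inner = alumnos.merge(repetidores, on="Name", how="inner")  # keys in both tables
left = alumnos.merge(repetidores, on="Name", how="left")    # every row of alumnos
right = alumnos.merge(repetidores, on="Name", how="right")  # every row of repetidores
outer = alumnos.merge(repetidores, on="Name", how="outer")  # full join

print(len(inner), len(left), len(right), len(outer))
```

Rows without a match on the other side get missing values in the columns coming from that side.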


### Semi Join <a class="anchor" id="17"></a>

In [None]:
Alumnos >>  semi_join(Repetidores, by='Name')

Unnamed: 0,Name,DNI,Age
0,Pepe,2234X,15
1,Juan,3987Y,17
2,Rosa,7814J,13


### Anti Join <a class="anchor" id="18"></a>

In [None]:
Alumnos >>  anti_join(Repetidores, by='Name')

Unnamed: 0,Name,DNI,Age
2,Eva,5412U,16
4,Ricardo,2233K,14
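
Semi and anti joins are filtering joins: they keep or drop rows of the left table depending on whether the key appears in the right table, without adding any columns. A minimal pandas sketch using `Series.isin`, on reduced versions of the tables above:

```python
import pandas as pd

alumnos = pd.DataFrame({"Name": ["Pepe", "Juan", "Eva", "Rosa", "Ricardo"],
                        "Age": [15, 17, 16, 13, 14]})
repetidores = pd.DataFrame({"Name": ["Pepe", "Juan", "Rosa"]})

has_match = alumnos["Name"].isin(repetidores["Name"])

semi = alumnos[has_match]    # rows of alumnos with a match in repetidores
anti = alumnos[~has_match]   # rows of alumnos without a match

print(semi["Name"].tolist(), anti["Name"].tolist())
```

The original row index is kept, which is why the anti join output above shows the index values 2 and 4.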


In [None]:
df1 = pd.DataFrame({
        'X1':['A','B','C'],
        'X2':[55, 32, 63]
    })
df2 = pd.DataFrame({
      'X1':['B','C','D'],
      'X2':[32, 63, 74]
})

### Union <a class="anchor" id="19"></a>

In [None]:
df1 >> union(df2)

Unnamed: 0,X1,X2
0,A,55
1,B,32
2,C,63
2,D,74


### Intersect <a class="anchor" id="20"></a>

In [None]:
df1 >> intersect(df2)

Unnamed: 0,X1,X2
0,B,32
1,C,63


### Difference <a class="anchor" id="21"></a>

In [None]:
df1 >>  set_diff(df2)

Unnamed: 0,X1,X2
0,A,55


In [None]:
df2 >>  set_diff(df1)

Unnamed: 0,X1,X2
2,D,74
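
The three set operations compare whole rows. One possible way to sketch them in plain pandas, assuming both frames have the same columns, is shown here; it is an illustration, not how dfply implements them.

```python
import pandas as pd

df1 = pd.DataFrame({"X1": ["A", "B", "C"], "X2": [55, 32, 63]})
df2 = pd.DataFrame({"X1": ["B", "C", "D"], "X2": [32, 63, 74]})

# Union: stack both frames and drop repeated rows.
union_rows = pd.concat([df1, df2]).drop_duplicates()

# Intersection: merge on all shared columns.
intersect_rows = df1.merge(df2, how="inner")

# Difference: rows of df1 that have no identical row in df2.
marked = df1.merge(df2, how="left", indicator=True)
diff_rows = marked[marked["_merge"] == "left_only"].drop(columns="_merge")

print(union_rows, intersect_rows, diff_rows, sep="\n\n")
```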


## Concatenate <a class="anchor" id="22"></a>

We can concatenate different data-sets as follows:

In [None]:
df1 = pd.DataFrame({
        'X1':['A','B','C'],
        'X2':[55, 32, 63], 
        'X3': [0, 2, 5]
    })
df2 = pd.DataFrame({
      'X1':['B','C','D'],
      'X2':[32, 63, 74],
      'X4':[32, 63, 74]
})

In [None]:
df1

Unnamed: 0,X1,X2,X3
0,A,55,0
1,B,32,2
2,C,63,5


In [None]:
df2

Unnamed: 0,X1,X2,X4
0,B,32,32
1,C,63,63
2,D,74,74


Concatenate by rows:

In [None]:
df1 >> bind_rows(df2 , join='inner')

Unnamed: 0,X1,X2
0,A,55
1,B,32
2,C,63
0,B,32
1,C,63
2,D,74


In [None]:
df1 >> bind_rows(df2 , join='outer')

Unnamed: 0,X1,X2,X3,X4
0,A,55,0.0,
1,B,32,2.0,
2,C,63,5.0,
0,B,32,,32.0
1,C,63,,63.0
2,D,74,,74.0


In [None]:
df1 >> bind_rows(df2 , join='outer', ignore_index=True)

Unnamed: 0,X1,X2,X3,X4
0,A,55,0.0,
1,B,32,2.0,
2,C,63,5.0,
3,B,32,,32.0
4,C,63,,63.0
5,D,74,,74.0


Concatenate by columns:

In [None]:
df1 >> bind_cols(df2 , join='inner')

Unnamed: 0,X1,X2,X3,X1.1,X2.1,X4
0,A,55,0,B,32,32
1,B,32,2,C,63,63
2,C,63,5,D,74,74


In [None]:
df1 >> bind_cols(df2 , join='outer')

Unnamed: 0,X1,X2,X3,X1.1,X2.1,X4
0,A,55,0,B,32,32
1,B,32,2,C,63,63
2,C,63,5,D,74,74
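
The `join` and `ignore_index` arguments used above have the same names in `pd.concat`, so the behaviour can be checked in plain pandas. A minimal sketch with the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({"X1": ["A", "B", "C"], "X2": [55, 32, 63], "X3": [0, 2, 5]})
df2 = pd.DataFrame({"X1": ["B", "C", "D"], "X2": [32, 63, 74], "X4": [32, 63, 74]})

# By rows: 'inner' keeps only the shared columns, 'outer' keeps all of them.
rows_inner = pd.concat([df1, df2], join="inner")
rows_outer = pd.concat([df1, df2], join="outer", ignore_index=True)

# By columns: frames are aligned on the row index and placed side by side.
cols = pd.concat([df1, df2], axis=1)

print(rows_inner.shape, rows_outer.shape, cols.shape)
```

With `join='outer'`, the columns that exist in only one frame are filled with missing values, which is why `X3` and `X4` appear as floats in the output above.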


## Group and Summarize <a class="anchor" id="23"></a>

We can group and summarize as follows:

In [None]:
( round( data >> group_by('quality') >> 
summarize(price_mean=X.price.mean(), 
          price_median=X.price.median(), 
          price_std=X.price.std()) ) )

Unnamed: 0,quality,price_mean,price_median,price_std
0,High,2688829.0,1400103.0,4852033.0
1,Low,1937610.0,1465444.0,2307412.0
2,Medium,2168123.0,1470388.0,2947205.0
3,Ultra,919428.0,759502.0,395046.0


In [None]:
round( data >> group_by('quality') >> 
summarize(price_mean=X.price.mean(), 
          price_median=X.price.median(), price_var=X.price.var(), 
          price_std=X.price.std()  , price_max=X.price.max() ,  
          price_min=X.price.min(),   price_count=n(X.price) ,  
          price_count_distinct=n_distinct(X.price) , 
          price_IQR = IQR(X.price) ) )

Unnamed: 0,quality,price_mean,price_median,price_var,price_std,price_max,price_min,price_count,price_count_distinct,price_IQR
0,High,2688829.0,1400103.0,23542230000000.0,4852033.0,34340000,360000,134,113,1441842.0
1,Low,1937610.0,1465444.0,5324149000000.0,2307412.0,35000000,220000,544,299,1216250.0
2,Medium,2168123.0,1470388.0,8686019000000.0,2947205.0,34314000,230000,1146,559,1389084.0
3,Ultra,919428.0,759502.0,156061600000.0,395046.0,2551888,440500,81,74,369300.0
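
For comparison, the same group-then-summarize pattern can be sketched in plain pandas with `groupby` and named aggregations. The frame `toy` and its values are invented for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "quality": ["Low", "Low", "High", "High", "High"],
    "price": [100, 300, 200, 400, 600],
})

# Each keyword is an output column: (input column, aggregation).
summary = toy.groupby("quality", as_index=False).agg(
    price_mean=("price", "mean"),
    price_median=("price", "median"),
    price_max=("price", "max"),
    price_count=("price", "count"),
    price_count_distinct=("price", "nunique"),
)
print(summary)
```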


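Another useful function is `between`, which flags the observations (rows) whose value lies within a range; the resulting column can then be used to filter: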

In [None]:
( data >> mutate(price_between = between(X.price , 500000 , 1000000)) >>
 select(X.price , X.price_between) )

Unnamed: 0,price,price_between
0,2700000,False
1,2850000,False
2,1150000,False
3,2850000,False
4,1729200,False
...,...,...
1900,1500000,False
1901,1230000,False
1902,2900000,False
1903,675000,True
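
A plain pandas counterpart is `Series.between`, sketched here on a few made-up prices. In pandas both endpoints are included by default; endpoints are where implementations tend to differ, so it is worth checking how dfply's `between` treats them before relying on the boundary values.

```python
import pandas as pd

prices = pd.Series([2700000, 1150000, 675000, 500000, 1000000])

# True where 500000 <= price <= 1000000 (pandas includes both endpoints by default).
flag = prices.between(500000, 1000000)

# The boolean flag can be used directly to filter rows.
selected = prices[flag]
print(selected.tolist())
```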


## Other useful functions <a class="anchor" id="24"></a>

In [None]:
( data >> mutate(price_dense_rank = dense_rank(X.price)) >> 
select(X.price , X.price_dense_rank) )

Unnamed: 0,price,price_dense_rank
0,2700000,645.0
1,2850000,651.0
2,1150000,341.0
3,2850000,651.0
4,1729200,503.0
...,...,...
1900,1500000,454.0
1901,1230000,360.0
1902,2900000,654.0
1903,675000,148.0
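
A dense rank gives tied values the same rank and leaves no gaps afterwards. The difference with a minimum rank can be sketched in plain pandas with `Series.rank`; the prices are a small made-up sample.

```python
import pandas as pd

prices = pd.Series([2700000, 2850000, 1150000, 2850000, 1729200, 2900000])

dense = prices.rank(method="dense")   # ties share a rank, next rank is consecutive
minimum = prices.rank(method="min")   # ties share a rank, next rank skips ahead

print(dense.tolist())
print(minimum.tolist())
```

The two tied prices get rank 4 in both cases, but the largest price is ranked 5 with the dense method and 6 with the minimum method.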


In [None]:
( data >> mutate(price_cumsum = cumsum(X.price)) >> 
select(X.price , X.price_cumsum) )

Unnamed: 0,price,price_cumsum
0,2700000,2700000
1,2850000,5550000
2,1150000,6700000
3,2850000,9550000
4,1729200,11279200
...,...,...
1900,1500000,3967940022
1901,1230000,3969170022
1902,2900000,3972070022
1903,675000,3972745022


In [None]:
( data >> mutate(price_cummean = cummean(X.price)) >>
select(X.price , X.price_cummean) )

Unnamed: 0,price,price_cummean
0,2700000,2.700000e+06
1,2850000,2.775000e+06
2,1150000,2.233333e+06
3,2850000,2.387500e+06
4,1729200,2.255840e+06
...,...,...
1900,1500000,2.087291e+06
1901,1230000,2.086840e+06
1902,2900000,2.087267e+06
1903,675000,2.086526e+06
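
The cumulative sum and cumulative mean can be reproduced in plain pandas with `cumsum` and an expanding window. A minimal sketch using the first four prices of the data-set:

```python
import pandas as pd

prices = pd.Series([2700000, 2850000, 1150000, 2850000])

running_sum = prices.cumsum()
# Mean of all values seen so far, i.e. running_sum / (position + 1).
running_mean = prices.expanding().mean()

print(running_sum.tolist())
print(running_mean.round(2).tolist())
```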


In [None]:
(data >> mutate(price_cumprod = cumprod(X.price)) >>
 select(X.price , X.price_cumprod) )

Unnamed: 0,price,price_cumprod
0,2700000,2700000
1,2850000,7695000000000
2,1150000,8849250000000000000
3,2850000,-7550831625259843584
4,1729200,2556877478461177856
...,...,...
1900,1500000,0
1901,1230000,0
1902,2900000,0
1903,675000,0
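
The negative values and zeros in the cumulative product are not real results: the product exceeds the largest 64-bit integer after only four prices, and integer arithmetic on `int64` columns wraps around silently. The sketch below, in plain pandas and the standard library, shows the wraparound and two possible workarounds (floating point for an approximate value, Python integers for an exact one).

```python
import itertools
import operator

import pandas as pd

prices = pd.Series([2700000, 2850000, 1150000, 2850000])

# int64 arithmetic: the fourth product no longer fits and wraps around.
wrapped = prices.cumprod()

# float64 arithmetic: approximate, but no wraparound for these magnitudes.
as_float = prices.astype("float64").cumprod()

# Python integers have arbitrary precision, so this product is exact.
exact = list(itertools.accumulate(prices.tolist(), operator.mul))

print(int(wrapped.iloc[3]))
print(as_float.iloc[3])
print(exact[3])
```

The later zeros have the same cause: once the accumulated product is a multiple of 2^64, the wrapped value is exactly 0 and stays there.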
