# Video: Manual Data Frame Queries

This video compares declarative queries on pandas data frames with equivalent filtering built from a series of lower-level data frame operations.


## Why Do We Like Queries?

How would we implement this filter without `query`?

* `abalone.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")`
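
As a rough illustration of what this query does, here is a minimal sketch on a tiny stand-in table. The `toy` data frame and its values are hypothetical, and `target_sexes = ["M", "F"]` is an assumption, since the notebook does not show how that variable is defined.

```python
import pandas as pd

# Hypothetical toy data standing in for the abalone table (values are illustrative).
toy = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Whole_weight": [0.6, 0.3, 0.9, 0.2],
    "Height": [0.10, 0.25, 0.30, 0.05],
})

# Assumed value; the notebook never shows how target_sexes is defined.
target_sexes = ["M", "F"]

# The "@" prefix tells query() to look up a Python variable instead of a column.
result = toy.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")
print(result)
```

Only the first two rows should survive: the third row fails the sex test and the fourth fails both the weight and height tests.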

## Code Example: Reimplementing a Query without Query

In [28]:
import pandas as pd

In [29]:
abalone = pd.read_csv("https://raw.githubusercontent.com/bu-cds-omds/bu-cds-omds-data/main/data/abalone.tsv", sep="\t")
abalone

Unnamed: 0,Sex,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight,Rings
0,M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,15
1,M,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.0700,7
2,F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,9
3,M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,10
4,I,0.330,0.255,0.080,0.2050,0.0895,0.0395,0.0550,7
...,...,...,...,...,...,...,...,...,...
4172,F,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,11
4173,M,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,10
4174,M,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,9
4175,F,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,10


In [30]:
abalone.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")

Unnamed: 0,Sex,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight,Rings
0,M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,15
2,F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,9
3,M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,10
6,F,0.530,0.415,0.150,0.7775,0.2370,0.1415,0.3300,20
7,F,0.545,0.425,0.125,0.7680,0.2940,0.1495,0.2600,16
...,...,...,...,...,...,...,...,...,...
4172,F,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,11
4173,M,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,10
4174,M,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,9
4175,F,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,10


## Breaking Up the Query

Goal:
* `Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)`

Pieces:
* `Sex in @target_sexes`
* `Whole_weight > 0.5`
* `Height > 0.2`
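
One way to work with these pieces is to give each one a name as a boolean mask and then combine them. The sketch below assumes a hypothetical `toy` data frame and an assumed `target_sexes` list; the mask names are illustrative and do not come from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the abalone table.
toy = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Whole_weight": [0.6, 0.3, 0.9, 0.2],
    "Height": [0.10, 0.25, 0.30, 0.05],
})
target_sexes = ["M", "F"]  # assumed value, not shown in the notebook

# One boolean Series per piece of the query.
sex_mask = toy["Sex"].isin(target_sexes)   # Sex in @target_sexes
weight_mask = toy["Whole_weight"] > 0.5    # Whole_weight > 0.5
height_mask = toy["Height"] > 0.2          # Height > 0.2

# "and" becomes &, "or" becomes |, applied element by element.
combined = sex_mask & (weight_mask | height_mask)
filtered = toy[combined]
print(filtered)
```

Naming the masks keeps the final expression short and makes each condition easy to inspect on its own.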

In [31]:
abalone["Sex"].isin(target_sexes)

0        True
1        True
2        True
3        True
4       False
        ...  
4172     True
4173     True
4174     True
4175     True
4176     True
Name: Sex, Length: 4177, dtype: bool

In [32]:
abalone["Whole_weight"] > 0.5

0        True
1       False
2        True
3        True
4       False
        ...  
4172     True
4173     True
4174     True
4175     True
4176     True
Name: Whole_weight, Length: 4177, dtype: bool

In [33]:
abalone["Height"] > 0.2

0       False
1       False
2       False
3       False
4       False
        ...  
4172    False
4173    False
4174     True
4175    False
4176    False
Name: Height, Length: 4177, dtype: bool

In [34]:
(abalone["Whole_weight"] > 0.5) | (abalone["Height"] > 0.2)

0        True
1       False
2        True
3        True
4       False
        ...  
4172     True
4173     True
4174     True
4175     True
4176     True
Length: 4177, dtype: bool

In [35]:
abalone["Sex"].isin(target_sexes) & ((abalone["Whole_weight"] > 0.5) | (abalone["Height"] > 0.2))

0        True
1       False
2        True
3        True
4       False
        ...  
4172     True
4173     True
4174     True
4175     True
4176     True
Length: 4177, dtype: bool
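
Parentheses matter when combining masks: in Python, `&` binds more tightly than `|`, so `a & b | c` is evaluated as `(a & b) | c`. The sketch below uses three small hypothetical masks (not taken from the abalone data) to show how the grouping changes the result.

```python
import pandas as pd

# Hypothetical masks chosen so that the two groupings disagree.
sex_mask = pd.Series([True, False, False])
weight_mask = pd.Series([True, True, False])
height_mask = pd.Series([False, True, True])

# Without extra parentheses this is parsed as (sex_mask & weight_mask) | height_mask.
without_parens = sex_mask & weight_mask | height_mask

# Explicit grouping matches "Sex in ... and (Whole_weight > 0.5 or Height > 0.2)".
with_parens = sex_mask & (weight_mask | height_mask)

print(without_parens.tolist())
print(with_parens.tolist())
```

Rows that fail the sex test but pass the height test are kept by the first expression and dropped by the second, so the grouping has to be written out explicitly.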

In [36]:
abalone[abalone["Sex"].isin(target_sexes) & ((abalone["Whole_weight"] > 0.5) | (abalone["Height"] > 0.2))]

Unnamed: 0,Sex,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight,Rings
0,M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,15
2,F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,9
3,M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,10
6,F,0.530,0.415,0.150,0.7775,0.2370,0.1415,0.3300,20
7,F,0.545,0.425,0.125,0.7680,0.2940,0.1495,0.2600,16
...,...,...,...,...,...,...,...,...,...
4172,F,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,11
4173,M,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,10
4174,M,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,9
4175,F,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,10


In [37]:
abalone.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")

Unnamed: 0,Sex,Length,Diameter,Height,Whole_weight,Shucked_weight,Viscera_weight,Shell_weight,Rings
0,M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,15
2,F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,9
3,M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,10
6,F,0.530,0.415,0.150,0.7775,0.2370,0.1415,0.3300,20
7,F,0.545,0.425,0.125,0.7680,0.2940,0.1495,0.2600,16
...,...,...,...,...,...,...,...,...,...
4172,F,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,11
4173,M,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,10
4174,M,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,9
4175,F,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,10


## Which Do You Prefer?

* `abalone.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")`
* `abalone[abalone["Sex"].isin(target_sexes) & ((abalone["Whole_weight"] > 0.5) | (abalone["Height"] > 0.2))]`
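
To check that the two forms really select the same rows, one option is `DataFrame.equals`. This is a minimal sketch on a hypothetical `toy` data frame with an assumed `target_sexes` list, not a check run on the full abalone data.

```python
import pandas as pd

# Hypothetical toy data standing in for the abalone table.
toy = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Whole_weight": [0.6, 0.3, 0.9, 0.2],
    "Height": [0.10, 0.25, 0.30, 0.05],
})
target_sexes = ["M", "F"]  # assumed value, not shown in the notebook

# Declarative version.
via_query = toy.query("Sex in @target_sexes and (Whole_weight > 0.5 or Height > 0.2)")

# Manual version built from boolean masks.
via_mask = toy[toy["Sex"].isin(target_sexes) & ((toy["Whole_weight"] > 0.5) | (toy["Height"] > 0.2))]

# equals() compares shape, index, dtypes, and values.
same = via_query.equals(via_mask)
print(same)
```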
