# Demo 3.4  ***query()***:  Filtering a *pandas* DataFrame

 
- **Demonstrates**:  
  - Filtering with  ***query()*** 
    - Using  ***in*** 
  - Checking Data Types with ***dtypes***  


- **Requires data file**:  `Olympics.csv`

---

In [2]:
import pandas as pd

### Read the Data File into a *pandas* DataFrame

In [3]:
# 1. Read the data file "Olympics.csv" into a pandas DataFrame;
# display the DataFrame head and shape.
df = pd.read_csv("Olympics.csv")

print(df.shape)
df.head()

(87, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


### Check Data Types

In [5]:
# 2. Display the data types of each column
df.dtypes

Rank        int64
Country    object
Gold        int64
Silver      int64
Bronze      int64
Total       int64
dtype: object

# Filtering on a *String* Condition 
- Options:  
  - **==**  
  - **!=**   
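As a minimal sketch of the two string operators, the example below builds a small hypothetical DataFrame (`sample`, not the real `Olympics.csv`) and applies `==` and `!=` inside `query()`. The string value has to match the stored spelling exactly, including the country code in parentheses.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the structure of Olympics.csv
sample = pd.DataFrame({
    "Country": ["United States (USA)", "Great Britain (GBR)", "China (CHN)"],
    "Gold": [46, 27, 26],
})

# == keeps only the rows whose Country matches the string exactly
only_usa = sample.query('Country == "United States (USA)"')

# != keeps every row except the exact match
not_usa = sample.query('Country != "United States (USA)"')

print(only_usa["Country"].tolist())
print(not_usa["Country"].tolist())
```

Wrapping the query in single quotes leaves double quotes free for the string value inside it.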

#### Use unique() to get exact spelling  

In [4]:
# 3. Get unique values of the column and get the exact spelling
df['Country'].unique()

array(['United States (USA)', 'Great Britain (GBR)', 'China (CHN)',
       'Russia (RUS)', 'Germany (GER)', 'Japan (JPN)', 'France (FRA)',
       'South Korea (KOR)', 'Italy (ITA)', 'Australia (AUS)',
       'Netherlands (NED)', 'Hungary (HUN)', 'Brazil (BRA)*',
       'Spain (ESP)', 'Kenya (KEN)', 'Jamaica (JAM)', 'Croatia (CRO)',
       'Cuba (CUB)', 'New Zealand (NZL)', 'Canada (CAN)',
       'Uzbekistan (UZB)', 'Kazakhstan (KAZ)', 'Colombia (COL)',
       'Switzerland (SUI)', 'Iran (IRI)', 'Greece (GRE)',
       'Argentina (ARG)', 'Denmark (DEN)', 'Sweden (SWE)',
       'South Africa (RSA)', 'Ukraine (UKR)', 'Serbia (SRB)',
       'Poland (POL)', 'North Korea (PRK)', 'Belgium (BEL)',
       'Thailand (THA)', 'Slovakia (SVK)', 'Georgia (GEO)',
       'Azerbaijan (AZE)', 'Belarus (BLR)', 'Turkey (TUR)',
       'Armenia (ARM)', 'Czech Republic (CZE)', 'Ethiopia (ETH)',
       'Slovenia (SLO)', 'Indonesia (INA)', 'Romania (ROU)',
       'Bahrain (BRN)', 'Vietnam (VIE)', 'Chinese Taipei

### Filter

In [9]:
# 4. Filter the data using the query method
df_Q1 = df.query('Country == "United States (USA)"')

print(df_Q1.shape)
df_Q1.head()

(1, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121


### Check Filtering  

In [10]:
# 5. Check filtering
df_Q1['Country'].unique()

array(['United States (USA)'], dtype=object)

# Filtering on a *Numeric* Condition 
- Options:  
  - **>**  
  - **<** 
  - **>=**  
  - **<=**    
  - **!=**
  - **==**

#### First Make Sure the Field is Numeric  
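- If the column holds text instead of numbers, the operators above may raise an error or compare values alphabetically (for example, the string "9" sorts after "10"), which is not the result you want.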
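The sketch below illustrates why the data type matters. It assumes a hypothetical DataFrame (`text_df`) in which the Gold column was read in as text, shows that string comparison is alphabetical, and then converts the column with `pd.to_numeric()` before filtering. The column values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample where Gold was read in as text rather than integers
text_df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Gold": ["9", "10", "46"],
})

# Plain Python string comparison is alphabetical, so "9" sorts after "10"
print("9" > "10")

# Convert the column to a numeric dtype before filtering
text_df["Gold"] = pd.to_numeric(text_df["Gold"])
print(text_df["Gold"].dtype)

# Now the comparison is numeric: only 10 and 46 are greater than 9
numeric_result = text_df.query("Gold > 9")
print(numeric_result["Country"].tolist())
```

If a column contains values that cannot be parsed as numbers, `pd.to_numeric()` raises an error by default, so a failed conversion is itself a useful signal that the data needs cleaning first.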

In [11]:
# 6. Check data type for Gold to make sure it's numeric
df.dtypes

Rank        int64
Country    object
Gold        int64
Silver      int64
Bronze      int64
Total       int64
dtype: object

### Filter  

In [12]:
# 7. Filter the data using the query method
df_Q2 = df.query('Gold > 10')
print(df_Q2.shape)
df_Q2

(6, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42
5,6,Japan (JPN),12,8,21,41


### Check Filtering  

In [14]:
# 8. Check filtering 
df_Q2['Gold'].unique()

array([46, 27, 26, 19, 17, 12])

# Filtering on *Multiple Values*  

- Option:  ***in***  
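As a rough sketch of the ***in*** option, the example below uses a small hypothetical DataFrame (`sample`) and a hypothetical list (`wanted`). It shows the list written directly in the query string, the same list referenced through the `@` prefix, and the complementary `not in`.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the structure of Olympics.csv
sample = pd.DataFrame({
    "Country": ["United States (USA)", "China (CHN)", "Japan (JPN)"],
    "Gold": [46, 26, 12],
})

# The list can be written directly inside the query string
in_list = sample.query('Country in ["United States (USA)", "China (CHN)"]')

# Or a Python variable can be referenced with the @ prefix
wanted = ["United States (USA)", "China (CHN)"]
in_var = sample.query("Country in @wanted")

# not in keeps the rows whose value is absent from the list
not_in_list = sample.query("Country not in @wanted")

print(in_var["Country"].tolist())
print(not_in_list["Country"].tolist())
```

Referencing a variable with `@` keeps the query string short when the list of values is long or is built elsewhere in the notebook.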


#### Use unique() to get exact spellings  


In [15]:
# 9. Get unique values of the column and get the exact spelling
df['Country'].unique()

array(['United States (USA)', 'Great Britain (GBR)', 'China (CHN)',
       'Russia (RUS)', 'Germany (GER)', 'Japan (JPN)', 'France (FRA)',
       'South Korea (KOR)', 'Italy (ITA)', 'Australia (AUS)',
       'Netherlands (NED)', 'Hungary (HUN)', 'Brazil (BRA)*',
       'Spain (ESP)', 'Kenya (KEN)', 'Jamaica (JAM)', 'Croatia (CRO)',
       'Cuba (CUB)', 'New Zealand (NZL)', 'Canada (CAN)',
       'Uzbekistan (UZB)', 'Kazakhstan (KAZ)', 'Colombia (COL)',
       'Switzerland (SUI)', 'Iran (IRI)', 'Greece (GRE)',
       'Argentina (ARG)', 'Denmark (DEN)', 'Sweden (SWE)',
       'South Africa (RSA)', 'Ukraine (UKR)', 'Serbia (SRB)',
       'Poland (POL)', 'North Korea (PRK)', 'Belgium (BEL)',
       'Thailand (THA)', 'Slovakia (SVK)', 'Georgia (GEO)',
       'Azerbaijan (AZE)', 'Belarus (BLR)', 'Turkey (TUR)',
       'Armenia (ARM)', 'Czech Republic (CZE)', 'Ethiopia (ETH)',
       'Slovenia (SLO)', 'Indonesia (INA)', 'Romania (ROU)',
       'Bahrain (BRN)', 'Vietnam (VIE)', 'Chinese Taipei

### Filter  

In [16]:
# 10. Create a list of the countries we want to filter by
selected_countries = ['United States (USA)', 'Great Britain (GBR)', 'China (CHN)',
                      'Russia (RUS)', 'Germany (GER)']

In [17]:
# 11. Create a new DataFrame based on the filter results
df_Q3 = df.query('Country in @selected_countries')

print(df_Q3.shape)
df_Q3

(5, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


### Check Filtering  

In [21]:
# 12. Check filtering
print(type(df_Q3['Country']))
df_Q3['Country']

<class 'pandas.core.series.Series'>


0    United States (USA)
1    Great Britain (GBR)
2            China (CHN)
3           Russia (RUS)
4          Germany (GER)
Name: Country, dtype: object
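Instead of checking the filtered column by eye, the check can also be made programmatically. The sketch below is one possible approach and not part of the original demo: it uses a hypothetical DataFrame (`sample`) and list (`selected`), confirms with `isin()` that every remaining country is in the list, and compares the `query()` result with an equivalent boolean-mask filter.

```python
import pandas as pd

# Hypothetical four-row sample that mimics the structure of Olympics.csv
sample = pd.DataFrame({
    "Country": ["United States (USA)", "Great Britain (GBR)",
                "China (CHN)", "Japan (JPN)"],
    "Gold": [46, 27, 26, 12],
})

selected = ["United States (USA)", "China (CHN)"]
filtered = sample.query("Country in @selected")

# Every country left after filtering should appear in the list
all_in_list = bool(filtered["Country"].isin(selected).all())

# The query() result should match a boolean-mask filter built with isin()
mask_result = sample[sample["Country"].isin(selected)]
same_as_mask = filtered.equals(mask_result)

print(all_in_list, same_as_mask)
```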