In [35]:
import pandas as pd

df = pd.read_parquet('datasets/data.parquet')

In [37]:
df.head()

Unnamed: 0,employee_number,name,company,country,dob,age,department,salary,has_parking_space
0,897028,Kenneth Jensen,Wilson and Sons,India,1983-07-03,38,Management,124790,False
1,463979,Sarah Anderson,"Hernandez, Cunningham and Clark",India,1980-08-09,41,Consulting,103122,True
2,388446,Tracie Rollins,"Hernandez, Cunningham and Clark",Cayman Islands,1987-07-29,34,Consulting,119072,False
3,267447,Seth Smith,Spears-Brown,Germany,1969-03-04,52,System Architect,115653,False
4,401300,Katherine Fields,"Hernandez, Cunningham and Clark",Venezuela,1980-01-26,42,Finance,119412,False


### 1. Select one column and return as series

In [38]:
df['name'].head(3)

0    Kenneth Jensen
1    Sarah Anderson
2    Tracie Rollins
Name: name, dtype: object

### Or

In [39]:
df.name.head(3)

0    Kenneth Jensen
1    Sarah Anderson
2    Tracie Rollins
Name: name, dtype: object

If the column name is a valid Python identifier (no spaces or punctuation) and does not clash with an existing DataFrame attribute or method, we can use dot notation. This is often simpler because the editor can autocomplete the column name.
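To make the limits of dot notation concrete, here is a minimal sketch. It uses a small hypothetical frame (`toy`, with made-up columns `count` and `start date`) rather than the dataset loaded above.

```python
import pandas as pd

# Hypothetical toy frame, not the dataset loaded above.
toy = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "count": [1, 2],                              # clashes with DataFrame.count
    "start date": ["2020-01-01", "2021-06-30"],   # contains a space
})

# Dot notation works for a plain identifier that is not an existing attribute.
name_via_dot = toy.name

# `toy.count` resolves to the DataFrame method, not the column.
count_is_method = callable(toy.count)

# Bracket notation always returns the column, whatever its name.
count_col = toy["count"]
spaced_col = toy["start date"]
```

Bracket notation is the safer default in scripts. Dot notation is mostly a convenience for interactive work.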

### 2. Select one column and return as dataframe 

In [40]:
df[['name']].head(3)

Unnamed: 0,name
0,Kenneth Jensen
1,Sarah Anderson
2,Tracie Rollins
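The difference between single and double brackets is easiest to see by checking the returned types. The sketch below assumes a small hypothetical frame (`toy`) instead of the dataset above.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration.
toy = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 41]})

as_series = toy["name"]     # a single label returns a Series
as_frame = toy[["name"]]    # a list of labels returns a DataFrame

series_shape = as_series.shape   # one-dimensional
frame_shape = as_frame.shape     # two-dimensional, one column
```

Keeping the result as a DataFrame is useful when a later step expects two-dimensional input.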


### 3. Select several columns

In [41]:
cols = ['employee_number', 'name', 'company']
df[cols].head(3)

Unnamed: 0,employee_number,name,company
0,897028,Kenneth Jensen,Wilson and Sons
1,463979,Sarah Anderson,"Hernandez, Cunningham and Clark"
2,388446,Tracie Rollins,"Hernandez, Cunningham and Clark"


### 4. Select columns using the `select_dtypes` method

- Use either **`include`** or **`exclude`** as the argument. You can select by one dtype or a list of dtypes.
- Common dtypes include, but are not limited to:
    - `int` 
    - `number`
    - `object` 
    - `category` 
    - `datetime`
    - `datetime64`
    - `timedelta`

In [42]:
df.select_dtypes(include=['int']).head(3)

Unnamed: 0,employee_number,age,salary
0,897028,38,124790
1,463979,41,103122
2,388446,34,119072


In [43]:
df.select_dtypes(exclude=['int']).head(3)

Unnamed: 0,name,company,country,dob,department,has_parking_space
0,Kenneth Jensen,Wilson and Sons,India,1983-07-03,Management,False
1,Sarah Anderson,"Hernandez, Cunningham and Clark",India,1980-08-09,Consulting,True
2,Tracie Rollins,"Hernandez, Cunningham and Clark",Cayman Islands,1987-07-29,Consulting,False
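As a further sketch, the generic `'number'` selector picks up integer and float columns together, and `include` also accepts a single string. The frame below (`toy`) is hypothetical and only loosely mirrors the dataset above.

```python
import pandas as pd

# Hypothetical toy frame with mixed dtypes.
toy = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [30, 41],
    "salary": [50000.0, 62000.5],
    "has_parking_space": [True, False],
})

# 'number' matches both integer and float columns, but not booleans.
numeric = toy.select_dtypes(include="number")

# Boolean columns are selected with 'bool'.
flags = toy.select_dtypes(include="bool")

# Everything that is neither numeric nor boolean.
non_numeric = toy.select_dtypes(exclude=["number", "bool"])
```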


### 5. Select columns using the **`filter`** method

In [44]:
cols = ['employee_number', 'name', 'company']

# filter on exact column names
df.filter(items=cols).head(n=2)

Unnamed: 0,employee_number,name,company
0,897028,Kenneth Jensen,Wilson and Sons
1,463979,Sarah Anderson,"Hernandez, Cunningham and Clark"
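One practical difference from bracket selection is how missing labels are handled. In this sketch the frame `toy` and the list `wanted` are hypothetical: selecting with brackets raises a `KeyError` when a label is absent, while `filter(items=...)` keeps only the labels that exist.

```python
import pandas as pd

# Hypothetical toy frame; "salary" is deliberately absent.
toy = pd.DataFrame({"name": ["Ann"], "company": ["A"]})
wanted = ["name", "salary"]

# Bracket selection fails if any requested label is missing.
try:
    toy[wanted]
    bracket_raised = False
except KeyError:
    bracket_raised = True

# filter(items=...) silently skips labels that are not present.
filtered = toy.filter(items=wanted)
```

This makes `filter` convenient for optional columns, but it can also hide a misspelled column name.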


In [45]:
# filter on column names that include the word 'has'
df.filter(like='has').head(2)

Unnamed: 0,has_parking_space
0,False
1,True


In [46]:
# filter on column names that start with 'c'
df.filter(regex=r'^c').head(2)

Unnamed: 0,company,country
0,Wilson and Sons,India
1,"Hernandez, Cunningham and Clark",India
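The same regex approach can anchor on the end of a column name, and a boolean mask built from the column index gives an equivalent prefix match. This is a sketch on a hypothetical frame (`toy`) that reuses a few column names from the dataset above.

```python
import pandas as pd

# Hypothetical toy frame reusing some of the column names above.
toy = pd.DataFrame({
    "employee_number": [1, 2],
    "company": ["A", "B"],
    "country": ["X", "Y"],
    "has_parking_space": [True, False],
})

# Column names that start with "c".
starts_with_c = toy.filter(regex=r"^c")

# Column names that end with "_number".
ends_with_number = toy.filter(regex=r"_number$")

# Equivalent prefix match using string methods on the column index.
same_as_regex = toy.loc[:, toy.columns.str.startswith("c")]
```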
