# Pandas tip #10: filter your rows and columns
Tabular data can consist of a large number of columns and sometimes you want to select a subset of columns in a smart way. For example, you have a dataset that contains the color combination for a car and you want to get all the columns about colors.

I used to use .loc[] with a list comprehension to select the columns I want. This works very well, but it is also quite long and therefore less readable. For such cases, Pandas almost always offers a neater way to solve the problem: .filter().

The .filter() method selects a subset of the DataFrame, but it only filters on the labels, not the content. Three parameters can be used for filtering: items, like, and regex, and only one of them can be used per call. The first parameter is a list of label names that must match exactly. The second parameter works similarly to the `LIKE` keyword in SQL and keeps the labels that contain the substring passed to like. With the regex parameter, we can pass a regular expression as the selection criterion. The axis parameter decides whether row labels (axis=0) or column labels (axis=1, the default for a DataFrame) are filtered.
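As a minimal sketch of the three parameters, the snippet below uses a small made-up DataFrame. The name `toy` and the `roof_type` column are assumptions for illustration only and are not part of the dataset used later in this post.

```python
import pandas as pd

# Hypothetical toy frame, only meant to illustrate the three parameters
toy = pd.DataFrame({
    'car_serial_id': [1, 2],
    'body_color': ['red', 'blue'],
    'door_color': ['green', 'red'],
    'roof_type': ['glass', 'steel'],
})

# items: keep the labels that match these names exactly
by_items = toy.filter(items=['car_serial_id', 'roof_type'])

# like: keep the labels that contain the substring 'color'
by_like = toy.filter(like='color')

# regex: keep the labels that match the regular expression
by_regex = toy.filter(regex='^roof_')

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
```

In all three cases, only the column labels are inspected; the values in the cells play no role in the selection.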

Pandas offers many such small improvements, and I think they make the code much more readable while saving a bit of typing.

Let's generate some random data:

In [None]:
import numpy as np
import pandas as pd

colors = ['red', 'blue', 'yellow', 'green', 'purple'] 
n_samples = 100

rng = np.random.default_rng()
df = pd.DataFrame({
    'car_serial_id': rng.integers(0, 1000, size=n_samples),    
    'body_color': rng.choice(colors, size=n_samples),
    'door_color': rng.choice(colors, size=n_samples),
    'roof_color': rng.choice(colors, size=n_samples),
})

Select all columns containing color:

In [None]:
df.loc[
    :,
    [x.endswith('_color') for x in df.columns]
]

It is much easier with the .filter() method:

In [None]:
df.filter(like='color', axis=1)
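One subtle difference is worth knowing: like matches the substring anywhere in the label, while the list comprehension above only accepts labels that end with `_color`. The sketch below shows a regex that mirrors the suffix check, and how axis=0 applies the same matching to row labels. The frame `df_demo`, its `colorful` column, and its index labels are made up for this illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels, for illustration only
df_demo = pd.DataFrame(
    {
        'body_color': ['red', 'blue', 'green'],
        'colorful': [True, False, True],
        'door_color': ['red', 'red', 'blue'],
    },
    index=['car_001', 'car_002', 'van_001'],
)

# like='color' also picks up 'colorful', because it is a plain substring match
substring_match = df_demo.filter(like='color', axis=1)

# Anchoring the regex with $ keeps only the labels that end with '_color'
suffix_only = df_demo.filter(regex='_color$', axis=1)

# axis=0 filters the row labels instead of the column labels
cars_only = df_demo.filter(like='car_', axis=0)

print(list(substring_match.columns))
print(list(suffix_only.columns))
print(list(cars_only.index))
```

For the random data generated earlier, both variants return the same three color columns, because every label that contains `color` also ends with `_color`.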

If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis).