# Filtering columns and rows in pandas

This notebook goes into a bit more detail on selecting and filtering data in pandas. We'll use the MLB salary data as an example.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('../data/mlb.csv')

In [None]:
df.head()

### Selecting one column of data

You can select a column of data using dot notation (`.`) or square brackets (`[]`).

If you want to select just one column, you can use dot notation as long as the column name is a valid Python identifier (no spaces, doesn't start with a number) and doesn't clash with an existing DataFrame attribute or method such as `count`. Passing the column name as a string inside square brackets always works.

Let's say we wanted to select the `TEAM` column. We could do this:

In [None]:
df.TEAM

... or we could do this:

In [None]:
df['TEAM']

Either works.
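Here's a small sketch of where the two styles stop being interchangeable. It uses a made-up frame called `demo` (hypothetical values, not the MLB data) with one column name that contains a space and one that shadows the DataFrame `count` method.

```python
import pandas as pd

# Hypothetical stand-in data, not read from ../data/mlb.csv.
demo = pd.DataFrame({
    'TEAM': ['LAD', 'HOU'],
    'FIRST NAME': ['Player A', 'Player B'],
    'count': [1, 2],
})

# For an ordinary name, both styles return the same Series.
print(demo.TEAM.equals(demo['TEAM']))  # True

# A name with a space can only be reached with square brackets.
first_names = demo['FIRST NAME']

# Dot notation on 'count' finds the DataFrame.count method, not the column.
print(callable(demo.count))    # True: this is the method
print(demo['count'].tolist())  # [1, 2]: this is the column
```

If you're not sure whether a name is safe for dot notation, square brackets are the safer habit.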

### Selecting multiple columns of data

To select multiple columns of data, we're going to pass a _list_ of column names into the square brackets. Let's select the `NAME` and `TEAM` columns.

👉 For a refresher on _lists_, [check out this notebook](Python%20data%20types%20and%20basic%20syntax.ipynb).

In [None]:
df[['NAME', 'TEAM']]

Lots of square brackets happening here! The outer pair does the selecting, and the inner pair is the list of column names. To make things clearer, you could assign the list to its own variable:

In [None]:
cols_of_interest = ['TEAM', 'NAME']
df[cols_of_interest]
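The number of brackets also changes what you get back. This sketch uses a tiny made-up frame (`demo` and its values are hypothetical) to show that one pair of brackets returns a Series, a one-item list returns a DataFrame, and the columns come back in the order you list them.

```python
import pandas as pd

# Hypothetical stand-in rows; the notebook itself reads ../data/mlb.csv.
demo = pd.DataFrame({
    'NAME': ['Player A', 'Player B'],
    'TEAM': ['LAD', 'TEX'],
    'SALARY': [535000, 1000000],
})

one_col = demo['TEAM']          # single brackets -> Series
one_col_frame = demo[['TEAM']]  # a one-item list -> DataFrame
cols_of_interest = ['TEAM', 'NAME']
subset = demo[cols_of_interest]  # columns follow the list order

print(type(one_col).__name__)        # Series
print(type(one_col_frame).__name__)  # DataFrame
print(list(subset.columns))          # ['TEAM', 'NAME']
```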

### Filtering rows of data

You can also filter rows to keep just the records that meet your filtering condition(s) -- like a `WHERE` clause in SQL.

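For example, let's say you wanted to filter this data to include just the Los Angeles Dodgers. The basic syntax is to pass your filtering condition to the data frame in square brackets `[]`. The condition `df['TEAM'] == 'LAD'` produces a Series of `True`/`False` values, one per row, and pandas keeps only the rows marked `True`.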

In [None]:
lad = df[df['TEAM'] == 'LAD']

In [None]:
lad
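To see that intermediate step, here's a sketch that builds the `True`/`False` Series (often called a boolean mask) on its own before using it. The `demo` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows, not the MLB data.
demo = pd.DataFrame({
    'NAME': ['Player A', 'Player B', 'Player C'],
    'TEAM': ['LAD', 'TEX', 'LAD'],
})

# The comparison returns one True/False value per row.
mask = demo['TEAM'] == 'LAD'
print(mask.tolist())  # [True, False, True]

# Passing the mask inside [] keeps only the rows marked True.
lad = demo[mask]
print(lad['NAME'].tolist())  # ['Player A', 'Player C']

# The original row labels are kept, so the index now skips 1.
print(lad.index.tolist())  # [0, 2]
```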

You can do numerical comparisons -- let's get just the players who make $1 million or more:

In [None]:
millionaires = df[df['SALARY'] >= 1000000]

In [None]:
millionaires
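The other comparison operators (`>`, `<`, `<=`, `!=`) work the same way. A quick sketch on made-up salaries (the `demo` frame is hypothetical) shows `>=` and its opposite, `<`, splitting the rows into two groups with no overlap.

```python
import pandas as pd

# Hypothetical salaries, not the real MLB figures.
demo = pd.DataFrame({
    'NAME': ['Player A', 'Player B', 'Player C'],
    'SALARY': [535000, 1000000, 24000000],
})

# Each comparison produces a boolean mask, just like ==.
millionaires = demo[demo['SALARY'] >= 1000000]
under_a_million = demo[demo['SALARY'] < 1000000]

print(millionaires['NAME'].tolist())     # ['Player B', 'Player C']
print(under_a_million['NAME'].tolist())  # ['Player A']
```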

You can use the [`isin()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html) method to test a value against multiple matches -- just hand it your list of values to check against.

Let's say we wanted to return all of the players for the Texas Rangers and Houston Astros.

In [None]:
tx = df[df['TEAM'].isin(['TEX', 'HOU'])]

In [None]:
tx
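`isin()` is a shortcut for checking "equals this OR equals that." This sketch, on a made-up `demo` frame, compares it with the longhand version that joins two `==` checks using `|` (covered in the next section). Both should select the same rows.

```python
import pandas as pd

# Hypothetical stand-in rows, not the MLB data.
demo = pd.DataFrame({
    'NAME': ['Player A', 'Player B', 'Player C', 'Player D'],
    'TEAM': ['TEX', 'LAD', 'HOU', 'TEX'],
})

# isin() checks each value against the whole list at once.
tx = demo[demo['TEAM'].isin(['TEX', 'HOU'])]

# The longhand equivalent: one == check per team, joined with |.
tx_long = demo[(demo['TEAM'] == 'TEX') | (demo['TEAM'] == 'HOU')]

print(tx['NAME'].tolist())  # ['Player A', 'Player C', 'Player D']
print(tx.equals(tx_long))   # True
```

With more than two or three values, `isin()` is much easier to read than a long chain of `|`.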

### Filtering on multiple criteria

You can filter your data on multiple criteria. Two common gotchas:
- Using Python's `and` and `or` operators to chain the conditions -- [pandas wants you to use `&` and `|`](https://pandas.pydata.org/pandas-docs/version/0.22/indexing.html#boolean-indexing) instead
- Forgetting to wrap each condition in parentheses -- `&` and `|` bind more tightly than comparison operators like `==`, so without parentheses Python groups the expression the wrong way
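Here's a sketch of both gotchas on a tiny made-up `demo` frame (hypothetical values). The exact error messages vary between pandas versions, so the code catches both `ValueError` and `TypeError` rather than assuming one.

```python
import pandas as pd

# Hypothetical stand-in rows, not the MLB data.
demo = pd.DataFrame({
    'POS': ['C', 'SS', 'C'],
    'SALARY': [535000, 535000, 9000000],
})

# Gotcha 1: `and` asks for a single True/False from a whole Series,
# which pandas refuses ("The truth value of a Series is ambiguous").
try:
    demo[(demo['POS'] == 'C') and (demo['SALARY'] == 535000)]
except (ValueError, TypeError) as err:
    print('and failed:', err)

# Gotcha 2: & binds before ==, so Python tries 'C' & demo['SALARY']
# first, and the expression errors out.
try:
    demo[demo['POS'] == 'C' & demo['SALARY'] == 535000]
except (ValueError, TypeError) as err:
    print('missing parentheses failed:', err)

# Correct: each condition in parentheses, joined with &.
catchers_lm = demo[(demo['POS'] == 'C') & (demo['SALARY'] == 535000)]
print(len(catchers_lm))  # 1
```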

Let's filter for all catchers who make the league minimum of $535,000.

In [None]:
catchers_lm = df[(df['POS'] == 'C') & (df['SALARY'] == 535000)]

In [None]:
catchers_lm
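Swapping `&` for `|` keeps a row if *either* condition is true. This sketch uses a made-up `demo` frame (hypothetical values) to find players who are catchers or who make the $535,000 minimum. The parentheses are still required.

```python
import pandas as pd

# Hypothetical stand-in rows, not the MLB data.
demo = pd.DataFrame({
    'NAME': ['Player A', 'Player B', 'Player C'],
    'POS': ['C', 'SS', '1B'],
    'SALARY': [535000, 535000, 9000000],
})

# | keeps a row when at least one condition is True.
catchers_or_lm = demo[(demo['POS'] == 'C') | (demo['SALARY'] == 535000)]
print(catchers_or_lm['NAME'].tolist())  # ['Player A', 'Player B']
```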