## Selecting Data
When processing data, we usually work with a subset of a dataset rather than the whole of it. For example, we may want to analyse just two specific columns of a dataframe, or focus on only its first 100 entries. These tasks require you to be able to select data. Pandas provides two primary tools for data selection: <code>loc</code>, which selects by label, and <code>iloc</code>, which selects by numbered indexing (integer position, counting from 0). These are summarised in the table below. Note that <code>df</code> represents the name of the dataframe you are working with.

| Code |	Description |
| ----------- | ----------- |
| <code>df[col_label]</code> or <code>df.col_label</code> |	Select columns by label |
| <code>df.loc[row_label]</code> |	Select rows by label |
| <code>df.loc[row_label, col_label]</code>	| Select dataframe elements by row and column labels |
| <code>df.iloc[:, col_index]</code>	| Select columns by numbered indexing |
| <code>df.iloc[row_index]</code>	| Select rows by numbered indexing |
| <code>df.iloc[row_index, col_index]</code>	| Select dataframe elements by row and column numbered indexes |
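
The first row of the table lists two ways to select a column by label. The sketch below shows where they differ. It uses a small made-up dataframe, and the column names are assumptions chosen only for illustration. Attribute access (<code>df.col_label</code>) only works when the label is a valid Python name and does not clash with an existing dataframe attribute or method.

```python
import pandas as pd

# Hypothetical toy dataframe; the column names are for illustration only.
df = pd.DataFrame({"temp": [4.0, 7.5], "wind speed": [1.8, 2.7], "count": [10, 20]})

# Bracket and attribute access return the same column for a simple label.
same = df["temp"].equals(df.temp)

# A label containing a space can only be reached with brackets.
wind = df["wind speed"]

# 'count' is also a DataFrame method, so df.count is the method, not the column.
count_is_method = callable(df.count)
count_column = df["count"]

print(same, count_is_method)
print(count_column)
```

For this reason, bracket notation is the safer default in scripts.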

To select multiple columns or rows, provide a list of labels or indexes. Pandas also supports slice notation for selecting a contiguous range of a dataframe. Some examples are provided below for the pollution dataset.

In [None]:
import pandas as pd
pollution = pd.read_csv('LSTM-Multivariate_pollution.csv', index_col = 'date', parse_dates = True)
pollution

In [None]:
# selecting a row by label
pollution.loc['2/01/2010 0:00']

In [None]:
# selecting a column by label
pollution['dew']

In [None]:
# selecting an element by row and column label
pollution.loc['2/01/2010 0:00', 'dew']

In [None]:
# selecting multiple columns with a list
pollution[['pollution', 'temp', 'rain']]

In [None]:
# selecting multiple rows using slice notation
pollution.loc['2/01/2010 0:00': '3/01/2010 0:00']
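
The examples above all select by label. As a complement, here is a minimal sketch of the <code>iloc</code> forms from the table. It uses a small made-up dataframe (the values and the name <code>toy</code> are assumptions, not the real pollution data) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical toy data standing in for the pollution dataset.
toy = pd.DataFrame(
    {
        "pollution": [129.0, 148.0, 159.0, 181.0],
        "dew": [-16, -15, -11, -7],
        "temp": [-4.0, -4.0, -5.0, -5.0],
    },
    index=pd.date_range("2010-01-02", periods=4, freq="D"),
)

first_row = toy.iloc[0]        # first row, returned as a Series
second_col = toy.iloc[:, 1]    # second column ('dew'), returned as a Series
element = toy.iloc[2, 0]       # third row, first column
first_two = toy.iloc[0:2]      # positions 0 and 1; the end position is excluded

# Label-based slices with loc include the end label.
by_label = toy.loc["2010-01-02":"2010-01-04"]

print(element)
print(len(first_two), len(by_label))
```

Note the difference in slicing: an <code>iloc</code> slice stops before its end position, as in ordinary Python, while a <code>loc</code> slice includes its end label.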

***
## Filtering Data
Filtering is the process of selecting data conditionally. Filtering can be performed in pandas using Boolean indexing. Comparing a pandas series with a value produces a Boolean series, which is a sequence of Booleans (<code>True</code> and <code>False</code>). This can then be used as an index to select values: it keeps all the values where the Boolean index is <code>True</code> and removes the values where it is <code>False</code>.
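
To make the intermediate Boolean series visible, here is a minimal sketch on a small made-up series (the values are assumptions for illustration, not taken from the pollution dataset).

```python
import pandas as pd

# Hypothetical rainfall values.
rain = pd.Series([0, 3, 0, 1], name="rain")

# The comparison is applied to each element and yields a Boolean series.
mask = rain > 0
print(mask.tolist())   # [False, True, False, True]

# Indexing with the mask keeps only the entries where it is True.
wet = rain[mask]
print(wet.tolist())    # [3, 1]
```

The filtered result keeps the original index labels (here 1 and 3), so rows can still be traced back to their position in the source data.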

In [None]:
# Filtering for rows where it is raining
is_rain = pollution['rain'] > 0
pollution[is_rain]

In [None]:
# Filtering for rows where the temperature is high (above 30)
is_hot = pollution['temp'] > 30
pollution[is_hot]

Note that Python's logical operators (<code>and</code>, <code>or</code>, <code>not</code>) do not operate elementwise on a series. For elementwise application, you need to use their bitwise counterparts (<code>&</code>, <code>|</code>, <code>~</code>). Some examples are provided below for the pollution dataset.

In [None]:
# Filtering for rows that are hot and raining
pollution[is_hot & is_rain]

In [None]:
# Filtering for rows that are hot or raining
pollution[is_hot | is_rain]

In [None]:
# Filtering for rows that are dry
pollution[~is_rain]
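
The cells above combine masks that were stored in variables first. When the comparisons are written inline instead, each one needs its own parentheses. The sketch below illustrates this on a small made-up dataframe (the name <code>toy</code> and its values are assumptions for illustration), and also shows what happens when <code>and</code> is used in place of <code>&</code>.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"temp": [35, 12, 31, 8], "rain": [0, 2, 4, 0]})

# Parentheses are required: & binds more tightly than >, so without them
# Python would try to evaluate 30 & toy["rain"] first.
hot_and_wet = toy[(toy["temp"] > 30) & (toy["rain"] > 0)]
print(hot_and_wet)

# `and` asks each whole series for a single truth value, which pandas
# refuses to give, so this raises a ValueError.
try:
    toy[(toy["temp"] > 30) and (toy["rain"] > 0)]
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)
```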