# Pandas

Pandas is a Python library for manipulating tabular data. To use it, install it and then import it, conventionally under the alias `pd`:

In [None]:
import pandas as pd

Note: This tutorial summarizes the [Pandas Getting Started Tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html).

## DataFrame

The primary data structure provided by Pandas is the `DataFrame`. A `DataFrame` has rows and columns, similar to a spreadsheet or database table:

In [None]:
df = pd.DataFrame(
    {
        'Site ID': [10163000, 10163000, 10163000],
        'Site Name': ['PROVO RIVER AT PROVO, UT', 'PROVO RIVER AT PROVO, UT', 'PROVO RIVER AT PROVO, UT'],
        'Parameter': ['Discharge(Mean)', 'Discharge(Mean)', 'Discharge(Mean)'],
        'Units': ['cfs', 'cfs', 'cfs'],
        'Value': [48.2, 53.0, 62.2],
    }
)
df
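
Once a `DataFrame` exists, its dimensions, column labels, and column types can be inspected through attributes. The following is a minimal sketch that rebuilds a shortened version of the table above so it runs on its own; the variable names `n_rows`, `n_cols`, `column_names`, and `value_dtype` are chosen here for illustration only.

```python
import pandas as pd

# Shortened version of the example table, rebuilt so this snippet is self-contained.
df = pd.DataFrame(
    {
        'Site ID': [10163000, 10163000, 10163000],
        'Units': ['cfs', 'cfs', 'cfs'],
        'Value': [48.2, 53.0, 62.2],
    }
)

n_rows, n_cols = df.shape        # (number of rows, number of columns)
column_names = list(df.columns)  # column labels, in order
value_dtype = df['Value'].dtype  # element type of a single column

print(n_rows, n_cols)
print(column_names)
print(value_dtype)
```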

## Series

Each column in a `DataFrame` is a `Series`. A column can be selected by its name using square brackets `[]`:

In [None]:
df['Value']

The `describe` method can be used to calculate summary statistics on a column or `Series`:

In [None]:
df['Value'].describe()

A `Series` also has methods for computing individual statistics, such as `mean`:

In [None]:
df['Value'].mean()
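
As a short sketch of a few other statistic methods, the example below builds a standalone `Series` from the same three discharge values. The variable names are illustrative, and the last line shows that the `50%` entry returned by `describe` corresponds to the median.

```python
import pandas as pd

# Standalone Series holding the same values as the 'Value' column.
values = pd.Series([48.2, 53.0, 62.2], name='Value')

mean_value = values.mean()      # arithmetic mean
median_value = values.median()  # middle value
min_value = values.min()        # smallest value
max_value = values.max()        # largest value

# describe() returns a Series indexed by statistic name.
summary = values.describe()
print(summary['50%'] == median_value)
```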

## Import Data

Pandas provides several functions for reading data from files in common formats.

Download the **Titanic dataset** from the Pandas tutorial here: [Read and Write Data | Pandas](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html)

Import the Titanic dataset using the `read_csv` function as follows:

In [None]:
titanic = pd.read_csv('titanic.csv')
titanic.head(5)
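
If the Titanic file is not at hand, `read_csv` can be tried on in-memory text instead, since it accepts file-like objects as well as paths. The sketch below uses a small made-up CSV (the names and ages are hypothetical, not taken from the Titanic dataset) and shows that an empty field is read as a missing value.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """Name,Age,Sex
Alice,29,female
Bob,41,male
Carol,,female
"""

# read_csv accepts a file-like object as well as a file path.
people = pd.read_csv(io.StringIO(csv_text))

# The empty Age field for Carol is read as a missing value (NaN).
missing_ages = people['Age'].isna().sum()
print(people.head())
print(missing_ages)
```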

## Subset Data

Multiple columns can be selected by passing a list of column names inside the square brackets:

In [None]:
titanic[["Age", "Sex"]]

Rows can be filtered by placing a boolean condition inside the square brackets:

In [None]:
titanic[titanic["Age"] > 35]
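
To make the filtering step more concrete, the sketch below uses a small hypothetical table (not the Titanic data) to show that the condition itself produces a boolean `Series`, and that several conditions can be combined with `&` when each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical passenger table used only for illustration.
passengers = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol", "Dan"],
        "Age": [29, 41, 52, 36],
        "Sex": ["female", "male", "female", "male"],
    }
)

# The comparison yields a boolean Series with one entry per row.
is_older = passengers["Age"] > 35

# Passing that Series inside the brackets keeps only the rows marked True.
older = passengers[is_older]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
older_women = passengers[(passengers["Age"] > 35) & (passengers["Sex"] == "female")]

print(older)
print(older_women)
```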

## `loc`

The `loc` indexer selects rows and columns by label: a row label, a column name, or a boolean condition:

In [None]:
# Shows names of passengers older than 35
titanic.loc[titanic["Age"] > 35, "Name"]
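
The sketch below illustrates the three kinds of `loc` selection on a small hypothetical table whose row labels (`"a"`, `"b"`, `"c"`) are assumed here for illustration. Note that label slices with `loc` include both endpoints.

```python
import pandas as pd

# Hypothetical table with explicit row labels.
passengers = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol"], "Age": [29, 41, 52]},
    index=["a", "b", "c"],
)

# A single value, addressed by row label and column name.
age_of_b = passengers.loc["b", "Age"]

# A label slice includes both endpoints: rows "a" and "b".
first_two = passengers.loc["a":"b", ["Name"]]

# A boolean condition selects rows; the second argument selects the column.
older_names = passengers.loc[passengers["Age"] > 35, "Name"]

print(age_of_b)
print(first_two)
print(older_names)
```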

## `iloc`

The `iloc` indexer selects rows and columns by integer position, counting from 0:

In [None]:
# Selects the rows at positions 9-24 and the columns at positions 2-4 (slice ends are excluded)
titanic.iloc[9:25, 2:5]
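
Position-based slices follow Python's usual convention: the start position is included and the end position is excluded. The sketch below demonstrates this on a small hypothetical 4x4 table whose values encode their own row and column, so the selected block is easy to check by eye.

```python
import pandas as pd

# Hypothetical 4x4 table; each value encodes its row (tens) and column (ones).
grid = pd.DataFrame(
    [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]],
    columns=["a", "b", "c", "d"],
)

# Rows at positions 1 and 2, columns at positions 0 and 1 (ends 3 and 2 are excluded).
block = grid.iloc[1:3, 0:2]

# A single value: row position 0, column position 3.
single = grid.iloc[0, 3]

print(block)
print(single)
```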

## Learn More

To learn more about using Pandas, see the [Pandas Getting Started Tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html).