# Introduction to the _pandas_ Library

In this notebook, we'll work with a dataset containing information on public artworks located around Nashville.

First, we'll import the _pandas_ library, using the __alias__ `pd`.

In [None]:
import pandas as pd

## Importing and Inspecting the Data

In [None]:
art = pd.read_csv('../data/public_art.csv')

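To see the top of the dataset, use the `.head()` method, which returns the first five rows by default. The cells after it use `.tail(n)` to show the last `n` rows, the `.shape` attribute to get the number of rows and columns, and `.info()` to list each column's data type and number of non-null values.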

In [None]:
art.head()

In [None]:
art.tail(2)

In [None]:
art.shape

In [None]:
art.info()
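The Nashville CSV isn't reproduced here, so the sketch below builds a tiny stand-in file in memory and applies the same inspection tools to it. The rows and the name `sample` are made up for illustration. The only assumption is that `pd.read_csv()` accepts a file-like object such as `io.StringIO` as well as a path.

```python
import io

import pandas as pd

# Made-up rows standing in for ../data/public_art.csv
csv_text = """Title,Last Name,First Name,Type
Sample Mural,Doe,Jane,Mural
Sample Statue,Roe,Rick,Sculpture
Sample Window,Poe,Pat,Stained Glass
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns) -> (3, 4)
print(sample.head(2))  # first two rows
print(sample.tail(1))  # last row
sample.info()          # column names, non-null counts, dtypes (printed, returns None)
```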

## Modifying/Cleaning Up

The `columns` attribute shows the column names for the DataFrame.

In [None]:
art.columns

First, let's get rid of the `Mapped Location` column. We can do this with the `.drop()` method, passing a list of the column names to remove to its `columns` parameter.

In [None]:
art.drop(columns = ['Mapped Location'])

Let's check to see if the column is gone.

In [None]:
art.head(1)

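The column is still there, because `.drop()` returns a new DataFrame and leaves the original unchanged. When modifying a _pandas_ DataFrame, if you want the changes to stick, you need to assign the result back to the variable.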

In [None]:
art = art.drop(columns = ['Mapped Location'])
art.head(1)
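To make the "returns a new DataFrame" point concrete, here is a minimal sketch on a hypothetical two-column DataFrame (the values are invented). It compares the columns of the original with the columns of the result of `.drop()`.

```python
import pandas as pd

# Hypothetical stand-in for the art DataFrame
df = pd.DataFrame({
    'Title': ['Sample Mural', 'Sample Statue'],
    'Mapped Location': ['(36.16, -86.78)', '(36.15, -86.80)'],
})

dropped = df.drop(columns=['Mapped Location'])

# .drop() built a new DataFrame; df itself still has both columns
print(list(df.columns))       # ['Title', 'Mapped Location']
print(list(dropped.columns))  # ['Title']
```

Only reassigning the name (`df = df.drop(...)`) would make `df` point at the trimmed version.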

Now, let's change the column names. One way is to assign a new list of names to the `columns` attribute. The list must contain exactly one name per column, in the same order as the existing columns.

In [None]:
art.columns = ['title', 'last', 'first', 'loc', 'med',
              'art_type', 'desc', 'lat', 'lng']
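The sketch below, on a made-up one-row DataFrame, shows what pandas actually checks when you assign to `columns`. A list of the wrong length raises a `ValueError` and leaves the names unchanged. pandas checks only the length, so a correctly sized list in the wrong order would silently mislabel the columns.

```python
import pandas as pd

# Hypothetical one-row DataFrame
df = pd.DataFrame({'Title': ['Sample Mural'], 'Type': ['Mural']})

# Works: one new name per existing column, in the same order
df.columns = ['title', 'art_type']

# Fails: the new list has the wrong number of names
try:
    df.columns = ['title']
except ValueError as err:
    print('ValueError:', err)

print(list(df.columns))  # ['title', 'art_type'] (unchanged by the failed assignment)
```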

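If you only want to change the names of some of the columns, use the `.rename()` method with a dictionary that maps old names to new names; any column not in the dictionary keeps its name. This is the safer approach, because each name is matched explicitly and the column order doesn't matter. The cell below is an alternative to the previous one, written against the original column names. Since the columns have already been renamed, running it now leaves `art` unchanged: by default, `.rename()` ignores dictionary keys that aren't existing column labels.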

In [None]:
# Map each original column name to the short name used in the rest of the notebook
art = art.rename(columns = {
    'Title': 'title',
    'Last Name': 'last',
    'First Name': 'first',
    'Location': 'loc',
    'Medium': 'med',
    'Type': 'art_type',
    'Description': 'desc',
    'Latitude': 'lat',
    'Longitude': 'lng'})
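The sketch below illustrates the two properties that make `.rename()` safer, using a hypothetical three-column DataFrame. The order of the dictionary doesn't matter, and unmatched keys are skipped silently. If you'd rather be told about a typo in an old name, `.rename()` also accepts `errors='raise'`, which raises a `KeyError` for keys that aren't column labels.

```python
import pandas as pd

# Hypothetical DataFrame using some of the original column names
df = pd.DataFrame({'Title': ['Sample Mural'], 'Type': ['Mural'], 'Latitude': [36.16]})

# Dictionary order doesn't matter; unlisted columns keep their names
renamed = df.rename(columns={'Type': 'art_type', 'Title': 'title'})
print(list(renamed.columns))  # ['title', 'art_type', 'Latitude']

# 'Title' no longer exists, so by default this key is simply ignored
same = renamed.rename(columns={'Title': 'title'})
print(list(same.columns))     # ['title', 'art_type', 'Latitude']

# With errors='raise', the missing label is reported instead
raised = False
try:
    renamed.rename(columns={'Title': 'title'}, errors='raise')
except KeyError as err:
    raised = True
    print('KeyError:', err)
```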

## Summarizing

Let's say we want to know which types of art are in our dataset. We can get the unique values in a column, as an array, with `.unique()`.

In [None]:
art['art_type'].unique()

If you just need to know _how many_ different values there are in a column, use `.nunique()`. By default, it doesn't count missing values.

In [None]:
art['med'].nunique()

Finally, to see how common each art type is, use `.value_counts()`, which returns the count for each value, sorted from most to least frequent.

In [None]:
art['art_type'].value_counts()
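The three summary methods treat missing values differently, which matters if some artworks have no recorded type. This is a small sketch on a made-up Series with one missing entry. `.unique()` keeps the missing value. `.nunique()` and `.value_counts()` drop it unless you pass `dropna=False`.

```python
import pandas as pd

# Hypothetical art types, including one missing value
types = pd.Series(['Mural', 'Sculpture', 'Mural', None])

print(types.unique())               # includes the missing value
print(types.nunique())              # 2: missing values are skipped by default
print(types.nunique(dropna=False))  # 3: the missing value is counted too
print(types.value_counts())         # Mural 2, Sculpture 1 (most frequent first)
```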

## Slicing and Filtering

The `.loc[]` accessor selects rows (and, optionally, columns) by their __labels__.

You can also pass it a condition to keep only the rows where that condition is true.

For example, let's find all rows where the `art_type` is 'Mural'.

In [None]:
murals = art.loc[art['art_type'] == 'Mural']

In [None]:
murals

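Let's confirm that we got the expected number of rows: the first number in `.shape` should match the count for 'Mural' reported by `.value_counts()` above.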

In [None]:
murals.shape
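Behind the scenes, the condition `art['art_type'] == 'Mural'` produces a boolean Series with one `True`/`False` per row, and `.loc` keeps the rows marked `True`. This minimal sketch on a hypothetical four-row DataFrame shows why the row count of the filtered result equals both the number of `True` values and the `.value_counts()` entry.

```python
import pandas as pd

# Hypothetical stand-in for the art DataFrame
df = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'art_type': ['Mural', 'Sculpture', 'Mural', 'Stained Glass'],
})

# The condition produces a boolean Series: True where the row is a mural
is_mural = df['art_type'] == 'Mural'
print(is_mural.tolist())  # [True, False, True, False]

murals = df.loc[is_mural]

# Each True keeps one row, so these three numbers agree
print(murals.shape[0], is_mural.sum(), df['art_type'].value_counts()['Mural'])
```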

With `.loc`, you can also keep only certain columns by passing a list of column names after the row condition, separated by a comma.

In [None]:
art.loc[art['art_type'] == 'Mural', ['last', 'first']].head()
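Here is the same `.loc[rows, columns]` pattern on a hypothetical three-row DataFrame, so the result can be checked by eye. Note that the filtered rows keep their original index labels (0 and 2 here) rather than being renumbered.

```python
import pandas as pd

# Hypothetical stand-in with the notebook's short column names
df = pd.DataFrame({
    'last': ['Doe', 'Roe', 'Poe'],
    'first': ['Jane', 'Rick', 'Pat'],
    'art_type': ['Mural', 'Sculpture', 'Mural'],
})

# .loc[row_condition, column_list]: murals only, two columns
mural_artists = df.loc[df['art_type'] == 'Mural', ['last', 'first']]
print(mural_artists)
```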

Passing a list of column names inside the square brackets (hence the double brackets) returns a DataFrame containing just those columns.

In [None]:
artists = art[['last', 'first']]
artists.head(2)
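The number of brackets decides the type of the result. This short sketch on a made-up DataFrame shows the distinction: a single column name in single brackets gives a Series, while any list of names, even a one-element list, gives a DataFrame.

```python
import pandas as pd

# Hypothetical two-column DataFrame
df = pd.DataFrame({'last': ['Doe', 'Roe'], 'first': ['Jane', 'Rick']})

one_col = df['last']               # single brackets, one name -> Series
one_col_frame = df[['last']]       # list with one name -> DataFrame
two_cols = df[['last', 'first']]   # list with two names -> DataFrame

print(type(one_col).__name__)        # Series
print(type(one_col_frame).__name__)  # DataFrame
print(two_cols.shape)                # (2, 2)
```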

To subset the `art` DataFrame to only include furniture and stained glass, use the `.isin()` method along with `.loc[]`. Pass `.isin()` a list of the art types to include; it returns `True` for each row whose value is in that list.

In [None]:
art.loc[art.art_type.isin(['Furniture','Stained Glass'])]
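One way to see what `.isin()` does is to build the same mask by hand. The sketch below, on a hypothetical four-row DataFrame, compares `.isin()` with one equality test per value combined using `|` (element-wise "or"). The parentheses are needed because `|` binds more tightly than `==` in Python.

```python
import pandas as pd

# Hypothetical art types
df = pd.DataFrame({
    'art_type': ['Mural', 'Furniture', 'Stained Glass', 'Sculpture'],
})

wanted = ['Furniture', 'Stained Glass']

mask_isin = df['art_type'].isin(wanted)
# Equivalent mask built by hand: one comparison per value, combined with | (or)
mask_or = (df['art_type'] == 'Furniture') | (df['art_type'] == 'Stained Glass')

print(mask_isin.equals(mask_or))  # True
print(df.loc[mask_isin])
```

With a long list of values, `.isin()` keeps the expression short and avoids repeating the column name.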

To subset the `art` DataFrame to include everything _except_ furniture and stained glass, use the same syntax with a `~` (the "not" operator) at the beginning of the expression you pass to `.loc[]`.

In [None]:
art.loc[~art.art_type.isin(['Furniture','Stained Glass'])]
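Because `~` flips every `True` to `False` and vice versa, the two subsets together cover every row exactly once. The sketch below checks this on a hypothetical DataFrame and highlights one detail: `.isin()` returns `False` for a missing value, so a row with no recorded art type ends up in the "everything except" subset.

```python
import pandas as pd

# Hypothetical art types, including one missing value
df = pd.DataFrame({
    'art_type': ['Mural', 'Furniture', None, 'Stained Glass', 'Sculpture'],
})

mask = df['art_type'].isin(['Furniture', 'Stained Glass'])
included = df.loc[mask]
excluded = df.loc[~mask]  # ~ inverts the boolean mask

# Together the two subsets cover every row exactly once
print(len(included) + len(excluded) == len(df))  # True

# The missing art_type is not "in" the list, so that row is in the excluded subset
print(excluded['art_type'].isna().sum())  # 1
```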