# Introduction to the _pandas_ Library

In this notebook, we'll work with a dataset containing information about public artworks located around Nashville.

First, we'll import the _pandas_ library under its conventional __alias__, `pd`.

In [2]:
import pandas as pd

## Importing and Inspecting the Data

In [3]:
art = pd.read_csv('../data/public_art.csv')

To see the first rows of the dataset, use the `.head()` method (five rows by default); `.tail()` shows the last rows in the same way.

In [24]:
art.head()

,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.12856,-86.8366,"(36.12856, -86.8366)"
1,[Fourth and Commerce Sculpture],Walker,Lin,"333 Commerce Street, Nashville TN",,Sculpture,,36.16234,-86.77774,"(36.16234, -86.77774)"
2,12th & Porter Mural,Kennedy,Kim,114 12th Avenue N,Porter all-weather outdoor paint,Mural,Kim Kennedy is a musician and visual artist wh...,36.1579,-86.78817,"(36.1579, -86.78817)"
3,A Splash of Color,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn,616 17th Ave. N.,"Steel, brick, wood, and fabric on frostproof c...",Mural,Painted wooden hoop dancer on a twenty foot po...,36.16202,-86.79975,"(36.16202, -86.79975)"
4,A Story of Nashville,Ridley,Greg,"615 Church Street, Nashville TN",Hammered copper repousse,Frieze,"Inside the Grand Reading Room, this is a serie...",36.16215,-86.78205,"(36.16215, -86.78205)"


In [25]:
art.tail(2)

,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
130,Women Suffrage Memorial,LeQuire,Alan,"600 Charlotte Avenue, Nashville TN",Bronze sculpture,Sculpture,,36.16527,-86.78382,"(36.16527, -86.78382)"
131,Youth Opportunity Center-STARS Nashville - Pea...,Rudloff,Andee,1704 Charlotte Ave.,House paint on vinyl,Mural,,36.15896,-86.799,"(36.15896, -86.799)"


In [4]:
art.shape

(132, 10)
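`pd.read_csv()` also accepts any file-like object, which makes it easy to experiment without the data file. The sketch below is a hypothetical example: it wraps a few rows copied from the outputs above in `io.StringIO` (a file-like wrapper around a string) and inspects them with `.head()`, `.tail()`, and `.shape`.

```python
from io import StringIO

import pandas as pd

# A few rows copied from the output above, stored as CSV text so the
# example runs without the real ../data/public_art.csv file.
csv_text = """Title,Last Name,Type,Latitude,Longitude
[Cross Country Runners],Frost,Sculpture,36.12856,-86.8366
[Fourth and Commerce Sculpture],Walker,Sculpture,36.16234,-86.77774
12th & Porter Mural,Kennedy,Mural,36.1579,-86.78817
A Story of Nashville,Ridley,Frieze,36.16215,-86.78205
"""

sample = pd.read_csv(StringIO(csv_text))

first_two = sample.head(2)  # first n rows (n defaults to 5)
last_one = sample.tail(1)   # last n rows (n defaults to 5)
print(sample.shape)         # (rows, columns) -> (4, 5)
```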

In [5]:
art.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132 entries, 0 to 131
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Title            132 non-null    object 
 1   Last Name        132 non-null    object 
 2   First Name       122 non-null    object 
 3   Location         131 non-null    object 
 4   Medium           128 non-null    object 
 5   Type             132 non-null    object 
 6   Description      87 non-null     object 
 7   Latitude         132 non-null    float64
 8   Longitude        132 non-null    float64
 9   Mapped Location  132 non-null    object 
dtypes: float64(2), object(8)
memory usage: 10.4+ KB
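The `Non-Null Count` column is the number of rows minus the number of missing values. For example, `First Name` has 122 non-null entries, so 132 - 122 = 10 first names are missing, and `Description` is missing in 132 - 87 = 45 rows. `.isna().sum()` reports those missing counts directly. The sketch below checks the relationship on a hypothetical three-row frame built from rows shown in this notebook, including their gaps.

```python
import numpy as np
import pandas as pd

# Toy frame: three rows copied from outputs in this notebook,
# with their missing values kept as NaN.
df = pd.DataFrame({
    'Title': ['[Cross Country Runners]', '[Fourth and Commerce Sculpture]', 'We Are Our Stories'],
    'First Name': ['Miley', 'Lin', np.nan],
    'Medium': ['Bronze', np.nan, 'acrylic & spray paint on plywood'],
})

missing = df.isna().sum()    # count of missing values in each column
non_null = df.notna().sum()  # the same numbers .info() shows as Non-Null Count

# Every cell is either missing or non-null, so the two always add up to len(df).
print((missing + non_null == len(df)).all())  # True
print(missing)
```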


## Modifying/Cleaning Up

The `columns` attribute shows the column names for the DataFrame.

In [6]:
art.columns

Index(['Title', 'Last Name', 'First Name', 'Location', 'Medium', 'Type',
       'Description', 'Latitude', 'Longitude', 'Mapped Location'],
      dtype='object')

First, let's get rid of the `Mapped Location` column, which just repeats the `Latitude` and `Longitude` values as a pair. This can be done with the `.drop()` method: pass `columns=` a list of the column names to remove.

In [7]:
art.drop(columns=['Mapped Location'])

,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.128560,-86.836600
1,[Fourth and Commerce Sculpture],Walker,Lin,"333 Commerce Street, Nashville TN",,Sculpture,,36.162340,-86.777740
2,12th & Porter Mural,Kennedy,Kim,114 12th Avenue N,Porter all-weather outdoor paint,Mural,Kim Kennedy is a musician and visual artist wh...,36.157900,-86.788170
3,A Splash of Color,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn,616 17th Ave. N.,"Steel, brick, wood, and fabric on frostproof c...",Mural,Painted wooden hoop dancer on a twenty foot po...,36.162020,-86.799750
4,A Story of Nashville,Ridley,Greg,"615 Church Street, Nashville TN",Hammered copper repousse,Frieze,"Inside the Grand Reading Room, this is a serie...",36.162150,-86.782050
...,...,...,...,...,...,...,...,...,...
127,We Are Our Stories,Omari Booker & The REAL Program at Oasis Center,,1037 28th Avenue North,acrylic & spray paint on plywood,Mural,"""We Are Our Stories"" is a public art project t...",36.165101,-86.822209
128,Welcome to Flatrock,Cooper,Michael,3756 Nolensville Rd,Silicate paint on concrete,Mural,Trompe L'oeil animals and architectural stonew...,36.090820,-86.734450
129,Wind Reeds,Kahn,Ned,"1 Terminal Drive, Nashville TN",Aluminum panels,Sculpture,Hinged aluminum panels that cover a wall of th...,36.134690,-86.667770
130,Women Suffrage Memorial,LeQuire,Alan,"600 Charlotte Avenue, Nashville TN",Bronze sculpture,Sculpture,,36.165270,-86.783820


Let's check whether the column is gone.

In [8]:
art.head(1)

,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.12856,-86.8366,"(36.12856, -86.8366)"


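The `Mapped Location` column is still there. Like most DataFrame methods, `.drop()` returns a new DataFrame and leaves the original unchanged, so if you want the change to stick, you need to assign the result back to the variable.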

In [26]:
art = art.drop(columns=['Mapped Location'])
art.head(1)

,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.12856,-86.8366
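To see why the reassignment matters, here is a minimal sketch on a hypothetical one-row frame (values copied from the first row above). It compares the original and the returned DataFrame before reassigning.

```python
import pandas as pd

df = pd.DataFrame({
    'Title': ['[Cross Country Runners]'],
    'Latitude': [36.12856],
    'Mapped Location': ['(36.12856, -86.8366)'],
})

dropped = df.drop(columns=['Mapped Location'])  # returns a new DataFrame
print('Mapped Location' in df.columns)          # True: the original is untouched
print('Mapped Location' in dropped.columns)     # False

df = df.drop(columns=['Mapped Location'])       # reassign to keep the change
print(list(df.columns))                         # ['Title', 'Latitude']
```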


Now, let's change the column names. This can be done by assigning a new list of names to the `columns` attribute. The names are matched to the columns by position, so the list must contain exactly one name per column, in the same order as the existing columns.

In [27]:
art.columns = ['title', 'last', 'first', 'loc', 'med',
              'art_type', 'desc', 'lat', 'lng']
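Because this assignment matches names to columns by position, pandas can check only the count, not the meaning: a list of the wrong length raises a `ValueError`, but a swapped pair such as `'first'` and `'last'` would be accepted silently. A sketch on a hypothetical one-row frame:

```python
import pandas as pd

df = pd.DataFrame({'Title': ['Wind Reeds'], 'Last Name': ['Kahn'], 'First Name': ['Ned']})

# Names are assigned strictly by position: Title -> title, and so on.
df.columns = ['title', 'last', 'first']

# A list of the wrong length is rejected, and the current names are kept.
try:
    df.columns = ['title', 'last']
except ValueError as err:
    print('ValueError:', err)

print(list(df.columns))  # ['title', 'last', 'first']
```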

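If you'd rather map old names to new ones explicitly, use the `.rename()` method, passing `columns=` a dictionary of `{old_name: new_name}` pairs. This is the safer approach: order doesn't matter, and you can rename just a subset of the columns. Note that the previous cell has already replaced the original names, so none of the keys below match a column anymore; by default `.rename()` silently skips labels it can't find, so running this cell after the previous one leaves `art` unchanged. The rest of the notebook uses the short names assigned above (`last`, `first`, `med`, and so on).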

In [28]:
art = art.rename(columns={
    'Title': 'title',
    'Last Name': 'last_name',
    'First Name': 'first_name',
    'Location': 'loc',
    'Medium': 'medium',
    'Type': 'art_type',
    'Description': 'desc',
    'Latitude': 'lat',
    'Longitude': 'lng',
})
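A minimal sketch of `.rename()` on a hypothetical one-row frame: only the listed columns change, keys that match no column are skipped by default, and `errors='raise'` turns a missed key into a `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({'Title': ['Wind Reeds'], 'Last Name': ['Kahn'], 'First Name': ['Ned']})

# Only the listed columns change; matching is by name, so order is irrelevant.
renamed = df.rename(columns={'Last Name': 'last', 'First Name': 'first'})
print(list(renamed.columns))  # ['Title', 'last', 'first']

# 'Last Name' no longer exists in renamed, so this key is silently ignored.
same = renamed.rename(columns={'Last Name': 'last_name'})
print(list(same.columns))     # ['Title', 'last', 'first']

# With errors='raise', a key that matches nothing raises KeyError instead.
try:
    renamed.rename(columns={'Last Name': 'last_name'}, errors='raise')
except KeyError as err:
    print('KeyError:', err)
```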

## Summarizing

Suppose we want to know which types of art appear in the dataset. We can get an array of the unique values in a column by using `.unique()`.

In [29]:
art['art_type'].unique()

array(['Sculpture', 'Mural', 'Frieze', 'Monument', 'Mobile', 'Furniture',
       'Mosaic', 'Relief', 'Stained Glass', 'Bronzes',
       'Sculpture/Fountain', 'Various', 'Street Art', 'mural', 'Fountain',
       'Multipart'], dtype=object)

If you just need to know _how many_ different values a column contains, use `.nunique()`.

In [30]:
art['med'].nunique()

100
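`.nunique()` skips missing values by default, and `.info()` showed that `med` has 128 non-null entries out of 132, so the 100 above counts only media that are actually recorded. A small toy Series (hypothetical values) shows the difference between the default and `dropna=False`, and that `.unique()` does keep `NaN`:

```python
import numpy as np
import pandas as pd

med = pd.Series(['Bronze', np.nan, 'Bronze', 'House paint on wood'])

print(med.nunique())              # 2: NaN is not counted by default
print(med.nunique(dropna=False))  # 3: NaN counted as its own value
print(len(med.unique()))          # 3: .unique() includes NaN
```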

Finally, to see how common each art type is, use `.value_counts()`, which counts each distinct value and sorts the counts in descending order.

In [35]:
art['art_type'].value_counts()

Sculpture             61
Mural                 38
Monument              16
Frieze                 2
Mobile                 2
Mosaic                 2
Various                2
Furniture              1
Relief                 1
Stained Glass          1
Bronzes                1
Sculpture/Fountain     1
Street Art             1
mural                  1
Fountain               1
Multipart              1
Name: art_type, dtype: int64
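The counts above list `Mural` (38) and `mural` (1) separately, because string comparison is case-sensitive. One possible clean-up, sketched here on a small toy Series rather than the real column, is to standardize capitalization with `.str.title()` before counting; on the real column this should merge the two into a single `Mural` count of 39. This is an illustrative choice, not part of the original analysis, and `.str.title()` capitalizes every word, which may not suit every value.

```python
import pandas as pd

art_type = pd.Series(['Sculpture', 'Mural', 'mural', 'Mural', 'Stained Glass'])

print(art_type.value_counts()['Mural'])  # 2: 'mural' is counted separately

# Standardize capitalization first, then count again.
cleaned = art_type.str.title()
print(cleaned.value_counts()['Mural'])   # 3
```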

## Slicing and Filtering

The `.loc[]` accessor selects rows (and, optionally, columns) by their __labels__.

It also accepts a boolean condition, which lets you keep only the rows that satisfy it.

For example, let's find all rows where the `art_type` is 'Mural'.

In [31]:
murals = art.loc[art['art_type'] == 'Mural']

In [32]:
murals

,title,last,first,loc,med,art_type,desc,lat,lng
2,12th & Porter Mural,Kennedy,Kim,114 12th Avenue N,Porter all-weather outdoor paint,Mural,Kim Kennedy is a musician and visual artist wh...,36.1579,-86.78817
3,A Splash of Color,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn,616 17th Ave. N.,"Steel, brick, wood, and fabric on frostproof c...",Mural,Painted wooden hoop dancer on a twenty foot po...,36.16202,-86.79975
5,Aerial Innovations Mural,Rudloff,Andee,202 South 17th St.,House paint on wood,Mural,,36.17354,-86.73994
10,April Baby,Prestwod,Seth,3020 Charlotte Avenue,Acrylic Paint,Mural,portrait of artists little sister with links t...,36.15399,-86.819539
16,Bicycle Bus-Green Fleet,Rudloff,Andee,1st Avenue (under John Seigenthaler Pedestrian...,Metallic paint on metal/found object,Mural,,36.16131,-86.77336
19,Building a Positive Community,"Healing Arts Project, Inc.",Healing Arts Project,East Park Community Center,interior wall paint on board,Mural,"The Healing Arts Project, Inc. sponsored the c...",36.17214,-86.76244
26,Cool Fences,Guion,Scott,"500 East Iris Dr., Nashville, TN",Latex house paint on wood fence,Mural,Portraits of iconic musicians on decorative ba...,36.11554,-86.76366
28,Demonbreun Hill Mural,Deese,Bryan,1524 Demonbreun Street,Latex paint and spray paint,Mural,This piece celebrates Demonbreun Hills former ...,36.153,-86.790492
29,Dragon Wall Mural,Randolf and Glick,Adam and David,21st Avenue and Belcourt Ave.,painting,Mural,,36.1375,-86.80119
30,Eastside Mural,Sterling Goller-Brown. Ian Lawrence,,1008 Forrest Ave,Spray Paint,Mural,,36.178323,-86.75024


Let's confirm that we got the expected number of rows: `.value_counts()` reported 38 murals.

In [37]:
murals.shape

(38, 9)
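The condition inside `.loc[]` is itself a Series of `True`/`False` values, one per row. The sketch below, on a hypothetical four-row frame built from titles shown earlier, shows that summing the mask counts the matching rows (each `True` counts as 1) and that the filtered result keeps the original index labels, which is why the `murals` output above starts at 2, 3, 5.

```python
import pandas as pd

art = pd.DataFrame({
    'title': ['[Cross Country Runners]', '12th & Porter Mural',
              'A Splash of Color', 'A Story of Nashville'],
    'art_type': ['Sculpture', 'Mural', 'Mural', 'Frieze'],
})

mask = art['art_type'] == 'Mural'  # boolean Series aligned with art's index
print(mask.tolist())               # [False, True, True, False]
print(mask.sum())                  # 2: each True counts as 1

murals = art.loc[mask]
print(murals.index.tolist())       # [1, 2]: original labels are kept
```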

`.loc` can also keep only certain columns: put the row condition first, then a list of column names.

In [33]:
art.loc[art['art_type'] == 'Mural', ['last', 'first']].head()

,last,first
2,Kennedy,Kim
3,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn
5,Rudloff,Andee
10,Prestwod,Seth
16,Rudloff,Andee
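The general pattern is `.loc[row_selector, column_selector]`. Here it is on a hypothetical three-row frame, along with the `:` slice, which selects all rows when you only want to choose columns:

```python
import pandas as pd

art = pd.DataFrame({
    'last': ['Frost', 'Kennedy', 'Rudloff'],
    'first': ['Miley', 'Kim', 'Andee'],
    'art_type': ['Sculpture', 'Mural', 'Mural'],
})

# Rows where art_type is 'Mural', restricted to two columns.
mural_artists = art.loc[art['art_type'] == 'Mural', ['last', 'first']]
print(mural_artists)

# A bare colon selects every row, so this keeps all rows but only two columns.
all_artists = art.loc[:, ['last', 'first']]
```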


Indexing a DataFrame with a list of column names (note the double brackets) returns a DataFrame containing just those columns.

In [34]:
artists = art[['last', 'first']]
artists.head(2)

,last,first
0,Frost,Miley
1,Walker,Lin
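The number of brackets matters. With a single column name, single brackets return a Series, while double brackets return a one-column DataFrame. A minimal check on a toy frame with two rows copied from the output above:

```python
import pandas as pd

art = pd.DataFrame({'last': ['Frost', 'Walker'], 'first': ['Miley', 'Lin']})

one_col = art['last']       # single brackets, one name -> Series
one_col_df = art[['last']]  # double brackets -> DataFrame with one column

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
```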


To subset the `art` DataFrame to only furniture and stained glass, combine the `.isin()` method with `.loc[]`. Pass `.isin()` a list of the art types to keep; it returns `True` for every row whose value is in that list.

In [18]:
art.loc[art.art_type.isin(['Furniture','Stained Glass'])]

,title,last,first,loc,med,art_type,desc,lat,lng
22,Children's Chairs For The Seasons,McGraw,Deloss,"615 Church Street, Nashville TN",Mixed Media - wood and paint,Furniture,chairs depicting the four seasons,36.16215,-86.78205
43,History in Stained Glass,Baker,Gus,"1101 19th Avenue South, Nashville TN",83 Stained glass medallions,Stained Glass,,36.14564,-86.79765
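`.isin()` is shorthand for comparing against each value and combining the results with `|` (element-wise OR). The hypothetical frame below confirms that both masks agree. Each comparison in the hand-written version needs its own parentheses, because `|` binds more tightly than `==` in Python.

```python
import pandas as pd

art = pd.DataFrame({'art_type': ['Sculpture', 'Furniture', 'Mural', 'Stained Glass']})

with_isin = art['art_type'].isin(['Furniture', 'Stained Glass'])

# Equivalent mask built by hand from two comparisons joined with |.
with_or = (art['art_type'] == 'Furniture') | (art['art_type'] == 'Stained Glass')

print(with_isin.equals(with_or))          # True
print(art.loc[with_isin].index.tolist())  # [1, 3]
```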


To subset the `art` DataFrame to everything _except_ furniture and stained glass, use the same expression with a `~` in front of it. The `~` operator inverts every `True`/`False` value in the condition, so `.loc[]` keeps exactly the rows that `.isin()` marked `False`.

In [19]:
art.loc[~art.art_type.isin(['Furniture','Stained Glass'])]

,title,last,first,loc,med,art_type,desc,lat,lng
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.128560,-86.836600
1,[Fourth and Commerce Sculpture],Walker,Lin,"333 Commerce Street, Nashville TN",,Sculpture,,36.162340,-86.777740
2,12th & Porter Mural,Kennedy,Kim,114 12th Avenue N,Porter all-weather outdoor paint,Mural,Kim Kennedy is a musician and visual artist wh...,36.157900,-86.788170
3,A Splash of Color,Stevenson and Stanley and ROFF (Harroff),Doug and Ronnica and Lynn,616 17th Ave. N.,"Steel, brick, wood, and fabric on frostproof c...",Mural,Painted wooden hoop dancer on a twenty foot po...,36.162020,-86.799750
4,A Story of Nashville,Ridley,Greg,"615 Church Street, Nashville TN",Hammered copper repousse,Frieze,"Inside the Grand Reading Room, this is a serie...",36.162150,-86.782050
...,...,...,...,...,...,...,...,...,...
127,We Are Our Stories,Omari Booker & The REAL Program at Oasis Center,,1037 28th Avenue North,acrylic & spray paint on plywood,Mural,"""We Are Our Stories"" is a public art project t...",36.165101,-86.822209
128,Welcome to Flatrock,Cooper,Michael,3756 Nolensville Rd,Silicate paint on concrete,Mural,Trompe L'oeil animals and architectural stonew...,36.090820,-86.734450
129,Wind Reeds,Kahn,Ned,"1 Terminal Drive, Nashville TN",Aluminum panels,Sculpture,Hinged aluminum panels that cover a wall of th...,36.134690,-86.667770
130,Women Suffrage Memorial,LeQuire,Alan,"600 Charlotte Avenue, Nashville TN",Bronze sculpture,Sculpture,,36.165270,-86.783820
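Because `~` flips every value in the mask, the two selections are complements: each row lands in exactly one of them. On the real data, `.value_counts()` found one furniture piece and one stained-glass piece, so the `~` version should return 132 - 2 = 130 rows. A sketch of the same check on a hypothetical five-row frame:

```python
import pandas as pd

art = pd.DataFrame({'art_type': ['Sculpture', 'Furniture', 'Mural', 'Stained Glass', 'Mural']})

keep = art['art_type'].isin(['Furniture', 'Stained Glass'])
inside = art.loc[keep]
outside = art.loc[~keep]  # ~ flips every True/False in the mask

# The two subsets never overlap and together cover every row.
print(len(inside) + len(outside) == len(art))  # True
```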
