# Introduction to pandas

In this notebook, you'll get familiar with the basics of reading in and getting acquainted with data using the `pandas` library.

### Import the `pandas` library, aliased as `pd`.

In [None]:
import pandas as pd

### Let's explore these pandas methods, attributes, and accessors
 - .head()
 - .tail()
 - .shape
 - .info()
 - .dtypes
 - .columns
 - .drop()
 - .unique()
 - .nunique()
 - .value_counts()
 - .query()
 - .rename()
 - .loc[]
 - [[]]

### Read in the public art data and examine the head, tail, shape, info and dtypes

In [None]:
art = pd.read_csv('../data/public_art.csv')

To inspect a portion of the dataframe, you can use `.head()` (to see the first few rows) or `.tail()` (to see the last few rows).

In [None]:
art.head(2)

In [None]:
art.tail(2)

In [None]:
art.shape
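
If you want to try these without the CSV file, here is a minimal sketch on a made-up dataframe. The `toy` name and its rows are hypothetical, not taken from the public art data.

```python
import pandas as pd

# Hypothetical stand-in for the public art data
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D', 'E'],
    'Type': ['Mural', 'Sculpture', 'Mural', 'Mosaic', 'Mural'],
})

first_two = toy.head(2)      # rows with index 0 and 1
last_two = toy.tail(2)       # rows with index 3 and 4
n_rows, n_cols = toy.shape   # .shape is an attribute, so no parentheses

print(first_two)
print(last_two)
print(n_rows, n_cols)
```

Note that `.head()` and `.tail()` are methods (called with parentheses), while `.shape` is an attribute that returns a `(rows, columns)` tuple.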

In [None]:
art.info()

**What do you notice?**

There are quite a few missing Descriptions, 10 missing First Names, a few missing Mediums, and one missing Location.

In [None]:
art.dtypes

You may notice that most of the columns are "objects". This is the datatype that `pandas` uses for text data. 

The float64 datatype is a numeric datatype that can handle decimal values.
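
To see how datatypes come out in practice, here is a small sketch on a made-up dataframe (the values are hypothetical). Depending on your pandas version, a text column may be reported as `object` or as a dedicated string datatype, so the sketch checks the kind of datatype rather than its exact name.

```python
import pandas as pd

# Hypothetical rows standing in for the public art data
toy = pd.DataFrame({
    'Title': ['Artwork A', 'Artwork B'],
    'Latitude': [36.16, 36.12],
})

print(toy.dtypes)

# Text columns are not numeric; columns of decimal values are floats
title_is_numeric = pd.api.types.is_numeric_dtype(toy['Title'])
latitude_is_float = pd.api.types.is_float_dtype(toy['Latitude'])
print(title_is_numeric, latitude_is_float)  # False True
```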

In [None]:
art.columns

Since the Mapped Location information is already contained in the Latitude and Longitude columns, you really don't need to store it twice. You can use the `.drop()` method to get rid of that column.

In [None]:
art.drop(columns='Mapped Location')

In [None]:
art.head(2)

What happened? We failed to save the result of dropping the column. We need to assign the result back to the art dataframe.

In [None]:
art = art.drop(columns = 'Mapped Location')
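
The same behavior can be seen on a tiny made-up dataframe (the `toy` name and values are hypothetical). This sketch shows that `.drop()` returns a new dataframe and leaves the original untouched until you reassign it.

```python
import pandas as pd

# Hypothetical two-row dataframe with a redundant column
toy = pd.DataFrame({
    'Title': ['A', 'B'],
    'Mapped Location': ['(36.1, -86.7)', '(36.2, -86.8)'],
})

# .drop() returns a new dataframe; toy itself is unchanged
dropped = toy.drop(columns='Mapped Location')
cols_before_assignment = list(toy.columns)

# Assigning the result back is what makes the change stick
toy = toy.drop(columns='Mapped Location')
cols_after_assignment = list(toy.columns)

print(cols_before_assignment)  # ['Title', 'Mapped Location']
print(cols_after_assignment)   # ['Title']
```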

What are the different Types of artwork in this dataset?

In [None]:
art['Type'].unique()

If you only care about the _number_ of unique values in a column, you can use `.nunique()`.

For example, if you want to know the number of artist last names:

In [None]:
art['Last Name'].nunique()

Which is the most popular Type?

In [None]:
art['Type'].value_counts()
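
Since this dataset has missing values, it helps to know how these three methods treat them. The sketch below uses a made-up Series (hypothetical values) and relies on the default settings, under which `.unique()` keeps the missing value while `.nunique()` and `.value_counts()` leave it out.

```python
import pandas as pd

# Hypothetical Type column with one missing value
types = pd.Series(
    ['Mural', 'Sculpture', 'Mural', None, 'Mural', 'Mosaic'],
    name='Type',
)

distinct = types.unique()      # array of distinct values, missing value included
n_distinct = types.nunique()   # counts distinct values, missing value excluded
counts = types.value_counts()  # sorted from most to least frequent

print(distinct)
print(n_distinct)       # 3
print(counts.idxmax())  # Mural
```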

What if you want to see all of the Murals? You can slice a dataframe using the `.query` method.

In [None]:
art.query('Type == "Mural"')

If you want to do further work or exploration with the sliced dataframe, you need to save it to a new variable.

In [None]:
murals = art.query('Type == "Mural"')
murals.shape

Who is the most prolific mural painter in Nashville?

In [None]:
murals['Last Name'].value_counts()

Let's see all of the murals that Cooper painted.

In [None]:
murals.query('Last Name == "Cooper"')

Oh no. What went wrong?

The `.query` method cannot parse column names that contain spaces. You can escape such names by wrapping them in backticks.

In [None]:
murals.query('`Last Name` == "Cooper"')
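
Here is a minimal sketch of the backtick syntax on a made-up dataframe (names and rows are hypothetical). It also checks that the query gives the same rows as an ordinary boolean filter.

```python
import pandas as pd

# Hypothetical dataframe with a space in a column name
toy = pd.DataFrame({
    'Last Name': ['Cooper', 'Faxon', 'Cooper'],
    'Type': ['Mural', 'Mural', 'Sculpture'],
})

# Backticks let .query refer to a column whose name contains a space
cooper = toy.query('`Last Name` == "Cooper"')

# Equivalent filter written with a boolean condition
same_rows = toy[toy['Last Name'] == 'Cooper']

print(cooper)
print(cooper.equals(same_rows))  # True
```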

But another option is to rename our columns to remove the spaces.

In [None]:
art.columns

Column names can be set either by specifying a **list** of names:

In [None]:
art.columns = ['title', 'last_name', 'first_name', 'location', 'medium',
              'type', 'description', 'lat', 'lng']

or by using the `.rename` method and passing in a **dictionary**. A dictionary is a collection of key-value pairs.

In [None]:
art = art.rename(columns = {'Title': 'title',
                            'Last Name': 'last_name',
                            'First Name': 'first_name',
                            'Location': 'loc',
                            'Medium': 'medium',
                            'Description': 'desc',
                            'Latitude': 'lat',
                            'Longitude': 'lng'})
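
One thing to watch for with `.rename`: by default, keys in the dictionary that don't match an existing column are silently ignored, so a misspelled name goes unnoticed. The sketch below, on a made-up dataframe with a deliberately misspelled key, shows this and then uses the `errors='raise'` option to turn the mismatch into an error.

```python
import pandas as pd

# Hypothetical one-row dataframe
toy = pd.DataFrame({'Title': ['A'], 'Description': ['x']})

# 'Descripton' is misspelled on purpose: that key is ignored without warning
renamed = toy.rename(columns={'Title': 'title', 'Descripton': 'desc'})
print(list(renamed.columns))  # ['title', 'Description']

# With errors='raise', a key that matches no column raises a KeyError
raised = False
try:
    toy.rename(columns={'Descripton': 'desc'}, errors='raise')
except KeyError as exc:
    raised = True
    print('rename failed:', exc)
```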

So now you can try to slice the murals dataframe using the new column name.

In [None]:
murals.query('last_name == "Cooper"')

Oops, what went wrong? Just because we renamed the art dataframe doesn't mean the murals one will be renamed as well. Fix the issue so that the query above will work.

In [None]:
murals = murals.rename(columns = {'Title': 'title',
                                  'Last Name': 'last_name',
                                  'First Name': 'first_name',
                                  'Location': 'loc',
                                  'Medium': 'medium',
                                  'Description': 'desc',
                                  'Latitude': 'lat',
                                  'Longitude': 'lng'})

In [None]:
murals.query('last_name == "Cooper"')

In [None]:
murals['last_name'].value_counts()

Take another look at the murals dataframe and notice that Sterling Goller-Brown and Ian Lawrence collaborated on multiple murals, but these are stored in the dataframe differently. What if we want to slice down and find these rows?

In [None]:
goller_lawrence = ['Sterling Goller-Brown.  Ian Lawrence', 'Sterling Goller-Brown and Ian Lawrence, co-creators']

In [None]:
murals.query('last_name in @goller_lawrence')
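
The `@` prefix tells `.query` to look up a Python variable rather than a column. A minimal sketch on a made-up dataframe (the `wanted` list and the rows are hypothetical):

```python
import pandas as pd

# Hypothetical dataframe with already-renamed columns
toy = pd.DataFrame({
    'last_name': ['Cooper', 'Faxon', 'Rudloff'],
    'title': ['A', 'B', 'C'],
})

# A regular Python list defined outside the query string
wanted = ['Cooper', 'Rudloff']

# @wanted refers to the Python variable, not to a column named "wanted"
matches = toy.query('last_name in @wanted')
print(matches)
```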

Another method of slicing a dataframe is by using `.loc`.

We can fetch rows based on their **index** values (the row labels shown at the far left of the dataframe).

In [None]:
art.loc[20]

You can fetch a range of rows:

In [None]:
art.loc[20:25]

You can also fetch only columns you're interested in by specifying them by name.

In [None]:
art.loc[20:25, ['title', 'last_name', 'medium']]
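
One detail worth knowing: because `.loc` slices by label, both endpoints are included, unlike ordinary Python list slicing. A minimal sketch on a made-up dataframe with the default integer index (values are hypothetical):

```python
import pandas as pd

letters = list('ABCDEF')

# Hypothetical dataframe with the default index 0 through 5
toy = pd.DataFrame({'title': letters, 'medium': ['Paint'] * 6})

# Label-based slicing includes both endpoints: labels 1, 2, and 3
rows = toy.loc[1:3]

# Same rows, restricted to a list of columns
subset = toy.loc[1:3, ['title']]

# Plain Python slicing stops before the end position
list_slice = letters[1:3]

print(list(rows.index))  # [1, 2, 3]
print(list_slice)        # ['B', 'C']
```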

It's also possible to slice down to a list of columns using double brackets [[ ]]:

In [None]:
art[['title', 'type', 'description']]
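
The difference between single and double brackets is easiest to see by checking what comes back. In this sketch on a made-up dataframe (hypothetical values), single brackets return a Series and double brackets return a dataframe, even when the list holds only one column.

```python
import pandas as pd

# Hypothetical two-row dataframe
toy = pd.DataFrame({'title': ['A', 'B'], 'type': ['Mural', 'Mosaic']})

as_series = toy['title']    # single brackets: one column as a Series
as_frame = toy[['title']]   # double brackets: a dataframe with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```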

You can also slice based on a condition, similar to using `.query`.

In [None]:
art.loc[art['last_name'] == 'Faxon']

Note that to slice to values in a list, you can use `.isin`.

In [None]:
art.loc[art['last_name'].isin(goller_lawrence)]

Finally, you can negate a condition by adding a tilde `~` before that condition. So if you want to find all murals not painted by Goller-Brown and Lawrence:

In [None]:
murals.loc[~murals['last_name'].isin(goller_lawrence)]
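
To see what `.isin` and `~` are doing, it can help to look at the boolean mask itself. A minimal sketch on a made-up dataframe (the names list and rows are hypothetical):

```python
import pandas as pd

# Hypothetical dataframe of artist last names
toy = pd.DataFrame({'last_name': ['Cooper', 'Faxon', 'Rudloff', 'Cooper']})
names = ['Cooper', 'Rudloff']

# .isin returns one True/False value per row
mask = toy['last_name'].isin(names)

included = toy.loc[mask]    # rows where the mask is True
excluded = toy.loc[~mask]   # ~ flips every True/False value

print(list(mask))                   # [True, False, True, True]
print(list(excluded['last_name']))  # ['Faxon']
```

Every row lands in exactly one of the two results, so the row counts of `included` and `excluded` add up to the length of the original dataframe.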