In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Import and Clean the Data

First, we'll repeat the steps from the last notebook to prepare the dataset: read the CSV, drop the `Mapped Location` column, and rename the remaining columns.

In [None]:
art = pd.read_csv('../data/public_art.csv')
art = art.drop(columns = ['Mapped Location'])
art = art.rename(columns = {
    'Title': 'title', 
    'Last Name': 'last_name', 
    'First Name': 'first_name',
    'Location': 'loc', 
    'Medium': 'medium',
    'Type': 'art_type',
    'Description': 'desc', 
    'Latitude': 'lat', 
    'Longitude': 'lng'})

art.head(2)

### More exploration with pandas
 - .value_counts()
 - .reset_index()
 - .describe()
 - .info()
 - .isna().sum()

Recall that `value_counts()` tallies how many times each distinct value appears in a column, sorted from most to least frequent; here we apply it to the `art_type` column.

In [None]:
art['art_type'].value_counts()
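To see the tallying on something small, here is a minimal sketch using made-up values (the `toy_types` Series is hypothetical and not taken from the public art data). It also shows the `normalize` argument, which returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical toy data standing in for the art_type column
toy_types = pd.Series(['Mural', 'Sculpture', 'Mural', 'Mosaic', 'Mural', 'Sculpture'])

# Raw tallies, sorted from most to least frequent
toy_counts = toy_types.value_counts()
print(toy_counts)

# Proportions of the total instead of counts
toy_shares = toy_types.value_counts(normalize=True)
print(toy_shares)
```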

Let's save the result to a variable so that we can keep working with it.

In [None]:
type_counts = art['art_type'].value_counts()
type_counts.head()

Now let's check what kind of object `type_counts` is. Is it a DataFrame?

In [None]:
type(type_counts)

If you look at the head, you'll notice that the art type is now the index value for this Series.

In [None]:
type_counts.head(2)

In [None]:
type_counts.index

You can move the index values into a column (converting the Series to a DataFrame in the process) with the `reset_index()` method.

In [None]:
# reset_index() moves the existing index into a column and replaces it with a 0-based integer index
type_counts = type_counts.reset_index()
type_counts

Now check the type again and look at the head.

In [None]:
type(type_counts)

In [None]:
type_counts.head(2)

Now you can rename the columns by assigning a list of new names to `.columns`.

In [None]:
type_counts.columns = ['art_type', 'number']
type_counts.head(3)
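The whole Series-to-DataFrame sequence can be sketched on made-up data (the `toy_counts` values are hypothetical). The default column names that `reset_index()` produces after `value_counts()` have changed between pandas versions, so this sketch assigns names by position rather than relying on the defaults.

```python
import pandas as pd

# Hypothetical toy data; value_counts() returns a Series indexed by the values
toy_counts = pd.Series(['Mural', 'Sculpture', 'Mural']).value_counts()
print(type(toy_counts).__name__)      # Series

# reset_index() moves the index into a column, producing a DataFrame
toy_df = toy_counts.reset_index()
print(type(toy_df).__name__)          # DataFrame

# Assign names positionally so the result does not depend on the pandas version
toy_df.columns = ['art_type', 'number']
print(toy_df)
```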

The `.info()` method summarizes the dataset: 
 - number of rows
 - data type of each column
 - size in memory
 - non-null count for each column (which reveals missing values)

In [None]:
art.info()
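If you want the individual pieces of that report as objects you can work with, a rough sketch on a made-up DataFrame follows (the `toy` data is hypothetical). It captures the `.info()` text in a string buffer and then pulls out the row count, column types, non-null counts, and memory usage separately.

```python
import io
import pandas as pd

# Hypothetical toy data with a few missing values
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'first_name': ['Ann', None, 'Cy'],
    'lat': [36.1, 36.2, None]})

# Write the info report into a string instead of printing it directly
buffer = io.StringIO()
toy.info(buf=buffer)
print(buffer.getvalue())

n_rows = toy.shape[0]                          # number of rows
col_types = toy.dtypes                         # data type of each column
non_null = toy.count()                         # non-null count per column
total_bytes = toy.memory_usage(deep=True).sum()  # approximate size in memory

print(n_rows)
print(col_types)
print(non_null)
print(total_bytes)
```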

If you want to explore the missing values, the `.isna()` method can be useful. It returns a _Boolean_ (True/False) value for each entry, indicating whether that entry is missing (NA).

In [None]:
art['first_name'].isna()

For example, we can use this in combination with `.loc` to find the rows where the first name is missing.

In [None]:
art.loc[art['first_name'].isna()]
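Here is a small sketch of the same filtering on made-up data (the `toy` DataFrame is hypothetical). The Boolean Series acts as a mask: `.loc` keeps the rows where the mask is True, and `~` flips the mask to keep the other rows instead.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing first name
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'first_name': ['Ann', None, 'Cy', None]})

mask = toy['first_name'].isna()   # True where the first name is missing
missing = toy.loc[mask]           # rows with a missing first name
present = toy.loc[~mask]          # rows with a first name

print(mask)
print(missing)
print(present)
```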

You can also count the null values in each column by chaining the `isna()` and `sum()` methods together.

The reason this works is that Python treats True as 1 and False as 0 when doing arithmetic on Booleans, so summing a column of Booleans counts the True values.

In [None]:
art.isna().sum()
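The Boolean arithmetic can be checked directly. This is a minimal sketch on made-up values (`flags` and `toy` are hypothetical): summing Booleans counts the True values, and taking the mean gives the fraction that are True.

```python
import pandas as pd

print(True + True)                # 2

# Hypothetical Boolean Series
flags = pd.Series([True, False, True, True])
n_true = flags.sum()              # number of True values
share_true = flags.mean()         # fraction of True values
print(n_true, share_true)

# Hypothetical toy data with missing values in both columns
toy = pd.DataFrame({
    'first_name': ['Ann', None, 'Cy', None],
    'last_name': ['Lee', 'Ray', None, 'Poe']})

na_per_column = toy.isna().sum()  # missing values in each column
na_total = na_per_column.sum()    # missing values in the whole DataFrame
print(na_per_column)
print(na_total)
```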

For quantitative (numeric) variables, the `.describe()` method gives summary statistics:
- count
- mean
- standard deviation
- minimum
- maximum
- quartiles


In [None]:
type_counts['number'].describe()
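To make the output easier to read, here is a sketch on a short made-up Series (the `numbers` values are hypothetical). The result of `.describe()` is itself a Series, so individual statistics can be looked up by label.

```python
import pandas as pd

# Hypothetical toy values
numbers = pd.Series([1, 2, 3, 4, 10])

summary = numbers.describe()
print(summary)

# Individual statistics are available by label
print(summary['mean'])   # 4.0
print(summary['50%'])    # 3.0 (the median)
```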

## Plotting Data

Visualizing data can sometimes make it easier to derive insights. Let's revisit our `type_counts` DataFrame and see how we can create a plot out of it.

In [None]:
type_counts.head()

We can start by making a basic bar chart using the `.plot` method.

In [None]:
type_counts.plot(kind = 'bar', 
                 x = 'art_type', 
                 y = 'number');

The dimensions of the figure can be adjusted by using the `figsize` argument.

In [None]:
type_counts.plot(kind = 'bar', 
                 x = 'art_type', 
                 y = 'number',
                 figsize = (9,3));

If you have a large number of categories or long category names, it sometimes makes sense to do a horizontal bar chart.

In [None]:
type_counts.plot(kind = 'barh', 
                 x = 'art_type', 
                 y = 'number');

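If we want to plot the data with the largest values on top, we can use the `sort_values` method to sort our DataFrame before plotting. A horizontal bar chart draws the first row at the bottom, so sorting in ascending order (the default) puts the largest value at the top.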

In [None]:
type_counts.sort_values('number').plot(kind = 'barh', 
                                       x = 'art_type', 
                                       y = 'number');
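The effect of the sort can be sketched on made-up counts (the `toy_counts` values are hypothetical). `sort_values` sorts in ascending order by default, so the largest value ends up in the last row, which the horizontal bar chart draws at the top. The axis label text is also just an illustration.

```python
import pandas as pd

# Hypothetical toy counts, deliberately out of order
toy_counts = pd.DataFrame({
    'art_type': ['Mural', 'Sculpture', 'Mosaic'],
    'number': [5, 9, 2]})

# Ascending sort: smallest first, largest last
ordered = toy_counts.sort_values('number')
print(ordered)

# .plot returns a matplotlib Axes, which can be customized further
ax = ordered.plot(kind='barh', x='art_type', y='number', legend=False)
ax.set_xlabel('number of works')
```

Passing `ascending = False` to `sort_values` would reverse the order and put the largest bar at the bottom.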

### Fancier horizontal barplot with seaborn

We can make a nicer looking plot using the _seaborn_ library. You can check out the seaborn plot gallery here: [https://seaborn.pydata.org/examples/index.html](https://seaborn.pydata.org/examples/index.html).


If you're not sure what a line in the following block is doing, try commenting it out by adding a `#` at the beginning of that line. Python will then ignore the line when it runs the block, and you can see how the plot changes.

In [None]:
plt.figure(figsize = (10, 6))                               # Increase the plot size to 10 x 6
sns.set(style="whitegrid")                                  # Change the plot style
sns.barplot(x = 'number', 
            y = 'art_type', 
            data = type_counts)
plt.xlabel('')                                              # Remove the x-axis label
plt.ylabel('')                                              # Remove the y-axis label
plt.title('Types of Public Art in Nashville');