Now that you can create your own **line charts**, it's time to learn about more chart types!  

> _By the way, if this is your first experience with writing code in Python, you should be **very proud** of all that you have accomplished so far, because it's never easy to learn a completely new skill!  If you stick with the course, you'll notice that everything will only get easier (while the charts you'll build will get more impressive!), since the code is pretty similar for all of the charts._

In this tutorial, you'll learn about **bar charts** and **heatmaps**.

# Set up the notebook

As always, we begin by setting up the coding environment.  (_This code is hidden, but you can un-hide and re-hide it by clicking on the "Code" button immediately below this text, on the right._)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")

# Select a dataset

In this tutorial, we'll work with a dataset from the US Department of Transportation that tracks flight delays and cancellations.

Opening the corresponding CSV file in Excel shows that each flight has a different ID number (corresponding to **column A** in the spreadsheet), and the various columns track detailed information about each flight.

<img src="images/tut2_flight_head.png">

In this tutorial, we'll work with only three of the columns:
- `'Month'` - the month of the departure date
- `'Airline'` - the airline code (_`AS` stands for Alaska Airlines, `AA` stands for American Airlines, etc_)
- `'Arrival delay'` - how many minutes late the flight arrived (_where negative values denote a flight that arrived early_)

# Load the data

As before, we load the dataset using the `pd.read_csv` command.

In [None]:
# Path of the file to read
flight_filepath = "../input/flights.csv"

# Read the file into a variable flight_data
flight_data = pd.read_csv(flight_filepath, index_col="Id")

You may notice that the code is slightly shorter than what we used in [the previous tutorial](add-link-here).  In this case, since the row labels (from the `'Id'` column) don't correspond to dates, we don't add `parse_dates=True` in the parentheses.  But, we keep the first two pieces of text as before, to provide both: 
- the filepath for the dataset (in this case, `flight_filepath`), and 
- the name of the column that will be used to index the rows (in this case, `index_col="Id"`). 
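
To see what `index_col` does in isolation, here is a minimal sketch that reads a tiny CSV from a string instead of from `flights.csv`. The three rows and their values are made up for illustration, and `toy_flights` is a hypothetical name; only the column names match the ones described above.

```python
import io
import pandas as pd

# A tiny, made-up stand-in for flights.csv (values are illustrative only)
csv_text = """Id,Month,Airline,Arrival delay
1,1,AS,-22.0
2,1,AA,-9.0
3,2,NK,5.0
"""

# index_col="Id" turns the 'Id' column into the row labels
toy_flights = pd.read_csv(io.StringIO(csv_text), index_col="Id")

print(toy_flights.index.name)     # name of the indexing column
print(list(toy_flights.columns))  # 'Id' is no longer an ordinary column
```

After loading, `'Id'` labels the rows and the remaining three columns hold the data.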

You'll also notice that some output appears below the code cell.  It may initially look like a scary error, but it's just a warning that can be ignored.  
> _In this case, the warning is telling us that while the dataset has loaded successfully, there are more efficient ways to load the CSV file than we've used here.  We'll ignore the warning and proceed, since the dataset loads pretty quickly nonetheless, and so heeding the warning would involve a lot of additional complexity with minimal payoff_.

# Examine the data

We'll use `.head()` to print the first five rows of the data.  Note that due to the large size of the dataset, not all of the columns are shown - but it's still enough information for us to verify that the dataset was correctly loaded.

In [None]:
# Print the first 5 rows of the data
flight_data.head()

# Bar chart, Part 1

It's time to create our first **bar chart**!  Say we'd like to see the number of flights from each airline.

As before, this can be done with a single line of code (_where customizing settings like the size and title of the figure involves some additional commands_).

In [None]:
# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Number of Flights, for Each Airline")

# Bar chart showing number of flights for each airline
sns.countplot(y=flight_data['Airline'])

# Remove the label for the horizontal axis
plt.xlabel("")

The commands for setting the size and title of the figure are familiar from the previous tutorial.  The horizontal label is specified as `""`, which effectively removes the horizontal label.  The code that creates the bar chart is new:
```python
# Bar chart showing number of flights for each airline
sns.countplot(y=flight_data['Airline'])
```
It has two main components:
- `sns.countplot` tells the notebook that we want to create a **_special_** kind of bar chart.  (More on that soon!)
- `y=flight_data['Airline']` selects the data that will be used to create the bar chart (in this case, the `'Airline'` column of `flight_data`).  
  - Using `y=` creates a **horizontal bar chart** (where the categories appear along the vertical axis).  
  - If you'd prefer to create a **vertical bar chart** (where the categories appear along the horizontal axis), you need only change `y=` to `x=`.
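
As a small sketch of the `x=` variant, the code below draws a vertical count plot from a short, made-up list of airline codes. `toy_airlines` is a hypothetical stand-in for `flight_data['Airline']`, so the counts are not real.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up airline codes standing in for flight_data['Airline']
toy_airlines = pd.Series(["AA", "AS", "AA", "NK", "AA", "AS"], name="Airline")

plt.figure(figsize=(6, 4))

# x= puts the categories on the horizontal axis, so the bars are vertical
ax = sns.countplot(x=toy_airlines)

# Show which label seaborn placed on each axis
print(ax.get_xlabel(), "|", ax.get_ylabel())
```

With `x=`, the category name labels the horizontal axis and the counts run along the vertical axis.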

So, why is this a **_special_** bar chart?  To see this, recall that it only uses the `'Airline'` column of `flight_data`.  

<img src="images/tut2_airline_column.png">

Notice that this column doesn't have any numerical values!  And that's why this bar chart is **_special_** -- because the `sns.countplot` command does all of the work of **_counting_** the number of entries for each airline for us.  

And, in the case that our dataset **_did_** explicitly contain a numerical value for the height of each bar, we'd use a different command to create the bar chart.  This is the next command we'll cover!
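
To make the counting step concrete, the sketch below computes the same kind of per-airline counts by hand with the pandas method `value_counts()`. The airline codes are made up, and `toy_airlines` is a hypothetical name standing in for `flight_data['Airline']`.

```python
import pandas as pd

# Made-up airline codes standing in for flight_data['Airline']
toy_airlines = pd.Series(["AA", "AS", "AA", "NK", "AA", "AS"], name="Airline")

# Count how many times each airline code appears
airline_counts = toy_airlines.value_counts()

print(airline_counts)
```

These are the numbers that a count plot of the same column would show as bar lengths.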

# Load and examine a *slightly* different dataset

For the next charts, we'll work with a different CSV file.  This new file is the result of some quick data munging, where we've taken the relatively "raw", extremely detailed information in `flight_data` and rearranged it to a table with only the information that we need to build the next visualizations.

Opening this new CSV file in Excel shows a row for each month (where `1` = January, `2` = February, etc) and a column for each airline code.

<img src="images/tut2_flight_delay_head.png">

Each entry shows the average arrival delay (in minutes) for a different airline and month (all in year 2015).  Negative entries denote flights that (_on average_) tended to arrive early.  For instance, the average American Airlines flight (_airline code: **AA**_) in January arrived roughly 7 minutes late, and the average Alaska Airlines flight (_airline code: **AS**_) in April arrived roughly 3 minutes early.
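
This tutorial does not show how the new file was produced. As one plausible way to get from per-flight rows to a month-by-airline table, the sketch below uses `pivot_table` on a small made-up DataFrame. The names `raw` and `monthly_delays` and all of the values are assumptions for illustration.

```python
import pandas as pd

# Made-up per-flight records (one row per flight)
raw = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "Airline": ["AA", "AA", "AS", "AA", "AS", "AS"],
    "Arrival delay": [10.0, 4.0, -3.0, 6.0, -1.0, -5.0],
})

# One row per month, one column per airline, each cell the mean delay
monthly_delays = raw.pivot_table(
    index="Month", columns="Airline", values="Arrival delay", aggfunc="mean"
)

print(monthly_delays)
```

Each cell of `monthly_delays` is the average of the matching flights, so negative cells again mean that flights tended to arrive early.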

Below, we use `pd.read_csv` to load the data and `.head()` to check that it loaded properly.

In [None]:
# Path of the file to read
flight_delay_filepath = "../input/flight_delays.csv"

# Read the file into a variable flight_delay_data
flight_delay_data = pd.read_csv(flight_delay_filepath, index_col="Month")

# Print the first five rows of the data
flight_delay_data.head()

# Bar chart, Part 2

In this section, we'll use a different command to create a bar chart.  Say we'd like to plot the average arrival delay for Spirit Airlines (_airline code: **NK**_) flights, by month.  

In [None]:
# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_delay_data.index, y=flight_delay_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")

The relevant code to create the bar chart is as follows:
```python
# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_delay_data.index, y=flight_delay_data['NK'])
```
Note that `flight_delay_data` explicitly contains the height of each bar in the bar chart.  For instance, the height of the bar for January (or month `1`) is 11.398054, which can be read directly from the table.  In this case, we use `sns.barplot` to build the bar chart.  We also provide two additional pieces of information to customize the behavior:
- `x=flight_delay_data.index` - This determines what to use on the horizontal axis.  In this case, we have selected the column that **_index_**es the rows (in this case, the column containing the months).
- `y=flight_delay_data['NK']` - This sets the column in the data that will be used to determine the height of each bar.  In this case, we select the `'NK'` column.
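
As a quick check that the bar heights come straight from the table, the sketch below draws the same kind of chart from a three-row, made-up table and reads the heights back from the plot. `toy_delays` and its values are hypothetical, not taken from `flight_delays.csv`.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up monthly averages for one airline, indexed by month
toy_delays = pd.DataFrame(
    {"NK": [11.0, 5.5, 2.0]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(6, 4))

# Same call pattern as above: index on the x-axis, one column for the heights
ax = sns.barplot(x=toy_delays.index, y=toy_delays["NK"])

# Read the height of each drawn bar back from the axes
bar_heights = [bar.get_height() for bar in ax.patches]
print(bar_heights)
```

The heights match the `'NK'` column because there is exactly one value per month, so nothing is left to aggregate.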

> **Important Note**: You must select the indexing column with `flight_delay_data.index`; using `flight_delay_data['Month']` does not work (_it raises a `KeyError`_).  This is because when we loaded the dataset, the `"Month"` column was used to index the rows, so it is no longer an ordinary column.  **We always have to use this special notation to select the indexing column.**
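
The behavior described in the note can be checked on a tiny made-up table. In the sketch below, `toy_delays` is a hypothetical stand-in for `flight_delay_data`, and the values are invented. It also shows `reset_index()`, a pandas method that returns a copy in which the index is an ordinary column again, in case you ever need to select the months by name.

```python
import io
import pandas as pd

# Made-up stand-in for flight_delays.csv
csv_text = """Month,AA,NK
1,6.9,11.4
2,7.5,5.6
"""
toy_delays = pd.read_csv(io.StringIO(csv_text), index_col="Month")

# 'Month' is now the index, so looking it up as a column fails
try:
    toy_delays["Month"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)
print(list(toy_delays.index))  # the months are available through .index

# reset_index() returns a copy where 'Month' is an ordinary column again
months_as_column = toy_delays.reset_index()["Month"]
print(list(months_as_column))
```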

# Heatmap

We have one more plot type to learn about: **heatmaps**! 

In the code cell below, we create a heatmap to quickly visualize patterns in `flight_delay_data`.  Each cell is color-coded according to its corresponding value.

In [None]:
# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_delay_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

The relevant code to create the heatmap is as follows:
```python
# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_delay_data, annot=True)
```
This code uses the `sns.heatmap` command to build the heatmap, along with a few additional pieces of information:
- `data=flight_delay_data` - This tells the notebook to use all of the entries in the data to create the heatmap.
- `annot=True` - This ensures that the values for each cell appear on the chart.  (_Leaving this out removes the numbers from each of the cells!_)
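
To see the effect of `annot=True` side by side, the sketch below draws the same small, made-up table twice, once without and once with annotations. `toy_delays` and its values are hypothetical; the `ax=` argument is only used here to place the two heatmaps next to each other.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up average delays: two months, three airlines
toy_delays = pd.DataFrame(
    {"AA": [7.0, 1.5], "AS": [-3.0, 0.5], "NK": [11.0, 5.5]},
    index=pd.Index([1, 2], name="Month"),
)

fig, (ax_plain, ax_annot) = plt.subplots(1, 2, figsize=(10, 3))

# Left: colors only
sns.heatmap(data=toy_delays, ax=ax_plain)

# Right: colors plus the value written in each cell
sns.heatmap(data=toy_delays, annot=True, ax=ax_annot)

# Number of text annotations drawn on each heatmap
print(len(ax_plain.texts), len(ax_annot.texts))
```

The annotated heatmap carries one text label per cell, while the plain one carries none.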

# What's next?

write later.