## Creating charts with Altair

Welcome! This notebook will walk you through creating charts with the Python library Altair.

We'll use pandas and NumPy to work with our datasets, but you don't need to know too much about them. Just run the cells that contain code, and the variables they define will be available to you.

Try to use the [Altair documentation](LINK) or Google to solve the chart prompts. If you get stuck, solutions are in the `fun-with-altair-SOLUTIONS.ipynb` file.

In [None]:
import pandas as pd
import numpy as np
import altair as alt

We're going to import our first dataset. It's a file with data on internet access in Vermont households between 2013 and 2017.

In [None]:
internet_vt = pd.read_csv("data/internet-vt-5yr.csv")
internet_vt.head()

Great.

Now that we have that in place, let's make a line chart that looks at the `year` on the x-axis and the number of households that report `no-internet` access on the y-axis.

The x-axis numbering probably looks a bit weird, so read up on specifying data types in the Altair documentation and see if you can fix that.

Next, we're going to read in a file with a subset of that data, so we can focus in on the households without internet. While we're at it, let's also recast the year column to datetime format.

In [None]:
nointernet = pd.read_csv("data/internet-vt-pct-5yr.csv")
nointernet["year"] = pd.to_datetime(
    nointernet["year"], format="%Y"
)
nointernet.head()

Create an area chart that shows the households with `no-internet` by year.

That's interesting, but we don't know anything about how the *proportion* of households changed over those years. Luckily, there's a column for that. Change the y-axis to `no-internet-pct`.

That's great, but what does it look like in context? For that, we'll need to reformat our first `internet_vt` dataset. We're just changing it from wide (each category gets a column) to long (there's a category column and a value column). That means there are multiple rows per year. 
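To see what wide-to-long means in practice, here is a small sketch on a made-up table (the `wide` DataFrame and its `a` and `b` columns are hypothetical). With only `id_vars` given, `pd.melt` names the two new columns `variable` and `value` by default.

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per category.
wide = pd.DataFrame({
    "year": [2013, 2014],
    "a": [1, 2],
    "b": [3, 4],
})

# Long table: one row per (year, category) pair.
long_df = pd.melt(wide, id_vars=["year"])
print(long_df)
```

Two years and two categories give four rows in the long table.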

In [None]:
internet_vt_long = pd.melt(internet_vt,
                           id_vars=["year"])
internet_vt_long.head()

Now we can create a chart that shows us all three categories. Create an area chart using the `internet_vt_long` dataset. How do you get it to show the different categories separately for each year?

Bonus: Figure out how to create a tooltip so when you hover over each area, it tells you the year, category and value.

You're making good progress! Let's load in a new dataset and run through a couple other chart types quickly.

In [None]:
rural_internet = pd.read_csv("data/internetrural-vt-county-2017.csv")
rural_internet.head()

Create a bar chart using `county-name` and `no-internet-pct`.

Cool. How about a scatterplot comparing `no-internet-pct` and `rural-pct`? This would be a good time to practice those tooltip skills.

Now that you've got that down, save each of those charts to a variable and put them into one chart. Try concatenating them vertically.

Almost done, but there's one more cool thing we should cover: maps! Altair's map handling is still relatively new and is the most difficult part of the library to use, since it involves GeoJSON files and map projections. But it's fun to use!

For that, let's load in a national version of this data. We'll look at the percentage of households without internet access in every county across the nation in 2017.

In [None]:
internet_natl = pd.read_csv("data/internet-natl-county-2017.csv")
internet_natl.head()

Now, use the `no-internet-pct` column and the `data/us-10m.json` dataset to create a national map of internet access. You'll need the `.transform_lookup` method to merge the two datasets, matching the `id` column in the JSON file with the `code` column in the `internet_natl` DataFrame.

Good luck!