# Exploring a Dataset

Now that we have seen the basic pieces of Altair's API, it's time to practice using it to explore a new dataset.
With your partner, choose one of the four datasets described below.

As you explore the data, recall the building blocks we've discussed:

- various marks: ``mark_point()``, ``mark_line()``, ``mark_tick()``, ``mark_bar()``, ``mark_area()``, ``mark_rect()``, etc.
- various encodings: ``x``, ``y``, ``color``, ``shape``, ``size``, ``row``, ``column``, ``text``, ``tooltip``, etc.
- binning and aggregations: see the [list of available aggregations](https://altair-viz.github.io/user_guide/encoding.html#binning-and-aggregation) in Altair's documentation
- layering and concatenation (``alt.layer`` <-> ``+``, ``alt.hconcat`` <-> ``|``, ``alt.vconcat`` <-> ``&``)

Start simple and build from there. Which encodings work best with quantitative data? With categorical data?
What can you learn about your dataset using these tools?

We'll set aside about 20 minutes for you to work on this with your partner.

In [None]:
import altair as alt
from vega_datasets import data

## Seattle Weather

This dataset records daily precipitation, temperature range (daily minimum and maximum), wind speed, and weather type in Seattle from 2012 through 2015.

In [None]:
weather = data.seattle_weather()
weather.head()

## Gapminder

This data consists of population, fertility, and life expectancy over time in a number of countries around the world.

Note that, while you may be tempted to use a temporal encoding for the year, here the year is simply a number, not a date stamp, so a temporal encoding is not the best choice; an ordinal (``:O``) or quantitative (``:Q``) encoding is a better fit.

In [None]:
gapminder = data.gapminder()
gapminder.head()

## Population

This dataset contains the US population, broken down by age group and sex, for every decade from 1850 to 2000.

As with the Gapminder data, the year here is simply a number, not a date stamp, so a temporal encoding is not the best choice; prefer an ordinal (``:O``) or quantitative (``:Q``) encoding.

In [None]:
population = data.population()
population.head()

## Movies

The movies dataset has data on roughly 3,200 movies, including release date, budget, and ratings on IMDb and Rotten Tomatoes.

In [None]:
movies = data.movies()
movies.head()