# Example of an interactive chart 

In this notebook we will be using three tools, or `libraries`:
1. `pandas` for data analytics
2. `altair` for data visualization
3. `ipywidgets` for a dropdown menu widget to interact with our chart.

Check out their documentation pages to learn more about them:
* [pandas docs](https://pandas.pydata.org/pandas-docs/stable/) This one can be dense; I suggest checking out the `Pandas Cookbook` textbook included in this repo.
* [altair](https://altair-viz.github.io) *Altair is a declarative statistical visualization library for Python*
* [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Basics.html) *Widgets are eventful python objects that have a representation in the browser, often as a control like a slider, textbox, etc.*

***
If you ever want to explore a library on the go, you can use `jupyter`'s auto-completion feature to see what functions, classes, and other attributes the library has.

In the cell below, import pandas. Then, in another cell, type `pandas.` and press `Tab` (auto-completion only knows about `pandas` after you have run `import pandas`).

In [None]:
import pandas

In [None]:
pandas.read_csv

***
With the dropdown menu open, start typing `read_csv`.

To explore what this function does, add a `?` at the end and run the cell. This brings up its `documentation` (similar to calling `help(pandas.read_csv)`).

If a cell's output is too long, right-click on it and choose the option to *enable scrolling output*.

In [None]:
pandas.read_csv?
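Outside a notebook (in a plain `.py` script, say) the `?` shortcut isn't available. Python's built-in `help()` prints the same docstring, and the standard-library `inspect` module can list which parameters a function accepts. A small sketch:

```python
import inspect

import pandas

# The parameter names read_csv accepts (there are many).
params = inspect.signature(pandas.read_csv).parameters
print(list(params)[:5])

# Two parameters this notebook uses.
print('filepath_or_buffer' in params)  # True
print('parse_dates' in params)         # True

# help(pandas.read_csv) prints the full docstring, like `pandas.read_csv?`
```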

***
***
Here is an example of how one could write a blog piece in `jupyter`.<br>
***

# Immigration to California has steadily declined since 2000

Draft: May 3rd, 2018

Author: Sergio Sanchez

Notes: 300-400 words. 2 charts (already selected).

***
Main points: 
1. Overall immigration to California has declined drastically since 2000
2. The educational attainment of newly arrived immigrants (those who arrived 5 years ago or less) has steadily increased over time.
3. About half of newly arrived immigrants in 2016 had completed at least a Bachelor's degree. 
4. Asia surpassed Latin America as the #1 region of origin of recently arrived immigrants.

*** 
**Set Up**

We need to import the libraries we will be using.

If you don't have these libraries installed yet, you can easily install them using `pip`! <br>
For example, if you don't have `altair` installed, run `!pip install altair` in a code cell and it will install the package for you. (In recent versions of Jupyter, `%pip install altair` is the safer choice, because it installs into the same environment the notebook's kernel is running in.) Try running the code below, and install any libraries you don't have yet.

In [None]:
import pandas as pd
import altair as alt
from ipywidgets import interact

We will load the csv 'immigrants blog post (clean).csv' into a `pandas` dataframe (`df`). <br> The `parse_dates` *argument* tells `pandas` to interpret the column `'year'` as a date. In general it is optional, but this notebook relies on it: the `.dt.year` step inside `bar_chart()` below only works on a date column. Interpreting a column as a date (when it is a date) *unlocks* many `pandas` features that only make sense when you're working with dates, such as comparing against a date or pulling out just the year.
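To see what `parse_dates` changes, here is a sketch that reads a tiny made-up CSV (the column names mirror the notebook's, but the values are invented) both with and without it:

```python
import io

import pandas as pd

# Made-up stand-in for the real file; only the column names are borrowed.
csv_text = """year,bpld,perwt
2000-01-01,China,120
2005-01-01,Mexico,300
2010-01-01,China,150
"""

raw = pd.read_csv(io.StringIO(csv_text))                         # 'year' stays text
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['year'])  # 'year' becomes a date

print(raw['year'].dtype)
print(parsed['year'].dtype)

# The .dt accessor only works on date columns.
print(parsed['year'].dt.year.tolist())  # [2000, 2005, 2010]
try:
    raw['year'].dt.year
except AttributeError as err:
    print('AttributeError:', err)
```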

In [None]:
df = pd.read_csv('immigrants blog post (clean).csv', parse_dates = ['year'], )

In the cell below we create a function that takes one *argument*, `country`, and uses it to pick which country to chart. We could also give that argument a default value, like this:
```python
def bar_chart(country = 'China'):
```
It's not *necessary* but it doesn't hurt. <br>
Run the cell below to create the function and the next cell to use it.

In [None]:
def bar_chart(country):
    '''
    Create an altair bar chart for one country of birth.
    Expects `country`, a value from the 'bpld' column (e.g. 'China').
    '''
    # keep rows after 1990-01-01; .copy() so changing `data` below leaves df alone
    data = df[df['year'] > '1990-01-01'].copy()
    # replace each date with just its year (e.g. 2000-01-01 -> 2000)
    data['year'] = data['year'].dt.year
    
    chart = alt.Chart(data[data['bpld'] == country]).mark_bar().encode(
        x = 'year:O',
        y = 'perwt',
    ).properties(
        title = '{country_title}'.format(country_title = country),
        width = 500,
        height = 300,
    )
    return chart
    
# countries of birth whose summed 'perwt' is at least 5,000
top_countries = df.groupby('bpld')['perwt'].sum().to_frame().reset_index().copy()
top_countries = top_countries[top_countries['perwt'] >= 5000]
top_countries = [country for country in top_countries['bpld']]
top_countries.pop(0) # this gets rid of 'Abroad, n.s.' aka the first country in our list

In [None]:
# Try typing the name of other countries! (Make sure it's capitalized)
bar_chart(country = 'China')

***
Let's explore this function to make more sense of it.

The first part cleans the data a little.
```python
data = df[df['year'] > '1990-01-01'].copy() 
```
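This says *from your dataframe `df`, grab the rows where the column 'year' is later than 1990-01-01. Make a `.copy()` of those rows and save it to the variable `data`.* Because `pandas` understands that 'year' is a date, it can make sense of what `> '1990-01-01'` means. The `.copy()` matters because the next line changes `data`; without it, `pandas` may warn that you are modifying a slice of `df` (a `SettingWithCopyWarning`).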
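Here is the same filter-then-copy pattern on three made-up rows. The names `toy_df` and `toy_data` are hypothetical, chosen so that copying this into the notebook won't overwrite the real `df`:

```python
import pandas as pd

# Made-up rows; 'year' is already a date column, as after parse_dates=['year'].
toy_df = pd.DataFrame({
    'year': pd.to_datetime(['1985-01-01', '1995-01-01', '2005-01-01']),
    'bpld': ['China', 'Mexico', 'China'],
    'perwt': [100, 250, 400],
})

mask = toy_df['year'] > '1990-01-01'  # the string is compared as a date
print(mask.tolist())                  # [False, True, True]

toy_data = toy_df[mask].copy()                 # an independent copy of the matching rows
toy_data['year'] = toy_data['year'].dt.year    # changes toy_data only, not toy_df
print(toy_data['year'].tolist())               # [1995, 2005]
```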
```python
data['year'] = data['year'].dt.year
```
This says *from your dataframe `data`, grab the column 'year'. `.dt` gives access to the parts of each date in a `datetime` column, and `.year` grabs the year of each date. Now save that back to the column 'year' of `data`.* This overwrites the column `'year'` with just the year of each date: before this line a value looked like `2000-01-01`, and afterwards it is the integer `2000`.
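`.year` is only one of the date parts `.dt` exposes. A short sketch on two made-up dates, which also shows that the result is plain integers rather than dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2000-01-01', '2016-07-04']))

# .dt exposes the parts of each date
print(dates.dt.year.tolist())   # [2000, 2016]
print(dates.dt.month.tolist())  # [1, 7]

# After the conversion the values are integers, not dates
years = dates.dt.year
print(pd.api.types.is_integer_dtype(years))  # True
```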

The next part of the function creates the `bar_chart` we want.
```python
alt.Chart(data[data['bpld'] == country])
```
means *from that `data` dataframe, grab only the rows where `'bpld'` (the birthplace column) equals `country`, which is whatever value we pass to `bar_chart()`, such as `'China'`. With that subset of the dataframe, create an `altair.Chart()`.*
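The `data[data['bpld'] == country]` part is ordinary `pandas` boolean selection: the comparison produces one `True`/`False` per row, and indexing with it keeps the `True` rows. A sketch with made-up rows (`toy_data` is a hypothetical name); the last line shows why the country name has to be capitalized exactly:

```python
import pandas as pd

# Made-up rows, with 'year' already converted to plain integers
toy_data = pd.DataFrame({
    'year': [2000, 2000, 2010, 2010],
    'bpld': ['China', 'Mexico', 'China', 'India'],
    'perwt': [100, 300, 150, 80],
})

country = 'China'
is_country = toy_data['bpld'] == country  # one True/False per row
subset = toy_data[is_country]             # keep only the True rows
print(subset)

# The match is case-sensitive, so 'china' selects nothing
print(toy_data[toy_data['bpld'] == 'china'].empty)  # True
```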

```python
alt.Chart(data[data['bpld'] == country]).mark_bar()
```
means the `marks` in this chart will be bars.
```python
alt.Chart(data[data['bpld'] == country]).mark_bar().encode(
    x = 'year:O',
    y = 'perwt',
)
```
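means *encode the value of 'year' on the X-axis and the value of 'perwt' on the Y-axis.* 'perwt' is the person weight from the original ACS dataset: the number of people in the population that each survey record represents. You may wonder why there is a '`:O`' after 'year'. It tells `altair` to treat the years as **ordinal** values (separate categories that have an order). Try running the function without the '`:O`'. It should still run, but because the years are now plain integers, `altair` will treat them as a continuous quantity and the chart will look a little different.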
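Since each row's 'perwt' stands for that many people, the estimated number of people in a year is the sum of 'perwt' over that year's rows. A minimal `pandas` sketch of that arithmetic on made-up rows; in `altair` you can request the same total explicitly with `y = 'sum(perwt)'`:

```python
import pandas as pd

# Made-up survey rows: each 'perwt' is how many people that record represents
rows = pd.DataFrame({
    'year': [2000, 2000, 2010],
    'bpld': ['China', 'China', 'China'],
    'perwt': [120, 80, 150],
})

people_per_year = rows.groupby('year')['perwt'].sum()
print(people_per_year.tolist())  # [200, 150]
```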

```python
alt.Chart(data[data['bpld'] == country]).mark_bar().encode(
    x = 'year:O',
    y = 'perwt',
).properties(
    title = '{country_title}'.format(country_title = country),
    width = 500,
    height = 300,
)
```
means *the `properties` for that Chart you just created are these: the title is whatever the value of the variable `country` is; the width is 500 pixels; the height is 300 pixels.* `Altair` is a very customizable library. [Read the docs](https://altair-viz.github.io) to explore what other properties you can set.
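The `.format()` call in the title just places the value of `country` into the string. With no other text around it, the result is simply `country` itself, and an f-string does the same thing. A quick check (the longer title on the last line is only an illustration):

```python
country = 'China'

title_format = '{country_title}'.format(country_title=country)
title_fstring = f'{country}'  # same result with an f-string (Python 3.6+)

print(title_format == country)   # True
print(title_fstring == country)  # True

# A template is more useful when it adds text around the value:
print(f'Immigrants born in {country}')  # Immigrants born in China
```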

```python
top_countries = df.groupby('bpld')['perwt'].sum().to_frame().reset_index().copy()
top_countries = top_countries[top_countries['perwt'] >= 5000]
top_countries = [country for country in top_countries['bpld']]
top_countries.pop(0) # this gets rid of 'Abroad, n.s.' aka the first country in our list
```
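This is just more data cleaning. We are creating a `list` of `top_countries`, which we define as the countries of birth with at least 5,000 people (the summed `perwt`) coming to California since the year 2000. Note that `.pop(0)` removes whatever is first in the list; it works here because `groupby` sorts the country names alphabetically and `'Abroad, n.s.'` comes first.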
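Here are the same steps on a few made-up rows. As one alternative to `.pop(0)`, this sketch drops `'Abroad, n.s.'` by name, so the result doesn't depend on that label sorting first. The names `toy_df` and `toy_top_countries` are hypothetical, so they won't clobber the notebook's variables:

```python
import pandas as pd

# Made-up rows; 'Abroad, n.s.' mimics the "not specified" label in the real data
toy_df = pd.DataFrame({
    'bpld': ['Abroad, n.s.', 'China', 'China', 'Mexico', 'Sweden'],
    'perwt': [6000, 3000, 2500, 9000, 400],
})

totals = toy_df.groupby('bpld')['perwt'].sum()  # people per country of birth
big = totals[totals >= 5000]                    # keep totals of at least 5,000
toy_top_countries = [c for c in big.index if c != 'Abroad, n.s.']  # drop by name
print(toy_top_countries)  # ['China', 'Mexico']
```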

***
### Build on that Chart!

1. Add `color`
  - You can `.encode()` a third value on your Chart. Add `color = alt.Color('agg educd',),` after `y = 'perwt',` and see what happens.
2. Make it `.interactive()`
  - Making Charts interactive is very easy in `altair`. Just add `.interactive()` after `.properties()`.
3. Make the axes more readable.
  - Use 
  ```python 
    x = alt.X('year:O', axis = alt.Axis(title = 'Year')),
    y = alt.Y('perwt:Q', axis = alt.Axis(title = 'Number of People')),
    color = alt.Color('agg educd',),
  ``` 
  in your `.encode()` instead of the code you have in there.
***
**Bonus** 

Try changing `.mark_bar(...` to `.mark_circle(...` and see what happens. Then try these (some are more useful than others):
  - `.mark_line(`
  - `.mark_square(`
  - `.mark_text(`

***
Now, instead of typing in the name of every country you want to explore, wouldn't it be nice to choose from a list of countries? We just created one (`top_countries`).

Below you can create a dropdown menu to `interact` with a function. <br>
The syntax is as follows:
```python
interact(function, variable_to_feed_that_function = list_of_values_that_variable_can_take)
```
Our function is `bar_chart`, which takes in a variable `country` (just the name of a country). For the list of values, you could write a list like this:
```python
['Brazil', 'Colombia', 'Sweden', 'Mexico']
```
In `python`, square brackets `[]` create a `list`.

However, you already created a list `top_countries`. Run the cell below to see all the values it contains.

In [None]:
top_countries

You can feed that `top_countries` list to the `interact()` function. <br> 
Try it below!

In [None]:
interact(bar_chart, country = top_countries,);