# Tutorial 11-01: Getting Started with Charts

Our colleagues at GeoNinjas PythonAnalytics have been working with census data and are asking if we can help them create some charts.  They're interested in the number of housing units per county in each state.  They'd like to see a bar chart showing the number of housing units in each county in descending order (starting with the highest number).

## Gather Data for Charting

#### 1.  Log into ArcGIS Online

To get housing unit counts, we can turn to census data.  For our purposes, we can use the data that Esri curates in the Living Atlas.  This data is derived from publicly available census data, so you could get it elsewhere if you needed to.  First, log into ArcGIS Online so you can find a census layer.

In [None]:
# import the arcgis package
import arcgis

# connect to ArcGIS Online using the active sign-in ("home" works in ArcGIS Notebooks)
gis = arcgis.GIS("home")

# display my user information
gis.users.me

#### 2.  Identify an item for census housing data.

Now you'll get an item for census housing data in the Living Atlas.  Referencing an item by its item ID is generally the most repeatable and consistent way to access a specific item.

In [None]:
item_2020_census_housing = gis.content.get('81d9e89b8b574a649ff6e14f61c8494f')
item_2020_census_housing

#### 3.  Identify the data layer of interest.

This item has many layers, so you'll need to identify the county layer specifically.  You could loop through the item's `.layers` property and check each layer in turn.  In the example below, though, you'll use a list comprehension along with `enumerate`, a built-in Python function that pairs each element of an iterable with its index.

In [None]:
[
    (i, lyr.properties.name) for i, lyr 
    in enumerate(item_2020_census_housing.layers)
]

In this list, you can see that the county layer is at index **2**.  We'll use that going forward.

In [None]:
lyr_counties = item_2020_census_housing.layers[2]
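Hard-coding index 2 works as long as the item's layer order never changes.  If you'd rather look the layer up by name, the same `enumerate` pattern can do it.  The sketch below uses a hypothetical helper, `find_layer_index`, and a made-up list of names standing in for the `lyr.properties.name` values; check the real names printed earlier before relying on a keyword match.

```python
def find_layer_index(layer_names, keyword):
    """Return the index of the first name containing keyword (case-insensitive).

    Hypothetical helper: layer_names stands in for
    [lyr.properties.name for lyr in item.layers].
    """
    for i, name in enumerate(layer_names):
        if keyword.lower() in name.lower():
            return i
    raise ValueError(f"No layer name contains {keyword!r}")


# Made-up layer names for illustration only
sample_layer_names = ["Country", "State", "County", "Tract"]
print(find_layer_index(sample_layer_names, "county"))  # 2
```

In the tutorial, you could pass `[lyr.properties.name for lyr in item_2020_census_housing.layers]` as the first argument and use the result to index `item_2020_census_housing.layers`.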

#### 4. Query data for a single state.

Now that you've identified the county layer, you can pick a sample state to start with.  In this example, we'll use California as a starting point.  

You can use the `.query()` method on the counties layer object.  This method has many parameters worth exploring (using the ? functionality explained in the **Jupyter Notebook** chapter).  In this case, you can use the `where` parameter to specify which state to query (via an attribute expression), the `as_df` parameter to indicate that you'd like the data returned as a DataFrame, and the `return_geometry` parameter to indicate whether you'd like the operation to return geometry with each feature.

In [None]:
df_california = lyr_counties.query(
    where = "STATE = 'California'",
    as_df = True,
    return_geometry = False
)

TIP - You may be wondering why a book focusing on Python in GIS would choose not to return spatial data.  In this case, since we're not doing any geospatial processing, we don't need the geometry.  For a polygon layer like this one, geometries can take up a lot of space, so excluding them from the query can speed it up.
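The `where` parameter takes a SQL-style attribute expression in which string values are wrapped in single quotes.  When you build that expression from a variable, as you'll do later for every state, a single quote inside the value would end the string early.  The usual SQL convention is to double any embedded single quote.  The helper below, `equals_clause`, is a hypothetical sketch of that idea; it is not part of the `arcgis` package.

```python
def equals_clause(field, value):
    """Build a SQL-style equality expression for a string value.

    Hypothetical helper: single quotes inside the value are doubled,
    which is the standard SQL way to escape them in a string literal.
    """
    escaped = str(value).replace("'", "''")
    return f"{field} = '{escaped}'"


print(equals_clause("STATE", "California"))  # STATE = 'California'
```

You could then pass `where=equals_clause("STATE", "California")` to `.query()`.  U.S. state names don't contain apostrophes, so this matters more for values such as county names (for example, O'Brien County).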

#### 5.  Use field aliases for column names

If you look at the column names in the DataFrame, you might notice that they're not very human-readable.  You can check this by running `df_california.columns` if you'd like.  For better readability, you can replace the column names with the layer's field aliases.

In [None]:
# get the field names and aliases from the layer
field_aliases = {f['name']: f['alias'] 
                  for f in lyr_counties.properties.fields}

# rename the DataFrame columns to use the field aliases
df_california = df_california.rename(columns=field_aliases)
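`DataFrame.rename(columns=...)` maps old column names to new ones and, by default, leaves any column that isn't in the dictionary unchanged and ignores dictionary keys that don't match a column.  That's why passing every field alias from the layer is safe.  Here's a small self-contained sketch with made-up field names and aliases (the `sample_` names are used so nothing in the tutorial gets overwritten):

```python
import pandas as pd

# Made-up field definitions standing in for lyr_counties.properties.fields
sample_fields = [
    {"name": "NAME", "alias": "Name"},
    {"name": "HU_TOTAL", "alias": "Total Housing Units"},
    {"name": "EXTRA_FIELD", "alias": "Not in the query result"},
]
sample_aliases = {f["name"]: f["alias"] for f in sample_fields}

sample_df = pd.DataFrame({
    "OBJECTID": [1, 2],
    "NAME": ["County A", "County B"],
    "HU_TOTAL": [500, 1200],
})

# Matching columns are renamed; OBJECTID has no alias here, so it stays as-is,
# and the unused EXTRA_FIELD key is ignored (errors="ignore" is the default)
sample_df = sample_df.rename(columns=sample_aliases)
print(list(sample_df.columns))  # ['OBJECTID', 'Name', 'Total Housing Units']
```

If two fields happened to share an alias, the renamed DataFrame would end up with duplicate column names; `sample_df.columns.is_unique` is a quick way to check.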

## Get started with charts.

#### 1.  Plot all the data for a DataFrame.

The pandas package makes it easy to start charting with a DataFrame.  If you call the `.plot()` method with no arguments, pandas draws a line chart with one line for each numeric column, using the DataFrame's index as the x-axis.

In [None]:
df_california.plot()

This isn't very helpful, because our data is categorical: each row is a county, and connecting counties with lines suggests a continuity that doesn't exist.  We can specify that we want a bar chart instead, and we can also specify which columns to use for the axes.
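To see exactly what the default call does, here's a sketch on a tiny made-up DataFrame: with no `kind` argument, pandas draws one line per numeric column and skips text columns such as county names.

```python
import pandas as pd

# Made-up values for illustration only
sample_df = pd.DataFrame({
    "Name": ["County A", "County B", "County C"],
    "Total Housing Units": [500, 1200, 800],
    "Vacant Housing Units": [50, 90, 60],
})

sample_ax = sample_df.plot()  # the default kind is "line"

# One line per numeric column; the text column "Name" is not plotted
print(len(sample_ax.get_lines()))  # 2
```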

#### 2.  Add parameters to the plot method

Adding parameters to the `.plot()` method will help focus your charting efforts on the data you're interested in.  In this case, you can specify that you would like a bar chart by using the `kind` parameter and specify the columns you'd like to use as the `x` and `y` axes.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
)

This looks better, but recall that our colleagues asked for the counties in descending order, with the largest values on the left.  Let's do that now.

#### 3.  Sort the values and recreate the chart.

Now you'll sort the values in the DataFrame and recreate the chart.  First, you'll use a built-in pandas method for sorting the values.  You want to sort the values in descending order by the "Total Housing Units" column.

In [None]:
df_california = df_california.sort_values(
    'Total Housing Units', 
    ascending=False
    )

df_california[['Name','Total Housing Units']].head()
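`sort_values` returns a new, sorted DataFrame (which is why the result is assigned back to `df_california`), and it keeps the original index labels rather than renumbering the rows.  If you only need the top few counties, `nlargest` returns the same rows as sorting in descending order and taking `head()`, at least when there are no ties.  A sketch with made-up values:

```python
import pandas as pd

# Made-up values for illustration only (no ties)
sample_df = pd.DataFrame({
    "Name": ["County A", "County B", "County C", "County D"],
    "Total Housing Units": [500, 1200, 800, 300],
})

sorted_df = sample_df.sort_values("Total Housing Units", ascending=False)
print(sorted_df["Name"].tolist())  # ['County B', 'County C', 'County A', 'County D']
print(sorted_df.index.tolist())    # [1, 2, 0, 3]  (original labels are kept)

# The same top rows via nlargest
top_two = sample_df.nlargest(2, "Total Housing Units")
print(top_two["Name"].tolist())    # ['County B', 'County C']
```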

Now that you've sorted the values, let's try creating that last chart again.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
)
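The bars are drawn in the DataFrame's row order, which is why sorting first is enough to put the largest county on the left.  pandas also offers `df.plot.bar(x=..., y=...)` as an equivalent spelling of `df.plot(kind='bar', ...)`.  A quick check with made-up values; the canvas is drawn first so the tick labels are populated before they're read:

```python
import pandas as pd

# Made-up values for illustration only
sample_df = pd.DataFrame({
    "Name": ["County A", "County B", "County C"],
    "Total Housing Units": [500, 1200, 800],
}).sort_values("Total Housing Units", ascending=False)

sample_ax = sample_df.plot.bar(x="Name", y="Total Housing Units")
sample_ax.figure.canvas.draw()  # make sure tick labels are rendered

# Tick labels and bar heights follow the sorted row order, left to right
tick_labels = [t.get_text() for t in sample_ax.get_xticklabels()]
bar_heights = [float(p.get_height()) for p in sample_ax.patches]
print(tick_labels)  # ['County B', 'County C', 'County A']
print(bar_heights)  # [1200.0, 800.0, 500.0]
```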

#### 4.  Increase the size of the chart

This is getting closer.  Now make the chart wider so that there's more space between the county labels.  The `figsize` parameter of the plot method sets the figure's width and height, in inches.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
    figsize=(15, 5),
)
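`figsize` is a `(width, height)` tuple in inches that pandas passes through to matplotlib when it creates the figure.  If long county names still collide, the `rot` parameter (tick label rotation in degrees) may also be worth trying.  A quick check on a made-up DataFrame:

```python
import pandas as pd

# Made-up values for illustration only
sample_df = pd.DataFrame({
    "Name": ["County A", "County B"],
    "Total Housing Units": [1200, 500],
})

sample_ax = sample_df.plot(
    kind="bar",
    x="Name",
    y="Total Housing Units",
    figsize=(15, 5),  # width, height in inches
    rot=45,           # rotate the county labels 45 degrees
)
width, height = sample_ax.get_figure().get_size_inches()
print(width, height)  # 15.0 5.0
```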

## Repeat the process for multiple states.

Now that you've got an acceptable chart, you can repeat the process for multiple states.  This will involve reusing some of the code you wrote previously.  First, you'll define the states to run your process on.

#### 1.  Generate a list of states to iterate through.

You could write out a list with the name of every state, but that wouldn't be very efficient.  Instead, you can use tools you already know to get a list of states from the Living Atlas.  There are many ways to do this; you could query the states layer included in this item, for instance.  In this example, you'll use the same `.query()` method on the counties layer to get all the values in the "State" column.

In [None]:
df_states = lyr_counties.query(
    out_fields=['State'],
    return_geometry=False, 
    as_df=True
)


This returns a DataFrame with one record per county.  You can reduce it to the unique state names using the pandas `.unique()` method.

In [None]:
states_to_map = df_states.State.unique()
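`.unique()` returns a NumPy array of values in the order they first appear.  If you'd like the charts produced in alphabetical order, you can wrap it in `sorted()`.  (The layer's `.query()` method also has a `return_distinct_values` parameter that may let the server do the de-duplication; check its docstring with `?` before relying on it.)  A sketch with made-up records:

```python
import pandas as pd

# Made-up county records standing in for the df_states query result
sample_counties = pd.DataFrame({
    "State": ["Texas", "Alabama", "Texas", "Alaska", "Alabama"],
})

unique_states = sample_counties["State"].unique()  # first-appearance order
print(list(unique_states))    # ['Texas', 'Alabama', 'Alaska']
print(sorted(unique_states))  # ['Alabama', 'Alaska', 'Texas']
```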

#### 2.  Iterate through all the states

Now you can repeat your charting process for each state.  In the cell below, you'll re-use the code you developed in previous steps.  For each state, you will need to:
 - Query the data for that state
 - Rename the columns in the resulting DataFrame
 - Sort the values in the DataFrame by "Total Housing Units"
 - Plot the chart
 - Save the chart as a .pdf file

In [None]:
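import matplotlib.pyplot as plt

for state in states_to_map:

    # query the counties layer for the current state (no geometry needed)
    df_state = lyr_counties.query(
        where = f"STATE = '{state}'",
        as_df = True,
        return_geometry = False
        )

    # rename the columns (we already created the field aliases dictionary)
    df_state = df_state.rename(columns=field_aliases)

    # sort the values in the DataFrame
    df_state = df_state.sort_values(
        'Total Housing Units',
        ascending=False
        )

    # create the bar chart; .plot() returns a matplotlib Axes
    chart = df_state.plot(
        kind='bar',
        x='Name',
        y='Total Housing Units',
        figsize=(15, 10),
    )

    # get the matplotlib Figure that contains the Axes
    fig = chart.get_figure()

    # save as a pdf file; bbox_inches='tight' keeps long rotated county names from being clipped
    fig.savefig(f"./{state}.pdf", bbox_inches='tight')

    # close the figure so open figures don't pile up in memory across states
    plt.close(fig)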
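If you'd like to test the charting steps without querying the service each time, the loop body can be factored into a function that takes an already-queried DataFrame.  `chart_housing_units` below is a hypothetical helper, sketched under the assumption that the renamed DataFrame has "Name" and "Total Housing Units" columns.  It also adds the state name as a chart title (not part of the original loop) and closes each figure, since matplotlib keeps every open figure in memory and warns once more than 20 are open.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd


def chart_housing_units(df_state, state, aliases, out_dir="."):
    """Rename, sort, plot, and save one state's county chart as a PDF.

    Hypothetical helper mirroring the loop body; returns the output path.
    """
    df_state = df_state.rename(columns=aliases)
    df_state = df_state.sort_values("Total Housing Units", ascending=False)

    ax = df_state.plot(kind="bar", x="Name", y="Total Housing Units",
                       figsize=(15, 10), title=state)
    fig = ax.get_figure()
    path = os.path.join(out_dir, f"{state}.pdf")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)  # free the figure before the next state
    return path


# Made-up input standing in for one state's query result
sample_state_df = pd.DataFrame({"NAME": ["County A", "County B"], "HU_TOTAL": [500, 1200]})
sample_aliases = {"NAME": "Name", "HU_TOTAL": "Total Housing Units"}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = chart_housing_units(sample_state_df, "Example State", sample_aliases, tmp_dir)
    sample_exists = os.path.exists(sample_path)

print(sample_exists)  # True
```

Inside the tutorial's loop, the body would shrink to the query followed by `chart_housing_units(df_state, state, field_aliases)`.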

Note that the `fig.savefig()` call saves each chart as a .pdf file named after its state.  You can hand these files off to your colleagues to use in their products.
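If your colleagues would rather receive one file than one per state, matplotlib's `PdfPages` can write each figure as a page of a single multi-page PDF.  Here's a minimal sketch with made-up data; in the tutorial loop you'd open the `PdfPages` object before the loop and call `pdf.savefig(fig)` in place of `fig.savefig(...)`.  The file name `housing_units_by_state.pdf` is just an example.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Made-up per-state data for illustration only
sample_data = {
    "State X": pd.DataFrame({"Name": ["County A", "County B"], "Total Housing Units": [1200, 500]}),
    "State Y": pd.DataFrame({"Name": ["County C", "County D"], "Total Housing Units": [900, 300]}),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "housing_units_by_state.pdf")
    with PdfPages(out_path) as pdf:
        for sample_state, sample_df_state in sample_data.items():
            sample_ax = sample_df_state.plot(kind="bar", x="Name",
                                             y="Total Housing Units", title=sample_state)
            sample_fig = sample_ax.get_figure()
            pdf.savefig(sample_fig)  # append this figure as a new page
            plt.close(sample_fig)
        page_count = pdf.get_pagecount()
    pdf_written = os.path.getsize(out_path) > 0

print(page_count)  # 2
```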