# Tutorial 11-01: Getting Started with Charts

Our colleagues at GeoNinjas PythonAnalytics have been working with census data and have asked us to help them create some charts.  They're interested in the number of housing units per county in each state.  For each state, they'd like a bar chart showing the number of housing units in each county in descending order (starting with the highest number).

#### 1.  Identify the census housing data layer.

To get data for housing units, we can go to the census data.  For our purposes, we can use the data that Esri curates in the Living Atlas.  This data is derived from publicly available census data, so you could get this elsewhere if you needed to.  First, let's log into ArcGIS Online and find a census layer.

In [None]:
# import the arcgis package
import arcgis

# set up ArcGIS Online credentials
gis = arcgis.GIS("home")

# print my user information
gis.users.me

Now let's get an item for census housing data in the Living Atlas.  Referencing an item by its item ID is generally the most repeatable and consistent way to access a specific item.

In [None]:
item_2020_census_housing = gis.content.get('81d9e89b8b574a649ff6e14f61c8494f')
item_2020_census_housing

Now we need to access the county data in this item.  This item has many layers, so we'll need to identify the county layer specifically.

In [None]:
[
    (i, lyr.properties.name) for i, lyr in enumerate(item_2020_census_housing.layers)
]

In this list, we can see that the county layer is at index **2**.  We'll use that going forward.

In [None]:
lyr_counties = item_2020_census_housing.layers[2]
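
Hard-coding the index works as long as the item's layer order never changes.  As a slightly more defensive alternative, the layer could be looked up by name.  The sketch below uses stand-in objects rather than a live item, the layer names in it are invented for illustration, and the helper `find_layer_index` is hypothetical (it is not part of the `arcgis` package).

```python
from types import SimpleNamespace


def find_layer_index(layers, layer_name):
    """Return the index of the first layer whose properties.name matches (case-insensitive)."""
    for i, lyr in enumerate(layers):
        if lyr.properties.name.lower() == layer_name.lower():
            return i
    raise ValueError(f"No layer named {layer_name!r}")


# Stand-in objects that mimic `item.layers`; these names are made up for illustration.
mock_layers = [
    SimpleNamespace(properties=SimpleNamespace(name=n))
    for n in ["Nation", "State", "County", "Tract"]
]

county_index = find_layer_index(mock_layers, "County")
print(county_index)
```

With a real item, the same helper could be called on `item_2020_census_housing.layers`, using whichever name the list above printed for the county layer.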

#### 2. Query data for a single state.

Now that we've identified our layer, let's get a sample state to start with.  We can start with California as a test.  We'll query that feature layer and specify that we'd like to return a DataFrame with only data for California.

In [None]:
df_california = lyr_counties.query(
    where = "STATE = 'California'",
    as_df = True
)
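
The `where` argument is a SQL-style expression, which is why the state name is wrapped in single quotes.  If a value ever contained a single quote itself, it would need to be escaped, and in standard SQL that is done by doubling the quote.  The helper below is a hypothetical convenience for building such a clause; the second value is invented purely to show the escaping.

```python
def state_where_clause(state_name, field="STATE"):
    """Build a SQL-style where clause, doubling any single quotes in the value."""
    escaped = state_name.replace("'", "''")
    return f"{field} = '{escaped}'"


print(state_where_clause("California"))
print(state_where_clause("O'Brien"))  # made-up value, only to show the escaping
```

The resulting string could be passed as the `where` argument in the query above.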

If we look at the column names in our DataFrame, we might notice that they're not very human-readable.  You can check this by executing `df_california.dtypes` if you'd like.  To make them easier to read, we can replace the column names with the field aliases defined on the layer.

In [None]:
# get the field names and aliases from the layer
field_aliases = {f['name']: f['alias'] 
                  for f in lyr_counties.properties.fields}

# rename the DataFrame columns to use the field aliases
df_california = df_california.rename(columns=field_aliases)
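
To see what `rename` does with a mapping like this, here is a small sketch on toy data.  The field names and aliases are invented for illustration and are not the real census fields.  It shows two behaviors worth knowing: columns without an entry in the mapping keep their names, and mapping keys that don't match any column are ignored.

```python
import pandas as pd

# Toy data; the field names and aliases here are invented for illustration.
df = pd.DataFrame(
    {"NAME": ["Alpha", "Beta"], "H0010001": [120, 45], "OBJECTID": [1, 2]}
)
aliases = {"NAME": "Name", "H0010001": "Total Housing Units", "UNUSED": "Not Present"}

# rename returns a new DataFrame; the original is left untouched
renamed = df.rename(columns=aliases)

print(list(renamed.columns))
print(list(df.columns))
```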

#### 3.  Get started with charts.

The pandas package makes it very easy to start charting with a DataFrame.  When we call the `.plot()` method on the DataFrame, pandas makes an educated guess about what it can plot.  In this case, it plots every numeric field it can as a line chart.

In [None]:
df_california.plot()

This isn't very helpful, because each row is a separate county (a category) rather than a point along a continuous axis.  We can tell pandas to draw a bar chart instead, and we can specify which columns to use for the x and y axes.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
)

This looks better, but recall that our colleagues asked for the data sorted with the largest values at the left.  Let's do that now.

#### 4.  Sort the values and recreate the chart.

Let's sort the values in our DataFrame and recreate the chart.  First, we'll use a built-in pandas method for sorting the values.  We want to sort the values in descending order by the "Total Housing Units" column.

In [None]:
df_california = df_california.sort_values(
    'Total Housing Units', 
    ascending=False
    )
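
As a quick illustration of how `sort_values` behaves, here is a sketch on toy data (the county names and values are invented).  It returns a sorted copy rather than sorting in place, which is why the result is assigned back to a variable, and the sorted copy keeps the original index labels.

```python
import pandas as pd

# Toy data standing in for the county table; the values are invented.
df = pd.DataFrame(
    {
        "Name": ["Alpha County", "Beta County", "Gamma County"],
        "Total Housing Units": [120, 450, 300],
    }
)

sorted_df = df.sort_values("Total Housing Units", ascending=False)

print(sorted_df["Name"].tolist())  # largest value first
print(df["Name"].tolist())         # the original frame keeps its order
print(sorted_df.index.tolist())    # index labels travel with their rows
```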

Now that we've sorted the values, let's try creating that last chart again.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
)

This is getting closer.  Now let's make the chart wider so that there's more space between the county labels.  The plot method has a `figsize` parameter that lets us set the width and height of the figure in inches.

In [None]:
df_california.plot(
    kind='bar', 
    x='Name', 
    y='Total Housing Units',
    figsize=(15, 5),
)
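
Because `.plot()` returns a matplotlib Axes object, the chart can be adjusted further after it is created.  The sketch below, which uses invented toy data, adds axis labels and a title and drops the legend.  The label text and title are assumptions for illustration, not something our colleagues requested.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs outside a notebook; not needed in Jupyter
import pandas as pd

# Toy data; the county names and values are invented.
df = pd.DataFrame(
    {
        "Name": ["Alpha County", "Beta County", "Gamma County"],
        "Total Housing Units": [450, 300, 120],
    }
)

ax = df.plot(
    kind="bar",
    x="Name",
    y="Total Housing Units",
    figsize=(15, 5),
    legend=False,  # a single series doesn't need a legend
)
ax.set_xlabel("County")
ax.set_ylabel("Total Housing Units")
ax.set_title("Housing units by county (toy data)")

fig = ax.get_figure()
fig.tight_layout()  # keep the rotated tick labels from being clipped
width, height = fig.get_size_inches()
```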

#### 5.  Repeat the process for multiple states.

Now that we've got an acceptable chart, let's repeat the process for multiple states.  This will involve taking some of the code we wrote previously and repeating it.  First, let's define some states to run our process on.

In [None]:
states_to_map = [
    "California",
    "Arizona",
    "Washington",
    "Oregon",
    "Nevada"
]

Now let's repeat our process starting with querying data from the hosted feature service.

In [None]:
for state in states_to_map:
    
    # query the county layer for this state
    df_state = lyr_counties.query(
        where = f"STATE = '{state}'",
        as_df = True
        )
    
    # rename the columns (we already created the field aliases dictionary)
    df_state = df_state.rename(columns=field_aliases)
    
    # sort the values in the DataFrame
    df_state = df_state.sort_values(
        'Total Housing Units', 
        ascending=False
        )
    
    # create a chart object
    chart = df_state.plot(
        kind='bar', 
        x='Name', 
        y='Total Housing Units',
        figsize=(15, 10),
    )
    
   
    # get the matplotlib Figure that contains the chart
    fig = chart.get_figure()
    
    # save as a png file
    fig.savefig(f"./{state}.png")

Note that the last two steps in the loop get each chart's matplotlib Figure and save it as a .png file named after the state.  We can hand these files off to our colleagues to use in their products.
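
When a loop like this runs as a standalone script rather than in a notebook, each figure generally stays open in memory until it is explicitly closed.  The sketch below shows one way to handle that with `plt.close`, and it also writes the files to a dedicated folder.  The toy data, the state names, and the temporary output folder are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

out_dir = Path(tempfile.mkdtemp())  # hypothetical output folder
toy_states = {"State A": [3, 1, 2], "State B": [5, 4]}  # invented data

saved_paths = []
for state, units in toy_states.items():
    df = pd.DataFrame(
        {
            "Name": [f"County {i}" for i in range(len(units))],
            "Total Housing Units": units,
        }
    )
    df = df.sort_values("Total Housing Units", ascending=False)

    ax = df.plot(kind="bar", x="Name", y="Total Housing Units", figsize=(15, 10))
    fig = ax.get_figure()

    out_path = out_dir / f"{state}.png"
    fig.savefig(out_path)
    plt.close(fig)  # release the figure once it has been written to disk
    saved_paths.append(out_path)

print([p.name for p in saved_paths])
```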