# Lab03: Charts

Your Job – Let’s pretend that you are an entry-level planner who has been asked to prepare a demographic description for your boss that includes both total population and percent of total population. She is going to a city meeting and has asked you to present the information visually. She would like an easy-to-read table for her records and either a bar chart or a pie chart that she can present at the meeting.

In [None]:
import pandas as pd
import numpy as np
import re

In [None]:
# read in 2021 census data
data_21 = pd.read_csv('~/git/cp101.github.io/labs/lab03/census21_data.csv')
data_21

In [None]:
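# fix missingness and data types
data_21 = data_21.fillna(0)
data_21 = data_21.replace({'NA': 0, '': 0})
# convert every column from the fifth onward to numbers; assigning whole columns
# (rather than writing into .iloc, which recent pandas versions do in place and
# which can leave the old dtype) lets the new numeric dtypes take effect
num_cols = data_21.columns[4:]
data_21[num_cols] = data_21[num_cols].apply(pd.to_numeric)
# GeoUID is read in as a number, which drops trailing zeros (e.g. ".00" becomes ".0");
# right-pad the text version with "0" to get back the full 10-character ID
data_21["GeoUID"] = data_21["GeoUID"].astype(str).str.ljust(10, "0")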
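Why the padding step is needed can be seen on a tiny made-up file. This is a sketch under assumptions: the two tract IDs below are hypothetical and only mimic the 10-character GeoUID format, and reading the column with `dtype={"GeoUID": str}` is shown as an alternative to padding, not as something the lab file requires.

```python
import io

import pandas as pd

# Toy CSV with hypothetical tract IDs in a 10-character GeoUID-like format
csv_text = "GeoUID,Population\n5350038.00,1000\n5350001.10,2500\n"

# Default read: GeoUID is parsed as a float, so trailing zeros are lost
as_float = pd.read_csv(io.StringIO(csv_text))
print(as_float["GeoUID"].astype(str).tolist())  # ['5350038.0', '5350001.1']

# Option 1 (the lab's fix): right-pad the text version back to 10 characters
padded = as_float["GeoUID"].astype(str).str.ljust(10, "0")
print(padded.tolist())  # ['5350038.00', '5350001.10']

# Option 2: tell read_csv to keep GeoUID as text from the start
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"GeoUID": str})
print(as_text["GeoUID"].tolist())  # ['5350038.00', '5350001.10']
```

Option 2 avoids the round trip through a float entirely, so it does not depend on every ID having exactly two decimal places.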

In [None]:
data_21.columns

In [None]:
# subset to needed columns for analysis
data_subset = data_21.iloc[:,1:27].drop(columns = ["Type", "CMA_UID", "PR_UID", "CSD_UID", "CD_UID"])
# clean the names
data_subset = data_subset.rename(columns = {x: re.sub(r"v_CA\d{2}_\d+: ", "", x) for x in data_subset.columns.tolist()})
data_subset
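To see what the renaming regex does, here it is applied to a few column names. The names are hypothetical examples in the `vector code: label` style the pattern expects; your file's actual columns are listed by `data_21.columns`.

```python
import re

# Hypothetical column names in the "vector code: label" style
raw_names = ["GeoUID", "v_CA21_1: Population, 2021", "v_CA21_123: Total visible minority"]

# Same pattern as the lab: remove "v_CA" + 2 digits + "_" + digits + ": "
clean_names = [re.sub(r"v_CA\d{2}_\d+: ", "", name) for name in raw_names]
print(clean_names)  # ['GeoUID', 'Population, 2021', 'Total visible minority']
```

Names that do not contain the pattern, such as `GeoUID`, pass through unchanged.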

First – Python plotting libraries generate visualizations from data in tables, so we need to start by creating a table with the requested information. What statistics do you need to generate? (Hint: we need summary statistics for the city as a whole.) Now let’s calculate the amounts together. Do your summary statistics match your neighbor’s? How about the numbers projected at the front of the class?
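A minimal sketch of the two summary statistics on a toy table, assuming made-up column names (`Total population`, `Group A`, `Group B`) and counts: the city-wide value for each column is the sum over all tracts, and each group's share is its city-wide count divided by the total population. Selecting columns by name, as here, is an alternative to the positional `iloc` indexing used in the lab's cells; it keeps working if the column order changes.

```python
import pandas as pd

# Hypothetical tract-level counts (made-up column names and numbers)
tracts = pd.DataFrame(
    {
        "Total population": [1000, 3000],
        "Group A": [600, 1200],
        "Group B": [400, 1800],
    },
    index=["Tract 1", "Tract 2"],
)

# Summary statistic 1: city-wide counts are the column sums over all tracts
city_sums = tracts.sum()

# Summary statistic 2: each group's proportion of the city's total population
group_cols = ["Group A", "Group B"]
city_shares = city_sums[group_cols] / city_sums["Total population"]

print(city_sums)
print(city_shares)
```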

In [None]:
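# get city-wide sums, rounded to the nearest whole number
# (.astype(int) alone would truncate, e.g. 2.6 -> 2, so round first)
# note: because of this rounding, the proportions and percentages below may not sum exactly to 1 or 100%
toronto_sums = data_subset.iloc[:, 2:].sum().round().astype(int)
toronto_sums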

In [None]:
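# proportions by group: divide each group's count (positions 6 onward)
# by the count at position 4, which serves as the total
# (.iloc makes the positional selection explicit)
toronto_proportions = toronto_sums.iloc[6:] / toronto_sums.iloc[4]
toronto_proportions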

Second – Now that we have the necessary information, we need to put it into a good table. You can think of this as prettying up the table, but what you are really doing is making the information easy to digest. Your table should have all the information your boss needs, but nothing more. A great way to check whether your table works is to ask a neighbor to explain, in one sentence, what it shows.

In [None]:
# combine counts and proportions side by side, largest group first
summary_table = pd.concat(
    [toronto_sums.iloc[6:].to_frame(name="Population"),
     toronto_proportions.to_frame(name="Proportion")],
    axis=1,
).sort_values(by="Population", ascending=False)
summary_table
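Your boss asked for percent of total population, while `summary_table` stores a proportion between 0 and 1. One way to build a presentation copy is sketched below on made-up numbers (the group names, counts, and the city total of 3,500,000 are hypothetical): convert the proportion to a percent, add thousands separators, and give the columns plain-English headers.

```python
import pandas as pd

# Hypothetical table shaped like summary_table (made-up groups and counts)
summary = pd.DataFrame(
    {"Population": [1_800_000, 950_000, 420_000]},
    index=["Group A", "Group B", "Group C"],
)
city_total = 3_500_000  # assumed total population of the city
summary["Proportion"] = summary["Population"] / city_total

# Presentation copy: percent instead of proportion, thousands separators, plain headers
display_table = pd.DataFrame(
    {
        "Population": summary["Population"].map("{:,}".format),
        "Percent of total population": (summary["Proportion"] * 100).map("{:.1f}%".format),
    }
)
display_table.index.name = "Group"
print(display_table)
```

The formatted columns are strings, so keep the numeric table for sorting and plotting and use the formatted copy only for the printed record.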

Third – Now that we have our table, let’s turn it into a chart. There are many chart options, and we will talk about them later in the semester. For now, we are going to create two charts showing the demographics of Toronto using the pandas `plot()` method.

In [None]:
# a bar chart
summary_table.plot(y = 'Population', kind = 'bar');

In [None]:
# a pie chart
summary_table.plot(y = 'Population', kind = 'pie');
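The default charts have no titles, and the pie chart may show the column name as a y-axis label. Below is a sketch of a more presentation-ready version on hypothetical data standing in for `summary_table`: `title` is a pandas `plot()` option, `autopct` is passed through to matplotlib's `pie` and prints each slice's percentage, and the axis labels are set directly on the matplotlib axes. The city total of 3,500,000 is an assumption used only for the comparison at the end.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for summary_table (made-up groups and counts)
summary = pd.DataFrame(
    {"Population": [1_800_000, 950_000, 420_000]},
    index=["Group A", "Group B", "Group C"],
)
city_total = 3_500_000  # assumed city total, larger than the sum of these groups

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: a title and a labelled y-axis make it readable on its own
summary.plot(y="Population", kind="bar", ax=ax_bar, legend=False,
             title="Population by group")
ax_bar.set_xlabel("")
ax_bar.set_ylabel("Population")

# Pie chart: autopct prints each slice's percentage on the slice
summary.plot(y="Population", kind="pie", ax=ax_pie, legend=False,
             autopct="%1.1f%%", title="Share of the groups shown")
ax_pie.set_ylabel("")

plt.tight_layout()

# The pie's percentages are shares of the slices' sum, not of the city total
slice_share = summary["Population"] / summary["Population"].sum()
table_share = summary["Population"] / city_total
print(pd.DataFrame({"pie %": slice_share * 100, "table %": table_share * 100}).round(1))
```

Because `autopct` divides by the sum of the slices, a pie chart of groups that do not add up to the total population will show different percentages from the table. In that case the bar chart is the more faithful of the two.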

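You can select different colors by creating a mapping of legend labels (the categories) to color values, putting those colors in the same order as the table's rows, and passing them in as a keyword argument: `color` for a bar chart, `colors` for a pie chart.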
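A minimal sketch of the color mapping on a hypothetical `population` Series. The hex codes are the first three colors of ColorBrewer's Set2 scheme. The dictionary is turned into a list in row order because, for a single series, pandas applies a list given as `color` bar by bar, and a pie chart takes the same list as `colors`.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_hex

# Hypothetical data standing in for summary_table["Population"]
population = pd.Series(
    [1_800_000, 950_000, 420_000],
    index=["Group A", "Group B", "Group C"],
    name="Population",
)

# Mapping of legend labels to colors (hex codes from ColorBrewer's Set2 scheme)
color_map = {"Group A": "#66c2a5", "Group B": "#fc8d62", "Group C": "#8da0cb"}

# Turn the mapping into a list that follows the row order
colors = [color_map[label] for label in population.index]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
population.plot(kind="bar", color=colors, ax=ax_bar)   # bar charts take `color`
population.plot(kind="pie", colors=colors, ax=ax_pie)  # pie charts take `colors`
plt.tight_layout()

# Read back the colors that were actually drawn, to confirm the mapping was applied
bar_colors = [to_hex(bar.get_facecolor()) for bar in ax_bar.patches]
pie_colors = [to_hex(wedge.get_facecolor()) for wedge in ax_pie.patches]
print(bar_colors, pie_colors)
```

Building the list from the dictionary, rather than typing colors in order, means a category keeps its color even if you re-sort the table.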

A website we recommend is [Color Brewer](http://colorbrewer2.org/). It offers color schemes suited to different kinds of visuals and is useful if you plan to present your visuals (and, later, maps) to a wider audience. You can then use its color codes to adjust the colors in your visual by hand. This can take a little time, so we recommend finalizing your visual first and adjusting colors as the last step.
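If you would rather not copy hex codes by hand, matplotlib ships many of the ColorBrewer schemes as built-in colormaps (for example `Set2` and `Blues`). The sketch below assumes matplotlib 3.5 or later for the `matplotlib.colormaps` registry. Qualitative schemes store their exact colors in `.colors`. Sequential ones are smooth colormaps, so sampled shades are interpolated and will not match the website's hex codes exactly.

```python
import matplotlib
from matplotlib.colors import to_hex

# Qualitative ColorBrewer scheme: a ListedColormap whose colors are stored in .colors
set2 = matplotlib.colormaps["Set2"]
set2_hex = [to_hex(c) for c in set2.colors]
print(set2_hex[:3])  # ['#66c2a5', '#fc8d62', '#8da0cb']

# Sequential ColorBrewer scheme: sample 5 evenly spaced shades of blue
blues = matplotlib.colormaps["Blues"]
five_blues = [to_hex(blues(x)) for x in (0.2, 0.4, 0.6, 0.8, 1.0)]
print(five_blues)
```

The resulting hex lists can be used anywhere a list of colors is accepted, such as the `color` or `colors` keyword arguments described above.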

Now that you have done this for the City of Toronto, can you work with your neighbor to develop a table that provides demographic information for Census Tract 0038.00 and for the City as a whole? Remember: the more you use these functions, the more natural they become.
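One way to approach the exercise is sketched below on a hypothetical three-tract table. It filters the rows by GeoUID for the tract, sums all rows for the city, and puts the two side by side. The GeoUIDs, column names, and counts are made up. The full GeoUID string used for Census Tract 0038.00 is also an assumption, so check it against the `GeoUID` column after the padding step.

```python
import pandas as pd

# Hypothetical tract-level data (made-up GeoUIDs, column names, and counts)
tracts = pd.DataFrame(
    {
        "GeoUID": ["5350037.00", "5350038.00", "5350039.00"],
        "Total": [4000, 5000, 3000],
        "Group A": [1000, 2000, 1500],
        "Group B": [3000, 3000, 1500],
    }
)
target_id = "5350038.00"  # hypothetical full GeoUID for Census Tract 0038.00
groups = ["Group A", "Group B"]

# Counts for the one tract (boolean filter on GeoUID) and for the whole city (column sums)
tract_counts = tracts.loc[tracts["GeoUID"] == target_id, ["Total"] + groups].sum()
city_counts = tracts[["Total"] + groups].sum()

# Side-by-side table: population and percent of total for both geographies
comparison = pd.DataFrame(
    {
        "Tract population": tract_counts[groups],
        "Tract %": tract_counts[groups] / tract_counts["Total"] * 100,
        "City population": city_counts[groups],
        "City %": city_counts[groups] / city_counts["Total"] * 100,
    }
)
print(comparison.round(1))
```

Putting the tract and the city in one table makes it easy to see where the tract differs from the city as a whole.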