In [1]:
import warnings
warnings.filterwarnings('ignore')

from datascience import *
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

# Accuracy in Online Articles #

***A. Adhikari***

### BUDS 2021 Week 3 Wednesday Report ###


**Remember to run the top cell of this notebook before reading further.**

[Forbes](https://en.wikipedia.org/wiki/Forbes) is an American business magazine that has a significant international presence and editions in over 25 languages. It has a particular interest in very rich people. For example, its annual list of the world's [billionaires](https://en.wikipedia.org/wiki/The_World%27s_Billionaires#2021) is publicized by news organizations all over the world.

Forbes has a digital presence as well, with forbes.com being one of the world's most widely visited business websites.

However, few people in the world are billionaires, and a pro-business perspective sometimes goes hand in hand with limiting payments to employees. 

Governments are not businesses, but they too have employees and must manage funds. In September 2020, forbes.com published an article by [Adam Andrzejewski](https://en.wikipedia.org/wiki/Adam_Andrzejewski), businessman and former Republican candidate for Governor of Illinois. The article was entitled *Why San Francisco is in Trouble: 19,000 Highly Compensated Employees Earned $150,000+ in Pay & Perks*. It was about the compensation paid to San Francisco city and county government employees in 2019.

**In this report, you will assess the contents of the article and provide recommendations for how to thoughtfully read data-intensive online publications.**

We will provide all the portions of the article needed for this report. If you want to read the full article, be aware that payments to the author are based on traffic to the article's [webpage](https://www.forbes.com/sites/adamandrzejewski/2020/09/01/why-san-francisco-is-in-trouble--19000-highly-compensated-city-employees-earned-150000-in-pay--perks/?sh=13393e437693). We will return to this aspect later in the report.

### The Data Source ###

The author of the Forbes article is also the founder of [OpenTheBooks](https://en.wikipedia.org/wiki/OpenTheBooks), a non-profit organization that aims to collect and post all the disclosed spending of governments of all levels in the US.

The data source cited in the article is [publicly provided](https://data.sfgov.org/City-Management-and-Ethics/Employee-Compensation/88g8-5mnd/data) by the City of San Francisco. We have filtered it to retain just the relevant columns and restrict the data to the calendar year 2019 as that was the year studied in the article.

Run the cell below to load the data table <tt>sf</tt> and view its first three rows.

In [2]:
sf = Table.read_table('sf2019.csv')
sf.show(3)

Organization Group,Department,Job Family,Job,Salary,Overtime,Benefits,Total Compensation
Public Protection,Adult Probation,Information Systems,IS Trainer-Journey,91332,0,40059,131391
Public Protection,Adult Probation,Information Systems,IS Engineer-Assistant,123241,0,49279,172520
Public Protection,Adult Probation,Information Systems,IS Business Analyst-Senior,115715,0,46752,162468


The table has one row for each of the 44,525 San Francisco government employees in 2019.

The first four columns describe the employee's job. For example, the employee in the third row of the table had a job called IS Business Analyst-Senior. We will also call this the employee's *position* or *job title*. The job was in a Job Family called Information Systems (hence the IS in the job title), and was in the Adult Probation department that is part of the Public Protection Organization Group of the government. You will mostly be working with the job title.

The next three columns contain the dollar amounts paid to the employee in the calendar year 2019, for salary, overtime, and benefits. Here salary does not include overtime.

The last column contains the total compensation paid to the employee. It is the sum of the previous three columns:

total compensation = salary + overtime + benefits

You will mostly be working with the total compensation.
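
As a quick sanity check on the identity above, the sketch below recomputes the sum for the three rows displayed earlier. The figures are copied by hand from that display; the tuple layout and variable names are illustrative assumptions, not part of the notebook.

```python
# (salary, overtime, benefits, total compensation) for the three rows
# shown by sf.show(3), copied by hand from the display above.
rows = [
    (91332, 0, 40059, 131391),
    (123241, 0, 49279, 172520),
    (115715, 0, 46752, 162468),
]

# Published total minus the recomputed sum, one value per row.
gaps = [total - (salary + overtime + benefits)
        for salary, overtime, benefits, total in rows]
print(gaps)
```

In these rows the published total never differs from the recomputed sum by more than one dollar. A one-dollar gap would be consistent with rounding in the source data, though the notebook does not say how the figures were rounded.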

## Part 1: The Problem and Initial Recommendations ##

Since you have the data just as the author did, you can do some of the calculations yourself.

### The Title ###
Start with the data in the title of the article: 

*19,000 Highly Compensated Employees Earned $150,000+ in Pay & Perks*

It makes sense to interpret Pay & Perks to mean total compensation.

In the cell below, we have assigned the name <tt>at_least_150K</tt> to a table that contains only the <tt>sf</tt> rows corresponding to employees whose total compensation was $150,000 or more. The use of K to represent "thousand" is common in computer science, and comes from "kilo" as in "kilogram".

- First, discuss with your group why we decided to use
`where` and not some other method such as `take` or `select`.

Next, run the cell.

In [3]:
# Table of only those with total compensation $150,000 or more

at_least_150K = sf.where('Total Compensation', are.above_or_equal_to(150000))
at_least_150K.show(3)

Organization Group,Department,Job Family,Job,Salary,Overtime,Benefits,Total Compensation
Public Protection,Adult Probation,Information Systems,IS Engineer-Assistant,123241,0,49279,172520
Public Protection,Adult Probation,Information Systems,IS Business Analyst-Senior,115715,0,46752,162468
Public Protection,Adult Probation,Information Systems,IS Business Analyst-Principal,159394,0,57312,216706


<div class="alert alert-info">
    <b> Question: </b>
    How many employees had a total compensation of $150,000+? Is the number the same as in the title of the article? If not, which number is larger: yours, or the one in the title?

<div class="alert alert-warning">
    <b> Answer: </b>
    Add the 3 visible rows to the 15,318 omitted rows to see that 15,321 employees had compensations of at least 150K. The article claims a bigger count of 19,000 employees.   

Since the Forbes article cites the same open dataset that we are using, a reason for the discrepancy could be that the City updated the dataset between the time when the article was written and when we accessed it. Or it could be that the article used additional data that is not cited. 

Based on the figures in the rest of the article, the difference appears to be in the benefits, which are a little smaller in our dataset than in the article. We will continue to use our dataset because it is the one currently provided by the City. So you should expect the total compensation amounts in your calculations to be smaller than the corresponding values in the article.
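
The size of the gap can be put in numbers. The sketch below uses only figures already stated in this report: 44,525 employees in total, 15,321 at or above $150,000, and the 19,000 in the article's title (which is presumably rounded). The variable names are illustrative.

```python
total_employees = 44525    # rows in sf
count_in_data = 15321      # rows in at_least_150K
count_in_title = 19000     # figure in the article's title (likely rounded)

# How many more employees the title claims than the data shows.
gap = count_in_title - count_in_data

# Share of all employees at or above $150,000 according to the data.
share_in_data = count_in_data / total_employees

print(gap)
print(round(100 * share_in_data, 1))
```

In the notebook itself, `at_least_150K.num_rows` should give the count directly, without adding the displayed and omitted rows by hand.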

A **more important factor** to keep in mind is that "highly compensated" does not have a unique definition. San Francisco is an expensive city. In fact, it is so expensive that the Housing and Urban Development (HUD) [limit for "low income"](https://www.huduser.gov/portal/datasets/il/il2019/select_Geography.odn) in San Francisco, Marin, and San Mateo counties in 2019 was an annual income of $129,150 or less for a family of four people.

### An Eye-Popping Table ###

The article contains a brightly colored table, shown here. We will refer to it as the "Forbes table" for short. 

![Table in Forbes article](forbes_table.png)

It is striking that according to this table, Registered Nurses apparently earned almost $300,000 on average in 2019. That seems like a very large amount. It is worth checking if the calculation makes sense.

First, count the number of nurses. 

- Review how we calculated the number of employees who made $150,000 or more, and discuss with your group how to adapt that method to count the number of nurses. 
- Also discuss how to calculate the number of rows in a table without displaying the first few rows of the table.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>rn</tt> to a table of just the <tt>sf</tt> rows corresponding to Registered Nurses, and then assign <tt>rn_count</tt> to the number of Registered Nurses.

In [4]:
# Number of Registered Nurses

# Table of all the data for only the Registered Nurses
rn = sf.where('Job', are.equal_to('Registered Nurse'))

rn_count = rn.num_rows
rn_count

1471

This count should be the same as in the Forbes table.

The Forbes table provides the following two numbers: the number of Registered Nurses, and the grand total of the compensation paid to all of them.

- Discuss with your group how you can calculate the average compensation of the Registered Nurses using just these two numbers. 

<div class="alert alert-info">
    <b> Question: </b> 
Assign <tt>average_comp</tt> to the average compensation of the Registered Nurses, using the method your group came up with.

In [5]:
# Internal consistency check of the Forbes table

rn_count_forbes = 1471           # number of Registered Nurses
rn_total_forbes = 298515988      # grand total compensation of all Registered Nurses

# Average compensation based on rn_count_forbes and rn_total_forbes
average_comp = rn_total_forbes / rn_count_forbes
average_comp

202934.05030591434

<div class="alert alert-info">
    <b> Question: </b> 
    Is <tt>average_comp</tt> equal to the corresponding average in the Forbes table? If not, which one is bigger? What is a possible reason for the discrepancy? 

<div class="alert alert-warning">
    <b> Answer: </b>
    They are not the same. A 0 has become 9, leading to a falsely large average in the Forbes table. If the graphic was created by typing the figures instead of copy-pasting, then the error could have been caused by mistyping or misreading. 

It looks as though there is a problem. Your figures don't agree with those in the article, and the discrepancies seem to have a consistent direction.

At this point it's a good idea to think about how to approach the rest of the article.

<div class="alert alert-info">
    <b> Question: </b> 
    Based on your analysis so far, what are your initial recommendations to someone who is planning to read the Forbes article? 

<div class="alert alert-warning">
    <b> Answer: </b>
    Be cautious about interpreting the figures. Some of them are wrong, and the errors seem to make the claimed compensations higher than is justified by the source data. However, we have only checked a couple of values. The others might be fine. We will check some more, so maybe read the rest of our report before reading the article!

## Part 2: Top 10 Positions by Head Count ##

Now check some of the other figures in the Forbes table. 

Start by counting the number of employees in each position. 
- Discuss with your group which of the following methods is best suited for this purpose: `take`, `where`, or `group`.

<div class="alert alert-info">
    <b> Question: </b> 
    Assign <tt>job_counts</tt> to a table that has a row for each position, with one column showing the job title and the other showing the number of employees who had that job.

In [6]:
# Count the number of employees in each job title

job_counts = sf.group('Job')
job_counts

Job,count
"ACPO,JuvP, Juv Prob (SFERS)",1
ASR Operations Supervisor,4
ASR Senior Office Specialist,18
ASR-Office Assistant,25
Account Clerk,58
Accountant I,2
Accountant II,61
Accountant III,130
Accountant IV,68
Accountant Intern,32


**Notice that there are 1,057 positions, and they appear in alphabetical order.** This will be important later.
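
Conceptually, grouping by a single column counts how often each distinct value occurs and lists the values in sorted order. The sketch below imitates that with `collections.Counter` on a tiny made-up list of job titles. It is an illustration of the idea under that assumption, not the `datascience` implementation, and the data and names are hypothetical.

```python
from collections import Counter

# Hypothetical miniature 'Job' column; the real one has 44,525 entries.
jobs = ['Custodian', 'Accountant I', 'Custodian', 'Firefighter',
        'Accountant I', 'Custodian']

# Count each distinct title, then list the titles in alphabetical order,
# which is the order in which the grouped table above displays them.
counts = Counter(jobs)
job_counts_sketch = sorted(counts.items())
print(job_counts_sketch)

# Sanity check: the counts must add up to the number of employees.
print(sum(counts.values()) == len(jobs))
```

The same sanity check applies to the real table: the values in the `count` column of `job_counts` should add up to the number of rows of `sf`.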

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>sorted_job_counts</tt> to a table that has the rows of <tt>job_counts</tt> in decreasing order of the counts.

In [7]:
# The most commonly held jobs, with their counts

sorted_job_counts = job_counts.sort('count', descending=True)
sorted_job_counts

Job,count
Transit Operator,3123
Special Nurse,1669
Registered Nurse,1471
Custodian,917
Firefighter,902
Public Service Trainee,842
Police Officer 3,821
Recreation Leader,750
Patient Care Assistant,600
HSA Sr Eligibility Worker,578


The display conveniently shows the top 10 jobs by head count (that is, by the number of employees who had the job). 

The list is not the same as the one in the Forbes table, reproduced here for convenience.

![Forbes table](forbes_table.png)

<div class="alert alert-info">
    <b> Questions: </b>
    <br> (a) What do the two lists have in common? 
    <br> (b) Which jobs are on the Forbes list but not on yours? 
    <br> (c) Which jobs are on your list but not on the Forbes list?

<div class="alert alert-warning">
    <b> Answers: </b>
    
(a) For every job that's on both lists, the counts are the same. Half the jobs are on both lists: Transit Operator, Registered Nurse, Custodian, Firefighter, and Police Officer 3.
    
(b) Sergeant 3, Deputy Sheriff, Police Officer 2, Attorney (Civil/Criminal), EMT/Paramedic/Firefighter
    
(c) Special Nurse, Public Service Trainee, Recreation Leader, Patient Care Assistant, HSA Sr Eligibility Worker
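
These three answers can be cross-checked with set operations. The two sets below are typed from this report: our top 10 from the output of `sorted_job_counts`, and the Forbes list as reconstructed from answers (a) and (b). The variable names are illustrative.

```python
ours = {'Transit Operator', 'Special Nurse', 'Registered Nurse', 'Custodian',
        'Firefighter', 'Public Service Trainee', 'Police Officer 3',
        'Recreation Leader', 'Patient Care Assistant',
        'HSA Sr Eligibility Worker'}

forbes = {'Transit Operator', 'Registered Nurse', 'Custodian', 'Firefighter',
          'Police Officer 3', 'Sergeant 3', 'Deputy Sheriff',
          'Police Officer 2', 'Attorney (Civil/Criminal)',
          'EMT/Paramedic/Firefighter'}

print(sorted(ours & forbes))   # (a) jobs on both lists
print(sorted(forbes - ours))   # (b) jobs only on the Forbes list
print(sorted(ours - forbes))   # (c) jobs only on our list
```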

Run the cell below to see the top 20 jobs by head count. 
- Discuss with your group some possible reasons why the Forbes article chose to use some jobs as its so-called "top 10" but not others. 

Later in the report you will have more evidence to come up with reasons.

In [8]:
# The top 20 most commonly held jobs with their counts

sorted_job_counts.show(20)

Job,count
Transit Operator,3123
Special Nurse,1669
Registered Nurse,1471
Custodian,917
Firefighter,902
Public Service Trainee,842
Police Officer 3,821
Recreation Leader,750
Patient Care Assistant,600
HSA Sr Eligibility Worker,578


To understand the jobs better, you can look at some of the rows for a particular job by running the cell below. In this example, the table shows that Public Service Trainees were appointed by different departments, such as the police and the airport, and that their total compensation varied widely.

If you change the job title in the cell, remember to copy-paste it *exactly* as it appears in the table above, with no spaces before or after the title. You can of course also change the number of rows to display.

In [9]:
sf.where('Job', are.equal_to('Public Service Trainee')).show(6)

Organization Group,Department,Job Family,Job,Salary,Overtime,Benefits,Total Compensation
Public Protection,Police,Public Service Aide,Public Service Trainee,12029,0,990,13019
Public Protection,Police,Public Service Aide,Public Service Trainee,36008,0,10155,46163
Public Protection,Police,Public Service Aide,Public Service Trainee,14378,0,4060,18438
Public Protection,Police,Public Service Aide,Public Service Trainee,14913,0,1210,16123
Public Protection,Police,Public Service Aide,Public Service Trainee,15540,0,4383,19923
"Public Works, Transportation & Commerce",Airport Commission,Public Service Aide,Public Service Trainee,0,386,31,417


## Part 3: Total and Average Compensation by Job Title ##

This part of the report is purely computational. The goal is to calculate the figures that you will need for comparing with the Forbes table. In the next part of the report, you will display these figures in a table.

To avoid carrying unnecessary data, restrict your analysis to just the job titles and total compensation. 
- Discuss with your group which `Table` method allows you to create a new table consisting of specified columns of another table: `take`, `select`, or `column`.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>jobs_and_total_comp</tt> to a table that contains only the <tt>Job</tt> and <tt>Total Compensation</tt> columns of the table <tt>sf</tt>. It should have one row for each employee.

In [10]:
# Create a table that has only two of the columns of sf:
# Jobs and Total Compensation

jobs_and_total_comp = sf.select('Job', 'Total Compensation') 
jobs_and_total_comp

Job,Total Compensation
IS Trainer-Journey,131391
IS Engineer-Assistant,172520
IS Business Analyst-Senior,162468
IS Business Analyst-Principal,216706
IS Programmer Analyst,98706
IS Project Director,236572
IT Operations Support Admin IV,173269
Accountant III,158135
Statistician,126624
Senior Administrative Analyst,163843


### Table of Grand Total Compensation by Position ###

As you have seen, many employees can have the same position (also known as job title). The goal now is to find the grand total compensation of the employees for each of the positions. 

For example, you have seen that 3,123 employees had the position of Transit Operator. We now want the grand total amount the City paid to all its Transit Operators. That's the sum of the total compensation amounts for all 3,123 Transit Operators.

You have to calculate the corresponding amount for each position. 

- Discuss with your group which of the following two methods is best suited for this, and what the arguments should be: `where` or `group`.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>comp_by_jobs</tt> to a table that has one column for the job titles and one for the total compensation of employees in the jobs.

In [11]:
# Display the grand total compensation for each position

comp_by_jobs = jobs_and_total_comp.group('Job', sum)

# Don't worry about the next line of code. It just makes Column 1 easier to read.
comp_by_jobs.set_format(1, NumberFormatter(decimals=0)) 

Job,Total Compensation sum
"ACPO,JuvP, Juv Prob (SFERS)",248096
ASR Operations Supervisor,503372
ASR Senior Office Specialist,2114434
ASR-Office Assistant,1689426
Account Clerk,4173356
Accountant I,220055
Accountant II,6818860
Accountant III,18715168
Accountant IV,11309612
Accountant Intern,2266249


**The table should have one row for each job, and the jobs should be in alphabetical order, just as for `job_counts`.**

In [12]:
job_counts

Job,count
"ACPO,JuvP, Juv Prob (SFERS)",1
ASR Operations Supervisor,4
ASR Senior Office Specialist,18
ASR-Office Assistant,25
Account Clerk,58
Accountant I,2
Accountant II,61
Accountant III,130
Accountant IV,68
Accountant Intern,32


Each row of `comp_by_jobs` contains the same job as the corresponding row of `job_counts`, and hence also the same employees. So you will be able to create a larger table by simply attaching columns from the two tables side by side, one after another. 

For this, it will help to store the counts and grand total compensations in arrays. 
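
Attaching columns side by side is only safe if the two tables list the jobs in the same order. A minimal sketch of that check follows, using the first few job titles from the two displays above. The titles are typed by hand and the list names are hypothetical.

```python
# First few job titles of each grouped table, copied from the displays above.
jobs_in_counts = ['ACPO,JuvP, Juv Prob (SFERS)', 'ASR Operations Supervisor',
                  'ASR Senior Office Specialist', 'ASR-Office Assistant']
jobs_in_totals = ['ACPO,JuvP, Juv Prob (SFERS)', 'ASR Operations Supervisor',
                  'ASR Senior Office Specialist', 'ASR-Office Assistant']

# Row i of one table must describe the same job as row i of the other,
# and neither table may have extra rows.
same_length = len(jobs_in_counts) == len(jobs_in_totals)
aligned = all(a == b for a, b in zip(jobs_in_counts, jobs_in_totals))
print(same_length and aligned)
```

On the full tables, one way to run the same check would be to compare `job_counts.column('Job')` with `comp_by_jobs.column('Job')`, for example with `np.array_equal`.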

### Arrays of Counts and Grand Total Compensations by Position ###

Recall from Part 1 which `Table` method accesses a column as an array. Also note that column numbering starts at 0, following the Python convention.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>counts_only</tt> to an array of the values in Column 1 of <tt>job_counts</tt>, and <tt>total_comps_only</tt> to an array of the values in Column 1 of <tt>comp_by_jobs</tt>.

In [13]:
# Construct two data arrays
# corresponding to job titles arranged in alphabetical order

counts_only = job_counts.column('count')
total_comps_only = comp_by_jobs.column('Total Compensation sum')

The displays of these arrays are not pretty. That's because the arrays are very long. Run the cell below to see.

In [20]:
counts_only, total_comps_only

(array([ 1,  4, 18, ...,  6, 27,  1]),
 array([ 248096.,  503372., 2114434., ..., 1027519., 3643074.,  111344.]))

In the next part of the report you will place the arrays in a table for ease of viewing. For now, just check that they have the right number of elements.

Recall that there were 44,525 employees and 1,057 positions. 
- Discuss with your group how many elements each of the arrays should have: 44,525 or 1,057.
- Also discuss which function allows you to find the number of elements in an array.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>counts_only_size</tt> to the number of elements in <tt>counts_only</tt>, and <tt>total_comps_only_size</tt> to the number of elements in <tt>total_comps_only</tt>. Check that they agree with what you came up with in your discussion.

In [21]:
# Sizes of the arrays

counts_only_size = len(counts_only)
total_comps_only_size = len(total_comps_only)

counts_only_size, total_comps_only_size

(1057, 1057)

### Array of Average Compensations by Position ###
Now all you need is the average compensation for each position. Recall:

- the method you found in Part 1 for getting an average based on the count and the total
- array operations are carried out "elementwise", that is, by operating on the first element of each array, then on the second element of each array, and so on 

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>average_comps_only</tt> to an array of the average compensation for each job.

In [14]:
# Construct the third data array:
# Average compensation for each job title, using counts_only and total_comps_only

average_comps_only = total_comps_only / counts_only

# This code just rounds each array element to the nearest integer. Don't worry about it.
average_comps_only = np.round(average_comps_only, 0)

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>average_comps_only_size</tt> to the number of elements in <tt>average_comps_only</tt>, and confirm that it is what it should be.

In [22]:
# Size of the array

average_comps_only_size = len(average_comps_only)

average_comps_only_size

1057
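
The elementwise division can be spot-checked by hand. The sketch below repeats it with plain Python lists for the first five positions in alphabetical order, using the counts and grand totals displayed in Part 3. `zip` pairs the first elements, then the second elements, and so on, which mirrors what division does for two NumPy arrays. The list names are illustrative.

```python
# First five positions in alphabetical order (values copied from Part 3).
counts = [1, 4, 18, 25, 58]
totals = [248096, 503372, 2114434, 1689426, 4173356]

# Pair up corresponding elements, divide, and round to whole dollars.
averages = [round(total / count) for total, count in zip(totals, counts)]
print(averages)
```

The results can be compared against the `Average Comp.` values for the same positions in the table constructed in Part 4.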

## Part 4: Constructing the Corrected Table ##

All your hard work is about to pay off. You are now ready to correct the article's Top 10 table.

Run the cell below to assign `new_table` to a table that just has a column of job titles.

In [15]:
# Construct a table containing just one column
# that has all the job titles in alphabetical order.
# The column label should be Job Title as in the Forbes table.

new_table = job_counts.select('Job').relabeled('Job', 'Job Title')
new_table

Job Title
"ACPO,JuvP, Juv Prob (SFERS)"
ASR Operations Supervisor
ASR Senior Office Specialist
ASR-Office Assistant
Account Clerk
Accountant I
Accountant II
Accountant III
Accountant IV
Accountant Intern


- Discuss with your group which `Table` method allows you to grow `new_table` by attaching more columns. Recall how that method works.

<div class="alert alert-info">
    <b> Question: </b>
    Attach three more columns to <tt>new_table</tt>. The columns should have the same labels as in the Forbes table, and the contents should be the three arrays you created in Part 3.

In [16]:
# Attach three more columns

new_table = new_table.with_columns(
    'Average Comp.', average_comps_only,
    'Total Comp.', total_comps_only,
    'Head Count', counts_only
)

# This code makes numbers in Columns 1-3 easier to read. Don't worry about it.
new_table.set_format(make_array(1, 2, 3), NumberFormatter(decimals=0)) 

Job Title,Average Comp.,Total Comp.,Head Count
"ACPO,JuvP, Juv Prob (SFERS)",248096,248096,1
ASR Operations Supervisor,125843,503372,4
ASR Senior Office Specialist,117469,2114434,18
ASR-Office Assistant,67577,1689426,25
Account Clerk,71954,4173356,58
Accountant I,110028,220055,2
Accountant II,111785,6818860,61
Accountant III,143963,18715168,130
Accountant IV,166318,11309612,68
Accountant Intern,70820,2266249,32


You can now use `new_table` to create the accurate table of *Top 10 Positions by Head Count*. The goal here is to create a table that has exactly 10 rows, not just to display the first 10 rows of a bigger table.

- Discuss with your group which of the following `Table` methods you will need: `sort`, `where`, `select`, or `take`. You might need more than one of these methods.

<div class="alert alert-info">
    <b> Question: </b>
    Assign <tt>sorted_new_table</tt> to <tt>new_table</tt> sorted in decreasing order of head count. Then assign <tt>final_table</tt> to a table consisting of just the first 10 rows of <tt>sorted_new_table.</tt>

In [17]:
# The correct "Top 10 Positions by Head Count" table

sorted_new_table = new_table.sort('Head Count', descending=True)
final_table = sorted_new_table.take(np.arange(10))
final_table

Job Title,Average Comp.,Total Comp.,Head Count
Transit Operator,96738,302113995,3123
Special Nurse,48327,80657590,1669
Registered Nurse,187108,275235311,1471
Custodian,80064,73418563,917
Firefighter,175904,158665131,902
Public Service Trainee,14044,11824951,842
Police Officer 3,199183,163529007,821
Recreation Leader,9987,7490583,750
Patient Care Assistant,89108,53465047,600
HSA Sr Eligibility Worker,101085,58426872,578


This is how the *Top 10 Positions by Head Count* table should have appeared. It is your contribution to accuracy in reporting!
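
To see how far the article's figures are from the source data for one position, the sketch below compares the Registered Nurse row of the corrected table with the two Forbes figures used in Part 1. All numbers come from this report; the variable names are illustrative.

```python
head_count = 1471             # Registered Nurses, same in both tables

forbes_total = 298515988      # grand total in the Forbes table
our_total = 275235311         # grand total in final_table

# Average compensation implied by each grand total.
forbes_average = forbes_total / head_count
our_average = our_total / head_count

print(forbes_total - our_total)                   # gap in grand totals
print(round(forbes_average), round(our_average))  # the two averages
```

Both averages are well below the figure of almost $300,000 displayed in the Forbes table.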

## Part 5: Conclusion and Final Recommendations ##

The Forbes article's title and contents indicate that its desired focus is on "highly compensated" employees. You now have clearer evidence to explain the choices made in its "Top 10 Positions by Head Count" table.

Run the cell below to display the top 20 rows of <tt>sorted_new_table</tt>.

In [18]:
sorted_new_table.show(20)

Job Title,Average Comp.,Total Comp.,Head Count
Transit Operator,96738,302113995,3123
Special Nurse,48327,80657590,1669
Registered Nurse,187108,275235311,1471
Custodian,80064,73418563,917
Firefighter,175904,158665131,902
Public Service Trainee,14044,11824951,842
Police Officer 3,199183,163529007,821
Recreation Leader,9987,7490583,750
Patient Care Assistant,89108,53465047,600
HSA Sr Eligibility Worker,101085,58426872,578


The `Average Comp.` column provides insight into which jobs were included and which were excluded in the article's "top 10" list.

<div class="alert alert-info">
    <b> Question: </b>
    What are some numerical reasons why jobs were included in or excluded from the Forbes table?

<div class="alert alert-warning">
    <b> Answer: </b>
    Jobs with the lowest average comps have been replaced with jobs that are less common but have average comps of around $200,000. Maybe custodians were left in the table because readers might already expect that the government of a large city like San Francisco has many custodians.
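
One way to quantify this answer is to pool the five jobs from our top 10 that the Forbes table kept and the five that it dropped, and compare the average compensation of each pool. The head counts and grand totals below are copied from `final_table`; the dictionary names and the helper `pooled_average` are hypothetical, introduced only for this sketch.

```python
# job title -> (head count, grand total compensation), from final_table.
kept_by_forbes = {
    'Transit Operator': (3123, 302113995),
    'Registered Nurse': (1471, 275235311),
    'Custodian': (917, 73418563),
    'Firefighter': (902, 158665131),
    'Police Officer 3': (821, 163529007),
}
dropped_by_forbes = {
    'Special Nurse': (1669, 80657590),
    'Public Service Trainee': (842, 11824951),
    'Recreation Leader': (750, 7490583),
    'Patient Care Assistant': (600, 53465047),
    'HSA Sr Eligibility Worker': (578, 58426872),
}

def pooled_average(jobs):
    """Grand total compensation divided by total head count."""
    heads = sum(count for count, _ in jobs.values())
    dollars = sum(total for _, total in jobs.values())
    return dollars / heads

kept_average = pooled_average(kept_by_forbes)
dropped_average = pooled_average(dropped_by_forbes)
print(round(kept_average), round(dropped_average))
```

The pooled average of the dropped jobs comes out far below that of the kept jobs, which is the pattern described in the answer above.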

The article is its author's attempt to communicate the results of their data analysis to a wide audience. When human beings communicate, what they say is influenced by many different factors, not just the subject matter involved. Because of this, the Society of Professional Journalists has a formal [Code of Ethics](https://www.spj.org/ethicscode.asp).

The author of the Forbes article is listed as a Senior Contributor (next to the author's name, right below the headline). Here is [Wikipedia's description](https://en.wikipedia.org/wiki/Forbes) of the "contributor model" of forbes.com:

"Forbes.com uses a "contributor model" in which a wide network of "contributors" writes and publishes articles directly on the website ... Contributors are paid based on traffic to their respective Forbes.com pages ..."

<div class="alert alert-info">
    <b> Question: </b>
    What are some factors <i>apart from the data</i> that might have motivated the choices made in the article?

<div class="alert alert-warning">
    <b> Answer: </b>
    Sensational articles tend to get traffic, which affects payment. However, this article's author is a successful businessman and payment from forbes.com might not be a principal motivator. Instead, focusing on extreme cases, ignoring that $150,000+ isn't "highly compensated" in SF, and cherry-picking data to embellish the article's point might be more convincing to general readers and get more clicks and shares. This could help propagate the author's political point of view and drive readers to his own organization's websites.

In recent years there has been an explosion of online reporting by authors of all backgrounds. So the lessons learned from your careful analysis of the Forbes article have much wider applicability.

<div class="alert alert-info">
    <b> Question: </b>
    What recommendations do you have for readers of data-intensive online reports?

<div class="alert alert-warning">
    <b> Answer: </b>
    Read widely, try to double-check results using different sources, and consider the motivations of the authors. Remember that all authors, no matter how eminent, have biases, and not everyone who writes online has been trained in journalism or its ethics.

### Congratulations! ###

You have completed your report. 

Keep in mind that [data journalism](https://en.wikipedia.org/wiki/Data_journalism) is a relatively new profession, about 10 years old. After your work on this report, you are already on your way to becoming a great data journalist!