# Visualizing Black Literacy After Emancipation

<div>
<img src="https://github.com/HigherEdData/Du-Bois-STEM/blob/main/readings-images/original-plate-47.jpg?raw=true" width="700" />
</div>

### This interactive exercise is inspired by the annual #DuBoisChallenge

The #DuBoisChallenge is a call to scientists, students, and community members to recreate, adapt, and share on social media the data visualizations created by W.E.B. Du Bois and his collaborators in 1900. Before doing the interactive exercise, please read this article about the Du Bois Challenge: https://nightingaledvs.com/the-dubois-challenge/

### In this interactive exercise, you will:

1. Learn how to create a variation of a **bar graph.**
2. Learn and modify code in the statistical programming language **R**.
3. Learn how to write statistical code to:
    * create visualizations that consistently and accurately represent your data
    * create a transparent record of exactly how you visualized something
    * make it easy for you or others to recreate or modify your visualization
4. Your instructor may also ask you to answer questions and submit screenshots in a parallel **Catcourses** (or other Canvas system) as you go.

### You will learn how to use the *R* statistical programming language by creating two graphs:

1. You will recreate Du Bois' visualization of Black illiteracy rates in the US compared to illiteracy rates in other countries. Du Bois created the visualization in 1900.

2. You will adapt Du Bois' visualization using data on college degree attainment today. This aligns with how Du Bois saw mass education as one important strategy for furthering and deepening emancipation for Black Americans and others.

An important context for Du Bois' graph of Black illiteracy is that literacy was illegal for enslaved people in the U.S. until emancipation and the Confederacy's defeat in the Civil War. Illiteracy then declined rapidly as Black Americans sought to empower themselves through education. Du Bois plotted this decline in illiteracy with the following graph:

<div>
<img src="https://github.com/ajstarks/dubois-data-portraits/blob/master/plate14/original-plate-14.jpg?raw=true" width="700" />
</div>

### 1. How to use this interactive **Jupyter Notebook**

If you know how to use R on your own computer with R Studio, you can copy, paste, and edit code in R Studio to do the exercise. If you have Jupyter Lab, you can also download this Notebook to use it on your own computer. You can download the Notebook or view a non-interactive version of this Notebook by clicking **[here](https://github.com/HigherEdData/Du-Bois-STEM/blob/main/r_literacy_dubois.ipynb)**.

Grey cells in the *Notebook* like the one below are **code cells** where you will write and edit **R** Code. To try it out:

1. Click your cursor on the grey cell below. After you click on it, it will change to white to indicate you are editing it.
2. After you click on the cell below and it turns white, type ```2+2``` to use R as a calculator.
3. After typing ```2+2```, click the <span class="play-button">&#9654;</span> *play button* at the top of this page.

### 2. Keeping track of your work and using R outside of this notebook.

If you leave this Notebook idle or close and re-open it, your work will not be saved in the Notebook. But you can export an HTML file showing your work at any time. You can then open and browse the HTML file in any web browser.

And after completing the Notebook exercises, you can export a final HTML file to submit for any course assignments using this Notebook.

To export the Notebook, click the **File** dropdown above, select **Export File As** and then select **HTML** as shown here:

<img src="attachment:c39a32ef-36ed-4800-812f-6df3ae35593f.png" alt="Screenshot" width="500" />

### 3. Getting hints and answers.

Sometimes, the code cells will already have code in them that you will be asked to edit or run by clicking the play button. In the process, you can click on <span class="play-button">&#9654;</span> dropdown buttons like the ones below to get hints and answers. For example:
1. Click on the cell below with ```3+3=``` and click the play triangle above. You should get an error message highlighted in pink that begins "Error in parse...". To complete this activity, you'll want to get each code cell to run without a pink error message.
2. Based on the ```2+2``` code you tried above, try to edit the ```3+3=``` code to get it to report the sum of 3+3 without the error message. For a hint, click the first <span class="play-button">&#9654;</span> button below.

<details> <summary>Click this triangle for a hint.</summary>

Try deleting the ```=``` sign in the cell and click play again. If that doesn't work, click the next triangle below for the answer.

</details>

<details> <summary>Click this triangle for the answer.</summary>

**Answer:** Delete all the text in the cell below and write or paste this answer in the cell: ```3+3``` before clicking play again.

</details>

In [None]:
3+3 =

### 4. Reading and writing comments that explain your code

In code cells, we can write **comment** text that explains our code. We put a ```# ``` before **comment** text to tell R that the text is not code it should execute. Any text after a ```# ``` on a given line will be treated as a comment. In Jupyter, **comment** text after a ```# ``` will be displayed in a dark turquoise color. To see how this works, try the following below:

1. Try to run the code below. You should get an error message because the comment text ```This is code that adds 2+2``` is not R code and doesn't have a ```# ``` sign in front of it.
2. Add a ```# ``` sign before ```This is code that adds 2+2```. This should change the color of the text to turquoise like the text after the ```2+2``` where there is already a ```# ``` sign.
3. Click the <span class="play-button">&#9654;</span> at the top of the notebook and the code should run and output **4** below.

In [None]:
This is code that adds 2+2

2+2 # the result of 2 +2 should be 4

### 5. Reading Du Bois' data into an R Data Frame

The first step for data visualization in **R** is to **read** data into an R **dataframe**. This is like double clicking a file to open it in other computer programs. But with **R**, we use code.

For this exercise, we're going to read in data from a website. And we're going to place the data into a dataframe named **d_literacy_country**.

The **R** code to do this uses an ```<-``` arrow pointed at the name of the data frame and a *read.csv* **function** command followed by the web address, within parentheses and quotation marks, where a **csv (comma separated values)** data file is located. It looks like this:

```d_literacy_country <- read.csv("web_address_with_data/data_file_name.csv")```

After writing this code, we can write the name of the **data frame** ```d_literacy_country``` again on a separate line. This will list all of the data in the data frame.
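If you would like to see how ```read.csv``` behaves before loading Du Bois' data, here is a minimal sketch. It assumes a tiny made-up dataset typed directly into the code with the ```text =``` argument instead of a web address. The dataframe name ```d_example``` and its ```snack``` and ```count``` columns are hypothetical and are not part of the exercise.

```r
# A tiny made-up CSV typed in as text (hypothetical data, not Du Bois' data)
csv_text <- "snack,count
apples,4
pears,9
plums,6"

# read.csv() turns the comma separated values into a data frame
d_example <- read.csv(text = csv_text)

# Writing the data frame's name on its own line lists all of its data
d_example
```

The first row of the text becomes the column names, and each later row becomes one row of the dataframe.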

To do this yourself, replace the ```____``` portion of the code below to add the ```read``` function and then press the <span class="play-button">&#9654;</span> above.

In [None]:
d_literacy_country <- _____.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

d_literacy_country

### 6. Creating a Bar Graph

After successfully listing the data above, you should be able to see that it has data in two columns. Each column is a **variable**:
* **country** is a country name for 10 countries with Black people in the U.S. treated as a country.
* **illiteracy** contains the percent of people in each country who are illiterate.

As a first step, we will create a **bar graph** of the data using the shortest code possible. The code will:
1. Repeat the code that we wrote above to **read in** the Du Bois illiteracy data.
2. Add a line of code to load the **ggplot2** package of functions that we will use to generate our graph: ```library(ggplot2)```
3. Add a line of code to tell Jupyter and R that we want the width and height of the graph to have the same ratio that Du Bois used, 22 inches wide by 28 inches tall, with each divided by 2 so that it doesn't display too big: ```options(repr.plot.width=22/2, repr.plot.height=28/2)```
4. Finally, we add a **ggplot** function followed by an open parenthesis to tell R that we will plot data from the ```d_literacy_country``` data frame with an "aesthetic mapping" **aes** specification that maps one column of data on the **x axis** and another column of data on the **y axis**.
5. After the close parentheses that tell ggplot we want to plot ```d_literacy_country``` data with one variable on the x axis and another on the y axis, we add a + and then a new line of code ```geom_col()``` that tells ggplot we want a bar graph whose bar lengths are the values already in the dataframe.
6. After looking at Du Bois' version of the graph above, replace the ```_____``` characters in the code cell below to plot the correct variable on the x-axis and the correct variable on the y-axis.
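To see how the pieces in the steps above fit together, here is a minimal sketch using a small made-up dataframe. The names ```d_snacks```, ```snack```, and ```count``` are hypothetical, and the sketch assumes the **ggplot2** package is installed.

```r
library(ggplot2)

# A small made-up data frame (hypothetical, not Du Bois' data)
d_snacks <- data.frame(
    snack = c("apples", "pears", "plums"),
    count = c(4, 9, 6)
)

# Map one column to the x axis and another to the y axis,
# then add geom_col() to draw one bar per row
example_plot <- ggplot(d_snacks, aes(
    x = count,
    y = snack)) +
    geom_col()

print(example_plot)
```

Whichever column is mapped to ```x``` sets the direction the bars run in, so swapping the two mappings turns horizontal bars into vertical ones.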

<details> <summary><strong>Hint:</strong></summary>

Du Bois plotted ```illiteracy``` on the x-axis and ```country``` name on the y-axis.

</details>

<details> <summary><strong>Answer:</strong></summary>

The 4th line of code from the bottom should be: ```x = illiteracy,```  

The 3rd line of code from the bottom should be: ```y = country)) + ```

</details>

In [None]:
# Read in the data from a CSV file into a dataframe called d_literacy_country
d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

# Load the ggplot2 package for creating visualizations
library(ggplot2)

# Set the dimensions of the plot output (width and height in inches)
options(repr.plot.width = 22/2, repr.plot.height = 28/2)

# Source a custom ggplot theme from a remote URL to style the plot (theme_dubois)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

# Begin the ggplot call with the data and aesthetic mappings
ggplot(d_literacy_country, aes(
    x = _________,  # replace ________ with the variable name that Du Bois plots on the x-axis
    y = _________)) + # replace ________ with the variable name that Du Bois plots on the y-axis
    # graph a bar chart based on the x and y mappings above
    geom_col() 

### 7. Ordering the Bars and Making the Bar for Black Americans a Different Color

In the bar graph you created above, can you tell what order the bars for each country are sorted by?

Du Bois sorts the bar for each country by its illiteracy rate from highest to lowest.

Du Bois also graphs the illiteracy rate for Black Americans in a different color to make it easier to compare to other countries.

To make these changes in your own graph, edit the code below to:
1. Fill in the blank in the ```y = reorder(country, _________),``` line of code with the variable name for the illiteracy rate. This will change the sorting of the bars.
2. Fill in the blank in the ```fill = country == "__________"``` line of code with the "Negroes, U.S.A." text that Du Bois uses for the Black U.S. resident category.
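The two blanks above rely on two small pieces of R: ```reorder()``` and the ```==``` comparison. Here is a minimal sketch of each, using made-up values (the ```snack``` and ```count``` names are hypothetical, not from the exercise data).

```r
snack <- c("apples", "pears", "plums")
count <- c(4, 9, 6)

# reorder() turns snack into a factor whose levels are sorted by count,
# from the smallest value to the largest
snack_sorted <- reorder(snack, count)
levels(snack_sorted)

# == checks each value and returns TRUE or FALSE for each one
is_pears <- snack == "pears"
is_pears
```

ggplot draws the first factor level at the bottom of the y axis, so sorting from smallest to largest puts the largest bar at the top. The TRUE and FALSE values are what ggplot uses to give one bar a different fill color.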

<details> <summary><strong>Hints:</strong></summary>

```illiteracy``` is the name of the illiteracy variable for the reorder code.
    
Make sure you have quotation marks when you write "Negroes, U.S.A." on the fill line of code. 

</details>

<details> <summary><strong>Answer:</strong></summary>

Use copy and paste to change your bottom 4 lines of code to be:

```r
y = reorder(country, illiteracy),
# Then, you can tell R which country to fill with a different color by specifying the country name
fill = country == "Negroes, U.S.A."
)) +
geom_col()
```

</details>

In [None]:
d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")


ggplot(d_literacy_country, aes(
    x = illiteracy,
# above is code you've already written for problems 1 through 6 above
    
# 1. fill in _______ with the correct variable name below to reorder by country 
    y = reorder(country, _________),
# 2. tell R which country to fill with a different color by filling in ________ with the country name
    fill = country == "__________"
)) +
    geom_col()

### 8. Edit the bar width and use a **"Du Bois Theme"** to make colors and text similar to Du Bois' style

Your code edits from #7 above added colors to the bars, but not the same colors used by Du Bois.

And the bar width, background color, legend, and other default ggplot elements are different than those employed by Du Bois.

Edit the code below to:
1. Fill in the blank where we added a width specification to the ```geom_col()``` bar graph command. Try .1 and .9. Then see if some value in between gives you a bar width close to Du Bois'.
2. Add a plus sign after the ```geom_col(width = ____)``` line of code to tell R you are adding another line of code at the end of your R code cell.
3. Add the code ```theme_dubois()``` on a new line after your plus sign. This will apply several Du Bois style options like an "antique white" background color and minimal text in labeling axes. You can see all of these R theme options in the "theme_dubois.R" theme file here: https://github.com/HigherEdData/Du-Bois-STEM/blob/main/theme_dubois.R

<details> <summary><strong>Hints:</strong></summary>

```geom_col(width = .1) ``` will create a very thin line. Try a higher width amount to get a bar closer to Du Bois.
    
The Du Bois theme command is ```theme_dubois()```

</details>

<details> <summary><strong>Answer:</strong></summary>

Use copy and paste to change the bottom 2 lines of code to be:

```R
    geom_col(width = .5) +
    theme_dubois()
```

</details>

In [None]:
d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

ggplot(d_literacy_country, aes(
    x = illiteracy,
    y = reorder(country, illiteracy),
    fill = country == "Negroes, U.S.A."
)) +
# above is code you've already written for problems 1 through 7 above

# for this problem, do the following:
# 1. fill in the ____ below to adjust the bar widths to be more similar to Du Bois'. 
    geom_col(width = ____) 
# 2. add a plus sign above this comment to add a new line of code.
# Then write the code to add the Du Bois theme on a new line.
    

### 9. Change the Text Font and the Bar Color

Adding the Du Bois theme in problem 8 changed the background color, but it didn't change the bar colors. To change the bar colors:

1. fill in the blanks for the scale_fill_manual function of ggplot below to set the colors of the bars as **red** or **darkgreen** based on whether it is true or false that a given country category is "Negroes, U.S.A.".
2. fill the blank of the element_text function with the fontname **serif** to change the Du Bois theme's font for the graph to a **serif** font.
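The ```values``` part of ```scale_fill_manual``` is a named vector: each name is a value the fill statement can produce, and each entry is the color to use for it. Here is a minimal base R sketch of that kind of lookup, using placeholder colors and a hypothetical ```is_highlighted``` vector rather than the exercise data.

```r
# Named vector: the names are the possible fill values, the entries are colors
# (placeholder colors, not the ones Du Bois used)
example_colors <- c("TRUE" = "orange", "FALSE" = "grey")

# A made-up TRUE/FALSE result like the one a fill statement produces
is_highlighted <- c(FALSE, TRUE, FALSE)

# Look up the color for each value by its name
bar_colors <- unname(example_colors[as.character(is_highlighted)])
bar_colors
```

This only illustrates how names are matched to colors. Inside a graph, ```scale_fill_manual``` does the matching for you.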

<details> <summary><strong>Hints:</strong></summary>

In Du Bois' graph, the bar is **red** when it is TRUE that the country name is **Negroes, U.S.A**. All of the bars are **darkgreen** when this is FALSE.

</details>

<details> <summary><strong>Answer:</strong></summary>

Use copy and paste to change the bottom 2 lines of code to be:

```R
    scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "darkgreen")) +
# 2. change the theme's font by filling in the blank
    theme(text = element_text('serif')) 
```

</details>

In [None]:
d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

ggplot(d_literacy_country, aes(
    x = illiteracy,
    y = reorder(country, illiteracy),
    fill = country == "Negroes, U.S.A." # this is the fill statement
)) +
    geom_col(width = .5) +
    theme_dubois() +
# above is code you've already written for problems 1 through 8 above

# 1. fill in the blanks for which color the bar should be when the fill statement is TRUE or FALSE
    scale_fill_manual(values = c("TRUE" = "___", "FALSE" = "_______")) +
# 2. change the theme's font by filling in the blank
    theme(text = element_text('______')) 

### 10. Add the Titles and Subtitles with Your Own Name

To add titles and subtitles to the graph, we use the **labs** function for ggplot (short for labels). We use the title and subtitle specifications with **labs**.

The title text needs to be enclosed in quotation marks. We use the code ```\n``` to tell R to put a "new line" break at different places in the title based on Du Bois' titling.
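To see what ```\n``` does on its own, here is a minimal sketch using a made-up piece of text (the ```example_title``` name is hypothetical).

```r
# A made-up title with one "new line" break in the middle
example_title <- "First line of a title\nSecond line of a title"

# cat() prints the text the way it will be displayed, with the line break
cat(example_title)

# Splitting the text at each \n shows that it holds two separate lines
title_lines <- strsplit(example_title, "\n", fixed = TRUE)[[1]]
length(title_lines)
```

Putting two ```\n``` codes in a row leaves a blank line between pieces of text.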

Fill in the blank with your name in the title code below to show that the graph was recreated by you!

<details> <summary><strong>Hints:</strong></summary>

Make sure that you keep the titles within quotation marks. The text should display in <span style="color: red;">red</span> when the code is within quotation marks.

</details>

In [None]:
d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

ggplot(d_literacy_country, aes(
    x = illiteracy,
    y = reorder(country, illiteracy),
    fill = country == "Negroes, U.S.A." # this is the fill statement
)) +
    geom_col(width = .5) +
    theme_dubois() +
    scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "darkgreen")) +
    theme(text = element_text('serif')) +
# above is code you've already written for problems 1 through 9 above

# Fill in the blank below with your name to show that the graph was recreated by you!
    labs(
        title = "\nIlliteracy of the American Negroes compared with that of other nations.\n",
        subtitle = "Proportion d' illettrés parmi les Nègres Americains comparée à celle des autres nations.\n\n
        Done by Atlanta University.\n\n
        Recreated by ________\n\n"
    )

### 11. Change the Data to Read in and Display College Degree Holding By Country

Now that you've written code to graph Du Bois' literacy data, you can use that same code to make bar graphs of other data in the same style.

To see how this works, fill in the blank below to read in our **d_college_country.csv** dataset of college degree attainment rates today by country.

Then the line of code **d_college_country** will display all of the country names and college attainment rates in the **d_college_country** dataframe.

We obtained this data for the same countries for which Du Bois graphed illiteracy in 1900.

We obtained the country level data from the most recent data reported by the OECD here: https://www.oecd.org/en/topics/sub-issues/education-attainment.html

We obtained the Black college attainment rate data for the U.S. from: https://www.luminafoundation.org/stronger-nation/report/#/progress/racial_equity

<details> <summary><strong>Hints:</strong></summary>

Make sure that you include our **d_** prefix in the filename and the dataframe name along with .csv at the end of the file name.

</details>

<details> <summary><strong>Answer:</strong></summary>

The first line of code should be:
    
```d_college_country<- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_college_country.csv") ```

</details>

In [None]:
_____________<- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_college_country.csv")

d_college_country

### 12. Edit the Code to Graph the College Attainment Data

After reading in the **d_college_country.csv** data, you can edit the graph code you used for the literacy data to graph the college data.

Fill in the blanks below to:

1. Change the name of the data frame you are graphing with ggplot to the **d_college_country** data frame.
2. Change the x variable you are graphing from illiteracy to the **college** variable.
3. Change the subtitle of the graph to be a translation of the title to the language of your choice. Du Bois translated his graph title to French for his 1900 Paris Exposition audience in France.
4. Add your own name for the **Adapted by** line.

<details> <summary><strong>Hints:</strong></summary>

The new dataframe name is ```d_college_country```.
    
The x variable name for college attainment is ```college```.

</details>

<details> <summary><strong>Answer:</strong></summary>

The code in the middle of the code cell should be:

```R
    ggplot(d_college_country, aes(
# 2. fill in the blank to graph the college variable
    x = college,
```

</details>

In [None]:
d_college_country<- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_college_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")
# above is code you learned in Problem 11 for setting up R with the college data

# 1. fill in the blank to change the dataframe you are plotting
ggplot(____________, aes(
# 2. fill in the blank to graph the college variable
    x = _________,
    y = reorder(country, college),
    fill = country == "Black U.S. Residents" # this is the fill statement
)) +
    geom_col(width = .5) +
    theme_dubois() +
    theme(text = element_text('serif')) +
    scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "darkgreen")) +
    labs(
        title = "\nCollege attainment by Black U.S. residents compared with that of other nations.\n",
# 3. fill in the blank below to translate the title into the language of your choice
        subtitle = "_____________________\n\n
        Adapted by ________ from Du Bois' graph of literacy in 1900.\n\n"
# 4. Fill in the blank above to show the graph is adapted by you!
    )

### 13. Edit the Code to Add X Axis Grid Lines and a Style Change of Your Choice!

Some of Du Bois' style choices might not make sense for graphs you want to make. For example, Du Bois doesn't provide labels or grid lines to make it easy to understand what the exact college attainment rate is for each country.

You might also want to embed a photo in the graph or use different colors or sizing. Below we've added code to add grid lines with labels.
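The grid line code in the cell below relies on two base R functions: ```seq()``` to choose where the tick marks go and ```paste0()``` to build the labels. Here is a minimal sketch of what each returns on its own (the variable names are hypothetical).

```r
# Tick positions from 0 to 60 in steps of 10
tick_positions <- seq(0, 60, by = 10)
tick_positions

# paste0() glues a "%" onto each number to make the axis labels
tick_labels <- paste0(tick_positions, "%")
tick_labels
```

Changing the ```by``` value changes how far apart the grid lines are.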

Use a Google search or ask ChatGPT to help you figure out how to add at least one line of R code below that changes the graph in any way you like.

For a Google search, you could write something like, **how to change font color in ggplot to pink?**

For a ChatGPT query, you could copy and paste the code from below and ask, **how could I change this R ggplot code to change the font color to pink?**

In [None]:
d_college_country<- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_college_country.csv")
library(ggplot2)
options(repr.plot.width=22/2, repr.plot.height=28/2)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

ggplot(d_college_country, aes(
    x = college,
    y = reorder(country, college),
    fill = country == "Black U.S. Residents" # this is the fill statement
)) +
    geom_col(width = .5) +
    theme_dubois() +
    theme(text = element_text('serif')) +
    scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "darkgreen")) +
    labs(
        title = "\nCollege attainment by Black U.S. residents compared with that of other nations.\n",
        subtitle = "_____________________\n\n
        Adapted by ________ from Du Bois' graph of literacy in 1900.\n\n"
    ) +
# Above is code you wrote for Problem 12. Below is code to add grid lines with labels
    scale_x_continuous(
        breaks = seq(0, 60, by = 10),  # Set tick positions every 10 units
        labels = function(x) paste0(x, "%")  # Add a "%" symbol to each label
    ) +
    theme(
        axis.text.x = element_text(size = 12),
        panel.grid.major.x = element_line(color = "lightgray")
        ) +
# Add your own line of code below to make your own style change to the graph

### 14. Export a final HTML file of your Notebook

Now that you're done, remember to export a final HTML file showing your work and displaying your name within the visualizations you created.

As noted above, you can export the Notebook by clicking the **File** dropdown above, selecting **Export File As** and then selecting **HTML** as shown here:

<img src="attachment:c39a32ef-36ed-4800-812f-6df3ae35593f.png" alt="Screenshot" width="500" />