## Exploring PEN America's School Book Bans Dataset

PEN America releases an annual index of book banning incidents in schools across the United States. Indices are available for three academic years: 2021-2022, 2022-2023 and 2023-2024. The dataset records the titles, authors and series banned, geographical details, and the date and origin of each challenge. A book "challenge" occurs when someone, such as a parent, lawmaker or community member, complains about a book's content and deems it unfit. In response, the book is investigated and taken off the shelf for the duration of the investigation. The result of the investigation determines whether the book is returned to the shelf or banned in libraries, in classrooms or in both.

Banning books is often seen as an act of censorship meant to diminish the intellectual freedom of citizens. While one side of the argument involves shielding young minds from 'inappropriate' or 'sexually explicit' content, censorship often results in taboos, stigma or shame being attached to certain topics in society. Additionally, as society becomes more liberal and inclusive, the definition of what is 'inappropriate' or 'sexually explicit' remains in flux. Books exploring LGBTQ+ love, one's own sexuality or identity, and difficult narratives on race are often deemed inappropriate wholesale and banned. Instead of stigmatizing these topics, real efforts should be made to encourage discussion and debate about the complex realities we live in, especially within the safety of our educational systems.

This notebook is an exploration of the Index of banned books, attempting to parse out authors that have faced heavy censorship, states that lead in such challenges and any trends in the number and types of books being banned. 

Visuals were made using R (ggplot), Adobe Illustrator, Rawgraphs and Datawrapper.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [37]:
#load the individual files downloaded from PEN America to concatenate them

bb_2021 = pd.read_csv("data/bb_2021-2022.csv")

bb_2022 = pd.read_csv("data/bb_2022-2023.csv")
bb_2022.drop(columns = ["Series Name"], inplace = True)

bb_2023 = pd.read_csv("data/bb_2023-2024.csv")
bb_2023.drop(columns = ["Series Name"], inplace = True)

bb_2023.head()

Unnamed: 0,Title,Author,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Ban Status,Origin of Challenge
0,Next Summer,"Abbott, Hailey",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration
1,Summer Girls,"Abbott, Hailey",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration
2,In the Belly of the Beast: Letters From Prison,"Abbott, Jack Henry",,,,Florida,Orange County Public Schools,June 2024,Banned by restriction,Administration
3,The End of Everything,"Abbott, Megan",,,,Florida,Lee County Schools,June 2024,Banned by restriction,Administration
4,The Summer of Owen Todd,"Abbott, Tony",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration


# Organizing Data 

In [38]:
# I want a column for year, but academic year is the most accurate bucket.
# For books that don't have a precise date, it's hard to put them in a single calendar year.

# assigning a scalar fills every row of the new column with the same value
bb_2021["Academic Year"] = "2021-2022"
bb_2022["Academic Year"] = "2022-2023"
bb_2023["Academic Year"] = "2023-2024"

bb_2023.head()

Unnamed: 0,Title,Author,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Ban Status,Origin of Challenge,Academic Year
0,Next Summer,"Abbott, Hailey",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration,2023-2024
1,Summer Girls,"Abbott, Hailey",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration,2023-2024
2,In the Belly of the Beast: Letters From Prison,"Abbott, Jack Henry",,,,Florida,Orange County Public Schools,June 2024,Banned by restriction,Administration,2023-2024
3,The End of Everything,"Abbott, Megan",,,,Florida,Lee County Schools,June 2024,Banned by restriction,Administration,2023-2024
4,The Summer of Owen Todd,"Abbott, Tony",,,,Florida,Escambia County Public Schools,August 2023,Banned pending investigation,Administration,2023-2024
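
The point above about date buckets can be illustrated with a small sketch that is not part of the original analysis. It assumes dates are written like "August 2023" and that an academic year runs from July through June; that cutoff is my assumption and should be checked against PEN America's methodology. The value "Fall 2022" is a made-up example of an imprecise date.

```python
import pandas as pd

# Two dates taken from the preview above, plus one hypothetical imprecise value
dates = pd.Series(["August 2023", "June 2024", "Fall 2022"])

# Values that do not match "Month Year" become NaT instead of raising an error
parsed = pd.to_datetime(dates, format="%B %Y", errors="coerce")


def academic_year(ts):
    """Map a timestamp to an academic-year label (assumed July-June cutoff)."""
    if pd.isna(ts):
        return None  # no reliable bucket for imprecise dates
    start = ts.year if ts.month >= 7 else ts.year - 1
    return f"{start}-{start + 1}"


buckets = [academic_year(ts) for ts in parsed]
print(buckets)
```

Rows that come back as `None` are exactly the ones that cannot be bucketed from the date alone, which is why tagging each file with its academic year is the safer choice.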


In [39]:
# concatenating to create one big banned books df

banned_books_df = pd.concat([bb_2021, bb_2022, bb_2023], ignore_index=True)
banned_books_df.head()

Unnamed: 0,Author,Title,Ban Status,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Origin of Challenge,Academic Year
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,Banned in Libraries and Classrooms,,,,Florida,Indian River County School District,November 2021,Administrator,2021-2022
1,"Acevedo, Elizabeth",Clap When You Land,Banned in Classrooms,,,,Pennsylvania,Central York School District,August 2021,Administrator,2021-2022
2,"Acevedo, Elizabeth",The Poet X,Banned in Libraries,,,,Florida,Indian River County School District,November 2021,Administrator,2021-2022
3,"Acevedo, Elizabeth",The Poet X,Banned in Libraries and Classrooms,,,,New York,Marlboro Central School District,February 2022,Administrator,2021-2022
4,"Acevedo, Elizabeth",The Poet X,Banned Pending Investigation,,,,Texas,Fredericksburg Independent School District,March 2022,Administrator,2021-2022


In [40]:
#Check to make sure they combine to give the right number of rows
#len(bb_2021) + len(bb_2022) + len(bb_2023)

len(banned_books_df)

15940
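
The row-count check can also be written as an assertion so that it fails loudly if a file is dropped or duplicated. This is a minimal sketch on toy frames (hypothetical data, not the real index); the variable names are placeholders.

```python
import pandas as pd

# Toy stand-ins for the three yearly files
year_a = pd.DataFrame({"Title": ["A", "B"], "State": ["Texas", "Florida"]})
year_b = pd.DataFrame({"Title": ["C"], "State": ["Iowa"]})
year_c = pd.DataFrame({"Title": ["A", "D", "E"], "State": ["Utah", "Iowa", "Texas"]})

frames = [year_a, year_b, year_c]
combined = pd.concat(frames, ignore_index=True)

# The combined frame should have exactly as many rows as the inputs together
expected_rows = sum(len(frame) for frame in frames)
assert len(combined) == expected_rows, "row count changed during concatenation"
print(len(combined), expected_rows)
```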

In [41]:
banned_books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15940 entries, 0 to 15939
Data columns (total 11 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   Author                     15939 non-null  object
 1   Title                      15939 non-null  object
 2   Ban Status                 15940 non-null  object
 3   Secondary Author(s)        794 non-null    object
 4   Illustrator(s)             1307 non-null   object
 5   Translator(s)              117 non-null    object
 6   State                      15940 non-null  object
 7   District                   15940 non-null  object
 8   Date of Challenge/Removal  15940 non-null  object
 9   Origin of Challenge        15940 non-null  object
 10  Academic Year              15940 non-null  object
dtypes: object(11)
memory usage: 1.3+ MB


In [42]:
# which author and title values are missing?

banned_books_df[banned_books_df["Author"].isnull()]

Unnamed: 0,Author,Title,Ban Status,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Origin of Challenge,Academic Year
3130,,,Banned from Libraries,,,,Texas,Frisco Independent School District,October 2022,Administration,2022-2023


In [43]:
# drop this, there is no way to find out what the book is 

banned_books_df = banned_books_df.dropna(subset=['Author', 'Title'])
banned_books_df.info()

In [103]:
banned_books_df["Ban Status"].value_counts()

Ban Status
Banned Pending Investigation            7245
Banned                                  4295
Banned from Libraries and Classrooms    1596
Banned by restriction                   1347
Banned from Libraries                    940
Banned from Classrooms                   516
Name: count, dtype: int64

In [102]:
# normalising Ban Status labels so the same status is spelled the same way in every academic year

banned_books_df["Ban Status"] = banned_books_df["Ban Status"].replace({
    "Banned pending investigation": "Banned Pending Investigation",
    "Banned in Libraries": "Banned from Libraries",
    "Banned in Libraries and Classrooms": "Banned from Libraries and Classrooms",
    "Banned in Classrooms": "Banned from Classrooms"
})
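
To see what the mapping does, here is a small sketch on a toy Series (the four values are examples of the spellings seen in the data, not real counts). Labels that are not keys in the dictionary pass through unchanged.

```python
import pandas as pd

statuses = pd.Series([
    "Banned pending investigation",
    "Banned in Libraries",
    "Banned from Libraries",
    "Banned in Classrooms",
])

label_map = {
    "Banned pending investigation": "Banned Pending Investigation",
    "Banned in Libraries": "Banned from Libraries",
    "Banned in Classrooms": "Banned from Classrooms",
}

# replace() swaps exact matches and leaves every other value as it is
normalised = statuses.replace(label_map)
counts = normalised.value_counts()
print(counts)
```

After normalising, the two spellings of the library ban collapse into a single category, so counts per status are comparable across years.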

In [47]:
#save this into a csv

banned_books_df.to_csv("data/all_banned_books.csv", index=False)

# Data Analysis

### Each row is one instance of a ban. If 10 states ban the same book, it is counted as 10 bans of 1 unique title.

Each row is better understood as one time a book was challenged and removed from shelves. PEN America does not track whether books are ultimately returned to shelves.

Removing a book while it is under review runs counter to procedural best practices from the National Coalition Against Censorship (NCAC) and the American Library Association (ALA), which state that a book should remain in circulation while undergoing a reconsideration process.

"Banned by restriction" means that access to the book is restricted, for example by requiring parental permission or limiting it to certain grade levels.
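
The difference between ban instances and unique titles can be sketched with four rows taken from the preview of the combined data (title and state only). This is an illustration, not a result from the full dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "Title": ["Ace of Spades", "The Poet X", "The Poet X", "The Poet X"],
    "State": ["Florida", "Florida", "New York", "Texas"],
})

ban_instances = len(toy)                # every row is one ban
unique_titles = toy["Title"].nunique()  # distinct books, however often banned

print(ban_instances, unique_titles)
```

Here there are four ban instances but only two unique titles, because The Poet X was banned in three different places.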

In [2]:
banned_books_df = pd.read_csv("data/all_banned_books.csv")
banned_books_df.head()

Unnamed: 0,Author,Title,Ban Status,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Origin of Challenge,Academic Year
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,Banned in Libraries and Classrooms,,,,Florida,Indian River County School District,November 2021,Administrator,2021-2022
1,"Acevedo, Elizabeth",Clap When You Land,Banned in Classrooms,,,,Pennsylvania,Central York School District,August 2021,Administrator,2021-2022
2,"Acevedo, Elizabeth",The Poet X,Banned in Libraries,,,,Florida,Indian River County School District,November 2021,Administrator,2021-2022
3,"Acevedo, Elizabeth",The Poet X,Banned in Libraries and Classrooms,,,,New York,Marlboro Central School District,February 2022,Administrator,2021-2022
4,"Acevedo, Elizabeth",The Poet X,Banned Pending Investigation,,,,Texas,Fredericksburg Independent School District,March 2022,Administrator,2021-2022


In the 2023-2024 academic year, there were 10,046 instances of book bans across the USA (this does not mean 10,046 unique books were banned).

In [3]:
banned_books_df["Academic Year"].value_counts()

Academic Year
2023-2024    10046
2022-2023     3361
2021-2022     2532
Name: count, dtype: int64

6,325 unique titles were banned in the USA between 2021 and 2024.

In [4]:
# Number of unique titles banned in the US

len(banned_books_df["Title"].value_counts())

6325

In [5]:
# unique titles in 2021-2022
len(banned_books_df[banned_books_df["Academic Year"] == "2021-2022"]["Title"].value_counts())

1648

In [6]:
# unique titles in 2022-2023
len(banned_books_df[banned_books_df["Academic Year"] == "2022-2023"]["Title"].value_counts())

1556

In [7]:
# unique titles in 2023-2024

len(banned_books_df[banned_books_df["Academic Year"] == "2023-2024"]["Title"].value_counts())

4240

In [8]:
list(banned_books_df[banned_books_df["Academic Year"] == "2021-2022"]["Title"].value_counts().reset_index()["Title"])

['Gender Queer: A Memoir',
 "All Boys Aren't Blue",
 'Out of Darkness',
 'The Bluest Eye',
 'Lawn Boy',
 'The Hate U Give',
 'The Absolutely True Diary of a Part-Time Indian',
 'Me and Earl and the Dying Girl',
 'The Kite Runner',
 'Crank (Crank Series)',
 'Thirteen Reasons Why',
 'l8r, g8r',
 'Beloved',
 'Drama: A Graphic Novel',
 'Melissa (George)',
 'Looking for Alaska',
 'This Book Is Gay',
 'Beyond Magenta: Transgender Teens Speak Out',
 'This One Summer',
 'Jack of Hearts (and other parts)',
 'Flamer',
 'All American Boys',
 'Fun Home: A Family Tragicomic',
 'The Breakaways',
 "The Handmaid's Tale",
 'Nineteen Minutes',
 'The Perks of Being a Wallflower',
 'Tricks (Tricks Series)',
 'More Happy Than Not',
 'Dear Martin',
 'Extremely Loud & Incredibly Close',
 'A Court of Mist and Fury (A Court of Thorns and Roses Series)',
 'The 57 Bus: A True Story of Two Teenagers and the Crime That Changed Their Lives',
 "It's Perfectly Normal: Changing Bodies, Growing Up, Sex, and Sexual Heal

In [10]:
# did some books get unbanned, since the number of unique titles goes down from 2021-2022 to 2022-2023?
# Do banned books remain banned forever?


unique_2021 = list(banned_books_df[banned_books_df["Academic Year"] == "2021-2022"]["Title"].value_counts().reset_index()["Title"])

unique_2022 = list(banned_books_df[banned_books_df["Academic Year"] == "2022-2023"]["Title"].value_counts().reset_index()["Title"])

In [11]:
# unique titles banned in 2022-2023 but not in 2021-2022
len([book for book in unique_2022 if book not in unique_2021])

1257

In [12]:
# 1349 unique titles were banned in 2021-2022 but not in 2022-2023
len([book for book in unique_2021 if book not in unique_2022])

1349

In [13]:
# which books got unbanned (if any)?

maybe_unbanned = [book for book in unique_2021 if book not in unique_2022]

banned_books_df[(banned_books_df["Title"].isin(maybe_unbanned)) & 
                (banned_books_df["Academic Year"] == "2021-2022")]["Ban Status"].value_counts()

Ban Status
Banned Pending Investigation          853
Banned in Classrooms                  438
Banned in Libraries                   178
Banned in Libraries and Classrooms    117
Name: count, dtype: int64

Most of these were banned pending investigation, so they may have been returned to shelves, but the data does not say. The main point is that access to these books was restricted for long periods.
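
The year-over-year comparison above can also be expressed with sets, which make the three cases (only in the first year, only in the second, in both) explicit and avoid repeated list scans. This is a sketch on toy data with placeholder titles, not a re-run of the real counts.

```python
import pandas as pd

toy = pd.DataFrame({
    "Title": ["A", "B", "B", "C"],
    "Academic Year": ["2021-2022", "2021-2022", "2022-2023", "2022-2023"],
})

titles_2021 = set(toy.loc[toy["Academic Year"] == "2021-2022", "Title"])
titles_2022 = set(toy.loc[toy["Academic Year"] == "2022-2023", "Title"])

only_2021 = titles_2021 - titles_2022    # banned in the first year only
only_2022 = titles_2022 - titles_2021    # newly banned in the second year
in_both = titles_2021 & titles_2022      # banned in both years

print(only_2021, only_2022, in_both)
```

A title appearing only in the first year means no new ban was recorded for it later; it does not by itself show that the book went back on shelves.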

In [14]:
# getting data in the right format for visualization

# hierarchy of number of titles per ban status, per academic year 
banned_books_df.groupby(["Ban Status", "Academic Year"])["Title"].count().reset_index().sort_values("Academic Year")#.to_clipboard()

Unnamed: 0,Ban Status,Academic Year,Title
1,Banned Pending Investigation,2021-2022,1375
7,Banned in Classrooms,2021-2022,487
8,Banned in Libraries,2021-2022,337
9,Banned in Libraries and Classrooms,2021-2022,333
2,Banned Pending Investigation,2022-2023,1466
4,Banned from Classrooms,2022-2023,29
5,Banned from Libraries,2022-2023,603
6,Banned from Libraries and Classrooms,2022-2023,1263
0,Banned,2023-2024,4295
3,Banned by restriction,2023-2024,1347


In [15]:
# checking to see if it grouped how I want it to
banned_books_df[banned_books_df["Academic Year"] == "2021-2022"]["Ban Status"].value_counts()

Ban Status
Banned Pending Investigation          1375
Banned in Classrooms                   487
Banned in Libraries                    337
Banned in Libraries and Classrooms     333
Name: count, dtype: int64

In [16]:
# hierarchy of titles per ban status, per state, per academic year (think a sunburst chart)
banned_books_df.groupby(["Ban Status", "State", 
                         "Academic Year"])["Title"].count().reset_index().sort_values("Academic Year")

Unnamed: 0,Ban Status,State,Academic Year,Title
165,Banned in Libraries,Wisconsin,2021-2022,17
162,Banned in Libraries,Texas,2021-2022,163
163,Banned in Libraries,Utah,2021-2022,2
164,Banned in Libraries,Virginia,2021-2022,1
158,Banned in Libraries,Oklahoma,2021-2022,9
...,...,...,...,...
67,Banned by restriction,Illinois,2023-2024,1
66,Banned by restriction,Idaho,2023-2024,1
65,Banned by restriction,Florida,2023-2024,988
73,Banned by restriction,Minnesota,2023-2024,1


# Focusing on 2023-2024

In [14]:
banned_books_df_2024 = banned_books_df[banned_books_df["Academic Year"] == "2023-2024"]

In [39]:
top5_states = banned_books_df_2024["State"].value_counts().head(5).reset_index()["State"].to_list()
top5_states

['Florida', 'Iowa', 'Texas', 'Wisconsin', 'Virginia']

In [40]:
#filtering for top 5 states only

In [44]:
# hierarchy of titles per ban status, per state, for 2023-2024 only (think a sunburst chart)
banned_books_df_2024[banned_books_df_2024["State"].isin(top5_states)].groupby(["Ban Status", "State"])["Title"].count().reset_index().sort_values("Title", ascending = False).to_clipboard()

## Quick break to look at which states are not banning books

In [290]:
banned_books_df[banned_books_df["State"] == "California"]

Unnamed: 0,Author,Title,Ban Status,Secondary Author(s),Illustrator(s),Translator(s),State,District,Date of Challenge/Removal,Origin of Challenge,Academic Year
3184,"Dawson, Juno",This Book Is Gay,Banned from Libraries and Classrooms,,,,California,William S. Hart Union High School District,September 2022,Unclear,2022-2023
8061,"Dawson, Juno",This Book Is Gay,Banned,,,,California,Escondido Union School District,October 2023,Informal Challenge,2023-2024
9037,"Green, John",Looking for Alaska,Banned,,,,California,Escondido Union School District,October 2023,Administration,2023-2024


I am hesitant to draw conclusions from the map of states that haven't banned any books in the past three years (like Alabama or New Mexico). I don't think you can confidently say that they care more about freedom of expression or are better about censorship; maybe the offerings in their libraries and classrooms were already stringently limited to begin with. We don't know.

In [291]:
banned_books_df["State"].value_counts().reset_index().head(10) #Datawrapper chart made

Unnamed: 0,State,count
0,Florida,6533
1,Iowa,3685
2,Texas,1963
3,Pennsylvania,664
4,Wisconsin,480
5,Missouri,417
6,Tennessee,394
7,Utah,325
8,Virginia,215
9,South Carolina,192


## Trying to make Radial Graphs

In [275]:
# getting the top 50 most banned authors for each academic year
df = (banned_books_df.groupby(["Author", "Academic Year"])["Title"].count().reset_index()
      .sort_values(["Academic Year", "Title"], ascending=False)
      .groupby("Academic Year").head(50))

In [276]:
# save this to an Excel file
df.to_excel("top_banned_authors_perAY.xlsx", sheet_name="Banned Authors", index=False)

I did this to check whether there were gender or identity trends among the authors being banned. I made a pictogram in Flourish but, given the scope of this assignment, did not include it in the final piece. The pictogram shows the top banned author in each state, per academic year, categorized by their pronouns, which serve as a nod to their gender identity. I manually encoded their pronouns by finding them on their Instagram accounts, in ALA articles about their books, in their own interviews or on their personal websites.

In [334]:
# top authors from each state for every academic year
banned_books_df.groupby(["Author", "State", "Academic Year"])["Title"].count().reset_index().sort_values(["State", "Title", "Academic Year"], ascending=False).groupby(["State","Academic Year"]).head(1).to_clipboard()
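
The "top author per state, per year" step is a count followed by keeping the largest row in each group. Here is a minimal sketch on toy data; the author names "X", "Y" and "Z" and the `Bans` column name are placeholders, and ties (not present here) would need an explicit rule.

```python
import pandas as pd

toy = pd.DataFrame({
    "Author": ["X", "X", "Y", "Y", "Y", "Z"],
    "State": ["Florida", "Florida", "Florida", "Iowa", "Iowa", "Iowa"],
    "Academic Year": ["2023-2024"] * 6,
})

# Count ban rows per author within each state and academic year
counts = (
    toy.groupby(["State", "Academic Year", "Author"])
    .size()
    .reset_index(name="Bans")
)

# Sort so the most banned author comes first, then keep one row per group
top = (
    counts.sort_values("Bans", ascending=False)
    .drop_duplicates(["State", "Academic Year"])
)
top_by_state = top.set_index("State")["Author"].to_dict()
print(top_by_state)
```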


# Standardise Author Names

Author names are given as "last name, first name" in the dataset, which hurt readability in the visualization.

In [2]:
authors = pd.read_csv("data/top50_banned_2024.csv")
authors.head()

In [59]:
authors_list = authors["Author"].str.split(",").to_list()

In [61]:
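# each entry is [last name, first name]; a separate loop variable avoids shadowing the `authors` DataFrame
authors["Author"] = [parts[1].strip() + " " + parts[0].strip() for parts in authors_list]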

In [63]:
authors.head()

Unnamed: 0,Author,Academic Year,Title
0,Ellen Hopkins,2023-2024,523
1,Sarah J. Maas,2023-2024,481
2,Stephen King,2023-2024,173
3,Jodi Picoult,2023-2024,161
4,John Green,2023-2024,157


In [64]:
authors.to_csv("top50_banned_2024.csv", index = False)
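
The name flip above assumes every author has exactly one comma. A slightly more defensive sketch is shown here; `flip_name` is a hypothetical helper, and "Anonymous" is a made-up value standing in for any name without a comma.

```python
import pandas as pd


def flip_name(name: str) -> str:
    """Turn 'Last, First' into 'First Last'; leave comma-free names unchanged."""
    last, sep, first = name.partition(",")
    if not sep:
        return name.strip()  # no comma found, nothing to reorder
    return f"{first.strip()} {last.strip()}"


names = pd.Series(["Hopkins, Ellen", "Maas, Sarah J.", "Anonymous"])
flipped = names.map(flip_name)
print(flipped.tolist())
```

Splitting on the first comma only means that any further commas stay inside the first-name part rather than raising an error or being dropped.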

# What are the top banned books each year?

From a reader's perspective, this is what they are most interested in. At least I would be.

In [20]:
banned_books_df.groupby(["Title","Academic Year","Author"])["Ban Status"].count().reset_index().sort_values(["Academic Year","Ban Status"], ascending = [True,False]).groupby("Academic Year").head(10)

Unnamed: 0,Title,Academic Year,Author,Ban Status
2196,Gender Queer: A Memoir,2021-2022,"Kobabe, Maia",41
374,All Boys Aren't Blue,2021-2022,"Johnson, George M.",29
4035,Out of Darkness,2021-2022,"Pérez, Ashley Hope",23
5319,The Bluest Eye,2021-2022,"Morrison, Toni",22
3152,Lawn Boy,2021-2022,"Evison, Jonathan",17
5782,The Hate U Give,2021-2022,"Thomas, Angie",17
5177,The Absolutely True Diary of a Part-Time Indian,2021-2022,"Alexie, Sherman",16
3520,Me and Earl and the Dying Girl,2021-2022,"Andrews, Jesse",14
1389,Crank (Crank Series),2021-2022,"Hopkins, Ellen",12
5900,The Kite Runner,2021-2022,"Hosseini, Khaled",12


In [15]:
#sanity check for grouping
banned_books_df_2024["Title"].value_counts()

Title
Nineteen Minutes                                     98
Looking for Alaska                                   97
The Perks of Being a Wallflower                      85
Sold                                                 85
Thirteen Reasons Why                                 76
                                                     ..
Shadow of the Hegemon                                 1
Misery                                                1
Mr. Mercedes                                          1
Pastwatch: The Redemption of Christopher Columbus     1
Next Summer                                           1
Name: count, Length: 4240, dtype: int64