# Japanese American WWII CASE Module
### Revisiting Segregation through Computational History: the Case of the WWII Japanese American Tule Lake Segregation Center
* Contributors: Richard Marciano and Greg Jansen
* Source Available: https://github.com/cases-umd/Japanese-American-WWII
* License: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)
* [Lesson Plan for Instructors](./lesson-plan.ipynb)
* Related Publications: 
 * [Automating the Detection of Personally Identifiable Information (PII) in Japanese-American WWII Incarceration Camp Records](https://dcicblog.umd.edu/ComputationFrameworkForArchivalEducation/wp-content/uploads/sites/20/2019/01/pii_paper.pdf)
* More Information:
 * [Project Blog](https://dcicblog.umd.edu/Japanese-AmericanWWIICamps/project-showcase/)

## Introduction
This module is based on a case study involving [WWII Japanese American incarceration camp](https://en.wikipedia.org/wiki/Internment_of_Japanese_Americans) archival records from the National Archives and Records Administration (NARA). These records were examined for their potential to generate new forms of archival analysis and historical research engagement at the Tule Lake Unit, a National Monument of the National Park Service (NPS) established by a December 2008 Presidential proclamation. The project worked closely with NARA staff from the Office of Innovation and Research Services and NPS staff from the Tule Lake Unit, as well as prominent Tule Lake historians, colleagues from King’s College London, and experts from the US Holocaust Memorial Museum.

![Map showing western United States and locations of WWII Japanese-American camps](JAWWII-map.png "Map of Camp Locations")

The overall aim of this project is to build a platform that integrates several distributed digital sources, including significant sources from NARA:

* Camp “incident cards” data (persons, dates, events, locations)
* Japanese-American Internee database of 109,000 names
* 4,100 online WRA photos
* Architectural camp records

The approach is inspired by the Digital Harlem Project, which developed a digital platform to query, map, and visualize legal records of ordinary citizens in Black Harlem between 1915 and 1930. The Tule Lake project investigates and prototypes a platform that likewise links people, places, and events from distributed sources, bringing the archive to life as a resource for investigation and storytelling.

![Top view of box full of filed index cards](box8-top.png "Box 8")

![Front view of box full of filed index cards](box8-front.png "Box 8")

## Motivation
Balancing privacy and access is an important topic in the age of digital curation, where datafication and the use of big data can compromise privacy through the application of predictive analytics. In this module, we use the 10,000 **released** cards to create and test an algorithm that detects personally identifiable information (PII). Because these cards have already been released, NARA staff have already redacted those involving anyone who was a juvenile at the time. This makes the card data well suited to demonstrating algorithmic techniques for PII identification.

## Learning Goals

### **A**rchival Practices
 * Categories to be added.

### **C**omputational Practices
* Data Practices
 * Creating Data
 * Manipulating Data
 * Analyzing Data
 * Visualizing Data
* Modeling & Simulation Practices
 * Designing Computational Models
 * Constructing Computational Models
* Computational Problem Solving Practices
 * Computer Programming
 * Developing Modular Computational Solutions
 * Troubleshooting and Debugging
 
### **E**thical Concerns
 * Categories to be added.

## Software and Tools

* Python 3
* [Pandas](https://pandas.pydata.org/) - high-performance, easy-to-use data structures and data analysis tools for the Python programming language
* [Matplotlib](https://matplotlib.org/) - a Python 2D plotting library
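
Before opening the notebooks, it can help to confirm that the tools listed above are available. The following is a minimal sketch rather than part of the module itself: the helper name `check_environment` is hypothetical, and it assumes the packages are importable under the names `pandas` and `matplotlib`.

```python
import importlib.util
import sys

# Import names for the packages listed above (assumed, not taken from the module).
REQUIRED_PACKAGES = ["pandas", "matplotlib"]


def check_environment(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each requirement to True if it is available."""
    status = {"python3": sys.version_info.major == 3}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    return status


for requirement, available in check_environment().items():
    print(f"{requirement}: {'ok' if available else 'MISSING'}")
```

Any requirement reported as `MISSING` would need to be installed before the notebooks can run.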



## Acquiring or Accessing the Data
The data for this project was originally acquired from NARA collections and then screened to remove records related to those who were minors at the time. The main set of records is a subset of a larger collection of so-called "incident cards" from the Tule Lake camp. The data also includes two person registries: the "FAR Registry" (Final Accountability Roster) and the "WRA Form 26 Register", where WRA stands for War Relocation Authority. All data sources are included in this module as comma-separated values (CSV) files. The links below open a view of each data file:

* [Cards_Box9.csv](Cards_Box9.csv)
* [TuleLake_FAR_ALL_FINAL4.csv](TuleLake_FAR_ALL_FINAL4.csv)
* [WRAForm26.csv](WRAForm26.csv)

No additional data is required to run the notebooks in this module.
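
As a quick orientation, the three files can be previewed before working through the notebooks. The sketch below uses only the Python standard library so that it runs even before Pandas is installed. It is not taken from the notebooks: the helper names `preview_csv` and `preview_module_data` are hypothetical, and it assumes that the files sit in the current directory, that the first row of each file holds column names, and that the files are UTF-8 encoded.

```python
import csv
from pathlib import Path

# File names as listed above; assumed to sit in the directory being previewed.
DATA_FILES = ["Cards_Box9.csv", "TuleLake_FAR_ALL_FINAL4.csv", "WRAForm26.csv"]


def preview_csv(path, encoding="utf-8"):
    """Return (column_names, data_row_count) for one CSV file."""
    # errors="replace" keeps the preview going if the encoding guess is wrong.
    with open(path, newline="", encoding=encoding, errors="replace") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row is assumed to hold column names
        row_count = sum(1 for _ in reader)
    return header, row_count


def preview_module_data(data_dir="."):
    """Preview every listed file found in data_dir; missing files are skipped."""
    previews = {}
    for name in DATA_FILES:
        path = Path(data_dir) / name
        if path.is_file():
            previews[name] = preview_csv(path)
    return previews


for name, (columns, rows) in preview_module_data().items():
    print(f"{name}: {rows} rows, columns = {columns}")
```

Once the column names are known, the same files can be loaded with Pandas for the analysis steps in the notebooks.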

## Notebooks

This module is organized into a series of Python Notebooks that introduce, process, coordinate, and visualize the Tule Lake data.

1. [Dataset Review: WRA Form 26](WRAForm26.ipynb)
1. [Dataset Review: Incident Cards](Cards.ipynb)
1. [Dataset Review: Final Accountability Roster (FAR)](FAR.ipynb)
1. [Finding Private PII, Part One](PII_Algorithm.ipynb)
1. [Finding Private PII, Part Two](PII_Algorithm2.ipynb)
1. [Data Visualization](Visualize.ipynb)
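
To give a feel for the kind of rule the PII notebooks explore, the sketch below flags cards that name someone who was a juvenile on the card date. This is an illustration only, and it is not the algorithm used in the notebooks. The record layout (`card_id`, `name`, `date`), the exact-match lookup by name, the age threshold of 18, and the example records are all assumptions made for this example.

```python
from datetime import date

ADULT_AGE = 18  # assumption: threshold used here to define "juvenile"


def age_on(birth_date, event_date):
    """Whole years between birth_date and event_date."""
    years = event_date.year - birth_date.year
    # Subtract one year if the birthday had not yet occurred on event_date.
    if (event_date.month, event_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def flag_juvenile_cards(cards, birth_dates, adult_age=ADULT_AGE):
    """Return ids of cards naming someone younger than adult_age on the card date.

    cards: iterable of dicts with hypothetical keys "card_id", "name", "date".
    birth_dates: dict mapping a name to a date of birth (e.g. from a registry).
    Names missing from birth_dates are not flagged and would need manual review.
    """
    flagged = []
    for card in cards:
        born = birth_dates.get(card["name"])
        if born is not None and age_on(born, card["date"]) < adult_age:
            flagged.append(card["card_id"])
    return flagged


# Entirely made-up records, for illustration only.
example_cards = [
    {"card_id": 1, "name": "Person A", "date": date(1944, 5, 1)},
    {"card_id": 2, "name": "Person B", "date": date(1944, 5, 1)},
]
example_births = {"Person A": date(1920, 3, 15), "Person B": date(1930, 8, 2)}

print(flag_juvenile_cards(example_cards, example_births))
```

With these made-up records, only card 2 is flagged, because "Person B" would have been 13 on the card date. Real records would call for more careful name matching than the exact lookup used here.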