# Introduction

In this notebook, we map the starting and ending coordinates of people who were moved from one location to another after being placed in internment camps. The Tulean Dispatch was the official newspaper of the Tule Lake internment camp from 1942 to 1943, and it later became the Tule Lake Newell Star. We focus on the data collected during this roughly one-year period and visualize the movement of individuals across the U.S. as they were forced from one place to another. More information can be found here: http://encyclopedia.densho.org/Tulean%20Dispatch%20(newspaper)/

In [16]:
from datascience import *
import numpy as np
import pandas as pd
import folium
%matplotlib inline

# 1. Data Processing

In this section, we import the two main CSV files, `tuleandispatch.csv` and `WRA/combined.csv`, and load them as tables for exploratory data analysis (EDA). The cell below uses pandas (imported as `pd`), a Python library that provides flexible data structures for analyzing relational, labeled data.

In [17]:
dispatch = pd.read_csv('tuleandispatch.csv')
wra = pd.read_csv('WRA/combined.csv')


As the cell below shows, the `Camp Name` column contains many different spellings of the same location, Tule Lake. How can we clean the data to reconcile them?

In [18]:
dispatch['Camp Name'].unique()

array(['TULE LAKE', 'Tule Lake', 'Gila River', 'Tule 38603-A',
       'Tule Lake  3803-A', 'Tule Lake 2418-B', 'Tule Lake 5413-F',
       ' Tule Lake 1307-A', 'Tule Lake ', 'Santa Fe, Mexico', nan], dtype=object)
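One possible cleanup, sketched below, trims stray whitespace and maps every value that begins with "Tule" (ignoring case) to the single spelling `Tule Lake`, which is the spelling the WRA records use after the merge later in this notebook. Treating every "Tule..." value, including the ones with barrack-style suffixes such as `3803-A`, as Tule Lake is an assumption to verify against the source records, and `normalize_camp_name` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def normalize_camp_name(name):
    """Map spelling variants of a camp name to one canonical form (sketch)."""
    # Leave missing values (NaN) untouched so they stay missing
    if pd.isna(name):
        return name
    # Trim the ends and collapse runs of inner whitespace to one space
    cleaned = " ".join(str(name).split())
    # Assumption: any value starting with "Tule" refers to Tule Lake
    if cleaned.lower().startswith("tule"):
        return "Tule Lake"
    return cleaned

camps = pd.Series(['TULE LAKE', 'Tule 38603-A', ' Tule Lake 1307-A', 'Gila River', np.nan])
result = camps.map(normalize_camp_name)
print(result.tolist())
```

In the notebook, this would be applied before renaming, e.g. `dispatch['Camp Name'] = dispatch['Camp Name'].map(normalize_camp_name)`. Values such as `Santa Fe, Mexico` pass through with only their whitespace cleaned.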

The next two cells rename our columns of interest to the uppercase labels used in the WRA dataset (and uppercase the first and last name values so they match as well), then combine the two tables into one, called `merged`.

In [19]:
# Rename the columns of interest to match the uppercase WRA column labels
dispatch = dispatch.rename(columns={'Last Name': 'LAST NAME', 'First Name': 'FIRST NAME', 'Camp Name':'RELOCATION PROJECT', 'Destination': 'DESTINATION'})

# Make first name and last name values uppercase
for col_name in ['FIRST NAME', 'LAST NAME']:
    dispatch[col_name] = dispatch[col_name].apply((lambda x: str(x).upper()))

dispatch

| Unnamed: 0 | Age | RELOCATION PROJECT | Date of Departure | DESTINATION | FIRST NAME | Gender | LAST NAME | Marital Status | Middle Name | Notes | Occupation | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | GEORGE | M | SAKAMOTO | | | NYA WAR PRODUCTION TRAINING CENTER | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 1 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | ALBERT | M | OSHITA | | | NYA WAR PRODUCTION TRAINING CENTER | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 2 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | THOMAS | M | OSHIKA | | A. | NYA WAR PRODUCTION TRAINING CENTER | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 3 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | HENRY | M | SHIHOJIMA | | | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 4 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | TOM | M | MURAKI | | | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 5 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | HISASHI | M | KUNAGAI | | | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 6 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | JACK | M | CKUDA | | V. | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 7 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | SHIG | M | KATO | | | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 8 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | KUNIO | M | KAWATA | | BILL | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
| 9 | | TULE LAKE | 05/26/1943 | SHAKOPEE, MN | BOB | M | OKAMURA | | | NYA WAR PRODUCTION TRAINING CENTER, project ca... | MACHINERY , SHEET METAL, WELDING, FOUNDRY, PAT... | TULEAN DISPATCH, 5.58 |
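A side effect of `str(x).upper()` is that a missing value is first converted to the string `'nan'`, so blank name cells become the name `'NAN'` and then look like real data in later steps such as duplicate counting. The sketch below contrasts it with pandas' vectorized `.str.upper()`, which leaves missing values missing; the three-element Series is invented for illustration.

```python
import numpy as np
import pandas as pd

names = pd.Series(['George', 'albert', np.nan])

# Approach used in the cell above: str() turns NaN into the text 'nan' first
via_str = names.apply(lambda x: str(x).upper())

# Vectorized string method: missing values stay NaN
via_accessor = names.str.upper()

print(via_str.tolist())
print(via_accessor.tolist())
```

In the notebook, the equivalent would be `dispatch[col_name] = dispatch[col_name].str.upper()`, assuming those columns contain only strings and missing values.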


In [20]:
# Merging with WRA
merged = pd.merge(dispatch, wra, on = ['LAST NAME', 'FIRST NAME', 'RELOCATION PROJECT'])
# merged
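By default `pd.merge` performs an inner join, so any dispatch row whose name and camp do not exactly match a WRA record is dropped without warning. Given the camp spellings above (for example `TULE LAKE` in the dispatch data versus `Tule Lake` in the WRA data), that can be many rows. One way to see what is lost, sketched here on two invented DataFrames, is an outer merge with `indicator=True`, which adds a `_merge` column whose value is `'left_only'`, `'right_only'`, or `'both'`.

```python
import pandas as pd

keys = ['LAST NAME', 'FIRST NAME', 'RELOCATION PROJECT']

# Invented stand-ins for the dispatch (left) and WRA (right) tables
left = pd.DataFrame({'LAST NAME': ['YAMADA', 'SATO'],
                     'FIRST NAME': ['TARO', 'HANAKO'],
                     'RELOCATION PROJECT': ['TULE LAKE', 'Tule Lake']})
right = pd.DataFrame({'LAST NAME': ['YAMADA', 'SATO'],
                      'FIRST NAME': ['TARO', 'HANAKO'],
                      'RELOCATION PROJECT': ['Tule Lake', 'Tule Lake']})

# Outer merge keeps every row from both sides and records where each came from
check = pd.merge(left, right, on=keys, how='outer', indicator=True)

# Left-side rows with no WRA match: these are what an inner merge drops
unmatched = check[check['_merge'] == 'left_only']
print(unmatched[keys])
```

Here the `YAMADA` row fails to match only because of the camp spelling. Running the same check on `dispatch` and `wra` would show how many Tulean Dispatch entries never reach `merged`.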

The following cell defines a function `num_occurrences(table, first_name, last_name)` that counts how many rows of a table have a given first and last name. The loop after it applies this function to every row of `merged` and collects each name that appears more than once, together with its count.

In [21]:
def num_occurrences(table, first_name, last_name):
    """Return the number of rows in `table` with this first and last name."""
    # Comparing a column to a scalar broadcasts across every row, so there is
    # no need to build (index-aligned) Series of repeated names
    query = table[(table['FIRST NAME'] == first_name) & (table['LAST NAME'] == last_name)]
    return len(query)

# Collect (first name, last name, count) for every name that appears more than once
duplicates = set()
for index, row in merged.iterrows():
    first_name, last_name = row['FIRST NAME'], row['LAST NAME']
    n_occurrences = num_occurrences(merged, first_name, last_name)
    if n_occurrences > 1:
        duplicates.add((first_name, last_name, n_occurrences))
duplicates

{('AIKO', 'YAMAMOTO', 3),
 ('FRANCES', 'MORIOKA', 2),
 ('FRANK', 'FUJITA', 2),
 ('FRANK', 'MATSUMOTO', 4),
 ('FRANK', 'YAMAMOTO', 2),
 ('GEORGE', 'DANZUKA', 4),
 ('GEORGE', 'KATO', 2),
 ('GEORGE', 'KUBO', 2),
 ('GEORGE', 'MIYAI', 2),
 ('GEORGE', 'NOMURA', 2),
 ('GEORGE', 'SUMIDA', 2),
 ('GEORGE', 'TAKAO', 3),
 ('GEORGE', 'TAKETA', 3),
 ('GEORGE', 'YASUI', 2),
 ('HARRY', 'HAMADA', 2),
 ('HIROSHI', 'KANEKO', 2),
 ('HIROSHI', 'NAKAMURA', 3),
 ('JOE', 'TOMITA', 2),
 ('LILY', 'YAMASAKI', 2),
 ('MARY', 'MARUYAMA', 2),
 ('NOBORU', 'HONDA', 2),
 ('PAUL', 'TAKAHASHI', 2),
 ('TAKESHI', 'NAKAMURA', 2),
 ('YOSHIKO', 'SUZUKI', 2),
 ('YOSHIYE', 'FURUTA', 2)}
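The loop above calls `num_occurrences` once per row, and each call scans the whole table, so the work grows quadratically with the number of rows. pandas can produce the same counts in a single pass by grouping on the two name columns and taking `size()`. The sketch below runs on a small invented DataFrame.

```python
import pandas as pd

# Invented example rows
people = pd.DataFrame({'FIRST NAME': ['GEORGE', 'GEORGE', 'AIKO', 'GEORGE'],
                       'LAST NAME': ['KATO', 'KATO', 'YAMAMOTO', 'KUBO']})

# Number of rows for every (first, last) pair
counts = people.groupby(['FIRST NAME', 'LAST NAME']).size()

# Keep only pairs that appear more than once, as (first, last, count) tuples
duplicates = {(first, last, int(n))
              for (first, last), n in counts.items() if n > 1}
print(duplicates)
```

Applied to `merged`, the same two lines should give the same set as the loop, since both count rows that share an exact first and last name.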

# 2. Back to Data Science Tables

In this section, we convert the `merged` DataFrame into a `Table` from the `datascience` package, which the rest of the notebook uses for analysis.

In [22]:
# Convert to a Datascience Table object
tulean_dispatch_joined = Table.from_df(merged)
#tulean_dispatch_joined

In [23]:
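# Collapse repeated rows: keep the first row for each distinct FIRST NAME,
# then the first row for each distinct LAST NAME. Note that this keeps one row
# per first name (and per last name), not one row per (first, last) pair.
get_first = lambda x: x[0]
tulean_dispatch_joined = tulean_dispatch_joined.group('FIRST NAME', collect=get_first).group('LAST NAME', collect=get_first)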
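Grouping by `FIRST NAME` and then by `LAST NAME` keeps one row per distinct first name and then one per distinct last name, so two different people who merely share a first name collapse into a single row. If the goal is one row per person, a sketch that deduplicates on the (first, last) pair is shown below in pandas, on an invented DataFrame, because the difference is easy to check there. The chained `groupby(...).first()` only roughly mimics the `datascience` cell (pandas' `first()` skips missing values). Within `datascience`, `Table.groups` with both name columns may be the closest equivalent.

```python
import pandas as pd

# Invented rows: two different people named George, one listed twice
rows = pd.DataFrame({'FIRST NAME': ['GEORGE', 'GEORGE', 'GEORGE'],
                     'LAST NAME': ['YAMADA', 'SATO', 'YAMADA'],
                     'DESTINATION': ['City A', 'City B', 'City C']})

# Chained grouping, roughly as in the cell above: per first name, then per last name
chained = rows.groupby('FIRST NAME', as_index=False).first()
chained = chained.groupby('LAST NAME', as_index=False).first()

# One row per (first, last) pair, keeping the first occurrence
per_person = rows.drop_duplicates(subset=['FIRST NAME', 'LAST NAME'], keep='first')
print(len(chained), len(per_person))
```

The chained version keeps only one George, while the pair-based version keeps both.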

In [24]:
# Get Columns of Interest
tulean_dispatch = tulean_dispatch_joined.select(['FIRST NAME', 'LAST NAME', 'RELOCATION PROJECT', 'LAST PERMANENT ADDRESS STATE', 'LAST PERMANENT ADDRESS COUNTY', 'ASSEMBLY CENTER', 'DESTINATION'])
# tulean_dispatch

### Coordinate Data Cleaning

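With the `tulean_dispatch` table ready, we next load the latitude and longitude of each location from `tuleandispatch_coordinates.csv`. The same place can appear under several spellings (for example, `BOISE, ID` and `Boise, Idaho`), and each spelling has its own row.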

In [25]:
# Loading in coordinates
coords = Table.read_table('tuleandispatch_coordinates.csv')
# Keep one row per distinct Location string (the first entry wins)
coords = coords.group('Location', collect = lambda x: x[0])
coords

| Location | Lat,Long | Notes |
|---|---|---|
| Ann Arbor, Michigan | 42.281389, -83.748333 | |
| Aurora, Illinois | 41.7605800, -88.3200700 | |
| BOISE, ID | 43.6187, -116.2146 | |
| BROOKLYN, NY | 40.692778, -73.990278 | |
| Baldwin, Kansas | 38.7775, -95.1875 | |
| Boise, Idaho | 43.6187, -116.2146 | |
| CHICAGO, IL | 41.8781 , -87.6298 | |
| CINCINATTI, OH | 39.1031, -84.51202 | |
| CLEVELAND, OH | 41.505493, -81.681290 | |
| CLEVELAND, OHIO | 41.505493, -81.681290 | |


In [26]:
# Join coordinates onto each location column
def process_coords(lat_long):
    """Convert a 'lat, long' string into a [lat, long] list of floats."""
    return [float(part) for part in lat_long.split(',')]

locations = ['RELOCATION PROJECT', 'DESTINATION']
for name in locations:
    # join keeps only rows whose location also appears in `coords`
    tulean_dispatch = tulean_dispatch.join(name, coords, 'Location')
    tulean_dispatch[name + ' COORDS'] = tulean_dispatch.apply(process_coords, 'Lat,Long')
    tulean_dispatch = tulean_dispatch.drop(['Lat,Long', 'Notes'])
# Display the table
tulean_dispatch

| DESTINATION | RELOCATION PROJECT | FIRST NAME | LAST NAME | LAST PERMANENT ADDRESS STATE | LAST PERMANENT ADDRESS COUNTY | ASSEMBLY CENTER | RELOCATION PROJECT COORDS | DESTINATION COORDS |
|---|---|---|---|---|---|---|---|---|
| Ann Arbor, Michigan | Tule Lake | GEORGE | SUMIDA | Pacific States - California | SACRAMENTO | Sacramento (Walerga) | [ 41.9688 -121.5681] | [ 42.281389 -83.748333] |
| Baldwin, Kansas | Tule Lake | KATE | KYONO | Pacific States - Oregon | MARION | | [ 41.9688 -121.5681] | [ 38.7775 -95.1875] |
| Boise, Idaho | Tule Lake | JANE | HAMADA | Pacific States - California | SANTA CRUZ | Marysville (Arboga) | [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| Boise, Idaho | Tule Lake | YOSHIMI | ISHIMOTO | Pacific States - California | SACRAMENTO | Sacramento (Walerga) | [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| Boise, Idaho | Tule Lake | ELAINE | TSUMURA | | | | [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| CLEVELAND, OHIO | Tule Lake | KIKUJI | RYUGO | Pacific States - California | SACRAMENTO | Sacramento (Walerga) | [ 41.9688 -121.5681] | [ 41.505493 -81.68129 ] |
| Caldwell, Idaho | Tule Lake | MIDORI | FURUSHIRO | Pacific States - California | YOLO | | [ 41.9688 -121.5681] | [ 43.658333 -116.680278] |
| Camp Granada - Amacha, Colorado | Tule Lake | FUSAKO | TEKAWA | Pacific States - California | YUBA | | [ 41.9688 -121.5681] | [ 38.064722 -102.311111] |
| Camp Robinson, Arkansas | Tule Lake | JOSIE | YANAGAWA | Pacific States - Washington | KING | Pinedale | [ 41.9688 -121.5681] | [ 34.85 -92.300278] |
| Camp Savage, Minnesota | Tule Lake | SARA | TANIGAWA | Pacific States - California | PLACER | Sacramento (Walerga) | [ 41.9688 -121.5681] | [ 44.783333 -93.333333] |
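`process_coords` splits each `Lat,Long` string on the comma and converts both halves with `float`, which already tolerates stray spaces such as those in `"41.8781 , -87.6298"`. A slightly more defensive sketch, using the hypothetical name `parse_lat_long`, also checks that there are exactly two parts and that they fall within valid latitude and longitude ranges. A swapped or malformed entry in the coordinates file then raises an error instead of quietly placing a marker in the wrong spot.

```python
def parse_lat_long(text):
    """Parse a 'lat, long' string into [lat, long] floats, with range checks."""
    parts = text.split(',')
    if len(parts) != 2:
        raise ValueError('expected "lat, long", got {!r}'.format(text))
    # float() ignores surrounding whitespace such as '41.8781 '
    lat, lon = (float(part) for part in parts)
    if not -90 <= lat <= 90:
        raise ValueError('latitude out of range: {}'.format(lat))
    if not -180 <= lon <= 180:
        raise ValueError('longitude out of range: {}'.format(lon))
    return [lat, lon]

print(parse_lat_long('41.8781 , -87.6298'))
```

To use it, pass `parse_lat_long` to `tulean_dispatch.apply` in place of `process_coords`. A pair written in long, lat order, such as `'-121.5681, 41.9688'`, is then rejected because its "latitude" is out of range.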


In [27]:
def get_from_to_coords(table, from_location, to_location):
    result = table.select([from_location + ' COORDS', to_location + ' COORDS'])
    result.relabel(from_location + ' COORDS', 'from')
    result.relabel(to_location + ' COORDS', 'to')
    return result
relocation_to_destination = get_from_to_coords(tulean_dispatch, 'RELOCATION PROJECT', 'DESTINATION')
relocation_to_destination

| from | to |
|---|---|
| [ 41.9688 -121.5681] | [ 42.281389 -83.748333] |
| [ 41.9688 -121.5681] | [ 38.7775 -95.1875] |
| [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| [ 41.9688 -121.5681] | [ 43.6187 -116.2146] |
| [ 41.9688 -121.5681] | [ 41.505493 -81.68129 ] |
| [ 41.9688 -121.5681] | [ 43.658333 -116.680278] |
| [ 41.9688 -121.5681] | [ 38.064722 -102.311111] |
| [ 41.9688 -121.5681] | [ 34.85 -92.300278] |
| [ 41.9688 -121.5681] | [ 44.783333 -93.333333] |


# 3. Plotting with Folium

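The cell below prints the installed version of folium, a Python mapping package built on the Leaflet.js library. The version determines which plug-ins and features are available in this notebook: the map helpers used below (`simple_marker`, `line`, and `_build_map`) come from the early 0.1.x API and are not available in current folium releases. More information can be found at https://pypi.python.org/pypi/folium/0.1.5.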

In [28]:
#Version of folium
print(folium.__version__)

0.1.5


In [29]:
import html
import folium
from IPython.display import HTML

def display(m, height=300):
    """Render a folium (0.1.x) map and embed its HTML in an iframe."""
    m._build_map()
    # Escape &, <, > and quotes so the page survives inside the srcdoc attribute
    srcdoc = html.escape(m.HTML, quote=True)
    embed = HTML('<iframe srcdoc="{0}" '
                 'style="width: 100%; height: {1}px; '
                 'border: none"></iframe>'.format(srcdoc, height))
    return embed

Now that folium is set up, the cell below places a marker at each destination, with the location name as its popup:

In [30]:
def plot_locations(table, location_name):
    location_coords = table.column(location_name + ' COORDS')
    # New US Map with Stamen Terrain
    m = folium.Map(location=[39.828175, -98.5795], zoom_start=4, tiles='Stamen Terrain')
    # Loop through table
    for i in range(table.num_rows):
        coords = location_coords[i]
        label = table.column(location_name)[i]
        m.simple_marker(location = coords, popup = label)
    return m
m = plot_locations(tulean_dispatch, 'DESTINATION')
display(m)
# Try zooming out

Finally, we connect each starting location to its destination: the cell below places a marker at both ends of every trip and draws a line between them.

In [174]:
def plot_from_to(table, from_location, to_location):
    from_to_array = table.select([from_location + ' COORDS', to_location + ' COORDS']).rows
    # New US Map with Stamen Terrain
    m = folium.Map(location=[39.828175, -98.5795], zoom_start=4, tiles='Stamen Terrain')
    # Loop through from_to_array/table
    for i in range(table.num_rows):
        from_to_coords = from_to_array[i]
        from_label = table.column(from_location)[i]
        to_label = table.column(to_location)[i]
        m.simple_marker(location = from_to_coords[0], popup = from_label)
        m.simple_marker(location = from_to_coords[1], popup = to_label)
        m.line(from_to_coords, line_weight=3, line_color = '#a3c6ff', line_opacity = 0.7)
    return m
m = plot_from_to(tulean_dispatch, 'RELOCATION PROJECT', 'DESTINATION')
display(m)
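Several rows share an identical route (in the ten rows printed earlier, three people went from Tule Lake to Boise, Idaho), so the loop in `plot_from_to` draws the same markers and line several times on top of one another. One option, sketched below in plain Python, is to count each distinct (from, to) pair first and then draw one line per pair. `count_routes` is a hypothetical helper, and what to do with the count (for example, showing it in a popup) is left as a design choice.

```python
from collections import Counter

def count_routes(from_coords, to_coords):
    """Count how many rows share each (from, to) coordinate pair."""
    # Coordinates may arrive as lists or NumPy arrays, which are not
    # hashable, so convert each one to a tuple of floats first
    pairs = ((tuple(map(float, a)), tuple(map(float, b)))
             for a, b in zip(from_coords, to_coords))
    return Counter(pairs)

tule = [41.9688, -121.5681]
routes = count_routes(
    [tule, tule, tule, tule],
    [[43.6187, -116.2146], [43.6187, -116.2146],
     [43.6187, -116.2146], [38.7775, -95.1875]])
for (start, end), n in routes.items():
    print(start, '->', end, n)
```

With the notebook's table, the inputs would be `tulean_dispatch.column('RELOCATION PROJECT COORDS')` and `tulean_dispatch.column('DESTINATION COORDS')`. Each key could then be drawn once (converted back to lists if the map API requires them).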