# Steps Toward Identifying Lynching Towns with Nearby Newspapers that Have Been Digitized

These steps required the cross-referencing of four datasets:

- Our Seguin & Rigby subset of black victims: [https://github.com/MatthewKollmer/messing-around/blob/main/vrt_work/say_their_names/seguin_rigby_data_black_subset_02.csv](https://github.com/MatthewKollmer/messing-around/blob/main/vrt_work/say_their_names/seguin_rigby_data_black_subset_02.csv)
- DBpedia's place metadata: [https://github.com/ViralTexts/newspaper-metadata/blob/main/places.csv](https://github.com/ViralTexts/newspaper-metadata/blob/main/places.csv)
- Viral Texts' dbpedia metadata for newspapers: [https://raw.githubusercontent.com/ViralTexts/newspaper-metadata/refs/heads/main/series.csv](https://raw.githubusercontent.com/ViralTexts/newspaper-metadata/refs/heads/main/series.csv)
- Chronicling America's digitized newspaper data: [https://chroniclingamerica.loc.gov/newspapers.txt](https://chroniclingamerica.loc.gov/newspapers.txt)

Here's basically what I've done:

1) I cross-referenced VT's dbpedia data with Chron Am's digitized newspaper data. Where there were matches, I added the dbpedia link for location to the digitized newspaper data.
2) I cross-referenced these dbpedia links to the dbpedia places metadata. Where there were matches, I added the dbpedia latitude and longitude data. This gave me the lat/long for each digitized newspaper.
3) After lots of Googling and some back-and-forth with ChatGPT, I settled on what's called the Haversine formula for getting distances between newspaper locations and lynching town locations. This formula calculates the great-circle distance between two points on a sphere. I'm not an expert on it or anything, but basically it works out the angle the two points make at the center of the sphere, then multiplies that angle (in radians) by the sphere's radius to get the distance over the curved surface between the two points. After reading about this method, I adapted code from this Stack Overflow thread: [https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points](https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points). This resulted in the function haversine_calculation() described below.
4) Using this function, I iterated over the lats/longs in my newspapers dataframe (the one enriched with location data through steps 1 and 2) and compared each one against the lats/longs in the Seguin & Rigby Black victim subset. You may remember I added lats/longs to the Seguin & Rigby data in [this notebook](https://github.com/MatthewKollmer/messing-around/blob/main/vrt_work/say_their_names/identify_build_lynching_town_datasets.ipynb). If a newspaper and a lynching were within 10 miles of each other, I considered them a match. This 10-mile threshold is just off vibes. It should probably be adjusted based on any information we have about the distance of newspaper circulation and more refined definitions of 'local'. For every match, I combined the two rows in a new dataset called 'nearby_papers_cases'. This resulted in 745 newspaper-lynching pairs, i.e. 745 rows where a paper was published within 10 miles of where a lynching occurred.
5) I still needed to deduce whether these newspapers had digitized pages within the same timeframe as the lynching, though. To do that, I iterated over the First Issue Date and Last Issue Date columns, checking to see if the corresponding lynching date landed within those timeframes. After doing this, I was able to identify about 25 lynching cases that have local papers with digitized coverage.
6) I then mapped the results so it's easier to review them.

There are lots of little processing steps in between these things, but I'm trying to be as clear as possible where it matters.
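
For reference, the Haversine calculation described in step 3 can be written out as follows. This is the standard form of the formula, using my own symbols: $\varphi$ for latitude and $\lambda$ for longitude (both in radians), and $R$ for the earth's radius (3958.8 miles in the function below).

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right), \qquad c = 2 \arcsin\!\left(\sqrt{a}\right), \qquad d = R \, c
$$

As a quick worked example, two points on the same meridian that are one degree of latitude apart give $c = \pi/180$, so $d = 3958.8 \times \pi/180 \approx 69.09$ miles.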

In [1]:
import pandas as pd
import numpy as np
import folium

### 1. Cross-Referencing VT's dbpedia data with Chron Am's digitized newspaper data:

In [2]:
coverage = pd.read_csv('https://raw.githubusercontent.com/ViralTexts/newspaper-metadata/refs/heads/main/series.csv')

coverage = coverage[coverage['series'].str.contains(r'/lccn/', na=False)]
coverage['series'] = coverage['series'].str.replace(r'/lccn/', '', regex=True)
coverage.head()

Unnamed: 0,series,title,lang,publisher,placeOfPublication,corpora,coverage
0,62183,Polak amerykański = American Pole.,pl,Buffalo Polish Pub. Co.,"Buffalo, N.Y.",,http://dbpedia.org/resource/Buffalo%2C_New_York
1,64000,The apostolic bulletin.,en,Board of Elders of the Church of God in Christ,"Temple, Bell County, Tex.",,http://dbpedia.org/resource/Temple%2C_Texas
2,64001,Central Texas oil journal.,en,[s.n.],"Temple, Bell County, Tex.",,http://dbpedia.org/resource/Temple%2C_Texas
3,64002,Bell County socialist.,en,Socialist Co-Operative Pub. Co.,"Temple, Bell County, Tex.",,http://dbpedia.org/resource/Temple%2C_Texas
4,64003,Central Texas forum.,en,Bell County Farmer's Alliance,"Temple, Bell County, Tex.",,http://dbpedia.org/resource/Temple%2C_Texas


In [3]:
# this is Chron Am's digitized newspaper data linked above. I downloaded it and converted it to a csv file.
newspapers = pd.read_csv('newspapers.csv')
newspapers.head()

Unnamed: 0,Persistent Link,State,Title,LCCN,OCLC,ISSN,No. of Issues,First Issue Date,Last Issue Date,More Info
0,https://chroniclingamerica.loc.gov/lccn/sn8607...,Alabama,The age-herald. [volume] (Birmingham Ala.) 189...,sn86072192,14948274,2692-4099,1630,Aug. 1 1897,May 20 1902,https://chroniclingamerica.loc.gov/lccn/sn8607...
1,https://chroniclingamerica.loc.gov/lccn/sn8402...,Alabama,Alabama state intelligencer. [volume] (Tuscalo...,sn84021903,2683862,2574-4089,50,Jan. 1 1831,Dec. 24 1831,https://chroniclingamerica.loc.gov/lccn/sn8402...
2,https://chroniclingamerica.loc.gov/lccn/sn8503...,Alabama,The Birmingham age-herald. [volume] (Birmingha...,sn85038485,12607279,2692-6318,8237,May 21 1902,Dec. 31 1924,https://chroniclingamerica.loc.gov/lccn/sn8503...
3,https://chroniclingamerica.loc.gov/lccn/sn8402...,Alabama,Birmingham age-herald. [volume] (Birmingham Al...,sn84020639,4066065,2692-4226,423,July 1 1894,Oct. 3 1895,https://chroniclingamerica.loc.gov/lccn/sn8402...
4,https://chroniclingamerica.loc.gov/lccn/sn8504...,Alabama,Birmingham state herald. (Birmingham Ala.) 189...,sn85044812,12283890,2692-4250,570,Oct. 4 1895,July 31 1897,https://chroniclingamerica.loc.gov/lccn/sn8504...


In [4]:
newspapers = newspapers.merge(coverage[['series', 'coverage']], left_on='LCCN', right_on='series', how='left')
newspapers = newspapers.drop(columns=['series'])
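
Because this is a left merge, any newspaper whose LCCN is missing from series.csv keeps its row but gets an empty coverage value. Here is a minimal sketch of one way to count how many rows actually matched, using the `indicator` option of `merge`. The two small dataframes are toy stand-ins (the second LCCN and title are made up), not the real data.

```python
import pandas as pd

# Toy stand-ins for the Chron Am list and the Viral Texts series metadata.
# 'sn00000000' is a made-up LCCN with no counterpart in the links table.
papers = pd.DataFrame({
    'LCCN': ['sn86072192', 'sn00000000'],
    'Title': ['The age-herald.', 'Hypothetical unmatched paper'],
})
links = pd.DataFrame({
    'series': ['sn86072192'],
    'coverage': ['http://dbpedia.org/resource/Birmingham%2C_Alabama'],
})

# indicator=True adds a '_merge' column saying where each row came from
checked = papers.merge(links, left_on='LCCN', right_on='series', how='left', indicator=True)
match_counts = checked['_merge'].value_counts()
print(match_counts)
```

Rows marked `left_only` are papers that did not get a dbpedia link and so cannot get a lat/long in the next step.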

### 2. Cross-Referencing Places to Get Lats/Longs of Every Digitized Newspaper in Chron Am

In [5]:
places = pd.read_csv('https://raw.githubusercontent.com/ViralTexts/newspaper-metadata/main/places.csv')
places.head()

Unnamed: 0,coverage,wdid,lon,lat,label,country,topdiv,region
0,http://dbpedia.org/resource/'s-Graveland,Q1615351,5.1211,52.2442,'s-Graveland,Netherlands,North Holland,Netherlands
1,http://dbpedia.org/resource/'s-Gravendeel,Q425780,4.616667,51.783333,'s-Gravendeel,Netherlands,South Holland,Netherlands
2,http://dbpedia.org/resource/'s-Hertogenbosch,Q9807,5.3031,51.6892,'s-Hertogenbosch,Netherlands,North Brabant,Netherlands
3,http://dbpedia.org/resource/Aaronsburg%2C_Cent...,Q303023,-77.453383,40.900946,Aaronsburg,United States of America,Pennsylvania,EUSA
4,http://dbpedia.org/resource/Abbeville%2C_Alabama,Q79806,-85.251389,31.566389,Abbeville,United States of America,Alabama,EUSA


In [6]:
newspapers = newspapers.merge(places[['coverage', 'lon', 'lat']], on='coverage', how='left')
newspapers.head()

Unnamed: 0,Persistent Link,State,Title,LCCN,OCLC,ISSN,No. of Issues,First Issue Date,Last Issue Date,More Info,coverage,lon,lat
0,https://chroniclingamerica.loc.gov/lccn/sn8607...,Alabama,The age-herald. [volume] (Birmingham Ala.) 189...,sn86072192,14948274,2692-4099,1630,Aug. 1 1897,May 20 1902,https://chroniclingamerica.loc.gov/lccn/sn8607...,http://dbpedia.org/resource/Birmingham%2C_Alabama,-86.809444,33.5175
1,https://chroniclingamerica.loc.gov/lccn/sn8402...,Alabama,Alabama state intelligencer. [volume] (Tuscalo...,sn84021903,2683862,2574-4089,50,Jan. 1 1831,Dec. 24 1831,https://chroniclingamerica.loc.gov/lccn/sn8402...,http://dbpedia.org/resource/Tuscaloosa%2C_Alabama,-87.534722,33.206667
2,https://chroniclingamerica.loc.gov/lccn/sn8503...,Alabama,The Birmingham age-herald. [volume] (Birmingha...,sn85038485,12607279,2692-6318,8237,May 21 1902,Dec. 31 1924,https://chroniclingamerica.loc.gov/lccn/sn8503...,http://dbpedia.org/resource/Birmingham%2C_Alabama,-86.809444,33.5175
3,https://chroniclingamerica.loc.gov/lccn/sn8402...,Alabama,Birmingham age-herald. [volume] (Birmingham Al...,sn84020639,4066065,2692-4226,423,July 1 1894,Oct. 3 1895,https://chroniclingamerica.loc.gov/lccn/sn8402...,http://dbpedia.org/resource/Birmingham%2C_Alabama,-86.809444,33.5175
4,https://chroniclingamerica.loc.gov/lccn/sn8504...,Alabama,Birmingham state herald. (Birmingham Ala.) 189...,sn85044812,12283890,2692-4250,570,Oct. 4 1895,July 31 1897,https://chroniclingamerica.loc.gov/lccn/sn8504...,http://dbpedia.org/resource/Birmingham%2C_Alabama,-86.809444,33.5175


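And there you can see, I've got the digitized newspapers in Chron Am along with their lats/longs. Since both merges are left merges, any paper without a matching dbpedia link or place entry stays in the dataframe with empty lon/lat values.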

In [7]:
seguin_rigby = pd.read_csv('seguin_rigby_data_black_subset_02.csv')
seguin_rigby.head()

Unnamed: 0.1,Unnamed: 0,caseid,year,month,day,full_fips,state,state_fips,county,county_fips,...,race,alleged_offense,lynch_method,composition_of_mob,source_of_record,confirming_document,search_url,place,latitude,longitude
0,0,2740,1883,7,6,17153,IL,17,PULASKI,153,...,Black,Murder,Shot then hanged,25,Chicago Daily Tribune,Dixon Sun,https://chroniclingamerica.loc.gov/search/page...,"mound city, IL",37.085329,-89.162573
1,1,2795,1883,2,20,18019,IN,18,CLARK,19,...,Black,Sexual assault,Hanged,Mob,Chicago Daily Tribune,Logansport Journal,https://chroniclingamerica.loc.gov/search/page...,"sellersburg, IN",38.397972,-85.755073
2,2,4169,1883,6,30,29163,MO,29,PIKE,163,...,Black,Unknown,Hanged,Mob,Chicago Daily Tribune,"Marion County Herald/ July 6, 1883",https://chroniclingamerica.loc.gov/search/page...,"bowling green, MO",39.341989,-91.195144
3,3,6745,1883,6,26,48067,TX,48,CASS,67,...,Black,Rape,Hanged,500,Chicago Daily Tribune,"Galveston Daily News 6-23-1883, 6-34-1883",https://chroniclingamerica.loc.gov/search/page...,"jefferson, TX",29.834772,-94.17045
4,4,6746,1883,6,27,48067,TX,48,CASS,67,...,Black,Rape,Hanged,500,Chicago Daily Tribune,Galveston Daily News 6-28-1883,https://chroniclingamerica.loc.gov/search/page...,"jefferson, TX",29.834772,-94.17045


### 3. The Haversine Formula Function:

In [8]:
def haversine_calculation(longitude_1, latitude_1, longitude_2, latitude_2):
    # earth's mean radius in miles, according to Google
    earth_radius = 3958.8
    # You've got to convert the latitudes and longitudes from degrees to radians, because NumPy's trig functions expect radians.
    longitude_1, latitude_1, longitude_2, latitude_2 = map(np.radians, [longitude_1, latitude_1, longitude_2, latitude_2])
    # then calculate the differences between both lats and longs
    longitude_distance = longitude_2 - longitude_1
    latitude_distance = latitude_2 - latitude_1
    # a is the haversine of the angle between the two points, measured at the center of the earth
    a = np.sin(latitude_distance/2)**2 + np.cos(latitude_1) * np.cos(latitude_2) * np.sin(longitude_distance/2)**2
    # c is that angle itself, in radians
    c = 2 * np.arcsin(np.sqrt(a))
    # distance along the surface = radius * angle
    distance = earth_radius * c
    return distance
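
Before trusting the distances, it helps to sanity-check the formula against values that can be worked out by hand. The sketch below re-implements the same calculation under a different name (`haversine_miles` is my own name for illustration) so that it runs on its own. It checks that a point is zero miles from itself and that one degree of latitude along a meridian comes out to radius × π/180.

```python
import numpy as np

def haversine_miles(lon_1, lat_1, lon_2, lat_2, radius=3958.8):
    # same steps as haversine_calculation(): degrees to radians, then the Haversine formula
    lon_1, lat_1, lon_2, lat_2 = (np.radians(v) for v in (lon_1, lat_1, lon_2, lat_2))
    a = np.sin((lat_2 - lat_1) / 2) ** 2 + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2
    return radius * 2 * np.arcsin(np.sqrt(a))

# a point compared with itself should be 0 miles away
same_point = haversine_miles(-94.0, 33.4, -94.0, 33.4)

# one degree of latitude on the same meridian should equal radius * pi / 180
one_degree_north = haversine_miles(-94.0, 33.0, -94.0, 34.0)
expected_one_degree = 3958.8 * np.pi / 180

print(same_point, one_degree_north, expected_one_degree)
```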

### 4. Using haversine_calculation() to Find Matches Within 10 Miles of Each Other

In [9]:
matches = []

for idx, row in newspapers.iterrows():
    longitude_1, latitude_1 = row['lon'], row['lat']
    
    # haversine_calculation() works on whole columns at once, so there is no need to apply it row by row
    within_ten_miles = haversine_calculation(longitude_1, latitude_1, seguin_rigby['longitude'], seguin_rigby['latitude']) <= 10
    
    if within_ten_miles.any():
        matching_rows = seguin_rigby[within_ten_miles]
        
        for _, match in matching_rows.iterrows():
            match_data = row.to_dict()
            match_data.update({
                'year': match['year'],
                'month': match['month'],
                'day': match['day'],
                'victim': match['victim'],
                'place': match['place'],
                'lynching_latitude': match['latitude'],
                'lynching_longitude': match['longitude']
            })
            matches.append(match_data)

nearby_papers_cases = pd.DataFrame(matches)

nearby_papers_cases.head()

Unnamed: 0,Persistent Link,State,Title,LCCN,OCLC,ISSN,No. of Issues,First Issue Date,Last Issue Date,More Info,coverage,lon,lat,year,month,day,victim,place,lynching_latitude,lynching_longitude
0,https://chroniclingamerica.loc.gov/lccn/sn8905...,Arkansas,Daily Texarkana Democrat. (Texarkana Ark.) 189...,sn89051301,20120001,,465,July 18 1892,Feb. 1 1894,https://chroniclingamerica.loc.gov/lccn/sn8905...,http://dbpedia.org/resource/Texarkana%2C_Arkansas,-94.020556,33.433056,1906,10,8,anthony davis,"texarkana, TX",33.446674,-94.077148
1,https://chroniclingamerica.loc.gov/lccn/sn8609...,Arkansas,The Daily Texarkanian. [volume] (Texarkana Ark...,sn86090500,14985064,,1935,Feb. 2 1894,Aug. 8 1900,https://chroniclingamerica.loc.gov/lccn/sn8609...,http://dbpedia.org/resource/Texarkana%2C_Arkansas,-94.020556,33.433056,1906,10,8,anthony davis,"texarkana, TX",33.446674,-94.077148
2,https://chroniclingamerica.loc.gov/lccn/sn9105...,Colorado,The Bessemer indicator. (Bessemer Colo.) 18??-...,sn91052321,23245382,2693-3640,98,Jan. 7 1893,Nov. 17 1894,https://chroniclingamerica.loc.gov/lccn/sn9105...,http://dbpedia.org/resource/Pueblo%2C_Colorado,-104.620278,38.266944,1900,5,22,calvin kimblern,"pueblo, CO",38.263995,-104.614187
3,https://chroniclingamerica.loc.gov/lccn/sn8905...,Colorado,Cheyenne record. (Cheyenne Wells Cheyenne Coun...,sn89052329,20790560,2578-1332,403,Sept. 18 1913,June 30 1921,https://chroniclingamerica.loc.gov/lccn/sn8905...,http://dbpedia.org/resource/Cheyenne_Wells%2C_...,-102.354,38.8211,1888,4,0,franklin baker,"cheyenne wells, CO",38.821395,-102.353243
4,https://chroniclingamerica.loc.gov/lccn/sn8905...,Colorado,Cheyenne Wells record. (Cheyenne Wells Cheyenn...,sn89052330,20790667,2578-1383,78,July 7 1921,Dec. 28 1922,https://chroniclingamerica.loc.gov/lccn/sn8905...,http://dbpedia.org/resource/Cheyenne_Wells%2C_...,-102.354,38.8211,1888,4,0,franklin baker,"cheyenne wells, CO",38.821395,-102.353243
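
The loop above compares one newspaper at a time against every lynching. As an alternative, here is a sketch that computes every newspaper-to-lynching distance in one go with NumPy broadcasting. It is not what I ran above, just a possible speed-up. The function name `pairwise_haversine_miles` is hypothetical, and the coordinates are the Pueblo and Birmingham values that appear in the tables above.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8

def pairwise_haversine_miles(lon_a, lat_a, lon_b, lat_b):
    # column vectors for set A, row vectors for set B, so the arithmetic broadcasts to a grid
    lon_a = np.radians(np.asarray(lon_a, dtype=float))[:, None]
    lat_a = np.radians(np.asarray(lat_a, dtype=float))[:, None]
    lon_b = np.radians(np.asarray(lon_b, dtype=float))[None, :]
    lat_b = np.radians(np.asarray(lat_b, dtype=float))[None, :]
    a = np.sin((lat_b - lat_a) / 2) ** 2 + np.cos(lat_a) * np.cos(lat_b) * np.sin((lon_b - lon_a) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# two newspapers (Pueblo, Birmingham) and one lynching location (Pueblo)
paper_lons, paper_lats = [-104.620278, -86.809444], [38.266944, 33.5175]
case_lons, case_lats = [-104.614187], [38.263995]

distances = pairwise_haversine_miles(paper_lons, paper_lats, case_lons, case_lats)

# each (paper index, case index) pair within 10 miles
nearby_pairs = np.argwhere(distances <= 10)
print(distances)
print(nearby_pairs)
```

The result has one row per newspaper and one column per lynching. Missing coordinates produce NaN distances, which never pass the `<= 10` test.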


In [10]:
len(nearby_papers_cases)

745

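nearby_papers_cases contains 745 rows. Each row pairs one newspaper with one lynching that occurred within a 10-mile radius of it, so a paper near several lynchings (or a lynching near several papers) appears more than once.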
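
To separate the number of pairs from the number of distinct papers and distinct cases, something like the following could be used. This is a sketch on a toy dataframe built from the first few rows shown above. Treating victim + place + year as the identifier for a case is my assumption here (the caseid column would be better if it were carried over).

```python
import pandas as pd

# toy version of nearby_papers_cases, using values from the head() output above
pairs = pd.DataFrame({
    'LCCN': ['sn89051301', 'sn86090500', 'sn89052329', 'sn89052330'],
    'victim': ['anthony davis', 'anthony davis', 'franklin baker', 'franklin baker'],
    'place': ['texarkana, TX', 'texarkana, TX', 'cheyenne wells, CO', 'cheyenne wells, CO'],
    'year': [1906, 1906, 1888, 1888],
})

# distinct newspapers among the pairs
unique_papers = pairs['LCCN'].nunique()

# distinct lynching cases among the pairs (assumed key: victim + place + year)
unique_cases = len(pairs.drop_duplicates(subset=['victim', 'place', 'year']))

print(len(pairs), unique_papers, unique_cases)
```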

### 5. Deducing Whether Papers Have Digitized Pages Overlapping with Lynching Date

In [11]:
# removing periods in the date columns
nearby_papers_cases['First Issue Date'] = nearby_papers_cases['First Issue Date'].str.replace(r'\.', '', regex=True).str.strip()
nearby_papers_cases['Last Issue Date'] = nearby_papers_cases['Last Issue Date'].str.replace(r'\.', '', regex=True).str.strip()

In [13]:
# converting month abbreviations to their numeral representations. I needed to do this to make the datetime function in Pandas more accurate.
month_codes = {'Jan': '01', 'Feb': '02', 'March': '03', 'April': '04', 'May': '05', 'June': '06', 'July': '07', 'Aug': '08', 'Sept': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

for month, num in month_codes.items():
    nearby_papers_cases['First Issue Date'] = nearby_papers_cases['First Issue Date'].str.replace(month, num, regex=False)
    nearby_papers_cases['Last Issue Date'] = nearby_papers_cases['Last Issue Date'].str.replace(month, num, regex=False)

In [14]:
# more datetime conversions
nearby_papers_cases['First Issue Date'] = pd.to_datetime(nearby_papers_cases['First Issue Date'], errors='coerce')
nearby_papers_cases['Last Issue Date'] = pd.to_datetime(nearby_papers_cases['Last Issue Date'], errors='coerce')
nearby_papers_cases['First Issue Date'] = nearby_papers_cases['First Issue Date'].dt.strftime('%Y-%m-%d')
nearby_papers_cases['Last Issue Date'] = nearby_papers_cases['Last Issue Date'].dt.strftime('%Y-%m-%d')
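
The three cells above clean the date strings in stages. Here is a sketch of a more explicit alternative that parses each string in one step and returns NaT for anything that doesn't fit the 'Month Day Year' pattern. The helper name `parse_issue_date` is hypothetical, and the month spellings are the ones I saw in the Chron Am list.

```python
import pandas as pd

# month tokens as they appear in the Chron Am list once periods are removed
MONTH_NUMBERS = {'Jan': 1, 'Feb': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6,
                 'July': 7, 'Aug': 8, 'Sept': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

def parse_issue_date(text):
    # turn a string like 'Aug. 1 1897' into a Timestamp, or NaT if it does not fit
    try:
        month_name, day, year = str(text).replace('.', '').split()
        return pd.Timestamp(year=int(year), month=MONTH_NUMBERS[month_name], day=int(day))
    except (ValueError, KeyError):
        return pd.NaT

raw_dates = pd.Series(['Aug. 1 1897', 'Sept. 18 1913', 'July 1 1894', 'not a date'])
parsed_dates = pd.to_datetime(raw_dates.map(parse_issue_date))
print(parsed_dates)
```

Keeping the result as a datetime column (rather than formatting it back to text) means it can be compared directly with the lynching dates later.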

In [15]:
# I noticed in the Seguin-Rigby data, sometimes the day of the lynching is listed as 0. This is probably because they weren't able to deduce the exact date, just the month/year. In these cases where the day is listed as 0, I'm just adding a 1 so it can be properly cross-verified in our issue date ranges for newspapers.

nearby_papers_cases['day'] = nearby_papers_cases['day'].replace(0, 1)

nearby_papers_cases['lynch_date'] = nearby_papers_cases['year'].astype(str) + '-' + nearby_papers_cases['month'].astype(str) + '-' + nearby_papers_cases['day'].astype(str)
nearby_papers_cases['lynch_date'] = pd.to_datetime(nearby_papers_cases['lynch_date'], errors='coerce')
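
Setting an unknown day to 1 means a paper that started publishing later in that same month would be missed. One possible refinement, sketched below on toy data, is to treat an unknown day as a window covering the whole month and a known day as a one-day window. The names `window_start` and `window_end` are my own, and this is not what the cell above does.

```python
import pandas as pd

# toy cases from the tables above: one with an unknown day (0), one with a known day
cases = pd.DataFrame({'year': [1888, 1906], 'month': [4, 10], 'day': [0, 8]})
day_known = cases['day'] > 0

# start of the window: the known day, or the 1st when the day is unknown
window_start = pd.to_datetime(pd.DataFrame({
    'year': cases['year'],
    'month': cases['month'],
    'day': cases['day'].where(day_known, 1),
}))

# end of the window: same day when known, otherwise the last day of that month
window_end = window_start.where(day_known, window_start + pd.offsets.MonthEnd(0))

print(pd.DataFrame({'window_start': window_start, 'window_end': window_end}))
```

A paper would then count as covering the case if its first issue date is on or before `window_end` and its last issue date is on or after `window_start`.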

In [16]:
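# the issue date columns were formatted back to strings above, so parse them again here to compare dates with dates
first_issue = pd.to_datetime(nearby_papers_cases['First Issue Date'], errors='coerce')
last_issue = pd.to_datetime(nearby_papers_cases['Last Issue Date'], errors='coerce')

lynch_town_paper_subset = nearby_papers_cases[(nearby_papers_cases['lynch_date'] >= first_issue) & (nearby_papers_cases['lynch_date'] <= last_issue)]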
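
The same date-range check can also be expressed with `Series.between`, which is inclusive on both ends by default. Here is a small sketch on toy data. The first two rows use dates from the tables above, and the third row is a made-up paper whose run does cover the lynching date.

```python
import pandas as pd

pairs = pd.DataFrame({
    'first_issue': pd.to_datetime(['1892-07-18', '1893-01-07', '1900-01-01']),
    'last_issue': pd.to_datetime(['1894-02-01', '1894-11-17', '1910-12-31']),
    'lynch_date': pd.to_datetime(['1906-10-08', '1900-05-22', '1906-10-08']),
})

# True where the lynching date falls inside the paper's digitized run
covered = pairs['lynch_date'].between(pairs['first_issue'], pairs['last_issue'])
covered_pairs = pairs[covered]
print(covered_pairs)
```

Any row with a missing date (NaT) on either side comes out as False, so it drops out of the subset rather than raising an error.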

### 6. The Map

In [17]:
# named lynching_map rather than map so it doesn't overwrite Python's built-in map(), which haversine_calculation() uses
lynching_map = folium.Map(location=[39.8283, -98.5795], tiles="Cartodb Positron", zoom_start=4)

# markers for the newspapers - they are wider and black with a radius of 10 miles (about 16093 meters)
for idx, row in lynch_town_paper_subset.iterrows():
    popup_text = (
        f"<strong>Title:</strong> {row['Title']}<br>"
        f"<strong>First Issue Date:</strong> {row['First Issue Date']}<br>"
        f"<strong>Last Issue Date:</strong> {row['Last Issue Date']}<br>"
        f"<strong>No. of Issues:</strong> {row['No. of Issues']}"
    )
    folium.Circle(
        location=[row['lat'], row['lon']],
        radius=16093,
        color='black',
        popup=folium.Popup(popup_text, max_width=300)
    ).add_to(lynching_map)

# markers for lynchings - they are red dots
for idx, row in lynch_town_paper_subset.iterrows():
    if not pd.isna(row['lynching_latitude']) and not pd.isna(row['lynching_longitude']):
        lynch_date_formatted = row['lynch_date'].strftime('%Y-%m-%d')
        popup_text = (
            f"<strong>Victim:</strong> {row['victim']}<br>"
            f"<strong>Lynch Date:</strong> {lynch_date_formatted}<br>"
            f"<strong>Nearest Paper Coverage:</strong> {row['Title']}"
        )
        folium.CircleMarker(
            location=[row['lynching_latitude'], row['lynching_longitude']],
            radius=4,
            fill=True,
            fill_opacity=1,
            color='darkred',
            popup=folium.Popup(popup_text, max_width=300)
        ).add_to(lynching_map)

lynching_map

In [18]:
lynching_map.save('lynch_town_paper_map.html')