# Lynching Town Newspapers

This notebook outlines my steps in finding newspapers from towns where lynchings occurred and scraping Chronicling America (Chron Am) to build datasets of those newspapers.

In [1]:
import pandas as pd
from geopy.geocoders import Nominatim
import folium
import time
import numpy as np
import requests
from bs4 import BeautifulSoup
from datetime import timedelta
from datetime import datetime

### 1) Find localities in our data where lynchings occurred.

In [2]:
seguin_rigby_df = pd.read_csv('seguin_rigby_data_black_subset_02.csv')

I create a 'place' column in the data so we can review city and state at once.

In [9]:
seguin_rigby_df['place'] = seguin_rigby_df['city'] + ', ' + seguin_rigby_df['state']

To review the number of lynchings per place in our data, I use .value_counts(). I thought this would be the best way to find lynching towns, but not so much. Still, it's helpful to see the relative rates by place.

In [4]:
seguin_rigby_df['place'].value_counts()

place
marshall, TX       8
paris, TX          7
jefferson, TX      7
hemphill, TX       6
dodge, TX          6
                  ..
brownsville, TN    1
fairfield, TX      1
gate city, VA      1
elkhorn, WV        1
fort worth, TX     1
Name: count, Length: 327, dtype: int64

### 2) Geolocate lynching towns.

I used geopy and Nominatim to find lat/long data for towns. The results are mostly good, but a fair number of places are mis-located (see map below). Nominatim doesn't seem to be as accurate as Chron Am's place metadata, but I don't know exactly how to cross-reference that metadata with the lynchings. These libraries work fine for now.

In [11]:
geolocator = Nominatim(user_agent='its_me')

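def get_lat_long(place):
    """Return (latitude, longitude) for a place string, or (None, None) on failure."""
    try:
        location = geolocator.geocode(place)
    except Exception:
        # Network errors, timeouts, etc. are treated as "not found"
        return None, None
    finally:
        # Pause after every lookup so requests to Nominatim are spaced out
        time.sleep(1)
    if location:
        return location.latitude, location.longitude
    return None, None

seguin_rigby_df['latitude'], seguin_rigby_df['longitude'] = zip(*seguin_rigby_df['place'].apply(get_lat_long))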
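
Many rows share the same place ('marshall, TX' appears 8 times in the counts above), so one option is to geocode each unique place once and pause between lookups. The sketch below is a minimal illustration, assuming a `geocode_fn` callable that takes a place string and returns a `(latitude, longitude)` tuple, such as `get_lat_long`. The name `geocode_unique` and its parameters are hypothetical, not part of geopy.

```python
import time


def geocode_unique(places, geocode_fn, delay_seconds=1.0):
    """Geocode each distinct place once, pausing between lookups.

    geocode_fn is any callable mapping a place string to a
    (latitude, longitude) tuple.
    """
    cache = {}
    for place in places:
        if place in cache:
            continue  # already looked up: no request, no pause
        cache[place] = geocode_fn(place)
        time.sleep(delay_seconds)  # space out requests to the geocoding service
    return cache


# Example with a stand-in geocoder (no network access needed).
def fake_geocoder(place):
    return (0.0, 0.0)


coords = geocode_unique(["marshall, TX", "paris, TX", "marshall, TX"],
                        fake_geocoder, delay_seconds=0)
```

With the real data, something like `seguin_rigby_df['place'].map(coords)` would then give one coordinate tuple per row. If the geocoding function already pauses internally, `delay_seconds` can be set to 0. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that may be worth a look for the same purpose.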

In [12]:
seguin_rigby_df

Unnamed: 0,caseid,year,month,day,full_fips,state,state_fips,county,county_fips,city,...,race,alleged_offense,lynch_method,composition_of_mob,source_of_record,confirming_document,search_url,place,latitude,longitude
0,2740,1883,7,6,17153,IL,17,PULASKI,153,mound city,...,Black,Murder,Shot then hanged,25,Chicago Daily Tribune,Dixon Sun,https://chroniclingamerica.loc.gov/search/page...,"mound city, IL",37.085329,-89.162573
1,2795,1883,2,20,18019,IN,18,CLARK,19,sellersburg,...,Black,Sexual assault,Hanged,Mob,Chicago Daily Tribune,Logansport Journal,https://chroniclingamerica.loc.gov/search/page...,"sellersburg, IN",38.397972,-85.755073
2,4169,1883,6,30,29163,MO,29,PIKE,163,bowling green,...,Black,Unknown,Hanged,Mob,Chicago Daily Tribune,"Marion County Herald/ July 6, 1883",https://chroniclingamerica.loc.gov/search/page...,"bowling green, MO",39.341989,-91.195144
3,6745,1883,6,26,48067,TX,48,CASS,67,jefferson,...,Black,Rape,Hanged,500,Chicago Daily Tribune,"Galveston Daily News 6-23-1883, 6-34-1883",https://chroniclingamerica.loc.gov/search/page...,"jefferson, TX",29.834772,-94.170450
4,6746,1883,6,27,48067,TX,48,CASS,67,jefferson,...,Black,Rape,Hanged,500,Chicago Daily Tribune,Galveston Daily News 6-28-1883,https://chroniclingamerica.loc.gov/search/page...,"jefferson, TX",29.834772,-94.170450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
444,New_Tx_Rigby_19,1921,10,12,48063,TX,48,CAMP,63,leesburg,...,Black,Assauting a child,Burned at stake,500,,Wise County Messenger 10-14-1921,https://chroniclingamerica.loc.gov/search/page...,"leesburg, TX",32.987623,-95.083827
445,7692,1921,11,26,48199,TX,48,HARDIN,199,sour lare,...,Black,Unknown,Hanged,300,NAACP Thirty Years of Lynching (1919-1946 Appe...,"Wichita Daily Times November 27, 1921",https://chroniclingamerica.loc.gov/search/page...,"sour lare, TX",,
446,7693,1921,11,30,48399,TX,48,RUNNELS,399,ballinger,...,Black,Unknown,Shot,Mob,NAACP Thirty Years of Lynching (1919-1946 Appe...,"The Courier-Gazette December 1, 1921 | The Eag...",https://chroniclingamerica.loc.gov/search/page...,"ballinger, TX",31.741105,-99.953070
447,7695,1921,12,11,48183,TX,48,GREGG,183,gladwater,...,Black,Unknown,Hanged,Mob,NAACP Thirty Years of Lynching (1919-1946 Appe...,"Wichita Daily Times December 13, 1921 | San An...",https://chroniclingamerica.loc.gov/search/page...,"gladwater, TX",32.633861,-97.343345
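
Some of the mis-located points may come from ambiguous town names, since the query only contains the city and state. One thing that might help is passing the county as well. As far as I know, geopy's `Nominatim.geocode` accepts a dictionary as a structured query (with keys such as `city`, `county`, `state`, `country`), but treat that as an assumption to verify against the geopy documentation. `build_structured_query` is a hypothetical helper, and the sketch only builds the query without sending it.

```python
def build_structured_query(row):
    """Build a structured geocoding query from one record.

    Expects a mapping with 'city', 'county', and 'state' keys, matching
    the column names in the lynching data.
    """
    return {
        "city": row["city"].title(),      # 'jefferson' -> 'Jefferson'
        "county": row["county"].title(),  # 'CASS' -> 'Cass'
        "state": row["state"],
        "country": "USA",
    }


query = build_structured_query(
    {"city": "jefferson", "county": "CASS", "state": "TX"}
)
# query could then be passed to geolocator.geocode(query)
```

A stricter query may also return no match at all when the recorded county or spelling is off, so it would make sense to fall back to the plain 'city, state' string in that case.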


Note to self: add updated seguin_rigby_data_black_subset_02.csv to GitHub.

In [22]:
seguin_rigby_df.to_csv('seguin_rigby_data_black_subset_02.csv', index=False)

I create a map of all the lynchings in our subset of the data:

In [19]:
map_start_point = [39.8283, -98.5795]

# Named lynching_map rather than map to avoid shadowing Python's built-in map()
lynching_map = folium.Map(location=map_start_point, tiles="Cartodb Positron", zoom_start=4)

for index, row in seguin_rigby_df.iterrows():

    # Skip rows the geocoder could not locate (pd.isna handles both None and NaN)
    if pd.isna(row['latitude']) or pd.isna(row['longitude']):
        continue

    tooltip = f"<div style='font-size: 11pt'>{row['victim']}</div>" \
              f"<div style='font-size: 11pt'>{row['place']}</div>" \
              f"<div style='font-size: 11pt'>{row['year']}</div>"

    folium.Circle(
        [row['latitude'], row['longitude']],
        tooltip=tooltip,
        color='darkred',
        radius=10
    ).add_to(lynching_map)

lynching_map

So, this was intriguing. Why are there virtually no lynchings in the Southern states? Seguin/Rigby's article claims the South was the central location of Black lynchings in their data, but in our subset, this is not the case. Is it because we subset by 1883 to 1921? And by "victim's name known"? I will have to look over the full Seguin/Rigby dataset again. Just seems strange...

You know what, I think this could be a significant and meaningful observation. If Southern lynchings typically did not leave records of victims' names, it says something about the attitude toward those victims. There's a certain level of humanity granted by naming the victim. Leaving the name out would be a significant erasure.
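
Before leaning on this observation, one way to check it would be to tabulate records by state, so the comparison rests on counts rather than on map points, some of which are mis-located. A minimal sketch, using made-up state codes as a stand-in for the `state` column:

```python
from collections import Counter

# Hypothetical state codes standing in for seguin_rigby_df['state'].
states = ["TX", "TX", "VA", "MO", "TX", "MD"]

counts = Counter(states)
total = sum(counts.values())

# Share of records per state, largest first.
shares = [(state, n / total) for state, n in counts.most_common()]
```

Running the same tabulation on the full Seguin/Rigby dataset and on this subset would show whether the date filter or the "victim's name known" filter accounts for the difference. On the real DataFrame, `seguin_rigby_df['state'].value_counts(normalize=True)` should give the same kind of table.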

### 3) Identify Lynching Towns with Newspapers

I haven't come up with a way to do this programmatically. I'd need the geolocations of all the newspapers in Chron Am along with their coverage dates. It could be done, but rather than spend time figuring it out, I've been looking for candidates by hand, cross-referencing my map of lynching incidents against Chron Am's map of digitized newspapers ([__found here__](https://loc.maps.arcgis.com/apps/instant/media/index.html?appid=3c6a392554d545bdb1c083348ef56458&center=-97.5126;39.6376&level=3)). I thought this would be easier than it is. The limitations of Chron Am's data reveal themselves here in ways you don't notice when looking at the data as rows or lists: the map makes you see the gaps in places where there were surely newspapers that have not been digitized or made available. As a result, it is surprisingly difficult to find towns where lynchings occurred AND where digitized newspaper data is available.

That being said, after a couple hours cross-referencing the maps by hand, I've found a few:

__Newspaper:__ Peninsula Enterprise, Accomac, VA <br>
Page: https://chroniclingamerica.loc.gov/lccn/sn94060041/ <br>
__Incident:__ Magruder Fletcher, Tasley, VA <br>
A candidate because there is coverage before, during, and after the incident (1889). Accomac and Tasley are only a couple miles apart. If the Peninsula Enterprise covered local issues at all, it would likely have covered happenings in Tasley. The Magruder Fletcher csv has 28 hits, too, making it a fairly widely reported incident.
<br>
<br>
__Newspaper:__ Maryland Independent, La Plata, MD <br>
Page: https://chroniclingamerica.loc.gov/lccn/sn85025407/ <br>
__Incident:__ Joseph Cocking, Port Tobacco, MD <br>
Another good candidate. There is coverage before, during, and after the incident (1896). La Plata and Port Tobacco are only about 3 miles from each other. The Joseph Cocking csv has 70 hits, making it a widely reported incident.
<br>
<br>
__Newspaper:__ Lexington Intelligencer, Lexington, MO <br>
Page: https://chroniclingamerica.loc.gov/lccn/sn86063623/ <br>
__Incident:__ Harry Gates, Lexington, MO <br>
Another good candidate. There's also the Weekly Intelligencer, which published the year before the Harry Gates murder (1902) but not the year of it. There are 101 hits in the Harry Gates csv, too. I'm going to use just the Lexington Intelligencer for now, but if you want to add the Weekly Intelligencer later, here it is: https://chroniclingamerica.loc.gov/lccn/sn93060416/
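
The manual cross-referencing above could, in principle, be sketched as a distance-plus-coverage filter. This assumes a table of newspapers with coordinates and first/last digitized years, which this notebook does not have. All field names, function names, and sample values below are hypothetical, the coordinates are approximate, and the coverage years are invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_papers(incident, papers, max_km=10, years_before=2, years_after=1):
    """Return papers near the incident whose run covers the surrounding years."""
    matches = []
    for paper in papers:
        distance = haversine_km(incident["lat"], incident["lon"],
                                paper["lat"], paper["lon"])
        covers = (paper["first_year"] <= incident["year"] - years_before
                  and paper["last_year"] >= incident["year"] + years_after)
        if distance <= max_km and covers:
            matches.append((paper["title"], round(distance, 1)))
    return matches


# Placeholder values for illustration only: coordinates are approximate and
# the coverage years are invented, not taken from Chron Am.
incident = {"lat": 38.51, "lon": -77.02, "year": 1896}
papers = [
    {"title": "Paper A (nearby, full coverage)", "lat": 38.53, "lon": -76.98,
     "first_year": 1890, "last_year": 1900},
    {"title": "Paper B (nearby, starts too late)", "lat": 38.53, "lon": -76.98,
     "first_year": 1896, "last_year": 1900},
    {"title": "Paper C (far away)", "lat": 39.18, "lon": -93.88,
     "first_year": 1880, "last_year": 1910},
]

matches = candidate_papers(incident, papers)
```

The hard part is not the filter but assembling the newspaper table, and mis-located incident coordinates would produce false matches or misses, so results would still need a check by hand.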


### 4) Build Lynching Town Newspaper Subsets

I scraped Chron Am for the above newspapers. The target was two years before the incident, the year of the incident, and the year after, though the ranges I actually pulled vary a little by paper (see the file names below). I chose each range after double-checking that the paper actually had digitized pages for the years surrounding the lynching. I'm not sure if a four-year range is adequate. It's based on nothing but vibes. If I need more data, I'll have to go back and get it, but that's okay.

I saved the results in a new directory called lynching_town_newspapers. The files are:

- peninsula_enterprise_1888-90.csv (associated with Magruder Fletcher)
- maryland_independent_1894-97.csv (associated with Joseph Cocking)
- lexington_intelligencer_1899-1903.csv (associated with Harry Gates)
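
The "two years before, the year of, and the year after" rule can be written as a small helper that turns an incident year into start and end dates. This is a sketch, and `scrape_window` is a hypothetical name. With the defaults it reproduces the 1894-97 range used for the Maryland Independent, while the other two files use slightly different spans, so the offsets are left as parameters.

```python
from datetime import datetime


def scrape_window(incident_year, years_before=2, years_after=1):
    """Return (start, end) datetimes spanning whole calendar years."""
    start = datetime(incident_year - years_before, 1, 1)
    end = datetime(incident_year + years_after, 12, 31)
    return start, end


start, end = scrape_window(1896)  # Joseph Cocking incident year
```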

To do this, I used scraping code I'd written last year as part of another project. It is different from the scrape_carefully() function I've used elsewhere in this project. It starts with setting parameters. It's important to double-check these details against Chron Am (e.g., how many pages each paper has, and whether your date range is actually covered). Then you can enter the parameters in these objects:

In [24]:
target_paper = {
    'sn86063623': ('Lexington Intelligencer', 'Lexington', 'MO', 'harry gates')
}

START_DATE = datetime(1899, 1, 1)
END_DATE = datetime(1903, 12, 31)
DATE_FORMAT = "%Y-%m-%d"
START_PAGE = 1
END_PAGE = 9  # exclusive upper bound for range(), so pages 1-8 are requested
iterating_date = START_DATE

Then create some necessary shells and functions:

In [25]:
df = pd.DataFrame(columns=['url', 'text', 'date', 'newspaper', 'city', 'state', 'victim_association'])

def pull_row(data, new_row):
    data.loc[len(data)] = new_row
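
Since the scrape can run for a long time, one alternative to holding every row in memory is to append each row to a CSV as it arrives, so a crash partway through does not lose earlier pages. This is a sketch using the standard library's `csv` module. `append_row`, the demo path, and the sample row are hypothetical.

```python
import csv
import os
import tempfile

COLUMNS = ['url', 'text', 'date', 'newspaper', 'city', 'state', 'victim_association']


def append_row(csv_path, new_row):
    """Append one scraped page to csv_path, writing the header if the file is new."""
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, 'a', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        if is_new_file:
            writer.writerow(COLUMNS)
        writer.writerow(new_row)


# Demonstration in a temporary directory with a made-up row.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_scrape.csv')
append_row(demo_path, ['https://example.org/page', 'some OCR text', '1902-01-04',
                       'Lexington Intelligencer', 'Lexington', 'MO', 'harry gates'])

with open(demo_path, newline='', encoding='utf-8') as handle:
    rows = list(csv.reader(handle))
```

The resulting file can be loaded afterwards with `pd.read_csv`, so the rest of the workflow stays the same.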

And finally, loop through the possible url combinations for the newspaper:

In [26]:
hold_up_wait = 10  # seconds to wait for each response before giving up

# Try every calendar day in the range; requests that do not return
# status 200 (e.g. dates with no issue) are skipped
while iterating_date <= END_DATE:

    formatted_date = iterating_date.strftime(DATE_FORMAT)

    for sn_code, (newspaper, city, state, victim) in target_paper.items():

        consistent_url = f'https://chroniclingamerica.loc.gov/lccn/{sn_code}/'

        # END_PAGE is exclusive, so this requests pages START_PAGE through END_PAGE - 1
        for page in range(START_PAGE, END_PAGE):

            url_string = f'{consistent_url}{formatted_date}/ed-1/seq-{page}/ocr/'

            print(url_string)

            try:

                pulled_data = requests.get(url_string, timeout=hold_up_wait)

                if pulled_data.status_code == 200:
                    soup = BeautifulSoup(pulled_data.content, 'lxml')
                    # Join the text of all <p> tags into one string per page
                    text_chunks = soup.find_all('p')
                    text = ' '.join([p.get_text() for p in text_chunks])
                    pull_row(df, [url_string, text, formatted_date, newspaper, city, state, victim])

            except requests.exceptions.Timeout:
                print(f"Timeout occurred for URL: {url_string}")
            except requests.exceptions.RequestException as e:
                print(f"An error occurred: {e}")

    iterating_date += timedelta(days=1)

https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-1/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-2/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-3/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-4/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-5/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-6/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-7/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-01/ed-1/seq-8/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-02/ed-1/seq-1/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-02/ed-1/seq-2/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-02/ed-1/seq-3/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-02/ed-1/seq-4/ocr/
https://chroniclingamerica.loc.gov/lccn/sn86063623/1899-01-02/ed
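
The URL pattern in the loop can be pulled into a small generator, which makes it easy to check how many requests a given set of parameters will produce before running the scrape. `page_urls` is a hypothetical helper that follows the same date and page logic as the loop.

```python
from datetime import datetime, timedelta


def page_urls(sn_code, start_date, end_date, start_page, end_page):
    """Yield one OCR URL per (date, page); end_page is excluded, as in range()."""
    base = f'https://chroniclingamerica.loc.gov/lccn/{sn_code}/'
    current = start_date
    while current <= end_date:
        date_part = current.strftime('%Y-%m-%d')
        for page in range(start_page, end_page):
            yield f'{base}{date_part}/ed-1/seq-{page}/ocr/'
        current += timedelta(days=1)


urls = list(page_urls('sn86063623', datetime(1899, 1, 1), datetime(1899, 1, 2), 1, 9))
```

For the full 1899-1903 range with 8 pages per day, this comes to 1,825 days × 8 pages = 14,600 candidate URLs, most of which will not correspond to a published issue.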

In [27]:
df

Unnamed: 0,url,text,date,newspaper,city,state,victim_association
0,https://chroniclingamerica.loc.gov/lccn/sn8606...,"I,emgton Intelligencer;.. tifrLEXINGTON, LAFAY...",1901-04-13,Lexington Intelligencer,Lexington,MO,harry gates
1,https://chroniclingamerica.loc.gov/lccn/sn8606...,"The Intelligencer.I. 0. NEALE,Editor and Propr...",1901-04-13,Lexington Intelligencer,Lexington,MO,harry gates
2,https://chroniclingamerica.loc.gov/lccn/sn8606...,Gable OM SUn4 Wnt of th Court Noqm.bs. Granula...,1901-04-13,Lexington Intelligencer,Lexington,MO,harry gates
3,https://chroniclingamerica.loc.gov/lccn/sn8606...,"WV i! IThe Intelligencer.I. O. NEALE,Editor aa...",1901-04-13,Lexington Intelligencer,Lexington,MO,harry gates
4,https://chroniclingamerica.loc.gov/lccn/sn8606...,4.rsTBIGST 0 ROF!efut1hat sets the Pace the wh...,1901-04-13,Lexington Intelligencer,Lexington,MO,harry gates
...,...,...,...,...,...,...,...
1110,https://chroniclingamerica.loc.gov/lccn/sn8606...,The Intelligencer.Subscription $1.00 Per YeakI...,1903-12-26,Lexington Intelligencer,Lexington,MO,harry gates
1111,https://chroniclingamerica.loc.gov/lccn/sn8606...,e OUNTY NEWSCorder Items.A 1-ih Corria is quit...,1903-12-26,Lexington Intelligencer,Lexington,MO,harry gates
1112,https://chroniclingamerica.loc.gov/lccn/sn8606...,"3!'ed wi'.h ('onire m., re i:.Hi' I r treR Utn...",1903-12-26,Lexington Intelligencer,Lexington,MO,harry gates
1113,https://chroniclingamerica.loc.gov/lccn/sn8606...,INKLEE I; i4 ZFURNITURE-'no HP A NTjtANUFACTUR...,1903-12-26,Lexington Intelligencer,Lexington,MO,harry gates
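
The result has 1,114 rows, so most of the requested dates had no issue. If a list of actual issue dates were available, the loop could visit only those. I believe Chron Am exposes per-title JSON that includes issue dates, but the field names used below (`issues`, `date_issued`) are assumptions to check against the site's API documentation. The sketch only filters an already-loaded dictionary and makes no requests. `issue_dates_in_range` is a hypothetical helper.

```python
from datetime import datetime


def issue_dates_in_range(title_json, start_date, end_date):
    """Return sorted issue dates (as 'YYYY-MM-DD' strings) within the range.

    title_json is assumed to look like
    {'issues': [{'date_issued': 'YYYY-MM-DD', ...}, ...]}.
    """
    dates = []
    for issue in title_json.get('issues', []):
        issued = datetime.strptime(issue['date_issued'], '%Y-%m-%d')
        if start_date <= issued <= end_date:
            dates.append(issue['date_issued'])
    return sorted(dates)


# Made-up sample in the assumed shape.
sample = {'issues': [{'date_issued': '1898-12-31'},
                     {'date_issued': '1901-04-13'},
                     {'date_issued': '1903-12-26'}]}

dates = issue_dates_in_range(sample, datetime(1899, 1, 1), datetime(1903, 12, 31))
```

Looping over `dates` instead of every calendar day would cut the number of requests to roughly the number of pages that actually exist.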


In [28]:
df.to_csv('lexington_intelligencer_1899-1903.csv')