

## **Newspaper Coverage of the German-American Bund**
For information about the ProQuest queries used to generate these datasets, see the [project GitHub page](https://github.com/brendanwilliam/germanamericanbund-news-research.git).

In [87]:
# Importing libraries and loading the ProQuest exports
import pandas as pd
import altair as alt
fulldf = pd.read_csv('https://raw.githubusercontent.com/brendanwilliam/germanamericanbund-news-research/main/data/gab-newsdata-1933to1940.csv')
kuhndf = pd.read_csv('https://raw.githubusercontent.com/brendanwilliam/germanamericanbund-news-research/main/data/gabkuhn-newsdata-1933to1940.csv')

# Adding an "isKuhn" column to the "kuhndf" data frame
kuhndf['isKuhn'] = True

# Merging "kuhndf" with "fulldf"
df = pd.merge(fulldf, kuhndf, how='left', on=['Title', 'Abstract', 'documentType', 
                                              'placeOfPublication', 'pubdate', 'pubtitle',
                                              'year', 'DocumentURL', 'startPage'])

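# Articles with no match in "kuhndf" get NaN in "isKuhn"; changing them to False
df['isKuhn'] = df['isKuhn'].fillna(False).astype(bool)

# Removing parenthesized years from "pubtitle" (a raw string avoids invalid-escape
# warnings) and trimming the trailing space left behind
df['pubtitle'] = df['pubtitle'].str.replace(r'\([^()]*\)', '', regex=True).str.strip()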

# Creating a column of "datetime" type
df['date'] = pd.to_datetime(df['pubdate'])

# Creating a column "total" for later computing rolling averages
df['total'] = 1

# Influential events and dates (in chronological order)
dates = pd.DataFrame(
    {'date' : ['1936-03-29', '1936-08-02', '1938-03-01', '1938-06-01', '1938-09-03',
               '1939-02-20', '1939-05-02', '1939-08-16', '1939-11-09', '1939-12-05'],
     'event' : ['Kuhn elected Bundesleiter of German-American Bund',
                'Kuhn travels to Germany and meets Hitler',
                'Kuhn travels to Germany again, unlikely to have met significant Nazi officials. Marks beginning of more aggressive anti-Semitism from German-American Bund',
                'Martin Dies and legal authorities, informed by findings from prosecuting Al Capone, change tactics to address the German-American Bund',
                'German-American Bund national convention, aimed to recruit American citizens',
                'German-American Bund hosts a rally at Madison Square Garden with 20,000 attending',
                'German-American Bund financial records seized by New York District Attorney Thomas Dewey',
                'Kuhn testifies to the Dies Committee',
                'Kuhn is tried for embezzling funds related to the February Madison Square Garden rally',
                'Kuhn is found guilty and sentenced to 2.5 to 5 years in prison']
    }
)

# Converting date into a 'datetime' type
dates['date'] = pd.to_datetime(dates['date'])

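# Computing 14-day rolling averages of news coverage: count articles per
# calendar day (days without articles count as 0), then average over a
# trailing 14-day window. Only the numeric "total" column is aggregated.
df_bydate = df.groupby(pd.Grouper(key='date', freq='D'))[['total']].sum()
df_rolling = df_bydate.rolling(14).mean()
df_rolling = df_rolling.dropna()
df_rolling['date'] = df_rolling.index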

# Visualizing rolling average over time
articles_rolling = alt.Chart(df_rolling, title='Newspaper articles mentioning the German-American Bund').mark_area(color='#898989').encode(
    x=alt.X('date'),
    y=alt.Y('total', axis=alt.Axis(title='articles published (14-day rolling avg)')),
    tooltip=['date', 'total']
)

# Key dates throughout Kuhn years
keydates = alt.Chart(dates).mark_tick(size=400, thickness=3, opacity=0.9, color='#F48989').encode(
    x=alt.X('date'),
    tooltip=['date', 'event']
)

# 1939 News Data Wrangling (same steps, restricted to articles dated in 1939)
df_1939 = df.query("date > '1938-12-31' & date < '1940-01-01'")
df_bydate_1939 = df_1939.groupby(pd.Grouper(key='date', freq='D'))[['total']].sum()
df_rolling_1939 = df_bydate_1939.rolling(14).mean()
df_rolling_1939 = df_rolling_1939.dropna()
df_rolling_1939['date'] = df_rolling_1939.index

# 1939 Visualizing rolling average over time
articles_rolling_1939 = alt.Chart(df_rolling_1939, title='Newspaper articles mentioning the German-American Bund in 1939').mark_area(color='#898989').encode(
    x=alt.X('date'),
    y=alt.Y('total', axis=alt.Axis(title='articles published (14-day rolling avg)')),
    tooltip=['date', 'total']
)

# 1939 Date Data Wrangling
dates_1939 = dates.query("date > '1938-12-31'")
keydates_1939 = alt.Chart(dates_1939).mark_tick(size=400, thickness=5, opacity=0.9, color='#F48989').encode(
    x=alt.X('date'),
    tooltip=['date', 'event']
)


## **1936 to 1940 Newspaper Coverage of the German-American Bund**

Values are based on a 14-day rolling average to better assess the magnitude and frequency of coverage. Red bars mark key dates. Hover over elements for more details.
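A minimal sketch of how a 14-day rolling average of daily article counts behaves, on hypothetical article dates. It uses `resample('D')` on a date index, which bins by calendar day in the same way as grouping with `pd.Grouper(key='date', freq='D')`; empty days count as 0, so a 14-row window always covers 14 calendar days.

```python
import pandas as pd

# Hypothetical article dates; each row stands for one article, like a row of `df`.
articles = pd.DataFrame({'date': pd.to_datetime(
    ['1939-02-20', '1939-02-20', '1939-02-21', '1939-03-05'])})
articles['total'] = 1

# resample('D') bins by calendar day and keeps empty days as 0, so 14 rows
# span 14 calendar days (1939-02-20 through 1939-03-05 here).
daily = articles.set_index('date')['total'].resample('D').sum()

# Trailing 14-day mean: each value averages that day and the 13 days before it.
# The first 13 days have an incomplete window (NaN) and are dropped.
rolling_avg = daily.rolling(14).mean().dropna()

print(daily.sum(), len(daily))   # 4 articles over 14 calendar days
print(rolling_avg)
```

Because the window is trailing, a spike of coverage keeps raising the curve for up to two weeks after the event, so peaks sit slightly to the right of the red key-date bars. Passing `center=True` to `rolling` would center each window on its date instead.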

In [85]:
(articles_rolling + keydates).properties(height=400, width=1200)

## **Only 1939 Newspaper Coverage of the German-American Bund**

Values are based on a 14-day rolling average to better assess the magnitude and frequency of coverage. Red bars mark key dates. Hover over elements for more details.
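The 1939 chart filters articles to 1939 before computing the rolling window, so January 1 to 13 have an incomplete 14-day window and are dropped by `dropna()`. A small sketch comparing the two orderings, assuming hypothetical constant counts of one article per day:

```python
import pandas as pd

# Hypothetical constant counts: one article per day from mid-December 1938.
days = pd.date_range('1938-12-15', '1939-01-31', freq='D')
daily = pd.Series(1.0, index=days)

# Subsetting to 1939 first (as the 1939 cell does): January 1-13 lack a full
# 14-day window, so dropna() removes them.
subset_then_roll = daily.loc['1939-01-01':].rolling(14).mean().dropna()

# Rolling over the full series first, then subsetting: late-December counts
# complete the early-January windows, so the 1939 series starts on January 1.
roll_then_subset = daily.rolling(14).mean().dropna().loc['1939-01-01':]

print(subset_then_roll.index[0].date())  # 1939-01-14
print(roll_then_subset.index[0].date())  # 1939-01-01
```

Which ordering fits depends on the intent of the chart: subsetting first counts only 1939 articles, while rolling first lets late-1938 coverage contribute to the early-January values.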

In [86]:
(articles_rolling_1939 + keydates_1939).properties(height=400, width=1200)

## **1939 Newspaper Data Breakdown**
Articles published in 1939 are grouped by several attributes: document type, newspaper, place of publication, and whether they mention Fritz Kuhn.
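The `documentType` counts in the output below mix two labeling styles for the same categories (`Article`/`article`, `Front Page`/`front_page`, `Letter To The Editor`/`letter_to_editor`). A sketch of one way to merge them, working directly on the printed counts; `normalize_label` and its rules are a hypothetical cleanup, not part of the original notebook.

```python
import pandas as pd

# Counts copied from the 1939 "Article type" output in this section.
type_counts = pd.Series({
    'Article': 2, 'Front Page': 1, 'Letter To The Editor': 1,
    'article': 374, 'banner': 1, 'editorial_article': 24,
    'front_page': 164, 'letter_to_editor': 7, 'obituary': 1,
})

def normalize_label(label: str) -> str:
    """Hypothetical cleanup rule: trim, lowercase, join words with '_', drop 'the'."""
    words = label.strip().lower().split()
    return '_'.join(word for word in words if word != 'the')

# Group the counts by their normalized label and add up the duplicates.
merged = type_counts.groupby(normalize_label).sum()
print(merged)
```

This yields 376 articles, 165 front pages, and 8 letters to the editor, still 575 in total. On the raw data the same function could be applied before grouping, for example `news_1939['documentType'].map(normalize_label)`, assuming every value is a non-missing string.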

In [88]:
# Investigating news coverage during 1939
news_1939 = df.query("date > '1938-12-31' & date < '1940-01-01'").sort_values('date')

# Grouping by different attributes
num_articles = news_1939.shape[0]
articletype_dist = news_1939.groupby(by='documentType').size()
pub_dist = news_1939.groupby(by='pubtitle').size()
loc_dist = news_1939.groupby(by='placeOfPublication').size()
kuhn_dist = news_1939.groupby(by='isKuhn').size()

print("Articles written in 1939:", num_articles)
print("\nArticle type:", articletype_dist)
print("\nMentioning Fritz Kuhn:", kuhn_dist)
print("\n\nDistribution by newspaper:", pub_dist)
print("\n\nDistribution by location:", loc_dist)

Articles written in 1939: 575

Article type: documentType
 Article                   2
 Front Page                1
 Letter To The Editor      1
 article                 374
 banner                    1
 editorial_article        24
 front_page              164
 letter_to_editor          7
 obituary                  1
dtype: int64

Mentioning Fritz Kuhn: isKuhn
False    147
True     428
dtype: int64


Distribution by newspaper: pubtitle
Atlanta Daily World                 1
Chicago Daily Tribune              17
Daily Boston Globe                 25
Jewish Advocate                    20
Los Angeles Times                  33
New York Herald Tribune           116
New York Times                     93
South China Morning Post            7
The American Israelite              7
The Atlanta Constitution           31
The Austin American                 1
The Austin Statesman               17
The Christian Science Monitor      39
The Globe and Mail                 23
The Hartford Courant        

## **Analyzing three key timeframes**

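The following three timeframes are based on the three greatest peaks in German-American Bund newspaper coverage in 1939 (the third extends into early 1940):

1. February 20, 1939 - May 19, 1939: the three months after the Madison Square Garden rally
2. August 9, 1939 - November 8, 1939: the three months before Kuhn's trial
3. November 9, 1939 - February 8, 1940: the three months during and after the trial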
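The cell below selects each window with strict `>`/`<` comparisons one day outside the listed dates. A sketch of the same windows written with inclusive bounds via `Series.between`, so the code reads like the list; the `timeframes` dict and `slice_timeframe` helper are hypothetical, and the check assumes `date` values carry no time-of-day component.

```python
import pandas as pd

# Hypothetical lookup table of the three windows listed above (inclusive bounds).
timeframes = {
    'after MSG rally': ('1939-02-20', '1939-05-19'),
    'before trial': ('1939-08-09', '1939-11-08'),
    'during/after trial': ('1939-11-09', '1940-02-08'),
}

def slice_timeframe(frame, start, end):
    """Rows whose 'date' falls on or between start and end (both inclusive)."""
    mask = frame['date'].between(pd.Timestamp(start), pd.Timestamp(end))
    return frame[mask].sort_values('date')

# Tiny hypothetical frame with dates on and just outside the window edges.
sample = pd.DataFrame({'date': pd.to_datetime(
    ['1939-02-19', '1939-02-20', '1939-05-19', '1939-05-20', '1939-11-09'])})
for name, (start, end) in timeframes.items():
    print(name, len(slice_timeframe(sample, start, end)))
```

Windows 2 and 3 are adjacent (November 8 and November 9), so no article is counted in both.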

In [89]:
# Three roughly 3-month timeframes of coverage (strict bounds, one day outside each window)
rally_3M = df.query("date > '1939-02-19' & date < '1939-05-20'").sort_values('date')
before_3M = df.query("date > '1939-08-08' & date < '1939-11-09'").sort_values('date')
after_3M = df.query("date > '1939-11-08' & date < '1940-02-09'").sort_values('date')

# Total size
rally_size = rally_3M.shape[0]
before_size = before_3M.shape[0]
after_size = after_3M.shape[0]
print("Articles published 3 months after MSG rally:", rally_size)
print("Articles published 3 months before trial:", before_size)
print("Articles published 3 months during/after trial:", after_size)

# isKuhn split
kuhn_rally = rally_3M.groupby(by='isKuhn').size()
kuhn_before = before_3M.groupby(by='isKuhn').size()
kuhn_after = after_3M.groupby(by='isKuhn').size()
print("\n3M after MSG", kuhn_rally)
print("\n3M before trial", kuhn_before)
print("\n3M after trial", kuhn_after)

Articles published 3 months after MSG rally: 149
Articles published 3 months before trial: 174
Articles published 3 months during/after trial: 175

3M after MSG isKuhn
False    73
True     76
dtype: int64

3M before trial isKuhn
False     39
True     135
dtype: int64

3M after trial isKuhn
False     12
True     163
dtype: int64

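The `isKuhn` splits are easier to compare as shares of each period's articles. A quick calculation from the printed counts above (nothing is recomputed from the data); in the notebook itself, `after_3M['isKuhn'].value_counts(normalize=True)` would give the same proportions as fractions.

```python
# (articles mentioning Kuhn, all articles), copied from the outputs above.
counts = {
    'all of 1939': (428, 575),
    '3 months after MSG rally': (76, 149),
    '3 months before trial': (135, 174),
    '3 months during/after trial': (163, 175),
}

# Percentage of articles in each period flagged isKuhn == True.
shares = {label: round(100 * kuhn / total, 1)
          for label, (kuhn, total) in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share}% mention Kuhn")
```

The share of Kuhn-mentioning articles rises from about 51% after the rally to about 78% before the trial and about 93% during and after it, compared with about 74% across all of 1939.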