<center><b><h2>Where were revolutions located between 1840 and 1860, based on article titles from Dutch newspapers?</h2></b></center>



---------------------------------------------------------------------------------------------------------------------------
The dataset was constructed by querying Delpher on the keyword 'revolutie' for the period 1840-1860. Since Delpher has no bulk download option, we've instead used [I-Analyzer](https://ianalyzer.hum.uu.nl) (developed by the [DH Lab at Utrecht University](https://dig.hum.uu.nl/)). This tool allows you to query and download Delpher search results for this period.

To plot the locations mentioned in the article titles, several steps are needed: Named Entity Recognition (NER), extracting the locations, geocoding them, and obtaining a map to plot them on. Here we've (mainly) used R for this, but Python would work as well.

Apart from having R installed, you'll also need a Python executable with [spaCy](https://spacy.io/usage) installed, along with the corresponding [Dutch language model](https://spacy.io/models/nl#nl_core_news_sm). If you opt for a Google Map, you'll also need a [Google API key for its Maps platform](https://developers.google.com/maps/faq). This is free as long as you don't exceed $200 in API calls a month (which usually doesn't happen).

---------------------------------------------------------------------------------------------------------------------------

First we load the required libraries:

In [32]:
library(data.table)
library(spacyr)
library(maps)
library(rasterVis)
library(raster)
library(ggmap)
library(ggplot2)
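
If one of the `library()` calls above fails, it can help to check up front which packages are missing. The snippet below is a minimal sketch of such a check; the variable names `required` and `missing_pkgs` are our own and not part of the original workflow.

```r
# Packages this notebook loads; the list mirrors the library() calls above
required <- c("data.table", "spacyr", "maps", "rasterVis",
              "raster", "ggmap", "ggplot2")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are available.")
}
```

Note that this only checks the R side; the Python executable with spaCy and the Dutch language model still need to be installed separately.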


Load the dataset (currently locally stored)

In [3]:
revol <- fread("C:\\Users\\Schal107\\Documents\\UBU\\Team DH\\Delpher\\dutchnewspapers-public_query=revolutie_date=1840-01-01 1860-12-31.csv")
setDT(revol)

Inspect the first row of the data and the column names. The output is long because the full content of the article is still included. We will not work with the content for now, since many article titles already mention a placename. As a next step we could, of course, include the article content and apply the same workflow as below.

In [35]:
head(revol[1])

date,issue_number,category,circulation,temporal,article_title,content,url,id,language,newspaper_title,ocr,pub_place,source
1851-12-30,106,artikel,Verenigde Staten,Dag,REVOLUTIE IN FRANKERIJK,"Met de stoomboot Europa werd over eenige j| dagen liet berigt te New-York aangebragt, dat;' President Louis Napoleon"""" de Wetgevende Iva-j. raer van Frankrijk lieeft uiteen gejaagd en president wensebt te worden voor den tijd van 10 ja- j: ren. In deze maand nog moesten de nieuwe al- j gemeene verkiezingen plaats grijpen. Er heeft iwederom bloed in de straten van Parijs gestroomd i cn de hoofdstad werd in staat van beleg verklaard, j Op onderscheidene plaatsen werden barricaden opgeworpen en op verschillende punten hevig gevochten. Twee leden der wetgevende vergadering zijn bij de barricaden gesneuveld, terwijl anderen gevangen zijn of bewaakt worden. President Napoleos beeft openlijk de teugels des be- j winds in handen genomen, en heeft het oog op de , pnpularitiet eener algemeene verkiezings-wet en : den sterken arm der militairen om hem te onder- â– steunen. Dit zal ongetwijfeld den aanvang eener ; revolutie zijn, verschrikkelijker in haren aard, geweldiger in haren invloed en gewigtiger in hare ] uitwerkingen, dan eene revolutie welke sedert < menige eeuw over Europa gewaaid heeft Aan- ( vankelijk schijnt Napoleon wel te slagen, echter kan de zege van het despotisme slechts van korten ; ( duur zijn. Frankrijk blijft niet lang in revolutie,', en het tegenwoordige is slechts een oproer, welke do teekenen der tijden ons zeggen te moeten voortduren tot dat Europa verlost, herboren en bevrijd is van het juk van dwingelandij en ver- ] drukking ! ,",http://resolver.kb.nl/resolve?urn=ddd:011021468:mpeg21:a0007,ddd:011021468:mpeg21:a0007,nl,De Sheboygan Nieuwsbode,79.5,,Roosevelt Study Center


In [7]:
colnames(revol)

Let's keep only the variables we want to continue working with, and inspect the first rows of this subset. You'll already notice the mention of "FRANKRIJK" (France) and "WEENEN" (Vienna) here.

In [4]:
x <- revol[,c("id", "date", "article_title", "url")]

In [7]:
head(x)

id,date,article_title,url
ddd:011021468:mpeg21:a0007,1851-12-30,REVOLUTIE IN FRANKERIJK,http://resolver.kb.nl/resolve?urn=ddd:011021468:mpeg21:a0007
ddd:011021468:mpeg21:a0037,1851-12-30,REVOLUTIE IN FRANKRIJK.,http://resolver.kb.nl/resolve?urn=ddd:011021468:mpeg21:a0037
ddd:010078958:mpeg21:a0001,1859-04-21,Frakrijk en de Revolutie.,http://resolver.kb.nl/resolve?urn=ddd:010078958:mpeg21:a0001
TEN:MMKB08:MMKB08:000088461:mpeg21:a0006,1853-08-10,(1) Ongeloof en Revolutie bladz. 55.,400337789
ddd:010081856:mpeg21:a0013,1850-07-18,VARIA. BALANS DER FEBRUARIJ REVOLUTIE.,http://resolver.kb.nl/resolve?urn=ddd:010081856:mpeg21:a0013
ddd:010067146:mpeg21:a0001,1848-10-23,"UTRECHT, 21 October. DE REVOLUTIE TE WEENEN.",http://resolver.kb.nl/resolve?urn=ddd:010067146:mpeg21:a0001


Not required for now, but the code below demonstrates how to extract the year from the date variable.

In [10]:
x[, date := as.Date(date)]
x[, year := as.numeric(substr(date, 1, 4))]

In [36]:
setDT(x)
x[, .N, list(year)][order(-year)][1:10]

year,N
1860,1290
1859,966
1858,355
1857,365
1856,542
1855,339
1854,487
1853,538
1852,497
1851,613
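
For readers who want to try the year extraction without the full dataset, here is a small base R sketch that uses only the six dates shown in the subset above. It uses `format()` with `"%Y"` instead of `substr()`, which is an alternative we swapped in for illustration, not the method used in this notebook.

```r
# The six dates from the first rows of the subset shown earlier
dates <- as.Date(c("1851-12-30", "1851-12-30", "1859-04-21",
                   "1853-08-10", "1850-07-18", "1848-10-23"))

# format() with "%Y" returns the four-digit year as text; convert to integer
years <- as.integer(format(dates, "%Y"))

# Count articles per year and sort from most recent to oldest
counts <- as.data.frame(table(year = years), stringsAsFactors = FALSE)
counts$year <- as.integer(counts$year)
counts <- counts[order(-counts$year), ]
print(counts)
```

The `Freq` column plays the same role as `N` in the data.table output above.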


For better Named Entity Recognition we will convert the article_title variable to lowercase, because spaCy tends to recognize uppercase strings as corporations.

In [10]:
x$article_title <- tolower(x$article_title)

Now we need to call on the [Dutch language model](https://spacy.io/models/nl/#nl_core_news_sm) from spaCy to perform NER. If you want to replicate this analysis, you need to have this model installed on your local machine. See the website for installation instructions.

In [12]:
spacy_initialize(model = "nl_core_news_sm")

Finding a python executable with spaCy installed...
spaCy (language model: nl_core_news_sm) is installed in C:\Users\Schal107\AppData\Local\Programs\Python\Python39\python.exe
successfully initialized (spaCy Version: 3.3.0, language model: nl_core_news_sm)
(python options: type = "python_executable", value = "C:\Users\Schal107\AppData\Local\Programs\Python\Python39\python.exe")


Below we pass the article_title to spaCy and perform NER. You'll see in the resulting table that it has recognized at least some locations.

In [17]:
parsedtxt <- spacy_parse(x$article_title, lemma = FALSE, entity = TRUE, nounphrase = TRUE)

In [18]:
locations <- entity_extract(parsedtxt)

In [19]:
setDT(locations)
top100 <- locations[entity_type == "GPE", .N, list(entity) ][order(-N)]

In [16]:
head(top100)

entity,N
frankrijk,1349
parijs,997
amsterdam,535
utrecht,304
londen,168
brussel,129
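
The titles shown earlier contain spelling variants such as "FRANKERIJK" and "Frakrijk", which would be counted separately from "frankrijk". The sketch below shows one possible way to merge such variants before counting. The toy table `ents` only imitates the shape of the `entity_extract()` output, the lookup table `variants` is hypothetical, and whether spaCy actually tags these variants as GPE is an assumption. It also replaces the underscores that `entity_extract()` uses by default to join multi-word entities, since a geocoder would probably handle spaces better.

```r
# Toy stand-in for the output of entity_extract(); rows are illustrative only
ents <- data.frame(
  entity      = c("frankrijk", "frankerijk", "frakrijk", "parijs",
                  "weenen", "new_york", "napoleon"),
  entity_type = c("GPE", "GPE", "GPE", "GPE", "GPE", "GPE", "PERSON"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: variant spelling -> preferred spelling
variants <- c(frankerijk = "frankrijk", frakrijk = "frankrijk")

# Keep only geopolitical entities, as in the notebook
gpe <- ents[ents$entity_type == "GPE", ]

# Multi-word entities are joined with "_"; restore the spaces
gpe$entity <- gsub("_", " ", gpe$entity, fixed = TRUE)

# Replace known variants by their preferred spelling
hit <- gpe$entity %in% names(variants)
gpe$entity[hit] <- unname(variants[gpe$entity[hit]])

# Count mentions per place, most frequent first
place_counts <- sort(table(gpe$entity), decreasing = TRUE)
print(place_counts)
```

A lookup table like this has to be built by hand from the data, so it is only practical for the most frequent variants.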


Before we can plot them on a map, we need to add coordinates to the placenames. We'll use the Google Maps API for that. Note that you will need your own key to do so. You can register for a free API key at Google.

In [20]:
google_key <- fread("C:\\Users\\Schal107\\Documents\\UBU\\Team DH\\Delpher\\google_key.txt")
register_google(key = paste0(google_key$key))
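
As an alternative to reading the key from a text file, you could keep it in an environment variable so that it never ends up in a shared notebook. This is a minimal sketch under that assumption; the variable name `GOOGLE_MAPS_API_KEY` is hypothetical, and you would have to set it yourself (for example in your `.Renviron` file).

```r
# "GOOGLE_MAPS_API_KEY" is a hypothetical name; Sys.getenv() returns ""
# when the variable is not set
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY", unset = "")

if (nzchar(api_key) && requireNamespace("ggmap", quietly = TRUE)) {
  # Same registration call as above, but with the key from the environment
  ggmap::register_google(key = api_key)
  key_registered <- TRUE
} else {
  message("No API key found (or ggmap is not installed); geocoding is skipped.")
  key_registered <- FALSE
}
```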

In [55]:
coordinates <- geocode(top100$entity)

Source : https://maps.googleapis.com/maps/api/geocode/json?address=frankrijk&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=parijs&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=amsterdam&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=utrecht&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=londen&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=brussel&key=xxx
"brussel" not uniquely geocoded, using "brussels, belgium"
Source : https://maps.googleapis.com/maps/api/geocode/json?address=rotterdam&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=berlijn&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=nederlanden&key=xxx
"Geocoding "nederlanden" failed with error:

"Source : https://maps.googleapis.com/maps/api/geocode/json?address=nederland&key=xxx
"nederland" not uniquely geocoded, using "netherlands"


In [30]:
head(coordinates)

lon,lat
2.213749,46.22764
2.3522219,48.85661
4.9041389,52.36757
5.1214201,52.09074
-0.1275862,51.50722
4.3571696,50.84764


After the geocoding we merge the retrieved coordinates with the placenames.

In [36]:
coordinates <- cbind(top100, coordinates)
head(coordinates)

entity,N,lon,lat
frankrijk,1349,2.213749,46.22764
parijs,997,2.3522219,48.85661
amsterdam,535,4.9041389,52.36757
utrecht,304,5.1214201,52.09074
londen,168,-0.1275862,51.50722
brussel,129,4.3571696,50.84764
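
Not every placename ends up on the map: the geocoding log above shows that "nederlanden" failed, and rows without coordinates are dropped at plotting time with a "Removed ... rows" warning. As far as we know, ggmap also drops points that fall outside the extent of the map. The sketch below labels each row accordingly. The counts for "nederlanden" and "new york", the New York coordinates, and the bounding box `bbox` are all hypothetical values chosen for illustration.

```r
# Toy version of the merged table; the last two rows are hypothetical
coords <- data.frame(
  entity = c("frankrijk", "parijs", "nederlanden", "new york"),
  N      = c(1349, 997, 50, 20),
  lon    = c(2.213749, 2.3522219, NA, -74.0060),
  lat    = c(46.22764, 48.85661, NA, 40.7128),
  stringsAsFactors = FALSE
)

# Hypothetical map extent, roughly covering Europe
bbox <- c(lon_min = -25, lon_max = 47, lat_min = 30, lat_max = 63)

# Rows for which the geocoder returned no coordinates
not_geocoded <- is.na(coords$lon) | is.na(coords$lat)

# Rows with coordinates that fall outside the map extent
outside_map <- !not_geocoded &
  (coords$lon < bbox[["lon_min"]] | coords$lon > bbox[["lon_max"]] |
   coords$lat < bbox[["lat_min"]] | coords$lat > bbox[["lat_max"]])

coords$status <- ifelse(not_geocoded, "not geocoded",
                        ifelse(outside_map, "outside map", "plotted"))
print(coords[, c("entity", "N", "status")])
```

Listing the rows that are not plotted makes it easier to decide whether a failed placename deserves a manual correction or whether the map extent should be widened.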


Now we need a map! You can use the standard Google Map and define a centre using latitude and longitude. However, for historical data I like to remove roads and names from the map. You can do that by making your own map at [Google Mapstyle](https://mapstyle.withgoogle.com/). With the same API key you've used for the geocoding, you can export the URL of your map of choice and paste it into the function `get_googlemap` below. For some reason, though, you first need to extract the latitude, longitude, and zoom level from this URL and pass them as `center` and `zoom` (see below). Then paste the remainder of the URL, starting from the first mention of '&maptype', after `path =` (don't forget to put it in quotes). Then it should work!

In [37]:
europe5 <- get_googlemap(center=c(lon=10.95931966568949, lat=48.561877580811775), zoom = 4, path = "&maptype=roadmap&style=element:geometry%7Ccolor:0xf5f5f5&style=element:labels%7Cvisibility:off&style=element:labels.icon%7Cvisibility:off&style=element:labels.text.fill%7Ccolor:0x616161&style=element:labels.text.stroke%7Ccolor:0xf5f5f5&style=feature:administrative.land_parcel%7Cvisibility:off&style=feature:administrative.land_parcel%7Celement:labels.text.fill%7Ccolor:0xbdbdbd&style=feature:administrative.neighborhood%7Cvisibility:off&style=feature:landscape.natural.terrain%7Ccolor:0xffffff%7Cvisibility:on%7Cweight:4&style=feature:landscape.natural.terrain%7Celement:geometry.fill%7Cvisibility:on%7Cweight:4&style=feature:landscape.natural.terrain%7Celement:geometry.stroke%7Cvisibility:on&style=feature:poi%7Celement:geometry%7Ccolor:0xeeeeee&style=feature:poi%7Celement:labels.text.fill%7Ccolor:0x757575&style=feature:poi.park%7Celement:geometry%7Ccolor:0xe5e5e5&style=feature:poi.park%7Celement:labels.text.fill%7Ccolor:0x9e9e9e&style=feature:road%7Cvisibility:off&style=feature:road%7Celement:geometry%7Ccolor:0xffffff&style=feature:road.arterial%7Celement:labels.text.fill%7Ccolor:0x757575&style=feature:road.highway%7Celement:geometry%7Ccolor:0xdadada&style=feature:road.highway%7Celement:labels.text.fill%7Ccolor:0x616161&style=feature:road.local%7Celement:labels.text.fill%7Ccolor:0x9e9e9e&style=feature:transit.line%7Celement:geometry%7Ccolor:0xe5e5e5&style=feature:transit.station%7Celement:geometry%7Ccolor:0xeeeeee&style=feature:water%7Celement:geometry%7Ccolor:0xc9c9c9&style=feature:water%7Celement:labels.text.fill%7Ccolor:0x9e9e9e&size=480x360")


Source : https://maps.googleapis.com/maps/api/staticmap?center=48.561878,10.95932&zoom=4&size=640x640&scale=2&maptype=terrain&path=&maptype=roadmap&style=element:geometry%7Ccolor:0xf5f5f5&style=element:labels%7Cvisibility:off&style=element:labels.icon%7Cvisibility:off&style=element:labels.text.fill%7Ccolor:0x616161&style=element:labels.text.stroke%7Ccolor:0xf5f5f5&style=feature:administrative.land_parcel%7Cvisibility:off&style=feature:administrative.land_parcel%7Celement:labels.text.fill%7Ccolor:0xbdbdbd&style=feature:administrative.neighborhood%7Cvisibility:off&style=feature:landscape.natural.terrain%7Ccolor:0xffffff%7Cvisibility:on%7Cweight:4&style=feature:landscape.natural.terrain%7Celement:geometry.fill%7Cvisibility:on%7Cweight:4&style=feature:landscape.natural.terrain%7Celement:geometry.stroke%7Cvisibility:on&style=feature:poi%7Celement:geometry%7Ccolor:0xeeeeee&style=feature:poi%7Celement:labels.text.fill%7Ccolor:0x757575&style=feature:poi.park%7Celement:geometry%7Ccolor:0xe5e5e5
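
Splitting the exported URL by hand is error-prone, so here is a rough sketch of how the three pieces could be pulled out with base R. The URL in `style_url` is a shortened, hypothetical example of what the styling page exports (your own URL will contain many more `style=` parts), and the variable names are ours.

```r
# Hypothetical, shortened example of an exported styled-map URL
style_url <- paste0(
  "https://maps.googleapis.com/maps/api/staticmap?key=YOUR_KEY",
  "&center=48.561877580811775,10.95931966568949&zoom=4&format=png",
  "&maptype=roadmap&style=element:labels%7Cvisibility:off&size=480x360"
)

# The URL gives the centre as "lat,lon"; get_googlemap() wants lon and lat
center_txt <- sub(".*[?&]center=([^&]*).*", "\\1", style_url)
center_num <- as.numeric(strsplit(center_txt, ",", fixed = TRUE)[[1]])
center <- c(lon = center_num[2], lat = center_num[1])

# Zoom level as an integer
zoom <- as.integer(sub(".*[?&]zoom=([^&]*).*", "\\1", style_url))

# Everything from the first "&maptype" onward goes into `path =`
style_path <- substring(style_url,
                        regexpr("&maptype", style_url, fixed = TRUE))

print(center)
print(zoom)
print(style_path)
```

The resulting `center`, `zoom`, and `style_path` could then be passed to `get_googlemap(center = center, zoom = zoom, path = style_path)`, in the same way as the hand-edited call above.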

Now that we have our map loaded, we pass it to ggmap. Then we can add our data to it. The size of the dots corresponds to the number of times the geocoded placename is mentioned in our article titles. You'll see that Paris and France are dominant, but also that some unexpected places show up in Italy and even Eastern Europe.

In [53]:
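p <- ggmap(europe5)
# Dot size scales with the number of mentions (N)
p <- p + geom_point(data = coordinates, aes(x = lon, y = lat, size = N), shape = 16, color = "red")
# Pass the plot explicitly so ggsave() does not depend on the last plot
ggsave("delpher_map.png", plot = p)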

Saving 6.67 x 6.67 in image
"Removed 41 rows containing missing values (geom_point)."

---------------------------------------------------------------------------------------------------------------------------
<center><b><h3>And here we have our results:</h3></b></center>

[<img src="delpher_map.png" width="850"/>](delpher_map.png)

*For questions and comments, email <r.schalk@uu.nl>*