April 7, 2026
Welcome to the toponym 2.0.1 Readme!
The toponym package supplies users of R with tools to visualize and
analyze toponym (= place name) distributions. It is intended as an
interface to the GeoNames data. A regular
expression first filters the data; in a second step, a map is created displaying
all locations in the filtered data set. The functions make data and
plots available for further analysis, either within R or in the working
directory. Users can select regions within countries, provide
coordinates to define regions, or specify a region predefined in the package to
restrict the data selection to that region or to compare regions with the
remainder of countries.
If you would like to use toponym 1.X.X, head to
this GitHub
branch and follow the instructions.
You can install the most recent CRAN release with:
## Install CRAN version of < toponym >
install.packages("toponym")
In order to install this package from GitHub, you
will need devtools. You can download and install the current
development version of toponym with:
## Install development version of < toponym > from GitHub
# install.packages("devtools")
# library ("devtools")
devtools::install_github("Lennart05/toponym", ref = "toponym-CRAN")
Most functions require external data which will be downloaded and stored
for later use. Since no default path is set upon installation, users
need to provide a path. The function toponymOptions() allows you to
set a persistent path and view it. You can set the path to the package
directory or provide a full, alternative path. In the following example,
it is set to the package directory:
library(toponym) # load the package
toponymOptions("pkgdir") # "pkgdir" is interpreted as the directory of the toponym package
# you will be prompted to confirm your choice
Once a path is set, you can check it like this:
toponymOptions()
# returns current path (in this case the package directory)
We recommend setting a persistent path for downloaded data. However,
users can always set the path manually when a function is used by
specifying the parameter toponym_path. For illustration purposes, the
path is manually set to the temporary directory in examples of this
Readme.
The function top(), meaning “toponym”, outputs data complying with a
regular expression. Minimally one or more strings and one or more
countries (in that order) are given as input. The following code is a
simple example of this:
library(toponym) # load the package
data_itz <- top("itz$", "DE", toponym_path = tempdir())
A data frame named data_itz is stored in the global environment,
listing all locations in Germany whose names end in -itz.
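The filtering step can be pictured with base R alone. The following is a minimal sketch on a small made-up data frame, not the package's implementation; the data frame `places` and its column names are assumptions chosen for illustration.

```r
# Hypothetical toy data; the real GeoNames data has many more rows and columns
places <- data.frame(
  name = c("Chemnitz", "Berlin", "Ostritz", "Graz", "Teplitz"),
  country_code = c("DE", "DE", "DE", "AT", "CZ"),
  stringsAsFactors = FALSE
)

# Keep rows whose name ends in "itz" and whose country code is "DE"
toy_itz <- places[grepl("itz$", places$name) & places$country_code == "DE", ]
toy_itz$name
```

Only the two German names ending in -itz remain; "Teplitz" matches the regular expression but is dropped by the country condition.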
For the purpose of plotting outputs of top() and edited data frames,
we offer the mapper() function. This accepts a user-defined title,
legend, colors, groups and more. An example using the previously created
data frame is the following, where occurrences of -witz and -itz east of
the 10.5° E meridian are displayed:
itz_east <- data_itz[data_itz$longitude > 10.5,]
itz_east$color <- "darkgrey" # creates color column with color dark grey
witz_indices <- grep("witz", itz_east$name) # stores indices for lines containing "witz"
itz_east[witz_indices, "color"] <- "green" # sets color of "witz" entries to green
itz_east[witz_indices, "group"] <- "witz" # sets group label of "witz" entries to "witz"
mapper(itz_east, title = "-witz and -itz in the East")
The data is meant to cover maps and toponyms of the world. The function
country() lets users access all permitted country and region
designations used by this package. The query "country table" returns the
entire data frame.
head(country(query = "country table"))
#>   ISO2 ISO3       Country
#> 1   AW  ABW         Aruba
#> 2   AF  AFG   Afghanistan
#> 3   AO  AGO        Angola
#> 4   AI  AIA      Anguilla
#> 5   AX  ALA Aland Islands
#> 6   AL  ALB       Albania
If you want to access the row of a specific country, you can provide either the ISO2 code, the ISO3 code or the country name:
country(query = "Argentina")
#> [[1]]
#> ISO2 ISO3 Country
#> 9 AR ARG Argentina
# returns the respective row for Argentina
country(query = "ARG")
#> [[1]]
#> ISO2 ISO3 Country
#> 9 AR ARG Argentina
# returns the same row
If regions is set to 1, the function returns all region
designations:
country("Mali", regions = 1, toponym_path = tempdir())
#> [[1]]
#> name ID
#> [1,] "Bamako" "MLI.1_1"
#> [2,] "Gao" "MLI.2_1"
#> [3,] "Kayes" "MLI.3_1"
#> [4,] "Kidal" "MLI.4_1"
#> [5,] "Koulikoro" "MLI.5_1"
#> [6,] "Mopti" "MLI.6_1"
#> [7,] "Ségou" "MLI.7_1"
#> [8,] "Sikasso" "MLI.8_1"
#> [9,] "Timbuktu" "MLI.9_1"
# returns all region names and IDs of Mali available in the data
Map data needs to be downloaded in order to retrieve region
designations. Thus, a path needs to be provided if parameter regions
is set to a value higher than 0.
topFreq() lets users find strings frequently recurring in toponyms. A
simple example for the Philippines would be:
topFreq(countries = "Philippines",
        len = 3,
        limit = 10,
        type = "$",
        toponym_path = tempdir())
#> toponyms
#> gan$ ang$ ong$ yan$ uan$ ion$ nan$ tan$ lan$ san$
#> 1767 1258 1136  770  709  615  604  552  551  510
Among all toponyms in the data for the Philippines
(countries = "Philippines"), these are the ten (limit = 10) most
frequent trailing (type = "$") strings consisting of (a length of)
three characters (len = 3).
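The idea behind counting trailing strings can be sketched in base R. This is an illustration on invented names, not the code used inside topFreq(); the vector `toy_names` is hypothetical.

```r
# Hypothetical toy names, not taken from the GeoNames data
toy_names <- c("Calamagan", "Dalagan", "Malabang", "Tubigan", "Lapasang", "Bulacan")

len <- 3
# Extract the last `len` characters of each name and mark them as trailing
endings <- paste0(substring(toy_names, nchar(toy_names) - len + 1), "$")

# Tabulate the endings and sort them by frequency, most frequent first
toy_freq <- sort(table(endings), decreasing = TRUE)
head(toy_freq, 2)  # the two most frequent endings
```

Here "gan$" occurs three times and "ang$" twice, so these two head the table, in the same way the ten most frequent endings head the output for the Philippines.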
The additional parameter polygon allows users to restrict the data to
a subset of the selected countries. Only toponyms within the polygon are
selected. The polygon needs to intersect or be within a country
specified by the countries parameter. The package contains a
predefined polygon for the historical Danelaw area of England for
purposes of illustration:
topFreq(countries = "GB",
        len = 3,
        limit = 10,
        polygon = toponym::danelaw_polygon,
        toponym_path = tempdir())
#> toponyms
#> ton$ een$ ham$ ill$ ley$ End$ rpe$ eld$ ord$ rth$
#> 1467  694  493  437  436  431  264  257  202  192
Coordinates which delimit a polygon are input in the form of a data
frame. The createPolygon() function helps users to define their own
polygon by point-and-click or to retrieve map data.
argentina_polygon <- createPolygon(countries = "AR", regions = 1, toponym_path = tempdir())
In this example, a map of Argentina (countries = "AR") with highest-level
administrative borders (regions = 1) will appear as a plot. Users
can now click to set the points which define a polygon. The last point should
not repeat the first point. In RGui, users exit the point selection by
middle-clicking or right-clicking and then pressing stop. In RStudio,
users exit the point selection by pressing ESC or Finish in the top
right corner of the plot. Once finished, a data frame with longitudinal
and latitudinal coordinates called argentina_polygon is created.
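To make the notion of "toponyms within the polygon" concrete, here is a small base R sketch of a point-in-polygon test by ray casting. It is not how the package selects points; the column names `lons` and `lats` and the helper `point_in_polygon()` are assumptions made for this illustration.

```r
# Hypothetical polygon: a rectangle given as a data frame of vertices.
# The first vertex is not repeated at the end.
toy_polygon <- data.frame(lons = c(0, 4, 4, 0), lats = c(0, 0, 3, 3))

# Ray casting: count how many polygon edges a horizontal ray starting
# at (x, y) crosses; an odd number of crossings means the point is inside
point_in_polygon <- function(x, y, poly) {
  n <- nrow(poly)
  inside <- FALSE
  j <- n  # start with the edge that closes the ring (last vertex to first)
  for (i in seq_len(n)) {
    xi <- poly$lons[i]; yi <- poly$lats[i]
    xj <- poly$lons[j]; yj <- poly$lats[j]
    crosses <- ((yi > y) != (yj > y)) &&
      (x < (xj - xi) * (y - yi) / (yj - yi) + xi)
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

inside_point  <- point_in_polygon(2, 1, toy_polygon)  # within the rectangle
outside_point <- point_in_polygon(5, 1, toy_polygon)  # east of the rectangle
```

The loop closes the ring itself by pairing the last vertex with the first, which is why the vertex list does not need to repeat its starting point.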
topComp(), meaning “toponym compare”, determines which toponym strings
in the data are characteristic of a region. Consider again the following
example for the Danelaw area:
topComp(countries = "GB",
        len = 3,
        limit = 100,
        rat = .8,
        polygon = toponym::danelaw_polygon,
        toponym_path = tempdir())
#>   toponym ratio_perc frequency
#> 1    rpe$       90.1   264/293
The function compares the frequency of trailing strings (type = "$")
within the Danelaw area (polygon = toponym::danelaw_polygon) with
their frequency in the United Kingdom (countries = "GB") and returns a
data frame. The output is in descending order by their proportional
frequency. The search is limited to the 100 (limit = 100) most
frequent strings in the United Kingdom consisting of (a length of) three
characters (len = 3). The cut-off ratio of 80% (rat = .8) means that
at least 80% of all occurrences (in the country or countries) must be
inside the polygon. In this case, the string -rpe occurs 293 times in
the United Kingdom and 264 of these 293 occurrences are within the
target polygon resulting in a ratio percentage of 90.1%.
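The arithmetic behind the ratio can be checked by hand. The snippet below only reproduces the calculation from the two counts reported in the output; the variable names are chosen for illustration.

```r
in_polygon <- 264  # occurrences of "rpe$" inside the polygon
in_country <- 293  # occurrences of "rpe$" in the whole country
rat <- 0.8         # cut-off ratio

ratio <- in_polygon / in_country
ratio_perc <- round(100 * ratio, 1)
ratio_perc            # 90.1
passes <- ratio >= rat  # TRUE, so the string passes the cut-off
```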
topZtest() tests whether the frequency of a toponym string is
significantly greater in the given area than in the rest of the country
or countries:
topZtest(strings = "aat$",
         countries = "BEL",
         polygon = toponym::flanders_polygon,
         toponym_path = tempdir())
#>
#> 2-sample test for equality of proportions with continuity correction
#>
#> data: c(string_in_poly, string_in_cc) out of c(top_in_poly, top_in_cc)
#> X-squared = 321.66, df = 1, p-value < 2.2e-16
#> alternative hypothesis: greater
#> 95 percent confidence interval:
#> 0.0476564 1.0000000
#> sample estimates:
#> prop 1 prop 2
#> 0.0526875190 0.0003287851
In this example, the function compares the toponymic distribution of the
trailing string -aat (strings = "aat$") in Flanders
(polygon = toponym::flanders_polygon) with Belgium
(countries = "BEL") as a whole. The result of the two-proportion test
is returned as an object of class htest.
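The printed output has the same shape as that of stats::prop.test(), so a comparable one-sided two-proportion test can be sketched in base R. Whether topZtest() calls prop.test() in exactly this way is an assumption, and all counts below are invented for illustration.

```r
# Hypothetical counts, invented for illustration only
string_in_poly <- 50     # toponyms matching the string inside the polygon
top_in_poly    <- 1000   # all toponyms inside the polygon
string_in_rest <- 5      # matching toponyms in the comparison area
top_in_rest    <- 10000  # all toponyms in the comparison area

# One-sided test: is the proportion in the polygon greater?
res <- prop.test(
  x = c(string_in_poly, string_in_rest),
  n = c(top_in_poly, top_in_rest),
  alternative = "greater"
)
class(res)    # "htest"
res$estimate  # the two sample proportions
res$p.value
```

Because the result is an htest object, individual components such as the p-value or the estimated proportions can be extracted with `$` for further analysis.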
The core functions are as follows:
top() returns selected toponyms.
country() helps in navigating designations of countries and regions used by the package.
createPolygon() lets users create a polygon by point-and-click or directly retrieve polygon data.
mapper() plots data onto a map.
topComp() compares toponym substrings in a polygon and in the remainder of a country (or countries).
topFreq() retrieves most frequent toponym substrings.
topZtest() lets users apply a Z-test on toponym distributions.
toponymOptions() allows users to modify settings for managing toponym data.
For help type ?toponym or a question mark following the individual
function name (or use the help() syntax). A link to the index at the
bottom of each help page provides a useful way of navigating the
package.
For a concise description of which regular expressions exist and how
they can be used, type help("regex") in the R console.
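As a quick reminder of how anchors change a match, the following base R sketch contrasts a leading, a trailing and an unanchored pattern. The names in `toy_names` are picked for illustration only.

```r
toy_names <- c("Itzehoe", "Chemnitz", "Witzenhausen", "Crimmitschau")

leading  <- grepl("^Itz", toy_names)  # name starts with "Itz"
trailing <- grepl("itz$", toy_names)  # name ends with "itz"
anywhere <- grepl("itz", toy_names)   # "itz" anywhere, case-sensitive

toy_names[leading]
toy_names[trailing]
toy_names[anywhere]
```

Matching is case-sensitive by default, so the unanchored pattern "itz" does not match "Itzehoe".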
The toponym data comes from GeoNames and will be automatically downloaded when you call any of the core functions.
For mapping purposes as well as region designations, the
geodata package is used.
It provides spatial data for all countries and regions available in this
package. All maps are stored in the geodata package directory.
