# Borrowing detection and annotation

## Introduction

In a narrow sense, borrowing refers to the transfer of words from one language into another one:

> Narrowly, the transfer of a word from one language into a second language, as a result
> of some kind of contact [...] between speakers of the two. (Trask 2000: 44)

In a broader sense, however, borrowing can also refer to the transfer of linguistic features among languages:

> Broadly, the transfer of linguistic features of any kind from one language to another as
> a result of contact. (ibid.)

In terms of terminology, it is important to keep in mind that the term "borrowing" usually refers to a concrete process. Alternatively, one can also use the word "transfer", although we find it less frequently in the literature. Borrowing in a narrower sense would then be called "lexical transfer", while the general phenomenon is probably best referred to as "lexical interference" (following [Weinreich 1953](:bib:Weinreich1953)).

If we assume a bilateral sign model, following Saussure, we can distinguish roughly two cases of lexical transfer: those where the transfer is direct (*direct transfer*) and those where it is indirect, i.e., where it does not include the form part of the linguistic sign, but only its meaning. For simple words, this can be easily visualized, whereas mixed forms arise when compounds or more complex signs are involved.


> The ways in which one vocabulary can interfere with another are various. Given two
> languages, A and B, morphemes may be transferred from A into B, or B-morphemes may
> be used in new designative functions on the model of A-morphemes with whose content
> they are identified; finally, in the case of compound lexical elements, both processes may
> be combined. (Weinreich 1953: 47)

We can display those processes based on Weinreich's description (pp. 47-62) in the following image:

![img](img/s13_weinreich.png)

A further important aspect to be mentioned here is that the process of borrowing does not necessarily end with the transfer of a form or a meaning-structure. After the process, the recipient language usually experiences certain follow-up processes, as the lexical space needs to be re-arranged to accommodate the new word form or conceptual construct. As a result, we can find cases such as *confusion*, *disappearance*, and *specialization*:

> Except for loanwords with entirely new content, the transfer or reproduction of foreign
> words must affect the existing vocabulary in one of three ways: (1) confusion between
> the content of the new and old word; (2) disappearance of the old words; (3) survival of
> both the new and old word, with a specialisation in content. (ibid.)

From the perspective of the transferred form, we can furthermore distinguish two additional processes, namely *nativization* and its counterpart, *hyper-foreignization* (see [Hock and Joseph 1995: 257f](:bib:Hock1995)). While nativization adapts the pronunciation of foreign words to the basic pronunciation of the recipient language, hyper-foreignization modifies the foreign sound of borrowings in order to conserve the foreignness of the borrowed word.




## Traditional Approaches to Borrowing Detection

Traditionally, borrowings are detected with a rich arsenal of different techniques that are rarely formalized or applied in a systematic fashion. Among the most important considerations that scholars use as evidence that a word form is borrowed are:

1. phonotactic considerations (following the idea that nativization is not an abrupt process, and that borrowed words *sound* different from native words when first introduced to a new language)
2. topological considerations (following the idea that a word that is rare in one subgroup but frequent in another one is probably borrowed from the subgroup where it is frequent)
3. semantic considerations (following the idea that a word is more likely to be borrowed if it has a restricted or very specialized, e.g., technical, meaning)
4. irregular sound correspondences (following the idea that irregular sound correspondences may reflect transfer instead of inheritance in potential cognate words)

The last point is currently regarded as the most important piece of evidence for borrowings. If sound correspondences between two languages show an unexpected behavior, scholars identify *layers* of borrowing. A simple example of this *stratification* approach is the match of *d* in related German and English words. Usually, we would expect the English word to show a *th* if the German word starts with a *d*, but German has a layer of many borrowings in which we find *d* in both languages:

No. | German | English | Middle High German 
--- | --- | --- | ---
1 | Dach | thatch | dah
2 | Daumen | thumb | dūm
3 | Degen | thane | degan
4 | Ding | thing | ding
5 | drei | three | drī
6 | Durst | thirst | durst
7 | denken | think | denken
8 | Dieb | thief | diob
9 | dreschen | thresh | dreskan
10 | Drossel | throat | drozze
11 | Dill | dill | tilli
12 | dumm | dumb | tumb
13 | Damm | dam | tam
14 | Dunst | dunst | tunst
15 | Dollar | dollar | -

Not all of these are borrowings from English into German; they can also come from Low German varieties. The other methods are less frequently used, at least in the classical literature. The phonotactic method is probably considered to be too shallow in time depth, and the topological considerations suffer from the fact that the distribution of words across a tree topology can make them look like borrowings although they are not borrowed.
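The stratification idea can be sketched in a few lines of Python. The following minimal example takes the German and English forms from the table above and groups them by their initial correspondence. The function name `initial_layer` and the purely orthographic comparison are assumptions made for illustration; real analyses work on phonetic transcriptions and on complete correspondence patterns rather than on initial letters.

```python
from collections import defaultdict

# German-English word pairs, copied from the table above
PAIRS = [
    ("Dach", "thatch"), ("Daumen", "thumb"), ("Degen", "thane"),
    ("Ding", "thing"), ("drei", "three"), ("Durst", "thirst"),
    ("denken", "think"), ("Dieb", "thief"), ("dreschen", "thresh"),
    ("Drossel", "throat"), ("Dill", "dill"), ("dumm", "dumb"),
    ("Damm", "dam"), ("Dunst", "dunst"), ("Dollar", "dollar"),
]


def initial_layer(german, english):
    """Label a word pair by its (orthographic) initial correspondence."""
    g, e = german.lower(), english.lower()
    if g.startswith("d") and e.startswith("th"):
        return "d:th"  # the expected, regular correspondence
    if g.startswith("d") and e.startswith("d"):
        return "d:d"  # unexpected match, candidate for a separate layer
    return "other"


# group the German words by the layer their pair falls into
layers = defaultdict(list)
for german, english in PAIRS:
    layers[initial_layer(german, english)].append(german)

for label, words in sorted(layers.items()):
    print(label, len(words), ", ".join(words))
```

Words that end up in the `d:d` group are only candidates: as noted above, the layer itself does not tell us whether a word came from English or from a Low German variety.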


## Phylogenetic Borrowing Detection

This method, which has been described in a couple of papers and is implemented as part of LingPy under the name "MLN method" ([Nelson-Sathi et al. 2011](:bib:NelsonSathi2011), [List et al. 2014](:bib:List2014f)), takes tree topology as a proxy for the detection of borrowed words. The main idea is that words which show a strange distribution across a given *reference tree* are likely to be borrowed. While this assumption does not hold in all cases, it seems to be useful enough to get at least a rough impression of how much reticulation can be found in a given dataset.

As an example, consider the distribution of words for "mountain" in Germanic and Romance languages in the following figure:

![img](img/s13-fig1.png)

If we seek to explain this distribution, we can invoke different evolutionary scenarios of how the words corresponding to pattern A developed across the tree, as shown in the next figure (scenarios in B and C):

![img](img/s13-fig2.png)
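The competing scenarios can be made concrete with a small sketch. The code below is not LingPy's implementation. It is a simplified illustration that counts, for a toy tree, how many losses a single origin at the root would require and how many independent gains are needed if no loss is allowed. The tree, the nested-tuple encoding, and the function names `count_losses` and `count_gains` are assumptions made for this example and do not reproduce the reference tree of the figure.

```python
# Toy reference tree as nested tuples; leaves are language names.
TREE = (("German", "English"), ("French", ("Spanish", "Italian")))

# Languages whose word for "mountain" belongs to the same cognate set.
PRESENT = {"English", "French", "Spanish", "Italian"}


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def count_losses(tree, present):
    """Losses needed if the word originated once, at the root of `tree`."""
    tips = leaves(tree)
    if tips <= present:
        return 0  # word retained everywhere below this node
    if not tips & present:
        return 1  # one loss on the branch leading to this node
    return sum(count_losses(child, present) for child in tree)


def count_gains(tree, present):
    """Independent gains needed if losses are not allowed."""
    tips = leaves(tree)
    if tips <= present:
        return 1  # one gain on the branch leading to this node
    if not tips & present:
        return 0
    return sum(count_gains(child, present) for child in tree)


print("single origin:", 1, "gain,", count_losses(TREE, PRESENT), "loss(es)")
print("no losses:", count_gains(TREE, PRESENT), "gains")
```

Which scenario is preferred then depends on how gains are weighted against losses, which is a modelling decision. A scenario with more than one gain is what marks a cognate set as a candidate for lateral transfer.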

If these methods are applied to larger datasets, a phylogenetic tree can be inferred in which additional edges between nodes visualize where the conflicts can be found. Often these edges coincide with lateral transfer events in language history, but they can also oversimplify, since we can tell neither the direction nor the concrete donor, as in our case for *mountain*, where the donor could be any of the Romance languages (not only in our sample, but even outside).

Nevertheless, networks drawn in this fashion may yield interesting initial insights into a given dataset.

![img](img/s13-fig3.png)
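How conflicting cognate sets turn into weighted network edges can be sketched as follows. This is a simplified illustration and not the procedure used in LingPy: it assumes that the nodes at which a cognate set originates have already been inferred, and it simply links all origins of one set pairwise. The cognate set labels, the node names, and the function name `lateral_edges` are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical result of a gain-loss analysis: for each cognate set,
# the tree nodes at which the set is inferred to originate.
ORIGINS = {
    "set-1": ["English", "Romance"],
    "set-2": ["English", "Romance"],
    "set-3": ["German", "Romance"],
    "set-4": ["German"],  # single origin, no conflict with the tree
}


def lateral_edges(origins):
    """Count, for each pair of nodes, the cognate sets with origins at both."""
    edges = Counter()
    for nodes in origins.values():
        for a, b in combinations(sorted(set(nodes)), 2):
            edges[(a, b)] += 1
    return edges


for (a, b), weight in sorted(lateral_edges(ORIGINS).items()):
    print(a, "--", b, "weight:", weight)
```

The edges are undirected, which mirrors the limitation mentioned above: the counts show where conflicts accumulate, but they reveal neither the direction of transfer nor the concrete donor.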

In a similar fashion, we can plot the networks onto a map, which often gives a much more detailed account and also offers an easy way to verify whether a given "borrowing event" makes sense in the light of the data.

![img](img/s13-fig4.png)




## Language-Internal Borrowing Detection



## Computing MLNs with LingPy

In the following, we quickly demonstrate how MLNs can be computed in LingPy.

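```python
from lingpy import *
from lingpy.compare.phylogeny import PhyBo

# load the wordlist; the reference tree is calculated with UPGMA
phy = PhyBo('../data/S10-BAI.tsv', tree_calc='upgma')
# run the analysis
phy.analyze()
# plot the resulting network as a PNG file
phy.plot_MLN(fileformat='png', filename='img/s13-fig5')
```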

This code assumes that a lot of data has already been passed to the algorithm, most importantly the cognate sets.

![img](img/s13-fig5.png)
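To make the role of the cognate sets more tangible, the following sketch derives the distribution of each cognate set across languages from a small tab-separated wordlist. The inline data and the column names (`DOCULECT`, `CONCEPT`, `FORM`, `COGID`) are assumptions modelled on common wordlist layouts; they are not taken from the file `../data/S10-BAI.tsv`, and the sketch does not call LingPy.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mini-wordlist; the COGID column assigns words to cognate sets.
TSV = (
    "ID\tDOCULECT\tCONCEPT\tFORM\tCOGID\n"
    "1\tEnglish\tmountain\tmountain\t1\n"
    "2\tGerman\tmountain\tBerg\t2\n"
    "3\tFrench\tmountain\tmontagne\t1\n"
    "4\tSpanish\tmountain\tmontaña\t1\n"
    "5\tItalian\tmountain\tmontagna\t1\n"
)

# collect, for each cognate set, the languages in which it is attested
patterns = defaultdict(set)
for row in csv.DictReader(io.StringIO(TSV), delimiter="\t"):
    patterns[row["COGID"]].add(row["DOCULECT"])

for cogid, languages in sorted(patterns.items()):
    print(cogid, sorted(languages))
```

Distributions of this kind, one per cognate set, are what a phylogenetic method compares against the reference tree.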