![An interactive LADAL notebook](https://slcladal.github.io/images/uq1.jpg)


This notebook-based tool accompanies the [Language Technology and Data Analysis Laboratory (LADAL) tutorial *Analyzing Collocations, N-grams, and Keywords in R*](https://ladal.edu.au/coll.html). 

<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>The code chunks below calculate association measures that represent the collocational strength between two words.</b>
<br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>

We set up our session by activating necessary packages. 


In [None]:
# set options
options(warn=-1)  # suppress warnings
# activate packages
library(dplyr)    # for table processing
library(writexl)  # for saving xlsx files


## Using your own data

There are two options when uploading your data (you need to choose one!). 

1) You can upload a table with the co-occurrence of a specific word pair. If you use this format, your data should be a spreadsheet with

+ a column called `w1` with the first word of interest (*word1*) in rows 1 and 2 and the term *other* in rows 3 and 4

+ a column called `w2` with the co-occurring word (*word2*) in rows 1 and 3 and the term *other* in rows 2 and 4

+ a column called `O11` with the co-occurrence frequency of word 1 with word 2 in row 1 and the frequency of all other words in row 2 (together, these frequencies should represent the total corpus size).

The data should have the same structure as the example below:

![Data format required for this notebook](https://slcladal.github.io/images/colltb_0.png)


2) You can upload a table with **all** co-occurrences in a corpus. If you use this format, your data should be a spreadsheet with

+ a column called `w1` with all word types in a corpus (*word1*)

+ a column called `w2` with all co-occurring word types in a corpus (*word2*)

+ a column called `O11` with the co-occurrence frequency of word 1 with word 2

The data should have the same structure as the example below:

![Data format required for this notebook](https://slcladal.github.io/images/colltb_1.png)

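As a rough illustration of the two formats described above, the following sketch builds one small table of each kind in base R. All words and counts are invented. For the first format, we assume that the four rows of `O11` hold the four cells of a 2×2 contingency table, which is our reading of the description above rather than something the notebook states explicitly. The object names `pair_tb` and `all_tb` are hypothetical.

```r
# Format 1: a single word pair (four rows).
# Assumption: the four O11 values fill the four cells of a 2x2 table.
# All words and counts are invented for illustration.
pair_tb <- data.frame(
  w1  = c("strong", "strong", "other", "other"),
  w2  = c("tea", "other", "tea", "other"),
  O11 = c(30, 970, 120, 98880)
)

# Format 2: one row per co-occurring word pair in the corpus.
all_tb <- data.frame(
  w1  = c("strong", "strong", "green", "green"),
  w2  = c("tea", "coffee", "tea", "light"),
  O11 = c(30, 12, 25, 8)
)

# Both tables use the three column names the notebook expects
expected_cols <- c("w1", "w2", "O11")
stopifnot(identical(names(pair_tb), expected_cols))
stopifnot(identical(names(all_tb), expected_cols))

# Optionally save the tables as xlsx files (only if writexl is installed)
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(pair_tb, file.path(tempdir(), "pair_tb.xlsx"))
  writexl::write_xlsx(all_tb, file.path(tempdir(), "all_tb.xlsx"))
}
```

A file saved this way can then be dragged into the upload folder like any other spreadsheet.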


<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
To <b>use your own data</b>, click on the folder called <b>`MyTables`</b> (it is in the menu to the left of the screen) and then simply drag and drop your xlsx-file into the folder. <br>When you then execute the code chunk below, you will upload your own data and you can then use it in this notebook.<br>You can upload <b>only xlsx-files</b>!
<br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>


In [None]:
# load function that helps loading the xlsx data
source("https://slcladal.github.io/rscripts/loadtable.R")
# load the table from the MyTables folder
mytable <- loadtable("notebooks/MyTables")
# inspect first 10 rows of the data
head(mytable, 10)


We now prepare the table so that we can extract the association measures.



In [None]:
# load function that prepares the table for the association measures
source("https://slcladal.github.io/rscripts/prepam.R")
# prepare the table
colldf <- prepam(mytable)
# inspect first 10 rows of the data
head(colldf, 10)

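What such a preparation step typically involves can be sketched in base R. This is not the code of `prepam` (its definition lives in the sourced script); it is a minimal sketch, assuming that preparation means deriving the remaining cells of a 2×2 contingency table (`O12`, `O21`, `O22`) from the co-occurrence counts. The table `toy` and all counts are invented.

```r
# Toy co-occurrence table (invented counts), one row per word pair
toy <- data.frame(
  w1  = c("strong", "strong", "green", "green"),
  w2  = c("tea", "coffee", "tea", "light"),
  O11 = c(30, 12, 25, 8)
)

# Total number of co-occurrences in the table
toy$N <- sum(toy$O11)

# Row totals: how often each w1 occurs with any w2
toy$R1 <- ave(toy$O11, toy$w1, FUN = sum)

# Column totals: how often each w2 occurs with any w1
toy$C1 <- ave(toy$O11, toy$w2, FUN = sum)

# Remaining cells of the 2x2 table for each word pair
toy$O12 <- toy$R1 - toy$O11                       # w1 without w2
toy$O21 <- toy$C1 - toy$O11                       # w2 without w1
toy$O22 <- toy$N - toy$O11 - toy$O12 - toy$O21    # neither w1 nor w2

toy
```

For every row, the four observed frequencies add up to `N`, which is a quick sanity check on a prepared table.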

Now, we extract the association measures (if the table has many rows, this may take a few minutes).



In [None]:
# load function that extracts association measures
source("https://slcladal.github.io/rscripts/assocstats.R")
# calculate collocation statistics
assocs <- assocstats(colldf) 
# inspect first 10 rows of the data
head(assocs, 10)

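To give a sense of what an association measure is, the sketch below computes three common ones for a single word pair in base R: the expected co-occurrence frequency under independence, pointwise mutual information (MI), and the p-value of a one-sided Fisher's exact test. Which measures `assocstats` actually returns is defined in the sourced script; the selection here is our own, and the counts are invented.

```r
# Observed 2x2 contingency table for one word pair (invented counts)
obs <- matrix(
  c(30,  970,
    120, 98880),
  nrow = 2, byrow = TRUE,
  dimnames = list(w1 = c("word1", "other"), w2 = c("word2", "other"))
)

# Corpus size (sum of all four cells)
N <- sum(obs)

# Expected co-occurrence frequency if the two words were independent:
# (row total of word1 * column total of word2) / N
E11 <- sum(obs[1, ]) * sum(obs[, 1]) / N

# Pointwise mutual information: log2 of observed over expected frequency
MI <- log2(obs[1, 1] / E11)

# Fisher's exact test: do the words co-occur more often than expected?
fisher_p <- fisher.test(obs, alternative = "greater")$p.value

data.frame(O11 = obs[1, 1], E11 = E11, MI = MI, p = fisher_p)
```

Here the pair is observed 30 times against an expected frequency of 1.5, so MI is positive and the test indicates attraction between the two words.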

## Exporting the association table

To export the table with the association measures as an MS Excel spreadsheet, we use `write_xlsx`. Be aware that we use the `here` function to build the file path relative to the project root, so the file is saved in the `MyOutput` folder.


In [None]:
# save data to the MyOutput folder
write_xlsx(assocs, here::here("notebooks/MyOutput/assocs.xlsx"))

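When running the export outside this notebook environment, the output folder may not exist yet. The following is a minimal sketch, assuming a temporary directory as a hypothetical stand-in for `notebooks/MyOutput`; the object names `out_dir`, `out_file`, and `res` are invented for illustration.

```r
# Hypothetical stand-in for the notebooks/MyOutput folder
out_dir <- file.path(tempdir(), "MyOutput")

# Create the folder if it does not exist yet
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Small invented result table to export
res <- data.frame(w1 = "strong", w2 = "tea", O11 = 30)

# Full path of the output file
out_file <- file.path(out_dir, "assocs.xlsx")

# Write the spreadsheet (only if writexl is installed)
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(res, out_file)
}
```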

<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>You will find the generated MS Excel spreadsheet named *assocs.xlsx* in the `MyOutput` folder (located on the left side of the screen).</b> <br><br>Simply double-click the `MyOutput` folder icon, then right-click on the *assocs.xlsx* file, and choose Download from the dropdown menu to download the file. <br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>



***

[Back to LADAL](https://ladal.edu.au/coll.html)

***
