![An interactive LADAL notebook](https://slcladal.github.io/images/uq1.jpg)


This notebook-based tool accompanies the [Language Technology and Data Analysis Laboratory (LADAL) tutorial *Analyzing Collocations, N-grams, and Keywords in R*](https://ladal.edu.au/coll.html). 

<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>The code chunks below calculate keyness statistics, which represent how characteristic words are of a text or corpus.</b>
<br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>

We set up our session by activating necessary packages. 


In [None]:
# set options
options(warn=-1)  # do not show warnings
# activate packages
library(dplyr)    # for table processing
library(writexl)  # for saving xlsx files


## Using your own data

The table you upload should be an xlsx spreadsheet with

+ a column called `token` containing the word type

+ a column called `text1` with the frequency of the word type in the target text/corpus

+ a column called `text2` with the frequency of the word type in the reference text/corpus.

The sum of the `text1` column should equal the size of the target text/corpus, and the sum of the `text2` column should equal the size of the reference text/corpus.

The data should have the same structure as the example below:

![Data format required for this notebook](https://slcladal.github.io/images/keytb.png)
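
The format described above can be sketched in a few lines of base R. This is only an illustration: the tokens and frequencies are made up, and `check_keytable()` is a hypothetical helper that is not part of the LADAL scripts. It checks that the three required columns are present and returns the corpus sizes implied by the column sums.

```r
# Hypothetical example table; tokens and frequencies are invented for illustration
example <- data.frame(
  token = c("the", "whale", "sea", "corpus"),
  text1 = c(120, 25, 10, 0),   # frequencies in the target text/corpus
  text2 = c(110, 0, 12, 30),   # frequencies in the reference text/corpus
  stringsAsFactors = FALSE
)

# check_keytable() is a hypothetical helper (an assumption, not a LADAL function)
check_keytable <- function(tb) {
  required <- c("token", "text1", "text2")
  missing_cols <- setdiff(required, names(tb))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(tb$text1) || !is.numeric(tb$text2)) {
    stop("`text1` and `text2` must contain numeric frequencies")
  }
  if (any(is.na(tb$text1)) || any(is.na(tb$text2)) ||
      any(tb$text1 < 0) || any(tb$text2 < 0)) {
    stop("Frequencies must be non-missing and non-negative")
  }
  # the column sums are the corpus sizes implied by the table
  c(text1 = sum(tb$text1), text2 = sum(tb$text2))
}

sizes <- check_keytable(example)
sizes
```

A table like this could then be saved for upload with `writexl::write_xlsx(example, "mytable.xlsx")`, where the file name is just a placeholder.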




<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
To <b>use your own data</b>, click on the folder called <b><code>MyTables</code></b> (in the menu on the left of the screen) and drag and drop your xlsx file into it. <br>Executing the code chunk below then loads your data so that you can use it in this notebook.<br>You can upload <b>only xlsx files</b>!
<br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>


In [None]:
# load the function that helps load the xlsx data
source("https://slcladal.github.io/rscripts/loadtable.R")
# load the table from the MyTables folder
mytable <- loadtable("notebooks/MyTables")
# inspect first 10 rows of the data
head(mytable, 10)


Now, we extract the association measures (if the table has many rows, this may take a few minutes).



In [None]:
# load the function that extracts association measures
source("https://slcladal.github.io/rscripts/keystats.R")
# calculate the keyness statistics
keys <- keystats(mytable)
# inspect first 10 rows of the results
head(keys, 10)
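
The internals of `keystats` live in the external script and are not shown in this notebook. To give a sense of what a keyness statistic measures, the sketch below computes one widely used measure, the log-likelihood ratio (G2), from a 2x2 contingency table for a single word. This is an assumption for illustration only: it is not necessarily how `keystats` works, and the function name `g2()` and the example frequencies are hypothetical.

```r
# g2() is a hypothetical helper: log-likelihood ratio for one word
# o11 = frequency in target, o12 = frequency in reference,
# n1 = size of target corpus, n2 = size of reference corpus
g2 <- function(o11, o12, n1, n2) {
  o21 <- n1 - o11              # all other tokens in the target corpus
  o22 <- n2 - o12              # all other tokens in the reference corpus
  n <- n1 + n2
  obs <- c(o11, o12, o21, o22)
  # expected frequencies: row total * column total / grand total
  expd <- c((o11 + o12) * n1, (o11 + o12) * n2,
            (o21 + o22) * n1, (o21 + o22) * n2) / n
  # cells with an observed frequency of 0 contribute 0 to the sum
  terms <- ifelse(obs > 0, obs * log(obs / expd), 0)
  2 * sum(terms)
}

# made-up frequencies: a word that is equally frequent in both corpora ...
same <- g2(10, 10, 100, 100)
# ... and a word that is much more frequent in the target corpus
key <- g2(30, 2, 100, 100)

# direction of the effect: TRUE if the word is relatively more frequent in the target
over_represented <- (30 / 100) > (2 / 100)
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < .05, so the second word would count as key for the target corpus, while the first yields a value of zero because its relative frequency is identical in both corpora.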


## Exporting the association table

To export the table with the association measures as an MS Excel spreadsheet, we use `write_xlsx`. Note that we use the `here` function to build the file path relative to the project root, so the file is saved in the `MyOutput` folder.


In [None]:
# save the results to the MyOutput folder
write_xlsx(keys, here::here("notebooks/MyOutput/keys.xlsx"))


<div class="warning" style='padding:0.1em; background-color: rgba(215,209,204,.3); color:#51247a'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>You will find the generated MS Excel spreadsheet named <i>keys.xlsx</i> in the <code>MyOutput</code> folder (located on the left side of the screen).</b> <br><br>Simply double-click the <code>MyOutput</code> folder icon, then right-click on the <i>keys.xlsx</i> file, and choose Download from the dropdown menu to download the file. <br>
</p>
<p style='margin-left:1em;'>
</p></span>
</div>

<br>



***

[Back to LADAL](https://ladal.edu.au/coll.html)

***
