# ATAP Concordancer

## Introduction

This notebook is a Concordancer tool that lets users upload text data (e.g. a .csv or .txt file), search the text for every instance of a search term, and view the results as a concordance. The Concordancer retrieves all relevant instances of the search term, displays them in the tool, and makes them available for download as a CSV file for additional analysis. It has been specifically designed to allow users (i) to undertake ‘dialogic’ analysis (when the input consists of related text pairs, such as question-answer or social media post-response) and/or (ii) to make visible the metadata associated with each occurrence of the search term (when available in the input; for example, speaker identity, political affiliation, company, etc.).

To do so, the data loaded into the notebook must be ‘structured’: one column consists of ‘text’ (e.g. the question or social media post) and the other columns consist of either the associated text of the dialogic pair (e.g. the relevant answer or the relevant reply/comment) or metadata (describing aspects of the text). This is explained further below. In addition to analysing structured data, the notebook can create its own structured data based on symbols present in the uploaded text(s), automatically splitting the text that precedes the relevant symbol (e.g. a colon or a question mark) from the text that follows it. This is also explained and illustrated further below.
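As a rough illustration of what ‘structured’ input might look like, the sketch below builds a small CSV with one text column, one paired-answer column, and two metadata columns. The column names (`text`, `answer`, `speaker`, `party`) and the rows are hypothetical and chosen only for illustration; the names the tool actually expects may differ.

```python
import csv
import io

# Hypothetical example rows: a question ("text"), its paired answer,
# and two metadata columns. All names and values are illustrative assumptions.
rows = [
    {"text": "Will the minister confirm the budget?",
     "answer": "I can confirm it today.",
     "speaker": "Smith", "party": "A"},
    {"text": "When will the report be released?",
     "answer": "The report is due next month.",
     "speaker": "Jones", "party": "B"},
]

# Write the rows as CSV text, with a header row naming each column
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "answer", "speaker", "party"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the CSV back to confirm that each row keeps its text and metadata together
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Saving `csv_text` to a `.csv` file would give a file with one row per text and one column per piece of associated information, which is the general shape described above.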

In sum, this notebook is not meant to feature all types of analyses offered by current off-the-shelf Concordancers and should be considered complementary to such existing tools. You may want to use this tool if you are interested in using a Concordancer for dialogic analysis or for exploring the relationship between a search term and metadata.

## File upload

Upload either a CSV or text file here:

In [None]:
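from IPython.display import display
from ipywidgets import FileUpload
from src.atap_widgets.concordance import ConcordanceLoader

# Create an upload button that accepts only CSV and plain-text files
uploader = FileUpload(accept=".csv,.txt")
display(uploader)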

## How to use

### Preparation
1. Upload a file by clicking the 'Upload' button above
2. Run the code block under 'Concordancer' below and wait for the concordancer tool to display

### Search
1. Enter a search term into the search field and press Enter on your keyboard to perform a search
2. Toggle the checkboxes below the search field to enable/disable regular expression matching, case sensitivity, and whole word matching

- If there are many results from the search, navigate through the pages of results using the 'Page' navigator field
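
To give a feel for how the three checkboxes change what is matched, here is a minimal sketch using Python's `re` module. The helper `build_pattern` and its parameter names are hypothetical; this is not the tool's own implementation, only one plausible way such toggles could be combined.

```python
import re

def build_pattern(term, use_regex=False, case_sensitive=False, whole_word=False):
    """Hypothetical helper: combine a search term and three toggles into a regex."""
    # Without regular expression matching, special characters are treated literally
    body = term if use_regex else re.escape(term)
    # Whole word matching: require a word boundary on both sides of the term
    if whole_word:
        body = r"\b(?:" + body + r")\b"
    # Case-insensitive unless case sensitivity is switched on
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)

text = "The Minister met ministers."

plain = [m.group(0) for m in build_pattern("minister").finditer(text)]
whole = [m.group(0) for m in build_pattern("minister", whole_word=True).finditer(text)]
cased = [m.group(0) for m in build_pattern("minister", case_sensitive=True).finditer(text)]
regex = [m.group(0) for m in build_pattern("ministers?", use_regex=True).finditer(text)]
```

In this sketch the plain search finds both "Minister" and the "minister" inside "ministers", whole word matching keeps only "Minister", and case sensitivity keeps only the lowercase occurrence.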

### Display
1. Use the 'Sort by' dropdown to sort by text_id (line number), left context, right context, or metadata columns
2. If your data contains metadata columns, use the 'Show More' field to select a metadata column to display
3. Export the results to an Excel spreadsheet by providing an appropriate file name and clicking the button labelled "Export to Excel".
   The spreadsheet will appear in the Jupyter file window on the left and can be downloaded by right-clicking the file and clicking "Download"

- If the context windows don't display all of the text you would like to display, change the window size using the "Window size" field
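
The idea of a context window and of sorting by left or right context can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the tool's code: the function `concordance` is hypothetical, the window is measured in characters, and `text_id` is a zero-based line index; the tool may define these differently.

```python
import re

def concordance(lines, term, window=20):
    """Hypothetical sketch: one result per match, with left and right context."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    results = []
    for text_id, line in enumerate(lines):
        for m in pattern.finditer(line):
            results.append({
                "text_id": text_id,
                # Up to `window` characters before the match
                "left": line[max(0, m.start() - window):m.start()],
                "match": m.group(0),
                # Up to `window` characters after the match
                "right": line[m.end():m.end() + window],
            })
    return results

lines = [
    "We support the new policy on housing.",
    "That policy failed last year.",
]
hits = concordance(lines, "policy", window=10)

# Sort alphabetically by the text that follows the search term
by_right = sorted(hits, key=lambda h: h["right"].strip().lower())
for h in by_right:
    print(f'{h["text_id"]} | {h["left"]:>10} | {h["match"]} | {h["right"]}')
```

A larger `window` value simply widens the slices taken on either side of the match, which is why increasing the window size reveals more of the surrounding text.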

## Concordancer
Ensure you have uploaded a file, then run the block below to show the widget.

In [None]:
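# Proceed only if a file has been selected in the upload widget above
uploaded = len(uploader.value) > 0
if uploaded:
    uploaded_file = uploader.value[0]
    file_name = uploaded_file.name

    # Save the uploaded bytes to disk so the loader can read them by path
    with open(file_name, "wb") as fp:
        fp.write(uploaded_file.content)

    # Use the file extension (e.g. "csv" or "txt") as the file type
    file_type = file_name.rsplit(".", 1)[-1].lower()

    concordance_loader = ConcordanceLoader(path=file_name, type=file_type)
    concordance_loader.show()
else:
    print("Ensure you upload a file!")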

## Concordancer - Dialogic

The dialogic feature is a more advanced feature of the concordancer. It requires you to specify a character on which the text data will be split. For example, if your text has the format `speaker: spoken words`, you can set the "splitter" to `:`, which will create a column called "key" for the speaker and a second column for the spoken words. You can then see which speaker spoke the words in a given concordance result.
Below, replace the `:` between the quotation marks to specify a different splitter character.

In [None]:
splitter = ":"

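# Proceed only if a file has been selected in the upload widget above
uploaded = len(uploader.value) > 0
if uploaded:
    uploaded_file = uploader.value[0]
    file_name = uploaded_file.name

    # Save the uploaded bytes to disk so the loader can read them by path
    with open(file_name, "wb") as fp:
        fp.write(uploaded_file.content)

    # Use the file extension (e.g. "csv" or "txt") as the file type
    file_type = file_name.rsplit(".", 1)[-1].lower()

    # Pass the splitter so each line is divided into a key and the remaining text
    concordance_loader = ConcordanceLoader(path=file_name, type=file_type, re_symbol_txt=splitter)
    concordance_loader.show()
else:
    print("Ensure you upload a file!")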
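
To illustrate what splitting on a character does to lines of the form `speaker: spoken words`, here is a minimal sketch in plain Python. The function `split_lines` is hypothetical, and it makes two assumptions that may not match the tool: it splits on the first occurrence of the splitter only, and it treats the splitter as a literal string rather than a regular expression. The column name `text` for the spoken words is also an assumption.

```python
def split_lines(lines, splitter=":"):
    """Hypothetical sketch: split each line into a 'key' and the remaining text."""
    rows = []
    for line in lines:
        # partition() splits on the first occurrence of the splitter only
        key, sep, text = line.partition(splitter)
        if sep:
            rows.append({"key": key.strip(), "text": text.strip()})
        else:
            # No splitter found: keep the whole line as text with an empty key
            rows.append({"key": "", "text": line.strip()})
    return rows

example = [
    "Alice: Hello there",
    "Bob: Hi: how are you?",
    "no speaker here",
]
rows = split_lines(example, splitter=":")
for row in rows:
    print(row)
```

Each resulting row pairs a speaker with their words, so a concordance hit in the text can be displayed alongside the "key" it belongs to.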