# QuotationTool
In this notebook, you will use the *QuotationTool* to extract quotes from a collection of texts. Besides the quotes themselves, the tool reports who the speakers are, where the quotes (and the speakers) are located within the text, and which named entities were identified, among other details that can be useful for your text analysis.  

**Note:** This code has been adapted (with permission) from the [GenderGapTracker GitHub page](https://github.com/sfu-discourse-lab/GenderGapTracker/tree/master/NLP/main) and modified to run in a Jupyter Notebook. The quotation tool’s accuracy is evaluated in [this article](https://doi.org/10.1371/journal.pone.0245533).

## 1. Setup
Before you begin, import the QuotationTool, along with the libraries it needs, and initialize it for use in this notebook.

In [None]:
# import the QuotationTool
from extract_display_quotes import QuotationTool

# initialize the QuotationTool
qt = QuotationTool()

<div class="alert alert-block alert-warning">
<b>Installing Libraries</b> 

The requirements file <b>environment.yml</b> is included with this notebook. Take a look inside to see which libraries are installed in this notebook's environment.

</div>

## 2. Load the data
This notebook lets you extract quotes directly from one or more text files (.txt). Alternatively, you can extract quotes from a text column in a spreadsheet (.xlsx or .csv).  

<table style='margin-left: 10px'><tr>
<td> <img src='./img/txt_icon.png' style='width: 45px'/> </td>
<td> <img src='./img/xlsx_icon.png' style='width: 55px'/> </td>
<td> <img src='./img/csv_icon.png' style='width: 45px'/> </td>
</tr></table>

<div class="alert alert-block alert-danger">
<b>You will need to upload your text files here</b> 
    
Please upload your text files (.txt) or, if your texts are already stored in a spreadsheet, your Excel or CSV file (.xlsx or .csv). You can upload multiple files at once.
</div>

In [None]:
# upload the text files and/or spreadsheets (.xlsx or .csv) to the notebook
display(qt.file_uploader)

Once your files are uploaded, you can see a preview of the text in a table format (pandas dataframe).

<div class="alert alert-block alert-info">
<b>Tools:</b>    
    
- nltk: for sentence tokenization
- spaCy: for text cleaning and normalisation
- pandas: for storing and displaying in dataframe (table) format
</div>

In [None]:
# display a preview of the pandas dataframe
qt.text_df.head()
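
The exact columns of `qt.text_df` are not described here. As a rough sketch of what "texts stored in a table" can look like, the example below reads every .txt file in a folder into a list of rows using only the standard library. The function `load_texts` and the column names `text_name` and `text` are hypothetical and are not part of the QuotationTool.

```python
import tempfile
from pathlib import Path


def load_texts(folder):
    """Read every .txt file in `folder` into a list of row dictionaries.

    Hypothetical helper: the column names below are illustrative and
    may differ from the ones the QuotationTool uses.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        rows.append({
            "text_name": path.stem,                    # file name without extension
            "text": path.read_text(encoding="utf-8"),  # full file content
        })
    return rows


# Small demonstration using a temporary folder with two text files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "article_1.txt").write_text("First text.", encoding="utf-8")
    Path(tmp, "article_2.txt").write_text("Second text.", encoding="utf-8")
    for row in load_texts(tmp):
        print(row["text_name"], "|", row["text"])
```

Passing the resulting list to `pandas.DataFrame(...)` would give a table with one row per file.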

## 3. Extract the quotes
Once your texts have been stored in a pandas dataframe, you can begin to extract the quotes from them. You can also extract named entities from your texts by listing the entity types you wish to include in the *inc_ent* variable below.

<div class="alert alert-block alert-info">
<b>Tools:</b>    

- quote_extractor: for extracting quotes and speakers
- spaCy: for extracting named entities
</div>

In [None]:
# specify the named entities that you wish to include
inc_ent = ['ORG','PERSON','GPE','NORP','FAC','LOC']

# extract the quotes from the text and preview them in a table format
quotes_df = qt.get_quotes(inc_ent)
quotes_df.head()

In general, quotes are extracted using either syntactic rules or heuristic rules. A quote can stand alone in a sentence or be followed by another quote (a floating quote) in the same sentence.   

**Quotation symbols:** *Q (Quotation mark), S (Speaker), V (Verb), C (Content)*  
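
To make the Q, C, V and S symbols concrete, here is a toy heuristic rule written as a regular expression. It is only an illustration under simplifying assumptions (straight double quotes, a tiny list of reporting verbs, a capitalised speaker name directly after the verb) and is not the rule set the QuotationTool actually uses. The names `QUOTE_PATTERN` and `find_quotes` are hypothetical.

```python
import re

# Toy pattern for sentences of the form: "content," verb Speaker
QUOTE_PATTERN = re.compile(
    r'"(?P<content>[^"]+?),?"\s+'                  # Q + C: quotation marks around the content
    r'(?P<verb>said|says|told|added)\s+'           # V: reporting verb (illustrative list only)
    r'(?P<speaker>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)'  # S: one or more capitalised words
)


def find_quotes(text):
    """Return a list of dicts with the content, verb and speaker of each match."""
    return [match.groupdict() for match in QUOTE_PATTERN.finditer(text)]


sample = '"We need more data," said Jane Smith. The report was released on Monday.'
for quote in find_quotes(sample):
    print(quote["speaker"], quote["verb"], "->", quote["content"])
```

A rule this simple misses many real cases, such as a speaker placed before the quote, curly quotation marks, or indirect speech, which is why a dedicated tool combines several syntactic and heuristic rules.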

**Named Entities:**  *PERSON (People, including fictional), NORP (Nationalities or religious or political groups), FAC (Buildings, airports, highways, bridges, etc.), ORG (Companies, agencies, institutions, etc.), GPE (Countries, cities, states), LOC (Non-GPE locations, mountain ranges, bodies of water)*
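
If you change the entity list, a small lookup table can help catch mistyped labels before running the extraction. The sketch below uses only the six labels and descriptions listed above; the helper `check_entity_labels` is hypothetical and not part of the QuotationTool.

```python
# Entity labels and their descriptions, as listed above.
ENTITY_DESCRIPTIONS = {
    "PERSON": "People, including fictional",
    "NORP": "Nationalities or religious or political groups",
    "FAC": "Buildings, airports, highways, bridges, etc.",
    "ORG": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
}


def check_entity_labels(labels):
    """Hypothetical helper: split labels into known and unknown ones."""
    known = [label for label in labels if label in ENTITY_DESCRIPTIONS]
    unknown = [label for label in labels if label not in ENTITY_DESCRIPTIONS]
    return known, unknown


# "PERSN" is a deliberate typo to show how an unknown label is reported.
known, unknown = check_entity_labels(["ORG", "PERSN", "GPE"])
for label in known:
    print(f"{label}: {ENTITY_DESCRIPTIONS[label]}")
if unknown:
    print("Unrecognised labels:", ", ".join(unknown))
```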

## 4. Display the quotes
Once you have extracted the quotes, you can preview them using spaCy's visualisation tool, displaCy. All you need to do is run the code below, then select the text you wish to analyse and which entities to show. 

Click the ***Preview*** button to display the quotes, the ***Save Preview*** button to save them as an HTML file, and the ***Top 5 Entities*** button to display the top five named entities mentioned in the speakers and/or quotes.

<div class="alert alert-block alert-info">
<b>Tools:</b>    

- displaCy: for displaying quotes, speakers and named entities
- ipywidgets: for interactive tool
</div>

In [None]:
# build and display the interactive tool for previewing quotes and named entities
box = qt.analyse_quotes(inc_ent)
box

## 5. Save the quotes
Finally, if you wish, you can save the quotes dataframe as an Excel spreadsheet and download it to your local computer for further analysis.

In [None]:
# make sure the output folder exists, then save quotes_df into an Excel spreadsheet
import os
os.makedirs('./output', exist_ok=True)
quotes_df.to_excel('./output/quotes.xlsx', index=False)