# Text Summarization 

This notebook shows summaries for specific subthemes and agreement levels. The goal is to help explain the different agreement-level groupings and to identify potential areas for improvement in the WES design.

### Instructions for use

This notebook creates summaries of comment text. Select the subtheme and agreement level you want to examine. The analysis uses pretrained embeddings, which must be downloaded locally before the notebook will run.


In [1]:
# Change working directory to be project root
import os
#os.chdir("..")
os.getcwd()

'C:\\Users\\payla\\Documents\\MDS\\Capstone\\DSCI_591_capstone-BCStats'

In [2]:
import pandas as pd
import numpy as np
import time

In [3]:
# ensure packages reload after every change 
%load_ext autoreload
%autoreload 2

import src

from src.analysis.text_summary import *
from src.analysis.emotion_analysis import *
from src.data.preprocessing_text import *

## Load Data

In [4]:
# read agreement data
data_agreement = pd.read_csv("./data/interim/joined_qual_quant.csv",
                             usecols=[0, 1, 4, 5, 6])

# read in data legend
legend = pd.read_csv("./references/data-dictionaries/theme_subtheme_names.csv")

data_agreement.head(3)

Unnamed: 0,USERID,code,question,diff,text
0,191202-862188,102,Q39,0,"Improved office space (fix HVAC, etc) but NO LWS"
1,173110-932228,14,Q46,1,Administration people should have better oppor...
2,185914-180608,24,Q20,0,We are the lowest paid in Canada with a worklo...


In [5]:
# add the theme labels to the dataset so it can also be filtered by theme
data_agreement = src.analysis.emotion_analysis.get_theme_labels(data_agreement, legend)
data_agreement.head(3)

Unnamed: 0,USERID,code,question,diff,text,theme,subtheme_description
0,191202-862188,102,Q39,0,"Improved office space (fix HVAC, etc) but NO LWS","Tools, Equipment & Physical Environment","Improve facilities (e.g. office space, noise l..."
1,173110-932228,14,Q46,1,Administration people should have better oppor...,Career & Personal Development,Provide opportunities for career advancement
2,185914-180608,24,Q20,0,We are the lowest paid in Canada with a worklo...,Compensation & Benefits,Increase salary


## Generate Corpus from Dataframe

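- The summarizers work on sentences, so comments are split into sentences (`sentences=True`).
- The sentences are preprocessed before summarization.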

In [17]:
corpus_43_0 = src.analysis.text_summary.generate_corpus_from_comments(data_agreement, 
                                                                      depth="subtheme", 
                                                                      name=43, 
                                                                      agreement="strong", sentences=True)
corpus_all = src.analysis.text_summary.generate_corpus_from_comments(data_agreement)
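
The selection step above can be pictured with a minimal, standard-library sketch. It is not the code behind `generate_corpus_from_comments`: the `build_corpus` helper, the sample rows, and the idea that "strong" agreement corresponds to a small `diff` value are all assumptions made for illustration.

```python
import re

# Hypothetical rows mirroring the columns shown above (code, diff, text).
comments = [
    {"code": 43, "diff": 0, "text": "Leadership is distant. Decisions are slow!"},
    {"code": 43, "diff": 2, "text": "Executives communicate well."},
    {"code": 24, "diff": 0, "text": "Increase salary."},
]


def build_corpus(rows, subtheme=None, max_diff=None, sentences=False):
    """Select comments by subtheme code and diff, optionally split into sentences."""
    selected = [
        r["text"]
        for r in rows
        if (subtheme is None or r["code"] == subtheme)
        and (max_diff is None or r["diff"] <= max_diff)
    ]
    if not sentences:
        return selected
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    out = []
    for text in selected:
        out.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip())
    return out


# Assumption: "strong" agreement is modelled here as diff == 0.
corpus = build_corpus(comments, subtheme=43, max_diff=0, sentences=True)
print(corpus)
```

Calling `build_corpus(comments)` with no filters returns every comment unsplit, which mirrors the unfiltered `corpus_all` call.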

## Load Embeddings
- Loading the embeddings takes several minutes, depending on which pre-trained embedding is used.

In [7]:
# file path for embedding
fasttext = "./references/pretrained_embeddings.nosync/fasttext/crawl-300d-2M.vec"
start = time.time()
embedding = src.analysis.text_summary.load_word_embeddings(fasttext)
end = time.time()
print((end - start) / 60, "mins")
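
For context, a fastText `.vec` file is plain text: the first line gives the vocabulary size and the vector dimension, and each following line holds a word and its vector components separated by spaces. The sketch below parses that layout with the standard library only. `read_vec` and its `limit` parameter are hypothetical names, and this is not necessarily how `load_word_embeddings` is implemented.

```python
import io


def read_vec(handle, limit=None):
    """Parse word2vec/fastText text format into a {word: [floats]} dict."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab size> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early to keep memory use and load time down
    return vectors, dim


# Tiny in-memory stand-in for a real .vec file.
sample = io.StringIO("2 3\nstaff 0.1 0.2 0.3\nleader 0.4 0.5 0.6\n")
vectors, dim = read_vec(sample)
print(dim, sorted(vectors))
```

Reading only the first N words through a cap like `limit` is one way to shorten the load while experimenting, at the cost of missing rarer words.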

## Summaries

### Option 1: PageRank with pretrained embeddings

In [11]:
start = time.time()
summary_43_0 = src.analysis.text_summary.generate_summary_pagerank_pretrained_embedding(corpus_43_0, 
                                                  embedding, 
                                                  embedding_size=300, 
                                                  size_summary=5)
end = time.time()
print((end - start) / 60, "mins")

5.8762840032577515 mins


In [13]:
print("Summary for Strengthen quality of executive leadership - Strong Agreement")
print("--------------------------------------------------------------------------")
for i in summary_43_0:
    print(i, "\n")

Summary for Strengthen quality of executive leadership - Strong Agreement
--------------------------------------------------------------------------
Make DM and ADM by merit only sever the political element Current Executive is completely out of touch, policies and decisions at that level do not reflect or support operational reality, moral is at an all time low Executive staff (director level) often interfere directly with staff members in work assignments that should be managed by senior managers. 

Leadership under the exist PSA model for senior Executive focuses too much on the political rather than the public service, the downside for staff is they have no impact on decisions affecting them, and often receive very little credit because they report to Executive who are more focussed on their upward mobility. 

tends to wait for instructions from above  Middle management is weak in MANAGEMENT, most, if not all, have no or little prior management experience and it manifests (related 
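
To make the idea behind Option 1 concrete, here is a minimal TextRank-style sketch. It assumes each sentence is represented by the average of its word vectors, that sentences are linked by cosine similarity, and that PageRank scores pick the summary sentences. The helper names and toy vectors are hypothetical, and the project's `generate_summary_pagerank_pretrained_embedding` may differ in its details.

```python
import math


def sentence_vector(sentence, vectors, dim):
    """Average the vectors of the known words in a sentence."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return [0.0] * dim
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration on a weighted similarity graph."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j]
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, vectors, dim, size_summary=2):
    """Return the size_summary highest-ranked sentences."""
    vecs = [sentence_vector(s, vectors, dim) for s in sentences]
    n = len(sentences)
    # Similarity graph: no self-links, negative similarities clipped to zero.
    weights = [
        [max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:size_summary]
    return [sentences[i] for i in top]


# Toy 2-dimensional vectors; real embeddings would have 300 dimensions.
toy_vectors = {"leaders": [1.0, 0.0], "executives": [0.9, 0.1], "salary": [0.0, 1.0]}
toy_sentences = ["leaders decide", "executives decide", "salary matters"]
summary = summarize(toy_sentences, toy_vectors, dim=2, size_summary=2)
print(summary)
```

The sketch builds a full sentence-by-sentence similarity matrix, so its cost grows quadratically with the number of sentences.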

### Option 2: summa

In [42]:
start = time.time()
summary_43_0_summa = src.analysis.text_summary.generate_summary_summa(corpus_43_0)
end = time.time()
print((end - start) / 60, "mins")

In [43]:
print("Summary for Strengthen quality of executive leadership - Strong Agreement")
print("--------------------------------------------------------------------------")
for i in summary_43_0_summa:
    print(i, "\n")

Summary for Strengthen quality of executive leadership - Strong Agreement
--------------------------------------------------------------------------
Nothing happening Stronger exec leadership decision making initiative interaction with staff in meaningful constructive way improved communication about priorities strategic direction empower staff leverage expertise staff need more opportunities stretch develop new skills allow for true innovation encourage forum for creative policy ideas at staff level not just management By hiring team lead Director that has proven skills education ability lead. 

give the Executive team the confidence trust tools learning time they need be more strategic less reactionary Would like see change in leadership where managers directors embody qualities such as integrity honesty work ethic. 

Executive appears fearful unwilling make decisions Support exec senior leadership in professional development focused on being stronger leaders for staff who work below

### Option 3: gensim

In [53]:
start = time.time()
summary_43_0_gensim = src.analysis.text_summary.generate_summary_gensim(corpus_43_0)
end = time.time()
print((end - start) / 60, "mins")

In [54]:
print("Summary for Strengthen quality of executive leadership - Strong Agreement")
print("--------------------------------------------------------------------------")
for i in summary_43_0_gensim:
    print(i, "\n")

Summary for Strengthen quality of executive leadership - Strong Agreement
--------------------------------------------------------------------------
The ADM responsible for this work unit needs take more active role ensure the directors Exec Director have the appropriate HR skills leadership decision making skills that have been lacking for number years. 

Nothing happening Stronger exec leadership decision making initiative interaction with staff in meaningful constructive way improved communication about priorities strategic direction empower staff leverage expertise staff need more opportunities stretch develop new skills allow for true innovation encourage forum for creative policy ideas at staff level not just management By hiring team lead Director that has proven skills education ability lead. 

give the Executive team the confidence trust tools learning time they need be more strategic less reactionary Would like see change in leadership where managers directors embody qualitie

## Current plan/steps

- Try a few different algorithms; there appear to be many implementations available.

- Do some sentence EDA:
    - most common words (may want to lemmatize first)
    - for executive comments, maybe look at the most common acronyms (ADM, etc.)
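
The sentence EDA planned above could start from something like the following sketch, which counts frequent words and all-caps acronyms with the standard library. The sample sentences, the tiny stopword set, and the helper names are placeholders rather than project code, and lemmatization is left out.

```python
import re
from collections import Counter

# Hypothetical sentences standing in for the survey corpus.
sentences = [
    "The ADM should meet with staff.",
    "Staff need support from the ADM and the DM.",
]

# Minimal illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "with", "from", "and", "should", "need"}


def common_words(texts, top_n=3):
    """Most frequent lowercase words, ignoring stopwords."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)


def common_acronyms(texts, top_n=3):
    """Most frequent runs of two or more capital letters (e.g. ADM)."""
    return Counter(re.findall(r"\b[A-Z]{2,}\b", " ".join(texts))).most_common(top_n)


word_counts = common_words(sentences)
acronym_counts = common_acronyms(sentences)
print(word_counts)
print(acronym_counts)
```

Because acronyms are matched before lowercasing, `ADM` is counted separately from ordinary words.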


# Experiments

In [None]:
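# Map each summary sentence back to its index so the original,
# better-formatted sentence can be printed.
# Assumes x (summary sentences) and the parallel lists processed_43_2
# and sentences_43_2 were defined earlier in the session.
import re

for i in x:
    c = re.sub(r'\.', '', i)  # remove periods before the lookup
    print(c)

    index = processed_43_2.index(c)
    print(index)

    print(sentences_43_2[index], "\n")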