# Earthquake Report Summarization

**This notebook contains functions that summarize several report sections by combining TextRank with a keyword query search.**

In [2]:
import numpy as np
import pandas as pd
# NOTE: gensim.summarization was removed in gensim 4.0, so this import needs gensim < 4.0
from gensim.summarization.summarizer import summarize
from nltk import tokenize

# 1. Define Parameters and Queries 

In [8]:
# sections you want to summarize
sections = ["motivation", "objectives", 
            "methodology", "results", "conclusions"]

# desired summary lengths in characters (each is divided by the section length to get the summarize ratio)
s_lengths = [1400, 1400, 4000, 4000, 1700]

# queries used for each section
queries = {"motivation": ['motivation', 'motive', 'reasoning', 'why', 'purpose', 'scope'],
           "objectives": ['objective', 'objectives', 'goal', 'aim', 'purpose', 'summary'],
           "methodology": ['method', 'approach', 'analytical', 'experimental', 'theoretical',
                           'numerical', 'field', 'simulation', 'data', 'probabilistic', 
                           'system', 'effect', 'affect', 'summary'],
           "results": ['result', 'outcome', 'finding', 'effect', 'affect', 'summary'],
           "conclusions": ['conclusion', 'conclude', 'list', 'arrive', 'reach', 'finding', 'summary']}

# 2. Generate the Summary

### Control: Plain Summary

In [4]:
result_plain = {}
for section, s_length in zip(sections, s_lengths):
    with open("report2/{}.txt".format(section), "r") as f:
        t = f.read()

    # ratio = target length / section length, both in characters
    result_plain[section] = summarize(t, ratio=s_length/len(t))
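
The cell above turns a target length into a `ratio` by dividing by `len(t)`. If a section is shorter than its target, that value exceeds 1. The sketch below shows one way to guard against this; `length_to_ratio` is a hypothetical helper, not part of the notebook. Since gensim's `summarize` treats `ratio` as a proportion of sentences, a character-based target is only an approximation.

```python
def length_to_ratio(text, target_chars):
    """Convert a target summary length in characters to a ratio in (0, 1].

    Hypothetical helper: assumes the target is measured in characters,
    as the notebook does when it divides by len(t).
    """
    if not text:
        raise ValueError("text must not be empty")
    if target_chars <= 0:
        raise ValueError("target_chars must be positive")
    # Clamp to 1.0 so short sections are returned whole instead of
    # passing an out-of-range ratio to the summarizer.
    return min(1.0, target_chars / len(text))
```

It could be used as `summarize(t, ratio=length_to_ratio(t, 1400))`.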

### Experiment: TextRank with context (surrounding sentences) + query-searched sentences

In [9]:
result_context = {}
result_keyword = {}
for i in range(5):
    with open("report1/{}.txt".format(sections[i]), "r") as f:
        t = f.read()

    sent_list = tokenize.sent_tokenize(t)
    # ratio is cut to a third: each selected sentence is later expanded
    # to up to three sentences (itself plus its two neighbours)
    summa_list = tokenize.sent_tokenize(summarize(t, ratio=s_lengths[i]/(len(t)*3)))
    context_ind_set = set()
    keyword_ind_set = set()
    for summa_sent in summa_list:
        # locate the summary sentence in the original text; a substring test
        # is used because an exact sent_list.index() lookup can fail
        for j in range(len(sent_list)-1, -1, -1):
            if summa_sent in sent_list[j]:
                break
        else:
            continue  # no match found, so skip this summary sentence
        if j > 0:
            context_ind_set.add(j-1)
        context_ind_set.add(j)
        if j < len(sent_list) - 1:
            context_ind_set.add(j+1)

    for query in queries[sections[i]]:
        # use k here: reusing i would overwrite the section index
        for k, sent in enumerate(sent_list):
            if query in sent:
                keyword_ind_set.add(k)

    # sort the indices so the sentences keep their original order
    context_list = [sent_list[c_ind] for c_ind in sorted(context_ind_set)]
    keyword_list = [sent_list[k_ind] for k_ind in sorted(keyword_ind_set)]

    result_context[sections[i]] = " ".join(context_list)
    result_keyword[sections[i]] = " ".join(keyword_list)
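
The two selection steps in this cell, neighbour expansion and query search, can also be written as small standalone functions. The following is a minimal sketch, not the notebook's own code: `expand_with_context` and `query_hits` are hypothetical names, and the query match here is case-insensitive and anchored at word starts, which differs from the plain substring test used above.

```python
import re

def expand_with_context(hit_indices, n_sentences, window=1):
    """Return sorted indices covering each hit plus `window` neighbours per side."""
    expanded = set()
    for j in hit_indices:
        start = max(0, j - window)               # do not run past the first sentence
        stop = min(n_sentences, j + window + 1)  # or past the last one
        expanded.update(range(start, stop))
    return sorted(expanded)

def query_hits(sentences, query_terms):
    """Return sorted indices of sentences with a word starting with any query term."""
    # Assumption: prefix matching at a word boundary, ignoring case,
    # so 'result' also matches 'Results'.
    patterns = [re.compile(r"\b" + re.escape(q), re.IGNORECASE) for q in query_terms]
    return [k for k, sent in enumerate(sentences)
            if any(p.search(sent) for p in patterns)]

# Example usage on a toy sentence list
sents = ["Why study this?", "The method is new.", "Unrelated text.", "Results follow."]
keyword_idx = query_hits(sents, ["why", "result"])
context_idx = expand_with_context([3], len(sents))
```

Returning sorted index lists keeps the selected sentences in document order when they are joined into a summary.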

In [6]:
result_plain

{'motivation': 'Figure 1.3 shows a bilinear force-displacement curve, which is generally representative of several types of energy-dissipating isolation devices.\nThe results of the test program demonstrated two direct effects of vertical shaking that may need to be considered in the lateral design of isolated structures.\nSpecifically, the high-frequency component introduced into the base shear by the vertical motion caused the activation of higher structural modes.',
 'objectives': 'As outlined above, although quite a few previous experimental studies reported no significant influence of vertical motion on the horizontal response isolated structures, the experimental test on a full-scale five-story steel moment frame building isolated with TPBs at E-defense demonstrated that vertical shaking can increase the base shear and horizontal acceleration.\nThe primary objectives of this study are as follows: (1) to develop a physical understanding of the phenomena by which responses in bridg

In [10]:
result_context

{'motivation': 'It is known that axial force or strain affects the shear capacity. As an example, near-fault vertical ground motions may lead to tensile forces on the bridge columns during short time intervals, leading to negligible contribution of concrete to shear capacity after cracking. Although the effect of axial force on shear capacity is an accepted fact, current seismic codes do not have a consensus on this effect, and different code equations might lead to different shear capacity estimations. On the demand side, axial forces that are not taken into consideration, such as those due to vertical excitation, may lead to an increase in the moment capacities, resulting in greater shear forces than expected.',
 'objectives': 'The main objective of this study is to investigate the effect of axial force produced by the vertical component of the ground motion on the behavior of bridge columns, especially on shear strength degradation. An outline of the research program is presented in

## Appendix 

**Shows how to use the python-docx library (`docx`) to extract the report text directly from a .docx file.**

In [1]:
import docx

doc = docx.Document("PEER_Report.docx")
fullText = []
for para in doc.paragraphs:
    fullText.append(para.text)
text = '\n'.join(fullText)

In [None]:
text[:10000]
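
The main cells read one file per section, such as `report1/motivation.txt`. How the extracted text is split into sections is not shown in this notebook, so the sketch below only covers the last step: writing already-split section texts to that file layout. `write_sections` and its `section_texts` argument are hypothetical.

```python
from pathlib import Path

def write_sections(section_texts, out_dir):
    """Write each section's text to <out_dir>/<section>.txt and return the paths.

    Hypothetical helper: `section_texts` maps a section name to its text,
    and is assumed to have been prepared by hand or by some other step.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    written = []
    for name, body in section_texts.items():
        file_path = out_path / "{}.txt".format(name)
        file_path.write_text(body, encoding="utf-8")
        written.append(file_path)
    return written
```

For example, `write_sections({"motivation": motivation_text}, "report1")` would produce `report1/motivation.txt`, where `motivation_text` stands for a string you have already isolated.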