<h1 align="center">Topic: Building a Text Summarizer</h1>

<h2>Importing required libraries</h2>

In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from spacy.lang.en import English
import numpy as np

nlp = English()
# spaCy 2.x: nlp.add_pipe(nlp.create_pipe('sentencizer')); since spaCy 3.0 the component is added by name
nlp.add_pipe("sentencizer")

<spacy.pipeline.sentencizer.Sentencizer at 0x150d1592cc0>
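The `TfidfVectorizer` imported above turns each sentence into a vector of term weights. As a rough illustration of what those weights look like, the pure-Python sketch below applies the sublinear term frequency, smoothed inverse document frequency and L2 normalisation that scikit-learn documents for this class, then sums the weights per sentence. The function name `tfidf_sentence_scores` and the toy token lists are assumptions made for this example only; the sketch leaves out stop-word removal, n-grams and `min_df` filtering.

```python
import math
from collections import Counter


def tfidf_sentence_scores(tokenized_sentences):
    """Score each sentence as the sum of its L2-normalised TF-IDF weights.

    Hypothetical helper for illustration; each sentence is a list of tokens.
    """
    n_docs = len(tokenized_sentences)
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for tokens in tokenized_sentences for term in set(tokens))

    scores = []
    for tokens in tokenized_sentences:
        counts = Counter(tokens)
        weights = {}
        for term, tf in counts.items():
            sublinear_tf = 1.0 + math.log(tf)  # sublinear_tf=True
            smooth_idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smooth_idf=True
            weights[term] = sublinear_tf * smooth_idf
        # L2-normalise the sentence vector, then sum its components
        norm = math.sqrt(sum(w * w for w in weights.values()))
        scores.append(sum(weights.values()) / norm if norm else 0.0)
    return scores


# Toy input (assumed, already tokenised)
toy = [
    ["load", "model", "traffic"],
    ["load", "model"],
    ["crowd", "loading"],
]
toy_scores = tfidf_sentence_scores(toy)
```

Because each vector is L2-normalised before summing, a sentence with k equally weighted terms scores the square root of k, so this kind of score tends to favour sentences that contain more distinct vocabulary terms.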

<h2>Input text</h2>

In [2]:
text_corpus = """


For road bridges the code is applicable to bridges with individual spans ranging from 5 m to 200 m & with the carriageway widths not greater than 42 m. There were 4 models for determining the main vertical loads from traffic representing the different types of traffic & different design situations. There were rules for determining the secondary live loads such as those due to acceleration & braking & centrifugal effects, the requirements for accidental loading cover collision with bridge supports & the effect of errant vehicles on areas of footways & cycle tracks. Load Model 1 These was the main traffic loading system & consisted of concentrated & uniformly distributed loads to cover the global & local effects of normal traffic. The concentrated load model known as the Tandem Axle System consisted of a pair of axles with 4 identical wheels having a square contact area of 0.4 m x 0.4 m. For span greater than 10 m the pair of axles can be replaced by a single of the same total weight. The UDL System has a constant value irrespective of the span length although there were different values for the individual traffic lanes. The distributed load was to be applied to the unfavourable parts of the influence surface. These can be multiplied by adjustment factors to allow for site specific situation but in most cases the factors are to be taken as unity. The load model 1 are shown in figure below & the basic values of the tandem axle system & the UDL system which include dynamic amplification are shown in table below for the different notional lanes. The development of an equivalent loading model so called ‘formula loading’ has to take account the vehicle load effects of both bending moments & shears. As far as the structure is concerned it is the load effects (i.e bending moments & shears) on individual structural members that are important rather than the total weight carried by the structure or member. 
Equivalent loading model adopted UDL + KEL to represent the vehicle effects for both bending moments & shears. Derivation of Load Model 1 & 2 The traffic data from a number of European countries were analysed. The data contained useful information about axle weights, axle spacing, distance between vehicles & the length of vehicles, on the inside or slow lanes. The durations of various data collection ranged from a few hours to 800 hrs. It was found that the traffic parameters from the different countries were not very different when comparing the daily values of axles & vehicle weights. However it was decided to use a set data recorded on the A6 motorway near city of Auxerre in France as it was felt that traffic data would give a good representation of European traffic as a whole. Methodology The procedure for the development of the main load model LM1 consisted of the following: Determination of target values of various traffic load effects which were to be reproduced by the design load models. The load effects were to be extrapolated to correspond to a probability of exceedance in 50 years or return period of 1000 years. To find & define the load model which was the best to reproduce the target values for loaded lengths from 5 m to 200 m. The target values were determined based on the traffic data recorded at Auxerre took into account different extrapolation methods, traffic compositions, different influence lines for various load effects & the dynamic effects from flowing  traffic. The total weights of the vehicles generated were interpolated to certain return periods using Gaussian, Poisson or extremal distributions. The dynamic behaviour of the vehicles & bridges was examined based on assumed roughness values for the carriageway surface. These dynamic functions were combined statically with the static effects  & were used to generate the target values for calibrating the proposed load models. 
The aim of the calibration was to develop a loading model which would allow for dynamic magnification effects, different traffic patterns, scenario of free flowing & congested traffic, would cover both global & local effects. Based on the studies, the final version of LM1 is shown below; The basic values of the concentrated & distributed loads may be modified by adjustment factors which allow reduced loading to be adopted in specific situations. Load Model 2 Calculation showed that the tandem-axle system in LM1 did not adequately cover all the local effects for all vehicles in particular those on orthotropic slabs. Therefore for local effects Load Model 2 (LM2) was introduced, consisting of a single axle with a total load of 400 kN including dynamic effects which can be applied to certain areas as shown in below; Load Model 3 These load model consisted of sets of axle loads at given spacings to represent special vehicles (SV-abnormal vehicles) carrying heavy loads on designated abnormal load routes. The weights of listed SVs range from 600 to 3600 kN with from 4 axles to 18 axles & with axle loads of 150, 200 or 240 kN depending on the class of the vehicle concerned. These model was intended to be used only as requested by the client & suitable for both global & local effects. The SVs were assumed to moved at very low speeds & therefore contained no allowance for dynamic effects. The selected SVs were applied to either 1 or 2 adjacent lanes depending on the weight & dimension of the vehicle concerned. Each notional lane & any remaining area was to be loaded by the main loading system as well but using frequent rather than characteristic values for the intensities of the concentrated & distributed loads. On the lanes actually occupied by the SV, the main loading system was not applied for a distance of 25 m in front of & behind the SV. Load Model 4 These represent crowd loading & consists of a UDL of 5 kN/m2 & only to be applied when requested by the client.



"""

<h2>Final Summary</h2>
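The cell in this section keeps the highest-scoring sentences and then puts them back in document order by looking each sentence up in a dictionary keyed by its text. If the same sentence occurs twice, both copies share one key. A possible alternative, sketched here under the assumption that scores and sentences are parallel sequences, is to rank positions instead of strings. The name `top_n_in_original_order` and the demo data are hypothetical and not part of the original notebook.

```python
def top_n_in_original_order(sentences, scores, n):
    """Return the n highest-scoring sentences, kept in document order."""
    # Rank positions (not sentence strings) by score, highest first
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the best n positions, then sort them back into document order
    keep = sorted(ranked[:n])
    return [sentences[i] for i in keep]


# Demo data (assumed): the first and third sentences are identical
demo_sentences = ["A is here.", "B follows.", "A is here.", "C ends."]
demo_scores = [0.9, 0.1, 0.8, 0.5]
demo_summary = top_n_in_original_order(demo_sentences, demo_scores, 3)
```

Working with positions means duplicate sentences are treated as separate candidates, and no lookup table is needed to restore the order.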

In [3]:
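# Collapse all whitespace (including line breaks) to single spaces so that
# words on either side of a line break are never fused together
doc = nlp(" ".join(text_corpus.split()))
# spaCy 2.x exposed sent.string; from spaCy 3.0 the attribute is sent.text
sentences = [sent.text.strip() for sent in doc.sents]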

print("Sentences are: \n", sentences)

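# Map each sentence to its position in the text so that the top-scoring
# sentences can later be restored to their original order.
# Note: identical sentences share one key, so only the last position is kept.
sentence_organizer = {sentence: index for index, sentence in enumerate(sentences)}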

print("Our sentence organizer: \n", sentence_organizer)

# Create a TF-IDF (Term Frequency-Inverse Document Frequency) model.
# The flags are passed as booleans rather than 1, since recent scikit-learn
# releases validate parameter types.
tf_idf_vectorizer = TfidfVectorizer(min_df=2, max_features=None,
                                    strip_accents='unicode',
                                    analyzer='word',
                                    token_pattern=r'\w{1,}',
                                    ngram_range=(1, 3),
                                    use_idf=True, smooth_idf=True,
                                    sublinear_tf=True,
                                    stop_words='english')

# Fit the TF-IDF vocabulary, treating each sentence as one document
tf_idf_vectorizer.fit(sentences)

# Transform the sentences into TF-IDF vectors
sentence_vectors = tf_idf_vectorizer.transform(sentences)

# Score each sentence as the sum of its TF-IDF weights
sentence_scores = np.array(sentence_vectors.sum(axis=1)).ravel()

# Sanity check: there must be exactly one score per sentence
print(len(sentences) == len(sentence_scores))

# Getting top-n sentences
N = 15
top_n_sentences = [sentences[ind] for ind in np.argsort(sentence_scores, axis=0)[::-1][:N]]

# Pair each selected sentence with its original position, looked up in sentence_organizer
mapped_top_n_sentences = [(sentence,sentence_organizer[sentence]) for sentence in top_n_sentences]
print("Our top_n_sentence with their index: \n")
for element in mapped_top_n_sentences:
    print(element)

# Restore the original document order of the top-n sentences
mapped_top_n_sentences = sorted(mapped_top_n_sentences, key = lambda x: x[1])
ordered_scored_sentences = [element[0] for element in mapped_top_n_sentences]

# Our final summary
summary = " ".join(ordered_scored_sentences)

print("Summary: \n", summary)

Sentences are: 
 ['For road bridges the code is applicable to bridges with individual spans ranging from 5 m to 200 m & with the carriageway widths not greater than 42 m. There were 4 models for determining the main vertical loads from traffic representing the different types of traffic & different design situations.', 'There were rules for determining the secondary live loads such as those due to acceleration & braking & centrifugal effects, the requirements for accidental loading cover collision with bridge supports & the effect of errant vehicles on areas of footways & cycle tracks.', 'Load Model 1 These was the main traffic loading system & consisted of concentrated & uniformly distributed loads to cover the global & local effects of normal traffic.', 'The concentrated load model known as the Tandem Axle System consisted of a pair of axles with 4 identical wheels having a square contact area of 0.4 m x 0.4 m. For span greater than 10 m the pair of axles can be replaced by a single 