In [1]:
import numpy as np
import pandas as pd
from nltk import sent_tokenize

In [25]:
text = """Chronobiology might sound a little futuristic – like something from a science fiction novel, perhaps – but it’s actually a field of study that concerns one of the oldest processes life on this planet has ever known: short-term rhythms of time and their effect on flora and fauna.

This can take many forms. Marine life, for example, is influenced by tidal patterns. Animals tend to be active or inactive depending on the position of the sun or moon. Numerous creatures, humans included, are largely diurnal – that is, they like to come out during the hours of sunlight. Nocturnal animals, such as bats and possums, prefer to forage by night. A third group are known as crepuscular: they thrive in the low-light of dawn and dusk and remain inactive at other hours.

When it comes to humans, chronobiologists are interested in what is known as the circadian rhythm. This is the complete cycle our bodies are naturally geared to undergo within the passage of a twenty-four hour day. Aside from sleeping at night and waking during the day, each cycle involves many other factors such as changes in blood pressure and body temperature. Not everyone has an identical circadian rhythm. ‘Night people’, for example, often describe how they find it very hard to operate during the morning, but become alert and focused by evening. This is a benign variation within circadian rhythms known as a chronotype.

Scientists have limited abilities to create durable modifications of chronobiological demands. Recent therapeutic developments for humans such as artificial light machines and melatonin administration can reset our circadian rhythms, for example, but our bodies can tell the difference and health suffers when we breach these natural rhythms for extended periods of time. Plants appear no more malleable in this respect; studies demonstrate that vegetables grown in season and ripened on the tree are far higher in essential nutrients than those grown in greenhouses and ripened by laser.

Knowledge of chronobiological patterns can have many pragmatic implications for our day-to-day lives. While contemporary living can sometimes appear to subjugate biology – after all, who needs circadian rhythms when we have caffeine pills, energy drinks, shift work and cities that never sleep? – keeping in synch with our body clock is important. 

The average urban resident, for example, rouses at the eye-blearing time of 6.04 a.m., which researchers believe to be far too early. One study found that even rising at 7.00 a.m. has deleterious effects on health unless exercise is performed for 30 minutes afterward. The optimum moment has been whittled down to 7.22 a.m.; muscle aches, headaches and moodiness were reported to be lowest by participants in the study who awoke then.

Once you’re up and ready to go, what then? If you’re trying to shed some extra pounds, dieticians are adamant: never skip breakfast. This disorients your circadian rhythm and puts your body in starvation mode. The recommended course of action is to follow an intense workout with a carbohydrate-rich breakfast; the other way round and weight loss results are not as pronounced.

Morning is also great for breaking out the vitamins. Supplement absorption by the body is not temporal-dependent, but naturopath Pam Stone notes that the extra boost at breakfast helps us get energised for the day ahead. For improved absorption, Stone suggests pairing supplements with a food in which they are soluble and steering clear of caffeinated beverages. Finally, Stone warns to take care with storage; high potency is best for absorption, and warmth and humidity are known to deplete the potency of a supplement.

After-dinner espressos are becoming more of a tradition – we have the Italians to thank for that – but to prepare for a good night’s sleep we are better off putting the brakes on caffeine consumption as early as 3 p.m. With a seven hour half-life, a cup of coffee containing 90 mg of caffeine taken at this hour could still leave 45 mg of caffeine in your nervous system at ten o’clock that evening. It is essential that, by the time you are ready to sleep, your body is rid of all traces.

Evenings are important for winding down before sleep; however, dietician Geraldine Georgeou warns that an after-five carbohydrate-fast is more cultural myth than chronobiological demand. This will deprive your body of vital energy needs. Overloading your gut could lead to indigestion, though. Our digestive tracts do not shut down for the night entirely, but their work slows to a crawl as our bodies prepare for sleep. Consuming a modest snack should be entirely sufficient."""

In [58]:
x = sent_tokenize(text)

In [27]:
len(x)

37

In [28]:
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
stop_words = stopwords.words('english')  # NLTK corpus file IDs are lowercase

from nltk.stem.porter import PorterStemmer
porter_stemmer = PorterStemmer()

In [59]:
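sentences = []
for sentence in x:
    # Lowercase, split on whitespace, and drop stop words; punctuation stays attached to tokens
    filtered = " ".join(word for word in sentence.lower().split() if word not in stop_words)
    sentences.append(filtered)
sentences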

['chronobiology might sound little futuristic – like something science fiction novel, perhaps – it’s actually field study concerns one oldest processes life planet ever known: short-term rhythms time effect flora fauna.',
 'take many forms.',
 'marine life, example, influenced tidal patterns.',
 'animals tend active inactive depending position sun moon.',
 'numerous creatures, humans included, largely diurnal – is, like come hours sunlight.',
 'nocturnal animals, bats possums, prefer forage night.',
 'third group known crepuscular: thrive low-light dawn dusk remain inactive hours.',
 'comes humans, chronobiologists interested known circadian rhythm.',
 'complete cycle bodies naturally geared undergo within passage twenty-four hour day.',
 'aside sleeping night waking day, cycle involves many factors changes blood pressure body temperature.',
 'everyone identical circadian rhythm.',
 '‘night people’, example, often describe find hard operate morning, become alert focused evening.',
 'be

In [46]:
len(sentences)

37

In [47]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
word_to_index = tokenizer.word_index
index_to_word = tokenizer.index_word

In [48]:
indexed_sentences = tokenizer.texts_to_sequences(sentences)

In [49]:
tokenizer.sequences_to_texts([[1, 5, 8, 7]])

['– rhythms time sleep']
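
The decoded sequence above shows that the dash character survives as a token of its own, and tokens such as `novel,` keep their trailing punctuation because the sentences are split on whitespace only. A minimal sketch of one way to strip punctuation before filtering stop words follows. The function name `clean_sentence` and the tiny `DEMO_STOP_WORDS` set are hypothetical and exist only to keep the example self-contained; the notebook itself uses NLTK's stop-word list.

```python
import re

# Hypothetical mini stop-word list for illustration only;
# the notebook uses NLTK's English stop words instead.
DEMO_STOP_WORDS = {"a", "of", "the", "is", "for", "by", "from", "like"}

def clean_sentence(sentence, stop_words=DEMO_STOP_WORDS):
    """Lowercase, keep only word-like tokens, and drop stop words."""
    # A token is a run of letters/digits, optionally joined by
    # internal hyphens or apostrophes (e.g. "short-term", "it’s").
    tokens = re.findall(r"[a-z0-9]+(?:[-'’][a-z0-9]+)*", sentence.lower())
    return " ".join(token for token in tokens if token not in stop_words)

cleaned = clean_sentence("Marine life, for example, is influenced by tidal patterns.")
print(cleaned)
```

With this kind of cleaning, standalone dashes no longer enter the vocabulary, and `novel,` and `novel` would be counted as the same term.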

In [50]:
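from collections import defaultdict

def inverse_document_frequency(documents):
    # Note: despite the name, this returns each term's total occurrence count
    # divided by the number of documents; no inverse or logarithm is applied.
    no_of_documents = len(documents)
    idf = defaultdict(int)
    for document in documents:
        for term in document.split():
            idf[term] += 1
    for term in idf:
        idf[term] = idf[term] / no_of_documents
    return idf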
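
For comparison, the textbook inverse document frequency counts each term once per document and takes the logarithm of the inverted ratio, so rarer terms receive larger weights. The sketch below is an alternative to the function above, not a description of what the notebook's recorded outputs were computed with; the name `log_idf` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def log_idf(documents):
    """Return log(N / df) per term, where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for document in documents:
        # set() ensures a term is counted at most once per document
        doc_freq.update(set(document.split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Toy corpus, purely illustrative
toy_docs = ["circadian rhythm sleep", "circadian rhythm", "sleep late"]
toy_idf = log_idf(toy_docs)
print(toy_idf)
```

Here `late` occurs in one of three documents and `circadian` in two, so `late` gets the higher weight. Swapping this into the pipeline would change every score, threshold, and summary shown in the later cells.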
        

In [51]:
from collections import defaultdict
def tfidf(sentence, idf_dict):
    # Returns one score per token: the token's relative frequency
    # in the sentence multiplied by idf_dict[token]
    words = sentence.split()
    sentlen = len(words)
    word_frequency = defaultdict(int)
    for word in words:
        word_frequency[word] += 1
    for word in word_frequency:
        word_frequency[word] = word_frequency[word] / sentlen
    sent_tfidf = []
    for word in words:
        sent_tfidf.append(word_frequency[word] * idf_dict[word])
    return sent_tfidf

In [52]:
text_idf = inverse_document_frequency(sentences)
sents_tfidf =[]
for sentence in sentences:
    sents_tfidf.append(tfidf(sentence,text_idf))
sents_tfidf[0]

[0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.012205754141238012,
 0.0017436791630340018,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.012205754141238012,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0026155187445510027,
 0.0008718395815170009,
 0.0017436791630340018,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0034873583260680036,
 0.0026155187445510027,
 0.0008718395815170009,
 0.0008718395815170009,
 0.0008718395815170009]

In [53]:
df = pd.DataFrame(list(zip(x , sents_tfidf)) , columns = ['documents','tfidf'])

In [54]:
from statistics import mean
df['score'] = [mean(scores) for scores in df['tfidf']]

In [55]:
threshold = mean(df['score'])
threshold

0.005396222113665702

In [56]:
# Keep, in original order, every sentence whose mean score reaches the threshold
summary = df.loc[df['score'] >= threshold, 'documents'].tolist()
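
A mean-based threshold lets the length of the summary vary with the score distribution. If a fixed number of sentences is preferred, one option is to take the k highest-scoring sentences and present them in their original order. The helper below is a hypothetical sketch (the name `top_k_sentences` and the toy inputs are not part of the notebook) and works on plain lists so it can be tried independently of the DataFrame.

```python
def top_k_sentences(sentences, scores, k):
    """Return the k highest-scoring sentences, kept in their original order."""
    # Rank sentence positions by score, highest first, and keep the best k
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the chosen positions so the summary reads in document order
    return [sentences[i] for i in sorted(ranked)]

# Toy inputs, purely illustrative
demo_sentences = ["a", "b", "c", "d"]
demo_scores = [0.1, 0.9, 0.5, 0.7]
demo_summary = top_k_sentences(demo_sentences, demo_scores, k=2)
print(demo_summary)
```

In the notebook this could be called with `list(df['documents'])` and `list(df['score'])`, assuming those columns exist as built in the earlier cells.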


In [57]:
summary

['This can take many forms.',
 'Marine life, for example, is influenced by tidal patterns.',
 'When it comes to humans, chronobiologists are interested in what is known as the circadian rhythm.',
 'Not everyone has an identical circadian rhythm.',
 'This is a benign variation within circadian rhythms known as a chronotype.',
 '– keeping in synch with our body clock is important.',
 'Once you’re up and ready to go, what then?',
 'This disorients your circadian rhythm and puts your body in starvation mode.',
 'Morning is also great for breaking out the vitamins.',
 'It is essential that, by the time you are ready to sleep, your body is rid of all traces.',
 'This will deprive your body of vital energy needs.',
 'Consuming a modest snack should be entirely sufficient.']