In [31]:
!pip install scikit-learn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [32]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

In [33]:
document = """
Astronomers have used the James Webb Space Telescope to peer back in time to the early days of the universe — and they spotted something unexpected.
The space observatory revealed six massive galaxies that existed between 500 million and 700 million years after the big bang that created the universe. The discovery is completely upending existing theories about the origins of galaxies, according to a new study published Wednesday in the journal Nature.
“These objects are way more massive than anyone expected,” said study coauthor Joel Leja, assistant professor of astronomy and astrophysics at Penn State University, in a statement. “We expected only to find tiny, young, baby galaxies at this point in time, but we’ve discovered galaxies as mature as our own in what was previously understood to be the dawn of the universe.”
The telescope observes the universe in infrared light, which is invisible to the human eye, and is capable of detecting the faint light from ancient stars and galaxies. By peering into the distant universe, the observatory can essentially see back in time up to about 13.5 billion years ago. Scientists have determined the universe is about 13.7 billion years old.
"""

In [34]:
import re

# Split into sentences first. Splitting on whitespace that follows ., ! or ?
# (optionally with a closing curly quote) keeps decimals such as "13.5" intact.
sentences = [s.strip() for s in re.split(r'(?:(?<=[.!?])|(?<=[.!?]”))\s+', document.strip()) if s.strip()]

# Fit TF-IDF on the sentences so that each row of X is one sentence
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(sentences)

In [35]:
lsa = TruncatedSVD(n_components=1, algorithm='randomized', n_iter=100, random_state=42)
lsa.fit(X)

TruncatedSVD(n_components=1, n_iter=100, random_state=42)

In [36]:
# Score each sentence by the magnitude of its projection onto the single LSA topic.
# (lsa.components_ is indexed by vocabulary term, so it cannot be used to rank sentences.)
sentence_scores = np.abs(lsa.transform(X)[:, 0])

# Keep the two highest-scoring sentences, restored to their original document order
n_summary = 2
top_indices = np.argsort(sentence_scores)[::-1][:n_summary]
summary_sentences = [sentences[i] for i in sorted(top_indices)]

In [37]:
summary = ' '.join(summary_sentences)
print(summary)
