
# Goal
Find a large text corpus and implement a query/search engine over it from scratch.
- Reference: https://medium.com/@deangelaneves/how-to-build-a-search-engine-from-scratch-in-python-part-1-96eb240f9ecb

In [7]:
import pandas as pd
import numpy as np
from tqdm import tqdm 
import re
import spacy
import string


In [1]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

# Data Loading
Load the training split of the 20 Newsgroups dataset that ships with scikit-learn.

In [4]:
from sklearn.datasets import fetch_20newsgroups

newsgroups_databunch = fetch_20newsgroups(
    subset = 'train',
    shuffle = True, 
    random_state = 1
)

newsgroups_data = pd.DataFrame(newsgroups_databunch.data, columns = ['text'])
newsgroups_data.head()

Unnamed: 0,text
0,"From: ab4z@Virginia.EDU (""Andi Beyer"")\nSubjec..."
1,From: timmbake@mcl.ucsb.edu (Bake Timmons)\nSu...
2,From: bc744@cleveland.Freenet.Edu (Mark Ira Ka...
3,From: ray@ole.cdac.com (Ray Berry)\nSubject: C...
4,From: kkeller@mail.sas.upenn.edu (Keith Keller...


# Text Processing

In [8]:
tqdm.pandas()

# Get the appropriate spaCy model to use.
spacy_model = spacy.load('en_core_web_sm', disable = ['parser', 'ner'])

def _remove_punctuation(text, step):
    if step == 'initial':
        # `text` is a spaCy Doc. Keep only tokens that still contain word
        # characters once punctuation and underscores become spaces.
        return [
            token for token in text
            if re.sub(r'[\W_]+', ' ', token.text).strip() != ''
        ]
    elif step == 'last':
        # `text` is a list of lemma strings. Replace any punctuation left
        # inside a lemma (e.g. 'u.s.') with spaces.
        return [re.sub(r'[\W_]+', ' ', token) for token in text]

def _remove_stop_words(text):
    return [token for token in text if not token.is_stop]

def _lemmatize(text):
    return [token.lemma_ for token in text]

def preprocess_text(text, is_search_space=True):
    if is_search_space:
        # Remove the upper header part of the text.
        # We only need to do this for the search
        # space, not for the query string.
        step_1 = ' '.join(text.split('\n\n')[1:])
    else:
        step_1 = text

    # Lowercase text and remove extra spaces.
    step_2_3 = ' '.join(
        [word.lower() for word in str(step_1).split()]
    )

    # Tokenize text with spaCy.
    step_4 = spacy_model(step_2_3)

    # Remove punctuation.
    step_5 = _remove_punctuation(step_4, step = 'initial')

    # Remove stop words.
    step_6 = _remove_stop_words(step_5)

    # Lemmatize text.
    step_7 = _lemmatize(step_6)

    # Remove punctuation again.
    step_8 = _remove_punctuation(step_7, step = 'last')

    # Remake sentence with new cleaned up tokens.
    return ' '.join(step_8)
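
Both punctuation passes above rely on the pattern `[\W_]+`, which replaces each run of non-word characters or underscores with a single space. The sketch below applies that pattern to a few sample strings outside the spaCy pipeline; the helper name `collapse_non_word` and the sample tokens are illustrative assumptions, not part of the notebook.

```python
import re

def collapse_non_word(token_text):
    """Replace each run of non-word characters or underscores with one space."""
    return re.sub(r'[\W_]+', ' ', token_text)

# Hypothetical sample tokens, chosen to show the different cases.
samples = ['u.s.', 'pro-israeli', 'e_mail', 'media', '...']
cleaned = {token: collapse_non_word(token) for token in samples}

for token, result in cleaned.items():
    print(repr(token), '->', repr(result))
```

A token such as `u.s.` becomes `u s ` rather than `us`, which would explain the separate `u s` tokens visible in the processed text.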


In [9]:
newsgroups_data['text_processed'] = newsgroups_data['text'] \
    .progress_apply(preprocess_text)

newsgroups_data.head()

100%|██████████| 11314/11314 [04:13<00:00, 44.56it/s]


Unnamed: 0,text,text_processed
0,"From: ab4z@Virginia.EDU (""Andi Beyer"")\nSubjec...",sure story nad biased disagree statement u s m...
1,From: timmbake@mcl.ucsb.edu (Bake Timmons)\nSu...,james hogan write timmbake mcl ucsb edu bake t...
2,From: bc744@cleveland.Freenet.Edu (Mark Ira Ka...,realize principle strong point like know ask q...
3,From: ray@ole.cdac.com (Ray Berry)\nSubject: C...,notwithstanding legitimate fuss proposal chang...
4,From: kkeller@mail.sas.upenn.edu (Keith Keller...,change scoring playoff pool unfortunately time...


In [10]:
print(f'=====TEXT BEFORE PROCESSING===== \n"{newsgroups_data["text"][0]}"')
print(f'=====TEXT AFTER PROCESSING===== \n"{newsgroups_data["text_processed"][0]}"')

=====TEXT BEFORE PROCESSING===== 
"From: ab4z@Virginia.EDU ("Andi Beyer")
Subject: Re: Israeli Terrorism
Organization: University of Virginia
Lines: 15

Well i'm not sure about the story nad it did seem biased. What
I disagree with is your statement that the U.S. Media is out to
ruin Israels reputation. That is rediculous. The U.S. media is
the most pro-israeli media in the world. Having lived in Europe
I realize that incidences such as the one described in the
letter have occured. The U.S. media as a whole seem to try to
ignore them. The U.S. is subsidizing Israels existance and the
Europeans are not (at least not to the same degree). So I think
that might be a reason they report more clearly on the
atrocities.
	What is a shame is that in Austria, daily reports of
the inhuman acts commited by Israeli soldiers and the blessing
received from the Government makes some of the Holocaust guilt
go away. After all, look how the Jews are treating other races
when they got power. It is unfortun

# TF-IDF From Scratch

## Term Frequency

In [11]:
def tokenize(text):
    tokens = spacy_model(text)

    return [token.text for token in tokens if (token.text != ' ' and token.text != '')]

newsgroups_data['text_tokenized'] = newsgroups_data['text_processed'] \
    .progress_apply(tokenize)

100%|██████████| 11314/11314 [02:02<00:00, 92.61it/s] 


In [12]:
from nltk import FreqDist

# Just select 1000 rows at random.
newsgroups_data_sample = newsgroups_data.sample(1000)

# Get the frequency distribution for each row.
# Build the frame in a single call: DataFrame.append was removed in
# pandas 2.0, and appending row by row copies the frame on every step.
search_space_vectorized = pd.DataFrame(
    [dict(FreqDist(row)) for row in tqdm(newsgroups_data_sample['text_tokenized'])]
)
search_space_vectorized = search_space_vectorized.fillna(0)
search_space_vectorized.head()

100%|██████████| 1000/1000 [02:29<00:00,  6.70it/s]


Unnamed: 0,article,93332,hydra,gatech,edu,gt1091a,prism,kaan,timucin,write,hell,guy,david,davidian,think,talk,alter,ego,yo,better,shut,f,o,k,ok,go,come,like,attitute,lie,shit,united,states,refer,freedom,speech,prove,wrong,simply,fade,...,susppension,tocchet,091139,16ba5da01,823,determination,determined,dubious,independently,omc,omniscient,parable,qpliu,thug,undesirable,eon,frown,grail,hubris,polygenetic,psychologically,schaeffer,appoint,embarrassment,halt,lex,lifer,miscarriage,miscarry,mischief,skyblu,talionis,veil,carrying,copper,futserv,keen,markz,romex,zenier
0,1.0,1.0,1.0,2.0,2.0,3.0,1.0,7.0,2.0,3.0,2.0,1.0,3.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,5.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,14.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


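Each raw count $t_{freq}$ is log-scaled: the weight is $1 + \ln(t_{freq})$ when $t_{freq} > 0$ and $0$ otherwise, so repeated occurrences of a term add weight with diminishing returns.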
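
As a small worked example of this formula, the sketch below applies the same scaling to a handful of made-up counts. The function name `log_scaled_tf` and the toy counts are assumptions for illustration; the notebook's own implementation follows in the next cell.

```python
import numpy as np

def log_scaled_tf(counts):
    """Return 1 + ln(count) for positive counts and 0 for zero counts."""
    counts = np.asarray(counts, dtype=float)
    tf = np.zeros_like(counts)
    positive = counts > 0
    # Only take the log where the count is positive; zeros stay at 0.
    tf[positive] = 1.0 + np.log(counts[positive])
    return tf

# Hypothetical raw counts of one term across four documents.
toy_counts = [0, 1, 2, 7]
toy_tf = log_scaled_tf(toy_counts)
print(toy_tf)
```

A count of 1 maps to exactly 1, and a count of 7 maps to roughly 2.95 rather than 7, which is the damping effect the formula is meant to provide.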

In [13]:
def calculate_term_frequency(column):
    normalized_freq = np.log(
        column,
        # Pre-fill the output with -1: entries skipped by `where`
        # (zero counts) keep that value and become 0 after the + 1 below.
        out = np.zeros_like(column, dtype = float) - 1,
        where = (column != 0)
    )

    return 1 + normalized_freq

search_space_tf = search_space_vectorized.progress_apply(calculate_term_frequency)
search_space_tf.head()

100%|██████████| 34909/34909 [00:21<00:00, 1592.98it/s]


Unnamed: 0,article,93332,hydra,gatech,edu,gt1091a,prism,kaan,timucin,write,hell,guy,david,davidian,think,talk,alter,ego,yo,better,shut,f,o,k,ok,go,come,like,attitute,lie,shit,united,states,refer,freedom,speech,prove,wrong,simply,fade,...,susppension,tocchet,091139,16ba5da01,823,determination,determined,dubious,independently,omc,omniscient,parable,qpliu,thug,undesirable,eon,frown,grail,hubris,polygenetic,psychologically,schaeffer,appoint,embarrassment,halt,lex,lifer,miscarriage,miscarry,mischief,skyblu,talionis,veil,carrying,copper,futserv,keen,markz,romex,zenier
0,1.0,1.0,1.0,1.693147,1.693147,2.098612,1.0,2.94591,1.693147,2.098612,1.693147,1.0,2.098612,1.693147,1.0,1.0,1.0,1.0,1.0,1.0,1.693147,1.0,1.693147,1.0,1.0,2.609438,1.0,1.693147,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.693147,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,2.791759,0.0,0.0,0.0,0.0,2.098612,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.693147,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2.098612,0.0,0.0,0.0,2.609438,0.0,0.0,0.0,0.0,2.098612,0.0,0.0,0.0,0.0,1.693147,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.693147,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.693147,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.693147,3.639057,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.693147,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Inverse Document Frequency

In [14]:
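def calculate_inverse_document_frequency(column):
    # Number of documents in which the term appears at least once.
    document_frequency = column[column > 0].count()
    # Total number of documents in the search space.
    n = column.shape[0]
    return np.log(n / document_frequency)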

search_space_idf = search_space_vectorized.progress_apply(calculate_inverse_document_frequency)
search_space_idf.head()

100%|██████████| 34909/34909 [00:11<00:00, 3020.37it/s]


article    0.796288
93332      6.907755
hydra      4.828314
gatech     4.422849
edu        0.685179
dtype: float64
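
The cell above scores each term as $\ln(n / df)$, where $n$ is the number of documents and $df$ is the number of documents containing the term. The sketch below repeats that calculation on a tiny made-up count table; the function name `inverse_document_frequency` and the toy data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def inverse_document_frequency(counts):
    """Compute ln(n / df) for every column of a document-by-term count table."""
    n_documents = counts.shape[0]
    # A document counts once for a term no matter how often the term occurs.
    document_frequency = (counts > 0).sum(axis=0)
    return np.log(n_documents / document_frequency)

# Hypothetical counts: 'common' occurs in all 4 documents, 'rare' in only 1.
toy_counts = pd.DataFrame({
    'common': [1, 2, 1, 3],
    'rare': [0, 0, 5, 0],
})
toy_idf = inverse_document_frequency(toy_counts)
print(toy_idf)
```

A term present in every document gets an IDF of 0, so it cannot influence ranking. In the output above, `93332` scores 6.907755, which equals $\ln(1000)$ and is consistent with a term that appears in exactly one of the 1000 sampled documents.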

## TF-IDF

In [15]:
search_space_tf_idf = search_space_tf * search_space_idf
search_space_tf_idf.head()

Unnamed: 0,article,93332,hydra,gatech,edu,gt1091a,prism,kaan,timucin,write,hell,guy,david,davidian,think,talk,alter,ego,yo,better,shut,f,o,k,ok,go,come,like,attitute,lie,shit,united,states,refer,freedom,speech,prove,wrong,simply,fade,...,susppension,tocchet,091139,16ba5da01,823,determination,determined,dubious,independently,omc,omniscient,parable,qpliu,thug,undesirable,eon,frown,grail,hubris,polygenetic,psychologically,schaeffer,appoint,embarrassment,halt,lex,lifer,miscarriage,miscarry,mischief,skyblu,talionis,veil,carrying,copper,futserv,keen,markz,romex,zenier
0,0.796288,6.907755,4.828314,7.488534,1.160109,13.042053,4.50986,18.307677,10.522246,1.172319,5.492897,2.97593,5.492689,7.975622,1.287354,2.577022,5.115996,5.809143,5.115996,3.270169,7.975622,3.057608,4.077006,3.324236,3.218876,4.489197,1.766092,1.993944,6.214608,3.411248,5.298317,3.963316,4.199705,3.649659,4.199705,4.422849,3.352407,4.431467,3.611918,6.214608,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.685179,0.0,0.0,0.0,0.0,0.558616,0.0,2.97593,0.0,0.0,3.593984,0.0,0.0,0.0,0.0,6.862817,0.0,0.0,0.0,0.0,0.0,1.720369,1.766092,1.993944,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.352407,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.796288,0.0,0.0,0.0,0.685179,0.0,0.0,0.0,0.0,0.558616,0.0,0.0,0.0,0.0,1.287354,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.720369,1.766092,1.177655,0.0,0.0,0.0,0.0,0.0,3.649659,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.6711,0.0,0.0,0.0,1.787932,0.0,0.0,0.0,0.0,1.172319,0.0,0.0,0.0,0.0,2.17968,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.912839,1.766092,1.177655,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.431467,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.796288,0.0,0.0,0.0,0.685179,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.577022,0.0,0.0,0.0,0.0,0.0,0.0,4.077006,12.097087,0.0,0.0,0.0,1.177655,0.0,0.0,0.0,0.0,0.0,3.649659,0.0,0.0,5.676119,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
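
The multiplication in the cell above works because pandas aligns the IDF Series with the TF DataFrame by column label, so each term's column is scaled by that term's IDF. The sketch below shows this on made-up values; the names `toy_tf`, `toy_idf`, and `toy_tf_idf` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical log-scaled term frequencies for two documents.
toy_tf = pd.DataFrame({
    'edu': [1.0, 0.0],
    'hydra': [2.0, 1.0],
})

# Hypothetical IDF values, deliberately listed in a different order
# to show that alignment is by label, not by position.
toy_idf = pd.Series({'hydra': 3.0, 'edu': 0.5})

toy_tf_idf = toy_tf * toy_idf
print(toy_tf_idf)
```

The same arithmetic can be checked against the tables above: for `gatech` in row 0, a TF of 1.693147 times an IDF of 4.422849 gives approximately 7.488534.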


# Search Function

In [16]:
from sklearn.metrics.pairwise import cosine_similarity

def search(query, search_space, search_space_tf_idf, search_space_idf, top_n=5):
    # Preprocess query text.
    query_processed = preprocess_text(query, is_search_space = False)

    # Tokenize query and vectorize to get term frequency.
    # DataFrame.append was removed in pandas 2.0, so build the one-row
    # frame directly; cast to float so the log transform gets float counts.
    tokens = tokenize(query_processed)
    query_vectorized = pd.DataFrame([dict(FreqDist(tokens))]).astype(float)
    term_frequency = query_vectorized.apply(calculate_term_frequency)

    # Reformat term frequency so it matches the search space format:
    # same columns in the same order, with 0.0 for terms absent from the
    # query. Query terms outside the search-space vocabulary are dropped.
    query_tf = term_frequency.reindex(
        columns = search_space_tf_idf.columns,
        fill_value = 0.0
    )

    # Calculate query TF-IDF.
    query_tf_idf = query_tf * search_space_idf
    
    # Calculate cosine similarity between
    # query TF-IDF vector and search space TF-IDF vectors
    similarity_matrix = cosine_similarity(query_tf_idf, search_space_tf_idf)[0]
    sorted_idxs = np.argsort(similarity_matrix)[::-1][:top_n]
    
    return search_space.iloc[sorted_idxs, :].reset_index(drop = True)
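
The ranking step above delegates to scikit-learn's `cosine_similarity`. In keeping with the from-scratch goal, the sketch below shows one way the same score could be computed with NumPy alone. The function name `cosine_similarity_from_scratch`, the zero-vector handling, and the toy vectors are assumptions for illustration, not the notebook's method.

```python
import numpy as np

def cosine_similarity_from_scratch(query_vector, document_matrix):
    """Cosine similarity between one query vector and each row of a matrix."""
    query_vector = np.asarray(query_vector, dtype=float)
    document_matrix = np.asarray(document_matrix, dtype=float)

    dot_products = document_matrix @ query_vector
    norms = np.linalg.norm(document_matrix, axis=1) * np.linalg.norm(query_vector)

    # Assumption: a zero vector has no direction, so score it as 0
    # instead of dividing by zero.
    similarities = np.zeros_like(dot_products)
    nonzero = norms > 0
    similarities[nonzero] = dot_products[nonzero] / norms[nonzero]
    return similarities

# Hypothetical TF-IDF rows for four documents over a three-term vocabulary.
toy_documents = np.array([
    [1.0, 0.0, 2.0],  # same direction as the query
    [0.0, 3.0, 0.0],  # shares no terms with the query
    [1.0, 1.0, 0.0],  # shares one term with the query
    [0.0, 0.0, 0.0],  # empty document
])
toy_query = np.array([1.0, 0.0, 2.0])

scores = cosine_similarity_from_scratch(toy_query, toy_documents)
top_indices = np.argsort(scores)[::-1][:2]
print(scores, top_indices)
```

Because cosine similarity depends only on direction, a long document is not favoured over a short one merely for containing more words.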

# Search Function Testing

In [17]:
search_results = search(
    'Hello World!', 
    newsgroups_data_sample[['text']], 
    search_space_tf_idf, 
    search_space_idf,
    top_n = 3
)

print('TOP RESULT'.center(50, '='))
print(f'{search_results.iloc[0].values[0]}')
print('=' * 50)
print()
print('SECOND RESULT'.center(50, '='))
print(f'{search_results.iloc[1].values[0]}')
print('=' * 50)
print()
print('THIRD RESULT'.center(50, '='))
print(f'{search_results.iloc[2].values[0]}')
print('=' * 50)
print()

From: dunguyen@ecs.umass.edu
Subject: Hayes 9600 external AC pins???
Lines: 7

Hello, 
I have a Hayes 9600 moden with no cables or manuals.  The
modem requires a source of 14V AC, but I do not know how
to connect the power source to the 3 pin connector.  I know
that the top pin is the ground, so I would guess that the other
two are the AC pins, right?  If you have any hints, please
E-Mail me, I really need help...  Thanks!!!  Duc N.


From: spring@diku.dk (Jesper Honig Spring)
Subject: 486/66DX2 (ISA) vs. 486/50DX2 (EISA)
Organization: Department of Computer Science, U of Copenhagen
Lines: 18


Hello,

Can anyone give me their opinion on which system has got the best overall
system performance;

486/66DX2 with ISA-BUS or
486/50DX2 with EISA-BUS

The systems are equal in all other areas.

Thanks in advance

-- 
-------------------------------------------------------------------------------
jesper honig spring, spring@diku.dk |        IF ANIMALS BELIEVED IN GOD       
university of copen

In [19]:
search_results = search(
    'Education is important.', 
    newsgroups_data_sample[['text']], 
    search_space_tf_idf, 
    search_space_idf,
    top_n = 3
)

print('TOP RESULT'.center(50, '='))
print(f'{search_results.iloc[0].values[0]}')
print('=' * 50)
print()
print('SECOND RESULT'.center(50, '='))
print(f'{search_results.iloc[1].values[0]}')
print('=' * 50)
print()
print('THIRD RESULT'.center(50, '='))
print(f'{search_results.iloc[2].values[0]}')
print('=' * 50)
print()

From: kadie@cs.uiuc.edu (Carl M Kadie)
Subject: Re: Organized Lobbying for Cryptography
Organization: University of Illinois, Dept. of Comp. Sci., Urbana, IL
Lines: 18

kubo@zariski.harvard.edu (Tal Kubo) writes:

[...]
>The EFF has been associated with efforts to prevent the banning of sex
>and pictures newsgroups at various universities.
[...]

So what? Justices William Brennan, Thurgood Marshall, John Paul
Stevens, and Byron White are associated with a plurality Supreme Court
decision that prevented the removal of "anti-American, anti-Christian,
anti-Semitic, and just plain filthy" books from a public high school
library [_Board of Education v. Pico_ (1982)]. Does this mean that
they could no longer defend free expression and privacy?

- Carl
-- 
Carl Kadie -- I do not represent any organization; this is just me.
 = kadie@cs.uiuc.edu =


From: arp@cooper!osd (Andrew Pinkowitz)
Subject: SIGGRAPH -- Conference on Understanding Images
Keywords: graphics animation nyc acm siggraph
Organ