The purpose of this notebook is to test the idea that a simple LSA model could produce better keyword search results than semantic search.

Semantic search uses a language model to capture the meaning of text.  Blackwing searches, however, typically contain only a handful of individual words, which limits the power of a semantic match.  A bag-of-words (BoW) model might fare better.  If document vectors are created through matrix factorization of a TF-IDF (BoW) matrix, they can be passed to FAISS instead of document embeddings from a language model.  The index is built as usual.  The search query vector is then built by mapping each query word to its term vector in the factorized (LSA) space and averaging those vectors into a single n-dimensional vector, where n matches the dimensionality of the document vectors.  FAISS operates normally and returns results.  

The question is whether or not a FAISS search backed by LSA could outperform a FAISS search backed by a language model, if the search is carried out with a few (1-5) words instead of an entire document.  
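
The idea can be sketched end to end on a toy corpus before bringing in the real data. The following is a minimal illustration, not the notebook's pipeline: it assumes raw term counts in place of TF-IDF, a full NumPy SVD in place of `TruncatedSVD`, and a brute-force distance computation in place of FAISS. The corpus and the helper name `lsa_query_vector` are made up for the example.

```python
import numpy as np

# Toy corpus (hypothetical): two "dog" documents and two "car" documents.
docs = [
    "dog food bowl",
    "dog leash walk",
    "engine oil car",
    "car engine repair",
]

# Build a vocabulary and a document-term count matrix (rows = documents).
vocab = sorted({word for doc in docs for word in doc.split()})
word_to_col = {word: col for col, word in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for row, doc in enumerate(docs):
    for word in doc.split():
        X[row, word_to_col[word]] += 1

# Factorize: X ~= U * S * Vt, keeping the top k components.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = U[:, :k] * S[:k]   # one k-dimensional vector per document
term_vectors = Vt[:k, :]         # one k-dimensional column per vocabulary word


def lsa_query_vector(search_terms):
    """Average the term vectors of the query words into one k-dim vector."""
    cols = [word_to_col[t] for t in search_terms]
    return term_vectors[:, cols].mean(axis=1)


# Rank documents by squared L2 distance to the query vector.
query = lsa_query_vector(["dog", "food"])
distances = ((doc_vectors - query) ** 2).sum(axis=1)
ranking = np.argsort(distances)
print(ranking, distances.round(3))
```

In this toy setting the two "dog" documents rank ahead of the two "car" documents, which is the behavior the notebook hopes to see at scale.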

In [1]:
import faiss
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow_hub as hub
from time import time
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

In [2]:
class USEEmbeddingModel:
    def __init__(self):
        """
        Universal Sentence Encoder (USE) is a state of the art semantic similarity model.
        It is preferable to BERT embeddings for semantic similarity because:
            * It was trained specifically for detecting semantic similarity with sentence pairs
            * It has a greater range of values for the embedding dimensions than BERT, allowing it
              to better separate close matches in the embedding space (0.5 - 0.8 vs 0.79 - 0.87 for BERT)
        This class pulls the pre-trained USE model from TensorFlow Hub, then uses it to
        create document level embeddings.  Note that USE produces 512-dimensional embeddings;
        this is the size of the output vector, not a limit on the number of input tokens.
        """
        self.model_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
        self.model = None

    def _load_model(self):
        print(f"Model {self.model_url} loading")
        self.model = hub.load(self.model_url)
        print(f"Model {self.model_url} loaded")

    @staticmethod
    def batch(iterable, batch_size=1):
        """
        Yields successive batches of size batch_size; the last batch may be smaller

        Example usage:
            data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # list of data

            for x in batch(data, 3):
                print(x)

            # Output

            [0, 1, 2]
            [3, 4, 5]
            [6, 7, 8]
            [9, 10]
        """
        iterable_len = len(iterable)
        for ndx in range(0, iterable_len, batch_size):
            yield iterable[ndx:min(ndx + batch_size, iterable_len)]

    def get_embeddings(self, text_input, batch_size=256):
        """
        Runs text through the model and produces the embeddings

        :param text_input: a list where each item is a document (here, a newsgroup post)
        :param batch_size: integer representing how many samples to include in a batch
        """
        self._load_model()
        embeddings = []
        # helper variables to track progress
        nbr_batches = int(np.ceil(len(text_input) / batch_size))
        current_batch = 1

        for batch_indices in self.batch(iterable=range(len(text_input)), batch_size=batch_size):
            progress = round(100 * current_batch / nbr_batches, 2)
            if progress % 10 == 0:
                print(f"Embedding progress: {progress}%")
                print(progress)

            # grab the records for this batch
            batch_records = [text_input[idx] for idx in batch_indices]

            # forward pass over the input
            model_output = self.model(batch_records)

            # save the embeddings
            embeddings.append(model_output.numpy())

            current_batch += 1

        # stack the per-batch arrays into a single (n_documents, 512) numpy array
        embeddings = np.vstack(embeddings)

        return embeddings

In [3]:
def get_term_vector(term: str, terms: list, components: np.ndarray):
    """Gets the n-dimensional LSA vector for a given word (one column of the SVD components).

    Raises ValueError if the word is not in the vocabulary.
    """
    return components[:, terms.index(term)]
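
Because `list.index` scans the vocabulary on every call and raises `ValueError` for unknown words, a dictionary lookup that skips out-of-vocabulary terms may be more convenient for free-form queries. The sketch below is one possible variant, not the notebook's method; the names `build_term_index` and `get_known_term_vectors` and the toy data are hypothetical.

```python
import numpy as np


def build_term_index(terms):
    """Map each vocabulary term to its column index for O(1) lookups."""
    return {term: idx for idx, term in enumerate(terms)}


def get_known_term_vectors(search_terms, term_index, components):
    """Return one row per in-vocabulary search term, or None if none are known."""
    cols = [term_index[t] for t in search_terms if t in term_index]
    if not cols:
        return None
    return components[:, cols].T


# Stand-in data: 2 components over a 3-word vocabulary.
toy_terms = ["cat", "dog", "food"]
toy_components = np.arange(6, dtype=float).reshape(2, 3)
toy_index = build_term_index(toy_terms)

known = get_known_term_vectors(["dog", "zebra"], toy_index, toy_components)
print(known)  # only "dog" is in the vocabulary, so one row comes back
```

Returning `None` when no term is known leaves it to the caller to decide whether to show an empty result or fall back to another search.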

In [4]:
news = fetch_20newsgroups()  # optionally pass remove=('headers', 'footers', 'quotes') to strip metadata
news_df = pd.DataFrame(news.data)
news_df.columns = ["text"]
news_df = news_df.reset_index(drop=False).rename(columns={"index": "doc_id"})

# Embed Documents with TF-IDF and LSA

In [5]:
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(news.data)
svd_model = TruncatedSVD(
    n_components=512,  # recommended 100 for LSA, but using 512 to match the USE embedding size
    random_state=14,
    n_iter=10,  # default is 5 for randomized algorithm
    algorithm='randomized'
)
# fit_transform fits the model and returns the document vectors in one step
doc_topic_matrix = svd_model.fit_transform(vectors)

In [6]:
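# get_feature_names() was removed in newer scikit-learn releases; use get_feature_names_out()
# and convert to a list so that terms.index(...) keeps working
terms = vectorizer.get_feature_names_out().tolist()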

# Embed Documents with USE

In [7]:
model = USEEmbeddingModel()
embeddings = model.get_embeddings(
    text_input=news.data,
    batch_size=64
)

Model https://tfhub.dev/google/universal-sentence-encoder/4 loading




Model https://tfhub.dev/google/universal-sentence-encoder/4 loaded
Embedding progress: 100.0%
100.0


# Build FAISS Indices

In [8]:
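NLIST = 10  # number of IVF cells (coarse clusters)
NPROBE = 100  # cells visited per query; this is not smaller than NLIST, so all cells are searched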

In [9]:
nbr_documents, lsa_dim = doc_topic_matrix.shape
nbr_embeddings, embed_dim = embeddings.shape
assert nbr_documents == nbr_embeddings

In [10]:
# flat (exact) L2 indices serve as the coarse quantizers that assign vectors to IVF cells
quantizer_lsa = faiss.IndexFlatL2(lsa_dim)
quantizer_use = faiss.IndexFlatL2(embed_dim)

In [11]:
ivf_flat_index_lsa = faiss.IndexIVFFlat(quantizer_lsa, lsa_dim, NLIST)
ivf_flat_index_lsa.nprobe = NPROBE
ivf_flat_index_lsa.train(doc_topic_matrix.astype(np.float32))
ivf_flat_index_lsa.add_with_ids(doc_topic_matrix.astype(np.float32), news_df.doc_id.values)

ivf_flat_index_use = faiss.IndexIVFFlat(quantizer_use, embed_dim, NLIST)
ivf_flat_index_use.nprobe = NPROBE
ivf_flat_index_use.train(embeddings.astype(np.float32))
ivf_flat_index_use.add_with_ids(embeddings.astype(np.float32), news_df.doc_id.values)
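
With about 11,000 documents, an exact brute-force search is cheap enough to serve as a reference for the FAISS results. The sketch below is a plain NumPy baseline under stated assumptions, not part of the FAISS API; the function name `exact_l2_search` and the random stand-in data are hypothetical. It returns squared L2 distances, which is also what FAISS L2 indices report, so the two can be compared directly.

```python
import numpy as np


def exact_l2_search(database, queries, k):
    """Return (distances, indices) of the k nearest rows for each query.

    Distances are squared L2.
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    q_norms = (queries ** 2).sum(axis=1, keepdims=True)
    x_norms = (database ** 2).sum(axis=1)
    sq_dists = q_norms - 2.0 * queries @ database.T + x_norms
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    order = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order


# Stand-in data in place of doc_topic_matrix / embeddings.
rng = np.random.default_rng(14)
toy_database = rng.normal(size=(100, 8)).astype(np.float32)
toy_queries = toy_database[:3] + 0.01

toy_distances, toy_ids = exact_l2_search(toy_database, toy_queries, k=5)
print(toy_ids[:, 0])  # each query should be closest to the row it was derived from
```

If the FAISS index and this baseline disagree on the nearest documents for the same query vector, the index parameters are the first place to look.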

In [12]:
# create the query vectors for sample search terms
search_terms = ["dog", "food"]

query_vector_lsa = np.mean(
    [
        get_term_vector(term=term, terms=terms, components=svd_model.components_) 
        for term in search_terms
    ], 
    axis=0
).reshape(1, -1)
print(query_vector_lsa.shape)

query_vector_use = model.get_embeddings(
    text_input=[" ".join(search_terms)],
    batch_size=1
)
print(query_vector_use.shape)

(1, 512)
Model https://tfhub.dev/google/universal-sentence-encoder/4 loading
Model https://tfhub.dev/google/universal-sentence-encoder/4 loaded
Embedding progress: 100.0%
100.0
(1, 512)
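
Averaging term vectors is equivalent to projecting a bag-of-words query that puts equal weight on each search term. The short check below illustrates this with a random stand-in for `svd_model.components_`; the array and the term indices are made up. As an observation rather than a tested result, the averaged query vector may not share the same scale as the rows of `doc_topic_matrix`, so the sketch also includes an optional row normalization (the helper `unit_rows` is hypothetical). On unit vectors, L2 distance ranks results the same way as cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components, vocab_size = 4, 6
# Stand-in for svd_model.components_ (shape: n_components x vocabulary size).
toy_components = rng.normal(size=(n_components, vocab_size))
query_term_ids = [1, 4]  # hypothetical vocabulary indices of the search terms

# Approach used in the notebook: average the term columns.
mean_vector = toy_components[:, query_term_ids].mean(axis=1)

# Equivalent view: project a uniform-weight bag-of-words vector.
bow_query = np.zeros(vocab_size)
bow_query[query_term_ids] = 1.0 / len(query_term_ids)
projected_vector = bow_query @ toy_components.T

print(np.allclose(mean_vector, projected_vector))


def unit_rows(matrix):
    """Scale each row to unit L2 norm (rows of all zeros are left unchanged)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)


normalized_query = unit_rows(mean_vector.reshape(1, -1))
```

If normalization is used, it would need to be applied to the document vectors before they are added to the index as well as to the query.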


Perform range search to find similar documents within the given distance threshold

In [13]:
DISTANCE_THRESHOLD = 10000  # make this large enough so that everything is within range

In [14]:
ivf_flat_limits_lsa, ivf_flat_distances_lsa, ivf_flat_indices_lsa = ivf_flat_index_lsa.range_search(
    query_vector_lsa.astype(np.float32), 
    DISTANCE_THRESHOLD
)

ivf_flat_limits_use, ivf_flat_distances_use, ivf_flat_indices_use = ivf_flat_index_use.range_search(
    query_vector_use.astype(np.float32), 
    DISTANCE_THRESHOLD
)
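
`range_search` returns three flat arrays: the limits, the distances, and the ids. For query `i`, its matches occupy positions `lims[i]` up to `lims[i + 1]` in the other two arrays. With a single query, as here, the whole arrays belong to that query, but with several queries they need to be split. A minimal sketch with made-up arrays (the helper name `split_range_results` is hypothetical):

```python
import numpy as np


def split_range_results(lims, distances, ids):
    """Split flat range-search output into one (ids, distances) pair per query."""
    return [
        (ids[lims[i]:lims[i + 1]], distances[lims[i]:lims[i + 1]])
        for i in range(len(lims) - 1)
    ]


# Made-up output for two queries: the first matched 2 documents, the second 3.
toy_lims = np.array([0, 2, 5])
toy_distances = np.array([0.1, 0.4, 0.2, 0.3, 0.9], dtype=np.float32)
toy_ids = np.array([7, 3, 5, 8, 1])

per_query = split_range_results(toy_lims, toy_distances, toy_ids)
for query_number, (match_ids, match_distances) in enumerate(per_query):
    print(query_number, match_ids, match_distances)
```

The matches within each slice are not guaranteed to be sorted by distance, which is why the notebook sorts the resulting DataFrame afterwards.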

In [15]:
ivf_flat_distances_lsa = ivf_flat_distances_lsa.tolist()
ivf_flat_indices_lsa = ivf_flat_indices_lsa.tolist()
results_lsa = pd.DataFrame(
    {"doc_id": ivf_flat_indices_lsa, "distance": ivf_flat_distances_lsa}
)
results_lsa = pd.merge(left=results_lsa, right=news_df, how='left', on='doc_id')
results_lsa.sort_values("distance", axis=0, ascending=True, inplace=True)

ivf_flat_distances_use = ivf_flat_distances_use.tolist()
ivf_flat_indices_use = ivf_flat_indices_use.tolist()
results_use = pd.DataFrame(
    {"doc_id": ivf_flat_indices_use, "distance": ivf_flat_distances_use}
)
results_use = pd.merge(left=results_use, right=news_df, how='left', on='doc_id')
results_use.sort_values("distance", axis=0, ascending=True, inplace=True)

In [16]:
pd.set_option('max_colwidth', 800)

In [17]:
results_lsa.head(10)

Unnamed: 0,doc_id,distance,text
4573,10963,0.178442,"From: james@dlss2 (James Cummings)\nSubject: Re: More Cool BMP files??\nOrganization: RedRock Development\nDistribution: usa\nLines: 1021\n\nIn article <1993Apr17.023017.17301@gmuvax2.gmu.edu> rwang@gmuvax2.gmu.edu (John Wang) writes:\n |Hi, everybody:\n | I guess my subject has said it all. It is getting boring\n |looking at those same old bmp files that came with Windows. So,\n |I am wondering if there is any body has some beautiful bmp file\n |I can share. Or maybe somebody can tell me some ftp site for\n |some bmp files, like some scenery files, some animals files,\n |etc.... I used to have some, unfortunately i delete them all.\n |\n |Anyway could me give me some help, please???\n |\n\n\tIn response to a ""different"" kinda wallpaper, here's what I\nuse. I think the original..."
250,565,0.178707,"From: spl@dim.ucsd.edu (Steve Lamont)\nSubject: Re: Finding equally spaced points on a sphere.\nOrganization: University of Calif., San Diego/Microscopy and Imaging Resource\nLines: 326\nNNTP-Posting-Host: dim.ucsd.edu\n\nIn article <4615trd@rpi.edu> deweeset@ptolemy2.rdrc.rpi.edu (Thomas E. DeWeese) writes:\n> Hello, I know that this has been discussed before. But at the time\n>I didn't need to teselate a sphere. So if any kind soul has the code\n>or the alg, that was finally decided upon as the best (as I recall it\n>was a nice, iterative subdivision meathod), I would be very \n>appreciative.\n\nHere is one by Andrew ""Graphics Gems"" Glassner that I got from a\ncollegue of mine. I think I fiddled with it a little bit to make it\ndeal with whatever bizarre problem I was working on ..."
4524,10852,0.182106,"From: boylan@sltg04.ljo.dec.com (Steve Boylan)\nSubject: Re: Christian Daemons? [Biblical Demons, the update]\nReply-To: boylan@ljohub.enet.dec.com (Steve Boylan)\nOrganization: Digital Equipment Corporation\nLines: 61\n\n\nIn article <1993Apr1.024850.20111@sradzy.uucp>, radzy@sradzy.uucp\n(T.O. Radzykewycz) writes:\n\n> >>swaim@owlnet.rice.edu (Michael Parks Swaim) writes:\n> >>> 666, the file permission of the beast.\n> \n> >radzy@sradzy.uucp (T.O. Radzykewycz) writes:\n> >> Sorry, but the file permission of the beast is 600.\n> >> \n> >> And the file permission of the home directory of the\n> >> beast is 700.\n> \n> boylan@sltg04.ljo.dec.com (Steve Boylan) writes:\n> >Hey, radzy, it must depend on your system's access policy.\n> >I get:\n> >\t$ ls -lg /usr/users\n> >\ttotal 3\n> >\..."
4689,11248,0.183304,"From: mgengelb@cs.ruu.nl (Marcel Engelbertink)\nSubject: NO MORE ROLEX-IMITATIONS\nOrganization: Utrecht University, Dept. of Computer Science\nLines: 28\n\nJammer !\n\n Dit is geen fantastische advertentie over nep-rolexen\n maar een evenzo duidelijke mededeling hieromtrent :\n\n Aangezien het alleen al aanbieden van deze horloges onder\n vermelding van de echte merknaam niet geheel correct is,\n wil ik met dit bericht duidelijk maken dat ik, Marcel Engelbertink,\n niet meer zal adverteren met imitatie-horloges van het merk ROLEX.\n\n Enig persoon die hierin geiinteresseerd is kan ik jammer genoeg ook niet\n meer helpen.\n\n\n\n\n For all the foreign people who can't even understand dutch ?!? :\n\n In spite of earlier mailing about fake-rolex's, I announce that I\n don't have any info..."
2546,6110,0.183631,"From: heathman@ncsa.uiuc.edu (Michael Heathman)\nSubject: Re: dogs\nOriginator: heathman@troon.ncsa.uiuc.edu\nOrganization: Nat'l Ctr for Supercomp App (NCSA) @ University of Illinois\nLines: 31\n\nIn article <93Apr20.193958.30419@acs.ucalgary.ca> parr@acs.ucalgary.ca (Charles Parr) writes:\n>\n>What, a dog weighs 150lb maybe, at max? You can't handle it?\n>\n>You have, I presume, thumbs? Grapple with it and tear it's head\n>off!\n>\n>Sheesh, even a trained attack dog is no match for a human,\n>we have *all* the advantages.\n>\n>Regards, Charles\n>DoD0.001\n>RZ350\n>-- \n\n\tProfessionals who train guard dogs, when polled, gave themselves a\n1 in 4 chance of survival tackling a trained dog unarmed. A trained guard\ndog is not to be trifled with. An untrained mutt may be another story..."
1445,3459,0.183774,"From: jessea@u013.me.vp.com (Jesse W. Asher)\nSubject: X version of whois??\nOrganization: Varco-Pruden Buildings\nLines: 8\n\nHas an X version of whois been written out there? If so, where can I ftp it\nfrom? Thanks.\n\n-- \n Jesse W. Asher (901)762-6000\n Varco-Pruden Buildings\n 6000 Poplar Ave., Suite 400, Memphis, TN 38119\n Internet: jessea@vpbuild.vp.com UUCP: vpbuild!jessea\n"
3128,7454,0.192404,"From: lusardi@cs.buffalo.edu (Christopher Lusardi)\nSubject: Program Included: 2 Edge Detection Algorithms!\nArticle-I.D.: acsu.C5JqM6.HLG\nOrganization: State University of New York at Buffalo/Comp Sci\nLines: 142\nNntp-Posting-Host: hadar.cs.buffalo.edu\n\n/*\n\nThis program doesn't detect edges with compass operators and a laplacian\noperator. It should output 2 raw grey-scale images with edges. The output\ndoesn't look like edges at all.\n\nIn novicee terms, how do I correct the errors? Any improvements are welcome.\n(I'll even accept your corrected code.)\n\n(If I convolve the INPUT.IMAGE with a digital gaussian [7 by 7] to remove\nnoise, will I get an improvement with the laplacian.)\n\n--------------------------2 types of edge detection-------------------------*/\n#include <stdi..."
2087,4975,0.192555,"From: montasmm@ntmtv.com (Medi Montaseri)\nSubject: Saddle bags and helmets for sale...\nOriginator: montasmm@nmtvs299\nNntp-Posting-Host: nmtvs299\nReply-To: montasmm@ntmtv.com (Medi Montaseri)\nOrganization: Northern Telecom Inc, Mountain View, CA\nDistribution: ba\nLines: 28\n\nI'm selling the following items...\n\n\t- a pair of hard saddle bags \n\t- easy installation \n\t- snap release feature with lock\n\t- black \n\t- brand is Krusures\n\n\t- two oshi full face helmets\n\n\ttake all for $275\n\nThese are comming off of my bike that I'm selling, maybe \nyou could use the whole thing, bike and accessories.\n\n\t1983 Yamaha, vision 550 \n\n\tcall Medi @ work (415) 940-2306\n\t\t home (408) 744-1169\n\nThanks\n\n\n-- \n+-------------------------------------------------------+\n| ..."
1910,4578,0.192858,From: Center for Policy Research <cpr@igc.apc.org>\nSubject: Poem by Erich Fried\nNf-ID: #N:cdp:1483500363:000:1387\nNf-From: cdp.UUCP!cpr Apr 25 05:29:00 1993\nLines: 46\n\n\nFrom: Center for Policy Research <cpr>\nSubject: Poem by Erich Fried \n\n\nPoem by German-Jewish poet Erich Fried (Holocaust survivor)\n\nEin Jude an die zionistischen Kaempfer - 1988\n\n von Erich Fried\n\nWas wollt ihr eigentlich ? Wollt ihr wirklich die uebertreffen\ndie euch niedergetreten haben vor einem Menschenalter in euer\neigenes Blut und in euren eigenen Kot ?\n\n\t *\n\nWollt ihr die alten Foltern jetzt an andere weitergeben mit allen\nblutigen dreckigen Einzelheiten mit allem brutalen Genuss die\nFolterknechte wie unsere Vaeter sie damals erlitten haben ?\n\n *\n\nWollt jetzt wirklich ih...
5755,2877,0.193132,From: jjd1@cbnewsg.cb.att.com (james.j.dutton)\nSubject: Re: bikes with big dogs\nOrganization: AT&T\nDistribution: na\nLines: 20\n\nIn article <1993Apr14.234835.1@cua.edu> 84wendel@cua.edu writes:\n>Has anyone ever heard of a rider giving a big dog such as a great dane a ride \n>on the back of his bike. My dog would love it if I could ever make it work.\n>\tThanks\n>\t\t\t84wendel@cua.edu\n \n If a large Malmute counts then yes someone has heard(and seen) such\nan irresponsible childish stunt. The dog needed assistance straightening\nout once on board. The owner would lift the front legs of dog and throw\nthem over the driver/pilots shoulders. Said dog would get shit eating\ngrin on its face and away they'd go. The dogs ass was firmly planted\non the seat.\n \n My dog and this dog ac...


In [18]:
results_use.head(10)

Unnamed: 0,doc_id,distance,text
381,4378,1.398383,"From: twain@carson.u.washington.edu (Barbara Hlavin)\nSubject: Re: Is MSG sensitivity superstition?\nArticle-I.D.: shelley.1qvq10INNlij\nDistribution: na\nOrganization: University of Washington, Seattle\nLines: 38\nNNTP-Posting-Host: carson.u.washington.edu\n\nIn article <1993Apr19.204855.10818@rtsg.mot.com> lundby@rtsg.mot.com (Walter F. Lundby) writes:\n>As nobody in the food industry has even bothered to address my previous\n>question ""WHY DO YOU NEED TO PUT MSG IN ALMOST EVERY FOOD?"" I must assume\n>that my wife's answer is closer to the truth than I hoped it was.\nI don't mean to be disrespectful to your concerns, but it seems to me \nthat you're getting all wound up in a non-issue. \n\nAs many knowledgeable people have pointed out, msg is a naturally \noccurring substance in a l..."
719,8331,1.459391,"From: lundby@rtsg.mot.com (Walter F. Lundby)\nSubject: Re: Is MSG sensitivity superstition?\nNntp-Posting-Host: accord2\nOrganization: Motorola Inc., Cellular Infrastructure Group\nDistribution: na\nLines: 29\n\nAs nobody in the food industry has even bothered to address my previous\nquestion ""WHY DO YOU NEED TO PUT MSG IN ALMOST EVERY FOOD?"" I must assume\nthat my wife's answer is closer to the truth than I hoped it was.\n\nShe believes that MSG is added to food to cause people to eat more of it\nand not quit when they shoud be sated. To put it a different way, she \nbelieves that for some people MSG causes them to act toward food like an addict. \n(Eat all the chips, chow down on several packages of noodle soup .... you get the\nidea! } IF she is right, then the moral and ethical ..."
869,10026,1.493861,"From: healta@saturn.wwc.edu (Tammy R Healy)\nSubject: Re: note to Bobby M.\nLines: 52\nOrganization: Walla Walla College\nLines: 52\n\nIn article <1993Apr14.190904.21222@daffy.cs.wisc.edu> mccullou@snake2.cs.wisc.edu (Mark McCullough) writes:\n>From: mccullou@snake2.cs.wisc.edu (Mark McCullough)\n>Subject: Re: note to Bobby M.\n>Date: Wed, 14 Apr 1993 19:09:04 GMT\n>In article <1993Apr14.131548.15938@monu6.cc.monash.edu.au> darice@yoyo.cc.monash.edu.au (Fred Rice) writes:\n>>In <madhausC5CKIp.21H@netcom.com> madhaus@netcom.com (Maddi Hausmann) writes:\n>>\n>>>Mark, how much do you *REALLY* know about vegetarian diets?\n>>>The problem is not ""some"" B-vitamins, it's balancing proteins. \n>>>There is also one vitamin that cannot be obtained from non-animal\n>>>products, and this is only ..."
810,9407,1.502973,"From: klier@iscsvax.uni.edu\nSubject: Re: Modified sense of taste in Cancer pt?\nOrganization: University of Northern Iowa\nLines: 16\n\nIn article <1993Apr21.134848.19017@peavax.mlo.dec.com>, lunger@helix.enet.dec.com (Dave Lunger) writes:\n> \n> What does a lack of taste of foods, or a sense of taste that seems ""off""\n> when eating foods in someone who has cancer mean? What are the possible\n> causes of this? Why does it happen?\n\nI can't answer most of your questions, but I've seen it happen in \nfamily members who are being treated with radiation and/or chemotherapy.\nJory Graham published a cookbook many years ago (in cooperation with \nthe American Cancer Society, I think) called ""Something has to taste\ngood"" (as I recall).\n\nThe cookbook was just what we needed several times ..."
1154,1838,1.511323,"From: gnome@pd.org (Mike Mitten)\nSubject: Re: What is it with Cats and Dogs ???!\nOrganization: The Laughing Gnome Software Farm, Atlanta, GA, USA\nLines: 13\nNNTP-Posting-Host: noel.pd.org\nX-Newsreader: TIN [version 1.1 PL6]\n\njames.bessette (jimbes@cbnewsj.cb.att.com) wrote:\n>In article <6130328@hplsla.hp.com> kens@hplsla.hp.com (Ken Snyder) writes:\n>>ps. I also heard from a dog breeder that the chains of bicycles and\n>>motorcycles produced high frequency squeaks that dogs loved to chase.\n>Ask the breeder why they also chase BMWs also.\n\nCam chain.\n\n -Mike\n\nMike Mitten - gnome@pd.org - ...!emory!pd.org!gnome - AMA#675197 - DoD#522\nIrony is the spice of life. '90 Bianchi Backstreet '82 Suzuki GS850GL\n""The revolution will not be televised.""\n"
533,6162,1.513969,"From: julie@eddie.jpl.nasa.gov (Julie Kangas)\nSubject: Re: Is MSG sensitivity superstition?\nNntp-Posting-Host: eddie.jpl.nasa.gov\nOrganization: Jet Propulsion Laboratory, Pasadena, CA\nLines: 34\n\nIn article <michael.735318247@vislab.me.iastate.edu> michael@iastate.edu (Michael M. Huang) writes:\n>MSG is common in many food we eat, including Chinese (though some oriental\n>restaurants might put a tad too much in them). I've noticed that when I\n>go out and eat in most of the Chinese food restaurants, I will usually get\n>a slight headache and an ununsual thirst afterwards. This happens to many\n>of my friends and relatives too. And, heh, we eat Chinese food all the\n>time at home :) (but we don't use MSG when we're cooking for ourselves)\n>\n>So, when we put one and one together..."
534,6163,1.518991,"From: lundby@rtsg.mot.com (Walter F. Lundby)\nSubject: Re: Is MSG sensitivity superstition?\nNntp-Posting-Host: accord2\nOrganization: Motorola Inc., Cellular Infrastructure Group\nLines: 23\n\nIn article <1993Apr20.173019.11903@llyene.jpl.nasa.gov> julie@eddie.jpl.nasa.gov (Julie Kangas) writes:\n>\n>As for how foods taste: If I'm not allergic to MSG and I like\n>the taste of it, why shouldn't I use it? Saying I shouldn't use\n>it is like saying I shouldn't eat spicy food because my neighbor\n>has an ulcer.\n>\n Nobody is saying that you shouldn't be allowed to use msg. Just\ndon't force it on others. If you have food that you want to \nenhance with msg just put the MSG on the table like salt. It is\nthen the option of the eater to use it. If you make a commerical\nproduct, just ..."
215,2575,1.519803,"From: michael@iastate.edu (Michael M. Huang)\nSubject: Re: Is MSG sensitivity superstition?\nOrganization: Iowa State University, Ames IA\nLines: 21\n\nMSG is common in many food we eat, including Chinese (though some oriental\nrestaurants might put a tad too much in them). I've noticed that when I\ngo out and eat in most of the Chinese food restaurants, I will usually get\na slight headache and an ununsual thirst afterwards. This happens to many\nof my friends and relatives too. And, heh, we eat Chinese food all the\ntime at home :) (but we don't use MSG when we're cooking for ourselves)\n\nSo, when we put one and one together, it can be safely assumed that\nMSG may cause some allergic reactions in some people.\n\nStick with natural things. MSG doesn't do body any good (and possib..."
1257,2877,1.527354,From: jjd1@cbnewsg.cb.att.com (james.j.dutton)\nSubject: Re: bikes with big dogs\nOrganization: AT&T\nDistribution: na\nLines: 20\n\nIn article <1993Apr14.234835.1@cua.edu> 84wendel@cua.edu writes:\n>Has anyone ever heard of a rider giving a big dog such as a great dane a ride \n>on the back of his bike. My dog would love it if I could ever make it work.\n>\tThanks\n>\t\t\t84wendel@cua.edu\n \n If a large Malmute counts then yes someone has heard(and seen) such\nan irresponsible childish stunt. The dog needed assistance straightening\nout once on board. The owner would lift the front legs of dog and throw\nthem over the driver/pilots shoulders. Said dog would get shit eating\ngrin on its face and away they'd go. The dogs ass was firmly planted\non the seat.\n \n My dog and this dog ac...
460,5204,1.531672,"From: paulson@tab00.larc.nasa.gov (Sharon Paulson)\nSubject: Re: food-related seizures?\nOrganization: NASA Langley Research Center, Hampton VA, USA\nLines: 53\nNNTP-Posting-Host: cmb00.larc.nasa.gov\nIn-reply-to: dozonoff@bu.edu's message of 21 Apr 93 16:18:19 GMT\n\nIn article <116305@bu.edu> dozonoff@bu.edu (david ozonoff) writes:\n\n Path: news.larc.nasa.gov!darwin.sura.net!zaphod.mps.ohio-state.edu!uwm.edu!linac!att!bu.edu!dozonoff\n From: dozonoff@bu.edu (david ozonoff)\n Newsgroups: sci.med\n Date: 21 Apr 93 16:18:19 GMT\n References: <PAULSON.93Apr19081647@cmb00.larc.nasa.gov>\n Sender: news@bu.edu\n Lines: 22\n X-Newsreader: Tin 1.1 PL5\n\n Sharon Paulson (paulson@tab00.larc.nasa.gov) wrote:\n : \n {much deleted]\n : \n : \n : The fact that this hap..."


Judging by eye from the top 10 results, both searches look similarly good.  A more reliable comparison would score them against labeled data.
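
One way to make such a comparison, sketched under assumptions: the 20 newsgroups data ships with a newsgroup label per document (`news.target`), so a retrieved document could be counted as relevant when it belongs to a newsgroup chosen by hand for the query. The helper `precision_at_k` and the toy ids below are hypothetical, and newsgroup membership is only a rough proxy for relevance.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = list(retrieved_ids)[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


# Made-up example: ranked results from a search and a hand-labeled relevant set.
toy_ranked_ids = [3, 1, 7, 9]
toy_relevant_ids = {1, 9, 5}

print(precision_at_k(toy_ranked_ids, toy_relevant_ids, k=2))
print(precision_at_k(toy_ranked_ids, toy_relevant_ids, k=4))
```

With the notebook's variables, the ranked ids would come from `results_lsa.doc_id` and `results_use.doc_id`, and the same set of relevant ids would be used to score both searches.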