<a href="https://www.kaggle.com/code/aleksandrmorozov123/nlp-with-python?scriptVersionId=153215975" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Code to populate the documents dictionary**

In [1]:
def read_documents ():
    f = open ("/kaggle/input/cisi-a-dataset-for-information-retrieval/CISI.ALL")
    merged = " "
    # the string variable merged keeps the result of merging the field identifier with its content
    
    for a_line in f.readlines ():
        if a_line.startswith ("."):
            merged += "\n" + a_line.strip ()
        else:
            merged += " " + a_line.strip ()
    # updates the merged variable using a for-loop
    
    documents = {}
    
    content = ""
    doc_id = ""
    # each entry in the dictionary contains key = doc_id and value = content
    
    for a_line in merged.split ("\n"):
        if a_line.startswith (".I"):
            doc_id = a_line.split (" ") [1].strip()
        elif a_line.startswith (".X"):
            documents[doc_id] = content
            content = ""
            doc_id = ""
        else:
            content += a_line.strip ()[3:] + " "
    f.close ()
    return documents

# print out the size of the dictionary and the content of the very first article
documents = read_documents ()
print (len (documents))
print (documents.get ("1"))
    

1460
 18 Editions of the Dewey Decimal Classifications Comaromi, J.P. The present study is a history of the DEWEY Decimal Classification.  The first edition of the DDC was published in 1876, the eighteenth edition in 1971, and future editions will continue to appear as needed.  In spite of the DDC's long and healthy life, however, its full story has never been told.  There have been biographies of Dewey that briefly describe his system, but this is the first attempt to provide a detailed history of the work that more than any other has spurred the growth of librarianship in this country and abroad. 


**Code to populate the queries dictionary**

In [2]:
def read_queries ():
    f = open ("/kaggle/input/cisi-a-dataset-for-information-retrieval/CISI.QRY")
    merged = ""
    
    # merge the content of each field with its identifier and separate different fields with line breaks
    for a_line in f.readlines ():
        if a_line.startswith ("."):
            merged += "\n" + a_line.strip ()
        else:
            merged += " " + a_line.strip ()
    
    queries = {}
    
    # initialize queries dictionary with key = qry_id and value=content for each query in the dataset
    content = ""
    qry_id = ""
    
    for a_line in merged.split ("\n"):
        if a_line.startswith (".I"):
            if not content == "":
                queries [qry_id] = content
                content = ""
                qry_id = ""
            # add an entry to the dictionary when you encounter an .I identifier
            qry_id = a_line.split(" ")[1].strip ()
        # otherwise, keep adding content to the content variable
        elif a_line.startswith (".W") or a_line.startswith (".T"):
            content += a_line.strip ()[3:] + " "
    queries [qry_id] = content
    f.close ()
    return queries

# print out the length of the dictionary and the content of the first query
queries = read_queries ()
print (len (queries))
print (queries.get("1"))

112
What problems and concerns are there in making up descriptive titles? What difficulties are involved in automatically retrieving articles from approximate titles? What is the usual relevance of the content of articles to their titles? 


**Code to populate the mappings dictionary**

In [3]:
def read_mappings ():
    f = open ("/kaggle/input/cisi-a-dataset-for-information-retrieval/CISI.REL")
    mappings = {}
    
    for a_line in f.readlines ():
        voc = a_line.strip ().split ()
        key = voc[0].strip ()
        current_value = voc[1].strip()
        value = []
        # update the entry in the mappings dictionary with the current value
        if key in mappings.keys ():
            value = mappings.get (key)
        value.append (current_value)
        mappings [key] = value
    f.close ()
    return mappings

# print out some information about the mapping data structure
mappings = read_mappings ()
print (len (mappings))
print (mappings.keys ())
print (mappings.get ("1"))

76
dict_keys(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '37', '39', '41', '42', '43', '44', '45', '46', '49', '50', '52', '54', '55', '56', '57', '58', '61', '62', '65', '66', '67', '69', '71', '76', '79', '81', '82', '84', '90', '92', '95', '96', '97', '98', '99', '100', '101', '102', '104', '109', '111'])
['28', '35', '38', '42', '43', '52', '65', '76', '86', '150', '189', '192', '193', '195', '215', '269', '291', '320', '429', '465', '466', '482', '483', '510', '524', '541', '576', '582', '589', '603', '650', '680', '711', '722', '726', '783', '813', '820', '868', '869', '894', '1162', '1164', '1195', '1196', '1281']


**Preprocess the data in documents and queries**

In [4]:
# import required libraries
import nltk
from nltk import word_tokenize

# text is converted to lowercase and split into words
def get_words (text):
    word_list = [word for word in word_tokenize (text.lower ())]
    return word_list
    
doc_words = {}
qry_words = {}

for doc_id in documents.keys ():
    doc_words [doc_id] = get_words (documents.get (doc_id))
for qry_id in queries.keys ():
    # entries in both documents and queries are represented as word lists
    qry_words [qry_id] = get_words (queries.get (qry_id))
    
# print out the length of the dictionaries and check the first document and the first query
print (len (doc_words))
print (doc_words.get ("1"))
print (len (doc_words.get ("1")))
print (len (qry_words))
print (qry_words.get ("1"))
print (len (qry_words.get("1")))

1460
['18', 'editions', 'of', 'the', 'dewey', 'decimal', 'classifications', 'comaromi', ',', 'j.p.', 'the', 'present', 'study', 'is', 'a', 'history', 'of', 'the', 'dewey', 'decimal', 'classification', '.', 'the', 'first', 'edition', 'of', 'the', 'ddc', 'was', 'published', 'in', '1876', ',', 'the', 'eighteenth', 'edition', 'in', '1971', ',', 'and', 'future', 'editions', 'will', 'continue', 'to', 'appear', 'as', 'needed', '.', 'in', 'spite', 'of', 'the', 'ddc', "'s", 'long', 'and', 'healthy', 'life', ',', 'however', ',', 'its', 'full', 'story', 'has', 'never', 'been', 'told', '.', 'there', 'have', 'been', 'biographies', 'of', 'dewey', 'that', 'briefly', 'describe', 'his', 'system', ',', 'but', 'this', 'is', 'the', 'first', 'attempt', 'to', 'provide', 'a', 'detailed', 'history', 'of', 'the', 'work', 'that', 'more', 'than', 'any', 'other', 'has', 'spurred', 'the', 'growth', 'of', 'librarianship', 'in', 'this', 'country', 'and', 'abroad', '.']
113
112
['what', 'problems', 'and', 'concerns',

**Simple Boolean search algorithm**

In [None]:
# iterate through the documents
def retrieve_documents (doc_words, query):
    docs = []
    for doc_id in doc_words.keys ():
        found = False
        i = 0
        while i<len(query) and not found: 
            word = query [i]
            if word in doc_words.get (doc_id):
                docs.append (doc_id)
                found = True
            else:
                i+=1
    return docs

# check the results
docs = retrieve_documents (doc_words, qry_words.get("3"))
print (docs [:100])
print (len (docs))
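
The search above returns a document as soon as any single query word occurs in it, which is a Boolean OR over the query words. A minimal sketch on a hypothetical toy corpus (the data and the names `retrieve_any` and `retrieve_all` are illustrative assumptions, not part of the notebook) contrasts this with a stricter Boolean AND:

```python
# Toy corpus: document IDs mapped to token lists (hypothetical data, not from CISI)
toy_doc_words = {
    "d1": ["library", "classification", "history"],
    "d2": ["information", "retrieval", "systems"],
    "d3": ["history", "of", "information", "science"],
}

def retrieve_any(doc_words, query):
    """Return IDs of documents that contain at least one query word (Boolean OR)."""
    query_set = set(query)
    return [doc_id for doc_id, words in doc_words.items()
            if query_set & set(words)]

def retrieve_all(doc_words, query):
    """Return IDs of documents that contain every query word (Boolean AND)."""
    query_set = set(query)
    return [doc_id for doc_id, words in doc_words.items()
            if query_set <= set(words)]

toy_query = ["information", "history"]
any_hits = retrieve_any(toy_doc_words, toy_query)
all_hits = retrieve_all(toy_doc_words, toy_query)
print(any_hits)  # ['d1', 'd2', 'd3']
print(all_hits)  # ['d3']
```

With OR semantics almost every document matches a long query, which is one reason the later cells move to ranked retrieval.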

**Begin the preprocessing - remove stopwords and punctuation marks**

In [5]:
# import python's string module that will help remove punctuation marks
import string

# import the stopwords list
from nltk import word_tokenize
from nltk.corpus import stopwords

def process (text):
    stoplist = set (stopwords.words ('english'))
    # only add the words if they are not included in the stoplist and are not punctuation marks
    word_list = [word for word in word_tokenize (text.lower())
                if not word in stoplist and not word in string.punctuation]
    return word_list

# check the results of these preprocessing steps on some documents or queries
word_list = process (documents.get ("1"))
print (word_list)

['18', 'editions', 'dewey', 'decimal', 'classifications', 'comaromi', 'j.p.', 'present', 'study', 'history', 'dewey', 'decimal', 'classification', 'first', 'edition', 'ddc', 'published', '1876', 'eighteenth', 'edition', '1971', 'future', 'editions', 'continue', 'appear', 'needed', 'spite', 'ddc', "'s", 'long', 'healthy', 'life', 'however', 'full', 'story', 'never', 'told', 'biographies', 'dewey', 'briefly', 'describe', 'system', 'first', 'attempt', 'provide', 'detailed', 'history', 'work', 'spurred', 'growth', 'librarianship', 'country', 'abroad']


**Next step in preprocessing - stemming**

In [6]:
# import the stemmer
from nltk.stem.lancaster import LancasterStemmer

def process (text):
    stoplist = set (stopwords.words ('english'))
    # initialize the LancasterStemmer
    st = LancasterStemmer ()
    word_list = [st.stem(word) for word in word_tokenize (text.lower ())
                if not word in stoplist and not word in string.punctuation]
    return word_list

# check the results on some document, query, or on a list of words
word_list = process (documents.get("26"))
print (word_list)
word_list = process ("organize, organizing, organizational, organ, organic, organizer")
print (word_list)

['index', 'abstract', 'assocy', 'doyl', 'l.b', 'artic', 'discuss', 'poss', 'exploit', 'stat', 'word', 'co-occurrence', 'text', 'purpos', 'docu', 'retriev', 'co-occurrence', 'defin', 'rel', 'ment', 'process', 'auth', 'read', 'sev', 'mean', 'quantit', 'meas', 'word', 'co-occurrence', 'scrutinized', 'shown', 'strongly', 'co-occurring', 'word', 'pair', 'theref', '``', 'assocy', "''", 'stat', 'sens', 'repres', 'form', '``', 'assocy', 'map', "''", 'last', 'half', 'artic', 'pres', 'two', 'mod', 'us', 'assocy', 'map', 'lit', 'search']
['org', 'org', 'org', 'org', 'org', 'org']


**Estimate term frequency in documents and queries**

In [None]:
def get_terms (text):
    stoplist = set (stopwords.words ('english'))
    terms = {}
    st = LancasterStemmer ()
    word_list = [st.stem(word) for word in word_tokenize (text.lower ())
                if not word in stoplist and not word in string.punctuation]
    for word in word_list:
        terms [word] = terms.get (word, 0) + 1
    return terms

doc_terms = {}
qry_terms = {}
for doc_id in documents.keys ():
    doc_terms [doc_id] = get_terms (documents.get (doc_id))
for qry_id in queries.keys ():
    # populate the term frequency dictionaries for all documents and all queries
    qry_terms [qry_id] = get_terms (queries.get (qry_id))
    
# check the results
print (len (doc_terms))
print (doc_terms.get ("1"))
print (len (doc_terms.get("1")))
print (len (qry_terms))
print (qry_terms.get("1"))
print (len (qry_terms.get("1")))
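
The counting step inside `get_terms` can also be written with `collections.Counter` from the standard library. This is a minimal sketch on a hypothetical list of already stemmed tokens; `count_terms` and `toy_tokens` are illustrative names, not part of the notebook:

```python
from collections import Counter

def count_terms(tokens):
    """Map each token to the number of times it occurs in the list."""
    return dict(Counter(tokens))

# Hypothetical, already preprocessed tokens
toy_tokens = ["index", "word", "index", "map", "word", "index"]
toy_counts = count_terms(toy_tokens)
print(toy_counts)  # {'index': 3, 'word': 2, 'map': 1}
```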


**Code to represent the data in a shared space**

In [None]:
# collect the shared vocabulary of terms from documents and queries and return it as a sorted list
def collect_vocabulary ():
    all_terms = []
    for doc_id in doc_terms.keys ():
        for term in doc_terms.get (doc_id).keys():
            all_terms.append (term)
    for qry_id in qry_terms.keys ():
        # add the terms of each query once; an extra loop over all query IDs is not needed
        for term in qry_terms.get (qry_id).keys():
            all_terms.append (term)
    return sorted (set (all_terms))

# print out the length of the shared vocabulary and check the first several terms in the vocabulary
all_terms = collect_vocabulary ()
print (len (all_terms))
print (all_terms [:10])

def vectorize (input_features, vocabulary):
    output = {}
    for item_id in input_features.keys ():
        features = input_features.get (item_id)
        output_vector = []
        for word in vocabulary:
            if word in features.keys ():
                output_vector.append (int (features.get (word)))
            else:
                output_vector.append (0)
        output [item_id] = output_vector
    return output

doc_vectors = vectorize (doc_terms, all_terms)
qry_vectors = vectorize (qry_terms, all_terms)

# print out some statistics on these data structures
print (len (doc_vectors))
print (len (doc_vectors.get ("1450")))
print (len (qry_vectors))
print (len (qry_vectors.get ("110")))

**Calculate and apply inverse document frequency weighting**

In [None]:
# import library for math
import math

def calculate_idfs (vocabulary, doc_features):
    doc_idfs = {}
    for term in vocabulary:
        doc_count = 0
        for doc_id in doc_features.keys ():
            terms = doc_features.get (doc_id)
            if term in terms.keys ():
                doc_count += 1
        doc_idfs [term] = math.log (float (len (doc_features.keys ()))/
                                    float (1 + doc_count), 10)
    return doc_idfs

# check the results - we should have idf values for all terms from the vocabulary
doc_idfs = calculate_idfs (all_terms, doc_terms)
print (len (doc_idfs))
print (doc_idfs.get ("system"))

# define a function to apply idf weighting to the input_terms data structure
def vectorize_idf (input_terms, input_idfs, vocabulary):
    output = {}
    for item_id in input_terms.keys ():
        terms = input_terms.get (item_id)
        output_vector = []
        for term in vocabulary:
            if term in terms.keys ():
                # multiply the term frequencies with idf weights if the term is present in document
                output_vector.append (
                input_idfs.get (term) * float (terms.get (term)))
            else:
                output_vector.append (float (0))
        output [item_id] = output_vector
    return output

# apply idf weighting to doc_terms
doc_vectors = vectorize_idf (doc_terms, doc_idfs, all_terms)

# print out some statistics, such as the number of documents and terms
print (len (doc_vectors))
print (len (doc_vectors.get ("1460")))
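
To see what the formula in `calculate_idfs` produces, here is a minimal sketch on a hypothetical four-document corpus. It uses the same expression, log10(N / (1 + document count)), written with `math.log10` instead of `math.log (x, 10)`; the data and the helper name `idf` are assumptions made for illustration:

```python
import math

# Hypothetical term-frequency dictionaries for four toy documents
toy_doc_terms = {
    "d1": {"library": 2, "system": 1},
    "d2": {"library": 1, "index": 3},
    "d3": {"library": 1, "index": 1},
    "d4": {"library": 1, "index": 2},
}

def idf(term, doc_terms):
    """Inverse document frequency with the same +1 smoothing as in the notebook."""
    n_docs = len(doc_terms)
    doc_count = sum(1 for terms in doc_terms.values() if term in terms)
    return math.log10(n_docs / (1 + doc_count))

for term in ["system", "index", "library"]:
    print(term, round(idf(term, toy_doc_terms), 4))
# system occurs in 1 of 4 documents: log10(4 / 2), about 0.301
# index occurs in 3 of 4 documents: log10(4 / 4) = 0.0
# library occurs in all 4 documents: log10(4 / 5), a small negative value
```

Because of the +1 in the denominator, a term that occurs in all documents, or in all but one, receives a weight of zero or below, so it contributes little or nothing to the ranking.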

**Run search algorithm for a given query on the set of the documents**

In [None]:
# the operator's itemgetter functionality helps sort Python dictionaries by keys or values
from operator import itemgetter

# calculate the length of the input vector
def length (vector):
    sq_length = 0
    for index in range (0, len(vector)):
        sq_length += math.pow (vector [index], 2)
    return math.sqrt (sq_length)

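# calculate the dot product of two vectors
def dot_product (vector1, vector2):
    if not len (vector1) == len (vector2):
        # fail early instead of returning a string that would break the division below
        raise ValueError ("Unmatching dimensionality")
    dot_prod = 0
    for index in range (0, len(vector1)):
        if not vector1 [index] == 0 and not vector2 [index] == 0:
            dot_prod += vector1 [index] * vector2 [index]
    return dot_prod
    
# calculate the cosine similarity between the query and document vectors
def calculate_cosine (query, document):
    denominator = length (query) * length (document)
    # a vector without any non-zero weights has zero length; treat it as not similar
    if denominator == 0:
        return 0.0
    return dot_product (query, document) / denominator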

query = qry_vectors.get ("3")
results = {}

for doc_id in doc_vectors.keys ():
    document = doc_vectors.get (doc_id)
    cosine = calculate_cosine (query, document)
    results [doc_id] = cosine
    
# sort the results dictionary by cosine values in descending order and return the top n results
for items in sorted (results.items (), key = itemgetter (1), reverse = True) [:44]:
    print (items [0])
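
The ranking step can be checked by hand on small vectors. This is a minimal, self-contained sketch with hypothetical three-dimensional vectors; `cosine_similarity`, `toy_query`, and `toy_docs` are illustrative names rather than the notebook's own:

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical query and document vectors over a three-term vocabulary
toy_query = [1, 1, 0]
toy_docs = {"d1": [2, 2, 0], "d2": [0, 0, 3], "d3": [1, 0, 1]}

toy_scores = {doc_id: cosine_similarity(toy_query, vec)
              for doc_id, vec in toy_docs.items()}
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
print(toy_ranking)  # ['d1', 'd3', 'd2']
```

Document d1 points in the same direction as the query and scores close to 1, d3 shares one of two terms and scores about 0.5, and d2 shares none and scores 0. Vector length does not matter, only direction.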

**Estimate precision@k and ratio of cases with at least one relevant document**

In [None]:
# calculate the proportion of relevant documents from the gold standard in the top k returned results
def calculate_precision (model_output, gold_standard):
    true_pos = 0
    for item in model_output:
        if item in gold_standard:
            true_pos += 1
    return float (true_pos) / float (len (model_output))

def calculate_found (model_output, gold_standard):
    found = 0
    for item in model_output:
        if item in gold_standard:
            found = 1
    return float (found)

precision_all = 0.0
found_all = 0.0
for query_id in mappings.keys ():
    # calculate mean values across all queries
    gold_standard = mappings.get (str (query_id))
    query = qry_vectors.get (str (query_id))
    results = {}
    model_output = []
    for doc_id in doc_vectors.keys ():
        document = doc_vectors.get (doc_id)
        cosine = calculate_cosine (query, document)
        # for each document, estimate its relevance to the query with cosine similarity as before
        results [doc_id] = cosine
    # sort the results and consider only top k (top 5) most relevant documents
    for items in sorted (results.items (), key = itemgetter (1), reverse = True) [:5]:
        model_output.append (items [0])
    precision = calculate_precision (model_output, gold_standard)
    found = calculate_found (model_output, gold_standard)
    print (f"{str (query_id)} : {str(precision)}")
    precision_all += precision
    found_all += found
    
# estimate the mean values for all queries
print (precision_all / float (len (mappings.keys ())))
print (found_all / float (len (mappings.keys ())))    

On some queries the algorithm performs very well. For example, "1 : 1.0" shows that all top 5 documents returned for query 1 are relevant. On other queries, however, few or none of the top 5 documents are relevant.
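
A per-query value such as "1 : 1.0" can be reproduced by hand on a small example. The sketch below is a compact variant of the precision calculation; the ranked list, the relevant set, and the name `precision_at_k` are hypothetical and chosen only for illustration:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top k returned documents that are in the gold standard."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

# Hypothetical system output (best match first) and gold standard
toy_ranking = ["28", "7", "35", "901", "42"]
toy_relevant = {"28", "35", "42", "43"}

p_at_5 = precision_at_k(toy_ranking, toy_relevant, 5)
p_at_1 = precision_at_k(toy_ranking, toy_relevant, 1)
print(p_at_5)  # 0.6, because 3 of the top 5 documents are relevant
print(p_at_1)  # 1.0, because the top document is relevant
```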

**Estimate mean reciprocal rank**

In [None]:
rank_all = 0.0
for query_id in mappings.keys ():
    gold_standard = mappings.get (str (query_id))
    query = qry_vectors.get (str (query_id))
    results = {}
    for doc_id in doc_vectors.keys ():
        document = doc_vectors.get (doc_id)
        cosine = calculate_cosine (query, document)
        results [doc_id] = cosine
    sorted_results = sorted (results.items (),
                            key=itemgetter (1), reverse = True)
    index = 0
    # the flag found stays False until we reach the first relevant document
    found = False
    while found == False:
        # each item is a tuple of (document_id, similarity score)
        item = sorted_results [index]
        # increment the index with each document in the results, so that it equals the rank
        index += 1
        # stop when the end of the results is reached
        if index == len (sorted_results):
            found = True
        if item [0] in gold_standard:
            # the document ID is the first element in the sorted tuples
            found = True
            print (f"{str(query_id)}: {str(float (1) / float (index))}")
            rank_all += float(1) / float (index)
            
# print out the mean value across all queries
print (rank_all / float (len (mappings.keys ())))
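
The reciprocal rank of a single query is 1 divided by the position of the first relevant document, and the mean reciprocal rank averages this value over all queries. A minimal sketch on hypothetical rankings (the data and the name `reciprocal_rank` are assumptions for illustration):

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1 / rank of the first relevant document, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Hypothetical (ranking, gold standard) pairs for three queries
toy_runs = [
    (["a", "b", "c"], {"a"}),  # first relevant document at rank 1
    (["a", "b", "c"], {"c"}),  # first relevant document at rank 3
    (["a", "b", "c"], {"z"}),  # no relevant document returned
]

toy_mrr = sum(reciprocal_rank(ranking, gold) for ranking, gold in toy_runs) / len(toy_runs)
print(round(toy_mrr, 4))  # 0.4444, the mean of 1, 1/3 and 0
```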

**Example of how to run spaCy's processing pipeline**

In [8]:
# import library
import spacy

# the spacy.load command initializes the nlp pipeline
nlp = spacy.load ("en_core_web_sm")
doc = nlp ("On monday students meet with researchers " + " and discuss future development their research.")
rows = []

# print the output in a tabular format and add a header to the printout for clarity
rows.append (["Word", "Position", "Lowercase", "Lemma", "POS", "Alphanumeric", "Stopword"])

for token in doc:
    rows.append ([token.text, str(token.i), token.lower_, token.lemma_,
                 token.pos_, str(token.is_alpha), str (token.is_stop)])
    
# Python's zip function transposes the rows so that each column can be processed separately
columns = zip (*rows)
# calculate the maximum length of strings in each column to allow enough space in the printout
column_widths = [max (len (item) for item in col)
                for col in columns]

# print each row, padding every cell to the width of its column
for row in rows:
    print (''.join(' {:{width}} '.format (
        row [i], width = column_widths [i])
                  for i in range (0, len (row))))

 Word         Position  Lowercase    Lemma        POS    Alphanumeric  Stopword 
 On           0         on           on           ADP    True          True     
 monday       1         monday       monday       PROPN  True          False    
 students     2         students     student      NOUN   True          False    
 meet         3         meet         meet         VERB   True          False    
 with         4         with         with         ADP    True          True     
 researchers  5         researchers  researcher   NOUN   True          False    
              6                                   SPACE  False         False    
 and          7         and          and          CCONJ  True          True     
 discuss      8         discuss      discuss      VERB   True          False    
 future       9         future       future       ADJ    True          False    
 development  10        development  development  NOUN   True          False    
 their        11        thei

**Identify all groups of nouns and the way they are related to each other**

In [9]:
doc = nlp ("On monday students meet with researchers " + " and discuss future development their research.")

# we can access noun phrases by doc.noun_chunks
for chunk in doc.noun_chunks:
    # print out the phrase, its head, the type of relation to the next most important word, and the word itself
    print ('\t'.join ([chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text]))

monday students	students	nsubj	meet
researchers	researchers	pobj	with
future development	development	dobj	discuss
their research	research	dobj	discuss


**Visualize the dependency information**

In [11]:
# import spaCy's visualization tool displaCy
from spacy import displacy
# path helps define the location for the file to store the visualization
from pathlib import Path

# use displaCy to visualize dependencies over the input text with appropriate arguments
svg = displacy.render (doc, style = 'dep', jupyter = False)
file_name = '-'.join ([w.text for w in doc if not w.is_punct]) + ".svg"

# the file that the output is stored to simply uses the words from the sentence in its name
output_path = Path (file_name)
# write_text opens, writes, and closes the file, and returns the number of characters written
output_path.write_text (svg, encoding="utf-8")

10561

**Print out the information about head and dependents for each word**

In [12]:
# code assumes that spaCy is imported and input text is already fed into the pipeline
for token in doc:
    print (token.text, token.dep_, token.head.text,
          token.head.pos_, [child for child in token.children])

On prep meet VERB []
monday compound students NOUN []
students nsubj meet VERB [monday]
meet ROOT meet VERB [On, students, with, and, discuss, .]
with prep meet VERB [researchers]
researchers pobj with ADP [ ]
  dep researchers NOUN []
and cc meet VERB []
discuss conj meet VERB [development, research]
future amod development NOUN []
development dobj discuss VERB [future]
their poss research NOUN []
research dobj discuss VERB [their]
. punct meet VERB []


**Extract participants of the actions**

In [13]:
# code assumes that spaCy is imported and input text is already fed into the pipeline
for token in doc:
    # check that the ROOT of the sentence is a verb with the base form (lemma) "meet"
    if (token.lemma_ == "meet" and token.pos_ == "VERB"
       and token.dep_ == "ROOT"):
        # this verb expresses the action itself
        action = token.text
        # extract the list of all dependents of this verb using token.children
        children = [child for child in token.children]
        participant1 = ""
        participant2 = ""
        for child1 in children:
            if child1.dep_ == "nsubj":
                participant1 = " ".join (
                [attr.text for attr in child1.children]
                ) + " " + child1.text
            elif child1.text == "with":
                # check if the verb has preposition "with" as one of its dependents
                action += " " + child1.text
                child1_children = [child for child in child1.children]
                for child2 in child1_children:
                    if child2.pos_ == "NOUN":
                        participant2 = " ".join (
                        [attr.text for attr in child2.children]
                        ) + " " + child2.text
                    
# print out the results
print (f"Participant1 = {participant1}")
print (f"Action = {action}")
print (f"Participant2 = {participant2}")

Participant1 = monday students
Action = meet with
Participant2 =   researchers


**Build information extractor**

In [26]:
# provide a diverse set of sentences
sentences = ["On monday students meet with researchers " + " and discuss future development their research.", 
            " Warren Baffet met with the President last week.",
            "Elon Musk met with the President an White House.",
            "The two bussinesmans also posed for photographs and " + 
            "the Vice President talked to reporters."]

# define a function to apply all the steps in the information extraction algorithm
def extract_information (doc):
    # initialize all three fields so that the printout works even if nothing is extracted
    action = ""
    participant1 = ""
    participant2 = ""
    for token in doc:
        if (token.lemma_ == "meet" and token.pos_ == "VERB"
            and token.dep_ == "ROOT"):
            action = token.text
            children = [child for child in token.children]
            for child1 in children:
                if child1.dep_ == "nsubj":
                    participant1 = " ".join (
                        [attr.text for attr in child1.children]
                        ) + " " + child1.text
                elif child1.text == "with":
                    action += " " + child1.text
                    child1_children = [child for child in child1.children]
                    for child2 in child1_children:
                        # extract participants expressed with proper nouns (PROPN) and common nouns (NOUN)
                        if (child2.pos_ == "NOUN"
                            or child2.pos_ == "PROPN"):
                            participant2 = " ".join (
                                [attr.text for attr in child2.children]
                                ) + " " + child2.text
                elif (child1.dep_ == "dobj"
                    and (child1.pos_ == "NOUN"
                        or child1.pos_ == "PROPN")):
                    participant2 = " ".join (
                        [attr.text for attr in child1.children]
                        ) + " " + child1.text
    print (f"Participant1 = {participant1}")
    print (f"Action = {action}")
    print (f"Participant2 = {participant2}")
        
# apply extract_information function to each sentence and print out the actions and participants
for sent in sentences:
    print (f"\nSentence = {sent}")
    doc = nlp (sent)
    extract_information (doc)