In [14]:
!jupyter nbconvert --to script search_tools.ipynb

[NbConvertApp] Converting notebook search_tools.ipynb to script
[NbConvertApp] Writing 1038 bytes to search_tools.py


In [13]:
!jupyter nbconvert --to script search_word.ipynb

[NbConvertApp] Converting notebook search_word.ipynb to script
[NbConvertApp] Writing 3008 bytes to search_word.py


In [3]:
import mistune

ImportError: cannot import name 'AstRenderer' from 'mistune.renderers' (C:\Users\Patrick\anaconda3\envs\urbandesignenv\Lib\site-packages\mistune\renderers\__init__.py)

In [1]:
#Import Modules
import os
import pickle
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
#Path to the folder that holds the pickled database files (pickle_file)
pickle_folder = os.path.join(".","pickle_file")

In [4]:
#Load all pickle files and merge into single database
database = []
for pickle_file in os.listdir(pickle_folder):
    if pickle_file.endswith('.pkl'):
        with open(os.path.join(pickle_folder, pickle_file), "rb") as f:
            database.extend(pickle.load(f))
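
The loader above assumes each `.pkl` file holds a list of dictionaries with the keys `TEXT`, `LINE`, `BOOK`, and `KEYWORD`, since those are the keys the search functions read. A minimal sketch of that round trip follows. It uses a temporary folder and a made-up entry rather than the real database, and `load_database`, `sample_entry`, and `demo_database` are hypothetical names that do not appear in the notebook.

```python
import os
import pickle
import tempfile

def load_database(folder):
    """Merge every .pkl file in `folder` into one list of entries."""
    merged = []
    # sorted() makes the merge order deterministic; os.listdir order is arbitrary
    for name in sorted(os.listdir(folder)):
        if name.endswith(".pkl"):
            with open(os.path.join(folder, name), "rb") as f:
                merged.extend(pickle.load(f))
    return merged

# Hypothetical entry, shaped like the records printed by the search cells
sample_entry = {
    "TEXT": "Shops and warehouses were constructed.",
    "LINE": 1,
    "BOOK": "Example Book",
    "KEYWORD": "shop warehous construct",
}

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.pkl"), "wb") as f:
        pickle.dump([sample_entry], f)
    # A non-pickle file in the same folder is skipped by the suffix check
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("ignored")
    demo_database = load_database(tmp)

print(len(demo_database), demo_database[0]["BOOK"])
```

Only unpickle files from a trusted source: `pickle.load` can execute arbitrary code embedded in the file.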

In [5]:
#Simple Search
def simple_search(query, num_results=5):
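    # Case-insensitive substring match against each entry's KEYWORD string
    results = []
    query = query.lower()
    for entry in database:
        if query in entry['KEYWORD'].lower():
            results.append(entry)
    
    # Keep only the first num_results matches (database order, not ranked)
    results = results[:num_results]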
    
    for result in results:
        print(f"TEXT: {result['TEXT']}\nLINE: {result['LINE']}\nBOOK: {result['BOOK']}\nKEYWORD: {result['KEYWORD']}\n")
    
    return results
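
Because `simple_search` tests for a substring, a query such as "shop" also matches any keyword that merely contains it, for example "workshop". If whole-word matches are wanted instead, one option is to compare against the space-separated tokens of `KEYWORD`. This is a sketch under that assumption: `token_search` and the demo entries are hypothetical, and it handles single-word queries only.

```python
def token_search(entries, query, num_results=5):
    """Return entries whose KEYWORD field contains `query` as a whole token."""
    query = query.lower()
    matches = []
    for entry in entries:
        # split() breaks the space-separated KEYWORD string into tokens
        if query in entry["KEYWORD"].lower().split():
            matches.append(entry)
    return matches[:num_results]

# Made-up entries; only the KEYWORD field is needed for matching
demo_entries = [
    {"KEYWORD": "rum shop villag"},
    {"KEYWORD": "factori workshop middl"},
    {"KEYWORD": "metropolitan shopkeep"},
]
print([e["KEYWORD"] for e in token_search(demo_entries, "shop")])
```

The trade-off is recall: the substring version finds related forms such as "workshop", while the token version returns only exact keyword hits.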

In [6]:
#TF-IDF Search
def tfidf_search(query, num_results=5):
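    # Fit TF-IDF on the KEYWORD strings; min_df=2 drops terms that occur in only one entry
    documents = [entry['KEYWORD'] for entry in database]
    vectorizer = TfidfVectorizer(min_df=2)
    tfidf_matrix = vectorizer.fit_transform(documents)
    # Vectorize the query with the same vocabulary and score every entry by cosine similarity
    query_vec = vectorizer.transform([query.lower()])
    scores = cosine_similarity(query_vec, tfidf_matrix).flatten()
    # Indices of the num_results highest scores, best first
    top_indices = scores.argsort()[-num_results:][::-1]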
    
    results = [database[i] for i in top_indices]
    for result in results:
        print(f"TEXT: {result['TEXT']}\nLINE: {result['LINE']}\nBOOK: {result['BOOK']}\nKEYWORD: {result['KEYWORD']}\n")
    
    return results, vectorizer, tfidf_matrix
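
The expression `scores.argsort()[-num_results:][::-1]` always yields `num_results` indices, even when the query shares no vocabulary with any entry and every cosine score is 0. In that case the returned entries are unrelated to the query. A minimal sketch of a ranking helper that drops zero-score entries is shown below. `rank_nonzero` is a hypothetical name, not part of the notebook, and it accepts any sequence of floats, including the flattened array from `cosine_similarity`.

```python
def rank_nonzero(scores, num_results=5):
    """Indices of the highest positive scores, best first."""
    # Keep only indices whose score is strictly positive
    candidates = [i for i in range(len(scores)) if scores[i] > 0]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:num_results]

print(rank_nonzero([0.0, 0.7, 0.0, 0.2, 0.9], num_results=3))  # [4, 1, 3]
print(rank_nonzero([0.0, 0.0, 0.0]))  # []
```

An empty list then signals "no match" instead of silently returning arbitrary entries.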

In [7]:
#SVD Search
def svd_search(query, num_results=5):
    documents = [entry['KEYWORD'] for entry in database]
    vectorizer = TfidfVectorizer(min_df=2)
    tfidf_matrix = vectorizer.fit_transform(documents)
    # Fixed seed so the randomized SVD gives the same ranking on every run
    svd = TruncatedSVD(n_components=100, random_state=42)
    svd_matrix = svd.fit_transform(tfidf_matrix)
    query_vec = svd.transform(vectorizer.transform([query.lower()]))
    scores = cosine_similarity(query_vec, svd_matrix).flatten()
    top_indices = scores.argsort()[-num_results:][::-1]
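    # components_ is 2-D (n_components x n_features), so rank terms within one row:
    # here the 10 highest-weighted terms of the first SVD component
    feature_names = vectorizer.get_feature_names_out()
    top_terms = [feature_names[idx] for idx in svd.components_[0].argsort()[::-1][:10]]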
    
    results = [database[i] for i in top_indices]
    for result in results:
        print(f"TEXT: {result['TEXT']}\nLINE: {result['LINE']}\nBOOK: {result['BOOK']}\nKEYWORD: {result['KEYWORD']}\n")
    
    print("Top Keywords:\n")
    for terms in top_terms:
        print(terms)

    return results, top_terms
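
`svd.components_` is a 2-D array with one row per latent component and one column per vocabulary term, so top terms are read off one row at a time. The sketch below illustrates this with plain lists and made-up weights. With the notebook's objects one would presumably pass `svd.components_` and `vectorizer.get_feature_names_out()`. `top_terms_per_component` and the demo data are hypothetical.

```python
def top_terms_per_component(components, feature_names, n_terms=3):
    """For each component row, list the terms with the largest absolute weight."""
    summary = []
    for row in components:
        # Rank column indices by absolute weight, largest first
        order = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        summary.append([feature_names[j] for j in order[:n_terms]])
    return summary

# Made-up vocabulary and weights: 2 components x 4 terms
demo_names = ["arch", "shop", "street", "villag"]
demo_components = [
    [0.1, 0.8, 0.5, 0.2],
    [0.7, -0.1, 0.2, -0.6],
]
for k, terms in enumerate(top_terms_per_component(demo_components, demo_names, n_terms=2)):
    print(k, terms)
```

Ranking by absolute weight is a design choice: the sign of an SVD component is arbitrary, so a strongly negative weight can be as informative as a strongly positive one.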

In [8]:
#Testing
query = "shop"

In [9]:
simple_search(query)

TEXT:  He prefers patching up a ruin to building a house; he raises shops and hovels, the abodes of inactive, vegetating, brutish poverty, under the protection of aged and ruined, yet stalwart, arches of the Roman amphitheater; and the habitations of the lower orders frequently present traces of ornament and stability of material evidently belonging to the remains of a prouder edifice.
LINE: 119
BOOK: The Poetry of Architecture by John Ruskin
KEYWORD: ruin build hous rais shop hovel abod inact brutish poverti protect age ruin yet stalwart arch amphitheat habit lower order frequent present trace ornament stabil materi evid belong remain edific

TEXT: —PRINCIPLES OF COMPOSITION. 162. It has lately become a custom, among the more enlightened and refined of metropolitan shopkeepers, to advocate the cause of propriety in architectural decoration, by ensconcing their shelves, counters, and clerks in classical edifices, agreeably ornamented with ingenious devices, typical of the class of arti

[{'TEXT': ' He prefers patching up a ruin to building a house; he raises shops and hovels, the abodes of inactive, vegetating, brutish poverty, under the protection of aged and ruined, yet stalwart, arches of the Roman amphitheater; and the habitations of the lower orders frequently present traces of ornament and stability of material evidently belonging to the remains of a prouder edifice.',
  'LINE': 119,
  'BOOK': 'The Poetry of Architecture by John Ruskin',
  'KEYWORD': 'ruin build hous rais shop hovel abod inact brutish poverti protect age ruin yet stalwart arch amphitheat habit lower order frequent present trace ornament stabil materi evid belong remain edific'},
 {'TEXT': '—PRINCIPLES OF COMPOSITION. 162. It has lately become a custom, among the more enlightened and refined of metropolitan shopkeepers, to advocate the cause of propriety in architectural decoration, by ensconcing their shelves, counters, and clerks in classical edifices, agreeably ornamented with ingenious device

In [10]:
tfidf_results, vectorizer, tfidf_matrix = tfidf_search(query)
print("TF-IDF Search:", tfidf_results)

TEXT: Early on, we did everything we could to avoid exhibiting work in a white cube and everything it stood for. We showed in shop windows, homes, shopping centers, cafes, gardens. But always, the work became about the space itself, or the context, rather than the ideas we wished to explore.
LINE: 623
BOOK: Speculative Everything by Anthony Dunne
KEYWORD: earli everyth could avoid work white cube everyth stood shop window home shop center garden alway work space context rather idea wish explor

TEXT:  I remember another shebeen — a rum shop, it’s called there — in a village in Dominica. One night some friends and I drive through country darkness, stopped where the car could go no further, and climbed a hill, bumping into tree stumps, arriving at a rum shop.
LINE: 443
BOOK: A Map to the Door of No Return - Dionne Brand
KEYWORD: rememb anoth shebeen rum shop villag one night friend drive countri dark stop car could hill bump tree stump rum shop

TEXT:  This trend is also reflected in “fr

In [11]:
svd_results, top_terms = svd_search(query)

TEXT:  They had to purchase their supplies: shops and warehouses were constructed. Streets were pushed into the dunes behind the old village; the built-up area spilled south toward the mission and north toward Washerwoman’s Lagoon.
LINE: 1220
BOOK: The Age of Gold - H. W. Brands
KEYWORD: purchas suppli shop warehous street dune behind old villag built area south toward mission north toward washerwoman lagoon

TEXT:  Factories and workshops were sited in the middle of residential areas, emitting smoke and deadly effluents. Children played in courtyards awash with raw sewage. Cholera and tuberculosis were a constant threat.
LINE: 787
BOOK: The Architecture of Happiness by Alain de Botton
KEYWORD: factori workshop middl residenti area smoke deadli effluent child courtyard awash raw sewag cholera tuberculosi constant threat

TEXT:  For example, an extended scene in the 1968 television special Dialoog in Het Dorp showed an attractive young woman carefully selecting, testing, and purchasing 