# Seeker Notebook Demo

The goal of this notebook is to exercise the main functionality of the SEEKER architecture. As the diagrams show, the program has two main processes: loading multiple datasets and exploring them through different types of search.

In [24]:
pip install -i https://test.pypi.org/simple/ seeker-cornell==1.0.1

Looking in indexes: https://test.pypi.org/simple/
Note: you may need to restart the kernel to use updated packages.
Collecting seeker-cornell==1.0.1
  Obtaining dependency information for seeker-cornell==1.0.1 from https://test-files.pythonhosted.org/packages/42/b9/e33d98357984c017ca1ed42274ea1242602b8a4af468c9c3850da028a240/seeker_cornell-1.0.1-py3-none-any.whl.metadata
  Downloading https://test-files.pythonhosted.org/packages/42/b9/e33d98357984c017ca1ed42274ea1242602b8a4af468c9c3850da028a240/seeker_cornell-1.0.1-py3-none-any.whl.metadata (4.0 kB)
Downloading https://test-files.pythonhosted.org/packages/42/b9/e33d98357984c017ca1ed42274ea1242602b8a4af468c9c3850da028a240/seeker_cornell-1.0.1-py3-none-any.whl (10 kB)
Installing collected packages: seeker-cornell
  Attempting uninstall: seeker-cornell
    Found existing installation: seeker-cornell 1.0.0
    Uninstalling seeker-cornell-1.0.0:
      Successfully uninstalled seeker-cornell-1.0.0
Successfully installed seeker-cornell-1.0.1


In [25]:
pip show seeker-cornell

Name: seeker-cornell
Version: 1.0.1
Summary: Search Engine for Efficient Knowledge Extraction and Retrieval
Home-page: https://github.com/CornellDB/SEEKER/
Author: Santiago Martínez Novoa
Author-email: sm2936@cornell.edu
License: 
Location: c:\Users\user\Cornell\SEEKER\venv\Lib\site-packages
Requires: bson, pandas
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [26]:
!pip show -f seeker-cornell


Name: seeker-cornell
Version: 1.0.1
Summary: Search Engine for Efficient Knowledge Extraction and Retrieval
Home-page: https://github.com/CornellDB/SEEKER/
Author: Santiago Martínez Novoa
Author-email: sm2936@cornell.edu
License: 
Location: C:\Users\user\Cornell\SEEKER\venv\Lib\site-packages
Requires: bson, pandas
Required-by: 
Files:
  data_visualization\__init__.py
  data_visualization\__pycache__\__init__.cpython-311.pyc
  data_visualization\__pycache__\search_visualizer.cpython-311.pyc
  data_visualization\search_visualizer.py
  databases\__init__.py
  databases\__pycache__\__init__.cpython-311.pyc
  dataset_relevance_algorithm\__init__.py
  dataset_relevance_algorithm\__pycache__\__init__.cpython-311.pyc
  index_creation\__init__.py
  index_creation\__pycache__\__init__.cpython-311.pyc
  index_creation\__pycache__\dataset_model.cpython-311.pyc
  index_creation\dataset_model.py
  integration\__init__.py
  integration\__pycache__\__init__.cpython-311.pyc
  interpreter\__init__.py
  

# Data Import

In [1]:
from seeker.src.metadata_dataset_separation.data_import import DataLoader

loader = DataLoader()

# Upload a single dataset and its metadata
folder_path = 'dataset_examples'
dataset_name = 'data'
dataset_model = loader.upload(folder_path, dataset_name, include_metadata=True)
print("Single Dataset Model:", dataset_model)

# Upload multiple datasets and their metadata from a folder
dataset_models = loader.upload_multiple(folder_path, include_metadata=True)
print("All Dataset Models:", dataset_models)



Single Dataset Model: DatasetModel(id=3cf45dad-c435-4966-bc9d-43775931a990, name=data)
All Dataset Models: {'City_MedianRentalPrice_3Bedroom': DatasetModel(id=7b4afd65-a326-44f1-aa7f-ce563910266b, name=City_MedianRentalPrice_3Bedroom), 'data': DatasetModel(id=be9eeba5-8349-4b5c-bb39-28bfbb8056b6, name=data), 'house_prices': DatasetModel(id=3d6f8191-1ebc-4764-8a87-8fd968fe8ab8, name=house_prices)}


# Data Search

## Search in Metadata

### Successful Query

In [2]:
from seeker.src.interpreter.interpreter import SEEKER

# Search string given by the user
input_string = "semantic:housing,semantic:rent,cause_and_consequences:erosion"

# True searches the metadata; False searches the dataset content
search_in_metadata = True

# Call the interpreter
interpreter = SEEKER(input_string, dataset_models, search_in_metadata)
interpreter.process()


Operations: ['semantic:housing', 'semantic:rent', 'cause_and_consequences:erosion']
------------ Search Results ------------
Dataset: City_MedianRentalPrice_3Bedroom for search query: 'housing'
Score: 1
----------------------------------------
Dataset: house_prices for search query: 'housing'
Score: 1
----------------------------------------
Dataset: data for search query: 'housing'
Score: 0
----------------------------------------
------------ Search Results ------------
Dataset: City_MedianRentalPrice_3Bedroom for search query: 'housing'
Score: 1
----------------------------------------
Dataset: house_prices for search query: 'housing'
Score: 1
----------------------------------------
Dataset: data for search query: 'housing'
Score: 0
----------------------------------------
No results found for search query: 'rent'
Performing cause and consequences search for: 'erosion' in dataset of size: 3
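
The `Operations:` line in the output above suggests that the interpreter splits the input string on commas, with each operation written as `search_type:term`. The following is a minimal sketch of that parsing step, not SEEKER's actual implementation; `parse_query` is a hypothetical helper introduced here for illustration.

```python
from typing import List, Tuple


def parse_query(input_string: str) -> List[Tuple[str, str]]:
    """Split a comma-separated query into (search_type, term) pairs.

    Hypothetical helper for illustration; not part of the seeker package.
    """
    pairs = []
    for operation in input_string.split(","):
        operation = operation.strip()
        if not operation:
            continue  # ignore empty segments such as a trailing comma
        # Split on the first colon only, so the term may itself contain colons.
        search_type, sep, term = operation.partition(":")
        if not sep:
            raise ValueError(f"Missing ':' in operation {operation!r}")
        pairs.append((search_type.strip(), term.strip()))
    return pairs


print(parse_query("semantic:housing,semantic:rent,cause_and_consequences:erosion"))
# [('semantic', 'housing'), ('semantic', 'rent'), ('cause_and_consequences', 'erosion')]
```

Because only commas and the first colon act as separators in this sketch, a multi-word term such as `climate change` stays intact.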


### Unsuccessful Query

In [3]:
# Search string given by the user
input_string = "semantic:palm,semantic:leasing,cause_and_consequences:population"

# True searches the metadata; False searches the dataset content
search_in_metadata = True

# Call the interpreter
interpreter = SEEKER(input_string, dataset_models, search_in_metadata)
interpreter.process()

Operations: ['semantic:palm', 'semantic:leasing', 'cause_and_consequences:population']
No results found for search query: 'palm'
No results found for search query: 'leasing'
Performing cause and consequences search for: 'population' in dataset of size: 3
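
Comparing the two metadata runs, the output is consistent with a simple reporting rule: each dataset gets a score of 1 if the term matches its metadata and 0 otherwise, every dataset is listed when at least one matches, and a single `No results found` line is printed when none do. The sketch below reproduces that reporting pattern under those assumptions. The `metadata_scores` and `report` helpers, the whole-word matching rule (a stand-in for whatever semantic matching SEEKER performs), and the sample metadata strings are all invented for illustration and are not taken from the seeker package.

```python
import re
from typing import Dict, List, Tuple


def metadata_scores(metadata: Dict[str, str], term: str) -> List[Tuple[str, int]]:
    """Score each dataset 1 if the term is a whole word in its metadata, else 0."""
    needle = term.lower()
    scored = [
        (name, int(needle in re.findall(r"[a-z0-9]+", text.lower())))
        for name, text in metadata.items()
    ]
    # Highest score first; Python's sort is stable, so ties keep insertion order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def report(metadata: Dict[str, str], term: str) -> str:
    """Format scores like the notebook output, or a 'no results' line."""
    scored = metadata_scores(metadata, term)
    if all(score == 0 for _, score in scored):
        return f"No results found for search query: '{term}'"
    lines = ["------------ Search Results ------------"]
    for name, score in scored:
        lines.append(f"Dataset: {name} for search query: '{term}'")
        lines.append(f"Score: {score}")
        lines.append("-" * 40)
    return "\n".join(lines)


# Made-up metadata strings, for illustration only.
sample_metadata = {
    "data": "home sales records",
    "house_prices": "housing prices by city",
}
print(report(sample_metadata, "housing"))
print(report(sample_metadata, "palm"))
```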


## Search in Content

### Successful Query

In [4]:
from seeker.src.interpreter.interpreter import SEEKER

# Search string given by the user
input_string = "semantic:housing,semantic:palm,cause_and_consequences:erosion"

# True searches the metadata; False searches the dataset content
search_in_metadata = False

# Call the interpreter
interpreter = SEEKER(input_string, dataset_models, search_in_metadata)
interpreter.process()


Operations: ['semantic:housing', 'semantic:palm', 'cause_and_consequences:erosion']
------------ Search Results ------------
Dataset: house_prices for search query: 'housing'
Score: 1373
Top Words:
  flat: 731123
  for: 648323
  ready: 560441
  this: 542459
  the: 525056
  sale: 425049
  move: 362705
  bhk: 359788
  resale: 269443
  lac: 220793
----------------------------------------
Dataset: City_MedianRentalPrice_3Bedroom for search query: 'housing'
Score: 0
Top Words:
  county: 968
  beach: 184
  san: 129
  fort: 119
  city: 97
  miami: 89
  los: 82
  angeles: 82
  palm: 82
  west: 81
----------------------------------------
Dataset: data for search query: 'housing'
Score: 0
Top Words:
  usa: 4600
  ave: 1940
  seattle: 1574
  way: 303
  renton: 297
  bellevue: 286
  redmond: 239
  issaquah: 190
  sammamish: 188
  kirkland: 187
----------------------------------------
------------ Search Results ------------
Dataset: house_prices for search query: 'palm'
Score: 675
Top Words:
  fla
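
Each content-search result above lists a `Score` and ten `Top Words` with counts, which reads like a term-frequency summary of the dataset's text cells. How SEEKER computes these values is not shown here; the sketch below is one plausible approach using `collections.Counter`. The helper names `top_words` and `term_score`, the tokenization rule, and the sample cells are assumptions made for illustration. Because `for`, `this`, and `the` appear in the lists above, the sketch applies no stop-word filtering either.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(texts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent lowercase alphabetic tokens across texts."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters only; digits and punctuation are dropped.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)


def term_score(texts: Iterable[str], term: str) -> int:
    """Count cells that contain the term as a substring (case-insensitive)."""
    needle = term.lower()
    return sum(1 for text in texts if needle in text.lower())


# Made-up text cells, for illustration only.
cells = [
    "Ready to move flat for sale",
    "Flat for resale near housing board",
    "3 bhk flat ready to move",
]
print(top_words(cells, n=3))
print(term_score(cells, "housing"))
```

In this sketch the score is a per-cell match count, so it is independent of the word-frequency table; a dataset can score 0 for a term and still report a full `Top Words` list, as in the output above.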

### Unsuccessful Query

In [5]:
# Search string given by the user
input_string = "semantic:information,semantic:relevant,cause_and_consequences:climate change"

# True searches the metadata; False searches the dataset content
search_in_metadata = False

# Call the interpreter
interpreter = SEEKER(input_string, dataset_models, search_in_metadata)
interpreter.process()

Operations: ['semantic:information', 'semantic:relevant', 'cause_and_consequences:climate change']
------------ Search Results ------------
Dataset: house_prices for search query: 'information'
Score: 50
Top Words:
  flat: 731123
  for: 648323
  ready: 560441
  this: 542459
  the: 525056
  sale: 425049
  move: 362705
  bhk: 359788
  resale: 269443
  lac: 220793
----------------------------------------
Dataset: City_MedianRentalPrice_3Bedroom for search query: 'information'
Score: 0
Top Words:
  county: 968
  beach: 184
  san: 129
  fort: 119
  city: 97
  miami: 89
  los: 82
  angeles: 82
  palm: 82
  west: 81
----------------------------------------
Dataset: data for search query: 'information'
Score: 0
Top Words:
  usa: 4600
  ave: 1940
  seattle: 1574
  way: 303
  renton: 297
  bellevue: 286
  redmond: 239
  issaquah: 190
  sammamish: 188
  kirkland: 187
----------------------------------------
------------ Search Results ------------
Dataset: house_prices for search query: 'relevant