
Releases: AI-team-UoA/pyJedAI

0.1.7

24 Apr 07:59

⚒️ Fixed

  • Issues #19, #20, #21
  • Removed FALCONN and SCANN
  • Refined dependencies
  • Removed Optuna injection
  • Fixed typos
  • Reports

➕ Added

  • New utilities added to the docs

⚠️ Issues

  • None

Full Changelog: 0.1.6...0.1.7


Authored by @Nikoletos-K

0.1.6

15 Mar 10:47
b02dbab

⚒️ Fixed

  • Issue #16
  • Typos in clustering.py
  • Ground-truth (gt) initialization in the datamodel
  • Imports in utils
  • Bugs in NN-workflow
  • Bugs and evaluation of simple Schema Clustering

➕ Added

  • Dataframe memory consumption
  • New Schema Clustering method for RDF data (alpha version; implementation not final)

⚠️ Issues

  • SCANN and FALCONN produce warnings

Full Changelog: 0.1.5...0.1.6


Authored by @Nikoletos-K

0.1.5

15 Jan 10:43

⚒️ Fixed

➕ Added

  • First working version of Schema Clustering [ @Nikoletos-K ]
  • vector_based_blocking component: full SCANN/FAISS functionality (Linux only) [ @JacobMaciejewski ]
  • RowColumnClustering: new clustering algorithm [ @JacobMaciejewski ]

⚠️ Issues

0.1.4

09 Dec 17:31

⚒️ Fixed

  • Correlation Clustering method.
  • nltk.download('stopwords') now runs only when needed.
  • Schema Matching component to align with the latest version of Valentine.

➕ Added

  • datamodel.py: SchemaData for Schema Matching Component
  • ‼️ New component: pyJedAI Spatial, for interlinking geospatial RDF data. [ @IordanisT ]
  • SCANN functionality, available on Linux only. [ @JacobMaciejewski ]

⚠️ Issues

  • None

0.1.3

22 Nov 14:28

⚒️ Fixed

  • None

➕ Added

  • Clustering algorithms: [ Author: @JacobMaciejewski 📌 ]

    • EquivalenceCluster
    • ExtendedSimilarityEdge
    • Vertex
    • RicochetCluster
    • ExactClustering
    • CenterClustering
    • BestMatchClustering
    • MergeCenterClustering
    • CorrelationClustering
    • CutClustering
    • MarkovClustering
    • KiralyMSMApproximateClustering
    • RicochetSRClustering
  • Blocking:

    • Statistics
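MarkovClustering is one of the newly listed algorithms. For readers unfamiliar with it, here is a minimal, generic sketch of Markov Clustering (MCL) over a similarity graph in plain Python. It is not pyJedAI's implementation: the function `mcl` and its parameters (`expansion`, `inflation`, `prune`, `tol`) are illustrative assumptions. MCL alternates expansion (matrix powers, which spread random-walk flow) with inflation (element-wise powers, which strengthen strong flows) until the matrix stabilizes.

```python
from typing import FrozenSet, List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                out[i][j] = m[i][j] / col_sum
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9,
        prune: float = 1e-6) -> Set[FrozenSet[int]]:
    """Hypothetical generic MCL over a symmetric similarity matrix."""
    n = len(adjacency)
    if n == 0:
        return set()
    # Adding self-loops is a standard MCL preprocessing step.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: raise the matrix to the power `expansion`.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: element-wise power, then re-normalize the columns.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        # Prune near-zero entries to keep the matrix sparse, then re-normalize.
        new_m = normalize_columns([[x if x > prune else 0.0 for x in row]
                                   for row in inflated])
        delta = max(abs(new_m[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new_m
        if delta < tol:
            break
    # Rows with non-zero mass are attractors; their non-zero columns form a cluster.
    # Collecting into a set removes duplicate clusters reported by several attractors.
    return {frozenset(j for j in range(n) if m[i][j] > 0)
            for i in range(n) if any(x > 0 for x in m[i])}


# Usage: two disconnected triangles, {0, 1, 2} and {3, 4, 5}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
clusters = mcl(adj)
```

The `inflation` parameter controls granularity: higher values generally yield more, smaller clusters.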

⚠️ Issues

  • None

0.1.2

10 Oct 14:51

⚒️ Fixed

  • Export methods, including the case where no ground truth is provided
  • Vectorization time, reduced by saving and reusing the distance matrix
  • Bug in PER indexing for Dirty ER
  • Speed/Memory optimizations in NN Blocking & Join PER

➕ Added

  • 'sqeuclidean' metric in matching step
  • Valentine as a Schema Matching plugin
  • Frequency Evaluator compatible with base ER matching
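Squared Euclidean distance ('sqeuclidean', the same metric name used by SciPy's `scipy.spatial.distance`) is the sum of squared coordinate differences. Below is a minimal plain-Python sketch, assuming dense vectors of equal length. The helper names are illustrative, not pyJedAI functions. It shows that squared Euclidean distance ranks candidates in the same order as Euclidean distance while skipping the square root.

```python
import math
from typing import Sequence


def sqeuclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared coordinate differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary Euclidean distance, i.e. the square root of sqeuclidean."""
    return math.sqrt(sqeuclidean(u, v))


# Hypothetical entity vectors ranked against a query vector.
query = [0.0, 0.0]
candidates = {"e1": [3.0, 4.0], "e2": [1.0, 1.0], "e3": [6.0, 8.0]}
by_sq = sorted(candidates, key=lambda k: sqeuclidean(query, candidates[k]))
by_eu = sorted(candidates, key=lambda k: euclidean(query, candidates[k]))
```

Because squaring is monotonic on non-negative values, the two rankings agree. Similarity thresholds do not carry over directly, however: a Euclidean cutoff `t` corresponds to `t ** 2` in squared units.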

⚠️ Issues

  • Vectorizers (tf-idf, etc.) do not support Dirty ER; this will be fixed in the next release.

0.1.1

08 Sep 12:48
39f6a43

⚒️ Fixed

  • Removed deprecated whoosh imports from prioritization file

➕ Added

  • None

⚠️ Issues

  • None

0.1.0

25 Aug 12:05
7b46d21

⚒️ Fixed

  • Restructured Matching module: vectorizer, tokenizer, and qgrams are now passed as arguments (no longer inferred)
  • Clustering step randomization bug

➕ Added

  • PER notebook tutorials
  • PER grid-search pipeline (config files, search scripts, storage)
  • PER workflows visualization and comparison through:
    • feature configuration budget-centric metric progress plots
    • feature configuration dataset-centric sorting and comparison
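Progressive ER (PER) workflows are commonly compared by how quickly recall grows as the comparison budget is spent. Below is a minimal sketch of such a budget-centric recall curve. It assumes the PER method has already ranked the candidate pairs and that pairs are unordered. `recall_progress` and `average_recall` are hypothetical helpers, not part of pyJedAI's API, and the plots added in this release may use different metrics.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def recall_progress(ranked_pairs: Iterable[Pair],
                    ground_truth: Iterable[Pair]) -> List[float]:
    """Recall after each emitted pair, i.e. after each unit of budget."""
    # frozenset makes (a, b) and (b, a) count as the same unordered pair.
    gt = {frozenset(p) for p in ground_truth}
    if not gt:
        raise ValueError("ground truth must not be empty")
    found: Set[frozenset] = set()
    curve: List[float] = []
    for pair in ranked_pairs:
        key = frozenset(pair)
        if key in gt:
            found.add(key)
        curve.append(len(found) / len(gt))
    return curve


def average_recall(curve: List[float]) -> float:
    """One simple summary of a progress curve: mean recall over the budget."""
    return sum(curve) / len(curve) if curve else 0.0


# Usage: three ranked candidate pairs, two of which are true matches.
curve = recall_progress([(1, 2), (3, 4), (6, 5)], [(1, 2), (5, 6)])
```

A method that places true matches earlier in its ranking yields a curve that rises sooner, and therefore a higher average recall for the same budget.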

⚠️ Issues

  • None

0.0.9

20 Jul 14:10
4051eeb

⚒️ Fixed

  • FAISS euclidean distance
  • Workflow methods
  • Removed whoosh
  • Removed SCANN

➕ Added

  • Three new workflow methods
  • Export pairs in each step
  • Tfidf weights in matching options
  • Website:
    • code API
    • new tutorials

⚠️ Issues

  • None

0.0.8

05 Jul 18:56

⚒️ Fixed

  • Word-gram tokenization
  • Code architecture in entity matching
  • py_stringmatching dependencies
  • PyPI README
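Word-gram tokenization splits text into overlapping sequences of whole words, whereas character q-grams slide over individual characters. Below is a minimal sketch contrasting the two. It assumes whitespace splitting, no padding, and no normalization. pyJedAI's tokenizers may lowercase, pad, or clean text differently, and `word_ngrams` and `char_qgrams` are hypothetical names.

```python
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Overlapping sequences of n consecutive whitespace-separated words."""
    if n < 1:
        raise ValueError("n must be >= 1")
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text: str, q: int) -> List[str]:
    """Overlapping substrings of q consecutive characters (no padding)."""
    if q < 1:
        raise ValueError("q must be >= 1")
    return [text[i:i + q] for i in range(len(text) - q + 1)]


bigrams = word_ngrams("new york city", 2)
trigrams = char_qgrams("york", 3)
```

Word grams preserve word order and are robust to character-level noise inside unchanged words, while q-grams tolerate typos but produce many more tokens.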

➕ Added

  • Boolean/Tfidf/Tf weights
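Below is a minimal sketch of the three token-weighting schemes, using one common set of definitions: boolean presence, term frequency normalized by document length, and TF multiplied by IDF = log(N / df). This is an assumption for illustration only. pyJedAI's matching module may use different normalization or IDF smoothing, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def boolean_weights(doc: List[str]) -> Dict[str, float]:
    """1.0 for every token present in the document."""
    return {t: 1.0 for t in set(doc)}


def tf_weights(doc: List[str]) -> Dict[str, float]:
    """Token count divided by document length."""
    counts = Counter(doc)
    total = len(doc)
    return {t: c / total for t, c in counts.items()}


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF times log(N / df); `doc` is assumed to be one of the corpus documents."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each token.
    df = Counter(t for d in corpus for t in set(d))
    return {t: w * math.log(n_docs / df[t]) for t, w in tf_weights(doc).items()}


corpus = [["a", "b", "a"], ["b", "c"]]
bool_w = boolean_weights(corpus[0])
tf_w = tf_weights(corpus[0])
tfidf_w = tfidf_weights(corpus[0], corpus)
```

Under these definitions a token that occurs in every document (here `"b"`) receives a tf-idf weight of zero, because it does not help distinguish one entity from another.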