use sqlite fts5 for faster and more natural feeling part selection #348

Merged (13 commits) on Apr 18, 2024

Conversation

chmorgan
Collaborator

Use fts5 for full text searching in the partselector screen. This speeds up part search queries in almost all cases and makes it easier to find parts without needing to mess with so many drop-downs.

@chmorgan chmorgan changed the title from "Cmm fts5" to "use sqlite fts5 for faster and more natural feeling part selection" Jul 24, 2023
@chmorgan chmorgan force-pushed the cmm-fts5 branch 2 times, most recently from 6f5656d to 6492047 Compare July 24, 2023 13:38
@Bouni
Owner

Bouni commented Jul 24, 2023

I get a sqlite3.OperationalError: no such module: fts5 on my Windows 11 machine. Is fts5 something that must be installed additionally? If so, I don't think I want to merge this one, as installing dependencies in Kicad, especially on Windows, is a pain.

@Bouni
Owner

Bouni commented Jul 24, 2023

According to the sqlite website:

"As of version 3.9.0 (2015-10-14), FTS5 is included as part of the SQLite amalgamation."

On my windows machine on KiCAD 7.99 print(sqlite3.version) gives me "2.6.0"

So I fear KiCAD ships a very old version of sqlite which does not include fts5. I think on Linux and MacOS Kicad uses the system Python, which ships with a newer version of sqlite.

@chmorgan
Collaborator Author

@Bouni let me chat with the other Kicad devs. I think you’d be surprised at how quick the searches are once you can try it out. I’ll post back here when I know more.

@craftyjon

> According to the sqlite website:
>
> "As of version 3.9.0 (2015-10-14), FTS5 is included as part of the SQLite amalgamation."
>
> On my windows machine on KiCAD 7.99 print(sqlite3.version) gives me "2.6.0"
>
> So I fear KiCAD ships a very old version of sqlite which does not include fts5. I think on Linux and MacOS Kicad uses the system Python, which ships with a newer version of sqlite.

From https://docs.python.org/3/library/sqlite3.html#sqlite3.version :

Version number of this module as a string. This is not the version of the SQLite library.

You want sqlite3.sqlite_version.
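
For reference, a minimal sketch of checking both things from Python: the version of the SQLite library the interpreter is linked against, and whether that build has FTS5 compiled in. The helper name `has_fts5` is made up for illustration, and probing with a throwaway virtual table in an in-memory database is just one common approach, not necessarily what the plugin does.

```python
import sqlite3


def has_fts5() -> bool:
    """Return True if the linked SQLite library can create FTS5 tables.

    Hypothetical helper: it probes by creating a throwaway virtual table
    in an in-memory database and treats OperationalError as "no FTS5".
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE VIRTUAL TABLE probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        # e.g. "no such module: fts5" on builds without the extension
        return False
    finally:
        con.close()


# Version of the SQLite library itself (not of the Python module).
print("SQLite library version:", sqlite3.sqlite_version)
print("FTS5 available:", has_fts5())
```

A version number alone is not a guarantee, since FTS5 can be left out at build time, so the probe is the more direct check.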

@chmorgan
Collaborator Author

It sounds like there may be some interest in adding fts5 sqlite to the windows build. BUT that would still mean that anyone using an older kicad on windows wouldn't be able to use the fts5 features.

@Bouni before I go through the effort of using it if detected, would you be able to give it a whirl on a system that has fts5? I'd like to confirm that you see the searches are super quick and it seems like a helpful improvement. If we can cross that bridge I'll:

  • Submit a PR to kicad to enable the fts5 feature for Windows builds
  • Modify the db generation system to generate two DBs, one with fts5 support and one without.
  • Modify the plugin to download the appropriate DB based on the database file found (maybe parts.db vs. parts_fts5.db)
  • Modify the plugin to use the original search code if fts5 support isn't present.
  • Add a button to indicate whether fts5 is supported; clicking it will let you know that you can get better performance if you upgrade to kicad XXX on windows (we'll know more after the PR discussion)

Then we can consider when it might make sense to drop the non-fts5 support in a few years when the older versions start to cycle out.
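
The fallback idea in the list above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names are hypothetical, the file names are the tentative ones from the list, and the table and column names (`parts`, `LCSC Part`, `Description`) are guesses at the schema. The MATCH branch also assumes that `parts` is an FTS5 table in the fts5 database.

```python
def database_name(fts5: bool) -> str:
    """Pick which database file to download (names as floated in the list above)."""
    return "parts_fts5.db" if fts5 else "parts.db"


def build_query(keyword: str, fts5: bool) -> tuple:
    """Return (sql, params) for a keyword search.

    With FTS5 the keyword goes to a full-text MATCH; without it we fall
    back to a LIKE substring scan, as in the original search code.
    """
    if fts5:
        sql = 'SELECT "LCSC Part" FROM parts WHERE parts MATCH ?'
        return sql, (keyword,)
    sql = 'SELECT "LCSC Part" FROM parts WHERE "Description" LIKE ?'
    return sql, (f"%{keyword}%",)


print(database_name(True), build_query("DS2411", True))
print(database_name(False), build_query("DS2411", False))
```

Keeping the choice in one place means the rest of the plugin only needs a single boolean from the feature probe.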

Thoughts?

@Bouni
Owner

Bouni commented Jul 25, 2023

@chmorgan First of all: thank you for investing so much time in all this, highly appreciate it!

I will try to test the fts5 version on my linux machine today if possible. If the speed gain is as much as you say, which I believe even without having tested it, you can definitely go on and implement the support for it.
The most important thing for me is that we don't break compatibility with non-fts5 systems, but that should not be too hard to achieve.
Dropping non-fts5 support will take some time: I have many users who are still using KiCAD 6 because they say it's more stable than V7, and I'm not sure whether the KiCAD devs will backport fts5 support to V7 or only add it for V7.99/V8.
If they support both V7 and V8, I think we can drop non-fts5 support rather quickly, because I'm thinking about no longer officially supporting V6 anyway.

@craftyjon Thanks for pointing out how to get the right sqlite version info and for all the dev work you do on KiCAD itself 🤩

@Bouni Bouni added the enhancement New feature or request label Jul 25, 2023
@chmorgan
Collaborator Author

chmorgan commented Aug 3, 2023

@Bouni I take it you are running on windows? How are you able to see and debug the plugin while kicad is running? I've got a local kicad build running under visual studio that loads sqlite3 etc., but with your GitHub plugin library and the latest kicad-jlcpcb-tools version the icon isn't showing up, and I'm not sure how to debug it further.

@chmorgan
Collaborator Author

chmorgan commented Aug 7, 2023

And the upstream add for windows was merged, https://gitlab.com/kicad/code/kicad/-/merge_requests/1671

@Bouni
Owner

Bouni commented Aug 8, 2023

Nice, so that should have already landed in the latest nightly or will in the next one.
That way I can test on my windows machine. Had no time to test on my linux machine so far 🫤

@Bouni
Owner

Bouni commented Aug 10, 2023

@chmorgan I finally managed to test this at least on Windows. It works but is still not ideal in some areas:

  1. When I search for DS2411 I get no results; only if I search for the exact name DS2411P+ do I get what I'm looking for. Before, I used LIKE, which gave me a result even when I searched for just a part of the name.
  2. When I clear the search, everything freezes for a few seconds. I'm not sure why, but it feels like a huge amount of data is being searched in the background or something like that.
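
The first point is consistent with how FTS5's default tokenizer behaves: it matches whole tokens, so DS2411 is a different token from DS2411P unless the query is written as a prefix query. A small sketch with made-up sample rows and a simplified single-column table (not the plugin's real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-in for the parts table; the default tokenizer is unicode61.
con.execute("CREATE VIRTUAL TABLE parts USING fts5(mfr_part)")
con.executemany(
    "INSERT INTO parts VALUES (?)",
    [("DS2411P+",), ("DS2411R+T&R",), ("ATMEGA328P-AU",)],
)


def search(expr: str) -> list:
    """Run an FTS5 MATCH query and return the matching part names."""
    rows = con.execute(
        "SELECT mfr_part FROM parts WHERE parts MATCH ? ORDER BY rowid", (expr,)
    )
    return [row[0] for row in rows]


# Whole-token match: "ds2411" is not a token in any row, so nothing is found.
print(search("DS2411"))
# Prefix query: the trailing * matches every token starting with "ds2411".
print(search("DS2411*"))
```

A prefix query only helps when the user types the start of a token; it does not cover substrings in the middle or at the end of a part number.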

@chmorgan
Collaborator Author

chmorgan commented Aug 10, 2023 via email

@Bouni
Owner

Bouni commented Aug 10, 2023

Yes, I use your branch and downloaded the database which gives me your db automatically because of the hard coded URL in your branch.

@chmorgan chmorgan force-pushed the cmm-fts5 branch 2 times, most recently from 1c7f82c to 075eada Compare August 11, 2023 00:47
@chmorgan
Collaborator Author

@Bouni alright, this should improve search behavior a bit. Can you give it another run?

The empty search is an interesting case. The time it takes to search is due to there being so many matching entries and I think this is a regression from the mainline code in terms of performance.

If we can settle on the changes I can put in the code that will download the appropriate database based on sqlite3 support and run the appropriate search routines. I was also thinking of putting a UI comment in a text bar or something advising if the user does an upgrade they can get improved free text search.

@Bouni
Owner

Bouni commented Aug 11, 2023

@chmorgan The search works like mainline again! DS2411 returns 3 results as expected.

Clearing the keyword still freezes the window for some seconds

[screenshot]

Do we really need to return anything if the keyword is empty? Or if none of the search fields (manufacturer, package etc.) has a value?

The other fields (manufacturer, part number, etc.) should also use a wildcard: if you search for Part Number DS2411 you get no result.

Searching for keyword 328P does not return ATMEGA328P as a result because you only append a wildcard; I think we should prepend a wildcard as well.

And last but not least, the subcategory dropdown seems to have no effect.

@chmorgan
Collaborator Author

@Bouni yeah, maybe we skip the search unless the keyword or the part number is populated. Let me add some logic there for that and look at the other two search cases.
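
A minimal sketch of that guard, assuming the two inputs arrive as plain strings from the UI (the function name `should_search` is hypothetical):

```python
def should_search(keyword: str, part_number: str) -> bool:
    """Only run a query when at least one text field has real content.

    Whitespace-only input counts as empty, so clearing the keyword box
    does not trigger a query that matches the entire database.
    """
    return bool(keyword.strip()) or bool(part_number.strip())


print(should_search("", ""))          # nothing typed: skip the query
print(should_search("DS2411", ""))    # keyword present: run it
```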

Note that before submitting here I had used the fts5 search to populate a few dozen parts for a new board design and for my particular use case it was much quicker than mainline rev, but I didn't test the cases you are looking at so it helps to get your feedback.

I'll try to push something tonight or tomorrow.

@Bouni
Owner

Bouni commented Aug 11, 2023

Take your time, I won't have time to try before Monday.

I think the biggest win over JLCs website is the ability to literally find a part.

I personally find it very hard to find a part on their website.

But having results faster is obviously a win 😁

@chmorgan
Collaborator Author

@Bouni alright, this should be ready for another test run if you'd entertain one. You'll need to make sure you pull an updated parts.db from my GitHub.

@serpent213
Contributor

I invite you to have a look at the following:

A good way might be to follow the blog article closely. I would copy enable_fts() with minimal changes, adding the tokenizer as a parameter. Additionally, you might want to create the unique index on parts as well.

The fts index table would look roughly like this in the end (but enable_fts() would create it for you):

-- Create full text search index for selected columns.
CREATE VIRTUAL TABLE IF NOT EXISTS parts_fts USING fts5 (
    'LCSC Part',
    'Description',
    'MFR.Part',
    'Package',
    'Manufacturer',
    content='parts',
    tokenize='trigram'
)

…and will only hold the search index for specific columns and will be filled automatically by the trigger when populating parts. This should also save some disk space.

See query() for how to do the query then.
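
As a rough illustration of what the trigram tokenizer buys, here is a sketch with a standalone FTS5 table. It deliberately leaves out the `content='parts'` external-content setup and the triggers, uses simplified column names, and its helper names and sample rows are invented. The trigram tokenizer needs SQLite 3.34.0 or newer, and search terms must be at least three characters long.

```python
import sqlite3


def build_index(rows) -> sqlite3.Connection:
    """Create an in-memory FTS5 index using the trigram tokenizer."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE VIRTUAL TABLE parts_fts USING fts5("
        "mfr_part, description, tokenize='trigram')"
    )
    con.executemany(
        "INSERT INTO parts_fts (mfr_part, description) VALUES (?, ?)", rows
    )
    return con


def substring_search(con: sqlite3.Connection, term: str) -> list:
    """Substring match across all indexed columns, no wildcards needed."""
    # Quote the term so characters like + or - are not parsed as query syntax.
    quoted = '"' + term.replace('"', '""') + '"'
    rows = con.execute(
        "SELECT mfr_part FROM parts_fts WHERE parts_fts MATCH ? ORDER BY rowid",
        (quoted,),
    )
    return [row[0] for row in rows]


if sqlite3.sqlite_version_info >= (3, 34, 0):
    con = build_index(
        [("ATMEGA328P-AU", "8-bit AVR microcontroller"),
         ("DS2411P+", "silicon serial number")]
    )
    # "328P" sits in the middle of the part number and is still found.
    print(substring_search(con, "328P"))
else:
    print("trigram tokenizer requires SQLite 3.34.0 or newer")
```

With this tokenizer, a search for 328P finds ATMEGA328P without a leading wildcard, which FTS5 prefix queries cannot express.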

Let me know what you think or if you run into any issues. 🙂

@chmorgan
Collaborator Author

chmorgan commented Sep 2, 2023 via email

@chmorgan
Collaborator Author

chmorgan commented Apr 11, 2024 via email

@chmorgan
Collaborator Author

@Bouni lmk when you get a chance to read my last comment here, no rush.

@Bouni
Owner

Bouni commented Apr 17, 2024

@chmorgan sorry for the late reply

The source db is updated daily at 3am (https://github.com/yaqwsx/jlcparts/actions/runs/8715756791/workflow#L6)

As JLC does not provide a real (official) API, I'm not sure whether we should use the API to get the actual stock of every part in a project instead of relying on data that is at most one day old. If they decide to block us for too many requests or whatever, the plugin breaks. Not sure if that's really going to happen, but it still needs to be considered.

Showing the age of the database somewhere is a good idea anyway!

> My workflow involves using the website specifically because I haven’t trusted the stock numbers in the plugin to be the present ones.

What differences did you see? Are they always incorrect or just sometimes?

> Are you asking to modify the scripts used by ci to generate both databases for backwards compatibility? I have no objection, just want to make sure I understand before changing anything here.

Let's say someone uses the old plugin version (non-fts5).
For that user we should keep parts.zip on the github.io page available for download, as it is at the moment.
So the CI script should continue doing what it does currently for at least a few more months.

For users of the upcoming fts5 version we should extend the CI script (or add a separate one) to generate an fts5 version; let's say we call it parts-fts5.zip.
That allows the old plugin to continue to work and the new one to download the fts5 parts database and use that. Nothing breaks and the changes are minimal, in my opinion.

> How was the performance? The startup time was greatly improved with the categories table. I’m not a fan of a special table but it saved so much time it was hard not to include.

I did not really benchmark it but the fts5 feels quicker for searches and I really like the search as you type feature 🤩

> Did you see the db adjustments I made to save space? The rohs one? Removing duplicate text etc?

No, the current database generation script was created by @markusdd and I did not look into it very closely as it worked right away 😅 My original approach was to use JLC's CSV and convert that into a sqlite db, but they stopped providing it, which is when @markusdd jumped in and helped with the new version that uses @yaqwsx's database. I don't even know what the source data looks like, to be honest. I just remember that it is already sanitized to some extent, but I'm unsure to what degree.

> Feels like there may be other size wins if we trim or adjust data when we generate the db.

I'm open to improvements! Maybe we should take a look at the source data and make use of what @yaqwsx already does instead of bending the data into a form that looks like what JLC supplied in the past. I can imagine that we could get an even better plugin that way!

But that should be a separate PR in my opinion. Let's get fts5 ready so that I can release a new version soon.

@yaqwsx

yaqwsx commented Apr 17, 2024

JLCParts uses SQLite for the raw data cache. There is a table of components where we store what JLC PCB API provides and a raw JSON of attributes from LCSC (e.g., rated voltage, size of flash memory, etc). We mainly use this view: https://github.com/yaqwsx/jlcparts/blob/5f512fb46b27be19182fc2c0ebc989496f79a15e/jlcparts/partLib.py#L110-L130. This is what @markusdd uses as the source of data.

While the data from the API have a reasonable structure, the data from LCSC are quite the opposite. This is why we have https://github.com/yaqwsx/jlcparts/blob/master/jlcparts/attributes.py, which is messy code that tries to parse the LCSC mess and turn it into structured data (format string with values and units). Note that the sanitized data are not stored in the cache DB. The cache DB only stores the raw data.

@Bouni
Owner

Bouni commented Apr 17, 2024

@yaqwsx Wow that was fast 😄

Is there a reason why you don't cache the sanitized data?

@markusdd
Contributor

Just chiming in here: what I did was basically take the great work by @yaqwsx and turn it into a shape that can be understood by the existing plugin (not all data is carried over, as we do not need absolutely everything). What I added on top was the zip splitting and download logic, so the chunks are compatible with GitHub's size limits.
This way the generation could be automatic and we're always up to date daily.
So all I do is transform the tables into what the old CSV DB looked like as that meant no modification of the plugin logic itself was required.
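
The splitting idea can be sketched as follows. This is not the actual generation script: the helper names, the chunk size, and the zero-padded numbering are assumptions, while the `parts-fts5.db.zip.` stub is the one that appears later in this thread.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut a byte string into pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_names(stub: str, count: int) -> list:
    """Name the chunks stub001, stub002, ... (numbering format is assumed)."""
    return [f"{stub}{i:03d}" for i in range(1, count + 1)]


def join_chunks(chunks: list) -> bytes:
    """Reassemble the original archive on the download side."""
    return b"".join(chunks)


archive = bytes(range(256)) * 10          # stand-in for the zipped database
chunks = split_into_chunks(archive, 1000)
print(chunk_names("parts-fts5.db.zip.", len(chunks)))
print(join_chunks(chunks) == archive)
```

Publishing the chunk count in a small text file next to the chunks lets the plugin know how many pieces to fetch before it reassembles them.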

@yaqwsx

yaqwsx commented Apr 17, 2024

@Bouni: It's published as a list of JSON files. As @markusdd said, there's a lot of extra information you couldn't utilize in your plugin. Otherwise, feel free to use the processed data or even use the processing scripts to turn the database into any format you need.

The reason why we store raw data and sanitize them from scratch is to be able to deliver improvements in sanitation to all components, not only the newly fetched ones.

…-fts5.db, from chunk_num.txt to chunk_num_fts5.txt
…s selected by default. Allows the fts5 search to work better as a lack of category enables keywords to search the entire database of parts.
For text fields, dwell for 750ms of inactivity before initiating a search to avoid initiating
searches so frequently they interfere with the user interacting with the UI.
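
The dwell logic can be sketched independently of the UI toolkit. The class below is a hypothetical illustration, not the plugin's code: time is passed in explicitly so the behaviour is easy to follow, whereas a real UI would drive it from a timer and a monotonic clock.

```python
class Debouncer:
    """Fire once after `delay_s` seconds without new input."""

    def __init__(self, delay_s: float = 0.75):
        self.delay_s = delay_s
        self._last_input = None  # time of the most recent keystroke, if pending

    def on_input(self, now: float) -> None:
        """Record a keystroke; this restarts the quiet period."""
        self._last_input = now

    def should_fire(self, now: float) -> bool:
        """Return True exactly once when the quiet period has elapsed."""
        if self._last_input is None:
            return False
        if now - self._last_input >= self.delay_s:
            self._last_input = None  # consume the pending search
            return True
        return False


d = Debouncer()
d.on_input(0.0)
d.on_input(0.5)              # typing continues, so the timer restarts
print(d.should_fire(1.0))    # only 0.5 s of inactivity so far
print(d.should_fire(1.25))   # 0.75 s of inactivity: run the search
```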

Remove the search button.

Update help to note that searching is automatic.
…match

jlcparts_db_convert.py - Add trigram tokenizer

library.py - Drop wildcards as trigram tokenizer does them automatically
… 'Stock', and 'Price'

Reduces database size from 5.8G to 4.3G

Removing the index on 'LCSC Parts' saves a minimal amount of database size but prevents
consistent searching for LCSC Part numbers. Keep the index on 'LCSC Parts' so we can search
for part numbers with fts5 match.
…formance

Reduces the run time of some test queries by 20% in some cases and by 80% in others.
Reduces the time to retrieve categories from 25s to 0.01s
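
One way to get that kind of speed-up for the category list is to precompute the distinct category pairs into a small table when the database is generated, so the plugin reads a few hundred rows instead of scanning every part. A sketch under assumptions: the column names `First Category` and `Second Category` and the helper names are guesses, not taken from the commit.

```python
import sqlite3


def build_categories(con: sqlite3.Connection) -> None:
    """Materialize the distinct category pairs into their own table."""
    con.execute("DROP TABLE IF EXISTS categories")
    con.execute(
        'CREATE TABLE categories AS '
        'SELECT DISTINCT "First Category", "Second Category" FROM parts'
    )
    con.commit()


def list_categories(con: sqlite3.Connection) -> list:
    """Read the precomputed table instead of scanning the parts table."""
    return con.execute(
        'SELECT "First Category", "Second Category" '
        'FROM categories ORDER BY 1, 2'
    ).fetchall()


con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE parts ("LCSC Part" TEXT, "First Category" TEXT, "Second Category" TEXT)'
)
con.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("C1", "Resistors", "Chip Resistor"),
     ("C2", "Resistors", "Chip Resistor"),
     ("C3", "Capacitors", "MLCC")],
)
build_categories(con)
print(list_categories(con))
```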
* Baseline

jlcparts database (cache.sqlite3): 6.0 GB
parts-fts5.db: 4.7 GB
parts-fts5.db.zip: 679.0 MB
Elapsed time: 4 minutes and 57.33 seconds

* Remove duplicate package and category from description, convert double spaces
to single spaces and remove leading and trailing spaces from description.

jlcparts database (cache.sqlite3): 6.0 GB
parts-fts5.db: 4.0 GB
parts-fts5.db.zip: 559.3 MB
Elapsed time: 4 minutes and 9.77 seconds

Reduction of ~700MB (~15%) in database size.
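
The description cleanup described above could look roughly like this. It is a sketch, not the conversion script itself: the function name and the plain string replacement are assumptions, and a real implementation may need to be more careful, since a package string such as 0402 can also occur inside other words.

```python
import re


def clean_description(description: str, package: str, category: str) -> str:
    """Drop text already stored in other columns and tidy whitespace."""
    text = description
    for duplicate in (package, category):
        if duplicate:
            # Replace with a space so neighbouring words do not get glued together.
            text = text.replace(duplicate, " ")
    text = re.sub(r" {2,}", " ", text)  # collapse runs of spaces
    return text.strip()                 # remove leading/trailing spaces


print(clean_description(
    "100nF 0402 Multilayer Ceramic Capacitors MLCC  ROHS",
    package="0402",
    category="Multilayer Ceramic Capacitors MLCC",
))
```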
… indicate its importance with the free text search capabilities of fts5
@chmorgan
Collaborator Author

@Bouni I refactored the db generation and now use it to generate both parts.db and parts-fts5.db, with corresponding chunk_num.txt and chunk_num_fts5.txt. Lmk how this looks and what other changes would make sense.

@Bouni
Owner

Bouni commented Apr 18, 2024

@chmorgan I just wanted to test your changes. You already changed the URL to https://bouni.github.io/kicad-jlcpcb-tools/, which is what I requested. But there are no fts5 files available for download at the moment (chicken-and-egg problem 😅).
Do you have the files somewhere under https://chmorgan.github.io/ so that I can change the URL for the moment to test everything?

@chmorgan
Collaborator Author

@Bouni of course, rebuilding and will repush them to my GitHub url shortly. Slipped my mind that CI wasn't running.

@chmorgan
Collaborator Author

@Bouni alright, pushed, you can update like this to try it out:

diff --git a/library.py b/library.py
index 0ebbd12..924a708 100644
--- a/library.py
+++ b/library.py
@@ -379,7 +379,7 @@ class Library:
         start = time.time()
         wx.PostEvent(self.parent, ResetGaugeEvent())
         # Download the zipped parts database
-        url_stub = "https://bouni.github.io/kicad-jlcpcb-tools/"
+        url_stub = "https://chmorgan.github.io/jlcpcb-db/"
         cnt_file = "chunk_num_fts5.txt"
         cnt = 0
         chunk_file_stub = "parts-fts5.db.zip."

Note that the db download is still crashing, unrelated to these changes I think. Reloaded plugin, opened plugin, download started, crashed kicad, restarted, opened plugin, downloaded again, successful this time, no parts refs listed, closed plugin window, reopened plugin window, parts refs showing up and everything is working correctly.

@whmountains
Collaborator

I also observed the DB download crash. Not every time, but it happened. So I can confirm it is not related to the changes in this PR.

@Bouni
Owner

Bouni commented Apr 18, 2024

Perfect, I tested both versions, old and new, and both work as expected. Thank you for your efforts!

@Bouni Bouni merged commit 70588d7 into Bouni:main Apr 18, 2024
2 checks passed
@whmountains
Collaborator

Yay! 🎉

@whmountains whmountains mentioned this pull request Apr 19, 2024
Labels
enhancement New feature or request

7 participants