use sqlite fts5 for faster and more natural feeling part selection #348
6f5656d
to
6492047
Compare
I get a |
According to the sqlite website:
On my windows machine on KiCAD 7.99. So I fear KiCAD ships a very old version of sqlite which does not include fts5. I think on Linux and MacOS KiCAD uses the system version of Python, which ships with a newer version of sqlite. |
@Bouni let me chat with the other Kicad devs. I think you’d be surprised at how quick the searches are once you can try it out. I’ll post back here when I know more. |
From https://docs.python.org/3/library/sqlite3.html#sqlite3.version :
You want |
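A minimal sketch of how a plugin could check for fts5 at runtime, assuming only the Python standard library. `sqlite3.sqlite_version` reports the version of the SQLite library that Python is linked against (not the version of the Python module), and the probe simply tries to create an fts5 table in an in-memory database. The function names here are illustrative and not taken from the plugin.

```python
import sqlite3


def sqlite_library_version() -> str:
    """Version of the SQLite library this interpreter is linked against."""
    return sqlite3.sqlite_version


def has_fts5() -> bool:
    """Return True if the linked SQLite library can create an fts5 table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        # Builds without the extension typically report "no such module: fts5".
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite library:", sqlite_library_version())
    print("fts5 available:", has_fts5())
```

Probing by actually creating a table avoids guessing from the version number alone, since fts5 is a compile-time option.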
It sounds like there may be some interest in adding fts5 sqlite to the windows build. BUT that would still mean that anyone using an older kicad on windows wouldn't be able to use the fts5 features. @Bouni before I go through the effort of using it if detected, would you be able to give it a whirl on a system that has fts5? I'd like to confirm that you see the searches are super quick and it seems like a helpful improvement. If we can cross that bridge I'll:
Then we can consider when it might make sense to drop the non-fts5 support in a few years when the older versions start to cycle out. Thoughts? |
@chmorgan First of all: thank you for investing so much time in all this, highly appreciate it! I will try to test the fts5 version on my linux machine today if possible. If the speed gain is as much as you say, which I believe even without having tested it, you can definitely go on and implement the support for it. @craftyjon Thanks for pointing out how to get the right sqlite version info and for all the dev work you do on KiCAD itself 🤩 |
@Bouni I take it you are running on windows? How are you able to see and debug the plugin while kicad is running? I'm at the point where I've got a local kicad build running under visual studio that loads sqlite3 etc, but using your GitHub plugin library and the latest kicad-jlcpcb-tools version the icon isn't showing up and I'm not sure how to debug it further. |
And the upstream add for windows was merged, https://gitlab.com/kicad/code/kicad/-/merge_requests/1671 |
Nice, so that should have already landed in the latest nightly or will in the next one. |
@chmorgan I finally managed to test this at least on Windows. It works but is still not ideal in some areas:
|
Are you using my branch and the SQLite file from my GitHub location?
Agree that sometimes it takes a moment to finish the query.
Let me try the part you are searching for. It should be doing partial
searches almost the same as the previous approach.
…On Thu, Aug 10, 2023 at 7:15 AM bouni ***@***.***> wrote:
@chmorgan <https://github.com/chmorgan> I finally managed to test this at
least on Windows. It works but is still not ideal in some areas:
1. When I search for a *DS2411* I get no results; only if I search for
the exact name *DS2411P+* do I get what I'm looking for. Before, I used
*LIKE*, which gave me a result even when I searched for just a part of
the name.
2. When I clear the search, everything freezes for a few seconds. I'm
not sure why, but it feels like a huge amount of data is searched in the
background or something like that ....
|
Yes, I use your branch and downloaded the database which gives me your db automatically because of the hard coded URL in your branch. |
1c7f82c
to
075eada
Compare
@Bouni alright, this should improve search behavior a bit. Can you give it another run? The empty search is an interesting case. The time it takes to search is due to there being so many matching entries and I think this is a regression from the mainline code in terms of performance. If we can settle on the changes I can put in the code that will download the appropriate database based on sqlite3 support and run the appropriate search routines. I was also thinking of putting a UI comment in a text bar or something advising if the user does an upgrade they can get improved free text search. |
@chmorgan The search works like mainline again! DS2411 returns 3 results as expected. Clearing the keyword still freezes the window for some seconds. Do we really need to return anything if the keyword is empty? Or if none of the search fields (manufacturer, package etc.) has a value? Also, the other fields like manufacturer, part number etc. should use a wildcard as well, if you search for Part Number Searching for keyword And last but not least, the subcategory dropdown seems to have no effect |
@Bouni yeah maybe we skip unless keyword is populated or unless part number is populated. Let me add some logic there for that and look at the other two search cases. Note that before submitting here I had used the fts5 search to populate a few dozen parts for a new board design and for my particular use case it was much quicker than mainline rev, but I didn't test the cases you are looking at so it helps to get your feedback. I'll try to push something tonight or tomorrow. |
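The "skip unless keyword or part number is populated" idea could look roughly like the sketch below. It is an assumption about one possible shape, not the code that was pushed: `build_match_query` is a hypothetical helper that returns `None` when there is nothing to search for, so the caller can skip the query entirely instead of matching every row.

```python
from typing import Optional


def build_match_query(keyword: str = "", part_number: str = "") -> Optional[str]:
    """Return an FTS5 MATCH expression, or None if no search text was given."""
    terms = []
    for raw in (keyword, part_number):
        for word in raw.split():
            # Quote each word as an FTS5 string so characters such as '+'
            # are treated literally; embedded double quotes are doubled.
            terms.append('"' + word.replace('"', '""') + '"')
    if not terms:
        return None  # caller should not run a search at all
    return " AND ".join(terms)
```

A caller would then do something like `if query is None: clear the result list` rather than issuing a query that returns the whole parts table.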
Take your time, I won't have time to try before Monday. I think the biggest win over JLCs website is the ability to literally find a part. I myself find it very hard to find a part on their website. But having results faster is obviously a win 😁 |
@Bouni alright, this should be ready for another test run if you'd entertain one. You'll need to make sure you pull an updated parts.db from my GitHub. |
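I invite you to have a look at the following:
- SQLite manual: External content <https://www.sqlite.org/fts5.html#external_content_and_contentless_tables>
- Blog: Quick full-text search using SQLite <https://abdus.dev/posts/quick-full-text-search-using-sqlite/>
A good way might be to follow the blog article closely. I would copy enable_fts() with minimal changes, adding *tokenizer* as parameter. Additionally you might want to create the unique index on parts as well.
The fts index table would look about like that in the end (but enable_fts() would create it for you):
-- Create full text search index for selected columns.
CREATE VIRTUAL TABLE IF NOT EXISTS parts_fts USING fts5 (
'LCSC Part',
'Description',
'MFR.Part',
'Package',
'Manufacturer',
content='parts',
tokenize='trigram'
)
…and will only hold the search index for specific columns and will be filled automatically by the trigger when populating parts. This should also save some disk space.
See query() for how to do the query then.
Let me know what you think or any issue. 🙂 |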
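To make the external-content suggestion concrete, here is a small self-contained sketch using an in-memory database. It assumes a SQLite build with fts5 and the trigram tokenizer (SQLite 3.34 or newer) and returns `None` otherwise. The trigger name `parts_ai`, the helper names, and the sample rows are made up for illustration; only the table layout follows the comment above.

```python
import sqlite3
from typing import List, Optional

COLUMNS = ["LCSC Part", "Description", "MFR.Part", "Package", "Manufacturer"]


def build_demo_db() -> Optional[sqlite3.Connection]:
    """Create parts plus an external-content fts5 index; None if unsupported."""
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    new_cols = ", ".join(f'new."{c}"' for c in COLUMNS)
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(f"""
            CREATE TABLE parts ({cols}, "Stock" INTEGER);
            -- The index stores only the searchable columns, not "Stock".
            CREATE VIRTUAL TABLE parts_fts USING fts5 (
                {cols}, content='parts', tokenize='trigram'
            );
            -- Keep the index in sync whenever a part is inserted.
            CREATE TRIGGER parts_ai AFTER INSERT ON parts BEGIN
                INSERT INTO parts_fts (rowid, {cols})
                VALUES (new.rowid, {new_cols});
            END;
        """)
    except sqlite3.OperationalError:
        conn.close()
        return None  # no fts5 module or no trigram tokenizer in this build
    rows = [  # illustrative sample data only
        ("C0000001", "Silicon serial number", "DS2411P+", "TSOC-6", "Example Mfr", 10),
        ("C0000002", "Silicon serial number", "DS2411R+T&R", "SOT-23", "Example Mfr", 20),
        ("C0000003", "Timer IC", "NE555DR", "SOIC-8", "Example Mfr", 30),
    ]
    conn.executemany(f'INSERT INTO parts ({cols}, "Stock") VALUES (?, ?, ?, ?, ?, ?)', rows)
    return conn


def search(conn: sqlite3.Connection, text: str) -> List[str]:
    """Substring search through the trigram index; returns MFR part numbers."""
    phrase = '"' + text.replace('"', '""') + '"'  # quote as an FTS5 string
    cur = conn.execute(
        'SELECT "MFR.Part" FROM parts WHERE rowid IN '
        "(SELECT rowid FROM parts_fts WHERE parts_fts MATCH ?) ORDER BY rowid",
        (phrase,),
    )
    return [row[0] for row in cur]
```

With the trigram tokenizer a partial term such as `DS2411` matches `DS2411P+` without explicit wildcards, which is the behaviour `LIKE` gave in mainline. The cost of the external-content layout is the extra lookup into `parts` (here via `rowid IN (...)`), which is the join concern raised in the reply below.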
You saw the commits in the PR??? I’d love to say I figured it all out from
the source code but the manual was helpful. :-)
Are there columns we can consider dropping entirely? I was wondering if the
stock column was particularly helpful since it’s effectively static at the
time of db creation.
Do we think there are columns people might not ever query on? If there
aren’t any/many non searchable columns there isn’t as much of a search
savings by not indexing them. Using the existing parts table as the backing
store of the fts5 table also means you’d need to join on queries right? It
didn’t seem to make sense to have the second table to deal with.
…On Sat, Sep 2, 2023 at 2:13 PM Steffen Beyer ***@***.***> wrote:
I invite you to have a look at the following:
- SQLite manual: External content
<https://www.sqlite.org/fts5.html#external_content_and_contentless_tables>
- Blog: Quick full-text search using SQLite
<https://abdus.dev/posts/quick-full-text-search-using-sqlite/>
A good way might be to follow the blog article closely. I would copy
enable_fts() with minimal changes, adding *tokenizer* as parameter.
Additionally you might want to create the unique index on parts as well.
The fts index table would look about like that in the end (but
enable_fts() would create it for you):
-- Create full text search index for selected columns.
CREATE VIRTUAL TABLE IF NOT EXISTS parts_fts USING fts5 (
'LCSC Part',
'Description',
'MFR.Part',
'Package',
'Manufacturer',
content='parts',
tokenize='trigram'
)
…and will only hold the search index for specific columns and will be
filled automatically by the trigger when populating parts. This should
also save some disk space.
See query() for how to do the query then.
Let me know what you think or any issue. 🙂
|
I run the various commands manually, I never got the ci working correctly
on the fork.
Do you know how often the source db is updated? Is that a daily thing? I
ask because one of the concerns I’ve had is the age of the stock data. It
would be helpful if there was a database updated date on the ui (if it’s
updated regularly to keep the stock numbers more accurate). My workflow
involves using the website specifically because I haven’t trusted the stock
numbers in the plugin to be the present ones.
Are you asking to modify the scripts used by ci to generate both databases
for backwards compatibility? I have no objection, just want to make sure I
understand before changing anything here.
How was the performance? The startup time was greatly improved with the
categories table. I’m not a fan of a special table but it saved so much
time it was hard not to include.
Did you see the db adjustments I made to save space? The rohs one? Removing
duplicate text etc? Feels like there may be other size wins if we trim or
adjust data when we generate the db.
…On Thu, Apr 11, 2024 at 4:13 AM bouni ***@***.***> wrote:
Works fine for me on Windows 11 🥳
At the moment the database comes from your github.io
("https://chmorgan.github.io/jlcpcb-db/").
So I guess you have a custom CI workflow that is not included in this PR,
right?
If so, can we include it and make it create a different set of zip parts,
maybe parts.db-fts5.zip?
That way we can make sure that users of older plugin versions still can
use these versions.
Shouldn't make a difference for the new version, right?
|
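The categories table mentioned in the reply above can be sketched as follows. The idea is to compute the distinct category pairs once, at database generation time, so the plugin reads a tiny table at startup instead of scanning every part. The column names `First Category` and `Second Category` and the function names are assumptions for illustration; the thread does not show the actual schema.

```python
import sqlite3
from typing import List


def build_categories(conn: sqlite3.Connection) -> None:
    """Precompute the distinct category pairs into a small lookup table."""
    conn.executescript("""
        DROP TABLE IF EXISTS categories;
        CREATE TABLE categories AS
            SELECT DISTINCT "First Category", "Second Category"
            FROM parts
            ORDER BY "First Category", "Second Category";
    """)


def first_categories(conn: sqlite3.Connection) -> List[str]:
    """Read top-level categories from the lookup table, not from parts."""
    cur = conn.execute(
        'SELECT DISTINCT "First Category" FROM categories ORDER BY "First Category"'
    )
    return [row[0] for row in cur]


def subcategories(conn: sqlite3.Connection, first: str) -> List[str]:
    """Read the subcategories of one top-level category."""
    cur = conn.execute(
        'SELECT "Second Category" FROM categories '
        'WHERE "First Category" = ? ORDER BY "Second Category"',
        (first,),
    )
    return [row[0] for row in cur]
```

The trade-off is the one named in the comment: it is a special-purpose table that has to be regenerated with the database, in exchange for avoiding a `SELECT DISTINCT` over millions of rows on every start.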
@Bouni lmk when you get a chance to read my last comment here, no rush. |
@chmorgan sorry for the late reply. The source db is updated daily at 3am (https://github.com/yaqwsx/jlcparts/actions/runs/8715756791/workflow#L6). As JLC does not provide a real API (not officially), I'm not sure if we should use the API for getting the actual stock of every part in a project instead of relying on data that is at most 1 day old. If they decide to block us for too many requests or whatever, the plugin breaks. Not sure if that's really going to happen, but it still needs to be considered. Showing the age of the database somewhere is a good idea anyway!
What differences did you see? Are they always incorrect or just sometimes?
Let's say someone uses the old plugin version (non-fts5). For users of the upcoming fts5 version we should extend the CI script (or add a separate one) to generate an fts5 version, let's say we call it parts-fts5.zip.
I did not really benchmark it but the fts5 feels quicker for searches and I really like the search as you type feature 🤩
No, the current database generation script was created by @markusdd and I did not look into it very closely as it worked right away 😅 My original approach was to use JLCs CSV and convert that into a sqlite db, but they stopped providing it, which is when @markusdd jumped in and helped with the new version that uses @yaqwsx database. I don't even know what the source data looks like to be honest. I just remember that it is already kind of sanitized but unsure to what degree.
I'm open to improvements! Maybe we should take a look at the source data and make use of what @yaqwsx already does instead of bending the data into a form that looks like what JLC supplied in the past. I can imagine that we could get an even better plugin that way! But that should be a separate PR in my opinion. Let's get fts5 ready so that I can release a new version soon. |
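For the search-as-you-type behaviour, the commit list further down describes waiting for 750 ms of inactivity before starting a search. A toolkit-independent sketch of that debounce pattern is shown here using `threading.Timer`; this is an assumption for illustration, and a GUI plugin would more likely use the toolkit's own timer and run the search on the UI thread. `Debouncer` is a hypothetical name.

```python
import threading
from typing import Callable, Optional


class Debouncer:
    """Run `action` with the latest text once input has been idle for `delay_s`."""

    def __init__(self, delay_s: float, action: Callable[[str], None]) -> None:
        self._delay_s = delay_s
        self._action = action
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def trigger(self, text: str) -> None:
        """Call on every keystroke; restarts the idle countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the pending search for older text
            self._timer = threading.Timer(self._delay_s, self._action, args=(text,))
            self._timer.daemon = True
            self._timer.start()
```

Usage would be along the lines of `debouncer = Debouncer(0.75, run_search)` with `debouncer.trigger(current_text)` called from the text-changed handler, so typing `DS2411` quickly results in one search rather than six.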
JLCParts uses SQLite for the raw data cache. There is a table of components where we store what JLC PCB API provides and a raw JSON of attributes from LCSC (e.g., rated voltage, size of flash memory, etc). We mainly use this view: https://github.com/yaqwsx/jlcparts/blob/5f512fb46b27be19182fc2c0ebc989496f79a15e/jlcparts/partLib.py#L110-L130. This is what @markusdd uses as the source of data. Meanwhile, the data from the API have a reasonable structure; data from LCSC are quite the opposite. This is why we have https://github.com/yaqwsx/jlcparts/blob/master/jlcparts/attributes.py, which is a messy code that tries to parse the LCSC mess and turn it into structured data (format string with values and units). Note that the sanitized data are not stored in the cache DB. The cache DB only stores the raw data. |
@yaqwsx Wow that was fast 😄 Is there a reason why you don't cache the sanitized data? |
Just chiming in here: What I did was basically just taking the great work by @yaqwsx and turning it into a shape that can be understood by the existing plugin (with not all data taken over, as we do not need absolutely everything). What I added on top was the zip splitting and download logic so the chunks become compatible with GitHub's size limits. |
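The splitting idea can be sketched as below: cut the archive into fixed-size pieces, record how many pieces there are in a small text file, and reassemble on the client. The piece naming (`<name>.001`, `<name>.002`, ...) and the helper names are assumptions for illustration; only the `chunk_num.txt` file name appears in this thread, and the real script may differ.

```python
from pathlib import Path


def split_into_chunks(src: Path, out_dir: Path, chunk_size: int) -> int:
    """Split `src` into numbered pieces and record the count in chunk_num.txt."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with src.open("rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            count += 1
            (out_dir / f"{src.name}.{count:03d}").write_bytes(data)
    (out_dir / "chunk_num.txt").write_text(str(count))
    return count


def join_chunks(chunk_dir: Path, name: str, dest: Path) -> None:
    """Reassemble the pieces listed by chunk_num.txt into `dest`."""
    count = int((chunk_dir / "chunk_num.txt").read_text())
    with dest.open("wb") as out:
        for i in range(1, count + 1):
            out.write((chunk_dir / f"{name}.{i:03d}").read_bytes())
```

Publishing the count separately lets a downloader know how many pieces to fetch before it starts, without listing the directory.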
@Bouni: It's published as a list of JSON files. As @markusdd said, there's a lot of extra information you couldn't utilize in your plugin. Otherwise, feel free to use the processed data or even use the processing scripts to turn the database into any format you need. The reason why we store raw data and sanitize them from scratch is to be able to deliver improvements in sanitation to all components, not only the newly fetched ones. |
…-fts5.db, from chunk_num.txt to chunk_num_fts5.txt
…s selected by default. Allows the fts5 search to work better as a lack of category enables keywords to search the entire database of parts.
For text fields, dwell for 750ms of inactivity before initiating a search to avoid initiating searches so frequently they interfere with the user interacting with the UI. Remove the search button. Update help to note that searching is automatic.
…match jlcparts_db_convert.py - Add trigram tokenizer library.py - Drop wildcards as trigram tokenizer does them automatically
… 'Stock', and 'Price' Reduces database size from 5.8G to 4.3G Removing the index on 'LCSC Parts' saves a minimal amount of database size but prevents consistent searching for LCSC Part numbers. Keep the index on 'LCSC Parts' so we can search for part numbers with fts5 match.
…formance Reduces some test queries by 20% in some cases and by 80% in others.
Reduces retrieving categories from 25s to 0.01s
* Baseline
jlcparts database (cache.sqlite3): 6.0 GB
parts-fts5.db: 4.7 GB
parts-fts5.db.zip: 679.0 MB
Elapsed time: 4 minutes and 57.33 seconds
* Remove duplicate package and category from description, convert double spaces to single spaces and remove leading and trailing spaces from description.
jlcparts database (cache.sqlite3): 6.0 GB
parts-fts5.db: 4.0 GB
parts-fts5.db.zip: 559.3 MB
Elapsed time: 4 minutes and 9.77 seconds
Reduction of ~700MB (~15%) in database size.
… indicate its importance with the free text search capabilities of fts5
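The description cleanup described in the commit list above (drop the package and category text that is already stored in their own columns, collapse repeated spaces, trim the ends) might look like this minimal sketch. `clean_description` is a hypothetical name, and the plain substring replacement is an assumption; the actual conversion script may match more carefully.

```python
import re


def clean_description(description: str, package: str, category: str) -> str:
    """Remove text duplicated in other columns and normalise spacing."""
    for duplicate in (package, category):
        if duplicate:
            # Naive substring removal; it would also hit the same text
            # embedded inside a longer token.
            description = description.replace(duplicate, " ")
    # Collapse runs of spaces to a single space, then trim both ends.
    return re.sub(r" +", " ", description).strip()
```

Because the removed text remains searchable through the `Package` and category columns, the shorter description reduces the database size without hiding parts from the full text search.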
@Bouni refactored the db generation and use that to generate both parts.db and parts-fts5.db, with corresponding chunk_num.txt and chunk_num_fts5.txt. Lmk how this looks and other changes that make sense to make. |
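With both databases published, the client side could pick the matching set of files from a single flag. The sketch below is an assumption about how that selection might be written, not the plugin's code. The file names come from the comment above and the base URL from the next comment; `DbFiles` and `select_db_files` are hypothetical.

```python
from typing import NamedTuple

BASE_URL = "https://bouni.github.io/kicad-jlcpcb-tools/"


class DbFiles(NamedTuple):
    db_name: str          # database file used locally by the plugin
    chunk_count_url: str  # text file holding the number of zip chunks


def select_db_files(fts5_available: bool, base_url: str = BASE_URL) -> DbFiles:
    """Pick the fts5 database when supported, else the legacy one."""
    if fts5_available:
        return DbFiles("parts-fts5.db", base_url + "chunk_num_fts5.txt")
    return DbFiles("parts.db", base_url + "chunk_num.txt")
```

Keeping the old file names untouched means older plugin versions keep downloading `parts.db` as before, which was the backwards-compatibility concern raised earlier in the thread.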
@chmorgan I just wanted to test your changes. You already changed the URL to https://bouni.github.io/kicad-jlcpcb-tools/ which is what I requested. But there are no fts5 files at the moment for download (chicken-and-egg problem 😅 ). |
@Bouni of course, rebuilding and will repush them to my GitHub url shortly. Slipped my mind that CI wasn't running. |
@Bouni alright, pushed, you can update like this to try it out:
Note that the db download is still crashing, unrelated to these changes I think. Reloaded plugin, opened plugin, download started, crashed kicad, restarted, opened plugin, downloaded again, successful this time, no parts refs listed, closed plugin window, reopened plugin window, parts refs showing up and everything is working correctly. |
I also observed the DB download crash. Not every time, but it happened. So I can confirm it is not related to the changes in this PR. |
Perfect, I tested both versions, old and new, and both work as expected. Thank you for your efforts! |
Yay! 🎉 |
Use fts5 for full text searching in the partselector screen. Speeds up part searching queries in almost all cases and makes it easier to find parts without needing to mess with so many drop-downs.