Low accuracy in predicting SQL using RESDSQL on my dataset #45
Hi!
I have been using the checkpoints provided with RESDSQL to enhance my work. However, I am unsure how to fine-tune RESDSQL on my own dataset. Could you provide some guidance on how to fine-tune RESDSQL with my specific dataset?
RESDSQL has been fine-tuned on Spider, so you should prepare your dataset in the same format as Spider (see its home page: https://yale-lily.github.io/spider). In fact, most Text-to-SQL datasets organize their data in Spider's format (e.g., Dr. Spider, CSpider, BIRD, Kaggle-DBQA).
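As a quick sanity check before preprocessing, the sketch below verifies that each example has the basic Spider fields and refers to a database described in tables.json. It is a minimal sketch, not part of RESDSQL: the key sets and the helper names `check_spider_format` and `check_files` are assumptions based on the public Spider release, and RESDSQL's own preprocessing may rely on additional fields (for example Spider's parsed `sql` field).

```python
import json
from typing import Any

# Field names assumed from the public Spider release; extend these sets if your
# copy of the data (or your preprocessing script) relies on more fields.
EXAMPLE_KEYS = {"db_id", "question", "query"}
TABLE_KEYS = {
    "db_id",
    "table_names_original",
    "column_names_original",
    "column_types",
    "primary_keys",
    "foreign_keys",
}


def check_spider_format(examples: list[dict[str, Any]], tables: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    known_dbs = set()
    for i, table in enumerate(tables):
        missing = TABLE_KEYS.difference(table)  # iterating a dict yields its keys
        if missing:
            problems.append(f"tables[{i}] missing keys: {sorted(missing)}")
        if "db_id" in table:
            known_dbs.add(table["db_id"])
    for i, ex in enumerate(examples):
        missing = EXAMPLE_KEYS.difference(ex)
        if missing:
            problems.append(f"examples[{i}] missing keys: {sorted(missing)}")
        elif ex["db_id"] not in known_dbs:
            problems.append(f"examples[{i}] references unknown db_id {ex['db_id']!r}")
    return problems


def check_files(examples_path: str, tables_path: str) -> list[str]:
    """Load a Spider-style examples file and tables.json, then run the checks."""
    with open(examples_path, encoding="utf-8") as f:
        examples = json.load(f)
    with open(tables_path, encoding="utf-8") as f:
        tables = json.load(f)
    return check_spider_format(examples, tables)
```

For example, `check_files("train_spider.json", "tables.json")` returns an empty list when every example passes these (deliberately shallow) checks.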
I have already put my dataset in Spider's format (i.e., tables.json).
To train RESDSQL on your dataset, you need to prepare at least three files (taking Spider's files as an example):
To run inference and evaluation, you should prepare a separate
Okay, I'll try this solution. By the way, do I need to retrain the model every time I change my dataset?
No. If your training set and test set have the same (or a similar) distribution, the model should generalize naturally to the hidden test set without additional training.
Okay. I followed your training steps and noticed that both
They are different files.
We use
So can I use the same dev.json file, or do I need to create a separate one for my dataset?
My suggestion would be to create a separate
If you train and evaluate your model on the training set, the result is not meaningful, because the model will memorize your training data and quickly reach (close to) 100% accuracy.
You mean train_spider.json and dev.json correspond to splitting our data into two sets, i.e., a training set and a test set?
Yes |
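One way to produce separate train_spider.json and dev.json files from a single pool of annotated examples is sketched below. It is a hypothetical helper, not RESDSQL code; `split_examples` and `write_split` are illustrative names. By default it shuffles individual examples; with `by_database=True` it keeps each database entirely within one split, which mirrors how Spider's own train and dev sets use disjoint databases but requires at least two databases.

```python
import json
import random
from collections import defaultdict


def split_examples(examples, dev_fraction=0.2, seed=0, by_database=False):
    """Split Spider-style examples into (train, dev) lists.

    by_database=False shuffles individual examples; by_database=True assigns
    whole databases to one split so no db_id appears in both.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    if not by_database:
        shuffled = list(examples)
        rng.shuffle(shuffled)
        n_dev = max(1, round(len(shuffled) * dev_fraction))
        return shuffled[n_dev:], shuffled[:n_dev]

    by_db = defaultdict(list)
    for ex in examples:
        by_db[ex["db_id"]].append(ex)
    db_ids = sorted(by_db)
    if len(db_ids) < 2:
        raise ValueError("by_database=True needs at least two databases")
    rng.shuffle(db_ids)
    # Keep at least one database on each side of the split.
    n_dev = min(len(db_ids) - 1, max(1, round(len(db_ids) * dev_fraction)))
    dev_dbs = set(db_ids[:n_dev])
    train = [ex for ex in examples if ex["db_id"] not in dev_dbs]
    dev = [ex for ex in examples if ex["db_id"] in dev_dbs]
    return train, dev


def write_split(train, dev, train_path="train_spider.json", dev_path="dev.json"):
    """Write both splits as top-level JSON lists, like Spider's data files."""
    for path, data in ((train_path, train), (dev_path, dev)):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
```

If your dataset uses a single database, the default example-level split is the only option; the database-level split gives a harder, Spider-like estimate of how well the model handles unseen schemas.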
Hello everyone,
I hope you're doing well. I've encountered an issue while using RESDSQL to predict SQL on my dataset. Despite following all the recommended steps, I'm only getting 30-40% accuracy. I would greatly appreciate any suggestions or insights on how to improve the prediction accuracy.
Thank you in advance for your assistance!
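To see where a 30-40% score comes from, it may help to check predictions by execution as well as by exact match, since a query can be written differently from the gold SQL yet return the same result. The sketch below is a rough diagnostic that assumes SQLite databases as in Spider; it is not Spider's official evaluation script, it ignores row order, and `execution_match` and `execution_accuracy` are hypothetical helper names.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same rows, ignoring row order.

    Errors in the gold query propagate (they point to a data problem);
    errors in the predicted query count as a miss.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    # Compare as multisets so duplicate rows still matter but order does not.
    return Counter(gold_rows) == Counter(pred_rows)


def execution_accuracy(triples, connect=sqlite3.connect):
    """triples: iterable of (db_path, predicted_sql, gold_sql)."""
    total = correct = 0
    for db_path, predicted_sql, gold_sql in triples:
        conn = connect(db_path)
        try:
            if execution_match(conn, predicted_sql, gold_sql):
                correct += 1
        finally:
            conn.close()
        total += 1
    return correct / total if total else 0.0
```

Run this against copies of your databases, since model output is executed as-is. A large gap between this number and the exact-match score would suggest that many predictions are equivalent but phrased differently, rather than wrong.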