Where is the pre-training data stored? I want to know the format of the input data. #17

Open
DHms2020 opened this issue Jul 19, 2021 · 3 comments

Comments

@DHms2020

In the relogic folder, for tabart-pretraining.py, I couldn't find a specific config file like the xx.jsonnet files used in the rat-sql part.
Are the paths of all input data and configuration files specified by the user? How can I find out more about the input data? I have a project in which I want to use the GAP method to train on a Chinese dataset, so it's important for me to know the format of the original pre-training input data.
I would be grateful if anyone could tell me. Thanks.

@Impavidity
Contributor

For the column prediction and recovery, here is one data example.

{"entities": {"Award": [], "Designer": [], "Publisher": ["'Ravensburger'"]}, "control_code": [], "question": "What are the award and designer for the books whose publisher is not \"Ravensburger\"?", "table_info": {"caption": ["Spiel des Jahres", "2008 awards", "Game Of The Year"], "header": ["Game", "Designer", "Publisher", "Award"], "table": [["Stone Age", "Michael Tummelhofer", "Hans im Gl\u00fcck", "Nominee"], ["Keltis", "Reiner Knizia", "Kosmos", "Winner"], ["Witch's Brew", "Andreas Pelikan", "alea / Ravensburger", "Nominee"], ["Blox", "Wolfgang Kramer , J\u00fcrgen P.K. Grunau , Hans Raggan", "Ravensburger", "Nominee"], ["Suleika", "Dominique Ehrhard", "Zoch Spiele", "Nominee"]], "_id": "29364-12", "column_type": ["text", "text", "text", "text"], "table_name": "Game"}, "with_value_entity": ["Publisher"], "entity_to_value": {"Game": ["Stone Age", "Keltis", "Witch's Brew", "Blox", "Suleika"], "Designer": ["Michael Tummelhofer", "Reiner Knizia", "Andreas Pelikan", "Dominique Ehrhard"], "Publisher": ["Hans im Gl\u00fcck", "Kosmos", "alea / Ravensburger", "Ravensburger", "Zoch Spiele"], "Award": ["Nominee", "Winner"]}}
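
For anyone preparing data in the same shape (for example for another language), here is a minimal sketch of how such a record could be loaded and sanity-checked. It is not taken from the repository code: the helper name `summarize_record` is hypothetical, the consistency checks are only inferred from the single example above, and storing one JSON object per line is an assumption. The record is a trimmed copy of the example with two table rows.

```python
import json


def summarize_record(record):
    """Check the basic shape of one record and return a short summary.

    The checks are guesses based on the single example in this thread,
    not rules confirmed by the maintainers.
    """
    info = record["table_info"]
    header = info["header"]
    rows = info["table"]
    # Every table row should have one cell per header column.
    assert all(len(row) == len(header) for row in rows)
    # There should be one column type per header column.
    assert len(info["column_type"]) == len(header)
    # Columns listed under "entities" should exist in the header.
    assert set(record["entities"]) <= set(header)
    return {
        "question": record["question"],
        "columns": header,
        "num_rows": len(rows),
        # Columns that appear as keys of "entities".
        "mentioned_columns": sorted(record["entities"]),
        # Columns whose "entities" entry carries at least one value.
        "columns_with_values": sorted(
            col for col, values in record["entities"].items() if values
        ),
    }


# Trimmed copy of the example above (only two table rows kept).
example = {
    "entities": {"Award": [], "Designer": [], "Publisher": ["'Ravensburger'"]},
    "control_code": [],
    "question": "What are the award and designer for the books whose "
                "publisher is not \"Ravensburger\"?",
    "table_info": {
        "caption": ["Spiel des Jahres", "2008 awards", "Game Of The Year"],
        "header": ["Game", "Designer", "Publisher", "Award"],
        "table": [
            ["Stone Age", "Michael Tummelhofer", "Hans im Gl\u00fcck", "Nominee"],
            ["Keltis", "Reiner Knizia", "Kosmos", "Winner"],
        ],
        "_id": "29364-12",
        "column_type": ["text", "text", "text", "text"],
        "table_name": "Game",
    },
    "with_value_entity": ["Publisher"],
    "entity_to_value": {
        "Game": ["Stone Age", "Keltis"],
        "Designer": ["Michael Tummelhofer", "Reiner Knizia"],
        "Publisher": ["Hans im Gl\u00fcck", "Kosmos"],
        "Award": ["Nominee", "Winner"],
    },
}

# Assumption: records are stored one JSON object per line.
# ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file.
line = json.dumps(example, ensure_ascii=False)
record = json.loads(line)
summary = summarize_record(record)
print(summary["mentioned_columns"], summary["columns_with_values"])
```

In this example, `with_value_entity` lists exactly the columns whose `entities` entry is non-empty; whether that always holds would need confirmation from the maintainers.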

@DHms2020
Author

DHms2020 commented Aug 31, 2021

@Impavidity That's very helpful, thank you very much!
Besides that, could you provide one data example for the SQL Generation task? In your updated code for the class "QuerySchema2SQLDataset", I can hardly tell the difference between <example["extra"]> and <example["negative"]> on line 48.
Looking forward to your reply. Thanks again!

@shivashankarrs

shivashankarrs commented Sep 4, 2021

@Impavidity

Is it possible to share the pre-training data separately (or is it already shared somewhere, in case I missed it)?

Thanks
