
How to execute own queries? #5

Open

kev2513 opened this issue Feb 4, 2021 · 9 comments
kev2513 commented Feb 4, 2021

Hello, I would like to insert my own questions and databases, but when I try to change the Spider JSON files I get the error:

RuntimeError: Error(s) in loading state_dict for EncDecModel:
	size mismatch for decoder.rule_logits.2.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
	size mismatch for decoder.rule_logits.2.bias: copying a param with shape torch.Size([97]) from checkpoint, the shape in current model is torch.Size([76]).
	size mismatch for decoder.rule_embedding.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
	size mismatch for decoder.node_type_embedding.weight: copying a param with shape torch.Size([55, 64]) from checkpoint, the shape in current model is torch.Size([49, 64]).

Is there an elegant solution for testing my own data?
Thanks in advance!

kev2513 changed the title from "How to train with own queries?" to "How to execute own queries?" on Feb 4, 2021
Impavidity added a commit to Impavidity/gap-text2sql that referenced this issue Feb 5, 2021
Impavidity (Contributor) commented

Hey,
Thanks for your interest in our work. You can check out pull request #6 once it is merged; I think you will be able to run your own database and queries based on the notebook I provided.

Let me know if it works for you, and whether you have any further questions.

Peng

kev2513 (Author) commented Feb 6, 2021

Hello Peng,

Thank you very much for your quick response! I tried the notebook and it worked 👍 I will let you know if I have any questions. Have a nice weekend.

Kevin

kev2513 (Author) commented Feb 6, 2021

Hello Peng,

I ran further tests and noticed that the response sometimes contains the word 'terminal', for example:

Query: department with budget greater then 10 billion

Answer: SELECT department.Department_ID FROM department WHERE department.Budget_in_Billions > 'terminal'

I guess 'terminal' should be replaced by values contained in the query. How can this replacement be achieved?

Sincerely
Kevin

Impavidity (Contributor) commented

Hey Kevin,

Thanks for your question. The terminal will usually be a cell value: a float/integer or a string.
Filling it in usually requires a value-copy mechanism, which the model currently does not support.

However, there is a simple workaround:
If the value is a number, you can detect it in the utterance and fill it directly into the generated SQL. For string values, you can match n-grams of the utterance against the cell values in the database: if an n-gram matches, it is likely the string value for the corresponding column.

I have a script that does this, but it will take some time to clean it up and make it public. You can try the method yourself in the meantime, since it is fairly simple; a rough sketch follows.
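Something along these lines (a rough Python sketch of the idea, assuming a sqlite3 Spider-style database; the function names are mine and not part of this repo):

import re
import sqlite3

def detect_numbers(utterance):
    # Numeric literals in the question, e.g. "10" in "budget greater then 10 billion".
    return re.findall(r"\d+(?:\.\d+)?", utterance)

def match_string_value(utterance, db_path, table, column, max_n=4):
    # Match word n-grams of the utterance against the distinct cell values
    # of one column; longer n-grams are tried first.
    words = utterance.lower().split()
    conn = sqlite3.connect(db_path)
    rows = conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}"').fetchall()
    conn.close()
    values = {str(r[0]).lower() for r in rows if r[0] is not None}
    for n in range(max_n, 0, -1):
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            if candidate in values:
                return candidate
    return None

def fill_numeric_terminals(sql, utterance):
    # Replace each 'terminal' placeholder with the next number found in the
    # utterance; string-typed columns would use match_string_value instead.
    for number in detect_numbers(utterance):
        sql = sql.replace("'terminal'", number, 1)
    return sql

print(fill_numeric_terminals(
    "SELECT department.Department_ID FROM department "
    "WHERE department.Budget_in_Billions > 'terminal'",
    "department with budget greater then 10 billion"))
# SELECT department.Department_ID FROM department
#   WHERE department.Budget_in_Billions > 10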

I will try to make the script public as soon as possible in case you do not implement it yourself.

Peng

kev2513 (Author) commented Feb 7, 2021

Hey Peng,

Thank you very much for your explanation. I will try my best :)

Sincerely
Kevin

thecodemakr commented
Hi @Impavidity @kev2513, I get the following error when trying the notebook:

WARNING <class 'seq2struct.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-21-d986dbd802ee> in <module>()
----> 1 inferer = Inferer(infer_config)

4 frames
/content/gap-text2sql/rat-sql-gap/seq2struct/commands/infer.py in __init__(self, config)
     34             registry.lookup('model', config['model']).Preproc,
     35             config['model'])
---> 36         self.model_preproc.load()
     37 
     38     def load_model(self, logdir, step):

/content/gap-text2sql/rat-sql-gap/seq2struct/models/enc_dec.py in load(self)
     54 
     55         def load(self):
---> 56             self.enc_preproc.load()
     57             self.dec_preproc.load()
     58 

/content/gap-text2sql/rat-sql-gap/seq2struct/models/spider/spider_enc.py in load(self)
   1272 
   1273     def load(self):
-> 1274         self.tokenizer = BartTokenizer.from_pretrained(self.data_dir)
   1275 
   1276 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, *inputs, **kwargs)
   1138 
   1139         """
-> 1140         return cls._from_pretrained(*inputs, **kwargs)
   1141 
   1142     @classmethod

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1244                     ", ".join(s3_models),
   1245                     pretrained_model_name_or_path,
-> 1246                     list(cls.vocab_files_names.values()),
   1247                 )
   1248             )

OSError: Model name 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

Can you please help me figure out which step I am missing?

kev2513 (Author) commented Feb 28, 2021

Hello @thecodemakr,

I got the same issue when executing Inference; running the following command solved the problem for me:

python run.py preprocess experiments/spider-configs/gap-run.jsonnet

(also execute the Preprocess dataset step in advance)
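For context, the call that fails in the traceback is a tokenizer load from a local directory; a minimal reproduction (the path is copied from the traceback):

from transformers import BartTokenizer

# from_pretrained() treats this as a local directory and looks for
# vocab.json and merges.txt inside it. As far as I can tell, the
# preprocess command above is what writes those files, which is why
# the load succeeds after preprocessing.
enc_dir = "data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc"
tokenizer = BartTokenizer.from_pretrained(enc_dir)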

alan-ai-learner commented
Can you please tell me how long this command should run: "python run.py preprocess experiments/spider-configs/gap-run.jsonnet"? I have been running it for about an hour.

roburst2 commented

@thecodemakr I am also facing the same OSError as above while running the notebook. How did you resolve it?
