Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to from GitHub castorini/data to uWaterloo Castor-data #103

Merged
merged 6 commits into from May 23, 2018

Conversation

Projects
None yet
4 participants
@tuzhucheng
Copy link
Member

commented May 20, 2018

Migrate from castorini/data to https://git.uwaterloo.ca/jimmylin/Castor-data.

  • Changing paths in code to reflect new repo name.
  • Remove redundant instructions in README - aim to centralize instructions like cloning in the main README.md.

@rosequ please review changes in anserini_dependency, idf_baseline, sm_cnn, and kim_cnn?
@Victor0118 please review changes in nce and mp_cnn?

Both please review changes to the main README.md?

@tuzhucheng tuzhucheng requested review from Victor0118 and rosequ May 20, 2018

@Victor0118
Copy link
Member

left a comment

BTW, do we need to run our model to ensure each model works, right? I assume you haven't done it.

@@ -34,22 +31,13 @@ Install the dependency packages:

```
cd Castor
pip3 install -r requirements.txt
pip install -r requirements.txt

This comment has been minimized.

Copy link
@Victor0118

Victor0118 May 20, 2018

Member

Do all our models support both py2 and py3?

This comment has been minimized.

Copy link
@tuzhucheng

tuzhucheng May 20, 2018

Author Member

No, it only supports Python 3. But if they followed the setup correctly pip should be the correct one by default - if they explicitly use pip3 and pip doesn't work something is wrong with their environment set-up.

@@ -109,7 +109,7 @@ def get_answers(question, num_hits, k):
parser.add_argument('--no_cuda', action='store_false', help='do not use cuda', dest='cuda')
parser.add_argument('--gpu', type=int, default=0) # Use -1 for CPU
parser.add_argument('--seed', type=int, default=3435)
parser.add_argument('--dataset', help="the QA dataset folder {TrecQA|WikiQA}", default='../../data/TrecQA/')
parser.add_argument('--dataset', help="the QA dataset folder {TrecQA|WikiQA}", default='../../Castor-data/TrecQA/')

This comment has been minimized.

Copy link
@Victor0118

Victor0118 May 20, 2018

Member

two argument parsers in a file?

This comment has been minimized.

Copy link
@tuzhucheng

tuzhucheng May 20, 2018

Author Member

Didn't understand this comment :(

This comment has been minimized.

Copy link
@Victor0118

Victor0118 May 21, 2018

Member

I mean there are two parser objects in this file api.py. Is that what you want?

This comment has been minimized.

Copy link
@tuzhucheng

tuzhucheng May 21, 2018

Author Member

I see - that parser was already there (unchanged except for default path arg). This PR is only about the Castor-data issue. If that is a bug I would prefer to fix it separately in another PR.

@@ -144,7 +96,7 @@ NB: The results on WikiQA are based on the SM model hyperparameters.
to the `data/` folder

```bash
python $PYTHONPATH/utils/build_w2v.py --input data/aquaint+wiki.txt.gz.ndim=50.bin
python $PYTHONPATH/utils/build_w2v.py --input ../../Castor-data/word2vec/embeddings/aquaint+wiki.txt.gz.ndim=50.bin

This comment has been minimized.

Copy link
@Victor0118

Victor0118 May 20, 2018

Member

different from the path before embeddings/word2vec?

This comment has been minimized.

Copy link
@tuzhucheng

tuzhucheng May 20, 2018

Author Member

Yeah, this is the correct behaviour. We moved GloVe, word2vec etc. to embeddings per @lintool's suggestion.

@@ -20,7 +20,7 @@ def get_args():
parser.add_argument('--words_dim', type=int, default=50)
parser.add_argument('--dropout', type=float, default=0.5)
parser.add_argument('--epoch_decay', type=int, default=15)
parser.add_argument('--wordvec_dir', type=str, default='../../../data/word2vec/')
parser.add_argument('--wordvec_dir', type=str, default='../../../Castor-data/embeddings/word2vec/')

This comment has been minimized.

Copy link
@Victor0118

Victor0118 May 20, 2018

Member

different from the path below word2vec/embeddings ?

This comment has been minimized.

Copy link
@tuzhucheng

tuzhucheng May 20, 2018

Author Member

Ditto, embeddings are moved to the embeddings subdirectory now.

.
├── Castor
├── Castor-data
└── models

This comment has been minimized.

Copy link
@lintool

lintool May 22, 2018

Member

Why don't we just rename it to Castor-models while we're at it. I'll create this repo on UWaterloo git also.

@rosequ

rosequ approved these changes May 23, 2018

Copy link
Member

left a comment

LGTM

@lintool lintool merged commit f7a0167 into castorini:master May 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.