Searchcollection #114

Merged
merged 6 commits into from
May 19, 2020

Member

@yuki617 yuki617 commented May 18, 2020

No description provided.

import re
import argparse
import sys
sys.path.insert(0,'./')
Member

Is this necessary? I don't think so.



def main(index, topics, output):
Member

Move this down into __main__


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Create a ArcHydro schema')
Member

"ArcHydro schema"??
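
A minimal sketch of what the parser setup could look like with a description that matches what the script does. Only --topics is named in this thread; the --index and --output flags, the help strings, and the example values are assumptions modeled on the main(index, topics, output) signature, not the merged code.

```python
import argparse

# Description reflects the script's purpose instead of the copy-pasted
# "ArcHydro schema" text. The wording here is an assumption.
parser = argparse.ArgumentParser(
    description='Search a collection with a set of topics and write a TREC run file.')
# --topics comes from the review thread; --index and --output are assumed
# flag names taken from the main(index, topics, output) parameters.
parser.add_argument('--index', required=True, help='Path to the index to search.')
parser.add_argument('--topics', required=True, help='Topics id (e.g. robust04) or path to a topics file.')
parser.add_argument('--output', required=True, help='Path of the run file to write.')

# In the script this would be parser.parse_args() under the __main__ guard;
# an explicit list is used here so the sketch runs on its own.
args = parser.parse_args(['--index', 'indexes/robust04',
                          '--topics', 'robust04',
                          '--output', 'run.robust04.txt'])
```

Keeping the parser at module level under the guard, rather than in a helper, follows the suggestion elsewhere in this review to avoid an extra method definition.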

with open(topics, 'r') as content_file:
    content = content_file.read()
result = re.findall(r'(?<=<top>)(.+?)(?=<desc>)', content, flags=re.S)
Member

You can load topics directly: https://github.com/castorini/pyserini/blob/master/tests/test_loadtopics.py

So allow something like --topics robust04?

Member Author

Should I keep allowing a file path for --topics as well?

Member

@lintool lintool May 18, 2020

Interpret it as an id first, per above; if that lookup fails, interpret it as a path.

Member Author

Got it, thanks.
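
The id-then-path fallback agreed on in this thread could be sketched roughly as follows. This assumes the topics lookup returns None for an unrecognized name. The helper names resolve_topics and parse_trec_topics, the injected lookup and read_text callables, and the sample topic text are hypothetical stand-ins so the sketch runs without an index or Pyserini installed.

```python
import re


def parse_trec_topics(text):
    """Extract {topic number: title} from TREC-style <top> blocks."""
    topics = {}
    for block in re.findall(r'<top>(.+?)</top>', text, flags=re.S):
        num = re.search(r'<num>\s*(?:Number:)?\s*(\d+)', block)
        title = re.search(r'<title>\s*(.+?)\s*(?=<desc>|$)', block, flags=re.S)
        if num and title:
            # Collapse internal whitespace so multi-line titles become one line.
            topics[num.group(1)] = ' '.join(title.group(1).split())
    return topics


def resolve_topics(arg, lookup, read_text):
    """Treat arg as a known topics id first; otherwise treat it as a file path.

    lookup stands in for a registry call such as get_topics and is assumed
    to return None for an unknown name; read_text stands in for reading
    the file at the given path.
    """
    topics = lookup(arg)
    if topics is not None:
        return topics
    return parse_trec_topics(read_text(arg))


# Hypothetical data for a standalone run.
sample = ("<top>\n<num> Number: 301\n<title> International Organized Crime\n\n"
          "<desc> Description:\nIdentify organizations.\n</top>\n")
known = {'robust04': {'301': 'International Organized Crime'}}

from_id = resolve_topics('robust04', known.get, lambda path: sample)
from_path = resolve_topics('my_topics.txt', known.get, lambda path: sample)
```

Passing the lookup in as a parameter keeps the fallback logic testable; in the script itself the lookup would be the library call and read_text a plain file read.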

parser = argparse.ArgumentParser(description='Create a ArcHydro schema')

def main(index, topics, output):
    searcher = SimpleSearcher(index)
Member

I don't think we need an extra method definition. Just move the code down?

searcher = SimpleSearcher(index)
my_list = []
if topics == 'robust04':
    topics_dic = get_topics('robust04')
Member

I think you can just do something like get_topics(topics) and check for None?

hits = searcher.search(search, 1000)
for i in range(0, len(hits)):
    my_list.append(f'{number} Q0 {hits[i].docid.strip()} {i + 1} {hits[i].score:.6f} Anserini')
with open(output, 'w') as f:
Member

I think you can move the with to a higher scope. Then you don't need to append to my_list; just write each line out directly?
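
A rough sketch of the streaming version suggested here: the output handle is opened once in an outer scope and each hit is written as it is produced, so no intermediate list is needed. The line format is the one in the diff above. The Hit tuple, the write_run helper, and the fake search function are hypothetical stand-ins so the example runs without an index.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a search hit; only the two attributes
# used in the diff (docid, score) are modeled.
Hit = namedtuple('Hit', ['docid', 'score'])


def write_run(out, topics, search, k=1000, tag='Anserini'):
    """Write one TREC run line per hit directly to an already-open handle."""
    for number, query in topics.items():
        # enumerate(..., start=1) replaces the range(len(hits)) / i + 1 pattern.
        for rank, hit in enumerate(search(query, k), start=1):
            out.write(f'{number} Q0 {hit.docid.strip()} {rank} {hit.score:.6f} {tag}\n')


def fake_search(query, k):
    """Return fixed hits so the sketch is self-contained."""
    return [Hit(' DOC-1 ', 2.5), Hit('DOC-2', 1.0)][:k]


# In the script this would be: with open(output, 'w') as f: write_run(f, ...)
buffer = io.StringIO()
write_run(buffer, {'301': 'international organized crime'}, fake_search)
```

Taking an open handle instead of a path keeps the with statement in the caller, which is the higher scope the comment refers to.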

@lintool lintool merged commit 02a2dfb into castorini:master May 19, 2020