Add remaining IBM Debater CSV datasets #5

Merged: 10 commits, Jan 8, 2021

Conversation

djalova (Collaborator) commented Dec 8, 2020

Adds schemas for the following IBM Debater Datasets:

  • IBM Debater® Concept Abstractness
  • IBM Debater® Sentiment Compositions Lexicon
  • IBM Debater® Thematic Clustering of Sentences
  • IBM Debater® Wikipedia Category Stance
  • IBM Debater® Wikipedia Oriented Relatedness

xuhdev (Collaborator) commented Dec 9, 2020

Is this ready for review?

djalova (Collaborator, Author) commented Dec 9, 2020

@xuhdev Yep this is ready

datasets.yaml Outdated
name: Unigrams Sentiment Lexicon
description: Unigrams with their sentiment score
format:
  id: txt
Collaborator

Is this actually csv? txt doesn't get the column options

djalova (Collaborator, Author) commented Dec 9, 2020

It's technically a txt file, but it can be read as a csv with ' ' as the separator. Should I just change the format to csv?
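
To illustrate the point about the separator, here is a minimal sketch of parsing space-separated text with Python's standard `csv` module. The sample rows, the two-column (token, score) layout, and the helper name `read_space_separated` are assumptions made for illustration; the actual file contents and the loader used by the project are not shown in this thread.

```python
import csv
import io

# Hypothetical sample rows; the real lexicon file's contents are not shown here.
SAMPLE_TXT = "good 0.8\nbad -0.7\n"


def read_space_separated(text):
    """Parse space-separated text into (token, score) pairs.

    Assumes exactly two columns per row: a token and a numeric score.
    """
    reader = csv.reader(io.StringIO(text), delimiter=" ")
    # csv.reader yields an empty list for blank lines, so skip those.
    return [(row[0], float(row[1])) for row in reader if row]


rows = read_space_separated(SAMPLE_TXT)
print(rows)
```

The only change from reading an ordinary CSV is the `delimiter=" "` argument, which is why a schema format that exposes a delimiter option would let the file be declared as csv.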

Collaborator

Gotcha. How about we merge this after we have addressed CODAIT/pardata#48 ?

Collaborator

Alternatively, if we are to merge now, could you add a comment line:

# TODO: Replace txt with csv when PyDAX supports specifying CSV delimiters.

Collaborator Author

Sure we can wait until that feature is implemented.

xuhdev (Collaborator) commented Dec 17, 2020

Conflict again...

djalova (Collaborator, Author) commented Dec 17, 2020

Updated with master

datasets.yaml Outdated (two resolved review threads)

edwardleardi (Collaborator) left a comment

LGTM

@xuhdev xuhdev merged commit 40e46e0 into CODAIT:master Jan 8, 2021
3 participants