Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store score and reasons #16

Closed
FelixSFD opened this issue Jan 21, 2018 · 9 comments · Fixed by #24
Closed

Store score and reasons #16

FelixSFD opened this issue Jan 21, 2018 · 9 comments · Fixed by #24
Assignees

Comments

@FelixSFD
Copy link
Member

To provide better statistics about the reports, we'd need to store the score of a report and maybe even the reasons.

Table reason:

reason_id name creation_date
1 String similarity timestamp of the first post with that reason

Table reason_for_post:

post_id reason_id score
34 1 Score, this reason reached (for example StringSimilarity has scores between 0 and 1)
@Bhargav-Rao
Copy link
Member

Can't we put score and reason in the posts table itself? That'd make it easier.

@Bhargav-Rao
Copy link
Member

Bhargav-Rao commented Jan 30, 2018

Thinking of a larger schema, incorporating #10 as well, like

CREATE TABLE IF NOT EXISTS posts (
 post_id INTEGER PRIMARY KEY,
 url_one text NOT NULL,
 url_two text NOT NULL,
 title_one text NOT NULL,
 title_two text NOT NULL,
 body_one text NOT NULL,
 body_two text NOT NULL,
 username_one text NOT NULL,
 username_two text NOT NULL,
 user_url_one text NOT NULL,
 user_url_two text NOT NULL,
 date_one INTEGER NOT NULL,
 date_two INTEGER NOT NULL,
 score INTEGER NOT NULL,
 reasons text NOT NULL
);

The updates for the present table would be,

ALTER TABLE posts ADD COLUMN username_one text NOT NULL DEFAULT "";
ALTER TABLE posts ADD COLUMN username_two text NOT NULL DEFAULT "";
ALTER TABLE posts ADD COLUMN user_url_one text NOT NULL DEFAULT "";
ALTER TABLE posts ADD COLUMN user_url_two text NOT NULL DEFAULT "";
ALTER TABLE posts ADD COLUMN score INTEGER NOT NULL DEFAULT 1;
ALTER TABLE posts ADD COLUMN reasons text NOT NULL DEFAULT "unknown;";

The reasons could be a ; separated list of reasons. We can then update /posts/pending endpoint to return the score and the reason.

@FelixSFD
Copy link
Member Author

Saving the total score in the Posts-Table: okay. (at least as a temporary solution)

But saving a list of reasons in a single column: No way. That would be redundant and really hard to parse to get statistics. I'd rather learn Python and implement this on my own than I would let anybody store a list as one string ;-) :-D

@Bhargav-Rao
Copy link
Member

If you want to learn Python and do it by yourself, then you've got quite a lot of time, as I'm terribly busy. :-D

Anyway, if that's the issue, then we can have a new table:

CREATE TABLE IF NOT EXISTS reasons (
 reason_id INTEGER PRIMARY KEY,
 reason text NOT NULL
);

The attribute in posts would still be reasons text NOT NULL, as SQLite3 doesn't support arrays (and I don't want to use BLOB). It would be stored as a CSV string of reasonIDs 1,2,3, etc. But then I think this would make it a bit DB intensive, first check if the reason exists, then retrieve the reason ID, then store it in the posts table. Same thing goes for retrieving. It must be one additional call, I don't think it'd be that heavy.

I'm open to better ideas.

@Bhargav-Rao
Copy link
Member

Okay, so as @quartata mentioned in chat, we can have an additional table which holds posts and reasons together. The schema would be:

CREATE TABLE IF NOT EXISTS caughtfor (
 post_id INTEGER NOT NULL,
 reason_id INTEGER NOT NULL,
 PRIMARY KEY (post_id, reason_id),
 FOREIGN KEY(post_id) REFERENCES posts(post_id),
 FOREIGN KEY(reason_id) REFERENCES reasons(reason_id)
);

@Bhargav-Rao
Copy link
Member

As we are also handling reposts, I think we would need to have a type field which can contain the values plagiarized or reposted.

@FelixSFD
Copy link
Member Author

FelixSFD commented Feb 1, 2018 via email

@Bhargav-Rao
Copy link
Member

Perfect, that's the best way.

@Bhargav-Rao
Copy link
Member

So yeah, I think we'll need to make both the reason_id and the reason as primary keys. This is because, we can utilize that constraint to prevent adding additional rows with the same reason text.

CREATE TABLE IF NOT EXISTS reasons (
 reason_id INTEGER,
 reason text NOT NULL,
 PRIMARY KEY (reason_id, reason)
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants