Database schema revisions #71

Closed
andylolz opened this issue May 21, 2024 · 1 comment · Fixed by #84
Labels: backend (this requires backend work), frontend (this requires frontend work)

Comments

andylolz commented May 21, 2024

The current database schema (#62 (comment)) has a few gaps. We’d like to store the following things:

Here’s a proposed revised schema. This is WIP, and it may be more complicated than we really need, but hopefully it captures the things mentioned above.

UPDATE: @dcorney, @ff-dh, @JamesMcMinn and @andylolz discussed and agreed the following:

erDiagram
  youtube_videos ||--o{ claim_extraction_runs : runs
  youtube_videos {
    text id
    text metadata
    text transcript
  }

  claim_extraction_runs ||--o{ inferred_claims : claims
  claim_extraction_runs {
    integer id PK
    text youtube_id FK
    text model
    text status
    integer timestamp
  }

  inferred_claims {
    integer id PK
    integer run_id FK
    text claim
    text raw_sentence_text
    text labels
    real offset_start_s
    real offset_end_s
  }

  training_claims {
    integer id PK
    text youtube_id
    text claim
    text labels
  }
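
For illustration, the diagram above could be translated into SQLite DDL roughly as follows. This is a minimal sketch, not the project's actual migration: it assumes SQLite (suggested by the `text` / `integer` / `real` column types), assumes `youtube_videos.id` acts as the primary key, and the `SCHEMA` constant and `create_schema` function are hypothetical names. The `NOT NULL` and `REFERENCES` constraints are inferred from the `FK` markers and relationship lines, so treat them as assumptions too.

```python
import sqlite3

# Hypothetical DDL derived from the ER diagram; constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS youtube_videos (
    id TEXT PRIMARY KEY,          -- assumed to be the YouTube video ID
    metadata TEXT,
    transcript TEXT
);

CREATE TABLE IF NOT EXISTS claim_extraction_runs (
    id INTEGER PRIMARY KEY,
    youtube_id TEXT NOT NULL REFERENCES youtube_videos (id),
    model TEXT,
    status TEXT,
    timestamp INTEGER
);

CREATE TABLE IF NOT EXISTS inferred_claims (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES claim_extraction_runs (id),
    claim TEXT,
    raw_sentence_text TEXT,
    labels TEXT,
    offset_start_s REAL,
    offset_end_s REAL
);

CREATE TABLE IF NOT EXISTS training_claims (
    id INTEGER PRIMARY KEY,
    youtube_id TEXT,              -- plain column: the diagram shows no relationship here
    claim TEXT,
    labels TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables from the diagram on the given connection."""
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Calling `create_schema(sqlite3.connect(":memory:"))` is enough to try the layout out without touching a real database file.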

dcorney commented May 21, 2024

  • Removed the foreign key from training_claims to youtube_videos
  • claim_extraction_runs.model = concatenation of the model name (from code) + the git hash of the health-misinfo-shared folder
  • inferred_claims.raw_sentence_text is returned by the Gemini model and should correspond to the claim text
  • Chunks are generated on the fly and not stored
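
The model value described in the list above could be built along these lines. This is only a sketch: the helper names, the `@` separator, and the use of `git rev-parse HEAD:<folder>` (which prints the tree hash of that folder at HEAD) are all assumptions, since the issue does not say how the hash is obtained or how the two parts are joined.

```python
import subprocess


def folder_git_hash(folder: str = "health-misinfo-shared", repo_dir: str = ".") -> str:
    """Return the git tree hash of `folder` at HEAD (an assumed way to hash the folder)."""
    result = subprocess.run(
        ["git", "rev-parse", f"HEAD:{folder}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails, e.g. outside a repository
    )
    return result.stdout.strip()


def build_model_id(model_name: str, git_hash: str, sep: str = "@") -> str:
    """Join the model name and git hash into one string for claim_extraction_runs.model."""
    return f"{model_name}{sep}{git_hash}"
```

Keeping the hash lookup separate from the string building means `build_model_id` can be exercised without a git checkout, and the separator can be changed in one place if a different convention is agreed.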
