Add disclaimer about ML generated transcription to event search results page #141

Closed
evamaxfield opened this issue Dec 9, 2021 · 14 comments · Fixed by #234
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@evamaxfield
Member

We have known there will always be some transcription errors, and until we invest in building our own model, we should at the very least add a disclaimer about them:

To provide event transcripts at low-cost, Council Data Project uses Google Speech-to-Text. The transcriptions may include errors and absurdities. Please understand that our team regrets any miscommunication caused by these errors.

@evamaxfield added the enhancement label Dec 9, 2021
@evamaxfield
Member Author

Further thinking about this, maybe we should only show this on the event page if the transcript was generated by a known ML method. We store the generator in the transcript file: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/pipeline/transcript_model.py#L174

If the generator matches a regex like: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/sr_models/google_cloud_sr_model.py#L248

Then display this disclaimer?
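
A minimal TypeScript sketch of the check proposed above: read the generator string from the transcript and show the disclaimer only when it looks like a known ML method. The pattern `ML_GENERATOR_PATTERN` and the function name `shouldShowMlDisclaimer` are assumptions for illustration; the real generator string format is defined in the linked cdp-backend files and may differ.

```typescript
// Hypothetical pattern: assumes ML generator strings mention "Speech-to-Text".
// The actual format is defined in cdp-backend and is not reproduced here.
const ML_GENERATOR_PATTERN = /speech-to-text/i;

// Returns true when the transcript's generator looks like a known ML method.
function shouldShowMlDisclaimer(generator: string | undefined): boolean {
  if (!generator) {
    // No generator recorded, so we cannot claim the transcript was ML-generated.
    return false;
  }
  return ML_GENERATOR_PATTERN.test(generator);
}
```

Keeping the pattern in one constant means a new ML backend only needs a regex change, not a change to the display logic.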

@smai-f

smai-f commented Apr 4, 2022

Hey @JacksonMaxfield, is anyone working on this, and if not, can I pick it up? Mark is in contact with our city about our CDP instance, and I think it'd be good to have this disclaimer since all of our transcripts are ML-generated.

@evamaxfield
Member Author

To my knowledge no one is working on this! Feel free to take it on.

Totally agree that it's a good thing to have haha. Where to put it on the single event page / maybe search page too is the question in my mind.

@smai-f

smai-f commented Apr 4, 2022

@JacksonMaxfield Cool, I'll try a few placements out and see what y'all think!

@evamaxfield
Member Author

Wooo! Thanks!

@Shak2000
Contributor

I made a prototype of a few options, and I want to get early feedback:

  1. In the transcript: https://drive.google.com/file/d/1S9pE5ItALlGTng8BQiE0CFX2lmsy4vYl/view?usp=sharing
  2. In the search page: https://drive.google.com/file/d/1kQCa10g3FSBmdl5ogtniRU71uy-yopDf/view?usp=sharing

I can add a box around the text, color the text, or change the font size. I can also put the text below the meetings.

For now, the text is displayed for all results. Once we decide how to display the text, I will work out how to show it only for ML-generated transcripts. I think that filtering can only work for the first option, since the second option lists many events, some of which may be ML-generated and others human-generated.

I am open to receiving any advice.
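
One possible way to handle the mixed case on the search page, sketched in TypeScript under stated assumptions: show the disclaimer whenever at least one listed event has an ML-generated transcript. The `EventResult` shape and the `resultsNeedDisclaimer` name are hypothetical, and the ML check is passed in as a function rather than taken from any real cdp-frontend API.

```typescript
// Hypothetical shape of a search result; only the generator field matters here.
interface EventResult {
  id: string;
  transcriptGenerator?: string;
}

// Returns true if at least one listed event has a transcript whose generator
// satisfies the supplied ML check. Events without a generator are ignored.
function resultsNeedDisclaimer(
  events: EventResult[],
  isMlGenerator: (generator: string) => boolean
): boolean {
  return events.some(
    (e) =>
      e.transcriptGenerator !== undefined &&
      isMlGenerator(e.transcriptGenerator)
  );
}
```

This trades precision for simplicity: the page carries a single notice instead of one per event.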

@evamaxfield
Member Author

Hey @Shak2000 these look good, I have one suggestion but before I get to it, I am curious if you can try deploying: https://github.com/CouncilDataProject/cdp-frontend/blob/main/CONTRIBUTING.md#deploying-your-storybook-docs-site-or-example-app

Specifically:

npm run build:app
npm run deploy:app

Then we can simply go to your page and check it out.

@Shak2000
Contributor

Here it is: https://shak2000.github.io/cdp-frontend/#/

@evamaxfield
Member Author

Thanks! I think I like the event search results location more than the transcript search results location.

Feel free to remove the transcript search results one.

I am tempted to ask to also place this in the footer so it's always present.

@evamaxfield
Member Author

I may have some general rewording later

@Shak2000
Contributor

I removed the disclaimer from the transcript search location and added it to the footer. I put it where the copyright notice is, but I can move it into the links area above.

@evamaxfield
Member Author

> I removed the disclaimer from the transcript search location and added it to the footer. I put it where the copyright notice is, but I can move it into the links area above.

Wanna open a PR?

Four things:

  1. for the text on the event search results page: can you make the font size smaller? It looks like it is currently rendered at 1.125rem. Personally, I think 1rem looks better
  2. for the text on the event search results page: can you change it to: "To provide event transcripts at low-cost, Council Data Project uses Google Speech-to-Text for transcription. Event transcripts may include errors."
  3. for the text on the event search results page: can you move the disclaimer to below the body + date filters and the sort options?
  4. for the text in the footer: can you change it to: "In many cases, Council Data Project utilizes a fine-tuned Google Speech-to-Text model for generation of event transcripts. We understand that transcripts may include errors. If you are a machine learning expert and wish to help improve our system for generating transcripts, please reach out to us on GitHub."
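
The four requests above could be captured as a small configuration object, sketched here in TypeScript. The `Placement` type, the `Disclaimer` interface, and the `DISCLAIMERS` constant are hypothetical names for illustration, not part of cdp-frontend; the texts and the 1rem size are taken from the list above.

```typescript
// Hypothetical names; the strings and font size come from the requests above.
type Placement = "searchResults" | "footer";

interface Disclaimer {
  text: string;
  fontSize?: string; // CSS length; omitted when the default size is fine
}

const DISCLAIMERS: Record<Placement, Disclaimer> = {
  // Shown below the body + date filters and the sort options.
  searchResults: {
    text:
      "To provide event transcripts at low-cost, Council Data Project uses " +
      "Google Speech-to-Text for transcription. Event transcripts may include errors.",
    fontSize: "1rem",
  },
  // Always present at the bottom of the page.
  footer: {
    text:
      "In many cases, Council Data Project utilizes a fine-tuned Google " +
      "Speech-to-Text model for generation of event transcripts. We understand " +
      "that transcripts may include errors. If you are a machine learning expert " +
      "and wish to help improve our system for generating transcripts, please " +
      "reach out to us on GitHub.",
  },
};
```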

@Shak2000
Contributor

I implemented all of the requests

@Shak2000
Contributor

Can you please assign it to me? I have a PR waiting


3 participants