
Display chatbot response in the chat history #51


Draft
wants to merge 1 commit into master

Conversation

amannaik247
Copy link
Contributor

@amannaik247 amannaik247 commented May 4, 2025

The combo box previously displayed only user input; it kept no history of the chatbot's responses.
This PR logs the chatbot's responses and displays them to the learner.
The learner can easily hear a response again just by clicking on it in the chat history.

I have changed the combo box, which previously displayed only the user input, so that it also displays the bot's responses.

The combo box now displays each exchange as:

You: user input
Bot: Bot response

Now the learner can:

  • Select a user input from the combo box, which places that text in the input field.
  • Select a chatbot response, which prompts the bot to say the same response again and also places it in the input field.

Text-to-speech tab:
The same conversation history is visible in this section as well, with the same functionality.
Learners can now select chatbot responses from the conversation history and edit them to hear their own version of the same text.
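The behaviour described above can be sketched as a small model (hypothetical code, not the actual Speak activity implementation; the `ChatHistory` class and the `speak` callback are illustrative names only):

```python
# Hypothetical sketch of the combo-box history: both sides of the
# conversation are logged with "You:"/"Bot:" prefixes, and selecting a
# bot entry replays it through a speak() callback.

class ChatHistory:
    def __init__(self):
        self.entries = []  # list of (speaker, text) pairs

    def log(self, speaker, text):
        self.entries.append((speaker, text))

    def labels(self):
        # Labels as they would appear in the combo box.
        return ["%s: %s" % (speaker, text) for speaker, text in self.entries]

    def select(self, index, speak):
        # Selecting any entry returns its text for the input field;
        # selecting a bot entry also replays it via speak().
        speaker, text = self.entries[index]
        if speaker == "Bot":
            speak(text)
        return text

history = ChatHistory()
history.log("You", "What are you up to?")
history.log("Bot", "I don't know what I am up to")

spoken = []
text = history.select(1, spoken.append)  # bot entry: replayed and returned
```

Selecting the user entry (index 0) would only return the text without triggering `speak`, matching the two bullet points above.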

@amannaik247 amannaik247 marked this pull request as ready for review May 4, 2025 23:02
@amannaik247 amannaik247 marked this pull request as draft May 4, 2025 23:03
@amannaik247
Copy link
Contributor Author

@chimosky Please give this feature a try and let me know your views on this.
If you feel this would be a good addition, I will convert this into a proper PR.
Thank you!

The combo box previously displayed only user input.
Added the chatbot's responses to the combo box.

When the user clicks a user input in the combo box, the message is
displayed in the input field.
When the user clicks a bot response, the message is displayed in the
input field and the chatbot speaks the response again.

This conversation history is visible in the text-to-speech tab as well.
@chimosky
Copy link
Member

chimosky commented May 5, 2025

Why is this change needed? Your opening comment and commit message don't say anything about that.

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 5, 2025

Why is this change needed? Your opening comment and commit message don't say anything about that.

Yes, I will update the opening comment to contain the concrete reason for change.

The reason for this update is:
In the chatbot section, the learner can only see the input text they have given to the chatbot. They have no way to see the chatbot's responses in textual form.

With this PR the learner can:

  • View full chat history (user + bot messages)
  • Replay bot responses for pronunciation practice
  • Edit bot responses to experiment with phrasing

Benefit:
The learner now has the option to hear again a particular response the chatbot gave earlier and reflect upon it.

@chimosky
Copy link
Member

chimosky commented May 5, 2025

The question I'm asking is, why would a learner need to hear a particular response the chatbot has given earlier?

Reflecting upon it doesn't seem like a compelling reason.

Why would they need to see the response from the chatbot in a textual format?

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 5, 2025

The question I'm asking is, why would a learner need to hear a particular response the chatbot has given earlier?

Reflecting upon it doesn't seem like a compelling reason.

Why would they need to see the response from the chatbot in a textual format?

Ok. I understand your concern.
Let's take the example of a learner who is new to the language or has only basic knowledge (like most child learners, who are still learning new words and their pronunciation).
When the chatbot speaks a sentence, they might not comprehend words that are new to them. They would need additional help, such as the text of that same response.

It is like watching a movie that is not in our first language: we might need to enable subtitles to understand exactly what is being said.

@quozl
Copy link
Contributor

quozl commented May 5, 2025

Doesn't sound like a particularly good reason. The activity focus is text to speech, not speech to blog. Which reminds me, we lack an activity that can take speech and display text.

@chimosky
Copy link
Member

chimosky commented May 5, 2025

Doesn't sound like a particularly good reason. The activity focus is text to speech, not speech to blog. Which reminds me, we lack an activity that can take speech and display text.

I agree, a better solution would be to reduce speech speed, as that'll help said learner hear the words at a slower pace and more clearly.

@mebinthattil
Copy link

I agree, a better solution would be to reduce speech speed, as that'll help said learner hear the words at a slower pace and more clearly.

Doesn't reducing speech speed already exist in the speak activity?

@amannaik247
Copy link
Contributor Author

Doesn't sound like a particularly good reason. The activity focus is text to speech, not speech to blog. Which reminds me, we lack an activity that can take speech and display text.

Yes, exactly, the focus is still on text to speech.
For example, when a learner hears a new word and wants to practice pronouncing it by listening to it again, they do not have that option. Additionally, since they only hear it once and have no idea how to spell it, they cannot type it and hear it again.
Having an option to see the previous response, see how it is spelled, and listen to it again makes it easy for the learner to practice saying it.

Yes, @chimosky, they can slow the speed down beforehand, but once they have heard the response, if they want to hear the current response or a previous response again, they have no way to do it.

I think there should at least be an option to hear a response again, in case they want to practice saying it.

@chimosky
Copy link
Member

chimosky commented May 6, 2025

Yes, exactly, the focus is still on text to speech.
For example, when a learner hears a new word and wants to practice pronouncing it by listening to it again, they do not have that option. Additionally, since they only hear it once and have no idea how to spell it, they cannot type it and hear it again.
Having an option to see the previous response, see how it is spelled, and listen to it again makes it easy for the learner to practice saying it.

Here's an idea that Quozl brought up earlier: we lack an activity that can take speech and display text.

I think there should at least be an option to hear a response again, in case they want to practice saying it.

You've not given a compelling reason why that should exist.

@amannaik247
Copy link
Contributor Author

Here's an idea that Quozl brought up earlier; we lack an activity that can take speech and display text..

If this is a separate activity that can be built, we can definitely work on it. Should we arrange a meeting to discuss what the basic functionality and features of the activity would look like?

You've not given a compelling reason why that should exist.

I genuinely believe this feature would be a valuable addition. It might also be helpful to gather feedback from students or teachers on how it could benefit them. Perhaps I could ask Devin for help during our next bi-weekly meeting to gather some reviews?

That said, I'm finding it a bit challenging to fully articulate the reasoning behind this suggestion here; I think it would be easier to explain in a conversation.
Thank you for taking the time to review this feature.

@chimosky
Copy link
Member

chimosky commented May 7, 2025

If this is a separate activity that can be built, we can definitely work on it. Should we arrange a meeting to discuss what the basic functionality and features of the activity would look like?

We don't need to have a meeting to discuss basic functionality and features; feel free to implement whatever comes to mind and share it with us, and we can suggest improvements.

I genuinely believe this feature would be a valuable addition. It might also be helpful to gather feedback from students or teachers on how it could benefit them. Perhaps I could ask Devin for help during our next bi-weekly meeting to gather some reviews?

That said, I'm finding it a bit challenging to fully articulate the reasoning behind this suggestion here; I think it would be easier to explain in a conversation.
Thank you for taking the time to review this feature.

Genuinely believing the feature would be valuable is okay; convincing us why is what's necessary.
We're already having a conversation here. If you can't articulate the reasoning here, then you can't do it somewhere else, because we'll still be speaking the same language; nothing will change besides the medium of communication.

Feel free to ask for help.

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 7, 2025

Genuinely believing the feature would be valuable is okay, convincing us why is what's necessary.

I appreciate your patience and the opportunity to clarify the value of this feature.

Here is a video of the current version of the activity, without this feature (it is sped up, so it might sound wonky):

speak.normal.demo.mp4

Reinforcement, comprehension, and active experimentation are key to understanding something new or complex, like a language.
Here is how this feature targets all of these to improve the activity for new learners, and why it's necessary.

I will use the demo I have created within the Speak activity with this feature to convey its necessity.

Please have a look at this feature's DEMO to see it in action:

chat.history.demo.mp4
  • Reinforcing retention – Learners can replay and review bot responses to practice pronunciation and comprehension.
    -- In the demo you can see that a learner can click on any of the previous responses, multiple times.

  • Comprehension / eliminating misunderstanding – The text display helps correlate sounds with spelling (e.g., distinguishing misheard words like "know" vs. "no").
    -- Here you can see that a new learner might confuse 'no' and 'know'. Seeing the conversation in text can eliminate such misunderstandings.

  • Encouraging experimentation – Editing bot responses lets learners actively manipulate language.
    -- Once the user clicks on a response, it immediately becomes available in the input box. The user can then go to the 'text to speech' section and experiment with the response to explore pronunciation differences (if there are any).

This kind of learning through reinforcement, comprehension, and active experimentation is only made possible by this feature, increasing its potential benefit for the new learner.

I hope this clarifies why such a feature is necessary to bridge the gaps in this learning environment.

@quozl
Copy link
Contributor

quozl commented May 7, 2025

Thanks, looks nice, but I'm also unconvinced. The Speak activity is not for chatting. The Chat activity is for chatting. We should continue to focus on speech.

I'd like the Speak activity to lose the concept of previous entries or responses in order to focus on speech. Human speech does not have a history feature. The chatbot feature was a source of text when the activity is not being shared by multiple learners in a classroom using the Telepathy stack.

If there was a project to deploy a local LLM, it could be better designed as a new activity, without all the face, voice, speed, and sunglasses modes of Speak.

@amannaik247
Copy link
Contributor Author

If there was a project to deploy a local LLM, it could be better designed as a new activity, without all the face, voice, speed, and sunglasses modes of Speak.

There is a GSoC project that was put forth to make this chatbot more human like using an LLM.

I'd like the Speak activity to lose the concept of previous entries or responses in order to focus on speech.

Although the feature demo looks like a chat, the purpose of the feature is primarily to improve speech by bridging gaps where this activity falls short.
Maybe we could come up with a different way to implement the feature? But since the chatbot will gain LLM functionality soon, I thought this would be the best way to implement it.

@amannaik247
Copy link
Contributor Author

Please have a look at this feature's DEMO to see it in action:

@chimosky please have a look at this whole comment on why it is a necessary feature to bridge gaps within the activity.

@mebinthattil
Copy link

@quozl said:

If there was a project to deploy a local LLM, it could be better designed as a new activity, without all the face, voice, speed, and sunglasses modes of Speak.

Having an LLM for chatbot and TTS model for voice is a GSoC 25 project and I have been assigned to work on it.


@chimosky said:

Which reminds me, we lack an activity that can take speech and display text.

@MostlyKIGuess is working on this activity and has made a demo, but he said it works only on Python 3.9+, so he is currently trying to bring it to older Python versions. He suggested that, instead of supporting older Python versions, Sugar should just upgrade to Python 3.10+.


@quozl @chimosky - During one of the Sugar bi-weekly meetings, @amannaik247 showed this demo to both @walterbender and @pikurasa, and they liked the concept of having chatbot responses in the history, with a distinction between the student's questions and the bot's responses.

I also support this idea, but for a technical reason. Having previous questions and their answers lets us avoid generating an LLM response if the same question has already been answered before. Considering that generating an LLM response is a rather expensive operation, it might make sense to check the history for whether a question has been answered before. But I would like others' opinions on this as well.
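The caching idea above could be sketched roughly like this (a hypothetical illustration only; `generate` stands in for whatever expensive LLM call the GSoC project ends up using, and the normalization is an assumption):

```python
# Sketch: reuse a previous answer when the same question recurs,
# so the expensive generate() call runs at most once per question.

def make_cached_bot(generate):
    history = {}  # normalized question -> cached answer

    def ask(question):
        key = question.strip().lower()  # naive normalization (assumption)
        if key not in history:
            history[key] = generate(question)  # expensive LLM call
        return history[key]

    return ask

# Demonstration with a stand-in for the LLM:
calls = []
def fake_llm(q):
    calls.append(q)
    return "answer to " + q

ask = make_cached_bot(fake_llm)
ask("hello")
ask("hello")  # second call is served from the history, fake_llm runs once
```

In a real activity the cache would likely need a size bound and smarter matching than exact string equality, but this shows the cost-saving mechanism.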

@quozl
Copy link
Contributor

quozl commented May 12, 2025 via email

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 12, 2025

Thanks.

Please review the activities that provide a chat interface and
recommend which one should be used for developing your project. You
can see activity usage outlines at https://help.sugarlabs.org/

The activities are Chat, Polari, IRC, and Speak.

Of these, Chat is the most advanced and current.

Thank you! This information will be really helpful!

@quozl
I think the Chat activity's chat interface is a good one as well.
Can you explain what you mean by creating a separate fork for the chatbot to have visible responses?
Does that mean we don't add the bot responses to the user interface of the current Speak activity?

If that is the case, wouldn't it be good to make the response history visible (after making some changes to the UI to make it child-friendly), then have it reviewed by students and teachers to see if it is as helpful as intended?

Once we have the LLM working, we can definitely add it to the UI if the response is positive.

I am okay with any path we decide to take. Having a separate fork for the LLM-based chatbot until the end of the project is a good idea.

My advice on Python versions is to develop for the latest stable
release of Python, which at the moment is 3.13. If any changes are
required to Sugar Toolkit, they must be done anyway

Would it cause problems when we want to test some features with children or schools on their old version of Sugar, or until the Sugar Toolkit has been updated to accommodate Python 3.13?

@chimosky
Copy link
Member

Thanks, looks nice, but I'm also unconvinced. The Speak activity is not for chatting. The Chat activity is for chatting. We should continue to focus on speech. I'd like the Speak activity to lose the concept of previous entries or responses in order to focus on speech. Human speech does not have a history feature. The chatbot feature was a source of text when the activity is not being shared by multiple learners in a classroom using the Telepathy stack. If there was a project to deploy a local LLM, it could be better designed as a new activity, without all the face, voice, speed, and sunglasses modes of Speak.

I agree with you; after seeing the demo I still can't see how it helps.

"Editing bot responses lets learners actively manipulate language."

The bot response is certainly not edited, or am I missing something?

@chimosky
Copy link
Member

@MostlyKIGuess is working on this activity and has made a demo, but he said it works only for python 3.9 +, so he is currently trying to work on bringing it to older python versions, but suggested that instead of supporting older python versions, sugar should just upgrade to use python 3.10+

I don't see why you can't improve on that if someone else already has something for it.

I also support this idea, but for a technical reason. Having previous questions and their answers lets us avoid generating an LLM response if the same question has already been answered before. Considering that generating an LLM response is a rather expensive operation, it might make sense to check the history for whether a question has been answered before. But I would like others' opinions on this as well.

This doesn't exist yet, and the PR isn't a fix for something that's nonexistent.

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 13, 2025

The bot response is certainly not edited or am I missing something?

In the demo, when I open the conversation history and click on one of the bot's responses, it immediately becomes available in the input box for the user to edit.
They can also go to the text-to-speech tab and hear it again (or edit it), since the same input box is used in the text-to-speech tab.

[I have shown this in the demo: I switched from the chatbot tab to the text-to-speech tab and edited the bot's response.]

I agree with you, after seeing the demo I still can't see how it helps.

@chimosky If you feel this is the case, then I will respect your expertise and your decision about the feature, since you have a full understanding of what each activity in Sugar is about and its purpose.
Thank you for taking the effort and being patient with my reasoning.

I will try to think about other ways to address this learning gap in the activity. If you have any solution or idea in mind, please let me know! Thank you again!

@quozl
Copy link
Contributor

quozl commented May 13, 2025

I think the Chat activity allows this editing you seek. Perhaps you haven't used it yet because you need to have more than one instance of Sugar?

@chimosky
Copy link
Member

chimosky commented May 15, 2025

In the demo, when I open the conversation history and click on one of the bot's responses it immediately becomes available in the input box for the user to edit that response text.
Or go to the text to speech tab and hear it again(or edit) since the same input box gets used in the text to speech tab.

[I have shown this in the demo. I have switched from the chatbot tab to text to speech tab and edited the bot's response]

The bot response being edited means you can edit the bot response; what's available from the history isn't the bot response but user input, and that is what's edited in the demo you shared.

I will try to think about other ways to address this learning gap in the activity. If you have any solution or idea in mind, then please let me know! Thank you again!

Looking forward to your contributions, the speech to text idea seems interesting and something that'll be nice to have.

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 15, 2025

The bot response being edited means you can edit the bot response, what's also available from the history isn't the bot response but user input which is what's edited in the demo you shared.

In my code, I have separated the entries in the drop-up box into user input and bot response:

You: What are you upto?
Bot: I don't know what I am upto

In the demo I clicked on the bot's response ["I don't know what I am upto"] and edited it in the 'text to speech' tab.

Please check once again.
(I double-checked myself to make sure the video is correct, since the comment has two videos: a normal one and another with my branch's code.)

Looking forward to your contributions, the speech to text idea seems interesting and something that'll be nice to have.

Yes, will work on this and share the updates.

@chimosky
Copy link
Member

Please check once again.
(I double-checked myself to make sure the video is correct, since the comment has two videos: a normal one and another with my branch's code.)

No need for me to; the problem with that is there's an overload of information which isn't needed.

@amannaik247
Copy link
Contributor Author

No need for me to; the problem with that is there's an overload of information which isn't needed.

Understandable. We could work around it by color-coding, perhaps, to separate the user's messages from the bot's, making the history easily readable like a conversation.

This was just an idea to address the gap of misunderstood spellings.
We could probably come up with simpler ideas to deal with this.

@chimosky
Copy link
Member

chimosky commented May 17, 2025

Just to be clear, this isn't an idea we'll be pursuing.

@amannaik247
Copy link
Contributor Author

amannaik247 commented May 18, 2025

Just to be clear, this isn't an idea we'll be pursuing.

Understood. Thank you for taking the time to engage in this conversation.
I will look at other things that could be a good addition.

@pikurasa
Copy link

@walterbender and I saw a presentation of this, and it seemed like a good idea to us.
