
Using a trained CRAFT model to predict the outcome of a single prompt-reply conversation #61

Closed · akoen opened this issue Aug 6, 2020 · 5 comments

akoen commented Aug 6, 2020

Hi, I'm Alex, a first-year student at the University of British Columbia, trying to wrap my head around conversational analysis.

What I'm about to ask is way out of my depth, and I totally understand if you don't have the time or the energy to respond.

I'm trying to write a script that forecasts whether a conversation will derail, based on a user-entered response to a prompt drawn from the conversations-gone-awry corpus. To do so, I'm training a CRAFT forecaster on the CGA dataset and then using it to predict the outcome of the 'conversation' that I create.

However, when I train my model and run it on the conversation, I get the same prediction probability regardless of the response:

[screenshot: summarize output showing identical pred_score values for different replies]

Here is my best effort:

from convokit import Corpus, download, TextParser, Utterance, Speaker
from convokit import Forecaster
from convokit.forecaster.CRAFTModel import CRAFTModel
from convokit.forecaster.CRAFT import craft_tokenize

# Download the conversations-gone-awry dataset along with its dependency parses
DATA_CORPUS = 'conversations-gone-awry-corpus'
awry_corpus = Corpus(download(DATA_CORPUS))
awry_corpus.load_info('utterance', ['parsed'])

# Maximum number of tokens CRAFT considers per utterance; must be defined
# before text_func below refers to it
MAX_LENGTH = 80

# Build a Forecaster around the pretrained CRAFT model
craft_model = CRAFTModel(device_type='cpu', model_path='finetuned_model.tar')
forecaster = Forecaster(forecaster_model=craft_model,
                        forecast_mode="future",
                        convo_structure="linear",
                        text_func=lambda utt: utt.meta["tokens"][:(MAX_LENGTH - 1)],
                        label_func=lambda utt: int(utt.meta['comment_has_personal_attack']),
                        forecast_feat_name="prediction", forecast_prob_feat_name="pred_score",
                        use_last_only=True,
                        skip_broken_convos=False)

# Tokenize every utterance in the dataset with CRAFT's vocabulary
for utt in awry_corpus.iter_utterances():
    utt.add_meta("tokens", craft_tokenize(craft_model.voc, utt.text))

# Build a new two-utterance corpus: a random prompt from the dataset
# plus a user-entered reply
prompt_source = awry_corpus.random_utterance()
prompt_speaker = Speaker(id="prompt_speaker")
utt_prompt = Utterance(id='prompt', text=prompt_source.text, speaker=prompt_speaker,
                       conversation_id='0', reply_to=None, timestamp=0)

input_text = input("Please enter a response: ")
reply_speaker = Speaker(id="reply_speaker")
utt_reply = Utterance(id="reply", text=input_text, speaker=reply_speaker,
                      conversation_id='0', reply_to='prompt', timestamp=1)

convo_corpus = Corpus(utterances=[utt_prompt, utt_reply])

# Parse corpus text
ts = TextParser()
convo_corpus = ts.transform(convo_corpus)

for utt in convo_corpus.iter_utterances():
    utt.add_meta("tokens", craft_tokenize(craft_model.voc, utt.text))

# Forecast derailment for the new conversation
pred = forecaster.transform(convo_corpus)
forecaster.summarize(pred)

I really appreciate your time. I've taken this on as part of a paper for an English course, so this is way out of my league, but what you've built is really cool and I'd be overjoyed to get it working.

calebchiam assigned calebchiam and jpwchang and unassigned calebchiam Aug 6, 2020

jpwchang (Collaborator) commented Aug 6, 2020

Hey Alex,

Thanks for raising this issue! It turns out there is a bug in the Forecaster's behavior regarding the final comment of a conversation, which is why you're seeing odd behavior in your example: your input comment is the final comment of its conversation. Please bear with us as we get that patched up - we should have a hotfix deployed by the end of the week!

jpwchang (Collaborator) commented Aug 6, 2020

@akoen A hotfix for this issue has been published! If you update your installation of ConvoKit, you should now get the results you expect. IMPORTANT: as part of this update, the arguments forecast_feat_name and forecast_prob_feat_name have been renamed to forecast_attribute_name and forecast_prob_attribute_name, respectively, so you will need to update your code accordingly.
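
For reference, here's a minimal sketch of the updated constructor call; only the two renamed keyword arguments change, and everything else matches the snippet above:

forecaster = Forecaster(forecaster_model=craft_model,
                        forecast_mode="future",
                        convo_structure="linear",
                        text_func=lambda utt: utt.meta["tokens"][:(MAX_LENGTH - 1)],
                        label_func=lambda utt: int(utt.meta['comment_has_personal_attack']),
                        forecast_attribute_name="prediction",       # was forecast_feat_name
                        forecast_prob_attribute_name="pred_score",  # was forecast_prob_feat_name
                        use_last_only=True,
                        skip_broken_convos=False)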

Here's a quick demonstration I did using the code snippet you provided above, showing that the prediction changes (as expected) when changing from a positive reply to a rude reply:

Please enter a response: That's a great idea!
Iteration: 1; Percent complete: 100.0%
        prediction  pred_score
utt_id
reply          0.0    0.175107

Please enter a response: Seriously? That's an idiotic idea.
Iteration: 1; Percent complete: 100.0%
        prediction  pred_score
utt_id
reply          1.0    0.580046
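
If you'd rather read the result programmatically than through summarize, transform writes the forecast into each utterance's metadata under the attribute names configured above, so (assuming that default behavior) something like this should work:

# Read the forecast off the reply utterance's metadata;
# the keys match forecast_attribute_name / forecast_prob_attribute_name
reply = pred.get_utterance("reply")
print(reply.meta["prediction"], reply.meta["pred_score"])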

akoen (Author) commented Aug 6, 2020

This is about the absolute best possible reply I could have received to my comment. Thanks @jpwchang, you're the man.

akoen closed this as completed Aug 6, 2020

calebchiam (Collaborator) commented

@all-contributors please add @akoen for bug

allcontributors (Contributor) commented

@calebchiam

I've put up a pull request to add @akoen! 🎉
