<div class="alert alert-block alert-danger">

# Politeness Strategies Extraction and Quantification
    
#### Student Name: Chak Shing, Yuen
#### Student ID: 32331371

Date: 14-May-2024

Environment: Google Colab (Python 3, Google Compute Engine backend)

<div class="alert alert-block alert-info">
    
## Table of Contents

</div>

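[1. ConvoKit: Politeness Strategies Extraction and Classifier](#1) <br>
$\;\;\;\;$[1.1. Import Libraries](#11) <br>
$\;\;\;\;$[1.2. Load the data](#12) <br>
$\;\;\;\;$[1.3. Functions](#13) <br>
$\;\;\;\;$[1.4. Process the DataFrame and add the extracted politeness strategies](#14) <br>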

<div class="alert alert-block alert-success">
    
# 1. ConvoKit: Politeness Strategies Extraction and Classifier <a class="anchor" name="1"></a>

<div class="alert alert-block alert-success">
    
## 1.1. Import Libraries <a class="anchor" name="11"></a>

In [1]:
# pip install convokit

In [2]:
import pandas as pd
from convokit import Classifier, Corpus, Utterance, Speaker, download
from convokit import TextParser, PolitenessStrategies

<div class="alert alert-block alert-success">
    
## 1.2. Load the data <a class="anchor" name="12"></a>

In [3]:
# Mount the Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
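# Download the Wikipedia portion of the annotated politeness data, parse it, and extract politeness strategies (markers=True also stores the strategy markers)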
wiki_corpus = Corpus(download("wikipedia-politeness-corpus"))
parser = TextParser(verbosity=1000)
wiki_corpus = parser.transform(wiki_corpus)
ps = PolitenessStrategies()
wiki_corpus = ps.transform(wiki_corpus, markers=True)

Dataset already exists at /root/.convokit/downloads/wikipedia-politeness-corpus
1000/4353 utterances processed
2000/4353 utterances processed
3000/4353 utterances processed
4000/4353 utterances processed
4353/4353 utterances processed
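`PolitenessStrategies` turns each parsed utterance into a dictionary of per-strategy features (the `feature_politeness_==Please==` key in the prediction output further down is one of them). ConvoKit's real extractor works on dependency parses with its own lexicons and rules. The block below is only a toy, stdlib-only sketch of the same idea, to show the shape of the output: the `TOY_LEXICONS` word lists and both function names are hypothetical, not ConvoKit's.

```python
import re

# Hypothetical, heavily simplified lexicons for illustration only.
# ConvoKit's PolitenessStrategies uses its own lexicons plus dependency
# parses from TextParser; these word lists are assumptions, not its rules.
TOY_LEXICONS = {
    "please": {"please"},
    "gratitude": {"thanks", "thank", "appreciate"},
    "apology": {"sorry", "apologize", "apologies"},
    "hedge": {"maybe", "perhaps", "possibly", "think"},
}


def _tokens(text):
    """Lower-case word tokens (a crude stand-in for a real parse)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def toy_politeness_features(text):
    """Return a 0/1 flag per toy strategy, mimicking the dict-of-features shape."""
    tokens = _tokens(text)
    return {name: int(bool(tokens & words)) for name, words in TOY_LEXICONS.items()}


def toy_politeness_markers(text):
    """Return the matched words per toy strategy (similar in spirit to markers=True)."""
    tokens = _tokens(text)
    return {name: sorted(tokens & words) for name, words in TOY_LEXICONS.items()}


print(toy_politeness_features("Could you please check this? Thanks!"))
# → {'please': 1, 'gratitude': 1, 'apology': 0, 'hedge': 0}
```

The real features are richer (for example, position-sensitive strategies), but the output is still one dictionary per utterance, which is what later gets attached to each DataFrame row.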


In [5]:

# Load the forum dataset with ISO-8859-1 (Latin-1) encoding, because the file contains bytes that are not valid UTF-8
# forum = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Research/stanfordMOOCForumPostsSet.txt', delimiter='\t', encoding='ISO-8859-1')
# Only the first 1,000 posts are loaded for this run
forum_1000 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Research/stanfordMOOCForumPostsSet.txt', delimiter='\t', encoding='ISO-8859-1', nrows=1000)
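The `encoding='ISO-8859-1'` choice can be illustrated with a single byte. The snippet below uses a made-up byte string (not taken from the dataset) to show that a byte such as `0xE9` fails under UTF-8 but always decodes under Latin-1.

```python
# Byte 0xE9 is "é" in ISO-8859-1 (Latin-1), but in UTF-8 it starts a
# three-byte sequence, and the following space is not a valid continuation byte.
raw = b"caf\xe9 au lait"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte 0x00-0xFF to a code point, so decoding never fails.
latin1_text = raw.decode("ISO-8859-1")

print(utf8_ok)      # False
print(latin1_text)  # café au lait
```

Because Latin-1 accepts every byte, it never raises. The trade-off is that if the file were actually in another encoding (for example UTF-8 or Windows-1252), some characters would be silently mis-decoded instead of being reported.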

<div class="alert alert-block alert-success">
    
## 1.3. Functions <a class="anchor" name="13"></a>

In [6]:
def create_corpus_from_dataframe(df, column_name):
    """
    Creates a ConvoKit Corpus from a DataFrame column, with one Utterance and one dummy Speaker per row (both keyed by the row index).

    Parameters:
    - df: DataFrame containing the data.
    - column_name: String, name of the column to process.

    Returns:
    - ConvoKit Corpus object.
    """
    speakers = {str(i): Speaker(id=str(i)) for i in df.index}  # Create a dummy speaker for each entry
    utterances = []
    for index, row in df.iterrows():
        speaker = speakers[str(index)]
        utterances.append(Utterance(id=str(index), text=str(row[column_name]), speaker=speaker))
    return Corpus(utterances=utterances)
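`create_corpus_from_dataframe` keys every Utterance and its dummy Speaker by `str(index)` and converts the cell with `str(...)`. That has two consequences: a non-unique index makes ids collide, and a missing post (NaN) becomes the literal text `"nan"`. A stdlib-only sketch of that mapping, using a hypothetical helper `rows_to_utterance_records` that adds a uniqueness check and, as an assumption beyond the original, maps NaN to an empty string:

```python
import math


def rows_to_utterance_records(index, texts):
    """Stdlib mirror of the id/text mapping in create_corpus_from_dataframe.

    Hypothetical helper: `index` plays the role of df.index and `texts` the
    chosen column. Each row yields an utterance id, a dummy speaker id (the
    same string, as in the notebook) and the text.
    """
    ids = [str(i) for i in index]
    if len(set(ids)) != len(ids):
        # Duplicate ids would collide (the speakers dict keeps one entry per id).
        raise ValueError("DataFrame index must be unique to build utterance ids")
    records = []
    for utt_id, text in zip(ids, texts):
        # str(float('nan')) == 'nan': the notebook would parse a missing post as
        # the word "nan". Mapping it to "" here is an assumption, not the original.
        if isinstance(text, float) and math.isnan(text):
            text = ""
        records.append({"id": utt_id, "speaker": utt_id, "text": str(text)})
    return records


print(rows_to_utterance_records([0, 1], ["Thanks for the help!", float("nan")]))
```

In pandas terms, the same checks would be `df.index.is_unique` and `df[column_name].isna()`.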

In [7]:
def apply_politeness_strategies(corpus):
    """
    Applies text parsing and politeness strategies to the corpus.

    Args:
    corpus (convokit.Corpus): The corpus to process.

    Returns:
    convokit.Corpus: The processed corpus with politeness strategies.
    """
    # Initialize and apply the text parser
    parser = TextParser(verbosity=1000)
    corpus = parser.transform(corpus)

    # Apply politeness strategies
    ps = PolitenessStrategies()
    return ps.transform(corpus)
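`apply_politeness_strategies` chains two ConvoKit transformers in a fixed order: the parser runs first because the politeness extractor reads the parse that `TextParser` attaches to each utterance. The toy sketch below mirrors that two-stage `transform` chaining with hypothetical stand-in classes (`ToyUtterance`, `ToyParser`, `ToyPoliteness`) and a whitespace "parse". The `parsed` metadata key is an illustrative choice.

```python
class ToyUtterance:
    """Hypothetical stand-in for an utterance with text and a metadata dict."""

    def __init__(self, text):
        self.text = text
        self.meta = {}


class ToyParser:
    """Stage 1 (stands in for TextParser): store a token list in meta['parsed']."""

    def transform(self, utterances):
        for utt in utterances:
            utt.meta["parsed"] = utt.text.lower().split()
        return utterances


class ToyPoliteness:
    """Stage 2 (stands in for PolitenessStrategies): depends on stage-1 output."""

    def transform(self, utterances):
        for utt in utterances:
            if "parsed" not in utt.meta:
                raise KeyError("run the parser before extracting strategies")
            tokens = utt.meta["parsed"]
            utt.meta["politeness_strategies"] = {"please": int("please" in tokens)}
        return utterances


def apply_pipeline(utterances, steps):
    """Apply each transformer in order, as apply_politeness_strategies does."""
    for step in steps:
        utterances = step.transform(utterances)
    return utterances


utts = apply_pipeline([ToyUtterance("Please help"), ToyUtterance("Help")], [ToyParser(), ToyPoliteness()])
print([u.meta["politeness_strategies"] for u in utts])  # [{'please': 1}, {'please': 0}]
```

Swapping the order of the two steps raises the `KeyError`, which is the toy equivalent of extracting strategies from an unparsed corpus.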


In [8]:
def process_column(df, column_name):
    """
    Process the specified column of a DataFrame to add politeness strategies.

    Parameters:
    - df: DataFrame containing the data.
    - column_name: String, name of the column to process.

    Returns:
    - The same DataFrame (modified in place) with an added 'politeness_strategies' column holding each row's politeness feature dictionary.
    """
    corpus = create_corpus_from_dataframe(df, column_name)
    corpus = apply_politeness_strategies(corpus)

    # Retrieve politeness strategies and add them to the DataFrame
    df['politeness_strategies'] = df.apply(lambda row: corpus.get_utterance(str(row.name)).meta['politeness_strategies'], axis=1)
    return df
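`process_column` looks up each row's utterance with `corpus.get_utterance(str(row.name))`, so the features are joined back to rows by id rather than by the order in which the corpus happens to store utterances. A stdlib sketch of that id-based join, with a hypothetical helper `align_by_id` and a plain dict standing in for the corpus:

```python
def align_by_id(index, meta_by_id, key):
    """Return meta_by_id[str(i)][key] for each i in index, preserving row order.

    Hypothetical helper: `meta_by_id` stands in for the corpus
    (utterance id -> metadata dict), `index` for df.index.
    """
    return [meta_by_id[str(i)][key] for i in index]


# The storage order deliberately differs from the DataFrame row order.
meta_by_id = {
    "2": {"politeness_strategies": {"please": 0}},
    "0": {"politeness_strategies": {"please": 1}},
    "1": {"politeness_strategies": {"please": 0}},
}
column = align_by_id([0, 1, 2], meta_by_id, "politeness_strategies")
print(column)  # [{'please': 1}, {'please': 0}, {'please': 0}]
```

Building the column from the storage order instead would give row 0 the features of utterance "2", which is why the lookup goes through the id.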

In [9]:
def train_politeness_classifier(corpus):
    """
    Trains a classifier on a corpus that has politeness strategies applied.

    Args:
    corpus (convokit.Corpus): The corpus to train on.

    Returns:
    convokit.Classifier: The trained classifier.
    """
    classifier = Classifier(obj_type='utterance',
                            pred_feats=['politeness_strategies'],
                            labeller=lambda utt: utt.meta['Binary'] == 1)
    classifier.fit(corpus)
    return classifier
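The training cell later in this notebook reports that `Classifier` initialises a standard-scaled logistic regression by default. The labeller treats only `Binary == 1` as the positive (polite) class, so every other `Binary` value becomes the negative class. The from-scratch sketch below shows that idea on hypothetical toy data: z-score scaling followed by logistic regression fitted with plain batch gradient descent. It is not ConvoKit's implementation. In particular, it has no regularisation, which library logistic-regression models usually add by default.

```python
import math


def labeller(meta):
    """Same rule as the notebook: only Binary == 1 is the positive (polite) class."""
    return meta["Binary"] == 1


def standardize(rows):
    """Column-wise z-scores; returns the scaled rows and (mean, std) per column."""
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [[(r[j] - m) / s for j, (m, s) in enumerate(stats)] for r in rows]
    return scaled, stats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(x, w, b, stats):
    """Scale a raw feature vector with the training statistics, then score it."""
    z = sum(wj * (xj - m) / s for wj, xj, (m, s) in zip(w, x, stats)) + b
    return sigmoid(z)


# Hypothetical toy data: [please, gratitude] flags plus a 'Binary' annotation.
data = [
    ([1, 1], {"Binary": 1}),
    ([1, 0], {"Binary": 1}),
    ([0, 1], {"Binary": 1}),
    ([0, 0], {"Binary": -1}),
    ([0, 0], {"Binary": 0}),
]
X, stats = standardize([x for x, _ in data])
y = [int(labeller(meta)) for _, meta in data]
w, b = fit_logistic(X, y)
print(round(predict_proba([1, 0], w, b, stats), 3))
```

Scaling statistics are computed on the training data only and reused at prediction time. This is the point of keeping `stats` separate from the weights.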

In [10]:
def predict_politeness(df, text_column, classifier):
    """
    Predicts binary politeness labels for the texts in a DataFrame using a trained classifier.

    Args:
    df (pd.DataFrame): DataFrame containing the text data.
    text_column (str): The column in the DataFrame containing the text.
    classifier (convokit.Classifier): A trained ConvoKit classifier.

    Returns:
    pd.DataFrame: The same DataFrame (modified in place) with an added 'politeness_classifier' column holding the predicted label (1 = polite, 0 = not polite).
    """
    # Convert DataFrame to corpus and apply politeness strategies
    corpus = create_corpus_from_dataframe(df, text_column)
    corpus = apply_politeness_strategies(corpus)

    # Predict using the classifier
    corpus = classifier.transform(corpus)

    # Look up each prediction by utterance id so it lines up with the DataFrame rows
    df['politeness_classifier'] = [corpus.get_utterance(str(i)).meta['prediction'] for i in df.index]
    return df
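`predict_politeness` keeps only the hard label stored under `prediction`. ConvoKit's `Classifier` also writes a probability-style score to the utterance metadata (to our understanding under `pred_score` by default; verify the name against the ConvoKit docs). The following is a minimal sketch, assuming a 0.5 cut-off and hypothetical scores, of how scores become 0/1 labels and how the share of posts predicted polite can then be computed. The helper names are hypothetical.

```python
def to_labels(scores, threshold=0.5):
    """Turn probability scores into hard labels: 1 if strictly above threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def polite_share(labels):
    """Fraction of posts predicted polite (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical scores for five posts.
scores = [0.12, 0.48, 0.51, 0.90, 0.30]
labels = to_labels(scores)
print(labels)                # [0, 0, 1, 1, 0]
print(polite_share(labels))  # 0.4
```

Keeping the scores rather than only the labels would allow the threshold to be moved later without re-running the parser and classifier.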

<div class="alert alert-block alert-success">
    
## 1.4. Process the DataFrame and add the extracted politeness strategies <a class="anchor" name="14"></a>

In [11]:
# Process the 'Text' column
processed_data = process_column(forum_1000, 'Text')
processed_data.info()
# processed_data.head(5)

1000/1000 utterances processed
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Text                   1000 non-null   object 
 1   Opinion(1/0)           1000 non-null   int64  
 2   Question(1/0)          1000 non-null   int64  
 3   Answer(1/0)            1000 non-null   int64  
 4   Sentiment(1-7)         1000 non-null   float64
 5   Confusion(1-7)         1000 non-null   float64
 6   Urgency(1-7)           1000 non-null   float64
 7   CourseType             1000 non-null   object 
 8   forum_post_id          1000 non-null   object 
 9   course_display_name    1000 non-null   object 
 10  forum_uid              1000 non-null   object 
 11  created_at             1000 non-null   object 
 12  post_type              1000 non-null   object 
 13  anonymous              1000 non-null   bool   
 14  anonymous_to_peers     100

In [12]:
# Train a politeness classifier on the annotated Wikipedia corpus
classifier = train_politeness_classifier(wiki_corpus)

# Predict polite (1) vs. not polite (0) for each post; this adds a 'politeness_classifier' column to processed_data in place
predicted_df = predict_politeness(processed_data, 'Text', classifier)
predicted_df.head(5)

Initialized default classification model (standard scaled logistic regression).
1000/1000 utterances processed


| | Text | Opinion(1/0) | Question(1/0) | Answer(1/0) | Sentiment(1-7) | Confusion(1-7) | Urgency(1-7) | CourseType | forum_post_id | course_display_name | forum_uid | created_at | post_type | anonymous | anonymous_to_peers | up_count | comment_thread_id | reads | politeness_strategies | politeness_classifier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Interesting! How often we say those things to ... | 1 | 0 | 0 | 6.5 | 2.0 | 1.5 | Education | 5225177f2c501f0a00000015 | Education/EDUC115N/How_to_Learn_Math | 30CADB93E6DE4711193D7BD05F2AE95C | 9/2/2013 22:55 | Comment | False | False | 0 | 5221a8262cfae31200000001 | 41 | {'feature_politeness_==Please==': 0, 'feature_... | 0 |
| 1 | What is \Algebra as a Math Game\" or are you j... | 0 | 1 | 0 | 4.0 | 5.0 | 3.5 | Education | 5207d0e9935dfc0e0000005e | Education/EDUC115N/How_to_Learn_Math | 37D8FAEE7D0B94B6CFC57D98FD3D0BA5 | 8/11/2013 17:59 | Comment | False | False | 0 | 520663839df35b0a00000043 | 55 | {'feature_politeness_==Please==': 0, 'feature_... | 0 |
| 2 | I like the idea of my kids principal who says ... | 1 | 0 | 0 | 5.5 | 3.0 | 2.5 | Education | 52052c82d01fec0a00000071 | Education/EDUC115N/How_to_Learn_Math | CC11480215042B3EB6E5905EAB13B733 | 8/9/2013 17:53 | Comment | False | False | 0 | 51e59415e339d716000001a6 | 25 | {'feature_politeness_==Please==': 0, 'feature_... | 0 |
| 3 | From their responses, it seems the students re... | 1 | 0 | 0 | 6.0 | 3.0 | 2.5 | Education | 5240a45e067ebf1200000008 | Education/EDUC115N/How_to_Learn_Math | C717F838D10E8256D7C88B33C43623F1 | 9/23/2013 20:28 | CommentThread | False | False | 0 | | 0 | {'feature_politeness_==Please==': 0, 'feature_... | 0 |
| 4 | The boys loved math, because \there is freedom... | 1 | 0 | 0 | 7.0 | 2.0 | 3.0 | Education | 5212c5e2dd10251500000062 | Education/EDUC115N/How_to_Learn_Math | F83887D68EA48964687C6441782CDD0E | 8/20/2013 1:26 | CommentThread | False | False | 0 | | 3 | {'feature_politeness_==Please==': 0, 'feature_... | 0 |
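Each row's `politeness_strategies` cell is a dictionary of per-strategy values (0/1 flags or counts), so quantifying strategy use across posts amounts to summing those dictionaries. Below is a stdlib sketch that uses a hypothetical helper `summarize_strategies` with shortened keys and made-up rows. In the notebook, its input would be `processed_data['politeness_strategies'].tolist()`.

```python
from collections import Counter


def summarize_strategies(feature_dicts):
    """Total value and share of posts using each strategy (value > 0).

    `feature_dicts` stands in for processed_data['politeness_strategies'].tolist().
    """
    totals = Counter()
    used = Counter()
    for feats in feature_dicts:
        for name, value in feats.items():
            totals[name] += value  # also creates the key when value == 0
            used[name] += int(value > 0)
    n = len(feature_dicts)
    return {name: {"total": totals[name], "share": used[name] / n} for name in totals}


# Hypothetical feature dicts for three posts (keys shortened for illustration).
rows = [
    {"==Please==": 1, "==Gratitude==": 1},
    {"==Please==": 0, "==Gratitude==": 2},
    {"==Please==": 0, "==Gratitude==": 0},
]
summary = summarize_strategies(rows)
print(summary["==Please=="])  # {'total': 1, 'share': 0.3333333333333333}
```

Strategies that never fire still appear with a total of 0, so the summary keeps every feature key that ConvoKit emitted.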
