<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Set-Up" data-toc-modified-id="Set-Up-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Set Up</a></span></li><li><span><a href="#QE4b" data-toc-modified-id="QE4b-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>QE4b</a></span><ul class="toc-item"><li><span><a href="#General-Cleaning" data-toc-modified-id="General-Cleaning-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>General Cleaning</a></span></li><li><span><a href="#N-Grams" data-toc-modified-id="N-Grams-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>N-Grams</a></span></li><li><span><a href="#Key-Word-Analysis" data-toc-modified-id="Key-Word-Analysis-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Key Word Analysis</a></span></li><li><span><a href="#Combinations?" data-toc-modified-id="Combinations?-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Combinations?</a></span></li><li><span><a href="#Try-Delimiting?---Online" data-toc-modified-id="Try-Delimiting?---Online-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Try Delimiting? - Online</a></span></li><li><span><a href="#Try-Delimiting?---Card" data-toc-modified-id="Try-Delimiting?---Card-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span>Try Delimiting? - Card</a></span></li><li><span><a href="#Try-Delimiting?---Business" data-toc-modified-id="Try-Delimiting?---Business-2.7"><span class="toc-item-num">2.7&nbsp;&nbsp;</span>Try Delimiting? - Business</a></span></li></ul></li></ul></div>

This notebook is an analysis of two questions from the 2020 Cash Alternative Survey (Wave 2): 

**E4b** Please tell us the reasons why you have changed the way you pay. What concerns, if any, do you have?

**E3** Asks for general comments

# Set Up

**Libraries**

In [1]:
import pandas as pd
import numpy as np
import collections
from collections import Counter
import re
import string
from spellchecker import SpellChecker
from langdetect import detect
# langdetect is non-deterministic and can give different results for short or ambiguous text;
# setting the seed below makes the results reproducible
from langdetect import DetectorFactory
DetectorFactory.seed = 0
import math

# NLTK library
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk import ngrams
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = WordNetLemmatizer()

# General
import os
from os import path

# Visualization
from IPython.display import display # display allows for >1 output per cell
from tabulate import tabulate

# Silence pandas' SettingWithCopyWarning
pd.options.mode.chained_assignment = None  # default='warn'

**Files**

In [2]:
input_files_path = "C:\\MOP_Survey\\trunk\\Methods-of-Payment surveys\\2020\\CAS Wave 2\\Data\\Final data files\\20-054726-01-09-Diary_FNL.dta"
mop_2020 = pd.read_stata(input_files_path)

**Global Variables**

In [16]:
# The list of stopwords that will be removed
stop_words = stopwords.words('english')
stop_words.extend(stopwords.words('french'))

# Keep the following words out of the stopword list, so they stay in the
# responses (removing them can change the meaning of a response)
include = ["don't","won't"]
stop_words = [x for x in stop_words if x not in include]
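
Because punctuation is stripped during cleaning, a contraction such as "don't" appears as "dont" in the cleaned responses. The sketch below is only an illustration under assumptions: `base_stop_words` is a small hypothetical list standing in for the NLTK lists, and `keep_words` is a hypothetical name. It shows how a keep-list could cover both spellings; whether the apostrophe-free form is present in a given stopword list depends on that list.

```python
# Hypothetical miniature stopword list standing in for stopwords.words('english')
base_stop_words = ["i", "the", "to", "a", "don't", "won't", "dont"]

# Words to keep in the responses, with and without the apostrophe
keep_words = ["don't", "won't"]
keep_words += [w.replace("'", "") for w in keep_words]

# Stopword list with the keep-list taken out
filtered_stop_words = [w for w in base_stop_words if w not in keep_words]

sentence = "i dont like to use the card"
kept = [w for w in sentence.split() if w not in filtered_stop_words]
print(kept)  # ['dont', 'like', 'use', 'card']
```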

**Helper Functions**

In [278]:
def remove_stopwords(df, col_name, stop_words):
    """
    Removes all words in stop_words from df['col_name']
    """
    df[col_name] = df[col_name].apply(
        lambda x: [item for item in x if item not in stop_words])


def remove_punctuations(text):
    """
    Adapted from: https://stackoverflow.com/questions/39782418/remove-punctuations-in-pandas/39782973
    """
    for punctuation in string.punctuation:
        if punctuation != "'":
            text = text.replace(punctuation, '')
    return text


def response_to_word(col_list):
    """
    Given a col_list, e.g. ["covid", "i have gone out less", ...], returns a list of words, i.e. ["covid", "i", ...]
    """

    # list to store all words
    complete_word_list = []
    # Iterate through each response
    for response in col_list:
        # Convert the response to a list of words; the optional group lets
        # one-letter words such as "i" match, and apostrophes are kept inside words
        word_list = re.findall(r"\w(?:[\w']*\w)?", response)
        # Extend complete_word_list with this list of words
        complete_word_list.extend(word_list)
    return complete_word_list


def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)]


def get_ngrams_discern(sentence_list, n):
    """
    Returns a Counter of all ngrams in sentence_list and their frequencies.
    Ngrams never span two responses, and each ngram is counted at most
    once per response.

    Parameters:
        sentence_list (list): a list of response strings ~ ["response 1", "response 2", ...]
        n (int): the number of adjacent words to look for
    Returns:
        ngram_counter (Counter): containing all ngrams in sentence_list and frequencies.
    """
    # List to store all ngrams
    n_grams_list = []

    # Iterate through each sentence
    for i in range(len(sentence_list)):

        # Create the ngram for the current sentence
        n_grams = ngrams(sentence_list[i].split(), n)

        # Create a Counter to count the number of times each ngram occurs in n_grams
        n_grams_frequencies = Counter(n_grams)

        # Iterate through all the ngrams of the response
        for ngram in n_grams_frequencies:

            n_grams_list.append(ngram)

    # Create Counter where key:value pairs represent word:frequency
    return Counter(n_grams_list)


def tabulate_counter(counter, n):
    """
    Prints the n most common items in counter as a table
    """
    most_common = counter.most_common(n)
    most_common_table = []

    for i in most_common:
        row = []
        row.append(i[0])
        row.append(i[1])
        most_common_table.append(row)

    print(tabulate(most_common_table))


def print_query_results(search_terms, col_name, df, num):
    """
    Prints up to num rows of df[col_name] that contain at least one term from search_terms

    search_terms: a list of search words
    col_name: the column name that is searched
    df: the dataframe that is searched
    """
    query = df[df[col_name].str.contains(
        '|'.join(search_terms))]

    counter = 0
    for index, row in query.iterrows():
        print(f"{row[col_name]}\n")
        counter += 1
        if counter == num:
            break

def query_num(search_terms, col_name, df):
    """
    Counts the rows of df[col_name] that contain at least one term from search_terms (a list of strings)
    """
    query = df[df[col_name].str.contains(
        '|'.join(search_terms))]

    return query.shape[0]

def categorize(search_terms, search_col, category_col, df):
    """
    Sets value in category_col to 1 if search_col contains at least one of search_terms

    search_terms: list of search terms
    search_col: string name of column that is searched
    category_col: string name of column that is updated
    df: dataframe
    """
    df[category_col] = np.where(
        df[search_col].str.contains('|'.join(search_terms)), 1, df[category_col])
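
To make the counting rule of `get_ngrams_discern` concrete, here is a minimal standard-library sketch of the same idea: an n-gram is counted at most once per response, so the totals are numbers of responses rather than numbers of occurrences. The function name `ngrams_once_per_response` is hypothetical, and `zip` is used in place of `nltk.ngrams`.

```python
from collections import Counter

def ngrams_once_per_response(responses, n):
    """Count each n-gram at most once per response."""
    counts = Counter()
    for response in responses:
        words = response.split()
        # zip over n shifted copies of the word list yields consecutive n-grams;
        # the set removes repeats within a single response
        grams = set(zip(*(words[i:] for i in range(n))))
        counts.update(grams)
    return counts

responses = ["less cash less cash", "use less cash"]
bigrams = ngrams_once_per_response(responses, 2)
print(bigrams[("less", "cash")])  # 2: once for each response
print(bigrams[("cash", "less")])  # 1: only in the first response
```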

# QE4b

## General Cleaning
- Drop empty rows
- Convert to lower case
- Remove punctuation
- Remove stopwords
- Lemmatize
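
The first three steps can be sketched on a few made-up responses using only the standard library. This is an illustration under assumptions, not the notebook's own implementation: `clean_responses` is a hypothetical helper, and stopword removal and lemmatization are left out because they need the NLTK data files.

```python
import re

def clean_responses(responses):
    """Drop empty responses, strip punctuation, and convert to lower case."""
    cleaned = []
    for response in responses:
        if response is None or not response.strip():
            continue  # skip missing or whitespace-only responses
        text = re.sub(r"[^\w\s]", "", response).lower()
        cleaned.append(" ".join(text.split()))  # collapse repeated whitespace
    return cleaned

sample = ["COVID-19, mostly!", " ", None, "I don't use cash."]
print(clean_responses(sample))  # ['covid19 mostly', 'i dont use cash']
```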

In [63]:
# Create a dataframe that only includes the column of interest
E4b = mop_2020[["QE4b"]]

# Check for empty rows:
num_empty = len(E4b[(E4b["QE4b"] == "") | (
    E4b["QE4b"] == " ") | (E4b["QE4b"].isna())].index)
print(f"Number of empty rows: {num_empty}")

# Drop empty rows
E4b = E4b[(E4b["QE4b"] != "") & (
    E4b["QE4b"] != " ") & (E4b["QE4b"].notna())]

print(f"Number of remaining rows: {len(E4b['QE4b'].index)}")

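# Remove punctuation and convert to lower case
# (regex=True is explicit because the pattern is a regular expression)
E4b["QE4b"] = E4b["QE4b"].str.replace(
    r"[^\w\s]", "", regex=True).str.lower()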

# Lemmatize
# Create a list to save the lemmatized sentences
E4b_sentence_list = []

# Iterate through each response in E4b["QE4b"]
for sentence in E4b["QE4b"].tolist():

    # List to store lemmatized words
    sentence_lemmatized = []

    # Iterate through each word in sentence
    for word in sentence.split():

        if word == "less":

            # Append "less" to sentence_lemmatized
            sentence_lemmatized.append(word)

        # Append the lemmatized word to sentence_lemmatized
        else:
            sentence_lemmatized.append(lemmatizer.lemmatize(word))

    # Append the lemmatized sentence (converted back into one string) to E4b_sentence_list
    E4b_sentence_list.append(" ".join(sentence_lemmatized))

Number of empty rows: 101574
Number of remaining rows: 787



Below, misspelled words are corrected manually:

In [64]:
# Create one string "i dont like.." from a list of strings ["i dont like...","this is...",""]
E4b_one_string =" ".join(E4b_sentence_list)
# Create a list of words ["i", "dont", "like"...] from one string "i dont like.."
E4b_word_list = E4b_one_string.split()

In [65]:
# Init the SpellChecker
spell = SpellChecker(language="en") 

# Generate a list of potentially misspelled words
E4b_word_list_misspelled = spell.unknown(E4b_word_list)

# Sort E4b_word_list_misspelled alphabetically (a-z)
E4b_word_list_misspelled = sorted(E4b_word_list_misspelled)

# The commented-out loop below was used to detect the misspellings; it prints
# each unknown word alongside the SpellChecker library's most likely correction
# for word in E4b_word_list_misspelled:
#     print(f"word: {word} | most likely: {spell.correction(word)}")

# # Create a list to save the corrected spellings
# E4b_sentence_list = []

for i in range(len(E4b_sentence_list)):
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("aceepted", "accepted")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("almoat", "almost")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("aslo", "also")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("atm's", "atms")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("atmsbanks", "atms banks")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("avlid", "avoid")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("bancrupupt", "bankrupt")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("bettertap", "better tap")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("cardtap", "card tap")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("cashand", "cash and")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("cashcoins", "cash coins")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace(
        "cashlessdistancing", "cashless distancing")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("chrismas", "christmas")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace(
        "concernseasyconvenient", "concerns easy convenient")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("contactlesstap", "contactless tap")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("coronvairus", "coronavirus")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("covid019", "covid")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("covid19", "covid")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("covidf", "covid")
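    # Replace the longer "creditdebittap" first; otherwise "creditdebit" matches inside it and leaves "credit debittap"
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("creditdebittap", "credit debit tap")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("creditdebit", "credit debit")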
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("currencyi", "currency")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("debitcredit", "credit debit")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("debitmastercard", "debit mastercard")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("debtsi", "debit")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("essentiels", "essentials")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("evenchoice", "even choice")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("foodrestaurants", "food restaurants")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("factorlike", "factor like")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("inconvinent", "inconvenient")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("limitthis", "limit this")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("makong", "making")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("nocash", "no cash")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("oftensome", "often some")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("onlinepreordering", "online preordering")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("onormally", "normally")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("onlone", "online")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("pandamic", "pandemic")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("pandemy", "pandemic")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("payandgo", "pay and go")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("paymentspurchase", "payment purchase")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("phonefind", "phone find")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("phoneto", "phone to")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("pndemic", "pandemic")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("retaliers", "retailers")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("returnsbetter", "returns better")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("secureand", "secure and")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("selectionat", "selection at")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("selfcheckouts", "self checkouts")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("servicedelivery", "service delivery")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("shoppay", "shopping")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("shoppingpayments", "shopping payments")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("smallfrequent", "small frequent")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("transactionsat", "transactions at")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("transactions", "more")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("thisthere", "this there")
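
As an alternative to a chain of `replace` calls, the corrections could be kept in a dictionary and applied in one pass. The sketch below is a hedged illustration on a hypothetical subset of the corrections. Unlike `str.replace`, which matches substrings, it matches whole words only, so it behaves differently wherever a misspelling occurs inside a longer word.

```python
import re

# Hypothetical subset of the corrections, keyed by the misspelled word
corrections = {
    "aslo": "also",
    "pandamic": "pandemic",
    "creditdebit": "credit debit",
    "creditdebittap": "credit debit tap",
}

# \b anchors make each key match whole words only, so "creditdebit"
# cannot fire inside "creditdebittap" regardless of dictionary order
pattern = re.compile(r"\b(" + "|".join(map(re.escape, corrections)) + r")\b")

def correct_spelling(sentence):
    """Replace every listed misspelling in sentence with its correction."""
    return pattern.sub(lambda m: corrections[m.group(1)], sentence)

print(correct_spelling("i aslo use creditdebittap since the pandamic"))
# i also use credit debit tap since the pandemic
```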

Below, commonly occurring French phrases and words are translated to English:

In [66]:
# lists to store responses by language
E4b_sentence_list_fr = []

# iterate through each sentence in E4b_sentence_list
for sentence in E4b_sentence_list:
    # detect the language using package langdetect
    if detect(sentence) == "fr":
        E4b_sentence_list_fr.append(sentence)

print(f"Number of French responses detected: {len(E4b_sentence_list_fr)}")

Number of French responses detected: 116


In [67]:
# Create one string "i dont like.." from a list of strings ["i dont like...","this is...",""]
E4b_one_string_fr =" ".join(E4b_sentence_list_fr)

# Create a list of words ["i", "dont", "like"...] from one string "i dont like.."
E4b_word_list_fr = E4b_one_string_fr.split()

print("Top 10 frequently occurring French words:")
display(Counter(E4b_word_list_fr).most_common(10))

print("Top 10 frequently occurring French 2grams:")
ngrams_discerns_fr = get_ngrams_discern(E4b_sentence_list_fr, 2)
print("Discerns: No lemmatizing")
display(ngrams_discerns_fr.most_common(10))

print("Top 10 frequently occurring French 3grams:")
ngrams_discerns_fr = get_ngrams_discern(E4b_sentence_list_fr, 3)
print("Discerns: No lemmatizing")
display(ngrams_discerns_fr.most_common(10))

Top 10 frequently occurring French words:


[('de', 155),
 ('plus', 52),
 ('le', 52),
 ('carte', 52),
 ('la', 49),
 ('en', 42),
 ('je', 39),
 ('comptant', 37),
 ('et', 36),
 ('contact', 35)]

Top 10 frequently occurring French 2grams:
Discerns: No lemmatizing


[(('carte', 'de'), 29),
 (('de', 'la'), 27),
 (('sans', 'contact'), 27),
 (('de', 'crédit'), 22),
 (('en', 'ligne'), 18),
 (('cause', 'de'), 18),
 (('largent', 'comptant'), 18),
 (('ma', 'carte'), 17),
 (('la', 'covid'), 17),
 (('à', 'cause'), 15)]

Top 10 frequently occurring French 3grams:
Discerns: No lemmatizing


[(('carte', 'de', 'crédit'), 22),
 (('cause', 'de', 'la'), 16),
 (('de', 'la', 'covid'), 15),
 (('ma', 'carte', 'de'), 13),
 (('à', 'cause', 'de'), 12),
 (('de', 'largent', 'comptant'), 6),
 (('carte', 'de', 'débit'), 6),
 (('la', 'carte', 'de'), 6),
 (('plus', 'dachat', 'en'), 5),
 (('dachat', 'en', 'ligne'), 5)]

In [68]:
for i in range(len(E4b_sentence_list)):
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("plus dachat ligne","more online shopping")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("paiement sans contact","contactless payment")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("sans contact carte","contactless card")
#     E4b_sentence_list[i] = E4b_sentence_list[i].replace("contact carte crédit","") Unsure about this one
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("plus dachats ligne","more online shopping")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("carte sans contact","contactless card")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("sans contact","contactless")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("carte crédit","credit card")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("largent comptant","cash")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("carte débit","debit card")
    # Replace the longer "dargent comptant" first; otherwise "argent comptant" matches inside it and leaves "dcash"
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("dargent comptant","cash")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("argent comptant","cash")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("plus", "more")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("carte", "card")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("comptant", "cash")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("sans", "without")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("crédit", "credit")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("moins", "less")
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("largent", "money")
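
The order of the replacements matters, because a short phrase can match inside a longer one. One way to make the order explicit is to sort the phrases by length before applying them. This is a minimal sketch on a hypothetical subset of the mapping, and `translate` is an illustrative name.

```python
# Hypothetical subset of the French-to-English phrase mapping
translations = {
    "sans contact": "contactless",
    "carte sans contact": "contactless card",
    "argent comptant": "cash",
    "dargent comptant": "cash",
    "carte": "card",
}

def translate(sentence, mapping):
    """Apply the mapping to sentence, longest phrase first."""
    # Longest phrases first, so a short phrase never consumes part of a longer one
    for phrase in sorted(mapping, key=len, reverse=True):
        sentence = sentence.replace(phrase, mapping[phrase])
    return sentence

print(translate("ma carte sans contact remplace dargent comptant", translations))
# ma contactless card remplace cash
```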

Lastly, words are mapped so that synonyms are grouped together.

In [69]:
# Create one string "i dont like.." from a list of strings ["i dont like...","this is...",""]
E4b_one_string =" ".join(E4b_sentence_list)

# Create a list of words ["i", "dont", "like"...] from one string "i dont like.."
E4b_word_list = E4b_one_string.split()

print(f"Instances of 'pandemic' that will be mapped to covid: {Counter(E4b_word_list)['pandemic']}")

Instances of 'pandemic' that will be mapped to covid: 64


In [70]:
for i in range(len(E4b_sentence_list)):
    # Add in additional mappings here as needed
    E4b_sentence_list[i] = E4b_sentence_list[i].replace("pandemic","covid")

## N-Grams

Create a version of `E4b_sentence_list` that does not contain stopwords.

In [82]:
E4b_sentence_list_nostopwords = []

for sentence in E4b_sentence_list:
    sentence_no_stopwords = " ".join([word for word in sentence.split() if word not in (stop_words)])
    E4b_sentence_list_nostopwords.append(sentence_no_stopwords)

In [83]:
ngrams_discerns = get_ngrams_discern(E4b_sentence_list_nostopwords, 1)
print("Top 10 frequently occurring words:")
tabulate_counter(ngrams_discerns,10)

print("Top 10 frequently occurring bigrams (discerns):")
ngrams_discerns = get_ngrams_discern(E4b_sentence_list_nostopwords, 2)
tabulate_counter(ngrams_discerns,10)

ngrams_discerns = get_ngrams_discern(E4b_sentence_list_nostopwords, 3)
print("Top 10 frequently occurring trigrams (discerns):")
tabulate_counter(ngrams_discerns,10)

Top 10 frequently occurring words:
-------------  ---
('covid',)     340
('cash',)      338
('use',)       184
('less',)      180
('online',)    174
('card',)      171
('credit',)    130
('using',)      96
('shopping',)   94
('tap',)        94
-------------  ---
Top 10 frequently occurring bigrams (discerns):
----------------------  --
('less', 'cash')        99
('due', 'covid')        77
('credit', 'card')      71
('use', 'cash')         54
('online', 'shopping')  54
('debit', 'card')       43
('tap', 'go')           41
('use', 'less')         32
('use', 'tap')          27
('covid', '19')         26
----------------------  --
Top 10 frequently occurring trigrams (discerns):
---------------------------  --
('use', 'less', 'cash')      30
('use', 'tap', 'go')         14
('use', 'credit', 'card')    14
('cash', 'due', 'covid')     14
('using', 'less', 'cash')    12
('less', 'cash', 'due')      10
('using', 'credit', 'card')  10
('using', 'debit', 'card')    8
('use', 'debit', 'card')    

## Key Word Analysis

In [186]:
# Create a dataframe from the cleaned responses
E4b_cleaned = pd.DataFrame(E4b_sentence_list,columns=["E4b"])
E4b_cleaned.head()

| | E4b |
|---|---|
| 0 | some indication that covid 19 can live on bank... |
| 1 | covid |
| 2 | ive been using my debit card more a because of... |
| 3 | i have more cash in hand thats why i pay more ... |
| 4 | more contactless and online payment purchase b... |


In [187]:
# Create new column categories
E4b_categorized = E4b_cleaned.reindex(
    columns=E4b_cleaned.columns.tolist() + ["less_cash",
                                            "covid",
                                            "credit_card",
                                            "debit_card", 
                                            "business",
                                           "online"])

# Define total number of responses
total_responses_E4b = E4b_cleaned.shape[0]
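
The keyword cells that follow rely on `categorize`, which joins the search terms with `|` and passes the result to `str.contains`, where it is interpreted as a regular expression. The sketch below reproduces that matching logic with the standard-library `re` module on made-up responses. `build_pattern` is a hypothetical helper, and escaping the terms is an added precaution, not something the notebook does.

```python
import re

def build_pattern(search_terms):
    """Join a list of literal search terms into one alternation pattern."""
    # re.escape keeps characters such as "." or "?" in a term from acting as regex syntax
    return "|".join(re.escape(term) for term in search_terms)

responses = ["i use less cash", "more online shopping", "tap and go"]
pattern = build_pattern(["less cash", "online"])

# 1 if the response contains at least one search term, else 0
flags = [1 if re.search(pattern, response) else 0 for response in responses]
print(flags)  # [1, 1, 0]
```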

**"less cash"**

In [188]:
# Define the key terms
search_terms = ["less cash"]

# Set column "less_cash" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "less_cash", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["less_cash"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

# Print the total number of responses
print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 95

Total number of responses: 787

Examples:

i carry less cash pay with touch and go i look online more than in store to avoid crowd

covid _ use less cash

use less cash since most retailer and cashier prefer not to handle cash these day

i use less cash and more contactless because of covid

i use less cash mostly because merchant request this a their preferred method of payment i typically would use more cash for smaller purchase



**"covid"**

In [189]:
# Define the key terms
search_terms = ["covid"]

# Set column "covid" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "covid", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["covid"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

# Print the total number of responses
print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 341

Total number of responses: 787

Examples:

some indication that covid 19 can live on bank note greater likelihood of making contact with other people during the transaction

covid

ive been using my debit card more a because of covid lot of place werent accepting cash i on the other hand my only issue is leaving my self broke penny less more than often or more than what should be allowed but thats because i wa never taught the importance of money and saving so im a 21 year old that doesnt exactly care about money

i have more cash in hand thats why i pay more with cash not covid related covid ha no influence in my decision how to pay

more contactless and online payment purchase because of covid



**"debit card"**

In [190]:
# Define the key terms
search_terms = ["debit"]

# Set column "debit_card" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "debit_card", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["debit_card"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

# Print the total number of responses
print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 92

Total number of responses: 787

Examples:

ive been using my debit card more a because of covid lot of place werent accepting cash i on the other hand my only issue is leaving my self broke penny less more than often or more than what should be allowed but thats because i wa never taught the importance of money and saving so im a 21 year old that doesnt exactly care about money

some business are preferring not to use cash if possible so i would use debit more than credit card if banking fee were eliminated

no concern just the fact the debit payment is so easy

use credit debit more because of vendor request

less cash more debit to quicken the transaction in light of covid and lessen the risk



**"credit"**

In [191]:
# Define the key terms
search_terms = ["credit"]

# Set column "credit_card" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "credit_card", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["credit_card"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

# Print the total number of responses
print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 133

Total number of responses: 787

Examples:

i used credit card for everything so it is tracked and i have time to review charge and pay or dispute injustice charge

some business are preferring not to use cash if possible so i would use debit more than credit card if banking fee were eliminated

use credit debit more because of vendor request

most store were asking for you to use credit or debit card

i use credit card more frequently at grocery store



**"business"**

In [192]:
# Define the key terms
search_terms = ["business"]

# Set column "business" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "business", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["business"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

# Print the total number of responses
print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 23

Total number of responses: 787

Examples:

some business are preferring not to use cash if possible so i would use debit more than credit card if banking fee were eliminated

using debit card more and cash less a a lot of business prefer that during the covid

less cash due to covid a business are less likely to accept it more online purchasing than before but overall less purchasing in general

tap and go is faster and safer than punching in a code at a machine also i prefer cash but if a business prefers tap then i will oblige i use on line for curb side pick up only never for delivery in my area of town one doe not want delivery of parcel

less cash and more credit still some business like canada post dont take cash any more



**"online"**

In [193]:
# Define the key terms
search_terms = ["online"]

# Set column "online" to 1 where "E4b" contains any term in search_terms
categorize(search_terms, "E4b", "online", E4b_categorized)

num_categorized = E4b_categorized[E4b_categorized["online"]==1].shape[0]
print(f"Number of rows containing at least one search term: {num_categorized}\n")

print(f"Total number of responses: {total_responses_E4b}\n")

print("Examples:\n")
print_query_results(search_terms, "E4b", E4b_cleaned, 5)

Number of rows containing at least one search term: 175

Total number of responses: 787

Examples:

more contactless and online payment purchase because of covid

due to the covid i made 95 of my purchase online

feel more secure with safeguard for making online purchase

i carry less cash pay with touch and go i look online more than in store to avoid crowd

covid ha made me do more online



## Combinations?

**"less cash" and "covid"**

In [297]:
search_terms=["less cash","covid"]
less_cash_covid = E4b_categorized[(E4b_categorized["less_cash"]==1)&(E4b_categorized["covid"]==1)]
print(f"Number of responses including both 'less cash' and 'covid': {less_cash_covid.shape[0]}")
print(f"\nTotal number of responses: {total_responses_E4b}")
print("\nExamples:\n")
print_query_results(search_terms, "E4b", less_cash_covid, 5)

Number of responses including both 'less cash' and 'covid': 48

Total number of responses: 787

Examples:

covid _ use less cash

i use less cash and more contactless because of covid

less cash more debit to quicken the transaction in light of covid and lessen the risk

less cash because of the covid

less cash due to covid a business are less likely to accept it more online purchasing than before but overall less purchasing in general



**"covid" and "business"**

In [299]:
search_terms=["covid","business"]
covid_business = E4b_categorized[(E4b_categorized["covid"]==1)&(E4b_categorized["business"]==1)]
print(f"Number of responses including both 'covid' and 'business': {covid_business.shape[0]}")
print(f"\nTotal number of responses: {total_responses_E4b}")
print("\nExamples:\n")
print_query_results(search_terms, "E4b", covid_business, 5)

Number of responses including both 'covid' and 'business': 10

Total number of responses: 787

Examples:

using debit card more and cash less a a lot of business prefer that during the covid

less cash due to covid a business are less likely to accept it more online purchasing than before but overall less purchasing in general

more online shopping due to the covid concern about small business

i am using less cash because a good number of business are not accepting cash due to covid 19

a ive explained there were many business that stopped accepting cash during covid bank increased the maximum for tap and go purchase and atm were either out of order not filled up with cash very often or simply were too expensive with the sur charge to use therefore i have used my debit card much more and cash much less i no longer have loose change lying around in my wallet to buy a pop from a machine at work for example also by using bank card it is easier to track spending and make appropriate chang

**"less cash" and "business"**

In [298]:
search_terms=["less cash","business"]
less_cash_business = E4b_categorized[(E4b_categorized["less_cash"]==1)&(E4b_categorized["business"]==1)]
print(f"Number of responses including both 'less cash' and 'business': {less_cash_business.shape[0]}")
print(f"\nTotal number of responses: {total_responses_E4b}")
print("\nExamples:\n")
print_query_results(search_terms, "E4b", less_cash_business, 5)

Number of responses including both 'less cash' and 'business': 6

Total number of responses: 787

Examples:

less cash due to covid a business are less likely to accept it more online purchasing than before but overall less purchasing in general

less cash and more credit still some business like canada post dont take cash any more

less cash a business asked not to use cash during the corona virus

i am using less cash because a good number of business are not accepting cash due to covid 19

well a lot of business either tell u from the beginning that cash isnt accepted host at a restaurant before they take u to the table sign posted at a food truck establishment etc so thats why we have been using less cash



## Try Delimiting? - Online

In [315]:
delimiters = "because|due to|since"

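# .copy() avoids assigning new columns to a slice of E4b_categorized.
online = E4b_categorized[E4b_categorized["online"] == 1][["E4b"]].copy()
# Split each response at the first delimiter only (n=1); the pattern is a regex.
online[["E4b_split", "because/dueto/since"]] = online["E4b"].str.split(
    delimiters, n=1, expand=True)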

online_delim = online[(online["E4b_split"].notna()) & (
    online["because/dueto/since"].notna())][["E4b_split", "because/dueto/since"]]

print(f"Number of responses containing 'online': {online.shape[0]}")
print(
    f"Number of responses containing 'online' AND one or more delimiter: {online_delim.shape[0]}")
num_covid = query_num("covid", "because/dueto/since", online_delim)
print(f"Number (of the {online_delim.shape[0]}) containing 'covid': {num_covid}")

online_delim.head()

Number of responses containing 'online': 175
Number of responses containing 'online' AND one or more delimiter: 68
Number (of the 68) containing 'covid': 68


| | E4b_split | because/dueto/since |
|---|---|---|
| 4 | more contactless and online payment purchase | of covid |
| 7 | | the covid i made 95 of my purchase online |
| 36 | ordering online slightly more to avoid crowded... | the covid covid |
| 49 | | of covid ive been buying a lot more stuff onl... |
| 93 | less cash | covid a business are less likely to accept it... |
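
To make the splitting step concrete, here is a small sketch of how `str.split` behaves with a multi-word pattern, `n=1`, and `expand=True`. The `toy` responses are invented for illustration. The pattern is the same one used in the cell above, and pandas treats it as a regular expression because it is longer than one character.

```python
import pandas as pd

# Invented responses, not survey data.
toy = pd.DataFrame({"E4b": [
    "more online shopping due to covid",
    "less cash because stores asked since march",
    "i just prefer tap",
]})

delimiters = "because|due to|since"

# n=1 splits at the first delimiter only; expand=True returns one column per piece.
parts = toy["E4b"].str.split(delimiters, n=1, expand=True)
parts.columns = ["before", "after"]
print(parts)
```

Rows without any delimiter get a missing value in the second column, which is what the `notna()` and `isna()` filters in this section rely on. The delimiter itself is dropped, so "because of covid" leaves "of covid" in the second column.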


Consider the responses without a delimiter:

In [316]:
online_no_delim = online[(online["E4b_split"].isna()) | (
    online["because/dueto/since"].isna())][["E4b"]]

print(f"Number of responses containing 'online': {online.shape[0]}")
print(
    f"Number of responses containing 'online' AND NO delimiter: {online_no_delim.shape[0]}")
num_covid = query_num("covid", "E4b", online_no_delim)
print(f"Number (of the {online_no_delim.shape[0]}) containing 'covid': {num_covid}")

Number of responses containing 'online': 175
Number of responses containing 'online' AND NO delimiter: 107
Number (of the 107) containing 'covid': 107
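
An alternative to checking for missing values after the split is to test for a delimiter directly. The sketch below assumes that whole-word matching is wanted and adds `\b` word boundaries, which the notebook's pattern does not have. The `toy` responses are invented for illustration.

```python
import pandas as pd

# Invented responses, not survey data.
toy = pd.DataFrame({"E4b": [
    "more online shopping due to covid",
    "less cash since stores asked",
    "i sincerely prefer tap",
    "online is easier",
]})

# Non-capturing group with word boundaries, so "since" does not match inside "sincerely".
bounded = r"\b(?:because|due to|since)\b"
has_delim = toy["E4b"].str.contains(bounded, regex=True, na=False)

with_delim = toy[has_delim]
without_delim = toy[~has_delim]

# For comparison: the unbounded pattern also matches "sincerely".
naive_count = int(toy["E4b"].str.contains("because|due to|since", regex=True).sum())

print(len(with_delim), len(without_delim), naive_count)
```

Whether the boundaries matter for the survey text depends on how often words such as "sincerely" occur. The comparison count is one way to find out.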


In [286]:
print("Examples:\n")
print_query_results([" "], "E4b", online_no_delim, 5)

Examples:

feel more secure with safeguard for making online purchase

i carry less cash pay with touch and go i look online more than in store to avoid crowd

covid ha made me do more online

covid 19 it all online ive been using paypal a lot more too it feel more secure

access to store and the need to not have to travel combined with the ease of shopping online often at lower cost than shopping at a store



Conclusions:

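- The counts above report that every response mentioning 'online' also contains 'covid' (68 of 68 with a delimiter, 107 of 107 without). The printed examples contradict this: responses such as "feel more secure with safeguard for making online purchase" do not mention 'covid', so `query_num` may be returning the number of rows rather than the number of matches. Treat this conclusion as unverified until `query_num` is checked.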
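
A quick way to check the conclusion above is to count matches directly with `str.contains`. This is a minimal sketch: `count_containing` is a hypothetical helper, not the notebook's `query_num`, and the three responses are copied from examples printed in this notebook.

```python
import pandas as pd


def count_containing(term, column, df):
    """Count the rows of df whose text in `column` contains `term` literally."""
    return int(df[column].str.contains(term, regex=False, na=False).sum())


# Three responses taken from the examples printed in this notebook.
sample = pd.DataFrame({"E4b": [
    "covid ha made me do more online",
    "feel more secure with safeguard for making online purchase",
    "more online shopping due to the covid concern about small business",
]})

n_online = count_containing("online", "E4b", sample)
n_covid = count_containing("covid", "E4b", sample)

print(f"'online': {n_online}, of which also 'covid': {n_covid}")
```

If a count computed this way on the full `online` frame is lower than the number of rows, the "every response mentions covid" reading does not hold.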

## Try Delimiting? - Card

In [317]:
delimiters = "because|due to|since"

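# .copy() avoids assigning new columns to a slice of E4b_categorized.
card = E4b_categorized[(E4b_categorized["credit_card"] == 1) | (
    E4b_categorized["debit_card"] == 1)][["E4b"]].copy()
# Split each response at the first delimiter only (n=1); the pattern is a regex.
card[["E4b_split", "because/dueto/since"]] = card["E4b"].str.split(
    delimiters, n=1, expand=True)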
card_delim = card[(card["E4b_split"].notna()) & (
    card["because/dueto/since"].notna())][["E4b_split", "because/dueto/since"]]

print(f"Number of responses containing 'card': {card.shape[0]}")
print(
    f"Number of responses containing 'card' AND one or more delimiter: {card_delim.shape[0]}")
num_covid = query_num("covid", "because/dueto/since", card_delim)
print(f"Number (of the {card_delim.shape[0]}) containing 'covid': {num_covid}")

card_delim.head()

Number of responses containing 'card': 179
Number of responses containing 'card' AND one or more delimiter: 58
Number (of the 58) containing 'covid': 58


| | E4b_split | because/dueto/since |
|---|---|---|
| 2 | ive been using my debit card more a | of covid lot of place werent accepting cash i... |
| 31 | use credit debit more | of vendor request |
| 47 | place are preferring not to handle cash i dont... | of this there being no special allowance due ... |
| 99 | | covid using credit debit card more |
| 100 | i use my debit card much more than cash just | of the covid caution |


Consider the responses that do not contain a delimiter:

In [318]:
card_no_delim = card[(card["E4b_split"].isna()) | (
    card["because/dueto/since"].isna())][["E4b"]]

print(f"Number of responses containing 'card': {card.shape[0]}")
print(
    f"Number of responses containing 'card' AND NO delimiter: {card_no_delim.shape[0]}")
num_covid = query_num("covid", "E4b", card_no_delim)
print(f"Number (of the {card_no_delim.shape[0]}) containing 'covid': {num_covid}")

print("\nExamples:\n")
print_query_results([" "], "E4b", card_no_delim, 5)

Number of responses containing 'card': 179
Number of responses containing 'card' AND NO delimiter: 121
Number (of the 121) containing 'covid': 121

Examples:

i used credit card for everything so it is tracked and i have time to review charge and pay or dispute injustice charge

some business are preferring not to use cash if possible so i would use debit more than credit card if banking fee were eliminated

no concern just the fact the debit payment is so easy

less cash more debit to quicken the transaction in light of covid and lessen the risk

most store were asking for you to use credit or debit card



## Try Delimiting? - Business

In [319]:
delimiters = "because|due to|since"

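# .copy() avoids assigning new columns to a slice of E4b_categorized.
business = E4b_categorized[E4b_categorized["business"] == 1][["E4b"]].copy()
# Split each response at the first delimiter only (n=1); the pattern is a regex.
business[["E4b_split", "because/dueto/since"]] = business["E4b"].str.split(
    delimiters, n=1, expand=True)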

business_delim = business[(business["E4b_split"].notna()) & (
    business["because/dueto/since"].notna())][["E4b_split", "because/dueto/since"]]

print(f"Number of responses containing 'business': {business.shape[0]}")
print(
    f"Number of responses containing 'business' AND one or more delimiter: {business_delim.shape[0]}")
num_covid = query_num("covid", "because/dueto/since", business_delim)
print(f"Number (of the {business_delim.shape[0]}) containing 'covid': {num_covid}")

business_delim.head()

Number of responses containing 'business': 23
Number of responses containing 'business' AND one or more delimiter: 7
Number (of the 7) containing 'covid': 7


| | E4b_split | because/dueto/since |
|---|---|---|
| 93 | less cash | covid a business are less likely to accept it... |
| 158 | more online shopping | the covid concern about small business |
| 186 | i am using less cash | a good number of business are not accepting c... |
| 227 | i tend to use debit card more if i dont have e... | virus concern |
| 294 | a ive explained there were many business that ... | covid ha put a strain on finance |


Consider the responses that do not contain a delimiter:

In [321]:
business_no_delim = business[(business["E4b_split"].isna()) | (
    business["because/dueto/since"].isna())][["E4b"]]

print(f"Number of responses containing 'business': {business.shape[0]}")
print(
    f"Number of responses containing 'business' AND NO delimiter: {business_no_delim.shape[0]}")
num_business = query_num("business", "E4b", business_no_delim)
print(f"Number (of the {business_no_delim.shape[0]}) containing 'business': {num_business}")

print("\nExamples:\n")
print_query_results([" "], "E4b", business_no_delim, 5)

Number of responses containing 'business': 23
Number of responses containing 'business' AND NO delimiter: 16
Number (of the 16) containing 'business': 16

Examples:

some business are preferring not to use cash if possible so i would use debit more than credit card if banking fee were eliminated

using debit card more and cash less a a lot of business prefer that during the covid

tap and go is faster and safer than punching in a code at a machine also i prefer cash but if a business prefers tap then i will oblige i use on line for curb side pick up only never for delivery in my area of town one doe not want delivery of parcel

less cash and more credit still some business like canada post dont take cash any more

less cash a business asked not to use cash during the corona virus
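
The Online, Card, and Business sections repeat the same split-and-filter steps, so they could be factored into one function. The sketch below is one possible shape. `split_by_delimiter` is a hypothetical helper that does not exist in the notebook, and the `toy` frame mixes two responses printed above with one invented row.

```python
import pandas as pd

DELIMITERS = "because|due to|since"


def split_by_delimiter(df, text_col="E4b", pattern=DELIMITERS):
    """Return (rows split at the first delimiter, rows with no delimiter)."""
    out = df[[text_col]].copy()
    parts = out[text_col].str.split(pattern, n=1, expand=True)
    # Guarantee two columns even when no row contains a delimiter.
    parts = parts.reindex(columns=[0, 1])
    out["E4b_split"] = parts[0]
    out["because/dueto/since"] = parts[1]
    has_delim = out["because/dueto/since"].notna()
    delim = out.loc[has_delim, ["E4b_split", "because/dueto/since"]]
    no_delim = out.loc[~has_delim, [text_col]]
    return delim, no_delim


toy = pd.DataFrame({
    "E4b": [
        "i am using less cash because a good number of business are not accepting cash",
        "less cash and more credit still some business like canada post dont take cash any more",
        "i just prefer tap",  # invented row
    ],
    "business": [1, 1, 0],
})

business_delim, business_no_delim = split_by_delimiter(toy[toy["business"] == 1])
print(len(business_delim), len(business_no_delim))
```

Each section would then only need to build its own row filter and call the helper, which keeps the delimiter pattern and split options in one place.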

