#### Lydia Yampolsky - Data Visualization - Spring 2022

Ukraine's capital is transliterated into the Latin alphabet from Ukrainian as 'Kyiv' and from Russian as 'Kiev', reflecting a slight difference in pronunciation (Ukrainian /'ki.ɪv/ or /kiv/ vs. Russian /'ki.əv/). The Russian spelling was imposed until 1995, when the Ukrainian government changed the official spelling to reflect an independent Ukrainian identity. Since then, efforts have been made to encourage use of the Ukrainian spelling and pronunciation as a sign of solidarity with Ukraine's independence from Russia. In 2018, the Ukrainian Ministry of Foreign Affairs launched an online campaign with the hashtag #KyivNotKiev, urging people to use the Ukrainian name for the capital. Calls to use the spelling as a sign of support have increased since Russia's full-scale invasion of Ukraine in February 2022, an expansion of the Russo-Ukrainian War, which began in February 2014.

I am using this script to collect data on how 'Kyiv' has been spelled over the last ten years, and before and after the 2022 invasion. I pull the 100 top-scoring Reddit comments containing either spelling from each of the last ten years and visualize the trend in usage. The goal is to show how sentiment against the war influences written convention. This is an example of how language marks people as belonging to certain groups or supporting social movements, and of how the internet makes people around the world aware of this function of language.
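
The ten one-year windows described above can be written out explicitly. The sketch below is only an illustration: the helper name `year_windows` and the reference date are hypothetical, and the notebook itself uses relative offsets such as `'10y'` rather than epoch values. It assumes the search endpoint also accepts epoch seconds for `after` and `before`.

```python
import datetime as dt


def year_windows(now, n_years=10):
    """Return (after, before) epoch-second pairs, oldest window first."""
    windows = []
    for i in range(n_years):
        # Window i runs from (n_years - i) years ago to (n_years - i - 1) years ago.
        # Note: replace(year=...) raises ValueError if `now` is 29 February.
        start = now.replace(year=now.year - (n_years - i))
        end = now.replace(year=now.year - (n_years - i - 1))
        windows.append((int(start.timestamp()), int(end.timestamp())))
    return windows


# Hypothetical reference date, chosen only for this example
now = dt.datetime(2022, 4, 1, tzinfo=dt.timezone.utc)
windows = year_windows(now)

for after, before in windows:
    start_date = dt.datetime.fromtimestamp(after, dt.timezone.utc).date()
    end_date = dt.datetime.fromtimestamp(before, dt.timezone.utc).date()
    print(start_date, "to", end_date)
```

Each window ends where the next begins, so no comment falls between two queries.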

In [118]:
pip install psaw praw pandas-bokeh




In [119]:
# imports
import datetime as dt
import os
import re

import matplotlib.pyplot as plt
import pandas as pd
import pandas_bokeh
import praw
import requests
from matplotlib import colors
# API to get Reddit data
from psaw import PushshiftAPI

pandas_bokeh.output_notebook()

# Read the Reddit app credentials from environment variables instead of
# hard-coding them in the notebook (the variable names are placeholders).
ID = os.environ['REDDIT_CLIENT_ID']
secret = os.environ['REDDIT_CLIENT_SECRET']
auth = requests.auth.HTTPBasicAuth(ID, secret)

In [120]:
# setup to use the API
import os

def keys(name):
    # Return the API credential for a given name.
    # Secrets come from environment variables (the variable names are placeholders)
    # so that they never appear in the notebook itself.
    keychain = {
        'RedditID': os.environ['REDDIT_CLIENT_ID'],
        'RedditSECRET': os.environ['REDDIT_CLIENT_SECRET'],
        'RedditUSERAG': 'Chrome:DVfinal.py:v1.0.0 by /u/RegularGazelle',
        'RedditUSER': 'RegularGazelle',
        'RedditPSWD': os.environ['REDDIT_PASSWORD']}
    return keychain[name]


reddit = praw.Reddit(client_id=keys('RedditID'),
                     client_secret=keys('RedditSECRET'),
                     user_agent=keys('RedditUSERAG'),
                     username=keys('RedditUSER'),
                     password=keys('RedditPSWD'))

api = PushshiftAPI(reddit)

In [123]:
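# get data from the API
def get_pushshift_data(data_type, **kwargs):
    # Query the Pushshift search endpoint for the given data type (e.g. "comment");
    # keyword arguments are passed through as URL query parameters.
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    response = requests.get(base_url, params=kwargs, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return response.json()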

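# see if the comment contains the Ukrainian (1) or Russian (0) spelling.
# The Ukrainian spelling is checked first, so a comment containing both counts as 1.
# Matching ignores case because the q parameter of the search is case-insensitive.
def spelling(string):
    if re.search('kyiv', string, re.IGNORECASE):
        return 1
    elif re.search('kiev', string, re.IGNORECASE):
        return 0
    else:
        return None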

# put the data in a DataFrame
# The API returns at most 100 comments per query, so I pull the top-scoring comments from each of the last 10 years.
# The query returns comments containing either spelling. The q parameter is case-insensitive.
dfs = []

for i in range(10):
    comments = get_pushshift_data(data_type = "comment",
                                  q = 'kyiv|kiev',
                                  limit = 100,
                                  sort_type = 'score',
                                  sort = 'desc',
                                  after = str(10-i) + 'y', 
                                  before = str(9-i) + 'y'
    )
    data = []
    for comment in comments['data']:
        data.append({
            "comment id": comment.get('id'),
            "sub": comment.get('subreddit'),
            "time": comment.get('created_utc'),
            "text": comment.get('body'),
            "score": comment.get('score'),
            "Ukr_spelling": spelling(comment.get('body'))
        })
    df = pd.DataFrame(data)
    dfs.append(df)

big_df = pd.concat(dfs)
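
A quick way to sanity-check a classifier like `spelling` is to run it on a few hand-written strings. The sketch below is a stand-alone variant, not the function used in this notebook: the name `classify_spelling`, the string labels, and the separate "both" outcome are assumptions made for illustration. It shows that a comment quoting the hashtag #KyivNotKiev contains both spellings, a case the 1/0 coding above folds into the Ukrainian category.

```python
import re

# Case-insensitive patterns for the two spellings
KYIV = re.compile(r"kyiv", re.IGNORECASE)
KIEV = re.compile(r"kiev", re.IGNORECASE)


def classify_spelling(text):
    """Label a comment as 'kyiv', 'kiev', 'both', or None (hypothetical helper)."""
    text = text or ""  # treat a missing comment body as empty
    has_kyiv = bool(KYIV.search(text))
    has_kiev = bool(KIEV.search(text))
    if has_kyiv and has_kiev:
        return "both"
    if has_kyiv:
        return "kyiv"
    if has_kiev:
        return "kiev"
    return None


# Hand-written test strings, not real Reddit comments
samples = [
    "Kyiv is the capital of Ukraine.",
    "I visited KIEV in 2010.",
    "Use the hashtag #KyivNotKiev",
    "No city is mentioned here.",
    None,
]
labels = [classify_spelling(s) for s in samples]
print(labels)
```

Whether mixed comments should count as Ukrainian, Russian, or a third category is a design choice; keeping them separate makes it possible to see how many there are before deciding.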

In [124]:
# clean up
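# convert the time column from Unix timestamps (created_utc) to datetime objects and add a year column
# note: fromtimestamp() converts to the local time zone of the machine running the notebook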
def dt_from_time(row):
    return dt.datetime.fromtimestamp(row["time"])

big_df["time"] = big_df.apply(dt_from_time, axis = 1)
big_df["year"] = big_df["time"].dt.to_period('Y')
big_df

| | comment id | sub | time | text | score | Ukr_spelling | year |
|---|---|---|---|---|---|---|---|
| 0 | c9atyjp | AskReddit | 2013-04-08 07:30:40 | My master's thesis, my bachelor's diploma, my ... | 969 | 1.0 | 2013 |
| 1 | c8lnl5w | AskReddit | 2013-02-25 18:37:48 | I'm Ukrainian, I moved to the states when I wa... | 942 | 0.0 | 2013 |
| 2 | c8lns8w | AskReddit | 2013-02-25 18:47:43 | &gt;Imagine dragging a gradient from west to e... | 746 | 0.0 | 2013 |
| 3 | c9l8dkv | leagueoflegends | 2013-04-23 16:50:00 | The popular perception that TheOddOne was in a... | 684 | 0.0 | 2013 |
| 4 | c8lfucq | pics | 2013-02-25 12:27:00 | Is ship\n\nIs night\n\nIs flags\n\nIs building... | 666 | 0.0 | 2013 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 94 | ho57vr7 | hoi4 | 2021-12-11 11:48:03 | I just do logistics strike on the defensive, o... | 301 | 0.0 | 2021 |
| 95 | hjtjks2 | worldnews | 2021-11-08 10:34:41 | In short. \n\n\n\- Belarus become totalitaria... | 299 | 0.0 | 2021 |
| 96 | h9sntq2 | HistoryMemes | 2021-08-21 10:05:15 | There's like 5 movies about stalingrad. Yet no... | 297 | 0.0 | 2021 |
| 97 | hnqz5yx | HistoryMemes | 2021-12-08 12:36:24 | Kyiv, early 1980s:\n\nA man goes to apply for ... | 296 | 1.0 | 2021 |
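
Because `datetime.fromtimestamp` depends on the local time zone, the times shown can shift by a few hours from one machine to another. A time-zone-independent alternative is sketched below using `pd.to_datetime` with `unit="s"` and `utc=True`. The two epoch values are illustrative stand-ins for the `created_utc` column, and the frame name `sample` is hypothetical; this is not the conversion the notebook ran.

```python
import pandas as pd

# Two illustrative epoch timestamps standing in for the created_utc column
sample = pd.DataFrame({"time": [1365406240, 1639223283]})

# unit="s" reads the integers as seconds since the Unix epoch;
# utc=True makes the resulting datetimes explicitly UTC
sample["time"] = pd.to_datetime(sample["time"], unit="s", utc=True)

# Extract the calendar year (in UTC) for grouping
sample["year"] = sample["time"].dt.year
print(sample)
```

With this approach a comment posted near midnight on 31 December is always assigned to the same year, regardless of where the notebook is run.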


I plotted the mentions as an interactive time-series scatterplot with comment score on the y-axis. Color denotes whether the comment contains the Ukrainian spelling (red) or the Russian spelling (green). Inspect the data more closely by scrolling and zooming. Hovering over a dot shows the spelling, date, and score of the comment. While there is little visible trend or change in proportion over most of the period, there is a cluster of red in February 2022, corresponding to the date of Russia's full-scale invasion of Ukraine, and one outlier, a very highly scored comment also posted in 2022. Both spellings appear frequently at this point in time, but Ukrainian spellings outnumber Russian spellings and score higher.

Zooming in reveals other clusters; one with both red and green is visible in February of 2014. This corresponds with the [Revolution of Dignity](https://en.wikipedia.org/wiki/Revolution_of_Dignity) in Kyiv, February 18-23, 2014, and the onset of the war.

In [131]:
# normal Pandas scatterplot
#cmap = colors.ListedColormap([[0,1,0], [1,0,0]])
#big_df.plot.scatter(x = 'time', y = 'score', c = 'Ukr_spelling', colormap = cmap, colorbar = False, alpha = .75)

# interactive Pandas_bokeh plot
big_df.plot_bokeh.scatter(x = 'time', y = 'score', category = 'Ukr_spelling', colormap = ('green', 'red'))
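
The change in proportion over time can also be summarized numerically. Since `Ukr_spelling` is coded 1 for the Ukrainian spelling and 0 for the Russian one, the mean of that column within each year is the share of comments using the Ukrainian spelling. The sketch below uses a tiny hand-made frame, `toy`, whose values are invented for illustration and are not results from the Reddit data.

```python
import pandas as pd

# Invented stand-in for big_df: values are for illustration only
toy = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013, 2022, 2022, 2022, 2022],
    "Ukr_spelling": [1, 0, 0, 0, 1, 1, 1, 0],
})

# The mean of a 0/1 column is the fraction of 1s, here the share of
# comments per year that use the Ukrainian spelling.
# Rows where the value is missing (NaN) are skipped by mean().
share = toy.groupby("year")["Ukr_spelling"].mean()
print(share)
```

Applied to `big_df`, the same `groupby` call would give one share per year, which could be plotted as a line next to the scatterplot.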

Instances of the Ukrainian spelling are concentrated at times when conflict in Kyiv and other parts of Ukraine is in the news. During these periods, Ukrainians and their supporters inform the world about the conflict they are facing and about how to show solidarity when writing about it. The uptick in the Ukrainian spelling among the highest-scoring Reddit comments in February 2022 reflects efforts by Redditors to use the platform to show solidarity with Ukraine and stand against the war. The use of written convention online to express support for a global social movement is one of the internet's major effects on language use.

    Dreyer, Benjamin. "Kyiv Vs. Kiev, Zelensky Vs. Zelenskyy, And The Immense Meaning Of ‘The’". The Washington Post, 2022, https://www.washingtonpost.com/opinions/2022/03/09/language-ukraine-war-kyiv-kiev/.

    McBride, Kelly. "How Do You Say Kyiv? Our Language Catches Up With Political Realities". NPR Public Editor, 2022, https://www.npr.org/sections/publiceditor/2022/02/25/1083055646/how-do-you-say-kyiv.

    "Ukraine’s Foreign Ministry Launches Worldwide Campaign #CorrectUA - UATV". UATV, 2018, https://uatv.ua/en/ukraine-s-foreign-ministry-launches-worldwide-campaign-correctua/.