# Reactions

Let's look at the rocdev reactions data.  A reaction is an emoji, such as `:+1:`, that someone attaches to another person's post.

In [1]:
import pyspark.sql.functions as F
df = spark.read.parquet('parquet_data/rocdev/events.parquet')
username_df = spark.read.parquet('parquet_data/rocdev/users.parquet').select('id', 'name')

The reactions data is stored as a list of nested structs: each message carries a list of reactions, and each reaction holds its name and the list of users who used it.  To make it a little more analysis-friendly, let's flatten it into one row per individual reaction.  I'm arbitrarily calling the person who reacts the "reactor" and the author of the original message the "reactee".

In [2]:
reacts_df = df\
    .select('user', F.explode('reactions').alias('reaction'))\
    .select('user', F.col('reaction.name').alias('reaction'), 'reaction.users')\
    .select(F.col('user').alias('reactee'), 'reaction', F.explode('users').alias('reactor'))\
    .dropna()
reacts_df.show()

+---------+---------------+---------+
|  reactee|       reaction|  reactor|
+---------+---------------+---------+
|U9R2EFXJS|      bitbucket|U3ZJRAKTQ|
|U9R2EFXJS|      bitbucket|U7BH6855H|
|U9R2EFXJS|      bitbucket|UAW7CMX38|
|U9R2EFXJS|      bitbucket|U3P40Q1FW|
|U07SZRSTT|            roc|U3P40Q1FW|
|U9LGXCBGD|heavy_plus_sign|U07SZRSTT|
|U9LGXCBGD|heavy_plus_sign|UAD2W717S|
|U9LGXCBGD|heavy_plus_sign|U7BH6855H|
|U2N2GT82Y|             +1|UAKK3Q7B7|
|U2N2GT82Y|             +1|U3P40Q1FW|
|U07JAHMJ8|     sunglasses|U60T51VNU|
|U07FATBEW|           clap|U07SZRSTT|
|U07FATBEW|           clap|U3ZJRAKTQ|
|U07FATBEW|           clap|U3P40Q1FW|
|U3P40Q1FW|          metal|U0DT3B3MW|
|U3P40Q1FW|          metal|U07FATBEW|
|U3P40Q1FW|       laughing|U07FATBEW|
|U07ER9E75|           clap|U3ZJRAKTQ|
|U07ER9E75|           clap|U080D2GSK|
|U07ER9E75|           clap|U07SZRSTT|
+---------+---------------+---------+
only showing top 20 rows
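
To make the flattening concrete, here's a minimal plain-Python sketch of what the two `explode` calls do. It doesn't need Spark. The event records and user IDs are made up for illustration, and only the `user` and `reactions` fields are modeled (with the `name` and `users` sub-fields that the query above reads).

```python
# Hypothetical event records shaped like the rows read from events.parquet.
# Only 'user' and 'reactions' are modeled, and the IDs are made up.
events = [
    {"user": "U_AUTHOR_1", "reactions": [
        {"name": "+1", "users": ["U_A", "U_B"]},
        {"name": "joy", "users": ["U_A"]},
    ]},
    {"user": "U_AUTHOR_2", "reactions": None},  # a message with no reactions
]

def flatten_reactions(events):
    """Return one (reactee, reaction, reactor) tuple per individual reaction."""
    rows = []
    for event in events:
        # like explode(), emit no rows for a null or empty list
        for reaction in event["reactions"] or []:
            for reactor in reaction["users"] or []:
                rows.append((event["user"], reaction["name"], reactor))
    return rows

flat = flatten_reactions(events)
for row in flat:
    print(row)
```

The message without reactions contributes no rows at all, which is also how `F.explode` treats a null or empty array.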



What are the most commonly-used reactions?

In [3]:
reacts_df.\
    groupBy('reaction').\
    count().\
    orderBy(F.col('count').desc()).\
    show()

+--------------------+-----+
|            reaction|count|
+--------------------+-----+
|                  +1| 5765|
|                 100| 4128|
|                 joy| 2501|
|            laughing| 1760|
|     heavy_plus_sign| 1668|
|                wave| 1657|
|            point_up| 1065|
|               heart|  657|
|                clap|  604|
|                tada|  549|
|         partyparrot|  411|
|           trollface|  371|
|          point_up_2|  352|
|        raised_hands|  348|
|               smile|  335|
|       thinking_face|  269|
|                fire|  261|
|rolling_on_the_fl...|  239|
|        disappointed|  209|
|              joysob|  187|
+--------------------+-----+
only showing top 20 rows



Nothing too surprising here.  AFAIK `joysob` is a rocdev creation, so that's pretty neat to see in the top 20.

Let's see who reacts to whom the most.

In [4]:
# sort last: a join is not guaranteed to preserve row order
reacts_df\
    .groupBy('reactee', 'reactor')\
    .count()\
    .join(username_df, F.col('reactee') == F.col('id'))\
    .select(F.col('name').alias('reactee'), F.col('reactor'), F.col('count'))\
    .join(username_df, F.col('reactor') == F.col('id'))\
    .select('reactee', F.col('name').alias('reactor'), F.col('count'))\
    .orderBy(F.col('count').desc())\
    .show()

+--------------+--------------+-----+
|       reactee|       reactor|count|
+--------------+--------------+-----+
|        geowa4|     edgriebel|  276|
|        geowa4|kristen.gdiroc|  252|
|       ajvulaj|     edgriebel|  223|
|        geowa4|valentinaperic|  217|
|        geowa4|brandonramirez|  205|
|  iamkirkbater|     edgriebel|  174|
|brandonramirez|     edgriebel|  146|
|kristen.gdiroc|valentinaperic|  144|
|kristen.gdiroc|        travis|  139|
|      coderjoe|     edgriebel|  138|
|         bking|kristen.gdiroc|  136|
|        fletch|     edgriebel|  135|
|        geowa4|      coderjoe|  126|
|         jfine|     edgriebel|  126|
|      mhodesty|     edgriebel|  124|
|kristen.gdiroc|     edgriebel|  119|
|        bvulaj|     edgriebel|  117|
|        geowa4|        travis|  115|
|      coderjoe|kristen.gdiroc|  107|
|     dantswain|kristen.gdiroc|  107|
+--------------+--------------+-----+
only showing top 20 rows



Everyone loves reacting to George, apparently.  Let's see who the most reacted-to people are in general.

In [5]:
# sort last: a join is not guaranteed to preserve row order
reacts_df\
    .groupBy('reactee')\
    .count()\
    .join(username_df, F.col('reactee') == F.col('id'))\
    .select(F.col('name').alias('reactee'), F.col('count'))\
    .orderBy(F.col('count').desc())\
    .show()

+--------------+-----+
|       reactee|count|
+--------------+-----+
|        geowa4| 3140|
|kristen.gdiroc| 1707|
|  iamkirkbater| 1372|
|       ajvulaj| 1109|
|         jfine| 1086|
|      coderjoe| 1026|
|brandonramirez|  962|
|     dantswain|  949|
|        bvulaj|  896|
|        travis|  811|
|valentinaperic|  777|
|        fletch|  759|
|    timpoulsen|  742|
|        nwagar|  723|
|      mhodesty|  681|
|         bking|  662|
|     edgriebel|  645|
|     chrisolin|  633|
|      ptomblin|  461|
|        meganb|  437|
+--------------+-----+
only showing top 20 rows



Yep.  George by a big margin.  Who does the most reacting?

In [6]:
# sort last: a join is not guaranteed to preserve row order
reacts_df\
    .groupBy('reactor')\
    .count()\
    .join(username_df, F.col('reactor') == F.col('id'))\
    .select(F.col('name').alias('reactor'), F.col('count'))\
    .orderBy(F.col('count').desc())\
    .show()

+--------------------+-----+
|             reactor|count|
+--------------------+-----+
|           edgriebel| 3194|
|      kristen.gdiroc| 2504|
|      valentinaperic| 1594|
|              travis| 1537|
|      brandonramirez| 1335|
|              meganb| 1207|
|               jfine| 1108|
|            coderjoe| 1019|
|              geowa4|  942|
|        iamkirkbater|  897|
|stephaniemorillo....|  865|
|           chrisolin|  841|
|              fletch|  832|
|              zmyaro|  792|
|             ajvulaj|  632|
|            anielamw|  486|
|       nerdofthunder|  442|
|            scottish|  436|
|           dantswain|  431|
|              bvulaj|  407|
+--------------------+-----+
only showing top 20 rows



Ed wins here, though Kristen is not _super_ far behind him.  What reactions do these people use most often when they do react?

In [7]:
from pyspark.sql.window import Window

# get reaction counts by user and reaction
reactions_by_user_df = reacts_df.\
    groupBy('reactor', 'reaction').\
    count().\
    withColumnRenamed('count', 'reaction_count')

# window over each user and rank their usage of each reaction
window = Window.partitionBy('reactor').orderBy(F.col('reaction_count').desc())
top_reaction_df = reactions_by_user_df.\
    select('*', F.dense_rank().over(window).alias('rank')).\
    filter(F.col('rank') == 1).\
    withColumnRenamed('reaction', 'top_reaction')

# find how many times each user has reacted
reaction_counts_by_user_df = reacts_df\
    .groupBy('reactor')\
    .count()

# join with the top reaction df (and username) to see both things
reaction_counts_by_user_df\
    .join(top_reaction_df, on=['reactor'])\
    .join(username_df, F.col('reactor') == F.col('id'))\
    .select(F.col('name').alias('reactor'), F.col('top_reaction'), F.col('count'))\
    .orderBy(F.col('count').desc())\
    .show()

+--------------------+------------+-----+
|             reactor|top_reaction|count|
+--------------------+------------+-----+
|           edgriebel|          +1| 3194|
|      kristen.gdiroc|         100| 2504|
|      valentinaperic|         100| 1594|
|              travis|          +1| 1537|
|      brandonramirez|          +1| 1335|
|              meganb|         joy| 1207|
|               jfine|         100| 1108|
|            coderjoe|          +1| 1019|
|              geowa4|        wave|  942|
|        iamkirkbater|         100|  897|
|stephaniemorillo....|         100|  865|
|           chrisolin|         100|  841|
|              fletch|    laughing|  832|
|              zmyaro|        wave|  792|
|             ajvulaj|         100|  632|
|            anielamw|         100|  486|
|       nerdofthunder|         100|  442|
|            scottish|          +1|  436|
|           dantswain|          +1|  431|
|              bvulaj|         100|  407|
+--------------------+------------+-----+
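
One thing to keep in mind about the ranking step: `dense_rank()` gives tied values the same rank, so filtering on `rank == 1` keeps every reaction tied for first place. A user with a tie would show up in more than one row after the join. Here's a plain-Python sketch of that behavior on made-up data (the names and pairs are hypothetical, not from the rocdev dataset):

```python
from collections import Counter, defaultdict

# Made-up (reactor, reaction) pairs; "bob" has a two-way tie for first place.
pairs = [
    ("alice", "+1"), ("alice", "+1"), ("alice", "joy"),
    ("bob", "100"), ("bob", "wave"),
]

def top_reactions(pairs):
    """Map each reactor to every reaction that shares their highest count."""
    counts = defaultdict(Counter)
    for reactor, reaction in pairs:
        counts[reactor][reaction] += 1
    result = {}
    for reactor, counter in counts.items():
        best = max(counter.values())
        # keep all reactions at the best count, as a dense rank of 1 would
        result[reactor] = sorted(r for r, n in counter.items() if n == best)
    return result

print(top_reactions(pairs))
```

If exactly one row per user is wanted, `F.row_number()` over the same window would be one way to get it, though which of the tied reactions it keeps is arbitrary unless the window ordering adds a tiebreaker.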

Interesting things that stand out to me:

* Megan and Fletch are always laughing (their top reactions are `joy` and `laughing`), which is great.
* George is always welcoming new folks (his top reaction is `wave`).