# eFuse Follower Events
The aim of this notebook is to get a feel for the data and determine if further processing is needed. We'll be working with an anonymized dataset containing follower events on the eFuse platform.

**Follower events** are events that occur whenever one gamer follows another on the eFuse platform. This notebook will have two main steps.

1. Load and inspect the follower events.
2. Determine whether any data processing is needed.

Before starting, run the code cell below to load some useful functions and packages.

In [1]:
# accessing utils module
import sys
sys.path.append('../utils')

# needed for loading data:
import pandas as pd

# some problem-specific helper functions:
from utils import get_path

## Step 1: Inspecting Follower Events

Our anonymized user data records whom gamers chose to follow on the platform and when they did so. It contains three columns, outlined below:

- **follower** - The id of the gamer who initiated the follow
- **followee** - The id of the gamer being followed
- **createdAt** - The timestamp at which the follow event happened

**Location of the data.** The data is stored in a CSV file called `followers_anonymized.csv` that can be found in the `data/raw` directory of this repository. Take a moment to read in the data and observe the output:

In [2]:
print("Location of data files:", get_path(''))
print("Location of anonymized followers data:", get_path('followers_anonymized.csv'))
print("Loading...")
df = pd.read_csv(get_path('followers_anonymized.csv'))
print("...Done loading")
print("Displaying the first 5 rows...")
df.head(5)

Location of data files: /Users/matthewquinn/dev/eFuse-sample/data/raw/
Location of anonymized followers data: /Users/matthewquinn/dev/eFuse-sample/data/raw/followers_anonymized.csv
Loading...
...Done loading
Displaying the first 5 rows...


| | follower | followee | createdAt |
|---|---|---|---|
| 0 | 782bc0ce5ffe00c95bbc52f72fc654a2 | e83a54eefe2ff5daf80a66505f9472a4 | 2019-11-26T21:36:21.221Z |
| 1 | e83a54eefe2ff5daf80a66505f9472a4 | 782bc0ce5ffe00c95bbc52f72fc654a2 | 2019-11-26T21:36:39.756Z |
| 2 | 782bc0ce5ffe00c95bbc52f72fc654a2 | 4627a06c99dde3d167166eaab32e947d | 2019-11-26T21:36:49.184Z |
| 3 | 782bc0ce5ffe00c95bbc52f72fc654a2 | 49b76015d44936cb2c2184fc805e88a6 | 2019-11-26T21:36:57.669Z |
| 4 | 782bc0ce5ffe00c95bbc52f72fc654a2 | 1e66c9b143e9ee9155d0dcaa8d5997b0 | 2019-11-26T21:41:21.834Z |


In [3]:
print("...displaying the last 5 rows...")
df.tail(5)

...displaying the last 5 rows...


| | follower | followee | createdAt |
|---|---|---|---|
| 609112 | a9b7b5dc212eb2cc43c7503ec256d78c | 5e1ab920b988fb71f5ae532db2fc449e | 2021-05-28T14:40:20.915Z |
| 609113 | 3828c0c23a13cef79a3b9c30848c3609 | c3581e6d3601d2825c10a4547d64f019 | 2021-05-28T14:49:39.844Z |
| 609114 | 255f0841aae13a484e8b9b8e314022bd | 37b8780b7369005f5bc922de6501dd40 | 2021-05-28T14:58:24.978Z |
| 609115 | 49b76015d44936cb2c2184fc805e88a6 | 818ae3dee4d173fbc40b8aaf2c89f00c | 2021-05-28T15:17:50.008Z |
| 609116 | 72ae2db888738150b28d520578a5907d | 37b8780b7369005f5bc922de6501dd40 | 2021-05-28T15:21:01.119Z |


In [4]:
print("...some quick descriptive statistics")
display(df.describe())
# plain strings suffice here: there are no placeholders to format
print("Earliest Event:\n", df.createdAt.min(), sep=" ===> ")
print()
print("Most Recent Event:\n", df.createdAt.max(), sep=" ===> ")

...some quick descriptive statistics


| | follower | followee | createdAt |
|---|---|---|---|
| count | 609117 | 609117 | 609117 |
| unique | 97765 | 39956 | 602654 |
| top | 5e1ab920b988fb71f5ae532db2fc449e | bbc4a5710e40e0c9fe61a6793297db55 | 2020-08-23T16:37:07.588Z |
| freq | 12652 | 51753 | 6 |


Earliest Event:
 ===> 2019-11-26T21:36:21.221Z

Most Recent Event:
 ===> 2021-05-28T15:21:01.119Z
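
Because `describe()` reports `unique`, `top`, and `freq` for `createdAt`, pandas is treating that column as strings, so the `min()` and `max()` above are string comparisons. That still gives the right answer here because every timestamp uses the same fixed-width ISO 8601 UTC format, which sorts the same way as text and as time. The sketch below checks this on a few made-up timestamps (the values in `stamps` are illustrative, chosen to match the format of the column):

```python
import pandas as pd

# Hypothetical timestamps in the same ISO 8601 UTC format as `createdAt`.
stamps = pd.Series([
    "2020-08-23T16:37:07.588Z",
    "2019-11-26T21:36:21.221Z",
    "2021-05-28T15:21:01.119Z",
])

# String comparison: valid only because the format is fixed-width
# and ordered from most to least significant field.
earliest_str, latest_str = stamps.min(), stamps.max()

# Chronological comparison on parsed, timezone-aware datetimes.
parsed = pd.to_datetime(stamps, utc=True)
earliest_ts, latest_ts = parsed.min(), parsed.max()

# Both approaches pick out the same events.
same_earliest = pd.Timestamp(earliest_str) == earliest_ts
same_latest = pd.Timestamp(latest_str) == latest_ts
print(same_earliest, same_latest)
```

If the column ever mixed formats or time zones, the string comparison would no longer be safe, and parsing first would be the more robust choice.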


**Note the following:**
1. The events span from `November 26th, 2019` to `May 28th, 2021`.
2. There are roughly 2.45 times as many unique followers (97,765) as unique followees (39,956).
3. Follower `5e1ab920b988fb71f5ae532db2fc449e` appears as the follower in 12,652 events, the most of any gamer.
4. Followee `bbc4a5710e40e0c9fe61a6793297db55` appears as the followee in 51,753 events, the most of any gamer.
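
The figures in these notes can also be computed directly instead of being read off the `describe()` output. The sketch below uses a small made-up DataFrame with the same three columns (the ids and timestamps in `toy` are illustrative, not taken from the dataset); on the real data the same calls would be run on `df`.

```python
import pandas as pd

# Toy stand-in for the follower events (hypothetical ids, not real data).
toy = pd.DataFrame({
    "follower": ["a", "a", "b", "c", "d"],
    "followee": ["x", "y", "x", "x", "y"],
    "createdAt": [
        "2019-11-26T21:36:21.221Z",
        "2019-11-26T21:36:39.756Z",
        "2020-01-05T08:00:00.000Z",
        "2020-08-23T16:37:07.588Z",
        "2021-05-28T15:21:01.119Z",
    ],
})

# Number of distinct ids on each side of the follow relationship.
n_followers = toy["follower"].nunique()
n_followees = toy["followee"].nunique()
ratio = n_followers / n_followees

# value_counts() sorts by count in descending order, so the first
# entry is the id that appears in the most events.
follower_counts = toy["follower"].value_counts()
followee_counts = toy["followee"].value_counts()
top_follower, top_follower_events = follower_counts.index[0], follower_counts.iloc[0]
top_followee, top_followee_events = followee_counts.index[0], followee_counts.iloc[0]

print(f"unique followers: {n_followers}, unique followees: {n_followees}, ratio: {ratio:.2f}")
print(f"top follower: {top_follower} ({top_follower_events} events)")
print(f"top followee: {top_followee} ({top_followee_events} events)")
```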

One thing this data doesn't take into account is the number of times a follower has followed, unfollowed, and/or re-followed a gamer, and vice versa. But no worries, we'll save that observation for another time. For now, let's direct our attention to the `createdAt` column.
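
For when that question does come up, one possible starting point is to look for (follower, followee) pairs that occur more than once. The sketch below is only an illustration on made-up rows (the `toy` frame and its ids are hypothetical). A repeated pair may indicate a re-follow, but it could equally be a duplicate record, since the data contains no unfollow events to confirm it.

```python
import pandas as pd

# Hypothetical events: gamer "a" follows "x" twice.
toy = pd.DataFrame({
    "follower": ["a", "a", "b", "a"],
    "followee": ["x", "y", "x", "x"],
    "createdAt": [
        "2019-11-26T21:36:21.221Z",
        "2019-11-26T21:36:39.756Z",
        "2020-01-05T08:00:00.000Z",
        "2020-08-23T16:37:07.588Z",
    ],
})

pair_cols = ["follower", "followee"]

# keep=False marks every row of a repeated pair, not just the later ones.
repeated = toy[toy.duplicated(subset=pair_cols, keep=False)]

# Count events per pair and keep only pairs seen more than once.
pair_counts = toy.groupby(pair_cols).size()
refollowed = pair_counts[pair_counts > 1]

print(repeated)
print(refollowed)
```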

## Step 2: Determine If Data Processing Is Needed