# Analysis of Effective Altruism Facebook Group Posts
## 2012 - 2015

Data taken from https://docs.google.com/spreadsheets/d/1zRc2AvZ_nWEyXdOPz9XzjuXszMa92C9WUSVOul-NJyo/edit#gid=1562892504



### Growth

NB: "Posts" in the spreadsheet includes both "status" posts and "link" posts; I refer to them as "status-posts" and "link-posts" respectively. A status-post is one where someone writes some text and shares it, whereas a link-post shares a link with the group. There were also an insignificant number of "other" post types, including photo, video, and music.

* The total number of status-posts has remained constant for the last three years (Figure 1). 
* The number of link-containing posts has increased year-on-year by ~25% (Figure 1).
* Because of the increase in link-posts, the total number of posts (of all types) is increasing annually.
* The group was started in late 2012 and had only 23 posts that year, compared with 1300+ in each subsequent year (Table 1).


### Comments
* The absolute number of comments has been decreasing annually, despite an increased number of overall posts (Figure 2).
    - The number of comments per status-post has dropped from an average of 11.3 in 2013 to 7.8 in 2015 (Table 2).
 
 
### Other Comparison of Status- and Link-Posts
* Status-posts receive an average of 9.7 comments per post compared to 5.4 comments per post for link-posts (Table 3).
* Link-posts receive slightly more likes on average than status-posts, 9.3 vs. 7.2 (Table 3).
* Link-posts receive more than 30 times as many shares as status-posts on average. This explains the large increase in the number of shares in 2015.


### Monthly Variation
* Compared with the preceding 12 months, there was a dip in activity from mid-2014 to early 2015 (Figure 3). I don't know why. Activity started going back up sometime between February and June. I didn't get a chance to look into the changes on that finer time-scale. Does anybody know what changed with the group from last year to this year?


### Contributors
* The number of unique contributors has increased linearly since 2013 (Figure 4).
* Half or more of the top contributors list has stayed the same since 2013 (Tables 4-7).
* See below for leaderboards of top contributors for 2013, 2014, 2015, and all time.




## Conclusion
Overall group activity has been increasing since 2012, but the composition and nature of the activity have changed. While the number of status-posts has stayed constant since 2013, their proportion of total posts has dropped. In 2013, the numbers of statuses and links posted were roughly equal, whereas in 2015, 50% more links were posted than statuses. Additionally, the number of comments per status-post has been dropping since 2013.
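
The change in composition can be checked by turning the per-year counts into shares of each year's total. The sketch below is only an illustration: the function name `type_share` is my own, it assumes the `year` and `type` columns built earlier in this notebook, and the `toy` rows are made up so the block runs on its own (they are not the group's data).

```python
import pandas as pd

def type_share(posts: pd.DataFrame) -> pd.DataFrame:
    """Share of each year's posts that belongs to each post type."""
    counts = posts.groupby(['year', 'type']).size().unstack(fill_value=0)
    # Divide every row by that year's total so each row sums to 1
    return counts.div(counts.sum(axis=1), axis=0)

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'year': [2013, 2013, 2013, 2013, 2015, 2015, 2015, 2015, 2015],
    'type': ['link', 'link', 'status', 'status',
             'link', 'link', 'link', 'status', 'status'],
})
share = type_share(toy)
print(share.round(2))
```

With the real data this would be called as `type_share(posts)`; a falling `status` column alongside flat absolute status counts would match the pattern described above.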

There has been an annual increase in 

Questions I didn't attempt to answer:
Are the additional posters providing value? Do their posts get as many likes/shares/comments, or are they opportunistically posting on a large group?
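
One possible starting point for that question, offered only as a sketch: label each post by whether its author first appeared in the group that year, then compare average engagement between the two groups. Treating "additional posters" as "authors in their first year of posting" is my assumption, as are the names `engagement_by_poster_tenure` and `posterGroup`; the `toy` rows are made up so the block runs on its own.

```python
import numpy as np
import pandas as pd

def engagement_by_poster_tenure(posts: pd.DataFrame) -> pd.DataFrame:
    """Mean likes/comments/shares per post, split by first-year vs returning authors."""
    # First year in which each author posted, aligned to every row
    first_year = posts.groupby('authorName')['year'].transform('min')
    labelled = posts.assign(
        posterGroup=np.where(posts['year'] == first_year, 'first-year', 'returning')
    )
    cols = ['likesCount', 'commentsCount', 'sharesCount']
    return labelled.groupby(['year', 'posterGroup'])[cols].mean()

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'authorName':    ['A', 'A', 'B', 'B'],
    'year':          [2013, 2014, 2014, 2014],
    'likesCount':    [4, 10, 2, 4],
    'commentsCount': [2, 6, 0, 2],
    'sharesCount':   [0, 2, 0, 0],
})
tenure = engagement_by_poster_tenure(toy)
print(tenure)
```

Note that every author counts as "first-year" in the earliest year of the data, so only the later years give a meaningful comparison.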

In [None]:
# Imports
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
# Needed for the date axis in Figure 3
from matplotlib.dates import MonthLocator, DateFormatter
%matplotlib inline

In [None]:
posts = pd.read_csv('Effective Altruism facebook group posts data - posts (1).csv')
posts['createdTime'] = pd.to_datetime(posts['createdTime'])
posts['year'] = posts['createdTime'].apply(lambda x: x.year)
posts['week'] = posts['createdTime'].apply(lambda x: x.week)
posts['month'] = posts['createdTime'].apply(lambda x: x.month)
posts['quarter'] = posts['createdTime'].apply(lambda x: x.quarter)

### Figure 1: Number of Posts by Type vs. Year

In [None]:
postsTypes = posts.groupby(['year','type']).count()['id'].unstack()
postsTypes.fillna(0,inplace=True)

otherCols = set(list(postsTypes.columns))
mainCols = set(['link','status'])
otherCols = otherCols-mainCols
otherCols = list(otherCols)

postsTypes['other'] = postsTypes[otherCols].sum(axis=1)
postsTypes.drop(otherCols,axis=1,inplace=True)

# DataFrame.plot returns a matplotlib Axes, so name it accordingly
ax = postsTypes.plot(kind='bar',figsize=(8,4))
ax.set_title("Post Counts by Type vs Year")
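
The year-on-year growth figure quoted for link-posts can be read off the same counts with `pct_change`. This is a minimal sketch rather than the calculation used above: `yearly_type_growth` is a name I made up, it assumes the `year` and `type` columns created earlier, and the `toy` rows are invented so the block runs on its own.

```python
import pandas as pd

def yearly_type_growth(posts: pd.DataFrame) -> pd.DataFrame:
    """Year-on-year percentage change in post counts, one column per post type."""
    counts = posts.groupby(['year', 'type']).size().unstack(fill_value=0)
    # The first year has no predecessor, so its row is NaN
    return counts.pct_change() * 100

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'year': [2013, 2013, 2013, 2014, 2014, 2014, 2014],
    'type': ['link', 'link', 'status', 'link', 'link', 'link', 'status'],
})
growth = yearly_type_growth(toy)
print(growth.round(1))
```

On the real data, `yearly_type_growth(posts)` would give one row per year. A type with zero posts in the previous year produces `inf` or `NaN`, so the 2012 to 2013 row is best ignored given how few posts 2012 had.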

### Table 1: Number of Posts by Year
* The group was started in late 2012 and had only 23 posts that year, compared with 1300+ in each subsequent year (Table 1).

In [None]:
# Yearly totals; string aggregator names avoid the warning newer pandas raises for np.sum
postsY = posts.groupby(['year']).agg({'likesCount' : 'sum','commentsCount' : 'sum','sharesCount' : 'sum','id': 'count'})
postsY.rename(columns={'id':'postsCount'},inplace=True)
postsY.sort_index()

### Figure 2: Activity Counts vs Year
* Because of the increase in link-posts, the total number of posts (of all types) is increasing annually.
* The absolute number of comments has been decreasing annually, despite an increased number of overall posts (Figure 2).



In [None]:
# seaborn needs 'year' as a column rather than the index; work on a copy so postsY is unchanged
postsYPlot = postsY.reset_index()
fig, Ax = plt.subplots(2,2,figsize=(10,12))

# One bar chart per activity metric, filled left to right, top to bottom
panels = [('postsCount','Posts'),('commentsCount','Comments'),
          ('likesCount','Likes'),('sharesCount','Shares')]
for ax,(col,title) in zip(Ax.flat,panels):
    sns.barplot(y=col,x='year',data=postsYPlot,ax=ax)
    ax.set_title(title)
    # Setting the labels to empty strings hides the axis labels
    ax.set_xlabel('')
    ax.set_ylabel('')

### Table 2: Average Likes/Comments/Shares for Status Posts
* The number of comments per status-post has dropped from an average of 11.3 in 2013 to 7.8 in 2015 (Table 2).


In [None]:
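statCols = ['likesCount','commentsCount','sharesCount']
# Select the numeric columns before averaging; .mean() fails on text columns in recent pandas
postsTypes = posts[(posts['type']=='status')].groupby(['year','type'])[statCols].mean()
postsTypes.round(2)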

### Table 3: Likes/Comments/Shares for Status vs Links
* Status-posts receive an average of 9.7 comments per post compared to 5.4 comments per post for link-posts (Table 3).
* Link-posts receive slightly more likes on average than status-posts, 9.3 vs. 7.2 (Table 3).
* Link-posts receive more than 30 times as many shares as status-posts on average. This explains the large increase in the number of shares in 2015.

In [None]:
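statCols = ['likesCount','commentsCount','sharesCount']
# Select the numeric columns before averaging; .mean() fails on text columns in recent pandas
postsTypes = posts[(posts['type']=='link')|(posts['type']=='status')].groupby(['type'])[statCols].mean()
postsTypes.round(2)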
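
The "30 times" comparison is a ratio of the two rows of this table. A small sketch of that arithmetic follows, assuming the same column names as above; `link_to_status_ratio` is a hypothetical helper name and the `toy` rows are made up so the block runs on its own.

```python
import pandas as pd

def link_to_status_ratio(posts: pd.DataFrame) -> pd.Series:
    """Mean likes/comments/shares for link posts divided by the same means for status posts."""
    cols = ['likesCount', 'commentsCount', 'sharesCount']
    subset = posts[posts['type'].isin(['link', 'status'])]
    means = subset.groupby('type')[cols].mean()
    # Values above 1 mean link posts get more of that metric per post
    return means.loc['link'] / means.loc['status']

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'type':          ['link', 'link', 'status', 'status'],
    'likesCount':    [10, 8, 6, 6],
    'commentsCount': [4, 6, 10, 10],
    'sharesCount':   [3, 3, 1, 1],
})
ratio = link_to_status_ratio(toy)
print(ratio.round(2))
```

If status-posts averaged zero shares the ratio would be infinite, so it is worth checking the denominator before quoting a multiple.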


### Figure 3: Monthly Variation
* Compared with the preceding 12 months, there was a dip in activity from mid-2014 to early 2015 (Figure 3). I don't know why, and it's not seasonality. Activity started going back up sometime between February and June. I didn't get a chance to look into the changes on that finer time-scale. Does anybody know what changed with the group from last year to this year?

In [None]:
# Monthly totals; string aggregator names avoid the warning newer pandas raises for np.sum
postsW = posts.groupby(['year','month']).agg({'likesCount' : 'sum','commentsCount' : 'sum','sharesCount' : 'sum','id': 'count'})
postsW.rename(columns={'id':'postsCount'},inplace=True)
postsW['date_month'] = [datetime.datetime(a[0],a[1],15,0,0) for a in list(postsW.index)]

months = MonthLocator(interval=2)
monthsFmt = DateFormatter("%b-%y")
fig, (ax1) = plt.subplots(1,1,figsize=(10,5),sharex=False)
# plot() handles datetime x-values directly; plot_date is deprecated in recent matplotlib
ax1.plot(postsW['date_month'],postsW['postsCount'],'-o')
ax1.xaxis.set_major_locator(months)
ax1.xaxis.set_major_formatter(monthsFmt)

fig.autofmt_xdate()
ax1.autoscale_view()
ax1.set_title('Posts Per Month')

#Yeah, not the best graph.
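
One way to check the seasonality point is to compare each month with the same calendar month a year earlier: a seasonal pattern would repeat in the same months every year, while a one-off dip shows up as negative differences in a single stretch. This is a sketch under the assumption that the `year` and `month` columns from earlier are present; `monthly_yoy_change` is a name I made up and the `toy` rows are invented so the block runs on its own.

```python
import pandas as pd

def monthly_yoy_change(posts: pd.DataFrame) -> pd.DataFrame:
    """Change in post count versus the same month of the previous year.

    Rows are calendar months, columns are years.
    """
    monthly = posts.groupby(['year', 'month']).size().unstack('year')
    # Difference between each year column and the column to its left
    return monthly.diff(axis=1)

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'year':  [2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015],
    'month': [1, 1, 1, 7, 1, 1, 7, 7, 7],
})
change = monthly_yoy_change(toy)
print(change)
```

Months with no posts in a given year come out as `NaN`, and the first year's column is always `NaN` because it has nothing to be compared against.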

### Figure 4: Unique Contributors
* The number of unique contributors has increased linearly since 2013 (Figure 4).

In [None]:
postersYear = posts.groupby(['year','authorName']).count()['id'].reset_index(level=1)
postersYear = postersYear.reset_index().groupby('year').count()
postersYear.reset_index(inplace=True)
ax = sns.barplot(x='year',y='id',data=postersYear)
#postersYear.reset_index().plot(x='year',y='id')
ax.set_ylabel('Unique Posters')
ax.set_title('Number of Unique Posters by Year')
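
The same count can be obtained more directly with `nunique`, and differencing it gives a rough check of the "linear" claim, since linear growth means roughly equal yearly increments. This is an alternative sketch, not the code used for Figure 4; `unique_posters_by_year` is a hypothetical name and the `toy` rows are made up so the block runs on its own.

```python
import pandas as pd

def unique_posters_by_year(posts: pd.DataFrame) -> pd.Series:
    """Number of distinct authors who posted in each year."""
    return posts.groupby('year')['authorName'].nunique()

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'year':       [2013, 2013, 2013, 2014, 2014, 2014],
    'authorName': ['A', 'A', 'B', 'A', 'B', 'C'],
})
unique_posters = unique_posters_by_year(toy)
# Year-to-year increments; roughly constant values suggest linear growth
increments = unique_posters.diff()
print(unique_posters)
print(increments)
```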

In [None]:
def TopContsFunc(posts,year=None):
    """Per-author post counts and engagement totals, sorted by number of posts.

    If year is given, only posts from that year are counted.
    """
    if year is not None:
        posts = posts[posts['year']==year]

    TopConts = posts.groupby('authorName').agg({'id': 'count',
                                                'likesCount' : 'sum',
                                                'commentsCount' : 'sum',
                                                'sharesCount' : 'sum'
                                                }).sort_values(by='id',ascending=False)

    # Fix the column order and give the post count a clearer name
    TopConts = TopConts[['id','likesCount','commentsCount','sharesCount']].rename(columns={'id':'postsCount'})
    # Average engagement per post
    TopConts['LikesPerPost'] = TopConts['likesCount']/TopConts['postsCount']
    TopConts['CommentsPerPost'] = TopConts['commentsCount']/TopConts['postsCount']
    TopConts['SharesPerPost'] = TopConts['sharesCount']/TopConts['postsCount']
    TopConts.reset_index(inplace=True)
    return TopConts


## CONTRIBUTORS TABLES
* Table 4: All Time
* Table 5: 2015
* Table 6: 2014
* Table 7: 2013

### Table 4: TOP 20 CONTRIBUTORS (posts) ALL TIME

In [None]:
TopContsFunc(posts).head(20)

### Table 5: TOP 15 CONTRIBUTORS, 2015

In [None]:
TopContsFunc(posts,2015).head(15)

All-time top contributor Rob Wiblin actually made most of his posts (135 of 187) in 2015.

### Table 6: TOP 15 CONTRIBUTORS, 2014

In [None]:
TopContsFunc(posts,2014).head(15)

### Table 7: TOP 15 CONTRIBUTORS, 2013

In [None]:
TopContsFunc(posts,2013).head(15)
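
The claim that half or more of the top contributors carried over between years can be checked by intersecting the top-N author sets of two years. The sketch below is self-contained rather than built on `TopContsFunc`; the helper names `top_contributors` and `top_overlap` are my own, and the `toy` rows are made up so the block runs on its own.

```python
import pandas as pd

def top_contributors(posts: pd.DataFrame, year: int, n: int) -> set:
    """Names of the n authors with the most posts in the given year."""
    counts = posts[posts['year'] == year].groupby('authorName').size()
    return set(counts.nlargest(n).index)

def top_overlap(posts: pd.DataFrame, year_a: int, year_b: int, n: int = 15) -> set:
    """Authors who appear in the top n of both years."""
    return top_contributors(posts, year_a, n) & top_contributors(posts, year_b, n)

# Made-up rows for illustration only (NOT the group's data)
toy = pd.DataFrame({
    'year':       [2013] * 6 + [2014] * 6,
    'authorName': ['A', 'A', 'A', 'B', 'B', 'C',
                   'D', 'D', 'D', 'A', 'A', 'B'],
})
shared = top_overlap(toy, 2013, 2014, n=2)
print(shared, len(shared) / 2)
```

With the real data, `len(top_overlap(posts, 2013, 2015)) / 15` would give the carried-over fraction. Ties at the cutoff are broken by order of appearance, so the result can shift slightly when several authors have the same post count.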