# COVID-19 Memes, a Facebook album

This notebook downloads the `n` most popular photos from each month of the album, ranking photos by their combined love and haha reactions.

Set n and download directory:

In [397]:
n = 15
download_dir = './memes/'

Import dependencies.

Note that the Facebook `access_token` expires quickly. That was fine for my use case, but to run this notebook again I would need to renew the token in the app on my Developer page.

In [358]:
import configparser
import pandas as pd
import requests
import json
import facebook
configs = configparser.ConfigParser()
configs.read('../config.ini')

['../config.ini']
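The config file is expected to have a `[FACEBOOK]` section with an `ACCESS_TOKEN` key, since that is what the later cells read. A minimal sketch of a loader that fails with a clear message when the file or key is missing (the helper name `load_access_token` is hypothetical, not part of this notebook):

```python
import configparser

def load_access_token(path):
    """Return the Facebook access token from an INI file, or raise a clear error."""
    configs = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it parsed successfully,
    # so an empty list means the file was missing or unreadable.
    if not configs.read(path):
        raise FileNotFoundError(f'Could not read config file: {path}')
    try:
        return configs['FACEBOOK']['ACCESS_TOKEN']
    except KeyError as err:
        raise KeyError('Expected a [FACEBOOK] section with an ACCESS_TOKEN key') from err
```

This only checks that a token is present. It cannot tell whether the token has expired; that shows up as an error response from the Graph API.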

Get photos from the COVID-19 Memes album (id: `10223159753881444`). 

I used `requests` here rather than the `facebook` package because I couldn't find a way in the [documentation](https://facebook-sdk.readthedocs.io/en/latest/api.html) to get the photo source URL. With `requests`, I can get it by expanding the `photos` field.

In [389]:
photos = []
access_token = configs['FACEBOOK']['ACCESS_TOKEN']
url = "https://graph.facebook.com/10223159753881444?fields=photos{created_time,name,source}&access_token=" + access_token

while url:
    response = requests.get(url)
    response.raise_for_status()  # fail early, e.g. if the access token has expired
    response_parsed = response.json()
    # The first response nests results under 'photos'; follow-up pages do not
    page = response_parsed.get('photos', response_parsed)
    # Collect this page before checking for a next one, so the last page is kept
    photos.extend(page['data'])
    # The last page has no 'next' link, which ends the loop
    url = page.get('paging', {}).get('next')
    
print('Number of photos collected:', len(photos))

Number of photos collected: 1175
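The paging logic can be tried without network access by passing the fetch step in as a function. The sketch below assumes the response shapes used in the cell above (results nested under `photos` on the first page only, and a `paging.next` URL while more pages remain). The names `iter_album_photos` and `fetch` are hypothetical.

```python
def iter_album_photos(first_url, fetch):
    """Yield photo dicts from every page, following 'next' links until none remain.

    `fetch` is a caller-supplied function that takes a URL and returns the
    parsed JSON body as a dict, e.g. lambda u: requests.get(u).json().
    """
    url = first_url
    while url:
        body = fetch(url)
        # The first response nests results under 'photos'; later pages do not.
        page = body.get('photos', body)
        yield from page.get('data', [])
        # A page without a 'next' link is the last one.
        url = page.get('paging', {}).get('next')


# Usage with canned pages instead of real HTTP requests
fake_pages = {
    'page1': {'photos': {'data': [{'id': '1'}, {'id': '2'}], 'paging': {'next': 'page2'}}},
    'page2': {'data': [{'id': '3'}], 'paging': {'next': 'page3'}},
    'page3': {'data': [{'id': '4'}], 'paging': {}},
}
collected = list(iter_album_photos('page1', fake_pages.__getitem__))
print([p['id'] for p in collected])  # ['1', '2', '3', '4']
```

Yielding each page's data before looking for the next link means the final page is always included.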


Use the facebook package to collect reaction info for each photo. 

In [390]:
graph = facebook.GraphAPI(access_token=access_token, version='3.1')  # the version is documented as a string

love = 'reactions.type(LOVE).limit(0).summary(total_count).as(love)'
wow = 'reactions.type(WOW).limit(0).summary(total_count).as(wow)'
haha = 'reactions.type(HAHA).limit(0).summary(total_count).as(haha)'
like = 'reactions.type(LIKE).limit(0).summary(total_count).as(like)'
angry = 'reactions.type(ANGRY).limit(0).summary(total_count).as(angry)'
sad = 'reactions.type(SAD).limit(0).summary(total_count).as(sad)'
reactions = 'created_time,message,' + love + ',' + wow + ',' + haha + ',' + angry + ',' + sad + ',' + like

user_id = '10225913074352735'
ids = [user_id + '_' + x['id'] for x in photos]
album = {}

# Query in batches, since at most 50 IDs can be requested at a time
for i in range(0, len(ids), 50):
    album.update(graph.get_objects(ids=ids[i:i+50], fields=reactions))
    
print('Album contains', len(album), 'entries.')

Album contains 1175 entries.
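The two mechanical pieces of the cell above, building the `fields` string and slicing the IDs into batches of 50, can be sketched as small helpers. The names `reaction_fields` and `chunks` are hypothetical. The field syntax is copied from the strings in the cell, not taken from the Graph API documentation.

```python
def reaction_fields(types, base=('created_time', 'message')):
    """Build the `fields` string with one aliased reaction count per type."""
    counts = [
        f'reactions.type({t.upper()}).limit(0).summary(total_count).as({t.lower()})'
        for t in types
    ]
    return ','.join(list(base) + counts)


def chunks(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


fields = reaction_fields(['love', 'wow', 'haha', 'angry', 'sad', 'like'])
batch_sizes = [len(batch) for batch in chunks(list(range(120)))]
print(batch_sizes)  # [50, 50, 20]
```

With these, the query loop would read `for batch in chunks(ids): album.update(graph.get_objects(ids=batch, fields=fields))`.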


Format all the info into a data frame.

In [396]:
print('Failed to get data for:')
failed_ids = []

# For some photos the API returns the album's info instead of the photo's,
# even though the ID looks correct. These entries carry the album description
# as their message, so use it to detect and remove them.
album_description = 'Coronavirus memes are spreading like, well, coronavirus. A collection of the better ones to help get us through social distancing. Additions welcome. Stay healthy!'
for x in album:
    if album[x].get('message') == album_description:
        failed_ids.append(x)
        print(x)

data = {'id': [],
        'date': [],
        'loves': [],
        'angry': [],
        'wow': [],
        'likes': [],
        'sad': [],
        'haha': [],
        'source': []}

# Look up each photo's source URL by ID rather than by position,
# because the order of `album` may not match the order of `photos`
source_by_id = {user_id + '_' + p['id']: p['source'] for p in photos}

for photo_id, entry in album.items():
    data['id'].append(photo_id)
    data['date'].append(entry['created_time'])
    data['loves'].append(entry['love']['summary']['total_count'])
    data['angry'].append(entry['angry']['summary']['total_count'])
    data['wow'].append(entry['wow']['summary']['total_count'])
    data['likes'].append(entry['like']['summary']['total_count'])
    data['sad'].append(entry['sad']['summary']['total_count'])
    data['haha'].append(entry['haha']['summary']['total_count'])
    data['source'].append(source_by_id[photo_id])

df = pd.DataFrame(data)
# Drop the entries where the API returned the album instead of the photo
df = df[~df['id'].isin(failed_ids)].reset_index(drop=True)
# Popularity score: love reactions plus haha reactions
df['total'] = df['loves'] + df['haha']
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.month
df

Failed to get data for:
10225913074352735_10223162078699563
10225913074352735_10223172163311672
10225913074352735_10223184983712174
10225913074352735_10223197388822294
10225913074352735_10223210981962114
10225913074352735_10223241859654037
10225913074352735_10223639283789392
10225913074352735_10223652056268696
10225913074352735_10224136914669853
10225913074352735_10224206084199048
10225913074352735_10224212941730482
10225913074352735_10224747561535643


| | id | date | loves | angry | wow | likes | sad | haha | source | total | month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10225913074352735_10223171737901037 | 2020-03-18 09:19:11+00:00 | 0 | 0 | 0 | 6 | 0 | 24 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 24 | 3 |
| 1 | 10225913074352735_10223176507700279 | 2020-03-18 18:21:28+00:00 | 0 | 0 | 0 | 5 | 0 | 9 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 9 | 3 |
| 2 | 10225913074352735_10223161106315254 | 2020-03-17 13:19:01+00:00 | 0 | 0 | 0 | 3 | 0 | 11 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 11 | 3 |
| 3 | 10225913074352735_10223162376987020 | 2020-03-17 15:29:34+00:00 | 0 | 0 | 0 | 5 | 0 | 16 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 16 | 3 |
| 4 | 10225913074352735_10223183899045058 | 2020-03-19 10:33:17+00:00 | 0 | 0 | 0 | 5 | 0 | 5 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 5 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1158 | 10225913074352735_10225784291533245 | 2020-12-09 18:29:37+00:00 | 0 | 0 | 0 | 5 | 0 | 27 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 27 | 12 |
| 1159 | 10225913074352735_10225784309133685 | 2020-12-09 18:33:33+00:00 | 0 | 0 | 0 | 1 | 0 | 13 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 13 | 12 |
| 1160 | 10225913074352735_10225784455577346 | 2020-12-09 18:57:47+00:00 | 0 | 0 | 0 | 5 | 0 | 9 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 9 | 12 |
| 1161 | 10225913074352735_10225795830301707 | 2020-12-11 04:09:39+00:00 | 0 | 0 | 0 | 8 | 0 | 19 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 19 | 12 |


Get the top n memes for each month.

In [394]:
top_memes = df.sort_values(['month','total'], ascending=False).groupby('month', sort=False).head(n)
top_memes

| | id | date | loves | angry | wow | likes | sad | haha | source | total | month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1156 | 10225913074352735_10225784275092834 | 2020-12-09 18:28:22+00:00 | 0 | 0 | 0 | 6 | 0 | 30 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 30 | 12 |
| 1158 | 10225913074352735_10225784291533245 | 2020-12-09 18:29:37+00:00 | 0 | 0 | 0 | 5 | 0 | 27 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 27 | 12 |
| 1142 | 10225913074352735_10225761723089048 | 2020-12-06 21:20:23+00:00 | 0 | 0 | 0 | 2 | 0 | 20 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 20 | 12 |
| 1161 | 10225913074352735_10225795830301707 | 2020-12-11 04:09:39+00:00 | 0 | 0 | 0 | 8 | 0 | 19 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 19 | 12 |
| 1128 | 10225913074352735_10225761563605061 | 2020-12-06 20:58:33+00:00 | 0 | 0 | 2 | 10 | 0 | 18 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 18 | 12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 220 | 10225913074352735_10223249107075218 | 2020-03-24 15:15:46+00:00 | 2 | 0 | 0 | 3 | 0 | 17 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 19 | 3 |
| 200 | 10225913074352735_10223242517790490 | 2020-03-23 23:45:19+00:00 | 0 | 0 | 0 | 0 | 0 | 18 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 18 | 3 |
| 5 | 10225913074352735_10223174526930761 | 2020-03-18 14:45:37+00:00 | 0 | 0 | 0 | 0 | 0 | 17 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 17 | 3 |
| 180 | 10225913074352735_10223227755301437 | 2020-03-22 20:31:03+00:00 | 0 | 0 | 0 | 5 | 0 | 17 | https://scontent.feau1-1.fna.fbcdn.net/v/t1.0-... | 17 | 3 |
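To see what the sort-then-`head(n)` pattern does, here is a toy illustration with made-up IDs and counts (not album data). Sorting by `month` and `total` in descending order puts each month's best rows first, and `groupby('month', sort=False).head(n)` then keeps the first `n` rows of each group without reordering them.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'id':    ['a', 'b', 'c', 'd', 'e'],
    'month': [3, 3, 3, 4, 4],
    'total': [5, 9, 7, 1, 4],
})

top = (
    toy.sort_values(['month', 'total'], ascending=False)
       .groupby('month', sort=False)
       .head(2)  # keep the two highest totals per month
)
print(top['id'].tolist())  # ['e', 'd', 'b', 'c']
```

Row `a` is dropped because month 3 already has two rows with higher totals. Month 4 has only two rows, so both are kept.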


Download all these memes. 

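They will be named `[id]_month[month_num]_total[total_haha].png`, where `id` is the photo ID without the user ID prefix and `total_haha` is the photo's haha count.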

In [398]:
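import os

def download(meme):
    # Build the file name from the photo ID (without the user ID prefix), month, and haha count
    filename = os.path.join(download_dir, meme['id'].split('_')[1] + '_month' + str(meme['month']) + '_total' + str(meme['haha']) + '.png')
    response = requests.get(meme['source'], allow_redirects=True)
    # Use a context manager so the file is closed after writing
    with open(filename, 'wb') as f:
        f.write(response.content)

os.makedirs(download_dir, exist_ok=True)  # create the download directory if it does not exist
top_memes.apply(download, axis=1)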

1156    None
1158    None
1142    None
1161    None
1128    None
        ... 
220     None
200     None
5       None
180     None
255     None
Length: 100, dtype: object
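The file naming can be checked separately from the download itself. A minimal sketch using `pathlib`, where the helper name `meme_path` is hypothetical and the ID format (`[user_id]_[photo_id]`) is taken from the table above:

```python
from pathlib import Path

def meme_path(download_dir, post_id, month, haha):
    """Return the target path `[photo_id]_month[month]_total[haha].png`."""
    photo_id = post_id.split('_')[1]  # drop the user ID prefix
    return Path(download_dir) / f'{photo_id}_month{month}_total{haha}.png'


path = meme_path('./memes/', '10225913074352735_10225784275092834', 12, 30)
print(path.name)  # 10225784275092834_month12_total30.png
```

Keeping the naming in its own function makes it easy to confirm the file names before any requests are made.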