<a href="https://colab.research.google.com/github/vaccine-lang/facebook-data/blob/main/vaccine_lang_facebook_data_EDA_wk_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploratory Data Analysis on Facebook Data (Week 5)

This week, we turn the reins over to the group as a whole. The code below downloads four separate CSV files. Each file is a collection of posts from a set of Facebook groups or Facebook pages that we found to be vaccine hesitant.



## Establish the environment and download the data

In [36]:
# Import common libraries
import pandas as pd
import numpy as np
import os
import pprint as pp

In [2]:
# Import data files from GitHub

# Set remote (GitHub) and local paths for the data files
GITHUB_ROOT = "https://raw.githubusercontent.com/vaccine-lang/facebook-data/main"
BASE_DIR = "/"
print(f'Files will be downloaded from "{GITHUB_ROOT}"')
print(f'Files will be downloaded to "{BASE_DIR}".')

# Download each file
file_names = ["groups-1", "groups-2", "pages-1", "pages-2"]
print("Downloading data")
for name in file_names:
  # Build the wget command; -P sets the directory the file is saved into
  url = f"{GITHUB_ROOT}/data/biased-{name}.csv"
  cmd = f"wget -P {BASE_DIR} {url}"
  print("!" + cmd)
  # os.system returns a nonzero exit status when wget fails
  if os.system(cmd) != 0:
    print('  ~~> ERROR')


Files will be downloaded from "https://raw.githubusercontent.com/vaccine-lang/facebook-data/main"
Files will be downloaded to "/".
Downloading data
!wget -P / https://raw.githubusercontent.com/vaccine-lang/facebook-data/main/data/biased-groups-1.csv
!wget -P / https://raw.githubusercontent.com/vaccine-lang/facebook-data/main/data/biased-groups-2.csv
!wget -P / https://raw.githubusercontent.com/vaccine-lang/facebook-data/main/data/biased-pages-1.csv
!wget -P / https://raw.githubusercontent.com/vaccine-lang/facebook-data/main/data/biased-pages-2.csv
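
As an alternative to shelling out to `wget`, pandas can read a CSV directly from a URL. The sketch below is one possible approach, not the method this notebook uses: `build_url` and `load_all` are hypothetical helper names, and `load_all` needs network access, so it is defined here but not called.

```python
import pandas as pd

GITHUB_ROOT = "https://raw.githubusercontent.com/vaccine-lang/facebook-data/main"
file_names = ["groups-1", "groups-2", "pages-1", "pages-2"]


def build_url(name, root=GITHUB_ROOT):
    """Return the raw GitHub URL for one data file (hypothetical helper)."""
    return f"{root}/data/biased-{name}.csv"


def load_all(names=file_names):
    """Read each CSV straight from its URL into a dict of DataFrames.

    Requires network access. low_memory=False asks pandas to infer each
    column's dtype from the whole file rather than chunk by chunk.
    """
    return {name: pd.read_csv(build_url(name), low_memory=False) for name in names}


# Only the URLs are built here; call load_all() to fetch the data.
urls = [build_url(name) for name in file_names]
for url in urls:
    print(url)
```

Reading from the URL skips the local copy entirely, which avoids any mismatch between the download directory and the directory the notebook later reads from.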


## Begin exploratory data analysis

In [11]:
# Convert the CSVs into Data Frames

data = {}

for name in file_names:
  posts_file = "biased-" + name + ".csv"
  print(posts_file)
  # Read from BASE_DIR, the directory wget saved the files into
  data[name] = pd.read_csv(os.path.join(BASE_DIR, posts_file))


biased-groups-1.csv




biased-groups-2.csv
biased-pages-1.csv
biased-pages-2.csv




In [14]:
for name in file_names:
  print(name)
  print(data[name].columns)

groups-1
Index(['Group Name', 'User Name', 'Facebook Id', 'Page Category',
       'Page Admin Top Country', 'Page Description', 'Page Created',
       'Likes at Posting', 'Followers at Posting', 'Post Created',
       'Post Created Date', 'Post Created Time', 'Type', 'Total Interactions',
       'Likes', 'Comments', 'Shares', 'Love', 'Wow', 'Haha', 'Sad', 'Angry',
       'Care', 'Video Share Status', 'Is Video Owner?', 'Post Views',
       'Total Views', 'Total Views For All Crossposts', 'Video Length', 'URL',
       'Message', 'Link', 'Final Link', 'Image Text', 'Link Text',
       'Description', 'Sponsor Id', 'Sponsor Name', 'Sponsor Category',
       'Overperforming Score (weighted  —  Likes 1x Shares 1x Comments 1x Love 1x Wow 1x Haha 1x Sad 1x Angry 1x Care 1x )'],
      dtype='object')
groups-2
Index(['Group Name', 'User Name', 'Facebook Id', 'Page Category',
       'Page Admin Top Country', 'Page Description', 'Page Created',
       'Likes at Posting', 'Followers at Posting', 'P
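
Since the four files come from two kinds of sources (groups and pages), it may help to check which columns they share before combining them. The sketch below uses toy DataFrames. The column `Page Name` is an assumption made for illustration and is not confirmed by the output above, and `column_report` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-ins for the real frames; only the column names matter here.
# "Page Name" is an assumed column, used purely for illustration.
toy = {
    "groups-1": pd.DataFrame(columns=["Group Name", "User Name", "Message"]),
    "pages-1": pd.DataFrame(columns=["Page Name", "User Name", "Message"]),
}


def column_report(frames):
    """Return the columns shared by every frame and the extras in each one."""
    column_sets = {name: set(df.columns) for name, df in frames.items()}
    shared = set.intersection(*column_sets.values())
    extras = {name: sorted(cols - shared) for name, cols in column_sets.items()}
    return sorted(shared), extras


shared, extras = column_report(toy)
print("Shared columns:", shared)
print("Columns unique to each file:", extras)
```

With the real data, the same call would be `column_report(data)`.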

In [32]:
# column = "Message"
# print(len(data["groups-1"]))
# print(pd.unique(data["groups-1"][column].values))

# print(len(pd.unique(data["groups-1"][column].values)))

data["groups-1"]["Message"].dropna().sample(15)

21664    Hey. Would anyone know anything about storing ...
19064    https://www.facebook.com/groups/911conspiracyt...
32869                       Is the Corona Virus Man Made? 
4110     Please take part in this urgent ‘Call to Actio...
733                              Why wasn't this on MSM???
16356                                          DAN MUST GO
9482                                              🤬🤬😡🤬😡🤬😡😡
19720             Crooked Mongrels cannot help themselves?
22415    I hope this isn’t off topic. Does anyone know ...
27348                                               arnica
33713    Has anyone any reviews on Thiosinaminum cream ...
18347    Please can someone help settle my mind. I am a...
17273    https://www.facebook.com/beckerfornd/videos/65...
28967    Is sourdough bread ok to have? No pesticides? ...
12639    Monday, 10/12/20. 7:00 PM, Eastern time. Presi...
Name: Message, dtype: object
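
Because `sample` draws different rows on every run, the index labels shown above would have to be copied by hand to look at the same posts again. A minimal sketch of one way around that, assuming a fixed `random_state` is acceptable, is shown here on a toy DataFrame (the messages are made up):

```python
import pandas as pd

# Toy data standing in for data["groups-1"]; the text is invented.
toy = pd.DataFrame(
    {"Message": ["a long post", None, "short", "another post", None, "last"]}
)

messages = toy["Message"].dropna()

# A fixed random_state makes the draw repeatable across runs.
sample = messages.sample(3, random_state=0)

# Iterating over the sample prints each message in full, with its index label.
for idx, text in sample.items():
    print(idx, repr(text))
```

Iterating over the sampled Series also removes the need to retype the index labels in a later cell.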

In [38]:
ids = [21664, 19064, 32869, 4110, 733, 16356, 9482, 19720, 22415, 27348, 33713, 18347, 17273, 28967, 12639]
for idx in ids:
  # These are index labels from the sample above, so look them up with .loc
  pp.pprint(data["groups-1"].loc[idx, "Message"])


('Hey. Would anyone know anything about storing a newborns stem cells from the '
 'umbilical cord or if this is a worthwhile process? Thanks')
'https://www.facebook.com/groups/911conspiracytheoriesareaninsidejob/permalink/3427570800669947/'
'Is the Corona Virus Man Made? '
('Please take part in this urgent ‘Call to Action’ by copying and pasting the '
 'text on link below and sending to your MP and listed copy addressees. If we '
 'don’t do it, they will think we are all happy about the announcement.')
"Why wasn't this on MSM???"
'DAN MUST GO'
'🤬🤬😡🤬😡🤬😡😡'
'Crooked Mongrels cannot help themselves?'
('I hope this isn’t off topic. Does anyone know of a cheap of free smear test '
 'service (not my gp) in England or wales (cheap meaning £100 or under). '
 'Please no debates about smear tests TIA 👍')
'arnica'
('Has anyone any reviews on Thiosinaminum cream for a wound to reduce '
 'scarring?Does it help? I have found one that has it and also rosehip oil and '
 'calendula in the cream')
('Plea

In [42]:
# 19720 is an index label from the sample above, so look it up with .loc
print(data["groups-1"].loc[19720, "Image Text"])

# How do we handle the relationships between the Message,
# Image Text, Link Text, and Description?



34965
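
One possible answer to the question about `Message`, `Image Text`, `Link Text`, and `Description` is to join whichever fields are present into a single text column and to record how many were filled in. This is only a sketch on toy rows. `combine_text`, `All Text`, and `Text Fields Present` are hypothetical names, and whether concatenation suits the analysis is an open question for the group.

```python
import pandas as pd

TEXT_COLUMNS = ["Message", "Image Text", "Link Text", "Description"]

# Toy rows; the real frames have these four columns among many others.
toy = pd.DataFrame(
    {
        "Message": ["Why wasn't this on MSM???", None],
        "Image Text": [None, "text found in an image"],
        "Link Text": ["Some headline", None],
        "Description": [None, None],
    }
)


def combine_text(df, columns=TEXT_COLUMNS, sep="\n"):
    """Join the non-missing text fields of each row into one string."""
    return df[columns].apply(
        lambda row: sep.join(str(value) for value in row.dropna()), axis=1
    )


toy["All Text"] = combine_text(toy)
# Count how many of the four text fields each post actually filled in.
toy["Text Fields Present"] = toy[TEXT_COLUMNS].notna().sum(axis=1)

print(toy[["All Text", "Text Fields Present"]])
```

Keeping the original columns alongside the combined one leaves room to analyze each field separately later.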


In [45]:
print(data["groups-1"]["Group Name"].value_counts())
for name in file_names:
  print(len(data[name]))

wake up Australia                                                18958
Arnica - Parents' Support Network, Promoting Natural Immunity     9117
New Yorkers for Medical Freedom and Parental Rights               2680
Christians rejecting Covid-19 Vaccine                             1561
Vaccine Induced Autoimmune Disease                                1350
❤️ The rEvolution! ✊                                               729
Anti Mask Anti vaccine Pro Freedom                                 495
Christopher Bunch's support page                                    75
Name: Group Name, dtype: int64
34965
145221
10793
18310


In [None]:
# What sorts of things would we want to know about this data?
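
As a starting point for that question, the sketch below computes three simple summaries on a toy DataFrame: posts per `Type`, the share of posts with no `Message`, and the median `Total Interactions` per group. The values in the toy frame are invented, `quick_summary` is a hypothetical helper, and the sketch assumes `Total Interactions` is already numeric, which may not hold for the real files.

```python
import pandas as pd

# Toy data; the group names, types, and counts are invented for illustration.
toy = pd.DataFrame(
    {
        "Group Name": ["A", "A", "B", "B", "B"],
        "Type": ["Photo", "Link", "Status", "Link", "Link"],
        "Message": ["hi", None, "hello", None, "post"],
        "Total Interactions": [10, 0, 5, 3, 2],
    }
)


def quick_summary(df):
    """Return a few first-pass descriptive statistics as a plain dict."""
    return {
        "posts_per_type": df["Type"].value_counts().to_dict(),
        "share_missing_message": df["Message"].isna().mean(),
        "median_interactions_by_group": (
            df.groupby("Group Name")["Total Interactions"].median().to_dict()
        ),
    }


summary = quick_summary(toy)
for key, value in summary.items():
    print(key, value)
```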