## 1. Introduction

#### 1.1 Background Information

According to Deloitte Digital (2015), digital interactions were expected to influence 64 cents of every dollar spent in retail stores by the end of 2015, which suggests that social media has a growing, direct impact on companies' revenues. Combined with the acceleration of social media use during the pandemic, this makes it important for companies to understand which kinds of content engage consumers the most.

Therefore, we will compare the average ratio of Lifetime Post Consumptions to Lifetime Post Total Impressions across the different types of content (Status, Photo, Link, Video) and test whether there is a statistically significant difference between them. The idea is to see how effectively each type of content converts views into engagement.
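
To make the quantity of interest concrete, one way to write it down is given here. The symbols are notation introduced only for this sketch, and it assumes that "average ratio" means the mean of the per-post ratios rather than the ratio of the per-type totals. For post $i$, let $C_i$ be its Lifetime Post Consumptions and $I_i$ its Lifetime Post Total Impressions. For content type $t$, let $S_t$ be the set of posts of that type and $n_t$ the number of posts in it.

$$
r_i = \frac{C_i}{I_i}, \qquad \bar{r}_t = \frac{1}{n_t} \sum_{i \in S_t} r_i
$$

Under this reading, the comparison is between the four values $\bar{r}_t$ for $t \in \{\text{Status}, \text{Photo}, \text{Link}, \text{Video}\}$.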

#### 1.2 Dataset Description

// TODO: add brief description of dataset and explain what the variables of interest represent

## 2. Preliminary Results

#### 2.1 Reading the Data

In [45]:
# Load the required libraries
library(tidyverse)

# Read the data set directly from the web
# Original source: https://archive.ics.uci.edu/ml/datasets/Facebook+metrics
temp <- tempfile()
download.file("https://archive.ics.uci.edu/ml/machine-learning-databases/00368/Facebook_metrics.zip", temp)
data <- read_delim(unz(temp, "dataset_Facebook.csv"), delim=";")
unlink(temp)

Rows: 500 Columns: 19

── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr  (1): Type
dbl (18): Page total likes, Category, Post Month, Post Weekday, Post Hour, P...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



#### 2.2 Cleaning the Data

In [46]:
# First replace spaces in the column names with dots so they are valid R names
names(data) <- make.names(names(data), unique = TRUE)

# Select only the variables of interest and drop rows with a missing Type
data_selected <- data %>%
    select(Type, Lifetime.Post.Total.Impressions, Lifetime.Post.Consumptions) %>%
    filter(!is.na(Type))

# Assign shorter, more manageable column names
names(data_selected) <- c("Type", "LifetimeImpressions", "LifetimeConsumptions")

# Preview the clean dataset
head(data_selected)

Type,LifetimeImpressions,LifetimeConsumptions
<chr>,<dbl>,<dbl>
Photo,5091,159
Status,19057,1674
Photo,4373,154
Photo,87991,1119
Photo,13594,580
Status,20849,1389
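
The renaming step in the cell above relies on base R's `make.names`, which turns each column name into a syntactically valid R name by replacing spaces and other invalid characters with dots. A minimal sketch on the two raw column names selected above (the variable names `raw_names` and `clean_names` are only illustrative):

```r
# Two column names as they appear before cleaning (hypothetical vector name)
raw_names <- c("Lifetime Post Total Impressions", "Lifetime Post Consumptions")

# make.names() replaces characters that are not valid in R names with "."
# unique = TRUE additionally guarantees that no two resulting names collide
clean_names <- make.names(raw_names, unique = TRUE)

print(clean_names)
```

This is why the `select()` call refers to `Lifetime.Post.Total.Impressions` and `Lifetime.Post.Consumptions` rather than the original names with spaces.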


#### 2.3 Data Summary

In [47]:
# Summarise mean impressions, mean consumptions, and number of posts per content type
data_summary <- data_selected %>%
    group_by(Type) %>%
    summarise(MeanImpressions = mean(LifetimeImpressions),
              MeanConsumptions = mean(LifetimeConsumptions),
              Num = n())

data_summary

Type,MeanImpressions,MeanConsumptions,Num
<chr>,<dbl>,<dbl>,<int>
Link,28725.45,374.0909,22
Photo,28994.5,1299.0258,426
Status,24244.47,2838.8667,45
Video,102622.43,2600.1429,7
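
The summary above reports impressions and consumptions separately, while the comparison described in the introduction is about their ratio. The following is a minimal sketch of how a per-type mean ratio could be computed, not necessarily the method used later in the analysis. To stay self-contained it uses only the six rows previewed in Section 2.2 as toy input, so it covers just the Photo and Status types. The names `toy`, `Ratio`, `MeanRatio` and `ratio_summary` are hypothetical.

```r
library(dplyr)

# Toy input: the six rows shown by head(data_selected) in Section 2.2
toy <- tibble(
    Type = c("Photo", "Status", "Photo", "Photo", "Photo", "Status"),
    LifetimeImpressions = c(5091, 19057, 4373, 87991, 13594, 20849),
    LifetimeConsumptions = c(159, 1674, 154, 1119, 580, 1389)
)

# Compute the consumptions-to-impressions ratio for each post,
# then average that ratio within each content type
ratio_summary <- toy %>%
    mutate(Ratio = LifetimeConsumptions / LifetimeImpressions) %>%
    group_by(Type) %>%
    summarise(MeanRatio = mean(Ratio), Num = n())

ratio_summary
```

Replacing `toy` with `data_selected` would apply the same steps to the full data set, assuming every post has a non-zero number of impressions.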
