# Description of analysis
In this notebook, we examine post engagement and whether posting on specific days of the week or at specific times of day leads to a higher engagement rate. To do so, we first have to define a metric for "engagement".<br><br>
We face certain limitations and concerns when defining the engagement of a post on a particular date. We address these points below and come up with our own engagement metric.

# Limitations/concerns
Measuring engagement by likes comes with two limitations/threats to validity:<br>
<ol>
    <li>Likes on posts will rise with the number of followers. While the ideal metric would be <i>number of likes divided by number of followers at the time of posting</i>, Instagram does not allow us to retrieve historical follower counts.</li>
    <li>If we look at the absolute number of likes per day of the week, e.g. all likes gathered on posts made on Thursdays, the result may be biased: if Fukudon posts regularly on Thursdays, Thursdays may show a higher engagement rate simply because they have more posts.</li>
</ol>
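
The second concern can be illustrated with a small sketch. The data below is made up for illustration and is not taken from the Fukudon dataset; the `weekday` column and the `toy_posts` name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: three modest Thursday posts and one stronger Monday post.
toy_posts = pd.DataFrame({
    "weekday": ["Thursday", "Thursday", "Thursday", "Monday"],
    "Likes": [10, 10, 10, 20],
})

# Compare the total likes per weekday with the mean likes per post.
likes_by_weekday = toy_posts.groupby("weekday")["Likes"].agg(total="sum", mean="mean")
print(likes_by_weekday)
```

In this toy case Thursday has the larger total only because it has more posts, while the Monday post performs better on a per-post basis.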

# Mitigation
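We first fit a linear regression of likes against posting date across all posts, then compute each post's percentage variance from the regression line, i.e. how far its likes fall above or below the line in percentage terms. This is the engagement of that post. Because the fitted line follows the overall trend in likes over time, it stands in for the follower growth we cannot observe directly. We compute this for every post and then average the per-post engagement over each date, so that dates with more posts are not favoured.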
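
A minimal sketch of this step, on made-up data, might look as follows. It assumes the regression is an ordinary least-squares line fitted with `np.polyfit` on the number of days since the first post, and that the percentage is taken relative to the fitted value; both choices are assumptions, as are the names `toy_df`, `expected_likes` and `pct_variance`.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Hypothetical toy data in the same day-first format as the "Date" column.
toy_df = pd.DataFrame({
    "Date": ["1/1/2022", "8/1/2022", "15/1/2022", "22/1/2022"],
    "Likes": [10, 25, 20, 45],
})

# Express each post date as the number of days since the earliest post.
parsed = [datetime.strptime(s, "%d/%m/%Y").date() for s in toy_df["Date"]]
first = min(parsed)
days = np.array([(d - first).days for d in parsed], dtype=float)

# Fit a straight line: likes ~ slope * days + intercept.
slope, intercept = np.polyfit(days, toy_df["Likes"].to_numpy(dtype=float), deg=1)
toy_df["expected_likes"] = slope * days + intercept

# Percentage deviation of the actual likes from the fitted line
# (assumed denominator: the fitted value).
toy_df["pct_variance"] = (
    (toy_df["Likes"] - toy_df["expected_likes"]) / toy_df["expected_likes"] * 100
)
print(toy_df)
```

One caveat of this formulation is that the percentage becomes unstable if the fitted value is close to zero or negative, so it is worth checking `expected_likes` before relying on the result.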

# Formula for "Engagement"
Putting this together, we define engagement as follows:<h3><i>Engagement per day: (sum of the percentage variance of all posts on that day) divided by (number of posts on that day)</i></h3>
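
As a rough sketch, the formula amounts to a per-day mean of the per-post values. The example below uses made-up numbers and assumes a hypothetical `pct_variance` column that already holds the percentage variance of each post.

```python
import pandas as pd

# Hypothetical per-post percentage variances; two posts share the same date.
posts = pd.DataFrame({
    "Date": ["7/2/2022", "7/2/2022", "3/2/2022"],
    "pct_variance": [20.0, -10.0, 8.0],
})

# Sum of percentage variance and number of posts for each day.
engagement_per_day = posts.groupby("Date")["pct_variance"].agg(["sum", "count"])

# Engagement per day = sum of percentage variance / number of posts.
engagement_per_day["engagement"] = (
    engagement_per_day["sum"] / engagement_per_day["count"]
)
print(engagement_per_day)
```

Grouping on a weekday or hour-of-day column instead of `Date` would apply the same formula at that granularity.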


# Import relevant libraries and files

In [44]:
import pandas as pd
import numpy as np

from datetime import datetime, timedelta

In [45]:
instagram_df = pd.read_csv("../data/instagram_posts.csv")

In [46]:
instagram_df.head()

Unnamed: 0.1,Unnamed: 0,node.display_url,node.edge_media_to_caption.edges,node.owner.full_name,node.location.name,node.location.slug,hashtags,Likes,Date,Datetime
0,57,https://instagram.fsin5-1.fna.fbcdn.net/v/t51....,Let us bring you behind the scenes and show yo...,Fukudon | Donburi Specialist,Fukudon,fukudon,,29,7/2/2022,7/2/2022 15:20
1,56,https://instagram.fsin5-1.fna.fbcdn.net/v/t51....,"🐯新年快乐 恭喜发财🍊\n\nFrom all of us at Fúkudon, we w...",Fukudon | Donburi Specialist,,,,16,3/2/2022,3/2/2022 11:24
2,55,https://instagram.fsin5-1.fna.fbcdn.net/v/t51....,We’ve heard you! We are pleased to finally ann...,Fukudon | Donburi Specialist,Fukudon,fukudon,,18,28/1/2022,28/1/2022 18:04
3,54,https://instagram.fsin5-1.fna.fbcdn.net/v/t51....,🐯🧧 Let us usher in a Roarsome Lunar New Year t...,Fukudon | Donburi Specialist,,,,17,26/1/2022,26/1/2022 13:37
4,53,https://instagram.fsin5-1.fna.fbcdn.net/v/t51....,🧧Introducing to you our Roaring Combo this Lun...,Fukudon | Donburi Specialist,,,,47,23/1/2022,23/1/2022 16:01


In [47]:
instagram_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 10 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   Unnamed: 0                        58 non-null     int64 
 1   node.display_url                  55 non-null     object
 2   node.edge_media_to_caption.edges  58 non-null     object
 3   node.owner.full_name              55 non-null     object
 4   node.location.name                39 non-null     object
 5   node.location.slug                39 non-null     object
 6   hashtags                          9 non-null      object
 7   Likes                             58 non-null     int64 
 8   Date                              58 non-null     object
 9   Datetime                          58 non-null     object
dtypes: int64(2), object(8)
memory usage: 4.7+ KB


<h1> Archive: Look for alternative method </h1>

# Get series of dates from oldest to newest post

In [48]:
def get_date_from_string(date_string):
    """Parse a day-first date string such as '28/1/2022' into a datetime.date."""
    return datetime.strptime(date_string, "%d/%m/%Y").date()

instagram_df["date_obj"] = instagram_df["Date"].apply(get_date_from_string)

# Sort dataframe from oldest post to newest post
instagram_df = instagram_df.sort_values(by="date_obj")

In [49]:
# The dataframe is sorted by date, so the first and last entries are the oldest and newest posts
dates_of_posts = instagram_df["date_obj"].to_list()
oldest_post_date = dates_of_posts[0]
newest_post_date = dates_of_posts[-1]

In [60]:
list_of_dates = pd.date_range(oldest_post_date, newest_post_date, freq='d').tolist()
list_of_dates = [item.date() for item in list_of_dates]

In [61]:
# Sanity check
assert (list_of_dates[0] == oldest_post_date) and (list_of_dates[-1] == newest_post_date), "Oldest or newest date not correct"