# 🤳 1100 Instagram Users Datetime Posts Data

<div style="align:center">
    <img src="https://storage.googleapis.com/kaggle-datasets-images/1187749/1986717/ecb1c5df6c76935bc6f1ddea4b19b1ae/dataset-cover.png?t=2021-03-01-21-24-55">
</div>

<br>

## Introduction

This notebook works through the tasks proposed with the ***1100 Instagram Users Datetime Posts Data*** dataset, shared by [@vasileiosmpletsos](https://www.kaggle.com/vasileiosmpletsos). The targeted tasks are:
1. **Determine what time of day gets the most likes**
2. **Determine whether the likes on previous posts correlate with the likes on an upcoming post**
3. **Determine which months do better**


## Table of contents

- Load (packages, data)
- Analysis
- Conclusion

___
# ☁️ Load

In [None]:
# Load main packages
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # plotting handling
import seaborn as sns # plotting handling
import altair as alt # plotting handling
import time # timer

In [None]:
# Load data
data = pd.read_csv('../input/1100-instagram-users-datetime-posts-data/Instagram_Data.csv')

In [None]:
# Clean/Preprocessing
data['Date Posted'] = pd.to_datetime(data['Date Posted']) # Convert to datetime
data['Type'] = data['Type'].astype('category') # Change to category
data['Day Name'] = data['Date Posted'].dt.strftime("%A") # Get day name
data['Workday'] = ~data['Day Name'].isin(['Saturday', 'Sunday']) # True on Monday-Friday, False on weekends

In [None]:
data.head()
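
The cells below group by `Month` and `Hour`, which appear to come straight from the CSV. If those columns were missing, they could be derived from `Date Posted`. This is a minimal sketch on a tiny hypothetical sample (the `sample` frame and its values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Tiny hypothetical sample; the real notebook reads Instagram_Data.csv instead.
sample = pd.DataFrame({
    "Date Posted": ["2020-07-04 08:15:00", "2020-12-25 21:40:00", "2021-01-11 06:05:00"],
    "Likes": [120, 45, 300],
})
sample["Date Posted"] = pd.to_datetime(sample["Date Posted"])

# Calendar month (1-12) and hour of day (0-23) taken from the timestamp
sample["Month"] = sample["Date Posted"].dt.month
sample["Hour"] = sample["Date Posted"].dt.hour

# dayofweek is Monday=0 ... Sunday=6, so values below 5 are workdays
sample["Workday"] = sample["Date Posted"].dt.dayofweek < 5

print(sample[["Date Posted", "Month", "Hour", "Workday"]])
```

Using `dt.dayofweek` avoids comparing against day-name strings, which depend on the locale when produced with `strftime`.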

___
# 📊 Analysis

## 🤔 Determine which months do better

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

In [None]:
fig, ax = plt.subplots(4, 1, figsize=(20, 4*5))

# Each panel shows the mean (not the sum) of likes per month
sns.barplot(x=months, y=data.groupby('Month')['Likes'].mean(), ax=ax[0], color='#852852')
ax[0].title.set_text('Average likes by month')

sns.barplot(x=months, y=data.query("Type == 'Image'").groupby('Month')['Likes'].mean(), ax=ax[1], color="#802885")
ax[1].title.set_text('Average likes by month for Image')

sns.barplot(x=months, y=data.query("Type == 'Images'").groupby('Month')['Likes'].mean(), ax=ax[2], color="#632885")
ax[2].title.set_text('Average likes by month for Images')

sns.barplot(x=months, y=data.query("Type == 'Video'").groupby('Month')['Likes'].mean(), ax=ax[3], color="#302885")
ax[3].title.set_text('Average likes by month for Video')

plt.show()
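
The bar plots above pair the 12-item `months` list with the result of `groupby('Month')`, which only lines up if every month from 1 to 12 is present for that content type. One way to guard against a missing month is to reindex before plotting. The sketch below uses a hypothetical `sample` frame and assumes `Month` is stored as an integer from 1 to 12:

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Hypothetical sample with posts in only three months
sample = pd.DataFrame({"Month": [1, 1, 3, 12], "Likes": [10, 30, 50, 70]})

monthly_mean = (
    sample.groupby("Month")["Likes"].mean()
    .reindex(range(1, 13))  # one row per calendar month; missing months become NaN
)
monthly_mean.index = months  # safe now: exactly 12 rows, in calendar order

print(monthly_mean)
```

Months without posts show up as `NaN` and are simply drawn as empty bars, instead of shifting every later month one position to the left.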

In [None]:
def plot_agg(agg_name="Average", agg_function=np.mean, ci=75):
    palette = ["navy", "teal", "crimson"]
    cols = ['Image', 'Images', 'Video'] 
    STEP = 0.1
    
    # Plot lineplot
    plt.figure(figsize=(20, 9))
    
    for col, pal in zip(cols, palette):
        # Plot Mean - Max
        preframe = data[data["Type"] == col]
        frame = preframe.groupby('Month')['Likes'].apply(agg_function)
        sns.lineplot(data=preframe, x="Month", y="Likes", color=pal, estimator=agg_function, ci=ci, label=col)
        plt.plot(frame.argmax()+1, frame.max(), color=pal, markersize=10, marker='o')
        plt.text(frame.argmax()+1 + STEP, frame.max(), f'MAX({col})={round(frame.max())} | {months[frame.argmax()]}', color=pal, weight='bold')
    
    plt.legend(loc='lower right')
    plt.title(f'{agg_name} number of likes by Month and Type of content ({ci}% confidence interval)')
    plt.show()

In [None]:
def plot_agg_simple(agg_name="Average", agg_function=np.mean, ci=75):
    STEP = 0.1
    plt.figure(figsize=(20, 8))
    frame = data.groupby('Month')['Likes'].apply(agg_function)
    sns.lineplot(data=data, x="Month", y="Likes", estimator=agg_function, ci=ci, color='#A788B5')
    plt.plot(frame.argmax()+1, frame.max(), color='red', markersize=10, marker='o')
    plt.text(frame.argmax()+1 + STEP, frame.max(), f'MAX()={round(frame.max())} | {months[frame.argmax()]}', color='red', weight='bold')
    plt.plot(frame.argmin()+1, frame.min(), color='blue', markersize=10, marker='o')
    plt.text(frame.argmin()+1 + STEP, frame.min(), f'MIN()={round(frame.min())} | {months[frame.argmin()]}', color='blue', weight='bold')
    plt.title(f"{agg_name} number of likes by Month ({ci}% confidence interval)")
    plt.show()

In [None]:
plot_agg_simple(agg_name="Average", agg_function=np.mean, ci=75)
plot_agg_simple(agg_name="Median", agg_function=np.median, ci=75)
plot_agg_simple(agg_name="Sum", agg_function=np.sum, ci=75)

In [None]:
plot_agg(agg_name="Average", agg_function=np.mean)
plot_agg(agg_name="Median", agg_function=np.median)
plot_agg(agg_name="Sum", agg_function=np.sum)
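
The shaded bands in these line plots are the confidence intervals requested through `ci=75`, which seaborn estimates by bootstrapping. To make the idea concrete, here is a rough standalone sketch of a percentile bootstrap. The helper `bootstrap_ci` and the `likes` values are hypothetical, and the result will not match seaborn's bands exactly:

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, level=75, n_boot=1000, seed=0):
    """Percentile bootstrap interval for `stat` (illustrative helper, not seaborn's code)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Resample with replacement: one row per bootstrap replicate
    resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
    stats = stat(resamples, axis=1)
    # A 75% interval leaves 12.5% in each tail
    tail = (100 - level) / 2
    low, high = np.percentile(stats, [tail, 100 - tail])
    return low, high

likes = [120, 80, 300, 95, 150, 60, 210]  # made-up likes for one month
low, high = bootstrap_ci(likes)
print(f"mean={np.mean(likes):.1f}, 75% CI=({low:.1f}, {high:.1f})")
```

A narrow band means the monthly estimate is stable under resampling; a wide band, as is typical with few posts or a handful of viral ones, means the peak should be read with caution. Recent seaborn releases (0.12 and later) express the same request as `errorbar=("ci", 75)` and deprecate `ci`.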

**<ins>Answer</ins>**:
- **January**: This month has the largest **sum** of likes in the data.
- **February**:
- **March**:
- **April**: This month has the highest **average** number of likes for ***Image***.
- **May**: This month has the highest **median** number of likes, so a typical post has a better chance of doing well here than in any other month of the year, especially for ***Images***!
- **June**:
- **July**: Without separating by type of content, this is the month with the most likes on **average**, especially for ***Image***.
- **August**:
- **September**:
- **October**:
- **November**:
- **December**:

**<ins>What I would suggest</ins>**: In general, if you want to get the most likes based on the month, I would suggest posting your content during the following months: `[April, May, June, July]`. This window covers the start of the "***Summer Holidays***" in the Northern Hemisphere and the two months leading up to them.
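
The answer above reads the best month off three different aggregates, and they do not have to agree. A compact way to check this numerically is to build one table with all three and ask for the winning month per column. The sketch below runs on a hypothetical `sample` frame constructed so that each aggregate picks a different month:

```python
import pandas as pd

# Hypothetical data: many modest posts in January, steady posts in May,
# and one viral post in July
sample = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1, 5, 5, 5, 7, 7, 7],
    "Likes": [40, 40, 40, 40, 40, 40, 60, 70, 80, 10, 20, 200],
})

# One row per month, one column per aggregate
summary = sample.groupby("Month")["Likes"].agg(["mean", "median", "sum"])

# Month with the highest value for each aggregate
best_month = summary.idxmax()

print(summary)
print(best_month)
```

In this toy data the sum favours the month with the most posts, the mean is pulled up by a single viral post, and the median reflects the typical post, which is why the median is usually the safer guide for an individual poster.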

## 🤔 Determine what time each day you will end with more likes

In [None]:
plt.figure(figsize=(20,6))
sns.barplot(x='Hour', y='Likes', data=data, palette='Blues_d', capsize=.2)
plt.title("Average number of likes by hour")
plt.show()

> Based on the **average** number of likes across all types of content, posts published between 8:00 and 8:59 get the most likes.

In [None]:
for content_type, col in zip(data["Type"].unique(), ['icefire', 'magma', 'Blues_d']):
    plt.figure(figsize=(20,6))
    sns.barplot(x='Hour', y='Likes', data=data.query(f'Type == "{content_type}"'), hue='Workday', capsize=.2, palette=col)
    plt.title(f'Average number of likes of {content_type} by hour and workday')
    plt.show()

<ins>**Answer**</ins>:
- **Image**: The number of likes peaks between **6am** and **9am**. Between **10am** and **7pm** there are fewer likes, maybe because people are at school/work.
- **Images**: The number of likes peaks at **2am**, **7am** and **12am** on workdays, and at **5am**, **8am** and **9am** on weekends!
- **Video**: The number of likes peaks at **8am** on workdays, maybe before people go to work/school? At other hours the likes are more evenly distributed.
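
The peaks listed above were read off the bar charts. They could also be extracted programmatically by averaging likes per `(Type, Workday, Hour)` and keeping the best hour in each group. This is a sketch on a hypothetical `sample` frame; it also reports how many posts each peak rests on:

```python
import pandas as pd

# Hypothetical posts; column names follow the notebook's dataframe
sample = pd.DataFrame({
    "Type":    ["Image", "Image", "Image", "Video", "Video", "Video"],
    "Workday": [True, True, False, True, True, True],
    "Hour":    [8, 8, 9, 8, 20, 20],
    "Likes":   [100, 300, 50, 400, 10, 30],
})

# Mean likes and number of posts for every (Type, Workday, Hour) slot
hourly = (
    sample.groupby(["Type", "Workday", "Hour"])["Likes"]
    .agg(["mean", "count"])
    .reset_index()
)

# Row with the highest mean inside each (Type, Workday) group
peak_rows = hourly.loc[hourly.groupby(["Type", "Workday"])["mean"].idxmax()]

print(peak_rows)
```

The `count` column matters when interpreting early-morning peaks: an hour with very few posts can top the chart because of a single popular post.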

## 🤔 Determine which day you will end with more likes

In [None]:
plt.figure(figsize=(20,10))
sns.pointplot(x='Hour', y='Likes', data=data[data["Type"] == "Image"], hue='Day Name', palette='tab10', estimator=np.mean, dodge=True, ci=0)
plt.show()

Based on the data, the highest peaks in likes for **Image** are:
1. Thursday at 8am
2. Saturday at 9am
3. Monday at 10am
4. Wednesday at 9am
5. Friday at 9am

In [None]:
plt.figure(figsize=(20,10))
sns.pointplot(x='Hour', y='Likes', data=data[data["Type"] == "Images"], hue='Day Name', palette='tab10', estimator=np.mean, dodge=True, ci=0)
plt.show()

Based on the data, the highest peaks in likes for **Images** are:
1. Wednesday at 9am
2. Friday at 5am
3. Friday at 8am
4. Tuesday at 4am
5. Monday at 8am

In [None]:
plt.figure(figsize=(20,10))
sns.pointplot(x='Hour', y='Likes', data=data[data["Type"] == "Video"], hue='Day Name', palette='tab10', estimator=np.mean, dodge=True, ci=0)
plt.show()

Based on the data, the highest peaks in likes for **Video** are:
1. Sunday at 8am
2. Friday at 6, 7 and 8am
3. Thursday at 8am
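
The day-and-hour rankings in this section were read from the point plots. An equivalent table can be produced by averaging likes per `(Type, Day Name, Hour)` slot and keeping the top few slots for each type. The sketch below uses a hypothetical `sample` frame, and the cut-off of three slots is an arbitrary choice:

```python
import pandas as pd

# Hypothetical video posts; column names follow the notebook's dataframe
sample = pd.DataFrame({
    "Type":     ["Video"] * 5,
    "Day Name": ["Sunday", "Sunday", "Friday", "Friday", "Thursday"],
    "Hour":     [8, 8, 7, 8, 8],
    "Likes":    [500, 300, 350, 200, 250],
})

# Mean likes for every (Type, Day Name, Hour) slot
slot_mean = (
    sample.groupby(["Type", "Day Name", "Hour"])["Likes"].mean().reset_index()
)

# Sort once by mean likes, then keep the first 3 slots per content type
top_slots = (
    slot_mean.sort_values("Likes", ascending=False)
    .groupby("Type")
    .head(3)
)

print(top_slots)
```

Running the same steps on the full `data` frame would give one ranked list per content type, which makes the three lists above reproducible without reading values off a chart.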

___
# 📑 Conclusion

Based on the different data visualizations, a user can expect the most likes when posting during the summer holidays or in the one to two months before them, especially between 7am and 9am!