# Instagram Engagement Analysis - Part 2: Tags

### What do we want to answer?
- Which tag gets the most engagement in these publications?
    - Now we want to look only at tags
<br><br>
- The company also provided some guidance:
    - Ignore the Views column; we only want to understand likes, comments and interactions
    - Posts with an empty tag really have no tag (treat them as empty)

### Importing and Viewing the database

In [1]:
# Importing pandas
import pandas as pd
import numpy as np
# Displaying floats with a thousands separator and 2 decimal places
pd.options.display.float_format = '{:,.2f}'.format

In [7]:
# Reading the database excel file
base = pd.read_excel("Company_X_Database.xlsx")

In [8]:
# Deleting the "Views" column
base = base.drop("Views",axis=1)

In [9]:
# Viewing the first 5 lines again
base.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaigns,Carousel,Interactions
0,Photo,2021-09-11,2858,16,Store,N,N,,2874
1,Photo,2021-09-11,2930,28,Store/Products,N,N,,2958
2,Photo,2021-09-11,2807,9,Store,N,N,,2816
3,Vídeo,2021-09-12,5115,49,Products,N,N,,5164
4,Photo,2021-09-13,4392,45,Products,Y,N,,4437


In [10]:
base.Tags.head()

0             Store
1    Store/Products
2             Store
3          Products
4          Products
Name: Tags, dtype: object

In [11]:
# Grouping by tags
base.groupby("Tags")["Likes"].mean()

Tags
Commemorative dates              17,975.00
Commemorative dates/Promotions   29,084.00
Influencers                      15,197.29
New Products                     11,619.57
Products                          5,666.92
Promotions                       26,645.50
Store                             2,832.50
Store/Products                    2,930.00
Tips for use/New Products         5,703.50
Tips for use/Products             7,586.67
Trends                           22,400.67
Trends/Products                  12,894.00
Name: Likes, dtype: float64

### To be able to analyze the tags separately, we can split rows with 2 tags into 2 rows
- Let's use split to separate the tags into a list.
- To transform lists with 2 tags into 2 different lines, we'll use explode.

**The split command separates a text into a list based on some separator**

In [101]:
text = "let's study Data Analysis"

In [102]:
# With no arguments, split separates the text on whitespace
text.split()

["let's", 'study', 'Data', 'Analysis']

In [103]:
text = "let's-study-Data-Analysis"

In [104]:
# With no arguments it still splits on whitespace, so text joined by "-" stays as a single element
text.split()

["let's-study-Data-Analysis"]

In [105]:
# For any other delimiter, pass it as an argument
text.split("-")

["let's", 'study', 'Data', 'Analysis']

In [106]:
# Using this for our "Tags" column
# Turning the Tags column into a list of tags
base.Tags = base.Tags.str.split("/")
base.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaigns,Carousel,Interactions
0,Photo,2021-09-11,2858,16,[Store],N,N,,2874
1,Photo,2021-09-11,2930,28,"[Store, Products]",N,N,,2958
2,Photo,2021-09-11,2807,9,[Store],N,N,,2816
3,Vídeo,2021-09-12,5115,49,[Products],N,N,,5164
4,Photo,2021-09-13,4392,45,[Products],Y,N,,4437
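One detail worth checking before moving on is what `.str.split` does with posts that have no tag. The sketch below uses a small hypothetical Series (not the company data) and suggests that missing values simply stay missing instead of becoming lists.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column, only to illustrate the behavior
tags = pd.Series(["Store", "Store/Products", np.nan], name="Tags")

# .str.split applies split("/") to each string; missing values are left untouched
tag_lists = tags.str.split("/")

print(tag_lists.tolist())
```

This is why the untagged posts can still be found with `isnull()` later on.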


**explode turns each element of a list in a DataFrame column into its own row**

In [107]:
# Creating a dictionary
dic = {
    "A": [[1,2],3,[4,5,6],[]],
    "B": [1,2,3,4],
}

# Turning this dictionary into a DataFrame
base_dic = pd.DataFrame(dic)

base_dic

Unnamed: 0,A,B
0,"[1, 2]",1
1,3,2
2,"[4, 5, 6]",3
3,[],4


In [108]:
# Using explode to separate column A
base_dic = base_dic.explode('A')
base_dic

Unnamed: 0,A,B
0,1.0,1
0,2.0,1
1,3.0,2
2,4.0,3
2,5.0,3
2,6.0,3
3,,4


- Each element of a list gets its own row
- Values that are not lists are kept as they are
- Empty lists become 'NaN'
<br><br>
- The other columns repeat their values in every new row
- The index is repeated as well
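The repeated index can be seen directly. As a minimal sketch on the same toy dictionary, the code below compares the default behavior with the `ignore_index=True` option (available in pandas 1.1 or later), which numbers the new rows from 0.

```python
import pandas as pd

# Same toy data used above
toy = pd.DataFrame({"A": [[1, 2], 3, [4, 5, 6], []], "B": [1, 2, 3, 4]})

# Default: every new row keeps the index label of the row it came from
exploded = toy.explode("A")
print(exploded.index.tolist())

# ignore_index=True: the result gets a fresh 0..n-1 index
fresh = toy.explode("A", ignore_index=True)
print(fresh.index.tolist())
```

Keeping the repeated index, as this notebook does, has one advantage: it still tells us which rows came from the same original post.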

In [110]:
# Separating the "Tags" column into 1 line for each list element
base = base.explode('Tags')
base.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaigns,Carousel,Interactions
0,Photo,2021-09-11,2858,16,Store,N,N,,2874
1,Photo,2021-09-11,2930,28,Store,N,N,,2958
1,Photo,2021-09-11,2930,28,Products,N,N,,2958
2,Photo,2021-09-11,2807,9,Store,N,N,,2816
3,Vídeo,2021-09-12,5115,49,Products,N,N,,5164


### Doing the same analysis of the average per tag

**Important note: after explode, a post with 2 tags appears in 2 rows, so its values in the other columns are duplicated. Averages that do not involve tags (for example, by People) can no longer be calculated the same way as before**
<br><br>
- In the previous file:
![googlesheet_screen_shot(3).png](googlesheet_screen_shot(3).png)

In [111]:
# Repeating the average likes calculation by People, now on the exploded base
base.groupby("People")["Likes"].mean()

People
N    4,154.62
Y   14,100.57
Name: Likes, dtype: float64
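To see why the duplicated rows matter, here is a small sketch with hypothetical numbers (three invented posts, not the company data). It assumes the exploded base still carries the repeated index, so one row per original post can be recovered with `index.duplicated`.

```python
import pandas as pd

# Hypothetical posts: the second one has 2 tags
posts = pd.DataFrame({
    "Likes": [100, 300, 200],
    "People": ["N", "N", "Y"],
    "Tags": [["Store"], ["Store", "Products"], ["Products"]],
})

exploded = posts.explode("Tags")

# The 2-tag post is counted twice: N = (100 + 300 + 300) / 3
biased = exploded.groupby("People")["Likes"].mean()

# Keeping only the first row of each index label restores one row per post
one_per_post = exploded[~exploded.index.duplicated(keep="first")]

# Now N = (100 + 300) / 2
correct = one_per_post.groupby("People")["Likes"].mean()

print(biased)
print(correct)
```

A simpler option is to run the analyses that do not involve tags before calling explode.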

**Only the analyses that involve tags should be done on the base after this transformation**

In [116]:
# Average likes per tag
base.groupby("Tags")["Likes"].mean()

Tags
Commemorative dates   20,752.25
Influencers           15,197.29
New Products          10,304.89
Products               6,269.82
Promotions            27,458.33
Store                  2,865.00
Tips for use           6,833.40
Trends                20,024.00
Name: Likes, dtype: float64

In [117]:
# Sorting by likes
base.groupby("Tags")[["Likes","Comments"]].mean().sort_values("Likes",ascending=False)

Unnamed: 0_level_0,Likes,Comments
Tags,Unnamed: 1_level_1,Unnamed: 2_level_1
Promotions,27458.33,531.0
Commemorative dates,20752.25,343.5
Trends,20024.0,352.25
Influencers,15197.29,161.71
New Products,10304.89,198.56
Tips for use,6833.4,133.6
Products,6269.82,94.12
Store,2865.0,17.67


- **Promotional posts get the most engagement**
- **In addition to promotions, commemorative dates and trends also generate good engagement**

**And what about the untagged posts?**

In [119]:
# Filtering tagless values
base[base.Tags.isnull()]

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaigns,Carousel,Interactions
11,Photo,2021-10-12,17831,391,,Y,Y,,18222
19,Photo,2021-12-12,16086,268,,Y,Y,,16354
29,Photo,2022-01-19,8612,142,,Y,N,,8754
38,Photo,2022-02-15,17687,213,,Y,N,,17900
41,Photo,2022-02-22,12530,90,,Y,N,,12620
43,Photo,2022-03-04,24399,266,,Y,Y,,24665
49,Photo,2022-03-22,9087,106,,Y,Y,,9193
50,Photo,2022-03-26,16551,186,,Y,N,,16737


In [120]:
base.loc[base.Tags.isnull(),"Tags"]

11    NaN
19    NaN
29    NaN
38    NaN
41    NaN
43    NaN
49    NaN
50    NaN
Name: Tags, dtype: object

**As we did for Carousel, we could fill the missing tags with a placeholder such as "No_Tags", in which case these posts would appear in the groupby**

In [121]:
# Assigning the text "No_Tags" to the rows where the tag is NaN
base.loc[base.Tags.isnull(),"Tags"] = "No_Tags"

In [122]:
# Showing the likes by tag table again
base.groupby("Tags")[["Likes","Comments"]].mean().sort_values("Likes",ascending=False)

Unnamed: 0_level_0,Likes,Comments
Tags,Unnamed: 1_level_1,Unnamed: 2_level_1
Promotions,27458.33,531.0
Commemorative dates,20752.25,343.5
Trends,20024.0,352.25
No_Tags,15347.88,207.75
Influencers,15197.29,161.71
New Products,10304.89,198.56
Tips for use,6833.4,133.6
Products,6269.82,94.12
Store,2865.0,17.67
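An alternative to writing a placeholder is the `dropna=False` argument of `groupby` (pandas 1.1 or later), which keeps NaN as its own group without changing the data. The sketch below uses a hypothetical toy table, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two untagged posts
toy = pd.DataFrame({
    "Tags": ["Store", np.nan, "Store", np.nan],
    "Likes": [10, 40, 30, 60],
})

# Default behavior: rows with a missing tag are left out
default = toy.groupby("Tags")["Likes"].mean()

# dropna=False: the missing tag becomes a group of its own
with_missing = toy.groupby("Tags", dropna=False)["Likes"].mean()

print(default)
print(with_missing)
```

With this approach there is no need to set the values back to NaN afterwards.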


In [123]:
# As instructed, we set these tags back to NaN so that the rows are ignored
# (numpy was already imported as np in the first cell)
base.loc[base.Tags == 'No_Tags',"Tags"] = np.nan

In [124]:
# Checking the rows with null tags again
base[base.Tags.isnull()]

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaigns,Carousel,Interactions
11,Photo,2021-10-12,17831,391,,Y,Y,,18222
19,Photo,2021-12-12,16086,268,,Y,Y,,16354
29,Photo,2022-01-19,8612,142,,Y,N,,8754
38,Photo,2022-02-15,17687,213,,Y,N,,17900
41,Photo,2022-02-22,12530,90,,Y,N,,12620
43,Photo,2022-03-04,24399,266,,Y,Y,,24665
49,Photo,2022-03-22,9087,106,,Y,Y,,9193
50,Photo,2022-03-26,16551,186,,Y,N,,16737


In [125]:
# These rows are once again left out of the aggregation
base.groupby("Tags")[["Likes","Comments"]].mean().sort_values("Likes",ascending=False)

Unnamed: 0_level_0,Likes,Comments
Tags,Unnamed: 1_level_1,Unnamed: 2_level_1
Promotions,27458.33,531.0
Commemorative dates,20752.25,343.5
Trends,20024.0,352.25
Influencers,15197.29,161.71
New Products,10304.89,198.56
Tips for use,6833.4,133.6
Products,6269.82,94.12
Store,2865.0,17.67


**Analyzing tags with people and campaigns:**

In [127]:
# Average likes and comments by People and Tags
base.groupby(["People","Tags"])[["Likes","Comments"]].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Likes,Comments
People,Tags,Unnamed: 2_level_1,Unnamed: 3_level_1
N,New Products,5359.0,62.0
N,Products,4450.67,60.78
N,Store,2865.0,17.67
Y,Commemorative dates,20752.25,343.5
Y,Influencers,15197.29,161.71
Y,New Products,10923.12,215.62
Y,Products,8316.38,131.62
Y,Promotions,27458.33,531.0
Y,Tips for use,6833.4,133.6
Y,Trends,20024.0,352.25


In [128]:
# We can also sort by likes
base.groupby(["People","Tags"])[["Likes","Comments"]].mean().sort_values("Likes",ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,Likes,Comments
People,Tags,Unnamed: 2_level_1,Unnamed: 3_level_1
Y,Promotions,27458.33,531.0
Y,Commemorative dates,20752.25,343.5
Y,Trends,20024.0,352.25
Y,Influencers,15197.29,161.71
Y,New Products,10923.12,215.62
Y,Products,8316.38,131.62
Y,Tips for use,6833.4,133.6
N,New Products,5359.0,62.0
N,Products,4450.67,60.78
N,Store,2865.0,17.67


In [130]:
# Average likes and comments by Campaigns and Tags
base.groupby(["Campaigns","Tags"])[["Likes","Comments"]].mean().sort_values("Likes",ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,Likes,Comments
Campaigns,Tags,Unnamed: 2_level_1,Unnamed: 3_level_1
Y,Promotions,33217.5,490.5
Y,Trends,22400.67,386.67
Y,Commemorative dates,20752.25,343.5
Y,Influencers,18715.4,197.6
N,Promotions,15940.0,612.0
N,Trends,12894.0,249.0
Y,New Products,11040.67,323.0
N,New Products,9937.0,136.33
Y,Products,9074.0,67.5
N,Tips for use,6833.4,133.6


## Conclusions
- **Having people in the posts is fundamental for good engagement with the publication**
    - For every tag that appears both with and without people (Products and New Products), the average with people was much higher
- **Creating campaigns helps a lot in promoting the brand**
- **Promotions had by far the highest average likes of any tag (27,458, against 20,752 for Commemorative dates in second place)**
    - However, it is a tag that can cost the store money, so the return must be analyzed
- **Using content that is trending also helps to promote the brand, even if the trend is from other niches**
- **The best way to showcase products is through people using them, and if possible in special date campaigns**
- **For new products, the inclusion of people is even more critical: average likes roughly double when there is a face with the product (10,923 against 5,359)**
- **We can't say that the 'Store' tag is bad until we test it with people or in a campaign. It's worth testing to analyze the results.**
- **We will continue to monitor the posts to find new patterns, since the base still has few posts**
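
The comparison between posts with and without people can be checked by placing the two averages side by side. The sketch below copies the average likes from the People and Tags table for the two tags that appear in both groups; the `ratio` column is an illustrative name.

```python
import pandas as pd

# Average likes copied from the People/Tags table in this notebook
avg_likes = pd.Series({
    ("N", "New Products"): 5359.00,
    ("Y", "New Products"): 10923.12,
    ("N", "Products"): 4450.67,
    ("Y", "Products"): 8316.38,
})
avg_likes.index.names = ["People", "Tags"]

# One row per tag, one column per People value
side_by_side = avg_likes.unstack("People")

# How many times higher the average is when there are people in the post
side_by_side["ratio"] = side_by_side["Y"] / side_by_side["N"]

print(side_by_side.round(2))
```

On the real base, the same layout could be obtained by calling `unstack("People")` on the result of the groupby by People and Tags.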