In [30]:
import pandas as pd
import warnings
import numpy as np

warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np

df_with_topics = pd.read_csv('Shortlisted_Influencer_Posts_With_topics.csv')

# Sort the DataFrame by comments_count in descending order
df_sorted = df_with_topics.sort_values(by='comments_count', ascending=False)

# Calculate the log of comments_count
df_sorted['log_comments_count'] = np.log(df_sorted['comments_count'])

# Calculate the quartiles based on the log of comments_count
quantile_criteria = 'log_comments_count'
q1 = df_sorted[quantile_criteria].quantile(0.25)
q2 = df_sorted[quantile_criteria].quantile(0.5)
q3 = df_sorted[quantile_criteria].quantile(0.75)

# Divide the DataFrame into highest and lowest quartiles
fourth_quartile = df_sorted[df_sorted[quantile_criteria] >= q3]
third_quartile = df_sorted[(df_sorted[quantile_criteria] < q3) & (df_sorted[quantile_criteria] >= q2)]
second_quartile = df_sorted[(df_sorted[quantile_criteria] < q2) & (df_sorted[quantile_criteria] >= q1)]
first_quartile = df_sorted[df_sorted[quantile_criteria] <= q1]

# Calculate the sum of comments and likes for each quartile
fourth_quartile_sum = fourth_quartile[['comments_count', 'likes_count']].sum()
third_quartile_sum = third_quartile[['comments_count', 'likes_count']].sum()
second_quartile_sum = second_quartile[['comments_count', 'likes_count']].sum()
first_quartile_sum = first_quartile[['comments_count', 'likes_count']].sum()

# Calculate the ratio of comments to likes for each quartile
fourth_quartile_sum['ratio_comments_to_likes'] = fourth_quartile_sum['comments_count'] / fourth_quartile_sum['likes_count']
third_quartile_sum['ratio_comments_to_likes'] = third_quartile_sum['comments_count'] / third_quartile_sum['likes_count']
second_quartile_sum['ratio_comments_to_likes'] = second_quartile_sum['comments_count'] / second_quartile_sum['likes_count']
first_quartile_sum['ratio_comments_to_likes'] = first_quartile_sum['comments_count'] / first_quartile_sum['likes_count']

# Calculate the average of each topic for each quartile
fourth_quartile_avg = fourth_quartile[['topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4']].mean()
third_quartile_avg = third_quartile[['topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4']].mean()
second_quartile_avg = second_quartile[['topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4']].mean()
first_quartile_avg = first_quartile[['topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4']].mean()

# Create a new DataFrame with the calculated values
summary_df = pd.DataFrame({
    'Quartile': ['Fourth', 'third', 'Second', 'First'],
    'Sum of Comments': [fourth_quartile_sum['comments_count'], third_quartile_sum['comments_count'], second_quartile_sum['comments_count'], first_quartile_sum['comments_count']],
    'Sum of Likes': [fourth_quartile_sum['likes_count'], third_quartile_sum['likes_count'], second_quartile_sum['likes_count'], first_quartile_sum['likes_count']],
    'Ratio_Comments_to_like': [fourth_quartile_sum['ratio_comments_to_likes'], third_quartile_sum['ratio_comments_to_likes'], second_quartile_sum['ratio_comments_to_likes'], first_quartile_sum['ratio_comments_to_likes']],
    'Avg topic_0': [fourth_quartile_avg['topic_0'], third_quartile_avg['topic_0'], second_quartile_avg['topic_0'], first_quartile_avg['topic_0']],
    'Avg topic_1': [fourth_quartile_avg['topic_1'], third_quartile_avg['topic_1'], second_quartile_avg['topic_1'], first_quartile_avg['topic_1']],
    'Avg topic_2': [fourth_quartile_avg['topic_2'], third_quartile_avg['topic_2'], second_quartile_avg['topic_2'], first_quartile_avg['topic_2']],
    'Avg topic_3': [fourth_quartile_avg['topic_3'], third_quartile_avg['topic_3'], second_quartile_avg['topic_3'], first_quartile_avg['topic_3']],
    'Avg topic_4': [fourth_quartile_avg['topic_4'], third_quartile_avg['topic_4'], second_quartile_avg['topic_4'], first_quartile_avg['topic_4']]
})

summary_df

# multiply last 6 columns by 100 and add % at column name
summary_df.iloc[:, 4:] = summary_df.iloc[:, 4:] * 100
summary_df.columns = ['Quartile', 'Sum of Comments', 'Sum of Likes', 'Ratio_Comments_to_like', 'Avg topic_0 %', 'Avg topic_1 %', 'Avg topic_2 %', 'Avg topic_3 %', 'Avg topic_4 %']
summary_df


# Create a new DataFrame with only the first and last row
comparison = summary_df.iloc[[0, -1]]

# add new row with all values nan
comparison.loc[2] = [np.nan] * len(comparison.columns)

# third row quartile = 'Difference'
comparison.iloc[2, 0] = 'Delta (Q4-Q1)'
# rest of the columns in thirs row shoul dbe difference of first two rows
comparison.iloc[2, 1:] = comparison.iloc[0, 1:] - comparison.iloc[1, 1:]

# remove index
comparison.reset_index(drop=True, inplace=True)
# dont show index 
print(comparison.to_string(index=False))

     Quartile  Sum of Comments  Sum of Likes  Ratio_Comments_to_like  Avg topic_0 %  Avg topic_1 %  Avg topic_2 %  Avg topic_3 %  Avg topic_4 %
       Fourth         134739.0    13685383.0                0.009845       8.197823      16.544106      31.498794      37.941396       5.817881
        First           8566.0     1241516.0                0.006900       3.939234      20.172735      22.190429      44.109626       9.587977
Delta (Q4-Q1)         126173.0    12443867.0                0.002946       4.258589      -3.628628       9.308365      -6.168230      -3.770096


### Understanding Quartile Analysis in Influencer Marketing

Influencer marketing relies heavily on understanding audience engagement and content resonance to drive successful campaigns. Quartile analysis provides a nuanced view of how influencers perform relative to each other based on key engagement metrics, shedding light on audience preferences and content strategies.

#### Importance of Engagement Metrics

Engagement metrics such as likes and comments are crucial indicators of audience interaction and interest in influencer content. Higher engagement suggests a more receptive audience and can lead to increased brand visibility, credibility, and potential conversion rates.

#### Interpreting Quartile Analysis

1. **Quartile 4 (Fourth)**:
   - This quartile represents influencers with the highest engagement metrics, including a substantial sum of likes (13,685,383) and comments (134,739).
   - Their content predominantly revolves around topics like Beauty and Glamour (37.94%) and Fashion and Elegance (31.50%), indicating a strong resonance with audiences interested in lifestyle, fashion trends, and beauty products.
   - Recommendations for brands targeting this quartile may include collaborations on beauty products, fashion brands, and lifestyle products. Leveraging influencer content that aligns with these topics can yield high engagement and brand visibility.

2. **Quartile 1 (First)**:
   - In contrast, influencers in this quartile have lower engagement metrics, with fewer likes (1,241,516) and comments (8,566).
   - Their content is more evenly distributed across topics, with a significant emphasis on Beauty and Glamour (44.11%) and Travel and Exploration (20.17%).
   - Recommendations for brands targeting this quartile may involve collaborations focused on beauty products, travel experiences, and lifestyle brands. While the engagement metrics are lower compared to Quartile 4, influencers in this quartile still offer opportunities to reach audiences interested in beauty and travel-related content.

#### Impact of Topic Distribution on Engagement

The analysis reveals how the distribution of topics correlates with engagement metrics:
- Influencers focusing on topics like Beauty and Glamour tend to attract higher engagement, regardless of quartile.
- Quartile 4 influencers, with a heavier emphasis on Fashion and Elegance, achieve high engagement, indicating a strong connection with their audience's interests.
- Quartile 1 influencers show a more diverse content mix, with a notable emphasis on Travel and Exploration alongside Beauty and Glamour, suggesting potential opportunities to tap into travel-related brand partnerships.

#### High-IQ Recommendations

1. **Audience Segmentation**: Tailor influencer collaborations based on audience preferences within each quartile. For Quartile 4, prioritize partnerships with beauty and fashion brands, while considering lifestyle and travel brands for Quartile 1.
  
2. **Content Alignment**: Ensure brand messaging aligns with the predominant topics of the influencer's content to maximize resonance and authenticity, leading to higher engagement and brand affinity.

3. **Data-Driven Decision Making**: Continuously monitor engagement metrics and audience feedback to refine influencer strategies and optimize campaign performance over time.

By leveraging quartile analysis insights and aligning influencer partnerships with audience interests, brands can unlock the full potential of influencer marketing to drive engagement, brand awareness, and ultimately, conversions.

*Note: The provided engagement metrics are based on the analyzed data and serve to illustrate the differences between quartiles.*

In [None]:
# show random photos of each topic