In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
content = pd.read_csv('c:/Users/nathb/Downloads/Accenture/Content.csv')
Reactions = pd.read_csv('c:/Users/nathb/Downloads/Accenture/Reactions.csv')
Reactiontypes = pd.read_csv('c:/Users/nathb/Downloads/Accenture/ReactionTypes.csv')

# Cleaning and Polishing Content Dataset

In [None]:
content.head()

In [None]:
content.shape

# Shape, Null_values and Duplicates 

Null/ NaN:
In pandas, the isnull().sum() combination of methods is used to count the number of missing (null or NaN) values in each column of a DataFrame. It's a convenient way to quickly assess the amount of missing data in your dataset.

isnull(): This method is used to create a Boolean mask that identifies missing values in a DataFrame. It returns a DataFrame of the same shape as the original, where each element is True if the corresponding element in the original DataFrame is missing (NaN), and False otherwise.

sum(): After applying isnull(), you can use the sum() method on the resulting Boolean DataFrame. This method sums the True values (which represent missing values) along each column, effectively counting the number of missing values in each column.

Datatypes: 
The dtypes attribute in pandas is used to determine the data type of each column in a DataFrame. It provides information about the data type of the values stored in each column, which is important for understanding and working with your data, especially during the Exploratory Data Analysis (EDA) phase.

Duplicates:

The .duplicated().sum() combination of methods in pandas is used to count the number of duplicate rows in a DataFrame. It allows you to identify and quantify duplicate rows based on the values in all columns or a specific subset of columns.

Here's how it works:

.duplicated(): This method returns a Boolean Series that indicates whether each row is a duplicate of a previous row. It marks duplicate rows as True and non-duplicate rows as False.

.sum(): After applying .duplicated(), you can use the .sum() method on the resulting Boolean Series to count the number of True values, which correspond to duplicate rows.
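
As a quick illustration of the three checks described above, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are hypothetical and are not taken from the Accenture files.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one fully duplicated row
toy = pd.DataFrame({
    "Content ID": ["a1", "b2", "b2", "c3"],
    "Category": ["food", "science", "science", np.nan],
})

null_counts = toy.isnull().sum()         # missing values per column
column_types = toy.dtypes                # data type of each column
duplicate_rows = toy.duplicated().sum()  # rows identical to an earlier row

print(null_counts)
print(column_types)
print(duplicate_rows)
```

Here `null_counts` reports one missing value in `Category`, and `duplicate_rows` is 1 because the third row repeats the second.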


In [None]:
# Review the nulls, data types and duplicates for the content df

print(content.isnull().sum())
print(content.dtypes)
print(content.duplicated().sum())


In [None]:
content = content.drop(columns=['Unnamed: 0','URL'])


In [None]:
content.head()

In [None]:
content = content.rename(columns={'Type':'Content Type'})

In [None]:
content['Category'] = content['Category'].str.replace('"','').str.capitalize()

In [None]:
content.head(10)

# Cleaning and Polishing Reactions Dataset

In [None]:
Reactions.head(10)

In [None]:
print(Reactions.isnull().sum())
print(Reactions.dtypes)
print(Reactions.duplicated().sum())

In [None]:
#Remove unwanted columns
Reactions = Reactions.drop(columns=['Unnamed: 0', 'User ID'])

#Rename columns for clarity
Reactions =  Reactions.rename(columns={'Type' : 'Reaction Type'})

In [None]:
Reactions.head(10)

# Dropping Null Values
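Since there are null values in the Reactions dataframe, we remove the rows that contain them with *dropna()*.
Note that dropna() returns a new DataFrame and leaves the original unchanged, so the result has to be assigned back to Reactions to take effect.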
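
The sketch below shows, on a hypothetical three-row frame, why calling dropna() on its own is not enough: the original object only changes once the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing reaction type
toy = pd.DataFrame({
    "Content ID": ["a1", "b2", "c3"],
    "Reaction Type": ["like", np.nan, "love"],
})

toy.dropna()            # returns a cleaned copy; `toy` itself is unchanged
rows_before = len(toy)

toy = toy.dropna()      # reassign to keep the cleaned result
rows_after = len(toy)

print(rows_before, rows_after)
```

This is why the notebook first previews `Reactions.dropna()` and then repeats the call with an assignment.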

In [None]:
Reactions.dropna()

In [None]:
print(Reactions)

In [None]:
Reactions=Reactions.dropna()

In [None]:
Reactions.head()

# Cleaning and Polishing ReactionTypes Dataset

In [None]:
Reactiontypes.head()

In [None]:
print(Reactiontypes.shape)
print(Reactiontypes.isnull().sum())
print(Reactiontypes.dtypes)
print(Reactiontypes.duplicated().sum())

In [None]:
#Removing the unwanted columns
Reactiontypes = Reactiontypes.drop(columns=['Unnamed: 0'])

#renaming column names for clarification
Reactiontypes = Reactiontypes.rename(columns={'Type': 'Reaction Type'})

In [None]:
Reactiontypes.head()

# Merging the three dataframes into one
The on parameter of pd.merge() specifies the column or columns on which two DataFrames are merged.

Here's what it signifies:

on: The column name, or list of column names, that acts as the key shared by both DataFrames. pandas matches rows whose values in that column are equal and combines them into a single row of the result.

For example: one_df = pd.merge(content, Reactions, on='Content ID')

This merges the content and Reactions DataFrames on the 'Content ID' column: rows with the same 'Content ID' value in both DataFrames are combined into a single row of one_df. Because pd.merge() performs an inner join by default, rows whose 'Content ID' appears in only one of the two DataFrames are dropped.
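
To make the matching behaviour concrete, here is a small sketch with two made-up frames. The names `toy_content` and `toy_reactions` and all their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames that share a 'Content ID' key
toy_content = pd.DataFrame({
    "Content ID": ["a1", "b2", "c3"],
    "Category": ["Food", "Science", "Animals"],
})
toy_reactions = pd.DataFrame({
    "Content ID": ["a1", "a1", "b2", "z9"],
    "Reaction Type": ["like", "love", "hate", "like"],
})

# Default how='inner': only keys present in both frames survive
merged = pd.merge(toy_content, toy_reactions, on="Content ID")
print(merged)
```

Content `a1` appears twice in the result because it has two reactions, while `c3` (no reactions) and `z9` (no content row) are dropped. Passing `how='left'` or `how='outer'` would keep such unmatched rows instead.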

In [None]:
one_df= pd.merge(content, Reactions, on='Content ID')
Merged_df= pd.merge(one_df, Reactiontypes, on='Reaction Type')


In [None]:
Merged_df.head()

# Now making a new dataframe to store the top categories with scores

The next cell performs a grouping and aggregation operation using the groupby() method in pandas. Let's break down the code step by step:

category_score = Merged_df.groupby('Category')['Score'].sum()
Merged_df: This is the merged DataFrame created in the previous step.

.groupby('Category'): This part of the code groups the rows in the DataFrame Merged_df based on the values in the 'Category' column. In simple terms, it's creating groups where each group corresponds to a unique category in the 'Category' column.

['Score']: After grouping by 'Category', this part specifies that you want to work with the 'Score' column for the aggregation. In other words, you are interested in summing the 'Score' values within each category group.

.sum(): Finally, you are applying the sum() function to each group of 'Score' values within each category. This will calculate the sum of scores for each category.

The result, category_score, is a pandas Series that holds the total score for each unique category in the 'Category' column. Each category is used as an index label, and the corresponding value is the sum of scores for that category.
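
The same group-and-sum pattern can be tried on a tiny made-up frame. The values below are hypothetical and only meant to show the shape of the result.

```python
import pandas as pd

# Hypothetical toy data: two rows belong to the 'Food' category
toy = pd.DataFrame({
    "Category": ["Food", "Science", "Food", "Animals"],
    "Score": [10, 5, 20, 7],
})

# One total per category; the category names become the index
toy_score = toy.groupby("Category")["Score"].sum()

# Highest totals first, keeping the top two
toy_top = toy_score.sort_values(ascending=False).head(2)

print(toy_score)
print(toy_top)
```

Sorting the Series and taking the first few entries is the same approach the notebook uses to pick the top five categories.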

In [None]:
category_score = Merged_df.groupby('Category')['Score'].sum()
print(category_score)

**Sorting the top five scores in descending order**

In [None]:
top_categories = category_score.sort_values(ascending=False)[:5]
print(top_categories)

In [None]:
Top_five = pd.DataFrame({'Category':top_categories.index, 'Score':top_categories.values})
print(Top_five)

# Converting the dataframes to xlsx files

In [None]:
%pip install openpyxl

In [None]:
with pd.ExcelWriter('data_final.xlsx') as writer:
    Merged_df.to_excel(writer, sheet_name='Merged Datasets')
    Top_five.to_excel(writer, sheet_name='Top five Categories')

In [None]:
# Specify the file path where you want to save the CSV file
file_path = 'c:/Users/nathb/Downloads/Accenture/Merged_df.csv'

# Use the to_csv() method to save the DataFrame as a CSV file
Merged_df.to_csv(file_path, index=False)  # Set index=False to exclude the index column

print(f"DataFrame has been saved as '{file_path}'.")

In [None]:
# Specify the file path where you want to save the CSV file
file_path = 'Top_five.csv'

# Use the to_csv() method to save the DataFrame as a CSV file
Top_five.to_csv(file_path, index=False)  # Set index=False to exclude the index column

print(f"DataFrame has been saved as '{file_path}'.")

In [None]:
Merged_df.head(10)

In [None]:
Merged_df= Merged_df.drop(columns=['User ID'])

In [None]:
Merged_df

In [None]:
Merged_df.isnull().sum()

In [None]:
Merged_df.duplicated().sum()

In [None]:
Merged_df.shape

In [None]:
Merged_df = Merged_df.dropna()  # assign the result; dropna() does not modify in place

In [None]:
with pd.ExcelWriter('finaldataset.xlsx') as writer:
    Merged_df.to_excel(writer, sheet_name='Merged Datasets')
    Top_five.to_excel(writer, sheet_name='Top five Categories')

# Visualization and Analysis

In [None]:
sorted_categories = category_score.sort_values(ascending=False)
#colors = ['tomato', 'darkorange', 'gold', 'green', 'darkturquoise','mediumorchid']
fig, ax = plt.subplots(figsize=(15,6))

ax.bar(sorted_categories.index, sorted_categories)
ax.set_title('Category Scores')
ax.set_xlabel('Categories')
ax.set_ylabel('Score')
plt.xticks(rotation=45)
plt.show()

# Average Scores at different hours of the day. 

# Line Chart with highlighted peak

In [None]:
# Convert the "Datetime" column to a datetime object
Merged_df['Datetime'] = pd.to_datetime(Merged_df['Datetime'], format='%Y-%m-%d %H:%M:%S')

# average score per hour
hourly_scores = Merged_df.groupby(Merged_df['Datetime'].dt.hour)['Score'].mean()

# Find peak hours
peak_hours = hourly_scores[hourly_scores > hourly_scores.mean()].index

#average Scores
avg_score = hourly_scores.mean()




In [None]:
print (peak_hours)

In [None]:
top_peak_hours = hourly_scores.nlargest(3)

# Print the top three peak engagement hours
print("Top Three Peak Engagement Hours:")
for hour, score in top_peak_hours.items():
    print(f"Hour: {hour}, Average Score: {score:.2f}")

In [None]:
import plotly.graph_objects as go

# Uses 'hourly_scores', 'peak_hours', and 'avg_score' calculated in the cells above

# Create a list of hour labels for the x-axis
hour_labels = [f"{h}:00" for h in range(24)]

# Create a scatter trace for the peak hours
peak_trace = go.Scatter(
    x=peak_hours,
    y=[hourly_scores[hour] for hour in peak_hours],
    mode='markers',
    marker=dict(size=10, color='red'),
    text=[f"Hour: {hour}<br>Average Score: {hourly_scores[hour]:.2f}" for hour in peak_hours],
    name='Peak Hours'
)

# Create a line trace for the average score
line_trace = go.Scatter(
    x=hourly_scores.index,
    y=hourly_scores,
    mode='lines',
    line=dict(color='blue'),
    name='Average Score'
)

# Create a dashed line for the average score
avg_trace = go.Scatter(
    x=hourly_scores.index,
    y=[avg_score] * len(hourly_scores),
    mode='lines',
    line=dict(color='green', dash='dash'),
    name='Average Score (Mean)'
)

# Create the layout for the plot
layout = go.Layout(
    title='Average Scores by Hour of Day',
    xaxis=dict(
        title='Hour of Day',
        tickvals=list(range(24)),
        ticktext=hour_labels,
        tickangle=45
    ),
    yaxis=dict(title='Average Score')
)

# Create the figure and add traces
fig = go.Figure(data=[peak_trace, line_trace, avg_trace], layout=layout)

# Show the plot
fig.show()


# Average Score per day of the week

In [None]:
#average scores per day of the week
daily_scores = Merged_df.groupby(Merged_df['Datetime'].dt.dayofweek)['Score'].mean()

#peak days
peak_days= daily_scores[daily_scores>daily_scores.mean()].index

#average Scores
avg_score = daily_scores.mean()

#create a list for the day labels i.e. x-axis

day_labels = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

#create a scatter trace for peak days

peak_trace = go.Scatter(
    x= peak_days,
    y= [daily_scores[day] for day in peak_days],
    mode='markers',
    marker=dict(size=10, color='red'),
    text= [f"Day: {day_labels[day]}<br>Average Score: {daily_scores[day]:.2f}" for day in peak_days],
    name= "Peak Days"
)

#Create a line trace for the average scores
line_trace = go.Scatter(
    x= daily_scores.index,
    y= daily_scores,
    mode= 'lines',
    line= dict(color='blue'),
    name= 'Average Score'
)

# Create a dashed line for the average score
avg_trace = go.Scatter(
    x=daily_scores.index,
    y=[avg_score] * len(daily_scores),
    mode='lines',
    line=dict(color='green', dash='dash'),
    name='Average Score (Mean)'
)

# Create the layout for the plot
layout = go.Layout(
    title='Average Scores by Day of the week',
    xaxis=dict(
        title='Day of the week',
        tickvals=list(range(7)),
        ticktext=day_labels,
        tickangle=45
    ),
    yaxis=dict(title='Average Score')
)

# Create the figure and add traces
fig = go.Figure(data=[peak_trace, line_trace, avg_trace], layout=layout)

# Show the plot
fig.show()



In [None]:
# Find the top 3 peak days
top_peak_days = daily_scores.nlargest(3)

In [None]:
print("Top Three Peak Engagement Days:")
for day in top_peak_days.index:
    print(f"Day: {day_labels[day]}, Average Score: {top_peak_days[day]:.2f}")

# Content Type Percentages using pie chart

In [None]:
import plotly.express as px

#Count the number of content types
type_counts = Merged_df.groupby('Content Type').size()

# Create an interactive pie chart using Plotly Express
fig = px.pie(
    names=type_counts.index, 
    values=type_counts.values,
    title='Content Type Percentages', 
    hole=0.6,
    labels={'names': 'Content Type'}
)

# Customize the layout to adjust size (width and height)
fig.update_layout(
    showlegend=True,
    width=600,  # Adjust the width as needed
    height=400  # Adjust the height as needed
)

# Customize textinfo using update_traces
fig.update_traces(textinfo='percent+label')

# Show the plot
fig.show()

In [None]:
type_counts.sort_values(ascending=False)

# Number of Content Items by Categories

In [None]:
categories = Merged_df['Category'].nunique()
categories

In [None]:
category_counts= Merged_df['Category'].value_counts()
category_counts

In [None]:
# Create an interactive bar plot using Plotly Express
fig = px.bar(
    x=category_counts.index,  # Use the index as x-values (category names)
    y=category_counts.values,  # Use values as y-values (count values)
    title='Number of Content Items by Category',
    labels={'x': 'Category', 'y': 'Number of Content Items'},
    text=category_counts.values,  # Display count values on bars
)

# Customize the layout (optional)
fig.update_layout(
    xaxis_title='Category',
    yaxis_title='Number of Content Items',
    xaxis=dict(tickangle=45),
    showlegend=False
)

# Show the plot
fig.show()

In [None]:
Merged_df['Month'] = Merged_df['Datetime'].dt.month

In [None]:
import calendar
#month with the most posts
month_with_most_posts = Merged_df['Month'].value_counts().idxmax()

#converting the output which will be an integer to the name of the month.
month_name = calendar.month_name[month_with_most_posts]
print(f"The month with the most posts is: {month_name}")


In [None]:
monthly_posts = Merged_df.groupby('Month')['Content ID'].count()

# Calculate the average posts per month
avg_posts = monthly_posts.mean()

# Define month labels
month_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Create a line chart using Plotly
fig = go.Figure()

# Add a trace for the monthly posts
fig.add_trace(go.Scatter(
    x=monthly_posts.index,
    y=monthly_posts,
    mode='lines',
    line=dict(color='blue'),
    name='Monthly Posts'
))

# Add a horizontal dashed line for the average posts
fig.add_shape(
    type='line',
    x0=monthly_posts.index[0],
    x1=monthly_posts.index[-1],
    y0=avg_posts,
    y1=avg_posts,
    line=dict(color='red', dash='dash'),
    name='Average Posts'
)

# Find and highlight peak months
peak_months = monthly_posts[monthly_posts == monthly_posts.max()].index.tolist()
fig.add_trace(go.Scatter(
    x=peak_months,
    y=[monthly_posts[i] for i in peak_months],
    mode='markers',
    marker=dict(color='red', size=10),
    name='Peak Activity'
))

# Customize the layout
fig.update_layout(
    xaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=month_labels
    ),
    yaxis=dict(title='Count of Posts'),
    title='Number of Posts per Month of the Year',
    showlegend=True
    
)

# Show the plot
fig.show()

In [None]:
top_peak_months = monthly_posts.nlargest(3)

In [None]:
# Print the top 3 peak months
print("Top Three Peak Months:")
for month in top_peak_months.index:
    print(f"Month: {calendar.month_name[month]}, Total Posts: {top_peak_months[month]}")

In [None]:
Merged_df.head()


In [None]:
# Extract the year from the 'Datetime' column
Merged_df['Year'] = Merged_df['Datetime'].dt.year

# Count the rows of Merged_df that fall in each year
contents_per_year = Merged_df['Year'].value_counts().sort_index()

# Print the total for each year found in the data
for year, total in contents_per_year.items():
    print(f"Total contents uploaded in {year}: {total}")


# Insights
1. The dataset covers content across 16 different categories.
2. The top 5 content categories by attention and engagement are Animals, Science, Healthy Eating, Technology, and Food.
3. With a count of 1897 content items, the Animals category leads, showing the strong affinity our audience has for content about the natural world and the animal kingdom.
4. The content comes in four types: photos, videos, audios, and gifs. Photos are the most common at 26.8% of the total content volume, followed by videos at 25.4%, gifs at 24.7%, and audios at 23%.
5. May is the month with the highest volume of posts.
6. Thursdays, Fridays, and Sundays are the peak engagement days.
7. The peak engagement hours are 15:00, 17:00, and 21:00, which makes them the best windows for publishing content and interacting with users.
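
Percentage shares like those quoted for the content types can be obtained directly from pandas. The following is a minimal sketch on hypothetical data, not the notebook's actual figures.

```python
import pandas as pd

# Hypothetical toy data: eight rows spread over four content types
toy = pd.DataFrame({
    "Content Type": ["photo", "video", "photo", "gif",
                     "audio", "photo", "video", "gif"],
})

# normalize=True returns fractions of the total; scale to percentages
share = toy["Content Type"].value_counts(normalize=True).mul(100).round(1)
print(share)
```

With `normalize=True` the counts are divided by the total number of rows, so the shares add up to 100% apart from rounding.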

In [None]:
import plotly.express as px

# Top five categories and their counts, hard-coded from the results above
categories = ['Animals', 'Science', 'Healthy Eating', 'Technology', 'Food']
content_counts = [1897, 1796, 1717, 1698, 1699]

# Create a DataFrame from the data
data = {'Category': categories, 'Content Count': content_counts}
df = pd.DataFrame(data)

# Define a custom color palette
custom_colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']  # You can specify your own colors here

# Create an interactive pie chart with custom colors
fig = px.pie(df, names='Category', values='Content Count', title='Top 5 Content Categories', hole=0.6, color_discrete_sequence=custom_colors)
fig.update_traces(textinfo='percent+label')

# Show the chart
fig.show()


In [None]:
Merged_df.head(1000)

In [None]:
print(Merged_df['Sentiment'].unique())

In [None]:
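# Store the numeric score in a new column so that the text labels in
# 'Sentiment' remain available for the grouping, filters and charts below
sentiment_mapping = {'negative': -1, 'positive': 1, 'neutral': 0}
Merged_df['Sentiment Score'] = Merged_df['Sentiment'].map(sentiment_mapping)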

In [None]:
Merged_df.head()

In [None]:
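# sentiment_counts is only defined in the next cell, so count the labels directly
print(Merged_df['Sentiment'].value_counts())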

In [None]:
# Group the data by 'Category' and 'Sentiment' and count occurrences
sentiment_counts = Merged_df.groupby(['Category', 'Sentiment']).size().reset_index(name='Count')

# Create a custom color map for sentiment categories
color_map = {'positive': '#90EE90', 'negative': '#FF0000', 'neutral': '#ADD8E6'}

# Create a stacked bar chart using Plotly Express
fig = px.bar(sentiment_counts, x='Category', y='Count', color='Sentiment',
             title='Sentiment Distribution by Content Category',
             color_discrete_map=color_map)

# Customize the layout (optional)
fig.update_xaxes(title='Content Category')
fig.update_yaxes(title='Count')
fig.update_traces(marker_line_width=0, opacity=0.8)  # Customize appearance

# Show the interactive plot
fig.show()


In [None]:
# Group the sentiment_counts DataFrame by 'Category' and calculate the total count
total_counts_per_category = sentiment_counts.groupby('Category')['Count'].sum().reset_index()

# Print the total counts per category
print(total_counts_per_category)



In [None]:
import pandas as pd

# Filter the sentiment_counts DataFrame for positive sentiment
positive_sentiment_counts = sentiment_counts[sentiment_counts['Sentiment'] == 'positive']
positive_sentiment_counts = positive_sentiment_counts[['Category', 'Count']]
positive_sentiment_counts = positive_sentiment_counts.rename(columns={'Count': 'Positive Count'})

# Filter the sentiment_counts DataFrame for neutral sentiment
neutral_sentiment_counts = sentiment_counts[sentiment_counts['Sentiment'] == 'neutral']
neutral_sentiment_counts = neutral_sentiment_counts[['Category', 'Count']]
neutral_sentiment_counts = neutral_sentiment_counts.rename(columns={'Count': 'Neutral Count'})

# Filter the sentiment_counts DataFrame for negative sentiment
negative_sentiment_counts = sentiment_counts[sentiment_counts['Sentiment'] == 'negative']
negative_sentiment_counts = negative_sentiment_counts[['Category', 'Count']]
negative_sentiment_counts = negative_sentiment_counts.rename(columns={'Count': 'Negative Count'})

# Merge the three DataFrames based on the 'Category' column
combined_sentiment_counts = pd.merge(positive_sentiment_counts, neutral_sentiment_counts, on='Category', how='outer')
combined_sentiment_counts = pd.merge(combined_sentiment_counts, negative_sentiment_counts, on='Category', how='outer')

# Fill NaN values with 0
combined_sentiment_counts = combined_sentiment_counts.fillna(0)

# Print the combined sentiment counts table
print("Combined Sentiment Counts:")
print(combined_sentiment_counts)


In [None]:

# Group the data by 'Content Type' and 'Sentiment' and count occurrences
sentiment_counts = Merged_df.groupby(['Content Type', 'Sentiment']).size().reset_index(name='Count')

# Create a custom color map for sentiment categories
color_map = {'positive': '#8BC34A', 'negative': '#FF5722', 'neutral': '#757575'}

# Create a stacked bar chart using Plotly Express with custom colors
fig = px.bar(sentiment_counts, x='Content Type', y='Count', color='Sentiment',
             title='Sentiment Distribution by Content Type',
             color_discrete_map=color_map)  # Specify custom colors

# Customize the layout (optional)
fig.update_xaxes(title='Content Type')
fig.update_yaxes(title='Count')
fig.update_traces(marker_line_width=0, opacity=0.8)  # Customize appearance

# Show the interactive plot
fig.show()


In [None]:
# Group the data by 'Content Type' and sum the 'Count' column to get the total counts per Content Type
total_counts_per_content_type = sentiment_counts.groupby('Content Type')['Count'].sum().reset_index()

# Print the total counts per Content Type
print(total_counts_per_content_type)


In [None]:
# Pivot the 'sentiment_counts' DataFrame to have separate columns for positive, neutral, and negative sentiments
pivot_sentiments = sentiment_counts.pivot(index='Content Type', columns='Sentiment', values='Count').reset_index()

# Rename the columns for clarity
pivot_sentiments.columns = ['Content Type', 'Negative', 'Neutral', 'Positive']

# Print the pivot table
print(pivot_sentiments)


In [None]:
# Group the data by 'Sentiment' and count occurrences
sentiment_counts = Merged_df['Sentiment'].value_counts().reset_index()
sentiment_counts.columns = ['Sentiment', 'Count']

# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# Create a pie chart in the first subplot
wedges, texts, autotexts = ax1.pie(sentiment_counts['Count'], labels=sentiment_counts['Sentiment'], autopct='%1.1f%%',
         colors=sns.color_palette('Set3', len(sentiment_counts)))
ax1.set_title('Distribution of Sentiments')

# Create a bar plot in the second subplot
sns.barplot(x='Sentiment', y='Count', data=sentiment_counts, ax=ax2, palette='Set3')
ax2.set_title('Sentiment Counts')

# Add labels to the pie chart
for text, autotext in zip(texts, autotexts):
    text.set(size=12, weight="bold")
    autotext.set(size=12, weight="bold")

# Adjust layout
plt.tight_layout()

# Show the plots
plt.show()


In [None]:
import plotly.express as px
import plotly.graph_objects as go

# Uses the 'sentiment_counts' DataFrame calculated in the cell above

# Create a pie chart
fig = px.pie(sentiment_counts, values='Count', names='Sentiment',
             title='Distribution of Sentiments',
             color_discrete_sequence=px.colors.qualitative.Set3)

# Customize the pie chart
fig.update_traces(textinfo='percent+label')
fig.update_layout(showlegend=True)  # Show legend

fig.update_layout(title_x=0.5)

# Create a bar chart
bar_fig = px.bar(sentiment_counts, x='Sentiment', y='Count',
                  title='Sentiment Counts',
                  color='Sentiment', color_discrete_sequence=px.colors.qualitative.Set3)

# Customize the bar chart
bar_fig.update_layout(showlegend=True)  # Show legend

# Show both charts, one after the other
fig.show()
bar_fig.show()


In [None]:
Merged_df.head()

In [None]:
# Uses the sentiment_counts DataFrame with columns 'Sentiment' and 'Count'
total_counts = sentiment_counts['Count'].sum()

# Calculate the overall percentage for each sentiment category
positive_percentage = (sentiment_counts[sentiment_counts['Sentiment'] == 'positive']['Count'].sum() / total_counts) * 100
negative_percentage = (sentiment_counts[sentiment_counts['Sentiment'] == 'negative']['Count'].sum() / total_counts) * 100
neutral_percentage = (sentiment_counts[sentiment_counts['Sentiment'] == 'neutral']['Count'].sum() / total_counts) * 100

# Print the overall percentages
print(f"Overall Positive Percentage: {positive_percentage:.2f}%")
print(f"Overall Negative Percentage: {negative_percentage:.2f}%")
print(f"Overall Neutral Percentage: {neutral_percentage:.2f}%")


# User Engagement Summary:

1. Almost every post has a good share of positive sentiment, which suggests that people are not reacting rudely to the content they see.
2. Looking at overall sentiment across the posts, positive comes out on top, negative is second, and neutral makes up only a small part. The majority of people are engaging positively with the posts on the platform.
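
One way to check the share of each sentiment within every category is a normalized cross-tabulation. This is a sketch on hypothetical data and assumes the 'Sentiment' column still holds the text labels 'positive', 'neutral' and 'negative'.

```python
import pandas as pd

# Hypothetical toy data with text sentiment labels
toy = pd.DataFrame({
    "Category": ["Food", "Food", "Food",
                 "Science", "Science", "Science", "Science"],
    "Sentiment": ["positive", "positive", "negative",
                  "positive", "neutral", "negative", "positive"],
})

# normalize='index' turns each row (category) into fractions that sum to 1
sentiment_share = (
    pd.crosstab(toy["Category"], toy["Sentiment"], normalize="index")
    .mul(100)
    .round(1)
)
print(sentiment_share)
```

Each row then reads as the percentage split of sentiments for one category, which makes categories of different sizes directly comparable.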