<a href="https://colab.research.google.com/github/VildanaRazumova/VildanaRazumova/blob/main/Social_media_usage_and_Emotional_Well_Being.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Social media usage and Emotional Well-Being

## Project information:

* **Client:** Municipal Social Department
* **The main idea:** this analysis will help our client better understand the emotional well-being of people who are active on social media.

>For instance, the client would like to know how social media use affects emotional well-being, which groups of people are most active on social media, and how the time spent on social media relates to people's mood.




---

## The [Dataset from Kaggle](https://www.kaggle.com/code/emirhanai/predicting-emotional-well-being-from-social-media) includes:


> **User_ID:** Unique identifier for the user.

> **Age:** Age of the user.

> **Gender:** Gender of the user (Female, Male, Non-binary).

> **Platform:** Social media platform used (e.g., Instagram, Twitter, Facebook, LinkedIn, Snapchat, Whatsapp, Telegram).

> **Daily_Usage_Time (minutes):** Daily time spent on the platform in minutes.

> **Posts_Per_Day:** Number of posts made per day.

> **Likes_Received_Per_Day:** Number of likes received per day.

> **Comments_Received_Per_Day:** Number of comments received per day.

> **Messages_Sent_Per_Day:** Number of messages sent per day.

> **Dominant_Emotion:** User's dominant emotional state during the day (e.g., Happiness, Sadness, Anger, Anxiety, Boredom, Neutral).

### As a *Data Analyst*, I analyze the data through the following steps in this notebook:

1. Loading Required Libraries
1. Load the Dataset
1. Clean the Dataset
1. Analyze Dataset
1. Other useful insights
1. Overall conclusion

This analysis gives the client the information they need to answer the questions above.



---

*Loading pandas and NumPy, plus the Colab `drive` helper for reading the dataset uploaded to Google Drive*

In [275]:
import pandas as pd
import numpy as np
from google.colab import drive

*Importing the dataset from Google Drive and viewing the first 5 rows*

In [291]:
drive.mount('/content/drive', force_remount=True)
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Emotional Well-Being/val.csv',
                 sep=';')
df.head()

Mounted at /content/drive


,User_ID,Age,Gender,Platform,Daily_Usage_Time (minutes),Posts_Per_Day,Likes_Received_Per_Day,Comments_Received_Per_Day,Messages_Sent_Per_Day,Dominant_Emotion,Unnamed: 10
0,,,,,,,,,,,
1,10.0,31.0,Male,Instagram,170.0,5.0,80.0,20.0,35.0,Happiness,
2,,,,,,,,,,,
3,877.0,32.0,Female,Instagram,155.0,6.0,75.0,25.0,38.0,Happiness,
4,,,,,,,,,,,
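
The all-empty rows and the trailing `Unnamed: 10` column are the pattern pandas produces when a semicolon-separated file contains blank records and a trailing delimiter. Whether that is exactly what happened in `val.csv` is an assumption; the sketch below only reproduces the pattern on a small made-up sample (the `raw` text is hypothetical, not the real file):

```python
import io

import pandas as pd

# Hypothetical sample: every record ends with ';' and is followed by an empty record.
raw = (
    "User_ID;Age;Gender;\n"
    ";;;\n"
    "10;31;Male;\n"
    ";;;\n"
    "877;32;Female;\n"
)

# The trailing ';' in the header creates an extra, unnamed column,
# and each ';;;' record becomes a row that contains only NaN.
sample = pd.read_csv(io.StringIO(raw), sep=";")

print(sample)
print(sample.columns.tolist())
```

Because the numeric columns now contain NaN, pandas stores them as floats, which is why IDs and ages appear as `10.0` and `31.0`.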


# TASK 1: Checking for Missing Values and Data Cleaning

*Cleaning dataset*

In [277]:
df.isnull().sum()

User_ID,148
Age,148
Gender,148
Platform,148
Daily_Usage_Time (minutes),148
Posts_Per_Day,148
Likes_Received_Per_Day,148
Comments_Received_Per_Day,148
Messages_Sent_Per_Day,148
Dominant_Emotion,148


*Deleting completely empty rows and columns*

In [278]:
df = df.dropna(how='all')
df = df.dropna(how='all', axis=1)

df.head()

,User_ID,Age,Gender,Platform,Daily_Usage_Time (minutes),Posts_Per_Day,Likes_Received_Per_Day,Comments_Received_Per_Day,Messages_Sent_Per_Day,Dominant_Emotion
1,10.0,31.0,Male,Instagram,170.0,5.0,80.0,20.0,35.0,Happiness
3,877.0,32.0,Female,Instagram,155.0,6.0,75.0,25.0,38.0,Happiness
5,230.0,26.0,Non-binary,Facebook,45.0,1.0,8.0,4.0,12.0,Sadness
7,876.0,28.0,Non-binary,Snapchat,115.0,3.0,38.0,18.0,27.0,Anxiety
9,376.0,28.0,Non-binary,Snapchat,115.0,3.0,38.0,18.0,27.0,Anxiety


*Checking the data types in the dataset*

In [279]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 148 entries, 1 to 295
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   User_ID                     148 non-null    float64
 1   Age                         148 non-null    float64
 2   Gender                      148 non-null    object 
 3   Platform                    148 non-null    object 
 4   Daily_Usage_Time (minutes)  148 non-null    float64
 5   Posts_Per_Day               148 non-null    float64
 6   Likes_Received_Per_Day      148 non-null    float64
 7   Comments_Received_Per_Day   148 non-null    float64
 8   Messages_Sent_Per_Day       148 non-null    float64
 9   Dominant_Emotion            148 non-null    object 
dtypes: float64(7), object(3)
memory usage: 12.7+ KB
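
`df.info()` reports whole-number fields such as `User_ID` and `Age` as `float64`, most likely because the empty rows forced a float dtype before they were dropped. A possible follow-up, sketched here on a small stand-in frame (`demo` and its values are illustrative assumptions), is to cast those columns back to integers once no missing values remain:

```python
import numpy as np
import pandas as pd

# Stand-in for the raw data: every other row is completely empty.
demo = pd.DataFrame({
    "User_ID": [np.nan, 10.0, np.nan, 877.0],
    "Age": [np.nan, 31.0, np.nan, 32.0],
    "Gender": [np.nan, "Male", np.nan, "Female"],
})

# Drop the empty rows and renumber the index from 0.
demo = demo.dropna(how="all").reset_index(drop=True)

# Casting is only safe after the NaN rows are gone.
demo = demo.astype({"User_ID": "int64", "Age": "int64"})

print(demo.dtypes)
```

The `reset_index(drop=True)` call is optional; it replaces the gapped index (1, 3, 5, ...) with a continuous one.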


# TASK 2: Data Overview

**Data overview:** *we look at the mean, min and max of the numerical columns and at the most frequent value in the “Gender”, “Platform” and “Dominant_Emotion” columns*

In [280]:
df[
    ['Age', 'Daily_Usage_Time (minutes)', 'Posts_Per_Day',
     'Likes_Received_Per_Day','Comments_Received_Per_Day',
     'Messages_Sent_Per_Day']
    ].describe().loc[['mean', 'min', 'max']]

,Age,Daily_Usage_Time (minutes),Posts_Per_Day,Likes_Received_Per_Day,Comments_Received_Per_Day,Messages_Sent_Per_Day
mean,27.304054,96.209459,3.533784,39.378378,15.081081,22.641892
min,21.0,30.0,1.0,5.0,2.0,10.0
max,35.0,210.0,10.0,110.0,40.0,50.0


In [281]:
df[
    ['Gender', 'Platform', 'Dominant_Emotion']
    ].describe(include=['object']).loc['top']

,top
Gender,Female
Platform,Instagram
Dominant_Emotion,Anxiety


*The top-level analysis shows that the most common gender is Female, the most used platform is Instagram, and the most common dominant emotion is Anxiety. Additionally, the average age is about 27 and the minimum daily usage time is 30 minutes.*
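
The `top` row only names the most frequent value. The `freq` row of the same `describe` output says how often that value occurs, which helps judge how dominant it really is. A minimal sketch on a small made-up frame (the `demo` values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Made-up categorical data for illustration only.
demo = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Non-binary"],
    "Platform": ["Instagram", "Instagram", "Twitter", "Instagram"],
})

# With no numeric columns present, describe() summarizes the text columns
# and returns the rows count, unique, top and freq.
summary = demo.describe().loc[["top", "freq"]]

print(summary)
```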

# TASK 3: Ranking the platforms

In [282]:
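def all_platforms(df):
    """Rank platforms by user count and show the average daily usage in hours."""
    # number of rows (users) per platform
    ranking_platform = df['Platform'].value_counts(ascending=False)
    # mean daily usage converted from minutes to hours;
    # astype(int) truncates, so e.g. 1.9 hours is shown as 1
    avr_spent_time = (
        df.groupby('Platform')['Daily_Usage_Time (minutes)']
        .mean() / 60
    ).astype(int)

    # both Series are indexed by platform, so they align automatically
    table = pd.DataFrame({
        'Users Count': ranking_platform,
        'Avr Daily Usage Time (hour)': avr_spent_time
    })

    # descending order by Users Count in the final table
    table = table.sort_values(by='Users Count', ascending=False)
    return table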

all_platforms(df)

Platform,Users Count,Avr Daily Usage Time (hour)
Instagram,38,2
Facebook,35,1
Twitter,25,1
LinkedIn,19,1
Snapchat,19,1
Telegram,7,1
Whatsapp,5,1
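
Because `astype(int)` truncates, an average of 1.9 hours and an average of 1.0 hours both show up as `1`, which hides most of the difference between platforms. A variant that keeps one decimal place could look like the sketch below. It uses named aggregation; the sample rows and the output column names (`users_count`, `avg_hours`) are assumptions for illustration:

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Platform": ["Instagram", "Instagram", "Facebook", "Facebook", "Facebook"],
    "Daily_Usage_Time (minutes)": [170, 155, 45, 60, 75],
})

usage = "Daily_Usage_Time (minutes)"

table = (
    demo.groupby("Platform")
    # one pass over the groups: row count and mean usage in minutes
    .agg(users_count=(usage, "size"), avg_minutes=(usage, "mean"))
    # convert minutes to hours and round instead of truncating
    .assign(avg_hours=lambda t: (t["avg_minutes"] / 60).round(1))
    .drop(columns="avg_minutes")
    .sort_values("users_count", ascending=False)
)

print(table)
```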


# TASK 4: Finding out how mood relates to the time people spend on social media

In [283]:
df_depend = df.groupby('Dominant_Emotion')[
    ['Daily_Usage_Time (minutes)',
    'Messages_Sent_Per_Day',
    'Posts_Per_Day',
    'Comments_Received_Per_Day']
].mean().astype(int)

df_depend = df_depend.sort_values(by='Daily_Usage_Time (minutes)', ascending=False)

df_depend

Dominant_Emotion,Daily_Usage_Time (minutes),Messages_Sent_Per_Day,Posts_Per_Day,Comments_Received_Per_Day
Happiness,148,31,5,24
Agression,140,29,6,21
Anger,96,25,4,20
Anxiety,91,22,3,14
Sadness,82,22,3,13
Neutral,76,16,2,9
Boredom,71,15,1,8


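*This table shows that the users with the highest social media activity report either Happiness or Aggression (`Agression` in the data) as their dominant emotion, so heavy use goes together with both very positive and very negative moods.*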
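
Group means are easier to interpret next to the number of users behind them, because a group with very few users can have an extreme mean by chance. A sketch that reports both, using made-up rows (the values are assumptions for illustration):

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Dominant_Emotion": ["Happiness", "Happiness", "Boredom", "Boredom", "Agression"],
    "Daily_Usage_Time (minutes)": [170, 130, 60, 80, 140],
})

usage = "Daily_Usage_Time (minutes)"

# count = users in the group, mean = their average daily usage in minutes
by_emotion = (
    demo.groupby("Dominant_Emotion")[usage]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)

print(by_emotion)
```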

# TASK 5: Percentage of emotions

In [284]:
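# rows per emotion; note that count() counts rows while nunique() counts
# distinct users, so the shares can total more than 100 if a User_ID repeats
df_emotions = df.groupby('Dominant_Emotion')['User_ID'].count()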
df_emotions_percentage = (df_emotions /
                           df['User_ID'].nunique() * 100
                          ).round(0).astype(int)

df_emotions_percentage.sort_values(ascending=False)

Dominant_Emotion,User_ID
Anxiety,24
Neutral,22
Happiness,21
Sadness,19
Boredom,13
Anger,10
Agression,1


*Additionally, negative emotions (Anxiety, Sadness, Anger, Agression) together make up the largest share, and Anxiety is the most common single emotion.*
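
The rounded shares add up to 110 rather than 100 (24 + 22 + 21 + 19 + 13 + 10 + 1). One possible explanation is that some `User_ID` values repeat, so the number of rows is larger than `nunique()`; this is a guess that would need checking on the real data. A sketch that divides by the number of rows instead, using `value_counts(normalize=True)` on made-up values:

```python
import pandas as pd

# Made-up emotions for illustration only.
emotions = pd.Series(
    ["Anxiety", "Anxiety", "Neutral", "Happiness", "Sadness"],
    name="Dominant_Emotion",
)

# normalize=True returns each value's fraction of all rows,
# so the shares sum to 1 before rounding.
share = (emotions.value_counts(normalize=True) * 100).round(0).astype(int)

print(share)
```

Rounding each share separately can still shift the total by a point or two on real data, but not by ten.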

# TASK 6: Average age of users by platform

In [285]:
age_platform = df.groupby('Platform')['Age'].mean()

for platform, age in age_platform.items():
    print(f'{platform}, {int(age)}')

Facebook, 25
Instagram, 28
LinkedIn, 27
Snapchat, 26
Telegram, 30
Twitter, 26
Whatsapp, 27


# TASK 7: Adding the Gender of users to the table above


In [286]:
# the most common gender on each platform
gender_platform = df.groupby('Platform')['Gender'].agg(
    lambda popular_gender: popular_gender.mode()[0]
)

# concatenation of 2 Series into 1 DataFrame
df_gender_age_platform = pd.concat(
    [age_platform.astype(int), gender_platform], axis=1
    ).sort_values(by='Age',
    ascending=False)

df_gender_age_platform

Platform,Age,Gender
Telegram,30,Male
Instagram,28,Female
LinkedIn,27,Male
Whatsapp,27,Female
Snapchat,26,Non-binary
Twitter,26,Female
Facebook,25,Non-binary
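
The mean age and the most common gender can also be computed in a single `groupby` call with named aggregation, which avoids the separate `concat` step. A sketch on made-up rows (the output names `avg_age` and `top_gender` are hypothetical):

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Platform": ["Telegram", "Telegram", "Instagram", "Instagram", "Instagram"],
    "Age": [31, 30, 28, 29, 27],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
})

summary = (
    demo.groupby("Platform")
    .agg(
        avg_age=("Age", "mean"),
        # mode() returns the most frequent values in sorted order;
        # iloc[0] takes the first one, so ties are broken alphabetically
        top_gender=("Gender", lambda g: g.mode().iloc[0]),
    )
    .sort_values("avg_age", ascending=False)
)

print(summary)
```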


# TASK 8: Checking for teenagers in the dataset using an array


In [287]:
# Age column as a NumPy array; a teenager here means age 18 or younger
ages = df['Age'].values
teenager_age = ages <= 18

if teenager_age.any():
    print('We have teenagers')
else:
    print('No teenagers')


No teenagers
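
The check treats anyone aged 18 or younger as a teenager. If teenagers were instead defined as ages 13 to 19 (an assumption, not something stated in the dataset or by the client), the same test could be written with a combined NumPy boolean mask:

```python
import numpy as np

# Hypothetical ages for illustration only.
ages = np.array([21, 35, 27, 19, 30])

# Element-wise comparison; '&' combines the two boolean arrays.
mask = (ages >= 13) & (ages <= 19)
has_teenagers = bool(mask.any())

print("We have teenagers" if has_teenagers else "No teenagers")
print("Number of teenagers:", int(mask.sum()))
```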


___

### Summary:

The dataset contained completely empty rows and an empty column, which I removed during cleaning.

The dataset includes only adults aged 21 to 35, with an average age of about 27. The dataset `"Social Media Usage and Emotional Well-Being"` shows that Anxiety is the most common dominant emotion among these users. Instagram is the most used platform: its users spend at least 2 hours a day there on average, and women are the largest group among them.

In addition, the three emotions with the highest average daily usage time are **"Happiness"**, **"Aggression"** and **"Anger"**. This suggests that heavy social media use goes together with strong emotions, both positive and negative, and that the content we consume may influence our moods and, in the end, our whole life.

With data on the content these users consume, I could analyze how their emotions change over time and how those changes correlate with the content.

___


---

## Converting to HTML



In [292]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [293]:
!jupyter nbconvert --to html "/content/drive/MyDrive/Social media usage and Emotional Well-Being.ipynb"

[NbConvertApp] Converting notebook /content/drive/MyDrive/Social media usage and Emotional Well-Being.ipynb to html
[NbConvertApp] Writing 672759 bytes to /content/drive/MyDrive/Social media usage and Emotional Well-Being.html
