## - Introduction & Explanation

The survey is the primary data source for investigating the impact of social media on mental health. It collects demographic information (age, gender, location, educational level, marital status, and employment status) along with detailed questions about the platforms respondents use, the time they spend on social media, and the emotional and mental effects of that use. It covers key areas such as anxiety, focus, and productivity, asks about coping mechanisms such as app restrictions and limits on device usage, and aims to capture real-world experiences and perceptions of social media use.

The survey questions were selected with the assistance of specialists so that the collected data would serve the research goals accurately and clearly. The survey was published in Arabic to obtain the largest number of responses, and the results were later translated into English.

## - Motivation and Goals

The motivation behind choosing a survey stems from its effectiveness in gathering diverse and personal insights from individuals across various demographics. 

The goals include:

* Understanding social media usage: To explore how long people spend on social media daily and which apps are most popular.
* Analyzing mental and emotional effects: To examine how social media impacts focus, stress, anxiety, and interactions with others.
* Increasing data richness: One of the reasons for selecting a survey is to gather more data, enrich knowledge, and learn how to manage and process survey data effectively.
* Comparative analysis: The research aims to compare the survey data collected from different regions of Saudi Arabia with secondary data (a Kaggle dataset) to understand how social media's impact on mental health differs between Saudi Arabia and another country.
* Reliable, realistic insights: By using both primary and secondary data, the study strives for accurate, reliable, and realistic results.

| Column Name        | Description                                                             | Data Type     | Possible Values                      |
|:-------------------|:------------------------------------------------------------------------|:-------------:|:-------------------------------------:|
| `the age`             | The respondent's age, given as a range of years.                     | Integer range       |  (e.g., 13-17, 18-24) |
| `Gender`             | The gender of the respondent.                                   | Categorical      |  (e.g., Male, Female)   |
| `Area`          | The geographic area or city where the respondent resides                 | Categorical   | (e.g., Riyadh, Jeddah). |
| `Current educational level`     | The respondent's current level of education.      | Categorical      | (e.g., High school or equivalent, Bachelor's degree).             |
| `marital status`    | The respondent's marital status.    | Categorical  | (e.g., bachelor, married)             |
| `Employment status`        | The employment status of the respondent.           | Categorical       | (e.g., Student, Full time employee, Not employed).         |
| `Do you use social media applications?`          | Whether the respondent uses social media applications. | Binary (Yes, No)     | (Yes/No)             |
| `What social media platforms do you use?`         | The social media platforms the respondent uses. | Categorical/List      | (e.g., Instagram, Twitter)             |
| `What app do you use the most?`         | The social media application the respondent uses the most.           | Categorical      | (e.g., TikTok, Instagram).             |
| `How many hours do you spend on social media platforms daily?`           | The number of hours the respondent spends on social media each day.             | Categorical       | (e.g., Less than 1 hour, 12 hours or more)             |
| `Do you feel anxious or stressed after reading negative comments on your posts?`         | How often the respondent feels anxious or stressed after reading negative comments.           | Categorical      | (e.g., Rarely, sometimes)            |
| `Are you worried about missing out on important information or events when you're not using social media?`          | Whether the respondent feels concerned about missing out on important updates from social media.           | Categorical      | (e.g., Rarely, sometimes)             |
| `Do you feel that using social media has affected your ability to focus and accomplish daily tasks?`         | Whether the respondent feels that social media has affected their ability to focus and accomplish daily tasks.           | Categorical       |       (e.g., Yes, a lot)       |
| `Do you think that consuming quick content has affected your patience and ability to deal with long tasks?`             | The respondent's opinion on whether quick content (e.g., stories, reels) affects their attention span.                                   | Categorical       | (e.g., Yes, a lot)  |
| `Do you use social media right before going to sleep?`             | Whether the respondent uses social media right before going to bed.                                   | Categorical       | (e.g., Yes, always)  |
| `Do you have difficulty sleeping because of thinking about what you saw on social media platforms?`             | Whether the respondent experiences sleep difficulties due to social media use.                                   | Categorical       | (e.g., Yes, always)    |
| `Does the number of likes or comments you get on your posts affect you?`             | Whether the number of likes or comments the respondent receives impacts their self-worth.                                   | Categorical      | (e.g., Rarely, sometimes)   |
| `Have you changed your opinion or feeling based on the reactions of others on social media platforms?`      | Whether the respondent has changed their opinion or feeling based on the reactions of others on social media platforms.                                  | Categorical      | (e.g., Rarely, sometimes)   |
| `Do you prefer interacting with friends or family online rather than face-to-face?`             | Whether the respondent prefers interacting with people online over in-person interactions.                | Categorical      | (e.g., Rarely, sometimes)  |
| `How often do you find yourself using social media for longer than you planned?`             | How frequently the respondent uses social media for longer than they planned.                 | Categorical      | (e.g., Rarely, sometimes)   |
| `How do you feel when you compare your life to the lives of others on social media?`             | The respondent's feelings when comparing their life to the lives of others on social media.             | Text       |  (e.g., Feeling natural, Inspiration, Envy)   |
| `What methods, if any, do you use to limit your social media access?`             | The methods the respondent uses to control or limit their social media use.         | Text        | (e.g., Time limits, None)  |

# Cleaning data

### 1-Dataset Sample:

In [2]:
import pandas as pd
survey_data = pd.read_csv('SurveyData.csv')


print("Sample Data:")
print(survey_data.head())



Sample Data:
  the age:   Gender:   Area: Current educational level: marital status:  \
0    13-17  feminine  Riyadh  High school or equivalent        bachelor   
1    18-24      male  Riyadh          Bachelor's degree        bachelor   
2    18-24  feminine  Riyadh          Bachelor's degree        bachelor   
3    18-24  feminine  Riyadh  High school or equivalent        bachelor   
4    35-44  feminine  Riyadh          Bachelor's degree         married   

      Employment status: Do you use social media applications?  \
0                student                                   Yes   
1           Not employed                                   Yes   
2                student                                   Yes   
3                student                                   Yes   
4  Housewife, unemployed                                   Yes   

             What social media platforms do you use?  \
0  Instagram, X (Twitter), TikTok, Snapchat, Yout...   
1  Instagram, X (Twitter), 

### 2-Ignoring the Last two Columns: 

In this step, we drop the last two columns for now. They hold free-text answers that will be handled later in the pre-processing stage with techniques such as tokenization and sentiment analysis.


In [3]:
survey_data = survey_data.drop(survey_data.columns[-2:], axis=1)

print("Data after ignoring (deleting) the last two columns:")
print(survey_data.head())

Data after ignoring (deleting) the last two columns:
  the age:   Gender:   Area: Current educational level: marital status:  \
0    13-17  feminine  Riyadh  High school or equivalent        bachelor   
1    18-24      male  Riyadh          Bachelor's degree        bachelor   
2    18-24  feminine  Riyadh          Bachelor's degree        bachelor   
3    18-24  feminine  Riyadh  High school or equivalent        bachelor   
4    35-44  feminine  Riyadh          Bachelor's degree         married   

      Employment status: Do you use social media applications?  \
0                student                                   Yes   
1           Not employed                                   Yes   
2                student                                   Yes   
3                student                                   Yes   
4  Housewife, unemployed                                   Yes   

             What social media platforms do you use?  \
0  Instagram, X (Twitter), TikTok, Snapchat
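Because the two text columns are needed again later, one option is to keep a copy of them before dropping. The following is a minimal sketch on a toy frame, not the notebook's actual code; the column names and the variables `toy`, `text_columns`, and `structured` are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the survey; column names are illustrative only.
toy = pd.DataFrame({
    "Gender:": ["Female", "male"],
    "Hours:": ["1-2 hours", "3-4 hours"],
    "Comparison feelings:": ["Inspiration", "Envy"],
    "Limiting methods:": ["Time limits", "None"],
})

# Keep a copy of the last two (free-text) columns for later text processing.
text_columns = toy.iloc[:, -2:].copy()

# Drop them from the working frame used for the categorical analysis.
structured = toy.drop(columns=toy.columns[-2:])

print(structured.columns.tolist())
print(text_columns.columns.tolist())
```

Both frames share the same row index, so the text answers can be joined back to the structured answers when they are needed.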

### 3-Deleting unnecessary Rows:

In this step, we delete all rows where the answer in the "Do you use social media applications?" column is 'No', as these responses are not relevant to our analysis.

In [4]:
survey_data = survey_data[survey_data["Do you use social media applications?"] != "No"]
print("Data after deleting rows with 'No' in the 'Do you use social media applications?' column:")
social_media_answers_sample = survey_data["Do you use social media applications?"].head(10)
print(social_media_answers_sample)

Data after deleting rows with 'No' in the 'Do you use social media applications?' column:
0    Yes
1    Yes
2    Yes
3    Yes
4    Yes
5    Yes
6    Yes
7    Yes
8    Yes
9    Yes
Name: Do you use social media applications?, dtype: object
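It can help to record how many rows a filter removes, so that the exclusion can be reported alongside the results. The sketch below uses a small made-up frame; `toy`, `users_only`, and `rows_removed` are hypothetical names, and resetting the index is an optional design choice that the cell above does not make.

```python
import pandas as pd

column = "Do you use social media applications?"

# Made-up answers standing in for the real survey column.
toy = pd.DataFrame({column: ["Yes", "No", "Yes", "Yes", "No"]})

rows_before = len(toy)

# Keep only respondents who did not answer "No"; reset_index gives
# the remaining rows a gap-free 0..n-1 index.
users_only = toy[toy[column] != "No"].reset_index(drop=True)

rows_removed = rows_before - len(users_only)
print(f"Removed {rows_removed} of {rows_before} rows")
```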


### 4-Checking missing values:

In [5]:
missing_values = survey_data.isnull().sum()

print("Missing values in each column:")
print(missing_values)

if missing_values.sum() == 0:
    print("\nThere are no missing values in the dataset.")
else:
    print("\nThere are missing values in the dataset. Please review the above output for details.")


Missing values in each column:
the age:                                                                                                                                                               0
Gender:                                                                                                                                                                0
Area:                                                                                                                                                                  0
Current educational level:                                                                                                                                             0
marital status:                                                                                                                                                        0
Employment status:                                                                                                          

### 5- Dealing with "#VALUE!" 

Now we check whether the value "#VALUE!" occurs in the dataset, as it marks unanswered questions in the survey. Any row that contains it is removed.

In [6]:
import pandas as pd

value_error_count = (survey_data == "#VALUE!").sum().sum()

if value_error_count > 0:
    print(f"Found {value_error_count} occurrences of '#VALUE!' in the dataset. Deleting them now...")
    survey_data = survey_data.replace("#VALUE!", pd.NA).dropna()
    print("Rows with '#VALUE!' have been deleted.")
else:
    print("No occurrences of '#VALUE!' found in the dataset.")

print("\nUpdated data sample after cleaning:")
print(survey_data.head())

Found 156 occurrences of '#VALUE!' in the dataset. Deleting them now...
Rows with '#VALUE!' have been deleted.

Updated data sample after cleaning:
  the age:   Gender:   Area: Current educational level: marital status:  \
0    13-17  feminine  Riyadh  High school or equivalent        bachelor   
1    18-24      male  Riyadh          Bachelor's degree        bachelor   
2    18-24  feminine  Riyadh          Bachelor's degree        bachelor   
3    18-24  feminine  Riyadh  High school or equivalent        bachelor   
4    35-44  feminine  Riyadh          Bachelor's degree         married   

      Employment status: Do you use social media applications?  \
0                student                                   Yes   
1           Not employed                                   Yes   
2                student                                   Yes   
3                student                                   Yes   
4  Housewife, unemployed                                   Yes   
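Since "#VALUE!" is stored as an ordinary string, a check with `isnull()` does not count it as missing. An alternative, sketched here under the assumption that the marker appears literally in the CSV, is to declare it as a missing-value marker at load time with the `na_values` parameter of `pd.read_csv`. The inline CSV text and variable names are made up for illustration.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for SurveyData.csv (contents are made up).
csv_text = """Gender:,Area:
Female,Riyadh
male,#VALUE!
#VALUE!,Jeddah
Female,Abha
"""

# Treat "#VALUE!" as missing while parsing, so isna() can see it.
loaded = pd.read_csv(io.StringIO(csv_text), na_values=["#VALUE!"])

missing_per_column = loaded.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing answer.
cleaned = loaded.dropna()
print(cleaned)
```

With this approach the missing-value report and the placeholder check become a single step.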

    

# Pre-Processing Tasks

### Sample Dataset

In [7]:
print("Sample Data:")
print(survey_data.head())

Sample Data:
  the age:   Gender:   Area: Current educational level: marital status:  \
0    13-17  feminine  Riyadh  High school or equivalent        bachelor   
1    18-24      male  Riyadh          Bachelor's degree        bachelor   
2    18-24  feminine  Riyadh          Bachelor's degree        bachelor   
3    18-24  feminine  Riyadh  High school or equivalent        bachelor   
4    35-44  feminine  Riyadh          Bachelor's degree         married   

      Employment status: Do you use social media applications?  \
0                student                                   Yes   
1           Not employed                                   Yes   
2                student                                   Yes   
3                student                                   Yes   
4  Housewife, unemployed                                   Yes   

             What social media platforms do you use?  \
0  Instagram, X (Twitter), TikTok, Snapchat, Yout...   
1  Instagram, X (Twitter), 

### 1- Data Transformation 

Here, we identified and replaced inconsistent or incorrect values in our dataset. For example, we replaced "feminine" with "Female" in the Gender column and corrected literally translated city names in the Area column: "grandmother" to "Jeddah", "the news" to "Khobar", and "City" to "Madinah".



In [8]:
# Replace "feminine" with "Female" in the Gender column
survey_data['Gender:'] = survey_data['Gender:'].replace("feminine", "Female")

# Replace incorrect city names in the Area column
survey_data['Area:'] = survey_data['Area:'].replace({
    "grandmother": "Jeddah", 
    "the news": "Khobar", 
    "City": "Madinah"
})

# Display a sample of the updated dataset
print("\nUpdated data sample after replacing incorrect words:")
print(survey_data[['Gender:', 'Area:']].head(250))



Updated data sample after replacing incorrect words:
    Gender:   Area:
0    Female  Riyadh
1      male  Riyadh
2    Female  Riyadh
3    Female  Riyadh
4    Female  Riyadh
..      ...     ...
247  Female  Riyadh
248  Female  Jeddah
249  Female  Riyadh
250    male  Riyadh
251  Female    Abha

[250 rows x 2 columns]
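After a replacement like this, listing the distinct values that remain is a quick way to confirm that no mistranslated entries are left. The sketch below repeats the replacement on a toy frame; the data and the names `toy`, `remaining_areas`, and `leftover` are hypothetical.

```python
import pandas as pd

# Toy rows containing the mistranslated values described above.
toy = pd.DataFrame({
    "Gender:": ["feminine", "male", "feminine"],
    "Area:": ["Riyadh", "grandmother", "the news"],
})

# Mistranslated values and their intended meanings, taken from the text.
gender_fixes = {"feminine": "Female"}
area_fixes = {"grandmother": "Jeddah", "the news": "Khobar", "City": "Madinah"}

toy["Gender:"] = toy["Gender:"].replace(gender_fixes)
toy["Area:"] = toy["Area:"].replace(area_fixes)

# Distinct values after cleaning; anything unexpected stands out here.
remaining_areas = sorted(toy["Area:"].unique())
print(remaining_areas)

# Any mistranslated value that survived the replacement (should be empty).
leftover = set(toy["Area:"]) & set(area_fixes)
print(leftover)
```

The same check on the Gender column would also surface the lowercase "male", which this step leaves unchanged.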


### 2- Range(1-5):

Now we convert the survey responses to a numerical scale from 1 to 5, where 1 corresponds to the most frequent answer ("always") and 5 to "never". This step standardizes the data and makes it ready for analysis.

In [9]:
columns_to_convert = [
    "Do you feel anxious or stressed after reading negative comments on your posts?",
    "Are you worried about missing out on important information or events when you're not using social media?", 
    "Do you feel that using social media has affected your ability to focus and accomplish daily tasks?", 
    "Do you think that consuming quick content (such as watching short videos and push notifications...) has affected your patience and ability to deal with long tasks?", 
    "Do you use social media right before going to sleep?", 
    "Do you have difficulty sleeping because of thinking about what you saw on social media platforms?", 
    "Does the number of likes or comments you get on your posts affect you?", 
    "Have you changed your opinion or feeling based on the reactions of others on social media platforms?", 
    "Do you prefer interacting with friends or family online rather than face-to-face?", 
    "How often do you find yourself using social media for longer than you planned?"
]

response_mapping = {
    "Yes, always": 1,
    "always": 1,
    "Yes, a lot": 2,
    "often": 2,
    "sometimes": 3,
    "Rarely": 4,
    "rarely": 4,
    "No, never": 5,
    "never": 5
}

for column in columns_to_convert:
    survey_data[column] = survey_data[column].replace(response_mapping)

print("\nUpdated data sample with answers converted to range 1-5:")
print(survey_data[columns_to_convert].head(10))





Updated data sample with answers converted to range 1-5:
  Do you feel anxious or stressed after reading negative comments on your posts?  \
0                                                  4                               
1                                                  5                               
2                                                  1                               
3                                                  3                               
4                                                  5                               
5                                                  3                               
6                                                  1                               
7                                                  5                               
8                                                  4                               
9                                                  3                               

   Are you worrie
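With a dictionary-based `replace`, an answer whose spelling is not in the mapping is silently left as text. One way to surface such answers, sketched here as an assumption rather than the notebook's method, is to normalize the case first and use `Series.map` with a lookup function, so that unmapped answers come out as missing. The helper `to_scale`, the `aliases` table, and the sample answers are hypothetical.

```python
import pandas as pd

# Same 1-5 scale as the mapping above, keyed by lowercase answers.
scale = {"always": 1, "often": 2, "sometimes": 3, "rarely": 4, "never": 5}

# Longer answer wordings that share a scale point with a short one.
aliases = {"yes, always": "always", "yes, a lot": "often", "no, never": "never"}


def to_scale(answer: str):
    """Return the 1-5 code for an answer, or None if it is not recognized."""
    key = answer.strip().lower()
    key = aliases.get(key, key)
    return scale.get(key)


# Made-up answers; "maybe" is deliberately not part of the scale.
answers = pd.Series(["Yes, always", "Rarely", "sometimes", "No, never", "maybe"])

encoded = answers.map(to_scale)

# Answers that could not be encoded and need manual review.
unmapped = answers[encoded.isna()]
print(unmapped.tolist())
```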

# Decisions Made

During our project on the impact of social media on mental health, we made several key decisions to ensure the accuracy and effectiveness of the study. These decisions were crucial in designing the survey and analyzing the data in a way that aligns with the research objectives. Below are the most significant decisions we made:

1. **Consulting Psychological Experts**:

We decided to reach out to psychology experts to ensure our survey questions were accurate and relevant to mental health.

2. **Survey Language**:

We chose to distribute the survey in Arabic, as our target audience is Saudi, and using their native language would facilitate responses and increase participation.

3. **Excluding Non-Social Media Users**:

We excluded the 10 of 852 participants who indicated that they don't use social media, as their responses are not relevant to the analysis and the group is too small to support separate conclusions.

4. **Converting Responses to a Numeric Scale**:

We converted qualitative responses to a numeric 1-5 scale to align with the secondary data from Kaggle, which uses the same scale, for easier comparison.

### Challenges and How We Overcame Them

We encountered several challenges during the survey process, ranging from technical issues with data quality to difficulties participants faced while completing the survey. Each obstacle required a deliberate solution to protect the validity and reliability of the collected data.

Here are the main challenges we faced:

1. **Questionnaire Design:** One of the significant challenges was developing a questionnaire with valid and effective questions that addressed the research questions. It was essential to ensure that the questions were clear and straightforward to avoid any confusion for the respondents. To tackle this, we collaborated with experts to review the questions, ensuring they aligned with the research objectives.

2. **Questionnaire Fatigue:** This is a common issue, as lengthy questionnaires can cause participants to lose interest. To mitigate this, we designed a concise and focused questionnaire that included essential questions, dividing it into sections. The first section collected demographic data, followed by the primary research questions.

3. **Misunderstanding of Questions:** Another challenge was the risk of participants misunderstanding the questions, which could result in inaccurate or irrelevant answers. To alleviate this, we employed simple language and provided examples for questions that required further clarification, making it easier for participants to understand and respond accurately.

4. **Sample Diversity:** Achieving diversity within the sample across various demographics was also a challenge. To address this, we utilized multiple social media platforms, which allowed us to reach a broader audience.

After completing the survey and gathering a substantial number of responses, we began to verify the results and encountered the following issues:

**Unanswered Questions:** Many participants skipped certain questions, which left the data incomplete and put the accuracy and reliability of the results at risk. To resolve this, we identified and removed rows containing `#VALUE!` during the data cleaning phase, preserving the integrity of the remaining dataset.

**Literal Translation Issues:** Throughout the project, we faced challenges with the literal translation of documents from Arabic to English, as noted in the pre-processing phase. Certain phrases and terms often failed to convey the intended meaning in English, resulting in confusion and misinterpretation of key concepts. To tackle this, we identified and corrected any inconsistent or incorrect values in our dataset, ensuring the original context and clarity were maintained. Additionally, we conducted thorough reviews of the translated content to identify and rectify any phrases that could lead to misunderstandings, thereby enhancing the overall quality of the document.



## Recommendations:
* Improve participant engagement
* Increase sample size and diversity
* Employ more detailed data analysis

## Impact and Implications:
The findings demonstrate that social media has both positive and negative effects on mental health. While it fosters social interaction and support, it also poses risks such as anxiety and depression. These insights can inform the development of mental health interventions, including educational campaigns and support programs tailored for social media users.