# COGS 108 - Data Checkpoint

# Names

- Daniela Molina
- Gabriel Beal
- Marc Isaia
- Haoxuan Cui

<a id='research_question'></a>
# Research Question

A substantial number of teenagers and young adults (ages 14-28) in the United States use the top social media applications (specifically Instagram, Facebook, Snapchat, and Twitter). Given that, does extensive usage (tentatively defined as 2+ hours/day) of these applications by these users generate an onset of symptoms that correspond to the clinical criteria of an Anxiety Disorder and/or Depressive Disorder?

# Dataset(s)

CSV from the Google Form Survey we conducted:
- Dataset Name: Social Media and Mental Health
- Link to the dataset: https://raw.githubusercontent.com/COGS108/group036_wi21/main/Social%20Media%20and%20Mental%20Health.csv?token=AJJHPBN3BQHNXLMZPWR5UZTAFWDK2
- Number of Observations: 182

This dataset is the CSV file created from the responses to our survey, which collected data on social media use and mental health. It contains each respondent's gender identity and age, their social media usage (average time spent, which apps they use, and when they first started using them), their current mental health, and whether they feel that their mental health can be attributed to social media.

# Setup

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv("Social_Media_and_Mental_Health.csv")

In [3]:
df.head()

Unnamed: 0,Timestamp,What is your preferred gender identity?,What is your age? (In number form),Do you use social media?,How frequently do you use social media per day?,How often do you use social media per week?,Is social media the first thing you check in the morning?,Is social media the last thing you check before going to bed?,Which of the following social media apps do you use?,When did you first start using social media?,Rate how you feel social media has affected your mental health.,"Which of the following mental health issues, if any, do you identify with?",Have you ever had suicidal thoughts?,Do you feel social media has contributed to your previous answers?,"Are there any questions, comments, concerns you would like to share? (anonymously)"
0,2021/02/03 1:06:21 PM PST,Female,22,Yes,2-3 hours,3-5 times a week,Yes,No,Snapchat;TikTok,Middle School,6,Depression;Anxiety Disorder;Eating Disorder;Sl...,Yes,Maybe,
1,2021/02/03 1:08:09 PM PST,Female,22,Yes,1-2 hours,I use it every day,No,Yes,Facebook;Instagram;Twitter;TikTok,High School,7,Depression;Anxiety Disorder,No,Maybe,
2,2021/02/03 1:08:37 PM PST,Female,21,Yes,4+ hours,I use it every day,Yes,Yes,Instagram;Snapchat;Twitter;TikTok,Elementary School,5,Depression,No,Yes,No questions :)
3,2021/02/03 1:11:37 PM PST,Female,19,Yes,4+ hours,3-5 times a week,Yes,Yes,Instagram;Snapchat;Twitter;TikTok,Elementary School,3,Depression;Anxiety Disorder;Loneliness;Attenti...,Yes,Yes,
4,2021/02/03 1:16:07 PM PST,Female,21,Yes,2-3 hours,I use it every day,Yes,Yes,Instagram;Snapchat,High School,4,Loneliness,No,Yes,


# Data Cleaning

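The raw survey export uses the full question text as column names and mixes several answer formats, so we cleaned it in the following steps:

- Renamed the columns to short labels.
- Removed responses from participants older than 28, the upper bound of our target age range.
- Converted "Yes" and "No" answers to True and False.
- Dropped respondents who do not use social media (there was only one within the age range).
- Dropped the timestamp column, which is not relevant to the research question.
- Stripped the " hours" suffix from the daily usage answers and recoded the weekly usage answers as day counts or ranges ("1", "2-3", "3-5", "7").
- Standardized inconsistent gender labels.
- Converted age and impact to numeric types.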

In [4]:
# Rename columns
df.columns = ['time', 'gender', 'age', 'use social media', 'hours per day', 'days per week', 'check morning', 'check night', 'apps', 'start using', 'impact', 'mental health issues', 'suicidal thoughts', 'SM contributed', 'feedback']
df.head()

Unnamed: 0,time,gender,age,use social media,hours per day,days per week,check morning,check night,apps,start using,impact,mental health issues,suicidal thoughts,SM contributed,feedback
0,2021/02/03 1:06:21 PM PST,Female,22,Yes,2-3 hours,3-5 times a week,Yes,No,Snapchat;TikTok,Middle School,6,Depression;Anxiety Disorder;Eating Disorder;Sl...,Yes,Maybe,
1,2021/02/03 1:08:09 PM PST,Female,22,Yes,1-2 hours,I use it every day,No,Yes,Facebook;Instagram;Twitter;TikTok,High School,7,Depression;Anxiety Disorder,No,Maybe,
2,2021/02/03 1:08:37 PM PST,Female,21,Yes,4+ hours,I use it every day,Yes,Yes,Instagram;Snapchat;Twitter;TikTok,Elementary School,5,Depression,No,Yes,No questions :)
3,2021/02/03 1:11:37 PM PST,Female,19,Yes,4+ hours,3-5 times a week,Yes,Yes,Instagram;Snapchat;Twitter;TikTok,Elementary School,3,Depression;Anxiety Disorder;Loneliness;Attenti...,Yes,Yes,
4,2021/02/03 1:16:07 PM PST,Female,21,Yes,2-3 hours,I use it every day,Yes,Yes,Instagram;Snapchat,High School,4,Loneliness,No,Yes,


In [5]:
df.shape

(182, 15)

In [6]:
# Remove responses with age over 28
df = df[df['age'] <= 28]

In [7]:
# Change yes and no to true and false
df = df.replace({"Yes": True, "No": False})

In [8]:
# Drop anyone who doesn't use social media - there was only one within the age range
df.drop(df[df['use social media'] == False].index, inplace = True)

In [9]:
# Drop time column - not relevant to the research
df.drop(columns=['time'], inplace = True)

In [10]:
# Strip the " hours" suffix so only the range label remains (e.g. "2-3", "4+")
df["hours per day"] = df["hours per day"].str.replace(" hours", "", regex=False)

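Because the research question hinges on a "2+ hours/day" threshold, the stripped range labels eventually need to be compared against that cutoff. The sketch below is one possible way to do that, not the approach we have committed to: it takes the first number in each label as a lower bound and flags rows at or above 2. The toy frame and the column names "hours lower bound" and "extensive use" are hypothetical, and only labels visible in the preview above are used; any label without a digit would become NaN and be flagged False.

```python
import pandas as pd

# Toy rows using range labels that appear in the preview (after " hours" is stripped)
toy = pd.DataFrame({"hours per day": ["2-3", "1-2", "4+", "4+", "2-3"]})

# Hypothetical helper column: lower bound of each range = first number in the label
toy["hours lower bound"] = (
    toy["hours per day"].str.extract(r"(\d+)", expand=False).astype(float)
)

# Hypothetical flag for the tentative "2+ hours/day" threshold
toy["extensive use"] = toy["hours lower bound"] >= 2

print(toy)
```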
In [11]:
# Recode weekly usage answers as day counts or ranges
df = df.replace({"Once a week": "1", "2-3 times a week": "2-3", "3-5 times a week": "3-5", "I use it every day":"7"})

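The recoded weekly labels are still strings, so sorting or comparing them would be alphabetical rather than by frequency. A minimal sketch of one option, assuming the four labels from the recode above are the only values present, is to store them as an ordered categorical. The toy frame and the variable name order are illustrative only.

```python
import pandas as pd

# Toy rows using the recoded labels from the cell above
toy = pd.DataFrame({"days per week": ["3-5", "7", "7", "1", "2-3"]})

# Assumed ordering, from least to most frequent use
order = ["1", "2-3", "3-5", "7"]
toy["days per week"] = pd.Categorical(
    toy["days per week"], categories=order, ordered=True
)

# Ordered categoricals sort and compare by category order, not alphabetically
toy_sorted = toy.sort_values("days per week")
print(toy_sorted)
print(toy["days per week"].min(), toy["days per week"].max())
```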
In [12]:
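# Standardize gender labels: fix a variant with a trailing space and a free-text response that included feedback on the question wording
df = df.replace({"Non-Binary ": "Non-binary", "Female- heads up this is worded ambiguously and people may interpret it as what gender they're attracted to": "Female"})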

In [13]:
# Make sure age and impact are stored as numeric types
df['age'] = pd.to_numeric(df['age'])
df['impact'] = pd.to_numeric(df['impact'])

In [14]:
df.dtypes

gender                  object
age                      int64
use social media          bool
hours per day           object
days per week           object
check morning           object
check night             object
apps                    object
start using             object
impact                   int64
mental health issues    object
suicidal thoughts       object
SM contributed          object
feedback                object
dtype: object

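The apps and mental health issues columns are still object columns that hold several semicolon-separated answers per respondent, which makes it hard to count, for example, how many Instagram users report an Anxiety Disorder. The sketch below shows one way these could be expanded into one boolean column per option using pandas' str.get_dummies. It runs on a toy frame built from untruncated values in the preview rather than on our full data, and the names app_flags and issue_flags are hypothetical.

```python
import pandas as pd

# Toy rows copied from untruncated preview values (rows 1, 2, and 4)
toy = pd.DataFrame({
    "apps": [
        "Facebook;Instagram;Twitter;TikTok",
        "Instagram;Snapchat;Twitter;TikTok",
        "Instagram;Snapchat",
    ],
    "mental health issues": [
        "Depression;Anxiety Disorder",
        "Depression",
        "Loneliness",
    ],
})

# One boolean column per distinct option in each multi-select field
app_flags = toy["apps"].str.get_dummies(sep=";").astype(bool)
issue_flags = toy["mental health issues"].str.get_dummies(sep=";").astype(bool)

print(app_flags)
print(issue_flags)
```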
# Project Proposal (updated)

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/20  |  12:30 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 1/27  |  12:30 PM |  Think about topic of Mental Health and Social Media | Discuss ideal dataset(s) and ethics; draft project proposal and submit | 
| 2/3  | 12:30 PM  | Search for datasets  | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/10  | 12:30 PM  | Import & Wrangle Data | Review/Edit wrangling/EDA; Discuss Analysis Plan   |
| 2/17  | 12:30 PM  | Finalize wrangling/EDA; Begin Analysis | Discuss/edit Analysis; Complete project check-in |
| 2/24  | 12:30 PM  | Continue analysis | Discuss ideal Data Visualization |
| 3/3  | 12:30 PM  | Finalize Data Visualization | Discuss potential conclusions/results |
| 3/10  | 12:30 PM  | Think about conclusions/results | Split final writeup duties |
| 3/15  | 12:30 PM  | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 3/17  | Before 11:59 PM  | Finalize writeup | Turn in Final Project & Group Project Surveys |