# CSMODEL S11 | Project Phase 1
This notebook is the work of Group 4, consisting of the following members:

* CARNEY, JOHN PAUL COMPANIA
* GUERRERO, MIGUEL ALFONSO DAVID
* REINANTE, CHRISTIAN VICTOR GO
* SALVADOR, JARYLL FRANCIS PENA

## Dataset Description
This project makes use of the [Online Gaming Anxiety Data Set](https://www.kaggle.com/datasets/divyansh22/online-gaming-anxiety-data). It contains responses gathered from a worldwide survey of gamers. Included in this survey are psychological assessments for anxiety, social phobia, and life satisfaction. It also gathered demographic and gaming-related information. Marian Sauter and Dejan Draschkow originally compiled the data.


## Importing Libraries
Before proceeding, we import the libraries needed to provide a general overview of the dataset.

In [4]:
import numpy as np
import pandas as pd

## Loading the Dataset
We then load the dataset and preview its first five rows:

In [6]:
gamingAnxiety_df = pd.read_csv("GamingStudy_data.csv")
gamingAnxiety_df.head()

| | S. No. | Timestamp | GAD1 | GAD2 | GAD3 | GAD4 | GAD5 | GAD6 | GAD7 | GADE | ... | Birthplace | Residence | Reference | Playstyle | accept | GAD_T | SWL_T | SPIN_T | Residence_ISO3 | Birthplace_ISO3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 42052.00437 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | Not difficult at all | ... | USA | USA | Reddit | Singleplayer | Accept | 1 | 23 | 5.0 | USA | USA |
| 1 | 2 | 42052.0068 | 1 | 2 | 2 | 2 | 0 | 1 | 0 | Somewhat difficult | ... | USA | USA | Reddit | Multiplayer - online - with strangers | Accept | 8 | 16 | 33.0 | USA | USA |
| 2 | 3 | 42052.0386 | 0 | 2 | 2 | 0 | 0 | 3 | 1 | Not difficult at all | ... | Germany | Germany | Reddit | Singleplayer | Accept | 8 | 17 | 31.0 | DEU | DEU |
| 3 | 4 | 42052.06804 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Not difficult at all | ... | USA | USA | Reddit | Multiplayer - online - with online acquaintanc... | Accept | 0 | 17 | 11.0 | USA | USA |
| 4 | 5 | 42052.08948 | 2 | 1 | 2 | 2 | 2 | 3 | 2 | Very difficult | ... | USA | South Korea | Reddit | Multiplayer - online - with strangers | Accept | 14 | 14 | 13.0 | KOR | USA |



## Process and Implications of Data Collection
The data was gathered through a survey distributed to gamers worldwide. The survey included standardized instruments that psychologists commonly use to assess anxiety, social phobia, and life satisfaction: the Generalized Anxiety Disorder Assessment (GAD), the Satisfaction with Life Scale (SWL), and the Social Phobia Inventory (SPIN). It also asked about gaming habits and general demographics.

Though not explicitly stated, the survey was most likely conducted online, since online surveys are the usual way to reach a worldwide audience, especially gamers. The dataset description also gives *Reddit* as an example value for the **Reference** variable, which suggests that the website was one of the channels used to distribute the survey. Assuming the data was collected this way, there are several implications:

- **Sample Composition**: Because the data was collected through an online survey, it may over-represent individuals who are active in online gaming communities or who primarily play online multiplayer games. As a result, those who do not regularly use the internet, are inactive in online gaming communities, or play single-player games exclusively may be underrepresented.

- **Voluntary Response Bias**: Participation was voluntary, so respondents with stronger views may have been more likely to take part in the first place. The data also relies on self-reported responses, which are subject to biases such as inaccurate self-assessment and social desirability bias.
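
One rough way to gauge the sample composition described above is to tabulate the share of responses per category of **Reference** and **Playstyle**. The sketch below uses only the five rows displayed earlier as a stand-in, so its numbers say nothing about the full dataset. The name `sample_df` is hypothetical. On the real data, the same `value_counts` calls would be applied to `gamingAnxiety_df`.

```python
import pandas as pd

# Stand-in sample: Reference and Playstyle values from the five rows shown by head().
sample_df = pd.DataFrame({
    "Reference": ["Reddit"] * 5,
    "Playstyle": [
        "Singleplayer",
        "Multiplayer - online - with strangers",
        "Singleplayer",
        "Multiplayer - online - with online acquaintanc...",  # truncated in the display
        "Multiplayer - online - with strangers",
    ],
})

# Proportion of responses in each category (values sum to 1 per column).
reference_share = sample_df["Reference"].value_counts(normalize=True)
playstyle_share = sample_df["Playstyle"].value_counts(normalize=True)

print(reference_share)
print(playstyle_share)
```

A heavily skewed distribution in either column on the full data would support the concern about over-representation.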

**Each row** represents a single survey response from a gamer, and **each column** represents a variable collected in the survey. The dataset contains **13464 observations** and **55 variables** in total. We can verify this, and also inspect each individual variable, using the `info()` method:

In [7]:
gamingAnxiety_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13464 entries, 0 to 13463
Data columns (total 55 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   S. No.           13464 non-null  int64  
 1   Timestamp        13464 non-null  float64
 2   GAD1             13464 non-null  int64  
 3   GAD2             13464 non-null  int64  
 4   GAD3             13464 non-null  int64  
 5   GAD4             13464 non-null  int64  
 6   GAD5             13464 non-null  int64  
 7   GAD6             13464 non-null  int64  
 8   GAD7             13464 non-null  int64  
 9   GADE             12815 non-null  object 
 10  SWL1             13464 non-null  int64  
 11  SWL2             13464 non-null  int64  
 12  SWL3             13464 non-null  int64  
 13  SWL4             13464 non-null  int64  
 14  SWL5             13464 non-null  int64  
 15  Game             13464 non-null  object 
 16  Platform         13464 non-null  object 
 17  Hours       
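
The `info()` output above also hints at missing values: **GADE** has 12815 non-null entries out of 13464, which leaves 13464 - 12815 = 649 missing responses. As a minimal sketch, the dimensions and per-column missing counts can be collected in one helper. The function name `summarize_missing` and the frame `toy_df` are hypothetical, and the toy values are made up purely to exercise the code.

```python
import pandas as pd

def summarize_missing(df):
    """Return (row count, column count, missing counts for columns that have gaps)."""
    n_rows, n_cols = df.shape
    missing = df.isna().sum()
    return n_rows, n_cols, missing[missing > 0]

# Hypothetical toy frame, NOT the real data: one missing GADE response.
toy_df = pd.DataFrame({
    "GAD1": [0, 1, 0],
    "GADE": ["Not difficult at all", None, "Very difficult"],
})

n_rows, n_cols, missing = summarize_missing(toy_df)
print(n_rows, n_cols)
print(missing)
```

Calling `summarize_missing(gamingAnxiety_df)` would apply the same check to the loaded dataset.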

### Variable Descriptions

#### Demographic Information:
- **S. No.:** Serial number of the response.
- **Timestamp:** Time at which the survey was completed.
- **Age:** Age of the respondent.
- **Gender:** Gender of the respondent.
- **Nationality:** Nationality of the respondent.
- **Education:** Education level of the respondent.
- **Employment:** Employment status of the respondent.
- **Income:** Income level of the respondent.
- **Marital_status:** Marital status of the respondent.
- **Children:** Number of children.
- **Birthplace:** Birthplace of the respondent.
- **Residence:** Current residence of the respondent.
- **Reference:** How the respondent found the survey (e.g., Reddit).
- **accept:** Acceptance of the survey.
- **Residence_ISO3:** ISO3 code of the respondent's residence.
- **Birthplace_ISO3:** ISO3 code of the respondent's birthplace.
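
The **Timestamp** values shown earlier (for example, 42052.00437) are fractional numbers rather than readable dates. One plausible reading is that they are spreadsheet-style serial dates counting days from 1899-12-30. This is an assumption and not something the dataset description states. Under that assumption, a minimal sketch of the conversion looks like this:

```python
import pandas as pd

# First five Timestamp values, copied from the head() output.
raw_timestamps = pd.Series([42052.00437, 42052.0068, 42052.0386, 42052.06804, 42052.08948])

# ASSUMPTION: the values are spreadsheet serial dates (days since 1899-12-30).
# The dataset description does not confirm this.
converted = pd.to_datetime(raw_timestamps, unit="D", origin="1899-12-30")

print(converted)
```

If the converted dates fall within a plausible survey period, the assumption is probably sound. Otherwise the column is better left in its raw form.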

#### Psychological Assessment:
- **GAD1 to GAD7:** Responses to Generalized Anxiety Disorder (GAD) questions.
- **GADE:** Perceived difficulty in completing GAD questions.
- **SWL1 to SWL5:** Responses to Satisfaction with Life Scale (SWL) questions.
- **SWLE:** Perceived difficulty in completing SWL questions.
- **SPIN1 to SPIN17:** Responses to Social Phobia Inventory (SPIN) questions.
- **SPINE:** Perceived difficulty in completing SPIN questions.
- **Anxiety_score:** Computed anxiety score based on GAD responses.
- **Satisfaction_with_life_score:** Computed satisfaction with life score based on SWL responses.
- **Social_phobia_score:** Computed social phobia score based on SPIN responses.
- **GAD_T:** Total GAD score.
- **SWL_T:** Total SWL score.
- **SPIN_T:** Total SPIN score.
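
In the five rows displayed earlier, **GAD_T** equals the plain sum of **GAD1** to **GAD7**. The sketch below checks that relationship on those five rows only. Whether it holds for every row, and whether **SWL_T** and **SPIN_T** follow the same pattern, is an assumption that would need to be verified on the full dataset. The names `sample_df`, `gad_items`, and `computed_total` are hypothetical.

```python
import pandas as pd

# GAD1..GAD7 followed by GAD_T, for the five rows displayed by head().
rows = [
    [0, 0, 0, 0, 1, 0, 0, 1],
    [1, 2, 2, 2, 0, 1, 0, 8],
    [0, 2, 2, 0, 0, 3, 1, 8],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [2, 1, 2, 2, 2, 3, 2, 14],
]
gad_items = [f"GAD{i}" for i in range(1, 8)]
sample_df = pd.DataFrame(rows, columns=gad_items + ["GAD_T"])

# Row-wise sum of the seven item responses.
computed_total = sample_df[gad_items].sum(axis=1)

# True only if every computed total equals the stored GAD_T value.
totals_match = bool((computed_total == sample_df["GAD_T"]).all())
print(totals_match)
```

On the loaded data, the same comparison would use `gamingAnxiety_df[gad_items].sum(axis=1)` against `gamingAnxiety_df["GAD_T"]`.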

#### Gaming Habits:
- **Gaming_hours_per_week:** Hours spent gaming per week.
- **Gaming_hours_per_day:** Hours spent gaming per day.
- **Device:** Preferred gaming device (e.g., PC, console).
- **Primary_game:** Most played game.
- **Game_genre:** Genre of the primary game.
- **Primary_platform:** Platform of the primary game.
- **Secondary_game:** Second most played game.
- **Secondary_genre:** Genre of the secondary game.
- **Secondary_platform:** Platform of the secondary game.
- **Tertiary_game:** Third most played game.
- **Tertiary_genre:** Genre of the tertiary game.
- **Tertiary_platform:** Platform of the tertiary game.
- **Playstyle:** Playstyle of the respondent (e.g., Singleplayer, Multiplayer - online - with strangers).
