# Project Prototype: Social Media and Mental Health Impact Analysis

## 1. Project Overview
This project analyzes how social media usage relates to mental health among respondents in Southeast Asia, broken down by demographic factors such as age group, gender, and urban or rural residence. The goal is to identify patterns and relationships between daily social media usage, social anxiety, self-confidence, and other relevant variables.

## 2. Dataset Information
- Source: `South_East_Asia_Social_Media_MentalHealth.csv`
- Key Columns:
  - `Country`: Country of the respondents.
  - `Age Group`: Age categories of the respondents (e.g., 18-24, 25-34).
  - `Gender`: Gender of the respondents (e.g., Male, Female).
  - `Urban/Rural`: Indicates whether the respondent lives in an urban or rural area.
  - `Daily SM Usage (hrs)`: Average hours spent on social media daily.
  - `Most Used SM Platform`: The most frequently used social media platform.
  - `Frequency of SM Use`: Frequency of social media usage (e.g., Daily, Weekly).
  - `Likes Received (per post)`: Average likes received on social media posts.
  - `Comments Received (per post)`: Average comments received on posts.
  - `Shares Received (per post)`: Average shares received on posts.
  - `Peer Comparison Frequency (1-10)`: Frequency of comparing oneself to peers on a scale of 1 to 10.
  - `Social Anxiety Level (1-10)`: Self-reported social anxiety on a scale of 1 to 10.
  - `Socioeconomic Status`: Classification of the respondent's economic status.
  - `Education Level`: Level of education attained by the respondent.
  - `Body Image Impact (1-10)`: Impact of social media on body image, rated from 1 to 10.
  - `Sleep Quality Impact (1-10)`: Impact of social media on sleep quality, rated from 1 to 10.
  - `Self Confidence Impact (1-10)`: Impact of social media on self-confidence, rated from 1 to 10.
  - `Cyberbullying Experience (1-10)`: Experience of cyberbullying rated from 1 to 10.
  - `Anxiety Levels (1-10)`: Self-reported anxiety levels rated from 1 to 10.

## 3. Project Structure
- `data/`: Raw and processed datasets
- `notebooks/`: Jupyter notebooks for analysis
- `scripts/`: Python scripts (including clustering algorithms)
- `visualizations/`: Plots and graphs
- `README.md`


## 4. Functionality Requirements
- Data cleaning and preprocessing to handle missing values and inconsistencies.
- Implement a custom KMeans clustering class (`CustomKMeans`) alongside scikit-learn's `KMeans`.
- Visualize clustering results using scatter plots and other relevant graphs.
- Conduct statistical analysis to summarize the findings and explore relationships.
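
The cleaning requirement could be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the helper name `clean_data`, the choice of median imputation, and the rule that `(1-10)` columns outside that range are treated as missing are all assumptions made for the example.

```python
import pandas as pd

# Column names taken from the dataset description; the selection is illustrative.
NUMERIC_COLS = [
    "Daily SM Usage (hrs)",
    "Peer Comparison Frequency (1-10)",
    "Social Anxiety Level (1-10)",
    "Self Confidence Impact (1-10)",
]


def clean_data(df: pd.DataFrame, numeric_cols=NUMERIC_COLS) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of df with numeric columns imputed."""
    out = df.copy()
    for col in numeric_cols:
        # Turn entries such as "n/a" into NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Assumption: ratings outside the documented 1-10 scale are treated as missing.
        if col.endswith("(1-10)"):
            out[col] = out[col].where(out[col].between(1, 10))
        # Assumption: fill gaps with the column median.
        out[col] = out[col].fillna(out[col].median())
    return out.reset_index(drop=True)
```

Dropping incomplete rows with `out.dropna(subset=numeric_cols)` would be the alternative to imputation; which one is appropriate depends on how much data is missing.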

## 5. Key Classes and Functions
- CustomKMeans Class
  - `__init__(self, k, max_iters=100, tol=1e-4)`
  - `initialize_centroids(self, df)`
  - `assign_clusters(self, df)`
  - `compute_centroids(self, df)`
  - `fit(self, df)`
  - `predict(self, df)`
  - `sse(self, df)`

- Data Processing Functions
  - Functions for cleaning data, handling missing values, and preprocessing data for clustering.

- Visualization Functions
  - Functions to create various plots (e.g., scatter plots, box plots) for visualizing relationships and clusters.
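
One way the `CustomKMeans` methods listed above might fit together is sketched below. It follows the listed signatures, with assumptions where the outline is silent: centroids are initialized from `k` randomly chosen rows, labels are stored on the instance as `labels_`, empty clusters keep their previous centroid, and a `random_state` argument (not part of the listed signature) is added for reproducibility.

```python
import numpy as np
import pandas as pd


class CustomKMeans:
    def __init__(self, k, max_iters=100, tol=1e-4, random_state=None):
        # random_state is an addition for reproducibility, not in the listed signature.
        self.k = k
        self.max_iters = max_iters
        self.tol = tol
        self.random_state = random_state
        self.centroids = None
        self.labels_ = None

    def initialize_centroids(self, df):
        # Assumed strategy: use k distinct rows of the data as starting centroids.
        rng = np.random.default_rng(self.random_state)
        idx = rng.choice(len(df), size=self.k, replace=False)
        return df.to_numpy(dtype=float)[idx]

    def assign_clusters(self, df):
        X = df.to_numpy(dtype=float)
        # Euclidean distance from every row to every centroid: shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def compute_centroids(self, df):
        X = df.to_numpy(dtype=float)
        new_centroids = self.centroids.copy()
        for j in range(self.k):
            members = X[self.labels_ == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        return new_centroids

    def fit(self, df):
        self.centroids = self.initialize_centroids(df)
        for _ in range(self.max_iters):
            self.labels_ = self.assign_clusters(df)
            new_centroids = self.compute_centroids(df)
            # Largest distance any centroid moved in this iteration.
            shift = np.linalg.norm(new_centroids - self.centroids, axis=1).max()
            self.centroids = new_centroids
            if shift < self.tol:
                break
        # Make the stored labels consistent with the final centroids.
        self.labels_ = self.assign_clusters(df)
        return self

    def predict(self, df):
        return self.assign_clusters(df)

    def sse(self, df):
        # Sum of squared distances from each row to its nearest centroid.
        X = df.to_numpy(dtype=float)
        labels = self.assign_clusters(df)
        return float(((X - self.centroids[labels]) ** 2).sum())
```

A typical call would be `model = CustomKMeans(k=3, random_state=0).fit(features)`, where `features` is a numeric DataFrame; `model.sse(features)` can then be compared with the `inertia_` attribute of scikit-learn's `KMeans` fitted on the same data.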

## 6. Clustering Strategy
- Clustering Variables: The following variables will be used for clustering:
  - `Daily SM Usage (hrs)`
  - `Peer Comparison Frequency (1-10)`
  - `Social Anxiety Level (1-10)`
  - `Self Confidence Impact (1-10)`

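- Justification for Using KMeans: KMeans partitions respondents into k groups by minimizing the within-cluster sum of squared distances to each centroid. This suits the four numeric variables above, and the resulting centroids can be read as typical usage and well-being profiles. Because the method relies on Euclidean distance, variables measured on different scales (hours versus 1-10 ratings) may need to be standardized first.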
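
As a rough illustration of this strategy, the sketch below standardizes the four clustering variables and fits scikit-learn's `KMeans` for several values of k, recording the inertia (within-cluster sum of squares) for each. The helper name `inertia_by_k`, the range of k, and the use of `StandardScaler` are assumptions for the example; the outline does not specify how k is chosen or whether features are scaled.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The clustering variables named in this section.
CLUSTER_COLS = [
    "Daily SM Usage (hrs)",
    "Peer Comparison Frequency (1-10)",
    "Social Anxiety Level (1-10)",
    "Self Confidence Impact (1-10)",
]


def inertia_by_k(df: pd.DataFrame, k_values=range(2, 7), random_state=42) -> dict:
    """Hypothetical helper: map each k to the inertia of a fitted KMeans model."""
    # Standardize so hours and 1-10 ratings contribute on a comparable scale.
    X = StandardScaler().fit_transform(df[CLUSTER_COLS])
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        results[k] = float(model.inertia_)
    return results
```

Plotting the returned values against k gives the usual elbow curve, which is one common heuristic for picking the number of clusters.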

## 7. Potential Challenges and Solutions
- Challenge: Handling missing values and inconsistencies in the dataset.
  - Solution: Impute missing values (for example, with the column median) or drop incomplete records, then validate the result by checking value ranges, data types, and category labels.

- Challenge: Interpreting clusters meaningfully and understanding their implications.
  - Solution: Conduct thorough statistical analysis on each cluster and provide context based on demographic data.
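
The per-cluster analysis described above might start from a summary like the one sketched here: mean values and sizes per cluster, plus the share of each demographic category within each cluster. The helper name `profile_clusters` and the `Cluster` column name are hypothetical, and the cluster labels are assumed to come from an already fitted model.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, labels, numeric_cols, demographic_col):
    """Hypothetical helper: summarize clusters numerically and demographically."""
    labeled = df.assign(Cluster=labels)
    # Mean of each numeric variable within each cluster.
    means = labeled.groupby("Cluster")[numeric_cols].mean()
    # Number of respondents per cluster.
    sizes = labeled["Cluster"].value_counts().sort_index().rename("n")
    # Row-normalized crosstab: share of each demographic category per cluster.
    shares = pd.crosstab(
        labeled["Cluster"], labeled[demographic_col], normalize="index"
    )
    return means.join(sizes), shares
```

For instance, `profile_clusters(df, model.labels_, ["Daily SM Usage (hrs)"], "Gender")` would return one table of cluster means and sizes and one table of gender shares per cluster, assuming `model` is a fitted clustering model with a `labels_` attribute.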

## 8. Next Steps
- Complete data preprocessing.
- Implement clustering algorithms.
- Finalize visualizations and statistical analysis.
- Prepare a comprehensive report summarizing findings.

