Data-Analytics-Project

Repository for the project "Analysis of Factors Influencing Mental Health During the Pandemic", done as part of the Data Analytics (UE19CS312) course at PES University.

This project aims to observe the degree to which trust in various authorities, sources of information and methods of coping influence levels of stress in certain populations. We also take into consideration factors such as each country's average expenditure on healthcare over the past decade, along with stringency scores that indicate the level of strictness in a country, and analyse how these factors influence Perceived Stress Scores in different countries. Finally, we perform clustering analysis to find out which countries responded similarly to the pandemic.

Given the problem statements, the data set and the large number of features, several popular classification and regression methods were used. The ordinal nature of the target variable was taken into account as well. The models presented in this project are Ordinal Regression, Decision Trees, AdaBoost, XGBoost, Ridge and Lasso Regression, and Support Vector Regression.

Ordinal Models

Ordinal Regression is performed using a Generalized Linear Model. The model computes the coefficient vector and thresholds for the problem. Given a set of observations denoted by vectors x1, x2, ..., xn, each of length d, and the corresponding values of the dependent variable y1 through yn, each taking one of K ordered levels, the model finds a coefficient vector W and a set of thresholds θ1, θ2, ..., θK-1 such that θ1 < θ2 < ... < θK-1. The model can be formulated according to the expression given below.
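The expression is not reproduced in this text. One common formulation, assumed here rather than confirmed as the one used in the project, is the cumulative logit (proportional odds) model, P(y ≤ j | x) = 1 / (1 + exp(W·x − θj)), with thresholds in increasing order. The sketch below turns a coefficient vector and thresholds into per-class probabilities; the function name and all numeric values are hypothetical.

```python
import math

def ordinal_probabilities(x, w, thetas):
    """Class probabilities under a cumulative logit model.

    x: feature vector, w: coefficient vector, thetas: increasing thresholds.
    Returns len(thetas) + 1 probabilities, one per ordered class.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    # Cumulative probabilities P(y <= j | x), plus 1.0 for the highest class.
    cumulative = [1.0 / (1.0 + math.exp(score - t)) for t in thetas] + [1.0]
    # Differences of consecutive cumulative values give per-class probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs

# Hypothetical values for illustration only (not fitted to the project's data).
probs = ordinal_probabilities(x=[1.0, 2.0], w=[0.5, -0.25], thetas=[-1.0, 0.0, 1.5])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The predicted class is simply the one with the highest probability; the ordering of the classes enters only through the ordered thresholds.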

Classifier Models

The decision tree used in this project employs the CART algorithm. CART builds a decision tree by splitting, at every node, on the feature that gives the largest information gain.
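As a rough illustration of choosing a split by information gain, the sketch below scores every candidate binary split on a tiny, made-up data set. It uses entropy to match the description above; library implementations of CART often default to Gini impurity instead, and the helper names here are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best binary split."""
    parent = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = parent - weighted
            if gain > best[0]:
                best = (gain, f, threshold)
    return best

# Made-up data: two features, binned stress label.
rows = [[1, 5], [2, 3], [3, 8], [4, 1]]
labels = ["low", "low", "high", "high"]
gain, feature, threshold = best_split(rows, labels)
```

Here the first feature separates the two labels perfectly at a threshold of 2, so it is chosen over the second feature.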

AdaBoost is an ensemble machine learning technique. Multiple weak learners are trained sequentially on the whole data set, and their combined output is used to classify new instances. Instances misclassified by one learner are given larger weights in the next iteration, so that subsequent learners focus on classifying those points correctly.
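The re-weighting step can be sketched for a single round as follows. This is a minimal version of the classic binary AdaBoost update, assuming labels of +1 and -1, weights that sum to 1 and a weak learner whose weighted error lies strictly between 0 and 1; the function name and data are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One round of binary AdaBoost re-weighting (labels are +1 / -1)."""
    # Weighted error of the current weak learner (assumed to be in (0, 1)).
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # The learner's say in the final vote grows as its error shrinks.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # t * p is -1 for a misclassified point, so its weight is scaled up.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, 1, -1, 1]  # the third instance is misclassified
weights = [0.2] * 5
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the misclassified instance carries half of the total weight, while each correctly classified instance drops from 0.2 to 0.125.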

XGBoost, an ensemble learning algorithm based on gradient-boosted decision trees, was employed to achieve greater accuracy.
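The sketch below is not XGBoost itself. It shows plain gradient boosting with one-split regression stumps under squared error, which is the basic idea XGBoost builds on, and leaves out XGBoost's regularization and second-order terms. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean prediction
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Made-up one-feature data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
mse_before = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_after = mse(ys, boost(xs, ys))
```

Each round corrects part of the error left by the previous rounds, so the training error after boosting is lower than that of the constant mean prediction.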

Regression Models

Ridge and Lasso regression are regularized linear models that penalize large coefficients, so that relevant features retain more weight. Lasso can additionally shrink the coefficients of irrelevant features to exactly zero, which amounts to automatic feature selection. Here the target variable, "Perceived Stress", is not binned, so the task is treated as a regression problem.

Support Vector Regression applies the concept of support vector machines to regression problems. The algorithm tries to find a hyperplane that fits the data instances to within a tolerance margin.
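The difference between the two penalties is easiest to see in the special case of orthonormal features, where both have closed-form solutions in terms of the ordinary least-squares coefficient. The sketch below assumes that special case, with the objectives ½‖y − Xβ‖² + (λ/2)‖β‖² for Ridge and ½‖y − Xβ‖² + λ‖β‖₁ for Lasso. It also includes the epsilon-insensitive loss commonly used by Support Vector Regression. Function names and numbers are hypothetical.

```python
import math

def ridge_coefficient(ols_coef, lam):
    """Ridge shrinks every coefficient proportionally, never exactly to zero."""
    return ols_coef / (1.0 + lam)

def lasso_coefficient(ols_coef, lam):
    """Lasso soft-thresholds: coefficients smaller than lam become zero."""
    return math.copysign(max(abs(ols_coef) - lam, 0.0), ols_coef)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """SVR-style loss: errors within epsilon of the target cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Made-up least-squares coefficients for three features.
ols = [3.0, -0.4, 0.1]
lam = 0.5
ridge = [ridge_coefficient(b, lam) for b in ols]
lasso = [lasso_coefficient(b, lam) for b in ols]
```

With these values Ridge keeps all three features with smaller coefficients, while Lasso keeps only the first and sets the other two to zero.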

Clustering Analysis

We performed clustering analysis in order to answer the question: "Which countries are most similar in terms of trust in media and government?"
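The clustering method is not named here, so the sketch below assumes k-means purely for illustration. The country names and trust scores are placeholders, not values from the study.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return [nearest(p, centroids) for p in points], centroids

# Placeholder data: (trust in media, trust in government) per country.
trust = {
    "Country A": (0.80, 0.70),
    "Country B": (0.75, 0.80),
    "Country C": (0.20, 0.30),
    "Country D": (0.25, 0.20),
}
points = list(trust.values())
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
groups = dict(zip(trust, labels))
```

Countries that end up with the same label in `groups` are the ones the algorithm considers most similar on the two trust measures.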

Full results of this study can be found in the corresponding paper.

Project by: Anusha, Chakita M, Parimala S and Prajwal

Files in this repository:

- Adaboost+XGBoost.ipynb
- Additional_Inferences(Peoples'_Confidence_vs_Gov's_Measures).ipynb
- Data_Analytics_Models.ipynb
- DecisionTrees_DA.ipynb
- EDA_clustering.ipynb
- README.md
