# Project: Analysis of Meditation's Impact on Stress Reduction

In this project, we explore the impact of meditation practices on reducing stress levels using real scientific data from an open-source resource. The main stages of work include:

1. **Data Preparation** – cleaning, quality assessment, and merging of data files.
2. **Analysis and Visualization** – examining relationships between meditation practices and stress indicators.
3. **Insights** – evaluating the effectiveness of meditation and creating recommendations.

We'll start by working with each file individually to assess data quality and clean the content as needed. This project demonstrates data processing skills from initial cleaning in Python and SQL to final visualization in Tableau.



## Data Files Overview

1. **Code_Book.csv** – Provides detailed descriptions for each data field, including data types and format examples.

2. **Study_Groups_Category.csv** – Describes participant groups, covering demographic information, initial diagnoses, and settings of each study.

3. **Intervention.csv** – Includes details on meditation interventions, such as type, program name, goals, practice format, duration, and medical needs.

4. **Outcomes.csv** – Lists outcomes related to stress and other psychological indicators, including outcome categories, subcategories, and associated thresholds for interpreting scores.

5. **Effect_Size_Data.csv** – Contains data on the effectiveness of various interventions, with metrics on effect size, treatment and control means, standard deviations, and confidence intervals.

Each file will be individually assessed and cleaned before merging to ensure data quality for the analysis.
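As a minimal sketch of what a per-file quality assessment could look like, the helper below summarizes data types, missing values, and distinct values for each column. The function name `summarize_quality` and the toy data are hypothetical and stand in for the real files.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


# Toy frame standing in for one of the project files (values are made up).
toy = pd.DataFrame({
    "study_id": [1, 2, 2, 4],
    "s_country": ["US", "DE", "DE", None],
})

report = summarize_quality(toy)
n_duplicate_rows = int(toy.duplicated().sum())  # fully identical rows
print(report)
print("duplicate rows:", n_duplicate_rows)
```

Running the same summary on every file before merging makes it easier to spot columns that need cleaning and to compare quality across files.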

### 1. Code_Book.csv

The `Code_Book.csv` file is metadata: a reference guide that describes each field across the datasets. It includes the following columns:

- **Field Name** – The name of each data field, allowing easy identification of variables across files.
- **Description** – A detailed description of each field, explaining its purpose and the type of information it represents.
- **Type** – Specifies the data type (e.g., numeric, string), which is essential for ensuring proper handling during data processing.
- **Format** – Indicates the expected format or structure of the data (e.g., number format, date format), helping to validate data entries.
- **Example** – Provides an example of each field’s typical data, giving a clearer understanding of expected values.

This code book is essential for understanding the data structure and for consistent data handling during cleaning and analysis.
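One way the code book could support consistent handling is a quick comparison between the fields it documents and the columns actually present in a data file. The sketch below assumes the code book's header is literally `Field Name`, which may differ in the real file; the function name and toy frames are hypothetical.

```python
import pandas as pd


def compare_with_code_book(code_book: pd.DataFrame, data: pd.DataFrame):
    """Return (columns missing from the code book, documented fields absent from the data)."""
    # Assumption: the code book stores field names in a column called "Field Name".
    documented = set(code_book["Field Name"].astype(str).str.strip())
    present = set(data.columns)
    return sorted(present - documented), sorted(documented - present)


# Toy stand-ins for Code_Book.csv and one data file.
toy_code_book = pd.DataFrame({
    "Field Name": ["report_id", "study_id", "s_country"],
    "Type": ["numeric", "numeric", "string"],
})
toy_data = pd.DataFrame(columns=["report_id", "study_id", "s_setting"])

undocumented, absent = compare_with_code_book(toy_code_book, toy_data)
print("in data but not documented:", undocumented)
print("documented but not in data:", absent)
```

Because the code book covers all datasets, the second list will normally contain fields that belong to other files; the first list is the one that signals an undocumented or misspelled column.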

### 2. Study_Groups_Category.csv

The `Study_Groups_Category.csv` file provides demographic and contextual information about each study group involved in the research. Key columns include:

- **report_id** – Unique identifier for each report, allowing for data linkage across files.
- **study_id** – Unique identifier for each study within the reports.
- **s_country** – Country where the study was conducted, helping to assess the geographical scope of the data.
- **s_setting** – Describes the study setting (e.g., clinical, community), providing context on the environment where data was collected.
- **s_start** – Start date of the study, indicating when data collection began.
- **s_end** – End date of the study; together with `s_start`, it defines the study’s overall duration.
- **s_duration** – Numeric duration of the study, giving a quick understanding of the study period.
- **s_n** – Number of participants in each group, essential for understanding the scale of each study.
- **s_diag** – General code for participants' diagnoses, which classifies medical conditions.
- **s_diag_spec** – Detailed description of diagnoses, offering a more precise medical context for the group.
- **diag_category** – Diagnosis category, providing an additional generalized classification, useful for analyzing large groups.
- **s_female** – Percentage of females in the group, supporting demographic analysis by gender.
- **s_age_mt** – Mean age of the treatment group, useful for characterizing the age profile of that group.
- **s_age_mc** – Mean age of the control group, allowing for comparison of age characteristics.

Together, these columns describe each study group's setting, demographics, and diagnoses. This context makes it possible to judge how diverse the study groups are and to analyze outcomes by participant background.
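The column descriptions above suggest a few simple sanity checks. The sketch below flags rows where the end date precedes the start date, the female percentage falls outside 0 to 100, or the participant count is not positive. It assumes `s_start` and `s_end` hold parseable dates and `s_female` is on a 0 to 100 scale; the function name and toy values are hypothetical.

```python
import pandas as pd


def flag_study_group_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame with one column per check; True marks a suspicious row."""
    # Unparseable dates become NaT and are not flagged by the comparison below.
    start = pd.to_datetime(df["s_start"], errors="coerce")
    end = pd.to_datetime(df["s_end"], errors="coerce")
    return pd.DataFrame({
        "end_before_start": end < start,
        "female_pct_out_of_range": df["s_female"].notna() & ~df["s_female"].between(0, 100),
        "non_positive_n": df["s_n"] <= 0,
    }, index=df.index)


# Toy rows: the first is plausible, the second violates every check.
toy_groups = pd.DataFrame({
    "s_start": ["2015-01-01", "2016-06-01"],
    "s_end": ["2015-03-01", "2016-05-01"],
    "s_female": [55.0, 120.0],
    "s_n": [40, 0],
})

flags = flag_study_group_issues(toy_groups)
print(flags)
```

Flagged rows are candidates for manual review rather than automatic removal, since an unusual value may still be correct.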

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import os

In [15]:
# Load the study-group file; on_bad_lines='skip' silently drops malformed rows.
study_df = pd.read_csv(r'C:\Users\User\OneDrive\Documents\NEW_JOB\Pet Projects\Meditation_Stress_Project\Cleaned data\Study_Groups_Category.csv', encoding='utf-8-sig', delimiter=',', on_bad_lines='skip')
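Because `on_bad_lines='skip'` discards malformed rows without reporting them, it can be useful to know how many rows were dropped. The sketch below is one possible approach, not the loading code used in this project: it passes a callable to `on_bad_lines`, which pandas supports with `engine="python"` (pandas 1.4 or later). The helper name and the in-memory sample are hypothetical.

```python
import io

import pandas as pd


def read_csv_logging_bad_lines(source, **kwargs):
    """Read a CSV, skipping malformed rows but collecting them for later review."""
    bad_rows = []

    def remember(bad_line):
        # pandas passes the offending row as a list of strings.
        bad_rows.append(bad_line)
        return None  # returning None tells pandas to skip the row

    df = pd.read_csv(source, engine="python", on_bad_lines=remember, **kwargs)
    return df, bad_rows


# In-memory sample: the second data row has one field too many.
sample = io.StringIO(
    "report_id,study_id,s_n\n"
    "1,1,40\n"
    "2,1,35,EXTRA\n"
    "3,2,50\n"
)

demo_df, demo_bad = read_csv_logging_bad_lines(sample)
print("rows kept:", len(demo_df))
print("rows skipped:", len(demo_bad))
```

The Python parsing engine is slower than the default C engine, so this trade-off is mainly worthwhile for files small enough that an audit trail matters more than speed.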