**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback**  

Update or change the relevant sections where you lost those points, and make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Bryce Beeson
- Minghao Xian
- Zeying Du
- Bryant Quijada
- Yasamin Mazaheri


# Research Question

In 2022, how does the effectiveness of telemedicine in providing mental health care and therapy compare to traditional in-person sessions?
- The operational definition of “the effectiveness of telemedicine in providing mental health care and therapy” is patients' satisfaction scores on surveys.


## Background and Prior Work

The landscape of mental health care has undergone significant transformations, particularly with the increasing acceptance of therapy and the burgeoning adoption of telehealth, intensified by the COVID-19 pandemic. This shift has prompted several studies to scrutinize the comparative effectiveness of virtual therapy sessions against traditional in-person interventions.

"In‐person versus virtual therapy in outpatient eating‐disorder treatment: A COVID‐19 inspired study" investigated the outcomes of individuals undergoing in-person and virtual therapy during the COVID-19 outbreak. Results indicated similar improvements in eating disorder symptoms, weight gain, and satisfaction with services in both groups<a name="cite_ref-5"></a>[<sup>5</sup>](#cite_note-5).

Lee et al. explored the patient experience of telehealth services in a mental health setting during the COVID-19 pandemic, employing a cross-sectional survey conducted in an outpatient psychiatric clinic<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2). Their findings contribute valuable insights into how patients perceive telehealth services in mental health care.

Comparing telehealth and face-to-face psychotherapy for less common mental health conditions, Greenwood et al. aimed to assess differences in effectiveness between the two treatment modalities<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1). This study sheds light on the nuances of treatment outcomes for less common mental health conditions.

Polinski et al. delved into patient satisfaction and preferences with telehealth visits through a cross-sectional survey involving 1,734 patients<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3). The results underscore high patient satisfaction, with 94-99% reporting being "very satisfied" with various aspects of telehealth. Predictors of satisfaction included gender, overall understanding, quality of care, and convenience of telehealth.

Kaur et al. explored patient satisfaction rates during the COVID-19 era, emphasizing the importance of communication methods in telemedicine<a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4). The study found that satisfaction rates were higher when video calls were used for teleconsultations, signaling the significance of the mode of interaction in influencing patient satisfaction.

Collectively, these studies suggest that telemedicine has the potential to be as effective as traditional in-person sessions for mental health care. The high satisfaction rates reported by patients further highlight the convenience and perceived quality of care associated with telehealth, contributing to improved accessibility to mental health services.

Citations:

1. <a name="cite_note-1"></a> [^](#cite_ref-1) Greenwood, Hannah, et al. “Telehealth versus Face-to-Face Psychotherapy for Less Common Mental Health Conditions: Systematic Review and Meta-Analysis of Randomized Controlled Trials.” JMIR Mental Health, U.S. National Library of Medicine, 11 Mar. 2022, www.ncbi.nlm.nih.gov/pmc/articles/PMC8956990/.
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Lee, Heeyoung, et al. “Patient Experience with Telehealth Service in a Mental Health Setting.” Archives of Psychiatric Nursing, U.S. National Library of Medicine, Apr. 2023, www.ncbi.nlm.nih.gov/pmc/articles/PMC10008765/.
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Polinski, Jennifer M., et al. “Patients’ Satisfaction with and Preference for Telehealth Visits.” Journal of General Internal Medicine : JGIM, vol. 31, no. 3, 2016, pp. 269–75, https://doi.org/10.1007/s11606-015-3489-x.
4. <a name="cite_note-4"></a> [^](#cite_ref-4) Kaur, Karuna Nidhi, et al. “Patient Satisfaction for Telemedicine Health Services in the Era of COVID-19 Pandemic: A Systematic Review.” Frontiers in Public Health, vol. 10, 2022, pp. 1031867–1031867, https://doi.org/10.3389/fpubh.2022.1031867.
5. <a name="cite_note-5"></a> [^](#cite_ref-5) Steiger, Howard, et al. “In‐person Versus Virtual Therapy in Outpatient Eating‐disorder Treatment: A COVID‐19 Inspired Study.” International Journal of Eating Disorders, vol. 55, no. 1, 2022, pp. 145–50, https://doi.org/10.1002/eat.23655.

# Hypothesis


- In 2022, patients who received their mental health care or therapy in person rated their satisfaction with that care differently from patients who received their care remotely.
- Our null hypothesis: patients who received mental health care or therapy in person and patients who received it remotely rated its effectiveness, as measured by their satisfaction with the care they received, equally.
- We expect this outcome because receiving mental health care remotely and receiving it in person are two different experiences. We think that difference in experience will lead to differences in the perceived effectiveness of the mental health care or therapy, as measured by patients' satisfaction with the care they received.
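One way the hypothesis above could be tested is a two-sided permutation test on the difference in mean satisfaction between the two groups: shuffling the in-person/remote labels simulates the null hypothesis that both groups rate their care equally. This is a minimal sketch, assuming patient-level satisfaction ratings were available for each group (the N-SUMHSS tables in the Data section are facility-level, so this is hypothetical). The function `permutation_test`, the 1-5 rating scale, and the ratings below are illustrative assumptions, not project data.

```python
import numpy as np


def permutation_test(in_person, remote, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean satisfaction.

    Returns (observed_difference, p_value). Under H0 the modality labels
    are exchangeable, i.e., both groups rate their satisfaction equally.
    """
    in_person = np.asarray(in_person, dtype=float)
    remote = np.asarray(remote, dtype=float)
    observed = in_person.mean() - remote.mean()

    pooled = np.concatenate([in_person, remote])
    n_in_person = len(in_person)
    rng = np.random.default_rng(seed)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign ratings to the two groups, keeping group sizes fixed
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_in_person].mean() - shuffled[n_in_person:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 correction keeps the p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy ratings on a hypothetical 1-5 scale (illustrative only, not project data)
in_person_ratings = [5, 4, 4, 5, 3, 4, 5, 4]
remote_ratings = [4, 4, 3, 5, 4, 3, 4, 4]
diff, p = permutation_test(in_person_ratings, remote_ratings)
print(f"mean difference = {diff:.3f}, p = {p:.3f}")
```

A permutation test is used here because it makes no normality assumption about ordinal satisfaction scores; a Welch's t-test or a Mann-Whitney U test would be reasonable alternatives.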


# Data

- The Center for Behavioral Health Statistics and Quality (CBHSQ), Mental Health Client-Level Data (MH-CLD), and National Mental Health Services Survey (N-MHSS) are all reliable sources of data on clinical mental health services, facilities, and treatments in the United States. Because the datasets from these collections cover wide time periods and are based on large national and state-level samples, we think they are sufficient for our project.
https://www.samhsa.gov/data/data-we-collect/n-sumhss-national-substance-use-and-mental-health-services-survey

## Data Overview

## SU63a OPIOIDS - Frequency of different types of health care facilities for opioid treatment

- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63a-opioids.csv
- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63a-cont-opioids.csv
- 54 observations
- 14 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Telemedicine/telehealth’ variable is the most important variable in this table. The datatype in this dataset is integer count (number of facilities). Because it counts facilities rather than patients, this metric is only a rough proxy for how often people use telemedicine/telehealth to receive mental health care or therapy. This data needs to be truncated to include only the data we care about.

In [40]:
import pandas as pd

opioids_filepath = "csv-corrected/SU63a-opioids.csv"
cont_opioids_filepath = "csv-corrected/SU63a-cont-opioids.csv"

# Read the datasets into pandas DataFrames
opioids_data = pd.read_csv(opioids_filepath)
cont_opioids_data = pd.read_csv(cont_opioids_filepath)

# Set the display options to show all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Display the entire table for each dataset
print("Opioids Dataset:")
print(opioids_data)

print("\nCont Opioids Dataset:")
print(cont_opioids_data)

Opioids Dataset:
   State or jurisdiction1   Total2  Substance use counseling  \
0                   Total   14.854                    13.288   
1                 Alabama  143.000                   130.000   
2                  Alaska   83.000                    74.000   
3                 Arizona  441.000                   374.000   
4                Arkansas  138.000                    94.000   
5              California    1.525                     1.368   
6                Colorado  323.000                   254.000   
7             Connecticut  174.000                   161.000   
8                Delaware   51.000                    43.000   
9    District of Columbia   28.000                    25.000   
10                Florida  681.000                   610.000   
11                Georgia  309.000                   255.000   
12                 Hawaii  124.000                   113.000   
13                  Idaho   97.000                    93.000   
14               Illino
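In the output above, the Total and California rows of `Total2` read 14.854 and 1.525, while Opioids Dataset B below lists 14854 and 1525 facilities for the same rows. This suggests the CSV uses a period as a thousands separator, which pandas parses as a decimal point. A minimal sketch of one workaround, assuming every column except the state column holds whole-number counts and that smaller counts are stored without a decimal part (e.g., `143`): read the columns as text, strip `.` and `,`, then convert to integers. The helper `read_count_table` is hypothetical.

```python
import io

import pandas as pd


def read_count_table(source, id_col="State or jurisdiction1"):
    """Read a count table whose counts may contain '.' or ',' thousands separators.

    Assumes every column except `id_col` holds whole-number counts.
    """
    # Read everything as text so "14.854" is not parsed as the float 14.854
    df = pd.read_csv(source, dtype=str)
    for col in df.columns:
        if col == id_col:
            continue
        digits_only = df[col].str.replace(r"[.,]", "", regex=True)
        # Anything still non-numeric (e.g., a suppression marker) becomes <NA>
        df[col] = pd.to_numeric(digits_only, errors="coerce").astype("Int64")
    return df


# Tiny sample built from rows and values shown in the output above
sample_csv = io.StringIO(
    "State or jurisdiction1,Total2,Substance use counseling\n"
    "Total,14.854,13.288\n"
    "Alabama,143,130\n"
    "California,1.525,1.368\n"
)
fixed = read_count_table(sample_csv)
print(fixed)
```

In the notebook this would replace the plain `pd.read_csv` call, e.g. `opioids_data = read_count_table(opioids_filepath)`, after confirming in the raw CSV that counts really are stored this way.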

## SU63b OPIOIDS - Percentage of different types of health care facilities for opioid treatment

- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63b-opioids.csv
- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63b-cont-opioids.csv
- 54 observations
- 14 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Telemedicine/telehealth’ variable is the most important variable in this table. The datatype in this dataset is percent distribution (the percentage of facilities in each state or jurisdiction). Because it describes facilities rather than patients, this metric is only a rough proxy for how often people use telemedicine/telehealth to receive mental health care or therapy. This data needs to be truncated to include only the data we care about.

In [47]:
import pandas as pd

# Define the local file paths for the datasets
opioids_url_b = "csv-corrected/SU63b-opioids-revised.csv"
cont_opioids_url_b = "csv-corrected/SU63b-opioids-cont-revised.csv"

# Read the datasets into pandas DataFrames
opioids_data_b = pd.read_csv(opioids_url_b)
cont_opioids_data_b = pd.read_csv(cont_opioids_url_b)

# Display the full contents of each dataset
print("Opioids Dataset B:")
print(opioids_data_b)

print("\nCont Opioids Dataset B:")
print(cont_opioids_data_b)

Opioids Dataset B:
   State or jurisdiction1  Number of facilities  Substance use counseling  \
0                   Total                 14854                      89.5   
1                 Alabama                   143                      90.9   
2                  Alaska                    83                      89.2   
3                 Arizona                   441                      84.8   
4                Arkansas                   138                      68.1   
5              California                  1525                      89.7   
6                Colorado                   323                      78.6   
7             Connecticut                   174                      92.5   
8                Delaware                    51                      84.3   
9    District of Columbia                    28                      89.3   
10                Florida                   681                      89.6   
11                Georgia                   309          
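The percentages in Dataset B appear to equal 100 × count / number of facilities, using the counts from the SU63a opioids table. For example, Alabama has 130 of 143 facilities offering substance use counseling, and 100 × 130 / 143 ≈ 90.9, matching the 90.9 shown above. The short sketch below checks this for the rows visible in both outputs, assuming the Dataset A values printed as 13.288 and 1.368 are really 13288 and 1368 (a thousands separator misread as a decimal point). The helper `percent_of_facilities` is hypothetical. If the relationship holds, either table can be derived from the other.

```python
import pandas as pd


def percent_of_facilities(counts, totals, decimals=1):
    """Express facility counts as a percentage of all facilities, rounded to `decimals` places."""
    return (pd.Series(counts, dtype=float) / pd.Series(totals, dtype=float) * 100).round(decimals)


# Values copied from the SU63a (counts) and SU63b (percent) opioid outputs
states = ["Total", "Alabama", "Alaska", "Arizona", "Arkansas", "California"]
counseling_counts = [13288, 130, 74, 374, 94, 1368]
facility_totals = [14854, 143, 83, 441, 138, 1525]
reported_percent = [89.5, 90.9, 89.2, 84.8, 68.1, 89.7]

computed = percent_of_facilities(counseling_counts, facility_totals)
check = pd.DataFrame({"state": states, "computed": computed, "reported": reported_percent})
# Allow for rounding to one decimal place in the published table
check["matches"] = (check["computed"] - check["reported"]).abs() < 0.05
print(check)
```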

## SU63a OTHER DRUGS - Frequency of different types of health care facilities for other drug treatment

- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63a-other.csv
- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63a-cont-other.csv
- 54 observations
- 14 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Telemedicine/telehealth’ variable is the most important variable in this table. The datatype in this dataset is integer count (number of facilities). Because it counts facilities rather than patients, this metric is only a rough proxy for how often people use telemedicine/telehealth to receive mental health care or therapy. This data needs to be truncated to include only the data we care about.

In [None]:
import pandas as pd

# Define the URLs for the datasets
other_drugs_url_a = "https://github.com/COGS108/Group019_FA23/raw/master/csv/SU63a-other.csv"
cont_other_drugs_url_a = "https://github.com/COGS108/Group019_FA23/raw/master/csv/SU63a-cont-other.csv"

# Read the datasets into pandas DataFrames
other_drugs_data_a = pd.read_csv(other_drugs_url_a)
cont_other_drugs_data_a = pd.read_csv(cont_other_drugs_url_a)

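# Display basic information (columns, dtypes, non-null counts) about the datasets
# DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print()
print("Other Drugs Dataset A:")
other_drugs_data_a.info()

print("\nCont Other Drugs Dataset A:")
cont_other_drugs_data_a.info()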

## SU63b OTHER DRUGS - Percentage of different types of health care facilities for other drug treatment

- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63b-other.csv
- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU63b-cont-other.csv
- 54 observations
- 14 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Telemedicine/telehealth’ variable is the most important variable in this table. The datatype in this dataset is percent distribution (the percentage of facilities in each state or jurisdiction). Because it describes facilities rather than patients, this metric is only a rough proxy for how often people use telemedicine/telehealth to receive mental health care or therapy. This data needs to be truncated to include only the data we care about.

In [None]:
import pandas as pd

# Define the URLs for the datasets
other_drugs_url_b = "https://github.com/COGS108/Group019_FA23/raw/master/csv/SU63b-other.csv"
cont_other_drugs_url_b = "https://github.com/COGS108/Group019_FA23/raw/master/csv/SU63b-cont-other.csv"

# Read the datasets into pandas DataFrames
other_drugs_data_b = pd.read_csv(other_drugs_url_b)
cont_other_drugs_data_b = pd.read_csv(cont_other_drugs_url_b)

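# Display basic information (columns, dtypes, non-null counts) about the datasets
# DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print()
print("Other Drugs Dataset B:")
other_drugs_data_b.info()

print("\nCont Other Drugs Dataset B:")
cont_other_drugs_data_b.info()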
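Each SU63 table is split across a main CSV and a `-cont` CSV, and each description notes that the data needs to be truncated to the columns we care about. A minimal sketch of that step, assuming the continuation file holds the remaining columns for the same rows, that both halves share the `State or jurisdiction1` column shown in the outputs above, and that the telehealth column is literally named `Telemedicine/telehealth` (check the real headers first). The helper `combine_and_truncate` and the toy frames are hypothetical, with made-up numbers.

```python
import pandas as pd


def combine_and_truncate(main, cont, id_col="State or jurisdiction1",
                         keep_cols=("Telemedicine/telehealth",)):
    """Join a table with its continuation file, then keep only state rows and selected columns."""
    # validate="one_to_one" raises if a state appears more than once in either half
    combined = main.merge(cont, on=id_col, how="inner", validate="one_to_one")
    # Drop the national "Total" row so only states/jurisdictions remain
    states_only = combined[combined[id_col] != "Total"]
    return states_only[[id_col, *keep_cols]].reset_index(drop=True)


# Toy frames shaped like the SU63 outputs (made-up numbers, illustrative only)
main = pd.DataFrame({
    "State or jurisdiction1": ["Total", "Alabama", "Alaska"],
    "Substance use counseling": [300, 20, 10],
})
cont = pd.DataFrame({
    "State or jurisdiction1": ["Total", "Alabama", "Alaska"],
    "Telemedicine/telehealth": [150, 12, 6],
})
result = combine_and_truncate(main, cont)
print(result)
```

With the real files this would be called as, for example, `combine_and_truncate(other_drugs_data_b, cont_other_drugs_data_b)`; an inner join is used so any state missing from one half shows up as a dropped row rather than silently producing NaN values.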

## MH21a - Mental health facilities’ client satisfaction survey by facility type with numbers

- https://github.com/COGS108/Group019_FA23/blob/master/csv/MH21a.csv
- 16 observations
- 8 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Periodic client satisfaction surveys’ variable is the most important variable in this table. The datatype in this dataset is integer count. Because it counts facilities that conduct periodic client satisfaction surveys rather than reporting the survey results, this metric is only an indirect proxy for patients’ own ratings of the satisfaction and effectiveness of the care they received. This data needs to be truncated to include only the data we care about.

In [35]:
import pandas as pd

# Define the local file path for the dataset
mh21a_url = "csv-corrected/MH21a.csv"

# Read the dataset into a pandas DataFrame
mh21a_data = pd.read_csv(mh21a_url)

# Display the full contents of the dataset
print("MH21a Dataset:")
print(mh21a_data)

MH21a Dataset:
                                       Facility type  Total  \
0                                              Total   9586   
1                      Psychiatric hospitals (Total)    448   
2                     Psychiatric hospitals (Public)    119   
3                    Psychiatric hospitals (Private)    329   
4                                  General hospitals    618   
5                                    State hospitals     33   
6                                  RTCs for children    448   
7                                    RTCs for adults    699   
8    Other types of residential treatment facilities     75   
9                   Veterans Affairs medical centers    393   
10                   Community mental health centers   1835   
11     Certified community behavioral health clinics    481   
12  Partial hospitalization/day treatment facilities    334   
13               Outpatient mental health facilities   3516   
14            Multi-setting mental healt

## MH21b - Mental health facilities’ client satisfaction survey by facility type with percentage


- https://github.com/COGS108/Group019_FA23/blob/master/csv/MH21b.csv
- 16 observations
- 8 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Periodic client satisfaction surveys’ variable is the most important variable in this table. The datatype in this dataset is percent distribution. Because it reports the percentage of facilities that conduct periodic client satisfaction surveys rather than the survey results, this metric is only an indirect proxy for patients’ own ratings of the satisfaction and effectiveness of the care they received. This data needs to be truncated to include only the data we care about.

In [36]:
import pandas as pd

# Define the local file path for the dataset
mh21b_url = "csv-corrected/MH21b.csv"

# Read the dataset into a pandas DataFrame
mh21b_data = pd.read_csv(mh21b_url)

# Display the full contents of the dataset
print("MH21b Dataset:")
print(mh21b_data)

MH21b Dataset:
                                       Facility type  \
0                                              Total   
1                      Psychiatric hospitals (Total)   
2                      Psychiatric hospitals (Public   
3                    Psychiatric hospitals (Private)   
4                                  General hospitals   
5                                    State hospitals   
6                                  RTCs for children   
7                                    RTCs for adults   
8    Other types of residential treatment facilities   
9                   Veterans Affairs medical centers   
10                   Community mental health centers   
11     Certified community behavioral health clinics   
12  Partial hospitalization/day treatment facilities   
13               Outpatient mental health facilities   
14            Multi-setting mental health facilities   
15                                             Other   

    Total number of facilities  
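The MH21b output lists "Psychiatric hospitals (Public" without its closing parenthesis, while MH21a lists "Psychiatric hospitals (Public)", so joining the two tables on `Facility type` would likely drop that row. A minimal sketch of one fix, assuming the only label problems are stray whitespace and a missing trailing `)`. The helper `normalize_facility_type` is hypothetical.

```python
import pandas as pd


def normalize_facility_type(labels: pd.Series) -> pd.Series:
    """Trim whitespace and close a '(' that is missing its matching ')'."""
    cleaned = labels.str.strip()
    unbalanced = cleaned.str.count(r"\(") > cleaned.str.count(r"\)")
    # Keep balanced labels as-is; append ")" to the unbalanced ones
    return cleaned.where(~unbalanced, cleaned + ")")


labels = pd.Series(["Psychiatric hospitals (Public", "Psychiatric hospitals (Private)", "Total"])
print(normalize_facility_type(labels).tolist())
```

With the tables loaded as in the cells above, comparing `set(normalize_facility_type(mh21a_data["Facility type"]))` with `set(normalize_facility_type(mh21b_data["Facility type"]))` should confirm that both files label the same facility types before they are merged.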

## SU19 - Substance use facilities by frequently used methods and facility type

- https://github.com/COGS108/Group019_FA23/blob/master/csv/SU19.csv
- 15 observations
- 6 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Periodic client satisfaction surveys’ observation is the most important observation in this table. The datatype in this dataset is integer count. This metric tells us how many facilities use each method, including how many deliver care via telemedicine.

In [None]:
import pandas as pd

# Define the URL for the dataset
su19_url = "https://github.com/COGS108/Group019_FA23/raw/master/csv/SU19.csv"

# Read the dataset into a pandas DataFrame
su19_data = pd.read_csv(su19_url)

# Display basic information about the dataset (info() prints directly and returns None)
print("SU19 Dataset:")
su19_data.info()

## MH14a - Mental health facilities that offer different treatment modalities

- https://github.com/COGS108/Group019_FA23/blob/master/csv/MH14a.csv
- https://github.com/COGS108/Group019_FA23/blob/master/csv/MH14a-cont.csv
- 16 observations
- 8 variables
- Description: The dataset comes from the National Substance Use and Mental Health Services Survey (N-SUMHSS) 2022: Annual Detailed Tables. The ‘Periodic client satisfaction surveys’ variable is the most important variable in the dataset. The datatype in this dataset is integer count. Because it counts facilities that conduct periodic client satisfaction surveys rather than reporting the survey results, this metric is only an indirect proxy for patients’ own ratings of the satisfaction and effectiveness of the care they received. This data needs to be truncated to include only the data we care about.

In [None]:
import pandas as pd

# Define the URLs for the datasets
mh14a_url = "https://github.com/COGS108/Group019_FA23/raw/master/csv/MH14a.csv"
cont_mh14a_url = "https://github.com/COGS108/Group019_FA23/raw/master/csv/MH14a-cont.csv"

# Read the datasets into pandas DataFrames
mh14a_data = pd.read_csv(mh14a_url)
cont_mh14a_data = pd.read_csv(cont_mh14a_url)

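# Display basic information (columns, dtypes, non-null counts) about the datasets
# DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print()
print("MH14a Dataset:")
mh14a_data.info()

print("\nCont MH14a Dataset:")
cont_mh14a_data.info()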

# Ethics & Privacy

- One privacy issue that could arise with the data we proposed is the exposure of private conversations between patients and their therapists. Privacy should not be a major concern, however, because practicing therapists who provide such data must adhere to HIPAA, and the N-SUMHSS tables we use report only aggregated, facility-level counts and percentages.
- One source of bias is patients’ own preferences. When given the choice, some patients may prefer to receive their care one way or the other, which may lead them to rate their preferred method of care as more effective. Other potential biases in our dataset are the demographics of the population being sampled and the fact that we do not know how the data in the dataset was created or collected. Too often, WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations are sampled, which can limit how well our findings generalize if the data is heavily biased toward one population. Additionally, we do not know whether the people who created the dataset had an ulterior motive or an inherent bias against the participants involved.
- We will try to detect these biases when conducting our analysis by having participants experience both forms of mental health care and therapy delivery, checking for a description of the characteristics of the participants in the data, and analyzing the methodology used to create the data.
- One potentially problematic issue in terms of equitable impact is that our data is not representative of the population of the United States as a whole.
- We will mitigate privacy concerns by using a dataset in which patients voluntarily submit their feedback.
- One negative consequence that could arise during and after the project is inaccurate representation. Specifically, if biases in the dataset are not adequately identified and addressed, the outcomes may misrepresent the effectiveness of telemedicine. This could lead to misguided conclusions and potentially influence decision-making in mental health care policies and practices.

# Team Expectations 


* Team Expectation 1: Join group calls within an hour of the scheduled meeting time.
* Team Expectation 2: Plan meetings ahead of time.
* Team Expectation 3: Each team member should contribute equally to the project.
* Team Expectation 4: Be respectful of people’s time and responsive by communicating effectively to the group as a whole.
* Team Expectation 5: Utilize each other’s strengths to work on the project more efficiently.


# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 10/22  |  6 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Discuss and complete the project review | 
| 11/1  |  6 PM |  Do background research on topic | Discuss and complete the Project Proposal | 
| 11/15  | 6 PM  | Edit, finalize, and submit proposal; Search for datasets  | Discuss and complete the Data Checkpoint   |
| 11/29  | 6 PM  | Import & Wrangle Data; EDA | Discuss and complete the EDA Checkpoint   |
| 12/6  | 6 PM  | Finalize wrangling/EDA; Begin Analysis | Continue editing and revising final project |
| 12/13  | 6 PM  | Complete analysis; Draft results/conclusion/discussion | Finalize Final Project |
| 12/13  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |