# COGS 108 - Data Checkpoint

## Authors

COGS 108 Project Group 20 Members:
- Alexandra Dinh: Writing - original draft, Analysis, Data Cleaning/Analysis
- Kevin Casillas: Data curation, Writing - original draft, Data Cleaning/Analysis
- Afraz Hameed: Project administration, Writing - original draft, Analysis, Data Cleaning/Analysis
- Gunoor Sohal: Background research, Writing - original draft, Data Cleaning/Analysis
- Mohammad Hashmi: Project administration, Writing - original draft, Feedback Consolidation

How does the state-level nighttime sky brightness magnitude as measured by Globe at Night datasets correlate with depression prevalence and sleep disturbance rates across U.S. states? 






## Background and Prior Work

Light pollution, the common term for excessive artificial lighting at night, has long been a growing concern for both the environment and the psychological well-being of the public. Studies have shown that exposure to artificial light at night can disrupt the circadian rhythm, contributing to mood swings and poor sleep. Satellite imagery tools such as the [Light Pollution Map](https://lightpollutionmap.app/?lat=45.706179&lng=-86.440430&zoom=5&opacity=50&auroraOpacity=90) and observational datasets like Globe at Night show how light pollution varies by area, and a common trend is that more populated regions have more light pollution.

These tools have been used extensively for environmental research and are reliable for comparing light pollution levels across the U.S. However, few existing projects or studies try to link these measurements to mental health outcomes on a large scale.

The prevalence of depression varies widely by region. Data from the [Centers for Disease Control and Prevention](https://www.cdc.gov/mmwr/volumes/72/wr/mm7224a1.htm) (CDC) show that a substantial share of U.S. adults report having been diagnosed with depression, with notable differences between states. Additional state-level statistics from the [World Population Review](https://worldpopulationreview.com/state-rankings/mental-health-statistics-by-state) and [government-run data portals](https://catalog.data.gov/dataset/adult-depression-lghc-indicator-627e3) further confirm that depression rates differ by region. Extensive datasets exist to track mental health trends and light pollution independently, but very few projects combine mental health indicators with environmental exposure data. We therefore build our project on these existing datasets and compare state-level light pollution with depression prevalence to find out whether regions with higher light pollution also report more cases of depression.

Recent peer-reviewed research has also supported this connection. A 2024 systematic review and meta-analysis examining over half a million participants found that both indoor and outdoor artificial light at night were associated with increased risks of depression, suggesting that consistent nighttime brightness can negatively affect mental well-being (Chen et al., 2024). 

References: Chen, M., Zhao, Y., Lu, Q., Ye, Z., Bai, A., Xie, Z., Zhang, D., & Jiang, Y. (2024). Artificial light at night and risk of depression: A systematic review and meta-analysis. Environmental Health and Preventive Medicine, 29, 73. 


## Hypothesis


States with higher levels of light pollution will have more cases of diagnosed depression than states with lower light pollution levels. This hypothesis is based on the research discussed in the background section, which shows that artificial light at night is associated with disruptions in normal sleep patterns and overall well-being. Since the project compares state-level light-pollution intensity data with reported depression statistics, it’s reasonable to expect that areas with consistently higher nighttime brightness may also show higher rates of diagnosed depression. The relationship isn’t assumed to be causal; we expect an observable association between environmental light exposure and mental-health indicators across regions.
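
One way to test this hypothesis is a rank correlation between a state-level brightness measure and the depression rate. The sketch below is only an illustration: the numbers are made up, and the column names `median_limiting_mag` and `depression_rate_per_100k` are hypothetical. Because a lower LimitingMag means a brighter sky, the hypothesis would predict a negative correlation between these two columns.

```python
import pandas as pd

# Made-up state-level values for illustration only; column names are hypothetical.
states = pd.DataFrame({
    "median_limiting_mag": [2.5, 3.0, 4.5, 5.5, 6.0],
    "depression_rate_per_100k": [5200.0, 5000.0, 4600.0, 4700.0, 4100.0],
})

# Spearman's rho is the Pearson correlation of the ranks,
# so ranking first avoids needing any extra statistics package.
ranks = states.rank()
rho = ranks["median_limiting_mag"].corr(ranks["depression_rate_per_100k"])

print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is used here because it only assumes a monotonic relationship, which seems a safer starting point than assuming the relationship is linear.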

## Data

### Data overview

- Dataset #1
  - Dataset Name: Mental Health Statistics by State 2026
  - Link to the dataset: https://worldpopulationreview.com/state-rankings/mental-health-statistics-by-state
  - Number of observations: 51
  - Number of variables: 8
  - Description of the variables most relevant to this project: Depression Rate 2023, Depression Rate 2022, Depression Rate Ages 0-14 2022, and Depression Rate Ages 0-14 2023 (all per 100K) are the variables most relevant to our project.
  - Descriptions of any shortcomings this dataset has with respect to the project: It only reports how many people in each state have depression, with no variables describing the severity of that depression. It also includes variables that are not needed for our project, such as eating disorder rates; because we focus solely on depression, we drop them.
- Dataset #2
  - Dataset Name: Globe at Night 2024
  - Link to the dataset: https://globeatnight.org/documents/926/GaN2024.csv
  - Number of observations: 14,373 (raw)
  - Number of variables: 17
  - Description of the variables most relevant to this project: LimitingMag (faintest star magnitude visible to naked eye; lower = brighter skies = more light pollution), SQMReading (sky brightness in mag/arcsec²; higher = darker skies; often missing), and Country (includes US state for US observations) are most relevant. Latitude and Longitude enable geographic aggregation by state.
  - Descriptions of any shortcomings this dataset has with respect to the project: SQMReading is missing in around 82% of rows (only meter based observations have it). Observation density varies by region, so US states have unequal sample sizes. Citizen-science data may be biased toward clear nights and motivated observers. Seasonal and long-term trends are not explicitly controlled.
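
The two datasets can be combined by reducing the Globe at Night observations to one row per state and then joining on the state name. The following is a minimal sketch on toy data, not the final pipeline. It assumes that US rows store the state inside `Country` as `United States - <State>`, and that the depression table has columns named `state` and `DepressionRate_2023`; both are assumptions to check against the real files.

```python
import pandas as pd

# Toy stand-in for Globe at Night rows (values are made up for illustration).
# Assumption: US rows label the state as "United States - <State>".
gan = pd.DataFrame({
    "Country": [
        "United States - California",
        "United States - California",
        "United States - Nevada",
        "Chile",
    ],
    "LimitingMag": [3.0, 5.0, 6.0, 7.0],
})

# Toy stand-in for the depression table; both column names are assumptions.
depression = pd.DataFrame({
    "state": ["California", "Nevada", "Ohio"],
    "DepressionRate_2023": [4000.0, 4500.0, 5000.0],
})

US_PREFIX = "United States - "

# Keep US observations only and strip the prefix to get the state name.
us = gan[gan["Country"].str.startswith(US_PREFIX)].copy()
us["state"] = us["Country"].str[len(US_PREFIX):]

# Reduce to one row per state using the median limiting magnitude.
by_state = (
    us.groupby("state")["LimitingMag"]
    .median()
    .rename("median_limiting_mag")
    .reset_index()
)

# An inner join keeps only states that appear in both tables.
merged = depression.merge(by_state, on="state", how="inner")
print(merged)
```

The median is used in this sketch because it is less sensitive than the mean to a few unusual observations; an inner join means states without any Globe at Night observations drop out of the analysis.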

In [None]:
# Run this code every time when you're actively developing modules in .py files.  It's not needed if you aren't making modules
#
## this code is necessary for making sure that any modules we load are updated here 
## when their source code .py files are modified

%load_ext autoreload
%autoreload 2

In [None]:
# Setup code -- this only needs to be run once after cloning the repo!
# this code downloads the data from its source to the `data/00-raw/` directory
# if the data hasn't updated you don't need to do this again!

# if you don't already have these packages (you should!) uncomment this line
# %pip install requests tqdm

import sys
sys.path.append('./modules') # this tells python where to look for modules to import

import get_data # this is where we get the function we need to download data

# replace the urls and filenames in this list with your actual datafiles
# yes you can use Google drive share links or whatever
# format is a list of dictionaries; 
# each dict has keys of 
#   'url' where the resource is located
#   'filename' for the local filename where it will be stored 
datafiles = [
    { 'url': 'https://raw.githubusercontent.com/kjaramillocasillas/Mental-Health-Stats/refs/heads/main/Mental-Health-Data.csv', 'filename':'mental-health-by-state.csv'},
    { 'url': 'https://globeatnight.org/documents/926/GaN2024.csv', 'filename': 'GaN2024.csv'}
]

get_data.get_raw(datafiles, destination_directory='data/00-raw/')

### Depression Rates in the U.S. Corresponding to State

The important metrics are Depression Rate and Depression Rate Ages 0-14 (each for 2022 and 2023). The units are people per 100K: for example, if the Depression Rate 2022 for West Virginia is 6,299, then 6,299 out of every 100K people in that state have depression. The same applies to Depression Rate Ages 0-14, restricted to children aged 14 and under. We conjecture that the values for ages 0-14 will generally be lower, since younger people tend to have more comfort and security and fewer responsibilities than adults burdened by hardships. A higher rate suggests that a larger proportion of people in that state have had depression identified or reported, which can reflect true burden but also differences in screening, healthcare access, and willingness to report. This metric tends to reflect lifetime diagnosis rather than current depression, and it doesn’t directly capture severity or duration.
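
To make the per-100K unit concrete, the short sketch below converts a rate into a percentage and into an expected number of cases. The function names are our own, and the population of 1,000,000 is a hypothetical round number, not a real state population.

```python
def rate_per_100k_to_percent(rate_per_100k: float) -> float:
    """Convert a rate expressed per 100,000 people into a percentage."""
    return rate_per_100k / 100_000 * 100


def expected_cases(rate_per_100k: float, population: int) -> float:
    """Estimate the number of cases in a population of the given size."""
    return rate_per_100k / 100_000 * population


# Example rate from the text: 6,299 per 100K.
percent = rate_per_100k_to_percent(6299)
# Hypothetical population of one million, used only to show the arithmetic.
cases = expected_cases(6299, 1_000_000)

print(f"{percent:.1f}% of the population, about {cases:,.0f} cases per million people")
```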

A major concern with this depression-rate-by-state dataset is that the numbers may not be a clean measure of depression and can be biased by how depression is detected and reported. Much of it reflects depression diagnoses, which depend heavily on whether people have access to healthcare, get screened, and are willing to seek help. States with better access or more routine screening may therefore appear to have higher depression rates simply because more cases are diagnosed, while states with limited access may be undercounted. In addition, many people with depression likely keep it to themselves and endure the symptoms without getting professional help. If any values come from self-reported surveys, those can also be affected by stigma, underreporting, and who chooses to respond.


In [None]:
# Imports to clean the dataset
import pandas as pd

# store mental health data in dataframe
mental_health_raw_df = pd.read_csv('data/00-raw/mental-health-by-state.csv')

# Explore mental health data and show it is in tidy form
print("Shape of data pre-clean: ", mental_health_raw_df.shape)
mental_health_raw_df.head()

In [None]:
# Show the data type of each column as part of exploring
print("Data types of entries:\n", mental_health_raw_df.dtypes)

In [None]:
# Cleaning data below

# Dropping columns unrelated to depression or location
mental_health_clean_df = mental_health_raw_df.drop(
    ['EatingDisorderRate_2023', 'EatingDisorderRate_2022', 
    'EatingDisorderRateAge0To14_2023', 'EatingDisorderRateAge0To14_2022'], 
    axis = 1)

# Find rows and columns with null values 
invalid_row_count = mental_health_clean_df.isnull().any(axis=1).sum()

print("Number of Rows with at least 1 null entry: ", invalid_row_count, '\n')

print("Columns with null data:\n", mental_health_clean_df.isna().any())

All 51 rows have a null entry in the stateFlagCode column. 
The data isn't missing at random: on further inspection, this column is meant to hold 
images of each state's flag, which are irrelevant here.
We will drop stateFlagCode completely in the cell below.

In [None]:
mental_health_clean_df = mental_health_clean_df.drop(
    ['stateFlagCode'], 
    axis = 1)

# See new size and format of clean tidy data 
print("Shape of clean data: ", mental_health_clean_df.shape)
# display() is used because a notebook cell only shows its last expression automatically
display(mental_health_clean_df.head())


# Search cleaned data for any outlier or suspicious entries
print("Summary of cleaned data: ")
mental_health_clean_df.describe()


The data appears to be clean and logically consistent across similar columns, 
based on the summary statistics printed in the cell above.
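
Beyond reading the summary statistics by eye, suspicious entries could also be flagged with a simple rule. The sketch below uses the common 1.5 × IQR rule on made-up rates; the function name `flag_iqr_outliers`, the state labels, and the values are all hypothetical, and this rule is one option among several, not a check that was run above.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean Series marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up rates per 100K for five placeholder states.
rates = pd.Series(
    [4000.0, 4200.0, 4100.0, 4300.0, 9000.0],
    index=["A", "B", "C", "D", "E"],
)

flags = flag_iqr_outliers(rates)
print(rates[flags])
```

A flagged value is not necessarily an error; it is only a prompt to look at that state more closely before deciding whether to keep it.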

In [None]:
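# Write the cleaned data to the interim folder (create the folder if it does not exist yet)
import os
os.makedirs('data/01-interim', exist_ok=True)
mental_health_clean_df.to_csv('data/01-interim/cleaned-mental-health-by-state.csv', index=False)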

# Final display of data
mental_health_clean_df.head()


### Light Pollution Dataset

### Dataset Breakdown

LocalDate (YYYY-MM-DD) and LocalTime (HH:MM) give the date and time of the observation, which matter when interpreting sky brightness. 

LimitingMag is in astronomical magnitude: the magnitude of the faintest star visible to the naked eye. Lower values mean brighter skies and worse conditions; higher values mean darker skies and better conditions. 

SQMReading is in magnitudes per square arcsecond and measures sky brightness. Higher values mean darker skies. This value is missing in many rows of the dataset.

Country is where the observation took place. This dataset is global, but observations from the United States also specify the state. 

Our concerns are that the number of observations is not equal across U.S. states, and that seasonal and long-term trends are not accounted for. 
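
Because the number of observations differs between states, it may help to report the sample size next to each state's average and to mark states with too few observations. The sketch below uses toy data with a `state` column that is assumed to have been extracted already; the threshold `MIN_OBS` is a hypothetical value chosen only for illustration.

```python
import pandas as pd

# Toy observations; the "state" column and all values are made up.
obs = pd.DataFrame({
    "state": ["California"] * 4 + ["Nevada"] * 2 + ["Wyoming"],
    "LimitingMag": [3.0, 4.0, 3.0, 4.0, 6.0, 5.0, 7.0],
})

MIN_OBS = 2  # hypothetical minimum number of observations per state

# Count observations and average LimitingMag for each state.
summary = obs.groupby("state")["LimitingMag"].agg(n_obs="count", mean_mag="mean")

# Mark states whose average rests on too few observations.
summary["enough_data"] = summary["n_obs"] >= MIN_OBS
print(summary)
```

States that fail the threshold could be excluded or reported separately, so that a single observation does not stand in for an entire state.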

In [None]:
import pandas as pd
import os

# Step A, load the dataset
light_pol = pd.read_csv("data/00-raw/GaN2024.csv")

# Step B, Show that the data is already in tidy form
light_pol.head()

In [None]:
# Step C, show the size of our dataset
light_pol.shape

In [None]:
# Step D, missing data
light_pol_missing = light_pol.isna().sum()
light_pol_missing
# SQMReading is missing in many entries
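
The raw counts above can also be expressed as a share of rows, and a quick comparison can hint at whether the missingness is systematic. The sketch below runs on a toy frame with made-up values; the helper column `sqm_missing` is our own name and not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values standing in for the Globe at Night columns.
toy = pd.DataFrame({
    "LimitingMag": [3.0, 4.0, 5.0, 6.0, 2.0],
    "SQMReading": [np.nan, np.nan, 20.5, 21.3, np.nan],
})

# Share of missing values per column (0.0 = none missing, 1.0 = all missing).
missing_share = toy.isna().mean()
print(missing_share)

# Compare LimitingMag between rows with and without an SQM reading.
# A large gap would suggest that SQMReading is not missing at random.
toy["sqm_missing"] = toy["SQMReading"].isna()
mag_by_missing = toy.groupby("sqm_missing")["LimitingMag"].mean()
print(mag_by_missing)
```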

In [None]:
# Steps E and F, find outliers and clean
# Keep rows whose LimitingMag is on the 0-7 scale (rows with a missing LimitingMag fail this test and are dropped)
mask = light_pol["LimitingMag"].between(0, 7)
# Keep rows with NaN SQMReading so LimitingMag can be used as a fallback; otherwise require a reading between 10 and 25
sqm_ok = light_pol["SQMReading"].isna() | light_pol["SQMReading"].between(10, 25)
# SQMSerial is not needed for the analysis, so drop it
clean = light_pol[mask & sqm_ok].drop(columns=["SQMSerial"])

os.makedirs("data/02-processed", exist_ok=True)
clean.to_csv("data/02-processed/light_pollution_2024_cleaned.csv", index=False)
clean.shape
# The column count goes from 17 to 16 because SQMSerial was dropped

In [None]:
#summary stats for key variables
clean[["LimitingMag", "SQMReading"]].describe()

## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

Even though this study only looks at correlations between nighttime light exposure and mental-health data, we think there is a risk of the results being misread as direct cause and effect. For example, city planners might reduce lighting too aggressively without thinking about safety, transportation visibility, or nighttime business activity. There is also a chance that states or regions with higher reported depression or anxiety rates could become stigmatized or be unfairly judged. Mental health is influenced by many social and economic factors, so focusing only on light pollution oversimplifies a much bigger issue. To avoid this, the study will clearly explain its limitations and emphasize that the findings show associations, not definite causes.

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

> We are working with mental health data, so it is important that participants consented to the use of data about their mental health symptoms, such as whether they have depression. We will check that the datasets we use state that participants gave consent. 

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

 > While we are not personally collecting the data, we are gathering it from various public datasets. We will try to find datasets with a diverse range of participants to minimize bias. 

 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

 > We will not use any personally identifiable information such as names; instead, we will use datasets that have been anonymized. 

 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 > As we look at the relationship between light pollution and mental health, we may find a relationship that actually represents race or culture. We will try to reduce bias for these by taking into consideration the distribution of race and the light pollution distribution. 

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

 > If we have any private or personal information, data security is very important. However, we are using publicly available datasets, without people’s personal information.

 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?

 > Any person should have the right to remove their personal information. We are using a public dataset, so we will likely not have their personal information in the first place. 

 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

 > We can delete the data on people’s mental health after analysing the features for our research question. There is no need to hold on to the data.  

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?

 > We should consider the relevant groups’ perspectives during analysis, such as people who have mental health symptoms including depression, climate change or weather related scientists, and possibly therapists who work with people with depression. 

- [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

 > To reduce bias, we plan to exclude variables that are not relevant to our question, such as race, gender, and class, from our correlation calculations. 

 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?

 > There are many ways to manipulate how the same data looks through visualization such as changing the axis on a graph, so we will take steps to ensure that we are not purposefully doing that in order to showcase the data impartially. 

- [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?

 > We will eliminate any PII from our dataset to prevent it from becoming a problem in the first place. This way we will not accidentally reveal anyone’s personal information. 

- [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

 > This is important for those who want to replicate or improve this data analysis and research question. Throughout this process, we will try to document our process, as well as check over the documentation at the end to ensure that it is understandable and clearly communicates what we have done. 

### D. Modeling
- [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?

 > We plan to verify that the model does not rely on variables that are unfairly discriminatory. We want our data to cover people of all backgrounds and cultures, and to show how depression in large, diverse groups from different areas of the United States is associated with varying light pollution. 

- [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

- [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?

 > We will consider how optimizing for our defined metrics could introduce trade-offs such as bias, overfitting, and unequal subgroup performance. We therefore plan to monitor other metrics, such as fairness and error distributions, to verify that improvements reflect real, responsible model performance.

- [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

 > Yes, many of our members have had experience describing technical outcomes or phenomena in more understandable terms to be more friendly and accessible by viewers of all backgrounds and understandings of data science.

- [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

 > We plan to communicate the flaws, shortcomings, and limitations of the model to be transparent with our work. We want to be completely open in regards to the benefits our project has and what it does well while still noting the parts that 

### E. Deployment
- [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?

- [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?

 > Yes, we can implement a response process where potential harms are reviewed. Cases would be analyzed by the team and insights are used to update data, metrics, model, etc. to reduce risk of future harm.

- [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?

 > The goal would be to include the ability to shut off or pull back the model if issues arise, so that we can safely and efficiently revert to a safer version of the model.

- [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

 > We hope to consider unintended uses and abuses of the model by implementing constraints and documentation to limit misuse and consistent monitoring to detect it if it happens in the future post-deployment.


## Team Expectations 

Communication will be conducted over Instagram, with an expected response time of 5-6 hours. We will meet once a week, on Mondays at 5:30 PM over Zoom.

The tone we expect is something that is direct but also courteous and polite. We don’t intend to tiptoe around each other and if we need to recommend solutions or improvements, we must be direct about it while also being empathetic towards the other team members. This would mean emphasizing something along the lines of “Hey, I thought X could be a fantastic idea to incorporate, and wanted to know what your thoughts were on it. Currently, as we are doing Y, I see there are some ways that Z could be improved, and I would love to chat about it.”

Most decisions have been made by majority vote so far, mainly because we have had the luxury of time in most scenarios. However, when starting the actual project work, we will likely assign Subject Matter Experts (SMEs) to lead different aspects of the project. These can include a Filtering SME, Writing SME, Data Analysis SME, etc., so that certain team members have greater authority or responsibility in different parts of the project. If a decision has to be made in a short time frame and a relevant teammate is non-responsive, individual team members have so far taken the initiative and done what it takes for the group to succeed. If a teammate implements a solution while the relevant contacts aren’t answering and the outcome is poor, the whole group will accept that outcome, as we are all trying our best for the sake of the project.

As described above, there will likely be more specialization. So far, Afraz Hameed specializes in logistics, meetings, and coordination of work. Other roles will likely come up later, but our ideal scenario is everybody working on a little bit of everything, while some specialize further in certain categories. Tasks are currently assigned via Google Docs, and for the foreseeable future we intend to keep using Google Docs with checkboxes, assigning members to tasks using the Comment feature. 

If someone is struggling to deliver something they promised to do, they must reach out to the group for further resources, information, or help in achieving their goal. By default, when any of us is assigned a task, the expectation is that we will complete it with what we have. However, if any support is needed, whether that is more teammates helping out, useful information, or anything else, it is our team’s responsibility to provide it. If you have been struggling with a task for several hours with no progress whatsoever, it doesn’t hurt to contact the team and tell us what’s going on. The group will ask the team members with the most bandwidth at the time to provide support when needed.

Afraz Hameed will act as the de facto Facilitator of the project group.


## Project Timeline Proposal

| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|---|---|---|---|
| 1/20 | 2 PM | Read & thought about COGS 108 expectations | Brainstormed potential project ideas; established team communication; introduced ourselves; shared individual skills and experiences |
| 1/27 | 2 PM | Explored shortlisted ideas further and gathered initial dataset sources | Finalized project idea focusing on light pollution and depression through team consensus; assigned follow-up tasks; worked on the Project Review Assignment |
| 2/3 | 2 PM | Edited and finalized the proposal; searched for light pollution and depression datasets | Discussed possible data wrangling approaches; assigned group members to roles such as data cleaning, visualization, and documentation |
| 2/14 | 6 PM | Import and begin wrangling light pollution and state-level depression datasets; start basic EDA | Review and edit wrangling/EDA progress; discuss the analysis plan |
| 2/23 | 12 PM | Finalize wrangling/EDA; begin initial correlation or regression analysis | Review and edit analysis progress; complete the project check-in |
| 3/13 | 12 PM | Complete main analysis; draft results, conclusion, and discussion sections | Review and edit the full project as a group |
| 3/20 | Before 11:59 PM | Final project completed | Turn in final project and complete group project surveys |
