![Illustration of silhouetted heads](mentalhealth.jpg)

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.


Explore the `students` data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

| Field Name    | Description                                      |
| ------------- | ------------------------------------------------ |
| `inter_dom`     | Types of students (international or domestic)   |
| `japanese_cate` | Japanese language proficiency                    |
| `english_cate`  | English language proficiency                     |
| `academic`      | Current academic level (undergraduate or graduate) |
| `age`           | Current age of student                           |
| `stay`          | Current length of stay in years                  |
| `todep`         | Total score of depression (PHQ-9 test)           |
| `tosc`          | Total score of social connectedness (SCS test)   |
| `toas`          | Total score of acculturative stress (ASISS test) |

Your task will be to do the following exploratory analysis:
- Count the number of all records, and all records per student type
- Filter the data to see how it differs between the student types
- Find the summary statistics of the diagnostic tests for all students
- Summarize the data for international students
- See if length of stay impacts the test scores

In [1]:
# SQL Engine imports
from dotenv import load_dotenv
import os
import psycopg2
from sqlalchemy import create_engine
from sqlalchemy.sql import text
import warnings
warnings.filterwarnings("ignore")

# Python data analysis imports
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

Initialize SQL

In [2]:
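# Note: load_dotenv() does not override variables that are already set in the
# environment (such as the shell's USER) unless override=True is passed.
load_dotenv()
user = os.environ.get("USER")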
pw = os.environ.get("PASS")
db = os.environ.get("DB")
host = os.environ.get("HOST")
api = os.environ.get("API")
port = 5432

In [3]:
uri = f"postgresql+psycopg2://{user}:{pw}@{host}:{port}/{db}"
alchemyEngine = create_engine(uri)
conn = alchemyEngine.connect()
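
If the user name or password contains characters such as `@`, `:` or `/`, the f-string above can yield a URI that is parsed incorrectly. A minimal sketch of one way around this, assuming hypothetical credentials and a hypothetical helper named `build_uri`, percent-encodes the credentials with the standard library before assembling the same URI shape:

```python
from urllib.parse import quote_plus


def build_uri(user: str, pw: str, host: str, port: int, db: str) -> str:
    """Build a postgresql+psycopg2 URI with percent-encoded credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(pw)}"
        f"@{host}:{port}/{db}"
    )


# Hypothetical values for illustration only; they are not the notebook's settings
example_uri = build_uri("analyst", "p@ss:word/1", "localhost", 5432, "mental_health")
print(example_uri)
```

The encoded string can be passed to `create_engine` in the same way as the original `uri`.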

Load data

In [4]:
df = pd.read_csv('students.csv')
df.to_sql('students', conn, if_exists='replace', index=False)

286

In [5]:
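def query(stmt: str) -> pd.DataFrame:
    """Execute a SQL statement and return the results as a pandas DataFrame.

    Parameters
    ----------
    stmt : str
        The SQL statement to be executed.

    Returns
    -------
    pd.DataFrame
        The rows returned by the statement.
    """
    # conn is the module-level connection opened above; reading it needs no `global`
    return pd.read_sql_query(stmt, conn)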

Exploring the data

## 0. Viewing the full dataset

In [6]:
query('SELECT * FROM students')

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Inter,SEA,Male,Grad,24.0,4.0,5.0,Long,3.0,Average,5.0,High,,Yes,No,No,No,0.0,Min,34.0,23.0,9.0,11.0,8.0,11.0,2.0,27.0,91.0,5.0,5.0,6.0,3.0,2.0,1.0,4.0,1.0,3.0,4.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
1,Inter,SEA,Male,Grad,28.0,5.0,1.0,Short,4.0,High,4.0,High,,No,No,No,No,2.0,Min,48.0,8.0,7.0,5.0,4.0,3.0,2.0,10.0,39.0,7.0,7.0,7.0,4.0,4.0,4.0,4.0,1.0,1.0,1.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
2,Inter,SEA,Male,Grad,25.0,4.0,6.0,Long,4.0,High,4.0,High,Yes,Yes,No,No,No,2.0,Min,41.0,13.0,4.0,7.0,6.0,4.0,3.0,14.0,51.0,3.0,3.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,,No,No,No,No,No,No,No,No,No,No,No
3,Inter,EA,Female,Grad,29.0,5.0,1.0,Short,2.0,Low,3.0,Average,No,No,No,No,No,3.0,Min,37.0,16.0,10.0,10.0,8.0,6.0,4.0,21.0,75.0,5.0,5.0,5.0,5.0,5.0,2.0,2.0,2.0,4.0,4.0,,Yes,Yes,Yes,Yes,Yes,No,No,No,No,No,No
4,Inter,EA,Female,Grad,28.0,5.0,1.0,Short,1.0,Low,3.0,Average,Yes,No,No,No,No,3.0,Min,37.0,15.0,12.0,5.0,8.0,7.0,4.0,31.0,82.0,5.0,5.0,5.0,2.0,5.0,2.0,5.0,5.0,4.0,4.0,,Yes,Yes,Yes,No,Yes,No,Yes,Yes,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
281,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,46,222,,,,,,,,,
282,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,19,249,,,,,,,,,
283,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,65,203,,,,,,,,,
284,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21,247,,,,,,,,,


## 1. Exploring the dataset

Start by exploring the dataset and understanding the data we are working with:

First, count the number of records in the dataset:

In [7]:
query('select count(*) as total_records from students ')

Unnamed: 0,total_records
0,286


Next, count the number of records for each student type:

In [8]:
query('select inter_dom, count(*) as count_inter_dom from students group by inter_dom')

Unnamed: 0,inter_dom,count_inter_dom
0,,18
1,Dom,67
2,Inter,201
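
The first row of this output has an empty `inter_dom`, which makes the group easy to overlook. A minimal sketch of labelling that group explicitly, using an in-memory SQLite table with made-up rows as a stand-in for the PostgreSQL `students` table (the label `'Missing'` is an arbitrary choice, and it assumes the empty values are stored as NULL):

```python
import sqlite3

# Toy rows standing in for the students table (hypothetical data, not the real dataset)
rows = [("Inter",), ("Inter",), ("Dom",), (None,), (None,), ("Inter",)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (inter_dom TEXT)")
con.executemany("INSERT INTO students VALUES (?)", rows)

# COALESCE replaces NULL with an explicit label so the group is visible in the output
counts = con.execute(
    """
    SELECT COALESCE(inter_dom, 'Missing') AS student_type, COUNT(*) AS n
    FROM students
    GROUP BY COALESCE(inter_dom, 'Missing')
    ORDER BY student_type
    """
).fetchall()
con.close()

for student_type, n in counts:
    print(student_type, n)
```

The same `COALESCE` expression is also valid in PostgreSQL, so the statement could be passed to `query` unchanged.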


Then, examine the records for domestic and international students separately. The 18 records with no student type are not returned by either filter:

In [9]:
query("select * from students where inter_dom='Dom'")

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Dom,JAP,Female,Grad,27.0,5.0,2.0,Medium,3.0,Average,3.0,Average,Yes,Yes,No,Yes,Major,12.0,Mod,47.0,16.0,11.0,5.0,8.0,7.0,3.0,31.0,81.0,7.0,3.0,7.0,1.0,6.0,6.0,1.0,5.0,4.0,1.0,,Yes,No,Yes,No,Yes,Yes,No,Yes,No,No,No
1,Dom,JAP,Female,Under,18.0,1.0,1.0,Short,5.0,High,3.0,Average,No,No,No,No,No,9.0,Mild,48.0,9.0,4.0,5.0,4.0,3.0,2.0,10.0,37.0,4.0,4.0,4.0,4.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,No,No,No,No,No,No,No,No,No,No,No
2,Dom,JAP,Female,Under,21.0,3.0,3.0,Medium,5.0,High,3.0,Average,Yes,No,No,No,No,7.0,Mild,40.0,16.0,8.0,10.0,8.0,6.0,4.0,20.0,72.0,6.0,6.0,7.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,4.0,Yes,Yes,Yes,No,No,No,Yes,No,No,No,No
3,Dom,JAP,Male,Under,20.0,2.0,3.0,Medium,5.0,High,1.0,Low,No,No,No,No,No,3.0,Min,47.0,11.0,4.0,5.0,4.0,5.0,2.0,12.0,43.0,1.0,5.0,5.0,3.0,1.0,1.0,3.0,1.0,1.0,1.0,3.0,No,Yes,Yes,No,No,No,No,No,No,No,No
4,Dom,JAP,Female,Under,21.0,3.0,3.0,Medium,5.0,High,1.0,Low,No,No,Yes,Yes,Other,10.0,Mod,48.0,8.0,4.0,5.0,4.0,3.0,2.0,10.0,36.0,7.0,5.0,7.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,Yes,Yes,Yes,No,No,No,No,No,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,Dom,JAP,Female,Under,21.0,3.0,4.0,Long,5.0,High,4.0,High,No,Yes,No,No,No,8.0,Mild,27.0,16.0,9.0,10.0,8.0,7.0,4.0,20.0,74.0,1.0,7.0,5.0,1.0,3.0,3.0,3.0,1.0,1.0,1.0,6.0,No,Yes,Yes,No,No,No,No,No,No,No,Yes
63,Dom,JAP,Female,Under,22.0,3.0,3.0,Medium,3.0,Average,4.0,High,Yes,Yes,No,No,No,2.0,Min,48.0,8.0,10.0,5.0,4.0,3.0,4.0,16.0,50.0,7.0,7.0,7.0,7.0,2.0,2.0,2.0,2.0,2.0,1.0,3.0,Yes,Yes,Yes,Yes,No,No,No,No,No,No,No
64,Dom,JAP,Female,Under,19.0,2.0,1.0,Short,5.0,High,3.0,Average,No,No,No,No,No,9.0,Mild,47.0,8.0,7.0,5.0,5.0,3.0,2.0,13.0,43.0,5.0,7.0,7.0,6.0,7.0,7.0,7.0,1.0,1.0,1.0,2.0,Yes,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No
65,Dom,JAP,Male,Under,19.0,2.0,1.0,Short,5.0,High,3.0,Average,No,No,No,No,No,1.0,Min,43.0,8.0,12.0,5.0,4.0,3.0,2.0,10.0,44.0,7.0,5.0,7.0,5.0,5.0,5.0,5.0,4.0,4.0,4.0,2.0,Yes,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No


In [10]:
query("select * from students where inter_dom='Inter'")

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Inter,SEA,Male,Grad,24.0,4.0,5.0,Long,3.0,Average,5.0,High,,Yes,No,No,No,0.0,Min,34.0,23.0,9.0,11.0,8.0,11.0,2.0,27.0,91.0,5.0,5.0,6.0,3.0,2.0,1.0,4.0,1.0,3.0,4.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
1,Inter,SEA,Male,Grad,28.0,5.0,1.0,Short,4.0,High,4.0,High,,No,No,No,No,2.0,Min,48.0,8.0,7.0,5.0,4.0,3.0,2.0,10.0,39.0,7.0,7.0,7.0,4.0,4.0,4.0,4.0,1.0,1.0,1.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
2,Inter,SEA,Male,Grad,25.0,4.0,6.0,Long,4.0,High,4.0,High,Yes,Yes,No,No,No,2.0,Min,41.0,13.0,4.0,7.0,6.0,4.0,3.0,14.0,51.0,3.0,3.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,,No,No,No,No,No,No,No,No,No,No,No
3,Inter,EA,Female,Grad,29.0,5.0,1.0,Short,2.0,Low,3.0,Average,No,No,No,No,No,3.0,Min,37.0,16.0,10.0,10.0,8.0,6.0,4.0,21.0,75.0,5.0,5.0,5.0,5.0,5.0,2.0,2.0,2.0,4.0,4.0,,Yes,Yes,Yes,Yes,Yes,No,No,No,No,No,No
4,Inter,EA,Female,Grad,28.0,5.0,1.0,Short,1.0,Low,3.0,Average,Yes,No,No,No,No,3.0,Min,37.0,15.0,12.0,5.0,8.0,7.0,4.0,31.0,82.0,5.0,5.0,5.0,2.0,5.0,2.0,5.0,5.0,4.0,4.0,,Yes,Yes,Yes,No,Yes,No,Yes,Yes,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,Inter,SEA,Male,Under,21.0,3.0,3.0,Medium,4.0,High,4.0,High,Yes,Yes,No,No,No,6.0,Mild,37.0,23.0,8.0,11.0,8.0,6.0,4.0,22.0,82.0,3.0,2.0,2.0,2.0,4.0,2.0,4.0,2.0,6.0,4.0,4.0,No,No,No,No,No,No,No,No,Yes,No,No
197,Inter,SEA,Female,Under,20.0,2.0,1.0,Short,2.0,Low,4.0,High,Yes,No,No,No,No,7.0,Mild,16.0,29.0,18.0,17.0,17.0,12.0,8.0,44.0,145.0,7.0,1.0,3.0,3.0,4.0,4.0,4.0,4.0,7.0,1.0,4.0,Yes,No,No,No,No,No,No,No,Yes,No,No
198,Inter,SEA,Female,Under,21.0,3.0,3.0,Medium,2.0,Low,5.0,High,Yes,No,No,Yes,Major,16.0,ModSev,25.0,24.0,11.0,17.0,4.0,11.0,6.0,37.0,110.0,5.0,7.0,3.0,1.0,6.0,1.0,6.0,1.0,4.0,1.0,3.0,Yes,Yes,No,No,Yes,No,Yes,No,No,No,No
199,Inter,SEA,Female,Under,18.0,1.0,1.0,Short,1.0,Low,4.0,High,No,No,No,No,No,8.0,Mild,38.0,11.0,12.0,10.0,4.0,7.0,4.0,20.0,68.0,5.0,5.0,4.0,3.0,3.0,3.0,3.0,3.0,5.0,5.0,5.0,Yes,Yes,No,No,No,No,No,No,Yes,Yes,Yes
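
The two filters above differ only in the student type. A minimal sketch of the same filter written once with a bound parameter, using an in-memory SQLite table with made-up rows in place of the PostgreSQL table (the helper name `students_of_type` is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (inter_dom TEXT, age REAL)")
# Hypothetical rows for illustration only
con.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Dom", 27.0), ("Inter", 24.0), ("Inter", 28.0), (None, None)],
)


def students_of_type(student_type: str) -> list:
    """Return the rows for one student type, passing the value as a bound parameter."""
    return con.execute(
        "SELECT inter_dom, age FROM students "
        "WHERE inter_dom = :student_type ORDER BY age",
        {"student_type": student_type},
    ).fetchall()


dom_rows = students_of_type("Dom")
inter_rows = students_of_type("Inter")
print(len(dom_rows), len(inter_rows))
```

Binding the value instead of pasting it into the SQL string avoids quoting mistakes when the filter value comes from a variable.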


## 2. Find out the summary statistics of the diagnostic tests for all students

Compute the minimum, maximum, and average of each diagnostic score with aggregate functions. Round the averages to two decimal places and use aliases to keep the output clean. The diagnostic test fields are:
- PHQ-9 for scoring depression (todep): higher is worse
- SCS test for scoring social connectedness (tosc): assesses the extent to which a person feels connected to others in their social surroundings; a higher score indicates stronger connectedness, so higher is better
- ASISS test for scoring acculturative stress (toas): the stress that emerges from conflicts when individuals must adjust to a new culture of the host society, higher is worse
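
To make "higher is worse" concrete for the PHQ-9, the total score is commonly read in severity bands (0-4, 5-9, 10-14, 15-19, 20-27). The sketch below maps a score to labels in the style of the dataset's `depsev` column. The labels `Min`, `Mild`, `Mod` and `ModSev` appear in the outputs above; `Sev` is an assumed abbreviation for the top band, and the function name is hypothetical.

```python
def phq9_severity(score: int) -> str:
    """Map a PHQ-9 total score (0-27) to a severity label."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total scores range from 0 to 27")
    # Upper bound of each band paired with its label; "Sev" is an assumed abbreviation
    bands = [(4, "Min"), (9, "Mild"), (14, "Mod"), (19, "ModSev"), (27, "Sev")]
    for upper, label in bands:
        if score <= upper:
            return label
    raise AssertionError("unreachable: every valid score falls in a band")


for example_score in (0, 9, 12, 16):
    print(example_score, phq9_severity(example_score))
```

These bands agree with the rows shown earlier, for example a `todep` of 12.0 listed as `Mod` and 16.0 listed as `ModSev`.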

In [11]:
query('''
    select
        MIN(todep) as min_phq, MAX(todep) as max_phq, ROUND(AVG(todep)::NUMERIC, 2) as avg_phq,
        MIN(tosc) as min_scs, MAX(tosc) as max_scs, ROUND(AVG(tosc)::NUMERIC, 2) as avg_scs,
        MIN(toas) as min_as, MAX(toas) as max_as, ROUND(AVG(toas)::NUMERIC, 2) as avg_as
    from students
''')

Unnamed: 0,min_phq,max_phq,avg_phq,min_scs,max_scs,avg_scs,min_as,max_as,avg_as
0,0.0,25.0,8.19,8.0,48.0,37.47,36.0,145.0,72.38


## 3. Summarize the data for international and domestic students and compare

In [12]:
query('''
    select 	
        MIN(todep) as min_phq, MAX(todep) as max_phq, AVG(todep) as avg_phq, 		
        MIN(tosc) as min_scs, MAX(tosc) as max_scs, AVG(tosc) as avg_scs, 		
        MIN(toas) as min_as, MAX(toas) as max_as, AVG(toas) as avg_as 
    from students where inter_dom='Dom'
    union
    select 	
        MIN(todep) as min_phq, MAX(todep) as max_phq, AVG(todep) as avg_phq, 
        MIN(tosc) as min_scs, MAX(tosc) as max_scs, AVG(tosc) as avg_scs, 		
        MIN(toas) as min_as, MAX(toas) as max_as, AVG(toas) as avg_as 
    from students where inter_dom='Inter'
''')

Unnamed: 0,min_phq,max_phq,avg_phq,min_scs,max_scs,avg_scs,min_as,max_as,avg_as
0,0.0,23.0,8.61194,8.0,48.0,37.641791,36.0,112.0,62.835821
1,0.0,25.0,8.044776,11.0,48.0,37.41791,36.0,145.0,75.562189
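
The two rows above carry no label, and `UNION` does not guarantee their order, so each row has to be identified from its values. A minimal sketch of an alternative that keeps the student type in the output by grouping on `inter_dom`, using an in-memory SQLite table with a few made-up rows as a stand-in for PostgreSQL (in PostgreSQL the averages would need the `::NUMERIC` cast before `ROUND`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (inter_dom TEXT, todep REAL, tosc REAL, toas REAL)")
# Hypothetical rows for illustration only
con.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("Inter", 2.0, 48.0, 39.0),
        ("Inter", 3.0, 37.0, 75.0),
        ("Dom", 12.0, 47.0, 81.0),
        ("Dom", 9.0, 48.0, 37.0),
        (None, None, None, None),
    ],
)

# One labelled row per student type; the WHERE clause drops rows with no type
summary = con.execute(
    """
    SELECT inter_dom,
           MIN(todep) AS min_phq, MAX(todep) AS max_phq,
           ROUND(AVG(todep), 2) AS avg_phq,
           ROUND(AVG(tosc), 2) AS avg_scs,
           ROUND(AVG(toas), 2) AS avg_as
    FROM students
    WHERE inter_dom IN ('Dom', 'Inter')
    GROUP BY inter_dom
    ORDER BY inter_dom
    """
).fetchall()
con.close()

for row in summary:
    print(row)
```

Grouping also reduces the statement to a single `SELECT`, so the list of aggregates only has to be written once.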


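Observations (the row with a `max_as` of 145.0 is the international group, since that maximum appears in the international records above; the other row is the domestic group):
- Average PHQ-9 score for depression (`avg_phq`): roughly the same, slightly higher in domestic students (8.61 vs. 8.04)
- Average SCS score for social connectedness (`avg_scs`): nearly identical, marginally lower in international students (37.42 vs. 37.64)
- Average ASISS score for acculturative stress (`avg_as`): clearly greater in international students (75.56 vs. 62.84)

Taken on their own, the averages show slightly higher depression scores among domestic students and almost no difference in social connectedness. The clearest gap is in acculturative stress, which is about 13 points higher on average for international students.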

## 4. See the impact of the length of stay

See how the average diagnostic scores of international students vary with the length of stay. Order the results by length of stay in descending order.

In [13]:
query('''
    select 
        stay, 
        ROUND(AVG(todep)::NUMERIC,2) as average_phq, 
        ROUND(AVG(tosc)::NUMERIC,2) as average_scs, 
        ROUND(AVG(toas)::NUMERIC,2) as average_as
    from students where inter_dom='Inter'
    group by stay
    order by stay desc limit 9
''')

Unnamed: 0,stay,average_phq,average_scs,average_as
0,10.0,13.0,32.0,50.0
1,8.0,10.0,44.0,65.0
2,7.0,4.0,48.0,45.0
3,6.0,6.0,38.0,58.67
4,5.0,0.0,34.0,91.0
5,4.0,8.57,33.93,87.71
6,3.0,9.09,37.13,78.0
7,2.0,8.28,37.08,77.67
8,1.0,7.48,38.11,72.8


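Observation: The averages do not show a consistent trend. For stays of 1 to 3 years the average PHQ-9 score rises from 7.48 to 9.09, and the longest stays (8 and 10 years) have the highest averages (10.0 and 13.0), but the groups in between vary widely (0.0 at 5 years, 4.0 at 7 years). The 10-year group also has the lowest SCS average (32.0) and one of the lowest ASISS averages (50.0, above only the 45.0 at 7 years). The groups of 5 years or more appear to contain very few students: the 5-year averages (0.0, 34.0, 91.0) match exactly the values of the student in the first row of the dataset, which suggests a group of one, and several other averages are whole numbers. Any conclusion about long stays should therefore be treated as tentative.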
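
One way to check how much weight each average can bear is to report the group size next to it. A minimal sketch, using an in-memory SQLite table with made-up rows as a stand-in for PostgreSQL (the alias `n_students` is an illustrative name; in PostgreSQL the average would need the `::NUMERIC` cast before `ROUND`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (inter_dom TEXT, stay REAL, todep REAL)")
# Hypothetical rows for illustration only
con.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Inter", 1.0, 2.0),
        ("Inter", 1.0, 3.0),
        ("Inter", 1.0, 7.0),
        ("Inter", 3.0, 6.0),
        ("Inter", 3.0, 16.0),
        ("Inter", 5.0, 0.0),
        ("Dom", 2.0, 12.0),
    ],
)

# COUNT(*) shows how many students stand behind each average
by_stay = con.execute(
    """
    SELECT stay, COUNT(*) AS n_students, ROUND(AVG(todep), 2) AS average_phq
    FROM students
    WHERE inter_dom = 'Inter'
    GROUP BY stay
    ORDER BY stay DESC
    """
).fetchall()
con.close()

for stay, n_students, average_phq in by_stay:
    print(stay, n_students, average_phq)
```

A group with a single student, like the 5-year row in this toy data, yields an "average" that is just that student's score.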