# Student Alcohol Consumption
## Eric Lin, Naveen Janarthanan, Estelle Jiang, Nuo Chen

In [5]:
# import data
from IPython.display import HTML

from statistical_analysis import df_nice, regression

import warnings
warnings.filterwarnings('ignore')

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to view imported code."></form>''')

## Project Description
> A persistent issue today is alcohol abuse by adolescents. Adolescents often start drinking at a very young age in response to physical, emotional, and lifestyle changes. Puberty and learning how to live independently often contribute to the onset of alcohol consumption. However, because most adolescents are still immature at these early ages, they tend to make bad decisions about anything they see as _"risky"_ or _"cool"_, such as consuming large amounts of alcohol to get drunk. In fact, **51%** of junior and senior high school students have had at least one drink within the past year and **8 million students drink weekly**. **More than 3 million students drink alone**, **more than 4 million drink when they are upset**, and **less than 3 million drink because they are bored**. In addition, parents, friends, and alcoholic beverage advertisements influence students' attitudes about alcohol. Students' drinking habits are often heavily influenced by their surroundings and are likely shaped by the local student culture and norms. We therefore wanted to examine this issue in more detail by analyzing the variables that could affect student alcohol consumption, such as personal statistics, parent statistics, and education values, and to produce a model that helps predict student drinking rates based on these features.


## Data Set 

In [3]:
df_nice.head()

Unnamed: 0,Course,"Weekday Alc Consumption (1=low, 5=high)","Father Education (0=none, 5=higher edu)",Father Job,Period 1 Grades (0-20 Scale),Period 2 Grades (0-20 Scale),Final Grade (0-20 Scale),"Mother Education (0=none, 5=higher edu)",Mother Job,"Parents Living Together(T), Apart(A)","Weekend Alc Consumption (1=low, 5=high)",Number of School Absences,Extra Curricular Activities,Urban(U)/Rural(R) Location,Student Age,Number of Failures,"Family Relationship Quality (1=not good, 5=good)","Family Size (LE3:<=3, GT3:>3)",Family Education Support,"Free Time (1=low, 5=high)","Go Out w/ Friends (1=low, 5=high)",Guardian,"Current Health Status (1=bad, 5=good)",Wants to take Higher Education,Internet,Attended Nursery School,Paid for Extra Classes,Reason to Choose this School,In a Romantic Relationship,"Student School (GP=Gabriel Pereira, MS=Mousinho da Silveira)",Extra Educational Support,Student Sex,Weekly Studytime,"Travel Time to School (1=<15 min, 2=15-30 min, 3=30 min-1 hour, 4=>1 hour)",Social Index,Drinking Index
0,Math,1,4,teacher,5,6,6,4,at_home,A,1,6,no,U,18,0,4,GT3,no,3,4,mother,3,yes,no,yes,no,course,no,GP,yes,F,2,2,1.0,1.0
1,Math,1,1,other,5,5,6,1,at_home,T,1,4,no,U,17,0,5,GT3,yes,3,3,father,3,yes,yes,no,no,course,no,GP,no,F,2,1,0.77,1.0
2,Math,2,1,other,7,8,10,1,at_home,T,3,10,no,U,15,3,4,LE3,no,3,2,mother,3,yes,yes,yes,yes,other,no,GP,yes,F,2,1,0.52,2.714286
3,Math,1,2,services,15,14,15,4,health,T,1,2,yes,U,15,0,3,GT3,yes,2,2,mother,5,yes,yes,yes,yes,home,yes,GP,no,F,3,1,0.54,1.0
4,Math,1,3,other,6,10,10,3,other,T,2,4,no,U,16,0,4,GT3,yes,3,2,father,5,yes,no,yes,yes,home,no,GP,no,F,2,1,0.5,1.714286


> The Student Alcohol Consumption data set is available on the Kaggle website. The data was collected in a survey of secondary school students enrolled in math and Portuguese language courses. It contains more than thirty features about each student, such as gender, age, and whether the student is in a romantic relationship. Kaggle provides two data sets: one contains students from a math course, and the other contains students from a Portuguese language course. We use this data to predict a student's alcohol consumption within a week.

## Data Preparation

### Data Cleaning (discuss how/if we needed to do any data cleaning)


### New Variables
#### Drinking per Week Index (DWI)
> The Drinking per Week Index (DWI) measures the amount of alcohol a student consumes over the entire week. The dataset reports separate weekday and weekend alcohol consumption indexes, and we wanted a single, more precise measure of consumption. We therefore combined the two indexes into an average. Because the two indexes are weighted differently, we took that into account by multiplying each index by the number of days in the given variable and dividing by the number of days in a week (as shown below). The index ranges from one to five, where one represents low alcohol consumption and five represents high alcohol consumption. Ultimately, the DWI converts the two ordinal consumption ratings into a single continuous variable, which we can use to run a regression model and predict how much alcohol a student consumes in a week.


$$ 
\text{Drinking per Week Index} = \bigg[ \dfrac{ \big(\text{Weekend Alc Consumption} \times 5\big) + \big(\text{Weekday Alc Consumption} \times 2\big) }{7} \bigg]
$$
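
A minimal sketch of the formula above in plain Python. The function and parameter names are hypothetical and are not taken from `statistical_analysis`. The weights are used exactly as written in the formula (5 on the weekend term, 2 on the weekday term), and with them the sketch reproduces the `Drinking Index` values in the five sample rows of the data set. Weighting by days per period would instead put 5 on the weekday term, so the weights are left as parameters.

```python
def drinking_per_week_index(weekday_alc, weekend_alc,
                            weekend_weight=5, weekday_weight=2):
    """Weighted average of the two 1-5 consumption ratings.

    Default weights mirror the formula as written in this report;
    they are assumptions exposed as parameters, not a confirmed
    copy of the authors' implementation.
    """
    total_weight = weekend_weight + weekday_weight
    weighted_sum = weekend_alc * weekend_weight + weekday_alc * weekday_weight
    return weighted_sum / total_weight


# (weekday, weekend) ratings copied from the five sample rows shown earlier
sample_rows = [(1, 1), (1, 1), (2, 3), (1, 1), (1, 2)]
dwi_values = [round(drinking_per_week_index(d, w), 6) for d, w in sample_rows]
print(dwi_values)  # [1.0, 1.0, 2.714286, 1.0, 1.714286]
```

Because both inputs range from one to five and the weights sum to seven, the result also stays between one and five.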

#### Social Index
> A study by healthtalk on drugs and alcohol suggests that most high school and college students drink to socialize with others, to have fun, and to relax. When junior/senior high school students and college students go out with their friends, it is usually to grab some drinks and attend a party. Students tend to drink at abnormally high rates during their first year of college, when they move out of the house. Students with access to the internet tend to use social media at high rates, so internet access is closely associated with how social a student is. Additionally, whether a student is in a relationship strongly influences how social they are, as relationships usually involve meeting your partner's friends, spending time with your partner often, and attending various events with your partner and your/their friends. Engaging in extra-curricular activities correlates negatively with how social a student is, as a student involved in more extra-curricular activities will likely _NOT_ have time to socialize with others. As such, we decided to create the **Social Index** feature, which indicates how social a given student is based on how often they go out with friends, whether they have access to the internet (1=yes, 0=no), whether they are in a romantic relationship (1=yes, 0=no), and whether they are doing any extra-curricular activities (1=yes, 0=no). We created the Social Index formula based on how each feature correlates with the Drinking per Week Index. Below is the formula to determine the Social Index:

$$ 
\text{Social Index} = \big(0.25 \times \text{Go Out w/ Friends} \big) + \big(0.02 \times \text{Internet} \big) + \big(0.03 \times \text{In a Romantic Relationship} \big) - \big(0.01 \times \text{Extra Curricular Activities} \big)
$$
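
The formula above can be sketched in plain Python as follows. The helper `yes_no` and the function `social_index` are hypothetical names, not the authors' code; the coefficients and the yes/no to 1/0 encoding come from the text. The inputs are copied from the five sample rows of the data set, and the sketch reproduces their `Social Index` values.

```python
def yes_no(value):
    """Encode a 'yes'/'no' survey answer as 1/0 (hypothetical helper)."""
    return 1 if value == "yes" else 0


def social_index(go_out, internet, romantic, activities):
    """Social Index with the coefficients stated in the formula."""
    return (0.25 * go_out
            + 0.02 * yes_no(internet)
            + 0.03 * yes_no(romantic)
            - 0.01 * yes_no(activities))


# (go out, internet, romantic relationship, extra-curricular activities)
sample_rows = [
    (4, "no", "no", "no"),
    (3, "yes", "no", "no"),
    (2, "yes", "no", "no"),
    (2, "yes", "yes", "yes"),
    (2, "no", "no", "no"),
]
social_values = [round(social_index(*row), 2) for row in sample_rows]
print(social_values)  # [1.0, 0.77, 0.52, 0.54, 0.5]
```

With these coefficients the "go out with friends" rating dominates: it contributes between 0.25 and 1.25, while the three yes/no features together shift the index by at most 0.05.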

## Exploratory Data Analysis (the statistical analysis stuff)

![](img/plt_hist_social_index.png)

![](img/plt_scatter_social_dwi.png)

In [6]:
regression.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | drinking | R-squared: | 0.393 |
| Model: | OLS | Adj. R-squared: | 0.354 |
| Method: | Least Squares | F-statistic: | 10.07 |
| Date: | Sun, 10 Mar 2019 | Prob (F-statistic): | 5.39e-70 |
| Time: | 01:28:00 | Log-Likelihood: | -1320.1 |
| No. Observations: | 1044 | AIC: | 2768.0 |
| Df Residuals: | 980 | BIC: | 3085.0 |
| Df Model: | 63 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0987 | 0.690 | 0.143 | 0.886 | -1.256 | 1.453 |
| school[T.MS] | -0.0349 | 0.080 | -0.437 | 0.663 | -0.192 | 0.122 |
| sex[T.M] | 0.5601 | 0.064 | 8.754 | 0.000 | 0.435 | 0.686 |
| address[T.U] | -0.1000 | 0.072 | -1.387 | 0.166 | -0.242 | 0.041 |
| Pstatus[T.T] | 0.0563 | 0.094 | 0.597 | 0.551 | -0.129 | 0.242 |
| famsize[T.LE3] | 0.1804 | 0.065 | 2.776 | 0.006 | 0.053 | 0.308 |
| Medu[T.1] | -0.5001 | 0.316 | -1.584 | 0.114 | -1.120 | 0.120 |
| Medu[T.2] | -0.7287 | 0.317 | -2.302 | 0.022 | -1.350 | -0.107 |
| Medu[T.3] | -0.5355 | 0.321 | -1.670 | 0.095 | -1.165 | 0.094 |

| | | | |
|---|---|---|---|
| Omnibus: | 19.689 | Durbin-Watson: | 1.915 |
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 20.322 |
| Skew: | 0.326 | Prob(JB): | 3.86e-05 |
| Kurtosis: | 3.205 | Cond. No. | 9.68e+15 |
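
As a quick sanity check on the summary above, the adjusted R-squared can be recomputed from the reported R-squared, number of observations, and model degrees of freedom. This is a small illustrative sketch with a hypothetical function name, using the standard adjusted R-squared definition; it is not part of the `regression` object imported earlier.

```python
def adjusted_r_squared(r_squared, n_obs, df_model):
    """Standard adjusted R-squared: penalizes R-squared for the
    number of fitted predictors (df_model excludes the intercept)."""
    df_residuals = n_obs - df_model - 1
    return 1 - (1 - r_squared) * (n_obs - 1) / df_residuals


# Values copied from the regression summary
r_squared = 0.393
n_obs = 1044
df_model = 63

adj = adjusted_r_squared(r_squared, n_obs, df_model)
print(n_obs - df_model - 1)  # 980, matching "Df Residuals"
print(round(adj, 3))         # 0.354, matching "Adj. R-squared"
```

The gap between 0.393 and 0.354 reflects the 63 predictors in the model: each additional term raises R-squared slightly even when it carries little information, and the adjustment discounts for that.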


![](img/plt_actual_pred_dwi_before.png)

![](img/plt_heat_map.png)

![](img/plt_actual_pred_dwi_aft.png)

Sources: 
- http://www.healthtalk.org/young-peoples-experiences/drugs-and-alcohol/alcohol-and-social-life