In [None]:
library(tidyverse)

Rows: 395 Columns: 33
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (17): school, sex, address, famsize, Pstatus, Mjob, Fjob, reason, guardi...
dbl (16): age, Medu, Fedu, traveltime, studytime, failures, famrel, freetime...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


# DSCI 100 Project

### Title: The Effects of Lifestyle-Based Variables on the Math Grades of Students

### Introduction:

- Provide some relevant background information on the topic so that someone unfamiliar with it will be prepared to understand the rest of your proposal:
- Clearly state the question you will try to answer with your project:
- Identify and describe the dataset that will be used to answer the question:


### Preliminary exploratory data analysis

- Demonstrate that the dataset can be read from the web into R 
- Clean and wrangle your data into a tidy format
- Using only training data, summarize the data in at least one table (this is exploratory data analysis). An example of a useful table could be one that reports the number of observations in each class, the means of the predictor variables you plan to use in your analysis and how many rows have missing data. 
- Using only training data, visualize the data with at least one plot relevant to the analysis you plan to do (this is exploratory data analysis). An example of a useful visualization could be one that compares the distributions of each of the predictor variables you plan to use in your analysis.


In [None]:
url <- "https://raw.githubusercontent.com/chloezandberg/dsci-100-project/main/student-mat.csv"
math_data <- read_csv2(url)
math_data

In [None]:
clean_math_data <- math_data |>
    select(freetime, studytime, absences, G3) |> # G3 is the final grade in the Portuguese school system
    mutate(final_percentage = (G3 / 20) * 100) |> # final grade is out of 20
    # include.lowest = TRUE keeps a final grade of 0 in the lowest bin instead of turning it into NA
    mutate(grade_range = cut(final_percentage, c(0, 49, 54, 67, 79, 100), include.lowest = TRUE)) |>
    # fct_recode() takes new = old pairs; final_percentage now holds the letter grade
    mutate(final_percentage = fct_recode(grade_range,
                                           "F" = "[0,49]",
                                           "D" = "(49,54]",
                                           "C" = "(54,67]",
                                           "B" = "(67,79]",
                                           "A" = "(79,100]"))
                                           
clean_math_data
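
The exploratory steps above are meant to use only training data, so the cleaned data would need to be split first. The following is a minimal sketch of one way to do that, assuming the rsample package (part of tidymodels) is installed. The 75/25 proportion, the seed, and the names `toy_data`, `math_split`, `math_train`, and `math_test` are illustrative assumptions, and `toy_data` is a made-up stand-in for `clean_math_data`.

```r
library(tibble)
library(rsample)

set.seed(1234) # arbitrary seed so the split is reproducible

# Made-up stand-in for clean_math_data (values are illustrative only)
toy_data <- tibble(
    freetime = rep(1:5, times = 8),
    studytime = rep(1:4, times = 10),
    absences = rep(c(0, 2, 4, 6, 8, 10, 12, 14), times = 5),
    final_percentage = factor(rep(c("A", "B", "C", "F"), each = 10))
)

# Stratifying on the label keeps the grade proportions similar in both sets
math_split <- initial_split(toy_data, prop = 0.75, strata = final_percentage)
math_train <- training(math_split)
math_test <- testing(math_split)

nrow(math_train)
nrow(math_test)
```

Replacing `toy_data` with `clean_math_data` would give training and testing sets for the real data; the summary table and plot would then be built from the training set only.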

In [None]:
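# mean of each predictor, returned as a one-row tibble
avg_predictors <- clean_math_data |>
            select(freetime, studytime, absences) |>
            summarize(across(everything(), ~ mean(.x, na.rm = TRUE)))

# n() only works inside verbs such as summarize(), so count rows with nrow()
number_observations <- nrow(clean_math_data)

observations_tibble <- tibble(observations = number_observations)

# bind_cols() places the count beside the means in a single row
exploratory_data_analysis <- bind_cols(observations_tibble, avg_predictors)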

exploratory_data_analysis
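
The project prompt also suggests reporting the number of observations in each class and how many rows have missing data. A minimal sketch of such a table follows, assuming dplyr and tibble are installed. `toy_train` is a made-up stand-in for the training data, and the names `class_summary` and `rows_with_missing` are hypothetical.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for the training data (values are illustrative only)
toy_train <- tibble(
    freetime = c(3, 2, 4, NA, 5, 1),
    studytime = c(2, 3, 1, 2, 4, 2),
    absences = c(6, 0, 10, 2, NA, 4),
    final_percentage = factor(c("C", "A", "F", "B", "A", "F"))
)

# Number of observations and predictor means within each grade class
class_summary <- toy_train |>
    group_by(final_percentage) |>
    summarize(
        observations = n(),
        mean_freetime = mean(freetime, na.rm = TRUE),
        mean_studytime = mean(studytime, na.rm = TRUE),
        mean_absences = mean(absences, na.rm = TRUE)
    )

# Rows with at least one missing value in any column
rows_with_missing <- sum(!complete.cases(toy_train))

class_summary
rows_with_missing
```

Grouping by the class label shows whether some grades are much rarer than others, which matters when judging a classifier's accuracy later.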

In [None]:
math_plot <- ggplot(clean_math_data, aes(x = freetime, y = studytime, color = final_percentage)) +
                geom_point(alpha = 0.4) +
                labs(x = "Free time (1-5)",
                     y = "Study time (1-4)",
                     color = "Final letter grade (based on UBC marking scheme)") +
                ggtitle("Free time vs. study time for Portuguese math students")

math_plot

### Methods:


- Explain how you will conduct your data analysis and which variables/columns you will use. Note - you do not need to use all variables/columns that exist in the raw data set. In fact, that's often not a good idea. For each variable think: is this a useful variable for prediction?
- Describe at least one way that you will visualize the results


- We will focus our analysis on the numerical variables that correlate most strongly with the final grade, so that any trend is easier to see. We intend to use lifestyle-based variables (study time, free time, and absences) to examine their effects on the final math grades of the students in the dataset.

- One way we will visualize the results is with scatterplots that place each predictor variable on the x-axis and the grade percentage on the y-axis.

- Another way to visualize the data is a bar graph comparing the correlation coefficient between each chosen variable and the final math grade, in order to display how strongly, if at all, each variable is associated with the final grade.
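
The correlation bar graph described above could be sketched as follows. This is a minimal example under stated assumptions: `toy_data` is a made-up stand-in for the real data, the names `correlations` and `correlation_plot` are hypothetical, and Pearson correlation (the default for `cor()`) is an assumed choice; a rank-based method such as Spearman may suit the ordinal 1-5 scales better.

```r
library(tidyverse)

# Made-up stand-in for the student data (values are illustrative only)
toy_data <- tibble(
    freetime = c(3, 2, 4, 5, 1, 3, 2, 4),
    studytime = c(2, 3, 1, 1, 4, 2, 3, 2),
    absences = c(6, 0, 10, 12, 2, 4, 1, 8),
    G3 = c(11, 15, 8, 6, 17, 12, 14, 9)
)

# Correlation of each predictor with the final grade, one row per variable
correlations <- toy_data |>
    summarize(across(c(freetime, studytime, absences), ~ cor(.x, G3))) |>
    pivot_longer(everything(), names_to = "variable", values_to = "correlation")

# Bar graph comparing the correlation coefficients
correlation_plot <- ggplot(correlations, aes(x = variable, y = correlation)) +
    geom_col() +
    labs(x = "Predictor variable",
         y = "Correlation with final grade (G3)") +
    ggtitle("Correlation of each predictor with final grade")

correlation_plot
```

Bars above zero indicate a positive association with the final grade and bars below zero a negative one; correlation alone does not show that a variable causes the change in grade.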



### Expected outcomes and significance:
- This classification model is intended to provide insight into how study time, free time, and absences affect student performance. We aim to measure and understand how well these variables predict grade categories (e.g., A, B, C, D, F). For example, more study time may be linked to better grades; free time may be linked to both academic success and general well-being; and frequent absences may hurt grades. We anticipate identifying which lifestyle and study habits are most predictive of academic success, including how study time affects grades, how important managing free time is for academic productivity, and how frequent absences harm students' academic performance.
- Our project's findings could influence curriculum development and educational policy, and could inspire targeted intervention initiatives. By understanding the connections between study time, free time, absenteeism, and academic performance, educators and policymakers may create curricula and policies that support productive study habits, encourage leisure activities, and stress the value of consistent attendance. With such policies in place, schools would be able to provide at-risk students with specialized mental health services, attendance improvement plans, and individual tutoring. In the end, these adjustments might improve learning outcomes, foster student growth, and prepare students for further education and employment.
- Future questions: How could schools and educational institutions measure the long-term success of personalized intervention strategies based on students' specific patterns in study time, free time, and attendance? How might the interaction between study habits, leisure time, and attendance during formative educational years influence cognitive development and decision-making skills into adulthood?

