# Exploratory Data Analysis of Student Self-Study Habits

## 1. Executive Summary
This project examines student self-study behaviors, particularly those of female students, utilizing Exploratory Data Analysis (EDA). The project leverages data gathered from student surveys, which include information on the subject studied, the start and end times of study sessions, and the Rate of Perceived Exhaustion (RPE). The analysis aims to provide insights into various aspects of student self-study habits, with an emphasis on female students.


### Key areas of focus:
- Average Study Duration: The analysis determines the average amount of time female students dedicate to self-study. This is a crucial metric for understanding their commitment to independent learning.
- Average RPE: The project calculates the average RPE experienced by female students during self-study sessions. Understanding RPE levels can help assess the mental and physical effort involved in their study practices.
- Prime Study Times: The analysis identifies specific time slots during which students tend to study for longer durations or report lower RPE levels. This insight can be valuable for optimizing study schedules and improving learning efficiency.
- Subject-wise Trends: The project investigates the relationship between the choice of subject and both study duration and RPE. This exploration aims to uncover if certain subjects require longer study hours or are perceived as more demanding.
  

The findings of this project will be beneficial for:
- Students: By understanding their self-study patterns, students can tailor their learning strategies and schedules to enhance their learning effectiveness.
- Educators: The insights can help educators design interventions and provide guidance to students on effective self-study practices.
- Parents: The analysis can inform parents about their children's study habits, allowing them to provide appropriate support and encouragement.

Overall, this project aims to contribute to a better understanding of student self-study habits and inform strategies for improving their learning outcomes.

---

## 2. Introduction

### Project Background
Understanding student self-study habits is crucial for optimizing learning outcomes. This project analyzes data collected through student surveys to gain insights into their study patterns and identify potential areas for improvement.
### Project Objectives
This project aims to achieve the following objectives through a detailed analysis of student self-reported data:
- Determine the average study duration for students. 
- Calculate the average Rate of Perceived Exhaustion (RPE) experienced by students during self-study. By measuring RPE, the project aims to assess the mental and physical effort invested by students during their study sessions.
- Identify prime study times based on duration and RPE levels. This objective involves analyzing the relationship between study time slots and both duration and RPE to determine when students are most engaged and productive.
- Investigate the relationship between subject, duration, and RPE. This analysis aims to uncover whether specific subjects demand longer study hours or lead to higher levels of perceived exertion, offering insights into subject-specific study strategies.
  
### Scope of the Study
This study is focused on analyzing self-reported data collected from a sample of students through a structured survey. The data encompasses information on the subject studied, the start and end times of study sessions, and the Rate of Perceived Exhaustion (RPE). The scope of the study is limited to exploring the relationships between these variables, and the findings may not be generalizable to all student populations due to the limited sample size and potential biases in self-reported data.
    
### Research Questions
This project is guided by the following research questions:
- How much time do students dedicate to self-study on average? This question aims to quantify the average duration of self-study sessions for female students, providing a basic understanding of their time allocation.
- What is the average RPE reported by students during self-study sessions? This question seeks to understand the general level of perceived exertion experienced by students while studying independently.
- Are there specific time slots when students study for longer durations or experience lower RPE? This question explores the concept of prime study times, aiming to identify periods when students are most effective in their self-study efforts.
- Does the choice of subject influence study duration or RPE? This question investigates the potential impact of different subjects on study habits, seeking to understand whether certain subjects require more time or induce higher levels of perceived exertion.

---

## 3. Methodology
This section outlines the methodology employed in this project to analyze student self-study behaviors. The approach involves utilizing Python libraries for data manipulation and visualization, guided by the research questions established in the introduction.

### 3.1 Data Collection

The data for this project was collected through a structured survey administered to a sample of students. The survey included questions about:
- The subject studied during each self-study session
- The start and end times of each session
- The student's perceived level of exhaustion, rated on the Rate of Perceived Exhaustion (RPE) scale from 1 (least) to 5 (maximum)

The RPE scale is commonly used in exercise science to measure subjective feelings of exertion during physical activity, and it can also be applied to assess mental effort during cognitive tasks like studying.

### 3.2 Data Preprocessing

After data collection, the survey responses were compiled into a comma-separated value (CSV) file named 'timeSheet.csv'. This file was then imported into a Python environment using the pandas library for data analysis and manipulation. The raw data underwent several preprocessing steps to ensure its quality and suitability for analysis.
Key preprocessing steps included:
- Removing Irrelevant Columns: Columns containing irrelevant information, such as 'Date' and 'Email Address', were dropped from the dataset using the `df.drop()` function in `pandas`.
- Extracting Date and Time: The 'Timestamp' column, which contains both date and time information, was converted to datetime objects using the `pd.to_datetime()` function. Separate 'dates' and 'time' columns were then extracted with the `dt.date` and `dt.time` attributes, and the original 'Timestamp' column was dropped from the dataset.<br>
`df['Timestamp'] = df['Timestamp'].apply(lambda x : pd.to_datetime(str(x)))`<br>
`df['dates'] = df['Timestamp'].dt.date`<br>
`df['time'] = df['Timestamp'].dt.time`<br>
- Calculating Duration: The 'Start Time' and 'End Time' columns were converted to datetime objects using `pd.to_datetime()`. The duration of each session was then calculated by subtracting 'Start Time' from 'End Time' and stored in a new 'Duration' column.<br>
`df['Start Time'] = df['Start Time'].apply(lambda x : pd.to_datetime(str(x)))`<br>
`df['End Time'] = df['End Time'].apply(lambda x : pd.to_datetime(str(x)))`<br>
`df['Duration'] = df['End Time'] - df['Start Time']`<br>
- Converting Duration to Readable Format: Initially, the 'Duration' column includes a "`0 days`" prefix followed by the actual duration in hours, minutes, and seconds (e.g., "`0 days 01:30:00`"). The following snippet removes the "`0 days`" prefix and keeps only the time portion for better readability:<br>
`df['Duration'] = df['Duration'].astype(str).str.split('0 days ').str[-1]`<br>
Here is a breakdown of the code:<br>
`df['Duration'].astype(str)`: This converts the 'Duration' column to string data type, allowing string operations to be applied.<br>
`.str.split('0 days ')`: This splits each string in the 'Duration' column at the "0 days " delimiter, resulting in a list of strings for each row.<br>
`.str[-1]`: This selects the last element from the list of strings generated by the split operation. As the duration time is the last part after the split, this effectively extracts the desired time portion.<br>
By applying this code, the 'Duration' column will be transformed to contain only the time part of the duration, making it easier to read and work with in subsequent analyses. For instance, "0 days 01:30:00" would become "01:30:00".
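
The preprocessing steps above can also be sketched with vectorized calls that keep the duration numeric, which avoids converting it to a string and parsing it again later. This is only an illustration, not the project's code: the sample rows are made up, the `HH:MM:SS` time format is an assumption about how the survey stored times, and the `Duration (min)` column name is hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for timeSheet.csv; column names follow the text.
df = pd.DataFrame({
    "Timestamp": ["2024-01-05 21:15:00", "2024-01-06 08:40:00"],
    "Start Time": ["19:30:00", "07:00:00"],
    "End Time": ["21:00:00", "08:15:00"],
})

# Parse the whole column at once instead of applying a lambda row by row.
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df["dates"] = df["Timestamp"].dt.date
df["time"] = df["Timestamp"].dt.time

# Assumed format: time-only strings such as "19:30:00".
start = pd.to_datetime(df["Start Time"], format="%H:%M:%S")
end = pd.to_datetime(df["End Time"], format="%H:%M:%S")

# The difference is a Timedelta; express it in minutes for later averaging.
df["Duration"] = end - start
df["Duration (min)"] = df["Duration"].dt.total_seconds() / 60

print(df[["dates", "time", "Duration (min)"]])
```

Keeping a numeric minutes column means later steps such as `groupby(...).mean()` can work on it directly.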


### 3.3 Exploratory Data Analysis (EDA)
The preprocessed data was then subjected to EDA using a combination of pandas and matplotlib, a Python plotting library. EDA techniques were employed to:
- Calculate Descriptive Statistics: The `df.describe()` function in pandas was used to obtain descriptive statistics for the 'RPE' column, such as mean, standard deviation, minimum, and maximum values. This provided an initial overview of the distribution of perceived exertion levels.
- Grouping and Aggregation: The data was grouped by different variables, such as 'Name' and 'Subject', using the `groupby()` function in pandas. Aggregation functions like `mean()` were then applied to calculate average RPE and duration for each group.
- Data Visualization: Bar plots were generated using matplotlib to visualize the average RPE and duration for different groups, including by student name, by subject, and by time of day. These visualizations helped identify patterns and trends in the data.
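
As a rough illustration of the descriptive-statistics and grouping steps listed above, the sketch below runs `describe()` and `groupby().mean()` on a small made-up dataset. The student names and subjects follow the report, but every number is hypothetical, and the 'Duration' column is assumed to already be in minutes.

```python
import pandas as pd

# Hypothetical sessions; values are invented for illustration only.
df = pd.DataFrame({
    "Name": ["Nupur", "Nupur", "Jui", "Jui", "Swara", "Swara"],
    "Subject": ["English", "Physics", "Chemistry", "Chemistry", "Math1", "Physics"],
    "RPE": [2, 3, 3, 3, 2, 2],
    "Duration": [100, 90, 80, 70, 130, 100],  # minutes
})

# Summary statistics for RPE (count, mean, std, min, quartiles, max).
rpe_stats = df["RPE"].describe()

# Average RPE and duration per student, then per subject.
by_student = df.groupby("Name")[["RPE", "Duration"]].mean()
by_subject = df.groupby("Subject")[["RPE", "Duration"]].mean()

print(rpe_stats)
print(by_student)
print(by_subject)
```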


<img title="a title" alt="Alt text" src="avgRPE.png" width = 800>

<img title="a title" alt="Alt text" src="avgDur.png" width=800>

#### Code snippet for average duration (shown for Jui; the same steps were applied to every student)
`newdf = jui_df.drop(['dates', 'time','StartTime','EndTime','RPE','Name'],axis =1)`<br>
`newdf['Duration'] = newdf['Duration'].apply(lambda x : pd.to_datetime(str(x))) #converting str to date-time`<br>
`newdf['Hours'] = newdf['Duration'].dt.hour`<br>
`newdf['Minutes'] = newdf['Duration'].dt.minute`<br>
`newdf = newdf.drop(['Duration'],axis =1)`<br>
`newdf['Duration']= newdf['Minutes']+ newdf['Hours']*60`<br>
`newdf = newdf.drop(['Hours','Minutes'],axis = 1)`<br>
`newdf.groupby('Subject').mean()`<br>
<br>
#### Code snippet for average RPE (shown for Jui; the same steps were applied to every student)
`newdf = jui_df.drop(['dates', 'time','StartTime','EndTime','Duration','Name'],axis =1)`<br>
`newdf.groupby('Subject').mean()`<br>

#### Showcasing resulting dataframes :

#### Swara : <br>
<img title="a title" alt="Alt text" src="swara.png" width = 500><br>
#### Jui : <br>
<img title="a title" alt="Alt text" src="jui.png" width = 500><br>
#### Nupur : <br>
<img title="a title" alt="Alt text" src="nupur.png" width = 500><br>

### 3.3.1 Average Duration Calculation

<img title="a title" alt="Alt text" src="duration.png" width = 800>

`import matplotlib.pyplot as plt`<br>
`df12 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Biology': [70, 0, 0]})`<br>
`df13 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Chemistry': [60, 79.15, 98.50]})`<br>
`df14 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'English': [100, 97.5, 107.33]})`<br>
`df15 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Math1': [80, 74.58, 135.0]})`<br>
`df16 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Math2': [65, 79.66, 101.0]})`<br>
`df17 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Physics': [96.34, 75.09, 98.55]})`<br>

`#use concat to combine more than two DataFrames`<br>
`df = pd.concat([df12.set_index('Name'), df13.set_index('Name'), df14.set_index('Name'), df15.set_index('Name'), df16.set_index('Name'), df17.set_index('Name')], axis=1)`<br>
`df`<br>

<img title="a title" alt="Alt text" src="durDF.png" width = 600>

`#plot the DataFrame`<br>
`ax = df.plot(kind='bar', logy=True, figsize=(20, 10), rot=0, ylabel='Duration', title="Average Duration of the each subject given by students")`<br>
`ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')`<br>
`for c in ax.containers:`<br>
    `ax.bar_label(c, fmt='%.2f', label_type='edge')`<br>
`plt.show()`<br>

Transpose of the above graph : <br>
<img title="a title" alt="Alt text" src="durationTrans.png" width = 800>
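
The wide student-by-subject table above is assembled by hand from previously computed averages. One possible alternative, sketched here under the assumption that the sessions are available in long form (one row per session, duration in minutes), is to let `pivot_table` build the same shape and use `.T` for the transposed view. The data and the names `sessions`, `wide`, and `by_subject` are hypothetical.

```python
import pandas as pd

# Hypothetical long-form sessions (one row per session, duration in minutes).
sessions = pd.DataFrame({
    "Name": ["Nupur", "Nupur", "Jui", "Jui", "Swara", "Swara"],
    "Subject": ["English", "Biology", "English", "Physics", "English", "Physics"],
    "Duration": [100, 70, 95, 75, 110, 100],
})

# One row per student, one column per subject, cells hold the mean duration.
# fill_value=0 mirrors the zeros used above for subjects a student never studied.
wide = sessions.pivot_table(index="Name", columns="Subject",
                            values="Duration", aggfunc="mean", fill_value=0)

# Transposing swaps the roles: one row per subject, one column per student.
by_subject = wide.T

print(wide)
print(by_subject)
```

Calling `wide.plot(kind='bar')` or `by_subject.plot(kind='bar')` would then give the two orientations of the chart.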

### 3.3.2 Average RPE Calculation

<img title="a title" alt="Alt text" src="rpe.png" width = 800>

`df12 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Biology': [1.8, 0, 0]})`<br>
`df13 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Chemistry': [2.0, 3.0, 2.1]})`<br>
`df14 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'English': [1.5, 2.0, 2.0]})`<br>
`df15 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Math1': [3.0, 2.58, 2.0]})`<br>
`df16 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Math2': [2.5, 2.58, 2.2]})`<br>
`df17 = pd.DataFrame({'Name': ['Nupur', 'Jui', 'Swara'], 'Physics': [2.8, 3.0, 2.3]})`<br>

`#use concat to combine more than two DataFrames`<br>
`df = pd.concat([df12.set_index('Name'), df13.set_index('Name'), df14.set_index('Name'), df15.set_index('Name'), df16.set_index('Name'), df17.set_index('Name')], axis=1)`<br>
`df`<br>

<img title="a title" alt="Alt text" src="rpeDf.png" width= 500>

`#plot the DataFrame`<br>
`ax = df.plot(kind='bar', logy=True, figsize=(20, 10), rot=0, ylabel='RPE', title="Average RPE of the each subject felt by students")`<br>
`ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')`<br>
`for c in ax.containers:`<br>
    `ax.bar_label(c, fmt='%.2f', label_type='edge')`<br>
`plt.show()`<br>


### 3.3.3 Prime Time Analysis
To identify prime study times, the analysis focused on examining the relationship between the time of day and both study duration and RPE. The 'StartTime' column was used to categorize study sessions into different time slots:
- Dawn (00:00 to 6:00)
- Morning (6:00 to 12:00)
- Afternoon (12:00 to 18:00)
- Evening (18:00 to 00:00)

This categorization allowed for comparisons of average duration and RPE across different time periods. Overlapping bar plots were used to visualize the relationship between duration and RPE for each time slot, providing a clear representation of prime study times for individual students.<br>
Code snippet:<br>
`df = original_df.copy()`<br>
`newdf = df.drop(['Subject','RPE','dates','time'], axis=1)`<br>
`newdf['Duration'] = newdf['Duration'].apply(lambda x : pd.to_datetime(str(x))) #converting str to date-time`<br>
`newdf['Hours'] = newdf['Duration'].dt.hour`<br>
`newdf['Minutes'] = newdf['Duration'].dt.minute`<br>
`newdf = newdf.drop(['Duration'],axis =1)`<br>
`newdf['Duration']= newdf['Minutes']+ newdf['Hours']*60`<br>
`newdf = newdf.drop(['Hours','Minutes'],axis = 1)`<br>
`newdf['tod'] = np.nan`<br>
`def ftod(x):`<br>
    `if 0 <= x < 6:`<br>
        `tod = 'dawn'`<br>
    `elif 6 <= x < 12:`<br>
        `tod = 'morning'`<br>
    `elif 12 <= x < 18:`<br>
        `tod = 'afternoon'`<br>
    `else:`<br>
        `tod = 'evening'`<br>
    `return tod`<br>
`newdf['tod'] = newdf['StartTime'].dt.hour.map(ftod) `<br>
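
The same four time slots can also be assigned without a helper function by binning the start hour with `pd.cut`. This is a minimal sketch, not the code used in the project; the start times are hypothetical, and `right=False` is what makes each bin include its lower edge (so 06:00 counts as morning, as in the definitions above).

```python
import pandas as pd

# Hypothetical start times; only the hour matters for the slot.
start = pd.Series(pd.to_datetime([
    "2024-01-05 05:30", "2024-01-05 06:00", "2024-01-05 13:45",
    "2024-01-05 18:00", "2024-01-05 23:59",
]))

# Left-closed bins: [0, 6), [6, 12), [12, 18), [18, 24).
tod = pd.cut(start.dt.hour, bins=[0, 6, 12, 18, 24],
             labels=["dawn", "morning", "afternoon", "evening"], right=False)

print(tod.tolist())
```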


This is what the output looks like:<br>
<img title="a title" alt="Alt text" src="demo.png" width = 600>

After creating separate data frames for each student:


<img title="a title" alt="Alt text" src="dfSeparate.png" width = 600>

Plotting average duration against time slots:

<img title="a title" alt="Alt text" src="durationTimeSlot.png" width = 800>

Plotting its transpose for comparison:

<img title="a title" alt="Alt text" src="durationTimeSlots.png" width = 800>

Similarly, plotting RPE against time slots:

<img title="a title" alt="Alt text" src="rpeTimeSlots.png" width = 800>

### 3.3.4 Personal Prime Time Analysis
For this analysis, a student's prime time was taken to be the time slot with the highest RPE and the longest duration. Joint bar graphs are used so that the two measures can be compared. The maximum attainable duration is assumed to be 3 hours and the maximum RPE 5; duration is shown in hours. In each graph, the transparent bar shows the maximum and the foreground bar shows the actual value.


<img title="a title" alt="Alt text" src="nupurPT.png" width = 700>
<img title="a title" alt="Alt text" src="swaraPT.png" width = 700>
<img title="a title" alt="Alt text" src="juiPT.png" width = 700>

For comparative analysis:

<img title="a title" alt="Alt text" src="compPT.png" width = 700>
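
The overlaid bars described in this section (a transparent bar for the ceiling, a solid bar for the observed value) could be drawn with matplotlib roughly as follows. This is a sketch under stated assumptions: the per-slot averages are invented, the side-by-side two-panel layout is a guess rather than the layout of the figures above, and only the 3-hour and 5-point ceilings come from the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-slot averages for one student (duration in hours).
slots = pd.DataFrame(
    {"Duration": [1.2, 1.5, 1.1, 1.8], "RPE": [2.5, 2.0, 3.0, 2.4]},
    index=["dawn", "morning", "afternoon", "evening"],
)
MAX_DURATION_H, MAX_RPE = 3, 5  # ceilings stated in the text

fig, (ax_d, ax_r) = plt.subplots(1, 2, figsize=(10, 4))
for ax, col, ceiling in [(ax_d, "Duration", MAX_DURATION_H), (ax_r, "RPE", MAX_RPE)]:
    # Transparent background bar marks the ceiling ...
    ax.bar(slots.index, ceiling, alpha=0.25, color="tab:blue")
    # ... and the solid foreground bar shows the observed average.
    ax.bar(slots.index, slots[col], color="tab:blue")
    ax.set_ylim(0, ceiling)
    ax.set_title(col)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or save the figure with `fig.savefig(...)`.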

### 3.3.5 Analysis based on number of sessions 

`create_subject_summary(df)` takes a single student's dataframe as input and returns a dataframe that summarizes, for each subject, the total number of sessions, the average RPE across those sessions, and the average duration across those sessions.

`def create_subject_summary(df):`<br>
    `"""`<br>
    `Creates a summary dataframe with sessions count, average RPE and duration (in hours) for each subject.`<br>
    `"""`<br>
    `summary = df.groupby('Subject').agg({`<br>
        `'RPE': ['count', 'mean'],`<br>
        `'Duration': 'mean'`<br>
    `})`<br>
    `summary.columns = ['Sessions', 'Avg RPE', 'Avg Dura']` <br>
    `summary = summary.reset_index()`<br>
    `summary['Avg Dura'] = summary['Avg Dura'] / 60  # Convert minutes to hours`<br>
    `summary['Avg RPE'] = summary['Avg RPE'].round(2)`<br>
    `summary['Avg Dura'] = summary['Avg Dura'].round(2)`<br>
    `return summary`<br>
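
For comparison, the same per-subject summary can be written with pandas named aggregation, which sets the output column names in the `agg` call instead of overwriting `summary.columns` afterwards. This is an illustrative variant, not the function used in the project: the sample data is hypothetical, 'Duration' is assumed to be in minutes, and the column names use underscores (`Avg_RPE`, `Avg_Dura`) because named-aggregation keywords must be valid Python identifiers.

```python
import pandas as pd

# Hypothetical sessions for one student; Duration is in minutes.
student_df = pd.DataFrame({
    "Subject": ["Chemistry", "Chemistry", "English", "Physics"],
    "RPE": [3, 2, 2, 3],
    "Duration": [90, 60, 120, 75],
})

summary = (
    student_df.groupby("Subject")
    .agg(Sessions=("RPE", "count"),
         Avg_RPE=("RPE", "mean"),
         Avg_Dura=("Duration", "mean"))
    .reset_index()
)
summary["Avg_Dura"] = (summary["Avg_Dura"] / 60).round(2)  # minutes -> hours
summary["Avg_RPE"] = summary["Avg_RPE"].round(2)

print(summary)
```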

    
---
    

`plot_subject_summary(summary_df)` plots the summary dataframe as grouped bar graphs. It relies on Plotly, so it assumes `import plotly.graph_objects as go` has been run beforehand.

`def plot_subject_summary(summary_df):`<br>
    `"""`<br>
    `Creates a grouped bar plot showing Sessions, Avg RPE, and Avg Duration for each subject.`  <br>
    `Parameters:`<br>
    `summary_df (pandas.DataFrame): Summary dataframe from create_subject_summary function`<br>
    `"""`<br>
    `fig = go.Figure(data=[`<br>
        `go.Bar(name='Sessions', `<br>
               `x=summary_df['Subject'], `<br>
               `y=summary_df['Sessions'],`<br>
               `marker_color='rgb(55, 83, 109)',`<br>
               `text=summary_df['Sessions'],`<br>
               `textposition='auto'),`<br>   
        `go.Bar(name='Average RPE', `<br>
               `x=summary_df['Subject'], `<br>
               `y=summary_df['Avg RPE'],`<br>
               `marker_color='rgb(26, 118, 255)',`<br>
               `text=summary_df['Avg RPE'].round(2),`<br>
               `textposition='auto'),`<br>
        `go.Bar(name='Average Duration', `<br>
               `x=summary_df['Subject'], `<br>
               `y=summary_df['Avg Dura'],`<br>
               `marker_color='rgb(158, 202, 225)',`<br>
               `text=summary_df['Avg Dura'].round(1),`<br>
               `textposition='auto')`<br>
    `])`<br>
    `# Update layout`<br>
    `fig.update_layout(`<br>
        `title='Subject-wise Analysis',`<br>
        `title_x=0,`<br>
        `barmode='group',`<br>
        `width=1000,`<br>
        `height=600,`<br>
        `yaxis_title='Values',`<br>
        `xaxis_title='Subjects',`<br>
        `bargap=0.15,`<br>
        `bargroupgap=0.1,`<br>
        `showlegend=True,`<br>
        `legend=dict(`<br>
            `yanchor="top",`<br>
            `y=1.15,`<br>
            `xanchor="center",`<br>
            `x=0.5,`<br>
            `orientation="h"`<br>
        `)`<br>
    `)`<br>
    `return fig`<br>

---

#### Nupur : <br>
<img title="a title" alt="Alt text" src="nupur_PT.png" width = 700><br>
#### Swara : <br>
<img title="a title" alt="Alt text" src="swara_PT.png" width = 700><br>
#### Jui : <br>
<img title="a title" alt="Alt text" src="jui_PT.png" width = 700><br>

### Comparative Analysis : 

<img title="a title" alt="Alt text" src="comp_PT.png" width = 900><br>

### Conclusion :
The methodology employed in this project involved a systematic approach to data collection, preprocessing, EDA, and visualization using Python libraries. The findings from this analysis are presented in the subsequent sections of this report, offering insights into the self-study habits of female students.


## 4. Data Analysis and Results
The analysis explored average RPE, average duration, and prime study times. The data was further segmented by subject and student to understand individual study patterns.

**Key Findings and Discussion:**

- Average RPE: The average RPE across all students was 2.61 (out of 5), indicating moderate levels of perceived exertion during self-study.
- Average Duration: The average study duration varied among students. A per-student breakdown follows below.
- Prime Study Times: Students exhibited different prime study times, with some favoring mornings while others were more productive in the afternoon or evening. A breakdown of each student's prime study times follows below.
- Subject-wise Analysis: Both average duration and RPE showed variations across different subjects. Subjects like English and Math-I tended to have longer average durations, while subjects like Chemistry and Physics often resulted in higher RPE scores.

The graphs presented in Section 3 support the following detailed observations.

**Overall Study Patterns:**

* **Swara demonstrates the highest average study duration across all subjects**, followed closely by Jui. This suggests a greater time commitment to studying compared to Nupur. 
* **Jui consistently reports higher RPE levels across most subjects**, indicating she might perceive the subjects as more challenging or requiring greater effort. 

**Average Study Duration and RPE by Subject:**

* **Swara's significant time investment in Maths-I** suggests a potential focus or interest in this subject. Her average duration for Maths-I is the highest among all subjects and students.
* **Nupur's highest average duration is for English**, implying a possible preference or strength in this area. This is further supported by her RPE for English (1.5), which is her lowest across subjects, suggesting that she studies longest where the perceived effort is lowest.
* **Jui's strong tendency to favor Chemistry is evident by the highest number of study sessions dedicated to this subject.** This focused approach might indicate a particular interest or a specific preparation strategy. 

**Prime Time Analysis:**

* **Nupur's prime study time appears to be the evening**, exhibiting the highest average duration and a moderate RPE during this period.
* **Jui's study pattern is relatively balanced across different timeslots**,  with no significant peaks or troughs in duration or RPE. 
* **Swara's analysis highlights a preference for morning and dawn hours for study**. However, her highest RPE occurs in the dawn slot, suggesting potential fatigue or reduced focus during very early hours. 

**Session-wise Insights:**

* Analyzing session durations reveals **variability in study session lengths for each student and subject**.  Some sessions are short and focused, while others are extended.
* **Examining RPE in conjunction with session duration can provide insights into individual productivity levels.** For instance, a high RPE coupled with a shorter duration might indicate a struggle to maintain focus or a high level of perceived difficulty, while a low RPE with a longer duration suggests a comfortable and productive study session.
* **Tracking session-wise data over time can reveal trends in study habits and help identify potential areas for improvement.**  For example, if a student consistently has short sessions with high RPE for a specific subject, it might indicate a need for additional support or a change in study strategies.
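
The duration and RPE patterns described above could be turned into a simple per-session label, as in the sketch below. The cut-offs (`SHORT_MIN`, `HIGH_RPE`, `LONG_MIN`, `LOW_RPE`), the label names, and the sample sessions are all hypothetical and are not values taken from the study; they would need to be chosen for the actual data.

```python
import pandas as pd

# Hypothetical sessions; Duration in minutes, RPE on the 1-5 scale.
sessions = pd.DataFrame({
    "Subject": ["Physics", "Physics", "English", "Chemistry"],
    "Duration": [35, 40, 110, 95],
    "RPE": [4, 5, 2, 3],
})

# Illustrative cut-offs, not values taken from the study.
SHORT_MIN, HIGH_RPE, LONG_MIN, LOW_RPE = 45, 4, 90, 2

def label(row):
    """Classify one session by its duration and RPE."""
    if row["Duration"] <= SHORT_MIN and row["RPE"] >= HIGH_RPE:
        return "struggling"   # short but effortful
    if row["Duration"] >= LONG_MIN and row["RPE"] <= LOW_RPE:
        return "comfortable"  # long and low effort
    return "neutral"

sessions["Pattern"] = sessions.apply(label, axis=1)

# Subjects where every session is short and effortful may need extra support.
needs_support = (
    sessions.assign(flag=sessions["Pattern"].eq("struggling"))
    .groupby("Subject")["flag"].all()
)
print(sessions)
print(needs_support[needs_support].index.tolist())
```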



## 5. Conclusion

This project analyzes student study patterns using a dataset containing information about study subjects, durations, and perceived exertion levels (RPE). Through a series of data manipulations and visualizations, it uncovers key insights into the distinct study habits and preferences of three students: Nupur, Swara, and Jui.

The following key conclusions emerge from the analysis:

*   **Individualized Study Habits:** Each student exhibits unique study patterns. **Swara emerges as the most dedicated student in terms of time invested**, with the highest average study duration across subjects. **Jui, while spending slightly less time, consistently reports higher RPE scores**, suggesting she perceives the subjects as more demanding. **Nupur, on the other hand, shows the most varied pattern**, with her study durations and RPE levels differing significantly across subjects and times of day.

*   **Prime Time Variability:**  Analyzing study durations and RPE across different times of day reveals the importance of **personalized study scheduling**.  **Nupur's prime study time appears to be the evening**, as evidenced by her longer durations and moderate RPE during this period.  **Jui's study pattern is relatively consistent across timeslots**,  while **Swara demonstrates a preference for mornings and dawn**,  despite experiencing higher RPE in the very early hours.

*   **Subject-Specific Insights:** The analysis reveals **strong subject preferences** influencing study patterns. **Swara's significant time investment in Maths-I points toward a particular interest or focus on this subject.** **Nupur's preference for English**,  marked by longer durations and lower RPE,  suggests a potential strength in this area.  **Jui's focused dedication to Chemistry, evident through a high number of sessions, hints at a strategic approach to mastering the subject.**

*   **Session-Level Analysis:** **Analyzing session durations and RPE together provides a nuanced understanding of student productivity and focus.**  Some students engage in short, intense sessions, while others prefer longer, more relaxed periods. **The combination of duration and RPE reveals valuable insights into sessions where students thrive versus those where they might struggle**, helping to identify potential areas for improvement. 

By combining these key conclusions, the project effectively demonstrates the value of data-driven insights in understanding student study habits. **The visualizations and analyses serve as powerful tools for educators and students to develop tailored study plans that optimize learning effectiveness, time management, and overall academic performance.** By leveraging these insights, students can identify their prime study times, adjust study strategies based on subject-specific challenges, and improve their overall learning experience. 
