## COVID-19 and its impact on education, social life and mental health of students: A Survey
In this study, a cross-sectional survey is conducted with a sample size of 1182 students of different age groups from different educational institutions in Delhi National Capital Region (NCR).

### Task Details


Below are some of the suggestions you could use in your notebook:

> 1. Descriptive analysis of the given dataset.
> 2. Inferential statistical analysis:
> * Correlation between attributes, for example the statistical relationship between time spent on sleep and time spent on fitness.
> * Association between the age of the students (grouped into '7-17', '18-22', and '23-') and attributes such as health issue and change in weight (Pearson chi-square test).
> * Non-parametric tests to check for significant differences in the time spent on different activities across student age groups or regions of residence.
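
The age grouping and the Pearson chi-square test suggested above can be sketched as follows. This is a minimal illustration on made-up toy data, not on the survey itself; the column names `age` and `health_issue` and the bin edges are assumptions chosen to match the groups named in the task list.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data (made up for illustration; not taken from the survey).
toy = pd.DataFrame({
    'age': [9, 15, 17, 18, 20, 22, 21, 19, 25, 30, 41, 16],
    'health_issue': ['NO', 'NO', 'YES', 'NO', 'NO', 'YES',
                     'NO', 'NO', 'YES', 'YES', 'NO', 'NO'],
})

# Bin ages into the three groups named in the task list.
# Bin edges are right-inclusive: (6, 17], (17, 22], (22, inf).
toy['age_group'] = pd.cut(
    toy['age'],
    bins=[6, 17, 22, float('inf')],
    labels=['7-17', '18-22', '23-'],
)

# Contingency table of age group vs. health issue, then Pearson's chi-square test.
table = pd.crosstab(toy['age_group'], toy['health_issue'])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f'chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}')
```

With only a dozen rows the expected cell counts are far too small for the chi-square approximation to be reliable; the toy data is only there to show the mechanics.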

### Download dataset

In [None]:
#import kaggle

In [None]:
#!kaggle datasets download -d kunal28chaturvedi/covid19-and-its-impact-on-students 

In [None]:
#!unzip covid19-and-its-impact-on-students.zip

In [None]:
#import os
#os.rename('COVID-19 Survey Student Responses.csv', 'covid-19-survey.csv')

### Import libraries & dataset.

In [None]:
%%time
import pandas as pd
import numpy as np
import warnings

pd.options.display.max_columns = 200
pd.options.display.max_rows = 200
warnings.filterwarnings('ignore')

In [None]:
survey_df = pd.read_csv('../input/covid19-and-its-impact-on-students/COVID-19 Survey Student Responses.csv')
survey_df

In [None]:
survey_df.info();

### Note :
1. First, rename the columns to shorter names for convenience.
2. Data is missing in the columns __Rating of Online Class experience__ and __Medium for online class__.
3. The remaining columns look fine.
4. Several columns are stored as `object` and also need to be dealt with.

## Data preparation & data cleaning.

In [None]:
survey_df.rename(
    columns = {
        'Region of residence': 'region',
        'Age of Subject': 'age',
        'Time spent on Online Class': 'time_online_class',
        'Rating of Online Class experience': 'rating_online_class',
        'Medium for online class': 'medium',
        'Time spent on self study': 'time_self_study',
        'Time spent on fitness': 'time_fitness',
        'Time spent on sleep': 'time_sleep',
        'Time spent on social media': 'time_social_media',
        'Prefered social media platform': 'prefered_social_media',
        'Time spent on TV': 'time_tv',
        'Number of meals per day': 'num_meals_per_day',
        'Change in your weight': 'delta_weight',
        'Health issue during lockdown': 'health_issue_in_lockdown',
        'Stress busters': 'stress_busters',       
        'Time utilized': 'time_utilized',
        'Do you find yourself more connected with your family, close friends , relatives  ?': 'more_family_connected',
        'What you miss the most': 'miss_most'}, inplace = True)

del survey_df['ID']  # Deleting ID column since it is not useful.

survey_df.head()

In [None]:
survey_df.describe()

### Note :
1. Respondents studying online during COVID-19 range in age from 7 to 59.
2. Average time spent on online classes is roughly 3 to 3.5 hours.
3. Average time spent on self study is 2.9 hours, almost 3 hours.
4. Average time spent on fitness is about 1 hour.
5. Average time spent on social media is 2.36 hours.
6. On average, students eat nearly 3 meals per day.

In [None]:
print(survey_df['rating_online_class'].unique())

In [None]:
print(survey_df['medium'].unique())

In [None]:
survey_df[['rating_online_class', 'medium']].isnull().sum()

In [None]:
%%time
from sklearn.impute import SimpleImputer 
imputer = SimpleImputer(missing_values = np.nan, strategy = 'most_frequent').fit(survey_df[['rating_online_class', 'medium']])
survey_df[['rating_online_class', 'medium']] = imputer.transform(survey_df[['rating_online_class', 'medium']])
survey_df[['rating_online_class', 'medium']].isnull().sum()
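
The `most_frequent` strategy used above fills each missing entry with the column's most common value. A pandas-only sketch of the same idea, on a made-up toy column (the values below are hypothetical, not survey data):

```python
import numpy as np
import pandas as pd

# Toy column (made up) with two missing entries.
toy = pd.DataFrame({'medium': ['Laptop', 'Phone', np.nan, 'Laptop', np.nan, 'Tablet']})

# mode() ignores NaN and returns the most frequent value(s); take the first one.
most_frequent = toy['medium'].mode().iloc[0]
toy['medium'] = toy['medium'].fillna(most_frequent)

print(toy['medium'].tolist())
```

Note that filling with the mode inflates the count of the most common category, which matters when category counts are interpreted later.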

In [None]:
survey_df['prefered_social_media'].value_counts()

In [None]:
survey_df['prefered_social_media'].unique()

In [None]:
survey_df['prefered_social_media'].replace('None ', 'None', inplace = True)
survey_df['prefered_social_media'].replace('Whatsapp', 'WhatsApp', inplace = True)
survey_df['prefered_social_media'].value_counts()

In [None]:
survey_df['time_tv'].unique()

### Note 
1. The column contains many different kinds of responses.<br>
2. We replace responses such as 'n', 'N', 'No tv' and blank entries with __0__.<br>
3. We also change the datatype of this column from object to __float__.
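
An alternative to listing every non-numeric response by hand is to let pandas coerce unparseable entries. This is a sketch on made-up responses, and it rests on the assumption that every non-numeric answer should count as 0 hours, which would also turn typos into 0:

```python
import pandas as pd

# Toy responses (made up) mixing numbers stored as text with free-text answers.
raw = pd.Series(['1', '0.5', 'n', 'No tv', 'N', '2'])

# Anything that cannot be parsed as a number becomes NaN, which is then set to 0.
hours = pd.to_numeric(raw, errors='coerce').fillna(0.0)

print(hours.tolist())
```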

In [None]:
survey_df['time_tv'].replace({'n':'0', 'N':'0', 'No tv':'0', ' ':'0', 0:'0'}, inplace = True)
survey_df['time_tv'] = survey_df['time_tv'].astype('float', copy = True)

In [None]:
survey_df['stress_busters'].unique()

In [None]:
# to_replace is a flat list of strings, paired one-to-one with the replacement list.
survey_df['stress_busters'].replace([
    'Sleep',
    'Scrolling through social media',
    'Reading books',
    'Talking to your relatives'
], ['Sleeping', 'Social Media', 'Reading', 'Talking'], inplace = True)

In [None]:
survey_df['stress_busters'].replace([
    'Exercising','Exercise','Gym','Workout ','Cardio',
    'workout','working out and some physical activity'
],'Exercise/Gym', inplace = True)

In [None]:
survey_df['stress_busters'].replace([
    'Talking with friends ','Talking','Talking to friends','With a friend',
    'Calling friends','Taking with parents','Talk with childhood friends.',
], 'Talking', inplace = True)

In [None]:
survey_df['stress_busters'].replace([
    'Listening to music',' listening music, motion design, graphic design, sleeping.',
    'singing','Workout and listening music',
    'Both listining music and scrolling down social media',
    'Listening to music and reading books both . ',
    'Poetry, writing books and novels , listening to music too'
], 'Music', inplace = True)

In [None]:
survey_df['stress_busters'].replace([
    'Online surfing','live stream watching','Watching orgasm releasing videos','Anime Manga',
    'Watching ted talks and music and books','Watching YouTube ','Internet',
    'Online gaming , surfing and listening to music ','Web Series','Watching web series',
    'Netflix, Friends and Books','Youtube'
], 'Internet Surfing', inplace=True)

In [None]:
survey_df['stress_busters'].replace([
    'Coding and studying for exams',
    'sketching,reading books,meditation,songs',
    'Many of these',
    'All reading books watching web series listening to music and talking to friends',
    'Many among these ',
    'Do some home related stuff',
    'watching movies,reading books,games,listening to music,sleep,dancing',
    'Reading books, music, exercise',
    'Whatever want','listening to music,reading books and dancing.',
], 'Many Things', inplace=True)

In [None]:
survey_df['stress_busters'].replace([
    'Reading','drawing','Dancing','Meditation','Driving','Drawing, painting','Forming ','Painting','Sketching',
    'Sports','Painting ','Drawing','Football','Business','Running','I run','Drawing and painting and sketching',
    'I play Rubiks cube','Indoor Games','I cant de-stress myslef','Writing my own Comics & novels',
    'I have no problem of stress ','Sketching and writing','By engaging in my work.', 'Work',
    'Painting,. Sewing','Crying','Dont get distreessed','gardening cartoon','Playing ','no stress',
    'Cricket','No able to reduce the stress ','drawing ','Writing'
], 'Doing Extra Activities', inplace=True)

In [None]:
survey_df['stress_busters'].replace([
    'Sleeping, Online games',
    'pubg'
], 'Online gaming', inplace=True)

In [None]:
print(survey_df['stress_busters'].unique())

In [None]:
print(survey_df['stress_busters'].value_counts())
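
The grouping above enumerates every spelling variant explicitly. A more compact sketch, on made-up responses, normalises whitespace and case first and then maps to a category. The `category_map` dictionary is hypothetical, and the result is assigned back to the column rather than modified in place, which also works when pandas copy-on-write is enabled:

```python
import pandas as pd

# Toy responses (made up) with inconsistent casing and stray whitespace.
toy = pd.DataFrame({
    'stress_busters': ['Workout ', 'workout', 'Gym', 'Listening to music', 'singing']
})

# Hypothetical mapping from normalised text to a category label.
category_map = {
    'workout': 'Exercise/Gym',
    'gym': 'Exercise/Gym',
    'listening to music': 'Music',
    'singing': 'Music',
}

# Normalise first, then map; unmapped answers keep their original text.
normalised = toy['stress_busters'].str.strip().str.lower()
toy['stress_busters'] = normalised.map(category_map).fillna(toy['stress_busters'])

print(toy['stress_busters'].value_counts().to_dict())
```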

In [None]:
survey_df['miss_most'].unique()

In [None]:
survey_df['miss_most'].replace(
    [
        'All the above',
        'All of the above ',
        'everything',
        'All above',
        'all of the above',
        'ALL','all',
        'All of the above',
        'all of them',
        'All of them',
        'All '
    ],
    'All', inplace=True)

In [None]:
survey_df['miss_most'].replace(
    [
        'NOTHING',
        'Nothing this is my usual life',
        'To stay alone. ',
        'Nothing ',
        'Nah, this is my usual lifestyle anyway, just being lazy....',
        'Normal life',
        'My normal routine',
        'nothing',
        'Job',
        'I have missed nothing',
        'Previous mistakes',
        '.',
        'I have missed nothing ',
        'Internet'
    ],
    'Nothing', inplace=True)

In [None]:
survey_df['miss_most'].replace(
    [
        'Only friends',
        'Friends , relatives',
        'relatives and friends',
        'Family ',
        'The idea of being around fun loving people but this time has certainly made us all to reconnect (and fill the gap if any) with our families and relatives so it is fun but certainly we do miss hanging out with friends',
        'Family',
        'Friends, relatives & travelling',
        'Travelling & Friends',
        'School and friends',
        'Friends and School',
        'Eating outside and friends.',
        'School and friends.',
        'school, relatives and friends',
        'School and my school friends'
    ],
    'Friends/Relatives/Family', inplace=True)

In [None]:
survey_df['miss_most'].replace(
    [
        'Playing',
        'Roaming around freely',
        'Taking kids to park',
        'Being social ',
        'Friends and roaming around freely',
        'Friends,Romaing and traveling',
        'Metro',
        'Going to the movies',
        'Gym',
        'Football',
        'Badminton in court'
    ],
    'Passing Time Outside', inplace=True)

In [None]:
survey_df['miss_most'].unique()

In [None]:
survey_df['miss_most'].value_counts()

In [None]:
survey_df.head()

Data is much ``cleaner`` now.

## Exploratory Analysis & Visualizations.📚

- We will explore every aspect of the dataset.
- We will gather information and visualise it to draw conclusions.
- We will also do some basic calculations to derive insights about the dataset.
- We will then visualise and compare anything we find interesting.

In [None]:
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import matplotlib
%matplotlib inline
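# The 'seaborn-whitegrid' style name is not available in recent Matplotlib releases;
# setting the style through seaborn gives a similar white grid.
sns.set_style('whitegrid')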

In [None]:
survey_df.age.describe()

In [None]:
fig = px.histogram(survey_df, x = 'age', marginal = 'box',
                   width = 900, height = 580, color_discrete_sequence=['plum'],)
fig.update_layout(title = 'Age distribution',
                  xaxis_title = 'Age groups',
                  yaxis_title = 'Number of students',
                  font = dict(family = 'Arial', size = 15),
                  bargap = 0.1)
fig.show();

#### Conclusion:
- Most of the students who answered the survey are between 15 and 25 years old.
- Students in this age group are generally self-aware enough to answer these questions.
- The largest number of respondents are 20 years old, so they are probably in college or university; this could be interesting, since COVID-19 disrupted their studies.
- There are even respondents aged __40 to 59__.

### Class Ratings :

In [None]:
print(survey_df['rating_online_class'].unique())

In [None]:
survey_df['rating_online_class'].value_counts()

In [None]:
fig = px.histogram(survey_df, x = survey_df['rating_online_class'], color = 'rating_online_class',
            width=900, height = 580)

fig.update_layout(title = 'Ratings for online class',
                 xaxis_title = 'Ratings',
                 yaxis_title = 'No. of students',
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

### Insights :
- 437 students rate their online classes as ``very poor``.
- 387 students rate them as ``Average``.
- 30 students rate them as ``poor``.
- 230 students and 98 students rate them as ``good`` and ``excellent`` respectively.

- A large share of students is not enjoying online classes.
- This suggests that, for these students, online classes are not as good as in-person classes, possibly because students need a learning environment that online classes fail to provide.

In [None]:
#total students - rating wise count
1182-437-387-230-98-30
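
To put a number on "a large share", the rating counts reported in the insights above can be turned into a percentage. A small sketch using only those reported counts; treating `Average` as "not good" is an interpretation, not something the survey states:

```python
# Counts as reported in the insights above.
rating_counts = {'Very poor': 437, 'Average': 387, 'Good': 230, 'Excellent': 98, 'Poor': 30}

total = sum(rating_counts.values())

# Ratings below "Good": very poor, poor and average.
below_good = rating_counts['Very poor'] + rating_counts['Poor'] + rating_counts['Average']
share_below_good = 100 * below_good / total

print(total, round(share_below_good, 1))
```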

### Time spent on study :

In [None]:
fig = px.histogram(survey_df, x = 'time_self_study', 
                   color_discrete_sequence=['darkorange'],
                   width=900, height = 550)

fig.update_layout(title = 'Time spent on self study',
                 xaxis_title = 'Hours',
                 yaxis_title = 'Number of students',
                 font = dict(family = 'Balto', size = 13))
fig.show()

### Insights :
- On average, students spend around 1 to 3 hours on self study.
- 346 students spend 2 hours on self study; many of them are probably university or college students.
- Some students spend more than 5 to 7 hours on self study.
- 15 students spend 10 hours on self study, and 5 students spend 12 hours.
- Lastly, two students report spending 17 and 18 hours on self study respectively.

### Popular social media platforms :

In [None]:
fig = px.histogram(survey_df, y = survey_df['prefered_social_media'],
                   color = 'prefered_social_media', width = 900, height = 570)

fig.update_layout(title = 'Preferred social media platforms',
                  xaxis_title = 'Total users', 
                  yaxis_title = 'Social media platform',
                  font = dict(
                      family = 'Balto', size = 15))
fig.show()

### Note :
- A total of 14 social media platforms appear among the students' answers.
- We will keep the top 5, the platforms students use most.
- This keeps the analysis simple.

In [None]:
top5_social = survey_df['prefered_social_media'].value_counts().nlargest(5)
top5_social

In [None]:
pact = top5_social*100
pact /= top5_social.sum()


fig = px.histogram(survey_df, x = pact, y = pact.index, width=900, height=550, 
                   color = pact.index)
fig.update_layout(title = 'Top 5 social media platforms among students',
                 xaxis_title = 'Percentage',
                 yaxis_title = 'Social media platform',
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

### Insights : 
- Among students who chose one of the top five platforms, more than ``31 %`` prefer __Instagram__ (the percentages here are shares of the top-five responses, not of all respondents). It offers entertainment at one's fingertips, which may explain its appeal during the lockdown.
- __WhatsApp__ follows at about ``30 %``. It helps students stay connected with friends and family, and during the lockdown many schools shared material and important notices through WhatsApp, which might be one reason it is so popular.
- __YouTube__ is third on the list. Although it is not always considered a social media platform, many students share their artwork, insights and achievements there, and it has become a very large learning community.
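
The percentage plotted above divides by the sum of the top-five counts, so it is a share within the top five. A minimal sketch on made-up preferences shows how that differs from a share of all respondents (the platform counts below are hypothetical):

```python
import pandas as pd

# Toy preferences (made up): 10 respondents, 4 platforms.
prefs = pd.Series(['Instagram'] * 4 + ['WhatsApp'] * 3 + ['YouTube'] * 2 + ['Telegram'])

counts = prefs.value_counts()
top2 = counts.nlargest(2)

share_within_top2 = top2 * 100 / top2.sum()   # denominator: top-2 responses only
share_of_all = top2 * 100 / counts.sum()      # denominator: every respondent

print(share_within_top2.round(1).to_dict())
print(share_of_all.round(1).to_dict())
```

The within-top-k share is always at least as large as the share of all respondents, so the choice of denominator should be stated next to the number.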

### Time spent on social media by students:

In [None]:
fig = px.scatter(survey_df, x = 'age', y = 'time_social_media', size = 'time_social_media', color='prefered_social_media')
fig.update_layout(title = 'Time spent on social media',
                 xaxis_title = 'Age of student',
                 yaxis_title = 'Time spent',
                 font = dict(family = 'Droid Serif', size = 14))
fig.show()

### Insights :
- Students aged __12 to 25__ seem to spend a lot of time on social media during COVID-19.
- Some respondents spend as much as __10 hours__ on social media.
- 10 hours is the highest duration reported, and it occurs among the younger respondents.
- Some respondents aged 27 to 34 spend 7 to 8 hours on social media; the survey does not tell us why.
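
A visual impression like the one above can be backed by a rank correlation. The sketch below uses Spearman's coefficient on made-up values; the numbers are illustrative only and say nothing about the real survey:

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy data (made up): older respondents report less time on social media.
toy = pd.DataFrame({
    'age': [12, 15, 18, 20, 22, 25, 30, 35],
    'time_social_media': [4.0, 5.0, 3.5, 3.0, 2.5, 2.0, 1.0, 0.5],
})

# Spearman's rho measures monotonic association and does not assume normality.
rho, p_value = spearmanr(toy['age'], toy['time_social_media'])

print(f'rho={rho:.3f}, p={p_value:.4f}')
```

The same call would apply to the pair suggested in the task list, time spent on sleep and time spent on fitness.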

### Time spent on self study by age :

In [None]:
fig = px.scatter(survey_df, x = 'age', y = 'time_self_study', size = 'time_self_study', color='time_self_study')
fig.update_layout(title = 'Time spent on self study',
                 xaxis_title = 'Age of student',
                 yaxis_title = 'Time spent',
                 font = dict(family = 'Droid Serif', size = 14))
fig.show()

### Insights :
- Average time spent on self study is __2-3__ hours.
- Students aged __15 to 25/27__ spend the most time on self study.
- The highest time spent on self study is 17-18 hours, reported by a couple of these students.
- Even respondents aged __30 to 40__ spend __4 to 5__ hours on self study.
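
Whether self-study time really differs between age groups can be checked with a non-parametric test. A minimal sketch with the Kruskal-Wallis test on made-up data; the group labels follow the task list and the values are hypothetical:

```python
import pandas as pd
from scipy.stats import kruskal

# Toy data (made up): self-study hours for three age groups.
toy = pd.DataFrame({
    'age_group': ['7-17'] * 4 + ['18-22'] * 4 + ['23-'] * 4,
    'time_self_study': [2.0, 3.0, 2.5, 4.0, 3.0, 5.0, 4.5, 6.0, 1.0, 2.0, 1.5, 3.5],
})

# One array of hours per age group, passed as separate samples to the test.
groups = [g['time_self_study'].to_numpy() for _, g in toy.groupby('age_group')]
statistic, p_value = kruskal(*groups)

print(f'H={statistic:.3f}, p={p_value:.3f}')
```

A small p-value would indicate that at least one group's distribution differs; it does not say which one.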

### Do students find themselves more connected with their family/close friends?

In [None]:
import plotly.graph_objects as go
from plotly import tools

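# Take labels and values from the same value_counts() result so each label is paired with its own count
# (unique() follows order of appearance, value_counts() sorts by frequency).
connected_counts = survey_df['more_family_connected'].value_counts()
labels, values = connected_counts.index, connected_counts.values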

fig = go.Figure(data=[go.Pie(labels = labels, values = values, pull=[0.1])])
fig.update_layout(title = 'Do You Feel more Connected to Family/Close Friends ?',
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

### Insights :
- 70.3% of respondents say __YES__, they do feel more connected with family and friends.
- 29.7%, almost 30%, answer __NO__.
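
A pie chart needs each label paired with its own count. A minimal sketch on made-up answers of why taking both from a single `value_counts()` result is the safe choice:

```python
import pandas as pd

# Toy answers (made up): 'NO' appears first, but 'YES' is more frequent.
answers = pd.Series(['NO', 'YES', 'YES', 'YES', 'NO', 'YES'])

# unique() follows order of first appearance; value_counts() sorts by frequency,
# so pairing the two by position would attach the wrong count to each label.
unique_order = list(answers.unique())
count_order = answers.value_counts().tolist()

# Labels and values taken from the same object are aligned by construction.
counts = answers.value_counts()
labels, values = counts.index.tolist(), counts.tolist()

print(unique_order, count_order)
print(dict(zip(labels, values)))
```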

### It would be interesting to know how many students think they utilized their time well

In [None]:
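# Take labels and values from the same value_counts() result so they stay aligned after sorting.
time_utilized_counts = survey_df['time_utilized'].value_counts().sort_values()
labels, values = time_utilized_counts.index, time_utilized_counts.values


fig = go.Figure(data = [go.Pie(labels = labels, values = values, hole = .3)])
fig.update_layout(title = 'Time utilized',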
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

#### Conclusion:
- It is quite surprising that the responses are split almost equally, even though a pandemic is going on.
- It is a good sign that close to 50% of students think they are utilising their time well.
- But let's not jump to conclusions; next we look at how students actually spend their time.

### What students are missing most during covid-19 :

In [None]:
print(survey_df['miss_most'].unique())

In [None]:
pct = survey_df['miss_most'].value_counts()*100/survey_df['miss_most'].value_counts().sum()


fig = px.histogram(survey_df, x = pct, y = pct.index, color = pct.index,
                  width = 900, height = 570)
fig.update_layout(title = 'What students are missing during covid-19',
                 xaxis_title = 'Percentage',
                 yaxis_title = 'Features',
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

#### Insights :
- About 32% of students miss their schools and colleges the most, which is quite expected.
- About 19% miss their friends, family or relatives.
- Around 13-15% miss eating outside or, more generally, passing time outside.
- 5.6% of students miss their colleagues.
- Lastly, 2.19% of students say they miss nothing.
- The same percentage of students say they miss all of these.
### Effects of the pandemic on students' weight :

In [None]:
print(survey_df['delta_weight'].unique())

In [None]:
survey_df['delta_weight'].value_counts()

In [None]:
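# Use the category labels that come with the counts instead of a hard-coded list,
# so labels and values cannot get out of order.
weight_counts = survey_df['delta_weight'].value_counts()
labels, values = weight_counts.index, weight_counts.values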
fig = go.Figure(data=[go.Pie(labels =  labels, values = values, hole=.3)])
fig.update_layout(title = 'Effects on weights of students',
                 font = dict(family = 'Droid Serif', size = 14))
fig.show()

### Insights :
- 45.3% of students report no change in their weight during the lockdown.
- 37.1% of students report that they gained weight.
- 17.7% report that they lost weight.

### Students' favourite stress busters :

In [None]:
print(survey_df['stress_busters'].unique())

In [None]:
labels = survey_df['stress_busters'].value_counts()

fig = px.histogram(survey_df, x = labels, y = labels.index, 
                   color = labels.index, width = 900, height = 570)

fig.update_layout(title = 'Stress busters of students',
                 xaxis_title = 'Count',
                 yaxis_title = 'Features',
                 font = dict(family = 'Droid Serif', size = 15))
fig.show()

### Insights :
- Music is widely regarded as soothing, so it is not surprising that most students rely on music to overcome stress.
- Many students took up hobbies such as drawing, writing and sketching during this period, grouped here as 'Doing Extra Activities', which may be why this category is second on the list.
- Internet Surfing is third on the list, which suggests that there are many students who surf the internet for information and entertainment to relieve stress.
- During the pandemic, online games such as PUBG, Among Us and Getting Over It gained popularity, so it is plausible that many students used online games to lower their stress levels.

## Inferences and Conclusion

Here is a summary of the inferences drawn from this analysis:

- Most of the students who answered the survey fall in the 15-25 age range.

- About 72% of students (854 of 1182) rate their online classes as average, poor or very poor, which suggests that online classes are not good enough for them.

- Close to 50% of students think that they are utilising their time well, which is encouraging.

- The analysis of time spent suggests that students' routines have been disrupted by the COVID-19 pandemic and that they are not able to enjoy life as they would have without it.

- Students appear to give limited time to their studies, both online classes and self study (about 3 hours each on average).

- From the social media perspective, Instagram and WhatsApp are the most popular platforms among students, which is expected.

- A majority of students feel more connected to their family and close friends, possibly because the lockdown gave them the opportunity to spend more quality time together than in normal times.

- Students miss school and college the most (more than 30%). Moreover, about 20% miss their family, friends and relatives, which suggests that many students are separated from them because of the pandemic.

- About 45% of students reported no change in their weight, whereas 37% reported a weight gain and 18% reported weight loss.

- Music is the most popular stress buster among students, followed by extra activities such as drawing, writing and sketching, with internet surfing third. Online gaming has also gained popularity among students as a way to beat stress.