## Examining Student and Online Perceptions of LSE

### Part 1: Introduction

Before arriving at LSE, many students are likely to have heard of the university's low student satisfaction rates. These impressions often gain traction on online forums such as Reddit, where anecdotal accounts tend to amplify perceptions of dissatisfaction. Multiple university ranking websites also present LSE as having a comparatively low student satisfaction rate. It is therefore worth exploring how "severe" the problem is and whether the data fully aligns with the online perception of the university.

We can explore the statement above through a series of focused questions:
- How does LSE student satisfaction compare to other universities?
- How does student satisfaction within LSE vary by degree?
- How has LSE student satisfaction evolved over the last few years (2020-2023)?
- How is LSE perceived by people online?

It is worth considering what is meant by student satisfaction. Many data sources, such as the National Student Survey (NSS), break down students' views into categories such as teaching, course content, course organisation and assessment. Online, meanwhile, the focus tends to be on social opportunities and societies. As "student satisfaction" is multifaceted, we break each question down into the specific type of student satisfaction under discussion, to avoid confusion. The NSS is generally better suited to 'academic' definitions of student satisfaction, such as students' views on the quality of the course and the teaching, whilst Reddit and other online platforms likely better represent students' thoughts on their social lives at particular universities.




### Part 2: Data acquisition
- Go to 'Data_acquisition.ipynb' file

### Part 3: Data Preparation and Exploration

In [2]:
import pandas as pd
import pickle

### NSS data

We can answer many of the questions through the NSS. The NSS asks students about several categories, such as teaching quality, course quality and assessment quality. Most websites ignore this distinction and simply report one of these metrics, such as teaching quality; we wanted to make the distinction between categories clearer. As there are multiple categories, we create multiple dataframes so that visualisation is easier. Another important consideration is the ratio of responses to population size: when it is high, the measure of student satisfaction is more likely to represent the real student body accurately; when it is low, the results are less robust. This is why many online displays of these metrics leave out universities like the University of Oxford, where a hefty proportion of students boycott the NSS.

We also wanted to display the variance between students' opinions on these measures through the standard deviation score.
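As a minimal sketch of the response ratio described above, the snippet below divides the number of responses by the eligible population. The university names and counts are hypothetical and only illustrate the calculation; the column names `Responses` and `Population` are the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical counts, for illustration only (not NSS data)
toy = pd.DataFrame({
    "University": ["A", "B"],
    "Responses": [670, 250],
    "Population": [1000, 500],
})

# Share of the eligible population that answered the survey
toy["Response Ratio"] = (toy["Responses"] / toy["Population"]).round(2)
print(toy)
```

A ratio close to 1 suggests the positivity measure reflects most of the student body, whereas a low ratio means the figure rests on a smaller share of students.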

In [3]:
file_path = './data/univ_df.pkl'

# Load the dictionary from the pickle file
with open(file_path, 'rb') as pickle_file:
    univ_df = pickle.load(pickle_file)

#refresher of the universities surveyed
for key in univ_df.keys():
    print(key)

LSE
Oxford
UCL
Birmingham
Edinburgh
Glasgow
Imperial
KCL
Manchester
Norwich
Strathclyde
Warwick


Below, we create dataframes representing the differences in student opinions on teaching quality, course quality, assessment quality, support quality, as well as course organisation quality. We considered aggregating every metric into one 'overall satisfaction' score, but felt that this would not account for the fact that some may view certain criteria as deserving more weight; for example, learning opportunities may be valued much more than assessment quality. These themes are defined in the NSS and are calculated by averaging student responses to questions that can be grouped under one larger theme.
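The per-theme extraction can also be sketched as a single loop over theme names. This is an illustrative alternative, not the code used in this notebook: the helper `summarise_themes` and the toy values are hypothetical, and the theme order is assumed from the comments in the extraction cell (the last seven rows of each filtered dataframe).

```python
import pandas as pd

# Theme order assumed from the comments in the extraction cell
THEMES = ["Teaching", "Learning opportunities", "Assessment and feedback",
          "Academic support", "Organisation and management",
          "Learning resources", "Student voice"]

def summarise_themes(theme_rows):
    """Hypothetical helper: summarise a 7-row frame ordered as in THEMES."""
    summary = {}
    for theme, (_, row) in zip(THEMES, theme_rows.iterrows()):
        summary[theme] = {
            "Positivity Measure(%)": row["Positivity measure (%)"],
            "Standard Deviation": row["Standard deviation"],
            "Response Ratio": round(row["Responses"] / row["Population"], 2),
        }
    return summary

# Made-up values standing in for one university's last seven rows
toy_rows = pd.DataFrame({
    "Positivity measure (%)": [80.0, 75.0, 70.0, 85.0, 78.0, 88.0, 72.0],
    "Standard deviation": [1.0, 1.1, 1.2, 1.0, 1.1, 0.9, 1.3],
    "Responses": [670] * 7,
    "Population": [1000] * 7,
})
toy_summary = summarise_themes(toy_rows)
print(pd.DataFrame(toy_summary).transpose())
```

Keeping the theme names in one list avoids repeating the same three lines per theme, at the cost of relying on the row order being stable across universities.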


In [4]:
teach_dict={}
learn_dict={}
assess_dict={}
support_dict={}
organisation_dict={}
resource_dict={}
voice_dict={}

for name,uni_df in univ_df.items():
    
    #We measure 'All Subjects' at this point as we don't want to differentiate by Subject
    #The last 7 rows of the dataframe refer to the Themes we are interested in
    df=uni_df[(uni_df["Level of study"]=='All undergraduates') & (uni_df["Subject level"]=="All subjects")][-7:]
    
    #Teaching Scores
    positivity=df.iloc[0]['Positivity measure (%)']
    sd=df.iloc[0]['Standard deviation']
    response_ratio=round(df.iloc[0]['Responses']/df.iloc[0]['Population'],2)
    teach_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}
    
    #Learning Opportunities
    positivity=df.iloc[1]['Positivity measure (%)']
    sd=df.iloc[1]['Standard deviation']
    response_ratio=round(df.iloc[1]['Responses']/df.iloc[1]['Population'],2)
    learn_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    
    #Assessment and feedback
    positivity=df.iloc[2]['Positivity measure (%)']
    sd=df.iloc[2]['Standard deviation']
    response_ratio=round(df.iloc[2]['Responses']/df.iloc[2]['Population'],2)
    assess_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    #Academic Support
    positivity=df.iloc[3]['Positivity measure (%)']
    sd=df.iloc[3]['Standard deviation']
    response_ratio=round(df.iloc[3]['Responses']/df.iloc[3]['Population'],2)
    support_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Organisation and Management
    positivity=df.iloc[4]['Positivity measure (%)']
    sd=df.iloc[4]['Standard deviation']
    response_ratio=round(df.iloc[4]['Responses']/df.iloc[4]['Population'],2)
    organisation_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Learning Resources
    positivity=df.iloc[5]['Positivity measure (%)']
    sd=df.iloc[5]['Standard deviation']
    response_ratio=round(df.iloc[5]['Responses']/df.iloc[5]['Population'],2)
    resource_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Student Voice
    positivity=df.iloc[6]['Positivity measure (%)']
    sd=df.iloc[6]['Standard deviation']
    response_ratio=round(df.iloc[6]['Responses']/df.iloc[6]['Population'],2)
    voice_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    




In [5]:
teach_df=pd.DataFrame(teach_dict)
print('Student Views on Teaching Quality') #a higher percentage means more students think positively of that factor
teach_df.transpose()

Student Views on Teaching Quality


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,86.5,1.0,0.67
Oxford,93.1,0.8,0.5
UCL,84.4,0.5,0.72
Birmingham,84.1,0.5,0.69
Edinburgh,85.2,0.6,0.65
Glasgow,98.1,3.4,0.71
Imperial,89.3,0.7,0.72
KCL,83.3,0.6,0.7
Manchester,81.7,0.5,0.74
Norwich,82.6,1.5,0.82


In [6]:
learn_df=pd.DataFrame(learn_dict)
print('Student Views on Learning Opportunities')
learn_df.transpose()

Student Views on Learning Opportunities


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,79.4,1.1,0.67
Oxford,82.1,0.9,0.5
UCL,79.4,0.6,0.72
Birmingham,79.2,0.6,0.69
Edinburgh,75.5,0.7,0.65
Glasgow,96.2,4.0,0.71
Imperial,85.1,0.8,0.72
KCL,77.7,0.6,0.7
Manchester,76.9,0.5,0.74
Norwich,79.1,1.6,0.82


In [7]:
print('Student Views on Assessment Quality')
assess_df=pd.DataFrame(assess_dict)
assess_df.transpose()

Student Views on Assessment Quality


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,71.1,1.2,0.67
Oxford,74.1,1.0,0.5
UCL,68.6,0.6,0.72
Birmingham,68.2,0.7,0.69
Edinburgh,63.4,0.8,0.65
Glasgow,95.6,3.8,0.71
Imperial,68.7,1.0,0.72
KCL,67.9,0.7,0.7
Manchester,67.9,0.6,0.74
Norwich,81.6,1.6,0.82


In [8]:
print('Student Views on Academic Support Quality')
support_df=pd.DataFrame(support_dict)
support_df.transpose()

Student Views on Academic Support Quality


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,87.4,1.0,0.67
Oxford,90.4,0.8,0.5
UCL,84.1,0.5,0.72
Birmingham,81.4,0.6,0.69
Edinburgh,81.3,0.6,0.65
Glasgow,100.0,3.5,0.71
Imperial,86.6,0.8,0.72
KCL,80.6,0.6,0.7
Manchester,81.3,0.5,0.74
Norwich,87.4,1.4,0.82


In [9]:
print('Student Views on Course Organisation Quality')
organisation_df=pd.DataFrame(organisation_dict)
organisation_df.transpose()

Student Views on Course Organisation Quality


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,80.7,1.1,0.67
Oxford,67.6,1.1,0.5
UCL,74.0,0.6,0.72
Birmingham,72.5,0.7,0.69
Edinburgh,69.1,0.8,0.65
Glasgow,98.1,3.7,0.71
Imperial,73.5,0.9,0.72
KCL,66.6,0.7,0.7
Manchester,70.2,0.6,0.74
Norwich,68.7,1.8,0.82


In [10]:
print('Student Views on Learning Resources')
resource_df=pd.DataFrame(resource_dict)
resource_df.transpose()

Student Views on Learning Resources


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
LSE,87.1,0.9,0.67
Oxford,92.2,0.8,0.5
UCL,89.4,0.5,0.71
Birmingham,87.2,0.5,0.69
Edinburgh,85.7,0.6,0.65
Glasgow,94.9,3.8,0.69
Imperial,91.2,0.7,0.72
KCL,85.1,0.5,0.7
Manchester,81.9,0.5,0.74
Norwich,85.5,1.4,0.82


Another research question involved investigating how student satisfaction varied by degree. Firstly, we see how this works *within* LSE, then compare certain courses across universities.

In [41]:
lse_subjectdat=univ_df['LSE']
lse_subjectdat['Subject'].value_counts(dropna=False)

Subject
Law                                                408
Psychology                                         272
Mathematical sciences                              272
Geography, earth and environmental studies         272
Economics                                          272
Politics                                           272
Business and management                            272
Language and area studies                          136
Human geography                                    136
Philosophy                                         136
Philosophy and religious studies                   136
History                                            136
History and archaeology                            136
Historical, philosophical and religious studies    136
Asian studies                                      136
Languages and area studies                         136
Management studies                                 136
Accounting                                         136
Fi

Many of these subjects overlap because of the way the NSS is organised. Subjects are grouped into levels under the Common Aggregation Hierarchy (CAH); the broader a subject is, the lower the level. For example, CAH1 includes 'Geography, earth and environmental studies' whilst CAH3 includes 'Human geography'. For this project, we have decided to analyse subjects only at the CAH2 level. Though this loses some nuance (since Mathematics and Statistics are grouped into one 'Mathematical sciences'), it makes the data easier to interpret, as the differences between groups may appear more evident.

In [43]:
lse_subjectdat=lse_subjectdat[lse_subjectdat['Subject level']=='CAH2']
lse_subjectdat['Subject'].value_counts()


Subject
Psychology                                    136
Mathematical sciences                         136
Sociology, social policy and anthropology     136
Economics                                     136
Politics                                      136
Law                                           136
Business and management                       136
Languages and area studies                    136
History and archaeology                       136
Philosophy and religious studies              136
Geography, earth and environmental studies    136
Name: count, dtype: int64

We then calculate the specific positivity scores for each subject.

The data differentiates between students with a First and 'all undergraduates', but we could not see any difference between the data collected for the two (i.e. the rows were identical apart from the 'First' label), which led us to conclude there was no real need to display this distinction.
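A check of this kind can be sketched as follows. The helper `rows_match`, the level labels and the values are all hypothetical; the idea is simply to drop the level column from each subset and test whether what remains is identical.

```python
import pandas as pd

# Hypothetical rows; the level labels and values are assumed for illustration
toy = pd.DataFrame({
    "Level of study": ["All undergraduates", "First degree"] * 2,
    "Subject": ["Economics", "Economics", "Law", "Law"],
    "Positivity measure (%)": [80.0, 80.0, 90.0, 90.0],
})

def rows_match(df, level_a, level_b, level_col="Level of study"):
    """True if the two level subsets are identical once the level column is dropped."""
    a = df[df[level_col] == level_a].drop(columns=level_col).reset_index(drop=True)
    b = df[df[level_col] == level_b].drop(columns=level_col).reset_index(drop=True)
    return a.equals(b)

print(rows_match(toy, "All undergraduates", "First degree"))
```

If the function returns `True`, keeping only one of the two levels loses no information.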

In [45]:
teach_dict={}
learn_dict={}
assess_dict={}
support_dict={}
organisation_dict={}
resource_dict={}
voice_dict={}
for name in lse_subjectdat['Subject'].unique():
    
    df=lse_subjectdat[(lse_subjectdat['Subject']==name) & (lse_subjectdat['Level of study']=='All undergraduates')][-7:]
    
    #Teaching Scores
    positivity=df.iloc[0]['Positivity measure (%)']
    sd=df.iloc[0]['Standard deviation']
    response_ratio=round(df.iloc[0]['Responses']/df.iloc[0]['Population'],2)
    teach_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}
    
    #Learning Opportunities
    positivity=df.iloc[1]['Positivity measure (%)']
    sd=df.iloc[1]['Standard deviation']
    response_ratio=round(df.iloc[1]['Responses']/df.iloc[1]['Population'],2)
    learn_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    
    #Assessment and feedback
    positivity=df.iloc[2]['Positivity measure (%)']
    sd=df.iloc[2]['Standard deviation']
    response_ratio=round(df.iloc[2]['Responses']/df.iloc[2]['Population'],2)
    assess_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    #Academic Support
    positivity=df.iloc[3]['Positivity measure (%)']
    sd=df.iloc[3]['Standard deviation']
    response_ratio=round(df.iloc[3]['Responses']/df.iloc[3]['Population'],2)
    support_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Organisation and Management
    positivity=df.iloc[4]['Positivity measure (%)']
    sd=df.iloc[4]['Standard deviation']
    response_ratio=round(df.iloc[4]['Responses']/df.iloc[4]['Population'],2)
    organisation_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Learning Resources
    positivity=df.iloc[5]['Positivity measure (%)']
    sd=df.iloc[5]['Standard deviation']
    response_ratio=round(df.iloc[5]['Responses']/df.iloc[5]['Population'],2)
    resource_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    # Student Voice
    positivity=df.iloc[6]['Positivity measure (%)']
    sd=df.iloc[6]['Standard deviation']
    response_ratio=round(df.iloc[6]['Responses']/df.iloc[6]['Population'],2)
    voice_dict[name]={'Positivity Measure(%)':positivity, 'Standard Deviation': sd, 'Response Ratio': response_ratio}

    
    


In [46]:
teach_df=pd.DataFrame(teach_dict)
print('Student Views on Teaching Quality by Subject') #a higher percentage means more students think positively of that factor
teach_df.transpose()

Student Views on Teaching Quality by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,99.2,5.1,1.0
Mathematical sciences,72.8,3.4,0.63
"Sociology, social policy and anthropology",90.1,3.0,0.71
Economics,84.1,2.1,0.65
Politics,89.2,2.4,0.73
Law,93.3,2.8,0.59
Business and management,88.0,2.4,0.66
Languages and area studies,93.3,10.5,0.77
History and archaeology,90.3,2.8,0.74
Philosophy and religious studies,90.4,4.7,0.68


In [47]:
learn_df=pd.DataFrame(learn_dict)
print('Student Views on Learning Opportunities by Subject')
learn_df.transpose()

Student Views on Learning Opportunities by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,96.0,6.0,1.0
Mathematical sciences,69.3,3.6,0.63
"Sociology, social policy and anthropology",82.6,3.5,0.71
Economics,75.7,2.4,0.65
Politics,79.3,3.0,0.73
Law,82.3,3.5,0.59
Business and management,82.7,2.6,0.66
Languages and area studies,89.8,12.9,0.77
History and archaeology,84.6,3.7,0.74
Philosophy and religious studies,83.9,6.0,0.68


In [48]:
print('Student Views on Assessment Quality by Subject')
assess_df=pd.DataFrame(assess_dict)
assess_df.transpose()

Student Views on Assessment Quality by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,89.3,6.9,1.0
Mathematical sciences,58.6,3.9,0.63
"Sociology, social policy and anthropology",77.5,3.7,0.71
Economics,68.2,2.6,0.65
Politics,69.2,3.3,0.73
Law,79.5,3.8,0.59
Business and management,70.3,2.9,0.66
Languages and area studies,92.4,13.1,0.77
History and archaeology,78.1,4.0,0.74
Philosophy and religious studies,79.1,6.5,0.68


In [49]:
print('Student Views on Academic Support Quality by Subject')
support_df=pd.DataFrame(support_dict)
support_df.transpose()

Student Views on Academic Support Quality by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,100.0,5.3,1.0
Mathematical sciences,76.9,3.2,0.63
"Sociology, social policy and anthropology",89.8,3.2,0.71
Economics,86.5,2.0,0.65
Politics,87.9,2.5,0.73
Law,90.0,3.1,0.59
Business and management,88.0,2.3,0.66
Languages and area studies,100.0,8.4,0.77
History and archaeology,91.2,2.9,0.74
Philosophy and religious studies,93.1,4.7,0.68


In [50]:
print('Student Views on Course Organisation Quality by Subject')
organisation_df=pd.DataFrame(organisation_dict)
organisation_df.transpose()

Student Views on Course Organisation Quality by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,96.7,6.1,1.0
Mathematical sciences,70.8,3.5,0.63
"Sociology, social policy and anthropology",80.9,3.8,0.71
Economics,81.9,2.2,0.65
Politics,78.2,3.1,0.73
Law,76.7,3.8,0.59
Business and management,85.6,2.5,0.66
Languages and area studies,84.5,14.8,0.77
History and archaeology,85.6,3.8,0.74
Philosophy and religious studies,83.8,6.2,0.68


In [51]:
print('Student Views on Learning Resources by Subject')
resource_df=pd.DataFrame(resource_dict)
resource_df.transpose()

Student Views on Learning Resources by Subject


Unnamed: 0,Positivity Measure(%),Standard Deviation,Response Ratio
Psychology,93.9,5.2,1.0
Mathematical sciences,84.6,2.8,0.63
"Sociology, social policy and anthropology",89.8,2.9,0.71
Economics,85.4,1.9,0.64
Politics,88.4,2.4,0.73
Law,87.7,3.0,0.59
Business and management,86.8,2.3,0.66
Languages and area studies,86.0,12.2,0.77
History and archaeology,88.5,3.1,0.74
Philosophy and religious studies,88.3,5.2,0.68


Answering how LSE student satisfaction has evolved over the years requires multiple dataframes from the Data Acquisition notebook.

### Reddit LSE data

A refresher on the format of the dataframe used to store this data:

In [23]:
lse_reddit_data_df = pd.read_csv("Data/reddit_data.csv")
lse_reddit_data_df.head(5)

Unnamed: 0,Title,Score,Top Comment
0,The irony of LSE being a socialist institution,327,"Don’t really want to give super identifying details but I’m currently a postgraduate student in LSE and although my programme isn’t under the department of economics, I’m in an economics heavy programme. My professors are quite obviously pro-tax (heavily guided by data and evidence of course), pro-government regulations and interventions in the free market, pro-reform and redistribution through public administration and services, and actively teach us how to see through monopolies, inequalities, and undemocratic tactics administered in western free market countries (primarily the US). I think that we may still be considered centrists by hardcore socialists but we are clearly left wing compared to right wing, conservatives, libertarians, and whatever."
1,Got kicked outta LSE.....,238,"Can you share more details? Specifically.\n\nDid you see a DR about such issues and have a record of it?\n\nDid you apply for extenuating circumstances at all?\n\nWas you offered a resit opportunity at all?\n\nWhat is your grounds for appeal, and are you appealing only specifically the final fail decision?"
2,why are masters degrees so expensive?,207,"This doesn't really apply to most masters courses if you are a domestic student, the MFin looks aimed at people who already have professional experience which makes it quite different to most other masters. Even at Cambridge most course are around £12k so almost coverable with the student loan, but still you do need to find £10k+ for living expenses from somewhere"
3,TIL some unis have worse graduate prospects than non-graduates,200,"I think people should bear in mind that Imperial only offers STEM and LSE focuses a lot of financial careers. That inflates the average salaries. In reality, the top 4 are very similar."
4,University subreddits,190,"/r/Edinburgh_University Edinburgh, University of"


The first thing to check is the number of duplicate or null values. As shown below, there are none of either.
- There are no null values, since the code used to access the top comment from each post only added the post to the dataframe if it contained a comment.

- There are no duplicate posts, since two posts are highly unlikely to consist of identical strings of words.

In [17]:
num_duplicates = lse_reddit_data_df.duplicated().sum()
print(f"Number of duplicate rows: {num_duplicates}")

Number of duplicate rows: 0


In [18]:
total_nulls = lse_reddit_data_df.isnull().sum().sum()
print(f"Number of null values: {total_nulls}")
print(lse_reddit_data_df.shape)

Number of null values: 0
(30, 3)


We only want to consider posts with a high number of upvotes, since other users' approval suggests they are more likely to be helpful and insightful. We can achieve this by removing posts whose score is 30 or below, of which there was only one. This threshold was chosen given that the maximum score was 327 and the mean score was 97.

It is worth noting that the posts are already sorted in order of descending score, since that is a parameter included in the Reddit API call.
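The ordering can be confirmed with a quick check rather than assumed. The sketch below uses the five scores shown in the preview above in a stand-in dataframe called `toy_scores` (a hypothetical name); the same property could be read from the `Score` column of the full dataframe.

```python
import pandas as pd

# The five scores from the preview above, in the order they appear
toy_scores = pd.DataFrame({"Score": [327, 238, 207, 200, 190]})

# True when every score is less than or equal to the one before it
print(toy_scores["Score"].is_monotonic_decreasing)
```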

In [19]:
print(f"Maximum score: {lse_reddit_data_df['Score'].max()}")
print(f"Mean score: {int(lse_reddit_data_df['Score'].mean())}")
print(f"Current number of posts: {lse_reddit_data_df.shape[0]}")
num_low_score_posts = (lse_reddit_data_df['Score']<= 30).sum()
print(f"Number of posts with a score of 30 or below: {num_low_score_posts}")
lse_reddit_data_df = lse_reddit_data_df[lse_reddit_data_df['Score'] > 30]
print(f"New number of posts: {lse_reddit_data_df.shape[0]}")


lse_reddit_data_df.head(5)

Maximum score: 327
Mean score: 97
Current number of posts: 30
Number of posts with a score of 30 or below: 1
New number of posts: 29


Unnamed: 0,Title,Score,Top Comment
0,The irony of LSE being a socialist institution,327,"Don’t really want to give super identifying details but I’m currently a postgraduate student in LSE and although my programme isn’t under the department of economics, I’m in an economics heavy programme. My professors are quite obviously pro-tax (heavily guided by data and evidence of course), pro-government regulations and interventions in the free market, pro-reform and redistribution through public administration and services, and actively teach us how to see through monopolies, inequalities, and undemocratic tactics administered in western free market countries (primarily the US). I think that we may still be considered centrists by hardcore socialists but we are clearly left wing compared to right wing, conservatives, libertarians, and whatever."
1,Got kicked outta LSE.....,238,"Can you share more details? Specifically.\n\nDid you see a DR about such issues and have a record of it?\n\nDid you apply for extenuating circumstances at all?\n\nWas you offered a resit opportunity at all?\n\nWhat is your grounds for appeal, and are you appealing only specifically the final fail decision?"
2,why are masters degrees so expensive?,207,"This doesn't really apply to most masters courses if you are a domestic student, the MFin looks aimed at people who already have professional experience which makes it quite different to most other masters. Even at Cambridge most course are around £12k so almost coverable with the student loan, but still you do need to find £10k+ for living expenses from somewhere"
3,TIL some unis have worse graduate prospects than non-graduates,200,"I think people should bear in mind that Imperial only offers STEM and LSE focuses a lot of financial careers. That inflates the average salaries. In reality, the top 4 are very similar."
4,University subreddits,190,"/r/Edinburgh_University Edinburgh, University of"


### Reddit COVID-19 Data

As in the procedure above, the dataframe of Reddit posts containing the word 'covid' is loaded from a CSV file.

In [36]:
covid_reddit_data_df = pd.read_csv("Data/covid_reddit_data.csv")
covid_reddit_data_df.head(5)

Unnamed: 0,Title,Score,Top Comment
0,Is anyone completely disengaged with university to the point of not attending any lectures/tutorials and only looking at that when it's needed for assignments/exams? I cannot focus with online learning at all and don't even feel like a student,461,[deleted]
1,"Apparently alot of ""covid"" student have been dropping out of Uni",389,"I'm an admissions tutor and lecturer. I teach a fairly challenging module but it's actually quite popular and the final average was about 65%. During Covid we went to online assessment, which resulted in the average shooting up. We returned to a proctored exam last year, and the the average was 38%. This was the 2021 intake (its a Year 2 module). A lot have failed other modules so will be deregistered. Some have quit before they are pushed. Its absolutely not their fault and it's really sad. Their A-levels were severely interrupted, they never learnt how to study properly, and they were totally demoralised. Unfortunately, I think the government will just forget about them. I worry they'll become a lost cohort like the Japanese kids who graduated during the Pacific financial crisis."
2,Important: For people starting uni this year.,383,"Also, not just for you but fellow students, we appreciate there being fewer people to spread whatever arrives on campus."
3,who'd had thunk it,376,Some of us also seem to make this face upon discovering hangovers.
4,Has anyone done a degree they now realise has absolutely no job prospects?,361,What was your undergraduate degree?


To clean the dataframe, we remove rows where the 'Top Comment' value is '[deleted]', which we treat as a null value, as well as rows where the score is 0.


In [45]:
#remove null values
covid_reddit_data_df = covid_reddit_data_df[covid_reddit_data_df['Top Comment'] != '[deleted]']

#find summary statistics, remove low  scores
print(f"Maximum score: {covid_reddit_data_df['Score'].max()}")
print(f"Mean score: {int(covid_reddit_data_df['Score'].mean())}")
print(f"Current number of posts: {covid_reddit_data_df.shape[0]}")
num_low_score_posts = (covid_reddit_data_df['Score']== 0).sum()
print(f"Number of posts with a score of 0: {num_low_score_posts}")
covid_reddit_data_df = covid_reddit_data_df[covid_reddit_data_df['Score'] > 0]
print(f"New number of posts: {covid_reddit_data_df.shape[0]}")

#display the full comment in the 'Top Comment' column
pd.set_option('display.max_colwidth', None)

covid_reddit_data_df.head(5)

Maximum score: 389
Mean score: 117
Current number of posts: 94
Number of posts with a score of 0: 0
New number of posts: 94


Unnamed: 0,Title,Score,Top Comment
1,"Apparently alot of ""covid"" student have been dropping out of Uni",389,"I'm an admissions tutor and lecturer. I teach a fairly challenging module but it's actually quite popular and the final average was about 65%. During Covid we went to online assessment, which resulted in the average shooting up. We returned to a proctored exam last year, and the the average was 38%. This was the 2021 intake (its a Year 2 module). A lot have failed other modules so will be deregistered. Some have quit before they are pushed. Its absolutely not their fault and it's really sad. Their A-levels were severely interrupted, they never learnt how to study properly, and they were totally demoralised. Unfortunately, I think the government will just forget about them. I worry they'll become a lost cohort like the Japanese kids who graduated during the Pacific financial crisis."
2,Important: For people starting uni this year.,383,"Also, not just for you but fellow students, we appreciate there being fewer people to spread whatever arrives on campus."
3,who'd had thunk it,376,Some of us also seem to make this face upon discovering hangovers.
4,Has anyone done a degree they now realise has absolutely no job prospects?,361,What was your undergraduate degree?
5,I wasted my fucking time at uni and now I regret ever even going,343,It wasn’t Uni - it was the time and situation. Feel for you mate and hope you sort it out.


The next step was to extract any information about how COVID-19 impacted university students in London. The results were as follows:

Only 3/200 posts mentioned words from the 'keywords' list below, which covers the city of London and the names of its biggest universities. Of the three posts returned, only one details the impact of COVID-19 on student life, discussing online learning on the King's College London Physics degree. However, it describes a benefit of this course (its compulsory in-person labs), whereas most posts in the previous dataframe take a negative view.

The scores of these posts are also all lower than the mean score of 117, suggesting they are less relevant than other posts about COVID-19.

The lack of results could be explained by the following reasons:

- Whether a university was located in London had no impact on student satisfaction during/after COVID-19.
- Students chose not to mention the university they attended when posting on the website.
- The information was not accurately gathered, as there are too many variations of university names or other identifying phrases indicating that a student is located in London.
- The sample size was too small.
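One way to reduce the name-variation problem, sketched here as an assumption rather than the method used in this notebook, is to match acronyms on regex word boundaries instead of padding them with spaces. The short keyword list and the comments below are hypothetical.

```python
import re
import pandas as pd

# Short illustrative subset of keywords, without padding spaces
toy_keywords = ["lse", "ucl", "kcl", "london"]
pattern = r"\b(?:" + "|".join(re.escape(k) for k in toy_keywords) + r")\b"

# Hypothetical comments
toy_comments = pd.Series([
    "I study at LSE.",           # punctuation directly after the acronym
    "UCL/KCL both went online",  # acronyms separated by a slash
    "My pulse was racing",       # 'lse' only appears inside another word
])
matches = toy_comments.str.contains(pattern, case=False, na=False)
print(matches.tolist())  # [True, True, False]
```

A space-padded keyword such as `' lse '` would miss the first two comments, because the acronym there is followed by punctuation or sits at the start of the string.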



In [46]:
keywords = [' lse ', ' ucl ', 'london', ' kcl ', 'kings', ' icl ', 'imperial',
            ' ual ', ' uol ', 'university of london ', 'university of west london',
            'university of east london', 'london metropolitan university', ' london met', 'university of greenwich',
            'university of westminster', 'middlesex university', 'london southbank',
            "regent's university", 'university of roehampton', 'ravensbourne university', 'royal college of art',' rcoa ',
            'birkbeck', ' bpp university', 'guildhall school', 'royal college of music', ' lbs ', 'london business school',
            'rada']

#find posts containing any keywords
london_filtered_df = covid_reddit_data_df[covid_reddit_data_df['Title'].str.contains('|'.join(keywords), case=False, na=False) |
                 covid_reddit_data_df['Top Comment'].str.contains('|'.join(keywords), case=False, na=False)]

london_filtered_df

Unnamed: 0,Title,Score,Top Comment
48,British but being charged international fees because I spent 4 years studying in the US,100,"I’d say the London university is right. To be considered a home student, you need to be ordinarily resident in the U.K. for the three years before your course starts (e.g. https://www.lse.ac.uk/study-at-lse/international-students/fee-status-classification), which unfortunately you haven’t been, whatever the reason. I’d request a fee status reassessment from both of them to be honest - you wouldn’t want to get half way through the degree at the Scottish university and suddenly have to pay international fees."
62,Questions about clubbing - London,68,"Pryzm is a famously bad club, terrible atmosphere with awful clientele and a hotbed for spiking. What club you want depends on what sort of music you’re into, I club a lot in London but since I love electronic music it’s much more to go to listen to music rather than to try get off with strangers like the likes of Pryzm are for"
71,Terrified Of Online Learning?,51,"It depends what your degree is. Also whether you’re going to be in England or Scotland or NI or Wales because rules can be a bit different. If you’re doing a STEM degree, it’s very likely that you will be on campus (likely some online lectures too) - I received word from KCL that STEM degrees like Physics will have to be on campus because of compulsory labs. If you’re doing something like History or Literature, it will be even more likely you’ll be doing most things online."
