# Coursework Part 1: Model Solution

The aim of this coursework is to manipulate two data files containing:
1. The list of students from RGU's School of Computing 2020-2021 generation (also known as "class", not to be confused with a module).
2. The attendance list of one of the modules that some of these students attend online.

Keep in mind that not all of the students in the class attend the module for which the attendance has been collected!

By running the cell below, you will see a sample of the class list, which is imported from a .csv file stored in my Dropbox:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
# Load class data
allstudents = pd.read_csv('https://www.dropbox.com/s/y3421z32slsohq0/classlist.csv?raw=1')
allstudents

Unnamed: 0,MATRIC,SURNAME,FIRST NAME,COURSE
0,A00001,AARSMA,SEBASTIAN,CS
1,A00002,ANDERSON,CAMERON,CS
2,A00003,ANDERSON,CRAIG,CS
3,A00004,ANDREWS,CHAD,CS
4,A00005,BARNEY,CHRISTINE,CS
...,...,...,...,...
144,A00145,WILSON,KIERAN,CNMD
145,A00146,WILSON,MARK,CNMD
146,A00147,WITKOWSKI,DOMINIK,CNMD
147,A00148,WOJCIK,ADRIANNE,CNMD


Notice that within the class, you can see students enrolled in different courses (CS, CNMD, etc.)

**Q1: How would you show a list of the existing courses using one line of Python code?**

In [2]:
## Use this cell to show the SET (i.e. once each entry) of courses
set(allstudents['COURSE'])

{'CASD', 'CNMD', 'CS', 'CY', 'DMc', 'DMd'}

**Q2: How would you count the number of students enrolled in each course?**

In [3]:
## Use this cell to show the number of students enrolled in each course
allstudents['COURSE'].value_counts()

CS      50
CASD    39
DMc     25
DMd     16
CNMD    10
CY       9
Name: COURSE, dtype: int64

The second file you will use in this coursework contains the log generated by BlackBoard Collaborative (BBC) once an online session finishes. It reports the name and matriculation number of each student, along with their first join, last leave, total time in the session and number of joins to the session.

In [4]:
# Load module data
modulestudents = pd.read_csv('https://www.dropbox.com/s/t67izuewdjxplro/attendance.csv?raw=1')
modulestudents

Unnamed: 0,Name,First join,Last leave,Total time,Joins
0,SEBASTIAN AARSMA (A00001),10/02/2020 13:47,10/02/2020 15:25,01:35:42,2
1,CAMERON ANDERSON (A00002),10/02/2020 13:48,10/02/2020 15:42,01:53:30,1
2,CRAIG ANDERSON (A00003),10/02/2020 13:48,10/02/2020 15:24,01:35:21,1
3,CHAD ANDREWS (A00004),10/02/2020 13:48,10/02/2020 15:30,01:41:53,1
4,CHRISTINE BARNEY (A00005),10/02/2020 13:49,10/02/2020 15:27,01:38:16,1
...,...,...,...,...,...
94,ROBERT WATSON (A00143),10/02/2020 14:01,10/02/2020 16:44,02:43:03,1
95,KASEY WHIPPS (A00144),10/02/2020 14:01,10/02/2020 15:44,01:42:55,1
96,KIERAN WILSON (A00145),10/02/2020 14:01,10/02/2020 16:00,01:56:22,3
97,MARK WILSON (A00146),10/02/2020 14:01,10/02/2020 15:25,01:23:35,1


The first thing you will notice is that the names of the students are not saved in the same way as in the class list. In the list provided by BBC, instead of having the matric number in one column, the surname in a second one and the first name in a third one, all three are in the same column!

Your aim now is to merge both tables into a single one, where the course in which each student is enrolled gets appended to the attendance table (the one stored in the variable called `modulestudents`).

In [5]:
## Use this cell to create code that MERGES the class table into the module (attendance) table, so that we know the course in which each student in the module is enrolled

# Add MATRIC to attendance data (raw string so that \d is not treated as a string escape)
modulestudents['MATRIC'] = modulestudents['Name'].str.extract(r'(A\d+)', expand=False)
# Join tables
joined = modulestudents.merge(allstudents, on='MATRIC', how='inner')
# .copy() makes mydf an independent table, so adding columns later does not trigger SettingWithCopyWarning
mydf = joined[['Name','FIRST NAME', 'SURNAME', 'MATRIC', 'First join', 'Last leave', 'Total time', 'Joins','COURSE']].copy()
mydf

Unnamed: 0,Name,FIRST NAME,SURNAME,MATRIC,First join,Last leave,Total time,Joins,COURSE
0,SEBASTIAN AARSMA (A00001),SEBASTIAN,AARSMA,A00001,10/02/2020 13:47,10/02/2020 15:25,01:35:42,2,CS
1,CAMERON ANDERSON (A00002),CAMERON,ANDERSON,A00002,10/02/2020 13:48,10/02/2020 15:42,01:53:30,1,CS
2,CRAIG ANDERSON (A00003),CRAIG,ANDERSON,A00003,10/02/2020 13:48,10/02/2020 15:24,01:35:21,1,CS
3,CHAD ANDREWS (A00004),CHAD,ANDREWS,A00004,10/02/2020 13:48,10/02/2020 15:30,01:41:53,1,CS
4,CHRISTINE BARNEY (A00005),CHRISTINE,BARNEY,A00005,10/02/2020 13:49,10/02/2020 15:27,01:38:16,1,CS
...,...,...,...,...,...,...,...,...,...
94,ROBERT WATSON (A00143),ROBERT,WATSON,A00143,10/02/2020 14:01,10/02/2020 16:44,02:43:03,1,CNMD
95,KASEY WHIPPS (A00144),KASEY,WHIPPS,A00144,10/02/2020 14:01,10/02/2020 15:44,01:42:55,1,CNMD
96,KIERAN WILSON (A00145),KIERAN,WILSON,A00145,10/02/2020 14:01,10/02/2020 16:00,01:56:22,3,CNMD
97,MARK WILSON (A00146),MARK,WILSON,A00146,10/02/2020 14:01,10/02/2020 15:25,01:23:35,1,CNMD
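
Because not every student in the class attends the module, it can be useful to check who is missing from the attendance log. The following is a minimal sketch on toy data, not part of the original solution: the two small tables and the names `class_list`, `attendance`, `check` and `absent` are hypothetical stand-ins. It uses the `indicator=True` option of `merge`, which adds a `_merge` column saying where each row was found.

```python
import pandas as pd

# Toy stand-ins for the class list and the attendance log (hypothetical rows)
class_list = pd.DataFrame({
    'MATRIC': ['A00001', 'A00002', 'A00003'],
    'SURNAME': ['AARSMA', 'ANDERSON', 'ANDERSON'],
    'FIRST NAME': ['SEBASTIAN', 'CAMERON', 'CRAIG'],
    'COURSE': ['CS', 'CS', 'CS'],
})
attendance = pd.DataFrame({
    'Name': ['SEBASTIAN AARSMA (A00001)', 'CRAIG ANDERSON (A00003)'],
    'Joins': [2, 1],
})

# Pull the matric number out of the combined "Name" column
attendance['MATRIC'] = attendance['Name'].str.extract(r'\((A\d+)\)', expand=False)

# A left join from the class list keeps every student in the class;
# the `_merge` column is 'left_only' for students with no attendance row
check = class_list.merge(attendance, on='MATRIC', how='left', indicator=True)
absent = check.loc[check['_merge'] == 'left_only', 'MATRIC'].tolist()
print(absent)
```

An inner join, as used in the cell above, simply drops these students; the left join keeps them visible.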


The next step is to **create a new variable** with the total time in seconds. I recommend using the `datetime` module in Python, but any alternative is valid as long as the new column is created. With `datetime`, you could first convert the `Total time` column into a time duration (a timedelta) and then do the calculation:
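
As one possible illustration of the `datetime` route, the sketch below converts a single `HH:MM:SS` string into seconds. The helper name `to_seconds` is hypothetical, and the sketch assumes every value has exactly three colon-separated integer parts.

```python
import datetime as dt

def to_seconds(total_time):
    """Convert an 'HH:MM:SS' string into a number of seconds (as a float)."""
    # Split into the three parts and convert each one to an integer
    hours, minutes, seconds = (int(part) for part in total_time.strip().split(':'))
    # A timedelta represents a duration; total_seconds() flattens it to seconds
    return dt.timedelta(hours=hours, minutes=minutes, seconds=seconds).total_seconds()

print(to_seconds('01:35:42'))  # 5742.0
```

Such a helper could be applied to a whole column with `apply`, for example `mydf['Total time'].apply(to_seconds)`.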

In [6]:
# Use this cell to create your seconds column
# import the datetime module for easy access
import datetime as dt
# import warnings; warnings.simplefilter('ignore')
# add a "nice time" variable which holds the total time as a timedelta
mydf['Total time new'] = pd.to_timedelta(mydf['Total time'].str.strip())
# add a "seconds" variable
mydf['time in seconds'] = mydf['Total time new'].dt.total_seconds()
mydf

Now group the table by course and show the mean number of joins and the mean total time (in seconds) for which the students from each course were connected to the BBC session.

In [None]:
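# Use this cell to implement the code that gives you the summary of the means of the numerical values (joins and time in seconds)
courses = mydf.groupby('COURSE')
# select the numeric columns explicitly, since a mean is not defined for text columns such as Name
summary = courses[['Joins', 'time in seconds']].mean()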
summary
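
To see what the grouping does, here is a small sketch on a toy table. The three rows and the names `toy` and `toy_summary` are illustrative only; the real `mydf` has many more rows and columns.

```python
import pandas as pd

# Three toy rows: two CS students and one CNMD student
toy = pd.DataFrame({
    'COURSE': ['CS', 'CS', 'CNMD'],
    'Joins': [2, 1, 3],
    'time in seconds': [5742.0, 6810.0, 6982.0],
})

# One row per course, holding the mean of each selected numeric column
toy_summary = toy.groupby('COURSE')[['Joins', 'time in seconds']].mean()
print(toy_summary)

# idxmax() returns the index label (here, the course) of the largest value
print(toy_summary['time in seconds'].idxmax())  # CNMD
```

In this toy table the CS mean time is (5742 + 6810) / 2 = 6276 seconds, which is lower than the single CNMD value of 6982, so `idxmax()` returns `CNMD`.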

**Q3: Students from which course remained connected for the most seconds on average?**

In [None]:
# Use this cell to write the code that retrieves the course with the highest mean time in seconds
summary['time in seconds'].idxmax()

Your Answer: CY

Finally, your task is to create a small program that lets a user enter a first name or surname and then presents the following stats of the matching student:
- First join
- Last leave
- Total time
- Joins
- Course
- Total Time (in seconds)


Keep in mind that this program has to consider the following conditions:
1. The program has to offer the user the option to search by first name or surname, and must NOT be case sensitive (i.e. you can input the first name/surname without the capitalisation and still get a valid response).
2. If the user inputs a name or surname that is repeated throughout the list, you need to show all matching entries. 
3. If the user enters a name or surname that is not on the module list, you must check if that name/surname is on the class list (i.e. the one stored in the `allstudents` variable created at the very beginning of this notebook). If so, then print the message **Student in this class, but not in this module**. If not, then you print the message **Incorrect or invalid name/surname, please try again**.
4. If the user enters an incorrect or invalid name/surname **3 times in a row**, then the program automatically stops!
5. The program can be repeated any number of (valid) times until the user decides to exit.
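
Conditions 1 to 3 can be separated from the input/output code and expressed as one small function. The sketch below is only one way to structure it and is not the model solution: the function name `lookup`, the status strings and the two toy tables are all hypothetical.

```python
import pandas as pd

def lookup(query, column, module_df, class_df):
    """Return (status, matches) for a case-insensitive search of `column`."""
    key = query.strip().upper()
    # Condition 2: a boolean mask keeps every matching row, not just the first
    matches = module_df[module_df[column].str.upper() == key]
    if not matches.empty:
        return 'found', matches
    # Condition 3: fall back to the full class list
    if (class_df[column].str.upper() == key).any():
        return 'class_only', matches
    return 'invalid', matches

# Toy tables: CHAD is in the class but did not attend the module
class_df = pd.DataFrame({
    'FIRST NAME': ['SEBASTIAN', 'CAMERON', 'CRAIG', 'CHAD'],
    'SURNAME': ['AARSMA', 'ANDERSON', 'ANDERSON', 'ANDREWS'],
})
module_df = class_df.iloc[:3]

status, rows = lookup('anderson', 'SURNAME', module_df, class_df)
print(status, len(rows))
print(lookup('chad', 'FIRST NAME', module_df, class_df)[0])
print(lookup('nobody', 'SURNAME', module_df, class_df)[0])
```

Keeping the search logic in a function that returns a status makes it easy to count consecutive `'invalid'` results for condition 4 in the surrounding loop.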

In [None]:
## Use this cell to write a program that looks up the attendance stats of any student
MAX_FAILS = 3  # consecutive invalid entries allowed before the program stops

def search(column):
  """Ask for a value in `column` until it is found. Return False after MAX_FAILS invalid entries in a row."""
  fails = 0
  while fails < MAX_FAILS:
    print('Please enter a ' + column.lower() + ':')
    # the names in both tables are stored in capitals, so upper() makes the search case insensitive
    query = input('>> ').strip().upper()
    matches = mydf[mydf[column] == query]
    if not matches.empty:
      # show all matching entries (a name or surname can be repeated)
      print(matches.to_string(index=False))
      return True
    if (allstudents[column] == query).any():
      print('Student in this class, but not in this module')
      return True
    fails += 1
    print('Incorrect or invalid name/surname, please try again')
    print('Incorrect attempts: ' + str(fails))
  print('You have tried three unsuccessful times. Exiting program')
  return False

running = True
while running:
  print('Select an option:')
  print('1. Search by first name')
  print('2. Search by surname')
  print('0. Exit')
  option = input('>> ').strip()
  if option == '0':
    # exit
    running = False
  elif option == '1':
    running = search('FIRST NAME')
  elif option == '2':
    running = search('SURNAME')
  else:
    print('Invalid option, please try again')

Once you are finished, download the .ipynb notebook and submit it to the appropriate dropbox.