## A demonstration of PANDAS data frames used to investigate CAO points

Author: Jon Ishaque
Commenced: 29th September 2021
GMIT SID: G00398244

This notebook extracts CAO points from the CAO website for 2019, 2020 and 2021. It loads data into pandas dataframes and uses pandas and python to compare points from different years.

 Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

In [1]:
# package for making http requests

import requests as rq
# Dates and time package
import datetime as dt

#dataframes
import pandas as pd

#import regex package for searching strings
import re

#import csv, deals with commas when writing to file
import csv

#use urlib to retrieve url as file for 2019 and 2020
import urllib.request as urlrq 

## 2021 points
#http://www.cao.ie/index.php?page=points&p=2021
The 2021 CAO points are presented in a web page. 

The page header from the server should decode as per: *Content-Type: text/html; charset=iso-8859-1*
However, one line uses \x96 which isn't defined in iso-8859-1. Therefore we use the similar decoding standard cp1252, which is very similar but includes #x96. The character in question was had an Irish foda on a level 8 course

Create a string var,*now*. this is use in file names of back up copies of CAO points.

In [2]:
# Get the current date and time.

now = dt.datetime.now()

# Format as a string.
#global as used in functions
global nowstr
nowstr = now.strftime('%Y%m%d_%H%M%S')



###### Compile the regular expression so it is not compiled at each interation of the loop reading the webpage

###### Explanation of the regualar expression [4][5]:
('[A-Z]{2}[0-9]{3} (.*)([0-9]{3}))</span>

[A-Z]{2}        Any two upper case aphanumberic

[0-9]{3}        Any three digits 0-9

'  '            Two spaces

(.*)([0-9]{3})   Any amount of text before 3 numeric characters


    </font>

In [3]:
#set reg ex
re_courses = re.compile('[A-Z]{2}[0-9]{3} (.*)') #[4]




In [4]:
#Function to get HEI name from course code. using a switcher dict as
#oppose to messy if/else block
#https://www.upgrad.com/blog/how-to-implement-switch-case-functions-in-python/
def getHEI(cc):
    switcher = {'AC' : 'American College',
    'AD' : 'National College of Art and Design',
    'AL' : 'Athlone Institute of Technology',
    'AS' : 'St. Angela`s College',
    'CI' : 'Irish College of Humanities & Applied Sciences',
    'BY' : 'IBAT College Dublin',
    'CK' : 'University College Cork (NUI)',
    'CM' : 'Marino Institute of Education',
    'CR' : 'Cork Institute of Technology',
    'CT' : 'CCT College Dublin',
    'CW' : 'Institute of Technology Carlow',
    'DB' : 'Dublin Business School',
    'DC' : 'Dublin City University',
    'DK' : 'Dundalk Institute of Technology',
    'DL' : 'Dun Laoghaire Institute of Art Design and Technology',
    'DN' : 'University College Dublin (NUI)',
    'DS' : 'Dorset College',
    'GA' : 'Galway-Mayo Institute of Technology',
    'GB' : 'Galway Business School',
    'GC' : 'Griffith College',
    'GY' : 'National University of Ireland Galway',
    'ID' : 'ICD Business School',
    'LC' : 'Limerick Institute of Technology',
    'LM' : 'University of Limerick',
    'LY' : 'Letterkenny Institute of Technology',
    'MH' : 'Maynooth University',
    'MI' : 'Mary Immaculate College',
    'MU' : 'Pontifical University St Patricks College',
    'NC' : 'National College of Ireland (NCI)',
    'NM' : 'St Nicholas Montessori College Ireland',
    'PC' : 'Carlow College St. Patricks',
    'RC' : 'RCSI University of Medicine & Health Sciences',
    'SG' : 'Institute of Technology Sligo',
    'TL' : 'Institute of Technology Tralee',
    'TR' : 'Trinity College Dublin',
    'TU' : 'Technological University Dublin',
    'WD' : 'Waterford Institute of Technology'
    }

    cc = cc[:2]
    return  switcher.get(cc)

#print(HEI('WD123123'))


In [5]:
#helper
def points_to_arr(s):
    AQA=''
    portfolio =''
    points=''
    random = ''
    #check 1st char for #
    #print(s)
    if s[0]=='#':
        portfolio='#'# add to var
    random = ''
    #check final char for  *
    if s[-1] == '*':
        random ='*'
    points=''
    
    if s.find("AQA") ==-1: #not AQA
        #strip ~ and * from start and end of s
        for i in s:
            if i.isdigit():
                #concat points string
                points = points + i
    else:
        AQA ="AQA" #return AQA as separate val as it will be separate column
        #return
    return [points, portfolio, random,AQA]

The following block of code iterates through each line of the csv file 

This part of the note part of the note book will load the web page content. A loop will read each line of web page and determine if it's content is relevant and write content to a csv file.

In [6]:
#MAKE THIS BLOCK A FUNCTION AS REPEATING IT FOR L8 and L6/7
#get a save the csv names to paths
global csv_files
csv_files = []
def createCSV(path):
    print (path)
    #Get the both level 8 and 6/7 web pages
    #getheaders and determine contenttype [3]

    
    
#respL8.headers['content-type']
        
   
    #resp.text
    #loop through response text lines
    #get level
    print (path)
    
    if path.find('L8') >= 0 :
        #print (path)
        level = '8'
        
        resp = rq.get('http://www2.cao.ie/points/l8.php', 
                      headers={"content-type":"text"})
        
    elif path.find('L67') >= 0 :
        #print (path)
        level ='6/7'
        
        resp = rq.get('http://www2.cao.ie/points/l76.php', 
                      headers={"content-type":"text"})
    else:
        level = ''
        
    
    original_encoding = resp.encoding
    # Change to cp1252, which handles accented characters
    resp.encoding = 'cp1252'
     # Create a file path for the original data. 2021
    pathhtml = path + nowstr + '.html'
    # Save the original html file.
    with open(pathhtml, 'w') as f:
        f.write(resp.text)

    #set var to count lines for cross check with webpage
    no_lines = 0
    path = path+'.csv'
    #add csv name to array
    csv_files.append(path)
    with open(path, 'w') as f:
        #write csv header
        linesplit = ['Course Code','Course title','R1_21',
                     'Po_1_21','Rn_1_21','AQA1_21','R2_21',
                     'Po_2_21','Rn_2_21','AQA2_21',
                     'HEI','Level','Year']
        f.write(','.join(linesplit) + '\n')
        for line in resp.iter_lines():
            #

            #problem with bytes
            #so convert str to bytes
            #print (line)
            dline = line.decode('cp1252')
            #check if line mathces reg exp pattern. If so, do something.
            if re_courses.fullmatch(dline):
                no_lines +=1
                #get first five chars - course code
                course_code = dline[:5]
                #course title
                course_title = dline[7:57]
                #r1 points
                round_1 = dline[60:65].rstrip() # get five chars, remove white space
                #if round 1 not blank call fn points_to_arr
                if len(round_1) > 0:
                    round_1= points_to_arr(round_1)
                    #assign vals from returned array
                    pts1 = round_1[0]
                    plo1 = round_1[1]
                    rnd1 = round_1[2]
                    AQA1 = round_1[3]
                else: 
                    pts1 = ''
                    plo1 = ''
                    rnd1 = ''
                    AQA1 = ''
                #r2 points
                round_2 = dline[67:].rstrip() # get four chars, remove white space
                #if round 2 not blank call fn points_to_arr
                if len(round_2) > 0:
                    round_2= points_to_arr(round_2)
                    #assign vals from returned array
                    pts2 = round_2[0]
                    plo2 = round_2[1]
                    rnd2 = round_2[2]
                    AQA2 = round_2[3]
                else: 
                    pts2 = ''
                    plo2 = ''
                    rnd2 = ''
                    AQA2 = ''
                #print (course_code)
                #get the instituion name
                HEI =getHEI(course_code)
                #print (HEI)
                # create an array of the fields for the csv line
                linesplit = [course_code,course_title,pts1,plo1,rnd1,AQA2,pts2,plo2,rnd2,AQA2,HEI,level,'2021']
                #print (linesplit)
                #debug
                #print(f"'{course_code} {dline} r1: {round_1} r2: {round_2}'")
               # print((','.join(linesplit) + '\n'))
                # Rejoin the array values with commas in between. ie.comma separated
                f.write(','.join(linesplit) + '\n')
    print (f"number of lines is", {no_lines})
    path=''
#check this number is correct


In [7]:
# The file path for the csv file.
path_2021_L8 = 'data/cao2021_L8_' + nowstr 
path_2021L67 ='data/cao2021_L67_' + nowstr 

createCSV(path_2021_L8)
createCSV(path_2021L67)

data/cao2021_L8_20211116_201856
data/cao2021_L8_20211116_201856
number of lines is {949}
data/cao2021_L67_20211116_201856
data/cao2021_L67_20211116_201856
number of lines is {416}


#### NB: 949 L8 courses on CAO website verified on 10th November 2021
#### 416 L6/7 courses on CAO website verified on 15th November 2021

Join L8 & L6/7 courses into one dataframe

In [8]:
# loop over the list of csv files
#https://stackoverflow.com/questions/16597265/appending-to-an-empty-dataframe-in-pandas

print(csv_files)
df2021 = pd.DataFrame()
for f in csv_files:
      
    # read the csv file
    print(f)
    df_temp = pd.read_csv(f,encoding='cp1252')
    
    
    df2021 = df2021.append(df_temp, ignore_index = True)


print(df2021)
    

['data/cao2021_L8_20211116_201856.csv', 'data/cao2021_L67_20211116_201856.csv']
data/cao2021_L8_20211116_201856.csv
data/cao2021_L67_20211116_201856.csv
     Course Code                                       Course title  R1_21  \
0          AL801  Software Design for Virtual Reality and Gaming...  300.0   
1          AL802  Software Design in Artificial Intelligence for...  313.0   
2          AL803  Software Design for Mobile Apps and Connected ...  350.0   
3          AL805  Computer Engineering for Network Infrastructur...  321.0   
4          AL810  Quantity Surveying                            ...  328.0   
...          ...                                                ...    ...   
1360       WD188  Applied Health Care                           ...  220.0   
1361       WD205  Molecular Biology with Biopharmaceutical Scien...    NaN   
1362       WD206  Electronic Engineering                        ...  180.0   
1363       WD207  Mechanical Engineering                        ...

*** 

## 2020 CAO points
###### http://www.cao.ie/index.php?page=points&p=2020 CAO points in 2020 include level 6,7 & 8

In [9]:
# Create a file path for the original data.For backup
path2020 = 'data/cao2020_' + nowstr + '.xlsx'

#download to path
urlrq.urlretrieve('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx',\
                  path2020)

('data/cao2020_20211116_201856.xlsx',
 <http.client.HTTPMessage at 0x176e24e84f0>)

In [10]:
###### Read the Excel file into a pandas dataframe 

In [11]:
#download and parse the excel spreadsheet
#skip first 10 rows
df2020=pd.read_excel('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx',\
                 skiprows=10)


In [12]:
#df.iloc[123]

#check final row
#delete unwanted columns
df2020 = df2020.drop(['CATEGORY (i.e.ISCED description)','avp','v','Column1',\
              'Column2','Column3','Column4','Column5','Column6',\
              'Column7','Column8'], 1)
df2020 = df2020.rename(columns={'COURSE TITLE': 'Course title',\
                        'COURSE CODE2': 'Course Code',\
                        'R1 POINTS':'R1_20','R2 POINTS':'R2_20',\
                        'R1 Random *':'Rn_1_20',\
                        'R2 Random*':'Rn_2_20','LEVEL':'Level',\
                        'EOS':'EOS_20','EOS Mid-point':'Mid_20'})

df2020['Year'] =2020
df2020['Po_1_20'] =''
df2020['Po_2_20'] =''
df2020['AQA1_20'] =''
df2020['AQA2_20'] =''
#https://towardsdatascience.com/check-for-a-substring-in-a-pandas-dataframe-column-4b949f64852

#Pulling out AQA, # and  placing into own column.
#this dataset has either digits or #matric code for portfolio and AQA. Check for value,
#if it exists place in another column and set points col to blank.

df2020.loc[df2020['R1_20'].str.contains('#',na=False) ,\
       'Po_1_20'] = '#' 
df2020.loc[df2020['R1_20'].str.contains('#',na=False) , 'R1_20'] = '' 
df2020.loc[df2020['R2_20'].str.contains('#',na=False) ,\
       'Po_2_20'] = '#' 
df2020.loc[df2020['R2_20'].str.contains('#',na=False) , 'R2_20'] = '' 
df2020.loc[df2020['R1_20'].str.contains('AQA',na=False) , 'AQA1'] = 'AQA' 
df2020.loc[df2020['R2_20'].str.contains('AQA',na=False) , 'AQA2'] = '' 
display (df2020.loc[df2020['Course Code']=='CR210']) # check we are picking up commas in csv fields          

Unnamed: 0,Course title,Course Code,R1_20,Rn_1_20,R2_20,Rn_2_20,EOS_20,EOS Random *,Mid_20,Level,HEI,Test/Interview #,Year,Po_1_20,Po_2_20,AQA1_20,AQA2_20,AQA1,AQA2
195,"Contemporary Applied Art (Ceramics, Glass, Tex...",CR210,,,,,#+matric,,#+matric,8,Cork Institute of Technology,#,2020,#,,,,,


***

## 2019 CAO
#http://www2.cao.ie/points/lvl8_19.pdf

In [13]:
import camelot #use camelot package to extract tables from pdf files [7]

Read tables functions Parameters: url_path for path to read 2019 CAO points, csv_path to write to file


In [14]:
def readTables(url_path,csv_path):
    #download url to path
    urlrq.urlretrieve(url_path,csv_path)
    
    tables = camelot.read_pdf(url_path,\
                              pages='1-end',flavor='stream')
   
    #read all pages [8]
    
    tables
    tbl_cnt = len(tables)

    #export all tables - not what we really want
    #tables.export(path, f='csv', compress=False) # json, excel, html, markdown, sqlite
    #tables[0]

    tables[1].parsing_report
    {
        'accuracy': 99.02,
        'whitespace': 12.24,
        'order': 1,
        'page': 10
    }
    i = 1 # exclude first header table 
    #interate through the list of tables [9] 
    data2019 = [] # empty list of tables

    for t in tables:    
        if i > 0: #exclude 1st table
            #write the table as a dataframe to listdata2019

            data2019.append(t.df) 
        i +=1 

    #combine all the dataframes in the list into one dataframe
    dfcombined = pd.concat(data2019)

    #add column headers
    dfcombined.columns = ['Course Code', 'Course title', 'EOS', 'Mid']
    #add year
    dfcombined['Year']= '2019'
    #add level
    if url_path.find('l8') >= 0 :
        
        dfcombined['Level'] = '8'
    elif url_path.find('l76') >= 0 :
        dfcombined['Level'] = '8'
    
    tbl_cnt
    #write to csv to store as back up.
    dfcombined.to_csv(csv_path)
    
    return dfcombined

print parsing report of first table

In [15]:
# The file path for the csv file.
path2019_l8='http://www2.cao.ie/points/lvl8_19.pdf'
path2019_l67 ='http://www2.cao.ie/points/lvl76_19.pdf'
#create a path to back up the file as a csv    
path2019_csv_L8 = 'data/cao2019_L8_csv_' + nowstr + '.csv'
path2019_csv_L67 = 'data/cao2019_L67_' + nowstr + '.csv'
df2019L8=readTables(path2019_l8,path2019_csv_L8)
df2019L67=readTables(path2019_l67,path2019_csv_L67)

Filter df so only rows with course codes remain. [10]

In [16]:
#function to filter the dataframe on course code reg ex - i.e. get rid of institution title lines
def regex_filter(val): 
    regex= '[A-Z]{2}[0-9]{3}'
    if val:
        mo = re.search(regex,val)
        if mo:
            return True
        else:
            return False
    else:
        return False


dfs = [df2019L67,df2019L8]
df2019 = pd.concat(dfs,ignore_index=True)
df2019 = df2019[df2019['Course Code'].apply(regex_filter)]

df2019



Unnamed: 0,Course Code,Course title,EOS,Mid,Year,Level
5,AL600,Software Design,205,306,2019,8
6,AL601,Computer Engineering,196,272,2019,8
7,AL602,Mechanical Engineering,258,424,2019,8
8,AL604,Civil Engineering,252,360,2019,8
9,AL630,Pharmacy Technician,306,366,2019,8
...,...,...,...,...,...,...
1480,WD200,Arts (options),221,296,2019,8
1481,WD210,Software Systems Development,271,329,2019,8
1482,WD211,Creative Computing,275,322,2019,8
1483,WD212,Recreation and Sport Management,274,311,2019,8


###### reset index to remove indexes from appended dataframes.
reset because reindex will notwork with duplicate values indexes [11]


In [17]:
df2019 = df2019.reset_index(drop=True)
#Create and set year column
df2019['Year'] =2019
#create columns for potfolio, AQA and random - signal 1 even though there only 1 for 2019
df2019['Rn1_19'] =''
df2019['Po_1_19'] =''
df2019['AQA1_19'] =''



#deal with random, portfolio and AQAs. These occur only in in the EOS field so check this field
#for occurence, and strip out digits, replace digits and move flag to new column.
#This done by passing df rows to helper functions
#
#add HEI name #https://towardsdatascience.com/create-new-column-based-on-other-columns-pandas-5586d87de73d
def HEIrow(row):
    return getHEI(row['Course Code'])
df2019['HEI'] = ''

df2019['HEI'] = df2019.apply(lambda row: HEIrow(row), axis=1)

def getRandomCol(row):
    #treat this field as string
    if row['EOS'].find('*') > -1:
        return '*'
df2019['Rn_1_19'] =  df2019.apply(lambda row: getRandomCol(row), axis=1)

def getPortFolCol(row):
    if row['EOS'].find('#') > -1:
        return '#'
df2019['Po_1_19'] =  df2019.apply(lambda row: getPortFolCol(row), axis=1)
def getAQACol(row):
    if row['EOS'].find('AQA') > -1:
        return 'AQA'
df2019['AQA1_19'] =  df2019.apply(lambda row: getAQACol(row), axis=1)
#finally return digits if they exist to EOS
def getDigitsCol(row):
    points=''
    for i in row['EOS']:
            if i.isdigit():
                #concat points string
                points = points + i    
    return points
df2019['EOS'] =  df2019.apply(lambda row: getDigitsCol(row), axis=1)

In [18]:
df2019=df2019.rename(columns={'COURSE': 'Course title','Mid': 'Mid_2019','EOS': 'EOS_2019'})


In [19]:
display (df2019.loc[df2019['Course Code']=='CK791']) #AL861,CK201


Unnamed: 0,Course Code,Course title,EOS_2019,Mid_2019,Year,Level,Rn1_19,Po_1_19,AQA1_19,HEI,Rn_1_19
628,CK791,Medicine - Graduate Entry (GAMSAT required),58,59,2019,8,,#,,University College Cork (NUI),*


In [20]:
#Get unique course list
#Create short dataframes
df2019_sh = df2019[['Course Code','Course title','Level']]
df2020_sh = df2020[['Course Code','Course title','Level']]
df2021_sh = df2021[['Course Code','Course title','Level']]
#display(df2021_sh)

In [21]:
dfs=[df2021_sh,df2020_sh,df2021_sh]
all_Courses=pd.concat(dfs,ignore_index=True)
#all_Courses[all_Courses.duplicated(subset=['Course code'])]
#clean date, remove duplicates on course code

all_Courses.drop_duplicates(subset=['Course Code'],inplace=True,ignore_index=False)

all_Courses

Unnamed: 0,Course Code,Course title,Level
0,AL801,Software Design for Virtual Reality and Gaming...,8
1,AL802,Software Design in Artificial Intelligence for...,8
2,AL803,Software Design for Mobile Apps and Connected ...,8
3,AL805,Computer Engineering for Network Infrastructur...,8
4,AL810,Quantity Surveying ...,8
...,...,...,...
2756,TU986,Print Media Technology and Management,8
2759,TU993,Early Childhood Care and Education,8
2760,TU994,Early Childhood Care and Education,8
2790,WD139,Civil Engineering,7


In [22]:
#debug
#display (all_Courses.loc[all_Courses['Course code']=='WD208'])
#all_Courses.iloc[1538]

Join the data frames
<br>

In [23]:
#Set course code as index - default column to join frame on, makes the code much cleaner.
df2019.set_index('Course Code',inplace=True)
df2020.set_index('Course Code',inplace=True)
df2021.set_index('Course Code',inplace=True)
all_Courses.set_index('Course Code',inplace=True)

all_Courses=all_Courses.join(df2021[['R1_21','Rn_1_21','Po_1_21','AQA1_21','R2_21','Rn_2_21','Po_2_21','AQA2_21']])
all_Courses=all_Courses.join(df2020[['R1_20','Rn_1_20','Po_1_20','AQA1_20','R2_20','Rn_2_20','Po_2_20','AQA2_20']])
all_Courses=all_Courses.join(df2019[['Rn_1_19','Po_1_19','AQA1_19','EOS_2019','Mid_2019']])
all_Courses

Unnamed: 0_level_0,Course title,Level,R1_21,Rn_1_21,Po_1_21,AQA1_21,R2_21,Rn_2_21,Po_2_21,AQA2_21,...,AQA1_20,R2_20,Rn_2_20,Po_2_20,AQA2_20,Rn_1_19,Po_1_19,AQA1_19,EOS_2019,Mid_2019
Course Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AL801,Software Design for Virtual Reality and Gaming...,8,300.0,,,,,,,,...,,,,,,,,,304,328
AL802,Software Design in Artificial Intelligence for...,8,313.0,,,,,,,,...,,,,,,,,,301,306
AL803,Software Design for Mobile Apps and Connected ...,8,350.0,,,,,,,,...,,,,,,,,,309,337
AL805,Computer Engineering for Network Infrastructur...,8,321.0,,,,,,,,...,,,,,,,,,329,442
AL810,Quantity Surveying ...,8,328.0,,,,,,,,...,,,,,,,,,307,349
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TU986,Print Media Technology and Management,8,,,,,,,,,...,,,,,,,,,,
TU993,Early Childhood Care and Education,8,,,,,,,,,...,,,,,,,,,,
TU994,Early Childhood Care and Education,8,,,,,,,,,...,,251,,,,,,,,
WD139,Civil Engineering,7,,,,,,,,,...,,,,,,,,,200,364


In [24]:
all_Courses_usefulCols=all_Courses[['Course title' ,'Level','R1_21','R2_21','R1_20','R2_20']]

In [25]:
all_Courses_usefulCols

Unnamed: 0_level_0,Course title,Level,R1_21,R2_21,R1_20,R2_20
Course Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL801,Software Design for Virtual Reality and Gaming...,8,300.0,,303,
AL802,Software Design in Artificial Intelligence for...,8,313.0,,332,
AL803,Software Design for Mobile Apps and Connected ...,8,350.0,,337,
AL805,Computer Engineering for Network Infrastructur...,8,321.0,,333,
AL810,Quantity Surveying ...,8,328.0,,319,
...,...,...,...,...,...,...
TU986,Print Media Technology and Management,8,,,289,
TU993,Early Childhood Care and Education,8,,,270,
TU994,Early Childhood Care and Education,8,,,321,251
WD139,Civil Engineering,7,,,206,


In [26]:
all_Courses_usefulCols['R2_21']=all_Courses_usefulCols['R2_21'].fillna(0)

all_Courses_usefulCols


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_Courses_usefulCols['R2_21']=all_Courses_usefulCols['R2_21'].fillna(0)


Unnamed: 0_level_0,Course title,Level,R1_21,R2_21,R1_20,R2_20
Course Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL801,Software Design for Virtual Reality and Gaming...,8,300.0,0.0,303,
AL802,Software Design in Artificial Intelligence for...,8,313.0,0.0,332,
AL803,Software Design for Mobile Apps and Connected ...,8,350.0,0.0,337,
AL805,Computer Engineering for Network Infrastructur...,8,321.0,0.0,333,
AL810,Quantity Surveying ...,8,328.0,0.0,319,
...,...,...,...,...,...,...
TU986,Print Media Technology and Management,8,,0.0,289,
TU993,Early Childhood Care and Education,8,,0.0,270,
TU994,Early Childhood Care and Education,8,,0.0,321,251
WD139,Civil Engineering,7,,0.0,206,


In [27]:

all_Courses_usefulCols.convert_dtypes(infer_objects=False).dtypes 
all_Courses_usefulCols['R1_21'] = all_Courses_usefulCols['R1_21'].astype(int)
all_Courses_usefulCols.dtypes

ValueError: Cannot convert non-finite values (NA or inf) to integer

---
## References:
[1]

[2]

[3] https://docs.python-requests.org/en/latest/index.html

[4] https://docs.python.org/3/library/re.html

[5] https://docs.python.org/3/library/re.html?highlight=re%20match#re.match

[6] https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html?highlight=read_excel#pandas.read_excel

[7] https://camelot-py.readthedocs.io/en/master/

[8] https://github.com/atlanhq/camelot/issues/278

[9] https://stackoverflow.com/questions/55052989/how-to-iterate-through-a-list-of-data-frames-and-drop-all-data-if-a-specific-str

[10] https://stackoverflow.com/questions/15325182/how-to-filter-rows-in-pandas-by-regex/48884429

[11] https://stackoverflow.com/questions/68261366/right-way-to-reindex-a-dataframe


## End