# CAO Points Analysis

This file includes a detailed comparison of CAO points in 2019, 2020 and 2021 using the functionality in Pandas,
plus appropriate plots.

***

### Link to CAO Points Website
https://www.cao.ie/index.php?/page=point&p=2021

***

In [1]:
# Convenient package for making HTTP requests.
import requests as rq

# Regular expressions.
import re

# Dates and times.
import datetime as dt

# Data frames, used to load and clean the xlsx and converted pdf data.
import pandas as pd

# For downloading.
import urllib.request as urlrq

### Link to CAO Points 2021 Website
https://www.cao.ie/index.php?/page=point&p=2021

***

### CAO Points 2021
***

In [2]:
# Fetch CAO points from the website.
resp = rq.get("http://www2.cao.ie/points/l8.php")
# Check that the request succeeded.
# resp

### Save the original CAO data set to an HTML file.

***

In [3]:
# Get the current date and time.
now = dt.datetime.now()
# Format as a string.
nowstr = now.strftime('%Y%m%d_%H%M%S')

In [4]:
# Build a file path that uses the timestamp in the file name.
pathhtml = 'data/cao2021_' + nowstr + '.html'

In [5]:
# The server uses the wrong encoding, fix it.
original_encoding = resp.encoding
# Change to cp1252
resp.encoding = 'cp1252'
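
Why the encoding matters: a minimal sketch, assuming purely for illustration that the declared encoding was ISO-8859-1 (the notebook does not record what `original_encoding` actually held). The byte string below is a made-up example, not taken from the CAO page.

```python
# Byte 0x96 is an en dash in cp1252 but an unprintable control character in ISO-8859-1.
raw = b'Arts \x96 Joint Honours'  # hypothetical course title bytes

# Decode the same bytes under both encodings.
as_cp1252 = raw.decode('cp1252')
as_latin1 = raw.decode('iso-8859-1')

print(as_cp1252)
print(repr(as_latin1))
```

Both calls succeed, which is why a wrong encoding is easy to miss: the text is silently different rather than raising an error.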

In [6]:
# Save the original html file
with open(pathhtml, 'w') as f:
    f.write(resp.text)

### Use a regular expression to select the lines we want.

***

In [7]:
# Compile regular expression for matching lines.
re_course = re.compile(r'([A-Z]{2}[0-9]{3})(.*)')
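
To see what the pattern above accepts, here is a small sketch run against a few sample lines. The sample lines are assumptions written for illustration; the real page may lay its lines out differently.

```python
import re

# Same pattern as above: two capital letters, three digits, then the rest of the line.
re_course = re.compile(r'([A-Z]{2}[0-9]{3})(.*)')

# Hypothetical sample lines: one course line, one institution heading, one blank line.
sample_lines = [
    "AL801  Software Design for Virtual Reality and Gaming      300",
    "Athlone Institute of Technology",
    "",
]

# fullmatch only succeeds when the whole line fits the pattern.
matches = [bool(re_course.fullmatch(line)) for line in sample_lines]

# The first group holds the course code.
course_code = re_course.fullmatch(sample_lines[0]).group(1)

print(matches, course_code)
```

Only the course line matches, because the heading does not start with two capitals followed by three digits.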

### Loop through the lines of the response.
***

In [8]:
# File path for the parsed CAO data, saved as a CSV file.
path2021 = 'data/cao2021_csv_' + nowstr + '.csv'

In [9]:
# Keep track of how many courses we process.
no_lines = 0

# Open the csv file for writing.
with open(path2021, 'w') as f:
    # Write a header row.
    f.write(','.join(['code', 'title', 'pointsR1', 'pointsR2']) + '\n')
    # Loop through lines of the response.
    for line in resp.iter_lines():
        # Decode the line with cp1252 rather than the encoding the server declares.
        dline = line.decode('cp1252')
        # Match only the lines representing courses.
        if re_course.fullmatch(dline):
            # Add one to the lines counter.
            no_lines = no_lines + 1
            # The course code.
            course_code = dline[:5]
            # The course title.
            course_title = dline[7:57]
            # Round one and round two points.
            course_points = re.split(' +', dline[60:])
            # Keep only the first two fields.
            if len(course_points) != 2:
                course_points = course_points[:2]
            # Collect the fields for this course.
            linesplit = [course_code, course_title, course_points[0], course_points[1]]
            # Not done here: clearing # and * from linesplit[2].
            # Join the fields with commas and write the row.
            f.write(','.join(linesplit) + '\n')

# Print the total number of processed lines.
print(f"Total number of lines is {no_lines}.")
            
                    

Total number of lines is 949.
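
A possible variation on the loop above, offered as a sketch rather than a replacement: it uses the same column positions, but writes rows with the standard `csv` module so that a title containing a comma is quoted instead of splitting into an extra field. The helper name `parse_course_line` and the sample line are hypothetical, and `str.split()` stands in for `re.split(' +', ...)`.

```python
import csv
import io

def parse_course_line(dline):
    """Split one fixed-width course line into code, title and two points fields."""
    code = dline[:5]
    title = dline[7:57].strip()
    # Split the remainder on runs of whitespace.
    points = dline[60:].split()
    # Pad or trim so there are always exactly two points fields.
    points = (points + ['', ''])[:2]
    return [code, title, points[0], points[1]]

# Hypothetical line laid out to match the slices used above.
sample = "WD230" + "  " + "Mechanical, Manufacturing Engineering".ljust(50) + "   " + "230       230"
row = parse_course_line(sample)

# csv.writer quotes the title because it contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
print(buffer.getvalue())
```

Reading the text back with `csv.reader` or `pd.read_csv` gives four fields again, which a plain `','.join(...)` would not guarantee for such a title.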


In [10]:
df2021 = pd.read_csv(path2021, encoding='cp1252')

In [11]:
df2021

Unnamed: 0,code,title,pointsR1,pointsR2
0,AL801,Software Design for Virtual Reality and Gaming...,300,
1,AL802,Software Design in Artificial Intelligence for...,313,
2,AL803,Software Design for Mobile Apps and Connected ...,350,
3,AL805,Computer Engineering for Network Infrastructur...,321,
4,AL810,Quantity Surveying ...,328,
...,...,...,...,...
944,WD211,Creative Computing ...,270,
945,WD212,Recreation and Sport Management ...,262,
946,WD230,Mechanical and Manufacturing Engineering ...,230,230
947,WD231,Early Childhood Care and Education ...,266,
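
The points columns are read as text whenever a value carries a marker such as `#` or `*`. A minimal sketch of turning such a column into numbers follows; the frame `demo` and its values are made up for illustration and are not rows from the CAO data.

```python
import pandas as pd

# Hypothetical rows standing in for the parsed data; None marks a missing value.
demo = pd.DataFrame({
    'code': ['AA001', 'AA002', 'AA003', 'AA004'],
    'pointsR1': ['300', '#554', '625*', None],
})

# Drop every non-digit character, leaving missing values untouched.
digits = demo['pointsR1'].str.replace(r'\D', '', regex=True)

# Convert to numbers; anything that cannot be converted becomes NaN.
demo['pointsR1_num'] = pd.to_numeric(digits, errors='coerce')

print(demo)
```

Stripping the markers loses what they meant, so keeping the original text column alongside the numeric one, as done here, may be worthwhile.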


### Link to CAO Points 2020 

http://www2.cao.ie/points/CAOPointsCharts2020.xlsx

***

### CAO Points 2020 

The 2020 points come as an xlsx file: fetch it, save the raw data, clean it, and save the converted data.

***

### Save Original File
***

In [12]:
# Create a file path for the original data.
pathxlsx = 'data/cao2020_' + nowstr + '.xlsx'

In [13]:
urlrq.urlretrieve('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx', pathxlsx)

('data/cao2020_20211115_165607.xlsx',
 <http.client.HTTPMessage at 0x270f9549c70>)

### Load Spreadsheet using Pandas
***

In [14]:
# Download and parse the excel spreadsheet.
df2020 = pd.read_excel('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx', skiprows=10)

In [15]:
df2020

Unnamed: 0,CATEGORY (i.e.ISCED description),COURSE TITLE,COURSE CODE2,R1 POINTS,R1 Random *,R2 POINTS,R2 Random*,EOS,EOS Random *,EOS Mid-point,...,avp,v,Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8
0,Business and administration,International Business,AC120,209,,,,209,,280,...,,,,,,,,,,
1,Humanities (except languages),Liberal Arts,AC137,252,,,,252,,270,...,,,,,,,,,,
2,Arts,"First Year Art & Design (Common Entry,portfolio)",AD101,#+matric,,,,#+matric,,#+matric,...,,,,,,,,,,
3,Arts,Graphic Design and Moving Image Design (portfo...,AD102,#+matric,,,,#+matric,,#+matric,...,,,,,,,,,,
4,Arts,Textile & Surface Design and Jewellery & Objec...,AD103,#+matric,,,,#+matric,,#+matric,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1459,Manufacturing and processing,Manufacturing Engineering,WD208,188,,,,188,,339,...,,,,,,,,,,
1460,Information and Communication Technologies (ICTs),Software Systems Development,WD210,279,,,,279,,337,...,,,,,,,,,,
1461,Information and Communication Technologies (ICTs),Creative Computing,WD211,271,,,,271,,318,...,,,,,,,,,,
1462,Personal services,Recreation and Sport Management,WD212,270,,,,270,,349,...,,,,,,,,,,


In [16]:
# Spot check a random row.
df2020.iloc[753]

CATEGORY (i.e.ISCED description)          Engineering and engineering trades
COURSE TITLE                        Road Transport Technology and Management
COURSE CODE2                                                           LC286
R1 POINTS                                                                264
R1 Random *                                                              NaN
R2 POINTS                                                                NaN
R2 Random*                                                               NaN
EOS                                                                      264
EOS Random *                                                             NaN
EOS Mid-point                                                            360
LEVEL                                                                      7
HEI                                         Limerick Institute of Technology
Test/Interview #                                                         NaN

In [17]:
# Spot check the last row.
df2020.iloc[-1]

CATEGORY (i.e.ISCED description)          Engineering and engineering trades
COURSE TITLE                        Mechanical and Manufacturing Engineering
COURSE CODE2                                                           WD230
R1 POINTS                                                                253
R1 Random *                                                              NaN
R2 POINTS                                                                NaN
R2 Random*                                                               NaN
EOS                                                                      253
EOS Random *                                                             NaN
EOS Mid-point                                                            369
LEVEL                                                                      8
HEI                                        Waterford Institute of Technology
Test/Interview #                                                         NaN
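
The spot checks show that the 2020 spreadsheet holds both Level 7 and Level 8 courses and uses different column names from the 2021 CSV. One way to line the two up is sketched below, using the two spot-checked rows with only a few of their columns. Filtering on `LEVEL` and the new column names are assumptions about how the comparison might be set up.

```python
import pandas as pd

# Two rows copied from the spot checks above, with only a few of the columns.
demo2020 = pd.DataFrame({
    'COURSE TITLE': ['Road Transport Technology and Management',
                     'Mechanical and Manufacturing Engineering'],
    'COURSE CODE2': ['LC286', 'WD230'],
    'R1 POINTS': [264, 253],
    'R2 POINTS': [None, None],
    'LEVEL': [7, 8],
})

# Compare LEVEL as text, in case the spreadsheet stores it as an integer or a string.
is_level8 = demo2020['LEVEL'].astype(str) == '8'

# Keep the columns that correspond to the 2021 CSV and give them the same names.
wanted = ['COURSE CODE2', 'COURSE TITLE', 'R1 POINTS', 'R2 POINTS']
tidy2020 = demo2020.loc[is_level8, wanted].rename(columns={
    'COURSE CODE2': 'code',
    'COURSE TITLE': 'title',
    'R1 POINTS': 'pointsR1',
    'R2 POINTS': 'pointsR2',
})

print(tidy2020)
```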

In [18]:
# Create a file path for the pandas data.
path = 'data/cao2020_' + nowstr + '.csv'

In [19]:
# Save pandas data frame to disk.
df2020.to_csv(path)

### Link to CAO Points 2019
https://www.cao.ie/index.php?page=points&p=2019
***

### CAO Points 2019
***

##### Steps to prepare the CAO 2019 pdf file for editing in Pandas
- Download the original file from the website (CAO Points 2019, pdf file lvl8_19.pdf).
- Open the original pdf file in Microsoft Word.
- Save Microsoft Word's converted PDF in docx format.
- Re-save the Word document for editing.
- Delete headers and footers.
- Delete the preamble on page 1.
- Select all and copy.
- Paste into Notepad++.
- Remove the HEI name headings and paste the name onto each course line.
- Delete blank lines.
- Replace backticks (`) with apostrophes (').
- Remove trailing whitespace (tabs and spaces) from the end of each line.
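
The last four manual steps above could also be scripted. The sketch below assumes each course line has the shape "code, title, EOS, mid-point" separated by whitespace and that every other non-blank line is an HEI heading. The helper `tidy_lines` and the pattern `re_course_line` are hypothetical names, and lines with a missing EOS or mid-point would need extra handling.

```python
import re

# Assumed shape of a course line: code, title, then EOS and mid-point fields.
re_course_line = re.compile(r'^([A-Z]{2}[0-9]{3})\s+(.*?)\s+(\S+)\s+(\S+)$')

def tidy_lines(raw_lines):
    """Return tab-separated rows: institution, code, course, EOS, mid-point."""
    rows = []
    institution = ''
    for line in raw_lines:
        # Replace backticks with apostrophes and remove trailing tabs and spaces.
        line = line.replace('`', "'").rstrip()
        # Skip blank lines.
        if not line:
            continue
        match = re_course_line.match(line)
        if match:
            # Put the current HEI name onto the course line.
            rows.append('\t'.join([institution, *match.groups()]))
        else:
            # Anything that is not a course line is treated as an HEI heading.
            institution = line
    return rows

# Sample input built from rows shown in the 2019 table in this notebook.
raw = [
    "Athlone Institute of Technology",
    "",
    "AL801 Software Design with Virtual Reality and Gaming 304 328  ",
    "Waterford Institute of Technology",
    "WD200 Arts (options) 221 296\t",
]
for row in tidy_lines(raw):
    print(row)
```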

In [20]:
# Read CAO Points 2019 edited from csv (tsv) file to pandas
df2019 = pd.read_csv("data/lvl8_19_20211115_1016_edited.csv", sep="\t")
df2019

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,Athlone Institute of Technology,AL801,Software Design with Virtual Reality and Gaming,304,328.0
1,Athlone Institute of Technology,AL802,Software Design with Cloud Computing,301,306.0
2,Athlone Institute of Technology,AL803,Software Design with Mobile Apps and Connected...,309,337.0
3,Athlone Institute of Technology,AL805,Network Management and Cloud Infrastructure,329,442.0
4,Athlone Institute of Technology,AL810,Quantity Surveying,307,349.0
...,...,...,...,...,...
925,Waterford Institute of Technology,WD200,Arts (options),221,296.0
926,Waterford Institute of Technology,WD210,Software Systems Development,271,329.0
927,Waterford Institute of Technology,WD211,Creative Computing,275,322.0
928,Waterford Institute of Technology,WD212,Recreation and Sport Management,274,311.0


##### Years 2018 to 2010: download and prepare the files (all should be pdf files)

### Link to CAO Points 2018
https://www.cao.ie/index.php?page=points&p=2018
***

##### Preparing CAO 2018 pdf file for editing in Pandas

In [21]:
# Read CAO Points 2018 edited from csv (tsv) file to pandas
df2018 = pd.read_csv("data/CAO2018_20211115_edited.csv", sep="\t")
df2018

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,Athlone Institute of Technology,AL801,Software Design (Game Development or Cloud Com...,295,326.0
1,Athlone Institute of Technology,AL810,Quantity Surveying,300,340.0
2,Athlone Institute of Technology,AL820,Mechanical and Polymer Engineering,299,371.0
3,Athlone Institute of Technology,AL830,General Nursing,418,440.0
4,Athlone Institute of Technology,AL832,Psychiatric Nursing,377,388.0
...,...,...,...,...,...
898,Waterford Institute of Technology,WD197,The Internet of Things,260,329.0
899,Waterford Institute of Technology,WD200,Arts,220,299.0
900,Waterford Institute of Technology,WD210,Software Systems Development,289,327.0
901,Waterford Institute of Technology,WD211,Creative Computing,265,326.0


### Link to CAO Points 2017
https://www.cao.ie/index.php?page=points&p=2017
***

##### Preparing CAO 2017 pdf file for editing in Pandas

In [22]:
# Read CAO Points 2017 edited from csv (tsv) file to pandas
df2017 = pd.read_csv("data/CAO2017_edited.csv", sep="\t")
df2017

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,Athlone Institute of Technology,AL801,Software Design (Game Development or Cloud Com...,290,329.0
1,Athlone Institute of Technology,AL810,Quantity Surveying,311,357.0
2,Athlone Institute of Technology,AL820,Mechanical and Polymer Engineering,300,336.0
3,Athlone Institute of Technology,AL830,General Nursing,398*,418.0
4,Athlone Institute of Technology,AL832,Psychiatric Nursing,378,389.0
...,...,...,...,...,...
865,Waterford Institute of Technology,WD193,Marketing and Digital Media,297,337.0
866,Waterford Institute of Technology,WD194,Culinary Arts,279,356.0
867,Waterford Institute of Technology,WD195,Architectural & Building Information Modelling...,273,320.0
868,Waterford Institute of Technology,WD197,The Internet of Things,262,328.0


### Link to CAO Points 2016
https://www.cao.ie/index.php?page=points&p=2016
***

##### Preparing CAO 2016 pdf file for editing in Pandas

In [23]:
# Read CAO Points 2016 edited from csv (tsv) file to pandas
df2016 = pd.read_csv("data/CAO2016_edited.csv", sep="\t")
df2016

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,Athlone Institute of Technology,AL801,Software Design (Game Development or Cloud Com...,300,340.0
1,Athlone Institute of Technology,AL810,Quantity Surveying,315,355.0
2,Athlone Institute of Technology,AL820,Mechanical and Polymer Engineering,295,340.0
3,Athlone Institute of Technology,AL830,General Nursing,425*,440.0
4,Athlone Institute of Technology,AL831,Mature Applicants General Nursing,#181,185.0
...,...,...,...,...,...
928,Waterford Institute of Technology,WD197,The Internet of Things,275,380.0
929,Waterford Institute of Technology,WD200,Arts,275,320.0
930,Waterford Institute of Technology,WD816,Mature Applicants General Nursing,#188,193.0
931,Waterford Institute of Technology,WD817,Mature Applicants Psychiatric Nursing,#176,183.0


### Link to CAO Points 2015
https://www.cao.ie/index.php?page=points&p=2015
***

##### Preparing CAO 2015 pdf file for editing in Pandas

In [24]:
# Read CAO Points 2015 edited from csv (tsv) file to pandas
df2015 = pd.read_csv("data/CAO2015_edited.csv", sep="\t")
df2015

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,Athlone Institute of Technology,AL801,Software Design (Game Development or Cloud Com...,280,345.0
1,Athlone Institute of Technology,AL820,Mechanical and Polymer Engineering,315,355.0
2,Athlone Institute of Technology,AL830,General Nursing,420,435.0
3,Athlone Institute of Technology,AL831,Mature Applicants General Nursing,#176*,182.0
4,Athlone Institute of Technology,AL832,Psychiatric Nursing,390,400.0
...,...,...,...,...,...
919,Waterford Institute of Technology,WD197,Internet of Things,240,340.0
920,Waterford Institute of Technology,WD200,Arts,280,315.0
921,Waterford Institute of Technology,WD816,Mature Applicants General Nursing,#181,188.0
922,Waterford Institute of Technology,WD817,Mature Applicants Psychiatric Nursing,#167*,172.0


### Link to CAO Points 2014
https://www.cao.ie/index.php?page=points&p=2014
***

##### Preparing CAO 2014 pdf file for editing in Pandas

In [60]:
# Read CAO Points 2014 edited from csv (tsv) file to pandas
df2014 = pd.read_csv("data/CAO2014_edited.csv", sep="\t")
df2014

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,ATHLONE IT,AL801,Software Design (Common Entry,280,335
1,ATHLONE IT,AL820,Mechanical and Polymer Engineering ...,315,365
2,ATHLONE IT,AL830,General Nursing ...,410,420
3,ATHLONE IT,AL831,Mature Applicants General Nursing ...,#169,173
4,ATHLONE IT,AL832,Psychiatric Nursing ...,390,395
...,...,...,...,...,...
940,WATERFORD INSTITUTE OF TECHNOLOGY,WD191,Agricultural Science ...,430,445
941,WATERFORD INSTITUTE OF TECHNOLOGY,WD200,Arts ...,280,325
942,WATERFORD INSTITUTE OF TECHNOLOGY,WD816,Mature Applicants General Nursing ...,#183,186
943,WATERFORD INSTITUTE OF TECHNOLOGY,WD817,Mature Applicants Psychiatric Nursing ...,#159*,168


### Link to CAO Points 2013
https://www.cao.ie/index.php?page=points&p=2013
***

##### Preparing CAO 2013 pdf file for editing in Pandas

In [86]:
# Read CAO Points 2013 edited from csv (tsv) file to pandas
df2013 = pd.read_csv("data/CAO2013_edited.csv", sep="\t")
df2013

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,ATHLONE IT,AL802,Software Design (Games Development) ...,275,325
1,ATHLONE IT,AL803,Software Design (Cloud Computing) ...,280,345
2,ATHLONE IT,AL830,General Nursing ...,410*,415
3,ATHLONE IT,AL831,Mature Applicants General Nursing ...,566#,581
4,ATHLONE IT,AL832,Psychiatric Nursing ...,395,400
...,...,...,...,...,...
914,WATERFORD INSTITUTE OF TECHNOLOGY,WD187,Social Science ...,300,340
915,WATERFORD INSTITUTE OF TECHNOLOGY,WD200,Arts ...,275,325
916,WATERFORD INSTITUTE OF TECHNOLOGY,WD816,Mature Applicants General Nursing ...,582#,605
917,WATERFORD INSTITUTE OF TECHNOLOGY,WD817,Mature Applicants Psychiatric Nursing ...,557#,565


### Link to CAO Points 2012
https://www.cao.ie/index.php?page=points&p=2012
***

##### Preparing CAO 2012 pdf file for editing in Pandas

In [138]:
# Read CAO Points 2012 edited from csv (tsv) file to pandas
df2012 = pd.read_csv("data/CAO2012_edited.csv", sep="\t")
df2012

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,ATHLONE IT,AL802,Software Design (Games Development) ...,300,340
1,ATHLONE IT,AL803,Software Design (Web Development) ...,310,335
2,ATHLONE IT,AL805,Construction Technology and Management ...,,
3,ATHLONE IT,AL830,General Nursing ...,415*,430
4,ATHLONE IT,AL831,Mature Applicants General Nursing ...,233#,235
...,...,...,...,...,...
870,WATERFORD INSTITUTE OF TECHNOLOGY,WD180,Physics for Modern Technology ...,345,365
871,WATERFORD INSTITUTE OF TECHNOLOGY,WD200,Arts ...,285,320
872,WATERFORD INSTITUTE OF TECHNOLOGY,WD816,Mature Applicants General Nursing ...,234#,247
873,WATERFORD INSTITUTE OF TECHNOLOGY,WD817,Mature Applicants Psychiatric Nursing ...,223#,225


### Link to CAO Points 2011
https://www.cao.ie/index.php?page=points&p=2011
***

##### Preparing CAO 2011 pdf file for editing in Pandas

In [163]:
# Read CAO Points 2011 edited from csv (tsv) file to pandas
df2011 = pd.read_csv("data/CAO2011_edited.csv", sep="\t")
df2011

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,ATHLONE IT,AL032,Software Design (Games Development) ...,285,330
1,ATHLONE IT,AL033,Toxicology ...,240,330
2,ATHLONE IT,AL034,Software Design (Web Development) ...,285,340
3,ATHLONE IT,AL035,Construction Technology and Management ...,265,315
4,ATHLONE IT,AL050,Business ...,270,325
...,...,...,...,...,...
842,WATERFORD INSTITUTE OF TECHNOLOGY,WD200,Arts ...,290,320
843,WATERFORD INSTITUTE OF TECHNOLOGY,WD816,Mature Applicants General Nursing ...,#232,233
844,WATERFORD INSTITUTE OF TECHNOLOGY,WD817,Mature Applicants Psychiatric Nursing ...,#224,228
845,WATERFORD INSTITUTE OF TECHNOLOGY,WD820,Mature Applicants Intellectual Disability Nurs...,#220*,228


### Link to CAO Points 2010
https://www.cao.ie/index.php?page=points&p=2010
***

##### Preparing CAO 2010 pdf file for editing in Pandas

In [215]:
# Read CAO Points 2010 edited from csv (tsv) file to pandas
df2010 = pd.read_csv("data/CAO2010_edited.csv", sep="\t")
df2010

Unnamed: 0,Institution,Code,Course,EOS,Mid
0,ATHLONE IT,AL032,Software Design (Games Development) ...,265 (375v),315
1,ATHLONE IT,AL033,Toxicology ...,280,345
2,ATHLONE IT,AL034,Software Design (Web Development) ...,270,300
3,ATHLONE IT,AL035,Construction Technology and Management ...,265,310
4,ATHLONE IT,AL050,Business ...,275,320
...,...,...,...,...,...
845,WATERFORD INSTITUTE OF TECHNOLOGY,WD168,Entertainments Systems ...,335,
846,WATERFORD INSTITUTE OF TECHNOLOGY,WD200,Arts ...,330,
847,WATERFORD INSTITUTE OF TECHNOLOGY,WD816,Mature Applicants General Nursing ...,181,
848,WATERFORD INSTITUTE OF TECHNOLOGY,WD817,Mature Applicants Psychiatric Nursing ...,166,

