# CAO Points analysis
***

## Purpose
##### The purpose of this notebook is to analyse the CAO points required for Level 8 courses in third-level institutions in Ireland.
##### The data used in this analysis can be found [here](http://www.cao.ie/index.php?page=points&p=2021).

### Library imports
##### The libraries used in this analysis are NumPy, matplotlib's pyplot, ElementTree, re, requests, pandas and urllib.request.

In [22]:
import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
import re
import requests as rq
import pandas as pd
import urllib.request as urlrq

## CAO Points 2021 analysis
***

In [2]:
# Request the CAO Level 8 points page (2021)
resp = rq.get('http://www2.cao.ie/points/l8.php')

In [3]:
# Check request status
resp

<Response [200]>

### Save Dataset
##### Use the datetime module to build a unique, timestamped file name for the dataset, so that each run saves its own backup copy.

In [4]:
import datetime as dt
now = dt.datetime.now()
nowstr = now.strftime('%Y%m%d_%H%M%S')
filenameHTML = 'data/cao2021_html_' + nowstr + '.html'
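##### As a minimal sketch, the file-name logic above could be wrapped in a helper that also creates the `data` folder if it is missing. Without that folder, the later `open(..., 'w')` calls fail with `FileNotFoundError`. The helper `timestamped_path` and its parameters are hypothetical and not part of the original notebook. The fixed date reuses the timestamp visible in the 2020 download output further down.

```python
import datetime as dt
from pathlib import Path


def timestamped_path(prefix, suffix, when=None, folder='data'):
    """Hypothetical helper: build a path such as data/cao2021_html_20220101_144624.html.

    The folder is created if it does not exist yet, so a later
    open(path, 'w') does not raise FileNotFoundError.
    """
    when = when or dt.datetime.now()
    stamp = when.strftime('%Y%m%d_%H%M%S')
    Path(folder).mkdir(parents=True, exist_ok=True)
    return f'{folder}/{prefix}_{stamp}{suffix}'


# A fixed timestamp makes the result reproducible
fixed = dt.datetime(2022, 1, 1, 14, 46, 24)
print(timestamped_path('cao2021_html', '.html', when=fixed))
```

##### Calling it without `when` uses the current time, which matches what the cell above does with `dt.datetime.now()`.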

##### The server reports the encoding as ISO-8859-1, but the HTML is actually encoded as cp1252 (Windows-1252). The following cell overrides the encoding so that `resp.text` is decoded correctly.

In [5]:
original_encoding = resp.encoding
resp.encoding = 'cp1252'
print(original_encoding)
print(resp.encoding)

iso-8859-1
cp1252
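##### The sketch below decodes the same hypothetical bytes both ways to show why the encoding matters. The two encodings only disagree on bytes 0x80 to 0x9F. cp1252 maps most of these to printable characters such as curly quotes and the euro sign, while ISO-8859-1 maps them to invisible control characters. The `requests` library falls back to ISO-8859-1 for `text/*` responses whose headers declare no charset. That is a likely, but unconfirmed, reason for the value reported above.

```python
# Hypothetical bytes for illustration: 0x92 is a right single quotation mark in cp1252
raw = b'O\x92Brien'

as_latin1 = raw.decode('iso-8859-1')  # 0x92 becomes the C1 control character U+0092
as_cp1252 = raw.decode('cp1252')      # 0x92 becomes U+2019, a curly apostrophe

print(repr(as_latin1))
print(repr(as_cp1252))

# Outside 0x80-0x9F both encodings agree, e.g. the fada in 'é' (0xE9)
same = b'\xe9'.decode('iso-8859-1') == b'\xe9'.decode('cp1252')
print(same)
```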


##### Now that the encoding has been corrected, the raw HTML can be saved to a file as a backup.

In [6]:
# Write with the same encoding used to decode the response
with open(filenameHTML, 'w', encoding='cp1252') as f:
    f.write(resp.text)

In [7]:
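# Compile a regular expression for course lines: a course code (two capital
# letters and three digits), two spaces, the course title, a three-digit
# points value, an optional asterisk and any trailing spaces
re_course = re.compile(r'([A-Z]{2}[0-9]{3})  (.*)([0-9]{3})(\*?) *')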
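##### The sketch below shows what the pattern accepts. It applies the pattern to a few invented lines, which are hypothetical and not taken from the CAO page. Only lines that end in a three-digit points value, optionally followed by an asterisk, match. A line whose points field is text, such as `#+matric`, is skipped.

```python
import re

# Same pattern as in the notebook cell above
re_course = re.compile(r'([A-Z]{2}[0-9]{3})  (.*)([0-9]{3})(\*?) *')

# Hypothetical sample lines, invented for illustration only
samples = [
    'AB123  Example Course Title                 300',
    'AB124  Another Example Course               450*',
    'AB125  Portfolio Course                     #+matric',
    'Some header text that is not a course line',
]

# fullmatch() requires the whole line to fit the pattern
matches = [line for line in samples if re_course.fullmatch(line)]

for line in matches:
    code, title, points, star = re_course.fullmatch(line).groups()
    # The greedy (.*) keeps the padding spaces, so strip the title
    print(code, title.strip(), points, star or '-')
```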

In [8]:
# Write each course line from the response to a timestamped CSV file
filenameCSV = 'data/cao2021_csv_' + nowstr + '.csv'

num_lines = 0 

with open(filenameCSV, 'w') as f:
    for line in resp.iter_lines():
        dline = line.decode('cp1252')
        # Get only the lines with courses and points
        if re_course.fullmatch(dline):
            num_lines += 1
            linesplit = re.split('  +', dline)
            f.write(','.join(linesplit) + '\n')
print(f"Total number of lines is {num_lines}")

Total number of lines is 922
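##### Joining fields with `','.join(...)` produces a broken row if a course title contains a comma. The 2020 sheet below has such a title, "First Year Art & Design (Common Entry,portfolio)". A possible alternative is to let the standard `csv` module quote fields as needed. This is a minimal sketch on hypothetical lines, written to an in-memory buffer rather than the real file.

```python
import csv
import io
import re

re_course = re.compile(r'([A-Z]{2}[0-9]{3})  (.*)([0-9]{3})(\*?) *')

# Hypothetical lines; the second title contains a comma
lines = [
    'AB123  Example Course Title                 300',
    'AB124  Art, Design and Media                450*',
]

buffer = io.StringIO()
writer = csv.writer(buffer)
for line in lines:
    if re_course.fullmatch(line):
        # Split on runs of two or more spaces, as in the cell above;
        # csv.writer wraps the comma-containing title in quotes
        writer.writerow(re.split('  +', line))

print(buffer.getvalue())
```

##### To write to the real file instead, open it with `newline=''` (as the `csv` module documentation recommends) and pass that file object to `csv.writer`.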


## CAO Points 2020 analysis
***

In [23]:
# Create file path for raw data
path = 'data/cao2020_' + nowstr + '.xlsx'

In [24]:
# Save raw data locally
urlrq.urlretrieve('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx', path)

('data/cao2020_20220101_144624.xlsx',
 <http.client.HTTPMessage at 0x1b24b8e94f0>)

In [16]:
# Load the spreadsheet, skipping the 10 rows above the column headings
df = pd.read_excel('http://www2.cao.ie/points/CAOPointsCharts2020.xlsx', skiprows=10)

In [17]:
df

|  | CATEGORY (i.e.ISCED description) | COURSE TITLE | COURSE CODE2 | R1 POINTS | R1 Random \* | R2 POINTS | R2 Random\* | EOS | EOS Random \* | EOS Mid-point | ... | avp | v | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Business and administration | International Business | AC120 | 209 |  |  |  | 209 |  | 280 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | Humanities (except languages) | Liberal Arts | AC137 | 252 |  |  |  | 252 |  | 270 | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | Arts | First Year Art & Design (Common Entry,portfolio) | AD101 | #+matric |  |  |  | #+matric |  | #+matric | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | Arts | Graphic Design and Moving Image Design (portfo... | AD102 | #+matric |  |  |  | #+matric |  | #+matric | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | Arts | Textile & Surface Design and Jewellery & Objec... | AD103 | #+matric |  |  |  | #+matric |  | #+matric | ... |  |  |  |  |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1459 | Manufacturing and processing | Manufacturing Engineering | WD208 | 188 |  |  |  | 188 |  | 339 | ... |  |  |  |  |  |  |  |  |  |  |
| 1460 | Information and Communication Technologies (ICTs) | Software Systems Development | WD210 | 279 |  |  |  | 279 |  | 337 | ... |  |  |  |  |  |  |  |  |  |  |
| 1461 | Information and Communication Technologies (ICTs) | Creative Computing | WD211 | 271 |  |  |  | 271 |  | 318 | ... |  |  |  |  |  |  |  |  |  |  |
| 1462 | Personal services | Recreation and Sport Management | WD212 | 270 |  |  |  | 270 |  | 349 | ... |  |  |  |  |  |  |  |  |  |  |
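##### The output above shows three issues for a Level 8 analysis. There are empty trailing columns (`Column1` to `Column8`). The sheet has a `LEVEL` column. Points fields such as `EOS` mix numbers with text like `#+matric`. A minimal cleaning sketch follows, run on a small hypothetical frame that only mimics a few of those columns. The course codes and values are invented.

```python
import pandas as pd

# Small hypothetical frame mimicking a few columns of the 2020 sheet
df = pd.DataFrame({
    'COURSE CODE2': ['XX101', 'XX102', 'XX103', 'XX104'],
    'EOS': [209, '#+matric', 346, 300],
    'LEVEL': [8, 8, 8, 7],
    'Column1': [None, None, None, None],
})

# Drop columns that contain no values at all (e.g. the empty ColumnN ones)
df = df.dropna(axis=1, how='all')

# Keep only Level 8 courses
level8 = df[df['LEVEL'] == 8].copy()

# Convert points to numbers; non-numeric entries such as '#+matric' become NaN
level8['EOS_numeric'] = pd.to_numeric(level8['EOS'], errors='coerce')

print(level8)
print(level8['EOS_numeric'].mean())  # NaN values are skipped by mean()
```

##### Coercing to NaN keeps the text-only rows in the frame rather than dropping them, so they can still be counted or inspected separately.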


In [18]:
df.iloc[754]

CATEGORY (i.e.ISCED description)    Engineering and engineering trades
COURSE TITLE                                    Mechanical Engineering
COURSE CODE2                                                     LC288
R1 POINTS                                                          347
R1 Random *                                                        NaN
R2 POINTS                                                          346
R2 Random*                                                         NaN
EOS                                                                346
EOS Random *                                                       NaN
EOS Mid-point                                                      415
LEVEL                                                                8
HEI                                   Limerick Institute of Technology
Test/Interview #                                                   NaN
avp                                                                NaN
v     
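##### Selecting a course by its row position (`df.iloc[754]`) depends on the order of the sheet. A sketch of looking a course up by its code instead follows. It uses two rows whose values are copied from the outputs above, with only a subset of columns.

```python
import pandas as pd

# Two rows copied from the 2020 output above (subset of columns)
df = pd.DataFrame({
    'COURSE TITLE': ['Mechanical Engineering', 'Software Systems Development'],
    'COURSE CODE2': ['LC288', 'WD210'],
    'EOS': [346, 279],
})

# Select by course code with a boolean mask
row = df.loc[df['COURSE CODE2'] == 'LC288']
print(row)

# Or index the frame by course code for repeated lookups
by_code = df.set_index('COURSE CODE2')
print(by_code.loc['LC288', 'EOS'])
```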

In [25]:
# Create file path for pandas dataframe data
path = 'data/cao2020_' + nowstr + '.csv'

In [26]:
# Save data in pandas dataframe to a csv file locally
df.to_csv(path)
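##### By default, `to_csv` also writes the DataFrame index as an unnamed first column. When the file is read back with `read_csv`, that column appears as `Unnamed: 0`. The sketch below round-trips a tiny frame through an in-memory buffer to compare the default with `index=False`.

```python
import io
import pandas as pd

df = pd.DataFrame({'COURSE CODE2': ['LC288', 'WD210'], 'EOS': [346, 279]})

# Default: the index is written as a first column with a blank header
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))

# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))
```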