# CAO Points 2021 analysis
***

## Purpose
##### The purpose of this notebook is to analyse the CAO points required for Level 8 courses at third-level institutions in Ireland.
##### The data used in this analysis can be found [here](http://www.cao.ie/index.php?page=points&p=2021).

### Library imports
##### The libraries used in the analysis are NumPy, Matplotlib's pyplot, ElementTree, re and requests.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
import re
import requests as rq

In [2]:
# Fetch the CAO Level 8 points page
resp = rq.get('http://www2.cao.ie/points/l8.php')

In [3]:
# Check request status
resp

<Response [200]>

### Save Dataset
##### Using the datetime module, build a unique, timestamped file name so that a backup copy of the dataset can be saved.

In [4]:
import datetime as dt
now = dt.datetime.now()
nowstr = now.strftime('%Y%m%d_%H%M%S')
filenameHTML = 'data/cao2021_html_' + nowstr + '.html'
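
The timestamped name above assumes that a `data/` directory already exists. A minimal sketch of one way to build the same kind of name while creating the directory if it is missing is shown here; the helper `timestamped_path` and the fixed example date are hypothetical and used only for illustration, and a temporary directory stands in for the real project folder.

```python
import datetime as dt
import tempfile
from pathlib import Path


def timestamped_path(directory, prefix, suffix, now=None):
    """Return directory/prefix + YYYYmmdd_HHMMSS + suffix, creating the directory if needed."""
    now = now or dt.datetime.now()
    directory = Path(directory)
    # Create the target folder (and any parents) if it does not exist yet
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{prefix}{now.strftime('%Y%m%d_%H%M%S')}{suffix}"


# Demonstrate with a temporary folder and a fixed, made-up timestamp
with tempfile.TemporaryDirectory() as tmp:
    example = timestamped_path(
        Path(tmp) / 'data',
        'cao2021_html_',
        '.html',
        now=dt.datetime(2021, 11, 1, 9, 30, 0),
    )
    example_name = example.name
    dir_created = example.parent.is_dir()

print(example_name)
```

Because the timestamp goes down to the second, running the notebook again produces a new file rather than overwriting an earlier backup.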

##### The encoding reported by the server (iso-8859-1) does not match the encoding the HTML is actually in (cp1252). The following lines of code override it.

In [5]:
original_encoding = resp.encoding
resp.encoding = 'cp1252'
print(original_encoding)
print(resp.encoding)

iso-8859-1
cp1252
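
To illustrate why the choice between the two encodings matters, the sketch below decodes a single byte both ways. The byte `0x96` is chosen only as an example of a value in the range where the two encodings disagree; it is an assumption that the CAO page contains bytes in this range, which the override above suggests but this sketch does not verify.

```python
# One byte from the 0x80-0x9F range, where iso-8859-1 and cp1252 differ
raw = b'\x96'

as_latin1 = raw.decode('iso-8859-1')  # maps to an invisible C1 control character
as_cp1252 = raw.decode('cp1252')      # maps to a printable en dash

print(hex(ord(as_latin1)))
print(hex(ord(as_cp1252)))

# Plain ASCII text decodes identically under both encodings
same_for_ascii = b'CAO' .decode('iso-8859-1') == b'CAO'.decode('cp1252')
```

Since the two encodings agree on ASCII, a wrong choice only shows up in characters such as dashes, curly quotes or accented letters.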


##### Now that the encoding has been corrected, the original page can be saved to an HTML file.

In [6]:
with open(filenameHTML, 'w') as f:
    f.write(resp.text)

In [7]:
# Compile regular expression for matching lines
re_course = re.compile(r'([A-Z]{2}[0-9]{3})  (.*)([0-9]{3})(\*?) *')
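
As a rough check of what this pattern captures, the sketch below applies it to two sample lines. Both lines are hypothetical and made up for illustration; they are not copied from the CAO page, and the real layout may differ in details such as column widths.

```python
import re

# Same pattern as in the cell above
re_course = re.compile(r'([A-Z]{2}[0-9]{3})  (.*)([0-9]{3})(\*?) *')

# Hypothetical sample lines, invented for this example
course_line = 'XX123  Example Course Title                    300*'
header_line = 'Course and Institution'

# A course line matches in full: code, title, three-digit points, optional asterisk
match = re_course.fullmatch(course_line)
code, title, points, flag = match.groups()
print(code, title.strip(), points, flag)

# A line without a course code does not match
no_match = re_course.fullmatch(header_line)

# Splitting on runs of two or more spaces separates the columns
fields = re.split('  +', course_line)
print(fields)
```

Note that the title group keeps the padding spaces before the points, so it needs a `strip()` if the groups are used directly, whereas splitting on runs of spaces discards the padding.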

In [8]:
# iterate through lines in the response
filenameCSV = 'data/cao2021_csv_' + nowstr + '.csv'

num_lines = 0 

with open(filenameCSV, 'w') as f:
    for line in resp.iter_lines():
        # Get only the lines with courses and points
        dline = line.decode('cp1252')
        if re_course.fullmatch(dline):
            num_lines += 1
            # Split on runs of two or more spaces to separate the columns
            linesplit = re.split('  +', dline)
            # Join the fields with commas to form one CSV row
            f.write(','.join(linesplit) + '\n')
print(f"Total number of lines is {num_lines}")

Total number of lines is 922
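
Joining fields with a plain comma works as long as no course title contains a comma itself. As a possible alternative, and not the method used in this notebook, the sketch below writes rows with the standard library's `csv.writer`, which quotes such fields automatically. The sample lines and the helper `split_fields` are hypothetical, and an in-memory buffer is used in place of a real file.

```python
import csv
import io
import re


def split_fields(line):
    """Split a fixed-width line on runs of two or more spaces."""
    return re.split('  +', line.strip())


# Hypothetical sample lines, invented for this example
sample_lines = [
    'XX123  Example Course Title                    300',
    'XX456  Arts, Humanities and Languages          451*',
]

# Write to an in-memory buffer so the example does not touch the disk
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
for line in sample_lines:
    writer.writerow(split_fields(line))

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives three fields per row, even for the title with a comma
rows = list(csv.reader(io.StringIO(csv_text)))
```

With the plain `','.join(...)` approach, the second sample line would be read back as four columns instead of three.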
