# UCL MSc - Module Scrape
## Preamble
Import modules

In [1]:
import CompSciScrape as cs
import pandas as pd

## Scraping the Computer Science Module Directory
First, a dataframe is built of all postgraduate modules listed in the Computer Science department's 2017/18 directory. Note that statistics modules were not found to be listed in comparable detail on the UCL website; only reading lists seem to be available consistently for modules across departments.

In [2]:
# Specify URL location of module directory
parentURL = 'http://www.cs.ucl.ac.uk/'
targetURL = 'http://www.cs.ucl.ac.uk/current_students/syllabus/pg/'

# Collect soup
modules_soup = cs.collectSoup(targetURL)

# Find all module tags and parse modules into a pandas dataframe
modules_directory = cs.parseModuleList(modules_soup, parentURL)
modules_directory.head()

Unnamed: 0,Code,Module,Link
0,COMPG001,Financial Data and Statistics,http://www.cs.ucl.ac.uk/current_students/sylla...
1,COMPG004,Market Risk Measures and Portfolio Theory,http://www.cs.ucl.ac.uk/current_students/sylla...
2,COMPG005,Numerical Analysis for Finance,http://www.cs.ucl.ac.uk/current_students/sylla...
3,COMPG007,Operational Risk Measurement for Financial Ins...,http://www.cs.ucl.ac.uk/current_students/sylla...
4,COMPG008,Stochastic Processes for Finance,http://www.cs.ucl.ac.uk/current_students/sylla...
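
The internals of `cs.parseModuleList` are not shown in this notebook. As a rough sketch of the idea only, the block below assumes the directory page lists each module as an anchor whose text has the form "CODE - Title" (an assumption about the page layout, not verified here) and resolves relative links against the parent URL. It uses the standard library's `html.parser` rather than a soup object, and both the `ModuleLinkParser` class and the HTML snippet are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ModuleLinkParser(HTMLParser):
    """Collect Code/Module/Link rows from anchor tags (hypothetical helper)."""

    def __init__(self, parent_url):
        super().__init__()
        self.parent_url = parent_url
        self.rows = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when an anchor opens
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Assumed layout: "<CODE> - <Title>"; anchors that do not match are skipped
            code, sep, title = text.partition(" - ")
            if sep:
                self.rows.append({
                    "Code": code.strip(),
                    "Module": title.strip(),
                    # Turn a relative href into an absolute URL
                    "Link": urljoin(self.parent_url, self._href),
                })
            self._href = None


# Made-up HTML standing in for the real directory page
sample_html = """
<ul>
  <li><a href="/current_students/syllabus/pg/example_one/">COMPX001 - Example Module One</a></li>
  <li><a href="/about/">About the department</a></li>
</ul>
"""

parser = ModuleLinkParser("http://www.cs.ucl.ac.uk/")
parser.feed(sample_html)
module_rows = parser.rows
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(module_rows)` to obtain a table with the same three columns as the output above.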


## Loading the Target Modules List
Second, the list of modules available on our target MSc courses is loaded from a CSV file.

In [3]:
# Specify file location of CSV
file = 'TargetModules.csv'

# Read CSV file to pandas dataframe
modules_targets = cs.loadCSV(file)
modules_targets.head()

Unnamed: 0,Module,CS & ML,ML,Type
0,Supervised Learning,True,True,Core
1,Statistical Modelling and Data Analysis,True,False,Core
2,Graphical Models,True,True,Group One
3,Probabilistic and Unsupervised Learning,True,True,Group One
4,Advanced Deep Learning and Reinforcement Learning,True,True,Group Two
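
`cs.loadCSV` is not shown here; it is presumably a thin wrapper around `pandas.read_csv`, though that is an assumption. The sketch below reads the first two rows of the table above from an in-memory string and shows that the `True`/`False` columns load as booleans, so they can be used directly as row filters. The names `csv_text`, `toy_targets` and `ml_only` are illustrative.

```python
import io

import pandas as pd

# Two rows copied from the output above, held in memory instead of a file
csv_text = """Module,CS & ML,ML,Type
Supervised Learning,True,True,Core
Statistical Modelling and Data Analysis,True,False,Core
"""

toy_targets = pd.read_csv(io.StringIO(csv_text))

# Boolean columns can be used directly as a mask
ml_only = toy_targets.loc[toy_targets["ML"], "Module"].tolist()
```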


## Identify Links
The directory and target lists are joined to identify the URLs containing the relevant module information.

In [4]:
# The dataframes are joined based on the text string title of the module
modules_targets_2 = cs.joinDataframes(modules_targets, modules_directory, 'Module')
modules_targets_2

# Note the NaN values for statistics modules, which do not appear in the Computer Science module directory

Unnamed: 0,Module,CS & ML,ML,Type,Code,Link
0,Supervised Learning,True,True,Core,COMPGI01,http://www.cs.ucl.ac.uk/current_students/sylla...
1,Statistical Modelling and Data Analysis,True,False,Core,,
2,Graphical Models,True,True,Group One,COMPGI08,http://www.cs.ucl.ac.uk/current_students/sylla...
3,Probabilistic and Unsupervised Learning,True,True,Group One,COMPGI18,http://www.cs.ucl.ac.uk/current_students/sylla...
4,Advanced Deep Learning and Reinforcement Learning,True,True,Group Two,COMPGI22,http://www.cs.ucl.ac.uk/current_students/sylla...
5,Advanced Topics in Machine Learning,True,True,Group Two,COMPGI13,http://www.cs.ucl.ac.uk/current_students/sylla...
6,Applied Machine Learning,True,True,Group Two,COMPGI09,http://www.cs.ucl.ac.uk/current_students/sylla...
7,Approximate Inference and Learning in Probabil...,True,True,Group Two,COMPGI16,http://www.cs.ucl.ac.uk/current_students/sylla...
8,Information Retrieval & Data Mining,True,True,Group Two,COMPGI15,http://www.cs.ucl.ac.uk/current_students/sylla...
9,Introduction to Deep Learning,True,True,Group Two,COMPGI23,http://www.cs.ucl.ac.uk/current_students/sylla...
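
The empty `Code` and `Link` fields in row 1 are what a left join on `Module` produces when a target module has no match in the directory. `cs.joinDataframes` is not shown, so the following is only a sketch that assumes it behaves like `DataFrame.merge` with `how='left'`; the toy frames and the example link are made up. The `indicator=True` option adds a `_merge` column that makes the unmatched rows easy to list.

```python
import pandas as pd

# Toy stand-ins for modules_targets and modules_directory
targets = pd.DataFrame({
    "Module": ["Supervised Learning", "Statistical Modelling and Data Analysis"],
    "Type": ["Core", "Core"],
})
directory = pd.DataFrame({
    "Code": ["COMPGI01"],
    "Module": ["Supervised Learning"],
    "Link": ["http://example.org/supervised-learning"],  # hypothetical URL
})

# Left join keeps every target module; unmatched rows get NaN for Code and Link
joined = targets.merge(directory, on="Module", how="left", indicator=True)

# Modules present in the target list but missing from the directory
unmatched = joined.loc[joined["_merge"] == "left_only", "Module"].tolist()
```

Because the join key is the module title as a text string, any difference in spelling or punctuation between the two sources would also show up in `unmatched`.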


## Pull Detail from Module Pages
Each link found is visited to extract detailed data on the module. Modules without a link are skipped.

In [5]:
# Collect one details dataframe per module, skipping modules without a link
frames = []
for _, row in modules_targets_2.iterrows():
    if pd.isnull(row['Link']):
        continue
    soup = cs.collectSoup(row['Link'])
    frames.append(cs.parseModuleDetails(row['Module'], soup))

# Concatenate once, after the loop, rather than on every iteration
details = pd.concat(frames)
details.head()

Unnamed: 0,Aims,Assessment,Code,Content,Learning Outcomes,Method of Instruction,Module,Prerequisites,Reading List,Resources,Taught By,Term,Year
0,This module covers supervised approaches to ma...,The course has the following assessment compon...,COMPGI01 (Also taught as: COMPM055 Supervised ...,The course consists of both foundational topic...,Gain in-depth familiarity with various classic...,Lecture presentations with associated class pr...,Supervised Learning,"Basic mathematics, Calculus, Probability, Line...",http://readinglists.ucl.ac.uk/modules/compgi01...,Reading list available via the UCL Library cat...,"Mark Herbster (50%), John Shawe-Taylor (30%), ...",1,MSc
0,To give an introduction to probabilistic model...,The course has the following assessment compon...,COMPGI08 (Also taught as COMPM056 ),Bayesian Reasoning. Bayesian Networks. Directe...,"Ability to construct probabilistic models, lea...",Lectures,Graphical Models,Excellent understanding and abilities with Lin...,http://readinglists.ucl.ac.uk/modules/compgi08...,Reading list available via the UCL Library cat...,Dmitry Adamsky [Teaching Fellow] (100%) David ...,1,MSc
0,This course provides students with an in-depth...,The course has the following assessment compon...,COMPGI18,Basics of Bayesian learning and regression. La...,To be able to understand the theory of unsuper...,Lecture presentations with associated class pr...,Probabilistic and Unsupervised Learning,"A good background in statistics, calculus, lin...",http://readinglists.ucl.ac.uk/modules/compgi18...,Reading list available via the UCL Library cat...,Maneesh Sahani (Gatsby Computational Neuroscie...,1,MSc
0,Students successfully completing the module sh...,The course has the following assessment compon...,COMPGI22 (Also taught as COMPMI22 ),The course has two interleaved parts that conv...,To understand the foundations of deep learning...,"Lectures, reading, and course work assignments...",Advanced Deep Learning and Reinforcement Learning,"The prerequisites are probability, calculus, l...",http://readinglists.ucl.ac.uk/modules/compmi22...,Reading list available via the UCL Library cat...,Thore Graepel (50%) Hado van Hasselt (50%) The...,2,MSc
0,Kernel methods To gain an understanding of the...,The course has the following assessment compon...,COMPGI13 (Also taught as: COMPM050 );,Introduction to kernel methods:. - Definition ...,To gain in-depth familiarity with the selected...,Frontal teaching using whiteboard and slides,Advanced Topics in Machine Learning,"Linear Algebra, Probability Theory, Calculus",http://readinglists.ucl.ac.uk/modules/compgi13...,Reading list available via the UCL Library cat...,Arthur Gretton (50%) and Carlo Ciliberto (50%),1,MSc
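
Every row in the output above carries the index label 0, which suggests that each per-module frame holds a single row indexed 0 and that the frames are stacked with their labels unchanged. If unique row labels are wanted, `pd.concat` accepts `ignore_index=True`. A small sketch with toy single-row frames (the data is illustrative, not the output of `cs.parseModuleDetails`):

```python
import pandas as pd

# Toy single-row frames standing in for the per-module details
frames = [
    pd.DataFrame({"Module": ["Supervised Learning"], "Term": [1]}),
    pd.DataFrame({"Module": ["Graphical Models"], "Term": [1]}),
]

# Default behaviour keeps each frame's own labels
stacked = pd.concat(frames)

# ignore_index=True assigns a fresh 0..n-1 index
renumbered = pd.concat(frames, ignore_index=True)
```

The repeated labels are harmless for the join in the next step, which matches on the `Module` column rather than on the index.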


## Combine Data and Finish
The new data is joined onto the target list and written to CSV.

In [12]:
final = cs.joinDataframes(modules_targets_2, details, 'Module')
final.to_csv('Results.csv')
final.head()

Unnamed: 0,Module,CS & ML,ML,Type,Code_left,Link,Aims,Assessment,Code_right,Content,Learning Outcomes,Method of Instruction,Prerequisites,Reading List,Resources,Taught By,Term,Year
0,Supervised Learning,True,True,Core,COMPGI01,http://www.cs.ucl.ac.uk/current_students/sylla...,This module covers supervised approaches to ma...,The course has the following assessment compon...,COMPGI01 (Also taught as: COMPM055 Supervised ...,The course consists of both foundational topic...,Gain in-depth familiarity with various classic...,Lecture presentations with associated class pr...,"Basic mathematics, Calculus, Probability, Line...",http://readinglists.ucl.ac.uk/modules/compgi01...,Reading list available via the UCL Library cat...,"Mark Herbster (50%), John Shawe-Taylor (30%), ...",1.0,MSc
1,Statistical Modelling and Data Analysis,True,False,Core,,,,,,,,,,,,,,
2,Graphical Models,True,True,Group One,COMPGI08,http://www.cs.ucl.ac.uk/current_students/sylla...,To give an introduction to probabilistic model...,The course has the following assessment compon...,COMPGI08 (Also taught as COMPM056 ),Bayesian Reasoning. Bayesian Networks. Directe...,"Ability to construct probabilistic models, lea...",Lectures,Excellent understanding and abilities with Lin...,http://readinglists.ucl.ac.uk/modules/compgi08...,Reading list available via the UCL Library cat...,Dmitry Adamsky [Teaching Fellow] (100%) David ...,1.0,MSc
3,Probabilistic and Unsupervised Learning,True,True,Group One,COMPGI18,http://www.cs.ucl.ac.uk/current_students/sylla...,This course provides students with an in-depth...,The course has the following assessment compon...,COMPGI18,Basics of Bayesian learning and regression. La...,To be able to understand the theory of unsuper...,Lecture presentations with associated class pr...,"A good background in statistics, calculus, lin...",http://readinglists.ucl.ac.uk/modules/compgi18...,Reading list available via the UCL Library cat...,Maneesh Sahani (Gatsby Computational Neuroscie...,1.0,MSc
4,Advanced Deep Learning and Reinforcement Learning,True,True,Group Two,COMPGI22,http://www.cs.ucl.ac.uk/current_students/sylla...,Students successfully completing the module sh...,The course has the following assessment compon...,COMPGI22 (Also taught as COMPMI22 ),The course has two interleaved parts that conv...,To understand the foundations of deep learning...,"Lectures, reading, and course work assignments...","The prerequisites are probability, calculus, l...",http://readinglists.ucl.ac.uk/modules/compmi22...,Reading list available via the UCL Library cat...,Thore Graepel (50%) Hado van Hasselt (50%) The...,2.0,MSc
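
Two features of this output follow from how joins work in pandas. `Code_left` and `Code_right` appear because both frames carry a `Code` column that is not part of the join key, and `Term` is shown as `1.0` because the unmatched statistics row introduces NaN, which forces an integer column to float. The sketch below reproduces both effects on toy data and shows one possible tidy-up; it assumes a `merge`-style join with `_left`/`_right` suffixes, which matches the column names above but is not confirmed by the source.

```python
import pandas as pd

left = pd.DataFrame({
    "Module": ["Graphical Models", "Statistical Modelling and Data Analysis"],
    "Code": ["COMPGI08", None],
})
right = pd.DataFrame({
    "Module": ["Graphical Models"],
    "Code": ["COMPGI08 (Also taught as COMPM056 )"],
    "Term": [1],
})

# Overlapping non-key columns receive suffixes; Term becomes float because of NaN
suffixed = left.merge(right, on="Module", how="left", suffixes=("_left", "_right"))

# Possible tidy-up: drop the duplicate column before joining,
# then use the nullable integer dtype so Term prints as 1 rather than 1.0
clean = left.merge(right.drop(columns="Code"), on="Module", how="left")
clean["Term"] = clean["Term"].astype("Int64")
```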
