**Author**: 

Abdelrahman Khairy Mahmoud

&emsp; M.Eng. in Mechanical and Industrial Engineering

&emsp; University of Toronto, Toronto, ON, Canada

**Project description**:

This project scrapes the curriculum of the Management Information Systems (MIS) program from the Bilgi University (Istanbul, Turkey) website.

**Notes**: (v1.2)


- All curriculum webpages observed follow essentially the same layout. A later version of this project is intended to obtain data pertaining to any course on the University website.
- Later versions will support mining the content of individual courses by following each course's link (its `href` attribute).

# Getting HTML and Creating a BeautifulSoup Object

In [35]:
# importing necessary libraries
import pandas as pd
import time
import requests as re
from bs4 import BeautifulSoup as bs

In [2]:
# browser header for requests object
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.46"}

# homepage for MIS curriculum
# headers must be passed by keyword; the second positional argument of get() is params
home = re.get('https://ects.bilgi.edu.tr/Department/Curriculum?catalog_departmentId=126632', headers=headers)
time.sleep(3) # courtesy pause; get() has already returned the complete response

# Create a BeautifulSoup object with the webpage's HTML
soup = bs(home.text, 'html.parser')
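
How `requests` treats the headers dictionary depends on how it is passed. The sketch below illustrates this offline by preparing two requests without sending them; the shortened `HEADERS` value and the variable names are placeholders chosen for this illustration only.

```python
import requests

URL = "https://ects.bilgi.edu.tr/Department/Curriculum?catalog_departmentId=126632"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder value

# Passed as headers=, the dictionary becomes HTTP request headers.
as_headers = requests.Request("GET", URL, headers=HEADERS).prepare()

# Passed as params= (which is what the second positional argument of
# requests.get maps to), the dictionary is encoded into the query string.
as_params = requests.Request("GET", URL, params=HEADERS).prepare()

print(as_headers.headers["User-Agent"])
print(as_params.url)
```

Nothing is sent over the network here; `prepare()` only builds the request so that its URL and headers can be inspected.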

# Data Extraction  
What we're looking for: (v1.2)  
- Course code (ABC####)  
- Course name (String)  
- Course status (core/elective)  
- Credits (Lec+Pra)  
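
The credits field arrives as a single string such as `3+2`. Assuming `Lec+Pra` stands for lecture plus practice hours, a small helper along these lines could split it into two integers if separate numeric columns are wanted later. The name `split_lec_pra` is hypothetical and this step is not part of the notebook.

```python
def split_lec_pra(value):
    """Split a 'Lec+Pra' string such as '3+2' into (lecture, practice) integers."""
    lecture, practice = value.strip().split("+")
    return int(lecture), int(practice)


print(split_lec_pra("3+2"))
print(split_lec_pra(" 4+0 "))
```

A value that does not contain exactly one `+` raises `ValueError`, which makes unexpected formats visible instead of silently producing wrong numbers.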

In [3]:
container = soup.find("div", {'class':"panel panel-default"})
container2 = container.find("div", {'class':'panel-body'})

# div tags contain table headings
divs = container2.find_all("div")
divs = [div.text for div in divs]

# tables contain the core curriculum content (incl. electives)
tables = container2.find_all("table")
tables = [table.find("tbody") for table in tables]
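
To make the heading and table pairing concrete, here is a minimal offline sketch of the same traversal. `SAMPLE_HTML` is a hypothetical, heavily reduced stand-in for the real page (only the nesting of `panel-body`, `div`, and `table` is taken from the code above), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, reduced markup mirroring the structure the notebook relies on
SAMPLE_HTML = """
<div class="panel panel-default">
  <div class="panel-body">
    <div>Level : 1 | Semester : 1 - 52. Management Information Systems</div>
    <table><tbody>
      <tr><td>BUS 179</td><td>Experiencing Business in Society I</td>
          <td>Core</td><td>3+0</td><td>6</td></tr>
    </tbody></table>
    <div>Level : 1 | Semester : 2 - 52. Management Information Systems</div>
    <table><tbody>
      <tr><td>MIS 102</td><td>Intermediate Programming</td>
          <td>Core</td><td>3+2</td><td>5</td></tr>
    </tbody></table>
  </div>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
body = sample_soup.find("div", class_="panel-body")

# Headings and tables appear in the same order, so they can be paired by position
headings = [div.get_text(strip=True) for div in body.find_all("div")]
# html.parser does not insert a missing <tbody>, so fall back to the table itself
bodies = [table.find("tbody") or table for table in body.find_all("table")]

for heading, tbody in zip(headings, bodies):
    first_code = tbody.find("td").get_text(strip=True)
    print(heading, "->", first_code)
```

The `or table` fallback is a defensive choice for pages whose tables omit `<tbody>`; whether the real page needs it is not confirmed here.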

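Each heading in `divs` has a fixed layout, so the level and semester digits sit at fixed character positions within the string:  
Level index = *8*  
Semester index = *23*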
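
Fixed character positions work only while every heading keeps exactly this layout and single-digit values. As a more tolerant alternative (not what this notebook uses), a regular expression could extract both numbers. The name `parse_heading` is hypothetical, and the standard `re` module is imported as `regex` here because the notebook already uses `re` as its alias for `requests`.

```python
import re as regex

# Matches headings such as 'Level : 2 | Semester : 1 - 52. Management ...'
HEADING_PATTERN = regex.compile(r"Level\s*:\s*(\d+)\s*\|\s*Semester\s*:\s*(\d+)")


def parse_heading(heading):
    """Return (level, semester) as integers, or None if the heading does not match."""
    match = HEADING_PATTERN.search(heading)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = 'Level : 2 | Semester : 1 - 52. Management Information Systems'
print(parse_heading(sample))
print(parse_heading('\nTotal ECTS : 240\n'))
```

Returning `None` for non-matching strings such as the `Total ECTS` entry lets the caller skip them explicitly.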

In [4]:
divs

['Level : 1 | Semester : 1 - 52. Management Information Systems',
 'Level : 1 | Semester : 2 - 52. Management Information Systems',
 'Level : 2 | Semester : 1 - 52. Management Information Systems',
 'Level : 2 | Semester : 2 - 52. Management Information Systems',
 'Level : 3 | Semester : 1 - 52. Management Information Systems',
 'Level : 3 | Semester : 2 - 52. Management Information Systems',
 'Level : 4 | Semester : 1 - 52. Management Information Systems',
 'Level : 4 | Semester : 2 - 52. Management Information Systems',
 '\nTotal ECTS : 240\n']

In [63]:
labels = ['course_code', 'course_name', 'status', 'lec_pra', 'ects', 'level', 'semester']
records = []  # one dict per course; DataFrame.append was removed in pandas 2.0
for index, table in enumerate(tables):
    for row in table.find_all("tr"):
        cells = [td.text.strip() for td in row.find_all("td")]  # parse the row's cells once
        try:
            records.append({'course_code': cells[0],
                            'course_name': cells[1],
                            'status': cells[2],
                            'lec_pra': cells[3],
                            'ects': cells[4],
                            'level': divs[index][8],       # level digit in the heading
                            'semester': divs[index][23]})  # semester digit in the heading
        except IndexError:
            continue  # skip rows or headings that lack the expected fields

# build the DataFrame once from the collected records
course_data = pd.DataFrame(records, columns=labels)

In [64]:
course_data.to_csv('Bilgi_MIS_Curriculum.csv')  # add the .csv extension so the file opens as CSV
course_data

| Unnamed: 0 | course_code | course_name | status | lec_pra | ects | level | semester |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | BUS 179 | Experiencing Business in Society I | Core | 3+0 | 6 | 1 | 1 |
| 1 | EC 101 | Introduction to Economics I | Core | 2+1 | 6 | 1 | 1 |
| 2 | BUS/E 179 | English for Academic Purposes I | Core | 4+0 | 3 | 1 | 1 |
| 3 | CMPE 130 | Algorithms and Programming | Core | 3+2 | 6 | 1 | 1 |
| 4 | MATH 175 | Calculus I | Core | 4+1 | 7 | 1 | 1 |
| 5 | TK 103 | Turkish Language I | Core | 2+0 | 2 | 1 | 1 |
| 6 | MIS 102 | Intermediate Programming | Core | 3+2 | 5 | 1 | 2 |
| 7 | BUS 120 | Office Applications for Business and Economics | Core | 2+0 | 1 | 1 | 2 |
| 8 | BUS 180 | Experiencing Business in Society II | Core | 3+0 | 6 | 1 | 2 |
| 9 | EC 102 | Introduction to Economics II | Core | 2+1 | 6 | 1 | 2 |
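
A column named `Unnamed: 0` is what pandas produces when a CSV that was saved together with its index is read back with default settings. The sketch below shows this and two ways to avoid it, using an in-memory buffer and a two-row sample frame instead of the real file; all variable names are illustrative.

```python
import io

import pandas as pd

sample = pd.DataFrame({"course_code": ["BUS 179", "EC 101"], "ects": [6, 6]})

# Default: the index is written, then read back as an unnamed data column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# Option 1: do not write the index at all
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = pd.read_csv(without_index).columns.tolist()

# Option 2: keep the index in the file and restore it when reading
with_index.seek(0)
cols_restored = pd.read_csv(with_index, index_col=0).columns.tolist()

print(cols_default)
print(cols_no_index)
print(cols_restored)
```

Which option fits depends on whether the row numbers carry any meaning; for a plain course list, writing with `index=False` is usually the simpler choice.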
