<font size="20"><ins>**Course Recommender System**</ins></font> 

<font size="5">July 28th 2024</font> 


# 1. Introduction

===============================================================================================================================================================================

**Background**:

New students often struggle to find a suitable course that matches their preferences and knowledge level.

**Objectives**:

Develop a recommender system that suggests suitable courses based on a student's preferences.

**Dataset Overview**

The dataset contains information about the courses available on Coursera, an online course platform. The platform offers courses across many learning categories that students can choose from.

The dataset is summarized below:


| Column Name      | Data Type | Description                                        |
| ---------------- | --------- | -------------------------------------------------- |
| Title            | Object    | Title of the course                                |
| Category         | Object    | Category of the course                             |
| Type             | Object    | Type of the course                                 |
| Level            | Object    | Level of the course                                |
| Description      | Object    | Detailed description of the course                 |
| Price            | Object    | Whether the course is free or requires enrollment  |
| Rating           | Float     | Rating given to the course                         |
| Language         | Object    | Course language                                    |
| Prerequisites    | Object    | Prerequisites needed before starting the course    |
| Syllabus         | Object    | Course syllabus                                    |
| Modules          | Object    | Course modules                                     |
| Instructor       | Object    | Course instructor                                  |
| Certificate type | Object    | Type of certificate provided                       |
| Association      | Object    | Institution or company associated with the course  |
| Image            | Object    | Course image                                       |
| URL              | Object    | Course URL                                         |
| Timestamp        | Date      | Timestamp at which the course record was taken     |

===============================================================================================================================================================================



# 2. Import Libraries

In [1]:
## Import libraries
from selenium import webdriver
import bs4
from bs4 import BeautifulSoup
import pandas as pd

# 3. Data Loading

In [2]:
# Import Raw Dataset
df_raw = pd.read_csv('webautomation_coursera.csv')

# Display First 5 rows from Dataset
df_raw.head()

Unnamed: 0,url,title,associated-university-institution-company,type,image,category-subject-area,certificate-is-available,description,duration,language,level,prerequisites,price,rating,syllabus,timestamp
0,https://www.coursera.org/specializations/netwo...,Networking in Google Cloud Specialization,Google Cloud,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Networking,Shareable Certificate,This specialization gives participants broad s...,Approximately 4 months to complete,English,Intermediate Level,-,free,4.8,-,2022-07-29T23:58:34Z
1,https://www.coursera.org/learn/2-speed-it,Two Speed IT: How Companies Can Surf the Digit...,CentraleSupélec,course,https://s3.amazonaws.com/coursera_assets/meta_...,Business Essentials,Shareable Certificate,"Transform or disappear, the Darwinism of IT: I...",Approx. 14 hours to complete,English,-,-,free,4.3,Introduction ~.~ Start here! ~.~ IT and the CI...,2022-07-29T23:58:34Z
2,https://www.coursera.org/learn/fundamentals-ne...,Fundamentals of Network Communication,University of Colorado System,course,https://s3.amazonaws.com/coursera_assets/meta_...,Computer Security and Networks,Shareable Certificate,"In this course, we trace the evolution of netw...",Approx. 15 hours to complete,English,Intermediate Level,-,free,4.6,Communication Networks and Services ~.~ This m...,2022-07-29T23:58:54Z
3,https://www.coursera.org/learn/ux-design-jobs,Design a User Experience for Social Good & Pre...,Google,course,https://s3.amazonaws.com/coursera_assets/meta_...,Design and Product,Shareable Certificate,Design a User Experience for Social Good and P...,Approx. 71 hours to complete,English,Beginner Level,-,free,4.8,"Starting the UX design process: empathize, def...",2022-07-29T23:59:20Z
4,https://www.coursera.org/learn/database-applic...,Building Database Applications in PHP,University of Michigan,course,https://s3.amazonaws.com/coursera_assets/meta_...,Mobile and Web Development,Shareable Certificate,"In this course, we'll look at the object orien...",Approx. 24 hours to complete,English,Intermediate Level,-,free,4.9,PHP Objects ~.~ We look at the object oriented...,2022-07-29T23:59:20Z


In [3]:
# Show a summary of df_raw with .info()
print('Data information: ', '\n')
df_raw.info()
print('Missing values: ', df_raw.isnull().sum().sum())  # Count null cells
print('Duplicated rows: ', df_raw.duplicated().sum())  # Count duplicated rows

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242 entries, 0 to 241
Data columns (total 16 columns):
 #   Column                                     Non-Null Count  Dtype 
---  ------                                     --------------  ----- 
 0   url                                        242 non-null    object
 1   title                                      242 non-null    object
 2   associated-university-institution-company  242 non-null    object
 3   type                                       242 non-null    object
 4   image                                      242 non-null    object
 5   category-subject-area                      242 non-null    object
 6   certificate-is-available                   242 non-null    object
 7   description                                242 non-null    object
 8   duration                                   242 non-null    object
 9   language                                   242 non-null    object
 10  level                                 

In [4]:
# Drop Duplicated rows from dataset
df_raw.drop_duplicates(inplace=True)

# Re-check the duplicated
df_raw.duplicated().sum()

0

**Insight**
- The dataset has 16 columns and 242 rows, with categorical (object) data in nearly all columns.
- The dataset has 5 duplicated rows and no null values.
- The dataset will then be enriched with module and instructor data scraped from the Coursera website, using the links in the `url` column.
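
The null and duplicate checks behind these insights can be sketched on a toy frame. This is a minimal illustration, not the notebook's data: the frame `toy` and its values are made up, and `reset_index(drop=True)` is added here only to show how to renumber rows after deduplication.

```python
import pandas as pd

# Toy frame standing in for the raw course table (values are made up).
toy = pd.DataFrame({
    "url": ["u1", "u2", "u2", "u3"],
    "title": ["A", "B", "B", None],
})

n_null = int(toy.isnull().sum().sum())   # total number of missing cells
n_dup = int(toy.duplicated().sum())      # rows that repeat an earlier row exactly

# Drop the repeats and renumber the remaining rows from 0.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_null, n_dup, len(deduped))
```

Note that `drop_duplicates()` on its own keeps the original row labels, so any later label-based operation still sees gaps where rows were removed.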

# 4. Web Scraping

In [5]:
# create driver
driver = webdriver.Chrome()

The chromedriver version (125.0.6422.78) detected in PATH at c:\Users\andre\Hactiv8\Phase 2\Final Project\chromedriver.exe might not be compatible with the detected chrome version (126.0.6478.185); currently, chromedriver 126.0.6478.182 is recommended for chrome 126.*, so it is advised to delete the driver in PATH and retry


In [458]:
# Create lists to hold the scraped values
list_titles = []
list_modules = []
list_instructors_name = []

# Pick the url
url = df_raw['url'][0]

# Open the website
driver.get(url)

# Extract the html
html = driver.page_source

# Parse the html
soup = BeautifulSoup(html, 'html.parser')

# 1. Get the course title
title = soup.find('h1')
list_titles.append(title.get_text())
list_titles = ';'.join(list_titles)

# 2. Get the module names, skipping headings that are not modules
modules = soup.find_all('h3', {'class':'cds-119'})
for module in modules:
    if module.get_text() not in ['Instructor','Offered by','More questions']:
        list_modules.append(module.get_text())

# Join all values into a single row
list_modules = ';'.join(list_modules)

# 3. Get the instructor names
instructor_name = soup.find_all('span', {'class':'css-6ecy9b'}, limit=4)
for name in instructor_name:
    list_instructors_name.append(name.get_text())

# Join all values into a single row
list_instructors_name = ';'.join(list_instructors_name)

# Combine everything into a dataframe
data = pd.DataFrame(({
    'Titles':pd.Series(list_titles),
    'Modules':pd.Series(list_modules),
    'Instructurs':pd.Series(list_instructors_name)}
))
data

Unnamed: 0,Titles,Modules,Instructurs
0,Networking in Google Cloud Specialization,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training;Google Cloud;How long do...
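
The parsing steps in the cell above can be gathered into one function that works on an HTML string, which makes them testable without a browser. This is a sketch under assumptions: `parse_course_page` and `NON_MODULE_HEADINGS` are hypothetical names, the sample HTML is hand-written, and `strip=True` is added to trim whitespace, which the original cell does not do. The class names `cds-119` and `css-6ecy9b` are copied from the notebook and may change whenever the site is redesigned.

```python
from bs4 import BeautifulSoup

# Headings that share the module heading class but are not module names
# (taken from the filter lists used in the notebook cells).
NON_MODULE_HEADINGS = {"Instructor", "Instructors", "Offered by", "More questions"}

def parse_course_page(html, heading_class="cds-119", instructor_class="css-6ecy9b"):
    """Return (title, modules, instructors); lists are joined with ';'."""
    soup = BeautifulSoup(html, "html.parser")

    h1 = soup.find("h1")
    title = h1.get_text(strip=True) if h1 is not None else None

    modules = [
        h3.get_text(strip=True)
        for h3 in soup.find_all("h3", class_=heading_class)
        if h3.get_text(strip=True) not in NON_MODULE_HEADINGS
    ]
    instructors = [
        span.get_text(strip=True)
        for span in soup.find_all("span", class_=instructor_class)
    ]
    return title, ";".join(modules), ";".join(instructors)

# Hand-written stand-in for a course page; real pages are far larger.
sample_html = """
<h1>Demo Course</h1>
<h3 class="cds-119">Module One</h3>
<h3 class="cds-119">Module Two</h3>
<h3 class="cds-119">Instructor</h3>
<span class="css-6ecy9b">Jane Doe</span>
"""
result = parse_course_page(sample_html)
print(result)
```

With Selenium, the same function could be called as `parse_course_page(driver.page_source)` after `driver.get(url)`.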


## 4.1 Manual Scraping

In [1671]:
# Create lists to hold the scraped values
list_titles = []
list_modules = []
list_instructors_name = []

# Pick the url (the row index is changed by hand on every run)
url = df_raw['url'][238]

# Open the website
driver.get(url)

# Extract the html
html = driver.page_source

# Parse the html
soup = BeautifulSoup(html, 'html.parser')

# 1. Get the course title
title = soup.find('h1')
list_titles.append(title.get_text())
list_titles = ','.join(list_titles)

# 2. Get the module names, skipping headings that are not modules
modules = soup.find_all('h3', {'class':'cds-119'}, limit=8)
for module in modules:
    if module.get_text() not in ['Instructor', 'Instructors','Offered by','More questions']:
        list_modules.append(module.get_text())

# Join all values into a single row
list_modules = ';'.join(list_modules)

# 3. Get the instructor names
instructor_name = soup.find_all('span', {'class':'css-6ecy9b'}, limit=2)
for name in instructor_name:
    list_instructors_name.append(name.get_text())

# Join all values into a single row
list_instructors_name = ';'.join(list_instructors_name)

new_rows = {'Titles': list_titles, 'Modules':list_modules, 'Instructurs':list_instructors_name}

data = data.append(new_rows, ignore_index=True)

data

  data = data.append(new_rows, ignore_index=True)


Unnamed: 0,Titles,Modules,Instructurs
0,Networking in Google Cloud Specialization,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training
1,Two Speed IT: How Companies Can Surf the Digit...,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson
2,Fundamentals of Network Communication,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System
3,Design a User Experience for Social Good & Pre...,Design for social good and strengthen your por...,Google Career Certificates
4,Building Database Applications in PHP,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance
...,...,...,...
234,Data Science: Foundations using R Specialization,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD"
235,IBM Data Science Professional Certificate,Get exclusive access to career resources upon ...,IBM Skills Network Team;Dr. Pooja;Abhishek Gag...
236,Data Science Specialization,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD;Jeff Leek,..."
237,Introduction to Physical Chemistry,Thermodynamics I;Thermodynamics II;Virtual Lab...,"Patrick J O'Malley, D.Sc;Michael W. Anderson, ..."
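
The manual cell grows `data` with `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. On a newer pandas the same step can be sketched with `pd.concat`. The values below are made up; only the column names are taken from the notebook.

```python
import pandas as pd

# Starting frame with the same column names as the scraped table.
data = pd.DataFrame({"Titles": ["Course A"], "Modules": ["M1;M2"], "Instructurs": ["X"]})

# One newly scraped row, held as a plain dict (values are made up).
new_rows = {"Titles": "Course B", "Modules": "M3", "Instructurs": "Y;Z"}

# Wrap the dict in a one-row DataFrame and concatenate;
# ignore_index=True renumbers the rows 0..n-1 as append(..., ignore_index=True) did.
data = pd.concat([data, pd.DataFrame([new_rows])], ignore_index=True)
print(data)
```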


In [417]:
data.to_csv('new_data.csv')

In [5]:
new_data = pd.read_csv('new_data.csv')

In [6]:
# Assign final dataset
df = df_raw 

# Insert column Modules and Instructurs 
df['modules'] = new_data['Modules']
df['instructor'] = new_data['Instructurs']

# Display final dataset
df

Unnamed: 0,url,title,associated-university-institution-company,type,image,category-subject-area,certificate-is-available,description,duration,language,level,prerequisites,price,rating,syllabus,timestamp,modules,instructor
0,https://www.coursera.org/specializations/netwo...,Networking in Google Cloud Specialization,Google Cloud,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Networking,Shareable Certificate,This specialization gives participants broad s...,Approximately 4 months to complete,English,Intermediate Level,-,free,4.8,-,2022-07-29T23:58:34Z,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training
1,https://www.coursera.org/learn/2-speed-it,Two Speed IT: How Companies Can Surf the Digit...,CentraleSupélec,course,https://s3.amazonaws.com/coursera_assets/meta_...,Business Essentials,Shareable Certificate,"Transform or disappear, the Darwinism of IT: I...",Approx. 14 hours to complete,English,-,-,free,4.3,Introduction ~.~ Start here! ~.~ IT and the CI...,2022-07-29T23:58:34Z,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson
2,https://www.coursera.org/learn/fundamentals-ne...,Fundamentals of Network Communication,University of Colorado System,course,https://s3.amazonaws.com/coursera_assets/meta_...,Computer Security and Networks,Shareable Certificate,"In this course, we trace the evolution of netw...",Approx. 15 hours to complete,English,Intermediate Level,-,free,4.6,Communication Networks and Services ~.~ This m...,2022-07-29T23:58:54Z,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System
3,https://www.coursera.org/learn/ux-design-jobs,Design a User Experience for Social Good & Pre...,Google,course,https://s3.amazonaws.com/coursera_assets/meta_...,Design and Product,Shareable Certificate,Design a User Experience for Social Good and P...,Approx. 71 hours to complete,English,Beginner Level,-,free,4.8,"Starting the UX design process: empathize, def...",2022-07-29T23:59:20Z,Design for social good and strengthen your por...,Google Career Certificates
4,https://www.coursera.org/learn/database-applic...,Building Database Applications in PHP,University of Michigan,course,https://s3.amazonaws.com/coursera_assets/meta_...,Mobile and Web Development,Shareable Certificate,"In this course, we'll look at the object orien...",Approx. 24 hours to complete,English,Intermediate Level,-,free,4.9,PHP Objects ~.~ We look at the object oriented...,2022-07-29T23:59:20Z,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234,https://www.coursera.org/specializations/data-...,Data Science: Foundations using R Specialization,-,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,"Ask the right questions, manipulate data sets,...",Approximately 5 months to complete,English,Beginner Level,-,free,4.6,-,2022-07-30T00:45:32Z,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD"
235,https://www.coursera.org/professional-certific...,IBM Data Science Professional Certificate,IBM Skills Network,professional certificates,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,Data science is one of the hottest professions...,Approximately 11 months to complete,English,Beginner Level,-,free,4.6,-,2022-07-30T00:45:32Z,Get exclusive access to career resources upon ...,IBM Skills Network Team;Dr. Pooja;Abhishek Gag...
236,https://www.coursera.org/specializations/jhu-d...,Data Science Specialization,-,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,"Ask the right questions, manipulate data sets,...",Approximately 11 months to complete,English,Beginner Level,-,free,4.5,-,2022-07-30T00:45:32Z,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD;Jeff Leek,..."
237,https://www.coursera.org/learn/physical-chemistry,Introduction to Physical Chemistry,University of Manchester,course,https://s3.amazonaws.com/coursera_assets/meta_...,Chemistry,Shareable Certificate,Chemical reactions underpin the production of ...,Approx. 19 hours to complete,English,-,-,free,4.7,Thermodynamics I ~.~ This module explores ther...,2022-07-30T00:45:41Z,Thermodynamics I;Thermodynamics II;Virtual Lab...,"Patrick J O'Malley, D.Sc;Michael W. Anderson, ..."
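
One detail worth checking in the merge above: assigning a Series to a new column aligns on index labels, not on row position. Because `df_raw` had duplicates dropped without an index reset while `new_data` was read back from CSV with a fresh index, the two may not line up row for row. The sketch below shows the effect on toy frames (`left` and `right` are hypothetical stand-ins); whether it affects the notebook depends on how the scraped rows were ordered.

```python
import pandas as pd

# Left frame keeps its original labels after a row was dropped (no reset_index).
left = pd.DataFrame({"url": ["u0", "u1", "u3"]}, index=[0, 1, 3])

# Right frame read back from CSV gets a fresh 0..n-1 index.
right = pd.DataFrame({"Modules": ["m0", "m1", "m3"]})

# Column assignment aligns on index labels: label 3 has no match in `right`.
by_label = left.copy()
by_label["modules"] = right["Modules"]

# Positional alternative: give the left side the same 0..n-1 index first.
by_position = left.reset_index(drop=True)
by_position["modules"] = right["Modules"]

print(by_label)
print(by_position)
```

Using `left.copy()` also avoids the aliasing in `df = df_raw`, where both names refer to the same object and new columns appear on both.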


**Insight**
- Several URLs are inactive, namely those in rows 16, 27, 30, 31, 32, 33, 38, 39, 40, 41, 47, 53, 54, 74, 75, 76, 77, 85, 96, 102, 121, 140, 164, 170, 171, 172, 173, 184, 187, 195, 196, 197, 198, 211, 212, 233.
- One row (row 32) contains Arabic script.
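
A list of inactive rows like the one above could also be collected automatically while scraping. The sketch below is an assumption about how that might look, not the method used in this notebook: `scrape_all`, `fetch_html` and `parse_page` are hypothetical names, and "inactive" is taken to mean that fetching or parsing raised an error or no title was found. The fake fetcher and parser stand in for Selenium and BeautifulSoup so the example runs without a browser.

```python
def scrape_all(urls, fetch_html, parse_page):
    """Scrape every URL; return (rows, failed_positions)."""
    rows, failed = [], []
    for position, url in enumerate(urls):
        try:
            title, modules, instructors = parse_page(fetch_html(url))
        except Exception:  # network error, missing element, and so on
            failed.append(position)
            continue
        if title is None:  # page loaded but has no course heading
            failed.append(position)
            continue
        rows.append({"Titles": title, "Modules": modules, "Instructurs": instructors})
    return rows, failed

# Stand-ins so the sketch runs without a browser.
fake_pages = {"u0": "A|m1;m2|x", "u2": "C|m3|y"}

def fake_fetch(url):
    return fake_pages[url]  # raises KeyError for an "inactive" URL

def fake_parse(page):
    title, modules, instructors = page.split("|")
    return title, modules, instructors

rows, failed = scrape_all(["u0", "u1", "u2"], fake_fetch, fake_parse)
print(failed)
```

The returned positions count from 0 in the order the URLs were passed in, so they match the DataFrame's row labels only if the index is a plain 0..n-1 range.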

## 4.2 Function to Retrieve All Data

In [105]:
# Create lists to hold the scraped values
list_titles = []
list_modules = []
list_instructors_name = []

# Pick the url
url = df_raw['url'][0]

# Open the website
driver.get(url)

# Extract the html
html = driver.page_source

# Parse the html
soup = BeautifulSoup(html, 'html.parser')

# 1. Get the course title
title = soup.find('h1')
list_titles.append(title.get_text())
list_titles = ','.join(list_titles)

# 2. Get the module names
modules = soup.find_all('h3', {'class':'cds-119'})
for module in modules:
    list_modules.append(module.get_text())

# Join all values into a single row
list_modules = ','.join(list_modules)

# 3. Get the instructor names
instructor_name = soup.find_all('div', {'class':'css-1f454bp'})
for name in instructor_name:
    list_instructors_name.append(name.get_text())

# Join all values into a single row
list_instructors_name = ','.join(list_instructors_name)

# Combine everything into a dataframe
df2 = pd.DataFrame(({
    'Titles':pd.Series(list_titles),
    'Modules':pd.Series(list_modules),
    'Instructurs':pd.Series(list_instructors_name)}
))
df2

NameError: name 'driver' is not defined

In [20]:
def get_data(data1, data2):

    # Create lists to hold the scraped values
    list_titles = []
    list_modules = []
    list_instructors_name = []

    # Counter used to step through the urls
    x = 0

    for i in data1['url'][0:3]:
        for y in data2:

            # Pick the url
            url = df_raw['url'][x]

            # Open the website
            driver.get(url)

            # Extract the html
            html = driver.page_source

            # Parse the html
            soup = BeautifulSoup(html, 'html.parser')

            # 1. Get the course title
            title = soup.find('h1')
            list_titles.append(title.get_text())

            # 2. Get the module names
            modules = soup.find_all('h3', {'class':'cds-119'})
            for module in modules:
                list_modules.append(module.get_text())

            # 3. Get the instructor names
            instructor_name = soup.find_all('div', {'class':'css-1f454bp'})
            for name in instructor_name:
                list_instructors_name.append(name.get_text())

            # Join all values into a single row
            list_instructors_name = ','.join(list_instructors_name)

            new_rows = {'Titles': list_titles, 'Modules':list_modules, 'Instructurs':list_instructors_name}
            data2 = data2.append(new_rows, ignore_index=True)

            list_titles = []
            list_modules = []
            list_instructors_name = []
            
            x += 1
            

            # df['Titles'] = pd.Series(list_titles)
            # df['Modules'] = pd.Series(list_modules)
            
            
        #     # new_rows = {'Titles': list_titles, 'Modules':list_modules}
        # df = df.append(list_titles, ignore_index=True)
        # df = df.append(list_modules, ignore_index=True)


In [21]:
get_data(df_raw, df2)

  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)
  data2 = data2.append(new_rows, ignore_index=True)


In [22]:
data2

Unnamed: 0,Titles,Modules,Instructurs
0,Networking in Google Cloud Specialization,Google Cloud Fundamentals: Core Infrastructure...,"Google Cloud TrainingGoogle Cloud1,480 Courses..."


In [176]:
# Create lists to hold the scraped values
list_titles = []
list_modules = []

def get_data(data):

    # Create the dataframe
    df = pd.DataFrame(({
        'Titles':pd.Series(),
        'Modules':pd.Series()}))

    # Counter used to step through the urls
    x = 0

    for i in data2['url'][0:10]:
        # Pick the url
        url = df_raw['url'][x]

        # Open the website
        driver.get(url)

        # Extract the html
        html = driver.page_source

        # Parse the html
        soup = BeautifulSoup(html, 'html.parser')

        # 1. Get the course title
        title = soup.find('h1')
        list_titles.append(title.get_text())

        # 2. Get the module names
        modules = soup.find_all('h3', {'class':'cds-119'})
        for module in modules:
            list_modules.append(module.get_text())
        x += 1
        

        # df['Titles'] = pd.Series(list_titles)
        # df['Modules'] = pd.Series(list_modules)
        
        
    #     # new_rows = {'Titles': list_titles, 'Modules':list_modules}
    # df = df.append(list_titles, ignore_index=True)
    # df = df.append(list_modules, ignore_index=True)


In [180]:
# Create lists to hold the scraped values
list_titles = []
list_modules = []
list_instructors_name = []

def get_data(data):

    # Create the dataframe
    df = pd.DataFrame(({
        'Titles':pd.Series(),
        'Modules':pd.Series()}))

    # Counter used to step through the urls
    x = 0

    for i in data['url'][0:10]:
        # Pick the url
        url = df_raw['url'][x]

        # Open the website
        driver.get(url)

        # Extract the html
        html = driver.page_source

        # Parse the html
        soup = BeautifulSoup(html, 'html.parser')

        # 1. Get the course title
        title = soup.find('h1')
        list_titles.append(title.get_text())

        # 2. Get the module names
        modules = soup.find_all('h3', {'class':'cds-119'})
        for module in modules:
            list_modules.append(module.get_text())

        # 3. Get the instructor names
        instructor_name = soup.find_all('div', {'class':'css-1f454bp'})
        for name in instructor_name:
            list_instructors_name.append(name.get_text())
        x += 1
        

        # df['Titles'] = pd.Series(list_titles)
        # df['Modules'] = pd.Series(list_modules)
           
        # new_rows = {'Titles': list_titles, 'Modules':list_modules}
        # df = df.append(new_rows, ignore_index=True)
    df = df.append(list_titles, ignore_index=True)
    df = df.append(list_modules, ignore_index=True)
    df = df.append(list_instructors_name, ignore_index=True)


In [181]:
get_data(df_raw)

  'Titles':pd.Series(),
  'Modules':pd.Series()}))
  df = df.append(list_titles, ignore_index=True)
  df = df.append(list_modules, ignore_index=True)
  df = df.append(list_instructors_name, ignore_index=True)


In [182]:
df

Unnamed: 0,Titles,Modules
0,Two Speed IT: How Companies Can Surf the Digit...,"Introduction,IT and the CIO in the Digital Wor..."
1,Two Speed IT: How Companies Can Surf the Digit...,"Introduction,IT and the CIO in the Digital Wor..."
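
The attempts in this section share accumulator lists across pages, so modules from different courses end up in one flat list. One way to keep rows separate, sketched here under assumptions, is to build one dict per page and create the DataFrame once after the loop. `parsed_pages` holds made-up results in place of real scraping; only the column names come from the notebook.

```python
import pandas as pd

# Made-up parsed results for three pages: (title, module names, instructor names).
parsed_pages = [
    ("Course A", ["M1", "M2"], ["X"]),
    ("Course B", ["M3"], ["Y", "Z"]),
    ("Course C", [], []),
]

records = []
for title, modules, instructors in parsed_pages:
    # A fresh dict per page keeps one page's modules out of the next row.
    records.append({
        "Titles": title,
        "Modules": ";".join(modules),
        "Instructurs": ";".join(instructors),
    })

# Build the frame once, after the loop, instead of appending inside it.
scraped = pd.DataFrame(records, columns=["Titles", "Modules", "Instructurs"])
print(scraped)
```

A function wrapping this loop would also need to `return scraped`, since a DataFrame created inside a function is not visible outside it otherwise.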


In [173]:
list_titles

['Networking in Google Cloud Specialization',
 'Two Speed IT: How Companies Can Surf the Digital Wave, a BCG Perspective',
 'Fundamentals of Network Communication',
 'Design a User Experience for Social Good & Prepare for Jobs',
 'Building Database Applications in PHP',
 'Web Design: Wireframes to Prototypes',
 'Build Wireframes and Low-Fidelity Prototypes',
 'Introduction to C# Programming and Unity',
 'Virtual Reality Specialization',
 'C++ Programming for Unreal Game Development Specialization']

In [174]:
list_modules

['Google Cloud Fundamentals: Core Infrastructure',
 'Networking in Google Cloud: Defining and Implementing Networks',
 'Networking in Google Cloud: Hybrid Connectivity and Network Management',
 'Instructor',
 'Offered by',
 'More questions',
 'Introduction',
 'IT and the CIO in the Digital World',
 'Steer the Balance Sheet',
 'Market and Sell Products',
 'Run the Factories',
 'Manage Human Resources',
 'Transform or Disappear',
 'Instructors',
 'Offered by',
 "Recommended if you're interested in Business Essentials",
 'More questions',
 'Communication Networks and Services',
 'Layered Architectures ',
 'Socket API & Digital Transmissions',
 'Error Control',
 'course project - fundamentals of network communication',
 'Instructor',
 'Offered by',
 "Recommended if you're interested in Computer Security and Networks",
 'More questions',
 'Design for social good and strengthen your portfolio',
 'Build a professional presence',
 'Finding a UX job',
 'Instructor',
 'Offered by',
 "Recommended

In [175]:
list_instructors_name

['Google Cloud TrainingGoogle Cloud1,480 Courses•2,614,781 learners',
 'Google CloudLearn moreCloseOffered byGoogle CloudWe help millions of organizations empower their employees, serve their customers, and build what’s next for their businesses with innovative technology created in—and for—the cloud. Our products are engineered for security, reliability, and scalability, running the full stack from infrastructure to applications to devices and hardware. Our teams are dedicated to helping customers apply our technologies to create success.OK',
 'Antoine GourévitchCentraleSupélec1 Course•21,696 learners',
 'Vanessa LyonCentraleSupélec1 Course•21,696 learners',
 'Eric BaudsonCentraleSupélec1 Course•21,696 learners',
 'CentraleSupélecLearn moreCloseOffered byCentraleSupélecCentraleSupélec is the result of the merger of the Ecole Centrale Paris and the Supélec. The collaboration between the two Colleges of engineering begun in 1969 with the introduction of the joint competitive entrance ex
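
The fused strings in `list_instructors_name` (name, institution and course count run together) come from calling `get_text()` on a container that holds several child elements. A minimal sketch of one workaround is to pass a separator and split on it. The card HTML below is hand-written and its inner structure is an assumption; only the class name `css-1f454bp` is taken from the notebook.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one instructor card.
card_html = """
<div class="css-1f454bp">
  <span>Jane Doe</span><span>Example University</span><span>1 Course</span>
</div>
"""
card = BeautifulSoup(card_html, "html.parser").find("div", class_="css-1f454bp")

# Without a separator, the child texts are concatenated directly.
fused = card.get_text(strip=True)

# With a separator, each child text can be recovered by splitting.
parts = card.get_text(separator="|", strip=True).split("|")
name = parts[0]
print(fused)
print(parts)
```

Selecting a narrower element, as the earlier cells do with `span.css-6ecy9b`, avoids the problem at the source when such an element exists.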

# 5. Data Cleaning

In [7]:
# Displaying final dataset
df

Unnamed: 0,url,title,associated-university-institution-company,type,image,category-subject-area,certificate-is-available,description,duration,language,level,prerequisites,price,rating,syllabus,timestamp,modules,instructor
0,https://www.coursera.org/specializations/netwo...,Networking in Google Cloud Specialization,Google Cloud,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Networking,Shareable Certificate,This specialization gives participants broad s...,Approximately 4 months to complete,English,Intermediate Level,-,free,4.8,-,2022-07-29T23:58:34Z,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training
1,https://www.coursera.org/learn/2-speed-it,Two Speed IT: How Companies Can Surf the Digit...,CentraleSupélec,course,https://s3.amazonaws.com/coursera_assets/meta_...,Business Essentials,Shareable Certificate,"Transform or disappear, the Darwinism of IT: I...",Approx. 14 hours to complete,English,-,-,free,4.3,Introduction ~.~ Start here! ~.~ IT and the CI...,2022-07-29T23:58:34Z,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson
2,https://www.coursera.org/learn/fundamentals-ne...,Fundamentals of Network Communication,University of Colorado System,course,https://s3.amazonaws.com/coursera_assets/meta_...,Computer Security and Networks,Shareable Certificate,"In this course, we trace the evolution of netw...",Approx. 15 hours to complete,English,Intermediate Level,-,free,4.6,Communication Networks and Services ~.~ This m...,2022-07-29T23:58:54Z,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System
3,https://www.coursera.org/learn/ux-design-jobs,Design a User Experience for Social Good & Pre...,Google,course,https://s3.amazonaws.com/coursera_assets/meta_...,Design and Product,Shareable Certificate,Design a User Experience for Social Good and P...,Approx. 71 hours to complete,English,Beginner Level,-,free,4.8,"Starting the UX design process: empathize, def...",2022-07-29T23:59:20Z,Design for social good and strengthen your por...,Google Career Certificates
4,https://www.coursera.org/learn/database-applic...,Building Database Applications in PHP,University of Michigan,course,https://s3.amazonaws.com/coursera_assets/meta_...,Mobile and Web Development,Shareable Certificate,"In this course, we'll look at the object orien...",Approx. 24 hours to complete,English,Intermediate Level,-,free,4.9,PHP Objects ~.~ We look at the object oriented...,2022-07-29T23:59:20Z,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234,https://www.coursera.org/specializations/data-...,Data Science: Foundations using R Specialization,-,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,"Ask the right questions, manipulate data sets,...",Approximately 5 months to complete,English,Beginner Level,-,free,4.6,-,2022-07-30T00:45:32Z,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD"
235,https://www.coursera.org/professional-certific...,IBM Data Science Professional Certificate,IBM Skills Network,professional certificates,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,Data science is one of the hottest professions...,Approximately 11 months to complete,English,Beginner Level,-,free,4.6,-,2022-07-30T00:45:32Z,Get exclusive access to career resources upon ...,IBM Skills Network Team;Dr. Pooja;Abhishek Gag...
236,https://www.coursera.org/specializations/jhu-d...,Data Science Specialization,-,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Data Analysis,Shareable Certificate,"Ask the right questions, manipulate data sets,...",Approximately 11 months to complete,English,Beginner Level,-,free,4.5,-,2022-07-30T00:45:32Z,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD;Jeff Leek,..."
237,https://www.coursera.org/learn/physical-chemistry,Introduction to Physical Chemistry,University of Manchester,course,https://s3.amazonaws.com/coursera_assets/meta_...,Chemistry,Shareable Certificate,Chemical reactions underpin the production of ...,Approx. 19 hours to complete,English,-,-,free,4.7,Thermodynamics I ~.~ This module explores ther...,2022-07-30T00:45:41Z,Thermodynamics I;Thermodynamics II;Virtual Lab...,"Patrick J O'Malley, D.Sc;Michael W. Anderson, ..."


## 5.1 Drop inactive rows

In [10]:
# Drop inactive url rows
df.drop(index=[16, 27, 30, 31, 32, 33, 38, 39, 40, 41, 47, 53, 54, 74, 
               75, 76, 77, 85, 96, 102, 121, 140, 164, 170, 171, 172, 
               173, 184, 187, 195, 196, 197, 198, 211], 
               inplace=True)

# Reset index and discard the old labels
df.reset_index(drop=True, inplace=True)

In [67]:
# Drop another row
df.drop(index=27, inplace=True)

# Reset index and discard the old labels
df.reset_index(drop=True, inplace=True)
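
Because the index is renumbered after the first drop, a label used in a second drop refers to the new numbering, not to the original row numbers. The toy frame below (`df_demo`, with made-up values) sketches that shift and uses `reset_index(drop=True)` to renumber in one call.

```python
import pandas as pd

df_demo = pd.DataFrame({"title": list("abcdef")})

# First pass: drop by label, then renumber 0..n-1.
df_demo = df_demo.drop(index=[1, 3]).reset_index(drop=True)
after_first = df_demo["title"].tolist()

# Second pass: label 2 now points at "e", which had label 4 originally.
df_demo = df_demo.drop(index=2).reset_index(drop=True)
after_second = df_demo["title"].tolist()
print(after_first, after_second)
```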

## 5.1 Rename Columns

In [11]:
# Rename several columns
df = df.rename(columns={
                        'associated-university-institution-company': 'association',
                        'category-subject-area': 'category',
                        'certificate-is-available': 'certificate_type'
                        })

In [12]:
df.head()

Unnamed: 0,url,title,association,type,image,category,certificate_type,description,duration,language,level,prerequisites,price,rating,syllabus,timestamp,modules,instructor
0,https://www.coursera.org/specializations/netwo...,Networking in Google Cloud Specialization,Google Cloud,specializations,https://s3.amazonaws.com/coursera_assets/meta_...,Networking,Shareable Certificate,This specialization gives participants broad s...,Approximately 4 months to complete,English,Intermediate Level,-,free,4.8,-,2022-07-29T23:58:34Z,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training
1,https://www.coursera.org/learn/2-speed-it,Two Speed IT: How Companies Can Surf the Digit...,CentraleSupélec,course,https://s3.amazonaws.com/coursera_assets/meta_...,Business Essentials,Shareable Certificate,"Transform or disappear, the Darwinism of IT: I...",Approx. 14 hours to complete,English,-,-,free,4.3,Introduction ~.~ Start here! ~.~ IT and the CI...,2022-07-29T23:58:34Z,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson
2,https://www.coursera.org/learn/fundamentals-ne...,Fundamentals of Network Communication,University of Colorado System,course,https://s3.amazonaws.com/coursera_assets/meta_...,Computer Security and Networks,Shareable Certificate,"In this course, we trace the evolution of netw...",Approx. 15 hours to complete,English,Intermediate Level,-,free,4.6,Communication Networks and Services ~.~ This m...,2022-07-29T23:58:54Z,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System
3,https://www.coursera.org/learn/ux-design-jobs,Design a User Experience for Social Good & Pre...,Google,course,https://s3.amazonaws.com/coursera_assets/meta_...,Design and Product,Shareable Certificate,Design a User Experience for Social Good and P...,Approx. 71 hours to complete,English,Beginner Level,-,free,4.8,"Starting the UX design process: empathize, def...",2022-07-29T23:59:20Z,Design for social good and strengthen your por...,Google Career Certificates
4,https://www.coursera.org/learn/database-applic...,Building Database Applications in PHP,University of Michigan,course,https://s3.amazonaws.com/coursera_assets/meta_...,Mobile and Web Development,Shareable Certificate,"In this course, we'll look at the object orien...",Approx. 24 hours to complete,English,Intermediate Level,-,free,4.9,PHP Objects ~.~ We look at the object oriented...,2022-07-29T23:59:20Z,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance


## 5.3 Re-arrange Columns

In [13]:
# Re-arrange the column order
df = df[['title', 'category', 'type', 'level', 'description', 'price',
         'rating', 'duration', 'language', 'prerequisites', 'syllabus',
         'modules', 'instructor', 'certificate_type', 'association',
         'image', 'url', 'timestamp']]

# Displaying dataset
df

Unnamed: 0,title,category,type,level,description,price,rating,duration,language,prerequisites,syllabus,modules,instructor,certificate_type,association,image,url,timestamp
0,Networking in Google Cloud Specialization,Networking,specializations,Intermediate Level,This specialization gives participants broad s...,free,4.8,Approximately 4 months to complete,English,-,-,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training,Shareable Certificate,Google Cloud,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/netwo...,2022-07-29T23:58:34Z
1,Two Speed IT: How Companies Can Surf the Digit...,Business Essentials,course,-,"Transform or disappear, the Darwinism of IT: I...",free,4.3,Approx. 14 hours to complete,English,-,Introduction ~.~ Start here! ~.~ IT and the CI...,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson,Shareable Certificate,CentraleSupélec,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/2-speed-it,2022-07-29T23:58:34Z
2,Fundamentals of Network Communication,Computer Security and Networks,course,Intermediate Level,"In this course, we trace the evolution of netw...",free,4.6,Approx. 15 hours to complete,English,-,Communication Networks and Services ~.~ This m...,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System,Shareable Certificate,University of Colorado System,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/fundamentals-ne...,2022-07-29T23:58:54Z
3,Design a User Experience for Social Good & Pre...,Design and Product,course,Beginner Level,Design a User Experience for Social Good and P...,free,4.8,Approx. 71 hours to complete,English,-,"Starting the UX design process: empathize, def...",Design for social good and strengthen your por...,Google Career Certificates,Shareable Certificate,Google,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/ux-design-jobs,2022-07-29T23:59:20Z
4,Building Database Applications in PHP,Mobile and Web Development,course,Intermediate Level,"In this course, we'll look at the object orien...",free,4.9,Approx. 24 hours to complete,English,-,PHP Objects ~.~ We look at the object oriented...,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance,Shareable Certificate,University of Michigan,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/database-applic...,2022-07-29T23:59:20Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,Data Science: Foundations using R Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.6,Approximately 5 months to complete,English,-,-,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD",Shareable Certificate,-,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/data-...,2022-07-30T00:45:32Z
199,IBM Data Science Professional Certificate,Data Analysis,professional certificates,Beginner Level,Data science is one of the hottest professions...,free,4.6,Approximately 11 months to complete,English,-,-,Get exclusive access to career resources upon ...,IBM Skills Network Team;Dr. Pooja;Abhishek Gag...,Shareable Certificate,IBM Skills Network,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/professional-certific...,2022-07-30T00:45:32Z
200,Data Science Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.5,Approximately 11 months to complete,English,-,-,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD;Jeff Leek,...",Shareable Certificate,-,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/jhu-d...,2022-07-30T00:45:32Z
201,Introduction to Physical Chemistry,Chemistry,course,-,Chemical reactions underpin the production of ...,free,4.7,Approx. 19 hours to complete,English,-,Thermodynamics I ~.~ This module explores ther...,Thermodynamics I;Thermodynamics II;Virtual Lab...,"Patrick J O'Malley, D.Sc;Michael W. Anderson, ...",Shareable Certificate,University of Manchester,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/physical-chemistry,2022-07-30T00:45:41Z


## 5.4 Handling missing values

In [14]:
# Handle Missing Not at Random (MNAR) values: assume a missing level means Beginner Level
df['level'] = df['level'].replace('-', 'Beginner Level')

# Handle Missing Not at Random (MNAR) values: assume a course with a missing price is free
df['price'] = df['price'].replace('-', 'free')

# Handle Missing Not at Random (MNAR) values: assume a missing rating takes a standard rating of 4
df['rating'] = df['rating'].replace('-', 4)

# Handle Missing Not at Random (MNAR) values: assume a missing duration takes the average duration of 8 hours
df['duration'] = df['duration'].replace('-', 'Approx. 8 hours to complete')

# Handle Missing Not at Random (MNAR) values: assume a missing language is English
df['language'] = df['language'].replace('-', 'English')

# Handle Missing Not at Random (MNAR) values: treat missing prerequisites as "no prerequisites needed"
df['prerequisites'] = df['prerequisites'].replace('-', 'no prerequisites needed')

# Handle Missing Not at Random (MNAR) values: treat a missing certificate type as "Non Shareable Certificate"
df['certificate_type'] = df['certificate_type'].replace('-', 'Non Shareable Certificate')

# Handle Missing Not at Random (MNAR) values: treat a missing association as "General"
df['association'] = df['association'].replace('-', 'General')

# Handle Missing Not at Random (MNAR) values: treat a missing syllabus as "No specify data"
df['syllabus'] = df['syllabus'].replace('-', 'No specify data')
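
The per-column replacements above can also be driven from a single mapping, which keeps every imputation assumption in one place. A minimal sketch on a toy frame, assuming only three of the columns; the names `sample` and `placeholder_fills` are hypothetical and not part of the original notebook:

```python
import pandas as pd

# Toy frame standing in for the scraped data; '-' marks a missing value.
# dtype=object mirrors how mixed scraped text columns are typically loaded.
sample = pd.DataFrame({
    'level': ['Intermediate Level', '-'],
    'price': ['free', '-'],
    'rating': ['4.8', '-'],
}, dtype=object)

# Hypothetical mapping: column name -> value assumed for the '-' placeholder
placeholder_fills = {
    'level': 'Beginner Level',
    'price': 'free',
    'rating': 4,
}

for column, fill_value in placeholder_fills.items():
    sample[column] = sample[column].replace('-', fill_value)

# Ratings can be cast to float only after the placeholder is gone
sample['rating'] = sample['rating'].astype('float64')

print(sample)
```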

In [15]:
# Check whether '-' is still a column label (`in` on a DataFrame tests column names, not cell values)
'-' in df

False
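
Since `in` on a DataFrame only inspects column labels, a `False` here does not by itself show that every `-` placeholder is gone from the cells. A minimal sketch of an element-wise check on a toy frame (the `sample` data is an illustrative assumption):

```python
import pandas as pd

# Toy frame with one placeholder left in the 'level' column
sample = pd.DataFrame({
    'level': ['Beginner Level', '-'],
    'rating': [4.8, 4.0],
})

# Membership on a DataFrame looks at column labels only
label_check = '-' in sample

# Element-wise comparison counts the placeholders left in each column
remaining = sample.eq('-').sum()
has_placeholder = bool(remaining.any())

print(label_check)
print(remaining)
print(has_placeholder)
```

Running the same comparison on `df` would list, per column, how many `-` values remain, including in columns that were not imputed above.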

## 5.5 Change datatype

In [16]:
# Change datatype
df['rating'] = df['rating'].astype('float64')
df['timestamp'] = df['timestamp'].astype('datetime64[ns]')

# Re-check the data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 203 entries, 0 to 202
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   title             203 non-null    object        
 1   category          203 non-null    object        
 2   type              203 non-null    object        
 3   level             203 non-null    object        
 4   description       203 non-null    object        
 5   price             203 non-null    object        
 6   rating            203 non-null    float64       
 7   duration          203 non-null    object        
 8   language          203 non-null    object        
 9   prerequisites     203 non-null    object        
 10  syllabus          203 non-null    object        
 11  modules           203 non-null    object        
 12  instructor        203 non-null    object        
 13  certificate_type  203 non-null    object        
 14  association       203 non-
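
The scraped `timestamp` strings end in `Z`, which marks UTC. How `astype('datetime64[ns]')` treats that suffix may vary between pandas versions, so one alternative is to parse explicitly with `pd.to_datetime` and then drop the timezone. A minimal sketch on two sample values taken from the tables above (the names `raw` and `parsed` are illustrative):

```python
import pandas as pd

# Two ISO 8601 timestamps in the same format as the scraped column
raw = pd.Series(['2022-07-29T23:58:34Z', '2022-07-30T00:45:41Z'])

# Parse as UTC, then remove the timezone to obtain naive timestamps
parsed = pd.to_datetime(raw, utc=True).dt.tz_localize(None)

print(parsed)
```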

# 6. Export Final Dataset

In [17]:
# Display Final Dataset
df

Unnamed: 0,title,category,type,level,description,price,rating,duration,language,prerequisites,syllabus,modules,instructor,certificate_type,association,image,url,timestamp
0,Networking in Google Cloud Specialization,Networking,specializations,Intermediate Level,This specialization gives participants broad s...,free,4.8,Approximately 4 months to complete,English,no prerequisites needed,No specify data,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training,Shareable Certificate,Google Cloud,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/netwo...,2022-07-29 23:58:34
1,Two Speed IT: How Companies Can Surf the Digit...,Business Essentials,course,Beginner Level,"Transform or disappear, the Darwinism of IT: I...",free,4.3,Approx. 14 hours to complete,English,no prerequisites needed,Introduction ~.~ Start here! ~.~ IT and the CI...,Introduction;IT and the CIO in the Digital Wor...,Antoine Gourévitch;Vanessa Lyon;Eric Baudson,Shareable Certificate,CentraleSupélec,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/2-speed-it,2022-07-29 23:58:34
2,Fundamentals of Network Communication,Computer Security and Networks,course,Intermediate Level,"In this course, we trace the evolution of netw...",free,4.6,Approx. 15 hours to complete,English,no prerequisites needed,Communication Networks and Services ~.~ This m...,Communication Networks and Services;Layered Ar...,Xiaobo Zhou;University of Colorado System,Shareable Certificate,University of Colorado System,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/fundamentals-ne...,2022-07-29 23:58:54
3,Design a User Experience for Social Good & Pre...,Design and Product,course,Beginner Level,Design a User Experience for Social Good and P...,free,4.8,Approx. 71 hours to complete,English,no prerequisites needed,"Starting the UX design process: empathize, def...",Design for social good and strengthen your por...,Google Career Certificates,Shareable Certificate,Google,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/ux-design-jobs,2022-07-29 23:59:20
4,Building Database Applications in PHP,Mobile and Web Development,course,Intermediate Level,"In this course, we'll look at the object orien...",free,4.9,Approx. 24 hours to complete,English,no prerequisites needed,PHP Objects ~.~ We look at the object oriented...,PHP Objects;Connecting PHP and MySQL;PHP Cooki...,Charles Russell Severance,Shareable Certificate,University of Michigan,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/database-applic...,2022-07-29 23:59:20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,Data Science: Foundations using R Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.6,Approximately 5 months to complete,English,no prerequisites needed,No specify data,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD",Shareable Certificate,General,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/data-...,2022-07-30 00:45:32
199,IBM Data Science Professional Certificate,Data Analysis,professional certificates,Beginner Level,Data science is one of the hottest professions...,free,4.6,Approximately 11 months to complete,English,no prerequisites needed,No specify data,Get exclusive access to career resources upon ...,IBM Skills Network Team;Dr. Pooja;Abhishek Gag...,Shareable Certificate,IBM Skills Network,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/professional-certific...,2022-07-30 00:45:32
200,Data Science Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.5,Approximately 11 months to complete,English,no prerequisites needed,No specify data,The Data Scientist’s Toolbox;R Programming;Get...,"Roger D. Peng, PhD;Brian Caffo, PhD;Jeff Leek,...",Shareable Certificate,General,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/jhu-d...,2022-07-30 00:45:32
201,Introduction to Physical Chemistry,Chemistry,course,Beginner Level,Chemical reactions underpin the production of ...,free,4.7,Approx. 19 hours to complete,English,no prerequisites needed,Thermodynamics I ~.~ This module explores ther...,Thermodynamics I;Thermodynamics II;Virtual Lab...,"Patrick J O'Malley, D.Sc;Michael W. Anderson, ...",Shareable Certificate,University of Manchester,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/physical-chemistry,2022-07-30 00:45:41


In [18]:
df.to_csv('coursera_dataset2.csv')

In [18]:
df2 = pd.read_csv('coursera_dataset2.csv')
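
With its default settings, `to_csv` also writes the index, and `read_csv` then loads it back as an extra column named `Unnamed: 0`. A minimal sketch of the difference that `index=False` makes, using an in-memory buffer instead of a file (the `sample` frame and buffer names are illustrative assumptions):

```python
import io

import pandas as pd

sample = pd.DataFrame({'title': ['Course A', 'Course B'], 'rating': [4.8, 4.3]})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

Passing `index=False` on export, or `index_col=0` on import, is a common way to avoid carrying the extra column into later steps.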

In [21]:
# Pad the ';' delimiter with spaces so it is separated from adjacent words in the scraped column
df2['modules'] = df2['modules'].str.replace(';', ' ; ', regex=False)

In [22]:
# Pad the ';' delimiter with spaces so it is separated from adjacent words in the scraped column
df2['instructor'] = df2['instructor'].str.replace(';', ' ; ', regex=False)

In [23]:
# Remove the '~.~' marker found in the syllabus column
df2['syllabus'] = df2['syllabus'].str.replace('~.~', '', regex=False)
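
Removing `~.~` leaves the spaces that surrounded it, so the cleaned syllabus text can contain doubled spaces. A minimal sketch of the same cleaning steps with an optional whitespace-collapsing pass added; the sample strings are made up for illustration, and the extra pass is an assumption rather than part of the original pipeline:

```python
import pandas as pd

# Made-up strings in the same shape as the scraped columns
sample = pd.DataFrame({
    'modules': ['Module A;Module B'],
    'syllabus': ['Module A ~.~ Short description'],
}, dtype=object)

# Pad the ';' delimiter so adjacent words are separated
sample['modules'] = sample['modules'].str.replace(';', ' ; ', regex=False)

# Remove the '~.~' marker, then collapse the doubled spaces it leaves behind
sample['syllabus'] = (
    sample['syllabus']
    .str.replace('~.~', '', regex=False)
    .str.replace(r'\s+', ' ', regex=True)
    .str.strip()
)

print(sample['modules'].iloc[0])
print(sample['syllabus'].iloc[0])
```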

In [24]:
# Displaying final dataset
df2

Unnamed: 0.1,Unnamed: 0,title,category,type,level,description,price,rating,duration,language,prerequisites,syllabus,modules,instructor,certificate_type,association,image,url,timestamp
0,0,Networking in Google Cloud Specialization,Networking,specializations,Intermediate Level,This specialization gives participants broad s...,free,4.8,Approximately 4 months to complete,English,no prerequisites needed,No specify data,Google Cloud Fundamentals: Core Infrastructure...,Google Cloud Training,Shareable Certificate,Google Cloud,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/netwo...,2022-07-29 23:58:34
1,1,Two Speed IT: How Companies Can Surf the Digit...,Business Essentials,course,Beginner Level,"Transform or disappear, the Darwinism of IT: I...",free,4.3,Approx. 14 hours to complete,English,no prerequisites needed,Introduction Start here! IT and the CIO in t...,Introduction ; IT and the CIO in the Digital W...,Antoine Gourévitch ; Vanessa Lyon ; Eric Baudson,Shareable Certificate,CentraleSupélec,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/2-speed-it,2022-07-29 23:58:34
2,2,Fundamentals of Network Communication,Computer Security and Networks,course,Intermediate Level,"In this course, we trace the evolution of netw...",free,4.6,Approx. 15 hours to complete,English,no prerequisites needed,Communication Networks and Services This modu...,Communication Networks and Services ; Layered ...,Xiaobo Zhou ; University of Colorado System,Shareable Certificate,University of Colorado System,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/fundamentals-ne...,2022-07-29 23:58:54
3,3,Design a User Experience for Social Good & Pre...,Design and Product,course,Beginner Level,Design a User Experience for Social Good and P...,free,4.8,Approx. 71 hours to complete,English,no prerequisites needed,"Starting the UX design process: empathize, def...",Design for social good and strengthen your por...,Google Career Certificates,Shareable Certificate,Google,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/ux-design-jobs,2022-07-29 23:59:20
4,4,Building Database Applications in PHP,Mobile and Web Development,course,Intermediate Level,"In this course, we'll look at the object orien...",free,4.9,Approx. 24 hours to complete,English,no prerequisites needed,PHP Objects We look at the object oriented pa...,PHP Objects ; Connecting PHP and MySQL ; PHP C...,Charles Russell Severance,Shareable Certificate,University of Michigan,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/database-applic...,2022-07-29 23:59:20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,198,Data Science: Foundations using R Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.6,Approximately 5 months to complete,English,no prerequisites needed,No specify data,The Data Scientist’s Toolbox ; R Programming ;...,"Roger D. Peng, PhD ; Brian Caffo, PhD",Shareable Certificate,General,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/data-...,2022-07-30 00:45:32
199,199,IBM Data Science Professional Certificate,Data Analysis,professional certificates,Beginner Level,Data science is one of the hottest professions...,free,4.6,Approximately 11 months to complete,English,no prerequisites needed,No specify data,Get exclusive access to career resources upon ...,IBM Skills Network Team ; Dr. Pooja ; Abhishek...,Shareable Certificate,IBM Skills Network,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/professional-certific...,2022-07-30 00:45:32
200,200,Data Science Specialization,Data Analysis,specializations,Beginner Level,"Ask the right questions, manipulate data sets,...",free,4.5,Approximately 11 months to complete,English,no prerequisites needed,No specify data,The Data Scientist’s Toolbox ; R Programming ;...,"Roger D. Peng, PhD ; Brian Caffo, PhD ; Jeff L...",Shareable Certificate,General,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/specializations/jhu-d...,2022-07-30 00:45:32
201,201,Introduction to Physical Chemistry,Chemistry,course,Beginner Level,Chemical reactions underpin the production of ...,free,4.7,Approx. 19 hours to complete,English,no prerequisites needed,Thermodynamics I This module explores thermod...,Thermodynamics I ; Thermodynamics II ; Virtual...,"Patrick J O'Malley, D.Sc ; Michael W. Anderson...",Shareable Certificate,University of Manchester,https://s3.amazonaws.com/coursera_assets/meta_...,https://www.coursera.org/learn/physical-chemistry,2022-07-30 00:45:41


In [36]:
# Export Final Dataset
df2.to_csv('coursera_dataset.csv')