## About Dataset

### Context

This is a web-scraped dataset, found on Kaggle, of over 40,000 Udemy.com courses from 2020 across 9 categories.

### Content

This dataset contains the following fields:
- index
- Title
- Summary
- Enrollment
- Stars
- Rating
- Link

More information on this dataset can be found here: https://www.kaggle.com/datasets/songseungwon/2020-udemy-courses-dataset

In [1]:
!python --version

Python 3.9.12


In [26]:
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
from functools import reduce

In [8]:
print(f'pandas version being used is: {pd.__version__}')
print(f'numpy version being used is: {np.__version__}')
print(f'seaborn version being used is: {sns.__version__}')
print(f'matplotlib version being used is: {matplotlib.__version__}')

pandas version being used is: 1.4.2
numpy version being used is: 1.21.5
seaborn version being used is: 0.11.2
matplotlib version being used is: 3.5.1


In [9]:
# Helper that reads a CSV file into a DataFrame
def create_df(path):
    df = pd.read_csv(path)
    return df

In [10]:
# File paths for each category's CSV
business = 'D:/Capstone/udemy_dataset/udemy_business.csv'
design = 'D:/Capstone/udemy_dataset/udemy_design.csv'
finance = 'D:/Capstone/udemy_dataset/udemy_finance.csv'
lifestyle = 'D:/Capstone/udemy_dataset/udemy_lifestyle.csv'
marketing = 'D:/Capstone/udemy_dataset/udemy_marketing.csv'
music = 'D:/Capstone/udemy_dataset/udemy_music.csv'
office_productivity = 'D:/Capstone/udemy_dataset/udemy_office_productivity.csv'
photography = 'D:/Capstone/udemy_dataset/udemy_photography.csv'
tech = 'D:/Capstone/udemy_dataset/udemy_tech.csv'

In [12]:
# Load each category's CSV into its own DataFrame
business_df = create_df(business)
design_df = create_df(design)
finance_df = create_df(finance)
lifestyle_df = create_df(lifestyle)
marketing_df = create_df(marketing)
music_df = create_df(music)
office_df = create_df(office_productivity)
photo_df = create_df(photography)
tech_df = create_df(tech)
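
The nine path variables and nine `create_df` calls above could also be written as a loop. The following is only a sketch: the helper name `load_category_frames` is hypothetical, it assumes every file follows the `udemy_<category>.csv` naming seen in the paths above, and it assumes the first, unnamed CSV column is a saved row index (which is why it passes `index_col=0`). The demo writes two tiny stand-in files to a temporary directory rather than reading the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_category_frames(base_dir, categories):
    """Read udemy_<category>.csv for each category into a dict of DataFrames."""
    base = Path(base_dir)
    frames = {}
    for name in categories:
        # Assumption: the first CSV column is a saved row index, so it is
        # used as the index instead of becoming an "Unnamed: 0" column.
        frames[name] = pd.read_csv(base / f"udemy_{name}.csv", index_col=0)
    return frames


# Demo on stand-in files (not the real Udemy data).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("business", "design"):
        stand_in = pd.DataFrame({"Title": [f"{name} course"], "Stars": [4.5]})
        stand_in.to_csv(Path(tmp) / f"udemy_{name}.csv")
    demo_frames = load_category_frames(tmp, ["business", "design"])

print(list(demo_frames))
```

Keeping the frames in a dict keyed by category name also makes later per-category steps easier to write as loops.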

In [13]:
business_df.head()

Unnamed: 0,index,Title,Summary,Enrollment,Stars,Rating,Link
0,0,The Complete SQL Bootcamp 2020: Go from Zero t...,Become an expert at SQL!,301243,4.7,79919,https://www.udemy.com/course/the-complete-sql-...
1,1,Tableau 2020 A-Z: Hands-On Tableau Training fo...,Learn Tableau 2020 for data science step by st...,211674,4.6,55582,https://www.udemy.com/course/tableau10/
2,2,PMP Exam Prep Seminar - PMBOK Guide 6,PMP Exam Prep Seminar - Earn 35 PDUs by comple...,157957,4.6,53858,https://www.udemy.com/course/pmp-pmbok6-35-pdus/
3,3,The Complete Financial Analyst Course 2020,"Excel, Accounting, Financial Statement Analysi...",249097,4.5,47415,https://www.udemy.com/course/the-complete-fina...
4,4,An Entire MBA in 1 Course:Award Winning Busine...,** #1 Best Selling Business Course! ** Everyth...,376913,4.5,42101,https://www.udemy.com/course/an-entire-mba-in-...


In [15]:
# Add a Category column to each DataFrame before combining them, so that
# a later sample of the combined data can include every category
business_df['Category'] = 'business'
design_df['Category'] = 'design'
finance_df['Category'] = 'finance'
lifestyle_df['Category'] = 'lifestyle'
marketing_df['Category'] = 'marketing'
music_df['Category'] = 'music'
office_df['Category'] = 'office'
photo_df['Category'] = 'photo'
tech_df['Category'] = 'tech'
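
The comment in the cell above gives sampling as the reason for the `Category` column. One possible way to draw such a sample is `DataFrameGroupBy.sample`, available in pandas 1.1 and later. This is a minimal sketch on made-up toy data, not the notebook's actual sampling step; the 50% fraction and the `random_state` value are arbitrary choices for illustration.

```python
import pandas as pd

# Toy stand-in for the combined course data (not real Udemy rows).
toy = pd.DataFrame({
    "Title": [f"course {i}" for i in range(12)],
    "Category": ["business"] * 6 + ["design"] * 4 + ["music"] * 2,
})

# Draw the same fraction from every category so that small categories
# are still represented in the sample.
sample = toy.groupby("Category").sample(frac=0.5, random_state=0)

print(sample["Category"].value_counts())
```

Sampling a fraction per group keeps the category proportions of the full data, whereas `n=` would draw the same number of rows from each category.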

In [32]:
# Stack all category DataFrames into one master DataFrame with pd.concat
dataframes = [business_df, design_df, finance_df, lifestyle_df, marketing_df, music_df, office_df, photo_df, tech_df]

udemy_master_df = pd.concat(dataframes)
udemy_master_df.head(10)
#len(udemy_master_df)

Unnamed: 0,index,Title,Summary,Enrollment,Stars,Rating,Link,category,Category
0,0,The Complete SQL Bootcamp 2020: Go from Zero t...,Become an expert at SQL!,301243,4.7,79919,https://www.udemy.com/course/the-complete-sql-...,business,business
1,1,Tableau 2020 A-Z: Hands-On Tableau Training fo...,Learn Tableau 2020 for data science step by st...,211674,4.6,55582,https://www.udemy.com/course/tableau10/,business,business
2,2,PMP Exam Prep Seminar - PMBOK Guide 6,PMP Exam Prep Seminar - Earn 35 PDUs by comple...,157957,4.6,53858,https://www.udemy.com/course/pmp-pmbok6-35-pdus/,business,business
3,3,The Complete Financial Analyst Course 2020,"Excel, Accounting, Financial Statement Analysi...",249097,4.5,47415,https://www.udemy.com/course/the-complete-fina...,business,business
4,4,An Entire MBA in 1 Course:Award Winning Busine...,** #1 Best Selling Business Course! ** Everyth...,376913,4.5,42101,https://www.udemy.com/course/an-entire-mba-in-...,business,business
5,5,Microsoft Power BI - A Complete Introduction [...,"Learn how to use Microsoft's Power BI Tools, i...",126880,4.6,38771,https://www.udemy.com/course/powerbi-complete-...,business,business
6,6,Agile Crash Course: Agile Project Management; ...,Get Agile Certified & Learn about the key and ...,98700,4.3,31276,https://www.udemy.com/course/agile-crash-course/,business,business
7,7,Beginner to Pro in Excel: Financial Modeling a...,Financial Modeling in Excel that would allow y...,128940,4.5,29111,https://www.udemy.com/course/beginner-to-pro-i...,business,business
8,8,Become a Product Manager | Learn the Skills & ...,The most complete course available on Product ...,114114,4.5,27879,https://www.udemy.com/course/become-a-product-...,business,business
9,9,The Business Intelligence Analyst Course 2020,The skills you need to become a BI Analyst - S...,118019,4.5,24582,https://www.udemy.com/course/the-business-inte...,business,business
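
The output above shows both a `category` and a `Category` column, possibly left over from an earlier run of the labeling cell, and each frame's row labels restart at 0 after `pd.concat`. A hedged sketch of one way to tidy this up follows, using toy frames in place of the real ones. It assumes `Unnamed: 0` and the lowercase `category` column carry no information beyond what is already in the row order and in `Category`.

```python
import pandas as pd

# Toy stand-ins for two of the per-category frames.
frames = {
    "business": pd.DataFrame({"Unnamed: 0": [0, 1], "Title": ["a", "b"]}),
    "design": pd.DataFrame({"Unnamed: 0": [0, 1], "Title": ["c", "d"]}),
}

# Label each frame with its category without modifying the originals.
labeled = [df.assign(Category=name) for name, df in frames.items()]

# ignore_index=True gives the combined frame one continuous 0..n-1 index.
combined = pd.concat(labeled, ignore_index=True)

# Drop redundant columns; errors="ignore" skips any that are absent.
combined = combined.drop(columns=["Unnamed: 0", "category"], errors="ignore")

print(combined)
```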


In [33]:
# Save the master DataFrame as a CSV (by default the row index is written as an unnamed first column)
udemy_master_df.to_csv(r'D:/Capstone/udemy_dataset/udemy_master_df.csv')
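
Because `to_csv` writes the row index by default, reading the saved file back produces an extra `Unnamed: 0` column, like the one visible in the `head()` output earlier. If the index carries no information, passing `index=False` avoids that. The sketch below demonstrates the round trip on a toy frame and a temporary directory; the file name only mirrors the one used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the master DataFrame.
df = pd.DataFrame({"Title": ["a", "b"], "Category": ["business", "design"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "udemy_master_df.csv"
    # index=False leaves the row index out of the file.
    df.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

print(list(reloaded.columns))
```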