![](61_Best-Udemy-Alternatives-For-Instructors-ONLINE-COURSE-MARKETPLACES.jpg)

# 1. Introduction

## 1.1 Udemy courses Overview
The Udemy Dataset provides a detailed snapshot of online courses available on Udemy, one of the leading platforms for online learning and teaching. This dataset is a valuable resource for analyzing trends in online education, understanding course popularity, and identifying factors that contribute to the success of courses. It contains information about various courses across multiple subjects, including Musical Instruments, Business Finance, Graphic Design, and Web Development.

Key Features of the Dataset:
Course Details: The dataset includes essential information about each course, such as its title, pricing, number of subscribers, reviews, and lectures.

Course Levels: Courses are categorized by difficulty levels, such as Beginner, Intermediate, All Levels, and Expert Level, making it easier to analyze the demand for different skill levels.

Content Duration: The dataset provides the total duration of course content, which can be used to study the relationship between course length and subscriber engagement.

Publication Timestamp: Each course includes a timestamp indicating when it was published, allowing for trend analysis over time.

Subject Categories: Courses are grouped into subjects, enabling analysis of which topics are most popular or in demand.

Dataset Structure:
The dataset is structured in a tabular format with the following columns:

course_id: A unique identifier for each course.

course_title: The title of the course.

is_paid: Indicates whether the course is paid (TRUE) or free (FALSE).

price: The price of the course in USD (if paid).

num_subscribers: The number of subscribers enrolled in the course.

num_reviews: The number of reviews received for the course.

num_lectures: The number of lectures included in the course.

level: The difficulty level of the course (e.g., Beginner, Intermediate, All Levels).

content_duration: The total duration of the course content (e.g., 1.5 hours).

published_timestamp: The timestamp when the course was published (format: YYYY-MM-DDTHH:MM:SSZ).

subject: The category or subject of the course (e.g., Musical Instruments).

Data preparation in this notebook includes converting published_timestamp to a datetime format and ensuring consistent data types across all columns.

Potential Applications:
Educators: Use the dataset to understand what types of courses are in demand and tailor their offerings accordingly.

Marketers: Analyze pricing strategies and promotional efforts to maximize course enrollments.

Data Analysts: Perform exploratory data analysis (EDA) to uncover trends and insights in the online education market.

Researchers: [...]

Users should review Udemy's terms of service for any restrictions on data usage.

This overview provides a high-level understanding of the Udemy Dataset, its structure, and its potential applications. It serves as a starting point for anyone looking to explore or analyze the dataset for insights into the online education market.

# 2. Python Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re

# 3. Import Dataset

In [None]:
df=pd.read_csv('E:/Python Projects/shohreh/Shohreh_GitHub_Repository/Data-Analysis-And-Machine-Learning-Projects/2. Marketing and Customer Analysis/Udemy courses/Udemy-Dataset.csv')

# 4. First Organization

In [None]:
df.info()

In [None]:
df.head(2)

In [None]:
df.isna().sum()

In [None]:
df.duplicated().sum()

In [None]:
df.describe().T

# 5. Data cleaning

## 5.1 duplicated values

In [None]:
df[df.duplicated()]

### After checking for duplicates, no duplicate records were found in the dataset.
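As a small illustration of what the duplicate check does, the sketch below runs `duplicated()` and `drop_duplicates()` on a made-up three-row frame. The `toy` data is an assumption for demonstration only and is not taken from the Udemy file.

```python
import pandas as pd

# Hypothetical toy data: the last row repeats the second one exactly
toy = pd.DataFrame({"course_id": [1, 2, 2], "price": [35, 75, 75]})

# duplicated() flags rows that repeat an earlier row in every column
dup_mask = toy.duplicated()
print(dup_mask.tolist())  # [False, False, True]

# drop_duplicates() keeps the first occurrence of each repeated row
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(deduped))  # 2
```

If the real dataset did contain duplicates, the same `drop_duplicates()` call could be applied to `df`.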

## 5.2 Unique values

In [None]:
pd.DataFrame(df.nunique(), columns=['Number of unique Values'])

## 5.3 value counts

In [None]:
def value_counts(col, top_n=10):
    """
    Returns the top N unique values and their counts for a specified column.

    Parameters:
    - col (str): The column name for which to calculate value counts.
    - top_n (int): The number of top unique values to return (default is 10).

    Returns:
    - pd.DataFrame: A DataFrame containing the top N unique values and their counts.
    """
    # Calculate value counts and convert to DataFrame
    value_counts_df = pd.DataFrame(df[col].value_counts().head(top_n))
    
    # Rename columns for better readability
    value_counts_df.columns = ['Count']
    
    # Reset index to make the unique values a column
    value_counts_df.reset_index(inplace=True)
    value_counts_df.rename(columns={'index': col}, inplace=True)
    
    return value_counts_df
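The helper above can be tried on a small made-up frame. The sketch below is a more compact variant of the same idea that takes the frame as an argument; the names `toy` and `top_values` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the Udemy table
toy = pd.DataFrame({
    "subject": ["Web Development", "Web Development", "Graphic Design",
                "Business Finance", "Web Development", "Graphic Design"]
})

def top_values(frame, col, top_n=10):
    """Return the top_n most frequent values of `col` with their counts."""
    return (
        frame[col]
        .value_counts()
        .head(top_n)
        .rename_axis(col)            # name the index after the column
        .reset_index(name="Count")   # turn the index into a regular column
    )

result = top_values(toy, "subject", top_n=2)
print(result)
```

Naming the index and the count column explicitly avoids relying on the default labels, which differ between pandas versions.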

## 5.4 published_timestamp

In [None]:
df['published_timestamp']=pd.to_datetime(df['published_timestamp'])

In [None]:
df.info()
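Once the column is a datetime, calendar fields become available through the `.dt` accessor, which is what makes trend analysis over time possible. A minimal sketch on two sample timestamps in the documented `YYYY-MM-DDTHH:MM:SSZ` format (the `raw` series is illustrative, not the full column):

```python
import pandas as pd

# Two sample timestamps in the documented ISO 8601 format
raw = pd.Series(["2014-09-18T05:07:05Z", "2017-04-12T19:06:34Z"])

# utc=True makes the timezone handling explicit
parsed = pd.to_datetime(raw, utc=True)

# Extract the publication year of each course
years = parsed.dt.year
print(years.tolist())  # [2014, 2017]
```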

## 5.5 content_duration

In [None]:
df['content_duration']

In [None]:
# First try
#for i in range (0,df.shape[0]-1):
 #   if df['content_duration'].str.contains('mins')[i] == True:
  #      df['content_duration']=df['content_duration'].str.replace(' mins', '')
   # elif df['content_duration'].str.contains("hour")[i] == True:
    #    df['content_duration']=df['content_duration'].str.replace(r" hour.*", '')

In [None]:
## second try:
# Ensure the 'content_duration' column is treated as a string
df['content_duration']=df['content_duration'].astype(str)

for i, row in df.iterrows():
    # Extract hours, allowing decimals such as "1.5 hours"
    hours = re.search(r"(\d+(?:\.\d+)?)\s*hours?", row['content_duration'])
    hours = float(hours.group(1)) if hours else 0

    # Extract minutes
    mins = re.search(r"(\d+(?:\.\d+)?)\s*mins?", row['content_duration'])
    mins = float(mins.group(1)) if mins else 0

    # Calculate total duration in minutes
    total_duration = (hours * 60) + mins

    # Update the DataFrame
    df.at[i, 'content_duration'] = round(total_duration)

In [None]:
## Modify code for content_duration column:

# Function to convert duration to total minutes
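def convert_to_minutes(duration):
    # Entries already converted to minutes by the previous cell are numeric: keep them
    if isinstance(duration, (int, float)) and not pd.isna(duration):
        return int(duration)
    duration = str(duration)

    # Extract hours, allowing decimals such as "1.5 hours"
    hours = re.search(r"(\d+(?:\.\d+)?)\s*hours?", duration, re.IGNORECASE)
    hours = float(hours.group(1)) if hours else 0

    # Extract minutes
    mins = re.search(r"(\d+(?:\.\d+)?)\s*mins?", duration, re.IGNORECASE)
    mins = float(mins.group(1)) if mins else 0

    # Convert to total minutes
    return int(round((hours * 60) + mins))

# Apply the function to the column
df['content_duration'] = df['content_duration'].apply(convert_to_minutes)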



In [None]:
df['content_duration']
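The conversion can be checked against a few duration strings of the kind that appear in the raw data ("1.5 hours", "1 hour", "37 mins"). The sketch below is a vectorized alternative using `str.extract`; it assumes every entry holds a single number followed by an hour or minute unit, which may not hold for every row of the real file.

```python
import pandas as pd

# Sample values in the formats seen in the raw column
durations = pd.Series(["1.5 hours", "1 hour", "37 mins", "6.5 hours"])

# Split each entry into a numeric part and a unit part
parts = durations.str.extract(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>hour|min)")
value = parts["value"].astype(float)

# Hours are scaled by 60; minute values are kept as they are
factor = parts["unit"].map({"hour": 60, "min": 1})

# Assumes every row matched; unmatched rows would be NaN and fail the int cast
minutes = (value * factor).round().astype(int)
print(minutes.tolist())  # [90, 60, 37, 390]
```

Note that the decimal part matters: a pattern that only captures `\d+` would read "1.5 hours" as 5 hours.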

## 5.6 price

In [None]:
value_counts('price')

In [None]:
# replacing Free with 0
df['price']=df['price'].replace('Free',0)

## 5.7 change columns dtype

In [None]:
def change_type(df,dtype,columns):
    for col in columns:
        df[col]=df[col].astype(dtype)

In [None]:
change_type(df, 'float', ['price'])

In [None]:
df.info()
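An alternative to replacing the label and casting in two steps is `pd.to_numeric`, which can also surface unexpected labels as missing values. This is a sketch on made-up prices, not the notebook's method; the `prices` series is an assumption.

```python
import pandas as pd

# Sample prices stored as text, including the 'Free' label
prices = pd.Series(["35", "75", "Free", "120"])

# Replace the known label, then parse; errors="coerce" turns any other
# non-numeric label into NaN instead of raising an exception
numeric = pd.to_numeric(prices.replace("Free", "0"), errors="coerce")

print(numeric.tolist())       # [35, 75, 0, 120]
print(numeric.isna().sum())   # 0
```

A non-zero NaN count after this step would indicate labels other than "Free" that need handling.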

## 5.8 map level column

In [None]:
value_counts('level')

In [None]:
level_mapping={'All Levels' : 0, 'Beginner Level': 1, 'Intermediate Level': 2, 'Expert Level': 3}
df['level']= df['level'].map(level_mapping)
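One thing to watch with `map` is that any label missing from the dictionary silently becomes NaN. The sketch below shows a quick check for that; the "Advanced" label is a hypothetical value added only to trigger the case.

```python
import pandas as pd

# Sample labels; "Advanced" is a made-up label absent from the mapping
levels = pd.Series(["All Levels", "Beginner Level", "Expert Level", "Advanced"])

level_mapping = {"All Levels": 0, "Beginner Level": 1,
                 "Intermediate Level": 2, "Expert Level": 3}

mapped = levels.map(level_mapping)

# Labels that were not found in the dictionary end up as NaN
unmapped = levels[mapped.isna()].unique().tolist()
print(unmapped)  # ['Advanced']
```

Running the same check on `df['level']` before overwriting the column would confirm that all four levels were covered.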

## 5.9 change columns Name

In [None]:
df.columns=['course_id','title', 'is_paid', 'price', 'subscribers','reviews', 'lectures', 'level', 'content_duration_mins','published_time', 'subject']
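Assigning a full list to `df.columns` depends on the column order being exactly as expected. A dictionary-based `rename` is a possible alternative that ties each new name to its old one; the sketch below uses an empty frame with a subset of the documented columns, purely as an illustration.

```python
import pandas as pd

# Empty frame with a few of the documented column names
toy = pd.DataFrame(columns=["course_id", "course_title",
                            "num_subscribers", "content_duration"])

# Map old names to new ones; columns not listed keep their name
renames = {
    "course_title": "title",
    "num_subscribers": "subscribers",
    "content_duration": "content_duration_mins",
}
renamed = toy.rename(columns=renames)
print(list(renamed.columns))
# ['course_id', 'title', 'subscribers', 'content_duration_mins']
```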

## 5.10 Drop columns

In [51]:
# drop() returns a new DataFrame, so assign the result to keep the change
df = df.drop(columns=['course_id'])
df

|  | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | #1 Piano Hand Coordination: Play 10th Ballad i... | True | 35 | 3137 | 18 | 68 | All Levels | 1.5 hours | 2014-09-18T05:07:05Z | Musical Instruments |
| 1 | #10 Hand Coordination - Transfer Chord Ballad ... | True | 75 | 1593 | 1 | 41 | Intermediate Level | 1 hour | 2017-04-12T19:06:34Z | Musical Instruments |
| 2 | #12 Hand Coordination: Let your Hands dance wi... | True | 75 | 482 | 1 | 47 | Intermediate Level | 1.5 hours | 2017-04-26T18:34:57Z | Musical Instruments |
| 3 | #4 Piano Hand Coordination: Fun Piano Runs in ... | True | 75 | 850 | 3 | 43 | Intermediate Level | 1 hour | 2017-02-21T23:48:18Z | Musical Instruments |
| 4 | #5 Piano Hand Coordination: Piano Runs in 2 ... | True | 75 | 940 | 3 | 32 | Intermediate Level | 37 mins | 2017-02-21T23:44:49Z | Musical Instruments |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3677 | Your Own Site in 45 Min: The Complete Wordpres... | True | 120 | 1566 | 29 | 36 | All Levels | 4 hours | 2015-04-20T22:15:17Z | Web Development |
| 3678 | Your Second Course on Piano: Two Handed Playing | True | 70 | 1018 | 12 | 22 | Beginner Level | 5 hours | 2015-10-26T20:04:21Z | Musical Instruments |
| 3679 | Zend Framework 2: Learn the PHP framework ZF2 ... | True | 40 | 723 | 130 | 37 | All Levels | 6.5 hours | 2015-11-11T18:55:45Z | Web Development |
| 3680 | Zoho Books Gestion Financière d'Entreprise pas... | False | Free | 229 | 0 | 33 | All Levels | 2 hours | 2017-05-26T16:45:55Z | Business Finance |
