# **Udemy COURSES ANALYSIS**



I came across the Onyx Data and DataDNA Dataset Challenge on LinkedIn and decided to do a little research on it. It was in line with my aspiration to become a practical and proficient data professional. The challenge exposes data analysts to real data so they can apply their visual design, analytical, storytelling, and technical skills to glean insights from the data and use the findings to inform decision-making.


This is the January 2024 challenge, and below is how I went about my analysis.


## **IMPORT LIBRARY**

1. **pandas (import pandas as pd):** Pandas is a powerful data manipulation and analysis library in Python. It provides data structures like DataFrame and Series, which are essential for handling and analyzing structured data. With Pandas, you can easily load, manipulate, and analyze data, making it a fundamental library for data analysis and preparation.

2. **plotly.express (import plotly.express as px):** Plotly Express is a high-level data visualization library built on top of Plotly. It offers a simplified interface for creating a variety of interactive and visually appealing plots with minimal code. Plotly Express is particularly useful for creating charts like scatter plots, line charts, bar charts, and more, with a focus on ease of use.

3. **plotly.graph_objects (import plotly.graph_objects as go):** Plotly Graph Objects provides a lower-level interface compared to Plotly Express. It allows for more fine-grained control over the appearance and customization of plots. With this library, you can create sophisticated and customized visualizations using a broader set of configuration options.

4. **plotly.io (import plotly.io as pio):** Plotly IO is a module within Plotly that provides functionality for reading and writing different file formats for plots. It allows you to export plots to various formats such as HTML, JSON, static images, or interactive web-based visualizations.

5. **plotly.colors (import plotly.colors as colors):** Plotly Colors provides a collection of predefined color scales and color-related functions. It is useful for customizing the color schemes of your plots, ensuring aesthetically pleasing and meaningful visualizations.

In [1]:


# These libraries collectively empower data scientists and analysts to efficiently handle data, 
# explore patterns, and communicate insights through visually compelling plots and charts.

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.colors as colors



## **DATASET**

The dataset comprises real-world data sourced from the Udemy learning platform, generously provided by Onyx Data and DataDNA. This dataset contains a diverse set of variables.

* course_id => A unique identifier for each course.
* course_title => The title of the course.
* url => URL of the course on Udemy.
* is_paid => Indicates whether the course is paid or free.
* price => The price of the course (if it's a paid course).
* num_subscribers => The number of subscribers for the course.
* num_reviews => The number of reviews the course has received.
* num_lectures => The number of lectures in the course.
* level => The level of the course (e.g., All Levels, Intermediate Level).
* content_duration => The duration of the course content in hours.
* published_timestamp => The date and time when the course was published.
* subject => The subject category of the course.




1. `excel_file_path = "Onyx Data -DataDNA Dataset Challenge - Udemy Courses - January 2024.xlsx"`

   This line of code assigns a string value to the variable `excel_file_path`. The string represents the file path of an Excel file named "Onyx Data -DataDNA Dataset Challenge - Udemy Courses - January 2024.xlsx".

2. `data = pd.read_excel(excel_file_path)`

   This line of code uses the pandas library to read the Excel file specified by the `excel_file_path` variable into a pandas DataFrame named `data`. The `read_excel()` function is a pandas function designed to read Excel files. It takes the file path as an argument and returns a DataFrame containing the data from the Excel file.


In [2]:

excel_file_path = "Onyx Data -DataDNA Dataset Challenge - Udemy Courses - January 2024.xlsx"

# Read the Excel file into a pandas DataFrame
data = pd.read_excel(excel_file_path)




`.head(2)`: This is a method call on the DataFrame `data`. The `head()` method is used to retrieve the first few rows of a DataFrame. By passing the argument `2` within the parentheses, it specifies that you want to retrieve the first 2 rows of the DataFrame. 

So, when you execute `data.head(2)`, it will display the first 2 rows of your DataFrame `data`, providing a quick preview of the structure and content of your data. This is often useful for initial exploration and understanding of the dataset.

In [3]:
data.head(2)

Unnamed: 0,course_id,course_title,url,is_paid,price,num_subscribers,num_reviews,num_lectures,level,content_duration,published_timestamp,subject
0,1070968,Ultimate Investment Banking Course,https://www.udemy.com/ultimate-investment-bank...,True,200,2147,23,51,All Levels,1.5,2017-01-18T20:58:58Z,Business Finance
1,1113822,Complete GST Course & Certification - Grow You...,https://www.udemy.com/goods-and-services-tax/,True,75,2792,923,274,All Levels,39.0,2017-03-09T16:34:20Z,Business Finance


The code `data.shape` returns a tuple representing the dimensions of the DataFrame `data`. 

The shape attribute of a DataFrame in pandas provides information about the number of rows and columns in the DataFrame. It returns a tuple where the first element represents the number of rows and the second element represents the number of columns.

So, when you execute `data.shape`, you will get a tuple containing two integers: the number of rows followed by the number of columns in the DataFrame `data`. 

In [4]:
# Our data has 3678 rows and 12 columns
data.shape

(3678, 12)
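As a small illustration of working with `shape`, the sketch below builds a made-up DataFrame (the values are illustrative, not taken from the challenge dataset) and unpacks the tuple into separate row and column counts.

```python
import pandas as pd

# Toy DataFrame with made-up values, used only to illustrate `shape`
toy = pd.DataFrame({
    "course_id": [1, 2, 3],
    "price": [20, 75, 0],
})

# `shape` is an attribute (no parentheses) that returns (rows, columns)
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Unpacking the tuple this way is handy when the row count is needed later, for example to compare it with the number of unique IDs.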

`data.columns` is retrieving the column labels (names) of the DataFrame data. It returns an Index object containing the column labels.

In [5]:
data.columns

Index(['course_id', 'course_title', 'url', 'is_paid', 'price',
       'num_subscribers', 'num_reviews', 'num_lectures', 'level',
       'content_duration', 'published_timestamp', 'subject'],
      dtype='object')

Let's explore the data by looking at each variable in turn.

The code `data["course_id"].nunique()` calculates the number of unique values in the "course_id" column of the DataFrame `data`.


- `data["course_id"]`: This selects the column labeled "course_id" from the DataFrame `data`. It retrieves the data in this column.

- `.nunique()`: This is a method that is applied to the selected column. The `nunique()` method calculates the number of unique values in the column. It counts the number of distinct values that appear in the column.



In [6]:
# Number of unique course IDs
data["course_id"].nunique()

3672
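Comparing `nunique()` with the row count is a quick way to spot repeated identifiers: 3,678 rows against 3,672 unique IDs points to 6 repeated rows. A minimal sketch of that comparison on made-up data (the name `toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Made-up IDs in which 102 and 103 each appear twice
toy = pd.DataFrame({"course_id": [101, 102, 102, 103, 103]})

n_rows = len(toy)                       # total number of rows
n_unique = toy["course_id"].nunique()   # distinct IDs
n_repeats = n_rows - n_unique           # rows that repeat an earlier ID

print(n_rows, n_unique, n_repeats)
```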

This code identifies duplicated rows in the DataFrame `data` based on all columns. 

- `data[data.duplicated()]`: This line of code uses the `duplicated()` method to identify rows in the DataFrame `data` that are duplicates. When called without any arguments, `duplicated()` compares rows across all columns and uses `keep='first'`, so the first occurrence of each repeated row is marked `False` and only the later copies are marked `True`. It returns a boolean Series, which is then used to select the duplicated rows.

- `duplicated_rows`: This assigns the result of the previous line to the variable `duplicated_rows`. It contains the subset of the DataFrame `data` where the rows are identified as duplicates based on all columns.


In [7]:


# Identify duplicated rows based on all columns
duplicated_rows = data[data.duplicated()]

# 6 rows are duplicated in the dataset
duplicated_rows



Unnamed: 0,course_id,course_title,url,is_paid,price,num_subscribers,num_reviews,num_lectures,level,content_duration,published_timestamp,subject
787,837322,Essentials of money value: Get a financial Life !,https://www.udemy.com/essentials-of-money-value/,True,20,0,0,20,All Levels,0.616667,2016-05-16T18:28:30Z,Business Finance
788,1157298,Introduction to Forex Trading Business For Beg...,https://www.udemy.com/introduction-to-forex-tr...,True,20,0,0,27,Beginner Level,1.5,2017-04-23T16:19:01Z,Business Finance
894,1035638,Understanding Financial Statements,https://www.udemy.com/understanding-financial-...,True,25,0,0,10,All Levels,1.0,2016-12-15T14:56:17Z,Business Finance
1100,1084454,CFA Level 2- Quantitative Methods,https://www.udemy.com/cfa-level-2-quantitative...,True,40,0,0,35,All Levels,5.5,2017-07-02T14:29:35Z,Business Finance
1473,185526,MicroStation - Células,https://www.udemy.com/microstation-celulas/,True,20,0,0,9,Beginner Level,0.616667,2014-04-15T21:48:55Z,Graphic Design
2561,28295,Learn Web Designing & HTML5/CSS3 Essentials in...,https://www.udemy.com/build-beautiful-html5-we...,True,75,43285,525,24,All Levels,4.0,2013-01-03T00:55:31Z,Web Development


This code identifies duplicated rows in the DataFrame `data` and sorts them by the "course_id" column. 

- `data[data.duplicated(keep=False)]`: This line of code uses the `duplicated()` method with the argument `keep=False` to identify rows in the DataFrame `data` that are duplicates. Setting `keep=False` ensures that all instances of duplicated rows are marked as `True`. It returns a boolean Series where `True` indicates that a row is a duplicate and `False` indicates it is not.



- `sorted_duplicated_rows = duplicated_rows.sort_values(by='course_id')`: This line of code sorts the duplicated rows subset (`duplicated_rows`) by the "course_id" column using the `sort_values()` method. This ensures that the duplicated rows are ordered based on the values in the "course_id" column.



In [39]:


duplicated_rows = data[data.duplicated(keep=False)]
# Sort the duplicated rows by the "course_id" column
sorted_duplicated_rows = duplicated_rows.sort_values(by='course_id')
# sorted_duplicated_rows
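To make the difference between the default `keep='first'` and `keep=False` concrete, here is a minimal sketch on a made-up DataFrame (the values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Rows 0 and 2 are identical, as are rows 1 and 4
toy = pd.DataFrame({
    "course_id": [3, 1, 3, 2, 1],
    "price": [20, 50, 20, 75, 50],
})

# Default keep='first': only the second and later copies are flagged
later_copies = toy[toy.duplicated()]

# keep=False: every member of a duplicate group is flagged,
# and sorting puts the matching rows next to each other
all_copies = toy[toy.duplicated(keep=False)].sort_values(by="course_id")

print(len(later_copies), len(all_copies))
```

With `keep=False` each duplicated row appears alongside its twin, which makes it easier to confirm by eye that the rows really are identical before dropping anything.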



This code removes duplicated rows from the DataFrame `data`, keeping only the last occurrence of each duplicated row. 

- `data.drop_duplicates(keep='last')`: This line of code uses the `drop_duplicates()` method to remove duplicated rows from the DataFrame `data`. The argument `keep='last'` specifies that only the last occurrence of each duplicated row should be kept, and earlier occurrences should be removed. It returns a new DataFrame with duplicated rows removed.

- `data = ...`: This line of code assigns the result of the `drop_duplicates()` operation back to the variable `data`. This updates the DataFrame `data` to contain the new DataFrame with duplicated rows removed.

So, after executing this code, the DataFrame `data` will no longer contain any duplicated rows. Only the last occurrence of each duplicated row will be retained. This can be useful for cleaning up duplicated data in your dataset.

In [40]:

# If you want to keep the last occurrence, use the following:
data = data.drop_duplicates(keep='last')
# data
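A minimal sketch of what `keep='last'` does, using a made-up DataFrame. The `reset_index(drop=True)` step is an optional extra that is not part of the notebook; it only renumbers the rows after the drop.

```python
import pandas as pd

# Rows 0 and 2 are identical
toy = pd.DataFrame({
    "course_id": [1, 2, 1, 3],
    "price": [50, 75, 50, 20],
})

deduped = toy.drop_duplicates(keep="last")

# The earlier copy (index 0) is dropped; the later copy (index 2) stays
kept_index = deduped.index.tolist()
print(kept_index)

# Optional: renumber the rows 0..n-1 so the index has no gaps
deduped = deduped.reset_index(drop=True)
```

Because the duplicated rows are identical across all columns, keeping the first or the last copy gives the same data; only the surviving index labels differ.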


This code performs two operations on the "course_title" column of the DataFrame `data`. 

1. `print(data["course_title"].nunique())`:
   - This line of code calculates and prints the number of unique values in the "course_title" column of the DataFrame `data`. 
   - `data["course_title"].nunique()` returns the number of unique titles in the "course_title" column.
   - `print()` is used to display this number.

2. `data["course_title"].unique()`:
   - This line of code retrieves and displays the unique values (titles) in the "course_title" column of the DataFrame `data`.
   - `data["course_title"].unique()` returns an array containing all unique titles in the "course_title" column.

So, overall, this code snippet provides insights into the uniqueness and variety of course titles in the dataset. The first line prints the number of unique titles, while the second line displays the actual unique titles themselves. This is useful for understanding the diversity and distribution of course titles in the dataset.

In [10]:

print(data["course_title"].nunique())
data["course_title"].unique()


3663


array(['Ultimate Investment Banking Course',
       'Complete GST Course & Certification - Grow Your CA Practice',
       'Financial Modeling for Business Analysts and Consultants', ...,
       'Learn and Build using Polymer',
       'CSS Animations: Create Amazing Effects on Your Website',
       "Using MODX CMS to Build Websites: A Beginner's Guide"],
      dtype=object)

This code identifies duplicated course titles in the DataFrame `data` and then calculates the number of duplicated course titles.

1. `duplicated_course_title = data[data["course_title"].duplicated()]`:
   - This line of code creates a subset of the DataFrame `data` containing rows where the "course_title" column has duplicated values.
   - `data["course_title"].duplicated()` returns a boolean mask indicating whether each value in the "course_title" column is duplicated. Rows where this mask is `True` are selected using boolean indexing.

2. `duplicated_course_title.shape`:
   - This line of code accesses the `shape` attribute of the DataFrame `duplicated_course_title`, which returns a tuple representing the dimensions of the DataFrame.
   - The first element of the tuple is the number of rows whose title repeats an earlier title (9 here), and the second element is the number of columns.



In [11]:


duplicated_course_title = data[data["course_title"].duplicated()]
# number of duplicated course titles
duplicated_course_title.shape



(9, 12)
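The 9 flagged rows are consistent with the earlier counts (3,672 rows after de-duplication versus 3,663 unique titles). To inspect such rows side by side, `keep=False` can be applied to a single column as well. A minimal sketch with made-up titles:

```python
import pandas as pd

# Two different courses that happen to share a title
toy = pd.DataFrame({
    "course_id": [1, 2, 3, 4],
    "course_title": ["Intro to Python", "Guitar Basics",
                     "Intro to Python", "CSS Layouts"],
})

# Rows whose title repeats an earlier title (default keep='first')
repeat_titles = toy[toy["course_title"].duplicated()]

# All rows that share a title with some other row
title_groups = toy[toy["course_title"].duplicated(keep=False)]

# shape[0] is the row count
print(repeat_titles.shape[0], title_groups.shape[0])
```

A shared title does not necessarily mean a duplicated course: the rows can still differ in `course_id`, price, or subject, so they are worth inspecting before removing anything.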


1. `print(data["is_paid"].nunique())`:
   - This line calculates and prints the number of unique values in the "is_paid" column of the DataFrame `data`.
   - `data["is_paid"].nunique()` returns the number of unique categories present in the "is_paid" column.
   - `print()` is used to display this number.

2. `data["is_paid"].unique()`:
   - This line retrieves and displays the unique values in the "is_paid" column of the DataFrame `data`.
   - `data["is_paid"].unique()` returns an array containing all unique categories present in the "is_paid" column.

So, the combination of these two lines provides insights into the unique categories present in the "is_paid" column. The first line prints the number of unique categories, while the second line displays the actual unique categories themselves. 


In [12]:

# contains two categories - TRUE and FALSE
print(data["is_paid"].nunique())
data["is_paid"].unique()


2


array([ True, False])

In [13]:
# Four(4) different levels - 'All Levels', 'Intermediate Level', 'Beginner Level','Expert Level'

print(data["level"].nunique())
data["level"].unique()


4


array(['All Levels', 'Intermediate Level', 'Beginner Level',
       'Expert Level'], dtype=object)

In [14]:
# Four (4) different subjects - 'Business Finance', 'Graphic Design', 'Musical Instruments', 'Web Development'

print(data["subject"].nunique())
data["subject"].unique()



4


array(['Business Finance', 'Graphic Design', 'Musical Instruments',
       'Web Development'], dtype=object)
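`nunique()` and `unique()` show which categories exist but not how often each occurs. A possible complement is `value_counts()`, sketched here on a made-up Series (the values are assumptions for illustration):

```python
import pandas as pd

toy = pd.Series(
    ["All Levels", "Beginner Level", "All Levels", "Expert Level"],
    name="level",
)

# Number of rows per category, most frequent first
counts = toy.value_counts()

# Same information as fractions of the total
shares = toy.value_counts(normalize=True)

print(counts)
print(shares)
```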

The provided code generates descriptive statistics for the DataFrame `data`, excluding the specified column `'course_id'`. 

1. `data.describe()`:
   - This line generates descriptive statistics for all numerical columns in the DataFrame `data`. The `describe()` method computes various summary statistics, including count, mean, standard deviation, minimum, maximum, and quartile values, for each numerical column in the DataFrame. Because this call is not the last expression in the cell, its result is computed but not displayed.

2. `column_to_exclude = 'course_id'`:
   - This line defines a variable `column_to_exclude` and assigns the value `'course_id'` to it. This variable specifies the name of the column that should be excluded from the computation of descriptive statistics.

3. `descriptive_statistics = data.drop(columns=column_to_exclude).describe()`:
   - This line generates descriptive statistics for all numerical columns in the DataFrame `data`, excluding the column specified by `column_to_exclude` (i.e., `'course_id'`). 
   - `data.drop(columns=column_to_exclude)` creates a new DataFrame where the column specified by `column_to_exclude` ('course_id') is dropped.
   - `.describe()` is then applied to this modified DataFrame, computing descriptive statistics for all remaining numerical columns.

4. `descriptive_statistics`:
   - As the last expression in the cell, this line displays the DataFrame stored in `descriptive_statistics`.
   - That DataFrame contains the descriptive statistics (count, mean, standard deviation, minimum, maximum, and quartile values) for all numerical columns in `data`, excluding the specified column ('course_id').

By executing this code, you'll obtain descriptive statistics for all numerical columns in the DataFrame `data`, except for the column `'course_id'`. This allows for a focused analysis of numerical data while excluding a specific column from consideration.

In [15]:


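# Summary statistics for every numeric column; the result is not displayed
# because it is not the last expression in the cell
data.describe()

# Generate descriptive statistics excluding the specified column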
column_to_exclude = 'course_id'
descriptive_statistics = data.drop(columns=column_to_exclude).describe()
descriptive_statistics


Unnamed: 0,price,num_subscribers,num_reviews,num_lectures,content_duration
count,3672.0,3672.0,3672.0,3672.0,3672.0
mean,66.102941,3190.586874,156.37146,40.140251,4.097603
std,61.03592,9488.105448,936.178649,50.417102,6.05783
min,0.0,0.0,0.0,0.0,0.0
25%,20.0,111.75,4.0,15.0,1.0
50%,45.0,912.0,18.0,25.0,2.0
75%,95.0,2548.75,67.0,46.0,4.5
max,200.0,268923.0,27445.0,779.0,78.5
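Individual statistics can be pulled out of the `describe()` result by row and column label. The sketch below does this on a made-up DataFrame to compare the mean with the median (the `50%` row); a large gap between the two, as with `num_subscribers` above, hints at a skewed distribution.

```python
import pandas as pd

# Made-up values; `level` is text, so describe() skips it by default
toy = pd.DataFrame({
    "course_id": [1, 2, 3, 4],
    "price": [0, 20, 50, 200],
    "level": ["All Levels", "Beginner Level", "All Levels", "Expert Level"],
})

numeric_summary = toy.drop(columns="course_id").describe()

# Rows of the summary are labelled 'count', 'mean', 'std', 'min', '25%', ...
mean_price = numeric_summary.loc["mean", "price"]
median_price = numeric_summary.loc["50%", "price"]

print(mean_price, median_price)
```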


## **FEATURE ENGINEERING / DATA CLEANING**

The code below uses the **isna()** method to create a boolean mask where True represents NaN values, and then **sum()** is applied to count the number of True values (which are the NaN values) along each column.

If you want the total count of NaN values in the entire DataFrame, you can use **data.isna().sum().sum()**.

In [16]:
# There are no NaN values in the dataset
print(data.isna().sum())
print(data.isna().sum().sum())

course_id              0
course_title           0
url                    0
is_paid                0
price                  0
num_subscribers        0
num_reviews            0
num_lectures           0
level                  0
content_duration       0
published_timestamp    0
subject                0
dtype: int64
0
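Since the real dataset has no missing values, the effect of `isna().sum()` is easier to see on a made-up DataFrame that does contain some:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing price and two missing levels
toy = pd.DataFrame({
    "price": [20.0, np.nan, 50.0],
    "level": ["All Levels", None, None],
})

per_column = toy.isna().sum()      # missing values in each column
total = int(per_column.sum())      # missing values in the whole DataFrame

print(per_column)
print(total)
```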


The provided code snippet is converting the values in the "published_timestamp" column of the DataFrame `data` into datetime objects using the `pd.to_datetime()` function from the pandas library. 

1. `data['published_timestamp']`:
   - This selects the "published_timestamp" column from the DataFrame `data`, which presumably contains values representing timestamps or dates.

2. `pd.to_datetime(data['published_timestamp'])`:
   - This part of the code applies the `pd.to_datetime()` function to the values in the "published_timestamp" column. 
   - The `pd.to_datetime()` function converts its argument to datetime. When applied to a Series (in this case, the "published_timestamp" column), it converts the values to datetime objects, enabling easier manipulation and analysis of dates and times.

3. `data['published_timestamp'] = ...`:
   - Finally, the converted datetime objects are assigned back to the "published_timestamp" column in the DataFrame `data`. By doing so, the original values in the "published_timestamp" column are replaced with the corresponding datetime objects.



In [17]:

data['published_timestamp'] = pd.to_datetime(data['published_timestamp'])
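A minimal sketch of the conversion, using the two timestamps shown in the `data.head(2)` output. The trailing `Z` marks UTC, so the parsed values are timezone-aware. The `errors="coerce"` line is an optional extra, not part of the notebook, showing one way to turn unparseable strings into `NaT` instead of raising an error.

```python
import pandas as pd

raw = pd.Series(["2017-01-18T20:58:58Z", "2017-03-09T16:34:20Z"])

# ISO 8601 strings ending in 'Z' become timezone-aware (UTC) datetimes
parsed = pd.to_datetime(raw)
print(parsed.dtype)

# Optional: invalid entries become NaT rather than raising
messy = pd.Series(["2017-01-18T20:58:58Z", "not a date"])
coerced = pd.to_datetime(messy, errors="coerce")
print(coerced.isna().tolist())
```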


The provided code adds three new columns to the DataFrame `data`, extracting specific components (month, year, and day of the week) from the "published_timestamp" column. Here's what each part of the code does:

1. `data['published Month'] = data['published_timestamp'].dt.month`:
   - This line extracts the month component from the "published_timestamp" column using the `.dt.month` accessor. 
   - The `.dt` accessor allows access to the datetime properties of a Series. By chaining `.month`, it extracts the month component from each datetime value in the "published_timestamp" column.
   - The extracted month values are then assigned to a new column named "published Month" in the DataFrame `data`.

2. `data['published Year'] = data['published_timestamp'].dt.year`:
   - Similar to the previous line, this line extracts the year component from the "published_timestamp" column using the `.dt.year` accessor.
   - It retrieves the year component from each datetime value in the "published_timestamp" column.
   - The extracted year values are assigned to a new column named "published Year" in the DataFrame `data`.

3. `data['published Day of Week'] = data['published_timestamp'].dt.dayofweek`:
   - This line extracts the day of the week component from the "published_timestamp" column using the `.dt.dayofweek` accessor.
   - The `.dayofweek` attribute returns the day of the week as an integer, where Monday is 0 and Sunday is 6.
   - The extracted day of the week values are assigned to a new column named "published Day of Week" in the DataFrame `data`.

After executing this code, `data` will contain three new columns: "published Month", "published Year", and "published Day of Week", each corresponding to the extracted components from the "published_timestamp" column. These new columns will be useful for temporal analysis and visualization of the data.

In [18]:

data['published Month'] = data['published_timestamp'].dt.month 
data['published Year'] = data['published_timestamp'].dt.year
data['published Day of Week'] = data['published_timestamp'].dt.dayofweek
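The same accessors can be tried on the two timestamps from the `data.head(2)` output. The `day_name()` call is an addition that is not used in the notebook; it returns a readable label, which may be more convenient than the 0 to 6 integer when labelling charts.

```python
import pandas as pd

stamps = pd.to_datetime(
    pd.Series(["2017-01-18T20:58:58Z", "2017-03-09T16:34:20Z"])
)

parts = pd.DataFrame({
    "month": stamps.dt.month,
    "year": stamps.dt.year,
    "day_of_week": stamps.dt.dayofweek,   # Monday=0 ... Sunday=6
    "day_name": stamps.dt.day_name(),     # e.g. 'Wednesday'
})

print(parts)
```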


## **ANALYSIS**

In [19]:



# Group the data by the 'is_paid' column, count the occurrences of each category,
# and reset the index of the resulting Series while renaming the column to 'observation_count'
is_paid_counts = data.groupby('is_paid').size().reset_index(name='observation_count')

# Display the resulting DataFrame showing the counts of observations for each category of 'is_paid'
is_paid_counts



Unnamed: 0,is_paid,observation_count
0,False,310
1,True,3362


In [20]:




# Group the data by the 'is_paid' column, count the occurrences of each category,
# and reset the index of the resulting DataFrame while renaming the column to 'observation_count'
is_paid_counts = data.groupby('is_paid').size().reset_index(name='observation_count')

# Create a pie chart using Plotly Express with 'observation_count' as values, 'is_paid' as names,
# and customize the title and labels
fig = px.pie(is_paid_counts, values='observation_count', names='is_paid', 
             title='DISTRIBUTION OF PAYMENT STATUS', labels={'is_paid':'Paid Status'},
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels, and adjust slice separation
fig.update_traces(textinfo='percent+label', pull=[0.1, 0])

# Display the pie chart
fig.show()







![image.png](attachment:image.png)

Fig. 1 illustrates the distribution of the `is_paid` column. Of the 3,672 unique courses examined, 3,362 (91.6%) are paid and only 310 (8.4%) are free, so the large majority of courses in this dataset require payment.
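The percentages on the pie chart can be reproduced directly from the counts in the `is_paid_counts` output. A short sketch (the `share_pct` column name is an assumption for illustration):

```python
import pandas as pd

# Counts copied from the is_paid_counts output
is_paid_counts = pd.DataFrame({
    "is_paid": [False, True],
    "observation_count": [310, 3362],
})

total = is_paid_counts["observation_count"].sum()

# Share of each category as a percentage, rounded to one decimal place
is_paid_counts["share_pct"] = (
    is_paid_counts["observation_count"] / total * 100
).round(1)

print(is_paid_counts)
```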

In [21]:


# Group the data by 'is_paid' and calculate the sum of 'num_subscribers' for each payment status
is_paid_by_num_subscribers = data.groupby('is_paid')['num_subscribers'].sum().reset_index()

# Create a bar chart using Plotly Express with 'is_paid' on the x-axis, 'num_subscribers' on the y-axis,
# and customize the title
fig = px.bar(is_paid_by_num_subscribers, x='is_paid', 
             y='num_subscribers', 
             title='PAYMENT STATUS BY NUMBER OF SUBSCRIBERS')

# Display the bar chart
fig.show()





![image.png](attachment:image.png)

The bar chart compares total subscriber counts for paid and free courses. Paid courses account for over 8 million subscriptions, while free courses account for approximately 3.52 million. Note that `num_subscribers` counts enrollments per course, so a learner enrolled in several courses is counted more than once. The larger paid total partly reflects the fact that there are far more paid courses (3,362 versus 310); per course, free courses attract considerably more subscribers on average (roughly 11,000 versus roughly 2,400), which is why the next chart looks at averages. Together, these figures give a starting point for assessing pricing strategies and content offerings on the platform.
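Totals and averages can tell different stories when the groups differ in size. The sketch below computes the course count, total, and mean in one pass with named aggregation, on made-up data in which paid courses are more numerous but the single free course is more popular:

```python
import pandas as pd

# Made-up data: four paid courses and one free course
toy = pd.DataFrame({
    "is_paid": [True, True, True, True, False],
    "num_subscribers": [100, 200, 300, 400, 600],
})

summary = (
    toy.groupby("is_paid")["num_subscribers"]
    .agg(courses="count", total="sum", average="mean")
    .reset_index()
)

# Rows are sorted by the group key, so False (free) comes first
print(summary)
```

Here the paid group has the larger total while the free group has the larger average, the same pattern the two bar charts show for the real data.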

In [22]:


# Group the data by 'is_paid' and calculate the mean number of subscribers for each payment status
is_paid_by_avgnum_subscribers = data.groupby('is_paid')['num_subscribers'].mean().reset_index()

# Create a bar chart using Plotly Express with payment status on the x-axis, average number of subscribers on the y-axis,
# and a title describing the visualization
fig = px.bar(is_paid_by_avgnum_subscribers, x='is_paid', 
             y='num_subscribers', 
             title='Average Number of Subscribers by Payment Status')

# Display the bar chart
fig.show()




![image.png](attachment:image.png)

In [23]:


# Group the data by the year of publication ('published Year') and calculate the sum of prices (total revenue)
price_by_published_Year = data.groupby('published Year')['price'].sum()

# Format the total revenue as currency (e.g., $2,000.00)
price_by_published_Year_formatted = price_by_published_Year.map(lambda x: '${:,.2f}'.format(x))

# Display the total revenue generated from course sales for each year as a series with accountant format
price_by_published_Year_formatted





published Year
2011       $310.00
2012     $1,835.00
2013    $10,785.00
2014    $23,780.00
2015    $67,830.00
2016    $84,165.00
2017    $54,025.00
Name: price, dtype: object

In [24]:


# Group the data by the year of publication ('published Year') and calculate the sum of prices (total revenue)
price_by_published_Year = data.groupby('published Year')['price'].sum().reset_index()

# Create a line plot using Plotly Express with 'published Year' on the x-axis, 'price' on the y-axis,
# and a title describing the visualization
fig = px.line(price_by_published_Year, 
              x='published Year', 
              y='price', 
              title='REVENUE BY PUBLISHED YEAR')

# Display the line plot
fig.show()





![image.png](attachment:image.png)

The table and line graph show the summed list prices of courses by year of publication, which this analysis treats as revenue. The total starts at $310.00 in 2011 and rises to $1,835.00 in 2012, $10,785.00 in 2013, and $23,780.00 in 2014. A significant jump occurs in 2015, to $67,830.00, followed by a peak of $84,165.00 in 2016. In 2017, the total falls to $54,025.00.

The overall picture is a consistent increase from 2011 to 2016, followed by a decline in 2017, the final recorded year.
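Summing `price` adds up one list price per course. An alternative rough estimate, sketched below on made-up data, multiplies each course's price by its subscriber count before summing. This rests on a strong assumption (every subscriber paid the full list price, with no discounts or coupons), and the `estimated_revenue` column is hypothetical rather than part of the dataset.

```python
import pandas as pd

# Made-up courses, for illustration only
toy = pd.DataFrame({
    "published Year": [2015, 2015, 2016],
    "price": [20, 50, 100],
    "num_subscribers": [10, 4, 3],
})

# Hypothetical column: assumes every subscriber paid the full list price
toy["estimated_revenue"] = toy["price"] * toy["num_subscribers"]

by_year = (
    toy.groupby("published Year")[["price", "estimated_revenue"]]
    .sum()
    .reset_index()
)

# Currency strings are for display only; keep the numeric column for plotting
by_year["estimated_revenue_fmt"] = by_year["estimated_revenue"].map("${:,.2f}".format)

print(by_year)
```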

In [25]:


# Compute the average price of courses for each difficulty level
avg_price_by_level = data.groupby('level')["price"].mean().reset_index()

# Create a pie chart using Plotly Express with average prices as values, difficulty levels as names,
# and a title describing the visualization
fig = px.pie(avg_price_by_level, values='price', names='level', 
             title='AVERAGE PRICE ANALYSIS BY LEVEL', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels, and adjust slice separation
fig.update_traces(textinfo='percent+label', pull=[0.1, 0]) 

# Display the pie chart
fig.show()




![image.png](attachment:image.png)

The figure above illustrates the mean course price for each of the four subjects in the dataset: Business Finance, Musical Instruments, Graphic Design, and Web Development. Web Development has the highest average price at $77.04, followed by Business Finance at $68.69, Graphic Design at $57.89, and Musical Instruments at $49.56. The higher average price helps explain why Web Development emerges as the top revenue generator in this analysis.
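The averages discussed above are per `subject`, whereas the code cell before the figure groups by `level`. A minimal sketch of the per-subject calculation, assuming the same `subject` and `price` columns and using a made-up DataFrame in place of the real data:

```python
import pandas as pd

# Made-up courses, for illustration only
toy = pd.DataFrame({
    "subject": ["Web Development", "Web Development",
                "Graphic Design", "Graphic Design"],
    "level": ["All Levels", "Beginner Level", "All Levels", "All Levels"],
    "price": [100, 50, 20, 40],
})

# Mean price per subject, highest first
avg_price_by_subject = (
    toy.groupby("subject")["price"]
    .mean()
    .round(2)
    .reset_index()
    .sort_values("price", ascending=False)
)

print(avg_price_by_subject)
```

Since averages are not parts of a whole, a bar chart may be a more natural way to compare them than a pie chart, though that is a presentation choice.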

In [26]:


# Sum the listed course prices for each subject category
# (the analysis treats this sum as the revenue figure; subscriber counts are not included)
price_by_subject = data.groupby('subject')["price"].sum().reset_index()

# Create a pie chart using Plotly Express with the summed prices as values, subject categories as names,
# and a title describing the visualization
fig = px.pie(price_by_subject, values='price', names='subject', 
             title='PRICE ANALYSIS BY SUBJECT', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels, and adjust slice separation
fig.update_traces(textinfo='percent+label', pull=[0.1, 0])  

# Display the pie chart
fig.show()




![image.png](attachment:image.png)

The figure above shows how the summed course prices are distributed across the four subjects: Business Finance, Musical Instruments, Graphic Design, and Web Development. The values are sums of listed prices per subject, which this analysis uses as its revenue figure; enrollment counts are not factored in. Web Development is the largest at $92,365, or 38.1% of the total. Business Finance follows at $81,815 (33.7%), Graphic Design contributes $34,850 (14.4%), and Musical Instruments $33,750 (13.9%).
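The pie chart for this cell is built from summed list prices. As a rough illustration of how that differs from a revenue estimate, the sketch below uses made-up rows (not the challenge data) and assumes, purely for illustration, that every subscriber paid the full listed price. The `estimated_revenue` and `listed_price_sum` names are hypothetical and introduced only for this example.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the challenge dataset.
courses = pd.DataFrame({
    "subject": ["Web Development", "Web Development", "Business Finance", "Business Finance"],
    "price": [20, 30, 100, 150],
    "num_subscribers": [1000, 500, 10, 20],
})

# Assumption: every subscriber paid the full listed price.
# Real revenue would also depend on discounts and free enrollments.
courses["estimated_revenue"] = courses["price"] * courses["num_subscribers"]

# Compare the two measures side by side for each subject
summary = courses.groupby("subject").agg(
    listed_price_sum=("price", "sum"),
    estimated_revenue=("estimated_revenue", "sum"),
)
print(summary)
```

In this toy data the subject with the larger price sum has the smaller estimated revenue, which is why it helps to keep the two measures apart when reading the chart.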

In [27]:


# Compute the average price of courses for each subject category
avg_price_by_subject = data.groupby('subject')["price"].mean().reset_index()

# Create a pie chart using Plotly Express with average prices as values, subject categories as names,
# and a title describing the visualization
fig = px.pie(avg_price_by_subject, values='price', names='subject', 
             title='AVERAGE PRICE ANALYSIS BY SUBJECT', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels, and adjust slice separation
fig.update_traces(textinfo='percent+label', pull=[0.1, 0])  

# Display the pie chart
fig.show()





![image.png](attachment:image.png)

This graph shows the mean course price for each of the four Udemy subjects: Business Finance, Musical Instruments, Graphic Design, and Web Development. Web Development has the highest average price at $77.04, followed by Business Finance at $68.69, Graphic Design at $57.89, and Musical Instruments at $49.56. The higher average price helps explain why Web Development has the largest total in the revenue breakdown by subject.

In [28]:
import plotly.express as px

# Create a histogram of the 'price' column with 5 bins using Plotly Express
fig = px.histogram(data, x='price', nbins=5, title='<b><span style="font-size: 20px;">Distribution of Price</span></b>')

# Label the x- and y-axes
fig.update_layout(xaxis_title='Price', yaxis_title='Frequency')

# Show plot
fig.show()



![image.png](attachment:image.png)


The histogram groups courses into price intervals that are $50 wide. The 0-49 interval holds the most courses, 1,871. The 50-99 interval has 945, the 100-149 interval has 283, the 150-199 interval has 278, and the 200-249 interval has 295.

Most courses are priced under $50, which suggests an emphasis on affordability and a broad target audience. Counts drop sharply above $50 and then level off at roughly 280 to 295 courses per interval above $100, rather than declining steadily. Because the highest course price in the dataset is $200.00, the 295 courses in the top interval are all priced at exactly $200.
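The interval counts quoted above can also be produced as a table instead of being read off the histogram. The following is a minimal sketch using `pd.cut` on a handful of hypothetical prices; the bin edges and labels are assumptions chosen to mirror the intervals named in the text, and Plotly may choose its own bin edges when only `nbins` is given.

```python
import pandas as pd

# Hypothetical prices for illustration; not taken from the challenge dataset.
prices = pd.Series([0, 20, 45, 50, 95, 120, 200, 200], name="price")

# Left-closed, 50-wide intervals: [0, 50), [50, 100), ..., [200, 250)
edges = [0, 50, 100, 150, 200, 250]
labels = ["0-49", "50-99", "100-149", "150-199", "200-249"]
price_interval = pd.cut(prices, bins=edges, labels=labels, right=False)

# Count courses per interval, keeping the intervals in price order
counts = price_interval.value_counts(sort=False)
print(counts)
```

With `right=False`, a price of exactly 200 is counted in the 200-249 interval, which matches how the top interval is described in the text.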

In [29]:
# Calculate the total number of subscribers
total_subscribers = data["num_subscribers"].sum()

# Format the total with thousands separators (this is a count of subscribers, not a dollar amount)
formatted_total_subscribers = '{:,}'.format(total_subscribers)

# Display the formatted total number of subscribers
formatted_total_subscribers


'11,715,835'

In [30]:


# Calculate the total number of subscribers for each subject category
num_subscribers_by_subject = data.groupby('subject')["num_subscribers"].sum().reset_index()

# Create a pie chart using Plotly Express with total subscribers as values, subject categories as names,
# and a title describing the visualization
fig = px.pie(num_subscribers_by_subject, values='num_subscribers', names='subject', 
             title='SUBSCRIBERS ANALYSIS BY SUBJECT', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels
fig.update_traces(textinfo='percent+label')

# Display the pie chart
fig.show()




![image.png](attachment:image.png)

The pie chart above shows how subscribers are distributed across the four subjects on Udemy: Business Finance, Musical Instruments, Graphic Design, and Web Development. Web Development dominates with 7,937,281 subscribers, a 67.7% share of the total. This large subscriber base underscores how central Web Development is to Udemy's revenue and how popular it is among learners.

Business Finance is a distant second with 1,868,711 subscribers (16% of the total). Graphic Design follows with 1,063,148 subscribers (9.07%), and Musical Instruments has 846,689 subscribers (7.23%).

Demand for Web Development clearly outstrips demand for the other subjects. These differences in interest give educators, course developers, and other stakeholders a basis for tailoring their offerings to what learners want.

In [31]:


# Calculate the total number of subscribers for each difficulty level category
num_subscribers_by_level = data.groupby('level')["num_subscribers"].sum().reset_index()

# Create a pie chart using Plotly Express with total subscribers as values, difficulty levels as names,
# and a title describing the visualization
fig = px.pie(num_subscribers_by_level, values='num_subscribers', names='level', 
             title='SUBSCRIBERS ANALYSIS BY LEVEL', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels
fig.update_traces(textinfo='percent+label') 

# Display the pie chart
fig.show()





![image.png](attachment:image.png)


This figure shows how subscribers are distributed across the four course levels: Beginner, Intermediate, All Levels, and Expert.

The All Levels category has the most subscribers, 6,871,791, or 58.7% of the total. The Expert level has the fewest, 50,196, which is about 0.43% of all subscribers. Nonetheless, it still contributes significantly to revenue, which highlights the disparity between subscriber count and revenue impact.

The Beginner level is second with 4,051,843 subscribers (34.6% of the total), and the Intermediate level has 742,005 (6.33%). The mismatch between subscriber counts and revenue shows that the relationship between course level and financial impact is not straightforward.

Subscriber counts alone therefore do not tell the whole story; course level and revenue need to be considered together.
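The percentages in this section are each level's share of all subscribers. A minimal sketch for recomputing them from the subscriber totals quoted in the text is shown below; the dictionary keys are shortened level names used only for this example.

```python
# Subscriber totals per level, as quoted in the text of this section
subscribers_by_level = {
    "All Levels": 6_871_791,
    "Beginner": 4_051_843,
    "Intermediate": 742_005,
    "Expert": 50_196,
}

# Total across all levels
total = sum(subscribers_by_level.values())

# Each level's share of the total, in percent, rounded to two decimals
shares = {
    level: round(100 * count / total, 2)
    for level, count in subscribers_by_level.items()
}

for level, share in shares.items():
    print(f"{level}: {share}%")
```

The four totals add up to 11,715,835, the overall subscriber count reported earlier in the notebook, so the shares are taken over the full subscriber base.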

In [32]:
# Find the highest value in the "price" column
highest_price = data['price'].max()

# Format the price as a dollar amount
highest_price = '${:,.2f}'.format(highest_price)

# Print the highest course price
print("Highest Course Price:", highest_price)


Highest Course Price: $200.00


In [33]:

# Extract the top 5 most subscribed courses based on the 'num_subscribers' column
# (.copy() so the column assignment below modifies an independent DataFrame)
top_subscribed_courses = data.nlargest(5, 'num_subscribers').copy()

# Convert the 'course_id' column to string so Plotly treats it as a category rather than a number
top_subscribed_courses['course_id'] = top_subscribed_courses['course_id'].astype(str)

# Sort the extracted courses by the number of subscribers in descending order
top_subscribed_courses = top_subscribed_courses.sort_values(by='num_subscribers', ascending=False)

# Create a horizontal bar chart using Plotly Express with course ID on the y-axis, number of subscribers on the x-axis,
# and a title describing the visualization
fig = px.bar(top_subscribed_courses, y='course_id', x='num_subscribers', 
             title='Top 5 Most Subscribed Courses', 
             labels={'num_subscribers': 'Number of Subscribers'},
             orientation='h')  

# Display the horizontal bar chart
fig.show()



![image.png](attachment:image.png)


Here we look at the five courses with the most subscribers, identified by course_id. Course 41295 leads by a wide margin with 268,923 subscribers. It is followed by course 59014 with 161,029, course 625204 with 121,584, course 173548 with 120,291, and course 764164 with 114,512.

These five courses show where learner interest is most concentrated on the platform and give a useful reference point for understanding what subscribers respond to.

In [34]:
data["content_duration"].describe()

count    3672.000000
mean        4.097603
std         6.057830
min         0.000000
25%         1.000000
50%         2.000000
75%         4.500000
max        78.500000
Name: content_duration, dtype: float64

In [35]:


# Create a new column 'duration_categories' in the DataFrame 'data'
# Categorize the content duration values into four groups based on specified bins and labels
data['duration_categories'] = pd.cut(data['content_duration'], bins=[0, 20, 40, 60, 80],
                                     labels=['0-20 hours', '20-40 hours', '40-60 hours', '60-80 hours'],
                                     include_lowest=True, right=False)



In [36]:


# Group the data by 'duration_categories' column and calculate the total number of subscribers for each category
num_subscribers_by_duration_categories = data.groupby('duration_categories')["num_subscribers"].sum().reset_index()

# Format the 'num_subscribers' column to include commas for thousands separator
num_subscribers_by_duration_categories['num_subscribers'] = num_subscribers_by_duration_categories['num_subscribers'].apply(lambda x: '{:,}'.format(x))

# Display the DataFrame containing the formatted total number of subscribers for each duration category
num_subscribers_by_duration_categories




| | duration_categories | num_subscribers |
|---|---|---|
| 0 | 0-20 hours | 10,665,785 |
| 1 | 20-40 hours | 783,444 |
| 2 | 40-60 hours | 222,605 |
| 3 | 60-80 hours | 44,001 |


In [37]:


# Group the data by 'duration_categories' column and count the number of observations in each category
duration_categories = data.groupby('duration_categories').size().reset_index(name='observation_count')

# Create a pie chart using Plotly Express with the number of observations as values, duration categories as names,
# and a title describing the visualization
fig = px.pie(duration_categories, values='observation_count', names='duration_categories', 
             title='DISTRIBUTION OF DURATION', labels={'duration_categories':'duration_categories'},
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels, and adjust slice separation
fig.update_traces(textinfo='percent+label', pull=[0.1, 0])  

# Display the pie chart
fig.show()




![image.png](attachment:image.png)


This chart shows how courses are distributed across the four duration brackets. The 0-20 hours bracket contains 3,584 courses, or 97.6% of all courses. The 20-40 hours bracket has 67 courses (1.82%), the 40-60 hours bracket has 12 (0.33%), and the 60-80 hours bracket has 9 (0.245%). Because each bracket includes its lower bound and excludes its upper bound, a course of exactly 20 hours falls in the 20-40 hours bracket. The catalog is therefore overwhelmingly made up of short courses. Knowing how course length is distributed can help course creators and educational institutions decide how to structure their offerings and allocate resources.
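The bracket a course lands in depends on how `pd.cut` treats the bin edges. As a small sketch, using hypothetical durations picked to sit on or near the edges, the code below compares the `right=False` setting used in this notebook with the default right-closed behaviour.

```python
import pandas as pd

# Hypothetical durations (hours) chosen to sit on or near the bin edges
durations = pd.Series([0.0, 19.5, 20.0, 40.0, 78.5])
bins = [0, 20, 40, 60, 80]
labels = ['0-20 hours', '20-40 hours', '40-60 hours', '60-80 hours']

# right=False: intervals are [0, 20), [20, 40), ... so 20.0 lands in '20-40 hours'
left_closed = pd.cut(durations, bins=bins, labels=labels, right=False)

# Default right=True with include_lowest=True: intervals are [0, 20], (20, 40], ...
# so 20.0 lands in '0-20 hours'
right_closed = pd.cut(durations, bins=bins, labels=labels, include_lowest=True)

comparison = pd.DataFrame({
    'hours': durations,
    'right=False': left_closed,
    'right=True': right_closed,
})
print(comparison)
```

Only the values that sit exactly on an edge (20.0 and 40.0 here) change bracket between the two settings, so the choice matters mainly when many courses have round-number durations.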

In [38]:


# Group the data by 'duration_categories' column and calculate the total number of subscribers for each category
num_subscribers_by_duration_categories = data.groupby('duration_categories')["num_subscribers"].sum().reset_index()

# Create a pie chart using Plotly Express to visualize the distribution of subscribers based on duration categories
fig = px.pie(num_subscribers_by_duration_categories, values='num_subscribers', names='duration_categories', 
             title='SUBSCRIBERS ANALYSIS BY DURATION', 
             template='plotly', hole=0.3)

# Hide the legend in the pie chart
fig.update_layout(showlegend=False)

# Customize text information to display percentage and labels
fig.update_traces(textinfo='percent+label')

# Center the title near the top of the figure, make it bold, and increase its font size
fig.update_layout(title=dict(text='<b><span style="font-size: 20px;">SUBSCRIBERS ANALYSIS BY DURATION</span></b>',
                              x=0.5, y=0.95, xanchor='center', yanchor='top')) 

# Display the pie chart
fig.show()


![image.png](attachment:image.png)

Here we look at how subscribers are distributed across the content_duration brackets. Courses in the 0-20 hours bracket account for 91% of all subscribers. This is consistent with a preference for concise, digestible courses, although this bracket also contains 97.6% of the courses on offer.

Courses in the 20-40 hours bracket account for 6.69% of subscribers, so there is still clear interest in somewhat longer courses.

The 40-60 hours bracket accounts for 1.9% of subscribers. Learners looking for more comprehensive content may gravitate toward courses of this length.

Courses in the 60-80 hours bracket account for the remaining 0.38% of subscribers. These long courses serve a niche audience but still provide in-depth learning experiences.

Subscribers are spread unevenly across course lengths, and course developers and platform stakeholders can use this breakdown to inform course design and content development.
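Since the brackets differ so much in how many courses they contain, it can help to put course counts and subscriber totals side by side. The sketch below does this with the figures reported in this analysis; the `courses`, `course_share_pct`, `subscriber_share_pct`, and `subscribers_per_course` column names are hypothetical and introduced only for this example.

```python
import pandas as pd

# Course counts and subscriber totals per duration bracket, as reported in this analysis
by_duration = pd.DataFrame({
    "duration_categories": ["0-20 hours", "20-40 hours", "40-60 hours", "60-80 hours"],
    "courses": [3584, 67, 12, 9],
    "num_subscribers": [10_665_785, 783_444, 222_605, 44_001],
})

# Share of all courses and of all subscribers in each bracket, in percent
by_duration["course_share_pct"] = (
    100 * by_duration["courses"] / by_duration["courses"].sum()
).round(2)
by_duration["subscriber_share_pct"] = (
    100 * by_duration["num_subscribers"] / by_duration["num_subscribers"].sum()
).round(2)

# Average subscribers per course, a rough way to compare brackets of very different sizes
by_duration["subscribers_per_course"] = (
    by_duration["num_subscribers"] / by_duration["courses"]
).round(0)

print(by_duration)
```

Read this way, the subscriber shares largely track how many courses each bracket contains. The per-course averages for the longer brackets rest on very few courses, so they should be treated as indicative rather than conclusive.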