## Udemy Courses Analysis

### 1. Business Understanding

#### 1.1 *Problem Statement:*

The online learning industry is growing rapidly, and Udemy, as a major player, offers a variety of courses across multiple subjects. The challenge is to understand what factors influence the popularity and success of Udemy courses to guide decisions for course creators and platform management.

#### 1.2 *Project Goal:*

The goal of this analysis is to provide insights into what makes a course successful on Udemy. This includes understanding which courses attract the most subscribers, how course pricing affects popularity, and which topics are most in-demand. The analysis will also aim to identify trends in course content and performance based on various factors.

#### 1.3 *Stakeholders:*
- Udemy Platform Managers: To optimize the course offering strategy and promote high-demand content.

- Course Creators: To design and price their courses more effectively based on what works well on the platform.

- Marketing Teams: To focus on promoting courses with higher success potential.

- Data Analysts: To derive actionable insights from the data.

#### 1.4 *Key Metrics:*

- Number of Subscribers: A higher number indicates more popular courses.

- Number of Reviews: A proxy for learner engagement. The dataset has no rating column, so review counts do not measure satisfaction directly.

- Price: Impact of pricing on course enrollment.

- Course Level: To assess whether beginner, intermediate, or advanced courses attract more students.

- Subject: Popularity trends across different subject categories.

#### 1.5 Features of the Dataset

- course_id: Unique identifier for each course.

- course_title: The title of the course.

- url: Link to the course on Udemy.

- is_paid: Whether the course is free or paid.

- price: Price of the course (applicable only if it’s a paid course).

- num_subscribers: Number of students enrolled in the course.

- num_reviews: Number of reviews the course received.

- num_lectures: Total number of lectures included in the course.

- level: Course difficulty level (e.g., Beginner, Intermediate, Advanced).

- content_duration: Total length of course content in hours.

- published_timestamp: Date when the course was first published.

- subject: The topic category of the course.
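The column list above can double as a lightweight schema check once the data is loaded. The following is a minimal sketch, assuming the column names in the file match the list exactly. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the toy frame is not real Udemy data.

```python
import pandas as pd

# Column names as listed in section 1.5 (assumed to match the file exactly).
EXPECTED_COLUMNS = [
    "course_id", "course_title", "url", "is_paid", "price",
    "num_subscribers", "num_reviews", "num_lectures", "level",
    "content_duration", "published_timestamp", "subject",
]


def missing_columns(df):
    """Return the expected columns that are absent from df, in listed order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Toy frame for illustration only; it deliberately lacks most columns.
toy = pd.DataFrame({"course_id": [1], "price": [20]})
print(missing_columns(toy))
```

Running the check right after loading makes a renamed or dropped column visible before any later analysis step depends on it.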

#### 1.6 Null and Alternative Hypotheses

- Null Hypothesis (H0): There is no significant relationship between the price of a course and the number of subscribers it attracts.

- Alternative Hypothesis (H1): There is a significant relationship between the price of a course and the number of subscribers it attracts.
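One possible way to test these hypotheses is a rank correlation between `price` and `num_subscribers`. The sketch below uses Spearman's correlation from SciPy. That choice is an assumption (it does not require normally distributed counts), not a method stated in this notebook. The significance level of 0.05, the helper name `price_subscriber_test`, and the synthetic data are likewise illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data, generated only to make the sketch runnable.
rng = np.random.default_rng(seed=0)
toy = pd.DataFrame({
    "price": rng.integers(0, 201, size=200),
    "num_subscribers": rng.integers(0, 5000, size=200),
})

ALPHA = 0.05  # assumed significance level; the notebook does not state one


def price_subscriber_test(df, alpha=ALPHA):
    """Spearman rank correlation between price and num_subscribers."""
    rho, p_value = stats.spearmanr(df["price"], df["num_subscribers"])
    return {
        "rho": float(rho),
        "p_value": float(p_value),
        "reject_h0": bool(p_value < alpha),  # True means H0 is rejected
    }


print(price_subscriber_test(toy))
```

On the real data, the same function could be called with `udemy_data` once it has loaded. A rejected H0 would indicate a monotonic association, not a causal effect of price on enrollment.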

#### 1.7 Ten Business Analytics Questions

1. What is the average number of subscribers for free versus paid courses?

2. Does a higher price correlate with more course reviews or better engagement?

3. Which subject areas attract the most subscribers?

4. How does the level of a course (beginner, intermediate, advanced) impact the number of subscribers?

5. Is there a trend in the number of courses published over time?

6. What is the average content duration for courses in each subject?

7. Are courses with more lectures more successful (i.e., more subscribers)?

8. What is the distribution of course prices across different subjects?

9. How does the number of reviews impact course enrollment (subscribers)?

10. Do courses published earlier perform better than recently published courses?
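Several of these questions reduce to a group-and-aggregate step in pandas. The following is a rough sketch for questions 1 and 3, using the column names from section 1.5. The rows and the subject labels `Subject A` and `Subject B` are made-up placeholders, not values from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; values are invented, not taken from the dataset.
toy = pd.DataFrame({
    "is_paid": [True, True, False, False],
    "subject": ["Subject A", "Subject B", "Subject A", "Subject B"],
    "num_subscribers": [100, 300, 1000, 600],
})

# Question 1: average subscribers for free versus paid courses.
avg_by_paid = toy.groupby("is_paid")["num_subscribers"].mean()

# Question 3: total subscribers per subject, largest first.
total_by_subject = (
    toy.groupby("subject")["num_subscribers"].sum().sort_values(ascending=False)
)

print(avg_by_paid)
print(total_by_subject)
```

Because subscriber counts are often skewed, it may be worth reporting the median alongside the mean when applying this to the real data.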


### 2. Data Understanding

#### 2.1 Importing Relevant Libraries

In [2]:
# Data Manipulation
import pandas as pd
import numpy as np
import warnings

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Statistical Analysis and Hypothesis Testing
from scipy import stats
import pingouin as pg
import statsmodels.api as sm

# Excel engine that pd.read_excel uses for .xlsx files
import openpyxl

#### 2.2 Loading Dataset

In [3]:
# File path for the dataset for analysis
file_path = '../data/udemy_courses_dataset.xlsx'

# Load the file into the notebook
udemy_data = pd.read_excel(file_path)
udemy_data.head()

FileNotFoundError: [Errno 2] No such file or directory: '../data/udemy_courses_dataset.xlsx'
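The `FileNotFoundError` above means the relative path did not resolve to a file from the notebook's working directory. The correct location of the dataset is not known from this notebook, so the sketch below only makes the failure easier to diagnose: it resolves the path and reports both the resolved location and the current working directory before attempting to read. `load_courses` is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def load_courses(path):
    """Read the Excel file at path, failing early with the resolved location."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {resolved} (working directory: {Path.cwd()})"
        )
    # pandas reads .xlsx files through the openpyxl engine.
    return pd.read_excel(resolved)


# Intended usage, with the path from the cell above:
# udemy_data = load_courses('../data/udemy_courses_dataset.xlsx')
```

If the message shows an unexpected working directory, either launch the notebook from the folder the relative path assumes or pass an absolute path.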