<div style="text-align: center; background-color: #0A6EBD; font-family: 'Trebuchet MS', Arial, sans-serif; color: white; padding: 20px; font-size: 40px; font-weight: bold; border-radius: 0 0 0 0; box-shadow: 0px 6px 8px rgba(0, 0, 0, 0.2);">
 Final Project Programming for Data Science
</div>

<div style="text-align: center; background-color: #5A96E3; font-family: 'Trebuchet MS', Arial, sans-serif; color: white; padding: 20px; font-size: 40px; font-weight: bold; border-radius: 0 0 0 0; box-shadow: 0px 6px 8px rgba(0, 0, 0, 0.2);">
  Asking + Preprocessing + Analyzing data to answer each question</div>

## Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Read data from csv file

In [None]:
udemy_df = pd.read_csv("./Data/udemy.csv", parse_dates = ['published_time', 'last_update_date'])
udemy_df.head(10)

## Question 3: How has Udemy developed?

### Benefits of finding the answer

- Insights into Online Education Trends: Understanding how Udemy developed provides insights into the trends and dynamics of the online education industry. This knowledge can be valuable for individuals interested in the field of e-learning.
- Entrepreneurial Inspiration: Udemy's success story can serve as inspiration for entrepreneurs looking to create platforms that make education more accessible. It showcases the potential for innovation in the education sector.
- Learning and Teaching Opportunities: Individuals interested in learning new skills or sharing their expertise can benefit from Udemy's platform. By understanding its development, users can make informed decisions about participating in the Udemy community.
- Impact on Education Accessibility: Udemy has played a role in making education accessible to a global audience. Understanding its development can contribute to discussions about the democratization of education and the role of technology in expanding learning opportunities.

### Preprocessing

- To make Udemy's growth easier to evaluate, we analyze it by year, so we create a `year` column from the publication date.

In [None]:
udemy_df['year'] = udemy_df['published_time'].dt.year
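`.dt.year` only works on a datetime column. If `parse_dates` leaves `published_time` as text (for example because some rows are malformed), the column can be coerced first. The sketch below is a minimal, hedged example of that step: unparseable values become `NaT` and get a missing year instead of raising an error. The helper name `add_year_column` and the sample timestamps are hypothetical, not taken from `udemy.csv`.

```python
import pandas as pd

def add_year_column(df: pd.DataFrame, col: str = "published_time") -> pd.DataFrame:
    """Return a copy of df with a nullable-integer 'year' column taken from col."""
    out = df.copy()
    # Coerce to datetime: unparseable values become NaT instead of raising
    published = pd.to_datetime(out[col], errors="coerce", utc=True)
    # Int64 (capital I) keeps missing years as <NA> instead of turning the column into floats
    out["year"] = published.dt.year.astype("Int64")
    return out

# Hypothetical sample values, not taken from udemy.csv
sample = pd.DataFrame({"published_time": ["2019-05-01T10:00:00Z", "2021-01-15T08:30:00Z", "not a date"]})
print(add_year_column(sample))
```

Because the helper returns a copy, the original frame is left unchanged and the result can be assigned back to `udemy_df` explicitly.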

### Analyze data to answer the question

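First, let's look at how the number of subscribers changes over time. Since `num_subscribers` is a per-course count, summing it by `year` gives the total subscribers of the courses published in each year.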

In [None]:
num_sub_per_year = udemy_df.groupby('year')['num_subscribers'].sum()
display(num_sub_per_year)

In [None]:
plt.figure(figsize=(10, 6))
num_sub_per_year.plot(kind='bar', color='skyblue')
plt.title('Number of subscribers over year')
plt.xlabel('Year')
plt.ylabel('Number of subscribers')
plt.show()

- The total number of subscribers tends to increase with each publication year.
- Courses published in 2020 show a sudden jump in subscribers, possibly due to the COVID-19 pandemic.
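The yearly sum mixes two effects: how many courses were published in a year and how many subscribers each of those courses gathered. A minimal sketch, assuming only the `year` and `num_subscribers` columns used above, splits the total into these two parts with a named aggregation. The helper name `decompose_subscribers` and the toy numbers are made up for illustration.

```python
import pandas as pd

def decompose_subscribers(df: pd.DataFrame) -> pd.DataFrame:
    """Split the yearly subscriber total into course count and mean subscribers per course."""
    return df.groupby("year")["num_subscribers"].agg(
        total="sum",        # the quantity plotted in the bar chart
        courses="size",     # number of courses published that year
        per_course="mean",  # average subscribers of one course from that year
    )

# Hypothetical toy data, not taken from udemy.csv
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "num_subscribers": [100, 300, 50, 150, 400],
})
print(decompose_subscribers(toy))
```

In this toy example 2020 has a larger total only because more courses were published; the per-course average is the same in both years. Running the same check on the real data would show whether the 2020 jump comes from more courses or from more subscribers per course.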

Next, we look at the number of courses published over time.

In [None]:
num_course_per_year = udemy_df.groupby('year')['id'].size()
plt.figure(figsize=(10, 6))
num_course_per_year.plot(kind='bar', color='skyblue')
plt.title('Number of courses over year')
plt.xlabel('Year')
plt.ylabel('Number of courses')
plt.show()

Then, the number of distinct instructors over time.

In [None]:
# Count distinct instructor names per year (missing names are not counted)
num_instruc_per_year = udemy_df.groupby('year')['instructor_name'].nunique()
plt.figure(figsize=(10, 6))
num_instruc_per_year.plot(kind='bar', color='skyblue')
plt.title('The number of instructors over year')
plt.xlabel('Year')
plt.ylabel('Number of instructors')
plt.show()

- The number of subscribers jumped in 2020. In 2021 the numbers of instructors and courses kept increasing, while the number of subscribers dropped quite sharply. Part of this drop may simply be because courses published in 2021 have had less time to accumulate subscribers. Even so, the gap between supply (courses, instructors) and demand (subscribers) can help experienced instructors judge when to enter the teaching market.
- In general, the numbers of courses and instructors still tend to increase.
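To put numbers on statements such as "dropped quite sharply", the three yearly series can be combined into one table and compared with the previous year. A minimal sketch, assuming the `year`, `id`, `instructor_name`, and `num_subscribers` columns used above; the helper name `yearly_growth` and the toy rows are hypothetical. The percentage change is computed as `x / x.shift(1) - 1`, the same quantity `pct_change()` returns when there are no missing values.

```python
import pandas as pd

def yearly_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Yearly courses, distinct instructors and subscribers, plus % change vs. the previous year."""
    yearly = df.groupby("year").agg(
        courses=("id", "size"),
        instructors=("instructor_name", "nunique"),
        subscribers=("num_subscribers", "sum"),
    )
    # Percentage change relative to the previous year; the first year has no
    # predecessor, so its change is NaN
    change = (yearly / yearly.shift(1) - 1) * 100
    return yearly.join(change.add_suffix("_pct_change"))

# Hypothetical toy data, not taken from udemy.csv
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "year": [2019, 2020, 2020, 2021, 2021, 2021],
    "instructor_name": ["Ann", "Ann", "Bob", "Ann", "Bob", "Cid"],
    "num_subscribers": [100, 400, 400, 200, 100, 100],
})
print(yearly_growth(toy))
```

In the toy data, courses and instructors grow by 50% in 2021 while subscribers fall by 50%, which mimics the pattern described above.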

Now, we look at another aspect of Udemy's growth: the average course duration over time.

In [None]:
average_duration_per_year = udemy_df.groupby('year')['content_length_min'].mean()
plt.figure(figsize=(10, 6))
average_duration_per_year.plot()
plt.title('The average duration of each course over year')
plt.xlabel('Year')
plt.ylabel('Average duration (minutes)')
plt.show()

- The average course duration tends to decrease. Instructors and teaching centers can use this to adjust course length to suit the market.
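A mean is sensitive to a few very long courses, so a falling average could also come from fewer outliers rather than shorter typical courses. A minimal sketch, assuming `content_length_min` holds minutes as in the cell above, reports the median next to the mean for each year. The helper name `duration_stats` and the toy values are hypothetical.

```python
import pandas as pd

def duration_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median course length (minutes) per publication year."""
    return df.groupby("year")["content_length_min"].agg(["mean", "median"])

# Hypothetical toy data: one 20-hour course inflates the 2019 mean
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2021, 2021, 2021],
    "content_length_min": [60, 90, 1200, 60, 90, 120],
})
stats = duration_stats(toy)
print(stats)
```

Here the mean drops from 450 to 90 minutes while the median stays at 90, so the apparent decrease comes entirely from one long course. If the real data shows the median falling as well, the trend reflects typical courses getting shorter.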

#### Conclusion:
- Over the years, Udemy has experienced substantial growth in terms of the number of subscribers, instructors, and courses. The platform's user base has likely expanded significantly as more learners around the world turn to online education.
- The growth in the number of instructors and courses on Udemy indicates a diverse range of content available on the platform. This diversity attracts learners with varied interests and learning objectives, contributing to Udemy's popularity.
- Udemy's success is likely attributable to its global appeal, with a broad and diverse user base from different countries and cultures. The platform's ability to attract instructors and learners globally demonstrates its effectiveness in providing accessible education.
- Udemy's focus on both instructors and learners has contributed to its growth. Instructors are attracted by the opportunity to reach a global audience, while learners benefit from a wide array of courses tailored to various skill levels and interests.

## Question 4: How diverse are course languages, and how does their share change over time?

### Benefits of finding the answer

- It shows how diverse the languages of Udemy courses are and what share each language has. This can also help us predict which languages will be commonly used in the near future.
- Learners tend to engage more actively with content presented in their native language. Offering courses in multiple languages can lead to increased participation, comprehension, and retention of information, as learners feel more comfortable and connected to the material.
- Udemy can tap into new markets and demographics by offering courses in different languages. This expansion can lead to a larger user base and more revenue opportunities as the platform becomes more inclusive and diverse.

### Preprocessing

We count the number of distinct languages used in the courses published each year.
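Distinct values per year can be counted either with `unique()` followed by `len`, or directly with `nunique()`. The two differ when the column has missing values: `unique()` keeps `NaN` as one of the values, so `len` counts it, while `nunique()` skips missing values by default (`dropna=True`). The toy frame below is hypothetical and only illustrates that difference.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one course in 2020 has no recorded language
toy = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021],
    "language": ["English", "English", np.nan, "English", "Spanish"],
})

# unique() keeps NaN as a value, so len() counts it as a language
via_unique = toy.groupby("year")["language"].unique().apply(len)
# nunique() skips missing values by default (dropna=True)
via_nunique = toy.groupby("year")["language"].nunique()

print(via_unique.to_dict())
print(via_nunique.to_dict())
```

When a column has no missing values the two approaches agree; when it does, `nunique()` gives the count most readers would expect.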

In [None]:
# Count distinct course languages per year (missing values are not counted)
language_per_year = udemy_df.groupby('year')['language'].nunique()

In [None]:
plt.figure(figsize=(10, 6))
language_per_year.plot(kind='bar', color='skyblue')
plt.title('Number of languages over year')
plt.xlabel('Year')
plt.ylabel('Number of languages')
plt.show()

Course languages diversify over time, with more and more languages serving learners.
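A count of distinct languages per year does not say whether the languages are new or the same ones appearing again. A minimal sketch, assuming the `year` and `language` columns used above, finds the first year each language appears and counts how many languages debut in each year. The helper name `new_languages_per_year` and the toy data are hypothetical.

```python
import pandas as pd

def new_languages_per_year(df: pd.DataFrame) -> pd.Series:
    """Number of languages whose first course appears in each year."""
    first_year = df.groupby("language")["year"].min()  # debut year of each language
    return first_year.value_counts().sort_index()

# Hypothetical toy data, not taken from udemy.csv
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017, 2017, 2017],
    "language": ["English", "English", "English", "Spanish",
                 "Spanish", "Portuguese", "German"],
})
res = new_languages_per_year(toy)
print(res)
```

A steady stream of debuts in later years would support the claim that the platform keeps adding languages rather than just publishing more courses in existing ones.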

Next, we look at how much each language is used. Because there are many languages, we keep the 5 most popular ones and group all the others under 'Other'.

In [None]:
# The 5 most common languages over the whole dataset
list_language_top = udemy_df.groupby('language')['id'].count().nlargest(5).index
language_top = udemy_df[udemy_df['language'].isin(list_language_top)]
# Number of courses per (year, language) for the top 5 languages
df_language = language_top.groupby(['year', 'language']).size().unstack(fill_value=0)
# Every remaining course of that year is counted as 'Other'
df_language['Other'] = udemy_df.groupby('year')['language'].size() - df_language.sum(axis=1)
# Convert counts to each language's percentage share of that year's courses
df_percentage_language = df_language.div(df_language.sum(axis=1), axis=0) * 100
df_percentage_language.plot(kind='area', stacked=True, figsize=(10, 5), title='Ratio between languages over years')
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.xlabel('Year')
plt.ylabel('Share of courses (%)')
plt.show()

- The share of languages such as 'Spanish', 'Portuguese', ... is increasing.
- English accounts for a large proportion but is no longer as dominant as before.
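The top-5-plus-'Other' grouping can also be written by relabelling less common languages first and letting `pd.crosstab` compute row percentages with `normalize='index'`. This is a sketch under the same column assumptions (`year`, `language`); the helper name `language_shares` and the toy rows are hypothetical. One small difference from the cell above: popularity is ranked with `value_counts()`, which counts rows, rather than by counting non-missing `id`s.

```python
import pandas as pd

def language_shares(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Percentage of each year's courses per language, keeping the top_n languages plus 'Other'."""
    top = df["language"].value_counts().nlargest(top_n).index
    # Keep popular languages, relabel everything else (including missing values) as 'Other'
    grouped = df["language"].where(df["language"].isin(top), "Other")
    # normalize='index' divides each row (year) by its total
    return pd.crosstab(df["year"], grouped, normalize="index") * 100

# Hypothetical toy data, not taken from udemy.csv
toy = pd.DataFrame({
    "year": [2018, 2018, 2018, 2018, 2021, 2021, 2021, 2021],
    "language": ["English", "English", "English", "Spanish",
                 "English", "English", "Spanish", "German"],
})
shares = language_shares(toy, top_n=2)
print(shares.round(1))
```

Reading the English column down the rows (75% in 2018, 50% in 2021 in the toy data) is a direct way to check the claim that English is no longer as dominant as before.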

#### Conclusion:
- Udemy can tailor its offerings to local markets by providing courses in languages specific to those regions. This adaptability can help the platform stay relevant and competitive in a globalized education landscape.
- Language diversity opens up opportunities for instructors proficient in specific languages to create and deliver content. This can attract skilled instructors from various linguistic backgrounds, enriching the platform with a diverse pool of expertise.