# Coursera

## Introduction
Tilto, Inc. is a Lithuanian company registered in the United States. Tilto plans to become a company like Coursera, which partners with more than 200 leading universities and companies to bring flexible, affordable, job-relevant online learning to individuals and organizations worldwide. Tilto means "bridge" in Lithuanian, and the company plans to become a bridge between citizens of the world and their potential.

Coursera was founded by Daphne Koller and Andrew Ng in 2012 with a vision of providing life-transforming learning experiences to learners around the world. Today, Coursera is a global online learning platform that offers anyone, anywhere, access to online courses and degrees from leading universities and companies. 

Coursera offers a range of learning opportunities from hands-on projects and courses to job-ready certificates and degree programs. 82 million learners, 100+ Fortune 500 companies, and more than 6,000 campuses, businesses, and governments come to Coursera to access world-class learning—anytime, anywhere.
 
Coursera received B Corp certification in February 2021, which means that it has a legal duty not only to its shareholders but also to make a positive impact on society and continue to reduce barriers to world-class education for all. Tilto has a similar vision: to become an international company that impacts the world by helping its citizens realize their innate potential.

For this analysis, I will work with a dataset provided by Coursera and obtained from Kaggle.

## Goals

The goal of this project is to analyze Coursera's current offering and advise the leadership of Tilto, Inc. on how to move forward with their vision. This analysis will answer the following questions:

**Organizations**
1. Which learning organizations have been most successful with learners?

**Enrollment**
1. How many students can Tilto Inc project to attract?

**Certificates**
1. How does offering certificates affect learner satisfaction?
2. Which type of certificate is more beneficial?
3. Which type of certificate attracts more learners?

**Ratings**
1. Which are the highest rated courses by learners?
2. What factors affect a learner to give a higher rating to a course?

**Difficulty**
1. Which course difficulty level is most popular among learners?
2. How do difficulty level and certificate type interrelate?

**Improvements**
1. How can Tilto, Inc. become a better organization than Coursera?

## Importing Libraries and Loading Data

### Importing Libraries

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings  # Used later to suppress unnecessary FutureWarning messages.

### Loading Data in Pandas

The data is a CSV file that I downloaded from Kaggle. In this section, I create a pandas DataFrame so I can work with the data.

In [None]:
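# A raw string (r'...') keeps the backslashes in the Windows path from being read as escape sequences.
coursera = pd.read_csv(r'C:\py\Projects\TuringCollege\Coursera\DataSet\coursera.csv', index_col = 0)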

## Basic Information

In this section, I will display the following information about this dataset:

1. Number of rows and columns
2. Total number of data entries
3. The first 5 rows
4. The data types in this dataset

### Number of Rows and Columns 

The `shape` attribute returns the number of rows and columns in the dataset as a `(rows, columns)` tuple.

In [None]:
coursera.shape

### Total Number of Entries 

The `size` attribute returns the total number of entries, that is, the number of rows multiplied by the number of columns.

In [None]:
coursera.size
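
As a quick illustration of how `shape` and `size` relate, here is a minimal sketch on a made-up three-row sample. The frame `sample` and its values are hypothetical and are not taken from the Kaggle file.

```python
import pandas as pd

# Hypothetical sample that borrows two of the raw column names.
sample = pd.DataFrame({
    "course_title": ["Course A", "Course B", "Course C"],
    "course_rating": [4.7, 4.8, 4.5],
})

n_rows, n_cols = sample.shape   # (rows, columns)
total_entries = sample.size     # always rows * columns

print(n_rows, n_cols, total_entries)
```

Because `size` is just the product of the two `shape` values, it counts missing entries as well as filled ones.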

### The First Five Rows

In [None]:
pd.set_option("display.max_columns", None)  # Show every column when displaying the DataFrame.
coursera.sort_index(inplace=True)
coursera.head()

### Data Types

In [None]:
coursera.dtypes  # dtypes is an attribute, not a method.

## Data Cleaning

### NaN or Null Values

In [None]:
coursera.isnull().sum()

### Duplicate Values

In [None]:
# Count the rows that are exact duplicates of another row.
coursera.duplicated(keep = False).sum()
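
To make the duplicate check concrete, the sketch below counts fully duplicated rows on a small made-up frame and then drops the repeats. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: the first two rows are identical.
toy = pd.DataFrame({
    "Title": ["Python Basics", "Python Basics", "Intro to SQL"],
    "Rating": [4.8, 4.8, 4.6],
})

# keep=False marks every member of a duplicated group, not only the repeats.
dup_mask = toy.duplicated(keep=False)
n_duplicate_rows = int(dup_mask.sum())

# drop_duplicates keeps the first occurrence of each group by default.
deduplicated = toy.drop_duplicates()

print(n_duplicate_rows, len(deduplicated))
```

Summing the boolean mask gives a count, whereas indexing the frame with the mask and then calling `sum()` would add up the column values of the duplicated rows instead.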

### Alignment

### Modification of Column Names

In [None]:
coursera.rename(columns = {'course_title':'Title', 
                           'course_organization':'Organization', 
                           'course_Certificate_type':'Certificate',
                           'course_rating':'Rating',
                          'course_difficulty':'Difficulty',
                          'course_students_enrolled':'Enrollment'}, inplace = True)

In [None]:
coursera.head()

### Modification of Certificate Column

In [None]:
coursera['Certificate'] = coursera['Certificate'].str.title()
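# Remove the word "Certificate" and strip the whitespace it leaves behind.
coursera['Certificate'] = coursera['Certificate'].str.replace('Certificate', '', regex = False).str.strip()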
coursera.head(10)

### Modification of Enrollment Column

In [None]:
coursera['Symbol'] = coursera['Enrollment'].str[-1:]

In [None]:
pd.set_option("display.max_rows", None)  # Use the full option name to avoid ambiguous matches.
coursera['Enrollment'] = coursera['Enrollment'].str.extract(r'(\d+[.\d]*)').astype(float)

In [None]:
coursera.loc[coursera['Symbol'] == 'k', 'Multiple'] = 1000
coursera.loc[coursera['Symbol'] == 'm', 'Multiple'] = 1000000
coursera['Multiple'] = coursera['Multiple'].astype(int)

In [None]:
coursera['Enrolled'] = coursera['Enrollment'] * coursera['Multiple']
coursera['Enrolled'] = coursera['Enrolled'].astype(float)
pd.options.display.float_format = '{:,.0f}'.format

In [None]:
coursera = coursera.drop(['Symbol', 'Multiple', 'Enrollment'], axis = 1)
coursera.head(10)
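
The conversion above can also be written as one compact step. This is a minimal sketch under the assumption that raw enrollment values look like `5.3k` or `3.2m`; the sample values, the `multipliers` mapping, and the fallback factor of 1 for values without a suffix are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Hypothetical raw values in the style of the original Enrollment column.
raw = pd.Series(["5.3k", "130k", "3.2m", "740"])

# Suffix-to-factor mapping; values without a known suffix get a factor of 1.
multipliers = {"k": 1_000, "m": 1_000_000}

number = raw.str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
suffix = raw.str[-1].str.lower()
factor = suffix.map(multipliers).fillna(1)

# Round before casting so floating-point noise cannot shift a value by one.
enrolled = (number * factor).round().astype(int)

print(enrolled.tolist())
```

Mapping the suffix through a dictionary avoids creating and later dropping helper columns, and the `fillna(1)` fallback keeps the integer cast from failing when a value has no suffix.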

## Descriptive Analysis

In [None]:
coursera.describe()

## Outliers

In [None]:
# Ignore the FutureWarning message that appears with the code below.
warnings.simplefilter(action = "ignore", category = FutureWarning)

# Restrict the IQR calculation to the numeric columns.
numeric = coursera.select_dtypes(include = "number")

Q1 = numeric.quantile(0.25)
Q3 = numeric.quantile(0.75)
IQR = Q3 - Q1

# Flag values that lie more than 1.5 * IQR below Q1 or above Q3.
outliers_df = (numeric < (Q1 - 1.5 * IQR)) | (numeric > (Q3 + 1.5 * IQR))

# Number of outliers per column.
outliers_df.sum()

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

# One box plot per numeric column in the cleaned dataset.
sns.boxplot(ax=axes[0], data = coursera, x = 'Rating')
sns.boxplot(ax=axes[1], data = coursera, x = 'Enrolled')
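
The 1.5 × IQR rule used in this section can be sketched on a small made-up Series. The helper `iqr_outlier_mask` and the sample ratings are hypothetical and serve only to show how the fences are computed.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical ratings with one clearly low value.
ratings = pd.Series([4.6, 4.7, 4.7, 4.8, 4.8, 4.9, 3.3])
mask = iqr_outlier_mask(ratings)

print(ratings[mask].tolist())
```

The same function can be applied column by column, for example with `DataFrame.apply`, when the frame contains only numeric columns.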

## Exploratory Data Analysis (EDA)

### Sort by Enrolled

In [None]:
sorted_enrollment = coursera.sort_values("Enrolled", axis = 0, ascending = False, inplace = False, na_position ='last')
sorted_enrollment.head(10)

### Sort by Rating

In [None]:
sorted_rating = coursera.sort_values("Rating", axis = 0, ascending = False, inplace = False, na_position ='last')
sorted_rating.head(10)

### Organization vs Enrollment

In [None]:
organization_enrollment = coursera.groupby("Organization")["Enrolled"].sum()
organization_enrollment.sort_values(ascending=False, inplace = True)
organization_enrollment.head(10)

In [None]:
organization_enrollment_df = pd.DataFrame(organization_enrollment)
organization_enrollment_df.head(10)

### Rating vs Enrollment

In [None]:
rating_enrollment = coursera.groupby("Rating")["Enrolled"].sum()
rating_enrollment.sort_index(ascending=False, inplace = True)
rating_enrollment

### Organization vs Rating

In [None]:
ratings_organization = coursera.groupby("Organization")["Rating"].sum()
ratings_organization.sort_values(ascending=False, inplace = True)
ratings_organization.head(10)
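
Summing ratings favors organizations that simply offer more courses, so it can help to look at the average rating and the course count side by side. The sketch below uses a made-up frame; the organization names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: Org A has more courses, Org B has a higher rating.
toy = pd.DataFrame({
    "Organization": ["Org A", "Org A", "Org A", "Org B"],
    "Rating": [4.0, 4.2, 4.1, 4.9],
})

summary = (
    toy.groupby("Organization")["Rating"]
       .agg(total="sum", average="mean", courses="count")
       .sort_values("average", ascending=False)
)

print(summary)
```

In this toy example Org A leads on the summed rating while Org B leads on the average, which shows how the choice of aggregation changes the ranking.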

### Difficulty vs Enrollment

In [None]:
difficulty_enrollment = coursera.groupby("Difficulty")["Enrolled"].sum()
difficulty_enrollment.sort_values(ascending=False, inplace = True)
difficulty_enrollment

### Certificate vs Enrollment

In [None]:
certificate_enrollment = coursera.groupby("Certificate")["Enrolled"].sum()
certificate_enrollment.sort_values(ascending=False, inplace = True)
certificate_enrollment

In [None]:
size = 25
pad = 25

params = {'legend.fontsize': 'large',
          'figure.figsize': (20,12),
          'axes.labelsize': size,
          'axes.titlesize': size,
          'xtick.labelsize': size*0.75,
          'ytick.labelsize': size*0.75,
          'axes.titlepad': pad,
          'axes.labelpad': pad,
          'font.family':'times new roman',
         }

plt.rcParams.update(params)

# Plot the total enrollment per certificate type computed above,
# rather than one overlapping bar per course.
plt.bar(certificate_enrollment.index, certificate_enrollment.values, width = 0.5, color = 'mediumseagreen')
plt.xlabel('Certificate')
plt.ylabel('Enrolled')
plt.title('Number of Students Enrolled for Each Certificate Type')
plt.show()
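
One way to approach the question of how difficulty level and certificate type interrelate is a cross-tabulation. This is a minimal sketch on made-up rows; the category labels and enrollment numbers are hypothetical and only mirror the cleaned column names.

```python
import pandas as pd

# Hypothetical rows using the cleaned column names.
toy = pd.DataFrame({
    "Difficulty": ["Beginner", "Beginner", "Intermediate", "Advanced"],
    "Certificate": ["Course", "Specialization", "Course", "Specialization"],
    "Enrolled": [1000, 2000, 500, 300],
})

# Number of courses for each difficulty/certificate combination.
counts = pd.crosstab(toy["Difficulty"], toy["Certificate"])

# Total enrollment for each combination; missing combinations become 0.
enrolled = toy.pivot_table(
    index="Difficulty",
    columns="Certificate",
    values="Enrolled",
    aggfunc="sum",
    fill_value=0,
)

print(counts)
print(enrolled)
```

The count table shows where the catalog is concentrated, while the enrollment table shows where the learners are, and the two do not have to agree.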

In [None]:
coursera.head()