### Course Recommendation System using Udemy Dataset


#### Algorithm:
* Cosine Similarity
* Linear Kernel (dot-product similarity)
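
For reference, the two measures can be written as follows for a pair of title vectors. The symbols $a$ and $b$ are introduced here for illustration and do not appear in the notebook.

$$
\operatorname{cosine}(a, b) = \frac{a \cdot b}{\lVert a \rVert_2 \, \lVert b \rVert_2},
\qquad
\operatorname{linear}(a, b) = a \cdot b
$$

When both vectors are already scaled to unit length, the two measures give the same value.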


#### Workflow:
* Load the dataset
* Vectorize the course titles
* Build the cosine similarity matrix
* Pair each course index with its similarity score
* Recommend the highest-scoring courses

In [39]:
# Load EDA packages
import pandas as pd
import neattext.functions as nfx


In [40]:
#Load Machine Learning/Recommendation packages
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity,linear_kernel


In [47]:
# Loading Dataset
df=pd.read_csv("./data/udemy_courses.csv")

In [48]:
df.head()

Unnamed: 0,course_id,course_title,url,is_paid,price,num_subscribers,num_reviews,num_lectures,level,content_duration,published_timestamp,subject
0,1070968,Ultimate Investment Banking Course,https://www.udemy.com/ultimate-investment-bank...,True,200,2147,23,51,All Levels,1.5,2017-01-18T20:58:58Z,Business Finance
1,1113822,Complete GST Course & Certification - Grow You...,https://www.udemy.com/goods-and-services-tax/,True,75,2792,923,274,All Levels,39.0,2017-03-09T16:34:20Z,Business Finance
2,1006314,Financial Modeling for Business Analysts and C...,https://www.udemy.com/financial-modeling-for-b...,True,45,2174,74,51,Intermediate Level,2.5,2016-12-19T19:26:30Z,Business Finance
3,1210588,Beginner to Pro - Financial Analysis in Excel ...,https://www.udemy.com/complete-excel-finance-c...,True,95,2451,11,36,All Levels,3.0,2017-05-30T20:07:24Z,Business Finance
4,1011058,How To Maximize Your Profits Trading Options,https://www.udemy.com/how-to-maximize-your-pro...,True,200,1276,45,26,Intermediate Level,2.0,2016-12-13T14:57:18Z,Business Finance


In [44]:
df['course_title']

0                      Ultimate Investment Banking Course
1       Complete GST Course & Certification - Grow You...
2       Financial Modeling for Business Analysts and C...
3       Beginner to Pro - Financial Analysis in Excel ...
4            How To Maximize Your Profits Trading Options
                              ...                        
3673    Learn jQuery from Scratch - Master of JavaScri...
3674    How To Design A WordPress Website With No Codi...
3675                        Learn and Build using Polymer
3676    CSS Animations: Create Amazing Effects on Your...
3677    Using MODX CMS to Build Websites: A Beginner's...
Name: course_title, Length: 3678, dtype: object

In [27]:
dir(nfx)

['BTC_ADDRESS_REGEX',
 'CURRENCY_REGEX',
 'CURRENCY_SYMB_REGEX',
 'Counter',
 'DATE_REGEX',
 'EMAIL_REGEX',
 'EMOJI_REGEX',
 'HASTAG_REGEX',
 'MASTERCard_REGEX',
 'MD5_SHA_REGEX',
 'MOST_COMMON_PUNCT_REGEX',
 'NUMBERS_REGEX',
 'PHONE_REGEX',
 'PoBOX_REGEX',
 'SPECIAL_CHARACTERS_REGEX',
 'STOPWORDS',
 'STOPWORDS_de',
 'STOPWORDS_en',
 'STOPWORDS_es',
 'STOPWORDS_fr',
 'STOPWORDS_ru',
 'STOPWORDS_yo',
 'STREET_ADDRESS_REGEX',
 'TextFrame',
 'URL_PATTERN',
 'USER_HANDLES_REGEX',
 'VISACard_REGEX',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__generate_text',
 '__loader__',
 '__name__',
 '__numbers_dict',
 '__package__',
 '__spec__',
 '_lex_richness_herdan',
 '_lex_richness_maas_ttr',
 'clean_text',
 'defaultdict',
 'digit2words',
 'extract_btc_address',
 'extract_currencies',
 'extract_currency_symbols',
 'extract_dates',
 'extract_emails',
 'extract_emojis',
 'extract_hashtags',
 'extract_html_tags',
 'extract_mastercard_addr',
 'extract_md5sha',
 'extract_numbers',
 'extr

In [49]:
# Clean text: remove stopwords
df['clean_course_title']=df['course_title'].apply(nfx.remove_stopwords)

In [51]:
# Clean text: remove special characters
df['clean_course_title']=df['clean_course_title'].apply(nfx.remove_special_characters)

In [52]:
df[['clean_course_title','course_title']]

Unnamed: 0,clean_course_title,course_title
0,Ultimate Investment Banking Course,Ultimate Investment Banking Course
1,Complete GST Course Certification Grow Practice,Complete GST Course & Certification - Grow You...
2,Financial Modeling Business Analysts Consultants,Financial Modeling for Business Analysts and C...
3,Beginner Pro Financial Analysis Excel 2017,Beginner to Pro - Financial Analysis in Excel ...
4,Maximize Profits Trading Options,How To Maximize Your Profits Trading Options
...,...,...
3673,Learn jQuery Scratch Master JavaScript library,Learn jQuery from Scratch - Master of JavaScri...
3674,Design WordPress Website Coding,How To Design A WordPress Website With No Codi...
3675,Learn Build Polymer,Learn and Build using Polymer
3676,CSS Animations Create Amazing Effects Website,CSS Animations: Create Amazing Effects on Your...
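
To make the two cleaning steps concrete, here is a minimal sketch that does not use neattext. The stopword set `TOY_STOPWORDS` and the helper names are hypothetical and deliberately tiny; neattext ships its own, much larger list, so its results can differ on other titles.

```python
import re

# Hypothetical, deliberately small stopword set for illustration only.
TOY_STOPWORDS = {"how", "to", "your", "in", "for", "and", "a", "of", "on", "the"}

def remove_toy_stopwords(text):
    """Drop tokens whose lowercase form is in TOY_STOPWORDS."""
    return " ".join(t for t in text.split() if t.lower() not in TOY_STOPWORDS)

def remove_toy_special_characters(text):
    """Replace anything that is not a letter, digit, or whitespace, then tidy spaces."""
    stripped = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return " ".join(stripped.split())

def clean_title(title):
    """Apply both steps in the same order as the notebook cells."""
    return remove_toy_special_characters(remove_toy_stopwords(title))

cleaned = clean_title("Beginner to Pro - Financial Analysis in Excel 2017")
print(cleaned)  # Beginner Pro Financial Analysis Excel 2017
```

For this title the sketch reproduces the cleaned value shown in the table above.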


In [53]:
#Vectorize our Text
count_vect=CountVectorizer()
cv_mat=count_vect.fit_transform(df['clean_course_title'])

In [54]:
#Sparse
cv_mat

<3678x3559 sparse matrix of type '<class 'numpy.int64'>'
	with 18333 stored elements in Compressed Sparse Row format>
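
The sparse matrix stores only the non-zero word counts. As a rough sketch of the idea (not scikit-learn's implementation), the snippet below builds a sorted vocabulary and one `{column: count}` dictionary per title. The helper `count_vectorize` and the toy titles are hypothetical; the token pattern mirrors `CountVectorizer`'s default of lowercased words with at least two characters.

```python
import re
from collections import Counter

# Words of two or more characters, matching CountVectorizer's default pattern.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())

def count_vectorize(titles):
    """Return (vocabulary, rows); each row is a sparse {column: count} dict."""
    tokenized = [tokenize(t) for t in titles]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    column_of = {tok: j for j, tok in enumerate(vocabulary)}
    rows = [
        {column_of[tok]: n for tok, n in Counter(toks).items()}
        for toks in tokenized
    ]
    return vocabulary, rows

toy_titles = [
    "Learn Build Polymer",
    "Design WordPress Website Coding",
    "Learn jQuery Scratch",
]
vocabulary, rows = count_vectorize(toy_titles)
stored = sum(len(r) for r in rows)
print(len(toy_titles), "x", len(vocabulary), "with", stored, "stored elements")
```

Each row holds only the columns that actually occur in that title, which is why the stored-element count stays far below rows times columns.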

In [55]:
#Dense
cv_mat.todense()

matrix([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], dtype=int64)
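
Converting to a dense matrix materializes every zero. The arithmetic below uses only the shape, dtype, and stored-element count reported above to estimate what that costs; the variable names are introduced here for illustration.

```python
n_rows, n_cols = 3678, 3559   # shape reported for cv_mat
stored_elements = 18333       # non-zero entries reported for cv_mat
bytes_per_value = 8           # int64

total_cells = n_rows * n_cols
dense_bytes = total_cells * bytes_per_value
density = stored_elements / total_cells

print(f"{total_cells:,} cells, about {dense_bytes / 1e6:.1f} MB when dense")
print(f"{density:.2%} of cells are non-zero")
```

With roughly 0.14% of the cells non-zero, the dense form is useful for inspection but wasteful for computation.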

In [56]:
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
df_cv_words=pd.DataFrame(cv_mat.todense(),columns=count_vect.get_feature_names_out())



In [57]:
df_cv_words.head()

Unnamed: 0,000005,001,01,02,10,100,101,101master,102,10k,...,zend,zero,zerotohero,zf2,zinsen,zoho,zombie,zu,zuhause,zur
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [61]:
#Cosine Similarity Matrix
cosine_sim_mat=cosine_similarity(cv_mat)

In [62]:
cosine_sim_mat

array([[1.        , 0.20412415, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.20412415, 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        0.23570226],
       [0.        , 0.        , 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.23570226, 0.        ,
        1.        ]])
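
The off-diagonal value 0.20412415 between courses 0 and 1 can be checked by hand from their cleaned titles. The sketch below uses plain lowercase word counts, which for these two titles should give the same tokens as the vectorizer; the helper name `cosine_from_titles` is hypothetical.

```python
import math
from collections import Counter

def cosine_from_titles(title_a, title_b):
    """Cosine similarity of two titles using simple lowercase word counts."""
    counts_a = Counter(title_a.lower().split())
    counts_b = Counter(title_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)

score = cosine_from_titles(
    "Ultimate Investment Banking Course",
    "Complete GST Course Certification Grow Practice",
)
print(round(score, 8))  # 0.20412415
```

The titles share one word ("course") and have 4 and 6 distinct words, so the score is 1 / (sqrt(4) * sqrt(6)).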

In [64]:
cosine_sim_mat[0:10]

array([[1.        , 0.20412415, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.20412415, 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.25      , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])
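
The notebook imports `TfidfVectorizer` and `linear_kernel` but the cells shown use only `CountVectorizer` and `cosine_similarity`. As a hedged sketch of the alternative, assuming the default `TfidfVectorizer` settings (which L2-normalize each row), the plain dot product from `linear_kernel` should match `cosine_similarity` on the same matrix. The four titles are cleaned titles from the table above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

toy_titles = [
    "Ultimate Investment Banking Course",
    "Complete GST Course Certification Grow Practice",
    "Financial Modeling Business Analysts Consultants",
    "Maximize Profits Trading Options",
]

# Default TfidfVectorizer uses norm="l2", so every row has unit length.
tfidf_mat = TfidfVectorizer().fit_transform(toy_titles)

lin = linear_kernel(tfidf_mat)      # plain dot products
cos = cosine_similarity(tfidf_mat)  # dot products divided by the row norms

same = np.allclose(lin, cos)
print(same)
```

On unit-length rows the normalization step in cosine similarity has nothing left to do, so the linear kernel can be the cheaper choice.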

In [None]:
# import seaborn as sns
# sns.heatmap(cosine_sim_mat[0:10],annot=True)

In [65]:
df.head()

Unnamed: 0,course_id,course_title,url,is_paid,price,num_subscribers,num_reviews,num_lectures,level,content_duration,published_timestamp,subject,clean_course_title
0,1070968,Ultimate Investment Banking Course,https://www.udemy.com/ultimate-investment-bank...,True,200,2147,23,51,All Levels,1.5,2017-01-18T20:58:58Z,Business Finance,Ultimate Investment Banking Course
1,1113822,Complete GST Course & Certification - Grow You...,https://www.udemy.com/goods-and-services-tax/,True,75,2792,923,274,All Levels,39.0,2017-03-09T16:34:20Z,Business Finance,Complete GST Course Certification Grow Practice
2,1006314,Financial Modeling for Business Analysts and C...,https://www.udemy.com/financial-modeling-for-b...,True,45,2174,74,51,Intermediate Level,2.5,2016-12-19T19:26:30Z,Business Finance,Financial Modeling Business Analysts Consultants
3,1210588,Beginner to Pro - Financial Analysis in Excel ...,https://www.udemy.com/complete-excel-finance-c...,True,95,2451,11,36,All Levels,3.0,2017-05-30T20:07:24Z,Business Finance,Beginner Pro Financial Analysis Excel 2017
4,1011058,How To Maximize Your Profits Trading Options,https://www.udemy.com/how-to-maximize-your-pro...,True,200,1276,45,26,Intermediate Level,2.0,2016-12-13T14:57:18Z,Business Finance,Maximize Profits Trading Options


In [67]:
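# Map each course title to its row index.
# Note: drop_duplicates() compares the Series values (the row indices), which are
# already unique, so repeated titles are not removed here.
course_indices=pd.Series(df.index,index=df['course_title']).drop_duplicates()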

In [68]:
course_indices

course_title
Ultimate Investment Banking Course                                0
Complete GST Course & Certification - Grow Your CA Practice       1
Financial Modeling for Business Analysts and Consultants          2
Beginner to Pro - Financial Analysis in Excel 2017                3
How To Maximize Your Profits Trading Options                      4
                                                               ... 
Learn jQuery from Scratch - Master of JavaScript library       3673
How To Design A WordPress Website With No Coding At All        3674
Learn and Build using Polymer                                  3675
CSS Animations: Create Amazing Effects on Your Website         3676
Using MODX CMS to Build Websites: A Beginner's Guide           3677
Length: 3678, dtype: int64

In [69]:
course_indices['Beginner to Pro - Financial Analysis in Excel 2017']

3
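
A lookup by title returns a single integer only when that title is unique. The sketch below uses a hypothetical three-row frame to show what happens when a title repeats, and one possible way to keep the first row per title.

```python
import pandas as pd

# Hypothetical toy frame with one repeated title.
toy = pd.DataFrame({"course_title": ["Excel Basics", "Python 101", "Excel Basics"]})

# Same construction as in the notebook: values are row positions, index is the title.
lookup = pd.Series(toy.index, index=toy["course_title"]).drop_duplicates()
print(len(lookup))  # 3, because the values 0, 1, 2 are all distinct

# A repeated title yields a Series of positions instead of one integer.
repeated = lookup["Excel Basics"]
print(len(repeated))  # 2

# Keep only the first row for each title so a lookup yields one integer.
unique_lookup = lookup[~lookup.index.duplicated(keep="first")]
print(int(unique_lookup["Excel Basics"]))  # 0
```

Whether to keep the first, the last, or the most-subscribed duplicate is a design choice the notebook does not settle.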

In [70]:
idx=course_indices['Beginner to Pro - Financial Analysis in Excel 2017']

In [71]:
idx

3

In [72]:
scores=list(enumerate(cosine_sim_mat[idx]))

In [73]:
scores

[(0, 0.0),
 (1, 0.0),
 (2, 0.18257418583505539),
 (3, 1.0000000000000002),
 (4, 0.0),
 (5, 0.1666666666666667),
 (6, 0.0),
 (7, 0.0),
 (8, 0.0),
 (9, 0.0),
 (10, 0.0),
 (11, 0.0),
 (12, 0.18257418583505539),
 (13, 0.0),
 (14, 0.0),
 (15, 0.0),
 (16, 0.1666666666666667),
 (17, 0.0),
 (18, 0.0),
 (19, 0.2357022603955159),
 (20, 0.0),
 (21, 0.0),
 (22, 0.1543033499620919),
 (23, 0.18257418583505539),
 (24, 0.0),
 (25, 0.2886751345948129),
 (26, 0.0),
 (27, 0.0),
 (28, 0.0),
 (29, 0.1543033499620919),
 (30, 0.0),
 (31, 0.0),
 (32, 0.0),
 (33, 0.0),
 (34, 0.0),
 (35, 0.1543033499620919),
 (36, 0.0),
 (37, 0.0),
 (38, 0.6666666666666669),
 (39, 0.18257418583505539),
 (40, 0.36514837167011077),
 (41, 0.0),
 (42, 0.18257418583505539),
 (43, 0.0),
 (44, 0.0),
 (45, 0.3333333333333334),
 (46, 0.0),
 (47, 0.0),
 (48, 0.0),
 (49, 0.0),
 (50, 0.0),
 (51, 0.0),
 (52, 0.0),
 (53, 0.2357022603955159),
 (54, 0.0),
 (55, 0.0),
 (56, 0.0),
 (57, 0.0),
 (58, 0.0),
 (59, 0.0),
 (60, 0.5443310539518174),
 (

In [74]:
# Sort the scores by cosine similarity, highest first
sorted_scores=sorted(scores,key=lambda x:x[1],reverse=True)

In [75]:
sorted_scores

[(3, 1.0000000000000002),
 (38, 0.6666666666666669),
 (60, 0.5443310539518174),
 (1193, 0.5000000000000001),
 (739, 0.4714045207910318),
 (647, 0.4629100498862757),
 (980, 0.4629100498862757),
 (101, 0.4082482904638631),
 (130, 0.4082482904638631),
 (168, 0.4082482904638631),
 (257, 0.4082482904638631),
 (447, 0.4082482904638631),
 (543, 0.4082482904638631),
 (572, 0.4082482904638631),
 (726, 0.4082482904638631),
 (785, 0.4082482904638631),
 (829, 0.4082482904638631),
 (935, 0.4082482904638631),
 (1095, 0.4082482904638631),
 (1113, 0.4082482904638631),
 (1181, 0.4082482904638631),
 (1843, 0.4082482904638631),
 (1135, 0.408248290463863),
 (40, 0.36514837167011077),
 (247, 0.36514837167011077),
 (250, 0.36514837167011077),
 (262, 0.36514837167011077),
 (657, 0.36514837167011077),
 (744, 0.36514837167011077),
 (844, 0.36514837167011077),
 (1212, 0.36514837167011077),
 (45, 0.3333333333333334),
 (268, 0.3333333333333334),
 (270, 0.3333333333333334),
 (393, 0.3333333333333334),
 (419, 0.333

In [76]:
# Omit the first entry, which is the course itself
sorted_scores[1:]

[(38, 0.6666666666666669),
 (60, 0.5443310539518174),
 (1193, 0.5000000000000001),
 (739, 0.4714045207910318),
 (647, 0.4629100498862757),
 (980, 0.4629100498862757),
 (101, 0.4082482904638631),
 (130, 0.4082482904638631),
 (168, 0.4082482904638631),
 (257, 0.4082482904638631),
 (447, 0.4082482904638631),
 (543, 0.4082482904638631),
 (572, 0.4082482904638631),
 (726, 0.4082482904638631),
 (785, 0.4082482904638631),
 (829, 0.4082482904638631),
 (935, 0.4082482904638631),
 (1095, 0.4082482904638631),
 (1113, 0.4082482904638631),
 (1181, 0.4082482904638631),
 (1843, 0.4082482904638631),
 (1135, 0.408248290463863),
 (40, 0.36514837167011077),
 (247, 0.36514837167011077),
 (250, 0.36514837167011077),
 (262, 0.36514837167011077),
 (657, 0.36514837167011077),
 (744, 0.36514837167011077),
 (844, 0.36514837167011077),
 (1212, 0.36514837167011077),
 (45, 0.3333333333333334),
 (268, 0.3333333333333334),
 (270, 0.3333333333333334),
 (393, 0.3333333333333334),
 (419, 0.3333333333333334),
 (721, 0.3
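
Dropping the first sorted entry assumes the course itself always sorts to the top. A sketch of an alternative, under the assumption that only the top few results are needed, is to skip the course by its index and pick the largest scores with `heapq.nlargest`. The helper `top_similar` and the toy scores are made up for illustration.

```python
import heapq

def top_similar(similarity_row, self_index, n=3):
    """Return the n highest-scoring (index, score) pairs, skipping self_index."""
    candidates = (
        (i, s) for i, s in enumerate(similarity_row) if i != self_index
    )
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# Toy similarity row for course 3; the values are invented.
toy_row = [0.0, 0.0, 0.18, 1.0, 0.0, 0.17, 0.67, 0.54]
best = top_similar(toy_row, self_index=3, n=3)
print(best)  # [(6, 0.67), (7, 0.54), (2, 0.18)]
```

Filtering by index stays correct even if another course ties the self-similarity score, and it avoids sorting all 3678 entries.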

In [77]:
# Indices of the selected courses
selected_course_indices=[i[0] for i in sorted_scores[1:]]

In [78]:
selected_course_indices

[38,
 60,
 1193,
 739,
 647,
 980,
 101,
 130,
 168,
 257,
 447,
 543,
 572,
 726,
 785,
 829,
 935,
 1095,
 1113,
 1181,
 1843,
 1135,
 40,
 247,
 250,
 262,
 657,
 744,
 844,
 1212,
 45,
 268,
 270,
 393,
 419,
 721,
 1825,
 1832,
 1836,
 1840,
 1852,
 1962,
 2275,
 2459,
 326,
 378,
 610,
 940,
 1013,
 1839,
 2154,
 2945,
 25,
 423,
 473,
 930,
 1878,
 1951,
 1970,
 2193,
 490,
 235,
 19,
 53,
 99,
 119,
 140,
 178,
 207,
 224,
 245,
 263,
 287,
 300,
 401,
 412,
 452,
 479,
 519,
 534,
 547,
 623,
 641,
 719,
 763,
 778,
 803,
 822,
 838,
 894,
 928,
 936,
 943,
 972,
 981,
 988,
 1009,
 1074,
 1082,
 1156,
 1579,
 2235,
 2239,
 2266,
 2435,
 2464,
 2740,
 2909,
 2932,
 3107,
 3274,
 131,
 252,
 265,
 322,
 333,
 334,
 374,
 398,
 415,
 458,
 482,
 486,
 497,
 529,
 552,
 561,
 574,
 611,
 691,
 708,
 759,
 762,
 771,
 806,
 841,
 862,
 863,
 887,
 905,
 986,
 1023,
 1024,
 1101,
 1131,
 1141,
 1291,
 1549,
 1580,
 1586,
 1770,
 1815,
 1862,
 1956,
 1984,
 1988,
 1999,
 2042,
 2070

In [80]:
selected_course_scores=[i[1] for i in sorted_scores[1:]]

In [84]:
recommended_result=df['course_title'].iloc[selected_course_indices]

In [85]:
rec_df=pd.DataFrame(recommended_result)

In [86]:
rec_df.head()

Unnamed: 0,course_title
38,Beginner to Pro in Excel: Financial Modeling a...
60,Excel Crash Course: Master Excel for Financial...
1193,Financial Modeling and Valuation: Complete Beg...
739,Financial Ratios Using Excel
647,Financial Statements Analysis: Learn to Invest...


In [87]:
rec_df['similarity_scores']=selected_course_scores

In [88]:
rec_df

Unnamed: 0,course_title,similarity_scores
38,Beginner to Pro in Excel: Financial Modeling a...,0.666667
60,Excel Crash Course: Master Excel for Financial...,0.544331
1193,Financial Modeling and Valuation: Complete Beg...,0.500000
739,Financial Ratios Using Excel,0.471405
647,Financial Statements Analysis: Learn to Invest...,0.462910
...,...,...
3673,Learn jQuery from Scratch - Master of JavaScri...,0.000000
3674,How To Design A WordPress Website With No Codi...,0.000000
3675,Learn and Build using Polymer,0.000000
3676,CSS Animations: Create Amazing Effects on Your...,0.000000


In [90]:
def recommend_course(title,num_of_rec=10):
    """Return the num_of_rec courses most similar to title, with their scores."""
    # Row index of the requested title
    idx=course_indices[title]
    # Pair every course index with its similarity to the requested course
    scores=list(enumerate(cosine_sim_mat[idx]))
    # Sort by similarity, highest first
    sorted_scores=sorted(scores,key=lambda x:x[1],reverse=True)
    # Skip the first entry (the course itself), then split indices and scores
    selected_course_indices=[i[0] for i in sorted_scores[1:]]
    selected_course_scores=[i[1] for i in sorted_scores[1:]]
    # Build the result table
    result=df['course_title'].iloc[selected_course_indices]
    rec_df=pd.DataFrame(result)
    rec_df['similarity_scores']=selected_course_scores
    return rec_df.head(num_of_rec)

In [91]:
recommend_course('Trading Options Basics')

Unnamed: 0,course_title,similarity_scores
95,Options Trading 101: The Basics,0.866025
193,Trading Options For Consistent Returns: Option...,0.816497
861,Basics of Trading,0.816497
66,Options Trading Basics (3-Course Bundle),0.774597
800,Trading: Basics of Trading for Beginners,0.707107
953,Options Basics & Trading With Small Capital! -...,0.707107
43,Options Trading - How to Win with Weekly Options,0.654654
94,Intermediate Options trading concepts for Stoc...,0.612372
136,Forex Trading with Fixed 'Risk through Options...,0.612372
442,The Advantages of ETF Options and Index Option...,0.612372
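
`recommend_course` looks the title up by exact match, so a misspelled title raises a `KeyError`. One possible guard, sketched here with the standard library's `difflib`, resolves a query to the closest known title first. The helper `resolve_title`, the `cutoff` value, and the short title list are assumptions for illustration, not part of the notebook.

```python
import difflib

def resolve_title(query, known_titles, cutoff=0.6):
    """Return query if it is a known title, else the closest known title or None."""
    if query in known_titles:
        return query
    matches = difflib.get_close_matches(query, known_titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

toy_titles = [
    "Options Trading 101: The Basics",
    "Basics of Trading",
    "Financial Ratios Using Excel",
]

exact = resolve_title("Financial Ratios Using Excel", toy_titles)
near = resolve_title("Financial Ratio Using Excel", toy_titles)
missing = resolve_title("Watercolor Painting", toy_titles)
print(exact, near, missing, sep=" | ")
```

The resolved title could then be passed to `recommend_course`, with a `None` result handled as "no such course".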
