# Doc2Vec Cosine Similarity

#### :brief: This program finds similar courses based on course title, overview description, and/or target audience. We convert the paragraphs to Doc2Vec vectors, then compute the cosine similarity between every pair of courses.
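For reference, the cosine similarity of two vectors a and b is a·b / (‖a‖ ‖b‖), so 1 means "same direction" and 0 means "orthogonal". The sketch below is a minimal numpy version of the all-pairs matrix (what scikit-learn's `cosine_similarity(X)` returns for a single input); the function name and the 2-D toy vectors are illustrative stand-ins for the real Doc2Vec embeddings.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity of the rows of `vectors` (n_docs x dim)."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against all-zero rows so we never divide by zero;
    # such rows end up with similarity 0 to everything.
    norms[norms == 0] = 1.0
    unit = X / norms
    # Dot products of unit vectors are exactly the cosine similarities.
    return unit.dot(unit.T)

docs = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sim = cosine_similarity_matrix(docs)
print(np.round(sim, 3))
```

The result is symmetric with ones on the diagonal (each course compared with itself), which is why the diagonal has to be masked before looking for each course's best match.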

#### :conclusion: Semantic comparison of course descriptions does not look promising for our purpose of finding replacement courses. Also, Doc2Vec on course titles alone mostly just returns courses with matching titles.
Some matches were good, e.g. 863 "Business Analysis and Solution Evaluation" matched with 866 "Documentation and Criteria Used for Business Analysis", but they are interlaced with many bad matches.


What about the pre-built text similarity engine for PredictionIO?

In [1]:
!pip install -U gensim

Requirement already up-to-date: gensim in /opt/conda/envs/python2/lib/python2.7/site-packages
Requirement already up-to-date: smart-open>=1.2.1 in /opt/conda/envs/python2/lib/python2.7/site-packages (from gensim)
Requirement already up-to-date: numpy>=1.3 in /opt/conda/envs/python2/lib/python2.7/site-packages (from gensim)
Requirement already up-to-date: scipy>=0.7.0 in /opt/conda/envs/python2/lib/python2.7/site-packages (from gensim)
Requirement already up-to-date: six>=1.5.0 in /opt/conda/envs/python2/lib/python2.7/site-packages (from gensim)
Requirement already up-to-date: boto>=2.32 in /opt/conda/envs/python2/lib/python2.7/site-packages (from smart-open>=1.2.1->gensim)
Requirement already up-to-date: requests in /opt/conda/envs/python2/lib/python2.7/site-packages (from smart-open>=1.2.1->gensim)
Requirement already up-to-date: bz2file in /opt/conda/envs/python2/lib/python2.7/site-packages (from smart-open>=1.2.1->gensim)


In [2]:
import numpy as np
import pandas as pd
import pickle
import urllib
import collections
import re
pd.options.mode.chained_assignment = None  # default='warn'

from gensim import utils
from gensim.models.doc2vec import LabeledSentence
from gensim.models import Doc2Vec
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score,confusion_matrix,accuracy_score,roc_curve
from sklearn.metrics.pairwise import cosine_similarity


def extractCourseDescription(catalog):
    """
    :brief: Scrape each course page for its overview description and target audience.
    Adds "Course Description" and "Target Audience" columns to `catalog` and returns it.
    Each cell holds the list of regex matches found on the page, or a placeholder string.
    """
    courseDes = []
    target_audience = []
    for i, r in catalog.iterrows():
        desc, aud = "no description", "no target audience"
        try:
            link = r['course url']  # gather url from dataframe
            if ".htm" in link:
                # drop anything after ".htm" (e.g. query strings)
                link = link.split(".htm", 1)[0] + '.htm'
                f = urllib.urlopen(link)  # Python 2; in Python 3 use urllib.request.urlopen
                myfile = f.read()
                f.close()
                # each section is wrapped in HTML comment markers, e.g. <!- OVERVIEW_[ -> ... <!- ]_OVERVIEW ->
                desc = re.findall(r"<!- OVERVIEW_\[ ->(.*?)<!- ]_OVERVIEW ->", myfile)
                aud = re.findall(r"<!- TARGET_AUDIENCE_\[ ->(.*?)<!- ]_TARGET_AUDIENCE ->", myfile)
        except Exception as e:
            # store the error message (not the exception object) so the columns stay plain text
            print(e)
            desc = aud = "error: " + str(e)

        # append exactly once per row so both lists stay aligned with the catalog
        courseDes.append(desc)
        target_audience.append(aud)

        # print progress messages
        if i % 100 == 0:
            print("finished " + str(i) + " out of " + str(len(catalog)) + " records...")

    catalog["Course Description"] = courseDes
    catalog["Target Audience"] = target_audience
    print("Done.")
    return catalog
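The scraper relies on the comment markers around each section. One caveat: without `re.DOTALL`, `.` does not match newlines, so a section that spans several lines of HTML would be silently missed. We have not checked whether the real pages do this; the sketch below simply shows the marker-based extraction with `re.DOTALL` on a made-up page fragment (only the markers come from the code above; `SAMPLE_PAGE` and `extract_section` are hypothetical), and returns one whitespace-normalised string instead of a list.

```python
import re

# Hypothetical page fragment; only the comment markers mirror the scraper above.
SAMPLE_PAGE = """<div>
<!- OVERVIEW_[ ->Learn the basics
of business analysis.<!- ]_OVERVIEW ->
<!- TARGET_AUDIENCE_[ ->New analysts.<!- ]_TARGET_AUDIENCE ->
</div>"""

# re.DOTALL lets (.*?) run across line breaks inside a section.
OVERVIEW_RE = re.compile(r"<!- OVERVIEW_\[ ->(.*?)<!- ]_OVERVIEW ->", re.DOTALL)
AUDIENCE_RE = re.compile(r"<!- TARGET_AUDIENCE_\[ ->(.*?)<!- ]_TARGET_AUDIENCE ->", re.DOTALL)

def extract_section(pattern, html):
    """Join all marked sections into one string with collapsed whitespace."""
    matches = pattern.findall(html)
    return " ".join(" ".join(m.split()) for m in matches)

overview = extract_section(OVERVIEW_RE, SAMPLE_PAGE)
audience = extract_section(AUDIENCE_RE, SAMPLE_PAGE)
print(overview)
print(audience)
```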

In [3]:
"""
:brief: run this cell to scrape the course webpages and add their descriptions and target audiences to the catalog (the next cell saves the result to a pickle file).
    - smaller subset, takes less than 10 mins
"""

catalogBusDec = pd.read_csv('immuta/December 05 2016 Catalog Business Courses/December 05 2016 Catalog Business Courses.csv')
catalogBusDecAddedDesc = extractCourseDescription(catalogBusDec)

finished 0 out of 1023 records...
finished 100 out of 1023 records...
finished 200 out of 1023 records...
finished 300 out of 1023 records...
finished 400 out of 1023 records...
finished 500 out of 1023 records...
finished 600 out of 1023 records...
finished 700 out of 1023 records...
finished 800 out of 1023 records...
finished 900 out of 1023 records...
finished 1000 out of 1023 records...
Done.


In [4]:
"""
save catalogBusDecAddedDesc to a pickle file (pickle needs a file opened in binary mode)
"""
with open('catalog_with_desc.txt', 'wb') as f:
    pickle.dump(catalogBusDecAddedDesc, f)

In [5]:
with open('/home/jupyter/work/catalog_with_desc.txt', 'rb') as f:
    df = pickle.load(f)
df.tail(2)

Unnamed: 0,index,language,solution area,curriculum,series,course title,course#,course url,asset type,estimated duration hours,skillport,cd,replaces,Course Description,Target Audience
1021,1021,English,SALES and CUSTOMER FACING SKILLS,TestPreps,Test Preps,TestPrep ITIL Foundation,ib_itlv_a01_tp_enus,http://library.skillport.com/coursedesc/ib_itl...,SkillSoft Testprep Exams,1.0,Released,Released,,[To test your knowledge on the skills and comp...,[Individuals seeking practice in a structured ...
1022,1022,English,SALES and CUSTOMER FACING SKILLS,Mentoring Assets,Mentoring Assets,Mentoring ITIL Foundation,mntitv3f,http://library.skillport.com/coursedesc/mntitv...,SkillSoft Mentoring Assets,,Released,,,[Skillsoft Mentors are available to help stude...,[Individuals who are studying the associated S...
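Some courses appear more than once in the catalog: in the results further down, rows 803 and 809, and rows 120 and 155, share both title and text, so they score almost exactly 1.0 against each other. A minimal sketch for finding such repeats up front, so they can be dropped or reported separately before ranking matches; `duplicate_groups` and the toy rows are hypothetical.

```python
from collections import defaultdict

def duplicate_groups(titles, texts):
    """Group row positions whose (title, normalised text) pair is identical."""
    groups = defaultdict(list)
    for pos, (title, text) in enumerate(zip(titles, texts)):
        # Compare case-insensitively and ignore differences in whitespace.
        key = (str(title).strip().lower(), " ".join(str(text).split()).lower())
        groups[key].append(pos)
    # Only keys seen more than once are duplicates.
    return [rows for rows in groups.values() if len(rows) > 1]

titles = ["Core Values", "Risk Basics", "Core Values"]
texts = ["Ethics for PMs.", "Intro to risk.", "Ethics  for PMs."]
print(duplicate_groups(titles, texts))
```

On the real data this could be called as `duplicate_groups(df['course title'], df['Course Description'])`.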


In [6]:
"""
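:brief: to identify similar courses, use doc2vec on the course description and target audience
"""

# convert each scraped cell (a list of regex matches, or a placeholder string) to a str;
# note that str() of a list keeps its brackets and quotes in the text
df['Course Description'] = df['Course Description'].apply(str)
df['Target Audience'] = df['Target Audience'].apply(str)

# combine desc + aud into a single str (no separator is inserted between the two)
df['desc+aud'] = df['Course Description'] + df['Target Audience']
pd.set_option('display.max_colwidth', -1)
df.tail(1)['desc+aud']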

1022    ['Skillsoft Mentors are available to help students with their studies for the ITIL Foundation exam. You can reach them by entering a Mentored Chat Room or by using the Email My Mentor service.<br/><br/>* This asset is aligned to the ITIL 2011 Edition publications.']['Individuals who are studying the associated Skillsoft content in preparation for, or to become familiar with, the skills and competencies being measured by the actual certification exam.']
Name: desc+aud, dtype: object
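As the output shows, converting the scraped lists with `str()` leaves `['`, `']` and Python escape sequences in the text, and the plain `+` glues the last description word to the first audience word (`...publications.']['Individuals`). Those fragments then become Doc2Vec tokens. A minimal sketch of a cleaner conversion, assuming it is applied to the raw scraped columns before the `str()` step; `to_text` and `combine` are hypothetical helper names.

```python
import math

def to_text(cell):
    """Turn a scraped cell (list of strings, plain string, or missing) into text."""
    if isinstance(cell, (list, tuple)):
        return " ".join(str(part) for part in cell)
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return ""
    return str(cell)

def combine(description, audience):
    # A space keeps the last description word and the first audience word apart.
    return (to_text(description) + " " + to_text(audience)).strip()

row_desc = ["Skillsoft Mentors are available to help students."]
row_aud = ["Individuals who are studying the associated content."]
print(combine(row_desc, row_aud))
```

On the DataFrame this would look like `[combine(d, a) for d, a in zip(df['Course Description'], df['Target Audience'])]`.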

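### Modeling Doc2Vec on the whole data
We train one Doc2Vec model on the combined description and target audience of every course. Because the model can infer a vector for any list of words, it could also embed a keyword query and compare that vector with all the document vectors to find the highest cosine similarity; here we instead infer a vector for every course and compare the courses with each other.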
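The keyword-query idea can be sketched as a top-k cosine search. In practice the query vector would come from `model.infer_vector(list_of_tokens)` and the document vectors from `X` below; here small hand-made vectors stand in for both, and `top_k_similar` is a hypothetical helper. Note that `infer_vector` is stochastic, so scores for a real query can shift a little between runs.

```python
import numpy as np

def top_k_similar(query_vec, doc_vecs, k=3):
    """Indices and cosine scores of the k document vectors closest to query_vec."""
    docs = np.asarray(doc_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    doc_norms = np.linalg.norm(docs, axis=1)
    q_norm = np.linalg.norm(q)
    # The small epsilon avoids division by zero for all-zero vectors.
    scores = docs.dot(q) / (doc_norms * q_norm + 1e-12)
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

doc_vecs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.0, 2.0])
print(top_k_similar(query, doc_vecs, k=2))
```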

In [10]:
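def Labeled(s, l):
    """Wrap each document as a LabeledSentence: (list of words, [tag])."""
    sentences = []
    for i, doc in enumerate(s):
        # whitespace tokenisation: punctuation and markup stay attached to words
        sentences.append(LabeledSentence(utils.to_unicode(doc).split(), [l[i]]))
    return sentences

# one tag per course (its row position); take the count from df instead of hard-coding it
sentences_all = Labeled(df['desc+aud'], range(len(df)))

# gensim < 4.0 API: in gensim 4.x `size` is `vector_size`, `model.iter` is `model.epochs`,
# and LabeledSentence has been replaced by TaggedDocument
model = Doc2Vec(min_count=1, window=10, size=50, sample=1e-4, negative=5, workers=7)  # size=128
model.build_vocab(sentences_all)
model.train(sentences_all, total_examples=model.corpus_count, epochs=model.iter)

# infer a vector for every course from its own words
# (inference is stochastic, so vectors vary slightly between runs)
X = [model.infer_vector(sentence.words) for sentence in sentences_all]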
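`utils.to_unicode(doc).split()` splits only on whitespace, so `Analysis.` and `analysis` count as different words, and markup such as `<br/>` stays glued to its neighbours. A minimal regex tokenizer sketch that lowercases, strips HTML tags and keeps alphanumeric tokens; `tokenize` and `TOKEN_RE` are hypothetical, and gensim's own `gensim.utils.simple_preprocess` is a library alternative (with its own length limits on tokens).

```python
import re

# Words/numbers, optionally followed by an apostrophe suffix such as "don't".
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

def tokenize(text):
    """Lowercase, drop HTML tags, and keep alphanumeric word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # e.g. the <br/> tags in the scraped text
    return TOKEN_RE.findall(text.lower())

print(tokenize("Business Analysis.<br/><br/>* Aligned to ITIL 2011!"))
```

To use it, `Labeled` would build each `LabeledSentence` from `tokenize(doc)` instead of `utils.to_unicode(doc).split()`.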

## cosine similarity matrix

In [11]:
# n x n matrix: cosine similarity of every course with every other course
cosine_similarity(X).shape

(1023, 1023)
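Before choosing a cut-off, it can help to look at how the off-diagonal scores are spread. In the results further down, even unrelated courses such as "Securitization and Asset-backed Securities" and "Commodity and Energy Markets and Derivatives" score above 0.9999, so a fixed threshold like `> .9` may not separate good matches from poor ones. A minimal sketch of that check; `off_diagonal_summary` and the toy matrix are hypothetical, and on the real data it would be called as `off_diagonal_summary(cosine_similarity(X))`.

```python
import numpy as np

def off_diagonal_summary(sim):
    """5th, 50th and 95th percentiles of pairwise scores, ignoring self-similarity."""
    sim = np.asarray(sim, dtype=float)
    mask = ~np.eye(sim.shape[0], dtype=bool)  # True everywhere except the diagonal
    values = sim[mask]
    return {p: float(np.percentile(values, p)) for p in (5, 50, 95)}

toy = np.array([[1.0, 0.2, 0.9],
                [0.2, 1.0, 0.4],
                [0.9, 0.4, 1.0]])
print(off_diagonal_summary(toy))
```

If the median is already close to 1, the scores are not very discriminative and a threshold near 0.9 will keep almost everything.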

In [12]:
df.tail(1)

Unnamed: 0,index,language,solution area,curriculum,series,course title,course#,course url,asset type,estimated duration hours,skillport,cd,replaces,Course Description,Target Audience,desc+aud
1022,1022,English,SALES and CUSTOMER FACING SKILLS,Mentoring Assets,Mentoring Assets,Mentoring ITIL Foundation,mntitv3f,http://library.skillport.com/coursedesc/mntitv3f/summary.htm,SkillSoft Mentoring Assets,,Released,,,['Skillsoft Mentors are available to help students with their studies for the ITIL Foundation exam. You can reach them by entering a Mentored Chat Room or by using the Email My Mentor service.<br/><br/>* This asset is aligned to the ITIL 2011 Edition publications.'],"['Individuals who are studying the associated Skillsoft content in preparation for, or to become familiar with, the skills and competencies being measured by the actual certification exam.']","['Skillsoft Mentors are available to help students with their studies for the ITIL Foundation exam. You can reach them by entering a Mentored Chat Room or by using the Email My Mentor service.<br/><br/>* This asset is aligned to the ITIL 2011 Edition publications.']['Individuals who are studying the associated Skillsoft content in preparation for, or to become familiar with, the skills and competencies being measured by the actual certification exam.']"
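The next cell picks, for each course, the single most similar other course. A minimal numpy sketch of that step: self-matches are masked with `-inf` (cosine similarity can be negative, so 0 is not a safe mask), and because the result table lists every pair twice (e.g. 809 to 803 and 803 to 809), a second helper keeps each unordered pair once. `best_matches`, `unique_pairs` and the toy matrix are hypothetical.

```python
import numpy as np

def best_matches(sim):
    """For each row, the index of the most similar *other* row and its score."""
    sim = np.array(sim, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(sim, -np.inf)    # a course must not match itself
    best = sim.argmax(axis=1)
    scores = sim[np.arange(len(sim)), best]
    return best, scores

def unique_pairs(best, scores):
    """Keep each unordered pair once (i->j and j->i collapse), highest score first."""
    seen = {}
    for i, (j, s) in enumerate(zip(best, scores)):
        key = (min(i, int(j)), max(i, int(j)))
        seen[key] = max(float(s), seen.get(key, -np.inf))
    return sorted(seen.items(), key=lambda kv: kv[1], reverse=True)

sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.3],
                [0.1, 0.3, 1.0]])
best, scores = best_matches(sim)
print(best.tolist(), scores.tolist())
print(unique_pairs(best, scores))
```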


In [17]:
similarity_matrix = cosine_similarity(X)

# exclude self-matches; use -inf rather than 0 because cosine similarity can be negative
np.fill_diagonal(similarity_matrix, -np.inf)

# best match for every course: its score and row position
df['scores'] = similarity_matrix.max(axis=1)
df['similar_to'] = similarity_matrix.argmax(axis=1)

# look up the description and title of each best match by row position
df['description_of_similar_to'] = df['Course Description'].values[df['similar_to'].values]
df['titles_of_similar_to'] = df['course title'].values[df['similar_to'].values]

pd.set_option('display.max_rows', 500)

# sort_values replaces the deprecated DataFrame.sort
similar_courses = df[['course title', 'desc+aud', 'scores', 'similar_to', 'titles_of_similar_to', 'description_of_similar_to']].sort_values('scores', ascending=False)
similar_courses[similar_courses['scores'] > .9].head(50)

Unnamed: 0,course title,desc+aud,scores,similar_to,titles_of_similar_to,description_of_similar_to
809,Core PMI? Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI\xc2\xae Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. This course will assist in preparing the learner for the PMI\xc2\xae certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK\xc2\xae Guide) \xe2\x80\x93 Fifth Edition, published by PMI\xc2\xae, Inc., 2013. Copyright and all rights reserved. 
Material from this publication has been reproduced with the permission of PMI\xc2\xae.']['Existing project managers wishing to get certified in recognition of their skills and experience, or others who wish to train to become accredited project managers.']",0.999989,803,Core PMI? Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI\xc2\xae Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. 
This course will assist in preparing the learner for the PMI\xc2\xae certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK\xc2\xae Guide) \xe2\x80\x93 Fifth Edition, published by PMI\xc2\xae, Inc., 2013. Copyright and all rights reserved. Material from this publication has been reproduced with the permission of PMI\xc2\xae.']"
803,Core PMI? Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI\xc2\xae Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. This course will assist in preparing the learner for the PMI\xc2\xae certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK\xc2\xae Guide) \xe2\x80\x93 Fifth Edition, published by PMI\xc2\xae, Inc., 2013. Copyright and all rights reserved. 
Material from this publication has been reproduced with the permission of PMI\xc2\xae.']['Existing project managers wishing to get certified in recognition of their skills and experience, or others who wish to train to become accredited project managers.']",0.999989,809,Core PMI? Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI\xc2\xae Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. 
This course will assist in preparing the learner for the PMI\xc2\xae certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK\xc2\xae Guide) \xe2\x80\x93 Fifth Edition, published by PMI\xc2\xae, Inc., 2013. Copyright and all rights reserved. Material from this publication has been reproduced with the permission of PMI\xc2\xae.']"
120,Effective Critical Analysis of Business Reports,"['Effective decision making requires sound analytics. This impact explores the pitfalls of basing decisions on faulty logic.']['Students preparing to enter the workforce, entry level employees who have just entered the workforce and mid-level employees looking to refresh their skills.']",0.999951,155,Effective Critical Analysis of Business Reports,['Effective decision making requires sound analytics. This impact explores the pitfalls of basing decisions on faulty logic.']
155,Effective Critical Analysis of Business Reports,"['Effective decision making requires sound analytics. This impact explores the pitfalls of basing decisions on faulty logic.']['Students preparing to enter the workforce, entry level employees who have just entered the workforce and mid-level employees looking to refresh their skills.']",0.999951,120,Effective Critical Analysis of Business Reports,['Effective decision making requires sound analytics. This impact explores the pitfalls of basing decisions on faulty logic.']
267,Securitization and Asset-backed Securities,"[""Securitization has become one of the leading tools used by banks and other financial institutions to manage their balance sheet by transferring assets off the balance sheet \xe2\x80\x93 typically loans. The securitization process and the products involved are highly complex financial tools and transactions, which can limit investors' ability to monitor and manage risk. Securitization became a widespread practice in the 1970s, although examples of it can be found much earlier. It has experienced immense growth globally, by some estimates to around $13 trillion. Securitizations involving mortgages and other assets, such as credit card receivables, housing and auto loans, airline receivables, and student loans, have become common place. Securitization brings many advantages to the issuer in the form of lower funding costs, reduction in asset and liability mismatching, as well as lower capital requirements and transfer of credit risk. This course introduces the concept of securitization and important aspects relating to it, such as the role of a special purpose vehicle and note tranching. It examines mortgage-backed securities (MBS) and the major risks faced by investors. Different asset-backed security (ABS) structures, which include securities backed by credit card receivables, home equity loans, auto loans, and student loans, are presented at a high level. 
Then, the course presents the structure of collateralized debt obligations (CDOs) that include collateralized bond obligations (CBOs) and collateralized loan obligations (CLO), as well as how regular CDOs and Synthetic CDOs are used for arbitrage and balance sheet transactions.""]['Financial services professionals, consultants, and sales professionals interested in providing or selling products and services to banks, investment companies, and other financial corporations, and everyone interested in creation and use of credit derivative instruments']",0.999945,229,Commodity and Energy Markets and Derivatives,"['The commodities and energy markets are generally considered to be some of the largest markets in existence. Every day billions of dollars of commodities such as oil, coal, gold, and sugar are bought and sold between various traders acting on behalf of producers, manufacturers, and financial institutions. The derivatives market has seen explosive growth around these markets over the decades. Although traditionally used by farmers and other manufacturers to protect themselves against price movements that would affect their profitability, derivatives are now widely used by many financial institutions as a way to profit on speculation and arbitrage. Financial institutions, such as banks and hedge funds, are profit seekers by their inherent nature and are willing to take risks in return for these profits. Hedgers, on the other hand, look to protect themselves by removing risk or hedging. These two different points of view complement each other and have allowed the market to grow rapidly and become quite liquid\xe2\x80\x94 for every hedger there is almost always a speculator willing to take the opposite side of the trade. This course covers the fundamental characteristics of the commodities and energy markets. 
We examine the types of underlying products that are traded, such as various metals and energy producing chemicals, and we also examine what they are used for. We then identify various financial contracts known as derivatives whereby these products may be bought or sold at a predetermined price, in specific quantities, and for delivery at a future date. The course focuses on the features of forward and futures commodity contracts and swap and option commodity contracts and their uses for hedging, speculation, and arbitrage.']"
229,Commodity and Energy Markets and Derivatives,"['The commodities and energy markets are generally considered to be some of the largest markets in existence. Every day billions of dollars of commodities such as oil, coal, gold, and sugar are bought and sold between various traders acting on behalf of producers, manufacturers, and financial institutions. The derivatives market has seen explosive growth around these markets over the decades. Although traditionally used by farmers and other manufacturers to protect themselves against price movements that would affect their profitability, derivatives are now widely used by many financial institutions as a way to profit on speculation and arbitrage. Financial institutions, such as banks and hedge funds, are profit seekers by their inherent nature and are willing to take risks in return for these profits. Hedgers, on the other hand, look to protect themselves by removing risk or hedging. These two different points of view complement each other and have allowed the market to grow rapidly and become quite liquid\xe2\x80\x94 for every hedger there is almost always a speculator willing to take the opposite side of the trade. This course covers the fundamental characteristics of the commodities and energy markets. We examine the types of underlying products that are traded, such as various metals and energy producing chemicals, and we also examine what they are used for. We then identify various financial contracts known as derivatives whereby these products may be bought or sold at a predetermined price, in specific quantities, and for delivery at a future date. 
The course focuses on the features of forward and futures commodity contracts and swap and option commodity contracts and their uses for hedging, speculation, and arbitrage.']['Financial services professionals, consultants, sales professionals interested in providing or selling products and services to banks, investment companies, other financial corporations, and everyone interested in understanding commodity and financial futures and forwards']",0.999945,267,Securitization and Asset-backed Securities,"[""Securitization has become one of the leading tools used by banks and other financial institutions to manage their balance sheet by transferring assets off the balance sheet \xe2\x80\x93 typically loans. The securitization process and the products involved are highly complex financial tools and transactions, which can limit investors' ability to monitor and manage risk. Securitization became a widespread practice in the 1970s, although examples of it can be found much earlier. It has experienced immense growth globally, by some estimates to around $13 trillion. Securitizations involving mortgages and other assets, such as credit card receivables, housing and auto loans, airline receivables, and student loans, have become common place. Securitization brings many advantages to the issuer in the form of lower funding costs, reduction in asset and liability mismatching, as well as lower capital requirements and transfer of credit risk. This course introduces the concept of securitization and important aspects relating to it, such as the role of a special purpose vehicle and note tranching. It examines mortgage-backed securities (MBS) and the major risks faced by investors. Different asset-backed security (ABS) structures, which include securities backed by credit card receivables, home equity loans, auto loans, and student loans, are presented at a high level. 
Then, the course presents the structure of collateralized debt obligations (CDOs) that include collateralized bond obligations (CBOs) and collateralized loan obligations (CLO), as well as how regular CDOs and Synthetic CDOs are used for arbitrage and balance sheet transactions.""]"
241,Anti-money Laundering and Global Initiatives,"['Money laundering is an illegal process of using legitimate means to conceal the true source of funds that are acquired through illegal activities, including drug trafficking and terrorism. This activity has grown and become more sophisticated as it keeps pace with modern technology, leaving authorities world-wide with the difficult task of trying to stay one step ahead of the criminals. Various unreliable estimates have been produced to measure this practice globally, with some estimates quoting sums as large as percentage points of the global economy. Because of this, countries across the world have created joint programs to combat and prevent money laundering. This course introduces the concept of money laundering and identifies the different types, enabling circumstances and common indicators for this activity. It also acknowledges global initiatives such as the Financial Action Task Force, the Basel Committee, and the International Money Laundering Information Network, among others, that set recommended actions to prevent and combat money laundering. The course then discusses the Customer Identification Program under the US Patriot Act.']['Financial services professionals, consultants, and sales professionals interested in providing or selling products and services to fund managers, insurance companies, and banks, and everyone interested in knowing about banking supervision and anti-money laundering regulations.']",0.999937,229,Commodity and Energy Markets and Derivatives,"['The commodities and energy markets are generally considered to be some of the largest markets in existence. Every day billions of dollars of commodities such as oil, coal, gold, and sugar are bought and sold between various traders acting on behalf of producers, manufacturers, and financial institutions. The derivatives market has seen explosive growth around these markets over the decades. 
Although traditionally used by farmers and other manufacturers to protect themselves against price movements that would affect their profitability, derivatives are now widely used by many financial institutions as a way to profit on speculation and arbitrage. Financial institutions, such as banks and hedge funds, are profit seekers by their inherent nature and are willing to take risks in return for these profits. Hedgers, on the other hand, look to protect themselves by removing risk or hedging. These two different points of view complement each other and have allowed the market to grow rapidly and become quite liquid\xe2\x80\x94 for every hedger there is almost always a speculator willing to take the opposite side of the trade. This course covers the fundamental characteristics of the commodities and energy markets. We examine the types of underlying products that are traded, such as various metals and energy producing chemicals, and we also examine what they are used for. We then identify various financial contracts known as derivatives whereby these products may be bought or sold at a predetermined price, in specific quantities, and for delivery at a future date. The course focuses on the features of forward and futures commodity contracts and swap and option commodity contracts and their uses for hedging, speculation, and arbitrage.']"
268,Credit-linked and Repackaged Notes,"[""The structured derivatives market gained traction in the early 2000's due to its ability to convert security features, primarily cash flows and maturity, to meet investors' specific needs. This market is composed of complicated combinations of securities and derivatives, allowing for a large degree of flexibility to cater to investor demands. Because potential risks have also become increasingly complicated to identify and measure, it is important for individuals involved in these structured deals to have a good grasp of the dynamics of various transactions and how they may be altered when packaged with others. This course provides an introduction to the basics of credit-linked notes (CLN) and their variations and structures, including total return swap-based CLNs and default-based CLNs. The course also provides an overview of repackaged notes and other synthetic structures dealing with the packaging of derivatives with assets such as synthetic bonds, and callable and puttable asset swaps.""]['Financial services professionals, consultants, and sales professionals interested in providing or selling products and services to banks, investment companies, and other financial corporations, and everyone interested in the creation and use of credit derivative instruments']",0.999928,241,Anti-money Laundering and Global Initiatives,"['Money laundering is an illegal process of using legitimate means to conceal the true source of funds that are acquired through illegal activities, including drug trafficking and terrorism. This activity has grown and become more sophisticated as it keeps pace with modern technology, leaving authorities world-wide with the difficult task of trying to stay one step ahead of the criminals. Various unreliable estimates have been produced to measure this practice globally, with some estimates quoting sums as large as percentage points of the global economy. 
Because of this, countries across the world have created joint programs to combat and prevent money laundering. This course introduces the concept of money laundering and identifies the different types, enabling circumstances and common indicators for this activity. It also acknowledges global initiatives such as the Financial Action Task Force, the Basel Committee, and the International Money Laundering Information Network, among others, that set recommended actions to prevent and combat money laundering. The course then discusses the Customer Identification Program under the US Patriot Act.']"
755,Core PMI® Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI® Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. This course will assist in preparing the learner for the PMI® certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK® Guide) – Fifth Edition, published by PMI®, Inc., 2013. Copyright and all rights reserved. 
Material from this publication has been reproduced with the permission of PMI®.']['Existing project managers wishing to get certified in recognition of their skills and experience, or others who wish to train to become accredited project managers.']",0.999928,809,Core PMI® Values and Ethical Standards,"['As a project manager, you will inevitably be called upon to address ethical dilemmas. The type and complexity of these dilemmas can vary significantly from balancing the competing interests of stakeholders to adhering to conflicting legal, multi-cultural, and multi-national rules, regulations, and requirements. Addressing these issues is much more complex than simply deciding what is right and what is wrong. In an increasingly global network, project managers must proactively seek to understand cultural diversity, and how to work successfully with multi-national teams. Sensitivity to other groups, their social customs, and their means of doing business is key to success. Often, project managers will need to weigh all competing interests fairly and objectively in order to make the ethical decision that will have the most far-reaching benefits. In this course, learners will explore the values underlying ethical decisions and behaviors as outlined in the PMI® Code of Ethics and Professional Conduct. For each value, learners will be introduced to the integrity aspired to, as well as the mandatory conduct demanded of project managers to effectively manage projects and further promote project management as a profession. Topics covered include the behaviors that align with the core values of responsibility, respect, honesty, and fairness; how to integrate ethics into your project environments; and how to resolve ethical dilemmas. The course provides a foundational knowledge base reflecting the most up-to-date project management information so learners can effectively put principles to work at their own organizations. 
This course will assist in preparing the learner for the PMI® certification exam. This course is aligned with A Guide to the Project Management Body of Knowledge (PMBOK® Guide) – Fifth Edition, published by PMI®, Inc., 2013. Copyright and all rights reserved. Material from this publication has been reproduced with the permission of PMI®.']"
1009,The Federal Government Industry Overview: Version 4,"[""The federal government industry is the world's largest service provider with primary responsibility to provide essential services to its citizens funded through its collection of taxes. A federal government is usually comprised of a host of governmental departments and various stakeholders, which can include, but are not limited to, individual states, provinces, and territories. A federal government delivers services, creates and enforces laws, maintains highways, collects taxes, defends national sovereignty, encourages investment, provides education, oversees environmental issues, and performs a host of other services to its citizens. Different nations, and their respective federal governments, all adopt strategies to overcome the challenges they face that can stagnate services, overextend budgets, cause intradepartmental barriers, and impede hiring the best employees. Federal governments are continually driven to find solutions through improving service delivery, reducing costs, opening departmental barriers, and attracting a talented workforce. This course is designed to help learners understand key concepts, terminology, issues, and challenges associated with the federal government industry, and strategies employed to meet some of those challenges. It will identify the main sectors of the federal government industry and its business drivers, and review the key aspects of the federal government business model, its competitive environment, and the current trends in government services. Finally, this course outlines some key challenges that this industry is facing and presents common strategies that the players in the industry are adopting to overcome challenges. 
This course was updated in 2015.""]['Consulting houses, industry investors, and all size companies that sell products or services to other sectors and industries; and organizations looking for knowledge and key business information in the federal government industry']",0.999921,267,Securitization and Asset-backed Securities,"[""Securitization has become one of the leading tools used by banks and other financial institutions to manage their balance sheet by transferring assets off the balance sheet – typically loans. The securitization process and the products involved are highly complex financial tools and transactions, which can limit investors' ability to monitor and manage risk. Securitization became a widespread practice in the 1970s, although examples of it can be found much earlier. It has experienced immense growth globally, by some estimates to around $13 trillion. Securitizations involving mortgages and other assets, such as credit card receivables, housing and auto loans, airline receivables, and student loans, have become common place. Securitization brings many advantages to the issuer in the form of lower funding costs, reduction in asset and liability mismatching, as well as lower capital requirements and transfer of credit risk. This course introduces the concept of securitization and important aspects relating to it, such as the role of a special purpose vehicle and note tranching. It examines mortgage-backed securities (MBS) and the major risks faced by investors. Different asset-backed security (ABS) structures, which include securities backed by credit card receivables, home equity loans, auto loans, and student loans, are presented at a high level. Then, the course presents the structure of collateralized debt obligations (CDOs) that include collateralized bond obligations (CBOs) and collateralized loan obligations (CLO), as well as how regular CDOs and Synthetic CDOs are used for arbitrage and balance sheet transactions.""]"


## Semantic comparison of the course descriptions doesn't work well for our purposes.

In [16]:
# # Let's try doc2vec on only the course title.
# # Assumes the `Labeled` helper from an earlier cell, which yields one
# # labeled document (with a `.words` attribute) per title.
# from sklearn.metrics.pairwise import cosine_similarity

# sentences_all = Labeled(df['course title'], range(len(df)))
# model = Doc2Vec(min_count=1, window=10, vector_size=50, sample=1e-4, negative=5, workers=7)  # vector_size=128
# model.build_vocab(sentences_all)
# # newer gensim releases require the corpus size and epoch count to be passed explicitly
# model.train(sentences_all, total_examples=model.corpus_count, epochs=model.epochs)

# # infer a vector for every title
# X = [model.infer_vector(doc.words) for doc in sentences_all]

# similarity_matrix = cosine_similarity(X)

# # blank out the diagonal so no course is matched with itself
# # (-inf rather than 0, because cosine similarities can be negative)
# np.fill_diagonal(similarity_matrix, -np.inf)
# df['scores'] = similarity_matrix.max(axis=1)
# df['similar_to'] = similarity_matrix.argmax(axis=1)

# # look up the title and description of each best match by position
# df['description_of_similar_to'] = df['Course Description'].iloc[df['similar_to']].values
# df['titles_of_similar_to'] = df['course title'].iloc[df['similar_to']].values
# pd.set_option('display.max_rows', 500)

# similar_courses = df[['course title', 'desc+aud', 'scores', 'similar_to', 'titles_of_similar_to', 'description_of_similar_to']].sort_values('scores', ascending=False)
# similar_courses[similar_courses['scores'] > .9].head(5)
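The best-match step in the cell above (cosine similarity matrix, blanked diagonal, row-wise max and argmax, then a 0.9 cut-off) can be sketched in plain NumPy. This is a minimal illustration, not the notebook's own code: `best_matches` and `rank_above` are hypothetical helper names, and the toy vectors stand in for real doc2vec output. It fills the diagonal with `-inf` so that a document can never be returned as its own match, even when all of its other similarities are negative. It avoids the `@` operator so it also runs under the Python 2.7 kernel used here.

```python
import numpy as np


def best_matches(X):
    """For each row of X, return (score, index) of its most similar other row.

    X is an (n_docs, dim) array of document vectors, e.g. inferred doc2vec
    vectors (hypothetical helper, for illustration only).
    """
    X = np.asarray(X, dtype=float)
    # L2-normalize each row so a dot product equals cosine similarity;
    # all-zero rows are left as zeros instead of dividing by zero
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = X / norms
    sim = unit.dot(unit.T)
    # a document is always most similar to itself, so exclude the diagonal;
    # -inf (not 0) keeps this correct when every other similarity is negative
    np.fill_diagonal(sim, -np.inf)
    return sim.max(axis=1), sim.argmax(axis=1)


def rank_above(scores, threshold=0.9):
    """Indices whose best-match score exceeds threshold, highest score first."""
    order = np.argsort(-scores, kind='mergesort')  # stable for ties
    return [int(i) for i in order if scores[i] > threshold]


# toy vectors: rows 0 and 1 point in nearly the same direction
scores, similar_to = best_matches([[1, 0], [0.9, 0.1], [0, 1]])
print(similar_to.tolist())  # [1, 0, 1]
print(rank_above(scores))
```

For real data, `X` would be the list of inferred vectors; `sklearn.metrics.pairwise.cosine_similarity(X)` produces the same matrix as `unit.dot(unit.T)` before the diagonal is blanked.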