### Text Analysis Procedure

The text analysis identifies the words used to describe the business functions of the listed Google acquisitions and compares them with the acquisitions of similar companies (e.g., Facebook, Apple). The result is a bag-of-words model that is used to examine the profiles of the selected public companies and to compute a similarity index, which serves as a proxy for the probability that Alphabet would acquire each company.
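
As a rough illustration of the idea, the sketch below builds word counts from a few acquired-company descriptions and scores candidate profiles against them. It assumes cosine similarity as the similarity index, which is only one possible choice; the function names, tickers, and descriptions are all hypothetical and are not taken from the project data.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical descriptions of past acquisitions (invented for illustration).
acquired_profiles = [
    "cloud software platform for data analytics",
    "mobile advertising and video platform",
]

# Pool the word counts of all acquired companies into one reference vector.
reference = Counter()
for text in acquired_profiles:
    reference.update(bag_of_words(text))

# Hypothetical candidate profiles keyed by placeholder tickers.
candidates = {
    "AAA": "cloud platform for video analytics",
    "BBB": "hydraulic fracturing services for oil producers",
}
scores = {
    ticker: cosine_similarity(bag_of_words(description), reference)
    for ticker, description in candidates.items()
}
```

A score near 1 means the candidate's vocabulary closely matches that of the acquired companies. Because common words such as "for" also contribute, a fuller version would likely remove stop words before counting.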

Following this, the IPO list will be divided among the leaf nodes identified in the decision tree model. The average probability is then calculated for each leaf, and the top 10 companies are shortlisted as the most likely to be acquired by Google under this model.
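
The grouping and ranking step could look something like the sketch below. The text does not spell out how the shortlist is drawn from the leaf averages, so this assumes companies are ranked first by their leaf's average score and then by their own score. The rows, leaf ids, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (ticker, leaf_id, similarity_score). Values are invented.
rows = [
    ("AAA", 3, 0.62),
    ("BBB", 3, 0.18),
    ("CCC", 7, 0.40),
    ("DDD", 7, 0.44),
    ("EEE", 9, 0.05),
]


def leaf_averages(rows):
    """Average the similarity scores of the companies in each leaf node."""
    totals = defaultdict(lambda: [0.0, 0])  # leaf_id -> [score sum, company count]
    for _, leaf, score in rows:
        totals[leaf][0] += score
        totals[leaf][1] += 1
    return {leaf: total / count for leaf, (total, count) in totals.items()}


def shortlist(rows, top_n=10):
    """Rank tickers by their leaf's average score, then by their own score."""
    averages = leaf_averages(rows)
    ranked = sorted(rows, key=lambda row: (averages[row[1]], row[2]), reverse=True)
    return [ticker for ticker, _, _ in ranked[:top_n]]


top_three = shortlist(rows, top_n=3)
```

In the notebook itself the same grouping would more naturally be a pandas `groupby` on the leaf column; plain Python is used here only to keep the sketch dependency-free.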

In [1]:
conda install -c anaconda beautifulsoup4

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [2]:
import bs4
import requests
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as bsoup

In [3]:
import numpy as np
import pandas as pd

### Data Cleaning & Scraping 

The retrieved dataset of public companies has some faults. Some of these firms are no longer public; others are listed on less mainstream stock exchanges, which makes retrieving further data on them more difficult (this project focuses on the NYSE and the NASDAQ Global Select, the highest tier of the NASDAQ). Cleaning this dataset is therefore the first step before we can obtain more information.

In [6]:
ipo_links = pd.read_csv('Final Project/IPO Calendar.csv')
# .copy() makes ipos_list an independent DataFrame, so the assignments below
# modify it directly rather than a slice of ipo_links.
ipos_list = ipo_links[ipo_links['Exchange'].isin(["NASDAQ Global Select", "NYSE"])].copy()

# Updated tickers, keyed by the row label from the original CSV.
updated_tickers = {
    12: "PACK", 16: "NEX", 82: "RMG", 98: "LHC", 119: "SHLL", 143: "TRNE",
    193: "RPLA", 204: "ALTG", 290: "MGY", 332: "IR", 397: "GIX", 403: "VRT",
    414: "FPAC", 437: "RPAY", 492: "CCX", 495: "ACEL", 500: "NFH", 526: "PIC",
    527: "SCPE", 545: "OAC", 595: "SBE", 654: "SPAQ", 670: "NSCO", 679: "MFAC",
    698: "SPCE", 723: "BEST", 784: "CCH", 829: "GLEO", 842: "VVNT", 914: "JIH",
    950: "LGC", 990: "KLR",
}
# .loc sets each value in a single indexing step, unlike ipos_list['Symbol'][label].
for row_label, ticker in updated_tickers.items():
    ipos_list.loc[row_label, 'Symbol'] = ticker

# Rows to drop: the company went private, went bankrupt, or is not on the same exchange.
removed_rows = [27, 28, 48, 49, 54, 117, 135, 149, 208, 237, 281, 287, 486, 602, 626, 683, 759, 763, 790, 821, 854, 879, 889, 898, 923, 937, 973, 979, 987, 1007, 1011]
ipos_list = ipos_list.drop(removed_rows, axis=0)
ipos_list.reset_index(drop=True, inplace=True)
ipos_list


Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions
0,ILPT,Industrial Logistics Properties Trust,NASDAQ Global Select,24.00,20000000,01/12/2018,480000000,Priced
1,LBRT,Liberty Oilfield Services Inc.,NYSE,17.00,12731092,01/12/2018,216428564,Priced
2,PACK,Ranpak Holdings Corp.,NYSE,10.00,30000000,01/18/2018,300000000,Priced
3,NINE,"Nine Energy Service, Inc.",NYSE,23.00,7000000,01/19/2018,161000000,Priced
4,ADT,ADT Inc.,NYSE,14.00,105000000,01/19/2018,1470000000,Priced
...,...,...,...,...,...,...,...,...
394,CASA,Casa Systems Inc,NASDAQ Global Select,13.00,6000000,12/15/2017,78000000,Priced
395,NMRK,"NEWMARK GROUP, INC.",NASDAQ Global Select,14.00,20000000,12/15/2017,280000000,Priced
396,TRVG,trivago N.V.,NASDAQ Global Select,11.00,26110118,12/16/2016,287211298,Priced
397,AMTB,Amerant Bancorp Inc.,NASDAQ Global Select,13.00,6300000,12/19/2018,81900000,Priced


In [176]:
profile_file = "profiles.csv"

def ipo_profile(ticker):
    """Return the description from the ticker's Yahoo Finance profile page, or "" if it is missing."""
    ipo1_url = f'https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}'
    ipo_Client1 = uReq(ipo1_url)
    ipo1_html = ipo_Client1.read()
    ipo_Client1.close()

    ipo1_soup = bsoup(ipo1_html, "html.parser")
    ipo1_prfl = ipo1_soup.find('p', {'class':'Mt(15px) Lh(1.6)'})
    # find() returns None when no matching <p> exists, so check before reading .text
    if ipo1_prfl is None:
        return ""
    return ipo1_prfl.text

symbols = ipos_list['Symbol']
# The with block closes the file even if a request fails partway through.
with open(profile_file, "w") as f:
    f.write("Description\n")
    for symbol in symbols:
        #print(ipo_profile(symbol))
        # Commas are stripped so each description stays in a single CSV field.
        f.write(ipo_profile(symbol).replace(",", "") + "\n")


ILPT is a real estate investment trust, or REIT, that owns and leases industrial and logistics properties throughout the United States. ILPT is managed by the operating subsidiary of The RMR Group Inc. (Nasdaq: RMR), an alternative asset management company that is headquartered in Newton, MA.
Liberty Oilfield Services Inc. provides hydraulic fracturing services to onshore oil and natural gas exploration and production companies in North America. The company offers its services primarily in the Permian Basin, the Eagle Ford Shale, the Denver-Julesburg Basin, the Williston Basin, and the Powder River Basin. Liberty Oilfield Services Inc. was founded in 2011 and is headquartered in Denver, Colorado.
Ranpak Holdings Corp. and its subsidiaries provide product protection solutions for e-commerce and industrial supply chains. The company manufactures and assembles proprietary protective systems that convert kraft paper into a range of packaging and cushioning products to address its customers

Gates Industrial Corporation plc manufactures and sells engineered power transmission and fluid power solutions worldwide. The company offers synchronous or asynchronous belts, including V-belts, CVT belts, and Micro-V belts, as well as related components, such as sprockets, pulleys, water pumps, tensioners, or other accessories; and metal drive components and kits for stationary and mobile drives, engine systems, personal mobility, and vertical lifts application platforms. It also provides fluid power products comprising hydraulic hoses, tubing, and fittings, as well as assemblies; and engine and industrial hoses for use in stationary hydraulics, mobile hydraulics, engine systems, and other industrial applications. The company serves various end markets comprising the construction, agriculture, energy, automotive, transportation, general industrial, consumer products, and others. It sells its engineered products under the Gates brand. The company offers its products to replacement cha

Laureate Education, Inc., together with its subsidiaries, provides higher education programs and services to students through a network of universities and higher education institutions. It offers a range of undergraduate and graduate degree programs primarily in the areas of business and management, medicine and health sciences, and engineering and information technology through campus-based, online, and hybrid programs. The company also operates online institutions that offer professional degree programs primarily for the working adults with undergraduate and graduate degree programs. It provides its services in Brazil, Mexico, Chile, Peru, Australia, China, New Zealand, the United Kingdom, and the United States. The company was formerly known as Sylvan Learning Systems, Inc. and changed its name to Laureate Education, Inc. in May 2004. The company was founded in 1989 and is headquartered in Baltimore, Maryland.
Invitation Homes is the nation's premier single-family home leasing comp

Huami Corporation, a biometric and activity data-driven company, develops, manufactures, and sells smart wearable technological devices in the People's Republic of China. It operates through two segments, Xiaomi Wearable Products, and Self-Branded Products and Others. The company offers smart bands, watches, and scales; and a range of accessories, including bands, watch straps, earphones, sportswear, home gym, treadmill, etc. under the Xiaomi and Amazfit brands. It provides charts and graphs to display analysis of the activity and biometric data collected from users through its Mi Fit and Amazfit mobile apps. Huami Corporation has strategic collaborations with Timex Group to develop smart watches; and AliveCor, Inc. to deliver a medical functionality to wearable devices. The company was founded in 2013 and is headquartered in Hefei, the People's Republic of China.
Gossamer Bio, Inc., a clinical-stage biopharmaceutical company, focuses on discovering, acquiring, developing, and commerci

Hamilton Lane Incorporated is an investment firm specializing in direct and fund of fund investments. It provides following services: separate accounts (customized to each individual client and structured as single client vehicles); specialized strategies (fund-of-funds, secondaries, co-investments, taft-hartley, distribution management); advisory relationships (including due diligence, strategic portfolio planning, monitoring and reporting services); and reporting and analytics solutions. For direct investments, the firm invests in mid and late venture, mature companies, growth equity, emerging growth, distressed debt, later stage, turnarounds, bridge financing, mezzanine financing, and buyouts in middle market companies. For fund of fund investments, it invests in mezzanine, venture capital, private equity, turnaround, secondary investments, real estate, and special situation funds. The firm invests in real estate investments. It also invest in technology, healthcare, education, natu

ProPetro Holding Corp., an oilfield services company, provides pressure pumping and other related services. The company offers hydraulic fracturing services; and a suite of well completion and production services, including cementing, acidizing, coiled tubing, flowback, surface air drilling, and drilling services. It serves the upstream oil and gas companies engaged in the exploration and production of North American unconventional oil and natural gas resources in the Permian Basin. As of December 31, 2018, the company's fleet comprised 20 hydraulic fracturing units with 905,000 hydraulic horsepower. ProPetro Holding Corp. was founded in 2007 and is headquartered in Midland, Texas.
UP Fintech Holding Limited provides online brokerage services focusing on Chinese investors. The company has developed a brokerage platform, which allows investor to trade stocks, options, warrants, and other financial instruments that can be accessed through its APP and website. It offers brokerage and valu

Bilibili Inc. provides online entertainment services for the young generations in the People's Republic of China. It offers a platform that covers a range of genres and media formats, including videos, live broadcasting, and mobile games. Bilibili Inc. has a strategic collaboration agreement with Tencent Holdings Limited for sharing and operating existing and additional anime and games on its platform in China. The company was founded in 2009 and is headquartered in Shanghai, the People's Republic of China.
Homology Medicines, Inc., a genetic medicines company, focuses on transforming the lives of patients suffering from rare genetic diseases. Its proprietary platform is designed to utilize its human hematopoietic stem cell derived adeno-associated virus vectors (AAVHSCs) to deliver genetic medicines in vivo either through a gene therapy or nuclease-free gene editing across a range of genetic disorders. The company's various set of AAVHSCs allows company to target, through a single inj

Replay Acquisition Corp. does not have significant operations. It intends to identify, source, negotiate, and execute a business combination in Latin America. The company was founded in 2018 and is based in New York, New York.
Hess Midstream LP owns, operates, develops, and acquires midstream assets. The company operates through three segments: Gathering; Processing and Storage; and Terminaling and Export. The Gathering segment owns natural gas gathering and crude oil gathering systems; and produced water gathering and disposal facilities. Its gathering systems consists of approximately 1,350 miles of high and low pressure natural gas and natural gas liquids gathering pipelines with capacity of approximately 450 million cubic feet per day; and crude oil gathering system comprises approximately 550 miles of crude oil gathering pipelines. The Processing and Storage segment comprises Tioga Gas Plant, a natural gas processing and fractionation plant located in Tioga, North Dakota; 50% of t

Tocagen Inc., a clinical-stage cancer-selective gene therapy company, focuses on developing and commercializing product candidates designed to activate a patient's immune system against their cancer. Its cancer-selective gene therapy platform is built on retroviral replicating vectors (RRVs), which are designed to deliver therapeutic genes into the DNA of cancer cells. The company's lead product candidate is Toca 511 & Toca FC that is under Phase III clinical trial for recurrent high-grade glioma. It is also developing Toca 511 & Toca FC in a Phase Ib clinical trial for intravenous treatment of advanced cancers. In addition, the company is developing other RRVs to deliver genes to cancer cells against validated immunotherapy targets. The company has a license agreement with ApolloBio to develop and commercialize Toca 511 & Toca FC; and a collaboration agreement with NRG Oncology to develop a clinical trial utilizing Toca 511 & Toca FC for the treatment of patients with newly diagnosed 

GrafTech International Ltd. researches, develops, manufactures, and sells graphite and carbon based products worldwide. It offers graphite electrodes, which requires for the production of electric arc furnace steel, ferrous, and non-ferrous metals; and petroleum needle coke, a crystalline form of carbon used in the production of graphite electrodes. The company sells its products primarily through direct sales force, independent sales representatives, and distributors. The company was founded in 1886 and is headquartered in Brooklyn Heights, Ohio. GrafTech International Ltd. is a subsidiary of Brookfield Asset Management Inc.
Level One Bancorp, Inc. operates as a bank holding company for Level One Bank that provides business and consumer financial services in Michigan. Its deposit products include checking accounts, NOW accounts, savings and other time deposits, certificates of deposit, and specialty deposit accounts. The company also provides lending products and related services comp

Smartsheet Inc. provides cloud-based platform for execution of work. It enables teams and organizations to plan, capture, manage, automate, and report on work. The company offers Smartdashboards that provides real-time visibility into the status of work to align individuals, managers, and executives; Smartportals to easily locate and access from any device the resources available for a project without IT assistance; Smartcards to organize, share, and act on workflows; and Smartgrids to keep teams on task by easily tracking multiple moving parts. It also provides Smartprojects, which offers interface with capabilities that foster collaboration among teams and organizations to enhance work execution; Smartcalendars that align teams and organizations by connecting deadlines to workflows; Smartforms enables business users to collect information in a structured and consistent format; Smartautomation that automates repetitive processes; and Smartintegrations enable organizations and teams to

Inspire Medical Systems, Inc., a medical technology company, focuses on the development and commercialization of minimally invasive solutions for patients with obstructive sleep apnea (OSA). It offers Inspire system, a neurostimulation technology that provides a safe and effective treatment for moderate to severe OSA. The company also develops a novel, a closed-loop solution that continuously monitors a patient's breathing and delivers mild hypoglossal nerve stimulation to maintain an open airway. Inspire Medical Systems, Inc. was founded in 2007 and is headquartered in Golden Valley, Minnesota.
Unity Biotechnology, Inc., a biotechnology company, engages in the research and development of therapeutics to extend human health span. The company's lead drug candidates include UBX0101 that is in Phase 1 clinical study for musculoskeletal disease; and UBX1967 for ophthalmologic diseases. It is also developing programs in pulmonary disorders. The company was formerly known as Forge, Inc. and 

Magnolia Oil & Gas Corporation engages in the acquisition, development, exploration, and production of oil, natural gas, and natural gas liquids reserves in the United States. The company's properties are located primarily in Karnes County and the Giddings Field in South Texas principally comprising the Eagle Ford Shale and the Austin Chalk formation. As of December 31, 2019, its assets consisted of a total leasehold position of 450,854 net acres, including 22,088 net acres in Karnes, Gonzales, DeWitt, and Atascosa counties, Texas; 428,766 net acres in the Giddings Field; and approximately 1,141 net wells with a total production capacity of 66.8 thousand barrels of oil equivalent per day. The company is headquartered in Houston, Texas.
Ovid Therapeutics Inc., a biopharmaceutical company, develops impactful medicines for patients and families with neurological disorders in the United States. The company is developing OV101, a drug candidate, which is in Phase III clinical trial for the 

Mayville Engineering Company, Inc. operates as a contract manufacturer that serves the heavy and medium duty commercial vehicle, construction, powersports, agriculture, military, and other end markets in the United States. The company provides a range of prototyping and tooling, production fabrication, coating, assembly and aftermarket components. It also supplies engineered components to original equipment manufacturers. The company was founded in 1945 and is headquartered in Mayville, Wisconsin.
Milestone Pharmaceuticals Inc., a biopharmaceutical company, develops and commercializes etripamil for the treatment of cardiovascular indications. It is developing etripamil, a novel channel blocker, which is in Phase III clinical trial for the treatment of paroxysmal supraventricular tachycardia in the United States and Canada, as well as for the treatment of atrial fibrillation, angina, and other cardiovascular indications. The company was founded in 2003 and is headquartered in Montréal, 

Ingersoll Rand Inc. provides mission-critical flow control and compression equipment, and associated aftermarket parts, consumables, and services in the United States, Europe, the Middle East, Africa, and the Asia Pacific. It operates through three segments: Industrials, Energy, and Medical. The Industrials segment designs, manufactures, markets, and services a range of air compression, vacuum, and blower products, as well as offers associated aftermarket parts, consumables, and services. Its products are used in process-critical applications, such as the operation of industrial air tools, vacuum packaging of food products, aeration of waste water, and others. This segment sells its products through an integrated network of direct sales representatives and independent distributors under the Gardner Denver, CompAir, Elmo Rietschle, Robuschi, and other brand names. The Energy segment designs, manufactures, markets, and services a range of displacement and liquid ring vacuum pumps, compre

Bright Scholar Education Holdings Limited, an education service company, operates K-12 schools in China. Its schools comprise international and bilingual schools, and kindergartens. The company also offers a range of complementary education services, including international camps and after-school programs, as well as international education consulting services; and Chinese government-mandated curriculum services for students. As of November 30, 2019, it operated 80 schools across 10 provinces in China, as well as 8 schools internationally with a total student capacity of 67,194 students. Bright Scholar Education Holdings Limited was founded in 1994 and is based in Foshan, China.
EVO Payments, Inc. operates as an integrated merchant acquirer and payment processor in the Americas and Europe. Its payment and commerce solutions consist of gateway solutions, online fraud prevention and management reporting, online hosted payments page capabilities, cellphone-based SMS integrated payment col

Scholar Rock Holding Corporation, a clinical-stage biopharmaceutical company, focuses on the discovery and development of medicines for the treatment of serious diseases in which signaling by protein growth factors plays a fundamental role. Its lead antibody product candidate is SRK-015, a novel inhibitor of the activation of myostatin, which is in Phase II clinical trials for the treatment of spinal muscular atrophy. The company is also developing SRK-181, which is in Phase I clinical trials for the treatment of cancers that are resistant to checkpoint inhibitor therapies, such as anti-PD-1 or anti-PD-L1. In addition, it is developing a pipeline of novel product candidates for a range of serious diseases, including neuromuscular disorders, cancer, fibrosis, and anemia. The company has a collaboration agreement with Gilead Sciences, Inc. to discover and develop specific inhibitors of transforming growth factor beta activation for the treatment of fibrotic diseases. Scholar Rock Holding

Fiverr International Ltd. operates an online marketplace worldwide. Its platform enables sellers to sell their services and buyers to buy them. The company's platform includes approximately 300 categories in eight verticals, including graphic and design, digital marketing, writing and translation, video and animation, music and audio, programming and technology, business, and lifestyle. It also offer And.Co, a platform for online back office service to assist freelancers with invoicing, contracts and task management; Fiverr Learn, an online learning platform with original course content in categories such as graphic design, branding, digital marketing, and copywriting; and ClearVoice, a subscription based content marketing platform for medium to large businesses. Its buyers include businesses of various sizes, as well as sellers comprise a group of freelancers and small businesses. The company was founded in 2010 and is headquartered in Tel Aviv, Israel.
Athenex, Inc., a biopharmaceuti

Repay Holdings Corporation provides integrated payment processing solutions to industry-oriented markets. Its payment processing solutions enable consumers and businesses to make payments using electronic payment methods. The company offers a range of solutions relating to electronic payment methods, including credit and debit processing, automated clearing house processing, and instant funding. It provides payment processing solutions to customers primarily operating in the personal loans, automotive loans, receivables management, and business-to-business verticals. The company sells its products through direct sales representatives and software integration partners. Repay Holdings Corporation was founded in 2006 and is headquartered in Atlanta, Georgia.
Stoke Therapeutics, Inc., an early-stage biopharmaceutical company, develops novel antisense oligonucleotide medicines to treat the underlying causes of severe genetic diseases. Its lead product candidate, STK-001 used to treat Dravet

Xeris Pharmaceuticals, Inc., a specialty pharmaceutical company, develops and commercializes ready-to-use injectable and infusible drug formulations. Its proprietary XeriSol and XeriJect formulation technologies allow for the subcutaneous and intramuscular delivery of highly-concentrated, ready-to-use formulations of peptides, proteins, antibodies, and small molecules using commercially available syringes, auto-injectors, multi-dose pens, and infusion pumps. The company's lead product candidate is Gvoke HypoPen, which has completed Phase III clinical trials for the treatment of severe hypoglycemia, a potentially life-threatening condition in people with diabetes. Its product candidates also comprise ready-to-use glucagon that is in Phase II clinical trials for the treatment of congenital hyperinsulinism, post-bariatric hypoglycemia, exercise-induced hypoglycemia in diabetes, and hypoglycemia-associated autonomic failure, as well as for treating hypoglycemia associated with intermittent

BridgeBio Pharma, Inc. discovers and develops various medicines for genetic diseases. The company has a pipeline of approximately 20 development programs that include product candidates ranging from early discovery to late-stage development in various therapeutic areas. Its principal products in development programs include BBP-265, an oral small molecule transthyretin (TTR) for the treatment of TTR amyloidosis, including cardiomyopathy and polyneuropathy manifestations; BBP-831/infigratinib, an oral FGFR1-3 selective tyrosine kinase inhibitor to treat FGFR-driven cancers, as well as for the treatment of achondroplasia; BBP-631, a preclinical adeno-associated virus gene transfer product candidate for the treatment of congenital adrenal hyperplasia caused by 21OHD; and BBP-454, a preclinical development program for small molecule inhibitors of KRAS for the treatment of pan-mutant KRAS-driven cancers. The company is also developing BBP-418 for the treatment of limb-girdle muscular dystro

The RealReal, Inc. operates an online marketplace for consigned luxury goods. It offers resale product categories, including women's, men's, kids', jewelry and watches, as well as home and art products. The company was founded in 2011 and is headquartered in San Francisco, California.
Blue Apron Holdings, Inc. operates direct-to-consumer platform that delivers original recipes, and fresh and seasonal ingredients. It also operates Blue Apron Market, an e-commerce market that provides cooking tools, utensils, pantry items, and other products. In addition, the company offers Blue Apron Wine, a direct-to-consumer wine delivery service that sells wines, which can be paired with its meals; and supplies poultry, beef, and lamb. It serves college graduates, young couples, families, singles, and empty nesters. The company offers its services through order selections on Website or mobile application primarily in the United States. Blue Apron Holdings, Inc. was founded in 2012 and is headquartere

AssetMark Financial Holdings, Inc. provides wealth management and technology solutions in the United States. The company offers an open-architecture product platform, as well as client advice, asset allocation options, practice management, support services, and technology to the financial adviser channel. It offers an integrated technology platform that allows advisers to do research and portfolio analysis, create proposals, open and maintain accounts, implement investments, and meet reporting obligations; delivers its platform and solutions through people who get to know the company's clients; and provides curated platform of investment options. The company also offers mutual funds to clients of financial advisers; custodial recordkeeping services primarily to investor clients of registered investment advisers; and record-keeper and third-party administrator services for retirement products. It serves independent advisers who provide wealth management advice to the U.S. investors. The

Intercorp Financial Services Inc. provides financial products and services in Peru. It provides transactional accounts, such as cuenta sueldo and cuenta simple; savings accounts; investment accounts; and time deposits, certificates of deposit, and compensation for service time accounts. The company also offers retail banking, including consumer; payroll deduction; cash, vehicle, student, express, collateralized cash, and other consumer loans; and mortgage loans, as well as credit cards. In addition, it provides corporate, medium-size business, and small business banking services; and commercial banking products, which include commercial real estate, vehicles, machinery and other goods, cash management, trade finance, and electronic factoring products. Further, it offers treasury and institutional banking, as well as securitization services. As of December 31, 2019, the company operated 256 financial stores and 1,603 ATMs. Intercorp Financial Services Inc. was founded in 1897 and is bas

Livongo Health, Inc. provides an integrated suite of solutions for the healthcare industry in North America. Its solutions promote health behavior change based on real-time data capture supported by intuitive devices and insights driven by data science. The company offers a platform that provides cellular-connected devices, supplies, informed coaching, data science-enabled insights, and facilitates access to medications. Its products include Livongo for diabetes, Livongo for hypertension, Livongo for prediabetes and weight management, and Livongo for behavioral health by myStrength. The company was formerly known as EosHealth, Inc. and changed its name to Livongo Health, Inc. in 2014. Livongo Health, Inc. was incorporated in 2008 and is headquartered in Mountain View, California.
RBB Bancorp operates as the bank holding company for Royal Business Bank that provides various banking products and services to the Chinese-American communities. Its deposit products include checking, savings,

Opera Limited, together with its subsidiaries, provides mobile and PC web browsers. The company offers mobile browser products, such as Opera Mini, Opera for Android, and Opera Touch; PC browsers, including Opera for Computers and Opera GX; Opera News, a personalized news aggregation app; and Okash, a microfinance app. It operates in India, Ireland, Kenya, Russia, and internationally. The company was founded in 1996 and is headquartered in Oslo, Norway.
Redfin Corporation operates as a real estate brokerage company in the United States and Canada. The company operates an online real estate marketplace and provides real estate services, including assisting individuals in the purchase or sell of home. It also provides title and settlement services; originates and sells mortgages; and buys and sells homes. The company was formerly known as Appliance Computing Inc. and changed its name to Redfin Corporation in May 2006. Redfin Corporation was incorporated in 2002 and is headquartered in Se

AMTD International Inc., an investment holding company, engages in investment banking activities in Hong Kong, Mainland China, the United States, and internationally. The company operates through three segments: Investment Banking, Asset Management, and Strategic Investment. It offers a range of investment banking services, including equity underwriting, debt underwriting, securities brokerage, institutional sales and distribution, and research, as well as advisory on credit rating, financing, and mergers and acquisitions transactions. The company also provides professional investment management and advisory services primarily to corporate and other institutional clients. In addition, it makes long-term strategic investments focusing on Asia's financial and new economy sectors. The company was formerly known as AMTD Inc. The company was incorporated in 2019 and is headquartered in Central, Hong Kong. AMTD International Inc. is as subsidiary of AMTD Group Company Limited.
InMode Ltd. de

Nesco Holdings, Inc. provides specialty equipment, parts, tools, accessories, and services to the electric utility transmission and distribution, telecommunications, and rail markets in North America. The company rents and sells specialized equipment to various customer base for the maintenance, repair, upgrade, and installation of critical infrastructure assets, including electric lines, telecommunications networks, and rail systems. It has a coast-to-coast rental fleet of approximately 4,600 units comprising insulated and non-insulated bucket trucks, digger derricks, line equipment, cranes, pressure diggers, and underground equipment. The company is based in the Fort Wayne, Indiana.
Megalith Financial Acquisition Corp. does not have significant operations. It intends to effect a merger, capital stock exchange, asset acquisition, stock purchase, reorganization, or similar business combination with one or more businesses. The company focuses on companies in the financial services indus

Tremont Mortgage Trust, a real estate investment trust (REIT), focuses on originating and investing in first mortgage loans secured by middle market and transitional commercial real estate in the United States. The company qualifies as a REIT for federal income tax purposes. It generally would not be subject to federal corporate income taxes if it distributes at least 90% of its taxable income to its stockholders. Tremont Mortgage Trust was founded in 2017 and is headquartered in Newton, Massachusetts. Tremont Mortgage Trust is a subsidiary of Tremont Realty Advisors LLC.
Principia Biopharma Inc., a late-stage biopharmaceutical company, focuses on developing novel therapies for immune-mediated diseases. The company is developing rilzabrutinib, an inhibitor that is in Phase III clinical trials for the treatment of pemphigus, a chronic skin disease, as well as and pemphigus foliaceus; in a Phase 1/2 trial for the treatment of immune thrombocytopenia; and a Phase 2 trial for the treatment

X Financial provides personal finance services in the People's Republic of China. The company offers a suite of products connecting borrowers and investors through a proprietary Internet platform. It provides Xiaoying Credit loan, which includes Xiaoying card loan, a credit card balance transfer product; and Xiaoying preferred loan, a high-credit-limit unsecured loan product. The company also offers Xiaoying housing loan; investment products through Xiaoying wealth management platform, such as loans, money market, and insurance products; and loan facilitation to other platforms. X Financial was founded in 2014 and is headquartered in Shenzhen, the People's Republic of China.
Ping Identity Holding Corp., doing business as Ping Identity Corporation, provides intelligent identity solutions for the enterprise in the United States and internationally. Its Ping Intelligent Identity platform provides customers, workforce, and partners with access to cloud, mobile, Software-as-a-Service, and o

Farfetch Limited, through its subsidiary, Farfetch.com Limited, provides an online marketplace for luxury goods in the Americas, Europe, the Middle East, Africa, and the Asia Pacific. It operates in three segments: Digital Platform, Brand Platform, and In-Store. The company operates Farfetch.com, an online marketplace, as well as Farfetch app for retailers and brands. It also offers web design, build, development, and retail distribution solutions for retailers and brands. In addition, the company operates two Browns retail stores in London; one Stadium Goods retail store in New York; and two New Guards Off-White stores in Las Vegas and New York. Further, it operates approximately 50 New Guards franchised retail stores. Farfetch Limited was founded in 2007 and is headquartered in London, the United Kingdom.
CapStar Financial Holdings, Inc. operates as the bank holding company for CapStar Bank that provides banking services to consumer and corporate customers located primarily in Tennes

Peloton Interactive, Inc. provides interactive fitness products in North America. It offers connected fitness products, such as the Peloton Bike and the Peloton Tread, which include touchscreen that streams live and on-demand classes. The company also provides connected fitness subscriptions for multiple household users, and access to all live and on-demand classes, as well as Peloton Digital app for connected fitness subscribers to provide access to its classes. It has approximately 1.4 million members. The company markets and sells its interactive fitness products directly through its retail showrooms and at onepeloton.com Peloton Interactive, Inc. was founded in 2012 and is headquartered in New York, New York.
RYB Education, Inc. provides early childhood education service in the People's Republic of China. The company offers kindergarten services to 2-6-year-old children; and play-and-learn centers services for the joint participation of 0-6-year-old children and their adult family 

CooTek (Cayman) Inc. operates as a mobile internet company in the United States, the People's Republic of China, and internationally. Its primary product is TouchPal Smart Input, an input method for mobile devices that supports approximately 110 languages; and TouchPal Phonebook, Chinese communication application that enables users in China to make phone calls through internet for free, to search contacts on the dial pad, and to block spam calls. The company also offers Crazy Reading Novel, a mobile application that provides users with free online novels; fitness application comprising Hi Shou; Drink Water Reminder that helps users drink an appropriate amount of water on a daily basis; Happy Jogging, a free pedometer mobile application; and Hailaidian, a mobile application that provides pictures, videos, and music to decorate the call interface and help users have fun when receiving phone calls. CooTek (Cayman) Inc. was founded in 2008 and is headquartered in Shanghai, the People's Rep

Viela Bio, Inc., a clinical-stage biotechnology company, engages in the research and development of treatments for severe inflammation and autoimmune diseases in the United States. The company's lead product candidate is inebilizumab, a humanized monoclonal antibody for neuromyelitis optica spectrum disorder, kidney transplant desensitization, myasthenia gravis, and IgG4-related diseases. It is also developing VIB4920 for kidney transplantation rejection and sjögren's syndrome; and VIB7734 for cutaneous lupus erythematosus. Viela Bio, Inc. has a strategic collaboration with Mitsubishi Tanabe Pharma Corporation to develop and commercialize inebilizumab for autoimmune diseases in Japan, Thailand, South Korea, Indonesia, Vietnam, Malaysia, the Philippines, Singapore, and Taiwan. The company was founded in 2017 and is headquartered in Gaithersburg, Maryland.
Guardant Health, Inc., a precision oncology company, provides blood tests, data sets, and analytics in the United States and internat

Allogene Therapeutics, Inc., a clinical stage immuno-oncology company, develops and commercializes genetically engineered allogeneic T cell therapies for the treatment of cancer. The company is developing UCART19, an allogeneic chimeric antigen receptor (CAR) T cell product candidate, which is in Phase I clinical trials for the treatment of pediatric and adult patients with R/R CD19 positive B-cell ALL; ALLO-501, an anti-CD19 allogeneic CAR T cell product candidate that is in Phase I clinical trial for the treatment of R/R non-Hodgkin lymphoma; and ALLO-501A for the treatment R/R large B-cell lymphoma or transformed follicular lymphoma. It is also developing ALLO-715, an allogeneic CAR T cell product candidate that is in a Phase 1 clinical trial for treating R/R multiple myeloma; ALLO-819, an allogeneic CAR T cell product candidates for the treatment of acute myeloid leukemia; ALLO-647, an anti-CD52 monoclonal antibody; CD70 to treat renal cell cancer; and DLL3 for the treatment of sma

OptiNose, Inc., a specialty pharmaceutical company, focuses on the development and commercialization of products for patients treated by ear, nose, and throat; and allergy specialists in the United States. The company offers XHANCE, a therapeutic product utilizing its proprietary optinose exhalation delivery system that delivers a topically-acting and anti-inflammatory corticosteroid for the treatment of chronic rhinosinusitis with and without nasal polyps. It is also developing XHANCE, which is in Phase IIIb clinical trial for the treatment of chronic sinusitis; and OPN-300 for the treatment of Prader-Willi syndrome, a rare genetic obesity disorder, as well as autism spectrum disorder. The company has a license agreement with Currax Pharmaceuticals LLC for the commercialization of Onzetra Xsail (sumatriptan nasal powder) to treat migraine in adults; and Inexia Limited to develop, manufacture, import, and sale products containing orexin receptor agonist and/or orexin receptor positive 

Galileo Acquisition Corp. does not have signification operations. It focuses on effecting a merger , share exchange, asset acquisition, share purchase, reorganization, or similar business combination with one or more businesses. It intends to focus on companies operating in the consumer, retail, food and beverage, fashion and luxury, specialty industrial, technology, or healthcare sectors, which are headquartered in Western Europe. Galileo Acquisition Corp. was founded in 2019 and is based in New York, New York.
Sea Limited engages in the digital entertainment, e-commerce, and digital financial service businesses in Southeast Asia, Latin America, rest of Asia, and internationally. It provides Garena digital entertainment platform for users to access mobile and PC online games, as well as eSports operations; and access to other entertainment content, such as livestreaming of gameplay and social features , such as user chat and online forums. The company also operates Shopee e-commerce p

YETI Holdings, Inc. designs, markets, retails, and distributes products for the outdoor and recreation market under the YETI brand in the United States, Canada, Australia, and Japan. The company offers hard and soft coolers, as well as storage, transport, outdoor living, and associated accessories. It also provides drinkware products, including colsters, lowballs, stackable pints, mugs, tumblers, bottles, and jugs, as well as accessories comprising bottle straw caps, tumbler handles, and jug mounts under the Rambler brand. In addition, the company offers YETI-branded gear products, such as hats, shirts, bottle openers, ice substitutes, and dog bowls. The company sells its products through independent retailers, including outdoor specialty, hardware, sporting goods, and farm and ranch supply stores, as well as through Website. YETI Holdings, Inc. was founded in 2006 and is headquartered in Austin, Texas.
Progyny, Inc., a benefits management company, specializes in fertility and family b

Acushnet Holdings Corp. designs, develops, manufactures, and distributes golf products in the United States, Europe, the Middle East, Africa, Japan, Korea, and internationally. The company operates through four segments: Titleist Golf Balls, Titleist Golf Clubs, Titleist Golf Gear, and FootJoy Golf Wear. The Titleist Golf Balls segment offers golf balls, such as Pro V1, Pro V1x, AVX, Tour Soft, Velocity, and Pinnacle golf balls, as well as provides custom imprinted golf balls with corporate logos, tournament logos, country club or resort logos, and personalization on Titleist and Pinnacle golf balls. The Titleist Golf Clubs segment designs, assembles, and sells golf clubs, such as drivers, fairways, hybrids, and irons under the Titleist brand; wedges under the Vokey Design brand; and putters under Scotty Cameron brand. The Titleist Golf Gear segment designs and develops golf bags, headwear, golf gloves, travel gears, head covers, and other golf accessories, as well as offers customizat

Evoqua Water Technologies Corp. provides a range of water and wastewater treatment systems and technologies, and mobile and emergency water supply solutions and services for industrial, commercial, and municipal water treatment markets. It operates in two segments, Integrated Solutions and Services, and Applied Product Technologies. The Integrated Solutions and Services segment offers capital systems and related recurring aftermarket services, parts, and consumables, as well as long-term and short-term service contracts, and emergency services for treating industrial process water, utility water, and wastewater. It serves customers in the pharmaceutical, food and beverage, power, and chemical processing industries. The Applied Product Technologies segment offers advanced filtration and separation products, such as Memcor membranes, Ionpure technologies, and Vortisand systems; wastewater treatment technologies, including the BioMag systems, clarification systems, and odor control and sl

Juniper Industrial Holdings, Inc. does not have significant operations. The company intends to effect into a merger, capital stock exchange, asset acquisition, stock purchase, reorganization, or similar business combination with one or more businesses in the industrial sector. The company was founded in 2019 and is based in Chatham, New Jersey.
Apellis Pharmaceuticals, Inc., a clinical-stage biopharmaceutical company, focuses on the development of therapeutic compounds through the inhibition of the complement system for autoimmune and inflammatory diseases. Its lead product candidate is APL-2 that is in Phase III clinical trials for the treatment of geographic atrophy in age-related macular degeneration and paroxysmal nocturnal hemoglobinuria diseases; and in Phase II clinical trials for the treatment of cold agglutinin and warm antibody autoimmune hemolytic anemia diseases, as well as in Phase II clinical trials to treat four types of glomerular diseases, such as C3 glomerulopathy, Ig

SailPoint Technologies Holdings, Inc. designs, develops, and markets identity governance software solutions in the United States, Europe, the Middle East, Africa, and internationally. The company offers software and software as a service solutions, which help organizations to govern the digital identities of employees, contractors, business partners, software bots, and other human and non-human users, as well as manage their constantly changing access rights to enterprise applications and data across hybrid IT environments, spanning on-premises, cloud and mobile applications, and file storage platforms. Its solutions include IdentityIQ, an on-premises identity governance solution; IdentityNow, a cloud-based multi-tenant governance platform; and IdentityAI, a multi-tenant AI and ML SaaS subscription offering that helps organizations detect potential threats before they turn into security breaches. The company sells its products and solutions to commercial enterprises, financial institut

Luther Burbank Corporation operates as the bank holding company for Luther Burbank Savings that provides various banking products and services for real estate investors, professionals, entrepreneurs, high net worth individuals, and commercial businesses. The company offers interest and noninterest-bearing transaction accounts, certificates of deposit, and money market accounts. It also provides commercial real estate loans, including first mortgage loans for the purchase, refinance, or build-out of tenant improvements on investor owned multifamily residential properties, as well as loans for the purchase, refinance, or improvement of office, retail, and light industrial properties; single family residential loans; and mortgage products, such as a portfolio of 30-year fixed rate first mortgage and a forgivable second mortgage. In addition, the company offers ATM, debit cards, and online and mobile banking services; engages in the real estate investment; and issues trust preferred securi

Newmark Group, Inc. provides commercial real estate services in the United States and internationally. The company's investor/owner services and products include capital markets, such as investment sales; and agency leasing, property management, valuation and advisory, and diligence and underwriting, as well as government sponsored enterprise lending, loan servicing, debt and structured finance, loan sales, mortgage broking and equity-raising under the Newmark Knight Frank name. It occupier services and products comprise tenant representation, real estate management technology systems, workplace and occupancy strategy, global corporate services consulting, project management, lease administration, and facilities management. The company provides its services to commercial real estate tenants, owner-occupiers, investors, and developers. As of February 27, 2020, it operated approximately 480 offices in 6 continents. The company was formerly known as Newmark Knight Frank and changed its na

In [7]:
company_desc = pd.read_csv("Final Project/profiles.csv")
ipos_list['Description'] = company_desc
ipos_list[70:80]

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description
70,SNDR,"Schneider National, Inc.",NYSE,19.0,28947000,04/06/2017,549993000,Priced,Schneider National Inc. a transportation and l...
71,ELVT,"Elevate Credit, Inc.",NYSE,6.5,12400000,04/06/2017,80600000,Priced,Elevate Credit Inc. provides online credit sol...
72,OKTA,"Okta, Inc.",NASDAQ Global Select,17.0,11000000,04/07/2017,187000000,Priced,Okta Inc. provides identity management platfor...
73,ALTG,ALTA EQUIPMENT GROUP INC.,NYSE,10.0,12500000,04/09/2019,125000000,Priced,Alta Equipment Group Inc. owns and operates in...
74,TUFN,Tufin Software Technologies Ltd.,NYSE,14.0,7700000,04/11/2019,107800000,Priced,Tufin Software Technologies Ltd. develops mark...
75,PD,"PagerDuty, Inc.",NYSE,24.0,9070000,04/11/2019,217680000,Priced,PagerDuty Inc. operates a platform for real-ti...
76,ZUO,ZUORA INC,NYSE,14.0,11000000,04/12/2018,154000000,Priced,Zuora Inc. provides a cloud-based software on ...
77,JMIA,Jumia Technologies AG,NYSE,14.5,13500000,04/12/2019,195750000,Priced,Jumia Technologies AG operates an e-commerce p...
78,TOCA,Tocagen Inc,NASDAQ Global Select,10.0,8500000,04/13/2017,85000000,Priced,Tocagen Inc. a clinical-stage cancer-selective...
79,CADE,Cadence Bancorporation,NYSE,20.0,7500000,04/13/2017,150000000,Priced,Cadence Bancorporation a financial holding com...


Next, we scrape each company's Yahoo Finance profile page for additional fields that could be useful (country, sector, and industry). Each field is written to its own CSV file and then appended to the ipos_list dataset as a new column.
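The scraping cells in this section each request the same profile URL for every ticker. A minimal sketch of downloading each page only once and reusing it is shown below. The names `download`, `make_profile_fetcher`, and `get_profile_html` are hypothetical helpers introduced for illustration, not part of the notebook, and the `fetch` parameter exists only so the downloader can be swapped out for testing.

```python
from urllib.request import urlopen

# Same URL pattern used by the scraping cells in this notebook
PROFILE_URL = "https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}"

def download(url):
    """Fetch a URL and return the raw bytes (requires network access)."""
    with urlopen(url) as response:
        return response.read()

def make_profile_fetcher(fetch=download):
    """Return a function that downloads each ticker's profile page at most once."""
    cache = {}

    def get_profile_html(ticker):
        # Only hit the network the first time a ticker is requested
        if ticker not in cache:
            cache[ticker] = fetch(PROFILE_URL.format(ticker=ticker))
        return cache[ticker]

    return get_profile_html
```

With a fetcher like this, the country, sector, and industry parsers could each take the cached HTML as input instead of issuing their own request.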

In [None]:
country_file = "countries.csv"

def ipo_country(ticker):
    """Return the address block from the ticker's Yahoo Finance profile page, or an empty string if absent."""
    ipo2_url = f'https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}'
    ipo_Client2 = uReq(ipo2_url)
    ipo2_html = ipo_Client2.read()
    ipo_Client2.close()

    ipo2_soup = bsoup(ipo2_html, "html.parser")
    ipo2_cntry = ipo2_soup.find('p', {'class':'D(ib) W(47.727%) Pend(40px)'})
    # find() returns None when the tag is missing, so check before reading .text
    if ipo2_cntry is None:
        return ""
    return ipo2_cntry.text

symbols1 = ipos_list['Symbol']
# The context manager closes the file even if a request fails midway
with open(country_file, "w") as f:
    f.write("Country\n")
    for symbol in symbols1:
        country = ipo_country(symbol)  # fetch once per ticker, then print and write
        print(country)
        f.write(country.replace(",","") + "\n")

        
        

Two Newton PlaceSuite 300 255 Washington StreetNewton, MA 02458United States617-219-1460http://www.ilptreit.com
950 17th StreetSuite 2400Denver, CO 80202United States303 515 2800http://www.libertyfrac.com
7990 Auburn RoadConcord Township, OH 44077United States440 354 4445http://www.ranpak.com
2001 Kirby DriveSuite 200Houston, TX 77019United States281 730 5100http://nineenergyservice.com
1501 Yamato RoadBoca Raton, FL 33431United States561 988 3600http://www.adt.com
South TowerSuite 600 10 Glenlake ParkwayAtlanta, GA 30328United States678-441-1400http://www.americold.com
3990 Rogerdale RoadHouston, TX 77042United States713 325 6000http://nextierofs.com
Avenida Brigadeiro Faria Lima, 13844º Andar Parte ASao Paulo, SP 01451-001Brazil55 11 3038 8127http://www.pagseguro.uol.com.br
1144 Fifteenth StreetDenver, CO 80202United States303 744 1911http://www.gates.com
520 U.S. Highway 22Suite 305Bridgewater, NJ 08807United States800-775-7936http://www.menlotherapeutics.com
Chemin des Aulx, 12Plan

In [8]:
cntry = pd.read_csv('Final Project/countries.csv')
ipos_list['Country'] = cntry['Actual Country']
ipos_list

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country
0,ILPT,Industrial Logistics Properties Trust,NASDAQ Global Select,24.00,20000000,01/12/2018,480000000,Priced,ILPT is a real estate investment trust or REIT...,United States
1,LBRT,Liberty Oilfield Services Inc.,NYSE,17.00,12731092,01/12/2018,216428564,Priced,Liberty Oilfield Services Inc. provides hydrau...,United States
2,PACK,Ranpak Holdings Corp.,NYSE,10.00,30000000,01/18/2018,300000000,Priced,Ranpak Holdings Corp. and its subsidiaries pro...,United States
3,NINE,"Nine Energy Service, Inc.",NYSE,23.00,7000000,01/19/2018,161000000,Priced,Nine Energy Service Inc. operates as an onshor...,United States
4,ADT,ADT Inc.,NYSE,14.00,105000000,01/19/2018,1470000000,Priced,ADT Inc. provides security and automation solu...,United States
...,...,...,...,...,...,...,...,...,...,...
394,CASA,Casa Systems Inc,NASDAQ Global Select,13.00,6000000,12/15/2017,78000000,Priced,Casa Systems Inc. provides software-centric br...,United States
395,NMRK,"NEWMARK GROUP, INC.",NASDAQ Global Select,14.00,20000000,12/15/2017,280000000,Priced,Newmark Group Inc. provides commercial real es...,United States
396,TRVG,trivago N.V.,NASDAQ Global Select,11.00,26110118,12/16/2016,287211298,Priced,trivago N.V. together with its subsidiaries op...,Germany
397,AMTB,Amerant Bancorp Inc.,NASDAQ Global Select,13.00,6300000,12/19/2018,81900000,Priced,Amerant Bancorp Inc. operates as the bank hold...,United States


In [227]:
sector_file = "sectors.csv"

def ipo_sector(ticker):
    """Return the text of the second 'Fw(600)' span on the profile page, or an empty string if absent."""
    ipo3_url = f'https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}'
    ipo_Client3 = uReq(ipo3_url)
    ipo3_html = ipo_Client3.read()
    ipo_Client3.close()

    ipo3_soup = bsoup(ipo3_html, "html.parser")
    ipo3_sctr = ipo3_soup.find_all('span', {'class':'Fw(600)'})
    # Guard against pages with fewer than two matching spans before indexing
    if len(ipo3_sctr) < 2:
        return ""
    return ipo3_sctr[1].text

symbols2 = ipos_list['Symbol']
# The context manager closes the file even if a request fails midway
with open(sector_file, "w") as f:
    f.write("Sector\n")
    for symbol in symbols2:
        #print(ipo_sector(symbol))
        f.write(ipo_sector(symbol).replace(",","") + "\n")

In [9]:
sctr = pd.read_csv('Final Project/sectors.csv')
ipos_list['Group'] = sctr['Group']
ipos_list

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country,Group
0,ILPT,Industrial Logistics Properties Trust,NASDAQ Global Select,24.00,20000000,01/12/2018,480000000,Priced,ILPT is a real estate investment trust or REIT...,United States,Misc.
1,LBRT,Liberty Oilfield Services Inc.,NYSE,17.00,12731092,01/12/2018,216428564,Priced,Liberty Oilfield Services Inc. provides hydrau...,United States,Misc.
2,PACK,Ranpak Holdings Corp.,NYSE,10.00,30000000,01/18/2018,300000000,Priced,Ranpak Holdings Corp. and its subsidiaries pro...,United States,Misc.
3,NINE,"Nine Energy Service, Inc.",NYSE,23.00,7000000,01/19/2018,161000000,Priced,Nine Energy Service Inc. operates as an onshor...,United States,Misc.
4,ADT,ADT Inc.,NYSE,14.00,105000000,01/19/2018,1470000000,Priced,ADT Inc. provides security and automation solu...,United States,Devices
...,...,...,...,...,...,...,...,...,...,...,...
394,CASA,Casa Systems Inc,NASDAQ Global Select,13.00,6000000,12/15/2017,78000000,Priced,Casa Systems Inc. provides software-centric br...,United States,Media
395,NMRK,"NEWMARK GROUP, INC.",NASDAQ Global Select,14.00,20000000,12/15/2017,280000000,Priced,Newmark Group Inc. provides commercial real es...,United States,Misc.
396,TRVG,trivago N.V.,NASDAQ Global Select,11.00,26110118,12/16/2016,287211298,Priced,trivago N.V. together with its subsidiaries op...,Germany,Search
397,AMTB,Amerant Bancorp Inc.,NASDAQ Global Select,13.00,6300000,12/19/2018,81900000,Priced,Amerant Bancorp Inc. operates as the bank hold...,United States,Organization


In [258]:
ind_file = "industry.csv"

def ipo_industry(ticker):
    """Return the text of the first 'Fw(600)' span on the profile page, or an empty string if absent."""
    ipo4_url = f'https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}'
    ipo_Client4 = uReq(ipo4_url)
    ipo4_html = ipo_Client4.read()
    ipo_Client4.close()

    ipo4_soup = bsoup(ipo4_html, "html.parser")
    ipo4_ind = ipo4_soup.find('span', {'class':'Fw(600)'})
    # find() returns None when the tag is missing, so check before reading .text
    if ipo4_ind is None:
        return ""
    return ipo4_ind.text

# Exclude the row with index label 78 (TOCA in the table above) from this scrape
ipos_lista = ipos_list.drop([78], axis=0)

symbols3 = ipos_lista['Symbol']
# The context manager closes the file even if a request fails midway
with open(ind_file, "w") as f:
    f.write("Industry\n")
    for symbol in symbols3:
        #print(ipo_industry(symbol))
        f.write(ipo_industry(symbol).replace(",","") + "\n")

In [10]:
indt = pd.read_csv('Final Project/industry.csv')
ipos_list['Industry'] = indt
ipos_list

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country,Group,Industry
0,ILPT,Industrial Logistics Properties Trust,NASDAQ Global Select,24.00,20000000,01/12/2018,480000000,Priced,ILPT is a real estate investment trust or REIT...,United States,Misc.,Real Estate
1,LBRT,Liberty Oilfield Services Inc.,NYSE,17.00,12731092,01/12/2018,216428564,Priced,Liberty Oilfield Services Inc. provides hydrau...,United States,Misc.,Energy
2,PACK,Ranpak Holdings Corp.,NYSE,10.00,30000000,01/18/2018,300000000,Priced,Ranpak Holdings Corp. and its subsidiaries pro...,United States,Misc.,Consumer Cyclical
3,NINE,"Nine Energy Service, Inc.",NYSE,23.00,7000000,01/19/2018,161000000,Priced,Nine Energy Service Inc. operates as an onshor...,United States,Misc.,Energy
4,ADT,ADT Inc.,NYSE,14.00,105000000,01/19/2018,1470000000,Priced,ADT Inc. provides security and automation solu...,United States,Devices,Industrials
...,...,...,...,...,...,...,...,...,...,...,...,...
394,CASA,Casa Systems Inc,NASDAQ Global Select,13.00,6000000,12/15/2017,78000000,Priced,Casa Systems Inc. provides software-centric br...,United States,Media,Technology
395,NMRK,"NEWMARK GROUP, INC.",NASDAQ Global Select,14.00,20000000,12/15/2017,280000000,Priced,Newmark Group Inc. provides commercial real es...,United States,Misc.,Real Estate
396,TRVG,trivago N.V.,NASDAQ Global Select,11.00,26110118,12/16/2016,287211298,Priced,trivago N.V. together with its subsidiaries op...,Germany,Search,Communication Services
397,AMTB,Amerant Bancorp Inc.,NASDAQ Global Select,13.00,6300000,12/19/2018,81900000,Priced,Amerant Bancorp Inc. operates as the bank hold...,United States,Organization,Financial Services


In [11]:
ipos_csv = ipos_list.copy()
ipos_csv.to_csv('Final Project/IPO Information.csv')

### Text Analysis - Bag-of-words  

Now that we have the information that we need, we can move to creating the bag-of-words model.
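To make the idea concrete, here is a minimal pure-Python sketch of a bag-of-words representation. It is not what the notebook runs (the notebook uses scikit-learn's `CountVectorizer`). The stop-word set is a tiny illustrative subset, and `bag_of_words` is a hypothetical helper. The four sample strings are `Business` values from the acquisitions table.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters, as in CountVectorizer's default pattern
TOKEN = re.compile(r"(?u)\b\w\w+\b")
# Tiny illustrative stop-word subset (an assumption, not scikit-learn's English list)
STOP_WORDS = {"and", "for", "the", "of"}

def bag_of_words(doc):
    """Lowercase a document, tokenize it, and count the non-stop-word tokens."""
    tokens = TOKEN.findall(doc.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

docs = ["Identity management", "Cloud hosting", "Video game development", "Online game"]
bags = [bag_of_words(d) for d in docs]

# The vocabulary is the sorted set of all tokens; each document becomes a row of counts
vocabulary = sorted(set().union(*bags))
matrix = [[bag[word] for word in vocabulary] for bag in bags]
```

Each row of `matrix` has one column per vocabulary word, which is the same document-by-term layout that the sparse matrix `X` holds further down.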

In [12]:
from sklearn.feature_extraction import text  
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
acqx = pd.read_csv('Final Project/acquired companies.csv')
acqx['Google'] = np.where(acqx['ParentCompany'] == 'Alphabet', 1, 0) #binary identifier that will be used to train the model
acqx

Unnamed: 0,Acquisition Date,Company,Business,Country,Price,Used as or integrated with:,Group,ParentCompany,Google
0,2005,Systems Research & Development,Identity management,United States,,,,IBM,0
1,2005,en'tegrate,Software,United States,,,,Microsoft,0
2,2005,Stadeon,Online game,United States,,Yahoo! Games,,Yahoo,0
3,2005,Corio,Application Services,United States,1.820000e+08,,,IBM,0
4,2005,Groove Networks,Community software,United States,,,,Microsoft,0
...,...,...,...,...,...,...,...,...,...
668,01-Nov-19,Fitbit,Wearables,United States,2.100000e+09,,,Alphabet,1
669,18-Nov-19,CloudSimple,Cloud hosting,United States,0.000000e+00,Google Cloud Platform,Productivity,Alphabet,1
670,19-Dec-19,Typhoon Studios,Video game development,Canada,0.000000e+00,Stadia,Media,Alphabet,1
671,14-Jan-20,Pointy,Local retail inventory feeds,Ireland,1.630000e+08,Google Maps,Search,Alphabet,1


In [13]:
bustxt_bag = text.CountVectorizer(stop_words="english")
bustxt_bag.fit(acqx['Business'])

CountVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(1, 1), preprocessor=None, stop_words='english',
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)

In [14]:
all_bustxt = bustxt_bag.get_feature_names()
len(all_bustxt)

543

In [15]:
all_bustxt[1:50]

['3d',
 '64',
 'access',
 'ad',
 'ads',
 'advanced',
 'advertiser',
 'advertising',
 'aerial',
 'agency',
 'aggregator',
 'ai',
 'airborne',
 'ajax',
 'altitude',
 'american',
 'amp',
 'analysis',
 'analytics',
 'android',
 'anonymous',
 'answer',
 'anti',
 'api',
 'app',
 'appliance',
 'application',
 'applications',
 'apps',
 'architecture',
 'arms',
 'artificial',
 'asset',
 'assets',
 'assistant',
 'associated',
 'atlas',
 'attribution',
 'audio',
 'augmented',
 'authentication',
 'automated',
 'automatic',
 'automation',
 'backends',
 'backup',
 'based',
 'behavioral',
 'beta']

In [16]:
y = acqx['Google']
X = bustxt_bag.transform(acqx['Business'])
X

<673x543 sparse matrix of type '<class 'numpy.int64'>'
	with 1714 stored elements in Compressed Sparse Row format>

In [17]:
Goog_nb = MultinomialNB()
Goog_nb.fit(X, y)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [18]:
Goog_nb.score(X,y)

0.8216939078751857
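
Note that this score is accuracy on the same rows the model was fitted on, so it is likely to be optimistic. One quick sanity check is to compare it with the accuracy of always predicting the most common label. The helper below is a hypothetical sketch of that baseline, not a cell from the notebook.

```python
def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    labels = list(labels)
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)
```

In the notebook this could be called as `majority_baseline(acqx['Google'])`. A model score only slightly above that value would suggest the classifier adds little beyond the class balance.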

After fitting the model, we apply it to the company descriptions in the IPO dataset to obtain the probabilities, stored in the Google_prob column.
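
For intuition about where these probabilities come from, the sketch below implements multinomial Naive Bayes by hand, with additive smoothing (`alpha=1.0`) and class priors estimated from the data, matching the settings shown in the `MultinomialNB` repr above. It is an illustrative re-implementation under those assumptions, not scikit-learn's code. The function names are hypothetical, and the toy documents and labels are four rows from the acquisitions table.

```python
import math
from collections import Counter

def fit_multinomial_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: list of class ids. Returns (log_prior, log_lik)."""
    vocab = sorted({t for d in docs for t in d})
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        # Additive smoothing keeps unseen words from getting zero probability
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_lik

def predict_proba(doc, log_prior, log_lik):
    """Posterior per class; tokens outside the training vocabulary are ignored."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in log_prior
    }
    # Subtract the max before exponentiating for numerical stability
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}

docs = [["cloud", "hosting"], ["video", "game", "development"],
        ["identity", "management"], ["online", "game"]]
labels = [1, 1, 0, 0]  # 1 = acquired by Alphabet, 0 = acquired by another company
log_prior, log_lik = fit_multinomial_nb(docs, labels)
posterior = predict_proba(["cloud"], log_prior, log_lik)
```

Because each token multiplies in another likelihood term, long descriptions tend to push the posterior toward 0 or 1, which may be why many Google_prob values in the table are extremely small.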

In [19]:
bagged = bustxt_bag.transform(ipos_list['Description'])
probas = Goog_nb.predict_proba(bagged)[:,1]
ipos_list['Google_prob'] = probas
#ipos_list[ipos_list['Google_prob'].isin([1])]
#ipos_list['Google_prob'] = (ipos_list['Google_prob']*100).astype(str) + '%'
group_index = {'Search':0, 'Media':0, 'Business':0, 'OS':1, 'Organization':1, 'Productivity':1, 'Connectivity':1, 'Devices':2, 'Research':2, 'Innovation':2, 'Misc.':2}
ipos_list['Group_Index'] = ipos_list['Group'].map(group_index)

ipos_list



Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country,Group,Industry,Google_prob,Group_Index
0,ILPT,Industrial Logistics Properties Trust,NASDAQ Global Select,24.00,20000000,01/12/2018,480000000,Priced,ILPT is a real estate investment trust or REIT...,United States,Misc.,Real Estate,9.866884e-02,2
1,LBRT,Liberty Oilfield Services Inc.,NYSE,17.00,12731092,01/12/2018,216428564,Priced,Liberty Oilfield Services Inc. provides hydrau...,United States,Misc.,Energy,3.174335e-02,2
2,PACK,Ranpak Holdings Corp.,NYSE,10.00,30000000,01/18/2018,300000000,Priced,Ranpak Holdings Corp. and its subsidiaries pro...,United States,Misc.,Consumer Cyclical,1.843263e-05,2
3,NINE,"Nine Energy Service, Inc.",NYSE,23.00,7000000,01/19/2018,161000000,Priced,Nine Energy Service Inc. operates as an onshor...,United States,Misc.,Energy,2.670285e-01,2
4,ADT,ADT Inc.,NYSE,14.00,105000000,01/19/2018,1470000000,Priced,ADT Inc. provides security and automation solu...,United States,Devices,Industrials,7.411440e-09,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
394,CASA,Casa Systems Inc,NASDAQ Global Select,13.00,6000000,12/15/2017,78000000,Priced,Casa Systems Inc. provides software-centric br...,United States,Media,Technology,3.112301e-03,0
395,NMRK,"NEWMARK GROUP, INC.",NASDAQ Global Select,14.00,20000000,12/15/2017,280000000,Priced,Newmark Group Inc. provides commercial real es...,United States,Misc.,Real Estate,2.921059e-08,2
396,TRVG,trivago N.V.,NASDAQ Global Select,11.00,26110118,12/16/2016,287211298,Priced,trivago N.V. together with its subsidiaries op...,Germany,Search,Communication Services,7.067322e-01,0
397,AMTB,Amerant Bancorp Inc.,NASDAQ Global Select,13.00,6300000,12/19/2018,81900000,Priced,Amerant Bancorp Inc. operates as the bank hold...,United States,Organization,Financial Services,3.483509e-06,1


### Merging Text Analysis with the Decision Tree Model

The first step in this final process is to split the data into their respective leaf nodes.

In [20]:
node1 = ipos_list.loc[(ipos_list['Offer Amount'] <= 395000000)]
node16 = ipos_list.loc[(ipos_list['Offer Amount'] > 395000000)]

node2 = node1.loc[(node1['Offer Amount'] <= 165000000)]
node9 = node1.loc[(node1['Offer Amount'] > 165000000)]

node17 = node16.loc[(node16['Group_Index'] <= 0.5)]
node22 = node16.loc[(node16['Group_Index'] > 0.5)] #Leaf

node3 = node2.loc[(node2['Offer Amount'] <= 59000000)]
node6 = node2.loc[(node2['Offer Amount'] > 59000000)]

node10 = node9.loc[(node9['Offer Amount'] <= 368000000)]
node13 = node9.loc[(node9['Offer Amount'] > 368000000)]

node18 = node17.loc[(node17['Offer Amount'] <= 3050000000)]
node21 = node17.loc[(node17['Offer Amount'] > 3050000000)] #Leaf

node4 = node3.loc[(node3['Group_Index'] <= 0.5)] #Leaf
node5 = node3.loc[(node3['Group_Index'] > 0.5)] #Leaf

node7 = node6.loc[(node6['Group_Index'] <= 1.5)] #Leaf
node8 = node6.loc[(node6['Group_Index'] > 1.5)] #Leaf

node11 = node10.loc[(node10['Offer Amount'] <= 247500000)] #Leaf
node12 = node10.loc[(node10['Offer Amount'] > 247500000)] #Leaf

node14 = node13.loc[(node13['Offer Amount'] <= 385000000)] #Leaf
node15 = node13.loc[(node13['Offer Amount'] > 385000000)] #Leaf

node19 = node18.loc[(node18['Offer Amount'] <= 2325000000)] #Leaf
node20 = node18.loc[(node18['Offer Amount'] > 2325000000)] #Leaf

print("Node 4:", len(node4))
print("Node 5:", len(node5))
print("Node 7:", len(node7))
print("Node 8:", len(node8))
print("Node 11:", len(node11))
print("Node 12:", len(node12))
print("Node 14:", len(node14))
print("Node 15:", len(node15))
print("Node 19:", len(node19))
print("Node 20:", len(node20))
print("Node 21:", len(node21))
print("Node 22:", len(node22))
x = len(node4)+len(node5)+len(node7)+len(node8)+len(node11)+len(node12)+len(node14)+len(node15)+len(node19)+len(node20)+len(node21)+len(node22)
x #check to see that we have all the data points - we do!

Node 4: 3
Node 5: 23
Node 7: 55
Node 8: 116
Node 11: 60
Node 12: 53
Node 14: 2
Node 15: 1
Node 19: 18
Node 20: 0
Node 21: 1
Node 22: 67


399
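The chain of filters above can also be read as a single routing function. The following is a minimal sketch that mirrors the same thresholds; the function name `assign_leaf` is an assumption introduced here for illustration, and the code assumes neither input is missing.

```python
def assign_leaf(offer_amount, group_index):
    """Return the leaf-node number for one company, following the splits above.

    Hypothetical helper; assumes both arguments are plain numbers (no NaN).
    """
    if offer_amount <= 395_000_000:
        if offer_amount <= 165_000_000:
            if offer_amount <= 59_000_000:
                return 4 if group_index <= 0.5 else 5
            return 7 if group_index <= 1.5 else 8
        if offer_amount <= 368_000_000:
            return 11 if offer_amount <= 247_500_000 else 12
        return 14 if offer_amount <= 385_000_000 else 15
    # Offer amounts above 395M are split on functional group first
    if group_index <= 0.5:
        if offer_amount <= 3_050_000_000:
            return 19 if offer_amount <= 2_325_000_000 else 20
        return 21
    return 22

# Spot checks using rows visible in the tables of this notebook
print(assign_leaf(390_000_000, 0))    # TXG
print(assign_leaf(75_000_000, 2))     # ANAB
print(assign_leaf(1_470_000_000, 2))  # ADT
```

Applied row by row, for example with a list comprehension over `zip(ipos_list['Offer Amount'], ipos_list['Group_Index'])`, this would give one leaf label per company, which can then be stored in a single column instead of twelve separate dataframes.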

Next, we calculate the average probability within each leaf node to decide which nodes the final shortlist should be drawn from.

In [408]:
print(node4['Google_prob'].sum(axis = 0, skipna = True)/len(node4))
print(node5['Google_prob'].sum(axis = 0, skipna = True)/len(node5))
print(node7['Google_prob'].sum(axis = 0, skipna = True)/len(node7))
print(node8['Google_prob'].sum(axis = 0, skipna = True)/len(node8)) #second highest average
print(node11['Google_prob'].sum(axis = 0, skipna = True)/len(node11))
print(node12['Google_prob'].sum(axis = 0, skipna = True)/len(node12))
print(node14['Google_prob'].sum(axis = 0, skipna = True)/len(node14))
print(node15['Google_prob'].sum(axis = 0, skipna = True)/len(node15)) #highest average
print(node19['Google_prob'].sum(axis = 0, skipna = True)/len(node19))
print(node20['Google_prob'].sum(axis = 0, skipna = True)) # no division as div/0 error would arise
print(node21['Google_prob'].sum(axis = 0, skipna = True)/len(node21))
print(node22['Google_prob'].sum(axis = 0, skipna = True)/len(node22))

0.07726789548907556
0.1796768292005607
0.12917560262859198
0.5073885141963883
0.25416867947540733
0.26188704148814895
0.18283347279845144
0.8221431942987762
0.1426279033968427
0.0
0.006879908932349326
0.16014592158957602
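The twelve averages above could also be produced in one step if each company carried its leaf number in a column. The sketch below uses a small made-up dataframe (the `toy` frame, its symbols, and the `Leaf` column are assumptions, not the project's data) and reports the group size next to the mean, which makes it easier to see when an average rests on very few companies, as is the case for Node 15 with its single member.

```python
import pandas as pd

# Hypothetical miniature of ipos_list with a 'Leaf' column already assigned
toy = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Leaf": [8, 8, 8, 15, 11],
    "Google_prob": [0.9, 0.5, 0.1, 0.8, 0.2],
})

# Mean probability and number of companies per leaf, highest mean first
leaf_summary = (
    toy.groupby("Leaf")["Google_prob"]
       .agg(["mean", "count"])
       .sort_values("mean", ascending=False)
)
print(leaf_summary)
```

Leaves with no companies simply do not appear in the grouped result, so no special case is needed to avoid dividing by zero.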


In [22]:
shortlist = node15
shortlist

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country,Group,Industry,Google_prob,Group_Index
256,TXG,"10X Genomics, Inc.",NASDAQ Global Select,39.0,10000000,09/12/2019,390000000,Priced,10x Genomics Inc. a life science technology co...,United States,Business,Healthcare,0.822143,0


Node 15 contains only one company, so we take the 9 highest-probability companies from the node with the second-highest average, Node 8.

In [25]:
short1 = node8.sort_values(by=['Google_prob'], ascending=False).head(9)
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
shortlist1 = pd.concat([shortlist, short1], ignore_index=True)
shortlist1

Unnamed: 0,Symbol,Company Name,Exchange,Price,Shares,Date,Offer Amount,Actions,Description,Country,Group,Industry,Google_prob,Group_Index
0,TXG,"10X Genomics, Inc.",NASDAQ Global Select,39.0,10000000,09/12/2019,390000000,Priced,10x Genomics Inc. a life science technology co...,United States,Business,Healthcare,0.822143,0
1,ANAB,"ANAPTYSBIO, INC",NASDAQ Global Select,15.0,5000000,01/26/2017,75000000,Priced,AnaptysBio Inc. a clinical stage biotechnology...,United States,Research,Healthcare,0.999989,2
2,IPHA,Innate Pharma SA,NASDAQ Global Select,5.5,12500000,10/17/2019,68750000,Priced,Innate Pharma S.A. a biotechnology company dis...,France,Research,Healthcare,0.99667,2
3,SCPH,scPharmaceuticals Inc.,NASDAQ Global Select,14.0,6400000,11/17/2017,89600000,Priced,scPharmaceuticals Inc. a pharmaceutical compan...,United States,Research,Healthcare,0.989601,2
4,KALA,"Kala Pharmaceuticals, Inc.",NASDAQ Global Select,15.0,6000000,07/20/2017,90000000,Priced,Kala Pharmaceuticals Inc. a biopharmaceutical ...,United States,Research,Healthcare,0.98524,2
5,CABA,"Cabaletta Bio, Inc.",NASDAQ Global Select,11.0,6800000,10/25/2019,74800000,Priced,Cabaletta Bio Inc. a clinical-stage biotechnol...,United States,Research,Healthcare,0.983492,2
6,ERYP,Erytech Pharma S.A.,NASDAQ Global Select,23.26,4686106,11/10/2017,108998826,Priced,ERYTECH Pharma S.A. a clinical-stage biopharma...,France,Research,Healthcare,0.977425,2
7,DTIL,PRECISION BIOSCIENCES INC,NASDAQ Global Select,16.0,7900000,03/28/2019,126400000,Priced,Precision BioSciences Inc. operates as a genom...,United States,Research,Healthcare,0.976034,2
8,SLDB,Solid Biosciences Inc.,NASDAQ Global Select,16.0,7812500,01/26/2018,125000000,Priced,Solid Biosciences Inc. a life science company ...,United States,Research,Healthcare,0.973965,2
9,MRSN,"Mersana Therapeutics, Inc.",NASDAQ Global Select,15.0,5000000,06/28/2017,75000000,Priced,Mersana Therapeutics Inc. a clinical stage bio...,United States,Research,Healthcare,0.973725,2


In [414]:
shortlist1.to_csv('Final Company Shortlist.csv')

### Results

The results show that all 10 of the most likely acquisition candidates are healthcare firms, mostly in the biotechnology space. This could potentially be attributed to Google's life sciences activity: Verily, which began as a Google project, is now Alphabet's life sciences research division. In contrast to the other big tech companies that were compared to Alphabet (Facebook, Apple, etc.), Google was likely the only firm with a life sciences acquisition in that dataset.

### Identified Risks (there are a lot more of them)

- The final decision tree used only two criteria to partition the data: offer amount and functional group. While this makes the model simple to interpret, it runs the risk of not being robust.

- The initial regression approach did not work as a result of the lack of correlation between the jumps in stock price and the aforementioned independent variables. This could signal a lack of validity in using the independent variables in the other analysis measures.

- More quantitative measures could have been paired with the categorical and text-based data points (e.g., EPS, financial ratios).

### Concluding Remarks 

This project examined which companies would be most likely to be acquired by Google's parent company Alphabet from a list of firms that went public between January 1st, 2015, and December 31st, 2019. The initial hypothesis suggested that the result would likely point in the direction of Google's 'other bets' groups, where a company focused on some form of deep research or innovation would most likely be acquired to complement Google's existing and growing enterprises on these fronts.

A simple regression was run first, but its low R^2 of only ~20% suggested that the model should not be used. A logistic regression was also tested, but it proved unreliable and was abandoned as well. The decision tree model, however, separated the data into 12 leaf nodes (11 of which contained companies), and combining these with the text-based analysis identified 10X Genomics Inc., a California-based biotechnology company, as the most likely to be acquired by Alphabet. This result is in line with the initial hypothesis, as Google has a life sciences research division as part of its 'other bets' group.

The model itself does present some risks, particularly with respect to the quantity and quality of the data that was used, and these are the aspects that would need to be improved as a next step beyond this project's predictions.