**NER**

This notebook performs Named Entity Recognition (NER) on text extracted from a PDF file, using the spaCy library. I applied two pretrained models and compared their results. As the results weren't satisfactory, I also trained a custom model. The custom model performed worse than the pretrained ones because the training data could not be labeled properly.


In [None]:
import spacy
import fitz
import pandas as pd
from spacy import displacy
import re
from spacy.tokens import DocBin
from tqdm import tqdm

In [None]:
# parse the PDF to plain text with PyMuPDF
def parse_pdf_to_text(input_file_name):

  text_output = ""
  # the context manager closes the file once all pages are read
  with fitz.open(input_file_name) as doc:
    for page in doc:
      text_output += page.get_text()

  return text_output

# clean the parsed text: remove stray characters that can lower prediction quality
def clean_text(text):

  # collapse all whitespace (newlines, tabs, repeated spaces) into single spaces
  new_str = " ".join(text.split())
  # replace the Unicode replacement character and typographic apostrophes with a space
  new_str = re.sub(r'[�]', ' ', new_str)
  new_str = re.sub(r'[’]', ' ', new_str)
  # the substitutions above can leave double spaces behind
  new_str = re.sub(r"\s\s+", " ", new_str)

  return new_str

# build a DataFrame of the unique (entity text, label) pairs found in the text
def parse_unique_ner(model, text_output):

  nlp = spacy.load(model)
  # only the NER component is needed, so the other pipeline components are skipped
  doc = nlp(text_output, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"])

  df = pd.DataFrame({
    "Text": [ent.text for ent in doc.ents],
    "Label": [ent.label_ for ent in doc.ents],
  })
  df_unique = df.drop_duplicates()

  print("Rows in total: ", df.shape[0], " Unique rows: ", df_unique.shape[0])

  return df_unique

# compare 2 spaCy pretrained models by the number of entities found per label
def models_comparision(df1, df2, label):

 grouped_one= df1.groupby(by=label).count()
 grouped_two= df2.groupby(by=label).count()
 comparison= grouped_one.merge(grouped_two, how='left', on=label, suffixes=["_model_one", "_model_two"])

 return comparison

# visualize found entities in the text
def visualize(options, model, text):

 nlp = spacy.load(model)
 doc = nlp(text, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"])
 displacy.render(doc, style="ent", jupyter=True, options= options)

# label data for the custom model: tag every occurrence of a known name as ORG
def data_preparation(text, train_list):

  DATA = []
  for sent in text.split("."):
    entities = []
    for token in train_list:
      # re.escape stops names with special characters (e.g. "S&P 500") being read as regex syntax
      for match in re.finditer(r"\b{}\b".format(re.escape(token)), sent, re.IGNORECASE):
        start, end = match.span()
        # skip spans that overlap one already recorded; spaCy rejects overlapping entities
        if all(end <= s or start >= e for s, e, _ in entities):
          entities.append((start, end, 'ORG'))
    if len(entities) > 0:
      DATA.append((sent, {'entities': entities}))

  return DATA

# convert the labeled data to spaCy's binary DocBin format for training
def custom_model(TRAIN_DATA, VAL_DATA):

  nlp = spacy.blank("en")

  for data, path in ((TRAIN_DATA, "./train.spacy"), (VAL_DATA, "./valid.spacy")):
    db = DocBin()
    for text, annot in tqdm(data):
      doc = nlp.make_doc(text)
      ents = []
      for start, end, label in annot["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        # char_span returns None when the offsets do not line up with token boundaries
        if span is None:
          print("Skipping entity")
        else:
          ents.append(span)
      doc.ents = ents
      db.add(doc)
    db.to_disk(path)

# test the custom trained model
def custom_model_test_data(test, custom_model_path):
  nlp_trained = spacy.load(custom_model_path)
  doc = nlp_trained(test)
  displacy.render(doc, style="ent", jupyter=True)
 

In [None]:
options = {"ents": ["PERSON", "ORG", "PRODUCT", "GPE", "LOC", "WORK_OF_ART", "NORP", "DATE"]}
model_one = "en_core_web_sm"
model_two = "en_core_web_lg"
custom_model_path="output/model-best"
file= "Interview GS.pdf"

Parse and clean the text from the PDF file.

In [None]:
text_output= parse_pdf_to_text(file)
cleaned_text= clean_text(text_output)

In [None]:
cleaned_text
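To illustrate what the cleaning step does, here is a minimal standalone sketch on a made-up sample string. The function name `clean_text_sketch` and the sample are assumptions for illustration only; the logic is meant to mirror `clean_text` above.

```python
import re

def clean_text_sketch(text):
    # collapse newlines, tabs and repeated spaces into single spaces
    new_str = " ".join(text.split())
    # replace the Unicode replacement character (U+FFFD) and the
    # typographic apostrophe (U+2019) with a space
    new_str = re.sub("[\ufffd\u2019]", " ", new_str)
    # the substitution can leave runs of spaces behind, so collapse again
    return re.sub(r"\s\s+", " ", new_str)

# hypothetical sample, not taken from the PDF
sample = "Galaxy\u2019s Michael\n Novogratz \ufffd CEO"
print(clean_text_sketch(sample))  # prints: Galaxy s Michael Novogratz CEO
```

Because apostrophes are replaced by a space rather than removed, possessives and contractions come out split ("Galaxy s", "aren t"), which is why the text in the later cells looks that way.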



Compare the 2 datasets created by the 2 pretrained spaCy models, "en_core_web_sm" and "en_core_web_lg".

In [None]:
unique_lines_one=parse_unique_ner(model_one, cleaned_text)
unique_lines_one.head(15)

,Text,Label
0,May 21,DATE
1,6:00 PM EDT,TIME
2,Global Macro Research Investors,ORG
3,Reg AC,PERSON
4,The Goldman Sachs Group,ORG
5,Michael Novogratz,PERSON
6,NYU s Nouriel Roubini,PRODUCT
7,Grayscale,ORG
8,Michael Sonnenshein,PERSON
9,2020,DATE


In [None]:
unique_lines_two=parse_unique_ner(model_two, cleaned_text)
unique_lines_two.head(15)

Rows in total:  2335  Unique rows:  1119


,Text,Label
0,May 21,DATE
1,6:00 PM EDT,TIME
2,Global Macro Research Investors,ORG
3,Reg AC,ORG
4,www.gs.com/research/hedge.html,PERSON
5,"The Goldman Sachs Group, Inc.",ORG
6,Michael Novogratz,PERSON
7,NYU,ORG
8,Nouriel Roubini,PERSON
9,Grayscale,PERSON


Number of unique entities per label in each of the 2 datasets.

In [None]:
models_comparision(unique_lines_one, unique_lines_two, "Label")

Label,Text_model_one,Text_model_two
CARDINAL,131,115
DATE,236,244
EVENT,3,3
FAC,5,3
GPE,74,72
LANGUAGE,2,1
LAW,7,6
LOC,12,10
MONEY,38,55
NORP,15,17
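`models_comparision` merges with `how='left'`, so a label found only by the second model would not appear in the comparison. As a rough alternative sketch (standard library only, with made-up entity lists and a hypothetical helper name `compare_label_counts`), the counts can be compared over the union of labels:

```python
from collections import Counter

def compare_label_counts(pairs_one, pairs_two):
    """Count unique (text, label) pairs per label for two models,
    keeping every label seen by either model."""
    counts_one = Counter(label for _, label in set(pairs_one))
    counts_two = Counter(label for _, label in set(pairs_two))
    labels = sorted(set(counts_one) | set(counts_two))
    # a Counter returns 0 for labels it has never seen
    return {label: (counts_one[label], counts_two[label]) for label in labels}

# hypothetical entity lists, loosely modelled on the outputs above
model_one = [("May 21", "DATE"), ("Reg AC", "PERSON"),
             ("Grayscale", "ORG"), ("May 21", "DATE")]
model_two = [("May 21", "DATE"), ("Reg AC", "ORG"),
             ("Grayscale", "PERSON"), ("6:00 PM EDT", "TIME")]

result = compare_label_counts(model_one, model_two)
for label, (one, two) in result.items():
    print(label, one, two)
```

In this toy input, `TIME` is reported as `0` for the first model instead of being dropped. With pandas, the same effect would come from an outer merge.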


Visualization of the entities found by the 2 pretrained spaCy models.

In [None]:
visualize(options, model_one, cleaned_text)

In [None]:
visualize(options, model_two, cleaned_text)

Train, validation and test data for training and testing our custom model.

In [None]:
train="""CRYPTO: A NEW ASSET CLASS? ISSUE 98| May 21, 2021| 6:00 PM EDT — T Global Macro Research Investors should consider this report as only a single factor in making their investment decision. For Reg AC certiﬁcation and other important disclosures, see the Disclosure Appendix, or go to www.gs.com/research/hedge.html. The Goldman Sachs Group, Inc. With cryptocurrency prices remaining extremely volatile even as interest in cryptos from credible investors has been rising, and legacy ﬁnancial institutions—including ourselves—have been launching new crypto products and services, crypto is undoubtedly Top of Mind. Amid the recent volatility, we ask experts whether cryptos can and should be considered an institutional asset class, including Galaxy s Michael Novogratz (Yes; the mere fact that a critical mass of credible investors is engaging with cryptos has cemented this), NYU s Nouriel Roubini (No; cryptos have no income, utility or relationship with economic fundamentals), Grayscale s Michael Sonnenshein (Yes; their strong rebound in 2020 reassured investors about their resiliency as an asset class), and GS s own Mathew McDermott (clients increasingly say “yes”). And GS research analysts also weigh in. We then speak to former SEC advisor Alan Cohen, Trail of Bits Dan Guido, and Chainalysis Michael Gronager to explore the regulatory, technological, and security obstacles to further institutional adoption. Bitcoin and other cryptocurrencies aren t assets. Assets have some cash ﬂow or utility that can be used to determine their fundamental value... Bitcoin and other cryptocurrencies have no income or utility. - Nouriel Roubini “ We ve now hit a critical mass of institutional engagement [in crypto]. Everyone from the major banks to PayPal and Square is getting more involved, which is a loud and clear signal that crypto is now an ofﬁcial asset class. 
- Michael Novogratz “ INTERVIEWS WITH: Michael Novogratz, Co-founder and CEO, Galaxy Digital Holdings Nouriel Roubini, Professor of Economics, New York University Stern School of Business Michael Sonnenshein, CEO, Grayscale Investments Mathew McDermott, Global Head of Digital Assets, Goldman Sachs Alan Cohen, former Senior Policy Advisor, US Securities and Exchange Commission Dan Guido, Co-founder and CEO, Trail of Bits Michael Gronager, Co-founder and CEO, Chainalysis BITCOIN AS A MACRO ASSET Zach Pandl, GS Markets Research CRYPTO IS ITS OWN CLASS OF ASSET Jeff Currie, GS Commodities Research WHAT IS A DIGITAL STORE OF VALUE? Mikhail Sprogis and Jeff Currie, GS Commodities Research THE ROLE OF CRYPTO IN BALANCED PORTFOLIOS Christian Mueller-Glissmann, GS Multi-Asset Strategy Research WHAT S INSIDE of Allison Nathan | allison.nathan@gs.com ...AND MORE I have yet to ﬁnd somebody who has really done their homework on crypto assets that isn t truly amazed by the potential for the asset class. - Michael Sonnenshein TOP MIND Jenny Grimberg | jenny.grimberg@gs.com Gabriel Lipton Galbraith | gabe.liptongalbraith@gs.com El Goldman Sachs Global Investment Research 2 Top of Mind Issue 98 Macro news and views US Japan Latest GS proprietary datapoints/major changes in views We now expect core PCE inflation to peak at 2.8% in May and fall to 2.25% by year-end 2021 after the strong April CPI print. Datapoints/trends we re focused on Taper timeline; we think the Fed will only start to hint at tapering in 2H21 and begin to taper in early 2022. Fed liftoff; if our taper timeline is right, then liftoff will probably not be on the table for about two years. Unemployment; we expect a somewhat less front-loaded jobs recovery, but still see unemployment at 4% by year-end 2021. Latest GS proprietary datapoints/major changes in views We lowered our 2Q21 and CY21 real GDP growth forecasts to 1.8% qoq ann. 
and 2.6%, respectively, after the imposition of a third state of emergency, and see a more back-loaded recovery. Datapoints/trends we re focused on Pent-up demand, which should boost spending by ¥3.1tn (1% of consumption) and ¥3.9tn (1.3%) in the first and second years after reopening, respectively. Fiscal policy; additional support is a possibility. BoJ policy; we expect the status quo in policy to remain for a long time with little impact from the inflation outlook. Pandemic distortions to core inflation should peak soon Core PCE and contributions to its 2020-22 deviation from trend Third state of emergency delays recovery Aggregate mobility index, index (1/3-2/6/2020 =100) Source: Department of Commerce, Haver Analytics, Goldman Sachs GIR. Source: Google LLC "Google COVID-19 Community Mobility Reports"; https://www.google.com/covid19/mobility/."""


In [None]:
test= """Michael Sonnenshein, CEO of Grayscale Investments, the world s largest digital asset manager, agrees that institutional investors now generally appreciate that digital assets are here to stay, with investors increasingly attracted to the finite quality of assets like bitcoin—which is verifiably scarce—as a way to hedge against inflation and currency debasement, and to diversify their portfolios in the pursuit of higher risk-adjusted returns. Even though crypto assets have behaved as anything but a diversifier over the past year—selling off more than traditional assets as the COVID-19 pandemic set in—he says that their faster and stronger rebound in 2020 only reassured investors about their resiliency as an asset class. But what makes a crypto like bitcoin—which has no income, no practical uses and high volatility—a good store of value? Novogratz s answer: because “the world has voted that they believe” it is. Zach Pandl, GS Co-Head of Global FX, Rates, and EM Strategy, largely agrees, arguing that bitcoin s potential for widespread social adoption given its strong brand on top of its other properties, such as its security, privacy, transferability and the fact that it s digital makes it a plausible store of value for future generations. And he believes that institutional investors today should treat bitcoin as a macro asset, akin to gold. GS commodity analyst Mikhail Sprogis and Jeff Currie, Global Head of Commodities Research, for their part, argue that cryptos can act as stores of value, but only if they have other real world uses that create value and temper price volatility. This, they say, best positions cryptos whose blockchains offer the greatest potential for such uses, like ether, to become the dominant digital store of value. 
More broadly, Currie contends that cryptos are a new class of asset that derive their value from the information being verified and the size and growth of their networks, but that legal challenges to their future growth loom large due to their decentralized and anonymous nature. And Nouriel Roubini, professor of economics at NYU s Stern School of Business, entirely disagrees with the idea that something with no income, utility or relationship with economic fundamentals can be considered a store of value, or an asset at all. Despite the recent crypto mania, he doubts the willingness of most institutions to expose themselves to cryptos  volatility and risks, which the volatile price action in recent days has served as a stark reminder of. Christian Mueller-Glissmann, GS Senior Multi-Asset Strategist, then makes the case that for an asset to add value to a portfolio, it has to offer either an attractive risk/reward or low correlations with other macro assets, and preferably both. He finds that a small allocation to bitcoin in a standard US 60/40 portfolio since 2014 would ve led to strong outperformance, owing both to higher risk-adjusted returns for bitcoin compared to the S&P 500 and US 10y bonds, as well as diversification benefits from relatively low correlations between bitcoin and other assets. But with this outperformance largely owing to only a handful of idiosyncratic bitcoin rallies, he concludes that bitcoin s short and volatile history makes it too soon to conclude how much value it adds to a balanced portfolio. But beyond the debatable role of cryptos as a store of value and investible asset, does the broader crypto ecosystem provide promise for investors? Novogratz and Sonnenshein strongly believe that the answer is yes, given a myriad of potential use cases for crypto assets. 
In particular, Novogratz sees the three biggest developments in the crypto ecosystem—payments, Decentralized Finance (DeFi), and non- fungible tokens (NFTs)—mostly being built on the Ethereum network, which suggests substantial upside for it and various DeFi applications. But Roubini contends that few successful applications of blockchain technology exist today. And he sees many potential corporate uses of it as “BINO”—Blockchain In Name Only. In short, he s skeptical that blockchain technology will prove revolutionary because “the idea that technology can resolve the question of trust is delusional.” Mathew McDermott, GS Global Head of Digital Assets, then explains why GS has (re)engaged in the space—in two words: client demand—and how interest in cryptos differs between client types—from asset managers who are seeking portfolio diversification, to high-net-worth clients who are increasingly looking for exposure to broader crypto use cases, to hedge funds that are largely aiming to profit from the basis between going long the physical and short the future—an arbitrage that reflects the difficulties that still persist in accessing the market today. Beyond this issue of market fragmentation, we conclude with a look at some of the other main obstacles to further institutional adoption of crypto assets. Alan Cohen, previous senior policy advisor to former SEC Chairman Jay Clayton and former GS Global Head of Compliance, explains how regulators are looking at crypto assets today. Michael Gronager, Co-founder and CEO of blockchain investigations firm Chainalysis, explains what is— and isn t—included in their analysis that finds that less than 1% of all cryptocurrency activity is illicit. And Dan Guido, Co- founder and CEO of software security firm Trail of Bits, discusses the black swan technological and security scenarios that all investors in the crypto ecosystem should be aware of. 
Allison Nathan, Editor Email: allison.nathan@gs.com Tel: 212-357-7504 Goldman Sachs and Co. LLC Crypto: a new asset class? El Goldman Sachs Global Investment Research 4 Top of Mind Issue 98 Michael Novogratz is CEO of Galaxy Digital Holdings Ltd. Below, he discusses the potential for crypto assets and their ability to transform the financial system and beyond. The views stated herein are those of the interviewee and do not necessarily reflect those of Goldman Sachs. Allison Nathan: How does Galaxy invest in the crypto universe? Michael Novogratz: Galaxy Digital grew out of my family office, which operates like a merchant bank, and has become a nearly full-service business for the digital asset and blockchain technology communities. Being involved across the ecosystem is important to us, namely so that we can be positioned to help grow the industry that we believe will transform the way we live and work globally. We own and trade coins, have a large venture business, and invest in the virtual world that will be used not by finance, but by consumers—the metaverse, gaming studios, and non-fungible token (NFT) projects. We believe you learn by being at the frontier and that s why we started the company—to learn about the crypto space and share that knowledge with our institutional customers as we create the next generation of financial services companies. Allison Nathan: You ve been involved in and excited about the crypto space for a while now, but it s had fits and starts, including the dramatic price rise and collapse in 2017/18. What makes this time different? Michael Novogratz: 2017/2018 was the first-ever truly global and retail-driven speculative mania. It was blind excitement. It s not that there are no excesses, knuckleheaded Twitter comments, cheerleading, or tribalism today, but that s all there was back then. And crypto s market cap cratered 98.5%. 
But out of that mania grew a much smarter investor base that took the lessons learned and is more willing to differentiate between the different use cases for crypto—from stores of value to decentralized finance (DeFi) to stablecoins and payment systems. And in turn, the community has built up a more logical investment process. Importantly, that price downturn didn t result in a downturn in investments being made in the underlying crypto infrastructure, so the custody and security infrastructure necessary to attract institutions has been built. As a result, we ve now hit a critical mass of institutional engagement. Everyone from the major banks to PayPal and Square is getting more involved, which is a loud and clear signal that crypto is now an official asset class. There s still a lot of volatility, so people will wash in and out. But crypto is not going away. And a core group of crypto people see this as—and I quote the Blues Brothers here —“a mission from god”. They want to rebuild the infrastructure of the financial markets in a way that s more transparent and egalitarian and doesn t rely on governments who make bad decisions with our finances. They will never sell. And because of that, bitcoin and ether can t go to zero. Allison Nathan: But can the crypto ecosystem survive if it isn t intertwined with the traditional financial system? Michael Novogratz: No. Institutions need to participate because they have most of the money in the world and there s actually a symbiotic relationship between the two. The advisor model that Galaxy possesses is important because many people don t have time to learn to become investors. And as traditional financial advisors and asset managers understand the space and become crypto preachers, they bring more people into the tent, which is key for the future of crypto. That said, payments will be an interesting battleground. 
The money transfer business is a very high margin one for legacy financial institutions and it s under threat from new payment systems that are faster, more transparent, and cheaper. Facebook is coming out with their Dollar-based payment system, the Chinese government is coming out with theirs, and stablecoins are gaining traction. At some point, I believe our phones will have crypto wallets that will replace bank accounts. The competition to see who dominates payments is just starting along with the competition between exchanges and derivative markets. So the question is, how fast will banks iterate and compete? A core group of crypto people see this as—and I quote the Blues Brothers here — “a mission from god”… They will never sell. And because of that, bitcoin and ether can t go to zero.” Allison Nathan: But will it be bitcoin that s transformative in payments? Michael Novogratz: No. Bitcoin isn\'t set up to process thousands of transactions per second. Paying for a diet coke with bitcoin would be like paying for it with gold. That won t happen. But payment rails will be built on other blockchains. Right now, if I want to send money to my sister in Holland, it would be painful, costly, and slow. But soon, I ll be able to send her a Dollar stablecoin and transferring money will become free. Most of this will be built on the Ethereum network, which is why ethereum prices have been rising. The three biggest moves in the crypto ecosystem—payments, DeFi, and NFTs— are mostly being built on Ethereum, so it s going to get priced like a network. The more people that use it and the more stuff that gets built on it, the higher the price will ultimately go. Allison Nathan: What s the value proposition of bitcoin, then? Michael Novogratz: Bitcoin is a really convenient way to store value. 
One of the main reasons people have gotten excited about bitcoin recently is that they re worried that we currently have an unsustainable balance of monetary and fiscal policy Interview with Michael Novogratz El Goldman Sachs Global Investment Research 5 Top of Mind Issue 98 that will eventually set off an inflationary spiral. And that worry isn t going away anytime soon. More and more Americans are in favor of paying for college for people whose families earn less than $100k annually. President Biden just gave half of the $1.9tn fiscal package directly to people who needed it, which was very well-received. Some version of universal basic income (UBI) is coming; it may not be called UBI, but capital will be taxed and given to labor. None of that is fiscally prudent, but there s no political imperative to say stop spending money. Even before COVID-19, deficits were bad, but now they re insane. And monetary policymakers are financing everything the government wants to spend, not just in the US but all over the world. So the main reason everyone got into bitcoin is the same reason they got into gold—the current macro backdrop is tailor-made for it. And, as long as that macro and political backdrop persists and crypto remains in the adoption cycle, it s crazy to get out. """

In [None]:
valid= """During the peak of the COVID-19 shock in early 2020, US equities fell by about 35%, but bitcoin collapsed by around 50%. Other top 10 crypto currencies fell by even more. In difficult times, crypto assets don\'t go up; they go down. If investors want inflation hedges, a wide variety of assets have proven to be good inflation hedges for decades, including commodities and their stocks, gold, TIPS, inflation-adjusted and other forms of inflation-indexed bonds. I do worry that monetized deficits might eventually lead to fiscal dominance and higher inflation. But I wouldn\'t recommend bitcoin or other cryptocurrencies to protect against this risk. Allison Nathan: Nascent technologies are often volatile in their adoption phase. What makes this moment for crypto any different than the early days of the internet? Nouriel Roubini: More than a decade on from the advent of Bitcoin, it\'s nowhere near as transformative as the internet was at a similar stage. The World Wide Web already had around a billion users ten years in. While it\'s difficult to know the total number of crypto users today, active users for the most traded coins probably amount to a maximum of a hundred million. Transaction growth for cryptocurrencies has been slower than in the case of the internet, and transaction costs remain very high, with mining revenues as a share of the total volume of transactions still very high. After ten years of the internet, there was email, millions of useful websites and apps, and technologies like the TCP and HTML protocols with broader applications. In the case of cryptocurrencies, there are so-called "dApps", or decentralized apps, but 75% of dApps are games like CryptoKitties or literally pyramid or Ponzi schemes of one sort or another. And the other 25% are "DEXs", or decentralized exchanges, that for now have few transactions and little liquidity. So the comparison with the internet just doesn\'t ring true. 
Allison Nathan: Doesn t the concept of decentralized ledgers and networks have value, though? Nouriel Roubini: I am not sure it does, but the reality is that the crypto ecosystem is not decentralized. An oligopoly of miners essentially controls about 70-80% of bitcoin and ether mining. These miners are located in places like China, Russia, and Belarus, which are strategic rivals of the US and have a different rule of law. That\'s why the US National Security Council is starting to worry about the risks that could pose for the United States. And 99% of all crypto transactions occur on centralized exchanges. Many crypto currencies also have a concentrated group of core developers who are police, judge, and jury whenever updates to or conflicts over the blockchain arise. Rules assumed to be fixed have been changed in these situations. So the blockchain isn\'t even immutable. There s some evidence that the ownership of crypto wealth is also highly concentrated. Less than 0.5% of addresses own around 85% of all bitcoin, based on CoinMarketCap data. There\'s also evidence that whales holding a large amount of the total supply of bitcoin and other cryptocurrencies actively manipulate their prices. Tons of news articles have detailed active manipulation in chat rooms in the form of pump-and- dump schemes, spoofing, wash trading, front-running, etc. This behavior is much worse than even penny stocks, which suggests a high likelihood of an eventual regulatory crackdown. Allison Nathan: Does any innovation in the crypto ecosystem look promising to you? Nouriel Roubini: Not really. The next decade will see radical financial innovation across many dimensions, disrupting the traditional financial system. But it will have nothing to do with cryptocurrencies. Driving this innovation will be a revolution in fintech owing to some combination of AI, machine learning, and the use of the Internet of Things (IoT) to collect big data. 
Fintech is already transforming payment systems, borrowing and lending, credit allocation, insurance, asset management, and parts of the capital markets. In the context of payment systems, billions of transactions are made every day using AliPay and WeChat Pay in China, M-Pesa in Kenya and most of Sub-Saharan Africa, and Venmo, PayPal, and Square in the United States. These are all great companies that are scalable, secure, and are disrupting financial services. They\'re not based on decentralized finance (DeFi), and have nothing to do with crypto or blockchain. I ve honestly spent a lot of time looking at this because more and more people are saying that while maybe these aren\'t currencies, blockchain technology could be revolutionary. There are now all these buzzwords like "enterprise distributed ledger technology (DLT)" or "corporate blockchain." But I call most of these projects BINO—"Blockchain In Name Only". Something truly based on blockchain technology should be public, decentralized, permissionless, and trustless. But looking at DLT and corporate blockchain experiments, almost all of them are private, centralized and permissioned—because a small group of people has the ability to validate transactions—and most are authenticated by a trusted institution. And even among these projects, few have actually worked. One study looking at 43 applications of blockchain technologies in the non-profit sphere for reasons such as banking the unbanked, giving IDs to refugees, and transferring remittances found that zero actually worked. The fundamental problem with this whole space is that it assumes the idea that technology can create trust. But that\'s mission impossible. Resolving the challenge of authenticating ownership or quality requires due diligence and testing. Why should I trust a DLT that says my tomatoes are organic? I trust Whole Foods that actually tests the tomatoes for chemicals. The idea that technology can resolve the question of trust is delusional. 
So, I\'m deeply skeptical that blockchain, DLT, and cryptocurrencies for that matter will be the revolutionary technologies that their proponents suggest. El Goldman Sachs Global Investment Research 10 Top of Mind Issue 98 Zach Pandl argues that institutional investors should treat bitcoin as a macro asset, akin to gold, going through a social adoption phase Although bitcoin is now seeing wider institutional adoption, many sophisticated investors still struggle to understand why a digital asset should have any value—much less a market capitalization of more than $500bn. And because of the parabolic price increases and high retail participation, many treat the cryptocurrency phenomenon as a classic speculative mania or “bubble”. Regardless of whether bitcoin will prove to be a good investment over time, this perspective is too narrow. Bitcoin is a medium which is beginning to serve the functions of money—primarily as a “store of value”. Virtually anything can serve this purpose as long as it gains widespread social adoption, and bitcoin has made meaningful progress down that path. The need for stores of value To understand bitcoin, it is best to begin with gold. Gold serves a unique function in the global financial system. It is both a useful commodity and a money-like, “store of value” asset. However, unlike conventional money mediums, it is not issued by a government and does not denominate any transactions in goods or assets. In effect, gold serves as an alternative fallback money instrument for adverse states of the world—when investors are unsure about the safety of conventional assets or fiat money in general (e.g. due to the risk of inflation or confiscation). In foreign exchange markets, gold behaves like an “inverse currency”: its price tends to fall when the fundamentals of major currencies improve, and tends to rise when the fundamentals of major currencies worsen. 
Over time, the most important driver of nominal exchange rates is the relative rate of inflation between two economies. Because gold has a quasi-fixed supply, its nominal value tends to rise at the rate of inflation in major markets. These correlation and store of value properties allow gold to play a very useful diversification role in portfolios. Originally, gold was likely adopted as a money medium due to its elemental properties. Gold and copper are the only metals which are not greyish in color in their natural state0F1, and they have captivated humans since ancient times. Gold is also relatively dense, malleable, and ductile (stretchable), and unlike many other metals it does not tarnish, rust, or corrode. These features have underpinned gold s use as a money instrument throughout human history. But the use of gold today has as much to do with inertia as it does with the metal s physical properties. After all, US Dollar notes are also a store of value, and they are made of paper1F2. Money, like language, is a social device—it is closer to a concept than a thing. Money is a social device that facilitates commerce, in much the same way that language is a social device that facilitates other aspects of our lives. It is useful for society to have a type of money that is not issued by a sovereign government. But the specific medium used for that purpose is partly arbitrary. Throughout history, a diverse array of 1 Gold s periodic symbol AU comes from the Latin word aurum, meaning “shining dawn.” 2 Technically a 75% cotton-based and 25% linen-based material. objects has functioned as money, dictated by the demands of place and time—as Bitcoiners and monetary historians are fond of pointing out. Classic examples include the tobacco-based money standards of the early American colonies, and the regular use of mobile phone minutes as money throughout Africa. 
Gold serves a money function today primarily as an artifact of history, not because it is literally the best possible medium for society s store of value needs. Gold plays an important diversification role in portfolios 10-year annualized returns Source: Bloomberg, MeasuringWorth, Goldman Sachs GIR. When inflation accelerated in the mid-20th century and investors sought out options to protect the real value of their assets, gold was the natural choice. At the time, major currencies were pegged to gold via the US Dollar through the Bretton Woods gold exchange standard, and, before the Great Depression, most currencies, as well as most US Treasury notes, were directly backed by gold. The US government provided an official price of gold in Dollars, which changed only twice in the nearly two centuries between the 1790s and 1970s. During the 1960s, under the gold exchange standard, gold trading above its official stated price was the clearest way to observe depreciation pressure on the US Dollar. In short, over much of the post- WWII period, there was a close association between the price of gold, currency stability, and the real value of money—making it the obvious inflation hedge for portfolios. But the official link between the Dollar and the price of gold was severed 50 years ago when President Nixon ended the convertibility of Dollars into gold in August 1971. As a result, a generation of asset holders have grown up in a world without a tight connection between gold and money. So when the need for a store of value asset arises, could it be that they reach for something else? Gold for the digital generation This is where bitcoin comes in. Any alternative medium would need to be secure, privately held, have a fixed or quasi-fixed supply, and be transferable, ideally outside the traditional payments system. 
In our modern globalized society, where a substantial portion of social interaction and commerce occurs online (especially among younger people), it may also need to be digital. But, most importantly, it would need to have the potential for widespread social adoption—anything can be money, as long as it has that. Bitcoin is therefore a plausible -10 0 10 20 30 40 1960 1967 1974 1981 1988 1995 2002 2009 2016 S&P 500 Gold Bitcoin as a macro asset El Goldman Sachs Global Investment Research 11 Top of Mind Issue 98 alternative store of value medium to gold and, at the moment, the best candidate among cryptocurrencies with a similar structure because of its broader social adoption (i.e. its “name brand”). In equilibrium, a store of value as volatile as bitcoin would not be very useful. But cryptocurrencies are in their infancy; it is better to think of today s prices as reflecting some probability that bitcoin or another coin/token could achieve greater adoption in the future, at which time its price could be extremely high. Therefore, small changes in those probabilities can result in high price volatility today. Bitcoin investors are speculating that it will eventually achieve near-universal acceptance as a non-sovereign money, with high returns (and high volatility) along the way. Today s bitcoin prices reflect some probability that cryptos could achieve greater adoption in the future Time (x-axis) vs. price (y-axis) Source: Goldman Sachs GIR. The critical ingredient to bitcoin s success—widespread social adoption—has now crossed many notable thresholds: Tesla, the sixth largest company in the S&P 500, is carrying bitcoin on its balance sheet; storied macro hedge fund Brevan Howard has begun investing in cryptocurrencies; and Coinbase is now listed on the Nasdaq. """

Lowercase organization names used as search terms for the `ORG` label.

In [None]:
train_list= ['reg ac', 'galaxy', 'nyu', 'grayscale', 'sec', 'chainalysis', 'paypal', 'square', 'haver analytics', 'google llc ']

Label the training, validation, and test data.

In [None]:
TRAIN_DATA= data_preparation(train,train_list)
TRAIN_DATA

[(' For Reg AC certiﬁcation and other important disclosures, see the Disclosure Appendix, or go to www',
  {'entities': [(5, 11, 'ORG')]}),
 (' Amid the recent volatility, we ask experts whether cryptos can and should be considered an institutional asset class, including Galaxy s Michael Novogratz (Yes; the mere fact that a critical mass of credible investors is engaging with cryptos has cemented this), NYU s Nouriel Roubini (No; cryptos have no income, utility or relationship with economic fundamentals), Grayscale s Michael Sonnenshein (Yes; their strong rebound in 2020 reassured investors about their resiliency as an asset class), and GS s own Mathew McDermott (clients increasingly say “yes”)',
  {'entities': [(129, 135, 'ORG'), (264, 267, 'ORG'), (368, 377, 'ORG')]}),
 (' We then speak to former SEC advisor Alan Cohen, Trail of Bits Dan Guido, and Chainalysis Michael Gronager to explore the regulatory, technological, and security obstacles to further institutional adoption',
  {'ent

In [None]:
TEST_DATA= data_preparation(test,train_list)
TEST_DATA

[('Michael Sonnenshein, CEO of Grayscale Investments, the world s largest digital asset manager, agrees that institutional investors now generally appreciate that digital assets are here to stay, with investors increasingly attracted to the finite quality of assets like bitcoin—which is verifiably scarce—as a way to hedge against inflation and currency debasement, and to diversify their portfolios in the pursuit of higher risk-adjusted returns',
  {'entities': [(28, 37, 'ORG')]}),
 (' And Nouriel Roubini, professor of economics at NYU s Stern School of Business, entirely disagrees with the idea that something with no income, utility or relationship with economic fundamentals can be considered a store of value, or an asset at all',
  {'entities': [(48, 51, 'ORG')]}),
 (' Alan Cohen, previous senior policy advisor to former SEC Chairman Jay Clayton and former GS Global Head of Compliance, explains how regulators are looking at crypto assets today',
  {'entities': [(54, 57, 'ORG')]}),
 ('

In [None]:
VAL_DATA= data_preparation(valid,train_list)
VAL_DATA

[(' In the context of payment systems, billions of transactions are made every day using AliPay and WeChat Pay in China, M-Pesa in Kenya and most of Sub-Saharan Africa, and Venmo, PayPal, and Square in the United States',
  {'entities': [(177, 183, 'ORG'), (189, 195, 'ORG')]})]
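The `data_preparation` helper is not shown in this part of the notebook, so the following is only a minimal sketch of how such term-based labeling could produce the `(start, end, label)` character offsets seen in the outputs above. The function name `label_sentences` and its parameters are hypothetical, and the matching rules (case-insensitive, whole words only) are assumptions rather than the notebook's actual logic.

```python
import re

def label_sentences(sentences, terms, label="ORG"):
    """Return spaCy-style (text, {"entities": [(start, end, label)]}) tuples."""
    # Strip stray whitespace and drop empty terms.
    cleaned = [t.strip() for t in terms if t.strip()]
    # Longest terms first, so a longer name wins over a shorter one it contains.
    cleaned.sort(key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in cleaned) + r")\b",
        re.IGNORECASE,
    )
    labeled = []
    for sentence in sentences:
        spans = [(m.start(), m.end(), label) for m in pattern.finditer(sentence)]
        if spans:  # keep only sentences that contain at least one term
            labeled.append((sentence, {"entities": spans}))
    return labeled

example = label_sentences(
    ["Venmo, PayPal, and Square in the United States"],
    ["paypal", "square"],
)
print(example)
```

Matching on a fixed term list is simple, but it labels every occurrence of a term regardless of context: a term such as `square` or `sec` would also be tagged when it appears as an ordinary word, which is one way labeling noise can enter the training data.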

Convert the training and validation data to spaCy's format and save them to disk for training the custom model.

In [None]:
custom_model(TRAIN_DATA, VAL_DATA)

100%|██████████| 6/6 [00:00<00:00, 398.06it/s]
100%|██████████| 1/1 [00:00<00:00, 363.62it/s]
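Before converting annotations like these, it can help to check the spans themselves. spaCy raises an error when overlapping spans are assigned to `doc.ents`, and `Doc.char_span` returns `None` by default when offsets do not align with token boundaries. The sketch below is a standalone check for the first kind of problem, plus out-of-range offsets. The helper `find_span_problems` is hypothetical and is not part of the notebook's `custom_model` function, whose internals are not shown here.

```python
def find_span_problems(examples):
    """Return (example_index, message) pairs for out-of-range or overlapping spans."""
    problems = []
    for i, (text, annotations) in enumerate(examples):
        previous_end = 0
        # Sort by start offset so overlaps show up as start < previous_end.
        for start, end, label in sorted(annotations["entities"]):
            if not (0 <= start < end <= len(text)):
                problems.append((i, f"span ({start}, {end}) is outside the text"))
                continue
            if start < previous_end:
                problems.append((i, f"span ({start}, {end}) overlaps an earlier span"))
            previous_end = max(previous_end, end)
    return problems

# Illustrative data only: the second example deliberately contains an overlap.
sample = [
    ("PayPal and Square", {"entities": [(0, 6, "ORG"), (11, 17, "ORG")]}),
    ("Google LLC", {"entities": [(0, 6, "ORG"), (0, 10, "ORG")]}),
]
problems = find_span_problems(sample)
print(problems)
```

An empty result means every span lies inside its text and no two spans in the same example overlap. It does not guarantee that the offsets fall on token boundaries, which depends on the tokenizer.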


The base configuration was downloaded from the spaCy Quickstart (https://spacy.io/usage/training#training-data). The command below fills in the remaining default values to produce the full config file for the custom model.

In [None]:
!python -m spacy init fill-config base_config.cfg config.cfg

✔ Auto-filled config with all values
✔ Saved config
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


Train and validate the custom model.

In [None]:
!python -m spacy train config.cfg --output ./output --paths.train train.spacy --paths.dev valid.spacy

✔ Created output directory: output
ℹ Using CPU
[2022-05-15 15:53:38,551] [INFO] Set up nlp object from config
[2022-05-15 15:53:38,562] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-05-15 15:53:38,570] [INFO] Created vocabulary
[2022-05-15 15:53:38,570] [INFO] Finished initializing nlp object
[2022-05-15 15:53:38,872] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     55.17    0.00    0.00    0.00    0.00
 51     200          4.92    651.92   80.00   66.67  100.00    0.80
110     400          0.00      0.00   80.00   66.67  100.00    0.80
171     600          0.00      0.00   80.00   66.67  100.00    0.80
246     800          0.00      0.00   80.00   66.67  100.00 
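The `ENTS_P`, `ENTS_R`, and `ENTS_F` columns in the log are precision, recall, and F-score for entities on the validation data. As a worked check of how the three numbers relate, the sketch below recomputes them from raw counts. The counts are an assumption: the single validation sentence has two gold entities, so 100% recall with 66.67% precision is consistent with two correct predictions and one spurious one.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 (as percentages) from entity counts."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1

# Assumed counts: 2 correct entities, 1 spurious prediction, nothing missed.
p, r, f = precision_recall_f1(true_positives=2, false_positives=1, false_negatives=0)
print(f"ENTS_P={p:.2f}  ENTS_R={r:.2f}  ENTS_F={f:.2f}")
```

With a validation set this small, a single extra or missing prediction moves the scores by tens of percentage points, so the figures say little about how the model would perform on unseen text.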

Visualize the custom model's predictions on the test data.

In [None]:
custom_model_test_data(test, custom_model_path)