
# Installing the Required Packages

In [None]:
# Install spaCy, the Python NLP library used here for NER
!pip install spacy

In [None]:
# Download spaCy's small English pipeline (en_core_web_sm); it is loaded later with spacy.load() to perform NER

!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 37.4 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
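
After the restart, it can help to confirm that the model package is visible to the interpreter before calling `spacy.load`. The following is a minimal sketch using only the standard library. The helper name `model_is_installed` is hypothetical, and the check assumes the model is installed as an ordinary Python package, as the wheel download above suggests.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no importable top-level package has this name
    return importlib.util.find_spec(package_name) is not None


if model_is_installed("en_core_web_sm"):
    print("en_core_web_sm is available")
else:
    print("en_core_web_sm not found; rerun the download cell and restart the runtime")
```

The check only looks the package up; it does not load the model, so it is cheap to run at the top of a notebook.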


# Loading a Pre-Trained Model

In [None]:
import spacy

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')

# Named entity categories to keep when filtering the entities below
ner_categories = ['PERSON', 'ORG', 'GPE', 'PRODUCT']

In [None]:
"""# Read text from a file
with open('document.txt', 'r', encoding='utf-8') as file:
    text = file.read()"""

In [None]:
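"""# For reading structured PDFs (requires the spacy-layout package)
from spacy_layout import spaCyLayout

# Use the loaded pipeline: a blank one has no NER component, so doc.ents would be empty
layout = spaCyLayout(nlp)

doc = layout("document.pdf")
doc = nlp(doc)  # run the pipeline components, including NER, on the extracted text

print(doc.text)
for ent in doc.ents:
    print(ent.text, ent.label_)"""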

# Tokenizing the Text

In [None]:
text = "Bitcoin (BTC) is a cryptocurrency (a virtual currency) designed to act as money and a form of payment outside the control of any one person, group, or entity. This removes the need for trusted third-party involvement (e.g., a mint or bank) in financial transactions. Bitcoin was introduced to the public in 2008 by an anonymous developer or group of developers using the name Satoshi Nakamoto. In August 2008, the domain name Bitcoin.org was registered. It was created by Satoshi Nakamoto and Martti Malmi, who worked with the anonymous Nakamoto to develop Bitcoin. In October 2008, Nakamoto announced to the cryptography mailing list at metzdowd.com: I've been working on a new electronic cash system thats fully peer-to-peer, with no trusted third party. The now-famous white paper published on Bitcoin.org, entitled Bitcoin: A Peer-to-Peer Electronic Cash System would become the Magna Carta for how Bitcoin operates today. On Jan. 3, 2009, the first Bitcoin block was mined. Called Block 0, it is also known as the genesis block and contains the text: The Times 03/Jan/2009 Chancellor on brink of second bailout for banks perhaps proof that the block was mined on or after that date. Bitcoin rewards are halved every 210,000 blocks. For example, the block reward was 50 new bitcoins in 2009. On May 11, 2020, the third halving occurred, bringing the reward for each block down to 6.25 bitcoins. The fourth halving occurred in April 2024 and lowered the reward to 3.125 bitcoins. The next halving should happen in mid-2028 and reduce the reward to 1.5625 BTC"

doc = nlp(text)

# Identifying and Classifying Named Entities

The loop below walks over `doc.ents`, the entities spaCy detected in the text, and compares each entity's `ent.label_` attribute against the categories listed in `ner_categories`. Every matching entity is appended to the `entities` list as a `(text, label)` tuple. The following cells use this list to print and sort the named entities and their categories.

In [None]:
entities = []
for ent in doc.ents:
  if ent.label_ in ner_categories:
    entities.append((ent.text, ent.label_))
entities

[('BTC', 'ORG'),
 ('Satoshi Nakamoto', 'PERSON'),
 ('Bitcoin.org', 'ORG'),
 ('Satoshi Nakamoto', 'PERSON'),
 ('Martti Malmi', 'PERSON'),
 ('Nakamoto', 'PERSON'),
 ('Bitcoin', 'PERSON'),
 ('Nakamoto', 'PERSON'),
 ('Bitcoin.org', 'ORG'),
 ('Bitcoin', 'PERSON'),
 ('the Magna Carta', 'ORG'),
 ('Bitcoin', 'PERSON'),
 ('Times', 'ORG'),
 ('BTC', 'ORG')]

In [None]:
# Print the named entities and their categories

for entity, category in entities:
  print(f"{entity}: {category}")

BTC: ORG
Satoshi Nakamoto: PERSON
Bitcoin.org: ORG
Satoshi Nakamoto: PERSON
Martti Malmi: PERSON
Nakamoto: PERSON
Bitcoin: PERSON
Nakamoto: PERSON
Bitcoin.org: ORG
Bitcoin: PERSON
the Magna Carta: ORG
Bitcoin: PERSON
Times: ORG
BTC: ORG


In [None]:
for entity, category in entities:
  print(f"{category}: {entity}")

ORG: BTC
PERSON: Satoshi Nakamoto
ORG: Bitcoin.org
PERSON: Satoshi Nakamoto
PERSON: Martti Malmi
PERSON: Nakamoto
PERSON: Bitcoin
PERSON: Nakamoto
ORG: Bitcoin.org
PERSON: Bitcoin
ORG: the Magna Carta
PERSON: Bitcoin
ORG: Times
ORG: BTC
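
Because the same entity appears several times, a grouped view can be easier to read than the flat listing. The sketch below collects the distinct entity texts under each category, keeping first-seen order. The helper name `group_by_category` is hypothetical, and the `entities` list is copied from the output above so the block runs on its own; in the notebook, the existing `entities` variable could be reused instead.

```python
from collections import defaultdict

# Copied from the notebook output above so this block is self-contained
entities = [
    ("BTC", "ORG"), ("Satoshi Nakamoto", "PERSON"), ("Bitcoin.org", "ORG"),
    ("Satoshi Nakamoto", "PERSON"), ("Martti Malmi", "PERSON"),
    ("Nakamoto", "PERSON"), ("Bitcoin", "PERSON"), ("Nakamoto", "PERSON"),
    ("Bitcoin.org", "ORG"), ("Bitcoin", "PERSON"), ("the Magna Carta", "ORG"),
    ("Bitcoin", "PERSON"), ("Times", "ORG"), ("BTC", "ORG"),
]


def group_by_category(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # skip repeats of the same entity text
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_category(entities)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

A list is used rather than a set so that the order in which entities first appear in the text is preserved.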


In [None]:
# Print the named entities sorted by entity text, then by category.
# The default sort is case-sensitive, so lowercase "the Magna Carta" comes after all capitalized entries.
sorted_entities = sorted(entities)

for entity, category in sorted_entities:
  print(f"{category}: {entity}")

ORG: BTC
ORG: BTC
PERSON: Bitcoin
PERSON: Bitcoin
PERSON: Bitcoin
ORG: Bitcoin.org
ORG: Bitcoin.org
PERSON: Martti Malmi
PERSON: Nakamoto
PERSON: Nakamoto
PERSON: Satoshi Nakamoto
PERSON: Satoshi Nakamoto
ORG: Times
ORG: the Magna Carta
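
The sorted listing repeats each entity once per mention and places "the Magna Carta" last because uppercase letters sort before lowercase ones. One possible variant, sketched below under the assumption that a deduplicated, case-insensitive listing is wanted, counts the mentions with `collections.Counter` and sorts on a case-folded key. The `entities` list is copied from the notebook output so the block runs on its own.

```python
from collections import Counter

# Copied from the notebook output above so this block is self-contained
entities = [
    ("BTC", "ORG"), ("Satoshi Nakamoto", "PERSON"), ("Bitcoin.org", "ORG"),
    ("Satoshi Nakamoto", "PERSON"), ("Martti Malmi", "PERSON"),
    ("Nakamoto", "PERSON"), ("Bitcoin", "PERSON"), ("Nakamoto", "PERSON"),
    ("Bitcoin.org", "ORG"), ("Bitcoin", "PERSON"), ("the Magna Carta", "ORG"),
    ("Bitcoin", "PERSON"), ("Times", "ORG"), ("BTC", "ORG"),
]

# Count how often each (text, label) pair occurs
counts = Counter(entities)

# Sort the distinct pairs case-insensitively by entity text, then by label
unique_sorted = sorted(counts, key=lambda pair: (pair[0].casefold(), pair[1]))

for text, label in unique_sorted:
    print(f"{label}: {text} (x{counts[(text, label)]})")
```

With this key, "the Magna Carta" is ordered alongside the other entries starting with "t" rather than after all capitalized ones.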


In [None]:
# Visualize the entities in the document
from spacy import displacy

displacy.render(doc, style='ent')