<a href="https://colab.research.google.com/github/Linamaho/LearningNLP/blob/main/First_examples_with_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Use this notebook to start familiarizing yourself with NLP and spaCy ✅
Some examples were taken from the course Advanced NLP with spaCy, available at [course.spacy.io/en](https://course.spacy.io/en)

These are some basic lines of code to try out spaCy.

# First, be sure to run the first cell to import spaCy and load the pipeline

In [None]:
# Import spaCy
import spacy

# Import small pipeline (English)
nlp = spacy.load("en_core_web_sm")


English example

In [1]:
# Create the English nlp object
nlp = spacy.blank("en")

# Process a text
doc = nlp("This is a sentence.")

# Print the document text
print(doc.text)

This is a sentence.


**Is your text or document in Spanish? No problem! Let's try spaCy in Spanish:**

In [None]:
# Import spaCy
import spacy

# Create the Spanish nlp object
nlp = spacy.blank("es")

# Process a text (this is Spanish for: "How are you?")
doc = nlp("¿Cómo estás?")

# Print the document text
print(doc.text)

¿Cómo estás?
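
To see what the blank pipeline actually did with the text, it can help to look at the individual tokens rather than the whole document. The sketch below assumes only that spaCy is installed; a blank pipeline contains just the tokenizer, so no trained model is needed.

```python
import spacy

# Blank Spanish pipeline: tokenizer only, no trained components
nlp = spacy.blank("es")
doc = nlp("¿Cómo estás?")

# Iterating over a Doc yields Token objects; collect their texts
tokens = [token.text for token in doc]
print(tokens)
print("Number of tokens:", len(doc))
```

Punctuation marks are usually split off as their own tokens, which is why the token count is higher than the number of words.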




---



---



## Selecting slices of text

In [None]:
# Import spaCy
import spacy

# Create the English nlp object
nlp = spacy.blank("en")

# Tokenize the text
doc = nlp("This is a sentence.")

# Print the first word in the sentence
print("The first word in the sentence is:", doc[0])


The first word in the sentence is: This


In [None]:
# Print the last word in the sentence
doc[3]

sentence
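
Indexing with a single number returns one token. A slice such as `doc[1:4]` returns a `Span` covering several tokens. The following minimal sketch reuses the same sentence; the variable names are only illustrative.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("This is a sentence.")

# A slice of the Doc is a Span; the end index is exclusive
first_two = doc[0:2]
middle = doc[1:4]
print("First two tokens:", first_two.text)
print("Tokens 1 to 3:", middle.text)

# Negative indices count from the end; the final token is the period
print("Last token:", doc[-1].text)
print("Last word:", doc[-2].text)
```

This also shows why `doc[3]` is the last word: the final token, `doc[4]`, is the period.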



---



---



## Let's move on to some exciting uses of **NLP**

To extract the percentages in a text:

In [None]:
# Import spaCy
import spacy

# Create the English nlp object
nlp = spacy.blank("en")

# Process the text
doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)

# Iterate over the tokens in the doc
for token in doc:
    # Check if the token resembles a number
    if token.like_num:
        # Only look ahead if there is a next token (a number may end the doc)
        if token.i + 1 < len(doc):
            # Get the next token in the document
            next_token = doc[token.i + 1]
            # Check if the next token's text equals "%"
            if next_token.text == "%":
                print("Percentage found:", token.text)

Percentage found: 60
Percentage found: 4
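
The loop above relies on `like_num`, one of several lexical attributes that every token carries even in a blank pipeline. As a small sketch, the cell below prints a few of them for the second sentence so you can see which tokens count as alphabetic, punctuation, or number-like.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Now less than 4% are.")

# Print index, text and a few lexical attributes for each token
for token in doc:
    print(token.i, token.text, token.is_alpha, token.is_punct, token.like_num)

# Keep only the tokens that resemble a number
number_like = [token.text for token in doc if token.like_num]
print("Number-like tokens:", number_like)
```

These attributes depend only on the token text, not on any trained model.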


### Predicting named entities

In [2]:
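# Import spaCy
import spacy

# Load the small English pipeline again: the blank pipeline used in the
# previous cells has no entity recognizer, so doc.ents would be empty
nlp = spacy.load("en_core_web_sm")

# *******Insert the text down here******
text = "Apple is looking at buying U.K. startup for $1 billion"

# Process the text
doc = nlp(text)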

# Iterate over the entities
for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)

Apple ORG
U.K. GPE
$1 billion MONEY
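
If a label such as `GPE` is unfamiliar, `spacy.explain` returns a short description of it. The sketch below looks up the three labels from the output above; it does not need a trained pipeline.

```python
import spacy

# Labels taken from the entity output above
labels = ["ORG", "GPE", "MONEY"]

# spacy.explain returns a short human-readable description for a label
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(label, "->", description)
```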




---



---



# Extracting brand names

Texts taken from https://davidsonbranding.com.au/the-inspiration-behind-the-brand-names-of-10-world-famous-companies/

## Brand name example 1: Amazon

In [None]:
# *******Insert the text down here******
# The parentheses join the adjacent string literals into one string;
# without them only the first line would be assigned to text
text = (
    "Amazon wasn't the digital giant's first name or its second name for that matter. Amazon founder Jeff Bezos, now the richest man in the world, changed the brand name three times in the first year of business "
    "before finally settling on the name for, what is now, one of the world's most iconic brands. The first name Jeff registered was Cadabra. However, Jeff very quickly went cold on the idea when his accountant miss heard "
    "him and thought the name was Cadava. Back to the drawing board Jeff's next naming brainwave was 'Relentless'. However, soon after registering the name and domains, Jeff's friends and colleagues confessed that they "
    "thought the name sounded too sinister. Strike two. In desperation, Jeff turned to the dictionary. He wanted the new name to start with the letter A so the company would appear first in a web search. It wasn't long "
    "before he stumbled upon the perfect name Amazon. It was the ideal metaphor for his new venture. The Amazon was exotic and different, just as he wanted his online store to be. It was also the largest river in the world, "
    "10 times larger than the next contender perfectly fitting the vision Jeff had for his business."
)

# Process the text
doc = nlp(text)

# Iterate over the entities
for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)

Amazon ORG
first ORDINAL
second ORDINAL
Amazon ORG
Jeff Bezos PERSON
three CARDINAL
the first year DATE

Only the entities from the opening sentence are listed above; the rest of the text yields more, and exact results can vary with the model version.


## Brand name example 2: Sony

In [None]:
# *******Insert the text down here******
# The parentheses join the adjacent string literals into one string;
# without them only the first line would be assigned to text
text = (
    "In 1946 businessman Masaru Ibuka established Tokyo Tsushin Kogyo with a vision of “establishing an ideal factory that stresses a spirit of freedom and open-mindedness that will, through technology, "
    "contribute to Japanese culture. After a decade of building a successful business, Masaru had aspirations to expand globally. During a business trip to the United States, he quickly discovered that Americans "
    "had trouble pronouncing his company's name. To be a successful international brand, Masaru realised he had to change the company name. His first thought was to use the acronym 'TTK', however, this name was "
    "being used by the Tokyo Rail. His next thought was to take the first two letters from the company's current name to create an abbreviation 'Totsuko'  (Tokyo Tsushin Kogyo), however, he discovered that westerners "
    "also had trouble pronouncing it. He settled on the name Tokyo Teletech, however on a subsequent trip to the United States, discovered another company using this name. In 1958 the name 'Sony' was chosen inspired "
    "by the Latin word 'sonus' meaning sonic and sound. The name had a second meaning sonny, a Japanese slang term describing a smart and presentable young man. Sony is now ranked 47th in the World's Most Valuable Brands "
    "list, proving that persistence pays off."
)

# Process the text
doc = nlp(text)

# Iterate over the entities
for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)

1946 DATE
Masaru Ibuka PERSON
Tokyo GPE
Tsushin Kogyo PERSON
Japanese NORP
a decade DATE
Masaru PERSON
the United States GPE
Americans NORP
Masaru PERSON
first ORDINAL
TTK ORG
the Tokyo Rail ORG
first ORDINAL
two CARDINAL
Tsushin Kogyo PERSON
Tokyo Teletech PERSON
the United States GPE
1958 DATE
Sony ORG
Latin NORP
second ORDINAL
Japanese NORP
Sony ORG
47th ORDINAL
the World’s Most Valuable Brands ORG
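
The entity lists above mix brands with dates, people and ordinals. One way to narrow them down to brand candidates is to keep only `ORG` entities. The sketch below is an assumption about how that could be done: `org_names` is a hypothetical helper, not part of spaCy, and the entities in the demo are set by hand so the cell runs without a trained model.

```python
import spacy
from spacy.tokens import Span


def org_names(doc):
    """Return the distinct ORG entity texts in document order (hypothetical helper)."""
    names = []
    for ent in doc.ents:
        if ent.label_ == "ORG" and ent.text not in names:
            names.append(ent.text)
    return names


# Demo with hand-labelled entities on a blank pipeline (illustrative only)
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("Sony was founded by Masaru Ibuka. Sony is now a global brand.")
demo_doc.ents = [
    Span(demo_doc, 0, 1, label="ORG"),     # "Sony"
    Span(demo_doc, 4, 6, label="PERSON"),  # "Masaru Ibuka"
    Span(demo_doc, 7, 8, label="ORG"),     # "Sony" again
]
print(org_names(demo_doc))
```

With the notebook's trained pipeline you could call `org_names(doc)` on the documents processed above. The filter is only as good as the predicted labels: in the Sony output, for example, "Tokyo Teletech" was labelled `PERSON` and would be missed.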


## Finding words and numbers in text

In [None]:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

doc = nlp(
    "After making the iOS update you won't notice a radical system-wide "
    "redesign: nothing like the aesthetic upheaval we got with iOS 7. Most of "
    "iOS 11's furniture remains the same as in iOS 10. But you will discover "
    "some tweaks once you delve a little deeper."
)

# Write a pattern for full iOS versions ("iOS 7", "iOS 11", "iOS 10")
pattern = [{"TEXT": "iOS"}, {"IS_DIGIT": True}]

# Add the pattern to the matcher and apply the matcher to the doc
matcher.add("IOS_VERSION_PATTERN", [pattern])
matches = matcher(doc)
print("Total matches found:", len(matches))

# Iterate over the matches and print the span text
for match_id, start, end in matches:
    print("Match found:", doc[start:end].text)

Total matches found: 3
Match found: iOS 7
Match found: iOS 11
Match found: iOS 10
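
The same matcher approach can replace the manual loop from the percentage example earlier in this notebook. The sketch below uses a two-token pattern: a number-like token followed by `%`. The pattern name `PERCENTAGE` is arbitrary, and a blank pipeline is enough because both attributes are lexical.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# A number-like token followed directly by a percent sign
pattern = [{"LIKE_NUM": True}, {"TEXT": "%"}]
matcher.add("PERCENTAGE", [pattern])

doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)

# Collect the matched spans as text
percentages = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(percentages)
```

Because the pattern requires both tokens, the year 1990 is skipped and no bounds check is needed at the end of the document.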


## Finding words and phrases in text

Example 1

In [None]:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

doc = nlp(
    "i downloaded Fortnite on my laptop and can't open the game at all. Help? "
    "so when I was downloading Minecraft, I got the Windows version where it "
    "is the '.zip' folder and I used the default program to unpack it... do "
    "I also need to download Winzip?"
)

# Write a pattern that matches a form of "download" plus proper noun
pattern = [{"LEMMA": "download"}, {"POS": "PROPN"}]

# Add the pattern to the matcher and apply the matcher to the doc
matcher.add("DOWNLOAD_THINGS_PATTERN", [pattern])
matches = matcher(doc)
print("Total matches found:", len(matches))

# Iterate over the matches and print the span text
for match_id, start, end in matches:
    print("Match found:", doc[start:end].text)

Total matches found: 3
Match found: downloaded Fortnite
Match found: downloading Minecraft
Match found: download Winzip
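
`LEMMA` and `POS` are only filled in by a trained pipeline, so the pattern above finds nothing with `spacy.blank("en")`. As a rough fallback, under the assumption that product names are capitalized, the sketch below lists the verb forms explicitly and uses `IS_TITLE` in place of the part-of-speech check. It is a cruder heuristic than the original pattern, not an equivalent.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer and lexical attributes only
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# List the verb forms by hand and require a title-cased next token
pattern = [
    {"LOWER": {"IN": ["download", "downloads", "downloaded", "downloading"]}},
    {"IS_TITLE": True},
]
matcher.add("DOWNLOAD_THINGS_FALLBACK", [pattern])

doc = nlp(
    "i downloaded Fortnite on my laptop and can't open the game at all. Help? "
    "so when I was downloading Minecraft, I got the Windows version where it "
    "is the '.zip' folder and I used the default program to unpack it... do "
    "I also need to download Winzip?"
)

# Sort by start index so the matches appear in document order
matches = sorted(matcher(doc), key=lambda m: m[1])
found = [doc[start:end].text for match_id, start, end in matches]
print(found)
```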
