## Lecture Housekeeping:

- Disrespectful language is prohibited in questions. This is a supportive learning environment for all, so please engage accordingly.
    - Please review the Code of Conduct (in the Student Undertaking Agreement) if unsure.
- No question is daft or silly - ask them!
- There are Q&A sessions midway and at the end of the session, should you wish to ask any follow-up questions.
- Should you have any questions after the lecture, please schedule a mentor session.
- For all non-academic questions, please submit a query: [www.hyperiondev.com/support](www.hyperiondev.com/support)


#### Learning objectives

- Define NLP
- Understand the process of NLP
- Use SpaCy for text classification


##### What is NLP?

- Example: In Iron Man, Tony Stark walks into his garage and says "Wake up, daddy's home." An AI named JARVIS responds, "Welcome home, sir."
- To do this, JARVIS first converts sound waves into words.
- It then interprets the words: “Wake up, daddy’s home” becomes a command to boot up and reply with “Welcome home, sir.”

##### Natural Languages

- Natural language = human language (not programming language)
- In the JARVIS example, understanding Tony requires knowledge of:
    - Phonetics and Phonology - linguistic sounds.
    - Morphology - meaningful components of words.
    - Syntax - structural relationships between words.
    - Semantics - knowledge of meaning.
    - Pragmatics - relationship of meaning to goals and intentions of speaker.
    - Discourse - linguistic units larger than a single utterance.
- Lots of stuff for a computer to know!


##### Ambiguity

- Scenario - Tony is telling JARVIS that he was annoyed at a female worker, and threw a piece of paper. He then proceeds to say “I made her duck.”
- How does JARVIS interpret this?
    - I cooked a duck for her.
    - I cooked a duck belonging to her.
    - I created the duck that she owns.
    - I caused her to quickly lower her head.
    - I turned her into a duck (possibly with the help of Dr. Strange?).
- English (and natural languages generally) is weird and ambiguous. This makes it difficult for computers to navigate the way we speak.


##### Parts of Speech (POS)

- To know what “duck” means, we use POS tagging.
- We can identify a word's part of speech from its position in the sentence.
- E.g. “The old man the boat”
    - The - determiner
    - Old - adjective/noun
    - Man - noun/verb
    - The - determiner
    - Boat - noun
- From their positions in the sentence, you can see that “old” is a noun here (“the old” = old people) and “man” is a verb (“to man” = to crew).

##### How does NLP Work?

- Goal is to simulate human intelligence.
- Machine learning techniques are used to train a model to understand human language.
- Probability-based - therefore not always 100% accurate, but close enough, and getting closer every day!
![AI.png](attachment:AI.png)

##### Solving POS Tagging

- Give a program a big set of tagged words (supervised training), then ask it to apply the patterns it found to an unseen sentence (test set).
- POS tagging tries to give each word its correct POS tag. Once we know the POS of each word, we can parse the sentence.
- Parsing - putting the sentence together in the right way so that it can be understood.
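
The supervised idea above can be sketched with a toy "most frequent tag" baseline in plain Python. This is only an illustration, not how SpaCy's tagger works: the training sentences, the `train` and `tag` helpers, and the `NOUN` default for unseen words are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged training set (hypothetical data, for illustration only).
TRAINING_DATA = [
    [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("boat", "NOUN"), ("sinks", "VERB")],
    [("a", "DET"), ("man", "NOUN"), ("sees", "VERB"), ("the", "DET"), ("duck", "NOUN")],
    [("sailors", "NOUN"), ("man", "VERB"), ("the", "DET"), ("boat", "NOUN")],
]


def train(tagged_sentences):
    """Count how often each word appears with each tag, keep the most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, sentence, default="NOUN"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [(word, model.get(word.lower(), default)) for word in sentence.split()]


model = train(TRAINING_DATA)
result = tag(model, "The old man the boat")
print(result)
# [('The', 'DET'), ('old', 'ADJ'), ('man', 'NOUN'), ('the', 'DET'), ('boat', 'NOUN')]
```

Note that this baseline tags "man" as a noun, because that is its most frequent tag in the training data. It ignores position in the sentence entirely, which is why real taggers also learn from the surrounding words.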


##### SpaCy

- A software package that already comes with pretrained models for all of this!
- We won’t need to build models or train them on data ourselves.
- We give it a sentence, it gives us linguistic data about the sentence.


##### Using SpaCy

- `pip install spacy`
- `python3 -m spacy download en_core_web_sm`

In [None]:
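import spacy

# Load the small English pipeline downloaded above
nlp = spacy.load('en_core_web_sm')

# Process a sentence and print each word with its part-of-speech tag
doc = nlp("this is a test sentence")
print([(w.text, w.pos_) for w in doc])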

##### Tokenization
  
Basic

In [None]:
nlp = spacy.load('en_core_web_sm')
sample = "Today we get the chance to learn about NLP and build upon our knowledge of text classification."
doc = nlp(sample)

# Naive tokenization: split the raw text on whitespace
print(doc.text.split())

Using SpaCy

In [None]:
print([token.orth_ for token in doc])

In [None]:
# orth_ is the token's text as a string; orth is its integer hash ID
print([(token, token.orth_, token.orth) for token in doc])
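
To make the difference between the two approaches easier to see, the sketch below compares the last tokens produced by each. It uses `spacy.blank("en")`, which builds an English pipeline containing only the tokenizer, so it does not need the downloaded model. The variable names are chosen for this example only.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

sample = "Today we get the chance to learn about NLP and build upon our knowledge of text classification."

whitespace_tokens = sample.split()
spacy_tokens = [token.text for token in nlp(sample)]

# Whitespace splitting leaves the full stop glued to the last word.
print(whitespace_tokens[-1])

# SpaCy separates the punctuation into its own token.
print(spacy_tokens[-2:])
```

Whitespace splitting treats `classification.` as one token, while SpaCy yields `classification` and `.` separately. This matters later, for example when filtering punctuation or counting words.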

Remove punctuation and spaces

In [None]:
tokens = [token.orth_ for token in doc if not (token.is_punct or token.is_space)]
print(tokens)

Find stop words

In [None]:
for word in doc:
    if word.is_stop:
        print(word)

Lemmatisation

In [None]:
speak = "spoke speaking speak"
nlp_practice = nlp(speak)
print([word.lemma_ for word in nlp_practice])

Named Entity Recognition

In [None]:
paragraph = "The Cooper's Hill Cheese-Rolling and Wake is an annual event held on the Spring Bank Holiday at Cooper's Hill, near Gloucester in England. Participants race down the 200-yard (180 m) long hill chasing a wheel of Double Gloucester cheese. The event has a long tradition, held by the people of the local village of Brockworth, but now people from all over the world take part. The Guardian called it a 'world-famous event', with winners coming from Canada, Belgium, Egypt, Australia, New Zealand and the United States. The most recent event was held on 29 May 2023."
nlp_paragraph = nlp(paragraph)

print([(ent, ent.label_, ent.label) for ent in nlp_paragraph.ents])

In [None]:
# spacy.explain returns a short description of a label
person_meaning = spacy.explain("PERSON")
print(f"PERSON: {person_meaning}")

# Questions and Answers
Questions around Text classification

# Thank you for joining!

## Please remember to:
- Take regular breaks.
- Stay hydrated.
- Avoid prolonged screen time.
- Don't slouch!
- Remember to have fun :)
