<a href="https://colab.research.google.com/github/Ben-Ogega/Machine-Learning-Projects/blob/master/NLP_using_Spacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Sentiment Analysis in Python
### In this notebook we will do some Natural Language Processing (NLP) using Python, NLTK, and spaCy



#### Implement **NLP in spaCy**

1.   Customize and extend built-in functionalities in spaCy
2.   Perform basic statistical analysis on a text
3.   Create a **pipeline** to process **unstructured text**
4.   Parse a sentence and extract meaningful insights from it

This notebook follows this [Real Python tutorial](https://realpython.com/natural-language-processing-spacy-python/).

### NLP is a subfield of artificial intelligence concerned with enabling computers to understand human language.
> NLP involves **Analyzing, Quantifying, Understanding, and Deriving** meaning from natural languages. Read more [here](https://realpython.com/natural-language-processing-spacy-python/)

Well-known NLP models include:

1.   BERT from Google
2.   The GPT family from OpenAI





### NLP helps you extract insights from unstructured text and has many use cases, such as:

> **Automatic summarization**

> **Named-entity recognition**

> **Question answering systems**

> **Sentiment analysis**

## Installation of spaCy

In [13]:
!pip install spacy
!python -m spacy download en_core_web_sm



## Step 0. Load the spaCy Model and Read in Data

### The small English model is named **en_core_web_sm**. Because the models are quite large, they are **installed separately** from spaCy itself: *bundling every language in one package would make the download too large.*

Import spaCy and load the model

In [14]:
import spacy
nlp = spacy.load("en_core_web_sm")
nlp  # nlp is a callable spaCy Language object

<spacy.lang.en.English at 0x79d01b1c7f10>
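
Calling `nlp` on a string runs the text through every component in the pipeline. The sketch below shows one way to inspect that pipeline. It assumes spaCy v3 and uses `spacy.blank("en")` (tokenizer only) instead of `en_core_web_sm` so that it runs without a model download; the name `blank_nlp` is illustrative.

```python
import spacy

# Assumption: spaCy v3. A blank English pipeline contains only the tokenizer,
# so no trained model has to be downloaded for this sketch.
blank_nlp = spacy.blank("en")
names_before = list(blank_nlp.pipe_names)
print(names_before)  # no pipeline components yet

# Add the built-in rule-based sentence splitter as a pipeline component.
blank_nlp.add_pipe("sentencizer")
names_after = list(blank_nlp.pipe_names)
print(names_after)

# Calling the Language object on text returns a Doc.
pipeline_doc = blank_nlp("spaCy processes text.")
print(type(pipeline_doc).__name__)
```

With `en_core_web_sm` loaded, `nlp.pipe_names` would list the trained components of that model instead.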

In [15]:
# To start processing the input, construct a Doc object.
# A Doc is a sequence of Token objects, each representing a lexical token.
# A token is an individual unit of text, i.e. a word, punctuation mark, symbol, or whitespace.

introduction_doc = nlp("This tutorial is about Natural Language Processing in spaCy.")
type(introduction_doc)


spacy.tokens.doc.Doc

In [16]:
# Generate tokens from the Doc
tokens = [token.text for token in introduction_doc]
tokens[0]

'This'
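
Each `Token` carries more than its text. As a rough sketch, the loop below prints a few token attributes (`i`, `idx`, `is_alpha`, `is_punct`) and then filters out punctuation. It assumes spaCy v3 and uses a blank English pipeline, named `token_nlp` here for illustration, since these attributes only need the tokenizer.

```python
import spacy

# Assumption: spaCy v3. Tokenization does not require a trained model.
token_nlp = spacy.blank("en")
token_doc = token_nlp("This tutorial is about Natural Language Processing in spaCy.")

for token in token_doc:
    # token.i is the position in the Doc, token.idx the character offset in the text
    print(f"{token.i:>2} {token.idx:>3} {token.text:<12} "
          f"alpha={token.is_alpha} punct={token.is_punct}")

# Keep only the tokens that are not punctuation
words = [token.text for token in token_doc if not token.is_punct]
print(len(token_doc), len(words))
```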

### We can also read from a file

In [17]:
# import pathlib
# file_name = "introduction.txt"
# introduction_doc = nlp(pathlib.Path(file_name).read_text(encoding="utf-8"))
# print ([token.text for token in introduction_doc])
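
The commented-out cell above expects an `introduction.txt` file to exist. A self-contained variant, offered only as a sketch, writes a small sample file to a temporary directory first and then reads it back. The file contents and the name `file_nlp` are assumptions for illustration, and a blank spaCy v3 pipeline is used so that no model download is required.

```python
import pathlib
import tempfile

import spacy

# Assumption: spaCy v3; a blank pipeline is enough to tokenize the file contents.
file_nlp = spacy.blank("en")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file standing in for introduction.txt
    file_path = pathlib.Path(tmp_dir) / "introduction.txt"
    file_path.write_text(
        "This tutorial is about Natural Language Processing in spaCy.",
        encoding="utf-8",
    )
    # Read the whole file as one string and process it
    file_doc = file_nlp(file_path.read_text(encoding="utf-8"))

file_tokens = [token.text for token in file_doc]
print(file_tokens)
```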

## Sentence Detection
Sentence detection is the process of locating where sentences **start** and **end** in a given text.

This allows us **to divide a text into linguistically meaningful units.**

In [18]:
about_text = (
    "Gus Proto is a Python developer currently"
    " working for a London-based Fintech"
    " company. He is interested in learning"
    " Natural Language Processing."
)
about_doc = nlp(about_text)
# The .sents property yields the sentences of the Doc as Span objects
sentences = list(about_doc.sents)
len(sentences)


2

In [19]:
# Print the first five tokens of each sentence
for sentence in sentences:
    print(f"{sentence[:5]}...")
    # print(type(sentence))  # each sentence is a spacy.tokens.span.Span

Gus Proto is a Python...
He is interested in learning...
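
Since sentence detection is about locating where sentences start and end, it can help to look at the character offsets directly. The sketch below assumes spaCy v3 and swaps in the built-in rule-based `sentencizer` on a blank pipeline (named `rule_nlp` for illustration) in place of the parser-based boundaries of `en_core_web_sm`. On text with irregular punctuation the two approaches may disagree.

```python
import spacy

# Assumption: spaCy v3. "sentencizer" splits on sentence-final punctuation;
# it stands in here for the parser-based boundaries of en_core_web_sm.
rule_nlp = spacy.blank("en")
rule_nlp.add_pipe("sentencizer")

sample_text = (
    "Gus Proto is a Python developer currently"
    " working for a London-based Fintech"
    " company. He is interested in learning"
    " Natural Language Processing."
)
sample_doc = rule_nlp(sample_text)

# Each sentence is a Span with character offsets into the original text
boundaries = [(sent.start_char, sent.end_char) for sent in sample_doc.sents]
for start, end in boundaries:
    print(start, end, repr(sample_text[start:end][:25]))
```

Slicing the original string with these offsets recovers each sentence, which is useful when the results need to be mapped back onto the raw text.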


Import Python Modules and Libraries

### Reading Data
