# Natural Language Processing (NLP)


- Natural Language Processing (NLP) is a field at the intersection of artificial intelligence, linguistics, and computer science.
- It deals with the interaction between computers and humans through natural language.
- What follows is a basic introduction to NLP.

**Definition:** NLP is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language in a way that is both meaningful and useful.

## NLP Techniques
- Stemming
- Lemmatization
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Bag of Words
- Word2Vec
- Word Embeddings
- Skip-gram
- CBOW (Continuous Bag of Words)
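Two of the techniques listed above, Bag of Words and TF-IDF, can be illustrated with the standard library alone. The following is a minimal sketch, not a reference implementation: the toy corpus, the whitespace tokenization, and the helper name `tf_idf` are assumptions made for illustration, and the formula is the plain unsmoothed variant, so library implementations that apply smoothing will give different numbers.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration), tokenized by whitespace.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in docs]

# Bag of Words: one word-count mapping per document, ignoring word order.
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: the number of documents each word appears in.
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

def tf_idf(word, doc_index):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency).

    Assumes `word` occurs somewhere in the corpus (otherwise doc_freq is 0).
    """
    tf = bags[doc_index][word] / len(tokenized[doc_index])
    idf = math.log(len(docs) / doc_freq[word])
    return tf * idf

print(bags[0]["the"])              # 2
print(round(tf_idf("the", 0), 3))  # 0.135
print(round(tf_idf("cat", 0), 3))  # 0.183
```

Although "the" occurs twice in the first document and "cat" only once, "cat" scores higher because it appears in fewer documents. That is the weighting TF-IDF adds on top of raw Bag of Words counts.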

## Applications of NLP

- Sentiment Analysis
- Machine Translation
- Text Summarization
- Question Answering
- Healthcare

## Advantages of NLP

- Automation
- Insight Extraction
- Personalization
- Language Translation

## Disadvantages of NLP

- Data Quality and Bias
- Privacy Concerns
- Lack of Domain Specificity


# NLP Libraries Installation

## 1. NLTK (Natural Language Toolkit)
- NLTK is a comprehensive library for building Python programs to work with human language data.

In [1]:
!pip install nltk



In [2]:
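import nltk
from nltk.tokenize import word_tokenize

# Download the Punkt tokenizer models that word_tokenize relies on.
# Newer NLTK releases (3.9+) may ask for 'punkt_tab' instead.
nltk.download('punkt')

text = "Hello, world!"
tokens = word_tokenize(text)  # split into word and punctuation tokens
print(tokens)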


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


['Hello', ',', 'world', '!']
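To make the tokenization step less of a black box, here is a rough standard-library approximation. It is only a sketch: the helper name `simple_tokenize` and the regular expression are assumptions for illustration and are not how NLTK works internally. It happens to agree with the output above for this sentence, but it diverges on harder input such as contractions.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a word; [^\w\s] matches one non-space, non-word character.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
print(simple_tokenize("don't"))          # ['don', "'", 't']
```

The second call shows the limitation: a trained or rule-based tokenizer typically handles contractions and abbreviations more sensibly than a single regular expression.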


## 2. spaCy
- spaCy is an open-source library for advanced NLP in Python.

In [3]:
!pip install spacy


In [4]:
!python -m spacy download en_core_web_sm



In [5]:
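import spacy

# Load the small English pipeline downloaded in the previous cell.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hello, world!")

# Print each token with its part-of-speech tag and dependency label.
for token in doc:
    print(token.text, token.pos_, token.dep_)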


Hello PROPN ROOT
, PUNCT punct
world NOUN appos
! PUNCT punct


## 3. Gensim
- Gensim is a library for topic modeling, document similarity analysis, and word-embedding models such as Word2Vec.
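Document similarity is commonly measured as the cosine of the angle between two document vectors. The sketch below uses only the standard library and plain word counts. The function name `cosine_similarity` and the lowercase, whitespace-based tokenization are assumptions for illustration; this is not Gensim's API, which works with its own corpus and model objects.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (1.0 = same direction)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

print(round(cosine_similarity("hello world", "hello there world"), 3))  # 0.816
print(cosine_similarity("hello world", "my name is Vamsi"))             # 0.0
```

Texts that share most of their words score close to 1, and texts with no words in common score 0.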

In [6]:
!pip install gensim



In [8]:
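from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [["hello", "world"], ["my", "name", "is", "Vamsi"]]

# vector_size: embedding dimensions; window: context size on each side;
# min_count=1 keeps every word; sg keeps its default (0 = CBOW, 1 = skip-gram).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
vector = model.wv['hello']  # 100-dimensional vector for "hello"
print(vector)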


[-8.7274825e-03  2.1301615e-03 -8.7354420e-04 -9.3190884e-03
 -9.4281426e-03 -1.4107180e-03  4.4324086e-03  3.7040710e-03
 -6.4986930e-03 -6.8730675e-03 -4.9994122e-03 -2.2868442e-03
 -7.2502876e-03 -9.6033178e-03 -2.7436293e-03 -8.3628409e-03
 -6.0388758e-03 -5.6709289e-03 -2.3441375e-03 -1.7069972e-03
 -8.9569986e-03 -7.3519943e-04  8.1525063e-03  7.6904297e-03
 -7.2061159e-03 -3.6668312e-03  3.1185520e-03 -9.5707225e-03
  1.4764392e-03  6.5244664e-03  5.7464195e-03 -8.7630618e-03
 -4.5171441e-03 -8.1401607e-03  4.5956374e-05  9.2636338e-03
  5.9733056e-03  5.0673080e-03  5.0610625e-03 -3.2429171e-03
  9.5521836e-03 -7.3564244e-03 -7.2703874e-03 -2.2653891e-03
 -7.7856064e-04 -3.2161034e-03 -5.9258583e-04  7.4888230e-03
 -6.9751858e-04 -1.6249407e-03  2.7443992e-03 -8.3591007e-03
  7.8558037e-03  8.5361041e-03 -9.5840869e-03  2.4462664e-03
  9.9049713e-03 -7.6658037e-03 -6.9669187e-03 -7.7365171e-03
  8.3959233e-03 -6.8133592e-04  9.1444086e-03 -8.1582209e-03
  3.7430846e-03  2.63504