# **Method 1: NLTK (Natural Language Toolkit)**

NLTK can be used to remove stop words from text in Python. It ships stop-word lists for many different languages.

The following code shows how to remove stop words with this package.


In [None]:
pip install nltk



In [None]:
import nltk


In [None]:
from nltk.corpus import stopwords


In [None]:
nltk.download('stopwords')
nltk.download('punkt')
from nltk.tokenize import word_tokenize

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
text = "What is artificial intelligence?"

tokens = word_tokenize(text)
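To show what tokenization produces, here is a simplified stand-in for word_tokenize, assuming a plain regular expression is enough for short English questions: runs of word characters become tokens and punctuation is split off on its own. NLTK's real tokenizer handles many more cases (contractions, quotes, abbreviations), so `simple_tokenize` is a hypothetical helper for illustration only.

```python
import re

def simple_tokenize(text):
    """Hypothetical, simplified tokenizer: each run of word characters
    is one token, and every other non-space character (e.g. '?')
    becomes a token of its own."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens_demo = simple_tokenize("What is artificial intelligence?")
print(tokens_demo)  # ['What', 'is', 'artificial', 'intelligence', '?']
```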

In [None]:
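# Build the stop-word set once instead of rebuilding the list for every token.
# NLTK's list is lowercase, so this comparison is case-sensitive.
stop_words_en = set(stopwords.words('english'))
tokens_without_sw = [word for word in tokens if word not in stop_words_en]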

In [None]:
print(tokens_without_sw)

['What', 'artificial', 'intelligence', '?']
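"What" survives because NLTK's English list contains "what" only in lowercase. A minimal sketch of case-insensitive filtering follows; it assumes a tiny hand-written `STOP_WORDS_DEMO` set standing in for set(stopwords.words('english')), so it runs without downloading any corpus.

```python
# Hypothetical stand-in for set(stopwords.words('english'));
# only a handful of entries, enough for this example.
STOP_WORDS_DEMO = {"what", "is", "the", "a", "an", "of"}

def remove_stop_words(tokens, stop_words):
    """Drop tokens whose lowercase form is in stop_words.

    stop_words should be a set: membership tests are O(1) on average,
    whereas a list is scanned again for every token.
    """
    return [tok for tok in tokens if tok.lower() not in stop_words]

tokens_demo = ["What", "is", "artificial", "intelligence", "?"]
print(remove_stop_words(tokens_demo, STOP_WORDS_DEMO))
# ['artificial', 'intelligence', '?']
```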



In [None]:
# Rebuild a single string from the filtered tokens
cleaned_text = ' '.join(tokens_without_sw)
print(cleaned_text)

What artificial intelligence ?
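Joining with spaces leaves a space before the question mark. A simple heuristic, assumed here to be good enough for short English sentences, deletes whitespace before common closing punctuation; `join_tokens` is a hypothetical helper, and NLTK also provides TreebankWordDetokenizer for more thorough detokenization.

```python
import re

def join_tokens(tokens):
    """Hypothetical helper: join tokens with spaces, then delete the
    whitespace that ends up before closing punctuation."""
    text = " ".join(tokens)
    return re.sub(r"\s+([?.!,;:])", r"\1", text)

print(join_tokens(["What", "artificial", "intelligence", "?"]))
# What artificial intelligence?
```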


# **Method 2: stop-words**

The stop-words package can also remove stop words from text in Python. It includes stop-word lists for many languages, such as English, Danish, French, Spanish and more.

In [None]:
pip install stop_words

Successfully installed stop_words-2018.7.23


In [None]:
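from stop_words import get_stop_words

# Filter the tokens against the package's French stop-word list
stop_words_fr = set(get_stop_words('french'))
tokens_without_sw_fr = [word for word in tokens if word not in stop_words_fr]
print(tokens_without_sw_fr)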

["C'est", 'quoi', "l'intelligence", 'artificielle', '?']
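In French, elided articles and pronouns are glued to the next word with an apostrophe (l'intelligence, c'est), so whole-token matching against a stop-word list cannot remove them. A minimal sketch, assuming a hand-written `FRENCH_STOP_WORDS_DEMO` set (not the stop_words package's actual list) and a hypothetical `split_elision` helper that covers the common elided forms: each token is split at the apostrophe before filtering.

```python
import re

# Common French elided forms: c', d', j', l', m', n', s', t', qu'
ELISION = re.compile(r"^(qu|[cdjlmnst])['’](.+)$", re.IGNORECASE)

# Hypothetical, deliberately tiny stop-word set for this example only.
FRENCH_STOP_WORDS_DEMO = {"c", "est", "l", "quoi", "qu", "le", "la", "de"}

def split_elision(token):
    """Split "l'intelligence" into ["l", "intelligence"];
    tokens without an elided prefix are returned unchanged."""
    match = ELISION.match(token)
    return [match.group(1), match.group(2)] if match else [token]

def remove_french_stop_words(tokens, stop_words):
    """Split elisions first, then drop parts found in stop_words."""
    parts = [part for tok in tokens for part in split_elision(tok)]
    return [p for p in parts if p.lower() not in stop_words]

tokens_fr = ["C'est", "quoi", "l'intelligence", "artificielle", "?"]
print(remove_french_stop_words(tokens_fr, FRENCH_STOP_WORDS_DEMO))
# ['intelligence', 'artificielle', '?']
```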


# **Method 3: remove_stpwrds()**

The remove_stpwrds() method of the textcleaner library removes stop words from text in Python.

In [None]:
pip install textcleaner

Successfully installed textcleaner-0.4.26


In [None]:
import textcleaner as tc
data = tc.document(tokens)
print(data.remove_stpwrds())

C'est
quoi
l'intelligence
artificielle
?


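The gensim library also provides a remove_stopwords function (in gensim.parsing.preprocessing), which filters words against gensim's built-in English stop-word list.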

In [None]:
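import re
from gensim.parsing.preprocessing import remove_stopwords

def clean_sentence(sentence, stopwords=False):
  # Lowercase and trim surrounding whitespace
  sentence = sentence.lower().strip()
  # Keep only ASCII letters, digits and whitespace
  # (this also drops apostrophes and accented letters such as "é")
  sentence = re.sub(r'[^a-z0-9\s]', '', sentence)
  if stopwords:
    # Remove words found in gensim's English stop-word list
    sentence = remove_stopwords(sentence)
  return sentence

def get_cleaned_sentences(tokens, stopwords=False):
  # Clean each item independently; an item that is a stop word
  # or pure punctuation comes back as an empty string
  cleaned_sentences = []
  for row in tokens:
    cleaned = clean_sentence(row, stopwords)
    cleaned_sentences.append(cleaned)
  return cleaned_sentences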

In [None]:
cleaned_sentences = get_cleaned_sentences(tokens, stopwords=True)
print(cleaned_sentences)

['', 'chatbot', '', '', 'automated', 'program', '', 'interacts', '', 'customers', '', '', 'human', '', '', 'costs', 'little', '', '', '', 'engage', '', '', 'chatbots', 'attend', '', 'customers', '', '', 'times', '', '', 'day', '', 'week', '', '', '', 'limited', '', 'time', '', '', 'physical', 'location', '', '', 'makes', '', 'implementation', 'appealing', '', '', 'lot', '', 'businesses', '', '', '', '', '', 'manpower', '', 'financial', 'resources', '', '', 'employees', 'working', '', '', 'clock', '', 'chatbots', '', 'convenient', '', 'providing', 'customer', 'service', '', 'support', '24', 'hours', '', 'day', '', '7', 'days', '', 'week', '', '', '', 'free', '', 'phone', 'lines', '', '', 'far', '', 'expensive', '', '', 'long', 'run', '', 'hiring', 'people', '', 'perform', 'support', '', '', 'ai', '', 'natural', 'language', 'processing', '', 'chatbots', '', '', 'better', '', 'understanding', '', 'customers', 'want', '', 'providing', '', 'help', '', 'need', '', 'companies', '', 'like', 'c
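Because each token is cleaned on its own, every stop word and every punctuation-only token comes back as an empty string. A minimal sketch of the same clean-then-filter pipeline that skips those empties, assuming a small hand-written `STOP_WORDS_DEMO` set in place of gensim's built-in list so it runs without gensim; `clean_tokens` is a hypothetical name.

```python
import re

# Hypothetical stand-in for gensim's English stop-word list.
STOP_WORDS_DEMO = {"a", "an", "is", "the", "to", "with", "what"}

def clean_tokens(tokens, stop_words):
    """Lowercase, strip non-alphanumeric characters, drop stop words,
    and skip anything that ends up empty."""
    cleaned = []
    for tok in tokens:
        word = re.sub(r"[^a-z0-9]", "", tok.lower())
        if word and word not in stop_words:
            cleaned.append(word)
    return cleaned

print(clean_tokens(["A", "chatbot", "is", "an", "automated", "program", "."], STOP_WORDS_DEMO))
# ['chatbot', 'automated', 'program']
```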

# **Named entity recognition**
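
spaCy's pretrained English pipeline en_core_web_sm labels named entities (people, organizations, places, dates and so on), and displacy highlights them inline in the notebook.
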
In [None]:
import spacy

In [None]:
# Requires the model to be installed: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

In [None]:
doc = nlp(text)

In [None]:
from spacy import displacy

# Render the entities of the already-parsed doc inline in the notebook
displacy.render(doc, style='ent', jupyter=True)
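displacy highlights each entity from its character offsets and label, which spaCy exposes on every span in `doc.ents` as `start_char`, `end_char` and `label_`. The sketch below mimics that inline rendering in plain text, assuming hand-written `(start_char, end_char, label)` tuples in place of a real `doc.ents` (the sentence, offsets and labels are illustrative, not spaCy output); `mark_entities` is a hypothetical helper.

```python
def mark_entities(text, spans):
    """Return text with each (start, end, label) span rewritten as
    [span text](LABEL). Spans must not overlap."""
    pieces = []
    last = 0
    for start, end, label in sorted(spans):
        pieces.append(text[last:start])          # text before the entity
        pieces.append(f"[{text[start:end]}]({label})")
        last = end
    pieces.append(text[last:])                   # text after the last entity
    return "".join(pieces)

sentence = "Google opened an office in Paris in 2019."
# Hand-written spans for illustration (not produced by spaCy).
spans = [(0, 6, "ORG"), (27, 32, "GPE"), (36, 40, "DATE")]
print(mark_entities(sentence, spans))
# [Google](ORG) opened an office in [Paris](GPE) in [2019](DATE).
```

With spaCy available, the spans could instead come from `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`.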