 **Préparation de l'environnement :
Installez les bibliothèques nécessaires**

In [7]:
!pip install pandas
!pip install nltk




**Install and Connect To Kaggle**

In [8]:
!pip install kaggle




**Importation des fichiers de configuration dans Colab :**

In [9]:
from google.colab import files

uploaded = files.upload()


Saving kaggle.json to kaggle.json


**Déplacement du fichier de configuration :**

In [10]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/


**Autorisation pour utiliser l'API Kaggle :**

In [11]:
!chmod 600 ~/.kaggle/kaggle.json


**Téléchargement du dataset :**

In [12]:
!kaggle datasets download -d thoughtvector/customer-support-on-twitter

Downloading customer-support-on-twitter.zip to /content
100% 168M/169M [00:04<00:00, 41.2MB/s]
100% 169M/169M [00:04<00:00, 36.6MB/s]


**Décompression du dataset :**

In [13]:
!unzip customer-support-on-twitter.zip


Archive:  customer-support-on-twitter.zip
  inflating: sample.csv              
  inflating: twcs/twcs.csv           


**Chargement des données :**

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [15]:
#NLTK, Spacy

In [16]:
df=pd.read_csv('/content/sample.csv')

In [17]:
df

Unnamed: 0,tweet_id,author_id,inbound,created_at,text,response_tweet_id,in_response_to_tweet_id
0,119237,105834,True,Wed Oct 11 06:55:44 +0000 2017,@AppleSupport causing the reply to be disregar...,119236,
1,119238,ChaseSupport,False,Wed Oct 11 13:25:49 +0000 2017,@105835 Your business means a lot to us. Pleas...,,119239.0
2,119239,105835,True,Wed Oct 11 13:00:09 +0000 2017,@76328 I really hope you all change but I'm su...,119238,
3,119240,VirginTrains,False,Tue Oct 10 15:16:08 +0000 2017,@105836 LiveChat is online at the moment - htt...,119241,119242.0
4,119241,105836,True,Tue Oct 10 15:17:21 +0000 2017,@VirginTrains see attached error message. I've...,119243,119240.0
...,...,...,...,...,...,...,...
88,119330,105859,True,Wed Oct 11 13:50:42 +0000 2017,@105860 I wish Amazon had an option of where I...,119329,119331.0
89,119331,105860,True,Wed Oct 11 13:47:14 +0000 2017,They reschedule my shit for tomorrow https://t...,119330,
90,119332,Tesco,False,Wed Oct 11 13:34:06 +0000 2017,"@105861 Hey Sara, sorry to hear of the issues ...",119333,119334.0
91,119333,105861,True,Wed Oct 11 14:05:18 +0000 2017,@Tesco bit of both - finding the layout cumber...,119335119336,119332.0


In [21]:
# extract the text column
df_text = df.text
df_text

0     @AppleSupport causing the reply to be disregar...
1     @105835 Your business means a lot to us. Pleas...
2     @76328 I really hope you all change but I'm su...
3     @105836 LiveChat is online at the moment - htt...
4     @VirginTrains see attached error message. I've...
                            ...                        
88    @105860 I wish Amazon had an option of where I...
89    They reschedule my shit for tomorrow https://t...
90    @105861 Hey Sara, sorry to hear of the issues ...
91    @Tesco bit of both - finding the layout cumber...
92    @105861 If that doesn't help please DM your fu...
Name: text, Length: 93, dtype: object

**Nettoyage des données :**

**Les imports**

In [25]:
import pandas as pd
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from tabulate import tabulate
import numpy as np
import matplotlib.pyplot as plt
import re
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

**# Fonction de nettoyage des données**

In [26]:

def clean_text(text):
    # Mise en minuscules
    text = text.lower()
    # Suppression des ponctuations
    text = re.sub(r'[^\w\s]', '', text)
    # Suppression des mots vides (stopwords)
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(text)
    text = ' '.join([word for word in word_tokens if word not in stop_words])
    # Stemming
    stemmer = PorterStemmer()
    text = ' '.join([stemmer.stem(word) for word in word_tokens])
    # Lemmatisation
    lemmatizer = WordNetLemmatizer()
    text = ' '.join([lemmatizer.lemmatize(word) for word in word_tokens])
    # Suppression des émojis, émoticônes, URL, balises HTML (pattern pour émojis et émoticônes)
    text = re.sub(r':\)|:|:-|;-\)|:-/|:-\|', '', text)
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'<.*?>', '', text)
    return text



**# Application de la fonction de nettoyage**

In [27]:
df['text_cleaned'] = df['text'].apply(clean_text)

**# Sélectionner les colonnes tweet_id, text et text_cleaned**

In [28]:
df_selected = df[['tweet_id', 'text', 'text_cleaned']]

**# Convertir le DataFrame en une liste de listes pour l'affichage sous forme de tableau**

In [29]:
data = df_selected.values.tolist()

**# Afficher les données sous forme de tableau avec les colonnes adjacentes**

In [30]:

print(tabulate(data, headers=['tweet_id', 'text', 'text_cleaned'], tablefmt='grid'))

+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
|   tweet_id | text                                                                                                                                                                       | text_cleaned                                                                                                                                    |
|     119237 | @AppleSupport causing the reply to be disregarded and the tapped notification under the keyboard is opened😡😡😡                                                           | applesupport causing the reply to be disregarded and the tapped notification under the keyboard is opened                                       |
+