# Measuring the Impact of COVID-19 on the Shipping Industry in the Gulf Region

In social commerce, shipping companies play an essential role in providing a good customer experience. The objective of this study is to measure the impact of the COVID-19 pandemic on social commerce customers in the Gulf region. The study focuses on information that customers published on Twitter about international and domestic shipping companies. We analysed a total of 10,006 Arabic and English Tweets posted during June and July 2020. Sentiment analysis of these Tweets shows that, even though more people switched to social commerce businesses, the customer experience dropped drastically because of delayed shipments in the Gulf region.

Therefore, this notebook was produced to explore, collect, and analyze Tweets using the Twitter API. We started by collecting Tweets for the GCC region as a whole and then narrowed our search to each of its countries. Please note that several parts of the code written here were inspired by our course '99-520: Data Analysis for Social Commerce Platforms in the Gulf during COVID-19' at Carnegie Mellon University Qatar. All of this information can be found below:

## 1) Installing libraries and helper functions

First and foremost, we have to install the Python libraries and write the helper functions needed to produce the data. They are as follows:
#### 1) Tweepy:
Python client for the Twitter API, used to gather Tweets from Twitter
#### 2) Pandas, matplotlib, and seaborn:
Data handling and visualization libraries
#### 3) Textblob, networkx, and nltk:
Data analysis libraries
#### 4) Others:
Libraries used to assist with the analysis step


In [1]:
!pip install textblob
from textblob import TextBlob

import tweepy as tw
from tweepy import OAuthHandler

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import string

import csv
import urllib
from pprint import pprint

import itertools
import collections
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk import bigrams
import networkx as nx


# Helper function that removes URLs from a given string. This is helpful because the Tweets we get from the Twitter API
# often contain URLs, which could hinder the accuracy of our sentiment analysis tools.
# Note: the pattern also strips every character other than ASCII letters, digits, spaces, and tabs
# (punctuation, '@', '#', and non-Latin scripts such as Arabic).
# Requires: a string
# Ensures: the same string with URLs and those characters removed.
def remove_url(txt):
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())



[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ashaa\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
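
To make the behaviour of `remove_url` concrete, the sketch below re-declares the helper and runs it on two sample strings. Both samples are made up for illustration and are not taken from the collected data. It shows that the pattern removes URLs and punctuation from English text, and that it removes every character of an Arabic string.

```python
import re

def remove_url(txt):
    # Same pattern as the helper above, written as a raw string
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())

# Made-up sample strings, used only for illustration
english_sample = "@AramexUAE my parcel is late!!! https://t.co/abc123"
arabic_sample = "الشحنة متأخرة https://t.co/abc123"

cleaned_english = remove_url(english_sample)
cleaned_arabic = remove_url(arabic_sample)

print(repr(cleaned_english))
print(repr(cleaned_arabic))
```

Because the helper leaves nothing of an Arabic string, it is only suitable for the English Tweets; this may be why the Arabic cells below store `tweet.full_text` without calling it.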


## 2) Getting authentication

As we are getting information from Twitter, authentication matters: it lets Twitter keep its network secure by permitting only authenticated users (or processes) to access its protected resources. For privacy reasons, our keys and access tokens are replaced with placeholders below:

In [2]:
consumer_key = 'Your consumer key here'
consumer_secret = 'Your consumer secret here'
access_token = 'Your access token here'
access_secret = 'Your access secret here'

try:
    auth = tw.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tw.API(auth, wait_on_rate_limit=True)

except Exception as err:
    # Report the underlying error instead of silently swallowing it
    print("Error: Authentication Failed:", err)
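
One way to keep credentials out of a shared notebook is to read them from environment variables. The following is a minimal sketch of that idea, not part of the original notebook: the variable names and the `load_twitter_credentials` function are hypothetical and chosen only for this example.

```python
import os

# Hypothetical variable names, chosen for this sketch only
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)

def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {name: "placeholder-" + str(i) for i, name in enumerate(REQUIRED_VARS)}
credentials = load_twitter_credentials(demo_env)
print(sorted(credentials))
```

The returned values could then be passed to `tw.OAuthHandler` and `auth.set_access_token` in place of string literals.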


## Aramex related Tweets in the GCC region

### 1) Aramex Shipping in GCC countries - English Tweets

In [3]:
#Search words and Date
search_words = """AramexUAE OR AramexQatar OR AramexKwi OR AramexQa OR Aramex_KSA"""
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#Saving the tweets in a csv file:
csvFile = open("aramex_gcc_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

#general information saved for analysis
aramex_gcc_en = []

counter = 0

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in filtered_text):
        aramex_gcc_en.append(tweet.full_text)
        counter += 1
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
        
csvFile.close()
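
The collection cells in this notebook all follow the same pattern: iterate over the Tweets, skip automated replies, and write one CSV row per Tweet. As a sketch of how that pattern could be factored into a single function, the example below uses hypothetical names (`write_tweets`, `fake_tweet`) and stand-in objects instead of real API results. It also writes plain strings rather than `.encode('utf-8')` bytes, since under Python 3 the `csv` module would store bytes as `b'...'` literals.

```python
import csv
import os
import tempfile
from datetime import datetime
from types import SimpleNamespace

def write_tweets(tweets, path, skip_phrases=(), clean=None):
    """Write tweets to a CSV file and return the texts that were kept.

    `tweets` is any iterable of objects exposing created_at, full_text,
    user.screen_name and user.location, the attributes used in the cell above.
    """
    kept = []
    # newline="" and an explicit encoding keep the CSV portable; values are
    # written as str, so no b'...' byte literals end up in the file
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Date", "Tweet", "User", "Location"])
        for tweet in tweets:
            text = clean(tweet.full_text) if clean else tweet.full_text
            # Skip automated customer-service replies
            if any(phrase in text for phrase in skip_phrases):
                continue
            kept.append(tweet.full_text)
            writer.writerow([tweet.created_at.strftime("%m/%d/%Y"), text,
                             tweet.user.screen_name, tweet.user.location])
    return kept

# Stand-in objects so the sketch runs without calling the Twitter API
def fake_tweet(text):
    user = SimpleNamespace(screen_name="demo_user", location="Doha")
    return SimpleNamespace(created_at=datetime(2020, 6, 15), full_text=text, user=user)

sample = [fake_tweet("My parcel is two weeks late"),
          fake_tweet("Hi, thanks for reaching out to us")]
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tweets.csv")
kept = write_tweets(sample, demo_path, skip_phrases=["thanks for reaching out"])

with open(demo_path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.reader(csv_file))
print(len(kept), rows[1])
```

With a helper like this, each cell would reduce to building the query and calling the function once, which also makes it harder to append results to the wrong list.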

### 2) Aramex shipping in GCC countries - Arabic Tweets

In [4]:
#Search words and Date
search_words = """AramexUAE OR AramexQatar OR AramexKwi OR AramexQa OR Aramex_KSA"""
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#analysis
aramex_gcc_ar = []

#Saving the tweets in a csv file:
csvFile = open("aramex_gcc_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")
    
    text = tweet.full_text
    filtered_text = tweet.full_text.encode('utf-8')
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        if ("مرحبا" not in text):
            aramex_gcc_ar.append(text)
            csvWriter.writerow([time_created, filtered_text, screen_user, location])
            
csvFile.close()
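
The nested `if` checks above drop Tweets that contain either of two automated-reply phrases. As a small sketch, the same test can be expressed with `any`, which makes it easier to add further phrases later. The function name `is_auto_reply` and the two sample strings are assumptions made for this illustration.

```python
# The two phrases filtered in the cell above
AUTO_REPLY_PHRASES = ("شكراً لتواصلك معنا", "مرحبا")

def is_auto_reply(text, phrases=AUTO_REPLY_PHRASES):
    """Return True if the text contains any of the automated-reply phrases."""
    return any(phrase in text for phrase in phrases)

# Made-up sample strings, used only for illustration
samples = ["مرحبا، كيف يمكننا مساعدتك؟", "الشحنة متأخرة منذ أسبوعين"]
customer_tweets = [t for t in samples if not is_auto_reply(t)]
print(len(customer_tweets))
```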

### 3) The usage of Shop and Ship in the GCC countries - English Tweets

In [5]:
#Search words and Date
search_words = "#shopandship OR shopandship OR @shopandship"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
shopANDship_gcc_en = []

#Saving the tweets in a csv file:
csvFile = open("shopANDship_gcc_en.csv", "w+")
csvWriter = csv.writer(csvFile)

# Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in tweet.full_text):
        counter += 1
        shopANDship_gcc_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

### 4) The usage of Shop and Ship in the GCC countries - Arabic Tweets

In [6]:
#Search words and Date
search_words = "#shopandship OR shopandship OR @shopandship"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"
print(search_words)

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items()

#general information saved for analysis
shopANDship_gcc_ar = []

#Saving the tweets in a csv file:

csvFile = open("shopANDship_gcc_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:

    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')
    text = tweet.full_text
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        if ("مرحبا" not in text):
            shopANDship_gcc_ar.append(text)
            csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()


#shopandship OR shopandship OR @shopandship -filter:retweets


## UPS related Tweets in the GCC region

### 1) The usage of UPS shipping service in the GCC region - English Tweets

In [7]:
#Search words and Date
search_words = "UPS_UAE OR UPSQatar OR UPSKwi OR UPSQa OR UPS_KSA OR UPS_Kuwait"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#Saving the tweets in a csv file:
csvFile = open("ups_gcc_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

#general information saved for analysis
ups_gcc_en = []

counter = 0

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in filtered_text):
        ups_gcc_en.append(tweet.full_text)
        counter += 1
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
        
csvFile.close()

### 2) The usage of UPS shipping service in the GCC region - Arabic Tweets

In [8]:
#Search words and Date
search_words = "UPS_UAE OR UPSQatar OR UPSKwi OR UPSQa OR UPS_KSA OR UPS_Kuwait"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#Saving the tweets in a csv file:
csvFile = open("ups_gcc_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

#general information saved for analysis
ups_gcc_ar = []

for tweet in tweets:

    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')
    text = tweet.full_text
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        if ("مرحبا" not in text):
            ups_gcc_ar.append(text)
            csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

## DHL related Tweets in the GCC region

### 1) The usage of DHL shipping service in the GCC region - English Tweets

In [9]:
#Search words and Date
search_words = "DHL_UAE OR DHLQatar OR DHLKwi OR DHLQa OR DHL_KSA OR DHL_Kuwait"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#Saving the tweets in a csv file:
csvFile = open("dhl_gcc_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

#general information saved for analysis
dhl_gcc_en = []

counter = 0

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in filtered_text):
        dhl_gcc_en.append(tweet.full_text)
        counter += 1
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
        
csvFile.close()

### 2) The usage of DHL shipping service in the GCC region - Arabic Tweets

In [10]:
#Search words and Date
search_words = "DHL_UAE OR DHLQatar OR DHLKwi OR DHLQa OR DHL_KSA OR DHL_Kuwait"
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode = 'extended',
                   since=date_since).items(1000)

#Saving the tweets in a csv file:
csvFile = open("dhl_gcc_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

#general information saved for analysis
dhl_gcc_ar = []

for tweet in tweets:

    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')
    text = tweet.full_text
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        if ("مرحبا" not in text):
            dhl_gcc_ar.append(text)
            csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

As the number of Tweets available for the DHL and UPS shipping companies is significantly lower than for Aramex (a ratio of roughly 0.2:99.8), we decided to continue our analysis with Aramex only.
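
A ratio such as this one can be derived from the Tweet counts of each company. The sketch below shows the arithmetic; the counts are hypothetical values chosen only to illustrate the calculation, not the counts obtained in this study, and `percentage_shares` is a name introduced for this example.

```python
def percentage_shares(counts):
    """Convert a mapping of counts into percentage shares rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Hypothetical counts chosen only to illustrate the arithmetic
example_counts = {"Aramex": 998, "DHL and UPS": 2}
shares = percentage_shares(example_counts)
print(shares)
```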

Here, we collected Tweets specific to each GCC country so that we could analyze each of them in more detail.

### 1) Shipping in Qatar - English Tweets

In [11]:
#Search words and Date
search_words = '@Qatar_post OR @MOCIQatar'

date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
shipping_qatar_en = []

#Saving the tweets in a csv file:
csvFile = open("shipping_qatar_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in filtered_text):
        counter += 1
        shipping_qatar_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = "@shopandship AND qatar"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        counter += 1
        shipping_qatar_en.append(filtered_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
csvFile.close()

### 2) Shipping in Qatar - Arabic Tweets

In [12]:
#Search words and Date
search_words = '@Qatar_post OR @MOCIQatar'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items()

#general information saved for analysis
shipping_qatar_ar = []

#Saving the tweets in a csv file:

csvFile = open("shipping_qatar_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:

    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        shipping_qatar_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

        
search_words = "@shopandship AND qatar"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        shipping_qatar_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

### 3) Shipping in UAE - English Tweets

In [13]:
#Search words and Date
search_words = 'aramex AND Dubai'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
aramex_UAE_en = []

#Saving the tweets in a csv file:
csvFile = open("aramex_UAE_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("thanks for reaching out" not in filtered_text):
        counter += 1
        aramex_UAE_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

        
search_words = "@shopandship AND dubai"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_UAE_en.append(filtered_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = "@shopandship AND UAE"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_UAE_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
csvFile.close()

### 4) Shipping in UAE - Arabic Tweets

In [None]:
#Search words and Date
search_words = '#aramex AND UAE'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items()

#general information saved for analysis
aramex_UAE_ar = []

#Saving the tweets in a csv file:

csvFile = open("aramex_UAE_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:

    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_UAE_ar.append(text)
        
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

        
search_words = "@shopandship AND dubai"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_UAE_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])


search_words = "@shopandship AND UAE"
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_UAE_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

### 5) Shipping in Kuwait - English Tweets


In [None]:
#Search words and Date
search_words = '@AramexKWI'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
aramex_kuwait_en = []


#Saving the tweets in a csv file:
csvFile = open("aramex_kuwait_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_kuwait_en.append(filtered_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = '@UPS AND kuwait'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_kuwait_en.append(filtered_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = '@Aramex AND kuwait'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_kuwait_en.append(filtered_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])


csvFile.close()

### 6) Shipping in Kuwait - Arabic Tweets


In [None]:
#Search words and Date
search_words = '@AramexKWI'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
aramex_kuwait_ar = []
    
# Saving the tweets in a csv file:
csvFile = open("aramex_kuwait_ar.csv", "w+")
csvWriter = csv.writer(csvFile)

# Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_kuwait_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = '@UPS AND kuwait'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_kuwait_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])


search_words = '@Aramex AND kuwait'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text.encode('utf-8')

    text = tweet.full_text
    
    screen_user = tweet.user.screen_name.encode('utf-8')
    
    location = tweet.user.location.encode('utf-8')

    if ("شكراً لتواصلك معنا" not in text):
        aramex_kuwait_ar.append(text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])
csvFile.close()

### 7) Shipping in KSA - English Tweets

In [None]:
#Search words and Date
search_words = '@Aramex_KSA'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"
print(search_words)

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

#general information saved for analysis
aramex_KSA_en = []
    
#Saving the tweets in a csv file:
csvFile = open("aramex_KSA_en.csv", "w+")
csvWriter = csv.writer(csvFile)

#Writing the header of the csv file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])


for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_KSA_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

search_words = '@Aramex AND KSA'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    replies = []
    time_created = tweet.created_at.strftime("%m/%d/%Y")

    filtered_text = tweet.full_text
    
    filtered_text = remove_url(filtered_text)
    
    screen_user = (tweet.user.screen_name.encode('utf-8'))

    location = (tweet.user.location.encode('utf-8'))
    
    if ("general inquiries" not in filtered_text):
        aramex_KSA_en.append(tweet.full_text)
        csvWriter.writerow([time_created, filtered_text, screen_user, location])

csvFile.close()

### 8) Shipping in KSA - Arabic Tweets

In [None]:
#Search words and Date
search_words = '@Aramex_KSA'
date_since = "2019-12-1"

# To Keep or Remove Retweets
search_words = search_words + " -filter:retweets"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

# Save the tweets in a CSV file (UTF-8 so that Arabic text is stored as readable text)
csvFile = open("aramex_KSA_ar.csv", "w+", encoding="utf-8", newline="")
csvWriter = csv.writer(csvFile)

# Write the header of the CSV file
csvWriter.writerow(["Date", "Tweet", "User", "Location"])

# Tweet texts kept in memory for analysis
aramex_KSA_ar = []

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")
    text = tweet.full_text

    # Keep these as str: in Python 3, csv.writer would write encoded bytes as b'...'
    screen_user = tweet.user.screen_name
    location = tweet.user.location

    # Skip automated customer-service replies ("Thank you for contacting us")
    if ("شكراً لتواصلك معنا" not in text):
        aramex_KSA_ar.append(text)
        csvWriter.writerow([time_created, text, screen_user, location])


# Second query: tweets that mention @Aramex together with KSA
search_words = '@Aramex AND KSA'
search_words = search_words + " -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="ar",
                   tweet_mode='extended',
                   since=date_since).items(1000)

for tweet in tweets:
    time_created = tweet.created_at.strftime("%m/%d/%Y")
    text = tweet.full_text
    screen_user = tweet.user.screen_name
    location = tweet.user.location

    if ("شكراً لتواصلك معنا" not in text):
        aramex_KSA_ar.append(text)
        csvWriter.writerow([time_created, text, screen_user, location])

csvFile.close()

With that, we have gathered Tweets for the GCC region as a whole and for each of its countries, and saved them in csv files for further analysis.
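
Before moving on, it can help to confirm that a saved file reads back correctly, especially for Arabic text. The following is a minimal sketch using only the standard library; the file name and the sample rows are hypothetical stand-ins, not data from our collection.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for collected tweets.
rows = [
    ["06/15/2020", "شحنتي متأخرة", "user_a", "Riyadh"],
    ["07/02/2020", "My parcel arrived late", "user_b", "Doha"],
]

# Hypothetical file name, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "sample_tweets.csv")

# Write str values directly; the file object handles the UTF-8 encoding.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Date", "Tweet", "User", "Location"])
    writer.writerows(rows)

# Read the file back as a list of dictionaries keyed by the header row.
with open(path, encoding="utf-8", newline="") as f:
    loaded = list(csv.DictReader(f))

print(len(loaded), loaded[0]["Tweet"])
```

Opening the file with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding.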

## Data visualization

Now, we want to get a general idea about the number of Tweets we have:

In [None]:
print("aramex_KSA_en tweets: ", len(aramex_KSA_en))
print("aramex_KSA_ar tweets: ", len(aramex_KSA_ar))

print("aramex_kuwait_en tweets: ", len(aramex_kuwait_en))
print("aramex_kuwait_ar tweets: ", len(aramex_kuwait_ar))

print("shipping_qatar_en tweets: ", len(shipping_qatar_en))
print("shipping_qatar_ar tweets: ", len(shipping_qatar_ar))

print("shopANDship_gcc_en tweets: ", len(shopANDship_gcc_en))
print("shopANDship_gcc_ar tweets: ", len(shopANDship_gcc_ar))

print("aramex_gcc_en tweets: ", len(aramex_gcc_en))
print("aramex_gcc_ar tweets: ", len(aramex_gcc_ar))

print("aramex_UAE_en tweets: ", len(aramex_UAE_en))
print("aramex_UAE_ar tweets: ", len(aramex_UAE_ar))

print("--------------------------------")

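# Note: these loops only rebind the loop variable `s`;
# the lists themselves are left unchanged, so the counts below are unaffected.
for s in aramex_gcc_en:
    s = s.translate(str.maketrans('', '', string.punctuation))

for s in aramex_gcc_ar:
    s = s.translate(str.maketrans('', '', string.punctuation))    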

    
arabic_length = len(aramex_KSA_ar) + len(aramex_kuwait_ar) + len(shipping_qatar_ar) + len(shopANDship_gcc_ar)
arabic_length += len(aramex_gcc_ar) + len(aramex_UAE_ar)

english_length = len(aramex_KSA_en) + len(aramex_kuwait_en) + len(shipping_qatar_en) + len(shopANDship_gcc_en)
english_length += len(aramex_gcc_en) + len(aramex_UAE_en)
                        
print("Number of Arabic Tweets: ", arabic_length)
print("Number of English Tweets: ", english_length)
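
If the Tweets are meant to be stored without punctuation, the cleaned strings have to be collected into a new list. The sketch below shows one way to do that; the helper name `strip_punctuation` and the extra Arabic punctuation marks are our own assumptions, since `string.punctuation` only covers ASCII characters.

```python
import string

# ASCII punctuation plus a few common Arabic marks (an assumption; extend as needed).
PUNCTUATION = string.punctuation + "،؛؟"
_TABLE = str.maketrans("", "", PUNCTUATION)


def strip_punctuation(tweets):
    """Return new strings with punctuation removed; the input list is not modified."""
    return [tweet.translate(_TABLE) for tweet in tweets]


# Hypothetical sample tweets
sample = ["Where is my parcel?!", "أين شحنتي؟"]
cleaned = strip_punctuation(sample)
print(cleaned)
```

Note that this also removes the "@" sign, so any later filter that looks for "@aramex" would need to look for "aramex" instead.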

### Consumer feedback in the GCC region

To answer our research question, the first step is to get a general idea of consumers' feedback on shipping services in the GCC region as a whole. Since we had barely any data for the UPS and DHL shipping companies, we performed our analysis on Aramex only.

In [None]:
words_en = [tweet.lower().split() for tweet in aramex_gcc_en]
words_ar = [tweet.lower().split() for tweet in aramex_gcc_ar]

stopwords_ar = set(stopwords.words('arabic'))
stopwords_en = set(stopwords.words('english'))

tweets_en = [[word for word in tweet_words if word not in stopwords_en and "@aramex" not in word]
              for tweet_words in words_en]
   
tweets_ar = [[word for word in tweet_words if word not in stopwords_ar and "@aramex" not in word]
              for tweet_words in words_ar]


all_words_en = list(itertools.chain(*tweets_en))
all_words_ar = list(itertools.chain(*tweets_ar))


counts_en = collections.Counter(all_words_en)
counts_ar = collections.Counter(all_words_ar)



wc = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = stopwords_en, 
                min_font_size = 10).generate(" ".join(all_words_en)) 

plt.figure(figsize=(10,8))
plt.imshow(wc)

plt.savefig('aramex_gcc_en.png')

# wc = WordCloud(width = 800, height = 800, 
#                 background_color ='white', 
#                 stopwords = stopwords_en, 
#                 min_font_size = 10).generate(" ".join(all_words_ar)) 

# plt.figure(figsize=(10,8))
# plt.imshow(wc)

# plt.savefig('aramex_gcc_ar.png')
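
The word counts behind a word cloud can also be inspected directly. The following is a small sketch of the same filter-and-count steps on made-up Tweets; the stop-word set here is a hypothetical stand-in for the nltk lists used in the cell above.

```python
import collections
import itertools

# Hypothetical stop-word set; the notebook uses nltk's stop-word lists instead.
stop_words = {"my", "is", "the", "a", "still"}

# Hypothetical sample tweets
sample_tweets = [
    "@aramex my shipment is delayed",
    "@aramex the shipment still delayed",
    "delivery was fast",
]

# Lower-case and split each tweet into words
tokens = [tweet.lower().split() for tweet in sample_tweets]

# Drop stop words and the company handle
filtered = [[w for w in words if w not in stop_words and "@aramex" not in w]
            for words in tokens]

# Flatten the per-tweet lists and count every remaining word
counts = collections.Counter(itertools.chain.from_iterable(filtered))
print(counts.most_common(2))
```

Calling `most_common(n)` on a counter such as `counts_en` lists the words that dominate the word cloud together with their frequencies.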

However, this was not enough; we also had to get the polarity of these Tweets:

In [None]:
all_tweets = aramex_gcc_en + aramex_gcc_ar + shopANDship_gcc_en + shopANDship_gcc_ar  + aramex_KSA_en + aramex_KSA_ar
all_tweets += aramex_kuwait_en + aramex_kuwait_ar + shipping_qatar_en + shipping_qatar_ar + aramex_UAE_en + aramex_UAE_ar

sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

all_sentiment = [tweet.sentiment.polarity for tweet in sentiment_objects]
pos_counter = 0
neg_counter = 0
neu_counter = 0

print(len(all_tweets))

# Create dataframe containing the polarity value and tweet text
sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]
print(sentiment_values[0])

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])

# Remove polarity values equal to zero
sentiment_df = sentiment_df[sentiment_df.polarity != 0]

x = sentiment_df['polarity'].values # array with polarity only

# Note: distplot is deprecated in seaborn 0.11 and later; histplot(x, kde=True) is the newer equivalent
sns.distplot(x, color = 'red');

# Calculating the mean of the non-zero polarity values
mean = sentiment_df['polarity'].mean()

# Plotting the mean as a vertical line
plt.axvline(mean, 0,1, color = 'blue')

plt.title("Sentiment analysis for shipments in the GCC countries")
plt.ylabel("Frequency (x100)")
plt.xlabel("Polarity")

      
plt.savefig('Sentiment analysis for shipments in the GCC countries.png')


To further understand the nature of our Tweets, we looked at the co-occurring words:

In [None]:
# Create list of lists containing bigrams in tweets
terms_bigram = [list(bigrams(tweet)) for tweet in tweets_en]

# Flatten list of bigrams in clean tweets
# (named all_bigrams so that nltk's bigrams function is not overwritten)
all_bigrams = list(itertools.chain(*terms_bigram))

# Create counter of words in clean bigrams
bigram_counts = collections.Counter(all_bigrams)

bigram_df = pd.DataFrame(bigram_counts.most_common(15),
                             columns=['bigram', 'count'])

# Create dictionary of bigrams and their counts
d = bigram_df.set_index('bigram').T.to_dict('records')
# Create network plot 
G = nx.Graph()

# Create connections between nodes
for k, v in d[0].items():
    G.add_edge(k[0], k[1], weight=(v * 5))

fig, ax = plt.subplots(figsize=(10, 8))

pos = nx.spring_layout(G, k=4)

# Plot networks
nx.draw_networkx(G, pos,
                 font_size=10,
                 width=2,
                 edge_color='grey',
                 node_color='gray',
                 with_labels = False,
                 ax=ax)

# Create offset labels
for key, value in pos.items():
    x, y = value[0]+.135, value[1]+.045
    ax.text(x, y,
            s=key,
            bbox=dict(facecolor='black', alpha=0.25),
            horizontalalignment='center', fontsize=10)

plt.title("Most frequent bigrams in the English tweets")

plt.savefig('bigram_gcc.png')
plt.show()
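
To make the bigram step concrete, here is a minimal sketch that counts adjacent word pairs without nltk. The helper name `count_bigrams` and the sample tokens are hypothetical; the cell above uses nltk's `bigrams` function instead.

```python
import collections


def count_bigrams(token_lists):
    """Count adjacent word pairs within each tweet (pairs never span two tweets)."""
    counter = collections.Counter()
    for tokens in token_lists:
        # zip pairs every word with the word that follows it
        counter.update(zip(tokens, tokens[1:]))
    return counter


# Hypothetical tokenized tweets
sample = [
    ["shipment", "delayed", "again"],
    ["shipment", "delayed", "two", "weeks"],
]
bigram_counts_sample = count_bigrams(sample)
print(bigram_counts_sample.most_common(1))
```

Each counted pair becomes an edge in the network plot, and the count determines the edge weight.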


### Consumer feedback in Qatar

We started by generating a word cloud that represents the nature of the Tweets in Qatar:

In [None]:
words_shipping_qatar_en = [tweet.lower().split() for tweet in shipping_qatar_en]
# words_shipping_qatar_ar = [tweet.lower().split() for tweet in aramex_gcc_ar]

tweets_nsw_en = [[word for word in tweet_words if not word in stopwords_en and not "@aramex" in word]
              for tweet_words in words_shipping_qatar_en]

# tweets_nsw_ar = [[word for word in tweet_words if not word in stopwords_ar and not "@aramex" in word]
#               for tweet_words in words_aramix_gcc_ar]

all_words_nsw_en = list(itertools.chain(*tweets_nsw_en))
# all_words_nsw_ar = list(itertools.chain(*tweets_nsw_ar))


counts_nsw_en = collections.Counter(all_words_nsw_en)
# counts_nsw_ar = collections.Counter(all_words_nsw_ar)

# print(type(counts_nsw_en))

# counts_nsw_ar.most_common(20)

# from wordcloud import WordCloud

wc = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = stopwords_en, 
                min_font_size = 10).generate(" ".join(all_words_nsw_en)) 

plt.figure(figsize=(10,8))
plt.imshow(wc)

plt.savefig('qatar_shipping_en.png')

We then analyzed the polarity of these Tweets:

In [None]:
all_tweets = shipping_qatar_en + shipping_qatar_ar
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]
print(sentiment_values[0])
# Create dataframe containing the polarity value and tweet text

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])


# Remove polarity values equal to zero
sentiment_df = sentiment_df[sentiment_df.polarity != 0]
fig, ax = plt.subplots(figsize=(8, 6))

# Plot histogram with break at zero
sentiment_df.hist(bins=[-1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1],
             ax=ax,
             color="rosybrown")



plt.title("Sentiment analysis for shipments in Qatar")
plt.ylabel("Frequency")
plt.xlabel("Polarity")

# plt.show()

plt.savefig('Sentiment analysis for shipments in Qatar.png')


In [None]:
#Analysis for English tweets only in Qatar
all_tweets = shipping_qatar_en
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

# sentiment_objects[2].polarity, sentiment_objects[2].subjectivity , sentiment_objects[2]
all_sentiment = [tweet.sentiment.polarity for tweet in sentiment_objects]

pos = 0
neg = 0
neut = 0

for i in range(len(sentiment_objects)):
    if sentiment_objects[i].polarity > 0:
        pos += 1
    elif sentiment_objects[i].polarity < 0:
        neg += 1
    else:
        neut += 1 
total = pos + neg + neut
s = [100 * (pos/total), 100 * (neg/total), 100 * (neut/total)]

sentiment_df = pd.DataFrame(s, columns=["polarity"])



labels = 'positive', 'negative', 'neutral'
sizes = s
explode = (0.1, 0.1, 0.1)

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)

plt.title("Qatari's feedback when using Aramex - English tweets")

ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.savefig("Qatari's feedback when using Aramex - English tweets.png")

plt.show()

In [None]:
#Analysis for Arabic Tweets only in Qatar
all_tweets = shipping_qatar_ar
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

all_sentiment = [tweet.sentiment.polarity for tweet in sentiment_objects]

pos = 0
neg = 0
neut = 0

for i in range(len(sentiment_objects)):
    if sentiment_objects[i].polarity > 0:
        pos += 1
    elif sentiment_objects[i].polarity < 0:
        neg += 1
    else:
        neut += 1 
# Neutral tweets (polarity == 0) are left out of this chart
total = pos + neg
s = [100 * (pos/total), 100 * (neg/total)]

sentiment_df = pd.DataFrame(s, columns=["polarity"])

labels = 'positive', 'negative'
sizes = s
explode = (0.1, 0.1)

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)

plt.title("Qatari's feedback when using Aramex - Arabic tweets")

ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.savefig("Qatari's feedback when using Aramex - Arabic tweets.png")

plt.show()
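
The counting logic in the two cells above can be factored into one helper. The sketch below is an illustration under our own assumptions: the name `sentiment_shares` is hypothetical, and the guard for an empty list is an addition. The guard matters because TextBlob's default analyzer relies on an English lexicon, so untranslated Arabic Tweets will generally receive a polarity of 0.0.

```python
def sentiment_shares(polarities):
    """Return the percentage of positive, negative, and neutral scores."""
    total = len(polarities)
    if total == 0:
        # Avoid dividing by zero when no tweets were collected
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    pos = sum(1 for p in polarities if p > 0)
    neg = sum(1 for p in polarities if p < 0)
    neut = total - pos - neg
    return {
        "positive": 100 * pos / total,
        "negative": 100 * neg / total,
        "neutral": 100 * neut / total,
    }


# Hypothetical polarity scores
shares = sentiment_shares([0.5, -0.25, 0.0, 0.8])
print(shares)
```

The values of the returned dictionary can be passed to `ax1.pie` as the `sizes` argument.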

### Consumer feedback in KSA

In [None]:
#KSA
all_tweets = aramex_KSA_en + aramex_KSA_ar
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])


# Remove polarity values equal to zero
sentiment_df = sentiment_df[sentiment_df.polarity != 0]
fig, ax = plt.subplots(figsize=(8, 6))

# Plot histogram with break at zero
sentiment_df.hist(bins=[-1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1],
             ax=ax,
             color="lightpink")



plt.title("Sentiment analysis for shipments in KSA")
plt.ylabel("Frequency")
plt.xlabel("Polarity")

# Save before showing: plt.show() clears the figure, so saving afterwards writes a blank image
plt.savefig('Sentiment analysis for shipments in KSA.png')

plt.show()

### Consumer feedback in UAE

In [None]:
#UAE
all_tweets = aramex_UAE_en + aramex_UAE_ar
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])


# Remove polarity values equal to zero
sentiment_df = sentiment_df[sentiment_df.polarity != 0]
fig, ax = plt.subplots(figsize=(8, 6))

# Plot histogram with break at zero
sentiment_df.hist(bins=[-1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1],
             ax=ax,
             color="rosybrown")



plt.title("Sentiment analysis for shipments in UAE")
plt.ylabel("Frequency")
plt.xlabel("Polarity")

# Save before showing: plt.show() clears the figure, so saving afterwards writes a blank image
plt.savefig('Sentiment analysis for shipments in UAE.png')

plt.show()


### Consumer feedback in Kuwait

In [None]:
#Kuwait
all_tweets = aramex_kuwait_en + aramex_kuwait_ar
sentiment_objects = [TextBlob(tweet) for tweet in all_tweets]

sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]

sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])

# Remove polarity values equal to zero
sentiment_df = sentiment_df[sentiment_df.polarity != 0]
fig, ax = plt.subplots(figsize=(8, 6))

# Plot histogram with break at zero
sentiment_df.hist(bins=[-1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1],
             ax=ax,
             color="moccasin")



plt.title("Sentiment analysis for shipments in Kuwait")
plt.ylabel("Frequency")
plt.xlabel("Polarity")

# Save before showing: plt.show() clears the figure, so saving afterwards writes a blank image
plt.savefig('Sentiment analysis for shipments in Kuwait.png')

plt.show()


This concludes the data gathering and analysis for our project.