<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Load-packages" data-toc-modified-id="Load-packages-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load packages</a></span></li><li><span><a href="#Seeting-up-twitter-authentication" data-toc-modified-id="Seeting-up-twitter-authentication-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Seeting up twitter authentication</a></span></li><li><span><a href="#Getting-some-Tweets" data-toc-modified-id="Getting-some-Tweets-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Getting some Tweets</a></span><ul class="toc-item"><li><span><a href="#Getting-twitter-by-user" data-toc-modified-id="Getting-twitter-by-user-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Getting twitter by user</a></span><ul class="toc-item"><li><span><a href="#@picnic" data-toc-modified-id="@picnic-4.1.1"><span class="toc-item-num">4.1.1&nbsp;&nbsp;</span>@picnic</a></span></li><li><span><a href="#@JumboSupermarkt" data-toc-modified-id="@JumboSupermarkt-4.1.2"><span class="toc-item-num">4.1.2&nbsp;&nbsp;</span>@JumboSupermarkt</a></span></li><li><span><a href="#@albertheijn" data-toc-modified-id="@albertheijn-4.1.3"><span class="toc-item-num">4.1.3&nbsp;&nbsp;</span>@albertheijn</a></span></li></ul></li><li><span><a href="#Applying-GetSearch-to-search-for-a-defined-query" data-toc-modified-id="Applying-GetSearch-to-search-for-a-defined-query-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Applying GetSearch to search for a defined query</a></span><ul class="toc-item"><li><span><a href="#Query_01:" data-toc-modified-id="Query_01:-4.2.1"><span class="toc-item-num">4.2.1&nbsp;&nbsp;</span>Query_01:</a></span></li><li><span><a href="#Query_02:" data-toc-modified-id="Query_02:-4.2.2"><span class="toc-item-num">4.2.2&nbsp;&nbsp;</span>Query_02:</a></span></li><li><span><a href="#Query_03:" data-toc-modified-id="Query_03:-4.2.3"><span class="toc-item-num">4.2.3&nbsp;&nbsp;</span>Query_03:</a></span></li><li><span><a href="#Query_04:" data-toc-modified-id="Query_04:-4.2.4"><span class="toc-item-num">4.2.4&nbsp;&nbsp;</span>Query_04:</a></span></li><li><span><a href="#Query_04B:" data-toc-modified-id="Query_04B:-4.2.5"><span class="toc-item-num">4.2.5&nbsp;&nbsp;</span>Query_04B:</a></span></li><li><span><a href="#Query_05:" data-toc-modified-id="Query_05:-4.2.6"><span class="toc-item-num">4.2.6&nbsp;&nbsp;</span>Query_05:</a></span></li><li><span><a href="#Query_06:" data-toc-modified-id="Query_06:-4.2.7"><span class="toc-item-num">4.2.7&nbsp;&nbsp;</span>Query_06:</a></span></li></ul></li></ul></li><li><span><a href="#Conclusions" data-toc-modified-id="Conclusions-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Conclusions</a></span></li></ul></div>

# Introduction

The main goal of this project is to analyze Twitter data related to three of the biggest (online) supermarkets in the Netherlands and their customers in the period of the corona crisis. The main focus is sentiment analysis, but we will perform also other analysis on the data collected.

The data will be collected via Twitter API. Therefore, we will try to cover as much as possible of the period of the corona crisis considering the limitations imposed by the API.

In this notebook will be:

1. Collection tweets from the user timeline (`GetUserTimeline`)
2. Collecting tweets using queries (`GetSearch`)
3. Selection which information from the data retrieved will be kept
4. Saving collected data in .csv file

# Load packages


**Attention**: `private_twitter_credentials.py` contains my Twitter credentials. Insert your Twitter credentials in `twitter_credentials.py`
and replace `private_twitter_credentials` by `twitter_credentials` in this notebook.


In [1]:
import private_twitter_credentials
import twitter
import datetime
import pandas as pd
import time

TodaysDate = time.strftime("%Y-%m-%d-%H-%M")

# Seeting up twitter authentication

I'll be using [`python-twitter`](https://python-twitter.readthedocs.io/en/latest/index.html) a python wrapper around the Twitter API.

In [2]:
consumer_key = private_twitter_credentials.consumer_key
consumer_secret = private_twitter_credentials.consumer_secret
access_token = private_twitter_credentials.access_token
access_token_secret = private_twitter_credentials.access_token_secret

api = twitter.Api(
    consumer_key         =   consumer_key,
    consumer_secret      =   consumer_secret,
    access_token_key     =   access_token,
    access_token_secret  =   access_token_secret,
    tweet_mode = 'extended' # to ensure that we get the full text of the users' original tweets
)

# Getting some Tweets

In this project we want to :

* Access user timeline Tweets, i.e., apply `GetUserTimeline` method on object `api` created in last section
* Access Tweets resulting from some query, i.e., apply `GetSearch` method on `api`.

Consider we want to get some Tweets from `Picnic`, one of the Twitter's user will be considering in this project. The account’s Twitter handle of `Picnic` is `@picnic` and we will be using the argument `screen_name` as the handle without `@`.

In [3]:
# we will use for now `screen_name` and `count`. Count has limit of 200, i.e., we can only retrieve 200 tweets per call
tweets_picnic = api.GetUserTimeline(screen_name='picnic', count = 200)

# see all available info in a dictionary

tweets_picnic = [ _.AsDict() for _ in tweets_picnic]

Check the 1st one

In [4]:
tweets_picnic[0]

{'created_at': 'Wed Jun 24 14:34:45 +0000 2020',
 'full_text': '@RuudVeHa Ik stuur je even een privébericht zodat ik je hier meer over kan vertellen Ruud! ^Mylène',
 'hashtags': [],
 'id': 1275799526717247493,
 'id_str': '1275799526717247493',
 'in_reply_to_screen_name': 'RuudVeHa',
 'in_reply_to_status_id': 1275787048209678336,
 'in_reply_to_user_id': 2870481582,
 'lang': 'nl',
 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>',
 'urls': [],
 'user': {'created_at': 'Tue Dec 02 13:21:56 +0000 2014',
  'description': 'Online supermarket Picnic delivers groceries free to your door with a 100% electric fleet, making online grocery shopping affordable for everyone',
  'favourites_count': 3885,
  'followers_count': 4842,
  'friends_count': 5,
  'id': 2902300475,
  'id_str': '2902300475',
  'listed_count': 26,
  'name': 'Picnic',
  'profile_background_color': '000000',
  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
  'p

Let's say that you want to search Tweets that mention AH and COVID-19. In thi case you use the method `GetSearch` using your query as argument. 

The easiest way to have the query right is going to [Twitter’s Advanced Search](https://twitter.com/search-advanced) and typing what you want to know. Then using as your `raw_query` the part of search URL after the "?" to use `raw_query`, removing the `&src=typd` portion.

Let's try it out.

In [5]:
%%html
<img src ="../images/search.JPG", width=400, height=200>

The url obtained was https://twitter.com/search?q=%40albertheijn%20covid-19&src=typed_query therefore I will use `raw_query = q=%40albertheijn%20covid-19`.

In [6]:
search_result = api.GetSearch(raw_query='q=%40albertheijn%20covid-19')
search_result = [ _.AsDict() for _ in search_result]

In [7]:
print('Result of the results with the last 7 days:', len(search_result))

Result of the results with the last 7 days: 3


The results from `GetSearch` are limited to 7 days. In the last 7 days we got 3 Tweets that mentioned @albertheijn and covid-19. Details of the most recent is shown below:

In [8]:
search_result[0]

{'created_at': 'Mon Jun 22 17:22:09 +0000 2020',
 'full_text': 'Steeds meer artikelen verdwijnen uit het online assortiment bij \u2066@albertheijn\u2069. Ben benieuwd of er een alternatief komt. Zo niet: zit je dan met je jaarabonnement bezorgen. Wie is het haasje? Mensen die tot de risicogroep behoren mbt covid-19. https://t.co/5LDBwwfjJZ',
 'hashtags': [],
 'id': 1275116878566916096,
 'id_str': '1275116878566916096',
 'lang': 'nl',
 'media': [{'display_url': 'pic.twitter.com/5LDBwwfjJZ',
   'expanded_url': 'https://twitter.com/Belleblabla/status/1275116878566916096/photo/1',
   'id': 1275116874578243589,
   'media_url': 'http://pbs.twimg.com/media/EbIgCqRXsAUMyw-.jpg',
   'media_url_https': 'https://pbs.twimg.com/media/EbIgCqRXsAUMyw-.jpg',
   'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
    'small': {'w': 680, 'h': 163, 'resize': 'fit'},
    'medium': {'w': 1063, 'h': 255, 'resize': 'fit'},
    'large': {'w': 1063, 'h': 255, 'resize': 'fit'}},
   'type': 'photo',
   'u

When retrieving Tweets from user timeline Twitter API limites us to 200 Tweets at a time, and from search to 100 tweets. This is the parameter `count` from both methods, `GetUserTimeline` and `GetSearch`.

Another constraint that we need to deal with is that the Twitter API is rate limited, meaning Twitter puts restrictions on how much data you can take at a time. More details about it [here](https://developer.twitter.com/en/docs/basics/rate-limiting#:~:text=Rate%20limiting%20of%20the%20standard,per%20window%20per%20access%20token.).

Because I want to retrieve much more than 200 Tweets (if possible) I'll write a class with two methods.

Class `TweetMiner` contains two methods: 

* `mine_user_tweets` which mine user's tweets making use of [GetUserTimeline](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline)

* `search_tweets` which mine tweets using [GetSearch](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets)

Details about both uses will be show respectively, in sections:

- [Getting twitter by user](#Getting-twitter-by-user)

- [Applying GetSearch to search for a defined query](#Applying-GetSearch-to-search-for-a-defined-query)


Notice that in this class I'm also selecting which information from the Tweets I want to keep in a from of a dictionary. I'm almost collecting everything. Probably for the purpose I have in mind now I'll not be using all this but it was my choice to keep what I kept. Feel free to adapt it.

Next, you find a function where I use the list of dictionaries obtained from apply my `TweetMiner`, organize it a bit and save the result in a .csv file

In [35]:
import datetime

class TweetMiner(object):
    """ Make possible obtaining tweets using twitter user id (mine_user_tweets) or performing a standard Twitter 
    API search (search_tweets)"""

    
    def __init__(self, api, result_limit = 20, max_pages = 40):
        """result_limit = count that can take max 200 (mine_user_tweets) and max 100 (search_tweets)"""
        
        self.api = api        
        self.result_limit = result_limit
        self.max_pages = max_pages
        

    def mine_user_tweets(self, user, mine_retweets=False):
        """ Mine tweets of user = screen_name or user_id"""

        data           =  []
        last_tweet_id  =  False
        page           =  1
        
        while page <= self.max_pages:
            
            if last_tweet_id:
                statuses   =   self.api.GetUserTimeline(screen_name=user, count=self.result_limit, max_id=last_tweet_id - 1, 
                                                        include_rts=mine_retweets)
                statuses = [ _.AsDict() for _ in statuses]
            else:
                statuses   =   self.api.GetUserTimeline(screen_name=user, count=self.result_limit, 
                                                        include_rts=mine_retweets)
                statuses = [_.AsDict() for _ in statuses]
                
            for item in statuses:
                # Using try except here.
                # When retweets = 0 we get an error (GetUserTimeline fails to create a key, 'retweet_count')
                try:
                    mined = {
                        'mined_at':         datetime.datetime.now(),
                        'created_at':       item['created_at'],
                        'tweet_id':         item['id'],
                        'tweet_id_str':     item['id_str'],
                        'screen_name':      item['user']['screen_name'],
                        'favorite_count':   item['favorite_count'],
                        'text':             item['full_text'],
                        'source':           item['source'],
                        'hashtags':         item['hashtags'],
                        'urls':             item['urls'],
                        'language':         item['lang'],
                        'retweet_count':    item['retweet_count'],
                        #user info
                        'user_favourites_count': item['user']['favourites_count'],
                        'followers_count':  item['user']['followers_count'],
                        'friends_count':    item['user']['friends_count']
                    }
            
                
                except:
                    mined = {
                        'mined_at':         datetime.datetime.now(),
                        'created_at':       item['created_at'],
                        'tweet_id':         item['id'],
                        'tweet_id_str':     item['id_str'],
                        'screen_name':      item['user']['screen_name'],
#                         'favorite_count':   item['favorite_count'],
                        'text':             item['full_text'],
                        'source':           item['source'],
                        'hashtags':         item['hashtags'],
                        'urls':             item['urls'],
                        'language':         item['lang'],
                        'retweet_count':    0,
                        # user info
                        'user_favourites_count': item['user']['favourites_count'],
                        'followers_count':  item['user']['followers_count'],
                        'friends_count':    item['user']['friends_count']
                        }
                
                last_tweet_id = item['id']
                data.append(mined)
                
            page += 1
            
        return data
    
    def search_tweets(max_pages = 20, count = 20, raw_query = None, result_type = 'mixed'):
        """ Search tweets """

        data           =  []
        last_tweet_id  =  False
        page           =  1
        
        while page <= max_pages:
            
            if last_tweet_id:
                statuses = api.GetSearch(raw_query=raw_query, count = count, result_type=result_type, 
                                         max_id=last_tweet_id - 1)
                statuses = [ _.AsDict() for _ in statuses]
            else:
                statuses = api.GetSearch(raw_query=raw_query, count = count, result_type=result_type)
                statuses = [_.AsDict() for _ in statuses]
                
            for item in statuses:
                # Using try except here.
                try:
                    mined = {
                        'mined_at':                datetime.datetime.now(),
                        'created_at':              item['created_at'],
                        'tweet_id':                item['id'],
                        'tweet_id_str':            item['id_str'],
                        'in_reply_to_screen_name': item['in_reply_to_screen_name'],
                        'in_reply_to_status_id':   item['in_reply_to_status_id'],
                        'in_reply_to_user_id':     item['in_reply_to_user_id'],
                        'language':                item['lang'],
                        'text':                    item['full_text'],
                        'hashtags':                item['hashtags'],
                        'source':                  item['source'],
                       # info about user
                        'user_id':                 item['user']['id'],
                        'user_screen_name':        item['user']['screen_name'],
                        'user_location':           item['user']['location'],
                        'user_favourites_count':   item['user']['favourites_count'],
                        'followers_count':         item['user']['followers_count'],
                        'friends_count':           item['user']['friends_count']
                    }
                    
                except:
                    mined = {
                        'mined_at':                datetime.datetime.now(),
                        'created_at':              item['created_at'],
                        'tweet_id':                item['id'],
                        'tweet_id_str':            item['id_str'],
#                         'in_reply_to_screen_name': item['in_reply_to_screen_name'],
#                         'in_reply_to_status_id':   item['in_reply_to_status_id'],
#                         'in_reply_to_user_id':     item['in_reply_to_user_id'],
                        'language':                item['lang'],
                        'text':                    item['full_text'],
                        'hashtags':                item['hashtags'],
                        'source':                  item['source'],
                       # info about user
                        'user_id':                 item['user']['id'],
                        'user_screen_name':        item['user']['screen_name'],
#                        'user_location':           item['user']['location'],
#                         'user_favourites_count':   item['user']['favourites_count'],
                        'followers_count':         item['user']['followers_count'],
                        'friends_count':           item['user']['friends_count']
                    }
                                            
                
                last_tweet_id = item['id']
                data.append(mined)
                
            page += 1
            
        return data

In [36]:
def process_and_save(df, file_name, mine_user_twitter=1):
    """ Save retrieved tweets in csv file.
    
    Input:
    
    df : dataframe of tweets'data
    file_name: name with which the csv will be saved (without extension)
    mine_user_twitter: Indicates if df came contains tweets from a twitter user (mine_user_twitter=1), i.e., it was 
    obtained using GetUserTimeline since the information obtained from this method is different from an API search 
    from GetSearch. Therefore, when using GetSearch (mine_user_twitter=0).
    
    """
    
    TodaysDate = time.strftime("%Y-%m-%d-%H-%M")

    
    # Create columns 'year', 'month', 'day', 'hour', 'min' from 'created_at'
    df['created_at'] = pd.to_datetime(df['created_at'])
        
    df['year'] = df['created_at'].dt.year 
    df['month'] = df['created_at'].dt.month 
    df['day'] = df['created_at'].dt.day 
    df['hour'] = df['created_at'].dt.hour 
    df['minute'] = df['created_at'].dt.minute
    df['day_of_week'] = df['created_at'].dt.weekday
    
    if mine_user_twitter:
    
        df = df[['mined_at', 'created_at', 'year', 'month', 'day','day_of_week', 'hour', 'minute', 'screen_name', 
                 'tweet_id', 'tweet_id_str',  'retweet_count', 'favorite_count', 'source','hashtags', 'urls', 'language', 
                 'user_favourites_count', 'followers_count','friends_count','text']]
    else:
        df = df[['mined_at','created_at', 'year', 'month', 'day','day_of_week','hour', 'minute', 'tweet_id', 'tweet_id_str', 
                 'in_reply_to_screen_name','in_reply_to_status_id','in_reply_to_user_id', 'hashtags','source','language', 
                 'user_screen_name','user_id','user_location','user_favourites_count','followers_count','friends_count', 
                 'text']]
        
    df.sort_values(by='created_at',inplace = True)
    
    # the normal use of drop_duplicates raise typeerror unhashable type 'list' so I needed to adapt using str
    # as pointed out in https://stackoverflow.com/questions/43855462/pandas-drop-duplicates-method-not-working?rq=1
    
    df = df.loc[df.astype(str).drop_duplicates(subset=['created_at','tweet_id','text']).index]
    df.reset_index(drop = True, inplace = True)
    
    
    df.to_csv("../data/tweets/"+file_name+"_"+TodaysDate+".csv", index = False)
    
    return df
    
    

Now that we have our class to retrieve Tweets and our function to save the result in .csv let's collect data.

The goal of this project is to check the sentiment of users towards the (online) supermarkets , i.e., Jumbo Supermarkten, AH, and Picnic, especially after COVID-19 entered our lives. The first two are supermarkets chains that also provide online grocery shopping while the last one is exclusively online supermarket.

Everything changed since the first case of corona virus in The Netherlands (February 27th) and the way of shopping groceries suffered important changes, in particular, considering the sudden and significant increase the number of users that opted for online grocery shopping. Supermarkets were not ready for such an explosion on demand, some adapted faster than others. Specially, `Picnic` that has the sole focus on online shopping.

Some were completely not available to receive order while with orders the consumer should consider a planning of about 2 weeks until being able to receive his/her order. I have my own experiences but I want to access the sentiment of consumers via tweets messages over these 3 providers. In special, I want to focus in this period were COVID-19 is present.

The idea is to get twitters covering the period from the 1st case until today both for info retrieved by user (`GetUserTimeline`) and by query (`GetSearch`).

We need, however, to consider some limitations.
First, it is difficult to have control on how far in the past we can go when retrieving user timeline data (using `TweetMiner. mine_user_tweets()`. So, we will play mainly with parameters `result_limit`, i.e., count and `max_pages`.

Second, when performing search with API it is only possible to access Tweets of the last 7 days (more details [here](https://developer.twitter.com/en/docs/tweets/search/overview/standard)). So unfortunately, when performing queries, we will be limited and we will not be able to go back months ago.

## Getting twitter by user

To start we will obtain tweets for `picnic`, `JumboSupermarkt`, and `albertheijn`.

### @picnic

In [11]:
# Result limit == count parameter from our GetUserTimeline() it can take max 200
# More pages more back in time you can go
miner = TweetMiner(api, result_limit=200, max_pages = 20)
picnic = miner.mine_user_tweets(user="picnic")
df_picnic = process_and_save(pd.DataFrame(picnic), "picnic")

In [12]:
df_picnic.tail()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,screen_name,tweet_id,...,retweet_count,favorite_count,source,hashtags,urls,language,user_favourites_count,followers_count,friends_count,text
3224,2020-06-24 17:06:45.034704,2020-06-23 13:17:08+00:00,2020,6,23,1,13,17,picnic,1275417608360263680,...,0,,"<a href=""https://mobile.twitter.com"" rel=""nofo...",[],[],nl,3885,4842,5,@Chrishuygens Er is een storing bij iDeal op h...
3225,2020-06-24 17:06:45.034704,2020-06-23 13:17:16+00:00,2020,6,23,1,13,17,picnic,1275417643504238598,...,0,,"<a href=""https://mobile.twitter.com"" rel=""nofo...",[],[],nl,3885,4842,5,@birgitcloggy Er is een storing bij iDeal op h...
3226,2020-06-24 17:06:45.034704,2020-06-23 19:32:38+00:00,2020,6,23,1,19,32,picnic,1275512106515927044,...,0,,"<a href=""http://www.zendesk.com"" rel=""nofollow...",[],[],nl,3885,4842,5,@seilram76 Goed dat je even een berichtje stuu...
3227,2020-06-24 17:06:45.034704,2020-06-24 08:12:09+00:00,2020,6,24,2,8,12,picnic,1275703244707319808,...,0,,"<a href=""https://mobile.twitter.com"" rel=""nofo...",[],[],nl,3885,4842,5,@Wouter_Kamp Door de coronacrisis stond de wac...
3228,2020-06-24 17:06:45.034704,2020-06-24 14:34:45+00:00,2020,6,24,2,14,34,picnic,1275799526717247493,...,0,,"<a href=""https://mobile.twitter.com"" rel=""nofo...",[],[],nl,3885,4842,5,@RuudVeHa Ik stuur je even een privébericht zo...


In [13]:
df_picnic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3229 entries, 0 to 3228
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   mined_at               3229 non-null   datetime64[ns]     
 1   created_at             3229 non-null   datetime64[ns, UTC]
 2   year                   3229 non-null   int64              
 3   month                  3229 non-null   int64              
 4   day                    3229 non-null   int64              
 5   day_of_week            3229 non-null   int64              
 6   hour                   3229 non-null   int64              
 7   minute                 3229 non-null   int64              
 8   screen_name            3229 non-null   object             
 9   tweet_id               3229 non-null   int64              
 10  tweet_id_str           3229 non-null   object             
 11  retweet_count          3229 non-null   int64            

In [14]:
min(df_picnic.created_at),max(df_picnic.created_at)

(Timestamp('2018-12-07 15:06:19+0000', tz='UTC'),
 Timestamp('2020-06-24 14:34:45+0000', tz='UTC'))

For `Picnic` we went far enough. We got data back to December 2018!

After experimenting a bit, it seems that better keep `result_limit` close to its maximum 200 and try `max_pages` around 20. If there is some kind of connection error wait and try again after a while or reduce these values.

In [15]:
print("Picnic's followers", df_picnic.loc[df_picnic.shape[0]-1,'followers_count'])
print("Picnic's friends", df_picnic.loc[df_picnic.shape[0]-1,'friends_count'])

Picnic's followers 4842
Picnic's friends 5


In [16]:
df_picnic['language'].value_counts()

nl     2930
und     190
en       90
es        5
de        4
fr        3
in        2
sv        1
fi        1
it        1
ht        1
cy        1
Name: language, dtype: int64

### @JumboSupermarkt

In [17]:
# Result limit == count parameter from our GetUserTimeline() it can take max 200
# More pages more back in time you can go
miner = TweetMiner(api, result_limit=200, max_pages = 20)
JumboSupermarkt = miner.mine_user_tweets(user="JumboSupermarkt")

df_JumboSupermarkt = process_and_save(pd.DataFrame(JumboSupermarkt), "JumboSupermarkt")

In [18]:
df_JumboSupermarkt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3238 entries, 0 to 3237
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   mined_at               3238 non-null   datetime64[ns]     
 1   created_at             3238 non-null   datetime64[ns, UTC]
 2   year                   3238 non-null   int64              
 3   month                  3238 non-null   int64              
 4   day                    3238 non-null   int64              
 5   day_of_week            3238 non-null   int64              
 6   hour                   3238 non-null   int64              
 7   minute                 3238 non-null   int64              
 8   screen_name            3238 non-null   object             
 9   tweet_id               3238 non-null   int64              
 10  tweet_id_str           3238 non-null   object             
 11  retweet_count          3238 non-null   int64            

In [19]:
min(df_JumboSupermarkt.created_at),max(df_JumboSupermarkt.created_at)

(Timestamp('2020-03-10 07:06:47+0000', tz='UTC'),
 Timestamp('2020-06-24 13:56:42+0000', tz='UTC'))

In [20]:
# Result limit == count parameter from our GetUserTimeline() it can take max 200
# More pages more back in time you can go
miner = TweetMiner(api, result_limit=200, max_pages = 25)
JumboSupermarkt = miner.mine_user_tweets(user="JumboSupermarkt")
df_JumboSupermarkt = process_and_save(pd.DataFrame(JumboSupermarkt), "JumboSupermarkt")


In [21]:
min(df_JumboSupermarkt.created_at),max(df_JumboSupermarkt.created_at)

(Timestamp('2020-03-10 07:06:47+0000', tz='UTC'),
 Timestamp('2020-06-24 13:56:42+0000', tz='UTC'))

Changing parameters does not seem to help here and we could only retrieve data from March 10th, 2020 until now for Jumbo.

In [22]:
print("Jumbo's followers", df_JumboSupermarkt.loc[df_JumboSupermarkt.shape[0]-1,'followers_count'])
print("Jumbo's friends", df_JumboSupermarkt.loc[df_JumboSupermarkt.shape[0]-1,'friends_count'])

Jumbo's followers 16208
Jumbo's friends 1710


In [23]:
df_JumboSupermarkt['language'].value_counts()

nl     3134
en       70
und      25
in        4
da        2
de        2
tl        1
Name: language, dtype: int64

### @albertheijn

In [24]:
# Result limit == count parameter from our GetUserTimeline() it can take max 200
# More pages more back in time you can go
miner = TweetMiner(api, result_limit=200, max_pages = 20)
albertheijn = miner.mine_user_tweets(user="albertheijn")
df_albertheijn = process_and_save(pd.DataFrame(albertheijn), "albertheijn")


In [25]:
df_albertheijn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3227 entries, 0 to 3226
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   mined_at               3227 non-null   datetime64[ns]     
 1   created_at             3227 non-null   datetime64[ns, UTC]
 2   year                   3227 non-null   int64              
 3   month                  3227 non-null   int64              
 4   day                    3227 non-null   int64              
 5   day_of_week            3227 non-null   int64              
 6   hour                   3227 non-null   int64              
 7   minute                 3227 non-null   int64              
 8   screen_name            3227 non-null   object             
 9   tweet_id               3227 non-null   int64              
 10  tweet_id_str           3227 non-null   object             
 11  retweet_count          3227 non-null   int64            

In [26]:
min(df_albertheijn.created_at),max(df_albertheijn.created_at)

(Timestamp('2020-04-08 08:47:29+0000', tz='UTC'),
 Timestamp('2020-06-24 15:06:54+0000', tz='UTC'))

In [27]:
# Result limit == count parameter from our GetUserTimeline() it can take max 200
# More pages more back in time you can go
miner = TweetMiner(api, result_limit=200, max_pages = 100)
albertheijn = miner.mine_user_tweets(user="albertheijn")
df_albertheijn = process_and_save(pd.DataFrame(albertheijn), "albertheijn")


In [28]:
min(df_albertheijn.created_at),max(df_albertheijn.created_at)

(Timestamp('2020-04-08 08:47:29+0000', tz='UTC'),
 Timestamp('2020-06-24 15:06:54+0000', tz='UTC'))

In [29]:
df_albertheijn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3227 entries, 0 to 3226
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   mined_at               3227 non-null   datetime64[ns]     
 1   created_at             3227 non-null   datetime64[ns, UTC]
 2   year                   3227 non-null   int64              
 3   month                  3227 non-null   int64              
 4   day                    3227 non-null   int64              
 5   day_of_week            3227 non-null   int64              
 6   hour                   3227 non-null   int64              
 7   minute                 3227 non-null   int64              
 8   screen_name            3227 non-null   object             
 9   tweet_id               3227 non-null   int64              
 10  tweet_id_str           3227 non-null   object             
 11  retweet_count          3227 non-null   int64            

Again the changing parameters didn't work. For `AH` we only succeeded in getting back to April 8th, 2020.

In [30]:
print("AH's followers", df_albertheijn.loc[df_albertheijn.shape[0]-1,'followers_count'])
print("AH's friends", df_albertheijn.loc[df_albertheijn.shape[0]-1,'friends_count'])

AH's followers 45540
AH's friends 6


In [31]:
df_albertheijn['language'].value_counts()

nl     3122
en       75
und       6
fr        5
tr        4
et        3
in        2
da        2
pl        1
sv        1
es        1
fi        1
is        1
ht        1
de        1
cy        1
Name: language, dtype: int64

## Applying GetSearch to search for a defined query


Twitter’s search parameters are a bit complex, to perform a particular search, you can consult Twitter’s documentation at https://dev.twitter.com/rest/public/search.

An easier way is to make use of [Twitter’s Advanced Search](toolhttps://twitter.com/search-advanced), and then use the part of search URL after the "?" to use `raw_query`, removing the `&src=typd` portion.

In this section we will perform some queries to allow us to compare `Picnic`, `Jumbo`, and `AH`. In particular, my main goal is to include also `COVID-19`. But since we are in June and things seem to be now much more in control than few months ago, I'm not sure if I'll be able to retrieve enough Tweets using these key words.

In addition, the results obtained from Twitter API searches against a sampling of recent Tweets published in the past 7 days (more details [here](https://developer.twitter.com/en/docs/tweets/search/overview/standard)). So, the results you see in the website when using `Twitter’s Advanced Search` will not per se be in the result of the API search.




### Query_01:

`All these words`: picnic, jumbo, ah, covid

`Dates:`

    From: February - 25 -2020
    To:   June - 21 -2020

**Result:** `https://twitter.com/search?q=picnic%2C%20jumbo%2C%20ah%2C%20covid%20until%3A2020-06-21%20since%3A2020-02-25&src=typed_query`

In [32]:
query_01 = 'q=picnic%2C%20jumbo%2C%20ah%2C%20covid%20until%3A2020-06-21%20since%3A2020-02-25'
result = TweetMiner.search_tweets(raw_query = query_01)
len(result)

0

As expected, since the 2 Tweets there are from March and therefore are not catch by the API.

### Query_02:

So let's make a query without `covid`.

`All these words`: picnic, jumbo, ah

**Result:** `https://twitter.com/search?q=picnic%2C%20jumbo%2C%20ah&src=typed_query`

In [37]:
query_02 = 'q=picnic%2C%20jumbo%2C%20ah'
result_02 = TweetMiner.search_tweets(max_pages = 20, count = 100, raw_query = query_02)
len(result_02)

80

In [38]:
result_02[0]

{'mined_at': datetime.datetime(2020, 6, 24, 17, 15, 25, 807873),
 'created_at': 'Thu Jun 18 15:03:16 +0000 2020',
 'tweet_id': 1273632376950849537,
 'tweet_id_str': '1273632376950849537',
 'language': 'nl',
 'text': 'Half juni en nog steeds worden boodschappen bezorgd bij meerdere huizen in de straat door AH, Jumbo, Picnic. Benieuwd: is dat blijvend of gaan mensen toch weer zelf naar de winkel over een tijdje? Onze boodschappen werden al langer bezorgd, wegens geen biowinkel hier @Korenmaat',
 'hashtags': [],
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'user_id': 111582731,
 'user_screen_name': 'martiendas',
 'followers_count': 4645,
 'friends_count': 5108}

In [39]:
df_query_02 = process_and_save(pd.DataFrame(result_02),"query_02",mine_user_twitter=0)
df_query_02.head()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,tweet_id,tweet_id_str,...,hashtags,source,language,user_screen_name,user_id,user_location,user_favourites_count,followers_count,friends_count,text
0,2020-06-24 17:15:27.659258,2020-06-18 11:50:31+00:00,2020,6,18,3,11,50,1273583869753724930,1273583869753724930,...,[],"<a href=""https://mobile.twitter.com"" rel=""nofo...",nl,GHengeveld,20709019,"Amersfoort, Nederland",891.0,1485,1750,@jelleprins @picnic @JumboSupermarkt @alberthe...
1,2020-06-24 17:15:29.661533,2020-06-18 11:56:42+00:00,2020,6,18,3,11,56,1273585426268409859,1273585426268409859,...,[],"<a href=""http://twitter.com/download/iphone"" r...",nl,jelleprins,16546619,,,10510,1066,@Oli4K @picnic @JumboSupermarkt @albertheijn L...
2,2020-06-24 17:15:28.534556,2020-06-18 14:00:01+00:00,2020,6,18,3,14,0,1273616458459705347,1273616458459705347,...,[],"<a href=""https://about.twitter.com/products/tw...",nl,agfnl,164631799,,,8535,2130,"""Trendbreuk: Jumbo al 3 weken duurder in AGF d..."
3,2020-06-24 17:15:29.251031,2020-06-18 15:03:16+00:00,2020,6,18,3,15,3,1273632376950849537,1273632376950849537,...,[],"<a href=""http://twitter.com/download/android"" ...",nl,martiendas,111582731,,,4645,5108,Half juni en nog steeds worden boodschappen be...


In [40]:
df_query_02.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype              
---  ------                   --------------  -----              
 0   mined_at                 4 non-null      datetime64[ns]     
 1   created_at               4 non-null      datetime64[ns, UTC]
 2   year                     4 non-null      int64              
 3   month                    4 non-null      int64              
 4   day                      4 non-null      int64              
 5   day_of_week              4 non-null      int64              
 6   hour                     4 non-null      int64              
 7   minute                   4 non-null      int64              
 8   tweet_id                 4 non-null      int64              
 9   tweet_id_str             4 non-null      object             
 10  in_reply_to_screen_name  1 non-null      object             
 11  in_reply_to_status_id    1 non-null 

In the case of queries we have much less results than in the case of user timeline so it was necessary to add `drop_duplicates` to function `process_and_save` to remove repeated rows.

Notice that `result_02` showed to exist 80 tweets when in fact after removing duplicates we have only 4.

In [41]:
min(df_query_02['created_at']),max(df_query_02['created_at'])

(Timestamp('2020-06-18 11:50:31+0000', tz='UTC'),
 Timestamp('2020-06-18 15:03:16+0000', tz='UTC'))

In [42]:
df_query_02['text'].values

array(['@jelleprins @picnic @JumboSupermarkt @albertheijn Geen ervaring met die van Jumbo of AH, maar die van Picnic is so gebruiksvriendelijk dat mijn zoontje van 3 de boodschappen doet.',
       '@Oli4K @picnic @JumboSupermarkt @albertheijn Lijkt er op dat Jumbo &amp; AH erg naar website &amp; elkaar hebben gekeken, terwijl Picnic echt mobile first &amp; vanuit niks is gestart.',
       '"Trendbreuk: Jumbo al 3 weken duurder in AGF dan AH, Picnic volgt"\nhttps://t.co/mVfpqWNoT7 https://t.co/vPVSFN5S49',
       'Half juni en nog steeds worden boodschappen bezorgd bij meerdere huizen in de straat door AH, Jumbo, Picnic. Benieuwd: is dat blijvend of gaan mensen toch weer zelf naar de winkel over een tijdje? Onze boodschappen werden al langer bezorgd, wegens geen biowinkel hier @Korenmaat'],
      dtype=object)

### Query_03:

Now I'll add `albertheijn` to the last query because sometimes there are expressions like `ahhhhh` and this is not what we are looking for.

`All these words`: picnic, jumbo, ah, albertheijn

**Result:** `https://twitter.com/search?q=picnic%2C%20jumbo%2C%20ah%2C%20albertheijn&src=typed_query`

In [43]:
query_03 = 'q=picnic%2C%20jumbo%2C%20ah%2C%20albertheijn'
result_03 = TweetMiner.search_tweets(max_pages = 20, raw_query = query_03)
len(result_03)

40

In [44]:
result_03[0]

{'mined_at': datetime.datetime(2020, 6, 24, 17, 16, 4, 26977),
 'created_at': 'Thu Jun 18 11:56:42 +0000 2020',
 'tweet_id': 1273585426268409859,
 'tweet_id_str': '1273585426268409859',
 'language': 'nl',
 'text': '@Oli4K @picnic @JumboSupermarkt @albertheijn Lijkt er op dat Jumbo &amp; AH erg naar website &amp; elkaar hebben gekeken, terwijl Picnic echt mobile first &amp; vanuit niks is gestart.',
 'hashtags': [],
 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 'user_id': 16546619,
 'user_screen_name': 'jelleprins',
 'followers_count': 10510,
 'friends_count': 1066}

In [45]:
df_query_03 = process_and_save(pd.DataFrame(result_03),"query_03",mine_user_twitter=0)
df_query_03.head()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,tweet_id,tweet_id_str,...,hashtags,source,language,user_screen_name,user_id,user_location,user_favourites_count,followers_count,friends_count,text
0,2020-06-24 17:16:05.725407,2020-06-18 11:50:31+00:00,2020,6,18,3,11,50,1273583869753724930,1273583869753724930,...,[],"<a href=""https://mobile.twitter.com"" rel=""nofo...",nl,GHengeveld,20709019,"Amersfoort, Nederland",891.0,1485,1750,@jelleprins @picnic @JumboSupermarkt @alberthe...
1,2020-06-24 17:16:05.328164,2020-06-18 11:56:42+00:00,2020,6,18,3,11,56,1273585426268409859,1273585426268409859,...,[],"<a href=""http://twitter.com/download/iphone"" r...",nl,jelleprins,16546619,,,10510,1066,@Oli4K @picnic @JumboSupermarkt @albertheijn L...


In [46]:
df_query_03.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype              
---  ------                   --------------  -----              
 0   mined_at                 2 non-null      datetime64[ns]     
 1   created_at               2 non-null      datetime64[ns, UTC]
 2   year                     2 non-null      int64              
 3   month                    2 non-null      int64              
 4   day                      2 non-null      int64              
 5   day_of_week              2 non-null      int64              
 6   hour                     2 non-null      int64              
 7   minute                   2 non-null      int64              
 8   tweet_id                 2 non-null      int64              
 9   tweet_id_str             2 non-null      object             
 10  in_reply_to_screen_name  1 non-null      object             
 11  in_reply_to_status_id    1 non-null 

In [47]:
min(df_query_03['created_at']),max(df_query_03['created_at'])

(Timestamp('2020-06-18 11:50:31+0000', tz='UTC'),
 Timestamp('2020-06-18 11:56:42+0000', tz='UTC'))

### Query_04:

`All these words`: picnic covid

**Result:** `https://twitter.com/search?q=picnic%20covid&src=typed_query`

In [48]:
query_04 = 'q=picnic%20covid'
result_04 = TweetMiner.search_tweets(max_pages = 20, raw_query = query_04)
len(result_04)

300

In [49]:
result_04[0]

{'mined_at': datetime.datetime(2020, 6, 24, 17, 16, 7, 946584),
 'created_at': 'Wed Jun 24 15:03:00 +0000 2020',
 'tweet_id': 1275806639749242880,
 'tweet_id_str': '1275806639749242880',
 'language': 'en',
 'text': 'A New Photographic Series Makes the Most of Lockdown Props https://t.co/ObIKvePeup',
 'hashtags': [],
 'source': '<a href="https://buffer.com" rel="nofollow">Buffer</a>',
 'user_id': 3438250311,
 'user_screen_name': 'squatterant',
 'followers_count': 10434,
 'friends_count': 6450}

In [50]:
df_query_04 = process_and_save(pd.DataFrame(result_04),"query_04",mine_user_twitter=0)
df_query_04.head()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,tweet_id,tweet_id_str,...,hashtags,source,language,user_screen_name,user_id,user_location,user_favourites_count,followers_count,friends_count,text
0,2020-06-24 17:16:10.649508,2020-06-24 02:30:50+00:00,2020,6,24,2,2,30,1275617351044644866,1275617351044644866,...,[],"<a href=""http://twitter.com/download/iphone"" r...",en,saidTCeleste,1251661483668377606,,,45,537,Sitting her planning my birthday picnic and re...
1,2020-06-24 17:16:13.136173,2020-06-24 03:12:11+00:00,2020,6,24,2,3,12,1275627754130563075,1275627754130563075,...,[],"<a href=""http://twitter.com/download/iphone"" r...",en,ce1esteee,2733877696,,,614,659,i jus wanna know who said i couldn’t adopt a d...
2,2020-06-24 17:16:11.430794,2020-06-24 04:20:25+00:00,2020,6,24,2,4,20,1275644926664019969,1275644926664019969,...,[],"<a href=""http://twitter.com/download/iphone"" r...",en,SeanLikeConnery,995656730309808129,,,118,753,RT @ChrisMWilliams: Just wanted to let my radi...
3,2020-06-24 17:16:12.171422,2020-06-24 04:38:55+00:00,2020,6,24,2,4,38,1275649581141934081,1275649581141934081,...,[],"<a href=""https://mobile.twitter.com"" rel=""nofo...",en,TakenByTC,76281701,,,450,379,@difusella @ClayneCrawford @CCF_Birmingham I c...
4,2020-06-24 17:16:13.136173,2020-06-24 06:24:15+00:00,2020,6,24,2,6,24,1275676090686746624,1275676090686746624,...,[],"<a href=""https://mobile.twitter.com"" rel=""nofo...",en,picnic_crasher,18290294,,,119,1315,RT @yc: It’s really hitting right now me how m...


In [51]:
df_query_04.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype              
---  ------                   --------------  -----              
 0   mined_at                 15 non-null     datetime64[ns]     
 1   created_at               15 non-null     datetime64[ns, UTC]
 2   year                     15 non-null     int64              
 3   month                    15 non-null     int64              
 4   day                      15 non-null     int64              
 5   day_of_week              15 non-null     int64              
 6   hour                     15 non-null     int64              
 7   minute                   15 non-null     int64              
 8   tweet_id                 15 non-null     int64              
 9   tweet_id_str             15 non-null     object             
 10  in_reply_to_screen_name  3 non-null      object             
 11  in_reply_to_status_id    3 non-nul

In [52]:
df_query_04['text'].values

array(['Sitting her planning my birthday picnic and realizing that all of my best guy friends moved back home because of Covid 🤧😩😩😩',
       'i jus wanna know who said i couldn’t adopt a dog this year at the picnic bc it’s canceled due to covid lmao',
       'RT @ChrisMWilliams: Just wanted to let my radio &amp; CF audience/Twitter family know that I have tested positive for COVID-19. This is no picn…',
       "@difusella @ClayneCrawford @CCF_Birmingham I can understand. With Covid-19 still going on, I can't imagine traveling out of the country this year. You're going to have so much fun next year at the picnic and that's great you can make it a longer trip. 😊💕",
       'RT @yc: It’s really hitting right now me how many people have fucking died of COVID, of lack of healthcare, of violence, of racism, and on…',
       'Woman Crush Wednesday!\n\n#timeout #picnic #pic #art #africa #black #culture #explore #eventmanagement #design #goodvibes #healthylifestyle #happiness💕 #xoxo #jumping #in

Well it seems that is not really the picnic we are looking for, so let's try `@picnic, covid`

### Query_04B:

`All these words`: @picnic covid

**Result:** `https://twitter.com/search?q=%40picnic%20covid&src=typed_query`

In [53]:
query_04B = 'q=%40picnic%20covid'
result_04B = TweetMiner.search_tweets(max_pages = 20, raw_query = query_04B)
len(result_04B)

0

The webpage shows the most recent result from April so the API didn't catch tweets here. So no results for this query unfortunately.

### Query_05:

`All these words`: JumboSupermarkt covid

**Result:** `https://twitter.com/search?q=JumboSupermarkt%20covid&src=typed_query`

In [54]:
query_05 = 'q=JumboSupermarkt%20covid'
result_05 = TweetMiner.search_tweets(max_pages = 20, raw_query = query_05)
len(result_05)

80

In [55]:
result_05[0]

{'mined_at': datetime.datetime(2020, 6, 24, 17, 16, 16, 767294),
 'created_at': 'Wed Jun 24 08:38:38 +0000 2020',
 'tweet_id': 1275709910286827520,
 'tweet_id_str': '1275709910286827520',
 'language': 'nl',
 'text': 'Hallo @JumboSupermarkt, ik weet dat iedere cent telt maar sinds wanneer is het policy om handgels/sprays (die &gt;70% alcohol moeten bevatten om effectief te zijn) bij de ingang van jullie supermarkten te vervangen door laffe chloorsopjes? Waar? Nou o.a. in #Rotterdam #COVID__19 #',
 'hashtags': [{'text': 'Rotterdam'}, {'text': 'COVID__19'}],
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'user_id': 1088012905650491392,
 'user_screen_name': 'GeeJeeAn',
 'followers_count': 60,
 'friends_count': 126}

In [56]:
df_query_05 = process_and_save(pd.DataFrame(result_05),"query_05",mine_user_twitter=0)
df_query_05.head()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,tweet_id,tweet_id_str,...,hashtags,source,language,user_screen_name,user_id,user_location,user_favourites_count,followers_count,friends_count,text
0,2020-06-24 17:16:18.558470,2020-06-20 17:25:49+00:00,2020,6,20,5,17,25,1274393028183162881,1274393028183162881,...,"[{'text': 'Utrecht'}, {'text': 'merelstraat'},...","<a href=""http://twitter.com/download/android"" ...",nl,milieuzone,3567511941,,,171,75,Complimenten aan @JumboSupermarkt #Utrecht #me...
1,2020-06-24 17:16:20.566996,2020-06-21 09:33:32+00:00,2020,6,21,6,9,33,1274636561704005632,1274636561704005632,...,[],"<a href=""https://mobile.twitter.com"" rel=""nofo...",nl,JumboSupermarkt,2797822974,"Veghel, Nederland",3767.0,16208,1710,@deAZfan De 'Vierde wachtende' regel geldt tij...
2,2020-06-24 17:16:19.427127,2020-06-22 15:14:49+00:00,2020,6,22,0,15,14,1275084834914918407,1275084834914918407,...,[],"<a href=""http://twitter.com/download/android"" ...",nl,milieuzone,3567511941,"Utrecht, The Netherlands",858.0,171,75,@UtrechtseSjoerd @GemeenteUtrecht @JumboSuperm...
3,2020-06-24 17:16:20.207921,2020-06-24 08:38:38+00:00,2020,6,24,2,8,38,1275709910286827520,1275709910286827520,...,"[{'text': 'Rotterdam'}, {'text': 'COVID__19'}]","<a href=""http://twitter.com/download/android"" ...",nl,GeeJeeAn,1088012905650491392,,,60,126,"Hallo @JumboSupermarkt, ik weet dat iedere cen..."


In [57]:
df_query_05.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype              
---  ------                   --------------  -----              
 0   mined_at                 4 non-null      datetime64[ns]     
 1   created_at               4 non-null      datetime64[ns, UTC]
 2   year                     4 non-null      int64              
 3   month                    4 non-null      int64              
 4   day                      4 non-null      int64              
 5   day_of_week              4 non-null      int64              
 6   hour                     4 non-null      int64              
 7   minute                   4 non-null      int64              
 8   tweet_id                 4 non-null      int64              
 9   tweet_id_str             4 non-null      object             
 10  in_reply_to_screen_name  2 non-null      object             
 11  in_reply_to_status_id    2 non-null 

### Query_06:

`All these words`: albertheijn covid ah

**Result:** `https://twitter.com/search?q=albertheijn%20covid&src=typed_query`

In [58]:
query_06 = 'q=albertheijn%20covid'
result_06 = TweetMiner.search_tweets(max_pages = 20, raw_query = query_06)
len(result_06)

160

In [59]:
result_06[0]

{'mined_at': datetime.datetime(2020, 6, 24, 17, 16, 20, 923592),
 'created_at': 'Wed Jun 24 11:39:37 +0000 2020',
 'tweet_id': 1275755454648520704,
 'tweet_id_str': '1275755454648520704',
 'language': 'nl',
 'text': 'Nog een avonddienst en dan 2 weken vakantie. Ben nog nooit zo toegeweest aan vakantie. 3 Maanden politieagent spelen en zelfs nu luisteren de mensen nog niet! Neem een winkelwagen, kom alleen om boodschappen. Hoe moeilijk is het? #COVID__19  #Albertheijn',
 'hashtags': [{'text': 'COVID__19'}, {'text': 'Albertheijn'}],
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'user_id': 212649116,
 'user_screen_name': 'AnjaFijnaut',
 'followers_count': 42,
 'friends_count': 100}

In [60]:
df_query_06 = process_and_save(pd.DataFrame(result_06),"query_06",mine_user_twitter=0)
df_query_06.head()

Unnamed: 0,mined_at,created_at,year,month,day,day_of_week,hour,minute,tweet_id,tweet_id_str,...,hashtags,source,language,user_screen_name,user_id,user_location,user_favourites_count,followers_count,friends_count,text
0,2020-06-24 17:16:22.823956,2020-06-18 09:59:31+00:00,2020,6,18,3,9,59,1273555936356061186,1273555936356061186,...,[{'text': 'Covid_19'}],"<a href=""https://mobile.twitter.com"" rel=""nofo...",nl,ChristienJanson,40452837,,,590,1105,"Goh @albertheijn Almere Lavendelplantsoen, gee..."
1,2020-06-24 17:16:24.992432,2020-06-18 17:10:57+00:00,2020,6,18,3,17,10,1273664510591733762,1273664510591733762,...,"[{'text': 'Retail'}, {'text': 'Innovatie'}, {'...","<a href=""http://www.linkedin.com/"" rel=""nofoll...",und,buisman_pro,756432448561487872,,,165,1321,#Retail #Innovatie #Covid_19 😷 #AlbertHeijn #C...
2,2020-06-24 17:16:22.823956,2020-06-19 20:43:35+00:00,2020,6,19,4,20,43,1274080408028753920,1274080408028753920,...,"[{'text': 'WillemAlexander'}, {'text': 'veroni...","<a href=""http://twitter.com/download/iphone"" r...",nl,wendersinke,381493231,,,606,519,"Iedereen heeft wel iets te zeggen, maar soms n..."
3,2020-06-24 17:16:23.045414,2020-06-20 06:14:39+00:00,2020,6,20,5,6,14,1274224124836089857,1274224124836089857,...,[{'text': 'Jumbo'}],"<a href=""https://mobile.twitter.com"" rel=""nofo...",en,JeannetteWezen1,1235877748163383296,,,193,176,@marutza_mh @albertheijn #Jumbo is selling bre...
4,2020-06-24 17:16:24.809824,2020-06-20 17:25:49+00:00,2020,6,20,5,17,25,1274393028183162881,1274393028183162881,...,"[{'text': 'Utrecht'}, {'text': 'merelstraat'},...","<a href=""http://twitter.com/download/android"" ...",nl,milieuzone,3567511941,,,171,75,Complimenten aan @JumboSupermarkt #Utrecht #me...


In [61]:
df_query_06.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype              
---  ------                   --------------  -----              
 0   mined_at                 8 non-null      datetime64[ns]     
 1   created_at               8 non-null      datetime64[ns, UTC]
 2   year                     8 non-null      int64              
 3   month                    8 non-null      int64              
 4   day                      8 non-null      int64              
 5   day_of_week              8 non-null      int64              
 6   hour                     8 non-null      int64              
 7   minute                   8 non-null      int64              
 8   tweet_id                 8 non-null      int64              
 9   tweet_id_str             8 non-null      object             
 10  in_reply_to_screen_name  1 non-null      object             
 11  in_reply_to_status_id    1 non-null 

# Conclusions

* When retrieving data from Twitter timelines of @picnic, @JumboSupermarkten, and @albertijn we succeeded in going back respectively to data from 7th December 2018, 7th March, and 6th April 2020. Therefore, we will not be able to cover all COVID-19 period from the beginning (27th February 2020) for all 3 (online) supermarkets. In terms, of comparison we probably need to consider the constraints and consider the period covered by AH.

* Making search using queries via API query is trick and limited. For instances, picnic can be seen as *`an occasion when you have an informal meal of sandwiches, etc. outside, or the food itself`*[[definition Cambridge dictionary](https://dictionary.cambridge.org/dictionary/english/picnic)] which is not exactly what we want to analyze here. The other problem is that we can only reach the past 7 days from the day we apply the query. So, if we want to have enough data to analyze we need to apply the query for some weeks.

Following, we will go deeper in the data collect here and perform some EDA and sentiment analysis. Next challenge: Sentiment analysis in Dutch since the great majority of the tweets are in Dutch as expected.