# Create your own dataset from Twitter

AGENDA:
1. Get auth keys for the Twitter API
2. Scrape specific accounts
3. Scrape by keywords
4. Translate tweets from any language into English using the Google Translate API
5. Take a detailed look at the Twitter data
6. Store the scraped data as CSV on Google Drive

In [1]:
!pip install googletrans==4.0.0-rc1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting googletrans==4.0.0-rc1
  Downloading googletrans-4.0.0rc1.tar.gz (20 kB)
Collecting httpx==0.13.3
  Downloading httpx-0.13.3-py3-none-any.whl (55 kB)
Collecting hstspreload
  Downloading hstspreload-2021.12.1-py3-none-any.whl (1.3 MB)
Collecting rfc3986<2,>=1.3
  Downloading rfc3986-1.5.0-py2.py3-none-any.whl (31 kB)
Collecting sniffio
  Downloading sniffio-1.2.0-py3-none-any.whl (10 kB)
Collecting httpcore==0.9.*
  Downloading httpcore-0.9.1-py3-none-any.whl (42 kB)
Collecting h11<0.10,>=0.8
  Downloading h11-0.9.0-py2.py3-none-any.whl (53 kB)
Collecting h2==3.*
  Downloading h2-3.2.0-py2.py3-none-any.whl (65 kB)

In [3]:
from googletrans import Translator

import pandas as pd
import time
import traceback
import tweepy

# 1. Authentication

First, you need to register as a developer to get access to the Twitter API.

https://developer.twitter.com/

After completing the form, you will receive your consumer and access keys.

In [4]:
# The values below are placeholders for security reasons. Substitute your own credentials.

# api_key and api_secret are the consumer key and consumer secret
api_key = "xxxxxxxxxxxxxxxxxxx"
api_secret = "yyyyyyyyyyyyyyyyy"
access_token = "cccccccccccccccc"
access_secret = "wwwwwwwwwwwwwwwwwwwwwwww"

auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
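
Hard-coded keys are easy to leak when a notebook is shared. As a minimal sketch, assuming the four values have been exported as environment variables, the credentials could be read at runtime instead. The variable names and the helper `load_twitter_credentials` are hypothetical choices for illustration, not something Twitter or tweepy prescribes.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a tuple, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned tuple can be unpacked into `api_key, api_secret, access_token, access_secret` and passed to `tweepy.OAuthHandler` and `auth.set_access_token` in the same way as the placeholder strings.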

# 2. Scrape Twitter accounts

The code below lets you scrape tweets from specific Twitter users.

I have chosen to scrape the 20 most recent tweets from each of three news media accounts. The scraped data can then be saved to a CSV file.

In [23]:
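news_media = ['cnnbrk', 'ArabNewsjp', 'France24_fr']
tweets = []

count = 20  # number of most recent tweets to fetch per account

try:
    for news in news_media:
        print(f"Scraping: {news}")
        # screen_name selects the account by its handle
        for tweet in api.user_timeline(screen_name=news, count=count):
            tweets.append((news, tweet.created_at, tweet.id, tweet.text))
        time.sleep(15)  # pause between accounts to stay well within the rate limits
    df = pd.DataFrame(tweets, columns=['news_media', 'created_at', 'tweet_id', 'text'])
    # df.to_csv('news_tweets.csv')
    print("Completed.")

except Exception:
    # Exception rather than BaseException, so that interrupting the cell still works
    traceback.print_exc()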

Scraping: cnnbrk
Scraping: ArabNewsjp
Scraping: France24_fr
Completed.


In [25]:
df.head(10)

Unnamed: 0,news_media,created_at,tweet_id,text
0,cnnbrk,2022-06-07 01:20:46,1533982263473541120,The Trump campaign told fake Georgia electors ...
1,cnnbrk,2022-06-06 20:06:09,1533903088918188037,British Prime Minister Boris Johnson has survi...
2,cnnbrk,2022-06-06 19:57:45,1533900974552756225,The Justice Department has charged Proud Boys ...
3,cnnbrk,2022-06-06 07:26:20,1533711876328153090,British Prime Minister Boris Johnson will face...
4,cnnbrk,2022-06-05 15:37:11,1533473013429043200,Rafael Nadal wins the French Open in straight ...
5,cnnbrk,2022-06-05 10:21:25,1533393549391192064,Putin warns that his forces will strike new ta...
6,cnnbrk,2022-06-04 14:29:07,1533093496814546946,Top-ranked Iga Swiatek wins the French Open wo...
7,cnnbrk,2022-06-03 23:17:29,1532864076531044352,Dave McCormick has conceded Pennsylvania's GOP...
8,cnnbrk,2022-06-03 15:54:11,1532752515284013056,Former Trump White House adviser Peter Navarro...
9,cnnbrk,2022-06-03 13:00:00,1532708679094611969,At least three people were killed and several ...
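
The scraping loop can also be factored into a function that receives the fetching call as a parameter, which makes it possible to try the logic without network access. This is only a sketch: `collect_timelines`, `fetch` and `fake_fetch` are hypothetical names, and the fake tweets stand in for real API results.

```python
import time
from types import SimpleNamespace


def collect_timelines(accounts, fetch, count=20, pause=0.0):
    """Collect (account, created_at, id, text) rows for each account.

    `fetch(account, count)` is any callable returning tweet-like objects
    with `created_at`, `id` and `text` attributes.
    """
    rows = []
    for account in accounts:
        for tweet in fetch(account, count):
            rows.append((account, tweet.created_at, tweet.id, tweet.text))
        time.sleep(pause)  # optional pause between accounts
    return rows


# Stand-in for the API call so that the sketch runs offline.
def fake_fetch(account, count):
    return [SimpleNamespace(created_at="2022-06-07", id=i, text=f"{account} #{i}")
            for i in range(count)]


rows = collect_timelines(["cnnbrk", "ArabNewsjp"], fake_fetch, count=2)
print(len(rows))  # 4 rows: 2 accounts x 2 tweets
```

With the real API, `fetch` could be something like `lambda name, n: api.user_timeline(screen_name=name, count=n)`, and `pause=15` would reproduce the delay used in the cell above.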


# 3/4. Scrape tweets by query and translate them into English

The code below lets you scrape tweets by keywords. The search returns the most recent tweets that match the query. Additionally, every tweet that is not in English is translated into English. All data can then be saved to a CSV file.

For every tweet, I extract:

* created_at (time the tweet was created)
* tweet_id
* tweet text
* username (unique Twitter handle)
* name (display name chosen by the user)
* location (where the user is from)
* followers_count (number of the user's followers)
* lang (original language of the tweet)
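
The same fields can be read from a tweet's raw JSON payload, which tweepy keeps on each status object as `_json` (it is visible in the output of section 5). The following is a minimal sketch under that assumption; `tweet_to_row` is a hypothetical helper and the sample dictionary is an abbreviated payload, not a complete one.

```python
def tweet_to_row(tweet_json):
    """Flatten the fields listed above from a tweet's raw JSON dictionary."""
    user = tweet_json.get("user", {})
    return {
        "created_at": tweet_json.get("created_at"),
        "tweet_id": tweet_json.get("id"),
        "text_raw": tweet_json.get("text"),
        "username": user.get("screen_name"),
        "name": user.get("name"),
        "location": user.get("location"),
        "followers_count": user.get("followers_count"),
        "lang": tweet_json.get("lang"),
    }


# Abbreviated example in the shape returned by the search endpoint.
sample = {
    "created_at": "Tue Jun 07 12:24:01 +0000 2022",
    "id": 1534149177856376832,
    "text": "RT @Peston: For many, @uksciencechief became something of a hero",
    "lang": "en",
    "user": {"screen_name": "DavidAllan704", "name": "David Allan",
             "location": "Oxfordshire", "followers_count": 388},
}
row = tweet_to_row(sample)
print(row["username"], row["followers_count"])
```

Using `.get` means a missing field yields `None` instead of raising, which keeps one incomplete tweet from stopping the whole run.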

In [27]:
text_query = 'coronavirus OR virus OR covid-19 OR covid19 OR Κορονοϊός'
tweets = []
count = 10

tr = Translator()  # one translator instance is enough for all tweets

try:
    # Rate-limit waiting is already configured on `api` (wait_on_rate_limit=True),
    # so it is not passed to the search call.
    # Note: in tweepy 4.x this method is named api.search_tweets.
    for tweet in api.search(q=text_query, count=count, result_type='recent',
                            include_entities=True):
        # print(f"Raw tweet: {tweet}")
        tweet_text = ""

        try:
            if tweet.lang and tweet.lang not in ['en', 'und']:
                print(f"Raw tweet: {tweet.text}")

                if tweet.text:
                    # googletrans translates into English by default (dest='en')
                    translated = tr.translate(tweet.text)
                    if translated:
                        tweet_text = translated.text
            else:
                tweet_text = tweet.text

        except Exception:
            # keep the tweet even if the translation fails; text_en stays empty
            traceback.print_exc()

        tweets.append((tweet.created_at, tweet.id, tweet.text, tweet_text, tweet.user.screen_name, tweet.user.name,
                       tweet.user.location, tweet.user.followers_count, tweet.lang))
        time.sleep(3)

    print("Completed.")

except Exception:
    traceback.print_exc()

# Build the DataFrame once, from whatever was collected
df = pd.DataFrame(tweets, columns=['created_at', 'tweet_id', 'text_raw', 'text_en', 'username', 'name', 'location', 'followers_count', 'lang'])
# df.to_csv('tweets_by_keywords.csv')

Raw tweet: RT @myuuko: 日本政府の10万円給付には景気刺激効果があったよ！という研究。280万件の銀行口座のデータによると，給付の週にすぐ支出が増え，その1ヵ月後も支出が上昇していた。/ Consumption responses to COVID-19 payments: E…
Raw tweet: @Karmashoarma @RamoniMeijering @Kevinpkrr @HuyskensJohan @PetraWoude @rblommestijn News flash: Niet wappies zijn oo… https://t.co/fREyYXkGdF
Raw tweet: RT @politicaestereo: #PruebasPCR | Las cabinas móviles del @EICEES se instalarán hoy en los parques centrales de Turín, Ahuachapán ; Dulce…
Raw tweet: RT @1st_Army_Area: รายงานผู้ป่วยติดเชื้อโควิด-19 วันอังคารที่ 7 มิถุนายน 2565 ผู้ป่วยใหม่ 2,224 คน หายป่วย 4,824 คน และเสียชีวิต 20 คน

#โค…
Raw tweet: RT @HectorRossete: En México se reportó más de 18 mil casos de COVID-19 en una semana; yo veo más próximo el repunte de casos, que la adqui…
Raw tweet: RT @myuuko: 日本政府の10万円給付には景気刺激効果があったよ！という研究。280万件の銀行口座のデータによると，給付の週にすぐ支出が増え，その1ヵ月後も支出が上昇していた。/ Consumption responses to COVID-19 payments: E…
Completed.


In [28]:
df.head(3)

Unnamed: 0,created_at,tweet_id,text_raw,text_en,username,name,location,followers_count,lang
0,2022-06-07 12:24:02,1534149180977201153,RT @myuuko: 日本政府の10万円給付には景気刺激効果があったよ！という研究。280...,"RT @myuuko: The Japanese government's 100,000 ...",kusuriya330,くすりや,足立区の北東端（陸の孤島）,101,ja
1,2022-06-07 12:24:02,1534149180813352960,@Karmashoarma @RamoniMeijering @Kevinpkrr @Huy...,@Karmashoarma @ramonimeijering @kevinpkrr @huy...,cat_coronavirus,Coronavirus,"Amsterdam, The Netherlands",466,nl
2,2022-06-07 12:24:02,1534149180603572230,RT @NavroopSingh_: Adverse effects of COVID-19...,RT @NavroopSingh_: Adverse effects of COVID-19...,vikas1689,Akhand Bharat 🇮🇳,,1396,en


In [29]:
df[['text_raw', 'text_en', 'lang']]

Unnamed: 0,text_raw,text_en,lang
0,RT @myuuko: 日本政府の10万円給付には景気刺激効果があったよ！という研究。280...,"RT @myuuko: The Japanese government's 100,000 ...",ja
1,@Karmashoarma @RamoniMeijering @Kevinpkrr @Huy...,@Karmashoarma @ramonimeijering @kevinpkrr @huy...,nl
2,RT @NavroopSingh_: Adverse effects of COVID-19...,RT @NavroopSingh_: Adverse effects of COVID-19...,en
3,RT @IAmAshAsh1: Latest CDC Data Shows Covid-19...,RT @IAmAshAsh1: Latest CDC Data Shows Covid-19...,en
4,"We have had locust invasions, the terrorist bo...","We have had locust invasions, the terrorist bo...",en
5,RT @politicaestereo: #PruebasPCR | Las cabinas...,Rt @politica celebrity: #Pruebaspcr |The mobil...,es
6,RT @1st_Army_Area: รายงานผู้ป่วยติดเชื้อโควิด-...,RT @1st_army_area: Report of patients with Cho...,th
7,RT @HectorRossete: En México se reportó más de...,"RT @Hectorrossete: In Mexico, more than 18 tho...",es
8,RT @myuuko: 日本政府の10万円給付には景気刺激効果があったよ！という研究。280...,"RT @myuuko: The Japanese government's 100,000 ...",ja
9,"RT @Peston: For many, @uksciencechief became s...","RT @Peston: For many, @uksciencechief became s...",en
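
The decision of when to translate can be isolated in a small function so that it can be checked without calling the translation service. This is a sketch under stated assumptions: `to_english` and `fake_translate` are hypothetical names, and the translator is passed in as a plain callable.

```python
def to_english(text, lang, translate, skip_langs=("en", "und")):
    """Return `text` in English, translating only when `lang` requires it.

    `translate` is any callable mapping a string to its English translation.
    If the translation fails, an empty string is returned.
    """
    if not text:
        return ""
    if not lang or lang in skip_langs:
        return text
    try:
        return translate(text) or ""
    except Exception:
        return ""


fake_translate = {"hola": "hello"}.get  # stand-in translator for the demo
print(to_english("hola", "es", fake_translate))      # hello
print(to_english("hi there", "en", fake_translate))  # hi there
```

With googletrans, the callable could be `lambda s: tr.translate(s).text` for a `Translator` instance `tr`. Tweets tagged `und` (undetermined language) are left untouched, as in the scraping loop.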


# 5. Explore the api.search response

In [30]:
tweet

Status(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'created_at': 'Tue Jun 07 12:24:01 +0000 2022', 'id': 1534149177856376832, 'id_str': '1534149177856376832', 'text': 'RT @Peston: For many, @uksciencechief became something of a hero for the way he helped protect us from #Covid_19. So @itvnews asked him: “w…', 'truncated': False, 'entities': {'hashtags': [{'text': 'Covid_19', 'indices': [103, 112]}], 'symbols': [], 'user_mentions': [{'screen_name': 'Peston', 'name': 'Robert Peston', 'id': 14157134, 'id_str': '14157134', 'indices': [3, 10]}, {'screen_name': 'uksciencechief', 'name': 'Sir Patrick Vallance', 'id': 264124770, 'id_str': '264124770', 'indices': [22, 37]}, {'screen_name': 'itvnews', 'name': 'ITV News', 'id': 21866939, 'id_str': '21866939', 'indices': [117, 125]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1468551971519008773, 'id_str': '1468551971519008773', 'name': 'David Allan', 'screen_name': 'DavidAllan704', 'location': 'Oxfordshire', 'description': 'Principal Beamline Scientist, beamline I19, Diamond Light Source', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 388, 'friends_count': 2876, 'listed_count': 0, 'created_at': 'Wed Dec 08 12:04:46 +0000 2021', 'favourites_count': 54129, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 4495, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'F5F8FA', 'profile_background_image_url': None, 'profile_background_image_url_https': None, 'profile_background_tile': False, 'profile_image_url': 
'http://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1468551971519008773/1639563844', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': True, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'retweeted_status': {'created_at': 'Tue Jun 07 12:05:26 +0000 2022', 'id': 1534144501874409472, 'id_str': '1534144501874409472', 'text': 'For many, @uksciencechief became something of a hero for the way he helped protect us from #Covid_19. So @itvnews a… https://t.co/D25kw8iJ2T', 'truncated': True, 'entities': {'hashtags': [{'text': 'Covid_19', 'indices': [91, 100]}], 'symbols': [], 'user_mentions': [{'screen_name': 'uksciencechief', 'name': 'Sir Patrick Vallance', 'id': 264124770, 'id_str': '264124770', 'indices': [10, 25]}, {'screen_name': 'itvnews', 'name': 'ITV News', 'id': 21866939, 'id_str': '21866939', 'indices': [105, 113]}], 'urls': [{'url': 'https://t.co/D25kw8iJ2T', 'expanded_url': 'https://twitter.com/i/web/status/1534144501874409472', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 14157134, 'id_str': '14157134', 'name': 'Robert Peston', 'screen_name': 
'Peston', 'location': '', 'description': 'ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). So?', 'url': 'https://t.co/KcrDn3jvur', 'entities': {'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 1237386, 'friends_count': 1427, 'listed_count': 8945, 'created_at': 'Sun Mar 16 11:41:49 +0000 2008', 'favourites_count': 181, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 31854, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FFFFFF', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/14157134/1619538125', 'profile_link_color': '1F527B', 'profile_sidebar_border_color': 'CCCCCC', 'profile_sidebar_fill_color': 'FFFFFF', 'profile_text_color': '5A5A5A', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 38, 'favorite_count': 172, 'favorited': False, 'retweeted': False, 'lang': 'en'}, 'is_quote_status': False, 'retweet_count': 38, 
'favorite_count': 0, 'favorited': False, 'retweeted': False, 'lang': 'en'}, created_at=datetime.datetime(2022, 6, 7, 12, 24, 1), id=1534149177856376832, id_str='1534149177856376832', text='RT @Peston: For many, @uksciencechief became something of a hero for the way he helped protect us from #Covid_19. So @itvnews asked him: “w…', truncated=False, entities={'hashtags': [{'text': 'Covid_19', 'indices': [103, 112]}], 'symbols': [], 'user_mentions': [{'screen_name': 'Peston', 'name': 'Robert Peston', 'id': 14157134, 'id_str': '14157134', 'indices': [3, 10]}, {'screen_name': 'uksciencechief', 'name': 'Sir Patrick Vallance', 'id': 264124770, 'id_str': '264124770', 'indices': [22, 37]}, {'screen_name': 'itvnews', 'name': 'ITV News', 'id': 21866939, 'id_str': '21866939', 'indices': [117, 125]}], 'urls': []}, metadata={'iso_language_code': 'en', 'result_type': 'recent'}, source='Twitter for Android', source_url='http://twitter.com/download/android', in_reply_to_status_id=None, in_reply_to_status_id_str=None, in_reply_to_user_id=None, in_reply_to_user_id_str=None, in_reply_to_screen_name=None, author=User(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'id': 1468551971519008773, 'id_str': '1468551971519008773', 'name': 'David Allan', 'screen_name': 'DavidAllan704', 'location': 'Oxfordshire', 'description': 'Principal Beamline Scientist, beamline I19, Diamond Light Source', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 388, 'friends_count': 2876, 'listed_count': 0, 'created_at': 'Wed Dec 08 12:04:46 +0000 2021', 'favourites_count': 54129, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 4495, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'F5F8FA', 'profile_background_image_url': None, 'profile_background_image_url_https': None, 'profile_background_tile': False, 'profile_image_url': 
'http://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1468551971519008773/1639563844', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': True, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, id=1468551971519008773, id_str='1468551971519008773', name='David Allan', screen_name='DavidAllan704', location='Oxfordshire', description='Principal Beamline Scientist, beamline I19, Diamond Light Source', url=None, entities={'description': {'urls': []}}, protected=False, followers_count=388, friends_count=2876, listed_count=0, created_at=datetime.datetime(2021, 12, 8, 12, 4, 46), favourites_count=54129, utc_offset=None, time_zone=None, geo_enabled=False, verified=False, statuses_count=4495, lang=None, contributors_enabled=False, is_translator=False, is_translation_enabled=False, profile_background_color='F5F8FA', profile_background_image_url=None, profile_background_image_url_https=None, profile_background_tile=False, profile_image_url='http://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', profile_image_url_https='https://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', profile_banner_url='https://pbs.twimg.com/profile_banners/1468551971519008773/1639563844', profile_link_color='1DA1F2', profile_sidebar_border_color='C0DEED', profile_sidebar_fill_color='DDEEF6', profile_text_color='333333', profile_use_background_image=True, has_extended_profile=True, default_profile=True, default_profile_image=False, following=False, follow_request_sent=False, 
notifications=False, translator_type='none', withheld_in_countries=[]), user=User(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'id': 1468551971519008773, 'id_str': '1468551971519008773', 'name': 'David Allan', 'screen_name': 'DavidAllan704', 'location': 'Oxfordshire', 'description': 'Principal Beamline Scientist, beamline I19, Diamond Light Source', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 388, 'friends_count': 2876, 'listed_count': 0, 'created_at': 'Wed Dec 08 12:04:46 +0000 2021', 'favourites_count': 54129, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 4495, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'F5F8FA', 'profile_background_image_url': None, 'profile_background_image_url_https': None, 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1468551971519008773/1639563844', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': True, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, id=1468551971519008773, id_str='1468551971519008773', name='David Allan', screen_name='DavidAllan704', location='Oxfordshire', description='Principal Beamline Scientist, beamline I19, Diamond Light Source', url=None, entities={'description': {'urls': []}}, protected=False, followers_count=388, friends_count=2876, listed_count=0, 
created_at=datetime.datetime(2021, 12, 8, 12, 4, 46), favourites_count=54129, utc_offset=None, time_zone=None, geo_enabled=False, verified=False, statuses_count=4495, lang=None, contributors_enabled=False, is_translator=False, is_translation_enabled=False, profile_background_color='F5F8FA', profile_background_image_url=None, profile_background_image_url_https=None, profile_background_tile=False, profile_image_url='http://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', profile_image_url_https='https://pbs.twimg.com/profile_images/1468552130172964864/5FVWDo0s_normal.jpg', profile_banner_url='https://pbs.twimg.com/profile_banners/1468551971519008773/1639563844', profile_link_color='1DA1F2', profile_sidebar_border_color='C0DEED', profile_sidebar_fill_color='DDEEF6', profile_text_color='333333', profile_use_background_image=True, has_extended_profile=True, default_profile=True, default_profile_image=False, following=False, follow_request_sent=False, notifications=False, translator_type='none', withheld_in_countries=[]), geo=None, coordinates=None, place=None, contributors=None, retweeted_status=Status(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'created_at': 'Tue Jun 07 12:05:26 +0000 2022', 'id': 1534144501874409472, 'id_str': '1534144501874409472', 'text': 'For many, @uksciencechief became something of a hero for the way he helped protect us from #Covid_19. 
So @itvnews a… https://t.co/D25kw8iJ2T', 'truncated': True, 'entities': {'hashtags': [{'text': 'Covid_19', 'indices': [91, 100]}], 'symbols': [], 'user_mentions': [{'screen_name': 'uksciencechief', 'name': 'Sir Patrick Vallance', 'id': 264124770, 'id_str': '264124770', 'indices': [10, 25]}, {'screen_name': 'itvnews', 'name': 'ITV News', 'id': 21866939, 'id_str': '21866939', 'indices': [105, 113]}], 'urls': [{'url': 'https://t.co/D25kw8iJ2T', 'expanded_url': 'https://twitter.com/i/web/status/1534144501874409472', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 14157134, 'id_str': '14157134', 'name': 'Robert Peston', 'screen_name': 'Peston', 'location': '', 'description': 'ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). 
So?', 'url': 'https://t.co/KcrDn3jvur', 'entities': {'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 1237386, 'friends_count': 1427, 'listed_count': 8945, 'created_at': 'Sun Mar 16 11:41:49 +0000 2008', 'favourites_count': 181, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 31854, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FFFFFF', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/14157134/1619538125', 'profile_link_color': '1F527B', 'profile_sidebar_border_color': 'CCCCCC', 'profile_sidebar_fill_color': 'FFFFFF', 'profile_text_color': '5A5A5A', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 38, 'favorite_count': 172, 'favorited': False, 'retweeted': False, 'lang': 'en'}, created_at=datetime.datetime(2022, 6, 7, 12, 5, 26), id=1534144501874409472, id_str='1534144501874409472', text='For many, @uksciencechief became something of a hero for the way he helped protect us from #Covid_19. 
So @itvnews a… https://t.co/D25kw8iJ2T', truncated=True, entities={'hashtags': [{'text': 'Covid_19', 'indices': [91, 100]}], 'symbols': [], 'user_mentions': [{'screen_name': 'uksciencechief', 'name': 'Sir Patrick Vallance', 'id': 264124770, 'id_str': '264124770', 'indices': [10, 25]}, {'screen_name': 'itvnews', 'name': 'ITV News', 'id': 21866939, 'id_str': '21866939', 'indices': [105, 113]}], 'urls': [{'url': 'https://t.co/D25kw8iJ2T', 'expanded_url': 'https://twitter.com/i/web/status/1534144501874409472', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, metadata={'iso_language_code': 'en', 'result_type': 'recent'}, source='Twitter for iPhone', source_url='http://twitter.com/download/iphone', in_reply_to_status_id=None, in_reply_to_status_id_str=None, in_reply_to_user_id=None, in_reply_to_user_id_str=None, in_reply_to_screen_name=None, author=User(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'id': 14157134, 'id_str': '14157134', 'name': 'Robert Peston', 'screen_name': 'Peston', 'location': '', 'description': 'ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). 
So?', 'url': 'https://t.co/KcrDn3jvur', 'entities': {'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 1237386, 'friends_count': 1427, 'listed_count': 8945, 'created_at': 'Sun Mar 16 11:41:49 +0000 2008', 'favourites_count': 181, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 31854, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FFFFFF', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/14157134/1619538125', 'profile_link_color': '1F527B', 'profile_sidebar_border_color': 'CCCCCC', 'profile_sidebar_fill_color': 'FFFFFF', 'profile_text_color': '5A5A5A', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, id=14157134, id_str='14157134', name='Robert Peston', screen_name='Peston', location='', description='ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). 
So?', url='https://t.co/KcrDn3jvur', entities={'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, protected=False, followers_count=1237386, friends_count=1427, listed_count=8945, created_at=datetime.datetime(2008, 3, 16, 11, 41, 49), favourites_count=181, utc_offset=None, time_zone=None, geo_enabled=False, verified=True, statuses_count=31854, lang=None, contributors_enabled=False, is_translator=False, is_translation_enabled=False, profile_background_color='FFFFFF', profile_background_image_url='http://abs.twimg.com/images/themes/theme1/bg.png', profile_background_image_url_https='https://abs.twimg.com/images/themes/theme1/bg.png', profile_background_tile=False, profile_image_url='http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', profile_image_url_https='https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', profile_banner_url='https://pbs.twimg.com/profile_banners/14157134/1619538125', profile_link_color='1F527B', profile_sidebar_border_color='CCCCCC', profile_sidebar_fill_color='FFFFFF', profile_text_color='5A5A5A', profile_use_background_image=False, has_extended_profile=False, default_profile=False, default_profile_image=False, following=False, follow_request_sent=False, notifications=False, translator_type='none', withheld_in_countries=[]), user=User(_api=<tweepy.api.API object at 0x7f3a366ba510>, _json={'id': 14157134, 'id_str': '14157134', 'name': 'Robert Peston', 'screen_name': 'Peston', 'location': '', 'description': 'ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). 
So?', 'url': 'https://t.co/KcrDn3jvur', 'entities': {'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 1237386, 'friends_count': 1427, 'listed_count': 8945, 'created_at': 'Sun Mar 16 11:41:49 +0000 2008', 'favourites_count': 181, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 31854, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FFFFFF', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/14157134/1619538125', 'profile_link_color': '1F527B', 'profile_sidebar_border_color': 'CCCCCC', 'profile_sidebar_fill_color': 'FFFFFF', 'profile_text_color': '5A5A5A', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none', 'withheld_in_countries': []}, id=14157134, id_str='14157134', name='Robert Peston', screen_name='Peston', location='', description='ITV (pol ed), Speakers for Schools (founder), writer (WTF), Hospice UK (vice pres), Arsenal (East Stand), Peston (as in #Peston show, 10.45pm Weds). 
So?', url='https://t.co/KcrDn3jvur', entities={'url': {'urls': [{'url': 'https://t.co/KcrDn3jvur', 'expanded_url': 'http://itv.com/robertpeston', 'display_url': 'itv.com/robertpeston', 'indices': [0, 23]}]}, 'description': {'urls': []}}, protected=False, followers_count=1237386, friends_count=1427, listed_count=8945, created_at=datetime.datetime(2008, 3, 16, 11, 41, 49), favourites_count=181, utc_offset=None, time_zone=None, geo_enabled=False, verified=True, statuses_count=31854, lang=None, contributors_enabled=False, is_translator=False, is_translation_enabled=False, profile_background_color='FFFFFF', profile_background_image_url='http://abs.twimg.com/images/themes/theme1/bg.png', profile_background_image_url_https='https://abs.twimg.com/images/themes/theme1/bg.png', profile_background_tile=False, profile_image_url='http://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', profile_image_url_https='https://pbs.twimg.com/profile_images/1180055085801443329/JAUJWMmN_normal.jpg', profile_banner_url='https://pbs.twimg.com/profile_banners/14157134/1619538125', profile_link_color='1F527B', profile_sidebar_border_color='CCCCCC', profile_sidebar_fill_color='FFFFFF', profile_text_color='5A5A5A', profile_use_background_image=False, has_extended_profile=False, default_profile=False, default_profile_image=False, following=False, follow_request_sent=False, notifications=False, translator_type='none', withheld_in_countries=[]), geo=None, coordinates=None, place=None, contributors=None, is_quote_status=False, retweet_count=38, favorite_count=172, favorited=False, retweeted=False, lang='en'), is_quote_status=False, retweet_count=38, favorite_count=0, favorited=False, retweeted=False, lang='en')
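
The full dump is hard to read, but a few nested fields are worth pulling out. The sketch below assumes the raw payload dictionary (`tweet._json`) and uses an abbreviated sample with the same nesting as the output above; `hashtags` and `original_tweet` are hypothetical helper names.

```python
def hashtags(tweet_json):
    """Return the hashtag texts listed under entities.hashtags."""
    return [tag["text"] for tag in tweet_json.get("entities", {}).get("hashtags", [])]


def original_tweet(tweet_json):
    """Return the retweeted tweet's payload if this is a retweet, else the tweet itself."""
    return tweet_json.get("retweeted_status", tweet_json)


# Abbreviated payload with the same nesting as the output above.
sample = {
    "id": 1534149177856376832,
    "entities": {"hashtags": [{"text": "Covid_19", "indices": [103, 112]}]},
    "user": {"screen_name": "DavidAllan704"},
    "retweet_count": 38,
    "favorite_count": 0,
    "retweeted_status": {
        "id": 1534144501874409472,
        "user": {"screen_name": "Peston"},
        "retweet_count": 38,
        "favorite_count": 172,
    },
}

print(hashtags(sample))                               # ['Covid_19']
print(original_tweet(sample)["user"]["screen_name"])  # Peston
print(original_tweet(sample)["favorite_count"])       # 172
```

In this response the retweet itself reports `favorite_count` 0 while the original tweet reports 172, so engagement numbers are better read from `retweeted_status` when it is present.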

# 6. Save the data as CSV on Google Drive

In [31]:
# Mount Google Drive so that the dataset can be saved there as CSV
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [32]:
path = "/gdrive/My Drive/YouTube/data/"
df.to_csv(path + 'tweets_by_keywords.csv')
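
Writing fails if the target folder does not exist yet, so it can help to create it first. The sketch below uses only the standard library so that it also runs outside Colab; `save_rows` and the temporary demo directory are hypothetical and stand in for the mounted drive path.

```python
import csv
import os
import tempfile


def save_rows(rows, header, directory, filename):
    """Write rows to a UTF-8 CSV file, creating the directory if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Demo in a temporary directory instead of the mounted drive.
demo_dir = os.path.join(tempfile.mkdtemp(), "data")
saved = save_rows([(1, "くすりや", "ja")], ["tweet_id", "name", "lang"], demo_dir, "demo.csv")
print(os.path.exists(saved))  # True
```

With pandas the same idea applies: call `os.makedirs(path, exist_ok=True)` before `df.to_csv(...)`.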