# Sentiment Analysis Covid Vaccine

data: https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets

Context

I collect recent tweets about the COVID-19 vaccines used on a large scale around the world, as follows:

Pfizer/BioNTech;
Sinopharm;
Sinovac;
Moderna;
Oxford/AstraZeneca;
Covaxin;
Sputnik V.

Data collection

The data is collected using the tweepy Python package to access the Twitter API. For each vaccine I use the relevant search term (the one most frequently used on Twitter to refer to that vaccine).

Data collection frequency

The initial data was merged from tweets about the Pfizer/BioNTech vaccine. I then added tweets about the Sinopharm and Sinovac (both Chinese-produced), Moderna, Oxford/AstraZeneca, Covaxin and Sputnik V vaccines. In the first days the collection ran twice a day, until I identified the approximate quota of new tweets; collection (for all vaccines) then stabilized at once a day, during morning hours (GMT).

Inspiration

You can perform multiple operations on the vaccine tweets. Here are a few suggestions:

Study the subjects of recent tweets about the vaccines made by the various producers;

Perform various NLP tasks on this data source (topic modelling, sentiment analysis);

Using the COVID-19 World Vaccination Progress dataset (which shows the progress of the vaccinations and the countries where the vaccines are administered), study the relationship between vaccination progress and the discussion of the vaccines on social media (in the tweets).

In [1]:
import pandas as pd
from geopy.geocoders import Nominatim
import numpy as np

In [2]:
twit_data = pd.read_csv("Data/vaccination_all_tweets.csv")
twit_data

Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet
0,1340539111971516416,Rachel Roh,"La Crescenta-Montrose, CA",Aggregator of Asian American news; scanning di...,2009-04-08 17:52:46,405,1692,3247,False,2020-12-20 06:06:44,Same folks said daikon paste could treat a cyt...,['PfizerBioNTech'],Twitter for Android,0,0,False
1,1338158543359250433,Albert Fong,"San Francisco, CA","Marketing dude, tech geek, heavy metal & '80s ...",2009-09-21 15:27:30,834,666,178,False,2020-12-13 16:27:13,While the world has been on the wrong side of ...,,Twitter Web App,1,1,False
2,1337858199140118533,eli🇱🇹🇪🇺👌,Your Bed,"heil, hydra 🖐☺",2020-06-25 23:30:28,10,88,155,False,2020-12-12 20:33:45,#coronavirus #SputnikV #AstraZeneca #PfizerBio...,"['coronavirus', 'SputnikV', 'AstraZeneca', 'Pf...",Twitter for Android,0,0,False
3,1337855739918835717,Charles Adler,"Vancouver, BC - Canada","Hosting ""CharlesAdlerTonight"" Global News Radi...",2008-09-10 11:28:53,49165,3933,21853,True,2020-12-12 20:23:59,"Facts are immutable, Senator, even when you're...",,Twitter Web App,446,2129,False
4,1337854064604966912,Citizen News Channel,,Citizen News Channel bringing you an alternati...,2020-04-23 17:58:42,152,580,1473,False,2020-12-12 20:17:19,Explain to me again why we need a vaccine @Bor...,"['whereareallthesickpeople', 'PfizerBioNTech']",Twitter for iPhone,0,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69713,1382248484259123205,Russian Mission in Geneva,Geneve,Постпредство России при Отделении ООН и др. ме...,2011-07-26 08:40:47,9862,589,3502,True,2021-04-14 08:24:54,✅ 🇷🇺#Gamaleya Research Center in cooperation w...,['Gamaleya'],Twitter for iPhone,2,9,False
69714,1382246393532801026,Mora Mosese,NtswanaTsatsi,#MoAfrika - #MoSotho,2019-12-06 11:11:07,164,101,6465,False,2021-04-14 08:16:35,@FloydShivambu #SputnikV. Where are #AfricanEx...,"['SputnikV', 'AfricanExpertise', 'ShamDemic']",Twitter Web App,0,0,False
69715,1382245076303179778,Robert ka Boss,,Lily don't be silly you want to know me just g...,2020-11-26 09:24:12,76,65,78,False,2021-04-14 08:11:21,"Hello, it s because of this stubbornness and f...",,Twitter Web App,0,0,False
69716,1382243373747146752,Mick Brown,"Cambridge, UK","Retired 1960s dropout, part-time self-appointe...",2011-12-05 14:11:05,1882,1850,83008,False,2021-04-14 08:04:35,In a lengthy interview on #wato some months ag...,"['wato', 'SputnikV']",Twitter for Android,0,0,False


In [3]:
print(min(twit_data.date))
print(max(twit_data.date))


2020-12-12 11:55:28
2021-04-22 14:19:57


Idea: use a lookup service to go from the free-text location to an actual location.

In [4]:
twit_data.describe()

Unnamed: 0,id,user_followers,user_friends,user_favourites,retweets,favorites
count,69718.0,69718.0,69718.0,69718.0,69718.0,69718.0
mean,1.372171e+18,99835.87,1297.046631,16027.58,3.387647,14.620973
std,1.034539e+16,839918.3,5838.860507,44200.21,69.8004,236.200153
min,1.337728e+18,0.0,0.0,0.0,0.0,0.0
25%,1.366318e+18,113.0,146.0,378.0,0.0,0.0
50%,1.373665e+18,545.0,406.0,2257.0,0.0,1.0
75%,1.380663e+18,2450.0,1180.0,11867.5,1.0,3.0
max,1.385237e+18,15045130.0,516578.0,1221784.0,11288.0,25724.0


Remove the rows whose user_location is not usable.

In [5]:
invalid_location = ["Earth", "Planet Earth", "Global", "World", "Worldwide", "WorldWide", "Everywhere", "#KeepFightingMichael", "E",
"Phone No = +917006375573", "World Wide Web", "online", "StocksLand 💶💵💷", "Email:talksavailable@gmail.com",
"1996 BrExodus 2020", "00916366006573", "🇺🇸🇯🇵🇪🇺 🇮🇳 🇬🇧", "🌍", "Knokke,a spot in this Universe", 
"The Moon, The Milky Way", "No shills/puppets/trolls.", "RNA World"]

twit_data_first_clean = twit_data[~twit_data.user_location.isin(invalid_location)]
twit_data_first_clean 

Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet
0,1340539111971516416,Rachel Roh,"La Crescenta-Montrose, CA",Aggregator of Asian American news; scanning di...,2009-04-08 17:52:46,405,1692,3247,False,2020-12-20 06:06:44,Same folks said daikon paste could treat a cyt...,['PfizerBioNTech'],Twitter for Android,0,0,False
1,1338158543359250433,Albert Fong,"San Francisco, CA","Marketing dude, tech geek, heavy metal & '80s ...",2009-09-21 15:27:30,834,666,178,False,2020-12-13 16:27:13,While the world has been on the wrong side of ...,,Twitter Web App,1,1,False
2,1337858199140118533,eli🇱🇹🇪🇺👌,Your Bed,"heil, hydra 🖐☺",2020-06-25 23:30:28,10,88,155,False,2020-12-12 20:33:45,#coronavirus #SputnikV #AstraZeneca #PfizerBio...,"['coronavirus', 'SputnikV', 'AstraZeneca', 'Pf...",Twitter for Android,0,0,False
3,1337855739918835717,Charles Adler,"Vancouver, BC - Canada","Hosting ""CharlesAdlerTonight"" Global News Radi...",2008-09-10 11:28:53,49165,3933,21853,True,2020-12-12 20:23:59,"Facts are immutable, Senator, even when you're...",,Twitter Web App,446,2129,False
4,1337854064604966912,Citizen News Channel,,Citizen News Channel bringing you an alternati...,2020-04-23 17:58:42,152,580,1473,False,2020-12-12 20:17:19,Explain to me again why we need a vaccine @Bor...,"['whereareallthesickpeople', 'PfizerBioNTech']",Twitter for iPhone,0,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69713,1382248484259123205,Russian Mission in Geneva,Geneve,Постпредство России при Отделении ООН и др. ме...,2011-07-26 08:40:47,9862,589,3502,True,2021-04-14 08:24:54,✅ 🇷🇺#Gamaleya Research Center in cooperation w...,['Gamaleya'],Twitter for iPhone,2,9,False
69714,1382246393532801026,Mora Mosese,NtswanaTsatsi,#MoAfrika - #MoSotho,2019-12-06 11:11:07,164,101,6465,False,2021-04-14 08:16:35,@FloydShivambu #SputnikV. Where are #AfricanEx...,"['SputnikV', 'AfricanExpertise', 'ShamDemic']",Twitter Web App,0,0,False
69715,1382245076303179778,Robert ka Boss,,Lily don't be silly you want to know me just g...,2020-11-26 09:24:12,76,65,78,False,2021-04-14 08:11:21,"Hello, it s because of this stubbornness and f...",,Twitter Web App,0,0,False
69716,1382243373747146752,Mick Brown,"Cambridge, UK","Retired 1960s dropout, part-time self-appointe...",2011-12-05 14:11:05,1882,1850,83008,False,2021-04-14 08:04:35,In a lengthy interview on #wato some months ag...,"['wato', 'SputnikV']",Twitter for Android,0,0,False


In [6]:
twit_data_drop_na = twit_data_first_clean.dropna(subset = ["user_location"])
twit_data_drop_na

Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet
0,1340539111971516416,Rachel Roh,"La Crescenta-Montrose, CA",Aggregator of Asian American news; scanning di...,2009-04-08 17:52:46,405,1692,3247,False,2020-12-20 06:06:44,Same folks said daikon paste could treat a cyt...,['PfizerBioNTech'],Twitter for Android,0,0,False
1,1338158543359250433,Albert Fong,"San Francisco, CA","Marketing dude, tech geek, heavy metal & '80s ...",2009-09-21 15:27:30,834,666,178,False,2020-12-13 16:27:13,While the world has been on the wrong side of ...,,Twitter Web App,1,1,False
2,1337858199140118533,eli🇱🇹🇪🇺👌,Your Bed,"heil, hydra 🖐☺",2020-06-25 23:30:28,10,88,155,False,2020-12-12 20:33:45,#coronavirus #SputnikV #AstraZeneca #PfizerBio...,"['coronavirus', 'SputnikV', 'AstraZeneca', 'Pf...",Twitter for Android,0,0,False
3,1337855739918835717,Charles Adler,"Vancouver, BC - Canada","Hosting ""CharlesAdlerTonight"" Global News Radi...",2008-09-10 11:28:53,49165,3933,21853,True,2020-12-12 20:23:59,"Facts are immutable, Senator, even when you're...",,Twitter Web App,446,2129,False
5,1337852648389832708,Dee,"Birmingham, England","Gastroenterology trainee, Clinical Research Fe...",2020-01-26 21:43:12,105,108,106,False,2020-12-12 20:11:42,Does anyone have any useful advice/guidance fo...,,Twitter for iPhone,0,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69712,1382250738760384515,ET NOW,India,Youtube: https://t.co/u54RR8zcQ8 Facebook: htt...,2010-12-10 12:49:29,641608,79,3,True,2021-04-14 08:33:51,Watch @drreddys address #SputnikV EUA #LIVE h...,"['SputnikV', 'LIVE']",Twitter Media Studio,1,10,False
69713,1382248484259123205,Russian Mission in Geneva,Geneve,Постпредство России при Отделении ООН и др. ме...,2011-07-26 08:40:47,9862,589,3502,True,2021-04-14 08:24:54,✅ 🇷🇺#Gamaleya Research Center in cooperation w...,['Gamaleya'],Twitter for iPhone,2,9,False
69714,1382246393532801026,Mora Mosese,NtswanaTsatsi,#MoAfrika - #MoSotho,2019-12-06 11:11:07,164,101,6465,False,2021-04-14 08:16:35,@FloydShivambu #SputnikV. Where are #AfricanEx...,"['SputnikV', 'AfricanExpertise', 'ShamDemic']",Twitter Web App,0,0,False
69716,1382243373747146752,Mick Brown,"Cambridge, UK","Retired 1960s dropout, part-time self-appointe...",2011-12-05 14:11:05,1882,1850,83008,False,2021-04-14 08:04:35,In a lengthy interview on #wato some months ag...,"['wato', 'SputnikV']",Twitter for Android,0,0,False


In [8]:
twit_data_alpha = twit_data_drop_na[twit_data_drop_na.user_location.str.isalpha()]
twit_data_alpha

Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet
9,1337842295857623042,Ch.Amjad Ali,Islamabad,#ProudPakistani #LovePakArmy #PMIK @insafiansp...,2012-11-12 04:18:12,671,2368,20469,False,2020-12-12 19:30:33,#CovidVaccine \n\nStates will start getting #C...,"['CovidVaccine', 'COVID19Vaccine', 'US', 'paku...",Twitter Web App,0,0,False
12,1337815730486702087,WION,India,#WION: World Is One | Welcome to India’s first...,2016-03-21 03:44:54,292510,91,7531,True,2020-12-12 17:45:00,The agency also released new information for h...,,TweetDeck,0,18,False
17,1337783770070409218,ILKHA,Türkiye,Official Twitter account of Ilke News Agency /,2015-05-22 08:31:12,4056,6,3,True,2020-12-12 15:38:00,"Coronavirus: Iran reports 8,201 new cases, 221...","['Iran', 'coronavirus', 'PfizerBioNTech']",TweetDeck,3,5,False
30,1337760271151063040,Andy Thomas,London,Retweets not necessarily endorsements.,2009-03-07 20:39:15,1151,4301,95963,False,2020-12-12 14:04:37,"@ZubyMusic 6 deaths so far. \nIt's only death,...","['CovidVaccines', 'Pfizervaccine']",Twitter for Android,0,2,False
44,1337727767551553536,Daily News Egypt,Egypt,Egypt's Only Daily Independent Newspaper in En...,2009-04-26 07:56:24,278080,116,765,True,2020-12-12 11:55:28,#FDA authorizes #PfizerBioNTech #coronavirus v...,"['FDA', 'PfizerBioNTech', 'coronavirus']",Twitter Web App,1,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69707,1382253724878344196,Roopam,Ranchi,Be Ur Kind of Crazy....Someone out there will ...,2019-08-19 14:55:44,304,280,87,False,2021-04-14 08:45:43,Mr. @RahulGandhi it is not because of you but ...,['SputnikV'],Twitter Web App,0,2,False
69711,1382250817189597189,moneycontrol,Mumbai,Moneycontrol is India’s No. 1 financial portal...,2009-08-26 07:55:29,1077795,297,1079,True,2021-04-14 08:34:10,.@viswanath_pilla looks at how #SputnikV stack...,"['SputnikV', 'Covishield', 'Covaxin']",Twitter Web App,2,4,False
69712,1382250738760384515,ET NOW,India,Youtube: https://t.co/u54RR8zcQ8 Facebook: htt...,2010-12-10 12:49:29,641608,79,3,True,2021-04-14 08:33:51,Watch @drreddys address #SputnikV EUA #LIVE h...,"['SputnikV', 'LIVE']",Twitter Media Studio,1,10,False
69713,1382248484259123205,Russian Mission in Geneva,Geneve,Постпредство России при Отделении ООН и др. ме...,2011-07-26 08:40:47,9862,589,3502,True,2021-04-14 08:24:54,✅ 🇷🇺#Gamaleya Research Center in cooperation w...,['Gamaleya'],Twitter for iPhone,2,9,False
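
The three cleaning steps above (removing placeholder locations, dropping missing values, keeping purely alphabetic strings) can be followed on a handful of made-up rows. This is only a sketch: the `toy` frame and its shortened `invalid_location` list are illustrative, not taken from the dataset. It also shows a side effect of `str.isalpha()`: multi-word places such as "San Francisco, CA" are discarded, which matches the output above, where only single-word locations remain.

```python
import pandas as pd

# Illustrative rows only; not taken from the real dataset
toy = pd.DataFrame(
    {"user_location": ["London", "Earth", None, "San Francisco, CA", "Türkiye"]}
)
invalid_location = ["Earth", "Worldwide"]

# Step 1: drop placeholder locations ("Earth")
step1 = toy[~toy["user_location"].isin(invalid_location)]
# Step 2: drop rows with no location at all
step2 = step1.dropna(subset=["user_location"])
# Step 3: keep strings made only of letters (drops "San Francisco, CA")
step3 = step2[step2["user_location"].str.isalpha()]

kept = step3["user_location"].tolist()
print(kept)
```

If multi-word locations should be kept, the third step would need a looser test than `isalpha()`.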


In [9]:
newuser_created = twit_data_alpha["user_created"].str.split(" ", n = 1, expand = True)
twit_data_alpha["user_created"].replace(newuser_created[0])

newdate = twit_data_alpha["date"].str.split(" ", n = 1, expand = True)
twit_data_alpha["date"].replace(newdate[0])

twit_data_alpha

Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet
9,1337842295857623042,Ch.Amjad Ali,Islamabad,#ProudPakistani #LovePakArmy #PMIK @insafiansp...,2012-11-12 04:18:12,671,2368,20469,False,2020-12-12 19:30:33,#CovidVaccine \n\nStates will start getting #C...,"['CovidVaccine', 'COVID19Vaccine', 'US', 'paku...",Twitter Web App,0,0,False
12,1337815730486702087,WION,India,#WION: World Is One | Welcome to India’s first...,2016-03-21 03:44:54,292510,91,7531,True,2020-12-12 17:45:00,The agency also released new information for h...,,TweetDeck,0,18,False
17,1337783770070409218,ILKHA,Türkiye,Official Twitter account of Ilke News Agency /,2015-05-22 08:31:12,4056,6,3,True,2020-12-12 15:38:00,"Coronavirus: Iran reports 8,201 new cases, 221...","['Iran', 'coronavirus', 'PfizerBioNTech']",TweetDeck,3,5,False
30,1337760271151063040,Andy Thomas,London,Retweets not necessarily endorsements.,2009-03-07 20:39:15,1151,4301,95963,False,2020-12-12 14:04:37,"@ZubyMusic 6 deaths so far. \nIt's only death,...","['CovidVaccines', 'Pfizervaccine']",Twitter for Android,0,2,False
44,1337727767551553536,Daily News Egypt,Egypt,Egypt's Only Daily Independent Newspaper in En...,2009-04-26 07:56:24,278080,116,765,True,2020-12-12 11:55:28,#FDA authorizes #PfizerBioNTech #coronavirus v...,"['FDA', 'PfizerBioNTech', 'coronavirus']",Twitter Web App,1,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69707,1382253724878344196,Roopam,Ranchi,Be Ur Kind of Crazy....Someone out there will ...,2019-08-19 14:55:44,304,280,87,False,2021-04-14 08:45:43,Mr. @RahulGandhi it is not because of you but ...,['SputnikV'],Twitter Web App,0,2,False
69711,1382250817189597189,moneycontrol,Mumbai,Moneycontrol is India’s No. 1 financial portal...,2009-08-26 07:55:29,1077795,297,1079,True,2021-04-14 08:34:10,.@viswanath_pilla looks at how #SputnikV stack...,"['SputnikV', 'Covishield', 'Covaxin']",Twitter Web App,2,4,False
69712,1382250738760384515,ET NOW,India,Youtube: https://t.co/u54RR8zcQ8 Facebook: htt...,2010-12-10 12:49:29,641608,79,3,True,2021-04-14 08:33:51,Watch @drreddys address #SputnikV EUA #LIVE h...,"['SputnikV', 'LIVE']",Twitter Media Studio,1,10,False
69713,1382248484259123205,Russian Mission in Geneva,Geneve,Постпредство России при Отделении ООН и др. ме...,2011-07-26 08:40:47,9862,589,3502,True,2021-04-14 08:24:54,✅ 🇷🇺#Gamaleya Research Center in cooperation w...,['Gamaleya'],Twitter for iPhone,2,9,False
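
The results of the two `replace` calls in the cell above are never assigned, which is why the table still shows full timestamps in `user_created` and `date`. If the intent was to keep only the date part, one possible approach, sketched here on made-up rows, is to assign the first piece of the split back to the column:

```python
import pandas as pd

# Illustrative rows only; the real column holds "YYYY-MM-DD HH:MM:SS" strings
toy = pd.DataFrame({"date": ["2020-12-12 19:30:33", "2021-04-14 08:24:54"]})

# Split once on the space and keep the part before it (the calendar date)
toy["date"] = toy["date"].str.split(" ", n=1, expand=True)[0]

dates = toy["date"].tolist()
print(dates)
```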


Test the user_coordinates creation with 10 elements.

In [11]:
## test for 10 elements
def return_location(x):
    if x is None:
        return None
    else:
        return [x.latitude, x.longitude]

geolocator = Nominatim(user_agent="my-app")
# Copy the slice so the new column is added to an independent DataFrame
test_data = twit_data_alpha[:10].copy()
test_data['user_coordinates'] = test_data['user_location'].apply(geolocator.geocode).apply(return_location)
test_data


Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet,user_coordinates
9,1337842295857623042,Ch.Amjad Ali,Islamabad,#ProudPakistani #LovePakArmy #PMIK @insafiansp...,2012-11-12 04:18:12,671,2368,20469,False,2020-12-12 19:30:33,#CovidVaccine \n\nStates will start getting #C...,"['CovidVaccine', 'COVID19Vaccine', 'US', 'paku...",Twitter Web App,0,0,False,"[33.6938118, 73.0651511]"
12,1337815730486702087,WION,India,#WION: World Is One | Welcome to India’s first...,2016-03-21 03:44:54,292510,91,7531,True,2020-12-12 17:45:00,The agency also released new information for h...,,TweetDeck,0,18,False,"[22.3511148, 78.6677428]"
17,1337783770070409218,ILKHA,Türkiye,Official Twitter account of Ilke News Agency /,2015-05-22 08:31:12,4056,6,3,True,2020-12-12 15:38:00,"Coronavirus: Iran reports 8,201 new cases, 221...","['Iran', 'coronavirus', 'PfizerBioNTech']",TweetDeck,3,5,False,"[38.9597594, 34.9249653]"
30,1337760271151063040,Andy Thomas,London,Retweets not necessarily endorsements.,2009-03-07 20:39:15,1151,4301,95963,False,2020-12-12 14:04:37,"@ZubyMusic 6 deaths so far. \nIt's only death,...","['CovidVaccines', 'Pfizervaccine']",Twitter for Android,0,2,False,"[51.5073219, -0.1276474]"
44,1337727767551553536,Daily News Egypt,Egypt,Egypt's Only Daily Independent Newspaper in En...,2009-04-26 07:56:24,278080,116,765,True,2020-12-12 11:55:28,#FDA authorizes #PfizerBioNTech #coronavirus v...,"['FDA', 'PfizerBioNTech', 'coronavirus']",Twitter Web App,1,1,False,"[26.2540493, 29.2675469]"
45,1340571472025141248,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-20 08:15:20,When The #CovidVaccine \nPoisons Enough Of The...,"['CovidVaccine', 'BellsPalsy']",Twitter Web App,0,0,False,"[51.4371483, 5.9799001]"
49,1339822296278519810,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-18 06:38:22,COVID-19: News and updates\npublic questioned ...,,Twitter Web App,0,0,False,"[51.4371483, 5.9799001]"
59,1338607616600256513,Roger Simmons,Ontario,,2020-01-03 22:29:02,8,37,658,False,2020-12-14 22:11:40,Will you be taking the COVID-19 vaccine once a...,"['COVID19', 'Pfizer', 'BioNTech', 'vaccine', '...",Twitter for iPhone,0,0,False,"[50.000678, -86.000977]"
75,1338574693087936513,Prof. Manish Thakur,India,#Proprietor English Academy #Blockchain #AI #I...,2012-06-11 13:50:05,3372,1713,119631,False,2020-12-14 20:00:51,#UgurSahin #ozlemtureci the #Muslim Scientists...,"['UgurSahin', 'ozlemtureci', 'Muslim', 'Pfizer...",Twitter for Android,0,0,False,"[22.3511148, 78.6677428]"
78,1338572995992969217,Toni Kappesz,BERLIN,,2009-06-08 12:38:48,221,483,58673,False,2020-12-14 19:54:06,"Where did the #WarpSpeed money go, if not to t...","['WarpSpeed', 'PfizerBioNTech', 'Moderna', 'Gr...",Twitter for iPhone,0,0,False,"[52.5170365, 13.3888599]"


To use geopy's reverse function (to obtain the country name in English), we add a column (user_coordinates2) holding the coordinates as a plain "latitude, longitude" string, without brackets. Here it is created for a test with 10 rows:

In [12]:
## test for 10 elements
def return_location2(x):
    if x is None:
        return None
    else:
        return "{}, {}".format(x.latitude, x.longitude)

geolocator = Nominatim(user_agent="my-app")
# Copy so the new column is added to an independent DataFrame
test_data1 = test_data.copy()
test_data1['user_coordinates2'] = test_data1['user_location'].apply(geolocator.geocode).apply(return_location2)
test_data1


Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet,user_coordinates,user_coordinates2
9,1337842295857623042,Ch.Amjad Ali,Islamabad,#ProudPakistani #LovePakArmy #PMIK @insafiansp...,2012-11-12 04:18:12,671,2368,20469,False,2020-12-12 19:30:33,#CovidVaccine \n\nStates will start getting #C...,"['CovidVaccine', 'COVID19Vaccine', 'US', 'paku...",Twitter Web App,0,0,False,"[33.6938118, 73.0651511]","33.6938118, 73.0651511"
12,1337815730486702087,WION,India,#WION: World Is One | Welcome to India’s first...,2016-03-21 03:44:54,292510,91,7531,True,2020-12-12 17:45:00,The agency also released new information for h...,,TweetDeck,0,18,False,"[22.3511148, 78.6677428]","22.3511148, 78.6677428"
17,1337783770070409218,ILKHA,Türkiye,Official Twitter account of Ilke News Agency /,2015-05-22 08:31:12,4056,6,3,True,2020-12-12 15:38:00,"Coronavirus: Iran reports 8,201 new cases, 221...","['Iran', 'coronavirus', 'PfizerBioNTech']",TweetDeck,3,5,False,"[38.9597594, 34.9249653]","38.9597594, 34.9249653"
30,1337760271151063040,Andy Thomas,London,Retweets not necessarily endorsements.,2009-03-07 20:39:15,1151,4301,95963,False,2020-12-12 14:04:37,"@ZubyMusic 6 deaths so far. \nIt's only death,...","['CovidVaccines', 'Pfizervaccine']",Twitter for Android,0,2,False,"[51.5073219, -0.1276474]","51.5073219, -0.1276474"
44,1337727767551553536,Daily News Egypt,Egypt,Egypt's Only Daily Independent Newspaper in En...,2009-04-26 07:56:24,278080,116,765,True,2020-12-12 11:55:28,#FDA authorizes #PfizerBioNTech #coronavirus v...,"['FDA', 'PfizerBioNTech', 'coronavirus']",Twitter Web App,1,1,False,"[26.2540493, 29.2675469]","26.2540493, 29.2675469"
45,1340571472025141248,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-20 08:15:20,When The #CovidVaccine \nPoisons Enough Of The...,"['CovidVaccine', 'BellsPalsy']",Twitter Web App,0,0,False,"[51.4371483, 5.9799001]","51.4371483, 5.9799001"
49,1339822296278519810,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-18 06:38:22,COVID-19: News and updates\npublic questioned ...,,Twitter Web App,0,0,False,"[51.4371483, 5.9799001]","51.4371483, 5.9799001"
59,1338607616600256513,Roger Simmons,Ontario,,2020-01-03 22:29:02,8,37,658,False,2020-12-14 22:11:40,Will you be taking the COVID-19 vaccine once a...,"['COVID19', 'Pfizer', 'BioNTech', 'vaccine', '...",Twitter for iPhone,0,0,False,"[50.000678, -86.000977]","50.000678, -86.000977"
75,1338574693087936513,Prof. Manish Thakur,India,#Proprietor English Academy #Blockchain #AI #I...,2012-06-11 13:50:05,3372,1713,119631,False,2020-12-14 20:00:51,#UgurSahin #ozlemtureci the #Muslim Scientists...,"['UgurSahin', 'ozlemtureci', 'Muslim', 'Pfizer...",Twitter for Android,0,0,False,"[22.3511148, 78.6677428]","22.3511148, 78.6677428"
78,1338572995992969217,Toni Kappesz,BERLIN,,2009-06-08 12:38:48,221,483,58673,False,2020-12-14 19:54:06,"Where did the #WarpSpeed money go, if not to t...","['WarpSpeed', 'PfizerBioNTech', 'Moderna', 'Gr...",Twitter for iPhone,0,0,False,"[52.5170365, 13.3888599]","52.5170365, 13.3888599"


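Try a single reverse lookup in English. The commented-out lines use the coordinates of the first dataset row; the active call passes None, and the result comes back at latitude 0, longitude 0: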

In [13]:
a=geolocator.reverse(None,language='en')
a.raw
#b=geolocator.reverse("33.6938118,73.0651511",language='en')
#b.raw['address']['country']

{'place_id': 312028333,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'node',
 'osm_id': 3815077900,
 'lat': '0',
 'lon': '0',
 'display_name': 'Soul Buoy',
 'address': {'man_made': 'Soul Buoy'},
 'boundingbox': ['-5.0E-5', '5.0E-5', '-5.0E-5', '5.0E-5']}

With 36 elements I get an error, because the 35th is None.

Try with 10 elements to obtain the country name in English:

In [14]:
def return_location3(x):
    # Fall back to "Unknown" when there is no result or no country in the address
    try:
        return x.raw['address']['country']  # or ['state']
    except (AttributeError, KeyError, TypeError):
        return "Unknown"

geolocator = Nominatim(user_agent="my-app4")
# Copy the slice so the new column is added to an independent DataFrame
test_data2 = test_data1[:10].copy()
#test_data2 = test_data2.dropna(subset = ["user_coordinates"])
test_data2['user_country'] = test_data2['user_coordinates2'].apply(lambda y: geolocator.reverse(y, language='en')).apply(return_location3)
test_data2


Unnamed: 0,id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,retweets,favorites,is_retweet,user_coordinates,user_coordinates2,user_country
9,1337842295857623042,Ch.Amjad Ali,Islamabad,#ProudPakistani #LovePakArmy #PMIK @insafiansp...,2012-11-12 04:18:12,671,2368,20469,False,2020-12-12 19:30:33,#CovidVaccine \n\nStates will start getting #C...,"['CovidVaccine', 'COVID19Vaccine', 'US', 'paku...",Twitter Web App,0,0,False,"[33.6938118, 73.0651511]","33.6938118, 73.0651511",Pakistan
12,1337815730486702087,WION,India,#WION: World Is One | Welcome to India’s first...,2016-03-21 03:44:54,292510,91,7531,True,2020-12-12 17:45:00,The agency also released new information for h...,,TweetDeck,0,18,False,"[22.3511148, 78.6677428]","22.3511148, 78.6677428",India
17,1337783770070409218,ILKHA,Türkiye,Official Twitter account of Ilke News Agency /,2015-05-22 08:31:12,4056,6,3,True,2020-12-12 15:38:00,"Coronavirus: Iran reports 8,201 new cases, 221...","['Iran', 'coronavirus', 'PfizerBioNTech']",TweetDeck,3,5,False,"[38.9597594, 34.9249653]","38.9597594, 34.9249653",Turkey
30,1337760271151063040,Andy Thomas,London,Retweets not necessarily endorsements.,2009-03-07 20:39:15,1151,4301,95963,False,2020-12-12 14:04:37,"@ZubyMusic 6 deaths so far. \nIt's only death,...","['CovidVaccines', 'Pfizervaccine']",Twitter for Android,0,2,False,"[51.5073219, -0.1276474]","51.5073219, -0.1276474",United Kingdom
44,1337727767551553536,Daily News Egypt,Egypt,Egypt's Only Daily Independent Newspaper in En...,2009-04-26 07:56:24,278080,116,765,True,2020-12-12 11:55:28,#FDA authorizes #PfizerBioNTech #coronavirus v...,"['FDA', 'PfizerBioNTech', 'coronavirus']",Twitter Web App,1,1,False,"[26.2540493, 29.2675469]","26.2540493, 29.2675469",Egypt
45,1340571472025141248,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-20 08:15:20,When The #CovidVaccine \nPoisons Enough Of The...,"['CovidVaccine', 'BellsPalsy']",Twitter Web App,0,0,False,"[51.4371483, 5.9799001]","51.4371483, 5.9799001",Netherlands
49,1339822296278519810,IP_Man,America,"One Man, \nWith One Statistically Impossible G...",2012-06-21 01:41:33,625,477,14475,False,2020-12-18 06:38:22,COVID-19: News and updates\npublic questioned ...,,Twitter Web App,0,0,False,"[51.4371483, 5.9799001]","51.4371483, 5.9799001",Netherlands
59,1338607616600256513,Roger Simmons,Ontario,,2020-01-03 22:29:02,8,37,658,False,2020-12-14 22:11:40,Will you be taking the COVID-19 vaccine once a...,"['COVID19', 'Pfizer', 'BioNTech', 'vaccine', '...",Twitter for iPhone,0,0,False,"[50.000678, -86.000977]","50.000678, -86.000977",Canada
75,1338574693087936513,Prof. Manish Thakur,India,#Proprietor English Academy #Blockchain #AI #I...,2012-06-11 13:50:05,3372,1713,119631,False,2020-12-14 20:00:51,#UgurSahin #ozlemtureci the #Muslim Scientists...,"['UgurSahin', 'ozlemtureci', 'Muslim', 'Pfizer...",Twitter for Android,0,0,False,"[22.3511148, 78.6677428]","22.3511148, 78.6677428",India
78,1338572995992969217,Toni Kappesz,BERLIN,,2009-06-08 12:38:48,221,483,58673,False,2020-12-14 19:54:06,"Where did the #WarpSpeed money go, if not to t...","['WarpSpeed', 'PfizerBioNTech', 'Moderna', 'Gr...",Twitter for iPhone,0,0,False,"[52.5170365, 13.3888599]","52.5170365, 13.3888599",Germany
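
The `raw` dictionary does not always contain a country: the (0, 0) lookup shown earlier has only `man_made` under `address`. A small sketch of a country lookup that handles those cases explicitly, without relying on a try/except, is given below. The name `country_from_raw` is hypothetical, and the second sample dictionary is illustrative rather than a real response.

```python
def country_from_raw(raw, default="Unknown"):
    """Return raw['address']['country'] when present, otherwise `default`."""
    if not isinstance(raw, dict):
        return default
    # `or {}` covers a present-but-empty or None "address" entry
    address = raw.get("address") or {}
    return address.get("country", default)


# Trimmed copy of the (0, 0) result shown earlier: no country in the address
soul_buoy = {"display_name": "Soul Buoy", "address": {"man_made": "Soul Buoy"}}
# Illustrative dictionary, not a real Nominatim response
with_country = {"address": {"country": "Pakistan"}}

print(country_from_raw(soul_buoy))
print(country_from_raw(with_country))
print(country_from_raw(None))
```

With a geopy result `loc`, this would be called as `country_from_raw(loc.raw)` after checking that `loc` is not None.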


In [15]:
def return_location(x):
    if x is None:
        return None
    # To be ingested in Elastic as [longitude, latitude]
    return [x.longitude, x.latitude]


def return_location_2(x):
    if x is None:
        return None
    # "latitude, longitude" string, useful for obtaining the country name in English
    return "{}, {}".format(x.latitude, x.longitude)


def return_location_3(x):
    # Fall back to "Unknown" when there is no result or no country in the address
    try:
        return x.raw['address']['country']  # or ['state']
    except (AttributeError, KeyError, TypeError):
        return "Unknown"


geolocator = Nominatim(user_agent="my-app2")

# Process the data in 1000 slices so that progress can be reported
twit_data_alpha_sliced = np.array_split(twit_data_alpha, 1000)

count = 0
for data_slice in twit_data_alpha_sliced:
    # if count == 10:
    #     break
    # Geocode each location once and reuse the result for both coordinate columns
    located = data_slice['user_location'].apply(geolocator.geocode)
    data_slice['user_coordinates'] = located.apply(return_location)
    data_slice['user_coordinates2'] = located.apply(return_location_2)
    data_slice['user_country'] = data_slice['user_coordinates2'].apply(lambda y: geolocator.reverse(y, language='en')).apply(return_location_3)
    count = count + 1
    if count % 100 == 0:
        print(count)
  


KeyboardInterrupt: 
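
Many rows share the same user_location (India, for instance, appears several times in the tables above), so one way to shorten a run like this is to look up each distinct location once and map the results back onto the rows. The sketch below assumes this is acceptable for the analysis. `geocode_unique` is a hypothetical helper, and `fake_geocode` stands in for the real service so that the example runs offline.

```python
import pandas as pd


def geocode_unique(locations, geocode):
    """Call `geocode` once per distinct location and map the results back."""
    cache = {name: geocode(name) for name in locations.dropna().unique()}
    return locations.map(cache)


calls = []  # records which names were actually looked up


def fake_geocode(name):
    # Stand-in for geolocator.geocode; returns rounded (lat, lon) or None
    calls.append(name)
    table = {"India": (22.35, 78.67), "London": (51.51, -0.13)}
    return table.get(name)


toy = pd.Series(["India", "London", "India", "India"])
coords = geocode_unique(toy, fake_geocode)
print(calls)
print(coords.tolist())
```

With the real service, `geocode` could be `geolocator.geocode` wrapped in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`, for example with `min_delay_seconds=1`) to space out the requests.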

In [None]:
### put the slices back together
# DataFrame.append was removed in pandas 2.0; concat joins all slices at once
final_data = pd.concat(twit_data_alpha_sliced)

final_data

In [None]:
final_data.to_csv("Data/vaccination_all_tweets_cleaned_with_country.csv")