
This module is prepared for  SOCIAL MEDIA MINING MADE EASY: PYTHON TEXT ANALYTICS CRASH COURSE WORKSHOP, 25-26 September 2023.

Author: Dr Lailatul Qadri Zakaria, Asian Language Processing Lab (ASLAN), Center For Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia (UKM).

Email: lailatul.qadri@ukm.edu.my


#Tweety
Previously, numerous Python libraries that did not utilise Twitter's API could be used for Twitter data scraping, however owing to changes in Twitter policy, most of the libraries are no longer usable.

In this workshop, I will use **Tweepy** library to extract Twitter data. Tweepy is can be used to extract data such as get user info, get tweets, searching tweets based on keywords, get trends, get community and get community tweets.

Let's go!

Resource for Tweety documentation link: https://mahrtayyab.github.io/tweety_docs/#

## Install required library


### 1. We need to install tweety library using the following code:

In [5]:
!pip install tweety-ns



### 2. Upgrade the library to the latest version:

In [6]:
!pip install https://github.com/mahrtayyab/tweety/archive/main.zip --upgrade

Collecting https://github.com/mahrtayyab/tweety/archive/main.zip
  Using cached https://github.com/mahrtayyab/tweety/archive/main.zip (98 kB)
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone


## Tweety Functions

**Note:** If you have a Twitter account, you may use your Twitter ID and password to access more Tweepy features. Otherwise, your Tweety access is restricted to the first two functions.


**Basic Functions:**

a.   Get User Info

b.   Get User's Tweet (limited to 100 tweets)

**Advance Functions:**

a.   Get User Info (similar to basic function)

b.   Get User's Tweet (more than 100 tweets)

c.   Searching a keyword






In [7]:
from tweety import Twitter
app = Twitter("session")

### 1. Basic Function (no sign-in required)

#### a.  Get User info
Lets try to search for a public figure in Twitter. In this example, I will search for our former prime minister - Tun Dr Mahathir Mohamad. You will need his twitter name to get his information: "chedetofficial".

In [8]:
user = app.get_user_info('chedetofficial')
print(user)

User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True)


Now, try to get other user information by changing the user name. You also can search for your user information.

In [10]:
user = app.get_user_info('Aremien1096206')
print(user)

User(id=1778704336094367744, username=Aremien1096206, name=Are_mien, verified=False)


#### b. Get tweets from the user (limited to 100 tweets)
You now have the user information. Next, let's see if we can capture what they tweet. In this example, I would like to extract Tun M' tweets and save it into pandas.


In [11]:
# Importing pandas as pd
import pandas as pd

After running this code, you will receive raw tweet data from Tun M. It comprises a lot of information, like the Tweet id, author, username, and produced data. Observe the output:
*   Without Twitter User ID and password, you only can access 100 tweet from the user.
*   Tweet's date are random






In [12]:
all_tweets = app.get_tweets("chedetofficial")

tweet_text = [] # we want to store tweet's text
tweet_date = [] # we want to store tweet's date

for tweet in all_tweets:
      print(tweet)
      tweet_text.append(tweet.text)
      tweet_date.append(tweet.date)

Tweet(id=1232164352800645121, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2020-02-25 04:44:08+00:00)
Tweet(id=994832602816278529, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2018-05-11 06:52:30+00:00)
Tweet(id=994448161212018689, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2018-05-10 05:24:52+00:00)
Tweet(id=1198088000359428096, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2019-11-23 03:56:53+00:00)
Tweet(id=996903394223812614, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2018-05-17 00:01:05+00:00)
Tweet(id=1714951451079049272, author=User(id=2974181588, username=chedetofficial, name=Dr Mahathir Mohamad, verified=True), created_on=2023-10-19 10:27:42+00:00)
Tweet(id=1290817057915719680, a

In [13]:
# Calling DataFrame constructor on list

df = pd.DataFrame({'Date': tweet_date, 'Tweet':tweet_text})
df

Unnamed: 0,Date,Tweet
0,2020-02-25 04:44:08+00:00,Just another day in the office. https://t.co/f...
1,2018-05-11 06:52:30+00:00,First day on the job... once again. https://t....
2,2018-05-10 05:24:52+00:00,Kita telah melakukan sesuatu yang selama ini m...
3,2019-11-23 03:56:53+00:00,Sekali sekala bawa Hasmah dating di KL https:/...
4,2018-05-17 00:01:05+00:00,Selamat menunaikan ibadah puasa dari saya dan ...
...,...,...
93,2018-07-17 07:58:22+00:00,"17 Julai 2018, sesi Parlimen Malaysia ke-14 ht..."
94,2019-08-30 14:29:14+00:00,Demi negara yang tercinta dicurahkan bakti pen...
95,2017-11-21 02:45:38+00:00,Umur bukan penghalang untuk kita belajar tekno...
96,2018-01-07 07:45:16+00:00,Siti marah saya curi makanan dia. \nSharing is...


Save your data in the form of csv

In [14]:
df.to_csv("my_data.csv")

**Limitation:** Without Twitter User ID and password, you only can access 100 tweet from the user.



### 2. Advance Functions (sign-in required)
In order to use advance functions, you need to provide Twitter user ID, password and extra password. The extra password will be emailed to you based on your Twitter ID email.

app.sign_in("USER ID", "PASSWORD", extra="EXTRA PASSWORD")

#### a. Get tweets from the user



Let us now attempt to extract Tun M's tweets. We may also set the "pages" value. If "pages" is set to 1, it will retrieve 20 tweets per page. Increase the value of "pages" to extract more tweets.


In [16]:
app.sign_in("Aremien1096206", "@Taiping1234")

InvalidTweetIdentifier: Missing data.

In [17]:
app.sign_in("Aremien1096206", "", extra="evc6x4y8")
target_username = "chedetofficial"
user = app.get_user_info(target_username)
all_tweets = app.get_tweets(user,pages = 10)


tweet_date = []
tweet_text = []
for x in all_tweets:
  #print(x.items)
  for k, v in x.items():
    #print("-",k)
    if(k == "date"):
      #print(">",v)
      tweet_date.append(v)
    if(k == "text"):
      #print(">",v)
      tweet_text.append(v)




In [None]:
# Calling DataFrame constructor on list

df = pd.DataFrame({'Date': tweet_date, 'Tweet':tweet_text})
df

Unnamed: 0,Date,Tweet
0,2024-04-18 09:03:53+00:00,𝐓𝐔𝐃𝐔𝐇𝐀𝐍\n\n1. Membuat tuduhan mudah. Tetapi tu...
1,2024-04-17 07:39:32+00:00,𝐋𝐀𝐖 𝐀𝐒 𝐀 𝐌𝐄𝐀𝐍𝐒 𝐓𝐎 𝐓𝐇𝐑𝐄𝐀𝐓𝐄𝐍\n \n1. We believe i...
2,2024-04-16 07:54:00+00:00,APAKAH SAYA PENJENAYAH\n\n1. Kata Inspector-Ge...
3,2024-04-09 16:54:32+00:00,Saya ingin mengucapkan Selamat Menyambut Hari ...
4,2024-04-09 13:46:58+00:00,Dr Mahathir juga mengingatkan rakyat supaya ti...
...,...,...
153,2023-07-27 14:15:14+00:00,WHO OWNS THE COUNTRY\n \n1. Ownership of a cou...
154,2023-07-27 05:24:16+00:00,MALAYSIA A MALAY COUNTRY\n \n1. Stop talking a...
155,2023-07-26 13:37:52+00:00,MULTIRACIALISM\n \n1. The Government and its s...
156,2023-07-26 07:45:24+00:00,RACE\n\n1. The people are not allowed to creat...


In [18]:
df.to_csv("my_data_chedet.csv")

#### b. Searching a keyword
Do you have a specific topic or subject that you want to collect from Twitter? We can use searching a keyword function to get them. In order to search tweets, you need to provide such information:

*   keywords : example #UKM, paracetamol, pulau redang, etc..
*   nunmber of Tweet pages you want to get
*   Filter (optional) : Filter you would like to apply on the search.
*   Wait time: Number of seconds to wait between multiple requests

In this case, we'd want to get tweets on Fukushima water waste. As we all know, Japan began dumping treated radioactive water into the Pacific Ocean from the stricken Fukushima nuclear power plant. To obtain the information, we must first find relevant terms, such as "fukushima waste water" or "fukushima radiation water." Each query will get different results.

**Note:** Don't worry if you get Error 500 when looking for data. It is only a momentary issue that indicates "Something is broken. This is generally a transitory mistake, such as when a server is under heavy load or an endpoint is experiencing problems". Grab a coffee and after a few minutes try again with a lower page number.




In [19]:
from tweety.filters import SearchFilters

tweets = app.search("Gaza", pages = 10, filter_=SearchFilters.Latest(), wait_time = 2)

tweet_text = []
tweet_date = []
for tweet in tweets:
  tweet_text.append(tweet.text)
  tweet_date.append(tweet.date)

df = pd.DataFrame({'Date': tweet_date, 'Tweet':tweet_text})
df


Unnamed: 0,Date,Tweet
0,2024-04-23 09:44:36+00:00,Mass-Arrests-Made-As-US-Campus-Protests-Over-G...
1,2024-04-23 09:44:33+00:00,@DrLoupis 90% of the residents of Gaza are now...
2,2024-04-23 09:44:31+00:00,@Kingbuster9903 @2szszs @exsdel @HowidyHamza A...
3,2024-04-23 09:44:31+00:00,@alon_mizrahi So pathetic https://t.co/6MrASAOS7K
4,2024-04-23 09:44:31+00:00,"@OliLondonTV Lock em up, then deport them to G..."
...,...,...
124,2024-04-23 09:42:26+00:00,@ionebelarra @MartinaVelardeG @NicoSguiglia lo...
125,2024-04-23 09:42:24+00:00,#ريال_مدريد\n#غزة\n#فلسطين\n#RealMadrid\n#Gaza...
126,2024-04-23 09:42:15+00:00,200 يوم من الإبادة والقتل والتهجير والرعب وفقد...
127,2024-04-23 09:42:10+00:00,@Sprinterfactory Put them on a fukn plane to Gaza


In this example, I'll name my file my_data.csv. You may change the file's name accordingly.

In [None]:
df.to_csv("gaza.csv")

We also can filter/search for users by using user name. In this example, we search for

##**Self Activity: Collect Data From Twitter**
Now, you can collect your own social media data.

1. You may use the given template to collect your own dataset based on Twitter user

In [None]:
# Which user's tweets would you like to search for
target_username = "elonmusk" #change to your target username

user = app.get_user_info(target_username)
all_tweets = app.get_tweets(user,pages = 20 )

# save to data: date and tweet text
tweet_date = []
tweet_text = []
for x in all_tweets:
  #print(x.items)
  for k, v in x.items():
    #print("-",k)
    if(k == "date"):
      #print(">",v)
      tweet_date.append(v)
    if(k == "text"):
      #print(">",v)
      tweet_text.append(v)

#save the data into dataframe
df = pd.DataFrame({'Date': tweet_date, 'Tweet':tweet_text})

#save the data in csv format
df.to_csv("my_data1.csv") # you may change the data name accordingly.

2. Search for any topic of your interest.

In [None]:
tweets = app.search("airasia", pages = 10, filter_=SearchFilters.Latest(), wait_time = 10)

tweet_text = []
tweet_date = []
for tweet in tweets:
  tweet_text.append(tweet.text)
  tweet_date.append(tweet.date)

df = pd.DataFrame({'Date': tweet_date, 'Tweet':tweet_text})

#save the data in csv format
df.to_csv("my_data2.csv") # you may change the data name accordingly.