# Twitter Data Collection

In this program, we extract Twitter data using the Twitter API.

***
## Import Modules
First, import the modules needed for the data collection process:<br><br>
1. Tweepy --> to crawl the data.<br>
2. Pandas --> to organize the collected Twitter data and export it to a file.


In [1]:
import tweepy as tw
import pandas as pd

## Define your Twitter API credentials
Define your Twitter developer credentials so the program can connect to the Twitter API and crawl data from it.

In [2]:
api_key = 'YOUR_API_KEY'
api_secret_key = 'YOUR_API_SECRET_KEY'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

In [3]:
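# Authenticate with the consumer key pair, then attach the user access token.
auth = tw.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit makes Tweepy sleep until the rate limit resets.
# Note: wait_on_rate_limit_notify was removed in Tweepy 4.0; drop it on newer versions.
api = tw.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)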
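
Hardcoding credentials in a notebook makes them easy to leak when the file is shared. As one possible alternative, the sketch below reads the four values from environment variables. The variable names (`TWITTER_API_KEY` and so on) and the `load_credentials` helper are assumptions made for illustration; they are not part of Tweepy or of the original notebook.

```python
import os

# Hypothetical variable names; use whatever names your environment defines.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {name: "placeholder-" + str(i) for i, name in enumerate(REQUIRED_VARS)}
creds = load_credentials(demo_env)
```

Calling `load_credentials()` with no argument reads from `os.environ`; the resulting values can then be passed to the authentication calls in place of the string literals.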

## Collecting Data
Define the keywords used to filter the tweets you want to collect. The query below uses standard search operators: `since:` sets the earliest date, and `-filter:replies` and `-filter:retweets` exclude replies and retweets. The collection itself is done with Tweepy's `Cursor` class, which pages through the search results. The number of tweets to collect is set through `items()`; this example retrieves only 10 tweets for illustration. The `lang` parameter restricts results to one language, here `id` (Indonesian). See the Tweepy documentation for details on the other parameters.

In [4]:
key_words = "SpaceX since:2020-11-10 -filter:replies -filter:retweets"

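# Cursor handles pagination; items(10) stops after 10 tweets.
# Note: api.search was renamed to api.search_tweets in Tweepy 4.0.
search_result = tw.Cursor(api.search,
                          q=key_words,
                          lang="id",
                          truncated=True).items(10)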
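
The query string combines a keyword with search operators, so it can be convenient to assemble it from parts. The following is a minimal sketch; `build_query` is a hypothetical helper written for this example, not a Tweepy function.

```python
def build_query(keyword, since=None, exclude_replies=True, exclude_retweets=True):
    """Assemble a search query string from a keyword and optional operators."""
    parts = [keyword]
    if since:
        # since: keeps only tweets posted on or after the given date (YYYY-MM-DD).
        parts.append("since:" + since)
    if exclude_replies:
        parts.append("-filter:replies")
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


key_words = build_query("SpaceX", since="2020-11-10")
print(key_words)  # SpaceX since:2020-11-10 -filter:replies -filter:retweets
```

The result is identical to the hand-written query used in this section, so it can be passed to the `q` parameter unchanged.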

## Extract Data Information
The data returned by Tweepy is parsed from JSON and contains many fields; choose the ones you will need later. This code extracts the following information:<br><br>
1. User screen name<br>
2. Tweet text<br>
3. Retweet count<br>
4. Like count<br>
5. Tweet's location<br>
6. Source of the tweet<br>
7. Account's verified status<br>
8. Account's creation date<br>
9. Whether the account still uses the default profile image (`default_profile_image`)<br>
10. Whether the account still uses the default profile (`default_profile`); this flag reports an unchanged profile theme or background, and is stored as `no_bio` in this notebook<br>
11. Statuses count<br>
12. Following count<br>
13. Followers count<br>
14. Account's location<br>

In [5]:
# Re-fetch each tweet in extended mode so that full_text holds the untruncated text.
crawling_result = [api.get_status(data.id, tweet_mode="extended") for data in search_result]

# One row per tweet; the order matches the DataFrame columns defined later.
tweet_list = [
    [
        status.user.screen_name,
        status.full_text,
        status.retweet_count,
        status.favorite_count,
        status.geo,
        status.source,
        status.user.verified,
        status.author.created_at,
        status.author.default_profile_image,
        status.author.default_profile,
        status.user.statuses_count,
        status.user.friends_count,
        status.user.followers_count,
        status.user.location,
    ]
    for status in crawling_result
]
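
A long positional list makes it easy to mismatch a field and its column. One possible alternative, sketched below, keeps a single mapping from column name to attribute path. The `FIELDS` mapping and the `get_path` and `status_to_row` helpers are hypothetical, and `fake_status` is a made-up stand-in object so the example runs without API access; it is not real Twitter data.

```python
from types import SimpleNamespace

# Column name -> dotted attribute path on the status object (a subset for brevity).
FIELDS = {
    "username": "user.screen_name",
    "tweet": "full_text",
    "retweet_count": "retweet_count",
    "like_count": "favorite_count",
    "user_location": "user.location",
}


def get_path(obj, path, default=None):
    """Follow a dotted attribute path, returning default if any step is missing."""
    for name in path.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return default
    return obj


def status_to_row(status):
    """Build one row whose order follows the FIELDS mapping."""
    return [get_path(status, path) for path in FIELDS.values()]


# Stand-in object that mimics the attributes used above.
fake_status = SimpleNamespace(
    full_text="Example tweet text",
    retweet_count=1,
    favorite_count=5,
    user=SimpleNamespace(screen_name="example_user", location=None),
)
row = status_to_row(fake_status)
```

With this layout, `list(FIELDS)` can also serve as the column names when the rows are loaded into a DataFrame, so the two stay in sync.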

### Converting tweet_list into a DataFrame
Converting the list into a DataFrame keeps the data organized and easier to read.

In [6]:
tweet_df = pd.DataFrame(
    data=tweet_list,
    columns=[
        "username", "tweet", "retweet_count", "like_count", "location",
        "device", "verified_status", "acc_creation_date", "no_profile_pic",
        "no_bio", "tweets_count", "followings_count", "followers_count",
        "user_location",
    ],
)
tweet_df

| | username | tweet | retweet_count | like_count | location | device | verified_status | acc_creation_date | no_profile_pic | no_bio | tweets_count | followings_count | followers_count | user_location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aristiaelvina | Indonesia sibuk dengan berbagai permasalahan y... | 0 | 0 | | Twitter for Android | False | 2019-08-01 12:40:46 | False | True | 3571 | 109 | 126 | |
| 1 | WartaEkonomi | Peluncuran SpaceX, Apa Kegiatan Astronautnya d... | 0 | 0 | | Warta Ekonomi | False | 2009-04-12 07:01:14 | False | False | 383674 | 2298 | 22589 | Jakarta, Indonesia |
| 2 | SinarOnline | Tiga rakyat AS dan seorang warga Jepun berlepa... | 0 | 1 | | Twitter Web App | False | 2011-07-08 02:51:57 | False | False | 539514 | 560 | 636832 | Malaysia |
| 3 | tw0savage | Aku ingat diorang meniarap kat katil urut http... | 0 | 0 | | Twitter for Android | False | 2020-09-20 00:38:56 | False | True | 2368 | 688 | 140 | |
| 4 | Amsyarnaif | Bestnya keluar bumi. Takde covid hmm https://t... | 1 | 5 | | Twitter for iPhone | False | 2016-02-03 04:56:21 | False | True | 63199 | 323 | 11235 | Malaysia |
| 5 | Aisa_jaafa | pagi tadi dalam kelas pon layan live spacex la... | 0 | 0 | | Twitter for Android | False | 2011-09-26 09:51:21 | False | False | 16652 | 157 | 241 | |
| 6 | RioAaGoGo | melok po o, ndk kene rusuh nemen https://t.co/... | 0 | 0 | | Twitter for iPhone | False | 2011-08-03 01:06:16 | False | False | 12284 | 399 | 215 | Malang-East Java-Indonesia |
| 7 | AdukaTaruna1453 | Macam baju dalam Interstellar. Damn cool https... | 0 | 0 | | Twitter for Android | False | 2017-03-09 08:48:07 | False | True | 9922 | 124 | 170 | |
| 8 | yaksat__ | Hati hati bro https://t.co/yCI1mVYOH9 | 0 | 0 | | Twitter for Android | False | 2019-08-05 02:36:02 | False | True | 4987 | 420 | 151 | Bali, Indonesia |
| 9 | detikcom | Empat astronot telah diluncurkan dari Florida ... | 0 | 12 | | Echobox | True | 2009-08-27 03:03:05 | False | False | 1720276 | 30 | 16325840 | Jakarta, Indonesia |


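## Saving the data to a CSV file
Save the data to a CSV file so it can be reused later. The file name `hasil_crawling` is Indonesian for "crawling results", and `index=False` leaves the DataFrame's row index out of the file.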

In [7]:
tweet_df.to_csv(r'hasil_crawling.csv', index=False)
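
To illustrate why `index=False` is worth passing, the sketch below round-trips a tiny DataFrame through CSV with and without the index. The two-row DataFrame is made-up sample data, and `io.StringIO` buffers stand in for files on disk.

```python
import io

import pandas as pd

# Made-up sample data, used only to show the effect of the index argument.
df = pd.DataFrame({"username": ["user_a", "user_b"], "like_count": [0, 12]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))     # ['Unnamed: 0', 'username', 'like_count']
print(list(reloaded_without_index.columns))  # ['username', 'like_count']
```

When a file saved with the index is read back, pandas labels the extra column `Unnamed: 0`; saving with `index=False` avoids that column entirely.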