<a href="https://colab.research.google.com/github/MUmairAB/Twitter-Climate-Change-Sentiment-Analysis/blob/main/Twitter_scrapping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook scrapes tweets about "Climate Change" from Twitter. The "snscrape" library is used for the scraping.**

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

We need to extract tweet data from Twitter. To do that, we'll use **snscrape**, a scraper for **social networking services (SNS)**.

First, we'll install it and then use it to extract the required data.

In [2]:
# Installing the snscrape module
! pip install snscrape

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting snscrape
  Downloading snscrape-0.6.0.20230303-py3-none-any.whl (71 kB)
Installing collected packages: snscrape
Successfully installed snscrape-0.6.0.20230303


Since **snscrape** is successfully installed, we will now load the module.

In [3]:
# Importing the module
import snscrape.modules.twitter as sntwitter

Now we will use **snscrape's** **TwitterSearchScraper** class to scrape the Twitter data.

We are extracting up to **3,000 tweets** per day for **370 days**. The tweets are related to **Climate Change**.

This amounts to **1,110,000** or **1.1 million** tweets.
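The total above can be checked with a few lines of arithmetic. This is only a small sketch: the variable names are illustrative, and the 50-day batch size is taken from the scraping cell further down.

```python
import math

tweets_per_day = 3000   # daily cap stated above
total_days = 370        # collection window stated above
days_per_run = 50       # batch size used in the scraping cell below

total_tweets = tweets_per_day * total_days
runs_needed = math.ceil(total_days / days_per_run)

print(f"{total_tweets:,} tweets in total")
print(f"{runs_needed} runs of up to {days_per_run} days each")
```

The last run would cover only the remaining 20 days rather than a full 50.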

In [4]:
""" Extracting 1.1 million tweets in one go requires more computing power than
Google Colab or Kaggle provides. To work around this, we run the program
several times.

For the first run, set days=0 in the following line:

             current_date = (datetime.now().date() - timedelta(days=0))

This gives us the current date.
Then set the variable days to 50 (line 6 of the code below):

             # days for which we will scrape the data
             days = 50

Run the rest of the code as it is. This scrapes 3,000 tweets per day for the last 50 days,
amounting to 150K tweets in one run.
Save the DataFrame to a CSV file and download it. Then, for the second run:
  --> set days in current_date to 50
and run the code without changing anything else, then save the new CSV file. For the third run,
change days in current_date to 100, and so on.

In this way, over several runs, you can download any number of tweets you want.
"""
# extracting today's date or relative date

current_date = (datetime.now().date() - timedelta(days=0))

# days for which we will scrape the data
days = 50

# max tweets to extract in a day
max_tweets = 3000

# list to store the tweets
tweets = list()

while days > 0:

    loop_date = current_date - timedelta(days=days)
    # one-day search window: since (loop_date - 1 day) until loop_date
    query = 'climate change since:{0} until:{1}'.format(loop_date - timedelta(days=1), loop_date)

    # Using a for loop to scrape tweet data on "Climate Change"
    for num_of_tweets, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        # enumerate starts at 0, so >= stops after exactly max_tweets tweets
        if num_of_tweets >= max_tweets:
            break
        tweets.append([tweet.date, tweet.id, tweet.rawContent, tweet.user.username,
                       tweet.likeCount, tweet.retweetCount, tweet.lang])
    days -= 1
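To see which date windows a run will request before spending time on the actual scrape, the query strings can be generated on their own. The sketch below mirrors the loop above without touching the network; `daily_queries` and its parameters are hypothetical names introduced here for illustration, and the fixed date is only an example.

```python
from datetime import date, timedelta

def daily_queries(end_date, days, keyword="climate change"):
    """Yield one search query per day, oldest first, like the while loop above."""
    for offset in range(days, 0, -1):
        loop_date = end_date - timedelta(days=offset)
        since = loop_date - timedelta(days=1)
        yield f"{keyword} since:{since} until:{loop_date}"

# Example: a 3-day run ending relative to 1 March 2023
queries = list(daily_queries(date(2023, 3, 1), days=3))
for q in queries:
    print(q)
```

Calling it again with `end_date` moved back by the run length shows the windows of the next run, which is a quick way to confirm that consecutive runs do not leave gaps.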

In [5]:
# converting the tweet data to a DataFrame
tweets_df = pd.DataFrame(tweets, columns=['Datetime', 'Tweeter Id', 'Content', 'Twitter Username',\
                                 'Likes','No of Retweets','Tweet Language'])

# display first 5 entries from dataframe
tweets_df.head()

| | Datetime | Tweeter Id | Content | Twitter Username | Likes | No of Retweets | Tweet Language |
|---|---|---|---|---|---|---|---|
| 0 | 2023-01-11 23:59:57+00:00 | 1613324868682153984 | “We will never survive the climate crisis with... | andrewpaulhill | 1 | 0 | en |
| 1 | 2023-01-11 23:59:52+00:00 | 1613324847488159750 | Growing up in the Ruby Sea, it was normal to w... | Keelster361 | 1 | 0 | en |
| 2 | 2023-01-11 23:59:39+00:00 | 1613324793247301632 | @RoKhanna I thought it was Climate Change ™️ | superdupler | 0 | 0 | en |
| 3 | 2023-01-11 23:59:38+00:00 | 1613324790642741250 | @JackPosobiec Yeap … know a people in the hole... | ellmaness | 0 | 0 | en |
| 4 | 2023-01-11 23:59:35+00:00 | 1613324775643815937 | Just when I think they can’t get any more stup... | javaluvingrams | 1 | 0 | en |


In [6]:
# saving the tweets data
tweets_df.to_csv(f'tweets_{current_date}.csv', index=False)
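Because the data is collected over several runs, each saving its own CSV file, the files need to be merged at some point. One possible way to do that is sketched below; `combine_runs` is a hypothetical helper rather than part of the original notebook, and it assumes every file has the columns written above, including `Tweeter Id`.

```python
import pandas as pd

def combine_runs(csv_paths, id_column="Tweeter Id"):
    """Concatenate per-run CSV files and keep each tweet only once."""
    frames = [pd.read_csv(path) for path in csv_paths]
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset=id_column).reset_index(drop=True)

# Example usage (assumes the per-run files sit in the working directory):
# import glob
# all_tweets = combine_runs(sorted(glob.glob("tweets_*.csv")))
# all_tweets.to_csv("tweets_all.csv", index=False)
```

Dropping duplicates on the tweet ID guards against a run being repeated by accident.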