# **Scraping data from Twitter using a Python package**

This tutorial walks through the process of scraping tweets from Twitter using the Twint Python library.

Twint is an advanced Twitter scraping tool written in Python that scrapes tweets from Twitter profiles without using Twitter's API. It uses Twitter's search operators to let you scrape tweets from specific users, or tweets relating to certain topics, hashtags, and trends.

The Twint documentation can be found at [this URL](https://github.com/twintproject/twint).

## Install and Load Libraries

The **twint** package is not pre-installed in the Colab environment, so run the following command to install it.  
If you are working in your own Python environment, you can install twint using the Anaconda interactive user interface or by running *pip install twint* from your command prompt.

In [14]:
!pip install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
# After installing, open the Runtime menu and select "Restart runtime" for the installation to take effect.
!pip install nest-asyncio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining twint from git+https://github.com/twintproject/twint.git@origin/master#egg=twint
  Updating ./src/twint clone (to revision origin/master)
  Running command git fetch -q --tags
  Running command git reset --hard -q origin/master
Installing collected packages: twint
  Attempting uninstall: twint
    Found existing installation: twint 2.1.21
    Uninstalling twint-2.1.21:
      Successfully uninstalled twint-2.1.21
  Running setup.py develop for twint
Successfully installed twint


In [1]:
import twint
import nest_asyncio
nest_asyncio.apply()

## Design the scraping parameters

Next, decide which parameters to set for the scrape. The parameters that can be customised are listed in the Twint configuration reference at [this URL](https://github.com/twintproject/twint/wiki/Configuration).

For this exercise, we will scrape tweets that contain specific keywords, starting from a given date.

In [2]:
keyword = 'LaTrobeBusiness'

In [3]:
since_date = "2020-03-01 00:00:00"  # Should be '%Y-%m-%d %H:%M:%S' time format
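The comment on `since_date` names the expected format, so it can be worth checking a date string before handing it to Twint. The following is a minimal sketch using only the standard library; `is_valid_since` is a hypothetical helper name, not part of Twint.

```python
from datetime import datetime

# Format taken from the comment on since_date above.
SINCE_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_valid_since(value):
    """Return True if value parses with SINCE_FORMAT, otherwise False."""
    try:
        datetime.strptime(value, SINCE_FORMAT)
    except ValueError:
        return False
    return True


print(is_valid_since("2020-03-01 00:00:00"))
print(is_valid_since("01/03/2020"))
```

A check like this catches a mistyped date before a search is started.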

Set the search parameters.

In [4]:
c = twint.Config()
c.Search = keyword
c.Since = since_date

Set the language filter and the output parameters.

In [5]:
c.Lang = "en"
c.Store_csv = True
c.Output = keyword

The output attributes can also be formatted.  
The available attributes are listed at [this URL](https://github.com/twintproject/twint/wiki/Tweet-attributes).

Here we use exception handling to report an error if the search fails. In Python this is done with a try-except block.

In [None]:
try:
  # Start search
  twint.run.Search(c)
except Exception as e:
  print(e)
  print('Error in {}'.format(keyword))

The data will be saved in a separate folder for each keyword, visible in the **Files** tab.  
To download it, right-click the tweets.csv file and select download.
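Once a scrape has finished, the saved file can also be inspected directly in Python. The sketch below assumes the output sits at `<keyword>/tweets.csv`, as described above. `read_rows` and `load_tweets` are hypothetical helper names, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def read_rows(lines):
    """Parse CSV text (any iterable of lines) into a list of dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def load_tweets(output_dir, filename="tweets.csv"):
    """Return the rows saved under output_dir, or an empty list if the file is missing."""
    path = Path(output_dir) / filename
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return read_rows(handle)


rows = load_tweets("LaTrobeBusiness")  # folder name assumed to match c.Output
print("Loaded {} rows".format(len(rows)))
if rows:
    print("Columns:", list(rows[0].keys()))
```

Returning an empty list for a missing file keeps the sketch safe to run before any scrape has completed.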

### Keyword combinations

You can search and scrape tweets using multiple keywords and combinations as documented in [this URL](https://github.com/twintproject/twint/issues/165#issuecomment-399747575).
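As an illustration, a combined query string can be assembled programmatically. This sketch assumes the commonly used Twitter search syntax (space-separated terms must all match, `OR` between alternatives, double quotes for an exact phrase, a leading `-` to exclude a term); check the linked comment for the combinations Twint actually supports. `build_query` is a hypothetical helper, not a Twint function.

```python
def build_query(all_of=(), any_of=(), exact=None, exclude=()):
    """Assemble a search string from keyword groups (assumed Twitter search syntax)."""
    parts = list(all_of)                      # terms that must all appear
    if exact:
        parts.append('"{}"'.format(exact))    # exact phrase in double quotes
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")  # any one of these terms
    parts.extend("-{}".format(word) for word in exclude)  # excluded terms
    return " ".join(parts)


print(build_query(all_of=["latrobe", "covid19"]))
print(build_query(exact="data analytics", any_of=["latrobe", "LBS"]))
```

The resulting string would then be assigned to `c.Search` in the same way as a single keyword.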

In [7]:
keyword = 'latrobe covid19'
since_date = "2020-03-01 00:00:00"  # Should be '%Y-%m-%d %H:%M:%S' time format
c = twint.Config()
c.Search = keyword
c.Limit = 40  # (Increments of 20) e.g., If 23, it will scrape 40. If 12, it will scrape 20.
c.Since = since_date
c.Store_csv = True
c.Output = keyword

In [None]:
try:
  # Start search
  twint.run.Search(c)
except Exception as e:
  print(e)
  print('Error in {}'.format(keyword))
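The comment on `c.Limit` says the limit works in increments of 20. That rounding can be sketched as below; `effective_limit` is a hypothetical helper, and the page size of 20 is taken from that comment rather than from Twint's source. The result is an upper bound, since fewer tweets may match the search.

```python
import math

PAGE_SIZE = 20  # increment stated in the comment on c.Limit


def effective_limit(requested, page_size=PAGE_SIZE):
    """Round a requested limit up to the next multiple of page_size."""
    if requested <= 0:
        raise ValueError("requested must be a positive integer")
    return math.ceil(requested / page_size) * page_size


print(effective_limit(23))  # 40
print(effective_limit(12))  # 20
```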

### Multiple keywords

We can loop over a list to scrape multiple keywords.

In [9]:
Keyword_list = ['LBS', 'data analytics latrobe']

In [10]:
since_date = "2020-03-01 00:00:00"  # Should be '%Y-%m-%d %H:%M:%S' time format

In [None]:
for key in Keyword_list:

  print('Searching Tweets for {}'.format(key))

  c = twint.Config()
  c.Search = key
  c.Limit = 40
  c.Since = since_date
  c.Store_csv = True
  c.Output = key

  try:
    # Start search
    twint.run.Search(c)
  except Exception as e:
    print(e)
    print('Error in {}'.format(key))
    continue  # Skip the completion message when the search failed

  print('\n\nCompleted {}\n\n'.format(key))
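When the keyword list grows, it may help to collect the failures instead of only printing them. The following is a rough sketch of that idea. `scrape_all` and `fake_search` are hypothetical names; in the notebook, the function passed as `run_search` would build a `twint.Config` and call `twint.run.Search(c)` as in the loop above. `fake_search` stands in for it here so that the sketch runs without network access.

```python
def scrape_all(keywords, run_search):
    """Call run_search(keyword) for each keyword.

    Returns a dict mapping each keyword to None on success,
    or to the error message if the search raised an exception.
    """
    results = {}
    for key in keywords:
        try:
            run_search(key)
        except Exception as exc:
            results[key] = str(exc)
        else:
            results[key] = None
    return results


def fake_search(key):
    """Stand-in for a real search: fails on blank keywords only."""
    if not key.strip():
        raise ValueError("empty keyword")


summary = scrape_all(['LBS', ' '], fake_search)
failed = [key for key, err in summary.items() if err is not None]
print(failed)
```

Passing the search function as an argument keeps the loop itself independent of Twint, so it can be tried out before any real scrape is run.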