# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper
<br>Package GitHub: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook uses the development version of snscrape.

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 28th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's Python wrapper.

<b>Dependencies: </b> 
- Your <b>Python</b> version must be <b>3.8</b> or higher; the development version of snscrape does not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>. If you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>. Dataframes make the data easy to manipulate and index. Using Pandas is more of a preference, but it is what this notebook follows.
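
Since the development version of snscrape needs Python 3.8 or higher, a quick check at the top of the notebook can catch an incompatible interpreter before installing anything. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of snscrape.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in the dependency list above

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the major.minor part of the version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if not meets_minimum():
    print("Python 3.8+ is required; found", sys.version.split()[0])
```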

In [2]:
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the below command if you don't already have Pandas
# !pip install pandas

# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Query by Username
The code below scrapes up to 100 tweets from a given username and then exports them to a CSV file with Pandas.

In [3]:
# Setting variables to be used below
maxTweets = 100

# Creating list to append tweet data to
tweets_list1 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
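for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    if i >= maxTweets:  # stop once maxTweets tweets have been collected
        break
    # Include the tweet id so each row matches the four dataframe columns used below
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])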
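
The loop above caps the number of results with `enumerate` and `break`. The same cap can be written with `itertools.islice`, which stops pulling from a generator after a fixed number of items. The sketch below uses a stand-in generator, `fake_items` (hypothetical, not part of snscrape), so it runs without network access; with the real scraper you would pass the result of `get_items()` in its place.

```python
from itertools import islice

def fake_items():
    """Stand-in for a scraper's get_items(): yields items indefinitely."""
    n = 0
    while True:
        yield {"id": n, "content": f"tweet {n}"}
        n += 1

def take(items, limit):
    """Collect at most `limit` items from any iterable, lazily."""
    return list(islice(items, limit))

max_tweets = 100
collected = take(fake_items(), max_tweets)
print(len(collected))  # 100
```

Because `islice` counts items rather than comparing an index, it sidesteps the question of whether the limit check should be `>` or `>=`.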

In [4]:
# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df1.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2021-10-05 12:02:17+00:00 | 1445358681017749507 | 6 | jack |
| 1 | 2021-10-04 18:57:05+00:00 | 1445100680990138372 | Signal is WhatsUp\n\n🆙 https://t.co/zpRrxf9qKP... | jack |
| 2 | 2021-10-04 18:18:31+00:00 | 1445090973772562432 | @usainbolt 😁 | jack |
| 3 | 2021-10-04 17:55:43+00:00 | 1445085236933664770 | wow this blew up\n\nhere’s a link to my SoundC... | jack |
| 4 | 2021-10-04 17:38:01+00:00 | 1445080782335365125 | @WhatsApp @obyezeks @Twitter thought this was ... | jack |


In [5]:
# Export dataframe into a CSV
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
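
Pandas is a preference here rather than a requirement. As a rough alternative, the same kind of rows can be written with Python's built-in `csv` module. The sample rows and the `write_tweets_csv` helper below are made up for illustration.

```python
import csv
import io

HEADER = ["Datetime", "Tweet Id", "Text", "Username"]

def write_tweets_csv(rows, fileobj):
    """Write a header plus one CSV record per tweet row to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(rows)

# Made-up sample rows in the same column order as the dataframe above
sample_rows = [
    ["2021-01-01 00:00:00+00:00", 1, "hello, world", "example_user"],
    ["2021-01-02 00:00:00+00:00", 2, "line one\nline two", "example_user"],
]

buffer = io.StringIO()
write_tweets_csv(sample_rows, buffer)
csv_text = buffer.getvalue()

# Reading the text back shows that commas and newlines inside a tweet survive quoting
parsed = list(csv.reader(io.StringIO(csv_text)))
```

To write to disk instead of an in-memory buffer, pass a file opened with `open('user-tweets.csv', 'w', newline='', encoding='utf-8')`.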

# Query by Text Search
The code below scrapes up to 5000 tweets matching a text search, posted from June 1st, 2021 up to (but not including) July 31st, 2021, and then exports them to a CSV file with Pandas.

In [11]:
# Setting variables to be used below
maxTweets = 5000

# Creating list to append tweet data to
tweets_list2 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('bitcoin since:2021-06-01 until:2021-07-31').get_items()):
    if i >= maxTweets:  # stop once maxTweets tweets have been collected
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])

In [13]:
# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df2.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2021-07-30 23:59:59+00:00 | 1421259310013763593 | Current stats of DELEGATE_DONT_HATE\nRank: 26\... | XCASH_HIPPIE |
| 1 | 2021-07-30 23:59:59+00:00 | 1421259308642177024 | Clear manipulation by whales in the market to ... | xTaySol |
| 2 | 2021-07-30 23:59:59+00:00 | 1421259307996401664 | Current stats of Pullki\nRank: 46\nBlocks Foun... | Pullki |
| 3 | 2021-07-30 23:59:59+00:00 | 1421259306641629184 | @BinanceChain #PlayToEarn #BscGameFi\nBuy 🥚$EG... | egg_chain |
| 4 | 2021-07-30 23:59:57+00:00 | 1421259299817508868 | long bit coin @ 41.8 shekels https://t.co/v4UM... | virtualmetrics |


In [14]:
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)
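
The search string used in this section combines free text with `since:` and `until:` operators in `YYYY-MM-DD` form. A small helper can assemble it from `date` objects, which avoids formatting typos. `build_query` is a hypothetical name for this sketch, not something snscrape provides.

```python
from datetime import date

def build_query(text, since, until):
    """Build a search string like 'bitcoin since:2021-06-01 until:2021-07-31'."""
    if since >= until:
        raise ValueError("since must be earlier than until")
    return f"{text} since:{since.isoformat()} until:{until.isoformat()}"

query = build_query("bitcoin", date(2021, 6, 1), date(2021, 7, 31))
print(query)  # bitcoin since:2021-06-01 until:2021-07-31
```

The resulting string can be passed to `TwitterSearchScraper` in place of the hard-coded query.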

In [None]:
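# Output into SQL Database
# Requires the mysql-connector-python package and a running MySQL server

import mysql.connector

# Connection settings are placeholders; add host and password as your server requires
tweetsdb = mysql.connector.connect(
    user="knightshade",
    database="mydatabase"
)

mycursor = tweetsdb.cursor()

# Placeholder insert: swap in your own table, columns, and tweet values
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

# Commit so the insert is persisted
tweetsdb.commit()

print(mycursor.rowcount, "record inserted.")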
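
For trying out the database step without a MySQL server, Python's built-in `sqlite3` module is a convenient stand-in. This sketch swaps SQLite in for MySQL, uses an in-memory database, and inserts made-up rows; the `tweets` table and its column names are assumptions, not part of the original notebook. Note that `sqlite3` uses `?` placeholders where `mysql.connector` uses `%s`.

```python
import sqlite3

# Made-up rows shaped like (datetime, tweet id, text, username)
sample_rows = [
    ("2021-01-01 00:00:00+00:00", 1, "first example tweet", "example_user"),
    ("2021-01-02 00:00:00+00:00", 2, "second example tweet", "example_user"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
cur = conn.cursor()
cur.execute(
    "CREATE TABLE tweets ("
    "created_at TEXT, tweet_id INTEGER PRIMARY KEY, content TEXT, username TEXT)"
)

# executemany runs one parameterized INSERT per row
cur.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", sample_rows)
conn.commit()

row_count = cur.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(row_count, "records inserted.")
```

With real data, the rows could come from the tweet lists built earlier, with datetimes converted to strings before inserting.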