Commit 55b4307

Merge pull request avinashkranjan#903 from RohiniRG/RohiniRG-twitterb
Twitter Scraper using snscrape
2 parents 5d5af7b + 16c5ce2 commit 55b4307

4 files changed: +167 -0 lines changed

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Tweet hashtag based scraper without Twitter API

- This project uses snscrape to scrape tweets associated with a particular hashtag. snscrape is a Python library that scrapes Twitter without the use of API keys.

- There are two scripts: one fetches tweets with snscrape and stores them in a database (SQLite3), and the other displays the tweets from the database.

- For each tweet, the hashtag, the tweet content, the username, and the tweet URL are stored in the database.

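The storage layout described in the bullets above can be sketched as follows. This is a minimal illustration rather than the project's own code: it uses the same `tweets` table definition that appears in the fetch script of this commit, but the in-memory database and the sample row are assumptions made up for the example.

```python
import sqlite3

# In-memory database used only for illustration; the project itself
# writes to a TwitterDatabase.db file on disk.
con = sqlite3.connect(':memory:')

# Same four text columns as the table created by the fetch script.
con.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")

# Hypothetical sample row (not real tweet data).
row = ('#python', 'example_user', 'Hello #python', 'https://example.com/status/1')
con.execute('INSERT INTO tweets(HASHTAG, USERNAME, CONTENT, URL) '
            'VALUES(?, ?, ?, ?)', row)
con.commit()

# Read the row back to confirm the column order.
stored = con.execute('SELECT HASHTAG, USERNAME, CONTENT, URL FROM tweets').fetchall()
print(stored)
```

Each row is a plain 4-tuple, which is why the display script can index into it by position.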
## Requirements

Install the required packages with:

```sh
$ pip install -r requirements.txt
```

## Running the script

To run the script that fetches tweets and related info for a hashtag and stores them in the database:
```sh
$ python fetch_hashtags.py
```

To run the script that displays the tweet info stored in the database:
```sh
$ python display_hashtags.py
```

## Working

`fetch_hashtags.py` works as follows:

![image](https://imgur.com/8YFK4OV.png)

`display_hashtags.py` works as follows:

![image](https://i.imgur.com/1uNEEMw.png)

## Author

[Rohini Rao](https://github.com/RohiniRG)
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
import sqlite3
import os


def sql_connection():
    """
    Establishes a connection to the SQL file database
    :return connection object:
    """
    path = os.path.abspath('./Twitter_Scraper_without_API/TwitterDatabase.db')
    con = sqlite3.connect(path)
    return con


def sql_fetcher(con):
    """
    Fetches all the tweets with the given hashtag from our database
    :param con:
    :return:
    """
    hashtag = input("\nEnter hashtag to search: #")
    hashtag = '#' + hashtag
    count = 0
    cur = con.cursor()
    cur.execute('SELECT * FROM tweets')  # SQL search query
    rows = cur.fetchall()

    for r in rows:
        if hashtag in r:
            count += 1
            print(f'USERNAME: {r[1]}\nTWEET CONTENT: {r[2]}\nURL: {r[3]}\n')

    if count:
        print(f'{count} tweets fetched from database')
    else:
        print('No tweets available for this hashtag')


con = sql_connection()

while 1:
    sql_fetcher(con)

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans == 'y':
        continue
    else:
        print('Exiting..')
        break
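The display script above loads every row and filters in Python with `hashtag in r`, which compares the hashtag against all four columns of each row. One possible alternative, sketched here under assumptions, is to let SQLite do the filtering with a parameterized `WHERE` clause on the `HASHTAG` column. The helper name `fetch_by_hashtag`, the in-memory database, and the sample rows are hypothetical and are not part of this commit.

```python
import sqlite3


def fetch_by_hashtag(con, hashtag):
    """Return (USERNAME, CONTENT, URL) rows stored under the exact hashtag.

    Hypothetical helper: the '?' placeholder lets sqlite3 bind the value
    safely instead of building the SQL string by hand.
    """
    cur = con.cursor()
    cur.execute('SELECT USERNAME, CONTENT, URL FROM tweets WHERE HASHTAG = ?',
                (hashtag,))
    return cur.fetchall()


# Illustration with an in-memory database and made-up rows.
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")
con.executemany('INSERT INTO tweets VALUES(?, ?, ?, ?)', [
    ('#python', 'user_a', 'first tweet', 'https://example.com/1'),
    ('#rust', 'user_b', 'second tweet', 'https://example.com/2'),
])
con.commit()

matches = fetch_by_hashtag(con, '#python')
for username, content, url in matches:
    print(f'USERNAME: {username}\nTWEET CONTENT: {content}\nURL: {url}\n')
```

Filtering in the query avoids reading the whole table into memory and only matches rows whose `HASHTAG` column equals the search term.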
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
import snscrape.modules.twitter as sntweets
import sqlite3


def sql_connection():
    """
    Establishes a connection to the SQL file database
    :return connection object:
    """
    con = sqlite3.connect('./Twitter_Scraper_without_API/TwitterDatabase.db')
    return con


def sql_table(con):
    """
    Creates a table in the database (if it does not exist already)
    to store the tweet info
    :param con:
    :return:
    """
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
                " CONTENT text, URL text)")
    con.commit()


def sql_insert_table(con, entities):
    """
    Inserts the desired data into the table to store tweet info
    :param con:
    :param entities:
    :return:
    """
    cur = con.cursor()
    cur.execute('INSERT INTO tweets(HASHTAG, USERNAME, CONTENT, '
                'URL) VALUES(?, ?, ?, ?)', entities)
    con.commit()


con = sql_connection()
sql_table(con)

while 1:
    tag = input('\n\nEnter a hashtag: #')
    max_count = int(input('Enter maximum number of tweets to be listed: '))

    count = 0
    # snscrape uses the given string of hashtag to find the desired amount of
    # tweets and associated info
    for i in sntweets.TwitterSearchScraper('#' + tag).get_items():
        count += 1
        entities = ('#'+tag, i.username, i.content, i.url)
        sql_insert_table(con, entities)

        if count == max_count:
            break

    print('Done!')

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans == 'y':
        continue
    else:
        print('Exiting..')
        break
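The fetch loop above stops when `count == max_count`, but `count` is incremented before the comparison, so entering `0` or a negative number never triggers the `break` and the loop runs until the scraper is exhausted. A possible way to cap the number of items is `itertools.islice`, sketched below. To keep the example self-contained, `fake_tweets` is a hypothetical stand-in for the generator returned by `get_items()`, and `take` is an illustrative helper name, not something defined in this commit.

```python
from itertools import islice


def fake_tweets():
    """Hypothetical endless generator standing in for the scraper's get_items()."""
    n = 0
    while True:
        n += 1
        yield f'tweet {n}'


def take(items, max_count):
    """Return at most max_count items; zero or negative counts yield nothing."""
    # islice rejects negative stop values, so clamp to 0 first.
    return list(islice(items, max(0, max_count)))


first_three = take(fake_tweets(), 3)
print(first_three)
print(take(fake_tweets(), 0))
```

Because `islice` stops pulling from the generator once the limit is reached, no manual counter or `break` is needed.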
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
beautifulsoup4==4.9.3
certifi==2020.12.5
chardet==4.0.0
idna==2.10
lxml==4.6.2
PySocks==1.7.1
requests==2.25.1
snscrape==0.3.4
soupsieve==2.2
urllib3==1.26.4
