# Building the Database

In the previous notebook, I played around with the GetOldTweets Python package I'd downloaded and pulled some tweets matching the search criteria I was interested in.

Now it's time to start creating the database that will hold these tweets.

Thinking back on the last project I did, I gathered way too much information. I collected nearly 100k tweets, along with identifying information such as the tweet ID, geolocation, timestamps, and more.

But for this project, I want to run a sentiment analysis, and I want to keep it pretty simple. At the bare minimum, I only need the tweet text. I could collect the hashtags too, but the preliminary data analysis showed that the search criteria match plain-text names ("Naomi Osaka") as well as hashtags ("#naomiosaka"), so I don't need to store that information twice.

I would, though, at least like to gather location information (if available) so I can see whether the sentiment towards Naomi and Serena differs depending on where in the world the person is tweeting from.

In that same vein, I'll be pulling Japanese-only tweets into separate tables, and I'm hoping to find a way to filter non-English tweets out of the English tables.

So, it looks like I will have a database of "Naomi Serena Tweets" with the following tables, delineated by search query:

* "naomi osaka"
* "serena williams"
* "大阪なおみ"
* "セレナウィリアムズ"

For each table, I'll need to populate it with the following information:

* tweetID
* tweet text
* tweet location

That will give me a unique ID to identify each tweet by, the main text of the tweet that I want to analyze, and the possible location of each tweet for further analysis.
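
As a rough sketch of how that layout might translate into SQLite, something like the following could work. The table names (naomi_osaka, naomi_osaka_ja and so on) and the helper create_tweet_tables are placeholders I made up for illustration, and using tweet_id itself as the primary key is an assumption, not a settled design. It runs against an in-memory database so nothing is written to disk.

```python
import sqlite3

# Hypothetical table names (my own placeholders), mapped to the search
# query each table would hold.
TABLES = {
    "naomi_osaka": "naomi osaka",
    "serena_williams": "serena williams",
    "naomi_osaka_ja": "大阪なおみ",
    "serena_williams_ja": "セレナウィリアムズ",
}


def create_tweet_tables(connection, table_names):
    """Create one table per name with the three fields listed above."""
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so only plain,
        # hard-coded identifiers are interpolated into the statement.
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
        connection.execute(
            f"""CREATE TABLE IF NOT EXISTS {name} (
                    tweet_id INTEGER PRIMARY KEY,  -- assumption: the tweet ID is the key
                    tweet_text TEXT NOT NULL,
                    tweet_loc TEXT                 -- NULL when no location is available
                )"""
        )
    connection.commit()


conn = sqlite3.connect(":memory:")  # in-memory, so no file is created
create_tweet_tables(conn, TABLES)

# List the tables SQLite now knows about.
created = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(created)
conn.close()
```

Keeping tweet_loc nullable reflects the "if available" caveat above: tweets without location data can still be stored.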

## Environment Setup
So that this project can be run entirely in Jupyter, I'm going to move my copy of GetOldTweets into this directory so I can call it without constantly appending it to sys.path. I'll also import sqlite3, the database module I'm going to be using.

In [9]:
import sys
sys.path.append("GetOldTweets-python-master/")
import got3
import sqlite3
from sqlite3 import Error

### Test DB
Before I create the real thing, I want to make sure I know how to use the module. So I'll be creating and populating test tables here.

In [10]:
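# Opening a connection creates test.db if it doesn't already exist
connect = sqlite3.connect('test.db')
# Version of the sqlite3 module (sqlite3.sqlite_version gives the SQLite library version)
print(sqlite3.version)
connect.close()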

2.6.0


That successfully created a database file named test.db in the current directory. Now I'll try adding a couple of tables to it:

In [11]:
connect = sqlite3.connect('test.db')
cursor = connect.cursor()

# Drop any earlier copies so this cell can be re-run, then create both test tables
cursor.executescript('''
    DROP TABLE IF EXISTS Test001;
    DROP TABLE IF EXISTS Test002;

    CREATE TABLE Test001 (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
        tweet_id INTEGER,
        tweet_text TEXT,
        tweet_loc TEXT);

    CREATE TABLE Test002 (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
        tweet_id INTEGER,
        tweet_text TEXT,
        tweet_loc TEXT);
''')

<sqlite3.Cursor at 0x22cffcf95e0>

In [12]:
connect.close()

In [13]:
print(cursor)

<sqlite3.Cursor object at 0x0000022CFFCF95E0>


Done! That wasn't too bad.
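
To double-check a table like this, one option is to ask SQLite which tables exist and round-trip a single row. This is only a sketch: it uses an in-memory database as a stand-in for test.db, and the inserted values are placeholders, not real tweet data.

```python
import sqlite3

# In-memory stand-in for test.db, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Test001 (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
        tweet_id INTEGER,
        tweet_text TEXT,
        tweet_loc TEXT)
""")

# sqlite_master is SQLite's built-in catalog of tables and indexes.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'Test%'")]

# Placeholder values only; the ? markers let sqlite3 handle quoting.
conn.execute(
    "INSERT INTO Test001 (tweet_id, tweet_text, tweet_loc) VALUES (?, ?, ?)",
    (1, "placeholder tweet text", None),
)
conn.commit()

rows = conn.execute(
    "SELECT id, tweet_id, tweet_text, tweet_loc FROM Test001").fetchall()
print(tables)
print(rows)
conn.close()
```

The id column fills itself in through AUTOINCREMENT, so the insert only needs to supply the three tweet fields.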

Okay. I think I'm ready to build the real database and start populating it. I'll be doing this in a separate notebook to keep everything neat and tidy.