# Wikipedia Events Database
---
  
**Goal:** Create a database for the events I retrieve from Wikipedia and write Python scripts for CRUD-ing data in the database.   
  
  
**Notes:** 
- this is a supporting notebook for the TADS_Wikipedia_this_day_in_history project  

### TO-DO:
---  
  
- [ ] create database  
- [ ] create tables  
- [x] create picture of the database schema
- [ ] write code for CRUD-ing data in the database
    - example

In [1]:
import sqlite3
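
The cells below use `DATABASE_FILE` and `DOE` without defining them in this notebook. A minimal sketch of what they might look like, assuming the database file name mentioned in the `add_day_data_to_db` docstring and an ISO-formatted date string for the date of entry (both values are assumptions, not taken from the project):

```python
import sqlite3
from datetime import date

# Assumption: the database file sits in the working directory; adjust the path as needed.
DATABASE_FILE = "wikipedia_tdih.db"

# Assumption: date of entry is stored as ISO 8601 text (YYYY-MM-DD) in the doe columns.
DOE = date.today().isoformat()
```

Note that a default argument such as `doe = DOE` is evaluated once, when the function is defined, so a long-running session would keep the date from when the cell was run.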

### Preliminary schema

In [3]:
%%html 
<img src = '../images/img_ss_database_schema_01_01feb21.jpg' width='70%'>


## Create database and tables

In [None]:
with sqlite3.connect(DATABASE_FILE) as conn:
    # initialize cursor
    c = conn.cursor()
    
    # create table wiki_day_data_log
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_day_data_log 
                (day_data_log_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 day INTEGER,
                 month INTEGER,
                 status_code INTEGER,
                 day_soup BLOB,
                 no_event_for TEXT)''')
    
    # create table wiki_link
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_link 
                (link_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_url TEXT UNIQUE)''')
    
    # create table wiki_event
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_event 
                (event_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 day INTEGER,
                 month INTEGER,
                 year INTEGER,
                 bc_ad TEXT,
                 bc_ad_note TEXT,
                 event_type TEXT,
                 event_description TEXT,
                 event_first_link_id INTEGER,
                 event_links_list BLOB,
                 FOREIGN KEY (event_first_link_id) REFERENCES wiki_link(link_id))''')
    

    # create table wiki_link_usage
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_link_usage 
                (link_usage_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_id INTEGER,
                 event_id INTEGER,
                 is_first_link INTEGER,
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id),
                 FOREIGN KEY (event_id) REFERENCES wiki_event(event_id))''')
    
    # create table wiki_link_data_log
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_link_data_log 
                (link_data_log_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_id INTEGER,
                 status_code INTEGER,
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id))''')
    
    # create table wiki_link_size
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_link_size 
                (link_size_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_id INTEGER, 
                 link_size INTEGER,
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id))''')
    
    # create table wiki_link_page_views
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_link_page_views 
                (link_page_views_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_id INTEGER,
                 page_views_date TEXT,
                 page_views INTEGER,
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id))''')
    
    # create table wiki_image
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_image 
                (image_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 image_url TEXT)''')
    
    # create table wiki_image_usage
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_image_usage
                (image_usage_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 image_id INTEGER,
                 link_id INTEGER,
                 FOREIGN KEY (image_id) REFERENCES wiki_image(image_id),
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id))''')
    
    # create table wiki_credit
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_credit 
                (credit_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 credit_username TEXT UNIQUE)''')
    
    # create table wiki_copyright_license
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_copyright_license
                (copyright_license_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 license_name TEXT,
                 license_description TEXT,
                 is_reuse_rights INTEGER)''')
    
    # create table wiki_image_info
    c.execute('''CREATE TABLE IF NOT EXISTS wiki_image_info 
                (image_info_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 image_id INTEGER,
                 copyright_license_id INTEGER,
                 image_credit_id INTEGER,
                 FOREIGN KEY (image_id) REFERENCES wiki_image(image_id),
                 FOREIGN KEY (copyright_license_id) REFERENCES wiki_copyright_license(copyright_license_id),
                 FOREIGN KEY (image_credit_id) REFERENCES wiki_credit(credit_id))''')   
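
SQLite parses `FOREIGN KEY` clauses but does not enforce them unless `PRAGMA foreign_keys = ON` is issued on each connection. The sketch below illustrates this on an in-memory database with two of the tables from the schema above. The in-memory database, the sample values, and the `tables` and `fk_enforced` names are assumptions for illustration only.

```python
import sqlite3

# Assumption: an in-memory database stands in for DATABASE_FILE.
conn = sqlite3.connect(":memory:")

# Foreign keys are only enforced after this pragma is set on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute('''CREATE TABLE IF NOT EXISTS wiki_link
                (link_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_url TEXT UNIQUE)''')
conn.execute('''CREATE TABLE IF NOT EXISTS wiki_link_size
                (link_size_id INTEGER PRIMARY KEY,
                 doe TEXT,
                 link_id INTEGER,
                 link_size INTEGER,
                 FOREIGN KEY (link_id) REFERENCES wiki_link(link_id))''')

# List the tables that now exist, as a quick check that creation worked.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Inserting a size for a link_id that is not in wiki_link should be rejected.
try:
    conn.execute(
        "INSERT INTO wiki_link_size (doe, link_id, link_size) VALUES (?, ?, ?)",
        ("2021-02-01", 999, 1234))
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

conn.close()
```

Without the pragma, the same insert would succeed and leave an orphaned row in `wiki_link_size`.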

## Database CRUD functions

In [None]:
def add_day_data_to_db(day_data_dict, doe = DOE):
    """
    Main function for adding a day's data to wikipedia_tdih.db.
    It takes the day data retrieved and cleaned up with day_data_main and uses it to update
    the following tables:
        - wiki_day_data_log
        - wiki_event
        - wiki_link (if link not already in wiki_link)
        - wiki_link_usage
    
    Params:
        day_data_dict: dictionary with day data (created with day_data_main)
        doe: date of entry (defaults to current day)
        
    Returns:
        update_results: a list of dictionaries with the status of each table update
    """
    update_results = []
    
    # make smaller dictionaries based on which table data belongs in
    # data for the wiki_day_data_log
    day_data = {key: day_data_dict[key] for key in ['status_code', 'day_soup', 'no_event_for']}
    # data for wiki_event (and, if needed, wiki_link)
    event_data = {key: day_data_dict.get(key, []) for key in ['events', 'births', 'deaths',
                                                            'holidays_and_observances']}
    
    # get day and month for the data being logged
    day = day_data_dict['day']
    month = day_data_dict['month']
    
    with sqlite3.connect(DATABASE_FILE) as conn:
        c = conn.cursor()
        
        # log data into wiki_day_data_log
        update_wiki_day_data_log = update_table_wiki_day_data_log(day_data, day, month, doe, c)
        update_results.append(update_wiki_day_data_log)
        
        # log data into wiki_event (and, if needed, in wiki_link)
        for event_type in event_data:
            update_wiki_event = update_table_wiki_event(event_data[event_type], event_type, doe, c)
            update_results.append(update_wiki_event)
            
    return update_results
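
The docstring above says `wiki_link` is only updated if the link is not already stored. Because `link_url` is declared `UNIQUE`, one way to do that is `INSERT OR IGNORE` followed by a lookup. This is a minimal sketch, not the project's implementation: `get_or_create_link_id`, the in-memory database, and the sample URL and dates are all hypothetical.

```python
import sqlite3


def get_or_create_link_id(c, link_url, doe):
    """Hypothetical helper: return the link_id for link_url, inserting the link if missing."""
    # The UNIQUE constraint on link_url makes this a no-op for links already stored.
    c.execute("INSERT OR IGNORE INTO wiki_link (doe, link_url) VALUES (?, ?)",
              (doe, link_url))
    c.execute("SELECT link_id FROM wiki_link WHERE link_url = ?", (link_url,))
    return c.fetchone()[0]


# Assumption: an in-memory database stands in for DATABASE_FILE.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS wiki_link
             (link_id INTEGER PRIMARY KEY,
              doe TEXT,
              link_url TEXT UNIQUE)''')

sample_url = "https://en.wikipedia.org/wiki/February_1"  # sample value only
first_id = get_or_create_link_id(c, sample_url, "2021-02-01")
second_id = get_or_create_link_id(c, sample_url, "2021-02-02")
link_count = c.execute("SELECT COUNT(*) FROM wiki_link").fetchone()[0]

conn.commit()
conn.close()
```

Both calls return the same `link_id`, and the table holds a single row that keeps the original `doe`.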

Flow:
---  
  
1. run day_data_main
    - returns nested dictionary day_data_dict
        - day
        - month
        - status_code
        - day_soup
        - events
        - births
        - deaths
        - holidays_and_observances
        - no_event_for  


2. log data into wiki_day_data_log
    - day_data_log_id
    - doe
    - day
    - month
    - status_code
    - day_soup
    - no_event_for
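
Step 2 could be sketched as follows. The function name and argument order match the call in `add_day_data_to_db`, but the body, the shape of the returned status dictionary, and the sample values are assumptions, since the helper is not defined in this notebook.

```python
import sqlite3


def update_table_wiki_day_data_log(day_data, day, month, doe, c):
    """Hypothetical sketch: insert one row into wiki_day_data_log and report the outcome."""
    try:
        c.execute(
            '''INSERT INTO wiki_day_data_log
               (doe, day, month, status_code, day_soup, no_event_for)
               VALUES (?, ?, ?, ?, ?, ?)''',
            (doe, day, month, day_data['status_code'],
             day_data['day_soup'], day_data['no_event_for']))
        return {'table': 'wiki_day_data_log', 'status': 'ok', 'row_id': c.lastrowid}
    except sqlite3.Error as err:
        # Report the failure instead of raising, so the caller can collect all results.
        return {'table': 'wiki_day_data_log', 'status': 'error', 'error': str(err)}


# Assumption: an in-memory database stands in for DATABASE_FILE.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS wiki_day_data_log
             (day_data_log_id INTEGER PRIMARY KEY,
              doe TEXT,
              day INTEGER,
              month INTEGER,
              status_code INTEGER,
              day_soup BLOB,
              no_event_for TEXT)''')

# Sample values only; the real day_soup would be the page content from day_data_main.
sample_day_data = {'status_code': 200, 'day_soup': b'<html></html>', 'no_event_for': ''}
result = update_table_wiki_day_data_log(sample_day_data, 1, 2, '2021-02-01', c)
conn.commit()

# Read the row back to confirm what was logged.
logged_row = conn.execute(
    "SELECT day, month, status_code FROM wiki_day_data_log").fetchone()
conn.close()
```

Returning a status dictionary rather than raising fits the `update_results` list that `add_day_data_to_db` builds, although the keys used here are only one possible layout.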