## **Setting up the SQL database (Fakenews data)**
This document is based on the fakenews dataset.

### **Preparing the data**
Once you have the dataset in CSV format, you first need to split it into smaller parts and clean up the data. This can be done with the script: https://github.com/JonathanAhrenkiel-Frellsen/Milestone_Datascience/blob/master/Organize_Daniel_crash.ipynb

This is the script we used in the milestone assignment.
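
To illustrate the idea of splitting and cleaning, here is a minimal sketch. It is not the notebook linked above: `split_csv`, the `part_001.csv` naming, and the "skip completely empty rows" rule are assumptions made for this example only.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_file=10000):
    """Split src_path into numbered CSV parts and drop completely empty rows.

    Hypothetical helper for illustration; returns the list of files written.
    """
    # Article bodies can be long, so raise the csv module's per-field limit
    csv.field_size_limit(10**9)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    part, out_file, writer, count = 0, None, None, 0
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for row in reader:
            # Basic clean-up: skip rows where every field is blank
            if not any(field.strip() for field in row):
                continue
            # Start a new part when none is open or the current one is full
            if writer is None or count >= rows_per_file:
                if out_file is not None:
                    out_file.close()
                part += 1
                path = os.path.join(out_dir, "part_{:03d}.csv".format(part))
                out_file = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out_file)
                writer.writerow(header)  # every part keeps the header row
                written.append(path)
                count = 0
            writer.writerow(row)
            count += 1
    if out_file is not None:
        out_file.close()
    return written
```

Each part repeats the header row, so it can later be loaded on its own with the CSV HEADER option.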

### **Setting up the tools**
Next you need to set up your database. You can do this in pgAdmin or in the psql terminal. I use psql here! The credentials to use in psql are just the defaults, followed by your password (the password does not need to be long; it is common for it to be just "root").

### **Creating the database tables**
Before creating the tables, it is a good idea to type "\d" in psql, which lists all existing tables. If the list is empty you can continue. To delete a table, write "DROP TABLE [table_name] **CASCADE**;" where [table_name] is the name of the table you want to delete. CASCADE is only necessary when the table you delete is related to other tables, or vice versa. **REMEMBER THE SEMICOLON**, I always forget it 😉

Once you have deleted any tables that should not be there, run the command "\i [path_to_sql_file].sql;", for example "\i C:/Users/jola1/Desktop/ERDreal.sql;". I think it is important to use forward slashes and not backslashes in the path, but I am not sure. Afterwards you can type "\d" again to check that your tables were created.
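
Since the semicolon is easy to forget, the DROP statement can also be generated. The sketch below is only an illustration: `drop_table_statement` is a hypothetical helper, it accepts only plain identifiers (optionally prefixed with a schema), and it adds `IF EXISTS`, which the command above does not use.

```python
import re

# Only plain, unquoted identifiers are accepted in this sketch
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def drop_table_statement(table_name, cascade=False):
    """Return a DROP TABLE statement that always ends with a semicolon."""
    parts = table_name.split(".")  # allow "schema.table"
    if len(parts) > 2 or not all(_IDENTIFIER.match(p) for p in parts):
        raise ValueError("unexpected table name: {!r}".format(table_name))
    # CASCADE also drops objects that depend on the table
    suffix = " CASCADE" if cascade else ""
    return "DROP TABLE IF EXISTS {0}{1};".format(table_name, suffix)
```

The returned string can be pasted into psql as it is.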

### **Populate database**
You have now prepared the CSV files (see the section on preparing the data) and created the tables (see the section on creating the database tables). Next, load the data from the CSV files into the database with psql. This is done with the command: "\COPY [table_name] FROM '[csv_file_position]' DELIMITER ',' CSV HEADER;"

Here is an example:

\COPY article(id,domain_id,type_id,url,content,title,meta_description,scraped_at,updated_at,inserted_at) FROM 'C:/Users/jola1/Desktop/Milestone_Datascience-master/article_clean.csv' DELIMITER ',' CSV HEADER;

or

\COPY article FROM 'C:/Users/jola1/Desktop/Milestone_Datascience-master/article_clean.csv' DELIMITER ',' CSV HEADER;
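
Typing the column list by hand is error-prone, so one option is to read it from the CSV header. This is a rough sketch under the assumption that the header names match the table's column names exactly; `copy_command` is a hypothetical helper, not part of the project code.

```python
import csv


def copy_command(table_name, csv_path):
    """Build a psql COPY meta-command whose column list comes from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))  # first row holds the column names
    columns = ",".join(name.strip() for name in header)
    # Use forward slashes in the path, as in the examples above
    path = csv_path.replace("\\", "/")
    return "\\COPY {0}({1}) FROM '{2}' DELIMITER ',' CSV HEADER;".format(
        table_name, columns, path)
```

The result has the same shape as the first example above and can be pasted into psql.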

The SQL file I use can be downloaded from here:

https://discordapp.com/channels/@me/690271126944874496/714802715682668585

## General useful commands
| psql command | Description |
| :------------ |:------------|
| \l | List available databases |
| \c database_name | Connect to a Database  |
| \dt | List available tables |
| \d table_name | Describe a table |
| \dn | List available schemas |
| \df | List available functions |
| \dv | List available views |
| \du | List users and their roles |
| \dt \*.\* | List tables in all schemas |
| \dt public.* | List tables in "public"-schema |
| \i filename | Execute psql commands from a file |

## CREATING VIEW

IMPORTANT: for now we have chosen to combine the following columns/tables.

The table below shows which columns have to be combined.

| FakeNews table | WikiNews table |
| ------------- |-------------|
| Content | content |
| title | title |
| type_id | [create type_id] = reliable_id |
| type | [create type] = reliable |
| Meta_keywords| Categories |
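
As a rough sketch of how the first three rows of this mapping could become a view, the helper below builds a `CREATE VIEW` statement. Everything specific here is an assumption: the helper `combined_view_sql`, the view name passed in, that `wikinews.article` has `content` and `title` columns with types compatible with `fakenews.article`, and that every WikiNews row gets one constant `reliable_id`. The `type` and keyword/category rows would need joins against other tables and are left out.

```python
def combined_view_sql(view_name, reliable_id):
    """Return a CREATE VIEW statement that stacks both article tables.

    Assumption: all WikiNews articles share one type id (reliable_id).
    """
    return (
        "CREATE OR REPLACE VIEW {view} AS\n"
        "SELECT content, title, type_id FROM fakenews.article\n"
        "UNION ALL\n"
        "SELECT content, title, {rid} AS type_id FROM wikinews.article;"
    ).format(view=view_name, rid=int(reliable_id))
```

`UNION ALL` keeps duplicate rows, which avoids comparing long `content` values between the two sources.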

## Populating the database

Below are function declarations that make the pipeline simpler.

In [7]:
import psycopg2 # postgres lib
import os       # work with path

# get abs_path for a file
def file_path(rel_path="", file_name=""):
    # get absolute path of file
    path = os.path.abspath(rel_path+file_name)

    # always use "/"
    newPath = '/'.join(path.split('\\'))

    return newPath

# Query a database
def SQL_query(query="", use_database= "data_science"):
    # Defined up front so the finally block also works if connect() fails
    connection = None
    cursor = None
    try:
        # Connect to an existing database
        connection = psycopg2.connect(user = "postgres",
                                    password = "root",
                                    host = "127.0.0.1",
                                    port = "5432",
                                    database = use_database)
        # Open a cursor to perform database operations
        cursor = connection.cursor()
        # Execute a command
        cursor.execute(query)
        # Make the changes to the database persistent
        connection.commit()

    # Print the error if something is wrong, e.g. with the setup
    except (Exception, psycopg2.Error) as error:
        print("[PostgreSQL Error] -", error)
    finally:
        # Close the cursor and the connection if they were opened
        if cursor is not None:
            cursor.close()
        if connection is not None:
            connection.close()
            print("[PostgreSQL connection is closed]")

# Generate a COPY file for one SCHEMA
def create_database_copy_file(SHCEMA="SHCEMA.", csv_location="", tables=["table01"], return_path=""):
    # SHCEMA must end with "." (e.g. "fakenews."), so the file is named COPY_for_fakenews.sql
    with open(return_path+"COPY_for_"+SHCEMA+"sql", "w") as SQL_COPY_file:
        SQL_COPY_file.write("/* This file is unique to one pc, use Setup_SQL.ipynb to generate local */\n\n")

        # Remove existing rows from every table
        SQL_COPY_file.write("/* Remove all data from SCHEMA */\n")
        for t in tables:
            TABLE_COPY = "TRUNCATE TABLE {0} CASCADE;".format(SHCEMA+t)
            SQL_COPY_file.write(TABLE_COPY+"\n\n")

        # COPY each csv file into its table
        SQL_COPY_file.write("/* Used for encoding error */\nSET CLIENT_ENCODING TO 'utf8';\n\n")
        SQL_COPY_file.write("/* COPY into SCHEMA */\n")
        for t in tables:
            TABLE_COPY = "COPY {0} FROM '{1}' DELIMITER ',' CSV HEADER;".format(SHCEMA+t , file_path(csv_location, t+".csv"))
            SQL_COPY_file.write(TABLE_COPY+"\n\n")

## Generate copy files

At present you have to go into your database yourself and run all the commands from the COPY files.

In [8]:
# tables used in each SCHEMA
fakenews_tables = ['domain_name', 'type', 'article', 'tags', 'tags_in', 'authors', 'authors_in', 'meta_keywords', 'meta_keywords_in' ]
wikinews_tables = ['article', 'sources', 'source_to', 'in_category']

# create COPY for fakenews
create_database_copy_file(SHCEMA="fakenews.", csv_location="../Data_git_ignore/clean_csv/", tables=fakenews_tables, return_path="../Code/")

# create COPY for wikinews
create_database_copy_file(SHCEMA="wikinews.", csv_location="../Data_git_ignore/clean_csv/", tables=wikinews_tables, return_path="../Code/")
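
One possible way to avoid running the generated commands by hand is to read a COPY file back in and send its contents through `SQL_query`. The sketch below covers only the reading step; `read_sql_file` is a hypothetical helper, and it assumes the file contains nothing but `/* ... */` comments and plain statements, as written by `create_database_copy_file`.

```python
import re


def read_sql_file(path):
    """Return the statements of an SQL file with /* ... */ comments removed."""
    # No explicit encoding, to match how the COPY file was written
    with open(path, "r") as sql_file:
        text = sql_file.read()
    # Drop block comments such as the ones in the generated COPY files
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Remove the blank lines left behind
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(lines)
```

A call such as `SQL_query(read_sql_file("../Code/COPY_for_fakenews.sql"))` would then send all statements in one go. Note that plain `COPY` (unlike `\COPY` in psql) reads the CSV file on the database server, so this assumes the server runs on the same machine and can read those paths.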

# BELOW ARE JUST THINGS, BUT DON'T DELETE THEM

In [141]:
t = 'Domain_Name'
SQL_query("TRUNCATE TABLE fakenews.Domain_Name CASCADE; COPY fakenews.Domain_Name FROM 'd:/Personal/OneDrive/KU-uni/DataScience/Python/Datascience_Final_Project/Appendix/Data_git_ignore/clean_csv/Domain_Name_clean.csv' DELIMITER ',' CSV HEADER;")

[PostgreSQL connection is closed]


In [108]:
SQL_query("TRUNCATE TABLE fakenews.Domain_Name CASCADE;")

[PostgreSQL Error] - no results to fetch
[PostgreSQL connection is closed]


In [107]:
SQL_query("SELECT domain_id  FROM fakenews.Domain_Name;")

[PostgreSQL return] -  [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,), (11,), (12,), (13,), (14,), (15,), (16,), (17,), (18,), (19,), (20,), (21,), (22,), (23,), (24,), (25,), (26,), (27,)] 

[PostgreSQL connection is closed]


In [164]:
file_path("../Code/", "fakenews.sql")

'd:/Personal/OneDrive/KU-uni/DataScience/Python/Datascience_Final_Project/Appendix/Code/fakenews.sql'

In [None]:
with open('Path/to/file', 'r') as content_file:
    content = content_file.read()