# Import police data file

This Docker container may serve as a starting point for your course project.
It includes a set of simple instructions to:

- load a local dataset,
- store it in a database,
- query the data from the database, and
- make simple visualizations of the queried data.

In [2]:
# Imports

from sqlalchemy import create_engine, text, inspect, Table
import pandas as pd

## Load csv file

Load the file called policedata.csv into a pandas dataframe. Make sure you parse the columns correctly.

In [3]:
import os

# Get the current working directory
current_directory = os.getcwd()

# Get the parent directory (directory above the current directory)
parent_directory = os.path.dirname(current_directory)

# List all folders in the parent directory
folders_in_parent_directory = [folder for folder in os.listdir(parent_directory) if os.path.isdir(os.path.join(parent_directory, folder))]

# Print the list of folders
print("Folders in the parent directory:")
for folder in folders_in_parent_directory:
    print(folder)


Folders in the parent directory:
etc
proc
root
mnt
home
boot
dev
opt
lib
srv
sys
usr
media
var
bin
sbin
tmp
run
notebook
data
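
The listing above shows a `data` folder next to `notebook`. As a small sketch, the helper below lists the CSV files in a folder and returns an empty list when the folder does not exist, so a wrong path shows up before `pd.read_csv` is called. The name `list_csv_files` is made up for illustration and is not part of the original notebook.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the sorted names of the .csv files directly inside `directory`.

    Hypothetical helper; returns an empty list if the folder is missing.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.csv") if p.is_file())


# "../data" is the relative path this notebook reads from
print(list_csv_files("../data"))
```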


In [4]:
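# Load the csv into a pandas dataframe (https://www.w3schools.com/python/pandas/pandas_dataframes.asp)
# Note: the header in the output below is separated by semicolons, so the default
# comma separator reads each row into a single column; passing sep=";" to
# pd.read_csv would split the columns
policefile = pd.read_csv("../data/policedata.csv")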

print(policefile)

      ID;"SoortMisdrijf";"RegioS";"Perioden";"GeregistreerdeMisdrijven_1";"Aangiften_2";"Internetaangiften_3"
0      142;"0.0.0 ";"NL01  ";"2022JJ00";"  799681";" ...                                                     
1      293;"0.0.0 ";"LD01  ";"2022JJ00";"   63445";" ...                                                     
2      444;"0.0.0 ";"LD02  ";"2022JJ00";"  144450";" ...                                                     
3      595;"0.0.0 ";"LD03  ";"2022JJ00";"  420994";" ...                                                     
4      746;"0.0.0 ";"LDG4  ";"2022JJ00";"       .";" ...                                                     
...                                                  ...                                                     
22120  3340262;"3.9.3 ";"RE07  ";"2022JJ00";"      84...                                                     
22121  3340413;"3.9.3 ";"RE08  ";"2022JJ00";"      50...                                                     
22122  334
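
The output above has a single column because `pd.read_csv` splits on commas by default, while the header line shows semicolons between the fields. The following is a minimal sketch of one way to parse such a file. The sample rows are made up in the layout of the header above (they are not taken from policedata.csv), and treating a lone `.` as a missing value is an assumption.

```python
import io

import pandas as pd

# Made-up sample in the same layout as the header shown above.
# The values are illustrative only, not real data.
sample = io.StringIO(
    'ID;"SoortMisdrijf";"RegioS";"Perioden";'
    '"GeregistreerdeMisdrijven_1";"Aangiften_2";"Internetaangiften_3"\n'
    '1;"0.0.0 ";"NL01  ";"2022JJ00";"     100";"      80";"      20"\n'
    '2;"0.0.0 ";"LDG4  ";"2022JJ00";"       .";"       .";"       ."\n'
)

# sep=";" splits the fields; dtype=str keeps the padded text untouched for now
parsed = pd.read_csv(sample, sep=";", dtype=str)

# Remove the padding spaces inside the quoted fields
parsed = parsed.apply(lambda col: col.str.strip())

# Convert the count columns to numbers; "." becomes NaN (assumed to mean "missing")
count_cols = ["GeregistreerdeMisdrijven_1", "Aangiften_2", "Internetaangiften_3"]
parsed[count_cols] = parsed[count_cols].apply(pd.to_numeric, errors="coerce")

print(parsed.dtypes)
print(parsed)
```

With the real file, the same arguments could be passed to `pd.read_csv("../data/policedata.csv", sep=";", dtype=str)`.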

## Store data into database
Save the contents of the policedata file to a table called crimes in the database.

In [11]:
# Create a SQLAlchemy engine to connect to the PostgreSQL database
engine = create_engine("postgresql://student:infomdss@db_dashboard:5432/dashboard")

# Open a connection inside a transaction using the engine
# 'engine.begin()' commits the transaction and closes the connection when the block ends
with engine.begin() as conn:
    # Execute an SQL command to drop the 'crimes' table if it exists
    # The text() function allows you to execute raw SQL statements
    conn.execute(text("DROP TABLE IF EXISTS crimes CASCADE;"))

# Write the data from the 'policefile' DataFrame to a new 'crimes' table in the database
# If the 'crimes' table already exists, it will be replaced with the new data
# to_sql returns the number of rows affected as reported by the database driver;
# this can differ from the number of rows written (125 below, for a DataFrame of 22125 rows)
policefile.to_sql("crimes", engine, if_exists="replace", index=True)

125
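
Because the value returned by `to_sql` may not equal the number of rows written, counting the rows in the table is a more direct check. The sketch below assumes an in-memory SQLite database (via the standard-library `sqlite3` module) as a stand-in for the PostgreSQL container, and uses a small made-up dataframe in place of `policefile`.

```python
import sqlite3

import pandas as pd

# Made-up stand-in for the police dataframe
frame = pd.DataFrame(
    {"RegioS": ["NL01", "LD01", "LD02"], "Perioden": ["2022JJ00"] * 3}
)

# In-memory SQLite database, used here only so the example runs anywhere
conn = sqlite3.connect(":memory:")

# Same call shape as in the notebook: replace the table and store the index
frame.to_sql("crimes", conn, if_exists="replace", index=True)

# Count the rows that actually ended up in the table
stored_rows = conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]
print(stored_rows, len(frame))

conn.close()
```

Against the PostgreSQL database, the same `SELECT COUNT(*)` could be run through `conn.execute(text(...))` on the SQLAlchemy engine.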

## Fetch data from database
Read the table **crimes** from the database into a dataframe. Make sure the index column is the index of the dataframe.

In [13]:
# Read data from the SQL table named 'crimes' using pandas
# 'pd.read_sql_table' is a pandas function that reads a whole SQL table into a dataframe
# 'engine' is the SQLAlchemy engine created earlier
crimes = pd.read_sql_table('crimes', engine, index_col='index')

# This line prints the DataFrame (pandas truncates long output)
print(crimes)

# Note that we transformed the data from a .csv file to a pandas dataframe
# Then loaded the dataframe into the database table
# And now we have pulled the data from the database and put it in a dataframe again
# This is an example of how you might store and fetch data to and from your database for your dashboard

      ID;"SoortMisdrijf";"RegioS";"Perioden";"GeregistreerdeMisdrijve
index                                                                
0      142;"0.0.0 ";"NL01  ";"2022JJ00";"  799681";" ...             
1      293;"0.0.0 ";"LD01  ";"2022JJ00";"   63445";" ...             
2      444;"0.0.0 ";"LD02  ";"2022JJ00";"  144450";" ...             
3      595;"0.0.0 ";"LD03  ";"2022JJ00";"  420994";" ...             
4      746;"0.0.0 ";"LDG4  ";"2022JJ00";"       .";" ...             
...                                                  ...             
22120  3340262;"3.9.3 ";"RE07  ";"2022JJ00";"      84...             
22121  3340413;"3.9.3 ";"RE08  ";"2022JJ00";"      50...             
22122  3340564;"3.9.3 ";"RE09  ";"2022JJ00";"      52...             
22123  3340715;"3.9.3 ";"RE10  ";"2022JJ00";"      62...             
22124  3340866;"3.9.3 ";"RE99  ";"2022JJ00";"       ....             

[22125 rows x 1 columns]
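
For a dashboard it is often enough to fetch only part of a table. As a sketch of that idea, the example below filters rows in SQL with `pd.read_sql_query` instead of reading the whole table. It again assumes an in-memory SQLite database as a stand-in for PostgreSQL; the column name `Misdrijven` and its values are made up, and the `?` placeholder is specific to `sqlite3` (other drivers use a different placeholder style).

```python
import sqlite3

import pandas as pd

# Made-up table contents, for illustration only
toy = pd.DataFrame(
    {"RegioS": ["NL01", "LD01", "LD02"], "Misdrijven": [10, 4, 6]}
)

conn = sqlite3.connect(":memory:")
toy.to_sql("crimes", conn, if_exists="replace", index=True)

# Let the database do the filtering; the parameter is passed separately
# from the SQL text rather than formatted into the string
subset = pd.read_sql_query(
    'SELECT "index", "RegioS", "Misdrijven" FROM crimes '
    'WHERE "Misdrijven" >= ? ORDER BY "index"',
    conn,
    params=(5,),
    index_col="index",
)
print(subset)

conn.close()
```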
