# Notes on SQLAlchemy Python 
## Reference examples, tips, and best practices

Based on the course: Understanding Databases with SQLAlchemy: Python Data Playbook

Author: Gonçalo Felício  
Date: 03/2022  
Provided by: ISIWAY

A pocketbook to come back to for quick reference, examples, and best-practice tips, compiled according to my own preferences.  

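## SQLAlchemy
SQLAlchemy is a Python toolkit for working with multiple types of databases. It can be described as an Object-Relational Mapper (ORM): tables and queries are handled through Python objects and composable functions, which is a more pythonic way of working than writing SQL strings by hand.  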

First we download the databases from the exercise files of the course and load them into MySQL.  
To load a database into the MySQL Server, open a terminal and run:

In [None]:
# change directory to the sql client dir
cd C:\Program Files\MySQL\MySQL Server 8.0\bin

# Connect to MySQL server with the mysql client
> mysql -u root -p
# enter password
Enter password: *********

# load database file with source command, include the path to the database file
mysql > source C:\Users\Goncalo\Documents\Databases\sqlAlchemy\sqlalchemy_mysql.sql
    
# Done! To check that it worked, run:
mysql > SHOW DATABASES;

mysql > USE sqlalchemy_mysql;
mysql > SELECT * FROM posts LIMIT 5;

# if the database did not load, try running the source command again. Make sure the server is running in 'Services'

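Advantages of using SQLAlchemy:  
- Can be more robust  
- Reduces the chance of SQL syntax errors, since queries are built from Python objects  
- Scalability  
- More secure: bound parameters help protect against SQL injection  

On the other hand, raw SQL can also be used:
- More flexible  
- The full query is written as a string 
- Can give better performance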
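
To make the comparison concrete, here is a minimal sketch that runs the same query both ways. It assumes SQLAlchemy 1.4 or newer, and it uses an in-memory SQLite database with a hypothetical two-column `posts` table as a stand-in for the MySQL database, so that it runs without a server.

```python
import sqlalchemy as db

# In-memory SQLite stands in for the MySQL server (assumption for this sketch)
engine = db.create_engine("sqlite:///:memory:")
metadata = db.MetaData()

# Hypothetical, simplified version of the posts table
posts = db.Table(
    "posts", metadata,
    db.Column("Id", db.Integer, primary_key=True),
    db.Column("Score", db.Integer),
)
metadata.create_all(engine)

# engine.begin() commits the inserts when the block exits
with engine.begin() as conn:
    conn.execute(db.insert(posts), [{"Id": 1, "Score": 9}, {"Id": 2, "Score": 21}])

with engine.connect() as conn:
    # Raw SQL: the full query is a string, with a bound parameter for the value
    raw_query = db.text("SELECT Id, Score FROM posts WHERE Score > :min_score")
    raw_rows = [tuple(r) for r in conn.execute(raw_query, {"min_score": 10})]

    # SQLAlchemy expression: the query is built from Python objects
    expr_query = db.select(posts.c.Id, posts.c.Score).where(posts.c.Score > 10)
    expr_rows = [tuple(r) for r in conn.execute(expr_query)]

print(raw_rows, expr_rows)
```

Both queries return the same rows. In the raw version a typo in the string is only caught by the database, while in the expression version a misspelled column name fails in Python before any SQL is sent.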


In [3]:
import sqlalchemy as db

In [4]:
engine = db.create_engine(
    'mysql+mysqlconnector://root:Parolo_987@localhost:3306/sqlalchemy_mysql'
)

In [5]:
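# engine.table_names() is deprecated since SQLAlchemy 1.4 and removed in 2.0; use the inspector instead
db.inspect(engine).get_table_names()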


['posts', 'tags', 'users']

The main takeaway is that SQLAlchemy is a mapper: it translates object-oriented Python code into SQL queries. The corresponding exercise files show many of the available commands, with examples.
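
One way to see this translation is to build a query from Python objects and print the SQL that SQLAlchemy generates for it. This is only a sketch, assuming SQLAlchemy 1.4 or newer; the `posts` table definition here is a hypothetical two-column version, and no database connection is needed.

```python
import sqlalchemy as db

metadata = db.MetaData()

# Hypothetical, simplified description of the posts table
posts = db.Table(
    "posts", metadata,
    db.Column("Id", db.Integer, primary_key=True),
    db.Column("Score", db.Integer),
)

# Build the query from Python objects; nothing is executed here
query = db.select(posts.c.Id).where(posts.c.Score > 10).limit(5)

# str() renders the statement with the default (generic) dialect
generated_sql = str(query)
print(generated_sql)
```

The printed statement uses placeholders for the values `10` and `5`; the actual values are sent separately as bound parameters when the query is executed on an engine.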

The commands below load a database table into a pandas DataFrame. Once the data is loaded, all the capabilities of pandas are available to analyse the table:

In [6]:
import pandas as pd

In [7]:
query = 'SELECT * FROM posts'
posts_df = pd.read_sql_query(query, engine)

In [8]:
posts_df.head()

Unnamed: 0,Id,AcceptedAnswerId,AnswerCount,Body,ClosedDate,CommentCount,CommunityOwnedDate,CreationDate,FavoriteCount,LastActivityDate,...,LastEditorDisplayName,LastEditorUserId,OwnerDisplayName,OwnerUserId,ParentId,PostTypeId,Score,Tags,Title,ViewCount
0,5,,1.0,,2014-05-14 14:40:26,1,NaT,2014-05-13 23:58:30,1.0,2014-05-14 00:36:31,...,,,,5.0,,1,9.0,<machine-learning>,How can I do simple machine learning without h...,448.0
1,7,10.0,3.0,,2014-05-14 08:40:55,4,NaT,2014-05-14 00:11:06,1.0,2014-05-16 13:45:00,...,,97.0,,36.0,,1,4.0,<education><open-source>,What open-source books (or other materials) pr...,388.0
2,9,,,,NaT,0,NaT,2014-05-14 00:36:31,,2014-05-14 00:36:31,...,,,,51.0,5.0,2,5.0,,,
3,10,,,,NaT,1,NaT,2014-05-14 00:53:43,,2014-05-14 00:53:43,...,,,,22.0,7.0,2,12.0,,,
4,14,29.0,4.0,,NaT,1,NaT,2014-05-14 01:26:00,4.0,2014-06-20 17:36:05,...,,322.0,,66.0,,1,21.0,<data-mining><definitions>,Is Data Science the Same as Data Mining?,1243.0


In [11]:
posts_df.columns

Index(['Id', 'AcceptedAnswerId', 'AnswerCount', 'Body', 'ClosedDate',
       'CommentCount', 'CommunityOwnedDate', 'CreationDate', 'FavoriteCount',
       'LastActivityDate', 'LastEditDate', 'LastEditorDisplayName',
       'LastEditorUserId', 'OwnerDisplayName', 'OwnerUserId', 'ParentId',
       'PostTypeId', 'Score', 'Tags', 'Title', 'ViewCount'],
      dtype='object')

In [20]:
# ViewCount is left out of the result because some of its values are None instead of NaN, which gives a type conflict of strings and ints
# one way to solve this is to replace all None values with NaN
posts_df[['ViewCount','AnswerCount']].max()


AnswerCount    34.0
dtype: float64

In [21]:
posts_df[['ViewCount','AnswerCount']].describe()

Unnamed: 0,AnswerCount
count,11798.0
mean,1.136294
std,1.143618
min,0.0
25%,0.0
50%,1.0
75%,2.0
max,34.0
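
The fix mentioned in the comment above (replacing `None` with `NaN`) could look like the sketch below. It is not taken from the course: `demo_df` is a small made-up stand-in for `posts_df`, and `pd.to_numeric` with `errors="coerce"` is just one possible way to do the conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for posts_df: ViewCount mixes numbers and None (object dtype)
demo_df = pd.DataFrame({
    "ViewCount": pd.Series([448, None, 1243], dtype="object"),
    "AnswerCount": [1.0, np.nan, 4.0],
})

# Convert to a numeric column; None (and anything non-numeric) becomes NaN
demo_df["ViewCount"] = pd.to_numeric(demo_df["ViewCount"], errors="coerce")

# Both columns are now numeric, so both appear in the result
view_max = demo_df[["ViewCount", "AnswerCount"]].max()
print(view_max)
```

Applied to the real table, the same line would be `posts_df["ViewCount"] = pd.to_numeric(posts_df["ViewCount"], errors="coerce")`, after which `max()` and `describe()` should report both columns.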
