#  From .csv to SQL (hispasonic 2)

<br>

### This is the second part of the hispasonic project.

![todo.png](todo.png)


The aims of this part are:

- Establish communication with MySQL from a Jupyter notebook.

- Create the database and dump the contents of the .csv file into it.

<br>


The third part of this project (hispasonic 3) will run the corresponding queries against the database to extract insights from the data.

In [1]:
import pandas as pd
import datetime as dt

In [2]:
data = pd.read_csv("htmls/df_hispa_1092022.csv", index_col=[0])  # the first (unnamed) column is used as the index

In [3]:
data.head(2)

Unnamed: 0,urgent,buy,change,sell,price,gift,search,repair,parts,synt_brand,description,city,published,expire,date_scrapped,seen,anon_user
1,0,0,0,1,350,0,0,0,0,dreadbox,dreadbox nyx v1,Valencia,08/05/2022,30/09/2022,1/9/2022,742,1073
2,0,0,0,1,350,0,0,0,0,modal electronics,cobalt 5s,Castellón,25/07/2022,23/09/2022,1/9/2022,350,1845


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 805 entries, 1 to 805
Data columns (total 17 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   urgent         805 non-null    int64 
 1   buy            805 non-null    int64 
 2   change         805 non-null    int64 
 3   sell           805 non-null    int64 
 4   price          805 non-null    int64 
 5   gift           805 non-null    int64 
 6   search         805 non-null    int64 
 7   repair         805 non-null    int64 
 8   parts          805 non-null    int64 
 9   synt_brand     805 non-null    object
 10  description    805 non-null    object
 11  city           805 non-null    object
 12  published      805 non-null    object
 13  expire         805 non-null    object
 14  date_scrapped  805 non-null    object
 15  seen           805 non-null    int64 
 16  anon_user      805 non-null    int64 
dtypes: int64(11), object(6)
memory usage: 113.2+ KB


### Change date columns to datetime


In [5]:
def date_totime(time_str):
    # parse a day/month/year string such as "30/09/2022" into a datetime
    return dt.datetime.strptime(time_str, "%d/%m/%Y")

In [6]:
for column in ['date_scrapped','published','expire']:
    data[column] = data[column].apply(date_totime)
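
As an alternative to applying `date_totime` value by value, the same conversion can be sketched with `pd.to_datetime`, which parses a whole column at once. The small frame below is only a stand-in built from the date values of the first two rows shown above; the format string is the one already used in `date_totime`.

```python
import pandas as pd

# Stand-in frame holding only the three date columns (first two rows of the csv).
sample = pd.DataFrame({
    "published": ["08/05/2022", "25/07/2022"],
    "expire": ["30/09/2022", "23/09/2022"],
    "date_scrapped": ["1/9/2022", "1/9/2022"],
})

for column in ["date_scrapped", "published", "expire"]:
    # An explicit format forces day-first parsing and rejects other layouts.
    sample[column] = pd.to_datetime(sample[column], format="%d/%m/%Y")

sample.info()
```

Passing the format explicitly avoids any ambiguity between day-first and month-first dates such as `1/9/2022`.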

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 805 entries, 1 to 805
Data columns (total 17 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   urgent         805 non-null    int64         
 1   buy            805 non-null    int64         
 2   change         805 non-null    int64         
 3   sell           805 non-null    int64         
 4   price          805 non-null    int64         
 5   gift           805 non-null    int64         
 6   search         805 non-null    int64         
 7   repair         805 non-null    int64         
 8   parts          805 non-null    int64         
 9   synt_brand     805 non-null    object        
 10  description    805 non-null    object        
 11  city           805 non-null    object        
 12  published      805 non-null    datetime64[ns]
 13  expire         805 non-null    datetime64[ns]
 14  date_scrapped  805 non-null    datetime64[ns]
 15  seen           805 non-null    int64
 16  anon_user      805 non-null    int64
dtypes: datetime64[ns](3), int64(11), object(3)

### Is the MySQL service running?

I check it in the terminal:


    (base) $ systemctl list-units --type=service | grep mysql
     mysql.service                                         loaded active running MySQL Community Server
         
Yes, the service is loaded, active and running.
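
The same check can be approximated from inside the notebook. The sketch below assumes the server accepts TCP connections on MySQL's default port 3306 on localhost; the helper name `port_is_open` is made up for this example. It only shows that something is listening on that port, not that it is MySQL.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 3306, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


print("MySQL port reachable:", port_is_open())
```

A server configured for socket-only access would show up as unreachable here even though the service is running, so the `systemctl` check remains the more reliable one.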

In [8]:
import sys      # functions and variables to interact with the Python runtime environment
import pymysql  # pure-Python MySQL client library (installed in the conda env)
import sqlalchemy
from sqlalchemy import create_engine  # creates the engine used to connect to the database

In [9]:
! mysql -V

mysql  Ver 8.0.30-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))


### Accessing MySQL

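#### Establish a connection to the database using the `create_engine()` function of `SQLAlchemy`.

`create_engine()` is lazy: it does not open a connection until the engine is first used, so the `hispasonic` database named here can be created in a later step.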

In [10]:
# create sqlalchemy engine
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="ion",
                               pw="jjxx33pp",
                               db="hispasonic"))
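
Writing the password directly in the notebook exposes it to anyone who reads the file. One possible alternative, sketched below, reads the credentials from environment variables and escapes them before building the URL. The variable names `MYSQL_USER`, `MYSQL_PASSWORD` and `MYSQL_HOST` and the helper `mysql_url` are assumptions made for this example, not something the project defines.

```python
import os
from urllib.parse import quote_plus


def mysql_url(db: str = "") -> str:
    """Build a mysql+pymysql URL from environment variables (hypothetical names)."""
    user = os.environ["MYSQL_USER"]
    password = os.environ["MYSQL_PASSWORD"]
    host = os.environ.get("MYSQL_HOST", "localhost")
    # quote_plus escapes characters such as '@' or '/' that would break the URL.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"


# Usage, once the variables are set in the shell that starts Jupyter:
# engine = create_engine(mysql_url("hispasonic"))
```

The resulting string can be passed to `create_engine()` in the same way as the formatted string above.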

In [11]:
sqlalchemy.engine

<module 'sqlalchemy.engine' from '/home/ion/anaconda3/envs/dataquest/lib/python3.8/site-packages/sqlalchemy/engine/__init__.py'>

- We can now load the previously installed SQL extension:

In [12]:
%load_ext sql

In [13]:
%sql mysql+pymysql://ion:jjxx33pp@localhost/

'Connected: ion@'

### Which databases already exist?

- The connection is established, so I can list the existing databases.

In [14]:
%%sql

show databases;

 * mysql+pymysql://ion:***@localhost/
4 rows affected.


Database
information_schema
mysql
performance_schema
sys


### Creating a new database to hold the contents of the csv


- I create a new database into which all the content of the csv will be dumped.

In [15]:
%%sql

create database hispasonic;

 * mysql+pymysql://ion:***@localhost/
1 rows affected.


[]
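
Running `create database hispasonic;` a second time fails because the database already exists. MySQL's `IF NOT EXISTS` clause makes the statement safe to re-run. The helper below is a hypothetical sketch that only builds such a statement as a string, after checking that the name is a plain identifier; it does not talk to the server.

```python
import re


def create_database_statement(name: str) -> str:
    """Build an idempotent CREATE DATABASE statement for a simple identifier."""
    # Only letters, digits and underscores are accepted, to avoid SQL injection.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsafe database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS `{name}`;"


print(create_database_statement("hispasonic"))
```

The printed statement can be pasted into a `%%sql` cell in place of the plain `create database` command.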

- I verify that the database was created successfully.

In [16]:
%%sql

show databases;

 * mysql+pymysql://ion:***@localhost/
5 rows affected.


Database
hispasonic
information_schema
mysql
performance_schema
sys


### Selecting the database

- To be able to use the new database, I first need to select it.

In [17]:
%%sql

use hispasonic;

 * mysql+pymysql://ion:***@localhost/
0 rows affected.


[]

### Dumping all csv content into the hispasonic database

In [18]:
data.to_sql('hispasonic',engine)
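
By default `to_sql` uses `if_exists='fail'` and raises a `ValueError` when the table already exists, so re-running the cell above would fail. A re-runnable load could look like the sketch below. It uses an in-memory SQLite connection purely as a stand-in so that it runs without a MySQL server; the two-row frame and the index label `id` are illustrative assumptions.

```python
import sqlite3

import pandas as pd

# Stand-in frame with two of the real columns and a 1-based index.
sample = pd.DataFrame(
    {"price": [350, 900], "synt_brand": ["dreadbox", "moog"]},
    index=[1, 2],
)

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL engine

# Loading twice shows that if_exists="replace" keeps the cell re-runnable.
for _ in range(2):
    sample.to_sql("hispasonic", conn, if_exists="replace", index_label="id")

loaded = pd.read_sql("SELECT COUNT(*) AS n FROM hispasonic", conn)
print(loaded)
```

With the real data the call would take the SQLAlchemy `engine` instead of `conn`. Note that `replace` drops the existing table first, so it should only be used when the csv is the single source of truth.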

- Checking that everything was loaded correctly.

In [19]:
%%sql

select * from hispasonic
limit 5;

 * mysql+pymysql://ion:***@localhost/
5 rows affected.


index,urgent,buy,change,sell,price,gift,search,repair,parts,synt_brand,description,city,published,expire,date_scrapped,seen,anon_user
1,0,0,0,1,350,0,0,0,0,dreadbox,dreadbox nyx v1,Valencia,2022-05-08 00:00:00,2022-09-30 00:00:00,2022-09-01 00:00:00,742,1073
2,0,0,0,1,350,0,0,0,0,modal electronics,cobalt 5s,Castellón,2022-07-25 00:00:00,2022-09-23 00:00:00,2022-09-01 00:00:00,350,1845
3,0,0,0,1,900,0,0,0,0,moog,moog little phatty stage 2,Baleares,2021-07-10 00:00:00,2022-09-28 00:00:00,2022-09-01 00:00:00,584,635
4,0,0,0,1,1100,0,0,0,0,kawai,kawai mp 11,Sevilla,2022-08-31 00:00:00,2022-10-30 00:00:00,2022-09-01 00:00:00,96,1879
5,0,0,0,1,480,0,0,0,0,mpc,mpc live ssd 250 envío incluido,Albacete,2022-08-25 00:00:00,2022-10-25 00:00:00,2022-09-01 00:00:00,226,438


In [None]:
%%sql

-- emergency reset only: drops the whole database, including the loaded table
drop database hispasonic;