## Loading Scraped Data into MySQL Database
In this notebook, I load the cleaned university data into a MySQL database. The data is first read from a CSV file with pandas and then inserted into a MySQL table using pymysql and SQLAlchemy.

Prerequisites:

- Python with the pandas, pymysql, and SQLAlchemy libraries installed.
- A MySQL server that is running and accessible.
- The cleaned data in universities_clean.csv.
- The SQL script files mysql_create_table.sql and mysql_upsert.sql, which create the MySQL table and upsert the data into it.

## 1. Import Libraries

In [2]:
import pandas as pd     # for data manipulation
import pymysql          # pymysql to connect to MySQL
from sqlalchemy import create_engine       # to create the database engine

## 2. Read the Cleaned Data

In [3]:
# read the data into df
df = pd.read_csv('universities_clean.csv')

In [4]:
# display the first 5 rows
df.head()

|   | Country | University | Founded | Type | Enrollment | Link |
|---|---|---|---|---|---|---|
| 0 | Albania | University of Tirana | 1957 | Public | 35000 | https://en.wikipedia.org/wiki/University_of_Ti... |
| 1 | Algeria | Constantine University | 1978 | Public | 85000 | https://en.wikipedia.org/wiki/List_of_universi... |
| 2 | Angola | Agostinho Neto University | 1962 | Public | 29827 | https://en.wikipedia.org/wiki/Agostinho_Neto_U... |
| 3 | Argentina | University of Buenos Aires | 1821 | Public | 311175 | https://en.wikipedia.org/wiki/University_of_Bu... |
| 4 | Australia | Monash University | 1958 | Public | 73807 | https://en.wikipedia.org/wiki/Monash_University |
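Every column in the target university table is declared NOT NULL (see the CREATE TABLE script in section 5), so a quick check for missing values before loading can catch rows the database would likely reject. The helper below is a minimal sketch; `null_counts` is a hypothetical name, not part of the original notebook.

```python
import pandas as pd


def null_counts(df: pd.DataFrame) -> dict[str, int]:
    """Return {column: missing-value count} for every column with at least one gap.

    Hypothetical helper: the university table declares all columns NOT NULL,
    so any column reported here would need cleaning before the insert.
    """
    counts = df.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}


# Usage in the notebook (df is the DataFrame read from universities_clean.csv):
# null_counts(df)  # an empty dict means no missing values
```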


## 3. Establish Connection to MySQL Database

In [None]:
# define the parameters and set up the connection to the database
conn = pymysql.connect(
    host='localhost',  # Hostname of the MySQL server
    port=3306,  # Port number of the MySQL server
    user='amdariuser',  # Username for the MySQL database
    password='amdariuserpassword',  # Password for the MySQL database
    database='amdaridb'  # Name of the database to connect to
)

# Create a cursor object to execute SQL queries
cursor = conn.cursor()
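The credentials above are hard-coded in the notebook. A common alternative is to read them from environment variables so the password is not saved with the notebook. The sketch below assumes hypothetical variable names (`MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DATABASE`); it only builds the keyword arguments for `pymysql.connect`, so it can be checked without a running server.

```python
import os
from typing import Mapping


def mysql_params(env: Mapping[str, str] = os.environ) -> dict:
    """Build pymysql.connect() keyword arguments from environment variables.

    The MYSQL_* variable names are assumptions for illustration.
    Host, port and database fall back to the notebook's values; user and
    password are required and raise KeyError if unset.
    """
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        "database": env.get("MYSQL_DATABASE", "amdaridb"),
    }


# Usage (requires a reachable MySQL server):
# conn = pymysql.connect(**mysql_params())
```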

## 4. Verify Connection and Database State

In [7]:
# Execute SQL command to show all databases
cursor.execute('SHOW DATABASES;')

3

In [8]:
# check the results to verify the connection
cursor.fetchall()

(('amdaridb',), ('information_schema',), ('performance_schema',))

In [9]:
# Execute SQL command to show all tables in the current database
cursor.execute('SHOW TABLES;')

0

In [10]:
# an empty result confirms the database has no tables yet
cursor.fetchall()

()

## 5. Create Table in MySQL

In [22]:
# read the CREATE TABLE statement from the SQL script (the file is closed automatically)
with open('./sql/mysql_create_table.sql') as create_query_file:
    create_query = create_query_file.read()
create_query

'CREATE TABLE university(\n    id BIGINT AUTO_INCREMENT NOT NULL,\n    country VARCHAR(255) NOT NULL,\n    name VARCHAR(255) NOT NULL,\n    founded INT NOT NULL,\n    type VARCHAR(255) NOT NULL,\n    enrollment BIGINT NOT NULL,\n    link VARCHAR(255) NOT NULL,\n\n\n    UNIQUE KEY unique_combination (country, name),\n    PRIMARY KEY (id)\n);'

In [23]:
cursor.execute(create_query)

0
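Running the CREATE TABLE cell a second time fails because the university table already exists. One way to make the step re-runnable is to change the script to `CREATE TABLE IF NOT EXISTS university(...)`, which MySQL supports. Recent pymysql versions also do not enable multi-statement mode by default, so each script should hold a single statement. The helper below is a hypothetical sketch that reads a script without leaving the file handle open and executes it on any DB-API cursor, which also lets it be exercised with a stand-in object.

```python
from pathlib import Path


def run_sql_file(cursor, path: str) -> int:
    """Read one SQL statement from `path` and execute it on `cursor`.

    Hypothetical helper: assumes the file holds a single statement.
    Returns whatever cursor.execute() returns (pymysql: affected rows).
    """
    query = Path(path).read_text(encoding="utf-8").strip()
    if not query:
        raise ValueError(f"{path} is empty")
    return cursor.execute(query)


# Usage in the notebook:
# run_sql_file(cursor, './sql/mysql_create_table.sql')
```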

## 6. Prepare Data for Insertion

In [31]:
# one plain tuple per row, without the DataFrame index
data = list(df.itertuples(index=False, name=None))
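`itertuples` yields values in the DataFrame's column order, and the upsert's `%s` placeholders are filled positionally, so the CSV columns must line up with the INSERT column list (country, name, founded, type, enrollment, link). Selecting the columns explicitly makes that dependency visible and fails loudly if a column is missing. The CSV column names below are taken from the df.head() output; the helper name `to_rows` is hypothetical.

```python
import pandas as pd

# CSV columns in the order the upsert expects:
# INSERT INTO university (country, name, founded, type, enrollment, link)
CSV_COLUMNS = ["Country", "University", "Founded", "Type", "Enrollment", "Link"]


def to_rows(df: pd.DataFrame) -> list[tuple]:
    """Return one plain tuple per row, ordered to match the INSERT column list.

    Raises KeyError if any expected column is missing from df.
    """
    ordered = df[CSV_COLUMNS]
    return list(ordered.itertuples(index=False, name=None))


# Usage in the notebook:
# data = to_rows(df)
```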

In [32]:
data

[('Albania',
  'University of Tirana',
  1957,
  'Public',
  35000,
  'https://en.wikipedia.org/wiki/University_of_Tirana'),
 ('Algeria',
  'Constantine University',
  1978,
  'Public',
  85000,
  'https://en.wikipedia.org/wiki/List_of_universities_in_Algeria'),
 ('Angola',
  'Agostinho Neto University',
  1962,
  'Public',
  29827,
  'https://en.wikipedia.org/wiki/Agostinho_Neto_University'),
 ('Argentina',
  'University of Buenos Aires',
  1821,
  'Public',
  311175,
  'https://en.wikipedia.org/wiki/University_of_Buenos_Aires'),
 ('Australia',
  'Monash University',
  1958,
  'Public',
  73807,
  'https://en.wikipedia.org/wiki/Monash_University'),
 ('Austria',
  'University of Vienna',
  1365,
  'Public',
  91000,
  'https://en.wikipedia.org/wiki/University_of_Vienna'),
 ('Bangladesh',
  'National University, Bangladesh',
  1992,
  'Public',
  2097,
  'https://en.wikipedia.org/wiki/National_University,_Bangladesh'),
 ('Belarus',
  'Belarusian State University',
  1921,
  'Public',
  ...

## 7. Create SQLAlchemy Engine

In [27]:
# wrap the existing pymysql connection so pandas can query through SQLAlchemy
engine = create_engine('mysql+pymysql://', creator=lambda: conn)

In [28]:
pd.read_sql('SELECT * FROM university;', con=engine)

|   | id | country | name | founded | type | enrollment | link |
|---|---|---|---|---|---|---|---|


## 8. Load Data into the Database
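Open and read the SQL script file mysql_upsert.sql. The script contains an INSERT ... ON DUPLICATE KEY UPDATE statement: it inserts each scraped row and, when a row with the same (country, name) pair already exists, updates its founded, type, enrollment, and link values instead.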

In [34]:
# read the upsert statement from the SQL script (the file is closed automatically)
with open('./sql/mysql_upsert.sql') as merge_query_file:
    merge_query = merge_query_file.read()
merge_query

'-- Active: 1720513473051@@127.0.0.1@3306@amdaridb\nINSERT INTO university (country, name, founded, type, enrollment, link)\nVALUES (%s, %s, %s, %s, %s, %s)\nON DUPLICATE KEY UPDATE\n    founded = VALUES(founded),\n    type = VALUES(type),\n    enrollment = VALUES(enrollment),\n    link = VALUES(link);\n'

In [35]:
cursor.executemany(merge_query, data)

69

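In [1]:
# save changes to the database (conn must exist in the current kernel session)
conn.commit()

NameError: name 'conn' is not defined

The NameError in the saved output comes from re-running this cell in a fresh kernel session (note the execution count of 1), where conn had not been defined. Run the commit in the same session as the cells above, right after executemany: pymysql does not autocommit by default, so uncommitted rows are discarded when the connection closes.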
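The commit has to run in the same kernel session as the executemany call. Wrapping both in one function keeps them together and rolls the transaction back if any row fails. This is a minimal sketch, not the notebook's original code; `upsert_rows` is a hypothetical name, and it relies only on the DB-API `cursor()`, `commit()` and `rollback()` methods that pymysql connections provide. For INSERT ... ON DUPLICATE KEY UPDATE, MySQL counts 1 affected row per inserted row and 2 per updated row, so the returned count need not equal the number of rows in the table, which may explain why executemany reported 69 above.

```python
def upsert_rows(conn, query: str, rows: list) -> int:
    """Run the upsert for all rows in a single transaction.

    Commits when every row succeeds; otherwise rolls back and re-raises.
    Returns the driver's affected-row count.
    """
    cursor = conn.cursor()
    try:
        affected = cursor.executemany(query, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()  # release the cursor whether or not the upsert succeeded
    return affected


# Usage in the notebook:
# upsert_rows(conn, merge_query, data)
```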

## 9. Verify Data Loading

In [39]:
# read the contents of the university table again to verify that the data has been inserted successfully
pd.read_sql("SELECT * FROM university;", con=engine)

|   | id | country | name | founded | type | enrollment | link |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Albania | University of Tirana | 1957 | Public | 35000 | https://en.wikipedia.org/wiki/University_of_Ti... |
| 1 | 2 | Algeria | Constantine University | 1978 | Public | 85000 | https://en.wikipedia.org/wiki/List_of_universi... |
| 2 | 3 | Angola | Agostinho Neto University | 1962 | Public | 29827 | https://en.wikipedia.org/wiki/Agostinho_Neto_U... |
| 3 | 4 | Argentina | University of Buenos Aires | 1821 | Public | 311175 | https://en.wikipedia.org/wiki/University_of_Bu... |
| 4 | 5 | Australia | Monash University | 1958 | Public | 73807 | https://en.wikipedia.org/wiki/Monash_University |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | 65 | Turkey | Anadolu University | 1958 | Public | 1969 | https://en.wikipedia.org/wiki/Anadolu_University |
| 65 | 66 | United Kingdom | Open University | 1969 | Public | 253075 | https://en.wikipedia.org/wiki/Open_University |
| 66 | 67 | United States | Texas A&M University | 1876 | Public | 73284 | https://en.wikipedia.org/wiki/Texas_A%26M_Univ... |
| 67 | 68 | Uruguay | University of the Republic | 1949 | Public | 144108 | https://en.wikipedia.org/wiki/University_of_th... |
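Eyeballing the first and last rows confirms that data arrived, but comparing keys catches rows that silently went missing. The sketch below compares the (country, name) pairs from the CSV with those read back from MySQL, the same pair the table uses as its unique key; duplicate pairs collapse into one row under the upsert, so they are dropped before comparing. `missing_keys` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def missing_keys(source: pd.DataFrame, loaded: pd.DataFrame) -> pd.DataFrame:
    """Return (country, name) pairs present in the CSV data but not in the table.

    `source` uses the CSV headers (Country, University); `loaded` uses the
    table's column names (country, name), as returned by pd.read_sql.
    """
    expected = (
        source[["Country", "University"]]
        .rename(columns={"Country": "country", "University": "name"})
        .drop_duplicates()
    )
    # left join with an indicator column: "left_only" marks pairs absent from MySQL
    merged = expected.merge(
        loaded[["country", "name"]].drop_duplicates(),
        on=["country", "name"],
        how="left",
        indicator=True,
    )
    missing = merged[merged["_merge"] == "left_only"]
    return missing[["country", "name"]].reset_index(drop=True)


# Usage in the notebook:
# loaded = pd.read_sql("SELECT country, name FROM university;", con=engine)
# missing_keys(df, loaded)  # an empty DataFrame means every row was loaded
```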


## Conclusion
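I loaded the cleaned data from a CSV file into a MySQL database using pandas, pymysql, and SQLAlchemy. Because the load is an upsert keyed on (country, name), re-running it updates existing rows instead of duplicating them, and the table can now be queried directly in MySQL or read back into pandas.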