# Tips for MySQL

This is the course material for CS150A (2025-2026) at ShanghaiTech.

Author: Yixi Zhou

**Reference:** Parts of this notebook are adapted from the course *CS639: Data Management for Data Science* at the University of Wisconsin–Madison.  
Source: [GitHub Repository](https://github.com/CS639-Data-Management-for-Data-Science/s25)


## SQL 1

#### Installation requirements

Install the required packages (`pandas`, `sqlalchemy`, and `mysql-connector-python`) either with `pip3 install` in your SSH session or from a notebook cell.

In [36]:
!pip install sqlalchemy mysql-connector-python




In [37]:
from sqlalchemy import create_engine, text
import pandas as pd

In [38]:
engine = create_engine("mysql+mysqlconnector://root:123456@127.0.0.1:3306/cs150")
conn = engine.connect()

In [None]:
list(conn.execute(text("show tables;")))
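
The connection string passed to `create_engine` has the shape `dialect+driver://user:password@host:port/database`. As a minimal sketch, the hypothetical helper `build_mysql_url` below (not part of SQLAlchemy) assembles such a string and percent-encodes the credentials, which matters when a password contains characters such as `@` or `/`.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="127.0.0.1", port=3306, database="cs150"):
    """Assemble a SQLAlchemy URL for the mysql-connector-python driver.

    Hypothetical helper for illustration; the defaults mirror the notebook.
    """
    # Percent-encode the credentials so "@" or "/" inside them
    # cannot be mistaken for URL delimiters.
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_mysql_url("root", "123456")
tricky = build_mysql_url("root", "p@ss/word")  # made-up password with special characters
print(url)
print(tricky)
```

The first URL matches the one used in the cell above, so `create_engine(url)` should behave the same way.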

### Table creation

#### `Students` table
Let's create the `Students` table with columns:
- `sid(int)` - primary key
- `name(text)` - required
- `gpa(float)`

In [13]:
conn.execute(text("""
    create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x10dd0f150>

In [14]:
list(conn.execute(text("show tables;")))

[('Students',)]
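
To see what `primary key` and `NOT NULL` actually enforce, here is a small sketch that runs the same `create table` statement and then tries a few inserts. It assumes Python's built-in `sqlite3` with an in-memory database as a stand-in for the MySQL server, so it runs without any setup; MySQL reports different error messages, but the constraints behave the same way for these cases. The helper `try_insert` and the student with sid 103 are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)
db.execute("INSERT INTO Students (sid, name, gpa) VALUES (101, 'Tong', 4.0)")


def try_insert(sql):
    """Return None if the insert succeeds, else the exception class name."""
    try:
        db.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return type(exc).__name__


# name is required, so leaving it out is rejected.
missing_name = try_insert("INSERT INTO Students (sid, gpa) VALUES (102, 3.5)")
# sid is the primary key, so a second row with sid 101 is rejected.
duplicate_sid = try_insert(
    "INSERT INTO Students (sid, name, gpa) VALUES (101, 'Yang', 3.8)"
)
# gpa has no constraint, so it may be left out (stored as NULL).
missing_gpa = try_insert("INSERT INTO Students (sid, name) VALUES (103, 'Li')")

print(missing_name, duplicate_sid, missing_gpa)
```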

### Inserting data

Let's add a student, for example `(101, "Tong", 4.0)`.

In [15]:
conn.execute(text("""
    INSERT INTO Students (sid, name, gpa) VALUES (101, "Tong", 4.0)
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x10dd0f7e0>

Add another student.

In [16]:
conn.execute(text("""
    INSERT INTO Students (sid, name, gpa) VALUES (123, "Yang", 3.8)
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x10dd0f8c0>
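
When the values come from Python variables, it is usually safer to pass them as bound parameters than to paste them into the SQL string. The sketch below shows the idea with `:name` placeholders. It assumes the built-in `sqlite3` module as a stand-in for MySQL so that it runs on its own.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)

# One statement with named placeholders, reused for every row.
insert_sql = "INSERT INTO Students (sid, name, gpa) VALUES (:sid, :name, :gpa)"
new_students = [
    {"sid": 101, "name": "Tong", "gpa": 4.0},
    {"sid": 123, "name": "Yang", "gpa": 3.8},
]
db.executemany(insert_sql, new_students)
db.commit()

rows = db.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(rows)
```

With the notebook's SQLAlchemy connection, the same placeholder style works by passing a dictionary as the second argument, for example `conn.execute(text(insert_sql), {"sid": 101, "name": "Tong", "gpa": 4.0})`.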

### Projection aka `SELECT` clause in SQL

Retrieving all or specific columns from a table.

In [17]:
pd.read_sql("SELECT * FROM Students", conn)

|   | sid | name | gpa |
|---|-----|------|-----|
| 0 | 101 | Tong | 4.0 |
| 1 | 123 | Yang | 3.8 |
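
`SELECT *` returns every column; listing column names returns only those columns, and `AS` renames a column in the result. A minimal sketch, assuming the built-in `sqlite3` module as a stand-in for MySQL (the alias `student_name` is made up for illustration):

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)
db.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(101, "Tong", 4.0), (123, "Yang", 3.8)],
)

# All columns.
all_columns = db.execute("SELECT * FROM Students ORDER BY sid").fetchall()
# Only two of the three columns.
name_and_gpa = db.execute("SELECT name, gpa FROM Students ORDER BY sid").fetchall()
# Rename a column in the result with AS.
renamed = db.execute("SELECT name AS student_name FROM Students")
column_names = [col[0] for col in renamed.description]

print(all_columns)
print(name_and_gpa)
print(column_names)
```

In the notebook, the equivalent call would be `pd.read_sql("SELECT name, gpa FROM Students", conn)`.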


### Updating data

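Let's update the GPA of the student with `sid` 101 (Tong). The statement below sets it to 4.0, which is the value it already has, so the table looks the same afterwards.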

In [18]:
conn.execute(text("""
    UPDATE Students SET gpa = 4.0 WHERE sid = 101;
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x1187907c0>

In [19]:
pd.read_sql("SELECT * FROM Students", conn)

|   | sid | name | gpa |
|---|-----|------|-----|
| 0 | 101 | Tong | 4.0 |
| 1 | 123 | Yang | 3.8 |
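
The `WHERE` clause decides which rows an `UPDATE` touches; without it, every row is changed. The sketch below makes a visible change (a hypothetical new GPA of 3.7 for sid 123) and checks how many rows each statement modified. It assumes the built-in `sqlite3` module as a stand-in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)
db.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(101, "Tong", 4.0), (123, "Yang", 3.8)],
)

# Hypothetical change: set the GPA of sid 123 to 3.7.
matched = db.execute("UPDATE Students SET gpa = 3.7 WHERE sid = 123").rowcount
# No student has sid 999, so nothing is modified.
unmatched = db.execute("UPDATE Students SET gpa = 0.0 WHERE sid = 999").rowcount

rows = db.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(matched, unmatched, rows)
```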


#### `Courses` table
Let's create the `Courses` table with columns:
- `cid(int)` - primary key
- `cname(text)` - required
- `credits(int)` - required

In [20]:
conn.execute(text("""
    create table Courses (cid int, cname text NOT NULL, credits int NOT NULL, primary key(cid))
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118790c20>

In [21]:
list(conn.execute(text("show tables;")))

[('Courses',), ('Students',)]

### Table deletion

What if we wanted to delete a table?

In [22]:
conn.execute(text("drop table Courses"))

<sqlalchemy.engine.cursor.CursorResult at 0x118790e50>
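
Dropping a table that does not exist is an error, which gets in the way when a notebook cell is re-run. `DROP TABLE IF EXISTS` avoids that. A minimal sketch, assuming the built-in `sqlite3` module as a stand-in for MySQL; the helper `table_names` is made up and queries `sqlite_master`, which is SQLite's counterpart to `show tables`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")


def table_names():
    """List user tables (SQLite-specific replacement for `show tables`)."""
    rows = db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


db.execute(
    "create table Courses (cid int, cname text NOT NULL, credits int NOT NULL, primary key(cid))"
)
before = table_names()

db.execute("drop table Courses")
after = table_names()

# A second plain drop fails because the table is already gone.
try:
    db.execute("drop table Courses")
    second_drop = "ok"
except sqlite3.OperationalError:
    second_drop = "error"

# IF EXISTS turns the missing table into a no-op instead of an error.
db.execute("DROP TABLE IF EXISTS Courses")

print(before, after, second_drop)
```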

Let's recreate the `Courses` table. This time, let's make `cid` of type `VARCHAR(255)` instead of `int`.

In [23]:
conn.execute(text("""
    create table Courses (cid VARCHAR(255) PRIMARY KEY, cname text NOT NULL, credits int NOT NULL)
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118790fa0>

Let's insert the two courses from the slide example.

In [24]:
conn.execute(text("""
    INSERT INTO Courses (cid, cname, credits) VALUES ("CS150A", "Database", 4)
"""))
conn.execute(text("""
    INSERT INTO Courses (cid, cname, credits) VALUES ("CS181", "Artificial Intelligence", 4)
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118791010>

#### `Enrolled` table

Let's create the `Enrolled` table with columns:
- `sid(int)` - foreign key
- `cid(VARCHAR(255))` - foreign key
- `grade(text)`

In [25]:
conn.execute(text("""
    create table Enrolled (sid int, cid VARCHAR(255), grade text,
                           foreign key (sid) references Students(sid),
                           foreign key (cid) references Courses(cid))
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118791160>

In [26]:
list(conn.execute(text("show tables;")))

[('Courses',), ('Enrolled',), ('Students',)]

Let's add the enrollments from the slide example.

In [27]:
conn.execute(text("""
    INSERT INTO Enrolled (sid, cid, grade) VALUES (123, "CS181", "A")
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118791470>

In [28]:
conn.execute(text("""
    INSERT INTO Enrolled (sid, cid, grade) VALUES (101, "CS150A", "A")
"""))

<sqlalchemy.engine.cursor.CursorResult at 0x118791710>

In [29]:
pd.read_sql("SELECT * FROM Courses", conn)

Unnamed: 0,cid,cname,credits
0,CS150A,Database,4
1,CS181,Artificial Intelligence,4


What if we try to enroll a nonexistent student?

In [None]:
# fails with a foreign key error: sid 10 is not in Students and cid "No one" is not in Courses
# conn.execute(text("""
#     INSERT INTO Enrolled (sid, cid, grade) VALUES (10, "No one", "Nothing")
# """))
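
The rejected insert can be reproduced safely in a throwaway database. The sketch below assumes the built-in `sqlite3` module as a stand-in for MySQL. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas MySQL's InnoDB enforces them by default. The helper `enroll` is made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)
db.execute(
    "create table Courses (cid VARCHAR(255) PRIMARY KEY, cname text NOT NULL, credits int NOT NULL)"
)
db.execute(
    """create table Enrolled (sid int, cid VARCHAR(255), grade text,
                              foreign key (sid) references Students(sid),
                              foreign key (cid) references Courses(cid))"""
)
db.execute("INSERT INTO Students (sid, name, gpa) VALUES (123, 'Yang', 3.8)")
db.execute(
    "INSERT INTO Courses (cid, cname, credits) VALUES ('CS181', 'Artificial Intelligence', 4)"
)


def enroll(sid, cid, grade):
    """Return True if the enrollment is accepted, False if a constraint rejects it."""
    try:
        db.execute(
            "INSERT INTO Enrolled (sid, cid, grade) VALUES (?, ?, ?)", (sid, cid, grade)
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = enroll(123, "CS181", "A")        # both keys exist
invalid = enroll(10, "No one", "Nothing")  # neither key exists
enrolled_count = db.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(valid, invalid, enrolled_count)
```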

Commit the transaction.

In [30]:
conn.commit()
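
Until a transaction is committed, its changes can still be undone with a rollback. The sketch below shows both outcomes. It assumes the built-in `sqlite3` module as a stand-in for MySQL, relying on `sqlite3`'s default behaviour of opening a transaction implicitly before an `INSERT`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)


def student_count():
    return db.execute("SELECT COUNT(*) FROM Students").fetchone()[0]


# Insert, then roll back: the row disappears.
db.execute("INSERT INTO Students (sid, name, gpa) VALUES (101, 'Tong', 4.0)")
pending = db.in_transaction  # True while the insert is uncommitted
db.rollback()
count_after_rollback = student_count()

# Insert, then commit: a later rollback has nothing left to undo.
db.execute("INSERT INTO Students (sid, name, gpa) VALUES (101, 'Tong', 4.0)")
db.commit()
db.rollback()
count_after_commit = student_count()

print(pending, count_after_rollback, count_after_commit)
```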

What if we try to delete the student with `sid` 101 from the `Students` table?

In [None]:
# fails: Enrolled still has a row referencing sid 101, so the foreign key blocks the delete
# conn.execute(text("""
#     DELETE FROM Students WHERE sid = 101
# """))
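
One way to delete a referenced student is to remove the referencing rows first. The sketch below shows the blocked delete and then the two-step delete that succeeds. It assumes the built-in `sqlite3` module as a stand-in for MySQL, with foreign key enforcement switched on explicitly; the `Courses` table is left out to keep the example short.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
db.execute(
    "create table Students (sid int, name text NOT NULL, gpa float, primary key(sid))"
)
db.execute(
    """create table Enrolled (sid int, cid VARCHAR(255), grade text,
                              foreign key (sid) references Students(sid))"""
)
db.execute("INSERT INTO Students (sid, name, gpa) VALUES (101, 'Tong', 4.0)")
db.execute("INSERT INTO Enrolled (sid, cid, grade) VALUES (101, 'CS150A', 'A')")

# Deleting the parent row while a child row references it is rejected.
try:
    db.execute("DELETE FROM Students WHERE sid = 101")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Remove the referencing rows first, then the student.
db.execute("DELETE FROM Enrolled WHERE sid = 101")
db.execute("DELETE FROM Students WHERE sid = 101")
remaining = db.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(blocked, remaining)
```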

### Loading CSVs into MySQL tables

### Spotify dataset: https://ms.sites.cs.wisc.edu/cs639/data/spotify.zip

In [40]:
base_url = "https://github.com/XanderZhou2022/ShanghaiTech_CS150A_2025fall/raw/refs/heads/main/data/spotify.zip"
df = pd.read_csv(base_url, compression="zip")
df.to_sql("songs", conn, index=False, if_exists="replace")

1556

In [41]:
pd.read_sql("SELECT * FROM songs", conn).head()

|   | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Danceability | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 8 | 2021-07-23--2021-07-30 | Beggin' | 48633449 | Måneskin | 3377762 | 3Wrjm47oTz2sjIgck11l5e | ['indie rock italiano', 'italian pop'] | ... | 0.714 | 0.8 | -4.808 | 0.0504 | 0.127 | 0.359 | 134.002 | 211560 | 0.589 | B |
| 1 | 2 | 2 | 3 | 2021-07-23--2021-07-30 | STAY (with Justin Bieber) | 47248719 | The Kid LAROI | 2230022 | 5HCyWlXZPP0y6Gqq8TgA20 | ['australian hip hop'] | ... | 0.591 | 0.764 | -5.484 | 0.0483 | 0.0383 | 0.103 | 169.928 | 141806 | 0.478 | C#/Db |
| 2 | 3 | 1 | 11 | 2021-06-25--2021-07-02 | good 4 u | 40162559 | Olivia Rodrigo | 6266514 | 4ZtFanR9U6ndgddUvNcjcG | ['pop'] | ... | 0.563 | 0.664 | -5.044 | 0.154 | 0.335 | 0.0849 | 166.928 | 178147 | 0.688 | A |
| 3 | 4 | 3 | 5 | 2021-07-02--2021-07-09 | Bad Habits | 37799456 | Ed Sheeran | 83293380 | 6PQ88X9TkUIAUIZJHW2upE | ['pop', 'uk pop'] | ... | 0.808 | 0.897 | -3.712 | 0.0348 | 0.0469 | 0.364 | 126.026 | 231041 | 0.591 | B |
| 4 | 5 | 5 | 1 | 2021-07-23--2021-07-30 | INDUSTRY BABY (feat. Jack Harlow) | 33948454 | Lil Nas X | 5473565 | 27NovPIUIRrOZoCHxABJwK | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.736 | 0.704 | -7.409 | 0.0615 | 0.0203 | 0.0501 | 149.995 | 212000 | 0.894 | D#/Eb |
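
Several column names in `songs` contain spaces, so they have to be quoted in queries; MySQL uses backticks for this. The sketch below queries a three-row, three-column excerpt of the preview above. It assumes the built-in `sqlite3` module as a stand-in for MySQL (SQLite also accepts backtick-quoted identifiers) and that `Streams` is stored as an integer, as the preview suggests. The 45,000,000 threshold is arbitrary.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.execute("create table songs (`Song Name` text, `Artist` text, `Streams` int)")

# Three rows copied from the preview of the loaded table.
sample = [
    ("Beggin'", "Måneskin", 48633449),
    ("STAY (with Justin Bieber)", "The Kid LAROI", 47248719),
    ("good 4 u", "Olivia Rodrigo", 40162559),
]
db.executemany("INSERT INTO songs VALUES (?, ?, ?)", sample)

# Backticks quote the identifiers that contain a space.
top = db.execute(
    """
    SELECT `Song Name`, `Streams`
    FROM songs
    WHERE `Streams` > 45000000
    ORDER BY `Streams` DESC
    """
).fetchall()
print(top)
```

In the notebook, the same query text could be passed to `pd.read_sql(..., conn)` to get the result as a DataFrame.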
