# SQL Walkthrough Using Spotify Data

### The Data
The data comes from Yamac Eren Ay on Kaggle: 
https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks

We now have two CSV files: artists and tracks.

There's one more thing to do before this data can be added to a database. The genres column in the artists and data_by_artist_o files contains lists, and so do the artists and id_artists columns in the tracks file. Relational databases don't handle lists well; instead, each list column should be expanded into its own many-to-many relationship table. 

To reduce the size of the data and make it easier to load into the Postgres database, I also removed every artist with fewer than 5,000 followers from the artists file, and every track released before 2011 or with a popularity below 50 from the tracks file. 

In [47]:
import pandas as pd
import os

In [176]:
directory = os.getcwd()

artists_f = os.path.join(directory, 'Data', 'artists.csv')
tracks_f = os.path.join(directory, 'Data', 'tracks.csv')

# literal_eval turns each string such as "['pop', 'rap']" into an actual Python list
# (it is safer than eval because it only accepts Python literals)
from ast import literal_eval

artists = pd.read_csv(artists_f, converters={'genres': literal_eval})
tracks = pd.read_csv(tracks_f, converters={'artists': literal_eval, 'id_artists': literal_eval})

In [177]:
# Reorganize the artists dataframe
artists = artists[['id', 'name', 'genres', 'followers', 'popularity']]

In [178]:
artists.sample(2)

index,id,name,genres,followers,popularity
16288,2borB1YWJN6giuvN920B1X,Raptori,[],808.0,28
107835,09xvPCA9tBbmnT6DgbYSC7,Kundanpreet,[],1.0,0


In [179]:
tracks.sample(2)

index,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
222828,0dc1kYrVpq3dqn1ubblOXz,Sunshine After The Rain,26,203827,0,[Elkie Brooks],[4Xn6fPXDrarY8LxXWqlE2M],1981-01-01,0.649,0.265,3,-18.213,1,0.032,0.819,0.0,0.0808,0.184,106.523,4
553605,6dkNzag9ucTnvnhJ4QlI3f,I'LL BE - Single Version,29,338867,0,[Mr.Children],[1qma7XhwZotCAucL7NHVLY],1999-05-12,0.551,0.703,3,-6.232,1,0.0297,0.0386,0.00333,0.114,0.429,124.126,4


In [180]:
# Size of the tables
print(artists.shape, tracks.shape)

(1162095, 5) (586672, 20)


In [181]:
# Convert release_date into a datetime variable
tracks['release_date'] = pd.to_datetime(tracks['release_date'])

In [182]:
tracks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 586672 entries, 0 to 586671
Data columns (total 20 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                586672 non-null  object        
 1   name              586601 non-null  object        
 2   popularity        586672 non-null  int64         
 3   duration_ms       586672 non-null  int64         
 4   explicit          586672 non-null  int64         
 5   artists           586672 non-null  object        
 6   id_artists        586672 non-null  object        
 7   release_date      586672 non-null  datetime64[ns]
 8   danceability      586672 non-null  float64       
 9   energy            586672 non-null  float64       
 10  key               586672 non-null  int64         
 11  loudness          586672 non-null  float64       
 12  mode              586672 non-null  int64         
 13  speechiness       586672 non-null  float64       
 14  acou

In [183]:
# Only select artists with 5000 or more followers
top_artists = artists[artists['followers'] >= 5000]

In [184]:
# Only select tracks with a popularity of 50 or more that were released in 2011 or later
top_tracks = tracks[(tracks['popularity'] >= 50) & (tracks['release_date'] >= '2011-01-01')]

In [185]:
print(top_artists.shape, top_tracks.shape)

(88609, 5) (41912, 20)


In [186]:
top_artists.sample(5)

index,id,name,genres,followers,popularity
141207,3i9hP422d2KMjaupTzBNVS,The Spencer Davis Group,"[blues rock, british blues, british invasion, ...",197589.0,56
1157496,2QCy4hrgZS3lYRV5C9MMz2,Twelve24,"[christian pop, german worship]",9809.0,29
600593,4VbAtGhXMJr2AGXa8fkcRu,I-LAND,[k-pop],270927.0,53
138451,2GUxWjR8cNgljddVLEp72u,Ali Akbar Khan,"[dhrupad, hindustani classical, hindustani ins...",53285.0,37
302881,1sgmTpjFhU8xeSlrDGpiSQ,Dariann González,[],17115.0,45


In [187]:
top_tracks.sample(5)

index,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
187194,3CScJ0ttMJ687s3rlLdrnV,Sweet Love,51,200000,0,[Chris Brown],[7bXgB6jMjp9ATFy66eO08Z],2012-07-03,0.631,0.759,4,-4.622,1,0.048,0.00593,0.0,0.235,0.698,139.901,4
156587,4UDI3SSa9jZS9IAcHUR1iV,Ella y El,51,183133,0,[José Luis Perales],[5RwfJb8wxN1fuodcPORVxP],2011-09-05,0.589,0.754,2,-7.523,1,0.0267,0.0546,5.1e-05,0.0542,0.541,93.491,4
427541,4OzmPHAqniJC1yMuDh9lmC,Borracho Gacho,59,165802,1,"[Gera MX, Dharius]","[2hejA1Dkf8v8R0koF44FvW, 66RfYVdftqnuHRicyClgL0]",2019-07-15,0.85,0.634,0,-5.841,1,0.413,0.266,0.0,0.717,0.77,91.007,4
570031,2Aq9fFfBJfBwuPS81x52We,Palenie Zabija,55,145714,1,[Chivas],[1fZAAHNWdSM5gqbi9o5iEA],2020-11-05,0.631,0.798,4,-5.079,0,0.102,0.596,0.000125,0.455,0.779,140.013,4
231175,1YrC8s6yZWw23QxW6rfM9f,Come,62,162267,0,[Jain],[2HHmvvSQ44ePDH7IKVzgK0],2016-10-21,0.818,0.783,1,-6.534,1,0.0444,0.232,0.00886,0.0803,0.819,99.992,4


In [188]:
# Create the artist_genre many-to-many table
artist_genre = top_artists[['id', 'name', 'genres']].copy()
top_artists = top_artists.drop(columns='genres')

In [189]:
# Explode the genre lists so each (artist, genre) pair gets its own row
artist_genre = artist_genre.explode('genres')

In [190]:
artist_genre.head()

index,id,name,genres
153,7frYUe4C7A42uZqCzD34Y4,Sultaan,desi pop
153,7frYUe4C7A42uZqCzD34Y4,Sultaan,punjabi hip hop
153,7frYUe4C7A42uZqCzD34Y4,Sultaan,punjabi pop
154,6acbdy69rtlv8m9EW31MYl,Phyno,afro dancehall
154,6acbdy69rtlv8m9EW31MYl,Phyno,afropop


In [191]:
top_artists.head()

index,id,name,followers,popularity
153,7frYUe4C7A42uZqCzD34Y4,Sultaan,53636.0,53
154,6acbdy69rtlv8m9EW31MYl,Phyno,72684.0,51
155,72578usTM6Cj5qWsi471Nc,Raghu Dixit,248568.0,52
156,4rK6HLvoZhLFUTcUhG9WfC,Deacon,5644.0,52
158,7b6Ui7JVaBDEfZB9k6nHL0,The Local Train,701766.0,57


In [192]:
# Now we can create the artist_track many-to-many table
artist_track = top_tracks[['id_artists', 'artists', 'id', 'name']].copy()
artist_track = artist_track.rename(columns = {'id': 'id_tracks', 'name':'tracks'})
top_tracks = top_tracks.drop(columns = ['artists', 'id_artists'])

In [193]:
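# Explode the artist lists; exploding id_artists and artists together keeps each id paired with its name
# (exploding multiple columns at once needs pandas 1.3 or later)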
artist_track = artist_track.explode(['id_artists', 'artists'])

In [206]:
# Keep only rows whose artist is in top_artists, so every id_artists value has a matching artist row
artist_track = artist_track[artist_track['id_artists'].isin(top_artists['id'])]

In [207]:
artist_track.head(30)

index,id_artists,artists,id_tracks,tracks
73439,7gOdHgIoIKoe4i9Tta6qdD,Jonas Brothers,4zP7ADsgJgHGY6VzxbNp1z,Year 3000
76404,5ND0mGcL9SKSjWIjPd0xIb,Bowling For Soup,1AHGrKFv3nSCH9K7yg8gOz,Punk Rock 101
80314,7kwEvDE8e7EBGKh5bLczqQ,Anthem Lights,1dKDRs99KkNbtC9AHM7TLm,Best of 2012: Payphone / Call Me Maybe / Wide ...
80317,7kwEvDE8e7EBGKh5bLczqQ,Anthem Lights,65bcYKY0QzlXILxVuWspdT,Best of 2011: Just the Way You Are / For the F...
84076,7gP3bB2nilZXLfPHJhMdvc,Foster The People,7w87IxuO7BDcJ3YUqCyMTT,Pumped Up Kicks
84077,3kVUvbeRdcrqQ3oHk5hPdx,Grouplove,0GO8y8jQk1PkHzS31d699N,Tongue Tied
84078,0TnOYISbd1XYRBk9myaseg,Pitbull,4QNpBfC0zvjKqPJcyqBy9W,"Give Me Everything (feat. Ne-Yo, Afrojack & Na..."
84078,21E3waRsmPlU7jZsS13rcj,Ne-Yo,4QNpBfC0zvjKqPJcyqBy9W,"Give Me Everything (feat. Ne-Yo, Afrojack & Na..."
84078,4D75GcNG95ebPtNvoNVXhz,Afrojack,4QNpBfC0zvjKqPJcyqBy9W,"Give Me Everything (feat. Ne-Yo, Afrojack & Na..."
84078,1ruutHJcECI7cos2n5TqpO,Nayer,4QNpBfC0zvjKqPJcyqBy9W,"Give Me Everything (feat. Ne-Yo, Afrojack & Na..."


In [208]:
top_tracks.head()

index,id,name,popularity,duration_ms,explicit,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
73439,4zP7ADsgJgHGY6VzxbNp1z,Year 3000,67,201960,0,2019-05-09,0.659,0.857,11,-5.85,1,0.0437,0.0045,2e-06,0.335,0.798,106.965,4
76404,1AHGrKFv3nSCH9K7yg8gOz,Punk Rock 101,52,184322,0,2015-01-27,0.63,0.936,4,-4.576,1,0.084,0.00128,0.0,0.0823,0.733,117.962,4
80314,1dKDRs99KkNbtC9AHM7TLm,Best of 2012: Payphone / Call Me Maybe / Wide ...,55,209134,0,2015-10-16,0.375,0.418,11,-5.999,1,0.036,0.688,0.0,0.371,0.287,136.319,5
80317,65bcYKY0QzlXILxVuWspdT,Best of 2011: Just the Way You Are / For the F...,50,183814,0,2015-10-16,0.418,0.343,4,-7.492,1,0.0339,0.741,0.0,0.113,0.327,121.805,4
84076,7w87IxuO7BDcJ3YUqCyMTT,Pumped Up Kicks,85,239600,0,2011-05-23,0.733,0.71,5,-5.849,0,0.0292,0.145,0.115,0.0956,0.965,127.975,4


In [209]:
# Convert followers to int64
top_artists['followers'] = top_artists['followers'].astype("int64")

In [210]:
top_artists.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 88609 entries, 153 to 1162081
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          88609 non-null  object
 1   name        88609 non-null  object
 2   followers   88609 non-null  int64 
 3   popularity  88609 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 3.4+ MB


In [211]:
# Write the DataFrames to CSV files to load into the Postgres database
# (os.path.join builds the path for any operating system, not just Windows)
top_artists.to_csv(os.path.join(directory, 'Data', 'top_artists.csv'), encoding='utf-8', index=False)
artist_genre.to_csv(os.path.join(directory, 'Data', 'artist_genre.csv'), encoding='utf-8', index=False)
top_tracks.to_csv(os.path.join(directory, 'Data', 'top_tracks.csv'), encoding='utf-8', index=False)
artist_track.to_csv(os.path.join(directory, 'Data', 'artist_track.csv'), encoding='utf-8', index=False)

Now that you have the CSV files, you can import them into your tables once you have created them. In pgAdmin, right-click the table name and choose Import/Export. Select Import at the top, then choose the filename, format, encoding, whether the file has a header, and which columns to import.
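Because each import loads one table at a time, the order matters once the tables have foreign keys: the parent tables (top_artists, top_tracks) should go in before the junction tables (artist_genre, artist_track) that reference them. A minimal sketch of that behaviour, assuming Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres, and a made-up artist id `'a1'`:

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.execute("CREATE TABLE top_artists (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE artist_genre (
        id TEXT REFERENCES top_artists(id),
        name TEXT,
        genres TEXT
    )
""")

# Importing the junction table first fails: artist 'a1' does not exist yet
try:
    conn.execute("INSERT INTO artist_genre VALUES ('a1', 'Sultaan', 'desi pop')")
    child_first_ok = True
except sqlite3.IntegrityError:
    child_first_ok = False

# Importing the parent table first works
conn.execute("INSERT INTO top_artists VALUES ('a1', 'Sultaan')")
conn.execute("INSERT INTO artist_genre VALUES ('a1', 'Sultaan', 'desi pop')")
rows = conn.execute("SELECT COUNT(*) FROM artist_genre").fetchone()[0]
print(child_first_ok, rows)  # False 1
```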

### QuickDBD

https://app.quickdatabasediagrams.com/#/

Use QuickDBD to draw the entity relationship diagram (ERD). It can also export a PostgreSQL script that creates the tables in our database. 

### pgAdmin

Using the Query Editor, we can open that SQL file and run it to create all of the tables.

pgAdmin can be used to make and view your database. Under Schemas, you can find the tables in a database, which show all of their information, including the columns and constraints. 

Remember to import the CSV files into the tables afterwards.

### CREATE, DROP, and BACKUP DATABASE
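- To create a new database in PostgreSQL with pgAdmin, go to Object, Create, and Database. The SQL equivalent is CREATE DATABASE name;

- To drop a database in pgAdmin, right-click the database and select Delete/Drop. The SQL equivalent is DROP DATABASE name;

- To back up a database, right-click it and select Backup. PostgreSQL has no BACKUP DATABASE statement; pgAdmin's Backup runs the pg_dump utility instead. 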

### Connecting to the Database

To start, you'll want to install:
- ipython-sql - provides the %sql and %%sql magic commands
- sqlalchemy - a Python SQL toolkit
- psycopg2 - the PostgreSQL driver that sends your SQL statements to the Postgres database

Next, load the ipython-sql extension and use the %sql magic command to connect to the Postgres database.
- The SQLAlchemy database URL has the form dialect+driver://username:password@host:port/database. If +driver is left out, as below, postgresql uses psycopg2 by default.
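A password containing characters such as @, : or / breaks that URL format. SQLAlchemy's documentation recommends percent-encoding the password with `urllib.parse.quote_plus`. A minimal sketch that builds the URL; `make_db_url` is a hypothetical helper and the password is a made-up placeholder:

```python
from urllib.parse import quote_plus

def make_db_url(user, password, host, port, database, dialect="postgresql", driver="psycopg2"):
    """Build a URL of the form dialect+driver://username:password@host:port/database."""
    # Percent-encode the password so characters like '@' or ':' don't break the URL
    return f"{dialect}+{driver}://{user}:{quote_plus(password)}@{host}:{port}/{database}"

url = make_db_url("postgres", "p@ss:word!", "localhost", 5432, "Spotify")
print(url)  # postgresql+psycopg2://postgres:p%40ss%3Aword%21@localhost:5432/Spotify
```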

In [212]:
# Replace YOUR_PASSWORD with your own password, and never share a notebook containing the real one
%load_ext sql

# Connect to the Spotify database on localhost
%sql postgresql://postgres:YOUR_PASSWORD@localhost:5432/Spotify

# Hide the connection string from query outputs
%config SqlMagic.displaycon=False

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## Now We're Ready to Begin

### CREATE TABLE Statement
Used to create a new table in a database.
- A table has a name, columns, and table constraints. 
- Each column has a name, a data type, and optional column constraints.
- The data type determines what values a column can hold, e.g. INT, FLOAT, DATE, VARCHAR(max_length), TEXT.

#### Constraints
These can be specified when the table is created or added later with ALTER TABLE.
- NOT NULL - Ensures that a column cannot have a NULL value.
- UNIQUE - Ensures that all values in the column are different. 
- PRIMARY KEY - A combination of NOT NULL and UNIQUE. A table can have only one primary key, which can be made of multiple columns (a composite key). 
- FOREIGN KEY - A column (or columns) that refers to the primary key (or another UNIQUE column) of another table, linking the two tables together. A table can have multiple foreign keys. 
    - ON DELETE SET NULL - If the referenced row is deleted, the foreign key value in the rows that point to it is set to NULL.
    - ON DELETE CASCADE - If the referenced row is deleted, the rows that point to it are deleted too. 
- CHECK - Ensures that all values in a column satisfy a boolean condition. 
- DEFAULT - Sets a default value for a column when no value is specified. 
- INDEX - Not strictly a constraint: an index (created with CREATE INDEX) speeds up retrieving data, at some cost to inserts and updates. 
- AUTO_INCREMENT - Generates a unique number automatically when a new record is inserted. This is MySQL syntax; in PostgreSQL use SERIAL or GENERATED ALWAYS AS IDENTITY.
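To see each constraint reject bad data, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres. The tables and rows are made up, and SQLite differs from Postgres in details (for example, foreign keys are only enforced after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

# In-memory SQLite database standing in for Postgres; the tables and rows are made up
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE artists (
    id INTEGER PRIMARY KEY,  -- in SQLite this also auto-generates ids, similar to SERIAL
    name TEXT NOT NULL UNIQUE,
    popularity INTEGER DEFAULT 0 CHECK (popularity BETWEEN 0 AND 100)
);
CREATE TABLE artist_genre (
    artist_id INTEGER REFERENCES artists(id) ON DELETE CASCADE,
    genre TEXT NOT NULL
);
CREATE TABLE artist_track (
    artist_id INTEGER REFERENCES artists(id) ON DELETE SET NULL,
    track TEXT NOT NULL
);
""")

def rejected(sql):
    """Return True if a constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO artists (name) VALUES ('Drake')")  # gets id 1 and the DEFAULT popularity
default_pop = conn.execute("SELECT popularity FROM artists WHERE id = 1").fetchone()[0]

unique_fails = rejected("INSERT INTO artists (name) VALUES ('Drake')")
not_null_fails = rejected("INSERT INTO artists (name) VALUES (NULL)")
check_fails = rejected("INSERT INTO artists (name, popularity) VALUES ('Adele', 150)")
fk_fails = rejected("INSERT INTO artist_genre VALUES (99, 'pop')")  # no artist has id 99

conn.execute("INSERT INTO artist_genre VALUES (1, 'rap')")
conn.execute("INSERT INTO artist_track VALUES (1, 'Hotline Bling')")
conn.execute("DELETE FROM artists WHERE id = 1")

genres_left = conn.execute("SELECT COUNT(*) FROM artist_genre").fetchone()[0]  # CASCADE removed the row
track_artist = conn.execute("SELECT artist_id FROM artist_track").fetchone()[0]  # SET NULL cleared the id
print(default_pop, unique_fails, not_null_fails, check_fails, fk_fails, genres_left, track_artist)
# 0 True True True True 0 None
```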

In [None]:
%%sql

CREATE TABLE artist_year (
    artists TEXT NOT NULL UNIQUE,
    year DATE CHECK (year > '2010-12-31'),
    popularity FLOAT DEFAULT 0,
    PRIMARY KEY(artists, year),
    FOREIGN KEY(artists) REFERENCES artists(artists) ON DELETE CASCADE,
    FOREIGN KEY(year) REFERENCES years(year) ON DELETE SET NULL
);
-- This is only an example; it won't run against this database because artists.artists allows duplicates
-- (a foreign key must reference a PRIMARY KEY or UNIQUE column) and years.year is INT, not DATE.
-- ON DELETE SET NULL also conflicts with year being part of the primary key, which can never be NULL.

### DROP TABLE Statement
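Used to drop an existing table, including all of its data. Be careful with this. 
- DROP TABLE IF EXISTS tablename; avoids an error if the table doesn't exist.
- To delete all of the rows but keep the table itself, use TRUNCATE TABLE tablename; instead. 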

In [None]:
%%sql
DROP TABLE artist_year;

###  ALTER TABLE Statement
Used to add, delete, or modify columns in an existing table.
- Also used to add and drop various constraints on an existing table. 
- Here are some examples:

In [None]:
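%%sql
-- Add a column
ALTER TABLE artist_year
ADD COLUMN songs_released TEXT;

-- Change a column's data type (PostgreSQL syntax; MySQL uses MODIFY COLUMN)
ALTER TABLE artist_year
ALTER COLUMN songs_released TYPE INT USING songs_released::INT;

-- Rename a column (PostgreSQL syntax; MySQL uses CHANGE)
ALTER TABLE artist_year
RENAME COLUMN songs_released TO num_songs_released;

-- Add a foreign key. The referenced column must be a PRIMARY KEY or UNIQUE,
-- so this particular example fails: artists.count contains duplicates.
ALTER TABLE artist_year
ADD FOREIGN KEY (num_songs_released)
REFERENCES artists(count)
ON DELETE SET NULL;

-- Drop a column
ALTER TABLE artist_year
DROP COLUMN num_songs_released;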

## Querying the Database to Select Information from a Single Table

### SELECT & LIMIT
Use SELECT to look at one or more columns from a table; * selects all of the columns.

- LIMIT caps how many rows are returned, which is important for large tables.
- Leave it out if you want to see all of the rows.

In [216]:
%%sql
SELECT *
FROM artist_track
LIMIT 5

5 rows affected.


id_artists,artists,id_tracks,tracks
7gOdHgIoIKoe4i9Tta6qdD,Jonas Brothers,4zP7ADsgJgHGY6VzxbNp1z,Year 3000
5ND0mGcL9SKSjWIjPd0xIb,Bowling For Soup,1AHGrKFv3nSCH9K7yg8gOz,Punk Rock 101
7kwEvDE8e7EBGKh5bLczqQ,Anthem Lights,1dKDRs99KkNbtC9AHM7TLm,Best of 2012: Payphone / Call Me Maybe / Wide Awake / Starships / We Are Young
7kwEvDE8e7EBGKh5bLczqQ,Anthem Lights,65bcYKY0QzlXILxVuWspdT,Best of 2011: Just the Way You Are / For the First Time / Someone Like You / Superbass / Grenade / Without You
7gP3bB2nilZXLfPHJhMdvc,Foster The People,7w87IxuO7BDcJ3YUqCyMTT,Pumped Up Kicks


In [5]:
%sql SELECT * FROM artists LIMIT 5

5 rows affected.


id,followers,popularity,artists,acousticness,danceability,duration_ms,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,avg_popularity,key,mode,count
6ChsWygqG6IhGEC312KSIQ,5001.0,25.0,Ken Nordine,0.4475,0.5755,191440.0,0.21,1.055e-06,0.178,-21.11,0.4675,140.621,0.4735,8.5,10,1,2
6oWIS1UZp9dR74eYez74vX,5001.0,46.0,Tikkle Me,0.0867,0.55,222404.0,0.884,1.55e-05,0.68,-5.507999999999999,0.0363,145.029,0.5489999999999999,57.0,9,1,2
2HtbGWgFbeFudyoFwc2wHw,5005.0,23.0,JVC Force,0.754,0.7659999999999999,359693.0,0.902,1.94e-06,0.183,-6.36,0.0973,95.883,0.958,34.0,6,1,2
6dz608P8sHylVvVVo5OLx2,5006.0,34.0,Marcus Roberts,0.8640000000000001,0.5255000000000001,260733.5,0.08525,0.7244999999999999,0.13,-23.321,0.04425,101.2765,0.0877999999999999,23.5,8,1,4
5U4QDnlOlmZx9MHV45EoDE,5007.0,56.0,Rowan Atkinson,0.321,0.6890000000000001,170880.0,0.5329999999999999,7.37e-05,0.0958,-14.205,0.0544,98.801,0.898,69.0,6,1,1


### Comments
SQL comments are used to explain a statement or to prevent part of it from executing.
- `--` starts a single-line comment: everything from it to the end of the line is ignored.
- `/* ... */` encloses a comment that can span multiple lines or just part of a line.

In [6]:
%%sql
SELECT followers, artists, popularity -- Selecting these three columns
FROM artists
/* WHERE followers > 1000000
ORDER BY followers DESC */ 
LIMIT 5 

5 rows affected.


followers,artists,popularity
5001.0,Ken Nordine,25.0
5001.0,Tikkle Me,46.0
5005.0,JVC Force,23.0
5006.0,Marcus Roberts,34.0
5007.0,Rowan Atkinson,56.0


### WHERE
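Used to select only the records that fulfill a condition.
- Comparison operators: =, >, <, >=, <=, <> (not equal), plus IN, BETWEEN, and LIKE.
- Conditions can be combined with AND and OR, and negated with NOT (WHERE NOT, AND NOT, OR NOT). AND is evaluated before OR, so use parentheses when mixing them.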
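The precedence rule is easy to trip over, so here is a minimal sketch of it using Python's built-in sqlite3 module with an in-memory table as a stand-in for the Postgres artists table (the rows are made up; the f-string is only acceptable because the conditions are fixed literals, never user input):

```python
import sqlite3

# Made-up rows in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, followers INTEGER, popularity INTEGER, mode INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?, ?, ?)",
    [("A", 20_000_000, 70, 0), ("B", 1_000, 95, 1), ("C", 1_000, 95, 0)],
)

def names(where):
    """Return the artist names matching a WHERE condition, sorted by name."""
    return [r[0] for r in conn.execute(f"SELECT name FROM artists WHERE {where} ORDER BY name")]

# AND binds tighter, so this means: followers > 10M OR (popularity >= 90 AND mode = 1)
no_parens = names("followers > 10000000 OR popularity >= 90 AND mode = 1")
# Parentheses make the OR happen first
with_parens = names("(followers > 10000000 OR popularity >= 90) AND mode = 1")
# NOT negates a condition
not_major = names("NOT mode = 1")
print(no_parens, with_parens, not_major)  # ['A', 'B'] ['B'] ['A', 'C']
```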

In [7]:
%%sql
SELECT *
FROM artists
WHERE followers > 10000000 AND popularity >= 90
LIMIT 10;

10 rows affected.


id,followers,popularity,artists,acousticness,danceability,duration_ms,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,avg_popularity,key,mode,count
7bXgB6jMjp9ATFy66eO08Z,13493933.0,90.0,Chris Brown,0.1063397099567099,0.6481688311688311,229846.5367965368,0.6338528138528139,8.140792207792206e-05,0.1766437229437227,-6.103207792207793,0.128717316017316,116.64525108225116,0.4761385281385284,59.73160173160174,1,0,231
3TVXtAsR1Inumwj472S9r4,54416812.0,98.0,Drake,0.1959119117082534,0.6745278310940498,244244.9404990403,0.5805961612284068,0.0043544118234165,0.1830785028790788,-7.461955854126682,0.214185604606526,118.98272360844528,0.3878059500959693,61.91170825335893,1,1,521
1RyvyyTE3xzB2ZywiAwp0i,10099191.0,91.0,Future,0.114301519163763,0.7618188153310107,220344.5087108014,0.5729407665505226,0.0133008522648083,0.1824397212543553,-6.9354773519163775,0.2041815331010454,133.46796167247393,0.3741547038327529,57.13937282229965,1,1,287
55Aa2cqylxrFIXC767Z865,10309562.0,90.0,Lil Wayne,0.0900290188976378,0.6487322834645671,253415.69291338584,0.695254593175853,0.0005392371653543,0.2142771653543304,-6.016716535433076,0.2260091863517061,121.42707874015755,0.4830545931758529,54.0,1,1,381
4O15NlyKLIASxsJ0PrXPfz,11209483.0,91.0,Lil Uzi Vert,0.1256302971428572,0.7768399999999995,210858.49142857143,0.6148971428571424,3.388285714285715e-07,0.1874388571428572,-6.43076571428572,0.2247582857142855,135.26023142857142,0.4541720000000007,57.10857142857144,1,1,350
5K4W6rqBFWDnAN6FQUkS6x,13713751.0,92.0,Kanye West,0.2206124401114207,0.6180111420612819,235493.61838440108,0.6429119777158783,0.0113251635097493,0.2562841225626741,-6.337130919220058,0.2325986072423399,115.1779052924792,0.4701295264623954,58.25626740947075,1,1,359
6LuN9FCkKOj5PcnpouEgny,13728298.0,92.0,Khalid,0.4306016161616161,0.6423535353535353,216609.19191919192,0.5232121212121212,0.0105156443434343,0.1505777777777777,-8.126797979797981,0.1051555555555555,108.6018383838385,0.3797575757575759,70.50505050505049,6,0,99
6KImCVD70vtIoJWnq6nGn3,14086781.0,90.0,Harry Styles,0.2688155769230769,0.5414615384615382,238180.2307692308,0.5867692307692305,0.0168088842307692,0.1562076923076923,-6.38692307692308,0.0358884615384615,114.68699999999995,0.4068961538461537,78.34615384615384,5,1,52
4kYSro6naA4h99UJvo89HB,15762250.0,90.0,Cardi B,0.1572329310344827,0.8252241379310341,211658.7068965517,0.6591896551724136,0.0003559151724137,0.1494999999999999,-5.6886034482758605,0.1934655172413794,130.1865172413793,0.522603448275862,71.6896551724138,1,1,58
0Y5tJX1MQlPlqiwlOH1tJY,16118616.0,94.0,Travis Scott,0.1319922877697843,0.6976834532374102,228400.7769784173,0.6081654676258996,0.0007531797122302,0.2067410071942447,-6.228928057553954,0.1254446043165467,127.89237410071952,0.3342690647482012,67.62589928057554,1,0,139


- The IN operator lets you specify multiple values in the WHERE clause; it is shorthand for several OR conditions. 
- The BETWEEN operator selects values within a range. Values can be numbers, text, or dates, and the range is inclusive: both endpoints are included. 
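A minimal sketch of IN and of BETWEEN's inclusive endpoints, using Python's built-in sqlite3 module with an in-memory table of made-up tracks. It also shows a half-open range (>= start AND < end), a common way to select exactly one year without picking up the next January 1st:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO dates (YYYY-MM-DD) stored as text compare correctly as strings
conn.execute("CREATE TABLE tracks (name TEXT, release_date TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("t1", "2012-12-31"), ("t2", "2013-01-01"), ("t3", "2013-07-15"), ("t4", "2014-01-01")],
)

def names(where):
    """Return the track names matching a WHERE condition, sorted by name."""
    return [r[0] for r in conn.execute(f"SELECT name FROM tracks WHERE {where} ORDER BY name")]

in_list = names("name IN ('t1', 't4')")  # same as name = 't1' OR name = 't4'
inclusive = names("release_date BETWEEN '2013-01-01' AND '2014-01-01'")  # both endpoints match
only_2013 = names("release_date >= '2013-01-01' AND release_date < '2014-01-01'")
print(in_list, inclusive, only_2013)  # ['t1', 't4'] ['t2', 't3', 't4'] ['t2', 't3']
```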

In [8]:
%%sql 
SELECT *
FROM artist_genre
WHERE artists IN ('Drake', 'Taylor Swift', 'Ed Sheeran');

11 rows affected.


genres,artists
,Drake
canadian hip hop,Drake
canadian pop,Drake
hip hop,Drake
pop rap,Drake
rap,Drake
toronto rap,Drake
pop,Taylor Swift
post-teen pop,Taylor Swift
pop,Ed Sheeran


In [9]:
%%sql
SELECT *
FROM tracks
WHERE release_date BETWEEN '2013-01-01' AND '2014-01-01'  -- inclusive, so 2014-01-01 also matches
LIMIT 10;

10 rows affected.


id,name,popularity,duration_ms,explicit,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
1W0Yfa8vX68HsBo5MvYi9l,Satisfaction,50,193360,0,2013-01-01,0.831,0.609,9,-3.616,0,0.135,0.146,0.12,0.0839,0.358,129.976,4
2b41rRRJFLNTWFg4XmpWO5,Slice of Heaven,50,278987,0,2013-01-01,0.828,0.586,7,-10.777,1,0.0416,0.481,0.0,0.6809999999999999,0.975,122.117,3
3573ra6gmsw6o4O7FgCEHt,Juventud,50,221987,0,2013-01-01,0.638,0.473,11,-4.6610000000000005,0,0.0447,0.494,0.0,0.123,0.953,177.864,4
5U3qiJCXKCJ40Nf6tkI7Sq,"I Sold My Bed, But Not My Stereo",50,234333,0,2013-01-01,0.688,0.8079999999999999,1,-5.617999999999999,0,0.0445,0.00126,0.00183,0.0557,0.956,122.024,4
1SUkUVm3R9E4X8Z0KzuTB7,Twinkle Twinkle Little Star - Piano / Nature Instrumental,50,115859,0,2013-01-01,0.731,0.159,0,-19.068,1,0.0433,0.97,0.7440000000000001,0.0942,0.963,120.055,4
1lWfkpHnpgQ5RbgnjxnY4c,Ngiti,50,214160,0,2013-01-01,0.296,0.431,2,-8.266,1,0.0342,0.5760000000000001,0.0,0.342,0.314,69.773,4
2OV2bkuzkV28WhvewBzJ2r,"Twinkle, Twinkle, Little Star",50,136013,0,2013-01-01,0.302,0.385,2,-6.662000000000001,1,0.0293,0.536,0.000212,0.2239999999999999,0.3829999999999999,100.584,4
5YPQIVqREhclwBZUo8Yfrh,Chuck Norris - Radio Edit,50,175251,0,2013-01-01,0.735,0.948,7,-5.502999999999999,1,0.059,0.000339,0.853,0.289,0.794,128.03799999999998,4
2YU3eXyrRnZcRXc7Edu5fX,Lopov,50,205787,0,2013-01-01,0.655,0.8059999999999999,2,-3.609,0,0.027,0.187,1.11e-06,0.304,0.6629999999999999,99.979,4
4kMQVpke2L9tlWOINuAo07,Kamu,50,229467,0,2013-01-01,0.7120000000000001,0.828,10,-4.673,1,0.0519,0.143,0.0,0.299,0.565,123.967,4


- The LIKE operator searches for a pattern in a column using wildcards, which stand in for one or more characters in a string. 
- Two wildcards are commonly used, alone or in combination:
    - % represents zero, one, or multiple characters.
    - _ represents exactly one character. 
- In PostgreSQL, LIKE is case-sensitive; use ILIKE for a case-insensitive match.
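A minimal sketch of both wildcards, using Python's built-in sqlite3 module with an in-memory table of made-up names. Note that SQLite's LIKE is case-insensitive for ASCII by default, unlike PostgreSQL's, so the patterns below keep the capitalization of the data to behave the same in both:

```python
import sqlite3

# Made-up names in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT)")
conn.executemany(
    "INSERT INTO artists VALUES (?)",
    [("Lil Wayne",), ("Lily Potter",), ("Lillias White",), ("Drake",)],
)

def like(pattern):
    """Return the names matching a LIKE pattern, sorted."""
    rows = conn.execute("SELECT name FROM artists WHERE name LIKE ? ORDER BY name", (pattern,))
    return [r[0] for r in rows]

starts_lil = like("Lil%")   # ['Lil Wayne', 'Lillias White', 'Lily Potter']
lil_space = like("Lil %")   # ['Lil Wayne']: the space must come right after "Lil"
one_char = like("Dr_ke")    # ['Drake']: _ matches exactly one character
combined = like("Lil_ %")   # ['Lily Potter']: one character after "Lil", then a space
print(starts_lil, lil_space, one_char, combined)
```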

In [10]:
%%sql
SELECT *
FROM artists
WHERE artists LIKE 'Lil%'
LIMIT 10;

10 rows affected.


id,followers,popularity,artists,acousticness,danceability,duration_ms,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,avg_popularity,key,mode,count
6yhjowlmZr6QzRMDy0pJvv,5097.0,32.0,Lil' Flex,0.478,0.773,427880.0,0.585,0.0238,0.254,-8.117,0.396,144.063,0.4029999999999999,37.0,2,1,1
5TKKPpY9zr2qrz3JM3Vawq,5428.0,61.0,Lillias White,0.354,0.594875,129598.25,0.5606249999999999,1.907375e-05,0.173475,-9.76875,0.0853875,124.383875,0.4920000000000001,57.75,0,1,8
3kcD3ZRH9oUwWSvQNVvMli,7442.0,40.0,Lily Potter,0.975,0.7140000000000001,152920.0,0.131,0.904,0.354,-21.958,0.0556,113.032,0.685,68.0,5,1,2
7pmErpU2pOjeGequFZxDnN,9784.0,49.0,Lil Boodang,0.15330744,0.8676,104380.6,0.6460000000000001,0.192190262,0.16114,-5.9579999999999975,0.19064,117.3332,0.4686,23.2,1,1,10
64MIiJ3jBhIAnFIgrJK2ls,11032.0,43.0,Lil Wil,0.0407,0.8905,220719.5,0.5045000000000001,0.0,0.1054,-8.871,0.1159999999999999,112.5,0.8845,48.5,9,1,4
6V4zyNV40Zyu5MGlhD0i8g,11460.0,49.0,Lil' Cease,0.1537499999999999,0.8005,246240.0,0.6990000000000001,0.0,0.1077,-6.9410000000000025,0.2815,96.9275,0.6925,52.0,11,0,4
6uft3KriUGneffNm6jyAug,14239.0,50.0,Lil Ronny Motha F,0.0452,0.975,152214.0,0.3929999999999999,0.0,0.103,-10.852,0.439,120.001,0.288,60.0,0,1,1
5P46zciG5gm8IALoKTLYGb,17921.0,55.0,Lil Kapow,0.0566,0.726,83131.0,0.298,1.1e-05,0.132,-13.879,0.695,150.036,0.486,68.0,1,1,2
3ykdYhlVieu2rlYCi5HZnT,19597.0,34.0,Lil Blacky,0.00162,0.586,269600.0,0.599,0.0,0.416,-6.869,0.0838,87.53,0.297,41.0,5,1,2
61mwtI8FCpYa9G2NuThRhI,32147.0,43.0,Lil' Zane,0.00814,0.782,257307.0,0.675,0.0,0.0978,-5.096,0.0706,106.956,0.6629999999999999,39.0,8,0,2


### ORDER BY
Sorts the results by one or more columns.
- Ascending order (ASC) is the default; use DESC for descending order.
- When sorting by multiple columns, each later column breaks ties in the columns before it.
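A minimal sketch of tie-breaking with two sort columns, using Python's built-in sqlite3 module and three made-up artists, two of which have the same follower count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, followers INTEGER, popularity INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?, ?)",
    [("A", 500, 60), ("B", 900, 50), ("C", 500, 80)],  # A and C tie on followers
)

def order(by):
    """Return the artist names in the order given by an ORDER BY clause."""
    return [r[0] for r in conn.execute(f"SELECT name FROM artists ORDER BY {by}")]

desc = order("followers DESC, popularity DESC")  # the A/C tie is broken by popularity
asc = order("followers, popularity")              # ASC is the default
print(desc, asc)  # ['B', 'C', 'A'] ['A', 'C', 'B']
```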

In [11]:
%%sql
SELECT *
FROM artists
WHERE key = 10
ORDER BY followers DESC, popularity DESC
LIMIT 10;

10 rows affected.


id,followers,popularity,artists,acousticness,danceability,duration_ms,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,avg_popularity,key,mode,count
7n2wHs1TKAczGzO7Dd2rGr,32419313.0,89.0,Shawn Mendes,0.3077230000000002,0.6154833333333335,200855.01666666663,0.6144333333333332,2.5810000000000004e-06,0.12447,-6.256866666666664,0.0776,122.09106666666666,0.5419666666666667,67.51666666666667,10,1,60
2DlGxzQSjYe5N6G9nkYghR,9805610.0,80.0,Jennifer Lopez,0.1295096491228069,0.7257368421052629,240696.15789473685,0.7767368421052634,0.0131239135087719,0.1817385964912281,-5.288736842105265,0.0966912280701754,108.85012280701756,0.6331578947368419,55.912280701754376,10,0,57
3wyVrVrFCkukjdVIdirGVY,7632399.0,74.0,Lil Pump,0.1376866666666666,0.8094166666666666,156283.58333333337,0.6426666666666666,1.7166666666666668e-07,0.23875,-5.306916666666667,0.1759833333333333,131.56,0.541,67.08333333333333,10,1,12
46SHBwWsqBkxI7EeeBEQG7,6912157.0,83.0,Kodak Black,0.1747053333333333,0.7762952380952384,195596.0,0.6092761904761904,0.0006245443809523,0.1802609523809524,-6.660885714285714,0.184544761904762,134.2634285714286,0.4316304761904761,57.12380952380953,10,0,105
3AQRLZ9PuTAozP28Skbq8V,6548759.0,78.0,The Script,0.2343376000000001,0.49244,239932.92,0.65604,1.024e-07,0.105984,-5.760919999999999,0.034604,122.26291999999998,0.418224,62.36,10,1,25
6dJeKm76NjfXBNTpHmOhfO,6398084.0,71.0,Selena Gomez & The Scene,0.0638214285714285,0.6555714285714286,196748.4285714285,0.8305714285714287,2.601428571428571e-06,0.1748285714285714,-4.118285714285714,0.0723285714285714,124.80500000000004,0.7032857142857143,61.71428571428572,10,0,14
5LHRHt1k9lMyONurDHEdrp,6200496.0,86.0,Tyga,0.10415765625,0.8114062499999998,209944.890625,0.6417968749999999,0.0001594403125,0.141940625,-5.842578125,0.1654734374999999,119.28403125000004,0.4881093749999999,65.125,10,1,64
2o5jDhtHVPhrJdv3cEQ99Z,5527032.0,87.0,Tiësto,0.0887491176470588,0.6686470588235293,247276.29411764705,0.8029999999999999,0.1311866035294117,0.1078882352941176,-5.597588235294117,0.0726588235294117,127.2470588235294,0.3141529411764705,67.0,10,0,17
7FsRH5bw8iWpSbMX1G7xf1,5066723.0,79.0,Joan Sebastian,0.5742586206896556,0.6725603448275861,177054.7068965517,0.5146249999999999,0.0012641731896551,0.1522370689655172,-7.703556034482758,0.0648823275862069,127.99415517241376,0.8305344827586209,48.26724137931034,10,1,232
6deZN1bslXzeGvOLaLMOIF,5031348.0,79.0,Nickelback,0.0048482603773584,0.5332264150943395,226398.50943396223,0.8773773584905658,0.0264304662264151,0.163,-4.280037735849056,0.0512320754716981,137.8632452830188,0.565509433962264,55.11320754716981,10,1,106


### Aggregate Functions
Aggregate functions perform a computation on a column and return a single value.
- MIN() - Returns the smallest value of the selected column.
- MAX() - Returns the largest value of the selected column.
- COUNT() - Returns the number of rows that match the query. COUNT(column) skips NULL values, while COUNT(*) counts every row. 
- AVG() - Returns the average value of a numeric column; NULL values are ignored.
- SUM() - Returns the total of a numeric column; NULL values are ignored. 
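A minimal sketch of how NULLs affect each aggregate, using Python's built-in sqlite3 module and three made-up artists, one with an unknown (NULL) follower count. The average is 200, not 133.3, because the NULL row is left out of both the sum and the count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, followers INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [("A", 100), ("B", 300), ("C", None)],  # C has an unknown follower count
)
row = conn.execute("""
    SELECT COUNT(*), COUNT(followers), MIN(followers), MAX(followers),
           AVG(followers), SUM(followers)
    FROM artists
""").fetchone()
print(row)  # (3, 2, 100, 300, 200.0, 400)
```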

In [12]:
%%sql 
SELECT MIN(popularity) AS Min_Popularity, MAX(popularity) AS Max_Popularity, 
    COUNT(artists) AS Number_of_Artists, AVG(followers) AS Average_Followers, SUM(followers) AS Total_Followers
FROM artists;  -- SUM is not meaningful here, since one Spotify account can follow many artists

1 rows affected.


min_popularity,max_popularity,number_of_artists,average_followers,total_followers
0.0,100.0,15459,496736.6918946892,7679052520.0


We could use aggregate functions to compare the average danceability, loudness, and tempo for all artists vs artists with a popularity above 80. We can also use count to determine how many artists are in each category. 

In [13]:
%%sql
SELECT AVG(danceability) AS Average_Danceability, AVG(loudness) AS Average_Loudness, 
    AVG(tempo) AS Average_Tempo, COUNT(artists) AS Number_of_Artists
FROM artists
WHERE popularity > 80;

1 rows affected.


average_danceability,average_loudness,average_tempo,number_of_artists
0.6554941228852029,-6.494001312870927,122.40567103225116,319


In [14]:
%%sql
SELECT AVG(danceability) AS Average_Danceability, AVG(loudness) AS Average_Loudness, 
    AVG(tempo) AS Average_Tempo, COUNT(artists) AS Number_of_Artists
FROM artists;

1 rows affected.


average_danceability,average_loudness,average_tempo,number_of_artists
0.5790440379894264,-9.30851937049459,119.46865009906998,15459


### Aliases
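Aliases give a table or a column a temporary name that makes it more readable. An alias exists only for the duration of that query. 
- Write the column or table name followed by AS new_name.
- Multiple columns can be combined with CONCAT(column1, ' ', column2) AS new_name.
- When querying multiple tables, prefix the column name with the table name (or its alias) to make clear where it comes from: table.column.
- PostgreSQL folds unquoted aliases to lowercase, which is why the outputs show average_danceability rather than Average_Danceability. Wrap the alias in double quotes to keep its capitalization.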
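A minimal sketch of a column alias, a table alias, and concatenation together, using Python's built-in sqlite3 module with one made-up row. It uses the standard `||` concatenation operator, which both PostgreSQL and SQLite support, instead of CONCAT(), which older SQLite versions lack:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist_genre (artists TEXT, genres TEXT)")
conn.execute("INSERT INTO artist_genre VALUES ('Drake', 'canadian hip hop')")

cur = conn.execute("""
    SELECT ag.artists || ' - ' || ag.genres AS artist_and_genre  -- column alias
    FROM artist_genre AS ag                                       -- table alias
""")
column_name = cur.description[0][0]  # the alias becomes the name of the result column
value = cur.fetchone()[0]
print(column_name, value)  # artist_and_genre Drake - canadian hip hop
```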

In [15]:
%%sql
SELECT COUNT(artists) AS Number_of_Artists
FROM artists AS Artist_Profile;

1 rows affected.


number_of_artists
15459


In [16]:
%%sql
SELECT a.artists, a.popularity
FROM artists AS a
WHERE popularity > 85
LIMIT 10;

10 rows affected.


artists,popularity
Sech,89.0
Chris Brown,90.0
Drake,98.0
Giveon,91.0
24kGoldn,87.0
iann dior,87.0
Lenny Tavárez,87.0
Olivia Rodrigo,88.0
The Kid LAROI,90.0
Morgan Wallen,88.0


In [17]:
%%sql
SELECT CONCAT(artists, ' ', genres) AS Artist_and_Genre
FROM artist_genre
LIMIT 10;  -- Not really applicable here, but it's an example

10 rows affected.


artist_and_genre
Ken Nordine beat poetry
Tikkle Me swedish electropop
Tikkle Me swedish synthpop
JVC Force electro
Marcus Roberts jazz piano
Marcus Roberts stride
Rowan Atkinson british comedy
The Capitols motown
The Choir Of Westminster Abbey british choir
The Choir Of Westminster Abbey choral


### GROUP BY
Groups rows that have the same values into summary rows, like average loudness per genre.
- It's often used with aggregate functions (MIN, MAX, COUNT, AVG, SUM) to summarize each group of the result set.
- Every column in the SELECT list must either appear in GROUP BY or be inside an aggregate function.
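The "average loudness per genre" example can be sketched with Python's built-in sqlite3 module and a hypothetical `songs` table of made-up values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (genre TEXT, loudness REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("pop", -5.0), ("pop", -7.0), ("rock", -4.0), ("jazz", -12.0), ("jazz", -10.0)],
)
# One summary row per genre: how many songs it has and their average loudness
rows = conn.execute("""
    SELECT genre, COUNT(*) AS n, AVG(loudness) AS avg_loudness
    FROM songs
    GROUP BY genre
    ORDER BY genre
""").fetchall()
print(rows)  # [('jazz', 2, -11.0), ('pop', 2, -6.0), ('rock', 1, -4.0)]
```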

In [18]:
%%sql
SELECT COUNT(artists), AVG(duration_ms), MAX(popularity), mode
FROM artists
GROUP BY mode;

2 rows affected.


count,avg,max,mode
3742,247827.1748024725,98.0,0
11717,239482.199593585,100.0,1


### HAVING Clause 
Filters groups after GROUP BY. It was added because WHERE filters individual rows before they are grouped, so it can't use aggregate functions.
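A minimal sketch of the difference, using Python's built-in sqlite3 module and the same kind of hypothetical `songs` table: WHERE removes rows before grouping, HAVING removes whole groups afterwards, and an aggregate inside WHERE is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (genre TEXT, loudness REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("pop", -5.0), ("pop", -7.0), ("rock", -4.0), ("jazz", -12.0), ("jazz", -10.0)],
)

# HAVING keeps only the groups with at least two songs
having_only = conn.execute("""
    SELECT genre, COUNT(*) FROM songs
    GROUP BY genre HAVING COUNT(*) >= 2 ORDER BY genre
""").fetchall()

# WHERE runs first and drops the -12.0 jazz song, so jazz no longer has two rows
where_then_having = conn.execute("""
    SELECT genre, COUNT(*) FROM songs
    WHERE loudness > -11
    GROUP BY genre HAVING COUNT(*) >= 2 ORDER BY genre
""").fetchall()

# An aggregate function inside WHERE is an error
try:
    conn.execute("SELECT genre FROM songs WHERE COUNT(*) >= 2 GROUP BY genre")
    where_aggregate_ok = True
except sqlite3.Error:
    where_aggregate_ok = False

print(having_only, where_then_having, where_aggregate_ok)
# [('jazz', 2), ('pop', 2)] [('pop', 2)] False
```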

In [19]:
%%sql
SELECT COUNT(artists), AVG(duration_ms), MAX(popularity), key
FROM artists
GROUP BY key
HAVING COUNT(artists) > 100
ORDER BY COUNT(artists) DESC
LIMIT 15;

12 rows affected.


count,avg,max,key
2332,240889.44867691715,98.0,7
1599,244028.38339516497,90.0,2
1563,239749.6449654486,90.0,9
1531,238853.09935869923,100.0,0
1381,246245.64259628163,98.0,1
1354,241163.5484653228,95.0,11
1353,242333.65425014848,93.0,5
1044,243755.62955926804,92.0,4
1040,239208.76450523568,96.0,6
1033,242605.53364151256,89.0,10


## Querying Information from Multiple Tables and Combining the Results
Thus far we have only queried one table at a time. But a database usually has multiple tables, so how do we query information from several tables and combine the results?
- You can use the results of a query on one table to search another table on a shared column (nested queries).
- You can combine rows and columns from two or more tables based on a shared column (JOIN).
- You can stack the result sets of several queries with UNION, as long as they have the same number of columns, compatible data types, and columns in the same order.
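A minimal sketch of stacking result sets, using Python's built-in sqlite3 module and two hypothetical single-column tables, `artists_2020` and `artists_2021`. UNION removes duplicate rows, while UNION ALL keeps them (and skips the de-duplication work):

```python
import sqlite3

# Two hypothetical tables with the same single column
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists_2020 (name TEXT);
CREATE TABLE artists_2021 (name TEXT);
INSERT INTO artists_2020 VALUES ('Drake'), ('Adele');
INSERT INTO artists_2021 VALUES ('Drake'), ('Giveon');
""")

def names(sql):
    """Return the first column of every result row."""
    return [r[0] for r in conn.execute(sql)]

# ORDER BY at the end sorts the combined result
union = names("SELECT name FROM artists_2020 UNION SELECT name FROM artists_2021 ORDER BY name")
union_all = names("SELECT name FROM artists_2020 UNION ALL SELECT name FROM artists_2021 ORDER BY name")
print(union, union_all)  # ['Adele', 'Drake', 'Giveon'] ['Adele', 'Drake', 'Drake', 'Giveon']
```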

### NESTED QUERIES
Uses WHERE ... IN with a subquery. The inner query pulls a list of values from one table, and the outer query keeps only the rows whose matching column appears in that list.

In [20]:
%%sql
SELECT artists, followers, popularity
FROM artists
WHERE artists IN (
    SELECT artists
    FROM artist_genre
    WHERE genres = 'pop'
)
ORDER BY followers DESC
LIMIT 15;

15 rows affected.


artists,followers,popularity
Ed Sheeran,78900234.0,92.0
Ariana Grande,61301006.0,95.0
Justin Bieber,44606973.0,100.0
Rihanna,42244011.0,92.0
Billie Eilish,41792604.0,92.0
Taylor Swift,38869193.0,98.0
Shawn Mendes,32419313.0,89.0
The Weeknd,31308207.0,96.0
Maroon 5,30291109.0,91.0
Marshmello,30244604.0,88.0
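
Here is a minimal sketch of the same nested-query pattern, comparing it with an equivalent JOIN. It assumes hypothetical toy versions of the artists and artist_genre tables in an in-memory SQLite database. IN returns each outer row at most once. A JOIN repeats an outer row once for every matching inner row, which starts to matter when the filter can match several genres per artist.

```python
import sqlite3

# Hypothetical toy versions of the notebook's artists and artist_genre tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (artists TEXT, followers INTEGER);
    CREATE TABLE artist_genre (artists TEXT, genres TEXT);
    INSERT INTO artists VALUES ('Singer A', 500), ('Band B', 300), ('Rapper C', 900);
    INSERT INTO artist_genre VALUES
        ('Singer A', 'pop'), ('Singer A', 'dance pop'),
        ('Band B', 'indie rock'),
        ('Rapper C', 'pop'), ('Rapper C', 'hip hop');
    """
)

# The inner query runs against artist_genre; its result list filters artists
nested = conn.execute(
    """
    SELECT artists, followers
    FROM artists
    WHERE artists IN (SELECT artists FROM artist_genre WHERE genres = 'pop')
    ORDER BY followers DESC
    """
).fetchall()

# The same question written as a JOIN
joined = conn.execute(
    """
    SELECT a.artists, a.followers
    FROM artists AS a
    JOIN artist_genre AS g ON a.artists = g.artists
    WHERE g.genres = 'pop'
    ORDER BY a.followers DESC
    """
).fetchall()

# With a looser pattern, the JOIN repeats Singer A once per matching genre
loose_join = conn.execute(
    "SELECT a.artists FROM artists AS a JOIN artist_genre AS g "
    "ON a.artists = g.artists WHERE g.genres LIKE '%pop%'"
).fetchall()

print(nested)
```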


### JOIN Clause
Used to combine rows from two or more tables, based on a shared column.

- (INNER) JOIN - Returns only the records that have matching values in both tables.
- LEFT (OUTER) JOIN - Returns all records from the left table, plus the matched records from the right table (NULLs where there is no match).
- RIGHT (OUTER) JOIN - Returns all records from the right table, plus the matched records from the left table (NULLs where there is no match).
- FULL (OUTER) JOIN - Returns all records from both tables. Records are matched where possible, and NULLs fill in on whichever side has no match.

In [21]:
%%sql
SELECT artists.artists, artists.followers, artists.popularity, artist_genre.genres
FROM artists
JOIN artist_genre
ON artists.artists=artist_genre.artists
WHERE artists.artists IN ('Ed Sheeran', 'Taylor Swift', 'Justin Bieber')
LIMIT 10;

7 rows affected.


artists,followers,popularity,genres
Taylor Swift,38869193.0,98.0,pop
Taylor Swift,38869193.0,98.0,post-teen pop
Justin Bieber,44606973.0,100.0,canadian pop
Justin Bieber,44606973.0,100.0,pop
Justin Bieber,44606973.0,100.0,post-teen pop
Ed Sheeran,78900234.0,92.0,pop
Ed Sheeran,78900234.0,92.0,uk pop


In [22]:
%%sql
SELECT artists.artists, artists.followers, artist_genre.genres, artist_track.name
FROM ((artists
    JOIN artist_genre ON artists.artists = artist_genre.artists)
    JOIN artist_track ON artists.artists = artist_track.artists)
WHERE artists.artists = 'Ed Sheeran'
LIMIT 10;

10 rows affected.


artists,followers,genres,name
Ed Sheeran,78900234.0,uk pop,Give Me Love
Ed Sheeran,78900234.0,pop,Give Me Love
Ed Sheeran,78900234.0,uk pop,Lego House
Ed Sheeran,78900234.0,pop,Lego House
Ed Sheeran,78900234.0,uk pop,Kiss Me
Ed Sheeran,78900234.0,pop,Kiss Me
Ed Sheeran,78900234.0,uk pop,Reuf
Ed Sheeran,78900234.0,pop,Reuf
Ed Sheeran,78900234.0,uk pop,Grade 8
Ed Sheeran,78900234.0,pop,Grade 8


In [23]:
%%sql
SELECT genres.genres, genres.danceability, genres.duration_ms, artist_genre.artists
FROM genres
FULL JOIN artist_genre
ON genres.genres=artist_genre.genres
WHERE genres.popularity < 50 AND genres.tempo < 100
ORDER BY genres.genres ASC
LIMIT 15;

15 rows affected.


genres,danceability,duration_ms,artists
21st century classical,0.1628833333333333,160297.66666666663,Thomas Adès
21st century classical,0.1628833333333333,160297.66666666663,Eric Whitacre
21st century classical,0.1628833333333333,160297.66666666663,Ola Gjeilo
action rock,0.412,198400.0,Jay Reatard
afghan traditional,0.4403333333333333,278581.33333333326,Ahmad Zahir
ambeat,0.748507326007326,234658.49450549448,Uyama Hiroto
ambeat,0.748507326007326,234658.49450549448,DJ Okawari
ambeat,0.748507326007326,234658.49450549448,Freddie Joachim
ambeat,0.748507326007326,234658.49450549448,Shing02
ambeat,0.748507326007326,234658.49450549448,SoulChef
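
One subtlety in the FULL JOIN query above concerns its WHERE clause, which tests columns from genres. A row that exists only in artist_genre has NULL in those columns, and a comparison with NULL is never true, so the filter discards it. With that WHERE clause, the FULL JOIN therefore returns the same rows as a LEFT JOIN from genres.

The sketch below checks this on hypothetical toy tables in SQLite. It emulates the FULL JOIN as a LEFT JOIN plus the unmatched right-hand rows.

```python
import sqlite3

# Hypothetical toy versions of the genres and artist_genre tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genres (genres TEXT, popularity REAL, tempo REAL);
    CREATE TABLE artist_genre (artists TEXT, genres TEXT);
    INSERT INTO genres VALUES
        ('ambeat', 40, 90), ('lonely genre', 30, 80), ('big genre', 70, 120);
    INSERT INTO artist_genre VALUES
        ('DJ X', 'ambeat'), ('Band Y', 'big genre'), ('Orphan Z', 'unlisted genre');
    """
)

# FULL OUTER JOIN emulated as a LEFT JOIN plus the right-hand rows with no match,
# since SQLite builds older than 3.39.0 lack RIGHT and FULL JOIN
full_join = """
    SELECT g.genres AS genres, g.popularity AS popularity,
           g.tempo AS tempo, ag.artists AS artists
    FROM genres AS g
    LEFT JOIN artist_genre AS ag ON g.genres = ag.genres
    UNION ALL
    SELECT NULL, NULL, NULL, ag.artists
    FROM artist_genre AS ag
    WHERE NOT EXISTS (SELECT 1 FROM genres AS g WHERE g.genres = ag.genres)
"""
full_rows = conn.execute(full_join).fetchall()

# Filtering on genres columns: NULL < 50 is not true, so 'Orphan Z' is dropped
filtered_full = conn.execute(
    f"SELECT * FROM ({full_join}) WHERE popularity < 50 AND tempo < 100 "
    "ORDER BY genres, artists"
).fetchall()

# The same filter on a plain LEFT JOIN gives identical rows
filtered_left = conn.execute(
    """
    SELECT g.genres, g.popularity, g.tempo, ag.artists
    FROM genres AS g
    LEFT JOIN artist_genre AS ag ON g.genres = ag.genres
    WHERE g.popularity < 50 AND g.tempo < 100
    ORDER BY g.genres, ag.artists
    """
).fetchall()

print(filtered_full)
```

If the goal is to keep unmatched artist_genre rows, the condition has to allow NULLs, for example `(genres.popularity < 50 OR genres.genres IS NULL)`.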


### UNION Operator
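Used to combine the result sets of two or more SELECT statements.
- Each SELECT must have the same number of columns, with similar data types, in the same order.
- The output column names come from the first SELECT, which is why the alias genres_or_year is set there and then used in ORDER BY.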

In [24]:
%%sql
SELECT genres AS genres_or_year, popularity, energy, loudness, speechiness, valence, key
FROM genres
WHERE key=10 OR popularity > 90
UNION
SELECT year, popularity, energy, loudness, speechiness, valence, key
FROM years
WHERE key=10 OR popularity > 60
ORDER BY genres_or_year ASC
LIMIT 15;

15 rows affected.


genres_or_year,popularity,energy,loudness,speechiness,valence,key
1922,0.1408450704225352,0.2378153521126759,-19.275281690140844,0.1166549295774648,0.5355492957746479,10
1924,0.6610169491525424,0.3443466101694912,-14.231343220338989,0.0920894067796609,0.6637254237288139,10
1936,5.080909090909091,0.3083886092727274,-14.612999090909067,0.2790293636363637,0.5640635454545455,10
1944,3.1928191489361697,0.2534414494680852,-14.58205585106382,0.1732832446808509,0.5406954787234041,10
2017,63.26355421686747,0.5904210208835337,-8.31262951807228,0.1105364959839356,0.4164763112449793,1
2018,63.29624346172135,0.6024346220161672,-7.168785068949124,0.1271755587256302,0.4479212743699474,1
2019,65.25654181631606,0.5932240360184717,-7.722191893278596,0.1210433555669573,0.4588176295536167,1
2020,64.30197044334976,0.6312316354679793,-6.595066995073878,0.1413836945812805,0.5010478078817729,1
abstract beats,58.93333333333332,0.5277999999999999,-7.918000000000001,0.1163733333333333,0.4935066666666666,10
alternative pop rock,54.08095238095239,0.7202441558441559,-4.751005194805195,0.0687059307359307,0.6437601731601732,10
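
UNION also removes duplicate rows, while UNION ALL keeps every row. Here is a minimal sketch of both, plus the column-count rule, using hypothetical genres and years tables in an in-memory SQLite database. The sketch stores year as text so that both SELECTs return a text first column. Postgres, unlike SQLite, refuses to UNION a text column with an integer column.

```python
import sqlite3

# Hypothetical toy tables; note the duplicated ('2020', 1) row in years
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genres (genres TEXT, key INTEGER);
    CREATE TABLE years (year TEXT, key INTEGER);
    INSERT INTO genres VALUES ('abstract beats', 10), ('ambeat', 10);
    INSERT INTO years VALUES ('2019', 1), ('2020', 1), ('2020', 1);
    """
)

# UNION removes duplicate rows; the alias from the first SELECT names the column
union_rows = conn.execute(
    "SELECT genres AS label, key FROM genres "
    "UNION SELECT year, key FROM years ORDER BY label"
).fetchall()

# UNION ALL keeps every row, duplicates included
union_all_rows = conn.execute(
    "SELECT genres AS label, key FROM genres "
    "UNION ALL SELECT year, key FROM years ORDER BY label"
).fetchall()

# SELECTs with different column counts are rejected
try:
    conn.execute("SELECT genres, key FROM genres UNION SELECT year FROM years")
    count_error = None
except sqlite3.Error as exc:
    count_error = str(exc)

print(union_rows)
```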


## Change the Contents of a Table

### INSERT INTO
Inserts new records into a table. Any column left out of the column list is set to its default, usually NULL. You may need to insert NULL into a column that references another row (such as a foreign key) if that row hasn't been created yet.

In [None]:
%%sql
INSERT INTO artists (id, followers, popularity, artists)  -- Adds to the specified columns
VALUES ('new_artist_id', 10000, 70.0, 'New Artist');
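
When inserting from Python rather than a %%sql cell, parameter placeholders let the driver quote values safely. Here is a minimal sketch using sqlite3, which uses ? placeholders (drivers such as psycopg2 use %s instead). It also shows that an omitted column is stored as NULL. The table layout, including the genres column, is a hypothetical stand-in.

```python
import sqlite3

# Hypothetical artists table; genres is included only to show an omitted column
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artists (id TEXT PRIMARY KEY, artists TEXT, "
    "followers INTEGER, popularity REAL, genres TEXT)"
)

# Only the listed columns receive values; genres is left out, so it is stored as NULL
conn.execute(
    "INSERT INTO artists (id, followers, popularity, artists) VALUES (?, ?, ?, ?)",
    ("new_artist_id", 10000, 70.0, "New Artist"),
)

# executemany runs one parameterized INSERT for each tuple in the list
conn.executemany(
    "INSERT INTO artists (id, artists, followers) VALUES (?, ?, ?)",
    [("id_2", "Second Artist", 500), ("id_3", "Third Artist", 250)],
)
conn.commit()

row = conn.execute(
    "SELECT artists, followers, popularity, genres FROM artists WHERE id = ?",
    ("new_artist_id",),
).fetchone()
total = conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0]

print(row, total)
```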

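INSERT INTO can also copy data from one table into another with a SELECT.
- The target table must already exist, and the selected columns must match its columns in number, order, and data type.
- Existing records in both tables are left unchanged; the copied rows are simply added to the target.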

In [None]:
%%sql
INSERT INTO small_artists
SELECT * FROM artists
WHERE followers < 100000;
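
The query above assumes small_artists already exists with the same columns as artists. Here is a minimal sketch of the whole sequence in an in-memory SQLite database, with hypothetical rows. It shows that the source table keeps all of its rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (id TEXT, artists TEXT, followers INTEGER);
    -- The target table must exist first, with matching columns in the same order
    CREATE TABLE small_artists (id TEXT, artists TEXT, followers INTEGER);
    INSERT INTO artists VALUES
        ('a1', 'Big Star', 2500000), ('a2', 'Small Act', 40000), ('a3', 'Tiny Act', 7000);
    """
)

# Copy only the rows that pass the WHERE filter
conn.execute("INSERT INTO small_artists SELECT * FROM artists WHERE followers < 100000")
conn.commit()

copied = conn.execute("SELECT artists FROM small_artists ORDER BY followers DESC").fetchall()
source_count = conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0]

print(copied, source_count)
```

If the target table doesn't exist yet, `CREATE TABLE small_artists AS SELECT ...` creates and fills it in one statement, in both Postgres and SQLite.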

### UPDATE
Used to modify existing records in a table.
- Be careful with the WHERE clause, since it decides which records get updated. If it is missing, every record in the table is updated.

In [None]:
%%sql
UPDATE artists
SET followers=15000, popularity=75.0
WHERE artists='New Artist';
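
One way to guard against an UPDATE that touches more rows than intended is to check the affected-row count before committing. Here is a minimal sketch using sqlite3. In its default mode, sqlite3 opens a transaction implicitly before an UPDATE, so the change can still be rolled back. The helper guarded_update is hypothetical, written for this example, and is not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artists TEXT, followers INTEGER, popularity REAL)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?, ?)",
    [("New Artist", 10000, 70.0), ("Other Artist", 500, 20.0)],
)
conn.commit()


def guarded_update(conn, sql, params, expected_rows):
    """Hypothetical helper: keep an UPDATE only if it changed expected_rows rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()  # undo the change made in the implicit transaction
        return False
    conn.commit()
    return True


ok = guarded_update(
    conn,
    "UPDATE artists SET followers = ?, popularity = ? WHERE artists = ?",
    (15000, 75.0, "New Artist"),
    expected_rows=1,
)

# Forgetting the WHERE clause would touch both rows, so the guard rolls it back
blocked = guarded_update(conn, "UPDATE artists SET followers = 0", (), expected_rows=1)

rows = conn.execute(
    "SELECT artists, followers, popularity FROM artists ORDER BY artists"
).fetchall()
print(ok, blocked, rows)
```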

### DELETE
Used to delete existing records in a table. 

In [None]:
%%sql
DELETE FROM artists;  -- This will delete all of the records in the artists table, but keep the table intact

In [None]:
%%sql
DELETE FROM artists
WHERE artists='New Artist';
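
Here is a minimal sketch, using an in-memory SQLite database with hypothetical rows, of the two DELETE forms above. A targeted DELETE removes only the matching rows. A DELETE with no WHERE clause empties the table, but the table itself still exists, unlike DROP TABLE (Postgres also offers TRUNCATE for emptying a table quickly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artists TEXT, followers INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [("New Artist", 15000), ("Kept Artist", 800)],
)

# Targeted delete: only rows matching the WHERE clause are removed
cur = conn.execute("DELETE FROM artists WHERE artists = ?", ("New Artist",))
deleted_one = cur.rowcount

# No WHERE clause: every remaining row goes, but the table definition stays
conn.execute("DELETE FROM artists")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0]
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'artists'"
).fetchone()[0] == 1

print(deleted_one, remaining, table_still_exists)
```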