# SQL exercises

There are several Python packages for connecting to PostgreSQL databases.


#### SQLAlchemy
In this case, we will use _SQLAlchemy_, but only to connect to the database. SQLAlchemy offers many more features; take a look [here](https://www.sqlalchemy.org/).

#### psycopg2
You can also try _psycopg2_, which is simple and direct to use. Further reading [here](https://pypi.org/project/psycopg2/).

## Settings

In [None]:
import pandas as pd
from sqlalchemy import create_engine

You have two options:
1. Install PostgreSQL and set up a local database on your computer.
2. Use an already set-up database. For that, you will need some credentials.

In [None]:
# Credentials to connect to DB 
# (this should not be placed inside the code, but in a configuration file!)
host = 'localhost'
port = ...
user = ...
password = ...
database = ...
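
One way to follow the advice in the comment above is to read the credentials from an INI file with the standard-library `configparser` module. This is only a sketch: the function name `read_credentials`, the file layout and the `[postgresql]` section name are assumptions made for illustration, not part of the exercise setup.

```python
import configparser


def read_credentials(path, section="postgresql"):
    """Read connection settings from an INI file (hypothetical layout)."""
    # interpolation=None keeps '%' characters in passwords from being
    # treated as configparser placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    settings = parser[section]
    return {
        "host": settings.get("host", fallback="localhost"),
        "port": settings.getint("port"),
        "user": settings["user"],
        "password": settings["password"],
        "database": settings["database"],
    }
```

The returned dictionary can then replace the hard-coded variables, and the INI file itself can stay out of version control.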

In [None]:
# Connection to DB
database_uri = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, database)
con = create_engine(database_uri).connect()
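
The plain `format` call above can produce a broken URI if the user name or password contains characters such as `@`, `/` or `:`, because they have a meaning inside a URL. A minimal sketch of one workaround is to escape both values with `urllib.parse.quote_plus` before building the URI. The helper name `build_database_uri` is made up for this example.

```python
from urllib.parse import quote_plus


def build_database_uri(user, password, host, port, database):
    """Build a PostgreSQL URI, escaping special characters in the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user),      # e.g. '@' becomes '%40'
        quote_plus(password),  # e.g. '/' becomes '%2F'
        host,
        port,
        database,
    )
```

The result can be passed to `create_engine` exactly like `database_uri` above.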

## Let's practice!

### Before calçotada

Before the _calçotada_ starts, there is always a big issue: **music**. That's why a column *fav_music_gndr* was added, so that we can pick the music that best matches the preferences of the people in this group.

**1. In order to have some songs from each style, check which music genres were chosen as favourites.**

In [None]:
# Define query
query = """
SELECT ...
"""

In [None]:
# Fetch data directly with Pandas
pd.read_sql_query(query, con)

**2. Let's see which styles are predominant. Count the number of followers for each music genre and order them so that the most popular ones come first.**

**3. Apparently, there were some people whose names started with "Ma". Just out of curiosity, what are their favourite music genres?**

Hint: *You may want to look for SUBSTRING or LEFT functions*
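
To see how prefix matching behaves without touching the real database, here is a small sketch on an in-memory SQLite table. The table name `people`, the `name` column and the sample rows are all hypothetical, and the prefix is deliberately a different one. SQLite is only a stand-in here: it offers `substr`, while PostgreSQL additionally provides `LEFT(name, 2)` and `SUBSTRING(name FROM 1 FOR 2)`.

```python
import sqlite3

# Toy data, invented for this example only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, fav_music_gndr TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Joan", "rock"), ("Jordi", "pop"), ("Marta", "jazz"), ("Josep", "rock")],
)

# Keep only the rows whose name starts with the two letters 'Jo'.
rows = demo.execute(
    """
    SELECT name, fav_music_gndr
    FROM people
    WHERE substr(name, 1, 2) = 'Jo'
    ORDER BY name
    """
).fetchall()
demo.close()

print(rows)
```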

**4. To create a fair playlist of 200 songs (enough for at least 12 hours), calculate how many songs per music genre we need (maintaining the same distribution as people's preferences).**
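
A proportional split rarely gives whole numbers, so some rounding rule is needed to end up with exactly 200 songs. The sketch below illustrates the arithmetic in plain Python using largest-remainder rounding, which is one possible choice and not necessarily the intended SQL solution. The function name `allocate_songs` and the follower counts are invented for the example.

```python
def allocate_songs(followers, total_songs=200):
    """Split total_songs across genres in proportion to follower counts."""
    total_followers = sum(followers.values())
    if total_followers == 0:
        raise ValueError("no followers to distribute songs over")

    songs, remainders = {}, {}
    for genre, count in followers.items():
        # Integer share plus the remainder left over by the division.
        songs[genre], remainders[genre] = divmod(total_songs * count, total_followers)

    # Hand the songs still unassigned to the genres with the largest remainders.
    leftover = total_songs - sum(songs.values())
    for genre in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        songs[genre] += 1
    return songs


playlist = allocate_songs({"rock": 5, "pop": 3, "jazz": 1})
print(playlist)  # {'rock': 111, 'pop': 67, 'jazz': 22}
```

With 9 followers, the exact shares are about 111.1, 66.7 and 22.2 songs. Rounding down gives 199, and the one remaining song goes to the genre with the largest fractional part.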

**5. Since reggaeton is always controversial at a party, let's see if there is a majority of people whose favourite style is not _reggaeton_.**

Hint: _Check how CASE WHEN works!_
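
As an illustration of the hint, `CASE WHEN` can turn a condition into 1 or 0 so that `SUM` counts the matching rows. The sketch below runs on an in-memory SQLite table as a stand-in for PostgreSQL. The table name `people` and the sample rows are hypothetical, and the genre tested is a different one from the exercise.

```python
import sqlite3

# Toy data, invented for this example only (one person has no favourite).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, fav_music_gndr TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Anna", "rock"), ("Pau", "rock"), ("Laia", "pop"), ("Marc", "jazz"), ("Nil", None)],
)

# Each CASE expression yields 1 for matching rows and 0 otherwise.
# A NULL favourite matches neither condition, so it is counted in neither column.
result = demo.execute(
    """
    SELECT
        SUM(CASE WHEN fav_music_gndr = 'rock' THEN 1 ELSE 0 END) AS rock_fans,
        SUM(CASE WHEN fav_music_gndr <> 'rock' THEN 1 ELSE 0 END) AS other_fans
    FROM people
    """
).fetchone()
demo.close()

print(result)  # (2, 2)
```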

### After calçotada

Once the _calçotada_ is over, it's time to **settle up expenses**. In this case, let's imagine I paid for everything (which I did not, because we are such a nice group that everyone was in charge of something) and we created a *debt_balance* column with the amount of money each person owes me.

**6. What is the total amount that I should be paid back?**

**7. Which neighborhood owes the most money (as an absolute quantity)?**

**8. And from a debt-per-person (€/pers) perspective, which neighborhood should I send the debt collectors (*cobradors del frac*) to first?**

**9. And, by profession, which job has the worst defaulters?**

Keep in mind that if we only look at the total amount, mathematicians will very probably be the ones owing the most money simply because there were more of us. Calculate the debt per person, too.
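
The difference between total debt and debt per person can be seen on a toy example. The sketch below uses an in-memory SQLite table as a stand-in for PostgreSQL. The table name `people`, the `profession` column and all amounts are hypothetical; only *debt_balance* comes from the exercise.

```python
import sqlite3

# Toy data, invented for this example only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, profession TEXT, debt_balance REAL)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Anna", "mathematician", 10.0),
        ("Pau", "mathematician", 10.0),
        ("Laia", "mathematician", 10.0),
        ("Marc", "physicist", 25.0),
    ],
)

# SUM gives the total per group, AVG the amount owed per person in that group.
rows = demo.execute(
    """
    SELECT profession,
           SUM(debt_balance) AS total_debt,
           AVG(debt_balance) AS debt_per_person
    FROM people
    GROUP BY profession
    ORDER BY debt_per_person DESC
    """
).fetchall()
demo.close()

print(rows)  # [('physicist', 25.0, 25.0), ('mathematician', 30.0, 10.0)]
```

Here the larger group owes more in total (30 versus 25), but the single physicist owes more per person (25 versus 10), which is exactly the effect described above.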

### Time to leave! :(

Now that I know whom I have to chase first to get my money back, it's already time to go home after a whole day full of *calçots*, good music and great company, so I'd better find a good spot in a car. But wait...!

**10. Could you discover who is my flatmate so that we can go back together in the same car? Thanks a lot!**

**Super well done and thanks for your time and interest!!!**

If you want to practice more PostgreSQL exercises, take a look at https://pgexercises.com/.

Once finished, it's polite to close the connection to the DB! Too many open connections may make the database harder to access, and may also anger the infrastructure and systems developers!

In [None]:
# Close connection
con.close()