# SQL SubQuery
## How to load the dataset(s) into SQL
With Python, you can load large datasets into a MySQL database easily. Follow the steps below.

- First, create a database on your local MySQL server.

```sql
CREATE DATABASE <database_name>;
```

- Next, use Python to load the dataset into the database.

```python
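import pandas as pd
from sqlalchemy import create_engine  # the mysql+pymysql URL below also needs the PyMySQL package

# Read the CSV file into a DataFrame.
df = pd.read_csv("file/path/to/the/database.csv")

# Connect to the MySQL database created in the previous step.
engine = create_engine("mysql+pymysql://<db_username>:<db_password>@<hostname>/<database_name>")

# Write the DataFrame to a new table; blank CSV cells are stored as NULL.
df.to_sql("<table_name>", con=engine)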
```

## Problems 1-6

For problems 1 to 6, use the Olympic dataset. You can download it [here](https://drive.google.com/file/d/1EGIRBkbQGByJPvCqDtxtTnXv93oGunFp/view?usp=share_link).

**Column description:**
1. ID -> Unique ID of each record in the dataset (integer).
2. Name -> Name of the athlete.
3. Sex -> Gender of the athlete.
4. Height -> Height of the athlete.
5. Weight -> Weight of the athlete.
6. NOC -> Country code of the country the athlete represents.
7. Year -> Year in which the athlete participated.
8. Sport -> Name of the sport in which the athlete participated.
9. Event -> Name of the event within the sport.
10. Medal -> Medal the athlete won. If the athlete did not win a medal, this cell is blank.
11. country -> Name of the country.
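
Because blank `Medal` cells decide who counts as a medalist, the sketch below shows how a blank CSV cell ends up as SQL `NULL` and why `IS NOT NULL` is the matching filter. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for MySQL, and the table name `olympic_demo` and the three sample rows are made up. pandas performs this conversion for you (blank cells become `NaN`, which `to_sql` writes as `NULL`); here it is done by hand so that the example needs only the standard library.

```python
import csv
import io
import sqlite3

# Hypothetical three-row sample shaped like part of the Olympic CSV.
SAMPLE_CSV = """Name,Year,Sport,Medal
Athlete A,2008,Swimming,Gold
Athlete B,2008,Swimming,
Athlete C,2016,Basketball,Bronze
"""


def load_sample(conn):
    """Load the sample rows, storing blank cells as NULL."""
    conn.execute(
        "CREATE TABLE olympic_demo (Name TEXT, Year INTEGER, Sport TEXT, Medal TEXT)"
    )
    rows = [
        # An empty CSV cell becomes None, which SQLite stores as NULL.
        tuple(value if value != "" else None for value in row.values())
        for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
    ]
    conn.executemany("INSERT INTO olympic_demo VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL server
load_sample(conn)

# Medal winners are the rows whose Medal is not NULL.
medalists = [r[0] for r in conn.execute(
    "SELECT Name FROM olympic_demo WHERE Medal IS NOT NULL ORDER BY Name")]
no_medal = [r[0] for r in conn.execute(
    "SELECT Name FROM olympic_demo WHERE Medal IS NULL")]
print(medalists, no_medal)
```

If a loader stored blanks as empty strings instead, the filter would have to be `Medal <> ''`, so it is worth checking which of the two your table contains before running the medal queries.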

### Problem 1

Display the names of athletes who won a gold medal in the 2008 Olympics and whose height is greater than the average height of all athletes in the 2008 Olympics.


### Problem 2

Display the names of athletes who won a medal in the sport of basketball in the 2016 Olympics and whose weight is less than the average weight of all athletes who won a medal in the 2016 Olympics.



### Problem 3

Display the names of all athletes who have won a medal in the sport of swimming in both the 2008 and 2016 Olympics.



### Problem 4

Display the names of all countries that have won more than 50 medals in a single year.



### Problem 5

Display the names of all athletes who have won medals in more than one sport in the same year.



### Problem 6

What is the average weight difference between male and female athletes in the Olympics who have won a medal in the same event?

```sql
-- Problem 1
SELECT Name, Height FROM tasks.olympic
WHERE Medal = 'Gold' AND Year = 2008
    AND Height > (SELECT AVG(Height) FROM tasks.olympic WHERE Year = 2008);

-- Problem 2
SELECT Name, Weight, Medal FROM tasks.olympic
WHERE Year = 2016 AND Medal IS NOT NULL AND Sport = 'Basketball'
    AND Weight < (SELECT AVG(Weight) FROM tasks.olympic
                  WHERE Year = 2016 AND Medal IS NOT NULL);

-- Problem 3 (DISTINCT avoids repeating athletes with several 2008 medals)
SELECT DISTINCT Name FROM tasks.olympic
WHERE Medal IS NOT NULL AND Sport = 'Swimming' AND Year = 2008
    AND Name IN (
        SELECT Name FROM tasks.olympic
        WHERE Medal IS NOT NULL AND Sport = 'Swimming' AND Year = 2016
    );

-- Problem 4
SELECT Country, Year, COUNT(Medal) AS Total_Medals FROM tasks.olympic
WHERE Medal IS NOT NULL
GROUP BY Country, Year
HAVING COUNT(Medal) > 50;

-- Problem 5
SELECT DISTINCT a.Name FROM tasks.olympic a
JOIN tasks.olympic b ON a.Name = b.Name
    AND a.Year = b.Year
    AND a.Sport <> b.Sport
WHERE a.Medal IS NOT NULL
    AND b.Medal IS NOT NULL;

-- Problem 6
SELECT AVG(ABS(m.Weight - f.Weight)) AS avg_weight_difference
FROM tasks.olympic m
JOIN tasks.olympic f ON m.Event = f.Event
    AND m.Year = f.Year AND m.Medal IS NOT NULL AND f.Medal IS NOT NULL
    AND m.Sex = 'M' AND f.Sex = 'F';
```
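
To sanity-check the subquery patterns from Problems 1 and 3 before running them on the full table, one option is to try them on a handful of rows whose answer can be worked out by hand. The sketch below does that under stated assumptions: it uses an in-memory SQLite database rather than MySQL, a table simply named `olympic` with only the columns these two queries need, and invented athletes `A` to `D`. The 2008 average height of the sample is (190 + 170 + 180) / 3 = 180, so only `A` should pass the Problem 1 filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL server
conn.execute(
    "CREATE TABLE olympic (Name TEXT, Year INTEGER, Sport TEXT, Height REAL, Medal TEXT)"
)
# Hypothetical sample rows; None is stored as NULL (no medal).
conn.executemany(
    "INSERT INTO olympic VALUES (?, ?, ?, ?, ?)",
    [
        ("A", 2008, "Swimming", 190.0, "Gold"),
        ("B", 2008, "Swimming", 170.0, "Gold"),
        ("C", 2008, "Rowing", 180.0, None),
        ("A", 2016, "Swimming", 190.0, "Silver"),
        ("B", 2016, "Rowing", 170.0, "Bronze"),
        ("D", 2016, "Swimming", 185.0, "Gold"),
    ],
)

# Problem 1 pattern: a scalar subquery supplies the 2008 average height.
tall_gold = [r[0] for r in conn.execute(
    """
    SELECT DISTINCT Name FROM olympic
    WHERE Medal = 'Gold' AND Year = 2008
      AND Height > (SELECT AVG(Height) FROM olympic WHERE Year = 2008)
    """
)]

# Problem 3 pattern: an IN subquery keeps 2008 swimming medalists
# who also appear among the 2016 swimming medalists.
both_years = [r[0] for r in conn.execute(
    """
    SELECT DISTINCT Name FROM olympic
    WHERE Medal IS NOT NULL AND Sport = 'Swimming' AND Year = 2008
      AND Name IN (
          SELECT Name FROM olympic
          WHERE Medal IS NOT NULL AND Sport = 'Swimming' AND Year = 2016
      )
    """
)]
print(tall_gold, both_years)
```

`B` is excluded from the second result because the 2016 medal was in a different sport, and `D` because there is no 2008 medal, which are the two cases the `IN` subquery is meant to filter out.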

## Problems 7-10

For problems 7 to 10, use the health insurance dataset. The dataset and its description are available [here](https://www.kaggle.com/datasets/thedevastator/insurance-claim-analysis-demographic-and-health).

### Problem 7

How many patients have claimed more than the average claim amount for patients who are smokers and have at least one child, and belong to the southeast region?


### Problem 8

How many patients have claimed more than the average claim amount for patients who are not smokers and have a BMI greater than the average BMI for patients who have at least one child?



### Problem 9

How many patients have claimed more than the average claim amount for patients who have a BMI greater than the average BMI for patients who are diabetic, have at least one child, and are from the southwest region?


### Problem 10

What is the difference in the average claim amount between smokers and non-smokers who have the same BMI and number of children?

```sql
-- Problem 7 (the subquery now applies the same three conditions as the problem statement)
SELECT COUNT(*) FROM tasks.insurance
WHERE smoker = 'Yes' AND region = 'southeast' AND children >= 1
    AND claim > (SELECT AVG(claim) FROM tasks.insurance
                 WHERE smoker = 'Yes' AND region = 'southeast' AND children >= 1);

-- Problem 8
SELECT COUNT(*) FROM tasks.insurance
WHERE smoker = 'No'
    AND claim > (SELECT AVG(claim) FROM tasks.insurance WHERE smoker = 'No')
    AND bmi > (SELECT AVG(bmi) FROM tasks.insurance WHERE children >= 1);

-- Problem 9
SELECT COUNT(*) FROM tasks.insurance
WHERE claim > (SELECT AVG(claim) FROM tasks.insurance)
    AND bmi > (SELECT AVG(bmi) FROM tasks.insurance
               WHERE diabetic = 'Yes' AND children >= 1 AND region = 'southwest');

-- Problem 10 (MySQL allows the column aliases in HAVING)
SELECT bmi, children,
    AVG(CASE WHEN smoker = 'Yes' THEN claim END) AS smoker_avg_claim,
    AVG(CASE WHEN smoker = 'No' THEN claim END) AS nonsmoker_avg_claim,
    (AVG(CASE WHEN smoker = 'Yes' THEN claim END)
        - AVG(CASE WHEN smoker = 'No' THEN claim END)) AS claim_difference
FROM tasks.insurance
GROUP BY bmi, children
HAVING smoker_avg_claim IS NOT NULL AND nonsmoker_avg_claim IS NOT NULL;
```
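
The conditional-aggregation idea behind Problem 10 can be checked on a small hand-made table. The sketch below is a minimal illustration, assuming an in-memory SQLite database instead of MySQL, a table named `insurance` with only the four columns involved, and five invented rows. Its `HAVING` clause counts smokers and non-smokers per group rather than referring to column aliases; that is a portability choice for this sketch, not a change to the approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL server
conn.execute(
    "CREATE TABLE insurance (bmi REAL, children INTEGER, smoker TEXT, claim REAL)"
)
# Hypothetical rows: only the (25.0, 1) group contains both smokers and non-smokers.
conn.executemany(
    "INSERT INTO insurance VALUES (?, ?, ?, ?)",
    [
        (25.0, 1, "Yes", 3000.0),
        (25.0, 1, "Yes", 5000.0),
        (25.0, 1, "No", 1000.0),
        (30.0, 0, "Yes", 7000.0),
        (30.0, 2, "No", 2000.0),
    ],
)

# CASE without ELSE yields NULL, and AVG/COUNT skip NULLs, so each
# aggregate only sees the rows of one smoker category.
rows = conn.execute(
    """
    SELECT bmi, children,
           AVG(CASE WHEN smoker = 'Yes' THEN claim END)
             - AVG(CASE WHEN smoker = 'No' THEN claim END) AS claim_difference
    FROM insurance
    GROUP BY bmi, children
    HAVING COUNT(CASE WHEN smoker = 'Yes' THEN 1 END) > 0
       AND COUNT(CASE WHEN smoker = 'No' THEN 1 END) > 0
    """
).fetchall()
print(rows)
```

In the sample, smokers in the (25.0, 1) group average 4000 and non-smokers 1000, so the only surviving row carries a difference of 3000; the two groups with a single smoker category are dropped by the `HAVING` clause.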