![image info](.\head_image.jpg)

## 🎯 Abstract

Hello! We are Lucas Galdino de Camargo and Carlos Danilo Tomé.

We are students in the Specialization in Data Science postgraduate program at IFSP-Campinas (https://portal.cmp.ifsp.edu.br/); you can learn more about the program at https://portal.cmp.ifsp.edu.br/index.php/pos-graduacao/especializacao-em-ciencia-de-dados. This notebook was made as a course activity.

Our goal is to use AWS tools throughout the course (in this case, Amazon RDS) and to use SQL to access and analyze the data in the database we created.


---


###  ⚽ Motivation

Relational databases are essential to companies, which makes SQL an essential tool for anyone who aims to work with data.

This notebook briefly discusses relational databases and practices data analysis with SQL.

We use a soccer database that includes clubs, players, matches, and so on.


###  Topics



1. Cloud Infrastructure
    - 1.1 Database
    - 1.2 Uploading the tables to the database
    - 1.3 DER (Entity Relationship Diagram)
    
2. Exploratory Data Analysis

### 1. Cloud Infrastructure

The relational database used in this notebook was created with Amazon RDS, the cloud-based relational database service from AWS. It is easy to configure, scalable, and resizable, and it offers many more features for deploying and managing a relational database securely and with high performance (you can see more about Amazon RDS at https://aws.amazon.com/pt/rds/). The tools used in this project are described below.


#### Database System: RDS - MySQL

We created a relational database in RDS, one of the tools available in the AWS Cloud. We chose **MySQL** as the managed database system because it is one of the most widely used relational database systems. The database is hosted in the "us-east-1" region and accepts connections with these credentials:

        - host='soccer-database.csdmmiwixxfg.us-east-1.rds.amazonaws.com',
        - port=3306,
        - user='admin',
        - passwd='bancodedados12',
        - db='europeansoccerDB',
        - charset='utf8mb4'
        

You can see more here: https://aws.amazon.com/pt/getting-started/hands-on/create-mysql-db/ . 

#### SQL Client : DBeaver

To upload, manipulate, and manage the data we use DBeaver, a SQL client and database administration tool that lets us connect to different kinds of relational database systems, such as MySQL, SQLite, MariaDB, PostgreSQL, and more.

You can see more here: https://dbeaver.io/ . 

#### Python Connect

Finally, we connect the database to this Jupyter or Kaggle notebook (depending on where you are reading this, Kaggle or GitHub) with PyMySQL (`pymysql`), an interface for connecting to a MySQL database server from Python.
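
The connection step could also be prepared without writing the credentials into the notebook. The snippet below is a minimal sketch, assuming the credentials are supplied through environment variables; the helper `connection_settings` and the `SOCCER_DB_*` variable names are hypothetical and not part of the original project. The dictionary keys match keyword arguments accepted by `pymysql.connect`.

```python
import os


def connection_settings(env=os.environ):
    """Collect keyword arguments for pymysql.connect from environment variables.

    The SOCCER_DB_* names are hypothetical; any naming scheme would work.
    """
    return {
        "host": env["SOCCER_DB_HOST"],
        "port": int(env.get("SOCCER_DB_PORT", "3306")),
        "user": env["SOCCER_DB_USER"],
        "password": env["SOCCER_DB_PASSWORD"],
        "database": env.get("SOCCER_DB_NAME", "europeansoccerDB"),
        "charset": "utf8mb4",
    }


# Demo values only; in practice they would be set outside the notebook.
demo_env = {
    "SOCCER_DB_HOST": "localhost",
    "SOCCER_DB_USER": "reader",
    "SOCCER_DB_PASSWORD": "example",
}
settings = connection_settings(demo_env)

# Print everything except the password.
print({key: value for key, value in settings.items() if key != "password"})

# With PyMySQL installed, the connection would then be opened with:
#   import pymysql
#   conn = pymysql.connect(**settings)
```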

![image info](.\infra_cloud.png)


#### 1.1 Database

The data comes from a database hosted on the Kaggle platform, with more than 25,000 matches from European football leagues. The original is a relational database in SQLite format, so for this project we converted the data to .csv files in order to go through every step of building a MySQL database in RDS.
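
The conversion from SQLite to .csv could look roughly like the sketch below. It is not the exact procedure we followed: the helper `table_to_csv` is hypothetical, and an in-memory database with two sample `Country` rows stands in for the Kaggle SQLite file so the example runs on its own.

```python
import csv
import io
import sqlite3


def table_to_csv(conn, table, out):
    """Write every row of `table` to the file-like object `out` as CSV, header first."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    # "\n" line endings match the LINES TERMINATED BY "\n" clause used when loading.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


# Stand-in for the downloaded SQLite file: an in-memory database with sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Country (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO Country VALUES (?, ?)",
    [(1, "Belgium"), (1729, "England")],
)

buffer = io.StringIO()
table_to_csv(source, "Country", buffer)
print(buffer.getvalue())
```

For a real export, `buffer` would be replaced by a file opened with `open(path, "w", newline="", encoding="utf-8")`.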

The data is distributed across 7 tables containing different kinds of records related to a football match:

- **Match**: contains information about the date, location, league, players, and betting odds from bookmakers.
- **Country**: contains information about the country.
- **League**: contains information about the league.
- **Team**: contains information about the team, such as its name and foreign keys.
- **Player**: contains information about the players' physical stats.
- **Team_Attributes**: contains information about team stats and attributes in the FIFA game.
- **Player_Attributes**: contains information about player stats and attributes in the FIFA game.


**Dataset**: https://www.kaggle.com/hugomathien/soccer


#### 1.2 Uploading the tables to the database

Once the database has been created, we can connect to it in order to:


   - Create our tables in the DB. We use SQL to create the tables; here are some examples:

```
CREATE TABLE `Player_Attributes` (
  `id` int DEFAULT NULL,
  `player_fifa_api_id` int DEFAULT NULL,
  `player_api_id` int DEFAULT NULL,
  `date` varchar(16) DEFAULT NULL,
  `overall_rating` int DEFAULT NULL,
  `potential` int DEFAULT NULL,
  `preferred_foot` varchar(5) DEFAULT NULL,
  `attacking_work_rate` varchar(6) DEFAULT NULL,
  `defensive_work_rate` varchar(6) DEFAULT NULL,
  `crossing` int DEFAULT NULL,
  `finishing` int DEFAULT NULL,
  `heading_accuracy` int DEFAULT NULL,
  `short_passing` int DEFAULT NULL,
  `volleys` int DEFAULT NULL,
  `dribbling` int DEFAULT NULL,
  `curve` int DEFAULT NULL,
  `free_kick_accuracy` int DEFAULT NULL,
  `long_passing` int DEFAULT NULL,
  `ball_control` int DEFAULT NULL,
  `acceleration` int DEFAULT NULL,
  `sprint_speed` int DEFAULT NULL,
  `agility` int DEFAULT NULL,
  `reactions` int DEFAULT NULL,
  `balance` int DEFAULT NULL,
  `shot_power` int DEFAULT NULL,
  `jumping` int DEFAULT NULL,
  `stamina` int DEFAULT NULL,
  `strength` int DEFAULT NULL,
  `long_shots` int DEFAULT NULL,
  `aggression` int DEFAULT NULL,
  `interceptions` int DEFAULT NULL,
  `positioning` int DEFAULT NULL,
  `vision` int DEFAULT NULL,
  `penalties` int DEFAULT NULL,
  `marking` int DEFAULT NULL,
  `standing_tackle` int DEFAULT NULL,
  `sliding_tackle` int DEFAULT NULL,
  `gk_diving` int DEFAULT NULL,
  `gk_handling` int DEFAULT NULL,
  `gk_kicking` int DEFAULT NULL,
  `gk_positioning` int DEFAULT NULL,
  `gk_reflexes` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
```

---

```
CREATE TABLE `League` (
  `id` int DEFAULT NULL,
  `country_id` int DEFAULT NULL,
  `name` varchar(24) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
```

---
 
```
CREATE TABLE `Country` (
  `id` int DEFAULT NULL,
  `name` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
```
       
---   

```
CREATE TABLE `Team_Attributes` (
  `id` int DEFAULT NULL,
  `team_fifa_api_id` int DEFAULT NULL,
  `team_api_id` int DEFAULT NULL,
  `date` varchar(16) DEFAULT NULL,
  `buildUpPlaySpeed` int DEFAULT NULL,
  `buildUpPlaySpeedClass` varchar(8) DEFAULT NULL,
  `buildUpPlayDribbling` int DEFAULT NULL,
  `buildUpPlayDribblingClass` varchar(6) DEFAULT NULL,
  `buildUpPlayPassing` int DEFAULT NULL,
  `buildUpPlayPassingClass` varchar(5) DEFAULT NULL,
  `buildUpPlayPositioningClass` varchar(9) DEFAULT NULL,
  `chanceCreationPassing` int DEFAULT NULL,
  `chanceCreationPassingClass` varchar(6) DEFAULT NULL,
  `chanceCreationCrossing` int DEFAULT NULL,
  `chanceCreationCrossingClass` varchar(6) DEFAULT NULL,
  `chanceCreationShooting` int DEFAULT NULL,
  `chanceCreationShootingClass` varchar(6) DEFAULT NULL,
  `chanceCreationPositioningClass` varchar(9) DEFAULT NULL,
  `defencePressure` int DEFAULT NULL,
  `defencePressureClass` varchar(6) DEFAULT NULL,
  `defenceAggression` int DEFAULT NULL,
  `defenceAggressionClass` varchar(7) DEFAULT NULL,
  `defenceTeamWidth` int DEFAULT NULL,
  `defenceTeamWidthClass` varchar(6) DEFAULT NULL,
  `defenceDefenderLineClass` varchar(12) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
```

---

   - Insert/load the data into the tables

```
LOAD DATA LOCAL INFILE "./dados/Country.csv"
INTO TABLE Country
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
IGNORE 1 ROWS
(id, name)
```
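
`LOAD DATA LOCAL INFILE` has to be enabled on both the server and the client, so it can be handy to have a fallback that loads the same file from Python. The snippet below is a minimal sketch of that idea, not the method we used: the helper `load_csv` is hypothetical, and `sqlite3` stands in for the MySQL connection so the example runs anywhere. With PyMySQL the placeholder would be `%s` instead of `?`.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, columns, csv_file, placeholder="?"):
    """Insert the rows of a CSV file (header skipped) into `table`; return the row count."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header line, like IGNORE 1 ROWS
    rows = [tuple(row) for row in reader]
    marks = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"
    cur = conn.cursor()
    cur.executemany(sql, rows)  # one parameterized statement, many rows
    conn.commit()
    return len(rows)


# sqlite3 stands in for the RDS MySQL connection; the rows are a small sample.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Country (id INTEGER, name TEXT)")
sample = io.StringIO("id,name\n1,Belgium\n1729,England\n")

inserted = load_csv(demo, "Country", ["id", "name"], sample)
loaded = demo.execute("SELECT id, name FROM Country ORDER BY id").fetchall()
print(inserted, loaded)
```

Table and column names are interpolated into the statement, so they should come from trusted code, never from user input; only the row values go through placeholders.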


---   
   

#### 1.3 DER (Entity Relationship Diagram)

The DER (from the Portuguese *Diagrama Entidade-Relacionamento*, i.e. an entity relationship diagram) is the graphical representation of our relational database, showing the relationships between the tables (learn more about it at: https://www.smartdraw.com/entity-relationship-diagram/).


![image info](.\DER.png)


### 2. Exploratory Data Analysis

#### SQL queries

Our goal here is to briefly cover at least one query of each type from the following list:
   - Joins (inner join, left join, right join, ...) - you can learn more about joins at: https://pt.stackoverflow.com/questions/6441/qual-%C3%A9-a-diferen%C3%A7a-entre-inner-join-e-outer-join
   - Aggregations (group by, having, max, min, avg, sum, count, ...) - you can learn more about aggregate functions at: https://mode.com/sql-tutorial/sql-aggregate-functions/
   - Subqueries and functions (not in, case when, date_format, concat, ...) - you can learn more about SQL subqueries at: https://www.w3resource.com/sql/subqueries/understanding-sql-subqueries.php
   - Sorting (order by, limit) - you can learn more about sorting and limiting results at: https://www.w3schools.com/sql/sql_top.asp
   - Analytical (window) functions (partition, rank, ...) - you can learn more about these SQL functions at: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
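
To give an idea of how the first items in this list fit together, here is a minimal sketch that combines a join, an aggregation, and sorting in one query. It makes a few assumptions: `sqlite3` stands in for the RDS MySQL connection so the example runs anywhere, the `Country` and `League` tables only mirror the column names defined earlier, and the league rows are made up for illustration.

```python
import sqlite3

# sqlite3 stands in for the MySQL connection; the league rows are made-up samples.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Country (id INTEGER, name TEXT);
CREATE TABLE League (id INTEGER, country_id INTEGER, name TEXT);
INSERT INTO Country VALUES (1, 'Belgium'), (1729, 'England'), (4769, 'France');
INSERT INTO League VALUES (10, 1, 'League A'), (20, 1729, 'League B'), (30, 1729, 'League C');
""")

# LEFT JOIN keeps countries without a league;
# COUNT(l.id) ignores the NULLs that the join produces for them.
query = """
SELECT c.name AS country, COUNT(l.id) AS leagues
FROM Country AS c
LEFT JOIN League AS l ON l.country_id = c.id
GROUP BY c.name
ORDER BY leagues DESC, country
LIMIT 3
"""
result = db.execute(query).fetchall()
print(result)
```

Replacing `LEFT JOIN` with `INNER JOIN` would drop France from the result, because no league row matches it in this sample.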

In [1]:
```
import pymysql
import pandas as pd

# ! pip install pymysql
```

In [2]:
```
conn = pymysql.connect(
    host='soccer-database.csdmmiwixxfg.us-east-1.rds.amazonaws.com',
    port=3306,
    user='admin',
    passwd='bancodedados12',
    db='europeansoccerDB',
    charset='utf8mb4'
)
```

In [3]:
```
cursor = conn.cursor()
```

In [4]:
```
# Write a SQL query
query = "SELECT * FROM Country"
```

In [5]:
```
# Run the query and load the result into a DataFrame
frame = pd.read_sql(query, conn)
frame.head()
```

|   | id    | name    |
|---|-------|---------|
| 0 | 1     | Belgium |
| 1 | 1729  | England |
| 2 | 4769  | France  |
| 3 | 7809  | Germany |
| 4 | 10257 | Italy   |
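
One of the analytical functions from the list above, `RANK()` with `PARTITION BY`, could be sketched as follows. This is only an illustration under stated assumptions: `sqlite3` (SQLite 3.25 or newer, which supports window functions) stands in for the MySQL connection, the table keeps just three of the `Player_Attributes` columns, and all rows and player ids are made up.

```python
import sqlite3

# sqlite3 stands in for the MySQL connection; every row below is a made-up sample.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Player_Attributes (
    player_api_id INTEGER,
    preferred_foot TEXT,
    overall_rating INTEGER
);
INSERT INTO Player_Attributes VALUES
    (101, 'left', 70),
    (102, 'left', 82),
    (103, 'right', 75),
    (104, 'right', 75),
    (105, 'right', 68);
""")

# Rank players by rating inside each preferred_foot group.
# RANK() gives tied rows the same rank and skips the following rank.
query = """
SELECT player_api_id,
       preferred_foot,
       overall_rating,
       RANK() OVER (
           PARTITION BY preferred_foot
           ORDER BY overall_rating DESC
       ) AS foot_rank
FROM Player_Attributes
ORDER BY preferred_foot, foot_rank, player_api_id
"""
ranked = db.execute(query).fetchall()
for row in ranked:
    print(row)
```

The same query text should also run on MySQL 8.0, and the result could be loaded into a DataFrame with `pd.read_sql(query, conn)` as in the cell above.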
