# Setup

## Install the ucimlrepo package to load the dataset

In [1]:
!pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.3-py3-none-any.whl (7.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.3


## Import Libraries

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

## Data Loading
- dataset link: [Concrete Compressive Strength (UCI Machine Learning Repository)](https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength)
- `ucimlrepo` provides `fetch_ucirepo(id)`, which downloads a dataset by its numeric id
- to get the dataset id:
  * go to the dataset link
  * click the "Import in Python" button and copy the generated code into the notebook
  * or read the number in the URL right after "dataset/" (165 for this dataset)

- the object returned by `fetch_ucirepo` stores the dataset in its `data` attribute
- `data` holds the features, the targets, and the original (full) table as pandas DataFrames
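The "read the URL" route can be sketched as a small helper. `dataset_id_from_url` is a hypothetical name (it is not part of `ucimlrepo`), and it assumes the URL follows the `.../dataset/<id>/<slug>` pattern used by the link above.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url: str) -> int:
    """Hypothetical helper: return the numeric id that follows "dataset/" in a UCI URL."""
    # Split the URL path into non-empty segments,
    # e.g. ["dataset", "165", "concrete+compressive+strength"]
    parts = [p for p in urlparse(url).path.split("/") if p]
    # The id is the segment right after "dataset" (raises ValueError if "dataset" is absent)
    idx = parts.index("dataset")
    return int(parts[idx + 1])


url = "https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength"
print(dataset_id_from_url(url))  # → 165
```

The returned integer is the value passed as `fetch_ucirepo(id=...)`.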

In [16]:
from ucimlrepo import fetch_ucirepo
concrete_compressive_strength = fetch_ucirepo(id=165)

concrete_data = concrete_compressive_strength.data.original
concrete_data

|  | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Concrete compressive strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1025 | 276.4 | 116.0 | 90.3 | 179.6 | 8.9 | 870.1 | 768.3 | 28 | 44.28 |
| 1026 | 322.2 | 0.0 | 115.6 | 196.0 | 10.4 | 817.9 | 813.4 | 28 | 31.18 |
| 1027 | 148.5 | 139.4 | 108.6 | 192.7 | 6.1 | 892.4 | 780.0 | 28 | 23.70 |
| 1028 | 159.1 | 186.7 | 0.0 | 175.6 | 11.3 | 989.6 | 788.9 | 28 | 32.77 |


### Split the features and label

In [13]:
X = concrete_compressive_strength.data.features
y = concrete_compressive_strength.data.targets
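To show how `features` and `targets` relate to `original`, here is an offline sketch that rebuilds the same kind of split from a small DataFrame holding the first rows of the preview above. It assumes the target is the `Concrete compressive strength` column (the last one in the table); `TARGET`, `X_demo`, and `y_demo` are illustrative names chosen so they do not overwrite the notebook's `X` and `y`.

```python
import pandas as pd

# First three rows of the dataset, copied from the preview above
original = pd.DataFrame(
    {
        "Cement": [540.0, 540.0, 332.5],
        "Blast Furnace Slag": [0.0, 0.0, 142.5],
        "Fly Ash": [0.0, 0.0, 0.0],
        "Water": [162.0, 162.0, 228.0],
        "Superplasticizer": [2.5, 2.5, 0.0],
        "Coarse Aggregate": [1040.0, 1055.0, 932.0],
        "Fine Aggregate": [676.0, 676.0, 594.0],
        "Age": [28, 28, 270],
        "Concrete compressive strength": [79.99, 61.89, 40.27],
    }
)

TARGET = "Concrete compressive strength"  # assumed target column name

# Features: every column except the target
X_demo = original.drop(columns=[TARGET])
# Label: the target kept as a one-column DataFrame (assumed to mirror data.targets)
y_demo = original[[TARGET]]

print(X_demo.shape, y_demo.shape)  # → (3, 8) (3, 1)
```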

# Data Understanding

Dataset source:
- [Concrete Compressive Strength (UCI Machine Learning Repository)](https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength)

## EDA - Variable Description


* Features: the components used in the concrete mixture
    1. Cement (component 1): amount of cement, in kilograms per cubic meter of mixture (kg/m³).
    2. Blast Furnace Slag (component 2): amount of blast furnace slag, in kilograms per cubic meter of mixture (kg/m³).
    3. Fly Ash (component 3): amount of fly ash, in kilograms per cubic meter of mixture (kg/m³).
    4. Water (component 4): amount of water, in kilograms per cubic meter of mixture (kg/m³).
    5. Superplasticizer (component 5): amount of superplasticizer, in kilograms per cubic meter of mixture (kg/m³).
    6. Coarse Aggregate (component 6): amount of coarse aggregate, in kilograms per cubic meter of mixture (kg/m³).
    7. Fine Aggregate (component 7): amount of fine aggregate, in kilograms per cubic meter of mixture (kg/m³).
    8. Age: age of the concrete, in days (1 to 365).

* Target:
  - Concrete Compressive Strength: compressive strength of the concrete, in megapascals (MPa).
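The units above can be kept next to the data (for example, to label plot axes), and the documented 1 to 365 day range for Age can be checked in code. `UNITS` and `check_age_range` are hypothetical helpers, not part of the dataset or of `ucimlrepo`; the sample ages come from the rows previewed earlier.

```python
import pandas as pd

# Units as listed in the variable description
UNITS = {
    "Cement": "kg/m³",
    "Blast Furnace Slag": "kg/m³",
    "Fly Ash": "kg/m³",
    "Water": "kg/m³",
    "Superplasticizer": "kg/m³",
    "Coarse Aggregate": "kg/m³",
    "Fine Aggregate": "kg/m³",
    "Age": "days",
    "Concrete compressive strength": "MPa",
}


def check_age_range(df: pd.DataFrame, low: int = 1, high: int = 365) -> bool:
    """Return True if every Age value lies in [low, high] (both ends inclusive)."""
    return bool(df["Age"].between(low, high).all())


age_sample = pd.DataFrame({"Age": [28, 28, 270, 365, 360]})
print(check_age_range(age_sample))  # → True
print(f"Age ({UNITS['Age']})")      # → Age (days)
```

On the loaded data, `check_age_range(concrete_data)` would apply the same test to all rows.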


### Check column types and non-null counts

In [14]:
concrete_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Cement                         1030 non-null   float64
 1   Blast Furnace Slag             1030 non-null   float64
 2   Fly Ash                        1030 non-null   float64
 3   Water                          1030 non-null   float64
 4   Superplasticizer               1030 non-null   float64
 5   Coarse Aggregate               1030 non-null   float64
 6   Fine Aggregate                 1030 non-null   float64
 7   Age                            1030 non-null   int64  
 8   Concrete compressive strength  1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB
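The non-null counts above (1030 of 1030 for every column) already show there are no missing entries. The same check can be made explicit with `isna().sum()`. The sketch runs it on a few previewed rows, then injects one `NaN` purely to show how a gap would be reported; that gap is artificial and does not exist in the real data.

```python
import numpy as np
import pandas as pd

# A few columns from the first rows of the preview
df = pd.DataFrame(
    {
        "Cement": [540.0, 540.0, 332.5],
        "Water": [162.0, 162.0, 228.0],
        "Age": [28, 28, 270],
    }
)
# Total number of missing cells across the whole frame
print(int(df.isna().sum().sum()))  # → 0

# Inject one artificial gap (not present in the real dataset) to see how it is reported
df_gap = df.copy()
df_gap.loc[1, "Water"] = np.nan
missing = df_gap.isna().sum()
# Show only the columns that contain at least one missing value
print(missing[missing > 0])
```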


### Check summary statistics and zero values
> note: a 0 means that component was not used in the mixture; it is a real value, not a missing one

In [15]:
concrete_data.describe()

|  | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Concrete compressive strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [18]:
# Count the rows where each optional component was not used (value 0)
blast = (concrete_data['Blast Furnace Slag'] == 0).sum()
fly = (concrete_data['Fly Ash'] == 0).sum()
sup = (concrete_data['Superplasticizer'] == 0).sum()

print("Number of zeros in the Blast Furnace Slag column:", blast)
print("Number of zeros in the Fly Ash column:", fly)
print("Number of zeros in the Superplasticizer column:", sup)

Number of zeros in the Blast Furnace Slag column: 471
Number of zeros in the Fly Ash column: 566
Number of zeros in the Superplasticizer column: 379
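The three separate counts can also be produced in one step by comparing the whole frame with 0, and taking the mean of that boolean frame gives the share of mixtures that omit each component. This sketch runs on the nine rows shown in the data preview, so its numbers describe only that sample, not the full 1030-row dataset; `zero_sample` is an illustrative name.

```python
import pandas as pd

# Optional-component columns for the nine rows shown in the data preview
zero_sample = pd.DataFrame(
    {
        "Blast Furnace Slag": [0.0, 0.0, 142.5, 142.5, 132.4, 116.0, 0.0, 139.4, 186.7],
        "Fly Ash": [0.0, 0.0, 0.0, 0.0, 0.0, 90.3, 115.6, 108.6, 0.0],
        "Superplasticizer": [2.5, 2.5, 0.0, 0.0, 0.0, 8.9, 10.4, 6.1, 11.3],
    }
)

is_zero = zero_sample == 0             # boolean frame: True where the component was not used
zero_counts = is_zero.sum()            # number of zeros per column
zero_share = is_zero.mean().round(2)   # fraction of rows with a zero, per column

print(pd.DataFrame({"zeros": zero_counts, "share": zero_share}))
```

On the loaded data, `(concrete_data == 0).sum()` performs the same comparison for every column at once, so its entries for these three columns equal the counts printed above.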
