# Concrete

In this project, I will look at the <i>Concrete Compressive Strength Data Set</i> from the UCI repository. 

This data set was uploaded by Prof. I-Cheng Yeh and was used mainly in his 1998 paper: I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).

The data set is about predicting concrete compressive strength (in MPa) from 8 input features. Seven of them are components of the concrete mixture, each measured in kg per m^3 of mixture: cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate. The last input feature is the age of the concrete in days.

In this project, I'll be analyzing the data set and developing models to predict concrete compressive strength.

In this Jupyter notebook, I'll load the data set and get some preliminary insights into it, such as how many rows and columns it has. In the next couple of Jupyter notebooks, I'm going to focus on exploratory data analysis, data preprocessing, and modeling.

<h2> 1. Data Set </h2>

First, I will import the libraries I need. Then, I will load the data set from its URL in the UCI archive. If this doesn't work, consider downloading the file and loading it locally from your computer.

In [1]:
#Importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

In [2]:
#Load the data set
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls"
df = pd.read_excel(path, index_col = False)
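The fallback to a local copy mentioned above could be sketched as follows. This is only a minimal illustration, not part of the original notebook: the helper name `load_with_fallback` and the local file name `Concrete_Data.xls` are my own assumptions, and reading `.xls` files with pandas also needs the `xlrd` package installed.

```python
import pandas as pd


def load_with_fallback(url, local_path, reader=pd.read_excel):
    """Try the remote file first; fall back to a local copy if that fails.

    `reader` is any callable that takes a path or URL and returns a DataFrame.
    """
    try:
        return reader(url)
    except Exception as err:  # e.g. network failure, HTTP error, moved file
        print(f"Could not load {url!r} ({err}); trying {local_path!r} instead.")
        return reader(local_path)


# Usage (hypothetical local file name, assumed to sit next to the notebook):
# df = load_with_fallback(path, "Concrete_Data.xls")
```

Catching a broad `Exception` keeps the sketch short; in practice you may want to catch only the network-related errors you expect.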

In [3]:
#See first 5 rows
df.head()

,Cement (component 1)(kg in a m^3 mixture),Blast Furnace Slag (component 2)(kg in a m^3 mixture),Fly Ash (component 3)(kg in a m^3 mixture),Water (component 4)(kg in a m^3 mixture),Superplasticizer (component 5)(kg in a m^3 mixture),Coarse Aggregate (component 6)(kg in a m^3 mixture),Fine Aggregate (component 7)(kg in a m^3 mixture),Age (day),"Concrete compressive strength(MPa, megapascals)"
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075


In [4]:
print(df.shape)
df.dtypes

(1030, 9)


Cement (component 1)(kg in a m^3 mixture)                float64
Blast Furnace Slag (component 2)(kg in a m^3 mixture)    float64
Fly Ash (component 3)(kg in a m^3 mixture)               float64
Water  (component 4)(kg in a m^3 mixture)                float64
Superplasticizer (component 5)(kg in a m^3 mixture)      float64
Coarse Aggregate  (component 6)(kg in a m^3 mixture)     float64
Fine Aggregate (component 7)(kg in a m^3 mixture)        float64
Age (day)                                                  int64
Concrete compressive strength(MPa, megapascals)          float64
dtype: object

As seen above, the data set has 1030 rows and 9 columns. The first 8 columns are the input features that I will feed into my models. The last column is the concrete compressive strength, which is the target variable I will be trying to predict. All columns hold floating-point numbers except the age column, which holds integers.

The column names are rather long, so I will give them simpler names.

In [5]:
features = ["cement","slag", "ash", "water", "superplasticizer", "coarse_agg", "fine_agg", "age", "strength"]
df.columns = features
df.head()

,cement,slag,ash,water,superplasticizer,coarse_agg,fine_agg,age,strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075
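Assigning to `df.columns` renames by position, so it silently depends on the column order in the file. A hedged alternative is to rename through an explicit old-name to new-name mapping and check that every old name really exists. The sketch below uses a tiny stand-in frame with only two of the original headers; the names `rename_map` and `demo` are illustrative. The exact-match check matters here because the `dtypes` output above suggests that some of the real headers contain irregular spacing.

```python
import pandas as pd

# Explicit old-name -> new-name mapping (only two columns shown for brevity).
rename_map = {
    "Cement (component 1)(kg in a m^3 mixture)": "cement",
    "Age (day)": "age",
}

# Tiny stand-in for the real DataFrame, using two of the original headers.
demo = pd.DataFrame({
    "Cement (component 1)(kg in a m^3 mixture)": [540.0, 332.5],
    "Age (day)": [28, 270],
})

# rename() skips keys it cannot find without raising, so check them first.
missing = set(rename_map) - set(demo.columns)
if missing:
    raise KeyError(f"Columns not found: {sorted(missing)}")

renamed = demo.rename(columns=rename_map)
print(renamed.columns.tolist())
```

The positional assignment used in this notebook is shorter and works fine as long as the file layout does not change.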


In [6]:
#Let's save this data set as a csv file
df.to_csv('concrete.csv', index=False)
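Since the next notebooks will start from the saved file, it can be worth confirming that the CSV reads back with the same shape, column names, and dtypes. The following is a minimal sketch of such a round-trip check on a small stand-in frame written to a temporary directory; the names `demo` and `concrete_demo.csv` are illustrative only and are not files used elsewhere in this project.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frame with the same kinds of columns as the real data.
demo = pd.DataFrame({
    "cement": [540.0, 332.5],
    "age": [28, 270],
    "strength": [79.986111, 40.269535],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "concrete_demo.csv")
    # index=False keeps the row index out of the file, as in the cell above.
    demo.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

same_shape = reloaded.shape == demo.shape
same_columns = list(reloaded.columns) == list(demo.columns)
same_dtypes = bool((reloaded.dtypes == demo.dtypes).all())
print(same_shape, same_columns, same_dtypes)
```

For the real data, the same three comparisons could be run between `df` and `pd.read_csv('concrete.csv')`.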

In the next Jupyter notebook, I'm going to do Exploratory Data Analysis.