## 2. Data Exploration, Analysis and Visualization

Before we explore our data, let us identify some variables in our data frame.

![](https://i.imgur.com/V1dA182.png)

### a. Exploring the dataset

Let us explore the most crucial parameters of our reservoir sandstone in the data frame.

**Explore the thickness of the sandstone**

In [43]:
reservoir_sdst_df['THK'].describe()

count    40130.000000
mean        22.869531
std         21.513727
min          1.000000
25%         10.232500
50%         17.230000
75%         28.240000
max        721.020000
Name: THK, dtype: float64

**Explore the Reservoir volume (m3)**

In [44]:
reservoir_sdst_df['TVOL'].describe()

count    7203.000000
mean      456.300014
std       284.576320
min         1.000000
25%       209.000000
50%       436.000000
75%       700.000000
max       999.000000
Name: TVOL, dtype: float64

**Explore the Total area (ac)**

In [45]:
reservoir_sdst_df['TAREA'].describe()

count    33285.000000
mean       282.563347
std        246.949951
min          1.000000
25%         87.000000
50%        200.000000
75%        420.000000
max        999.000000
Name: TAREA, dtype: float64
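
The `count` rows above differ between columns (40130 for `THK`, 7203 for `TVOL`, 33285 for `TAREA`) because `describe()` counts only non-null entries. The following is a minimal sketch of how the missing values could be tallied; `demo_df` and its values are hypothetical stand-ins for `reservoir_sdst_df`, not data from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reservoir_sdst_df with a few missing values.
demo_df = pd.DataFrame({
    "THK": [10.0, 17.0, 28.0, 35.0],
    "TVOL": [209.0, np.nan, np.nan, 700.0],
    "TAREA": [87.0, 200.0, np.nan, 420.0],
})

# describe() reports only the non-null entries in its "count" row ...
non_null_counts = demo_df.describe().loc["count"]

# ... so the gap to the number of rows is the number of missing values.
missing_counts = demo_df.isna().sum()

print(non_null_counts)
print(missing_counts)
```

Running `reservoir_sdst_df[['THK', 'TVOL', 'TAREA']].isna().sum()` on the real data frame would give the corresponding tally.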

Let us explore the most crucial parameters of the sandstone pore space in the data frame.

**Explore the Average porosity (%)**

![](https://i.imgur.com/Q252ZT1.png)

We can also summarize several columns at once; `describe()` then returns the statistics as a data frame.

In [46]:
reservoir_sdst_df[['POROSITY', 'PERMEABILITY', 'SW', 'THK', 'SS']].describe()

| | POROSITY | PERMEABILITY | SW | THK | SS |
|---|---|---|---|---|---|
| count | 40130.0 | 36450.0 | 40130.0 | 40130.0 | 43.0 |
| mean | 0.287274 | 357.467462 | 0.275979 | 22.869531 | 851.72093 |
| std | 0.0389 | 380.481876 | 0.094414 | 21.513727 | 173.114513 |
| min | 0.1 | 0.0 | 0.1 | 1.0 | 0.0 |
| 25% | 0.27 | 109.0 | 0.2 | 10.2325 | 795.0 |
| 50% | 0.29 | 233.0 | 0.27 | 17.23 | 900.0 |
| 75% | 0.31 | 485.0 | 0.33 | 28.24 | 965.0 |
| max | 0.38 | 3954.0 | 0.75 | 721.02 | 999.0 |


In [47]:
reservoir_sdst_df['POROSITY'].describe()

count    40130.000000
mean         0.287274
std          0.038900
min          0.100000
25%          0.270000
50%          0.290000
75%          0.310000
max          0.380000
Name: POROSITY, dtype: float64

**Explore the Arithmetic average permeability (md)**

In [48]:
reservoir_sdst_df['PERMEABILITY'].describe()

count    36450.000000
mean       357.467462
std        380.481876
min          0.000000
25%        109.000000
50%        233.000000
75%        485.000000
max       3954.000000
Name: PERMEABILITY, dtype: float64

**Explore the Water saturation**

In [49]:
reservoir_sdst_df['SW'].describe()

count    40130.000000
mean         0.275979
std          0.094414
min          0.100000
25%          0.200000
50%          0.270000
75%          0.330000
max          0.750000
Name: SW, dtype: float64

Let us take a sample of the data and explore our reservoir more.

In [50]:
reservoir_sdst_df.sample(5)

| | SN_FORMSAND | SAND_NAME | ASSESSED | SDDATE | SDYEAR | SDDATEH | SDYEARH | WELLAPI | BOEM_FIELD | FCLASS | ... | TCNT | BHCOMP | LAT | LONG | CHRONOZONE_DESCRIPTION | PLAY | CHRONOZONE_y | PLAY_TYPE_y | PLAY_NUMBER | PLAY_NAME_DESCRIPTION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2032 | 105023 | 1361_GI016_B1 | Y | 01/08/56 | 1956 | 10/30/83 | 1983 | 177170080800 | GI016 | PDP | ... | 6 | 18.0 | 29.0963 | -89.99725 | Upper Upper Miocene | MUU_P1 | MUU | P1 | 1361 | Upper Upper Miocene Progradational Play |
| 4499 | 223616 | 1341_SM274_GD | Y | 09/20/84 | 1984 | 09/20/84 | 1984 | 177074047200 | SM274 | PDN | ... | 1 | 1.0 | 29.118515 | -92.141484 | Upper Upper Miocene | MUU_A1 | MUU | A1 | 1341 | Upper Upper Miocene Aggradational Play |
| 2684 | 134056 | 0561_HI414A_PLA10 | Y | 02/02/79 | 1979 | 02/02/79 | 1979 | 427094039500 | HI414A | PDN | ... | 1 | 2.0 | 28.46863 | -94.02731 | Lower Pleistocene | PLL_P1 | PLL | P1 | 561 | Lower Pleistocene Progradational Play |
| 4238 | 209543 | 0361_SM107_DJ | Y | 07/30/64 | 1964 | 07/30/64 | 1964 | 177080007600 | SM107 | PDP | ... | 1 | 6.0 | 28.41989 | -91.95632 | Middle Pleistocene | PLM_P1 | PLM | P1 | 361 | Middle Pleistocene Progradational Play |
| 9556 | 414669 | 0582_GB179_9500 | Y | 09/19/97 | 1997 | 09/19/97 | 1997 | 608074063700 | GB179 | PDN | ... | 1 | 1.0 | 27.7711 | -93.76464 | Lower Pleistocene | PLL_F2 | PLL | F2 | 582 | Lower Pleistocene Fan 2 Play |
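
Because `sample()` draws rows at random, rerunning the cell returns different rows each time. If a repeatable draw is wanted, pandas accepts a `random_state` seed. The sketch below illustrates this on a hypothetical `demo_df`; the seed value 42 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for reservoir_sdst_df.
demo_df = pd.DataFrame({"SAND_NAME": list("ABCDEFGH"), "THK": range(8)})

# The same seed yields the same rows on every run.
first_draw = demo_df.sample(5, random_state=42)
second_draw = demo_df.sample(5, random_state=42)

print(first_draw.equals(second_draw))
```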


### b. Analyzing data from data frames

**Porosity** is the fraction of a rock's bulk volume that is occupied by pore (void) space: the volume of voids divided by the total bulk volume. It can be expressed as a fraction between 0 and 1 or as a percentage between 0% and 100%. For more information, see https://en.wikipedia.org/wiki/Porosity.
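
As a small worked example of this definition, the sketch below computes porosity from a void volume and a bulk volume. The function name `porosity_fraction` and the input volumes are illustrative assumptions; the `POROSITY` column in the data frame already stores the result as a fraction (0.10 to 0.38), which is why it has to be multiplied by 100 to report a percentage.

```python
def porosity_fraction(void_volume: float, bulk_volume: float) -> float:
    """Return porosity as void volume divided by bulk volume (0 to 1)."""
    if bulk_volume <= 0:
        raise ValueError("bulk_volume must be positive")
    if not 0 <= void_volume <= bulk_volume:
        raise ValueError("void_volume must lie between 0 and bulk_volume")
    return void_volume / bulk_volume


# Illustrative volumes: 0.29 units of pore space in 1.0 unit of rock.
phi = porosity_fraction(void_volume=0.29, bulk_volume=1.0)
print(f"{phi:.2f} as a fraction, {phi * 100:.0f}% as a percentage")
```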

**Permeability** is a measure of how easily fluids can flow through the pore space of a rock. For more information, see https://en.wikipedia.org/wiki/Permeability_(Earth_sciences).

Let's try to answer some questions about our datasets.

**Q: What are the total sandstone thickness and the total area of the hydrocarbon reservoirs in the Gulf of Mexico?**

In [51]:
total_thickness = (reservoir_sdst_df.THK.sum() / 1000)
total_area = (reservoir_sdst_df.TAREA.sum() / 1000)

In [52]:
print('The thickness of reported sandstones is {}km and the total reservoir area is {} thousand acres.'.format(int(total_thickness), int(total_area)))

The thickness of reported sandstones is 917km and the total reservoir area is 9405 thousand acres.
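
`TAREA` is recorded in acres, so dividing the sum by 1000 gives thousands of acres, not a metric quantity. A minimal sketch of a conversion to square kilometres follows; the helper `acres_to_km2` is hypothetical, and the total is the rounded figure from the printed output, not the exact sum.

```python
# One international acre is exactly 4046.8564224 square metres.
ACRE_IN_KM2 = 0.0040468564224


def acres_to_km2(acres: float) -> float:
    """Convert an area in acres to square kilometres."""
    return acres * ACRE_IN_KM2


# Rounded total from the printed output: 9405 thousand acres.
approx_total_acres = 9405 * 1000
print(f"{acres_to_km2(approx_total_acres):,.0f} km^2")
```

With the rounded input this comes to roughly 38,000 km². On the real data, `acres_to_km2(reservoir_sdst_df.TAREA.sum())` would use the exact total.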


**Q: What are the average porosity and the average permeability of the hydrocarbon reservoir sandstones in the Gulf of Mexico?**

In [53]:
aver_porosity = (reservoir_sdst_df.POROSITY.mean() * 100)
aver_permeability = (reservoir_sdst_df.PERMEABILITY.mean())

In [54]:
print('The average porosity of reservoir sandstones is {}% and the average permeability is {}md. However, a mean permeability of about 350md is well above the median of 233md, which suggests high-value outliers, so let us investigate this further.'.format(int(aver_porosity), int(aver_permeability)))

The average porosity of reservoir sandstones is 28% and the average permeability is 357md. However, a mean permeability of about 350md is well above the median of 233md, which suggests high-value outliers, so let us investigate this further.
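
One common way to start such an investigation is Tukey's interquartile-range (IQR) rule, which flags values above Q3 + 1.5 × IQR as potential outliers. This is an assumed choice of method, not necessarily the one used later in this notebook. The sketch below applies it to the quartiles reported by `describe()` for `PERMEABILITY`.

```python
# Summary statistics copied from the PERMEABILITY describe() output (md).
q1, median, q3, maximum = 109.0, 233.0, 485.0, 3954.0

# Interquartile range and Tukey's upper fence.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr} md, upper fence = {upper_fence} md")
print(f"Maximum exceeds the fence: {maximum > upper_fence}")

# On the real data frame, the flagged rows could be selected with:
# reservoir_sdst_df[reservoir_sdst_df['PERMEABILITY'] > upper_fence]
```

Under this rule, any sand with a permeability above 1049 md would be flagged, and the reported maximum of 3954 md lies far beyond that fence.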
