## Topic 1: Pandas DataFrames for Working with Tabular Data

### Tabular Data
This tutorial demonstrates tabular data structures in Python. The common tool for working with tabular data in Python is the `DataFrame` from the pandas package.

#### Loading The Libraries

In [3]:
import os                                       # operating system 
import numpy as np                              # arrays and matrix math
import pandas as pd                             # DataFrames
import matplotlib.pyplot as plt                 # plotting

#### Loading The Data

In [9]:
df = pd.read_csv("Dataset.csv")
print(df.iloc[0:5,:])                           # first 5 rows, all columns
df.head()                                       # same preview, rendered as a table

      X     Y  facies_threshold_0.3  porosity  permeability  \
0   565  1485                     1    0.1184         6.170   
1  2585  1185                     1    0.1566         6.275   
2  2065  2865                     2    0.1920        92.297   
3  3575  2655                     1    0.1621         9.048   
4  1835    35                     1    0.1766         7.123   

   acoustic_impedance  
0               2.009  
1               2.864  
2               3.524  
3               2.157  
4               3.979  


|   | X | Y | facies_threshold_0.3 | porosity | permeability | acoustic_impedance |
|---|---|---|---|---|---|---|
| 0 | 565 | 1485 | 1 | 0.1184 | 6.17 | 2.009 |
| 1 | 2585 | 1185 | 1 | 0.1566 | 6.275 | 2.864 |
| 2 | 2065 | 2865 | 2 | 0.192 | 92.297 | 3.524 |
| 3 | 3575 | 2655 | 1 | 0.1621 | 9.048 | 2.157 |
| 4 | 1835 | 35 | 1 | 0.1766 | 7.123 | 3.979 |
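`df.iloc[0:n, :]` and `df.head(n)` select the same leading rows. The following is a minimal sketch of that equivalence. It assumes an in-memory CSV built from the five rows shown above in place of `Dataset.csv`; the names `csv_text` and `demo` are hypothetical.

```python
import io

import pandas as pd

# Stand-in for Dataset.csv: the five rows printed above, held in a string.
csv_text = """X,Y,facies_threshold_0.3,porosity,permeability,acoustic_impedance
565,1485,1,0.1184,6.170,2.009
2585,1185,1,0.1566,6.275,2.864
2065,2865,2,0.1920,92.297,3.524
3575,2655,1,0.1621,9.048,2.157
1835,35,1,0.1766,7.123,3.979
"""

# read_csv accepts a file path or any file-like object.
demo = pd.read_csv(io.StringIO(csv_text))

# Positional slice of the first three rows, all columns.
first_three = demo.iloc[0:3, :]

# head(3) returns the same three rows.
same_rows = first_three.equals(demo.head(3))
print(same_rows)
```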


#### Checking The Tabular Data
It is useful to review the summary statistics of the loaded DataFrame, which the `describe` method provides. The `percentiles` argument controls which percentiles are reported. Appending `.transpose()` switches the axes, which can make the summary easier to read.

In [11]:
df.describe()

# Changing other parameters
df.describe(percentiles=[0.1, 0.9])  # report the 10th and 90th percentiles (the median is always included)



|   | X | Y | facies_threshold_0.3 | porosity | permeability | acoustic_impedance |
|---|---|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 2053.4 | 1876.15 | 1.33 | 0.1493 | 25.287462 | 3.000435 |
| std | 1113.524641 | 1137.58016 | 0.471393 | 0.032948 | 64.470135 | 0.592201 |
| min | 25.0 | 35.0 | 1.0 | 0.05 | 0.01582 | 2.009 |
| 10% | 414.0 | 364.0 | 1.0 | 0.1061 | 0.26229 | 2.1915 |
| 50% | 2160.0 | 1855.0 | 1.0 | 0.15015 | 4.8255 | 2.9645 |
| 90% | 3510.0 | 3475.0 | 2.0 | 0.19014 | 56.5344 | 3.8336 |
| max | 3955.0 | 3995.0 | 2.0 | 0.2232 | 463.641 | 3.984 |
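The transposed layout mentioned above can be sketched on a small stand-in table. The values below are made up for illustration and are not taken from `Dataset.csv`; the names `demo`, `summary`, and `summary_t` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data (not from Dataset.csv), used only to show the calls.
demo = pd.DataFrame({
    "porosity": [0.10, 0.12, 0.14, 0.16, 0.18],
    "perm": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Default orientation: one row per statistic, one column per feature.
summary = demo.describe(percentiles=[0.1, 0.9])

# Transposed orientation: one row per feature, one column per statistic.
summary_t = summary.transpose()

print(summary_t)
```

With many features, the transposed form keeps each feature on a single line, which tends to be easier to scan.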


#### Renaming
We rename the facies, permeability, and acoustic impedance columns to shorter names for convenience.

In [12]:
# Rename features (columns) of the DataFrame
df = df.rename(columns={'facies_threshold_0.3': 'facies', 'permeability': 'perm', 'acoustic_impedance': 'ai'})
df.head()


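|   | X | Y | facies | porosity | perm | ai |
|---|---|---|---|---|---|---|
| 0 | 565 | 1485 | 1 | 0.1184 | 6.17 | 2.009 |
| 1 | 2585 | 1185 | 1 | 0.1566 | 6.275 | 2.864 |
| 2 | 2065 | 2865 | 2 | 0.192 | 92.297 | 3.524 |
| 3 | 3575 | 2655 | 1 | 0.1621 | 9.048 | 2.157 |
| 4 | 1835 | 35 | 1 | 0.1766 | 7.123 | 3.979 |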
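`rename` returns a new DataFrame by default, which is why the result is assigned back to `df` above. A minimal sketch of that behavior on a two-row stand-in table (the names `demo` and `renamed` are hypothetical):

```python
import pandas as pd

# Two rows from the preview above, with the original column names.
demo = pd.DataFrame({
    "facies_threshold_0.3": [1, 2],
    "permeability": [6.17, 92.297],
})

# rename builds a new DataFrame; demo itself keeps its original names.
renamed = demo.rename(columns={"facies_threshold_0.3": "facies",
                               "permeability": "perm"})

print(list(demo.columns))
print(list(renamed.columns))
```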


#### Slicing DataFrames

It is straightforward to extract subsets from a `DataFrame` to make a new `DataFrame`:
- We use `my_DataFrame.iloc[rows, columns]` with integer positions for the rows and columns. Note the square brackets: `iloc` is an indexer, not a function call.
- This is useful for cleaning up data by removing features that are no longer of interest.
- Below we make a new DataFrame, `df_subset`, with the rows 0 to 4 and columns 2 to 6 and another new DataFrame, `df_subset`.
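The positional slicing described above might look like the following sketch. It rebuilds the five renamed rows shown earlier as a stand-in for the full table and reads "columns 2 to 6" as the end-exclusive slice `2:6`, which is an assumption. The second selection, `picked`, is a hypothetical illustration of list-based indexing and not the author's second DataFrame.

```python
import pandas as pd

# First five rows as shown earlier in this section (after renaming).
demo = pd.DataFrame({
    "X": [565, 2585, 2065, 3575, 1835],
    "Y": [1485, 1185, 2865, 2655, 35],
    "facies": [1, 1, 2, 1, 1],
    "porosity": [0.1184, 0.1566, 0.1920, 0.1621, 0.1766],
    "perm": [6.170, 6.275, 92.297, 9.048, 7.123],
    "ai": [2.009, 2.864, 3.524, 2.157, 3.979],
})

# Slices are end-exclusive: rows at positions 0..4, columns at positions 2..5.
df_subset = demo.iloc[0:5, 2:6]

# Lists of positions pick arbitrary rows and columns (hypothetical selection).
picked = demo.iloc[[0, 2], [0, 1]]

print(df_subset)
print(picked)
```

Because `iloc` slices exclude the end position, `0:5` covers rows 0 through 4.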