# Select columns and filter rows

## Select columns with **.filter** method

In [2]:
import pandas as pd

In [3]:
local_relative_path = "./source/data_processed/wellbore_exploration_all_clean_names.csv"
wellbore_exploration_all = pd.read_csv(local_relative_path)
wellbore_exploration_all.head()

Unnamed: 0,wellbore_name,well,drilling_operator,production_licence,purpose,status,content,well_type,sub_sea,entry_date,...,npdid_wellbore,dsc_npdid_discovery,npdid_field,npdid_facility_drilling,npdid_wellbore_reclass,l_npdid_production_licence,npdid_site_survey,date_updated,date_updated_max,datesync_npd
0,1/2-1,1/2-1,Phillips Petroleum Norsk AS,143,WILDCAT,P&A,OIL,EXPLORATION,NO,20.03.1989,...,1382,43814.0,3437650.0,296245.0,0,21956.0,,03.10.2019,03.10.2019,22.11.2019
1,1/2-2,1/2-2,Paladin Resources Norge AS,143 CS,WILDCAT,P&A,OIL SHOWS,EXPLORATION,NO,14.12.2005,...,5192,,,278245.0,0,2424919.0,,03.10.2019,03.10.2019,22.11.2019
2,1/3-1,1/3-1,A/S Norske Shell,011,WILDCAT,P&A,GAS,EXPLORATION,NO,06.07.1968,...,154,43820.0,,288604.0,0,20844.0,,03.10.2019,03.10.2019,22.11.2019
3,1/3-2,1/3-2,A/S Norske Shell,011,WILDCAT,P&A,DRY,EXPLORATION,NO,14.05.1969,...,165,,,288847.0,0,20844.0,,03.10.2019,03.10.2019,22.11.2019
4,1/3-3,1/3-3,Elf Petroleum Norge AS,065,WILDCAT,P&A,OIL,EXPLORATION,NO,22.08.1982,...,87,43826.0,1028599.0,288334.0,0,21316.0,,03.10.2019,03.10.2019,22.11.2019


You select columns in a dataframe with the `.filter` method. The name is a bit counterintuitive (at least in my head): `.filter` selects columns by column name, while actually "**filtering**" rows by a condition is done with the `.query` method.
It is what it is :)


In [4]:
(wellbore_exploration_all
  .filter(items=["wellbore_name", "well_type"])
).head()

Unnamed: 0,wellbore_name,well_type
0,1/2-1,EXPLORATION
1,1/2-2,EXPLORATION
2,1/3-1,EXPLORATION
3,1/3-2,EXPLORATION
4,1/3-3,EXPLORATION
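
To make the column selection above a bit more concrete, here is a minimal sketch on a tiny made-up dataframe (the rows and depth values are invented for illustration and are not taken from the wellbore file). It compares `.filter(items=...)` with plain bracket indexing and also shows the `like` argument, which keeps columns whose name contains a substring.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "wellbore_name": ["1/2-1", "1/2-2", "1/3-1"],
    "well_type": ["EXPLORATION", "EXPLORATION", "EXPLORATION"],
    "total_depth": [3500.0, 3400.0, 4800.0],
})

# .filter(items=...) keeps the listed columns, in the order given.
by_items = df.filter(items=["wellbore_name", "well_type"])

# Bracket indexing with a list of names selects the same columns.
by_brackets = df[["wellbore_name", "well_type"]]

# .filter(like=...) keeps every column whose name contains the substring.
by_like = df.filter(like="well")

print(by_items.equals(by_brackets))
print(list(by_like.columns))
```

One difference worth knowing: `.filter(items=...)` silently skips names that do not exist in the dataframe, while bracket indexing raises a `KeyError`.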


## Filter Dataframe

### Filter with **.query** method



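Keep only the rows whose value appears in a list written directly inside the query expression.

**REMEMBER SINGLE QUOTES WITHIN DOUBLE QUOTES** (or the other way around): the string values inside the expression need a different quote type from the one that wraps the whole expression.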


**NOTE: for columns with spaces in their name, you can use backtick quoting.**

In [5]:
(wellbore_exploration_all
  .filter(items=["wellbore_name", "well_type"])
  .query("wellbore_name in ['1/2-1', '7324/10-1']")
)


Unnamed: 0,wellbore_name,well_type
0,1/2-1,EXPLORATION
1917,7324/10-1,EXPLORATION
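
The note about backtick quoting is easier to see with an example. The sketch below uses a small made-up dataframe whose column is called `wellbore name` (with a space); that column name is an assumption for illustration, since the cleaned file used here has no spaces in its column names.

```python
import pandas as pd

# Toy data with a space in the column name (hypothetical, for illustration).
df = pd.DataFrame({
    "wellbore name": ["1/2-1", "1/2-2", "7324/10-1"],
    "well_type": ["EXPLORATION", "EXPLORATION", "EXPLORATION"],
})

# Backticks let .query refer to a column whose name is not a valid identifier.
selected = df.query("`wellbore name` in ['1/2-1', '7324/10-1']")

print(selected)
```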


### Query by multiple conditions

Here we combine three conditions to filter the dataframe.
Note that I used `\` within the `.query` expression to continue the string across lines, so that every condition sits on its own line. This is only to make it a bit more readable; the expression is still one single string.

In [5]:
(wellbore_exploration_all
   .filter(items=["drilling_operator", "purpose", "total_depth"]) 
   .query('drilling_operator in ["A/S Norske Shell", "Statoil Petroleum AS"] & \
           purpose == "WILDCAT" & \
           1000 < total_depth < 2000'))

Unnamed: 0,drilling_operator,purpose,total_depth
298,A/S Norske Shell,WILDCAT,1971.0
788,Statoil Petroleum AS,WILDCAT,1890.0
1289,Statoil Petroleum AS,WILDCAT,1640.0
1537,A/S Norske Shell,WILDCAT,1920.0
1831,Statoil Petroleum AS,WILDCAT,1033.0
1850,Statoil Petroleum AS,WILDCAT,1594.0
1853,Statoil Petroleum AS,WILDCAT,1780.0
1860,Statoil Petroleum AS,WILDCAT,1855.0
1898,Statoil Petroleum AS,WILDCAT,1500.0
1901,Statoil Petroleum AS,WILDCAT,1540.0
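
As an alternative to the backslashes, the expression can be built from adjacent string literals inside parentheses, which Python joins into one single-line string. The sketch below runs on a small made-up dataframe (invented rows, not the real data) and checks the result against the equivalent boolean mask. Inside `.query`, `and` and `&` can be used interchangeably.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "drilling_operator": [
        "A/S Norske Shell",
        "Statoil Petroleum AS",
        "Elf Petroleum Norge AS",
        "Statoil Petroleum AS",
    ],
    "purpose": ["WILDCAT", "APPRAISAL", "WILDCAT", "WILDCAT"],
    "total_depth": [1971.0, 1500.0, 1800.0, 2500.0],
})

# Adjacent string literals are concatenated by Python into one string,
# so each condition can sit on its own line without a backslash.
expression = (
    'drilling_operator in ["A/S Norske Shell", "Statoil Petroleum AS"] '
    'and purpose == "WILDCAT" '
    'and 1000 < total_depth < 2000'
)
result = df.query(expression)

# The same filter written as a boolean mask, for comparison.
mask = (
    df["drilling_operator"].isin(["A/S Norske Shell", "Statoil Petroleum AS"])
    & (df["purpose"] == "WILDCAT")
    & (df["total_depth"] > 1000)
    & (df["total_depth"] < 2000)
)

print(result)
print(result.equals(df[mask]))
```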


### Query by a predefined list

Now we have a list of the top operator companies based on the number of wells drilled on the Norwegian Continental Shelf. Since the data contains historical records, you will see some old names for companies that exist today under a different name.

In [6]:
operators_top_10 = ['Norsk Hydro Produksjon AS',
 'Statoil Petroleum AS',
 'Saga Petroleum ASA',
 'Lundin Norway AS',
 'Esso Exploration and Production Norway A/S',
 'A/S Norske Shell',
 'Elf Petroleum Norge AS',
 'Phillips Petroleum Company Norway',
 'Statoil ASA (old)']

We can refer to variables in the environment by prefixing them with an `@` character, for example `@operators_top_10`:

`.query("drilling_operator == @operators_top_10")`

In [7]:
(wellbore_exploration_all
    .filter(items=['drilling_operator', "drilling_days"])
    .query("drilling_operator == @operators_top_10")
)

Unnamed: 0,drilling_operator,drilling_days
2,A/S Norske Shell,129
3,A/S Norske Shell,75
4,Elf Petroleum Norge AS,216
5,Elf Petroleum Norge AS,83
6,A/S Norske Shell,134
...,...,...
1907,Statoil Petroleum AS,19
1916,Statoil Petroleum AS,11
1918,Statoil Petroleum AS,29
1919,Statoil Petroleum AS,16
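
To show what the `@` reference does, here is a small sketch on made-up data with a shortened, hypothetical operator list (`operators`). When a column is compared to a list, `==` in `.query` behaves like `in`, and both give the same rows as `.isin` on the column.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "drilling_operator": [
        "A/S Norske Shell",
        "Wintershall Norge ASA",
        "Statoil Petroleum AS",
    ],
    "drilling_days": [129, 112, 19],
})

# Hypothetical short list standing in for operators_top_10.
operators = ["A/S Norske Shell", "Statoil Petroleum AS"]

# @operators looks up the Python variable from the surrounding scope.
with_eq = df.query("drilling_operator == @operators")
with_in = df.query("drilling_operator in @operators")

# Equivalent selection without .query.
with_isin = df[df["drilling_operator"].isin(operators)]

print(with_eq)
```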


## Sort Dataframe

Sort based on values in a single column:

In [8]:
(wellbore_exploration_all
    .filter(items=['drilling_operator', "drilling_days", "total_depth"])
    .sort_values(by="total_depth", ascending=False)
)

Unnamed: 0,drilling_operator,drilling_days,total_depth
1257,Statoil ASA (old),49,7928.0
1276,Statoil Petroleum AS,32,7811.0
1260,Statoil ASA (old),27,7725.0
1258,Statoil ASA (old),67,7594.0
293,Den norske stats oljeselskap a.s,203,7584.0
...,...,...,...
1332,Equinor Energy AS,0,0.0
702,ConocoPhillips Skandinavia AS,0,0.0
295,Talisman Energy Norge AS,0,0.0
1757,Norsk Hydro Produksjon AS,0,0.0


Sort based on the values in multiple columns:

In [9]:
(wellbore_exploration_all
    .filter(items=['drilling_operator', "drilling_days", "total_depth"])
    .sort_values(by=["drilling_operator", "drilling_days"], ascending=False)
)

Unnamed: 0,drilling_operator,drilling_days,total_depth
1478,Wintershall Norge ASA,112,4177.0
1447,Wintershall Norge ASA,87,4216.0
1368,Wintershall Norge ASA,66,3585.0
1366,Wintershall Norge ASA,55,2807.0
1317,Wintershall Norge ASA,47,3006.0
...,...,...,...
1538,A/S Norske Shell,13,1800.0
1534,A/S Norske Shell,12,1804.0
1546,A/S Norske Shell,10,1805.0
961,A/S Norske Shell,9,5035.0
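
In the multi-column sort above, `ascending=False` applies to both columns. If you want a different direction per column, `sort_values` also accepts a list of booleans, one per entry in `by`. A minimal sketch on made-up rows (not the real data):

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "drilling_operator": [
        "Wintershall Norge ASA",
        "A/S Norske Shell",
        "A/S Norske Shell",
        "Wintershall Norge ASA",
    ],
    "drilling_days": [112, 9, 129, 47],
})

# Operators A to Z, and within each operator the longest drilling time first.
sorted_df = df.sort_values(
    by=["drilling_operator", "drilling_days"],
    ascending=[True, False],
)

print(sorted_df)
```

The original index labels are kept after sorting, which is why the outputs above show them out of order.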
