# Annual Hydrocarbon Production

## Question:

**What is the gross annual hydrocarbon production on the Norwegian Continental Shelf through time?**

In [1]:
import numpy as np
import pandas as pd
import janitor

Table name = Sub level - Production, sum wellbores

Below is a data dictionary describing the columns of the table.

In [2]:
#data_dictionary = pd.read_clipboard() # pasted from https://factpages.npd.no/factpages/Default.aspx?culture=en

data_dictionary

NameError: name 'data_dictionary' is not defined

In [3]:
production_wellbores_monthly =  pd.read_csv("https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/field_production_gross_monthly&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=CSV&Top100=false&IpAddress=82.102.27.204&CultureCode=en")
production_wellbores_monthly.head()

Unnamed: 0,prfInformationCarrier,prfYear,prfMonth,prfPrdOilGrossMillSm3,prfPrdGasGrossBillSm3,prfPrdCondensateGrossMillSm3,prfPrdOeGrossMillSm3,prfPrdProducedWaterInFieldMillSm3,prfNpdidInformationCarrier
0,33/9-6 DELTA,2009,7,0.0007,0.00011,0.0,0.00081,0.00051,44576
1,33/9-6 DELTA,2009,8,0.00292,0.00047,0.0,0.00339,0.00063,44576
2,33/9-6 DELTA,2009,9,0.00338,0.00054,0.0,0.00392,0.00316,44576
3,33/9-6 DELTA,2009,10,0.00312,0.0005,0.0,0.00362,0.00535,44576
4,33/9-6 DELTA,2009,11,0.0,0.0,0.0,0.0,0.0,44576


In [4]:
production_wellbores_monthly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20504 entries, 0 to 20503
Data columns (total 9 columns):
prfInformationCarrier                20504 non-null object
prfYear                              20504 non-null int64
prfMonth                             20504 non-null int64
prfPrdOilGrossMillSm3                20504 non-null float64
prfPrdGasGrossBillSm3                20504 non-null float64
prfPrdCondensateGrossMillSm3         20504 non-null float64
prfPrdOeGrossMillSm3                 20504 non-null float64
prfPrdProducedWaterInFieldMillSm3    20504 non-null float64
prfNpdidInformationCarrier           20504 non-null int64
dtypes: float64(5), int64(3), object(1)
memory usage: 1.4+ MB


Let's make the column names a bit nicer.

In [5]:
production_wellbores_monthly.columns = production_wellbores_monthly.columns.str.replace("prf", "")

production_wellbores_monthly = production_wellbores_monthly.clean_names(case_type="snake")

In [6]:
production_wellbores_monthly.columns

Index(['information_carrier', 'year', 'month', 'prd_oil_gross_mill_sm3',
       'prd_gas_gross_bill_sm3', 'prd_condensate_gross_mill_sm3',
       'prd_oe_gross_mill_sm3', 'prd_produced_water_in_field_mill_sm3',
       'npdid_information_carrier'],
      dtype='object')
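
The renaming above relies on pyjanitor's `clean_names`. As a rough sketch of what the conversion amounts to for these particular column names, the snippet below does the same thing with the standard library only. The helper `to_snake` is hypothetical and is not how pyjanitor works internally; it assumes CamelCase names with an optional leading `prf` (note that the `str.replace` call above removes `prf` anywhere in the name, whereas this sketch strips it only at the start).

```python
import re

def to_snake(name, prefix="prf"):
    """Strip a leading prefix and convert CamelCase to snake_case (hypothetical helper)."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Insert "_" at every lowercase-or-digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# A few of the raw column names from the table above.
raw_columns = [
    "prfInformationCarrier",
    "prfYear",
    "prfPrdOilGrossMillSm3",
    "prfPrdGasGrossBillSm3",
]
clean_columns = [to_snake(c) for c in raw_columns]
print(clean_columns)
```

With a DataFrame at hand, the same helper could be applied via `df.rename(columns=to_snake)`.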

Next, we are going to do the following:

    1) drop the columns we do not need ["information_carrier", "npdid_information_carrier", "prd_oe_gross_mill_sm3"]
    2) reshape the table from wide to long
    3) group by ["year", "hc_phase"] and aggregate by summing

In [7]:
production_wellbores_monthly_LONG = (production_wellbores_monthly
    .drop(columns=["information_carrier", "npdid_information_carrier", "prd_oe_gross_mill_sm3"])
    .melt(
        id_vars=["year", "month"],
        var_name="hc_phase",
        value_name="prd_gross_mill_sm3"
        )
    .groupby(["year", "hc_phase"])
    .agg(prd_gross_mill_sm3_year = pd.NamedAgg(column="prd_gross_mill_sm3", aggfunc="sum"))
    .reset_index()
    ).sort_values(["year", "hc_phase"])

**Voila - a LONG table.**

In [8]:
production_wellbores_monthly_LONG.head()

Unnamed: 0,year,hc_phase,prd_gross_mill_sm3_year
0,1971,prd_condensate_gross_mill_sm3,0.0
1,1971,prd_gas_gross_bill_sm3,0.10294
2,1971,prd_oil_gross_mill_sm3,0.35712
3,1971,prd_produced_water_in_field_mill_sm3,0.0
4,1972,prd_condensate_gross_mill_sm3,0.0
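
To see what the wide-to-long reshape and the yearly aggregation do on a small scale, here is a minimal sketch on a made-up table. The frame `toy` and its values are invented purely for illustration and have nothing to do with the NPD data; only the `melt`, `groupby` and named-aggregation pattern mirrors the cell above.

```python
import pandas as pd

# Made-up monthly volumes for two phases (illustrative values only).
toy = pd.DataFrame({
    "year": [2000, 2000, 2001],
    "month": [1, 2, 1],
    "oil": [1.0, 2.0, 3.0],
    "gas": [0.5, 0.5, 1.0],
})

# Wide -> long: one row per (year, month, phase).
toy_long = toy.melt(
    id_vars=["year", "month"],
    var_name="hc_phase",
    value_name="volume",
)

# Long -> annual totals per phase.
toy_annual = (toy_long
    .groupby(["year", "hc_phase"])
    .agg(volume_year=pd.NamedAgg(column="volume", aggfunc="sum"))
    .reset_index()
    .sort_values(["year", "hc_phase"])
    .reset_index(drop=True))

print(toy_long)
print(toy_annual)
```

The three monthly rows with two value columns become six long rows, which then collapse to one row per year and phase.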


I am listing the unique values in the hc_phase column, which I will color according to geostandard colors.

In [9]:
production_wellbores_monthly_LONG["hc_phase"].unique()

array(['prd_condensate_gross_mill_sm3', 'prd_gas_gross_bill_sm3',
       'prd_oil_gross_mill_sm3', 'prd_produced_water_in_field_mill_sm3'],
      dtype=object)

## [Altair](https://altair-viz.github.io/index.html) - Exploratory Data Analysis

**Why the Altair library, you are probably asking yourself?**

[Sadly, in Python, we do not have a ggplot2.](http://fernandoi.cl/blog/posts/altair/)

**As Fernando Irarrázaval nicely summarized, the main reason for personally using it is:**

"Python’s go to visualization library, matplotlib, is very powerful but has severe limitations. (matplotlib recently came into the spotlight again for being attributed the first black hole image.) At times its flexibility is a blessing, but it is easy to get frustrated adding a small feature to your graph. Also, matplotlib dual object oriented and state-based interface is confusing. I still don’t completely grasp it even though I have been using matplotlib for years. Lastly, it is not easy to make interactive charts.

**Altair and the grammar of graphics**

Enter Altair. Altair is a wrapper for Vega-Lite, a JavaScript high-level visualization library. One of Vega-Lite most important features is that its API is based in the grammar of graphics. (In the rest of the article, I will mainly refer to Altair, but Vega-Lite deserves as much (or more) credit.)

Grammar of graphics may sound like an abstract feature, but it is the main difference between Altair and other Python visualization libraries. Altair matches the way we reason about visualizing data."

[Useful video from Jake VanderPlas - Exploratory Data Visualization with Vega, Vega-Lite, and Altair - PyCon 2018](https://www.youtube.com/watch?v=ms29ZPUKxbU)

In [10]:
import altair as alt
alt.data_transformers.disable_max_rows() # lift Altair's default limit of 5000 rows
#alt.renderers.enable('notebook')

domain = ['prd_condensate_gross_mill_sm3', 'prd_gas_gross_bill_sm3',
       'prd_oil_gross_mill_sm3', 'prd_produced_water_in_field_mill_sm3']
range_ = ['pink', 'red', 'green', 'blue']

annual_production_plot = (alt
    .Chart(production_wellbores_monthly_LONG)
    .mark_bar(opacity=0.3)
    .encode(
        x="year:Q",
        y="prd_gross_mill_sm3_year:Q",
        color=alt.Color('hc_phase', scale=alt.Scale(domain=domain, range=range_)),
        row="hc_phase:N"
        )
    .properties(height=100)
)

annual_production_plot

<VegaLite 3 object>

If you see this message, it means the renderer has not been properly enabled
for the frontend that you are using. For more information, see
https://altair-viz.github.io/user_guide/troubleshooting.html


**What about making it interactive?**

In [11]:
(annual_production_plot
    .encode(tooltip=['prd_gross_mill_sm3_year', 'year', 'hc_phase'])
    .interactive())

<VegaLite 3 object>

If you see this message, it means the renderer has not been properly enabled
for the frontend that you are using. For more information, see
https://altair-viz.github.io/user_guide/troubleshooting.html
