In [8]:
import altair as alt
import pandas as pd

# Load the production data (the workbook is expected in the working directory).
file = r'GasProduction.xlsx'
df = pd.read_excel(file, index_col=False)


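# Rectangular brush shared by the linked views.
interval = alt.selection_interval()

# Common base chart: same data and default size, with the brush attached.
base = alt.Chart(df).properties(
    width=350,
    height=350,
).add_selection(interval)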


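# Field map: records plotted at their X/Y coordinates, sized by gas rate.
points = base.mark_point(filled=True, size=200).encode(
    alt.X('X:Q', scale=alt.Scale(domain=(434000, 438000))),
    alt.Y('Y:Q', scale=alt.Scale(domain=(6477000, 6480000))),
    size='BUTPD:Q',
    # Brushed wells keep their colour; everything else is greyed out.
    color=alt.condition(interval, 'Well_ID', alt.value('lightgray')),
    tooltip='Well_ID',
).properties(
    title='GAS FIELD MAP'
    # The brush is already attached through base.add_selection(interval),
    # so it is not passed a second time here.
)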


# Gas rate over time, restricted to the brushed wells.
timeseries = base.mark_line().encode(
    x='Date',
    y=alt.Y('BUTPD', scale=alt.Scale(domain=(0, 40000))),
    color=alt.Color('Well_ID:N')  # nominal, so each well gets a distinct hue
).properties(
    title='Gas Production by Well in BUTPD',
    width=350,
    height=140
).transform_filter(
    interval
)

# Water rate over time, restricted to the brushed wells.
timeseries2 = base.mark_line().encode(
    x='Date',
    y=alt.Y('BWPD', scale=alt.Scale(domain=(0, 40000))),
    color=alt.Color('Well_ID:N')  # same encoding as above for consistent colours
).properties(
    title='Water Production by Well in BWPD',
    width=350,
    height=140
).transform_filter(
    interval
)


# Total gas per well (sum of BUTPM) for the brushed wells.
hist = alt.Chart(df).mark_bar().encode(
    x='sum(BUTPM)',
    y='Well_ID',
    color='Well_ID'
).properties(
    width=700,
    height=80
).transform_filter(
    interval
)

# Total water per well (sum of BWPM) for the brushed wells.
hist2 = alt.Chart(df).mark_bar().encode(
    x='sum(BWPM)',
    y='Well_ID',
    color='Well_ID'
).properties(
    width=700,
    height=80
).transform_filter(
    interval
)



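# Layout: map on the left, the two time series stacked on the right,
# and the two bar charts stacked underneath.
A = points | (timeseries & timeseries2)
B = hist & hist2

A & B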




Graphical Design Justifications:

1) Distinct hues were chosen because they give a clear, stark contrast between categories (here, the wells). This makes the graphs easier to read, especially the time series, where the lines for different wells intersect and overlap as their values change over time.

2) The same colours represent the same variable (Well_ID) in every graph, which keeps the views consistent and makes the narrative easy to follow.

3) Interactivity allows the user to select a group of gas wells, which are plotted by their coordinates on the field map. The other 4 graphs then update to show the data for the selected wells only. This creates coordinated multiple views of dynamic data, which lets the user quickly compare and digest complex data without having to create multiple separate graphs.

4) The Gas Field Map itself is a scatter plot with configurable point marks. It was chosen to plot the wells' coordinate positions and to support the interactive rectangular brush selection that is linked to the other graphs.

5) Bar charts of summed production (referred to here as histograms) were chosen to give a different view of the data: each well's total output as its Sum of BUTPM (BUT per month).
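
The linked filtering described in point 3 can be pictured in plain pandas terms. The sketch below is only an analogy for what the brush does, not Altair's internal implementation. The three-well table and the brush bounds are invented values, and rows_in_brush is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample: two rows per well, with coordinates repeated on every
# row as in a long-format production table. All values are invented.
df = pd.DataFrame({
    "Well_ID": ["A", "A", "B", "B", "C", "C"],
    "X": [434500, 434500, 436000, 436000, 437500, 437500],
    "Y": [6478000, 6478000, 6479000, 6479000, 6477500, 6477500],
    "BUTPD": [100, 90, 300, 280, 50, 40],
})

def rows_in_brush(data, x_range, y_range):
    """Keep rows whose X and Y fall inside the rectangular brush (inclusive)."""
    inside = data["X"].between(*x_range) & data["Y"].between(*y_range)
    return data[inside]

# A brush dragged over the left part of the map (assumed bounds).
selected = rows_in_brush(df, (434000, 436500), (6477800, 6479500))
print(sorted(selected["Well_ID"].unique()))
```

Every linked graph would then be drawn from the selected rows only, which is the effect that filtering each view by the brush aims for.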

Data Introduction and Insights:

The main goal of this exercise was to interrogate and visualize gas field production data interactively, well by well, using Altair. The user starts by selecting a well or group of wells on the field map, and the associated graphs then show the corresponding data for that selection.

It is clear from the 'Gas Production by Well in BUTPD' graph (measured in BUT per day) that well "x15/9-F-12" was the best gas producer in the whole group from 2009 to 2010, although its production declined over time. During 2015 and 2016, the well that produced the least gas was "x15/9-F-15 D".

The 'Sum of BUTPM' (BUT per month) histogram also shows that the best gas producer over the entire period was well "x15/9-F-12", followed by well "x15/9-F-14" and then well "x15/9-F-11".
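
The ranking read off the chart can also be checked numerically. The following is a minimal sketch, assuming a long-format table with Well_ID and BUTPM columns as in the notebook. The well names W1 to W3 and all figures are invented for illustration and are not the field data.

```python
import pandas as pd

# Hypothetical monthly gas figures; not the real field data.
monthly = pd.DataFrame({
    "Well_ID": ["W1", "W1", "W2", "W2", "W3", "W3"],
    "BUTPM": [900, 700, 400, 500, 650, 600],
})

# Same aggregation as the 'sum(BUTPM)' encoding: total per well, largest first.
totals = monthly.groupby("Well_ID")["BUTPM"].sum().sort_values(ascending=False)
print(totals)
```

Running the same two lines on the real data frame would give the producer ranking directly, without reading bar lengths by eye.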

A striking insight emerges from contrasting the gas and water production lines over time (from their respective time series graphs). There is an inverse relationship between them: when gas production is high, water production is low, and vice versa.
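
One way to put a number on this visual impression is a correlation coefficient between the two rates. The sketch below assumes the BUTPD and BWPD column names from the notebook. The five data points are invented to show the calculation and say nothing about the actual strength of the relationship in the field data.

```python
import pandas as pd

# Hypothetical daily rates for a single well; invented to show the calculation.
well = pd.DataFrame({
    "BUTPD": [30000, 25000, 20000, 12000, 8000],
    "BWPD": [2000, 6000, 11000, 18000, 24000],
})

# Pearson correlation: values near -1 indicate a strong inverse relationship.
r = well["BUTPD"].corr(well["BWPD"])
print(round(r, 3))
```

On the real data, the calculation would most sensibly be done per well (for example after grouping by Well_ID), since pooling wells with very different rates could mask or exaggerate the pattern.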