# Project 1

This repository contains two files [`eog_wells_in_nd.csv`](datasets/eog_wells_in_nd.csv) and [`nd_production.csv`](datasets/nd_production.csv) in the `datasets` directory. The file [`eog_wells_in_nd.csv`](datasets/eog_wells_in_nd.csv) contains the latitude and longitude for all permitted wells owned by EOG in North Dakota.  The file [`nd_production.csv`](datasets/nd_production.csv) contains the monthly production history for *all* wells in North Dakota identified by their API numbers.  I encourage you to inspect the formatting of these two files.  For this project, you will have two tasks:

## Task 1

You should read the file [`eog_wells_in_nd.csv`](datasets/eog_wells_in_nd.csv) into the class attribute `well_df` (**Do not change the name**).  However, you should not hard-code the "eog" part of the file name. Instead, use the class instantiation argument `ticker` to build the file name, so that the class can read in many files of this type.  For example, if `ticker = 'xom'` the class will read the file `datasets/xom_wells_in_nd.csv`, if `ticker = 'nbl'` it will read `datasets/nbl_wells_in_nd.csv`, etc.
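
As a minimal sketch of the file-name construction, the helper below builds the path from a ticker string. The function name `well_file_name` is hypothetical, and lowercasing the ticker is an assumption based on the `ticker.lower()` call in `save_plot` and the `'EOG'` argument in the example usage.

```python
def well_file_name(ticker):
    """Build the path to a well file from a ticker symbol (hypothetical helper)."""
    # Assumption: file names on disk use the lowercase ticker.
    return 'datasets/{}_wells_in_nd.csv'.format(ticker.lower())


print(well_file_name('EOG'))
print(well_file_name('xom'))
```

The resulting string can be passed directly to `pd.read_csv`.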

Next, we want to add two new columns to `well_df`.  These columns should be labeled **exactly** `'cumulative_oil'` and `'cumulative_gas'`, and they should contain the total production (summed over all months) for the corresponding API number in [`nd_production.csv`](datasets/nd_production.csv).  Some wells in [`eog_wells_in_nd.csv`](datasets/eog_wells_in_nd.csv) do not have any production history, either because they have been permitted but not drilled, they were dry holes, the data is missing, etc.

It's possible to compute this total production for all the wells with a one-line chain of Pandas operations.  A few DataFrame member functions you may want to look into are [isin](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html?highlight=isin#pandas.DataFrame.isin), [groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html?highlight=groupby#pandas.DataFrame.groupby), and [sum](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sum.html?highlight=sum#pandas.Series.sum).  You could also loop over all the unique wells and sum each one individually, but this will likely be very slow.
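
The following sketch illustrates the `isin`/`groupby`/`sum` pattern on toy data. The column names `'API'`, `'oil'`, and `'gas'` are assumptions made for illustration, so inspect the real CSV headers before adapting it.

```python
import pandas as pd

# Toy monthly production records; column names are hypothetical.
production = pd.DataFrame({
    'API': [1, 1, 2, 2, 3],
    'oil': [10.0, 20.0, 5.0, 5.0, 7.0],
    'gas': [1.0, 2.0, 3.0, 4.0, 9.0],
})

# Toy well list; well 4 has no production history, well 3 is not in the list.
wells = pd.DataFrame({'API': [1, 2, 4]})

# Keep only rows for wells of interest, then total each well over all months.
totals = (production[production['API'].isin(wells['API'])]
          .groupby('API')[['oil', 'gas']]
          .sum())

print(totals)
```

Here `totals` is indexed by API number and only contains wells that appear in both tables, so a well without production history simply has no row.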

After you've added the new columns to the `well_df` DataFrame, run the member function `dropna(inplace=True)` on it to get rid of any missing values.
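
A small sketch of what `dropna(inplace=True)` does, using a toy frame with a hypothetical `'API'` column. Note that it drops a row if *any* column in that row is missing, not only the two new ones.

```python
import numpy as np
import pandas as pd

# Toy well table; well 4 has no production, so its totals are missing.
wells = pd.DataFrame({
    'API': [1, 2, 4],
    'cumulative_oil': [30.0, 10.0, np.nan],
    'cumulative_gas': [3.0, 7.0, np.nan],
})

# Modifies the DataFrame in place and returns None.
wells.dropna(inplace=True)

print(wells)
```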

## Task 2

You should complete the function `create_well_map_plot`.  I've provided a few imports and template code already to set the tile provider to [OpenStreetMap](https://www.openstreetmap.org/).  To get everything right, you *must* use the following settings:

 * Create the figure with `plot_width = 500`, `plot_height = 400`,  `tools='tap,box_select,box_zoom,pan,reset'` and possibly other options we've already used to create a map projection, set labels, etc.

 * Use a `circle` glyph with `size = 5`, `line_color=None`, `fill_alpha=0.8`, `name='wells'` and possibly other options we've already used.
 
 * Color the circles by their `'cumulative_oil'` or `'cumulative_gas'` production values.  The choice should be selectable at class instantiation with the `color_by` argument.  Add a color bar on the left side of the figure.  Use a `Viridis256` color palette where the minimum color bar value is the minimum `'cumulative_[oil/gas]'` value for all wells, and likewise for the maximum.  [This](https://stackoverflow.com/questions/50013378/how-to-draw-a-circle-plot-the-linearcolormapper-using-python-bokeh) Stack Overflow post can assist you in setting the color bar.
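
The color-mapping settings above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the required solution: the `'x'`/`'y'` columns and their values are made up, the map tiles and projection are left out, and the figure size is omitted because its keyword names differ between Bokeh versions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import Viridis256
from bokeh.plotting import figure

# Toy data; 'x' and 'y' stand in for projected well coordinates (hypothetical).
df = pd.DataFrame({
    'x': [0.0, 1.0, 2.0],
    'y': [0.0, 1.0, 0.5],
    'cumulative_oil': [100.0, 2500.0, 900.0],
})

color_by = 'oil'
column = 'cumulative_{}'.format(color_by)

source = ColumnDataSource(df)

# Map the smallest value to the first palette color and the largest to the last.
mapper = LinearColorMapper(palette=Viridis256,
                           low=df[column].min(),
                           high=df[column].max())

p = figure(tools='tap,box_select,box_zoom,pan,reset')

# Fill color is looked up per point through the mapper.
p.circle('x', 'y', source=source, size=5, line_color=None, fill_alpha=0.8,
         fill_color={'field': column, 'transform': mapper}, name='wells')

# Attach a color bar that shares the mapper and place it on the left.
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'left')
```

Sharing one `LinearColorMapper` between the glyph and the `ColorBar` keeps the bar's range consistent with the circle colors.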

The figures must be identical for the tests to pass.  If everything works correctly, you should get a plot with this interactivity by running the `show_plot()` function.

![img](images/wells.gif)

In [3]:
from bokeh.models import ColumnDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.plotting import figure
from bokeh.layouts import row

from bokeh.io import show, output_notebook, export_png, output_file

output_notebook(hide_banner=True)

from bokeh.tile_providers import get_provider, Vendors
tile_provider = get_provider(Vendors.CARTODBPOSITRON_RETINA)

import pandas as pd
import numpy as np

from production_plotter import ProductionPlot


class NDWellProductionPlot(ProductionPlot):
    
    def __init__(self, ticker, color_by='oil'):
        
        self.ticker = ticker
        
        # Do not change the name of the following variable
        production_df = pd.read_csv('./datasets/nd_production.csv')
        
        # Uncomment and add command to read well file into well_df
        #self.well_df = 
       
        
        #Add any additional initialization code here
        
        
        #Leave as last line in __init__ function
        super().__init__(production_df)
        return
        
        
    def create_well_map_plot(self):
        
        #Add code to create well map
        
        
        return

        
    def show_plot(self):
        self.create_well_map_plot()
        self.create_production_plot()
        show(row(self.well_plot, self.prod_plot))
        
        return
    
    def save_plot(self):
        self.create_well_map_plot()
        export_png(self.well_plot, filename='{}_wells_in_nd.png'.format(self.ticker.lower()))

In [2]:
#wp = NDWellProductionPlot('EOG', 'oil')
#wp.show_plot()