# Nigeria Export Crude Oil Production and Price
According to [Wikipedia](https://en.wikipedia.org/wiki/Petroleum_industry_in_Nigeria#:~:text=Nigeria%20is%20the%20largest%20oil,paraffinic%20and%20low%20in%20sulfur.), Nigeria is the largest oil and gas producer in Africa. [Crude oil](https://en.wikipedia.org/wiki/Petroleum) from the [Niger delta basin](https://en.wikipedia.org/wiki/Niger_Delta) comes in two types: [light](https://en.wikipedia.org/wiki/Light_crude_oil) and comparatively [heavy](https://en.wikipedia.org/wiki/Heavy_crude_oil). The lighter type has an API gravity of around 36, while the heavier has 20–25. Both types are [paraffinic](https://en.wikipedia.org/wiki/Alkane) and low in [sulfur](https://en.wikipedia.org/wiki/Sulfur). Nigeria's economy and budget have been largely supported by income and revenue from the petroleum industry since 1960. Statistics as of February 2021 show that the Nigerian oil sector contributes about 9% of the nation's [GDP](https://en.wikipedia.org/wiki/Gross_domestic_product). Nigeria is also a major exporter of crude oil and petroleum products to the United States of America: in 2010, it exported over one million barrels per day to the United States, representing 9% of total U.S. crude oil and petroleum product imports and over 40% of Nigeria's exports.

This project uses [**SQL**](https://www.w3schools.com/sql/) and the [**Python**](https://www.python.org/) programming language. SQL is used to answer the questions below, while Python is used to present the answers as dataframes and visualize them where necessary.

The data used for this analysis can be found in the [Central Bank of Nigeria (CBN) Statistics section](https://www.cbn.gov.ng/rates/crudeoil.asp). It contains 5 fields and 201 records.

<h1 id="crude_oil"><code>CrudeOilPrice</code></h1>
<table>
<thead>
<tr>
<th>column</th>
<th>type</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>year</code></td>
<td><code>varchar</code></td>
<td>Year of production</td>
</tr>
<tr>
<td><code>month</code></td>
<td><code>smallint</code></td>
<td>Month in each year</td>
</tr>
<tr>
<td><code>crude_oil_price</code></td>
<td><code>decimal</code></td>
<td>Price of crude oil (US$/Barrel)</td>
</tr>
<tr>
<td><code>production</code></td>
<td><code>decimal</code></td>
<td>Crude oil produced in million barrels per day (mbd)</td>
</tr>
<tr>
<td><code>crude_oil_export</code></td>
<td><code>decimal</code></td>
<td>Crude oil exported in million barrels per day (mbd)</td>
</tr>
</tbody>
</table>

## Questions 
* In which year was production highest? Narrow it down to the month.
* Which year has the highest average production on record?
* Which year has the highest average price?
* Is there a month in which all of the production was exported?
* How many times was less than 70% of production exported?
* How have production and price changed at three-year intervals?
* Summarize the three numeric columns.
* What is the correlation of price with production and export?
* Summarize the average price, standard deviation and median for the last five years.

In [1]:
# Importing required libraries
import sqlite3 as sq
import plotly.graph_objs as go
import plotly.express as px

In [2]:
# Creating Database
conn = sq.connect("crude_oil.db")

# Create a table called crudeOilPrice
query = """
CREATE TABLE crudeOilPrice(
    year VARCHAR(4),
    month SMALLINT,
    crude_oil_price DECIMAL(5,2),
    production DECIMAL(3,2),
    crude_oil_export DECIMAL(3,2)
    )"""

with conn:
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS crudeOilPrice")
    cur.execute(query)

In [3]:
# checking our sqlite master for confirmation
query = "SELECT name FROM sqlite_master WHERE type='table'"
import pandas as pd
sqlDF = pd.read_sql_query(query, conn)
sqlDF

[('crudeOilPrice',)]


Let's fill in the values from the [CSV file](https://www.cbn.gov.ng/Functions/export.asp?tablename=CrudeOilProdPrice) obtained from the CBN.


In [5]:
# Loading CSV file
csv = pd.read_csv('CrudeOilProdPrice25102022.csv', index_col=False)
csv.head()

Unnamed: 0,Year,Month,Crude Oil Price,Production,Crude Oil Export
0,2006,1,63.85,2.59,2.14
1,2006,2,61.33,2.47,2.02
2,2006,3,65.0,2.25,1.8
3,2006,4,72.09,2.32,1.87
4,2006,5,71.18,2.28,1.83


In [6]:
# creating a tuple of values for each row
csvTuple = tuple(csv.itertuples(index=False, name=None))
csvTuple[0]

# inserting into table crudeOilPrice (the with-block commits the transaction)
insertQuery = "INSERT INTO crudeOilPrice values(?,?,?,?,?)"
with conn:
    cur.executemany(insertQuery, csvTuple)

# Reading table as a pandas dataframe
query = "SELECT * FROM crudeOilPrice"
pd.read_sql_query(query, conn)


(2006, 1, 63.85, 2.59, 2.14)
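
The insert step can be sketched end to end on an in-memory database. This is a minimal sketch, assuming a handful of rows copied from the CSV preview above rather than the full file; `conn_demo` and `sample_rows` are illustrative names, not part of the notebook.

```python
import sqlite3

# Sample rows copied from the CSV preview above (not the full dataset)
sample_rows = [
    (2006, 1, 63.85, 2.59, 2.14),
    (2006, 2, 61.33, 2.47, 2.02),
    (2006, 3, 65.00, 2.25, 1.80),
]

conn_demo = sqlite3.connect(":memory:")
with conn_demo:  # commits on success, rolls back if an error is raised
    conn_demo.execute("""
        CREATE TABLE crudeOilPrice(
            year VARCHAR(4),
            month SMALLINT,
            crude_oil_price DECIMAL(5,2),
            production DECIMAL(3,2),
            crude_oil_export DECIMAL(3,2))""")
    # Parameterized bulk insert: one "?" placeholder per column
    conn_demo.executemany(
        "INSERT INTO crudeOilPrice VALUES (?,?,?,?,?)", sample_rows)

row_count = conn_demo.execute("SELECT COUNT(*) FROM crudeOilPrice").fetchone()[0]
first_row = conn_demo.execute("SELECT * FROM crudeOilPrice LIMIT 1").fetchone()
print(row_count, first_row)
```

Using the connection as a context manager means the inserted rows are committed without a separate `commit()` call.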

## Data Cleaning
In the data, months are represented by numbers. These will be replaced with the actual month names.

In [9]:
# Replacing month numbers with actual name
query = """
UPDATE
    crudeOilPrice
SET 
    month = 
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(
        REPLACE(month, 12, 'December'), 
        2, 'February'), 
        3, 'March'),
        4, 'April'),
        5, 'May'), 
        6, 'June'),
        7, 'July'),
        8, 'August'),
        9, 'September'),
        10, 'October'),
        11, 'November'),
        1, 'January')
"""
# Running the update (the with-block commits it)
with conn:
    cur.execute(query)

# Previewing for confirmation
query = "SELECT * FROM crudeOilPrice"
pd.read_sql_query(query, conn)

Unnamed: 0,year,month,crude_oil_price,production,crude_oil_export
0,2006,January,63.85,2.59,2.14
1,2006,February,61.33,2.47,2.02
2,2006,March,65.00,2.25,1.80
3,2006,April,72.09,2.32,1.87
4,2006,May,71.18,2.28,1.83
...,...,...,...,...,...
196,2022,May,116.72,1.02,0.57
197,2022,June,130.10,1.16,0.71
198,2022,July,120.54,1.08,0.63
199,2022,August,106.34,0.97,0.52
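
The nested `REPLACE` calls work only because of the order in which the numbers are substituted (for example, 10, 11 and 12 must be handled before 1). A `CASE` expression avoids that ordering concern. The following is a minimal sketch of that alternative on a small in-memory table; `conn_demo`, `month_names` and `case_sql` are illustrative names, and the table holds only the columns needed for the demonstration.

```python
import sqlite3

month_names = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE crudeOilPrice(year VARCHAR(4), month SMALLINT)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?)",
                      [(2006, m) for m in range(1, 13)])

# Build "WHEN 1 THEN 'January' WHEN 2 THEN 'February' ..." from the list
case_sql = " ".join(f"WHEN {number} THEN '{name}'"
                    for number, name in enumerate(month_names, start=1))

with conn_demo:
    # Each row is matched against its original month number exactly once
    conn_demo.execute(
        f"UPDATE crudeOilPrice SET month = CASE month {case_sql} ELSE month END")

months = [row[0] for row in
          conn_demo.execute("SELECT month FROM crudeOilPrice ORDER BY rowid")]
print(months)
```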


## Exploration

In [10]:
# Overview of record
query = """
SELECT  COUNT(*) Records, 
        COUNT(DISTINCT year) Years_record,
        COUNT(DISTINCT month) Distinct_Month,
        MIN(year) AS Start_year,
        MAX(year) AS End_year
  FROM  crudeOilPrice
"""
sqlDF = pd.read_sql_query(query, conn)
sqlDF

Unnamed: 0,Records,Years_record,Distinct_Month,Start_year,End_year
0,201,17,12,2006,2022


### In which year was production highest? Narrow it down to the month.

In [11]:
query = """
SELECT  year, month, max(production) AS Max_Prod
  FROM  crudeOilPrice
"""
pd.read_sql_query(query, conn)

Unnamed: 0,year,month,Max_Prod
0,2010,October,2.88
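
The query above relies on SQLite-specific behaviour: when a query contains a single `max()` aggregate, the bare columns `year` and `month` are taken from the row that holds the maximum. Other databases generally reject or handle such a query differently. A more portable form sorts and takes the first row, sketched here on a few sample rows copied from the data preview (`conn_demo` is an illustrative name):

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE crudeOilPrice(year TEXT, month TEXT, production REAL)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?, ?)", [
    ('2006', 'January', 2.59),   # sample rows only, not the full dataset
    ('2006', 'February', 2.47),
    ('2006', 'March', 2.25),
    ('2022', 'May', 1.02),
])

# Sort by production and keep the top row instead of mixing max() with bare columns
top_row = conn_demo.execute("""
    SELECT year, month, production AS Max_Prod
      FROM crudeOilPrice
  ORDER BY production DESC
     LIMIT 1""").fetchone()
print(top_row)
```

If two months tie for the maximum, this form returns only one of them; dropping `LIMIT 1` and filtering on `production = (SELECT max(production) FROM crudeOilPrice)` would return all ties.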


### Which year has the highest average production on record?


In [12]:
query = """
SELECT sub.Year, max(sub.Avg_Prod) AS Max_Avg_Prod
  FROM (
          SELECT year AS Year, avg(production) AS Avg_Prod
            FROM crudeOilPrice
        GROUP BY Year) AS sub
"""
pd.read_sql_query(query, conn)

Unnamed: 0,Year,Max_Avg_Prod
0,2010,2.4675
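
The same result can be obtained without a subquery by grouping, sorting on the average and keeping the first row. A minimal sketch on sample rows from the preview (so the winning year here reflects only the sample, not the full dataset; `conn_demo` is an illustrative name):

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE crudeOilPrice(year TEXT, production REAL)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?)", [
    ('2006', 2.59), ('2006', 2.47),   # sample rows only
    ('2022', 1.02), ('2022', 1.16),
])

# Average per year, highest first, keep the top group
best_year = conn_demo.execute("""
    SELECT year AS Year, avg(production) AS Avg_Prod
      FROM crudeOilPrice
  GROUP BY year
  ORDER BY Avg_Prod DESC
     LIMIT 1""").fetchone()
print(best_year)
```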


### Which year has the highest average price?

In [13]:
query = """
SELECT sub.Year, max(Avg_Price) AS Max_Avg_Price
  FROM (
          SELECT year AS Year, avg(crude_oil_price) AS Avg_Price
            FROM crudeOilPrice
        GROUP BY Year) AS sub
"""
pd.read_sql_query(query, conn)

Unnamed: 0,Year,Max_Avg_Price
0,2011,113.76


### Is there a month in which all of the production was exported?

In [14]:
query = """
SELECT COUNT(*) AS count
  FROM crudeOilPrice
 WHERE production = crude_oil_export
"""
pd.read_sql_query(query, conn)

Unnamed: 0,count
0,0


### How many times was less than 70% of production exported?

In [15]:
query = """
SELECT  COUNT(*) AS Count
  FROM  (
        SELECT production, year, crude_oil_export, 0.7 * production AS seventy_perc_prod
  FROM  crudeOilPrice)
 WHERE  crude_oil_export < seventy_perc_prod
"""
pd.read_sql_query(query, conn)

Unnamed: 0,Count
0,22
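
The threshold can also be applied directly in the `WHERE` clause, without the intermediate subquery. A minimal sketch on four sample rows taken from the data preview (`conn_demo` and `below_70` are illustrative names):

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE crudeOilPrice(production REAL, crude_oil_export REAL)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?)", [
    (2.59, 2.14), (2.25, 1.80),   # 2006 sample rows
    (1.02, 0.57), (1.16, 0.71),   # 2022 sample rows
])

# Count months in which export fell below 70% of production
below_70 = conn_demo.execute("""
    SELECT COUNT(*)
      FROM crudeOilPrice
     WHERE crude_oil_export < 0.7 * production""").fetchone()[0]
print(below_70)
```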


There are 22 such months. Let's narrow this down to the year with the highest count.

In [16]:
query = """
SELECT production, year, crude_oil_export, 0.7 * production AS seventy_perc_prod
  FROM crudeOilPrice
 LIMIT 10
"""
pd.read_sql_query(query, conn)

Unnamed: 0,production,year,crude_oil_export,seventy_perc_prod
0,2.59,2006,2.14,1.813
1,2.47,2006,2.02,1.729
2,2.25,2006,1.8,1.575
3,2.32,2006,1.87,1.624
4,2.28,2006,1.83,1.596
5,2.41,2006,1.96,1.687
6,2.39,2006,1.94,1.673
7,2.4,2006,1.95,1.68
8,2.4,2006,1.95,1.68
9,2.38,2006,1.93,1.666


In [17]:
query = """
  SELECT year AS Year, COUNT(*) AS Export_Count
    FROM (
           SELECT production, year, crude_oil_export, 0.7 * production AS seventy_perc_prod
             FROM crudeOilPrice)
   WHERE crude_oil_export < seventy_perc_prod
GROUP BY Year
"""
pd.read_sql_query(query, conn)

Unnamed: 0,Year,Export_Count
0,2020,1
1,2021,12
2,2022,9


We have seen the years in which exported oil was below 70% of production.

Let's look at when exported oil was at least 80% of production.

In [18]:
query = """
  SELECT year AS Year, COUNT(*) AS Export_Count
    FROM (
          SELECT production, year, crude_oil_export, 0.8 * production AS eighty_perc_prod
          FROM crudeOilPrice)
   WHERE crude_oil_export >= eighty_perc_prod
GROUP BY Year
"""
sqlDF = pd.read_sql_query(query, conn, index_col='Year')
print(sqlDF)

fig = px.line(sqlDF, x=sqlDF.index, y=sqlDF.Export_Count, 
                markers=True, width=700,
                title='Crude oil export above 80% of production',
                labels={'Year':'', 'Export_Count':'Month Count'})
fig.show()

      Export_Count
Year              
2006            12
2007             5
2008             1
2009             1
2010            11
2011            12
2012             8
2013             2
2014             3


It turns out that 2014 was the last year in which at least 80% of production was exported in any month.

Only in 2006 and 2011 was at least 80% of production exported in every month.

### How have production and price changed at three-year intervals?
**Price Difference**

In [42]:
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
month_short = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
query = """
WITH first AS (
    SELECT  c1.month, c1.crude_oil_price AS Price2006, c2.crude_oil_price AS Price2009
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2006 AND c2.year = c1.year + 3),

sec AS (
    SELECT  c1.month, c1.crude_oil_price AS Price2012, c2.crude_oil_price AS Price2015
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2012 AND c2.year = c1.year + 3),

third AS (
    SELECT  c1.month, c1.crude_oil_price AS Price2018, c2.crude_oil_price AS Price2021
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2018 AND c2.year = c1.year + 3)
     
    SELECT  f.month, f.Price2006, f.Price2009, s.Price2012, s.Price2015, t.Price2018, t.Price2021
      FROM  first AS f
 LEFT JOIN  sec AS s
     USING  (month)
 LEFT JOIN  third AS t
     USING  (month)
"""

sqlDF = pd.read_sql_query(query, conn, index_col='month').reindex(month_order)
sqlDF

month,Price2006,Price2009,Price2012,Price2015,Price2018,Price2021
January,63.85,44.95,113.81,48.81,69.68,54.87
February,61.33,46.52,121.87,58.09,66.67,62.48
March,65.0,49.7,128.0,56.69,74.72,65.62
April,72.09,51.16,122.62,57.45,72.37,64.3
May,71.18,60.02,113.08,65.08,77.64,67.83
June,69.32,72.24,98.06,62.06,75.38,73.46
July,75.13,66.52,104.62,57.01,74.72,75.93
August,75.15,74.0,113.76,47.09,73.35,70.72
September,62.97,70.22,114.36,48.08,79.59,74.55
October,59.49,78.25,108.92,48.86,79.18,84.11
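
The three CTEs above each self-join the table for one pair of years. An alternative that scales to any list of years is conditional aggregation, where one `MAX(CASE ...)` column per year pivots the long table into a wide one. This is a minimal sketch of that technique, not the notebook's method, using four sample values from the table above; `conn_demo`, `years` and `pivot` are illustrative names.

```python
import sqlite3

years = [2006, 2009]  # extend with 2012, 2015, 2018, 2021 for the full comparison

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE crudeOilPrice(year TEXT, month TEXT, crude_oil_price REAL)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?, ?)", [
    ('2006', 'January', 63.85), ('2006', 'February', 61.33),   # sample rows only
    ('2009', 'January', 44.95), ('2009', 'February', 46.52),
])

# One column per year: each CASE keeps the price only for the matching year
columns = ", ".join(
    f"MAX(CASE WHEN year = '{y}' THEN crude_oil_price END) AS Price{y}"
    for y in years)
pivot_sql = f"SELECT month, {columns} FROM crudeOilPrice GROUP BY month"

pivot = {row[0]: row[1:] for row in conn_demo.execute(pivot_sql)}
print(pivot)
```

Because each month has one row per year, `MAX` simply picks that single value; months missing for a given year come back as `NULL`.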


In [49]:
def add_year(column, color):
    # Legend label: strip the 'Price' prefix, e.g. 'Price2006' -> '2006 Price'
    name = column[5:] + ' Price'
    fig.add_trace(go.Scatter(x=sqlDF.index, y=sqlDF[column],
                             name=name,
                             line=dict(color=color, width=3, dash='dot')))
fig = go.Figure()
add_year('Price2006', 'firebrick')
add_year('Price2009', 'royalblue')
add_year('Price2012', 'goldenrod')
add_year('Price2015', 'darkturquoise')
add_year('Price2018', 'forestgreen')
add_year('Price2021', 'darkgrey')
fig.update_layout(title='Crude oil price with three years gap', width=700,
                  yaxis_title = 'Price (US$/Barrel)',
                  xaxis = dict(
                      tickvals = month_order, 
                      ticktext = month_short
                  )
                  )
fig.show()

**Production Difference**

In [50]:
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
query = """
WITH first AS (
    SELECT  c1.month, c1.production AS Prod2006, c2.production AS Prod2009
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2006 AND c2.year = c1.year + 3),

sec AS (
    SELECT  c1.month, c1.production AS Prod2012, c2.production AS Prod2015
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2012 AND c2.year = c1.year + 3),

third AS (
    SELECT  c1.month, c1.production AS Prod2018, c2.production AS Prod2021
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2018 AND c2.year = c1.year + 3)
     
    SELECT  f.month, f.Prod2006, f.Prod2009, s.Prod2012, s.Prod2015, t.Prod2018, t.Prod2021
      FROM  first AS f
 LEFT JOIN  sec AS s
     USING  (month)
 LEFT JOIN  third AS t
     USING  (month)
"""

sqlDF = pd.read_sql_query(query, conn, index_col='month').reindex(month_order)
sqlDF

month,Prod2006,Prod2009,Prod2012,Prod2015,Prod2018,Prod2021
January,2.59,2.03,2.23,2.2,2.0,1.36
February,2.47,2.06,2.4,2.21,2.01,1.42
March,2.25,2.07,2.34,2.07,1.94,1.43
April,2.32,1.86,2.3,2.03,1.97,1.37
May,2.28,2.22,2.4,2.05,1.78,1.34
June,2.41,2.17,2.37,1.97,1.78,1.31
July,2.39,2.14,2.42,2.18,1.83,1.32
August,2.4,2.12,2.48,2.12,2.0,1.24
September,2.4,2.18,2.45,2.22,1.96,1.25
October,2.38,2.28,2.19,2.21,2.01,1.23


In [51]:
def add_year(column, color):
    # Legend label: strip the 'Prod' prefix, e.g. 'Prod2006' -> '2006 Production'
    name = column[4:] + ' Production'
    fig.add_trace(go.Scatter(x=sqlDF.index, y=sqlDF[column],
                             name=name,
                             line=dict(color=color, width=3, dash='dot')))
fig = go.Figure()
add_year('Prod2006', 'firebrick')
add_year('Prod2009', 'royalblue')
add_year('Prod2012', 'goldenrod')
add_year('Prod2015', 'darkturquoise')
add_year('Prod2018', 'forestgreen')
add_year('Prod2021', 'darkgrey')

fig.update_layout(title='Crude oil production with three years gap', width=700,
                  yaxis_title = 'Production (mbd)',
                  xaxis = dict(
                      tickvals = month_order, 
                      ticktext = month_short
                  )
                  )
fig.show()

**Export Difference**

In [53]:
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
query = """
WITH first AS (
    SELECT  c1.month, c1.crude_oil_export AS Export2006, c2.crude_oil_export AS Export2009
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2006 AND c2.year = c1.year + 3),

sec AS (
    SELECT  c1.month, c1.crude_oil_export AS Export2012, c2.crude_oil_export AS Export2015
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2012 AND c2.year = c1.year + 3),

third AS (
    SELECT  c1.month, c1.crude_oil_export AS Export2018, c2.crude_oil_export AS Export2021
      FROM  crudeOilPrice AS c1
INNER JOIN  crudeOilPrice AS c2
        ON  c1.month = c2.month
     WHERE  c1.year = 2018 AND c2.year = c1.year + 3)
     
    SELECT  f.month, f.Export2006, f.Export2009, s.Export2012, s.Export2015, t.Export2018, t.Export2021
      FROM  first AS f
 LEFT JOIN  sec AS s
     USING  (month)
 LEFT JOIN  third AS t
     USING  (month)
"""

sqlDF = pd.read_sql_query(query, conn, index_col='month').reindex(month_order)
sqlDF

month,Export2006,Export2009,Export2012,Export2015,Export2018,Export2021
January,2.14,1.58,1.78,1.75,1.55,0.91
February,2.02,1.61,1.95,1.76,1.56,0.97
March,1.8,1.62,1.89,1.62,1.49,0.98
April,1.87,1.41,1.85,1.58,1.52,0.92
May,1.83,1.77,1.95,1.6,1.33,0.89
June,1.96,1.72,1.92,1.52,1.33,0.86
July,1.94,1.69,1.97,1.73,1.38,0.87
August,1.95,1.67,2.03,1.67,1.55,0.79
September,1.95,1.73,2.0,1.77,1.51,0.8
October,1.93,1.83,1.74,1.76,1.56,0.78


In [54]:
def add_year(column, color):
    # Legend label: strip the 'Export' prefix, e.g. 'Export2006' -> '2006 Export'
    name = column[6:] + ' Export'
    fig.add_trace(go.Scatter(x=sqlDF.index, y=sqlDF[column],
                             name=name,
                             line=dict(color=color, width=3, dash='dot')))
fig = go.Figure()
add_year('Export2006', 'firebrick')
add_year('Export2009', 'royalblue')
add_year('Export2012', 'goldenrod')
add_year('Export2015', 'darkturquoise')
add_year('Export2018', 'forestgreen')
add_year('Export2021', 'darkgrey')

fig.update_layout(title='Crude oil export with three years gap', width=700,
                  yaxis_title = 'Export (mbd)',
                  xaxis = dict(
                      tickvals = month_order, 
                      ticktext = month_short
                  )
                  )
fig.show()


### Summarize the three numeric columns

In [56]:
query = """
SELECT  min(crude_oil_price) Min_Price,
        ROUND(avg(crude_oil_price), 2) Avg_Price,
        max(crude_oil_price) Max_Price,
        min(production) Min_Production,
        ROUND(avg(production), 2) Avg_Production,
        max(production) Max_Production,
        min(crude_oil_export) Min_Export,
        ROUND(avg(crude_oil_export), 2) Avg_Export,
        max(crude_oil_export) Max_Export
  FROM  crudeOilPrice
"""

pd.read_sql_query(query, conn)


Summary of Oil Price, Production and Export By Month and Year

In [28]:
query = """
  SELECT month Month,
         min(crude_oil_price) Min_Price,
         avg(crude_oil_price) Avg_Price,
         max(crude_oil_price) Max_Price
    FROM crudeOilPrice
GROUP BY month
ORDER BY Avg_Price
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Month,Min_Price,Avg_Price,Max_Price
0,January,30.66,73.427059,115.24
1,December,37.8,73.53375,114.49
2,November,42.7,75.0225,113.92
3,February,31.7,75.730588,121.87
4,October,39.74,76.221875,113.12
5,March,32.29,78.700588,128.0
6,April,14.28,78.901176,124.49
7,September,40.85,79.057647,115.73
8,August,45.06,79.690588,115.84
9,May,27.9,81.265882,126.57


In [29]:
query = """
  SELECT year Year,
         min(crude_oil_price) Min_Price,
         avg(crude_oil_price) Avg_Price,
         max(crude_oil_price) Max_Price
    FROM crudeOilPrice
GROUP BY year
ORDER BY Avg_Price
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Year,Min_Price,Avg_Price,Max_Price
0,2020,14.28,41.89,66.68
1,2016,30.66,43.806667,53.48
2,2015,37.8,52.653333,65.08
3,2017,46.39,54.085833,65.11
4,2009,44.95,63.9,78.25
5,2019,59.1,65.85,73.65
6,2006,59.49,66.668333,75.15
7,2021,54.87,70.12,84.11
8,2018,62.0,72.6575,79.59
9,2007,55.57,74.9625,95.05


In [30]:
query = """
  SELECT month Month,
         min(production) Min_Production,
         avg(production) Avg_Production,
         max(production) Max_Production
    FROM crudeOilPrice
GROUP BY month
ORDER BY Avg_Production
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Month,Min_Production,Avg_Production,Max_Production
0,June,1.16,1.971176,2.41
1,May,1.02,1.978824,2.5
2,December,1.2,2.009375,2.58
3,July,1.08,2.009412,2.48
4,April,1.22,2.014118,2.42
5,March,1.24,2.015294,2.44
6,August,0.97,2.015882,2.5
7,September,0.94,2.021176,2.48
8,November,1.23,2.035625,2.5
9,February,1.26,2.095294,2.51


In [31]:
query = """
  SELECT year Year,
         min(production) Min_Production,
         avg(production) Avg_Production,
         max(production) Max_Production
    FROM crudeOilPrice
GROUP BY year
ORDER BY Avg_Production
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Year,Min_Production,Avg_Production,Max_Production
0,2022,0.94,1.143333,1.4
1,2021,1.2,1.308333,1.43
2,2020,1.42,1.755833,2.07
3,2016,1.5,1.816667,2.15
4,2017,1.6,1.889167,2.01
5,2018,1.78,1.915,2.01
6,2019,1.94,2.0125,2.11
7,2008,1.96,2.099167,2.26
8,2009,1.86,2.110833,2.28
9,2015,1.97,2.126667,2.22


In [32]:
query = """
  SELECT month Month,
         min(crude_oil_export) Min_Export,
         avg(crude_oil_export) Avg_Export,
         max(crude_oil_export) Max_Export
    FROM crudeOilPrice
GROUP BY month
ORDER BY Avg_Export
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Month,Min_Export,Avg_Export,Max_Export
0,June,0.71,1.521176,1.96
1,May,0.57,1.528824,2.05
2,December,0.75,1.559375,2.13
3,July,0.63,1.559412,2.03
4,April,0.77,1.564118,1.97
5,March,0.79,1.565294,1.99
6,August,0.52,1.565882,2.05
7,September,0.49,1.571176,2.03
8,November,0.78,1.585625,2.05
9,February,0.81,1.645294,2.06


In [33]:
query = """
  SELECT year Year,
         min(crude_oil_export) Min_Export,
         avg(crude_oil_export) Avg_Export,
         max(crude_oil_export) Max_Export
    FROM crudeOilPrice
GROUP BY year
ORDER BY Avg_Export
"""

pd.read_sql_query(query, conn)

Unnamed: 0,Year,Min_Export,Avg_Export,Max_Export
0,2022,0.49,0.693333,0.95
1,2021,0.75,0.858333,0.98
2,2020,0.97,1.305833,1.62
3,2016,1.05,1.366667,1.7
4,2017,1.15,1.439167,1.56
5,2018,1.33,1.465,1.56
6,2019,1.49,1.5625,1.66
7,2008,1.51,1.649167,1.81
8,2009,1.41,1.660833,1.83
9,2015,1.52,1.676667,1.77


Let's look at the years whose average export is above the overall average export.

In [34]:
query = """
SELECT Year, Year_avg, sub.Overall_avg
  FROM (SELECT year Year, avg(crude_oil_export) Year_avg, 
               (SELECT avg(crude_oil_export) 
                  FROM crudeOilPrice) Overall_avg
          FROM crudeOilPrice
      GROUP BY year) sub
 WHERE Year_avg > sub.Overall_avg
"""
pd.read_sql_query(query, conn)

Unnamed: 0,Year,Year_avg,Overall_avg
0,2006,1.9325,1.581741
1,2007,1.751667,1.581741
2,2008,1.649167,1.581741
3,2009,1.660833,1.581741
4,2010,2.0175,1.581741
5,2011,1.930833,1.581741
6,2012,1.868333,1.581741
7,2013,1.733333,1.581741
8,2014,1.755833,1.581741
9,2015,1.676667,1.581741
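
The same filter can be written with `HAVING`, which compares each group's average against a scalar subquery without wrapping the grouped query in another `SELECT`. A minimal sketch on four sample rows from the data preview (`conn_demo` and `years_above` are illustrative names):

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE crudeOilPrice(year TEXT, crude_oil_export REAL)")
conn_demo.executemany("INSERT INTO crudeOilPrice VALUES (?, ?)", [
    ('2006', 2.14), ('2006', 2.02),   # sample rows only
    ('2022', 0.57), ('2022', 0.71),
])

# Keep only the years whose average beats the average over all rows
rows = conn_demo.execute("""
    SELECT year AS Year, avg(crude_oil_export) AS Year_avg
      FROM crudeOilPrice
  GROUP BY year
    HAVING avg(crude_oil_export) > (SELECT avg(crude_oil_export)
                                      FROM crudeOilPrice)""").fetchall()
years_above = [row[0] for row in rows]
print(years_above)
```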


Let's see the relationship between Nigeria's crude oil export and the other attributes.

SQLite doesn't have a built-in correlation function, so I will use pandas to find the relationship.

In [35]:
query = "SELECT * FROM crudeOilPrice"
sqlDF = pd.read_sql_query(query, conn)
# numeric_only=True skips the text columns (year, month)
sqlDF.corr(numeric_only=True)["crude_oil_export"]

crude_oil_price     0.199156
production          1.000000
crude_oil_export    1.000000
Name: crude_oil_export, dtype: float64
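
For reference, the Pearson coefficient that pandas computes by default is the covariance of the two columns divided by the product of their standard deviations. The following is a minimal sketch of that calculation in plain Python, using a few production and export values copied from the tables above rather than the full dataset; the `pearson` helper is illustrative, not a library function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Sample rows copied from the tables above (not the full dataset)
production = [2.59, 2.47, 2.25, 2.32, 2.28, 1.02, 1.16]
export = [2.14, 2.02, 1.80, 1.87, 1.83, 0.57, 0.71]

r_prod_export = pearson(production, export)
print(round(r_prod_export, 6))
```

In these sample rows export is always production minus 0.45, so the two columns move in lockstep and the coefficient comes out at 1 up to floating-point rounding.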

That's interesting! Production has a perfect positive correlation with export, while price is only weakly correlated (about 0.20). This means we can use just the production column to train a model that predicts future crude oil export. Note that in the rows shown above, export is always production minus 0.45 mbd, which would explain the perfect correlation.

If you want to see this model, follow me: I will take this analysis further by building a linear regression model that predicts Nigeria's future crude oil export.

In [36]:
month_order = ['January', 'February', 'March', 'April', 
               'May', 'June', 'July', 'August', 'September']

query = """
SELECT  month, production AS "total_production_(mbd)", 
        crude_oil_price AS "total_price_(US$/Barrel)"
  FROM  crudeOilPrice
 WHERE  year = '2022'
"""
sqlDF = pd.read_sql_query(query, conn, index_col='month').reindex(month_order)
sqlDF

month,total_production_(mbd),total_price_(US$/Barrel)
January,1.4,88.71
February,1.26,99.64
March,1.24,121.23
April,1.22,106.51
May,1.02,116.72
June,1.16,130.1
July,1.08,120.54
August,0.97,106.34
September,0.94,93.25


In [37]:
query = """
  SELECT  month, SUM(production) AS total_production, year
    FROM  crudeOilPrice
   WHERE  year = '2022'
GROUP BY  month
"""
sqlDF = pd.read_sql_query(query, conn)
print(sqlDF)

       month  total_production  year
0      April              1.22  2022
1     August              0.97  2022
2   February              1.26  2022
3    January              1.40  2022
4       July              1.08  2022
5       June              1.16  2022
6      March              1.24  2022
7        May              1.02  2022
8  September              0.94  2022


In [38]:
query = """
  SELECT month, SUM(production) AS total_production, year
    FROM crudeOilPrice
   WHERE year = '2021'
GROUP BY month
"""
sqlDF = pd.read_sql_query(query, conn)
print(sqlDF)

        month  total_production  year
0       April              1.37  2021
1      August              1.24  2021
2    December              1.20  2021
3    February              1.42  2021
4     January              1.36  2021
5        July              1.32  2021
6        June              1.31  2021
7       March              1.43  2021
8         May              1.34  2021
9    November              1.23  2021
10    October              1.23  2021
11  September              1.25  2021


In [39]:
# Filtering monthly production
query = """
  SELECT c1.month, SUM(c1.production) AS total_prod_2021, sub.total_prod_2022
    FROM crudeOilPrice AS c1,
                        ( SELECT month, SUM(production) AS total_prod_2022
                            FROM crudeOilPrice
                           WHERE year = '2022'
                        GROUP BY month) AS sub
   WHERE year = '2021' AND (c1.month = sub.month) 
GROUP BY c1.month
"""
sqlDF = pd.read_sql_query(query, conn, index_col='month').reindex(month_order)
print(sqlDF)

           total_prod_2021  total_prod_2022
month                                      
January               1.36             1.40
February              1.42             1.26
March                 1.43             1.24
April                 1.37             1.22
May                   1.34             1.02
June                  1.31             1.16
July                  1.32             1.08
August                1.24             0.97
September             1.25             0.94


In [40]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=sqlDF.index, y=sqlDF.total_prod_2021,
                         name='2021 Production', line=dict(color='firebrick', 
                                                           width=4, dash='dot')))
fig.add_trace(go.Scatter(x=sqlDF.index, y=sqlDF.total_prod_2022,
                         name='2022 Production', line=dict(color='royalblue', 
                                                           width=4, dash='dot')))
fig.update_layout(title='Crude Oil Production (mbd)',
                  yaxis_title = 'Millions Barrels Per Day (mbd)',
                  )
fig.show()

In [41]:
# jupyter nbconvert Crude_oil_analysis.ipynb --to slides --post serve --no-input --no-prompt