# Set-Up

In [1]:
import sqlite3
import pandas as pd
import folium
import plotly.express as px

In [2]:
conn = sqlite3.connect('../data/data.db')
c = conn.cursor()

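def execute_statement(statement):
    """Run a SQL statement and return the result set as a DataFrame."""
    c.execute(statement)
    res = c.fetchall()
    # c.description holds one tuple per result column; the name is its first item.
    column_names = [description[0] for description in c.description]
    return pd.DataFrame(res, columns=column_names)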

# Yearly Data
In the real world, it is very common to want to view data over time. Examples include a company's yearly sales, changes in stock prices, or the 7-day average of cases of a disease in a population. With the raw data in a database, we can use SQL to answer many of these time-series questions.

The query below returns the number of live birds traded in the CITES database for each year since 2000.

In [6]:
execute_statement('''SELECT Year, SUM(Quantity) AS "Live Birds Traded"
                     FROM cites
                     WHERE Term='live'
                     GROUP BY Year
                     ORDER BY Year ASC''')

Unnamed: 0,Year,Live Birds Traded
0,2000,2945945.0
1,2001,2336013.0
2,2002,1703308.0
3,2003,2350549.0
4,2004,2009323.0
5,2005,1683936.0
6,2006,296375.0
7,2007,402991.0
8,2008,484714.0
9,2009,526260.0
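
To see what the GROUP BY step does to the raw rows, here is a minimal sketch on an in-memory database. The three-column schema and the toy rows are assumptions made purely for illustration; they are not real CITES records, and the real cites table may have other columns.

```python
import sqlite3

# Hypothetical toy rows (NOT real CITES records): (Year, Term, Quantity).
toy_rows = [
    (2000, "live", 10.0),
    (2000, "live", 5.0),
    (2000, "feathers", 99.0),  # removed by the WHERE clause
    (2001, "live", 7.0),
]

toy = sqlite3.connect(":memory:")
# Assumed minimal schema, for illustration only.
toy.execute("CREATE TABLE cites (Year INTEGER, Term TEXT, Quantity REAL)")
toy.executemany("INSERT INTO cites VALUES (?, ?, ?)", toy_rows)

# Same shape as the query above: filter, then sum Quantity within each year.
totals = toy.execute("""
    SELECT Year, SUM(Quantity) AS "Live Birds Traded"
    FROM cites
    WHERE Term = 'live'
    GROUP BY Year
    ORDER BY Year ASC
""").fetchall()

print(totals)  # [(2000, 15.0), (2001, 7.0)]
```

Each year collapses to a single row, which is the shape the later window-function queries build on.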


# Moving Average

A moving average smooths time-series data by averaging the data points within a period of time. Smoothing reduces the effect of short-term fluctuations or anomalies and highlights the long-term trends in the data. The period over which we take the moving average depends on the question being answered or the application of the data.

In our CITES database we will take a moving average of the number of live birds traded each year. As an example, take the year 2004, when 2,009,323 live birds were traded. To compute a 5-year moving average for 2004 we also need the values for the four previous years (shown in the table above). The average of those five values (for 2000, 2001, 2002, 2003, and 2004) is the 5-year moving average for 2004.
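
The arithmetic for 2004 can be checked by hand. The snippet below is only a sketch that copies the five yearly totals from the table above; the variable names are illustrative.

```python
# Yearly totals of live birds traded, copied from the table above.
traded = {
    2000: 2945945,
    2001: 2336013,
    2002: 1703308,
    2003: 2350549,
    2004: 2009323,
}

# The 5-year window ending in 2004 covers 2000 through 2004.
window = [traded[year] for year in range(2000, 2005)]
moving_average_2004 = sum(window) / len(window)

print(moving_average_2004)       # 2269027.6
print(int(moving_average_2004))  # 2269027 (fractional part dropped)
```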

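The query below uses a window function to compute the moving average, then casts it to an integer, which truncates the fractional part rather than rounding. The frame ROWS 4 PRECEDING covers the current row and up to four rows before it, so the first four years are averaged over fewer than five values.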

In [20]:
execute_statement('''SELECT Year, CAST(AVG(Traded) OVER (ORDER BY Year ASC ROWS 4 PRECEDING) AS INT) AS "Moving Average"
                     FROM (SELECT Year, SUM(Quantity) AS "Traded"
                           FROM cites
                           WHERE Term='live'
                           GROUP BY Year
                           ORDER BY Year ASC)''')

Unnamed: 0,Year,Moving Average
0,2000,2945945
1,2001,2640979
2,2002,2328422
3,2003,2333953
4,2004,2269027
5,2005,2016625
6,2006,1608698
7,2007,1348634
8,2008,975467
9,2009,678855
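
To make the window frame visible, the sketch below loads the ten yearly totals from the earlier table into an in-memory table and counts the rows in each frame. The table name yearly_totals and the extra output columns are assumptions for illustration. It also contrasts CAST(... AS INT), which truncates, with ROUND. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Yearly totals copied from the "Live Birds Traded" table above.
yearly = [
    (2000, 2945945.0), (2001, 2336013.0), (2002, 1703308.0),
    (2003, 2350549.0), (2004, 2009323.0), (2005, 1683936.0),
    (2006, 296375.0), (2007, 402991.0), (2008, 484714.0),
    (2009, 526260.0),
]

demo = sqlite3.connect(":memory:")
# yearly_totals is a hypothetical stand-in for the subquery in the notebook.
demo.execute("CREATE TABLE yearly_totals (Year INTEGER, Traded REAL)")
demo.executemany("INSERT INTO yearly_totals VALUES (?, ?)", yearly)

# ROWS 4 PRECEDING is shorthand for ROWS BETWEEN 4 PRECEDING AND CURRENT ROW.
rows = demo.execute("""
    SELECT Year,
           COUNT(*)    OVER w                AS rows_in_window,
           AVG(Traded) OVER w                AS exact_average,
           CAST(AVG(Traded) OVER w AS INT)   AS truncated,
           ROUND(AVG(Traded) OVER w)         AS rounded
    FROM yearly_totals
    WINDOW w AS (ORDER BY Year ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
    ORDER BY Year
""").fetchall()

for row in rows:
    print(row)
```

The first four years report 1, 2, 3, and 4 rows in their windows, so their averages cover fewer than five years. For 2004 the exact average is 2269027.6, which truncates to 2269027 but would round to 2269028.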


The moving average makes it clear that the overall trend in trading live birds was downward in the first decade of the century. It then increased slightly around 2010-2012 before settling at around 750,000 live birds traded each year.

This is just one of the common statistics calculated from time-series data.

# Running Totals

The running total works on a similar premise to the moving average. However, instead of averaging over a subset of previous years, we take the total across the current year and all previous years. The SQL query is very similar to the previous one: we replace AVG with SUM and do not specify the number of preceding rows, so the window extends from the first year to the current year.

In [21]:
execute_statement('''SELECT Year, CAST(SUM(Traded) OVER (ORDER BY Year ASC) AS INT) AS "Running Total Birds Traded"
                     FROM (SELECT Year, SUM(Quantity) AS "Traded"
                           FROM cites
                           WHERE Term='live'
                           GROUP BY Year
                           ORDER BY Year ASC)''')

Unnamed: 0,Year,Running Total Birds Traded
0,2000,2945945
1,2001,5281958
2,2002,6985266
3,2003,9335815
4,2004,11345138
5,2005,13029074
6,2006,13325449
7,2007,13728440
8,2008,14213154
9,2009,14739414
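
The running total is simply a cumulative sum, so it can be cross-checked outside SQL. The sketch below copies the yearly totals from the first table and uses itertools.accumulate; the variable names are illustrative.

```python
from itertools import accumulate

# Yearly totals of live birds traded, copied from the first table (2000-2009).
years = list(range(2000, 2010))
traded = [2945945, 2336013, 1703308, 2350549, 2009323,
          1683936, 296375, 402991, 484714, 526260]

# accumulate yields the sum of the current value and everything before it.
running_total = list(accumulate(traded))

for year, total in zip(years, running_total):
    print(year, total)
```

The printed values match the "Running Total Birds Traded" column above, ending at 14739414 for 2009.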


# Yearly Changes and Growth
Yearly (or sometimes monthly) change in a value is a very common statistic to calculate. In the business world it could be used to see how this year's profits have changed compared with previous years, or how sales for a specific month compare with sales from the same month in the previous year.

To calculate this statistic, we need another window function called LAG. Used with an OVER clause, LAG returns the value from a given number of rows before the current row. For example, LAG(Traded, 1) returns the value from the previous row, while LAG(Traded, 3) returns the value from three rows back. For the first rows, where no such earlier row exists, LAG returns NULL. Subtracting the previous value from the current value gives the difference between the two years. Dividing this difference by the previous value and multiplying by 100 gives the percentage difference.

The example below uses the LAG function to do just that, for the number of live birds traded each year.

In [30]:
execute_statement('''SELECT Year,
                            Traded,
                            CAST(Traded - LAG(Traded, 1) OVER (ORDER BY Year ASC) AS INT) AS "Yearly Difference",
                            ROUND((Traded - LAG(Traded, 1) OVER (ORDER BY Year ASC)) / LAG(Traded, 1) OVER(ORDER BY Year ASC) * 100, 2) AS "Percentage Difference"
                     FROM (SELECT Year, SUM(Quantity) AS "Traded"
                           FROM cites
                           WHERE Term='live'
                           GROUP BY Year
                           ORDER BY Year ASC)''')

Unnamed: 0,Year,Traded,Yearly Difference,Percentage Difference
0,2000,2945945.0,,
1,2001,2336013.0,-609932.0,-20.7
2,2002,1703308.0,-632705.0,-27.08
3,2003,2350549.0,647241.0,38.0
4,2004,2009323.0,-341226.0,-14.52
5,2005,1683936.0,-325387.0,-16.19
6,2006,296375.0,-1387561.0,-82.4
7,2007,402991.0,106616.0,35.97
8,2008,484714.0,81723.0,20.28
9,2009,526260.0,41546.0,8.57
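
To show how the LAG offset behaves, the sketch below loads the first four yearly totals into an in-memory table and compares offsets of 1 and 3. The table name yearly_totals and the column aliases are assumptions for illustration. It then recomputes the 2001 percentage difference by hand.

```python
import sqlite3

# First four yearly totals, copied from the table above.
yearly = [(2000, 2945945.0), (2001, 2336013.0),
          (2002, 1703308.0), (2003, 2350549.0)]

demo = sqlite3.connect(":memory:")
# yearly_totals is a hypothetical stand-in for the subquery in the notebook.
demo.execute("CREATE TABLE yearly_totals (Year INTEGER, Traded REAL)")
demo.executemany("INSERT INTO yearly_totals VALUES (?, ?)", yearly)

lag_rows = demo.execute("""
    SELECT Year,
           LAG(Traded, 1) OVER (ORDER BY Year) AS one_back,
           LAG(Traded, 3) OVER (ORDER BY Year) AS three_back
    FROM yearly_totals
    ORDER BY Year
""").fetchall()

for row in lag_rows:
    print(row)  # NULL comes back as None when no earlier row exists

# Percentage difference for 2001, computed by hand from the same totals.
previous, current = 2945945.0, 2336013.0
pct_2001 = round((current - previous) / previous * 100, 2)
print(pct_2001)  # -20.7
```

Only 2003 has a row three positions back, so three_back is None for 2000 through 2002. This is also why the first row of the result above has empty difference columns.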
