# Emissions for five key air pollutants in England

A step-by-step example of scripted analysis

## Conventions

*  [PEP8 - Style guide](https://www.python.org/dev/peps/pep-0008/)
*  Explicit parameters
*  One expression per line

## Preparation

Use an [`import` statement](https://docs.python.org/3/reference/import.html) to make the necessary [packages](https://docs.python.org/3/glossary.html#term-package) available. Packages, which may also be referred to as libraries, contain [modules](https://docs.python.org/3/glossary.html#term-module), which in turn contain [functions](https://docs.python.org/3/glossary.html#term-function) that perform specific tasks.

[`pandas`](https://pandas.pydata.org/) is the most popular Python package for data analysis and manipulation.

[`plotly`](https://plotly.com/python/) is an interactive plotting package for JavaScript, Python, and R. The [`express` module](https://plotly.com/python/plotly-express/) is a high-level interface, intended to make the creation of common plots easier. The [`io` module](https://plotly.com/python-api-reference/plotly.io.html) is a low-level interface for displaying, reading and writing figures.

Setting `pio.renderers.default` chooses how figures are displayed. This notebook uses `"vscode"`, which displays interactive plots in Visual Studio Code; setting it to `"png"` displays static versions of the plots instead (this requires a static image export engine such as `kaleido`).

Using `as` allows you to refer to a library by an alias.

In [1]:
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "vscode"

## Read input data

The [`read_excel` function](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) reads an Excel or OpenDocument workbook from a local filesystem or URL. It is a function in the `pandas` package, whereas most of the later steps use [methods](https://docs.python.org/3/glossary.html#term-method). A method is a function that belongs to a specific [class](https://docs.python.org/3/glossary.html#term-class) of [object](https://docs.python.org/3/glossary.html#term-object) and is called on an object of that class, for example `df_raw.query(...)` on a DataFrame.

By default, `read_excel` reads the first sheet in a workbook, but the `sheet_name` [parameter](https://docs.python.org/3/glossary.html#term-parameter) allows you to read a specific sheet.

The `usecols` and `skiprows` parameters define a specific range within a sheet. As one might expect, `usecols` defines the columns to read (here the Excel column range `"B:AA"`), while `skiprows` defines the number of rows to skip at the top of the sheet. For example, if the table header row is at row 6, use `skiprows=5`; the `skiprows=13` in the next cell means the header is read from row 14.
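The `skiprows` rule can be checked without downloading the workbook. The sketch below is an illustration only: it uses `pd.read_csv` on a small made-up text table instead of `read_excel`, relying on an integer `skiprows` meaning "number of lines to skip at the start of the file" in both functions. Note that `read_csv` takes column positions or names for `usecols`, not Excel letter ranges such as `"B:AA"`.

```python
import io

import pandas as pd

# Made-up text file: five lines of notes, then the table header on line 6
raw_text = """Illustrative notes line 1
Illustrative notes line 2
Illustrative notes line 3
Illustrative notes line 4
Illustrative notes line 5
Pollutant,1990,2018
NOx,10.0,4.0
SO2,20.0,2.0
"""

# skiprows=5 skips lines 1 to 5, so line 6 is read as the header row
df_all = pd.read_csv(
    filepath_or_buffer=io.StringIO(raw_text),
    skiprows=5
)

# For read_csv, usecols picks columns by position (0 and 2), not by Excel letter
df_subset = pd.read_csv(
    filepath_or_buffer=io.StringIO(raw_text),
    skiprows=5,
    usecols=[0, 2]
)

print(df_all.columns.tolist())     # ['Pollutant', '1990', '2018']
print(df_subset.columns.tolist())  # ['Pollutant', '2018']
```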
  


In [2]:
df_raw = pd.read_excel(
    io="http://uk-air.defra.gov.uk/reports/cat09/2010220959_DA_API_1990-2018_V1.0.xlsx",
    sheet_name="England API", 
    usecols="B:AA",
    skiprows=13  
    )

df_raw

| |ShortPollName|NFRCode|SourceName|1990|1995|1998|1999|2000|2001|2002|...|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|CO|1A1a|Autogenerators|0.000000|0.000000|0.000861|0.000987|0.001225|0.001165|0.001197|...|0.072842|0.188518|0.380950|0.773427|1.108654|1.654888|2.648169|4.194255|4.936837|5.536873|
|1|CO|1A1a|Miscellaneous industrial/commercial combustion|0.372920|0.285545|0.104246|0.108912|0.118016|0.123270|0.126718|...|0.135080|0.127430|0.144871|0.148156|0.172320|0.125973|0.133799|0.127731|0.144326|0.136516|
|2|CO|1A1a|Power stations|91.485663|85.075695|48.035092|43.356297|52.875139|53.176906|52.253263|...|49.778303|54.550899|56.543332|67.445893|65.244769|55.655886|46.738127|30.714808|30.172769|38.881950|
|3|CO|1A1a|Public sector combustion|0.009175|0.015521|0.014372|0.014410|0.012844|0.013138|0.014227|...|0.013591|0.015413|0.017175|0.017028|0.018264|0.018127|0.019443|0.019173|0.022132|0.021411|
|4|CO|1A1b|Refineries - combustion|4.455526|4.998781|5.248910|4.862012|3.785558|2.391528|2.788338|...|4.460696|3.287260|2.801700|3.162767|3.266939|4.154463|5.523977|4.606405|4.050079|3.934826|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|1597|Dioxins|5E|Accidental fires - other buildings|33.872274|31.567114|21.304133|24.019332|21.008371|23.956037|25.189342|...|14.384685|7.484103|7.735522|7.361696|5.908632|5.449799|5.676366|5.663757|5.629048|5.302077|
|1598|Dioxins|5E|Accidental fires - vehicles|2.078575|2.550182|2.790793|3.312796|3.492857|3.798588|3.531072|...|1.468508|1.249611|1.074494|0.914371|0.884555|0.876158|0.937898|1.046528|0.991551|0.968553|
|1599|Dioxins|5E|Bonfire night|5.659099|5.662411|5.669577|5.673903|5.677598|5.680664|5.682738|...|5.693059|5.696063|5.698611|5.702258|5.706041|5.710051|5.714023|5.717025|5.719205|5.721748|
|1600|Dioxins|5E|Regeneration of activated carbon|0.004980|0.004965|0.005035|0.005053|0.005066|0.005064|0.005064|...|0.005050|0.005067|0.005065|0.005064|0.005062|0.005067|0.005074|0.005079|0.005079|0.005086|


## Filter rows

The [`query` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) filters rows: it keeps the rows for which the expression you pass evaluates to `True`.

The `expr` parameter takes, as its [argument](https://docs.python.org/3/glossary.html#term-argument), a string containing the [expression](https://docs.python.org/3/reference/expressions.html) to evaluate. The syntax for this expression is `'column_name` [`comparison_operator`](https://docs.python.org/3/reference/expressions.html#comparisons) `value(s)'`.

The expression `'ShortPollName == ["NH3 Total", "NOx Total", "SO2 Total", "VOC Total", "PM2.5 Total"]'` returns the rows where `ShortPollName` is equal to (`==`) any of the values in the list `["NH3 Total", "NOx Total", "SO2 Total", "VOC Total", "PM2.5 Total"]`.
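A small sketch on made-up data showing that `==` with a list inside `query` behaves as a membership test, equivalent to a boolean mask built with `Series.isin`. A Python variable can also be referenced in the expression with `@`; the `wanted` name is illustrative.

```python
import pandas as pd

# Small made-up table standing in for df_raw
df = pd.DataFrame({
    "ShortPollName": ["CO", "NOx Total", "SO2 Total", "Dioxins"],
    "Emissions": [1.0, 2.0, 3.0, 4.0],
})

# == with a list keeps rows whose value matches any item in the list
via_query = df.query(expr='ShortPollName == ["NOx Total", "SO2 Total"]')

# The same filter written as a boolean mask with Series.isin
wanted = ["NOx Total", "SO2 Total"]
via_isin = df[df["ShortPollName"].isin(wanted)]

# @ lets the query string refer to a Python variable
via_variable = df.query(expr="ShortPollName == @wanted")

print(via_query.index.tolist())  # [1, 2]
```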




In [3]:
df_filtered_rows = (df_raw
    .query(
        expr='ShortPollName == ["NH3 Total", "NOx Total", "SO2 Total", "VOC Total", "PM2.5 Total"]'
        ))

df_filtered_rows

| |ShortPollName|NFRCode|SourceName|1990|1995|1998|1999|2000|2001|2002|...|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|337|NOx Total| | |2397.847344|1945.014746|1646.545024|1549.651242|1480.364245|1454.626845|1375.173857|...|927.922721|903.036656|842.171153|859.742287|811.687614|757.196642|718.376897|641.083469|625.991775|604.799111|
|791|PM2.5 Total| | |174.144903|135.004607|121.529144|120.441788|109.819938|109.66065|98.334888|...|84.650568|90.751568|81.645427|86.531421|87.768569|82.850096|83.087125|81.753703|81.518031|83.142806|
|906|SO2 Total| | |3134.835121|2019.423573|1396.569378|1052.827899|993.424139|919.358221|838.663774|...|311.556834|312.14|303.511355|352.775618|300.543636|245.606037|183.677453|128.790102|131.065832|118.684955|
|1150|VOC Total| | |2109.138181|1651.104184|1405.042632|1264.700141|1147.39259|1093.962465|1013.62624|...|625.994844|605.245837|587.781187|569.848492|546.189339|542.77964|539.211414|527.043135|527.189567|526.173425|
|1262|NH3 Total| | |232.760876|207.731501|219.459243|215.01839|209.947757|202.172804|201.042769|...|177.554475|177.470159|181.568684|178.212905|174.402987|184.473002|187.422548|188.836776|189.99617|189.813783|


## Drop unnecessary columns
The [`drop` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) removes the columns named in the `columns` parameter.

In [4]:
df_filtered_cols = (df_filtered_rows
    .drop(
        columns=['NFRCode', 'SourceName']
        ))

df_filtered_cols


| |ShortPollName|1990|1995|1998|1999|2000|2001|2002|2003|2004|...|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|337|NOx Total|2397.847344|1945.014746|1646.545024|1549.651242|1480.364245|1454.626845|1375.173857|1364.114058|1320.67586|...|927.922721|903.036656|842.171153|859.742287|811.687614|757.196642|718.376897|641.083469|625.991775|604.799111|
|791|PM2.5 Total|174.144903|135.004607|121.529144|120.441788|109.819938|109.66065|98.334888|99.880007|97.109635|...|84.650568|90.751568|81.645427|86.531421|87.768569|82.850096|83.087125|81.753703|81.518031|83.142806|
|906|SO2 Total|3134.835121|2019.423573|1396.569378|1052.827899|993.424139|919.358221|838.663774|828.670702|690.699715|...|311.556834|312.14|303.511355|352.775618|300.543636|245.606037|183.677453|128.790102|131.065832|118.684955|
|1150|VOC Total|2109.138181|1651.104184|1405.042632|1264.700141|1147.39259|1093.962465|1013.62624|951.860757|894.585081|...|625.994844|605.245837|587.781187|569.848492|546.189339|542.77964|539.211414|527.043135|527.189567|526.173425|
|1262|NH3 Total|232.760876|207.731501|219.459243|215.01839|209.947757|202.172804|201.042769|194.227107|198.347573|...|177.554475|177.470159|181.568684|178.212905|174.402987|184.473002|187.422548|188.836776|189.99617|189.813783|


## Clean data

The [`assign` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html?highlight=assign#pandas.DataFrame.assign) creates a new column or transforms an existing column. The syntax is `column_name=expression`, where the `expression` generates the values for the column `column_name`.

I want to remove the string `' Total'` from every row in the `ShortPollName` column and change `'VOC'` to `'NMVOC'`. To do this, I select the column using `df.column_name`, then call the [`str.replace` method](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace) on it twice. The first call replaces all occurrences of the pattern `' Total'` with an empty string (the equivalent of removing it) and the second replaces all occurrences of the pattern `'VOC'` with `'NMVOC'`.
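A minimal check of the two replacements on a made-up Series. The sketch passes `regex=False` to make explicit that the patterns are treated as literal text rather than regular expressions (the default for this parameter changed from `True` to `False` in pandas 2.0).

```python
import pandas as pd

# Made-up pollutant names in the style of the raw data
names = pd.Series(["NOx Total", "PM2.5 Total", "VOC Total"])

cleaned = (names
    .str.replace(
        pat=" Total",
        repl="",
        regex=False  # match the pattern as literal text
        )
    .str.replace(
        pat="VOC",
        repl="NMVOC",
        regex=False
        ))

print(cleaned.tolist())  # ['NOx', 'PM2.5', 'NMVOC']

# The second replacement is not idempotent: "NMVOC" still contains "VOC"
rerun = cleaned.str.replace(pat="VOC", repl="NMVOC", regex=False)
print(rerun.tolist())  # ['NOx', 'PM2.5', 'NMNMVOC']
```

The second output is a reminder to apply the cleaning to the raw column rather than to a column that has already been cleaned.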

In [5]:
df_cleaned = (df_filtered_cols
    .assign(
        ShortPollName=(df_filtered_cols
        .ShortPollName
        # regex=False: match each pattern as literal text
        .str.replace(
            pat=' Total',
            repl='',
            regex=False
            )
        .str.replace(
            pat='VOC',
            repl='NMVOC',
            regex=False
            ))
        ))

df_cleaned

| |ShortPollName|1990|1995|1998|1999|2000|2001|2002|2003|2004|...|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|337|NOx|2397.847344|1945.014746|1646.545024|1549.651242|1480.364245|1454.626845|1375.173857|1364.114058|1320.67586|...|927.922721|903.036656|842.171153|859.742287|811.687614|757.196642|718.376897|641.083469|625.991775|604.799111|
|791|PM2.5|174.144903|135.004607|121.529144|120.441788|109.819938|109.66065|98.334888|99.880007|97.109635|...|84.650568|90.751568|81.645427|86.531421|87.768569|82.850096|83.087125|81.753703|81.518031|83.142806|
|906|SO2|3134.835121|2019.423573|1396.569378|1052.827899|993.424139|919.358221|838.663774|828.670702|690.699715|...|311.556834|312.14|303.511355|352.775618|300.543636|245.606037|183.677453|128.790102|131.065832|118.684955|
|1150|NMVOC|2109.138181|1651.104184|1405.042632|1264.700141|1147.39259|1093.962465|1013.62624|951.860757|894.585081|...|625.994844|605.245837|587.781187|569.848492|546.189339|542.77964|539.211414|527.043135|527.189567|526.173425|
|1262|NH3|232.760876|207.731501|219.459243|215.01839|209.947757|202.172804|201.042769|194.227107|198.347573|...|177.554475|177.470159|181.568684|178.212905|174.402987|184.473002|187.422548|188.836776|189.99617|189.813783|


## Tidy data
The [`melt` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html) unpivots a DataFrame from wide format to long format. The result is tidier, because the variable `Year` is now stored in a column instead of in the column headers, and it also allows us to perform vectorised computations on the `Emissions` column.

The `id_vars` parameter specifies the column(s) that identify each observation. The `var_name` parameter defines the name of the column that will store the unpivoted column names, and the `value_name` parameter defines the name of the column that will store the unpivoted values.

For more information about the theory and benefits of the tidy data format, see [Wickham, Hadley. "Tidy Data." Journal of Statistical Software [Online], 59.10 (2014): 1-23. Web. 20 Nov. 2020.](http://dx.doi.org/10.18637/jss.v059.i10)
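A minimal sketch of `melt` on a made-up wide table with two pollutants and two year columns:

```python
import pandas as pd

# Made-up wide table: one row per pollutant, one column per year
df_wide = pd.DataFrame({
    "ShortPollName": ["NOx", "SO2"],
    1998: [10.0, 20.0],
    1999: [9.0, 15.0],
})

# Each year column becomes rows: (pollutant, year, value)
df_long = df_wide.melt(
    id_vars="ShortPollName",
    var_name="Year",
    value_name="Emissions"
)

print(df_long)
```

The year columns are stacked one after another, so all 1998 rows come before the 1999 rows, the same ordering as in the output of this step.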

In [6]:
df_tidied = (df_cleaned
    .melt(
        id_vars='ShortPollName', 
        var_name='Year', 
        value_name='Emissions'
        ))

df_tidied


| |ShortPollName|Year|Emissions|
|---|---|---|---|
|0|NOx|1990|2397.847344|
|1|PM2.5|1990|174.144903|
|2|SO2|1990|3134.835121|
|3|NMVOC|1990|2109.138181|
|4|NH3|1990|232.760876|
|...|...|...|...|
|110|NOx|2018|604.799111|
|111|PM2.5|2018|83.142806|
|112|SO2|2018|118.684955|
|113|NMVOC|2018|526.173425|


## Filter tidied data

As the `Year` values are now stored in a column, it's very easy to filter the dataset by year.
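For example, on a made-up long-format table, the year filter can be written with a literal, with a Python variable referenced via `@` (the `base_year` name is hypothetical), or as a plain boolean mask; all three select the same rows. This assumes `Year` holds numbers rather than text.

```python
import pandas as pd

# Made-up long-format data
df_long = pd.DataFrame({
    "ShortPollName": ["NOx", "SO2", "NOx", "SO2", "NOx", "SO2"],
    "Year": [1995, 1995, 1998, 1998, 1999, 1999],
    "Emissions": [12.0, 25.0, 10.0, 20.0, 9.0, 15.0],
})

base_year = 1998  # hypothetical name for the first year to keep

via_literal = df_long.query(expr="Year >= 1998")
via_variable = df_long.query(expr="Year >= @base_year")
via_mask = df_long[df_long["Year"] >= base_year]

print(via_mask.index.tolist())  # [2, 3, 4, 5]
```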

In [7]:
df_tidied_filtered = (df_tidied
    .query(
        expr='Year >= 1998'
        ))

df_tidied_filtered

| |ShortPollName|Year|Emissions|
|---|---|---|---|
|10|NOx|1998|1646.545024|
|11|PM2.5|1998|121.529144|
|12|SO2|1998|1396.569378|
|13|NMVOC|1998|1405.042632|
|14|NH3|1998|219.459243|
|...|...|...|...|
|110|NOx|2018|604.799111|
|111|PM2.5|2018|83.142806|
|112|SO2|2018|118.684955|
|113|NMVOC|2018|526.173425|


## Add the Index column

Use the `assign` method to derive a new column called `Index` from the `Emissions` column. Each pollutant's emissions are indexed to 1998 (1998 = 100): every value is divided by that pollutant's 1998 value and multiplied by 100.
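As a formula, writing $E_{p,t}$ for the emissions of pollutant $p$ in year $t$ (notation introduced here for illustration):

$$
\text{Index}_{p,t} = 100 \times \frac{E_{p,t}}{E_{p,1998}}
$$

For example, SO2 emissions were 1396.569378 in 1998 and 118.684955 in 2018, so the 2018 index is $100 \times 118.684955 / 1396.569378 \approx 8.498$, which matches the `Index` value for SO2 in 2018 in the output of this step.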

The [`groupby` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html?highlight=groupby#pandas.DataFrame.groupby) splits the rows into one group per pollutant, so that the following methods are applied to each group separately. The `Emissions` [attribute](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html?highlight=attributes#attribute-access) selects the `Emissions` column, so the following method is only called on that column. The [`transform` method](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.SeriesGroupBy.transform.html) calls the specified function on each group's `Emissions` values and returns a result with the same row index as the input, so it can be assigned straight back to the DataFrame as a new column. (In pandas 2.0 and later, `apply` adds the group names to the index of a result like this, so it no longer lines up with the DataFrame's rows.)

The [`lambda` keyword](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) allows us to define an anonymous function.

Calling the [`div` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.div.html) on series `x` with argument `y` -- `x.div(y)` -- divides each item in `x` by `y`.

[`iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.iloc.html) is an indexer property rather than a method, which is why it uses square brackets: `x.iloc[y]` returns the item at integer position `y` in the series `x`. Python uses zero-based indexing, so `x.iloc[0]` returns the first item. Within each group, the first item is the 1998 value, because the rows are ordered by year.

Calling the [`mul` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mul.html) on object `x` with argument `y` -- `x.mul(y)` -- multiplies each item in `x` by `y`.


In [8]:
df_tidy_indexed = (df_tidied_filtered
    .assign(
        Index=(df_tidied_filtered
            .groupby('ShortPollName')
            .Emissions
            # transform keeps the original row index, so the result aligns with the DataFrame
            .transform(lambda x: x.div(x.iloc[0]).mul(100)))
        ))

df_tidy_indexed

| |ShortPollName|Year|Emissions|Index|
|---|---|---|---|---|
|10|NOx|1998|1646.545024|100.000000|
|11|PM2.5|1998|121.529144|100.000000|
|12|SO2|1998|1396.569378|100.000000|
|13|NMVOC|1998|1405.042632|100.000000|
|14|NH3|1998|219.459243|100.000000|
|...|...|...|...|...|
|110|NOx|2018|604.799111|36.731404|
|111|PM2.5|2018|83.142806|68.413883|
|112|SO2|2018|118.684955|8.498321|
|113|NMVOC|2018|526.173425|37.448929|
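The same per-group calculation on a made-up four-row dataset makes the arithmetic easy to check by hand. This sketch uses `transform`, which returns one value per input row aligned on the original index. Like the notebook's calculation, it assumes the rows within each pollutant are in year order, because `x.iloc[0]` takes the first row of each group rather than looking up 1998 explicitly.

```python
import pandas as pd

# Made-up long-format data, ordered so each pollutant's first row is the base year
df = pd.DataFrame({
    "ShortPollName": ["NOx", "SO2", "NOx", "SO2"],
    "Year": [1998, 1998, 1999, 1999],
    "Emissions": [200.0, 40.0, 150.0, 10.0],
})

# Divide each value by the first value in its group, then multiply by 100
df_indexed = df.assign(
    Index=(df
        .groupby("ShortPollName")
        .Emissions
        .transform(lambda x: x.div(x.iloc[0]).mul(100)))
)

print(df_indexed["Index"].tolist())  # [100.0, 100.0, 75.0, 25.0]
```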


## Reset dataframe index

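This step is not strictly necessary. The [`reset_index` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html?highlight=reset_index#pandas.DataFrame.reset_index) replaces the existing row labels (10, 11, 12, ...) with a new sequence starting at 0; `drop=True` discards the old labels instead of adding them to the DataFrame as a new column.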
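A minimal illustration on a made-up two-row DataFrame whose row labels start at 10, as they do after the year filter, showing the difference between the default and `drop=True`:

```python
import pandas as pd

df = pd.DataFrame({"Year": [1998, 1999]}, index=[10, 11])

# Default: the old labels move into a new column called "index"
kept = df.reset_index()

# drop=True: the old labels are discarded
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'Year']
print(dropped.columns.tolist())  # ['Year']
print(dropped.index.tolist())    # [0, 1]
```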

In [9]:
df_processed_step = (df_tidy_indexed
    .reset_index(drop=True))

df_processed_step


| |ShortPollName|Year|Emissions|Index|
|---|---|---|---|---|
|0|NOx|1998|1646.545024|100.000000|
|1|PM2.5|1998|121.529144|100.000000|
|2|SO2|1998|1396.569378|100.000000|
|3|NMVOC|1998|1405.042632|100.000000|
|4|NH3|1998|219.459243|100.000000|
|...|...|...|...|...|
|100|NOx|2018|604.799111|36.731404|
|101|PM2.5|2018|83.142806|68.413883|
|102|SO2|2018|118.684955|8.498321|
|103|NMVOC|2018|526.173425|37.448929|


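## Export processed data

The [`to_csv` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) writes the DataFrame to a CSV file. The `columns` parameter selects which columns to write and in what order, `header` gives them new names in the output (`ShortPollName` becomes `Pollutant` and `Index` becomes `Value`), and `index=False` leaves out the row labels. The `path_or_buf` value is a path on the author's machine, so change it to suit your environment.

In [13]: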
df_processed_step.to_csv(
    path_or_buf="/home/edfawcetttaylor/repos/sdg-data-test/data/indicator_A1.csv",
    columns=["Year", "ShortPollName", "Index"],
    header=["Year", "Pollutant", "Value"],
    index=False
)
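A quick way to check what `to_csv` will write, without creating a file, is to pass `path_or_buf=None`, which makes it return the CSV text as a string. The sketch below uses a made-up two-row DataFrame with the same column names; splitting the text into lines sidesteps platform-specific line endings.

```python
import pandas as pd

df = pd.DataFrame({
    "ShortPollName": ["NOx", "SO2"],
    "Year": [1998, 1998],
    "Emissions": [1646.5, 1396.6],
    "Index": [100.0, 100.0],
})

# path_or_buf=None returns the CSV text instead of writing a file
csv_text = df.to_csv(
    path_or_buf=None,
    columns=["Year", "ShortPollName", "Index"],
    header=["Year", "Pollutant", "Value"],
    index=False
)

print(csv_text.splitlines())
```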

## Create plot

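[`px.line`](https://plotly.com/python-api-reference/generated/plotly.express.line.html) creates a line plot. Its `color_discrete_map` and `line_dash_map` parameters assign a specific colour and dash style to each pollutant.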

[`update_xaxes` and `update_yaxes`](https://plotly.com/python/axes/#set-axis-title-text-with-graph-objects) provide additional options for customising the x and y axes.

[`update_layout`](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Layout.html#plotly.graph_objects.Layout) provides additional options for customising layout elements, such as the legend title and the hover behaviour.

[`add_annotation`](https://plotly.com/python/text-and-annotations/) adds text labels and annotations to plots.

[`add_shape`](https://plotly.com/python/shapes/) adds shapes, including lines, to plots.


In [31]:
(px.line(
    data_frame=df_processed_step,
    x='Year',
    y='Index',
    color='ShortPollName',
    color_discrete_map={
        'NH3': '#00AF41',
        'PM2.5': '#00AF41',
        'NOx': 'Grey',
        'NMVOC': '#007CBA',
        'SO2': '#007CBA'
    },
    line_dash='ShortPollName',
    line_dash_map={
        'NH3': 'solid',
        'PM2.5': 'dash',
        'NOx': 'solid',
        'NMVOC': 'solid',
        'SO2': 'dash'
    },
    title=f'A1 Emissions for five key air pollutants in England, {df_processed_step.Year.min()} to {df_processed_step.Year.max()}',
    labels={
        'ShortPollName': 'Pollutant'
        },
    template='simple_white'
        )
    .update_xaxes(
        title_text='',
        type='linear',
        dtick=2
        )
    .update_yaxes(
        title_text=f'Index({df_processed_step.Year.min()} = 100)',
        type='linear',
        dtick=10
        )
    .update_layout(
        hovermode='x unified',
        legend={
            'title': ''
            }
        )
    .add_annotation(
        showarrow=False,
        text='<b>Source:</b> Ricardo Energy & Environment',
        xref='paper',
        xanchor='left',
        x=-0.15,
        yref='paper',
        yanchor='top',
        y=-0.15
        )
    .add_shape(
        type='line',
        xref='x',
        x1=df_processed_step.Year.max(),
        y1=df_processed_step.Index.max(),
        yref='y',
        x0=df_processed_step.Year.min(),
        y0=df_processed_step.Index.max(),
        line={
            'color': 'Black',
            'dash': 'dot',
            'width': 1
            } 
        ))

# Iterations

## Write a helper function to calculate the index column

### Define index helper function

`def function_name(param)` uses the [`def` keyword](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) to define a function called `function_name` that takes parameter `param`.

`"""Some text"""` is a [docstring](https://www.python.org/dev/peps/pep-0257/#id15); it should contain a concise and useful description of what the function does.


In [11]:
def base_year_index(x):
    """Divide each value in a series by the first value, then multiply it by 100"""
    return x.div(x.iloc[0]).mul(100)

### Test index helper function

Test that, given the series `2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0`, the function `base_year_index` returns the series `100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0, 500.0, 550.0`.


In [12]:
test_input = pd.Series([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0])

test_expected = pd.Series([100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0, 500.0, 550.0])

pd.testing.assert_series_equal(
    left=base_year_index(test_input),
    right=test_expected
)

## Chain the methods together

Performing each transformation as a separate step makes the process very clear, but it involves creating (and naming) a lot of intermediate variables, so I prefer to chain the method calls together.

See [Augspurger, Tom. "Modern Pandas (Part 2): Method Chaining." datas-frame. Web. 26 Nov. 2020.](https://tomaugspurger.github.io/method-chaining.html)
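Inside a method chain, `assign` also accepts a function in place of a ready-made Series: pandas calls it with the DataFrame as it is at that point in the chain and uses the return value as the column. The sketch below uses made-up data (the `d` parameter name is arbitrary) to show this with a `lambda`, so the replacement runs on the filtered rows without referring to any intermediate variable.

```python
import pandas as pd

df = pd.DataFrame({
    "ShortPollName": ["CO", "NOx Total", "SO2 Total"],
    "Emissions": [1.0, 2.0, 3.0],
})

result = (df
    .query(expr='ShortPollName != "CO"')
    # The lambda receives the DataFrame at this point in the chain
    # (the filtered rows), so no intermediate variable is needed
    .assign(
        ShortPollName=lambda d: d.ShortPollName.str.replace(
            pat=" Total",
            repl="",
            regex=False
        )
    )
    .reset_index(drop=True))

print(result.ShortPollName.tolist())  # ['NOx', 'SO2']
```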

In [13]:
df_processed_chain = (df_raw
    .query(
        expr='ShortPollName == ["NH3 Total", "NOx Total", "SO2 Total", "VOC Total", "PM2.5 Total"]'
        )
    .assign(
        # The lambda works on the filtered frame in the chain,
        # not on df_filtered_cols from the step-by-step version
        ShortPollName=lambda x: (x
            .ShortPollName
            .str.replace(
                pat=' Total',
                repl='',
                regex=False
                )
            .str.replace(
                pat='VOC',
                repl='NMVOC',
                regex=False
                ))
        )
    .drop(
        columns=['NFRCode', 'SourceName']
        )
    .melt(
        id_vars='ShortPollName',
        var_name='Year',
        value_name='Emissions'
        )
    .query(
        expr='Year >= 1998'
        )
    .assign(
        # transform returns a result aligned with the DataFrame's rows
        Index=lambda x: (x
            .groupby('ShortPollName')
            .Emissions
            .transform(base_year_index))
        )
    .reset_index(
        drop=True
        ))

df_processed_chain

| |ShortPollName|Year|Emissions|Index|
|---|---|---|---|---|
|0|NOx|1998|1646.545024|100.000000|
|1|PM2.5|1998|121.529144|100.000000|
|2|SO2|1998|1396.569378|100.000000|
|3|NMVOC|1998|1405.042632|100.000000|
|4|NH3|1998|219.459243|100.000000|
|...|...|...|...|...|
|100|NOx|2018|604.799111|36.731404|
|101|PM2.5|2018|83.142806|68.413883|
|102|SO2|2018|118.684955|8.498321|
|103|NMVOC|2018|526.173425|37.448929|


### Check that the step-by-step and chained processes produce the same results

In [14]:
pd.testing.assert_frame_equal(
    left=df_processed_step,
    right=df_processed_chain
)

## Turn the chain into a function

### Define the function

In [15]:
def process_A1(df):
    """Transform A1 raw input data for plotting"""
    return (df
        .query(
            expr='ShortPollName == ["NH3 Total", "NOx Total", "SO2 Total", "VOC Total", "PM2.5 Total"]'
            )
        .assign(
            # Work on the frame passed in, not on a global variable
            ShortPollName=lambda x: (x
                .ShortPollName
                .str.replace(
                    pat=' Total',
                    repl='',
                    regex=False
                    )
                .str.replace(
                    pat='VOC',
                    repl='NMVOC',
                    regex=False
                    ))
            )
        .drop(
            columns=['NFRCode', 'SourceName']
            )
        .melt(
            id_vars='ShortPollName',
            var_name='Year',
            value_name='Emissions'
            )
        .query(
            expr='Year >= 1998'
            )
        .assign(
            Index=lambda x: (x
                .groupby('ShortPollName')
                .Emissions
                .transform(base_year_index))
            )
        .reset_index(
            drop=True
            ))

### Test the function

In [16]:
pd.testing.assert_frame_equal(
    left=df_processed_step,
    right=process_A1(df_raw)
)

### Use the function

In [17]:
df_processed_function = process_A1(df_raw)

df_processed_function

| |ShortPollName|Year|Emissions|Index|
|---|---|---|---|---|
|0|NOx|1998|1646.545024|100.000000|
|1|PM2.5|1998|121.529144|100.000000|
|2|SO2|1998|1396.569378|100.000000|
|3|NMVOC|1998|1405.042632|100.000000|
|4|NH3|1998|219.459243|100.000000|
|...|...|...|...|...|
|100|NOx|2018|604.799111|36.731404|
|101|PM2.5|2018|83.142806|68.413883|
|102|SO2|2018|118.684955|8.498321|
|103|NMVOC|2018|526.173425|37.448929|


## Write a plot creation function

### Define a plot creation function

In [18]:
def plot_A1(df, x, y, var, var_colours, var_line_style, title, labels,
            title_x, title_y, title_legend, source):
    """Return a line chart of `y` against `x`, one line per value of `var`.

    Adds a source note below the plot and a dotted horizontal reference
    line at the maximum value of `y`.
    """
    return (px.line(
        data_frame=df,
        x=x,
        y=y,
        color=var,
        color_discrete_map=var_colours,
        line_dash=var,
        line_dash_map=var_line_style,
        title=title,
        labels=labels,
        template='simple_white'
            )
        .update_xaxes(
            title_text=title_x,
            type='linear',
            dtick=2
            )
        .update_yaxes(
            title_text=title_y,
            type='linear',
            dtick=10
            )
        .update_layout(
            hovermode='x unified',
            legend={
                'title': title_legend
                }
            )
        .add_annotation(
            showarrow=False,
            text=f'<b>Source:</b> {source}',
            xref='paper',
            xanchor='left',
            x=-0.15,
            yref='paper',
            yanchor='top',
            y=-0.15
            )
        .add_shape(
            type='line',
            xref='x',
            # Use the df, x and y arguments rather than a global DataFrame
            x1=df[x].max(),
            y1=df[y].max(),
            yref='y',
            x0=df[x].min(),
            y0=df[y].max(),
            line={
                'color': 'Black',
                'dash': 'dot',
                'width': 1
                }
            ))


### Use plot creation function

In [32]:
plot_A1(
    df=df_processed_chain,
    x='Year',
    y='Index',
    var='ShortPollName',
    var_colours={
        'NH3': '#00AF41',
        'PM2.5': '#00AF41',
        'NOx': 'Grey',
        'NMVOC': '#007CBA',
        'SO2': '#007CBA'
        },
    var_line_style={
        'NH3': 'solid',
        'PM2.5': 'dash',
        'NOx': 'solid',
        'NMVOC': 'solid',
        'SO2': 'dash'
        },
    title=f'A1 Emissions for five key air pollutants in England, {df_processed_chain.Year.min()} to {df_processed_chain.Year.max()}',
    labels={
        'ShortPollName': 'Pollutant'
        },
    title_x='',
    title_y=f'Index({df_processed_chain.Year.min()} = 100)',
    title_legend='', 
    source='Ricardo Energy & Environment'
)