# Import packages

In [None]:
# Allows us to read in CSV files; also used for data manipulation
import pandas as pd

# Used to create regular expressions to match strings
import re

# Modules used to create interactive visualisations 
import plotly.express as px
import plotly.graph_objects as go

# Dataset 4

This dataset contains experimental estimates of sexual identity by gender from 2010 to 2014. It is presented at UK level and broken down by England, Wales, Scotland and Northern Ireland. I originally wanted this guide to demo interactive line graphs with gender identity data, but because this is only the first year the ONS has collected that data, there is no time series to plot yet. Instead, I found a dataset from 2015 containing experimental statistics derived from the Integrated Household Survey. For more info, you can check out this [ONS link](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/datasets/sexualidentitybyagegroupbycountry).

In [None]:
df4 = pd.read_csv('../Data/cleaned_sexuality_df.csv')

In [None]:
# Brief glimpse at underlying data structure

df4.head(50)

## Data cleaning

When I first found this dataset it was very messy and badly formatted, so I cleaned it in a separate Jupyter notebook to avoid cluttering this one and distracting from the main tutorial. If you'd like to see how I cleaned it up, take a look at the ['Data_cleaning_sexuality.ipynb'](Data_cleaning_sexuality.ipynb) notebook.
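Whether you clean the data yourself or start from the cleaned CSV, it can be worth checking each column's type and the distinct labels in the categorical columns before plotting. A filter like == 'England' only matches exact strings, and the type of Year affects how Plotly builds the x-axis. The cleaned CSV isn't reproduced here, so this minimal sketch uses a small stand-in DataFrame with the same column names as the notebook (Country, Year, Gender, Sexuality, Percentage); the labels and numbers are placeholders, not ONS estimates.

```python
import pandas as pd

# Hypothetical stand-in for df4: same column names as the cleaned CSV,
# but the labels and percentages are placeholders, not ONS estimates
toy_df = pd.DataFrame({
    'Country': ['England', 'England', 'Scotland', 'Scotland'],
    'Year': [2010, 2011, 2010, 2011],
    'Gender': ['Men', 'Women', 'Men', 'Women'],
    'Sexuality': ['Bisexual'] * 4,
    'Percentage': [0.4, 0.5, 0.3, 0.6],
})

# Column types: in this stand-in, Year is an integer column, not a datetime
print(toy_df.dtypes)

# Distinct labels per categorical column, to catch typos or stray
# whitespace before filtering on exact strings such as 'England'
for col in ['Country', 'Gender', 'Sexuality']:
    print(col, sorted(toy_df[col].unique()))
```

On the real data, the same two checks (df4.dtypes and the loop over unique values) are a quick way to confirm the labels used in the filter and the plot.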

## Data pre-processing

The only pre-processing we need is to subset the data to England, so it's ready to plot in the next step.
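Subsetting in pandas works by passing a boolean mask (one True/False value per row) inside square brackets. Here is a minimal sketch on placeholder data, assuming the same Country column as df4. It also shows Series.isin for keeping several countries at once, and .copy() so that later edits apply to an independent DataFrame rather than a possible view of the original.

```python
import pandas as pd

# Placeholder data with the notebook's column names (values are made up)
df = pd.DataFrame({
    'Country': ['England', 'Wales', 'Scotland', 'Northern Ireland'],
    'Year': [2010] * 4,
    'Gender': ['Men'] * 4,
    'Sexuality': ['Bisexual'] * 4,
    'Percentage': [0.5, 0.4, 0.3, 0.2],
})

# Single country: the comparison builds a True/False mask, and indexing with it
# keeps only the True rows; .copy() avoids SettingWithCopyWarning on later edits
england_only = df[df['Country'] == 'England'].copy()

# Several countries at once: Series.isin checks membership in a list of values
eng_sco = df[df['Country'].isin(['England', 'Scotland'])].copy()

print(england_only)
print(eng_sco)
```

The isin version is what you would reach for if you wanted more than one country column in the faceted line graph.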

In [None]:
# Filtering the dataset for England only

england_df = df4[df4['Country'] == 'England']

## Interactive line graph

By now you probably know the drill. Just as we had the px.bar and px.scatter functions, there's a corresponding one for line graphs, appropriately named px.line. Most of the parameters are the same as before; the main difference is that we're also using:

* facet_row - when we pass a categorical variable here (Gender), Plotly creates a separate subplot row for each unique value.

* facet_col - when we pass a categorical variable here (Country), Plotly creates a separate subplot column for each unique value.

Since we've filtered to England only, this gives us a 2x1 grid of line graphs (one row per gender, one column for England). If we added another country, e.g. Scotland, and used the same parameters, we'd get a 2x2 grid, and so on.

## Interactive legends

Again, the cool thing about Plotly's legends is that they are interactive by default: clicking a legend entry hides or shows that series, and double-clicking isolates it. This lets us temporarily remove values that dominate the graph, so we can get into the nitty-gritty of the smaller categories.


In [None]:
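# Specify which columns appear in the hover tooltip and how they're formatted

hover_data = {'Sexuality': True,
              'Percentage': ':.2f',  # two decimal places (d3-format)
              'Country': False,      # hidden: every row is England
              'Year': False,         # hidden from the tooltip
              'Gender': True}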

In [None]:
# Faceted line graph: one subplot row per gender, one column per country
fig6 = px.line(england_df,
               x='Year',
               y='Percentage',
               color='Sexuality',
               facet_row='Gender',
               facet_col='Country',
               hover_data=hover_data,
               title='Sexuality Percentages by Gender in England (2010-2014)',
               markers=True,
               height=800,
               width=1000)

# Enhance the layout for readability
fig6.update_layout(title_x=0.15,  # title position as a fraction of the figure width
                   legend_title_text='Sexuality')

fig6.show()

In [None]:
# Finally, let's update our x-axis so that it only shows whole years

# dtick="M12" places a tick every 12 months (a date-axis setting);
# tickformat="%Y" labels each tick with the four-digit year
fig6.update_xaxes(dtick="M12", tickformat="%Y")