# Getting the Most Out of Plotly: A Worked Example

Hello! This workbook is intended to teach you how to use Plotly. It is not an exhaustive guide; rather, it aims to teach you how to make basic, beautiful plots with whatever data you have at hand. We use data from UK Research and Innovation (UKRI), which can be found here: https://gtr.ukri.org/. I have filtered for active projects funded by the Horizon Europe Guarantee and downloaded the result as a CSV. For those new to Plotly, an extensive online guide is available: https://plotly.com/. Let's get started!

### What You Need to Run This Notebook - Optional Downloads

In [None]:
# Pandas install
#pip install pandas==1.5.3

# Plotly install
#pip install plotly==5.18.0

### Import Modules

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

### What Does This Data Show?

Let's first read in our data and take a look!

In [None]:
data = pd.read_csv("Horizon_Europe_Example.csv")

In [None]:
# Use shape to find the number of rows and columns
data.shape

So we have 3593 rows with 25 columns! Not all of these columns are terribly helpful - let's look at them in more detail.

In [None]:
# Use columns to find all the column names
data.columns

<p>Let's go over in more detail what each of these columns is and whether or not we will find it useful for our exercise...</p>

<p><strong>FundingOrgName</strong>: the funding organisation (this will be 'Horizon Europe Guarantee' as that is what I downloaded!)<br>

<strong>ProjectReference</strong>: the reference (unique identifier) for the project (not terribly useful unless we want to pull out a specific project)<br>

<strong>LeadROName</strong>: this will be the lead organisation name for the project (potentially interesting)<br>

<strong>Department</strong>: the department in the lead organisation (if applicable) (may be potentially interesting for lead organisations which have departments)<br>

<strong>PISurname</strong>: the surname of the Principal Investigator (PI) in charge of delivering the project (potentially interesting)<br>

<strong>PIFirstName</strong>: the first name of the Principal Investigator (PI) in charge of delivering the project (potentially interesting)<br>

<strong>PIOtherNames</strong>: other names of the Principal Investigator (PI) in charge of delivering the project (if applicable) (unlikely to be interesting)<br>

<strong>PI ORCID iD</strong>: the ORCID ID (https://orcid.org/) of the Principal Investigator (PI) in charge of delivering the project (might be interesting for further work)<br>

<strong>StudentSurname</strong>: the surname of the student attached to the project (if applicable) (unlikely to be interesting unless we look at types of funding which specifically cover students)<br>

<strong>StudentFirstName</strong>: the first name of the student attached to the project (if applicable) (unlikely to be interesting unless we look at types of funding which specifically cover students)<br>

<strong>Student ORCID iD</strong>: the ORCID ID (https://orcid.org/) of the student attached to the project (if applicable) (unlikely to be interesting unless we look at types of funding which specifically cover students)<br>

<strong>Title</strong>: the title of the project (might be useful for language processing exercise!)<br>

<strong>StartDate</strong>: when the funding started for the project (potentially interesting)<br>

<strong>EndDate</strong>: when the funding ended for the project (not useful as I downloaded actively funded projects!)<br>

<strong>AwardPounds</strong>: the amount of funding awarded for the project (definitely interested in this!)<br>

<strong>ExpenditurePounds</strong>: how much money was spent delivering the project (not useful as these are active projects!)<br>

<strong>Region</strong>: in which region the project is based (should be tied to where the lead organisation is based)<br>

<strong>Status</strong>: active or closed (not useful as they should all be active!)<br>

<strong>GTRProjectUrl</strong>: the URL for the project (might be useful if you wish to do a bit of web-scraping)<br>

<strong>FundingOrgID</strong>: the unique identifier for the funding organisation (should all be the same as they are all funded by the 'Horizon Europe Guarantee'!)<br>

<strong>LeadROId</strong>: the unique identifier to the lead organisation (might be useful if you don't want to use the name)<br>

<strong>PIId</strong>: the unique identifier for the Principal Investigator (PI) in charge of delivering the project (might be useful if you don't want to use the name, or there might be multiple people with the same name)</p>

In [None]:
# Let's see a quick glimpse of all the columns using info
data.info()

We can see that 'PIOtherNames', 'StudentSurname', 'StudentFirstName', 'StudentOtherNames', 'Student ORCID iD', and 'ExpenditurePounds' are all full of nulls! These definitely won't be useful. This is expected, given what the columns refer to (see the descriptions above).
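If you'd like to put numbers on those nulls rather than reading them off `info()`, something like the sketch below may help. It runs on a small made-up frame (the values are hypothetical, not taken from the UKRI download), counts the missing values per column, and drops only the columns that are empty from top to bottom.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real CSV
toy = pd.DataFrame({
    "LeadROName": ["Org A", "Org B", "Org C"],
    "AwardPounds": [100_000, 250_000, 0],
    "StudentSurname": [None, None, None],   # entirely empty
    "Department": ["Physics", None, None],  # partly empty
})

# Count missing values per column
null_counts = toy.isna().sum()
print(null_counts)

# Drop only the columns in which every value is missing
cleaned = toy.dropna(axis="columns", how="all")
print(cleaned.columns.tolist())
```

On the real data the same two calls would be applied to `data` instead of `toy`.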

In [None]:
# Let's take a peek at the only column with numeric data - AwardPounds - using describe
data['AwardPounds'].describe()

We have a mean award amount of £525,558.3 with a minimum award amount of £0 and a maximum award amount of £1,601,813. The standard deviation is £724,219.3. Quite a spread!
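When the standard deviation is larger than the mean, as it is here, a few very large awards are probably pulling the mean upwards, so it can be worth reading the median alongside it. The sketch below shows where to find both in the output of `describe()`; the award amounts are invented for illustration and are not from the UKRI data.

```python
import pandas as pd

# Hypothetical award amounts in pounds, not taken from the UKRI data
awards = pd.Series([0, 100, 200, 300, 10_000], name="AwardPounds")

stats = awards.describe()

# The "50%" row of describe() is the median
print(f"mean:   £{stats['mean']:,.0f}")
print(f"median: £{stats['50%']:,.0f}")
```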

### Let's Get Plotting!

I now want to plot the awarded amount by project category. A boxplot is a natural choice to see the spread of the data we just used pandas to describe.

In [None]:
# Plot project category by award amount
fig = px.box(data,x="ProjectCategory",y="AwardPounds")
fig.show()

We can immediately see that there are three project categories: EU-Funded, Fellowship, and Research Grant. While the data looks fine, the plot looks awful! You definitely wouldn't want to show this to any stakeholders. Let's customise our plot a bit more: we'll change the x- and y-axis labels to something friendlier, add a title, and make the background a little nicer.

> <strong>Pro-tip</strong>, use Plotly's colour palettes to make your plot shine!<br>
https://plotly.com/python/discrete-color/

In [None]:
# Check colour palette - useful for deciding on your colours
# Here I have gone for the Pastel palette
print(px.colors.qualitative.Pastel)

In [None]:
# Plot project category by award amount - customised for stakeholder presentation
fig = px.box(data,x="ProjectCategory",y="AwardPounds",
            hover_data=["Region"],color="ProjectCategory",
            color_discrete_map= {"EU-Funded":"rgb(102, 197, 204)","Fellowship":"rgb(246, 207, 113)",
                                 "Research Grant":"rgb(248, 156, 116)"}) # add our colours!
# Add a title and make the background white
fig.update_layout(title_text="EU-Funded projects take in the majority of the funding",font=dict(size=14),
                 paper_bgcolor='white',
                 plot_bgcolor='white',
                 title_x=0.1,
                 showlegend=False)
# Add in an x-axis label
fig.update_xaxes(title="Project Category",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=False
)
# Add in a y-axis label
fig.update_yaxes(title="Awarded Amount (£)",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=True
)
fig.show()

As you can see from the plot, EU-Funded projects take in the majority of the funding (this shouldn't be surprising given the nature of the funding award!). Our plot is now a nice background colour, and I've added custom category colours to make it stand out.

You may have noticed that I've added hover data to show Region. While using hover for exploring data is nice, we can go one step further and make it more obvious what the data is showing using our color option...

In [None]:
# Plot project category by award amount - what does splitting by region show?
fig = px.box(data,x="ProjectCategory",y="AwardPounds",
            hover_data=["Region"],color="Region")
# Add a title and make the background white
fig.update_layout(title_text="EU-Funded projects take in the majority of the funding",font=dict(size=14),
                 paper_bgcolor='white',
                 plot_bgcolor='white',
                 title_x=0.1)
# Add in an x-axis label
fig.update_xaxes(title="Project Category",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=False
)
# Add in a y-axis label
fig.update_yaxes(title="Awarded Amount (£)",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=True
)
fig.show()

Yikes, that's a lot of regions! Having 11 categories makes it hard for people to interpret this plot easily. Let's make it simpler and combine regions.

> <strong>Pro-tip</strong>, if you want to split your data by a specific category don't overwhelm the viewer with too many options. Reduce and combine when you can!

#### Applying a Broad Region to Our Data

In [None]:
# Apply broader region criteria using a lookup table
BROAD_REGIONS = {
    "London": "London",
    "East Midlands": "Midlands",
    "West Midlands": "Midlands",
    "North West": "North",
    "North East": "North",
    "Yorkshire and The Humber": "North",
    "South East": "South",
    "South West": "South",
    "East of England": "East",
    "Scotland": "Scotland, Wales & Northern Ireland",
    "Wales": "Scotland, Wales & Northern Ireland",
    "Northern Ireland": "Scotland, Wales & Northern Ireland",
}

def region(string):
    # Anything not listed above (including missing values) becomes "Unknown"
    return BROAD_REGIONS.get(string, "Unknown")

In [None]:
# Make a new column with our broad region
data["Broad Region"] = data["Region"].apply(region)
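Before plotting, it may be worth a quick check of how many rows landed in each broad region, and which original values ended up as "Unknown". The sketch below does this on a small invented frame that already has both columns; on the real data you would use `data` instead of `toy`.

```python
import pandas as pd

# Hypothetical rows with the original and the broad region side by side
toy = pd.DataFrame({
    "Region": ["London", "North East", "Wales", None],
    "Broad Region": ["London", "North",
                     "Scotland, Wales & Northern Ireland", "Unknown"],
})

# How many rows fall into each broad region?
counts = toy["Broad Region"].value_counts()
print(counts)

# Which original Region values were mapped to "Unknown"?
unknown_rows = toy.loc[toy["Broad Region"] == "Unknown", "Region"]
print(unknown_rows.value_counts(dropna=False))
```

If a real region name shows up in the second printout, it is probably missing from the lookup or spelt differently in the data.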

In [None]:
# Plot project category by award amount, coloured by broad region
fig = px.box(data,x="ProjectCategory",y="AwardPounds",color="Broad Region",
                 hover_data=["LeadROName","PISurname","PIFirstName"],
            color_discrete_map= {"London":"rgb(102, 197, 204)","Midlands":"rgb(246, 207, 113)",
                                 "North":"rgb(248, 156, 116)",
                                 "Scotland, Wales & Northern Ireland":"rgb(220, 176, 242)",
                                "South":"rgb(135, 197, 95)","East":"rgb(158, 185, 243)"}, # add colour!
             # Let's add a nice order to these categories
            category_orders={"Broad Region":["London","North","South","East","Midlands",
                                                "Scotland, Wales & Northern Ireland","Unknown"],
                             # Here we want to make EU-Funded stand out a bit more by placing it in the middle
                            "ProjectCategory":["Research Grant","EU-Funded","Fellowship"]},
            # add in figure dimensions so it's not too squished! 
            width=900, height=500) 
# Describe your data in the title - can you see the change from the last plot?
fig.update_layout(title_text="Regions are equally represented among the different funding types",font=dict(size=14),
                 paper_bgcolor='white',
                 plot_bgcolor='white',
                 title_x=0.03)
# Our x-axis
fig.update_xaxes(title="Project Category",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=False
)
# Our y-axis
fig.update_yaxes(title="Awarded Amount (£)",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=True
)
fig.show()

> <strong>Pro-tip</strong>, don't forget to save your figure!

In [None]:
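# Save the figure in the images folder (the folder must already exist;
# static image export relies on the kaleido package being installed)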
fig.write_image("images/horizon_funding_byregion.png",scale=3)

As you can see, having fewer regions makes this figure much easier to interpret! I have also placed the EU-Funded category in the middle of the plot so that it stands out more. You may have also noticed that my title has changed - by displaying the data by region we can clearly see that no single region gets a lot more funding than the others. I have included some extra hover data, which is always fun to play around with. Hovering over the points suggests that companies get a big chunk of funding compared to universities. We are going to explore this in our next plot!

### Let's Explore the Top-Funded Organisations!

In [None]:
# First we need to get the relevant data - I've chosen awards of at least £4 million!
top_funders = data.loc[data["AwardPounds"] >= 4000000]

In [None]:
# We can see from shape that there are 11 awards of at least £4 million in the
# EU-Funded category
top_funders.shape

In [None]:
# Let's sort the data for our next plot!
top_funders_sorted = top_funders.sort_values(by=["AwardPounds"],ascending=False)

In [None]:
# Plot the 11 largest awards by organisation (awards to the same organisation are stacked)
fig2 = px.bar(top_funders_sorted,x="AwardPounds",y="LeadROName", text_auto='.3s',
             color_discrete_sequence=[ 'rgb(179,205,227)'])
# Add a title and make the background white
fig2.update_layout(title_text="The Top 10 Organisations Take in Almost £90 Million Combined",font=dict(size=14),
                 paper_bgcolor='rgba(0,0,0,0)',
                 plot_bgcolor='rgba(0,0,0,0)',
                 title_x=0.5, # adjust this value to make your title look centred
                  barmode='stack',
                  yaxis={'categoryorder':'total ascending'})
# Add x-axis title
fig2.update_xaxes(title="Awarded Amount (£)",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=False
)
# Add y-axis title
fig2.update_yaxes(title="Organisation",
    showline=True,
    linecolor='black',
    gridcolor='lightgrey',
    showgrid=True)               
fig2.show()

Wow! You can see that Rolls-Royce actually got two awards: one for £12.4 million and another for £12.3 million! The University of Nottingham isn't doing too badly either at £4.14 million. I've plotted this bar chart with the highest amount on top - a chart like this is easiest to read when the bars are sorted, either from smallest to largest or from largest to smallest. Again, make your title snazzy and to the point.
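Because one organisation can hold more than one award, the number of rows and the number of organisations need not match. If you want to check the figures behind a title like this one, a groupby is a quick way to do it. The sketch below uses invented organisations and amounts, not the real awards.

```python
import pandas as pd

# Hypothetical awards; organisation names and amounts are made up
toy = pd.DataFrame({
    "LeadROName": ["Org A", "Org B", "Org A", "Org C"],
    "AwardPounds": [12_000_000, 5_000_000, 11_000_000, 4_500_000],
})

# Rows are awards, so count the distinct organisations separately
n_awards = toy.shape[0]
n_orgs = toy["LeadROName"].nunique()
print(n_awards, "awards from", n_orgs, "organisations")

# Total award income per organisation, largest first
totals = (toy.groupby("LeadROName")["AwardPounds"]
             .sum()
             .sort_values(ascending=False))
print(totals)
print(f"Combined: £{int(totals.sum()):,}")
```

Running the same steps on `top_funders` would give the counts and combined total for the real data.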

I hope you've learnt how to make the most of Plotly with this data. Feel free to play around below with your own custom and stylised plots.

> <strong>Pro-tip</strong>, remember that practice goes a long way!

In [None]:
# add your own plots below...