# Plotly Demonstration - Interactivity
Plotly is a powerful and versatile data visualization package for Python that allows you to create interactive plots and charts. It is built on top of the JavaScript library Plotly.js, which means that Plotly plots can be easily embedded into web applications. Some of the key benefits of using Plotly include:

1. **Interactive visualizations**: Plotly allows you to create highly interactive and dynamic visualizations that respond to user interactions, such as hovering over data points, clicking on legends, and zooming in and out of plots.

2. **Wide range of chart types**: Plotly supports a wide range of chart types, including scatter plots, line charts, bar charts, pie charts, heatmaps, and more. This makes it easy to create the right type of chart for your data and analysis.

3. **Easy customization**: Plotly allows you to customize nearly every aspect of your charts, including colors, fonts, layout, and annotations. This makes it easy to create charts that fit your specific needs and brand identity.

4. **Multiple output formats**: Plotly can export visualizations as interactive HTML files and as static images (PNG, JPEG, SVG) or PDF documents. Static export relies on the additional `kaleido` package. This makes it easy to share your results with others and embed them in presentations and reports.

5. **Integration with other Python packages**: Plotly can be used in conjunction with other popular Python packages, such as Pandas, NumPy, and Scikit-learn. This allows you to seamlessly integrate data analysis and visualization into your data science workflow.

In this notebook, we explore some of Plotly's key features and show how they can be used to build an interactive, informative data visualization.

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px

### Data

The murders dataset is a data frame included in the R software package that contains information about murders in the United States from 1960 to 2014. The dataset includes variables such as the number of murders per year, the population of the United States, and the rate of murders per 100,000 people. The data is sourced from the Federal Bureau of Investigation's Uniform Crime Reports, which collects data on crime from law enforcement agencies across the country.

This dataset has been used in a number of studies and analyses to examine trends and patterns in murder rates over time, as well as the factors that may contribute to variations in murder rates across different regions and demographics. It is a rich dataset that can be used to explore a wide range of research questions related to crime and public safety.

In this notebook, we will use Python and Plotly to visualize and analyze the murders dataset. We will use a condensed version of the dataset that only includes information about US Gun Murders in 2010.

In [2]:
murders = pd.read_excel('https://kuleuven-mda.s3.eu-central-1.amazonaws.com/murders_data.xlsx')
murders.head()

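| Unnamed: 0 | state | abb | region | population | total |
|---|---|---|---|---|---|
| 0 | Alabama | AL | South | 4779736 | 135 |
| 1 | Alaska | AK | West | 710231 | 19 |
| 2 | Arizona | AZ | West | 6392017 | 232 |
| 3 | Arkansas | AR | South | 2915918 | 93 |
| 4 | California | CA | West | 37253956 | 1257 |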
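
The condensed dataset has no rate column, but the murder rate per 100,000 people mentioned above can be derived from `total` and `population`. The sketch below does this for the five preview rows only; the column name `rate_per_100k` is a hypothetical choice for illustration.

```python
import pandas as pd

# The five preview rows shown above, retyped so the example is self-contained.
preview = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "population": [4779736, 710231, 6392017, 2915918, 37253956],
    "total": [135, 19, 232, 93, 1257],
})

# Gun murders per 100,000 residents (hypothetical column name).
preview["rate_per_100k"] = preview["total"] / preview["population"] * 100_000

# Show the states ordered from highest to lowest rate.
print(preview.sort_values("rate_per_100k", ascending=False))
```

Totals in these rows differ by a factor of more than 60 while the rates stay within the same order of magnitude, which is one reason a log-log plot of total against population is a natural view of this data.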


### Visualization
We will explore the relationship between the total number of gun murders and the population of each US state with a scatter plot: population on the x-axis and total gun murders on the y-axis, both log10-transformed so that states of very different sizes remain readable on the same plot. Each point represents a state, and points are color-coded by the region to which the state belongs (Northeast, Midwest, South, or West).

We will also add a trendline, fitted by linear regression (ordinary least squares) on the log-transformed values, to show the direction and strength of the relationship between the two variables.
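
To make the regression concrete, here is a minimal sketch that fits a straight line in log-log space with NumPy. It uses only the five preview rows, so the coefficients will differ from a fit on the full dataset; the helper `predict_total` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Population and gun-murder totals for the five preview rows.
population = np.array([4779736, 710231, 6392017, 2915918, 37253956])
total = np.array([135, 19, 232, 93, 1257])

log_pop = np.log10(population)
log_total = np.log10(total)

# Least-squares line: log10(total) = slope * log10(population) + intercept
slope, intercept = np.polyfit(log_pop, log_total, 1)

# Pearson correlation of the log-transformed values (strength of the relationship).
r = np.corrcoef(log_pop, log_total)[0, 1]

def predict_total(pop):
    """Back-transform the fitted line: a power law on the original scale."""
    return 10 ** intercept * pop ** slope

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

A slope near 1 in log-log space means that murders scale roughly in proportion to population, which is what a roughly constant per-capita rate implies.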

##### Plotly Express

In [None]:
# Log-transform both variables so small and large states fit on one plot
murders['log_population'] = np.log10(murders['population'])
murders['log_total'] = np.log10(murders['total'])

fig = px.scatter(
    data_frame=murders, 
    x="log_population", 
    y="log_total", 
    trendline="ols",  # ordinary least squares; requires the statsmodels package
    color="region",
    text="abb",
    trendline_scope="overall",
    hover_data=["population", "total"],
    hover_name="state",
    template='plotly_white',
    color_discrete_sequence=["#264653", "#2a9d8f", "#e76f51", "#f4a261", "#e9c46a"])

fig.update_layout(
    title="US Gun Murders in 2010",
    xaxis_title="Population (log10)",
    yaxis_title="Total Number of Murders (log10)",
    legend_title="Region"
)
fig.update_traces(textposition='top left', textfont_size=8)

dc_values = murders[murders.abb == 'DC'].to_dict(orient='records')[0]

fig.add_annotation(x=dc_values['log_population'], y=dc_values['log_total'],
            text="Outlying value",
            showarrow=False,
            yshift=40,
            font=dict(color='#ef233c')
            )
fig.add_shape(type="circle",
    xref="x", yref="y",
    x0=dc_values['log_population'] - 0.08, y0=dc_values['log_total'] - 0.2, 
    x1=dc_values['log_population'] + 0.08, y1=dc_values['log_total'] + 0.2,
    line_color="#ef233c"
)

fig.show()

##### Plotly Graph Objects
With graph objects, we iterate over the unique regions in the murders dataset and add one trace per region, rather than relying on the color parameter of px.scatter. We also use NumPy to fit a single linear trendline to all states and draw it as a mode='lines' trace.

In [4]:
import plotly.graph_objects as go

fig = go.Figure()

for region, color, symbol in zip(murders['region'].unique(), 
                         ["#264653", "#2a9d8f", "#e76f51", "#f4a261"],
                         ['circle', 'square', 'diamond', 'x']):
    region_df = murders[murders['region'] == region]
    fig.add_trace(go.Scatter(
        x=region_df['log_population'],
        y=region_df['log_total'],
        text=region_df['abb'],
        mode='markers+text',
        marker=dict(color=color),
        marker_symbol=symbol,
        name=region
    ))

# Add trendline
slope, intercept = np.polyfit(murders['log_population'], murders['log_total'], 1)
x_range = np.linspace(murders['log_population'].min() + 0.15, murders['log_population'].max() - 0.15, 100)
y_range = slope * x_range + intercept
fig.add_trace(go.Scatter(
    x=x_range,
    y=y_range,
    mode='lines',
    line=dict(color="#e9c46a", dash='dash'),
    name="Trendline"
))

# Add outlier point and circle
dc_values = murders[murders.abb == 'DC'].to_dict(orient='records')[0]

fig.add_annotation(x=dc_values['log_population'], y=dc_values['log_total'],
    text="Outlying value",
    showarrow=False,
    yshift=40,
    font=dict(color='#ef233c')
)
fig.add_shape(type="circle",
    xref="x", yref="y",
    x0=dc_values['log_population'] - 0.07, y0=dc_values['log_total'] - 0.25, 
    x1=dc_values['log_population'] + 0.07, y1=dc_values['log_total'] + 0.25,
    line_color="#ef233c"
)

fig.update_layout(
    title={
        'text': "US Gun Murders in 2010<br><sub>Number of gun murders vs population, by state and region</sub>",
        'x': 0.5
    },
    legend_title="",
    legend_orientation="h",
    legend_x=0.2,
    legend_y=1.05,
    template='simple_white',
    yaxis_range=[0,3.5]
)
fig.update_traces(textposition='top left', textfont_size=8)

fig.update_xaxes(
    tickangle=0,
    dtick=1,
    showgrid=False,
    tickformat='.0f',  # whole numbers; a d3 precision needs a digit after the dot
    title_text="Population (log10)",
    title_font={"size": 15},
    title_standoff=25)
fig.update_yaxes(
    tickangle=-90,
    dtick=1,
    tickformat='.0f',
    title_text="Total Number of Murders (log10)",
    title_font={"size": 15},
    title_standoff=25)

fig.show()


Graph objects offer more flexibility than Plotly Express:
- The trendline is computed by hand, so any fitting algorithm could be used
- The trendline is drawn over a trimmed range rather than the full extent of the data points
- Each region has its own marker symbol
- Axis tick labels are formatted
- Axis tick labels are rotated
- The title is centered
- The legend is horizontal and centered
- The tick spacing is set explicitly
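
As an illustration of the first point, the sketch below swaps the least-squares fit for a Theil-Sen estimator (median of pairwise slopes), which is less sensitive to a single outlying point. This technique is not used in the notebook; it is an assumed alternative, and the function name `theil_sen` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np

def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median residual offset."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs that would divide by zero
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

# Synthetic data on the line y = 2x + 1 with one outlier at x = 3.
x = np.arange(7)
y = 2.0 * x + 1.0
y[3] = 50.0

robust_slope, robust_intercept = theil_sen(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, 1)
print(f"Theil-Sen: slope={robust_slope:.2f}, intercept={robust_intercept:.2f}")
print(f"OLS:       slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
```

In the notebook, the returned slope and intercept could replace the `np.polyfit` result before computing `y_range`, leaving the rest of the trendline trace unchanged.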