<font size="6">Graphs analysing rainfall and accidents in Madrid during 2020</font>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Rainfall-per-day-barplot" data-toc-modified-id="Rainfall-per-day-barplot-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Rainfall per day barplot</a></span></li><li><span><a href="#Accidents-per-day-barplot" data-toc-modified-id="Accidents-per-day-barplot-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Accidents per day barplot</a></span></li><li><span><a href="#Rain-per-day-violin-plot" data-toc-modified-id="Rain-per-day-violin-plot-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Rain per day violin plot</a></span></li><li><span><a href="#Accidents-per-day-violin-plot" data-toc-modified-id="Accidents-per-day-violin-plot-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Accidents per day violin plot</a></span></li><li><span><a href="#Correlation-between-rain-per-day-(mm/m2)-and-number-of-accidents-per-day" data-toc-modified-id="Correlation-between-rain-per-day-(mm/m2)-and-number-of-accidents-per-day-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Correlation between rain per day (mm/m2) and number of accidents per day</a></span></li><li><span><a href="#Correlation-matrix" data-toc-modified-id="Correlation-matrix-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Correlation matrix</a></span></li></ul></div>

```python
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
```

# Rainfall per day barplot

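```python
# Daily data: date ('fecha'), rainfall in mm/m2 ('prec') and number of accidents ('numacc')
dffinal = pd.read_csv('../new_dataframes/finaldf.csv')
dffinal = dffinal[['fecha', 'prec', 'numacc']]
rainbar = px.bar(dffinal, x='fecha', y='prec', height=450)
rainbar.show()
```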

We can see two main periods in which rainfall is more frequent: spring, at its beginning and end (especially the first half of April), and autumn (end of October and beginning of November).
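To check this reading with numbers rather than by eye, the daily series can be aggregated by month. This is a minimal sketch, assuming `fecha` holds parseable dates and `prec` holds daily rainfall in mm/m2, as in `finaldf.csv`. The helper name `monthly_rain_summary` and the toy values are illustrative and do not come from the original analysis.

```python
import pandas as pd


def monthly_rain_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total rainfall (mm/m2) and number of rainy days for each month."""
    month = pd.to_datetime(df["fecha"]).dt.month
    rain = df["prec"].astype(float)
    summary = pd.DataFrame({
        "total_mm": rain.groupby(month).sum(),
        # A day counts as rainy when any precipitation was recorded.
        "rainy_days": (rain > 0).groupby(month).sum(),
    })
    summary.index.name = "month"
    return summary


# Illustrative toy data only, not the real Madrid series.
toy_rain = pd.DataFrame({
    "fecha": ["2020-03-30", "2020-04-02", "2020-04-05", "2020-10-28", "2020-11-02"],
    "prec": [0.0, 12.5, 4.0, 7.2, 3.1],
})
summary = monthly_rain_summary(toy_rain)
print(summary)
```

On the real data the call would be `monthly_rain_summary(dffinal)`. If the reading of the bar plot is right, the months with both a high total and many rainy days would be expected to fall in spring and autumn.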

# Accidents per day barplot

```python
# Daily number of accidents ('numacc') per date ('fecha')
newac = pd.read_csv('../new_dataframes/newac.csv')
accbar = px.bar(newac, x='fecha', y='numacc', height=400)
accbar.show()
```

The first week of the year has fewer accidents. After that, the number of accidents stays fairly uniform until March, when a dramatic drop in accidents per day takes place. This drop coincides with the day the lockdown started in Madrid. The number of accidents remains quite low until May 25th, when it shifts to a clear increasing trend. There are also big drops on August 15th and 16th and on Christmas Day.
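These shifts can be measured by comparing the mean number of accidents per day between the periods they delimit. This is a minimal sketch, assuming `newac` has `fecha` and `numacc` columns. `period_means` is a hypothetical helper. The boundary `2020-03-14` is the date Spain's state of alarm was declared, and it is used here only as an assumed lockdown date.

```python
import pandas as pd


def period_means(df: pd.DataFrame, boundaries: list) -> pd.DataFrame:
    """Mean accidents per day in the periods split by `boundaries`.

    Assumes 'fecha' holds parseable dates and 'numacc' daily accident counts.
    Each period includes its start date and excludes the next boundary.
    """
    dates = pd.to_datetime(df["fecha"])
    edges = ([dates.min()]
             + [pd.Timestamp(b) for b in boundaries]
             + [dates.max() + pd.Timedelta(days=1)])
    rows = []
    for start, end in zip(edges[:-1], edges[1:]):
        mask = (dates >= start) & (dates < end)
        rows.append({
            "start": start.date(),
            "end": (end - pd.Timedelta(days=1)).date(),
            "mean_acc": df.loc[mask, "numacc"].mean(),
        })
    return pd.DataFrame(rows)


# Illustrative toy data only, not the real Madrid series.
toy_acc = pd.DataFrame({
    "fecha": ["2020-03-01", "2020-03-02", "2020-03-20",
              "2020-03-21", "2020-06-01", "2020-06-02"],
    "numacc": [40, 42, 10, 12, 30, 34],
})
# 2020-03-14 is an ASSUMED lockdown start; 2020-05-25 is the date of the rise described in the text.
periods = period_means(toy_acc, ["2020-03-14", "2020-05-25"])
print(periods)
```

Splitting by explicit dates keeps the comparison tied to the events named in the commentary. A rolling mean would smooth the series but would blur exactly where each shift happens.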

# Rain per day violin plot

```python
viorain = go.Figure(data=go.Violin(x=dffinal.prec, box_visible=True, line_color='black',
                                   meanline_visible=True, fillcolor='cornflowerblue', opacity=0.8,
                                   points="all", name='Rain per day in Madrid'))
viorain.show()
```

We can see that the rainfall distribution is concentrated in values close to 0, because 0 is both the median and the most common outcome. Zooming in on the visualization helps us see how the bulk of the data is distributed, leaving the outliers aside.
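Before zooming in, it helps to know where most of the data lies. This is a minimal sketch that assumes a numeric `prec` series. `rain_zoom_stats` is a hypothetical helper. Using the 95th percentile as the upper zoom limit is an arbitrary choice.

```python
import pandas as pd


def rain_zoom_stats(prec, upper_q: float = 0.95) -> dict:
    """Median, share of dry days and a suggested upper limit for zooming."""
    s = pd.Series(prec, dtype=float)
    return {
        "median": float(s.median()),
        "share_dry_days": float((s == 0).mean()),
        # Values above this quantile are left outside the zoomed view only.
        "zoom_max": float(s.quantile(upper_q)),
    }


# Illustrative toy data only: mostly dry days plus a few heavy ones.
stats = rain_zoom_stats([0, 0, 0, 0, 0, 0, 1.0, 2.0, 10.0, 40.0])
print(stats)
```

The resulting limit could then be applied to the existing figure with `viorain.update_xaxes(range=[0, stats["zoom_max"]])`. This changes only the visible range, so the outliers stay in the data.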

# Accidents per day violin plot

```python
vioac = go.Figure(data=go.Violin(x=dffinal.numacc, box_visible=True, line_color='black',
                                 meanline_visible=True, fillcolor='indianred', opacity=0.8,
                                 points="all", name='Accidents per day in Madrid'))
vioac.show()
```

The median number of accidents per day is 41, which is also the densest area, followed by a second concentration around 10.
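The two concentrations can be located by binning the daily counts and looking for local maxima. This is a minimal sketch that assumes bins 5 accidents wide (an arbitrary choice) and uses a hypothetical helper, `accident_density`. The toy values are illustrative only.

```python
import pandas as pd


def accident_density(numacc, width: int = 5):
    """Median, counts per bin and local peaks of daily accident counts."""
    s = pd.Series(numacc, dtype=float)
    bins = (s // width * width).astype(int)  # lower edge of each bin
    counts = bins.value_counts().sort_index()
    # Include empty bins so every bin has well-defined neighbours.
    full = counts.reindex(
        range(int(counts.index.min()), int(counts.index.max()) + width, width),
        fill_value=0,
    )
    values = full.tolist()
    peaks = []
    for i, edge in enumerate(full.index):
        left = values[i - 1] if i > 0 else 0
        right = values[i + 1] if i < len(values) - 1 else 0
        # Strict on the left, non-strict on the right: a plateau counts once.
        if values[i] > left and values[i] >= right:
            peaks.append(int(edge))
    return float(s.median()), full, peaks


# Illustrative toy data only: one low cluster and one high cluster.
median, counts, peaks = accident_density([8, 9, 10, 11, 12, 38, 39, 40, 41, 42, 43, 44])
print(median, peaks)
```

On the real data, `accident_density(dffinal.numacc)` would list the lower edge of each local peak. The result depends on the bin width.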

# Correlation between rain per day (mm/m2) and number of accidents per day

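```python
# 'expanding' draws a cumulative-mean trendline; trendline='ols' (requires statsmodels) would fit a linear regression
scatacrain = px.scatter(dffinal, x="numacc", y="prec", trendline="expanding", width=1000, height=330)
scatacrain.show()
```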

This scatter plot shows no clear relationship between rainfall (Y axis) and the number of accidents per day (X axis). This suggests that the amount of rain was not related to the number of accidents in Madrid during 2020.
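Correlation coefficients can complement this visual check. This is a minimal sketch that assumes the `dffinal` columns `prec` and `numacc`. Spearman's coefficient is computed here as the Pearson correlation of the ranks, so it also detects monotonic but non-linear relationships. The helper name is hypothetical, and the toy values are not results from the Madrid data.

```python
import pandas as pd


def rain_accident_correlation(df: pd.DataFrame) -> dict:
    """Pearson and Spearman correlation between daily rain and accidents."""
    prec = df["prec"].astype(float)
    acc = df["numacc"].astype(float)
    return {
        "pearson": prec.corr(acc),  # linear association
        # Spearman: Pearson correlation of the ranks (ties get average ranks).
        "spearman": prec.rank().corr(acc.rank()),
    }


# Illustrative toy data: a monotonic but non-linear relationship.
toy = pd.DataFrame({"prec": [1, 2, 3, 4], "numacc": [1, 4, 9, 16]})
coefs = rain_accident_correlation(toy)
print(coefs)
```

On the real data, `rain_accident_correlation(dffinal)` returning both coefficients close to 0 would support the visual reading above.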

# Correlation matrix

```python
scatmatr = px.scatter_matrix(dffinal)
scatmatr.show()
```

This scatter matrix (pairwise scatter plots of all the variables) gives a more general view of the relationships in the data. One interesting tendency stands out: the number of accidents dropped in March because of the COVID-19 lockdown.
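`px.scatter_matrix` draws pairwise scatter plots rather than correlation coefficients. The sketch below computes an actual correlation matrix, assuming `fecha` is converted to the day of the year so it can enter a numeric correlation. `correlation_matrix` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of day of year, rainfall and accidents."""
    numeric = df[["prec", "numacc"]].astype(float)
    # Dates cannot enter a correlation directly, so use the day of the year.
    numeric.insert(0, "dayofyear", pd.to_datetime(df["fecha"]).dt.dayofyear)
    return numeric.corr()


# Illustrative toy data only.
toy = pd.DataFrame({
    "fecha": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "prec": [0.0, 1.0, 0.0, 1.0],
    "numacc": [10, 20, 30, 40],
})
matrix = correlation_matrix(toy)
print(matrix.round(2))
```

A linear coefficient between day of year and accidents cannot capture a drop followed by a recovery, such as the March lockdown dip. This is one reason the scatter matrix remains useful alongside the numbers.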