# Analytics Report
## Data Visualization
After several days of monitoring, we have over 4,000 records of temperature and humidity readings,
i.e. two time series. To visualize the data and understand it better, we decided to create two
pre-defined graphs: a line chart and a scatter plot with distributions.

The line chart plots both temperature and humidity along the time axis so that we can clearly see
the trend over the past several days (up to 7 days for best quality). Besides the ticks for each day,
the time axis is also subdivided into 8-hour intervals, which lets us see not only how temperature
and humidity change over days but also how they change within a day.
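
The tick layout described above can be sketched with Matplotlib's date locators. This is a minimal illustration, not the project's actual plotting code: the function name `plot_readings`, the output file name, and the synthetic readings are all assumptions made for the example.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend: render straight to a file, no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def plot_readings(times, temperatures, humidities, out_path):
    """Plot both series against time and save the chart as an image file.

    Hypothetical helper for illustration only.
    """
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(times, temperatures, label="Temperature")
    ax.plot(times, humidities, label="Humidity")

    # Major ticks: one per day, at midnight.
    ax.xaxis.set_major_locator(mdates.DayLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
    # Minor ticks at 08:00 and 16:00 split each day into 8-hour intervals.
    ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[8, 16]))
    ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H:%M"))

    ax.set_xlabel("Time")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return ax


# Synthetic stand-in data (made up): one reading every 30 minutes for 3 days.
start = datetime(2024, 1, 1)
times = [start + timedelta(minutes=30 * i) for i in range(3 * 48)]
temperatures = [22 + (i % 48) / 12 for i in range(len(times))]
humidities = [55 - (i % 48) / 8 for i in range(len(times))]
plot_readings(times, temperatures, humidities, "line_chart.png")
```

Selecting the `Agg` backend before importing `pyplot` is one way to produce static image files on a machine without a display.
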

We are also curious to see how temperature and humidity relate to each other, i.e. their linear
relationship, as well as their probability distributions. We therefore decided to use a joint plot:
a scatter plot with a regression line in the middle, and the distribution of each variable along
its corresponding axis. Based on the slope of the regression line, we can tell the relationship
between the two variables: a positive slope means they tend to rise together, and a negative slope
means one falls as the other rises. The distributions show how the values of each variable are spread.
## Library Selection
There are plenty of data visualization libraries for Python. We explored several popular ones:
Matplotlib, Seaborn, ggplot, Bokeh, and Plotly.

Since we are going to use object-oriented code to create the graphs, we want the library to work
easily with local data and to output graph files.

Our data set is small by today's standards, so all of the above-mentioned libraries can handle it.
However, as we want to generate static pictures locally, we don't want libraries that are mostly
for interactive visualizations. In addition, as the processing power of the Pi is limited, we don't
want to install heavy libraries either, as that would take too long or even fail. Hence, Bokeh and
Plotly are ruled out, as they are mostly used for interactive and/or online visualization.

That leaves us three choices: Matplotlib, Seaborn, and ggplot. ggplot is great for its grammar of
graphics, as one can create a plot by adding different components on top of each other and gradually
build out the desired visualization. However, ggplot originates from R, and the Python version is
not a feature-for-feature port; it mostly helps people coming from an R background adjust quickly
to Python. For this reason, we decided to go with the two most popular Python-native libraries,
Matplotlib and Seaborn. Another good reason to use these two is that they fully support pandas data
frames, which is what we'd like to use to hold our data. With pandas, we can easily read data from
SQLite into a data frame with just one line of code. This simplicity played a big role in our
decision-making process.
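
As a rough illustration of the one-line read, `pandas.read_sql_query` accepts a `sqlite3` connection directly. The table name `readings` and its columns are hypothetical, since the actual schema is not described here. The example builds a throwaway in-memory database so that it runs on its own.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, temperature REAL, humidity REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2024-01-01 00:00:00", 22.5, 48.0),
        ("2024-01-01 00:30:00", 22.1, 49.5),
        ("2024-01-01 01:00:00", 21.8, 50.2),
    ],
)

# The one-line read: run a query and get the result back as a data frame.
df = pd.read_sql_query("SELECT * FROM readings", conn, parse_dates=["timestamp"])

conn.close()
print(df.dtypes)
```

Passing `parse_dates` converts the text timestamps into datetime values, which is convenient when the column is later used as a time axis.
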

We used Matplotlib to create the line graph and Seaborn to create the joint plot, i.e. the scatter
plot with distributions. Both libraries provide great customizability, but Seaborn, which is
actually built on top of Matplotlib, provides a much simpler syntax for generating graphs. Not to
mention, the graphs generated by Seaborn are aesthetically better than those generated by Matplotlib.