# CM4125 Data Viz
* Carlos Moreno-Garcia
* Lecturer in Computing / Placements & Electives Coordinator
* School of Computing
* email: c.moreno-garcia@rgu.ac.uk

In [1]:
# This cell changes parameters of the RISE slideshow,
# such as the window width/height, and enables a scroll bar

from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update('livereveal', {
              'width': 1700,
              'height': 800,
              'scroll': True,
})
def hide_code_in_slideshow():   
    from IPython import display
    import binascii
    import os
    uid = binascii.hexlify(os.urandom(8)).decode()    
    html = """<div id="%s"></div>
    <script type="text/javascript">
        $(function(){
            var p = $("#%s");
            if (p.length==0) return;
            while (!p.hasClass("cell")) {
                p=p.parent();
                if (p.prop("tagName") == "BODY") return;
            }
            var cell = p;
            cell.find(".input").addClass("hide-in-slideshow")
        });
    </script>""" % (uid, uid)
    display.display_html(html, raw=True)

## Module aims and objectives

1) Critically appraise a variety of data visualisation techniques in terms of psychology, design, effectiveness and appeal to a wider audience.

2) Interpret data visualisations and explain what conclusions can be drawn from them in terms of data analysis.

3) Understand the challenges of visualising large datasets.

4) Translate numerical and categorical data into coherent pictorial representations.

5) Create novel and interactive data visualisations which lucidly exhibit particular dataset features using publicly available data.

## How the module will work

### What **is** this module about?

* How to get proper data from various sources

* How to manipulate that data to keep only what we are interested in

* How to put that data into meaningful, interesting and interactive pictorial representations
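
As a rough illustration of these three steps, here is a minimal sketch that uses only the Python standard library. The CSV text, the column names (`city`, `year`, `visitors`) and the numbers in it are made up for the example; in practice the data would come from a file or a public repository, and the chart would be drawn with a proper plotting package.

```python
import csv
import io

# Step 1: get the data. This inline CSV is hypothetical sample data;
# normally you would open a file or download it from a repository.
RAW_CSV = """city,year,visitors
Aberdeen,2019,120
Aberdeen,2020,45
Dundee,2019,80
Dundee,2020,30
Inverness,2019,60
Inverness,2020,25
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def visitors_in_year(rows, year):
    """Step 2: keep only the rows for one year, as {city: visitors}."""
    return {r["city"]: int(r["visitors"]) for r in rows if r["year"] == str(year)}


def text_bar_chart(values, scale=10):
    """Step 3: draw one bar per category, one '#' per `scale` units."""
    width = max(len(name) for name in values)
    lines = []
    for name, value in values.items():
        bar = "#" * (value // scale)
        lines.append(f"{name.ljust(width)} | {bar} {value}")
    return "\n".join(lines)


rows = load_rows(RAW_CSV)
chart = text_bar_chart(visitors_in_year(rows, 2019))
print(chart)
```

The text-based bars are only a stand-in: the point is the order of the steps (load, filter, draw), which stays the same whichever tool produces the final picture.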

### What is this module **not** about?

* Computer vision

* Predicting or learning from data (i.e. machine learning)

* Flashy images and colours (although this sometimes helps!)

* Only bars and pies!

![Fig. 1](https://www.dropbox.com/s/3nn2mb45tdfsml7/fig1.jpg?raw=1)

### Online teaching

* Lectures will run for **three hours** every Friday, from 2 to 5 pm

* The first hour is theory, followed by two hours of lab

* All sessions will start in BBC (link enabled in the corresponding week)
    * We may move to other platforms (e.g. Zoom) if there are technical issues or we need breakout rooms
    * If so, I will provide the link on Moodle

### Assessment

* You will have **two courseworks**, each worth 50% of your mark
    * In CW1, you will be assessing an existing data repository and dashboard
    * In CW2, you will create your own

### Resources

* In the accordion on Moodle you will see a *Resources* tab

* Keep an eye on it as we will be adding content constantly

* You are also welcome to suggest your own content!
    * If the list grows a lot and there is engagement, I may enable a Trello/Miro board for all of us to collaborate

### Support/Assessments

* The *Support* tab in the accordion has all the information you need

## Introduction to Data Viz

### What comes to your mind when you hear data visualisation?

* Go to [menti.com](http://www.menti.com), use the code **68 84 90 2** and vote!

### Which data viz tools do you use?

* Go to [menti.com](http://www.menti.com), use the code **68 84 90 2** and vote!

### For those of you who chose Excel...

* Don't get me wrong, Excel is great!

* However, it is **not** a proper tool for data visualisation!

* It has a finite number of rows and columns, so by definition it is not suitable for big data!
![Fig. 2](https://www.dropbox.com/s/im11qjm66nia8us/fig2.jpg?raw=1)

* Excel tends to do weird things with data types
![Fig. 3](https://www.dropbox.com/s/lpk5d0g8v99nvmz/fig3.jpg?raw=1)

* Excel assumes formats as you type, and autocorrection creates chaos

In [4]:
hide_code_in_slideshow()
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/yb2zkxHDfUE?start=379" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')


* Spreadsheets are very interactive, but error-prone (costing millions in both hours and money)

* A researcher obtained 15k spreadsheets from Enron and found that 42% didn't contain a single formula/calculation!

* Of the remaining 9k, 24% contained "obvious" errors!

* There's a thing called the European Spreadsheet Risk Interest Group (EuSpRiG), where they have [Excel horror stories](http://www.eusprig.org/horror-stories.htm)!
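
To contrast with the automatic type guessing mentioned above, the sketch below reads a small CSV with Python's `csv` module, which keeps every field as text until you convert it yourself. The sample values (identifiers that look like dates, an ID with leading zeros, a very long number) and the column names are invented for illustration; they stand for the kinds of values that spreadsheets tend to reinterpret.

```python
import csv
import io

# Hypothetical sample data: values a spreadsheet might silently reinterpret.
RAW_CSV = """label,postcode_id,account
MARCH1,00123,1234567890123456789
SEPT2,00045,42
"""


def read_as_text(text):
    """Read CSV rows without any automatic type conversion."""
    return list(csv.DictReader(io.StringIO(text)))


def convert(row):
    """Convert only the columns we explicitly choose to convert."""
    return {
        "label": row["label"],              # stays text, never becomes a date
        "postcode_id": row["postcode_id"],  # stays text, leading zeros are kept
        "account": int(row["account"]),     # Python ints have arbitrary precision
    }


records = [convert(r) for r in read_as_text(RAW_CSV)]
for rec in records:
    print(rec)
```

The design choice here is that every conversion is written down in `convert`, so nothing changes type unless the code says so.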

## My top 4

### Python

* Very high-level programming language

* Provides complementary tools/packages such as `matplotlib`, `plotly`/`dash` and `streamlit` that help you visualise data better

* Top-ranked programming language according to the IEEE

![Fig. 4](https://www.dropbox.com/s/wscvktp8xx72clh/fig4.jpg?raw=1)

* In fact, this slideshow was done using it!

In [2]:
1+4

5
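
As a small taste of what `matplotlib` looks like in practice, here is a minimal sketch of a bar chart. It assumes `matplotlib` is installed (it comes with Anaconda). The vote counts are placeholder numbers, not real poll results, and the output file name `tools.png` is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Placeholder numbers for illustration only, not real poll results.
tools = ["Python", "R", "Tableau", "Power BI"]
votes = [12, 5, 7, 9]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(tools, votes, color="steelblue")  # one bar per tool
ax.set_title("Which data viz tools do you use? (placeholder data)")
ax.set_ylabel("Votes")
fig.tight_layout()
fig.savefig("tools.png", dpi=150)  # write the chart to an image file
```

Inside a notebook you can usually drop the `matplotlib.use("Agg")` line, and the chart will be displayed below the cell instead of only being saved to disk.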

### R

* Even higher-level programming language

* Widely used in statistics

* Also contains numerous packages (`lattice`, `ggplot2`, etc.) to do data viz

* Advantage: R gets you [better salaries than Python](https://insights.stackoverflow.com/survey/2019#technology-_-what-languages-are-associated-with-the-highest-salaries-worldwide)

* Disadvantage: Exists within its own [bubble](https://cdn.sstatic.net/insights/Img/Survey/2019/tech_network-1.svg?v=017e35626eaf)

### Tableau

* A commercial tool for business analytics to create dashboards

* Easy to use

* Connects to data from different sources and can import Python/R code

* [Demo](https://www.tableau.com/en-gb/products/desktop#video)

### Power BI

* Microsoft's response to Tableau!

* Easi(er) to use

* Connects to existing projects in `PowerApps`, `Excel` or `Azure`

* Demo

In [5]:
hide_code_in_slideshow()
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/yKTSLffVGbk" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

## How to get into data viz?

### 1) Get familiar with data repositories

* Data is literally everywhere!

* Sites such as [Kaggle](https://www.kaggle.com/) offer challenges every day!

* If you are familiar with Python/R, you will know that they (and their packages) also ship with sample datasets

### 2) Watch other people have a go at it!

* Sites such as [fivethirtyeight.com](https://fivethirtyeight.com/) have lots of examples

* There are also social media groups
    * [Information is beautiful](https://www.facebook.com/informationisbeautiful/)
    * I f\*cking love maps
        * [FB](https://www.facebook.com/IFLOVEMAPS/)
        * [TW](https://twitter.com/ifckinglovemaps?lang=en)
        * [IG](https://www.instagram.com/ifuckinglovemaps/?hl=en)

### 3) Try to think how you will use this in your other modules!

## "Lab Activity" 1

* Get familiar with the Moodle site, ask any necessary questions!

* I **highly** recommend installing [Anaconda](https://www.anaconda.com/)
    * This way you will have at least two of the tools discussed
    * Plus, you need Python for the first part of the coursework, so you had better get familiar with it!

* As students you get Tableau and Power BI for free, so don't hesitate to give them a try as well!

* If you don't feel particularly confident using Python, I recommend that you start by reading the Coursework Part 1 instructions
    * They contain a comprehensive guide to installing the data viz tool you will assess, so once you are familiar with Python there should be no issue running it!