## <u> Visualizing and Aggregating Data in Spreadsheets</u>

### <u>Sparklines</u>
*  Used to visualize trends over time while keeping full access to the underlying data.
*  Give a quick view of the trends, the highs, and the lows.
*  Often used :
    * in financial reports 
    * when the end user also wants access to the data that is building the charts
    * when you need to compare the trend across many different dimensions, such as customers or product lines.
    
#### Creating sparklines : 
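1. Select the cell in which you want to enter the sparkline.
2. Enter `=SPARKLINE(start_cell:end_cell)`, for example `=SPARKLINE(B2:M2)`. (This is the Google Sheets function; in Excel, sparklines are added from the Insert tab instead.)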
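
To make the idea of a sparkline concrete outside a spreadsheet, here is a minimal Python sketch that draws a one-line text sparkline. The helper name `text_sparkline` and the `monthly_sales` numbers are made up for illustration, and this is not how Google Sheets or Excel render sparklines internally.

```python
# Hypothetical helper: render a sequence of numbers as a one-line text sparkline.
BLOCKS = "▁▂▃▄▅▆▇█"

def text_sparkline(values):
    """Map each value to one of eight block characters, scaled between min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no trend; draw every point at mid-height.
        return BLOCKS[len(BLOCKS) // 2] * len(values)
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * last)] for v in values)

monthly_sales = [12, 15, 9, 20, 18, 25]  # made-up data for illustration
print(text_sparkline(monthly_sales))
```

The lowest value always maps to the shortest block and the highest to the tallest, which is why a sparkline shows the shape of a trend but not its absolute scale.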


### <u>Charts</u>
* __graph__ : a diagram of a mathematical function; the term is also used loosely for a diagram of statistical data.
* __chart__ : a graphic representation of data; a line chart is one form.
* __plot__ : the result of plotting statistics as a diagram; some kinds of plot resemble certain chart types.

### <u>Bar or Column Chart (stacked or segmented)</u>
* A chart made from categorical data in which the height of each bar represents the frequency (or relative frequency, i.e. percent) of observations in each category of the variable. Unlike a histogram, the width of the bars carries no meaning.


![bar_chart.png](attachment:bar_chart.png)

* Vertical bars = column chart; horizontal bars = bar chart (the naming used in both Excel and Google Sheets).
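
The frequency and relative frequency that the bar heights encode can be computed by hand. A minimal sketch, assuming a made-up list of categorical observations named `fruit`:

```python
from collections import Counter

# Made-up categorical observations for illustration.
fruit = ["apple", "banana", "apple", "cherry", "banana", "apple", "apple", "cherry"]

counts = Counter(fruit)                                    # frequency of each category
total = sum(counts.values())
relative = {cat: n / total for cat, n in counts.items()}   # share of the whole

# Print a crude horizontal bar per category: the bar length encodes the frequency.
for cat, n in counts.most_common():
    print(f"{cat:<8}{'#' * n} {n} ({relative[cat]:.0%})")
```

Plotting `counts` gives a frequency bar chart; plotting `relative` gives the same shape with a percent axis.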

### <u>Histogram (Relative Frequency Histogram)</u>
* A graph made from continuous data in which the range of the data is divided into intervals called bins. Bars are then constructed above each bin so that the height of each bar represents the frequency or relative frequency of the data in that bin.
     * Unlike a bar chart, the width of the bars is an important characteristic of the graph.
* The goal of a histogram is to show the distribution of the data.
![histograms.png](attachment:histograms.png)
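
The binning step can be sketched in a few lines of Python. The function name `histogram_counts`, the bin width of 10, and the `heights_cm` values are all assumptions chosen for illustration; spreadsheet tools pick bin boundaries with their own rules.

```python
def histogram_counts(data, bin_width, start):
    """Count how many values fall into each half-open bin [left, left + bin_width)."""
    counts = {}
    for x in data:
        index = int((x - start) // bin_width)   # which bin x falls into
        left = start + index * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    # Bins with no data are simply absent from the result.
    return dict(sorted(counts.items()))

heights_cm = [152, 158, 161, 163, 167, 168, 171, 174, 179, 185]  # made-up data
counts = histogram_counts(heights_cm, bin_width=10, start=150)
total = len(heights_cm)
for left, n in counts.items():
    # Frequency as a bar of '#', with the relative frequency alongside.
    print(f"[{left}, {left + 10}): {'#' * n}  {n}  ({n / total:.0%})")
```

Changing `bin_width` changes the picture of the distribution, which is why the bar width matters in a histogram but not in a bar chart.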

### <u>Box Plot or Box and Whisker Plot</u>
* A way to visualize how continuous data is distributed.
* Incorporates the median and the upper and lower quartiles to graphically display the data range.
* Useful for displaying outliers when they are present in the data.


Summary statistics for the data:

| statistic | x   | y    | z    |
| --------- | --- | ---- | ---- |
| min       | 2   | 1    | 1    |
| Q1        | 4   | 4.25 | 4.25 |
| median    | 6   | 6.5  | 7    |
| Q3        | 7.5 | 8    | 7.75 |
| max       | 9   | 9    | 9    |


![boxplot.png](attachment:boxplot.png)
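
The five numbers behind a box plot can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: the `values` list is made up (it is not the x, y, z data in the table), the quartiles use the module's `"inclusive"` interpolation, and outliers are flagged with the common 1.5 × IQR convention. Spreadsheet functions may use a different quartile method and give slightly different results.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartile interpolation."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def iqr_outliers(data):
    """Flag points more than 1.5 * IQR beyond the quartiles (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

values = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data, not the x/y/z columns above
print(five_number_summary(values))     # (2, 4.0, 6.0, 8.0, 30)
print(iqr_outliers(values))            # [30]
```

In a box plot of `values`, the box would span 4 to 8 with the median line at 6, and 30 would be drawn as a separate outlier point.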


### <u> Line Chart</u>
* Used for plotting continuous data.
* Multi-line and area charts are adaptations of the basic line chart.
* A time series chart is usually a line chart, where the x axis is time.

![line_plots.png](attachment:line_plots.png)

### <u> Scatterplot/ Jitterplot</u>
* Used for plotting a large number of points as opposed to an aggregate.
* A scatter plot plots two __continuous variables__ against each other.
* Jitter or swarm plots are useful when many data points fall on the same coordinates or when one of the variables is __categorical__.

![scatterplot.png](attachment:scatterplot.png)
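
Jitter itself is just a small random offset added before plotting. A minimal sketch, where the `jitter` helper, the offset width of 0.2, and the category labels are illustrative assumptions:

```python
import random

def jitter(positions, width=0.2, seed=0):
    """Shift each position by a small random offset in [-width, width]."""
    rng = random.Random(seed)          # seeded so the layout is repeatable
    return [p + rng.uniform(-width, width) for p in positions]

# Made-up example: a categorical variable encoded as integer positions 0, 1, 2.
categories = ["small", "medium", "large"]
observations = ["small", "small", "large", "medium", "small", "large"]
x_positions = [categories.index(c) for c in observations]

x_jittered = jitter(x_positions)
for label, x in zip(observations, x_jittered):
    print(f"{label:<7}{x:+.3f}")
```

The offsets only spread overlapping points apart for display; any statistics should still be computed from the original, unjittered values.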

### <u>Pie Chart</u> 
* Useful for a quick view of how a variable is broken into a limited number of categories (percentage of each category).
* Use sparingly!
* Don't use if context is needed to tell the story, and context is almost always needed.
* Don't use if you have more than a handful of categories.
* Don't use if you need to see minor differences, e.g. if the difference between 45% and 50% is important.


![pie_chart.png](attachment:pie_chart.png)

### <u>Venn Diagram</u>
* Shows all possible logical relations between a finite collection of different sets. 
     * These diagrams depict elements as points in the plane, and sets as regions inside closed curves.
* A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set.


![DataScience_Venn.png](attachment:DataScience_Venn.png)
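
The regions of a two-circle Venn diagram correspond directly to set operations. A small sketch with made-up sets (the names are hypothetical and unrelated to the image above):

```python
# Made-up sets for illustration; each set is one circle of the diagram.
coding = {"ana", "ben", "cho", "dee"}
stats = {"cho", "dee", "eli"}

both = coding & stats          # the overlap of the two circles
only_coding = coding - stats   # inside the first circle only
only_stats = stats - coding    # inside the second circle only
either = coding | stats        # everything inside at least one circle

print(sorted(both), sorted(only_coding), sorted(only_stats))
```

The three regions `both`, `only_coding`, and `only_stats` do not overlap, and together they make up `either`.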

### <u> Pivot Tables</u>
* Use pivot tables when tabulating, aggregating, summarizing, exploring, and analyzing your data.
    * They let you quickly change how your data is summarized.
    
* __<u> How to use Pivot Tables</u>__
    * Data should be organized in columns with headings.
    * Make sure there are no empty columns or rows in your data.
        * (Empty cells within your table are OK, but not a whole row or a whole column of empty cells.)
    * If you have a date column, make sure all the values in that column are dates (or blank). If you have a quantity column, make sure all the values are numbers (or blank) and not words.
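
The checklist above can be expressed as a small validation routine. This is only a sketch: `check_pivot_source`, its parameters, and the sample rows are hypothetical, blank cells are represented as `None`, and rows are numbered from 0.

```python
from datetime import date

def check_pivot_source(headers, rows, date_cols=(), number_cols=()):
    """Return a list of problems that would make the table awkward to pivot."""
    problems = []
    if any(h is None or str(h).strip() == "" for h in headers):
        problems.append("missing column heading")
    for i, row in enumerate(rows):
        if all(cell is None for cell in row):
            problems.append(f"row {i} is entirely empty")
    for j, name in enumerate(headers):
        column = [row[j] for row in rows]
        if all(cell is None for cell in column):
            problems.append(f"column {name!r} is entirely empty")
        if name in date_cols and any(
            c is not None and not isinstance(c, date) for c in column
        ):
            problems.append(f"column {name!r} has non-date values")
        if name in number_cols and any(
            c is not None and not isinstance(c, (int, float)) for c in column
        ):
            problems.append(f"column {name!r} has non-numeric values")
    return problems

headers = ["order_date", "region", "quantity"]
rows = [
    [date(2024, 1, 5), "East", 3],
    [date(2024, 1, 9), None, "three"],   # a blank cell is fine; "three" is not
    [None, None, None],                   # a fully empty row
]
print(check_pivot_source(headers, rows, date_cols={"order_date"}, number_cols={"quantity"}))
```
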
     

* __<u> Create a Blank Pivot Table</u>__
* __Excel__ : Click either a single cell in the data table or the entire table, then click the PivotTable button on the Insert tab.
* __Google Sheets__ : The pivot table option is under the Data menu (in more recent versions of Google Sheets it is under the Insert menu).


* __<u> Design a Pivot Table</u>__
1. You'll see the Pivot Table Builder (Excel)/Pivot Table Editor (Google Sheets) and the field layout area.

2. It should show the column headings from your data table. If not, you may need to check that your whole data table was selected.

3. If you click on any cell in your spreadsheet that is outside the pivot table, the pivot table Field List will disappear. You can make it reappear simply by clicking inside the PivotTable report again.

4. To create the layout of your pivot table, drag and drop each field to the area you want it to be in the field layout area.

5. To change the aggregation method or data type, click the i icon (Excel) or the Summarize by option (Google Sheets) next to the field name in the builder.
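
What the pivot table does when you drop one field into rows, one into columns, and one into values can be sketched in plain Python. The `pivot` function, its parameter names, and the `sales` records are assumptions for illustration, not the spreadsheet's actual implementation.

```python
from collections import defaultdict

def pivot(records, row_field, col_field, value_field, aggregate=sum):
    """Group records by (row_field, col_field) and aggregate value_field."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_field], rec[col_field])].append(rec[value_field])
    table = defaultdict(dict)
    for (row_key, col_key), values in cells.items():
        table[row_key][col_key] = aggregate(values)
    return {k: dict(v) for k, v in table.items()}

sales = [  # made-up records; each dict is one spreadsheet row
    {"region": "East", "product": "pens", "amount": 10},
    {"region": "East", "product": "pens", "amount": 5},
    {"region": "East", "product": "ink", "amount": 7},
    {"region": "West", "product": "pens", "amount": 4},
]
print(pivot(sales, "region", "product", "amount"))                 # summarize by sum
print(pivot(sales, "region", "product", "amount", aggregate=len))  # summarize by count
```

Swapping the `aggregate` argument plays the same role as changing the summarize-by setting in the builder: the grouping stays the same and only the summary changes.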


* __<u>Add your new data to the existing data table:</u>__
1. Click inside your pivot table to see the pivot table tools options.
    * __Excel__ : Click the Refresh button or the Change Data Source button in the Pivot Table Tools tab.
        * Then either change the range or select 'OK' if you just want to refresh the existing range.
    * __Google Sheets__ : Click the "table" icon next to where the current data range is displayed.
        * Then either change the range or select 'OK' if you just want to refresh the existing range.


* __<u>Grouping your data by Date (Excel):</u>__
1. Select a date field (must be a valid Date field) from within the pivot table and click either the Group Selection button or the Group Field button on the toolbar (both will work in this scenario).

2. Or you can right-click on one of the dates in the first column, and select the Group option. You can change the start and end dates at this point so that the grouping only covers a given date range.

3. You can then choose what you want to group your data by. Note that the Months option has been chosen by default.

4. To group by weeks, click Days, enter 7 as the Number of Days value, and click OK.

5. You can remove the grouping at any time by following the steps above and choosing Ungroup instead of Group. The groupings will be removed immediately.

6. Once your data has been grouped, you can collapse or expand individual groups to hide or show the underlying data. Select the field you want to collapse and click the collapse button to the left of the field heading.

7. You can also collapse or expand all of the fields at once. Click on one of the fields, and then click either the Expand or Collapse buttons on the Pivot Table toolbar (remember that you need to click on a cell inside the pivot table and then click the Pivot Table Tools button above the main ribbon toolbar).

8. For more (or faster) control over how you expand or collapse your grouped data, right-click one of the Date fields and choose Expand/Collapse to see the available options for expanding or collapsing the fields in the group.
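
The month and 7-day groupings described above amount to bucketing each date and summing within the bucket. A minimal sketch, assuming made-up `(date, amount)` rows; the function names and the choice to count windows from a `start` date are illustrative, not Excel's internals.

```python
from collections import defaultdict
from datetime import date

def group_by_month(rows):
    """Sum amounts per (year, month)."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)

def group_by_days(rows, start, number_of_days=7):
    """Sum amounts per fixed-length window counted from `start`."""
    totals = defaultdict(int)
    for day, amount in rows:
        window = (day - start).days // number_of_days   # 0 = first window
        totals[window] += amount
    return dict(totals)

orders = [  # made-up (date, amount) rows
    (date(2024, 1, 1), 10),
    (date(2024, 1, 6), 20),
    (date(2024, 1, 8), 5),
    (date(2024, 2, 2), 7),
]
print(group_by_month(orders))                         # {(2024, 1): 35, (2024, 2): 7}
print(group_by_days(orders, start=date(2024, 1, 1)))  # {0: 30, 1: 5, 4: 7}
```

With `number_of_days=7` each window is one week measured from the start date, which mirrors entering 7 as the Number of Days value; weeks with no rows do not appear in the result.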
