# Data Analysis and Visualization with Microsoft Excel

![](https://i.imgur.com/8lt9kyH.png)

The following topics are covered in this tutorial:

- Introduction to Spreadsheets and Microsoft Excel
- A quick tour of Excel's user interface & functionality
- Using formulas, arithmetic operations and functions in Excel 
- Data analysis and visualization of real-world data with Excel
- Data validation, conditional formatting and merging cells
- Using cloud-based spreadsheets for collaboration 

**NOTE**: Download and open up this Excel Workbook to follow along with this tutorial: https://www.dropbox.com/s/25mqecdwch6y556/ExcelDataAnalysisWorkbook.xlsx?dl=1

## Spreadsheets

A spreadsheet is a computer application for organization, analysis, and storage of data in tabular form. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. ([Wikipedia](https://en.wikipedia.org/wiki/Spreadsheet))

<img src="http://depictdatastudio.com/wp-content/uploads/2019/11/formulas_statistical-standard-deviation.gif" width="480">

[Microsoft Excel](https://www.microsoft.com/en-in/microsoft-365/excel) is one of the most popular spreadsheet software applications. It is commonly used for:

- Data Entry and Storage
- Performing Calculations
- Data Analysis and Interpretation
- Reporting and Visualizations
- Accounting and Budgeting
- Collection and Verification of Business Data
- Calendars and Schedules
- Administrative and Managerial Duties
- Time Series Forecasting
- Automating Repetitive Tasks

## Downloading and Installing Excel

Microsoft Excel is available as an application for desktop computers and mobile phones. It is a part of the Microsoft 365 family of products. You can sign up for a free trial and download Microsoft Excel here: https://www.microsoft.com/en/microsoft-365/excel

<img src="https://i.imgur.com/sLpt15D.png" width="480" />

**NOTE**: If you do not wish to buy MS Excel and are unable to start a trial, you can use Google Sheets ( https://sheets.google.com ). It's free, and you'll be able to replicate most of this tutorial with minor changes.

## Excel's User Interface

Once you open up a new file (called a workbook) in Excel, you will be presented with the following user interface:

<img src="https://i.imgur.com/YluJoBM.png" width="640" style="border-radius:4px">




The interface has five main parts: Title Bar, Ribbon, Formula Bar, Sheet and Bottom Bar.

### Title Bar

It shows the workbook name, some quick actions and window controls.

<img src="https://i.imgur.com/fxc4WbQ.png" width="640">

### Ribbon

It contains tabs (menus) with action buttons under each tab. Clicking on buttons may also open dropdowns or dialog boxes. Sometimes additional contextual tabs show up based on the currently selected element on the sheet. 

<img src="https://i.imgur.com/3EDeZWA.png" width="640">

### Formula Bar

It's where you can type and view formulas (mathematical operations and functions).

<img src="https://i.imgur.com/Ty4U1mD.png" width="640">

### Sheet

It is a grid of cells where you can enter data and perform calculations. 

<img src="https://i.imgur.com/XuyMTBQ.png" width="640">

### Bottom Bar

A workbook can have multiple sheets. You can change sheets using the bottom bar. It also contains controls for layout and Zoom.

<img src="https://i.imgur.com/hYSISRN.png" width="640">



> **EXERCISE**: Explore all the tabs in the Ribbon, and click on each action button to see what happens. If you can't figure out what a button does, look for a tutorial or an explainer video online.

## Workbooks, Sheets and Cells

<img src="https://i.imgur.com/bC3q1uR.png" width="640">

#### Workbooks

- Excel files are called workbooks and have the extension ".xlsx". 
- A workbook contains one or more sheets (grids of cells)
- Workbooks can be created using the "File" > "New" menu option

#### Sheets

- A sheet is a grid of cells within a workbook for data entry and computation
- Sheets can be created, selected, renamed and deleted from the bottom bar.
- Rows are labeled using numbers and columns using letters.


#### Cells

- A cell is a box for storing a single piece of information e.g. a number, a string
- A cell is indicated uniquely using the row and the column e.g. A6
- The value in a cell can be entered manually or computed using a formula
- You can select multiple cells by clicking and dragging and perform operations on all cells at once.

#### Rows and Columns

- You can select all the cells in a row by clicking on the row number.
- You can select all the cells in a column by clicking on the column letter.

> **EXERCISE**: Perform the following operations in an Excel workbook with some data:
> 
> 1. Select multiple rows of data
> 2. Select multiple columns of data
> 3. Delete the data from selected rows while keeping the cells intact
> 4. Delete selected rows from the sheet, and remove the cells
> 5. Change the order of rows/columns in the sheet
> 6. Duplicate the contents of a row or a column to another row/column
> 7. Change the heights of some rows and the widths of some columns

## Data Entry and Formatting

Cells in Excel can contain 3 types of data: numbers, strings and formulas.

<img src="https://i.imgur.com/TIDyTRq.png" width="480">










### Numbers

Numbers in Excel can be whole numbers or real numbers (containing decimals). To enter a number, just click on a cell and start typing or paste a value.

Percentages, dates, currency amounts etc. are recognized and stored as numbers within Excel. They're just displayed differently. You can choose how a number is displayed using the "Number Format" dropdown in the Home tab of the ribbon. A number format can also be applied to a row, column or custom selection of cells.


<img src="https://i.imgur.com/tiZAwzi.png" width="480">


<img src="https://i.imgur.com/76SH1DC.png" width="160">


> **EXERCISES**: 
>
> 1. Explore the various number formats in Excel. Create a column of data for each type of number format.
> 2. Enter some numbers containing decimal points into a column of cells, and change the number of decimal points displayed.
> 3. Create a custom number format in Excel and apply it to a column of data.
> 4. Type a date and time into a cell and try changing its format. 

### Strings (or Labels)

Strings (or labels) simply refer to text data. Excel automatically detects whether the data you're entering into a cell is a number or a string. Strings are typically left-aligned while numbers are right-aligned.

<img src="https://i.imgur.com/TIDyTRq.png" width="480">

If some text doesn't fit within the boundaries of a cell, it generally overflows and is displayed over the next cell, unless the next cell contains some data. Text cells can also be configured to wrap text to new lines.

<img src="https://i.imgur.com/0a9Lk7A.png" width="360">


### Formulas

Instead of manually typing a value into a cell, you can configure its value to be computed using a formula. A formula MUST begin with the `=` character. To create a formula, just type `=` followed by the calculation you wish to perform. You can reference values in other cells using the column letter and row number of the respective cell (e.g. `A6`).

<img src="https://i.imgur.com/TIDyTRq.png" width="480">

To view the result of a formula, just press Enter/Return. The result of the calculation is displayed in the cell, and you can view/edit the formula in the formula bar or by double-clicking on the cell. While you're editing a formula, you can exit the editing mode by pressing Esc.

<img src="https://i.imgur.com/8uYw78a.png" width="480">

If you change the value of a cell referenced in a formula, the output of the formula updates immediately. This feature makes spreadsheets great for exploring "what-if" scenarios.
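
The recalculation behaviour described above can be imitated in a few lines of Python. This is only an analogy, not a description of how Excel works internally; the `cells` dictionary, the cell addresses and the `total_cost` function are hypothetical names made up for this sketch.

```python
# Hypothetical sheet: A1 holds a unit price, A2 holds a quantity.
cells = {"A1": 250.0, "A2": 4}

def total_cost(sheet):
    """Plays the role of a formula cell containing =A1*A2."""
    return sheet["A1"] * sheet["A2"]

before = total_cost(cells)   # computed from the current inputs

# "What if" the quantity were 10 instead of 4?
cells["A2"] = 10
after = total_cost(cells)    # the same formula picks up the new input

print(before, after)
```

In a spreadsheet you don't have to call anything again after changing `A2`; the formula cell refreshes on its own, which is what makes quick experimentation convenient.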

> **EXERCISE**: Enter some data into a workbook, then create some formulas using the manually entered data as inputs. Then change the data and observe the changes in the formula.


### Cell Formatting



You can change the font, size, text color, cell color, cell border, text alignment, text wrapping and apply styles like bold, italic and underline using the text formatting section of the "Home" tab in the ribbon.

<img src="https://i.imgur.com/Gx4ZEJZ.png" width="640">

<img src="https://i.imgur.com/04jzIRt.png" width="420">



> **EXERCISE**: Replicate the above styles in a new Excel sheet. Then experiment with other formatting options.

## Using Excel for Calculations

Excel formulas are commonly used to perform arithmetic operations on values stored in other cells. 

### Arithmetic Operations

Excel supports the following arithmetic operations:

<img src="https://i.imgur.com/HKg68ij.png" width="640">


Formulas can use the following inputs: 
- Numeric data from one or more cells
- Numeric outputs from cells containing other formulas
- Hardcoded numbers within a formula


You can construct a complex formula using parentheses i.e. `(` and `)`.
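
To see why parentheses matter, here is a small Python sketch that mirrors two Excel formulas. The values assigned to `a1`, `b1` and `c1` are made up and simply stand in for the contents of cells `A1`, `B1` and `C1`. Note that Excel writes exponentiation as `^`, whereas Python uses `**`.

```python
# Hypothetical cell values standing in for A1, B1 and C1.
a1, b1, c1 = 2, 3, 4

# =A1+B1*C1   -> multiplication is evaluated before addition
without_parens = a1 + b1 * c1

# =(A1+B1)*C1 -> parentheses force the addition to happen first
with_parens = (a1 + b1) * c1

# =A1^B1      -> exponentiation (written ** in Python)
power = a1 ** b1

print(without_parens, with_parens, power)
```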

Learn more about Excel Formulas here: https://www.ablebits.com/office-addins-blog/2015/12/17/excel-formulas-examples/

### Replicating Formulas To Multiple Cells

A formula can be replicated to multiple cells by dragging the "fill handle" in the bottom-right corner of a cell. The cell references in the replicated formulas are automatically updated based on the relative location of the target cell.

<img src="https://i.imgur.com/lzQpSh8.gif" width="480">



Cell references are also updated when a formula is copied and pasted into another cell. [Learn more about copy-pasting data in Excel](https://support.microsoft.com/en-us/office/paste-options-8ea795b0-87cd-46af-9b59-ed4d8b1669ad).


### Cell References

There are 3 types of cell references in Excel: absolute (`$A$1`), relative (`A1`) and mixed (`$A1` or `A$1`). 

All of the above references refer to the same cell, and the dollar sign ($) indicates whether or not to change cell references when the formula is moved or copied to other cells.


- **Absolute cell reference (`$A$1`)** - the $ sign before the row and column coordinates makes a reference static, and lets you copy a formula without changing references.


- **Relative cell reference (`A1`)** - a cell reference with no $ sign changes based on relative position of rows and columns in a spreadsheet.


- **Mixed cell reference**:
    
    1. **Absolute column and relative row (`$A1`)** - the $ sign in front of the column letter locks the reference to the specified column, so the column never changes. The relative row reference, without the dollar sign, changes depending on the row to which the formula is copied.

    2. **Relative column and absolute row (`A$1`)** - the row's reference locked by $ doesn't change, and the column's reference does.
    
    
You can also reference cells from another sheet by prefixing the sheet name: `Sheet_name!Cell_address`
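
The effect of the dollar sign when a formula is copied can be sketched in Python. The helper names `col_to_index`, `index_to_col` and `shift_reference` are hypothetical and written only for this illustration; the sketch handles plain references such as `A1` or `$B$7`, not sheet-qualified ones.

```python
import re

def col_to_index(col):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in col:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def index_to_col(index):
    """Convert a 1-based index back to column letters."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_reference(ref, rows, cols):
    """Return `ref` as it would appear after copying a formula
    `rows` rows down and `cols` columns to the right."""
    col_lock, col, row_lock, row = re.fullmatch(
        r"(\$?)([A-Z]+)(\$?)(\d+)", ref
    ).groups()
    if not col_lock:   # relative column: moves with the copy
        col = index_to_col(col_to_index(col) + cols)
    if not row_lock:   # relative row: moves with the copy
        row = str(int(row) + rows)
    return f"{col_lock}{col}{row_lock}{row}"

# Copy a formula 2 rows down and 1 column to the right.
for ref in ["A1", "$A$1", "$A1", "A$1"]:
    print(ref, "->", shift_reference(ref, rows=2, cols=1))
```

Only the parts of a reference that are not preceded by `$` move with the copy.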

### Autocompleting a Series

The fill handle can also be used to autocomplete a series of numbers, dates, days, times etc. Just enter 2 or 3 values and use the fill handle to complete a series. If Excel detects a series, it will autocomplete according to the detected rule, otherwise it will create copies.

<img src="https://i.imgur.com/qnWYui3.gif" width="480">
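
The idea of detecting a rule and continuing it can be sketched in Python. This is a simplified illustration that only covers a constant step between the first two values; Excel's own detection recognises more patterns. The function name `fill_series` is hypothetical.

```python
from datetime import date

def fill_series(seed, count):
    """Return `count` values continuing the pattern in `seed`.

    With two or more seed values, a constant step is inferred from
    the first two. With a single seed value there is nothing to
    infer, so the value is simply copied."""
    if len(seed) < 2:
        return [seed[0]] * count
    step = seed[1] - seed[0]
    return [seed[0] + i * step for i in range(count)]

print(fill_series([5, 10], 5))                               # numbers
print(fill_series([date(2020, 1, 1), date(2020, 1, 8)], 3))  # weekly dates
print(fill_series([7], 3))                                   # copies
```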

## Built-in Functions in Excel

Excel offers several built-in functions to manipulate, combine and process numerical and text data.

### Mathematical Functions

Here are some common mathematical functions in Excel ([source](https://udmercy.edu/about/its/help/files/excel-formulas.pdf)):

<img src="https://i.imgur.com/VNgd21n.png" width="480">

Aggregation functions like `AVERAGE` require passing a range of cells:

<img src="https://i.imgur.com/oZB3qnu.png" width="320">
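
As a rough analogy, a range of cells behaves like a list of values passed to a function. The Python sketch below uses made-up numbers in place of a range such as `B2:B6` and shows the nearest Python equivalent of a few common Excel aggregation functions in the comments.

```python
from statistics import mean

# Made-up values standing in for the range B2:B6.
scores = [72, 85, 90, 64, 79]

total = sum(scores)       # =SUM(B2:B6)
average = mean(scores)    # =AVERAGE(B2:B6)
lowest = min(scores)      # =MIN(B2:B6)
highest = max(scores)     # =MAX(B2:B6)
count = len(scores)       # =COUNT(B2:B6)

print(total, average, lowest, highest, count)
```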

> **EXERCISE**: Apply the mathematical functions listed above to the sample data shown above.

### Conditional Functions

Excel offers the following conditional functions:

<img src="https://i.imgur.com/aPAKHz9.png" width="480">

In the following example, `IF` is used to generate the `Result` column, and `EXACT` is used to generate the `Passed` column:

<img src="https://i.imgur.com/FnBY6cU.png" width="480">
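
The logic of `IF` and `EXACT` can be mimicked in Python as follows. The pass mark of 40 and the helper names `excel_if` and `exact` are assumptions made for this sketch and are not taken from the example above.

```python
def excel_if(condition, value_if_true, value_if_false):
    """Mimics =IF(condition, value_if_true, value_if_false)."""
    return value_if_true if condition else value_if_false

def exact(text1, text2):
    """Mimics =EXACT(text1, text2): a case-sensitive comparison."""
    return text1 == text2

PASS_MARK = 40  # assumed threshold, for illustration only

for score in [35, 40, 92]:
    result = excel_if(score >= PASS_MARK, "Pass", "Fail")
    passed = exact(result, "Pass")
    print(score, result, passed)
```

`EXACT` is useful because the ordinary `=` comparison in Excel ignores case when comparing text.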

### Logical Functions

You can construct complex conditions using logical functions ([source](https://www.ablebits.com/office-addins-blog/2014/12/17/excel-and-or-xor-not-functions/)):

![](https://i.imgur.com/DfFrz2a.png)

> **EXERCISE**: The `Grade` column in the above example has been filled manually. Use a combination of logical and conditional functions to fill the grade column automatically. (Score to grade conversion: 90+ is A, 80+ is B, 70+ is C, 60+ is D, 40+ is E and <40 is F)

### String Functions

Excel provides several functions to manipulate text.

<img src="https://i.imgur.com/wGe3iUI.png" width="480">
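
If you already know a little Python, the sketch below may help you build intuition for some commonly used Excel text functions. The mapping in the comments is approximate and the sample text is made up; remember that character positions in Excel start at 1, while Python indexes start at 0.

```python
raw = "  ada   lovelace "

trimmed = " ".join(raw.split())   # TRIM: strip ends, collapse repeated spaces
upper = trimmed.upper()           # UPPER
proper = trimmed.title()          # PROPER (roughly)
length = len(trimmed)             # LEN
left = trimmed[:3]                # LEFT(text, 3)
right = trimmed[-8:]              # RIGHT(text, 8)
mid = trimmed[4:8]                # MID(text, 5, 4)
joined = left + "-" + right       # CONCATENATE(left, "-", right) or the & operator

print(trimmed, upper, proper, length, left, right, mid, joined, sep=" | ")
```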

> **EXERCISE**: Practice string manipulation functions by applying them to the following table:
>
> <img src="https://i.imgur.com/2f8skxa.png" width="360">

### Date and Time Functions

<img src="https://i.imgur.com/4GOMBid.png" width="640">


Here are some more useful functions:
- `DAYS` and [`DATEDIF`](https://support.microsoft.com/en-us/office/calculate-the-difference-between-two-dates-8235e7c9-b430-44ca-9425-46100a162f38): Number of days, weeks or months between two dates
- [`TEXT`](https://support.microsoft.com/en-us/office/text-function-20d5ac4d-7b94-49fd-bb38-93d29371225c): Displaying dates in custom formats.
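
The date calculations performed by these functions can be sketched with Python's `datetime` module. The two dates are arbitrary examples, and the Excel formulas in the comments are approximate counterparts rather than exact equivalents.

```python
from datetime import date

start = date(2020, 3, 1)
end = date(2020, 4, 15)

# Like =DAYS(end, start) or =DATEDIF(start, end, "d")
days_between = (end - start).days

# Whole weeks between the two dates
weeks_between = days_between // 7

# Complete months, like =DATEDIF(start, end, "m")
months_between = (end.year - start.year) * 12 + (end.month - start.month)
if end.day < start.day:
    months_between -= 1   # the last month is not complete yet

# Custom display format, like =TEXT(end, "dd mmm yyyy")
formatted = end.strftime("%d %b %Y")

print(days_between, weeks_between, months_between, formatted)
```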



> **EXERCISE**: Practice date manipulation with the following sample data:
> 
> <img src="https://i.imgur.com/STXIKOu.png" width="240">

### Searching Tables with `VLOOKUP`

The `VLOOKUP` (Vertical Lookup) function is used to search for a certain value in a column in order to return a value from a different column in the same row. It looks up a value in the leftmost column and returns a value in the same row of the column you specify.

<img src="https://cdn.extendoffice.com/images/stories/doc-excel/excel-vlookup-function/doc-vlookup-function-1.png" width="480">

The `VLOOKUP` function allows using a spreadsheet like a key-value store or a dictionary.
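
The dictionary analogy can be made concrete with a short Python sketch of an exact-match lookup. The `vlookup` function and the student table are hypothetical and exist only for this illustration.

```python
def vlookup(lookup_value, table, col_index):
    """Search the first column of `table` for `lookup_value` and return
    the value from column `col_index` (1-based) of the matching row.
    Only exact matches are handled in this sketch."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None   # Excel would display #N/A here

# Made-up table: name, Physics score, Chemistry score
students = [
    ("Asha", 81, 74),
    ("Ben", 67, 90),
    ("Chen", 92, 88),
]

print(vlookup("Ben", students, 3))    # Chemistry score for Ben
print(vlookup("Dana", students, 3))   # no such student

# The same idea with a dictionary keyed on the first column
chemistry = {name: chem for name, _physics, chem in students}
print(chemistry["Ben"])
```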

> **EXERCISE**: Create a `VLOOKUP` cell to find the score in Chemistry, given the name of the student.

We've looked at just a small selection of functions in Excel. Here's a full list of functions available in Microsoft Excel: https://support.microsoft.com/en-us/office/excel-functions-by-category-5f91f4e9-7b42-46d2-9bd1-63f26a86c0eb

You needn't remember all or any Excel functions. You can always look them up in the documentation or just search for what you're trying to do online e.g. "Calculate difference between two dates in Excel"

## Data Analysis with Excel

Excel provides several utilities for analyzing and visualizing data. Excel's graphical user interface makes it easy to analyze data without writing code.

### Importing Data From a CSV file

Download this CSV file containing daywise Covid-19 data for Italy in 2020: https://bit.ly/italy-covid-daywise

To import the CSV file into Excel:

- Create a new sheet in the workbook
- Select the menu option "File" > "Import" > "CSV File"
- Locate and select the file from the file explorer
- During import, make sure to select "Comma" as the delimiter
- Select the correct data format for each column

You should now be able to see the data within the sheet. Once imported, you can rename column headers and even freeze the top row.
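
The import steps above (choosing a delimiter, then a data format for each column) have a close counterpart in code. The Python sketch below parses a tiny in-memory sample rather than the downloaded file; the column names and numbers are made up for illustration and may not match the real dataset.

```python
import csv
import io

# Made-up sample standing in for the contents of a CSV file.
sample = io.StringIO(
    "date,new_cases,new_deaths\n"
    "2020-03-01,240,8\n"
    "2020-03-02,561,6\n"
)

# "Comma" is chosen as the delimiter, as in the import dialog.
reader = csv.DictReader(sample, delimiter=",")

rows = []
for record in reader:
    # Every value arrives as text, so each column is converted
    # to the data format it should have.
    rows.append({
        "date": record["date"],
        "new_cases": int(record["new_cases"]),
        "new_deaths": int(record["new_deaths"]),
    })

print(rows)
```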

<img src="https://i.imgur.com/GXO8t7C.png" width="480">

> **EXERCISE**: Download and import this CSV file containing population, GDP & other health-related information about various countries into a new Excel sheet (in the same workbook as the previous sheet): https://bit.ly/locations-csv
>
> <img src="https://i.imgur.com/ztMqZtU.png" width="640">

> **EXERCISES**: Answer the following questions using the above datasets:
> 
> 1. What are the total number of reported cases and deaths related to Covid-19 in Italy in the given dataset?
> 2. What is the overall death rate (ratio of reported deaths to reported cases)?
> 3. What is the overall number of tests conducted? A total of 935310 tests were conducted before daily test numbers were reported.
> 4. What fraction of tests returned a positive result?

### Sorting and Filtering Data

To sort and filter data, just select a range of data, and pick the "Filter" option from the Home tab of the ribbon. 

<img src="https://i.imgur.com/CaXsBqD.png" width="640">

Once selected, you'll see dropdown buttons in the column headers. Clicking on the button will display a dialog where you can sort and filter rows using the values from the selected column.

<img src="https://i.imgur.com/aVwN3gl.png" width="640">

You can also apply multiple filtering criteria for different columns. Filtering and sorting is a great way of displaying just the required data in a desirable format. Tip: You can also save commonly used filtering & sorting configurations on a sheet using [custom views](https://www.youtube.com/watch?v=fIuwBeDhGSE).
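
For comparison, here is what filtering and sorting look like when expressed in Python. The rows are invented placeholders, not values from the datasets used in this tutorial.

```python
# Invented rows standing in for a few lines of a table.
rows = [
    {"location": "Country A", "continent": "Asia", "population": 5_000_000},
    {"location": "Country B", "continent": "Europe", "population": 9_000_000},
    {"location": "Country C", "continent": "Asia", "population": 12_000_000},
    {"location": "Country D", "continent": "Africa", "population": 7_000_000},
]

# Filter: keep only the rows whose continent is "Asia".
asia_only = [row for row in rows if row["continent"] == "Asia"]

# Sort: order the filtered rows by population, largest first.
asia_sorted = sorted(asia_only, key=lambda row: row["population"], reverse=True)

# Multiple criteria: Asia AND a population above 10 million.
large_asia = [row for row in asia_only if row["population"] > 10_000_000]

print([row["location"] for row in asia_sorted])
print([row["location"] for row in large_asia])
```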


> **EXERCISE**: Practice sorting and filtering with the locations CSV file imported in the previous exercise.

### Pivot Tables

A pivot table is a table of grouped values that aggregates the individual items of an existing table within one or more discrete categories. It's similar to the "GROUP BY" operation in SQL and Pandas.

<img src="https://i.imgur.com/68m5sRU.png" width="480">

Here's a pivot table showing the total cases, deaths and tests month-by-month for the above dataset:

<img src="https://i.imgur.com/85IhYn3.png" width="640">
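
The grouping and aggregation that a pivot table performs can be sketched in plain Python. The daily figures below are invented, and the column names are assumptions that may differ from the real dataset.

```python
from collections import defaultdict

# Invented daily rows; only the structure matters here.
daily = [
    {"date": "2020-03-30", "new_cases": 100, "new_deaths": 5},
    {"date": "2020-03-31", "new_cases": 120, "new_deaths": 7},
    {"date": "2020-04-01", "new_cases": 90, "new_deaths": 4},
]

# One running total per month, created on first use.
monthly = defaultdict(lambda: {"new_cases": 0, "new_deaths": 0})

for row in daily:
    month = row["date"][:7]   # "YYYY-MM" is the grouping key
    monthly[month]["new_cases"] += row["new_cases"]
    monthly[month]["new_deaths"] += row["new_deaths"]

for month, totals in sorted(monthly.items()):
    print(month, totals)
```

The grouping key plays the role of the pivot table's row field, and the summed columns play the role of its values.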




> **EXERCISE**: Use Pivot tables on the `locations.csv` dataset downloaded above to compute the population of each continent.

> **EXERCISE**: Come up with 10 questions about the datasets `italy-covid-daywise.csv` and `locations.csv`. Answer them using Pivot tables, wherever possible.

## Data Visualization with Excel

Excel provides inbuilt charting tools to visualize data. To create a chart, select the "Add Chart" button from the "Insert" tab in the ribbon.

<img src="https://i.imgur.com/NqMTne8.png" width="720">

Let's look at some common types of charts.

### Line Chart

![](https://i.imgur.com/2oLskyW.png)

> **EXERCISE**: Create a line chart to show the total number of tests and total number of cases in Italy month-by-month using the `italy-covid-daywise.csv` dataset.

### Column / Bar Chart

![](https://i.imgur.com/Oz3kfry.png)

### Scatter Chart

![](https://i.imgur.com/Ykii77v.png)

### Histogram

![](https://i.imgur.com/6tM0iYc.png)

### Box Plot

![](https://i.imgur.com/CvwgzuU.png)

You can find a full list of charts available in Excel here:
https://support.microsoft.com/en-us/office/available-chart-types-in-office-a6187218-807e-4103-9e0a-27cdb19afb90

## Advanced Features in Excel

There's a lot more you can do with Excel. Let's look at a few advanced features: data validation, conditional formatting and merging cells.

### Data Validation and Selection Dropdowns

Because spreadsheets are often used for data entry by multiple users, they are prone to human errors. Excel provides a feature called "data validation" which can be used to impose restrictions on the data that can be entered into a selected range of cells, and optionally show a dropdown of acceptable values.

The "Data Validation" feature can be accessed from the "Data" tab of the Ribbon.

<img src="https://i.imgur.com/SrnUbHn.png" width="480">
     
Here's an example of data validation:

<img src="https://cdn-5a6cb102f911c811e474f1cd.closte.com/wp-content/uploads/2018/03/11-Awesome-Examples-of-Data-Validation-dynamic-list.png" width="480">

Here are some examples of data validation in Excel: https://www.howtoexcel.org/tips-and-tricks/11-awesome-examples-of-data-validation/
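
Conceptually, a validation rule is a check that runs before a value is accepted into a cell. The Python sketch below imitates two kinds of rule, a whole number within limits and a list of allowed values; the function names and the limits are hypothetical.

```python
def is_valid_whole_number(value, minimum, maximum):
    """Accept only whole numbers between `minimum` and `maximum`."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return minimum <= value <= maximum

def is_valid_list_item(value, allowed_values):
    """Accept only values that appear in a fixed list (a dropdown)."""
    return value in allowed_values

GRADES = ["A", "B", "C", "D", "E", "F"]

print(is_valid_whole_number(85, 0, 100))    # score within limits
print(is_valid_whole_number(120, 0, 100))   # rejected: above the maximum
print(is_valid_whole_number(85.5, 0, 100))  # rejected: not a whole number
print(is_valid_list_item("B", GRADES))      # allowed dropdown value
print(is_valid_list_item("Z", GRADES))      # rejected: not in the list
```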

> **EXERCISE**: Apply each type of data validation technique listed in the "Allow" dropdown of the Data Validation dialog.

### Conditional Formatting

Excel allows conditionally changing the styles of a cell based on its value. This feature is called conditional formatting and makes it possible to visually convey & highlight important information. Conditional formatting is available as an action button in the "Home" tab of the ribbon.

<img src="https://i.imgur.com/C9qWP6X.png" width="480">


Watch this tutorial on conditional formatting to learn more: https://www.youtube.com/watch?v=7iKoccSTNZA

### Merging Cells

Excel allows merging a rectangular grid of cells into a single cell. This is a useful way of establishing visual hierarchy within the sheet (showing groups, subgroups etc.)

<img src="https://i.imgur.com/Ri5qp3B.png" width="480">

### Merging Data From Two Tables

Excel doesn't provide a straightforward way of joining or merging two tables on a chosen column like in SQL or Pandas. However, it is possible to achieve a similar result using `VLOOKUP`.

![](https://i.imgur.com/BIJ0Ltr.png)

It's also possible to do this using the `INDEX` and `MATCH` functions. Watch this video to learn more: https://www.youtube.com/watch?v=vFgWTRsVcsI

For more advanced querying and merging, you can use the Power Query add-in which adds a full-fledged querying language to Excel: https://www.howtoexcel.org/power-query/the-complete-guide-to-power-query/
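
The lookup-based merge described in this section can be sketched in Python. One table is turned into a dictionary keyed on the shared column, and each row of the other table looks up its match. All names and numbers below are invented placeholders.

```python
# Invented "locations" table, keyed on the shared column.
locations = {
    "Country A": {"continent": "Asia", "population": 50_000_000},
    "Country B": {"continent": "Europe", "population": 10_000_000},
}

# Invented daily table that refers to locations by name.
daily = [
    {"location": "Country A", "date": "2020-03-01", "new_cases": 500},
    {"location": "Country B", "date": "2020-03-01", "new_cases": 40},
]

merged = []
for row in daily:
    info = locations[row["location"]]   # the "VLOOKUP" step
    combined = {**row, **info}          # columns from both tables
    # Derived column computed from the merged data.
    combined["cases_per_million"] = (
        row["new_cases"] * 1_000_000 / info["population"]
    )
    merged.append(combined)

for row in merged:
    print(row)
```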

> **EXERCISE**: Merge the data from files `italy-covid-daywise.csv` and `locations.csv` within Excel, and add new columns "deaths per million", "tests per million" and "cases per million" using the population of Italy.

## Cloud-based Spreadsheets

[Google Sheets](https://sheets.google.com) is a browser-based alternative to Microsoft Excel. It's a great choice for personal projects for the following reasons:

- Google Sheets is free for personal use, unlike Microsoft Excel
- Google Sheets runs entirely in your browser, and doesn't require a download or an installation
- Google Sheets documents are stored in the cloud, in your Google Drive, and can be accessed from anywhere
- Google Sheets supports live real-time collaboration
- Google Sheets supports integrations with other tools like Google Forms


There are also tools like [Airtable](https://airtable.com), [Notion](https://notion.so) and [Coda](https://coda.io) that combine features of spreadsheets and documents. Microsoft also offers a browser-based version of Excel as part of [Office 365](https://www.office.com/) and OneDrive.

> **EXERCISE**: Replicate everything we've done in this tutorial on Google Sheets. Do you notice any major differences between Google Sheets and Excel?

## Summary and References

The following topics were covered in this tutorial:

- Introduction to Spreadsheets and Microsoft Excel
- A quick tour of Excel's user interface & functionality
- Using formulas, arithmetic operations and functions in Excel
- Data analysis and visualization of real-world data with Excel
- Data validation, conditional formatting and merging cells
- Using cloud-based spreadsheets for collaboration


Check out the following resources to learn more:

- [Excel Exercises by Wise Owl](https://www.wiseowl.co.uk/excel/exercises/standard/)

- [Excel Video Training by Microsoft](https://support.microsoft.com/en-us/office/excel-video-training-9bc05390-e94c-46af-a5b3-d7c22f6990bb)

- [YouTube Playlist on Excel (Basic to Advanced)](https://www.youtube.com/watch?v=-ujVQzTtxSg&list=PLWPirh4EWFpEpO6NjjWLbKSCb-wx3hMql)



## Revision Questions
1. What is a spreadsheet?
2. Name a few spreadsheet software applications.
3. What are the uses of a spreadsheet?
4. How many parts does the MS Excel interface have? What are they?
5. What is a workbook?
6. What is a sheet?
7. What is a cell?
8. How many types of data can cells in Excel contain? What are they?
9. What is cell formatting?
10. What are some inputs that formulas can use in the cells?
11. How do you construct complex formulas in Excel?
12. What is the fill handle in a cell?
13. What are cell references? What are the different types? Explain with an example.
14. What is a built-in function? Explain a few built-in functions in Excel with examples.
15. What are conditional functions? Explain a few with examples in Excel.
16. What are logical functions? Explain each with an example in Excel.
17. What are string functions? Explain with examples in Excel.
18. What does the `DATEDIF` function do?
19. What does the `TEXT` function do?
20. What does the `VLOOKUP` function do?
21. How do you import data from a CSV file into Excel?
22. How do you filter data in Excel?
23. How do pivot tables work in Excel?
24. How do you create different visualizations in Excel?
25. What is data validation in Excel?
26. What is conditional formatting in Excel?
27. How do you merge cells in Excel?
28. How do you merge data from two tables in Excel?
29. What is the `MATCH` function in Excel?
30. What are some cloud-based spreadsheets?