Skip to content

Spreadsheets are computer programs that allow users to enter, view, and change information in a gridlike format. One of the most useful programs on a PC is a spreadsheet, which allows users to organize data in tabular form. A spreadsheet's primary purpose is to store numerical data and sometimes a few words of text.

License

Notifications You must be signed in to change notification settings

BytesOfIntelligences/IBM-Project-03-Analyzing-Spreadsheet-Data-with-Python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation


IBM Project 03 Analyzing Spreadsheet Data with Python

Description

About Spreadsheets are computer programs that allow users to enter, view, and change information in a gridlike format. One of the most useful programs on a PC is a spreadsheet, which allows users to organize data in tabular form. A spreadsheet's primary purpose is to store numerical data and sometimes a few words of text.

Features

Analyzing Spreadsheet Data with Python

Estimated time needed: 40 minutes

We've all been there: sitting at the computer, a cup of coffee in hand, ready to begin the day by tackling a spreadsheet. But once we open it, all we see are numbers upon numbers staring back at us. It can be a little intimidating tackling these numbers, especially with larger spreadsheets, but not to worry! With a few tricks up its sleeve, Python is here to give us a hand!

Interested in learning more about this? Well, you're at the right place!

In this project, we'll be giving you an introduction to the tools that you can use to load spreadsheet data into Python, manipulate it, and write it back to the original file.

What You'll Need

You should be familiar with basic concepts of computer programming and the syntaxes of Python programming language. This refers to the concept of list, dictionary, function, class, instance variable, method, etc.

What You'll Learn

After completing this guided project, you will be able to:

  1. Understand the definition of a dataset.
  2. Understand the concept of Cell and Sheets in spreadsheet.
  3. Load a sheet in an xlsx or xls file to Python.
  4. Perform basic spreadsheet operations such as subsetting, filtering, calculating column mean, median, max and mean, etc.
  5. Write to a sheet in an xlsx or xls file from Python.

Exercise 1: Let's Define Data!

So, let's talk about data. We've been throwing this term around a lot: analyzing data, manipulating data and writing data. What exactly is this "data?

Data is defined as "a collection of numbers, characters, images or other items that provide information of something" (SDM, 3rd CE, Deveaux et al.). In statistical sciences, we store data in files that records cases and variables:

image

As shown in the table above, each entry is called a case, and each case includes value(s) for variable(s). Cases are often deleted due to missing data.

Exercise 2: Explore a Spreadsheet!

Now that we know the definition of data, we can dive into a spreadsheet!

A spreadsheet is a computer application for organization, analysis, and storage of data in tabular form. xlsx and xls are the two most popular formats for spreadsheet files. They can be created, opened, and saved using spreadsheet applications such as Microsoft Office Excel and Google Sheets.

image

import requests
import os

if not os.path.isfile("./Airport_Data.xlsx"):
    r = requests.get("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/analysing-spreadsheet-data-with-python/Airport_Data.xlsx")
    f = open("./Airport_Data.xlsx", mode = "wb+")
    f.write(r.content)
    f.close()

To load the spreadsheet into Python, we can use pandas's read_excel() function. When calling this function, you'll need to specify:

  • The path to the xls or xlsx file, and
  • The name of the sheet in which you wish to load into Python.

Both will be passed in format of the string.

import pandas as pd

df_facilities = pd.read_excel("./Airport_Data.xlsx", sheet_name="Facilities")

Author(s)

Weiqing Wang

Weiqing is a Data Scientist intern at IBM Canada Ltd. Weiqing holds an Honours Bachelor of Science from the University of Toronto with two specialist degrees, respectively in computer science and statistical sciences. He is presently working towards a graduate degree in computer science at the University of Toronto.

Other contributors

Kathy An, Yasmine Hemmati

Change Log

Date (YYYY-MM-DD) Version Changed By Change Description
2021-08-10 1.0 Weiqing Wang Initial Version Created.

© IBM Corporation 2021. All rights reserved.

About

Spreadsheets are computer programs that allow users to enter, view, and change information in a gridlike format. One of the most useful programs on a PC is a spreadsheet, which allows users to organize data in tabular form. A spreadsheet's primary purpose is to store numerical data and sometimes a few words of text.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published