Introduction to Data Science Libraries: Using Pandas, Matplotlib, and Seaborn to Explore and Display Physics Data

Module Summary

This module is a one-notebook introduction to importing, analyzing, and displaying physics data with three popular Python libraries: Pandas, Matplotlib, and Seaborn. The notebook in this module will have students explore a data file containing data for various nuclei. Students will learn to import the data and format graphs that help display the data. After completing this module, students will feel comfortable analyzing and displaying a variety of data sets and know some of the best ways to display data, depending on the situation.

Module Summary (Shortened)

This module aims to introduce students to the libraries Pandas, Matplotlib, and Seaborn by having the students analyze and display data in two and three dimensions.

Keywords

Pandas, Matplotlib, Seaborn, Graphing, Introductory

Table of Contents

01_introduction_to_data_science_libraries.ipynb: An introduction to the Python data science libraries Pandas, Seaborn, and Matplotlib. Students analyze a data set taken from nuclear physics to explore these libraries.

nuclear_data.tsv: The data set needed by Notebook 1.

Prerequisites

The only assumed background knowledge is a basic knowledge of Python and NumPy arrays. All of the other necessary Python libraries will be explained in the module, and there is no assumed physics knowledge.

This module runs in Jupyter Notebooks using the Python 3 programming language. Both need to be installed on a computer for this notebook to run, or the notebook can be run on Google Colab without any local installation. A link to the Google Colab version of the notebook is provided at the top of the notebook. Note that the current format of the notebook assumes it will be run on Google Colab. If it is intended to be run locally, the notebook cells that set up the Google Drive link will need to be deleted and the cell that sets the data path will need to be updated.
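One way to handle the Colab-versus-local difference is to choose the data path in a single helper. The sketch below is only an illustration, not the notebook's actual code: the function names `on_colab` and `data_path`, and the default directories, are assumptions to be adapted to your own setup.

```python
from pathlib import Path


def on_colab() -> bool:
    """Return True if the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available on Google Colab)
    except ImportError:
        return False
    return True


def data_path(filename="nuclear_data.tsv", colab=None,
              local_dir=".", drive_dir="/content/drive/MyDrive"):
    """Build the path to the data file.

    local_dir and drive_dir are hypothetical defaults; change them to
    wherever the file actually lives on your machine or Google Drive.
    """
    if colab is None:
        colab = on_colab()
    base = Path(drive_dir) if colab else Path(local_dir)
    return base / filename


print(data_path())
```

On Colab, Google Drive still has to be mounted (for example with `drive.mount("/content/drive")` from `google.colab`) before the Drive path can be read.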

The Jupyter Notebook in this module assumes that the following packages are installed. They can be installed through a Python package manager such as pip3 or conda, and they are installed by default on Google Colab.

  • NumPy
  • Matplotlib
  • Seaborn
  • Pandas
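Before opening the notebook locally, it may help to confirm that these packages can be found. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of the module.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
REQUIRED = ["numpy", "matplotlib", "seaborn", "pandas"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with, e.g.: pip3 install " + " ".join(missing))
else:
    print("All required packages are available.")
```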

Learning Goals

Physics

Though the data set in this module is taken from nuclear physics, in-depth knowledge of nuclear physics is unnecessary. Instead, the essential aspects of the data set are explained in the notebook. Therefore, the main physics objective of this module is to learn how to display data in a physically meaningful, easy-to-read, and aesthetically pleasing way.

Data Science

The data science objectives of this module are as follows:

  • Be able to read documentation for functions from the various data science libraries
  • Be able to use Pandas to import a data file, format it as a Pandas DataFrame, and begin exploring and analyzing the data
  • Be able to use Matplotlib and Seaborn to further your analysis of the data set
  • Be able to use Matplotlib and Seaborn to make physically relevant plots of the data set both in two dimensions and in three dimensions
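As a rough illustration of these objectives, the sketch below reads tab-separated data into a Pandas DataFrame and draws one two-dimensional and one three-dimensional plot. It does not reproduce the notebook: the column names (`name`, `Z`, `N`, `BE_per_A`) are hypothetical, and the five rows are approximate textbook binding energies per nucleon typed in by hand as a stand-in for nuclear_data.tsv.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for nuclear_data.tsv (hypothetical columns, approximate values in MeV).
SAMPLE_TSV = (
    "name\tZ\tN\tBE_per_A\n"
    "He-4\t2\t2\t7.07\n"
    "C-12\t6\t6\t7.68\n"
    "O-16\t8\t8\t7.98\n"
    "Fe-56\t26\t30\t8.79\n"
    "U-238\t92\t146\t7.57\n"
)

# Import: sep="\t" tells read_csv that the file is tab-separated.
df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t")

# Explore: add a derived column and look at summary statistics.
df["A"] = df["Z"] + df["N"]  # mass number
print(df.describe())

fig = plt.figure(figsize=(10, 4))

# Two dimensions: Seaborn scatter plot of binding energy per nucleon vs. A.
ax2d = fig.add_subplot(1, 2, 1)
sns.scatterplot(data=df, x="A", y="BE_per_A", ax=ax2d)
ax2d.set_xlabel("Mass number A")
ax2d.set_ylabel("Binding energy per nucleon (MeV)")

# Three dimensions: Matplotlib scatter plot over the (N, Z) plane.
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
ax3d.scatter(df["N"], df["Z"], df["BE_per_A"])
ax3d.set_xlabel("Neutron number N")
ax3d.set_ylabel("Proton number Z")
ax3d.set_zlabel("Binding energy per nucleon (MeV)")
```

With the real file, `io.StringIO(SAMPLE_TSV)` would be replaced by the path to nuclear_data.tsv, and the column names by whatever the file actually contains.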

Suggested Course to Plug Into

This module would work well in a lab course for students who are already familiar with Python and want to start using Python to analyze and display their data. The appropriate level for this module (e.g., first-year or second-year students) depends on the students' prior programming knowledge. The physics data set is drawn from nuclear physics, but an in-depth understanding of this data set and the surrounding physics is optional. However, this module could serve as an introduction to binding energies in a nuclear and particle physics course for undergraduates.

Time Needed for Students to Complete the Assignments: 2 hours

The notebook can be completed in a standard 2-3 hour lab. Alternatively, in-class time can be shortened by assigning the first part of the notebook to be completed in class (~1-1.5 hours) and the remainder as homework (~1-1.5 hours). There are no follow-up homework assignments planned, but these could be added later.