# Materials+ML Workshop Day 4

![logo](logo.svg)


## Day 4 Agenda:

* Questions about Day 3 Material
* Review of Day 3

Content for today:

* Data Manipulation:
    * The Pandas Package
    * Working with DataFrames
* Visualizing Data
    * The Matplotlib package
    * Visualizing 1D data
    * Visualizing 2D and 3D data

## Background Survey

<img src="survey_qr_2025.png" width=400/>

## [https://forms.gle/ArUHPp2C6TdLF5dQ7](https://forms.gle/ArUHPp2C6TdLF5dQ7)

## The Workshop Online Book:

### [https://cburdine.github.io/materials-ml-workshop/](https://cburdine.github.io/materials-ml-workshop/)

## Tentative Week 1 Schedule:

<table>
    <tr>
        <td>Session</td>
        <td>Date</td>
        <td>Content</td>
    </tr>
    <tr>
        <td>Day 1</td>
        <td>06/09/2025 (2:00-4:00 PM)</td>
        <td>Introduction, Python Data Types</td>
    </tr>
    <tr>
        <td>Day 2</td>
        <td>06/10/2025 (2:00-4:00 PM)</td>
        <td>Python Functions and Classes</td>
    </tr>
    <tr>
        <td>Day 3</td>
        <td>06/11/2025 (2:00-4:00 PM)</td>
        <td>Scientific Computing with Numpy and Scipy</td>
    </tr>
    <tr>
        <td><b>Day 4</b></td>
        <td><b>06/12/2025 (2:00-4:00 PM)</b></td>
        <td><b>Data Manipulation and Visualization</b></td>
    </tr>
    <tr>
        <td>Day 5</td>
        <td>06/13/2025 (2:00-4:00 PM)</td>
        <td>Materials Science Packages, Introduction to ML</td>
    </tr>
</table>

# Questions

Material covered yesterday:
* Installing Python packages
* Numpy
* Scipy

# Review: Day 3

## Numpy Package

* Numpy supplies mathematical functions (such as `sin(x)`, `exp(x)`, etc.)
* Numpy arrays (`numpy.ndarray`) are multi-dimensional data structures
* These arrays can represent vectors, matrices, tensors, etc.

* Creating Numpy arrays:

In [1]:
import numpy as np

# create a 1D array:
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x)

# create a 2D array (matrix):
X = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])
print(X)

[1. 2. 3. 4.]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


* Every array has an attribute `shape`, which is a tuple of integers
* The length of the tuple is the number of dimensions (axes) of the array
* Each entry in the tuple is the size of the array along the corresponding axis

In [2]:
# x is a 1D array of length 4:
print(x.shape)

# X is a 3x3 matrix:
print(X.shape)

# create an array of zeros with a 3x2x2 shape:
S = np.zeros((3,2,2))
print(S.shape)

(4,)
(3, 3)
(3, 2, 2)
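* A short sketch of two attributes closely related to `shape`, plus `reshape`. It assumes only the `S` array from the cell above:

```python
import numpy as np

S = np.zeros((3, 2, 2))

# ndim is the length of the shape tuple (the number of axes)
print(S.ndim)
# size is the total number of entries (the product of the shape entries)
print(S.size)

# reshape returns the same entries arranged in a new shape;
# -1 asks numpy to infer the size of that axis from the total size
T = S.reshape((3, -1))
print(T.shape)
```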


* Numpy arrays can be indexed like Python lists, but with some added features:

In [3]:
X = np.array(range(1,10)).reshape((3,3))
print(X)

# access row 0:
print('Accessing X[0]:')
print(X[0])

# access row 0, column 2:
print('Accessing X[0,2]:')
print(X[0,2])

# access column 0:
print('Accessing X[:,0]:')
print(X[:,0])

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Accessing X[0]:
[1 2 3]
Accessing X[0,2]:
3
Accessing X[:,0]:
[1 4 7]
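* A minimal sketch of the "added features" beyond list indexing: multi-axis slices, negative indices, and boolean masks. It reuses the 3x3 matrix `X` from the cell above:

```python
import numpy as np

X = np.array(range(1, 10)).reshape((3, 3))

# slices select ranges along each axis: rows 0-1, columns 1-2
print(X[0:2, 1:3])

# negative indices count from the end: the last row
print(X[-1])

# a boolean mask selects the entries where the condition is True
mask = X % 2 == 0
print(X[mask])  # the even entries, in row-major order
```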


* Arithmetic operators (`+`, `-`, `*`, `/`, `**`) and Numpy math functions act on arrays elementwise
* Numpy supports matrix multiplication with the `@` operator

In [4]:
A = np.array(range(1,5)).reshape(2,2)
D = np.diag([1,2])

print('A:\n', A)
print('D:\n', D)

# elementwise addition:
print(A + D)

# matrix multiplication:
print(A @ D)

A:
 [[1 2]
 [3 4]]
D:
 [[1 0]
 [0 2]]
[[2 2]
 [3 6]]
[[1 4]
 [3 8]]
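* To contrast the two operators, here is a small sketch using the same `A` and `D` as above. `*` multiplies matching entries, while `@` is the matrix product. The last line shows broadcasting, where a 1D array is applied to every row of a 2D array:

```python
import numpy as np

A = np.array(range(1, 5)).reshape(2, 2)
D = np.diag([1, 2])

# * multiplies matching entries; it is NOT matrix multiplication
elementwise = A * D
print(elementwise)

# @ is the matrix product
matmul = A @ D
print(matmul)

# broadcasting: the length-2 vector d multiplies every row of A,
# scaling column j by d[j] (here this matches A @ D)
d = np.array([1, 2])
scaled = A * d
print(scaled)
```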


* One important `numpy` function we will use a lot today is `np.linspace`:

In [5]:
start = 0.0
end = 10.0
n_pts = 11

# create a 1D array of uniform points:
x_pts = np.linspace(start, end, n_pts)
print(x_pts)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
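* `np.linspace` includes the end point by default, so `n_pts` points have spacing `(end - start) / (n_pts - 1)`. A small sketch comparing it with `endpoint=False` and with `np.arange`, which takes a step size instead of a point count:

```python
import numpy as np

# 5 points including both ends: spacing (1 - 0) / (5 - 1) = 0.25
a = np.linspace(0.0, 1.0, 5)
print(a)   # values 0, 0.25, 0.5, 0.75, 1

# endpoint=False drops the end point: spacing (1 - 0) / 5 = 0.2
b = np.linspace(0.0, 1.0, 5, endpoint=False)
print(b)   # values 0, 0.2, 0.4, 0.6, 0.8

# arange(start, stop, step) never includes stop
c = np.arange(0.0, 1.0, 0.25)
print(c)   # values 0, 0.25, 0.5, 0.75
```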


## Scipy Package
* Scipy provides many useful subpackages for scientific computing

* Subpackages you may find useful include:
    * `scipy.constants`: physical constants, unit conversions
    * `scipy.optimize`:  functions for optimization and root finding
    * `scipy.integrate`: functions for numerical integration
    * `scipy.stats`: statistical analysis functions
    * `scipy.special`: special functions (e.g. Bessel functions)
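* A quick sketch touching each subpackage listed above. The integrand, root-finding target, and Bessel order are arbitrary illustrative choices:

```python
import numpy as np
from scipy import constants, integrate, optimize, special, stats

# scipy.constants: physical constants in SI units
print(constants.k)    # Boltzmann constant (J/K)
print(constants.eV)   # one electron-volt in joules

# scipy.optimize.brentq finds a root of f inside a bracket [a, b]
root = optimize.brentq(lambda x: x**2 - 2.0, 0.0, 2.0)
print(root)           # approximately sqrt(2)

# scipy.integrate.quad returns (integral estimate, error estimate)
value, err = integrate.quad(np.sin, 0.0, np.pi)
print(value)          # approximately 2

# scipy.stats: the standard normal CDF at 0 is 1/2
p = stats.norm.cdf(0.0)
print(p)

# scipy.special.jv is the Bessel function of the first kind J_v(x)
j0 = special.jv(0, 0.0)
print(j0)             # J_0(0) = 1
```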

## New Content:

* More Python packages:
    * Pandas (name derived from "panel data")
    * Matplotlib ("MATLAB-like plotting library")

## Checking if Packages are installed

* The quickest way to check if a package is installed on your system is to import it:

In [6]:
import matplotlib

In [7]:
import pandas
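* If you would rather not trigger an `ImportError`, a sketch using the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The helper name `is_installed` is hypothetical:

```python
import importlib.util

def is_installed(name):
    """Hypothetical helper: True if a top-level module named `name` can be found."""
    return importlib.util.find_spec(name) is not None

# check the packages used in this workshop
for name in ['numpy', 'scipy', 'pandas', 'matplotlib']:
    print(name, 'installed' if is_installed(name) else 'NOT installed')
```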

## Installing Pandas:

In [8]:
!pip install pandas



## Installing Matplotlib:

In [9]:
!pip install matplotlib



## Pandas

* Pandas is an open-source Python package for data manipulation and analysis.
* It can be used for reading and writing data in several different formats, including:
    * CSV (comma-separated values)
    * Excel spreadsheets
    * SQL databases

* We can import pandas as follows:

In [10]:
import pandas as pd

## DataFrames
* Pandas introduces the `DataFrame` type for manipulating data
* We can create DataFrames from Python dictionaries as follows:

In [13]:
# Data on the first four elements of the periodic table:
elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [ 1, 2, 3, 4 ],
    'Mass' : [ 1.008, 4.002, 6.940, 9.012],
    'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]  # He has no Pauling value; 0.0 is a placeholder
}

# construct dataframe from data dictionary:
df = pd.DataFrame(elements_data)

## Tutorial: Working with Pandas DataFrames

* Accessing DataFrame columns
* Filtering DataFrames
* Transforming Data
* Importing and exporting data
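* A compact sketch of the first three tutorial topics on the small elements table. The new column `'Neutrons (approx.)'` is a hypothetical illustration: it estimates the neutron count of the most common isotope by rounding the atomic mass and subtracting the atomic number.

```python
import pandas as pd

elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [ 1, 2, 3, 4 ],
    'Mass' : [ 1.008, 4.002, 6.940, 9.012],
    'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]
}
df = pd.DataFrame(elements_data)

# Accessing columns: df[name] gives a Series; a list of names gives a DataFrame
masses = df['Mass']
subset = df[['Element', 'Mass']]

# Filtering: a boolean Series keeps the rows where it is True
light = df[df['Mass'] < 5.0]
print(light['Element'].tolist())

# combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['Mass'] > 2.0) & (df['Electronegativity'] > 1.0)]
print(both['Element'].tolist())

# Transforming: column arithmetic is elementwise, like numpy arrays
# (hypothetical column: rough neutron count of the most common isotope)
df['Neutrons (approx.)'] = (df['Mass'].round() - df['Atomic Number']).astype(int)
print(df)
```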

## Exercise: Working with Pandas DataFrames

* Exploring the Periodic Table
    * Download the Periodic Table [CSV file](https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee/archive/1d92663004489a5b6926e944c1b3d9ec5c40900e.zip).
    * Answer the following questions:
        * What fraction of elements in the Periodic Table were discovered before 1900?
        * Which elements have at least 100 isotopes?
        * What is the average atomic mass of the radioactive elements?

## Matplotlib

* Matplotlib is a MATLAB-like plotting utility for creating publication-quality plots
* In matplotlib, we typically import the `pyplot` subpackage with the alias `plt`:

In [12]:
import matplotlib.pyplot as plt
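* A minimal first plot, assuming the usual `np`/`plt` aliases. It samples a function with `np.linspace` and draws it on a Figure/Axes pair created by `plt.subplots()`:

```python
import numpy as np
import matplotlib.pyplot as plt

# sample sin(x) on a uniform grid
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)

# plt.subplots() creates a Figure and a single Axes to draw on
fig, ax = plt.subplots()
ax.plot(x, y)   # draw a line through the (x, y) points

# In a Jupyter notebook the figure is displayed at the end of the cell;
# in a plain Python script, call plt.show() to open a window.
```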

## Tutorial: Plotting with Matplotlib

* Plotting 1D data
* Styling plots
* Adding axes labels, titles, legends
* Typesetting
* Plotting in 3D
* Saving figures
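* One sketch covering the tutorial topics above. Styling uses keyword arguments to `plot`. Text between `$...$` is typeset by Matplotlib's built-in "mathtext" (a subset of LaTeX, so no LaTeX installation is needed). The figure is saved to an in-memory buffer so the example writes no files; passing a filename such as `'plot.png'` or `'plot.pdf'` instead saves to disk, with the format inferred from the extension.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 2.0 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 4))

# Styling: color, linestyle, and linewidth keyword arguments
ax.plot(x, np.sin(x), color='tab:blue', linestyle='-', linewidth=2, label=r'$\sin(x)$')
ax.plot(x, np.cos(x), color='tab:red', linestyle='--', label=r'$\cos(x)$')

# Axes labels, title, and legend (legend entries come from label=...)
ax.set_xlabel(r'$x$ (radians)')
ax.set_ylabel(r'$f(x)$')
ax.set_title('Trigonometric functions')
ax.legend()

# Saving: write PNG bytes to a buffer (use a filename to save to disk)
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=150)

# Plotting in 3D: request a 3D projection when adding the axes
fig3d = plt.figure()
ax3d = fig3d.add_subplot(projection='3d')
t = np.linspace(0.0, 4.0 * np.pi, 200)
ax3d.plot(np.cos(t), np.sin(t), t)   # a helix
ax3d.set_xlabel('x')
ax3d.set_ylabel('y')
ax3d.set_zlabel('z')
```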

## Exercises: Plotting with Matplotlib

* Histograms
* Chaotic Dynamical Systems

## Recommended Reading:
* Materials Science Python Packages

Bring your questions to our next meeting tomorrow!