<a href="https://colab.research.google.com/github/albaugh/CHE7507/blob/main/python_packages.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Alex Albaugh.  Wayne State.  CHE 5995/7507.  Lecture 3.  Winter 2026.

Basic Python is great and very functional.  But the true power of Python lies in packages.  Packages are pre-built, specialty pieces of code that you can import and use.  There's no need to reinvent the wheel!  Below we'll explore three common Python packages: <code>matplotlib</code> for plotting/graphing, <code>numpy</code> for data analysis and advanced/efficient math, and <code>pandas</code> for reading in data.

# **MATPLOTLIB**

To use <code>matplotlib</code> we must first <code>import</code> it.  For plotting we only need its <code>pyplot</code> submodule, <code>matplotlib.pyplot</code>.  We can give it a nickname using <code>as</code>, so that we can quickly reference it without having to type <code>matplotlib.pyplot</code> all the time.  The standard nickname for <code>matplotlib.pyplot</code> is <code>plt</code>.

In [None]:
import matplotlib.pyplot as plt

We can now make line graphs using <code>plot</code>.  We first make a 'figure' object and an 'axis' object and then modify them as we wish.

In [None]:
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.5, 2.0, 1.5]

fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()

This looks like crap.  Using proper technical communication principles, we can add axis labels, a title, make the fonts bigger, and throw in some gridlines.  The gridlines are optional, but Alex thinks they look better.

In [None]:
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.5, 2.0, 1.5]

fig,ax = plt.subplots()
ax.plot(x,y)
ax.set_xlabel('x',fontsize=20)
ax.set_ylabel('y',fontsize=20)
ax.set_title('Figure 1: x vs. y',fontsize=24)
ax.grid()
plt.show()

We can plot multiple curves on the same graph.  We can give each a label and then display a legend.  We can also assign the curves to be whichever colors we want.  See this page (https://matplotlib.org/stable/gallery/color/named_colors.html) for a list of colors you can use.  You can look here (https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html) for different linestyles.

In [None]:
x = [1.0, 2.0, 3.0, 4.0]
y1 = [0.0, 1.5, 2.0, 1.5]
y2 = [4.2, 3.1, 2.9, 3.1]
y3 = [1.0, 1.5, 0.5, 0.7]

fig,ax = plt.subplots()
ax.plot(x,y1,color='r',linestyle='-',label='$y_{1}$')
ax.plot(x,y2,color='purple',linestyle='--',label='$y_{2}$')
ax.plot(x,y3,color='lime',linestyle='-.',label='$y_{3}$')
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.set_title('Figure 2: $x$ vs. $y_{i}$',fontsize=24)
ax.legend(fontsize=16,loc='upper right')
ax.grid()
plt.show()

We can also make scatter plots.  Check this documentation (https://matplotlib.org/stable/api/markers_api.html) for the different possible markers.

In [None]:
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y1 = [0.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]
y2 = [4.2, 3.1, 2.9, 3.1, 3.2, 3.1, 2.8]
y3 = [1.0, 1.5, 0.5, 0.7, 0.5, 0.6, 0.1]

fig, ax = plt.subplots()
ax.scatter(x,y1,marker='x',color='r',label='$y_{1}$')
ax.scatter(x,y2,marker='s',color='purple',label='$y_{2}$')
ax.scatter(x,y3,marker='o',color='lime',label='$y_{3}$')
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.set_title('Figure 3: $x$ vs. $y_{i}$',fontsize=24)
ax.legend()
ax.grid()
plt.show()

We can also change the ranges of the axes.

In [None]:
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y1 = [0.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]
y2 = [4.2, 3.1, 2.9, 3.1, 3.2, 3.1, 2.8]
y3 = [1.0, 1.5, 0.5, 0.7, 0.5, 0.6, 0.1]

fig, ax = plt.subplots()
ax.scatter(x,y1,marker='x',color='r',label='$y_{1}$')
ax.scatter(x,y2,marker='s',color='purple',label='$y_{2}$')
ax.scatter(x,y3,marker='o',color='lime',label='$y_{3}$')
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.set_title('Figure 3: $x$ vs. $y_{i}$',fontsize=24)
ax.set_ylim(-5.0, 5.0)
ax.set_xlim(-2.0, 8.0)
ax.legend()
ax.grid()
plt.show()

Another useful feature is that we can set the axes to be logarithmic scale if we want.  This is very useful if the data ranges over many orders of magnitude.

In [None]:
x = [1.0, 20.0, 300.0, 4000.0, 50.0, 600.0, 7000.0]
y1 = [1.0, 10.5, 200.0, 100.5, 1000.0, 50.5, 10.0]
y2 = [40.2, 30.1, 20.9, 300.1, 30.2, 300.1, 2.8]
y3 = [1000.0, 10.5, 6000.5, 400.7, 0.5, 0.6, 10.1]

fig, ax = plt.subplots()
ax.scatter(x,y1,marker='x',color='r',label='$y_{1}$')
ax.scatter(x,y2,marker='s',color='purple',label='$y_{2}$')
ax.scatter(x,y3,marker='o',color='lime',label='$y_{3}$')
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.set_title('Figure 3: $x$ vs. $y_{i}$',fontsize=24)
ax.set_xscale('log')
ax.set_yscale('log')
ax.legend()
ax.grid()
plt.show()
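
The two axes do not have to share the same scale.  As a small sketch with made-up data (the halving sequence below is our own example, not course data), we can make only the y axis logarithmic.  An exponential decay then shows up as a straight line.

```python
import matplotlib.pyplot as plt

# made-up data: y is cut in half every time x increases by 1
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1000.0 * 0.5**xi for xi in x]

fig, ax = plt.subplots()
ax.plot(x, y, color='r', marker='o')
ax.set_yscale('log')  # only the y axis is logarithmic; x stays linear
ax.set_xlabel('$x$', fontsize=20)
ax.set_ylabel('$y$', fontsize=20)
ax.grid()
plt.show()
```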

We can also make histograms using <code>hist</code>.  The <code>bins</code> parameter gives the number of bins you want in your histogram.

In [None]:
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 0.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0, 4.2, 3.1, 2.9, 3.1, 3.2, 3.1, 2.8, 1.0, 1.5, 0.5, 0.7, 0.5, 0.6, 0.1]

fig,ax = plt.subplots()
ax.hist(x,bins=6,color='orange')
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$x$ count',fontsize=20)
ax.set_title('Figure 4: Histogram of $x$',fontsize=24)
ax.grid()
plt.show()
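
To see what <code>bins</code> actually does, it can help to look at what <code>hist</code> hands back.  Here is a minimal sketch reusing the data above: <code>hist</code> returns the count in each bin, the bin edges, and the drawn bars.  The names <code>counts</code>, <code>edges</code>, and <code>patches</code> are just our own choices.

```python
import matplotlib.pyplot as plt

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 0.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0,
     4.2, 3.1, 2.9, 3.1, 3.2, 3.1, 2.8, 1.0, 1.5, 0.5, 0.7, 0.5, 0.6, 0.1]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(x, bins=6, color='orange')

print(counts)       # number of data points in each of the 6 bins
print(edges)        # 7 bin edges, evenly spaced from the smallest to the largest x
print(sum(counts))  # every data point lands in exactly one bin
plt.show()
```

Six bins always need seven edges, so asking for more bins gives narrower bars over the same range.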

With our 'axis' and 'figure' objects, we can make multiple graphs in the same figure.

In [None]:
x1 = [0.0, 1.0, 2.0]
y1 = [2.0,1.0,0.0]
x2 = [0.0, 2.0, 4.0]
y2 = [4.0, 2.0, 0.0]
x3 = [-1.0, -0.5, 2.0]
y3 = [3.0, 4.0, 1.0]
x4 = [0.5, 0.6, 0.7]
y4 = [1.0, 2.0, 4.0]
x5 = [5.0, 6.0, 8.0]
y5 = [-3.0, -4.0, -1.0]
x6= [0.0, 1.0, 2.0]
y6 = [0.0, 1.0, 2.0]

fig, axes = plt.subplots(2,3,figsize=(12,4))
axes[0][0].plot(x1,y1,color='k',label='data 1')
axes[0][1].plot(x2,y2,color='purple',label='data 2')
axes[0][2].plot(x3,y3,color='b',label='data 3')
axes[1][0].plot(x4,y4,color='lime',label='data 4')
axes[1][1].plot(x5,y5,color='gold',label='data 5')
axes[1][2].plot(x6,y6,color='orange',label='data 6')

print(axes.shape)

for axis_row in axes:
  for ax in axis_row:
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.grid()
    ax.legend(loc='upper right')

plt.show()
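
Because <code>axes</code> is a 2-by-3 array of axis objects, the nested loop can also be written as a single loop over <code>axes.flat</code>.  This is a sketch with placeholder lines (the data are made up for illustration), and the <code>tight_layout</code> call is an optional extra.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(12, 4))

# axes.flat visits every panel in row-major order: [0][0], [0][1], [0][2], [1][0], ...
for i, ax in enumerate(axes.flat):
    ax.plot([0, 1, 2], [0, i, 2 * i], label=f'panel {i}')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.grid()
    ax.legend(loc='upper right')

fig.tight_layout()  # keeps the labels of neighboring panels from overlapping
plt.show()
```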



We can also do 3D plots.

---



In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

x = [0.0, 0.7, 1.0, 0.7, 0.0, 0.7, 1.0]
y = [1.0, 0.2, 0.0, 0.2, 1.0, 0.2, 0.0]
z = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

ax.plot(x,y,z,color='r',label='line')

a = [0.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]
b = [4.2, 3.1, 2.9, 3.1, 3.2, 3.1, 2.8]
c = [1.0, 1.5, 0.5, 0.7, 0.5, 0.6, 0.1]

ax.scatter(a,b,c,color='b',marker='o',label='scatter')

ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.set_zlabel('$z$',fontsize=20)
ax.set_title('3D Example',fontsize=24)
ax.legend()
plt.show()

# **NUMPY**

'Numerical Python', <code>numpy</code>, adds in a lot of math functionality and is also great for vector, matrix, and linear algebra computations because it uses its own arrays (like basic Python lists, but optimized under the hood for fast calculations).  Again, we <code>import numpy</code> and its standard nickname is <code>np</code>.

In [None]:
import numpy as np

Numpy has a lot of auxiliary mathematical functions, like the exponential <code>exp</code> and the natural logarithm <code>log</code>.

In [None]:
print(np.exp(3.2))

In [None]:
print(np.log(8.3))

In [None]:
print(np.log(np.exp(-12.4)))

A lot of the power of <code>numpy</code> comes from its arrays, <code>np.array</code>.  These are like basic Python lists, but they are optimized for numerical calculations, linear algebra, and matrix-vector manipulations.  Using these can make our code very efficient and make mathematical manipulations very easy.  We can make <code>numpy</code> vectors as follows:

In [None]:
a = np.array([-1.1, 2.6, 2.1])
b = np.array([5, -2, 9])

print(a)
print(b)

The <code>+</code>, <code>-</code>, <code>*</code> and <code>/</code> operations on these vectors will do element-wise addition, subtraction, multiplication, and division:

In [None]:
print(a+b)
print(a-b)
print(a*b)
print(a/b)
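
This is different from how basic Python lists behave.  A quick side-by-side sketch (the names <code>a_list</code> and <code>b_list</code> are ours):

```python
import numpy as np

a_list = [-1.1, 2.6, 2.1]
b_list = [5, -2, 9]

a = np.array(a_list)
b = np.array(b_list)

print(a_list + b_list)  # lists: + joins them into one list of 6 elements
print(a + b)            # arrays: + adds element by element, giving 3 elements
print(b_list * 2)       # lists: the list is repeated twice
print(b * 2)            # arrays: every element is multiplied by 2
```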

When we apply <code>numpy</code> mathematical operations to a <code>np.array</code>, the operations work on every element.  In the second example the second element of <code>b</code> is a negative number and we cannot take the natural logarithm of a negative number.  We can see that the result for that element is <code>nan</code>, meaning 'not a number', which is a placeholder value for when something goes wrong in a calculation.

In [None]:
print(np.exp(a))
print(np.log(b))
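
If we want to find or remove the <code>nan</code> entries afterwards, one option is <code>np.isnan</code>.  The sketch below is one possible way to do it.  The <code>np.errstate</code> block is only there to silence the warning that <code>numpy</code> would otherwise print.

```python
import numpy as np

b = np.array([5, -2, 9])

# np.errstate silences the warning numpy would otherwise print for log(-2)
with np.errstate(invalid='ignore'):
    log_b = np.log(b)

print(np.isnan(log_b))          # True only where the result is nan
print(log_b[~np.isnan(log_b)])  # keep just the valid entries
print(np.nan == np.nan)         # nan never equals anything, not even itself
```

Because <code>nan</code> is not equal to itself, a test like <code>log_b == np.nan</code> will not find it, so <code>np.isnan</code> is the tool to use.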

We can also raise each element to a power.

In [None]:
print(b**2) #squares each element in b
print(b**0.5) #takes the square root of each element in b
print(np.sqrt(b))

We can also make matrices, two-dimensional arrays, using <code>np.array</code>.

In [None]:
A = np.array([[1.0, 3.4, -5.0],[-2.0, 1.0, 6.1], [1.1, 5.1, 1.0] ])
B = np.array([[3.2, 4.1, 6.0],[-2.5, -6.1, -0.9], [1.1, 5.4, 3.2] ])

print(A)
print(B)

We can easily look up the type of numbers stored in each array, the size of the array (the number of elements in it), the number of dimensions of the array (1 for vectors, 2 for matrices), and the shape of the array (the number of rows and columns):

In [None]:
#get the types of data stored in the array
print(a.dtype)
print(b.dtype)
print(A.dtype)

In [None]:
#get the number of elements in the array
print(a.size)
print(A.size)

In [None]:
#get the number of dimensions of the array (1 for vector, 2 for matrix)
print(a.ndim)
print(A.ndim)

In [None]:
#get the shape of the array- for a 1-D array this will be a one-element tuple like (3,), for a 2-D array it will be (# rows, # columns)
print(a.shape)
print(A.shape)
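
These attributes are related to one another.  A short sketch of how they fit together (the names <code>n_rows</code> and <code>n_cols</code> are ours):

```python
import numpy as np

a = np.array([-1.1, 2.6, 2.1])
A = np.array([[1.0, 3.4, -5.0], [-2.0, 1.0, 6.1], [1.1, 5.1, 1.0]])

print(len(a.shape), a.ndim)   # the length of the shape tuple is the number of dimensions
print(len(A.shape), A.ndim)

n_rows, n_cols = A.shape      # a 2-D shape unpacks into rows and columns
print(n_rows * n_cols, A.size)  # rows times columns is the total number of elements
```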

We can access elements of an array using an index.  For a 1-D array, the index starts at 0 and goes to $N$-1 where $N$ is the number of elements in the array.  This is called '0-indexing', as opposed to '1-indexing', which would start at 1.

In [None]:
print(a)

print(a[0])
print(a[1])
print(a[2])

We can select multiple elements of an array using a slice.

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(a[0:3]) #will select elements 0, 1, and 2


In [None]:
print(a[:3]) #this is equivalent to a[0:3]

In [None]:
print(a[3:7]) #this will select elements 3, 4, 5, and 6

In [None]:
print(a[4:10]) #this will select elements 4, 5, 6, 7, 8, and 9

In [None]:
print(a[4:]) #this is equivalent to a[4:10]
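
Slices have a couple more tricks that go slightly beyond the examples above.  As a brief sketch, negative indices count from the end of the array and a third number in the slice sets the step.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(a[-1])     # the last element
print(a[-3:])    # the last three elements: 7, 8, and 9
print(a[::2])    # every second element: 0, 2, 4, 6, and 8
print(a[1:8:3])  # start at 1, stop before 8, step by 3: elements 1, 4, and 7
```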

For matrices, we need to specify two indices (the row first, then the column) to access an element.

In [None]:
A = np.array([[1.0, 3.4, -5.0],[-2.0, 1.0, 6.1], [1.1, 5.1, 1.0] ])
print(A)

print(A[0,1])

We can use slices to select entire rows and entire columns.

In [None]:
print(A[:,1]) #this will select the entire second column

In [None]:
print(A[:,1:]) #this will select the second through final columns

In [None]:
print(A[1,:]) #this will select the entire second row

In [None]:
print(A[:2,:]) #this will select the first through the second rows
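
Row and column slices can be combined to cut out a block of a matrix.  The sketch below also shows a detail that is easy to miss: a single index gives back a 1-D array, while a slice keeps the result 2-D.

```python
import numpy as np

A = np.array([[1.0, 3.4, -5.0], [-2.0, 1.0, 6.1], [1.1, 5.1, 1.0]])

block = A[0:2, 1:3]     # rows 0 and 1, columns 1 and 2
print(block)

print(A[:, 1].shape)    # a single column index gives a 1-D array
print(A[:, 1:2].shape)  # a one-column slice stays 2-D, with 3 rows and 1 column
```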

We can also use <code>reshape</code> to change the number of rows and columns in an array.

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(a.reshape((2,5)))

In [None]:
print(a.reshape((5,2)))

In [None]:
print(a.reshape((1,10))) #this will still technically create a 2-D array, which is 1 by 10
print(a.reshape((1,10)).shape)

In [None]:
print(a.flatten()) #this will flatten the data into a true 1-D array
print(a.flatten().shape)
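
A reshape only works if the new shape holds exactly the same number of elements.  In the sketch below we let <code>numpy</code> work out one of the dimensions by passing <code>-1</code>, and we catch the error from an impossible shape.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

two_rows = a.reshape((2, -1))    # numpy works out that -1 must be 5
print(two_rows.shape)

print(two_rows.flatten().shape)  # flattening the 2-D array gives back 10 elements in 1-D

try:
    a.reshape((3, 4))            # 3*4 = 12 slots, but a only has 10 elements
except ValueError as err:
    print('reshape failed:', err)
```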

<code>numpy</code> also has built-in functions to generate sequences of data.  <code>np.linspace</code> generates a sequence based on a starting point, an ending point, and the total number of points (both endpoints are included).  <code>np.arange</code> generates a sequence based on a starting point, an ending point (which is not included), and an increment.

In [None]:
start = 0.0
end = 20.0
points = 15
x = np.linspace(start,end,points)
print(x)

In [None]:
start = 0.0
end = 12.5
increment = 0.5
x = np.arange(start, end, increment)
print(x)
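
The difference in how the two functions treat the ending point is easiest to see side by side.  A small sketch with values chosen just for illustration:

```python
import numpy as np

x_lin = np.linspace(0.0, 1.0, 5)  # 5 points, both endpoints included
x_ar = np.arange(0.0, 1.0, 0.25)  # steps of 0.25, stops before reaching 1.0

print(x_lin)
print(x_ar)
print(x_lin[1] - x_lin[0])  # linspace spacing is (end - start)/(points - 1)
```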

<code>np.linspace</code> and <code>np.arange</code> are particularly great for when you want to plot functions.  Here we'll plot the function $y=x^2$.

In [None]:
#this will get me 1000 values of x between -2 and 2
x = np.linspace(-2,2,1000)

#we can build the y values from x now
y = x**2

#now we plot
fig, ax = plt.subplots()
ax.plot(x,y,color='m',linestyle=':',linewidth=3)
ax.set_xlabel('$x$',fontsize=20)
ax.set_ylabel('$y$',fontsize=20)
ax.grid()
plt.show()

<code>numpy</code> is also great for statistics and probability.  We can easily calculate the mean (<code>np.mean</code>), standard deviation (<code>np.std</code>), and variance (<code>np.var</code>) of arrays of data.

In [None]:
a = np.array([-9.0, 8.0, -5.4, 3.0, 0.2, -2.7, 6.8])

print(np.mean(a)) #average
print(np.std(a)) #standard deviation
print(np.var(a)) #variance
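
To see what these functions compute, we can build the same numbers by hand.  This sketch uses the formulas $\mu = \frac{1}{N}\sum_i a_i$ and $\sigma^2 = \frac{1}{N}\sum_i (a_i - \mu)^2$.  By default <code>np.std</code> and <code>np.var</code> divide by $N$; the <code>ddof=1</code> option divides by $N-1$ instead, which gives the sample standard deviation.

```python
import numpy as np

a = np.array([-9.0, 8.0, -5.4, 3.0, 0.2, -2.7, 6.8])

mean = np.sum(a) / a.size
var = np.sum((a - mean)**2) / a.size  # divides by N
std = np.sqrt(var)

print(mean, np.mean(a))
print(var, np.var(a))
print(std, np.std(a))

print(np.std(a, ddof=1))  # divides by N-1 instead, so it is slightly larger
```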


We can also generate data according to random distributions.  For example, we can randomly generate a uniform number between two values with <code>np.random.uniform</code>.

In [None]:
#named low and high so that we don't overwrite Python's built-in min and max functions
low = 0.0
high = 1.0
print(np.random.uniform(low,high))

We can also generate an array of random numbers.

In [None]:
print(np.random.uniform(low,high,10))

Let's check the distribution with a histogram.

In [None]:
x = np.random.uniform(low,high,1000000)
fig,ax = plt.subplots()
ax.hist(x,bins=50,color='r')
ax.set_xlabel('random number')
ax.set_ylabel('count')
ax.grid()
plt.show()
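
Every run of the cells above gives different numbers.  If we want results we can reproduce, one option is a seeded random number generator.  Note that this sketch uses <code>np.random.default_rng</code>, a different <code>numpy</code> interface from the <code>np.random.uniform</code> calls above, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng1 = np.random.default_rng(seed=42)  # 42 is an arbitrary choice of seed
rng2 = np.random.default_rng(seed=42)

x1 = rng1.uniform(0.0, 1.0, 5)
x2 = rng2.uniform(0.0, 1.0, 5)

print(x1)
print(x2)  # identical to x1 because both generators started from the same seed
```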

Another useful random distribution is the normal (or Gaussian) distribution, also known as a bell curve, which we can sample with <code>np.random.normal</code>.  Here we need to input the mean and standard deviation of the distribution we want.

In [None]:
mean = 10.0
std = 5.0
print(np.random.normal(mean,std))

As before, we can generate a whole array of random normal numbers.

In [None]:
print(np.random.normal(mean,std,10))

And we can also check the distribution.

In [None]:
x = np.random.normal(mean,std,1000000)
fig,ax = plt.subplots()
ax.hist(x,bins=100,color='r')
ax.set_xlabel('random normal number')
ax.set_ylabel('count')
ax.grid()
plt.show()

Just for fun, let's check the mean and standard deviation of the generated data.

In [None]:
print(np.mean(x))
print(np.std(x))

# **PANDAS**

The <code>pandas</code> library in Python is great for reading data from a file and organizing the data prior to computation.  The name comes from "panel data".  The standard shorthand for <code>pandas</code> is <code>pd</code>.

In [None]:
import pandas as pd

To read a file we can use the <code>read_csv</code> function.  A .csv file is a "comma-separated value" file where different entries in a row are separated by commas.  The first row is usually labels for each column.  Each of the subsequent rows is a data point.  Let's load an example file called <code>storage_tank_data.csv</code>.  The first column is the output voltage (in millivolts) of a load sensor at the bottom of a tank.  The next column is the height of liquid in the tank (meters), followed by the ambient temperature (Fahrenheit) and ambient pressure (atm).

I will host any necessary files on my GitHub page, so that you can access them directly from a static URL, like so:

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/albaugh/CHE7507/refs/heads/main/Lecture3/storage_tank_data.csv')

Sometimes the dividers between data are not commas.  In a tab-separated value file the different values in a row are separated by tabs.  For this and other file types that use a separator other than a comma, we can use the <code>sep</code> option with <code>read_csv</code>.

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/albaugh/CHE7507/refs/heads/main/Lecture3/storage_tank_data.txt',sep='\t')
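
To see what the separator does without downloading anything, we can feed <code>read_csv</code> some text directly using <code>io.StringIO</code>.  This is only a sketch: the file contents and the column name <code>level (m)</code> are made up for illustration and are not the real storage tank data.

```python
import io
import pandas as pd

# made-up file contents for illustration; not the real storage tank data
csv_text = "output (mV),level (m)\n1.0,0.1\n2.0,0.2\n"
tsv_text = "output (mV)\tlevel (m)\n1.0\t0.1\n2.0\t0.2\n"

df_csv = pd.read_csv(io.StringIO(csv_text))            # commas are the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep='\t')  # tabs need sep='\t'

print(df_csv)
print(df_csv.equals(df_tsv))  # both texts describe the same table
```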

In either case, we have loaded the file data into a <code>pandas dataframe</code>, which we have called <code>df</code>.  We can take a look at that dataframe.

In [None]:
print(df)

We can look at just the first few rows with <code>head</code>, just the last few rows with <code>tail</code> and the shape of the data with <code>shape</code>.

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.shape

We can see what the labels of our columns are with <code>columns</code>.

In [None]:
df.columns

We can select subsets of data using the column names.  For example, if I just wanted an array of the output values, I could do the following.

In [None]:
output = df['output (mV)']
print(output)

We can use this new column (which <code>pandas</code> calls a Series) in <code>numpy</code> operations.

In [None]:
print(np.sum(output))
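
Strictly speaking, a single column of a dataframe is a <code>pandas</code> Series rather than a <code>numpy</code> array.  If we ever need a plain array, <code>to_numpy</code> will convert it.  The sketch below uses a tiny made-up dataframe so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'output (mV)': [1.0, 2.0, 3.0]})  # made-up values

output = df['output (mV)']
print(type(output))        # a pandas Series, which carries row labels

output_array = output.to_numpy()
print(type(output_array))  # a plain numpy array

print(np.sum(output), np.sum(output_array))  # numpy functions accept either one
```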

We can select multiple columns with the column header names or with slices.

In [None]:
ambient = df[['temperature (F)','pressure (atm)']]
print(ambient)

In [None]:
output_and_level = df[df.columns[0:2]]
print(output_and_level)

We can also select multiple rows individually or with slices.

In [None]:
print(df.iloc[[0,2,10,99]])

In [None]:
print(df.iloc[45:61])
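
<code>iloc</code> selects rows by position and follows the same rules as the array slices from earlier, so the end of a slice is not included.  A small sketch on a made-up five-row dataframe:

```python
import pandas as pd

df = pd.DataFrame({'output (mV)': [10.0, 20.0, 30.0, 40.0, 50.0]})  # made-up values

print(df.iloc[[0, 2]])  # the rows at positions 0 and 2
print(df.iloc[1:3])     # the rows at positions 1 and 2; position 3 is not included
print(df.iloc[-1])      # the last row, returned as a Series
```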

We can do some plotting with the dataframe, too.

In [None]:
fig,ax = plt.subplots()
ax.scatter(df[df.columns[0]], df[df.columns[1]], color='r')
ax.set_xlabel(df.columns[0],fontsize=20)
ax.set_ylabel(df.columns[1],fontsize=20)
ax.grid()
plt.show()