# How Heavy is that Penny? (Piloted Spring 2025) (Student Version)

## ***Important: File → Save a copy in Drive (Do this first so you can save your work!)***

Pennies are manufactured to a given specification, including a specific mass.  Do those specifications change predictably over time?  Does normal wear and tear change the mass predictably over time?  

Because of the randomness inherent in both manufacturing and normal wear and tear, statistical testing is required to answer such questions.

This exercise will require insight and decision-making. Use your teamwork skills to decide as a group how to interpret the results and how to proceed at each step of these analysis tasks.

**Prior Knowledge Needed:**
*  Basic statistical definitions including mean and standard deviation

**Content Learning Objectives:**
*   Define and use statistical concepts that model the effects of random error on a data set
*   Use careful documentation and/or the statistical Grubbs Test to decide when a data point can be discarded
*   Apply the Student T test to accept or reject a null hypothesis under several different sets of conditions

**Process Learning Objectives:**
*   Use Python code to transform data using structures such as arrays
*   Use Python code to visualize data using different types of graphs
*   Work in teams to manage and share data, and document and share causes of outlier data
*   Work in teams to evaluate outcomes of statistical tests

**Overview:**

This Jupyter notebook can be used to generate code in Python to perform a variety of statistical analysis tasks:

1.  Upload multiple CSV files using Pandas.
2.  Reorganize class data into NumPy arrays; calculate the mean and standard deviation over each NumPy array.
3.  Use the statistical Grubbs test to recognize and remove true outliers.  
  *Optionally, take additional action to document and correct possible data entry errors*.
4.  Apply the statistical Student T test to determine whether two sets of data are significantly different.
5.  Perform a linear least-squares analysis with error propagation, and determine whether the slope of the best-fit line is significantly different from zero.
6.  Construct a histogram of stored data; construct a Gaussian model distribution; and use a statistical chi-squared test to determine whether the histogram is significantly different from the Gaussian model.




### Task 1: Uploading multiple CSV files using Pandas

#### **Important**: Record the masses and years of all available pennies first.  Each student's recorded data should be compiled in Microsoft Excel to generate and export a CSV file with masses in the first column and years in the second column, with the word "Brass" or "Zinc" and the student's initials in the filename.

#### **Important**: Upload the CSV files to Colab:
* First click the file folder icon to expand the Files sidebar.
* Next click the file upload icon to upload ALL CSV files in class data.
* Enter the filenames into the code as instructed below.

To upload multiple CSV files containing data on different types of pennies, we will use several packages.  The IO package allows us to read files, and the Pandas package allows us to interpret CSV files in the database format using the rows and columns of the CSV file.  This will import the CSV data into a dataframe within this Jupyter notebook.

The sample-code here uses several data structures: *lists*; *dictionaries*; and *dataframes*.  It also uses several coding structures: a *function* that is used more than once to upload and combine the named CSV files into a dataframe; and a *for-loop* (which uses "for") to convert each uploaded file one-by-one.
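
As a minimal sketch of how these pieces fit together, the toy example below builds two small CSV "files" in memory with `io.StringIO`, reads each one with `pd.read_csv`, stores the resulting dataframes in a dictionary, and combines them with `pd.concat`.  The file names and values are made up for illustration (they are not class data), and the column headers `Mass (g)` and `Year` are assumed to match the ones used later in this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for two uploaded files
fake_files = {
    "Brass_AB.csv": "Mass (g),Year\n3.11,1975\n3.09,1978\n",
    "Brass_CD.csv": "Mass (g),Year\n3.12,1975\n3.08,1980\n3.10,1978\n",
}

frames = {}
for index, (name, text) in enumerate(fake_files.items()):
    # io.StringIO lets read_csv treat a string like an open file
    frames[index] = pd.read_csv(io.StringIO(text))
    print(f"{name}: {len(frames[index])} rows")

# Concatenating a dictionary stacks the rows; the dictionary keys
# become an outer index level recording which file each row came from
combined = pd.concat(frames)
print(combined)
```

Printing `combined.loc[1]` would show only the rows that came from the second toy file, which can help when tracing a suspicious value back to its source.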

1a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

1b) Enter **all** filenames of uploaded CSV files into the sample-code below within square brackets, separated by commas.  Each filename must be enclosed in quotes (like the example shown here):

---
```python
['filename_1','filename_2',...]
```
---

1c) Copy/paste all the code below, with your commented explanations into the **code cell** just below this text cell, and run it.  

1d) Finally, **answer the key question below** in your laboratory notebook. Discuss as a group. Ask for help if needed.

#### **Thinking About the Data, Task 1**: Suppose you thought you uploaded all the data files, but when you run the code below, the output doesn't look right.  *What should you do if*:
*   Q1a) Running the code below yields an error message.
*   Q1b) Running the code below yields the output "Importing 0 files:" and "Importing 0 files:"

---
```python
#### This streamlined code REQUIRES students to first upload the files into the file browser, or connect to Google Drive.
#### The code below is for:
brass_filenames = ['filename1.csv','filename2.csv']
zinc_filenames = ['filename3.csv','filename4.csv']

#### The code below is for:
import pandas as pd
import io

#### The code below is for:
def multiple_csv_dataframe(filenames):
    #### The code below is for:
    length = len(filenames)
    print (f"Importing {length} files:")

    #### The code below is for:
    dictionary = {}
    index = 0
    #### The code below is for:
    for filename in filenames:
        #### The code below is for:
        print(filename)       
        #### The code below is for:
        dictionary[index] = pd.read_csv(filename)
        index = index + 1

    #### The code below is for:
    print(f"Combining {length} dataframes")
    combined_dataframe = pd.concat(dictionary)
    return combined_dataframe

#### The code below is for:
big_db = multiple_csv_dataframe(brass_filenames)
big_dz = multiple_csv_dataframe(zinc_filenames)
```
---

### Task 2: Reorganizing class data into NumPy arrays.

#### **Important**: Complete Task 1 first.

The sample-code here reorganizes data from the imported dataframes into NumPy arrays, covering all years for which data are available.  A *dictionary* allows each array to be accessed by year.  A *for-loop* ensures all available data are parsed.  
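
The idea can be sketched on a toy dataframe, as below.  The masses and years are made up, and this sketch groups by the column name as a plain string (so each group key is a single year), which differs slightly from the sample-code further down.

```python
import numpy as np
import pandas as pd

# Made-up masses and years, for illustration only
toy = pd.DataFrame({
    "Mass (g)": [3.11, 3.09, 3.12, 3.08, 3.10, 3.13],
    "Year": [1975, 1978, 1975, 1980, 1978, 1975],
})

mass_by_year = {}  # dictionary: year -> NumPy array of masses
for year, group in toy.groupby("Year"):
    mass_by_year[year] = group["Mass (g)"].to_numpy()

for year, masses in mass_by_year.items():
    # ddof=1 gives the sample standard deviation (divides by N-1);
    # a year with only one penny has no defined sample standard deviation
    sd = np.std(masses, ddof=1) if len(masses) > 1 else float("nan")
    print(f"{year}: N={len(masses)}, mean={np.mean(masses):.4f}, s={sd:.4f}")
```

A dictionary is used because the keys (years) do not have to start at zero or be consecutive, unlike the positions in an array.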

2a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

2b) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

2c) Take a look at the output and **decide as a group** if it looks reasonable.  Ask for help if needed.

2d) Finally, **answer the key question below** in your laboratory notebook.  Discuss as a group.  Ask for help if needed.

#### **Thinking About the Data, Task 2**: Here are the focus questions at the top of this document.  How could the output from Task 2 help you to answer these questions?  For each question, *make a prediction* as to what you might see in the output below if the answer is yes, and how that might differ if the answer is no.  Finally, take a look at the output and write down what you notice.
*   Q2a) Pennies are manufactured to a given specification, including a specific mass.  Do those specifications change predictably over time?
*   Q2b) Does normal wear and tear change the mass predictably over time?
*   Q2c) Is there anything you notice right away in the output?

---
```python
#### The code below is for:
import numpy as np

#### The code below is for:
print(f"Creating separate arrays for masses of brass pennies in each year...")
y1b = np.min(big_db.iloc[:,1])
y2b = np.max(big_db.iloc[:,1])
print(f"...from {y1b} to {y2b}")
#### The code below is for:
big_grouped_db = big_db.groupby(by=["Year"])

#### The code below is for:
brass_mass_by_year = {}
#### The code below is for:
for year, data in big_grouped_db:
    y = year[0]
    year_array = data["Mass (g)"].values
    brass_mass_by_year[y] = year_array
    print(f"For year {y}, {len(year_array)} masses were counted, with average and standard deviation of {np.average(year_array):.4f} ± {np.std(year_array,ddof=1):.4f} g.")

#### The code below is for:
print(f"Creating separate arrays for masses of zinc pennies in each year...")
y1z = np.min(big_dz.iloc[:,1])
y2z = np.max(big_dz.iloc[:,1])
print(f"...from {y1z} to {y2z}")
#### The code below is for:
big_grouped_dz = big_dz.groupby(by=["Year"])

#### The code below is for:
zinc_mass_by_year = {}
#### The code below is for:
for year, data in big_grouped_dz:
    y = year[0]
    year_array = data["Mass (g)"].values
    zinc_mass_by_year[y] = year_array
    print(f"For year {y}, {len(year_array)} masses were counted, with average and standard deviation of {np.average(year_array):.4f} ± {np.std(year_array,ddof=1):.4f} g.")

```
---

### Task 3: Use the statistical Grubbs test to recognize outliers.

#### **Important**: Complete Task 2 first.

The sample-code here uses the Grubbs test to recognize outliers.  The test is provided by the `smirnov_grubbs` module of the `outliers` library, which is installed as the `outlier_utils` package.  It returns a copy of the data with any outliers it finds removed.
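
To see what such a test does, here is a minimal hand-rolled sketch of the two-sided Grubbs statistic, $G = \max|x_i - \bar{x}|/s$, compared against a critical value computed from the Student T distribution.  This is an illustration under stated assumptions, not the internals of `outlier_utils`: the function names `grubbs_statistic` and `grubbs_critical` are hypothetical, and the masses are made up.

```python
import numpy as np
from scipy import stats

def grubbs_statistic(values):
    """Return (G, suspect), where suspect is the value farthest from the mean."""
    values = np.asarray(values, dtype=float)
    deviations = np.abs(values - values.mean())
    suspect = values[np.argmax(deviations)]
    return deviations.max() / values.std(ddof=1), suspect

def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of G for n measurements."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

# Made-up masses: six similar values and one suspiciously light one
masses = [3.11, 3.10, 3.12, 3.09, 3.11, 3.10, 2.51]
g_calc, suspect = grubbs_statistic(masses)
g_crit = grubbs_critical(len(masses))
print(f"G_calc = {g_calc:.3f}, G_crit = {g_crit:.3f}, suspect = {suspect}")
print("Outlier" if g_calc > g_crit else "No outlier at 95% confidence")
```

If the calculated $G$ exceeds the critical value, the suspect point is flagged; deciding *why* it is an outlier still requires human judgment.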

3a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

3b) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

3c) Take a look at the output and **decide as a group** if it looks reasonable. Ask for help if needed.

3d) Finally,  **answer the key question below** in your laboratory notebook, discuss as a group, and take any steps your group deems appropriate.  Ask for help if needed.

#### **Thinking About The Data, Task 3**: Suppose when you run the code below, an outlier was found.  Everyone else sees the same result when they run the code, and talks about it.  What should **you** do if running the code below yields an outlier, and:
*   Q3a) The student who measured that mass says it was a typo, and offers the correct value?
*   Q3b) The student who measured that mass says it was zinc and not brass?
*   Q3c) Nobody seems to have measured that mass?

***After answering these questions:*** Take any steps your group deems appropriate for the results *you* see in the output below.  

---
```python
#### The code below is for:
%pip install outlier_utils  --quiet
from outliers import smirnov_grubbs as grubbs
print("Package installed!")

#### The code below is for:
brass_avg_by_year = np.empty(0)
#### The code below is for:
for y, data in big_grouped_db:
    year = y[0]
    #### The code below is for:  
    data1 = brass_mass_by_year[year]
    #### The code below is for:  
    data2 = grubbs.test(data1, alpha=.05)    
    #### The code below is for:
    if(len(data2)==len(data1)):
        print(f"Year {year}: All {len(data1)} masses of brass pennies passed Grubbs test")
    #### The code below is for:  
    else:
        failed = len(data1) - len(data2)
        print(f"Year {year}: {failed} masses of brass pennies failed Grubbs test. Masses rejected: {np.setdiff1d(data1,data2)}")
        brass_mass_by_year[year] = data2
    #### The code below is for:
    brass_avg_by_year = np.append(brass_avg_by_year,np.average(data2))


#### The code below is for:
zinc_avg_by_year = np.empty(0)
#### The code below is for:
for y, data in big_grouped_dz:
    year = y[0]
    #### The code below is for:  
    data1 = zinc_mass_by_year[year]
    #### The code below is for:  
    data2 = grubbs.test(data1, alpha=.05)    
    #### The code below is for:  
    if(len(data2)==len(data1)):
        print(f"Year {year}: All {len(data1)} masses of zinc pennies passed Grubbs test")
    #### The code below is for:
    else:
        failed = len(data1) - len(data2)
        print(f"Year {year}: {failed} masses of zinc pennies failed Grubbs test. Masses rejected: {np.setdiff1d(data1,data2)}")
        zinc_mass_by_year[year] = data2
    #### The code below is for:
    zinc_avg_by_year = np.append(zinc_avg_by_year,np.average(data2))

```
---

### Task 4: Apply the statistical Student T test to determine whether two sets of data are significantly different.

#### **Important**: Complete Task 3 first.  Look at the output to verify it is reasonable.  Take any actions deemed necessary.

The sample-code here uses a Student T test to look for significant differences between masses of pennies with the same composition, manufactured in different years.  First, we will use the saved yearly averages from Task 3 to choose the two years of manufacture that differ the most from one another, for each manufacturing material (brass and zinc).  We must recalculate the mean values and standard deviations for those years, in case outliers were removed in Task 3.

The Student T statistic can be calculated (using the Python math module) for two comparable data sets using a pooled standard uncertainty:

$t_{calc} = \frac{|\bar{x}_1-\bar{x}_2|}{u_{pooled}}$ where
$u_{pooled} = s_{pooled}\cdot\sqrt{\frac{N_1+N_2}{N_1\cdot N_2}}$ and
$s_{pooled} = \sqrt{\frac{s_1^2\cdot (N_1-1)+s_2^2\cdot (N_2-1)}{N_1+N_2-2}}$.

The critical value of the Student T statistic can be found in your textbook for a given confidence level (usually 95%), or looked up using the SciPy package.  If the calculated Student T statistic is larger than the critical value, then the difference between the two data sets is significant at this level of confidence.

The SciPy package also has pre-programmed Student T test functions.  These do not output the critical T value, but they do output the p-value, which is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true.  If the p-value is 0.05 or less, then the difference between the two data sets is significant at the 95% level of confidence.
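
The two decision rules (comparing $t_{calc}$ with the critical value, or comparing the p-value with 0.05) should lead to the same conclusion.  The sketch below checks this on two made-up data sets, assuming a two-tailed test with pooled variance; the arrays `year_a` and `year_b` are hypothetical.

```python
import numpy as np
from scipy import stats

# Made-up masses (g) for two hypothetical years
year_a = np.array([3.11, 3.09, 3.12, 3.10, 3.08])
year_b = np.array([3.05, 3.07, 3.04, 3.06, 3.08])

# ttest_ind uses pooled (equal) variances by default
result = stats.ttest_ind(year_a, year_b)
dof = len(year_a) + len(year_b) - 2

# Two-tailed critical t at 95% confidence
t_crit = stats.t.ppf(0.975, dof)
# Two-tailed p-value recomputed from the t statistic
p_manual = 2 * stats.t.sf(abs(result.statistic), dof)

print(f"t_calc = {abs(result.statistic):.3f}, t_crit = {t_crit:.3f}")
print(f"p (ttest_ind) = {result.pvalue:.4f}, p (manual) = {p_manual:.4f}")
# True when both rules give the same verdict
print((abs(result.statistic) > t_crit) == (result.pvalue < 0.05))
```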

We will run the test two ways: first using sample-code to build a test function that can accept NumPy arrays as input, and then using the pre-programmed test function.  

4a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

4b)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

4c)  Take a look at the output and **decide as a group** if it looks reasonable. Ask for help if needed.

4d)  Finally,  **answer the key question below** in your laboratory notebook.  Discuss as a group.  Ask for help if needed.

#### **Thinking About The Data, Task 4**: Now that you have run the Student T test, what does it mean?
*   Q4a) State the *null hypothesis* in your own words.  Be specific.
*   Q4b) State the conclusions you should draw if $t_{calc} < t_{table}$, and if $t_{calc} > t_{table}$.
*   Q4c) State the conclusions you should draw if $p < 0.05$, and if $p > 0.05$.
*   Q4d) **As a group**, use the output from Task 4 to begin to answer the focus questions, in light of the predictions you made in Task 2 for both the brass pennies and zinc pennies.  

---
```python
#### The code below is for:
import math
import scipy as sp
import scipy.special
import scipy.stats

#### The code below is for:
def t_calc_pooled(array1,array2):
    #### The code below is for:
    n1 = len(array1)
    n2 = len(array2)
    #### The code below is for:
    x1 = np.average(array1)
    x2 = np.average(array2)
    #### The code below is for:
    s1 = np.std(array1,ddof=1)
    s2 = np.std(array2,ddof=1)
    
    #### The code below is for:
    s_pooled = math.sqrt((((s1**2)*(n1-1))+((s2**2)*(n2-1)))/(n1+n2-2))
    
    #### The code below is for:
    t_calc = (abs(x1-x2))*math.sqrt((n1*n2)/(n1+n2))/s_pooled
    #### The code below is for:
    return t_calc

#### The code below is for:
brass_years = np.sort(big_db["Year"].unique())
index_brass_light = brass_years[np.argmin(brass_avg_by_year)]
index_brass_heavy = brass_years[np.argmax(brass_avg_by_year)]
#### The code below is for:
print(f"Student T test for brass masses from years {index_brass_light} and {index_brass_heavy}, which differ the most:")

#### The code below is for:
t_calc_brass = t_calc_pooled(brass_mass_by_year[index_brass_light],brass_mass_by_year[index_brass_heavy])
#### The code below is for:
dof_brass = len(brass_mass_by_year[index_brass_light]) + len(brass_mass_by_year[index_brass_heavy])-2
#### The code below is for:
t_crit_brass = abs(sp.special.stdtrit(dof_brass,0.025))
#### The code below is for:
print(f"  Calculated t value = {t_calc_brass:.2f}; Critical t value at 95 percent confidence level = {t_crit_brass:.2f}")

#### The code below is for:
brass_packaged_result = sp.stats.ttest_ind(brass_mass_by_year[index_brass_light],brass_mass_by_year[index_brass_heavy])
print(f"  SciPy TtestResult: Calculated t value = {abs(brass_packaged_result.statistic):.2f}; p-value = {brass_packaged_result.pvalue:.2e}")

#### The code below is for:
zinc_years = np.sort(big_dz["Year"].unique())
index_zinc_light = zinc_years[np.argmin(zinc_avg_by_year)]
index_zinc_heavy = zinc_years[np.argmax(zinc_avg_by_year)]
#### The code below is for:
print(f"Student T test for zinc masses from years {index_zinc_light} and {index_zinc_heavy}, which differ the most:")

#### The code below is for:
t_calc_zinc = t_calc_pooled(zinc_mass_by_year[index_zinc_light],zinc_mass_by_year[index_zinc_heavy])
#### The code below is for:
dof_zinc = len(zinc_mass_by_year[index_zinc_light]) + len(zinc_mass_by_year[index_zinc_heavy])-2
#### The code below is for:
t_crit_zinc = abs(sp.special.stdtrit(dof_zinc,0.025))
#### The code below is for:
print(f"  Calculated t value = {t_calc_zinc:.2f}; Critical t value at 95 percent confidence level = {t_crit_zinc:.2f}")

#### The code below is for:
zinc_packaged_result = sp.stats.ttest_ind(zinc_mass_by_year[index_zinc_light],zinc_mass_by_year[index_zinc_heavy])
print(f"  SciPy TtestResult: Calculated t value = {abs(zinc_packaged_result.statistic):.2f}; p-value = {zinc_packaged_result.pvalue:.2e}")

```
---

### Task 5: Perform and apply a linear least-squares analysis with error propagation

In the previous task, we ran a test to see if there is a significant difference between two data sets. Now, we ask whether there is a significant trend among many data sets. To accomplish this, we will plot the data and use linear regression with error propagation.  

In linear regression, we fit the available data to a linear model.  The model has the equation of a line: $y = mx + b$, where $m$ is the slope and $b$ is the intercept.  With error propagation, the fit will also yield standard uncertainties $u_m$ and $u_b$ of the slope and intercept, respectively.
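
For reference, the quantities involved can be computed directly with NumPy.  The sketch below is a hand-written illustration, not the method used by the sample-code: the function name `linear_fit_with_uncertainty` is hypothetical, the data are made up, and the uncertainties of the slope and intercept are written as `u_m` and `u_b`.

```python
import numpy as np

def linear_fit_with_uncertainty(x, y):
    """Least-squares line y = m*x + b with standard uncertainties u_m and u_b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_mean, y_mean = x.mean(), y.mean()
    s_xx = np.sum((x - x_mean) ** 2)
    m = np.sum((x - x_mean) * (y - y_mean)) / s_xx
    b = y_mean - m * x_mean
    # Scatter of the points about the line, with N-2 degrees of freedom
    s_y = np.sqrt(np.sum((y - (m * x + b)) ** 2) / (n - 2))
    u_m = s_y / np.sqrt(s_xx)
    u_b = s_y * np.sqrt(1.0 / n + x_mean**2 / s_xx)
    return m, b, u_m, u_b

# Made-up data: four points that do not fall exactly on a line
m, b, u_m, u_b = linear_fit_with_uncertainty([0, 1, 2, 3], [1, 3, 2, 4])
print(f"m = {m:.3f} ± {u_m:.3f}, b = {b:.3f} ± {u_b:.3f}")
```

Comparing the size of `m` with `u_m` is one way to judge whether a slope is distinguishable from zero.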

In previous examples, you ran sample-code for both types of pennies (brass and zinc).  Here, the sample-code will be provided for brass pennies and you must modify it to generate your own code cell to analyze zinc pennies.

5a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

5b)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

5c)  Next, copy/paste the code into the **second code cell** below and **modify** the code to make a corresponding plot for the zinc pennies.  

5d)  Take a look at the output for each type of penny and **decide as a group** if it looks reasonable. Ask for help if needed.

5e)  Finally,  **answer the key question below** for each type of penny (brass and zinc) in your laboratory notebook.  Discuss as a group.  Ask for help if needed.

#### **Thinking About the Data, Task 5**: Now that you have found the best-fit line and its standard error, what does it mean?
*   Q5a) State the *null hypothesis* in your own words.  Be specific.
*   Q5b) State the conclusions you should draw if $m < u_m$, and if $m > u_m$.
*   Q5c) Write one complete sentence to explain the meaning of $b$.
*   Q5d) **As a group**, draw conclusions regarding both types of pennies from the output of Task 5.  Explain how these relate to the focus questions, and how you might modify your initial evaluations from Task 4.

***Challenge Question***: (for a bonus point) The larger dataset plotted here was not altered by the Grubbs test, so you may notice outliers on your graph.  How could you plot the data set without outliers, using a NumPy array defined in Task 3?  Just write a short plan; you do not have to carry it out.

---
```python
#### The code below is for:
import matplotlib.pyplot as plt

#### The code below is for:
plt.figure(figsize=(16,4))
#### The code below is for:
plt.axis()
#### The code below is for:
plt.xlabel('Year')
plt.ylabel('Mass in grams')
#### The code below is for:
plt.xticks(range(y1b,y2b+1))

#### The code below is for:
brass_array = np.transpose(big_db.to_numpy())
#### The code below is for:
brass_year_list = brass_array[1]
brass_mass_list = brass_array[0]
#### The code below is for:
plt.plot(brass_year_list,brass_mass_list,'ob')

#### The code below is for:
model = sp.stats.linregress(brass_year_list,brass_mass_list)
#### The code below is for:
model_label = r'Linear model: ({:.2e}±{:.2e})∙x + ({:.2f}±{:.2f})'.format(model.slope,model.stderr,model.intercept,model.intercept_stderr)
#### The code below is for:
plt.plot(brass_year_list,model.slope*brass_year_list+model.intercept,'--b',label=model_label)

#### The code below is for:
plt.legend(loc=2)
plt.show()

```
---

### Task 6: Construct a histogram of stored data; construct a Gaussian model distribution; and compare the two

All statistical testing is based on assumptions about the nature of the data.  One of these assumptions is that randomly distributed deviations from the mean should lead to a Gaussian distribution for a large data set.  This assumption can be tested by fitting the largest possible data set to a Gaussian model.  The position and shape of the model should match the mean and standard deviation of the data set.  The standard Gaussian distribution has an integrated size of 1 = 100\%, which will need to be rescaled (*normalized*) to match the size of our data set.

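The Gaussian model we wish to calculate can easily be defined and normalized to the correct size using the Python math package.  Because the histogram bins used here are 0.01 g wide, the expected count in each bin is the number of pennies times the bin width (1/100 g) times the Gaussian: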

$y_{calc} = \frac{Number\ of\ pennies}{100}\cdot g(\bar{x},s)$; where
$g(\bar{x},s) = \frac{1}{s\sqrt{2\pi}}e^{-(x-\bar{x})^2/(2s^2)}$.
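
As a quick check of this normalization, the sketch below evaluates the scaled Gaussian at the midpoints of 0.01 g bins and confirms that the expected counts add up to roughly the number of pennies.  The values of `n_pennies`, `avg`, and `sd` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian(x, avg, sd):
    """Normal probability density with mean avg and standard deviation sd."""
    return np.exp(-0.5 * ((x - avg) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Hypothetical values, chosen only for illustration
n_pennies = 200
avg, sd = 3.10, 0.03
bin_width = 0.01   # grams; this is where the factor 1/100 comes from

# Bin midpoints spanning five standard deviations on either side of the mean
midpoints = avg + bin_width * np.arange(-15, 16)

# Expected number of pennies per bin = N * (bin width) * (probability density)
expected_counts = n_pennies * bin_width * gaussian(midpoints, avg, sd)
print(f"Sum of expected counts: {expected_counts.sum():.2f} (N = {n_pennies})")
```

If a different bin width were used, the factor of 1/100 would have to change to match it.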

In previous examples, you generated one graph at a time. Here, the sample-code will generate two graphs side-by-side.  The language (*syntax*) for doing so is a little different; it uses "object-oriented" commands to set up the graphs.  See if you can tell how the code specifies that two data sets (the histogram and the model curve) should be placed *together* on the same graph, and that the data for brass and zinc pennies should be placed on *different* graphs in the figure.

To test whether the data set deviates significantly from an appropriately fitted Gaussian model, the $\chi^2$ (*chi-squared*) statistic is used.  It can easily be calculated using NumPy arrays and the Python math package:

$\chi^2_{calc} = \sum_n \frac{(y_{obs}-y_{calc})^2}{y_{calc}}$

For comparison, critical values of chi-squared can be found in your lab manual or looked up using SciPy at a given level of confidence (usually 95%).  If the calculated $\chi^2$ value is larger than the critical value, then the difference is significant at that level of confidence.  
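
A small worked example of this comparison is sketched below, using made-up counts for five bins.  Taking the degrees of freedom as the number of bins minus one is an assumption of this sketch.

```python
import numpy as np
from scipy import stats

# Made-up histogram counts and model predictions for five bins
observed = np.array([4, 18, 30, 20, 8])
expected = np.array([5.0, 20.0, 30.0, 20.0, 5.0])

# Sum over bins of (observed - expected)^2 / expected
chi_sq_calc = np.sum((observed - expected) ** 2 / expected)

# Critical value at 95% confidence; degrees of freedom taken here as
# (number of bins - 1), which is an assumption of this sketch
dof = len(observed) - 1
chi_sq_crit = stats.chi2.ppf(0.95, dof)

print(f"chi-squared = {chi_sq_calc:.2f}, critical value = {chi_sq_crit:.2f}")
```

Here the calculated value falls below the critical value, so these toy counts would not be judged significantly different from the model.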

6a) Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

6b) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.

6c) Take a look at the output for each type of penny and **decide as a group** if it looks reasonable. Ask for help if needed.

6d) Finally, **answer the key question** below for each type of penny (brass and zinc) in your laboratory notebook. Discuss as a group.  Ask for help if needed.

#### **Thinking About the Data, Task 6**: Now that you have created the histograms and fit them to a Gaussian model, what does it mean?
*   Q6a) State the null hypothesis (for each type of penny) in your own words. Be specific.
*   Q6b) State the conclusions you should draw if $\chi^2_{calc} < \chi^2_{critical}$, and if $\chi^2_{calc} > \chi^2_{critical}$.
*   Q6c) **As a group**, draw conclusions regarding both types of pennies from the output of Task 6. Explain how these relate to the validity of tests performed in Tasks 4 and 5, and how you might modify your initial evaluations from Tasks 4 and 5.


***Challenge Question***: (for a bonus point) The larger dataset plotted here was not altered by the Grubbs test, so you may notice outliers on your histogram. How could these affect the calculated value of $\chi^2$? Based on what you have seen in your results, do you think removing the outliers could change your assessment of the validity of tests performed in Tasks 4 and 5?

---
```python
#### The code below is for:
def gaussian(x, avg, sd):
    return (
        1.0 / (np.sqrt(2*np.pi)*sd) * np.exp(-np.power((x - avg)/sd,2)/2)
    )

#### The code below is for:
def chi_squared_statistic(y_array,model_array):
    n = len(y_array)
    chi_sq = 0
    for ibin in range(n):     
        ycalc = model_array[ibin]
        yobs = y_array[ibin]
        if yobs > 1: chi_sq = chi_sq + ((yobs-ycalc)**2)/ycalc
    return chi_sq  

#### The code below is for:
dual_figure = plt.figure(figsize=(14,6))
plt_brass, plt_zinc = dual_figure.subplots(1,2)

#### The code below is for:
n_bins_brass = math.ceil((np.max(brass_array[0])-np.min(brass_array[0]))/0.01)
#### The code below is for:
bin_midpoints_brass = (np.arange(n_bins_brass)*0.01+np.min(brass_array[0])+0.005)
#### The code below is for:
bin_endpoints_brass = (np.arange(n_bins_brass+1)*0.01+np.min(brass_array[0]))
#### The code below is for:
brass_hist_np = np.histogram(brass_mass_list,bins=bin_endpoints_brass)
brass_hist_values = brass_hist_np[0]
#### The code below is for:
plt_brass.stairs(brass_hist_values,bin_endpoints_brass,fill=True)

#### The code below is for:
avg_brass = np.average(brass_mass_list)
#### The code below is for:
sd_brass = np.std(brass_mass_list,ddof=1)
#### The code below is for:
n_brass = len(brass_mass_list)
#### The code below is for:
brass_model = ((n_brass/100))*gaussian(bin_midpoints_brass,avg_brass,sd_brass)
#### The code below is for:
chi_squared_brass = chi_squared_statistic(brass_hist_values,brass_model)
#### The code below is for:
chi_squared_crit_brass = sp.stats.chi2.ppf(0.95,n_bins_brass-1)
#### The code below is for:
brass_model_label = r'Model Chi-squared = {:.2f}, Critical value = {:.2f}'.format(chi_squared_brass,chi_squared_crit_brass)
#### The code below is for:
plt_brass.plot(bin_midpoints_brass,brass_model,'--b',label=brass_model_label)

#### The code below is for:
plt_brass.set_xlabel('Masses of Brass Pennies')
plt_brass.set_ylabel('Number of Brass Pennies')
#### The code below is for:
plt_brass.set_title('Brass Mass Distribution and Gaussian Model')
#### The code below is for:
plt_brass.legend(loc=9)

#### The code below is for:
n_bins_zinc = math.ceil((np.max(zinc_array[0])-np.min(zinc_array[0]))/0.01)
#### The code below is for:
bin_midpoints_zinc = (np.arange(n_bins_zinc)*0.01+np.min(zinc_array[0])+0.005)
#### The code below is for:
bin_endpoints_zinc = (np.arange(n_bins_zinc+1)*0.01+np.min(zinc_array[0]))
#### The code below is for:
zinc_hist_np = np.histogram(zinc_mass_list,bins=bin_endpoints_zinc)
zinc_hist_values = zinc_hist_np[0]
#### The code below is for:
plt_zinc.stairs(zinc_hist_values,bin_endpoints_zinc,fill=True)

#### The code below is for:
avg_zinc = np.average(zinc_mass_list)
#### The code below is for:
sd_zinc = np.std(zinc_mass_list,ddof=1)
#### The code below is for:
n_zinc = len(zinc_mass_list)
#### The code below is for:
zinc_model = ((n_zinc/100))*gaussian(bin_midpoints_zinc,avg_zinc,sd_zinc)
#### The code below is for:
chi_squared_zinc = chi_squared_statistic(zinc_hist_values,zinc_model)
#### The code below is for:
chi_squared_crit_zinc = sp.stats.chi2.ppf(0.95,n_bins_zinc-1)
#### The code below is for:
zinc_model_label = r'Model Chi-squared = {:.2f}, Critical value = {:.2f}'.format(chi_squared_zinc,chi_squared_crit_zinc)
#### The code below is for:
plt_zinc.plot(bin_midpoints_zinc,zinc_model,'--b',label=zinc_model_label)

#### The code below is for:
plt_zinc.set_xlabel('Masses of Zinc Pennies')
plt_zinc.set_ylabel('Number of Zinc Pennies')
#### The code below is for:
plt_zinc.set_title('Zinc Mass Distribution and Gaussian Model')
#### The code below is for:
plt_zinc.legend(loc=9)

#### The code below is for:
plt.show()

```
---

***Congratulations, you did it!***

Please download your copy of this notebook to turn in.

Remember to write a summary in your laboratory notebook, by hand, and to turn in copies of your notebook pages.