<a href="https://colab.research.google.com/github/cbe410/python-data-analysis/blob/master/CBE_410_Python_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
!pip install xlsxwriter
import xlsxwriter

# CBE 410 Python Assignment



This assignment is designed to let you practice retrieving and exporting data, curve fitting, and data plotting, including the use of subplots. The substrate for your work is some basic thermodynamic and reaction data. Pay close attention to units! Please submit your work as a Python notebook, or a printout of one.


---









The text file hosted on the cbe410 GitHub repository [here](https://github.com/cbe410/python-data-analysis/raw/master/Antoine_data.txt) is a comma-separated list of ASCII data for the temperature dependence of the saturated vapor pressure of an unknown substance, where the first column is $T$ in °C and the second is $P^{\mathrm{sat}}$ in kPa. Use Python to read this data from the repository and fit it to the Antoine equation, shown below, with $T$ in Kelvin. (Hint: to load the data, you can use `numpy.loadtxt` with a comma specified as the delimiter. You are welcome to use any other file-reading method as well.)
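As a minimal sketch of the loading hint, the block below reads comma-separated text with `numpy.loadtxt` and converts the temperature column to Kelvin. The in-memory `sample_text` and its values are made up and only stand in for the real file; the variable names are assumptions as well.

```python
import io
import numpy as np

# Hypothetical stand-in for the contents of Antoine_data.txt:
# column 1 is T in degrees C, column 2 is Psat in kPa (values are made up).
sample_text = io.StringIO("20.0,5.1\n40.0,14.2\n60.0,34.0\n")

# unpack=True returns one array per column instead of one array per row.
T_C, P_kPa = np.loadtxt(sample_text, delimiter=",", unpack=True)

# The Antoine fit expects temperature in Kelvin.
T_K = T_C + 273.15

print(T_K)
print(P_kPa)
```

In the notebook, `sample_text` would be replaced by the data file itself. If you prefer pandas, `pandas.read_csv` accepts a URL directly, which is one way to read straight from the repository.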

For this purpose, you should use a least-squares curve fitting tool. Such a tool solves for the set of coefficients (here, $[A, B, C]$) that minimizes the sum of squared errors between the function and the data. The SciPy function ```scipy.optimize.curve_fit``` is likely a good choice. Excellent documentation of this function is available [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).
(*pro tip: scroll all the way down on the page to find complete examples of the use of the function!*)

As part of your fitting, you will need to define a function that evaluates pressure, given a set of $[T, A, B, C]$. In your function definition, `def func(x, a, b, c)` for example, the `x` here is temperature, and the remaining parameters are those that we will be solving for.

Plot the data using marker symbols with a thin black border and colored interior and show the Antoine fit using a solid line. Be sure to annotate your axes properly.

\begin{equation}
\ln P^{\mathrm{sat}} = A - \frac{B}{T(\mathrm{K}) + C}
\end{equation}
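The fitting step can be sketched as follows, assuming the natural-log form of the Antoine equation given above. The data here are synthetic and generated from made-up coefficients, so the numbers say nothing about the unknown substance; `antoine_pressure`, `true_coeffs`, and `initial_guess` are illustrative names and values, not part of the assignment.

```python
import numpy as np
from scipy.optimize import curve_fit

def antoine_pressure(T, A, B, C):
    """Saturated vapor pressure from ln(P) = A - B / (T + C), with T in K."""
    return np.exp(A - B / (T + C))

# Synthetic, noise-free "data" from made-up coefficients (not the assignment data).
true_coeffs = (16.0, 3800.0, -45.0)
T_K = np.linspace(300.0, 370.0, 15)
P_kPa = antoine_pressure(T_K, *true_coeffs)

# A nonlinear model like this one generally needs a reasonable starting guess.
initial_guess = (15.0, 3500.0, -40.0)
popt, pcov = curve_fit(antoine_pressure, T_K, P_kPa,
                       p0=initial_guess, maxfev=10000)

A_fit, B_fit, C_fit = popt
P_fit = antoine_pressure(T_K, *popt)  # fitted curve at the data temperatures
print("A, B, C =", A_fit, B_fit, C_fit)
```

With real data, the starting guess matters: the three Antoine coefficients are strongly correlated, so a poor `p0` can stall the fit or lead it to a meaningless solution.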

Per the [NIST WebBook](https://webbook.nist.gov/cgi/cbook.cgi?ID=C91203&Mask=4&Type=ANTOINE&Plot=on#ANTOINE), the Antoine coefficients for naphthalene as reported by Fowler and Trump (1968) are as shown below, for $P$ in bar and $T$ in K. Note that NIST writes the Antoine equation with a base-10 logarithm, $\log_{10}(P) = A - B/(T + C)$, rather than the natural logarithm used above.
\begin{eqnarray}
 A & \approx & 4.27 \\
 B & \approx & 1831.6 \\
 C & \approx & -61.3
\end{eqnarray}

Write a function to calculate the saturated vapor pressure of naphthalene as a function of temperature. Choose a suitable number of temperature points (about 25) and plot the dependence of the saturated vapor pressure in bar from 350 to 500 K.
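A minimal sketch of such a function, assuming the base-10 form $\log_{10}(P) = A - B/(T + C)$ and the rounded coefficients quoted above. The function and variable names are illustrative, and plotting is left out.

```python
import numpy as np

# Rounded coefficients quoted above (P in bar, T in K, base-10 logarithm).
A, B, C = 4.27, 1831.6, -61.3

def psat_log10_antoine(T, A, B, C):
    """Vapor pressure in bar from log10(P) = A - B / (T + C), with T in K."""
    return 10.0 ** (A - B / (T + C))

T_K = np.linspace(350.0, 500.0, 25)  # 25 evenly spaced temperatures
P_bar = psat_log10_antoine(T_K, A, B, C)

print(P_bar[0], P_bar[-1])
```

Note the `10.0 **` rather than `np.exp`: mixing up the two logarithm bases is an easy way to get pressures that are off by orders of magnitude.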

Finally, write the data to an Excel workbook/worksheet, to a JSON file, and to a simple text file. For the Excel *output*, you may wish to use `xlsxwriter` or `pandas`, or any other Python code that does the job. If you use ```xlsxwriter``` you may need to install the package using ```pip``` as shown above at the start of the notebook. For pandas, you first have to create a dataframe using `pd.DataFrame`. Thereafter, you can use `.to_excel` to write it.

You should write your data in the layout you would expect, matching how the data series we've been working with have been presented, i.e. with series in columns: temperature in the first column and pressure in the second.

For `xlsxwriter` you will need to shape your data into an appropriate form so that you can write the rows of the Excel sheet sequentially. Use the `zip(a, b)` function to do this. See [here](https://xlsxwriter.readthedocs.io/index.html) for more info.


For the text output, `numpy.savetxt()` should suffice. See [here](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html) for more info. You will also need shaped data here, as achieved with the `zip` function, to get the right format. JSON (*JavaScript Object Notation*) is a lightweight, human-readable format that is particularly popular in web applications. For the JSON output, you'll need to `import json` and then work with it. See below for the code snippet.
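For the text output specifically, the shaping step might look like the sketch below. The arrays hold made-up values, the file name `psat_out.txt` is hypothetical, and the file is written to a temporary directory only so that the example cleans up after itself.

```python
import os
import tempfile
import numpy as np

T_K = np.array([350.0, 400.0, 450.0])   # made-up temperatures, K
P_bar = np.array([0.01, 0.10, 0.50])    # made-up pressures, bar

# zip pairs the i-th temperature with the i-th pressure: one tuple per row.
rows = list(zip(T_K, P_bar))

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "psat_out.txt")  # hypothetical file name
    np.savetxt(path, rows, delimiter=",", header="T_K,Psat_bar")
    # The header line is written with a leading '#', which loadtxt skips.
    read_back = np.loadtxt(path, delimiter=",")

print(read_back)
```

`np.column_stack((T_K, P_bar))` produces the same two-column layout as `list(zip(...))`, if you would rather stay within NumPy.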

In [None]:
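import json

# Example dictionary to serialize; each key maps to a list of values.
x = {'Currency': ['Pounds', 'T.Lira', 'S.Francs', 'US Dollars'],
     'Origin': ['UK', 'Turkey', 'Switzerland', 'USA']}
filename = 'dataout.txt'
with open(filename, 'w') as f:
    # json.dumps converts the dictionary to a JSON-formatted string.
    json_data = json.dumps(x)
    f.write(json_data)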

Create a 2x1 subplot (i.e. 2 rows, 1 column). Plot the saturated vapor pressure in kPa for the unknown substance from the first question in the top plot, and the saturated vapor pressure of naphthalene in the bottom plot, *using a common x-axis* with a temperature range of 250 to 500 K. Show the data as colored markers with a thin black border, and plot a line in each case that represents the respective Antoine equation.
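One possible layout for this figure is sketched below, using `plt.subplots` with `sharex=True` for the common x-axis. The two curves are placeholders computed from made-up coefficients, not the fitted results, and every fifth point merely stands in for measured data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

T_K = np.linspace(250.0, 500.0, 50)
# Placeholder curves only; these are not the fitted Antoine results.
P_top = np.exp(14.0 - 3000.0 / (T_K - 40.0))
P_bottom = 10.0 ** (4.0 - 1800.0 / (T_K - 60.0))

# Two rows, one column, with the x-axis shared between the panels.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(6, 7))

# Colored markers with a thin black edge, plus a solid line for the curve.
ax_top.plot(T_K[::5], P_top[::5], "o", markerfacecolor="tab:blue",
            markeredgecolor="black", markeredgewidth=0.5, label="data")
ax_top.plot(T_K, P_top, "-", color="tab:blue", label="Antoine fit")
ax_top.set_ylabel(r"$P^{\mathrm{sat}}$ (kPa)")
ax_top.legend()

ax_bottom.plot(T_K[::5], P_bottom[::5], "s", markerfacecolor="tab:orange",
               markeredgecolor="black", markeredgewidth=0.5, label="data")
ax_bottom.plot(T_K, P_bottom, "-", color="tab:orange", label="Antoine equation")
ax_bottom.set_ylabel(r"$P^{\mathrm{sat}}$ (bar)")
ax_bottom.set_xlabel("Temperature (K)")
ax_bottom.legend()

# With sharex=True, setting the limits on one panel applies to both.
ax_bottom.set_xlim(250, 500)
fig.tight_layout()
```

Because the x-axis is shared, only the bottom panel needs an x-axis label, and the tick labels on the top panel are hidden automatically.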

In our introductory session on Python we used the data in the Excel sheet in the GitHub repository [here](https://github.com/cbe410/python-data-analysis/raw/master/SampleData.xlsx) to determine the activation energy for a particular reaction. We linearized the Arrhenius equation and used a simple linear regression to determine $E_a$. Use a non-linear least-squares curve fit instead, fitting to the full exponential form. How do the fitted values of $k_0$ and $E_a$ compare to those obtained previously? What are your thoughts regarding the quality of the fit?
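A sketch of the non-linear approach, assuming the Arrhenius form $k = k_0 \exp(-E_a / (R T))$ with $E_a$ in J/mol. The data are synthetic, generated from made-up parameters with 1% noise, so the numbers are unrelated to `SampleData.xlsx`. Using the linearized fit as the starting guess is one choice among several, not a requirement.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8.314  # gas constant, J/(mol K)

def arrhenius(T, k0, Ea):
    """Rate constant k = k0 * exp(-Ea / (R T)), with Ea in J/mol and T in K."""
    return k0 * np.exp(-Ea / (R * T))

# Synthetic data from made-up parameters, with 1% multiplicative noise.
rng = np.random.default_rng(0)
T_K = np.linspace(300.0, 400.0, 10)
noise = 1.0 + 0.01 * rng.standard_normal(T_K.size)
k_obs = arrhenius(T_K, 1.0e7, 5.0e4) * noise

# Linearized fit, ln k = ln k0 - (Ea / R) * (1 / T), used only as a starting guess.
slope, intercept = np.polyfit(1.0 / T_K, np.log(k_obs), 1)
p0 = (np.exp(intercept), -slope * R)

# Non-linear least-squares fit to the full exponential form.
popt, pcov = curve_fit(arrhenius, T_K, k_obs, p0=p0)
k0_fit, Ea_fit = popt
print("k0 =", k0_fit, "Ea =", Ea_fit)
```

The two approaches weight the data differently: the non-linear fit minimizes absolute errors in $k$, so the largest rate constants dominate, while the linearized fit minimizes errors in $\ln k$. That difference is worth keeping in mind when comparing the two sets of parameters.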