# Scientific Computing

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/COGS18/LectureNotes-COGS18/blob/main/12-ScientificComputing.ipynb)


**[ad] Food Insecurity Survey**
![](img/SurveyInfoSlide.jpg)

**Q&A**

> Q: Are we getting command lines in the final exam?  
> A: You will need to know the shell commands and absolute/relative paths for both E2 and the final.

> Q: How do you know when to use a script or a module?  
> A: If you are writing the code, you consider the task. Do you want to store code to import elsewhere? Module. Or do you want to write some code that you want to run from top to bottom? Script!

> Q: How heavily will we be tested on our knowledge of a script and module? What type of questions can we expect?  
> A: For E2, just know the similarities and differences. For the final, we'll be creating them...but we'll have more practice before then.

> Q: I was wondering what the difference is going to be between E2 and the final? Is it just going to be similar to the first two midterms and cover more information or will it be more challenging, us attempting to create code like we do in assignments, etc.  
> A: E2 focuses on material from loops through classes. The final exam focuses on material after classes, culminating in a mini-project. We'll discuss details soon in class!

> Q: I'm still a little unsure about when to use self and how it works inside methods. 
> A: We'll discuss this more today - ask questions if still not clear, if you're comfortable!

> Q: Also, how do I know when to use a class vs. just writing functions?
> A: Well, typically, I'll tell you. But, when you have to decide on your own...if you want to keep attributes and methods organized together and use attributes across different methods, class. If not, just a function.

> Q: How many points could we have so far? How many remain?
> A: See below

Current possible points earned: 61
- pre-course: 2
- VQ1-12 : 12 pts
- CL1-6: 12
- A1-4: 20
- Oral exam: 2.5
- E1: 12.5

Remaining points to earn: 39
- post-course: 2
- CL7-8: 4
- A5: 5
- Oral exam 2: 2.5
- E2: 12.5 
- Final exam: 13

**Course Announcements**

Due this week:

- CL7 due Fri
- A4 due Sun
- Take E2: 5/23-5/30

Notes:

- Reminder to sign up for [Oral exam 2 slot](https://calendar.app.google/3xu3mtgu3mAZgTRw5) (link also on Canvas homepage)
- If you have a few minutes, please complete the Food Insecurity Survey before 5/26 <- there's an optional quiz on Canvas with the link too
- Re-take of E1 *or* E2
    - sign-ups will be available Fri 5/30 (so students have the info they need to decide if they want to retake)
    - replacement grade: 75% of highest + 25% of lowest

<div class="alert alert-success">
<b>Scientific Computing</b> is the application of computer programming to scientific applications: data analysis, simulation & modelling, plotting, etc. 
</div>

## Scientific Python: Scipy Stack

Scipy = Scientific Python

- `scipy`
- `numpy`
- `pandas`
- Data Analysis in Python

## `numpy`

**`numpy`** - stands for numerical python

Note: 
- `numpy` includes a new class of object: the numpy array
- this array has associated attributes
- ...and methods

### External packages must be imported

In [None]:
import numpy as np

### arrays

**arrays** - store data in rows and columns (or more dimensions), like matrices

They let you operate efficiently on whole collections of numbers at once (linear algebra, matrix operations, etc.)

In [None]:
# Create some arrays of data
arr1 = np.array([[1, 2], [3, 4]])

In [None]:
arr1

In [None]:
# lists of lists don't store dimensionality well
[[1, 2], [3, 4]] 
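
To make the comment above a bit more concrete, here is a small sketch comparing a list of lists with the same values stored in an array. The variable names (`nested`, `arr`) are made up for this example.

```python
import numpy as np

nested = [[1, 2], [3, 4]]   # plain Python list of lists
arr = np.array(nested)      # same values as a numpy array

# A list only knows its outer length; an array knows every dimension
n_outer = len(nested)       # 2
arr_shape = arr.shape       # (2, 2)

# Multiplying a list repeats it; multiplying an array scales each value
repeated = nested * 2
doubled = arr * 2

print(n_outer, arr_shape)
print(repeated)
print(doubled)
```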

### Arrays: attributes, methods, & indexing

In [None]:
# Check out an array of data
arr1

#### attributes

`numpy` arrays are an object type...so they have associated attributes (below) and methods (we'll get to these in a second)!

In [None]:
# Check the shape of the array
arr1.shape

In [None]:
# Index into a numpy array
arr1[0, 0]
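
Indexing works with `[row, column]`, and slices work too. The sketch below rebuilds the same 2x2 array (as `arr`, a name chosen just for this example) and also shows two other attributes besides `shape`.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

first_row = arr[0, :]      # all columns of row 0
second_col = arr[:, 1]     # all rows of column 1
last_value = arr[-1, -1]   # negative indices count from the end

# A couple more attributes besides shape
n_dims = arr.ndim          # number of dimensions
n_values = arr.size        # total number of values

print(first_row, second_col, last_value)
print(n_dims, n_values)
```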

#### methods

If you're looping over an array, there's probably a method for that...

In [None]:
# sum method
# by default sums all values in array
arr1.sum()

In [None]:
# sum method
# has an axis parameter
# axis=0 sums down the rows, giving one total per column
arr1.sum(axis=0)

In [None]:
# convert the array of column totals to a list
out_list = arr1.sum(axis=0).tolist()
print(out_list)
type(out_list)
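
Here is a rough side-by-side of the "there's probably a method for that" idea: a nested loop that adds up every value, followed by the `sum` method with and without the `axis` parameter. The array is rebuilt as `arr` so the sketch runs on its own.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Loop version: add up every value by hand
total_loop = 0
for row in arr:
    for value in row:
        total_loop += value

# Method versions
total = arr.sum()             # all values: 10
col_totals = arr.sum(axis=0)  # one total per column: [4, 6]
row_totals = arr.sum(axis=1)  # one total per row: [3, 7]

print(total_loop, total)
print(col_totals, row_totals)
```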

## `pandas`

Pandas is a Python library for managing heterogeneous data.

At its core, Pandas is built around the **DataFrame** object, which:
- is a data structure for labeled rows and columns of data
- has associated methods and utilities for working with data
- stores each column as a `pandas` **Series**

In [None]:
import pandas as pd

In [None]:
# Create a dataframe 
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

In [None]:
# Check out the dataframe
df

In [None]:
# You can index in pandas
# columns store information in series
df['Age']
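
As a quick check that a single column really is a Series, the sketch below builds a small dataframe (named `people` here so it doesn't overwrite `df`) and inspects one column.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

ages = people['Age']                     # one column of the dataframe
is_series = isinstance(ages, pd.Series)  # True
column_names = list(people.columns)      # ['Name', 'Age']

print(type(people))
print(type(ages))
print(column_names)
```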

In [None]:
# You can index in pandas
# loc selects by row label, column label
df.loc[0,:]
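
`loc` goes by labels, while its sibling `iloc` goes by integer position. With the default 0, 1, 2... row labels the two look the same, so this sketch assigns hypothetical row labels (`'a'`, `'b'`, `'c'`) to make the difference visible.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],   # hypothetical row labels for illustration
)

by_label = people.loc['b', 'Age']   # row labeled 'b', column labeled 'Age'
by_position = people.iloc[1, 1]     # second row, second column
first_row = people.loc['a', :]      # every column for the row labeled 'a'

print(by_label, by_position)
print(first_row)
```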

In [None]:
# attribute of df object
# row, columns
df.shape

In [None]:
# how many rows there are in a series/df
df.shape[0] # len(df) would also work

#### Working with DataFrames

There are *a lot* of functions and methods within `pandas`. The general syntax is `df.method()` where the `method()` operates directly on the dataframe `df`.

In [None]:
# calculate summary statistics
df.describe()
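
A few more examples of the `df.method()` pattern, sketched on a small dataframe named `people` (an example name). These are only a handful of the many methods available.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Methods on a single column (a Series)
mean_age = people['Age'].mean()

# Methods on the whole dataframe
oldest_first = people.sort_values('Age', ascending=False)

# Keep only the rows where a condition is True
over_26 = people[people['Age'] > 26]

print(mean_age)
print(oldest_first)
print(over_26)
```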

### Data in `pandas`

- `pd.read_*()`: `*` is replaced with the file type (e.g. `read_csv()`)
- input to function is path to file or URL
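
To try `read_csv()` without a file or an internet connection, one option is to wrap some CSV text in `io.StringIO`, which `read_csv()` accepts in place of a path or URL. The CSV contents below are toy values made up for this sketch.

```python
import io
import pandas as pd

# Toy CSV text (made-up values): first line is the header row
csv_text = "item,count\npencil,3\nnotebook,5\neraser,2\n"

# StringIO makes the text behave like an open file
toy = pd.read_csv(io.StringIO(csv_text))

print(toy)
print(toy.shape)   # (rows, columns)
```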

For example...there is a very famous dataset about mammalian sleep. One copy of it is at the URL 'https://raw.githubusercontent.com/ShanEllis/datasets/master/msleep.csv'

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/ShanEllis/datasets/master/msleep.csv')

In [None]:
# look at the data
df

...we can access the attributes and execute the methods described above on this dataset:

In [None]:
# rows, columns
df.shape

In [None]:
df.describe()

In [None]:
df['order'].value_counts()
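
`value_counts()` tallies how many times each unique value shows up, with the most frequent value first. A minimal sketch on a made-up Series of labels:

```python
import pandas as pd

# Made-up labels, just to show the counting
labels = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])

counts = labels.value_counts()   # sorted from most to least frequent
most_common = counts.index[0]    # the label with the highest count

print(counts)
print(most_common)
```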

**Everything below this is just FYI...not on assignment/lab/exam**

## Plotting

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

In [None]:
# Create some data
dat = np.array([1, 2, 4, 8, 16, 32])

In [None]:
# Plot two columns of the msleep dataframe against each other
plt.scatter(df['sleep_rem'], df['sleep_cycle']);

- can change plot type
- _lots_ of customizations possible

## Analysis

- `scipy` - statistical analysis
- `sklearn` - machine learning

In [None]:
import scipy as sp
from scipy import stats

In [None]:
# Simulate some data
d1 = stats.norm.rvs(loc=0, size=1000)
d2 = stats.norm.rvs(loc=0.5, size=1000)

### Analysis - Plotting the Data

In [None]:
# Plot the data
plt.hist(d1, 25, alpha=0.6);
plt.hist(d2, 25, alpha=0.6);

### Analysis - Statistical Comparisons

In [None]:
# Statistically compare the two distributions
stats.ttest_ind(d1, d2)
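
The result of `ttest_ind` has `statistic` and `pvalue` attributes that you can pull out by name. In the sketch below, the `random_state` values are an addition (not in the cells above) so that the simulated data is the same on every run.

```python
from scipy import stats

# Simulate two groups; random_state makes the draws repeatable
d1 = stats.norm.rvs(loc=0, size=1000, random_state=0)
d2 = stats.norm.rvs(loc=0.5, size=1000, random_state=1)

result = stats.ttest_ind(d1, d2)
t_stat = result.statistic   # negative here because d1's mean is lower
p_value = result.pvalue     # small p-value: the group means differ

print(t_stat, p_value)
```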

## COGS 108: Data Science in Practice

<div class="alert alert-info">
If you are interested in data science and scientific computing in Python, consider taking <b>COGS 108</b> : <a href="https://github.com/COGS108/">https://github.com/COGS108/</a>.
</div>