## Develop some looping skills: loop over consecutive sets of rows using loc
* One useful application is to select a subset of rows by index label and then grab a single column to manipulate. There are a few ways to do this, discussed below; `loc` is intuitive because it uses the index (row) labels and the column names.
* For example, to implement a simple windowed average, step through the data in increments of N points: compute the mean over the current window, move on to the next chunk of N points, recompute, and repeat. Because the windows do not overlap, this is a block average rather than a sliding (running) average.
* There are several possible approaches; this is just one.

In [0]:
import pandas as pd
from google.colab import files

In [0]:
files.upload()

In [0]:
df = pd.read_csv('annual_temp_csv2.csv')

## Import the "floor" function from the math module


In [0]:
from math import floor

## Loop over data frame and compute the mean of 'w' consecutive rows


In [0]:
w = 2   # moving average window
n = len(df)

# number of complete w-element windows (a leftover partial window is skipped)
num_wins = floor(n/w)

# init a list to append moving average
m_avg = []

# init a counter to keep track of where we are in the DF
cnt = 0

# loop!
for i in range(0,num_wins):
  # print(cnt,cnt+(w-1))
  # specify the index labels (rows) that you want and the column
  # to operate on ('Mean'); loc includes the stop label, hence cnt+(w-1)
  # (this assumes the default 0, 1, 2, ... index)
  m_avg.append(df.loc[cnt:cnt+(w-1), 'Mean'].mean())
  cnt+=w

# print out our list of windowed averages
print(m_avg)

# double check by computing the first two window means directly
print(df.loc[0:1,'Mean'].mean())
print(df.loc[2:3,'Mean'].mean())
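
As one of the other possible approaches, here is a minimal sketch that reproduces the loop with a `groupby` and also shows a true sliding average with `rolling`. The frame `demo` and its values are made up for illustration; only the column name 'Mean' is taken from the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Mean' column of the temperature data
demo = pd.DataFrame({'Mean': [1.0, 3.0, 5.0, 7.0, 9.0]})
w = 2  # window size

# Approach from the cell above: step through the rows w at a time with loc
loop_avg = []
for start in range(0, (len(demo) // w) * w, w):
    # loc slices by label and INCLUDES the stop label
    loop_avg.append(demo.loc[start:start + w - 1, 'Mean'].mean())

# Alternative without an explicit loop: give each row a window number, then group
n_full = (len(demo) // w) * w          # rows that fit into complete windows
window_id = np.arange(n_full) // w     # 0, 0, 1, 1
group_avg = demo['Mean'].iloc[:n_full].groupby(window_id).mean().tolist()

# A sliding (overlapping) average moves one row at a time instead
sliding_avg = demo['Mean'].rolling(window=w).mean()

print(loop_avg)     # [2.0, 6.0]
print(group_avg)    # [2.0, 6.0]
print(sliding_avg)  # first entry is NaN because the window is not full yet
```

Both block versions drop the leftover fifth row, matching the `floor(n/w)` logic above.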

## In addition to indexing by row label, you can also index by row position; then you're back in 0-based indexing land with an **exclusive** stop value
* use iloc (integer location) for this...

In [0]:
# fifth through seventh rows (positions 4, 5, 6; position 7, the eighth row, is excluded)
df.iloc[4:7]
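
To make the inclusive versus exclusive stop concrete, the sketch below runs the same slice through `loc` and `iloc`. The frame `demo` is a hypothetical toy example with the default integer index, not the temperature data.

```python
import pandas as pd

# Hypothetical toy frame with the default 0..n-1 integer index
demo = pd.DataFrame({'Mean': [10, 11, 12, 13, 14, 15, 16, 17]})

by_label = demo.loc[4:7]      # labels 4, 5, 6 AND 7 (stop is inclusive)
by_position = demo.iloc[4:7]  # positions 4, 5, 6 (stop is exclusive)

print(len(by_label), len(by_position))  # 4 3
```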

### Can also get into any cell in the data frame using a similar syntax to what we've used before. 
* row (index) x column ...

In [0]:
df.head()

In [0]:
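# row 0, column 2 in a single call (avoids chained indexing,
# which newer pandas versions warn about)
df.iloc[0, 2]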
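
A short sketch of a few equivalent ways to reach one cell. The frame `demo` and its values are hypothetical, loosely modeled on columns named in this notebook.

```python
import pandas as pd

# Hypothetical columns and values, for illustration only
demo = pd.DataFrame({
    'Source': ['A', 'B'],
    'Year': [2016, 2015],
    'Mean': [0.99, 0.87],
})

cell_by_position = demo.iloc[0, 2]   # row position 0, column position 2
cell_by_label = demo.loc[0, 'Mean']  # row label 0, column label 'Mean'
cell_scalar = demo.iat[0, 2]         # positional accessor for a single value

print(cell_by_position, cell_by_label, cell_scalar)
```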

## Removing columns is also easy and done on the fly... 

In [0]:
# using the del command will delete a column from the DF
# note that here you have to use the df['Source'] notation
# the df.Source notation will not work with del.
del df['Source']
df.head()
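
`del` changes the data frame in place. As a hedged alternative, the sketch below uses `drop`, which returns a new frame, and `pop`, which removes a column and hands it back. The frame `demo` is hypothetical.

```python
import pandas as pd

# Hypothetical columns and values, for illustration only
demo = pd.DataFrame({
    'Source': ['A', 'B'],
    'Year': [2016, 2015],
    'Mean': [0.99, 0.87],
})

# drop returns a new frame and leaves the original untouched
trimmed = demo.drop(columns=['Source'])

# pop removes the column in place and returns it as a Series
years = demo.pop('Year')

print(list(trimmed.columns))  # ['Year', 'Mean']
print(list(demo.columns))     # ['Source', 'Mean']
```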

# Intro to NumPy

* NumPy is the main scientific computing package for Python - it allows you to easily work with large arrays of data and supports functionality for many common operations (including linear algebra)

* All about doing computations on large data sets all at once - can do many, many things without looping! Much more efficient

-  [based on this numpy quickstart guide](https://docs.scipy.org/doc/numpy/user/quickstart.html)

-  [NumPy main page](http://www.numpy.org/)

- [NumPy and SciPy doc page](https://docs.scipy.org/doc/)

In [0]:
# import numpy and other stuff for this tutorial
import numpy as np

# import the constant pi from NumPy because we'll use it a lot
from numpy import pi

# functionality for plotting
import matplotlib.pyplot as plt
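
As a small illustration of computing "all at once" instead of looping, the sketch below squares every element of an array both ways. The variable names are made up for this example.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop version: handle one element at a time
squares_loop = []
for v in values:
    squares_loop.append(int(v) ** 2)

# Vectorized version: one expression operates on the whole array
squares_vec = values ** 2

print(squares_loop)  # [0, 1, 4, 9, 16]
print(squares_vec)   # [ 0  1  4  9 16]
```

On large arrays the vectorized form is typically much faster, because the loop runs inside NumPy's compiled code rather than in Python.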

## Initialize array and a few basic operations
* the np.arange function works like the built-in range function, but returns a NumPy array
* the interval includes `start` but excludes `stop`, i.e. the half-open interval [start, stop)


In [0]:
# set up an array and figure out shape...  
my_array = np.arange(10)   
print(my_array)

# note that it's 1D (a vector...)
my_array.shape     

In [0]:
# can specify start, stop and step
seq_array = np.arange(0,30,5)     # start, stop (stop at < X), step size
print(seq_array)
# note that 30 is not in there...
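
A brief sketch comparing `np.arange` with the built-in `range`, plus `np.linspace` for the case where you do want the end point included. The variable names are illustrative only.

```python
import numpy as np

from_range = list(range(0, 30, 5))
from_arange = np.arange(0, 30, 5)      # same values, but as a NumPy array

# Unlike range, arange also accepts non-integer steps
fractional = np.arange(0, 1, 0.25)     # 0.0, 0.25, 0.5, 0.75

# linspace takes a number of points and includes the stop value by default
evenly_spaced = np.linspace(0, 1, 5)   # 0.0, 0.25, 0.5, 0.75, 1.0

print(from_arange)
print(fractional)
print(evenly_spaced)
```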

## Reshape array - in this case a 1D vector to a 2D matrix


In [0]:
my_array = np.arange(36)
my_array = my_array.reshape(6,6)    # also try 3,12 or 9,4
print(my_array.shape)   
print(my_array)
# why are (6,6) and (12,3) ok but (5,5) not ok?
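
One way to explore that question: a reshape only works when the product of the new dimensions equals the number of elements. The sketch below checks this and also shows `-1`, which asks NumPy to work out one dimension for you.

```python
import numpy as np

flat = np.arange(36)

ok = flat.reshape(12, 3)        # 12 * 3 == 36, so this works
inferred = flat.reshape(4, -1)  # -1 lets NumPy infer the missing dimension (9)

try:
    flat.reshape(5, 5)          # 5 * 5 == 25, not 36
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('reshape(5, 5) failed:', err)

print(ok.shape, inferred.shape)  # (12, 3) (4, 9)
```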

## Reshape array - more complex...
* 1D, 2D, ND arrays
* Notice how the dims stack on top of each other! 

In [0]:
my_array = np.arange(100)
my_array = my_array.reshape(5,5,4)   # also try 2,5,10
print(my_array.shape)
print(my_array)

# NOTICE how the dims stack on top of each other! there are 5, 5x4 matrices
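
To see the stacking directly, the sketch below pulls individual 5x4 blocks and a single value out of the same kind of array. The name `cube` is just for this example.

```python
import numpy as np

cube = np.arange(100).reshape(5, 5, 4)

first_block = cube[0]      # the first 5x4 matrix, holding values 0..19
last_block = cube[-1]      # the fifth 5x4 matrix, holding values 80..99
one_value = cube[1, 2, 3]  # block 1, row 2, column 3

print(first_block.shape)   # (5, 4)
print(one_value)           # 31, i.e. 1*20 + 2*4 + 3
```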

## Data types (and remember - strongly typed language)

In [0]:
print('Dims of data:', my_array.ndim)         # number of dims
print('Name of data type:', my_array.dtype)   # name of data type (float, int32, int64 etc)
print('Size of each element (bytes):', my_array.itemsize)          # size of each element in bytes
print('Total number of elements in array:', my_array.size)         # total number of elements in array
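
These attributes fit together: the memory used by the array's data is the number of elements times the size of each element. A minimal sketch, with the dtype set explicitly to int64 so the byte counts are the same on every platform:

```python
import numpy as np

arr = np.arange(100, dtype='int64').reshape(5, 5, 4)

total_bytes = arr.size * arr.itemsize  # 100 elements * 8 bytes each

print(arr.ndim, arr.size, arr.itemsize)  # 3 100 8
print(total_bytes, arr.nbytes)           # 800 800
```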

## Infer data types upon array creation
* Use np.array to initialize an array and fill it with numbers
* Can use lists or tuples (or any array-like input of numerical values)
* Can specify data type upon array creation...complex, float32, float64, int32, uint32 (unsigned int32), etc

In [0]:
# will infer data type based on input values...here we have 1 float so the whole thing is float
float_array = np.array([1.2,2,3])  
float_array.dtype             # float64
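
A short sketch of how the inferred type changes with the input values, and how to request a type explicitly. The array names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])                # all ints -> an integer dtype
mixed = np.array([1.2, 2, 3])             # one float -> everything is float64
with_complex = np.array([1, 2.5, 3 + 0j]) # one complex -> everything is complex128

# The type can also be requested explicitly
small_floats = np.array([1, 2, 3], dtype='float32')

print(ints.dtype, mixed.dtype, with_complex.dtype, small_floats.dtype)
```

The exact integer width (int32 or int64) depends on the platform and NumPy version, so no expected output is shown for it here.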

### Can also specify type upon array creation
* What happens if you initialize with floating point numbers but you declare an int data type?
* i.e. type casting upon array creation, as we discussed with pandas
* the cast doesn't round, it truncates (drops the fractional part)!

In [0]:
int_array = np.array([1.1,7.5], dtype = 'int32')   
int_array

# truncation!
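
If rounding is what you want, one option is to round first and cast afterwards. A minimal sketch with made-up values:

```python
import numpy as np

raw = np.array([1.1, 7.5, -2.7])

truncated = raw.astype('int32')          # drops the fractional part (toward zero)
rounded = np.round(raw).astype('int32')  # round to nearest first, then cast

print(truncated)  # [ 1  7 -2]
print(rounded)    # [ 1  8 -3]
```

Note that `np.round` sends exact halves to the nearest even value, so 7.5 becomes 8 but 6.5 would become 6.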