In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("Lab_1_data_analysis.ipynb")

## Lab 1: Introduction to manipulating data with numpy

**Motivation**: Whether you're in engineering, business, or health care (or almost any other field nowadays), you need to be able to work with data. Just about everything that touches a computer can now store data. Most of this data will be numbers, but sometimes it will be qualitative data (think "3 people like this, 10 people don't").

You can do a lot of data analysis with spreadsheets, but at some point it's almost always easier to write some code to either *put* data into a spreadsheet in a form that's useful, to *pull* specific data from one (or more) spreadsheets, or to automate some processes (like creating six custom plots from this month's data showing price trends). Being able to write a bit of code to clean up or re-purpose data is really useful, and not too difficult.

- Lab week 1: Read in data, re-arrange it, and use it to do (text-based) statistical analysis
- Lab week 2: Same thing again, but this time with functions so you can re-use code
- Lab week 3: Plot the data you worked with in labs week 1 & 2
- Homework weeks 1, 2 & 3:
  - Make the code more general, so you can look at different data channels
  - Make nicer plots

Some notes on the data you'll be working with. This is real data captured by sensors on human hands engaged in hand-clapping game activities. Each hand has an accelerometer and a gyroscope. Each file contains a data stream from these sensors from a specific human subject (S01, S02, etc.) engaged in a particular motion (F for "high-five", S for "snap", C for "clap"). For example, `S10F06.csv` is subject 10 performing a high-five for the 6th time.

**Big picture: We want to know if someone is performing a high-five, snap, or clap based on this sensor data.** Each row of each CSV file is data from a single point in time, and each column is either the timestamp or one axis (x, y, or z) of one sensor at that time. We want to plot/analyze data from different motions and see if there is a difference between them.

For this lab the goal is to pull out one data channel (the right hand accelerometer) from one subject and print out statistics for different motions. Yes, you could do all of this by manually going into the spreadsheet and setting up some spreadsheet formulas. That works for one data channel... but what if you want to do a different one? Or the data file format changes because someone added another sensor? Or you're asked to throw out the biggest n samples?

Yes, this is going to be frustrating/seem like a lot of work for nothing the first time you do it. The point is not to do this particular task, but to learn how to access data in dictionaries, lists, and numpy arrays to "pull out" the data that you're interested in.

Slides with further instructions/examples (open them now): https://docs.google.com/presentation/d/1lVYGqoStt0ZdnRAYMfF9Km6f0NgMNkuYgINsRhXASwI/edit?usp=sharing

Note: Next week we'll take what we write here and put it into functions.

In [2]:
# Libraries that we need to import: numpy, and json (for loading the description file)
import numpy as np
import json

### Reading in data

TODO First step: read in the data from **`Data/S01C01.csv`**, which contains sensor readings from subject 1 performing a _clap_, and put it in a numpy array `clap_data`. Don't forget to set the delimiter (the file is comma-separated).
 - to find out more about the numpy function **`loadtxt`**, Google *numpy loadtxt*
 - there's also an example in **a_tutorial_numpy.ipynb**
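If you want to see what `delimiter` does before touching the real file, here is a minimal sketch that runs `np.loadtxt` on a few made-up comma-separated lines held in memory. `io.StringIO` stands in for a file, and the numbers and the name `small_data` are hypothetical, not the lab data.

```python
import io

import numpy as np

# A tiny, made-up CSV "file" held in memory (hypothetical values, not the lab data)
csv_text = "0,0.70,-1.30,-0.41\n10,0.69,-1.26,-0.37\n20,0.71,-1.17,-0.33\n"

# delimiter="," tells loadtxt that columns are separated by commas;
# without it, loadtxt splits on whitespace and cannot convert "0,0.70,..." to a float
small_data = np.loadtxt(io.StringIO(csv_text), dtype=float, delimiter=",")

print(small_data.shape)  # (3, 4): 3 rows (time steps), 4 columns
```

The same call with a file path in place of `io.StringIO(...)` is what the next cell does with the real data.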


In [3]:

# TODO - put the code to load the data 
#  Make sure you save it into the variable name given - clap_data - or the autograder won't work.
clap_data = np.loadtxt('Data/S01C01.csv', dtype='float', delimiter=',')

In [4]:
# EXAMPLE CODE

# This bit of code *builds* a data set of a similar form to the one you just read in, but with random data.
#  5 time steps, 13 columns
#     (time, x,y,z, x,y,z, x,y,z, x,y,z ) 
#    the x,y,z is for RH accel, RH gyro, LH accel, LH gyro
#  We'll use this data set to write more example code for the problems in the lab
# Look in the slides and also open up Data/data_description.json for more information on this format

# Make space
my_test_data = np.zeros((5, 13))

# First column: timestamp
my_test_data[:, 0] = np.arange(0, my_test_data.shape[0])

# x-data
my_test_data[:, 1::3] = np.linspace(start=0, stop=1.0, num=4)

# y-data
my_test_data[:, 2::3] = np.random.uniform(-1.0, 0.0, size=(5, 4))

# z-data
my_test_data[:, 3::3] = np.random.uniform(10.0, 20.0, size=(5, 4))

# Number of rows is in shape[0], columns in shape[1]
num_rows = my_test_data.shape[0]

In [5]:
# TODO
# - set the n_time_steps variable. Do NOT just put in a number - use the variable clap_data to calculate this.
# - change the print line to print out the number of time steps
n_time_steps = clap_data.shape[0]  # number of rows = number of time steps
print(f"Number of time steps: {n_time_steps}")

Number of time steps: 101


In [6]:
# EXAMPLE CODE
# Remember you want all of the rows: that's what the : is for
# To get just the fourth column, use 3 (arrays are "zero-indexed", so the first column is 0)
# To understand: Why is the fourth column the one with the z dimension of the right hand accelerometer?
get_fourth_column = my_test_data[:, 3]
print(f"Fourth column (index 3): {get_fourth_column}")

# An example of count_nonzero
#  Make a numpy array with 10 elements going from zero to 1
ten_numbers = np.linspace(0, 1, 10)
# Count the number of Trues in the Boolean numpy array created by ten_numbers > 0.5
print(f"How many numbers are above 0.5? {np.count_nonzero(ten_numbers > 0.5)}")

# Sum will return a double (floating-point number), not an integer. Use int() to change a double to an integer
#   You can tell it's a double by the 5.0 on the print out
print(f"Sum of array as double {np.sum(ten_numbers)} and integer {int(np.sum(ten_numbers))}")

Fourth column (index 3): [18.60015648 11.55601342 14.85999539 11.08550762 13.38287476]
How many numbers are above 0.5? 5
Sum of array as double 5.0 and integer 5


In [7]:
# TODO - set the variables n_pos_z and n_neg_z and print them out. Do NOT just put in a number - use the variable
# clap_data.


n_pos_z = np.count_nonzero(clap_data[:, 3] > 0)
print(f"Number of datapoints with positive z: {n_pos_z}")

n_neg_z = np.count_nonzero(clap_data[:, 3] < 0)
print(f"Number of datapoints with negative z: {n_neg_z}")

Number of datapoints with positive z: 53
Number of datapoints with negative z: 48


In [8]:
grader.check("count_rows")
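A note on the counts above: `> 0` and `< 0` together skip readings that are exactly zero. For `clap_data` the two counts add up to 101 (53 + 48), the number of time steps, so none of those z readings is exactly 0, but that is not guaranteed for other files. A small sketch with made-up values shows the third category:

```python
import numpy as np

# Hypothetical z readings (made up for illustration); note the exact zero
z_values = np.array([0.5, -0.2, 0.0, 1.3, -0.7])

n_pos = np.count_nonzero(z_values > 0)    # strictly positive
n_neg = np.count_nonzero(z_values < 0)    # strictly negative
n_zero = np.count_nonzero(z_values == 0)  # neither positive nor negative

# Every reading falls into exactly one of the three groups
assert n_pos + n_neg + n_zero == z_values.size
print(n_pos, n_neg, n_zero)  # 2 2 1
```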

#### JSON, lists, and dictionaries: Getting information from a file
The format of the spreadsheet data is given in `Data/data_description.json`.

TODO: Open up the file using VSCode (click on the Data folder, then on the file) and look through it to see if it makes sense. Also open up `S01C01.csv` the same way and make sure you understand the data format (see slides).

- Step 1 (this problem): Figure out how to get the `"data_channels"` list out of `data_description`. Note: **`data_description`** is a dictionary.

- Step 2 (this problem): Find the size of the list

- Step 3 (next problem): Find the total number of dimensions of the data
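A minimal sketch of Steps 1 and 2 on a JSON string instead of a file. It assumes only what the printed `data_channels` output further down shows (a top-level `"data_channels"` key holding a list of dictionaries); the string is trimmed to the first two channels. `json.loads` parses a string, while `json.load` (used in the next cell) reads from an open file.

```python
import json

# A trimmed-down JSON string in the same shape as Data/data_description.json
# (only the first two channels, copied from the printed data_channels output)
json_text = """
{
  "data_channels": [
    {"name": "Timestamp", "index_offset": 0, "dimensions": 1, "units": "milliseconds"},
    {"name": "Right hand accelerometer", "index_offset": 1, "dimensions": 3, "units": "gravity units"}
  ]
}
"""

description = json.loads(json_text)       # the top level is a dictionary
channels = description["data_channels"]   # the value stored under this key is a list

print(type(description).__name__, type(channels).__name__)  # dict list
print(len(channels))                      # 2: len() counts the elements of a list
print(channels[1]["name"])                # Right hand accelerometer
```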

In [9]:
# This reads in the json data
# try-except works like an if-then: if the file is not found, print the message below (instead of
#  Python's usual hard-to-read error message)
try:
    with open("Data/data_description.json", "r") as fp:
        data_description = json.load(fp)
except FileNotFoundError:
    print("The file was not found. Check that the Data directory is in the same directory as this file")


### How many sensor data channels?

TODO:  Figure out how many different data channels there are.

In [10]:
# EXAMPLE CODE, step 1

my_test_dictionary = {"Key 1 name": "Name",
                      "Key 2 data list": [1, 2, 3]}

# Get list out of the dictionary
list_from_dictionary = my_test_dictionary["Key 2 data list"]
print(f"List {list_from_dictionary}")

# Sum up all of the elements in the list
sum_elems = 0
for item in list_from_dictionary:
    sum_elems += item
                    
print(f"Sum of elements in list is: 1+2+3 = {sum_elems}")

List [1, 2, 3]
Sum of elements in list is: 1+2+3 = 6


In [11]:

# TODO - use the key "data_channels" to get out the list of data channels from data_description
data_channels = data_description["data_channels"]
print(data_channels)
# How many elements does the list have in it? len() counts the elements of a list
number_of_data_channels = len(data_channels)

# TODO - Look in the Data/data_description.json file. Manually count how many channels there are. Put an
#   assert statement here to check that number_of_data_channels has the answer you would expect
assert number_of_data_channels == 5

[{'name': 'Timestamp', 'index_offset': 0, 'dimensions': 1, 'units': 'milliseconds'}, {'name': 'Right hand accelerometer', 'index_offset': 1, 'dimensions': 3, 'units': 'gravity units'}, {'name': 'Right hand gyroscope', 'index_offset': 4, 'dimensions': 3, 'units': 'degrees/sec'}, {'name': 'Left hand accelerometer', 'index_offset': 7, 'dimensions': 3, 'units': 'gravity units'}, {'name': 'Left hand gyroscope', 'index_offset': 10, 'dimensions': 3, 'units': 'degrees/sec'}]


grader.check("read_json")

### Step 3: Loop over the data channels and add up the total number of dimensions

TODO: Turn this pseudo code into real code

- total number of dimensions = 0

- for each channel in `data_channels` list
   - add in the number of dimensions (key is "dimensions")

Check in **S01C01.csv** that the number of dimensions you found matches the number of columns in the csv file.

Stuck? Try printing out **`data_description`** and match that to what you see in the json file. Try getting the first element out (is it a list or a dictionary? How do you access a list or a dictionary element?) and printing it. Repeat until you're sure you know how to get the number of dimensions of the first channel.

Now put it in a **for** loop, looping over the list. Print out each element in the list in the **for** loop.

Now change the print statement to just print out the **dimensions** value.

Now you can do the sum: use **`x = x + v`** or **`x += v`**.
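Once the loop works, Python also lets you write the same sum more compactly. This sketch uses the five channel dictionaries exactly as printed from `data_channels` (copied in so the block runs on its own) and shows the explicit loop next to `sum()` with a generator expression. The last check is an extra sanity test, assuming the channels are listed in column order: the final channel's `index_offset` plus its `dimensions` should equal the total number of columns.

```python
# The data_channels list as printed above (copied here so the block runs on its own)
channels = [
    {"name": "Timestamp", "index_offset": 0, "dimensions": 1, "units": "milliseconds"},
    {"name": "Right hand accelerometer", "index_offset": 1, "dimensions": 3, "units": "gravity units"},
    {"name": "Right hand gyroscope", "index_offset": 4, "dimensions": 3, "units": "degrees/sec"},
    {"name": "Left hand accelerometer", "index_offset": 7, "dimensions": 3, "units": "gravity units"},
    {"name": "Left hand gyroscope", "index_offset": 10, "dimensions": 3, "units": "degrees/sec"},
]

# Explicit loop: iterate over the dictionaries directly, no index variable needed
total = 0
for channel in channels:
    total += channel["dimensions"]

# The same sum as a one-liner with sum() and a generator expression
total_compact = sum(channel["dimensions"] for channel in channels)

# Sanity check: the last channel should end exactly at the last column
assert channels[-1]["index_offset"] + channels[-1]["dimensions"] == total

print(total, total_compact)  # 13 13
```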

In [12]:
n_total_dims = 0
# TODO 1: turn this pseudo code into real code. 
# for each item in data channels
#    Get the number of dimensions in that element and add it to n_total_dims
# Note that each item in data channels is a dictionary - so you'll have to get the number out of the dictionary
# Loop over the channel dictionaries directly (avoid naming a variable `list`, which hides Python's built-in list)
for channel in data_channels:
    n_total_dims += channel["dimensions"]
# 
# TODO: Fill out the print statement with the number of items in the data channels list, and 
# the total number of dimensions you calculated
#     Again, you must actually calculate n_total_dims from data_channels - do NOT just set the number. What if someone
#       added another channel to the data? Your code should still work...
print(f"Number of data channels items in list: {number_of_data_channels}, total summed number of dimensions: {n_total_dims}")

# TODO: Manually count the total number of dimensions. Write an assert statement that checks that the number of
#   dimensions you calculated above is the same as the number you manually counted
assert number_of_data_channels == 5
assert n_total_dims == 13

Number of data channels items in list: 5, total summed number of dimensions: 13


In [13]:
grader.check("number_dimensions")

### Data slicing to get out the right hand accelerometer data

Practice slicing: pull out the X, Y, Z data for the right hand accelerometer at all timestamps for `S01C01.csv`.

You are free to use the fact that the name of the data channel you want is "Right hand accelerometer", but you should get the actual offset index value from the dictionary, not just do `index_right_hand_accelerometer_offset = 1` (suppose someone changed the order of the data...).

There are several ways to do this; the simplest is to loop through all of the data channels looking for the one that is called "Right hand accelerometer" and then set the index offset value from that. It would be a good idea to check that you actually found the right starting index by looking at the .json file. Don't forget that numpy indexes from 0.

Note: Use `==`, not `is`, for the string comparison. 

We'll do this in two parts (second part is next question): 

- TODO: Get the start index from the dictionary
- TODO: Slice `clap_data` to get out just the right hand accelerometer x,y,z data

In [14]:
# EXAMPLE CODE

# These are examples of how to get data out of the data_description data structure
#   Reminder: you already stored the "data_channels" list in the variable data_channels

# Grab the fourth dictionary in the list of dictionaries
get_fourth_dictionary_in_list = data_channels[3]

# Look at data_description.json - this is one of the dictionaries in that file 
print(f"What is in one dictionary entry:\n {get_fourth_dictionary_in_list}")

# Using the "name" key to get the name stored in this dictionary
name_in_dictionary = get_fourth_dictionary_in_list["name"]

# Using the "index_offset" key to get the starting index
start_index_in_dictionary = get_fourth_dictionary_in_list["index_offset"]

print(f"Channel {name_in_dictionary} starts at {start_index_in_dictionary}")

What is in one dictionary entry:
 {'name': 'Left hand accelerometer', 'index_offset': 7, 'dimensions': 3, 'units': 'gravity units'}
Channel Left hand accelerometer starts at 7


In [15]:
# This is the name we're searching for. Using a variable so that we can change from Right hand accelerometer to something else later

channel_name = "Right hand accelerometer"
index_right_hand_accelerometer_offset = -1  #  Set it to a value that is NOT a valid index
# TODO: Turn this pseudo code into real code
# for each channel in data channels
#     if this channel's name is the one I'm looking for
#         set index_right_hand_accelerometer_offset to that channel's start index

for current_channel in data_channels:
    # Compare against channel_name (not a hard-coded string) so changing channel_name above is enough
    if current_channel["name"] == channel_name:
        index_right_hand_accelerometer_offset = current_channel["index_offset"]
        break

# Check that you actually set the value somewhere in the loop - this is "defensive coding"
if index_right_hand_accelerometer_offset == -1:
    print(f"Error: No channel {channel_name} found")

print(f"Offset for right hand accelerometer: {index_right_hand_accelerometer_offset}")

Offset for right hand accelerometer: 1


In [16]:
grader.check("channel_index")

### Step 2 - Now use slicing to get out all of the right hand accelerometer data

The goal is to slice the data to get out a numpy array of shape (`n_time_steps`, `3`). The 3 is because we have x, y, and z data. This is a bit like the way **`my_test_data`** was created (create an array of zeros, then set the x, then the y, then the z data).

- First, use the slice operator to select all rows and columns, **data[:, :]**
- Now change the column slice from all columns (:) to starting at the offset value you just calculated.
- Now change the slice to end at the offset plus **3** (`n_dims_for_right_hand_accelerometer_data`)
- Hint 1: slicing is **start:end:step**, and the end index itself is *not* included (`1:4` selects columns 1, 2, and 3)
- Hint 2: You need to index both the rows and the columns `[rows, cols]`. So you need one slice for the rows (this is the easy slice - you want all of the rows) and a second slice for which columns you want (this is the one that needs a `start:end:step` slice).
- Hint 3: You don't need to put a step in; the default value of `1` is what you want since you are slicing adjacent columns.

Remember: The data is in **`clap_data`**, not **`data_description`**
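To check your understanding of start:end slicing (the end index is excluded), here is a small sketch on an array whose values encode their own position (value = 10 * row + column), so you can read off which columns a slice kept. `offset` and `n_dims` are illustrative stand-ins for the index offset and `n_dims_for_right_hand_accelerometer_data`.

```python
import numpy as np

# A 4 x 7 array where each entry is 10 * row + column, so positions are easy to read
arr = np.arange(4)[:, None] * 10 + np.arange(7)

offset = 1  # illustrative starting column, like an index_offset
n_dims = 3  # number of columns to keep (x, y, z)

# All rows; columns offset .. offset + n_dims - 1 (the end index 4 is excluded)
block = arr[:, offset:offset + n_dims]

print(block.shape)  # (4, 3)
print(block[0])     # [1 2 3]
print(block[-1])    # [31 32 33]
```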



In [17]:
# EXAMPLE CODE
num_dimensions_test_data = 3  # x, y, z - we made the data with three channels
index_offset_test_data = 4 # example index_offset

# Get all of the columns beginning at index_offset_test_data and ending after num_dimensions_test_data columns
#  The first : is all the rows, and the second item contains the range of columns to select
just_xyz_test_data = my_test_data[:, index_offset_test_data:index_offset_test_data + num_dimensions_test_data]

# Get the first column (timestamp)
#   The first : is all the rows, the 0 is JUST the first column
just_timestamps_test_data = my_test_data[:, 0]

# TODO: Look at both of the above variables in the variable window

# just_xyz_test_data should have the same number of rows as my_test_data, but just
# contain 3 columns (the xyz data)
expected_shape = (my_test_data.shape[0], num_dimensions_test_data)
assert just_xyz_test_data.shape == expected_shape

# This one is a 1-D array with one entry per row. Check size rather than shape: the shape here is (5,), not (5, 1)
assert just_timestamps_test_data.size == my_test_data.shape[0]

# The number of time steps is equal to the number of rows
n_time_steps_test_data = just_xyz_test_data.shape[0]

In [18]:
# We know that this channel's data has x,y,z values (3 dimensions). Use a variable instead of just the number 3
#  in case we want to change it later
n_dims_for_right_hand_accelerometer_data = 3
# Create space for the data
right_hand_accelerometer_data = np.zeros((n_time_steps, n_dims_for_right_hand_accelerometer_data))

# TODO Fix this to copy the right hand accelerometer data into right_hand_accelerometer_data
#  On the left-hand side, the columns should start at zero and end at 3 (n_dims_for_right_hand_accelerometer_data)
#  On the right-hand side, the columns should start at index_right_hand_accelerometer_offset and end at index_right_hand_accelerometer_offset + 3 (n_dims_for_right_hand_accelerometer_data)

right_hand_accelerometer_data[:, :] = clap_data[:, index_right_hand_accelerometer_offset:index_right_hand_accelerometer_offset + n_dims_for_right_hand_accelerometer_data]

print(f"Shape of right_hand_accelerometer_data is {right_hand_accelerometer_data.shape}, should be 101 X 3")
print(f"First row, first column value {right_hand_accelerometer_data[0, 0]:0.2f}, should be 0.70")
print(f"First row, last column value {right_hand_accelerometer_data[0, -1]:0.2f}, should be -0.41")
print(f"Last row, first column value {right_hand_accelerometer_data[-1, 0]:0.2f}, should be 0.70")
print(f"Last row, last column value {right_hand_accelerometer_data[-1, -1]:0.2f}, should be 0.29")


Shape of right_hand_accelerometer_data is (101, 3), should be 101 X 3
First row, first column value 0.70, should be 0.70
First row, last column value -0.41, should be -0.41
Last row, first column value 0.70, should be 0.70
Last row, last column value 0.29, should be 0.29


In [19]:
grader.check("slicing")

### Min/max/Mean/SD of x, y, and z values

Now that the right hand accelerometer data is nicely separated out, find the min, max, mean and standard deviation of each of the x, y, and z channels. Put the results for each channel into a dictionary, and collect the three dictionaries in a list.
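The loop-over-columns approach the lab asks for is good practice. For comparison, numpy can also compute each statistic for all columns at once with `axis=0` (reduce down the rows, giving one value per column). This is a sketch on a made-up 3 x 3 array; the key names match the ones the lab asks for. Note that `np.std` defaults to `ddof=0` (population standard deviation); the stats cell below uses `np.std` with that default and matches `Lab1_check_results.json`.

```python
import numpy as np

# Made-up stand-in for right_hand_accelerometer_data: rows are time steps, columns are x, y, z
xyz = np.array([[1.0, -2.0, 0.0],
                [2.0, -1.0, 4.0],
                [3.0,  0.0, 8.0]])

# axis=0 collapses the rows, leaving one value per column (x, y, z)
mins = np.min(xyz, axis=0)
maxs = np.max(xyz, axis=0)
means = np.mean(xyz, axis=0)
sds = np.std(xyz, axis=0)  # ddof=0 by default (population standard deviation)

# Repackage as a list of per-channel dictionaries, one per column
stats_list = [{"Min": mins[i], "Max": maxs[i], "Mean": means[i], "SD": sds[i]}
              for i in range(xyz.shape[1])]

print(means.tolist())  # [2.0, -1.0, 4.0]
```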

In [20]:
# EXAMPLE CODE
# Get the min/max of the x values of the test data (should be 0 and 1) and store it in a dictionary

# Since we have x,y, and z, we'll want a list to store the stats for each dimension
my_list_of_stats = []

# Since we need to do both min and max, create a variable that has the x slice
#    The : says all of the rows; 1::3 says start at column 1 (the first x column) and take every 3rd column after that
x_slice = my_test_data[:, 1::3]
print(my_test_data)

# Put the results of min/max in a dictionary
my_dict = {"Min" : np.min(x_slice),
           "Max" : np.max(x_slice)}

# Put the dictionary with the x min and max into the list
my_list_of_stats.append(my_dict)

print(f"Stats {my_list_of_stats}")

[[ 0.          0.         -0.171103   18.60015648  0.33333333 -0.22295319
  15.63898098  0.66666667 -0.53524932 19.12790786  1.         -0.20820744
  17.48993346]
 [ 1.          0.         -0.03034967 11.55601342  0.33333333 -0.86978078
  10.90812939  0.66666667 -0.04906173 16.11343808  1.         -0.69472832
  19.64544037]
 [ 2.          0.         -0.1801653  14.85999539  0.33333333 -0.71935929
  10.96362346  0.66666667 -0.93015917 14.53786099  1.         -0.32408365
  11.07403141]
 [ 3.          0.         -0.92529549 11.08550762  0.33333333 -0.88829708
  10.21400871  0.66666667 -0.59266865 18.12025855  1.         -0.50314602
  10.38835641]
 [ 4.          0.         -0.98747824 13.38287476  0.33333333 -0.79758887
  18.48408369  0.66666667 -0.69369703 15.81363091  1.         -0.43659443
  15.34124319]]
Stats [{'Min': np.float64(0.0), 'Max': np.float64(1.0)}]


In [21]:
# SCRATCH CELL
# Try editing the above code to do the y and z channels as well - the result should be a list with three elements
#  Option 1: Copy the code (from x_slice through the append) and change the column slice so it picks out the y columns (2::3), then the z columns (3::3)
#  Option 2: Use a for loop over i=0,1,2 and start the column slice at 1 + i
#    Change the variable name from x_slice to something like cur_slice, since it will be the x slice, then the y, then the z
my_list_of_statsy = []
# The y columns start at column 2 (column 0 is time, column 1 is the first x)
y_slice = my_test_data[:, 2::3]
my_dict = {"Min" : np.min(y_slice),
           "Max" : np.max(y_slice)}
my_list_of_statsy.append(my_dict)
print(f"Stats {my_list_of_statsy}")


In [22]:
right_hand_accelerometer_stats_list = []

# TODO For each of the x,y, and z data channels, calculate the min, max, mean and standard deviation. 
#   Store the values in a dictionary with the keys "Min", "Max", "Mean", and "SD"
#   Put the dictionaries into the right_hand_accelerometer_stats_list list
# Your output should look like Data/Lab1_check_results.json

for r in range(n_dims_for_right_hand_accelerometer_data):
    slice_all = right_hand_accelerometer_data[:, r::n_dims_for_right_hand_accelerometer_data]
    dict_all = {"Min" : np.min(slice_all),
                "Max" : np.max(slice_all),
                "Mean": np.mean(slice_all),
                "SD": np.std(slice_all)}
    right_hand_accelerometer_stats_list.append(dict_all)

print(f"Stats {right_hand_accelerometer_stats_list}")


Stats [{'Min': np.float64(-9.2), 'Max': np.float64(4.49), 'Mean': np.float64(0.8651485148514853), 'SD': np.float64(1.1837987301010604)}, {'Min': np.float64(-3.99), 'Max': np.float64(5.36), 'Mean': np.float64(-1.2767326732673265), 'SD': np.float64(0.8325781370411983)}, {'Min': np.float64(-1.04), 'Max': np.float64(9.06), 'Mean': np.float64(0.41861386138613854), 'SD': np.float64(1.1774195898238364)}]


In [23]:
# TEST CODE
#   The correct answers are in Lab1_check_results.json. You can write test code here to check
#   each value in turn, make sure the slicing is the correct size. This will not be graded.

with open('Data/Lab1_check_results.json') as json_file:
    lab1_check = json.load(json_file)
print(lab1_check)
print(right_hand_accelerometer_stats_list)
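# Compare with a small tolerance instead of exact ==, since floating-point results can differ in the last digits
assert len(right_hand_accelerometer_stats_list) == len(lab1_check)
for computed, expected in zip(right_hand_accelerometer_stats_list, lab1_check):
    for key, expected_value in expected.items():
        assert np.isclose(computed[key], expected_value), f"{key}: got {computed[key]}, expected {expected_value}"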

[{'Min': -9.2, 'Max': 4.49, 'Mean': 0.8651485148514853, 'SD': 1.1837987301010604}, {'Min': -3.99, 'Max': 5.36, 'Mean': -1.2767326732673265, 'SD': 0.8325781370411983}, {'Min': -1.04, 'Max': 9.06, 'Mean': 0.41861386138613854, 'SD': 1.1774195898238364}]
[{'Min': np.float64(-9.2), 'Max': np.float64(4.49), 'Mean': np.float64(0.8651485148514853), 'SD': np.float64(1.1837987301010604)}, {'Min': np.float64(-3.99), 'Max': np.float64(5.36), 'Mean': np.float64(-1.2767326732673265), 'SD': np.float64(0.8325781370411983)}, {'Min': np.float64(-1.04), 'Max': np.float64(9.06), 'Mean': np.float64(0.41861386138613854), 'SD': np.float64(1.1774195898238364)}]


In [24]:
# These commands force Jupyter to actually re-load an external file when you re-execute its import command

%load_ext autoreload
%autoreload 2

In [25]:
grader.check("statistics")

## Boolean slicing to get time slices for different types of hand motions

TODO: Load up the first hand motion file for each hand motion type for subject 1 into a single numpy array, and calculate the mean z value for the right hand accelerometer for each motion type.

The main difference between this problem and the previous one is that in this one you only use some of the rows (instead of all of them like the last problem). The column slice stays the same, but the row slice changes. We're going to use Boolean indexing to do this.

- Step 1: Load up the first hand motion file for each hand motion type for subject 1 (`Data/S01F01.csv` and `Data/S01S01.csv`; `Data/S01C01.csv` is already loaded into `clap_data`).
- Step 2: Store the right hand accelerometer data from all three of those files into a single numpy array with an additional column encoding the motion type (snap, high-five, clap). 
- Step 3: Create a boolean index that is True if the row is for a "clap", False if it is not.
- Step 4: Use the boolean index to select the rows (only rows where the index is True), and then calculate the mean z value for the right hand accelerometer.
- Step 5: Do the same thing again, but this time select rows that are for "high-five".
- Step 6: Do the same thing again, but this time select rows that are for "snap".

This exercise may seem somewhat pointless, since the data is already separated by motion type when you load it in from the individual files. But it is useful practice for operating on a large combined dataset, which you will be doing in the Homework.
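A minimal sketch of Steps 2 to 4 on tiny made-up arrays, using `np.column_stack` to attach the motion-id column and `np.vstack` to stack the blocks. This is an alternative to the preallocate-with-`np.zeros`-then-copy approach in the EXAMPLE CODE cell that builds `my_test_data_with_motion_ids`; the helper `with_id` and all the numbers are hypothetical.

```python
import numpy as np

clap_id, high_five_id, snap_id = 1, 2, 3  # same ids the lab uses

# Hypothetical right hand accelerometer blocks (x, y, z) with different numbers of rows
clap_xyz = np.array([[0.1, 0.2, 1.0], [0.1, 0.2, 3.0]])
high_five_xyz = np.array([[0.5, 0.4, -1.0], [0.6, 0.4, -2.0], [0.5, 0.3, -3.0]])
snap_xyz = np.array([[0.0, 0.9, 0.5]])


def with_id(xyz, motion_id):
    """Return a copy of an (n, 3) array with an extra last column filled with motion_id."""
    return np.column_stack((xyz, np.full(xyz.shape[0], motion_id)))


# Stack the blocks top to bottom: clap rows, then high-five rows, then snap rows
combined = np.vstack((with_id(clap_xyz, clap_id),
                      with_id(high_five_xyz, high_five_id),
                      with_id(snap_xyz, snap_id)))
print(combined.shape)  # (6, 4)

# Boolean index: True on rows whose last column is the high-five id
is_high_five = combined[:, -1] == high_five_id

# Rows picked by the mask, column 2 (z), then the mean
mean_z_high_five = np.mean(combined[is_high_five, 2])
print(mean_z_high_five)  # -2.0
```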

### Step 1: Loading up other motion types.

Load up the first hand motion file for each hand motion type for subject 1 (`Data/S01F01.csv` and `Data/S01S01.csv`; `Data/S01C01.csv` is already loaded into `clap_data`).

In [26]:
# TODO Load high five data from Data/S01F01.csv into high_five_data and snap data from Data/S01S01.csv into snap_data

high_five_data = np.loadtxt('Data/S01F01.csv', dtype='float', delimiter=',')
snap_data = np.loadtxt('Data/S01S01.csv', dtype='float', delimiter=',')

In [27]:
# Numeric ids to indicate hand motion type.
# All of the data in a numpy array has to be of the same type (e.g., floats),
# so these IDs map hand motions to floats.
clap_id = 1
high_five_id = 2
snap_id = 3

In [46]:
# EXAMPLE CODE

# Let's add a column to the data that indicates the motion type of each row. We will combine two copies of my_test_data
# (since we only have one fake data set), but pretend that each is for a different motion type.

# Allocate space for the right hand accelerometer data, adding a column for motion type.
# We need to allocate more rows since we are putting my_test_data in here twice.
my_test_data_with_motion_ids = np.zeros((my_test_data.shape[0] + my_test_data.shape[0], num_dimensions_test_data + 1))

# Copy over my_test_data the first time (pretend clap data)
my_test_data_with_motion_ids[0:my_test_data.shape[0], 0:num_dimensions_test_data] = my_test_data[:, index_offset_test_data:index_offset_test_data + num_dimensions_test_data]

# Copy over my_test_data the second time (pretend snap data)
my_test_data_with_motion_ids[my_test_data.shape[0]:my_test_data.shape[0] + my_test_data.shape[0], 0:num_dimensions_test_data] = my_test_data[:, index_offset_test_data:index_offset_test_data + num_dimensions_test_data]

# Populate column for motion id for the first data set. -1 means get the last column.
my_test_data_with_motion_ids[0:my_test_data.shape[0], -1] = clap_id

# Populate column for motion id for the second data set (pretend snap data).
my_test_data_with_motion_ids[my_test_data.shape[0]:my_test_data.shape[0] + my_test_data.shape[0], -1] = snap_id

In [49]:
# TODO: Put the right hand accelerometer data from clap_data, snap_data, and high_five_data into all_right_hand_accelerometer_data, and add an extra column
# that encodes the motion type for each row.

# Column range of the right hand accelerometer (the same in every file)
col_start = index_right_hand_accelerometer_offset
col_end = index_right_hand_accelerometer_offset + n_dims_for_right_hand_accelerometer_data

clap_data_rx = clap_data[:, col_start:col_end]
high_five_data_rx = high_five_data[:, col_start:col_end]
snap_data_rx = snap_data[:, col_start:col_end]

print(clap_data_rx.shape)
print(high_five_data_rx.shape)
print(snap_data_rx.shape)

# Number of rows (time steps) contributed by each file
n_clap = clap_data_rx.shape[0]
n_high_five = high_five_data_rx.shape[0]
n_snap = snap_data_rx.shape[0]

# One row per time step from all three files, plus one extra column for the motion id
all_right_hand_accelerometer_data = np.zeros((n_clap + n_high_five + n_snap, n_dims_for_right_hand_accelerometer_data + 1))

# Copy the x, y, z blocks in one after another: clap rows, then high-five rows, then snap rows
all_right_hand_accelerometer_data[0:n_clap, 0:n_dims_for_right_hand_accelerometer_data] = clap_data_rx
all_right_hand_accelerometer_data[n_clap:n_clap + n_high_five, 0:n_dims_for_right_hand_accelerometer_data] = high_five_data_rx
all_right_hand_accelerometer_data[n_clap + n_high_five:, 0:n_dims_for_right_hand_accelerometer_data] = snap_data_rx

# Last column (-1): the motion id for each block of rows
all_right_hand_accelerometer_data[0:n_clap, -1] = clap_id
all_right_hand_accelerometer_data[n_clap:n_clap + n_high_five, -1] = high_five_id
all_right_hand_accelerometer_data[n_clap + n_high_five:, -1] = snap_id

print(all_right_hand_accelerometer_data.shape)

(101, 3)
(95, 3)
(89, 3)
(285, 4)


In [None]:

# TODO: Create a boolean array to pick out the clap rows
bool_array_claps = ...

# TODO: Now use that boolean array plus column slicing to calculate the avg of the z values
avg_right_hand_accelerometer_clap_z = ...

# TODO: Repeat for high-fives
bool_array_high_fives = ...
avg_right_hand_accelerometer_high_five_z = ...

# TODO: Repeat for snaps
bool_array_snaps = ...
avg_right_hand_accelerometer_snap_z = ...

print(f"Claps: Avg value {avg_right_hand_accelerometer_clap_z:0.4f} of right hand accelerometer z channel")
print(f"High five: Avg value {avg_right_hand_accelerometer_high_five_z:0.4f} of right hand accelerometer z channel")
print(f"Snap: Avg value {avg_right_hand_accelerometer_snap_z:0.4f} of right hand accelerometer z channel")

In [None]:
grader.check("boolean_slicing")

## Hours and collaborators
Required for every assignment - fill out before you hand-in.

Listing names and websites helps you to document who you worked with and what internet help you received in the case of any plagiarism issues. You should list names of anyone (in class or not) who has substantially helped you with an assignment - or anyone you have *helped*. You do not need to list TAs.

Listing hours helps us track if the assignments are too long.

In [None]:

# List of names (creates a set)
worked_with_names = {"not filled out"}
# List of URLs I used (creates a set)
websites = {"not filled out"}
# Approximate number of hours, including lab/in-class time
hours = -1.5

In [None]:
grader.check("hours_collaborators")

## To submit

* Do a restart then run all to make sure everything runs ok
* Remove print statements that print out a lot of stuff
* Save the file
* Submit just this .ipynb file through Gradescope, lab 1 arrays and dictionaries
* You do NOT need to submit the data files - we will supply those

If the Gradescope autograder fails, please check here first for common reasons for it to fail https://docs.google.com/presentation/d/1tYa5oycUiG4YhXUq5vHvPOpWJ4k_xUPp2rUNIL7Q9RI/edit?usp=sharing

Most likely failure for this assignment is not naming the data directory and files correctly; capitalization matters for the Gradescope grader.